GroupedTable
- class calista.group.GroupedTable(engine, agg_keys)
Bases: object
- analyze(rule_name: str, rule: AggregateCondition) → Metrics
Compute Metrics based on a condition.
- Args:
rule_name (str): The name of the rule.
rule (AggregateCondition): The aggregate condition to evaluate.
- Returns:
Metrics: The metrics resulting from the analysis.
- Raises:
Any exceptions raised by the engine’s execute_condition method.
Example
>>> from calista import CalistaEngine
>>> from calista import functions as func
>>>
>>> # Create your CalistaTable
>>> calista_table = CalistaEngine(engine="pandas").load_from_dict({"TEAM": ["red", "red", "red", "blue", "blue", "blue"],
...                                                                "POINTS": [10, 20, 30, 40, 20, 10]})
>>>
>>> # Define your rule
>>> my_rule = func.sum_gt_value(col_name="POINTS", value=65)
>>>
>>> # Generate and print your metrics
>>> metrics = calista_table.group_by("TEAM").analyze(rule_name="Total points higher than 65", rule=my_rule)
>>> print(metrics)
rule_name : Total points higher than 65
total_row_count : 2
valid_row_count : 1
valid_row_count_pct : 50.0
timestamp : 2024-01-01 00:00:00.000000
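The counts in this output are per group, not per source row: the two teams give a total_row_count of 2, and only blue (sum 70) exceeds 65. The plain-Python sketch below reproduces that arithmetic without calista. The helper name `group_metrics` is hypothetical, and this is only an illustration of the numbers, not the library's implementation.

```python
from collections import defaultdict


def group_metrics(rows, key, col, threshold):
    """Hypothetical helper: sum `col` per `key`, then count the groups
    whose sum is strictly greater than `threshold`."""
    sums = defaultdict(int)
    for row in rows:
        sums[row[key]] += row[col]
    total = len(sums)  # one "row" per group after aggregation
    valid = sum(1 for s in sums.values() if s > threshold)
    return {
        "total_row_count": total,
        "valid_row_count": valid,
        "valid_row_count_pct": 100.0 * valid / total if total else 0.0,
    }


# Same data as the example above, expressed as a list of records.
teams = ["red", "red", "red", "blue", "blue", "blue"]
points = [10, 20, 30, 40, 20, 10]
rows = [{"TEAM": t, "POINTS": p} for t, p in zip(teams, points)]

metrics = group_metrics(rows, "TEAM", "POINTS", 65)
print(metrics)
```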
- analyze_rules(rules: Dict[str, AggregateCondition]) → List[Metrics]
Compute Metrics for each of the given conditions.
- Args:
rules (Dict[RuleName, AggregateCondition]): A mapping from rule names to the aggregate conditions to evaluate.
- Returns:
List[Metrics]: The metrics resulting from the analysis.
- Raises:
Any exceptions raised by the engine’s execute_condition method.
Example
>>> from calista import CalistaEngine
>>> from calista import functions as func
>>>
>>> # Create your CalistaTable
>>> calista_table = CalistaEngine(engine="pandas").load_from_dict({"TEAM": ["red", "red", "red", "blue", "blue", "blue"],
...                                                                "POINTS": [10, 20, 30, 40, 20, 10]})
>>>
>>> # Define your rules
>>> my_rule = func.sum_gt_value(col_name="POINTS", value=65)
>>> my_rule_2 = func.median_eq_value(col_name="POINTS", value=20)
>>>
>>> # Generate and print your metrics
>>> metrics = calista_table.group_by("TEAM").analyze_rules({"Total points higher than 65": my_rule,
...                                                         "Median of the team equals 20": my_rule_2})
>>> for metric in metrics:
...     print(metric)
...     print("-----------------")
rule_name : Total points higher than 65
total_row_count : 2
valid_row_count : 1
valid_row_count_pct : 50.0
timestamp : 2024-01-01 00:00:00.000000
-----------------
rule_name : Median of the team equals 20
total_row_count : 2
valid_row_count : 2
valid_row_count_pct : 100.0
timestamp : 2024-01-01 00:00:00.000000
-----------------
- apply_rule(rule: AggregateCondition) → DataFrameType
Returns the aggregated dataset with a new boolean column for the given condition.
- Args:
rule (AggregateCondition): The aggregate condition to execute.
- Returns:
DataFrameType: The aggregated dataset with the new column resulting from the analysis.
Example
>>> from calista import CalistaEngine
>>> from calista import functions as func
>>>
>>> # Create your CalistaTable
>>> calista_table = CalistaEngine(engine="pandas").load_from_dict({"TEAM": ["red", "red", "red", "blue", "blue", "blue"],
...                                                                "POINTS": [10, 20, 30, 40, 20, 10]})
>>>
>>> # Define your rule
>>> my_rule = func.sum_gt_value(col_name="POINTS", value=65)
>>>
>>> # Generate and print the resulting dataframe
>>> df_result = calista_table.group_by("TEAM").apply_rule(my_rule)
>>> print(df_result)
      SUM_POINTS  RESULT
TEAM
blue          70    True
red           60   False
- apply_rules(rules: Dict[str, AggregateCondition]) → DataFrameType
Returns the aggregated dataset with a new boolean column for each rule.
- Args:
rules (Dict[RuleName, AggregateCondition]): A mapping from rule names to the aggregate conditions to execute.
- Returns:
DataFrameType: The aggregated dataset with the new columns resulting from the analysis.
Example
>>> from calista import CalistaEngine
>>> from calista import functions as func
>>>
>>> # Create your CalistaTable
>>> calista_table = CalistaEngine(engine="pandas").load_from_dict({"TEAM": ["red", "red", "red", "blue", "blue", "blue"],
...                                                                "POINTS": [10, 20, 30, 40, 20, 10]})
>>>
>>> # Define your rules
>>> my_rule = func.sum_gt_value(col_name="POINTS", value=65)
>>> my_rule_2 = func.median_eq_value(col_name="POINTS", value=20)
>>>
>>> # Generate and print the resulting dataframe
>>> df_result = calista_table.group_by("TEAM").apply_rules({"Total points higher than 65": my_rule,
...                                                         "Median of the team equals 20": my_rule_2})
>>> print(df_result)
      SUM_POINTS  MEDIAN_POINTS  Total points higher than 65  Median of the team equals 20
TEAM
blue          70           20.0                         True                          True
red           60           20.0                        False                          True
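To make the shape of this result concrete, here is a minimal plain-Python sketch that evaluates several named rules per group and yields one boolean per rule. It does not use calista: the helper `apply_rules_sketch` and the representation of a rule as an `(aggregate, predicate)` pair are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import median


def apply_rules_sketch(rows, key, col, rules):
    """Hypothetical helper: return {group: {rule_name: bool}}.

    Each rule is an (aggregate, predicate) pair: `aggregate` reduces the
    group's values to one number and `predicate` tests that number.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[col])
    return {
        group: {
            name: predicate(aggregate(values))
            for name, (aggregate, predicate) in rules.items()
        }
        for group, values in sorted(groups.items())
    }


teams = ["red", "red", "red", "blue", "blue", "blue"]
points = [10, 20, 30, 40, 20, 10]
rows = [{"TEAM": t, "POINTS": p} for t, p in zip(teams, points)]

rules = {
    "Total points higher than 65": (sum, lambda total: total > 65),
    "Median of the team equals 20": (median, lambda mid: mid == 20),
}

result = apply_rules_sketch(rows, "TEAM", "POINTS", rules)
for team, flags in result.items():
    print(team, flags)
```

Keying the output by rule name mirrors the way the rule names become column headers in the example above.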
- get_invalid_rows(rule: AggregateCondition, granular=False) → DataFrameType
Returns the dataset filtered to the rows that do not satisfy the rule.
- Args:
rule (AggregateCondition): The aggregate condition to evaluate.
granular (bool, optional): Whether or not to retrieve the data at the granular level. Defaults to False.
- Returns:
DataFrameType: The aggregated dataset filtered with the rows where the condition is not satisfied.
Example
>>> from calista import CalistaEngine
>>> from calista import functions as func
>>>
>>> # Create your CalistaTable
>>> calista_table = CalistaEngine(engine="pandas").load_from_dict({"TEAM": ["red", "red", "red", "blue", "blue", "blue"],
...                                                                "POINTS": [10, 20, 30, 40, 20, 10]})
>>>
>>> # Define your rule
>>> my_rule = func.sum_gt_value(col_name="POINTS", value=65)
>>>
>>> # Generate and print the resulting dataframe
>>> df_result = calista_table.group_by("TEAM").get_invalid_rows(my_rule)
>>> print(df_result)
      SUM_POINTS
TEAM
red           60
- get_valid_rows(rule: AggregateCondition, granular=False) → DataFrameType
Returns the dataset filtered to the rows that satisfy the rule.
- Args:
rule (AggregateCondition): The aggregate condition to evaluate.
granular (bool, optional): Whether or not to retrieve the data at the granular level. Defaults to False.
- Returns:
DataFrameType: The aggregated dataset filtered with the rows where the condition is satisfied.
Example
>>> from calista import CalistaEngine
>>> from calista import functions as func
>>>
>>> # Create your CalistaTable
>>> calista_table = CalistaEngine(engine="pandas").load_from_dict({"TEAM": ["red", "red", "red", "blue", "blue", "blue"],
...                                                                "POINTS": [10, 20, 30, 40, 20, 10]})
>>>
>>> # Define your rule
>>> my_rule = func.sum_gt_value(col_name="POINTS", value=65)
>>>
>>> # Generate and print the resulting dataframe
>>> df_result = calista_table.group_by("TEAM").get_valid_rows(my_rule)
>>> print(df_result)
      SUM_POINTS
TEAM
blue          70
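The examples for get_valid_rows and get_invalid_rows only show the default aggregated output. What granular=True returns is not spelled out here; one plausible reading, assumed in the sketch below, is that it returns the original ungrouped rows belonging to the groups that pass (or fail) the condition. The helper `split_rows` is hypothetical plain Python and may not match calista's actual behavior.

```python
from collections import defaultdict


def split_rows(rows, key, col, threshold, valid=True, granular=False):
    """Hypothetical helper, not calista's API.

    Sums `col` per `key`, keeps the groups whose sum is greater than
    `threshold` (or the other groups when valid=False), and returns either
    one aggregated record per kept group or, under the ASSUMED meaning of
    granular=True, the original rows of those groups.
    """
    sums = defaultdict(int)
    for row in rows:
        sums[row[key]] += row[col]
    keep = {group for group, total in sums.items() if (total > threshold) == valid}
    if granular:
        return [row for row in rows if row[key] in keep]
    return [{key: group, "SUM_" + col: sums[group]} for group in sorted(keep)]


teams = ["red", "red", "red", "blue", "blue", "blue"]
points = [10, 20, 30, 40, 20, 10]
rows = [{"TEAM": t, "POINTS": p} for t, p in zip(teams, points)]

valid_groups = split_rows(rows, "TEAM", "POINTS", 65)
invalid_groups = split_rows(rows, "TEAM", "POINTS", 65, valid=False)
valid_detail = split_rows(rows, "TEAM", "POINTS", 65, granular=True)

print(valid_groups)
print(invalid_groups)
print(valid_detail)
```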