GroupedTable

class calista.group.GroupedTable(engine, agg_keys)

Bases: object

analyze(rule_name: str, rule: AggregateCondition) → Metrics

Compute Metrics based on a condition.

Args:
  • rule_name (str): The name of the rule.

  • rule (AggregateCondition): The aggregate condition to evaluate.

Returns:

Metrics: The metrics resulting from the analysis.

Raises:

Any exceptions raised by the engine’s execute_condition method.

Example

>>> from calista import CalistaEngine
>>> from calista import functions as func
>>>
>>> # Create your CalistaTable
>>> calista_table = CalistaEngine(engine="pandas").load_from_dict({"TEAM": ["red", "red", "red", "blue", "blue", "blue"],
...                                                                "POINTS": [10, 20, 30, 40, 20, 10]})
>>>
>>> # Define your rule
>>> my_rule = func.sum_gt_value(col_name="POINTS", value=65)
>>>
>>> # Generate and print your metrics
>>> metrics = calista_table.group_by("TEAM").analyze(rule_name="Total points higher than 65", rule=my_rule)
>>> print(metrics)
rule_name : Total points higher than 65
total_row_count : 2
valid_row_count : 1
valid_row_count_pct : 50.0
timestamp : 2024-01-01 00:00:00.000000
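
The counts in the output above follow from evaluating the condition once per group rather than once per input row, which is why total_row_count is 2 for six input rows. The following is a rough plain-Python illustration of that arithmetic, not Calista's implementation; the `group_sums` helper and the dict layout are hypothetical names introduced only for this sketch.

```python
from collections import defaultdict

# Same data as in the example above.
data = {"TEAM": ["red", "red", "red", "blue", "blue", "blue"],
        "POINTS": [10, 20, 30, 40, 20, 10]}


def group_sums(keys, values):
    """Hypothetical helper: sum `values` per distinct key."""
    sums = defaultdict(int)
    for key, value in zip(keys, values):
        sums[key] += value
    return dict(sums)


sums = group_sums(data["TEAM"], data["POINTS"])  # red -> 60, blue -> 70

# Each group counts as one "row": the rule sum(POINTS) > 65 is checked per group.
total_row_count = len(sums)
valid_row_count = sum(1 for total in sums.values() if total > 65)
valid_row_count_pct = 100.0 * valid_row_count / total_row_count

print(total_row_count, valid_row_count, valid_row_count_pct)  # 2 1 50.0
```

Only the blue team (70 points) exceeds 65, so one of the two groups is valid.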
analyze_rules(rules: Dict[str, AggregateCondition]) → List[Metrics]

Compute Metrics for each of several conditions.

Args:

rules (Dict[RuleName, AggregateCondition]): A mapping from rule names to the aggregate conditions to execute.

Returns:

List[Metrics]: The metrics resulting from the analysis.

Raises:

Any exceptions raised by the engine’s execute_condition method.

Example

>>> from calista import CalistaEngine
>>> from calista import functions as func
>>>
>>> # Create your CalistaTable
>>> calista_table = CalistaEngine(engine="pandas").load_from_dict({"TEAM": ["red", "red", "red", "blue", "blue", "blue"],
...                                                                "POINTS": [10, 20, 30, 40, 20, 10]})
>>>
>>> # Define your rules
>>> my_rule = func.sum_gt_value(col_name="POINTS", value=65)
>>> my_rule_2 = func.median_eq_value(col_name="POINTS", value=20)
>>>
>>> # Generate and print your metrics
>>> metrics = calista_table.group_by("TEAM").analyze_rules({"Total points higher than 65": my_rule,
...                                                           "Median of the team equals 20": my_rule_2})
>>> for metric in metrics:
...     print(metric)
...     print("-----------------")
rule_name : Total points higher than 65
total_row_count : 2
valid_row_count : 1
valid_row_count_pct : 50.0
timestamp : 2024-01-01 00:00:00.000000
-----------------
rule_name : Median of the team equals 20
total_row_count : 2
valid_row_count : 2
valid_row_count_pct : 100.0
timestamp : 2024-01-01 00:00:00.000000
-----------------
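
Conceptually, each named rule yields one Metrics entry computed over the same set of groups. The sketch below mimics that with plain Python; the lambdas are hypothetical stand-ins for the two aggregate conditions and the result dicts only approximate the printed fields (the timestamp is left out). It is an illustration under those assumptions, not how Calista evaluates rules internally.

```python
from collections import defaultdict
from statistics import median

# Same data as in the example above.
data = {"TEAM": ["red", "red", "red", "blue", "blue", "blue"],
        "POINTS": [10, 20, 30, 40, 20, 10]}

# Collect the POINTS values of each team.
groups = defaultdict(list)
for team, points in zip(data["TEAM"], data["POINTS"]):
    groups[team].append(points)

# Hypothetical stand-ins for the aggregate conditions: group values -> bool.
rules = {
    "Total points higher than 65": lambda values: sum(values) > 65,
    "Median of the team equals 20": lambda values: median(values) == 20,
}

# One metrics entry per rule, each counting the groups that satisfy it.
all_metrics = []
for rule_name, predicate in rules.items():
    valid = sum(1 for values in groups.values() if predicate(values))
    all_metrics.append({
        "rule_name": rule_name,
        "total_row_count": len(groups),
        "valid_row_count": valid,
        "valid_row_count_pct": 100.0 * valid / len(groups),
    })

for entry in all_metrics:
    print(entry)
```

Both teams have a median of 20, so the second rule is valid for 2 of 2 groups, while the first is valid only for the blue team.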
apply_rule(rule: AggregateCondition) → DataFrameType

Return the aggregated dataset with a new boolean column indicating whether each group satisfies the given condition.

Args:

rule (AggregateCondition): The aggregate condition to execute.

Returns:

DataFrameType: The aggregated dataset with the new column resulting from the analysis.

Example

>>> from calista import CalistaEngine
>>> from calista import functions as func
>>>
>>> # Create your CalistaTable
>>> calista_table = CalistaEngine(engine="pandas").load_from_dict({"TEAM": ["red", "red", "red", "blue", "blue", "blue"],
...                                                                "POINTS": [10, 20, 30, 40, 20, 10]})
>>>
>>> # Define your rule
>>> my_rule = func.sum_gt_value(col_name="POINTS", value=65)
>>>
>>> # Generate and print the resulting dataframe
>>> df_result = calista_table.group_by("TEAM").apply_rule(my_rule)
>>> print(df_result)
      SUM_POINTS    RESULT
TEAM
blue          70      True
red           60     False
apply_rules(rules: Dict[str, AggregateCondition]) → DataFrameType

Return the aggregated dataset with one new boolean column per rule, each named after its rule.

Args:

rules (Dict[RuleName, AggregateCondition]): A mapping from rule names to the aggregate conditions to execute.

Returns:

DataFrameType: The aggregated dataset with one new boolean column per rule.

Example

>>> from calista import CalistaEngine
>>> from calista import functions as func
>>>
>>> # Create your CalistaTable
>>> calista_table = CalistaEngine(engine="pandas").load_from_dict({"TEAM": ["red", "red", "red", "blue", "blue", "blue"],
...                                                                "POINTS": [10, 20, 30, 40, 20, 10]})
>>>
>>> # Define your rules
>>> my_rule = func.sum_gt_value(col_name="POINTS", value=65)
>>> my_rule_2 = func.median_eq_value(col_name="POINTS", value=20)
>>>
>>> # Generate and print the resulting dataframe
>>> df_result = calista_table.group_by("TEAM").apply_rules({"Total points higher than 65": my_rule,
...                                                         "Median of the team equals 20": my_rule_2})
>>> print(df_result)
      SUM_POINTS  MEDIAN_POINTS  Total points higher than 65  Median of the team equals 20
TEAM
blue          70           20.0                         True                          True
red           60           20.0                        False                          True
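
The shape of the result above (one row per group, one aggregate column per rule, and one boolean column named after each rule) can be imitated in plain Python. This is only a sketch of the layout under the assumption that the column names match the printed output; the nested-dict `table` is a hypothetical stand-in for the returned dataframe.

```python
from collections import defaultdict
from statistics import median

# Same data as in the example above.
data = {"TEAM": ["red", "red", "red", "blue", "blue", "blue"],
        "POINTS": [10, 20, 30, 40, 20, 10]}

# Collect the POINTS values of each team.
groups = defaultdict(list)
for team, points in zip(data["TEAM"], data["POINTS"]):
    groups[team].append(points)

# One entry per group: the aggregates plus one boolean per rule.
table = {}
for team, values in groups.items():
    total = sum(values)
    mid = float(median(values))
    table[team] = {
        "SUM_POINTS": total,
        "MEDIAN_POINTS": mid,
        "Total points higher than 65": total > 65,
        "Median of the team equals 20": mid == 20,
    }

for team in sorted(table):
    print(team, table[team])
```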
get_invalid_rows(rule: AggregateCondition, granular=False) → DataFrameType

Return the dataset filtered to the rows that do not satisfy the rule.

Args:
  • rule (AggregateCondition): The aggregate condition to evaluate.

  • granular (bool, optional): Whether to retrieve the data at the granular level. Defaults to False.

Returns:

DataFrameType: The aggregated dataset filtered with the rows where the condition is not satisfied.

Example

>>> from calista import CalistaEngine
>>> from calista import functions as func
>>>
>>> # Create your CalistaTable
>>> calista_table = CalistaEngine(engine="pandas").load_from_dict({"TEAM": ["red", "red", "red", "blue", "blue", "blue"],
...                                                                "POINTS": [10, 20, 30, 40, 20, 10]})
>>>
>>> # Define your rule
>>> my_rule = func.sum_gt_value(col_name="POINTS", value=65)
>>>
>>> # Generate and print the resulting dataframe
>>> df_result = calista_table.group_by("TEAM").get_invalid_rows(my_rule)
>>> print(df_result)
      SUM_POINTS
TEAM
red           60
get_valid_rows(rule: AggregateCondition, granular=False) → DataFrameType

Return the dataset filtered to the rows that satisfy the rule.

Args:
  • rule (AggregateCondition): The aggregate condition to evaluate.

  • granular (bool, optional): Whether to retrieve the data at the granular level. Defaults to False.

Returns:

DataFrameType: The aggregated dataset filtered with the rows where the condition is satisfied.

Example

>>> from calista import CalistaEngine
>>> from calista import functions as func
>>>
>>> # Create your CalistaTable
>>> calista_table = CalistaEngine(engine="pandas").load_from_dict({"TEAM": ["red", "red", "red", "blue", "blue", "blue"],
...                                                                "POINTS": [10, 20, 30, 40, 20, 10]})
>>>
>>> # Define your rule
>>> my_rule = func.sum_gt_value(col_name="POINTS", value=65)
>>>
>>> # Generate and print the resulting dataframe
>>> df_result = calista_table.group_by("TEAM").get_valid_rows(my_rule)
>>> print(df_result)
      SUM_POINTS
TEAM
blue          70
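
Taken together, get_valid_rows and get_invalid_rows split the groups into those that satisfy the rule and those that do not. The plain-Python sketch below reproduces that split for the example data. The last step is an assumption: the documentation does not spell out what granular=True returns, and the sketch guesses that it means the original input rows belonging to the selected groups. Treat that part as hypothetical and check it against the library's actual behavior.

```python
from collections import defaultdict

# Same data as in the examples above.
data = {"TEAM": ["red", "red", "red", "blue", "blue", "blue"],
        "POINTS": [10, 20, 30, 40, 20, 10]}

# Aggregate: total POINTS per team.
sums = defaultdict(int)
for team, points in zip(data["TEAM"], data["POINTS"]):
    sums[team] += points

# Split the groups by the rule sum(POINTS) > 65.
valid_groups = {team: total for team, total in sums.items() if total > 65}
invalid_groups = {team: total for team, total in sums.items() if not total > 65}

print(valid_groups)    # {'blue': 70}
print(invalid_groups)  # {'red': 60}

# ASSUMPTION (not confirmed by the documentation): granular=True is taken
# here to mean "the original rows of the groups that fail the rule".
invalid_rows_granular = [
    (team, points)
    for team, points in zip(data["TEAM"], data["POINTS"])
    if team in invalid_groups
]
print(invalid_rows_granular)  # [('red', 10), ('red', 20), ('red', 30)]
```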