CalistaTable

class calista.table.CalistaTable(engine: LazyEngine)

Bases: object

analyze(rule_name: str, rule: Condition) → Metrics

Computes Metrics for a single rule.

Args:
    rule_name (str): The name of the rule.
    rule (Condition): The Condition to evaluate.
Returns:
    Metrics: The metrics resulting from the analysis.
Raises:
    Any exception raised by the analyze_rules method.

analyze_rules(rules: Dict[str, Condition]) → List[Metrics]

Computes a list of Metrics for the given rules.

Args:
    rules (Dict[RuleName, Condition]): The rule names mapped to the Conditions to evaluate.
Returns:
    List[Metrics]: The metrics resulting from the analysis.
Raises:
    Any exception raised by the engine’s execute_conditions method.
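
The Raises note on analyze suggests that it is the single-rule case of analyze_rules. The sketch below illustrates that relationship in plain Python, without importing calista. The MetricsSketch fields, the helper names, and the use of a row predicate in place of a Condition are all assumptions made for illustration; the real Metrics type may expose different fields.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]
# Hypothetical stand-in for a Condition: a predicate evaluated on one row.
Predicate = Callable[[Row], bool]


@dataclass
class MetricsSketch:
    # Hypothetical fields, not the real Metrics attributes.
    rule: str
    total_rows: int
    valid_rows: int


def analyze_rules_sketch(rows: List[Row], rules: Dict[str, Predicate]) -> List[MetricsSketch]:
    """Build one metrics record per named rule."""
    return [
        MetricsSketch(
            rule=name,
            total_rows=len(rows),
            valid_rows=sum(1 for row in rows if predicate(row)),
        )
        for name, predicate in rules.items()
    ]


def analyze_sketch(rows: List[Row], rule_name: str, rule: Predicate) -> MetricsSketch:
    """Single-rule case, expressed through the multi-rule function."""
    return analyze_rules_sketch(rows, {rule_name: rule})[0]


rows = [{"salary": 900}, {"salary": 1500}, {"salary": 2000}]
single = analyze_sketch(rows, "salary_above_1000", lambda r: r["salary"] > 1000)
```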

apply_rule(rule: Condition, rule_name: str | None = None) → DataFrameType

Returns the dataset with a new boolean column for the given rule.

Args:
    rule (Condition): The Condition to execute.
    rule_name (str | None, optional): Name of the rule. Defaults to None.
Returns:
    DataFrameType: The dataset with the new column resulting from the analysis.

apply_rules(rules: Dict[str, Condition]) → DataFrameType

Returns the dataset with a new boolean column for each rule.

Args:
    rules (Dict[RuleName, Condition]): The rule names mapped to the Conditions to execute.
Returns:
    DataFrameType: The dataset with the new columns resulting from the analysis.
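
To make "new columns of booleans" concrete, here is a minimal plain-Python sketch that does not use calista itself. It assumes that each new column is named after its rule and that a rule can be modeled as a predicate on a row; both are illustrative assumptions, and apply_rules_sketch is a hypothetical helper.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
# Hypothetical stand-in for a Condition: a predicate evaluated on one row.
Predicate = Callable[[Row], bool]


def apply_rules_sketch(rows: List[Row], rules: Dict[str, Predicate]) -> List[Row]:
    """Return copies of the rows with one boolean column per rule name."""
    result = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows stay untouched
        for rule_name, predicate in rules.items():
            # Assumption: the new column takes the rule's name.
            new_row[rule_name] = bool(predicate(row))
        result.append(new_row)
    return result


rows = [{"name": "a", "salary": 900}, {"name": "b", "salary": 1500}]
rules = {
    "salary_above_1000": lambda r: r["salary"] > 1000,
    "name_is_a": lambda r: r["name"] == "a",
}
flagged = apply_rules_sketch(rows, rules)
```

Each output row keeps its original columns and gains one boolean per rule, so the result can be inspected row by row, unlike the aggregated output of analyze_rules.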

filter(condition: Condition) → CalistaTable

Alias for where().

get_invalid_rows(rule: Condition) → DataFrameType

Returns the rows of the dataset that do not satisfy the rule.

Args:
    rule (Condition): The Condition to evaluate.
Returns:
    DataFrameType: The dataset restricted to the rows where the rule is not satisfied.

get_valid_rows(rule: Condition) → DataFrameType

Returns the rows of the dataset that satisfy the rule.

Args:
    rule (Condition): The Condition to evaluate.
Returns:
    DataFrameType: The dataset restricted to the rows where the rule is satisfied.
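
get_valid_rows and get_invalid_rows can be read as the two halves of one split. The plain-Python sketch below illustrates this with a hypothetical split_rows helper and a row predicate standing in for a Condition. How an actual engine treats null values is not covered here and may differ from this simplified model.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
# Hypothetical stand-in for a Condition: a predicate evaluated on one row.
Predicate = Callable[[Row], bool]


def split_rows(rows: List[Row], rule: Predicate) -> Tuple[List[Row], List[Row]]:
    """Split rows into (valid, invalid) according to the rule."""
    valid = [row for row in rows if rule(row)]
    invalid = [row for row in rows if not rule(row)]
    return valid, invalid


rows = [{"id": 1, "age": 17}, {"id": 2, "age": 30}, {"id": 3, "age": 45}]
valid, invalid = split_rows(rows, lambda r: r["age"] >= 18)
```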

group_by(*cols: str) → GroupedTable

Groups the CalistaTable by the specified columns so that aggregation conditions can be executed on the groups. See GroupedTable for the functions available after calling group_by.

Args:
    *cols (str): Columns to group by. Each element should be a column name (string).

property schema: dict[str, str]

Returns the schema of the underlying dataset.

Returns:
    Dict[ColumnName, PythonType]: Dict representing the schema of the underlying dataset.

show(n: int = 10) → None

Prints the first n rows to the console.

Args:
    n (int, optional): Number of rows to show. Defaults to 10.

where(condition: Condition) → CalistaTable

Filters rows using the given condition.

filter() is an alias for where().

Args:
    condition (Condition): The Condition used to filter rows.
Returns:
    CalistaTable: Filtered CalistaTable.
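
Because where() returns a CalistaTable, calls can be chained, and filter() behaves identically. The sketch below models both points with a hypothetical TableSketch class in plain Python. It is not the library's implementation, and the row predicate is an assumed stand-in for a Condition.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
# Hypothetical stand-in for a Condition: a predicate evaluated on one row.
Predicate = Callable[[Row], bool]


class TableSketch:
    """Illustrative stand-in for a table whose filtering returns a new table."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self._rows: List[Row] = list(rows)

    def where(self, condition: Predicate) -> "TableSketch":
        # Return a new table; the original one is left unchanged.
        return TableSketch(row for row in self._rows if condition(row))

    # One common way to declare an alias: bind a second name to the same function.
    filter = where

    def rows(self) -> List[Row]:
        return list(self._rows)


table = TableSketch([{"x": 1}, {"x": 5}, {"x": 10}])
chained = table.where(lambda r: r["x"] > 1).filter(lambda r: r["x"] < 10)
```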