CalistaTable

class calista.table.CalistaTable(engine: str, config: Dict[str, Any] | None = None)

Bases: object

Data quality checks can currently be executed on the following engines or platforms: spark, pandas, polars, snowflake, bigquery, postgre.

analyze(rule_name: str, condition: Condition) → Metrics

Compute Metrics based on a condition.

Args:
rule_name (str): The name of the rule.
condition (Condition): The condition to evaluate.

Returns: Metrics: The metrics resulting from the analysis.
Raises: Any exceptions raised by the analyze_rules method.

analyze_rules(rules: dict[str, Condition]) → list[Metrics]

Compute a list of Metrics based on the given conditions.

Args: rules (dict[RuleName, Condition]): The rule names mapped to the conditions to execute.
Returns: list[Metrics]: The metrics resulting from the analysis.
Raises: Any exceptions raised by the engine’s execute_conditions method.
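
The mapping from rule names to conditions can be pictured with a small stand-in sketch. Nothing below is calista code: `SketchTable`, `SketchMetrics`, and the use of plain callables as conditions are hypothetical, and the delegation from analyze to analyze_rules is only an assumption suggested by the Raises note on analyze.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[int]]
# Stand-in for calista's Condition: a plain predicate over one row (assumption).
SketchCondition = Callable[[Row], bool]


@dataclass
class SketchMetrics:
    """Hypothetical stand-in for Metrics; the real fields may differ."""
    rule_name: str
    total_rows: int
    valid_rows: int


class SketchTable:
    """Hypothetical stand-in for CalistaTable, holding rows in memory."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows

    def analyze_rules(self, rules: Dict[str, SketchCondition]) -> List[SketchMetrics]:
        # One metrics object per (rule name, condition) pair, in dict order.
        return [
            SketchMetrics(name, len(self.rows), sum(1 for r in self.rows if cond(r)))
            for name, cond in rules.items()
        ]

    def analyze(self, rule_name: str, condition: SketchCondition) -> SketchMetrics:
        # Assumed delegation: a single rule is treated as a one-entry rules dict.
        return self.analyze_rules({rule_name: condition})[0]


table = SketchTable([{"ID": 1}, {"ID": 2}, {"ID": None}])
rules: Dict[str, SketchCondition] = {
    "id_not_null": lambda row: row["ID"] is not None,
    "id_positive": lambda row: row["ID"] is not None and row["ID"] > 0,
}
results = table.analyze_rules(rules)
single = table.analyze("id_not_null", rules["id_not_null"])
```

In this sketch a single-rule call and a one-entry rules dict give the same result, which is one plausible reading of why analyze reports exceptions from analyze_rules.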

groupBy(*cols: str) → GroupedTable

Load data from a dataset into a CalistaTable.

Parameters:

path (str, optional) – The path if you’re loading a file.
file_format (str, optional) – The format of the file (e.g., ‘csv’, ‘parquet’).
data (dict) – The dictionary containing the data of the table.
table (str, optional) – The name of the table if you’re not loading a file.
schema (str, optional) – The schema containing the table.
database (str, optional) – The database containing the table.
dataframe (Any, optional) – An existing dataframe.
options (Dict[str, Any], optional) – Additional configuration file options.

Returns: CalistaTable: The loaded table.
Raises: Any exceptions raised by the engine’s read_dataset method.

load_from_database(table: Any, schema: str | None = None, database: str | None = None) → CalistaTable

Load data from a table into a CalistaTable.

Parameters:

table (str) – The name of the table.
schema (str, optional) – The schema containing the table.
database (str, optional) – The database containing the table.

Returns: CalistaTable: The loaded table.
Raises: Any exceptions raised by the engine’s read_dataset method.

load_from_dataframe(dataframe: Any) → CalistaTable

Load data from a dataframe into a CalistaTable.

Parameters: dataframe (Any) – An existing dataframe.

Returns: CalistaTable: The loaded table.
Raises: Any exceptions raised by the engine’s read_dataset method.

load_from_dict(data: Dict[str, List]) → CalistaTable

Load data from a dictionary into a CalistaTable.

Parameters: data (dict) – The dictionary containing the data of the table, mapping each column name to its list of values.

Returns: CalistaTable: The loaded table.
Raises: Any exceptions raised by the engine’s read_dataset method.

Example

>>> calista_table = CalistaTable(engine="spark").load_from_dict({"ID": [1, 2, 3, 4]})
>>> calista_table.show()
+---+
| ID|
+---+
|  1|
|  2|
|  3|
|  4|
+---+
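
Since load_from_dict takes a column-oriented `Dict[str, List]`, data that arrives as row records has to be pivoted first. A minimal sketch of that step, using only the standard library; `rows_to_columns` is a hypothetical helper and not part of calista:

```python
from typing import Any, Dict, List


def rows_to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row records into the column-oriented shape {column: [values]}."""
    columns: Dict[str, List[Any]] = {}
    for index, row in enumerate(rows):
        # Every row after the first must carry the same keys as the first one,
        # so that all column lists end up with the same length.
        if index and set(row) != set(columns):
            raise ValueError(f"row {index} has different keys than row 0")
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


data = rows_to_columns([{"ID": 1, "NAME": "a"}, {"ID": 2, "NAME": "b"}])
# With calista installed, this dict could then be passed on, for example:
#   CalistaTable(engine="pandas").load_from_dict(data)
```

The key check is a design choice of this sketch: it fails early on ragged input rather than leaving any mismatch to be handled by the engine.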

load_from_path(path: str, file_format: str, options: Dict[str, Any] | None = None) → CalistaTable

Load data from a path into a CalistaTable.

Parameters:

path (str) – The path of the file containing your table.
file_format (str) – The format of the file (e.g., ‘csv’, ‘parquet’).
options (Dict[str, Any], optional) – Additional configuration file options.

Returns: CalistaTable: The loaded table.
Raises: Any exceptions raised by the engine’s read_dataset method.

Example

>>> csv_options = {
...     "delimiter": ",",
...     "header": "True"
... }
>>> calista_table = CalistaTable(engine="spark").load_from_path(path="my_csv.csv", file_format="csv", options=csv_options)
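
The options only help if the file on disk actually matches them. A rough sketch, using only the standard library, that writes a small CSV agreeing with the delimiter and header options from the example above and reads it back as a sanity check; the temporary path and the final commented call are illustrative assumptions:

```python
import csv
import os
import tempfile

# Options mirroring the example above: comma delimiter, header row present.
csv_options = {"delimiter": ",", "header": "True"}

# Write a small file whose layout agrees with those options.
path = os.path.join(tempfile.mkdtemp(), "my_csv.csv")
with open(path, "w", newline="") as handle:
    writer = csv.writer(handle, delimiter=csv_options["delimiter"])
    writer.writerow(["ID"])  # header row
    writer.writerows([[1], [2], [3], [4]])

# Read it back with the stdlib to confirm the layout before handing it to an engine.
with open(path, newline="") as handle:
    rows = list(csv.DictReader(handle, delimiter=csv_options["delimiter"]))

# With calista and Spark installed, the same path and options could be used as:
#   CalistaTable(engine="spark").load_from_path(path=path, file_format="csv", options=csv_options)
```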

show(n: int = 10) → None

Prints the first n rows to the console.

Args: n (int, optional): Number of rows to show. Defaults to 10.