CalistaEngine

class calista.table.CalistaEngine(engine: str, config: Dict[str, Any] | None = None)

Bases: object

Data quality checks can currently be executed on the following engines and platforms: spark, pandas, polars, snowflake, bigquery.

load(path: str | None = None, file_format: str | None = None, data: Dict[str, List] | None = None, table: str | None = None, schema: str | None = None, database: str | None = None, dataframe: Any | None = None, options: Dict[str, Any] | None = None) → CalistaTable

Load data from a dataset into a CalistaTable.

Parameters:
  • path ((str, optional)) – The path if you’re loading a file.

  • file_format ((str, optional)) – The format of the file (e.g., ‘csv’, ‘parquet’).

  • data ((Dict[str, List], optional)) – A dictionary containing the table data, mapping column names to lists of values.

  • table ((str, optional)) – The name of the table if you’re not loading a file.

  • schema ((str, optional)) – The schema containing the table.

  • database ((str, optional)) – The database containing the table.

  • dataframe ((Any, optional)) – An existing dataframe.

  • options ((Dict[str, Any], optional)) – Additional options for reading the file (e.g., delimiter, header).

Returns:

CalistaTable: The loaded table.

Raises:

Any exceptions raised by the engine’s read_dataset method.
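
The signature of load suggests that each group of arguments corresponds to one of the dedicated load_from_* methods documented below. The sketch below illustrates that reading only. It does not use Calista, and the helper name resolve_loader and its precedence order are assumptions made for illustration, not documented behaviour.

```python
from typing import Any, Dict, List, Optional


def resolve_loader(
    path: Optional[str] = None,
    file_format: Optional[str] = None,
    data: Optional[Dict[str, List]] = None,
    table: Optional[str] = None,
    dataframe: Optional[Any] = None,
) -> str:
    """Return the name of the load_from_* method that matches the arguments.

    Hypothetical helper: the precedence order below is an assumption.
    """
    if path is not None:
        # A file path only makes sense together with a file format.
        if file_format is None:
            raise ValueError("file_format is required when path is given")
        return "load_from_path"
    if data is not None:
        return "load_from_dict"
    if table is not None:
        return "load_from_database"
    if dataframe is not None:
        return "load_from_dataframe"
    raise ValueError("provide one of: path, data, table, dataframe")


print(resolve_loader(path="my_csv.csv", file_format="csv"))
print(resolve_loader(table="my_table"))
```

Calling the dedicated method directly, when the source of the data is known in advance, avoids relying on any such precedence.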

load_from_database(table: Any, schema: str | None = None, database: str | None = None) → CalistaTable

Load data from a table into a CalistaTable.

Parameters:
  • table ((str)) – The name of the table.

  • schema ((str, optional)) – The schema containing the table.

  • database ((str, optional)) – The database containing the table.

Returns:

CalistaTable: The loaded table.

Raises:

Any exceptions raised by the engine’s read_dataset method.

Example

>>> from calista import CalistaEngine
>>>
>>> calista_table = CalistaEngine(engine="snowflake").load_from_database(database="my_database",
...                                                                      schema="my_schema",
...                                                                      table="my_table")

load_from_dataframe(dataframe: Any) → CalistaTable

Load data from a dataframe into a CalistaTable.

Parameters:

dataframe ((Any)) – An existing dataframe.

Returns:

CalistaTable: The loaded table.

Raises:

Any exceptions raised by the engine’s read_dataset method.

Example

>>> import pandas as pd
>>> from calista import CalistaEngine
>>>
>>> data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']}
>>> df = pd.DataFrame.from_dict(data)
>>> calista_table = CalistaEngine(engine="pandas").load_from_dataframe(df)
>>> calista_table.show()
   col_1 col_2
0      3     a
1      2     b
2      1     c
3      0     d

load_from_dict(data: Dict[str, List]) → CalistaTable

Load data from a dictionary into a CalistaTable.

Parameters:

data ((dict)) – The dictionary containing the data of the table.

Returns:

CalistaTable: The loaded table.

Raises:

Any exceptions raised by the engine’s read_dataset method.
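
Because data maps column names to lists of values, it can help to confirm that all columns have the same length before handing the dictionary to an engine. The following is a minimal sketch of such a pre-check. The helper check_columns is hypothetical, and whether Calista or the underlying engine performs a similar validation is not stated here.

```python
from typing import Dict, List


def check_columns(data: Dict[str, List]) -> int:
    """Return the row count if every column has the same length.

    Hypothetical pre-check; Calista's own validation is not documented here.
    """
    lengths = {name: len(values) for name, values in data.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"columns have different lengths: {lengths}")
    # An empty dictionary has no columns and therefore zero rows.
    return next(iter(lengths.values()), 0)


data = {"col_1": [3, 2, 1, 0], "col_2": ["a", "b", "c", "d"]}
n_rows = check_columns(data)  # data can then be passed to load_from_dict
```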

Example

>>> from calista import CalistaEngine
>>>
>>> calista_table = CalistaEngine(engine="spark").load_from_dict({"ID": [1, 2, 3, 4]})
>>> calista_table.show()
+---+
| ID|
+---+
|  1|
|  2|
|  3|
|  4|
+---+

load_from_path(path: str, file_format: str, options: Dict[str, Any] | None = None) → CalistaTable

Load data from a path into a CalistaTable.

Parameters:
  • path ((str)) – The path of the file containing your table.

  • file_format ((str)) – The format of the file (e.g., ‘csv’, ‘parquet’).

  • options ((Dict[str, Any], optional)) – Additional options for reading the file.

Returns:

CalistaTable: The loaded table.

Raises:

Any exceptions raised by the engine’s read_dataset method.

Example

>>> from calista import CalistaEngine
>>>
>>> csv_options = {
...     "delimiter": ",",
...     "header": "True"
... }
>>> calista_table = CalistaEngine(engine="spark").load_from_path(path='my_csv.csv', file_format="csv", options=csv_options)
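
To try load_from_path without an existing file, a small CSV can first be written with the standard library. The sketch below assumes that the delimiter and header options describe the file exactly as it is written here. The columns and the temporary location are made up for illustration, and the Calista call is left as a comment because it needs a configured Spark environment.

```python
import csv
import os
import tempfile

# Same options as in the example above.
csv_options = {"delimiter": ",", "header": "True"}

# Illustrative content: a header row followed by two data rows.
rows = [["ID", "name"], [1, "a"], [2, "b"]]

path = os.path.join(tempfile.mkdtemp(), "my_csv.csv")
with open(path, "w", newline="") as handle:
    writer = csv.writer(handle, delimiter=csv_options["delimiter"])
    writer.writerows(rows)

# Read the file back to confirm the delimiter matches what was written.
with open(path, newline="") as handle:
    read_back = list(csv.reader(handle, delimiter=csv_options["delimiter"]))

# With Calista and Spark available, the file could then be loaded with:
# calista_table = CalistaEngine(engine="spark").load_from_path(
#     path=path, file_format="csv", options=csv_options)
```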