data_validation_framework.util

Utility functions.

Functions

apply_to_df(df, func, *args[, nb_processes, ...])

Apply a function to the rows of df, showing progress with tqdm.

check_missing_columns(df, required_columns)

Return a list of missing columns in a pandas.DataFrame.

message_worker(progress_bar, message_queue)

Use the message Queue to write a message without interfering with the progress bar.

report_missing_columns(df, required_columns)

Check that required columns exist in a pandas.DataFrame.

tqdm_worker(progress_bar, tqdm_queue)

Update progress bar using the Queue.

try_operation(row, func, *args, **kwargs)

Try to apply a function to a pandas.Series and record any exception raised.

Classes

StreamToQueue(*args[, message_queue])

Fake file-like stream object that redirects all prints to a Queue.

class data_validation_framework.util.StreamToQueue(*args, message_queue=None, **kwargs)

Bases: DummyTqdmFile

Fake file-like stream object that redirects all prints to a Queue.

write(buf)

Redirect write calls to the Queue.
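The redirection idea can be sketched with the standard library alone. This is not the class's actual implementation: `QueueStream` is a hypothetical stand-in that subclasses `io.TextIOBase` rather than `DummyTqdmFile`, and it only illustrates how `write` calls coming from `print` can be forwarded to a `queue.Queue`.

```python
import contextlib
import io
import queue


class QueueStream(io.TextIOBase):
    """Hypothetical stand-in: a write-only text stream that forwards to a queue."""

    def __init__(self, message_queue):
        super().__init__()
        self.message_queue = message_queue

    def writable(self):
        return True

    def write(self, buf):
        # Forward every non-empty chunk; print() may write the text and the
        # line ending as separate chunks.
        if buf:
            self.message_queue.put(buf)
        return len(buf)


messages = queue.Queue()
with contextlib.redirect_stdout(QueueStream(messages)):
    print("validating row 3")  # goes to the queue, not to the terminal

# Drain the queue to recover what was printed.
captured = []
while not messages.empty():
    captured.append(messages.get())
```

A separate consumer (such as a worker that owns the progress bar) can then decide when and where the queued text is actually displayed.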

data_validation_framework.util.apply_to_df(df, func, *args, nb_processes=None, redirect_stdout=None, **kwargs)

Apply a function to the rows of df, showing progress with tqdm.

data_validation_framework.util.check_missing_columns(df, required_columns)

Return a list of missing columns in a pandas.DataFrame.

Parameters:
  • df (pandas.DataFrame) – The DataFrame to check.

  • required_columns (list) – The list of column names. Each entry is a str for a one-level column, or either a list(tuple(str)) or a dict(list(str)) for a two-level column.
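As a rough illustration of the kind of check described here, the sketch below tests labels against `df.columns` directly. `find_missing` is a hypothetical helper, not the library function, and it accepts bare tuples for two-level labels, which is simpler than the list-of-tuples and dict forms described above.

```python
import pandas as pd


def find_missing(df, required_columns):
    """Hypothetical helper: return the required labels absent from df.columns."""
    return [col for col in required_columns if col not in df.columns]


# One-level columns: labels are plain strings.
flat = pd.DataFrame({"a": [1], "b": [2]})
missing_flat = find_missing(flat, ["a", "c"])

# Two-level columns: labels are (level_0, level_1) tuples in a MultiIndex.
two_level = pd.DataFrame(
    [[1, 2]],
    columns=pd.MultiIndex.from_tuples([("geom", "x"), ("geom", "y")]),
)
missing_two_level = find_missing(two_level, [("geom", "x"), ("geom", "z")])
```

Under these assumptions `missing_flat` holds the single absent label `"c"` and `missing_two_level` holds `("geom", "z")`.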

data_validation_framework.util.message_worker(progress_bar, message_queue)

Use the message Queue to write a message without interfering with the progress bar.

data_validation_framework.util.report_missing_columns(df, required_columns)

Check that required columns exist in a pandas.DataFrame.

Parameters:
  • df (pandas.DataFrame) – The DataFrame to check.

  • required_columns (list) – The list of column names. Each entry is a str for a one-level column, or either a list(tuple(str)) or a dict(list(str)) for a two-level column.

data_validation_framework.util.tqdm_worker(progress_bar, tqdm_queue)

Update progress bar using the Queue.
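The queue-driven update pattern can be sketched as follows. Everything here is an assumption made for illustration: `progress_worker` is a hypothetical function, `CountingBar` is a stand-in that models only an `update()` method, and the `None` sentinel used to stop the worker is a convention of this sketch rather than documented behaviour of `tqdm_worker`.

```python
import queue
import threading


class CountingBar:
    """Hypothetical stand-in for a progress bar; only update() is modelled."""

    def __init__(self):
        self.n = 0

    def update(self, step=1):
        self.n += step


def progress_worker(progress_bar, progress_queue):
    """Consume step counts from the queue until a None sentinel arrives."""
    while True:
        step = progress_queue.get()
        if step is None:  # sentinel: no more updates will come
            break
        progress_bar.update(step)


bar = CountingBar()
steps = queue.Queue()
worker = threading.Thread(target=progress_worker, args=(bar, steps))
worker.start()

# Producers report finished rows by putting step counts on the queue.
for _ in range(5):
    steps.put(1)
steps.put(None)
worker.join()
```

Routing all updates through one consumer means only a single thread ever touches the bar, which is the usual reason for this design.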

data_validation_framework.util.try_operation(row, func, *args, **kwargs)

Try to apply a function to a pandas.Series and record any exception raised.
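A minimal sketch of the try-and-record pattern is shown below. `safe_apply` and `ratio` are hypothetical names, and the `(result, exception_text)` return shape is an assumption of this sketch; the format in which `try_operation` records exceptions is not specified here.

```python
import pandas as pd


def safe_apply(row, func, *args, **kwargs):
    """Hypothetical helper: return (result, exception_text) for one row."""
    try:
        return func(row, *args, **kwargs), None
    except Exception as exc:
        # Keep the failure as text so processing of other rows can continue.
        return None, f"{type(exc).__name__}: {exc}"


def ratio(row, scale=1):
    """Example row function; fails with KeyError if a field is absent."""
    return scale * row["num"] / row["den"]


good = safe_apply(pd.Series({"num": 6, "den": 3}), ratio, scale=2)
bad = safe_apply(pd.Series({"num": 6}), ratio)  # "den" is missing
```

Capturing the exception per row lets a whole DataFrame be processed in one pass, with failures collected for a report instead of aborting at the first bad row.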