The FeaturesAnalyzer class

class features_analyzer.FeaturesAnalyzer(inputs_gatherer, forecast_type, cfg, logger)[source]

Bases: object

Given a dataset stored as a pandas DataFrame, with features on the columns and days on the rows, this class computes the best features and their importance.

clean_features_list(region, important_features, new_features, str_pars)[source]

dataset_creator()[source]

Build the datasets according to the instructions in the datasetSettings section of the config file

dataset_reader(region, target_column)[source]

Read a previously created or provided CSV file. If the dataset is created from a custom JSON or from regional signals, dataset_creator must be called before this method

dataset_splitter(region, data, target_column)[source]

Split a DataFrame into a design matrix X and a response vector Y

Parameters
  • region (str) – code name of the region/json/csv

  • data (pandas.DataFrame) – full dataset

Returns

split datasets in multiple formats

Return type

numpy.array, numpy.array, list, pandas.DataFrame, pandas.DataFrame
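
The split described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name split_dataset, the toy column names, and the assumption that the five return values are ordered as X array, Y array, feature names, X DataFrame, Y DataFrame are all hypothetical.

```python
import pandas as pd


def split_dataset(data: pd.DataFrame, target_column: str):
    """Hypothetical stand-in for dataset_splitter (not the library's code)."""
    # Every column except the target is treated as a feature.
    features = [col for col in data.columns if col != target_column]
    x_df = data[features]
    y_df = data[[target_column]]
    # Array versions of the same data, for estimators that expect NumPy input.
    x_data = x_df.to_numpy()
    y_data = y_df[target_column].to_numpy()
    return x_data, y_data, features, x_df, y_df


# Toy dataset with made-up values: days on the rows, features on the columns.
days = pd.date_range("2021-06-01", periods=4, freq="D")
data = pd.DataFrame(
    {
        "feature_a": [21.0, 24.5, 27.0, 23.0],
        "feature_b": [310.0, 420.0, 510.0, 380.0],
        "target": [88.0, 105.0, 131.0, 97.0],
    },
    index=days,
)
x_data, y_data, features, x_df, y_df = split_dataset(data, "target")
```

Returning both arrays and DataFrames lets downstream code pick whichever form it needs without converting again.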

important_features(region, x_data, y_data, features, target_data, ngbPars=None)[source]

Calculate the important features given the design matrix, the target vector and the full list of features

Parameters
  • x_data (numpy.array) – design matrix

  • y_data (numpy.array) – response vector

  • features (list) – list of feature names

Returns

list of new features and DataFrame with the relative importance of each feature

Return type

list, pandas.DataFrame
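
To make the shape of the return value concrete, here is a rough sketch that ranks features and returns a list plus an importance table. It deliberately swaps in a simple technique, absolute Pearson correlation normalised to sum to 1, which is an assumption for illustration only; the ngbPars argument suggests the real method relies on a boosting model, which is not reproduced here. The function name rank_features, the threshold parameter, and the data are hypothetical.

```python
import numpy as np
import pandas as pd


def rank_features(x_data, y_data, features, threshold=0.2):
    """Illustrative ranking by absolute Pearson correlation (an assumption)."""
    # One score per column of the design matrix.
    scores = np.array(
        [abs(np.corrcoef(x_data[:, i], y_data)[0, 1]) for i in range(x_data.shape[1])]
    )
    # Normalise so the scores read as relative importance (they sum to 1).
    relative = scores / scores.sum()
    importance = pd.DataFrame({"feature": features, "importance": relative})
    importance = importance.sort_values("importance", ascending=False)
    importance = importance.reset_index(drop=True)
    # Keep only the features whose relative importance reaches the threshold.
    selected = importance["importance"] >= threshold
    new_features = importance.loc[selected, "feature"].tolist()
    return new_features, importance


# Made-up data: three candidate features, six days.
y_data = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x_data = np.column_stack(
    [
        [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],   # "strong": proportional to y
        [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],  # "weak": alternating signal
        [1.0, 3.0, 2.0, 5.0, 4.0, 6.0],     # "medium": noisy copy of y
    ]
)
new_features, importance = rank_features(x_data, y_data, ["strong", "weak", "medium"])
```

With this toy data the weakly correlated column falls below the threshold, so new_features contains only the other two names.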

perform_feature_selection(region, x_data, y_data, features, target, target_data, hps=None)[source]

Obtain the selected features and save them in the output folder

Parameters
  • x_data (numpy.array) – design matrix

  • y_data (numpy.array) – response vector

  • features (list) – list of feature names

Returns

list of new features and DataFrame with the relative importance of each feature

Return type

list, pandas.DataFrame

save_csv(important_features, target, new_features, output_folder_path)[source]

Save selected features and their relative importance

Parameters
  • important_features (pandas.DataFrame) – DataFrame of the selected features and their relative importance

  • new_features (list) – selected features
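
A possible way to persist these two objects is sketched below. The helper name save_selection, the file names, and the CSV layout are assumptions made for illustration; the actual files written by save_csv may be named and structured differently.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_selection(important_features, target, new_features, output_folder_path):
    """Hypothetical stand-in for save_csv; the file names are assumptions."""
    out = Path(output_folder_path)
    out.mkdir(parents=True, exist_ok=True)
    importance_file = out / f"{target}_importance.csv"
    features_file = out / f"{target}_selected_features.csv"
    # One row per feature with its relative importance.
    important_features.to_csv(importance_file, index=False)
    # One row per selected feature name.
    pd.Series(new_features, name="feature").to_csv(features_file, index=False)
    return importance_file, features_file


# Made-up importance table, written to a temporary folder and read back.
importance = pd.DataFrame(
    {"feature": ["feature_a", "feature_b"], "importance": [0.7, 0.3]}
)
with tempfile.TemporaryDirectory() as tmp:
    importance_file, features_file = save_selection(
        importance, "target", ["feature_a"], tmp
    )
    reloaded = pd.read_csv(importance_file)
    saved_features = pd.read_csv(features_file)["feature"].tolist()
```

Including the target name in each file name keeps the outputs of different targets apart when they share one output folder.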

update_datasets(name, output_dfs, target_columns)[source]

Initialize folders and add metadata to the container of datasets