The FeaturesAnalyzer class
- class features_analyzer.FeaturesAnalyzer(inputs_gatherer, forecast_type, cfg, logger)
Bases: object
Given a dataset composed of features on the columns and days on the rows of a pandas DataFrame, this class computes the best features and their importance.
- dataset_creator()
Build the datasets according to the datasetSettings section of the config file.
- dataset_reader(region, target_column)
Read a previously created or provided CSV file. If the dataset is created from a custom JSON or from regional signals, this method must be preceded by a call to dataset_creator.
- dataset_splitter(region, data, target_column)
Split a DataFrame into the design matrix X and the response vector Y.
- Parameters
region (str) – code name of the region/json/csv
data (pandas.DataFrame) – full dataset
- Returns
split datasets in multiple formats
- Return type
numpy.array, numpy.array, list, pandas.DataFrame, pandas.DataFrame
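The split described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name `split_dataset`, the toy column names, and the assumption that the five return values are the X array, the Y array, the feature list, and the X and Y DataFrames (in that order) are all hypothetical, inferred only from the documented return types.

```python
import numpy as np
import pandas as pd


def split_dataset(data: pd.DataFrame, target_column: str):
    """Split a DataFrame into design matrix X and response vector y.

    Hypothetical helper: the order and meaning of the returned values
    are an assumption based on the documented return types.
    """
    # Every column except the target is treated as a feature.
    features = [col for col in data.columns if col != target_column]
    x_df = data[features]
    y_df = data[[target_column]]
    # Array versions of the same data, for estimators that expect NumPy input.
    x = x_df.to_numpy()
    y = y_df.to_numpy().ravel()
    return x, y, features, x_df, y_df


# Toy dataset: days on the rows, features on the columns (made-up values).
toy = pd.DataFrame(
    {
        "temperature": [20.1, 22.4, 25.0, 27.3],
        "wind_speed": [3.2, 1.1, 0.8, 2.5],
        "ozone_max": [80.0, 95.0, 120.0, 110.0],
    },
    index=pd.date_range("2021-06-01", periods=4, freq="D"),
)
x, y, features, x_df, y_df = split_dataset(toy, "ozone_max")
```

Returning both array and DataFrame forms lets later steps use whichever is more convenient: arrays for model fitting, DataFrames when column names and dates are needed.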
- important_features(region, x_data, y_data, features, target_data, ngbPars=None)
Calculate the important features given the design matrix, the response vector and the full list of features.
- Parameters
x_data (numpy.array) – design matrix
y_data (numpy.array) – response vector
features (list) – list of feature names
- Returns
list of new features and DataFrame with the relative importance of each feature
- Return type
list, pandas.DataFrame
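To illustrate the shape of the inputs and outputs, here is a minimal sketch that ranks features by their absolute Pearson correlation with the target. The correlation score is a stand-in chosen for brevity; it is not the method this class uses, which the documentation above does not specify. The function name `rank_features`, the `threshold` parameter, and the column names of the returned DataFrame are hypothetical.

```python
import numpy as np
import pandas as pd


def rank_features(x_data: np.ndarray, y_data: np.ndarray, features: list, threshold: float = 0.5):
    """Rank features by |Pearson correlation| with the target.

    Stand-in importance score for illustration only; the threshold
    and the output column names are assumptions.
    """
    scores = []
    for j in range(x_data.shape[1]):
        # np.corrcoef returns a 2x2 correlation matrix; [0, 1] is corr(x_j, y).
        scores.append(abs(np.corrcoef(x_data[:, j], y_data)[0, 1]))
    importance = (
        pd.DataFrame({"feature": features, "importance": scores})
        .sort_values("importance", ascending=False)
        .reset_index(drop=True)
    )
    # Keep only the features whose score reaches the threshold.
    keep = importance["importance"] >= threshold
    new_features = importance.loc[keep, "feature"].tolist()
    return new_features, importance


# Made-up data: the first column is perfectly correlated with the target,
# the second is uncorrelated with it.
x_data = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 6.0], [4.0, 4.0]])
y_data = np.array([2.0, 4.0, 6.0, 8.0])
new_features, importance = rank_features(x_data, y_data, ["f_linear", "f_noise"])
```

As in the documented return type, the result is a pair: the list of retained feature names and a DataFrame holding one importance value per feature.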
- perform_feature_selection(region, x_data, y_data, features, target, target_data, hps=None)
Obtain the selected features and save them in the output folder.
- Parameters
x_data (numpy.array) – design matrix
y_data (numpy.array) – response vector
features (list) – list of feature names
- Returns
list of new features and DataFrame with the relative importance of each feature
- Return type
list, pandas.DataFrame
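The "save them in the output folder" step could look roughly like the sketch below. The folder layout, the file names, and the helper `save_selected_features` are assumptions made for illustration; the actual output format of this class is not described above.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_selected_features(output_folder, region, target, new_features, importance):
    """Write the selected features and the importance table to CSV files.

    Hypothetical layout: <output_folder>/<region>/<target>_*.csv
    """
    folder = Path(output_folder) / region
    folder.mkdir(parents=True, exist_ok=True)
    features_file = folder / f"{target}_selected_features.csv"
    importance_file = folder / f"{target}_importance.csv"
    pd.DataFrame({"feature": new_features}).to_csv(features_file, index=False)
    importance.to_csv(importance_file, index=False)
    return features_file, importance_file


# Made-up selection result, written to a temporary folder and read back.
importance = pd.DataFrame({"feature": ["f_a", "f_b"], "importance": [0.7, 0.3]})
with tempfile.TemporaryDirectory() as tmp:
    features_file, importance_file = save_selected_features(
        tmp, "region_x", "target_y", ["f_a"], importance
    )
    reloaded = pd.read_csv(features_file)["feature"].tolist()
```

Keeping one subfolder per region and one file pair per target makes it possible to rerun the selection for a single region without overwriting the others.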