Optimizers

Optimizer

class hiphive.fitting.Optimizer(fit_data, fit_method='least-squares', training_size=0.75, test_size=None, training_set=None, test_set=None, seed=42, **kwargs)

Optimizer for single Ax = y fit.

One has to specify either training_size/test_size or training_set/test_set. If either training_set or test_set (or both) is specified, the fractions will be ignored.

Warning

Repeatedly setting up an Optimizer and training without changing the seed for the random number generator will yield identical or correlated results. To avoid this, specify a different seed when setting up multiple Optimizer instances.

Parameters:
  • fit_data (tuple of NumPy (N, M) array and NumPy (N) array) – the first element of the tuple represents the fit matrix A whereas the second element represents the vector of target values y; here N (=rows of A, elements of y) equals the number of target values and M (=columns of A) equals the number of parameters
  • fit_method (string) – method to be used for training; possible choices are “least-squares”, “lasso”, “elasticnet”, “bayesian-ridge”, “ardr”
  • training_size (float or int) – If float, represents the fraction of fit_data (rows) to be used for training. If int, represents the absolute number of rows to be used for training.
  • test_size (float or int) – If float, represents the fraction of fit_data (rows) to be used for testing. If int, represents the absolute number of rows to be used for testing.
  • training_set (tuple/list of ints) – indices of rows of A/y to be used for training
  • test_set (tuple/list of ints) – indices of rows of A/y to be used for testing
  • seed (int) – seed for pseudo random number generator
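
The interplay between the size arguments and the explicit index sets can be pictured with a small sketch. The helper resolve_split below is hypothetical and not part of hiphive. It assumes that a fractional size is converted to a row count by rounding, that rows are drawn with a seeded pseudo random number generator, and that a missing set defaults to the remaining rows. The library's actual selection logic may differ.

```python
import random


def resolve_split(n_rows, training_size=0.75, training_set=None,
                  test_set=None, seed=42):
    """Hypothetical helper: return (training rows, test rows).

    Explicit index sets take precedence over training_size, mirroring the
    rule stated above. The rounding and the use of random.sample are
    assumptions made for illustration only.
    """
    all_rows = set(range(n_rows))
    if training_set is not None or test_set is not None:
        # Assumption: if only one set is given, the other one is made up
        # of the remaining rows.
        if training_set is None:
            training_set = sorted(all_rows - set(test_set))
        if test_set is None:
            test_set = sorted(all_rows - set(training_set))
        return list(training_set), list(test_set)

    # A float is read as a fraction of the rows, an int as a row count.
    if isinstance(training_size, float):
        n_train = int(round(training_size * n_rows))
    else:
        n_train = training_size
    rng = random.Random(seed)
    train = sorted(rng.sample(range(n_rows), n_train))
    test = sorted(all_rows - set(train))
    return train, test


train, test = resolve_split(8, training_size=0.75)
print(len(train), len(test))  # 6 rows for training, 2 for testing

# Same seed, same split; explicit sets override the fraction.
assert resolve_split(8, seed=1) == resolve_split(8, seed=1)
print(resolve_split(8, training_size=0.75, test_set=[0, 1]))
```

The repeated call with seed=1 also illustrates the warning above: two splits set up with the same seed are identical, so they do not provide independent estimates.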
training_scatter_data

ScatterData object (namedtuple) – target and predicted value for each row in the training set

test_scatter_data

ScatterData object (namedtuple) – target and predicted value for each row in the test set

compute_rmse(A, y)

Compute the root mean square error using A, y, and the vector of fitted parameters x, corresponding to ||Ax-y||_2.

Parameters:
  • A (NumPy (N, M) array) – fit matrix where N (=rows of A, elements of y) equals the number of target values and M (=columns of A) equals the number of parameters (=elements of x)
  • y (NumPy (N) array) – vector of target values
Returns:

root mean squared error

Return type:

float
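
As a rough illustration of the quantity involved, the sketch below evaluates the prediction Ax and the root mean square error in plain Python, with lists standing in for NumPy arrays. The function names and the data are made up for this example and are not hiphive's implementation. The conventional definition is assumed, that is, the 2-norm of the residual Ax - y divided by the square root of the number of target values N.

```python
import math


def predict(A, x):
    """Return the matrix-vector product Ax as a list of predicted values."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def rmse(A, y, x):
    """Root mean square error between predictions Ax and targets y."""
    residuals = [p - t for p, t in zip(predict(A, x), y)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # N = 3 rows, M = 2 parameters
x = [2.0, 3.0]                            # stand-in for fitted parameters
y = [2.0, 3.0, 8.0]                       # targets; the last one is off by 3

print(predict(A, x))  # [2.0, 3.0, 5.0]
print(rmse(A, y, x))  # sqrt(9 / 3), approximately 1.732
```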

fit_method

string – fit method.

get_contributions(A)

Compute the average contribution to the predicted values from each element of the parameter vector.

Parameters: A (NumPy (N, M) array) – fit matrix where N (=rows of A, elements of y) equals the number of target values and M (=columns of A) equals the number of parameters
Returns: average contribution for each row of A from each parameter
Return type: NumPy (N, M) array
number_of_parameters

int – number of parameters (=columns in A matrix).

number_of_target_values

int – number of target values (=rows in A matrix).

parameters

NumPy array – copy of parameter vector.

predict(A)

Predict data given an input matrix A, i.e., Ax, where x is the vector of the fitted parameters.

Parameters: A (NumPy (N, M) array) – fit matrix where N (=rows of A, elements of y) equals the number of target values and M (=columns of A) equals the number of parameters
Returns: vector of predicted values
Return type: NumPy (N) array
rmse_test

float – root mean squared error for test set.

rmse_training

float – root mean squared error for training set.

seed

int – seed used to initialize the pseudo random number generator.

summary

dict – Comprehensive information about the optimizer.

test_fraction

float – fraction of rows included in test set.

test_set

list – indices of the rows included in the test set.

test_size

int – number of rows included in test set.

train()

Carry out training.

training_fraction

float – fraction of rows included in training set.

training_set

list – indices of the rows included in the training set.

training_size

int – number of rows included in training set.

EnsembleOptimizer

class hiphive.fitting.EnsembleOptimizer(fit_data, fit_method='least-squares', ensemble_size=50, training_size=1.0, bootstrap=True, seed=42, **kwargs)

Ensemble optimizer that carries out a series of single optimization runs using the Optimizer class and then provides access to various ensemble-averaged quantities, including, e.g., errors and parameters.

Warning

Repeatedly setting up an EnsembleOptimizer and training without changing the seed for the random number generator will yield identical or correlated results. To avoid this, specify a different seed when setting up multiple EnsembleOptimizer instances.

Parameters:
  • fit_data (tuple of (N, M) NumPy array and (N) NumPy array) – the first element of the tuple represents the fit matrix A whereas the second element represents the vector of target values y; here N (=rows of A, elements of y) equals the number of target values and M (=columns of A) equals the number of parameters
  • fit_method (string) – method to be used for training; possible choices are “least-squares”, “lasso”, “elasticnet”, “bayesian-ridge”, “ardr”
  • ensemble_size (int) – number of fits in the ensemble
  • training_size (float or int) – If float, represents the fraction of fit_data (rows) to be used for training. If int, represents the absolute number of rows to be used for training.
  • bootstrap (boolean) – if True sampling will be carried out with replacement
  • seed (int) – seed for pseudo random number generator
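
The effect of the bootstrap flag can be sketched in a few lines of plain Python. The helper draw_training_rows is hypothetical and only illustrates sampling with and without replacement. How hiphive draws the rows internally, and how it forms the test set of each ensemble member, is not specified here and is treated as an assumption.

```python
import random


def draw_training_rows(n_rows, training_size, bootstrap, rng):
    """Hypothetical helper: pick row indices for one ensemble member."""
    if bootstrap:
        # With replacement: a row may be drawn more than once.
        return [rng.randrange(n_rows) for _ in range(training_size)]
    # Without replacement: every drawn row is unique.
    return rng.sample(range(n_rows), training_size)


rng = random.Random(42)
n_rows = 10

with_replacement = draw_training_rows(n_rows, 10, True, rng)
without_replacement = draw_training_rows(n_rows, 10, False, rng)

# Rows not drawn for training can serve as that member's test set
# (an assumption about how test errors are obtained).
left_out = sorted(set(range(n_rows)) - set(with_replacement))

# Number of drawn rows versus number of unique rows.
print(len(with_replacement), len(set(with_replacement)))
print(len(without_replacement), len(set(without_replacement)))  # 10 10
print(left_out)
```

With training_size=1.0 and sampling without replacement, every row ends up in the training set, so no rows are left over for testing. Sampling with replacement typically leaves some rows out even at full size.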
bootstrap

boolean – True if sampling is carried out with replacement.

compute_rmse(A, y)

Compute the root mean square error using A, y, and the vector of fitted parameters x, corresponding to ||Ax-y||_2.

Parameters:
  • A (NumPy (N, M) array) – fit matrix where N (=rows of A, elements of y) equals the number of target values and M (=columns of A) equals the number of parameters (=elements of x)
  • y (NumPy (N) array) – vector of target values
Returns:

root mean squared error

Return type:

float

ensemble_size

int – number of training rounds.

fit_method

string – fit method.

get_contributions(A)

Compute the average contribution to the predicted values from each element of the parameter vector.

Parameters: A (NumPy (N, M) array) – fit matrix where N (=rows of A, elements of y) equals the number of target values and M (=columns of A) equals the number of parameters
Returns: average contribution for each row of A from each parameter
Return type: NumPy (N, M) array
get_errors()

Get the errors for each fit and each target value.

Returns: matrix of fit errors where N is the number of target values and M is the number of fits (i.e., the size of the ensemble)
Return type: NumPy (N, M) array
number_of_parameters

int – number of parameters (=columns in A matrix).

number_of_target_values

int – number of target values (=rows in A matrix).

parameter_vectors

list – all parameter vectors in the ensemble.

parameters

NumPy array – copy of parameter vector.

parameters_stddev

NumPy array – standard deviation for each parameter.

predict(A)

Predict data given an input matrix A, i.e., Ax, where x is the vector of the fitted parameters.

Parameters: A (NumPy (N, M) array) – fit matrix where N (=rows of A, elements of y) equals the number of target values and M (=columns of A) equals the number of parameters
Returns: vector of predicted values
Return type: NumPy (N) array
rmse_test

float – ensemble average of root mean squared error over test sets.

rmse_test_ensemble

list – root mean squared test errors obtained for each fit in the ensemble.

rmse_training

float – ensemble average of root mean squared error over training sets.

rmse_training_ensemble

list – root mean squared training errors obtained for each fit in the ensemble.

seed

int – seed used to initialize the pseudo random number generator.

summary

dict – Comprehensive information about the optimizer.

train()

Carry out ensemble training and construct the final model by averaging over all models in the ensemble.
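
The averaging step can be illustrated with a minimal sketch that uses only the Python standard library. The parameter vectors below are invented for the example. The choice of the population standard deviation is an assumption, since the estimator used for parameters_stddev is not stated here.

```python
from statistics import mean, pstdev

# Hypothetical parameter vectors from an ensemble of three fits,
# each with M = 2 parameters.
parameter_vectors = [
    [1.0, 4.0],
    [2.0, 4.0],
    [3.0, 4.0],
]

# Transpose so that each tuple holds one parameter across all fits.
per_parameter = list(zip(*parameter_vectors))

# Final model: average of each parameter over the ensemble.
parameters = [mean(values) for values in per_parameter]
# Population standard deviation (assumed); the sample estimator would
# divide by the ensemble size minus one instead.
parameters_stddev = [pstdev(values) for values in per_parameter]

print(parameters)         # [2.0, 4.0]
print(parameters_stddev)  # first entry is sqrt(2/3), second is 0.0
```

A parameter whose standard deviation is large compared to its mean varies strongly between ensemble members, which suggests that it is poorly determined by the data.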

training_fraction

float – fraction of input data used for training; this value can differ slightly from the value set during initialization due to rounding.

training_size

int – number of rows included in training sets. Note that this will differ from the number of unique rows when bootstrapping.

CrossValidationEstimator

class hiphive.fitting.CrossValidationEstimator(fit_data, fit_method='least-squares', validation_method='k-fold', number_of_splits=10, seed=42, **kwargs)

Optimizer with cross validation.

This optimizer first computes a cross-validation score and finally generates a model using the full set of input data.

Warning

Repeatedly setting up a CrossValidationEstimator and training without changing the seed for the random number generator will yield identical or correlated results. To avoid this, specify a different seed when setting up multiple CrossValidationEstimator instances.

Parameters:
  • fit_data (tuple of NumPy (N, M) array and NumPy (N) array) – the first element of the tuple represents the fit matrix A whereas the second element represents the vector of target values y; here N (=rows of A, elements of y) equals the number of target values and M (=columns of A) equals the number of parameters
  • fit_method (string) – method to be used for training; possible choices are “least-squares”, “lasso”, “elasticnet”, “bayesian-ridge”, “ardr”
  • validation_method (string) – method to use for cross-validation; possible choices are “shuffle-split”, “k-fold”
  • number_of_splits (int) – number of times the fit data set will be split for the cross-validation
  • seed (int) – seed for pseudo random number generator
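
The difference between the two validation methods can be sketched with the standard library alone. Both generators below are illustrative stand-ins and not hiphive's implementation. The parameter validation_fraction is hypothetical and does not appear in the signature above.

```python
import random


def k_fold(n_rows, number_of_splits, rng):
    """Yield (training, validation) index lists; each row is validated once."""
    rows = list(range(n_rows))
    rng.shuffle(rows)
    for k in range(number_of_splits):
        # Every number_of_splits-th row, starting at offset k, forms a fold.
        validation = rows[k::number_of_splits]
        held_out = set(validation)
        training = [r for r in rows if r not in held_out]
        yield training, validation


def shuffle_split(n_rows, number_of_splits, validation_fraction, rng):
    """Yield independent random splits; rows may be validated repeatedly."""
    n_validation = max(1, int(round(validation_fraction * n_rows)))
    for _ in range(number_of_splits):
        rows = list(range(n_rows))
        rng.shuffle(rows)
        yield rows[n_validation:], rows[:n_validation]


rng = random.Random(42)
folds = list(k_fold(10, 5, rng))
validated = sorted(r for _, validation in folds for r in validation)
print(validated)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

splits = list(shuffle_split(10, 5, 0.2, rng))
print([len(validation) for _, validation in splits])  # [2, 2, 2, 2, 2]
```

In the k-fold variant the validation sets partition the rows, whereas in the shuffle-split variant the splits are drawn independently, so a row can appear in several validation sets or in none.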
training_scatter_data

ScatterData object (namedtuple) – contains target and predicted values from each individual training set in the cross-validation split

validation_scatter_data

ScatterData object (namedtuple) – contains target and predicted values from each individual validation set in the cross-validation split

compute_rmse(A, y)

Compute the root mean square error using A, y, and the vector of fitted parameters x, corresponding to ||Ax-y||_2.

Parameters:
  • A (NumPy (N, M) array) – fit matrix where N (=rows of A, elements of y) equals the number of target values and M (=columns of A) equals the number of parameters (=elements of x)
  • y (NumPy (N) array) – vector of target values
Returns:

root mean squared error

Return type:

float

fit_method

string – fit method.

get_contributions(A)

Compute the average contribution to the predicted values from each element of the parameter vector.

Parameters: A (NumPy (N, M) array) – fit matrix where N (=rows of A, elements of y) equals the number of target values and M (=columns of A) equals the number of parameters
Returns: average contribution for each row of A from each parameter
Return type: NumPy (N, M) array
number_of_parameters

int – number of parameters (=columns in A matrix).

number_of_splits

int – number of splits (folds) used for cross-validation.

number_of_target_values

int – number of target values (=rows in A matrix).

parameters

NumPy array – copy of parameter vector.

predict(A)

Predict data given an input matrix A, i.e., Ax, where x is the vector of the fitted parameters.

Parameters: A (NumPy (N, M) array) – fit matrix where N (=rows of A, elements of y) equals the number of target values and M (=columns of A) equals the number of parameters
Returns: vector of predicted values
Return type: NumPy (N) array
rmse_training

float – average root mean squared training error obtained during cross-validation.

rmse_training_final

float – root mean squared error when using the full set of input data.

rmse_training_splits

list – root mean squared training errors obtained during cross-validation.

rmse_validation

float – average root mean squared cross-validation error.

rmse_validation_splits

list – root mean squared validation errors obtained during cross-validation.
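
How the per-split errors, their averages, and the final training error relate to each other can be shown with a toy cross-validation loop. The sketch below is not hiphive code. It assumes a model with a single parameter fitted by ordinary least squares and a simple shuffled k-fold split, and it reuses the attribute names above purely as variable names.

```python
import math
import random


def fit(a, y):
    """Least-squares solution of a*x = y for a single parameter x."""
    return sum(ai * yi for ai, yi in zip(a, y)) / sum(ai * ai for ai in a)


def rmse(a, y, x):
    """Root mean square error of the predictions a*x against y."""
    return math.sqrt(sum((ai * x - yi) ** 2 for ai, yi in zip(a, y)) / len(a))


# Toy data: y = 2a plus a small alternating perturbation.
a = [float(i) for i in range(1, 11)]
y = [2.0 * ai + (0.1 if i % 2 else -0.1) for i, ai in enumerate(a)]

rows = list(range(len(a)))
random.Random(42).shuffle(rows)
number_of_splits = 5

rmse_training_splits, rmse_validation_splits = [], []
for k in range(number_of_splits):
    validation = rows[k::number_of_splits]
    training = [r for r in rows if r not in validation]
    # Fit on the training rows only, then score both subsets.
    x = fit([a[r] for r in training], [y[r] for r in training])
    rmse_training_splits.append(
        rmse([a[r] for r in training], [y[r] for r in training], x))
    rmse_validation_splits.append(
        rmse([a[r] for r in validation], [y[r] for r in validation], x))

# Averages over the splits, then one final fit on all rows.
rmse_training = sum(rmse_training_splits) / number_of_splits
rmse_validation = sum(rmse_validation_splits) / number_of_splits
x_final = fit(a, y)
rmse_training_final = rmse(a, y, x_final)

print(rmse_training, rmse_validation, rmse_training_final)
```

The validation error is computed on rows the model has not seen during fitting, which is why it, rather than either training error, is the usual basis for comparing fit methods.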

seed

int – seed used to initialize the pseudo random number generator.

summary

dict – Comprehensive information about the optimizer.

train()

Construct the final model using all input data available.

validate()

Run validation.

validation_method

string – validation method name.