cvopt.model_selection.SimpleoptCV

class SimpleoptCV(estimator, param_distributions, scoring=None, cv=5, max_iter=32, random_state=None, n_jobs=1, pre_dispatch='2*n_jobs', verbose=0, logdir=None, save_estimator=0, saver='sklearn', model_id=None, cloner='sklearn', refit=True, backend='hyperopt', **kwargs)

A wrapper around each cross-validation optimizer class.

This class provides a unified interface to the different backends.

For details of each backend optimizer class, refer to that class's page.

Parameters:
  • estimator – A scikit-learn-like estimator.
  • param_distributions (dict.) – Search space.
  • scoring (string or sklearn.metrics.make_scorer.) – Evaluation metric for the search. When scoring is None, the estimator's default scorer is used, and a greater score is treated as better.
  • cv (scikit-learn cross-validator or int(number of folds), default=5.) – Cross validation setting.
  • max_iter (int, default=32.) – Number of search iterations.
  • random_state (int or None, default=None.) – The seed used by the random number generator.
  • n_jobs (int, default=1.) – Number of jobs to run in parallel.
  • pre_dispatch (int or string, default="2*n_jobs".) – Controls the number of jobs that get dispatched during parallel execution.
  • verbose (int(0, 1 or 2), default=0.) –

    Controls the verbosity.

    0: do not display status.

    1: display status on stdout.

    2: display status as a graph.

  • logdir (str or None, default=None.) –

    Path of the directory in which log files are saved. When logdir is None, no log is saved.

    [directory structure]

    logdir
    |-cv_results
    |   |-{model_id}.csv : search log
    |-cv_results_graph
    |   |-{model_id}.html : search log (graph)
    |-estimators_{model_id}
        |-{model_id}_index{search count}_split{fold count}.pkl : an estimator fitted on one fold's training data
        |-{model_id}_index{search count}_test.pkl : an estimator fitted on the whole training data

  • save_estimator (int, default=0.) –

    Estimator save setting.

    0: No estimator is saved.

    1: An estimator fitted on each fold's training data is saved, one per CV fold.

    2: In addition to 1, an estimator fitted on the whole training data is saved, once per cross-validation run.

  • saver (str or function, default="sklearn".) –

    The estimator's saver.

    • sklearn: use sklearn.externals.joblib.dump. Intended mainly for scikit-learn estimators.
    • function: a function whose arguments are the model and the save path.

    Examples

    >>> def saver(model, path):
    ...     save_model(model, path+".h5")
    
  • model_id (str or None, default=None.) – Used in the log file names. When model_id is None, it is generated from the current date and time.
  • cloner (str or function, default="sklearn".) –

    The estimator's cloner.

    • sklearn: try sklearn.base.clone and fall back to copy.deepcopy if it fails. Intended mainly for scikit-learn estimators.
    • function: a function whose argument is the model.

    Examples

    >>> def cloner(model):
    ...     return clone_model(model)
    
  • refit (bool, default=True.) – Whether to refit an estimator with the best parameters found, using all training data (=X).
  • backend (str, default="hyperopt".) –

    Backend optimizer. The following backends are supported.

    • hyperopt: Sequential Model Based Global Optimization
    • bayesopt: Bayesian Optimization
    • gaopt: Genetic Algorithm
    • randomopt: Random Search
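
The saver and cloner parameters described above accept plain functions as well as the string "sklearn". The following is a minimal sketch of such functions using only the standard library. The names pickle_saver and deepcopy_cloner are hypothetical, and the sketch assumes that the save path arrives without a file extension (as the path+".h5" example suggests) and that the model can be pickled and deep-copied.

```python
import copy
import pickle
from pathlib import Path


def pickle_saver(model, path):
    """Custom saver: takes the model and the save path.

    Assumption: `path` has no file extension, so ".pkl" is appended here.
    """
    target = Path(str(path) + ".pkl")
    # Create the parent directory if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    with open(target, "wb") as f:
        pickle.dump(model, f)
    return target


def deepcopy_cloner(model):
    """Custom cloner: takes the model and returns an independent copy."""
    return copy.deepcopy(model)
```

Functions like these would be passed as saver=pickle_saver and cloner=deepcopy_cloner. Note that copy.deepcopy also copies any fitted state, whereas sklearn.base.clone returns an unfitted estimator with the same parameters, so the two are not interchangeable for every model.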
cv_results_

dict of numpy (masked) ndarrays – A dict with keys as column headers and values as columns, that can be imported into a pandas DataFrame.

best_estimator_

estimator or dict – Estimator that was chosen by the search.

best_score_

float – Cross-validated score of the best_estimator.

best_params_

dict – Parameter setting that gave the best results on the hold out data.
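
Because cv_results_ is a dict of columns, it can be inspected row by row even without pandas. The sketch below uses a small hand-written dict in place of a real search result; the key names ("index", "params", "mean_test_score") and the values are assumptions for illustration only, not output from cvopt.

```python
# Hypothetical stand-in for cv_results_: keys are column headers,
# values are equal-length columns. Key names are assumed, not taken from cvopt.
cv_results = {
    "index": [0, 1, 2],
    "params": [{"C": 0.1}, {"C": 1.0}, {"C": 2.5}],
    "mean_test_score": [0.81, 0.88, 0.85],
}


def columns_to_rows(columns):
    """Turn a dict of equal-length columns into a list of row dicts."""
    keys = list(columns)
    return [dict(zip(keys, values)) for values in zip(*(columns[k] for k in keys))]


rows = columns_to_rows(cv_results)

# Pick the row with the highest score (greater is treated as better).
best = max(rows, key=lambda row: row["mean_test_score"])
print(best["params"], best["mean_test_score"])
```

With pandas installed, pandas.DataFrame(cv_results) builds the same table in a single call.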

Methods

__init__(estimator, param_distributions[, …])