cvopt.model_selection.RandomoptCV

class RandomoptCV(estimator, param_distributions, scoring=None, cv=5, max_iter=32, random_state=None, n_jobs=1, pre_dispatch='2*n_jobs', verbose=0, logdir=None, save_estimator=0, saver='sklearn', model_id=None, cloner='sklearn', refit=True)[source]

Cross-validation optimizer using random search.
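
As a rough illustration of the idea (not cvopt's actual implementation), random search with cross-validation draws max_iter parameter settings at random, scores each one on every fold, and keeps the setting with the highest mean score. The sketch below uses only the standard library. The names kfold_indices, random_search_cv and toy_score, and the list-of-candidates search space, are assumptions introduced for this example.

```python
import random
from statistics import mean


def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs for contiguous, unshuffled folds."""
    fold_sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
                  for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def random_search_cv(score_fn, space, n_samples, cv=5, max_iter=32, random_state=None):
    """Try max_iter random settings; return (best_params, best_score, results).

    space maps each parameter name to a list of candidate values.
    score_fn(params, train_idx, test_idx) returns a score where greater is better.
    """
    rng = random.Random(random_state)  # seeded, so runs are reproducible
    results = []
    for _ in range(max_iter):
        params = {name: rng.choice(values) for name, values in space.items()}
        fold_scores = [score_fn(params, tr, te)
                       for tr, te in kfold_indices(n_samples, cv)]
        results.append((params, mean(fold_scores)))
    best_params, best_score = max(results, key=lambda r: r[1])
    return best_params, best_score, results


# Toy objective: the score peaks at alpha == 0.5 and ignores the fold indices.
def toy_score(params, train_idx, test_idx):
    return -abs(params["alpha"] - 0.5)


space = {"alpha": [0.1, 0.25, 0.5, 0.75, 1.0]}
best_params, best_score, results = random_search_cv(
    toy_score, space, n_samples=20, cv=5, max_iter=32, random_state=0)
```

Fixing random_state makes the sequence of sampled settings repeatable, which is the role the parameter of the same name plays in the class.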

Parameters:
  • estimator – scikit-learn-like estimator.
  • param_distributions (dict.) – Search space.
  • scoring (string or sklearn.metrics.make_scorer.) – Evaluation metric for the search. When scoring is None, the estimator's default scorer is used, and a greater score is treated as better.
  • cv (scikit-learn cross-validator or int(number of folds), default=5.) – Cross validation setting.
  • max_iter (int, default=32.) – Number of search iterations.
  • random_state (int or None, default=None.) – The seed used by the random number generator.
  • n_jobs (int, default=1.) – Number of jobs to run in parallel.
  • pre_dispatch (int or string, default="2*n_jobs".) – Controls the number of jobs that get dispatched during parallel execution.
  • verbose (int(0, 1 or 2), default=0.) –

    Controls the verbosity.

    0: do not display status.

    1: display status on stdout.

    2: display status as a graph.

  • logdir (str or None, default=None.) –

    Path of directory to save log file. When logdir is None, log is not saved.

    [directory structure]

    logdir
    |-cv_results
    | |-{model_id}.csv : search log
    |-cv_results_graph
    | |-{model_id}.html : search log (graph)
    |-estimators_{model_id}
      |-{model_id}_index{search count}_split{fold count}.pkl : estimator fitted on the training data of one fold
      |-{model_id}_index{search count}_test.pkl : estimator fitted on the whole training data

  • save_estimator (int, default=0.) –

    Estimator save setting.

    0: No estimator is saved.

    1: The estimator fitted on each fold's training data is saved, one per CV fold.

    2: In addition to 1, the estimator fitted on the whole training data is saved, once per cross-validation run.

  • saver (str or function, default="sklearn".) –

    Estimator's saver.

    • sklearn: use sklearn.externals.joblib.dump. Mainly for scikit-learn estimators.
    • function: a function whose arguments are the model and the save path.

    Examples

    >>> def saver(model, path):
    ...     # save_model is not defined here; it stands for the model framework's save function.
    ...     save_model(model, path+".h5")
    
  • model_id (str or None, default=None.) – Used in log file names. When model_id is None, it is generated from the current date and time.
  • cloner (str or function, default="sklearn".) –

    Estimator's cloner.

    • sklearn: try sklearn.base.clone and fall back to copy.deepcopy if it fails. Mainly for scikit-learn estimators.
    • function: a function whose argument is the model.

    Examples

    >>> def cloner(model):
    ...     return clone_model(model)
    
  • refit (bool, default=True.) – Whether to refit the estimator with the best found parameters on all training data (=X).
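
For estimators that are not scikit-learn objects, saver and cloner can be plain functions. The following is a minimal sketch, assuming the model is picklable and deep-copyable. The names pickle_saver and deepcopy_cloner and the SimpleNamespace stand-in model are hypothetical, and the ".pkl" suffix is an assumption that mirrors the ".h5" suffix in the saver example above.

```python
import copy
import os
import pickle
import tempfile
from types import SimpleNamespace


def pickle_saver(model, path):
    """Saver: receives the model and a save path, writes the model to disk."""
    with open(path + ".pkl", "wb") as f:
        pickle.dump(model, f)


def deepcopy_cloner(model):
    """Cloner: receives a model and returns an independent copy of it."""
    return copy.deepcopy(model)


# Stand-in for a non-scikit-learn model (hypothetical).
model = SimpleNamespace(weights=[0.1, 0.2])

clone = deepcopy_cloner(model)
clone.weights.append(0.3)  # does not affect the original model

with tempfile.TemporaryDirectory() as tmpdir:
    base_path = os.path.join(tmpdir, "toy_model")
    pickle_saver(model, base_path)
    with open(base_path + ".pkl", "rb") as f:
        restored = pickle.load(f)
```

Functions like these would be passed as saver=pickle_saver and cloner=deepcopy_cloner when constructing the optimizer.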
cv_results_

dict of numpy (masked) ndarrays – A dict with keys as column headers and values as columns, that can be imported into a pandas DataFrame.

best_estimator_

estimator or dict – Estimator that was chosen by the search.

best_score_

float – Cross-validated score of best_estimator_.

best_params_

dict – Parameter setting that gave the best results on the hold out data.
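
Because cv_results_ is column-oriented, each key holds one value per search iteration. The sketch below reads such a dict with the standard library only. The column names "params" and "mean_test_score" and all values are made up for illustration and may not match the keys the library actually produces.

```python
# Hypothetical column-oriented results: one list entry per search iteration.
cv_results = {
    "params": [{"C": 0.1}, {"C": 1.0}, {"C": 10.0}],
    "mean_test_score": [0.81, 0.88, 0.85],
}

# Turn the columns into one row (dict) per search iteration.
columns = list(cv_results)
rows = [dict(zip(columns, values)) for values in zip(*cv_results.values())]

# With a greater-is-better score, the best row has the highest mean score.
best_row = max(rows, key=lambda row: row["mean_test_score"])
best_params = best_row["params"]
best_score = best_row["mean_test_score"]
```

With pandas installed, pandas.DataFrame(cv_results) builds the same table directly.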

Methods

__init__(estimator, param_distributions[, …])
apply(X) Call apply on the estimator with the best found parameters.
classes_() Return list of target classes.
decision_function(X) Call decision_function on the estimator with the best found parameters.
fit(X[, y, validation_data, groups, …]) Run fit.
get_params([deep]) Get parameters for this estimator.
inverse_transform(Xt) Call inverse_transform on the estimator with the best found parameters.
predict(X) Call predict on the estimator with the best found parameters.
predict_log_proba(X) Call predict_log_proba on the estimator with the best found parameters.
predict_proba(X) Call predict_proba on the estimator with the best found parameters.
score_summarizer(a[, axis, dtype, out, keepdims]) Compute the arithmetic mean along the specified axis.
set_params(**params) Set the parameters of this estimator.
transform(X) Call transform on the estimator with the best found parameters.
apply(X)

Call apply on the estimator with the best found parameters.

Parameters:X (numpy.array, pandas.DataFrame or scipy.sparse, shape(axis=0) = (n_samples)) – Features. Detail depends on estimator.
classes_()

Return list of target classes.

decision_function(X)

Call decision_function on the estimator with the best found parameters.

Parameters:X (numpy.array, pandas.DataFrame or scipy.sparse, shape(axis=0) = (n_samples)) – Features. Detail depends on estimator.
fit(X, y=None, validation_data=None, groups=None, feature_groups=None, min_n_features=2)[source]

Run fit.

Parameters:
  • X (numpy.array, pandas.DataFrame or scipy.sparse, shape(axis=0) = (n_samples)) – Features. Detail depends on estimator.
  • y (np.ndarray or pd.core.frame.DataFrame, shape(axis=0) = (n_samples) or None, default=None.) – Target variable. Detail depends on estimator.
  • validation_data (tuple(X, y) or None, default=None.) – Data used to compute the validation score. Detail depends on estimator. When validation_data is None, the validation score is not computed.
  • groups (array-like, shape = (n_samples,) or None, default=None.) – Group labels for the samples used while splitting the dataset into train/test set. (input of scikit-learn cross-validator)
  • feature_groups (array-like, shape = (n_features,) or None, default=None.) –

    Group labels for the features, used during feature selection. When feature_groups is None, feature selection is not run.

    Features in a group whose label is -1 are always used.

  • min_n_features (int, default=2.) –

    When the number of feature columns in X is less than min_n_features, the search is reported as a failure.

    e.g. if the estimator has its own column-sampling function, use this option to keep X from becoming too small and raising an error.
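
One way to picture feature_groups and min_n_features (a sketch under assumptions, not the library's code): each column of X carries one group label, a search iteration switches whole groups on or off, and columns labeled -1 are always kept. The names select_columns and active_groups are hypothetical.

```python
def select_columns(feature_groups, active_groups, min_n_features=2):
    """Return indices of columns to keep, or None if too few remain.

    feature_groups: one group label per feature column; -1 means "always use".
    active_groups: set of group labels switched on for this search iteration.
    """
    keep = [col for col, group in enumerate(feature_groups)
            if group == -1 or group in active_groups]
    if len(keep) < min_n_features:
        return None  # treated as a failed search iteration in this sketch
    return keep


feature_groups = [-1, 0, 0, 1, 2, 2]  # six feature columns, groups 0 to 2
kept = select_columns(feature_groups, active_groups={0})
too_few = select_columns(feature_groups, active_groups=set())
```

Here kept contains the always-used column plus the two columns of group 0, while too_few is None because only one column would remain.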

get_params(deep=True)

Get parameters for this estimator.

Parameters:deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:params – Parameter names mapped to their values.
Return type:mapping of string to any
inverse_transform(Xt)

Call inverse_transform on the estimator with the best found parameters.

Parameters:Xt (numpy.array, pandas.DataFrame or scipy.sparse, shape(axis=0) = (n_samples)) – Features. Detail depends on estimator.
predict(X)

Call predict on the estimator with the best found parameters.

Parameters:X (numpy.array, pandas.DataFrame or scipy.sparse, shape(axis=0) = (n_samples)) – Features. Detail depends on estimator.
predict_log_proba(X)

Call predict_log_proba on the estimator with the best found parameters.

Parameters:X (numpy.array, pandas.DataFrame or scipy.sparse, shape(axis=0) = (n_samples)) – Features. Detail depends on estimator.
predict_proba(X)

Call predict_proba on the estimator with the best found parameters.

Parameters:X (numpy.array, pandas.DataFrame or scipy.sparse, shape(axis=0) = (n_samples)) – Features. Detail depends on estimator.
score_summarizer(a, axis=None, dtype=None, out=None, keepdims=<class 'numpy._globals._NoValue'>)

Compute the arithmetic mean along the specified axis.

Returns the average of the array elements. The average is taken over the flattened array by default, otherwise over the specified axis. float64 intermediate and return values are used for integer inputs.

Parameters:
  • a (array_like) – Array containing numbers whose mean is desired. If a is not an array, a conversion is attempted.
  • axis (None or int or tuple of ints, optional) –

    Axis or axes along which the means are computed. The default is to compute the mean of the flattened array.

    New in version 1.7.0.

    If this is a tuple of ints, a mean is performed over multiple axes, instead of a single axis or all the axes as before.

  • dtype (data-type, optional) – Type to use in computing the mean. For integer inputs, the default is float64; for floating point inputs, it is the same as the input dtype.
  • out (ndarray, optional) – Alternate output array in which to place the result. The default is None; if provided, it must have the same shape as the expected output, but the type will be cast if necessary. See doc.ufuncs for details.
  • keepdims (bool, optional) –

    If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.

    If the default value is passed, then keepdims will not be passed through to the mean method of sub-classes of ndarray, however any non-default value will be. If the sub-class's method does not implement keepdims, any exceptions will be raised.

Returns:

m – If out=None, returns a new array containing the mean values, otherwise a reference to the output array is returned.

Return type:

ndarray, see dtype parameter above

See also

average()
Weighted average

std(), var(), nanmean(), nanstd(), nanvar()

Notes

The arithmetic mean is the sum of the elements along the axis divided by the number of elements.

Note that for floating-point input, the mean is computed using the same precision the input has. Depending on the input data, this can cause the results to be inaccurate, especially for float32 (see example below). Specifying a higher-precision accumulator using the dtype keyword can alleviate this issue.

By default, float16 results are computed using float32 intermediates for extra precision.

Examples

>>> a = np.array([[1, 2], [3, 4]])
>>> np.mean(a)
2.5
>>> np.mean(a, axis=0)
array([ 2.,  3.])
>>> np.mean(a, axis=1)
array([ 1.5,  3.5])

In single precision, mean can be inaccurate:

>>> a = np.zeros((2, 512*512), dtype=np.float32)
>>> a[0, :] = 1.0
>>> a[1, :] = 0.1
>>> np.mean(a)
0.54999924

Computing the mean in float64 is more accurate:

>>> np.mean(a, dtype=np.float64)
0.55000000074505806
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:self
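
The <component>__<parameter> naming can be illustrated with a small stand-alone sketch. split_nested_params is a hypothetical helper, not part of cvopt or scikit-learn, and the parameter names are examples only.

```python
from collections import defaultdict


def split_nested_params(params):
    """Group {"comp__param": value} items by component name.

    Names without "__" belong to the top-level object and are stored under "".
    """
    grouped = defaultdict(dict)
    for name, value in params.items():
        # Split on the first "__" only, so deeper nesting stays in sub_name.
        component, sep, sub_name = name.partition("__")
        if sep:
            grouped[component][sub_name] = value
        else:
            grouped[""][name] = value
    return dict(grouped)


grouped = split_nested_params({"max_iter": 64, "estimator__C": 1.0,
                               "estimator__kernel": "rbf"})
```

Each component then receives only its own parameters, which is what lets a single call update every part of a nested object.
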
transform(X)

Call transform on the estimator with the best found parameters.

Parameters:X (numpy.array, pandas.DataFrame or scipy.sparse, shape(axis=0) = (n_samples)) – Features. Detail depends on estimator.