cvopt.utils.mk_metafeature

mk_metafeature(X, y, logdir, model_id, target_index, cv, validation_data=None, feature_groups=None, estimator_method='predict', merge=True, loader='sklearn')

Make meta features for stacking (https://mlwave.com/kaggle-ensembling-guide/).

Parameters:
  • X (np.ndarray or pd.core.frame.DataFrame, shape(axis=0) = (n_samples)) – Features that were used in optimizer training. Details depend on the estimator. The meta feature corresponding to X is made using the estimators fitted during cross validation.
  • y (np.ndarray or pd.core.frame.DataFrame, shape(axis=0) = (n_samples) or None, default=None.) – Target variable that was used in optimizer training. Details depend on the estimator.
  • logdir (str.) – cvopt’s log directory path.
  • model_id (str.) – cvopt’s model id.
  • target_index (int.) – Log file index (starting at 0). The estimator corresponding to this index is used to make the meta feature.
  • cv (scikit-learn cross-validator) – Cross validation setting that was used in optimizer training.
  • validation_data (tuple(X, y) or None, default=None.) – Details depend on the estimator. The meta feature corresponding to validation_data is made using the estimator fitted on the whole training data.
  • feature_groups (array-like, shape = (n_samples,) or None, default=None.) – cvopt feature_groups that were used in optimizer training.
  • estimator_method (str, default="predict".) – Name of the estimator method used to make the meta feature.
  • merge (bool, default=True.) – If True, return a single matrix into which the per-fold results are merged.
  • loader (str or function, default="sklearn".) –

    Loader for the estimator.

    • sklearn: use sklearn.externals.joblib.load. Intended mainly for scikit-learn estimators.
    • function: a function whose argument is the estimator’s path.
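
The function form of loader can be sketched as below. This is a minimal illustration, not cvopt code: the name pickle_loader and the use of pickle are assumptions, as is the idea that the loader receives only the estimator's path, which is all the description above states.

```python
import os
import pickle
import tempfile


def pickle_loader(path):
    """Hypothetical loader: takes the estimator's file path, returns the estimator."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Round-trip a stand-in object to show the expected call shape.
# A real use would point at an estimator file saved during optimizer training.
stand_in_estimator = {"kind": "stand-in for a fitted estimator"}
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "estimator.pkl")
    with open(path, "wb") as f:
        pickle.dump(stand_in_estimator, f)
    loaded = pickle_loader(path)
```

Under these assumptions, such a function would be passed as loader=pickle_loader in place of the default "sklearn" string.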
Returns:

X_meta, or (X_meta, X_meta_validation_data) – A tuple is returned when validation_data is given.

Return type:

np.ndarray or tuple of np.ndarray.
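
To make the returned arrays concrete, the following is a conceptual sketch of out-of-fold meta features written with plain scikit-learn. It is not cvopt's implementation and does not read cvopt log files: the helper out_of_fold_meta_feature, the LinearRegression estimator, and the synthetic data are all assumptions chosen for illustration. It mirrors the description above in that the meta feature for X comes from estimators fitted on the other folds, while the meta feature for validation_data comes from an estimator fitted on the whole training data.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold


def out_of_fold_meta_feature(estimator, X, y, cv, validation_data=None,
                             estimator_method="predict"):
    """Hypothetical helper, not part of cvopt: build out-of-fold meta features."""
    X = np.asarray(X)
    y = np.asarray(y)
    X_meta = None
    for train_idx, test_idx in cv.split(X, y):
        # Fit a fresh copy on the training folds only.
        fold_estimator = clone(estimator).fit(X[train_idx], y[train_idx])
        fold_pred = np.asarray(getattr(fold_estimator, estimator_method)(X[test_idx]))
        if X_meta is None:
            # Allocate once the per-sample output shape is known.
            X_meta = np.empty((X.shape[0],) + fold_pred.shape[1:], dtype=fold_pred.dtype)
        # Each sample is predicted by an estimator that never saw it.
        X_meta[test_idx] = fold_pred
    if validation_data is None:
        return X_meta
    # Validation predictions come from an estimator fitted on all training data.
    full_estimator = clone(estimator).fit(X, y)
    X_val = np.asarray(validation_data[0])
    X_meta_validation = np.asarray(getattr(full_estimator, estimator_method)(X_val))
    return X_meta, X_meta_validation


# Synthetic regression data, purely for illustration.
rng = np.random.RandomState(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)
X_val = rng.normal(size=(5, 3))
y_val = X_val @ np.array([1.0, -2.0, 0.5])

X_meta, X_meta_val = out_of_fold_meta_feature(
    LinearRegression(), X, y, KFold(n_splits=4), validation_data=(X_val, y_val)
)
print(X_meta.shape, X_meta_val.shape)  # (20,) (5,)
```

The single merged array here corresponds roughly to what merge=True is described as returning; how cvopt lays out the per-fold results when merge=False is not covered by this sketch.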