Error when using scikit-learn Pipeline and GridSearchCV
I want to try different configurations of a pipeline for text classification.
This is what I did:
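from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import (SelectFdr, SelectFpr, SelectFwe, SelectKBest,
                                       chi2, f_classif, mutual_info_classif)
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
pipe = Pipeline([('c_vect', CountVectorizer()), ('feat_select', SelectKBest()),
                 ('ridge', RidgeClassifier())])
parameters = {'c_vect__max_features': [10, 50, 100, None],
              'feat_select__score_func': [chi2, f_classif, mutual_info_classif, SelectFdr, SelectFwe, SelectFpr],
              'ridge__solver': ['sparse_cg', 'lsqr', 'sag'], 'ridge__tol': [1e-2, 1e-3], 'ridge__alpha': [0.01, 0.1, 1]}
gs_clf = GridSearchCV(pipe, parameters, n_jobs=5)
gs_clf = gs_clf.fit(clean_train_data, train_labels_list)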
But I get the error below, even though SelectFdr is supposed to be one of the available feature selection functions according to the SelectKBest documentation here: http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html
Traceback (most recent call last):
  File ".../anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 350, in __call__
    return self.func(*args, **kwargs)
  File ".../anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File ".../anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 131, in <listcomp>
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File ".../anaconda3/lib/python3.5/site-packages/sklearn/model_selection/_validation.py", line 437, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File ".../anaconda3/lib/python3.5/site-packages/sklearn/pipeline.py", line 257, in fit
    Xt, fit_params = self._fit(X, y, **fit_params)
  File ".../anaconda3/lib/python3.5/site-packages/sklearn/pipeline.py", line 222, in _fit
    **fit_params_steps[name])
  File ".../anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/memory.py", line 362, in __call__
    return self.func(*args, **kwargs)
  File ".../anaconda3/lib/python3.5/site-packages/sklearn/pipeline.py", line 589, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)
  File ".../anaconda3/lib/python3.5/site-packages/sklearn/base.py", line 521, in fit_transform
    return self.fit(X, y, **fit_params).transform(X)
  File ".../anaconda3/lib/python3.5/site-packages/sklearn/feature_selection/base.py", line 76, in transform
    mask = self.get_support()
  File ".../anaconda3/lib/python3.5/site-packages/sklearn/feature_selection/base.py", line 47, in get_support
    mask = self._get_support_mask()
  File ".../anaconda3/lib/python3.5/site-packages/sklearn/feature_selection/univariate_selection.py", line 503, in _get_support_mask
    scores = _clean_nans(self.scores_)
  File ".../anaconda3/lib/python3.5/site-packages/sklearn/feature_selection/univariate_selection.py", line 30, in _clean_nans
    scores = as_float_array(scores, copy=True)
  File ".../anaconda3/lib/python3.5/site-packages/sklearn/utils/validation.py", line 93, in as_float_array
    return X.astype(return_dtype)
TypeError: float() argument must be a string or a number, not 'SelectFdr'
Any idea why this happens?
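A likely explanation, judging by the last frames of the traceback: SelectFdr, SelectFwe and SelectFpr are selector classes (alternatives to SelectKBest), not scoring functions; the SelectKBest documentation lists them under "See also". When one of them is passed as score_func, SelectKBest calls it as score_func(X, y), which only constructs a selector object, and that object ends up in scores_ where an array of floats is expected. One way to search over both the selector and the scoring function is to treat the 'feat_select' step itself as a grid parameter. The sketch below is a minimal, self-contained illustration under assumptions: the toy corpus, the labels, the k values and cv=3 are made up for the example and do not come from the original post. mutual_info_classif is left out of the second grid because it returns no p-values, which the p-value-based selectors need.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import (SelectFdr, SelectFpr, SelectFwe, SelectKBest,
                                       chi2, f_classif, mutual_info_classif)
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus standing in for clean_train_data / train_labels_list.
pos_words = ['good', 'great', 'fine']
neg_words = ['bad', 'awful', 'poor']
docs, labels = [], []
for i in range(20):
    docs.append(pos_words[i % 3] + ' ' + pos_words[(i + 1) % 3])
    labels.append(1)
    docs.append(neg_words[i % 3] + ' ' + neg_words[(i + 1) % 3])
    labels.append(0)

pipe = Pipeline([('c_vect', CountVectorizer()),
                 ('feat_select', SelectKBest()),   # placeholder, replaced by the grid
                 ('ridge', RidgeClassifier())])

# A list of grids: each dict swaps in a selector and tunes only the
# parameters that selector actually has.
parameters = [
    {'feat_select': [SelectKBest()],
     'feat_select__score_func': [chi2, f_classif, mutual_info_classif],
     'feat_select__k': [2, 4]},
    {'feat_select': [SelectFdr(), SelectFpr(), SelectFwe()],
     'feat_select__score_func': [chi2, f_classif]},
]

gs_clf = GridSearchCV(pipe, parameters, cv=3)
gs_clf.fit(docs, labels)
print(gs_clf.best_params_)
```

The vectorizer and ridge parameters from the original grid can be added to each dict in the same way; they are omitted here only to keep the search small.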
Thanks a lot! I didn't know you could do that. I have another, slightly different question. Does SelectFdr try to reduce false positives? Is there a function to reduce false negatives? If not, is there a way to specify which label I want to be treated as positive in the pipeline? – Atirag
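On the follow-up: SelectFdr controls the expected false discovery rate among the selected features, so it concerns which features are kept rather than the classifier's false positives. For the positive label, one possible approach is to select the model with a metric that accepts pos_label, for example recall, which penalizes false negatives for that label. A minimal sketch, assuming hypothetical 'spam'/'ham' labels and a made-up corpus (neither is from the original post):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus with string labels.
spam_words = ['win', 'prize', 'cash']
ham_words = ['meeting', 'lunch', 'report']
docs, labels = [], []
for i in range(20):
    docs.append(spam_words[i % 3] + ' ' + spam_words[(i + 1) % 3])
    labels.append('spam')
    docs.append(ham_words[i % 3] + ' ' + ham_words[(i + 1) % 3])
    labels.append('ham')

pipe = Pipeline([('c_vect', CountVectorizer()),
                 ('ridge', RidgeClassifier())])

# Recall for the label treated as positive: the fraction of true 'spam'
# documents that are predicted as 'spam'.
recall_spam = make_scorer(recall_score, pos_label='spam')

gs = GridSearchCV(pipe, {'ridge__alpha': [0.01, 0.1, 1]},
                  scoring=recall_spam, cv=3)
gs.fit(docs, labels)
print(gs.best_params_, gs.best_score_)
```

Optimizing recall alone can favor models that over-predict the positive label, so a combined metric such as f1_score with the same pos_label may be a safer choice in practice.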