Using sklearn tf-idf: splitting on something other than the default whitespace
Python, sklearn, tf-idf: how do I split tokens on "####" instead of the default whitespace?
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
'This is the first document.',
'This is the second second document.',
'And the third one.',
'Is this the first document?'
]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)
But I want to use this format:
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

corpus = [
'This####is####the####first####document.',
'This####is####the####second####second####document.'
]
vectorizer = CountVectorizer()
transformer = TfidfTransformer()
X = vectorizer.fit_transform(corpus)
tfidf = transformer.fit_transform(X)
word = vectorizer.get_feature_names_out()  # get_feature_names() on scikit-learn older than 1.0
weight = tfidf.toarray()
How can I do this?
Pass your own tokenizer; see http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html