Vector Space Modeling (20%)

Gensim is designed to handle large text collections using data streaming and efficient incremental algorithms. This differentiates it from most other scientific software packages, which target only batch, in-memory processing.
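To make the streaming idea concrete, here is a minimal sketch in plain Python. It is not Gensim's implementation: `stream_documents` and `IncrementalDocFreq` are hypothetical names made up for this note. It only shows the pattern of reading one document at a time and updating statistics incrementally, so the whole collection never has to be held in memory.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def stream_documents(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one tokenized document at a time instead of building a list."""
    for line in lines:
        tokens = line.lower().split()
        if tokens:
            yield tokens


class IncrementalDocFreq:
    """Hypothetical helper: document frequencies updated one document at a time."""

    def __init__(self) -> None:
        self.num_docs = 0
        self.doc_freq: Counter = Counter()

    def update(self, tokens: List[str]) -> None:
        self.num_docs += 1
        # Count each term once per document, however often it repeats.
        self.doc_freq.update(set(tokens))


# Any iterable of lines works here, including an open file handle,
# so the full collection never has to sit in memory.
raw_lines = [
    "Human machine interface",
    "Graph minors survey",
    "Human system interface interface",
]

stats = IncrementalDocFreq()
for doc in stream_documents(raw_lines):
    stats.update(doc)

print(stats.num_docs, stats.doc_freq["interface"])
```

Replacing `raw_lines` with an open file object gives the same result while reading only one line at a time.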

Gensim

REF:

sklearn feature extraction

REF:

How the Gensim fastText pre-trained model gets vectors for out-of-vocabulary words

Out-of-vocabulary words are represented as the sum of their character n-gram vectors. The intent is to handle unknown words (unks) such as "blargfizzle", but the same mechanism also returns a vector for a multi-word phrase passed as input.

https://stackoverflow.com/questions/50828314/how-does-the-gensim-fasttext-pre-trained-model-get-vectors-for-out-of-vocabulary
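The n-gram summation described above can be sketched in plain Python. This is an illustration under assumptions, not Gensim's code: `char_ngrams`, `oov_vector`, and the toy vector table are hypothetical, and the values are made up. Real fastText models hash n-grams into a fixed number of buckets rather than storing them by string, and implementations may also normalize by the number of n-grams instead of taking a plain sum.

```python
from typing import Dict, List


def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> List[str]:
    """Return character n-grams of the word wrapped in '<' and '>' markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def oov_vector(word: str, ngram_vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Sum the vectors of every known n-gram of the word (plain sum, no averaging)."""
    total = [0.0] * dim
    for gram in char_ngrams(word):
        vec = ngram_vectors.get(gram)
        if vec is not None:
            total = [t + v for t, v in zip(total, vec)]
    return total


# Toy 2-dimensional n-gram table; the values are invented for illustration.
toy_ngram_vectors = {
    "<bl": [1.0, 0.0],
    "bla": [0.0, 2.0],
    "le>": [0.5, 0.5],
}

print(char_ngrams("cat", 3, 3))  # ['<ca', 'cat', 'at>']
print(oov_vector("blargfizzle", toy_ngram_vectors, dim=2))  # [1.5, 2.5]
```

Because the function only looks at character n-grams, any string yields a vector as long as some of its n-grams are known, which is why a phrase passed as a single token also gets a result.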

BERT

Text similarity
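In a vector space model, text similarity is commonly measured as the cosine of the angle between two document vectors. The sketch below assumes the simplest possible representation, raw bag-of-words counts with whitespace tokenization; `bow` and `cosine_similarity` are hypothetical helper names, and a real pipeline would more likely use TF-IDF weights or embeddings.

```python
import math
from collections import Counter


def bow(text: str) -> Counter:
    """Bag-of-words term counts using lowercase whitespace tokenization."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


score = cosine_similarity(bow("the cat sat"), bow("the cat ran"))
print(round(score, 3))
```

Two of the three terms are shared in this example, so the score falls between 0 (no overlap) and 1 (identical term distribution).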

Getting started with Word2Vec

Word2vec Made Easy

Fully Embracing the Transformer: A Comparison of the Three Major NLP Feature Extractors (CNN/RNN/TF)

https://blog.csdn.net/dQCFKyQDXYm3F8rB0/article/details/86446077
