segtok - tokenizes text into sentences and words for European languages
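To make concrete what sentence and word tokenization produce, here is a deliberately naive regex sketch in Python. It is not segtok's algorithm and does not use its API; the patterns below are illustrative assumptions, and a dedicated segmenter such as segtok exists precisely because rules like these break on abbreviations.

```python
import re

# Naive rule (assumption, not segtok's logic): a sentence ends at ., ! or ?
# followed by whitespace and an uppercase letter (ASCII or Latin-1 capitals).
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-ZÀ-ÖØ-Þ])")

# A word is a run of word characters, optionally joined by - or ' (e.g. "well-known",
# "It's"); any other non-space character becomes its own token.
WORD = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def split_sentences(text):
    """Split text into sentences using the naive boundary rule above."""
    return [s for s in SENTENCE_BOUNDARY.split(text.strip()) if s]


def tokenize_words(sentence):
    """Split one sentence into word and punctuation tokens."""
    return WORD.findall(sentence)
```

For example, split_sentences("Dr. Smith arrived.") wrongly returns two pieces, "Dr." and "Smith arrived.", which is the kind of error a real segmenter is built to avoid.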
 
 
Apache Tika
Tika detects and extracts text and metadata from some 1400 file formats, such as .doc, .pdf and .html.
 
 
Using Word2Vec for sentiment analysis
A descriptive example of the NLP technique Word2Vec
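One common way to use Word2Vec for sentiment classification is to average the vectors of a document's words into a single document vector and feed that to a classifier. The sketch below assumes hand-made 3-dimensional toy vectors in place of a trained Word2Vec model and uses a simple nearest-centroid classifier (cosine similarity) instead of a learned one; both choices are illustrative assumptions, not the linked article's method.

```python
import math

# ASSUMPTION: toy hand-made "word vectors" standing in for vectors
# that a trained Word2Vec model would normally provide.
TOY_VECTORS = {
    "great": [0.9, 0.1, 0.0],
    "love": [0.8, 0.2, 0.1],
    "excellent": [0.95, 0.0, 0.1],
    "awful": [-0.9, 0.1, 0.0],
    "hate": [-0.8, 0.2, 0.1],
    "boring": [-0.7, 0.0, 0.2],
    "movie": [0.0, 0.9, 0.1],
    "plot": [0.0, 0.8, 0.3],
}
DIM = 3


def doc_vector(tokens, vectors=TOY_VECTORS, dim=DIM):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def train_centroids(labeled_docs):
    """labeled_docs: list of (tokens, label). Returns label -> mean document vector."""
    by_label = {}
    for tokens, label in labeled_docs:
        by_label.setdefault(label, []).append(doc_vector(tokens))
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in by_label.items()
    }


def predict(tokens, centroids):
    """Assign the label whose centroid is most similar to the document vector."""
    v = doc_vector(tokens)
    return max(centroids, key=lambda label: cosine(v, centroids[label]))
```

In a real setup the averaged vectors would usually go into a proper classifier such as logistic regression; nearest centroid just keeps the sketch dependency-free.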
 
 
Detecting text language with Python and NLTK
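A simple approach to language detection with NLTK is to count how many of a text's words appear in each language's stopword list and pick the language with the most hits. This self-contained sketch assumes tiny hand-picked stopword samples so it runs without downloads; with NLTK installed you would build the sets from nltk.corpus.stopwords.fileids() and nltk.corpus.stopwords.words(lang) instead.

```python
import re

# ASSUMPTION: small hand-picked stopword samples. With NLTK you would use
# set(nltk.corpus.stopwords.words(lang)) for each lang in stopwords.fileids().
STOPWORDS = {
    "english": {"the", "and", "is", "of", "to", "in", "it", "that", "this", "with"},
    "spanish": {"el", "la", "de", "que", "y", "en", "los", "es", "por", "con"},
    "german": {"der", "die", "und", "das", "ist", "nicht", "ich", "zu", "mit", "ein"},
}


def detect_language(text, stopwords=STOPWORDS):
    """Return the language whose stopwords overlap most with the text's words."""
    words = set(re.findall(r"\w+", text.lower()))
    scores = {lang: len(words & sw) for lang, sw in stopwords.items()}
    # On a tie, max() keeps the first language in dictionary order.
    return max(scores, key=scores.get)
```

Stopwords work well here because they are frequent in almost any text yet largely specific to one language; very short inputs may contain none and then fall back to the first language listed.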
 
 
Categorizing text with n-grams (Python)
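One well-known n-gram categorization scheme (Cavnar and Trenkle's out-of-place measure) ranks each text's most frequent character n-grams and assigns a document to the category whose ranking differs least. A minimal sketch, assuming n-grams of length 1 to 3, underscore padding around words and a 300-entry profile cap; it is not necessarily the approach the linked article uses.

```python
from collections import Counter


def ngram_profile(text, n_max=3, top_k=300):
    """Map each of the top_k most frequent character n-grams (1..n_max) to its rank."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries so prefixes/suffixes get their own n-grams
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    ranked = [gram for gram, _ in counts.most_common(top_k)]
    return {gram: rank for rank, gram in enumerate(ranked)}


def out_of_place(doc_profile, cat_profile):
    """Sum of rank differences; n-grams missing from the category get the maximum penalty."""
    max_penalty = len(cat_profile)
    return sum(
        abs(rank - cat_profile[gram]) if gram in cat_profile else max_penalty
        for gram, rank in doc_profile.items()
    )


def categorize(text, category_profiles):
    """Return the category whose profile has the smallest out-of-place distance."""
    doc = ngram_profile(text)
    return min(category_profiles, key=lambda c: out_of_place(doc, category_profiles[c]))
```

Category profiles are built once from training text and reused for every document; in practice each category needs far more text than a few sentences for the rankings to be meaningful.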