# WordOcc

A word frequency tool that writes its results to a CSV file, sorted by number of occurrences (most frequent first). It supports stop words.

# Requirements

- Python 2.6
- TextBlob: https://textblob.readthedocs.org/en/dev/

# Usage

## Basic

```
python wordocc.py an_interesting_text.txt
```

This writes content such as the following to `wordocc.csv`:

```
top,43
image,31
sample,29
...
```

## Options

```
python wordocc.py -h
Usage: wordocc.py [options] FILE

Options:
  -h, --help            show this help message and exit
  -s STOP_WORDS, --stop-words=STOP_WORDS
                        path to stop word file
  -o OUTPUT, --output=OUTPUT
                        csv output filename
  -e ENCODING, --encoding=ENCODING
                        file encoding (default: utf-8)
```

## Stop words

### Introduction

Stop words are words that carry no useful information for a frequency study, such as articles and conjunctions. To exclude them, provide a file that lists them, one word per line. The following files can help:

- English: http://snowball.tartarus.org/algorithms/english/stop.txt
- French: http://snowball.tartarus.org/algorithms/french/stop.txt

### Usage

Pass the stop word file with `-s` (not `-e`, which sets the file encoding):

```
python wordocc.py -s /home/jdoe/en/stop.txt an_interesting_text.txt
```
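The README does not describe how `wordocc.py` works internally, but the pipeline it implies (read the text, drop stop words, count the remaining words, write `word,count` rows sorted by count) can be sketched with the standard library alone. This is a hypothetical illustration, not the tool's actual code: it targets Python 3 rather than Python 2.6, and a simple `\w+` regex stands in for TextBlob's tokenizer. The function names (`parse_stop_words`, `load_stop_words`, `count_words`, `sorted_counts`, `write_word_csv`) are invented for this example, and treating text after `|` as a comment is an assumption based on the format of the Snowball stop word lists.

```python
import csv
import io
import re
from collections import Counter


def parse_stop_words(lines):
    """Build a set of lower-cased stop words from an iterable of lines."""
    stop_words = set()
    for line in lines:
        # Assumption: anything after "|" is a comment, as in the Snowball lists.
        word = line.split("|", 1)[0].strip().lower()
        if word:  # skip blank and comment-only lines
            stop_words.add(word)
    return stop_words


def load_stop_words(path, encoding="utf-8"):
    """Read a stop word file (one word per line) into a set."""
    with io.open(path, encoding=encoding) as f:
        return parse_stop_words(f)


def count_words(text, stop_words=frozenset()):
    """Count lower-cased word tokens, ignoring stop words."""
    # Hypothetical tokenizer: a \w+ regex stands in for TextBlob here.
    words = re.findall(r"\w+", text.lower())
    return Counter(w for w in words if w not in stop_words)


def sorted_counts(counts):
    """Return (word, count) pairs, most frequent first, ties alphabetical."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def write_word_csv(text_path, csv_path, stop_words=frozenset(), encoding="utf-8"):
    """Read text_path, count its words and write word,count rows to csv_path."""
    with io.open(text_path, encoding=encoding) as f:
        counts = count_words(f.read(), stop_words)
    with io.open(csv_path, "w", newline="", encoding="utf-8") as out:
        csv.writer(out).writerows(sorted_counts(counts))
```

For example, `write_word_csv("an_interesting_text.txt", "wordocc.csv", load_stop_words("/home/jdoe/en/stop.txt"))` would roughly mirror the stop words command above. Breaking ties alphabetically is a choice made here so the output is deterministic; the README does not say how the real tool orders words with equal counts.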