#WordOcc
A word frequency tool that writes its results, sorted by count, to a CSV file, with optional stop word filtering.
Requirements
- Python 2.6
- TextBlob https://textblob.readthedocs.org/en/dev/
 
Usage
Basic
python wordocc.py a_interesting_text.txt
This writes output like the following to wordocc.csv in the current directory:
top,43
image,31
sample,29
...
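
The output format above can be reproduced with a minimal sketch like the one below. This is not the actual wordocc.py code: it assumes a simple regex tokenizer in place of TextBlob, targets Python 3 rather than 2.6, and the helper names `count_words` and `write_csv` are hypothetical.

```python
import csv
import io
import re
from collections import Counter


def count_words(text):
    """Return (word, count) pairs sorted by descending count."""
    # Assumption: lowercase the text and treat runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common()


def write_csv(pairs, stream):
    """Write one 'word,count' row per pair to a text stream."""
    writer = csv.writer(stream, lineterminator="\n")
    for word, count in pairs:
        writer.writerow([word, count])


pairs = count_words("Top image, top sample. Top image!")
buffer = io.StringIO()
write_csv(pairs, buffer)
print(buffer.getvalue())
```

Writing to a stream rather than a fixed path keeps the sketch easy to try without touching the file system; passing an open file object instead of `io.StringIO` would produce a real CSV file.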
Options
wordocc.py -h
Usage: wordocc.py [options] FILE
Options:
  -h, --help            show this help message and exit
  -s STOP_WORDS, --stop-words=STOP_WORDS
                        path to stop word file
  -o OUTPUT, --output=OUTPUT
                        csv output filename (default: wordocc.csv)
  -e ENCODING, --encoding=ENCODING
                        file encoding (default: utf-8)
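
For readers who want to see how such options could be declared, here is a hedged sketch using the standard library's `optparse` module. The layout of the help text suggests `optparse`, but that is an assumption, as are the `dest` names and the `build_parser` helper; the real script may be organized differently.

```python
from optparse import OptionParser


def build_parser():
    """Build a parser mirroring the options listed in the help text."""
    parser = OptionParser(usage="usage: %prog [options] FILE")
    parser.add_option("-s", "--stop-words", dest="stop_words",
                      help="path to stop word file")
    parser.add_option("-o", "--output", dest="output", default="wordocc.csv",
                      help="csv output filename (default: %default)")
    parser.add_option("-e", "--encoding", dest="encoding", default="utf-8",
                      help="file encoding (default: %default)")
    return parser


parser = build_parser()
# Parse an explicit argument list instead of sys.argv so the sketch is repeatable.
options, args = parser.parse_args(["-s", "stop.txt", "text.txt"])
print(options.stop_words, options.output, options.encoding, args)
```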
Stop words
Stop words are words that are of little interest for statistical analysis, such as articles and conjunctions.
You can provide a file containing those words (one per line). The following files can help:
- English: http://snowball.tartarus.org/algorithms/english/stop.txt
- French: http://snowball.tartarus.org/algorithms/french/stop.txt
 
Use the -s option to specify the file path:
python wordocc.py -s /home/jdoe/en/stop.txt a_interesting_text.txt
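
As an illustration of how a one-word-per-line stop word file can be applied, here is a minimal Python 3 sketch. It is not taken from wordocc.py: the function names `load_stop_words` and `remove_stop_words` are hypothetical, and the sketch assumes that any text after a `|` character on a line is a comment to be ignored.

```python
import io


def load_stop_words(stream):
    """Read one stop word per line from a text stream into a set."""
    stop_words = set()
    for line in stream:
        # Assumption: anything after '|' is a comment and is ignored.
        word = line.split("|", 1)[0].strip().lower()
        if word:
            stop_words.add(word)
    return stop_words


def remove_stop_words(words, stop_words):
    """Keep only the words that are not in the stop word set."""
    return [w for w in words if w.lower() not in stop_words]


# An in-memory stand-in for a stop word file, including a blank and a comment line.
stop_file = io.StringIO("the\nand | conjunction\n\n| a comment line\nof\n")
stop_words = load_stop_words(stop_file)
filtered = remove_stop_words(["the", "top", "of", "image", "and", "sample"], stop_words)
print(filtered)
```

Storing the stop words in a set keeps each membership check cheap, which matters when filtering every word of a long text.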