Enhanced doc

Mutah 2015-12-07 15:36:04 +01:00
parent 18fab1c6f9
commit 79513f9e57


@@ -13,7 +13,7 @@ A word frequency tool that outputs sorted results in csv format, supporting stop
python wordocc.py a_interesting_text.txt
-Outputs such content in wordocc.csv :
+Outputs such content in _wordocc.csv_ in the current directory :
top,43
image,31
@@ -30,22 +30,20 @@ Outputs such content in wordocc.csv :
-s STOP_WORDS, --stop-words=STOP_WORDS
path to stop word file
-o OUTPUT, --output=OUTPUT
-csv output filename
+csv output filename (default: wordocc.csv)
-e ENCODING, --encoding=ENCODING
file encoding (default: utf-8)
## Stop words
### Introduction
Stop words are words that are not interesting for the statistical study, such as articles, conjunctions, etc.
-You have to provide a file containing those words (one per line). Following files can help :
+You can provide a file containing those words (one per line). Following files can help :
- English : http://snowball.tartarus.org/algorithms/english/stop.txt
- French : http://snowball.tartarus.org/algorithms/french/stop.txt
### Usage
Use the -s option to specify the file path :
-python wordocc.py -e /home/jdoe/en/stop.txt a_interesting_text.txt
+python wordocc.py -s /home/jdoe/en/stop.txt a_interesting_text.txt
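
For readers who want to see what the documented behaviour amounts to, here is a minimal sketch of stop-word filtering followed by a sorted `word,count` CSV. It is not taken from wordocc.py: the function names `load_stop_words`, `count_words` and `write_csv` are hypothetical, the tokenisation rule (`\w+` on lower-cased text) is an assumption, and treating text after `|` as a comment is an assumption based on the Snowball stop lists linked above.

```python
import csv
import io
import re
from collections import Counter


def load_stop_words(lines):
    """Build a stop-word set from an iterable of lines (one word per line)."""
    words = set()
    for line in lines:
        # Assumption: anything after '|' is a comment, as in the Snowball lists.
        word = line.split("|", 1)[0].strip().lower()
        if word:
            words.add(word)
    return words


def count_words(text, stop_words=frozenset()):
    """Count lower-cased words in text, skipping stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)


def write_csv(counts, stream):
    """Write word,count rows, most frequent first (ties in alphabetical order)."""
    writer = csv.writer(stream, lineterminator="\n")
    for word, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        writer.writerow([word, n])


if __name__ == "__main__":
    stop = load_stop_words(["the | article", "a", "", "and"])
    out = io.StringIO()
    write_csv(count_words("The top image and the top of a page", stop), out)
    print(out.getvalue(), end="")
```

With a real stop-word file, the lines would come from something like `open(path, encoding="utf-8")`, matching the default given for the -e option.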