Enhanced doc

Mutah 2015-12-07 15:36:04 +01:00
parent 18fab1c6f9
commit 79513f9e57


@@ -13,7 +13,7 @@ A word frequency tool that outputs sorted results in csv format, supporting stop
 python wordocc.py a_interesting_text.txt
-Outputs such content in wordocc.csv :
+Outputs such content in _wordocc.csv_ in the current directory :
 top,43
 image,31
@@ -30,22 +30,20 @@ Outputs such content in wordocc.csv :
 -s STOP_WORDS, --stop-words=STOP_WORDS
 path to stop word file
 -o OUTPUT, --output=OUTPUT
-csv output filename
+csv output filename (default: wordocc.csv)
 -e ENCODING, --encoding=ENCODING
 file encoding (default: utf-8)
 ## Stop words
-### Introduction
 Stop words are words that are not interesting for the statistic study, like articles, conjunctions, etc ...
-You have to provide a file containing those words (one per line). Following files can help :
+You can provide a file containing those words (one per line). Following files can help :
 - English : http://snowball.tartarus.org/algorithms/english/stop.txt
 - French :http://snowball.tartarus.org/algorithms/french/stop.txt
-### Usage
+Use -s option to specify the file path :
-python wordocc.py -e /home/jdoe/en/stop.txt a_interesting_text.txt
+python wordocc.py -s /home/jdoe/en/stop.txt a_interesting_text.txt
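
The source of wordocc.py is not part of this commit, so the following is only a minimal sketch of how a stop word file (one word per line) might be applied before counting. The function names `load_stop_words`, `count_words` and `to_csv` are hypothetical, and treating text after `|` as a comment is an assumption made so that lists in the Snowball style can be read; the real tool may tokenize and filter differently.

```python
import csv
import io
import re
from collections import Counter


def load_stop_words(lines):
    """Build a stop word set from an iterable of lines (one word per line)."""
    words = set()
    for line in lines:
        # Assumption: anything after '|' is a comment and is ignored.
        word = line.split("|", 1)[0].strip().lower()
        if word:
            words.add(word)
    return words


def count_words(text, stop_words=frozenset()):
    """Count lowercase word occurrences, skipping stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)


def to_csv(counts):
    """Render counts as 'word,count' rows, most frequent first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(counts.most_common())
    return buffer.getvalue()
```

With a stop word set containing "the", the text "The top image, the top" would yield the rows `top,2` and `image,1`, in the same word,count layout as the sample output shown in the README.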