Enhanced doc
parent 18fab1c6f9
commit 79513f9e57
README.md (12 lines changed)
@@ -13,7 +13,7 @@ A word frequency tool that outputs sorted results in csv format, supporting stop

 python wordocc.py a_interesting_text.txt

-Outputs such content in wordocc.csv :
+Outputs such content in _wordocc.csv_ in the current directory :

 top,43
 image,31
@@ -30,22 +30,20 @@ Outputs such content in wordocc.csv :
 -s STOP_WORDS, --stop-words=STOP_WORDS
 path to stop word file
 -o OUTPUT, --output=OUTPUT
-csv output filename
+csv output filename (default: wordocc.csv)
 -e ENCODING, --encoding=ENCODING
 file encoding (default: utf-8)


 ## Stop words

-### Introduction
-
 Stop words are words that are not interesting for the statistic study, like articles, conjunctions, etc ...

-You have to provide a file containing those words (one per line). Following files can help :
+You can provide a file containing those words (one per line). Following files can help :

 - English : http://snowball.tartarus.org/algorithms/english/stop.txt
 - French :http://snowball.tartarus.org/algorithms/french/stop.txt

-### Usage
+Use -s option to specify the file path :

-python wordocc.py -e /home/jdoe/en/stop.txt a_interesting_text.txt
+python wordocc.py -s /home/jdoe/en/stop.txt a_interesting_text.txt
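
The README describes the stop word file (one word per line) and the sorted `word,count` csv output, but this diff does not show how wordocc.py implements either. The following is only a minimal sketch of that behaviour under stated assumptions. The function names `load_stop_words`, `count_words` and `write_csv` are hypothetical and not taken from wordocc.py. Treating text after `|` as a comment is an assumption based on the Snowball list format. The tokenization (lowercased `\w+` runs) is also a guess.

```python
import csv
import io
import re
from collections import Counter


def load_stop_words(lines):
    """Build a stop word set from an iterable of lines (one word per line)."""
    words = set()
    for line in lines:
        # Assumption: anything after '|' is a comment, as in the Snowball lists.
        word = line.split("|", 1)[0].strip().lower()
        if word:
            words.add(word)
    return words


def count_words(text, stop_words=frozenset()):
    """Count lowercased word occurrences, skipping stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)


def write_csv(counts, out):
    """Write word,count rows, most frequent first (ties sorted by word)."""
    writer = csv.writer(out, lineterminator="\n")
    for word, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        writer.writerow([word, n])


if __name__ == "__main__":
    stop = load_stop_words(["the\n", " | a comment line\n", "a  | article\n"])
    buf = io.StringIO()
    write_csv(count_words("The top image, the top of a top", stop), buf)
    print(buf.getvalue(), end="")
```

When no stop word file is given, `count_words` can be called with its default empty set, which matches the README's change from "You have to provide" to "You can provide".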