Mutah 79513f9e57 Enhanced doc 4 years ago

WordOcc

A word frequency tool that writes sorted results in CSV format, with support for stop words.

Requirements

Usage

Basic

python wordocc.py a_interesting_text.txt

This writes content like the following to wordocc.csv in the current directory:

top,43
image,31
sample,29
...
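
To illustrate how output of this shape can be produced, here is a minimal sketch of counting words and writing them as CSV rows, most frequent first. It is not taken from wordocc.py: the function names `count_words` and `write_csv`, the `\w+` tokenization, and the lowercasing are all assumptions made for the example.

```python
import csv
import io
import re
from collections import Counter


def count_words(text):
    """Return a Counter of lowercased words found in text (hypothetical helper)."""
    # Assumption: a "word" is a run of letters, digits or underscores.
    return Counter(re.findall(r"\w+", text.lower()))


def write_csv(counts, stream):
    """Write word,count rows to stream, most frequent first (hypothetical helper)."""
    writer = csv.writer(stream, lineterminator="\n")
    # Sort by descending count, then alphabetically so ties are deterministic.
    for word, n in sorted(counts.items(), key=lambda item: (-item[1], item[0])):
        writer.writerow([word, n])


if __name__ == "__main__":
    sample = "top image top sample top image"
    out = io.StringIO()
    write_csv(count_words(sample), out)
    print(out.getvalue(), end="")
```

With a real input, the stream would be a file opened for writing (for instance `open("wordocc.csv", "w", newline="", encoding="utf-8")`) rather than an in-memory buffer.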

Options

wordocc.py -h
Usage: wordocc.py [options] FILE

Options:
  -h, --help            show this help message and exit
  -s STOP_WORDS, --stop-words=STOP_WORDS
                        path to stop word file
  -o OUTPUT, --output=OUTPUT
                        csv output filename (default: wordocc.csv)
  -e ENCODING, --encoding=ENCODING
                        file encoding (default: utf-8)
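
The help text above resembles what Python's standard `optparse` module prints. Whether wordocc.py actually uses `optparse` is an assumption; the sketch below only shows how the same three options and their documented defaults could be declared. The function name `build_parser` is hypothetical.

```python
from optparse import OptionParser


def build_parser():
    """Declare the options listed in the help text (hypothetical helper)."""
    parser = OptionParser(usage="%prog [options] FILE", prog="wordocc.py")
    # No default: the stop word file is only used when explicitly given.
    parser.add_option("-s", "--stop-words", dest="stop_words",
                      help="path to stop word file")
    parser.add_option("-o", "--output", dest="output", default="wordocc.csv",
                      help="csv output filename (default: %default)")
    parser.add_option("-e", "--encoding", dest="encoding", default="utf-8",
                      help="file encoding (default: %default)")
    return parser


if __name__ == "__main__":
    # Parse a fixed argument list instead of sys.argv for demonstration.
    options, args = build_parser().parse_args(
        ["-s", "stop.txt", "a_interesting_text.txt"])
    print(options.stop_words, options.output, options.encoding, args)
```

Positional arguments (here the input FILE) come back in `args`, while the option values are attributes of `options`.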

Stop words

Stop words are words that are of little interest for a statistical study, such as articles and conjunctions.

You can provide a file containing those words (one per line). The following files can help:

Use the -s option to specify the file path:

python wordocc.py -s /home/jdoe/en/stop.txt a_interesting_text.txt
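
As a rough illustration of what stop word filtering amounts to, the sketch below reads a one-word-per-line list and drops those words from a set of counts. It is not the tool's own code: `load_stop_words` and `remove_stop_words` are hypothetical names, case-insensitive matching is an assumption, and the counts for "the" and "and" are invented for the example.

```python
import io
from collections import Counter


def load_stop_words(stream):
    """Read one stop word per line, skipping blank lines (hypothetical helper)."""
    # Assumption: matching is case-insensitive, so words are lowercased.
    return {line.strip().lower() for line in stream if line.strip()}


def remove_stop_words(counts, stop_words):
    """Return a new Counter without the entries listed in stop_words."""
    return Counter({w: n for w, n in counts.items() if w not in stop_words})


# In-memory stand-in for a stop word file with one word per line.
stop_file = io.StringIO("the\nand\n\nof\n")
# Illustrative counts; "the" and "and" are made-up values.
counts = Counter({"the": 50, "top": 43, "and": 40, "image": 31})

filtered = remove_stop_words(counts, load_stop_words(stop_file))
print(filtered.most_common())  # [('top', 43), ('image', 31)]
```

For a file on disk, the stream would come from something like `open(path, encoding="utf-8")`, with the encoding matching the one given for the input text.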