pysentiment2 package¶
Submodules¶
pysentiment2.base module¶
This module contains base classes for dictionaries.
-
class pysentiment2.base.BaseDict(tokenizer=None)[source]¶
Bases: object
A base class for sentiment analysis. For now, only ‘positive’ and ‘negative’ analysis is supported.
Subclasses should implement init_dict, in which _posset and _negset are initialized. Polarity and Subjectivity are calculated in the same way as in the Lydia system. See also http://www.cs.sunysb.edu/~skiena/lydia/
The formula for Polarity is:
\[Polarity = \frac{N_{pos}-N_{neg}}{N_{pos}+N_{neg}}\]
The formula for Subjectivity is:
\[Subjectivity = \frac{N_{pos}+N_{neg}}{N}\]
- Parameters
tokenizer¶ (obj) – An object which provides the interface of tokenize. If it is None, a default tokenizer, which is defined in utils, will be assigned.
-
EPSILON = 1e-06¶
-
TAG_NEG = 'Negative'¶
-
TAG_POL = 'Polarity'¶
-
TAG_POS = 'Positive'¶
-
TAG_SUB = 'Subjectivity'¶
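The two formulas above can be sketched directly in Python. This is a minimal sketch, not the class's actual implementation; in particular, using EPSILON to guard against a zero denominator is an assumption about how the constant is applied:

```python
# Sketch of the Polarity and Subjectivity formulas above.
# EPSILON guards against a zero denominator; whether BaseDict
# applies it exactly this way is an assumption.
EPSILON = 1e-06

def polarity(n_pos, n_neg):
    """(N_pos - N_neg) / (N_pos + N_neg)."""
    return (n_pos - n_neg) / (n_pos + n_neg + EPSILON)

def subjectivity(n_pos, n_neg, n_total):
    """(N_pos + N_neg) / N, where N is the total token count."""
    return (n_pos + n_neg) / (n_total + EPSILON)

print(polarity(6, 2))        # close to 0.5
print(subjectivity(6, 2, 16))  # close to 0.5
```

With the epsilon guard, a text containing no dictionary words yields a polarity of 0.0 instead of raising ZeroDivisionError.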
pysentiment2.hiv4 module¶
-
class pysentiment2.hiv4.HIV4(tokenizer=None)[source]¶
Bases: pysentiment2.base.BaseDict
Dictionary class for Harvard IV-4. See also http://www.wjh.harvard.edu/~inquirer/
The terms for the dictionary are stemmed by the default tokenizer.
-
PATH = '/home/runner/work/pysentiment/pysentiment/pysentiment2/static/HIV-4.csv'¶
pysentiment2.lm module¶
-
class pysentiment2.lm.LM(tokenizer=None)[source]¶
Bases: pysentiment2.base.BaseDict
Dictionary class for the Loughran and McDonald Financial Sentiment Dictionaries.
See also https://www3.nd.edu/~mcdonald/Word_Lists.html
The terms for the dictionary are stemmed by the default tokenizer.
-
PATH = '/home/runner/work/pysentiment/pysentiment/pysentiment2/static/LM.csv'¶
pysentiment2.utils module¶
This module contains methods to tokenize sentences.
-
class pysentiment2.utils.Tokenizer[source]¶
Bases: pysentiment2.utils.BaseTokenizer
The default tokenizer for pysentiment2, which only takes care of words made up of [a-z]+. The output of the tokenizer is stemmed by nltk.PorterStemmer.
The stoplist from https://www3.nd.edu/~mcdonald/Word_Lists.html is included in this tokenizer. Any word in the stoplist will be excluded from the output.
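The tokenization described above can be sketched with only the standard library. This is an illustration, not the real Tokenizer: the stoplist here is a tiny stand-in for the full McDonald stoplist, and the stemming step via nltk.PorterStemmer is omitted:

```python
import re

# Tiny stand-in for the full McDonald stoplist (illustrative assumption).
STOPLIST = {"a", "the", "of", "and", "is"}

def tokenize(text):
    """Keep only lowercase [a-z]+ runs, minus stoplist words.

    The real Tokenizer additionally stems each word with
    nltk.PorterStemmer before returning it.
    """
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPLIST]

print(tokenize("The price of Oil is up 3% and rising!"))
# → ['price', 'oil', 'up', 'rising']
```

Note that lowercasing before matching [a-z]+ means mixed-case words like "Oil" survive, while digits and punctuation are dropped entirely.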