pysentiment2 package

Submodules

pysentiment2.base module

This module contains base classes for dictionaries.

class pysentiment2.base.BaseDict(tokenizer=None)[source]

Bases: object

A base class for sentiment analysis. For now, only ‘positive’ and ‘negative’ analysis is supported.

Subclasses should implement init_dict, in which _posset and _negset are initialized.

Polarity and Subjectivity are calculated in the same way as in the Lydia system. See also http://www.cs.sunysb.edu/~skiena/lydia/

The formula for Polarity is:

\[Polarity= \frac{N_{pos}-N_{neg}}{N_{pos}+N_{neg}}\]

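The formula for Subjectivity is:

\[Subjectivity= \frac{N_{pos}+N_{neg}}{N}\]

Here \(N_{pos}\) and \(N_{neg}\) are the numbers of positive and negative terms, and \(N\) is the total number of terms scored.
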
Parameters

tokenizer (obj) – An object that provides a tokenize method. If it is None, the default tokenizer defined in utils is used.
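
The Polarity and Subjectivity formulas above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: score_terms and its posset/negset arguments are hypothetical names, and adding EPSILON to each denominator (to avoid division by zero) is an assumption about how the documented constant is used.

```python
EPSILON = 1e-06  # same value as the documented BaseDict.EPSILON


def score_terms(terms, posset, negset, epsilon=EPSILON):
    """Hypothetical stand-in for get_score; not the library's code."""
    n_pos = sum(1 for t in terms if t in posset)  # N_pos
    n_neg = sum(1 for t in terms if t in negset)  # N_neg
    # Assumption: epsilon guards both denominators against zero.
    polarity = (n_pos - n_neg) / (n_pos + n_neg + epsilon)
    subjectivity = (n_pos + n_neg) / (len(terms) + epsilon)
    return {
        "Positive": n_pos,
        "Negative": n_neg,
        "Polarity": polarity,
        "Subjectivity": subjectivity,
    }


# Two positive terms, one negative term, four terms in total.
scores = score_terms(
    ["good", "bad", "great", "day"],
    posset={"good", "great"},
    negset={"bad"},
)
```

With these inputs Polarity is close to 1/3 and Subjectivity is close to 3/4. The dictionary keys mirror the documented TAG_POS, TAG_NEG, TAG_POL and TAG_SUB values.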

EPSILON = 1e-06
TAG_NEG = 'Negative'
TAG_POL = 'Polarity'
TAG_POS = 'Positive'
TAG_SUB = 'Subjectivity'
__init__(tokenizer=None)[source]

Initialize self. See help(type(self)) for accurate signature.

get_score(terms)[source]

Get score for a list of terms.

Parameters

terms (list) – A list of terms to be analyzed.

Returns

dict

abstract init_dict()[source]
tokenize(text)[source]
Returns

list

tokenize_first(x)[source]
Returns

str
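
As the BaseDict description says, a subclass supplies init_dict and fills _posset and _negset there. The sketch below shows that pattern with a stand-in base class so that it runs without the package installed. SketchBaseDict and TinyDict are hypothetical names, and calling init_dict from __init__ is an assumption, not something this reference confirms.

```python
from abc import ABC, abstractmethod


class SketchBaseDict(ABC):
    """Hypothetical stand-in for the documented base class."""

    def __init__(self):
        self._posset = set()
        self._negset = set()
        # Assumption: the word sets are loaded at construction time.
        self.init_dict()

    @abstractmethod
    def init_dict(self):
        """Populate self._posset and self._negset."""


class TinyDict(SketchBaseDict):
    """A toy dictionary with hand-picked, illustrative terms."""

    def init_dict(self):
        self._posset = {"gain", "improve"}
        self._negset = {"loss", "decline"}


tiny = TinyDict()
```

Because init_dict is abstract, instantiating the base class directly raises TypeError; only subclasses that define it can be created.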

pysentiment2.hiv4 module

class pysentiment2.hiv4.HIV4(tokenizer=None)[source]

Bases: pysentiment2.base.BaseDict

Dictionary class for Harvard IV-4. See also http://www.wjh.harvard.edu/~inquirer/

The terms for the dictionary are stemmed by the default tokenizer.

PATH = '/home/runner/work/pysentiment/pysentiment/pysentiment2/static/HIV-4.csv'
init_dict()[source]

pysentiment2.lm module

class pysentiment2.lm.LM(tokenizer=None)[source]

Bases: pysentiment2.base.BaseDict

Dictionary class for Loughran and McDonald Financial Sentiment Dictionaries.

See also https://www3.nd.edu/~mcdonald/Word_Lists.html

The terms for the dictionary are stemmed by the default tokenizer.

PATH = '/home/runner/work/pysentiment/pysentiment/pysentiment2/static/LM.csv'
init_dict()[source]

pysentiment2.utils module

This module contains methods to tokenize sentences.

class pysentiment2.utils.BaseTokenizer[source]

Bases: object

An abstract class for tokenizing text.

abstract tokenize(text)[source]

Return tokenized terms.

Returns

list
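
A custom tokenizer only needs a tokenize method that takes text and returns a list. The following is a minimal sketch of that contract using a stand-in abstract class; SketchBaseTokenizer and WhitespaceTokenizer are hypothetical names and are not part of pysentiment2.

```python
from abc import ABC, abstractmethod


class SketchBaseTokenizer(ABC):
    """Hypothetical stand-in mirroring the documented abstract interface."""

    @abstractmethod
    def tokenize(self, text):
        """Return a list of terms extracted from text."""


class WhitespaceTokenizer(SketchBaseTokenizer):
    """Lowercases the text and splits it on whitespace."""

    def tokenize(self, text):
        return text.lower().split()


terms = WhitespaceTokenizer().tokenize("Strong Growth")
```

An object of this shape could plausibly be passed as the tokenizer argument of a dictionary class, since that parameter only requires a tokenize method.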

class pysentiment2.utils.Tokenizer[source]

Bases: pysentiment2.utils.BaseTokenizer

The default tokenizer for pysentiment2, which only keeps words matching the pattern [a-z]+. The output of the tokenizer is stemmed by nltk.PorterStemmer.

The stoplist from https://www3.nd.edu/~mcdonald/Word_Lists.html is included in this tokenizer. Any word in the stoplist will be excluded from the output.
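
The behaviour described above (match [a-z]+, drop stoplist words, stem the rest) can be approximated as follows. This is a sketch under stated assumptions: sketch_tokenize and the three-word STOPSET are hypothetical, lowercasing before matching and filtering before stemming are guesses about the order of operations, and stemming is left as a pluggable callable so the example runs without nltk.

```python
import re

# Hypothetical stoplist for illustration only; the real tokenizer
# ships the Loughran and McDonald stoplist.
STOPSET = {"the", "a", "is"}


def sketch_tokenize(text, stopset=STOPSET, stem=lambda word: word):
    """Approximate the documented tokenizer; not the library's code."""
    # Assumption: text is lowercased so that [a-z]+ matches every word.
    words = re.findall(r"[a-z]+", text.lower())
    # Assumption: stoplist filtering happens before stemming.
    return [stem(w) for w in words if w not in stopset]


tokens = sketch_tokenize("The market is rising, profits up 12%!")
```

Digits and punctuation never match the pattern, so "12%" contributes nothing. If nltk is installed, nltk.stem.PorterStemmer().stem can be passed as the stem argument to get stemmed output.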

__init__()[source]

Initialize self. See help(type(self)) for accurate signature.

get_stopset()[source]
tokenize(text)[source]

Return tokenized terms.

Returns

list