VADER Sentiment Analysis. VADER is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains.

Overview

VADER-Sentiment-Analysis

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media. It is fully open-sourced under the [MIT License] (we sincerely appreciate all attributions and readily accept most contributions, but please don't hold us liable).

Features and Updates

Many thanks to George Berry, Ewan Klein, and Pierpaolo Pantone for key contributions to make VADER better. The new updates include capabilities regarding:

  1. Refactoring for Python 3 compatibility, improved modularity, and incorporation into [NLTK] ...many thanks to Ewan & Pierpaolo.

  2. Restructuring for much improved speed/performance, reducing the time complexity from something like O(N^4) to O(N)...many thanks to George.

  3. Simplified pip install and better support for vaderSentiment module and component import. (Dependency on vader_lexicon.txt file now uses automated file location discovery so you don't need to manually designate its location in the code, or copy the file into your executing code's directory.)

  4. More complete demo in the __main__ for vaderSentiment.py. The demo has:

    • examples of typical use cases for sentiment analysis, including proper handling of sentences with:

      • typical negations (e.g., "not good")
      • use of contractions as negations (e.g., "wasn't very good")
      • conventional use of punctuation to signal increased sentiment intensity (e.g., "Good!!!")
      • conventional use of word-shape to signal emphasis (e.g., using ALL CAPS for words/phrases)
      • using degree modifiers to alter sentiment intensity (e.g., intensity boosters such as "very" and intensity dampeners such as "kind of")
      • understanding many sentiment-laden slang words (e.g., 'sux')
      • understanding many sentiment-laden slang words as modifiers such as 'uber' or 'friggin' or 'kinda'
      • understanding many sentiment-laden emoticons such as :) and :D
      • translating utf-8 encoded emojis such as 💘 and 💋 and 😁
      • understanding sentiment-laden initialisms and acronyms (for example: 'lol')
    • more examples of tricky sentences that confuse other sentiment analysis tools

    • example for how VADER can work in conjunction with NLTK to do sentiment analysis on longer texts...i.e., decomposing paragraphs, articles/reports/publications, or novels into sentence-level analyses

    • examples of a concept for assessing the sentiment of images, video, or other tagged multimedia content

    • if you have access to the Internet, the demo has an example of how VADER can work with analyzing sentiment of texts in other languages (non-English text sentences).

Introduction

This README file describes the dataset of the paper:

VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text
(by C.J. Hutto and Eric Gilbert)
Eighth International Conference on Weblogs and Social Media (ICWSM-14). Ann Arbor, MI, June 2014.
For questions, please contact:
C.J. Hutto
Georgia Institute of Technology, Atlanta, GA 30032
cjhutto [at] gatech [dot] edu

Citation Information

If you use either the dataset or any of the VADER sentiment analysis tools (VADER sentiment lexicon or Python code for rule-based sentiment analysis engine) in your research, please cite the above paper. For example:

Hutto, C.J. & Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. Eighth International Conference on Weblogs and Social Media (ICWSM-14). Ann Arbor, MI, June 2014.

Installation

There are a couple of ways to install and use VADER sentiment:

  1. The simplest is to use the command line to do an installation from [PyPI] using pip, e.g.,
    > pip install vaderSentiment
  2. Or, you might already have VADER and simply need to upgrade to the latest version, e.g.,
    > pip install --upgrade vaderSentiment
  3. You could also clone this [GitHub repository]
  4. You could download and unzip the [full master branch zip file]

In addition to the VADER sentiment analysis Python module, options 3 or 4 will also download all the additional resources and datasets (described below).

Resources and Dataset Descriptions

The package here includes PRIMARY RESOURCES (items 1-3) as well as additional DATASETS AND TESTING RESOURCES (items 4-12):

  1. vader_icwsm2014_final.pdf

    The original paper for the data set, see citation information (above).

  2. vader_lexicon.txt
    FORMAT: the file is tab delimited with TOKEN, MEAN-SENTIMENT-RATING, STANDARD DEVIATION, and RAW-HUMAN-SENTIMENT-RATINGS

    NOTE: The current algorithm makes immediate use of the first two elements (token and mean valence). The final two elements (SD and raw ratings) are provided for rigor. For example, if you want to follow the same rigorous process that we used for the study, you should find 10 independent humans to evaluate/rate each new token you want to add to the lexicon, make sure the standard deviation doesn't exceed 2.5, and take the average rating for the valence. This will keep the file consistent.

    DESCRIPTION: Empirically validated by multiple independent human judges, VADER incorporates a "gold-standard" sentiment lexicon that is especially attuned to microblog-like contexts.

    The VADER sentiment lexicon is sensitive to both the polarity and the intensity of sentiments expressed in social media contexts, and is also generally applicable to sentiment analysis in other domains.

    Sentiment ratings from 10 independent human raters (all pre-screened, trained, and quality checked for optimal inter-rater reliability). Over 9,000 token features were rated on a scale from "[–4] Extremely Negative" to "[4] Extremely Positive", with allowance for "[0] Neutral (or Neither, N/A)". We kept every lexical feature that had a non-zero mean rating, and whose standard deviation was less than 2.5 as determined by the aggregate of those ten independent raters. This left us with just over 7,500 lexical features with validated valence scores that indicated both the sentiment polarity (positive/negative), and the sentiment intensity on a scale from –4 to +4. For example, the word "okay" has a positive valence of 0.9, "good" is 1.9, and "great" is 3.1, whereas "horrible" is –2.5, the frowning emoticon :( is –2.2, and "sucks" and its slang derivative "sux" are both –1.5.

    Manually creating (much less, validating) a comprehensive sentiment lexicon is a labor-intensive and sometimes error-prone process, so it is no wonder that many opinion mining researchers and practitioners rely so heavily on existing lexicons as primary resources. We are pleased to offer ours as a new resource. We began by constructing a list inspired by examining existing well-established sentiment word-banks (LIWC, ANEW, and GI). To this, we next added numerous lexical features common to sentiment expression in microblogs, including:

    • a full list of Western-style emoticons, for example, :-) denotes a smiley face and generally indicates positive sentiment
    • sentiment-related acronyms and initialisms (e.g., LOL and WTF are both examples of sentiment-laden initialisms)
    • commonly used slang with sentiment value (e.g., nah, meh and giggly).

    We empirically confirmed the general applicability of each feature candidate to sentiment expressions using a wisdom-of-the-crowd (WotC) approach (Surowiecki, 2004) to acquire a valid point estimate for the sentiment valence (polarity & intensity) of each context-free candidate feature.
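The acceptance rule described above for lexicon entries (a non-zero mean rating and a standard deviation below 2.5, computed from roughly ten independent ratings) can be sketched in a few lines of Python. This is only an illustration, not the authors' tooling: the function names and the sample ratings are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev

MAX_SD = 2.5  # cutoff quoted in the lexicon description above


def summarize_ratings(raw_ratings):
    """Return (mean, standard deviation) for one token's raw human ratings."""
    # Assumption: population standard deviation; the README does not specify.
    return mean(raw_ratings), pstdev(raw_ratings)


def is_acceptable(raw_ratings, max_sd=MAX_SD):
    """Keep a token only if its mean is non-zero and raters agree closely enough."""
    avg, sd = summarize_ratings(raw_ratings)
    return avg != 0 and sd < max_sd


def format_lexicon_line(token, raw_ratings):
    """Build a tab-delimited line: TOKEN, MEAN, STANDARD DEVIATION, RAW RATINGS."""
    avg, sd = summarize_ratings(raw_ratings)
    return "\t".join([token, str(round(avg, 1)), str(round(sd, 5)), str(list(raw_ratings))])


# Hypothetical ratings from ten raters on the -4..+4 scale.
hypothetical_ratings = [2, 2, 2, 2, 2, 2, 2, 2, 2, 1]
if is_acceptable(hypothetical_ratings):
    print(format_lexicon_line("newtoken", hypothetical_ratings))
```

Ratings that split evenly between strongly positive and strongly negative average out near zero or exceed the cutoff, so such a token would be rejected under this sketch.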

  3. vaderSentiment.py

    The Python code for the rule-based sentiment analysis engine. Implements the grammatical and syntactical rules described in the paper, incorporating empirically derived quantifications for the impact of each rule on the perceived intensity of sentiment in sentence-level text. Importantly, these heuristics go beyond what would normally be captured in a typical bag-of-words model. They incorporate word-order sensitive relationships between terms. For example, degree modifiers (also called intensifiers, booster words, or degree adverbs) impact sentiment intensity by either increasing or decreasing the intensity. Consider these examples:

    1. "The service here is extremely good"
    2. "The service here is good"
    3. "The service here is marginally good"

    From Table 3 in the paper, we see that for 95% of the data, using a degree modifier increases the positive sentiment intensity of example 1 (relative to example 2) by 0.227 to 0.36, with a mean difference of 0.293 on a rating scale from 1 to 4. Likewise, example 3 reduces the perceived sentiment intensity by 0.293, on average.
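To make the degree-modifier idea concrete, here is a deliberately simplified sketch. It assumes the 0.293 mean difference quoted above is applied directly to a word's lexicon valence; that mapping, the function name, and the direction flag are illustrative assumptions, not a description of the engine's actual code.

```python
BOOST = 0.293  # mean difference quoted from Table 3 of the paper


def apply_degree_modifier(valence, direction):
    """Adjust a word's valence for a preceding degree modifier.

    direction: +1 for a booster ("extremely"), -1 for a dampener ("marginally"),
    0 when no modifier is present.
    """
    if direction == 0 or valence == 0:
        return valence
    adjustment = BOOST * direction
    # Boosters push the valence away from zero and dampeners pull it toward zero,
    # so the sign of the adjustment follows the sign of the word itself.
    return valence + adjustment if valence > 0 else valence - adjustment


good = 1.9  # lexicon valence of "good" quoted earlier in this README
print(apply_degree_modifier(good, +1))  # "extremely good"
print(apply_degree_modifier(good, 0))   # "good"
print(apply_degree_modifier(good, -1))  # "marginally good"
```

Because the adjustment depends on the word that follows the modifier, this kind of rule is order-sensitive in a way a plain bag-of-words sum is not.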

  4. tweets_GroundTruth.txt

    FORMAT: the file is tab delimited with ID, MEAN-SENTIMENT-RATING, and TWEET-TEXT

    DESCRIPTION: includes "tweet-like" text as inspired by 4,000 tweets pulled from Twitter's public timeline, plus 200 completely contrived tweet-like texts intended to specifically test syntactical and grammatical conventions of conveying differences in sentiment intensity. The "tweet-like" texts incorporate a fictitious username (@anonymous) in places where a username might typically appear, along with a fake URL (http://url_removed) in places where a URL might typically appear, as inspired by the original tweets. The ID and MEAN-SENTIMENT-RATING correspond to the raw sentiment rating data provided in 'tweets_anonDataRatings.txt' (described below).
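The three-column, tab-delimited layout described above (ID, MEAN-SENTIMENT-RATING, text) can be read with the standard csv module. The sketch below is one possible reader, not part of the package; the function name and the in-memory sample rows are hypothetical.

```python
import csv
import io


def read_ground_truth(handle):
    """Parse tab-delimited rows of ID, MEAN-SENTIMENT-RATING, TEXT."""
    # QUOTE_NONE keeps quotation marks inside tweets and snippets as literal text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for record in reader:
        if len(record) < 3:
            continue  # skip blank or malformed lines
        item_id = record[0]
        mean_rating = float(record[1])
        text = "\t".join(record[2:])  # re-join in case the text itself held a tab
        rows.append((item_id, mean_rating, text))
    return rows


# Hypothetical sample standing in for a real ground-truth file.
sample = io.StringIO(
    '1\t2.7\t@anonymous this is "great" http://url_removed\n'
    "2\t-1.4\tmeh, not for me\n"
)
for item_id, mean_rating, text in read_ground_truth(sample):
    print(item_id, mean_rating, text)
```

For a file on disk, the same function could be given a handle opened with open(path, encoding="utf-8", newline=""); the UTF-8 encoding is an assumption about the data files.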

  5. tweets_anonDataRatings.txt

    FORMAT: the file is tab delimited with ID, MEAN-SENTIMENT-RATING, STANDARD DEVIATION, and RAW-SENTIMENT-RATINGS

    DESCRIPTION: Sentiment ratings from a minimum of 20 independent human raters (all pre-screened, trained, and quality checked for optimal inter-rater reliability).

  6. nytEditorialSnippets_GroundTruth.txt

    FORMAT: the file is tab delimited with ID, MEAN-SENTIMENT-RATING, and TEXT-SNIPPET

    DESCRIPTION: includes 5,190 sentence-level snippets from 500 New York Times opinion news editorials/articles; we used the NLTK tokenizer to segment the articles into sentence phrases, and added sentiment intensity ratings. The ID and MEAN-SENTIMENT-RATING correspond to the raw sentiment rating data provided in 'nytEditorialSnippets_anonDataRatings.txt' (described below).

  7. nytEditorialSnippets_anonDataRatings.txt

    FORMAT: the file is tab delimited with ID, MEAN-SENTIMENT-RATING, STANDARD DEVIATION, and RAW-SENTIMENT-RATINGS

    DESCRIPTION: Sentiment ratings from a minimum of 20 independent human raters (all pre-screened, trained, and quality checked for optimal inter-rater reliability).

  8. movieReviewSnippets_GroundTruth.txt

    FORMAT: the file is tab delimited with ID, MEAN-SENTIMENT-RATING, and TEXT-SNIPPET

    DESCRIPTION: includes 10,605 sentence-level snippets from rotten.tomatoes.com. The snippets were derived from an original set of 2000 movie reviews (1000 positive and 1000 negative) in Pang & Lee (2004); we used the NLTK tokenizer to segment the reviews into sentence phrases, and added sentiment intensity ratings. The ID and MEAN-SENTIMENT-RATING correspond to the raw sentiment rating data provided in 'movieReviewSnippets_anonDataRatings.txt' (described below).

  9. movieReviewSnippets_anonDataRatings.txt

    FORMAT: the file is tab delimited with ID, MEAN-SENTIMENT-RATING, STANDARD DEVIATION, and RAW-SENTIMENT-RATINGS

    DESCRIPTION: Sentiment ratings from a minimum of 20 independent human raters (all pre-screened, trained, and quality checked for optimal inter-rater reliability).

  10. amazonReviewSnippets_GroundTruth.txt

    FORMAT: the file is tab delimited with ID, MEAN-SENTIMENT-RATING, and TEXT-SNIPPET

    DESCRIPTION: includes 3,708 sentence-level snippets from 309 customer reviews on 5 different products. The reviews were originally used in Hu & Liu (2004); we added sentiment intensity ratings. The ID and MEAN-SENTIMENT-RATING correspond to the raw sentiment rating data provided in 'amazonReviewSnippets_anonDataRatings.txt' (described below).

  11. amazonReviewSnippets_anonDataRatings.txt

    FORMAT: the file is tab delimited with ID, MEAN-SENTIMENT-RATING, STANDARD DEVIATION, and RAW-SENTIMENT-RATINGS

    DESCRIPTION: Sentiment ratings from a minimum of 20 independent human raters (all pre-screened, trained, and quality checked for optimal inter-rater reliability).

  12. Comp.Social website with more papers/research:

    [Comp.Social](http://comp.social.gatech.edu/papers/)

Python Demo and Code Examples

Demo, including example of non-English text translations

For a more complete demo, point your terminal to vader's install directory (e.g., if you installed using pip, it might be \Python3x\lib\site-packages\vaderSentiment), and then run python vaderSentiment.py. (Be sure you are set to handle UTF-8 encoding in your terminal or IDE... there are also additional library/package requirements such as NLTK and requests to help demonstrate some common real world needs/desired uses).

The demo has more examples of tricky sentences that confuse other sentiment analysis tools. It also demonstrates how VADER can work in conjunction with NLTK to do sentiment analysis on longer texts...i.e., decomposing paragraphs, articles/reports/publications, or novels into sentence-level analysis. It also demonstrates a concept for assessing the sentiment of images, video, or other tagged multimedia content.
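The sentence-level decomposition mentioned above can be sketched independently of any particular tokenizer. In this hypothetical helper the splitter and the scoring function are passed in as arguments; averaging the per-sentence compound scores is one simple aggregation choice and is an assumption here, not necessarily what every use case needs.

```python
import re


def score_paragraph(text, polarity_fn, split_fn):
    """Score each sentence and return (average compound, per-sentence results)."""
    sentences = [s for s in split_fn(text) if s.strip()]
    if not sentences:
        return 0.0, []
    compounds = [polarity_fn(s)["compound"] for s in sentences]
    average = sum(compounds) / len(compounds)
    return average, list(zip(sentences, compounds))


def naive_split(text):
    """Crude regex splitter, a stand-in for a real sentence tokenizer."""
    return re.split(r"(?<=[.!?])\s+", text.strip())
```

With NLTK and vaderSentiment installed, one would typically pass nltk.tokenize.sent_tokenize as split_fn (it needs NLTK's tokenizer data downloaded) and SentimentIntensityAnalyzer().polarity_scores as polarity_fn.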

If you have access to the Internet, the demo will also show how VADER can work with analyzing sentiment of non-English text sentences. Please be aware that VADER does not inherently provide its own translation. The use of "My Memory Translation Service" from MY MEMORY NET (see: http://mymemory.translated.net) is part of the demonstration, showing one way to use VADER on non-English text. (Please note the usage limits for number of requests: http://mymemory.translated.net/doc/usagelimits.php)

Code Examples

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
# note: depending on how you installed (e.g., using source code download versus pip install), you may need to import like this:
# from vaderSentiment import SentimentIntensityAnalyzer

# --- examples -------
sentences = ["VADER is smart, handsome, and funny.",  # positive sentence example
             "VADER is smart, handsome, and funny!",  # punctuation emphasis handled correctly (sentiment intensity adjusted)
             "VADER is very smart, handsome, and funny.", # booster words handled correctly (sentiment intensity adjusted)
             "VADER is VERY SMART, handsome, and FUNNY.",  # emphasis for ALLCAPS handled
             "VADER is VERY SMART, handsome, and FUNNY!!!", # combination of signals - VADER appropriately adjusts intensity
             "VADER is VERY SMART, uber handsome, and FRIGGIN FUNNY!!!", # booster words & punctuation make this close to ceiling for score
             "VADER is not smart, handsome, nor funny.",  # negation sentence example
             "The book was good.",  # positive sentence
             "At least it isn't a horrible book.",  # negated negative sentence with contraction
             "The book was only kind of good.", # qualified positive sentence is handled correctly (intensity adjusted)
             "The plot was good, but the characters are uncompelling and the dialog is not great.", # mixed negation sentence
             "Today SUX!",  # negative slang with capitalization emphasis
             "Today only kinda sux! But I'll get by, lol", # mixed sentiment example with slang and contrastive conjunction "but"
             "Make sure you :) or :D today!",  # emoticons handled
             "Catch utf-8 emoji such as 💘 and 💋 and 😁",  # emojis handled
             "Not bad at all"  # Capitalized negation
             ]

analyzer = SentimentIntensityAnalyzer()
for sentence in sentences:
    vs = analyzer.polarity_scores(sentence)
    print("{:-<65} {}".format(sentence, str(vs)))

Again, for a more complete demo, go to the install directory and run python vaderSentiment.py. (Be sure you are set to handle UTF-8 encoding in your terminal or IDE.)

Output for the above example code

VADER is smart, handsome, and funny.----------------------------- {'pos': 0.746, 'compound': 0.8316, 'neu': 0.254, 'neg': 0.0}
VADER is smart, handsome, and funny!----------------------------- {'pos': 0.752, 'compound': 0.8439, 'neu': 0.248, 'neg': 0.0}
VADER is very smart, handsome, and funny.------------------------ {'pos': 0.701, 'compound': 0.8545, 'neu': 0.299, 'neg': 0.0}
VADER is VERY SMART, handsome, and FUNNY.------------------------ {'pos': 0.754, 'compound': 0.9227, 'neu': 0.246, 'neg': 0.0}
VADER is VERY SMART, handsome, and FUNNY!!!---------------------- {'pos': 0.767, 'compound': 0.9342, 'neu': 0.233, 'neg': 0.0}
VADER is VERY SMART, uber handsome, and FRIGGIN FUNNY!!!--------- {'pos': 0.706, 'compound': 0.9469, 'neu': 0.294, 'neg': 0.0}
VADER is not smart, handsome, nor funny.------------------------- {'pos': 0.0, 'compound': -0.7424, 'neu': 0.354, 'neg': 0.646}
The book was good.----------------------------------------------- {'pos': 0.492, 'compound': 0.4404, 'neu': 0.508, 'neg': 0.0}
At least it isn't a horrible book.------------------------------- {'pos': 0.363, 'compound': 0.431, 'neu': 0.637, 'neg': 0.0}
The book was only kind of good.---------------------------------- {'pos': 0.303, 'compound': 0.3832, 'neu': 0.697, 'neg': 0.0}
The plot was good, but the characters are uncompelling and the dialog is not great. {'pos': 0.094, 'compound': -0.7042, 'neu': 0.579, 'neg': 0.327}
Today SUX!------------------------------------------------------- {'pos': 0.0, 'compound': -0.5461, 'neu': 0.221, 'neg': 0.779}
Today only kinda sux! But I'll get by, lol----------------------- {'pos': 0.317, 'compound': 0.5249, 'neu': 0.556, 'neg': 0.127}
Make sure you :) or :D today!------------------------------------ {'pos': 0.706, 'compound': 0.8633, 'neu': 0.294, 'neg': 0.0}
Catch utf-8 emoji such as 💘 and 💋 and 😁-------------------- {'pos': 0.279, 'compound': 0.7003, 'neu': 0.721, 'neg': 0.0}
Not bad at all--------------------------------------------------- {'pos': 0.487, 'compound': 0.431, 'neu': 0.513, 'neg': 0.0}

About the Scoring

  • The compound score is computed by summing the valence scores of each word in the lexicon, adjusted according to the rules, and then normalized to be between -1 (most extreme negative) and +1 (most extreme positive). This is the most useful metric if you want a single unidimensional measure of sentiment for a given sentence. Calling it a 'normalized, weighted composite score' is accurate.

    It is also useful for researchers who would like to set standardized thresholds for classifying sentences as either positive, neutral, or negative. Typical threshold values (used in the literature cited on this page) are:

  1. positive sentiment: compound score >= 0.05
  2. neutral sentiment: (compound score > -0.05) and (compound score < 0.05)
  3. negative sentiment: compound score <= -0.05

NOTE: The compound score is the one most commonly used for sentiment analysis by most researchers, including the authors.
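The path from a summed valence to a class label can be sketched as follows. The thresholds come straight from the list above. The squashing function and its alpha value of 15 are an assumption about how the normalization is done (the text only says the sum is "normalized"), although it does reproduce the compound of 0.4404 shown for "The book was good." from the single valence 1.9 of "good".

```python
import math


def normalize(score, alpha=15):
    """Map an unbounded valence sum into the open interval (-1, 1).

    Assumption: score / sqrt(score^2 + alpha); alpha controls how quickly
    the result approaches the extremes.
    """
    return score / math.sqrt(score * score + alpha)


def classify(compound, threshold=0.05):
    """Apply the typical thresholds listed above to a compound score."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


compound = normalize(1.9)  # valence of "good" from the lexicon description
print(round(compound, 4), classify(compound))
```

Keeping the threshold as a parameter makes it easy to test how sensitive a classification result is to the cutoff.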

  • The pos, neu, and neg scores are ratios for proportions of text that fall in each category (so these should all add up to be 1... or close to it with float operation). These are the most useful metrics if you want to analyze the context & presentation of how sentiment is conveyed or embedded in rhetoric for a given sentence. For example, different writing styles may embed strongly positive or negative sentiment within varying proportions of neutral text -- i.e., some writing styles may reflect a penchant for strongly flavored rhetoric, whereas other styles may use a great deal of neutral text while still conveying a similar overall (compound) sentiment. As another example: researchers analyzing information presentation in journalistic or editorial news might desire to establish whether the proportions of text (associated with a topic or named entity, for example) are balanced with similar amounts of positively and negatively framed text versus being "biased" towards one polarity or the other for the topic/entity.
    • IMPORTANTLY: these proportions represent the "raw categorization" of each lexical item (e.g., words, emoticons/emojis, or initialisms) into positive, negative, or neutral classes; they do not account for the VADER rule-based enhancements such as word-order sensitivity for sentiment-laden multi-word phrases, degree modifiers, word-shape amplifiers, punctuation amplifiers, negation polarity switches, or contrastive conjunction sensitivity.
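As a small illustration of working with the proportion scores, the sketch below checks that pos, neu, and neg sum to roughly 1 and computes a simple positive-minus-negative balance. Both helper names and the balance measure are hypothetical conveniences, not part of VADER; the sample dictionary is copied from the example output above.

```python
import math


def proportions_sum_to_one(scores, tol=0.005):
    """True when pos + neu + neg is within rounding distance of 1."""
    total = scores["pos"] + scores["neu"] + scores["neg"]
    return math.isclose(total, 1.0, abs_tol=tol)


def polarity_balance(scores):
    """Positive share minus negative share; values near 0 suggest balanced framing."""
    return scores["pos"] - scores["neg"]


# Scores reported above for the mixed "The plot was good, but ..." sentence.
mixed = {"pos": 0.094, "compound": -0.7042, "neu": 0.579, "neg": 0.327}
print(proportions_sum_to_one(mixed), polarity_balance(mixed))
```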

Ports to Other Programming Languages

Feel free to let me know about ports of VADER Sentiment to other programming languages. So far, I know about these helpful ports:

  1. Java
    VaderSentimentJava by apanimesh061
  2. JavaScript
    vaderSentiment-js by nimaeskandary
  3. PHP
    php-vadersentiment by abusby
  4. Scala
    Sentiment by ziyasal
  5. C#
    vadersharp by codingupastorm Jordan Andrews
  6. Rust
    vader-sentiment-rust by ckw017
  7. Go
    GoVader by jonreiter Jon Reiter
  8. R
    R Vader by Katie Roehrick

Issues

  • ImportError: cannot import name sentiment

    I installed vaderSentiment with pip and have ensured it is in the correct file, un- and re-installed it, attempted to upgrade pip, attempted to change the permissions for the files and am still having difficulty using this library. Error below:

    Traceback (most recent call last):
      File "search_twitter.py", line 1, in <module>
        from vaderSentiment import sentiment as vaderSentiment
    ImportError: cannot import name sentiment

    Any help as soon as possible would be greatly appreciated as my project is due on Tuesday. Thank you very much, Jon

    opened by jatkins23 27
  • "To die for" misinterpreted

    I have found this basic common expression which is misinterpreted by Vader:

    To die for.-------------- {'neg': 0.661, 'neu': 0.339, 'pos': 0.0, 'compound': -0.5994}

    Could you consider adding a new rule for that?

    opened by fcpenha 7
  • Negation interpretation is very poor

    I will be looking for a solution to this but right now, things like the following:

    no problems ever Everything has been smooth. No problems or complaints. No problem as of yet Doing just fine no problems All good. No complaints. No problem everything good Very satisfied. No bad experiences.

    are all being categorized as ~60% negative or more.

    and yet things like this:

    New website is HORRIBLE

    Has 53% negativity. For capitalized descriptions like "HORRIBLE", I would have expected much more accurate results, it seems like the same logic that is overestimating the negativity of words like "no" and "problem" is totally ignoring less general words like "horrible". This is not good at all.

    I bring this up not in hopes it would get fixed, I think it's fundamentally a problem with the approach this solution takes. I bring this up in case anyone was wondering if they should use this in a production scenario. You shouldn't. Especially not for customer support or CXM-related jobs. There is almost no common sense context awareness in VADER and worse, it misses on obvious adjectives which have little ironic or contradictory uses.

    opened by DylanAlloy 6
  • syntax error

    successful import platform : windows 7(x64) python version : 3.5.1

    Traceback (most recent call last):
      File "C:\Users\user\Desktop\sentiment\sentiment2.py", line 2, in <module>
        from vaderSentiment.vaderSentiment import sentiment as vaderSentiment
      File "", line 969, in _find_and_load
      File "", line 954, in _find_and_load_unlocked
      File "", line 896, in _find_spec
      File "", line 1136, in find_spec
      File "", line 1112, in _get_spec
      File "", line 1093, in _legacy_get_spec
      File "", line 444, in spec_from_loader
      File "", line 530, in spec_from_file_location
      File "C:\Python\Python35\lib\site-packages\vadersentiment-0.5-py3.5.egg\vaderSentiment\vaderSentiment.py", line 23
        return dict(map(lambda (w, m): (w, float(m)), [wmsr.strip().split('\t')[0:2] for wmsr in open(f) ]))
                               ^
    SyntaxError: invalid syntax

    opened by somenathmaji 6
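
    For readers hitting the same SyntaxError: the line flagged in the traceback uses tuple-unpacking lambda parameters, which Python 3 does not accept. A minimal Python 3-compatible sketch of the same parsing logic follows. The function takes an iterable of lines rather than a file path, its name is chosen for illustration, and it is not necessarily how later releases of the library resolved the problem.

```python
def make_lex_dict(lines):
    """Build {token: mean valence} from tab-delimited lexicon lines."""
    lex_dict = {}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        # Only the first two tab-separated fields (token, mean rating) are used.
        word, measure = line.strip().split("\t")[0:2]
        lex_dict[word] = float(measure)
    return lex_dict
```

    Opening the lexicon with io.open(path, encoding="utf-8") and passing the handle to this function should work on both Python 2 and Python 3, since io.open accepts the encoding argument on both.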
  • TypeError: 'encoding' is an invalid keyword argument for this function

    Im getting this error while compiling SentimentIntensityAnalyzer()

    ErrorLog:
      venv/local/lib/python2.7/site-packages/vaderSentiment/vaderSentiment.py", line 212, in __init__
        with open(lexicon_full_filepath, encoding='utf-8') as f:
    TypeError: 'encoding' is an invalid keyword argument for this function

    opened by esitharth 5
  • Codec Issue

    Hi @cjhutto

    When I run the code from the NLTK tutorial - http://www.nltk.org/howto/sentiment.html - about using Vader I get the error below. I worked out that I had to move the vader_lexicon.txt file into my NLTK sentiment folder, but that didn't solve this Codec problem.

    Have run the code with both python 2 and 3.

    Any ideas what I can do?

    UnicodeDecodeError                        Traceback (most recent call last)
    <ipython-input-4-76d3725b79f2> in <module>()
         57 sentences.extend(tricky_sentences)
         58 
    ---> 59 sid = SentimentIntensityAnalyzer()
         60 
         61 for sentence in sentences:
    
    //anaconda/lib/python3.5/site-packages/nltk/sentiment/vader.py in __init__(self, lexicon_file)
        200     def __init__(self, lexicon_file="vader_lexicon.txt"):
        201         self.lexicon_file = os.path.join(os.path.dirname(__file__), lexicon_file)
    --> 202         self.lexicon = self.make_lex_dict()
        203 
        204     def make_lex_dict(self):
    
    //anaconda/lib/python3.5/site-packages/nltk/sentiment/vader.py in make_lex_dict(self)
        208         lex_dict = {}
        209         with codecs.open(self.lexicon_file, encoding='utf8') as infile:
    --> 210             for line in infile:
        211                 (word, measure) = line.strip().split('\t')[0:2]
        212                 lex_dict[word] = float(measure)
    
    //anaconda/lib/python3.5/codecs.py in __next__(self)
        709 
        710         """ Return the next decoded line from the input stream."""
    --> 711         return next(self.reader)
        712 
        713     def __iter__(self):
    
    //anaconda/lib/python3.5/codecs.py in __next__(self)
        640 
        641         """ Return the next decoded line from the input stream."""
    --> 642         line = self.readline()
        643         if line:
        644             return line
    
    //anaconda/lib/python3.5/codecs.py in readline(self, size, keepends)
        553         # If size is given, we call read() only once
        554         while True:
    --> 555             data = self.read(readsize, firstline=True)
        556             if data:
        557                 # If we're at a "\r" read one extra character (which might
    
    //anaconda/lib/python3.5/codecs.py in read(self, size, chars, firstline)
        499                 break
        500             try:
    --> 501                 newchars, decodedbytes = self.decode(data, self.errors)
        502             except UnicodeDecodeError as exc:
        503                 if firstline:
    
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xde in position 0: invalid continuation byte
    
    opened by jd155 5
  • Error message..

    When I try the sample code, I get the following error message. How should I fix it? Python 3.5.0 on Mac.

    Traceback (most recent call last):
      File "vader_sentiment.py", line 3, in <module>
        from vaderSentiment.vaderSentiment import sentiment as vaderSentiment
      File "/Users/sungmoon/.pyenv/versions/3.5.0a4/lib/python3.5/site-packages/vaderSentiment/vaderSentiment.py", line 23
        return dict(map(lambda (w, m): (w, float(m)), [wmsr.strip().split('\t')[0:2] for wmsr in open(f) ]))
                               ^
    SyntaxError: invalid syntax

    opened by sungmoonc 5
  • Not predicting sentiment of emoticons correctly

    It is predicting inconsistent results on emoticons. For instance, when I pass '🙂' as an argument, it predicts the outcome correctly, but when I use the same emoticon multiple times, '🙂🙂', it gives neutral results. Similarly, the same issue arises with other emoji, and sometimes it does not detect even a single emoji.

    opened by Rishav09 5
  • UnicodeDecodeError when calling SentimentIntensityAnalyzer

    Hi all

    I've just been trying to learn how to use the SentimentIntensityAnalyzer() and I've come up with the problem where:

    analyzer = SentimentIntensityAnalyzer()
     ---------------------------------------------------------------------------
    UnicodeDecodeError                        Traceback (most recent call last)
    <ipython-input-31-6c626c4ef428> in <module>()
    ----> 1 analyzer = SentimentIntensityAnalyzer()
          2 analyzer.polarity_score(line_first)
    
    /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nltk/sentiment/vader.pyc in __init__(self, lexicon_file)
        200     def __init__(self, lexicon_file="sentiment/vader_lexicon.zip/vader_lexicon/vader_lexicon.txt"):
        201         self.lexicon_file = nltk.data.load(lexicon_file)
    --> 202         self.lexicon = self.make_lex_dict()
        203 
        204     def make_lex_dict(self):
    
    /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nltk/sentiment/vader.pyc in make_lex_dict(self)
        208         lex_dict = {}
        209         for line in self.lexicon_file.split('\n'):
    --> 210             (word, measure) = line.strip().split('\t')[0:2]
        211             lex_dict[word] = float(measure)
        212         return lex_dict
    
    /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.pyc in next(self)
        697 
        698         """ Return the next decoded line from the input stream."""
    --> 699         return self.reader.next()
        700 
        701     def __iter__(self):
    
    /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.pyc in next(self)
        628 
        629         """ Return the next decoded line from the input stream."""
    --> 630         line = self.readline()
        631         if line:
        632             return line
    
    /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.pyc in readline(self, size, keepends)
        543         # If size is given, we call read() only once
        544         while True:
    --> 545             data = self.read(readsize, firstline=True)
        546             if data:
        547                 # If we're at a "\r" read one extra character (which might
    
    /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.pyc in read(self, size, chars, firstline)
        490             data = self.bytebuffer + newdata
        491             try:
    --> 492                 newchars, decodedbytes = self.decode(data, self.errors)
        493             except UnicodeDecodeError, exc:
        494                 if firstline:
    
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xde in position 0: invalid continuation byte
    

    I've read the thread with a similar issue; however, I don't quite understand where to add the 'u' to make the string unicode. I've only done: analyzer = SentimentIntensityAnalyzer()

    can someone help me?

    opened by aWildRiceHasAppeared 5
  • Emoticons UTF-8

    First of all, I really like your lib. It's powerful. I used UTF-8 Twitter emoticons like ❤️😂😫😊 and all of them have neutral sentiment. Maybe it is just my issue, but I tried using UTF-8 encoding in my code and it didn't help. I think it's a good idea to add them to vader_lexicon.txt and support UTF-8 emoticons.

    Nowadays, they are more popular in social media than standard emoticons (':)', ':(', ':D', etc.). Please check this website: http://unicode.org/emoji/charts/full-emoji-list.html I think it might increase VADER accuracy significantly.

    Do you plan to rewrite this code in Java to make it more popular? I can help you with that.
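
One way to approach the request above is to replace each emoji with a short textual description before the text is scored, so that ordinary lexicon words carry the sentiment. This is only a sketch: `EMOJI_DESCRIPTIONS` and `expand_emoji` are hypothetical names, and the table is a tiny hand-written sample rather than anything shipped in vader_lexicon.txt.

```python
# Hypothetical emoji -> description table (illustrative sample only).
EMOJI_DESCRIPTIONS = {
    "\u2764\ufe0f": "red heart",                      # heart + variation selector
    "\U0001F602": "face with tears of joy",
    "\U0001F62B": "tired face",
    "\U0001F60A": "smiling face with smiling eyes",
}


def expand_emoji(text, table=EMOJI_DESCRIPTIONS):
    """Replace known emoji with their descriptions, padded by spaces."""
    for emoji, description in table.items():
        text = text.replace(emoji, " " + description + " ")
    # Collapse the extra whitespace introduced by the padding.
    return " ".join(text.split())


print(expand_emoji("great day \U0001F60A"))
```

The expanded string can then be passed to a sentiment analyzer like any other text.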

    opened by kokojumbo 4
  • Julia Port

    opened by nusretipek 0
  • is this thing still alive?

    Is this thing still alive?

    The last commit is from March. There are still unaddressed PRs from April, and a lot of issues are not being addressed.

    opened by BtencateSphereon 0
  • Doubt about threshold values used in VADER categorization

    Hey @cjhutto , based on the "About the scoring" section: https://github.com/cjhutto/vaderSentiment#about-the-scoring

    "Typical threshold values (used in the literature cited on this page) are:
    
    positive sentiment: compound score >= 0.05
    neutral sentiment: (compound score > -0.05) and (compound score < 0.05)
    negative sentiment: compound score <= -0.05
    NOTE: The compound score is the one most commonly used for sentiment analysis by most researchers, including the authors."
    

    Can you please explain why -0.05 and 0.05 are used as threshold values to categorize the compound scores as positive, neutral and negative? Thanking you in advance.
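
For reference, the thresholds quoted above can be written as a small function. This is a sketch only: `classify_compound` is a hypothetical helper, and the cut-off is exposed as a parameter so that values other than 0.05 can be tried.

```python
def classify_compound(compound, threshold=0.05):
    """Map a compound score to a label using symmetric thresholds."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


for score in (0.8316, 0.0, -0.05):
    print(score, classify_compound(score))
```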

    opened by rashktech 0
  • Wrong weight assigned for hashtags with capitals

    The values returned by the sentiment analyzer are wrong when using " # " at the beginning of the sentence followed by capitalized letters.

    Example: I have updated the value for "stress" to 0 in my use case. However, for the following message: image

    You can see the compound value is negative.

    Without using the space: image

    The output should ideally be 0 for both cases.

    opened by VibhaRavi1 0
  • vaderSentiment data output in a different order than specified in the docs

    Hey there,

    just trying out vaderSentiment for emotional analysis, and everything works fine from a technical point of view. The only confusing thing is that the output we get from vs = analyzer.polarity_scores() is different from what is stated in the README.md.

    The README, in the section https://github.com/cjhutto/vaderSentiment#code-examples, shows for example this result: {'pos': 0.746, 'compound': 0.8316, 'neu': 0.254, 'neg': 0.0}, while running it locally in my Jupyter notebook gives output like this: {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}

    image

    Expected behaviour: Get the values in the same order as in vaderSentiment's README.md

    Actual behaviour: Printing out the result of vs = analyzer.polarity_scores() shows a different order (see example above).

    I just want to make sure I'm not mixing something up here. From my point of view, it looks as if the documentation is not up to date here, but maybe I'm missing something?

    Thanks
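
Python dictionaries compare equal regardless of key order, so a different printed order does not by itself mean different results. The sketch below uses the README values in both orderings purely for illustration; `format_scores` is a hypothetical helper for printing the keys in a fixed order.

```python
# Same scores, written in the two key orders mentioned in the issue.
readme_order = {'pos': 0.746, 'compound': 0.8316, 'neu': 0.254, 'neg': 0.0}
local_order = {'neg': 0.0, 'neu': 0.254, 'pos': 0.746, 'compound': 0.8316}

# Dict equality ignores key order.
print(readme_order == local_order)


def format_scores(scores, keys=("neg", "neu", "pos", "compound")):
    """Render scores in a fixed key order, independent of dict order."""
    return ", ".join("{}={}".format(key, scores[key]) for key in keys)


print(format_scores(readme_order))
```

Reading values by key, as in `scores['compound']`, avoids relying on any particular order.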

    opened by dataluchs 0
  • Optimization to "negated" function.

    "negated" was made more concise and the NEGATE variable changed from a list to a set to speed up membership checks. The function performs about twice as fast in my rudimentary timeit tests.
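
The idea described above can be sketched as follows. This is not the code from the pull request: the word list is a small hypothetical sample, and this `negated` is a simplified stand-in that checks for negation words and for "n't" inside a word.

```python
import timeit

# Hypothetical sample of negation words, for illustration only.
NEGATE_LIST = ["not", "never", "no", "cannot", "isn't", "wasn't", "won't", "without"]
NEGATE_SET = set(NEGATE_LIST)  # average O(1) membership instead of O(n)


def negated(input_words, negate=NEGATE_SET):
    """Return True if any word is a negation word or contains "n't"."""
    words = [str(word).lower() for word in input_words]
    return any(word in negate or "n't" in word for word in words)


words = "the service was very good and the food was tasty".split()
# Rough comparison of list vs. set membership; timings vary by machine.
print(timeit.timeit(lambda: negated(words, NEGATE_LIST), number=10000))
print(timeit.timeit(lambda: negated(words, NEGATE_SET), number=10000))
```

With a list this short the difference is small. The gain from a set grows with the number of negation words.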

    opened by naturalscienceuser 0
  • Typo fix in readme

    See "Output for the above example code" to verify that "Catch utf-8 emoji such as 💘 and 💋 and 😁" is actually the correct string.

    opened by Maxim-Mazurok 0
  • Added Lua version

    I created an NLP toolkit in Lua and ported the VADER algorithm to add sentiment analysis functionality. Relevant documentation for VADER is mentioned in the README.

    opened by pncnmnp 0
  • Is NLP still required when using VADER?

    Hi, is stemming or lemmatisation or another kind of NLP still required when using VADER or is it processed automatically? @cjhutto

    opened by SamuelFairbrass 0
  • Dictionary contains phrases like "fed up" that will never hit because of how the sentence is tokenized

    The dictionary contains phrases like "fed up" but since the code checks if words are in the dictionary on a word by word basis, these phrases never hit:

    >>> from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    >>> analyzer = SentimentIntensityAnalyzer()
    >>> analyzer.polarity_scores("I am fed up")
    {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
    >>>
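
One possible way to let multi-word entries match is to try the longest candidate phrase first and fall back to single words. The sketch below illustrates that idea only and is not how the library tokenizes: `LEXICON` is a hypothetical two-entry dictionary with made-up valence values, and `lookup_valences` is a hypothetical helper.

```python
# Hypothetical mini-lexicon; the valence values are made up for illustration.
LEXICON = {"fed up": -1.8, "good": 1.9}


def lookup_valences(text, lexicon=LEXICON, max_len=2):
    """Return (phrase, valence) hits, preferring longer phrases."""
    tokens = text.lower().split()
    hits = []
    i = 0
    while i < len(tokens):
        for n in range(max_len, 0, -1):
            window = tokens[i:i + n]
            candidate = " ".join(window)
            if len(window) == n and candidate in lexicon:
                hits.append((candidate, lexicon[candidate]))
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no entry starts at this token
    return hits


print(lookup_valences("I am fed up"))
```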
    
    opened by kirsten-stallings 0
Releases(0.5)
Owner
C.J. Hutto
Selected Machine Learning algorithms for natural language processing and semantic analysis in Golang

Natural Language Processing Implementations of selected machine learning algorithms for natural language processing in golang. The primary focus for t

James Bowman 351 Jan 1, 2022
A Go package for n-gram based text categorization, with support for utf-8 and raw text

A Go package for n-gram based text categorization, with support for utf-8 and raw text. To do: write documentation make it faster Keywords: text categ

Peter Kleiweg 65 Jun 21, 2021
A tool to find all duplicates in large sets of text documents.

⊧ dupi Dupi is an engine for identifying and exploring duplicative text in sets of documents. Status Dupi is in alpha/early beta development stage. Pl

go-air 12 Dec 21, 2021
i18n (Internationalization and localization) engine written in Go, used for translating locale strings.

go-localize Simple and easy to use i18n (Internationalization and localization) engine written in Go, used for translating locale strings. Use with go

Miles Croxford 29 Dec 7, 2021
Read and use word2vec vectors in Go

Introduction This is a package for reading word2vec vectors in Go and finding similar words and analogies. Installation This package can be installed

Daniël de Kok 43 Dec 24, 2021
[UNMAINTAINED] Extract values from strings and fill your structs with nlp.

nlp nlp is a general purpose any-lang Natural Language Processor that parses the data inside a text and returns a filled model Supported types int in

Juan Alvarez 376 Dec 1, 2021
A Golang library for text processing, including tokenization, part-of-speech tagging, and named-entity extraction.

prose is a natural language processing library (English only, at the moment) in pure Go. It supports tokenization, segmentation, part-of-speech tagging, and named-entity extraction.

Joseph Kato 2.9k Jan 15, 2022
Self-contained Machine Learning and Natural Language Processing library in Go

If you like the project, please ★ star this repository to show your support! ?? A Machine Learning library written in pure Go designed to support rele

NLP Odyssey 1.1k Jan 13, 2022
Stemmer packages for Go programming language. Includes English, German and Dutch stemmers.

Stemmer package for Go Stemmer package provides an interface for stemmers and includes English, German and Dutch stemmers as sub-packages: porter2 sub

Dmitry Chestnykh 50 Nov 17, 2021
A go library for reading and creating ISO9660 images

iso9660 A package for reading and creating ISO9660 Joliet and Rock Ridge extensions are not supported. Examples Extracting an ISO package main import

Kamil Domański 199 Dec 20, 2021
Package i18n provides internationalization and localization for your Go applications.

i18n Package i18n provides internationalization and localization for your Go applications. Installation The minimum requirement of Go is 1.16. go get

null 53 Jan 5, 2022
Fast tool for separating existing domains from a list of domains using DNS/HTTP.

NETGREP How To Install • How to use Description netgrep can send http/https request or resolve domain from dns (can customize dns server) to separate

aWolver 1 Dec 3, 2021
Godaddy-domains-client-go - Godaddy domains api Client golang - Write automatically from swagger codegen

Go API client for swagger Overview This API client was generated by the swagger-codegen project. By using the swagger-spec from a remote server, you c

Mickael Stanislas 0 Jan 9, 2022
A social media API to handle users and their posts, written from scratch in Golang

Initial Set-Up To start the project on your own machine you'll need Golang installed, along with mongoDB. Once you've ensured these requirements are me

Ayush Neekhar 0 Oct 9, 2021
Automated social media post sharing

- SOBOT - Automated Social Media Sharing Tool Social media post sharing tool Features The application has a stable beta version. Errors that will occu

Can Açıkgöz 5 Jan 12, 2022
prometheus rule distributor, distribute rule to path

prometheus rule distributor, distribute rule to path.Support add/remove/delete/list app rule. Rule group by appID

charlie 0 Nov 3, 2021
A telegram bot that fetches multiple RSS cryptocurrency news feeds for sentiment analysis

Crypto News Telegram Bot A simple telegram bot that will help you stay updated on your latest crypto news This bot will help you keep track of the lat

Cha 4 Aug 22, 2021
Sentiment Analysis Pipeline + API written in Golang (currently processing Twitter tweets).

Go Sentiment Analysis Components Config: config module based in JSON (enter twitter credentials for use) Controllers: handle the API db call/logic for

Joseph Moussa 0 Jan 6, 2022
This static analysis tool works to ensure your program's data flow does not spill beyond its banks.

Go Flow Levee This static analysis tool works to ensure your program's data flow does not spill beyond its banks. An input program's data flow is expl

Google 114 Jan 18, 2022
A quick introduction to how Apache Kafka works and differs from other messaging systems using an example application.

Apache Kafka in 6 minutes A quick introduction to how Apache Kafka works and differs from other messaging systems using an example application. In thi

bagher sohrabi 2 Oct 27, 2021
Gountries provides: Countries (ISO-3166-1), Country Subdivisions(ISO-3166-2), Currencies (ISO 4217), Geo Coordinates(ISO-6709) as well as translations, country borders and other stuff exposed as struct data.

gountries Inspired by the countries gem for ruby. Countries (ISO-3166-1), Country Subdivisions(ISO-3166-2), Currencies (ISO 4217), Geo Coordinates(ISO

Pär Karlsson 333 Jan 7, 2022
Create production ready microservices mono repo pattern wired with Neo4j. Microservices for other languages and front end repos to be added as well in future.

Create Microservices MonoRepo in GO/Python Create a new production-ready project with backend (Golang), (Python) by running one CLI command. Focus on

GoChronicles 13 Dec 31, 2021
Request-logging-tool - A tool logs the md5 codes of the responses of the given domains in parameter

request-logging-tool Application to send http requests and log the md5 responses

Kushan Pandipperuma 1 Jan 7, 2022
TFTP and HTTP server specifically designed to serve iPXE ROMs and scripts.

pixie TFTP and HTTP server specifically designed to serve iPXE ROMs and scripts. pixie comes embedded with the following ROMs provided by the iPXE pro

Adrian L Lange 8 Jan 5, 2022
scrapligo -- is a Go library focused on connecting to devices, specifically network devices (routers/switches/firewalls/etc.) via SSH and NETCONF.

scrapligo -- scrap(e c)li (but in go!) -- is a Go library focused on connecting to devices, specifically network devices (routers/switches/firewalls/etc.) via SSH and NETCONF.

null 92 Jan 13, 2022
A reimplementation of AlphaGo in Go (specifically AlphaZero)

A reimplementation of AlphaGo in Go (specifically AlphaZero)

Gorgonia 196 Dec 15, 2021
A boiler-plate like base for people to get started in creating automation software specifically for purchasing items on websites.

Bot-Base Bot-Base is a small project with concepts for most elements of a bot. Feel free to contact me on Twitter with any questions. Contributing Pul

Edwin J 57 Jan 10, 2022
REST-API specifically built to support online store system of Zahir

Rest Test. • From Above ERD please create Rest full API. Create register API(Include Generate password). • Acceptance o Phone number and email is uniq

Sandi Permana Soebagio 0 Nov 15, 2021