Twitter Sentiment analysis using R


Over the past decade, there has been an exponential surge in people's online activity across the globe. Millions of posts are made on the web every second, and the rise of social media platforms has flooded the internet with content.

Social media is no longer just a platform where people talk to each other; it has grown to serve many more purposes. It has become a medium where people:

  • Express their interests.
  • Share their views.
  • Share their displeasure.
  • Praise or criticize companies for good or poor service.

In this article, we will learn how to analyze what people post on a social network (Twitter) and build an application that helps companies understand their customers.

Before we dive in, let’s look at the table of contents of this article.


Table of contents:

  • From people’s emotions to how customers feel about a product
  • How to create the Twitter app
  • Sentiment analysis using Twitter tweets
    • Why sentiment analysis?
  • Challenges in performing sentiment analysis on Twitter tweets
  • Implementing sentiment analysis application in R
    • Extracting tweets using Twitter application
    • Cleaning the tweets for further analysis
    • Getting sentiment score for each tweet
    • Segregating positive and negative tweets
  • Conclusion

From people’s emotions to how customers feel about a product

Social networks have grown from mere chatting platforms into storehouses of data that can help companies solve many problems.

They can help companies understand their customers better, see what their competitors are doing, and learn what customers are saying about them.

Although at first glance social media looks like a storehouse of insights, extracting the relevant information from unstructured text is not easy. Analyzing textual data is always difficult because of the varied, informal ways in which people write their posts.

Nevertheless, posts on social media can be very expressive and help us understand people’s sentiments and emotions. Twitter, one of the most popular social media platforms, is a place where people often express their emotions and sentiments about a brand, a product, or a service.

How to create the Twitter app

Twitter has made the task of analyzing users’ tweets easier by providing an API that can be used to extract tweets and their underlying metadata.

The API returns Twitter data in a structured format that can then be cleaned and processed further for analysis.

To create a Twitter app, you first need a Twitter account. Once you have one, visit Twitter’s app page and create an application.

Enter the basic details, such as the application name, a description, and a website. Any test website name will do. Once you have entered these details, you will receive four keys and tokens:

  1. Consumer Key (API Key)
  2. Consumer Secret (API Secret)
  3. Access Token
  4. Access Token Secret

These keys and tokens will be used to extract data from Twitter in R.

Sentiment analysis using Twitter tweets

Before going further into the technical side of sentiment analysis, let’s first understand why we need sentiment analysis at all.

Why sentiment analysis?

Let’s look at it from a company’s perspective and understand why a company would invest time and effort in analyzing the sentiment of posts. Analyzing each post and the sentiment associated with it helps us find the key topics or themes that resonate with the audience.

If the sentiment around a post is very positive, people are keen to talk about its topic. The topic could be a product, a service, a social message, or anything else. Understanding this helps a company decide what kind of posts to publish on social media platforms to increase user engagement.

Also, tracking a company’s sentiment over a period of time can help relate its sales data to the overall sentiment and address questions such as:

  • Was there a negative campaign at some point that produced negative sentiment about the company and, in turn, a decline in sales during that period?
  • Was there a huge spike in positive sentiment because a celebrity talked about the company’s product?
  • Did that positive spike result in higher sales?
  • Do posts with negative sentiment share common themes?
  • Is customer service a common topic among posts with strongly negative emotion?

All these questions help us understand how customers perceive the company: what they say about its products, and what they like and dislike.

I am sure you will agree that sentiment analysis of tweets and other social media posts can help companies better analyze customer feedback and opinion, and position their strategy accordingly.

Challenges in performing sentiment analysis on Twitter tweets

Despite all these use cases, analyzing tweets for sentiment poses a few challenges. The first is data quality; the Twitter API helps us overcome this problem to an extent.

After basic cleaning of the data extracted through the Twitter API, we can use it to generate a sentiment score for each tweet. The second problem is understanding and analyzing the slang used on Twitter.

People write in different ways, and when posting on Twitter they often care little about correct spelling, or they use a lot of slang: words that are not proper English but are common in informal conversation.

There is a lot of ongoing research in this area, and several people have developed slang dictionaries to capture the meaning of such words. We won’t focus on this in this article; instead, we will use the standard dictionaries and packages available in R for sentiment analysis.

The third and biggest problem in sentiment analysis is detecting sarcasm. Since sentiment analysis works on the meaning of individual words, it is difficult to tell whether a post is sarcastic.

Implementing sentiment analysis application in R

Now we will analyze the sentiment of tweets posted by a Twitter handle. We will develop the code in R step by step and see a practical implementation of sentiment analysis in R.

The code is divided into the following parts:

  1. Extracting tweets using Twitter application
  2. Cleaning the tweets for further analysis
  3. Getting sentiment score for each tweet
  4. Segregating positive and negative tweets

Extracting tweets using Twitter application

We first install the packages we need. To extract tweets from Twitter, we need the ‘twitteR’ package.

The ‘syuzhet’ package is used for sentiment analysis, while the ‘tm’ and ‘SnowballC’ packages are used for text mining and analysis.

# Install required packages (only needed once)
install.packages("SnowballC")
install.packages("tm")
install.packages("twitteR")
install.packages("syuzhet")

# Load required packages
library("SnowballC")
library("tm")
library("twitteR")
library("syuzhet")
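A note on the calls above: installed.packages() only lists the packages already on your machine, while install.packages() downloads and installs them. A common refinement is to install only what is missing. The sketch below uses base R only; the helper name missing_packages() is our own, hypothetical choice.

```r
# Hypothetical helper: which of the required packages are not installed yet?
# installed.packages() only *lists* installed packages; it installs nothing.
missing_packages <- function(required,
                             installed = rownames(installed.packages())) {
  setdiff(required, installed)
}

# Usage (install.packages() needs an internet connection):
# to_install <- missing_packages(c("SnowballC", "tm", "twitteR", "syuzhet"))
# if (length(to_install) > 0) install.packages(to_install)
```

With this guard, re-running the script does not reinstall packages that are already present.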

Next, we authenticate with the Twitter API using the keys and access tokens generated by the app we created.

# Authentication keys (replace these placeholders with your app's keys and tokens)
consumer_key <- 'ABCDEFGHI1234567890'
consumer_secret <- 'ABCDEFGHI1234567890'
access_token <- 'ABCDEFGHI1234567890'
access_secret <- 'ABCDEFGHI1234567890'

# Authenticate with the Twitter API
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)

# Fetch up to 200 of the most recent tweets from the handle
tweets <- userTimeline("realDonaldTrump", n=200)

# Number of tweets actually returned
n.tweet <- length(tweets)
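The script above keeps placeholder keys directly in the source. As a hedged alternative, assuming you export the four values as environment variables (the names TWITTER_CONSUMER_KEY, TWITTER_CONSUMER_SECRET, TWITTER_ACCESS_TOKEN and TWITTER_ACCESS_SECRET are our own choice, not something twitteR requires), a small hypothetical helper can read them with base R’s Sys.getenv() and stop early if any is missing.

```r
# Hypothetical helper: read the four Twitter credentials from environment
# variables instead of hard-coding them. The variable names are assumptions.
get_twitter_credentials <- function() {
  vars <- c(consumer_key    = "TWITTER_CONSUMER_KEY",
            consumer_secret = "TWITTER_CONSUMER_SECRET",
            access_token    = "TWITTER_ACCESS_TOKEN",
            access_secret   = "TWITTER_ACCESS_SECRET")
  values <- Sys.getenv(unname(vars))  # unset variables come back as ""
  names(values) <- names(vars)
  missing <- names(values)[values == ""]
  if (length(missing) > 0) {
    stop("Missing credentials: ", paste(missing, collapse = ", "))
  }
  as.list(values)
}

# Usage (requires the twitteR package and a Twitter app):
# creds <- get_twitter_credentials()
# setup_twitter_oauth(creds$consumer_key, creds$consumer_secret,
#                     creds$access_token, creds$access_secret)
```

This keeps secrets out of scripts you might share, and a missing key fails with a clear message before any API call is made.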

We have now authenticated with the Twitter API and extracted tweets from the handle ‘@realDonaldTrump’. Next, let’s look at the format of the extracted data and the steps needed to clean it.

Cleaning the tweets for further analysis

# Convert the list of status objects into a data frame
tweets.df <- twListToDF(tweets)

head(tweets.df)

The resulting data frame has a total of 16 variables; a snapshot of the sample data is shown below.

[Figure: snapshot of the first rows of tweets.df]

The ‘text’ field contains the tweet itself along with hashtags and URLs. We need to remove the hashtags and URLs so that only the main tweet text remains for sentiment analysis.

The text field currently looks like this:

> head(tweets.df$text)
 [1] "We believe that every American should stand for the National Anthem, and we proudly pledge allegiance to one NATION… https://t.co/4GQmdSmiRk"
 [2] "This is your land, this is your home, and it's your voice that matters the most. So speak up, be heard, and fight,… https://t.co/u09Brwnow3"
 [3] "Just arrived at the Pensacola Bay Center. Join me LIVE on @FoxNews in 10 minutes! #MAGA https://t.co/RQFqOkcpNV"
 [4] "On my way to Pensacola, Florida. See everyone soon! #MAGA https://t.co/ijwxVSYQ52"
 [5] "“The unemployment rate remains at a 17-year low of 4.1%. The unemployment rate in manufacturing dropped to 2.6%, th… https://t.co/ujuFLRG8lc"
 [6] "MAKE AMERICA GREAT AGAIN! https://t.co/64a93S07s7"

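The text contains many URLs, hashtags, and mentions of other Twitter handles. We remove all of these with the gsub function.

# Remove URLs: "http.*" deletes everything from the first "http" to the end of the tweet
tweets.df2 <- gsub("http.*","",tweets.df$text)

# Redundant after the previous line, since "http.*" already matches "https" links
tweets.df2 <- gsub("https.*","",tweets.df2)

# Remove hashtags and mentions: these patterns also drop any text after the first "#" or "@"
tweets.df2 <- gsub("#.*","",tweets.df2)

tweets.df2 <- gsub("@.*","",tweets.df2)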

The output now looks like this:

> head(tweets.df2)
 [1] "We believe that every American should stand for the National Anthem, and we proudly pledge allegiance to one NATION… "
 [2] "This is your land, this is your home, and it's your voice that matters the most. So speak up, be heard, and fight,… "
 [3] "Just arrived at the Pensacola Bay Center. Join me LIVE on "
 [4] "On my way to Pensacola, Florida. See everyone soon! "
 [5] "“The unemployment rate remains at a 17-year low of 4.1%. The unemployment rate in manufacturing dropped to 2.6%, th… "
 [6] "MAKE AMERICA GREAT AGAIN! "

Now only the relevant part of each tweet remains, and we can run sentiment analysis on the data.
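One caveat about the patterns above: because ".*" is greedy, "@.*" and "#.*" delete everything from the first mention or hashtag to the end of the tweet, which is why the third tweet lost "in 10 minutes!". A more targeted alternative, sketched below with base R regular expressions, removes only the URL, hashtag, or mention token itself. The helper name clean_tweet() is hypothetical, and the results shown in the rest of this article were produced with the original patterns.

```r
# Hypothetical helper: remove URLs, hashtags and @mentions token by token,
# keeping any text that follows them (base R regular expressions only).
clean_tweet <- function(text) {
  text <- gsub("https?://\\S+", "", text)  # URLs such as https://t.co/...
  text <- gsub("#\\w+", "", text)          # hashtags such as #MAGA
  text <- gsub("@\\w+", "", text)          # mentions such as @FoxNews
  text <- gsub("\\s+", " ", text)          # collapse leftover runs of whitespace
  trimws(text)
}

clean_tweet("Just arrived at the Pensacola Bay Center. Join me LIVE on @FoxNews in 10 minutes! #MAGA https://t.co/RQFqOkcpNV")
# returns "Just arrived at the Pensacola Bay Center. Join me LIVE on in 10 minutes!"

# Applied to the extracted data:
# tweets.clean <- clean_tweet(tweets.df$text)
```

If hashtags carry sentiment you want to keep, you could strip only the "#" character instead of the whole hashtag, for example with gsub("#", "", text).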

Getting sentiment score for each tweet

We will first get the emotion scores for each tweet. The ‘syuzhet’ function get_nrc_sentiment, which is based on the NRC emotion lexicon, reports 10 categories: eight emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, and trust) and two sentiments (negative and positive).

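# Character vector of cleaned tweets
word.df <- as.vector(tweets.df2)

# Count NRC emotion and sentiment words in each tweet
emotion.df <- get_nrc_sentiment(word.df)

# Attach the counts to the tweet text
emotion.df2 <- cbind(tweets.df2, emotion.df)

head(emotion.df2)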
> head(emotion.df2)
 tweets.df2 anger anticipation disgust
 1 We believe that every American should stand for the National Anthem, and we proudly pledge allegiance to one NATION… 0 0 0
 2 This is your land, this is your home, and it's your voice that matters the most. So speak up, be heard, and fight,… 1 0 0
 3 Just arrived at the Pensacola Bay Center. Join me LIVE on 0 0 0
 4 On my way to Pensacola, Florida. See everyone soon! 0 0 0
 5 “The unemployment rate remains at a 17-year low of 4.1%. The unemployment rate in manufacturing dropped to 2.6%, th… 0 0 1
 6 MAKE AMERICA GREAT AGAIN! 0 0 0
 fear joy sadness surprise trust negative positive
 1 0 1 0 0 3 0 2
 2 1 0 0 0 0 1 1
 3 0 0 0 0 1 0 2
 4 0 0 0 0 0 0 0
 5 1 0 0 0 1 1 1
 6 0 0 0 0 0 0 0

The output above shows, for each tweet, how many of its words are associated with each emotion or sentiment in the NRC lexicon.
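To illustrate the idea behind these counts, here is a simplified sketch of lexicon-based emotion counting in base R. The tiny toy_lexicon below is hypothetical and is NOT the NRC lexicon, and count_emotions() is our own helper; get_nrc_sentiment() uses its own tokenization and a much larger dictionary, so this only shows the mechanism.

```r
# Hypothetical toy lexicon mapping words to categories (NOT the real NRC lexicon)
toy_lexicon <- data.frame(
  word     = c("proudly", "proudly", "fight", "fight", "horrible", "horrible"),
  category = c("joy", "positive", "anger", "negative", "fear", "negative"),
  stringsAsFactors = FALSE
)

# Count, for each text, how many of its words fall into each category.
# A word listed under several categories counts once in each of them,
# and a word that appears twice in a text is counted twice.
count_emotions <- function(texts, lexicon) {
  categories <- sort(unique(lexicon$category))
  rows <- lapply(texts, function(txt) {
    tokens <- tolower(unlist(strsplit(txt, "[^A-Za-z']+")))
    tokens <- tokens[tokens != ""]
    matches <- merge(data.frame(word = tokens, stringsAsFactors = FALSE),
                     lexicon, by = "word")
    as.vector(table(factor(matches$category, levels = categories)))
  })
  counts <- do.call(rbind, rows)
  colnames(counts) <- categories
  counts
}

count_emotions(c("We proudly fight, fight!", "On my way"), toy_lexicon)
```

In the first text, "fight" appears twice, so it adds 2 to both anger and negative, while "proudly" adds 1 to joy and positive; the second text matches nothing and gets all zeros.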
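Next, we use the get_sentiment function to compute a single sentiment score for each tweet. With its default method ("syuzhet"), the score is the sum of the sentiment values of the words in the tweet.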

# One sentiment score per tweet
sent.value <- get_sentiment(word.df)

# Tweet(s) with the highest and the lowest score
most.positive <- word.df[sent.value == max(sent.value)]
most.negative <- word.df[sent.value == min(sent.value)]

most.positive
most.negative
> most.positive
 [1] "Stock Market hits new Record High. Confidence and enthusiasm abound. More great numbers coming out!"
> most.negative
 [1] "Horrible and cowardly terrorist attack on innocent and defenseless worshipers in Egypt. The world cannot tolerate t… "

Let us look at the score assigned to each tweet. In all, we are evaluating 154 tweets, so there are 154 sentiment scores, one per tweet.

> sent.value
 [1] 1.55 -0.50 0.50 0.00 -0.60 0.50 -0.75 0.50 1.00 1.55 0.00 -1.00 1.85 0.00 0.50 -0.50 1.55 0.50 0.25 0.75 0.50 0.50 2.75
 [24] 0.85 0.75 -0.25 -0.50 0.40 -1.75 -1.75 -1.60 0.50 -1.65 0.75 1.00 -1.35 0.50 0.25 -2.60 0.00 1.15 0.25 -1.25 -0.50 -2.75 -1.10
 [47] -2.25 1.85 0.60 0.00 2.10 0.50 -0.25 3.05 -0.25 -0.75 -0.75 0.05 -0.85 0.00 -0.75 0.00 2.80 1.50 0.75 0.00 -0.05 0.65 -0.75
 [70] -0.50 2.25 -1.75 0.00 0.75 0.75 1.55 0.15 0.65 0.15 0.80 0.00 -0.10 -2.00 -3.25 -3.45 -0.10 0.00 -1.50 0.50 0.50 0.00 2.25
 [93] 1.55 0.80 0.50 0.00 2.35 0.30 -0.25 0.60 0.00 0.65 0.80 0.55 0.40 1.15 -0.10 -1.35 0.00 1.35 -1.00 0.00 -1.10 -1.10 0.00
 [116] -1.15 1.95 1.50 1.55 0.00 0.50 -0.50 -0.75 0.50 0.75 0.70 0.25 0.75 1.25 -0.25 -1.95 -2.75 1.25 -0.75 -0.40 0.50 0.50 -0.50
 [139] 0.00 2.85 1.25 0.50 1.50 0.50 0.40 0.00 0.50 0.50 1.00 1.00 2.05 0.25 0.50 0.50
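To make the scoring idea concrete, here is a simplified sketch of a lexicon-based score: look up each word in a dictionary of word values and add them up. The small toy_valence dictionary and the helper score_text() are hypothetical; the real syuzhet lexicon is far larger and get_sentiment() does its own tokenization, so these numbers will not match the scores above.

```r
# Hypothetical toy valence dictionary (NOT the syuzhet lexicon)
toy_valence <- c(great = 0.75, confidence = 0.5, horrible = -1, cowardly = -0.75)

# Score each text as the sum of the valences of its words;
# words missing from the dictionary contribute 0.
score_text <- function(texts, valence) {
  vapply(texts, function(txt) {
    tokens <- tolower(unlist(strsplit(txt, "[^A-Za-z']+")))
    sum(valence[tokens], na.rm = TRUE)
  }, numeric(1), USE.NAMES = FALSE)
}

score_text(c("Great confidence!", "Horrible and cowardly", "On my way"), toy_valence)
# returns c(1.25, -1.75, 0)
```

The same sketch also shows why sarcasm is hard: score_text("Great, another delay. Just great.", toy_valence) returns 1.5, a positive score, because word-level scoring only sees two positive words.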

Segregating positive and negative tweets

Next, we separate the positive, negative, and neutral tweets based on the score assigned to each tweet.

> positive.tweets <- word.df[sent.value > 0]
> head(positive.tweets)
 [1] "We believe that every American should stand for the National Anthem, and we proudly pledge allegiance to one NATION… "
 [2] "Just arrived at the Pensacola Bay Center. Join me LIVE on "
 [3] "MAKE AMERICA GREAT AGAIN! "
 [4] "LAST thing the Make America Great Again Agenda needs is a Liberal Democrat in Senate where we have so little margin… "
 [5] "Big crowd expected today in Pensacola, Florida, for a Make America Great Again speech. We have done so much in so s… "
 [6] "I fulfilled my campaign promise - others didn’t! "
> negative.tweets <- word.df[sent.value < 0]
> head(negative.tweets)
 [1] "This is your land, this is your home, and it's your voice that matters the most. So speak up, be heard, and fight,… "
 [2] "“The unemployment rate remains at a 17-year low of 4.1%. The unemployment rate in manufacturing dropped to 2.6%, th… "
 [3] "Fines and penalties against Wells Fargo Bank for their bad acts against their customers and others will not be drop… "
 [4] "Across the battlefields, oceans, and harrowing skies of Europe and the Pacific throughout the war, one great battle… "
 [5] "National Pearl Harbor Remembrance Day - “A day that will live in infamy!” December 7, 1941"
 [6] "Putting Pelosi/Schumer Liberal Puppet Jones into office in Alabama would hurt our great Republican Agenda of low on… "
> neutral.tweets <- word.df[sent.value == 0]
> head(neutral.tweets)
 [1] "On my way to Pensacola, Florida. See everyone soon! "
 [2] "Tonight, "
 [3] "Today, the U.S. flag flies at half-staff at the "
 [4] "Biggest Tax Bill and Tax Cuts in history just passed in the Senate. Now these great Republicans will be going for f… "
 [5] "Our FIFTH 1K milestone of 2017!\n"
 [6] "The only people who don’t like the Tax Cut Bill are the people that don’t understand it or the Obstructionist Democ… "

# Alternate way to classify as Positive, Negative or Neutral tweets

category_senti <- ifelse(sent.value < 0, "Negative", ifelse(sent.value > 0, "Positive", "Neutral"))

head(category_senti)
> head(category_senti)
 [1] "Positive" "Negative" "Positive" "Neutral" "Negative" "Positive"

> category_senti2 <- cbind(tweets = word.df, category_senti, senti = sent.value)
> head(category_senti2)
 tweets category_senti senti
 [1,] "We believe that every American should stand for the National Anthem, and we proudly pledge allegiance to one NATION… " "Positive" "1.55"
 [2,] "This is your land, this is your home, and it's your voice that matters the most. So speak up, be heard, and fight,… " "Negative" "-0.5"
 [3,] "Just arrived at the Pensacola Bay Center. Join me LIVE on " "Positive" "0.5"
 [4,] "On my way to Pensacola, Florida. See everyone soon! " "Neutral" "0"
 [5,] "“The unemployment rate remains at a 17-year low of 4.1%. The unemployment rate in manufacturing dropped to 2.6%, th… " "Negative" "-0.6"
 [6,] "MAKE AMERICA GREAT AGAIN! " "Positive" "0.5"
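Because cbind() on character vectors returns a character matrix, the scores above are stored as text ("1.55", "-0.5"). If you want to keep them numeric, for example to sort or plot them, a data frame is a natural alternative. The helper name build_sentiment_table() is hypothetical; it applies the same Positive/Negative/Neutral rule used above.

```r
# Hypothetical helper: keep tweets, categories and numeric scores together
# in a data frame, sorted from the most negative to the most positive tweet.
build_sentiment_table <- function(texts, scores) {
  df <- data.frame(
    tweet    = texts,
    category = ifelse(scores < 0, "Negative",
                      ifelse(scores > 0, "Positive", "Neutral")),
    score    = scores,
    stringsAsFactors = FALSE
  )
  df[order(df$score), , drop = FALSE]
}

# Usage with the objects from this article:
# sentiment.df <- build_sentiment_table(word.df, sent.value)
# head(sentiment.df)
```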

We have now analyzed the tweets from Donald Trump’s Twitter handle and scored the sentiment of each one. The breakdown of the tweets by sentiment is:

> table(category_senti)
 category_senti
 Negative Neutral Positive
 49 20 85
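Raw counts are easier to compare across handles or time periods when expressed as shares. A minimal base R sketch, using a hypothetical helper sentiment_share(), turns a vector of category labels into percentages; fixing the factor levels ensures a category with no tweets still shows up as 0.

```r
# Hypothetical helper: percentage of tweets in each sentiment category,
# rounded to one decimal place.
sentiment_share <- function(categories) {
  counts <- table(factor(categories, levels = c("Negative", "Neutral", "Positive")))
  round(100 * prop.table(counts), 1)
}

# Usage with the object from this article:
# sentiment_share(category_senti)
```

With the counts above (49, 20 and 85 out of 154 tweets), this works out to roughly 31.8% negative, 13.0% neutral and 55.2% positive.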

Conclusion

You can now relate to the significance of sentiment analysis discussed at the beginning of this article.

Sentiment analysis can be extended much further, even to images. Although many tools are already available on the market, practical knowledge of how the entire process works is valuable.

Moreover, many of these tools are expensive and may not offer the flexibility and customization you can build yourself in R.


I hope you liked this post. If you have any questions, feel free to comment below. If you would like me to write about a particular topic, let me know in the comments.


Author Bio:

This article was contributed by Perceptive Analytics. Chaitanya Sagar, Jyothirmayee Thondamallu, and Saneesh Veetil contributed to this article.
Perceptive Analytics provides data analytics, data visualization, business intelligence and reporting services to e-commerce, retail, healthcare, and pharmaceutical industries. Our client roster includes Fortune 500 and NYSE listed companies in the USA and India.
