Getting Twitter data is easy in R, and tweets can be a great source of text data for NLP applications, whether you want to better understand your customers or gain insight into the topics people are tweeting about. In this brief post, I’ll walk through how to start searching for tweets within minutes, once you’ve set up a free Twitter developer account, which gives you access to the Twitter API.
Setup Twitter Developer Account
- If you’re not already a Twitter user, you’ll need to sign up for a new account: www.twitter.com/signup
- Then you’ll need to apply for a developer account: https://developer.twitter.com/en/apply/user. Just follow the steps, answer the questions about what you plan to do with Twitter data, and submit the application. In my application I explained that I wanted to use the account for learning purposes. After you submit, you’ll get an email asking you to confirm your email address. Finally, you should receive an email saying you’ve been approved (this took a couple of days for me), along with a link to get started.
- From the get started page, select the first option, ‘Create an app’.
- Fill in the three required fields: the name, a description, and a website URL (you can use a personal website or even your Twitter page, www.twitter.com/{your twitter username}). For our R code using the rtweet package, you will also need to set the optional Callback URL to http://127.0.0.1:1410; then submit to create the app.
- Now you can generate the API keys and access tokens you need to authenticate and communicate with the Twitter API. Select the ‘Keys and tokens’ option on the app page and click to generate an access token and access token secret. Keep this page open: you’ll need these four ridiculously long codes, plus your app name, in your R script.
Rtweet and User Authentication
Now that you’ve got everything you need, it’s super easy to start searching for tweets in R. Several R packages make it easier to interact with the Twitter API; I found the rtweet package the most current and the easiest to use. If you find another that you think is better, please let me know in the comments.
```r
# Packages
library(rtweet)

# Set the Callback URL to http://127.0.0.1:1410 on your Twitter app

# Enter your Twitter app info here
consumer_key    <- " "
consumer_secret <- " "
access_token    <- " "
access_secret   <- " "
appname         <- " "

# Create the OAuth token that the rtweet functions will use
twitter_token <- create_token(
  app             = appname,
  consumer_key    = consumer_key,
  consumer_secret = consumer_secret,
  access_token    = access_token,
  access_secret   = access_secret)
```
Replace the blanks above with your consumer key and secret, your access token and secret, and the name of the app you just created. Run this code and voilà, you are ready to go!
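Hard-coding the keys in your script makes them easy to leak when you share the code (for example on GitHub). One common alternative, sketched below, is to keep them in your `~/.Renviron` file and read them with `Sys.getenv()`. The environment variable names such as `TWITTER_CONSUMER_KEY` and the `get_twitter_credentials()` helper are hypothetical choices for this example, not something rtweet requires.

```r
# Hypothetical helper: read the app name and the four Twitter credentials
# from environment variables instead of hard-coding them in the script.
# The variable names are illustrative; add lines such as
#   TWITTER_CONSUMER_KEY=your_key_here
# to ~/.Renviron and restart R so that Sys.getenv() can see them.
get_twitter_credentials <- function() {
  vars <- c(app             = "TWITTER_APP_NAME",
            consumer_key    = "TWITTER_CONSUMER_KEY",
            consumer_secret = "TWITTER_CONSUMER_SECRET",
            access_token    = "TWITTER_ACCESS_TOKEN",
            access_secret   = "TWITTER_ACCESS_SECRET")
  values <- Sys.getenv(unname(vars))
  names(values) <- names(vars)   # match create_token()'s argument names
  missing <- names(values)[values == ""]
  if (length(missing) > 0) {
    stop("Missing Twitter credentials: ", paste(missing, collapse = ", "))
  }
  as.list(values)
}

# Authenticate only when rtweet is installed and every variable is set
creds <- tryCatch(get_twitter_credentials(), error = function(e) NULL)
if (!is.null(creds) && requireNamespace("rtweet", quietly = TRUE)) {
  twitter_token <- do.call(rtweet::create_token, creds)
}
```

Because the names of the returned list match the arguments of `create_token()`, `do.call()` can pass them straight through, and a missing variable fails early with a message naming it.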
There are a number of things you can do with the API. In addition to searching for tweets, you can do pretty much anything you can do directly on Twitter, including posting new tweets, following other users, or looking at current trends. For this tutorial, though, let’s focus on retrieving tweets for future text analysis. For the other tasks, a good place to start is the rtweet manual or vignette and the Twitter developer documentation.
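As a small taste of those other endpoints, here is a sketch that lists what is currently trending in one city with rtweet’s `get_trends()`. It assumes `twitter_token` was created as above and that "Vancouver" matches one of the locations listed by `trends_available()`; the location and the guard around the network call are choices made for this example.

```r
# Look up what is currently trending in one location.
# Assumes twitter_token was created earlier with create_token() and that
# the location name appears in rtweet::trends_available().
trend_place <- "Vancouver"

if (requireNamespace("rtweet", quietly = TRUE) && exists("twitter_token")) {
  trends <- rtweet::get_trends(trend_place)
  # Each row is one trend; the `trend` column holds the topic or hashtag
  print(head(trends$trend, 10))
}
```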
Below are three quick examples of how to query Twitter: two using the search_tweets function and one using stream_tweets. Because the results contain nested lists (list columns such as hashtags), we use rtweet’s flatten function before saving with R’s standard write.csv function. Note that the free Standard API only searches tweets from roughly the past seven days; you’ll have to pony up some cash if you want access to more historical data and additional functionality.
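To see why the flattening step matters: fields such as hashtags can hold several values per tweet, so they come back as list columns, which `write.csv()` cannot write. The base-R sketch below uses a hypothetical `flatten_list_columns()` helper, similar in spirit to rtweet’s `flatten()` but not its actual implementation, to collapse each list entry into one space-separated string. The two-row data frame is a hand-made toy, not real Twitter data.

```r
# Hypothetical base-R helper illustrating what flattening does:
# each list column becomes a character column with one
# space-separated string per row (empty or NA entries become NA).
flatten_list_columns <- function(df) {
  is_list_col <- vapply(df, is.list, logical(1))
  df[is_list_col] <- lapply(df[is_list_col], function(col) {
    vapply(col, function(x) {
      x <- x[!is.na(x)]
      if (length(x) == 0) NA_character_ else paste(x, collapse = " ")
    }, character(1))
  })
  df
}

# Hand-made toy example: the second row has no hashtags (NA)
tweets <- data.frame(status_id = c("1", "2"), stringsAsFactors = FALSE)
tweets$hashtags <- list(c("canucks", "nhl"), NA)

flat <- flatten_list_columns(tweets)
out_file <- tempfile(fileext = ".csv")
write.csv(flat, out_file, row.names = FALSE)
```

In practice you would simply call `flatten()` as in the examples below; rtweet also provides `write_as_csv()` for saving a flattened CSV directly.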
Search Tweets by User
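```r
# Up to 10,000 recent tweets from one account, excluding retweets
tweets <- search_tweets("from:realDonaldTrump", n = 10000, include_rts = FALSE)

# Collapse list columns, then save with today's date in the file name
tweets <- flatten(tweets)
write.csv(tweets, paste0(Sys.Date(), "_Trump_tweets.csv"), row.names = FALSE)
```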
By using the ‘from:’ operator in the search term, we can search the tweets of a particular user, in this case a prolific presidential tweeter. Did he really tweet 71 times in the past week? Apparently so. We ask for up to 10,000 tweets with n = 10000 and exclude retweets with include_rts = FALSE. The search returns a large number of fields; a sample of the most relevant ones (date and time, user, tweet text, and source) is shown below.
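Once the results are in a data frame, a few lines of base R are enough to keep just those fields and to produce counts like the weekly total above. The sketch assumes the rtweet 0.x column names `created_at`, `screen_name`, `text` and `source` (other rtweet versions may differ); the helper names are hypothetical.

```r
# Columns highlighted in the text (rtweet 0.x names; an assumption)
relevant_cols <- c("created_at", "screen_name", "text", "source")

# Keep only the relevant columns of a search_tweets() result
select_relevant <- function(tweets) {
  tweets[, relevant_cols, drop = FALSE]
}

# Count tweets per calendar day (UTC) to see how active an account was
tweets_per_day <- function(tweets) {
  table(as.Date(tweets$created_at, tz = "UTC"))
}

# Count tweets per client application (the `source` field), most used first
tweets_per_source <- function(tweets) {
  sort(table(tweets$source), decreasing = TRUE)
}
```

For example, `sum(tweets_per_day(tweets))` on the `search_tweets()` result gives the total number of tweets returned for the period covered.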
Search Tweets by Topic
```r
# Up to 10,000 recent tweets using the #canucks hashtag, excluding retweets
tweets <- search_tweets("#canucks", n = 10000, include_rts = FALSE)
tweets <- flatten(tweets)
write.csv(tweets, paste0(Sys.Date(), "_Canucks_tweets.csv"), row.names = FALSE)
```
This is similar to the query above, but we replace the search term with ‘#canucks’, which returns mostly tweets about the Vancouver Canucks hockey team.
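A hashtag search misses tweets that mention the team without the hashtag. Twitter’s search syntax supports an `OR` operator and quoted phrases, so one option is to combine terms into a single query. The `or_query()` helper below is a hypothetical convenience function, and `lang = "en"` is passed by `search_tweets()` to the API as an extra query parameter to keep only English-language tweets.

```r
# Hypothetical helper: join search terms with Twitter's OR operator
or_query <- function(terms) {
  paste(terms, collapse = " OR ")
}

# Hashtag or exact phrase (the phrase keeps its double quotes)
canucks_query <- or_query(c("#canucks", "\"Vancouver Canucks\""))
print(canucks_query)

# Network call: only runs if rtweet is installed and a token exists
if (requireNamespace("rtweet", quietly = TRUE) && exists("twitter_token")) {
  tweets <- rtweet::search_tweets(canucks_query, n = 10000,
                                  include_rts = FALSE, lang = "en")
}
```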
Stream Tweets by Topic
```r
# Stream live tweets mentioning "canada" for 20 seconds
tweets <- stream_tweets("canada", timeout = 20)
tweets <- flatten(tweets)
write.csv(tweets, paste0(Sys.Date(), "_Canada_tweets.csv"), row.names = FALSE)
```
This is a neat query because it lets you stream tweets in real time. For example, the query above collects tweets mentioning Canada as they are posted, over a 20-second window (timeout is given in seconds).
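For longer streams it can be safer to write the raw JSON to disk and parse it afterwards, so a parsing problem does not cost you the data you collected. rtweet’s `stream_tweets()` accepts `file_name` and `parse = FALSE` for this, and `parse_stream()` reads the file back in. The 60-second timeout and the file name below are example values, and the sketch assumes `twitter_token` already exists.

```r
# Stream raw tweets to a JSON file, then parse them in a separate step.
# Assumes twitter_token was created earlier; 60 seconds is an example value.
stream_file <- paste0(Sys.Date(), "_Canada_stream.json")

if (requireNamespace("rtweet", quietly = TRUE) && exists("twitter_token")) {
  rtweet::stream_tweets("canada", timeout = 60,
                        file_name = stream_file, parse = FALSE)
  tweets <- rtweet::parse_stream(stream_file)
  tweets <- rtweet::flatten(tweets)
  write.csv(tweets, paste0(Sys.Date(), "_Canada_tweets.csv"), row.names = FALSE)
}
```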
Summary
So that’s it! In this quick post, I showed how to get set up with the Twitter API and how to query tweets in R with the rtweet package. You can find all the code and sample tweet data on GitHub. In my next post I’ll share how to use this text data for unsupervised topic modelling with word and document vectors. Happy Tweet querying!