How to Visualize Twitter Trends in 4 Simple Steps
A Swift Approach to Research Trending Topics in 10 Minutes
1. Install Twint and Import Libraries
Twint is a powerful yet straightforward Python package for scraping tweets without using the Twitter API.
!pip3 install --user --upgrade git+https://github.com/twintproject/twint.git@origin/master#egg=twint
!pip install nest-asyncio
import twint
import nest_asyncio
nest_asyncio.apply()
2. Configure Your Twitter Requests
config.Search: the topic or keywords you are interested in
config.Limit: the maximum number of tweets to scrape
config.Since and config.Until: the date range in which the tweets were published
config.Output: the output location. Twint supports CSV, JSON, SQLite and Elasticsearch as output formats; here config.Store_csv = True selects CSV
config = twint.Config()
config.Search = "data science" #replace the topic as you like
config.Limit = 1000
config.Lang = "en"
config.Store_csv = True
config.Since = '2022-06-04'
config.Until = '2022-06-06'
config.Output = "twitter.csv"
twint.run.Search(config)
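The config.Since and config.Until values above are date strings in YYYY-MM-DD form. As a minimal sketch using only the standard library, a small helper can build such a pair for a window ending on a given day. The name date_window is hypothetical and not part of Twint.

```python
from datetime import date, timedelta
from typing import Tuple


def date_window(end_day: date, days: int) -> Tuple[str, str]:
    """Return (since, until) strings spanning `days` days up to `end_day`.

    Hypothetical helper for illustration; it is not part of Twint.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    since = end_day - timedelta(days=days)
    # isoformat() yields the YYYY-MM-DD form used in the config above
    return since.isoformat(), end_day.isoformat()


since, until = date_window(date(2022, 6, 6), days=2)
print(since, until)  # 2022-06-04 2022-06-06
```

The two strings can then be assigned to config.Since and config.Until in place of the hard-coded dates.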
3. Visualize Using WordCloud
Now that the tweets are stored in the file “twitter.csv”, we can load it as a Pandas dataframe and pass the content of df.tweet to the WordCloud library to generate a word cloud. WordCloud is an easy tool for visualizing words based on how frequently they occur in a text.
import matplotlib.pyplot as plt
import nltk
import pandas as pd
from nltk.corpus import stopwords
from wordcloud import WordCloud
# download the NLTK stop word list (only needed once)
nltk.download('stopwords')
# read csv as pandas dataframe
df = pd.read_csv("twitter.csv")
# optional - remove stopwords from the tweets
# (a separate name keeps the imported stopwords module intact when the cell is rerun)
stop_words = stopwords.words('english')
stop_words.extend(['data science', 'data', 'science', 'Data Science', 'DataScience'])
# optional - remove urls from the tweets
tweet = df.tweet.replace('https://t.co/|https://', '', regex = True)
# generate wordcloud
wordcloud = WordCloud(
    stopwords = stop_words,
    background_color = 'white',
    width = 1920,
    height = 1080,
    colormap = "GnBu"
).generate(' '.join(tweet))
# display the word cloud without axis ticks
plt.subplots(figsize = (16,16))
plt.imshow(wordcloud)
plt.axis('off')
plt.show()
We can run some data cleansing to make the results more relevant.
remove stop words using the NLTK stopwords corpus and append your own stop words to the list, e.g. variations of the topic keywords, which add no value here
remove hyperlinks/URLs from the texts, e.g. df.tweet.replace('https://', '', regex = True)
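The two cleansing steps can also be sketched with the standard library alone. This is an illustration under stated assumptions, not the article's exact method: the stop word set is a tiny hand-written stand-in for the NLTK list, the sample tweet is made up, and the pattern removes the whole URL rather than only the "https://" prefix. The names clean_tweet, STOP_WORDS and URL_PATTERN are hypothetical.

```python
import re

# Assumption: a tiny hand-written stop word set; in practice use NLTK's list
STOP_WORDS = {"the", "a", "is", "for", "and", "data", "science", "datascience"}

# Matches a full http or https URL up to the next whitespace
URL_PATTERN = re.compile(r"https?://\S+")


def clean_tweet(text: str) -> str:
    """Remove whole URLs and stop words (case-insensitive) from one tweet."""
    without_urls = URL_PATTERN.sub("", text)
    words = re.findall(r"[A-Za-z0-9_#@']+", without_urls)
    kept = [w for w in words if w.lower() not in STOP_WORDS]
    return " ".join(kept)


# Made-up sample tweet for illustration
sample = "Python is the top language for Data Science https://t.co/abc123"
print(clean_tweet(sample))  # Python top language
```

Removing the full URL keeps the shortened link IDs from showing up as words in the cloud.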
WordCloud has several parameters that you can play around with, such as background_color, width, height and colormap. Most importantly, pass the joined tweet text to generate() to build the word cloud.
The most frequently mentioned topics appear larger than those mentioned less often. As shown here, “Python” and “AI” stand out.
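To see the idea behind the sizing, here is a rough sketch that counts word occurrences with collections.Counter. WordCloud does its own tokenization and scaling, so this only approximates what it computes; the three tweets are made up for illustration and word_frequencies is a hypothetical helper.

```python
from collections import Counter
import re


def word_frequencies(tweets):
    """Count how often each lowercase word appears across all tweets."""
    counts = Counter()
    for text in tweets:
        counts.update(re.findall(r"[a-z0-9']+", text.lower()))
    return counts


# Made-up tweets for illustration
tweets = [
    "Python for data science",
    "AI and Python",
    "Learning Python with AI tools",
]
freq = word_frequencies(tweets)
print(freq.most_common(2))  # [('python', 3), ('ai', 2)]
```

Words with higher counts are the ones a word cloud draws larger.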
4. Spot the Trend
Time to implement it by yourself. You can play around with different settings of the topic, timeframe, or sample size and see how the trends evolve.
explore different topics, e.g. machine learning vs. deep learning
compare data science trends across different timeframes, e.g. June 5th 2020, June 5th 2018 and June 5th 2012
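One possible way to compare two timeframes, beyond eyeballing two word clouds, is to check which words gained the largest share between samples. The sketch below assumes the tweets have already been cleaned and split into words; the word lists are made up, and relative_frequencies and biggest_risers are hypothetical helpers.

```python
from collections import Counter


def relative_frequencies(words):
    """Map each word to its share of all words in the sample."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def biggest_risers(old_words, new_words, top_n=3):
    """Return the words whose share grew most from the old to the new sample."""
    old = relative_frequencies(old_words)
    new = relative_frequencies(new_words)
    change = {w: new[w] - old.get(w, 0.0) for w in new}
    return sorted(change, key=change.get, reverse=True)[:top_n]


# Made-up word lists standing in for cleaned tweets from two dates
words_2018 = ["python", "bigdata", "bigdata", "hadoop"]
words_2020 = ["python", "python", "ai", "ai", "ai", "bigdata"]
print(biggest_risers(words_2018, words_2020, top_n=2))  # ['ai', 'python']
```

Using shares instead of raw counts keeps the comparison fair when the two samples differ in size.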