Jun 6, 2022

How to Visualize Twitter Trends in 4 Simple Steps

A Swift Approach to Research Trending Topics in 10 Minutes

1. Install Twint and Import Libraries

Twint is a powerful yet straightforward Python package that lets you scrape Twitter posts without using the Twitter API.


 
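# install Twint from GitHub, plus nest-asyncio for running it inside a notebook
!pip3 install --user --upgrade git+https://github.com/twintproject/twint.git@origin/master#egg=twint
!pip install nest-asyncio

import twint
import nest_asyncio

# let Twint's asyncio code run inside the notebook's already-running event loop
nest_asyncio.apply()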
 


 
2. Configure your Twitter Requests

  • config.Search: the topic or search term you are interested in

  • config.Limit: the maximum number of tweets to scrape

  • config.Lang: the language of the tweets, here English ("en")

  • config.Since and config.Until: the date range in which the tweets were published

  • config.Store_csv: save the results in CSV format

  • config.Output: the output file. Besides CSV, Twint can store results as JSON, in SQLite or in Elasticsearch

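config = twint.Config()
config.Search = "data science"   # replace the topic as you like
config.Limit = 1000              # maximum number of tweets to scrape
config.Lang = "en"               # English tweets only
config.Store_csv = True          # store the results as CSV
config.Since = '2022-06-04'      # start of the date range
config.Until = '2022-06-06'      # end of the date range
config.Output = "twitter.csv"    # output file

# run the search and write the results to twitter.csv
twint.run.Search(config)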
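Step 4 compares the same topic on June 5th of different years. To prepare those runs you could generate matching Since/Until strings for each date. The helper below is a hypothetical sketch (day_window is not part of Twint): it only builds 'YYYY-MM-DD' strings centred on a given day, and the commented lines show how you might assign them to the config yourself.

```python
from datetime import date, timedelta


def day_window(day, margin_days=1):
    """Return (since, until) as 'YYYY-MM-DD' strings, margin_days either side of day."""
    since = day - timedelta(days=margin_days)
    until = day + timedelta(days=margin_days)
    return since.isoformat(), until.isoformat()


# one window per year, each centred on June 5th
windows = {year: day_window(date(year, 6, 5)) for year in (2022, 2018, 2012)}

for year, (since, until) in windows.items():
    print(year, since, until)
    # hypothetical reuse with the Twint config shown above:
    # config.Since, config.Until = since, until
    # config.Output = f"twitter_{year}.csv"
    # twint.run.Search(config)
```

With the default margin_days=1, day_window(date(2022, 6, 5)) returns ('2022-06-04', '2022-06-06'), the same window as the configuration above.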


 
3. Visualize Using WordCloud

Now that the tweets are stored in “twitter.csv”, we can load the file into a pandas DataFrame and pass the text in the df.tweet column to the WordCloud library. WordCloud is an easy way to visualize words, each sized by how often it occurs in the text.
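Under the hood, a word cloud starts from a plain word count. As a rough sketch in plain Python (not WordCloud's own tokenizer, which applies extra rules of its own), counting word occurrences across tweets could look like this; the three sample tweets are made up.

```python
import re
from collections import Counter


def word_frequencies(texts):
    """Count how often each lowercase word appears across a list of texts."""
    counts = Counter()
    for text in texts:
        # \w+ splits on anything that is not a letter, digit or underscore
        counts.update(re.findall(r"\w+", text.lower()))
    return counts


tweets = [
    "Learning Python for AI",
    "Python tips for data science",
    "AI and Python",
]
freqs = word_frequencies(tweets)
print(freqs.most_common(3))
```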

from wordcloud import WordCloud
import matplotlib.pyplot as plt
import nltk
from nltk.corpus import stopwords
import pandas as pd

# read csv as pandas dataframe
df = pd.read_csv("twitter.csv")

# optional - remove stopwords from the tweets
nltk.download('stopwords')  # one-time download of the NLTK stopword lists
stop_words = stopwords.words('english')  # new name, so the imported stopwords module is not overwritten
stop_words.extend(['data science', 'data', 'science', 'Data Science', 'DataScience'])

# optional - remove urls from the tweets (the whole link, not just the "https://" prefix)
tweet = df.tweet.dropna().str.replace(r'https?://\S+', '', regex=True)

# generate wordcloud from all tweets joined into one string
wordcloud = WordCloud(
    stopwords=stop_words,
    background_color='white',
    width=1920,
    height=1080,
    colormap="GnBu"
).generate(' '.join(tweet))

# display the word cloud without axes
plt.subplots(figsize=(16, 16))
plt.imshow(wordcloud)
plt.axis('off')
plt.show()


 
A little data cleaning makes the results more relevant:

  • remove stopwords using NLTK's stopwords list, and append your own stop words to it, e.g. variations of the topic keywords, which add no information here

  • remove hyperlinks from the text, e.g. df.tweet.str.replace(r'https?://\S+', '', regex=True)
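Both cleaning steps can be sketched in plain Python. This is an illustrative sketch under two assumptions: the stop-word set is a small hand-picked sample standing in for NLTK's English list plus the topic keywords, and the pattern https?://\S+ is used to drop each whole link. Stripping only the "https://t.co/" prefix would leave the random link slug behind as a "word", as the last line shows.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")
# small sample stop-word set standing in for NLTK's English list,
# plus variations of the search topic
STOP_WORDS = {"the", "a", "on", "for", "and", "in", "of", "to",
              "data", "science", "datascience"}


def clean_tweet(text):
    """Drop whole URLs, lowercase, and remove stop words; return the remaining words."""
    text = URL_PATTERN.sub("", text)
    words = re.findall(r"\w+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


raw = "Free course on Python for Data Science https://t.co/Ab12Cd34 #DataScience"
print(clean_tweet(raw))

# stripping only the prefix leaves the t.co slug in the text
print(raw.replace("https://t.co/", ""))
```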

WordCloud has several parameters you can play around with, such as background_color, width, height and colormap. Most importantly, pass the tweet text to generate() to build the word cloud.

The most frequently mentioned words appear larger than those mentioned less often. In this data science sample, “Python” and “AI” stand out.
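To make "larger means more frequent" concrete, here is a simplified sketch that scales font sizes linearly with each word's count relative to the most frequent word. The linear rule, the 10 to 100 size range and the counts are assumptions for illustration; WordCloud's real sizing (tuned by parameters such as relative_scaling, min_font_size and max_font_size) and its layout algorithm are more involved.

```python
from collections import Counter


def font_sizes(counts, min_size=10, max_size=100):
    """Map each word's count to a font size, linear in count / top count."""
    top = max(counts.values())
    return {
        word: round(min_size + (max_size - min_size) * n / top)
        for word, n in counts.items()
    }


# made-up counts, not real results
counts = Counter({"python": 40, "ai": 30, "pandas": 4})
sizes = font_sizes(counts)
print(sizes)
```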
 

4. Spot the Trend

Now it's time to try it yourself. Play around with different topics, timeframes, or sample sizes and see how the trends evolve.

  • explore different topics, e.g. machine learning vs. deep learning

machine learning topics on Twitter
deep learning topics on Twitter

  • compare data science trends across different timeframes: June 5th 2022, June 5th 2018 and June 5th 2012

data science topics on June 5th, 2022
data science topics on June 5th, 2018
data science topics on June 5th, 2012
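Rather than comparing the clouds by eye, one simple option is to compare each word's share of all counted words in two periods. This is a sketch under assumptions: share_change is a hypothetical helper, and the two Counter objects hold made-up counts standing in for cleaned word counts from two scraped CSV files.

```python
from collections import Counter


def share_change(before, after, top_n=3):
    """Words whose share of all counted words grew most from `before` to `after`."""
    total_before = sum(before.values())
    total_after = sum(after.values())
    words = set(before) | set(after)
    # a Counter returns 0 for missing words, so new or vanished words are handled
    change = {
        w: after[w] / total_after - before[w] / total_before
        for w in words
    }
    # largest increase in share first
    return sorted(change.items(), key=lambda item: item[1], reverse=True)[:top_n]


# made-up word counts for two periods, not real results
period_2018 = Counter({"python": 30, "bigdata": 50, "ai": 20})
period_2022 = Counter({"python": 45, "bigdata": 5, "ai": 50})

for word, delta in share_change(period_2018, period_2022):
    print(f"{word}: {delta:+.2f}")
```

Comparing shares rather than raw counts keeps the comparison fair when the two samples contain different numbers of tweets.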

Hope you found this article helpful. If you’d like to support my work and see more articles like this, treat me to a coffee ☕️ by signing up for a Premium Membership with a $10 one-off purchase.


 
