
How to Visualize Twitter Trends in 4 Simple Steps

A Swift Approach to Research Trending Topics in 10 Minutes


1. Install Twint and Import Libraries

Twint is a powerful yet straightforward Python package for scraping Twitter posts without using the Twitter API.


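# install Twint from GitHub and nest-asyncio (shell commands run from a notebook cell)
!pip3 install --user --upgrade git+https://github.com/twintproject/twint.git@origin/master#egg=twint
!pip3 install nest-asyncio
import twint
import nest_asyncio
# let Twint's asyncio code run inside the notebook's already-running event loop
nest_asyncio.apply()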

2. Configure your Twitter Requests

  • config.Search: specifies any topics that you are interested in

  • config.Limit: how many tweets to scrape

  • config.Since and config.Until: set the start and end of the date range in which the tweets were published

  • config.Output: specifies the output location. Twint supports csv, JSON, SQLite and ElasticSearch output; the example below sets config.Store_csv = True to write a csv file

config = twint.Config()
config.Search = "data science" #replace the topic as you like
config.Limit = 1000
config.Lang = "en"
config.Store_csv = True
config.Since = '2022-06-04'
config.Until = '2022-06-06'
config.Output = "twitter.csv"
twint.run.Search(config)

3. Visualize Using WordCloud

Now that the tweets are stored in the file “twitter.csv”, we can load it as a Pandas dataframe and pass the text of the df.tweet column to the WordCloud library. WordCloud is an easy tool for visualizing words by how often they occur in a text.

from wordcloud import WordCloud
import matplotlib.pyplot as plt
from nltk.corpus import stopwords
import pandas as pd

# read csv as pandas dataframe
df = pd.read_csv("twitter.csv")

# optional - remove stopwords from the tweets
# (the NLTK list must be downloaded once with nltk.download('stopwords'))
stop_words = stopwords.words('english')
stop_words.extend(['data science', 'data', 'science', 'Data Science', 'DataScience'])
# optional - remove urls from the tweets (the whole link, not only the prefix)
tweet = df.tweet.replace(r'https?://\S+', '', regex = True)

# generate wordcloud
wordcloud = WordCloud(
                    stopwords = stop_words,
                    background_color = 'white',
                    width = 1920,
                    height = 1080,
                    colormap = "GnBu"
            ).generate(' '.join(tweet))
plt.subplots(figsize = (16,16))
plt.imshow(wordcloud)
plt.axis('off')  # hide the axis ticks around the image
plt.show()

We can run some data cleansing to make the results more relevant.

  • remove stopwords using the NLTK stopwords list and append your own stop words to it, e.g. variations of the topic keywords, which bring no additional value here

  • remove hyperlinks/urls from the texts, e.g. df.tweet.replace(r'https?://\S+', '', regex = True)
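The two cleaning steps can also be sketched without pandas, which makes them easy to check on a sample string. This is a minimal illustration rather than the pipeline used in this article: the function name clean_tweet, the URL pattern and the sample tweet are all assumptions made for the example.

```python
import re

# Assumed pattern: any run of non-space characters starting with http:// or https://
URL_PATTERN = re.compile(r"https?://\S+")

def clean_tweet(text, stop_words):
    """Strip URLs, then drop stop words (case-insensitive) from one tweet."""
    no_urls = URL_PATTERN.sub("", text)
    stop = {w.lower() for w in stop_words}
    # split() also collapses the extra whitespace left behind by removed URLs
    kept = [word for word in no_urls.split() if word.lower() not in stop]
    return " ".join(kept)

# made-up sample tweet
sample = "Data science with Python https://t.co/abc123 is fun"
print(clean_tweet(sample, ["data", "science", "is", "with"]))
```

Note that this simple version only matches whole whitespace-separated words, so a stop word followed by punctuation (e.g. "data,") would be kept.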

WordCloud has several parameters that you can play around with, such as background_color, width, height and colormap. Most importantly, pass the joined tweet text to generate() to build the word cloud.

data science topics

Frequently mentioned words appear larger than rarely mentioned ones. As shown here, “Python” and “AI” stand out.
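Since word size follows word frequency, a plain frequency count is a quick way to sanity-check what the image shows. Here is a minimal sketch using only the standard library. The function name top_words and the sample tweets are made up for illustration; real input would be the tweet column of the csv. WordCloud applies its own tokenization, so its counts may differ somewhat from this simple version.

```python
from collections import Counter
import re

def top_words(tweets, stop_words=(), n=3):
    """Return the n most frequent words across tweets, ignoring stop words."""
    stop = {w.lower() for w in stop_words}
    counts = Counter()
    for tweet in tweets:
        # lowercase the tweet and keep alphabetic tokens only
        words = re.findall(r"[a-z]+", tweet.lower())
        counts.update(w for w in words if w not in stop)
    return counts.most_common(n)

# made-up sample tweets
sample_tweets = [
    "Python for data science",
    "AI and Python",
    "Learning Python and AI",
]
print(top_words(sample_tweets, stop_words=["data", "science", "and", "for"], n=2))
# → [('python', 3), ('ai', 2)]
```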

4. Spot the Trend

Time to try it yourself. You can play around with different settings for the topic, timeframe, or sample size and see how the trends evolve.

  • explore different topics: machine learning vs. deep learning

machine learning topics on Twitter
deep learning topics on Twitter

  • compare data science trends across different timeframes: June 5th 2022, June 5th 2018 and June 5th 2012

data science topics on 2022 June 5th
data science topics on 2018 June 5th
data science topics on 2012 June 5th
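To run such comparisons without editing the configuration by hand each time, the settings can be generated in a loop. The sketch below only builds the settings and does not call Twint. The helper names day_window and build_runs and the output file naming scheme are hypothetical. The date window (the day before to the day after the target day) mirrors the Since and Until values used in step 2.

```python
from datetime import date, timedelta
from itertools import product

def day_window(day):
    """Return (since, until) strings spanning the day before to the day after `day`."""
    since = day - timedelta(days=1)
    until = day + timedelta(days=1)
    return since.isoformat(), until.isoformat()

def build_runs(topics, days):
    """Build one settings dict per (topic, day) combination."""
    runs = []
    for topic, day in product(topics, days):
        since, until = day_window(day)
        runs.append({
            "Search": topic,
            "Since": since,
            "Until": until,
            # hypothetical naming scheme: one csv file per run
            "Output": f"{topic.replace(' ', '_')}_{day.isoformat()}.csv",
        })
    return runs

runs = build_runs(
    ["data science"],
    [date(2022, 6, 5), date(2018, 6, 5), date(2012, 6, 5)],
)
for run in runs:
    print(run)
```

Each dictionary holds values for the config attributes of the same name (Search, Since, Until, Output), so it could be applied to a fresh twint.Config() before calling twint.run.Search, as in step 2.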
Hope you found this article helpful.





