
Finding the favorite team in 2018 FIFA World Cup through scraping Tweets


The jobs of a footballer and a data scientist have a lot in common: both demand a combination of skill and will.

On 14th June, FIFA World Cup 2018, a festival revered by football followers everywhere, kicked off in Russia. This month-long, prestigious sporting bonanza will be celebrated across the globe until its grand finale on 15th July.

Seizing this opportunity, I tested my newly acquired skills to gauge which of the two competing teams in a given fixture is the crowd favorite.

Things you’ll learn here :
  1. How to scrape data from Twitter
  2. Data mining

Prerequisites :
  1. The Tweepy library should be installed on your machine.
  2. Access to Twitter. Create an account if you don’t already have one.

Scraping textual data is an integral part of natural language processing. Twitter is a very useful platform when the task at hand is sentiment analysis.

Thought Process :

Our approach is as follows: first, we will generate the credentials needed to use the Twitter API; then we will write Python code to extract live tweets; and finally we will analyze the tweets by selecting pertinent keywords.

Let the fun begin…

Step-wise Guide for Scraping Tweets

A. Generating Credentials for Twitter API :

I. Visit https://apps.twitter.com/ and log in with your Twitter credentials.

II. Click on “Create New App”.

III. Enter the necessary details and click “Create your Twitter application”.

IV. On the next page, click the “API keys” tab and copy your “API key” and “API secret”. (Suggestion: keep these credentials in a sticky note on your desktop.)

V. Scroll down, click “Create my access token”, and copy your “Access token” and “Access token secret”.

B. Connecting to Twitter API through Python code :

The code for this step connects to Twitter’s live tweet stream and downloads the tweets to your machine.

  • Note: Python 2.7 was used while coding.
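A minimal sketch of such a streaming script is given here for orientation. It assumes tweepy 3.x, where `StreamListener`, `OAuthHandler` and `Stream` are available (tweepy 4 dropped `StreamListener`). The names `CREDENTIALS`, `TRACK_TERMS`, `unset_fields` and `run_stream` are illustrative, and the search terms are hypothetical, since the keywords actually tracked are not listed in this article.

```python
import sys

# Placeholders: replace with the four values generated in step A.
CREDENTIALS = {
    "api_key": "YOUR_API_KEY",
    "api_secret": "YOUR_API_SECRET",
    "access_token": "YOUR_ACCESS_TOKEN",
    "access_token_secret": "YOUR_ACCESS_TOKEN_SECRET",
}

# Hypothetical search terms for the opening fixture (an assumption).
TRACK_TERMS = ["WorldCup", "Russia", "Saudi Arabia"]


def unset_fields(credentials):
    """Return the names of credentials that still hold a placeholder value."""
    return sorted(
        name for name, value in credentials.items()
        if not value or value.startswith("YOUR_")
    )


def run_stream(credentials, track_terms, out=sys.stdout):
    """Write every incoming tweet to `out` as one raw JSON line."""
    missing = unset_fields(credentials)
    if missing:
        raise ValueError("fill in credentials first: " + ", ".join(missing))

    # Imported lazily so the check above works without tweepy installed.
    # Assumes tweepy 3.x.
    from tweepy import OAuthHandler, Stream
    from tweepy.streaming import StreamListener

    class LineListener(StreamListener):
        def on_data(self, data):
            out.write(data.strip() + "\n")
            return True  # keep listening

        def on_error(self, status_code):
            sys.stderr.write("stream error: %s\n" % status_code)
            # Returning False disconnects; do so when rate limited (420).
            return status_code != 420

    auth = OAuthHandler(credentials["api_key"], credentials["api_secret"])
    auth.set_access_token(credentials["access_token"],
                          credentials["access_token_secret"])
    Stream(auth, LineListener()).filter(track=track_terms)
```

With real credentials filled in, adding `run_stream(CREDENTIALS, TRACK_TERMS)` as the last line of `twitter_streaming.py` would start the download; Ctrl+C stops it.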

Once you run the code from the command prompt (for Windows users), you’ll see tweets streaming live across the console.

[Figure: Live Streaming Tweets]

C. Analyzing Data :

Type the line below on the command line to save the downloaded tweets to a file.

python twitter_streaming.py > twitter_data.txt

To understand which way sentiment leaned for the tournament opener, I ran the program at regular intervals over 2 days (13th and 14th June) to collect a sizable and meaningful data sample. In that period I scraped 79,359 tweets, amounting to 450 MB of text.
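The keyword-based analysis can be sketched roughly as follows, assuming `twitter_data.txt` holds one JSON object per line with the tweet body under a `text` field, as the streaming API delivers it. The keyword lists in `TEAM_KEYWORDS` are hypothetical picks for the opener (Russia vs Saudi Arabia), and the function names are illustrative rather than the ones used for the original analysis.

```python
import io
import json

# Hypothetical keyword lists for the opening fixture (an assumption).
TEAM_KEYWORDS = {
    "Russia": ["russia", "#rus"],
    "Saudi Arabia": ["saudi", "#ksa"],
}


def iter_tweet_texts(lines):
    """Yield the text of each tweet found in an iterable of JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # redirected output often contains blank lines
        try:
            tweet = json.loads(line)
        except ValueError:
            continue  # skip truncated or non-JSON lines
        # Stream notices (for example rate-limit messages) carry no text.
        if isinstance(tweet, dict) and "text" in tweet:
            yield tweet["text"]


def count_mentions(texts, team_keywords):
    """Count the tweets that mention at least one keyword of each team."""
    counts = dict((team, 0) for team in team_keywords)
    for text in texts:
        lowered = text.lower()
        for team, keywords in team_keywords.items():
            if any(keyword in lowered for keyword in keywords):
                counts[team] += 1
    return counts


def count_file(path, team_keywords=TEAM_KEYWORDS):
    """Run the count over a file saved from the streaming script."""
    with io.open(path, encoding="utf-8") as handle:
        return count_mentions(iter_tweet_texts(handle), team_keywords)
```

Calling `count_file("twitter_data.txt")` returns a dictionary of mention counts per team. A raw mention count is only a rough proxy for crowd favoritism; a tweet that names both teams is counted once for each.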