Build a predictive model on Watson Studio using CSV data set from Tweets

We live in an era in which the focus has shifted towards data. The amount of data generated and consumed grows every day, by somewhere around 5 exabytes. Everything we do generates data, whether it is switching a light on and off or commuting from home to work. That data can be mined for patterns and used to make predictions. Data mining, or data science, is the term that has set the industry abuzz: it is the process of discovering patterns, insights, and associations in data.

In this how-to guide we’ll learn how to take a data set and build a predictive model on it to get insights. The intended audience includes developers, general users with basic programming knowledge, and organizations that want to enhance customer experience. The guide shows how to create a predictive model on Watson Studio, a cloud-based environment for data scientists. With it, users can predict and optimize their Twitter interactions to get the most traffic on their tweets.

Learning objectives

After completing this how-to, the reader will be able to:

  • Use Watson Studio to build a predictive model from any CSV data.
  • Extract user information from Twitter.
  • Use that Twitter data to predict and optimize their Twitter interactions.


Estimated time

Completing this tutorial should take around 45 minutes.


Use sample data or get your own?

The first thing we need is a set of tweets to analyze. Steps 1 and 2 show how to collect your own, but if you’re not interested in doing that, we provide a sample data set:

  • Tweets from Ufone, a phone operator, cleaned up and ready for Watson Studio. (Use this one!)
  • The same tweets, but raw, taken directly from tweepy. (Only added for completeness.)

Step 1: Getting Twitter API access (optional)

If you’re using the sample data, then skip to Step 3.

Before we use tweepy to get tweets, we need to generate OAuth consumer and access token keys and secrets. Various guides show how to do this, but the Twitter UI changes over time, so it’s best to go to https://developer.twitter.com and follow the current instructions there. You’ll end up with these keys and secrets:

  • Consumer API Key
  • Consumer API Secret
  • Access Token
  • Access Token Secret

These can be revoked and regenerated, but as with any other key, you should keep these secret.
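
One way to keep the keys out of the script file is to read them from environment variables. The following is a minimal sketch, not part of the original tutorial: the environment variable names and the load_twitter_credentials helper are hypothetical, while the dictionary keys match the variable names used in the Step 2 script.

```python
import os

# Hypothetical environment variable names; use whatever fits your setup.
ENV_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_key": "TWITTER_ACCESS_TOKEN",
    "access_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_twitter_credentials(environ=os.environ):
    """Return the four credentials as a dict, or raise if any is missing."""
    missing = [env for env in ENV_NAMES.values() if not environ.get(env)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing))
    return {name: environ[env] for name, env in ENV_NAMES.items()}
```

With the variables exported in your shell, creds = load_twitter_credentials() gives you creds["consumer_key"] and the other three values without any secret being written into the file.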

Step 2: Saving Tweets to CSV format (optional)

Again, if you’re using the sample data, then skip to Step 3.

Now that we’ve got our Twitter API keys and secrets, we can use tweepy to save tweets into a CSV file. Free developer accounts on Twitter limit the number of tweets that can be retrieved, but that’s enough for our purposes.

If you don’t have Python, download and install the latest version, and then install tweepy. If you have pip installed, this can be done with pip install tweepy.

Copy the code below into a new file and save it as tweets.py. A few lines at the top need updating: fill in the variables for the keys, secrets, and the Twitter handle you want to analyze.

import csv
import tweepy

# Twitter API credentials
consumer_key = ""
consumer_secret = ""
access_key = ""
access_secret = ""
screen_name = ""

def get_all_tweets():
    # initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)

    alltweets = []

    # request the first 200 tweets, the max allowed per call
    new_tweets = api.user_timeline(screen_name=screen_name, count=200)

    # keep grabbing tweets until the 3200 tweet limit is hit
    while len(new_tweets) > 0:
        # save the batch, then ask for tweets older than its last one
        alltweets.extend(new_tweets)
        oldest = alltweets[-1].id - 1
        print("getting tweets before id: %s" % (oldest))
        new_tweets = api.user_timeline(screen_name=screen_name,
                                       count=200, max_id=oldest)
        print("...%s tweets downloaded so far" % (len(alltweets)))

    return alltweets

def write_tweets_to_csv(tweets):
    # transform the tweepy tweets into a list of rows
    outtweets = [[tweet.id_str, tweet.created_at,
                  tweet.text, tweet.retweet_count,
                  tweet.favorite_count] for tweet in tweets]

    # write the csv; newline='' avoids blank rows on Windows
    with open('%s_tweets.csv' % screen_name, 'w',
              newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(["id", "created_at", "text", "Retweets", "Favorites"])
        writer.writerows(outtweets)


if __name__ == '__main__':
    # download the tweets for screen_name and save them to a CSV file
    tweets = get_all_tweets()
    write_tweets_to_csv(tweets)

Run the script with python tweets.py in a terminal. It writes a CSV file named after the handle (<screen_name>_tweets.csv) containing each tweet’s ID, creation time, text, retweet count, and favorite count.

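To clean the data a bit, you can remove the id and created_at columns and any empty rows. Step 6 predicts a column named hour, so if your CSV does not have one, extract the hour from created_at before dropping that column.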
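
The cleanup can also be scripted. The sketch below is one possible approach, not the procedure used to produce the sample data: it assumes the column names written by the script above, assumes created_at looks like 2018-10-01 14:30:00 (optionally followed by a UTC offset), and uses a hypothetical helper named clean_tweets_csv. It drops empty rows, removes id and created_at, and adds an hour column (0 to 23) of the kind Step 6 predicts.

```python
import csv
from datetime import datetime


def clean_tweets_csv(src_path, dst_path):
    """Write a cleaned copy of src_path to dst_path and return the
    number of rows kept."""
    out_fields = ["hour", "text", "Retweets", "Favorites"]
    kept = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=out_fields)
        writer.writeheader()
        for row in reader:
            created_at = (row.get("created_at") or "").strip()
            text = (row.get("text") or "").strip()
            # Skip rows that have no timestamp or no tweet text
            if not created_at or not text:
                continue
            # Assumed timestamp format: "YYYY-MM-DD HH:MM:SS[+HH:MM]"
            hour = datetime.fromisoformat(created_at).hour
            writer.writerow({
                "hour": hour,
                "text": row["text"],
                "Retweets": row["Retweets"],
                "Favorites": row["Favorites"],
            })
            kept += 1
    return kept
```

Twitter reports creation times in UTC, so you may want to shift the hour to your audience’s time zone before training.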

Step 3: Log into Watson Studio

IBM Watson Studio is an easy-to-use, collaborative, cloud-based environment for data scientists, with tools such as Scala, R, and Jupyter Notebooks.

Log into https://dataplatform.cloud.ibm.com/ and choose to create a New Project. The Complete option will work for this tutorial.

In the new project wizard, enter a Name and Description. You will also be required to create a new Object Storage service or choose an existing one during project creation. Once the project is created, you’ll see a project overview.

From there we can add an asset by clicking Add to project. In this case, we’ll click Model to add a new model.

Step 4: Create a new model

Give your model a Name and Description. Set the Model type option to Model builder and choose the Manual option for this exercise.

Before proceeding we need to associate two services: an Apache Spark service and a Machine Learning service. You can use the UI to create a new one or select an existing one. For an example of how to do that with Apache Spark, refer to this IBM Code Tutorial. The process for Machine Learning is the same.

Step 5: Add data to the model

We’re now going to add the CSV file to the model. Click Add Data Assets and browse to either the generated CSV file or the saved sample CSV file. The data should then appear in the dashboard.

Click on the Next button to continue. Loading the data may take a few minutes.

Step 6: Select a training technique

For this example we’re trying to predict the best time to send a tweet, so set Column value to predict to hour. Leave Feature columns unchanged, set to All. The important choice here is the technique: we’ll use Regression. We’ll also leave the Validation Split unchanged.

Note that because the column to predict is hour, which has around 20 distinct values, Watson Studio will suggest Multiclass classification. For our data, however, Regression is the better technique.
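
To see why the choice matters, the sketch below compares how a classification-style metric (exact-match accuracy) and a regression-style metric (mean absolute error) score two sets of predictions. The hour values are made up purely for illustration and the helper functions are hypothetical; this is not what Watson Studio computes internally.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual value exactly."""
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)


def mean_absolute_error(actual, predicted):
    """Average distance, in hours, between prediction and actual value."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up hours of the day, for illustration only
actual_hours = [9, 13, 18, 21]
near_misses = [10, 14, 17, 20]   # each prediction is off by one hour
far_misses = [21, 1, 6, 9]       # each prediction is off by twelve hours

# Classification treats every hour as an unrelated label,
# so both sets of predictions look equally wrong.
print(accuracy(actual_hours, near_misses))             # 0.0
print(accuracy(actual_hours, far_misses))              # 0.0

# Regression treats the hour as a number, so near misses score better.
print(mean_absolute_error(actual_hours, near_misses))  # 1.0
print(mean_absolute_error(actual_hours, far_misses))   # 12.0
```

A prediction that is one hour off is still useful when scheduling a tweet, and only the numeric view of the hour captures that.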

We also need to add estimators. Click Add Estimators, select all available choices, then click Add.

Once the technique and estimators are selected, click Next. This starts training and testing the model, which takes a few minutes to complete.

Step 7: Wrapping up

The results show how accurate each estimator is, with the best-performing estimator at the top. Here it is Isotonic Regression. Click on the first one and select the Save option.

Once saved, you will be redirected to an overview of the model.

From here, we can create a web deployment so our model is accessible over a REST call.

Congratulations! Your model is saved and deployed, and you can start testing it with the generated cURL, Java, JavaScript, and Python snippets.
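
For orientation, a scoring call from Python might look roughly like the sketch below. Everything specific in it is an assumption rather than something taken from this tutorial: the payload shape with fields and values keys, the bearer-token header, and the helper names build_scoring_payload and score are illustrative only. Treat the snippets generated on your deployment page as the authoritative source for the URL, headers, and payload.

```python
import json
import urllib.request


def build_scoring_payload(fields, rows):
    """Build a request body of the assumed form
    {"fields": [...], "values": [[...], ...]}."""
    for row in rows:
        if len(row) != len(fields):
            raise ValueError("each row needs one value per field")
    return {"fields": list(fields), "values": [list(row) for row in rows]}


def score(scoring_url, token, payload):
    """POST the payload to the deployment and return the parsed JSON reply.

    scoring_url and token are placeholders: copy the real values from
    the deployment page of your own model.
    """
    request = urllib.request.Request(
        scoring_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A payload for one draft tweet could then be built with build_scoring_payload(["text", "Retweets", "Favorites"], [["Big news today", 10, 25]]), where the field names must match the feature columns of the CSV you trained on.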


In this tutorial we learned to extract user data from Twitter and build a predictive model on it to optimize future tweeting and grow the user’s audience. The same Watson Studio workflow can be applied to any other CSV file, and the resulting model can be used from a web application. We also learned how to create a web deployment of the model so that it accepts REST calls.

