How Analysts “Read” 1,846 Political Tweets Instantly

阿新 • Published: 2018-12-28

Trying out the algorithms:

The topics derived from LSA were unclear, with many overlapping words. Topic zero was roughly about former president Megawati Sukarnoputri’s support for presidential candidate and Jakarta governor Jokowi and his running mate, Jusuf Kalla (also known as JK), against former general Prabowo Subianto. Topic one was almost identical, with the word for “talks” replaced by the word for “governor.” Topic two was interesting: it combined presidential candidate Prabowo and rival vice presidential candidate Kalla with two dropouts from the presidential race, Aburizal Bakrie and Wiranto of the Hanura party, plus the words for “candidate” and “vice presidential candidate.” Perhaps these tweets concerned a “dream team” of Prabowo as president and Kalla as vice president. Topic three again covered Prabowo and Kalla, with mentions of a convention, Aburizal Bakrie, and the words “election”, “enter”, and “forward”. The final topic again included Golkar chairman Aburizal Bakrie, along with Hatta, Wiranto, Prabowo, Dahlan, the PAN party from Prabowo’s coalition, and the words “vice president”, “open”, and “evaluation”. Overall, these groupings were not particularly useful or enlightening.
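For readers unfamiliar with LSA, the core idea can be sketched in a few lines. This is a minimal, from-scratch illustration and not the code behind this analysis: it builds a document-term count matrix from a handful of made-up token lists and extracts only the leading LSA component by power iteration, whereas a library would normally compute several components at once with a truncated SVD. The function name `leading_lsa_topic` and the toy documents are assumptions for illustration.

```python
import math
import random

def leading_lsa_topic(matrix, iterations=100, seed=0):
    """Return (singular value, term weights) for the top LSA component.

    matrix: list of rows, one row of term counts per document.
    Power iteration on A^T A converges to the leading right singular vector.
    """
    rng = random.Random(seed)
    n_terms = len(matrix[0])
    v = [rng.random() + 0.1 for _ in range(n_terms)]  # positive start vector
    for _ in range(iterations):
        # u = A v (one score per document)
        u = [sum(a * b for a, b in zip(row, v)) for row in matrix]
        # v = A^T u (one weight per term), then renormalise
        v = [sum(row[j] * u_i for row, u_i in zip(matrix, u)) for j in range(n_terms)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    u = [sum(a * b for a, b in zip(row, v)) for row in matrix]
    sigma = math.sqrt(sum(x * x for x in u))  # singular value = |A v|
    return sigma, v

# Toy, made-up token lists standing in for preprocessed tweets.
toy_docs = [
    ["jokowi", "gubernur", "jakarta"],
    ["jokowi", "kalla", "gubernur"],
    ["prabowo", "hatta", "capres"],
]
vocab = sorted({w for doc in toy_docs for w in doc})
counts = [[doc.count(w) for w in vocab] for doc in toy_docs]

sigma, weights = leading_lsa_topic(counts)
top_terms = sorted(zip(vocab, weights), key=lambda p: abs(p[1]), reverse=True)[:3]
print(top_terms)
```

Because every component is a dense weighting over the whole vocabulary, the same frequent names can rank highly in several components, which is one plausible reason LSA topics overlap on a small corpus.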

LDA with a bag of words seemed better. Topic zero (Jokowi) included “Kalla”, “Leader(ship)”, “Governor (Jokowi)”, “Real”, and “Popular”. Topic one (famous endorsements for Jokowi against Prabowo) included “Dahlan”, “Prabowo”, “Wiranto”, “Fill in”, “Yes”, “Support”, “Ahok”, and “Governor”. Topic two (famous endorsements of Prabowo against the Jokowi-Kalla ticket or, in Rismaharini’s case, a fake endorsement) included “Kalla”, “Aburizal Bakrie”, “Prabowo”, “Tri Rismaharini”, and “Mahfud MD”. Topic three (polling and party comparison) included “convention”, “PDI-P party”, “Golkar party”, “Mahfud MD”, “survey”, “People’s/public”, and “win”. Topic four (Prabowo) included “Megawati”, “PAN”, “Presidential candidacy”, and “Hatta Radjasa”. The topics also had a nice symmetry: one topic per major candidate, one topic of famous supporters for each candidacy, and one topic on polling and party comparison.
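To make the mechanics of LDA on a bag of words more concrete, here is a minimal sketch using collapsed Gibbs sampling in plain Python. It is an illustration under stated assumptions, not the implementation used for this analysis, which most likely relied on a topic-modelling library. The function name `lda_gibbs`, the hyperparameter defaults, and the toy token lists are all hypothetical.

```python
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of token lists. Token order is never used, which is the
    bag-of-words assumption. Returns (vocab, doc_topic, topic_word) counts.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Resample: P(topic | doc) * P(word | topic), both smoothed.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta) / (topic_total[k] + n_words * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1
    return vocab, doc_topic, topic_word

# Toy, made-up token lists standing in for preprocessed tweets.
toy_docs = [
    ["jokowi", "gubernur", "jakarta", "jokowi"],
    ["jokowi", "kalla", "gubernur"],
    ["prabowo", "hatta", "capres", "prabowo"],
    ["prabowo", "capres", "survei"],
]
vocab, doc_topic, topic_word = lda_gibbs(toy_docs, num_topics=2)
for k, row in enumerate(topic_word):
    top = sorted(zip(vocab, row), key=lambda p: p[1], reverse=True)[:3]
    print(f"topic {k}:", [w for w, _ in top])
```

The top words per topic are what get interpreted by hand, as in the labels above. Results vary with the random seed, so topic numbering is arbitrary.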

LDA with Tf-idf gave some interesting topics. Topic zero (Jokowi) included “Dahlan”, “PDI-P”, “Forward”, “Support”, “Mega(wati)”, “Governor”, and “Kalla”. Topic one (vice presidents) included “Hatta”, “Megawati”, “Party”, “Kalla”, “Vice President”, “PAN”, and “Evaluation”. Topic two (national political players) included “Jakarta”, “Dahlan”, “Convention”, “Vice President”, “Megawati”, “Yudhoyono”, and “Mahfud MD”. Topic three (unclear, possibly political advertisements and polling) included “Aburizal Bakrie”, “Candidate”, “Ad”, “Survey”, “Tri Rismaharini”, “Yudhoyono”, “Prabowo”, “Vice Presidential Candidate”, and “Kalla”. Topic four (unclear, more famous political players) included “Prabowo”, “Vice”, “Kalla”, “Wiranto”, “Megawati”, “People”, and “Indonesia”. However, these topics seemed less clearly defined than those from LDA with a bag of words.
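For context on what the Tf-idf weighting changes, the sketch below computes one common variant, term count times log(N / document frequency), on made-up token lists. Libraries differ in log base, smoothing, and normalisation, so the exact numbers here are an assumption of this sketch rather than what any particular library produces. The function name `tfidf` is hypothetical.

```python
import math
from collections import Counter

def tfidf(docs):
    """Weight each term by count * log(N / document frequency).

    docs: list of token lists. Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        weighted.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weighted

# Toy, made-up token lists standing in for preprocessed tweets.
toy_docs = [
    ["jokowi", "gubernur"],
    ["jokowi", "prabowo"],
    ["jokowi", "survei"],
]
weighted = tfidf(toy_docs)
for doc_weights in weighted:
    print(doc_weights)
```

Under this variant, a term that appears in every document receives a weight of zero, while rarer terms are boosted.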

Now for some fun testing the algorithm on fake tweets:

For this particular corpus, LDA performed better when not weighted with Tf-idf, perhaps because of the small total number of tweets and the limited number of words in each tweet. The bag-of-words LDA model and its topic grouping are therefore what I chose to proceed with.

I ran the LDA with BOW algorithm on a tweet about Jokowi, “kalau jadi presiden jokowi tetep jadi gubernur jakarta tidak” (roughly, “if he becomes president, does Jokowi stay on as Jakarta governor or not”), which matched 73% with the first topic: Jokowi. A fake Indonesian tweet I wrote supporting Jokowi, PDI-P, and Kalla, “Saya mendukung JK dan Kalla! PDI-P selamanya!” (“I support JK and Kalla! PDI-P forever!”), scored as an 80% match with the Jokowi topic.
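A match percentage like the ones above is the share of a new document attributed to each topic by a trained model. The sketch below shows one simple way to estimate that share, an EM-style "fold-in" against fixed topic-word probabilities. It is an assumption-laden illustration: the topic-word probabilities are hand-written and hypothetical, not taken from the trained model, so it will not reproduce the 73% or 80% figures, and the function name `infer_topic_mix` is invented for this example.

```python
def infer_topic_mix(tokens, topic_word_probs, alpha=0.1, iterations=50, floor=1e-9):
    """Estimate the topic proportions of one new document.

    topic_word_probs: one {word: probability} dict per topic (held fixed).
    Words unknown to every topic are ignored; `floor` stands in for the
    probability of a word that a given topic has never seen.
    """
    num_topics = len(topic_word_probs)
    known = [t for t in tokens if any(t in probs for probs in topic_word_probs)]
    theta = [1.0 / num_topics] * num_topics  # start from a uniform mix
    for _ in range(iterations):
        expected = [alpha] * num_topics  # prior pseudo-counts
        for token in known:
            # Share of this token explained by each topic under the current mix.
            weights = [theta[k] * topic_word_probs[k].get(token, floor)
                       for k in range(num_topics)]
            total = sum(weights)
            for k in range(num_topics):
                expected[k] += weights[k] / total
        norm = sum(expected)
        theta = [e / norm for e in expected]
    return theta

# Hypothetical topic-word probabilities, written by hand for illustration.
topics = [
    {"jokowi": 0.4, "gubernur": 0.3, "jakarta": 0.2, "presiden": 0.1},  # "Jokowi"
    {"prabowo": 0.5, "capres": 0.3, "presiden": 0.2},                   # "Prabowo"
]
tweet = "kalau jadi presiden jokowi tetep jadi gubernur jakarta tidak"
theta = infer_topic_mix(tweet.split(), topics)
print([round(share, 2) for share in theta])
```

Words that no topic has seen contribute nothing, so on very short tweets a single known word can dominate the result.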

The verdict:

Despite my small corpus and the limited vocabulary in each tweet, LSA and LDA helped me quickly suss out topics within the dataset and see sensible topic clusterings. A simple LDA with a bag of words gave me the most sensible clusterings: the two presidential candidates, famous endorsements or supporters of each ticket, and public polling. The model also seemed to perform well on a sample tweet and a synthetically created tweet.

To really shine, though, these models should be applied to a much larger corpus that better represents the Indonesian Twitter population during the 2014 Indonesian election. A more complete corpus could also let me map the size of each topic cluster more meaningfully, to answer questions such as whether Jokowi or Prabowo generated more tweets, or how many tweets were about endorsements or supporters rather than the candidates themselves. Still, the foundation is here, and there’s lots of room to expand this basic code framework for use by politicians and political campaigns (as well as political historians examining past elections and political movements in the digital age).

For more information and the code behind this analysis, please check out the repository and full report on my GitHub account.
