Measuring Discourse Bias Using Text Network Analysis

Posted by 阿新 • Published: 2018-12-28

In this article I propose a method and a tool to measure the level of bias in discourse based on text network analysis. The measure is based on the structure of the text and uses both quantitative and qualitative parameters of a text graph to identify how strongly biased it is. It can therefore be used by humans and also built into APIs and AI systems to perform automatic bias analysis.

Bias: the Good and the Bad

Bias is commonly understood as inclination or prejudice towards a certain point of view. A discourse or text that has a bias may have a certain agenda or promote a certain ideology.

In the age of “fake news”, the rise of extreme ideologies, and various misinformation techniques, it is important to be able to identify the level of bias in discourse, be it social network posts, newspaper articles, or political speeches.

Bias is not necessarily a bad thing. Sometimes it can make an intention stronger, push an agenda forward, make a point, persuade, dissuade and transform. Bias is an agent of change; however, when there is too much of it, bias can also be destructive. When we measure bias, we measure how ideologically charged a text is and how strongly it pushes a certain point of view. In some contexts, such as fiction or highly charged political speeches, strong bias may be preferable. In others, such as news or non-fiction, strong bias may reveal an agenda.

To my knowledge, there are currently no tools that can measure how biased a text is. Various text mining APIs categorize texts based on their content and sentiment, but there are no instruments that measure the level of inclination towards a certain point of view in a text. The instrument and the method proposed in this article can serve as a first step in this direction. The open-source online tool for text network analysis that I developed can already measure bias based on this methodology, so you are welcome to try it on your own texts and see how it works. Below I describe how the bias index works and some technical details.

Discourse Structure as a Dynamic Network

Any discourse can be represented as a network: the words are the nodes and their co-occurrences are the connections between them. The resulting graph traces the pathways of meaning circulation. We can make it more readable by aligning the clusters of nodes that are more densely connected (force-atlas algorithm) into distinct groups, each marked with a specific color. We can also make the more influential nodes (those with high betweenness centrality) bigger on the graph. You can read more about the technical details in this whitepaper on text network analysis.
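To make the idea of "words as nodes, co-occurrences as connections" concrete, here is a minimal sketch in plain Python. It is only an illustration under simple assumptions, not the tool's actual implementation: the function name `build_text_graph`, the tiny stopword list, and the choice to link each word only to the word that follows it (`window=1`) are all made up for this example, and sentence boundaries are ignored.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption, not the tool's actual list).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "it", "that"}

def build_text_graph(text, window=1):
    """Return {word: {neighbor: weight}}, linking each word to the next `window` words."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    graph = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != word:  # skip self-loops such as "voice voice"
                graph[word][other] += 1
                graph[other][word] += 1
    return {word: dict(neighbors) for word, neighbors in graph.items()}

sample = "People listen to the voice. The voice moves people. People speak and people listen."
graph = build_text_graph(sample)
for word, neighbors in sorted(graph.items()):
    print(word, neighbors)
```

The edge weights count how often two words appear next to each other once stopwords are removed. A larger `window` would also connect words that are a few positions apart, which tends to produce denser graphs.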

For example, here’s a visualization of the TED talk by Julian Treasure called “How to Speak So People Will Want to Listen” made using this method. If you’d like to explore the actual interactive graph, you can open it here.

From this graph we can clearly see that the main concepts are the notions of

“people”, “time”, “world”, “listen”, “voice” etc.

These concepts are the junctions for meaning circulation in that particular discourse. They connect the different communities of nodes (designated by distinct colors).

The algorithm works in a way that emulates human perception (following the landscape reading model, the idea of semantic priming, and common sense): if words are frequently mentioned in the same context, they will form a community in the graph. If they appear in different contexts, they will be pushed away from each other. If words are frequently used to connect different contexts together, they’ll appear bigger in the graph.
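The "bigger" words are those with high betweenness centrality, that is, words that sit on many shortest paths between other words. As a sketch of how that could be computed, the following implements Brandes' algorithm for an unweighted, undirected graph. The function name `betweenness` and the toy graph are assumptions made for illustration; the scores are left unnormalized.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph {node: [neighbors]}."""
    score = {v: 0.0 for v in adj}
    for source in adj:
        order = []                       # nodes in order of increasing distance
        preds = {v: [] for v in adj}     # predecessors on shortest paths
        paths = {v: 0 for v in adj}      # number of shortest paths from source
        dist = {v: -1 for v in adj}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:          # first time we reach w
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in adj}
        for w in reversed(order):        # accumulate from the farthest nodes back
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each unordered pair was counted from both endpoints.
    return {v: s / 2 for v, s in score.items()}

# Toy graph: "people" is the only link between two small word clusters.
toy = {
    "listen": ["speak", "people"],
    "speak": ["listen", "people"],
    "people": ["listen", "speak", "world", "sound"],
    "world": ["people", "sound"],
    "sound": ["people", "world"],
}
scores = betweenness(toy)
print(sorted(scores.items(), key=lambda item: -item[1]))
```

In this toy graph every path between the two clusters has to pass through "people", so it is the only word with a non-zero score.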

As a result, the structure of a text network graph can tell us a lot about the structure of the discourse.

For example, if the graph has a pronounced community structure (several different communities of words), the discourse also has several distinct topics, which are expressed in the text. In our example we have at least 4 major topics:

people - listen - speak (dark green)
time - talk - register (light green)
world - sound - powerful (orange)
amazing - voice (pink)
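One common way to put a number on how pronounced a community structure is, is Newman's modularity: the share of edges that fall inside communities minus the share expected if the edges were placed at random. Whether the tool uses exactly this measure is an assumption here; the sketch below only shows the calculation for an unweighted graph and a given split into communities, and the names `modularity` and `two_triangles` are illustrative.

```python
def modularity(adj, communities):
    """Newman's modularity for an unweighted, undirected graph {node: [neighbors]}."""
    edge_count = sum(len(neighbors) for neighbors in adj.values()) / 2
    q = 0.0
    for members in communities:
        members = set(members)
        # Edges with both endpoints inside the community (each seen twice).
        internal = sum(1 for v in members for w in adj[v] if w in members) / 2
        # Total degree of the community's nodes.
        degree = sum(len(adj[v]) for v in members)
        q += internal / edge_count - (degree / (2 * edge_count)) ** 2
    return q

# Two triangles joined by a single edge (c-d).
two_triangles = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
q_split = modularity(two_triangles, [["a", "b", "c"], ["d", "e", "f"]])
q_single = modularity(two_triangles, [["a", "b", "c", "d", "e", "f"]])
print(round(q_split, 3), round(q_single, 3))
```

Splitting the graph into its two triangles gives a clearly positive score, while treating everything as one community gives zero, which matches the intuition that the first description captures real structure.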

If we analyze other texts in the same way, we will see that the resulting graph structures are different. For instance, here’s a visualization of the first chapter of the Quran:

Text network visualization of the Quran made using InfraNodus. The structure of the graph is less diversified and more centralized. There are only a few main concepts, the discourse circulates around them, and the rest of the text supports the main concepts.

It can be seen that it has a different network structure. It is much more centralized and less diversified. There are a few main concepts:

“god”, “people”, “believe”, “lord”, “give”

and the whole discourse circulates around these concepts. All the other notions are there to support the main ones.

We performed a similar analysis with the inauguration speeches of the US presidents from 1969 to 2013 and visualized the way their narrative changed over time:

Visualization of the US presidents’ inauguration speeches made using InfraNodus (TNA) and Gephi (visualization). It can be seen that over time the structure stays more or less the same; however, Obama’s speeches seem to have more distinct influential terms, indicating a more diversified discourse.

It can be seen that the structure of the discourse stayed more or less the same over the years, while the emphasized concepts changed with every address. This may indicate that the rhetorical strategy stayed the same, while the content has transformed over the years. Obama’s speeches seem to have a higher number of distinct influential nodes, which may indicate a more diversified discourse.

Bias as a Conduit for Ideology in Networks

Now that we’ve shown how discourse can be represented as a network structure, we can discuss the notion of bias in the context of network science. We will use some ideas from epidemiology to demonstrate how a network’s topology affects the speed and reach of information propagation across its nodes.

A network can be seen as a representation of interactions that happen over time, a diagram of traces left by a dynamic process. If we study the topology of a network, we can get a lot of insights about the nature of the dynamic processes it represents.

In the context of social sciences and health care, information about network structure can provide valuable insights for epidemiology: how fast a disease (a virus, an opinion or any other (mis)information) may spread, how far it may propagate, and what the best immunological strategies may be.

It has been demonstrated (Abramson & Kuperman 2001; Pastor-Satorras & Vespignani 2001) that as a network structure becomes more randomized, its epidemiological threshold decreases. Diseases, viruses, and misinformation can spread faster and to a higher number of nodes. In other words, as the community structure of a network becomes less and less pronounced and the number of connections increases, the network propagates information to more nodes, and this propagation occurs in highly pronounced oscillations (infected / not infected).

A figure from the study by Abramson & Kuperman (2001) showing the fraction of infected elements (n) over time (t) for networks with different degrees of disorder (p). The higher the degree of disorder, the more elements get infected and the more intense the oscillations become, but the time span of the infection is also relatively short.
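To experiment with the relationship between disorder and spreading, one can simulate an infection on a ring network whose links are rewired with probability p. The sketch below is a rough illustration in the spirit of such models, not a reproduction of the cited study: the function names, the susceptible / infected / refractory update rule, and all parameter values are assumptions chosen for the example.

```python
import random

def small_world(n, k, p, rng):
    """Ring of n nodes linked to their k nearest neighbors on each side;
    each link is redirected to a random node with probability p."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            if rng.random() < p:
                # Avoid self-loops and links that already exist.
                candidates = [c for c in range(n) if c != i and c not in adj[i]]
                if candidates:
                    j = rng.choice(candidates)
            adj[i].add(j)
            adj[j].add(i)
    return adj

def simulate_sirs(adj, steps, t_inf, t_rec, seed_fraction, rng):
    """Return the fraction of infected nodes at each step.
    State 0 is susceptible, 1..t_inf infected, t_inf+1..t_inf+t_rec refractory."""
    state = {v: (1 if rng.random() < seed_fraction else 0) for v in adj}
    history = []
    for _ in range(steps):
        history.append(sum(1 for s in state.values() if 1 <= s <= t_inf) / len(adj))
        new_state = {}
        for v, s in state.items():
            if s == 0:
                neighbors = adj[v]
                infected = sum(1 for w in neighbors if 1 <= state[w] <= t_inf)
                # Infection probability equals the share of infected neighbors.
                caught = bool(neighbors) and rng.random() < infected / len(neighbors)
                new_state[v] = 1 if caught else 0
            elif s < t_inf + t_rec:
                new_state[v] = s + 1      # progress through the disease cycle
            else:
                new_state[v] = 0          # immunity ends, susceptible again
        state = new_state
    return history

rng = random.Random(42)
for p in (0.01, 0.9):
    network = small_world(n=300, k=3, p=p, rng=rng)
    history = simulate_sirs(network, steps=100, t_inf=4, t_rec=9,
                            seed_fraction=0.1, rng=rng)
    print(f"p={p}: peak infected fraction {max(history):.2f}")
```

Plotting `history` for several values of p is a simple way to see how the shape of the infection curve changes as the network becomes more disordered. A single run on a small network is noisy, so averaging over several seeds would be needed before drawing any conclusions.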

At the same time, when the community structure is pronounced while the network is relatively interconnected (a small-world network), the “pockets” of nodes help sustain the epidemic in the network for a longer time. In other words, fewer nodes may become infected, but the infection might stay longer (an endemic state).

Representation of network structures: [a] random, [b] scale-free (better pronounced communities), and [c] hierarchical (less global connectivity) (from Stocker et al. 2001)

In another study performed on various social networks (Stocker, Cornforth & Bossomaier 2002) it has been shown that hierarchically flat (i.e. disordered) networks are not as stable as scale-free ones (i.e. those that have a more pronounced community structure). In other words, hierarchies may be good for passing down orders, but scale-free structures are better for maintaining a certain worldview.

As we can see, there is no single network topology that can be considered preferable. It depends on the intention, the context, and the situation. In some cases it is good if a network can propagate information to all of its elements relatively fast. In other cases stability is preferable.

Overall, the topology of a network reflects how well it can propagate information, how susceptible it is to new ideas, and whether those ideas will take over the whole network for only a short time or remain for a longer period.
