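Tencent Technology Engineering | Tencent AI Lab Oral Presentation Paper: HodgeRank with Information Maximization for Crowdsourced Pairwise Ranking Aggregation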
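Abstract (translated from Chinese)
Crowdsourcing has recently become an effective paradigm for meeting large-scale human-labor needs in many domains. Task requesters, however, usually have limited budgets, so a sensible budget-allocation strategy is needed to obtain better quality. In this paper, we study the information-maximization principle for active sampling strategies within the HodgeRank framework, where HodgeRank is a method based on the Hodge decomposition of pairwise ranking data from multiple crowdsourcing workers.
The principle yields two active-sampling scenarios: Fisher information maximization and Bayesian information maximization. Fisher information maximization gives unsupervised sampling, which ignores labels and sequentially maximizes the graph algebraic connectivity. Bayesian information maximization selects the samples with the largest information gain from prior to posterior, which gives supervised sampling that uses the labels collected. Experiments show that the proposed methods improve sampling efficiency over traditional sampling schemes and are therefore valuable for practical crowdsourcing experiments.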
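To make the HodgeRank estimate that both sampling schemes build on more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes each crowdsourced comparison is a triple (i, j, y), where y is the reported preference of item i over item j and is modeled as s_i - s_j plus noise, and it fits the global scores s by ordinary least squares with NumPy. The function name hodgerank_scores and the toy data are illustrative assumptions.

```python
import numpy as np

def hodgerank_scores(n_items, comparisons):
    """Least-squares global scores from pairwise comparisons.

    comparisons: list of (i, j, y); y is the preference of item i over
    item j (for example +1 or -1 for binary labels).
    """
    D = np.zeros((len(comparisons), n_items))  # one row per comparison
    y = np.zeros(len(comparisons))
    for row, (i, j, label) in enumerate(comparisons):
        D[row, i], D[row, j] = 1.0, -1.0       # encodes s_i - s_j
        y[row] = label
    # Minimum-norm solution of min_s ||D s - y||^2; on a connected
    # comparison graph this is the solution whose scores sum to zero.
    s, *_ = np.linalg.lstsq(D, y, rcond=None)
    residual = y - D @ s                       # part not explained by s
    return s, residual

# Toy data (assumed): five votes over three items, all favoring 0 > 1 > 2.
votes = [(0, 1, 1.0), (0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0), (2, 0, -1.0)]
scores, residual = hodgerank_scores(3, votes)
ranking = np.argsort(-scores).tolist()         # items from best to worst
```

The residual holds whatever the global scores cannot explain, such as cyclic inconsistencies among the votes.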
Abstract (English)
Recently, crowdsourcing has emerged as an effective paradigm for human-powered, large-scale problem solving in various domains. However, task requesters usually have a limited budget, so it is desirable to have a policy that allocates the budget wisely to achieve better quality. In this paper, we study the principle of information maximization for active sampling strategies in the framework of HodgeRank, an approach based on the Hodge decomposition of pairwise ranking data with multiple workers.
The principle exhibits two scenarios of active sampling: Fisher information maximization, which leads to unsupervised sampling based on a sequential maximization of graph algebraic connectivity without considering labels; and Bayesian information maximization, which selects samples with the largest information gain from prior to posterior and thus gives a supervised sampling that involves the labels collected. Experiments show that the proposed methods boost sampling efficiency compared to traditional sampling schemes and are thus valuable for practical crowdsourcing experiments.
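The unsupervised scenario can be illustrated with a small greedy sketch. It assumes that "sequential maximization of graph algebraic connectivity" means picking, at each step, the pair whose added edge most increases the second-smallest eigenvalue of the comparison graph's Laplacian. The brute-force search over all pairs and the helper names are illustrative assumptions, not the paper's algorithm.

```python
import itertools
import numpy as np

def laplacian(n_items, edges):
    """Graph Laplacian of the comparison multigraph (unit edge weights)."""
    L = np.zeros((n_items, n_items))
    for i, j in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0
    return L

def algebraic_connectivity(L):
    """Second-smallest eigenvalue of a symmetric Laplacian."""
    return np.linalg.eigvalsh(L)[1]            # eigenvalues are ascending

def next_pair_unsupervised(n_items, edges):
    """Try every pair and keep the one giving the largest connectivity."""
    best_pair, best_value = None, -np.inf
    for i, j in itertools.combinations(range(n_items), 2):
        value = algebraic_connectivity(laplacian(n_items, edges + [(i, j)]))
        if value > best_value + 1e-12:         # keep first strict improvement
            best_pair, best_value = (i, j), value
    return best_pair, best_value

# Comparisons collected so far form a path 0-1-2-3 (assumed toy graph).
collected = [(0, 1), (1, 2), (2, 3)]
pair, value = next_pair_unsupervised(4, collected)
```

On this toy graph the search closes the path into a 4-cycle by choosing the pair (0, 3). No labels are consulted, which is what makes the scheme unsupervised.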
English presentation slides
In this paper, we present a principle of active sampling based on information maximization in the framework of HodgeRank.
Our contributions in this work are threefold:
1. A new version of the Hodge decomposition of pairwise comparison data with multiple voters is presented. Within this framework, two schemes of information maximization, Fisher and Bayesian, which lead to unsupervised and supervised sampling respectively, are systematically investigated.
2. A closed-form update and a fast online algorithm are derived for supervised sampling with Bayesian information maximization for HodgeRank, which is shown to be faster and more accurate than the state-of-the-art method Crowd-BT (Chen et al. 2013).
3. These schemes exhibit better sampling efficiency than random sampling, as well as better loop-free control in the clique complex of paired comparisons, thus reducing the possibility of voting chaos caused by harmonic ranking (Saari 2001), i.e., the phenomenon that inconsistency in preference data may lead to totally different aggregate orders under different methods.
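The slides do not give the closed-form update itself, so the following Python sketch only shows one way such an online update could look. It assumes a Gaussian prior N(0, I/gamma) on the scores and unit-variance noise on each comparison, so that the posterior after every new label is Gaussian and follows from a Sherman-Morrison rank-one update. It also assumes that "information gain" is the KL divergence from the current posterior to the updated one. The class name, the parameter gamma, and the toy data are hypothetical.

```python
import numpy as np

class OnlineRegularizedHodgeRank:
    """Gaussian posterior over scores, updated one comparison at a time."""

    def __init__(self, n_items, gamma=1.0):
        self.cov = np.eye(n_items) / gamma     # prior covariance (assumed)
        self.b = np.zeros(n_items)             # running sum of y * (e_i - e_j)
        self.scores = np.zeros(n_items)        # posterior mean

    def _posterior_after(self, i, j, y):
        d = np.zeros(len(self.b))
        d[i], d[j] = 1.0, -1.0
        cd = self.cov @ d
        # Sherman-Morrison: inverse of (A + d d^T) from the inverse of A.
        cov = self.cov - np.outer(cd, cd) / (1.0 + d @ cd)
        b = self.b + y * d
        return cov, b, cov @ b

    def information_gain(self, i, j, y):
        """KL(new posterior || current posterior) if label y were observed."""
        cov_new, _, s_new = self._posterior_after(i, j, y)
        prec_old = np.linalg.inv(self.cov)     # kept simple; not optimized
        diff = s_new - self.scores
        _, logdet_old = np.linalg.slogdet(self.cov)
        _, logdet_new = np.linalg.slogdet(cov_new)
        return 0.5 * (np.trace(prec_old @ cov_new) + diff @ prec_old @ diff
                      - len(self.b) + logdet_old - logdet_new)

    def update(self, i, j, y):
        self.cov, self.b, self.scores = self._posterior_after(i, j, y)

# Toy data (assumed): three labeled comparisons over three items.
model = OnlineRegularizedHodgeRank(3, gamma=1.0)
data = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
for i, j, y in data:
    model.update(i, j, y)

gain_expected = model.information_gain(0, 1, 1.0)     # agrees with scores
gain_surprising = model.information_gain(0, 1, -1.0)  # contradicts scores
```

Under these assumptions the gain depends on the label: a vote that contradicts the current scores shifts the posterior mean further and therefore scores a larger gain, which is the sense in which the sampling is supervised.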