Discovering Types for Entity Disambiguation

Neural type system

Using the top solution from our type system optimization, we can label data from Wikipedia with the types it defines. With this data (in our experiments, 400M tokens each for English and French), we train a bidirectional LSTM to independently predict all the type memberships for each word. The Wikipedia source text provides supervision only on intra-wiki links, but this is sufficient to train a deep neural network that predicts type membership with an F1 of over 0.91.
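To make "independently predict all the type memberships" concrete, here is a minimal sketch in plain Python. It assumes the network emits one logit per type for each word and that every type gets its own sigmoid and yes/no decision, rather than a softmax across types. The logits, the 0.5 threshold, and the choice of micro-averaged F1 are illustrative assumptions, not details taken from our experiments.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict_types(logits: dict[str, float], threshold: float = 0.5) -> set[str]:
    """Decide each type membership on its own; types do not compete."""
    return {t for t, z in logits.items() if sigmoid(z) >= threshold}


def micro_f1(predicted: list[set[str]], gold: list[set[str]]) -> float:
    """Micro-averaged F1 over all (word, type) decisions."""
    tp = sum(len(p & g) for p, g in zip(predicted, gold))
    fp = sum(len(p - g) for p, g in zip(predicted, gold))
    fn = sum(len(g - p) for p, g in zip(predicted, gold))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-word logits, standing in for the LSTM's output layer.
token_logits = [
    {"Aviation": 2.0, "Clothing": -3.0, "Games": -1.0},
    {"Aviation": -2.0, "Clothing": 1.5, "Games": 0.5},
]
gold = [{"Aviation"}, {"Clothing"}]

predicted = [predict_types(logits) for logits in token_logits]
score = micro_f1(predicted, gold)
```

In this toy input the second word is wrongly assigned Games in addition to Clothing, so precision is 2/3, recall is 1, and the F1 comes out at about 0.8.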

One of our type systems, discovered by beam search, includes types such as Aviation, Clothing, and Games (as well as surprisingly specific ones like 1754 in Canada, indicating 1754 was an exciting year in the dataset of 1,000 Wikipedia articles it was trained on); you can also view the full type system.

Inference

Predicting entities in a document usually relies on a "coherence" metric between entities, e.g. measuring how well each entity fits with every other, which is O(N^2) in the length of the document. Our runtime is instead O(N), as we need only look up each phrase in a trie that maps phrases to their possible meanings. We rank the candidate entities by the link frequency seen in Wikipedia, refined by weighting each entity by its likelihood under the type classifier. New entities can be added just by specifying their type memberships (person, animal, country of origin, time period, etc.).
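The lookup-and-rank step can be sketched roughly as follows. This is an illustration under stated assumptions, not our released code: `PhraseTrie`, `rank`, the entity names, the link counts, and the type probabilities are all hypothetical, and the likelihood is assumed to be a simple product of independent per-type probabilities.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)  # token -> TrieNode
    entities: dict = field(default_factory=dict)  # entity -> link count


class PhraseTrie:
    """Maps token sequences to the entities they have linked to."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, phrase: tuple, entity: str, link_count: int) -> None:
        node = self.root
        for token in phrase:
            node = node.children.setdefault(token, TrieNode())
        node.entities[entity] = node.entities.get(entity, 0) + link_count

    def longest_match(self, tokens: list, start: int):
        """Return (end_index, candidates) for the longest phrase at start, or None."""
        node, best = self.root, None
        for i in range(start, len(tokens)):
            node = node.children.get(tokens[i])
            if node is None:
                break
            if node.entities:
                best = (i + 1, node.entities)
        return best


def rank(candidates: dict, entity_types: dict, type_probs: dict) -> list:
    """Score = link frequency prior x likelihood of the entity's types (assumed form)."""
    total = sum(candidates.values())
    scored = []
    for entity, count in candidates.items():
        likelihood = 1.0
        for type_name, p in type_probs.items():
            likelihood *= p if type_name in entity_types[entity] else 1.0 - p
        scored.append((count / total * likelihood, entity))
    return sorted(scored, reverse=True)


# Hypothetical data: one ambiguous phrase with two meanings.
trie = PhraseTrie()
trie.add(("jaguar",), "Jaguar (animal)", 70)
trie.add(("jaguar",), "Jaguar Cars", 30)
entity_types = {"Jaguar (animal)": {"animal"}, "Jaguar Cars": {"company"}}

tokens = ["the", "jaguar", "sped", "down", "the", "highway"]
end, candidates = trie.longest_match(tokens, 1)
# Made-up classifier output for the word "jaguar" in this context.
ranked = rank(candidates, entity_types, {"animal": 0.1, "company": 0.9})
```

Here the link frequency alone would favor the animal, but the type probabilities push Jaguar Cars to the top. Because each lookup walks at most the length of the longest stored phrase, the total cost stays linear in the document length when phrase length is bounded. Adding a new entity in this sketch takes only a `trie.add` call and an entry in `entity_types`.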

Next steps

Our approach differs from previous work on this problem in many ways. We're interested in how well end-to-end learning of distributed representations performs in comparison to the type-based inference we developed here. The type systems here were discovered using a small Wikipedia subset; scaling to all of Wikipedia could discover a type system with broad application. We hope you find our code useful!

If you'd like to help push research like this forward, please apply to OpenAI!