
Top Books on Natural Language Processing

Natural Language Processing, or NLP for short, is the study of computational methods for working with speech and text data.

The field is dominated by the statistical paradigm, and machine learning methods are used to develop predictive models.

In this post, you will discover the top books that you can read to get started with natural language processing.

After reading this post, you will know:

  • The top books for practical natural language processing.
  • The top textbooks for the theoretical foundations of natural language processing.
  • The NLP books I have on my shelf.

Let’s get started.


Top Practical Books on Natural Language Processing

As practitioners, we do not always have to reach for a textbook when getting started on a new topic.

Although there are fewer practical books on NLP than textbooks, I have tried to pick the top 3 books that will help you get started and bring NLP methods to your machine learning project.

1. Natural Language Processing with Python

This book provides an introduction to NLP using the Python stack for practitioners.

The book focuses on using the NLTK Python library, which is very popular for common NLP tasks.

Code examples in the book are in the Python programming language.

Contents include:

  1. Language Processing and Python
  2. Accessing Text Corpora and Lexical Resources
  3. Processing Raw Text
  4. Writing Structured Programs
  5. Categorizing and Tagging Words
  6. Learning to Classify Text
  7. Extracting Information from Text
  8. Analyzing Sentence Structure
  9. Building Feature-Based Grammars
  10. Analyzing the Meaning of Sentences
  11. Managing Linguistic Data

This book is perfect if you are looking at getting into classical NLP using the go-to NLTK platform.
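To give a flavor of the classical tasks the book covers, such as processing raw text and counting words, here is a minimal sketch in plain Python. It deliberately uses only the standard library rather than NLTK, so the function names `tokenize` and `word_frequencies` are illustrative assumptions and not the book's code or NLTK's API.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text):
    """Count how often each token occurs in the text."""
    return Counter(tokenize(text))


sample = "The cat sat on the mat. The mat was flat."
freqs = word_frequencies(sample)

# Show the two most frequent tokens with their counts.
print(freqs.most_common(2))
```

A library such as NLTK offers more careful tokenizers than this regular expression, which is one reason to prefer it on real text.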


2. Taming Text

This book provides an introduction to a range of NLP problems and the tools used to tackle them, such as Apache Solr, Apache OpenNLP, and Apache Mahout.

Code examples are in Java.

It may be more suited to developers getting started with larger enterprise-grade NLP tools on work projects.

Notably, Grant Ingersoll, one of the book's authors, is a cofounder of the Apache Mahout project.

Contents include:

  1. Getting Started Taming Text
  2. Foundations of Taming Text
  3. Searching
  4. Fuzzy String Matching
  5. Identifying People, Places and Things
  6. Clustering Text
  7. Classification, Categorization and Tagging
  8. Building an Example Question Answering System
  9. Untaming Text: Exploring the Next Frontier


3. Text Mining with R


This book demonstrates statistical natural language processing methods on a range of modern applications.

Code examples are in R.

The code follows the “tidy” data principles described by Hadley Wickham and uses the tidytext package written by the book's authors.

Of the three books, this is the most recently published and has a more practical and modern feel to the demonstrations.

Contents include:

  1. The Tidy Text Format
  2. Sentiment Analysis with Tidy Data
  3. Analyzing Word and Document Frequency: tf-idf
  4. Relationships Between Words: N-grams and Correlations
  5. Converting to and from Nontidy Formats
  6. Topic Modeling
  7. Case Study: Comparing Twitter Archives
  8. Case Study: Mining NASA Metadata
  9. Case Study: Analyzing Usenet Text
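Chapter 3 in the list above deals with tf-idf. As a rough illustration of the idea, the sketch below computes tf-idf in plain Python using one common definition: term frequency is a term's share of a document's tokens, and inverse document frequency is the natural log of the number of documents divided by the number of documents containing the term. The book itself works in R with tidytext; the function `tf_idf` and the toy documents here are assumptions made for this example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: tf-idf score} dict per document.

    docs is a list of token lists. tf is the term's share of the document's
    tokens; idf is ln(number of documents / documents containing the term).
    """
    n_docs = len(docs)
    # Count, for each term, how many documents contain it at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]
scores = tf_idf(docs)

# Print the highest-scoring term of each document.
for doc_scores in scores:
    print(max(doc_scores, key=doc_scores.get))
```

A word that appears in every document, such as "the" here, gets a score of zero, which is the property that makes tf-idf useful for surfacing terms that characterize a single document.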


Do you know of other great practical books on natural language processing?
Let me know in the comments.

Top Textbooks on Natural Language Processing

There are a ton of textbooks on natural language processing and on specific sub-topics.

In this section, I have tried to focus on what I (and the general consensus) see as the best books on the topic for beginners, e.g. undergraduate or graduate students and practitioners looking to go deeper into the theory.

I have tried to pick a mix of general NLP books as well as books on highly studied topics like translation and speech.

The first two books in this section are essentially canon for NLP students.

1. Foundations of Statistical Natural Language Processing


This book provides an introduction to statistical methods for natural language processing covering both the required linguistics and the newer (at the time, circa 1999) statistical methods.

This book provides a strong foundation to better grasp the newer methods and encodings.

Contents include:

  1. Introduction
  2. Mathematical Foundations
  3. Linguistic Essentials
  4. Corpus-Based Work
  5. Collocations
  6. Statistical Inference: n-gram Models over Sparse Data
  7. Word Sense Disambiguation
  8. Lexical Acquisition
  9. Markov Models
  10. Part-of-Speech Tagging
  11. Probabilistic Context Free Grammars
  12. Probabilistic Parsing
  13. Statistical Alignment and Machine Translation
  14. Clustering
  15. Topics in Information Retrieval
  16. Text Categorization


2. Speech and Language Processing


This book provides coverage of NLP from both speech and text perspectives with a strong focus on applications (one in each chapter).

Coverage of the topic feels exhaustive.

Contents include:

  1. Introduction
  2. Regular Expressions and Automata
  3. Words and Transducers
  4. N-grams
  5. Part-of-Speech Tagging
  6. Hidden Markov and Maximum Entropy Models
  7. Phonetics
  8. Speech Synthesis
  9. Automatic Speech Recognition
  10. Speech Recognition: Advanced Topics
  11. Computational Phonology
  12. Formal Grammars of English
  13. Syntactic Parsing
  14. Statistical Parsing
  15. Features and Unification
  16. Language and Complexity
  17. The Representation of Meaning
  18. Computational Semantics
  19. Lexical Semantics
  20. Computational Lexical Semantics
  21. Computational Discourse
  22. Information Extraction
  23. Question Answering and Summarization
  24. Dialog and Conversational Agents
  25. Machine Translation


4. Statistical Machine Translation

This book provides an introduction to the topic of statistical machine translation, a subfield of NLP.

Contents include:

  1. Introduction
  2. Words, Sentences, Corpora
  3. Probability Theory
  4. Word-Based Models
  5. Phrase-Based Models
  6. Decoding
  7. Language Models
  8. Evaluation
  9. Discriminative Training
  10. Integrating Linguistic Information
  11. Tree-Based Methods


5. Statistical Methods for Speech Recognition


This book provides an introduction to the topic of statistical speech recognition, another subfield of NLP that saw an overhaul in the 1990s with statistical approaches.

Contents include:

  1. The Speech Recognition Problem
  2. Hidden Markov Models
  3. The Acoustic Model
  4. Basic Language Modeling
  5. The Viterbi Search
  6. Hypothesis Search on a Tree and the Fast Match
  7. Elements of Information Theory
  8. The Complexity of Tasks – The Quality of Language Models
  9. The Expectation-Maximization Algorithm and Its Consequences
  10. Decision Trees and Tree Language Models
  11. Phonetics from Orthography: Spelling-to-Base Form Mappings
  12. Triphones and Allophones
  13. Maximum Entropy Probability Estimation and Language Models
  14. Three Applications of Maximum Entropy Estimation to Language Modeling
  15. Estimation of Probabilities from Counts and the Back-Off Method


NLP Books that I Own

I like to have a mixture of practical and reference texts on my shelf.

The hard part of NLP (for me) is simply the large number of sub-problems and the specialized terminology and theory used.

For this reason I have the following 3 NLP textbooks on my shelf:

I also really like the look of:

I recommend choosing the NLP books that are right for you and your needs or project.

Let me know which books you chose or own.
Leave a comment below.


Summary

In this post, you discovered the top books on natural language processing.

Specifically, you learned:

  • The top books for practical natural language processing.
  • The top textbooks for the theoretical foundations of natural language processing.
  • The NLP books I have on my shelf.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

