Extend Watson text classification

Summary

Watson Natural Language Understanding needs multiple training documents to produce good results. In a new subject domain, there is often too little time to create them. For that scenario, the approach in this code pattern augments the results from Natural Language Understanding with a simple configuration JSON file that a domain expert can prepare. This approach can give accurate results without training documents.

Description

In this pattern, we show you how to use the Watson Natural Language Understanding (NLU) service and IBM Watson Studio to augment text classification results when no historical data is available. A configuration JSON document prepared by a domain expert is read as input in IBM Watson Studio. The configuration JSON document can be modified to obtain better results and more insight into the text content.
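As a rough illustration of what such a configuration document might look like, the sketch below loads a small JSON configuration and checks it before use. The schema is an assumption made for this example: the key names `classes`, `keywords`, and `patterns`, and the helper `load_config`, are hypothetical and are not taken from the pattern's notebook.

```python
import json
import re

# Hypothetical configuration. The key names ("classes", "keywords", "patterns")
# are illustrative assumptions, not the schema used by the pattern's notebook.
CONFIG_TEXT = r"""
{
  "classes": {
    "Date": {"patterns": ["\\b\\d{4}-\\d{2}-\\d{2}\\b"]},
    "Priority": {"keywords": ["urgent", "critical"]}
  }
}
"""


def load_config(text):
    """Parse the configuration and fail early on rules that cannot match anything."""
    config = json.loads(text)
    for name, rule in config["classes"].items():
        if not rule.get("keywords") and not rule.get("patterns"):
            raise ValueError(f"class {name!r} defines no keywords or patterns")
        for pattern in rule.get("patterns", []):
            re.compile(pattern)  # raises re.error if a regex is malformed
    return config


config = load_config(CONFIG_TEXT)
print(sorted(config["classes"]))
```

Validating the file when it is loaded means a domain expert who edits it gets an immediate error for an empty or malformed rule, rather than silently missing classifications later.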

When you have completed this pattern, you will understand how to:

  • Create and run a Jupyter Notebook in Watson Studio.
  • Use Watson Studio Object Storage to access data and configuration files.
  • Use the NLU API to extract metadata from a document in Jupyter Notebooks.
  • Extract and format unstructured data using simplified Python functions.
  • Use a configuration file to build configurable and layered classification grammar.
  • Use the combination of grammatical classification and regex patterns from a configuration file to assign word tokens to classes.
  • Store the processed output JSON in Watson Studio Object Storage.
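The combination of grammatical classification and regex patterns can be sketched as follows. This is a minimal example under stated assumptions: the tokens are already paired with part-of-speech tags (as a POS tagger such as NLTK's would produce), and the `RULES` structure and `classify_tokens` function are hypothetical names invented for illustration. A token is assigned to a class only when both its tag and its text match the rule.

```python
import re

# Hypothetical rules: each class lists the POS tags it accepts and a regex
# that the token text must also match. Both conditions are required.
RULES = {
    "Amount": {"pos": ["CD"], "pattern": r"^\d+(\.\d+)?$"},
    "Organization": {"pos": ["NNP"], "pattern": r"^(IBM|Watson)$"},
}


def classify_tokens(tagged_tokens, rules):
    """Return (word, class) pairs for tokens that satisfy a rule's tag and regex."""
    results = []
    for word, tag in tagged_tokens:
        for label, rule in rules.items():
            if tag in rule["pos"] and re.search(rule["pattern"], word):
                results.append((word, label))
                break  # first matching class wins
    return results


# Pre-tagged input, written by hand for this example.
tagged = [("IBM", "NNP"), ("paid", "VBD"), ("250", "CD"), ("dollars", "NNS")]
classified = classify_tokens(tagged, RULES)
print(classified)  # [('IBM', 'Organization'), ('250', 'Amount')]
```

Requiring both conditions is what makes the grammar layered: the tag filter rules out words in the wrong grammatical role, and the regex narrows the remaining candidates to domain-specific values.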

Flow

  1. Documents that require analysis are stored in IBM Cloud Object Storage.
  2. The Python code retrieves the document content from Object Storage along with the configuration JSON.
  3. The document contents are sent to Watson NLU and a response is obtained.
  4. The Python Natural Language Toolkit (NLTK) module is used to parse the document and generate keywords, POS tags, and chunks, based on tag patterns.
  5. The configuration JSON is read, and the domain-specific keywords and attributes are classified.
  6. The response from NLU is augmented with the results from Python code.
  7. The final document classification and attribution is stored in Object Storage for further consumption and processing.
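Steps 5 and 6 above can be sketched as a merge of configuration-driven matches into the service response. This is an illustration only: the `nlu_response` dictionary is a hand-written stand-in shaped loosely like an NLU entities result, and `augment_response`, `class_patterns`, and the `source` field are hypothetical names introduced for this example.

```python
import copy
import re


def augment_response(nlu_response, text, class_patterns):
    """Return a copy of the response with regex-derived entities appended."""
    augmented = copy.deepcopy(nlu_response)  # leave the original response untouched
    entities = augmented.setdefault("entities", [])
    seen = {(e["type"], e["text"]) for e in entities}
    for label, pattern in class_patterns.items():
        for match in re.finditer(pattern, text):
            key = (label, match.group(0))
            if key not in seen:  # skip duplicates of entities already present
                seen.add(key)
                entities.append(
                    {"type": label, "text": match.group(0), "source": "config"}
                )
    return augmented


text = "Ticket INC-1042 was raised by IBM on 2024-01-31."
# Stand-in for a service response; a real one carries more fields.
nlu_response = {"entities": [{"type": "Company", "text": "IBM"}]}
# Hypothetical domain-specific classes, as they might be read from the configuration JSON.
class_patterns = {
    "TicketId": r"\bINC-\d+\b",
    "Date": r"\b\d{4}-\d{2}-\d{2}\b",
}

augmented = augment_response(nlu_response, text, class_patterns)
print([e["type"] for e in augmented["entities"]])
```

Marking the added entries with a `source` field keeps the service's own results distinguishable from those derived from the configuration, which may help when the expert tunes the rules.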

Instructions

Ready to put this code pattern to use? Complete details on how to get started running and using this application are in the README.
