
Build a knowledge graph from documents

Summary

Microsoft Word documents are commonly used in virtually every business. They carry information as raw text, tables, and images, and many of those documents contain facts important to the business. This code pattern addresses the problem of extracting knowledge from the text and tables in domain-specific Word documents. We build a knowledge graph from the extracted knowledge, which makes it queryable. This gives you the best of both worlds: a trained model combined with a rules-based approach to extract knowledge from documents.

Description

One of the biggest challenges in the industry today is making machines understand data in documents the way humans understand a document's context and intent by reading it. The first step towards this goal is to convert the unstructured information (free-floating text and text in tables) into a semi-structured format that can be processed further. That's where graphs play a major role, giving shape and structure to the unstructured information in the documents. This code pattern looks at the problem of extracting knowledge from the text and tables in domain-specific Word documents. A domain-specific knowledge graph is built from the extracted knowledge, and this makes the knowledge queryable. You can use this code pattern to shape your analysis and use the data for further processing to get better insights.

The code pattern demonstrates a way to derive insights from a document containing raw text and tabular information using IBM Cloud, IBM Watson services, the Python package Mammoth, the Python Natural Language Toolkit (NLTK), and IBM Watson Studio.
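To make the extraction step concrete, here is a minimal sketch. Mammoth converts a .docx file to HTML (roughly `mammoth.convert_to_html(docx_file).value`); the parser below, built only on the standard library, then separates table-cell text from free-floating text in that HTML. The class name and the sample HTML are illustrative, not the code pattern's actual implementation.

```python
from html.parser import HTMLParser

class DocxTextExtractor(HTMLParser):
    """Split HTML (as produced by Mammoth from a .docx) into
    table-cell text and free-floating text."""

    def __init__(self):
        super().__init__()
        self.table_cells = []    # text found inside <td>/<th> cells
        self.free_text = []      # text found outside any table
        self._in_cell = False
        self._table_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._table_depth += 1
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "table":
            self._table_depth -= 1
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_cell:
            self.table_cells.append(text)
        elif self._table_depth == 0:
            self.free_text.append(text)

# Hypothetical HTML output for a document with one paragraph and one table
html = "<p>Intro paragraph.</p><table><tr><th>Name</th><td>Acme</td></tr></table>"
parser = DocxTextExtractor()
parser.feed(html)
print(parser.free_text)    # ['Intro paragraph.']
print(parser.table_cells)  # ['Name', 'Acme']
```

Separating the two streams early matters because the downstream rules for tables (header/value pairs) differ from the rules for running prose.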

With this code pattern, you get:

  • The ability to process the tables in .docx files along with the free-floating text
  • A strategy for combining the results of real-time analysis by Watson NLU with the results from rules defined by a subject matter expert
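The second point — combining model output with expert rules — can be sketched as a simple merge where the domain rules take precedence on conflict. The flat `{text: tag}` structure below is an assumption for illustration only; real Watson NLU responses are considerably richer.

```python
def merge_annotations(nlu_entities, rule_entities):
    """Union two sources of entity tags; where both tag the same
    text, the subject-matter-expert rule overrides the model."""
    merged = dict(nlu_entities)   # start with the model's tags
    merged.update(rule_entities)  # domain rules win on conflict
    return merged

# Hypothetical tags from the classifier vs. an SME's rule dictionary
nlu = {"IBM": "Company", "2019-05-01": "Date", "Watson": "Product"}
rules = {"Watson": "ServiceSuite"}   # the SME's domain-specific tag

print(merge_annotations(nlu, rules))
# {'IBM': 'Company', '2019-05-01': 'Date', 'Watson': 'ServiceSuite'}
```

The design choice here is deliberate: a model generalizes to unseen text, while the rules encode domain knowledge the model was never trained on, so letting rules override gives predictable behavior on the terms the expert cares about.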

Flow

(Flow diagram)

  1. Custom Python code extracts the unstructured text data to be analyzed and correlated (free-floating text and tables converted to HTML) from the .docx files.
  2. The text is classified using NLU and tagged using the Extend Watson text classification code pattern.
  3. The text is correlated with other text using the Correlate documents code pattern.
  4. The results are filtered using custom Python code.
  5. The knowledge graph is constructed.
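The final step above can be sketched as building a queryable graph from (subject, relation, object) triples. The triples here are invented examples; in the code pattern they come from the filtered NLU and rule results. This plain-dictionary structure is a minimal stand-in for whatever graph store you choose.

```python
from collections import defaultdict

class KnowledgeGraph:
    """A minimal triple store: subject -> [(relation, object), ...]."""

    def __init__(self):
        self.edges = defaultdict(list)

    def add(self, subj, rel, obj):
        self.edges[subj].append((rel, obj))

    def query(self, subj, rel=None):
        """Return objects linked to subj, optionally filtered by relation."""
        return [o for r, o in self.edges[subj] if rel is None or r == rel]

kg = KnowledgeGraph()
# Hypothetical triples extracted from a document
kg.add("Acme Corp", "headquartered_in", "Berlin")
kg.add("Acme Corp", "founded_in", "1999")
kg.add("Acme Corp", "produces", "Widgets")

print(kg.query("Acme Corp", "headquartered_in"))  # ['Berlin']
print(kg.query("Acme Corp"))                      # ['Berlin', '1999', 'Widgets']
```

This is what "makes the knowledge queryable" means in practice: once facts are edges, a question about an entity reduces to a lookup rather than a re-read of the document.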

Instructions

Find the detailed steps for this pattern in the README. Those steps will show you how to:

  1. Create IBM Cloud services.
  2. Run using a Jupyter Notebook in IBM Watson Studio.
  3. Analyze the results.