
Extracting Addresses from Text with pyltp


  • First, install pyltp
    pyltp project homepage

  • Singleton class (loads the models the first time it is called)

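class Singleton(object):
    def __new__(cls, *args, **kwargs):
        # Create the instance once, then return the same object on every later call.
        if not hasattr(cls, '_the_instance'):
            # object.__new__ takes only the class here; forwarding extra
            # arguments to it raises TypeError on Python 3.
            cls._the_instance = object.__new__(cls)
        return cls._the_instance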
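To see what the singleton buys you, here is a minimal sketch that does not need pyltp at all. `FakeModel` and its `load_count` counter are hypothetical stand-ins for a class that holds expensive models; they are not part of the original project. Note that `__init__` still runs on every call, so the sketch guards the "load" explicitly.

```python
class Singleton(object):
    """Base class that hands back one shared instance per class."""

    def __new__(cls, *args, **kwargs):
        if not hasattr(cls, '_the_instance'):
            cls._the_instance = object.__new__(cls)
        return cls._the_instance


class FakeModel(Singleton):
    """Hypothetical stand-in for a class that loads expensive models."""

    load_count = 0  # how many times the "expensive" load has run

    def __init__(self):
        # __init__ runs on every FakeModel() call, even though __new__
        # returns the same object, so the load has to be guarded.
        if not hasattr(self, 'data'):
            FakeModel.load_count += 1
            self.data = 'loaded'


first = FakeModel()
second = FakeModel()
print(first is second)       # True
print(FakeModel.load_count)  # 1
```

Both names refer to the same object, and the guarded load runs only once no matter how many times the class is called.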
  • Extracting addresses with pyltp
import os
from pyltp import Segmentor, Postagger, NamedEntityRecognizer
from main.models.Singleton import Singleton


class address_extract_model(Singleton):
    # The class body runs once, when this module is first imported,
    # so each model is loaded a single time and shared afterwards.
    print('load ltp model start...')
    pwd = os.getcwd()
    project_path = os.path.abspath(os.path.dirname(pwd) + os.path.sep + ".")
    # Path to the LTP model directory. os.path.join avoids the invalid
    # backslash escapes of a hand-written Windows path.
    LTP_DATA_DIR = os.path.join(project_path, 'AlarmClassification', 'main', 'ltp', 'model')
    cws_model_path = os.path.join(LTP_DATA_DIR, 'cws.model')  # word segmentation model, `cws.model`
    pos_model_path = os.path.join(LTP_DATA_DIR, 'pos.model')  # part-of-speech tagging model, `pos.model`
    ner_model_path = os.path.join(LTP_DATA_DIR, 'ner.model')  # named entity recognition model, `ner.model`
    print('path: ' + cws_model_path)

    segmentor = Segmentor()  # create the instance
    segmentor.load(cws_model_path)  # load the model
    postagger = Postagger()  # create the instance
    postagger.load(pos_model_path)  # load the model
    recognizer = NamedEntityRecognizer()  # create the instance
    recognizer.load(ner_model_path)  # load the model

    def get_model(self):
        return self.segmentor, self.postagger, self.recognizer


def get_address_prediction(alarm_content):
    model = address_extract_model()
    segmentor, postagger, recognizer = model.get_model()
    words = segmentor.segment(alarm_content)  # word segmentation
    postags = postagger.postag(words)  # part-of-speech tagging
    netags = recognizer.recognize(words, postags)  # named entity recognition
    result = ''
    for i in range(0, len(netags)):
        print(words[i] + ': ' + netags[i])
        # Place names carry the Ns tag (for example S-Ns or B-Ns).
        if netags[i].endswith('Ns'):
            result += words[i] + ','
    if len(result) < 1:
        result = 'No address!'
    print(result)
    return result


def get_address(alarm_content):
    print("start get_address...")
    result = "Exception"
    try:
        result = get_address_prediction(alarm_content)
    except Exception as ex:
        print(ex)
    print("Output is " + result)
    return result


# To free the models when they are no longer needed:
# segmentor.release()
# postagger.release()
# recognizer.release()
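The loop above emits every place-tagged word on its own, so an address that spans several words comes out as separate comma-separated pieces. One possible refinement is sketched below. It assumes the recognizer's tags follow the position-prefix scheme (`O`, or `S-`/`B-`/`I-`/`E-` followed by `Nh`, `Ni` or `Ns`). The helper `extract_places` is a hypothetical name, not part of pyltp, and it works on plain lists so it can be tried without loading any model.

```python
def extract_places(words, netags):
    """Join consecutive words tagged as place names (Ns) into whole entities."""
    places = []
    current = []  # words of a multi-word entity still being assembled
    for word, tag in zip(words, netags):
        if not tag.endswith('Ns'):
            current = []  # any non-place tag breaks an open entity
            continue
        position = tag[0]  # 'S', 'B', 'I' or 'E'
        if position == 'S':  # single-word entity
            places.append(word)
            current = []
        elif position == 'B':  # first word of an entity
            current = [word]
        elif position == 'I' and current:  # middle word
            current.append(word)
        elif position == 'E':  # last word: emit the joined entity
            if current:
                current.append(word)
                places.append(''.join(current))
            current = []
    return places


# Hand-written sample tags for illustration, not real model output.
words = ['我', '住', '在', '北京', '海淀', '区', '。']
netags = ['O', 'O', 'O', 'B-Ns', 'I-Ns', 'E-Ns', 'O']
print(extract_places(words, netags))  # ['北京海淀区']
```

Malformed sequences, such as an `E-Ns` with no preceding `B-Ns`, are silently dropped in this sketch; whether that is the right policy depends on how the result is used.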
  • Output

[Screenshot of the program output]

  • Project source (the named-entity extraction code is in main/ltp; the model files must be downloaded from the pyltp project)
    https://github.com/haibincoder/AlarmClassification
