Loading GloVe and Word2Vec Models
阿新 • Published: 2019-01-11
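1 The word vectors pretrained by Google (GoogleNews-vectors-negative300.bin) can be loaded with gensim, but the machine needs enough memory to hold them.
# Load the word vectors trained by Google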
import gensim
model = gensim.models.KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin',binary=True)
print(model['love'])
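To make "enough memory" concrete, here is a rough back-of-the-envelope sketch. It assumes the GoogleNews file holds about 3,000,000 entries of 300 dimensions stored as 32-bit floats; those figures are assumptions rather than something stated in this post, and the helper name `estimate_vector_memory_bytes` is made up for illustration.

```python
def estimate_vector_memory_bytes(vocab_size, dim, bytes_per_value=4):
    """Raw size of a dense vocab_size x dim matrix (float32 by default)."""
    return vocab_size * dim * bytes_per_value


# Assumed figures: 3,000,000 entries x 300 dimensions for the GoogleNews file.
google_news_bytes = estimate_vector_memory_bytes(3_000_000, 300)
google_news_gib = google_news_bytes / 2**30

if __name__ == "__main__":
    print("{} bytes, about {:.2f} GiB".format(google_news_bytes, google_news_gib))
```

This counts only the vector matrix itself; the vocabulary index adds more on top. If memory is tight, `load_word2vec_format` also accepts a `limit` argument that reads only the first N vectors from the file.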
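2 GloVe's pretrained word vectors can also be loaded with gensim. The only difference is one extra step before loading, shown in the code below.
The 300-dimensional GloVe word vectors are about 5.25 GB.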
# To open GloVe vectors with gensim, a header line must be added at the top of the file:
# <number of words> <vector dimension>
import shutil
from sys import platform

import gensim


# Count the lines in the file, i.e. the number of words
def getFileLineNums(filename):
    count = 0
    # GloVe files contain non-ASCII tokens, so read them explicitly as UTF-8
    with open(filename, 'r', encoding='utf-8') as f:
        for _ in f:
            count += 1
    return count


# Open the word-vector file on Linux or Windows and add a line at the start
def prepend_line(infile, outfile, line):
    with open(infile, 'r', encoding='utf-8') as old:
        with open(outfile, 'w', encoding='utf-8') as new:
            new.write(str(line) + "\n")
            shutil.copyfileobj(old, new)


# Same result as prepend_line, but copies the file row by row
def prepend_slow(infile, outfile, line):
    with open(infile, 'r', encoding='utf-8') as fin:
        with open(outfile, 'w', encoding='utf-8') as fout:
            fout.write(line + "\n")
            for row in fin:
                fout.write(row)


def load(filename):
    num_lines = getFileLineNums(filename)
    gensim_file = 'glove_model.txt'
    # Header line: word count and vector dimension (300 for this file)
    gensim_first_line = "{} {}".format(num_lines, 300)
    # Prepends the line.
    if platform == "linux" or platform == "linux2":
        prepend_line(filename, gensim_file, gensim_first_line)
    else:
        prepend_slow(filename, gensim_file, gensim_first_line)
    # Return the model so the caller can actually use it
    return gensim.models.KeyedVectors.load_word2vec_format(gensim_file)


model = load('glove.840B.300d.txt')
The generated glove_model.txt is a model file that gensim can open directly.
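The conversion step can be tried on a toy file without downloading anything. The sketch below is a minimal illustration that uses only the standard library. The names `add_word2vec_header`, `demo`, and the toy file names are hypothetical. Unlike the code above, it reads the vector dimension from the first row instead of hard-coding 300, assuming every row has the form `<token> v1 v2 ... vN` separated by single spaces.

```python
import os
import tempfile


def add_word2vec_header(glove_path, out_path):
    """Copy glove_path to out_path, preceded by a '<count> <dim>' header line."""
    count = 0
    dim = None
    # First pass: count rows and infer the dimension from the first row.
    with open(glove_path, "r", encoding="utf-8") as src:
        for row in src:
            if dim is None:
                # Fields are the token plus N values, so N = fields - 1.
                dim = len(row.rstrip("\n").split(" ")) - 1
            count += 1
    # Second pass: write the header, then copy the rows unchanged.
    with open(glove_path, "r", encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        dst.write("{} {}\n".format(count, dim))
        for row in src:
            dst.write(row)
    return count, dim


def demo():
    # Toy 3-dimensional vectors standing in for a real GloVe file.
    sample = "the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n"
    with tempfile.TemporaryDirectory() as tmp:
        src_path = os.path.join(tmp, "toy_glove.txt")
        out_path = os.path.join(tmp, "toy_word2vec.txt")
        with open(src_path, "w", encoding="utf-8") as f:
            f.write(sample)
        shape = add_word2vec_header(src_path, out_path)
        with open(out_path, "r", encoding="utf-8") as f:
            header = f.readline().strip()
    return shape, header


if __name__ == "__main__":
    print(demo())
```

Inferring the dimension means the same helper should work for the 50-, 100-, or 200-dimensional GloVe files as well. Recent gensim releases (4.0 and later) also accept `no_header=True` in `load_word2vec_format`, which may make the conversion unnecessary. Check the documentation for the installed version before relying on it.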