Word processing in NLP with TensorFlow
Tokenizer
source code: https://github.com/keras-team/keras-preprocessing/blob/master/keras_preprocessing/text.py#L490-L519
some important functions and variables:

- def fit_on_texts(self, texts)  # texts can be a string, a list of strings, or a list of lists of strings
- self.word_index  # a dictionary that maps each word to a unique index
- self.index_word  # the reverse of word_index: maps each index back to its word
sample

```python
import tensorflow as tf
from tensorflow import keras
# the class that tokenizes text
from tensorflow.keras.preprocessing.text import Tokenizer

# transform the words into numbers
sentences = ['i love my dog', 'i love my cat', 'you love my dog!']
tokenizer = Tokenizer(num_words=100)
tokenizer.fit_on_texts(sentences)
word_index = tokenizer.word_index
print(word_index)
# result: {'love': 1, 'my': 2, 'i': 3, 'dog': 4, 'cat': 5, 'you': 6}
```
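To illustrate `index_word`, here is a small sketch (assuming the same sentences as above): it is simply the reverse mapping of `word_index`, so it can decode a sequence of indices back into words.

```python
from tensorflow.keras.preprocessing.text import Tokenizer

sentences = ['i love my dog', 'i love my cat', 'you love my dog!']
tokenizer = Tokenizer(num_words=100)
tokenizer.fit_on_texts(sentences)

# index_word maps index -> word, the reverse of word_index
print(tokenizer.index_word[1])  # 'love'

# it can be used to turn a sequence of indices back into text
decoded = ' '.join(tokenizer.index_word[i] for i in [3, 1, 2, 4])
print(decoded)  # 'i love my dog'
```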
Sequencing
- texts_to_sequences(self, texts)  # transforms each text in texts to a sequence of integers
- tf.keras.preprocessing.sequence.pad_sequences(sequences, maxlen=None, dtype='int32', padding='pre', truncating='pre', value=0.)  # pads the sentences to the same length
sample

```python
sentences = ['i love my dog', 'i love my cat', 'you love my dog!',
             'do you think my dog is amazing']
sequences = tokenizer.texts_to_sequences(sentences)
print(sequences)
# result: [[3, 1, 2, 4], [3, 1, 2, 5], [6, 1, 2, 4], [6, 2, 4]]
# 'do', 'think', 'is', and 'amazing' get no encoding because they
# did not appear in the texts the tokenizer was fitted on
```
To solve this problem, we can set an OOV (out-of-vocabulary) token in the tokenizer to encode words that were not seen during fitting.
```python
tokenizer = Tokenizer(num_words=100, oov_token="<OOV>")
# rerunning the code above now gives:
# [[4, 2, 3, 5], [4, 2, 3, 6], [7, 2, 3, 5], [1, 7, 1, 3, 5, 1, 1]]
```
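Note that the OOV token is inserted at index 1, ahead of every real word. A small sketch (assuming the same three fitting sentences as above) makes this visible:

```python
from tensorflow.keras.preprocessing.text import Tokenizer

sentences = ['i love my dog', 'i love my cat', 'you love my dog!']
tokenizer = Tokenizer(num_words=100, oov_token="<OOV>")
tokenizer.fit_on_texts(sentences)
print(tokenizer.word_index)
# {'<OOV>': 1, 'love': 2, 'my': 3, 'i': 4, 'dog': 5, 'cat': 6, 'you': 7}

# unseen words ('is', 'amazing') all map to the <OOV> index 1
seq = tokenizer.texts_to_sequences(['my dog is amazing'])
print(seq)  # [[3, 5, 1, 1]]
```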
But the sequences still have different lengths, which makes training a neural network difficult, so we need to pad them to the same length.
```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

padded_sequences = pad_sequences(sequences,
                                 padding='post',     # pad on the right
                                 maxlen=5,           # maximum sequence length
                                 truncating='post')  # truncate on the right
padded_sequences
# then we get the result:
# array([[5, 3, 2, 4, 0],
#        [5, 3, 2, 7, 0],
#        [6, 3, 2, 4, 0],
#        [8, 6, 9, 2, 4]])
```
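The sample above overrides the defaults with `padding='post'` and `truncating='post'`; by default `pad_sequences` pads and truncates on the left (`'pre'`). A quick sketch of the default behaviour (the sequences here are made-up examples):

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

seqs = [[3, 1, 2, 4], [6, 2, 4]]

# by default, sequences are padded on the left to the longest length
pre_padded = pad_sequences(seqs)
print(pre_padded)  # [[3 1 2 4] [0 6 2 4]]

# with maxlen set, sequences are also truncated on the left,
# so the end of each sequence is kept
truncated = pad_sequences(seqs, maxlen=3)
print(truncated)  # [[1 2 4] [6 2 4]]
```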