【夢溪筆談 (Mengxi Bitan)】7. TensorFlow Study Notes

阿新 • Published: 2020-11-18

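1. Basic tensor ops (computing the factorization machine (FM) terms)

import tensorflow as tf

# tf.matmul(a, b): matrix product of a and b
# tf.pow(x, y): element-wise x^y
# tf.subtract(x, y): element-wise x - y
# tf.multiply(x, y): element-wise product, not matrix multiplication; corresponding
#   elements of two vectors (or matrices) of the same shape are multiplied, and the
#   result has that same shape
# tf.reduce_sum: https://blog.csdn.net/u012193416/article/details/83349138
# Linear term, shape n x 1: multiply element-wise, sum along axis 1 with reduce_sum
# (keepdims=True keeps the result a column), then add the bias w0
linear_terms = tf.add(w0, tf.reduce_sum(tf.multiply(w, x), 1, keepdims=True))
# Pairwise-interaction term, shape n x 1 (x is n x p, v is k x p)
pair_interactions = 0.5 * tf.reduce_sum(
    tf.subtract(
        tf.pow(tf.matmul(x, tf.transpose(v)), 2),
        tf.matmul(tf.pow(x, 2), tf.transpose(tf.pow(v, 2)))
    ), axis=1, keepdims=True)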
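The pairwise term is written this way for a reason. For one sample with features x_i and factor vectors v_i, the FM interaction is the sum over i < j of <v_i, v_j> x_i x_j. That sum equals 0.5 * sum_f [(sum_i v_fi x_i)^2 - sum_i v_fi^2 x_i^2]. The second form costs O(k*p) instead of O(k*p^2).

Below is a minimal pure-Python check of that identity. It is not TensorFlow code. The random inputs are arbitrary, and v is laid out k x p as in the code above.

```python
import itertools
import random


def fm_pairwise_naive(x, v):
    """Sum over i < j of <v_i, v_j> * x_i * x_j, with v[f][i] = factor f of feature i."""
    k, p = len(v), len(x)
    total = 0.0
    for i, j in itertools.combinations(range(p), 2):
        dot = sum(v[f][i] * v[f][j] for f in range(k))
        total += dot * x[i] * x[j]
    return total


def fm_pairwise_fast(x, v):
    """0.5 * sum_f [(sum_i v_fi x_i)^2 - sum_i (v_fi x_i)^2], the form used in the TF code."""
    total = 0.0
    for row in v:  # one row per latent factor f
        s = sum(vf_i * x_i for vf_i, x_i in zip(row, x))
        sq = sum((vf_i * x_i) ** 2 for vf_i, x_i in zip(row, x))
        total += s * s - sq
    return 0.5 * total


rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(5)]                      # p = 5 features
v = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(3)]  # k = 3 factors
naive = fm_pairwise_naive(x, v)
fast = fm_pairwise_fast(x, v)
print(naive, fast)  # equal up to floating-point rounding
```

The identity follows from (sum_i a_i)^2 = sum_i a_i^2 + 2 * sum_{i<j} a_i a_j with a_i = v_fi x_i. Subtracting the squares and halving leaves only the cross terms.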

2. About tf.truncated_normal()

Reference: https://blog.csdn.net/qq_36512295/article/details/100599979

tf.truncated_normal(shape, mean, stddev)
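Meaning: generates random numbers from a truncated normal distribution. Any sample that lies more than two standard deviations from the mean is discarded and re-drawn.

shape: the shape of the output tensor
mean: the mean
stddev: the standard deviation

A truncated normal distribution is a normal distribution restricted to an interval. The bound can be an upper limit, a lower limit, or both. Here the interval is [mean - 2*stddev, mean + 2*stddev].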
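The re-drawing rule can be made concrete with a minimal pure-Python sketch of truncated normal sampling by rejection. It only illustrates the semantics described above; it is not how TensorFlow implements the op. The helper names truncated_normal and _reshape are hypothetical.

```python
import random


def truncated_normal(shape, mean=0.0, stddev=1.0, seed=None):
    """Draw samples from N(mean, stddev^2), re-drawing any sample more than 2*stddev from the mean."""
    rng = random.Random(seed)
    count = 1
    for dim in shape:
        count *= dim
    flat = []
    while len(flat) < count:
        sample = rng.gauss(mean, stddev)
        if abs(sample - mean) <= 2 * stddev:  # keep only samples inside the interval
            flat.append(sample)
    return _reshape(flat, shape)


def _reshape(flat, shape):
    """Turn a flat list into nested lists with the given shape."""
    if not shape:
        return flat[0]  # scalar shape ()
    if len(shape) == 1:
        return flat
    step = len(flat) // shape[0]
    return [_reshape(flat[i * step:(i + 1) * step], shape[1:]) for i in range(shape[0])]


weights = truncated_normal((2, 3), mean=0.0, stddev=0.1, seed=42)
print(weights)  # a 2 x 3 nested list; every value lies in [-0.2, 0.2]
```

About 95% of normal samples fall within two standard deviations, so the loop rarely needs to re-draw.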

3. Explaining sess.run in TensorFlow

For example:

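import tensorflow as tf  # TF 1.x graph-mode API

a = tf.add(2, 5)        # a would normally evaluate to 7
b = tf.multiply(a, 3)   # b = 21
sess = tf.Session()
replace_dict = {a: 15}  # feed 15 in place of a's computed value
print(sess.run(b, feed_dict=replace_dict))  # the fed a replaces the computed a, so the result is 15 x 3 = 45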
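Why feeding a non-placeholder tensor overrides it can be shown with a toy model in plain Python (not TensorFlow). A node found in the feed dict returns the fed value, and its own inputs are never evaluated. The classes Const and Op and the function run are hypothetical illustrations of the idea, not TF's implementation.

```python
class Const:
    """A constant node in a toy computation graph."""
    def __init__(self, value):
        self.value = value


class Op:
    """An operation node: applies fn to the values of its input nodes."""
    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs


def run(node, feed_dict=None):
    """Evaluate node; any node found in feed_dict uses the fed value instead of being computed."""
    feed_dict = feed_dict or {}
    if node in feed_dict:
        return feed_dict[node]
    if isinstance(node, Const):
        return node.value
    return node.fn(*(run(inp, feed_dict) for inp in node.inputs))


a = Op(lambda x, y: x + y, Const(2), Const(5))  # like tf.add(2, 5)
b = Op(lambda x, y: x * y, a, Const(3))         # like tf.multiply(a, 3)
print(run(b))           # 21
print(run(b, {a: 15}))  # 45: a's subgraph is never evaluated
```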

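4. tf.feature_column in detail

Reference: https://blog.csdn.net/kangshuangzhu/article/details/106851826

An earlier post, "Reading and writing TFRecord under TensorFlow 2.0, and the difference between tf.io.parse_example and tf.io.parse_single_example", explained how data is read from a TFRecord. Reading requires a dict that gives each feature's name and type. With only a few features, this dict can simply be written by hand. With a great many features, a more convenient tool is needed to generate it, and that tool is tf.feature_column. tf.feature_column is also a feature-engineering tool: it can handle one-hot encoding, hash bucketing and similar transformations automatically.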

categorical_column_with_vocabulary_list

For categorical features with only a few possible values, such as province.

city = tf.feature_column.categorical_column_with_vocabulary_list("city",["shanghai","beijing","guangzhou","tianjin","shenzhen"])
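Conceptually, this column assigns each string its position in the list as its ID. The sketch below shows that mapping in pure Python. It assumes the TF default default_value=-1 for out-of-vocabulary values when num_oov_buckets=0. The helper name vocab_lookup is hypothetical.

```python
def vocab_lookup(values, vocabulary_list, default_value=-1):
    """Map each value to its index in vocabulary_list; unknown values get default_value."""
    index = {word: i for i, word in enumerate(vocabulary_list)}
    return [index.get(value, default_value) for value in values]


cities = ["shanghai", "beijing", "guangzhou", "tianjin", "shenzhen"]
print(vocab_lookup(["beijing", "shenzhen", "hangzhou"], cities))  # [1, 4, -1]
```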

categorical_column_with_identity

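This column is for sparse features that are already integer-encoded. For example, there may be a huge number of shop IDs. If every shop ID has already been encoded as an integer starting from 0, this column can be used directly (in effect, label encoding has been done beforehand).

# num_buckets is the number of IDs: valid IDs lie in [0, num_buckets), so the largest ID is num_buckets - 1.
# With default_value set, out-of-range IDs are replaced by that value.
poi = tf.feature_column.categorical_column_with_identity("poi", num_buckets=10, default_value=0)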
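The sketch below models the ID handling for this column as I understand the TF documentation:

* IDs in [0, num_buckets) pass through unchanged.
* With default_value set, out-of-range IDs are replaced by default_value.
* Without default_value, negative IDs are dropped and IDs >= num_buckets are an error.

The helper name identity_ids is hypothetical, and this is plain Python rather than TF.

```python
def identity_ids(values, num_buckets, default_value=None):
    """Pass IDs in [0, num_buckets) through unchanged; handle out-of-range IDs."""
    out = []
    for value in values:
        if 0 <= value < num_buckets:
            out.append(value)
        elif default_value is not None:
            out.append(default_value)  # out-of-range ID replaced by default_value
        elif value < 0:
            continue                   # without default_value, negative IDs are dropped
        else:
            raise ValueError(f"id {value} is outside [0, {num_buckets})")
    return out


print(identity_ids([3, 9, 10, -2], num_buckets=10, default_value=0))  # [3, 9, 0, 0]
print(identity_ids([3, -1], num_buckets=10))                          # [3]
```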

categorical_column_with_vocabulary_file

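When a sparse feature has a huge number of distinct values, categorical_column_with_vocabulary_list is impractical. categorical_column_with_identity, in turn, requires the feature to be encoded beforehand. In that case, tf.feature_column.categorical_column_with_vocabulary_file can read all possible values of the feature from a file. This approach is also fairly inefficient, though, and not worthwhile in online serving where low latency is required.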

tf.feature_column.categorical_column_with_vocabulary_file(
    key, vocabulary_file, vocabulary_size=None, dtype=tf.dtypes.string,
    default_value=None, num_oov_buckets=0
)
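The sketch below models the lookup this column performs, assuming:

* The vocabulary file holds one entry per line.
* Only the first vocabulary_size lines are used.
* With num_oov_buckets > 0, out-of-vocabulary values are hashed into [vocabulary_size, vocabulary_size + num_oov_buckets).
* Otherwise, out-of-vocabulary values get default_value (-1).

It is a plain-Python illustration. zlib.crc32 stands in for TensorFlow's own hash, so real OOV bucket IDs will differ. The helper vocab_file_ids and the file poi_vocab.txt are hypothetical.

```python
import os
import tempfile
import zlib


def vocab_file_ids(values, vocabulary_file, vocabulary_size=None,
                   default_value=-1, num_oov_buckets=0):
    """Map values to IDs using a vocabulary file with one entry per line."""
    with open(vocabulary_file, encoding="utf-8") as f:
        words = [line.rstrip("\n") for line in f]
    if vocabulary_size is not None:
        words = words[:vocabulary_size]  # only the first vocabulary_size lines are used
    index = {word: i for i, word in enumerate(words)}
    ids = []
    for value in values:
        if value in index:
            ids.append(index[value])
        elif num_oov_buckets > 0:
            # OOV values are hashed into [len(words), len(words) + num_oov_buckets);
            # zlib.crc32 is a stand-in for TensorFlow's own hash function
            ids.append(len(words) + zlib.crc32(value.encode("utf-8")) % num_oov_buckets)
        else:
            ids.append(default_value)
    return ids


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "poi_vocab.txt")  # hypothetical vocabulary file
    with open(path, "w", encoding="utf-8") as f:
        f.write("shop_a\nshop_b\nshop_c\n")
    ids_default = vocab_file_ids(["shop_b", "shop_z"], path)
    ids_oov = vocab_file_ids(["shop_b", "shop_z"], path, num_oov_buckets=2)

print(ids_default)  # [1, -1]
print(ids_oov)      # shop_b -> 1, shop_z -> 3 or 4 (an OOV bucket)
```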

categorical_column_with_hash_bucket

If a sparse feature has a very large number of values, the poi column above can instead be written as:

poi = tf.feature_column.categorical_column_with_hash_bucket("poi", hash_bucket_size=10, dtype=tf.dtypes.int64)

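Note, however, that hash_bucket_size should leave plenty of headroom, or hash collisions become very likely. In this example there are only 3 shops, yet even with hash_bucket_size set to 10 a hash collision still occurred. As a result, some of the poi information was lost.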
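Ten buckets are not enough for 3 values. With a uniform hash, the chance that n values in m buckets produce at least one collision is 1 - m(m-1)...(m-n+1)/m^n. For n = 3 and m = 10 that is 1 - 0.72 = 0.28.

The sketch below computes that probability and applies the bucketing rule hash(str(value)) % hash_bucket_size. Integer inputs are converted to strings before hashing, as TF documents. It uses zlib.crc32 as a stand-in for TensorFlow's own fingerprint hash, so actual bucket IDs will differ from TF's. The shop IDs and helper names are hypothetical.

```python
import zlib


def hash_bucket_ids(values, hash_bucket_size):
    """Bucket ID = hash(str(value)) % hash_bucket_size; ints are stringified first."""
    # zlib.crc32 stands in for TensorFlow's fingerprint hash, so IDs differ from TF's
    return [zlib.crc32(str(value).encode("utf-8")) % hash_bucket_size for value in values]


def collision_probability(n_values, n_buckets):
    """P(at least two of n_values share a bucket), assuming a uniform hash."""
    p_all_distinct = 1.0
    for i in range(n_values):
        p_all_distinct *= (n_buckets - i) / n_buckets
    return 1.0 - p_all_distinct


shop_ids = [1001, 1002, 1003]  # hypothetical poi values
print(hash_bucket_ids(shop_ids, hash_bucket_size=10))
print(collision_probability(3, 10))    # about 0.28
print(collision_probability(3, 1000))  # about 0.003
```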

feature_column.indicator_column

tf.feature_column.indicator_column is a one-hot tool. It converts the output of the categorical_column_with_* functions into a one-hot encoding.

tf.feature_column.indicator_column takes only one argument: the result of a categorical_column_with_* function.

poi = tf.feature_column.categorical_column_with_hash_bucket("poi", hash_bucket_size=15, dtype=tf.dtypes.int64)
poi_idc = tf.feature_column.indicator_column(poi)
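A plain-Python model of what indicator_column produces, as I understand it:

* Each ID sets its position in a vector of length num_buckets, which for the poi column above would be hash_bucket_size = 15.
* A multivalent feature yields a multi-hot vector of counts.
* IDs such as -1 are skipped.

The helper name indicator is hypothetical.

```python
def indicator(ids, num_buckets):
    """Multi-hot vector: position i holds how many times ID i occurs in ids."""
    vector = [0] * num_buckets
    for i in ids:
        if 0 <= i < num_buckets:  # IDs such as -1 (missing / out of vocabulary) are skipped
            vector[i] += 1
    return vector


print(indicator([3], num_buckets=5))        # [0, 0, 0, 1, 0]  (single value: one-hot)
print(indicator([1, 4, 1], num_buckets=5))  # [0, 2, 0, 0, 1]  (multivalent: counts)
```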

feature_column.embedding_column

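Produces the embedded (dense) tensor.

categorical_column: the result of a categorical_column_with_* function

dimension: the embedding dimension

combiner: how to combine the multiple values of a multivalent sparse feature; currently 'mean', 'sqrtn' and 'sum' are supported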

tf.feature_column.embedding_column(
    categorical_column, dimension, combiner='mean', initializer=None,
    ckpt_to_load_from=None, tensor_name_in_ckpt=None, max_norm=None, trainable=True,
    use_safe_embedding_lookup=True
)
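The sketch below shows what the combiner does, assuming unit weights for every ID:

* Each ID selects one row of an embedding table with `dimension` columns.
* 'sum' adds the rows.
* 'mean' divides the sum by the number of IDs.
* 'sqrtn' divides the sum by the square root of the number of IDs.

According to the TensorFlow docs, when initializer is None the table is initialized with a truncated normal (mean 0, stddev 1/sqrt(dimension)); that is the function from section 2. The table values below are hypothetical, returning a zero vector for an empty ID list is a choice of this sketch, and the code is plain Python rather than TF.

```python
import math


def embedding_lookup(table, ids, combiner="mean"):
    """Look up one row of table per ID and combine them into a single vector."""
    dim = len(table[0])
    rows = [table[i] for i in ids]
    if not rows:
        return [0.0] * dim  # no IDs: return a zero vector (a choice of this sketch)
    summed = [sum(row[d] for row in rows) for d in range(dim)]
    if combiner == "sum":
        return summed
    if combiner == "mean":
        return [s / len(rows) for s in summed]
    if combiner == "sqrtn":
        # with unit weights, sqrtn divides the sum by sqrt(number of IDs)
        return [s / math.sqrt(len(rows)) for s in summed]
    raise ValueError(f"unsupported combiner: {combiner}")


# hypothetical 4-bucket table with dimension=2; in TF it is a trainable variable
table = [[1.0, 2.0], [0.5, 0.5], [3.0, 4.0], [-1.0, 0.0]]
print(embedding_lookup(table, [0, 2], "sum"))    # [4.0, 6.0]
print(embedding_lookup(table, [0, 2], "mean"))   # [2.0, 3.0]
print(embedding_lookup(table, [0, 2], "sqrtn"))  # about [2.83, 4.24]
```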
