
Factorization Machines: Introduction and Code Implementation

Introduction

Factorization Machines (FM) combine the strengths of SVMs with those of factorization models: even on highly sparse data they can still estimate interactions between features, and the model can be computed and optimized in linear time.

Advantages

  1. Parameters can be estimated reliably even from very sparse data.

  2. The FM model equation can be computed in linear time.

  3. FM is a general predictor: it works with any real-valued feature vector.

Feature vector example

[Figure: example feature vectors built from user, movie, other-movie-rating, time and last-movie-rated indicators, with the rating as target]

Algorithm principles

  1. Model equation: the FM model combines a linear part (bias plus per-feature weights) with factorized pairwise interactions; the equation is written out below.
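For reference, the degree-2 FM model equation (as given in Rendle's original paper) is

\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle \, x_i x_j,
\qquad
\langle v_i, v_j \rangle = \sum_{f=1}^{k} v_{i,f} \, v_{j,f}

where w_0 is the global bias, w_i the weight of feature i, and v_i the k-dimensional factor vector of feature i.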
  2. Expressiveness:

For any symmetric positive semi-definite matrix W there exists a matrix V such that W = V·V^T, provided the number of columns k of V is chosen large enough; in other words, with a suitable k the FM can express any such interaction matrix W. When the data is very sparse, however, there is not enough information to estimate a full W directly, so a small k is chosen instead and W is approximated through W = V·V^T.
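As a quick numerical illustration (a minimal NumPy sketch, not part of the original post): a symmetric positive semi-definite W factors exactly as V·V^T when k equals the dimension of W, e.g. via its eigendecomposition, while a small k only approximates it.

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 6))
W = A @ A.T                              # a symmetric positive semi-definite "interaction" matrix

# Eigendecomposition W = U diag(lam) U^T, so V = U * sqrt(lam) gives W = V V^T exactly
lam, U = np.linalg.eigh(W)
V = U * np.sqrt(np.clip(lam, 0, None))
print(np.allclose(W, V @ V.T))           # True: full-rank V reproduces W

# With a small k (few columns of V) we only approximate W -- this is the FM setting under sparsity
k = 2
V_k = U[:, -k:] * np.sqrt(lam[-k:])      # keep the k largest eigen-directions
print(np.linalg.norm(W - V_k @ V_k.T))   # approximation error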

  3. Parameter Estimation Under Sparsity:

Because FM factorizes the interaction parameters, it breaks their mutual independence: the factor vectors are shared across interactions, so the data observed for one interaction also helps estimate the parameters of related interactions.

  4. Computation:
    Evaluated naively, the pairwise interaction term of the model equation has time complexity O(k·n²), since every pair of features has to be considered.

However, the pairwise interactions can be rewritten so that they are computable in linear time, O(k·n); the reformulated formula and the first step of its derivation are given below.
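Written out, the linear-time reformulation (implemented in both code listings below) is

\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j
= \frac{1}{2} \sum_{f=1}^{k} \left[ \Big( \sum_{i=1}^{n} v_{i,f} x_i \Big)^2 - \sum_{i=1}^{n} v_{i,f}^2 x_i^2 \right]

The first step of the derivation uses the identity (\sum_i a_i)^2 = \sum_i \sum_j a_i a_j with a_i = v_{i,f} x_i: the full double sum counts every i < j pair twice plus the diagonal terms, so subtracting the diagonal and halving leaves exactly the pairs with i < j. Each bracket is a sum of n terms evaluated for each of the k factors, hence the O(k·n) complexity.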

FM as Predictor

  • Regression

  • Binary classification

  • Ranking

In all of these settings, L2 regularization can be applied to the model parameters to prevent overfitting; a small sketch of the classification variant follows.
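As an example (a minimal sketch, not part of the original code), the FM output can be turned into a binary classifier in the TensorFlow 1.x style used below by replacing the squared error with a sigmoid log loss; this assumes the y_hat, y and l2_norm tensors defined in the code further down and labels in {0, 1}.

# Hypothetical classification head on top of the FM output y_hat (not in the original post)
logits = y_hat                                   # raw FM score
cls_loss = tf.add(
    tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits)),
    l2_norm)                                     # same L2 penalty as in the regression version
prob = tf.sigmoid(logits)                        # P(y = 1 | x)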

Learning FM

The model parameters are learned with (stochastic) gradient descent; the gradient of the model equation with respect to each parameter is given below, and in the code the optimizer computes it automatically.
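For completeness, the partial derivatives of the model equation (as given in the FM paper) are

\frac{\partial \hat{y}(x)}{\partial \theta} =
\begin{cases}
1 & \text{if } \theta \text{ is } w_0 \\
x_i & \text{if } \theta \text{ is } w_i \\
x_i \sum_{j=1}^{n} v_{j,f} x_j - v_{i,f} x_i^2 & \text{if } \theta \text{ is } v_{i,f}
\end{cases}

The sum \sum_j v_{j,f} x_j does not depend on i, so it can be precomputed once per factor f; in the TensorFlow code below these gradients are not written by hand but obtained automatically by the optimizer.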

Code implementation

Toy-data version (TensorFlow 1.x):


import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder / tf.Session)

x_data = np.matrix([
    # Users | Movies | Movie Ratings | Time | Last Movies Rated
    # A B C | TI NH SW ST | TI NH SW ST |    | TI NH SW ST
    [1, 0, 0, 1, 0, 0, 0, 0.3, 0.3, 0.3, 0, 13, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0, 0, 0.3, 0.3, 0.3, 0, 14, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 1, 0, 0.3, 0.3, 0.3, 0, 16, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 0, 0, 0, 0.5, 0.5, 5, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 0, 0, 0.5, 0.5, 8, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0, 0, 0.5, 0, 0.5, 0, 9, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0.5, 0, 0.5, 0, 12, 1, 0, 0, 0]
])

# ratings
y_data = np.array([5, 3, 1, 4, 5, 1, 5])

# Let's add an axis to make tensorflow happy.
y_data.shape += (1, )

n, p = x_data.shape

# number of latent factors
k = 5

# design matrix
X = tf.placeholder('float32', [n, p])
# target vector
y = tf.placeholder('float32', [n, 1])

# bias and weights
w0 = tf.Variable(tf.zeros([1]))
W = tf.Variable(tf.zeros([p]))

# interaction factors, randomly initialized
V = tf.Variable(tf.random_normal([k, p], stddev=0.01))

# estimate of y, initialized to 0.
y_hat = tf.Variable(tf.zeros([n, 1]))

# linear part: w0 + sum_i W_i * x_i
linear_terms = tf.add(w0,
                      tf.reduce_sum(
                          tf.multiply(W, X), 1, keepdims=True))

# pairwise interactions via the O(k*n) reformulation:
# 0.5 * sum_f [ (X V^T)^2 - (X^2)(V^2)^T ]
interactions = (tf.multiply(0.5,
                            tf.reduce_sum(
                                tf.subtract(
                                    tf.pow(tf.matmul(X, tf.transpose(V)), 2),
                                    tf.matmul(tf.pow(X, 2), tf.transpose(tf.pow(V, 2)))),
                                1, keepdims=True)))

y_hat = tf.add(linear_terms, interactions)

# L2 regularized sum of squares loss function over W and V
lambda_w = tf.constant(0.001, name='lambda_w')
lambda_v = tf.constant(0.001, name='lambda_v')

l2_norm = (tf.reduce_sum(
    tf.add(
        tf.multiply(lambda_w, tf.pow(W, 2)),
        tf.multiply(lambda_v, tf.pow(V, 2)))))

error = tf.reduce_mean(tf.square(tf.subtract(y, y_hat)))
loss = tf.add(error, l2_norm)

eta = tf.constant(0.1)
optimizer = tf.train.AdagradOptimizer(eta).minimize(loss)

# that's a lot of iterations
N_EPOCHS = 1000

# Launch the graph.
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(N_EPOCHS):
        # indices = np.arange(n)
        # np.random.shuffle(indices)
        # x_data, y_data = x_data[indices], y_data[indices]
        sess.run(optimizer, feed_dict={X: x_data, y: y_data})

    print('MSE: ', sess.run(error, feed_dict={X: x_data, y: y_data}))
    print('Loss (regularized error):', sess.run(loss, feed_dict={X: x_data, y: y_data}))
    print('Predictions:', sess.run(y_hat, feed_dict={X: x_data, y: y_data}))
    print('Learnt weights:', sess.run(W, feed_dict={X: x_data, y: y_data}))
    print('Learnt factors:', sess.run(V, feed_dict={X: x_data, y: y_data}))

Full-data version: uses the ua.base / ua.test splits of the MovieLens 100K dataset.


from scipy.sparse import csr
import pandas as pd
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder / tf.Session)


def vectorize_dic(dic, ix=None, p=None, n=0, g=0):
    """
    dic -- dictionary of feature lists. Keys are the name of features
    ix -- index generator (default None)
    p -- dimension of feature space (number of columns in the sparse matrix) (default None)
    n -- number of rows of the resulting sparse matrix
    g -- number of feature groups (here 2: users and items)
    """
    if ix is None:
        ix = dict()

    nz = n * g
    col_ix = np.empty(nz, dtype=int)

    i = 0
    numofUsers = 0
    flag = True
    for k, lis in dic.items():
        for t in range(len(lis)):
            if k == 'users':
                ix[str(lis[t]) + str(k)] = ix.get(str(lis[t]) + str(k), lis[t] - 1)
            elif k == 'items':
                if flag == True:
                    numofUsers = len(ix)
                    flag = False
                ix[str(lis[t]) + str(k)] = lis[t] - 1 + numofUsers
            col_ix[i + t * g] = ix[str(lis[t]) + str(k)]
        i += 1

    row_ix = np.repeat(np.arange(0, n), g)
    data = np.ones(nz)
    if p is None:
        p = len(ix)

    ixx = np.where(col_ix < p)
    # data, row_ind and col_ind satisfy the relationship a[row_ind[k], col_ind[k]] = data[k].
    return csr.csr_matrix((data[ixx], (row_ix[ixx], col_ix[ixx])), shape=(n, p)), ix


def batcher(X_, y_=None, batch_size=-1):
    n_samples = X_.shape[0]
    if batch_size == -1:
        batch_size = n_samples
    if batch_size < 1:
        raise ValueError('Parameter batch_size={} is unsupported'.format(batch_size))

    for i in range(0, n_samples, batch_size):
        upper_bound = min(i + batch_size, n_samples)
        ret_x = X_[i:upper_bound]
        if y_ is not None:
            ret_y = y_[i:upper_bound]
            yield (ret_x, ret_y)


cols = ['user', 'item', 'rating', 'timestamp']
train = pd.read_csv('ua.base', delimiter='\t', names=cols)
test = pd.read_csv('ua.test', delimiter='\t', names=cols)

x_train, ix = vectorize_dic({'users': train['user'].values,
                             'items': train['item'].values}, n=len(train.index), g=2)
x_test, ix = vectorize_dic({'users': test['user'].values,
                            'items': test['item'].values}, ix, x_train.shape[1], n=len(test.index), g=2)

y_train = train['rating'].values
y_test = test['rating'].values

x_train = x_train.todense()
x_test = x_test.todense()

n, p = x_train.shape

# number of latent factors
k = 10

x = tf.placeholder('float', [None, p])
y = tf.placeholder('float', [None, 1])

# bias, linear weights and interaction factors
w0 = tf.Variable(tf.zeros([1]))
w = tf.Variable(tf.zeros([p]))
v = tf.Variable(tf.random_normal([k, p], mean=0, stddev=0.01))

# y_hat = tf.Variable(tf.zeros([n,1]))

# linear part, shape n * 1
linear_terms = tf.add(w0, tf.reduce_sum(tf.multiply(w, x), 1, keepdims=True))

# pairwise interactions via the O(k*n) reformulation
pair_interactions = 0.5 * tf.reduce_sum(
    tf.subtract(
        tf.pow(
            tf.matmul(x, tf.transpose(v)), 2),
        tf.matmul(tf.pow(x, 2), tf.transpose(tf.pow(v, 2)))
    ), axis=1, keepdims=True)

y_hat = tf.add(linear_terms, pair_interactions)

lambda_w = tf.constant(0.001, name='lambda_w')
lambda_v = tf.constant(0.001, name='lambda_v')

l2_norm = tf.reduce_sum(
    tf.add(
        tf.multiply(lambda_w, tf.pow(w, 2)),
        tf.multiply(lambda_v, tf.pow(v, 2))
    )
)

error = tf.reduce_mean(tf.square(y - y_hat))
loss = tf.add(error, l2_norm)

train_op = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(loss)

epochs = 10
batch_size = 1000

# Launch the graph
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(epochs):
        perm = np.random.permutation(x_train.shape[0])
        # iterate over batches
        for bX, bY in batcher(x_train[perm], y_train[perm], batch_size):
            _, t = sess.run([train_op, loss], feed_dict={x: bX.reshape(-1, p), y: bY.reshape(-1, 1)})
            print(t)

    print('MSE: ', sess.run(error, feed_dict={x: x_test.reshape(-1, p), y: y_test.reshape(-1, 1)}))
    print('Predictions:', sess.run(y_hat, feed_dict={x: x_test.reshape(-1, p), y: y_test.reshape(-1, 1)}))