【Python】Finding Users with Similar Movie Taste and Recommending Related Movies

阿新 • Published: 2019-01-06

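Process:

Crawl user information from Douban Movie (豆瓣電影) with a web crawler
Define movie rating levels with a multi-level classification
Compute the Pearson correlation between myself and each user
User-based similarity analysis: find like-minded people to discover items they may like
Item-based similarity analysis: find similar items to discover potential customers (like Amazon's "Customers who bought this item also bought")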

Multi-level classification of movie ratings:
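The rating scale itself is not shown here. As a sketch only, assuming the levels follow Douban's five-star labels (1 很差 very poor, 2 較差 poor, 3 還行 okay, 4 推薦 recommended, 5 力薦 highly recommended), an integer score could be mapped to its level like this. RATING_LEVELS and rating_level are hypothetical names, not part of the original script.

```python
# Hypothetical mapping, ASSUMING the scale mirrors Douban's five-star labels;
# the author's own classification table is not available.
RATING_LEVELS = {
    1: "很差 (very poor)",
    2: "較差 (poor)",
    3: "還行 (okay)",
    4: "推薦 (recommended)",
    5: "力薦 (highly recommended)",
}


def rating_level(score):
    """Return the level label for an integer score from 1 to 5."""
    if score not in RATING_LEVELS:
        raise ValueError("score must be an integer from 1 to 5, got %r" % (score,))
    return RATING_LEVELS[score]


print(rating_level(4))
```

Rejecting out-of-range scores early keeps bad crawled values out of the similarity calculations later on.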

Entering user information:

# Python 3: source files and str are Unicode by default, so the Python 2
# reload(sys) / sys.setdefaultencoding("utf-8") hack is no longer needed.
import json

user_info = {}

# Scraped data: each user's scores, in the order of MOVIES below
user_dict = {
    'ns2250225': [4, 3, 4, 5, 4],
    'justin': [3, 4, 3, 4, 2],
    'totox': [2, 3, 5, 1, 4],
    'fabrice': [4, 1, 3, 4, 5],
    'doreen': [3, 4, 2, 5, 3],
}

# Gone Girl, The Hobbit 3, Wood Job!, Titanic, Léon: The Professional
MOVIES = ['消失的愛人', '霍比特人3', '神去村', '泰坦尼克號', '這個殺手不太冷']


# Load the scraped scores into user_info as {user: {movie: score}}
def user_data(user_dict):
    for name, scores in user_dict.items():
        user_info[name] = dict(zip(MOVIES, scores))


user_data(user_dict)

# Save the user data as tab-separated text (UTF-8 for the Chinese titles)
try:
    with open('user_data.txt', 'w', encoding='utf-8') as data:
        for key in user_info:
            data.write(key)
            for key2 in user_info[key]:
                data.write('\t')
                data.write(key2)
                data.write('\t\t')
                data.write(str(user_info[key][key2]))
                data.write('\n')
            data.write('\n')
except IOError as err:
    print('File error: ' + str(err))
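The tab-separated file written above is easy to read by eye but awkward to parse back. If it only needs to be reloaded by Python, a JSON file round-trips the nested dictionary directly. A minimal sketch: save_prefs, load_prefs and the file name user_data.json are hypothetical, and the two users are copied from the example data.

```python
import json
import os
import tempfile

# Same shape as user_info: {user: {movie: score}} (two users from the example)
user_info = {
    "fabrice": {"消失的愛人": 4, "神去村": 3},
    "doreen": {"消失的愛人": 3, "神去村": 2},
}


def save_prefs(prefs, path):
    # ensure_ascii=False keeps the Chinese titles readable in the file
    with open(path, "w", encoding="utf-8") as f:
        json.dump(prefs, f, ensure_ascii=False, indent=2)


def load_prefs(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)


path = os.path.join(tempfile.mkdtemp(), "user_data.json")
save_prefs(user_info, path)
print(load_prefs(path) == user_info)
```

JSON object keys are always strings, which is fine here because user names and movie titles already are; the integer scores come back as integers.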

Compute the Pearson correlation coefficient to find users with similar interests (after inserting my own ratings):
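For reference, sim_pearson below evaluates the computational form of Pearson's r over the n movies both users have rated, with x_i and y_i the two users' scores: pSum is the sum of x_i y_i, sum1 and sum2 are the sums of x_i and y_i, and sum1Sq and sum2Sq are the sums of squares. When a factor under the square root is 0 (one user gave every shared movie the same score), r is undefined and the code returns 0.

```latex
r = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n}\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}
         {\sqrt{\left(\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\Big(\sum_{i=1}^{n} x_i\Big)^2\right)
                \left(\sum_{i=1}^{n} y_i^2 - \frac{1}{n}\Big(\sum_{i=1}^{n} y_i\Big)^2\right)}}
```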

from math import sqrt


# Pearson correlation between two users (1 = perfectly positive, -1 = perfectly negative)
def sim_pearson(prefs, p1, p2):
    # Get the list of mutually rated items
    si = [item for item in prefs[p1] if item in prefs[p2]]

    # If they have no ratings in common, return 0
    n = len(si)
    if n == 0:
        return 0

    # Sums of all the preferences
    sum1 = sum(prefs[p1][it] for it in si)
    sum2 = sum(prefs[p2][it] for it in si)

    # Sums of the squares
    sum1Sq = sum(prefs[p1][it] ** 2 for it in si)
    sum2Sq = sum(prefs[p2][it] ** 2 for it in si)

    # Sum of the products
    pSum = sum(prefs[p1][it] * prefs[p2][it] for it in si)

    # Calculate r (Pearson score). "/" is true division in Python 3;
    # Python 2 would truncate these integer divisions and distort r.
    num = pSum - (sum1 * sum2 / n)
    den = sqrt((sum1Sq - sum1 ** 2 / n) * (sum2Sq - sum2 ** 2 / n))
    if den == 0:
        return 0
    return num / den


# Add my own ratings
user_info['me'] = {'消失的愛人': 5,   # Gone Girl
                   '神去村': 3,        # Wood Job!
                   '炸裂鼓手': 5}      # not rated by any other user


# A Pearson coefficient > 0 means the user's movie taste is relatively close to mine
for user in user_info:
    if user == 'me':
        continue  # don't compare me with myself
    res = sim_pearson(user_info, 'me', user)
    if res > 0:
        print('User with taste similar to me: %s' % user)
        print('result :%f\n' % res)
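As a sanity check, the same coefficient can be computed from the mean-centred definition of Pearson's r. pearson_centered is a hypothetical helper, not part of the original script. It also makes a weakness of the example data visible: "me" shares only two movies (消失的愛人 and 神去村) with each user, and with two data points r can only be 1, -1, or 0.

```python
from math import sqrt


def pearson_centered(xs, ys):
    """Pearson r via the mean-centred definition; 0.0 if either side is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


# Scores for the two movies "me" shares with everyone: (消失的愛人, 神去村)
me = [5, 3]
others = {
    "ns2250225": [4, 4],
    "justin": [3, 3],
    "totox": [2, 5],
    "fabrice": [4, 3],
    "doreen": [3, 2],
}
for name, ys in others.items():
    print(name, pearson_centered(me, ys))
```

fabrice and doreen come out at exactly 1.0, totox at -1.0, and the two users who gave both movies the same score at 0.0, which matches sim_pearson under Python 3. A larger overlap between users would make the similarity far more informative.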

Recommend movies to a user (weighted average of everyone's ratings):
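Written out, getRecommendations below predicts a score for each movie m that person p has not rated (or rated 0). U_m is the set of other users u with sim(p, u) > 0 who rated m, and s_u(m) is u's score. The second line works through 泰坦尼克號 (Titanic): fabrice rated it 4 and doreen 5, and because their two similarities are equal (1 under Python 3) the weights cancel and the prediction is the plain mean.

```latex
\hat{s}(m) = \frac{\sum_{u \in U_m} \operatorname{sim}(p, u)\, s_u(m)}{\sum_{u \in U_m} \operatorname{sim}(p, u)}
\qquad
\hat{s}(\text{Titanic}) = \frac{1 \cdot 4 + 1 \cdot 5}{1 + 1} = 4.5
```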

# Recommend movies to a user (weighted average of everyone's ratings)
def getRecommendations(prefs, person, similarity=sim_pearson):
    totals = {}
    simSums = {}
    for other in prefs:
        # Don't compare me to myself
        if other == person:
            continue
        sim = similarity(prefs, person, other)

        # Ignore scores of zero or lower
        if sim <= 0:
            continue
        for item in prefs[other]:
            # Only score movies I haven't seen yet
            if item not in prefs[person] or prefs[person][item] == 0:
                # Similarity * Score
                totals.setdefault(item, 0)
                totals[item] += prefs[other][item] * sim
                # Sum of similarities
                simSums.setdefault(item, 0)
                simSums[item] += sim

    # Create the normalized list
    rankings = [(total / simSums[item], item) for item, total in totals.items()]

    # Return the list sorted from highest to lowest predicted score
    rankings.sort(reverse=True)
    return rankings


# Recommend movies to me
res = getRecommendations(user_info, 'me')
print('Recommended movies:')
print(json.dumps(res, ensure_ascii=False))
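Because getRecommendations takes the similarity function as a parameter, other measures can be plugged in. A common alternative is a distance-based score, 1 / (1 + Euclidean distance) over the shared movies. sim_distance is a hypothetical helper with the same (prefs, p1, p2) signature as sim_pearson; the data is the article's example.

```python
from math import sqrt

# Example data from the article (霍比特人3 = The Hobbit 3)
TITLES = ["消失的愛人", "霍比特人3", "神去村", "泰坦尼克號", "這個殺手不太冷"]
user_dict = {
    "ns2250225": [4, 3, 4, 5, 4],
    "justin": [3, 4, 3, 4, 2],
    "totox": [2, 3, 5, 1, 4],
    "fabrice": [4, 1, 3, 4, 5],
    "doreen": [3, 4, 2, 5, 3],
}
prefs = {name: dict(zip(TITLES, scores)) for name, scores in user_dict.items()}
prefs["me"] = {"消失的愛人": 5, "神去村": 3, "炸裂鼓手": 5}


def sim_distance(prefs, p1, p2):
    """1 / (1 + Euclidean distance) over shared movies; 0 if nothing is shared."""
    shared = [item for item in prefs[p1] if item in prefs[p2]]
    if not shared:
        return 0
    sum_sq = sum((prefs[p1][item] - prefs[p2][item]) ** 2 for item in shared)
    return 1 / (1 + sqrt(sum_sq))


for user in prefs:
    if user != "me":
        print(user, round(sim_distance(prefs, "me", user), 3))
```

It could then be used as getRecommendations(user_info, 'me', similarity=sim_distance). fabrice (0.5) is still the closest user, but ns2250225 (about 0.41), whom Pearson scored 0, now outranks doreen (about 0.31): distance compares raw scores, while Pearson ignores a constant offset between a harsh and a lenient rater. Since this score is always positive, every user with at least one shared movie would contribute to the weighted average.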

Results and analysis:

Users whose movie taste is close to mine: doreen, fabrice
Movies recommended to me: Titanic (泰坦尼克號), Léon: The Professional (這個殺手不太冷)
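User-based analysis finds people with similar interests and recommends to them what the others liked, which surfaces items a person may like but has not found yet.
Item-based analysis instead finds similar items and the people who like them, which surfaces potential customers for an item.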
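The script above implements only the user-based approach. As a minimal sketch of the item-based one, the {user: {movie: score}} dictionary can be flipped into {movie: {user: score}}, and the same Pearson measure then compares movies instead of users. transform_prefs, pearson and similar_items are hypothetical names; the data is the five crawled users (without "me").

```python
from math import sqrt

TITLES = ["消失的愛人", "霍比特人3", "神去村", "泰坦尼克號", "這個殺手不太冷"]
user_dict = {
    "ns2250225": [4, 3, 4, 5, 4],
    "justin": [3, 4, 3, 4, 2],
    "totox": [2, 3, 5, 1, 4],
    "fabrice": [4, 1, 3, 4, 5],
    "doreen": [3, 4, 2, 5, 3],
}
prefs = {name: dict(zip(TITLES, scores)) for name, scores in user_dict.items()}


def transform_prefs(prefs):
    """Flip {user: {item: score}} into {item: {user: score}}."""
    result = {}
    for person, ratings in prefs.items():
        for item, score in ratings.items():
            result.setdefault(item, {})[person] = score
    return result


def pearson(prefs, a, b):
    """Pearson r between rows a and b over their shared keys; 0.0 if undefined."""
    shared = [k for k in prefs[a] if k in prefs[b]]
    n = len(shared)
    if n == 0:
        return 0.0
    xs = [prefs[a][k] for k in shared]
    ys = [prefs[b][k] for k in shared]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def similar_items(prefs, item, n=3):
    """Return the n movies whose ratings correlate most with `item`."""
    item_prefs = transform_prefs(prefs)
    scores = [(pearson(item_prefs, item, other), other)
              for other in item_prefs if other != item]
    scores.sort(reverse=True)
    return scores[:n]


print(similar_items(prefs, "泰坦尼克號"))
```

With this data, 消失的愛人 (Gone Girl) is the movie most correlated with 泰坦尼克號 (Titanic), at r ≈ 0.76, so an "also liked" list for Titanic would show it first.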
