Coursera open course: Recommender System assignment (week 2)
阿新 • Published: 2019-01-02
I can hardly believe how ugly this code turned out. Still learning.
The week 2 assignment:
- Mean Rating: Calculate the mean rating for each movie, order with the highest rating listed first, and submit the top 5.
- % of ratings 4+: Calculate the percentage of ratings for each movie that are 4 or higher. Order with the highest percentage first, and submit the top 5.
- Rating Count: Count the number of ratings for each movie, order with the most number of ratings first, and submit the top 5.
- Top 5 Star Wars: Calculate movies that most often occur with Star Wars: Episode IV - A New Hope (1977) using the (x+y)/x method described in class. In other words, for each movie, calculate the percentage of Star Wars raters who also rated that movie. Order with the highest percentage first, and submit the top 5.
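As a rough illustration of the first three metrics, here is a minimal Python 3 sketch on a tiny made-up ratings table. The movie names and ratings are invented and do not come from A1Ratings.csv, and the helper name `movie_stats` is hypothetical. A missing rating is represented as `None`.

```python
def movie_stats(ratings, threshold=4):
    """Return {movie: (mean, share_at_or_above_threshold, count)}.

    `ratings` maps a movie title to a list of ratings, with None
    standing for "this user did not rate the movie".
    """
    stats = {}
    for movie, column in ratings.items():
        rated = [r for r in column if r is not None]
        if not rated:
            continue  # skip movies nobody rated to avoid dividing by zero
        mean = sum(rated) / len(rated)
        share = sum(1 for r in rated if r >= threshold) / len(rated)
        stats[movie] = (mean, share, len(rated))
    return stats


# Invented toy data: three movies, four users.
toy = {
    "Movie A": [5, 4, None, 3],
    "Movie B": [2, None, None, 4],
    "Movie C": [None, 5, 5, 5],
}
stats = movie_stats(toy)
# Rank by mean rating, highest first.
by_mean = sorted(stats, key=lambda m: stats[m][0], reverse=True)
```

Sorting by `stats[m][1]` or `stats[m][2]` instead would give the "% of ratings 4+" and "Rating Count" orderings.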
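#coding:utf-8
# Python 2 script
import csv

# Print the n highest-scoring entries as "name score".
def topn(name,scores,n=5):
    tmpscores=scores[:] # sorted copy; scores itself stays in column order
    tmpscores.sort()
    flag=[1 for i in range(len(name))] # 1 = not printed yet
    for i in range(n):
        for j in range(len(name)):
            if scores[j]==tmpscores[-1-i] and flag[j]:
                flag[j]=0
                print name[j],scores[j]
                break # one entry per rank, so exactly n lines are printed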
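# Element-wise array1/array2; column 0 is skipped because it is not treated as a movie column.
def caldiv(name,array1,array2):
    result=[0.0 for i in range(len(name))]
    for i in range(len(name)):
        if i!=0 and array2[i]!=0: # leave 0.0 where there are no ratings
            result[i]=array1[i]*1.0/array2[i]
    return result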
star_level=4 # ratings at or above this count toward "% of ratings 4+"
csvfile=file('A1Ratings.csv','rU')
reader=csv.reader(csvfile,dialect='excel')
for line in reader:
    if reader.line_num==1:
        # header row: column names
        name=line
        scores=[0 for i in range(len(name))]     # sum of ratings per column
        totalcount=[0 for i in range(len(name))] # number of ratings per column
        star_count=[0 for i in range(len(name))] # number of ratings >= star_level
    else:
        for ff in range(1,len(name)): # skip column 0
            temp=1
            item=line[ff]
            if not item.strip(): # an empty field "" means no rating
                item=0
                temp=0
            scores[ff]=scores[ff]+int(item)
            totalcount[ff]=totalcount[ff]+temp
            if int(item)>=star_level:
                star_count[ff]=star_count[ff]+1
average=caldiv(name,scores,totalcount)
average1=caldiv(name,star_count,totalcount)
topn(name,average)  # Mean Rating
topn(name,average1) # % of ratings 4+
csvfile.close()
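# Second pass: count how often each movie is rated together with column sit.
csvfile=file('A1Ratings.csv','rU')
reader=csv.reader(csvfile,dialect='excel')
sit=1 # column index of the reference movie (presumably Star Wars)
count=[0.0 for i in range(len(name))]
for line in reader:
    if reader.line_num!=1:
        for i in range(len(name)):
            if not line[i].strip():
                line[i]=0 # empty field: treat as "not rated"
            if i>0 and i!=sit:
                # the product is non-zero only if this user rated both movies
                if int(line[sit])*int(line[i]):
                    # 15 is hard-coded; presumably the number of users who rated column sit
                    count[i]=count[i]+1.0/15
topn(name,count,5)
csvfile.close()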
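The co-occurrence step can also be sketched without the hard-coded `1.0/15`. The following is a minimal Python 3 sketch, not the script above: it derives the denominator (the number of users who rated the reference movie) from the data. The function name `cooccurrence` and the toy data are made up for illustration.

```python
def cooccurrence(ratings, reference):
    """For each other movie, return the fraction of users who rated
    `reference` that also rated that movie.

    `ratings` maps a movie title to a list of per-user ratings,
    with None meaning "not rated". All lists share the same user order.
    """
    ref_column = ratings[reference]
    ref_raters = [u for u, r in enumerate(ref_column) if r is not None]
    if not ref_raters:
        return {}  # nobody rated the reference movie
    result = {}
    for movie, column in ratings.items():
        if movie == reference:
            continue
        both = sum(1 for u in ref_raters if column[u] is not None)
        result[movie] = both / len(ref_raters)
    return result


# Invented toy data: four users, "Reference" plays the role of Star Wars.
toy = {
    "Reference": [5, 4, None, 3],
    "Movie B": [2, None, None, 4],
    "Movie C": [4, 5, 5, 5],
}
shares = cooccurrence(toy, "Reference")
```

Here three users rated the reference movie; two of them also rated "Movie B" and all three rated "Movie C".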
Problems encountered along the way:
1. new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
Initial code:
csvfile1=file('A1Ratings.csv','rb')
Updated code:
csvfile1=file('A1Ratings.csv','rU')
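For readers on Python 3, where `file()` no longer exists and the `'U'` mode flag was removed in 3.11, the csv module documentation recommends opening the file with `newline=''` instead. A minimal sketch, using a temporary file with made-up contents rather than A1Ratings.csv:

```python
import csv
import os
import tempfile

# Write a small made-up CSV that uses bare carriage returns as line endings,
# the kind of file that typically triggers the error above in Python 2.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as f:
    f.write(b"User,Movie A,Movie B\r1,5,\r2,,4\r")

# newline='' passes line endings to the csv module untranslated,
# so the reader can split rows on \r, \n, or \r\n itself.
with open(path, newline="") as f:
    rows = list(csv.reader(f))

os.remove(path)
```

Empty fields come back as empty strings, which is why the script checks `item.strip()` before calling `int()`.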
2. The problematic code:
for line in reader:
    print line
    if reader.line_num==1:
        name=line
        scores=[0 for i in range(len(name))]
        totalcount=[0 for i in range(len(name))]
    if reader.line_num!=1:
        for num in name:
            if name.index(num)>0:
                for item in line:
                    # line.index(item) returns the first column holding this value,
                    # so a repeated rating never matches its own column
                    if name.index(num)==line.index(item):
                        temp=1
                        # print item,num
                        if not item.strip():
                            item=0
                            temp=0
                        scores[name.index(num)]=scores[name.index(num)]+int(item)
                        totalcount[name.index(num)]=totalcount[name.index(num)]+temp
print scores
print totalcount
csvfile.close()
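The flag issue: without the flags, when several entries share the same score (say, 4), the lookup always lands on the first entry that has a score of 4.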
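Both the flag workaround and the problematic code come down to the same behavior: `list.index` (like a plain equality scan) always lands on the first matching element. Below is a small Python 3 sketch with made-up values showing the pitfall and one common way around it, which is to pair each value with its position and sort the pairs. `top_n` here is a hypothetical helper, not the `topn` from this post.

```python
row = ["7", "4", "", "4", "5"]  # made-up CSV row; two columns hold "4"

# list.index returns the position of the FIRST match only,
# so the second "4" (position 3) is never found this way.
first_four = row.index("4")

# enumerate yields (position, value) pairs, so duplicates keep
# their own positions.
positions_of_four = [i for i, v in enumerate(row) if v == "4"]


def top_n(names, scores, n=5):
    """Return the n (name, score) pairs with the highest scores.

    Ties keep their original order because Python's sort is stable.
    """
    pairs = list(zip(names, scores))
    return sorted(pairs, key=lambda p: p[1], reverse=True)[:n]


best = top_n(["A", "B", "C", "D"], [4.0, 4.5, 4.0, 3.0], n=3)
```

Because each name travels with its score, no flag array is needed to tell tied entries apart.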