[Recommendation Algorithms] Collaborative Filtering: a User-Based Implementation in Java
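I won't spend much time on the basic concepts; readers who can follow this post are probably familiar with them already. If you want to learn about recommendation, build up the background knowledge first:
1. What a recommendation algorithm is
2. What a neighborhood-based recommendation algorithm is
I use the MovieLens data set from GroupLens.
Link: GroupLens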
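Data set processing
Here UserId + MovieId are extracted as implicit-feedback data. My own implementation is not very good and I plan to optimize it later; if you have a better idea, feel free to send me a note.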
The basic project structure is as follows:
/project
    /analyzer       -- recommendation analysis
        -CollaborativeFileringanalyzer
    /bean           -- data tuples
        -BasicBean
        -HabitsBean
    /input          -- input setup
        -ReaderFormat
    /recommender    -- recommendation
        -UserRecommender
The first step is to read the MovieLens data and convert it into a structured data format. The basic MovieLens record format is
| user id | item id | rating | timestamp |
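Once read, the data is a table, which in practice can be stored in a Map or in a two-dimensional array.
With the later conversion in mind, I decided on a two-dimensional array.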
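To make the record format concrete, here is a minimal sketch of splitting one tab-separated line and keeping only the user id and item id as implicit feedback. The class name `ImplicitFeedbackParser`, the method `parseLine`, and the sample lines are made up for illustration; they are not part of the project above.

```java
import java.util.ArrayList;
import java.util.List;

class ImplicitFeedbackParser {
    // Hypothetical helper: parses "user id \t item id \t rating \t timestamp"
    // and keeps only {userId, itemId}; rating and timestamp are ignored.
    static int[] parseLine(String line) {
        String[] fields = line.split("\t");
        if (fields.length < 2) {
            throw new IllegalArgumentException("Expected at least 2 fields: " + line);
        }
        return new int[] { Integer.parseInt(fields[0]), Integer.parseInt(fields[1]) };
    }

    public static void main(String[] args) {
        // Sample lines invented for this sketch, not copied from the data set.
        String[] sample = {
            "1\t10\t5\t874965758",
            "1\t20\t3\t876893171",
            "2\t10\t4\t888550871"
        };
        List<int[]> pairs = new ArrayList<>();
        for (String line : sample) {
            pairs.add(parseLine(line));
        }
        for (int[] pair : pairs) {
            System.out.println("user " + pair[0] + " -> item " + pair[1]);
        }
    }
}
```

Dropping the rating is what turns the explicit ratings into implicit feedback: only the fact that the user interacted with the item is kept.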
BasicBean stores one row of the table; it mainly holds a List<String> containing the individual fields of that row.
import java.util.ArrayList;
import java.util.List;

/**
 * One row of a data set; the individual fields are stored as parameters.
 *
 * @author wqd
 * 2016/01/18
 */
public class BasicBean {
    private List<String> parameters;
    private boolean tableHead;

    // Creates an empty row; head tells whether the row is a table head.
    public BasicBean(boolean head) {
        parameters = new ArrayList<String>();
        this.tableHead = head;
    }

    // Creates a data row (not a table head) from the given fields.
    public BasicBean(String... strings) {
        this(false, strings);
    }

    // Creates a row from the given fields; head tells whether it is a table head.
    public BasicBean(boolean head, String... strings) {
        parameters = new ArrayList<String>();
        for (String string : strings) {
            parameters.add(string);
        }
        this.tableHead = head;
    }

    // Appends a field and returns the new number of fields.
    public int add(String param) {
        parameters.add(param);
        return this.getSize();
    }

    // Replaces the field at index with a new value.
    // Returns true on success, false if index is out of range.
    public boolean set(int index, String param) {
        if (index < 0 || index >= this.getSize()) {
            return false;
        }
        parameters.set(index, param);
        return true;
    }

    // Returns true if this row is a table head, false otherwise.
    public boolean isHead() {
        return tableHead;
    }

    // Joins the fields with "\t|" and breaks the line after every 20 fields.
    @Override
    public String toString() {
        StringBuilder str = new StringBuilder(" ");
        int len = 1;
        for (String string : parameters) {
            str.append("\t|").append(string);
            if (len++ % 20 == 0) {
                str.append("\n");
            }
        }
        return str.toString();
    }

    // Returns the number of fields.
    public int getSize() {
        return parameters.size();
    }

    // Returns the underlying list of fields.
    public List<String> getArray() {
        return this.parameters;
    }

    // Returns the ID of the row, i.e. field 0 parsed as an int.
    public int getId() {
        return this.getInt(0);
    }

    public String getString(int index) {
        return parameters.get(index);
    }

    public int getInt(int index) {
        return Integer.parseInt(parameters.get(index));
    }

    public boolean getBoolean(int index) {
        return Boolean.parseBoolean(parameters.get(index));
    }

    public float getFloat(int index) {
        return Float.parseFloat(parameters.get(index));
    }
}
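Once the raw data has been read, processing it directly is still inefficient: there are many redundant fields, because one user gives feedback on many movies. The rows are therefore reshaped from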
| user id | item id | rating | timestamp |
=>
| user id | item id 1 | item id 2 | item id 3 | item id 4 …
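The reshaping above can be sketched in a few lines. Note that this sketch swaps in a `TreeMap` keyed by user id instead of the bean list used in this post, simply because it is the shortest way to show the idea; the class name `GroupByUser` and the sample rows are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GroupByUser {
    // Each input row is {userId, itemId}; the result maps every user id to
    // the list of item ids that user gave feedback on, in input order.
    static Map<Integer, List<Integer>> group(List<int[]> rows) {
        Map<Integer, List<Integer>> byUser = new TreeMap<>(); // sorted by user id
        for (int[] row : rows) {
            byUser.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        return byUser;
    }

    public static void main(String[] args) {
        // Sample rows invented for this sketch.
        List<int[]> rows = new ArrayList<>();
        rows.add(new int[] { 1, 10 });
        rows.add(new int[] { 2, 10 });
        rows.add(new int[] { 1, 20 });
        System.out.println(group(rows)); // prints {1=[10, 20], 2=[10]}
    }
}
```

The user id is stored once per user instead of once per record, which is the redundancy the reshaping removes.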
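HabitsBean is used for storage here: the user id is pulled out and stored directly in the bean, while the list holds that user's item ids. The reason is that the ID is accessed very frequently in the later operations.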
public class HabitsBean extends BasicBean {
    private int id;

    // The default id is -1, which means the id has not been assigned yet.
    public HabitsBean() {
        this(-1);
    }

    public HabitsBean(int id) {
        super(false); // a data row, not a table head
        this.id = id;
    }

    // Returns the stored ID; overrides BasicBean.getId(), which parses field 0.
    @Override
    public int getId() {
        return id;
    }

    // Sets the ID.
    public void setId(int id) {
        this.id = id;
    }

    // Prefixes the row printed by BasicBean.toString() with the ID.
    @Override
    public String toString() {
        return "HabitBean " + this.id + " :" + super.toString();
    }
}
After the tuple data has been read, it is compressed and regrouped into a format that is easier to process. ReaderFormat does this work; a demo follows:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

/**
 * Reads training and test files. It is suitable for the GroupLens
 * data sets and for other tab-separated data sets.
 * @author wqd
 *
 */
public class ReaderFormat {
    List<BasicBean> lists;
    List<HabitsBean> formLists;

    // Reads a tab-separated file; every line becomes one BasicBean.
    public List<BasicBean> read(String filePath) throws IOException {
        lists = new ArrayList<BasicBean>();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader in = new BufferedReader(new FileReader(filePath))) {
            String s;
            while ((s = in.readLine()) != null) {
                String[] params = s.split("\t");
                lists.add(new BasicBean(params));
            }
        }
        return lists;
    }
    // Combines user log rows like | userID | habitID | ...
    // into one row per user: userID and | habitID1 | habitID2 | habitID3 | ...
    // Existing users are looked up by userID with a binary search.
    public List<HabitsBean> formateLogUser(String filePath) throws IOException {
        lists = this.read(filePath);
        formLists = new LinkedList<HabitsBean>();
        HabitsBean row = null;
        for (BasicBean basicBean : lists) {
            // The condition was truncated in the original ("basicBean.").
            // Checking for an empty result list is an assumption that matches
            // this branch, which creates the very first row.
            if (formLists.isEmpty()) {
                row = new HabitsBean(basicBean.getInt(0));
                row.add(basicBean.getString(1));
                formLists.add(row);
            } else {
                this.addBinarySerch(formLists, basicBean);
            }
        }
        return formLists;
    }
    // Binary search over rows sorted by ID: appends the item to the matching
    // user's row, or inserts a new row at the position that keeps the list sorted.
    // Note: on a LinkedList get(index) is O(n); an ArrayList makes each probe O(1).
    private void addBinarySerch(List<HabitsBean> lists, BasicBean bean) {
        int id = bean.getId();
        int start = 0;
        int end = lists.size() - 1;
        while (start <= end) {
            int pointer = (start + end) >>> 1;
            HabitsBean row = lists.get(pointer);
            if (row.getId() == id) {
                row.add(bean.getString(1));
                return;
            } else if (row.getId() > id) {
                end = pointer - 1;   // shrink the range so the loop always terminates
            } else {
                start = pointer + 1;
            }
        }
        // Not found: start is now the insertion point for the new user.
        HabitsBean newBean = new HabitsBean(id);
        newBean.add(bean.getString(1));
        lists.add(start, newBean);
    }
    // test
    public static void main(String[] args) {
        ReaderFormat readerFormat = new ReaderFormat();
        try {
            List<HabitsBean> lists = readerFormat.formateLogUser("E:/WorkSpace/Input/ml-100k/u1.base");
            for (HabitsBean habitsBean : lists) {
                System.out.println(habitsBean.toString());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
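The find-or-insert step is the part of ReaderFormat that is easiest to get wrong, so here is a minimal standalone sketch of the same idea on plain user ids. It relies on `Collections.binarySearch` from the JDK, which returns the index of the key if found and `-(insertion point) - 1` otherwise. The class name `SortedInsert` and the ids are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedInsert {
    // Inserts id into the sorted list unless it is already present.
    // Returns the index at which id sits after the call.
    static int insertSorted(List<Integer> sortedIds, int id) {
        int pos = Collections.binarySearch(sortedIds, id);
        if (pos >= 0) {
            return pos; // already present, nothing to insert
        }
        int insertionPoint = -(pos + 1); // decode the negative result
        sortedIds.add(insertionPoint, id);
        return insertionPoint;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int id : new int[] { 5, 2, 9, 5 }) {
            insertSorted(ids, id);
        }
        System.out.println(ids); // prints [2, 5, 9]
    }
}
```

Inserting at the insertion point, rather than appending at the end, is what keeps the list sorted when the input file is not already ordered by user id.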
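Recommendation algorithm
The core idea of user-based collaborative filtering is to make recommendations based on the similarity between users.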
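As a rough sketch of that idea, under my own assumptions rather than as the final UserRecommender: for a target user, every other user "votes" for the items the target has not seen yet, and each vote is weighted by the similarity between the two users. The class `UserCfSketch`, the method `score`, and the sample data are hypothetical; the similarity is passed in as a function, and the overlap count used in `main` is only a placeholder.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.ToDoubleBiFunction;

class UserCfSketch {
    // feedback maps user id -> set of item ids with implicit feedback.
    // Returns item id -> accumulated score for items the target has not seen.
    static Map<Integer, Double> score(int target,
                                      Map<Integer, Set<Integer>> feedback,
                                      ToDoubleBiFunction<Set<Integer>, Set<Integer>> similarity) {
        Set<Integer> seen = feedback.getOrDefault(target, new HashSet<>());
        Map<Integer, Double> scores = new HashMap<>();
        for (Map.Entry<Integer, Set<Integer>> other : feedback.entrySet()) {
            if (other.getKey() == target) {
                continue; // skip the target user itself
            }
            double w = similarity.applyAsDouble(seen, other.getValue());
            if (w <= 0) {
                continue; // dissimilar users contribute nothing
            }
            for (int item : other.getValue()) {
                if (!seen.contains(item)) {
                    scores.merge(item, w, Double::sum);
                }
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        // Sample feedback invented for this sketch.
        Map<Integer, Set<Integer>> feedback = new HashMap<>();
        feedback.put(1, new HashSet<>(Arrays.asList(10, 20)));
        feedback.put(2, new HashSet<>(Arrays.asList(10, 30)));
        feedback.put(3, new HashSet<>(Arrays.asList(40)));
        // Placeholder similarity: number of items two users have in common.
        ToDoubleBiFunction<Set<Integer>, Set<Integer>> overlap = (a, b) -> {
            Set<Integer> common = new HashSet<>(a);
            common.retainAll(b);
            return common.size();
        };
        System.out.println(score(1, feedback, overlap)); // prints {30=1.0}
    }
}
```

A real recommender would usually restrict the loop to the K most similar users and sort the scores; both are left out to keep the sketch short.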
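Let N(u) and N(v) denote the sets of items on which users u and v have given implicit feedback. The Jaccard formula is
$$ w_{uv} = \frac{|N(u) \cap N(v)|}{|N(u) \cup N(v)|} $$
Alternatively, cosine similarity can be used:
$$ w_{uv} = \frac{|N(u) \cap N(v)|}{\sqrt{|N(u)| \, |N(v)|}} $$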
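Both similarities are easy to compute on sets of item ids. The sketch below assumes the usual set-based definitions: the size of the intersection divided by the size of the union for Jaccard, and the size of the intersection divided by the square root of |N(u)| times |N(v)| for cosine. The class name `SetSimilarity` and the sample sets are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class SetSimilarity {
    // Number of items the two users have in common: |N(u) ∩ N(v)|.
    static int intersectionSize(Set<Integer> nu, Set<Integer> nv) {
        Set<Integer> common = new HashSet<>(nu);
        common.retainAll(nv);
        return common.size();
    }

    // Jaccard similarity: |N(u) ∩ N(v)| / |N(u) ∪ N(v)|; 0 if both sets are empty.
    static double jaccard(Set<Integer> nu, Set<Integer> nv) {
        int inter = intersectionSize(nu, nv);
        int union = nu.size() + nv.size() - inter;
        return union == 0 ? 0.0 : (double) inter / union;
    }

    // Cosine similarity: |N(u) ∩ N(v)| / sqrt(|N(u)| * |N(v)|); 0 if either set is empty.
    static double cosine(Set<Integer> nu, Set<Integer> nv) {
        if (nu.isEmpty() || nv.isEmpty()) {
            return 0.0;
        }
        return intersectionSize(nu, nv) / Math.sqrt((double) nu.size() * nv.size());
    }

    public static void main(String[] args) {
        // Sample sets invented for this sketch: two items in common out of four.
        Set<Integer> nu = new HashSet<>(Arrays.asList(10, 20, 30));
        Set<Integer> nv = new HashSet<>(Arrays.asList(20, 30, 40));
        System.out.println("jaccard = " + jaccard(nu, nv)); // prints jaccard = 0.5
        System.out.println("cosine  = " + cosine(nu, nv));
    }
}
```

For these sets the intersection has 2 items and the union has 4, so Jaccard is 2/4 = 0.5, while cosine is 2/sqrt(3*3) = 2/3.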