Building a Job Recommendation Engine with Mahout [Learning Mahout Together]
阿新 • Published: 2019-01-08
Reading guide:
1. How do you design the KPIs for a job recommendation engine?
2. What system architecture does a job recommendation engine need?
3. How do you manually compare recommendation results?
4. What kinds of data are best excluded from a job recommendation engine?
1. Overview of Mahout's Recommender Framework
The Mahout framework ships with a complete recommendation engine: standardized data structures, a variety of algorithm implementations, and a simple development workflow. Mahout's recommendation engine is modular, composed of five main parts: the data model, similarity algorithms, neighborhood algorithms, recommender algorithms, and recommender evaluators.
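To make those five parts concrete, here is a minimal, self-contained sketch in plain Java of a user-based pipeline in that shape: data model → similarity → neighborhood → recommender. This is an illustration only, not Mahout's API; all class and method names here are invented for the example.

```java
import java.util.*;
import java.util.stream.Collectors;

// A toy user-based CF pipeline mirroring Mahout's modules (illustrative only).
public class TinyUserCF {
    // 1. Data model: user -> set of item ids the user viewed (boolean prefs).
    static Map<Long, Set<Long>> model = new HashMap<>();

    // 2. Similarity: Tanimoto coefficient = |A ∩ B| / |A ∪ B|.
    static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> inter = new HashSet<>(a); inter.retainAll(b);
        int union = a.size() + b.size() - inter.size();
        return union == 0 ? 0.0 : (double) inter.size() / union;
    }

    // 3. Neighborhood: the n most similar other users.
    static List<Long> nearestUsers(long uid, int n) {
        return model.keySet().stream()
                .filter(u -> u != uid)
                .sorted(Comparator.comparingDouble(
                        (Long u) -> -tanimoto(model.get(uid), model.get(u))))
                .limit(n).collect(Collectors.toList());
    }

    // 4. Recommender: items neighbors viewed that this user has not,
    //    scored by summing the neighbors' similarities.
    static List<Long> recommend(long uid, int howMany) {
        Map<Long, Double> score = new HashMap<>();
        for (long nb : nearestUsers(uid, 2)) {
            double sim = tanimoto(model.get(uid), model.get(nb));
            for (long item : model.get(nb)) {
                if (!model.get(uid).contains(item)) {
                    score.merge(item, sim, Double::sum);
                }
            }
        }
        return score.entrySet().stream()
                .sorted(Map.Entry.<Long, Double>comparingByValue().reversed())
                .limit(howMany).map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        model.put(1L, new HashSet<>(Arrays.asList(11L, 24L)));
        model.put(2L, new HashSet<>(Arrays.asList(11L, 24L, 136L)));
        model.put(3L, new HashSet<>(Arrays.asList(165L, 1L)));
        System.out.println(recommend(1L, 1)); // prints [136]
    }
}
```

The fifth module, the evaluator, is what Mahout's "recommender evaluator" adds on top: it hides part of the data, recomputes recommendations, and scores them, as we will do below.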
2. Requirements Analysis: KPI Design for a Job Recommendation Engine
Below we use a company case study to explain, end to end, how to design the KPIs for a job recommendation engine.
Case introduction:
A professional social-networking site whose main products include personal résumé pages, professional connections, microblogging and link sharing, job postings, job applications, and education and training.
After registering, users complete their personal profile: education background, work experience, project experience, skills, and so on. Then the site asks whether you are looking for a job. If you choose "yes" (job hunting), the site recommends jobs from its database that may interest you.
From this short description we can roughly see the site's positioning and core business. There are two key points:
- Users: store as much valid, complete user data as possible
- Services: help users find jobs, and help recruiters and companies find employees
Therefore, the job recommendation engine is a core feature of this site.
KPI design
- Job page views driven by recommendations: PV of job pages
- Job applications driven by recommendations: effective conversions on job pages
3. Algorithm Model: Recommendation Algorithms
Two test data sets:
- pv.csv: job-view records, containing user ID and job ID
- job.csv: basic job information, containing job ID, posting date, and salary
1). pv.csv
- 2 columns: user ID, job ID (userid,jobid)
- View records: 2,500
- Users: 1,000, user IDs 1-1000
- Jobs: 200, job IDs 1-200
Sample data:
1,11
2,136
2,187
3,165
3,1
3,24
4,8
4,199
5,32
5,100
6,14
7,59
7,147
8,92
9,165
9,80
9,171
10,45
10,31
10,1
10,152
2). job.csv
- 3 columns: job ID, posting date, salary (jobid,create_date,salary)
- Jobs: 200, job IDs 1-200
Sample data:
1,2013-01-24,5600
2,2011-03-02,5400
3,2011-03-14,8100
4,2012-10-05,2200
5,2011-09-03,14100
6,2011-03-05,6500
7,2012-06-06,37000
8,2013-02-18,5500
9,2010-07-05,7500
10,2010-01-23,6700
11,2011-09-19,5200
12,2010-01-19,29700
13,2013-09-28,6000
14,2013-10-23,3300
15,2010-10-09,2700
16,2010-07-14,5100
17,2010-05-13,29000
18,2010-01-16,21800
19,2013-05-23,5700
20,2011-04-24,5900
To hit the KPI targets, let's restate the problem in "technical" terms: we need the recommended jobs to be more accurate, so that users click more.
- 1. Combine the recommendation algorithms and pick those that score well with the recommender evaluator
- 2. Manually verify the recommendation results
- 3. Jobs are time-sensitive: recommended jobs should have been posted within the last six months
- 4. Salary: recommended jobs should pay no less than 80% of the average salary of the jobs the user has viewed
We pick three algorithm families, UserCF, ItemCF, and SlopeOne, and test seven combinations:
- userCF1: LogLikelihoodSimilarity + NearestNUserNeighborhood + GenericBooleanPrefUserBasedRecommender
- userCF2: CityBlockSimilarity+ NearestNUserNeighborhood + GenericBooleanPrefUserBasedRecommender
- userCF3: UserTanimoto + NearestNUserNeighborhood + GenericBooleanPrefUserBasedRecommender
- itemCF1: LogLikelihoodSimilarity + GenericBooleanPrefItemBasedRecommender
- itemCF2: CityBlockSimilarity+ GenericBooleanPrefItemBasedRecommender
- itemCF3: ItemTanimoto + GenericBooleanPrefItemBasedRecommender
- slopeOne: SlopeOneRecommender
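Two of the similarity measures above can be computed by hand for boolean (view-only) preference data. Here is a short sketch using the common textbook definitions; Mahout's own implementations may differ in details, so treat the formulas as illustrative.

```java
import java.util.*;

// Toy versions of two similarity measures for boolean preference data.
public class SimilarityDemo {
    // Tanimoto coefficient: |A ∩ B| / |A ∪ B|
    static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> inter = new HashSet<>(a); inter.retainAll(b);
        int union = a.size() + b.size() - inter.size();
        return union == 0 ? 0.0 : (double) inter.size() / union;
    }

    // City-block (Manhattan) similarity for boolean prefs:
    // distance = items only in A + items only in B; similarity = 1/(1+distance)
    static double cityBlock(Set<Long> a, Set<Long> b) {
        Set<Long> inter = new HashSet<>(a); inter.retainAll(b);
        int distance = a.size() + b.size() - 2 * inter.size();
        return 1.0 / (1 + distance);
    }

    public static void main(String[] args) {
        Set<Long> u1 = new HashSet<>(Arrays.asList(11L, 24L, 165L));
        Set<Long> u2 = new HashSet<>(Arrays.asList(11L, 24L, 136L));
        System.out.println(tanimoto(u1, u2));  // 2/4 = 0.5
        System.out.println(cityBlock(u1, u2)); // 1/(1+2) ≈ 0.3333
    }
}
```

Both measures only look at which items were viewed, never at a rating value, which is why they suit our pv.csv data with no preference scores.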
4. Architecture Design: System Architecture of the Job Recommendation Engine
In the architecture diagram, the left side is the Application (business system), the right side is Mahout, and the bottom is the Hadoop cluster.
- 1. When the data volume is modest but the algorithm is complex, have Mahout read CSV or database data directly and compute in memory on a single machine. Mahout is a multi-threaded application and will use all of the machine's resources in parallel.
- 2. When the data volume is large, choose a parallelized algorithm (e.g. ItemCF): first import the business system's data into Hadoop HDFS, then run the Mahout algorithm against HDFS. Performance then depends on the whole Hadoop cluster.
- 3. Store the computed results in the database for easy querying.
5. Development: Implementing the Recommendation Algorithms with Mahout
The development environment uses Mahout 0.8; for environment setup, please refer to the earlier articles in this series.
Create the following Java classes:
- RecommenderEvaluator.java: pick the algorithms that score well with the recommender evaluator
- RecommenderResult.java: manually compare a fixed number of results
- RecommenderFilterOutdateResult.java: exclude outdated jobs
- RecommenderFilterSalaryResult.java: exclude jobs whose salary is too low
1). RecommenderEvaluator.java: pick the algorithms that score well with the recommender evaluator
Source code (the RecommendFactory helper class comes from earlier articles in this series):
public class RecommenderEvaluator {
final static int NEIGHBORHOOD_NUM = 2;
final static int RECOMMENDER_NUM = 3;
public static void main(String[] args) throws TasteException, IOException {
String file = "datafile/job/pv.csv";
DataModel dataModel = RecommendFactory.buildDataModelNoPref(file);
userLoglikelihood(dataModel);
userCityBlock(dataModel);
userTanimoto(dataModel);
itemLoglikelihood(dataModel);
itemCityBlock(dataModel);
itemTanimoto(dataModel);
slopeOne(dataModel);
}
public static RecommenderBuilder userLoglikelihood(DataModel dataModel) throws TasteException, IOException {
System.out.println("userLoglikelihood");
UserSimilarity userSimilarity = RecommendFactory.userSimilarity(RecommendFactory.SIMILARITY.LOGLIKELIHOOD, dataModel);
UserNeighborhood userNeighborhood = RecommendFactory.userNeighborhood(RecommendFactory.NEIGHBORHOOD.NEAREST, userSimilarity, dataModel, NEIGHBORHOOD_NUM);
RecommenderBuilder recommenderBuilder = RecommendFactory.userRecommender(userSimilarity, userNeighborhood, false);
RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
return recommenderBuilder;
}
public static RecommenderBuilder userCityBlock(DataModel dataModel) throws TasteException, IOException {
System.out.println("userCityBlock");
UserSimilarity userSimilarity = RecommendFactory.userSimilarity(RecommendFactory.SIMILARITY.CITYBLOCK, dataModel);
UserNeighborhood userNeighborhood = RecommendFactory.userNeighborhood(RecommendFactory.NEIGHBORHOOD.NEAREST, userSimilarity, dataModel, NEIGHBORHOOD_NUM);
RecommenderBuilder recommenderBuilder = RecommendFactory.userRecommender(userSimilarity, userNeighborhood, false);
RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
return recommenderBuilder;
}
public static RecommenderBuilder userTanimoto(DataModel dataModel) throws TasteException, IOException {
System.out.println("userTanimoto");
UserSimilarity userSimilarity = RecommendFactory.userSimilarity(RecommendFactory.SIMILARITY.TANIMOTO, dataModel);
UserNeighborhood userNeighborhood = RecommendFactory.userNeighborhood(RecommendFactory.NEIGHBORHOOD.NEAREST, userSimilarity, dataModel, NEIGHBORHOOD_NUM);
RecommenderBuilder recommenderBuilder = RecommendFactory.userRecommender(userSimilarity, userNeighborhood, false);
RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
return recommenderBuilder;
}
public static RecommenderBuilder itemLoglikelihood(DataModel dataModel) throws TasteException, IOException {
System.out.println("itemLoglikelihood");
ItemSimilarity itemSimilarity = RecommendFactory.itemSimilarity(RecommendFactory.SIMILARITY.LOGLIKELIHOOD, dataModel);
RecommenderBuilder recommenderBuilder = RecommendFactory.itemRecommender(itemSimilarity, false);
RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
return recommenderBuilder;
}
public static RecommenderBuilder itemCityBlock(DataModel dataModel) throws TasteException, IOException {
System.out.println("itemCityBlock");
ItemSimilarity itemSimilarity = RecommendFactory.itemSimilarity(RecommendFactory.SIMILARITY.CITYBLOCK, dataModel);
RecommenderBuilder recommenderBuilder = RecommendFactory.itemRecommender(itemSimilarity, false);
RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
return recommenderBuilder;
}
public static RecommenderBuilder itemTanimoto(DataModel dataModel) throws TasteException, IOException {
System.out.println("itemTanimoto");
ItemSimilarity itemSimilarity = RecommendFactory.itemSimilarity(RecommendFactory.SIMILARITY.TANIMOTO, dataModel);
RecommenderBuilder recommenderBuilder = RecommendFactory.itemRecommender(itemSimilarity, false);
RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
return recommenderBuilder;
}
public static RecommenderBuilder slopeOne(DataModel dataModel) throws TasteException, IOException {
System.out.println("slopeOne");
RecommenderBuilder recommenderBuilder = RecommendFactory.slopeOneRecommender();
RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
return recommenderBuilder;
}
public static RecommenderBuilder knnLoglikelihood(DataModel dataModel) throws TasteException, IOException {
System.out.println("knnLoglikelihood");
ItemSimilarity itemSimilarity = RecommendFactory.itemSimilarity(RecommendFactory.SIMILARITY.LOGLIKELIHOOD, dataModel);
RecommenderBuilder recommenderBuilder = RecommendFactory.itemKNNRecommender(itemSimilarity, new NonNegativeQuadraticOptimizer(), 10);
RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
return recommenderBuilder;
}
public static RecommenderBuilder knnTanimoto(DataModel dataModel) throws TasteException, IOException {
System.out.println("knnTanimoto");
ItemSimilarity itemSimilarity = RecommendFactory.itemSimilarity(RecommendFactory.SIMILARITY.TANIMOTO, dataModel);
RecommenderBuilder recommenderBuilder = RecommendFactory.itemKNNRecommender(itemSimilarity, new NonNegativeQuadraticOptimizer(), 10);
RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
return recommenderBuilder;
}
public static RecommenderBuilder knnCityBlock(DataModel dataModel) throws TasteException, IOException {
System.out.println("knnCityBlock");
ItemSimilarity itemSimilarity = RecommendFactory.itemSimilarity(RecommendFactory.SIMILARITY.CITYBLOCK, dataModel);
RecommenderBuilder recommenderBuilder = RecommendFactory.itemKNNRecommender(itemSimilarity, new NonNegativeQuadraticOptimizer(), 10);
RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
return recommenderBuilder;
}
public static RecommenderBuilder svd(DataModel dataModel) throws TasteException {
System.out.println("svd");
RecommenderBuilder recommenderBuilder = RecommendFactory.svdRecommender(new ALSWRFactorizer(dataModel, 5, 0.05, 10));
RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
return recommenderBuilder;
}
public static RecommenderBuilder treeClusterLoglikelihood(DataModel dataModel) throws TasteException {
System.out.println("treeClusterLoglikelihood");
UserSimilarity userSimilarity = RecommendFactory.userSimilarity(RecommendFactory.SIMILARITY.LOGLIKELIHOOD, dataModel);
ClusterSimilarity clusterSimilarity = RecommendFactory.clusterSimilarity(RecommendFactory.SIMILARITY.FARTHEST_NEIGHBOR_CLUSTER, userSimilarity);
RecommenderBuilder recommenderBuilder = RecommendFactory.treeClusterRecommender(clusterSimilarity, 3);
RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
return recommenderBuilder;
}
}
Console output from the recommender evaluator:
userLoglikelihood
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:0.2741487771272658
Recommender IR Evaluator: [Precision:0.6424242424242422,Recall:0.4098360655737705]
userCityBlock
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:0.575306732961736
Recommender IR Evaluator: [Precision:0.919580419580419,Recall:0.4371584699453552]
userTanimoto
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:0.5546485136181523
Recommender IR Evaluator: [Precision:0.6625766871165644,Recall:0.41803278688524603]
itemLoglikelihood
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:0.5398332608612343
Recommender IR Evaluator: [Precision:0.26229508196721296,Recall:0.26229508196721296]
itemCityBlock
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:0.9251437840891661
Recommender IR Evaluator: [Precision:0.02185792349726776,Recall:0.02185792349726776]
itemTanimoto
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:0.9176432856689655
Recommender IR Evaluator: [Precision:0.26229508196721296,Recall:0.26229508196721296]
slopeOne
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:0.0
Recommender IR Evaluator: [Precision:0.01912568306010929,Recall:0.01912568306010929]
UserCityBlock evaluates best; the UserCF-based algorithms all beat the ItemCF ones, and SlopeOne barely scores at all.
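To read the "Recommender IR Evaluator" numbers above: the evaluator hides some of each user's preferred items, asks the recommender for a top-N list (here N=2), and measures precision (hits / items recommended) and recall (hits / hidden relevant items). A small self-contained sketch of these two metrics:

```java
import java.util.*;

// Precision@N and recall, the two numbers reported by the IR evaluator.
public class IRMetrics {
    static double precisionAtN(List<Long> recommended, Set<Long> relevant) {
        long hits = recommended.stream().filter(relevant::contains).count();
        return recommended.isEmpty() ? 0.0 : (double) hits / recommended.size();
    }

    static double recall(List<Long> recommended, Set<Long> relevant) {
        long hits = recommended.stream().filter(relevant::contains).count();
        return relevant.isEmpty() ? 0.0 : (double) hits / relevant.size();
    }

    public static void main(String[] args) {
        // Toy data: top-2 recommendations vs 3 hidden relevant items.
        List<Long> top2 = Arrays.asList(19L, 145L);
        Set<Long> hidden = new HashSet<>(Arrays.asList(145L, 98L, 121L));
        System.out.println(precisionAtN(top2, hidden)); // 0.5  (1 hit of 2)
        System.out.println(recall(top2, hidden));       // ≈0.333 (1 hit of 3)
    }
}
```

So userCityBlock's Precision ≈ 0.92 means that, on average, over 90% of its recommended items were ones the user actually preferred, which is why we carry it forward.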
2). RecommenderResult.java: manually compare a fixed number of results
To get contrasting results, we take UserCityBlock and itemLoglikelihood and compare their recommendations manually. Source code:
public class RecommenderResult {
final static int NEIGHBORHOOD_NUM = 2;
final static int RECOMMENDER_NUM = 3;
public static void main(String[] args) throws TasteException, IOException {
String file = "datafile/job/pv.csv";
DataModel dataModel = RecommendFactory.buildDataModelNoPref(file);
RecommenderBuilder rb1 = RecommenderEvaluator.userCityBlock(dataModel);
RecommenderBuilder rb2 = RecommenderEvaluator.itemLoglikelihood(dataModel);
LongPrimitiveIterator iter = dataModel.getUserIDs();
while (iter.hasNext()) {
long uid = iter.nextLong();
System.out.print("userCityBlock =>");
result(uid, rb1, dataModel);
System.out.print("itemLoglikelihood=>");
result(uid, rb2, dataModel);
}
}
public static void result(long uid, RecommenderBuilder recommenderBuilder, DataModel dataModel) throws TasteException {
List<RecommendedItem> list = recommenderBuilder.buildRecommender(dataModel).recommend(uid, RECOMMENDER_NUM);
RecommendFactory.showItems(uid, list, false);
}
}
Console output (only part of the results is shown). We examine the recommendations for user uid=974:
...
userCityBlock =>uid:968,(61,0.333333)
itemLoglikelihood=>uid:968,(121,1.429362)(153,1.239939)(198,1.207726)
userCityBlock =>uid:969,
itemLoglikelihood=>uid:969,(75,1.326499)(30,0.873100)(85,0.763344)
userCityBlock =>uid:970,
itemLoglikelihood=>uid:970,(13,0.748417)(156,0.748417)(122,0.748417)
userCityBlock =>uid:971,
itemLoglikelihood=>uid:971,(38,2.060951)(104,1.951208)(83,1.941735)
userCityBlock =>uid:972,
itemLoglikelihood=>uid:972,(131,1.378395)(4,1.349386)(87,0.881816)
userCityBlock =>uid:973,
itemLoglikelihood=>uid:973,(196,1.432040)(140,1.398066)(130,1.380335)
userCityBlock =>uid:974,(19,0.200000)
itemLoglikelihood=>uid:974,(145,1.994049)(121,1.794289)(98,1.738027)
...
Searching pv.csv (in R):
> pv[which(pv$userid==974),]
userid jobid
2426 974 106
2427 974 173
2428 974 82
2429 974 188
2430 974 78
Searching job.csv:
> job[job$jobid %in% c(145,121,98,19),]
jobid create_date salary
19 19 2013-05-23 5700
98 98 2010-01-15 2900
121 121 2010-06-19 5300
145 145 2013-08-02 6800
Both algorithms include stale jobs from 2010 among their recommendations, which is not good enough. Next we exclude outdated jobs and keep only jobs posted in 2013.
3). RecommenderFilterOutdateResult.java: exclude outdated jobs
Source code:
public class RecommenderFilterOutdateResult {
final static int NEIGHBORHOOD_NUM = 2;
final static int RECOMMENDER_NUM = 3;
public static void main(String[] args) throws TasteException, IOException {
String file = "datafile/job/pv.csv";
DataModel dataModel = RecommendFactory.buildDataModelNoPref(file);
RecommenderBuilder rb1 = RecommenderEvaluator.userCityBlock(dataModel);
RecommenderBuilder rb2 = RecommenderEvaluator.itemLoglikelihood(dataModel);
LongPrimitiveIterator iter = dataModel.getUserIDs();
while (iter.hasNext()) {
long uid = iter.nextLong();
System.out.print("userCityBlock =>");
filterOutdate(uid, rb1, dataModel);
System.out.print("itemLoglikelihood=>");
filterOutdate(uid, rb2, dataModel);
}
}
public static void filterOutdate(long uid, RecommenderBuilder recommenderBuilder, DataModel dataModel) throws TasteException, IOException {
Set<Long> jobids = getOutdateJobID("datafile/job/job.csv");
IDRescorer rescorer = new JobRescorer(jobids);
List<RecommendedItem> list = recommenderBuilder.buildRecommender(dataModel).recommend(uid, RECOMMENDER_NUM, rescorer);
RecommendFactory.showItems(uid, list, true);
}
public static Set<Long> getOutdateJobID(String file) throws IOException {
BufferedReader br = new BufferedReader(new FileReader(new File(file)));
Set<Long> jobids = new HashSet<Long>();
String s = null;
while ((s = br.readLine()) != null) {
String[] cols = s.split(",");
SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd");
Date date = null;
try {
date = df.parse(cols[1]);
if (date.getTime() < df.parse("2013-01-01").getTime()) {
jobids.add(Long.parseLong(cols[0]));
}
} catch (ParseException e) {
e.printStackTrace();
}
}
br.close();
return jobids;
}
}
class JobRescorer implements IDRescorer {
final private Set<Long> jobids;
public JobRescorer(Set<Long> jobs) {
this.jobids = jobs;
}
@Override
public double rescore(long id, double originalScore) {
return isFiltered(id) ? Double.NaN : originalScore;
}
@Override
public boolean isFiltered(long id) {
return jobids.contains(id);
}
}
Console output (only part of the results is shown). We examine the recommendations for user uid=974:
...
itemLoglikelihood=>uid:965,(200,0.829600)(122,0.748417)(170,0.736340)
userCityBlock =>uid:966,(114,0.250000)
itemLoglikelihood=>uid:966,(114,1.516898)(101,0.864536)(99,0.856057)
userCityBlock =>uid:967,
itemLoglikelihood=>uid:967,(105,0.873100)(114,0.725016)(168,0.707119)
userCityBlock =>uid:968,
itemLoglikelihood=>uid:968,(174,0.735004)(39,0.696716)(185,0.696171)
userCityBlock =>uid:969,
itemLoglikelihood=>uid:969,(197,0.723203)(81,0.710230)(167,0.668358)
userCityBlock =>uid:970,
itemLoglikelihood=>uid:970,(13,0.748417)(122,0.748417)(28,0.736340)
userCityBlock =>uid:971,
itemLoglikelihood=>uid:971,(28,1.540753)(174,1.511881)(39,1.435575)
userCityBlock =>uid:972,
itemLoglikelihood=>uid:972,(14,0.800605)(60,0.794088)(163,0.710230)
userCityBlock =>uid:973,
itemLoglikelihood=>uid:973,(56,0.795529)(13,0.712680)(120,0.701026)
userCityBlock =>uid:974,(19,0.200000)
itemLoglikelihood=>uid:974,(145,1.994049)(89,1.578694)(19,1.435193)
...
Searching pv.csv (in R):
> pv[which(pv$userid==974),]
userid jobid
2426 974 106
2427 974 173
2428 974 82
2429 974 188
2430 974 78
Searching job.csv:
> job[job$jobid %in% c(19,145,89),]
jobid create_date salary
19 19 2013-05-23 5700
89 89 2013-06-15 8400
145 145 2013-08-02 6800
Comparing after excluding outdated jobs: userCityBlock still returns only job 19, while itemLoglikelihood's 2nd and 3rd results are replaced by the lower-scoring 89 and 19.
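The filtering works through Mahout's IDRescorer contract used by JobRescorer above: when rescore() returns Double.NaN, the recommender drops that item from the candidates. The pattern in isolation, with illustrative names:

```java
import java.util.*;

// Illustration of the rescorer contract used by JobRescorer above:
// a rescore() result of Double.NaN removes an item from the candidate list;
// any other value replaces its score.
public class RescoreDemo {
    interface Rescorer { double rescore(long id, double originalScore); }

    static LinkedHashMap<Long, Double> applyRescorer(
            Map<Long, Double> scored, Rescorer r) {
        LinkedHashMap<Long, Double> out = new LinkedHashMap<>();
        for (Map.Entry<Long, Double> e : scored.entrySet()) {
            double s = r.rescore(e.getKey(), e.getValue());
            if (!Double.isNaN(s)) out.put(e.getKey(), s); // NaN => filtered
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Long, Double> scored = new LinkedHashMap<>();
        scored.put(121L, 1.79); // an outdated (2010) job
        scored.put(145L, 1.99); // a 2013 job, kept
        Set<Long> outdated = new HashSet<>(Collections.singletonList(121L));
        Rescorer r = (id, s) -> outdated.contains(id) ? Double.NaN : s;
        System.out.println(applyRescorer(scored, r).keySet()); // prints [145]
    }
}
```

Because filtering happens at recommendation time, the underlying data model keeps all the historical views; only the output is pruned.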
4). RecommenderFilterSalaryResult.java: exclude jobs whose salary is too low
Let's look at the jobs that user uid=974 has viewed:
> job[job$jobid %in% c(106,173,82,188,78),]
jobid create_date salary
78 78 2012-01-29 6800
82 82 2010-07-05 7500
106 106 2011-04-25 5200
173 173 2013-09-13 5200
188 188 2010-07-14 6000
The average salary of these jobs is 6140. We assume that users browsing jobs generally do not look at positions paying less than they currently earn, so we design the algorithm to exclude jobs whose salary is below 80% of that average, i.e. below 4912 (6140 × 0.8).
You can implement this yourself, using RecommenderFilterOutdateResult.java above as a reference.
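As a starting point for that exercise, here is a minimal sketch of the salary-threshold logic (class and method names are illustrative, not from the original source): compute the average salary of the user's viewed jobs and collect the job ids below 80% of it, which could then feed a rescorer exactly like JobRescorer.

```java
import java.util.*;

// Sketch of the exercise: exclude jobs paying less than 80% of the average
// salary of the jobs the user has viewed. All names here are illustrative.
public class SalaryFilter {
    // Returns the ids of jobs whose salary falls below 80% of the average
    // salary of the user's viewed jobs.
    static Set<Long> lowSalaryJobs(Map<Long, Integer> jobSalaries,
                                   List<Long> viewedJobs) {
        double avg = viewedJobs.stream()
                .mapToInt(jobSalaries::get).average().orElse(0);
        double threshold = avg * 0.8;
        Set<Long> excluded = new HashSet<>();
        for (Map.Entry<Long, Integer> e : jobSalaries.entrySet()) {
            if (e.getValue() < threshold) excluded.add(e.getKey());
        }
        return excluded;
    }

    public static void main(String[] args) {
        // Salaries of uid=974's viewed jobs (from job.csv) plus one cheap job.
        Map<Long, Integer> salaries = new HashMap<>();
        salaries.put(106L, 5200); salaries.put(173L, 5200);
        salaries.put(82L, 7500);  salaries.put(188L, 6000);
        salaries.put(78L, 6800);  salaries.put(98L, 2900); // below threshold
        List<Long> viewed = Arrays.asList(106L, 173L, 82L, 188L, 78L);
        // avg = 6140, threshold = 4912; only job 98 (2900) is excluded
        System.out.println(lowSalaryJobs(salaries, viewed)); // prints [98]
    }
}
```

The resulting id set plays the same role as the outdated-job set in RecommenderFilterOutdateResult.java: wrap it in an IDRescorer and pass it to recommend().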
With that, we have built the algorithms of a job recommendation engine using Mahout. Without Mahout, writing this engine ourselves would probably have taken close to half a year. Making good use of open-source technology helps us grow at full speed!