Vehicle Hotspot Clustering for Ride-Hailing (Part 1)
阿新 • Published: 2019-04-27
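As with Uber, we need to find the order hotspots in different areas during different time windows, so that ride-hailing vehicles can be dispatched promptly.
Every completed order (Order) records the latitude and longitude of its pick-up location: open_lat, open_lng.
The question is which areas have a high density of orders within a given time window, so that vehicles can be dispatched there in time. To answer it, we need a heat map of orders across areas.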
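My first idea was to cluster on latitude and longitude. The classic clustering algorithm is K-means, a partition-based (centroid-based) clustering method.
Note, however, that k-means is not suitable for clustering vehicle positions. Where future orders will appear is unknown, yet k-means requires K to be specified up front, which means deciding in advance how many clusters the data will end up in. That is clearly inappropriate here.
So I turned to density-based clustering, specifically DBSCAN, which does not require the number of clusters to be specified and can automatically identify noise points.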
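To make the point about K concrete, here is a minimal Lloyd's-iteration k-means sketch (not from the original post). The class name `KMeansSketch`, the squared-Euclidean distance on raw (lat, lng) values and the seeded random initialisation are all assumptions for illustration. Note that `k` is a required argument, and every point, however isolated, is forced into one of the `k` clusters: there is no notion of noise.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical minimal k-means (Lloyd's algorithm) on {lat, lng} pairs.
// The caller must choose k in advance, which is the limitation discussed above.
class KMeansSketch {
    /** Returns the cluster index (0..k-1) assigned to each point. */
    static int[] cluster(double[][] pts, int k, int maxIter, long seed) {
        if (k <= 0 || k > pts.length) throw new IllegalArgumentException("need 0 < k <= n");
        Random rnd = new Random(seed);
        // Initialise centers with k distinct random points (partial Fisher-Yates shuffle).
        int[] idx = new int[pts.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            int j = c + rnd.nextInt(pts.length - c);
            int tmp = idx[c]; idx[c] = idx[j]; idx[j] = tmp;
            centers[c] = pts[idx[c]].clone();
        }
        int[] label = new int[pts.length];
        Arrays.fill(label, -1);
        for (int it = 0; it < maxIter; it++) {
            boolean changed = false;
            // Assignment step: every point, outliers included, goes to its nearest center.
            for (int i = 0; i < pts.length; i++) {
                int best = 0;
                double bestD = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double dx = pts[i][0] - centers[c][0], dy = pts[i][1] - centers[c][1];
                    double d = dx * dx + dy * dy;
                    if (d < bestD) { bestD = d; best = c; }
                }
                if (label[i] != best) { label[i] = best; changed = true; }
            }
            if (!changed) break;
            // Update step: move each center to the mean of the points assigned to it.
            double[][] sum = new double[k][2];
            int[] cnt = new int[k];
            for (int i = 0; i < pts.length; i++) {
                sum[label[i]][0] += pts[i][0];
                sum[label[i]][1] += pts[i][1];
                cnt[label[i]]++;
            }
            for (int c = 0; c < k; c++) {
                if (cnt[c] > 0) centers[c] = new double[]{sum[c][0] / cnt[c], sum[c][1] / cnt[c]};
            }
        }
        return label;
    }
}
```

With three well-separated groups and `k = 2`, two of the groups are necessarily merged, and a lone outlier still receives a regular cluster label.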
How DBSCAN works:
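Input: a sample set $D = (x_1, x_2, \dots, x_m)$, neighborhood parameters $(\epsilon, \mathit{MinPts})$, and a distance metric.
Output: a cluster partition $C$.
1) Initialize the core-object set $\Omega = \emptyset$, the cluster count $k = 0$, the unvisited-sample set $\Gamma = D$, and the partition $C = \emptyset$.
2) For $j = 1, 2, \dots, m$, find all core objects as follows:
   a) Using the distance metric, find the $\epsilon$-neighborhood of $x_j$, $N_\epsilon(x_j)$.
   b) If $|N_\epsilon(x_j)| \ge \mathit{MinPts}$, add $x_j$ to the core-object set: $\Omega = \Omega \cup \{x_j\}$.
3) If $\Omega = \emptyset$, the algorithm ends; otherwise go to step 4.
4) Pick a core object $o$ at random from $\Omega$. Initialize the current cluster's core-object queue $\Omega_{cur} = \{o\}$, set $k = k + 1$, initialize the current cluster $C_k = \{o\}$, and update $\Gamma = \Gamma \setminus \{o\}$.
5) If $\Omega_{cur} = \emptyset$, cluster $C_k$ is complete: update $C = \{C_1, C_2, \dots, C_k\}$ and $\Omega = \Omega \setminus C_k$, then go to step 3.
6) Remove a core object $o'$ from $\Omega_{cur}$, find its $\epsilon$-neighborhood $N_\epsilon(o')$, and let $\Delta = N_\epsilon(o') \cap \Gamma$. Update $C_k = C_k \cup \Delta$, $\Gamma = \Gamma \setminus \Delta$, and $\Omega_{cur} = (\Omega_{cur} \cup (\Delta \cap \Omega)) \setminus \{o'\}$, then go to step 5.
The result is the partition $C = \{C_1, C_2, \dots, C_k\}$.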
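The six steps above translate fairly directly into code. The following is a sketch written against the set notation ($\Omega$, $\Gamma$, $\Omega_{cur}$, $\Delta$) rather than the author's implementation; the class `DbscanSteps`, the `double[]` points and the Euclidean distance are assumptions. It takes an arbitrary (rather than random) core object in step 4. That does not change which core points end up together, although a border point reachable from two clusters joins whichever cluster reaches it first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of steps 1-6 above. Points as double[]{x, y} and Euclidean distance are assumptions.
class DbscanSteps {
    static final int NOISE = -1;

    // N_eps(x_j): indices of all samples within eps of sample j (j itself included).
    static List<Integer> neighborhood(double[][] d, int j, double eps) {
        List<Integer> n = new ArrayList<>();
        for (int i = 0; i < d.length; i++) {
            double dx = d[i][0] - d[j][0], dy = d[i][1] - d[j][1];
            if (Math.sqrt(dx * dx + dy * dy) <= eps) n.add(i);
        }
        return n;
    }

    /** Returns a cluster id (1..k) for each sample, or NOISE if it joined no cluster. */
    static int[] run(double[][] d, double eps, int minPts) {
        int m = d.length;
        // Step 1: Omega = {}, k = 0, Gamma = D.
        Set<Integer> omega = new HashSet<>();
        Set<Integer> gamma = new HashSet<>();
        for (int j = 0; j < m; j++) gamma.add(j);
        int k = 0;
        int[] label = new int[m];
        Arrays.fill(label, NOISE);
        // Step 2: x_j is a core object if |N_eps(x_j)| >= MinPts.
        for (int j = 0; j < m; j++) {
            if (neighborhood(d, j, eps).size() >= minPts) omega.add(j);
        }
        // Step 3: stop once no unclustered core objects remain.
        while (!omega.isEmpty()) {
            // Step 4: take a core object o (arbitrary pick) and open cluster C_k = {o}.
            int o = omega.iterator().next();
            k++;
            Set<Integer> ck = new HashSet<>();
            ck.add(o);
            gamma.remove(o);
            Deque<Integer> omegaCur = new ArrayDeque<>();
            omegaCur.add(o);
            // Step 5: keep expanding until the current core queue is empty.
            while (!omegaCur.isEmpty()) {
                // Step 6: Delta = N_eps(o') intersect Gamma; grow C_k and queue new core objects.
                int oPrime = omegaCur.poll();
                for (int i : neighborhood(d, oPrime, eps)) {
                    if (gamma.remove(i)) { // true only if sample i was still unvisited
                        ck.add(i);
                        if (omega.contains(i)) omegaCur.add(i);
                    }
                }
            }
            // Step 5 (C_k complete): record it and drop its core objects from Omega.
            for (int i : ck) label[i] = k;
            omega.removeAll(ck);
        }
        return label;
    }
}
```

For example, with `eps = 1.5` and `minPts = 3`, two tight groups of four points each become two separate clusters, while an isolated point keeps the `NOISE` label.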
A Java implementation of DBSCAN:
```java
package com.df.dbscan;

import java.util.ArrayList;

/**
 * Created by angel
 */
public class DBScan {
    private double radius;
    private int minPts;

    /**
     * @param radius neighborhood radius, in meters
     * @param minPts minimum number of points (the point itself included) for a core point
     */
    public DBScan(double radius, int minPts) {
        this.radius = radius;
        this.minPts = minPts;
    }

    public void process(ArrayList<Point> points) {
        int size = points.size();
        int idx = 0;
        int cluster = 1;
        while (idx < size) {
            Point p = points.get(idx++);
            // pick the next unvisited point
            if (!p.getVisit()) {
                p.setVisit(true);
                ArrayList<Point> adjacentPoints = getAdjacentPoints(p, points);
                // too few neighbours: mark as noise for now (it may later turn out to be a border point)
                if (adjacentPoints.size() < minPts) {
                    p.setNoised(true);
                } else {
                    // p is a core point: start a new cluster and expand it
                    p.setCluster(cluster);
                    // adjacentPoints grows while we iterate, so size() is re-read on every pass
                    for (int i = 0; i < adjacentPoints.size(); i++) {
                        Point adjacentPoint = adjacentPoints.get(i);
                        // only unvisited points can contribute new neighbours
                        if (!adjacentPoint.getVisit()) {
                            adjacentPoint.setVisit(true);
                            ArrayList<Point> adjacentAdjacentPoints = getAdjacentPoints(adjacentPoint, points);
                            // if this neighbour is itself a core point, merge its neighbourhood into the expansion list
                            if (adjacentAdjacentPoints.size() >= minPts) {
                                for (Point pp : adjacentAdjacentPoints) {
                                    if (!adjacentPoints.contains(pp)) {
                                        adjacentPoints.add(pp);
                                    }
                                }
                            }
                        }
                        // claim any point not yet in a cluster, including points previously marked as noise
                        if (adjacentPoint.getCluster() == 0) {
                            adjacentPoint.setCluster(cluster);
                            if (adjacentPoint.getNoised()) {
                                adjacentPoint.setNoised(false);
                            }
                        }
                    }
                    cluster++;
                }
            }
            // progress log every 1000 points
            if (idx % 1000 == 0) {
                System.out.println(idx);
            }
        }
    }

    // Returns every point within `radius` meters of centerPoint, centerPoint itself included
    private ArrayList<Point> getAdjacentPoints(Point centerPoint, ArrayList<Point> points) {
        ArrayList<Point> adjacentPoints = new ArrayList<Point>();
        for (Point p : points) {
            double distance = centerPoint.GetDistance(p);
            if (distance <= radius) {
                adjacentPoints.add(p);
            }
        }
        return adjacentPoints;
    }
}
```
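`DBScan` depends on a `Point` class that the post does not show. Below is a hypothetical reconstruction that matches the calls made by `DBScan` and by the Scala code that follows (`getVisit`, `setNoised`, `getCluster`, `GetDistance`, and a `(lat, lng, code)` constructor). Because `radius` is documented in meters, it assumes `GetDistance` returns the haversine great-circle distance in meters; the field names and the Earth-radius constant are also assumptions. In a real project it would be a `public` class in its own `Point.java` inside `com.df.dbscan`.

```java
// Hypothetical Point class: the original is not shown, so the fields and the
// haversine distance below are assumptions chosen to match how DBScan uses it.
class Point {
    private static final double EARTH_RADIUS_M = 6_371_000.0; // mean Earth radius, meters

    private final double lat;
    private final double lng;
    private final String addressCode; // e.g. begin_address_code from HBase
    private boolean visit;            // already examined by DBScan?
    private boolean noised;           // currently considered noise?
    private int cluster;              // 0 = not assigned to any cluster

    Point(double lat, double lng, String addressCode) {
        this.lat = lat;
        this.lng = lng;
        this.addressCode = addressCode;
    }

    double getLat() { return lat; }
    double getLng() { return lng; }
    String getAddressCode() { return addressCode; }
    boolean getVisit() { return visit; }
    void setVisit(boolean visit) { this.visit = visit; }
    boolean getNoised() { return noised; }
    void setNoised(boolean noised) { this.noised = noised; }
    int getCluster() { return cluster; }
    void setCluster(int cluster) { this.cluster = cluster; }

    /** Great-circle (haversine) distance to another point, in meters. */
    double GetDistance(Point other) {
        double phi1 = Math.toRadians(lat), phi2 = Math.toRadians(other.lat);
        double dPhi = phi2 - phi1;
        double dLambda = Math.toRadians(other.lng - lng);
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                 + Math.cos(phi1) * Math.cos(phi2) * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }
}
```

One degree of latitude comes out at roughly 111.2 km, which is a quick sanity check for the `radius` value passed to `DBScan`.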
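How I use it:
All I need to do is query the data from HBase, pack the fields the algorithm needs into the required structure, pass it to the algorithm, and read off the result.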
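```scala
import scala.collection.JavaConverters._

// Query HBase (Controll.rowEndFilter2 is the author's own helper; tableName, startDate,
// endDate, radius and density are defined elsewhere in the job)
val result = Controll.rowEndFilter2(tableName, startDate, endDate)

// Assemble the query results into the structure the algorithm expects
val points = new java.util.ArrayList[Point]()
for (map <- result.asScala) {
  val lon = map.get("open_lng")
  val lat = map.get("open_lat")
  val begin_address_code = map.get("begin_address_code")
  points.add(new Point(lat.toDouble, lon.toDouble, begin_address_code))
}

// Run the clustering
val dbScan = new DBScan(radius, density)
dbScan.process(points)

// Convert the Java list to a Scala list
val point_List: List[Point] = points.asScala.toList

// Group the coordinates by cluster id (cluster 0 holds the points left as noise)
val groupData: Map[Int, List[Point]] = point_List.groupBy(_.getCluster)

// The grouped result can then be processed further and sent downstream
```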
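The post stops at "process the result further and send it out". One plausible next step, sketched here in Java with hypothetical names (`HotspotSummary`, `Labeled`, `Hotspot`), is to drop the noise group (cluster 0, since `DBScan` never assigns a cluster to points that remain noise) and reduce each remaining cluster to an order count and a mean pick-up position, sorted from the busiest hotspot down. Averaging raw latitude and longitude is an approximation that is only reasonable for clusters spanning a few kilometres.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical post-processing: turn clustered points into hotspot summaries.
class HotspotSummary {
    // Minimal stand-in for a clustered Point: position plus cluster id (0 = noise).
    static final class Labeled {
        final double lat, lng;
        final int cluster;
        Labeled(double lat, double lng, int cluster) {
            this.lat = lat;
            this.lng = lng;
            this.cluster = cluster;
        }
    }

    // One hotspot: cluster id, number of orders, and mean pick-up position.
    static final class Hotspot {
        final int cluster, count;
        final double centerLat, centerLng;
        Hotspot(int cluster, int count, double centerLat, double centerLng) {
            this.cluster = cluster;
            this.count = count;
            this.centerLat = centerLat;
            this.centerLng = centerLng;
        }
    }

    /** Skips noise (cluster 0), aggregates each cluster, and sorts hotspots largest first. */
    static List<Hotspot> summarise(List<Labeled> points) {
        // cluster id -> {sum of lat, sum of lng, count}
        Map<Integer, double[]> acc = new LinkedHashMap<>();
        for (Labeled p : points) {
            if (p.cluster == 0) continue;
            double[] a = acc.computeIfAbsent(p.cluster, c -> new double[3]);
            a[0] += p.lat;
            a[1] += p.lng;
            a[2] += 1;
        }
        List<Hotspot> out = new ArrayList<>();
        for (Map.Entry<Integer, double[]> e : acc.entrySet()) {
            double[] a = e.getValue();
            int n = (int) a[2];
            out.add(new Hotspot(e.getKey(), n, a[0] / n, a[1] / n));
        }
        out.sort(Comparator.comparingInt((Hotspot h) -> h.count).reversed());
        return out;
    }
}
```

Sorting by count puts the densest pick-up area first, which is the one a dispatcher would most likely act on.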