You Are (Probably) Here: Better Map Pins with DBSCAN & Random Forests

Posted by 阿新 • Published: 2018-12-29


Users of Foursquare City Guide and Foursquare Swarm (in addition to users of the thousands of apps built by our API and data partners) routinely interact with our venues on a map. A map is the most natural backdrop against which to present geospatial data. Yet, the problem of determining where exactly to drop a venue’s map pin is a surprisingly difficult one. A common, but limiting, solution is to geocode a venue’s street address to a pair of latitude/longitude coordinates. At Foursquare, however, we have accumulated a massive location dataset that has allowed us to sidestep the geocoding approach altogether. In this post, we will describe how we harness the patterns inherent in user behavior to continually improve map pin placement over time.

Unbeholden to Geocoders

Geocoding venue addresses is not only expensive to do at scale, but also severely limited by the ability of geocoders to parse and error-correct a wide variety of international addresses. In addition, commercial geocoders’ address coverage and resolution vary widely across, and even within, countries. The point a geocoder returns could be the exact rooftop or doorstep coordinates in the best case, or an interpolation against published address ranges along the length of the street in the average case, or simply the centroid of the postal code or city.

A lot of place databases are essentially like “yellow pages” in that they contain basic information of the kind you might find listed in a business directory (or that you could call a place and get over the phone). Geocoding is often the only way that place records can be rendered on a map at scale. But at Foursquare, for every one of over twelve billion check-ins to date, we know where a user was physically located when they checked in at a specific venue.

From Filtering to Clustering

In the past, our approach to using these check-ins to place map pins was rather straightforward — for every venue, we simply computed the centroid of the latitude/longitude coordinates of its check-ins, and dropped the pin at that centroid. One problem with this approach is that GPS data is noisy, so not all check-ins are equal. Mobile phones report a horizontal accuracy along with latitude/longitude coordinates, where a numerically larger value corresponds to a higher error or lower confidence. When we noticed our centroids were thrown off by check-ins with large horizontal accuracy values, we naturally resorted to filtering them out, but that did not solve the entire problem either.
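
The filter-then-average approach described above can be sketched as follows. This is only an illustration, not our production code: the `CheckIn` type, the `filtered_centroid` function, and the 100-meter accuracy cutoff are all hypothetical names and values chosen for the example.

```python
from typing import List, NamedTuple, Optional, Tuple


class CheckIn(NamedTuple):
    """Hypothetical record for one check-in (field names are assumptions)."""
    lat: float
    lng: float
    h_accuracy_m: float  # reported horizontal accuracy; larger means less certain


def filtered_centroid(
    check_ins: List[CheckIn], max_accuracy_m: float = 100.0
) -> Optional[Tuple[float, float]]:
    """Average the coordinates of check-ins whose reported accuracy is good enough.

    A plain arithmetic mean of latitude and longitude is a reasonable
    approximation at venue scale, away from the poles and the antimeridian.
    The default cutoff is an illustrative assumption, not a tuned value.
    """
    kept = [c for c in check_ins if c.h_accuracy_m <= max_accuracy_m]
    if not kept:
        return None  # nothing trustworthy enough to place a pin
    lat = sum(c.lat for c in kept) / len(kept)
    lng = sum(c.lng for c in kept) / len(kept)
    return (lat, lng)
```

Note that an accurate check-in made from across the street still passes this filter and pulls the centroid away from the venue, which is why filtering alone was not enough.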

As we visualized check-ins at one venue after another, we noticed the underlying pattern in human behavior that generates check-ins staring right back at us. Most users check in when they’re physically inside a venue. But sometimes, they check in from the parking lot, or from across the street as they walk in or out. Less frequently, they might cheat at the Foursquare Swarm game and check in from somewhere else altogether in order to get coins, unlock stickers, or become mayor. We can usually detect fraudulent check-ins, but regardless, the key observation here is that check-ins are inherently clustered, with some outliers.

Check-ins at a Red Robin in Bellevue, WA showing distinct clusters centered around the actual restaurant location and on the other side of the street.

Our new approach, therefore, was to identify clusters in the check-ins, and use the centroid of just the “dominant,” or largest, cluster as the map pin location. Ideally, this cluster should consist of the check-ins that originated from inside the venue itself. There will typically be smaller clusters corresponding to legitimate check-ins from the parking lot or across the street. The rest are noise for our purposes.
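
As a rough sketch of the pin-selection step, assume some clustering routine has already assigned each check-in an integer cluster label, with -1 marking noise. That labeling convention and the function name `dominant_cluster_centroid` are assumptions made for this example.

```python
from collections import Counter
from typing import List, Optional, Tuple

NOISE = -1  # assumed label for check-ins that belong to no cluster


def dominant_cluster_centroid(
    points: List[Tuple[float, float]], labels: List[int]
) -> Optional[Tuple[float, float]]:
    """Return the centroid of the largest cluster, ignoring noise points.

    `points` are (lat, lng) pairs and `labels[i]` is the cluster id of
    `points[i]`. Returns None when every point is noise.
    """
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    sizes = Counter(label for label in labels if label != NOISE)
    if not sizes:
        return None
    dominant, _ = sizes.most_common(1)[0]  # the cluster with the most members
    members = [p for p, label in zip(points, labels) if label == dominant]
    lat = sum(p[0] for p in members) / len(members)
    lng = sum(p[1] for p in members) / len(members)
    return (lat, lng)
```

Smaller clusters (the parking lot, the far side of the street) and noise points simply do not contribute to the pin location.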

Combining DBSCAN and Random Forests

Having reduced our situation to a flavor of clustering problem, we set out to build a system that was robust to outliers and made no assumptions about the size, shape, or number of clusters in the data. We identified DBSCAN, a clustering algorithm that is well known and tested in the industry, as an effective option for our purposes. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a highly versatile algorithm, and, as the name suggests, it is particularly well suited to datasets that contain noise in a way that more popular clustering methods, such as k-means, are not. At its core, once again as its name suggests, DBSCAN is density-based, and especially when operating in two-dimensional space, the algorithm reflects the intuition by which our brains visually pick out clusters of points.

The DBSCAN algorithm takes two parameters, canonically referred to as MinPts and Eps. Without going into too much detail, MinPts is the minimum number of neighboring points and Eps is the neighborhood radius, and together, they intuitively describe the density criteria for points to form a cluster. The fact that there are only two parameters should in principle make it easy to evaluate different models, but the catch is that the distribution of check-in densities in the real world varies greatly as a function of venue shape and size, which in turn are a function of category or chain, and location. So, the check-ins at a suburban Target look very different from the check-ins at a bar in Manhattan. No single pair of MinPts and Eps values works globally.
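
To make the roles of MinPts and Eps concrete, here is a minimal, unoptimized sketch of the textbook DBSCAN procedure in plain Python. It is not our production implementation. Measuring Eps in meters with the haversine distance is an assumption made for this example, as are the names `haversine_m`, `dbscan`, `eps_m`, and `min_pts`. A real system would use a spatial index rather than the quadratic neighbor scan shown here.

```python
import math
from typing import List, Optional, Tuple

NOISE = -1  # label for points that belong to no cluster
EARTH_RADIUS_M = 6371000.0  # mean Earth radius


def haversine_m(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in meters between two (lat, lng) points in degrees."""
    lat1, lng1 = math.radians(a[0]), math.radians(a[1])
    lat2, lng2 = math.radians(b[0]), math.radians(b[1])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def region_query(points: List[Tuple[float, float]], i: int, eps_m: float) -> List[int]:
    """Indices of all points within eps_m of points[i], including i itself."""
    return [j for j, q in enumerate(points) if haversine_m(points[i], q) <= eps_m]


def dbscan(points: List[Tuple[float, float]], eps_m: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE.

    A point is a core point when at least min_pts points, counting itself,
    lie within eps_m of it. Clusters grow outward from core points.
    """
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps_m)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps_m)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return [label if label is not None else NOISE for label in labels]
```

Calling `dbscan(points, eps_m=30.0, min_pts=3)` on a venue's check-ins would return one label per check-in. The values 30 and 3 are purely illustrative, and a larger venue would typically need a larger radius than a small one.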
