1. 程式人生 > >Real-Time Hotspot Detection in Amazon Kinesis Analytics

Real-Time Hotspot Detection in Amazon Kinesis Analytics

Today we’re releasing a new machine learning feature in Amazon Kinesis Data Analytics for detecting “hotspots” in your streaming data. We launched Kinesis Data Analytics

in August of 2016 and we’ve continued to add features since. As you may already know, Kinesis Data Analytics is a fully managed real-time processing engine for streaming data that lets you write SQL queries to derive meaning from your data and output the results to Kinesis Data Firehose, Kinesis Data Streams
, or even an AWS Lambda function. The new HOTSPOTS function adds to the existing machine learning capabilities in Kinesis that allow customers to leverage unsupervised streaming based machine learning algorithms. Customers don’t need to be experts in data science or machine learning to take advantage of these capabilities.

Hotspots

The function is a new Kinesis Data Analytics SQL function you can use to identify relatively dense regions in your data without having to explicitly build and train complicated machine learning models. You can identify subsections of your data that need immediate attention and take action programmatically by streaming the hotspots out to a Kinesis Data stream, to a Firehose delivery stream, or by invoking a AWS Lambda function.

There are a ton of really cool scenarios where this could make your operations easier. Imagine a ride-share program or autonomous vehicle fleet communicating spatiotemporal data about traffic jams and congestion, or a datacenter where a number of servers start to overheat indicating an HVAC issue. HOTSPOTS is not limited to spatiotemporal data and you could apply it across many problem domains.

The function follows some simple syntax and accepts the DOUBLE, INTEGER, FLOAT, TINYINT, SMALLINT, REAL, and BIGINT data types.

The HOTSPOTS function takes a cursor as input and returns a JSON string describing the hotspot. This will be easier to understand with an example.

Using Kinesis Data Analytics to Detect Hotspots

Let’s take a simple data set from NY Taxi and Limousine Commission that tracks yellow cab pickup and drop-off locations. Most of this data is already on S3 and publicly accessible at s3://nyc-tlc/. We will create a small python script to load our Kinesis Data Stream with Taxi records which will feed our Kinesis Data Analytics. Finally, we’ll output all of this to a Kinesis Data Firehose connected to an Amazon Elasticsearch Service cluster for visualization with Kibana. I know from living in New York for 5 years that we’ll probably find a hotspot or two in this data.

First, we’ll create an input Kinesis stream and start sending our NYC Taxi Ride data into it. I just wrote a quick python script to read from one of the CSV files and used boto3 to push the records into Kinesis. You can put the record in whatever way works for you.

import csv
import json
import boto3
def chunkit(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

kinesis = boto3.client("kinesis")
with open("taxidata2.csv") as f:
    reader = csv.DictReader(f)
    records = chunkit([{"PartitionKey": "taxis", "Data": json.dumps(row)} for row in reader], 500)
    for chunk in records:
        kinesis.put_records(StreamName="TaxiData", Records=chunk)

Next, we’ll create the Kinesis Data Analytics application and add our input stream with our taxi data as the source.

Next we’ll automatically detect the schema.

Now we’ll create a quick SQL Script to detect our hotspots and add that to the Real Time Analytics section of our application.

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    "pickup_longitude" DOUBLE,
    "pickup_latitude" DOUBLE,
    HOTSPOTS_RESULT VARCHAR(10000)
); 
CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM" 
    SELECT "pickup_longitude", "pickup_latitude", "HOTSPOTS_RESULT" FROM
        TABLE(HOTSPOTS(
            CURSOR(SELECT STREAM * FROM "SOURCE_SQL_STREAM_001"),
            1000,
            0.013,
            20
        )
    );


Our HOTSPOTS function takes an input stream, a window size, scan radius, and a minimum number of points to count as a hotspot. The values for these are application dependent but you can tinker with them in the console easily until you get the results you want. There are more details about the parameters themselves in the documentation. The HOTSPOTS_RESULT returns some useful JSON that would let us plot bounding boxes around our hotspots:

{
  "hotspots": [
    {
      "density": "elided",
      "minValues": [40.7915039, -74.0077401],
      "maxValues": [40.7915041, -74.0078001]
    }
  ]
}

When we have our desired results we can save the script and connect our application to our Amazon Elastic Search Service Firehose Delivery Stream. We can run an intermediate lambda function in the firehose to transform our record into a format more suitable for geographic work. Then we can update our mapping in Elasticsearch to index the hotspot objects as Geo-Shapes.

Finally, we can connect to Kibana and visualize the results.

Looks like Manhattan is pretty busy!

Available Now
This feature is available now in all existing regions with Kinesis Data Analytics. I think this is a really interesting new feature of Kinesis Data Analytics that can bring immediate value to many applications. Let us know what you build with it on Twitter or in the comments!

相關推薦

YOLO(You Only Look Once):Real-Time Object Detection

path nor bat pen 2-0 object network file with caffe-yolo:https://github.com/xingwangsfu/caffe-yolo YOLO in caffe Update 12-05-2016: Curre

論文閱讀筆記(六)Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

采樣 分享 最終 產生 pre 運算 減少 att 我們 作者:Shaoqing Ren, Kaiming He, Ross Girshick, and Jian SunSPPnet、Fast R-CNN等目標檢測算法已經大幅降低了目標檢測網絡的運行時間。可是盡管如此,仍然

【Faster RCNN】《Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks》

NIPS-2015 NIPS,全稱神經資訊處理系統大會(Conference and Workshop on Neural Information Processing Systems),是一個關於機器學習和計算神經科學的國際會議。該會議固定在每年的12月舉行

《You Only Look Once: Unified, Real-Time Object Detection》論文筆記

1. 論文思想 YOLO(YOLO-v1)是最近幾年提出的目標檢測模型,它不同於傳統的目標檢測模型,將檢測問題轉換到一個迴歸問題,以空間分隔的邊界框和相關的類概率進行目標檢測。在一次前向運算中,一個單一的神經網路直接從完整的影象中預測邊界框和類概率。由於整個檢測管道是一個單一的網路,

YOLO前篇---Real-Time Grasp Detection Using Convolutional Neural Networks

論文地址:https://arxiv.org/abs/1412.3128 1. 摘要 比目前最好的方法提高了14%的精度,在GPU上能達到13FPS 2. 基於神經網路的抓取檢測 A 結構 使用AlexNet網路架構,5個卷積層+3個全連線層,卷積層

Train C4: Real-time pedestrian detection models——C4行人檢測演算法訓練過程

1.樣本的準備         樣本可以使用之前訓練的模型,通過OpenCV的imwrite截圖儲存然後再人工篩選,這個C4-Real-time-pedestrian-detection工程裡面我有實現。也可以自己寫一個程式,手動截圖。將正樣本都裁剪成只包

C4: Real-time pedestrian detection——C4實時行人檢測演算法

http://cs.nju.edu.cn/wujx/projects/C4/C4.htm Jianxin Wu實現的快速行人檢測方法。 Real-Time Human Detection Using Contour Cues: http://c2inet.sce.ntu.edu

6-----A Random Forest Method for Real-Time Price Forecasting in New York Electricity Market

實時價格的隨機森林法紐約電力市場預測(清華的) 隨機森林,作為一種新引入的方法,將提供價格概率分佈   此外,該模型可以調整最新的預報條件,即最新的氣候,季節和市場條件,通過更新隨機森林 引數與新的觀測。這種適應性避免了不同氣候或經濟條件下的模型失效訓練集。  

論文閱讀筆記二十六:Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks(CVPR 2016)

論文源址:https://arxiv.org/abs/1506.01497 tensorflow程式碼:https://github.com/endernewton/tf-faster-rcnn 摘要       目標檢測依賴於區域proposals演算法對目標的位置進

You Only Look Once: Unified, Real-Time Object Detection 論文閱讀

本文僅是對論文的解讀,供個人學習使用,如果有侵權的地方,還請聯絡我刪除博文 一、簡述 Yolo方法是一種目標檢測的方法。整個演算法的框架其實是一個迴歸的過程。現在簡單介紹一個下這個演算法的運轉流程。建立網路模型,輸入影象,然後其輸出結果記錄了影象中的Bounding Box(後

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Abstract SPPnet和Fast R-CNN雖然減少了演算法執行時間,但region proposal仍然是限制演算法速度的瓶頸。而Faster R-CNN提出了Region Proposal Network (RPN),該網路基於卷積特徵預測每個位置是否為物體以及

faced: CPU Real Time face detection using Deep Learning

What is the problem?There are many scenarios where a single class object detection is needed. This means that we want to detect the location of all objects

論文閱讀:You Only Look Once: Unified, Real-Time Object Detection

Preface 注:這篇今年 CVPR 2016 年的檢測文章 YOLO,我之前寫過這篇文章的解讀。但因為不小心在 Markdown 編輯器中編輯時刪除了。幸好同組的夥伴轉載了我的,我就直

【論文筆記】Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

寫在前面:      我看的paper大多為Computer Vision、Deep Learning相關的paper,現在基本也處於入門階段,一些理解可能不太正確。說到底,小女子才疏學淺,如果有錯

【筆記】Faster-R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

論文程式碼:重要:訓練檔案.prototxt說明:http://blog.csdn.net/Seven_year_Promise/article/details/60954553從RCNN到fast R

經典計算機視覺論文筆記——《Robust Real-Time Face Detection

        第一次讀這篇傳奇之作大概是九年前了,也就是2007年,而那時距論文正式發表(2004年)也已經有四年之久了。現在讀來,一些想法,在深度學習大行其道的今天仍然具有借鑑意義,讓人敬佩不已。         VJ人臉檢測器應該是歷史上第一個成功商業應用的實時人臉檢

[論文學習]《Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks 》

faster R-CNN的主要貢獻 提出了 region proposal network(RPN),通過該網路我們可以將提取region proposal的過程也納入到深度學習的過程之中。這樣做既增加了Accuracy,由降低了耗時。之所以說增加Accura

【翻譯】Faster-R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

摘要 目前最先進的目標檢測網路需要先用區域建議演算法推測目標位置,像SPPnet[7]和Fast R-CNN[5]這些網路已經減少了檢測網路的執行時間,這時計算區域建議就成了瓶頸問題。本文中,我們介紹一種區域建議網路(Region Proposal Network, R

CPU Real-time Face Detection and Alignment-68 using MTCNN

mtcnn的landmark採用了5點迴歸,博主嘗試了68點迴歸,發現效果不錯! 主要特點:同時完成人臉檢測和特徵點回歸,演算法速度實時! 開源地址:https://github.com/samylee/mtcnn_landmark68(歡迎star和fork)   1