1. 程式人生 > >Amazon Kinesis Agent Update – New Data Preprocessing Features

Amazon Kinesis Agent Update – New Data Preprocessing Features

My colleague Ray Zhu wrote the guest post below to introduce you to some new data preprocessing features for the Amazon Kinesis Agent.

Jeff;

Amazon Kinesis Agent is a stand-alone Java software application that provides an easy and reliable way to send data to Amazon Kinesis Streams

and Amazon Kinesis Firehose. The agent monitors a set of files for new data and then sends it to Kinesis Streams or Kinesis Firehose continuously. It handles file rotation, checkpointing, and retrial upon failures. It also supports Amazon CloudWatch so that you can closely monitor and troubleshoot the data flow from the agent.

Data Preprocessing with Kinesis Agent
Today we are adding data preprocessing capabilities to the agent so that your data can be well formatted before it is sent to Kinesis Streams or Kinesis Firehose. The agent currently supports the three processing options listed below. Because the agent is open source, you can further develop and extend these processing options.

SINGLELINE – This option converts a multi-line record to a single line record by removing newline characters, and leading and trailing spaces.

CSVTOJSON – This option converts a record from delimiter separated format to JSON format.

LOGTOJSON – This option converts a record from several commonly used log formats to JSON format. Currently supported log formats are Apache Common Log, Apache Combined Log, Apache Error Log, and RFC3164 (syslog).

Analyze Apache Tomcat Access Log in Near Real-Time
Let’s look at an example of analyzing Tomcat access logs in near real-time using Kinesis Agent’s preprocessing feature, Amazon Kinesis Firehose, and Amazon Redshift. Here’s the overall flow:

First I need to create a table in my Redshift cluster to store the Tomcat access log. The following SQL statement is used to create the table:

CREATE TABLE logs(
host VARCHAR(40),
ident VARCHAR(25),
authuser VARCHAR(25),
datetime VARCHAR(60),
request VARCHAR(2048),
response SMALLINT NOT NULL,
bytes INTEGER,
referer VARCHAR(2048),
agent VARCHAR(256));

Then I need to create a Kinesis Firehose delivery stream that continuously delivers data to the Redshift table created above:

Now I’ve set up my Redshift table and Firehose delivery stream. Next I need to install the Kinesis Agent on my Tomcat server to monitor my Tomcat access log files and continuously send the log data to my delivery stream. Here is a screenshot of the raw Tomcat access log:

In the agent configuration, I use the LOGTOJSON processing option to convert raw Tomcat access log data to JSON format before sending the data to my delivery stream. Here’s how I set that up:

{
   "cloudwatch.emitMetrics":true,
   "flows":[
      {
         "filePattern":"/data/access.log*",
         "deliveryStream":"access_log_stream",
         "initialPosition":"START_OF_FILE",
         "dataProcessingOptions":[
            {
               "optionName":"LOGTOJSON",
               "logFormat":"COMBINEDAPACHELOG"
            }
         ]
      }
   ]
}

Everything is set up now and let’s start the agent! After a minute or two, my Tomcat access log data shows up in my S3 bucket and Redshift table. Here is how the data looks like in my S3 bucket. Notice that the raw log data has been nicely formatted as JSON:

Here is how the data looks like in my Redshift table:

I can run SQL queries to analyze my Tomcat access log, or use the Business Intelligence tool of my choice to visualize the data:

It took me less than an hour to set up the whole data pipeline. Now I can analyze and visualize access log data using my favorite Business Intelligence tool, only minutes after the data is generated on my Tomcat server!

Available Now
Kinesis Agent’s data preprocessing feature is available now and you can start using it today – visit the Amazon Kinesis Agent Repository! To learn more, read Use Agent to Preprocess Data in the Kinesis Firehose Developer Guide.

Ray Zhu, Senior Product Manager

相關推薦

Amazon Kinesis Agent UpdateNew Data Preprocessing Features

My colleague Ray Zhu wrote the guest post below to introduce you to some new data preprocessing features for the Amazon Kinesis Agent. — J

Build More Reliable and Secure Windows Services Using Amazon Kinesis Agent for Microsoft Windows

We’ve all been there. You’ve deployed a new service on Windows servers. Maybe it’s based on Microsoft technology such as IIS, AD, DHCP, Microsoft

Amazon Kinesis Data Firehose Features

You can configure Amazon Kinesis Data Firehose to prepare your streaming data before it is loaded to data stores. Simply select an AWS Lambda fun

Amazon EBS UpdateNew Cold Storage and Throughput Options

The AWS team spends a lot of time looking in to ways to deliver innovation based around improvements in price/performance. Quite often, this means

Amazon Kinesis Data Firehose blog posts

Stream data into an Aurora PostgreSQL Database using AWS DMS and Amazon Kinesis Data Firehose In this blog post, we explore a solution to

Amazon Kinesis Data Streams Resources

This is a pre-built library that helps you easily integrate Amazon Kinesis Data Streams with other AWS services and third-party tools. Amazon Ki

Amazon Kinesis Data Streams getting started

Reducing the time to get actionable insights from data is important to all businesses and customers who employ batch data analytics tools are exp

Amazon Kinesis Data Streams FAQs

Q: What is an Amazon Kinesis Application? An Amazon Kinesis Application is a data consumer that reads and processes data from an Amazon

Building a Data Processing Pipeline with Amazon Kinesis Data Streams and Kubeless

If you’re already running Kubernetes, FaaS (Functions as a Service) platforms on Kubernetes can help you leverage your existing investment in EC2

Amazon Kinesis Data Streams News

Two years ago we introduced Amazon Kinesis, which we now call Amazon Kinesis Streams, to allow customers to build applications that collect,

Amazon Kinesis Data Streams Pricing

Let’s assume that our data producers put 100 records per second in aggregate, and each record is 35KB. In this case, the total data input rate is

Amazon Kinesis Firehose Data Transformation with AWS Lambda

Shiva Narayanaswamy, Solution Architect Amazon Kinesis Firehose is a fully managed service for delivering real-time streaming data to des

Amazon Kinesis Data Streams:AWS

Amazon Kinesis Data Streams (KDS) は、大規模にスケーラブルで持続的なリアルタイムのデータストリーミングサービスです。KDS はウエブサイトクリックストリームやデータべースイベントストリームや金融取引、ソーシャルメディアフィード、ITロゴ、ロケーション追跡イベ

Amazon Kinesis Data Firehose Pricing

If you send 5,000 records of streaming data per second, each record 7KB in size, to Amazon Kinesis Data Firehose in US-East to be loaded into Amaz

Amazon Kinesis Data Firehose Resources

Reducing the time to get actionable insights from data is important to all businesses and customers who employ batch data analytics tools are ex

Questions fréquentes (FAQ) sur Amazon Kinesis Data Streams

Q : Qu'est-ce qu'une application Amazon Kinesis ? Une application Amazon Kinesis est un consommateur de données qui lit et traite des do

Amazon Kinesis- Setting up a Streaming Data Pipeline

Ray Zhu from the Amazon Kinesis team wrote this great post about how to set up a streaming data pipeline. He carefully shows you step by step how

Вопросы и ответы по Amazon Kinesis Data Streams

Вопрос: Что такое приложение Amazon Kinesis? Приложение Amazon Kinesis – это потребитель данных, который считывает и обрабатывает данные