Automatic Feature Engineering: An Event
Feature Engineering: the Heart of Data Science
“Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data.”
— Dr. Jason Brownlee
The groundwork for this field was laid long before the hype of machine learning. Key Performance Indicators (KPIs) are crucial for companies of all sizes: they offer concrete metrics on business performance. Take the classic RFM (recency, frequency, monetary value) paradigm that retailers use to measure customer value.
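The RFM scores mentioned above are easy to make concrete. Below is a minimal, self-contained sketch; the purchase log and its field layout are made up for the example.

```python
# Toy illustration of RFM (recency, frequency, monetary value) scoring.
# The data and field layout are hypothetical, made up for this example.
from datetime import date

# Hypothetical purchase log: (customer_id, purchase_date, amount)
purchases = [
    ("alice", date(2018, 8, 30), 40.0),
    ("alice", date(2018, 8, 10), 25.0),
    ("bob",   date(2018, 6, 1),  90.0),
]

def rfm(purchases, reference_date):
    """Compute per-customer recency (days since last purchase),
    frequency (number of purchases), and monetary value (total spend)."""
    scores = {}
    for customer, day, amount in purchases:
        rec, freq, mon = scores.get(customer, (None, 0, 0.0))
        days_ago = (reference_date - day).days
        rec = days_ago if rec is None else min(rec, days_ago)
        scores[customer] = (rec, freq + 1, mon + amount)
    return scores

print(rfm(purchases, date(2018, 9, 1)))
# {'alice': (2, 2, 65.0), 'bob': (92, 1, 90.0)}
```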
We believe that a good feature engineering tool should not only generate sophisticated indicators, but should also keep them interpretable so that data scientists can use them either for the machine learning models or for the KPI dashboards.
The Need for Automated Feature Engineering
Imagine you are working for an e-commerce company. You have collected transactional data and are now almost ready to make some magic with machine learning.
The task at hand is churn prediction: you want to predict who might stop visiting your website next week so that the marketing team has enough time to react.
Before doing any feature engineering, you need to choose a reference date in the past; in this case it will be 2018-09-01. Only data before that date will be taken into account by the model, which predicts the churners of the week after 2018-09-01. This ensures there is no data leakage: we are not looking at the future to predict the past.
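The split described above can be sketched in a few lines; the event records and field names here are hypothetical.

```python
# Minimal sketch of splitting event data around a reference date to avoid
# leakage: only events strictly before the cutoff feed the features, while
# the week after it provides the churn labels. Field names are hypothetical.
from datetime import date, timedelta

reference_date = date(2018, 9, 1)

events = [
    {"user": "alice", "day": date(2018, 8, 20)},  # feature-window visit
    {"user": "bob",   "day": date(2018, 9, 3)},   # label-window visit
    {"user": "carol", "day": date(2018, 9, 15)},  # outside both windows
]

feature_events = [e for e in events if e["day"] < reference_date]
label_window = [
    e for e in events
    if reference_date <= e["day"] < reference_date + timedelta(weeks=1)
]

print([e["user"] for e in feature_events])  # ['alice']
print([e["user"] for e in label_window])    # ['bob']
```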
As an experienced data scientist, you know that one important feature for this type of problem will be the recency of the client: if the time between two consecutive visits of a client is increasing, that’s an alert to potential churn!
You put on your SQL ninja hat and write the following PostgreSQL query:
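The query itself is not reproduced here; a sketch of what such a recency query might look like in PostgreSQL, with a hypothetical events table and column names:

```sql
-- Hedged sketch (the article's actual query is not shown);
-- the table and column names are assumptions.
SELECT
    user_id,
    DATE '2018-09-01' - MAX(event_date) AS recency_days
FROM events
WHERE event_date < DATE '2018-09-01'
GROUP BY user_id;
```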
This is fine, but now you want to go further: you want to add a time filter to capture long- and short-term signals, then compute this feature for each type of activity the user performs, then add some more statistics on top of these results, and then …
You get the idea: the list keeps growing exponentially, and this is just for one feature!
EventsAggregator to the Rescue
Now let’s see how things change with EventsAggregator.
First, we need to instantiate the feature_aggregator object as follows:
We then apply the feature_aggregator to the input dataset:
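Neither snippet is reproduced in the source, so the real EventsAggregator API is not shown here. The following self-contained toy stand-in illustrates the two-step workflow; the class, its parameters, and the data layout are all hypothetical, not the library's actual interface.

```python
# Toy stand-in for the two steps above. This is NOT the real EventsAggregator
# API (which the source does not show); every name and parameter here is a
# hypothetical illustration of the instantiate-then-apply workflow.
from datetime import date

class ToyEventsAggregator:
    def __init__(self, time_windows, aggregations):
        self.time_windows = time_windows   # window lengths, in days
        self.aggregations = aggregations   # statistics to compute, e.g. ["count"]

    def transform(self, events, reference_date):
        """Return one feature dict per user: event counts per time window."""
        features = {}
        for window in self.time_windows:
            for e in events:
                age = (reference_date - e["day"]).days
                if 0 < age <= window:
                    row = features.setdefault(e["user"], {})
                    key = f"count_last_{window}d"
                    row[key] = row.get(key, 0) + 1
        return features

# Step 1: instantiate the aggregator with the windows/statistics we want.
feature_aggregator = ToyEventsAggregator(time_windows=[7, 30], aggregations=["count"])

# Step 2: apply it to the input dataset, cut off at the reference date.
events = [
    {"user": "alice", "day": date(2018, 8, 30)},
    {"user": "alice", "day": date(2018, 8, 5)},
]
print(feature_aggregator.transform(events, date(2018, 9, 1)))
# {'alice': {'count_last_7d': 1, 'count_last_30d': 2}}
```

The design to note is the separation between configuring the aggregator once and applying it to data: the combinatorial explosion of windows, activity types, and statistics lives in the configuration, not in hand-written queries.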
Under the hood, feature_aggregator generates a set of SQL queries corresponding to our criteria.
For example, you can see below one of the generated queries for the Postgres database, where it considers only the 6 most recent months of history:
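The generated SQL is not reproduced in the source; a hedged sketch of what a time-filtered query of this kind might look like, with assumed table and column names:

```sql
-- Hypothetical sketch of a generated, time-filtered query (the actual
-- generated SQL is not shown in the source); names are assumptions.
SELECT
    user_id,
    COUNT(*) AS n_events_6m,
    DATE '2018-09-01' - MAX(event_date) AS recency_days_6m
FROM events
WHERE event_date >= DATE '2018-09-01' - INTERVAL '6 months'
  AND event_date < DATE '2018-09-01'
GROUP BY user_id;
```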