Mastering The New Generation of Gradient Boosting
Gradient Boosted Decision Trees and Random Forest are my favorite ML models for heterogeneous tabular datasets. These models are the top performers in Kaggle competitions and are in widespread use in the industry.
Catboost, the new kid on the block, has been around for a little more than a year now, and it is already threatening XGBoost.
Why Catboost?
Better Results
Catboost achieves the best results on the benchmark, and that’s great, yet I don’t know if I would replace a working production model for a fraction of a log-loss improvement alone (especially when the company that conducted the benchmark has a clear interest in favor of Catboost). Though, when you look at datasets where categorical features play a large role, the improvement becomes much more significant.
Faster Predictions
While training can take longer than other GBDT implementations, prediction time is 13–16 times faster than the other libraries, according to the Yandex benchmark.
Batteries Included
Catboost’s default parameters are a better starting point than those of other GBDT algorithms, and this is good news for beginners who want a plug and play model to start experimenting with tree ensembles or Kaggle competitions. Yet, there are some very important parameters which we must address, and we’ll talk about those in a moment.
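To see what “plug and play” looks like in practice, here is a minimal sketch that trains a CatBoostClassifier with its default parameters. The synthetic dataset and split are placeholders for illustration, not part of any benchmark.

```python
# Minimal sketch: CatBoostClassifier with default parameters.
# The synthetic data below is only a placeholder -- swap in your own dataset.
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

model = CatBoostClassifier(verbose=100)   # defaults for iterations, depth, learning rate
model.fit(X_train, y_train, eval_set=(X_val, y_val))
print(model.score(X_val, y_val))          # accuracy on the held-out split
```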
Some more noteworthy advancements in Catboost are feature interactions, object importance and snapshot support.
In addition to classification and regression, Catboost supports ranking out of the box.
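As a hedged sketch of the built-in ranking support, the snippet below trains a pairwise ranking model with the YetiRank loss on query-grouped data. The features, relevance labels and query ids are random placeholders rather than a real dataset.

```python
# Sketch of CatBoost ranking: documents grouped by query id, YetiRank loss.
import numpy as np
from catboost import CatBoost, Pool

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                      # 100 documents, 5 features
y = rng.integers(0, 5, size=100)                   # relevance labels per document
group_id = np.sort(rng.integers(0, 10, size=100))  # query ids, kept contiguous

train_pool = Pool(data=X, label=y, group_id=group_id)
ranker = CatBoost({'loss_function': 'YetiRank', 'iterations': 200, 'verbose': 50})
ranker.fit(train_pool)
scores = ranker.predict(train_pool)                # per-document ranking scores
```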
Battle Tested
Yandex relies heavily on Catboost for ranking, forecasting and recommendations. These models serve more than 70 million users each month.
CatBoost is an algorithm for gradient boosting on decision trees. Developed by Yandex researchers and engineers, it is the successor of the MatrixNet algorithm that is widely used within the company for ranking tasks, forecasting and making recommendations. It is universal and can be applied across a wide range of areas and to a variety of problems.
The Algorithm
Classic Gradient Boosting
Catboost Secret Sauce
Catboost introduces two critical algorithmic advances: the implementation of ordered boosting, a permutation-driven alternative to the classic algorithm, and an innovative algorithm for processing categorical features. Both techniques use random permutations of the training examples to fight the prediction shift caused by a special kind of target leakage present in all existing implementations of gradient boosting algorithms.
Categorical Feature Handling
Ordered Target Statistic
Most GBDT algorithms and Kaggle competitors are already familiar with the use of Target Statistics (or target mean encoding). It’s a simple yet effective approach in which we encode each categorical feature with an estimate of the expected target y conditioned on the category. Well, it turns out that applying this encoding carelessly (averaging y over the training examples with the same category) results in target leakage.
To fight this prediction shift, CatBoost uses a more effective strategy. It relies on the ordering principle and is inspired by online learning algorithms, which receive training examples sequentially in time. In this setting, the value of the TS for each example relies only on the observed history. To adapt this idea to a standard offline setting, Catboost introduces an artificial “time”: a random permutation σ1 of the training examples. Then, for each example, it uses all the available “history” to compute its Target Statistic. Note that using only one random permutation results in earlier examples having Target Statistics with much higher variance than later ones. To this end, CatBoost uses different permutations for different steps of gradient boosting.
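The snippet below is a simplified illustration of the ordering principle, not CatBoost’s exact internal CTR formula: each example is encoded using only the targets of the examples that precede it in a random permutation, smoothed towards a prior.

```python
# Simplified ordered Target Statistic: encode each example using only its "history".
import numpy as np

def ordered_target_statistic(categories, targets, prior=0.5, a=1.0, seed=0):
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(categories))         # the artificial "time" sigma_1
    sums, counts = {}, {}
    encoded = np.empty(len(categories), dtype=float)
    for idx in perm:                                # walk the examples in permutation order
        c = categories[idx]
        s, n = sums.get(c, 0.0), counts.get(c, 0)
        encoded[idx] = (s + a * prior) / (n + a)    # uses only preceding examples of this category
        sums[c] = s + targets[idx]                  # the current target is added only afterwards
        counts[c] = n + 1
    return encoded

cats = np.array(['a', 'b', 'a', 'a', 'b', 'c'])
y = np.array([1, 0, 1, 0, 1, 1])
print(ordered_target_statistic(cats, y))
```

Because the first examples in the permutation have almost no history, their encodings stay close to the prior and are much noisier, which is exactly why CatBoost averages over several permutations.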
One Hot Encoding
Catboost uses a one-hot encoding for all the features with at most one_hot_max_size unique values. The default value is 2.
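The threshold is exposed as a regular parameter; a hedged sketch (the column names are hypothetical):

```python
# Features with at most one_hot_max_size unique values are one-hot encoded;
# the rest fall back to the ordered Target Statistic described above.
from catboost import CatBoostClassifier

model = CatBoostClassifier(one_hot_max_size=10, verbose=False)
# model.fit(X_train, y_train, cat_features=['city', 'device_type'])  # hypothetical columns
```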
Ordered Boosting
CatBoost has two modes for choosing the tree structure, Ordered and Plain. Plain mode corresponds to a combination of the standard GBDT algorithm with an ordered Target Statistic. In Ordered mode, we perform a random permutation of the training examples, σ2, and maintain n different supporting models, M1, . . . , Mn, such that the model Mi is trained using only the first i samples in the permutation. At each step, in order to obtain the residual for the j-th sample, we use the model Mj−1. Unfortunately, this algorithm is not feasible in most practical tasks due to the need to maintain n different models, which increases the complexity and memory requirements by a factor of n. Catboost implements a modification of this algorithm on the basis of the gradient boosting algorithm, using one tree structure shared by all the models to be built.
In order to avoid prediction shift, Catboost uses permutations such that σ1 = σ2. This guarantees that the target yi is used neither for the Target Statistic calculation nor for the gradient estimation when training Mi.
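In the Python package, the two modes are selected through the boosting_type parameter; a minimal sketch:

```python
# Ordered mode implements the permutation-driven scheme above;
# Plain mode is standard GBDT with ordered Target Statistics.
from catboost import CatBoostClassifier

ordered_model = CatBoostClassifier(boosting_type='Ordered', verbose=False)
plain_model = CatBoostClassifier(boosting_type='Plain', verbose=False)
```

Ordered mode is typically slower and more memory hungry, but it is the one that removes the prediction shift, which matters most on small and noisy datasets.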
Hands On
For this section, we’ll use the Amazon Dataset, since it’s clean and has a strong emphasis on categorical features.
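A sketch of getting started, assuming the copy of the dataset shipped with the catboost package (catboost.datasets.amazon) is available; otherwise, download train.csv from the Kaggle Amazon Employee Access challenge and adjust the loading step.

```python
# Load the Amazon Employee Access data and declare every feature as categorical.
from catboost import CatBoostClassifier, Pool
from catboost.datasets import amazon   # assumption: bundled copy of the Kaggle dataset

train_df, _ = amazon()
y = train_df['ACTION']                  # binary target: access granted or not
X = train_df.drop(columns=['ACTION'])

cat_features = list(range(X.shape[1]))  # every column is a categorical id
train_pool = Pool(data=X, label=y, cat_features=cat_features)

model = CatBoostClassifier(iterations=500, eval_metric='AUC', verbose=100)
model.fit(train_pool)
```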