
Getting Better at Machine Learning

Models that are an integrated part of a product experience, or what we referred to as data products, often involve feedback loops. When done right, feedback loops can help us create better experiences. However, feedback loops can also create unintended negative consequences, such as bias or inaccurate measurements of model performance.

User Feedback Can Make Your Model Better

One of the most unexpected skills I have picked up doing real-life machine learning is the ability to spot opportunities for users to provide model feedback through product interactions. These decisions might seem relevant only to UI/UX at first, but they can have a profound impact on the quality of the features that the data product offers.

Image source: Xavier Amatriain’s post “10 more lessons learned from building real-life ML systems”

For example, Netflix decided last year to move away from the star-rating system to a thumbs up/down system, reportedly because its simplicity prompts more users to provide feedback, which in turn helps Netflix make its recommendations better. Similarly, Facebook, Twitter, Quora, and other social networks have long designed features such as likes, retweets, and comments which not only make the product more interactive, but also allow these companies to monetize better via personalization.

Creating feedback opportunities in the product, instrumenting and capturing that feedback, and integrating it back into model development are important both for improving the user experience and for optimizing the company’s business objectives and bottom line.

Feedback Loops Can Also Bias Model Performance

While feedback loops can be powerful, they can also have unintended, negative consequences. One well-known problem is that a biased model will amplify its own bias through the feedback loop. Other times, a feedback loop can affect our ability to measure model performance accurately.

This latter phenomenon is best illustrated by Michael Manapat, who explains this bias based on his experience building fraud models at Stripe. In his example, he pointed out that when a live fraud model enforces a certain policy (e.g. blocking a transaction if its fraud score is above a certain threshold), the system never gets to observe the ground truth for those blocked transactions, whether they were fraudulent or not. This blind spot can affect our ability to measure the effectiveness of a model running live in production.

Source: Michael Manapat’s “Counterfactual evaluation of machine learning models” from PyData

Why? Once the obviously fraudulent transactions are blocked, the transactions whose ground truth we can still observe are dominated by false negatives, the harder cases that the model failed to catch. When we re-train and evaluate our models on these “harder” examples, the measured performance will necessarily look worse than the model’s true performance in production.
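To make the blind spot concrete, here is a minimal sketch with entirely made-up numbers (not Stripe’s actual data or system). The key point: ground truth is only ever observable for transactions the model allowed through, so any labeled dataset built from production traffic over-represents the fraud the model missed.

```python
# Toy, illustrative records: (fraud_score, was_blocked, is_fraud).
# is_fraud is only observable when was_blocked is False; blocked
# transactions never reveal their ground truth, so we mark them None.
labeled = [
    (0.95, True, None),   # blocked: ground truth unknown
    (0.90, True, None),   # blocked: ground truth unknown
    (0.60, False, True),  # missed fraud: a "hard" false negative
    (0.40, False, False),
    (0.10, False, False),
]

# The only fraud we can ever label is fraud the model failed to catch,
# so a dataset built from observed labels over-represents hard cases.
observed = [(score, y) for score, blocked, y in labeled if not blocked]
observed_fraud_scores = [score for score, y in observed if y]

assert observed_fraud_scores == [0.60]  # only the missed, low-scoring fraud
```

A model re-trained or evaluated on `observed` sees none of the easy, high-scoring fraud and all of the hard misses, which is exactly the measurement bias described above.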

Michael’s solution to this bias is to inject randomness into production traffic in order to observe the counterfactuals. Specifically, for transactions that are deemed fraudulent, we let a small percentage pass, regardless of their scores, so we can observe the ground truth. Using these additional labels, we can then re-adjust the calculation of model performance. This approach is simple but not entirely obvious. In fact, it took me a long while to spot the same feedback loop in my own model, and it was not until I encountered Michael’s talk that I found a solution.
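The idea above can be sketched in a few lines. This is a hedged, minimal illustration of the randomized pass-through plus inverse-propensity weighting, with illustrative constants and function names of my own choosing, not Stripe’s actual implementation:

```python
import random

BLOCK_THRESHOLD = 0.8   # block when the fraud score exceeds this
EXPLORE_RATE = 0.05     # probability of letting a "blocked" transaction pass

def decide(score, rng=random.random):
    """Return (action, weight), where weight is 1 / P(being allowed).

    Transactions above the threshold are occasionally let through at
    random so their ground truth becomes observable.
    """
    if score <= BLOCK_THRESHOLD:
        return "allow", 1.0                  # always allowed: propensity 1
    if rng() < EXPLORE_RATE:
        return "allow", 1.0 / EXPLORE_RATE   # rare exploratory pass-through
    return "block", 0.0                      # outcome never observed

def estimated_fraud_rate_above_threshold(observations):
    """observations: (weight, is_fraud) pairs for allowed transactions
    that scored above the threshold. Weighting each observed outcome by
    the inverse of its pass-through probability recovers an unbiased
    estimate of the fraud rate in the traffic the model blocks."""
    total_weight = sum(w for w, _ in observations)
    fraud_weight = sum(w for w, fraud in observations if fraud)
    return fraud_weight / total_weight if total_weight else 0.0
```

For example, if two exploratory pass-throughs are observed and one turns out to be fraud, `estimated_fraud_rate_above_threshold([(20.0, True), (20.0, False)])` estimates a 50% fraud rate among blocked traffic, without having to unblock it all.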

Takeaway: Feedback loops in machine learning models are subtle. Knowing how to leverage them can help you build a better user experience, and being aware of them helps you measure the performance of your live system more accurately.

Conclusion

Source: From the paper “Hidden Technical Debt in Machine Learning Systems” by D. Sculley et al.

Throughout this post, I gave concrete examples around topics such as problem definition, feature engineering, model debugging, productionization, and dealing with feedback loops. The main underlying theme here is that building a machine learning system involves far more nuance than just fitting a model on a laptop. While the materials I have covered here are only a subset of the topics one would encounter in practice, I hope they have been informative in helping you move beyond “Laptop Data Science”.

Happy Machine Learning!