Linear Regression in the Wild
In one of my job interviews for a data scientist position, I was given a home assignment I'd like to share with you.

The interviewer sent me a CSV file containing samples of measured quantities x and y, where y is a response variable which can be written as an explicit function of x. It is known that the technique used for measuring x is twice as good as the one used for measuring y in the sense of standard deviation; that is, the measurement noise in x has half the standard deviation of the noise in y.

I start with the imports I'll need. The data clearly looks like a linear regression case. First I'll manually remove the outliers, then I'll use LinearRegression to fit the best line.

If you're not familiar with the linear regression assumptions, you can read about them in the article Going Deeper into Regression Analysis with Assumptions, Plots & Solutions.
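The steps described above (imports, manual outlier removal, fitting a line with LinearRegression) might look roughly like the sketch below. The original CSV isn't reproduced here, so the data is a synthetic stand-in: the slope, intercept, noise level, planted outliers, and the cutoff of 30 are all assumptions made for illustration, not values from the assignment. For simplicity, noise is added only to y.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the interviewer's CSV (all values here are hypothetical).
rng = np.random.default_rng(0)
true_slope, true_intercept = 2.0, 1.0
x = rng.uniform(0.0, 10.0, size=200)
y = true_slope * x + true_intercept + rng.normal(0.0, 0.5, size=200)

# Plant a few gross outliers so there is something to remove.
y[:5] += 50.0

# "Manual" outlier removal: a hand-picked cutoff chosen by looking at the data.
mask = y < 30.0
x_clean, y_clean = x[mask], y[mask]

# LinearRegression expects a 2-D feature matrix, hence the reshape.
model = LinearRegression().fit(x_clean.reshape(-1, 1), y_clean)
slope, intercept = model.coef_[0], model.intercept_

print(f"kept {mask.sum()} of {len(x)} samples")
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

A hard-coded cutoff only works because the planted outliers sit far from the rest of the points; on real data the threshold would be picked after plotting it.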