Challenge: Machine Learning Basics

1:How Challenges Work

At Dataquest, we're huge believers in learning through doing, and we hope this shows in the learning experience of the missions. While missions focus on introducing concepts, challenges allow you to perform deliberate practice by completing structured problems. You can read more about deliberate practice here and here. Challenges will feel similar to missions but with little instructional material and a larger focus on exercises.

For these challenges, we strongly encourage programming on your own computer so you practice using these tools outside the Dataquest environment. You can also use the Dataquest interface to write and quickly run code to see if you’re on the right track. By default, clicking the check code button runs your code and performs answer checking. You can toggle this behavior so that your code is run and the results are returned, without performing any answer checking. Executing your code without performing answer checking is much quicker and allows you to iterate on your work. When you’re done and ready to check your answer, toggle the behavior so that answer checking is enabled.

If you have questions or run into issues, head over to the Dataquest forums or our Slack community.

2:Data Cleaning

In this challenge, you'll build on the exploration from the last mission, where we tried to answer the question:

  • How do the properties of a car impact its fuel efficiency?

We focused the last mission on capturing how the weight of a car affects its fuel efficiency by fitting a linear regression model. In this challenge, you'll explore how the horsepower of a car affects its fuel efficiency and practice using scikit-learn to fit the linear regression model.

Unlike the weight column, the horsepower column has some missing values. These values are represented using the ? character. Let's filter out these rows so we can fit the model. We've already read auto-mpg.data into a Dataframe named cars.

Instructions

  • Remove all rows where the value for horsepower is ? and convert the horsepower column to a float.
  • Assign the new Dataframe to filtered_cars.

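import pandas as pd
columns = ["mpg", "cylinders", "displacement", "horsepower", "weight", "acceleration", "model year", "origin", "car name"]
# The file is whitespace-delimited and has no header row.
cars = pd.read_table("auto-mpg.data", sep=r"\s+", names=columns)
# Keep only rows with a known horsepower; copy so the assignment below
# modifies an independent Dataframe instead of a view of cars.
filtered_cars = cars[cars["horsepower"] != "?"].copy()
filtered_cars["horsepower"] = filtered_cars["horsepower"].astype("float")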
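
As a side note, the same cleanup can be sketched with pd.to_numeric, which converts unparseable entries such as ? into NaN so they can be dropped afterwards. This is only an alternative illustration, not the challenge's expected answer; the sample Dataframe and its values below are made up and are not taken from auto-mpg.data.

```python
import pandas as pd

# Hypothetical miniature of the cars table; values are invented for illustration.
sample = pd.DataFrame({
    "mpg": [18.0, 25.0, 21.0, 30.0],
    "horsepower": ["130.0", "?", "95.0", "?"],
})

# errors="coerce" turns anything that is not a number (here "?") into NaN.
cleaned = sample.copy()
cleaned["horsepower"] = pd.to_numeric(cleaned["horsepower"], errors="coerce")

# Drop the rows whose horsepower could not be parsed.
cleaned = cleaned.dropna(subset=["horsepower"])

print(cleaned)
```

Compared with filtering on != "?", this approach also catches other non-numeric placeholders, at the cost of silently discarding them.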

3:Data Exploration

Now that the horsepower values are cleaned, generate a scatter plot that visualizes the relation between the horsepower values and the mpg values. Let's compare this to the scatter plot that visualizes weight against mpg.

Instructions

  • Use the Dataframe plot method to generate 2 scatter plots, in vertical order:
    • On the top plot, generate a scatter plot with the horsepower column on the x-axis and the mpg column on the y-axis.
    • On the bottom plot, generate a scatter plot with the weight column on the x-axis and the mpg column on the y-axis.

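import matplotlib.pyplot as plt
%matplotlib inline
# Two stacked axes: horsepower vs. mpg on top, weight vs. mpg on the bottom.
fig, (ax_top, ax_bottom) = plt.subplots(2, 1, figsize=(6, 8))
filtered_cars.plot("horsepower", "mpg", kind="scatter", ax=ax_top)
filtered_cars.plot("weight", "mpg", kind="scatter", ax=ax_bottom)
plt.show()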

4:Fitting A Model

While it's hard to directly compare the plots since the scales for the x-axes are very different, there does seem to be some relation between a car's horsepower and its fuel efficiency. Let's fit a linear regression model using the horsepower values to get a quantitative understanding of the relationship.

Instructions

  • Create a new instance of the LinearRegression model and assign it to lr.
  • Use the fit method to fit a linear regression model using the horsepower column as the input.
  • Use the model to make predictions on the same data the model was trained on (the horsepower column from filtered_cars) and assign the resulting predictions to predictions.
  • Display the first 5 values in predictions and the first 5 values in the mpg column from filtered_cars.

from sklearn.linear_model import LinearRegression
lr = LinearRegression()
# scikit-learn expects a 2-D input, hence the double brackets around "horsepower".
lr.fit(filtered_cars[["horsepower"]], filtered_cars["mpg"])
predictions = lr.predict(filtered_cars[["horsepower"]])
print(predictions[0:5])
print(filtered_cars["mpg"][0:5].values)

Output

[ 19.41604569 13.89148002 16.25915102 16.25915102 17.83759835]

[ 18. 15. 18. 16. 17.]
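
To make the fitted line more concrete, here is a small sketch, separate from the challenge solution, that fits the same kind of model on made-up data and reads back the slope and intercept through the coef_ and intercept_ attributes of LinearRegression. The toy Dataframe and the relationship mpg = 40 - 0.15 * horsepower are assumptions chosen for illustration; they are not estimates from auto-mpg.data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up data that follows mpg = 40 - 0.15 * horsepower exactly (an assumption).
toy = pd.DataFrame({"horsepower": [60.0, 90.0, 120.0, 150.0, 180.0]})
toy["mpg"] = 40 - 0.15 * toy["horsepower"]

toy_lr = LinearRegression()
toy_lr.fit(toy[["horsepower"]], toy["mpg"])

# For a single input column, coef_ holds one slope value.
slope = toy_lr.coef_[0]
intercept = toy_lr.intercept_
print(slope, intercept)

# A prediction is just intercept + slope * horsepower.
manual = intercept + slope * toy["horsepower"].values
fitted = toy_lr.predict(toy[["horsepower"]])
print(np.allclose(manual, fitted))
```

Applying the same two attributes to lr from the code above would show how many mpg the model expects a car to lose per additional unit of horsepower.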

5:Plotting The Predictions

In the last mission, we plotted the predicted values and the actual values on the same plot to visually understand the model's effectiveness. Let's repeat that here for the predictions from the horsepower model.

Instructions

  • Generate 2 scatter plots on the same chart (Matplotlib axes instance):
    • One containing the horsepower values on the x-axis against the predicted fuel efficiency values on the y-axis. Use blue for the color of the dots.
    • One containing the horsepower values on the x-axis against the actual fuel efficiency values on the y-axis. Use red for the color of the dots.

import matplotlib.pyplot as plt
%matplotlib inline

plt.scatter(filtered_cars["horsepower"],predictions,c="blue")
plt.scatter(filtered_cars["horsepower"],filtered_cars["mpg"],c="red")
plt.show()

6:Error Metrics

To evaluate how well the model fits the data, you can compute the MSE and RMSE values for the model. Then, you can compare the MSE and RMSE values with those from the model you fit in the last mission. Recall that the model you fit in the previous mission captured the relationship between the weight of a car (weight column) and its fuel efficiency (mpg column).

Instructions

  • Calculate the MSE of the predicted values and assign to mse.
  • Calculate the RMSE of the predicted values and assign to rmse.

from sklearn.metrics import mean_squared_error

mse = mean_squared_error(filtered_cars["mpg"], predictions)
print(mse)
rmse = mse ** 0.5
print(rmse)
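
For intuition about what these two numbers measure, the sketch below computes them by hand with NumPy: MSE is the mean of the squared differences between actual and predicted values, and RMSE is its square root. It uses only the five predicted and actual values printed in the Output of the model-fitting step, so the result describes those five rows and is not the error of the full model. The variable names are chosen for illustration.

```python
import numpy as np

# First five predictions and actual mpg values, copied from the Output above.
predicted = np.array([19.41604569, 13.89148002, 16.25915102, 16.25915102, 17.83759835])
actual = np.array([18.0, 15.0, 18.0, 16.0, 17.0])

# MSE: average of the squared prediction errors.
errors = actual - predicted
mse_manual = np.mean(errors ** 2)

# RMSE: square root of the MSE, expressed in the same units as mpg.
rmse_manual = np.sqrt(mse_manual)

print(mse_manual, rmse_manual)
```

Because RMSE is in the same units as the target, it can be read as a typical prediction error in miles per gallon.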

7:Next Steps

The MSE for the model from the last mission was 18.78 while the RMSE was 4.33. Here's a table comparing the approximate measures for both models:

        Weight   Horsepower
MSE     18.78    23.94
RMSE    4.33     4.89

If we could use only one input to our model, we should use the weight values to predict fuel efficiency, because that model has the lower MSE and RMSE values. There's still a lot more to do, however, before we can build a reliable, working model to predict fuel efficiency. In later missions, we'll learn how to use multiple features to build a more reliable predictive model.

Reposted from: https://my.oschina.net/Bettyty/blog/751301