Challenge: Machine Learning Basics

By 阿新 • Published: 2020-10-22


1:How Challenges Work

At Dataquest, we're huge believers in learning through doing, and we hope this shows in the learning experience of the missions. While missions focus on introducing concepts, challenges allow you to perform deliberate practice by completing structured problems. You can read more about deliberate practice here and here. Challenges will feel similar to missions but with little instructional material and a larger focus on exercises.

For these challenges, we strongly encourage programming on your own computer so you practice using these tools outside the Dataquest environment. You can also use the Dataquest interface to write and quickly run code to see if you’re on the right track. By default, clicking the check code button runs your code and performs answer checking. You can toggle this behavior so that your code is run and the results are returned without any answer checking. Executing your code without answer checking is much quicker and allows you to iterate on your work. When you’re done and ready to check your answer, toggle the behavior so that answer checking is enabled.

If you have questions or run into issues, head over to the Dataquest forums or our Slack community.

2:Data Cleaning

In this challenge, you'll build on the exploration from the last mission, where we tried to answer the question:

How do the properties of a car impact its fuel efficiency?

We focused the last mission on capturing how the weight of a car affects its fuel efficiency by fitting a linear regression model. In this challenge, you'll explore how the horsepower of a car affects its fuel efficiency and practice using scikit-learn to fit the linear regression model.

Unlike the weight column, the horsepower column has some missing values. These values are represented using the ? character. Let's filter out these rows so we can fit the model. We've already read auto-mpg.data into a DataFrame named cars.

Instructions

Remove all rows where the value for horsepower is ? and convert the horsepower column to a float.
Assign the new DataFrame to filtered_cars.

import pandas as pd
columns = ["mpg", "cylinders", "displacement", "horsepower", "weight", "acceleration", "model year", "origin", "car name"]
# The file is whitespace-delimited and has no header row.
cars = pd.read_table("auto-mpg.data", sep=r"\s+", names=columns)
# Keep only rows with a known horsepower; copy to avoid a SettingWithCopyWarning.
filtered_cars = cars[cars["horsepower"] != "?"].copy()
filtered_cars["horsepower"] = filtered_cars["horsepower"].astype("float")
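As an alternative to comparing against the ? string directly, the same cleaning step can be sketched with pd.to_numeric, which turns any value it cannot parse into NaN. The sample rows and the helper name clean_horsepower below are made up for illustration; they are not part of auto-mpg.data or of the Dataquest solution.

```python
import pandas as pd


def clean_horsepower(df):
    """Return a copy of df without rows whose horsepower is not numeric.

    Hypothetical helper, for illustration only.
    """
    cleaned = df.copy()
    # errors="coerce" converts anything unparseable (such as "?") to NaN.
    cleaned["horsepower"] = pd.to_numeric(cleaned["horsepower"], errors="coerce")
    # Drop the rows whose horsepower could not be parsed.
    return cleaned.dropna(subset=["horsepower"])


# Made-up rows that mimic the "?" placeholder used in auto-mpg.data.
sample = pd.DataFrame({
    "mpg": [18.0, 25.0, 15.0],
    "horsepower": ["130.0", "?", "165.0"],
})
cleaned_sample = clean_horsepower(sample)
print(cleaned_sample)
```

This variant also drops rows with placeholders other than ?, which may or may not be what you want for a given dataset.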

3:Data Exploration

Now that the horsepower values are cleaned, generate a scatter plot that visualizes the relation between the horsepower values and the mpg values. Let's compare this to the scatter plot that visualizes weight against mpg.

Instructions

Use the DataFrame plot method to generate 2 scatter plots, in vertical order:
- On the top plot, generate a scatter plot with the horsepower column on the x-axis and the mpg column on the y-axis.
- On the bottom plot, generate a scatter plot with the weight column on the x-axis and the mpg column on the y-axis.

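import matplotlib.pyplot as plt
# Jupyter notebook magic; omit it when running as a plain script.
%matplotlib inline
# Two vertically stacked axes: horsepower on top, weight on the bottom.
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(6, 8))
filtered_cars.plot("horsepower", "mpg", kind="scatter", ax=ax1)
filtered_cars.plot("weight", "mpg", kind="scatter", ax=ax2)
plt.show()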

4:Fitting A Model

While it's hard to directly compare the plots since the scales for the x-axes are very different, there does seem to be some relation between a car's horsepower and its fuel efficiency. Let's fit a linear regression model using the horsepower values to get a quantitative understanding of the relationship.

Instructions

Create a new instance of the LinearRegression model and assign it to lr.
Use the fit method to fit a linear regression model using the horsepower column as the input.
Use the model to make predictions on the same data the model was trained on (the horsepower column from filtered_cars) and assign the resulting predictions to predictions.
Display the first 5 values in predictions and the first 5 values in the mpg column from filtered_cars.

from sklearn.linear_model import LinearRegression
lr = LinearRegression()
# scikit-learn expects a 2D feature matrix, hence the double brackets.
lr.fit(filtered_cars[["horsepower"]], filtered_cars["mpg"])
# Predict on the same rows the model was trained on.
predictions = lr.predict(filtered_cars[["horsepower"]])
print(predictions[0:5])
print(filtered_cars["mpg"][0:5].values)

Output

[ 19.41604569 13.89148002 16.25915102 16.25915102 17.83759835]

[ 18. 15. 18. 16. 17.]
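A fitted single-feature linear regression is just a straight line, so each prediction equals the intercept plus the slope times the horsepower value. The sketch below illustrates this on made-up data that lies exactly on the line mpg = 40 - 0.15 * horsepower; the numbers and the names toy and toy_lr are assumptions for illustration, not values from the cars dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up data on an exact line: mpg = 40 - 0.15 * horsepower.
toy = pd.DataFrame({"horsepower": [50.0, 100.0, 150.0, 200.0]})
toy["mpg"] = 40.0 - 0.15 * toy["horsepower"]

toy_lr = LinearRegression()
toy_lr.fit(toy[["horsepower"]], toy["mpg"])

# coef_ holds one slope per input column; intercept_ is the constant term.
slope = toy_lr.coef_[0]
intercept = toy_lr.intercept_
print(slope, intercept)

# Rebuilding the predictions by hand gives the same values as predict().
manual = intercept + slope * toy["horsepower"].values
print(np.allclose(manual, toy_lr.predict(toy[["horsepower"]])))
```

Inspecting lr.coef_ and lr.intercept_ on the model fit above works the same way and tells you how many mpg the model associates with each additional unit of horsepower.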

5:Plotting The Predictions

In the last mission, we plotted the predicted values and the actual values on the same plot to visually understand the model's effectiveness. Let's repeat that here for the predictions as well.

Instructions

Generate 2 scatter plots on the same chart (Matplotlib axes instance):
- One containing the horsepower values on the x-axis against the predicted fuel efficiency values on the y-axis. Use blue for the color of the dots.
- One containing the horsepower values on the x-axis against the actual fuel efficiency values on the y-axis. Use red for the color of the dots.

import matplotlib.pyplot as plt
%matplotlib inline

plt.scatter(filtered_cars["horsepower"],predictions,c="blue")
plt.scatter(filtered_cars["horsepower"],filtered_cars["mpg"],c="red")
plt.show()
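When both series share one axes, axis labels and a legend make it easier to tell predicted from actual values. Here is a minimal sketch of that idea, assuming made-up horsepower and mpg values and an arbitrary made-up line for the predictions; none of these numbers come from the cars dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Made-up values standing in for filtered_cars and the model's predictions.
hp = np.array([90.0, 130.0, 150.0, 165.0])
actual_mpg = np.array([24.0, 18.0, 16.0, 15.0])
predicted_mpg = 40.0 - 0.16 * hp  # arbitrary illustrative line

fig, ax = plt.subplots()
# Both scatter calls draw on the same axes, so the points overlap in one chart.
ax.scatter(hp, predicted_mpg, c="blue", label="predicted")
ax.scatter(hp, actual_mpg, c="red", label="actual")
ax.set_xlabel("horsepower")
ax.set_ylabel("mpg")
ax.legend()
```

In a notebook, the figure renders at the end of the cell; in a script, call plt.show() with an interactive backend or fig.savefig to write the chart to a file.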

6:Error Metrics

To evaluate how well the model fits the data, you can compute the MSE and RMSE values for the model. Then, you can compare the MSE and RMSE values with those from the model you fit in the last mission. Recall that the model you fit in the previous mission captured the relationship between the weight of a car (weight column) and its fuel efficiency (mpg column).

Instructions

Calculate the MSE of the predicted values and assign it to mse.
Calculate the RMSE of the predicted values and assign it to rmse.

from sklearn.metrics import mean_squared_error

mse = mean_squared_error(filtered_cars["mpg"], predictions)
print(mse)
rmse = mse ** 0.5
print(rmse)
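MSE is the mean of the squared differences between actual and predicted values, and RMSE is its square root, which brings the error back to the units of the target (here, mpg). The sketch below computes both by hand with NumPy and checks the result against mean_squared_error; the three data points are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up actual and predicted values.
actual = np.array([3.0, 5.0, 7.0])
predicted = np.array([2.0, 5.0, 9.0])

# Errors are 1, 0, -2, so the squared errors are 1, 0, 4.
manual_mse = np.mean((actual - predicted) ** 2)
manual_rmse = np.sqrt(manual_mse)

# scikit-learn's helper should agree with the manual calculation.
sk_mse = mean_squared_error(actual, predicted)
print(manual_mse, manual_rmse, sk_mse)
```

Because the errors are squared before averaging, a single large miss raises the MSE more than several small ones.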

7:Next Steps

The MSE for the model from the last mission was 18.78, while the RMSE was 4.33. Here's a table comparing the approximate measures for both models:

Metric    Weight    Horsepower
MSE       18.78     23.94
RMSE      4.33      4.89

If we could only use one input to our model, we should use the weight values to predict the fuel efficiency values, because that model has the lower MSE and RMSE values. However, there is a lot more to do before we can build a reliable, working model to predict fuel efficiency. In later missions, we'll learn how to use multiple features to build a more reliable predictive model.
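The comparison above can be sketched as a small loop that fits one single-feature model per candidate column and records its RMSE. The helper single_feature_rmse, the column names strong and weak, and the synthetic data are all assumptions made for illustration; as in this challenge, the error is measured on the same rows used for fitting.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error


def single_feature_rmse(df, feature, target="mpg"):
    """Fit a one-feature linear regression and return its training RMSE.

    Hypothetical helper, for illustration only.
    """
    model = LinearRegression()
    model.fit(df[[feature]], df[target])
    predictions = model.predict(df[[feature]])
    return mean_squared_error(df[target], predictions) ** 0.5


# Synthetic data: "strong" drives the target, "weak" is unrelated noise.
rng = np.random.default_rng(0)
synthetic = pd.DataFrame({
    "strong": rng.uniform(0, 10, 200),
    "weak": rng.uniform(0, 10, 200),
})
synthetic["mpg"] = 30 - 2 * synthetic["strong"] + rng.normal(0, 1, 200)

scores = {name: single_feature_rmse(synthetic, name) for name in ["strong", "weak"]}
print(scores)
```

With the real data, the same loop could be run over ["weight", "horsepower"] on filtered_cars to produce the RMSE row of the comparison table.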

Reposted from: https://my.oschina.net/Bettyty/blog/751301
