AI-IBM-cognitive class --Linear Regression

Linear Regression

import matplotlib.pyplot as plt
import pandas as pd
import pylab as pl
import numpy as np
%matplotlib inline

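What %matplotlib does

  1. %matplotlib is mainly needed when working in Jupyter Notebook or Jupyter QtConsole.
  2. With %matplotlib inline, calling a matplotlib.pyplot plotting function such as plot(), or creating a figure, renders the image directly in the notebook or console output.

When running the code in Spyder or PyCharm, this line can be commented out.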

Download the dataset

!wget -O FuelConsumption.csv https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/FuelConsumptionCo2.csv
df = pd.read_csv("FuelConsumption.csv")  # read the file under the name given to wget -O

# take a look at the dataset, show top 10 lines.
df.head(10)


# summarize the data
print(df.describe())

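The describe function summarizes each numeric column of the table: count, mean, standard deviation, minimum and maximum, and the 25%, 50%, and 75% quantiles.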
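To make the summary concrete, here is a minimal sketch on a tiny, made-up table. The name toy and its values are assumptions for illustration only; the column names simply mirror two columns of the dataset.

```python
import pandas as pd

# Hypothetical toy data: the values are invented, not taken from the dataset.
toy = pd.DataFrame({
    "ENGINESIZE": [1.0, 2.0, 3.0, 4.0],
    "CO2EMISSIONS": [100, 200, 300, 400],
})

summary = toy.describe()
print(summary)

# Individual statistics can be read by row label and column name.
engine_mean = summary.loc["mean", "ENGINESIZE"]
engine_median = summary.loc["50%", "ENGINESIZE"]
co2_max = summary.loc["max", "CO2EMISSIONS"]
```

The rows of the result are labelled count, mean, std, min, 25%, 50%, 75%, and max, so a single value can be pulled out with .loc as shown.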


Rebuild the table, keeping only the columns we care about.

cdf = df[['ENGINESIZE', 'CYLINDERS', 'FUELCONSUMPTION_COMB', 'CO2EMISSIONS', 'FUELCONSUMPTION_CITY']]
cdf.head(9)

A histogram can be generated for each column.

viz = cdf[['CYLINDERS', 'ENGINESIZE', 'CO2EMISSIONS', 'FUELCONSUMPTION_COMB', 'FUELCONSUMPTION_CITY']]
viz.hist()
plt.show()

Use scatter to draw a scatter plot, setting its parameters such as the color.

For detailed usage, see: https://blog.csdn.net/qiu931110/article/details/68130199

plt.scatter(cdf.FUELCONSUMPTION_COMB, cdf.CO2EMISSIONS, color='blue')
plt.xlabel("FUELCONSUMPTION_COMB")
plt.ylabel("Emission")
plt.show()

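Build a random mask over the rows of the table (each row is selected with probability 0.8), use it to create the training set and the test set, and draw a scatter plot of the training data.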

Creating train and test dataset

Train/test split divides the dataset into a training set and a testing set that are mutually exclusive. You train the model on the training set and evaluate it on the testing set. This gives a more accurate estimate of out-of-sample accuracy, because the testing set is not part of the data used to train the model, and it is closer to how the model would be used on real-world problems.

We know the outcome of every data point in the testing set, which makes it well suited for testing. And since this data has not been used to train the model, the model has no knowledge of the outcome of these data points. In essence, it is truly out-of-sample testing.

msk = np.random.rand(len(df)) < 0.8
train = cdf[msk]
test = cdf[~msk]
print(train)
print(test)
plt.scatter(train.ENGINESIZE, train.CO2EMISSIONS, color='blue')
plt.xlabel("Engine size")
plt.ylabel("Emission")
plt.show()
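Because the mask is random, every run produces a different split. The following is a minimal sketch of a repeatable version, assuming a seeded NumPy generator is acceptable. The names toy, rng, train_toy, and test_toy and the data values are hypothetical stand-ins, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for cdf; the values are made up for illustration.
toy = pd.DataFrame({
    "ENGINESIZE": np.linspace(1.0, 6.0, 100),
    "CO2EMISSIONS": np.linspace(120.0, 380.0, 100),
})

# A seeded generator makes the random mask, and so the split, repeatable.
rng = np.random.default_rng(seed=0)
mask = rng.random(len(toy)) < 0.8  # True for roughly 80% of the rows

train_toy = toy[mask]
test_toy = toy[~mask]

# The two subsets are mutually exclusive and together cover every row.
overlap = train_toy.index.intersection(test_toy.index)
print(len(train_toy), len(test_toy), len(overlap))
```

The split is only approximately 80/20, since each row is assigned independently; the exact counts depend on the seed.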

Modeling: Using sklearn package to model data.

from sklearn import linear_model
regr = linear_model.LinearRegression()
train_x = np.asanyarray(train[['ENGINESIZE']])
train_y = np.asanyarray(train[['CO2EMISSIONS']])
regr.fit(train_x, train_y)
# The coefficients
print('Coefficients: ', regr.coef_)
print('Intercept: ', regr.intercept_)

out:

Coefficients:  [[39.64984954]]
Intercept:  [124.08949291]

In simple linear regression, the coefficient and the intercept are the parameters of the fitted line. Because there are only two parameters, the slope and the intercept of the line, sklearn can estimate them directly from our data. Note that all of the training data must be available to compute the parameters.

plt.scatter(train.ENGINESIZE, train.CO2EMISSIONS, color='blue')
# Draw the fitted regression line from the slope and intercept.
plt.plot(train_x, regr.coef_[0][0]*train_x + regr.intercept_[0], '-r')
plt.xlabel("Engine size")
plt.ylabel("Emission")
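LinearRegression fits by ordinary least squares, and with a single feature the slope and intercept have a simple closed form. The sketch below computes them by hand with NumPy on made-up data and cross-checks the result with np.polyfit. The arrays x and y are hypothetical and chosen to lie exactly on a known line; this is an illustration of the formula, not a description of sklearn's internal solver.

```python
import numpy as np

# Hypothetical toy data lying exactly on the line y = 40 * x + 125.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 40.0 * x + 125.0

# Ordinary least squares for one feature:
#   slope     = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))**2)
#   intercept = mean(y) - slope * mean(x)
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# np.polyfit with degree 1 fits the same line and serves as a cross-check.
poly_slope, poly_intercept = np.polyfit(x, y, 1)
print(slope, intercept)
```

Both sums run over every data point, which is why all of the training data has to be available before the parameters can be computed.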

For more on fitting a linear regression with sklearn.linear_model.LinearRegression, see the following link:

https://www.cnblogs.com/magle/p/5881170.html
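The point of holding out a test set is to measure out-of-sample accuracy. A minimal sketch of that evaluation follows, using plain NumPy. The held-out values, the slope, and the intercept are hypothetical numbers chosen for easy arithmetic; in the notebook they would come from the test set and the fitted regr model.

```python
import numpy as np

# Hypothetical held-out data and line parameters, made up for illustration.
test_x = np.array([1.0, 2.0, 3.0, 4.0])
test_y = np.array([165.0, 200.0, 250.0, 285.0])
slope, intercept = 40.0, 125.0

predicted = slope * test_x + intercept
residuals = test_y - predicted

# Mean absolute error and mean squared error of the predictions.
mae = np.mean(np.abs(residuals))
mse = np.mean(residuals ** 2)

# R^2: one minus the ratio of residual to total sum of squares.
r2 = 1.0 - np.sum(residuals ** 2) / np.sum((test_y - test_y.mean()) ** 2)
print(mae, mse, r2)
```

Lower MAE and MSE and an R^2 closer to 1 indicate that the line predicts the unseen points well.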



