Python-Head First Data Analysis-Linear Regression
阿新 • Published: 2020-08-21
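Before reading this, it helps to read Python-Head First Data Analysis-Summary first. If you later run into problems, for example code that won't run, you can go back and read it then. >-_-<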
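Build a salary predictor
The book's example is a fun one: if you could really build a salary predictor, you could set your own salary. :)
How to do it
Look at the raises actually received by people who asked for a raise in the past, and see whether there is a pattern.
Requested raise \(\mapsto\) received raise: how are the two related? Start with a scatter plot of the employees who negotiated.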
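```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Jupyter-only magic: draw plots inside the notebook (drop this line in a plain .py script)
%matplotlib inline

# Load the chapter 10 employee data, replacing the CSV's header row with our own column names
df = pd.read_csv('./hfda_ch10_employees.csv',
                 names=['staff_num', 'received', 'requested', 'negotiated', 'gender', 'year'],
                 skiprows=1)

# Scatter requested vs. received raise, keeping only rows where 'negotiated' is True
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(1, 1, 1)
ax.scatter(df['requested'][df['negotiated']], df['received'][df['negotiated']])
ax.set_xlabel('requested', fontsize=12)
ax.set_ylabel('received', fontsize=12)
```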
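The expression `df['requested'][df['negotiated']]` relies on boolean-mask indexing: it only works if `negotiated` was read in as a boolean column (pandas' `read_csv` parses `TRUE`/`FALSE` text as `bool` by default, assuming that is how the CSV stores it). A minimal sketch on hypothetical toy rows (made-up values, not the book's data) shows what the mask keeps:

```python
import pandas as pd

# Hypothetical toy rows (NOT the book's data), using the same column names as the CSV
toy = pd.DataFrame({
    'requested': [5.0, 10.0, 8.0, 12.0],
    'received': [4.0, 9.0, 0.0, 11.0],
    'negotiated': [True, True, False, True],
})

# Indexing a Series with a boolean Series keeps only the rows where the mask is True
requested = toy['requested'][toy['negotiated']]
received = toy['received'][toy['negotiated']]

print(requested.tolist())  # [5.0, 10.0, 12.0]
print(received.tolist())   # [4.0, 9.0, 11.0]
```

The third row (`negotiated=False`) is dropped from both Series, so the two stay aligned on the same index.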
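How strongly are the two correlated?

```python
# Pearson correlation (pandas' default method) between requested and received raises, negotiators only
df['requested'][df['negotiated']].corr(df['received'][df['negotiated']])
# Output: 0.66564810255571794
```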
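`Series.corr` computes the Pearson coefficient by default: the covariance of the two variables divided by the product of their standard deviations, which always lies between -1 and 1. A small hand-rolled sketch (the function name `pearson_r` and the toy numbers are hypothetical, not from the book) makes the formula explicit and cross-checks it against NumPy:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson r: sum of co-deviations divided by sqrt of the product of squared deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Hypothetical toy data (NOT the book's CSV), only to exercise the formula
x = [5.0, 10.0, 8.0, 12.0, 7.0]
y = [4.0, 9.0, 8.5, 11.0, 5.0]

r = pearson_r(x, y)
print(r)
print(np.corrcoef(x, y)[0, 1])  # NumPy's built-in should agree
```

A value around 0.67, as in the output above, indicates a clear but far from perfect positive linear relationship.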
Is there a function like this?
Input: the requested raise
Output: roughly the raise actually received
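In this post the "function" is a straight line. As a sketch using the standard textbook formulas (not quoted from the book), with \(x\) the requested raise and \(y\) the received raise, the ordinary least squares line, which is what scikit-learn's `LinearRegression` fits, is:

```latex
\hat{y} = b_0 + b_1 x, \qquad
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
    = r \, \frac{s_y}{s_x}, \qquad
b_0 = \bar{y} - b_1 \bar{x}
```

Here \(r\) is the correlation computed above and \(s_x, s_y\) are the sample standard deviations, so the slope is the correlation rescaled into the units of the data.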
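```python
from sklearn.linear_model import LinearRegression

# Negotiators only: requested raise (feature) and received raise (target)
X = df['requested'][df['negotiated']]
y = df['received'][df['negotiated']]

# scikit-learn expects a 2-D feature matrix of shape (n_samples, n_features)
regr = LinearRegression()
regr.fit(X.values[:, np.newaxis], y.values)

# Scatter the data and overlay the fitted regression line
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(1, 1, 1)
ax.scatter(X, y, color='black')
ax.plot(X.values, regr.predict(X.values[:, np.newaxis]), linewidth=3, color='blue')

# Slope and intercept (print both: a notebook cell only displays its last expression)
print(regr.coef_, regr.intercept_)
```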
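To see that `coef_` and `intercept_` are just the least squares slope and intercept, here is a minimal sketch on hypothetical toy numbers (not the book's CSV) that computes them by hand with NumPy and compares against `LinearRegression`, then uses the line as the "salary predictor" for a made-up request of 9:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data (NOT the book's CSV): requested vs. received raise
x = np.array([5.0, 10.0, 8.0, 12.0, 7.0])
y = np.array([4.0, 9.0, 8.5, 11.0, 5.0])

# Ordinary least squares by hand: slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x)
dx = x - x.mean()
slope = (dx * (y - y.mean())).sum() / (dx ** 2).sum()
intercept = y.mean() - slope * x.mean()

# The same fit with scikit-learn; the feature must be a 2-D column
regr = LinearRegression().fit(x[:, np.newaxis], y)

print(slope, intercept)
print(regr.coef_[0], regr.intercept_)

# Predicted received raise for a hypothetical request of 9
prediction = regr.predict(np.array([[9.0]]))[0]
print(prediction)
```

The hand-computed and scikit-learn values should match to floating-point precision; scikit-learn is simply more convenient once there are several features.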
Error analysis
To be updated