House Price Prediction (Advanced Version, Test)

阿新 • Published: 2017-10-23


#coding=utf8

import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor, BaggingRegressor, AdaBoostRegressor
from xgboost import XGBRegressor

# Don't treat the first column (Id) as a feature; use it only as the index
train_df = pd.read_csv('./input/train.csv', index_col=0)
test_df = pd.read_csv('./input/test.csv', index_col=0)
# Raw price alongside log(price + 1), for the histogram below
prices = pd.DataFrame({'price': train_df['SalePrice'], 'log(price + 1)': np.log1p(train_df['SalePrice'])})
# print(train_df.columns)
# prices.hist()
# print('ok')
# print(train_df.index)
# print(test_df.index)

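# Train on log1p(SalePrice); pop() also removes the target from the features
y_train = np.log1p(train_df.pop('SalePrice'))
# print(y_train.shape)
# print(train_df.index)

# Stack train and test so both go through the same preprocessing
all_df = pd.concat((train_df, test_df), axis=0)
# Variable transformation
print(train_df.index)
print(test_df.index)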

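# print(all_df['MSSubClass'].dtypes)
# MSSubClass is a category code, not a quantity, so store it as a string
all_df['MSSubClass'] = all_df['MSSubClass'].astype(str)
# print(all_df.shape)

# print(all_df['MSSubClass'].value_counts())
# print(all_df['MSSubClass'].dtypes)
# print(pd.get_dummies(all_df['MSSubClass'], prefix='MSSubClass').head())
# When categorical data is encoded as numbers, the numbers imply an order and a magnitude
# that the categories do not have, which causes trouble for the model later on.
# One-hot encoding expresses the categories without that side effect.
# pandas' built-in get_dummies does the one-hot encoding in a single call.
# One-hot encode every categorical column
all_dummy_df = pd.get_dummies(all_df)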
# print(all_dummy_df.head())
# print(all_dummy_df.isnull().sum().sort_values(ascending=False).head(10))
# Handle missing values: fill each column with its mean
mean_cols = all_dummy_df.mean()
# print(mean_cols)
all_dummy_df = all_dummy_df.fillna(mean_cols)
# print(all_dummy_df.isnull().sum().sum())
# Standardize the numerical data. The one-hot 0/1 columns do not need standardizing;
# the target is only the columns that were numerical to begin with.
# First find which columns are numerical
numeric_cols = all_df.columns[all_df.dtypes != 'object']
# print(numeric_cols)
# print(train_df.index)
numeric_col_means = all_dummy_df.loc[:, numeric_cols].mean()
numeric_col_std = all_dummy_df.loc[:, numeric_cols].std()
all_dummy_df.loc[:, numeric_cols] = (all_dummy_df.loc[:, numeric_cols] - numeric_col_means) / numeric_col_std

# Split the combined frame back into train and test by index
dummy_train_df = all_dummy_df.loc[train_df.index]
dummy_test_df = all_dummy_df.loc[test_df.index]
# print(train_df.index)
# print(test_df.index)
# print(dummy_train_df.shape)
# print(dummy_test_df.shape)
# print(type(dummy_train_df))

X_train = dummy_train_df.values
X_test = dummy_test_df.values
# print(type(X_train))

print(X_train.shape)
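# Ridge: scan the regularization strength alpha with 10-fold CV (score is RMSE on log price)
alphas = np.logspace(-3, 2, 50)
test_scores = []
for alpha in alphas:
    clf = Ridge(alpha=alpha)
    test_score = np.sqrt(-cross_val_score(clf, X_train, y_train, cv=10, scoring='neg_mean_squared_error'))
    test_scores.append(np.mean(test_score))

plt.figure()  # new figure so later plots do not draw on top of this one
plt.plot(alphas, test_scores)
plt.title('Alpha vs CV Error')
plt.show()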

# Random forest: scan the fraction of features considered at each split
max_features = [.1, .3, .5, .7, .9, .99]
test_scores = []
for max_feat in max_features:
    clf = RandomForestRegressor(n_estimators=200, max_features=max_feat)
    test_score = np.sqrt(-cross_val_score(clf, X_train, y_train, cv=5, scoring='neg_mean_squared_error'))
    test_scores.append(np.mean(test_score))

plt.figure()
plt.plot(max_features, test_scores)
plt.title("Max Features vs CV Error")
plt.show()

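# Some more advanced ensembling
# base_estimator can be omitted, in which case the default (a decision tree) is used,
# but the result is worse than with the already-tuned base estimator; the plots confirm this.
# Note: scikit-learn 1.2 and later call this parameter `estimator` instead of `base_estimator`.
ridge = Ridge(alpha=15)

# Bagging
params = [1, 10, 15, 20, 25, 30, 40]
test_scores = []
for param in params:
    clf = BaggingRegressor(n_estimators=param, base_estimator=ridge)
    test_score = np.sqrt(-cross_val_score(clf, X_train, y_train, cv=10, scoring='neg_mean_squared_error'))
    test_scores.append(np.mean(test_score))

plt.figure()
plt.plot(params, test_scores)
plt.title("n_estimator vs CV Error")
plt.show()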

# Boosting (AdaBoost on top of the same Ridge model)
params = [10, 15, 20, 25, 30, 35, 40, 45, 50]
test_scores = []
for param in params:
    clf = AdaBoostRegressor(n_estimators=param, base_estimator=ridge)
    test_score = np.sqrt(-cross_val_score(clf, X_train, y_train, cv=10, scoring='neg_mean_squared_error'))
    test_scores.append(np.mean(test_score))

plt.figure()
plt.plot(params, test_scores)
plt.title("n_estimator vs CV Error")
plt.show()

# XGBoost: scan the maximum tree depth
params = [1, 2, 3, 4, 5, 6]
test_scores = []
for param in params:
    clf = XGBRegressor(max_depth=param)
    test_score = np.sqrt(-cross_val_score(clf, X_train, y_train, cv=10, scoring='neg_mean_squared_error'))
    test_scores.append(np.mean(test_score))

plt.figure()
plt.plot(params, test_scores)
plt.title("max_depth vs CV Error")
plt.show()
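The tuning loops above all repeat the same pattern: build a model for each candidate value, take the mean cross-validated RMSE, and read the best value off a plot. A minimal sketch of that pattern as a reusable helper follows. The names `cv_rmse` and `best_param` and the synthetic data are assumptions for illustration, not part of the original script; in practice `X_train` and `y_train` would be passed in.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score


def cv_rmse(model, X, y, cv=10):
    """Mean of the per-fold RMSE values, as computed in the loops above."""
    mse = -cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    return float(np.mean(np.sqrt(mse)))


def best_param(make_model, params, X, y, cv=10):
    """Return (param with the lowest CV RMSE, list of all scores)."""
    scores = [cv_rmse(make_model(p), X, y, cv=cv) for p in params]
    return params[int(np.argmin(scores))], scores


# Synthetic stand-in data (an assumption; substitute X_train / y_train).
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(200, 5))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)

alphas_demo = np.logspace(-3, 2, 10)
best_alpha, demo_scores = best_param(
    lambda a: Ridge(alpha=a), alphas_demo, X_demo, y_demo, cv=5
)
print(best_alpha, min(demo_scores))
```

Picking the minimum in code avoids misreading a flat curve by eye, although the plot is still useful for seeing how sensitive the score is to the parameter.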


"""
rf = RandomForestRegressor(n_estimators=500, max_features=.3)

ridge.fit(X_train, y_train)
rf.fit(X_train, y_train)

y_ridge = np.expm1(ridge.predict(X_test))
y_rf = np.expm1(rf.predict(X_test))
y_final = (y_ridge + y_rf) / 2
"""
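The preprocessing steps in the script (casting MSSubClass to a string, one-hot encoding, mean imputation, and standardizing only the originally numeric columns) can be tried on a small frame to see what each step does. The sketch below uses made-up toy values, not the Kaggle files, and selects numeric columns with `select_dtypes`, a swapped-in alternative to the `dtypes != 'object'` test used in the script.

```python
import numpy as np
import pandas as pd

# Toy data (an assumption): a numeric-coded category, a real numeric
# column with one missing value, and a string category.
toy = pd.DataFrame(
    {
        "MSSubClass": [20, 60, 20, 30],
        "LotArea": [8450.0, 9600.0, np.nan, 11250.0],
        "Street": ["Pave", "Grvl", "Pave", "Pave"],
    }
)

# MSSubClass is a category code, so cast it to string before encoding.
toy["MSSubClass"] = toy["MSSubClass"].astype(str)

# Record which columns are genuinely numeric before one-hot encoding.
numeric_cols = toy.select_dtypes(include="number").columns

# One-hot encode every non-numeric column; dtype=float stores 0/1 as floats.
encoded = pd.get_dummies(toy, dtype=float)

# Fill missing values with the column mean.
encoded = encoded.fillna(encoded.mean())

# Standardize only the originally numeric columns, leaving the 0/1 columns alone.
means = encoded[numeric_cols].mean()
stds = encoded[numeric_cols].std()
encoded[numeric_cols] = (encoded[numeric_cols] - means) / stds

print(encoded.round(3))
```

After these steps LotArea has zero mean and unit standard deviation, while the MSSubClass and Street indicator columns keep their 0/1 values.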
