Housing Prices Competition
阿新 • Published: 2019-01-13
# Code you have previously used to load data
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
# Path of the file to read. We changed the directory structure to simplify submitting to a competition
iowa_file_path = r'G:/kaggle/housePrice/train.csv'
home_data = pd.read_csv(iowa_file_path)
# Create target object and call it y
y = home_data.SalePrice
# Create X
features = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', 'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd',
            'MSSubClass', 'OverallQual', 'OverallCond', 'YearRemodAdd', 'MasVnrArea', 'BsmtFullBath',
            'BsmtHalfBath', 'HalfBath', 'KitchenAbvGr', 'Fireplaces', 'GarageCars', 'GarageArea', 'PoolArea']
X = home_data[features]
# Handle NaN: fill missing values in each column with that column's mean (SimpleImputer's default strategy)
from sklearn.impute import SimpleImputer
my_imputer = SimpleImputer()
X = my_imputer.fit_transform(X)  # returns a NumPy array rather than a DataFrame
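# Optional sketch (not in the original post): check which of the selected columns
# actually contain NaNs. Run it on the original DataFrame, since fit_transform
# above returns a plain NumPy array.
missing_counts = home_data[features].isnull().sum()
print(missing_counts[missing_counts > 0])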
# Split into validation and training data
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
# Specify Model
iowa_model = DecisionTreeRegressor(random_state=1)
# Fit Model
iowa_model.fit(train_X, train_y)
# Make validation predictions and calculate mean absolute error
val_predictions = iowa_model.predict(val_X)
val_mae = mean_absolute_error(val_y, val_predictions)
print("Validation MAE when not specifying max_leaf_nodes: {:,.0f}".format(val_mae))
# Using best value for max_leaf_nodes
iowa_model = DecisionTreeRegressor(max_leaf_nodes=100, random_state=1)
iowa_model.fit(train_X, train_y)
val_predictions = iowa_model.predict(val_X)
val_mae = mean_absolute_error(val_y, val_predictions)
print("Validation MAE for best value of max_leaf_nodes: {:,.0f}".format(val_mae))
# Define the model. Set random_state to 1
rf_model = RandomForestRegressor(random_state=1)
rf_model.fit(train_X, train_y)
rf_val_predictions = rf_model.predict(val_X)
rf_val_mae = mean_absolute_error(val_y, rf_val_predictions)
print("Validation MAE for Random Forest Model: {:,.0f}".format(rf_val_mae))
Validation MAE when not specifying max_leaf_nodes: 28,365
Validation MAE for best value of max_leaf_nodes: 26,087
Validation MAE for Random Forest Model: 18,974
d:\python27\lib\site-packages\sklearn\ensemble\forest.py:248: FutureWarning: The default value of n_estimators will change from 10 in version 0.20 to 100 in 0.22.
"10 in version 0.20 to 100 in 0.22.", FutureWarning)
Create a model: a Random Forest is used
# To improve accuracy, create a new Random Forest model which you will train on all training data
rf_model_on_full_data = RandomForestRegressor(random_state=1)
# fit rf_model_on_full_data on all of the training data
rf_model_on_full_data.fit(X, y)
RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
max_features='auto', max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=None,
oob_score=False, random_state=1, verbose=0, warm_start=False)
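The full-data model no longer has a held-out validation set, so its quality could instead be estimated with cross-validation. A hedged sketch (not in the original post), using 5 folds:
from sklearn.model_selection import cross_val_score
# sklearn reports "neg_mean_absolute_error" (higher is better), so negate to recover MAE
cv_mae = -cross_val_score(RandomForestRegressor(random_state=1), X, y,
                          scoring='neg_mean_absolute_error', cv=5)
print("Cross-validated MAE: {:,.0f}".format(cv_mae.mean()))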
Make predictions
# path to file you will use for predictions
test_data_path = r'G:/kaggle/housePrice/test.csv'
# read test data file using pandas
test_data = pd.read_csv(test_data_path)
# create test_X which comes from test_data but includes only the columns you used for prediction.
# The list of columns is stored in a variable called features
test_X = test_data[features]
# Handle NaN in the test features: reuse the imputer fitted on the training data,
# so missing values are filled with the training-set column means
test_X = my_imputer.transform(test_X)
# make predictions which we will submit.
test_preds = rf_model_on_full_data.predict(test_X)
# The lines below show you how to save your data in the format needed to score it in the competition
output = pd.DataFrame({'Id': test_data.Id,
                       'SalePrice': test_preds})
output.to_csv(r'G:/kaggle/housePrice/submission.csv', index=False)
output.head()
Id SalePrice
0 1461 121550.8
1 1462 151415.0
2 1463 174114.0
3 1464 175490.0
4 1465 201250.0
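A quick sanity check before submitting (a sketch, not part of the original post): the submission should have one row per test Id and no missing prices.
assert len(output) == len(test_data)
assert output['SalePrice'].notnull().all()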