Scraping Hangzhou housing price data with Python 3
阿新 • Published: 2019-02-04
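I will be moving to Hangzhou soon. Housing prices there rose sharply last year, and with so many new policies in play it is hard to tell where the market is heading, so I decided to write a crawler that keeps a record of Hangzhou's real-estate transaction data.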
Preparation
My server runs CentOS with Python 3 installed. The crawler needs to connect to a PostgreSQL database, which requires psycopg2, so I installed it:
python3 -m pip install psycopg2
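With psycopg2 installed, a connection is opened with psycopg2.connect. The sketch below is not the author's code: it assumes the connection settings come from the standard libpq environment variables (PGHOST, PGPORT, PGDATABASE, PGUSER, PGPASSWORD), and the fallback values and the helper names db_params_from_env and connect are hypothetical.

```python
import os


def db_params_from_env(env=None):
    """Collect psycopg2 connection keyword arguments from environment variables.

    Variable names follow libpq conventions; the defaults are hypothetical
    placeholders, not the author's real settings.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("PGHOST", "localhost"),
        "port": int(env.get("PGPORT", "5432")),
        "dbname": env.get("PGDATABASE", "postgres"),
        "user": env.get("PGUSER", "postgres"),
        "password": env.get("PGPASSWORD", ""),
    }


def connect(env=None):
    """Open a PostgreSQL connection (requires psycopg2 to be installed)."""
    import psycopg2  # imported lazily so db_params_from_env works without it

    return psycopg2.connect(**db_params_from_env(env))
```

Keeping credentials in the environment means the crawler script itself contains no passwords, which matters once it is scheduled from crontab on a shared server.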
Next, I designed the database tables (all in the hangzhou schema):
CREATE SCHEMA IF NOT EXISTS hangzhou;
----------------------------------------
-- table for daily new-home transactions
----------------------------------------
CREATE TABLE hangzhou.trans_daily_info (
trans_date DATE NOT NULL,
downtown_new_trans SMALLINT NOT NULL,
downtown_new_vol INTEGER NOT NULL,
xiaoshan_new_trans SMALLINT NOT NULL,
xiaoshan_new_vol INTEGER NOT NULL,
yuhang_new_trans SMALLINT NOT NULL,
yuhang_new_vol INTEGER NOT NULL,
fuyang_new_trans SMALLINT NOT NULL,
fuyang_new_vol INTEGER NOT NULL,
djd_new_trans SMALLINT NOT NULL,
djd_new_vol INTEGER NOT NULL,
urban_new_daily_trans SMALLINT NOT NULL,
urban_new_daily_vol INTEGER NOT NULL,
other4county_new_qty SMALLINT NOT NULL,
other4county_new_vol INTEGER NOT NULL,
downtown_old_qty SMALLINT NOT NULL,
PRIMARY KEY (trans_date)
);
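Because trans_date is the primary key, re-running the crawler for a day that is already stored would fail on a plain INSERT. One option, sketched here as an assumption rather than the author's approach, is PostgreSQL's INSERT ... ON CONFLICT (available from 9.5). The helper build_upsert is hypothetical; it takes the row as a dict so it does not hard-code the table's column list.

```python
import re

# Plain identifiers only (optionally schema-qualified), because these names
# are pasted into the SQL text instead of being passed as parameters.
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*(\.[a-z_][a-z0-9_]*)?$")


def build_upsert(table, row, key):
    """Build an INSERT ... ON CONFLICT statement plus its parameter tuple.

    `row` maps column names to values; values become psycopg2-style %s
    placeholders. A second insert with the same `key` value updates the
    existing row instead of raising a primary-key violation.
    """
    if key not in row:
        raise ValueError(f"key column {key!r} missing from row")
    columns = list(row)
    for name in [table] + columns:
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    placeholders = ", ".join(["%s"] * len(columns))
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in columns if c != key)
    action = f"DO UPDATE SET {updates}" if updates else "DO NOTHING"
    sql = (
        f"INSERT INTO {table} ({', '.join(columns)}) "
        f"VALUES ({placeholders}) ON CONFLICT ({key}) {action}"
    )
    return sql, tuple(row[c] for c in columns)
```

With a psycopg2 cursor, cur.execute(*build_upsert("hangzhou.trans_daily_info", row, "trans_date")) stores one day's figures, or overwrites them if that day was already scraped.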
----------------------------------------
-- table for weekly hot (second-hand) residential compounds
----------------------------------------
CREATE TABLE hangzhou.old_weekly_hot_residence (
id SERIAL PRIMARY KEY,
start_time DATE NOT NULL,
end_time DATE NOT NULL,
residence_name VARCHAR(50) NOT NULL
);
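This table stores several residence names per week, with a SERIAL id as the key, so one week's list maps naturally onto cursor.executemany. The sketch below assumes the crawler already has the week's start/end dates and a list of names; the helper hot_residence_rows and the constant HOT_RESIDENCE_SQL are hypothetical. Names longer than VARCHAR(50) are rejected rather than silently truncated.

```python
HOT_RESIDENCE_SQL = (
    "INSERT INTO hangzhou.old_weekly_hot_residence "
    "(start_time, end_time, residence_name) VALUES (%s, %s, %s)"
)


def hot_residence_rows(start, end, names, max_len=50):
    """Turn one week's list of hot residence names into executemany rows.

    Whitespace is stripped, blanks and duplicates are dropped (first
    occurrence kept), and names over `max_len` characters raise ValueError.
    The SERIAL id column is omitted so PostgreSQL generates it.
    """
    rows, seen = [], set()
    for raw in names:
        name = raw.strip()
        if not name or name in seen:
            continue
        if len(name) > max_len:
            raise ValueError(f"residence name longer than {max_len}: {name!r}")
        seen.add(name)
        rows.append((start, end, name))
    return rows
```

Usage with psycopg2: cur.executemany(HOT_RESIDENCE_SQL, hot_residence_rows(start, end, names)).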
----------------------------------------
-- table for the weekly hottest residence
----------------------------------------
CREATE TABLE hangzhou.old_weekly_hotest_residence (
start_date DATE NOT NULL,
end_date DATE NOT NULL,
week SMALLINT NOT NULL,
residence_name VARCHAR(50) NOT NULL,
comment TEXT NOT NULL,
PRIMARY KEY (start_date,end_date)
);
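Since (start_date, end_date) is the primary key, the crawler must produce exactly the same date pair every time it records a given week, otherwise duplicates slip in under slightly different keys. A minimal sketch, assuming weeks run Monday to Sunday (the source site's weekly period may be defined differently); week_range is a hypothetical helper.

```python
from datetime import date, timedelta


def week_range(day):
    """Return (start_date, end_date) of the Monday-to-Sunday week containing `day`."""
    start = day - timedelta(days=day.weekday())  # weekday(): Monday == 0, Sunday == 6
    return start, start + timedelta(days=6)
```

For example, week_range(date(2019, 2, 6)) returns (date(2019, 2, 4), date(2019, 2, 10)), and any other day in that week yields the same pair.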
----------------------------------------
-- table for weekly second-hand residence transaction info
----------------------------------------
CREATE TABLE hangzhou.old_trans_weekly_info (
start_date DATE NOT NULL,
end_date DATE NOT NULL,
week SMALLINT NOT NULL,
city_commercial_house_qty INTEGER NOT NULL,
city_residence_qty INTEGER NOT NULL,
urban_commercial_house_qty INTEGER NOT NULL,
urban_residence_qty INTEGER NOT NULL,
shangcheng_qty INTEGER DEFAULT 0,
xiacheng_qty INTEGER DEFAULT 0,
jianggan_qty INTEGER DEFAULT 0,
gongshu_qty INTEGER DEFAULT 0,
xihu_qty INTEGER DEFAULT 0,
bingjiang_qty INTEGER DEFAULT 0,
zhijiang_qty INTEGER DEFAULT 0,
xiasha INTEGER DEFAULT 0,
PRIMARY KEY (start_date,end_date)
);
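The per-district columns mean the crawler has to translate each district label it scrapes into a column name. The sketch below assumes the page prints Simplified Chinese labels, with or without the trailing 区; DISTRICT_COLUMNS and district_counts_to_row are hypothetical, and the column names are copied from the schema as written (including bingjiang_qty and xiasha). Unknown labels raise an error so a newly listed district is not silently lost.

```python
# Hypothetical label-to-column mapping; labels are an assumption about the
# source page, column names come from hangzhou.old_trans_weekly_info.
DISTRICT_COLUMNS = {
    "上城": "shangcheng_qty",
    "下城": "xiacheng_qty",
    "江干": "jianggan_qty",
    "拱墅": "gongshu_qty",
    "西湖": "xihu_qty",
    "滨江": "bingjiang_qty",
    "之江": "zhijiang_qty",
    "下沙": "xiasha",
}


def district_counts_to_row(counts):
    """Map {district label: count} to {column: count}, defaulting every column to 0.

    A trailing "区" is stripped from each label before lookup; an unknown
    label raises ValueError instead of being dropped.
    """
    row = dict.fromkeys(DISTRICT_COLUMNS.values(), 0)
    for label, qty in counts.items():
        key = label.strip()
        if key.endswith("区"):
            key = key[:-1]
        if key not in DISTRICT_COLUMNS:
            raise ValueError(f"unknown district: {label!r}")
        row[DISTRICT_COLUMNS[key]] = int(qty)
    return row
```

Filling missing districts with 0 explicitly mirrors the DEFAULT 0 in the schema, so the resulting dict can be merged with the week's dates and city-wide totals before inserting.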
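Later I noticed that the commands in crontab were not running, and /var/log/cron showed no new entries either. Checking the crond service revealed the problem, so I restarted it: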
service crond status
service crond restart
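The symptom above (no new lines in /var/log/cron) can be checked from a script as well. This sketch assumes the usual CentOS syslog layout, where each job run is logged as a line like "<timestamp> <host> CROND[<pid>]: (<user>) CMD (<command>)"; the helper names are hypothetical. If the last CROND line stops advancing while jobs are scheduled, crond is probably not running.

```python
def last_crond_entry(log_text):
    """Return the last CROND line in /var/log/cron-style text, or None."""
    last = None
    for line in log_text.splitlines():
        if " CROND[" in line:  # job-execution lines; "crond[" lines are daemon messages
            last = line
    return last


def read_last_crond_entry(path="/var/log/cron"):
    """Read the cron log and return its last CROND line (usually needs root)."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return last_crond_entry(fh.read())
```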
While running the crawler, I found that the week column was redundant, so I dropped it:
ALTER TABLE hangzhou.old_trans_weekly_info DROP COLUMN week;
ALTER TABLE hangzhou.old_weekly_hotest_residence DROP COLUMN week;
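A week number can always be recomputed from start_date, which is one reason storing it separately adds little. A minimal sketch using ISO-8601 weeks (an assumption; the original week column may have been numbered differently); week_label is a hypothetical helper. On the database side, PostgreSQL's EXTRACT(WEEK FROM start_date) also returns the ISO week number.

```python
from datetime import date


def week_label(start_date):
    """Return an ISO-8601 week label such as '2019-W06' for a start_date.

    Both parts come from isocalendar() because the ISO year can differ
    from the calendar year around New Year (2018-12-31 is in 2019-W01).
    """
    iso_year, iso_week, _ = start_date.isocalendar()
    return f"{iso_year}-W{iso_week:02d}"
```

For example, week_label(date(2019, 2, 4)) returns "2019-W06".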
Later I also needed a comment2 column, so:
ALTER TABLE hangzhou.old_weekly_hotest_residence ADD COLUMN comment2 TEXT;
UPDATE hangzhou.old_weekly_hotest_residence SET comment2 = '' WHERE comment2 IS NULL;
ALTER TABLE hangzhou.old_weekly_hotest_residence ALTER COLUMN comment2 SET NOT NULL;
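Schema changes like the comment2 statements above are safer when applied as a unit: PostgreSQL DDL is transactional, so if a later statement fails, a rollback also undoes the earlier ones. The sketch below is an assumption, not the author's tooling; run_migration is a hypothetical helper that works with any DB-API 2.0 connection, such as one from psycopg2.connect.

```python
def run_migration(conn, statements):
    """Execute schema-change statements in one transaction.

    Commits only if every statement succeeds; otherwise rolls back and
    re-raises the original error.
    """
    cur = conn.cursor()
    try:
        for stmt in statements:
            cur.execute(stmt)
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

psycopg2 opens a transaction implicitly on the first execute, so the single commit at the end is what makes the statements take effect together.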
To be continued.