hyperopt: a powerful hyperparameter tuning tool for Python

阿新 • Published: 2017-07-12


1. Installation

pip install hyperopt

2. Overview

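Hyperopt provides an optimization interface that takes an evaluation function and a parameter space, and computes the value of the loss function at points in that space. The user must also specify how each parameter is distributed within the space.
Hyperopt has four key ingredients: the function to minimize, the search space, the trials database that records the sampled points (optional), and the search algorithm (optional).
First, define an objective function that takes a single argument and returns a loss value; for example, to minimize the function q(x, y) = x**2 + y**2.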
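A minimal sketch of such an objective follows. Because the function receives the sampled point as one argument, a function of two variables unpacks it inside the body. The name q follows the text; the (x, y) pair layout is an assumption about how the search space is declared.

```python
def q(args):
    """Loss for one sampled point; args is assumed to be an (x, y) pair."""
    x, y = args
    return x ** 2 + y ** 2


# The loss is zero at the origin and grows with the distance from it.
print(q((0, 0)))
print(q((3, 4)))
```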

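Next, choose the search algorithm, which is the value passed as the algo argument of hyperopt's fmin function. The algorithms currently supported are random search (hyperopt.rand.suggest), simulated annealing (hyperopt.anneal.suggest), and TPE (hyperopt.tpe.suggest).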

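The parameter space is set up as follows. To optimize the function q, for example, call fmin(q, space=hp.uniform('a', 0, 1)). The first argument of hp.uniform is the label; every hyperparameter in the space must have its own unique label. hp.uniform specifies how the parameter is distributed. Other distributions include: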

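hp.choice(label, options) returns one of the options, where options is a list or a tuple. The options may themselves be nested expressions, which is how conditional parameters are built.
hp.pchoice(label, p_options) returns one of the options in p_options with a given probability, so the options are not equally likely during the search.
hp.uniform(label, low, high) is uniformly distributed between low and high.
hp.quniform(label, low, high, q) takes the value round(uniform(low, high) / q) * q, which suits parameters with discrete values.
hp.loguniform(label, low, high) draws exp(uniform(low, high)), so the value lies in [exp(low), exp(high)].
hp.randint(label, upper) returns a random integer in the half-open interval [0, upper).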
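The formulas in this list can be tried out with the standard library alone. The sketch here is not hyperopt's implementation; the helper names quniform_like, loguniform_like and randint_like are hypothetical stand-ins that only reproduce the arithmetic described in the list.

```python
import math
import random


def quniform_like(rng, low, high, q):
    """round(uniform(low, high) / q) * q, the formula given for hp.quniform."""
    return round(rng.uniform(low, high) / q) * q


def loguniform_like(rng, low, high):
    """exp(uniform(low, high)), the formula given for hp.loguniform."""
    return math.exp(rng.uniform(low, high))


def randint_like(rng, upper):
    """A random integer in the half-open interval [0, upper)."""
    return rng.randrange(upper)


rng = random.Random(0)
q_draws = [quniform_like(rng, 0, 10, 2) for _ in range(5)]
log_draws = [loguniform_like(rng, 0, 1) for _ in range(5)]
int_draws = [randint_like(rng, 5) for _ in range(5)]

print(q_draws)    # multiples of 2 between 0 and 10
print(log_draws)  # values between exp(0) = 1 and exp(1)
print(int_draws)  # integers from 0 to 4
```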

A search space can be built from lists, tuples and dictionaries.

from hyperopt import hp
list_space = [
    hp.uniform('a', 0, 1),
    hp.loguniform('b', 0, 1)]
tuple_space = (
    hp.uniform('a', 0, 1),
    hp.loguniform('b', 0, 1))
dict_space = {
    'a': hp.uniform('a', 0, 1),
    'b': hp.loguniform('b', 0, 1)}

3. A simple example

from hyperopt import hp, fmin, rand, tpe, space_eval

def q(args):
    x, y = args
    return x**2 - 2*x + 1 + y**2

space = [hp.randint('x', 5), hp.randint('y', 5)]

best = fmin(q, space, algo=rand.suggest, max_evals=10)

print(best)

Output:

{'x': 2, 'y': 0}
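The point reported here has loss 1, while the true minimum of this q is 0. With only 10 random evaluations the search can miss it. Since hp.randint(label, 5) only yields the integers 0 to 4, the whole space is a 5 x 5 grid, and a brute-force check (a sketch that does not use hyperopt at all) shows where the minimum lies.

```python
from itertools import product


def q(args):
    x, y = args
    return x**2 - 2*x + 1 + y**2


# Enumerate all 25 integer points and keep the one with the smallest loss.
grid = list(product(range(5), repeat=2))
best_point = min(grid, key=q)

print(best_point, q(best_point))  # (1, 0) 0
print(q((2, 0)))                  # 1, the loss of the point found by fmin
```

Raising max_evals makes it more likely that the random search reaches the same point.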

4. An xgboost example

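xgboost has many parameters. Wrap the xgboost code in a function and pass it to fmin to tune them, using the cross-validated AUC as the optimization target. A larger AUC is better, but fmin minimizes, so the function returns -AUC. The data set has 202 columns: the first is the sample id, the last is the label, and the 200 in between are the features.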

#coding:utf-8
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
import xgboost as xgb
from random import shuffle
from xgboost.sklearn import XGBClassifier
from sklearn.model_selection import cross_val_score  # sklearn.cross_validation was removed in scikit-learn 0.20
import pickle
import time
from hyperopt import fmin, tpe, hp,space_eval,rand,Trials,partial,STATUS_OK

def loadFile(fileName = "E://zalei//browsetop200Pca.csv"):
    data = pd.read_csv(fileName,header=None)
    data = data.values
    return data

data = loadFile()
label = data[:,-1]
attrs = data[:,1:-1]  # skip the sample-id column (first) and the label column (last)
labels = label.reshape((1,-1))
label = labels.tolist()[0]

minmaxscaler = MinMaxScaler()
attrs = minmaxscaler.fit_transform(attrs)

# Shuffle the row indices and split them 70/30 into training and test sets
index = list(range(len(label)))  # a list, so that shuffle can reorder it in place
shuffle(index)
trainIndex = index[:int(len(label)*0.7)]
print(len(trainIndex))
testIndex = index[int(len(label)*0.7):]
print(len(testIndex))
attr_train = attrs[trainIndex,:]
print(attr_train.shape)
attr_test = attrs[testIndex,:]
print(attr_test.shape)
label_train = labels[:,trainIndex].tolist()[0]
print(len(label_train))
label_test = labels[:,testIndex].tolist()[0]
print(len(label_test))
print(np.array(label_train).reshape((-1,1)).shape)


def GBM(argsDict):
    # hp.randint yields integers starting at 0; shift and scale them into the real ranges
    max_depth = argsDict["max_depth"] + 5
    n_estimators = argsDict["n_estimators"] * 5 + 50
    learning_rate = argsDict["learning_rate"] * 0.02 + 0.05
    subsample = argsDict["subsample"] * 0.1 + 0.7
    min_child_weight = argsDict["min_child_weight"]+1
    print("max_depth:" + str(max_depth))
    print("n_estimator:" + str(n_estimators))
    print("learning_rate:" + str(learning_rate))
    print("subsample:" + str(subsample))
    print("min_child_weight:" + str(min_child_weight))
    global attr_train,label_train

    gbm = xgb.XGBClassifier(nthread=4,    # number of threads
                            max_depth=max_depth,  # maximum tree depth
                            n_estimators=n_estimators,   # number of trees
                            learning_rate=learning_rate, # learning rate
                            subsample=subsample,      # fraction of rows sampled for each tree
                            min_child_weight=min_child_weight,   # minimum sum of instance weights in a child
                            max_delta_step = 10,  # cap on each leaf's weight update (not an early-stopping rule)
                            objective="binary:logistic")

    # Mean 5-fold cross-validated AUC; negated because fmin minimizes
    metric = cross_val_score(gbm,attr_train,label_train,cv=5,scoring="roc_auc").mean()
    print(metric)
    return -metric

space = {"max_depth":hp.randint("max_depth",15),  # 0..14 -> 5..19
         "n_estimators":hp.randint("n_estimators",10),  # 0..9 -> 50, 55, ..., 95
         "learning_rate":hp.randint("learning_rate",6),  # 0..5 -> 0.05, 0.07, ..., 0.15
         "subsample":hp.randint("subsample",4),  # 0..3 -> 0.7, 0.8, 0.9, 1.0
         "min_child_weight":hp.randint("min_child_weight",5),  # 0..4 -> 1..5
        }
algo = partial(tpe.suggest,n_startup_jobs=1)
best = fmin(GBM,space,algo=algo,max_evals=4)

print(best)
print(GBM(best))
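The dictionary returned by fmin holds the raw hp.randint draws, not the values that GBM trains with. A small helper can apply the same offsets and scales to recover the actual hyperparameters. This is a sketch: the name decode and the example_best values are hypothetical, and the formulas are copied from the GBM function.

```python
def decode(best):
    """Map raw hp.randint draws to the values GBM actually trains with."""
    return {
        "max_depth": int(best["max_depth"]) + 5,
        "n_estimators": int(best["n_estimators"]) * 5 + 50,
        "learning_rate": int(best["learning_rate"]) * 0.02 + 0.05,
        "subsample": int(best["subsample"]) * 0.1 + 0.7,
        "min_child_weight": int(best["min_child_weight"]) + 1,
    }


# Hypothetical fmin result, for illustration only.
example_best = {"max_depth": 3, "n_estimators": 9, "learning_rate": 5,
                "subsample": 2, "min_child_weight": 0}
decoded = decode(example_best)
print(decoded)
```

Keeping the mapping in one place avoids the raw indices being mistaken for real parameter values when the result is reported.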

For more details, see: http://blog.csdn.net/qq_34139222/article/details/60322995
