Finally Understanding LightGBM for Good: Principles and Code (Part 6)

 
(2) Classification with the Scikit-learn interface

import joblib
import lightgbm as lgb
from lightgbm import LGBMClassifier
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Load the data
iris = load_iris()
data = iris.data
target = iris.target

# Split into training and test data
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.2)

# Train the model
# Early stopping is passed as a callback; LightGBM 4.x no longer accepts
# early_stopping_rounds as an argument to fit().
gbm = LGBMClassifier(num_leaves=31, learning_rate=0.05, n_estimators=20)
gbm.fit(X_train, y_train, eval_set=[(X_test, y_test)],
        callbacks=[lgb.early_stopping(stopping_rounds=5)])

# Save the model
joblib.dump(gbm, 'loan_model.pkl')

# Load the model
gbm = joblib.load('loan_model.pkl')

# Predict
y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration_)

# Evaluate
print('The accuracy of prediction is:', accuracy_score(y_test, y_pred))

# Feature importances
print('Feature importances:', list(gbm.feature_importances_))

# Grid search for parameter optimization
estimator = LGBMClassifier(num_leaves=31)
param_grid = {
    'learning_rate': [0.01, 0.1, 1],
    'n_estimators': [20, 40]
}
gbm = GridSearchCV(estimator, param_grid)
gbm.fit(X_train, y_train)
print('Best parameters found by grid search are:', gbm.best_params_)
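(3) Regression with the native LightGBM interface

To show LightGBM on a regression problem, we use a regression task from a Kaggle competition, House Prices: Advanced Regression Techniques, as a worked example. It is available at:
https://www.kaggle.com/c/house-prices-advanced-regression-techniques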
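In the training set for this house-price prediction task, the first column is Id, the last column is the label, and the columns in between are features. The features are a mix of categorical, integer, and floating-point variables. The training set contains missing values.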
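Before modelling, it can help to check how the feature columns split by type and where values are missing. The following is a minimal sketch on a small made-up frame: the column names are borrowed from the Kaggle data, but the values and the variable names (`toy`, `features`, `missing`) are invented for illustration and say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for train.csv; all values are invented.
toy = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "MSZoning": ["RL", "RM", "RL", None],       # categorical, one missing value
    "LotArea": [8450, 9600, 11250, 9550],       # integer
    "LotFrontage": [65.0, np.nan, 68.0, 60.0],  # float, one missing value
    "SalePrice": [208500, 181500, 223500, 140000],
})

# Drop the identifier and the label so only feature columns remain.
features = toy.drop(columns=["Id", "SalePrice"])

# Count feature columns by broad type.
n_categorical = features.select_dtypes(exclude=[np.number]).shape[1]
n_integer = features.select_dtypes(include=[np.integer]).shape[1]
n_float = features.select_dtypes(include=[np.floating]).shape[1]

# Count missing values per column, keeping only columns that have any.
missing = features.isna().sum()
missing = missing[missing > 0]

print("categorical / integer / float:", n_categorical, n_integer, n_float)
print("missing values per column:", missing.to_dict())
```

Running the same checks on the real `train.csv` would show which columns need imputation before training.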
import pandas as pd
import lightgbm as lgb
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# 1. Read the file
data = pd.read_csv('./dataset/train.csv')

# 2. Separate the inputs (features) from the output (target variable)
y = data.SalePrice
X = data.drop(['SalePrice'], axis=1).select_dtypes(exclude=['object'])

# 3. Split into training and test sets, ratio 7.5 : 2.5
train_X, test_X, train_y, test_y = train_test_split(X.values, y.values, test_size=0.25)

# 4. Handle missing values; default strategy: fill with the mean of each feature column
# (SimpleImputer replaces the Imputer class removed from sklearn.preprocessing)
my_imputer = SimpleImputer()
train_X = my_imputer.fit_transform(train_X)
test_X = my_imputer.transform(test_X)

# 5. Convert to the Dataset format
lgb_train = lgb.Dataset(train_X, train_y)
lgb_eval = lgb.Dataset(test_X, test_y, reference=lgb_train)

# 6. Parameters
params = {
    'task': 'train',
    'boosting_type': 'gbdt',   # boosting type
    'objective': 'regression', # objective function
    'metric': {'l2', 'l1'},    # evaluation metrics (the original listed 'auc', which is a classification metric)
    'num_leaves': 31,          # number of leaves
    'learning_rate': 0.05,     # learning rate
    'feature_fraction': 0.9,   # fraction of features sampled per tree
    'bagging_fraction': 0.8,   # fraction of samples drawn per tree
    'bagging_freq': 5,         # k means bagging is performed every k iterations
    'verbose': 1               # <0: fatal only, =0: errors (warnings), >0: info
}

# 7. Train (fit) the LightGBM model on the training set
# Early stopping is passed as a callback; LightGBM 4.x no longer accepts
# early_stopping_rounds as an argument to train().
my_model = lgb.train(params, lgb_train, num_boost_round=20, valid_sets=[lgb_eval],
                     callbacks=[lgb.early_stopping(stopping_rounds=5)])

# 8. Predict on the test set
predictions = my_model.predict(test_X, num_iteration=my_model.best_iteration)

# 9. Evaluate the predictions (mean absolute error)
print("Mean Absolute Error : " + str(mean_absolute_error(test_y, predictions)))
(4) Regression with the Scikit-learn interface

import pandas as pd
import lightgbm as lgb
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# 1. Read the file
data = pd.read_csv('./dataset/train.csv')

# 2. Separate the inputs (features) from the output (target variable)
y = data.SalePrice
X = data.drop(['SalePrice'], axis=1).select_dtypes(exclude=['object'])

# 3. Split into training and test sets, ratio 7.5 : 2.5
train_X, test_X, train_y, test_y = train_test_split(X.values, y.values, test_size=0.25)

# 4. Handle missing values; default strategy: fill with the mean of each feature column
# (SimpleImputer replaces the Imputer class removed from sklearn.preprocessing)
my_imputer = SimpleImputer()
train_X = my_imputer.fit_transform(train_X)
test_X = my_imputer.transform(test_X)

# 5. Train (fit) the LightGBM model on the training set
# verbosity=2 prints messages while boosting runs
my_model = lgb.LGBMRegressor(objective='regression', num_leaves=31,
                             learning_rate=0.05, n_estimators=20, verbosity=2)
# fit() no longer takes a verbose argument in LightGBM 4.x
my_model.fit(train_X, train_y)

# 6. Predict on the test set
predictions = my_model.predict(test_X)

# 7. Evaluate the predictions (mean absolute error)
print("Mean Absolute Error : " + str(mean_absolute_error(test_y, predictions)))

5.3 LightGBM Parameter Tuning
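In the previous part, a few of the LightGBM model's parameters were set by hand, but most were left at their defaults, and the defaults are not necessarily the best choice. To get better performance out of LightGBM, the model's parameters need to be fine-tuned. The accompanying figure shows the parameters to tune for a regression model; the parameters to tune for a classification model are similar.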

