Chinese Journal of Chromatography ›› 2025, Vol. 43 ›› Issue (4): 355-362. DOI: 10.3724/SP.J.1123.2024.07014
WANG Qianyi, ZHU Yongle, LI Xuehua*
Received: 2024-07-21
Online: 2025-04-08
Published: 2025-03-26
WANG Qianyi, ZHU Yongle, LI Xuehua. Construction of a machine learning ensemble prediction model for gas chromatographic retention index on stationary phases with different polarities[J]. Chinese Journal of Chromatography, 2025, 43(4): 355-362.
Category | Number | Chromatographic column | Stationary phase | McReynolds’ constant | Refs. |
---|---|---|---|---|---|
Strong polarity | 1372 | carbowax-20M | polyethylene glycol | 462 | [ |
Polarity | 198 | DB-225MS | 50% cyanopropylphenyl-50% dimethylpolysiloxane | 363* | [ |
Medium polarity | 484 | DB-624 | 6% cyanopropylphenyl-94% dimethylpolysiloxane | 158* | [ |
 | | OV17 | 50% diphenyl-50% dimethylpolysiloxane | 177 | [ |
Weak polarity | 1316 | DB-5 | 5% diphenyl-95% dimethylpolysiloxane | 67 | [ |
 | | HP5-MS | | 67 | [ |
 | | - | | 67 | [ |
Non-polar | 817 | HP-1 | 100% dimethylpolysiloxane | 44 | [ |
 | | OV101 | | 44 | [ |
 | | - | | 44 | [ |
Table 1 Source and distribution of 4237 modeling data
Fig. 3 One-way analysis of variance of the retention indices (RI) of compounds on columns of different polarity (n=45). Groups labeled a and b differ significantly (p<0.05).
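The group comparison named in the Fig. 3 caption rests on the one-way ANOVA F statistic, the ratio of between-group to within-group variance. The following is a minimal sketch of that calculation using only the standard library. The function name `one_way_anova_f` and the retention index values are hypothetical and are not taken from the study. Converting F to a p-value additionally requires the F distribution, which a statistics package would normally supply.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical RI values for the same compounds on three column types.
ri_by_column = {
    "non-polar": [800.0, 905.0, 1010.0],
    "weak polarity": [815.0, 920.0, 1030.0],
    "strong polarity": [1100.0, 1250.0, 1400.0],
}
f_stat, df_b, df_w = one_way_anova_f(list(ri_by_column.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```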
Regression model | Hyperparameters |
---|---|
LR | - |
DT | max_depth=17.00, random_state=85.00, min_samples_leaf=1.00, min_samples_split=4.00 |
RF | n_estimators=251.00, random_state=0.00, max_depth=17.00 |
SVR | kernel=radial basis function, C=49.00 |
KNN | n_neighbors=7.00, weights=‘distance’ |
GBDT | n_estimators=291.00, random_state=0.00 |
XGBoost | booster=‘gbtree’, n_estimators=166.00, learning_rate=0.12, max_depth=5.00, colsample_bytree=0.56, gamma=0.99, reg_alpha=0.57, reg_lambda=0.91, subsample=0.90 |
AdaBoost | base_estimator=DecisionTreeRegressor, n_estimators=21.00, random_state=80.00, learning_rate=0.30 |
LightGBM | n_estimators=299.00, learning_rate=0.10, max_depth=5.00, random_state=80.00 |
VR | - |
Table 2 Hyperparameters for 10 machine learning prediction models
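Table 2 prints every numeric setting with two decimals (for example, max_depth=17.00), whereas scikit-learn-style estimators generally expect integers for count-like parameters such as n_estimators and max_depth. The sketch below shows one possible way to turn a table entry into typed keyword arguments. The helper `parse_hyperparameters` is hypothetical and is not part of the study or of any library.

```python
def parse_hyperparameters(text):
    """Parse 'key=value, key=value' into a dict with int, float or str values."""
    params = {}
    for item in text.split(","):
        key, _, raw = item.strip().partition("=")
        key = key.strip()
        # Drop straight or typographic quotes around string values.
        raw = raw.strip().strip("‘’'\"")
        try:
            number = float(raw)
        except ValueError:
            params[key] = raw  # non-numeric value, e.g. a weighting scheme
            continue
        # Whole numbers such as 17.00 become int; 0.12 stays float.
        params[key] = int(number) if number.is_integer() else number
    return params


rf_params = parse_hyperparameters(
    "n_estimators=251.00, random_state=0.00, max_depth=17.00"
)
print(rf_params)  # {'n_estimators': 251, 'random_state': 0, 'max_depth': 17}
```

The resulting dictionary could then be passed as keyword arguments to an estimator constructor. Which library versions and estimator classes the authors used is not stated in this section.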
Regression model | Training (n=2928) | | | Testing (n=1255) | | |
---|---|---|---|---|---|---|
 | R² | | RMSE | R² | | RMSE |
LR | 0.93 | 0.93 | 163.81±14.55 | 0.93 | 0.94 | 153.55±12.13 |
DT | 0.99 | 0.96 | 166.19±22.53 | 0.956 | 0.95 | 186.87±19.72 |
RF | 1.00 | 0.93 | 114.31±15.32 | 0.92 | 0.91 | 134.78±13.52 |
SVR | 0.88 | 0.86 | 228.68±32.04 | 0.87 | 0.77 | 288.76±36.38 |
KNN | 0.99 | 0.92 | 165.20±16.84 | 0.91 | 0.90 | 180.24±20.13 |
GBDT | 0.99 | 0.96 | 113.12±17.95 | 0.96 | 0.97 | 108.04±8.87 |
XGBoost | 0.99 | 0.97 | 106.03±14.25 | 0.97 | 0.97 | 107.82±14.60 |
AdaBoost | 0.99 | 0.96 | 116.03±18.33 | 0.96 | 0.94 | 143.13±18.73 |
LightGBM | 0.99 | 0.97 | 104.67±17.19 | 0.97 | 0.96 | 116.94±15.77 |
VR | 0.99 | 0.97 | 101.85±17.73 | 0.97 | 0.97 | 107.44±15.63 |
Table 3 Predictive performance of 10 machine learning models
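The performance figures reported for these models (R² and RMSE, with MAE also used in the model comparison) can be computed from observed and predicted retention indices as sketched below. This assumes the standard definitions of the three metrics; the exact formulas used in the study are not shown in this section. The function name and the RI values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE and MAE for paired observed and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual and total sums of squares.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


# Hypothetical observed and predicted retention indices.
observed = [900.0, 1000.0, 1100.0, 1200.0]
predicted = [910.0, 990.0, 1120.0, 1180.0]
metrics = regression_metrics(observed, predicted)
print({name: round(value, 2) for name, value in metrics.items()})
# {'R2': 0.98, 'RMSE': 15.81, 'MAE': 15.0}
```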
Order | Method | Stationary phases | Number | Training | | | Testing | | | Ref. |
---|---|---|---|---|---|---|---|---|---|---|
 | | | | R² | RMSE | MAE | R² | RMSE | MAE | |
1 | VR | stationary phases of five polarities | 4183 | 0.99 | 101.85 | 23.60 | 0.97 | 107.44 | 75.28 | this study |
2 | GNN | SSNP | 29518 | - | - | 11.80 | 0.99 | - | 30.92 | [ |
 | | SNP | 14033 | - | - | 23.33 | 0.99 | - | 42.41 | |
 | | SP | 7052 | - | - | 45.46 | 0.95 | - | 84.34 | |
3 | GNN | non-polar | 94183 | - | 20.69 | - | - | 57.90 | - | [ |
4 | PLR | non-polar | 90 | - | - | - | 0.99 | 17.40 | - | [ |
5 | - | strong polarity | 1179 | 0.83 | 170.90 | 124.20 | 0.90 | 132.90 | 102.70 | [ |
Table 4 Comparison of the ensemble learning prediction model with previous models
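Assuming that VR denotes a voting regressor, that is, an ensemble whose prediction is the average of its base models' predictions, the principle can be sketched in a few lines. The base models and any weights actually combined in the study are not listed in this section, so the three functions below are hypothetical stand-ins for fitted regressors, and equal weighting is an assumption.

```python
def voting_predict(models, x):
    """Average the predictions of several fitted regressors for one input."""
    predictions = [model(x) for model in models]
    return sum(predictions) / len(predictions)


# Hypothetical stand-ins for fitted base models; each maps a single
# descriptor value to a predicted retention index.
def model_a(x):
    return 100.0 * x + 600.0


def model_b(x):
    return 98.0 * x + 620.0


def model_c(x):
    return 102.0 * x + 590.0


ensemble_ri = voting_predict([model_a, model_b, model_c], 5.0)
print(round(ensemble_ri, 2))
```

Averaging tends to damp the individual errors of the base models when those errors are not strongly correlated, which is the usual motivation for this kind of ensemble.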