SARATOV FALL MEETING SFM 

© 2026 All Rights Reserved

Refining Interpretability Accuracy in Machine Learning Models for Infrared Spectroscopy

I.S. Golyak¹, A.S. Safayan¹
¹Bauman Moscow State Technical University, Moscow, Russia

Abstract

This paper presents a noninvasive method for diagnosing diabetes mellitus, asthma, and pneumonia from exhaled breath. A tunable quantum cascade laser (QCL) operating in the mid-infrared range (5.3–12.8 µm) was coupled to a multipass Herriott gas cell (optical path length: 76 m) to acquire high-resolution absorption spectra of volatile organic compounds (VOCs) in exhaled breath from 165 participants. The dataset, initially imbalanced across disease classes, was augmented with the synthetic minority oversampling technique (SMOTE) to improve classifier training. Using SHAP (SHapley Additive exPlanations) analysis and permutation feature importance, we identified critical spectral bands (e.g., 8.1–8.3 µm for acetone in diabetes, 8.6–8.8 µm for nitric oxide in asthma) that correlate strongly with disease-specific biomarkers. After augmentation and feature selection, the optimized logistic regression and support vector machine (SVM) models achieved a mean accuracy improvement of 7%, demonstrating enhanced diagnostic capability through interpretable machine learning.
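As a rough illustration of two steps named in the abstract, the sketch below oversamples a minority class by SMOTE-style interpolation and then ranks features by permutation importance. It is not the authors' pipeline. The two-feature toy data, the nearest-centroid classifier (a stand-in for logistic regression and SVM), and every function name are assumptions made for this example, and SHAP is not covered. The code uses only the Python standard library; in practice, packages such as imbalanced-learn and scikit-learn provide maintained implementations of both techniques.

```python
import math
import random


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic samples, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


def fit_centroids(X, y):
    """Nearest-centroid classifier (hypothetical stand-in model)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    return min(centroids, key=lambda c: math.dist(centroids[c], x))


def accuracy(centroids, X, y):
    return sum(predict(centroids, x) == t for x, t in zip(X, y)) / len(y)


def permutation_importance(centroids, X, y, n_repeats=20, rng=None):
    """Mean drop in accuracy when one feature column is shuffled."""
    rng = rng or random.Random(0)
    baseline = accuracy(centroids, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [x[j] for x in X]
            rng.shuffle(column)
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, column)]
            drops.append(baseline - accuracy(centroids, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data (assumed): feature 0 separates the classes, feature 1 is noise.
rng = random.Random(42)
majority = [[rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.5)] for _ in range(30)]
minority = [[rng.gauss(3.0, 0.3), rng.gauss(0.0, 0.5)] for _ in range(6)]

# Balance the classes: 6 real + 24 synthetic minority samples.
synthetic = smote(minority, n_new=24, k=3, rng=rng)
X = majority + minority + synthetic
y = [0] * len(majority) + [1] * (len(minority) + len(synthetic))

model = fit_centroids(X, y)
# For brevity the importance is scored on the training data itself;
# a held-out set would normally be used.
importances = permutation_importance(model, X, y, rng=rng)
print(importances)
```

In this toy setting the informative feature should show a much larger accuracy drop than the noise feature. The same logic would apply to spectral channels, where a band whose shuffling degrades accuracy is a candidate biomarker region.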

Speaker

Igor Golyak
Bauman Moscow State Technical University
Russia
