SARATOV FALL MEETING SFM 

© 2026 All Rights Reserved

Informative feature selection for machine learning predictive models based on THz and IR data

Denis A. Vrazhnov¹, Alexey V. Borisov¹, Viktor V. Nikolaev¹, Georgy K. Raspopin¹, Didar R. Makashev¹, Yuri V. Kistenev¹; ¹Tomsk State University, Tomsk, Russia

Abstract

Constructing explainable predictive machine learning models for THz and IR spectral data plays a pivotal role in biomedical and physical tasks, such as identifying molecular biomarkers of diseases and detecting the presence of specific substances. The main problems are small sample size, high correlation between data at different frequencies, and high data dimensionality. Existing methods trade off the generalization ability of predictive models, computational performance, and quality metrics. In addition, different methods may point to different informative features, so external validation procedures are needed.
The results of a comparative study of various feature selection techniques applied to high-dimensional spectral data are presented. Because of the small sample size problem, special attention is paid to methods that produce predictive models with good generalization ability, such as Support Vector Machines. Possible solutions and trends for artificial neural networks are also presented.
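As an illustration of why different selection methods may disagree on a small, correlated dataset, the sketch below ranks the channels of a synthetic two-class "spectrum" with two univariate criteria and measures how much their top-k selections overlap. It is a minimal example under stated assumptions and not the procedure used in the study: the data generator, the two scoring functions (Welch t-statistic and a rank-based AUC score), the choice k = 5, and all function names are hypothetical.

```python
import random
import statistics


def make_spectra(n_samples=20, n_channels=60, informative=(10, 11, 12), seed=0):
    """Synthetic two-class data (assumption): few samples, many correlated channels."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        baseline = rng.gauss(0.0, 1.0)  # shared offset makes channels correlated
        row = [baseline + rng.gauss(0.0, 0.3) for _ in range(n_channels)]
        for j in informative:
            row[j] += 1.5 * label  # class-dependent shift on a few channels
        X.append(row)
        y.append(label)
    return X, y


def split_by_class(X, y, j):
    """Values of channel j for class 0 and class 1."""
    a = [row[j] for row, lab in zip(X, y) if lab == 0]
    b = [row[j] for row, lab in zip(X, y) if lab == 1]
    return a, b


def welch_t_score(X, y, j):
    """Absolute Welch t-statistic of channel j between the two classes."""
    a, b = split_by_class(X, y, j)
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return abs(statistics.mean(b) - statistics.mean(a)) / se


def auc_score(X, y, j):
    """Rank-based score: distance of the single-channel AUC from chance (0.5)."""
    a, b = split_by_class(X, y, j)
    wins = sum((vb > va) + 0.5 * (vb == va) for va in a for vb in b)
    return abs(wins / (len(a) * len(b)) - 0.5)


def top_k(score_fn, X, y, k):
    """Indices of the k highest-scoring channels."""
    ranked = sorted(range(len(X[0])), key=lambda j: score_fn(X, y, j), reverse=True)
    return set(ranked[:k])


def jaccard(s1, s2):
    """Overlap of two selections: 1.0 means identical, 0.0 means disjoint."""
    return len(s1 & s2) / len(s1 | s2)


X, y = make_spectra()
sel_t = top_k(welch_t_score, X, y, k=5)
sel_auc = top_k(auc_score, X, y, k=5)
overlap = jaccard(sel_t, sel_auc)
print(sorted(sel_t), sorted(sel_auc), round(overlap, 2))
```

An overlap well below 1.0 would indicate that the two criteria point to different channels, which is the situation where an external validation step becomes necessary.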
This research was funded by the Ministry of Science and Higher Education of the Russian Federation, grant number 075-15-2024-557, dated 04/25/2024.

Speaker

Denis A. Vrazhnov
Tomsk State University
Russia
