Enhancing endometriosis classification: feature selection strategy with Fourier transform infrared spectroscopy and high-performance liquid chromatography of blood plasma data
SANOOP PAVITHRAN M1, REKHA UPADHYA2, ANJALI M3 AND SANTHOSH CHIDANGIL1
1Manipal Institute of Applied Physics, Manipal Academy of Higher Education, Manipal, Karnataka, India
2Department of Obstetrics & Gynaecology, Kasturba Medical College, Manipal, Manipal Academy of Higher Education, Karnataka, India
3Department of Reproductive Medicine, Kasturba Medical College, Manipal, Manipal Academy of Higher Education, Karnataka, India
Abstract
Endometriosis is a highly prevalent, non-cancerous gynecological disorder linked to infertility and pelvic pain [1]. It involves the growth of endometrial tissue outside the uterus, often on the ovaries, fallopian tubes, and pelvic lining [2]. Approximately 10% of women of reproductive age are affected, experiencing symptoms such as chronic pelvic pain, irregular bleeding, heavy menstrual flow, painful intercourse, and fertility issues. Although the disorder affects 176 million women worldwide, diagnosis often takes several years [3].
In the present work, ATR-FTIR and HPLC data of blood plasma were used to evaluate their diagnostic potential for endometriosis. The FTIR spectra and chromatograms of blood plasma samples from 40 control subjects and 40 endometriosis patients were obtained using a JASCO FTIR spectrometer (ATR PRO-1 module) and a ThermoFisher UltiMate 3000 HPLC system, respectively, in collaboration with the Department of Obstetrics and Gynaecology and the Department of Reproductive Medicine, KMC Manipal. Feature (band) selection is used to eliminate bands that carry no significant information. It mitigates the risk of model overfitting, which in turn enhances the model's ability to identify blind samples [4]. The present work explores wrapper and embedded methods, namely GA (genetic algorithm), the VIP (variable importance in projection) based PLS-DA algorithm, and the RF (random forest) algorithm. The selected data were used for machine-learning-based classification with SVM (support vector machine), KNN (k-nearest neighbors) and NN (neural network) models. Figures 1 and 2 and Table 1 show the feature selection process, and Tables 2 and 3 show the model evaluation on the features commonly selected by the different feature selection methods for the ATR-FTIR and HPLC data, respectively. As shown in Table 3, the selection algorithms reduced the full-range data to a few significant variables, and the KNN model distinguished endometriosis from control with a classification accuracy of 93.75% and a sensitivity of 92.85%, as shown in Figure 3. The area under the ROC curve is 0.99 (Figure 4), which is excellent for a diagnostic model [5].
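The select-then-classify workflow described in the abstract can be illustrated with a minimal, standard-library-only Python sketch. Everything in it is an assumption made for illustration: the data are synthetic Gaussian noise rather than plasma spectra or chromatograms, a simple filter score (absolute class-mean difference over pooled standard deviation) stands in for the GA, VIP and RF selectors, and the function names, the 32/8 per-class split and k = 3 are hypothetical choices, not the settings used in the study.

```python
import math
import random
import statistics
from collections import Counter


def make_synthetic(n_per_class=40, n_features=20, informative=(3, 7),
                   shift=5.0, seed=0):
    """Hypothetical stand-in for spectral data: unit Gaussian noise, with
    the 'informative' columns shifted upward for the positive class."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            if label == 1:
                for j in informative:
                    row[j] += shift
            X.append(row)
            y.append(label)
    return X, y


def rank_features(X, y, n_keep):
    """Filter-style selection (a simplification, not GA/VIP/RF):
    score each column by |mean difference| / pooled standard deviation."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        pooled = math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2)
        scores.append((abs(statistics.mean(a) - statistics.mean(b)) / pooled, j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:n_keep])


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training samples (Euclidean)."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def evaluate(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (true-positive rate) and specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


X, y = make_synthetic()

# Stratified hold-out: first 32 samples of each class train, last 8 test.
train_idx, test_idx = [], []
for label in (0, 1):
    idx = [i for i, lab in enumerate(y) if lab == label]
    train_idx += idx[:32]
    test_idx += idx[32:]

# Select features on the training split only, to avoid information leakage.
selected = rank_features([X[i] for i in train_idx],
                         [y[i] for i in train_idx], n_keep=2)


def project(i):
    """Keep only the selected columns of sample i."""
    return [X[i][j] for j in selected]


train_X = [project(i) for i in train_idx]
train_y = [y[i] for i in train_idx]
pred = [knn_predict(train_X, train_y, project(i)) for i in test_idx]
metrics = evaluate([y[i] for i in test_idx], pred)
print("selected columns:", selected)
print(metrics)
```

Ranking the features on the training split alone keeps the held-out samples blind, which is the condition under which the reported accuracy and sensitivity are meaningful. With real data, the filter score would be replaced by the selectors named in the abstract and the hold-out split by whatever validation scheme the study used.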
Speaker
Sanoop Pavithran M
Manipal Institute of Applied Physics
India