Abstract
Background and Aim
Coronary artery disease (CAD) remains the leading cause of morbidity and mortality worldwide, underscoring the need for early detection of CAD before myocardial infarction (MI) develops.
Materials and Methods
This cross-sectional study included 471 Iraqi participants (126 controls, 149 confirmed CAD with MI, and 196 suspected CAD without MI) assessed at cardiology departments in Baghdad. Biochemical parameters, including asymmetric dimethylarginine (ADMA), lipid profile, C-reactive protein, and cardiac troponin I, were measured. One-way analysis of variance showed significant differences in all parameters among confirmed CAD patients.
Results
Four machine learning models (logistic regression, support vector machine, random forest, and XGBoost) were applied to evaluate the detection capacity of ADMA under two clinical classes: C1, all groups (A, B, and C); and C2, groups A and C only. In C1, random forest achieved the highest overall area under the curve (AUC: 0.803), while logistic regression and support vector machine showed overfitting driven by MI. In C2, random forest (AUC: 0.822) and XGBoost (AUC: 0.781) maintained clinically relevant discriminatory power. Shapley Additive exPlanations analysis confirmed ADMA as the primary marker in early CAD. This study demonstrates that ADMA, combined with machine learning, enhances the detection of subclinical CAD and provides a more reliable risk-stratification tool prior to progression to acute coronary events.
Conclusion
ADMA showed significant differences across groups. Random forest retained its diagnostic ability in suspected cases, supporting early detection. ADMA indicated potential for early detection of CAD in the suspected group. Random forest and XGBoost demonstrated the strongest diagnostic performance for clinical decision-making.
INTRODUCTION
Coronary artery disease (CAD), also known as ischemic heart disease (IHD), continues to cause substantial morbidity and mortality worldwide. In 2023, cardiovascular diseases (CVD) claimed an estimated 19.2 million lives, making them the leading cause of death worldwide.[1] Disability-adjusted life years (DALYs) attributable to CVD reached 437 million in the same year, indicating the growing global burden.[2] According to the Institute for Health Metrics and Evaluation, age-standardized DALY rates for IHD remain among the highest globally.[3] Projections suggest that by 2050, the population with IHD may increase to more than double the 2021 level, and associated deaths and DALYs could increase by 80% and 62%, respectively.[4] In the Middle East and North Africa, data from 2021 showed particularly high rates of IHD, in both prevalence and mortality, compared with many other regions.[5] CAD begins when atherosclerotic plaque develops in the coronary arteries, impeding blood flow and eventually causing myocardial infarction (MI). Non-modifiable risk factors include age and genetics, whereas modifiable risk factors include hypertension, diabetes, smoking, and dyslipidemia. As the burden of IHD rises, early detection, improved risk assessment, and effective preventive strategies become more critical.[6] CAD also imposes a marked economic burden on healthcare systems worldwide. The expenditure includes direct costs from hospitalization, medication, and surgical procedures as well as indirect costs from productivity loss, disability, and premature mortality.
Low- and middle-income regions, including many developing countries, face additional challenges due to rapidly growing populations, changing lifestyles, and limited resources allocated to disease management.[7] These rising morbidity and mortality figures necessitate examination of how CAD develops at the biological and biochemical levels and how early identification can prevent progression to severe clinical conditions.
CAD is a multifactorial disease that begins with endothelial injury, inflammation, and the gradual accumulation of lipids, particularly low-density lipoprotein (LDL) cholesterol, which migrates beneath the endothelial cells of the coronary arteries, where it is oxidized, leading to macrophage recruitment and foam cell formation. As foam cells accumulate and deposit, the plaque grows and narrows the artery over time (stenosis), which limits myocardial oxygen supply and leads to ischemia and clinical manifestations such as angina or MI.[8] Asymmetric dimethylarginine (ADMA) reduces nitric oxide production by inhibiting nitric oxide synthase, the enzyme that generates nitric oxide from L-arginine. This decline in nitric oxide concentration leads to endothelial dysfunction and increases vascular stiffness and narrowing.[9] Preventive approaches are now recognized as central to reducing the growing burden of CAD. Lifestyle interventions, including diet modification, regular exercise, and weight management, are essential strategies. Lipid-lowering medications have been highly effective in reducing risk. Public health education programs play an important role in prevention.[10] Despite advances in clinical evaluation, early diagnosis of CAD remains a concern because symptoms may be silent or non-specific until the disease has advanced to MI, making management challenging. Traditional diagnostic tools and laboratory tests, such as electrocardiography, cardiac enzymes, angiography, and stress testing, are commonly used.[11]
Previous Related Works
Several investigations have emphasized the continuing importance of traditional lipid and metabolic parameters in the assessment of CAD. For example, Xia et al.[12] found that high-density lipoprotein cholesterol (HDL-C) levels were independently associated with the severity and extent of coronary atherosclerosis, indicating that even in the statin era, HDL-C remains a meaningful marker of CAD burden. Several studies have reported that the triglyceride (TG)/HDL-C[13-15] and LDL-C/HDL-C[16-18] ratios are robustly associated with CAD. Jiang et al.[19] reported that high levels of oxidized LDL induce metabolic stress in primary human aortic endothelial cells. Another study reported that a high level of lipoprotein (a) increases the risk of developing CAD independently and additively in individuals with or without a family history of CAD.[20] Nathir et al.[21] reported that a high level of apoprotein B increases the risk of CAD. Numerous studies have reported altered levels of proteins such as desmosine,[22] matrix metalloproteinases (MMP-1 and MMP-2),[23] osteoprotegerin,[24] and fibroblast growth factor-23.[25] An elevated level of ADMA, a methylated amino acid metabolite, inhibits nitric oxide synthesis, which leads to endothelial dysfunction.[26] Inflammatory proteins are key to the development of CAD because they drive the process of atherosclerosis, from plaque initiation to rupture. Major inflammatory proteins linked to CAD include C-reactive protein (CRP), which is a general marker of inflammation, and various cytokines such as interleukin-6 (IL-6), IL-1, and tumor necrosis factor-alpha, which play specific roles in plaque development and instability.[27, 28] These proteins are not just markers but also contribute to disease progression, making them potential targets for future therapies.[29] Having reviewed the relevant classical studies, we next discuss the newer statistical models that enable earlier diagnosis and disease prediction.
Machine learning (ML) and artificial intelligence techniques have been increasingly applied to CAD detection, risk stratification, and outcome prediction, using a wide range of clinical, biochemical, and imaging parameters. Many cross-sectional ML studies incorrectly refer to their findings as predictions, when in fact a cross-sectional diagnostic dataset can only support classification of existing disease status rather than forecasting of future outcomes. Table 1 summarizes previous studies using ML to investigate biochemical parameters in patients with CVD. Previous studies reported area under the curve (AUC) values ranging from approximately 0.68 to 0.94. Commonly used features included lipid profiles, age, and blood pressure, and these features were framed as predictors of disease rather than as inputs to diagnostic classification. These findings are nevertheless compared with those of the present study, which focuses on disease classification and diagnosis of CAD.[30-35] To avoid this misinterpretation, our study specifically focuses on early CAD classification and diagnosis and does not claim long-term prognostic prediction.
Research Gap
Although numerous studies have applied ML to CAD detection and risk prediction, several research gaps remain. Most studies addressed CVD broadly, while others focused on CAD versus non-CAD to develop ML models that distinguish between affected and healthy individuals; one exception was a study by Sun et al.,[18] which included a suspected group. However, that study employed only logistic regression, did not include other ML models, and used relatively routine biochemical parameters. Most previous models focused on limited clinical or lipid parameters and on prediction using cross-sectional datasets, and did not integrate broader biochemical and inflammatory markers that reflect the complex pathophysiology of CAD. In addition, many studies relied on large retrospective datasets to distinguish between CAD and non-CAD using routine laboratory tests described as “predictors”, included nearly healthy individuals with mild hyperlipidemia, and only one focused on suspected CAD. Thus, there is a need for a comprehensive (not predictive) classification approach based on specific CAD-related parameters, in addition to routine biochemical tests, to improve diagnostic accuracy and early risk assessment of CVD. The present work aims to investigate whether ADMA levels in the suspected CAD group can classify and/or detect CAD early. Because of its cross-sectional nature, the study is designed for diagnostic classification of existing CAD, not for prediction.
METHODS
A total of 471 individuals whose ages ranged from 41 to 72 years were included in the present work. Patients were recruited from the Departments of Cardiology at Ebn Al-Bitar Hospital and Medical City (Madinat Al-Tib) Hospital in Baghdad, Iraq. The study was conducted in accordance with the ethical standards of the Declaration of Helsinki, and approval was granted by the Al-Karkh University of Science (approval number RS2-432KUS102025, date: 05.06.2022). Participants were hospitalized patients who presented to the hospital, were referred from non-specialized clinics, or were directly admitted to the department of cardiology for further diagnosis and management from March 2022 to August 2025. The American Heart Association/American College of Cardiology (AHA/ACC) guideline[36] diagnostic criteria for CAD were adopted by the department and were as follows: diagnostic confirmation relied on ischemic symptoms, electrocardiography changes, clinical and biochemical parameters such as blood pressure, lipid panel, cTnI, blood sugar, and CRP, and computed tomography coronary angiography, with 50-90% coronary stenosis considered significant CAD and ≥90% indicating severe, MI-related disease. This protocol was performed by the staff of the cardiology department and began with blood collection before any diagnostic evaluation. The patients were divided into two CAD groups based on the cardiologist’s assessment. Group B (confirmed CAD) included patients with ischemic symptoms, elevated cardiac troponin I (cTnI), and severe coronary artery stenosis ≥90% in one or more major vessels, confirmed by invasive coronary angiography, indicating CAD with MI. Group C (suspected CAD) comprised patients with mild ischemic manifestations, angiographic stenosis of 50-90%, and normal troponin levels. The inclusion criteria were as follows: diagnosis of CAD by cardiology staff, with or without suspected MI, based on AHA/ACC criteria.
Patients aged 45 years or older were evaluated for suspected CAD and underwent coronary angiography. The exclusion criteria were 1) a history of CAD or coronary revascularization procedures, 2) chronic systemic or inflammatory diseases, 3) liver or renal failure, 4) malignancy, and 5) non-fasting status. It should be noted that patients with a prior history of CAD or coronary revascularization procedures were excluded because the aim of this study was to investigate early detection of CAD in patients without established disease. The levels of ADMA and cTnI were measured in the research laboratory at Al-Karkh University because cardiac centers do not incorporate ADMA in their routine diagnostic protocol and use only qualitative cTnI for rapid decision-making regarding treatment strategy.
Study Design
This study was a cross-sectional diagnostic investigation, with the main experimental part conducted in the cardiology department. Angiographic imaging and biochemical tests, such as the rapid cTnI test, CRP, and lipid panel, were performed in the cardiology department. Sample size was determined pragmatically based on patient availability during the study period; patients with missing data were excluded, and only complete, verified data were included. The final sample comprised 345 patients, divided into two groups: group B (n=149, confirmed CAD) included patients with symptoms of severe ischemia, stenosis >90%, and positive cTnI, reflecting MI; group C (n=196, suspected CAD) included patients with mild ischemia, stenosis of 50-89%, and negative cTnI. Group A, the control group (n=126), consisted of individuals who had no clinically documented CAD, selected from the same source population and with the same age and sex distribution as the patient groups to minimize selection bias. The controls were recruited from two sites, Al-Karkh University of Science and Al-Nokhba Clinical Laboratory Center, and consisted of individuals who came for routine checkups. Controls had no history of MI, angiographically confirmed coronary lesions, hyperlipidemia, diabetes, or hypertension. A venous blood sample was collected between 7:00 and 9:00 am after 10-14 hours of fasting. The remaining serum was stored at -20 °C for further biochemical analysis at the private clinical laboratory in Baghdad.
The reagents for each parameter, with the corresponding assay sensitivity and intra- and inter-assay precision (CV%), were as follows: ADMA (0.30 µmol/L, 10-12%) and cTnI (0.3 ng/L, 10%) were determined by ELISA (Biomatic, Ontario, Canada); cholesterol (0.1 mg/dL, 1.1%), TG (0.1 mg/dL, 1.67%), and HDL (0.3 mg/dL, 2.68%) were determined using an enzymatic colorimetric method with reagents from Agappe (Greece) on an autoanalyzer (Agappe, Mispa Chem Dx, Greece).
Statistical Analysis
All experimental results were cleaned and formatted; the data were carefully split for ML, and no data leakage was observed. The Shapiro-Wilk test was applied to assess the normality of data distribution. Statistical analysis was performed using Python version 3.12.3. Descriptive statistics are expressed as mean ± standard deviation. One-way analysis of variance (ANOVA) was used to investigate overall variance among groups (A, B, and C) and to evaluate group-level differences in biochemical parameters.
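As a minimal sketch of the one-way ANOVA step, the F statistic can be computed from the between-group and within-group sums of squares. The function name `one_way_anova_f` and the three small samples are hypothetical and are not study data; the sketch uses only the Python standard library and does not compute a p-value.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical ADMA-like values (µmol/L) for groups A, C, and B; NOT study data
group_a = [0.4, 0.5, 0.6]
group_c = [0.7, 0.8, 0.9]
group_b = [1.1, 1.2, 1.3]
f_stat, df_between, df_within = one_way_anova_f([group_a, group_c, group_b])
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A large F relative to the F distribution with the given degrees of freedom indicates that at least one group mean differs; a statistics package would normally supply the corresponding p-value.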
Bias Control
Several potential sources of bias were considered in the current study. To minimize bias, only patients who met the inclusion and exclusion criteria of the study were enrolled. The ML analysis remains susceptible to overfitting because the models were trained on the study dataset without external validation.
Machine Learning Analysis
Early diagnostic classification was performed using four ML models: logistic regression, random forest (RF), support vector machine (SVM), and XGBoost. Evaluation metrics included accuracy, sensitivity, specificity, positive predictive value, F1-score, AUC, mean absolute error, and log loss, and were used to investigate whether ADMA concentration could detect early CAD across different groups and to compare the performance of the models. To assess internal validity and reduce performance variability arising from a single train-test split, k-fold cross-validation was used for model evaluation. To ensure the clinical reliability of the evaluation and to reduce the effect of MI on model outcomes, ML analysis was performed in two classes. In the first class (C1), all study groups (A, B, and C) were included to assess the diagnostic ability of ADMA. In the second class (C2), the analysis was restricted to groups A and C, excluding patients with MI and severe stenosis. This approach allowed us to evaluate the performance of the models under more subtle clinical conditions in which differences in ADMA levels are less well defined and early CAD detection is critical. Decision curves and net benefits were used to display the models’ behavior. In the present study, Shapley Additive exPlanations (SHAP) analysis identified ADMA and CRP as the most influential features in the early-stage C2 model, highlighting their importance in detecting subclinical CAD. Specifically, high ADMA values (Figure 1) were associated with a positive SHAP contribution of approximately 0.35, indicating a strong influence on the model; CRP showed a SHAP contribution reaching approximately 0.3. TG, total cholesterol (TC), and cTnI had lower SHAP contributions at this stage, generally below 0.2.
These findings are clinically meaningful, because they reflect the pathophysiological sequence in which ADMA, an indicator of endothelial dysfunction, and CRP act as early measurable signals before substantial lipid derangements progress to MI. Notably, over 60% of the high-risk classifications in C2 were driven primarily by elevated ADMA and CRP, underscoring the importance of these markers in early diagnosis and potential preventive interventions. Figure 1 summarizes the methodological steps evaluated in the present work.
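The threshold-based metrics listed above all derive from the four cells of a confusion matrix. The following is a minimal sketch, assuming a binary CAD-positive/CAD-negative labeling; the function name and the counts are hypothetical, chosen only so that sensitivity and specificity equal the 75% and 64% reported for RF in C2. They are not the study's actual confusion matrix.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common diagnostic metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # true positive rate (recall)
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    ppv = tp / (tp + fp) if (tp + fp) else 0.0           # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = ppv + sensitivity
    f1 = 2 * ppv * sensitivity / denom if denom else 0.0  # harmonic mean of PPV and recall
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "accuracy": accuracy,
        "f1": f1,
    }

# Hypothetical test-set counts (illustrative only)
metrics = classification_metrics(tp=30, fp=9, tn=16, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUC, log loss, and mean absolute error are computed from predicted probabilities rather than from thresholded labels, so they are not covered by this sketch.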
RESULTS
Based on the cross-sectional nature of the current study, the results reflect the distribution of confirmed CAD cases (group B), suspected CAD cases (group C), and controls among Iraqi participants from March 2022 to Aug 2025. A total of 471 blood samples were collected in the present study from n=277 (58.8%) females and n=194 (41.2%) males. Participants were divided into three groups as presented in Table 2.
Biochemical parameters, including ADMA, cTnI, CRP, TC, TG, and HDL-C, were evaluated in all study groups and are presented in Table 3.
The results indicated significant variations in the study groups compared with the control. A significant difference was also observed between groups B and C. The levels of cTnI, ADMA, and CRP were significantly elevated in group B compared to group C, as a direct effect of MI on these parameters.
The performance metrics of the ML models were evaluated for class 1 (C1) to assess classification performance and provide a clear understanding of their effectiveness, as shown in Table 4.
Comparison of the performance of the four ML models reveals notable variability in their diagnostic capabilities. RF demonstrated the best overall balance, achieving the highest accuracy. Its strong sensitivity combined with moderate specificity results in reliable detection of both CAD-positive and CAD-negative subjects, while the specificity of XGBoost shows a bias toward false positives. Logistic regression and SVM exhibited signs of overfitting, reflected in their high sensitivity and low specificity. As discussed above and presented in Table 4, most models exhibited overfitting or values approaching overfitting and were unable to distinguish between negative and positive cases in the C1 model analysis. To assess internal validity and to establish that performance was not dependent on a single train-test split, k-fold cross-validation was performed, as presented in Table 5.
As shown in Table 5, the 5-fold cross-validation results were consistent with the independent hold-out evaluation, indicating that RF and XGBoost exhibited the strongest overall classification ability, with AUC values of 0.854 and 0.879, respectively. Some variation in individual metrics was observed. The stability of the model confirms that the selected ensemble methods exhibit good generalizability and a reduced tendency to overfit.
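To illustrate how 5-fold cross-validation partitions the data, the sketch below assigns each sample to one of five folds while keeping the class ratio roughly constant in each fold. Stratification, the random seed, and the function name are assumptions made for illustration; the text does not state whether the folds were stratified. The class sizes are the group A and group C counts used for C2.

```python
import random

def stratified_kfold_indices(labels, k=5, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices of each class round-robin across the folds
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds

# 126 controls (label 0) and 196 suspected CAD (label 1), as in C2
labels = [0] * 126 + [1] * 196
folds = stratified_kfold_indices(labels, k=5)

for i, test_idx in enumerate(folds):
    n_pos = sum(labels[j] for j in test_idx)
    print(f"fold {i}: n={len(test_idx)}, positives={n_pos}")
```

Each fold serves once as the test set while the remaining four are used for training, and the reported metric is typically the mean across the five runs.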
The C1 evaluation allowed exploration of the ability of ADMA to distinguish individuals with any form of CAD from healthy participants. Markedly elevated ADMA, reflecting the acute myocardial injury present in group B patients, caused a very strong class separation, which likely contributed to overfitting of some models above. This MI-driven class produced relatively imbalanced metric values across several models, particularly in those demonstrating close-to-perfect sensitivity but poor specificity. Consequently, the high-performance C1 metrics reflect the ease of differentiating advanced MI-CAD rather than true diagnostic discrimination across disease stages, thereby restricting direct clinical interpretability for early CAD classification and diagnosis.
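The pattern of near-perfect sensitivity with poor specificity can be made concrete with a short calculation. A classifier that labels every subject as CAD-positive reaches 100% sensitivity and 0% specificity, and its accuracy simply equals the proportion of CAD cases. The sketch below applies this to the whole-cohort group sizes, which is an assumption; the study's metrics were computed on its own test split.

```python
# Group sizes reported for the cohort
n_cad = 149 + 196      # groups B and C
n_control = 126        # group A

# A degenerate classifier that predicts "CAD" for everyone
tp, fn = n_cad, 0
fp, tn = n_control, 0

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (n_cad + n_control)

print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}, accuracy={accuracy:.3f}")
```

Accuracy above 70% is therefore attainable in C1 without any real discrimination, which is why specificity and AUC are more informative than accuracy alone in this setting.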
After analyzing the C1 models, we investigated the C2 models, which do not include the group B patients. Table 6 presents the performance of the four ML models for groups A and C. The results indicate that all models retained a clinically meaningful detection capability, though with notable variability in performance.
RF showed the highest discrimination, with an AUC of 0.822. XGBoost followed closely, with an AUC of 0.781, the highest among models with balanced sensitivity and specificity. Logistic regression and SVM demonstrated weaker discrimination. Sensitivity remained high across most models (0.75-1.00), whereas specificity was low (0.36-0.64), indicating overfitting for most models except XGBoost and RF, which recorded acceptable values of 0.72 and 0.64, respectively. Comparison of the C1 and C2 models’ performances indicated that the C2 models exhibited a slight decrease in accuracy, dropping by 4-12% across models, and a reduction in sensitivity, especially in RF and XGBoost. Specificity increased markedly, reaching 1.00 in the SVM model, showing a better balance between true positive and true negative detection. Across models, AUC ranged from 0.657 to 0.822, demonstrating that ADMA has diagnostic value for non-MI CAD, particularly under conditions in which biochemical abnormalities are more subtle.
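For readers less familiar with AUC, it can be interpreted as the probability that a randomly chosen CAD-positive subject receives a higher model score than a randomly chosen control, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name and the four scores are hypothetical and are not study data.

```python
def auc_from_scores(y_true, y_score):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted probabilities (illustrative only)
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = auc_from_scores(y_true, y_score)
print(f"AUC = {auc:.2f}")
```

Because AUC depends only on the ranking of scores, it is unaffected by the choice of decision threshold, unlike sensitivity and specificity.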
To verify the models’ generalizability for early detection, internal validation was performed using k-fold cross-validation. As shown in Table 7, RF and XGBoost maintained the strongest overall performance, with AUC values of 0.768 and 0.767, respectively, consistent with the independent hold-out test. Despite variations in sensitivity and specificity, the model ranking remained unchanged, indicating the robustness of these methods for early detection of CAD. Cross-validation results, therefore, support the validity of the hold-out analyses, whereas calibration and decision-curve assessments were based exclusively on the final detection test for clinical interpretability. Further clarification was obtained using calibration-curve analysis to assess how well the probabilities estimated by each C1 model agree with the observed outcomes, as illustrated in Figure 2. Logistic regression and SVM models exhibit notably poor calibration, as shown by their calibration curves. RF shows improved calibration, with deviation above the reference line suggesting overfitting. Its high accuracy combined with moderate specificity indicates that it distinguishes disease severity more effectively; however, it tends to misclassify lower-risk individuals (group C) as higher risk. Among the tested models, XGBoost is closest to the perfect calibration line, especially at mid- to high detection probabilities. XGBoost provides the most accurate risk estimates for groups B and C while still partially separating groups B and C from healthy controls.
The calibration curves comparing the models in class 2 highlight important differences in their behavior, as presented in Figure 3. RF shows better alignment, but still deviates at the upper end, indicating slight overfitting in suspected cases. Conversely, XGBoost aligns most closely with the diagonal reference, especially in mid- to high-probability ranges, indicating remarkable calibration and more clinically reliable risk estimates for suspected CAD, compared with controls.
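A calibration curve of this kind is obtained by grouping predicted probabilities into bins and comparing the mean predicted probability in each bin with the observed fraction of positive cases. The following is a minimal sketch with equal-width bins; the function name, the number of bins, and the four data points are assumptions for illustration only.

```python
def calibration_bins(y_true, y_prob, n_bins=5):
    """Return (mean predicted prob, observed positive rate, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        # Clamp p == 1.0 into the last bin
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    curve = []
    for b in bins:
        if b:
            mean_pred = sum(p for p, _ in b) / len(b)
            obs_rate = sum(y for _, y in b) / len(b)
            curve.append((mean_pred, obs_rate, len(b)))
    return curve

# Hypothetical predictions (illustrative only)
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.1, 0.9, 0.9]
curve = calibration_bins(y_true, y_prob, n_bins=2)
for mean_pred, obs_rate, count in curve:
    print(f"predicted={mean_pred:.2f}, observed={obs_rate:.2f}, n={count}")
```

Points on the diagonal (predicted equal to observed) indicate good calibration; points below it indicate overestimated risk, and points above it indicate underestimated risk.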
Table 8 compares the performance for both classes (C1 and C2); RF achieved the highest diagnostic performance in the C1 class, with accuracy and specificity of 81.1% and 60%, respectively, and also showed reliable performance in the C2 class. XGBoost also showed strong generalizability across both classes, but had slightly lower specificity and accuracy than RF in C2. Conversely, logistic regression and SVM exhibited an improvement in C2 of about 36%. Overall, RF provides the most robust and clinically meaningful classification, especially for multi-group CAD stratification. Based on class 2, all models showed improved specificity, resulting in a more balanced classification. RF demonstrated improved and clinically useful performance with a specificity of 64%, confirming its robustness across severity levels. XGBoost also remained strong, with improved specificity (52%), indicating stable generalization. Table 8 summarizes the key metrics that change between C1 and C2.
As demonstrated by the decision curve analysis (Figure 4), excluding group B in class 2 resulted in a noticeable shift in the clinical utility of the models. In C1, all models recorded a higher overall net benefit across clinically relevant threshold probabilities, particularly at pt=0.01-0.03. This superior performance is expected, as group B patients already exhibit advanced and easily distinguishable disease characteristics, which contribute to a clearer separation between positive and negative cases (Figure 4a). However, in C2, as illustrated in Figure 4b, the net benefit curves shifted to approximately 0.61 at the same thresholds, representing an estimated 16% reduction. The class 2 evaluation provides a more clinically applicable and functional assessment of diagnostic performance in early CAD detection, where biomarker levels and symptoms are less definitive and more challenging to classify.
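In decision curve analysis, net benefit at a threshold probability pt is the true-positive rate per subject minus the false-positive rate per subject weighted by pt/(1-pt). As a hedged sketch, the code below evaluates the reference "treat all" strategy at pt=0.02 using whole-cohort group sizes; this is an assumption made for illustration and does not reproduce the model curves in Figure 4, which are computed on the study's test data.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit = TP/n - (FP/n) * pt / (1 - pt)."""
    weight = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * weight

pt = 0.02  # a threshold inside the 0.01-0.03 range discussed above

# "Treat all" strategy: every CAD case is a true positive, every control a false positive
nb_c1 = net_benefit(tp=149 + 196, fp=126, n=471, threshold=pt)   # groups A, B, C
nb_c2 = net_benefit(tp=196, fp=126, n=126 + 196, threshold=pt)   # groups A and C

relative_drop = (nb_c1 - nb_c2) / nb_c1
print(f"C1: {nb_c1:.3f}, C2: {nb_c2:.3f}, relative drop: {relative_drop:.1%}")
```

At very low thresholds the penalty for false positives is small, so net benefit is bounded by, and tends toward, disease prevalence; a model adds clinical value where its curve lies above both the "treat all" and "treat none" references.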
Collectively, these findings indicate that although the inclusion of confirmed CAD improves model performance metrics, its exclusion yields a more meaningful appraisal of diagnostic capability in pre-event disease states. Accordingly, RF appears to be the most reliable approach for identifying early high risk of CAD, as evidenced by its stability and consistently positive net benefit under both modeling conditions. A shift in feature contributions is observed when comparing the SHAP results of the C1 and C2 models, as shown in Figure 5. In Figure 5a, TG shows the highest mean SHAP contribution of 0.131, followed by TC (0.114). These results indicate that in C1 cases, hyperlipidemia plays the strongest role in distinguishing disease from non-disease states. Troponin (cTnI) remains an important marker (0.106), consistent with myocardial injury in these subjects. ADMA, while elevated (0.108), appears to be masked by other more dominant risk markers in severe conditions. CRP exhibits the lowest influence (0.004). In Figure 5b, with severe MI cases removed and the comparison limited to early CAD versus healthy controls, the importance profile changes significantly. ADMA was the most influential feature (0.134), followed closely by CRP (0.131), indicating that endothelial dysfunction and inflammation provide stronger detection cues in early stages of CAD. cTnI (0.091) and TC (0.097) show reduced but still notable contributions, while TG drops sharply to 0.010, a >90% reduction compared to Figure 5a. HDL-C remains a consistently minor, yet protective, factor in both models.
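The shift in feature importance between the two models can be checked with simple arithmetic on the mean SHAP contributions quoted above. The sketch below ranks the features in each model and computes the relative change from C1 to C2; the dictionary names are illustrative, and HDL-C is omitted because no value is given for it in the text.

```python
# Mean SHAP contributions as quoted in the text (Figure 5a and 5b)
shap_c1 = {"TG": 0.131, "TC": 0.114, "ADMA": 0.108, "cTnI": 0.106, "CRP": 0.004}
shap_c2 = {"ADMA": 0.134, "CRP": 0.131, "TC": 0.097, "cTnI": 0.091, "TG": 0.010}

# Rank features from most to least influential in each model
ranking_c1 = sorted(shap_c1, key=shap_c1.get, reverse=True)
ranking_c2 = sorted(shap_c2, key=shap_c2.get, reverse=True)

# Relative change (%) in contribution when moving from C1 to C2
change_pct = {
    feature: (shap_c2[feature] - shap_c1[feature]) / shap_c1[feature] * 100
    for feature in shap_c1
}

print("C1 ranking:", ranking_c1)
print("C2 ranking:", ranking_c2)
for feature, pct in change_pct.items():
    print(f"{feature}: {pct:+.1f}%")
```

The TG contribution falls by roughly 92%, in line with the >90% reduction stated above, while ADMA moves from third to first place.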
DISCUSSION
CAD remains a major global health concern, and early detection before the onset of MI is essential to improve clinical outcomes. This study evaluates the diagnostic performance of ADMA and conventional biochemical parameters and explores advanced ML to enhance CAD risk stratification. Accordingly, the discussion will interpret these results in the context of current evidence, emphasizing their implications for early CAD diagnosis and clinical applicability.
In the present cohort, one-way ANOVA revealed significant differences in most biochemical parameters, with patients in group B showing higher levels of TC, TG, ADMA, CRP, and cTnI, and lower HDL-C, compared with those in the suspected CAD and control groups, supporting a graded worsening of cardiometabolic risk across the spectrum of disease. This pattern is consistent with recent evidence that serum ADMA rises in parallel with the extent and severity of coronary atherosclerosis and may serve as a marker of atherosclerotic burden and adverse outcomes in CAD patients.[37] Likewise, our finding of a less favorable lipid profile in the study groups agrees with contemporary studies showing that elevated TG-rich lipoproteins and reduced HDL-C are independently associated with a greater prevalence and extent of CAD, and higher residual cardiovascular risk despite standard therapy.[38] These ANOVA results indicate that the significant biochemical differences between groups in the present study reflect well-established pathophysiological gradients of dyslipidemia, endothelial dysfunction, and inflammation in CAD compared with controls. Also, differences between groups B and C were observed as a clear MI-driven effect that caused a significant elevation of some biochemical parameters. Since the previous ML work was also cross-sectional but reported prediction results, their findings must be interpreted as diagnostic classification rather than as evidence of prediction from a cross-sectional study. Therefore, the comparison is based on classification performance.
In the present work, ML models indicated differential performance in two distinct classifications, C1 and C2. For the C1, the RF model achieved the most balanced outcomes, with an accuracy of 81.1%, sensitivity of 88.6%, specificity of 60%, and an AUC of 0.803, indicating moderate robustness. However, logistic regression and SVM showed 100% sensitivity and 0% specificity, meaning that all subjects were classified as CAD-positive. This behavior indicates overfitting to the severe CAD signal driven by group B, in which markedly elevated AMDA provided a strong separation, leading to the complete misclassification of healthy individuals. Analysis was restricted to the first class (C1), indicating early or subtle disease, RF maintained strong discrimination with an AUC of 0.822 and reduced overall accuracy (70.8%) and enhanced specificity through models, explaining that earlier inflated performance metrics were triggered by class imbalance. While XGBoost showed slightly higher specificity, better performance in k-fold AUC, and closer calibration, RF was favored due to its higher sensitivity, which is critical for early diagnostic classification and for minimizing missed CAD cases.
By comparison, many prior studies achieved higher AUCs in broader CAD versus non-CAD settings, but did not separate MI from non-MI populations. For example, Zhang et al.[30]conducted a cohort study of 1,647 participants (CAD vs. healthy) and used 6 machine-learning models [k-nearest neighbors, logistic regression (LR), SVM, decision tree, multilayer perceptron, and XGBoost] to predict CAD. They found that the XGBoost algorithm recorded the best score by achieving an AUC of 0.94 using lipid panels and demographic data.[30] Cheng et al.[31] by using a set data of 9,640 adults and using three ML models (Gradient Boosting, RF, and LR) reported an AUC of 0.846 with gradient boosting using routine anthropometric, clinical and physiological parameters (age, sex, body mass index, systolic blood pressure/diastolic blood pressure, liver function test, renal function test, lipid panel, smoking, and alcohol consumption). Bibi et al.[32] investigated 3,316 individuals using a gradient boosting model reported an AUC of 0.910 for overall CVD and 0.841 for CAD subgroups using lipid profile and troponin data. Another cohort study of 7,260 subjects reported an accuracy of 0.871 using XGBoost and ranked LDL‐C, TG, and blood pressure as the top features.[33] None of these prior studies explicitly isolated an MI subgroup and a non-MI suspected CAD group, nor did any study report that CAD patients with current MI were excluded from the present work. As observed in the present work and as it well documented that CAD driven MI strongly affects the levels of TC, TG, HDL-C, and cTnl[39, 40] that act on strong class separation between control and CAD groups which may affect the results of ML models. The present work addresses a more refined clinical question regarding screening for early CAD before MI.
In the C2 models, classification becomes substantially more challenging because biomarker abnormalities are less pronounced and MI is absent. Despite this, RF maintained the most reliable diagnostic performance overall, with the highest AUC (0.822) and well-balanced sensitivity and specificity of 75% and 64%, respectively, suggesting that it can identify biochemical risk signatures associated with early CAD development. XGBoost demonstrated slightly higher specificity (72%), which is advantageous for reducing unnecessary interventions. Few previous studies have evaluated early CAD separately from MI, but recent reports indicate that ML tends to perform less effectively in mild disease because of weaker biomarker contrast. For example, Kim et al.[41] reported AUCs in the range of 0.68-0.78 and an accuracy of 74.6% in suspected CAD populations when relying on lipid ratios and carotid intima-media thickness alone. These findings support the notion that our C2 performance values reflect a real-world diagnostic challenge rather than model limitations, reinforcing the promise of ML for CAD detection before high-risk features, such as elevated troponin, appear. In contrast to models that perform well only when MI is present, our results demonstrate that early CAD detection is feasible before irreversible cardiac damage occurs and support the clinical value of ML as a decision-support tool in screening contexts. Although few prior studies have isolated non-MI CAD populations for diagnostic purposes, such analyses provide a more realistic assessment of early diagnostic utility, emphasizing the importance of distinguishing risk from overt disease.
It should be clarified that although XGBoost showed slightly higher specificity than RF (72% vs. 64%), a higher k-fold AUC (0.879 vs. 0.854), and closer calibration to the ideal line in certain metrics, RF was selected as the primary classifier because of its higher sensitivity. In early CAD screening, sensitivity is critical to minimize the risk of missing true cases, which could have serious clinical consequences. Therefore, despite XGBoost’s advantages in specificity and calibration, RF provides a better balance for identifying high-risk patients, aligning with the study’s objective of early identification. Future studies could investigate ensemble strategies combining the models to optimize sensitivity and specificity.
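One possible form of such an ensemble strategy is soft voting, in which the predicted probabilities of two models are averaged and a decision threshold is then chosen to maximize specificity subject to a minimum sensitivity. The sketch below is illustrative only and was not part of the present analysis: the probabilities, the equal weighting, the sensitivity floor, and all function names are hypothetical assumptions.

```python
# Minimal sketch of soft voting with a sensitivity-constrained threshold.
# All probabilities are HYPOTHETICAL; they are not outputs of the study models.

def soft_vote(prob_a, prob_b, weight_a=0.5):
    """Weighted average of two models' predicted CAD probabilities."""
    return [weight_a * a + (1 - weight_a) * b for a, b in zip(prob_a, prob_b)]

def sens_spec(y_true, probs, threshold):
    """Sensitivity and specificity when predicting CAD if prob >= threshold."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def pick_threshold(y_true, probs, min_sensitivity):
    """Return (threshold, sensitivity, specificity) with the highest
    specificity among thresholds meeting the sensitivity floor, or None."""
    best = None
    for thr in sorted(set(probs)):
        sens, spec = sens_spec(y_true, probs, thr)
        if sens >= min_sensitivity and (best is None or spec > best[2]):
            best = (thr, sens, spec)
    return best

# Hypothetical labels (1 = suspected CAD, 0 = control) and model outputs.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
rf_prob = [0.9, 0.7, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
xgb_prob = [0.8, 0.9, 0.5, 0.6, 0.5, 0.4, 0.2, 0.1]

ensemble_prob = soft_vote(rf_prob, xgb_prob)
best = pick_threshold(y_true, ensemble_prob, min_sensitivity=1.0)
print(best)
```

Fixing the sensitivity floor first and then maximizing specificity mirrors the screening priority described above; in practice, the threshold would need to be chosen on training or validation data and confirmed on an independent set.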
Study Limitations
The present work has several limitations. First, the cross-sectional design limits the ability to infer causal relationships between ADMA levels and the presence of CAD. Second, the sample size remains modest, which may limit the statistical power and generalizability of the findings. Third, despite multiple ML models being evaluated, they were trained and evaluated on data from only two cohorts in Baghdad and lacked external validation. The absence of external validation may restrict the evaluation of model robustness and increase the risk of overfitting, thereby reducing confidence in the models’ performance when applied to independent populations.
CONCLUSION
This study demonstrates the clinical potential of ADMA and ML-based analytics for improving the early detection of CAD prior to the onset of MI. The findings emphasize the importance of biomarkers of endothelial dysfunction in the early detection of CAD and support the integration of predictive modeling into preventive cardiology strategies.
Significant elevations of ADMA correlate with the severity of CAD leading to MI.
ML models confirmed that ADMA independently contributes to CAD detection, especially in the absence of myocardial infarction.
RF and XGBoost demonstrated the strongest diagnostic balance, indicating their suitability as clinical decision support tools.
SHAP analysis highlighted endothelial dysfunction and inflammation as the leading features in early CAD, before overt cardiac injury (troponin rise).
Excluding MI cases provided a more realistic early-screening scenario, showing the clinical feasibility of detecting CAD early, prior to irreversible cardiac damage.


