Background: Early recognition of sepsis remains a major clinical challenge. Conventional clinical scoring systems such as qSOFA, SOFA, SIRS, and NEWS have limited sensitivity for early detection. Machine learning (ML)–based early warning systems have been increasingly developed to leverage electronic health record data for earlier and more accurate sepsis detection. Objectives: To systematically evaluate and quantitatively synthesize the diagnostic performance of ML-based early warning systems for early sepsis prediction in hospitalized adult patients. Methods: A systematic review and meta-analysis were conducted following PRISMA guidelines. PubMed, Scopus, Web of Science, and related databases were searched for studies published between 2015 and 2025 that evaluated ML-based models for early sepsis prediction in adult populations. Pooled estimates of sensitivity, specificity, area under the receiver operating characteristic curve (AUC), accuracy, precision, and F1 score were calculated using random-effects models. Risk of bias was assessed using the ROBINS-I tool. Results: Thirty studies encompassing approximately 5.5–6.0 million adult patient encounters were included. ML-based early warning systems demonstrated high pooled sensitivity (0.89), specificity (0.87), and overall accuracy (0.88). Discriminative performance was excellent, with a pooled AUC of 0.93. Precision and F1 score indicated balanced diagnostic performance but showed substantial heterogeneity across studies, largely driven by differences in sepsis prevalence, clinical settings, and model thresholds. Most studies exhibited a moderate risk of bias, primarily related to retrospective design. Conclusions: ML-based early warning systems demonstrate robust and superior diagnostic performance for early sepsis prediction compared with conventional clinical scoring tools.
While heterogeneity across studies remains substantial, the consistency of pooled estimates supports the potential clinical value of ML-driven approaches. Prospective validation and context-specific calibration are essential for successful real-world implementation.
Sepsis, defined by the Sepsis-3 consensus as life-threatening organ dysfunction resulting from a dysregulated host response to infection, remains a major global public health challenge [1]. Contemporary estimates indicate that sepsis affects approximately 48 to 50 million individuals annually and contributes to nearly 11 million deaths worldwide, accounting for almost one-fifth of global mortality [2,3]. The burden is disproportionately high in low- and middle-income countries, where delayed recognition and limited healthcare resources exacerbate morbidity and mortality [2]. Children under five years of age represent a particularly vulnerable population and contribute substantially to global sepsis incidence [3]. In high-income settings, sepsis also imposes a considerable economic burden. In the United States alone, it is among the most expensive causes of hospitalization, with annual healthcare expenditures exceeding USD 20 billion and average per-patient costs surpassing USD 30,000 [2,3]. These data underscore the substantial clinical and economic impact of sepsis across healthcare systems.
Timely recognition and early initiation of appropriate therapy are critical determinants of survival in sepsis, as each hour of delay in treatment is associated with increased mortality [4]. However, early diagnosis remains challenging because early clinical features are nonspecific and disease trajectories are heterogeneous [4]. Conventional clinical scoring systems, including the Systemic Inflammatory Response Syndrome (SIRS) criteria, the Modified Early Warning Score (MEWS), the Sequential Organ Failure Assessment (SOFA), and the quick Sequential Organ Failure Assessment (qSOFA), are widely used for risk stratification but have important limitations [1,5]. The SIRS criteria demonstrate high sensitivity but poor specificity, whereas qSOFA favors specificity at the expense of sensitivity, particularly in early disease stages [5]. As a result, sepsis is frequently identified only after organ dysfunction is established, limiting opportunities for early intervention. Moreover, traditional biomarkers and imaging modalities have shown limited accuracy for early sepsis prediction [6].
In recent years, artificial intelligence and machine learning approaches have been increasingly investigated as tools for early sepsis detection. By leveraging high-dimensional electronic health record data, including vital signs, laboratory values, demographic variables, and temporal trends, machine learning models aim to identify subtle physiological patterns that precede clinical recognition of sepsis [7]. A wide range of algorithms has been explored, including tree-based models, neural networks, recurrent and convolutional architectures, and ensemble methods. Individual studies have reported promising predictive performance, often achieving earlier detection than conventional clinical scoring systems [7,8].
Despite the rapid growth of this literature, substantial variability exists across studies with respect to patient populations, data sources, model architectures, prediction horizons, outcome definitions, and validation strategies. Concerns regarding generalizability, interpretability, risk of bias, and lack of prospective validation persist [8]. Furthermore, reported performance metrics vary widely, making it difficult to draw definitive conclusions regarding the overall diagnostic accuracy and clinical utility of artificial intelligence-based sepsis prediction models. To date, no consensus has been reached regarding which modeling approaches offer the most robust and generalizable performance.
Accordingly, a systematic review and meta-analysis is warranted to quantitatively synthesize the available evidence on artificial intelligence–driven models for early sepsis detection. The objectives of this review are to systematically evaluate the performance of machine learning–based sepsis prediction models, pool diagnostic accuracy metrics where feasible, assess between-study heterogeneity, and critically appraise study quality and risk of bias. By providing a comprehensive quantitative synthesis of existing evidence, this review aims to clarify the current state of artificial intelligence–based sepsis prediction and to identify key gaps that can inform future model development, validation, and clinical implementation.
Literature Screening and Study Selection
The literature screening and study selection process was conducted in accordance with PRISMA guidelines [9]. Following the database searches, all retrieved records were imported into reference management software and duplicates were removed. Two reviewers independently screened titles and abstracts for eligibility against predefined inclusion and exclusion criteria. Full texts of potentially eligible records were then reviewed against these criteria, which included the use of artificial intelligence or machine learning models for early sepsis prediction in human populations. Disagreements were resolved through discussion or consultation with a third reviewer. The final set of studies was included in the qualitative synthesis and quantitative meta-analysis. The review was registered with PROSPERO (CRD420251276030).
Data Extraction and Statistical Analysis
Two reviewers independently extracted data using a standardized, pre-piloted extraction form. Extracted variables included study design, setting, population characteristics, sample size, sepsis definition, type of artificial intelligence or machine learning model, input features, prediction horizon, validation strategy, and reported performance metrics such as the area under the receiver operating characteristic curve, sensitivity, specificity, and accuracy. Discrepancies were resolved by consensus. For the meta-analysis, pooled estimates of diagnostic accuracy were calculated using random-effects models. Between-study heterogeneity was assessed using the I² statistic, and subgroup analyses were performed where feasible.
Risk of Bias Assessment
Two reviewers independently assessed risk of bias using the Risk Of Bias In Non-randomized Studies of Interventions (ROBINS-I) tool [10].
The following domains were evaluated: bias due to confounding, participant selection, classification of interventions, deviations from intended interventions, missing data, outcome measurement, and selection of the reported result. Disagreements were resolved by consensus or by adjudication from a third reviewer, and the overall risk of bias was classified as low, moderate, serious, or critical.
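The Methods state that pooled estimates were obtained with random-effects models and that heterogeneity was quantified with the I² statistic, but they do not name the estimator. The Python sketch below illustrates one common way to do this, assuming a univariate DerSimonian-Laird model applied separately to logit-transformed sensitivity and specificity, with a 0.5 continuity correction. Heterogeneity is summarized as I² = (Q − df) / Q, truncated at zero, where Q is Cochran's Q. The function names and the four 2×2 tables are hypothetical and are not data from the included studies. Diagnostic test accuracy reviews often use a bivariate random-effects model instead, which pools sensitivity and specificity jointly and accounts for their correlation, so this should be read as a simplified illustration rather than the analysis actually performed.

```python
import math
from typing import NamedTuple


class Pooled(NamedTuple):
    estimate: float  # pooled proportion, back-transformed from the logit scale
    ci_low: float    # lower bound of the 95% confidence interval
    ci_high: float   # upper bound of the 95% confidence interval
    tau2: float      # between-study variance on the logit scale
    i2: float        # I^2 as a fraction in [0, 1]


def logit_and_variance(events: int, total: int) -> tuple[float, float]:
    """Logit of a proportion and its approximate (delta-method) variance.

    Assumption: a 0.5 continuity correction is added to both cells of every
    study so that proportions of exactly 0 or 1 stay finite; some analyses
    apply it only to studies that contain a zero cell.
    """
    a = events + 0.5
    b = total - events + 0.5
    return math.log(a / b), 1.0 / a + 1.0 / b


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pool_random_effects(events: list[int], totals: list[int]) -> Pooled:
    """DerSimonian-Laird random-effects pooling of logit-transformed proportions."""
    y, v = zip(*(logit_and_variance(e, n) for e, n in zip(events, totals)))
    w = [1.0 / vi for vi in v]  # fixed-effect (inverse-variance) weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(y) - 1
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # DL moment estimator, truncated at 0
    w_re = [1.0 / (vi + tau2) for vi in v]  # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0  # share of variability beyond chance
    return Pooled(
        estimate=inv_logit(y_re),
        ci_low=inv_logit(y_re - 1.96 * se),
        ci_high=inv_logit(y_re + 1.96 * se),
        tau2=tau2,
        i2=i2,
    )


# Hypothetical 2x2 counts (TP, FN, TN, FP) for four studies; NOT data from this review.
studies = [
    (90, 10, 850, 150),
    (45, 5, 400, 40),
    (160, 40, 1700, 300),
    (70, 6, 620, 80),
]

sens = pool_random_effects([s[0] for s in studies], [s[0] + s[1] for s in studies])
spec = pool_random_effects([s[2] for s in studies], [s[2] + s[3] for s in studies])

print(f"Pooled sensitivity {sens.estimate:.3f} "
      f"(95% CI {sens.ci_low:.3f}-{sens.ci_high:.3f}), I^2 = {sens.i2:.0%}")
print(f"Pooled specificity {spec.estimate:.3f} "
      f"(95% CI {spec.ci_low:.3f}-{spec.ci_high:.3f}), I^2 = {spec.i2:.0%}")
```

Because I² is a relative measure, it can be high even when absolute between-study differences are modest. Reporting τ² alongside it gives a fuller picture of how much the true accuracy may vary across settings.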
This systematic review and meta-analysis demonstrate that machine learning–based early warning systems provide high diagnostic performance for early sepsis prediction, with consistently strong sensitivity, specificity, discriminative ability, and overall accuracy across diverse clinical settings. Compared with conventional clinical scoring systems, these models show superior capacity for early risk stratification while maintaining acceptable false-positive rates. Despite substantial between-study heterogeneity, the robustness of pooled estimates supports the potential clinical value of machine learning–driven approaches. Future efforts should focus on prospective validation, local calibration, and seamless clinical integration to optimize real-world implementation and improve patient outcomes in sepsis care.
Conflict of Interest
The authors certify that they have no conflict of interest with any financial organization regarding the material discussed in the manuscript.
Funding
The authors report no sponsor involvement in the research that could have influenced the outcome of this work.
Authors’ Contributions
All authors contributed equally to the manuscript and read and approved the final version.
Acknowledgement
This paper is the collaborative work of all authors, carried out under research mentorship from BIR (Biomedical and International Research). The authors gratefully acknowledge this mentorship for this meta-analysis.
[40] Fleuren LM, Klausch TLT, Zwager CL, et al. Machine learning for the prediction of sepsis: a systematic review and meta-analysis of diagnostic test accuracy. Intensive Care Med. 2020;46(3):383–400. doi:10.1007/s00134-019-05872-y