Global forecasting models for dengue outbreaks in endemic regions: a systematic review
- Authors: Sutriyawan A.1,2, Rahardjo M.1, Martini M.1, Sutiningsih D.1, Rattanapan C.3, Kassim N.F.4
Affiliations:
- Diponegoro University
- Bhakti Kencana University
- Mahidol University
- Universiti Sains Malaysia
- Issue: Vol 102, No 3 (2025)
- Pages: 331-342
- Section: ORIGINAL RESEARCHES
- URL: https://microbiol.crie.ru/jour/article/view/18837
- DOI: https://doi.org/10.36233/0372-9311-694
- EDN: https://elibrary.ru/RDIEND
- ID: 18837
Abstract
Background. Dengue is a rapidly spreading mosquito-borne disease, posing significant global health challenges, particularly in endemic regions. Recent years have witnessed an increase in the frequency and intensity of dengue outbreaks, necessitating robust forecasting models for early intervention.
This systematic review aims to synthesize recent literature on dengue forecasting models, evaluate their predictive performance, and identify the most effective approaches.
Materials and methods. A comprehensive search in Scopus, PubMed, ScienceDirect, and Springer databases was conducted following PRISMA guidelines. Studies were selected based on strict inclusion and exclusion criteria, and the quality of the research was evaluated using TRIPOD criteria. Out of 1,366 identified studies, 13 met the eligibility criteria. Data were extracted and analyzed to assess the accuracy and validity of the forecasting models employed.
Results. The findings indicate that machine learning-based models, particularly random forest, outperform conventional statistical models such as ARIMA and Poisson regression. Additionally, climate data, especially temperature and rainfall, play a critical role in forecasting dengue incidence.
Conclusion. The present study supports the superior performance of machine learning-based forecasting models, particularly random forest, in forecasting dengue cases compared with conventional statistical methods. This finding provides a foundation for developing an enhanced early warning system for future dengue outbreaks.
Full Text
Introduction
Dengue, caused by several serotypes of the dengue virus, is one of the fastest-spreading mosquito-borne diseases, especially in tropical and subtropical regions [1, 2]. The World Health Organization reported an 8-fold increase in global dengue incidence between 2000 and 2019. In 2023, over 5 million cases were documented across 80 countries, with at least 23 nations experiencing dengue outbreaks. That number more than doubled in 2024, with more than 10.6 million cases reported in North and South America alone. The actual number of cases is likely much higher, underscoring the urgent need for effective public health interventions to mitigate this escalating crisis [3]. Although most infections are mild or asymptomatic, severe forms of dengue, such as dengue hemorrhagic fever and dengue shock syndrome, can be fatal [4, 5]. In the absence of a specific antiviral drug, case fatality rates can reach 20% if diagnosis is delayed [6], particularly in resource-constrained areas. When outbreaks occur on a large scale, the sheer number of severe dengue cases can overwhelm the health system and impede the delivery of optimal care. Dengue also imposes a heavy social and economic burden on many tropical countries where the disease is endemic [7]. Early and precise prediction of outbreak size and incidence trends can help limit further spread [8] and support better planning of health resources during an outbreak.
The two principal vectors of dengue are Aedes aegypti and Aedes albopictus. Dengue transmission is influenced by a number of factors, including environmental and climate change, urbanization, globalization, vector activity, and behavioral change [9]. The interaction between humans, climate, and mosquitoes forms a complex system that strongly shapes dengue transmission patterns and, in turn, the likelihood of outbreaks [10]. This relationship has been studied for decades through forecasting models developed in different parts of the world. These models vary widely in both purpose [11, 12] and setting [13–15]. Although many of these models perform well in specific tasks, efficient prediction requires a systematic, adaptive, and generalizable framework capable of identifying weather- and population-related patterns of vulnerability across geographic regions. The scientific community has not yet reached agreement on which models provide the best predictions. There are many research reports on prediction tools for dengue outbreaks [16–19], but comprehensive summaries of the performance and predictive ability of these tools remain limited. A previous study underscored the value of integrating diverse epidemiological tools, including mapping and mathematical models, to develop an effective early warning system [20]; however, that study did not prioritize the identification of significant predictors for a dengue early warning system. Other work has emphasized early warning systems and incorporated numerous case forecasting models, but it examined only the forecasting performance of the various models used [21].
Various forecasting models have been developed over the years, integrating epidemiological, environmental, and climatic variables. While some models rely on traditional statistical methods such as Autoregressive Integrated Moving Average (ARIMA) and Poisson regression [14, 22–24], emerging research highlights the superior accuracy of machine learning models, particularly random forest and Long Short-Term Memory (LSTM) networks [25, 26]. However, there is still no consensus on the most effective forecasting approach. To address this gap, several recent studies have explored novel methodologies in dengue forecasting. They indicate that integrating deep learning techniques, such as LSTM and transformer models, significantly improves prediction accuracy compared with conventional statistical models [27], and that incorporating real-time meteorological and mobility data improves forecasting precision [28]. These approaches not only improve prediction accuracy but also enhance model adaptability across geographical regions. Despite these advances, inconsistent data quality, limited external validation, and computational constraints continue to pose challenges in real-world applications. This review focuses on determining which model exhibits the highest accuracy and on examining its internal and external validity. Its objective is to synthesize recent literature on dengue case forecasting, discuss the related evidence, and evaluate the forecasting performance of different models to identify the most effective one.
Materials and methods
This review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) approach, which covers the selection of information sources, eligibility, inclusion and exclusion criteria, and the process of systematic review, extraction, and analysis of data from the available literature [7]. PRISMA 2020 replaces the previous edition published in 2009 and introduces new reporting guidance that describes study identification, selection, appraisal, and synthesis methods more comprehensively [29]. The guideline supports the search for terms relevant to the review and advises on the aspects that must be addressed in the review report for publication [21].
Research Question Formulation
Research questions were developed using PICo, a useful tool for framing research questions for systematic reviews. PICo comprises three elements: population or problem, phenomenon of interest, and context [30]. Based on PICo, the three main components of this review are dengue (Problem), case forecasting models (Interest), and case prediction (Context). These concepts guided the formulation of the research question: “What is the evidence on dengue case forecasting models and their performance in predicting cases?”
Systematic Searching Strategies
The systematic search strategy comprised three stages: identification, screening, and eligibility.
Identification
In the identification stage, keywords were enriched with synonyms and variations, and search strings were then built by combining them with Boolean operators, as illustrated in Table 1. A systematic literature search of four major databases (Scopus, PubMed, ScienceDirect, and Springer) identified a total of 1,366 relevant records. Sixteen duplicate records were found and removed, leaving 1,350 records for title screening. All potential records were exported from the databases and organized in Excel sheets for title and abstract screening.
Table 1. Keyword search strings used for each database
Databases | Keywords used |
Pubmed | ((((((((((((((((((((((((dengue fever) OR (dengue incidence)) OR (dengue outbreaks)) OR (dengue epidemic)) AND (forecasting models)) OR (predictive models)) OR (prediction models)) OR (epidemic forecasting)) OR (outbreak prediction)) AND (machine learning)) OR (statistical models)) OR (ARIMA)) OR (regression models)) OR (random forest)) OR (neural networks)) OR (support vector machines)) AND (environmental factors)) OR (climate variables)) OR (temperature)) OR (rainfall)) OR (humidity)) OR (climate data)) OR (weather patterns)) AND (endemic regions)) AND (tropical areas) |
Scopus | TITLE-ABS-KEY ("dengue fever" OR "dengue incidence" OR "dengue outbreak*" OR "dengue epidemic*") AND ("forecast* model*" OR "predict* model*" OR "prediction model*" OR "epidemic forecast*" OR "outbreak prediction") AND ("machine learning" OR "statistical model*" OR "ARIMA" OR "regression model*" OR "random forest" OR "neural network*" OR "support vector machine*") AND ("environment* factor*" OR "climate variable*" OR "temperature" OR "rainfall" OR "humidity" OR "climate data" OR "weather pattern*") AND ("endemic region*" OR "tropical area*" OR "high-risk area*" OR "disease-endemic region*") |
ScienceDirect | Search 1: ("dengue fever" OR "dengue incidence") AND ("forecasting models" OR "prediction models") Search 2: ("dengue fever" OR "dengue incidence") AND ("prediction models" OR "outbreak prediction") AND ("machine learning" OR "statistical models") Search 3: ("dengue fever" OR "dengue outbreaks") AND ("predictive models" OR "forecasting models") AND ("environmental factors" OR "temperature" OR "rainfall") |
Springer | ("dengue fever" OR "dengue incidence" OR "dengue outbreaks") AND ("forecasting models" OR "predictive models") AND ("machine learning" OR "statistical models" OR "ARIMA") AND ("environmental factors" OR "climate" OR "rainfall") AND ("endemic regions" OR "tropical areas") |
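The strings in Table 1 follow a common pattern: synonyms for one concept are combined with OR inside parentheses, and the concepts are then combined with AND. The Python sketch below assembles a string in the style of the Scopus row; `or_block` and `build_query` are hypothetical helpers (not part of any database API), and the concept lists are a shortened subset of Table 1. Keeping each concept in its own parenthesised OR-block makes the precedence of AND and OR explicit.

```python
# Hypothetical helpers (not a database API): build a Boolean search string
# from concept groups, one OR-block of synonyms per concept, joined by AND.


def or_block(terms: list[str]) -> str:
    """Quote each synonym and join them with OR inside one parenthesised block."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(groups: list[list[str]], field: str = "TITLE-ABS-KEY") -> str:
    """AND together one OR-block per concept, prefixed with a field tag."""
    return field + " " + " AND ".join(or_block(group) for group in groups)


if __name__ == "__main__":
    concepts = [
        ["dengue fever", "dengue incidence", "dengue outbreak*"],  # problem
        ["forecast* model*", "predict* model*"],                   # interest
        ["temperature", "rainfall", "humidity"],                   # climate predictors
    ]
    print(build_query(concepts))
```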
Screening
Two authors screened titles and abstracts against the review questions and the predefined inclusion and exclusion criteria. Inclusion criteria were primary research published in peer-reviewed journals and English-language articles. We excluded systematic reviews, books, conference proceedings, and non-peer-reviewed articles such as editorials, commentaries, opinion pieces, and short reports. Screening eliminated 1,120 articles deemed irrelevant to the review. The remaining 230 articles were then read, including their abstracts, and assessed for eligibility.
Eligibility
A total of 64 full-text articles were retrieved for eligibility assessment. Two authors independently reviewed all full-text articles, and studies unrelated to the phenomenon or outcome of interest were excluded. The reasons for exclusion were recorded. In total, 51 articles were excluded for the following reasons:
- studies that did not focus on predicting the number of future cases (n = 14);
- studies that did not use or evaluate prediction or forecasting models, including machine learning methods (random forests, LSTM) or statistical models (such as ARIMA, Seasonal Autoregressive Integrated Moving Average (SARIMA), regression) (n = 19);
- articles that did not include key climate variables in the forecasting (n = 11);
- studies conducted in non-endemic or low prevalence dengue areas (n = 7).
The remaining 13 eligible articles proceeded to quality assessment.
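The record counts at each step can be checked with simple arithmetic. The Python sketch below reproduces the counts reported in the identification, screening, and eligibility steps; the variable names are illustrative, and only steps whose counts are stated in the text are included (the step from 230 screened articles to 64 retrieved full texts is not broken down in the text, so it is not modelled).

```python
# Record counts reported in the Identification, Screening and Eligibility steps.
identified = 1366
duplicates_removed = 16
excluded_at_title_abstract = 1120

screened = identified - duplicates_removed                 # records for title screening
remaining_after_screening = screened - excluded_at_title_abstract

full_text_assessed = 64
# Full-text exclusion reasons and counts, as listed above (labels shortened).
full_text_exclusions = {
    "not focused on predicting future case numbers": 14,
    "prediction/forecasting-model criterion": 19,
    "no key climate variables": 11,
    "non-endemic or low-prevalence area": 7,
}
excluded_at_full_text = sum(full_text_exclusions.values())
included = full_text_assessed - excluded_at_full_text

print(f"screened={screened}, after screening={remaining_after_screening}, "
      f"full-text exclusions={excluded_at_full_text}, included={included}")
```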
Quality Assessment
Study quality was assessed using criteria from TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) [31]. The TRIPOD statement is a 22-item checklist considered essential for the proper reporting of studies that develop or validate multivariable prediction models [32]. The TRIPOD guidelines explicitly cover the development and validation of prediction models for diagnosis and prognosis across all medical domains and predictor types. Two authors conducted the quality assessment independently. Reporting scores were obtained by awarding one point for each reported item relevant to the study, and the total score was converted to a percentage of the maximum possible score. Ultimately, 13 articles (with a percentage score > 70%) were included in the review [21]. Table 2 presents the scores and percentages for each quality assessment item adapted from the TRIPOD checklist.
Table 2. Quality appraisal score of eligible articles adapted from TRIPOD checklist [32, 42]
Checklist section | Item | [25] | [26] | [27] | [28] | [33] | [34] | [35] | [36] | [37] | [38] | [39] | [40] | [41] |
Title and abstract | | | | | | | | | | | | | | |
Title | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Abstract | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Introduction | | | | | | | | | | | | | | |
Background and objectives | 3a | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Background and objectives | 3b | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Methods | | | | | | | | | | | | | | |
Source of data | 4a | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Source of data | 4b | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
Participants | 5a | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Participants | 5b | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Outcome | 6a | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Predictors | 7a | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Sample size | 8 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
Missing data | 9 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Statistical analysis methods | 10a | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Statistical analysis methods | 10b | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Statistical analysis methods | 10d | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Results | | | | | | | | | | | | | | |
Participants | 13a | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Participants | 13b | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Model development | 14a | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Model development | 14b | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
Model specification | 15a | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
Model specification | 15b | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Model performance | 16 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Discussion | | | | | | | | | | | | | | |
Limitations | 18 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Interpretation | 19b | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Implications | 20 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Other information | | | | | | | | | | | | | | |
Supplementary information | 21 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
Funding | 22 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 |
Final score (out of 27) | | 27 | 20 | 20 | 20 | 24 | 20 | 21 | 25 | 20 | 20 | 20 | 20 | 20 |
Percentage (%) | | 100 | 74.1 | 74.1 | 74.1 | 88.9 | 74.1 | 77.8 | 92.6 | 74.1 | 74.1 | 74.1 | 74.1 | 74.1 |
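The percentage row of Table 2 follows from dividing each study's final score by the 27 scored items. The short Python check below uses the final scores from the table and the > 70% inclusion cut-off stated in the quality assessment; the names `MAX_SCORE`, `THRESHOLD` and `percentage` are illustrative.

```python
MAX_SCORE = 27    # number of TRIPOD items scored in Table 2
THRESHOLD = 70.0  # inclusion cut-off: percentage score > 70%

# Final scores per source, copied from Table 2.
final_scores = {
    "[25]": 27, "[26]": 20, "[27]": 20, "[28]": 20, "[33]": 24,
    "[34]": 20, "[35]": 21, "[36]": 25, "[37]": 20, "[38]": 20,
    "[39]": 20, "[40]": 20, "[41]": 20,
}


def percentage(score: int, max_score: int = MAX_SCORE) -> float:
    """Convert a raw checklist score to a percentage, rounded to one decimal."""
    return round(100 * score / max_score, 1)


percentages = {source: percentage(score) for source, score in final_scores.items()}
included = [source for source, pct in percentages.items() if pct > THRESHOLD]

for source, pct in percentages.items():
    print(source, pct)
print(f"{len(included)} of {len(final_scores)} studies exceed {THRESHOLD}%")
```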
Data Extraction and Synthesis
The authors extracted the data independently using a standardized data extraction form and organized it in a Microsoft Excel worksheet. The information collected included author (year), country, study design, candidate predictors, data frequency, model techniques used, performance metrics, outcome, model accuracy, and type of validation (evaluation). The PRISMA flowchart is shown in Figure 1.
Fig. 1. Systematic review flow.
Results
Study characteristics
A total of 13 studies met the eligibility criteria and were included in this systematic review. Of these, 4 (31%) were conducted in the Americas, 4 (31%) in East Asia, 4 (31%) in Southeast Asia, and 1 (8%) in South Asia. Brazil contributed the most eligible studies (n = 4) [25, 26, 33, 34], followed by China (n = 2) [27, 35], Taiwan (n = 2) [36, 37], and Vietnam (n = 2) [28, 38]. The other studies were conducted in Malaysia [39], Sri Lanka [40], and the Philippines [41]. Five (42%) studies were published between 2015 and 2020, 9 studies between 2018 and 2022, and 7 (58%) studies between 2021 and 2024. The most common time unit was weekly (46% of studies), followed by monthly (23%); the remaining studies used daily or yearly data. More than half of the studies (n = 7; 54%) used machine learning techniques [25–28, 33, 36, 39], and the remaining studies (n = 6; 46%) used statistical techniques [34, 35, 37, 38, 40, 41]. The characteristics of the included studies are summarized in Fig. 2, and details for each study are presented in Table 3.
Fig. 2. Study characteristics
Table 3. Characteristics and main findings of each included study
Source | Country | Study Design | Candidate predictors | Data Unit | Model techniques used | Model performance | Outcome | Model Accuracy | Evaluation |
[25] | Brazil | Observational Study | Rainfall, maximum temperature, minimum temperature, relative median temperature, insolation, rate of evaporation, median relative humidity, median wind speed | Monthly | Machine Learning (Random Forests, Gradient Boosting, Multilayer Perceptron, Support Vector Regression) | RMSE, MAE (Lowest errors with Random Forests) | Monthly cases of dengue | RMSE: 15.5 = 84.5%; MAE: 11.9 = 88.1% | Internal and External |
[26] | Brazil | Comparative Study | Historical dengue cases, climate variables, tweets | Weekly | Machine Learning (LSTM, Random Forest, LASSO) | MSE, MSLE | Forecasting dengue incidence | LSTM: MSE = 0.04 (96%), MSLE = 0.01 (100%); Random Forest: MSE = 0.17 (83%), MSLE = 0.13 (87%); LASSO: MSE = 0.4 (60%), MSLE = 0.33 (67%) | Internal and external |
[27] | China | Spatiotemporal Analysis | Imported cases, Tmin, Forest, Pop, Prec, Tmean, GDP, RH, Cropland, Tmax, Impervious, Water | Daily | Random Forest, Gradient Boosting Machine, Support Vector Machine | AUC | Dengue incidence | AUC = 0.91 (91%) | Internal and external |
[28] | Vietnam | Observational | Climate data (temperature, precipitation, humidity, evaporation, sunshine hours) | Daily | Machine Learning (LSTM, LSTM-ATT, CNN, Transformer) | RMSE and MAE | Forecasting dengue fever incidence | RMSE: 1.60; MAE: 1.95; Accuracy rate: 100% | Internal only |
[33] | Brazil | Quantitative research design | Epidemiological data, Google search data, Weather | Weekly | Random Forest, LASSO Regression | RMSE, R², Pearson Correlation | Dengue incidence | LASSO = 70%-90% Up to 90% | Internal only |
[34] | Brazil | Ecological Time-Series Study | Climatic, environmental, social factors | Monthly | Statistical models (ARIMA, ETS, TBATS, BATS, STLM, StructTS, NNETAR, ELM, MLP, null model) | MAPE, Relative MAPE, Theil’s U | Dengue cases | ARIMA and TBATS were the best models across time horizons (12, 6, and 3 months); model accuracy not reported | Internal only |
[35] | China | Time series analysis | Imported cases, Minimum temperature, Accumulative precipitation | Monthly | Time series Poisson regression | R² | Dengue outbreaks | R² = 0.98 (98%) | Internal only |
[36] | Taiwan | Observational Study | Meteorological variables, AQI, vector data | Daily | Machine Learning (Random Forest, XGBoost, Logistic Regression) | AUC | Dengue fever incidence | Random Forest: AUC = 0.9547, Accuracy = 89.94%; XGBoost: AUC = 0.9329; Logistic Regression: AUC = 0.7905 | Internal only |
[37] | Taiwan | Observational Study | Minimum temperature, Maximum cumulative rainfall | Yearly | Poisson Regression | MSE | Dengue incidence | MSE for validation set = 2.21; MSE for training set = 2.11 | Internal only |
[38] | Vietnam | Observational Study | Climate variables (temperature, humidity, precipitation), time-shifted variables | Weekly | SARIMAX, XGBoost, LSTM, Negative Binomial Regression | MAE, RMSE, AIC | Weekly dengue case counts | SARIMAX = 25.678 (83.33%); XGBoost = 21.409 (100%); LSTM = 30.456 (70.34%); Negative Binomial Regression = 22.345 (95.78%) | Internal only |
[39] | Malaysia | Time Series Analysis | Epidemiological (notified cases, onset cases, interventions), Environmental (rainfall, temperature, humidity) | Weekly | Random Forest, Support Vector Machine (SVM), Artificial Neural Network (ANN), Autoregressive Distributed Lag (ADL), Hierarchical Forecasting (Optimal Combination), Hierarchical Forecasting (Bottom Up) | MAPE | Dengue outbreak forecasting | Random Forest = 95% (with all factors); SVM = 92.47%; ANN = 86.10%; ADL = 85.70%; Hierarchical Forecasting (Optimal Combination) = 85.67%; Hierarchical Forecasting (Bottom Up) = 84.85% | Internal only |
[40] | Sri Lanka | Time Series Analysis | Historical dengue incidence data | Weekly | Modified ARIMA (Statistical) | MAPE | Dengue incidence forecast | MAPE: 1.554 (44.6%) (Validation), 0.3184 (Training) (68.16%) | Internal only |
[41] | Philippines | Hybrid Model Development | Dengue incidence, climate data, past incidence | Weekly | ARIMA, NNAR, ANN, SVM, LSTM | RMSE, MAE, SMAPE | Dengue outbreaks | Hybrid ARIMA-NNAR: ~85% | Internal only |
Approaches and accuracy of dengue case forecasting models
All included studies used machine learning or statistical approaches to forecast dengue cases. Counting each model technique used across the 13 studies, random forest accounted for 6 uses (26.1%) [25–27, 33, 36, 39], LSTM for 5 (21.7%) [26, 28, 34, 38, 41], and ARIMA for 3 (13%) [34, 40, 41]; the remaining uses involved Least Absolute Shrinkage and Selection Operator (LASSO), gradient boosting, XGBoost, Poisson regression, and SARIMA. To evaluate performance, the studies used different metrics, including Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Squared Error (MSE), Mean Squared Logarithmic Error (MSLE), Mean Absolute Percentage Error (MAPE), R-squared (R²), Pearson correlation, Area Under the Curve (AUC), and the Akaike Information Criterion (AIC). The types of model used are shown in Fig. 3.
Fig. 3. Type of model technique used.
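Because the included studies report different error metrics, their numbers are not directly comparable. The Python sketch below gives plain definitions of the count-forecast metrics named above; the observed and forecast values are made up for illustration.

```python
import math


def mse(observed, predicted):
    """Mean Squared Error."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root Mean Squared Error, in the same units as the case counts."""
    return math.sqrt(mse(observed, predicted))


def mae(observed, predicted):
    """Mean Absolute Error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mape(observed, predicted):
    """Mean Absolute Percentage Error (%); undefined if any observed value is 0."""
    return 100 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / len(observed)


def msle(observed, predicted):
    """Mean Squared Logarithmic Error, using log(1 + x) so zero counts are allowed."""
    return sum((math.log1p(o) - math.log1p(p)) ** 2
               for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    weekly_cases = [10, 20, 30, 40]   # hypothetical observed counts
    forecast = [12, 18, 33, 37]       # hypothetical model forecasts
    for name, fn in [("MSE", mse), ("RMSE", rmse), ("MAE", mae),
                     ("MAPE", mape), ("MSLE", msle), ("R2", r_squared)]:
        print(f"{name}: {fn(weekly_cases, forecast):.3f}")
```

For the same forecasts, RMSE is always at least as large as MAE, and MAPE cannot be computed for weeks with zero cases. AUC, also listed above, applies to classification outputs (for example, outbreak versus no outbreak) rather than to case counts, so it is not shown here.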
Among the 13 included articles, three forecasting methods reported model accuracy often enough to be compared: random forest, LSTM, and LASSO. The 6 articles using random forest showed an average model accuracy of 89% [25–27, 33, 36, 39]. Of the 5 articles using LSTM, 3 reported model accuracy, with an average of 89% [26, 28, 38], while the other 2 did not report an accuracy percentage [34, 41]. The 2 articles using LASSO had an average model accuracy of 65% [26, 33]. The accuracy of the forecasting models is shown in Fig. 4. In general, all case forecasting models included in the review showed fairly good forecasting ability. Climate indicators were the predictors most frequently used in the best-performing models. However, one study that combined climate and epidemiological indicators showed that previous dengue cases significantly influenced current dengue cases [39].
Fig. 4. Average model accuracy
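The averages reported above can be reproduced from the "Model Accuracy" column of Table 3. Several entries there appear to express accuracy as 100% minus an error value (for example, RMSE 15.5 corresponds to 84.5% for [25]), and some studies list more than one figure, so which value enters each average is a choice. The Python sketch below uses one reading that yields the reported 89%, 89%, and 65%; the selected values are assumptions, not inputs stated by the authors.

```python
from statistics import mean

# Assumed reading of Table 3: where a study lists several "accuracy" figures
# for a method, the first-listed one is used; for [33], 90% is taken for random
# forest and the lower bound 70% for LASSO. These choices are assumptions.
accuracy_by_method = {
    "random forest": {"[25]": 84.5, "[26]": 83.0, "[27]": 91.0,
                      "[33]": 90.0, "[36]": 89.94, "[39]": 95.0},
    "LSTM": {"[26]": 96.0, "[28]": 100.0, "[38]": 70.34},
    "LASSO": {"[26]": 60.0, "[33]": 70.0},
}


def summarize(table):
    """Return {method: (number of studies, mean %, min %, max %)}."""
    summary = {}
    for method, by_study in table.items():
        values = list(by_study.values())
        summary[method] = (len(values), mean(values), min(values), max(values))
    return summary


for method, (n, avg, low, high) in summarize(accuracy_by_method).items():
    print(f"{method}: n={n}, mean={avg:.1f}%, range {low}-{high}%")
```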
Random forest model accuracy
Figure 5 illustrates the accuracy of the random forest models applied in dengue forecasting studies. It includes models from six original studies, with accuracy values ranging from 83% to 95% and an average of 89%. These results highlight the strong predictive performance of random forest models in forecasting dengue incidence and support their potential integration into early warning systems for outbreak management.
Fig. 5. Random forest model accuracy.
Discussion
This systematic review summarizes and discusses the evidence on dengue case forecasting methods, their performance, and their ability to explain dengue incidence. Dengue prediction has become an active research topic, especially in Asia, where 69% of the included studies were conducted. This likely reflects the fact that Asia bears about 70% of the global dengue burden [43]. Climate data, particularly temperature, rainfall, and humidity, are important predictors of dengue incidence, but they are often not available in time for health providers running dengue early warning systems. Several studies found that countries with better meteorological records achieved higher performance metrics [25, 34, 35]. Therefore, integrating real-time data from local meteorological departments could improve access to meteorological information and help end users detect outbreaks early.
In general, climatic variables play an important role in predicting dengue cases. Climate variables such as mean temperature [25, 27, 28, 38, 39], minimum temperature [27, 35–37], maximum temperature [27, 37, 38], rainfall [27, 28, 36, 37, 39], humidity [25, 33, 39, 40], relative humidity [25, 28, 33], wind speed [25, 28, 33], evaporation, and sunshine [28] are important input parameters for dengue incidence prediction models. Of the meteorological variables studied in this review, temperature showed the best predictive capacity. In Vietnam, temperature was a significant predictor in the best dengue forecasting model, with an AUC of 87.42% and a sensitivity of 96.88% [28]. A study in Ba Ria Vung Tau Province, Vietnam, reported temperature and humidity as reliable predictors of dengue cases, with an AUC of 90.00% and a sensitivity of 85.00% [38]. Meanwhile, a study in Taiwan showed that temperature and rainfall are important factors in predicting dengue cases, with an AUC of 88% and a sensitivity of 80% [37].
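Climate effects on dengue are usually delayed, because mosquito development and viral incubation take time, so climate variables are often entered as lagged ("time-shifted") predictors, as in [38]. The Python sketch below shows one common way to turn aligned weekly series into a supervised-learning table; the function name `make_lagged_rows`, the lag lengths, and the data are hypothetical.

```python
def make_lagged_rows(cases, temperature, rainfall, lags=(1, 2, 3)):
    """Turn aligned weekly series into (features, target) pairs.

    For each week t with enough history, the features are cases, temperature
    and rainfall observed k weeks earlier for every k in `lags`; the target is
    the case count in week t. Lag lengths are illustrative assumptions.
    """
    if not (len(cases) == len(temperature) == len(rainfall)):
        raise ValueError("series must be aligned and of equal length")
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(cases)):
        features = {}
        for k in lags:
            features[f"cases_lag{k}"] = cases[t - k]
            features[f"temp_lag{k}"] = temperature[t - k]
            features[f"rain_lag{k}"] = rainfall[t - k]
        rows.append((features, cases[t]))
    return rows


if __name__ == "__main__":
    # Hypothetical weekly data for one district.
    cases = [5, 8, 12, 20, 31]
    temperature = [27.1, 27.8, 28.4, 29.0, 28.7]  # weekly mean, degrees C
    rainfall = [40, 55, 80, 62, 30]               # weekly total, mm
    for features, target in make_lagged_rows(cases, temperature, rainfall, lags=(1, 2)):
        print(target, features)
```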
In general, the dengue case prediction models in the included studies demonstrated relatively high predictive ability. However, their accuracy varies considerably depending on the specific model employed and the quality of the data used. The statistical techniques most commonly used in dengue research are ARIMA, Generalized Additive Models (GAM), negative binomial regression, and Poisson regression. ARIMA and GAM are established models for examining the relationship between environmental factors and disease outcomes and for time series prediction [44, 45]. According to recent literature, time series techniques are considered particularly effective for modeling the highly autocorrelated nature of dengue incidence [46]. In recent years, data-driven machine learning algorithms such as random forest, decision tree, Support Vector Machine (SVM), and Naïve Bayes have shown promising results in predictive analysis for classification problems [47].
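Full ARIMA estimation is beyond a short example, but its autoregressive core is easy to show. The Python sketch below measures lag-1 autocorrelation and fits an AR(1) model, y_t = c + φ·y_(t−1) + e_t, by ordinary least squares. ARIMA extends this with differencing and moving-average terms and is usually fitted with a library (for example, the `ARIMA` class in statsmodels); the helper names and the series here are hypothetical.

```python
from statistics import mean


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    m = mean(series)
    num = sum((series[t] - m) * (series[t - lag] - m) for t in range(lag, len(series)))
    den = sum((x - m) ** 2 for x in series)
    return num / den


def fit_ar1(series):
    """Ordinary least squares fit of y_t = c + phi * y_(t-1) + e_t; returns (c, phi)."""
    x, y = series[:-1], series[1:]
    mx, my = mean(x), mean(y)
    phi = (sum((a - mx) * (b - my) for a, b in zip(x, y))
           / sum((a - mx) ** 2 for a in x))
    return my - phi * mx, phi


def forecast_ar1(last_value, c, phi, steps):
    """Iterate the fitted recursion to forecast `steps` periods ahead."""
    forecasts, y = [], last_value
    for _ in range(steps):
        y = c + phi * y
        forecasts.append(y)
    return forecasts


if __name__ == "__main__":
    weekly_cases = [12, 15, 19, 26, 34, 41, 45, 43, 37, 30]  # hypothetical
    print("lag-1 autocorrelation:", round(autocorrelation(weekly_cases, 1), 2))
    c, phi = fit_ar1(weekly_cases)
    print("next 3 weeks:", [round(v, 1) for v in forecast_ar1(weekly_cases[-1], c, phi, 3)])
```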
More than half of the included studies rely on machine learning methods, particularly supervised learning models, to analyze conventional and novel data streams. Supervised learning models use labeled data sets to train algorithms to classify data or predict outcomes accurately [21]. Machine learning techniques offer several advantages, including lower error rates than conventional statistical models in predicting dengue cases. In the era of big data, these techniques can exploit the growing availability of data and, being non-parametric, relax the strict assumptions required by classical statistical models [7]. Random forest, neural networks, gradient boosting, and support vector algorithms are important machine learning algorithms that have made significant contributions to several areas of public health, especially in forecasting infectious diseases such as COVID-19 [48] and malaria [49], and they have similar uses in predicting dengue outbreaks [7].
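To make the mechanics concrete, the Python sketch below implements the core random forest idea from scratch: each tree is grown on a bootstrap sample, each split considers only a random subset of predictors, and the forest averages the trees' predictions. It is an illustrative toy, not the implementation used in any included study; in practice, library implementations such as scikit-learn's `RandomForestRegressor` are normally used. The function names, data, and parameter values (25 trees, depth 4) are hypothetical.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def _sse(values):
    """Sum of squared deviations from the mean (regression impurity)."""
    m = _mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, depth, max_depth, min_leaf, n_features, rng):
    """Grow a regression tree; each split considers a random subset of features."""
    if depth >= max_depth or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return _mean(y)                                  # leaf: mean target
    best = None                                          # (sse, feature, threshold)
    for f in rng.sample(range(len(X[0])), n_features):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, thr)
    if best is None:
        return _mean(y)
    _, f, thr = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, min_leaf, n_features, rng)

    left_idx = [i for i, row in enumerate(X) if row[f] <= thr]
    right_idx = [i for i, row in enumerate(X) if row[f] > thr]
    return (f, thr, grow(left_idx), grow(right_idx))


def predict_tree(node, x):
    """Follow splits until a leaf (a number) is reached."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node


def fit_forest(X, y, n_trees=25, max_depth=4, min_leaf=2, n_features=None, seed=0):
    """Bootstrap-aggregate regression trees with random feature subsets."""
    rng = random.Random(seed)
    n_features = n_features or max(1, len(X[0]) // 3)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, min_leaf, n_features, rng))
    return forest


def predict_forest(forest, x):
    """Average the predictions of all trees."""
    return _mean([predict_tree(tree, x) for tree in forest])


if __name__ == "__main__":
    # Hypothetical weekly data: [mean temperature (C), rainfall (mm)] -> cases.
    X = [[24 + (i % 8), (i * 7) % 50] for i in range(40)]
    y = [5 * (t - 24) + 0.1 * r for t, r in X]
    forest = fit_forest(X, y, n_trees=25, seed=42)
    print(round(predict_forest(forest, [30, 20]), 1), round(predict_forest(forest, [25, 20]), 1))
```

Because each tree sees a different bootstrap sample and a different random subset of predictors at each split, the trees make partly independent errors, and averaging them reduces the variance of the final forecast.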
Based on the included studies, we consider random forest to be the best-performing method at present. Findings from Brazil indicate that this model's accuracy in recognizing dengue cases exceeds 90% [33]. Likewise, findings from Malaysia indicate that its accuracy reaches 95% [39]. A study in Singapore reported similar findings, highlighting the strong predictive ability of random forest in mapping the spatial risk of dengue transmission. The dengue risk map generated with random forest was highly accurate and is a good tool for guiding vector control operations, allowing targeted preventive measures before and during dengue outbreaks [50].
All studies employed internal validation to assess the accuracy of their findings. The utility of a forecasting model depends on the certainty of its accuracy, that is, the extent to which it can predict real-world outcomes [51]. Notably, most published models have not been subjected to real-world validation. Models are therefore unlikely to perform as well on real-world samples as on the samples from which they were derived, and this discrepancy, or validity shrinkage, is often substantial. Future models would therefore benefit from mechanisms for estimating and reporting potential validity shrinkage, as well as predictive validity, on real-world data [52, 53]. External validation, by contrast, was used in only three of the included studies [25–27]. This is despite the fact that external validation is considered essential for model development and is a key indicator of model performance, because it demonstrates applicability to other participants, centers, regions, or settings [54]. External validation should also be employed during model redevelopment, that is, when the original model is adjusted, updated, or recalibrated on validation data to improve its performance [55].
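Validity shrinkage can be illustrated even with internal validation alone. The Python sketch below compares the apparent (in-sample) error of a simple autoregressive model with its rolling-origin error, where the model is refitted on past weeks only and scored on the following week; this is one common form of internal validation for time series. The series, the AR(1) model, and the function names are hypothetical choices for illustration; external validation would additionally require data from another region or period.

```python
def fit_ar1(series):
    """OLS fit of y_t = c + phi * y_(t-1); returns (c, phi)."""
    x, y = series[:-1], series[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:  # flat history: fall back to the mean
        return my, 0.0
    phi = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - phi * mx, phi


def in_sample_mae(series):
    """Apparent error: the model is fitted and scored on the same weeks."""
    c, phi = fit_ar1(series)
    errors = [abs(series[t] - (c + phi * series[t - 1])) for t in range(1, len(series))]
    return sum(errors) / len(errors)


def rolling_origin_mae(series, min_train):
    """Out-of-sample error: refit on weeks [0, t) and forecast week t, for each t."""
    errors = []
    for t in range(min_train, len(series)):
        c, phi = fit_ar1(series[:t])
        errors.append(abs(series[t] - (c + phi * series[t - 1])))
    return sum(errors) / len(errors)


if __name__ == "__main__":
    weekly_cases = [10, 12, 11, 13, 12, 14, 13, 40, 38, 45]  # hypothetical outbreak
    apparent = in_sample_mae(weekly_cases)
    honest = rolling_origin_mae(weekly_cases, min_train=5)
    print(f"in-sample MAE {apparent:.1f}, rolling-origin MAE {honest:.1f}")
```

On this series the rolling-origin MAE is well above the in-sample MAE, because the sudden rise in cases cannot be anticipated from the earlier weeks alone.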
It should be noted that this systematic review is not without limitations. Firstly, the majority of the included studies originate from Asia, which encompasses a multitude of non-English speaking countries. Consequently, this review may have overlooked a substantial corpus of related literature published in other languages. Secondly, the inclusion criteria stipulated the necessity for studies to be derived from primary research in peer-reviewed journals. Consequently, preprints and grey literature, such as conference abstracts, committee and government reports, were excluded. It is therefore possible that some studies may have been omitted from our review.
Conclusion
Forecasting of dengue cases is a valuable resource for policymakers formulating strategies to prevent dengue outbreaks, particularly in regions where the disease is endemic. The results of this systematic review indicate that the machine learning method using the random forest algorithm is more effective than other methods, particularly statistical methods. Furthermore, this review presents evidence on predictors used in dengue case forecasting, with a focus on incorporating climatic factors into early warning systems, which can serve as a reference for preventing dengue transmission. The findings from this review could form the basis for more effective modelling practices in the future. They will contribute to the development of robust models across different settings and populations and have significant implications for planning and decision-making in early dengue intervention and prevention.
About the authors
Agung Sutriyawan
Diponegoro University; Bhakti Kencana University
Author for correspondence.
Email: agung.epid@gmail.com
ORCID iD: 0000-0002-6119-6073
researcher, Diponegoro University; Head, Department of public health, Faculty of health sciences, Bhakti Kencana University
Indonesia, Semarang; Bandung
Mursid Rahardjo
Diponegoro University
Email: mursidraharjo@gmail.com
ORCID iD: 0000-0003-4791-1242
senior researcher, Department of environmental health, Faculty of public health
Indonesia, Semarang
Martini Martini
Diponegoro University
Email: martini@live.undip.ac.id
ORCID iD: 0000-0002-6773-1727
senior researcher, Department of epidemiology, Faculty of public health
Indonesia, Semarang
Dwi Sutiningsih
Diponegoro University
Email: dwi.sutiningsih@live.undip.ac.id
ORCID iD: 0000-0002-4128-6688
senior researcher, Department of epidemiology, Faculty of public health
Indonesia, Semarang
Cheerawit Rattanapan
Mahidol University
Email: cheerawit.rat@mahidol.ac.th
ORCID iD: 0000-0002-1799-422X
senior researcher, ASEAN Institute for Health Development
Thailand, Bangkok
Nur Faeza Abu Kassim
Universiti Sains Malaysia
Email: nurfaeza@usm.my
ORCID iD: 0000-0001-6620-8603
senior researcher, School of biological sciences
Malaysia, Penang
References
- Sarker R., Roknuzzaman A.S.M., Haque M.A., et al. Upsurge of dengue outbreaks in several WHO regions: Public awareness, vector control activities, and international collaborations are key to prevent spread. Health Sci. Rep. 2024;7(4):e2034. DOI: https://doi.org/10.1002/hsr2.2034
- Hossain M.S., Noman A.A., Mamun S.M.A.A., Mosabbir A.A. Twenty-two years of dengue outbreaks in Bangladesh: epidemiology, clinical spectrum, serotypes, and future disease risks. Trop. Med. Health. 2023;51(1):37. DOI: https://doi.org/10.1186/s41182-023-00528-6
- CDC. Dengue on the Rise: Get the Facts. Available at: https://cdc.gov/dengue/stories/dengue-on-the-rise-get-the-facts.html
- Trivedi S., Chakravarty A. Neurological complications of dengue fever. Curr. Neurol. Neurosci. Rep. 2022;22(8):515–29. DOI: https://doi.org/10.1007/s11910-022-01213-7
- Umakanth M., Suganthan N. Unusual manifestations of dengue fever: a review on expanded dengue syndrome. Cureus. 2020;12(9):e10678. DOI: https://doi.org/10.7759/cureus.10678
- Capeding M.R., Tran N.H., Hadinegoro S.R., et al. Clinical efficacy and safety of a novel tetravalent dengue vaccine in healthy children in Asia: a phase 3, randomised, observer-masked, placebo-controlled trial. Lancet. 2014;384(9951):1358–65. DOI: https://doi.org/10.1016/S0140-6736(14)61060-6
- Leung X.Y., Islam R.M., Adhami M., et al. A systematic review of dengue outbreak prediction models: Current scenario and future directions. PLoS Negl. Trop. Dis. 2023;17(2):e0010631. DOI: https://doi.org/10.1371/journal.pntd.0010631
- Chen H.L., Hsiao W.H., Lee H.C., et al. Selection and characterization of DNA aptamers targeting all four serotypes of dengue viruses. PLoS One. 2015;10(6):e0131240. DOI: https://doi.org/10.1371/journal.pone.0131240
- Zhu G., Liu J., Tan Q., Shi B. Inferring the spatio-temporal patterns of dengue transmission from surveillance data in Guangzhou, China. PLoS Negl. Trop. Dis. 2016;10(4):e0004633. DOI: https://doi.org/10.1371/journal.pntd.0004633
- Teurlai M., Menkès C.E., Cavarero V., et al. Socio-economic and climate factors associated with dengue fever spatial heterogeneity: a worked example in New Caledonia. PLoS Negl. Trop. Dis. 2015;9(12):e0004211. DOI: https://doi.org/10.1371/journal.pntd.0004211
- Phung D., Talukder M.R., Rutherford S., Chu C. A climate-based prediction model in the high-risk clusters of the Mekong Delta region, Vietnam: towards improving dengue prevention and control. Trop. Med. Int. Health. 2016;21(10):1324–33. DOI: https://doi.org/10.1111/tmi.12754
- Medlock J.M., Leach S.A. Effect of climate change on vector-borne disease risk in the UK. Lancet Infect. Dis. 2015;15(6):721–30. DOI: https://doi.org/10.1016/S1473-3099(15)70091-5
- Benedum C.M., Seidahmed O.M.E., Eltahir E.A.B., Markuzon N. Statistical modeling of the effect of rainfall flushing on dengue transmission in Singapore. PLoS Negl. Trop. Dis. 2018;12(12):e0006935. DOI: https://doi.org/10.1371/journal.pntd.0006935
- Gharbi M., Quenel P., Gustave J., et al. Time series analysis of dengue incidence in Guadeloupe, French West Indies: forecasting models using climate variables as predictors. BMC Infect. Dis. 2011;11:166. DOI: https://doi.org/10.1186/1471-2334-11-166
- Betanzos-Reyes Á.F., Rodríguez M.H., Romero-Martínez M., et al. Association of dengue fever with Aedes spp. abundance and climatological effects. Salud Publica Mex. 2018;60(1):12–20. DOI: https://doi.org/10.21149/8141
- Gluskin R.T., Johansson M.A., Santillana M., Brownstein J.S. Evaluation of Internet-based dengue query data: Google Dengue Trends. PLoS Negl. Trop. Dis. 2014;8(2):e2713. DOI: https://doi.org/10.1371/journal.pntd.0002713
- Ogashawara I., Li L., Moreno-Madriñán M.J. Spatial-temporal assessment of environmental factors related to dengue outbreaks in São Paulo, Brazil. Geohealth. 2019;3(8):202–17. DOI: https://doi.org/10.1029/2019GH000186
- Anno S., Hara T., Kai H., et al. Spatiotemporal dengue fever hotspots associated with climatic factors in Taiwan including outbreak predictions based on machine-learning. Geospat. Health. 2019;14(2). DOI: https://doi.org/10.4081/gh.2019.771
- Baquero O.S., Santana L.M.R., Chiaravalloti-Neto F. Dengue forecasting in São Paulo city with generalized additive models, artificial neural networks and seasonal autoregressive integrated moving average models. PLoS One. 2018;13(4):e0195065. DOI: https://doi.org/10.1371/journal.pone.0195065
- Racloz V., Ramsey R., Tong S., Hu W. Surveillance of dengue fever virus: a review of epidemiological models and early warning systems. PLoS Negl. Trop. Dis. 2012;6(5):e1648. DOI: https://doi.org/10.1371/journal.pntd.0001648
- Baharom M., Ahmad N., Hod R., Abdul Manaf M.R. Dengue early warning system as outbreak prediction tool: a systematic review. Risk Manag. Healthc. Policy. 2022;15:871–86. DOI: https://doi.org/10.2147/RMHP.S361106
- Aburas H.M., Cetiner B.G., Sari M. Dengue confirmed-cases prediction: A neural network model. Expert Syst. Appl. 2010;37(6):4256–60. DOI: https://doi.org/10.1016/j.eswa.2009.11.077
- Chang F.S., Tseng Y.T., Hsu P.S., et al. Re-assess vector indices threshold as an early warning tool for predicting dengue epidemic in a dengue non-endemic country. PLoS Negl. Trop. Dis. 2015;9(9):e0004043. DOI: https://doi.org/10.1371/journal.pntd.0004043
- Ahmad Qureshi E.M., Tabinda A.B., Vehra S. Predicting dengue outbreak in the metropolitan city Lahore, Pakistan, using dengue vector indices and selected climatological variables as predictors. J. Pak. Med. Assoc. 2017;67(3):416–21.
- Roster K., Connaughton C., Rodrigues F.A. Machine-learning-based forecasting of dengue fever in Brazilian cities using epidemiologic and meteorological variables. Am. J. Epidemiol. 2022;191(10):1803–12. DOI: https://doi.org/10.1093/aje/kwac090
- Mussumeci E., Codeço Coelho F. Large-scale multivariate forecasting models for Dengue – LSTM versus random forest regression. Spat. Spatiotemporal Epidemiol. 2020;35:100372. DOI: https://doi.org/10.1016/j.sste.2020.100372
- Ren H., Xu N. Forecasting and mapping dengue fever epidemics in China: a spatiotemporal analysis. Infect. Dis. Poverty. 2024;13(1):50. DOI: https://doi.org/10.1186/s40249-024-01219-y
- Nguyen V.H., Tuyet-Hanh T.T., Mulhall J., et al. Deep learning models for forecasting dengue fever based on climate data in Vietnam. PLoS Negl. Trop. Dis. 2022;16(6):e0010509. DOI: https://doi.org/10.1371/journal.pntd.0010509
- Page M.J., McKenzie J.E., Bossuyt P.M., et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71. DOI: https://doi.org/10.1136/bmj.n71
- Lockwood C., Munn Z., Porritt K. Qualitative research synthesis: methodological guidance for systematic reviewers utilizing meta-aggregation. Int. J. Evid. Based Healthc. 2015;13(3):179–87. DOI: https://doi.org/10.1097/XEB.0000000000000062
- Moons K.G., Altman D.G., Reitsma J.B., et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann. Intern. Med. 2015;162(1):W1–73. DOI: https://doi.org/10.7326/M14-0698
- Collins G.S., Reitsma J.B., Altman D.G., Moons K.G. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): the TRIPOD statement. Ann. Intern. Med. 2015;162(1):55–63. DOI: https://doi.org/10.7326/M14-0697
- Koplewitz G., Lu F., Clemente L., et al. Predicting dengue incidence leveraging internet-based data sources. A case study in 20 cities in Brazil. PLoS Negl. Trop. Dis. 2022;16(1):e0010071. DOI: https://doi.org/10.1371/journal.pntd.0010071
- Lima M.V.M., Laporta G.Z. Evaluation of the models for forecasting dengue in Brazil from 2000 to 2017: An ecological time-series study. Insects. 2020;11(11):794. DOI: https://doi.org/10.3390/insects11110794
- Sang S., Gu S., Bi P., et al. Predicting unprecedented dengue outbreak using imported cases and climatic factors in Guangzhou, 2014. PLoS Negl. Trop. Dis. 2015;9(5):e0003808. DOI: https://doi.org/10.1371/journal.pntd.0003808
- Kuo C.Y., Yang W.W., Su E.C. Improving dengue fever predictions in Taiwan based on feature selection and random forests. BMC Infect. Dis. 2024;24(Suppl. 2):334. DOI: https://doi.org/10.1186/s12879-024-09220-4
- Yuan H.Y., Wen T.H., Kung Y.H., et al. Prediction of annual dengue incidence by hydro-climatic extremes for southern Taiwan. Int. J. Biometeorol. 2019;63(2):259–68. DOI: https://doi.org/10.1007/s00484-018-01659-w
- Tuan D.A., Dang T.N. Leveraging climate data for dengue forecasting in Ba Ria Vung Tau Province, Vietnam: An advanced machine learning approach. Trop. Med. Infect. Dis. 2024;9(10):250. DOI: https://doi.org/10.3390/tropicalmed9100250
- Ismail S., Fildes R., Ahmad R., et al. The practicality of Malaysia dengue outbreak forecasting model as an early warning system. Infect. Dis. Model. 2022;7(3):510–25. DOI: https://doi.org/10.1016/j.idm.2022.07.008
- Karasinghe N., Peiris S., Jayathilaka R., Dharmasena T. Forecasting weekly dengue incidence in Sri Lanka: Modified Autoregressive Integrated Moving Average modeling approach. PLoS One. 2024;19(3):e0299953. DOI: https://doi.org/10.1371/journal.pone.0299953
- Chakraborty T., Chattopadhyay S., Ghosh I. Forecasting dengue epidemics using a hybrid methodology. Phys. A: Stat. Mech. Appl. 2019;527:121266. DOI: https://doi.org/10.1016/j.physa.2019.121266
- Baharom M., Ahmad N., Hod R., Abdul Manaf M.R. Dengue early warning system as outbreak prediction tool: a systematic review. Risk Manag. Healthc. Policy. 2022;15:871–86. DOI: https://doi.org/10.2147/RMHP.S361106
- Ilic I., Ilic M. Global patterns of trends in incidence and mortality of dengue, 1990-2019: An analysis based on the global burden of disease study. Medicina (Kaunas). 2024;60(3):425. DOI: https://doi.org/10.3390/medicina60030425
- Nayak S.D.P., Narayan K.A. Prediction of dengue outbreaks in Kerala state using disease surveillance and meteorological data. Int. J. Community Med. Public Health. 2019;6(10):4392. DOI: https://doi.org/10.18203/2394-6040.ijcmph20194500
- Liu D., Guo S., Zou M., et al. A dengue fever predicting model based on Baidu search index data and climate data in South China. PLoS One. 2019;14(12):e0226841. DOI: https://doi.org/10.1371/journal.pone.0226841
- Johansson M.A., Reich N.G., Hota A., et al. Evaluating the performance of infectious disease forecasts: A comparison of climate-driven and seasonal dengue forecasts for Mexico. Sci. Rep. 2016;6:33707. DOI: https://doi.org/10.1038/srep33707
- Salim N.A.M., Wah Y.B., Reeves C., et al. Prediction of dengue outbreak in Selangor Malaysia using machine learning techniques. Sci. Rep. 2021;11(1):939. DOI: https://doi.org/10.1038/s41598-020-79193-2
- Bullock J., Luccioni A., Hoffman Pham K., et al. Mapping the landscape of Artificial Intelligence applications against COVID-19. J. Artif. Intell. Res. 2020;69:807–45. DOI: https://doi.org/10.1613/jair.1.12162
- Zinszer K., Verma A.D., Charland K., et al. A scoping review of malaria forecasting: past work and future directions. BMJ Open. 2012;2(6):e001992. DOI: https://doi.org/10.1136/bmjopen-2012-001992
- Ong J., Liu X., Rajarethinam J., et al. Mapping dengue risk in Singapore using Random Forest. PLoS Negl. Trop. Dis. 2018;12(6):e0006587. DOI: https://doi.org/10.1371/journal.pntd.0006587
- Johansson M.A., Apfeldorf K.M., Dobson S., et al. An open challenge to advance probabilistic forecasting for dengue epidemics. Proc. Natl. Acad. Sci. U.S.A. 2019;116(48):24268–74. DOI: https://doi.org/10.1073/pnas.1909865116
- Ivanescu A.E., Li P., George B., et al. The importance of prediction model validation and assessment in obesity and nutrition research. Int. J. Obes. (Lond.). 2016;40(6):887–94. DOI: https://doi.org/10.1038/ijo.2015.214
- Steyerberg E.W., Lingsma H.F. Predicting citations: Validating prediction models. BMJ. 2008;336(7648):789. DOI: https://doi.org/10.1136/bmj.39542.610000.3A
- Moons K.G., de Groot J.A., Bouwmeester W., et al. Critical appraisal and data extraction for systematic reviews of prediction modelling studies: the CHARMS checklist. PLoS Med. 2014;11(10):e1001744. DOI: https://doi.org/10.1371/journal.pmed.1001744
- Moons K.G., Kengne A.P., Grobbee D.E., et al. Risk prediction models: II. External validation, model updating, and impact assessment. Heart. 2012;98(9):691–8. DOI: https://doi.org/10.1136/heartjnl-2011-301247
