Global forecasting models for dengue outbreaks in endemic regions: a systematic review

Agung Sutriyawan; Mursid Rahardjo; Martini Martini; Dwi Sutiningsih; Cheerawit Rattanapan; Nur Faeza Abu Kassim

doi:10.36233/0372-9311-694

Authors: Sutriyawan A.¹,², Rahardjo M.¹, Martini M.¹, Sutiningsih D.¹, Rattanapan C.³, Kassim N.F.⁴
Affiliations:
1. Diponegoro University
2. Bhakti Kencana University
3. Mahidol University
4. Universiti Sains Malaysia
Issue: Vol 102, No 3 (2025)
Pages: 331-342
Section: ORIGINAL RESEARCHES
URL: https://microbiol.crie.ru/jour/article/view/18837
DOI: https://doi.org/10.36233/0372-9311-694
EDN: https://elibrary.ru/RDIEND

Abstract

Background. Dengue is a rapidly spreading mosquito-borne disease, posing significant global health challenges, particularly in endemic regions. Recent years have witnessed an increase in the frequency and intensity of dengue outbreaks, necessitating robust forecasting models for early intervention.

This systematic review aims to synthesize recent literature on dengue forecasting models, evaluate their predictive performance, and identify the most effective approaches.

Materials and methods. A comprehensive search in Scopus, PubMed, ScienceDirect, and Springer databases was conducted following PRISMA guidelines. Studies were selected based on strict inclusion and exclusion criteria, and the quality of the research was evaluated using TRIPOD criteria. Out of 1,366 identified studies, 13 met the eligibility criteria. Data were extracted and analyzed to assess the accuracy and validity of the forecasting models employed.

Results. The findings indicate that machine learning-based models, particularly random forest, outperform conventional statistical models such as ARIMA and Poisson regression. Additionally, climate data, especially temperature and rainfall, play a critical role in forecasting dengue incidence.

Conclusion. The present study corroborates the superior efficacy of machine learning-based forecasting models, particularly random forest, in forecasting dengue cases compared to conventional statistical methods. This finding provides a foundation for the development of an enhanced early warning system to address future outbreaks of dengue.

Keywords

dengue, forecast model, machine learning, random forest, early warning system

Introduction

Dengue is one of the fastest-spreading mosquito-borne diseases, particularly in tropical and subtropical regions, and is caused by several serotypes of dengue virus [1, 2]. The World Health Organization has reported an 8-fold increase in global dengue incidence between 2000 and 2019. In 2023, over 5 million cases were documented across 80 countries, with at least 23 nations experiencing dengue outbreaks. That number more than doubled in 2024, with more than 10.6 million cases reported in North and South America alone. The actual number of cases is likely considerably higher, underscoring the urgent need for effective public health interventions to mitigate this escalating crisis [3]. Although most infections are mild or asymptomatic, dengue hemorrhagic fever and dengue shock syndrome are severe forms of infection that can lead to death [4, 5]. In the absence of a specific antiviral drug or a widely available vaccine, case fatality rates can reach 20% if diagnosis is delayed [6], particularly in resource-constrained areas. During large outbreaks, the sheer number of severe dengue cases can overwhelm the health system and impede the delivery of optimal care. Dengue also imposes a heavy social and economic burden on many tropical countries where the disease is endemic [7]. Early, precise prediction of outbreak size and incidence trends can help limit further spread [8] and support better allocation of health resources during an outbreak.

The two principal vectors of dengue are Aedes aegypti and Aedes albopictus. Dengue transmission is influenced by many factors, including environmental and climate change, urbanization, globalization, vector activity, and behavioral change [9]. The interaction between humans, climate, and mosquitoes forms a complex system that strongly shapes dengue transmission patterns and, in turn, the likelihood of outbreaks [10]. This relationship has been studied for decades through forecasting models developed in different parts of the world. These models vary widely in both purpose [11, 12] and setting [13–15]. Although many of them perform well on specific tasks, efficient prediction requires a systematic, adaptive, and generalizable framework capable of identifying weather- and population-related patterns of vulnerability across geographic regions. The scientific community has not yet reached agreement on which models provide the best predictions. Many studies report prediction tools for dengue outbreaks [16–19], but research that comprehensively summarizes the performance and predictive ability of these tools remains limited. A previous study underscored the value of integrating diverse epidemiological tools, including mapping and mathematical models, to develop an effective early warning system [20]; however, it did not prioritize identifying significant predictors for a dengue early warning system. Another study emphasized early warning systems and covered numerous case forecasting models, but it focused solely on how the various models performed in forecasting cases [21].

Various forecasting models have been developed over the years, integrating epidemiological, environmental, and climatic variables. While some rely on traditional statistical methods such as the Autoregressive Integrated Moving Average (ARIMA) and Poisson regression [14, 22–24], emerging research highlights the superior accuracy of machine learning models, particularly random forest and Long Short-Term Memory (LSTM) networks [25, 26]. However, there is still no consensus on the most effective forecasting approach. Several recent studies have explored novel methodologies to address this gap. Integrating deep learning techniques such as LSTM and transformer models has been reported to significantly improve prediction accuracy compared with conventional statistical models [27], and incorporating real-time meteorological and mobility data has been found to improve forecasting precision [28]. These approaches improve not only prediction accuracy but also model adaptability across geographical regions. Despite these advances, inconsistent data quality, limited external validation, and computational constraints continue to hinder real-world application. This review aims to determine which model exhibits the highest accuracy and to examine its internal and external validity. Its objective is to synthesize recent literature on dengue case forecasting, discuss the related evidence, and evaluate the forecasting performance of different models to identify the most effective one.

Materials and methods

This review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) approach, which covers the identification of resources, eligibility, inclusion and exclusion criteria, and the process of systematic review, data extraction, and data analysis [7]. PRISMA 2020 replaces the 2009 edition and introduces new reporting guidance that covers study identification, selection, appraisal, and synthesis methods more comprehensively [29]. The guideline supports the search for terms relevant to the review and advises on the aspects that a review report must address for publication [21].

Research Question Formulation

The research question was developed using PICo, a tool for framing research questions in systematic reviews. PICo comprises three elements: population or problem, phenomenon of interest, and context [30]. In this review, the three components are dengue (Problem), case forecasting models (Interest), and case prediction (Context). Together, these guided the research question: “What is the evidence on dengue case forecasting models and their performance in predicting cases?”

Systematic Searching Strategies

The systematic search strategy comprised three stages: identification, screening, and eligibility.

Identification

In the identification stage, the keywords were enriched with synonyms and variations, and search strings were then constructed with Boolean operators and keyword search, as shown in Table 1. A systematic literature search of four major databases (Scopus, PubMed, ScienceDirect, and Springer) identified a total of 1,366 records. Sixteen duplicate records were removed, leaving 1,350 records for title screening. All records were exported from the databases and organized in Excel sheets for title and abstract screening.

Table 1. Keyword search strings used for each database

Databases	Keywords used
PubMed	((((((((((((((((((((((((dengue fever) OR (dengue incidence)) OR (dengue outbreaks)) OR (dengue epidemic)) AND (forecasting models)) OR (predictive models)) OR (prediction models)) OR (epidemic forecasting)) OR (outbreak prediction)) AND (machine learning)) OR (statistical models)) OR (ARIMA)) OR (regression models)) OR (random forest)) OR (neural networks)) OR (support vector machines)) AND (environmental factors)) OR (climate variables)) OR (temperature)) OR (rainfall)) OR (humidity)) OR (climate data)) OR (weather patterns)) AND (endemic regions)) AND (tropical areas)
Scopus	TITLE-ABS-KEY ("dengue fever" OR "dengue incidence" OR "dengue outbreak" OR "dengue epidemic") AND ("forecast* model" OR "predict model" OR "prediction model" OR "epidemic forecast" OR "outbreak prediction") AND ("machine learning" OR "statistical model" OR "ARIMA" OR "regression model" OR "random forest" OR "neural network" OR "support vector machine") AND ("environment factor" OR "climate variable" OR "temperature" OR "rainfall" OR "humidity" OR "climate data" OR "weather pattern") AND ("endemic region" OR "tropical area" OR "high-risk area" OR "disease-endemic region*")
ScienceDirect	Search 1: ("dengue fever" OR "dengue incidence") AND ("forecasting models" OR "prediction models") Search 2: ("dengue fever" OR "dengue incidence") AND ("prediction models" OR "outbreak prediction") AND ("machine learning" OR "statistical models") Search 3: ("dengue fever" OR "dengue outbreaks") AND ("predictive models" OR "forecasting models") AND ("environmental factors" OR "temperature" OR "rainfall")
Springer	("dengue fever" OR "dengue incidence" OR "dengue outbreaks") AND ("forecasting models" OR "predictive models") AND ("machine learning" OR "statistical models" OR "ARIMA") AND ("environmental factors" OR "climate" OR "rainfall") AND ("endemic regions" OR "tropical areas")
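
The Springer and ScienceDirect strings in Table 1 share a simple structure: synonyms for one concept are joined with OR, and the concept groups are joined with AND. The minimal Python sketch below illustrates that pattern. The helper names `or_group` and `build_query` are hypothetical, and database-specific syntax (such as the Scopus `TITLE-ABS-KEY` field code or wildcards) is deliberately left out.

```python
def or_group(terms):
    """Quote each synonym and join the group with OR."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(concepts):
    """Join the OR-groups for each concept with AND."""
    return " AND ".join(or_group(terms) for terms in concepts)


# Illustrative concept groups (these reproduce ScienceDirect Search 1 in Table 1).
concepts = [
    ["dengue fever", "dengue incidence"],
    ["forecasting models", "prediction models"],
]
print(build_query(concepts))
# ("dengue fever" OR "dengue incidence") AND ("forecasting models" OR "prediction models")
```

Adding a concept group (for example, climate terms) narrows the result set, whereas adding a synonym to an existing group broadens it.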

Screening

Two authors screened titles and abstracts against the review question and the predefined inclusion and exclusion criteria. Inclusion criteria were primary research articles published in English in peer-reviewed journals. We excluded systematic reviews, books, conference proceedings, and non-peer-reviewed articles such as editorials, commentaries, opinion pieces, and short reports. Screening eliminated 1,120 articles deemed irrelevant to the review. The remaining 230 articles were then read, including their abstracts, and assessed for eligibility.

Eligibility

A total of 64 full-text articles were retrieved for eligibility assessment, and two authors independently reviewed all of them. Studies unrelated to the phenomenon of interest or the outcome of interest were excluded, and the reasons for exclusion were recorded. A total of 51 articles were excluded for the following reasons:

studies that did not focus on predicting the number of future cases (n = 14);
studies that did not use or evaluate prediction or forecasting models, such as machine learning methods (random forest, LSTM) or statistical models (ARIMA, Seasonal Autoregressive Integrated Moving Average (SARIMA), regression) (n = 19);
articles that did not involve key climate variables in the forecasting (n = 11);
studies conducted in non-endemic or low prevalence dengue areas (n = 7).

The remaining 13 eligible articles proceeded to quality assessment.
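
The record counts reported for the identification, screening, and eligibility stages can be cross-checked with simple arithmetic. The Python sketch below recomputes each stage from the reported numbers; the function name `prisma_flow` and the short labels for the exclusion reasons are illustrative. It also computes the 230 − 64 difference between screening and full-text retrieval, which the text does not break down further.

```python
def prisma_flow():
    """Recompute each stage of the selection flow from the reported counts."""
    identified = 1366
    duplicates = 16
    excluded_at_screening = 1120
    full_text_retrieved = 64
    # Full-text exclusions, with short labels for the four reported reasons.
    full_text_exclusions = {
        "no forecast of future case numbers": 14,
        "forecasting-model criterion": 19,
        "no key climate variables": 11,
        "non-endemic or low-prevalence area": 7,
    }
    after_duplicates = identified - duplicates
    after_screening = after_duplicates - excluded_at_screening
    excluded_full_text = sum(full_text_exclusions.values())
    return {
        "after_duplicates": after_duplicates,                  # 1350
        "after_screening": after_screening,                    # 230
        "not_itemized": after_screening - full_text_retrieved, # 166
        "excluded_full_text": excluded_full_text,              # 51
        "included": full_text_retrieved - excluded_full_text,  # 13
    }


print(prisma_flow())
```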

Quality Assessment

Study quality was assessed using the criteria of TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) [31]. The TRIPOD statement is a 22-item checklist considered essential for reporting studies that develop or validate multivariable prediction models [32], and it explicitly covers the development and validation of diagnostic and prognostic prediction models across all medical domains and predictor types. Two authors conducted the quality assessment independently. One point was awarded for each relevant item a study reported, and the total was converted to a percentage of the maximum possible score; the adapted checklist in Table 2 scores 27 items (including sub-items), so the maximum is 27. All 13 articles scored above 70% and were included in the review [21]. Table 2 presents the item scores and percentages for each study, adapted from the TRIPOD checklist.

Table 2. Quality appraisal score of eligible articles adapted from TRIPOD checklist [32, 42]

Checklist section	Item	[25]	[26]	[27]	[28]	[33]	[34]	[35]	[36]	[37]	[38]	[39]	[40]	[41]
Title and abstract
Title	1	1	1	1	1	1	1	1	1	1	1	1	1	1
Abstract	2	1	1	1	1	1	1	1	1	1	1	1	1	1
Introduction
Background and objectives	3a	1	1	1	1	1	1	1	1	1	1	1	1	1
	3b	1	1	1	1	1	1	1	1	1	1	1	1	1
Methods
Source of data	4a	1	1	1	1	1	1	1	1	1	1	1	1	1
	4b	1	0	0	0	1	0	0	1	0	0	0	0	0
Participants	5a	1	1	1	1	1	1	1	1	1	1	1	1	1
	5b	1	1	1	1	1	1	1	1	1	1	1	1	1
Outcome	6a	1	1	1	1	1	1	1	1	1	1	1	1	1
Predictors	7a	1	1	1	1	1	1	1	1	1	1	1	1	1
Sample size	8	1	0	0	0	1	0	0	1	0	0	0	0	0
Missing data	9	1	0	0	0	0	0	0	0	0	0	0	0	0
Statistical analysis methods	10a	1	1	1	1	1	1	1	1	1	1	1	1	1
	10b	1	1	1	1	1	1	1	1	1	1	1	1	1
	10d	1	1	1	1	1	1	1	1	1	1	1	1	1
Results
Participants	13a	1	0	0	0	1	0	0	0	0	0	0	0	0
	13b	1	1	1	1	1	1	1	1	1	1	1	1	1
Model development	14a	1	1	1	1	1	1	1	1	1	1	1	1	1
	14b	1	0	0	0	0	0	0	1	0	0	0	0	0
Model specification	15a	1	1	0	0	1	1	1	1	1	0	1	1	1
	15b	1	1	1	1	1	1	1	1	1	1	1	1	1
Model performance	16	1	1	1	1	1	1	1	1	1	1	1	1	1
Discussion
Limitations	18	1	1	1	1	1	1	1	1	1	1	1	1	1
Interpretation	19b	1	1	1	1	1	1	1	1	1	1	1	1	1
Implications	20	1	1	1	1	1	1	1	1	1	1	1	1	1
Other information
Supplementary information	21	1	0	0	0	0	0	0	1	0	0	0	0	0
Funding	22	1	0	1	1	1	0	1	1	0	1	0	0	0
Final score		27	20	20	20	24	20	21	25	20	20	20	20	20
Percentage		100	74.1	74.1	74.1	88.9	74.1	77.8	92.6	74.1	74.1	74.1	74.1	74.1
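
The scoring rule (one point per reported item, converted to a percentage of the maximum) can be reproduced from Table 2. In the Python sketch below, the 19 items reported by every study are collapsed into a single count, and only the 8 items that vary between studies are listed. The names `tripod_scores` and `VARIABLE_ITEMS` are hypothetical; the 70% threshold is the cut-off stated in the text.

```python
STUDIES = ["[25]", "[26]", "[27]", "[28]", "[33]", "[34]", "[35]",
           "[36]", "[37]", "[38]", "[39]", "[40]", "[41]"]

# 19 of the 27 scored items in Table 2 were reported by every study.
N_ITEMS_ALL_REPORTED = 19

# The remaining 8 items: one 0/1 flag per study, in STUDIES order.
VARIABLE_ITEMS = {
    "4b":  [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0],
    "8":   [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0],
    "9":   [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "13a": [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "14b": [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    "15a": [1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1],
    "21":  [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    "22":  [1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0],
}
MAX_SCORE = N_ITEMS_ALL_REPORTED + len(VARIABLE_ITEMS)  # 27


def tripod_scores(threshold=70.0):
    """Return {study: (score, percentage, above_threshold)}."""
    results = {}
    for j, study in enumerate(STUDIES):
        score = N_ITEMS_ALL_REPORTED + sum(flags[j] for flags in VARIABLE_ITEMS.values())
        pct = round(100 * score / MAX_SCORE, 1)
        results[study] = (score, pct, pct > threshold)
    return results


for study, (score, pct, ok) in tripod_scores().items():
    print(study, score, pct, ok)
```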

Data Extraction and Synthesis

The authors extracted the data independently using a standardized data extraction form and organized it in a Microsoft Excel worksheet. The information collected included author (year), country, study design, candidate predictors, data frequency, modelling techniques, model performance metrics, outcome, model accuracy, and type of validation. The PRISMA flow diagram is shown in Figure 1.

Fig. 1. Systematic review flow.

Results

Study characteristics

A total of 13 studies met the eligibility criteria and were included in this systematic review. Of these, 4 (31%) were conducted in the Americas, 4 (31%) in East Asia, 4 (31%) in Southeast Asia, and 1 (8%) in South Asia. Brazil contributed the most eligible studies (n = 4) [25, 26, 33, 34], followed by China (n = 2) [27, 35], Taiwan (n = 2) [36, 37], and Vietnam (n = 2) [28, 38]. The remaining studies were conducted in Malaysia [39], Sri Lanka [40], and the Philippines [41]. Five studies (42%) were published between 2015 and 2020, 9 between 2018 and 2022, and 7 (58%) between 2021 and 2024. Most studies (46%) used weekly data, 23% used monthly data, and the rest used daily (23%) or yearly (8%) data. More than half of the studies (n = 7; 54%) used machine learning techniques [25–28, 33, 36, 39], and the remaining 6 (46%) used statistical techniques [34, 35, 37, 38, 40, 41]. The characteristics of the included studies are summarized in Fig. 2, and the details of each study are presented in Table 3.
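
The regional and data-unit shares can be recomputed directly from the country and data-unit columns of Table 3. The Python sketch below does this with `collections.Counter`; the region mapping follows the grouping used in the text, and the names `STUDIES`, `REGION`, and `shares` are illustrative.

```python
from collections import Counter

# (source, country, data unit) as listed in Table 3.
STUDIES = [
    ("[25]", "Brazil", "Monthly"), ("[26]", "Brazil", "Weekly"),
    ("[27]", "China", "Daily"), ("[28]", "Vietnam", "Daily"),
    ("[33]", "Brazil", "Weekly"), ("[34]", "Brazil", "Monthly"),
    ("[35]", "China", "Monthly"), ("[36]", "Taiwan", "Daily"),
    ("[37]", "Taiwan", "Yearly"), ("[38]", "Vietnam", "Weekly"),
    ("[39]", "Malaysia", "Weekly"), ("[40]", "Sri Lanka", "Weekly"),
    ("[41]", "Philippines", "Weekly"),
]

# Regional grouping used in the text.
REGION = {
    "Brazil": "Americas", "China": "East Asia", "Taiwan": "East Asia",
    "Vietnam": "Southeast Asia", "Malaysia": "Southeast Asia",
    "Philippines": "Southeast Asia", "Sri Lanka": "South Asia",
}


def shares(labels):
    """Count each label and express it as a whole-number percentage of the total."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, round(100 * n / total)) for label, n in counts.items()}


print(shares(REGION[country] for _, country, _ in STUDIES))
print(shares(unit for _, _, unit in STUDIES))
```

Because the percentages are rounded individually, they can sum to slightly more or less than 100%.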

Fig. 2. Study characteristics

Table 3. The details for characteristic and main findings of each study

Source	Country	Study Design	Candidate predictors	Data Unit	Model techniques used	Model performance	Outcome	Model Accuracy	Evaluation
[25]	Brazil	Observational Study	Rainfall, maximum temperature, minimum temperature, relative median temperature, insolation, rate of evaporation, median relative humidity, median wind speed	Monthly	Machine Learning (Random Forests, Gradient Boosting, Multilayer Perceptron, Support Vector Regression)	RMSE, MAE (lowest errors with Random Forests)	Monthly cases of dengue	RMSE: 15.5 = 84.5%; MAE: 11.9 = 88.1%	Internal and external
[26]	Brazil	Comparative Study	Historical dengue cases, climate variables, tweets	Weekly	Machine Learning (LSTM, Random Forest, LASSO)	MSE, MSLE	Forecasting dengue incidence	LSTM: MSE = 0.04 (96%), MSLE = 0.01 (100%); Random Forest: MSE = 0.17 (83%), MSLE = 0.13 (87%); LASSO: MSE = 0.4 (60%), MSLE = 0.33 (67%)	Internal and external
[27]	China	Spatiotemporal Analysis	Imported cases, Tmin, Forest, Pop, Prec, Tmean, GDP, RH, Cropland, Tmax, Impervious, Water	Daily	Random Forest, Gradient Boosting Machine, Support Vector Machine	AUC	Dengue incidence	AUC = 0.91 (91%)	Internal and external
[28]	Vietnam	Observational	Climate data (temperature, precipitation, humidity, evaporation, sunshine hours)	Daily	Machine Learning (LSTM, LSTM-ATT, CNN, Transformer)	RMSE and MAE	Forecasting dengue fever incidence	RMSE: 1.60; MAE: 1.95; accuracy rate: 100%	Internal only
[33]	Brazil	Quantitative research design	Epidemiological data, Google search data, Weather	Weekly	Random Forest, LASSO Regression	RMSE, R², Pearson Correlation	Dengue incidence	LASSO: 70–90%; up to 90%	Internal only
[34]	Brazil	Ecological Time-Series Study	Climatic, environmental, social factors	Monthly	Statistical models (ARIMA, ETS, TBATS, BATS, STLM, StructTS, NNETAR, ELM, MLP, null model)	MAPE, Relative MAPE, Theil’s U	Dengue cases	ARIMA and TBATS were the best models across time horizons (12, 6, and 3 months); model accuracy not reported	Internal only
[35]	China	Time series analysis	Imported cases, Minimum temperature, Accumulative precipitation	Monthly	Time series Poisson regression	R²	Dengue outbreaks	R² = 0.98 (98%)	Internal only
[36]	Taiwan	Observational Study	Meteorological variables, AQI, vector data	Daily	Machine Learning (Random Forest, XGBoost, Logistic Regression)	AUC	Dengue fever incidence	Random Forest: AUC = 0.9547, accuracy = 89.94%; XGBoost: AUC = 0.9329; Logistic Regression: AUC = 0.7905	Internal only
[37]	Taiwan	Observational Study	Minimum temperature, Maximum cumulative rainfall	Yearly	Poisson Regression	MSE	Dengue incidence	MSE (validation set) = 2.21; MSE (training set) = 2.11	Internal only
[38]	Vietnam	Observational Study	Climate variables (temperature, humidity, precipitation), time-shifted variables	Weekly	SARIMAX, XGBoost, LSTM, Negative Binomial Regression	MAE, RMSE, AIC	Weekly dengue case counts	SARIMAX = 25.678 (83.33%); XGBoost = 21.409 (100%); LSTM = 30.456 (70.34%); Negative Binomial Regression = 22.345 (95.78%)	Internal only
[39]	Malaysia	Time Series Analysis	Epidemiological (notified cases, onset cases, interventions), Environmental (rainfall, temperature, humidity)	Weekly	Random Forest, Support Vector Machine (SVM), Artificial Neural Network (ANN), Autoregressive Distributed Lag (ADL), Hierarchical Forecasting (Optimal Combination), Hierarchical Forecasting (Bottom Up)	MAPE	Dengue outbreak forecasting	Random Forest = 95% (with all factors); SVM = 92.47%; ANN = 86.10%; ADL = 85.70%; Hierarchical Forecasting (Optimal Combination) = 85.67%; Hierarchical Forecasting (Bottom Up) = 84.85%	Internal only
[40]	Sri Lanka	Time Series Analysis	Historical dengue incidence data	Weekly	Modified ARIMA (Statistical)	MAPE	Dengue incidence forecast	MAPE: 1.554 (44.6%) (validation); 0.3184 (68.16%) (training)	Internal only
[41]	Philippines	Hybrid Model Development	Dengue incidence, climate data, past incidence	Weekly	ARIMA, NNAR, ANN, SVM, LSTM	RMSE, MAE, SMAPE	Dengue outbreaks	Hybrid ARIMA-NNAR: ~85%	Internal only

Approaches and Accuracy of Forecasting Models for Dengue Cases

The included studies used a range of machine learning and statistical approaches to forecast dengue cases. Across the 13 studies, random forest was used in 6 (26.1% of all model applications) [25–27, 33, 36, 39], LSTM in 5 (21.7%) [26, 28, 34, 38, 41], and ARIMA in 3 (13%) [34, 40, 41], while Least Absolute Shrinkage and Selection Operator (LASSO), Gradient Boosting, XGBoost, Poisson regression, and SARIMA were each used in at most two studies. Performance was assessed with different metrics, including Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Squared Error (MSE), Mean Squared Logarithmic Error (MSLE), Mean Absolute Percentage Error (MAPE), R-squared (R²), Pearson correlation, Area Under the Curve (AUC), and the Akaike Information Criterion (AIC). The types of model used are shown in Fig. 3.
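
Because the studies report different error metrics, it helps to see how the most common ones are defined on the same pair of observed and predicted counts. The standard-library Python sketch below computes MSE, RMSE, MAE, MSLE, MAPE, and R²; the function name `forecast_errors` and the example counts are illustrative. R² is computed here as 1 − SS_res/SS_tot; some studies instead report the squared Pearson correlation, which can differ. AUC is omitted because it evaluates classification outputs rather than case counts.

```python
import math


def forecast_errors(observed, predicted):
    """Point-forecast error metrics for paired, non-negative case counts."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mse = sum(e * e for e in errors) / n
    metrics = {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        # log1p(y) = log(1 + y), so weeks with zero cases are allowed.
        "MSLE": sum((math.log1p(p) - math.log1p(o)) ** 2
                    for o, p in zip(observed, predicted)) / n,
    }
    # MAPE divides by the observed value, so it is undefined if any count is 0.
    if all(o != 0 for o in observed):
        metrics["MAPE"] = 100 * sum(abs(e) / abs(o) for e, o in zip(errors, observed)) / n
    else:
        metrics["MAPE"] = None
    # R² compares the residual sum of squares with the variance of the observations.
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    metrics["R2"] = 1 - n * mse / ss_tot if ss_tot > 0 else None
    return metrics


print(forecast_errors([10, 20, 30, 40], [12, 18, 33, 40]))
```

RMSE weights large errors more heavily than MAE, so on the same data RMSE is always at least as large as MAE. This makes a useful sanity check when reading reported metrics.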

Fig. 3. Type of model technique used.

Of the 13 included articles, three forecasting methods were compared in terms of average model accuracy: random forest, LSTM, and LASSO. The 6 articles using random forest showed an average accuracy of 89% [25–27, 33, 36, 39]. Of the 5 articles using LSTM, 3 reported model accuracy, with an average of 89% [26, 28, 38], while the other 2 did not report a percentage accuracy [34, 41]. The 2 articles that used LASSO had an average accuracy of 65% [26, 33]. The accuracy of the forecasting models is shown in Fig. 4. In general, all of the case forecasting models in the included studies showed fairly good forecasting ability. Climate indicators were the predictors most frequently used by the best-performing models. However, one study that combined climate and epidemiological indicators showed that previous dengue cases significantly influenced current dengue cases [39].

Fig. 4. Average model accuracy

Random forest model accuracy

Figure 5 illustrates the accuracy of the random forest models applied in dengue forecasting studies. It includes models developed in six original studies, with accuracy values ranging from 83% to 92% and an average of 89%. These results highlight the superior predictive performance of random forest models in dengue incidence forecasting and support their potential integration into early warning systems for outbreak management.

Fig. 5. Random forest model accuracy.

Discussion

This systematic review summarizes and discusses the evidence on dengue case forecasting methods, their performance, and their ability to explain dengue incidence. It shows that dengue prediction has become a topic of growing research interest, especially in Asia, where 69% of the included studies were conducted. This trend reflects the fact that Asia accounts for about 70% of the global dengue burden [43]. Climate data, particularly temperature, rainfall, and humidity, are important predictors of dengue incidence, but they are often not available in time for health providers working on dengue early warning systems. Several studies found that settings with better meteorological records achieved higher performance metrics [25, 34, 35]. Integrating real-time data from local meteorological departments would therefore improve access to meteorological information and help end users detect outbreaks early.

Overall, climatic variables play an important role in predicting dengue cases. Mean temperature [25, 27, 28, 38, 39], minimum temperature [27, 35–37], maximum temperature [27, 37, 38], rainfall [27, 28, 36, 37, 39], humidity [25, 33, 39, 40], relative humidity [25, 28, 33], wind speed [25, 28, 33], and evaporation and sunshine [28] are important input parameters for dengue incidence prediction models. Of the meteorological variables studied in this review, temperature showed the best predictive capacity. In Vietnam, temperature was a significant predictor in the best dengue forecasting model, with an AUC of 87.42% and a sensitivity of 96.88% [28]. A study in Ba Ria-Vung Tau Province, Vietnam, reported temperature and humidity as reliable predictors of dengue cases, with an AUC of 90.00% and a sensitivity of 85.00% [38]. In Taiwan, temperature and rainfall were important factors in predicting dengue cases, with an AUC of 88% and a sensitivity of 80% [37].
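
Climate variables often enter such models with a time lag, reflecting the delay between weather conditions, mosquito population growth, and reported cases; one included study explicitly used time-shifted variables [38]. The Python sketch below turns aligned case and climate series into one-step-ahead training rows. The function name `make_lagged_rows` and the lags of 1 to 3 periods are illustrative assumptions, not values taken from the reviewed studies.

```python
def make_lagged_rows(cases, climate, lags=(1, 2, 3)):
    """Build (features, target) pairs for one-step-ahead forecasting.

    cases   -- case counts per period (e.g. weekly)
    climate -- dict mapping a variable name to a list aligned with `cases`
    lags    -- illustrative lag choices, in periods
    Each row uses only values from earlier periods, so no future data leaks in.
    """
    n = len(cases)
    if any(len(series) != n for series in climate.values()):
        raise ValueError("all series must have the same length")
    rows, targets = [], []
    for t in range(max(lags), n):
        row = {}
        for k in lags:
            row[f"cases_lag{k}"] = cases[t - k]
            for name, series in climate.items():
                row[f"{name}_lag{k}"] = series[t - k]
        rows.append(row)
        targets.append(cases[t])
    return rows, targets


rows, targets = make_lagged_rows(
    cases=[5, 7, 9, 12, 15],
    climate={"rain": [10, 20, 30, 40, 50], "temp": [27, 28, 29, 30, 31]},
    lags=(1, 2),
)
print(rows[0], targets)
```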

Taken together, the dengue case prediction models in the included studies showed relatively high predictive ability. However, their accuracy varies considerably with the specific model employed and the quality of the data used. The statistical modeling techniques most commonly used in dengue research are ARIMA, Generalized Additive Models (GAM), negative binomial regression, and Poisson regression. ARIMA and GAM are established models for examining the relationship between environmental factors and disease outcomes and for time series prediction [44, 45]. Recent literature considers time series techniques particularly effective for dengue because dengue incidence is highly autocorrelated [46]. In recent years, data-driven machine learning algorithms such as random forest, decision trees, Support Vector Machines (SVM), and Naïve Bayes have shown promising results in predictive analysis for classification problems [47].
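
Autocorrelation is the property that makes ARIMA-type models and lagged-case predictors useful: the count in one period is correlated with counts in previous periods. The standard diagnostic is the sample autocorrelation function (ACF), r_k = Σ (x_t − x̄)(x_{t+k} − x̄) / Σ (x_t − x̄)². The standard-library Python sketch below implements this usual estimator; the function name `autocorrelation` is illustrative.

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag (standard estimator)."""
    n = len(series)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the series length")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series is constant")
    # Each r_k sums lagged cross-products of deviations over the n - k available pairs.
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]


print(autocorrelation([1, 2, 3, 4, 5], max_lag=2))
```

Slowly decaying positive values at short lags are the typical pattern in weekly dengue counts that makes recent cases informative predictors.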

More than half of the included studies relied on machine learning methods, particularly supervised learning models, to analyze conventional and novel data streams. Supervised learning uses labeled data sets to train algorithms to classify data or predict outcomes accurately [21]. Machine learning techniques showed lower error rates than conventional statistical models in predicting dengue cases and offer several advantages: they can exploit the large volumes of data available in the era of big data and, being non-parametric, are not bound by the strict assumptions of many statistical models [7]. Random forest, neural networks, gradient boosting, and support vector algorithms are important machine learning algorithms that have contributed substantially to several areas of public health, notably forecasting infectious diseases such as COVID-19 [48] and malaria [49], and they are similarly used to predict dengue outbreaks [7].

Based on the included studies, machine learning with random forest appears to be the best method currently available. Findings from Brazil indicate that this model's accuracy in recognizing dengue cases exceeds 90% [33]. Likewise, findings from Malaysia indicate an accuracy of 95% [39]. A study in Singapore similarly reported the strong predictive ability of random forest in clustering the spatial risk of dengue transmission. The dengue risk map generated with random forest was highly accurate and served as a good tool to guide vector control operations, allowing targeted preventive measures before and during dengue outbreaks [50].
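
A typical random forest forecasting setup can be sketched with scikit-learn's `RandomForestRegressor`. The example below is a minimal sketch under stated assumptions: the weekly series are synthetic (not data from any reviewed study), the lags (cases at 1 and 2 weeks, temperature at 3, rainfall at 4) and hyperparameters are hypothetical, and the train/test split is time-ordered so the model never sees later weeks during training, which a random shuffle would allow.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

rng = np.random.default_rng(0)

# --- Synthetic weekly data (hypothetical; not from any reviewed study) ---
weeks = 260
t = np.arange(weeks)
temp = 28 + 2 * np.sin(2 * np.pi * t / 52) + rng.normal(0, 0.5, weeks)
rain = np.clip(100 + 60 * np.sin(2 * np.pi * (t - 8) / 52) + rng.normal(0, 15, weeks), 0, None)
cases = np.zeros(weeks)
cases[:4] = 20
for i in range(4, weeks):
    # Assumed dynamics: last week's cases plus temperature (lag 3) and rainfall (lag 4).
    lam = 0.6 * cases[i - 1] + 3 * (temp[i - 3] - 26) + 0.05 * rain[i - 4]
    cases[i] = rng.poisson(max(lam, 1.0))

# --- Lagged predictors aligned with the target week ---
max_lag = 4
X = np.column_stack([
    cases[max_lag - 1:-1],  # cases, lag 1
    cases[max_lag - 2:-2],  # cases, lag 2
    temp[max_lag - 3:-3],   # temperature, lag 3
    rain[:-max_lag],        # rainfall, lag 4
])
y = cases[max_lag:]
feature_names = ["cases_lag1", "cases_lag2", "temp_lag3", "rain_lag4"]

# --- Time-ordered split: train on earlier weeks, test on later weeks ---
split = int(0.8 * len(y))
model = RandomForestRegressor(n_estimators=300, min_samples_leaf=2, random_state=0)
model.fit(X[:split], y[:split])
pred = model.predict(X[split:])

rmse = float(np.sqrt(mean_squared_error(y[split:], pred)))
mae = float(mean_absolute_error(y[split:], pred))
print(f"test RMSE = {rmse:.1f}, MAE = {mae:.1f}")
for name, importance in zip(feature_names, model.feature_importances_):
    print(f"{name}: {importance:.2f}")
```

The impurity-based feature importances give a rough indication of which lagged inputs the forest relies on, but they are not a substitute for the significance testing used in statistical models.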

All studies used internal validation to assess the accuracy of their findings. The utility of a forecasting model depends on the certainty of its accuracy, that is, the extent to which it can predict real-world outcomes [51]. Notably, most published models have not been subjected to real-world validation. Models are therefore unlikely to perform as well in real-world samples as in the samples from which they were derived, and this discrepancy, known as validity shrinkage, is often substantial. Future models should therefore include mechanisms for estimating and reporting potential validity shrinkage, as well as predictive validity, in real-world data [52, 53]. External validation, by contrast, was used in only a few of the included studies [25–27]. This is despite external validation being considered essential for model development and a key indicator of model performance, because it shows whether a model applies to other participants, centers, regions, or settings [54]. External validation should also be employed when a model is redeveloped, that is, when the original model is adjusted, updated, or recalibrated on validation data to improve its performance [55].
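
External validation requires data from another setting, which a short example cannot supply. What a sketch can show is the difference between scoring a model on the data it was fitted to and scoring it on later, unseen data. The Python sketch below uses rolling-origin evaluation, a form of temporal (internal) validation in which each forecast uses only earlier observations, and expresses validity shrinkage as the relative increase in error from training to validation. All function names are hypothetical, and defining shrinkage as a relative error increase is an assumption; other definitions, such as the drop in R², are also used.

```python
def rolling_origin_mae(series, forecaster, min_train, horizon=1):
    """Mean absolute error of forecasts made from successive origins.

    At each origin the forecaster sees only the observations before it,
    mimicking real-time use, and is scored on the next `horizon` values.
    """
    errors = []
    for origin in range(min_train, len(series) - horizon + 1):
        predictions = forecaster(series[:origin], horizon)
        actual = series[origin:origin + horizon]
        errors.extend(abs(p - a) for p, a in zip(predictions, actual))
    if not errors:
        raise ValueError("series too short for the chosen min_train and horizon")
    return sum(errors) / len(errors)


def naive_last_value(history, horizon):
    """Hypothetical baseline forecaster: repeat the last observed count."""
    return [history[-1]] * horizon


def validity_shrinkage(train_error, validation_error):
    """Relative increase in error from the fitting data to held-out data."""
    return (validation_error - train_error) / train_error


print(rolling_origin_mae([10, 12, 15, 11, 14], naive_last_value, min_train=2))
# MSE values reported for the Poisson model in [37] (Table 3): training 2.11, validation 2.21.
print(f"{100 * validity_shrinkage(2.11, 2.21):.1f}%")  # 4.7%
```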

This systematic review has limitations. First, most of the included studies originate from Asia, which includes many non-English-speaking countries, so this review may have overlooked a substantial body of related literature published in other languages. Second, the inclusion criteria required primary research published in peer-reviewed journals, so preprints and grey literature, such as conference abstracts and committee and government reports, were excluded. Some relevant studies may therefore have been omitted from our review.

Conclusion

Forecasting dengue cases is a valuable resource for policymakers formulating strategies to prevent dengue outbreaks, particularly in regions where the disease is endemic. The results of this systematic review indicate that machine learning with the random forest algorithm is more effective than other methods, particularly statistical methods. This review also presents evidence on predictors for dengue case forecasting, with a focus on incorporating climatic factors into early warning systems, which can serve as a reference for preventing dengue transmission. These findings could form the basis for more effective modelling practices, contribute to the development of robust models across different settings and populations, and have significant implications for planning and decision-making in early dengue intervention and prevention.

About the authors

Agung Sutriyawan

Diponegoro University; Bhakti Kencana University

Author for correspondence.
Email: agung.epid@gmail.com
ORCID iD: 0000-0002-6119-6073

researcher, Diponegoro University; Head, Department of public health, Faculty of health sciences, Bhakti Kencana University

Indonesia, Semarang; Bandung

Mursid Rahardjo

Diponegoro University

Email: mursidraharjo@gmail.com
ORCID iD: 0000-0003-4791-1242

senior researcher, Department of environmental health, Faculty of public health

Indonesia, Semarang

Martini Martini

Diponegoro University

Email: martini@live.undip.ac.id
ORCID iD: 0000-0002-6773-1727

senior researcher, Department of epidemiology, Faculty of public health

Indonesia, Semarang

Dwi Sutiningsih

Diponegoro University

Email: dwi.sutiningsih@live.undip.ac.id
ORCID iD: 0000-0002-4128-6688

senior researcher, Department of epidemiology, Faculty of public health

Indonesia, Semarang

Cheerawit Rattanapan

Mahidol University

Email: cheerawit.rat@mahidol.ac.th
ORCID iD: 0000-0002-1799-422X

senior researcher, ASEAN Institute for Health Development

Thailand, Bangkok

Nur Faeza Abu Kassim

Universiti Sains Malaysia

Email: nurfaeza@usm.my
ORCID iD: 0000-0001-6620-8603

senior researcher, School of biological sciences

Malaysia, Penang

References

Sarker R., Roknuzzaman A.S.M., Haque M.A., et al. Upsurge of dengue outbreaks in several WHO regions: Public awareness, vector control activities, and international collaborations are key to prevent spread. Health Sci. Rep. 2024;7(4):e2034. DOI: https://doi.org/10.1002/hsr2.2034
Hossain M.S., Noman A.A., Mamun S.M.A.A., Mosabbir A.A. Twenty-two years of dengue outbreaks in Bangladesh: epidemiology, clinical spectrum, serotypes, and future disease risks. Trop. Med. Health. 2023;51(1):37. DOI: https://doi.org/10.1186/s41182-023-00528-6
CDC. Dengue on the Rise: Get the Facts. Available at: https://cdc.gov/dengue/stories/dengue-on-the-rise-get-the-facts.html
Trivedi S., Chakravarty A. Neurological complications of dengue fever. Curr. Neurol. Neurosci. Rep. 2022;22(8):515–29. DOI: https://doi.org/10.1007/s11910-022-01213-7
Umakanth M., Suganthan N. Unusual manifestations of dengue fever: a review on expanded dengue syndrome. Cureus. 2020;12(9):e10678. DOI: https://doi.org/10.7759/cureus.10678
Capeding M.R., Tran N.H., Hadinegoro S.R., et al. Clinical efficacy and safety of a novel tetravalent dengue vaccine in healthy children in Asia: a phase 3, randomised, observer-masked, placebo-controlled trial. Lancet. 2014;384(9951):1358–65. DOI: https://doi.org/10.1016/S0140-6736(14)61060-6
Leung X.Y., Islam R.M., Adhami M., et al. A systematic review of dengue outbreak prediction models: Current scenario and future directions. PLoS Negl. Trop. Dis. 2023;17(2):e0010631. DOI: https://doi.org/10.1371/journal.pntd.0010631
Chen H.L., Hsiao W.H., Lee H.C., et al. Selection and characterization of DNA aptamers targeting all four serotypes of dengue viruses. PLoS One. 2015;10(6):e0131240. DOI: https://doi.org/10.1371/journal.pone.0131240
Zhu G., Liu J., Tan Q., Shi B. Inferring the spatio-temporal patterns of dengue transmission from surveillance data in Guangzhou, China. PLoS Negl. Trop. Dis. 2016;10(4):e0004633. DOI: https://doi.org/10.1371/journal.pntd.0004633
Teurlai M., Menkès C.E., Cavarero V., et al. Socio-economic and climate factors associated with dengue fever spatial heterogeneity: a worked example in New Caledonia. PLoS Negl. Trop. Dis. 2015;9(12):e0004211. DOI: https://doi.org/10.1371/journal.pntd.0004211
Phung D., Talukder M.R., Rutherford S., Chu C. A climate-based prediction model in the high-risk clusters of the Mekong Delta region, Vietnam: towards improving dengue prevention and control. Trop. Med. Int. Health. 2016;21(10):1324–33. DOI: https://doi.org/10.1111/tmi.12754
Medlock J.M., Leach S.A. Effect of climate change on vector-borne disease risk in the UK. Lancet Infect. Dis. 2015;15(6):721–30. DOI: https://doi.org/10.1016/S1473-3099(15)70091-5
Benedum C.M., Seidahmed O.M.E., Eltahir E.A.B., Markuzon N. Statistical modeling of the effect of rainfall flushing on dengue transmission in Singapore. PLoS Negl. Trop. Dis. 2018;12(12):e0006935. DOI: https://doi.org/10.1371/journal.pntd.0006935
Gharbi M., Quenel P., Gustave J., et al. Time series analysis of dengue incidence in Guadeloupe, French West Indies: forecasting models using climate variables as predictors. BMC Infect. Dis. 2011;11:166. DOI: https://doi.org/10.1186/1471-2334-11-166
Betanzos-Reyes Á.F., Rodríguez M.H., Romero-Martínez M., et al. Association of dengue fever with Aedes spp. abundance and climatological effects. Salud Publica Mex. 2018;60(1):12–20. DOI: https://doi.org/10.21149/8141
Gluskin R.T., Johansson M.A., Santillana M., Brownstein J.S. Evaluation of Internet-based dengue query data: Google Dengue Trends. PLoS Negl. Trop. Dis. 2014;8(2):e2713. DOI: https://doi.org/10.1371/journal.pntd.0002713
Ogashawara I., Li L., Moreno-Madriñán M.J. Spatial-temporal assessment of environmental factors related to dengue outbreaks in São Paulo, Brazil. Geohealth. 2019;3(8):202–17. DOI: https://doi.org/10.1029/2019GH000186
Anno S., Hara T., Kai H., et al. Spatiotemporal dengue fever hotspots associated with climatic factors in Taiwan including outbreak predictions based on machine-learning. Geospat. Health. 2019;14(2). DOI: https://doi.org/10.4081/gh.2019.771
Baquero O.S., Santana L.M.R., Chiaravalloti-Neto F. Dengue forecasting in São Paulo city with generalized additive models, artificial neural networks and seasonal autoregressive integrated moving average models. PLoS One. 2018;13(4):e0195065. DOI: https://doi.org/10.1371/journal.pone.0195065
Racloz V., Ramsey R., Tong S., Hu W. Surveillance of dengue fever virus: a review of epidemiological models and early warning systems. PLoS Negl. Trop. Dis. 2012;6(5):e1648. DOI: https://doi.org/10.1371/journal.pntd.0001648
Baharom M., Ahmad N., Hod R., Abdul Manaf M.R. Dengue early warning system as outbreak prediction tool: a systematic review. Risk Manag. Healthc. Policy. 2022;15:871–86. DOI: https://doi.org/10.2147/RMHP.S361106
Aburas H.M., Cetiner B.G., Sari M. Dengue confirmed-cases prediction: A neural network model. Expert Syst. Appl. 2010;37(6):4256–60. DOI: https://doi.org/10.1016/j.eswa.2009.11.077
Chang F.S., Tseng Y.T., Hsu P.S., et al. Re-assess vector indices threshold as an early warning tool for predicting dengue epidemic in a dengue non-endemic country. PLoS Negl. Trop. Dis. 2015;9(9):e0004043. DOI: https://doi.org/10.1371/journal.pntd.0004043
Ahmad Qureshi E.M., Tabinda A.B., Vehra S. Predicting dengue outbreak in the metropolitan city Lahore, Pakistan, using dengue vector indices and selected climatological variables as predictors. J. Pak. Med. Assoc. 2017;67(3):416–21.
Roster K., Connaughton C., Rodrigues F.A. Machine-learning-based forecasting of dengue fever in Brazilian cities using epidemiologic and meteorological variables. Am. J. Epidemiol. 2022;191(10):1803–12. DOI: https://doi.org/10.1093/aje/kwac090
Mussumeci E., Codeço Coelho F. Large-scale multivariate forecasting models for Dengue – LSTM versus random forest regression. Spat. Spatiotemporal. Epidemiol. 2020;35:100372. DOI: https://doi.org/10.1016/j.sste.2020.100372
Ren H., Xu N. Forecasting and mapping dengue fever epidemics in China: a spatiotemporal analysis. Infect. Dis. Poverty. 2024;13(1):50. DOI: https://doi.org/10.1186/s40249-024-01219-y
Nguyen V.H., Tuyet-Hanh T.T., Mulhall J., et al. Deep learning models for forecasting dengue fever based on climate data in Vietnam. PLoS Negl. Trop. Dis. 2022;16(6):e0010509. DOI: https://doi.org/10.1371/journal.pntd.0010509
Page M.J., McKenzie J.E., Bossuyt P.M., et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71. DOI: https://doi.org/10.1136/bmj.n71
Lockwood C., Munn Z., Porritt K. Qualitative research synthesis: methodological guidance for systematic reviewers utilizing meta-aggregation. Int. J. Evid. Based Healthc. 2015;13(3):179–87. DOI: https://doi.org/10.1097/XEB.0000000000000062
Moons K.G., Altman D.G., Reitsma J.B., et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann. Intern. Med. 2015;162(1):W1–73. DOI: https://doi.org/10.7326/M14-0698
Collins G.S., Reitsma J.B., Altman D.G., Moons K.G. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): the TRIPOD statement. Ann. Intern. Med. 2015;162(1):55–63. DOI: https://doi.org/10.7326/M14-0697
Koplewitz G., Lu F., Clemente L., et al. Predicting dengue incidence leveraging internet-based data sources. A case study in 20 cities in Brazil. PLoS Negl. Trop. Dis. 2022;16(1):e0010071. DOI: https://doi.org/10.1371/journal.pntd.0010071
Lima M.V.M., Laporta G.Z. Evaluation of the models for forecasting dengue in Brazil from 2000 to 2017: An ecological time-series study. Insects. 2020;11(11):794. DOI: https://doi.org/10.3390/insects11110794
Sang S., Gu S., Bi P., et al. Predicting unprecedented dengue outbreak using imported cases and climatic factors in Guangzhou, 2014. PLoS Negl. Trop. Dis. 2015;9(5):e0003808. DOI: https://doi.org/10.1371/journal.pntd.0003808
Kuo C.Y., Yang W.W., Su E.C. Improving dengue fever predictions in Taiwan based on feature selection and random forests. BMC Infect. Dis. 2024;24(Suppl. 2):334. DOI: https://doi.org/10.1186/s12879-024-09220-4
Yuan H.Y., Wen T.H., Kung Y.H., et al. Prediction of annual dengue incidence by hydro-climatic extremes for southern Taiwan. Int. J. Biometeorol. 2019;63(2):259–68. DOI: https://doi.org/10.1007/s00484-018-01659-w
Tuan D.A., Dang T.N. Leveraging climate data for dengue forecasting in Ba Ria Vung Tau Province, Vietnam: An advanced machine learning approach. Trop. Med. Infect. Dis. 2024;9(10):250. DOI: https://doi.org/10.3390/tropicalmed9100250
Ismail S., Fildes R., Ahmad R., et al. The practicality of Malaysia dengue outbreak forecasting model as an early warning system. Infect. Dis. Model. 2022;7(3):510–25. DOI: https://doi.org/10.1016/j.idm.2022.07.008
Karasinghe N., Peiris S., Jayathilaka R., Dharmasena T. Forecasting weekly dengue incidence in Sri Lanka: Modified Autoregressive Integrated Moving Average modeling approach. PLoS One. 2024;19(3):e0299953. DOI: https://doi.org/10.1371/journal.pone.0299953
Chakraborty T., Chattopadhyay S., Ghosh I. Forecasting dengue epidemics using a hybrid methodology. Phys. A: Stat. Mech. Appl. 2019;527:121266. DOI: https://doi.org/10.1016/j.physa.2019.121266
Baharom M., Ahmad N., Hod R., Abdul Manaf M.R. Dengue early warning system as outbreak prediction tool: a systematic review. Risk Manag. Healthc. Policy. 2022;15:871–86. DOI: https://doi.org/10.2147/RMHP.S361106
Ilic I., Ilic M. Global patterns of trends in incidence and mortality of dengue, 1990–2019: An analysis based on the global burden of disease study. Medicina (Kaunas). 2024;60(3):425. DOI: https://doi.org/10.3390/medicina60030425
Nayak S.D.P., Narayan K.A. Prediction of dengue outbreaks in Kerala state using disease surveillance and meteorological data. Int. J. Community Med. Public Health. 2019;6(10):4392. DOI: https://doi.org/10.18203/2394-6040.ijcmph20194500
Liu D., Guo S., Zou M., et al. A dengue fever predicting model based on Baidu search index data and climate data in South China. PLoS One. 2019;14(12):e0226841. DOI: https://doi.org/10.1371/journal.pone.0226841
Johansson M.A., Reich N.G., Hota A., et al. Evaluating the performance of infectious disease forecasts: A comparison of climate-driven and seasonal dengue forecasts for Mexico. Sci. Rep. 2016;6:33707. DOI: https://doi.org/10.1038/srep33707
Salim N.A.M., Wah Y.B., Reeves C., et al. Prediction of dengue outbreak in Selangor Malaysia using machine learning techniques. Sci. Rep. 2021;11(1):939. DOI: https://doi.org/10.1038/s41598-020-79193-2
Bullock J., Luccioni A., Hoffman Pham K., et al. Mapping the landscape of Artificial Intelligence applications against COVID-19. J. Artif. Intell. Res. 2020;69:807–45. DOI: https://doi.org/10.1613/jair.1.12162
Zinszer K., Verma A.D., Charland K., et al. A scoping review of malaria forecasting: past work and future directions. BMJ Open. 2012;2(6):e001992. DOI: https://doi.org/10.1136/bmjopen-2012-001992
Ong J., Liu X., Rajarethinam J., et al. Mapping dengue risk in Singapore using Random Forest. PLoS Negl. Trop. Dis. 2018;12(6):e0006587. DOI: https://doi.org/10.1371/journal.pntd.0006587
Johansson M.A., Apfeldorf K.M., Dobson S., et al. An open challenge to advance probabilistic forecasting for dengue epidemics. Proc. Natl. Acad. Sci. U.S.A. 2019;116(48):24268–74. DOI: https://doi.org/10.1073/pnas.1909865116
Ivanescu A.E., Li P., George B., et al. The importance of prediction model validation and assessment in obesity and nutrition research. Int. J. Obes. (Lond.). 2016;40(6):887–94. DOI: https://doi.org/10.1038/ijo.2015.214
Steyerberg E.W., Lingsma H.F. Predicting citations: Validating prediction models. BMJ. 2008;336(7648):789. DOI: https://doi.org/10.1136/bmj.39542.610000.3A
Moons K.G., de Groot J.A., Bouwmeester W., et al. Critical appraisal and data extraction for systematic reviews of prediction modelling studies: the CHARMS checklist. PLoS Med. 2014;11(10):e1001744. DOI: https://doi.org/10.1371/journal.pmed.1001744
Moons K.G., Kengne A.P., Grobbee D.E., et al. Risk prediction models: II. External validation, model updating, and impact assessment. Heart. 2012;98(9):691–8. DOI: https://doi.org/10.1136/heartjnl-2011-301247