Brief Report: Forecasting Influenza with the Long Short-Term Memory Model: Results from the 2023-2024 Influenza Season


Timely detection of infectious diseases and health threats is of increasing importance, particularly for U.S. military service members. Existing surveillance systems are hindered, however, by a 1- to 2-week delay between actual disease outbreaks and the release of surveillance data.1 To address this challenge, since 2019 the Integrated Biosurveillance (IB) Branch of the Armed Forces Health Surveillance Division has conducted forecasting activities during each influenza season to provide early warning and increased awareness of potential health risks to the Department of Defense (DOD) enterprise.2 At the end of each influenza season, the IB Branch evaluates the performance of the individual forecasting models and assesses whether new algorithms could be integrated to improve forecasting capabilities for the next season.

The Long Short-Term Memory (LSTM) model is a machine-learning method with potential to improve forecasting accuracy for respiratory disease surveillance.3 LSTM is a type of recurrent neural network designed for sequential data, such as weekly time series, and has been applied across a wide range of modeling fields. Through internal gates, an LSTM can selectively add new information to its memory and discard previously accumulated information. Although LSTM models are well established, their performance in forecasting influenza encounters from DOD surveillance data has not been studied. This report assesses the performance of the LSTM model for possible inclusion in future DOD influenza forecasting analyses.
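To illustrate the gating behavior described above, the following Python sketch implements one time step of a standard LSTM cell using only the standard library. It is not the model used in this report (which was built with the R "torch" package); the function names, the parameter layout, and the toy weights are hypothetical and chosen for readability. The forget gate scales the previous memory and the input gate scales the new candidate content, which is how the cell selectively retains or discards information.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
GateParams = Tuple[Matrix, Matrix, Vector]  # (W: input weights, U: recurrent weights, b: bias)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W: Matrix, U: Matrix, b: Vector, x: Sequence[float], h: Sequence[float]) -> Vector:
    """Return W @ x + U @ h + b using plain lists (one entry per hidden unit)."""
    return [
        sum(w * xi for w, xi in zip(w_row, x)) + sum(u * hi for u, hi in zip(u_row, h)) + bias
        for w_row, u_row, bias in zip(W, U, b)
    ]


def lstm_step(x: Sequence[float], h_prev: Vector, c_prev: Vector,
              params: Dict[str, GateParams]) -> Tuple[Vector, Vector]:
    """One time step of a standard LSTM cell; returns (new hidden state, new cell memory)."""
    i = [sigmoid(z) for z in affine(*params["input"], x, h_prev)]       # how much new content to write
    f = [sigmoid(z) for z in affine(*params["forget"], x, h_prev)]      # how much old memory to keep
    o = [sigmoid(z) for z in affine(*params["output"], x, h_prev)]      # how much memory to expose
    g = [math.tanh(z) for z in affine(*params["candidate"], x, h_prev)]  # proposed new content
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c


def constant_gate(n_hidden: int, n_inputs: int, bias: float) -> GateParams:
    """Hypothetical helper: a gate with zero weights and a constant bias (demonstration only)."""
    return ([[0.0] * n_inputs for _ in range(n_hidden)],
            [[0.0] * n_hidden for _ in range(n_hidden)],
            [bias] * n_hidden)


# Toy example: 1 input feature (e.g. a weekly encounter percentage), 1 hidden unit.
keep = {"input": constant_gate(1, 1, -20.0), "forget": constant_gate(1, 1, 20.0),
        "output": constant_gate(1, 1, 0.0), "candidate": constant_gate(1, 1, 1.0)}
h, c = lstm_step([0.5], [0.0], [0.7], keep)
print(c)  # memory stays close to the previous value 0.7
```

With zero weights, a large positive forget-gate bias and a large negative input-gate bias keep the previous memory almost unchanged; reversing the two biases makes the cell overwrite its memory with the new candidate instead.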

Methods

Influenza encounters were defined as outpatient visits with an International Classification of Diseases, 10th Revision (ICD-10) discharge diagnosis code of J09 through J11. Outpatient influenza encounter data for Military Health System (MHS) beneficiaries were collected weekly during the 2023-2024 influenza season from all U.S. military hospitals and clinics. Total outpatient encounter data were obtained from the DOD's Electronic Surveillance System for the Early Notification of Community-based Epidemics (ESSENCE). The weekly percentage of outpatient influenza encounters was calculated as the number of influenza encounters divided by the total number of outpatient encounters that week, multiplied by 100.
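The case definition and percentage calculation above can be sketched in Python. This is a minimal illustration, not the report's R code; the record layout (one outpatient encounter per record, with a list of diagnosis codes), the rule that an encounter counts if any listed code falls in J09-J11, and all function names are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# ICD-10 categories J09-J11 identify influenza encounters in this report.
INFLUENZA_CATEGORIES = ("J09", "J10", "J11")

Week = Tuple[int, int]  # (year, epidemiological week), e.g. (2023, 51)


def is_influenza_code(code: str) -> bool:
    """True if an ICD-10 code (with or without the dot) falls in categories J09-J11."""
    return code.strip().upper().replace(".", "").startswith(INFLUENZA_CATEGORIES)


def weekly_influenza_percentage(encounters: Iterable[Tuple[Week, List[str]]]) -> Dict[Week, float]:
    """Percentage of each week's outpatient encounters that carry an influenza code.

    Assumption: each record is one outpatient encounter with its list of diagnosis
    codes, and the encounter counts as influenza if any listed code is in J09-J11.
    """
    total: Dict[Week, int] = defaultdict(int)
    influenza: Dict[Week, int] = defaultdict(int)
    for week, codes in encounters:
        total[week] += 1
        if any(is_influenza_code(code) for code in codes):
            influenza[week] += 1
    return {week: 100.0 * influenza[week] / total[week] for week in sorted(total)}


# Hypothetical encounter records (not study data).
records = [
    ((2023, 51), ["J10.1"]),
    ((2023, 51), ["Z00.00"]),
    ((2023, 51), ["J06.9", "J11.1"]),
    ((2023, 51), ["M54.50"]),
    ((2023, 52), ["R05.9"]),
]
print(weekly_influenza_percentage(records))  # {(2023, 51): 50.0, (2023, 52): 0.0}
```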

Short-term (1- to 2-week) forecasts were previously generated by the IB Branch each week during the influenza season for the U.S., including all military hospitals and clinics, from 2023 epidemiological week (EW) 40 through 2024 EW 20. Forecasts were generated weekly using various time series and machine learning models: autoregressive integrated moving average (ARIMA), error-trend-seasonality (ETS), exponentially weighted moving average (EWMA), naïve, neural network, Poisson, Prophet, random forest, time series linear model (TSLM), and vector autoregressive (VAR) models. An ensemble model (ENSEMBLE) was created as the average of all the forecasting models used.
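An equal-weight ensemble can be sketched as averaging the component models' forecasts at each quantile level. The report states only that the ensemble was an average of all models; averaging quantile by quantile, the dictionary layout, and the example numbers are assumptions made for illustration.

```python
from statistics import mean
from typing import Dict

QuantileForecast = Dict[float, float]  # quantile level -> forecast encounter percentage


def ensemble_mean(forecasts: Dict[str, QuantileForecast]) -> QuantileForecast:
    """Equal-weight ensemble: average every model's forecast at each shared quantile level."""
    if not forecasts:
        raise ValueError("need at least one model forecast")
    shared_levels = set.intersection(*(set(f) for f in forecasts.values()))
    return {q: mean(f[q] for f in forecasts.values()) for q in sorted(shared_levels)}


# Hypothetical 1-week-ahead forecasts from three component models (illustrative values only).
forecasts = {
    "ARIMA": {0.025: 0.10, 0.5: 0.20, 0.975: 0.30},
    "VAR":   {0.025: 0.12, 0.5: 0.18, 0.975: 0.40},
    "NAIVE": {0.025: 0.08, 0.5: 0.22, 0.975: 0.35},
}
ensemble = ensemble_mean(forecasts)
print(ensemble[0.5])  # ensemble median, approximately 0.2
```

Averaging at each quantile level keeps the ensemble in the same interval format as its components, so it can be scored with the same metrics.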

Short-term (1- to 2-week) LSTM model forecasts of the percentage of MHS influenza encounters were generated for each week of the 2023-2024 influenza season, using training data from the previous influenza season (2022 EW 40 through 2023 EW 20). Forecast horizons, the time frames for which forecasts are made, were defined as 1 week, 2 weeks, and 1-2 weeks ahead. To validate the model, the data were split into training and testing sets for each EW of evaluation. Training loss was calculated as the mean squared error (MSE). Key hyperparameters, including the number of hidden units (50), the dropout rate (0.2), and an adaptive retrospective period, were set to improve model performance.
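The training setup above can be illustrated by how a weekly series is turned into supervised examples for a recurrent model, how one evaluation week is split into training and testing data, and how the MSE loss is computed. This sketch uses a fixed look-back window of `lookback` weeks, whereas the report used an adaptive retrospective period whose rule is not described; the function names and toy series are hypothetical, and the LSTM itself (50 hidden units, dropout 0.2) is not shown.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], lookback: int,
                 horizon: int) -> Tuple[List[List[float]], List[float]]:
    """Pair each run of `lookback` consecutive weeks with the value `horizon` weeks after it ends."""
    inputs, targets = [], []
    for end in range(lookback, len(series) - horizon + 1):
        inputs.append(list(series[end - lookback:end]))  # weeks end-lookback .. end-1
        targets.append(series[end + horizon - 1])        # week (end - 1) + horizon
    return inputs, targets


def train_test_split_at(series: Sequence[float], origin: int,
                        horizon: int) -> Tuple[List[float], float]:
    """For one evaluation week: train on weeks before `origin`, test on the week `horizon` steps ahead."""
    return list(series[:origin]), series[origin + horizon - 1]


def mean_squared_error(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Training loss: average squared difference between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


weekly_pct = [0.10, 0.12, 0.15, 0.21, 0.30, 0.45]  # hypothetical weekly influenza percentages
X, y = make_windows(weekly_pct, lookback=3, horizon=2)
print(X[0], y[0])  # [0.1, 0.12, 0.15] 0.3
```

A separate set of windows is built for each horizon, so the 1-week and 2-week forecasts are trained against different targets.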

Weekly forecasts were then compared with observed values for each EW using the weighted interval score (WIS)4 and the absolute percentage error (APE). Scores from the LSTM model were then pooled with the scores previously generated for all other models to compare model performance.
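The two evaluation metrics can be written out directly. Following the definition of Bracher et al., the WIS combines the absolute error of the forecast median (weight 1/2) with interval scores for K central prediction intervals (weight alpha_k/2 each), divided by K + 1/2; lower scores are better. This Python sketch is an illustration, not the scoring code used for this report, and the function names and example values are hypothetical.

```python
from typing import Dict, Tuple


def interval_score(lower: float, upper: float, observed: float, alpha: float) -> float:
    """Score for a central (1 - alpha) prediction interval: width plus penalties for misses."""
    score = upper - lower
    if observed < lower:
        score += (2.0 / alpha) * (lower - observed)
    elif observed > upper:
        score += (2.0 / alpha) * (observed - upper)
    return score


def weighted_interval_score(median: float, intervals: Dict[float, Tuple[float, float]],
                            observed: float) -> float:
    """WIS = (0.5 * |y - m| + sum_k (alpha_k / 2) * IS_alpha_k) / (K + 0.5).

    `intervals` maps each alpha_k to the (lower, upper) bounds of the central
    (1 - alpha_k) interval, e.g. alpha = 0.05 for a 95% interval.
    """
    total = 0.5 * abs(observed - median)
    for alpha, (lower, upper) in intervals.items():
        total += (alpha / 2.0) * interval_score(lower, upper, observed, alpha)
    return total / (len(intervals) + 0.5)


def absolute_percentage_error(forecast: float, observed: float) -> float:
    """APE in percent; undefined when the observed value is 0."""
    return 100.0 * abs(observed - forecast) / abs(observed)


# Hypothetical forecast: median 0.2% with a 95% interval of 0.1%-0.3%; observed value 0.5%.
wis = weighted_interval_score(0.2, {0.05: (0.1, 0.3)}, observed=0.5)
ape = absolute_percentage_error(0.2, 0.5)
print(round(wis, 4), round(ape, 1))
```

A zero-width interval whose bounds and median all equal the observed value gives a WIS of exactly 0, which is the case where the log-transformed WIS becomes undefined.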

All analyses and data processing were performed in R version 4.4.2. LSTM models were built with the “torch” package for R, an open-source machine learning framework built on LibTorch, the C++ library that underlies PyTorch.5

Results

WIS, log-transformed WIS, and APE were calculated for 1,924 total forecasts. The average training loss per evaluation week for the LSTM model was 0.5. The Table shows the median log-transformed WIS and median APE for each model for 1-week, 2-week, and combined 1-2-week forecasts. The LSTM model had the lowest median log-transformed WIS for all forecasting horizons: 1 week (0.3), 2 weeks (0.4), and combined 1-2 weeks (0.4). The VAR model had the lowest median APE for all forecasting horizons (37.5%). Figure 1a presents forecasts with 95% confidence interval bands for the LSTM and ENSEMBLE models over the study period. During 2023 EWs 51 and 52, observed influenza encounter percentages peaked at 0.5% and 0.8%, respectively. Both the LSTM and ENSEMBLE models under-predicted these values, however, with estimates ranging from 0.17% to 0.2% during this period. Figure 1b displays grouped boxplots of log WIS for each forecast target for all models, ranked by median log WIS. The LSTM model had the lowest median log WIS, and the POISSON model had the highest.
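Ranking models by median log-transformed WIS, with zero scores dropped as described in the limitations below, can be sketched as follows. The report does not state the logarithm base, so the natural log is assumed here; the model names and scores are illustrative placeholders, not study results.

```python
import math
from statistics import median
from typing import Dict, List, Tuple


def rank_by_median_log_wis(scores: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Return (model, median log WIS) pairs sorted from lowest (best) to highest.

    WIS values of 0 are excluded because log(0) is undefined; a model whose
    scores are all 0 is omitted from the ranking.
    """
    ranking = []
    for model, wis_values in scores.items():
        logs = [math.log(w) for w in wis_values if w > 0]  # natural log (assumed base)
        if logs:
            ranking.append((model, median(logs)))
    return sorted(ranking, key=lambda pair: pair[1])


# Illustrative WIS values only (not results from this report).
scores = {
    "MODEL_A": [1.0, math.e, math.e ** 2],
    "MODEL_B": [0.0, 1.0, 1.0],
    "MODEL_C": [0.0],
}
for model, med in rank_by_median_log_wis(scores):
    print(model, round(med, 3))
```

Because scores of 0 are removed before the median is taken, a model that occasionally produces exact, zero-width forecasts is ranked only on its remaining weeks, which is the source of possible bias noted in the limitations.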

FIGURE 1a. Influenza Encounter Percentage by Forecast Target, Military Health System, November 2023–June 2024. This figure comprises two graphs, each charting observed and forecasted weekly data: one presents 1-week-ahead forecasts and the other 2-week-ahead forecasts. Each graph contains three lines plotted along the horizontal (x) axis: two represent forecasting models (LSTM and ENSEMBLE), and the third plots observed data for the same time periods. The x axis in both graphs spans the months from October 2023 through June 2024. Each line connects 32 data points, each representing a distinct week. The vertical (y) axis measures encounter percentages from 0.00 to 0.75 in increments of 0.25. Shaded areas around the model lines represent 95% confidence intervals for those forecasts. In each graph, both models lagged behind the largest spike in the observed data by 1 week and under-estimated it by nearly one third. The confidence interval for the LSTM model was considerably narrower than that for the ENSEMBLE model.

FIGURE 1b. Weighted Interval Score by Forecast Target. This figure displays two grouped boxplot charts showing the distribution of log-transformed weighted interval score (log WIS) for 10 different forecasting models, one for 1-week-ahead and the other for 2-week-ahead forecasts. Models are ranked by increasing median log WIS from left to right, indicating decreasing forecast accuracy. In the 1-week-ahead boxplot, LSTM has a shorter box and whiskers than the other models, indicating higher prediction accuracy and lower uncertainty. ARIMA, on the other hand, has a shorter distance from the median line to the minimum than the other models, but its box and whiskers are longer, indicating a wider range of WIS values, lower accuracy, and greater uncertainty. All models except EWMA and PROPHET show values clustered around a central point. The 2-week-ahead boxplot shows similar results, with LSTM having a shorter box and ARIMA a longer box. Except for the VAR and PROPHET models, however, the median lines are positioned close to the top edges of the boxes, indicating that most models have skewed distributions.

Discussion

Our analyses indicate that LSTM had the lowest median log WIS among the individual models for all forecasting horizons, indicating more accurate forecasts. These findings align with previous studies that successfully used LSTM models to forecast influenza-like illness and influenza hospitalizations.6,7 Neither the LSTM nor the ENSEMBLE model accurately predicted the peak period, 2023 EWs 51-52 (December 17-30), however. This may be because the training data came from the 2022-2023 influenza season; recent seasonal influenza patterns have exhibited markedly higher peaks earlier in the season than influenza seasons before the COVID-19 pandemic.8,9 To improve influenza peak period forecasts, further analyses may need to include multiple years of training data from both before and after the COVID-19 pandemic.

This study had several limitations. First, no formal cross-validation method was used to optimize hyperparameters and construct the best-performing LSTM model, which may have contributed to poor predictions, particularly in the early weeks of the study period. Further research is needed to optimize the LSTM model for influenza encounter predictions. Second, some WIS values were zero, indicating that the forecast exactly matched the observed value. Scores equal to zero should be interpreted with caution, as they may reflect overconfidence, and they make the log-transformed WIS undefined.10 WIS values equal to 0 were therefore excluded from the calculation of log-transformed WIS, which may have introduced bias by excluding forecasts that were very close to observed values. Third, it cannot be stated with confidence that these results are generalizable to other respiratory diseases or to related metrics such as hospitalizations, admission rates, or case rates. Finally, this analysis does not reflect changes made after the 2023-2024 influenza season to improve forecasting, such as the removal of the ETS, EWMA, PROPHET, and TSLM models. Although the LSTM model outperformed several models included in the ENSEMBLE model, it is likely that the ENSEMBLE model will perform better in the 2024-2025 influenza season.

The findings of this study suggest that adding the LSTM model could improve the short-term forecasting performance of the ENSEMBLE model for outpatient influenza encounter data, which are commonly used to assess the intensity of influenza activity within the MHS population. Further research is recommended to determine the performance of the LSTM model for other respiratory infections, including COVID-19.

Authors’ Affiliation

Armed Forces Health Surveillance Division, Integrated Biosurveillance Branch, Silver Spring, MD: Ms. Cherukuri, Mr. Bova, Ms. Mehta, Dr. Bautista

References

  1. Jang B, Kim I, Kim JW. Effective training data extraction method to improve influenza outbreak prediction from online news articles: deep learning model study. JMIR Med Inform. 2021;9(5):e23305. doi:10.2196/23305 
  2. Armed Forces Health Surveillance Division. Integrated Biosurveillance. Defense Health Agency, U.S. Dept. of Defense. Accessed Jan 3, 2025. https://health.mil/military-health-topics/health-readiness/afhsd/integrated-biosurveillance
  3. Dai S, Han L. Influenza surveillance with Baidu index and attention-based long short-term memory model. PLoS One. 2023;18(1):e0280834. doi:10.1371/journal.pone.0280834
  4. Bracher J, Ray EL, Gneiting T, Reich NG. Evaluating epidemic forecasts in an interval format [published correction in PLoS Comput Biol. 2022;18(10):e1010592. doi:10.1371/journal.pcbi.1010592]. PLoS Comput Biol. 2021;17(2):e1008618. doi:10.1371/journal.pcbi.1008618
  5. Torch for R. Mlverse.org. Accessed Jan 13, 2025. https://torch.mlverse.org
  6. Tsan YT, Chen DY, Liu PY, et al. The prediction of influenza-like illness and respiratory disease using LSTM and ARIMA. Int J Environ Res Public Health. 2022;19(3):1858. doi:10.3390/ijerph19031858 
  7. Li G, Li Y, Han G, et al. Forecasting and analyzing influenza activity in Hebei province, China, using a CNN-LSTM hybrid model. BMC Public Health. 2024;24(1):2171. doi:10.1186/s12889-024-19590-8 
  8. Del Riccio M, Caini S, Bonaccorsi G, et al. Global analysis of respiratory viral circulation and timing of epidemics in the pre-COVID-19 and COVID-19 pandemic eras, based on data from the Global Influenza Surveillance and Response System (GISRS). Int J Infect Dis. 2024;144:107052. doi:10.1016/j.ijid.2024.107052 
  9. Lewis T. Why this year’s flu season is the worst in more than a decade. Scientific American. Published online Mar 3, 2025. Accessed Mar 11, 2025. https://www.scientificamerican.com/article/why-this-years-flu-season-is-the-worst-in-more-than-a-decade
  10. Bosse NI, Abbott S, Cori A, et al. Scoring epidemiological forecasts on transformed scales. PLoS Comput Biol. 2023;19(8):e1011393. doi:10.1371/journal.pcbi.1011393
