Abstract: |
Seasonal influenza epidemics cause severe illnesses and 250,000 to 500,000 deaths worldwide each year.
An influenza outbreak may also escalate into a devastating pandemic such as the 1918 “Spanish Flu”. Reducing the
impact of these threats is of paramount importance for health authorities, and studies have shown that effective
interventions can contain an epidemic if it is detected early. In this paper, we introduce
the Social Network Enabled Flu Trends (SNEFT), a continuous data collection framework that monitors flu-related
tweets and tracks the emergence and spread of an influenza epidemic. We show that text mining significantly
enhances the correlation between Twitter data and the Influenza-Like Illness (ILI) rates provided by the Centers
for Disease Control and Prevention (CDC). For accurate prediction, we implement an autoregressive model with
exogenous inputs (ARX) that uses current Twitter data and CDC ILI rates from previous weeks to
predict current influenza statistics. Our results show that, while past ILI data from the CDC offers an accurate
but delayed assessment of a flu epidemic, Twitter data provides a real-time assessment of current epidemic
conditions and can compensate for the lack of current ILI data. We observe that Twitter data is
highly correlated with the ILI rates across different regions within the USA and can effectively improve
the accuracy of our predictions. Our age-based flu prediction analysis indicates that, for most regions,
Twitter data best fits the 5-24 and 25-49 year age groups, which is consistent with these likely being
the most active age groups on Twitter. Therefore, Twitter data can act as a supplementary indicator to gauge
influenza within a population and can help discover flu trends ahead of the CDC. |
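The correlation between the Twitter signal and the CDC ILI rates can be made concrete with a minimal sketch. It assumes the standard Pearson correlation coefficient is the measure (the abstract does not name one); `pearson_r` is a hypothetical helper, and the weekly numbers are illustrative placeholders, not SNEFT or CDC data. Computing r separately for a raw keyword-count series and for a text-mined (filtered) series is one way to quantify the kind of improvement the abstract describes.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length weekly series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # Covariance and variance sums (the 1/n factors cancel in the ratio).
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Illustrative placeholder series (hypothetical, not SNEFT or CDC data).
weekly_tweet_counts = [120, 150, 200, 260, 300]
weekly_ili_rates = [1.1, 1.4, 1.9, 2.5, 2.9]
print(round(pearson_r(weekly_tweet_counts, weekly_ili_rates), 3))
```

On Python 3.10 and later, `statistics.correlation` computes the same coefficient; the explicit version above just shows the arithmetic.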
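The ARX prediction step can be illustrated with a minimal sketch. It assumes a linear ARX model in which week t's ILI rate y_t is regressed on the previous p weeks of CDC ILI rates and on the current (and q previous) weeks of a Twitter-derived signal x_t:

$$y_t = c + \sum_{i=1}^{p} a_i\, y_{t-i} + \sum_{j=0}^{q} b_j\, x_{t-j} + \varepsilon_t$$

The lag orders p and q, the ordinary least-squares fit via normal equations, and the helper names (`fit_arx`, `predict_current`, `arx_regressors`, `solve_linear`) are assumptions for illustration, not the paper's exact specification, and the demo series is synthetic.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: use fewer lags or more weeks of data")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][k] * w[k] for k in range(r + 1, n))
        w[r] = (m[r][n] - s) / m[r][r]
    return w


def arx_regressors(ili: Sequence[float], tweets: Sequence[float],
                   t: int, p: int, q: int) -> List[float]:
    """Regressors for week t: [1, y_{t-1}, ..., y_{t-p}, x_t, ..., x_{t-q}].

    Only ILI values before week t are read, so ili[t] itself may be unknown.
    """
    lagged_ili = [ili[t - i] for i in range(1, p + 1)]
    lagged_tweets = [tweets[t - j] for j in range(q + 1)]
    return [1.0] + lagged_ili + lagged_tweets


def fit_arx(ili: Sequence[float], tweets: Sequence[float],
            p: int = 1, q: int = 0) -> List[float]:
    """Fit [c, a_1..a_p, b_0..b_q] by ordinary least squares (normal equations)."""
    weeks = range(max(p, q), len(ili))
    rows = [arx_regressors(ili, tweets, t, p, q) for t in weeks]
    targets = [ili[t] for t in weeks]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear(xtx, xty)


def predict_current(coef: Sequence[float], ili: Sequence[float],
                    tweets: Sequence[float], t: int, p: int = 1, q: int = 0) -> float:
    """Nowcast ILI for week t from ILI up to week t-1 and Twitter up to week t."""
    return sum(c * v for c, v in zip(coef, arx_regressors(ili, tweets, t, p, q)))


# Synthetic, noise-free series from a known ARX(1, 0) process
# y_t = 0.5 + 0.6 * y_{t-1} + 0.02 * x_t (illustrative only, not CDC/Twitter data).
tweets = [100, 120, 150, 200, 260, 300, 280, 240, 190, 150]
ili = [2.0]
for t in range(1, len(tweets)):
    ili.append(0.5 + 0.6 * ili[t - 1] + 0.02 * tweets[t])

coef = fit_arx(ili, tweets, p=1, q=0)
print([round(c, 4) for c in coef])  # -> [0.5, 0.6, 0.02]

# New week: the Twitter count is available now, the CDC ILI rate is not yet.
tweets.append(130)
nowcast = predict_current(coef, ili, tweets, t=len(ili), p=1, q=0)
print(round(nowcast, 3))
```

Because `predict_current` reads ILI values only up to week t-1 but Twitter values up to week t, it mirrors the setting described in the abstract: the delayed CDC series supplies history while the real-time Twitter signal supplies the current week. With noisy real data, a numerically stabler least-squares solver (for example one based on QR decomposition) would be preferable to the normal equations used here.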