The main goal of this project is to reliably estimate the 7-day hospitalization incidence for Germany and its states and to assess recent trends based on incomplete data. The most recent values of the raw hospitalization incidence, which are frequently cited, underestimate the true number of hospitalizations (see also the answers to the other questions). Nowcasting corrections of these numbers allow for a better assessment of the current epidemic situation.
At the same time, we have a scientific interest in comparing different nowcasting methods and in assessing whether the combination of different nowcasts yields more reliable results.
We follow the definition by Robert Koch Institute. Today’s seven-day hospitalization incidence is the number of hospitalized cases of COVID-19 (in absolute numbers or per 100,000 population) whose Meldedatum, i.e. the date when the respective infection was electronically recorded at the local health authorities, was in the 7 preceding days. It is thus not equivalent to the number of new hospitalizations during the last 7 days. Also, the 7-day hospitalization incidence does not take into account whether the reason for hospitalization was COVID-19 or not.
Further information on the 7-day hospitalization incidence (in German) is available on the GitHub site of Robert Koch Institute.
In this context we note that some German states (Bundesländer) also publish their own hospitalization incidences, which follow different definitions (e.g., temporal aggregation by date of hospitalization rather than Meldedatum of the infection; see this news item by NDR). We focus exclusively on the indicator provided by RKI.
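The definition above can be illustrated with a minimal Python sketch. The function name, the dictionary layout and the example numbers are hypothetical, and the exact window boundaries (the seven days before today, excluding today) are an assumption made for illustration; the official RKI data may handle them differently.

```python
from datetime import date, timedelta


def seven_day_incidence(counts_by_meldedatum, today, population=None):
    """Sum hospitalized cases whose Meldedatum lies in the 7 days before `today`.

    counts_by_meldedatum: dict mapping a Meldedatum (date) to the number of
        hospitalized cases currently attributed to it (hypothetical layout).
    population: if given, return cases per 100,000 population instead of
        the absolute number.
    """
    # Assumption: the window covers today-7 ... today-1 and excludes today.
    window = [today - timedelta(days=k) for k in range(1, 8)]
    total = sum(counts_by_meldedatum.get(day, 0) for day in window)
    if population is None:
        return total
    return total * 100_000 / population


# Made-up example: 10 hospitalized cases per Meldedatum for ten days.
counts = {date(2021, 11, 1) + timedelta(days=i): 10 for i in range(10)}
absolute = seven_day_incidence(counts, date(2021, 11, 11))
per_100k = seven_day_incidence(counts, date(2021, 11, 11), population=1_000_000)
print(absolute, per_100k)
```

Note that the cases are grouped by Meldedatum, not by the date of hospital admission, so the result changes whenever hospitalizations are added retrospectively to earlier Meldedaten.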
The Meldedatum and the date when a hospitalization first appears in the data set at Robert Koch Institute can be several days or even weeks apart. Several aspects play a role here. Firstly, an infected person may not be in a state which requires hospitalization on the Meldedatum, but reach such a state at a later point. In this case, the number of hospitalizations for the respective Meldedatum will be retrospectively increased by one. Secondly, there can be reporting delays between the actual date of hospitalization and the appearance of the hospitalization in the RKI data.
The daily values of the hospitalization incidence aggregated by Meldedatum are thus usually corrected upwards during the following days and weeks. Most additions occur within a few days, so that the values for the last few days are most strongly affected. Oftentimes they considerably underestimate the true number of hospitalizations (see e.g. this news item by the public broadcaster NDR or CODAG-Report Nr. 21 by LMU Munich; both in German). In particular, preliminary data can create the impression of a decreasing hospitalization incidence even if it is actually on the rise.
Nowcasting is a statistical tool which, based on preliminary data, estimates which value a given quantity will take once measurements are complete. In our application we estimate how many hospitalizations will be reported in total for a given Meldedatum. To this end, information on the hospitalizations already reported up to the current date and data on past reporting delays are used.
The term nowcasting typically refers to events which have already occurred, but have not been measured or reported completely. For instance, nowcasting methods can be used to estimate how many cases of COVID-19 have been detected on a given day before this information has been aggregated into a central data set. This is not exactly the case for the hospitalization incidence, as it is possible that hospitalizations linked to a given Meldedatum have not even occurred yet at the time of nowcasting. We nonetheless use the term nowcasting as it has become common for this type of analysis.
The nowcasts presented here should be interpreted as probability statements. An exact estimation of the true number of hospitalizations is not feasible and nowcasts can merely provide a range of probable values (see below).
Nowcasts are always based on a number of assumptions. Moreover, different models may include different additional data sources. Therefore, results based on different approaches often vary, and it is reasonable to compare different nowcasts to get an idea of the range of predictions. It can also be helpful to combine several nowcasts into a so-called ensemble nowcast to achieve a more robust estimation. This approach has shown benefits for instance in weather forecasting, but also in epidemiological applications.
No model is perfect, and the number of hospitalizations for a given Meldedatum cannot be predicted exactly. The nowcasts displayed here therefore explicitly quantify their own uncertainty, i.e. they state how reliable they consider their own estimation. This is done via intervals which are supposed to contain the true value with a given probability (50% or 95%).
The central assumption on which most nowcasts are built is that the delays between the Meldedatum and the appearance of a hospitalization in the RKI data set will follow similar patterns in the future as they did in the past. If this is not the case, e.g., due to major changes in testing strategies or an overload of the health system, the quality of nowcasts may suffer.
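To make the idea of an ensemble nowcast concrete, here is a minimal Python sketch. It assumes that each model reports its nowcast as a set of quantiles and combines them by taking the median across models at each quantile level. This combination rule, the function name and the example numbers are assumptions for illustration and not necessarily the procedure used on this platform.

```python
from statistics import median


def ensemble_quantiles(nowcasts):
    """Combine nowcasts quantile by quantile using the median across models.

    nowcasts: dict mapping a model name to a dict {quantile level: value}
        (hypothetical layout). Only levels reported by all models are used.
    """
    shared_levels = set.intersection(*(set(q) for q in nowcasts.values()))
    return {
        level: median(model[level] for model in nowcasts.values())
        for level in sorted(shared_levels)
    }


# Made-up nowcasts from three hypothetical models for one Meldedatum.
example = {
    "model_a": {0.025: 90, 0.5: 100, 0.975: 115},
    "model_b": {0.025: 95, 0.5: 110, 0.975: 130},
    "model_c": {0.025: 85, 0.5: 105, 0.975: 120},
}
combined = ensemble_quantiles(example)
print(combined)
```

Taking the median rather than the mean limits the influence of a single model that deviates strongly from the others; a mean would be an equally simple alternative.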
Nowcasts are updated on working days (Monday through Friday). As long as a team has not updated its nowcast yet, its nowcast from the previous day (or the most recent nowcast which is not older than seven days) is displayed. For instance, the RKI nowcast is currently only made available on Thursdays, when a new weekly report appears. We always display this nowcast until a newer version becomes available.
For the last two days a particularly large number of additional reports must be expected. Therefore, nowcasts for these days are less reliable than for days which are further in the past. For this reason we do not show nowcasts for the last two days by default and recommend interpreting them with special care.
An alternative to nowcasting the hospitalization incidence by Meldedatum (i.e. the day when the respective infection was electronically recorded by the local health authority, see above) is to aggregate hospitalizations by the day when they first appeared in the RKI data set. These numbers do not change over the following days, meaning that trends are more straightforward to interpret. Owing to reporting delays, the resulting curve is typically shifted to the right compared to the seven-day hospitalization incidence by Meldedatum.
Another alternative to nowcasting is to show the 7-day hospitalization incidence for each Meldedatum based on the data version from the respective date. This way, all shown values will be similarly incomplete and more comparable over time. A downside of this approach is that it only exploits part of the information already available.
The main goal of this project is to provide nowcasts in real time to allow for an improved assessment of the current situation. However, in order to systematically compare different modelling approaches, we also collect nowcasts which have been created retrospectively and evaluate how they would have performed in the past. For a fair comparison it is crucial that models only use data which would already have been available at the respective time of nowcasting.
This platform is run by members of the Chair of Statistics and Econometrics at Karlsruhe Institute of Technology and the Computational Statistics Group at Heidelberg Institute for Theoretical Studies. Several other independent groups from academia and media contribute nowcasts (see also metadata files in our GitHub repository):
Moreover we display the most current nowcasts from the weekly reports of the Robert Koch Institute.
These nowcasts are based on the proportion of the 7-day incidence of COVID-19 cases that appear in the hospitalization incidence after one, two etc. weeks. The model multiplies today's known 7-day incidence by these weekly proportions and predicts today's 7-day hospitalization incidence by summing these up. The uncertainty is based on a log-normal (within age groups) or normal (sum over all age groups) distribution where the dispersion is estimated based on retrospective application of the model.
Nowcasting is based on a simple estimation of the distribution of delays between the Meldedatum and appearance in the RKI data set (based on the last 60 days). From these, multiplication factors are derived and used for an upward correction of incomplete observations. To assess the nowcast uncertainty, the same corrected values are generated for past time points (based on the information available at the respective time) and compared to the subsequently observed values. To this end we assume a negative binomial distribution, where the dispersion parameter is a function of the time difference between Meldedatum and date of nowcast. Estimation of the dispersion parameter is done via a maximum likelihood approach.
This method is purposefully kept simple and has the role of a reference/baseline model in our comparative evaluation study (see below). The central assumption is that delay distributions are temporally stable. Weekday effects and recent developments in case numbers are not accounted for.
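The upward correction via multiplication factors can be sketched in a few lines of Python. This is only an illustration of the general idea under simplifying assumptions, not the implementation used for the baseline model: the data layout, function names and numbers are hypothetical, and the negative binomial uncertainty quantification is omitted.

```python
def completeness_by_delay(complete_rows):
    """Estimate the share of the final count already reported after each delay.

    complete_rows: list of rows, one per past Meldedatum that is considered
        fully reported; row[d] is the number of hospitalizations that first
        appeared in the data d days after the Meldedatum (hypothetical layout).
    """
    max_delay = len(complete_rows[0])
    grand_total = sum(sum(row) for row in complete_rows)
    shares = []
    reported = 0
    for d in range(max_delay):
        reported += sum(row[d] for row in complete_rows)
        shares.append(reported / grand_total)
    return shares


def point_nowcast(observed_so_far, delay, shares):
    """Scale an incomplete count by the multiplication factor 1 / share."""
    return observed_so_far / shares[delay]


# Made-up data: two fully reported Meldedaten, delays of 0, 1 and 2 days.
shares = completeness_by_delay([[50, 30, 20], [40, 40, 20]])
# 45 hospitalizations known on the Meldedatum itself (delay 0).
estimate = point_nowcast(45, 0, shares)
print([round(s, 2) for s in shares], round(estimate, 1))
```

In this toy example 45% of the final count is known at delay 0, so the multiplication factor is 1 / 0.45 and the 45 observed hospitalizations are scaled up to roughly 100. The sketch inherits the assumption stated above that delay distributions are stable over time.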
Nowcasts are based on a generalized additive model and the sequential multinomial structure of the time delay. The model is a slightly adapted version of the method by Schneble et al. (2020) for nowcasting of fatal infections.
This model is a simplified version of the model presented by van de Kassteele, Eilers and Wallinga (Epidemiology, 2019). The reported counts by date and delay are described by a negative binomial distribution. The expected values are modelled by a two-dimensional P-spline surface and other covariates. This surface is extrapolated for all dates and delays outside the reporting triangle. The nowcast is obtained by summing all counts over the delays by date. Prediction intervals are obtained by Monte Carlo simulation from the predictive distribution. The simplified model omits the calendar time x delay interaction as well as the unimodality and boundary constraints. Model fitting is done efficiently using the mgcv package in R.
These nowcasts are taken from the weekly reports of the Robert Koch Institute (updated on Thursdays). These are based on a modified version of the nowcasting method for 7-day case incidences.
As the RKI nowcasts are currently only communicated via the weekly reports, we read out the numbers from the respective figures, which may lead to minor imprecisions.
SZ estimates the nowcasting values for the hospitalization incidence based on differences between the daily published values and the retrospectively corrected values resulting from later reports. To this end the archived data sets of Robert Koch Institute (https://github.com/robert-koch-institut/COVID-19-Hospitalisierungen_in_Deutschland) from the last 60 days are analysed. For each of the 25 days before the last date in the data set we compute by how many percent the later corrected value differs from the originally reported value. Quantiles of the resulting multiplication factors are computed, and the currently reported hospitalization incidence is multiplied by these quantiles to estimate the total incidence. Finally, the results are smoothed over a three-day window to remove unrealistic fluctuations.
A semi-parametric nowcasting method for right-censored hospitalisations by date of positive test. Hospitalisations are modelled using a random walk on the log scale. Reporting delays are then modelled parametrically using a lognormal distribution with the log mean and log standard deviation each modelled using a weekly random walk with a pooled standard deviation. Report date effects are modelled using a random effect for day of the week, with public holidays assumed to be reported like Sundays. Age groups and locations are nowcast independently (thus the name of the model). The model is implemented using the epinowcast R package. The analysis code is available here. Note: There is a second version of this model currently not displayed in which the different time series are modelled in a hierarchical manner (Epiforecasts-hierarchical).
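The two main steps of this approach, quantiles of multiplication factors and three-day smoothing, can be sketched in Python as follows. This is a hedged illustration and not the SZ code: the function names, the chosen quantile levels (5%, 50%, 95%), the centred smoothing window and all numbers are assumptions.

```python
from statistics import quantiles


def factor_quantiles(initial_values, corrected_values):
    """Quantiles of the factors by which initial values were later corrected."""
    factors = [c / i for i, c in zip(initial_values, corrected_values) if i > 0]
    # n=20 yields 19 cut points at 5%, 10%, ..., 95%.
    cuts = quantiles(factors, n=20, method="inclusive")
    return {"lower": cuts[0], "median": cuts[9], "upper": cuts[18]}


def smooth3(values):
    """Centred three-day moving average; the window is shortened at the edges."""
    smoothed = []
    for k in range(len(values)):
        window = values[max(0, k - 1): k + 2]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Made-up example: five past days, initially reported vs. later corrected.
q = factor_quantiles([10, 10, 10, 10, 10], [11, 12, 13, 14, 15])
current_incidence = 4.0  # hypothetical currently reported value
nowcast = {name: current_incidence * factor for name, factor in q.items()}
print(nowcast)
print(smooth3([3, 6, 9, 12]))
```

In practice the factors would be computed separately for each number of days elapsed since the Meldedatum, since more recent values are corrected more strongly than older ones.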
All nowcasts are available under open licences in a public GitHub repository. They can be re-used for a broad range of purposes provided that the source is acknowledged (check the respective license files in the repository for details). You are invited to briefly get in touch with the organizing team or the creators of the respective nowcasts when re-using them publicly.