2020: Curious Case of ‘Missing’ Patients in Germany

Sameer S
Apr 20, 2021

Besides lockdowns and “work from home”, another thing 2020 has accustomed us to is the ever-increasing number of ‘affected’ patients. Unfortunately, we are now used to seeing graphs and trends showing a growing incidence rate (of COVID-19) and normalizing it in our heads. While that in itself is a scary pattern to observe and track, this singular focus on COVID-19 in recent months hides an even scarier possibility: that many other diseases and indications might not be getting appropriate attention from the medical community, which could have long-term, damaging or even irreversible effects on patients (and their caregivers).

Using the database of notifiable diseases (SurvStat@RKI 2.0) maintained by the Robert Koch Institute (RKI), I analyzed the historical incidence of 75+ diseases in Germany and statistically evaluated the trends observed in 2020. The analytical models were designed to:

- Identify the diseases and indications for which the number of cases reported in 2020 is significantly different from historical trends

- Measure the difference between actual reported cases and expected cases, where expected cases are the number of cases we would have observed in a ‘normal’ 2020, as predicted from historical trends
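Stated as formulas, the two goals reduce to a per-disease shortfall and a range check. The symbols are notation introduced here for illustration, not taken from the original models: y_d is the number of cases reported for disease d in 2020, ŷ_d is the most likely expected count, and [ℓ_d, u_d] is the expected range (conservative lower limit to upper limit). A positive Δ_d means fewer cases were reported than expected.

```latex
\Delta_d = \hat{y}_d - y_d,
\qquad
d \text{ is flagged as significantly different if } y_d < \ell_d \ \text{or}\ y_d > u_d
```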

While the aim of this paper is to use quantitative methods to highlight the diseases / indications that were either ‘under-diagnosed or under-reported’ in 2020, a subsequent study is needed to understand the reasons behind these ‘missing’ patients.

Stacking the data

SurvStat@RKI 2.0 maintains a database covering the last 20+ years of cases for about 77 notifiable diseases and pathogen confirmations in Germany. The data is updated weekly for most diseases but refreshed monthly for some pathogens (e.g., HIV, Syphilis). Even though data for many diseases is available for over 20 years, I used only the last 5 years in my analytical models: this provided enough data points to build a robust statistical model while ensuring that ‘older’ historical data did not introduce unwanted bias.

As a first step, I measured the Year-on-Year (YoY) change in the number of cases between 2019 and 2020. While this is a fairly basic KPI, the data did reveal some large relative changes in case numbers across the two years. E.g., reported cases of Noroviral gastroenteritis dropped by 62%, while cases of Seasonal Influenza increased by 1% in 2020 compared with 2019.
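The YoY change is simply the difference between the two annual counts relative to the earlier year. Below is a minimal sketch; the disease names and counts are hypothetical placeholders chosen only to mirror the magnitude of the changes quoted above, not actual RKI figures.

```python
def yoy_change(cases_prev: int, cases_curr: int) -> float:
    """Relative year-on-year change in percent (negative means a drop)."""
    if cases_prev == 0:
        raise ValueError("previous-year count must be non-zero")
    # Multiply before dividing to keep integer inputs exact where possible.
    return (cases_curr - cases_prev) * 100.0 / cases_prev


# Hypothetical annual counts for illustration only (NOT RKI figures): (2019, 2020).
annual_cases = {
    "Disease A": (1000, 380),
    "Disease B": (2000, 2020),
}

for disease, (c2019, c2020) in annual_cases.items():
    print(f"{disease}: {yoy_change(c2019, c2020):+.1f}%")
```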

(Chart also published online at https://datawrapper.dwcdn.net/0hzvl/1/)

In summary, the total number of reported cases across all diseases monitored by RKI (excluding COVID-19) dropped by ~27% from 2019 to 2020. This amounts to about 130,000 fewer cases reported in 2020 than in 2019.
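The aggregate figure sums the annual counts of every monitored disease except COVID-19 before comparing the two years. The sketch below assumes a simple mapping of disease name to (2019, 2020) counts; the data and the `aggregate_change` helper are hypothetical illustrations, not the author's code.

```python
# Hypothetical per-disease annual totals (NOT RKI figures): name -> (2019, 2020).
annual_cases = {
    "Disease A": (300_000, 120_000),
    "Disease B": (150_000, 140_000),
    "COVID-19": (0, 1_000_000),
}


def aggregate_change(cases, exclude=("COVID-19",)):
    """Return (absolute, relative %) change of summed cases, skipping excluded diseases."""
    prev = sum(c19 for name, (c19, _) in cases.items() if name not in exclude)
    curr = sum(c20 for name, (_, c20) in cases.items() if name not in exclude)
    return curr - prev, (curr - prev) * 100.0 / prev


absolute, relative = aggregate_change(annual_cases)
print(f"{absolute:+,} cases ({relative:+.1f}%)")
```

Excluding COVID-19 matters: including it here would turn an aggregate drop into a large increase.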

(Chart also published online at https://datawrapper.dwcdn.net/RocU0/1/)

In an ideal scenario, this would be very positive news, as it could mean that the incidence and infection rates of many diseases dropped significantly in 2020. But it could also mean that some diseases were under-diagnosed or under-reported because of limited access to healthcare, driven largely by the acute strain on medical infrastructure in 2020.

Looking through the lens

While the Year-on-Year change (both relative and absolute) in reported cases between 2019 and 2020 gives a strong directional sense of the trend at the aggregate level, it misses disease-specific nuances. E.g., “Hantavirus disease” cases dropped by 85.4% in 2020 (compared to 2019), whereas “Legionnaires’ disease” cases dropped by just 17.1%. However, these relative changes hide some very important factors, such as the historical trends and seasonality of each disease, among many others. As is evident in the chart below, “Hantavirus disease” shows a cyclical pattern in which case numbers appear to peak and ebb roughly every 2 years. In contrast, “Legionnaires’ disease” cases had been increasing year on year for the 4 years before 2020, and then dropped in 2020. Hence, the 17.1% drop in “Legionnaires’ disease” cases is an outlier (a statistically significant change), whereas the same conclusion cannot be drawn with reasonable confidence for “Hantavirus disease”.
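To illustrate why the trend matters, here is a minimal sketch that extrapolates an ordinary least-squares line through hypothetical, steadily rising annual counts. This is a deliberately simple, swapped-in technique, not the model used in this analysis, and it does not capture multi-year cycles like the one seen for Hantavirus disease.

```python
def linear_trend_forecast(years, counts, target_year):
    """Fit an OLS line through (year, count) pairs and return its value at target_year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    # Centering on the means keeps the arithmetic numerically stable for large year values.
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * target_year


# Hypothetical annual counts with a rising trend (NOT RKI figures).
years = [2016, 2017, 2018, 2019]
counts = [1000, 1100, 1200, 1300]
observed_2020 = 1080

expected_2020 = linear_trend_forecast(years, counts, 2020)
naive_drop = (counts[-1] - observed_2020) * 100.0 / counts[-1]       # vs. 2019 only
trend_drop = (expected_2020 - observed_2020) * 100.0 / expected_2020  # vs. trend
print(f"expected {expected_2020:.0f}, naive drop {naive_drop:.1f}%, "
      f"trend-adjusted drop {trend_drop:.1f}%")
```

For a disease that was growing every year, the shortfall against the trend is larger than the plain YoY drop suggests, which is the Legionnaires’ disease situation described above.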

(Chart also published online at https://datawrapper.dwcdn.net/b9FuD/1/)

Digging deeper with a fine-tooth comb

Given the limited statistical value of simply comparing the 2019 case count with 2020, it was important to include data from a few more years without sacrificing data recency and relevance. Therefore, I used historical data from the last 5 years to develop a statistical model that accounts for underlying historical trends when predicting the expected number (and range) of cases for a ‘normal’ 2020.

The actual case count in 2020 for each disease was then compared against the range of expected values (the output of the predictive model) to identify the diseases with a statistically significant difference between observed and predicted values. Since the RKI database provides weekly data, the baseline period offered enough data points to build a reliable model that identifies and measures seasonality and trends in case numbers.
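The article does not specify the exact model, so the following is only a minimal sketch of the general idea, using a simple seasonal baseline as a stand-in: for each calendar week, the mean and sample standard deviation of that week across the baseline years give an expected range (mean ± z·sd, assuming a normal approximation with z = 1.96), and observed weeks outside that range are flagged. Unlike the model described above, this sketch ignores multi-year trends. The data and function names are hypothetical.

```python
from statistics import mean, stdev


def weekly_expected_ranges(baseline, z=1.96):
    """baseline: one list of weekly counts per baseline year (same weeks in each).
    Returns a (lower, most_likely, upper) tuple for each week of the year."""
    ranges = []
    for week_counts in zip(*baseline):  # the same calendar week across all baseline years
        m = mean(week_counts)
        s = stdev(week_counts)  # sample sd; rough with only 5 years of data
        # Case counts cannot be negative, so clip the lower limit at zero.
        ranges.append((max(0.0, m - z * s), m, m + z * s))
    return ranges


def flag_weeks(observed, ranges):
    """Return 1-based week numbers whose observed count falls outside the expected range."""
    return [week for week, (obs, (lo, _, hi)) in enumerate(zip(observed, ranges), start=1)
            if obs < lo or obs > hi]


# Hypothetical 4-week excerpt for 5 baseline years (NOT RKI figures).
baseline = [
    [50, 60, 70, 80],
    [52, 58, 72, 78],
    [48, 62, 68, 82],
    [51, 59, 71, 79],
    [49, 61, 69, 81],
]
observed_2020 = [50, 30, 70, 80]

ranges = weekly_expected_ranges(baseline)
print(flag_weeks(observed_2020, ranges))  # → [2]
```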

The model’s output was a range of expected case counts for each disease at the weekly level. To simplify reporting in this paper, however, I aggregated the weekly numbers to annual figures. Plotting the conservative (lower limit) and most likely (reasonable prediction) scenarios against the observed values gives a better understanding of the actual change in case count for each disease in 2020.
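Aggregating to annual figures amounts to summing each weekly scenario and the observed weekly counts separately. A minimal sketch, assuming weekly predictions are stored as hypothetical (lower, most likely, upper) tuples:

```python
def annual_totals(weekly_ranges, observed_weekly):
    """Sum weekly (lower, most_likely, upper) predictions and observed counts to annual figures."""
    conservative = sum(lo for lo, _, _ in weekly_ranges)
    most_likely = sum(m for _, m, _ in weekly_ranges)
    observed = sum(observed_weekly)
    return {
        "observed": observed,
        "conservative": conservative,
        "most_likely": most_likely,
        # Positive shortfall means fewer cases were reported than predicted.
        "shortfall_conservative": conservative - observed,
        "shortfall_most_likely": most_likely - observed,
    }


# Hypothetical weekly predictions (lower, most likely, upper) and observed counts.
weekly_ranges = [(40.0, 50.0, 60.0), (45.0, 55.0, 65.0), (50.0, 60.0, 70.0)]
observed_weekly = [30, 35, 40]

print(annual_totals(weekly_ranges, observed_weekly))
```

Note that summing weekly lower limits generally yields an even more conservative annual figure than a properly aggregated interval for the annual total would, which suits a “conservative” scenario.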

(Chart also published online at https://datawrapper.dwcdn.net/b9FuD/1/)

A detailed disease-level chart plotting predicted and actual values at a sub-annual level has been published here.

Aggregating these differences between reported and expected case numbers reveals a staggering reality: under the most likely prediction scenario, we observe a difference of about 98,000 cases across the different diseases / indications.
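The total is a sum of per-disease differences between expected and reported annual counts. The article does not state whether the sum covers all diseases or only those flagged as significantly different, so this minimal sketch exposes both through a hypothetical `only_significant` parameter; the data are placeholders, not RKI figures.

```python
# Hypothetical annual figures per disease (NOT RKI figures).
results = {
    "Disease A": {"observed": 1200, "most_likely": 2000, "significant": True},
    "Disease B": {"observed": 900, "most_likely": 1000, "significant": False},
    "Disease C": {"observed": 500, "most_likely": 450, "significant": False},
}


def total_missing(results, only_significant=False):
    """Sum (most_likely - observed) over diseases; positive means fewer cases than expected."""
    return sum(
        r["most_likely"] - r["observed"]
        for r in results.values()
        if r["significant"] or not only_significant
    )


print(total_missing(results))                         # all diseases: 800 + 100 - 50 = 850
print(total_missing(results, only_significant=True))  # flagged diseases only: 800
```

Diseases with more cases than expected (like Disease C) offset the shortfall when all diseases are included, which is one reason the choice matters.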

(Chart also published online at https://datawrapper.dwcdn.net/uASl7/1/)

Connecting the dots

While the analytical model can identify the number of cases that may be deemed ‘missing’ from 2020, we now need to understand what could be driving this. Is it an outcome of under-reporting or under-diagnosis, or can it be attributed to a ‘silver lining’ of the lockdowns?

A case can certainly be made for contagious diseases commonly prevalent among young children, such as “Rotaviral”, “Noroviral”, “Chickenpox”, “Mumps” etc. For such communicable diseases, we can reasonably argue that there were far fewer opportunities for spread, as schools were closed for most of the year. Hence, transmission was largely contained, which significantly reduced case numbers in 2020. But the same cannot be said of diseases that do not primarily spread from person to person: did a stronger focus on cleanliness and hygiene to prevent COVID-19 also help lower infections with Campylobacter bacteria (“Campylobacteriosis”)?

Nevertheless, this makes a good case for continuing to track these trends into 2021 and developing more nuanced models to study the prevalence and risks associated with these disease areas.
