# Using a delay-adjusted case fatality ratio to estimate under-reporting

**This study has not yet been peer reviewed.**

## Aim

To estimate the percentage of symptomatic COVID-19 cases reported in different countries, using case fatality ratio estimates based on data from the ECDC and correcting for delays between confirmation and death.

## Methods Summary

In real time, dividing deaths-to-date by cases-to-date leads to a biased estimate of the case fatality ratio (CFR), because this calculation accounts neither for delays from confirmation of a case to death nor for under-reporting of cases.

Using the distribution of the delay from hospitalisation-to-death for cases that are fatal, we can estimate how many cases so far are expected to have known outcomes (i.e. death or recovery), and hence adjust the naive estimates of CFR to account for these delays.

The adjusted CFR does not account for under-reporting. However, the best available estimates of CFR (adjusting or controlling for under-reporting) are in the 1% to 1.5% range. We assume a baseline CFR, taken from a large study in China, of 1.38% (95% crI: 1.23–1.53%) [1]. If a country has an adjusted CFR that is higher (e.g. 20%), it suggests that only a fraction of cases have been reported (in this case, approximately \(\frac{1.38\%}{20\%} = 6.9\%\) of cases reported).

## Current estimates for percentage of symptomatic cases reported for countries with greater than ten deaths

### Figure

*Figure 1: Estimates of the proportion of symptomatic cases reported in different countries, using delay-corrected CFR (cCFR) estimates. Blue shading is the 2.5% - 97.5% confidence range. Note that there is a mean delay of 13 days between confirmation and death, and so these estimates reflect the percentage of cases being reported as of around two weeks ago.*

### Table

Country | Percentage of cases reported (95% CI) | Total cases | Total deaths |
---|---|---|---|
Albania | 9.1% (5.2% - 18%) | 243 | 15 |
Algeria | 6.8% (4.6% - 11%) | 584 | 35 |
Andorra | 15% (7.8% - 31%) | 376 | 12 |
Argentina | 19% (11% - 32%) | 966 | 24 |
Australia | 100% (69% - 100%) | 4707 | 20 |
Austria | 42% (32% - 56%) | 10182 | 128 |
Belgium | 7.4% (6.2% - 8.8%) | 12775 | 705 |
Bosnia and Herzegovina | 13% (7.1% - 28%) | 413 | 12 |
Brazil | 11% (8.9% - 14%) | 5717 | 201 |
Burkina Faso | 8.7% (4.7% - 18%) | 246 | 12 |
Canada | 31% (23% - 42%) | 8536 | 96 |
Chile | 89% (46% - 100%) | 2738 | 12 |
China | 33% (29% - 38%) | 82295 | 3310 |
Colombia | 24% (13% - 46%) | 906 | 16 |
Czech Republic | 53% (33% - 86%) | 3308 | 31 |
Denmark | 20% (15% - 27%) | 2860 | 90 |
Dominican Republic | 7.1% (5% - 10%) | 1109 | 51 |
Ecuador | 14% (9.9% - 19%) | 2302 | 79 |
Egypt | 9.7% (6.6% - 15%) | 656 | 41 |
Finland | 48% (27% - 92%) | 1384 | 17 |
France | 6.9% (6% - 7.9%) | 52128 | 3523 |
Germany | 47% (39% - 56%) | 67366 | 732 |
Greece | 16% (11% - 24%) | 1314 | 49 |
Hungary | 15% (8.6% - 29%) | 492 | 16 |
India | 17% (11% - 27%) | 1397 | 35 |
Indonesia | 5.3% (4.1% - 6.9%) | 1528 | 136 |
Iran | 9.9% (8.5% - 11%) | 44606 | 2898 |
Iraq | 7% (4.9% - 10%) | 694 | 50 |
Ireland | 20% (15% - 29%) | 3235 | 71 |
Israel | 100% (60% - 100%) | 4916 | 20 |
Italy | 5.8% (5.1% - 6.5%) | 105792 | 12430 |
Japan | 26% (18% - 38%) | 1953 | 56 |
Lebanon | 26% (14% - 55%) | 463 | 12 |
Luxembourg | 48% (29% - 84%) | 2178 | 23 |
Malaysia | 46% (30% - 72%) | 2626 | 37 |
Mexico | 17% (11% - 27%) | 1215 | 29 |
Morocco | 6% (4.1% - 9.2%) | 617 | 36 |
Netherlands | 5.9% (5% - 6.9%) | 12595 | 1039 |
Norway | 100% (63% - 100%) | 4447 | 28 |
Pakistan | 37% (22% - 62%) | 2039 | 26 |
Panama | 17% (11% - 28%) | 1181 | 30 |
Peru | 16% (10% - 26%) | 1065 | 30 |
Philippines | 7.4% (5.5% - 10%) | 2084 | 88 |
Poland | 30% (19% - 48%) | 2311 | 33 |
Portugal | 18% (14% - 23%) | 7443 | 160 |
Romania | 12% (8.9% - 17%) | 2245 | 69 |
Russia | 40% (23% - 76%) | 2337 | 17 |
San Marino | 7.7% (4.9% - 13%) | 230 | 26 |
Serbia | 14% (8.8% - 25%) | 900 | 23 |
Slovenia | 40% (21% - 83%) | 814 | 13 |
South Korea | 69% (53% - 90%) | 9786 | 163 |
Spain | 5.4% (4.7% - 6.1%) | 94417 | 8189 |
Sweden | 14% (11% - 18%) | 4435 | 180 |
Switzerland | 24% (20% - 30%) | 16108 | 373 |
Turkey | 15% (12% - 19%) | 15422 | 214 |
Ukraine | 11% (5.9% - 22%) | 549 | 13 |
United Kingdom | 5.4% (4.6% - 6.2%) | 25150 | 1789 |
United States of America | 16% (14% - 19%) | 189618 | 4079 |

*Table 1: Estimates for the proportion of symptomatic cases reported in different countries using cCFR estimates based on case and death time-series data from the ECDC. Total cases and deaths in each country are also shown. Confidence intervals were calculated using an exact binomial test at the 95% confidence level.*
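As a rough illustration of how an exact binomial interval can be obtained, the sketch below computes a Clopper-Pearson interval by bisection on the binomial CDF, using only the Python standard library. Treating deaths as successes and cases with known outcomes as trials, and then mapping the resulting CFR bounds through the 1.38% baseline, is an assumption made here for illustration and is not a confirmed description of the analysis code. The function names and the example counts are hypothetical.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(1.0, total)


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for a binomial proportion k/n."""
    def solve(f, target):
        # f is decreasing in p; bisect for the p where f(p) == target.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


def reported_interval(deaths, known_cases, baseline=0.0138):
    """Map CFR bounds to bounds on the fraction reported (assumed propagation)."""
    cfr_lo, cfr_hi = clopper_pearson(deaths, known_cases)
    rep_lo = min(1.0, baseline / cfr_hi)
    rep_hi = 1.0 if cfr_lo == 0.0 else min(1.0, baseline / cfr_lo)
    return rep_lo, rep_hi


# Hypothetical counts: 15 deaths among 100 cases with known outcomes.
low, high = reported_interval(15, 100)
print(f"fraction reported: {low:.1%} to {high:.1%}")
```

A higher CFR bound corresponds to a lower reported fraction, so the interval endpoints swap when they are mapped through the baseline. The bisection evaluates the CDF term by term, which is adequate for a sketch but slow for countries with many thousands of deaths.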

## Adjusting for outcome delay in CFR estimates

During an outbreak, the naive CFR (nCFR), i.e. the ratio of reported deaths to date to reported cases to date, will underestimate the true CFR because the outcome (recovery or death) is not known for all cases [2]. We can therefore estimate the true denominator for the CFR (i.e. the number of cases with known outcomes) by accounting for the delay from confirmation to death [4].

We assumed the delay from confirmation to death followed the same distribution as the estimated delay from hospitalisation to death, based on data from the COVID-19 outbreak in Wuhan, China, between the 17th December 2019 and the 22nd January 2020, accounting for right-censoring in the data as a result of as-yet-unknown disease outcomes (Figure 1, panels A and B in [5]). The distribution used is a lognormal fit with a mean delay of 13 days and a standard deviation of 12.7 days [5].

To correct the CFR, we use the case and death incidence data to estimate the number of cases with known outcomes [3,4]:

\[ u_{t} = \frac{\sum_{i = 0}^t \sum_{j = 0}^{\infty} c_{i-j} f_j}{\sum_{i = 0}^t c_i}, \]

where \(u_t\) represents the underestimation of the known outcomes [2–4] and is used to scale the value of the cumulative number of cases in the denominator in the calculation of the delay-corrected CFR (cCFR), \(c_{t}\) is the daily case incidence at time \(t\), and \(f_t\) is the proportion of cases with a delay of \(t\) days between confirmation and death.
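A minimal sketch of this correction in Python is given below. It is illustrative only: the way the lognormal distribution is discretised into daily probabilities \(f_j\) (differences of the CDF over one-day intervals), the truncation of the inner sum at \(j = i\) (cases before day 0 are taken to be zero), and all function names are assumptions made here, not details taken from the analysis code.

```python
import math


def lognormal_params(mean, sd):
    """Convert a lognormal mean and standard deviation to (mu, sigma) on the log scale."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)


def lognormal_cdf(x, mu, sigma):
    """Cumulative distribution function of a lognormal distribution."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def delay_pmf(max_delay, mean=13.0, sd=12.7):
    """Daily delay probabilities f_j, j = 0..max_delay (assumed discretisation)."""
    mu, sigma = lognormal_params(mean, sd)
    return [lognormal_cdf(j + 1, mu, sigma) - lognormal_cdf(j, mu, sigma)
            for j in range(max_delay + 1)]


def known_outcome_fraction(cases):
    """u_t: expected share of cumulative cases whose outcome is already known."""
    t = len(cases) - 1
    f = delay_pmf(t)
    known = 0.0
    for i in range(t + 1):
        for j in range(i + 1):  # c_{i-j} is zero for j > i
            known += cases[i - j] * f[j]
    total = sum(cases)
    return known / total if total > 0 else float("nan")


def corrected_cfr(cases, deaths):
    """cCFR: cumulative deaths divided by cases with known outcomes."""
    u_t = known_outcome_fraction(cases)
    return sum(deaths) / (u_t * sum(cases))


# Hypothetical daily incidence for a growing outbreak.
cases = [10, 20, 40, 80, 160, 320]
deaths = [0, 0, 1, 1, 3, 6]
print(f"nCFR = {sum(deaths) / sum(cases):.2%}, cCFR = {corrected_cfr(cases, deaths):.2%}")
```

Because \(u_t \le 1\), the corrected estimate is never smaller than the naive one, and the gap is largest early in a fast-growing outbreak when most cases are recent.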

## Approximating the proportion of symptomatic cases reported

At this stage, we have calculated CFR estimates for each country that correct for delay to outcome, but not for under-reporting. Published estimates of the CFR that adjust or control for under-reporting range between 1% and 1.5% [1,4,6]. We assume a CFR of 1.38% (95% crI 1.23% - 1.53%), taken from a recent large study [1], as a baseline CFR. We use it to approximate the potential level of under-reporting in each country. Specifically, for each country we calculate \(\frac{1.38\%}{\text{cCFR}}\) to estimate an approximate fraction of cases reported.
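The calculation itself is a single ratio, sketched below in Python. The cap at 100% is an assumption made here, chosen because Table 1 reports no value above 100%; the function name is hypothetical.

```python
BASELINE_CFR = 0.0138  # assumed baseline CFR of 1.38% [1]


def fraction_reported(ccfr, baseline=BASELINE_CFR):
    """Approximate fraction of symptomatic cases reported, capped at 100%."""
    if ccfr <= 0:
        raise ValueError("cCFR must be positive")
    return min(1.0, baseline / ccfr)


# A delay-corrected CFR of 20% implies that roughly 6.9% of cases are reported.
print(f"{fraction_reported(0.20):.1%}")
# A cCFR below the baseline is capped at 100% reported.
print(f"{fraction_reported(0.01):.0%}")
```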

## Limitations

Implicit in assuming that the under-reporting is \(\frac{1.38\%}{\text{cCFR}}\) for a given country is that the deviation away from the assumed 1.38% CFR is entirely down to under-reporting. In reality, the burden on the healthcare system is a likely contributing factor to CFR estimates higher than 1.38%, along with many other country-specific factors.

The following is a list of the other prominent assumptions made in our analysis:

We assume that people get tested upon hospitalisation. A few examples where this is not the case are Germany and South Korea, where people can get tested earlier.

We assume that the delay from hospitalisation to death in early Wuhan is representative of all other countries (by using the distribution parameterised using early Wuhan data) and that all countries have the same risk and age profile as Wuhan.

Severity of COVID-19 is known to increase with age. Therefore, countries with older populations will naturally see higher death rates. We are extending this analysis to adjust for the age distribution for countries with more than five reported deaths and where age distribution data is available.

All results are linked and biased by the baseline CFR, assumed at 1.38% [1].

The under-reporting estimate is very sensitive to the baseline CFR, meaning that small errors in it lead to large errors in the estimate for under-reporting.
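To make this dependence concrete, the short sketch below recomputes the reported fraction at the lower, central and upper values of the baseline credible interval for two hypothetical cCFR values. Because the estimate is the ratio of baseline to cCFR, any relative error in the baseline carries over in full to the reported fraction. The function name and the chosen cCFR values are illustrative assumptions.

```python
def fraction_reported(ccfr, baseline):
    """Approximate fraction of cases reported for a given baseline CFR, capped at 100%."""
    return min(1.0, baseline / ccfr)


# Baseline CFR and the bounds of its 95% credible interval [1].
baselines = {"lower": 0.0123, "central": 0.0138, "upper": 0.0153}

# Hypothetical delay-corrected CFRs of 5% and 20%.
for ccfr in (0.05, 0.20):
    for label, baseline in baselines.items():
        share = fraction_reported(ccfr, baseline)
        print(f"cCFR {ccfr:.0%}, {label} baseline {baseline:.2%}: {share:.2%} reported")
```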

## Code and data availability

The code is publicly available at https://github.com/thimotei/CFR_calculation. The data required for this analysis are time series of both cases and deaths, along with the corresponding delay distribution. The data are taken from the ECDC, using the NCoVUtils package [7].

## References

1 Verity R, Okell LC, Dorigatti I *et al.* Estimates of the severity of COVID-19 disease. *medRxiv* 2020.

2 Kucharski AJ, Edmunds WJ. Case fatality rate for Ebola virus disease in West Africa. *The Lancet* 2014;**384**:1260.

3 Nishiura H, Klinkenberg D, Roberts M *et al.* Early epidemiological assessment of the virulence of emerging infectious diseases: A case study of an influenza pandemic. *PLoS One* 2009;**4**.

4 Russell TW, Hellewell J, Jarvis CI *et al.* Estimating the infection and case fatality ratio for COVID-19 using age-adjusted data from the outbreak on the Diamond Princess cruise ship. *medRxiv* 2020.

5 Linton NM, Kobayashi T, Yang Y *et al.* Incubation period and other epidemiological characteristics of 2019 novel coronavirus infections with right truncation: A statistical analysis of publicly available case data. *Journal of Clinical Medicine* 2020;**9**:538.

6 Guan W-j, Ni Z-y, Hu Y *et al.* Clinical characteristics of coronavirus disease 2019 in China. *New England Journal of Medicine* 2020.

7 Abbott S MJ Hellewell J. NCoVUtils: Utility functions for the 2019-nCoV outbreak. *doi:10.5281/zenodo.3635417* 2020.