COVID Data Gaps And Inaccuracies: New Reports From Science And Nature

George Calhoun wrote:

  • “All governments want to downplay the degree of deaths… and reported deaths greatly underestimate pandemic-associated mortality.” – Science (January 14, 2022)
  • “Reported COVID-19 numbers are clearly too low or missing entirely… Everyone knows any answer will be provisional and imprecise. But they hope to counter misleading claims prompted by official figures, such as China’s count of just under 5,000 COVID-19 deaths.” – Nature (January 18, 2022)

Except for the crisis in Ukraine, the most serious cloud overhanging the global economy right now is the slowdown in China, driven (in part) by the re-imposition of lockdowns and other severe countermeasures in response to the latest Covid outbreaks in several Chinese cities.

This has raised the question of whether China’s Zero-Covid policy is really working, and at what cost – which in turn has called attention to obvious gaps, anomalies and discrepancies in Beijing’s reported figures for Covid infection and mortality.

The media has mostly ignored the data issue until recently, but this week two new reports in the leading international scientific journals, Science and Nature, have spotlighted the problem.

The Science Article: The India Case

The latest issue of Science reports on a study of Covid mortality statistics in India, which concludes that the official Covid death statistics vastly under-estimate India’s actual death rate.

The study estimates that in 2020 and 2021 there were approximately 3.25 million excess deaths – deaths above the historical trend line – a number 6 to 7 times higher than the official Covid death count (480,000) reported by the Indian government.

The article cites a number of previous studies that also found significant excess mortality for India. A model developed by The Economist magazine (described in my previous columns) gives an even higher number, which Nature magazine calls “sadly plausible”:

“The model estimates some 5 million deaths in India, for example, 10 times higher than the country’s official COVID-19 toll.”
Another recent study by the National Bureau of Economic Research puts the estimate for India’s excess deaths at “roughly 6.3 million” — which is higher than the current official Covid death toll for the entire world (~ 5.6 million).
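
The arithmetic behind these comparisons is simple enough to check directly. The following is a minimal Python sketch that divides each estimate quoted above by the official count of 480,000. The names are illustrative, and the ratios are only approximate, since the three estimates may not cover exactly the same period as the official figure.

```python
# Figures quoted in the text above; names are illustrative only.
OFFICIAL_INDIA_DEATHS = 480_000

ESTIMATES = {
    "Science study": 3_250_000,
    "The Economist model": 5_000_000,
    "NBER study": 6_300_000,
}


def undercount_ratio(estimate, official):
    """Return how many times larger an estimate is than the official count."""
    return estimate / official


ratios = {
    name: undercount_ratio(value, OFFICIAL_INDIA_DEATHS)
    for name, value in ESTIMATES.items()
}

for name, ratio in ratios.items():
    print(f"{name}: about {ratio:.1f} times the official count")
```

The first ratio falls between 6 and 7, consistent with the range the Science study reports.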

Although the article does not address China specifically, it bears on the general question of the reliability of official Covid data, and the causes of under-reporting that need to be controlled for in any statistical study. The data-quality and completeness issues that confront India also confront China, and although the scale of the discrepancies may be different, they must still be acknowledged. Under-reporting is real and pervasive almost everywhere, and efforts to wave it away because of supposed Chinese exceptionalism do not measure up to proper scientific standards.

Causes of Under-Reporting

The authors of the Science article attribute the under-reporting in India to three factors:

  • Reporting Failures: failures to properly certify or record the cause of death, especially in rural areas with inferior healthcare services
  • Comorbidity: misattribution of Covid deaths to co-existing chronic diseases
  • “Politics”

Reporting Failures, Especially from Rural Areas

Certification of “cause of death” is one step in the Covid data pipeline where information-loss can occur.

“It’s easy to forget that access to basic health data, such as birth and death certificates and communicable-disease reporting, is the fuel that drives the public-health machine. Even basic data are lacking in some places…” – Nature (January 6, 2022)
The criterion for deciding if someone has died from Covid is not standardized across the world.

“Early in the pandemic, countries such as the Netherlands counted only those individuals who died in hospital after testing positive for Covid. Neighboring Belgium included deaths in the community and everyone who died after showing symptoms of the disease, even if they weren’t diagnosed.”

In the case of India, “medical certification remains uncommon.”

“Of India’s 10 million deaths estimated by the United Nations in 2020, over eight million did not undergo medical certification.”

The Economist points out that “Many people who die while infected with SARS-CoV-2 are never tested for it, and do not enter the official totals.”

The problem of underreporting in India is apparently worse in the poorer, rural areas of the country. The same is true in the United States.

“Rural regions [in the U.S.] … broadly lack access to healthcare, tend to have older and health-compromised populations, and have far more limited access to COVID testing.” – Proceedings of the National Academy of Sciences
Do similar problems affect rural China? Because of Beijing’s data suppression, we don’t know. But that is a huge “Don’t Know.” There are more people living in rural China than in the entire European Union. The presumption (until proven otherwise) must be that certification deficits would likely exist. Because of the enormous numbers involved, any flaws in the cause-of-death certification process in rural China would have a huge potential impact on the numbers.


Comorbidity

The Indian study cites “misattribution” of Covid deaths to other causes. How serious is this problem, seen from a statistical and data collection perspective?

Covid elevates the death rate for people with pre-existing conditions or risk factors, like diabetes or heart disease. The effect is very significant. As The Economist’s authors write, “those with other illnesses (“comorbidities”)—die at alarming rates.”

When someone with, say, kidney disease falls victim to Covid and dies, how should the death be reported? Is Covid the actual cause of death?

“It can be challenging to differentiate between people who died of COVID-19 and those who were infected but died from unrelated causes. ‘That’s going to be a very critical piece of all this,’ [a Harvard epidemiologist] says. ‘If you have two concurrent conditions, what does it get classified as?’” – Nature
A related problem is created by Covid’s ability to express in forms similar to other illnesses.

“Then there are the direct-but-uncounted deaths — those that were missed because the individual presented with symptoms not recognized as COVID-19. ‘We’re still figuring out exactly how the disease manifests,’ says Natalie Dean, a biostatistician at the University of Florida in Gainesville. Strokes and pulmonary embolisms are two potentially deadly complications of the virus that might have been overlooked initially, she says.” – Nature
How serious is the impact of this ambiguity on the data? The Economist’s researchers were able to access an impressive database describing comorbidity patterns in the U.S.

“A group of American hospitals, doctors, insurers, pharmacies and data vendors have pooled data about their patients to create the Covid-19 Research Database, an archive of over 5 billion medical records. The project’s administrators have granted access to The Economist. The archive records the age, sex and presence of 29 co-morbidities among 104 million people in America, of whom 466,000 were diagnosed with covid in May-December 2020. It also lists which ones died in 2020, and, for people who tested positive, their date of diagnosis and whether they were hospitalised during their illness.”

Covid boosted the mortality rates for all groups defined by the various comorbidity factors, as shown here for a sample of elderly women:

The death rate for all of these medical conditions increased with Covid. For people with no pre-existing conditions in this age/gender group, the death rate for those who are infected with Covid is about 4.5 per 1000 people. For those with hypertension, Covid raises the death rate to about 6 per 1000. For people with Type 1 diabetes or heart disease, Covid almost doubles the death rate, to more than 8 per 1000.

If a person with hypertension or diabetes becomes infected with Covid, and dies, what should be recorded as the cause of death? Some with those conditions would have died in any case. How can we capture statistically the extra mortality contributed by Covid?

There is obviously significant ambiguity – which can be exploited to manipulate the Covid statistics. If the authorities wish to see a reduced Covid mortality rate, it is easy to attribute these deaths to the other causes. This is another significant source of information loss in the collation of Covid mortality statistics.

Are local Chinese authorities actually exploiting this ambiguity to meet the Zero-Covid mandates of Beijing? It would be hard to detect. It is certainly possible, and even “sadly plausible.”


Politics

The authors of the Science article also point to government interference in India in the data collection and reporting processes.

“‘The Indian government very much is trying to suppress the numbers in the way that they coded the COVID deaths,’ Jha says. ‘I think the political pressures were such that they said, “Anything that’s going to come out is going to be embarrassing.”’”

Is it possible that Chinese authorities may also have manipulated the data on Covid?

It is more than possible. Given (1) the history of similar cover-ups in China in the past, and (2) the absurdity of the official Covid death count (zero deaths since April 2020), one would have to say that political manipulation by Beijing is a virtual certainty.

The Article in Nature: Excess Mortality & The Missing Chinese Data

Also this week, Nature magazine published a review of excess mortality methodologies and studies.

The public health establishment needs this metric, with all its flaws. The World Health Organization (WHO) says “we are likely facing a significant undercount of total deaths directly and indirectly attributed to COVID-19.”

“Recorded increases in all-cause mortality during peak pandemic transmission are likely nearly all caused by COVID infection. WHO has recognized such counts as a crude but useful method to track the pandemic.”

The Nature article examines excess mortality calculations for past influenza outbreaks, as well as the models built to estimate the excess mortality associated with Covid, including models developed by the Institute for Health Metrics and Evaluation (IHME) in Seattle, and by The Economist (both of which are updated daily). It also reviews “the most comprehensive of these excess-mortality estimates” developed by an economist at the Hebrew University of Jerusalem in Israel, and a data scientist at the University of Tübingen, Germany – the World Mortality Dataset (WMD), a leading database of all-cause mortality before and during the pandemic (2015–21) assembled from many sources covering 116 countries and territories.

One of the implications of the Nature article is this: All these models are hampered by the incompleteness of Chinese Covid data.

For example, the WMD (an unfortunate acronym?) is less than “most comprehensive” in at least one important respect: it “lacks excess-death estimates for China…They either do not collect them or do not publish them…” (The WMD authors themselves cite the disappointing response they received from Chinese authorities, i.e., “We are sorry to inform you that we do not have the data you requested.”)

(Another important database, the Human Mortality Database (HMD) developed and managed by the University of California at Berkeley and the Max Planck Institute in Germany, is apparently also forced to exclude China because of the lack of data.)

The IHME model does not have updated information on China, and can cite only the official, ever-unchanging figure of 4,636 Covid deaths in all of mainland China, frozen since April 2020.

The Economist’s team (also stuck with the spurious 4636 datapoint) used a Machine Learning model to try to interpolate the missing data. Their approach is sound and careful, but they acknowledge that the absence of reliable Chinese data is a serious handicap.

“For China, the model estimates almost 750,000 deaths (well over 150 times higher than the country’s reported 4,600), but with a wide uncertainty interval ranging from as low as 200,000 fewer deaths than expected, to as high as 1.9 million excess deaths. ‘The only fair thing to present at this point is a very wide range,’ says Sondre Ulvund Solstad, a data scientist who leads The Economist’s modelling work. ‘But as more data come in, we are able to narrow it.’”
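
To put that range in perspective, here is a rough Python sketch using only the figures quoted above, plus the official count of 4,636 cited earlier. The variable names are illustrative, and the calculation is a simple restatement of the quoted numbers, not a reconstruction of The Economist's model.

```python
# Figures quoted in the text; names are illustrative only.
REPORTED_CHINA_DEATHS = 4_636      # official count, frozen since April 2020
central_estimate = 750_000         # model's central excess-death estimate
interval_low = -200_000            # 200,000 fewer deaths than expected
interval_high = 1_900_000          # 1.9 million excess deaths

# How many times larger the central estimate is than the official count.
ratio_to_reported = central_estimate / REPORTED_CHINA_DEATHS

# Total width of the uncertainty interval, in deaths.
interval_width = interval_high - interval_low

print(f"Central estimate is about {ratio_to_reported:.0f} times the official count")
print(f"Uncertainty interval spans {interval_width:,} deaths")
```

The interval is nearly three times as wide as the central estimate itself, which is why the modellers present it as a range rather than a single number.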

Optimism is good. Maybe some day more data will show up.

[Note: As described in my previous column, my own admittedly rough analysis of excess mortality based on anomalous changes in China’s Crude Death Rate arrived at an estimate of about 800,000 unexplained excess deaths in 2020 and 2021 – not far off from the mid-range of The Economist’s calculation.]

Much has been made of China’s refusal to cooperate with investigations into the origin of the virus. The refusal to provide information about ongoing Covid infections and deaths is just as problematic – even more so, because it hampers our understanding of the effectiveness of alternative countermeasures.

Excess Mortality is the Crucial Metric

The very idea of excess mortality has been seen by some as controversial. It has the flavor of a “hidden variable.” It is often not directly observable. It is inferred from analysis of information that is available, such as crude death rates and official Covid mortality reports. The estimates for the missing data are imprecise, and may cover a wide range. Some have tried to cast doubt on these efforts, as though “excess mortality” were a mere statistical contrivance.

This is misguided. Divining “hidden variables” is what statistics is all about. Sometimes the missing information can be confirmed by additional direct observation. This is what the Science study has attempted, by examining survey data and government numbers to establish better Covid mortality numbers. In other cases (e.g. China), the hidden variable cannot be verified directly, but it can be quantified within a range of probabilities, sufficient for policy design and assessment. As WHO has said –

“For countries with limited capacity [or willingness] to conduct real-time comparative analysis of observed and expected deaths, health estimates are an important in-filling mechanism. They can be calculated using a variety of statistical methods, from a minimalist approach to expert and statistical data synthesis.”
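
A minimalist version of that comparison of observed and expected deaths can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the death counts below are hypothetical, the function names are invented for this example, and a straight-line trend fitted to the pre-pandemic years stands in for the more elaborate baselines used by the models discussed above.

```python
def fit_linear_trend(years, deaths):
    """Ordinary least-squares line through (year, deaths) points."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(deaths) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, deaths))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def excess_deaths(baseline_years, baseline_deaths, target_year, observed):
    """Observed deaths minus the deaths expected from the historical trend."""
    slope, intercept = fit_linear_trend(baseline_years, baseline_deaths)
    expected = slope * target_year + intercept
    return observed - expected


# HYPOTHETICAL all-cause deaths (in thousands) for an imaginary country.
baseline_years = [2015, 2016, 2017, 2018, 2019]
baseline_deaths = [1000, 1010, 1020, 1030, 1040]
observed_2020 = 1150

excess_2020 = excess_deaths(baseline_years, baseline_deaths, 2020, observed_2020)
print(f"Excess deaths in 2020: about {excess_2020:.0f} thousand")
```

Because the calculation relies only on all-cause death counts, it does not depend on how any individual death was certified.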
The UN is on board. They call excess mortality the “preferred measure” for assessing Covid impact.

So, too, the Journal of the American Medical Association:

“Excess mortality is a comprehensive and robust indicator because it relies on all-cause mortality instead of specific causes of death.” – JAMA

And so, too, the World Health Organization. WHO’s Assistant Director-General (a data analytics expert) said this of the Indian study:

“‘The work nicely triangulates data from different sources, each of which has its own limitations…The study design is robust. Countries can learn from this approach to … produce country-specific estimates.’ WHO is now updating its estimates of excess deaths caused by COVID-19 and plans to release them soon.” – Science

(It will be interesting to see if WHO’s update addresses the obvious discrepancies in Chinese excess mortality figures.)

The inherent uncertainty of the estimation process – which statisticians measure with confidence factors and probabilities – is not a valid reason to reject this approach.

“It is better to provide an uncertain number than to rely on a very certain number that is clearly false.” – Nature

The whole world pays the price of bad Covid data.

“Very low or zero ‘official’ numbers of COVID-19 deaths for countries where data are patchy or lacking present problems of their own. They have fuelled nonsense theories that people in Africa have genetic resistance to the disease and don’t need international help or vaccines, for instance.” – Nature
Or – for instance – the nonsense theory that Beijing’s Zero-Covid policy has actually eliminated all Covid mortality from China since April 2020.

As the authors of the Science study point out:

“The [India] finding could prompt scrutiny of other countries with anomalously low death rates and call for a recalibration of the global numbers.” – Science (January 14, 2022)

Source: Forbes