I pulled the latest data available from https://www.worldometers.info/coronavirus/#countries for the China Flu because I was curious what the data showed.
Here are some screenshots of the spreadsheet I created. I will happily upload this somewhere, but I don't know where to share excel docs and I'm not using a Google Drive because fuck Google.
https://imgur.com/lVxJYoS
https://imgur.com/UNUTlHD
https://imgur.com/OVENH58
https://imgur.com/cHMpV6L
I have included 166 countries in this spreadsheet but have not included the ones where the number of tests performed is unavailable. I also did not include China because they are liars and nothing they say is true.
Methodology:
Positive Test Rate was calculated by dividing number of cases by number of tests for each country.
Extrapolated Cases was calculated by multiplying positive test rate by total population.
Death Rate was calculated by dividing "confirmed" deaths by total extrapolated cases.
Why did I do it this way? Because this is EXACTLY how the CDC calculates the flu numbers every year.
Findings:
The average positive test rate across the world is only 9.82%. 90 tests out of 100 are negative.
The median positive test rate across the world is 5.09%. 95 tests out of 100 are negative.
The US is currently at a 17.31% positive test rate, while France is at a 35.89% positive test rate.
Conversely, Germany has a 6.33% positive test rate and Portugal has a 6.46% positive test rate.
Using the average positive test rate across the world to extrapolate total cases shows 572,344,035 cases, which puts the world death rate at 0.0389%.
Using the median positive test rate across the world to extrapolate total cases shows 296,481,154 cases, which puts the world death rate at 0.0751%.
For comparison, in the United States the flu death rate is usually around 0.095%.
The highest death rates in the world are Italy (0.4297%), Spain (0.3100%), and Belgium (0.3066%). San Marino does have a 0.4797% death rate but is estimated to only have 8,547 cases.
The United States death rate is 0.1073%, while the UK is at 0.1905%.
Currently in the US, the flu is about as deadly as the coronavirus for the nation as a whole.
Worldwide, coronavirus is between roughly 41% and 79% as deadly as the flu (0.0389% and 0.0751% against the US average flu death rate of 0.095%).
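For anyone who wants to check the arithmetic behind the worldwide figures, here is a rough Python sketch that uses only the numbers quoted in the findings above. The function and variable names are made up for illustration; nothing here comes from the spreadsheet itself.

```python
# US flu death rate in percent, as quoted in the post.
FLU_DEATH_RATE_PCT = 0.095

# (extrapolated cases, death rate in percent) as quoted in the findings.
scenarios = {
    "average": (572_344_035, 0.0389),
    "median": (296_481_154, 0.0751),
}


def implied_deaths(cases, death_rate_pct):
    """Back out the death count implied by a case count and a death rate in percent."""
    return cases * death_rate_pct / 100


def relative_to_flu(death_rate_pct, flu_pct=FLU_DEATH_RATE_PCT):
    """Express a death rate as a fraction of the quoted flu death rate."""
    return death_rate_pct / flu_pct


for name, (cases, rate_pct) in scenarios.items():
    deaths = implied_deaths(cases, rate_pct)
    ratio = relative_to_flu(rate_pct)
    print(f"{name}: implied deaths ~{deaths:,.0f}, {ratio:.0%} of the quoted flu rate")
```

Both scenarios should back out to about the same death count, since only the extrapolated case total changes between them.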
Thank you for doing this.
spez: Where do you download the raw dataset from worldometer?
I copied the data from the webpage into a spreadsheet then added additional columns to:
Calculate total population (Total tests / Tests per 1M X 1 million),
positive test rate (Total cases / Total tests),
extrapolated cases (Total population X positive test rate),
death rate (Total deaths / Extrapolated cases).
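For the notebook crowd, the four added columns could be sketched in Python roughly like this. The dictionary keys, the function name, and the example figures are all hypothetical (they are not the spreadsheet's column names or real country data), and the sketch assumes rows with missing test counts have already been dropped, as described in the post.

```python
def derive_columns(row):
    """Return a copy of one country's row with the four derived columns added."""
    # Population is backed out of the testing figures:
    # total tests / tests per 1M * 1,000,000.
    population = row["total_tests"] / row["tests_per_1m"] * 1_000_000
    # Share of performed tests that came back positive.
    positive_test_rate = row["total_cases"] / row["total_tests"]
    # Apply the tested positive rate to the whole population.
    extrapolated_cases = population * positive_test_rate
    # Confirmed deaths divided by extrapolated (not confirmed) cases.
    death_rate = row["total_deaths"] / extrapolated_cases
    return {
        **row,
        "population": population,
        "positive_test_rate": positive_test_rate,
        "extrapolated_cases": extrapolated_cases,
        "death_rate": death_rate,
    }


# Made-up figures for illustration only, not taken from the spreadsheet.
example = {
    "country": "Exampleland",
    "total_cases": 5_000,
    "total_tests": 50_000,
    "tests_per_1m": 10_000,
    "total_deaths": 100,
}

result = derive_columns(example)
print(f"population: {result['population']:,.0f}")
print(f"positive test rate: {result['positive_test_rate']:.2%}")
print(f"extrapolated cases: {result['extrapolated_cases']:,.0f}")
print(f"death rate: {result['death_rate']:.4%}")
```

Running `derive_columns` over every row copied from the table and then taking the mean and median of `positive_test_rate` would give the two worldwide rates used in the findings.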
Am I correct in saying your approach assumes the rate of infection between the population actually tested and the population still left untested is the same?
If so, how would you support this? Don't people self-select for testing when symptoms are evident or exposure to an infected person was recent?
This data set is estimating the number of cases that exist beyond those that have only tested positive.
This is not a predictive model of future cases or infection rate. That is not what this data set is attempting to show. This is estimating actual cases that exist NOW.
This is an estimation of the total number of cases worldwide based on the 31M tests that have been performed and using that 31M sample size to estimate the current number of unreported/untested cases - This is the same way the flu impact is calculated.
Not everyone shows symptoms. Not everyone that shows symptoms will get tested. This is an attempt, using the CDC methodology for the flu, to estimate the actual number of cases that exist today.
OK. I was hoping there was a link to the raw dataset because I'm too lazy to write a couple lines of code to scrape that table. Of course I forgot the easier route of ctrl-c and ctrl-v and importing via a spreadsheet. LOL.
spez: do you have a link that lays out the CDC methodology? It's not important; this will just be a fun side project to drop into a jupyter notebook and share with some other stats nerds at work.
https://www.cdc.gov/flu/about/burden/index.html
I will warn you that you will have to look at their referenced sources at the bottom of articles to get a better look into their full methodology.
https://docs.google.com/document/d/1545C_dJWMIAgqeLEsfo2U8Kq5WprDuARXrJl6N1aDjY/preview