Covid 19
Posted on Sat 11 April 2020 in Blog
Covid 19 (UPDATED at 2020-04-11 18:27)
Numbers or data without units, errors/uncertainties and context are at best useless and at worst damaging. I mention this because the data presented here is taken as is and comes with context people should be aware of. For example, say the confirmed number of cases for two similarly sized countries, A and B, are 1000 and 500. Is A twice as bad as B? It depends: maybe A is testing twice as many people, or B's test is not as accurate, or any of many other factors is at play. These factors need to be taken into account in a real analysis. I've done this for my own interest, to investigate the data and to try out Plotly, so always take figures and advice from official sources.
Raw Data
The data is collated by Johns Hopkins University and is available from this GitHub repo. So let's plot the raw data for the confirmed cases with linear (left) and logarithmic (right) scales.
There are a couple of things to notice. First, while it's easier to tell the countries apart in the log plot, the linear plot really indicates the scale. Second, there is an order of magnitude difference in the number of cases between countries; this is partially due to there being more cases in some countries, but also due to population size. We can make a rough correction for the difference in populations by dividing by the population (in millions) to obtain the following.
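As a rough illustration of the per-million correction, here is a minimal sketch in plain Python. The function name `cases_per_million` and the example figures are hypothetical, chosen only to show the arithmetic; they are not taken from the Johns Hopkins data.

```python
def cases_per_million(cases, population):
    """Scale a series of case counts to cases per million people."""
    if population <= 0:
        raise ValueError("population must be positive")
    millions = population / 1_000_000
    return [count / millions for count in cases]


# Hypothetical figures for illustration only, not real data.
country_a = cases_per_million([100, 250, 1000], population=50_000_000)
country_b = cases_per_million([50, 125, 500], population=5_000_000)

print(country_a)  # [2.0, 5.0, 20.0]
print(country_b)  # [10.0, 25.0, 100.0]
```

In this made-up example, A has twice the raw count of B but only a fifth of the cases per million, which is why the normalised plots can look quite different from the raw ones.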
The differences between countries are now much smaller. For the rest of the analysis, unless specified otherwise, I'll be using data normalised by population. Similar plots can also be made for the number of confirmed deaths.
Analysis
The spread of infectious diseases, at least in the early stages, is often modelled as an exponential of the form $$ \frac{N_i}{N_0} = a \exp(\lambda t) $$ which can be rearranged to obtain $$ \log(N_i) = \log(a) + \log(N_0) + \lambda t $$ or $$ y = \lambda t + c $$ where $y=\log(N_i)$, $c = \log(N_0) + \log(a)$, $\lambda$ is the growth rate and the doubling time is given by $T_D=\log(2)/\lambda$. Here $\log$ denotes the natural logarithm.
We can fit this equation to the data; in this case I've chosen to fit the last two weeks (14 days) of data. (Clicking on the legend turns traces on/off.)
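One way such a fit could be done is an ordinary least-squares straight line through $\log(N_i)$ against time. The sketch below is an assumption about the method, not necessarily the fitting routine used for the plots here. The function name `fit_exponential` is hypothetical, time is measured in days from the start of the fitting window, and the input is a synthetic series rather than real data.

```python
import math


def fit_exponential(counts, window=14):
    """Least-squares fit of y = lam * t + c to log(counts) over the last `window` points.

    Returns (lam, c, doubling_time), with t counted in days from the
    start of the window.
    """
    recent = counts[-window:]
    if len(recent) < 2 or any(n <= 0 for n in recent):
        raise ValueError("need at least two strictly positive counts")

    t = list(range(len(recent)))
    y = [math.log(n) for n in recent]
    t_mean = sum(t) / len(t)
    y_mean = sum(y) / len(y)

    # Standard closed-form slope and intercept for simple linear regression.
    numerator = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    denominator = sum((ti - t_mean) ** 2 for ti in t)
    lam = numerator / denominator
    c = y_mean - lam * t_mean

    # A doubling time only makes sense while the counts are growing.
    doubling_time = math.log(2) / lam if lam > 0 else math.inf
    return lam, c, doubling_time


# Synthetic series constructed to double every 3 days (illustration only).
synthetic = [10 * 2 ** (day / 3) for day in range(30)]
lam, c, doubling_time = fit_exponential(synthetic)
print(f"growth rate = {lam:.4f} per day, doubling time = {doubling_time:.2f} days")
```

Because the synthetic series is a perfect exponential, the fit recovers a doubling time of 3 days. Real data is noisier, and the result will depend on the window chosen.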
We can also try to compare different countries at similar points along the timeline. A simple way to do this is to align the data to some common point; I've chosen the point at which the number of confirmed cases exceeded 10.
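The alignment step can be sketched as follows. This is a minimal example assuming each country is a plain list of cumulative confirmed counts; the helper name `align_to_threshold` and the two short series are hypothetical.

```python
def align_to_threshold(counts, threshold=10):
    """Return `counts` starting from the first value strictly above `threshold`.

    An empty list is returned if the threshold is never exceeded.
    """
    for index, value in enumerate(counts):
        if value > threshold:
            return counts[index:]
    return []


# Made-up cumulative counts for two countries (illustration only).
series = {
    "A": [0, 2, 8, 15, 40, 90],
    "B": [0, 0, 0, 3, 10, 11, 30],
}
aligned = {name: align_to_threshold(values) for name, values in series.items()}

print(aligned)  # {'A': [15, 40, 90], 'B': [11, 30]}
```

After alignment, index 0 of every series is the first day above 10 cases, so the traces can be plotted against "days since the threshold was exceeded" rather than calendar date.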
Again, we make the same plots but normalised by population.