# Data analysis of the coronavirus spread in China: 1/22-2/08

I found data on new confirmed cases of the coronavirus. The data on China seems large enough for some basic data analysis, and predicting how the spread evolves using regression seems meaningful. Here is the main result I got with a little bit of Python:

The red line is a quadratic polynomial fitted to the data using regression. It is quite surprising how well a quadratic function fits the data. We are on the part of the quadratic function where it keeps increasing, so hopefully it will behave more like a cubic function in the future. The bounds (the orange and green lines) predict how far the data will most likely deviate from the red line. I used a neat technical method to come up with them; see the technical section below.
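The post does not say which regression routine produced the red line. Below is a minimal sketch of an ordinary least-squares quadratic fit. It assumes plain least squares, day 0 = 1/22, and the hypothetical helper name `fit_quadratic`. The sample numbers are synthetic values generated from a known quadratic, not the real case counts. The sketch uses only the standard library and solves the 3x3 normal equations. With NumPy installed, `numpy.polyfit(days, cases, 2)` does the same job.

```python
from typing import Sequence, Tuple


def fit_quadratic(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y ~ a*x**2 + b*x + c; returns (a, b, c).

    Hypothetical helper: solves the normal equations (X^T X) beta = X^T y
    for the design-matrix columns [x**2, x, 1].
    """
    rows = [[x * x, x, 1.0] for x in xs]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]

    # Augmented matrix [X^T X | X^T y], reduced by Gaussian elimination with partial pivoting.
    m = [ata[i] + [aty[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= factor * m[col][k]

    # Back substitution yields the coefficients (a, b, c).
    beta = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):
        beta[i] = (m[i][3] - sum(m[i][k] * beta[k] for k in range(i + 1, 3))) / m[i][i]
    return beta[0], beta[1], beta[2]


# Synthetic, hypothetical data: day 0 = 1/22, day 17 = 2/08, generated from a known quadratic.
days = list(range(18))
cases = [150 * d * d + 200 * d + 500 for d in days]
print(fit_quadratic(days, cases))  # approximately (150.0, 200.0, 500.0)
```

On real, noisy counts the recovered coefficients would of course not be round numbers. The synthetic data only checks that the fit recovers a curve it was built from.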

## Technical section

Given the quadratic function obtained through regression, you can compute the absolute error at each data point. When the errors are sorted, they appear to grow roughly linearly. Therefore, the average error is a good predictor for the error bounds. You can also pull the error study back to the domain of the quadratic function, because the quadratic is strictly increasing on the part of its domain that corresponds to the range we are interested in.
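One reading of the bound construction is sketched below. It assumes the fitted coefficients (a, b, c) are already known and that "error" means the vertical residual |y - f(x)|. The helper names `quad` and `error_bounds` and the toy numbers are hypothetical. The bounds are simply the fitted curve shifted up and down by the mean absolute error. The sorted errors are returned too, so you can plot them against their rank and check whether they really look linear.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Coeffs = Tuple[float, float, float]


def quad(coeffs: Coeffs, x: float) -> float:
    """Evaluate a*x**2 + b*x + c."""
    a, b, c = coeffs
    return a * x * x + b * x + c


def error_bounds(
    coeffs: Coeffs, xs: Sequence[float], ys: Sequence[float]
) -> Tuple[List[float], float, List[float], List[float]]:
    """Return (sorted absolute errors, mean absolute error, lower bound, upper bound)."""
    # Absolute residuals, sorted so their shape can be inspected (roughly linear per the post).
    errors = sorted(abs(y - quad(coeffs, x)) for x, y in zip(xs, ys))
    avg = mean(errors)
    # The bounds are the fitted curve shifted down/up by the mean absolute error.
    lower = [quad(coeffs, x) - avg for x in xs]
    upper = [quad(coeffs, x) + avg for x in xs]
    return errors, avg, lower, upper


# Hypothetical toy example: fitted curve f(x) = x**2 and four observations.
errors, avg, lower, upper = error_bounds((1.0, 0.0, 0.0), [0, 1, 2, 3], [1, 0, 5, 9])
print(errors, avg)  # [0.0, 1.0, 1.0, 1.0] 0.75
```

If the sorted errors rise linearly from zero, their mean is about half the largest error. That is one way to see why the average gives a band that contains most, but not all, of the points.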

Data on Github: https://github.com/CryptoKass/ncov-data

Well, just spent 20 minutes looking up how a cubic function graph might look :))

Didn't buy anything, but shared your Rebdubble link on Twitter.

It is a more powerful quadratic function :P

Makes me wonder why it should be a quadratic or cubic function. Assuming the virus has only just begun to spread and there are still no countermeasures in place, shouldn't it follow some exponential function x^n, where x is the average number of people infected by each patient and n is the nth round of spread (which would depend on the time a patient takes to come into contact with x new targets)? I think I should read more on disease modeling. This fit seems interesting nonetheless. Makes me wonder what the dynamics of the spread are.
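The x^n model from the comment above can be compared with a quadratic using the ratio of each value to the previous one. For exponential growth that ratio stays constant at x. For a quadratic it shrinks toward 1. This is a minimal sketch: the helper names, the choice x = 2 and the comparison curve n**2 are hypothetical illustrations, and it is not run on the real case data here.

```python
from typing import List, Sequence


def new_cases_per_round(x: float, rounds: int) -> List[float]:
    """Simple branching model from the comment: round n adds x**n new cases.

    Assumes every patient infects exactly x new people per round, with no
    countermeasures and no shortage of uninfected people to reach.
    """
    return [x ** n for n in range(rounds)]


def growth_ratios(series: Sequence[float]) -> List[float]:
    """Ratio of each value to the one before it."""
    return [later / earlier for earlier, later in zip(series, series[1:])]


# Hypothetical x = 2 people infected per patient per round.
exponential = new_cases_per_round(2.0, 8)
# A quadratic curve n**2 for comparison, starting at n = 1 to avoid dividing by zero.
quadratic = [float(n * n) for n in range(1, 9)]

print(growth_ratios(exponential))  # every ratio is 2.0
print(growth_ratios(quadratic))    # 4.0, 2.25, ... shrinking toward 1
```

Applied to real daily counts, a roughly constant ratio would point toward exponential growth and a steadily falling ratio toward polynomial growth. Noise in the data would blur either pattern.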

If you were in a big hall with the person in the middle infected, nobody moving, and every uninfected person catching the disease with 100 percent certainty when standing next to an infected person, then the number of new infections would grow linearly, since the circumference of a circle grows linearly with its radius. So exponential behavior probably rests on some additional assumption.
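The hall argument can be checked with a small grid simulation. The sketch assumes that "next to" means the four neighbors up, down, left and right. The function name `simulate_hall` and the grid size are hypothetical. With these rules, the infected region after t steps is a diamond, and each step adds the 4t people on its edge, so new infections grow linearly. Counting diagonal neighbors as well would give a square and 8t new cases per step, which is still linear.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]


def simulate_hall(size: int, steps: int) -> List[int]:
    """Return the number of new infections at each step on a size x size grid.

    Assumptions (from the comment): nobody moves, the person in the middle
    starts infected, and anyone directly next to an infected person
    (up, down, left, right) is infected in the next step with certainty.
    """
    center = size // 2
    infected: Set[Cell] = {(center, center)}
    frontier: Set[Cell] = {(center, center)}  # people infected in the latest step
    new_counts: List[int] = []
    for _ in range(steps):
        nxt: Set[Cell] = set()
        # Only the latest infections can reach someone new; older cases'
        # neighbors are already infected.
        for r, c in frontier:
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < size and 0 <= nc < size and (nr, nc) not in infected:
                    nxt.add((nr, nc))
        infected |= nxt
        frontier = nxt
        new_counts.append(len(nxt))
    return new_counts


print(simulate_hall(size=101, steps=10))  # [4, 8, 12, 16, 20, 24, 28, 32, 36, 40]
```

The grid has to be large enough that the diamond does not reach the walls. Once it does, the count of new infections drops, which is a crude version of running out of people to infect.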

