Difference between revisions of "Gomes"


Revision as of 18:38, 19 May 2020

Article

Note on "Individual variation in susceptibility or exposure to SARS-CoV-2 lowers the herd immunity threshold" by M. Gabriella M. Gomes et al.

Summary

This paper made a media splash early in May 2020 with headlines such as "Herd immunity may only need 10-20 per cent of people to be infected". The proposed mechanism is that people who are more involved in the infection (through susceptibility or infectiousness) will be preferentially infected, and so become immune at a faster rate than those who don't take part so much.

My opinion:

  • There is likely to be such an effect: a reduced herd immunity threshold (HIT) due to people being differently susceptible/infectious compared with what simple homogeneous modelling would suggest.
  • This is a valuable point to make given that many people seem to be taking as read that the herd immunity threshold must be 60-70%, and we obviously very much need to know when herd immunity arises and to what extent we are feeling its effects now. However,
  • I doubt that a HIT as low as 10-20% is likely. It arises in this paper's model from an extreme distribution of susceptibilities where there is a big concentration at the low end.
  • Contrary to the practice in the paper, I believe the Coefficient of Variation (CV) is not a suitable parameter to use to measure how much the susceptibility distribution varies from a point value (constant): by choice of distribution you can get a huge variation in HIT for a given CV. That makes it dangerous to use, as the authors do at one point, an empirical/measured CV from one setting and translate it into a CV of a particular distribution (Gamma) in their idealised setting. Furthermore, there is a straightforward way to evaluate the HIT directly from the distribution (which the authors don't appear to have considered) so there is no need for proxy indicators like CV.
  • There is a question of to what extent inhomogeneity in susceptibility is already taken into account in existing modelling - is it taken into account enough? I don't know the answer to this, and probably other people would be better placed to comment. Of course epidemiologists are used to using mixing matrices based on age, location and other things, and these will certainly account for some of the inhomogeneity: for example Prem et al and Klepac et al. But it is conceivable (from the point of view of my limited knowledge) that these efforts, which are based on empirical data, don't go far enough because it's hard to take into account all of the "assortative" behaviours people engage in. (For example - illustrative, not a real example - if your mixing matrix classifies people into football supporters or not, that would account for some inhomogeneity, but it may turn out that you really need to take account of what particular team people support, because supporters tend very strongly only to mix with those of the same team.) If that is true, then there could be a case for artificially boosting inhomogeneity in the models (perhaps in the manner of Gomes et al) to account for the missing/unmeasurable inhomogeneity.

In more detail

NB: This was originally written about the first version of this paper. Since then a second version has appeared where the authors use a distribution for their parameters rather than fixing them. This note is largely addressed at features/results which are common to both versions of this paper.

To parameterize the susceptibility distribution, the authors have used the Gamma family, presumably because it's a convenient family of distributions on non-negative reals with specified mean and standard deviation (and maybe it is common practice to use it). However, if you look at the shape of the distribution when the Coefficient of Variation ($$CV$$) is equal to 3, you will see it is quite extreme, with a strong concentration at the low end. $$CV=3$$ corresponds to shape parameter $$k=1/CV^2=1/9$$, which would imply that 63% of the population had susceptibility less than 0.09 (relative to a mean of 1) and 50% had susceptibility less than 0.01.
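The two percentages quoted above can be checked with a few lines of Python. This is only a sketch using the standard library: the helper `gamma_cdf` is a name introduced here (it is not from the paper), and it evaluates the Gamma cumulative distribution through the usual power series for the regularized lower incomplete gamma function, with the scale chosen so that the mean is 1.

```python
import math

def gamma_cdf(x, shape, scale):
    """P(X <= x) for X ~ Gamma(shape, scale), using the power series of the
    regularized lower incomplete gamma function (fast for small x / scale)."""
    z = x / scale
    if z <= 0.0:
        return 0.0
    term = 1.0 / math.gamma(shape + 1.0)  # n = 0 term of sum_n z^n / Gamma(shape + n + 1)
    total = term
    n = 1
    while term > 1e-17 * total:
        term *= z / (shape + n)           # ratio of consecutive series terms
        total += term
        n += 1
    return z ** shape * math.exp(-z) * total

cv = 3.0
shape = 1.0 / cv ** 2   # k = 1/CV^2 = 1/9
scale = 1.0 / shape     # chosen so that the mean, shape * scale, equals 1

frac_below_009 = gamma_cdf(0.09, shape, scale)
frac_below_001 = gamma_cdf(0.01, shape, scale)
print(f"P(susceptibility < 0.09) = {frac_below_009:.3f}")
print(f"P(susceptibility < 0.01) = {frac_below_001:.3f}")
```

The two printed values should come out at roughly 0.63 and 0.50, in line with the figures in the text.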

So you can forget almost all the maths in this case: in simple terms such an extreme distribution (Gamma at $$CV=3$$) effectively disconnects a large part of the graph. Obviously in that case you wouldn't need nearly such a large proportion of the population to become immune to generate herd immunity, because you only need immunity amongst the subpopulation that isn't disconnected (and that itself is strongly stratified into mostly disconnected vs connected).

The question then becomes: is it plausible that this particular $$CV=3$$ distribution describes the true susceptibility distribution or connection graph? The evidence cited in the paper is that other epidemics have had $$CV$$s as high as 3.3 (Brazil, tuberculosis), but as we shall see, $$CV$$ is not the proper parameter, and having a $$CV$$ of 3 is not the same as having a $$CV$$ of 3 with a particular distribution (Gamma).

Reasons I think that the Gamma $$CV=3$$ distribution is implausible: we know (in normal times, which is what we are talking about here because we are considering whether there is herd immunity after restrictions are lifted) that 50% of the population aren't living as hermits, cut off from all contact. And I don't know of any suggestion that a large chunk of the population might have a prior immunity to Covid-19. Indeed, there is some evidence from closed settings to suggest that anyone can get it. E.g., in the infamous choir rehearsal, everyone in range apparently caught it. And there is some evidence that despite younger people being far less affected in terms of severity, they can still catch the disease quite readily: a recent infection survey in England found that "there is no evidence of differences in the proportions testing positive between the age categories 2 to 19, 20 to 49, 50 to 69 and 70 years and over". There may be an effect where young people catch the disease less, but the preceding ONS survey suggests it can't be all that extreme: not enough to imagine that 50+% of the population don't take part in the epidemic at all.

Tests

To test the above claims (the authors' and my own), I reimplemented their model in Python. This gives a good match to the output of version 1 of their paper in "susceptibility" mode (where people are differently susceptible to infection, but not differently infectious), though not in "connectivity" mode (where people are differently infectious too). Possibly the discrepancy in the latter mode is due to a difference in our initial conditions. In any case there is only a discrepancy in the progress of the infection, not in the HITs.

To illustrate how the HIT doesn't properly depend on $$CV$$, we now also try the "two-point" distribution parameterized by $$x, y$$ and $$p$$, where $$P(X=x)=p$$, and $$P(X=y)=1-p$$. Fixing the mean to be 1 and the variance to be $$CV^2$$ leaves a free parameter that may as well be $$x$$. $$x=0.99$$ corresponds to some very rare superspreaders, while $$x=0$$ corresponds to a lot of (for want of a better term) "superhermits" - i.e., like Gamma at $$CV=3$$, lots of the distribution is concentrated at or near 0.
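To make the two-point family concrete, here is a small sketch that solves for $$y$$ and $$p$$ given $$x$$ and $$CV$$. The function name `two_point_params` is introduced here for illustration, and it assumes $$0 \le x < 1$$, so that $$x$$ is the lower of the two points. Writing $$a = 1-x$$, the mean and variance constraints give $$y = 1 + CV^2/a$$ and $$p = CV^2/(CV^2 + a^2)$$.

```python
def two_point_params(x, cv):
    """Return (y, p) with P(X=x)=p and P(X=y)=1-p, mean 1 and variance cv**2.

    Assumes 0 <= x < 1, i.e. x is the lower of the two points.
    """
    if not 0.0 <= x < 1.0:
        raise ValueError("this sketch assumes 0 <= x < 1")
    a = 1.0 - x                      # distance of the lower point below the mean
    y = 1.0 + cv ** 2 / a            # upper point
    p = cv ** 2 / (cv ** 2 + a ** 2) # probability mass at the lower point
    return y, p

for x in (0.0, 0.99):
    y, p = two_point_params(x, 3.0)
    mean = p * x + (1.0 - p) * y
    var = p * (x - mean) ** 2 + (1.0 - p) * (y - mean) ** 2
    print(f"x={x}: y={y:.4g}, p={p:.6f}, mean={mean:.6f}, variance={var:.6f}")
```

At $$CV=3$$, $$x=0$$ puts 90% of the population at susceptibility 0 and 10% at susceptibility 10, while $$x=0.99$$ puts almost everyone at 0.99 and about 1 person in 90,000 at susceptibility 901.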

Version 2 of the paper came out after I conducted that test, and in it the authors also try out a different family of distributions (lognormal) to test robustness and dependence of their results on a particular family (Gamma). Using lognormal they do in fact get much bigger answers for the HIT than they do for Gamma in the $$CV=3$$ case, but as far as I can see they do not mention this point in the main body of their paper, or discuss how it may be a problem for their use of $$CV$$. It's also worth noting that the lognormal family doesn't have a free parameter to vary like the two-point distribution has, so you still don't get to see the full variation in behaviour within the $$CV=3$$ umbrella.

The HIT values for the four distributions, in "susceptibility" mode, all with $$CV=3$$, are: Two-point at $$x=0$$: 6.3%, Gamma: 9.5%, Lognormal: 20.4%, Two-point at $$x=0.99$$: 62.6%. So we see there is a huge variation in HIT, and it doesn't make sense to think of HIT as a function of $$CV$$ alone.
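The two-point figures can be reproduced without running a full epidemic simulation. The sketch below rests on assumptions that are not spelled out in this note. First, in "susceptibility" mode a person with susceptibility $$x$$ is taken to remain uninfected with probability $$e^{-\Lambda x}$$, where $$\Lambda$$ is the cumulative force of infection, so the effective reproduction number is $$R_0\,E[Xe^{-\Lambda X}]/E[X]$$ and the HIT is $$1-E[e^{-\Lambda X}]$$ at the $$\Lambda$$ where that number equals 1. Second, $$R_0=2.7$$ is an assumed value, chosen because it is consistent with the 9.5% Gamma figure. The function names are illustrative.

```python
import math

def hit_susceptibility(values, probs, r0):
    """HIT for a discrete susceptibility distribution (assumed model, see lead-in)."""
    if r0 <= 1.0:
        return 0.0  # no epidemic, so no threshold to reach
    mean = sum(p * v for v, p in zip(values, probs))

    def r_eff(lam):
        # Effective reproduction number after cumulative force of infection lam
        return r0 * sum(p * v * math.exp(-lam * v) for v, p in zip(values, probs)) / mean

    # r_eff decreases in lam, so bracket the root of r_eff(lam) = 1 and bisect
    lo, hi = 0.0, 1.0
    while r_eff(hi) > 1.0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if r_eff(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    still_susceptible = sum(p * math.exp(-lam * v) for v, p in zip(values, probs))
    return 1.0 - still_susceptible

def two_point(x, cv):
    """Two-point distribution with mean 1 and variance cv**2, lower point x < 1."""
    a = 1.0 - x
    y = 1.0 + cv ** 2 / a
    p = cv ** 2 / (cv ** 2 + a ** 2)
    return [x, y], [p, 1.0 - p]

R0 = 2.7   # assumption: not stated in this note
CV = 3.0

hit_x0 = hit_susceptibility(*two_point(0.0, CV), R0)
hit_x099 = hit_susceptibility(*two_point(0.99, CV), R0)
print(f"Two-point at x=0:    HIT = {hit_x0:.1%}")
print(f"Two-point at x=0.99: HIT = {hit_x099:.1%}")
```

Under these assumptions the two results land at about 6.3% and 62.6%, matching the two-point values quoted above.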

Formula

As it happens, there is a nice formula for the HIT under the authors' model, so there is no need for simulation, and there is no dependency on particular details such as the timing of social distancing.

For the Gamma family, using the "susceptibility" model, the formula is

\[\text{HIT} = 1-R_0^{-(1+CV^2)^{-1}},\] and with the "connectivity" model it's

\[\text{HIT} = 1-R_0^{-(1+2CV^2)^{-1}}.\]
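As a quick illustration, the two Gamma formulas can be evaluated directly. This is a sketch: the function name `hit_gamma` and its `mode` argument are introduced here, and $$R_0=2.7$$ is an assumed value rather than one stated in this note. Setting $$CV=0$$ in either formula recovers the familiar homogeneous threshold $$1-1/R_0$$.

```python
def hit_gamma(r0, cv, mode="susceptibility"):
    """Closed-form HIT for Gamma-distributed susceptibility or connectivity."""
    if mode == "susceptibility":
        factor = 1.0
    elif mode == "connectivity":
        factor = 2.0
    else:
        raise ValueError("mode must be 'susceptibility' or 'connectivity'")
    return 1.0 - r0 ** (-1.0 / (1.0 + factor * cv ** 2))

R0 = 2.7  # assumption: not stated in this note

for cv in (0.0, 1.0, 3.0):
    s = hit_gamma(R0, cv, "susceptibility")
    c = hit_gamma(R0, cv, "connectivity")
    print(f"CV={cv}: susceptibility HIT = {s:.1%}, connectivity HIT = {c:.1%}")
```

With these inputs, $$CV=3$$ gives about 9.5% in "susceptibility" mode, and a lower value in "connectivity" mode because of the extra factor of 2 in the exponent.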