Periodically during the COVID-19 pandemic, CDC scientific staff have used data from their available studies to estimate the efficacy of current or recent COVID-19 vaccines in reducing the risk of testing positive for COVID-19. "Testing positive" has itself been somewhat controversial, because undisclosed PCR cycle-threshold (Ct) cutoffs have allowed non-infectious people, whose unrecognized infections occurred weeks in the past, to remain test-positive. My goal here, however, is to illustrate the CDC's problematic epidemiologic methods, which have substantially inflated the vaccine efficacy percentages they have reported.
Controlled epidemiologic studies fall into three and only three basic designs. In a cross-sectional study, a total sample of subjects is drawn, and each subject is evaluated both for case status and for previous exposure status. In a cohort study, a sample of exposed people and a sample of unexposed people are followed forward to see who becomes a case and who does not. In a case-control study, a sample of cases and a sample of controls is obtained, and each subject is evaluated for past exposure status. If a cohort study randomizes the subjects into exposed and unexposed groups, it is a randomized controlled trial (RCT), but the study design is still cohort.
For a vaccine, efficacy is estimated as 1.0 − RR, where RR is the relative risk of disease in the vaccinated compared to the unvaccinated. Case-control data estimate the odds ratio (OR), not the RR, so when does the OR approximate the RR accurately enough to be substituted into this formula? The question has a detailed epidemiologic history beyond the current scope, but in the simplest terms, the OR approximates the RR when, in the population, cases are infrequent compared to controls.
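The rare-disease condition can be illustrated numerically. The sketch below uses hypothetical 2×2 counts chosen purely for illustration (they come from no actual study): when cases are rare in both exposure groups the OR tracks the RR closely, but when cases are common the OR overstates the effect.

```python
# Hypothetical counts for illustration only -- not from any CDC study.
def rr_and_or(cases_exp, total_exp, cases_unexp, total_unexp):
    """Compute the risk ratio (RR) and odds ratio (OR) from 2x2 table counts."""
    risk_exp = cases_exp / total_exp
    risk_unexp = cases_unexp / total_unexp
    rr = risk_exp / risk_unexp
    odds_exp = cases_exp / (total_exp - cases_exp)
    odds_unexp = cases_unexp / (total_unexp - cases_unexp)
    return rr, odds_exp / odds_unexp

# Rare disease: 1% vs 2% risk -- the OR approximates the RR well.
print(rr_and_or(10, 1000, 20, 1000))    # RR = 0.50, OR ~ 0.495
# Common disease: 25% vs 50% risk -- the OR diverges markedly from the RR.
print(rr_and_or(250, 1000, 500, 1000))  # RR = 0.50, OR ~ 0.333
```

In both scenarios the true RR is 0.50, yet only in the rare-disease scenario does the OR land near it.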
However, the investigators estimated the OR, not the RR, from these data, using a statistical analysis method called logistic regression, which allows the OR to be adjusted for various possible confounding factors. There is nothing wrong with using logistic regression and obtaining estimated ORs in any study design; the problem is using the OR value instead of the RR in the vaccine efficacy formula 1.0 − RR. Because the study design was cross-sectional, the investigators could have examined relative case occurrence in the population directly from their sampled numbers, but they do not appear to have done this. In fact, cases comprised 3,295 of the 9,222 subjects sampled, or 36 percent, which is nowhere near small enough to justify substituting the OR for the RR. The same holds within both the exposed subjects (25 percent cases) and the unexposed (37 percent cases).
Nevertheless, it is possible to get a rough idea of how much this bad assumption affected the authors' claimed overall 54 percent vaccine efficacy. The relevant subject counts, shown in the table below, are stated in Tables 1 and 3 of the Link-Gelles paper. The RR calculation from these raw data is simple: the risk in the vaccinated is 281/1,125 = 25 percent; in the unvaccinated, it is 3,014/8,097 = 37 percent. The RR is the ratio of these two, 25 percent / 37 percent = 0.67, so the vaccine efficacy based on these raw data would be 1.0 − 0.67 = 0.33, or 33 percent.
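As a quick check, this RR arithmetic can be reproduced in a few lines of Python from the raw counts quoted above (this is just the paragraph's calculation, not a reanalysis of the study data):

```python
# Raw counts from Tables 1 and 3 of the Link-Gelles paper, as quoted above.
risk_vax = 281 / 1_125       # risk in the vaccinated, ~0.25
risk_unvax = 3_014 / 8_097   # risk in the unvaccinated, ~0.37
rr = risk_vax / risk_unvax   # relative risk, ~0.67
ve = 1.0 - rr                # crude vaccine efficacy, ~0.33 (33 percent)
print(round(rr, 2), round(ve, 2))  # 0.67 0.33
```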
Similarly, the OR can be estimated from these raw data as 0.56, which, if used in the vaccine efficacy formula, would give an efficacy of 44 percent, appreciably different from the 33 percent efficacy properly estimated by using the RR.
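The crude OR comes from the same 2×2 counts, using odds (cases over non-cases) rather than risks; a minimal sketch of that step:

```python
# Crude odds ratio from the same raw counts, for comparison with the RR of 0.67.
odds_vax = 281 / (1_125 - 281)        # odds of testing positive, vaccinated
odds_unvax = 3_014 / (8_097 - 3_014)  # odds of testing positive, unvaccinated
or_crude = odds_vax / odds_unvax      # ~0.56
print(round(or_crude, 2), round(1.0 - or_crude, 2))  # 0.56 0.44
```

Because cases are so common in these data, the OR (0.56) sits well below the RR (0.67), and 1.0 − OR accordingly overstates the efficacy.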
However, Link-Gelles et al. used the adjusted OR of 0.46 obtained from their logistic regression analysis. This differs from the unadjusted OR of 0.56 by a factor of 0.46/0.56 = 0.82. We can use this adjustment factor, 0.82, to approximate what the raw RR would have been had it been adjusted for the same confounders: 0.67 × 0.82 = 0.55. These numbers are shown in the table below, and demonstrate that the correct vaccine efficacy is approximately 1.0 − 0.55 = 0.45, or 45 percent, not the claimed 54 percent, and below the nominal 50 percent desired level.
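The adjustment-factor approximation described in this paragraph is only rough, since confounding need not scale the RR and the OR identically, but the arithmetic itself is straightforward:

```python
# Approximating an adjusted RR by scaling the crude RR with the same
# factor by which adjustment changed the OR (a rough approximation).
adj_or, crude_or, crude_rr = 0.46, 0.56, 0.67
factor = adj_or / crude_or   # ~0.82
adj_rr = crude_rr * factor   # ~0.55
ve = 1.0 - adj_rr            # ~0.45 (45 percent), versus the claimed 54 percent
print(round(factor, 2), round(adj_rr, 2), round(ve, 2))  # 0.82 0.55 0.45
```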
It seems surprising to me that apparently none of the more than 60 authors across the Link-Gelles and Tenforde papers recognized that the sampling design of their studies was cross-sectional, not case-control; that the proper parameter for estimating vaccine efficacy was therefore the RR, not the OR; and that the rare-disease assumption for substituting the OR for the RR was not met in their data. These studies therefore substantially overestimated the true vaccine efficacies in their results. This is not a purely academic issue, because CDC public health policy decisions can be derived from incorrect results such as these.