When we first published the Inside Medicine Covid metrics dashboard, many readers were confused by the CDC's Covid-19 wastewater data. You weren't alone. It took even Inside Medicine data guru Benjy Renton and me a few minutes to figure out that the CDC wastewater data being published at the time were percentiles, not percentages. In other words, a week could be in the 94th percentile, but that did not mean that Covid levels were 94% as high as the maximum. It just meant that if you took 100 weeks of data, 6 weeks had higher levels and 93 had lower levels (the 100th being the week itself).
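To make the percentile-versus-percentage distinction concrete, here is a minimal Python sketch using a made-up series, not real wastewater data. The function names are hypothetical, and the percentile convention used here (share of weeks at or below a value) is only one of several in common use. With 99 ordinary weeks plus one enormous surge, a week at the 94th percentile turns out to be just 9.4% of the peak.

```python
from bisect import bisect_right

def percentile_rank(history, value):
    """Percentage of weeks in `history` at or below `value`.
    (One common convention; others count only strictly lower weeks.)"""
    ordered = sorted(history)
    return 100 * bisect_right(ordered, value) / len(ordered)

def percent_of_max(history, value):
    """`value` as a percentage of the highest week in `history`."""
    return 100 * value / max(history)

# Hypothetical toy history: 99 ordinary weeks with levels 1..99,
# plus one enormous surge week at 1000 (arbitrary units, not real data).
weeks = list(range(1, 100)) + [1000]

print(percentile_rank(weeks, 94))  # the week sits in the 94th percentile...
print(percent_of_max(weeks, 94))   # ...yet is under a tenth of the peak
```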
A case study in Covid-19 wastewater measurement.
Let’s look at Massachusetts data using the CDC’s old wastewater reporting system (which has now been replaced, as I’ll discuss in a moment) and see if we can understand what the data mean.
The old CDC percentile-based reporting was (and still is) so misleading because a week like last week might have looked bad; after all, it was in the 73rd percentile. But in fact, last week's levels were probably around 12.5% of the highest levels in state history. There are two reasons that is not obvious. First, people confuse percentile and percentage. Second, the CDC's old wastewater data for Massachusetts only went back to early 2023, so when you looked at it, you couldn't even see the tsunami that was early Omicron. (The same thing was, and still is, true on our dashboard; don't worry, we'll fix this.)
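The second problem, a short look-back window, can be illustrated the same way. The sketch below uses invented weekly levels (the "early surge" values simply stand in for a big early wave and are not real Massachusetts numbers) and asks how the same week looks when judged against a recent-only window versus the full history. With these made-up numbers, the week drops from the 80th to roughly the 62nd percentile, and from about two-thirds of the window's peak to about 11% of it.

```python
def percentile_rank(history, value):
    # Percentage of weeks at or below `value` (one common convention).
    return 100 * sum(1 for x in history if x <= value) / len(history)

def percent_of_max(history, value):
    # `value` as a percentage of the largest week in the window.
    return 100 * value / max(history)

# Hypothetical weekly levels (arbitrary units); NOT real data.
early_surge = [300, 900, 600]  # stands in for a huge early wave
recent = [40, 60, 150, 80, 50, 70, 100, 90, 30, 120]
this_week = 100

windows = [("recent only", recent), ("full history", early_surge + recent)]
for label, window in windows:
    print(f"{label:>12}: {percentile_rank(window, this_week):5.1f}th percentile, "
          f"{percent_of_max(window, this_week):5.1f}% of window max")
```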
The new CDC wastewater reporting system is better, but in many places you'd still have trouble figuring out just how bad things are now compared with the worst peaks of the pandemic.
Still, it’s a big improvement. In the graph below, we combined the old and new CDC Covid-19 wastewater reporting for Massachusetts. I’ve annotated them to show how the two weeks highlighted in the first graph above compare across the old and new systems.
You'll notice a few things. First, the new CDC wastewater data actually resemble the waves of Covid we know about. That's a good thing. Second, the "value" for last week (6.41) and the one for the first week of 2023 (15.51) differ markedly from those in the old system. Even so, it is still not correct to say that last week was 41% as bad as the first week of 2023 (6.41/15.51 = 41%). The reason is that the CDC is still using a statistical transformation, not raw RNA levels. These are not actual Covid-19 concentrations; they are more like standard deviations. For reasons I won't get into, this is a reasonably intelligent fix, albeit an imperfect one, as you'll see in a moment. (Reasonable complaints about the new system include that it currently has less local data than the old system, and that some jurisdictions are still missing and being filled in.) Lastly, the new CDC data reach further back in time than the old system did. That's yet another improvement. In the new system, we can at least see the peak of early Omicron (January 2022). That makes the peak at the start of 2023 look relatively mild, which, compared with 2022, it absolutely was.
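Why can't the two new-system values simply be divided? The sketch below uses a hypothetical "standard deviations above a log baseline" transform. This is an assumption for illustration only, not the CDC's published formula, and all concentrations are invented so the arithmetic stays easy. Once values live on a standardized log scale, the ratio of two transformed numbers says almost nothing about the ratio of the underlying concentrations: here the transformed values are 2 and 4 (a 50% ratio), while the raw concentrations differ a hundredfold (a 1% ratio).

```python
import math
import statistics

def standardized_log(conc, baseline):
    """Hypothetical transform (NOT the CDC's published formula): how many
    standard deviations log10(conc) sits above the mean log10 baseline."""
    logs = [math.log10(c) for c in baseline]
    return (math.log10(conc) - statistics.mean(logs)) / statistics.stdev(logs)

# Made-up concentrations (copies/mL), chosen so the arithmetic is easy.
baseline = [10, 100, 1_000]      # log10 values 1, 2, 3 -> mean 2, SD 1
this_week, peak_week = 10_000, 1_000_000

t_now = standardized_log(this_week, baseline)    # 2.0
t_peak = standardized_log(peak_week, baseline)   # 4.0

print(f"ratio of transformed values: {t_now / t_peak:.0%}")
print(f"ratio of raw concentrations: {this_week / peak_week:.0%}")
```

A transform like this also breaks down near the baseline: a week at the baseline mean scores 0, so any ratio involving it is meaningless.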
The CDC does not combine these graphs, nor does it combine them with actual wastewater concentrations (that is, graphs showing how many copies of Covid-19 genetic material per milliliter were measured each week). Fortunately, Biobot does publish those data. (Ironically, Biobot was feeding the CDC's old reporting system, but the CDC decided not to publish actual levels because it thought people wanted to compare states, and many states used different sampling methods. As a result, the CDC did all this work to normalize things, which only led to confusion.)
So, we at Inside Medicine decided to plunk all three systems onto one graph. The hope was that we’d get a better look at where things really are compared to the past. The pink line represents actual Covid-19 wastewater levels (for the Boston area); the blue line shows the CDC’s new wastewater reporting system for Massachusetts; the purple line shows the old system. The results are exactly what we expected:
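We don't know exactly how the combined chart was built, but one plausible way to put series with different units and start dates on a shared axis is to join them by week and rescale each to a percentage of its own maximum. The sketch below does that with toy numbers; the series names and values are hypothetical. Note the catch it exposes: each series' "100" is the peak of its own window, so a series that starts late, like the old system, tops out at a much smaller wave.

```python
def percent_of_own_max(series):
    """Rescale a {week: value} series so that its highest week equals 100."""
    peak = max(series.values())
    return {week: 100 * value / peak for week, value in series.items()}

def align(named_series):
    """Join several {week: value} series on week; missing weeks become None."""
    all_weeks = sorted(set().union(*(s.keys() for s in named_series.values())))
    return [
        {"week": w, **{name: s.get(w) for name, s in named_series.items()}}
        for w in all_weeks
    ]

# Toy inputs (arbitrary numbers, NOT real data). Week 1 is only in the
# longest-running series; the "old" series covers just the final week.
raw = {
    "concentration": {1: 200, 2: 800, 3: 100},
    "new_system": {2: 12.0, 3: 4.0},
    "old_system": {3: 73},
}
rows = align({name: percent_of_own_max(s) for name, s in raw.items()})
for row in rows:
    print(row)
```

In this toy output, week 3 is 12.5% of the concentration series' peak but 100% of the old series' own maximum, because that series never saw the big wave.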
Because Biobot data go back to 2020, we can really see how both the CDC's old and new reporting systems compare with actual Covid-19 levels, in Massachusetts anyway. (We'll see if we can pull in similar data for other states and update our dashboard.) What we find is that no week or month has ever come close to the mayhem of early Omicron, nearly two years ago now. Only by zooming out far enough (i.e., going back far enough in time) and using the right metric can we really see that.
The CDC got too clever by half. Its new approach to wastewater is better.
People love the idea of wastewater. I'm one of them. (The CDC is also tracking Mpox, another good thing to do!) The idea that we don't need to depend on how many Covid-19 tests are being done to get a sense of where case counts stand is extremely important. But giving people epidemiological data that leaves them confused (and misled, at times) is no better than no data at all. I hope you can now see how unhelpful it was for the CDC to call the first week in January of 2023 "the 94th percentile" for Covid-19 levels in Massachusetts. According to Biobot data (the pink line), that peak was around one-fifth (20.5%) of the state's all-time high (early January of 2022). That's a lot lower than "94th percentile" implies to most casual observers, don't you think? And last week? The old CDC system called it the "73rd percentile." But triangulating from the three curves above, we can reasonably estimate that actual Covid-19 levels were around 12.5% of the state's maximum. That's reassuring.
As you know, I'm all for giving people warning about Covid-19. And indeed, cases are rising. We need to watch this and react accordingly. But crying wolf too often ("73rd percentile!") will ultimately backfire if people tune out what sound like big numbers but actually are not. People do have some sense of conditions on the ground, and when the way data are reported doesn't match what they're experiencing, they can sense something isn't right. We want people to trust data. The best way to achieve that is to give them information they can make sense of and understand.
Questions? Comments? Please add them to the Comments section.
Thanks to Benjy Renton for managing the Inside Medicine Covid and Respiratory Illnesses Dashboard.