County-level testing data from an unexpected source

For the first time since February, a federal public health institution has released county-level testing data. And it wasn't the CDC or the HHS.

Sep 14, 2020

Welcome back to the COVID-19 Data Dispatch, where new federal datasets are interrogated with ferocity.

This week, I’m diving into county-level test positivity rates released by the Centers for Medicare & Medicaid Services, including where exactly these data come from and why this particular institution was the one to publish them. Plus: questions for New York state and some exciting new data sources.

Wait, who's reporting county-level testing data?

New county-level testing data via CMS, visualized by yours truly.

On September 3, 2020, the Centers for Medicare & Medicaid Services (CMS) posted a county-level testing dataset. The dataset specifically provides test positivity rates for every U.S. county, for the week of August 27 to September 2.

This is huge. It’s, like, I had to lie down after I saw it, huge. No federal health agency has posted county-level testing data since the pandemic started. Before September 3, if a journalist wanted to analyze testing data at any level more local than states, they would need to aggregate values from state and county public health departments and standardize them as best they could. The New York Times did just that for a dashboard on school reopening, as I discussed in a previous issue, but even the NYT’s data team was not able to find county-level values in some states. Now, with this new release, researchers and reporters can easily compare rates across the country and identify hotspot areas that need more testing support.

So Betsy, you might ask, why are you reporting on this new dataset now? It’s been over a week since the county-level data were published. Well, as is common with federal COVID-19 data releases, this dataset was so poorly publicized that almost nobody noticed it.

It didn’t merit a press release from CMS or the Department of Health and Human Services (HHS), and doesn’t even have its own data page: the dataset is posted towards the middle of this CMS page on COVID-19 in nursing homes:

Highlighting mine.

The dataset’s release was, instead, brought to my attention thanks to a tweet by investigative reporter Liz Essley Whyte of the Center for Public Integrity:

Liz Essley Whyte @l_e_whyte

🚨 Coronavirus data nerds: @CMSGov has started posting county-level test positivity data Here are the top 20 counties for test positivity according to their data Anything over 10% positivity is red zone

In today’s issue, I’ll share my analysis of these data and answer, to the best of my ability, a couple of the questions that have come up about the dataset for me and my colleagues in the past few days.

Analyzing the data

Last week, I put together two Stacker stories based on these data. The first includes two county-level Tableau visualizations; these dashboards allow you to scroll into the region or state of your choice and see county test positivity rates, how those county rates compare to overall state positivity rates (calculated based on COVID Tracking Project data for the same time period, August 27 to September 2), and recent case and death counts in each county, sourced from the New York Times’ COVID-19 data repository. You can also explore the dashboards directly here.

The second story takes a more traditional Stacker format: it organizes county test positivity rates by state, providing information on the five counties with the highest positivity rates in each. The story also includes overall state testing, case, and outcomes data from the COVID Tracking Project.

As a reminder, a test positivity rate refers to the percent of COVID-19 tests for a given population which have returned a positive result over a specific period of time. Here’s how I explained the metric for Stacker:

These positivity rates are typically reported for a short period of time, either one day or one week, and are used to reflect a region’s testing capacity over time. If a region has a higher positivity rate, that likely means either many people there have COVID-19, the region does not have enough testing available to accurately measure its outbreak, or both. If a region has a lower positivity rate, on the other hand, that likely means a large share of the population has access to testing, and the region is diagnosing a more accurate share of its infected residents.
Test positivity rates are often used as a key indicator of how well a particular region is controlling its COVID-19 outbreak. The World Health Organization (WHO) recommends a test positivity rate of 5% or lower. This figure, and a more lenient benchmark of 10%, have been adopted by school districts looking to reopen and states looking to restrict out-of-state visitors as a key threshold that must be met.
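
As a rough illustration of the arithmetic behind the metric, here is a minimal Python sketch. The function name and the example county's numbers are hypothetical, made up for illustration; only the 5% WHO figure comes from the explanation above.

```python
def positivity_rate(positive_tests: int, total_tests: int) -> float:
    """Return the percent of tests in a period that came back positive."""
    if total_tests <= 0:
        raise ValueError("total_tests must be greater than zero")
    return 100.0 * positive_tests / total_tests


# WHO-recommended ceiling, in percent, as cited above.
WHO_BENCHMARK_PCT = 5.0

# Hypothetical county: 77 positive results out of 1,000 specimens in one week.
rate = positivity_rate(77, 1000)
meets_benchmark = rate <= WHO_BENCHMARK_PCT
print(f"{rate:.1f}% positive; at or under the WHO benchmark: {meets_benchmark}")
```

Note that the denominator here is whatever unit the source reports (specimens, people, or encounters), so rates computed from different sources are not always comparable.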

Which counties are faring the worst, according to this benchmark? Let’s take a look:

This screenshot includes the 33 U.S. counties with the highest positivity rates. I picked the top 33 to highlight here because their rates are over 30%—six times the WHO’s recommended rate. The overall average positivity rate across the U.S. is 7.7%, but some of these extremely high-rate counties are likely driving up that average. Note that two counties, one in South Dakota and one in Virginia, have positivity rates of almost 90%.

Overall, 1,259 counties are in what CMS refers to as the “Green” zone: their positivity rates are under 5%, or they have conducted fewer than 10 tests in the seven-day period represented by this dataset. 874 counties are in the “Yellow” zone, with positivity rates between 5% and 10%. 991 counties are in the “Red” zone, with positivity rates over 10%. South Carolina, Alabama, and Missouri have the highest shares of counties in the red, with 93.5%, 61.2%, and 50.4%, respectively:

Meanwhile, eight states and the District of Columbia, largely in the northeast, have all of their counties in the green:
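
For readers who want to reproduce the zone counts from the downloadable spreadsheet, the color rules described above can be sketched in a few lines of Python. This is a minimal sketch, not CMS’s own code: the function name and arguments are my own, and I’m assuming that rates of exactly 5% and exactly 10% land in Yellow, since the thresholds as described don’t say which side the boundaries fall on.

```python
def cms_zone(positivity_pct: float, tests_in_week: int) -> str:
    """Classify a county using the Green/Yellow/Red rules described above.

    Assumption: exactly 5% and exactly 10% are treated as Yellow.
    """
    # Counties with fewer than 10 tests in the seven-day period are Green
    # regardless of their positivity rate.
    if tests_in_week < 10 or positivity_pct < 5.0:
        return "Green"
    if positivity_pct <= 10.0:
        return "Yellow"
    return "Red"


# Hypothetical counties, for illustration only.
examples = [(3.2, 400), (88.0, 8), (7.7, 1200), (31.5, 250)]
for pct, tests in examples:
    print(pct, tests, cms_zone(pct, tests))
```

The second example shows why the small-sample rule matters: a county with a very high rate but only a handful of tests still lands in Green.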

My Tableau visualizations of these data also include an interactive table, which you can use to examine the values for a particular state. The dashboards are set up so that any viewers can easily download the underlying data, and I am, as always, happy to share my cleaned dataset and/or answer questions from any reporters who would like to use these data in their own stories. The visualizations and methodology are also open for syndication through Stacker’s RSS feed—I can share more details on this if anyone is interested.

Answering questions about the data

Why is the CMS publishing this dataset? Why not the CDC or HHS overall?

These test positivity rates were published as a reference for nursing home administrators, who are required to test their staff regularly based on the prevalence of COVID-19 in a facility’s area. A new guidance for nursing homes dated August 26 explains the minimum testing requirement: nursing homes in green counties must test all staff at least once a month, those in yellow counties must test at least once a week, and those in red counties must test at least twice a week.
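
The minimum testing requirement above amounts to a simple lookup from a county’s color to a staff testing frequency. Here is a small Python sketch of that lookup; the dictionary and function names are hypothetical, and the frequencies are copied from the guidance as summarized above.

```python
# Minimum staff testing frequency by county zone, per the August 26 guidance
# as summarized above. Names here are illustrative, not from CMS.
MIN_STAFF_TESTING = {
    "Green": "once a month",
    "Yellow": "once a week",
    "Red": "twice a week",
}


def min_staff_testing(zone: str) -> str:
    """Return the minimum staff testing frequency for a county zone."""
    try:
        return MIN_STAFF_TESTING[zone]
    except KeyError:
        raise ValueError(f"unknown zone: {zone!r}") from None


print(min_staff_testing("Red"))
```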

It is important to note that facilities are only required to test staff, not residents. In fact, the guidance states that “routine testing of asymptomatic residents is not recommended,” though administrators may consider testing those residents who leave their facilities often.

Where did the data come from?

The CMS website does not clearly state a source for these data. Digging into the downloadable spreadsheet itself, however, reveals that the testing source is a “unified testing data set,” which is clarified in the sheet’s Documentation field as data reported by both state health departments and HHS:

COVID-19 Electronic Lab Reporting (CELR) state health department-reported data are used to describe county-level viral COVID-19 laboratory test (RT-PCR) result totals when information is available on patients’ county of residence or healthcare providers’ practice location. HHS Protect laboratory data (provided directly to Federal Government from public health labs, hospital labs, and commercial labs) are used otherwise.

What are the units?

As I discussed at length in last week’s newsletter, no testing data can be appropriately contextualized without knowing the underlying test type and units. This dataset reports positivity rates for PCR tests, in units of specimens (or, as the documentation calls them, “tests performed”). HHS’s public PCR testing dataset similarly reports in units of specimens.

How are tests assigned to a county?

As is typical for federal datasets, not every field is exactly what it claims to be. The dataset’s documentation elaborates that test results may be assigned to the county where (a) the patient lives, (b) the patient’s healthcare provider facility is located, (c) the provider that ordered the test is located, or (d) the lab that performed the test is located. Most likely, the patient’s address is used preferentially, with these other options used in the absence of such information. But the disparate possibilities lead me to recommend proceeding with caution in using this dataset for geographical comparisons. I would expect the positivity rates reported here to differ from the county-level positivity rates reported by a state or county health department, which might have a different documentation procedure.
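
To make that guess concrete, here is a Python sketch of what a "first available location wins" rule could look like. Everything in it is an assumption on my part: the documentation does not spell out a priority order, the field names are hypothetical, and the real pipeline may work differently.

```python
from typing import Optional

# Assumed priority order, following the order in which the documentation
# lists the options. This ordering is a guess, not a documented rule.
FALLBACK_ORDER = (
    "patient_county",            # hypothetical field names
    "provider_facility_county",
    "ordering_provider_county",
    "lab_county",
)


def assign_county(record: dict) -> Optional[str]:
    """Return the first non-empty county field, or None if all are missing."""
    for field in FALLBACK_ORDER:
        value = record.get(field)
        if value:
            return value
    return None


# Hypothetical record with no patient address: falls back to the lab's county.
print(assign_county({"patient_county": "", "lab_county": "Example County"}))
```

Under a rule like this, a test from a resident of one county processed by a lab in another could be counted in the lab’s county, which is exactly why cross-county comparisons deserve caution.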

How often will this dataset be updated?

Neither the CMS page nor the dataset’s documentation itself indicates an update schedule. A report from the American Health Care Association suggests that the file will be updated on the first and third Mondays of each month. So maybe it will be updated on the 21st, or maybe it will be updated tomorrow. Or maybe it won’t be updated until October. I will simply have to keep checking the spreadsheet and see what happens.
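
If the first-and-third-Mondays schedule does hold, the expected update dates are easy to work out. The sketch below uses only Python’s standard library; the function name is my own, and the schedule itself is the unconfirmed one from the American Health Care Association report.

```python
from datetime import date, timedelta
from typing import Tuple


def first_and_third_mondays(year: int, month: int) -> Tuple[date, date]:
    """Return the first and third Mondays of the given month."""
    first_day = date(year, month, 1)
    # date.weekday() counts Monday as 0, so this is the number of days
    # from the 1st of the month forward to the first Monday.
    days_until_monday = (0 - first_day.weekday()) % 7
    first_monday = first_day + timedelta(days=days_until_monday)
    return first_monday, first_monday + timedelta(days=14)


for month in (9, 10):
    first, third = first_and_third_mondays(2020, month)
    print(first.isoformat(), third.isoformat())
```

For September 2020, this gives the 7th and the 21st, which is where my guess of the 21st comes from.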

Why won’t the dataset be updated every week, when nursing homes in yellow- and red-level counties are expected to test their staff at least once a week? Why is more public information about an update schedule not readily available? These are important questions which I cannot yet answer.

Why wasn’t this dataset publicized?

I really wish I could concretely answer this one. I tried submitting press requests and calling the CMS’ press line this past week; their mailbox, when I called on Friday, was full.

But here’s my best guess: this dataset is intended as a tool for nursing home facilities. In that intention, it serves a very practical purpose, letting administrators know how often they should test their staff. If CMS or HHS put out a major press release, and if an article were published in POLITICO or the Wall Street Journal, the public scrutiny and politically driven conspiracy theorists that hounded HHS during the hospitalization data switch would return in full force. Nursing home administrators and staff have more pressing issues to worry about than becoming part of a national political story: namely, testing all of their staff and residents for the novel coronavirus.

Still, even for the sake of nursing homes, more information about this dataset is necessary to hold accountable both facilities and the federal agency that oversees them. How were nursing home administrators, the intended users of this dataset, notified of its existence? Will the CMS put out further notices to facilities when the data are updated? Is the CMS or HHS standing by to answer questions from nursing home staff about how to interpret testing data and set up a plan for regular screening tests?

For full accountability, it is important for journalists like myself to be able to access not only data, but also the methods and processes around its collection and use.

New York’s school COVID-19 dashboard looks incredible… but where is it?

I wrote in last week’s issue that New York state is launching a dashboard that will provide data on COVID-19 in public schools.

New York Governor Andrew Cuomo discussed this dashboard in his online briefing last Tuesday, September 8. (If you’d like to watch, start at about 18:00.) He explained that every school district is now required to report test and case numbers daily to New York’s Department of Health. Local public health departments and state labs performing testing are also required to report these numbers, so that the state department can cross-check against three different sources. Cases and tests will be published by school on the new dashboard, called the COVID Report Card.

In his briefing, Governor Cuomo showed a mockup of what the Report Card will look like. The available data include positive cases by date, tests administered by the school (including test type, lab used, and test wait time), the school’s opening status (i.e., whether it is operating remotely, in person, or with a hybrid model), and the percentage of on-site students and staff who test positive.

This dataset promises to be much more complete than any other state’s reporting on COVID-19 in schools. But I haven’t been able to closely examine these data yet, because the dashboard has yet to come online.

According to reporting from Gothamist, state officials planned for the dashboard to begin showing data on September 9. As I send this newsletter on September 13, however, the dashboard provides only a message stating that the COVID Report Card will be live “when the reporting starts to come back.”

“The facts give people comfort,” Governor Cuomo said in his briefing. So, Governor, where are the facts? Where are the data? When will New York students, parents, and teachers be able to follow COVID-19 in their schools? My calls to Governor Cuomo’s office and the New York State Department of Health have as yet gone unanswered, and subsequent press releases have not issued updates on the status of these data.

I hope to return with an update on this dashboard next week. In the meantime, for a thorough look at why school COVID-19 data are so important and the barriers that such data collection has faced so far, I highly recommend this POLITICO feature by Bianca Quilantan and Dan Goldberg.

Featured data sources

COVID-19 Cutback Tracker: Researchers at the Tow Center for Digital Journalism at Columbia University have tracked layoffs, furloughs, closures, and other cutbacks to journalistic outlets since March 2020. Findings from the project were released this past Wednesday in a new tracker.
We Rate Covid Dashboards: Two weeks ago, I analyzed college and university COVID-19 dashboards for my newsletter. This project from public health experts at Yale and Harvard, meanwhile, goes much further: the researchers have developed a rating scheme based on available metrics, legibility, update schedules, and more, and rated over 100 dashboards so far.
GenderSci Lab’s US Gender/Sex Covid-19 Data Tracker: The GenderSci Lab, an interdisciplinary research project, is tracking COVID-19 by gender by compiling information from state reports. The tracker includes case counts, death counts, and mortality rates.
COVIDcast: This dashboard, by the Delphi Group at Carnegie Mellon University, features interactive maps for a variety of COVID-19 indicators, including movement trends, doctors’ visits, and even test positivity based on antigen tests.
2019 baby name popularity: Okay, this one isn’t COVID-19 related. But as Stacker’s resident baby names expert, I feel obligated to inform my readers that, last week, the Social Security Administration finally released its counts of how many babies were given which names in 2019. (The annual update is usually released in March, but was delayed this year due to COVID-19 concerns.) Olivia has beaten out Emma for the number one-ranked baby girl name, after Emma’s five years at the top. Personally, I always get a kick out of scrolling through the long tails to see what unique and creative names parents are using.

COVID source callout

Someday, I will write a parody stage play called “Waiting for Texas.” It will feature a squadron of diligent COVID Tracking Project volunteers, eagerly refreshing Texas’ COVID-19 dashboard, wondering if today, maybe, will be the day that the site updates by its promised time of 4 PM Central (5 PM Eastern).

This past weekend, I was not so lucky. Texas’ data came late enough on Saturday that the Project decided to publish its daily update without this state. How late did it come? 6:30 PM Central, or 7:30 PM Eastern. I understand the procrastination, Texas (see: the sending time of this newsletter today), but a little heads up might be nice next time.
