What data exist on school reopening?

There is no government-run national dataset on COVID-19 in American schools. Researchers and volunteer projects are working to fill that gap.

Welcome back to the COVID-19 Data Dispatch, your number one source for geeking out over volunteer-run spreadsheets.

This week, I’m bringing you a review of data sources—both current and under development—that cover how COVID-19 and school reopenings are impacting each other across the country. Plus: another update on hospitalization data and a reflection on the people behind COVID-19 numbers.

This is a baby newsletter, so I would appreciate anything you can do to help get the word out. If you were forwarded this email, you can subscribe here:

Data on schools reopening lag the actual reopening of schools

Reported COVID-19 cases in K-12 schools, compiled by Alisha Morris and other volunteers. Screenshot via Jon W.’s Tableau dashboard.

As I wrote in my coverage of the congressional subcommittee hearing on national COVID-19 response a few weeks ago, everyone wants to reopen the schools.

Politicians on both sides of the aisle, along with public health leaders such as the CDC’s Dr. Robert Redfield and NIAID’s Dr. Anthony Fauci, agree that returning to in-person learning is crucial for public health. Many children rely on food and health resources provided by schools. Parents rely on childcare. Without in-person schools, it is difficult for teachers and other mandated reporters to identify cases of child abuse. And all school students, from kindergarteners to college kids, are facing the mental health deterioration that comes from limited social interaction with their peers.

But in deciding whether and how to return to in-person learning, school districts around the country are facing the same challenge that states faced early in the pandemic: they’re on their own. Some districts may have guidance from local government; in New York, for example, schools are allowed to reopen if they are located in an area where under 5% of COVID-19 tests return positive results. Every county in the state meets this guideline, and the state as a whole has had a positivity rate under 1% for weeks.
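For readers who want to see the arithmetic behind a positivity guideline like New York's, here is a minimal Python sketch. The function names and the test counts are made up for illustration; only the 5% threshold comes from the guideline described above, and a real health department may define its numerator and denominator differently.

```python
def positivity_rate(positive_tests: int, total_tests: int) -> float:
    """Share of reported tests that came back positive (0.0 to 1.0)."""
    if total_tests <= 0:
        raise ValueError("total_tests must be greater than zero")
    return positive_tests / total_tests


def meets_reopening_guideline(positive_tests: int, total_tests: int,
                              threshold: float = 0.05) -> bool:
    """True when the positivity rate is strictly under the threshold.

    The 5% default mirrors the New York guideline; the inputs are hypothetical.
    """
    return positivity_rate(positive_tests, total_tests) < threshold


# Hypothetical area A: 90 positives out of 10,000 tests (0.9%).
area_a_ok = meets_reopening_guideline(90, 10_000)

# Hypothetical area B: 600 positives out of 10,000 tests (6%).
area_b_ok = meets_reopening_guideline(600, 10_000)
```

With these invented numbers, area A falls under the threshold and area B does not.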

Still, low community transmission does not necessarily mean that schools are safe to reopen. Teachers in New York City have protested the city’s plan for school reopening, citing poor ventilation, no plan for regular testing, and other health concerns. Teachers in Detroit, outside of Phoenix, and other districts across the country are considering strikes. Earlier this week, the White House formally declared that teachers are essential workers (meaning they could continue working after exposure to COVID-19), which Randi Weingarten, president of the American Federation of Teachers, called a move to “threaten, bully, and coerce” teachers back into their unsafe classrooms. Meanwhile, many colleges and universities are planning to bring students from out of state into the communities around their campuses.

As conversations on school reopening heighten at both national and local levels, a data journalist like myself has to ask: what data do we have on the topic? Is it possible to track how school reopening is impacting COVID-19 outbreaks, or vice versa?

The answer, as with any national question about COVID-19, is that the data are spotty. It’s possible to track cases and deaths at the county level, but no source comprehensively tracks testing at a level more local than the state. It is impossible to compare percent positivity rates (that crucial metric many districts are using to determine whether they can safely reopen) both broadly and precisely across the country.

The best a data journalist can do is represented in this New York Times analysis. The Times pulled together county-level data from local public health departments and evaluated whether schools in each county could safely open based on new cases per 100,000 people and test positivity rates. Test positivity rates are difficult to standardize across states, however, because different states report their tests in different units. And, if you look closely at this story’s interactive map, you’ll find that some states—such as Ohio, New Hampshire, and Utah—are not reporting testing data at the county level at all.
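To make the units problem concrete, here is a small Python sketch of the two metrics mentioned above. The county figures are invented, and the function names are illustrative; this is not the Times' code or method. It assumes, as a simplification, that the count of positives is the same whether a jurisdiction counts specimens or people.

```python
def cases_per_100k(new_cases: int, population: int) -> float:
    """New cases scaled to a population of 100,000."""
    if population <= 0:
        raise ValueError("population must be greater than zero")
    return new_cases * 100_000 / population


def positivity(positives: int, denominator: int) -> float:
    """Positive results divided by whatever unit the state reports."""
    if denominator <= 0:
        raise ValueError("denominator must be greater than zero")
    return positives / denominator


# Hypothetical county: 250,000 residents, 500 new cases in the period.
rate = cases_per_100k(500, 250_000)

# The same 500 positives against two different denominators:
# 12,000 specimens tested (some people were tested more than once)
# versus 8,000 unique people tested.
by_specimens = positivity(500, 12_000)
by_people = positivity(500, 8_000)

print(f"{rate:.0f} new cases per 100,000")
print(f"{by_specimens:.1%} positivity by specimens")
print(f"{by_people:.1%} positivity by people")
```

In this made-up county, the positivity rate lands on either side of a 5% cutoff depending only on the unit reported, which is why rates from different states are hard to compare directly.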

Still, some research projects and volunteer efforts are cropping up to document COVID-19 in schools as best they can. I will outline the data sources I’ve found here, and I invite readers to send me any similar sources that I’ve missed so that I can feature them in future issues.

How schools are reopening

  • COVID-19 Testing in US Colleges: Sina Booeshaghi and Lior Pachter, two researchers from Caltech, put together a database documenting testing plans at over 500 colleges and universities throughout the U.S. The database is open for updates; anyone who would like to suggest an edit or contribute testing information on a new school can contact the researchers, whose emails are listed in the spreadsheet. Booeshaghi and Pachter wrote a paper on their findings, which is available in preprint form on medRxiv (it has not yet been reviewed by other scientists).

  • The College Crisis Initiative: Davidson College’s College Crisis Initiative (or C2i) maps out fall 2020 plans for about 3,000 colleges and universities. Clicking on a college in the interactive map leads users to see a brief description of the school’s opening policy, along with a link to the school’s website. Corrections may be submitted via a Google form.

  • District Budget Decisions: Edunomics Lab at Georgetown University has compiled a database of choices school districts are making about how to change their budgets and hiring during the COVID-19 pandemic. The database includes 302 districts at the time I send this newsletter; district choices are categorized as budget trimming, salary reductions, benefits adjustment, furloughs, and layoffs.

Reporting COVID-19 in schools and districts

  • COVID-19 in Iowa: Iowa’s state dashboard includes a page which specifically allows users to check the test positivity rates in the state’s school districts. Click a school district in the table on the left, and the table on the right will automatically filter to show how testing is progressing in the counties encompassed by this district. So far, Iowa is the only state to make such data available in an accessible manner; other states should follow its lead.

  • NYT COVID-19 cases in colleges: Journalists at the New York Times surveyed public and private four-year colleges in late July. The analysis found at least 6,600 cases tied to 270 colleges since March. This dataset is not being actively updated, but it is an informative indicator of the schools that faced outbreaks in the spring and summer.

  • Individual school dashboards: Any large college or university that chooses to reopen, even in a partial capacity, must inform its students of COVID-19’s progress on campus. Some schools are communicating through regular emails, while others have put together school-specific dashboards for students, professors, and staff. Two examples of school dashboards can be found at Boston University and West Virginia University; at other schools, such as Georgia Tech, students have spun up their own dashboards based on school reports.

Reopening gone wrong

  • K-12 school closures, quarantines, and/or deaths: Weeks ago, Alisha Morris, a theater teacher in Kansas, started compiling news reports on instances of COVID-19 causing schools to stall or alter reopening plans. Morris’ project grew into a national spreadsheet with hundreds of COVID-19 school case reports spanning every U.S. state. She now manages the sheet with other volunteers, and the sheet’s “Home” tab advertises a new site coming soon. You can explore the dataset through a Tableau dashboard created by one volunteer.

Datasets under development

  • FinMango and Florida COVID Action collaboration: FinMango, a global nonprofit which has pivoted to help COVID-19 researchers, has partnered with Florida COVID Action, a data project led by whistleblower Rebekah Jones, to track COVID-19 cases in K-12 schools. The project, called the COVID Monitor, has already been compiling reports from media and members of the public since July. It includes about 1,300 schools with confirmed or reported COVID-19 cases so far, 200 of which are in the project’s home state of Florida.

  • ProPublica school reopening survey: A new initiative from ProPublica asks students, parents, educators, and staff to report on their schools’ reopening plans. Readers who might prefer to share information with ProPublica through more private means can get in touch on Signal or visit the publication’s tips page.

  • Nature university reopening survey: Similarly to ProPublica, Nature News is surveying its readers on their reopening experience. This survey specifically calls on research scientists to share how they will be teaching and if they agree with the approach their university has taken on reopening. Respondents who wish for more privacy can use Signal or WhatsApp.

No, hospitalization data isn’t switching back to the CDC

I mean, it is. But not right now. Or is it?

Last Thursday, the Wall Street Journal published an article headlined, “Troubled COVID-19 Data System Returning to CDC.” At first glance, the article reports that the tracking of COVID-19 hospitalization data is returning to the CDC’s charge after numerous concerns were raised about data accuracy and integrity under Department of Health and Human Services (HHS) control.

Readers, I cannot lie: when I first saw this headline, I lay down on the floor of my apartment and cursed for several minutes. Why would they change it back, I thought. The HHS is already collecting data from more hospitals than the CDC did. It made sense with remdesivir distribution. Why make everyone go through another system switch.

And then I got up, sent some incredulous messages in the COVID Tracking Project Slack server, and actually read the full article. What is happening, according to WSJ reporter Robbie Whelan, is this: the CDC is developing a new data system which will be more efficient for both hospitals and data users. After the new system is complete, the CDC will once again collect and report hospitalization data.

“CDC is working with us right now to build a revolutionary new data system so it can be moved back to the CDC, and they can have that regular accountability with hospitals relevant to treatment and PPE,” Dr. Birx said, referring to personal protective equipment used by doctors and nurses.

The article, however, fails to report any meaningful details about this new CDC data system. What is the proposed timeline for the system? What makes it “revolutionary?” Who is developing it? What new metrics will it collect? How will it address challenges that hospitals with fewer staff or lower technological capacity currently face in making daily reports? I could go on, but you get the idea.

Also, there’s this insight, from POLITICO reporter Dan Diamond:

Within a few hours, the WSJ had changed their headline to “COVID-19 Data Will Once Again Be Collected by CDC, in Policy Reversal.”

It continues to be unclear when or how the HHS-back-to-CDC hospitalization data switch will occur, if it does occur. As COVID Tracking Project lead Erin Kissane points out, federal IT development happens very slowly. It will likely be months before definitive information is available on the CDC’s new database.

Meanwhile, the HHS is proceeding with its own new data system effort: an overhaul called the Modernizing Public Health Reporting and Surveillance project, POLITICO reported this past Wednesday. The project plans to improve data technology and data quality at state and local public health departments over the next several years. It’s an ambitious initiative, considering that HHS is still working on fixing its hospital reporting:

HHS says that 85 percent of the nation’s hospitals report daily — a mark that is improving, and that includes more metrics the government uses to allocate scarce resources during the pandemic, like the drug remdesivir. But federal officials say they receive only half of the required clinical information on average, a gap that could distort the scope of the pandemic and obscure who’s getting sick where.

I may be optimistic, but I’m hoping that at least one of these new data systems will be ready to go before the next pandemic hits.

March for the dead, fight for the living

Earlier this weekend, I attended a protest in New York City called, “March for the Dead.” The event sought to memorialize New Yorkers who died of COVID-19 and demand that the federal government better address the realities of this pandemic and protect vulnerable Americans.

After a rally and a silent, candlelit march across the Brooklyn Bridge, the protest finished with a reading of names. Two organizers read the names of 1,709 New Yorkers whose lives were lost in this pandemic in front of a makeshift memorial composed of candles and signs. The names came from a database compiled by local NYC publication THE CITY, the Columbia Journalism School, and the CUNY Craig Newmark Journalism School; they comprise only a small fraction (7.2%) of New Yorkers who have died due to COVID-19.

(Disclaimer: one of the event’s organizers, Justin Hendrix, volunteers with me at the COVID Tracking Project.)

While this newsletter is a journalism project, it felt fitting this week for me to share a few lines I wrote on the subway home after listening to the name reading. “March for the Dead” reminded me of the people behind the numbers I spend so much time compiling and analyzing—a reminder that I think anyone covering this pandemic sorely needs.

how long would it take to read all the names?

1,700 names in the city’s memorial. it took an hour, maybe, give or take.  i wasn't really keeping track. i was listening to the names, the way they rang out in the open square. the way they fell heavy on the pavement, like drops of rain at the start of a thunderstorm. but this is not the start of a thunderstorm, of course. it's a hurricane, and another hurricane, and a wildfire, and a tornado, and all of it preventable. the father of one of the readers, kept in a nursing home. grocery store clerks, cafe workers, nurses, and parents, siblings. so many pairs of names that rhyme. so many bodies in tiny apartments, bodies in shelters, bodies hooked up to breathing machines, gasping for every molecule of oxygen.

this is not the start of a thunderstorm. it's a hurricane, and we aren't stopping it. an hour, perhaps, for 1,700 names. how long would it take, to read all 25,000 names of those who died in new york city? all 175,000 who died in america? all the thousands more who have not been counted yet? how long would it take to talk to the families and friends of the people who bore those names, to find out their favorite colors, what they ate for breakfast, what they were looking forward to this year? how long would it take to attend 175,000 funerals?

this is a metric i can't count. my back would crack under the weight. all i can do is sit in the square, sit quietly, and listen. and then i return to work, i keep counting the numbers i can count. i let them echo. 

Featured data sources (unrelated to school reopening)

  • COVID Care Map: Dave Luo, another COVID Tracking Project volunteer, also runs this volunteer effort to aggregate and clean public data on health care system capacity. The source has mapped capacity figures at the state, county, and individual facility levels, as well as other healthcare data from sources such as the Institute for Health Metrics and Evaluation (IHME).

  • Federal allocation of remdesivir: This public dataset from HHS shows how many cases of remdesivir, an antiviral drug which has become an important treatment option for COVID-19 patients, have been distributed to each state since early July. The dataset is cited in an NPR investigation which reports confusion and lack of transparency about how remdesivir distribution is decided.

  • The White House’s Red Zone Reports: Each week, the White House Coronavirus Task Force sends reports to U.S. governors about the state of the pandemic, including county-level data on cases and tests. The reports are not made public, but the Center for Public Integrity is collecting and releasing many of them. As of August 23, the Center’s document repository includes one report on all 50 states (from July 14) and 13 state-specific reports.

COVID source callout

New Jersey reports COVID-19 demographic data in three different places.

First: there are confirmed case summary reports, released in PDF form. These reports include pie charts that break down COVID-19 cases, deaths, and hospitalizations according to race and ethnicity, age group, and gender. A case summary report was last released on July 30.

Second: there is a “demographics” tab on New Jersey’s dashboard, which includes tables on COVID-19 deaths by race and ethnicity, age group, and underlying conditions. This tab currently lags the main dashboard significantly; the tables add up to about 11,000 deaths, while New Jersey has reported about 16,000 deaths total.

And third: there is a “case and mortality summaries” tab on the dashboard, which replicates the format of the old PDF reports with some confusing navigation. (Two rows of tabs at the top, and another row of tabs at the bottom? Who designed this? Who hurt them?)

This section of the dashboard appears to be updated daily, at least for now. But at the COVID Racial Data Tracker, we are wary of New Jersey. We won’t let our guard down. We are prepared for the source to change again.
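The kind of lag described above for the demographics tab can be spotted with a simple coverage check: add up the rows of a breakdown table and compare the sum to the headline total. The sketch below uses invented numbers and hypothetical names (`coverage`, `is_lagging`, the 2% tolerance); it is one possible approach, not the COVID Racial Data Tracker's actual process.

```python
from typing import Mapping


def coverage(table_counts: Mapping[str, int], headline_total: int) -> float:
    """Fraction of the headline total accounted for by the table's rows."""
    if headline_total <= 0:
        raise ValueError("headline_total must be greater than zero")
    return sum(table_counts.values()) / headline_total


def is_lagging(table_counts: Mapping[str, int], headline_total: int,
               tolerance: float = 0.02) -> bool:
    """Flag a table whose rows fall short of the total by more than `tolerance`.

    The 2% default is an arbitrary assumption chosen for this example.
    """
    return coverage(table_counts, headline_total) < 1 - tolerance


# Invented deaths-by-age table; the rows sum to 9,000.
deaths_by_age = {"0-49": 300, "50-64": 1_200, "65-79": 3_000, "80+": 4_500}

# Invented headline total shown elsewhere on the same dashboard.
share = coverage(deaths_by_age, 12_000)
lagging = is_lagging(deaths_by_age, 12_000)

print(f"Table covers {share:.0%} of the headline total; lagging: {lagging}")
```

A shortfall like this does not say which number is wrong, only that the two parts of the dashboard were not updated together.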

More recommended reading

My recent Stacker bylines

News from the COVID Tracking Project


That’s all for today! I’ll be back next week with more data news.

If you’d like to share this young newsletter further, you can do so here:


And if you have any feedback for me, or if you want to ask me your COVID-19 data questions, you can send me an email (betsyladyzhets@gmail.com) or comment on this post directly.