Counting the Infected – The New York Times

michael barbaro

From The New York Times, I'm Michael Barbaro. This is The Daily.

Today: For months, the U.S. government has been quietly collecting information on hundreds of thousands of coronavirus cases across the country. My colleague, Robert Gebeloff, on the story of how The Times obtained that data.

It's Wednesday, July 8.

Robert, you live in a corner of The Times, the data team, that I'm not sure most people understand all that well. So when the pandemic starts, how do you all respond?

So, by training, my goal is to find stories that can best be told through data, which is not every story, but there's a lot of stories out there. So if you go back to early March, the pandemic is starting. And I know that our job as The New York Times is to really get our arms around what's going on and, by that, to start collecting the data that is starting to come out about cases and deaths around the country. So my colleagues set up a team of people across different departments whose primary job would be to monitor all the states, all the major counties, and gather the information and start to build a database. Start to say, we're getting information from New York over here and California over here, but let's put it into one database just for the purpose of tracking where the cases were, where the deaths were.

You're saying it's not coming out on a national level. There's no big clearinghouse that's going to hand you data every day about exactly where the virus is all across the country.

Correct. And at that point, we assume that some kind of federal system may be in the offing, but we weren't going to wait for it. And part of our report every day, you'll see on our website, are maps showing where the cases are, where new cases are, where deaths are, where the new hotspots are. That all emanated from these early days of creating this ground-level system for being able to collect this data.

And I wonder if you can take me into the process of that a little bit. I mean, what does it look like? Where exactly is the information coming from?

Well, it's really like a hive of activity. I mean, that's the way I like to think of it. You have, at any given time, a team of clerks, reporters, editors, all assigned to monitor what gets announced in various parts of the country. So at one moment, you could have somebody wrestling with new data that was put out by California and trying to get it into a format that matches our data standards. And you could have somebody in Mississippi confused about whether the new data announced is cumulative, or is it new cases for the day? And often, that involves basic reporting of going back to the state and asking questions. Then, while all this is going on and people are collecting this data, we have other people trying to put the data into context. It's, you know, truly this whole new full-time operation just devoted to trying to track what is really happening with the pandemic and to do some surveillance on the national picture.

Right. This sounds very tedious, incremental. You know, gathering up tiny bits of data, cleaning it, making sure it all lines up. Not sexy.

It is not sexy at all. You know, when you're data journalists, the fun part is doing what we call the queries: asking questions of the data and seeing what it shows. But we all know, like, job one is to make sure your data is good. Otherwise, the questions you ask won't mean anything.
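The cumulative-versus-daily question raised above can be made concrete with a minimal sketch. Everything here is an assumption for illustration: the function names are hypothetical, the numbers are invented, and nothing in the conversation says The Times processed its data this way. The monotonicity check is only a rough hint, since a run of daily counts can also happen to rise, which is why going back to the state and asking remains the real answer.

```python
def daily_from_cumulative(cumulative):
    """Convert running totals into per-day new counts."""
    daily = []
    previous = 0
    for total in cumulative:
        daily.append(total - previous)
        previous = total
    return daily


def looks_cumulative(series):
    """Rough heuristic: running totals should never decrease.

    A True result does not prove the series is cumulative.
    """
    return all(later >= earlier for earlier, later in zip(series, series[1:]))


# Invented example figures, not real case counts.
reported = [10, 25, 25, 40]
new_cases = daily_from_cumulative(reported)
print(looks_cumulative(reported), new_cases)
```

If the reported series is treated as running totals, the third day contributes zero new cases rather than 25, which is exactly the kind of difference that changes a chart.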

Hmm. And what do you begin to learn through this data?

Right. Part of what my personal job is to do is to look at this data and try and help understand what it tells us. So, for example, one of the early findings we had when we were looking at the pandemic in March was it seemed to be hitting mostly in big cities: New York, New Orleans, Detroit.

Seattle.

Seattle. It seemed to be in places with a lot of population density. But there was also another class of place that seemed to be popping up. And it was resort counties, places with ski resorts. And so that led us to this insight that it wasn't just population density, that there are other possible explanations for why places got hit. Then, as the weeks went on, we began to see the fill-in, what I call the fill-in, which is there were all of these new counties that were starting to get cases. And so by having this record, what we were able to then report is there are now hundreds of rural counties getting their first cases. And, you know, how were they preparing? And how were they talking to people? And then, another thing we've been monitoring is there seems to be this ideological difference, or at least there has been, about how serious a problem is it. How soon should government reopen or allow businesses to reopen? And...

Right. Kind of a red state-blue state divide over shutting down and reopening.

Right. But our reporting showed that there was this additional element involved, which was, for the first six to eight weeks of the pandemic, there were hardly any red counties with high infection rates. And most of the hard-hit places were in blue counties. And so we were able to raise the specter of, if you live in a place that doesn't have first-hand experience with the virus, you don't have your emergency rooms being overflowed. Maybe that also contributes to your belief that, you know what, we should open the economy. This is not worth shutting down the economy for.

Right.

And all of these types of stories are, again, driven by the idea that in the first place, we had good county-level data that we couldn't get anywhere else. That allowed us to look at the world through these different prisms and ask different questions about how the pandemic was playing out.

Mm-hmm. You're laying out clear examples of why data like this is important and what it lets us understand. But I'm curious what the limitations of this kind of a database are. What does it not tell us?

Yeah. So think of it this way. A data set we think of like any other source that we're going to interview. And we think of what might this source be able to tell us about something. And so we think of questions that we're going to ask the source. So the problem became we had this data set, and we knew where the cases were and the deaths were, but we couldn't ask it any other questions. We couldn't ask, who were the people actually becoming infected in these counties? Were they old? Were they young? Were they rich? Were they poor? Were they front-line workers? Were they white? Were they Black? Were they Latino? So all these questions we had we couldn't really ask the data set we had.

So what did you end up doing?

So, along the way, we learned that the C.D.C. actually had some information that would be helpful in this, in that every time a person was confirmed to have a coronavirus infection, the local health agency would fill out a report that would have characteristics of the case: the person, the age, the race. And the form actually asked dozens of questions. You know, was the person at work? Was the person staying home? What were the symptoms? And that these forms ultimately ended up at the C.D.C.

Hmm.

And if we could get our hands on this data, we could ask a lot more questions about how this pandemic is playing out. And so we decided to approach the C.D.C. and request access.

And here's why we needed that data. So many people in this country are getting sick. So many people are dying. And our job is to try and explain, who is it that is getting sick? Who is dying and why? And if we had any chance of getting answers to those questions, we need the best data. And if the C.D.C. had the data, we wanted to get a copy ourselves.

And so how do you go about trying to get it?

Well, in this case, we ended up suing them.

We'll be right back.

So, Robert, why did The New York Times sue the C.D.C.?

So suing the C.D.C. sounds very dramatic. But in fact, many, many times in the course of a year, we go to court to establish our rights to get public information. It's somewhat more routine than most people would realize. And sometimes it's because the government out and out refuses to give up the information. But in this case, it was more to do with the timing. Without going to court and putting pressure on the agency, we were looking at the prospect of waiting months to get our hands on this information.

Right.

But by going to court, it sort of put the clock on. And we had the agency's full attention.

And so what ends up happening once this clock is ticking and a judge is looking over the shoulders of the C.D.C.?

So the C.D.C. tells us that they will comply. They just need to do a little more research as to what they can possibly produce, taking into consideration the privacy of people who are in the database and stripping out personally identifiable information. But ultimately, the day comes where they say, OK, New York Times, here is a database of 1.45 million cases...

Wow.

...that we have collected from state and local authorities. And we were then free to have a new interview subject and be able to ask it a whole lot of more interesting and detailed questions.

Right. I mean, this quite literally sounds like the motherlode of data on this pandemic in the United States.

Well, in many ways it was. What we were able to see from this was detailed information about individuals who had become infected and died. And for each individual, we were able to look at their age, the county they lived in, their race and their ethnicity. And that is far more information than we had before. And in the end, we ended up being able to break down cases for nearly 1,000 counties covering more than half of the U.S. population.

And this number, 1.5 million Americans, how big a proportion of all cases of the virus is that?

So for the time period covered by the data (it was all cases through the end of May), it was about 88 percent of all cases that we had some information about.

So when you get this massive data dump, what do you do? What do you find?

So when we finally had our hands on this data, we were checking what types of information were included, how complete the information was, and just looking at the data in many different ways to see what it could tell us. And eventually, three main trends emerged.

And so what were those trends?

So the first was just how pervasive the racial disparity was with this pandemic.

Mm-hmm.

Whatever knowledge people had that African-Americans and Latinos were becoming infected at a higher rate, a lot of that was tied to big cities that had released data. But what we found is that this racial disparity pervades everywhere, whether you go from cities to suburbs, even into rural places.

Huh.

In fact, any place we found where there was a significant African-American population, almost all of them, African-American infection rates were higher than the rate for Whites. Same thing with Latinos. Any place we found where there was a significant Latino population, for almost all of them, the infection rate was higher for Latinos.

Hmm.

The second big takeaway is what is driving these racial disparities. So most of the earliest explanations of the racial disparity were focused on death rates. And one of the explanations for the disparities in death rates that is commonly offered is something called comorbidities: the idea that African-Americans might be dying at a higher rate because they were more likely to have preexisting conditions or to be in poorer health to begin with. But in our analysis, we focused mostly on the actual infection rates. And the reason for that is that gets us out of the question of whether comorbidities is driving it and puts us more on the question of who is most at risk to become infected in the first place. And so when we see disparities in the infection rates, we can then raise the question of, why are people in certain groups more likely to become infected?

Mm-hmm.

And that led us to looking at, where do people work? Where do people live? And what is their housing situation? And if you look at where people work and look at what the data shows, it shows that African-Americans and Latinos in the U.S. are far less likely to have the kind of job where you can do it at home. They are more likely, instead, to have a job in the production sector, in a factory or in the service sector. All of that combined would increase your risk of becoming infected. And with housing, what we found is that Latinos in particular are far more likely to live either with more people in the household or with less space in the household, both of which would also increase the odds that a person might become infected.

So the second discovery very much helps understand the first. There are kind of structural issues around how Black and Latino Americans work and live that contribute to this racial disparity in the pandemic.

That's correct. And the third takeaway from this is what you learn by looking at the pandemic through the prism of age.

Hmm.

Right now, most of what we know about the disparity is all cases of people of all age groups. And that's how the rates are calculated. But if you realize something about this pandemic, it's that older people are far more likely to get sick and die.

Right.

And in the U.S. right now, the older population is very disproportionately white, non-Hispanic.

Huh.

So if you don't account for age, you're by definition almost understating the disparity. So what we did, what some epidemiologists call age adjusting, is looked at infection rates across age groups. And when you look at, say, what the infection rate is for people who are in their 40s or in their 50s, the disparity is much bigger than you'll ever see in numbers without age adjustment.

So when you accounted for the fact that so many older people have died from the coronavirus, and that the older population in this country skews white, you found that the racial disparity actually gets even greater.

Correct. In fact, if you look at some of the younger age groups, the death rate for Latinos is about 10 times higher than that for whites.

Wow.

Now, the caveat to that, of course, is you're much, much less likely to die at those age groups. But it's still, among the people who do die in those age groups, it's very heavily Black and Latino.
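The age-adjustment idea described above can be illustrated with a small sketch. This is not The Times's analysis: the two groups, the two age bands, every count, and the standard population are invented for illustration, and the function names are hypothetical. The sketch compares crude rates with age-specific rates, then applies direct standardization, which is one common way to age-adjust; the conversation does not say which method the reporters used.

```python
PER = 100_000  # express every rate per 100,000 people


def rate(cases, population):
    """Cases per 100,000 people."""
    return cases * PER / population


def crude_rate(bands):
    """Overall rate, ignoring how the group is spread across ages."""
    total_cases = sum(cases for cases, _ in bands.values())
    total_pop = sum(pop for _, pop in bands.values())
    return rate(total_cases, total_pop)


def age_adjusted_rate(bands, standard_pop):
    """Direct standardization: weight each age-specific rate by a
    shared standard population so both groups get the same age mix."""
    weighted = sum(rate(*bands[age]) * standard_pop[age] for age in standard_pop)
    return weighted / sum(standard_pop.values())


# INVENTED numbers: (cases, population) per age band.
group_a = {"under_50": (100, 50_000), "50_plus": (500, 50_000)}  # older-skewing
group_b = {"under_50": (360, 90_000), "50_plus": (200, 10_000)}  # younger-skewing
standard = {"under_50": 70_000, "50_plus": 30_000}  # hypothetical standard population

for name, group in (("A", group_a), ("B", group_b)):
    print(name, crude_rate(group), age_adjusted_rate(group, standard))
```

With these invented figures, group B's crude rate (560 per 100,000) looks slightly lower than group A's (600), even though B's rate is twice A's within each age band. After standardizing, the rates are 880 and 440, so the twofold gap that the crude numbers hid becomes visible.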

Mm-hmm. I mean, these insights, once again, seem to highlight just how important it is to have this kind of information. Because from what you're saying, we have been, in some sense, misunderstanding the racial disparities of this virus, the causes of the racial disparities, because we haven't had access to this data.

Well, at minimum, you could say we didn't know the extent to which these problems existed. And getting data like this helps us sort of define what the ground truth is about how this pandemic is playing out. That being said, there's still a lot more that we would like to know.

Mm-hmm.

The database had 1.45 million records. And it had, for each record, more than 100 columns or 100 pieces of information. Most of those were blank. And that leaves us in the dark about a lot of questions that we'd like answered, like how many people are contracting the virus at work? Or how many are getting it from traveling or being at bars? So still a lot of room for improvement. And hopefully, knowing what can be done, the power of having this data to answer questions will help inspire the C.D.C. to collect the information better.

Mm-hmm. And perhaps release it more quickly. I have to think that suing the C.D.C., getting this data and reporting out these insights on race has increased pressure on the federal government to make this information more available. Is that true?

I would like to think so. There is still some mystery as to what will ultimately happen. Our case is still pending. The status is, the C.D.C. at this point believes they satisfied our request.

Right.

Our lawyers are still investigating whether or not there was more information that should have been released or more types of information. And, you know, once that is resolved, the question will be what does the C.D.C. do going forward. And a lot of people, in reaction to the story that published, were asking me, do you think they'll just start posting this on their own? And I would think that whether or not the information is complete, it's still better than anything else out there. And so hopefully we will see more of this type of information made public.

That would definitely be beneficial to not just us, but to researchers around the nation and the world to have access to more complete and better information. But until that happens, we're going to keep doing what we've been doing.

We're going to go out every day, go to every state and collect data on coronavirus cases and deaths.

Rob, thank you very much.

Thanks, Michael.

On Tuesday, the latest updates to The Times's database found that the virus has infected more than 3 million Americans and has killed more than 130,000 of them. Globally, it recorded nearly 12 million infections and nearly 542,000 deaths, including 65,000 in Brazil, where the country's president, Jair Bolsonaro, who has repeatedly downplayed the pandemic and avoided wearing a mask, announced that he had tested positive for the virus.

We'll be right back.

Station, this is Houston. Are you ready for the event?

Hello, Houston. We're ready for the event.

38 days ago, NASA and SpaceX launched two U.S. astronauts into space on a mission to the International Space Station, where they joined a fellow American. It was the first time that a manned spacecraft has left American soil in nearly a decade.

The New York Times, this is mission control Houston. Please call station for a voice check.

On Tuesday, I spoke with the three U.S. astronauts now aboard the space station.

Hello, New York Times. New York Times, this is the International Space Station. How do you hear us?

Bob Behnken and Doug Hurley, who arrived a few weeks ago, along with Chris Cassidy, who has been there since April.

We hear you loud and clear. How do you hear us?

We hear you loud and clear as well. Good afternoon. Welcome aboard, and we're happy to talk to you.

Of course, their time in space is precious. And so NASA gave us six minutes on the dot.

If I might boldly call you by your first names, Doug, Chris and Bob, thank you very much for making time for us. I wonder if you can start by telling us exactly where you are in space, relative to us right now.

Well, while I kick things off, Bob's going to pull up our mapping program. Right at the moment, we didn't have it on the computer. Sorry about that. But we're orbiting 250 miles above the Earth. And it looks like we are abeam of Baja California, just a little bit out into the Pacific Ocean.

Mm-hmm. So over America, the U.S.-Mexico border.

Right. Yeah. We're just over the Pacific Ocean. We just passed California heading south.

If you'll indulge me for a minute, I want to talk a little bit about feelings. Knowing I was going to be talking to you, I have been thinking a lot about this moment back on Earth and wondering, with so much turmoil here, and you looking down on all of it from such a distance, what that feels like to look down on a planet that's truly in the midst of some really challenging, tumultuous times.

Well, it certainly is challenging to hear, either by secondhand or when we get the opportunity to see some news up here, all the turmoil that's going on. The challenges with the pandemic and the strife in the cities and all the different challenges that people are going through on a day-to-day basis. It is, you know, emotionally it does take a toll on us, certainly. And I think the other thing that really resonates with me, personally, is just when you look out the window, when you see the planet below, you don't see borders. You don't see this strife. You see this beautiful planet that we need to take care of. And hopefully, as technology advances and as this commercial space travel gets going, more people will get that opportunity. Because I think if you get the chance to look out the window from space and look back on our planet, it will change you. It will change you for the better. And you'll realize that this is one big world, rather than all these different little countries or cities or factions that we have on the planet. And I think it will make it a better place.

Well, that's really interesting. And I wonder if you could say a little bit more about that, because in the time since I believe you've all last been in space, there actually have been changes on Earth. You know, major ice shelves have broken off in Antarctica. Huge fires have swept across Australia, California. The Great Barrier Reef has essentially died. And when you look down at Earth, can you actually see some of those changes to the Earth, compared with when you last saw it?

Well, I think one of the things that we see from up here is that the Earth is not a stagnant place. It continues to change, whether it's a fire, whether it's the seasons, whether it's different things happening further out. You know, we just saw a comet become visible in the predawn era. So it's definitely a lot of things happening with the Earth and...

Wow.

...that continuous change.
