Chapter 2.1: Data Providers
How/where did you collect data?
The data for most of the graphs and calculations came from the Johns Hopkins COVID-19 Data Repository, downloaded as CSV files. Population counts per county were downloaded from census.gov. LA and NYC make data available through their respective county health websites, which link to their data tables or downloadable CSV files.
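As a rough illustration of how case counts and population counts from two separate CSV files can be combined, here is a minimal sketch using only Python's standard library. It assumes the case file is in a wide layout (one row per county, one column per date) and that both files share a FIPS county code; the column names, the helper names, and every number in the sample strings are made up for the example and are not taken from the actual downloads.

```python
import csv
import io

# Made-up sample in a wide layout (one column per date); NOT real case counts.
SAMPLE_CASES = """FIPS,Admin2,Province_State,1/22/20,1/23/20,1/24/20
6037,Los Angeles,California,0,2,5
41039,Lane,Oregon,0,0,1
"""

# Made-up population figures keyed by FIPS code; NOT real census values.
SAMPLE_POPULATION = """FIPS,population
6037,1000000
41039,200000
"""

# Columns that identify a county rather than hold a daily count (assumed names).
ID_COLUMNS = {"FIPS", "Admin2", "Province_State"}


def load_population(text):
    """Return a {fips: population} dictionary from a two-column CSV string."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["FIPS"]: int(row["population"]) for row in reader}


def wide_to_long(text):
    """Reshape wide rows into (fips, county, date, cases) tuples."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        for column, value in row.items():
            if column not in ID_COLUMNS:
                records.append((row["FIPS"], row["Admin2"], column, int(value)))
    return records


def cases_per_100k(records, population):
    """Scale each count by county population so counties can be compared."""
    return [
        (fips, county, date, cases * 100_000 / population[fips])
        for fips, county, date, cases in records
    ]


population = load_population(SAMPLE_POPULATION)
records = wide_to_long(SAMPLE_CASES)
rates = cases_per_100k(records, population)
for fips, county, date, rate in rates:
    print(f"{county:12s} {date:8s} {rate:.2f} cases per 100k")
```

With real files, the `io.StringIO(...)` wrappers would be replaced by opened file objects. Normalizing by population is what allows a large county such as LA to be compared with a much smaller one.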
What are some challenges when collecting each data set?
When looking at specific counties, I searched their websites for local data. Many of the county websites I checked publish their data in formats that are hard to extract from, like PDF documents. Worse, some display only semi-interactive graphs, with no access to the raw data behind them. In those cases, instead of quickly downloading a CSV file of case or death counts covering several months, the only way to obtain a data table is manual entry.
How do counties compare in their data sources?
LA’s Department of Health provides its own data set of confirmed cases, deaths, and tests administered. The website links to a CSV file, but the link changes daily, so to update LA’s data I had to return to the health department’s website and obtain the new link each time. Furthermore, the site’s COVID-19 data page was often down and unreachable. New York City maintains an extensive data archive on GitHub that provides COVID-19 data across multiple demographic categories. Lane OR, Cook IL, Maricopa AZ, and Miami-Dade FL published their COVID-19 data as PDFs or graphs that were updated daily. This made collecting their county data extremely difficult and time-consuming, in some cases requiring manual entry for every data point.
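A download link that changes daily could, in principle, be looked up programmatically rather than by hand. The sketch below is not the approach used in this project; it only shows one way to pull CSV links out of a page's HTML with the standard library. The page markup, the `health.example.org` address, and the names `CsvLinkFinder` and `find_csv_links` are all hypothetical, and a real health department page may be structured quite differently or load its links through JavaScript, in which case this would not work.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CsvLinkFinder(HTMLParser):
    """Collect the href of every <a> tag whose path ends in .csv."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when checking the file extension.
        if href and href.split("?")[0].lower().endswith(".csv"):
            self.links.append(href)


def find_csv_links(page_html, base_url):
    """Return absolute URLs for all CSV links found in the page."""
    finder = CsvLinkFinder()
    finder.feed(page_html)
    return [urljoin(base_url, href) for href in finder.links]


# Hypothetical page content standing in for a downloaded data page.
SAMPLE_PAGE = """
<html><body>
<a href="/about">About</a>
<a href="/downloads/cases_2020-07-01.csv">Download data</a>
</body></html>
"""

links = find_csv_links(SAMPLE_PAGE, "https://health.example.org/covid/")
print(links)
```

In practice the page HTML would first be fetched (for example with `urllib.request.urlopen`), and because the data page was often unreachable, that fetch would need error handling and retries.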