History as it Happens: Rescuing the Historical Record in a Digital World – NYU News

Posted: February 1, 2022 at 2:28 am

Katy: In addition to that, we're archiving ProPublica's data journalism apps, their whole catalog if we can. They're one of our partners on this work. They build some of these really interesting, complex, robust websites that are querying a database in real time. One, which is titled "Are hospitals near me ready for the coronavirus?", allows you to enter your zip code and see how full the hospitals are. This was, of course, very useful last winter.

ProPublica produces many different versions of this, but there isn't a technology that's able to capture and archive the sites yet. We're working with different partners and developing tools, and we think we will ultimately be able to capture all of ProPublica's journalism apps.

What does the archiving process look like?

Katy: It's not easy to look at a data journalism site and know whether it's archivable or not. We're working on a flow chart that would help digital archivists and data journalists figure out exactly what they have built and which aspects can be preserved. Some things can be archived with Web Recorder, which is a high-fidelity dynamic web archiving tool that can capture a lot of things, but it can present issues with getting the archives to library catalogs and making them available to researchers later. Sometimes it isn't until you get to the quality assurance step and you check the archived version that you realize it didn't capture crucial parts of the site.

Vicky: But our tool, ReproZipWeb, enables us to do server-side archiving. Anyone can use it; it's free and open source. If you have access to either a server where the materials are being hosted in production or a copy of those materials, you would first start the server, which engages the tool and keeps track of everything that's happening on the server, including the software it touches, the data it uses, the database, the type of the database, and so on. It captures a lot of in-depth metadata, which is required for active, ongoing digital preservation. At the end of the process, you get a bundled file which is small and shareable and contains all the assets needed to rerun the Web application in different environments. It's not just facilitating archiving, but it's also facilitating reuse for others.

If we don't have access to different computational environments, such as different operating systems and different servers, over the long term, then a lot of this work becomes moot. If you don't have a copy of Windows 93 but you have a Windows 93 file and you opened it now, it would look like Wingdings. Software archiving is a crucial part of this work.

It's counterintuitive to think that something published online as recently as last year is already at risk of being lost. How widespread is the problem?

Katy: Oddly, there are books that were published 500 years ago that are much more stable and preservable than some of these dynamic websites. The sites can be exceptionally fragile, especially with some of these news organizations, like Vox or Chalkbeat, that don't have a legacy publication behind them. There has been a lot of really interesting data journalism created during COVID that's already gone, and data journalists are sounding the alarm about the loss of their work. Digital-first, start-up media organizations are incredibly volatile.
