Those of you in the programming field probably know what GitHub is. For those who don't, GitHub is a website where developers can store open source code and collaborate on it.

Companies mainly use it as a collaboration tool: any changes made to a software's code can be uploaded to GitHub, and the next person just needs to pull the updated code before making further changes.

But the company is now embarking on a unique journey to archive terabytes of code from all around the world.

Curiosity got the better of us, so we put some questions to Thomas Dohmke, VP of Strategic Programs at GitHub, about why the company is doing this.

We thought it's worth preserving that open source software for future generations, and we came up with the idea of the archive program. It uses different approaches to archive open source software on various forms of media.

Not everything on your SSD is kind of like up to date, and then you might upload it to Dropbox or OneDrive every other day or so.

But then you'll do like an (Apple) Time Machine backup or like an external hard drive backup more rarely, right? Because you don't have the time to always connect that hardware. You might not have it with you when you go on a trip. We do a similar thing, and we have a hot layer which is real-time backups. First, of course, in GitHub itself, our data centres have backups.

And we have multiple regions where we store our data, so if there are some thunderstorms, we can just hand over to a different region, which takes over the data. But then we also stream data through our API to two partners. They're called GH Archive and GHTorrent.

If you have some cool projects that have more than just one, that was kind of like a classifier to figure out which ones to pick.

This is a real open source project and not just some student's "Hello, World" kind of project where you just created something to try it out. It's all the relevant open source code in the world.

That includes like Linux for example, and the Bitcoin source code, the Ruby or JavaScript source code. It goes all the way from operating systems to cryptocurrency.

Thousands and thousands of libraries that are used in the dependency tree of basically every kind of software project, whether that's commercial or not. It also includes Microsoft MS-DOS because a couple of years ago they made that open source.

It depends on the layer. Like in the cold layer, it's more like a museum or history lesson.

We put it into the vault about two weeks ago, so that's now code frozen in time. The value is not so much to recover this code and make it run and found your startup in 1,000 years. The value is similar to learning about our past in the medieval ages.

We have a saying at GitHub: software is the biggest team sport on Earth. Because people all around the world in their spare time, weekends, and of course professional players are working together and they don't care about where they've come from. They don't care about languages. They don't care about cultural differences.

All they care about is collaborating on the code to make it better, and this is what the archive preserves in the first place.

That's why we also have the warm and the hot layer. They will be updated all the time, assuming that those entities, including GitHub, are there for the next 100 years.

For the cold layer, we're also thinking about going back in the next five years or so to do another snapshot. Maybe we put the next snapshot in a different location. We haven't figured that part out. We are thinking about new ways of archiving software all around the world.

I would say pretty strong in the sense that it's in a coal mine, and the entrance to the core is 100 meters above sea level so that you have to go up a hill first to get into the coal mine.

It's unlikely that the sea levels will rise 100 meters, right? What the scientists are talking about is a single-digit rise. That's already a problem for cities like Miami or places in the Netherlands that are built at ocean level.

You go down a little bit and then up a little bit again and down a little bit, and that way it's protected from meltwater. So, if the permafrost is melting and some water is flowing into the mine, it's basically stuck in those valleys of the shafts. Unless a lot of the ice melts, it shouldn't be an issue.

And then you get into the actual archive, which looks like a metal container that's protecting the archive data from the permafrost. But it's deep, and you have to go down 300 meters in the mountain.

Then you have the code, which is stored in a plastic container, and the plastic containers are wrapped with aluminium foil to keep them in constant conditions. So I would say with high confidence that this will survive most of the likely events that are happening now.

We have not only that copy in the Arctic. We have the other copies in Paris and in San Francisco on their servers. We're also putting two reels into the Library of Oxford, which has been archiving data since the 1500s.

The archive is locked down. It's kind of like a data centre or a bank vault, so you can't access this unless you have a scientific reason or you're like a partner with the Norwegian government and the company that we are working with.

Occasionally they might have visitors there to do another drop, for example, if the Singapore government wants to store data. They can of course access the archive and they do photo shoots. But normal people cannot go into the archives. They can go into the coal mine and see the coal mine.

I personally think it's fascinating what GitHub is doing. We always talk about preserving history so we can learn from it. But with how much technology has integrated into our lives, we should remember how it has helped human advancement.

These types of technology are ingrained in history. Therefore, they should be given the same treatment as any other piece of history.

Cover image sourced from GitHub.

