Tags: preservation

363

sparkline

Sunday, October 20th, 2024

Archives

Speaking of serendipity, not long after I wrote about making a static archive of The Session for people to download and share, I came across a piece by Alex Chan about using static websites for tiny archives.

The use-case is slightly different—this is about personal archives, like paperwork, screenshots, and bookmarks. But we both came up with the same process:

I’m deliberately going low-scale, low-tech. There’s no web server, no build system, no dependencies, and no JavaScript frameworks.

And we share the same hope:

Because this system has no moving parts, and it’s just files on a disk, I hope it will last a long time.

You should read the whole thing, where Alex describes all the other approaches they took before settling on plain ol’ HTML files in a folder:

HTML is low maintenance, it’s flexible, and it’s not going anywhere. It’s the foundation of the entire web, and pretty much every modern computer has a web browser that can render HTML pages. These files will be usable for a very long time – probably decades, if not more.

I’m enjoying this approach, so I’m going to keep using it. What I particularly like is that the maintenance burden has been essentially zero – once I set up the initial site structure, I haven’t had to do anything to keep it working.

They also talk about digital preservation:

I’d love to see static websites get more use as a preservation tool.

I concur! And it’s particularly interesting for Alex to be making this observation in the context of working with the Flickr foundation. That’s where they’re experimenting with the concept of a data lifeboat

What should we do when a digital service sinks?

This is something that George spoke about at the final dConstruct in 2022. You can listen to the talk on the dConstruct archive.

Monday, October 14th, 2024

Train coding

When I went up to London for the State of the Browser conference last month, I shared the train journey with Remy.

I always like getting together with Remy. We usually end up discussing sci-fi books we’re reading, commiserating with one another about conference-organising, discussing the minutiae of browser APIs, or talking about the big-picture vision of the World Wide Web.

On this train ride we ended up talking about the march of time and how death comes for us all …and our websites.

Take The Session, for example. It’s been running for two and a half decades in one form or another. I plan to keep it running for many more decades to come. But I’m the weak link in that plan.

If I get hit by a bus tomorrow, The Session will keep running. The hosting is paid up for a while. The domain name is registered for as long as possible. But inevitably things will need to be updated. Even if no new features get added to the site, someone’s got to install updates to keep the underlying software safe and secure.

Remy and I discussed the long-term prospects for widening out the admin work to more people. But we also discussed smaller steps I could take in the meantime.

Like, there’s the actual content of the website. Now, I currently share exports from the database every week in JSON, CSV, and SQLite. That’s good. But you need to be tech nerd to do anything useful with that data.

The more I talked about it with Remy, the more I realised that HTML would be the most useful format for the most people.

There’s a cute acronym in the world of digital preservation: LOCKSS. Lots Of Copies Keep Stuff Safe. If there were multiple copies of The Session’s content out there in the world, then I’d have a nice little insurance policy against some future catastrophe befalling the live site.

With the seed of the idea planted in my head, I waited until I had some time to dive in and see if this was doable.

Fortunately I had plenty of opportunity to do just that on some other train rides. When I was in Spain and France recently, I spent hours and hours on trains. For some reason, I find train journeys very conducive to coding, especially if you don’t need an internet connection.

By the time I was back home, the code was done. Here’s the result:

The Session archive: a static copy of the content on thesession.org.

If you want to grab a copy for yourself, go ahead and download this .zip file. Be warned that it’s quite large! The .zip file is over two gigabytes in size and the unzipped collection of web pages is almost ten gigabytes. I plan to update the content every week or so.

I’ve put a copy up on Netlify and I’m serving it from the subdomain archive.thesession.org if you want to check out the results without downloading the whole thing.

Because this is a collection of static files, there’s no search. But you can use your browser’s “Find in Page” feature to search within the (very long) index pages of each section of the site.

You don’t need to a web server to click around between the pages: they should all work straight from your file system. Double-clicking any HTML file should give a starting point.

I wanted to reduce the dependencies on each page to as close to zero as I could. All the CSS is embedded in the the page. Likewise with most of the JavaScript (you’ll still need an internet connection to get audio playback and dynamic maps). This keeps the individual pages nice and self-contained. That means they can be shared around (as an email attachment, for example).

I’ve shared this project with the community on The Session and people are into it. If nothing else, it could be handy to have an offline copy of the site’s content on your hard drive for those situations when you can’t access the site itself.

Tuesday, September 17th, 2024

To remember, or to forget?

What are your own scribbles, your own ordinary plenty, not worth much to you now but that someone in the future may treasure?

Saturday, August 24th, 2024

The race to save our online lives from a digital dark age | MIT Technology Review

For many archivists, alarm bells are ringing. Across the world, they are scraping up defunct websites or at-risk data collections to save as much of our digital lives as possible. Others are working on ways to store that data in formats that will last hundreds, perhaps even thousands, of years.

Sunday, June 9th, 2024

Blogs and longevity | James’ Coffee Blog

When I write a blog post, I want it to live on my blog, rather than a platform. I can thus invest my time thinking about how to make my blog better and backing it up, rather than having to worry about where my writing is, finding ways to export data from a platform, setting up persistent backups, etc.

Monday, May 13th, 2024

Untapped – Using Simple Tools as a Radical Act of Independence

It would be much harder for a 15-year-old today to View Source and understand the code structure that built the website they’re on. Every site is layered with analytics, code snippets, javascript plugins, CMS data, and more.

This is why the simplicity of HTML and CSS now feels like a radical act. To build a website with just these tools is a small protest against platform capitalism: a way to assert sustainability, independence, longevity.

Wednesday, January 3rd, 2024

Permanence · Indie Microblogging

One of the reasons to own your content is to make it last. When you have control of the text you write or the photos you post, it’s up to you whether that content stays on the internet. When you post to someone else’s platform, you’ve given up that control. It’s not up to you whether the company that hosts your content will stay in business or change everything to break your content.

Thursday, November 2nd, 2023

Internet Artifacts

I love this timeline of internet firsts. Best of all:

You may touch the artifacts

The websites on display work—even the ones that used Flash!

Thursday, September 14th, 2023

Canadian Typography Archives

Go spelunking down the archives to find some lovely graphic design artefacts.

Wednesday, September 6th, 2023

File over app — Steph Ango

In the fullness of time, the files you create are more important than the tools you use to create them. Apps are ephemeral, but your files have a chance to last.

Friday, September 1st, 2023

Shining a Light on the Digital Dark Age - Long Now

A false sense of security persists surrounding digitized documents: because an infinite number of identical copies can be made of any original, most of us believe that our electronic files have an indefinite shelf life and unlimited retrieval opportunities. In fact, preserving the world’s online content is an increasing concern, particularly as file formats (and the hardware and software used to run them) become scarce, inaccessible, or antiquated, technologies evolve, and data decays. Without constant maintenance and management, most digital information will be lost in just a few decades. Our modern records are far from permanent.

Tuesday, August 29th, 2023

The 100-Year Plan on WordPress.com

Some really interesting long-term thinking from Matt—it’ll be interesting to see the terms and conditions.

Monday, August 7th, 2023

Lunar Codex

Time capsules on the moon, using NanoFiche as the storage medium.

35mm scans — writer/editor/reporter

Clicking through these cold war slides gives an uncomfortable mixture of nostalgic appreciation for the retro aesthetic combined with serious heebie-jeebies for the content.

The slides appear to be 1970s/1980s informational or training images from the United States Air Force, NORAD, Navy, and beyond.

Saturday, May 27th, 2023

404 Page Not Found | Kate Wagner

Considering the average website is less than ten years old, that old warning from your parents that says to “be careful what you post online because it’ll be there forever” is like the story your dad told you about chocolate milk coming from brown cows, a well-meant farce. On the contrary, librarians and archivists have implored us for years to be wary of the impermanence of digital media; when a website, especially one that invites mass participation, goes offline or executes a huge dump of its data and resources, it’s as if a smallish Library of Alexandria has been burned to the ground. Except unlike the burning of such a library, when a website folds, the ensuing commentary from tech blogs asks only why the company folded, or why a startup wasn’t profitable. Ignored is the scope and species of the lost material, or what it might have meant to the scant few who are left to salvage the digital wreck.

Friday, December 16th, 2022

99 Good News Stories From 2022

A look at back at what wasn’t in the headlines this year.

Friday, November 18th, 2022

The lost thread

The speed with which Twitter recedes in your mind will shock you. Like a demon from a folktale, the kind that only gains power when you invite it into your home, the platform melts like mist when that invitation is rescinded.

Thursday, September 8th, 2022

Worse than LaserDiscs?

Kevin takes my eleven-year old remark literally and points out at least you can emulate LaserDiscs:

So LaserDiscs aren’t the worst things to archive, networks of servers running code that isn’t available or archivable are, and we are building a lot more of those these days, whether on the web or in apps.

Wednesday, September 7th, 2022

“Writing an app is like coding for LaserDisc” – Terence Eden’s Blog

I love this: Terence takes eleven years to reflect on a comment I made on stage at an event here in Brighton. It’s all about the longevity of the web compared to native apps:

If you wrote an app for an early version of iOS or Android, it simply won’t run on modern hardware or software. APIs have changed, SDKs weren’t designed with forward compatibility, and app store requirements have evolved.

The web has none of that. The earliest websites are viewable on modern browsers.

As wrote at the time, I may have been juicing things up for entertainment:

Now here’s the thing when it comes to any discussion about mobile or the web or anything else of any complexity: an honest discussion would result in every single question being answered with “it depends”. A more entertaining discussion, on the other hand, would consist of deliberately polarised opinions. We went for the more entertaining discussion.

But I think this still holds true for me today:

The truth is that the whole “web vs. native” thing doesn’t interest me that much. I’m as interested in native iOS development as I am in native Windows development or native CD-ROM development. On a timescale measured in years, they are all fleeting, transient things. The web abides.

Wednesday, July 20th, 2022

Fandom Relics and the Enthusiastic Past - Long Now

Not much stays in one place for one long, especially when it comes to digital artifacts. When the Yahoo Groups archive was summarily deleted by parent company Verizon just a few years ago, fandom suffered massive losses, just as it had during the Livejournal purges of the late 02000s, and during the Tumblr porn ban in 02018. Fandom preservation, then, ties into the larger issue of digital preservation as a whole, and specifically the question of how individual and group emotions and experiences — which make up so much of what it means to be a fan — can be effectively documented, annotated, and saved.