Tags: archiving

28

sparkline

Sunday, October 20th, 2024

Archives

Speaking of serendipity, not long after I wrote about making a static archive of The Session for people to download and share, I came across a piece by Alex Chan about using static websites for tiny archives.

The use-case is slightly different—this is about personal archives, like paperwork, screenshots, and bookmarks. But we both came up with the same process:

I’m deliberately going low-scale, low-tech. There’s no web server, no build system, no dependencies, and no JavaScript frameworks.

And we share the same hope:

Because this system has no moving parts, and it’s just files on a disk, I hope it will last a long time.

You should read the whole thing, where Alex describes all the other approaches they took before settling on plain ol’ HTML files in a folder:

HTML is low maintenance, it’s flexible, and it’s not going anywhere. It’s the foundation of the entire web, and pretty much every modern computer has a web browser that can render HTML pages. These files will be usable for a very long time – probably decades, if not more.

I’m enjoying this approach, so I’m going to keep using it. What I particularly like is that the maintenance burden has been essentially zero – once I set up the initial site structure, I haven’t had to do anything to keep it working.

They also talk about digital preservation:

I’d love to see static websites get more use as a preservation tool.

I concur! And it’s particularly interesting for Alex to be making this observation in the context of working with the Flickr foundation. That’s where they’re experimenting with the concept of a data lifeboat

What should we do when a digital service sinks?

This is something that George spoke about at the final dConstruct in 2022. You can listen to the talk on the dConstruct archive.

Monday, October 14th, 2024

Train coding

When I went up to London for the State of the Browser conference last month, I shared the train journey with Remy.

I always like getting together with Remy. We usually end up discussing sci-fi books we’re reading, commiserating with one another about conference-organising, discussing the minutiae of browser APIs, or talking about the big-picture vision of the World Wide Web.

On this train ride we ended up talking about the march of time and how death comes for us all …and our websites.

Take The Session, for example. It’s been running for two and a half decades in one form or another. I plan to keep it running for many more decades to come. But I’m the weak link in that plan.

If I get hit by a bus tomorrow, The Session will keep running. The hosting is paid up for a while. The domain name is registered for as long as possible. But inevitably things will need to be updated. Even if no new features get added to the site, someone’s got to install updates to keep the underlying software safe and secure.

Remy and I discussed the long-term prospects for widening out the admin work to more people. But we also discussed smaller steps I could take in the meantime.

Like, there’s the actual content of the website. Now, I currently share exports from the database every week in JSON, CSV, and SQLite. That’s good. But you need to be tech nerd to do anything useful with that data.

The more I talked about it with Remy, the more I realised that HTML would be the most useful format for the most people.

There’s a cute acronym in the world of digital preservation: LOCKSS. Lots Of Copies Keep Stuff Safe. If there were multiple copies of The Session’s content out there in the world, then I’d have a nice little insurance policy against some future catastrophe befalling the live site.

With the seed of the idea planted in my head, I waited until I had some time to dive in and see if this was doable.

Fortunately I had plenty of opportunity to do just that on some other train rides. When I was in Spain and France recently, I spent hours and hours on trains. For some reason, I find train journeys very conducive to coding, especially if you don’t need an internet connection.

By the time I was back home, the code was done. Here’s the result:

The Session archive: a static copy of the content on thesession.org.

If you want to grab a copy for yourself, go ahead and download this .zip file. Be warned that it’s quite large! The .zip file is over two gigabytes in size and the unzipped collection of web pages is almost ten gigabytes. I plan to update the content every week or so.

I’ve put a copy up on Netlify and I’m serving it from the subdomain archive.thesession.org if you want to check out the results without downloading the whole thing.

Because this is a collection of static files, there’s no search. But you can use your browser’s “Find in Page” feature to search within the (very long) index pages of each section of the site.

You don’t need to a web server to click around between the pages: they should all work straight from your file system. Double-clicking any HTML file should give a starting point.

I wanted to reduce the dependencies on each page to as close to zero as I could. All the CSS is embedded in the the page. Likewise with most of the JavaScript (you’ll still need an internet connection to get audio playback and dynamic maps). This keeps the individual pages nice and self-contained. That means they can be shared around (as an email attachment, for example).

I’ve shared this project with the community on The Session and people are into it. If nothing else, it could be handy to have an offline copy of the site’s content on your hard drive for those situations when you can’t access the site itself.

Friday, September 1st, 2023

Shining a Light on the Digital Dark Age - Long Now

A false sense of security persists surrounding digitized documents: because an infinite number of identical copies can be made of any original, most of us believe that our electronic files have an indefinite shelf life and unlimited retrieval opportunities. In fact, preserving the world’s online content is an increasing concern, particularly as file formats (and the hardware and software used to run them) become scarce, inaccessible, or antiquated, technologies evolve, and data decays. Without constant maintenance and management, most digital information will be lost in just a few decades. Our modern records are far from permanent.

Tuesday, August 3rd, 2021

A Few Notes on A Few Notes on The Culture

When I post a link, I do it for two reasons.

First of all, it’s me pointing at something and saying “Check this out!”

Secondly, it’s a way for me to stash something away that I might want to return to. I tag all my links so when I need to find one again, I just need to think “Now what would past me have tagged it with?” Then I type the appropriate URL: adactio.com/links/tags/whatever

There are some links that I return to again and again.

Back in 2008, I linked to a document called A Few Notes on The Culture. It’s a copy of a post by Iain M Banks to a newsgroup back in 1994.

Alas, that link is dead. Linkrot, innit?

But in 2013 I linked to the same document on a different domain. That link still works even though I believe it was first published around twenty(!) years ago (view source for some pre-CSS markup nostalgia).

Anyway, A Few Notes On The Culture is a fascinating look at the world-building of Iain M Banks’s Culture novels. He talks about the in-world engineering, education, biology, and belief system of his imagined utopia. The part that sticks in my mind is when he talks about economics:

Let me state here a personal conviction that appears, right now, to be profoundly unfashionable; which is that a planned economy can be more productive - and more morally desirable - than one left to market forces.

The market is a good example of evolution in action; the try-everything-and-see-what-works approach. This might provide a perfectly morally satisfactory resource-management system so long as there was absolutely no question of any sentient creature ever being treated purely as one of those resources. The market, for all its (profoundly inelegant) complexities, remains a crude and essentially blind system, and is — without the sort of drastic amendments liable to cripple the economic efficacy which is its greatest claimed asset — intrinsically incapable of distinguishing between simple non-use of matter resulting from processal superfluity and the acute, prolonged and wide-spread suffering of conscious beings.

It is, arguably, in the elevation of this profoundly mechanistic (and in that sense perversely innocent) system to a position above all other moral, philosophical and political values and considerations that humankind displays most convincingly both its present intellectual immaturity and — through grossly pursued selfishness rather than the applied hatred of others — a kind of synthetic evil.

Those three paragraphs might be the most succinct critique of unfettered capitalism I’ve come across. The invisible hand as a paperclip maximiser.

Like I said, it’s a fascinating document. In fact I realised that I should probably store a copy of it for myself.

I have a section of my site called “extras” where I dump miscellaneous stuff. Most of it is unlinked. It’s mostly for my own benefit. That’s where I’ve put my copy of A Few Notes On The Culture.

Here’s a funny thing …for all the times that I’ve revisited the link, I never knew anything about the site is was hosted on—vavatch.co.uk—so this most recent time, I did a bit of clicking around. Clearly it’s the personal website of a sci-fi-loving college student from the early 2000s. But what came as a revelation to me was that the site belonged to …Adrian Hon!

I’m impressed that he kept his old website up even after moving over to the domain mssv.net, founding Six To Start, and writing A History Of The Future In 100 Objects. That’s a great snackable book, by the way. Well worth a read.

Thursday, November 19th, 2020

Keepers of the Secrets | The Village Voice

A deeply fascinating look into the world of archives and archivists:

The reason an archivist should know something, Lannon said, is to help others to know it. But it’s not really the archivist’s place to impose his knowledge on anyone else. Indeed, if the field could be said to have a creed, it’s that archivists aren’t there to tell you what’s important. Historically momentous documents are to be left in folders next to the trivial and the mundane — because who’s to say what’s actually mundane or not?

Friday, February 28th, 2020

The Permanent Legacy Foundation

A non-profit that offers digital preservation services for individuals.

Permanence means no subscriptions; a one-time payment for dedicated storage that preserves your most precious memories and an institution that will be there to protect the digital legacy of all people for all time.

Wednesday, December 5th, 2018

Why You Should Never, Ever Use Quora – Waxy.org

Never mind their recent data breach—the reason to avoid Quora is that it’s a data roach motel.

All of Quora’s efforts to lock up its community’s contributions make it incredibly difficult to preserve when that they go away, which they someday will. If you choose to contribute to Quora, they’re actively fighting to limit future access to your own work.

Friday, November 23rd, 2018

FlickrJubilee (@FlickrJubilee) / Twitter

Flickr is removing anything over 1,000 photos on accounts that are not “pro” (paid for) in 2019. We highlight large and amazing accounts that could use a gift to go pro. We take nominations and track when these accounts are saved.

Home - Memory of Mankind

A time capsule for the long now. Laser-etched ceramic tablets in an Austrian salt mine carry memories of our civilisation in three categories: news editorials, scientific works, and personal stories.

You can contribute a personal story, your favorite poem, or newspaper articles which describe our problems, visions or our daily life.

Tokens that mark the location of the site are also being distributed across the planet.

Archiving web sites [LWN.net]

As it turns out, some sites are much harder to archive than others. This article goes through the process of archiving traditional web sites and shows how it falls short when confronted with the latest fashions in the single-page applications that are bloating the modern web.

Thursday, November 8th, 2018

The Commons: The Past Is 100% Part of Our Future | Flickr Blog

This is very, very good news. Following on from the recent announcement that a huge swathe of Flickr photos would soon be deleted, there’s now an update: any photos that are Creative Commons licensed won’t be deleted after all. Phew!

I wonder if I can get a refund for that pro account I just bought last week to keep my Creative Commons licensed Flickr pictures online.

Sunday, November 4th, 2018

Why we’re changing Flickr free accounts | Flickr Blog

I’ve got a lot of photos on Flickr (even though I don’t use it directly much these days) and I’ve paid up for a pro account to protect those photos, but I’m very worried about this:

Beginning January 8, 2019, Free accounts will be limited to 1,000 photos and videos.

That in itself is fine, but any existing non-pro accounts with more than 1000 photos will have older photos deleted until the total comes down to 1000. This means that anyone linking to those photos (or embedding them in blog posts or articles) will have broken links and images.

Tears in the rain.

Wednesday, January 3rd, 2018

The Significance of the Twitter Archive at the Library of Congress | Dan Cohen

It’s a shame that this archiving project is coming to end. We don’t always know the future value of the present:

Researchers have come to realize that the Proceedings of the Old Bailey, transcriptions from London’s central criminal court, are the only record we have of the spoken words of many people who lived centuries ago but were not in the educated or elite classes. That we have them talking about the theft of a pig rather than the thought of Aristotle only gives us greater insight into the lived experience of their time.

Tuesday, January 2nd, 2018

Wednesday, December 20th, 2017

Future Historians Probably Won’t Understand Our Internet - The Atlantic

You can’t log into the same Facebook twice.

The world as we experience it seems to be growing more opaque. More of life now takes place on digital platforms that are different for everyone, closed to inspection, and massively technically complex. What we don’t know now about our current experience will resound through time in historians of the future knowing less, too. Maybe this era will be a new dark age, as resistant to analysis then as it has become now.

Tuesday, April 25th, 2017

DRM for the Web is a Bad Idea | Internet Archive Blogs

The Encrypted Media Extensions (EME) addition to HTML is effectively DRM with the blessing of the W3C. It’s bad for accessibility, bad for usability, bad for security, and as the Internet Archive rightly points out, it’s bad for digital preservation.

Thursday, May 12th, 2016

Archiving Our Online Communities — Medium

Now this is how you shut down a service:

  • Maintain read-only URLs for at least ten years.
  • Create physical copies etched in metal held by cultural institutions for ten thousand years.
  • Allow users to export their data (of course).

Web projects often lack hard edges. They begin with clarity but end without. We want to close Hi.co with clarity. To properly bookend the website.

And nary a trace of “We are excited to announce…” or “Thank you for joining us on our incredible journey…”

(Such a shame that the actual shut-down notice is only on Ev’s blog, but hopefully Craig will write something on his own site too.)

Thursday, April 14th, 2016

The Internet Archive—Bricks and Mortar Version - Scientific American Blog Network

A profile of the Internet Archive, but this time focusing on its physical space.

The Archive is a third place unlike any other.

Sunday, May 10th, 2015

MoMA’s Digital Art Vault

Ben Fino-Radin describes how the MoMA’s archivematica “analyzes all digital collections materials as they arrive, and records the results in an obsolescence-proof text format that is packaged and stored with the materials themselves.”

Tuesday, March 17th, 2015

JavaScript and Archives | inkdroid

Thoughts on the long-term viability of sites that use JavaScript to render their content.