Madra Teanga - Open Source Irish Language Programming
An open source project that has already produced a great app for learning Irish—programmed in a language called Draíocht (sin “magic” as Béarla)!
I’m supporting this on Open Collective.
A different world is possible. Here, for example, is an open-source large language model from Europe, designed to support the 24 official languages of the European Union.
I have no idea why their top level domain is for the British Indian Ocean Territory, soon to be no more. That doesn’t instil confidence.
The Wikimedia Foundation, stewards of the finest projects on the web, have written about the hammering their servers are taking from the scraping bots that feed large language models.
Our infrastructure is built to sustain sudden traffic spikes from humans during high-interest events, but the amount of traffic generated by scraper bots is unprecedented and presents growing risks and costs.
Drew DeVault puts it more bluntly, saying Please stop externalizing your costs directly into my face:
Over the past few months, instead of working on our priorities at SourceHut, I have spent anywhere from 20-100% of my time in any given week mitigating hyper-aggressive LLM crawlers at scale.
And no, a robots.txt file doesn’t help.
If you think these crawlers respect robots.txt then you are several assumptions of good faith removed from reality. These bots crawl everything they can find, robots.txt be damned.
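To see why robots.txt offers no protection, it helps to remember that it is only a request: the crawler itself has to choose to read the file and obey it. Here is a minimal sketch of what a well-behaved crawler does, using Python's standard `urllib.robotparser`. The bot name `ExampleBot` and the `example.org` URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that asks one bot to stay away entirely.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A polite crawler runs this check before every request.
# Nothing on the server enforces the answer.
bot_ok = parser.can_fetch("ExampleBot", "https://example.org/tunes/1")
browser_ok = parser.can_fetch("SomeBrowser", "https://example.org/tunes/1")

print(bot_ok, browser_ok)
```

A crawler that never makes the `can_fetch` call, or ignores its result, gets the page anyway. That is the whole problem.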
Free and open source projects are particularly vulnerable. FOSS infrastructure is under attack by AI companies:
LLM scrapers are taking down FOSS projects’ infrastructure, and it’s getting worse.
You try to do the right thing by making knowledge and tools freely available. This is how you get repaid. AI bots are destroying Open Access:
There’s a war going on on the Internet. AI companies with billions to burn are hard at work destroying the websites of libraries, archives, non-profit organizations, and scholarly publishers, anyone who is working to make quality information universally available on the internet.
My own experience with The Session bears this out.
Ars Technica has a piece on this: Open source devs say AI crawlers dominate traffic, forcing blocks on entire countries.
So does MIT Technology Review: AI crawler wars threaten to make the web more closed for everyone.
When we talk about the unfair practices and harm done by training large language models, we usually talk about it in the past tense: how they were trained on other people’s creative work without permission. But this is an ongoing problem that’s just getting worse.
The worst of the internet is continuously attacking the best of the internet. This is a distributed denial of service attack on the good parts of the World Wide Web.
If you’re using the products powered by these attacks, you’re part of the problem. Don’t pretend it’s cute to ask ChatGPT for something. Don’t pretend it’s somehow being technologically open-minded to continuously search for nails to hit with the latest “AI” hammers.
If you’re going to use generative tools powered by large language models, don’t pretend you don’t know how your sausage is made.
As it currently stands, both the rapid growth of AI-generated content overwhelming online spaces and aggressive web-crawling practices by AI firms threaten the sustainability of essential online resources. The current approach taken by some large AI companies—extracting vast amounts of data from open-source projects without clear consent or compensation—risks severely damaging the very digital ecosystem on which these AI models depend.
More on how large language model bots are DDOSing the web:
LLM scrapers are taking down FOSS projects’ infrastructure, and it’s getting worse.
This is excellent! A free web book (it’s a book! it’s a website!) that teaches you how to make a website from scratch:
I feel strongly that anyone should be able to make a website with HTML if they want. This book will teach you how to do just that. It doesn’t require any previous experience making websites or coding. I will cover everything you need to know to get started in an approachable and friendly way.
👏
This project, based on OpenStreetMap, looks great:
OpenFreeMap lets you display custom maps on your website and apps for free.
You can either self-host or use our public instance.
I’m going to try it out on The Session once there’s documentation for using this with Leaflet.
David is on board. Who else?
I was having a discussion with some of my peers a little while back. We were collectively commenting on the state of education and documentation for front-end development.
A lot of the old stalwarts have fallen by the wayside of late. CSS-Tricks hasn’t been the same since it got bought out by DigitalOcean. A List Apart goes through fallow periods. Even the Mozilla Developer Network is looking to squander its trust by adding inaccurate “content” generated by a large language model.
The most obvious solution is to start up a brand new resource for front-end developers. But there are two problems with that:
I actually think there are plenty of good articles and resources on front-end development being published. But they’re not being published in any one specific place. People are publishing them on their own websites.
Ahmed, Josh, Stephanie, Andy, Lea, Rachel, Robin, Michelle… I could go on, but you get the picture.
All this wonderful stuff is distributed across the web. If you have a well-stocked RSS reader, you’re all set. But if you’re new to front-end development, how do you know where to find this stuff? I don’t think you can rely on search, unless you have a taste for slop.
I think the solution lies not with some hand-wavey “AI” algorithm that burns a forest for every query. I think the solution lies with human curation.
I take inspiration from Phil’s fantastic project, ooh.directory. Imagine taking that idea of categorisation and applying it to front-end dev resources.
Whether it’s a post on web.dev, Smashing Magazine, or someone’s personal site, it could be included and categorised appropriately.
Now, there would still be a lot of work involved, especially in listing and categorising the articles that are already out there, but it wouldn’t be nearly as much work as trying to create those articles from scratch.
I don’t know what the categories should be. Does it make sense to have top-level categories for HTML, CSS, and JavaScript, with sub-directories within them? Or does it make more sense to categorise by topics like accessibility, animation, and so on?
And this being the web, there’s no reason why one article couldn’t be tagged to simultaneously live in multiple categories.
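To make the tagging idea concrete, here is a rough sketch in Python of how one article could sit in several categories at once. Everything in it is hypothetical: the `Article` shape, the tag names, and the `example.com` URLs are placeholders, not a proposal for the real schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Article:
    title: str
    url: str
    tags: set = field(default_factory=set)  # one article, many categories


def build_index(articles):
    """Map each tag to the list of articles that carry it."""
    index = defaultdict(list)
    for article in articles:
        for tag in sorted(article.tags):
            index[tag].append(article)
    return index


# Made-up entries, purely for illustration.
articles = [
    Article("Accessible animations", "https://example.com/a11y-motion",
            {"css", "accessibility", "animation"}),
    Article("Grid basics", "https://example.com/grid", {"css"}),
]

index = build_index(articles)
css_titles = [a.title for a in index["css"]]
a11y_titles = [a.title for a in index["accessibility"]]
print(css_titles, a11y_titles)
```

With flat tags like this, the question of whether HTML, CSS, and JavaScript sit at the top level or topics do becomes a matter of presentation. The same index can be browsed either way.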
There’s plenty of meaty information architecture work to be done. And there’d be no shortage of ongoing work to handle new submissions.
A stretch goal could be the creation of “playlists” of hand-picked articles. “Want to get started with CSS grid layout? Read that article over there, watch this YouTube video, and study this page on MDN.”
What do you think? Does this one-stop shop of hyperlinks sound like it would be useful? Does it sound feasible?
I’m just throwing this out there. I’d love it if someone were to run with it.
I like this framing:
If you’ve ever corrected a typo in an Open Source readme, or added alt-text to an image, or tidied up some broken references in Wikipedia - you’re doing Digital Litter Picking. You’re cleaning up after others. And I think that’s a marvellous way to spend a little time.
This is insightful:
AI and automation is often promoted as a way of handling complexity. But handling complexity isn’t the same as reducing it.
In fact, by getting better at handling complexity we’re increasing our tolerance for it. And if we become more tolerant of it we’re likely to see it grow, not shrink.
From that perspective, large language models are over-engineered bandaids. They might appear helpful at the surface-level but they’re never going to help tackle the underlying root causes.
As a self-initiated learner, being able to view source brought to mind the experience of a slow walk through someone else’s map.
This ability to “observe” software makes HTML special to work with.
Really good advice from Maggie on running small community events:
No one else will organise the group you most want to be a part of. Whatever weird, specific things you enjoy – perhaps doing speed sudokus while smoking robusto cigars, or hosting a chemistry analysis session on sourdough bread techniques (I’m not judging either of these) – it’s worth trying to find the others. You are the most qualified person to create environments and experiences that you will personally enjoy, and in doing so you will attract people who like things that you also like. This is a decent way to make friends.
I was chatting to Andy last week and he started ranting about the future of online documentation for web developers. “Write a blog post!” I said. So he did.
I think he’s right. We need a Wikimedia model for web docs. I’m not sure if MDN fits the bill anymore now that they’re deliberately spewing hallucinations back at web developers.
Building a website can seem difficult, but half the battle is just getting started! We wanted to put this guide together as an easy compilation of tutorials and places to learn exactly what you need to get started.
This is a really useful guide for beginners!
We hope this guide helps make everything feel more accessible to you, because it is! The internet belongs to all of us, so be sure to stake your claim in it.
This is a wonderfully in-depth interactive explainer on touch target sizes, with plenty of examples.
This is a terrific interactive explainer!
This online course from Sara looks superb!
I know how overwhelming and even frustrating accessibility may feel at first. But I promise you, accessibility isn’t always as hard as it seems (especially if you know where and when to start!). And my goal with this course is to make it friendlier and more approachable.
Best of all, there’s $100 off if you sign up now—that’s a 25% saving.