This is a better link to this same document by the same author, because it skips dev.to which keeps on showing me signup forms: https://briandouglas.ie/sqlite-defaults/
There’s surprisingly much pushback about this banning. Personally I don’t see why a user can’t just wait 70 days until they can submit any unseen domain they wish.
People who enjoy OP’s content can add his site to their RSS reader.
This is missing the single most important tip: ensure ALL write transactions use this:
BEGIN IMMEDIATE TRANSACTION;
Understanding why is a bit convoluted. SQLite defaults to starting all transactions as read transactions, and then upgrading to write transactions half way through if it sees an insert/update. But this plays havoc with SQLite locking, where if another transaction has grabbed a write lock since this one started you can get the dreaded “SQLITE_BUSY” error.
If you start an immediate transaction instead SQL will grab that lock at the start of the transaction, and any other transaction that wants a write lock will wait for you to finish without throwing SQLITE_BUSY.
+1.
The most confusing thing about this is that searching for the error (even in documentation) very rarely yields any results other than “oh just increase the busy_timeout, that fixes it”. It doesn’t, the dead giveaway being that the error is thrown immediately not after the timeout.
Frustratingly some libraries are lacking support for BEGIN IMMEDIATE, notably sqlx for rust.
SQLite defaults to starting all transactions as read transactions, and then upgrading to write transactions half way through if it sees an insert/update. But this plays havoc with SQLite locking, where if another transaction has grabbed a write lock since this one started you can get the dreaded “SQLITE_BUSY” error.
An other interesting things to do is split your database into multiple, well, databases. And if you attach databases using a URI they don’t have to have the same read/write status.
I believe SQLite defaults to allowing only up to 10 database files attached to a single connection, but that is probably something you can increase with a compile time setting.
The ATTACH statement is an SQLite extension that allows two or more databases to be associated to the same database connection and to operate as if they were a single database. The number of simultaneously attached databases is limited to SQLITE_MAX_ATTACHED which is set to 10 by default. The maximum number of attached databases cannot be increased above 125.
The maximum number of attached databases can be lowered at run-time using the sqlite3_limit(db,SQLITE_LIMIT_ATTACHED,size) interface.
Transactions can still be useful for doing reads with a consistent view of the database. For example imagine that you are a bank and want to build a report that includes total balance. If you don’t have a consistent view of the database you can over or undercount if transactions are happening while you create the report.
Note that some of these are per-connections settings, and some affect the on-disk database. It doesn’t hurt to issue all of these whenever you open a connection, but
incremental_vacuum has to be set before any pages are allocated, which basically means it should be the very first statement executed.
Enabling WAL should come second, because transitioning from journal to WAL is more expensive once there’s data on disk.
Some of those settings also depend on the available hardware and filesystem:
Putting temp data in memory may not be a good choice on a system with less RAM.
Mmap is a hard NO if the db is on a filesystem that might be unmounted, like an external drive or SD card. If the user abruptly unplugs the disk, the next page-fault of a mapped page will trigger a segfault and crash your process.
Last I heard, iOS still limits the amount of mapped memory a process can use. Trying to mmap a full 2GB might get your process killed. (You just need to mmap the max expected size of the db.)
It’s also a good practice to run PRAGMA OPTIMIZE periodically, like just before closing the db, or periodically when you can afford a few seconds.
Auto-vacuuming is only possible if the database stores some additional information that allows each database page to be traced backwards to its referrer. Therefore, auto-vacuuming must be turned on before any tables are created. It is not possible to enable or disable auto-vacuum after a table has been created.
[…]
The database connection can be changed between full and incremental autovacuum mode at any time. However, changing from “none” to “full” or “incremental” can only occur when the database is new (no tables have yet been created) or by running the VACUUM command. To change auto-vacuum modes, first use the auto_vacuum pragma to set the new desired mode, then invoke the VACUUM command to reorganize the entire database file.
I use SQLite heavily from multiple languages, bindings and have several wrappers and implementations.
It’s a good list but I would disagree with setting auto_vacuum. It’s not well documented but that has a lot of side effects, some of them quite bizarre and not related to actual vacuuming. For example it’s worth setting if you seek around in very large incremental blobs. However I think the overhead in general is not worth it.
temp_store memory is also risky. If you can limit the memory it would assist in detecting bad queries but you’re more likely to just crash your system. It’s probably okay to set it by default but I would be careful.
mmap_size always sounds good but it’s not usually helpful at all, and will cause annoyingly large memory use on some platforms that is very hard to justify and reason about. For example in Go, the mmap is invisible to the heap allocator, but your process will have a huge virtual address space. Every OS handles buffering and caching differently on this.
The page_size is a very interesting one. It can drastically affect performance depending on your use case, and infrastructure. For example on networked attached storage I saw performance improvements improve linearly with page size, even when I had tiny tiny rows. I think the default page size (4096) is good. 8192 or 16k to match file system cluster size or memory page size might be worth it. Don’t fiddle with it until everything else is stable and you are ready to experiment. Don’t lock it for every user, as I mentioned every set up will vary, what is good on your machine might not be on another.
It’s very much a problem on iOS where mmap pages that are no longer being used don’t get reclaimed and your app explodes with OOM. I assume that sort of thing is what they meant about “some platforms”.
I only have a Stack Overflow answer to hand unfortunately, but this tracks with my experience. See the bit about PROT_WRITE under “How does iOS mapped file support differ from other operating systems?”.
I’ve not yet used SQLite much as a programmer. But as a web app/container consumer, I use it a lot. If a web app/container (e.g. nextcloud) isn’t already setting some of these, would it be a terrible idea to set them behind its back?
Slight OT: is there guidance on how to deploy an application that uses sqlite. Do you need to rely on a smart load balancer to guarantee no-downtime deployments?
You can use systemd socket activation or s similar socket handoff technique. The idea is another process opens the sockets and hands them off to your service. When the service reloads it takes them back, then hands them to the next instance. There is a slight delay but no dropped connections.
One thing I realized while thinking about this is it’s only the write requests you need to worry about. If you have a load balancer you can route all the GET requests to a separate process. That process can safely do rolling restarts or staged rollouts.
Honestly if you want to guarantee no-downtime deployments, you will suffer a lot less just using postgres or whatever SQL flavor your provider likely offers, even in unmanaged form. You can get the experience of “my database is a single file to backup” by calling sqldump > backup.sql and taking that.
Sqlite is nice if you want low ops and don’t have that many durability requirements but if you’re at downtime prevention you’re going to have to do a bunch of work anyways.
@pushcx if a user is banned for self-promo shouldn’t their post be unpublished or something? What’s the point of keeping the self-promo post up when the user is banned for it?
This user repeatedly saw the explicit warning about starting with self-promo and switched from trying to link their blog to link to dev.to to get around it. They had zero other interaction on other stories. I’ll usually start with a warning or a temporary domain ban if either of those things isn’t true. Experience has taught me that the people who immediately, knowingly break an unambiguous rule to self-promo almost never become regular contributors. Fundamentally, they see the site as a tool to increase a metric rather than a community to participate in.
I didn’t remove this story because I try to do the smallest intervention possible. More than removing comments and stories, I DM people about stories or comments (this isn’t in the modlog because then it’d be public shaming). But I did ban this user’s sites because I don’t want to encourage marketers. I’ve long been an active entrepreneur and occasionally see marketers (eg. “DevRel”) in private communities discuss Lobsters as a source of customer traffic. Having a reputation as hostile to exploitation cuts down on how attractive the site is to them.
Relatedly, @calvin posted an issue in response to this story that has some more info on recent code changes I’ve made and plan. I’m still trying to find ways to mitigate content marketing with a lower false-positive rate, or a permeable community-based enforcement rather than a simple boolean.
No, that’s just my mistake, I misremembered. I’m sorry for the confusion.
To be safe, I double-checked the warning about self-promo. They hit it 25 minutes after they signed up when they tried to link their blog, and again 6 days later.
That’s all fine and I have no argument against banning the user according to Lobsters rules. But my point is that we shouldn’t have it both ways–if we are banning users for self-promo, then self-promo is bad, therefore self-promo posts should be taken down. Or at the very least tagged as self-promo (or marketing or something) and this tag should be by default hidden as a filtered tag for users.
My point is that the rules should be logically consistent. Otherwise it looks like Lobsters is ‘benefitting’ from self-promo posts (in the sense that it generates interesting discussion content) while at the same time punishing its submitters.
It’s consistent on a different axis. The goal is having good discussions like the one that’s happening here. The deterrent effect of removing it wouldn’t be worth disrupting that. People who relentlessly post self-promo without interacting not only don’t contribute to those discussions, their material is much less likely to spontaneously start good discussions.
That doesn’t make any sense. All the rules are being followed by everyone, with no carve outs. Starting good discussions isn’t the one and only goal, preventing marketing is also a goal. Banning this user discourages marketing, but deleting the story doesn’t act as a further deterrent, so there is no point.
If preventing marketing is also a goal, then why leave the marketing posts up? It then sounds like preventing marketing is not really a goal then, it’s just a convenient excuse to ban people?
In that case the user should be unbanned, no? It’s just fair play after all. Otherwise Lobsters gets to have its cake and eat it too–ban self-promo users but also get to keep the interesting conversations that their posts generate? Seems like double dealing.
For the record: I wasn’t making a comment on the banning, as I’m not overly familiar with the culture on banning / self-promotion here. I was mostly pointing out that the comment thread on this post has a lot of utility and I would hate to see it deleted. I’m not here to litigate social norms.
That’s fine, my reply is not necessarily for you but just using your comment as a springboard for discussion and rebuttal in case the mods think similarly.
I guess another question is, why are these sensible options not the default? For example, when does PRAGMA journal_mode = WAL; make more sense than PRAGMA journal_mode = DELETE; ?
SQLite has a legendary commitment to compatibility - files created twenty years ago can still be opened in the most recent SQLite version. This is the root of many of the surprising defaults.
WAL mode does make things more complicated because it results in three files instead of just one (at least while the database has an open connection), so I’m not surprised it’s an opt-in thing.
This is a better link to this same document by the same author, because it skips
dev.to
which keeps on showing me signup forms: https://briandouglas.ie/sqlite-defaults/The original article 404s now too… And the OP and author has been banned?!
The details on the banning are in the modlog.
There’s surprisingly much pushback about this banning. Personally I don’t see why a user can’t just wait 70 days until they can submit any unseen domain they wish.
People who enjoy OP’s content can add his site to their RSS reader.
This is missing the single most important tip: ensure ALL write transactions use this:
Understanding why is a bit convoluted. SQLite defaults to starting all transactions as read transactions, and then upgrading to write transactions half way through if it sees an insert/update. But this plays havoc with SQLite locking, where if another transaction has grabbed a write lock since this one started you can get the dreaded “SQLITE_BUSY” error.
If you start an immediate transaction instead SQL will grab that lock at the start of the transaction, and any other transaction that wants a write lock will wait for you to finish without throwing SQLITE_BUSY.
The best explanation I’ve seen of this problem is in here: https://fractaledmind.github.io/2024/04/15/sqlite-on-rails-the-how-and-why-of-optimal-performance/
+1. The most confusing thing about this is that searching for the error (even in documentation) very rarely yields any results other than “oh just increase the busy_timeout, that fixes it”. It doesn’t, the dead giveaway being that the error is thrown immediately not after the timeout. Frustratingly some libraries are lacking support for BEGIN IMMEDIATE, notably sqlx for rust.
Yeah, the lack of searchability is a big problem - I just raised that on the SQLite forum: https://sqlite.org/forum/forumpost/5ed1b0217e
Yeah, I gave up on sqlx on that basis. I just make direct use of seaquery now.
An other interesting things to do is split your database into multiple, well, databases. And if you attach databases using a URI they don’t have to have the same read/write status.
Oh, that is really interesting!
I believe SQLite defaults to allowing only up to 10 database files attached to a single connection, but that is probably something you can increase with a compile time setting.
Yes, see Maximum Number Of Attached Databases on the “Implementation Limits for SQLite” page of the docs:
I’m wondering why this is the default behaviour because from my limited experience I wouldn’t start a transaction unless I had write operations to do.
Transactions can still be useful for doing reads with a consistent view of the database. For example imagine that you are a bank and want to build a report that includes total balance. If you don’t have a consistent view of the database you can over or undercount if transactions are happening while you create the report.
a more detailed article on the same topic https://kerkour.com/sqlite-for-servers
Note that some of these are per-connections settings, and some affect the on-disk database. It doesn’t hurt to issue all of these whenever you open a connection, but
Some of those settings also depend on the available hardware and filesystem:
It’s also a good practice to run PRAGMA OPTIMIZE periodically, like just before closing the db, or periodically when you can afford a few seconds.
So does
incremental_vacuum
apply to the SQLite database file on disk and stay persistent across multiple future connections, same as WAL mode does?EDIT: Just found this https://www.sqlite.org/pragma.html#pragma_auto_vacuum
Yes. It’s not free, I don’t recommend it by default.
I use SQLite heavily from multiple languages, bindings and have several wrappers and implementations.
It’s a good list but I would disagree with setting auto_vacuum. It’s not well documented but that has a lot of side effects, some of them quite bizarre and not related to actual vacuuming. For example it’s worth setting if you seek around in very large incremental blobs. However I think the overhead in general is not worth it.
temp_store memory is also risky. If you can limit the memory it would assist in detecting bad queries but you’re more likely to just crash your system. It’s probably okay to set it by default but I would be careful.
mmap_size always sounds good but it’s not usually helpful at all, and will cause annoyingly large memory use on some platforms that is very hard to justify and reason about. For example in Go, the mmap is invisible to the heap allocator, but your process will have a huge virtual address space. Every OS handles buffering and caching differently on this.
The page_size is a very interesting one. It can drastically affect performance depending on your use case, and infrastructure. For example on networked attached storage I saw performance improvements improve linearly with page size, even when I had tiny tiny rows. I think the default page size (4096) is good. 8192 or 16k to match file system cluster size or memory page size might be worth it. Don’t fiddle with it until everything else is stable and you are ready to experiment. Don’t lock it for every user, as I mentioned every set up will vary, what is good on your machine might not be on another.
Virtual address space is not the same as memory use. That’s not a reason to discourage mmap.
It’s very much a problem on iOS where mmap pages that are no longer being used don’t get reclaimed and your app explodes with OOM. I assume that sort of thing is what they meant about “some platforms”.
Huh, interesting! Do you have a reference for that?
I only have a Stack Overflow answer to hand unfortunately, but this tracks with my experience. See the bit about PROT_WRITE under “How does iOS mapped file support differ from other operating systems?”.
I know that and my point stands. It’s a common misunderstanding that memory mapping and virtual address space are free real estate.
I always used WAL mode just out of a blind assumption that it was a good idea. I’m glad to finally find out it genuinely does help to set it.
I’ve not yet used SQLite much as a programmer. But as a web app/container consumer, I use it a lot. If a web app/container (e.g. nextcloud) isn’t already setting some of these, would it be a terrible idea to set them behind its back?
Slight OT: is there guidance on how to deploy an application that uses sqlite. Do you need to rely on a smart load balancer to guarantee no-downtime deployments?
You can use systemd socket activation or s similar socket handoff technique. The idea is another process opens the sockets and hands them off to your service. When the service reloads it takes them back, then hands them to the next instance. There is a slight delay but no dropped connections.
One thing I realized while thinking about this is it’s only the write requests you need to worry about. If you have a load balancer you can route all the GET requests to a separate process. That process can safely do rolling restarts or staged rollouts.
Honestly if you want to guarantee no-downtime deployments, you will suffer a lot less just using postgres or whatever SQL flavor your provider likely offers, even in unmanaged form. You can get the experience of “my database is a single file to backup” by calling
sqldump > backup.sql
and taking that.Sqlite is nice if you want low ops and don’t have that many durability requirements but if you’re at downtime prevention you’re going to have to do a bunch of work anyways.
@pushcx if a user is banned for self-promo shouldn’t their post be unpublished or something? What’s the point of keeping the self-promo post up when the user is banned for it?
This user repeatedly saw the explicit warning about starting with self-promo and switched from trying to link their blog to link to
dev.to
to get around it. They had zero other interaction on other stories. I’ll usually start with a warning or a temporary domain ban if either of those things isn’t true. Experience has taught me that the people who immediately, knowingly break an unambiguous rule to self-promo almost never become regular contributors. Fundamentally, they see the site as a tool to increase a metric rather than a community to participate in.I didn’t remove this story because I try to do the smallest intervention possible. More than removing comments and stories, I DM people about stories or comments (this isn’t in the modlog because then it’d be public shaming). But I did ban this user’s sites because I don’t want to encourage marketers. I’ve long been an active entrepreneur and occasionally see marketers (eg. “DevRel”) in private communities discuss Lobsters as a source of customer traffic. Having a reputation as hostile to exploitation cuts down on how attractive the site is to them.
Relatedly, @calvin posted an issue in response to this story that has some more info on recent code changes I’ve made and plan. I’m still trying to find ways to mitigate content marketing with a lower false-positive rate, or a permeable community-based enforcement rather than a simple boolean.
I see 24 comments on other stories over the past month that look normal to me, with no appreciable self-promotion. Am I misunderstanding your meaning?
No, that’s just my mistake, I misremembered. I’m sorry for the confusion.
To be safe, I double-checked the warning about self-promo. They hit it 25 minutes after they signed up when they tried to link their blog, and again 6 days later.
[Comment removed by author]
That’s all fine and I have no argument against banning the user according to Lobsters rules. But my point is that we shouldn’t have it both ways–if we are banning users for self-promo, then self-promo is bad, therefore self-promo posts should be taken down. Or at the very least tagged as
self-promo
(ormarketing
or something) and this tag should be by default hidden as a filtered tag for users.My point is that the rules should be logically consistent. Otherwise it looks like Lobsters is ‘benefitting’ from self-promo posts (in the sense that it generates interesting discussion content) while at the same time punishing its submitters.
It’s consistent on a different axis. The goal is having good discussions like the one that’s happening here. The deterrent effect of removing it wouldn’t be worth disrupting that. People who relentlessly post self-promo without interacting not only don’t contribute to those discussions, their material is much less likely to spontaneously start good discussions.
But if they do happen to start good discussions, they still stay banned? Seems kinda like a ‘rules for thee but not for me’ situation.
That doesn’t make any sense. All the rules are being followed by everyone, with no carve outs. Starting good discussions isn’t the one and only goal, preventing marketing is also a goal. Banning this user discourages marketing, but deleting the story doesn’t act as a further deterrent, so there is no point.
If preventing marketing is also a goal, then why leave the marketing posts up? It then sounds like preventing marketing is not really a goal then, it’s just a convenient excuse to ban people?
Please don’t remove this post - the thread is full of gold and I have it bookmarked for that reason.
In that case the user should be unbanned, no? It’s just fair play after all. Otherwise Lobsters gets to have its cake and eat it too–ban self-promo users but also get to keep the interesting conversations that their posts generate? Seems like double dealing.
For the record: I wasn’t making a comment on the banning, as I’m not overly familiar with the culture on banning / self-promotion here. I was mostly pointing out that the comment thread on this post has a lot of utility and I would hate to see it deleted. I’m not here to litigate social norms.
That’s fine, my reply is not necessarily for you but just using your comment as a springboard for discussion and rebuttal in case the mods think similarly.
I guess another question is, why are these sensible options not the default? For example, when does
PRAGMA journal_mode = WAL;
make more sense thanPRAGMA journal_mode = DELETE;
?SQLite has a legendary commitment to compatibility - files created twenty years ago can still be opened in the most recent SQLite version. This is the root of many of the surprising defaults.
WAL mode does make things more complicated because it results in three files instead of just one (at least while the database has an open connection), so I’m not surprised it’s an opt-in thing.
Thanks for posting these!