How I ran one Ruby app on three SQL databases for six months
Since June 2023, I’ve been running a service written in Ruby (Sinatra) that provides several Bluesky custom feeds (initially built with a feed for the iOS/Mac developers community in mind, later expanded to many other feeds). If you don’t know much about Bluesky feeds: you make one by running a server which somehow collects and picks existing posts from Bluesky using some kind of algorithm (chronological or by popularity, based on keyword matching, personal likes, whatever you want), and then exposes a specific API endpoint. The Bluesky AppView (API server) then calls your service, passing some request parameters, and your service responds with a list of URIs of posts (which the API server then turns into full post JSON and returns to the client app). This lets you share such a feed with anyone on the platform, so they can add it to their app and use it like any built-in feed. (If you’re interested, check out my example feed service project.)
To provide such a service, in practice you need to connect to the Bluesky “firehose” streaming API, which sends you all posts made by anyone on the network, and then save either those which are needed for your algorithm, or save all of them and filter later. I chose the latter, since that lets me retry the matching at any time after I modify the keyword lists and see what would be added after that change (and also some of the feeds I now run require having all posts). I also use the same database/service to generate e.g. the total daily/weekly stats here.
All posts made on Bluesky are of course far fewer than all posts on Twitter, but it’s still a lot of posts. At the moment (October 2025), around 3.5M posts are made on an average day; at the last “all time high” in November 2024, it was around 7.5M per day. A post is up to 300 characters of (Unicode) text, but since I also store the other metadata that’s in the record JSON – like timestamp, reply/quote references, embeds like images and link cards, language tags etc. – it adds up to a bit less than 1 KB of storage per post on average.
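Using those round numbers, the back-of-the-envelope storage math works out like this:

```ruby
posts_per_day  = 3_500_000     # average posts per day (October 2025)
bytes_per_post = 1_000         # a bit less than 1 KB per post, incl. metadata

gb_per_day = posts_per_day * bytes_per_post / 1_000_000_000.0
puts gb_per_day   # → 3.5
```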
In addition to that, the firehose stream (if you use the original CBOR stream from a relay, not Jetstream, which is a JSON-serving proxy) includes a lot of overhead data that you don’t need in a service like that, plus all the other types of events like handle changes, likes, follows, blocks, reposts, and so on. The total input traffic is around 15 Mbit/s average right now in October 2025 (or around 5 TB per month), and it used to be around twice that for a moment last year. (Jetstream sends around an order of magnitude less, especially if you ask it to send filtered data, e.g. only the posts.)
On disk, the millions of posts per day add up to a few gigabytes per day. Since I was running this on a VPS with a 256 GB disk (Netcup, RS 1000 – reflink), I have a cron job set up to regularly prune older posts and keep only e.g. the last 40 days’ worth (since I don’t really need to keep the older posts forever) – so around 200-ish gigabytes total, and around 200 million rows in the posts table.
And until March this year, I was keeping all this data in… SQLite 🫠
Chapter 1: You’re probably wondering how I ended up in this situation
I don’t think I had ever used SQLite in a web app before this – I had always used MySQL, since that was commonly used first in PHP and then in Ruby webapps (Rails/Sinatra w/ ActiveRecord). I was used to it, I knew more or less how it worked, and I always thought SQLite was only meant for embedded scenarios like native desktop/mobile apps. But the example feed-generator project in JS shared by Bluesky used SQLite, and since I started out by porting that to Ruby, I ended up also using SQLite, planning to switch to something else later. But you know how it is, that “later” never comes – and if it ain’t broke, don’t fix it. SQLite worked surprisingly well for much, much longer than I expected, with much more data than I expected, and it turns out it can be absolutely fine for server/webapp database purposes, at least in some scenarios. But eventually I started hitting some limitations.
The main problem is that SQLite doesn’t allow concurrent write access. Many processes can read posts or other data simultaneously, but only one process at a time can write to the database. This would be a serious problem in most webapps, but it worked fine in this particular architecture. You see, there’s really one entry point to the system where the posts are saved: the firehose stream consumer process. All posts come from there, and nothing else saves posts – the Sinatra API server only makes queries to what was already saved from the firehose. This is why this has worked for me for as long as it did.
However… at some point I added a second parallel thread, which separately reads data from the plc.directory and saves data about some accounts' handles and assigned PDS servers. There are also cron jobs running scripts and Rake tasks, which sometimes modify data and sometimes take a bit to run (like that older posts cleaner), and I sometimes run Rake tasks manually to e.g. rebuild feeds.
And this is where I started running into a second related problem: how this concurrent writing is/was handled in ActiveRecord. I don’t know if I can explain this all correctly (see this GitHub issue and this StackOverflow comment), but the gist is, ActiveRecord used some mode of data locking in SQLite, which resulted in a flow like this:
- A transaction is started which locks the data for reading.
- Some records are loaded and turned into AR models (through a `where`/`find_by` call etc.).
- Some updates are made to the model objects.
- When I call `save` on the model, AR tries to change the lock mode to one that would allow it to also make a write.
- But something else has made other writes in the meantime.
- Because of how the transaction/locks were set up, SQLite decides that the data I’m trying to save might have been modified in the meantime, and aborts the whole transaction and throws an error (`SQLite3::BusyException: database is locked`).
Changing timeout settings doesn’t fix the problem, because the exception is thrown immediately, without waiting for the other process to finish. What worked around the problem was a mix of:
- trying to avoid doing multiple writing operations in parallel at all
- rearranging the code a bit artificially so it opens a transaction and then first does some kind of write, even a completely pointless one, before doing the reads, which makes it create the right kind of lock from the beginning:
```ruby
def rescan_feed_items(feed)
  ActiveRecord::Base.transaction do
    # lulz
    FeedPost.where(feed_id: -1).update_all(feed_id: -1)

    feed_posts = FeedPost.where(feed_id: feed.feed_id).includes(:post).order('time')

    feed_posts.each do |fp|
      ...
    end
  end
end
```
or so that it does the write without touching the original model, e.g.:
```ruby
# instead of:
@post.update(thread_id: root.id)

# do:
Post.where(id: @post.id).update_all(thread_id: root.id)
```
These tricks mostly let me avoid the problem for a long time, but it kept popping back up occasionally, and I sometimes had to write code in a way that didn’t logically make sense, only to avoid the exceptions. In particular, with the plc.directory thread, I had to make it pass any changes back to the main thread through a queue so they could be saved to the database from there.
(Ironically, the ActiveRecord folks finally fixed this whole problem in the 8.0 release, just as I finished migrating away from SQLite…)
A second problem was that I started having performance issues that I couldn’t find a good solution for – e.g. post write operations occasionally and seemingly randomly took 1–2 seconds to finish instead of milliseconds; and I wanted to optimize the app to be able to save as many posts per second as possible, to prepare for larger traffic in the future.
When I was asking for advice, everyone was telling me “dude, just switch to Postgres” 😛 But I hadn’t really worked with Postgres before, other than briefly, and I knew some things were different there, and I wasn’t sure if I wanted to switch to something unknown rather than stick with what I knew (MySQL).
And since I have way too much time, no life, and probably a good bit of neurodivergence, I chose the most obvious solution: set up the app on both MySQL and on Postgres on two separate & identical VPSes, and compare how it works in both versions… 🫣
Chapter 2: Getting it to run
Turns out, SQL databases are a bit different from each other in a lot of aspects, and migrating a webapp from one to another is a bit more work than just editing the `Gemfile` and `database.yml` – who would’ve guessed…
Beyond the obvious, here are some things I had to change on the MySQL branch:
Some column type changes:

- integer column sizes – in SQLite, all numbers are just integers of any size, so here I changed some to `smallint` and some to `bigint`
- similarly, some `string` columns were changed to `text`
- for `datetime` columns, I’ve set the decimal precision explicitly to 6 digits (this is now the default since AR 7.0, I started on 6.x for some reason)
In queries:

- I removed some index hacks like `.where("+thread_id IS NULL")`, which tell the SQLite query optimizer to use/not use a given index
- some date operations had to be rewritten to use different functions, e.g. `DATETIME('now', '-7 days')` to `SUBDATE(CURRENT_TIMESTAMP, 7)`
- some queries had to be rewritten or updated because they were just throwing SQL syntax errors – e.g. I had to explicitly prefix some fields in `SELECT` with table names; or there was this thing in MySQL where I had to nest a subquery for `DELETE` one level deeper
- `ActiveRecord::Base.connection.execute` returns rows as arrays instead of hashes for some reason, indexed by `[0]`, not by `['field']`
For Postgres, in addition to most of the above:

- I had to replace `0`/`1` used as false/true in boolean columns with an explicit `FALSE`/`TRUE`
- date functions in queries were slightly different again, e.g. `DATE_SUBTRACT(CURRENT_TIMESTAMP, INTERVAL '7 days')`
- strings in queries had to be changed to all be in single quotes, not in double quotes, which in Pg are reserved for field names like `"post_id"` (what SQLite and MySQL use the backticks for: `` `post_id` ``)
I could also finally remove all the code hacks added to work around the SQLite concurrency issues – start normally saving handles in the PLC importer thread, rewrite some transaction blocks to a more logical form, and so on.
Chapter 3: Migrating the data
For migration to MySQL, I used the sqlite3mysql tool, written in Python:
```shell
pip install sqlite3-to-mysql

sqlite3mysql -K -E -f bluesky.sqlite3 -d bluefeeds_production -u kuba -i DEFAULT -t feed_posts handles post_stats subscriptions unknown_records ...
```
For the posts table (which is the vast majority of the database size), I used the `-c` (`--chunk`) option to import posts in batches:

```shell
sqlite3mysql -K -E -f bluesky.sqlite3 -d bluefeeds_production -u kuba -i DEFAULT -t posts -c 1000000
```
For Postgres, I used pgloader. Unlike sqlite3mysql, it isn’t configured through command-line flags; instead, you need to write “command” files in a special DSL and pass the filename as an argument. So my command files looked something like this:
```
load database
  from sqlite://./db/bluesky.sqlite3
  into postgresql:///bluefeeds_production
  with data only, truncate
  including only table names like 'feed_posts', 'handles', 'post_stats', 'subscriptions', 'unknown_records';
```
I’ve split the tables into several command files, because the tool generally wasn’t giving me much info when anything failed, beyond just the number of failed rows per table, so I wanted to go through it more slowly, step by step.
For the posts table, I’ve similarly set a “prefetch rows” flag to do it in batches:
```
load database
  from sqlite://./db/bluesky.sqlite3
  into postgresql:///bluefeeds_production
  with data only, truncate, prefetch rows = 10000
  including only table names like 'posts';
```
One surprise realization I had during the import: in SQLite, when you define a column as a `string` with `limit: 50`, it doesn’t actually enforce that limit! Apparently I had a whole bunch of records in some tables where the values were much longer than the expected max length… because I was missing Ruby-side AR validations (`validates_length_of`) in some models – and those records were being rejected by both new databases. So I had to add all those missing length validations and clean up the invalid data first.
An additional problem cropped up in MySQL, which handles text collation differently than SQLite & Postgres with regard to accents and Unicode normalization. I have a “hashtags” table listing all hashtags that appeared anywhere in the posts, with a unique index on the hashtag name – but the import to MySQL was failing, because MySQL considered some hashtags to have the same text as others that SQLite had treated as different… I tried to pick a different collation for the table (`utf8mb4_0900_as_cs`, i.e. both accent-sensitive and case-sensitive), but that only helped with some name pairs (“e” vs. “é”), not with others (different normalization, or invisible control characters – and there are *countless* different types of those, as I have learned…). I eventually gave up and ended up just dropping the unique index for now.
In Postgres, in turn, the problem was that it apparently doesn’t support strings that contain null bytes, and there are occasional posts that somehow end up with a `\0` in the post text… So I had to just filter out such posts as invalid:

```ruby
if text.include?("\u0000")
  return
end
```
Finally, something that somehow caused an issue in both versions was a thing that every programmer loves – timezones… When a query counting posts added in the last 5 minutes always returned 0, but one counting the “last 65 minutes” didn’t, I immediately knew what was going on 🫠
In Postgres, the solution was to tell ActiveRecord to use the `timestamptz` data type for timestamp columns instead of the default `timestamp`:

```ruby
ActiveRecord::ConnectionAdapters::PostgreSQLAdapter.datetime_type = :timestamptz
```
(and then migrate the existing columns to use that type).
In MySQL, you need to either tell AR to use the timezone the database is using, if it’s not set to UTC:

```ruby
ActiveRecord.default_timezone = :local
```

Or tell the database to use UTC:

```ini
[mysqld]
default-time-zone = '+00:00'
```
(but not both…)
Chapter 4: Optimizing
To be able to test both databases with real production traffic in a way that would let me compare them in a fair competition, I’ve set up a kind of “database A/B test” 🙃
The feed was configured to load data from my original SQLite server as before, but in the request handler on that server, instead of calling the feed class locally, the code picked one of the two other servers based on either the “DID” (user ID) of the caller or current timestamp, proxied the call to that server, took the response, and returned it back to the caller:
```ruby
get '/xrpc/app.bsky.feed.getFeedSkeleton' do
  feed_key = params[:feed].split('/').last

  if ['hashtag', 'follows-replies'].include?(feed_key)
    server = Time.now.hour.odd? ? SERVER_A : SERVER_B
  elsif ['replies'].include?(feed_key)
    server = Time.now.hour.even? ? SERVER_A : SERVER_B
  else
    did = parse_did_from_token
    server = did.nil? || did =~ /[a-m0-4]$/ ? SERVER_A : SERVER_B
  end

  url = "https://#{server}/xrpc/app.bsky.feed.getFeedSkeleton?" + request.query_string
  headers = env['HTTP_AUTHORIZATION'] ? { 'Authorization' => env['HTTP_AUTHORIZATION'] } : {}
  response = Net::HTTP.get_response(URI(url), headers)

  content_type :json
  [response.code.to_i, response.body]
end
```
This meant that each feed load took a bit longer, but it wasn’t very noticeable in practice, and I had the real traffic split into two more or less equal parts, going to the two servers. (Coincidentally, it’s exactly one year today since I deployed that change – I’ve been procrastinating way too long on this blog post 🫠)
And then started my months-long work of optimizing the databases and queries…
Some problems showed up immediately: a query that returned data instantly in SQLite would just hang in MySQL or Postgres. In those cases, I often had to add some missing index, modify an index (e.g. add an additional field, or switch a field in a composite index to `DESC`), rearrange a query to make it use the intended index, add some limiting condition, or occasionally (in MySQL) add `FORCE INDEX`. Those issues were generally fairly easy to fix, and the fix either clearly worked or it didn’t. I think some queries I had were just logically not fully thought through, but they had been working fine before because SQLite has some things organized differently on disk and some access patterns work better, hiding the issue with the query.
The bigger problem, and one I’ve spent a ton of time on (in the Postgres version), was one specific query in a set of “replies feeds”. I wrote up most of it on my Journal blog on micro.blog back in January, and I remembered much more about it then than I do now, so I’ll just link to that old blog post here.
The TLDR is that I have a set of three feeds: Follows & Replies, Only Replies and Only Posts, which share the same code, just with slightly different filters; these are personalized feeds (i.e. having different content depending on who’s loading them), which for a given user fetch the list of the accounts that user is following, and then make a query asking the database for “most recent N posts from any of the users that this account follows” – so basically a reimplementation of the standard “Following” feed, with some changes.
This query was much slower on the Postgres server than on the other two databases, and Postgres insisted on using the posts index on `(time)` (scanning possibly millions of rows to find the right ones) instead of using the one on `(user, time)` some number of times and merging the results (and apparently asking “how to do FORCE INDEX in Postgres” is a terrible heresy 😛). I spent a lot of time on this, and it took a lot of trial and error: asking more experienced people on Bluesky, reading docs and articles, chatting with ChatGPT, and so on. I went through bumping up the `STATISTICS` target and/or hardcoding `n_distinct` (and rolling those back), tweaking some configuration variables, and rearranging the query/index in various ways – and what I finally settled on was:
- changing the `(user, time)` index to also include the `id` primary key as the third field, i.e. `(user, time DESC, id)`, to let it do “Index Only Scans” on it – I hadn’t realized that indexes in Postgres don’t reference the primary key, so they can’t be used this way unless the `id` is explicitly included there!
- and setting up a cron job to do very frequent manual `VACUUM` (4× a day, with forced `index_cleanup`), because otherwise it has to re-check some of the ids fetched from the index to verify that the rows haven’t been deleted
After those changes, it finally started working really nicely on Postgres, with the mean response time from this query going below 20 ms, while the MySQL version was doing around 50 ms, and the initial Postgres version before the index changes had slowed down to as much as 200–300 ms mean time. (It still occasionally picks the wrong index for accounts that are somewhere in the middle of the follows range, over 1000 followed accounts – for those with many thousands, the `(time)` index is almost always better – but I think that’s somewhat unavoidable.)
Epilogue: There can be only one
In the end, after all the tweaks and optimizations, both servers on both databases were working quite fine, and I think I would probably be ok with either of them. But I had to pick one.
I ended up picking…
…
…
Postgres! 🏆
In those few months, I managed to read sooo many pages of the documentation and articles about various specific settings, and spent so much time in the `psql` console reading output from the analyzer, that I got much more comfortable with it than I was at the beginning… So ironically, the fact that I had to spend more time tweaking it to get everything working smoothly made me prefer it in the end. Specifically because there were so many different dials and switches, I felt more “in control” with Postgres than with MySQL – the tutorials for tuning Postgres mentioned at least 5–10 different settings, and the ones for MySQL basically said “ah, just set `innodb_buffer_pool_size` to half the RAM and you’re done”. And if you’ve already set that and you’d like to optimize things further? Well… ¯\_(ツ)_/¯
Postgres’s query analyzer output is also more readable and helps you more with figuring out how it’s actually handling the query, and generally the various debugging commands seem to provide more readable info about the current parameters of the system – while MySQL mostly has the `SHOW ENGINE INNODB STATUS` command, which just pukes several pages of text output at you.
I’ve been doing various manual benchmarks of how different parts of the system work on both versions – average response times, query times and so on – and keeping the results in tables, but I can’t really draw any general conclusion from them about which database is better at what. It was often things like: one does single-row deletes faster and the other does multi-row deletes faster, or one is faster at inserts and the other at counts… but generally it was changing a bit too much over time, and I wasn’t doing it in a super scientifically controlled way.
One thing that I could see on Munin charts in the end was that the Postgres server had higher numbers on the disk read IO/throughput charts, while the MySQL server had noticeably higher numbers on the disk write IO/throughput charts. Not sure if this is a good assumption, but my guess was that with higher traffic in the future, it would generally be easier to scale the read load in Postgres (with various caches, replicas etc.) than to scale the write load in MySQL. Also, in the end after all the optimizations, the key query in the “replies” feeds was working noticeably better in the Postgres version, and in the test “how quickly it can possibly process events when catching up at max speed” (where post record inserts are generally the bottleneck), the Postgres version also ended up with a slightly higher processing speed.
So since March, the app has been running on a new 512 GB VPS on a Postgres database, and it’s been working fine since then.