MacKuba

🍎 Kuba Suder's blog on Mac & iOS development

Introduction to AT Protocol

Categories: Social

Some time ago I wrote a long blog post I called “Complete guide to Bluesky”, which explains how all the user-facing features of Bluesky work, along with various tips and tricks. This one is meant to be a bit like a developer version of that – I want to explain in hopefully understandable language what all the pieces of the network architecture are and how they all fit together. I hope this will let you understand better how Bluesky and the underlying protocol work, and how they differ from e.g. the Fediverse. This should also be a good starting point if you want to start building some apps or tools on ATProto.

This post is the first part of a series – next I want to look at some comparisons with the Fediverse and some common misconceptions that people have, and look at the state of decentralization of this network, but that was way too much for one post; so this one focuses on the “ATProto intro tutorial” part.

But before we start, a little philosophical aside:

What is “Bluesky”? Which “Bluesky” are we talking about?

Discussions about Bluesky sometimes get a little confusing because… “Bluesky” could mean a few different things. Language is hard.

First, we have Bluesky the company, the team. Usually, when people want to clarify that they’re talking about the group of people or the organization, they say “Bluesky PBC” (PBC = Public Benefit Corporation), or “Bluesky team”.

(If you want to read a bit about where Bluesky came from and what’s the current state of the company, read these two sections in the Bluesky Guide blog post.)

And we also have Bluesky the product, the social network, the thing that they’ve built. This network is not a single black box like Twitter or Facebook are (despite what they say about it on Mastodon), it’s more like a set of separate and actually very transparent boxes.

The system they’ve built, of which Bluesky was initially meant to be just a tech demo, is called the Authenticated Transfer Protocol, or AT Protocol, or ATProto. Bluesky is built on ATProto, and it is in practice a huge part of what ATProto currently is, which makes the boundary between Bluesky and non-Bluesky a bit hard to define at times, but it’s still only a subset of it.

Bluesky in this second meaning is some nebulous thing that consists of: the data types (“lexicons”) that are specific to the Bluesky microblogging aspect of ATProto, like Bluesky posts or follows; the APIs for handling them and for accessing other Bluesky-specific features; the rules according to which they all work together; and the whole “social layer” that is created out of all of this, the virtual “place” – the thing that people have in mind when they say “this website”, even when it’s accessed through a mobile app. One of the coolest things about Bluesky & ATProto, in my opinion, is that it connects many different independent pieces into something that still feels like one shared virtual space.

People outside the company can create (and are creating) other such things on ATProto that aren’t necessarily Bluesky-related – see e.g. WhiteWind or Leaflet (blogging platforms), Tangled (GitHub alternative), Frontpage (Hacker News style link aggregator), or Grain (photo sharing site). They use the same underlying mechanisms that are at the base of ATProto, but use separate data types, have different rules, goals, and UIs. What do we call these things as a whole, the different sets of “data types + rules + required servers + client apps” that define different use cases of the network?

The Bluesky team usually calls them “apps”, but I’m not a big fan of this term, because “app” kinda implies a client app, and that’s just one small piece of it. I sometimes call them “services” – though it’s probably not perfect either, since it implies just the server part in turn. Suggestions welcome :) (I’m mentioning this at the beginning, because this is something that many different parts are related to.)

Personally, when I say “the Bluesky app”, I will generally mean the actual client app (mobile / webapp), not the “service”, and when I say “Bluesky-specific”, I will mean the “service”, not the company; and “Bluesky-hosted” will mean run by Bluesky the company. Hopefully in most cases, it can be guessed from context.

BTW, the commonly accepted term for the whole shared “multiverse” of all ATProto apps is “The Atmosphere”, or “ATmosphere” (though I much prefer the former personally, the weird capitalization bugs me somehow ;). It was coined by someone from the community, but was accepted by the team and is now mentioned on the official atproto site.


Let’s start with defining the various building pieces of the protocol:

Records & blobs

The most basic piece of the ATProto world is a record. Records are basically JSON objects representing the data about a specific entity like a post or profile, organized in a specific way. A post/reply, repost, like, follow, block, list, entry on a list, user profile info – each of these is one record. Most public actions you take on Bluesky, like following someone or liking a post, are performed by creating a record of an appropriate type (or editing/deleting one created before).

For example, this is a post record. This is one of the likes of that post.

Records are stored on disk and transferred between servers in a binary format called CBOR, although in most API endpoints they’re returned in a JSON form (they are equivalent, just different encodings of the same data).
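To make this a bit more concrete, here's a rough Python sketch of what a post record and a like record pointing at it could look like in their JSON form. The field names follow the public app.bsky.feed.post and app.bsky.feed.like lexicons as I understand them; the DID, rkey and CID values are made-up placeholders, and make_post / make_like are just helper names I picked for the example.

```python
import json
from datetime import datetime, timezone


def now_iso() -> str:
    # Records carry a client-supplied ISO 8601 timestamp.
    return datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")


def make_post(text: str) -> dict:
    # "$type" names the lexicon (record type) this record belongs to.
    return {"$type": "app.bsky.feed.post", "text": text, "createdAt": now_iso()}


def make_like(post_uri: str, post_cid: str) -> dict:
    # A like is a separate record in the liking user's repo;
    # it only references the liked post, it doesn't modify it.
    return {
        "$type": "app.bsky.feed.like",
        "subject": {"uri": post_uri, "cid": post_cid},
        "createdAt": now_iso(),
    }


post = make_post("Hello Atmosphere!")
like = make_like(
    "at://did:plc:example1234/app.bsky.feed.post/3kexamplerkey",  # placeholder URI
    "bafyreiexamplecid",  # placeholder CID
)
print(json.dumps(like, indent=2))
```

Note that the like doesn't live "inside" the post in any way: it's a standalone record that only points at the post.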

The key thing about records, which has very real consequences for user-facing features, is that you can only create and modify your own records, not those owned by others (and there are no “shared” records at the moment, each record is owned by a specific account). This means that e.g. when you follow someone, you create a follow record on your account, and that other person can’t delete your record, which is why there’s currently no “soft-blocking” feature, i.e. you can’t make someone stop following you (though you can block them). There are workarounds though, as I’ll explain later in the AppView section.

This also means that there’s often an unexpected asymmetry between seemingly similar actions: for example, getting a list of people followed by person X is very simple (they’re all X’s records, so they’re all in one place), but getting a list of all followers of X is much harder (each record is in a different place!). This is something that the AppView helps with too, as we’ll see later.
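Here's a toy Python model of that asymmetry. It assumes the whole "network" is just a dict mapping each account's DID to the follow records in its repo (the DIDs are placeholders); real repos obviously aren't stored like this, it's only meant to show why one direction is cheap and the other isn't.

```python
FOLLOW = "app.bsky.graph.follow"

# Toy network: DID -> list of records stored in that account's repo.
repos = {
    "did:plc:alice": [{"$type": FOLLOW, "subject": "did:plc:bob"},
                      {"$type": FOLLOW, "subject": "did:plc:carol"}],
    "did:plc:bob":   [{"$type": FOLLOW, "subject": "did:plc:carol"}],
    "did:plc:carol": [],
}


def following(repos: dict, did: str) -> list:
    # Cheap: everything we need is in a single repo.
    return [r["subject"] for r in repos.get(did, []) if r["$type"] == FOLLOW]


def followers(repos: dict, did: str) -> list:
    # Expensive: the follow records live in *other* people's repos,
    # so without an index we have to scan every repo on the network.
    return sorted(
        owner
        for owner, records in repos.items()
        if any(r["$type"] == FOLLOW and r["subject"] == did for r in records)
    )


print(following(repos, "did:plc:alice"))
print(followers(repos, "did:plc:carol"))
```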

A second, complementary way of storing user data is blobs. Blobs are basically binary files, meant mostly for storing media like images and video. For example, here is a direct link to an image blob showing a photo of when I started writing this blog post. Blobs are stored on the same server as records, but somewhat separate from them, since it’s a different type of data.

Lexicons

Each record belongs to a specific “record type” and stores its data organized in a specific structure, which defines what kinds of fields it can have with what types, what they mean, which are required, and so on – kind of like XML/JSON Schema. This schema definition which describes a given record type is called a lexicon in ATProto. (If you’re curious why they made a new standard, see threads e.g. here, here, or here, or this blog post.)

A lexicon needs to have an identifier (called NSID, Namespace Identifier), which uses the reverse domain name format, e.g. app.bsky.feed.post. All lexicons that are used to store the data of a specific app are usually grouped under the same prefix, e.g. Bluesky lexicons all start with app.bsky.

The structure of a given lexicon’s records is defined in a special JSON file – for example, this file defines the app.bsky.feed.post lexicon. As you can see, this is the place which for example specifies that a post’s text can have at most 300 characters (more specifically, Unicode graphemes). This also means that you can’t create a different server which would make posts longer than 300 characters that would be Bluesky-compatible and displayed on bsky.app – such posts would not pass the validation against the post record schema, and would be rejected by any server or client which performs such validation. Essentially, whoever designs and controls a given lexicon decides what kinds of data it can hold and any constraints on it. In order to store a different, incompatible type of data, you need to create a new lexicon (although you can add additional fields to a record that aren’t defined in its lexicon; many third party apps are doing that, like e.g. Bridgy Fed).
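As a rough illustration of what such validation involves, here's a minimal Python sketch of a length check for a post's text. The 300-grapheme limit is the one mentioned above; the 3000-byte cap is my assumption about the second limit in the post lexicon, and check_post_text is a made-up helper. Python's standard library can't count graphemes, so the sketch uses the number of code points as an upper bound and says so when it can't decide.

```python
MAX_GRAPHEMES = 300  # the limit mentioned in the text above
MAX_BYTES = 3000     # assumed UTF-8 byte cap from the post lexicon


def check_post_text(text: str) -> str:
    """Return "ok", "too_long" or "needs_grapheme_count"."""
    if len(text.encode("utf-8")) > MAX_BYTES:
        return "too_long"
    # A grapheme is made of one or more code points, so the code point
    # count can never be smaller than the grapheme count.
    if len(text) <= MAX_GRAPHEMES:
        return "ok"
    # Over 300 code points: this may still be fine (e.g. emoji built from
    # several code points), but deciding needs a real grapheme segmenter.
    return "needs_grapheme_count"


print(check_post_text("Hello from a custom client"))
```

A real validator would use a proper grapheme segmentation library for the last case instead of giving up.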

Lexicon name prefixes generally define boundaries between “apps” as in “services”, and between the “territory” that’s owned by different parties. The lexicons and endpoints defined by Bluesky are defined either under app.bsky.* – these are things specific to Bluesky the microblogging service – or under com.atproto.*, which are things meant to be used by all ATProto apps and services regardless of the use case. There are also a couple of other minor namespaces like chat.bsky.* for the (centralized) DM service, and tools.ozone.* for the open source Ozone moderation tool.

The lexicon prefix is generally (in most cases) a good way to tell if a piece of the protocol is something Bluesky-specific (specific to the Bluesky service), or something general for all ATProto. There are no record types defined in com.atproto, so things like post, profile, follow are all Bluesky-specific and under app.bsky, as are APIs for e.g. searching users, getting timelines, custom feeds and so on. Meanwhile, com.atproto APIs deal more with things like: info about a repository, fetching a repository, signing up for a new account, refreshing an access token, downloading a blob, etc.
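If you wanted to apply this rule of thumb in code, it could look more or less like this Python sketch. The namespace list is just the one from the paragraphs above, classify_nsid is a hypothetical helper, and "sh.tangled.example" is not a real lexicon name, only an example under a third party prefix.

```python
# Namespaces mentioned above and what they're used for.
NAMESPACES = {
    "app.bsky": "Bluesky microblogging service",
    "chat.bsky": "Bluesky DM service",
    "com.atproto": "shared ATProto layer",
    "tools.ozone": "Ozone moderation tool",
}


def classify_nsid(nsid: str) -> str:
    segments = nsid.split(".")
    # An NSID is a reversed domain name plus a final name segment,
    # so it needs at least three non-empty parts.
    if len(segments) < 3 or not all(segments):
        raise ValueError(f"not a valid NSID: {nsid!r}")
    for prefix, owner in NAMESPACES.items():
        if nsid.startswith(prefix + "."):
            return owner
    return "third-party namespace"


for nsid in ("app.bsky.feed.post", "com.atproto.repo.getRecord", "sh.tangled.example"):
    print(nsid, "->", classify_nsid(nsid))
```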

Third party developers and teams building apps on ATProto/Bluesky, which either extend Bluesky’s features or make something completely separate, use their own namespaces for new lexicons, like blue.flashes, social.pinksky, events.smokesignal, sh.tangled, and so on. (There is a lot of nuance to whether you should use your own lexicons or reuse or extend existing ones when building things, and there have been a lot of discussions about it on Bluesky, and even conference talks. A good starting point is this blog post by Paul Frazee.)

Identity

Each user is uniquely identified in the network with their Decentralized Identifier (DID). DIDs are a W3C standard, but (as I understand) this standard mostly just defines a framework, and there can be many different “methods” of storing and resolving the identifiers, and each system that uses it can pick or create different types of those DIDs.

The format of a DID is: did:<type>:<…>, where the last part depends on the method. ATProto supports two types of DIDs, but in practice, almost everyone uses one of them, the “plc”. Each DID has a “DID document”, a JSON file (see mine) which describes the account – in ATProto at least, the document includes things such as: the assigned handles, the PDS server hosting the account, and some cryptographic keys.

An important thing to note is that DIDs are permanent; it’s the only thing that is permanent about your account, because something has to be. There needs to be some unique ID that all databases everywhere can use to identify you, which doesn’t change, and the DID is that ID. This means that you can’t change a DID of one type into another type later.

The main DID method is did:plc, where IIRC “plc” originally stood for “placeholder” (I think it was meant to be temporary until something better is designed), and was later kind of retconned to mean “Public Ledger of Credentials” 🙃 The DIDs of this type are identified by a random string of characters, which looks like this: did:plc:vc7f4oafdgxsihk4cry2xpze. The DID documents of each DID are stored in a centralized service hosted at plc.directory (Bluesky wants to eventually transfer the ownership to some external non-profit), which basically keeps a key-value store mapping a DID to a JSON file. It also keeps an “audit log” of the previous versions of the document (this means that, for example, the whole history of your old handles is available and you can’t erase it!). There’s also some cryptographic stuff there which, as I understand it, lets anyone verify that everything in the database checks out (don’t ask me how).

The other, rarely used method is did:web. Those DIDs look like this: did:web:witchcraft.systems, and the DID document is stored in a specific .well-known path on the given hostname, in this case witchcraft.systems (yes, that’s an actual TLD ;). It does not store an audit log/history like plc does.
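The two lookup paths can be sketched in Python like this. The function only builds the URL where the DID document should live, it doesn't fetch anything; the /log/audit path for the PLC audit log and the did.json file name are what I understand the services to use, so treat them as assumptions, and the function names are mine.

```python
from urllib.parse import unquote

PLC_DIRECTORY = "https://plc.directory"


def did_document_url(did: str) -> str:
    parts = did.split(":", 2)
    if len(parts) != 3 or parts[0] != "did" or not parts[2]:
        raise ValueError(f"not a DID: {did!r}")
    method, identifier = parts[1], parts[2]
    if method == "plc":
        # did:plc documents are served by the central directory.
        return f"{PLC_DIRECTORY}/{did}"
    if method == "web":
        if ":" in identifier:
            raise ValueError("did:web with a path isn't handled in this sketch")
        # did:web documents sit on a .well-known path on the domain itself
        # (a port, if any, is percent-encoded in the DID).
        return f"https://{unquote(identifier)}/.well-known/did.json"
    raise ValueError(f"unsupported DID method: {method}")


def plc_audit_log_url(did: str) -> str:
    if not did.startswith("did:plc:"):
        raise ValueError("only did:plc has an audit log")
    return f"{PLC_DIRECTORY}/{did}/log/audit"


print(did_document_url("did:plc:vc7f4oafdgxsihk4cry2xpze"))
print(did_document_url("did:web:witchcraft.systems"))
```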

It’s rarely used and not recommended because, first, it’s more complicated to create one (though that’s a solvable problem of course, see a just published guide); but second and more importantly, since DIDs are permanent, your account is permanently bound to that domain. You need to keep it accessible and not let it expire, or you lose the account – you can’t migrate it to did:web:another.site at some point later. It gives you more independence, but at the cost of being tied to that domain you have, and this isn’t a tradeoff that most people are likely to want, and definitely not people who don’t understand what they’re getting into.

If you’re fine with that choice, you can create a did:web account and almost everything in Bluesky and ATProto should work exactly the same. “Almost”, because some services forget to implement that second code path, since it’s so rarely used 😉 but in that case, politely nudging the developer to fix the issue should help in most cases :>

Handles

What DIDs enable is that since they act as the unique identifier, your handle doesn’t have to, like it does on the Fediverse. I can be @mackuba.bsky.social one day, @mackuba.eu the next day, and @mackuba.martianbase.net the week after. All existing connections – follows & followers, my posts, likes, blocks, lists I’m on, mentions in posts, etc. all work as before, because they all reference the DID, not the handle. With mentions specifically it works kinda funny, because they use what’s called a “facets” system (see later section), where the link target is specified separately from the displayed text. So you can have an old post saying “hey @mackuba.bsky.social”, where the handle in it links to my profile which is now named “@mackuba.eu”. The link still works, because it really links to the DID behind the scenes.

Unlike on the Fediverse, the format of handles is just a hostname, not username + hostname. You assign a whole hostname to a specific account, and if you own any domain name, that can be your username (and if you own a well known domain name, it’s strongly recommended that you do, as a form of self-verification!).

The handle to DID assignment is a two-way link – a DID needs to claim a given handle, and the owner of the domain needs to verify that they own that DID. On the DID side, this happens in the alsoKnownAs field of the DID document (see here in mine). On the domain side, there are two ways of verifying a handle, depending on what’s more convenient to you: either a DNS TXT entry, or a file on a .well-known path.
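Here's a small Python sketch of that two-way check, without the actual network part. It assumes the DNS record is a TXT entry on the _atproto subdomain with a value like "did=did:plc:…", and that the file is served at /.well-known/atproto-did; the handle, DID and helper names are made up for the example.

```python
from typing import Optional


def handle_lookup_targets(handle: str) -> dict:
    # The two places where a domain can declare which DID it belongs to.
    return {
        "dns_txt_name": f"_atproto.{handle}",
        "https_url": f"https://{handle}/.well-known/atproto-did",
    }


def did_from_txt(value: str) -> Optional[str]:
    # The TXT record value is expected to look like: did=did:plc:abc123
    value = value.strip().strip('"')
    return value[len("did="):] if value.startswith("did=") else None


def handle_is_verified(handle: str, did_from_domain: str, did_document: dict) -> bool:
    # Domain side: the domain points at this DID.
    # DID side: the DID document claims this handle in alsoKnownAs.
    claimed = [aka.lower() for aka in did_document.get("alsoKnownAs", [])]
    return (
        did_document.get("id") == did_from_domain
        and f"at://{handle.lower()}" in claimed
    )


doc = {"id": "did:plc:example1234", "alsoKnownAs": ["at://alice.example.com"]}
did = did_from_txt('"did=did:plc:example1234"')
print(handle_lookup_targets("alice.example.com"))
print(handle_is_verified("alice.example.com", did, doc))
```

If either side is missing, the handle shouldn't be treated as valid: a domain can't claim someone else's DID, and a DID can't claim a domain its owner doesn't control.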

You might be wondering how handles like *.bsky.social work – in this case, each such handle is its own domain name, and you can actually enter a domain like aoc.bsky.social into a browser and it will redirect to a Bluesky profile on bsky.app. Behind the scenes, this is normally handled by having a wildcard domain pointing to one service, which responds to HTTP requests on that .well-known path by returning different DIDs, depending on the domain. That’s not only a bsky.social thing – e.g. there’s now an open Blacksky PDS server which hands out blacksky.social handles, and there are even “handle services” which only give out handles – e.g. you can be yourname.swifties.social if you want ;)

One place where handle changes break things is (some) post URLs on bsky.app. The official web client uses handles by default in permalinks, which means that if you link to a Bluesky post e.g. from a blog post and you change your handle later, that link will no longer work. You can however replace the handle after /profile/ with the user’s DID, and the router accepts such links just fine, they just aren’t used by default. So the form you’d want to use when putting links in a blog post or article (like the one you’re reading) would be something like: https://bsky.app/profile/did:plc:ragtjsm2j2vknwkz3zp4oxrd/post/3llwrsdcdvc2s.

AT URIs

Each record can be uniquely addressed with a specific URI with the at:// scheme. The format of the URI is:

at://<user_DID>/<lexicon_NSID>/<rkey>

Rkey is an identifier of a specific record instance – a usually short alphanumeric string, e.g. Bluesky post rkeys look something like 3larljiybf22v. So a complete post URI might look like this: at://did:plc:z72i7hdynmk6r22z27h6tvur/app.bsky.feed.post/3larljiybf22v. You can look up at:// URIs in some record browser tools, e.g. PDSls.

AT URIs are used for all references between records – quotes, replies, likes, mute list entries, and so on. If you look at this like record, for example, its subject.uri points to at://did:plc:vwzwgnygau7ed7b7wt5ux7y2/app.bsky.feed.post/3lv2b3f5nys2n, which is the URI of a post record you can see here. Since the URIs use DIDs in the first part, handle changes don’t affect such links.
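Parsing such a URI is pretty simple; here's a minimal Python sketch that splits an at:// URI into its three parts and, for posts, turns it into a handle-independent bsky.app link in the format shown in the previous section. The names AtUri, parse_at_uri and bsky_post_url are mine, and the sketch only handles full record URIs with a DID in the first part.

```python
from typing import NamedTuple


class AtUri(NamedTuple):
    repo: str        # the owner's DID
    collection: str  # the lexicon NSID
    rkey: str        # the record key


def parse_at_uri(uri: str) -> AtUri:
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an at:// URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected at://<did>/<nsid>/<rkey>, got {uri!r}")
    return AtUri(*parts)


def bsky_post_url(uri: str) -> str:
    at = parse_at_uri(uri)
    if at.collection != "app.bsky.feed.post":
        raise ValueError("not a Bluesky post URI")
    # Using the DID instead of the handle keeps the link working
    # after a handle change.
    return f"https://bsky.app/profile/{at.repo}/post/{at.rkey}"


uri = "at://did:plc:z72i7hdynmk6r22z27h6tvur/app.bsky.feed.post/3larljiybf22v"
print(parse_at_uri(uri))
print(bsky_post_url(uri))
```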

User repositories

All user data (records and blobs) is stored in a repository (or “repo”). The repository is identified by user’s DID, and stores:

  • records, grouped by lexicon into so-called collections
  • blobs (stored separately from records)
  • authentication data like access tokens, signing keys, hashed passwords etc.

Internally, an important part of how the repo stores user records is a data structure called “Merkle Search Tree” – but this isn’t something that you need to understand when using the protocol, unless you’re working on a PDS/relay implementation (I haven’t needed to get into it so far).

You can download the records part of your (or anyone else’s!) repo as a bundle called a CAR file, a Content Addressable Archive (fun fact: the icon for the button in the Bluesky app which downloads a repo backup is the shape of a car 🚘).

The cool part is that a repository stores all data of the given user, from *all* lexicons. Including third party developer lexicons. This means that if someone has their account hosted on Bluesky servers, but uses third party ATProto apps like Tangled or Grain, Bluesky lets them store these apps’ records like Grain photos or Tangled pull requests on the same server where it keeps their Bluesky posts. (And yes, of course someone made a lexicon/tool for storing arbitrary files on your Bluesky PDS… and did it in Bash, because why not 🙃)

XRPC

XRPC is the convention used for APIs in the ATProto network. The API endpoints use the same naming convention as lexicon NSIDs, and they have URLs with paths in the format of /xrpc/<nsid>, e.g. /xrpc/app.bsky.feed.getPosts. There are similar lexicon definition files which specify what parameters are accepted/required by an endpoint and what types of data are returned in the JSON response. PDSes, AppViews, labellers and feed generators all implement the same kind of API, although with different subsets of specific endpoints. Third party apps don’t have to use the same convention, but it’s generally a good idea, since it integrates better with the rest of the ecosystem.
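Since every endpoint follows the same /xrpc/<nsid> pattern, building request URLs is easy to generalize. Below is a small Python sketch which only constructs the URL (no request is sent); pds.example.com is a placeholder host, xrpc_url is a made-up helper, and the parameter names for com.atproto.repo.getRecord and app.bsky.feed.getPosts are what I believe those endpoints accept.

```python
from urllib.parse import urlencode


def xrpc_url(host: str, nsid: str, **params) -> str:
    # Every XRPC endpoint lives under /xrpc/ followed by its NSID.
    url = f"https://{host}/xrpc/{nsid}"
    # doseq=True turns list values into repeated query parameters.
    query = urlencode(params, doseq=True)
    return f"{url}?{query}" if query else url


record_url = xrpc_url(
    "pds.example.com",
    "com.atproto.repo.getRecord",
    repo="did:plc:example1234",
    collection="app.bsky.feed.post",
    rkey="3larljiybf22v",
)
print(record_url)

posts_url = xrpc_url(
    "pds.example.com",
    "app.bsky.feed.getPosts",
    uris=[
        "at://did:plc:example1234/app.bsky.feed.post/3larljiybf22v",
        "at://did:plc:example1234/app.bsky.feed.post/3larljiybf33w",
    ],
)
print(posts_url)
```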

Rich text / facets

This one is kinda Bluesky-specific, but it’s pretty important to understand, and I think you can reuse it for non-Bluesky apps too.

The “facets” system is something used for links and possibly rich text in future in Bluesky posts. It’s perhaps a little bit unintuitive at first, but it’s pretty neat and allows for a lot of flexibility.

The way you handle links, mentions, or hashtags, is that they aren’t highlighted automatically, but you need to specifically mark some range of text as a link using the facets. A facet is a marking of some range of the post text (from-to) with a specific kind of link. If you look e.g. at this post here, you can see that it has a facet marking the byte range 60-67 of the post text as a hashtag “ahoy25”. If there was no facet there, it would just render as normal unlinked text “#ahoy25” in the post (when you see that, it’s an easy tell that a post was made using some custom tool that’s in early stages of development). It works the same way for mention links and normal URL links.

(If you’re curious why they implemented it this way, check out this blog post.)

Note that the displayed text in the marked fragment doesn’t have to match what the facet links to; this means that you can have links that just use some shorter text for the link instead of a part of the URL, in order to fit more text in one post (although in the official app, clicking such link triggers a warning popup first). E.g. some Hacker News bots commonly use this format, see this post. The Bsky app doesn’t let you create such posts directly, but some other clients like Skeetdeck do.

Facets are also used for URL shortening – if you just put a long URL in the text of a post made through the API, it will be neither shortened nor highlighted. You need to manually mark it with a facet, and manually shorten the displayed part to whatever length you want.

Probably the trickiest part is that the index numbers you need to use for the ranges are counted on a UTF-8 representation of the text string, but they’re counted in… bytes and not Unicode scalars, which most languages index strings in 😅 This is somewhat of an unfortunate tech debt thing as I understand, and it was made this way mostly because of JavaScript, which doesn’t work with UTF-8 natively. But this means you need to be extra careful with the indexes in most languages.
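Here's a short Python sketch showing how the byte offsets can be calculated. Python strings are indexed in code points, so the character position has to be converted by encoding the preceding text to UTF-8 and measuring its length. The facet structure follows the app.bsky.richtext.facet lexicon as I understand it; facet_for is a hypothetical helper, and the sample text and URL are made up.

```python
def facet_for(text: str, fragment: str, feature: dict) -> dict:
    # Position of the fragment in code points (what Python indexes in)...
    start = text.index(fragment)
    # ...converted to a position in UTF-8 bytes (what facets expect).
    byte_start = len(text[:start].encode("utf-8"))
    byte_end = byte_start + len(fragment.encode("utf-8"))
    return {
        "index": {"byteStart": byte_start, "byteEnd": byte_end},
        "features": [feature],
    }


text = "Zażółć 🍎 #ahoy25, more info: example.com/talks"

tag = facet_for(text, "#ahoy25",
                {"$type": "app.bsky.richtext.facet#tag", "tag": "ahoy25"})

# The displayed text is a shortened form; the facet holds the full URL.
link = facet_for(text, "example.com/talks",
                 {"$type": "app.bsky.richtext.facet#link",
                  "uri": "https://example.com/talks/2025/some-very-long-path"})

print(text.index("#ahoy25"), tag["index"])
```

In this example the hashtag starts at character 9, but at byte 16, because the Polish letters take 2 bytes each and the emoji takes 4. If you used the character index directly, the link would end up shifted to the left.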


Ok, now that we got through the basic pieces, let’s talk about servers:

PDS

The original copy of all user data is stored on a server called PDS, Personal Data Server. This is the “source of truth”. A PDS stores one or more user accounts and repos, handles user authentication, and serves as an “entry point” to the network when connecting from a client app. Most network requests from the client are sent to your PDS, although only some of them are handled directly by the PDS, and the rest are proxied e.g. to the AppView. So in a way, your PDS kind of serves as your “user agent” in the network on the backend side of things (beyond the client app), especially if it’s under your control.

Each PDS has an XRPC API with some number of endpoints for things like listing repositories, listing contents of each, looking up a specific record or blob, account authentication and management, and so on. It also has a websocket API called a “firehose” (the subscribeRepos endpoint). The firehose streams all changes happening on a given PDS (from all repos) as a stream of “events”, where each event is an addition, edit, or deletion of a record in one of the repos, or some change related to an account, like handle change or deactivation.

One of the most important features of ATProto is that an account is not permanently assigned to a PDS. Unlike in ActivityPub, where your identifier is e.g. mackuba@mastodon.social and it can never change, because everything uses that as the unique ID, here the unique ID is the DID. The PDS host is assigned to a user in the DID document JSON (e.g. on plc.directory), but you can migrate to a different PDS at any point, and at the moment there are even some fairly user-friendly tools available for doing that, like ATP Airport or PDS MOOver (although it’s still a bit unpolished at the moment, and for now you can’t migrate back to Bluesky-hosted PDSes). In theory, you should even be able to migrate to a different PDS if your old PDS is dead or goes rogue, if you have prepared in advance (this is a bit more technical). If everything goes well, nobody even notices that anything has changed (you can’t even easily check in the app what PDS someone is on, although there are external tools for that, like internect.info).
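To show where that assignment lives, here's a Python sketch that pulls the PDS address out of a DID document. The document below is a trimmed-down, made-up example (placeholder DID, handle and hostnames); the "#atproto_pds" service ID and the "AtprotoPersonalDataServer" type are what I understand ATProto DID documents to use, and pds_endpoint is a helper name I invented.

```python
from typing import Optional


def pds_endpoint(did_document: dict) -> Optional[str]:
    # The PDS is listed as one of the services in the DID document.
    for service in did_document.get("service", []):
        if (service.get("id", "").endswith("#atproto_pds")
                and service.get("type") == "AtprotoPersonalDataServer"):
            return service.get("serviceEndpoint")
    return None


before = {
    "id": "did:plc:example1234",
    "alsoKnownAs": ["at://alice.example.com"],
    "service": [{
        "id": "#atproto_pds",
        "type": "AtprotoPersonalDataServer",
        "serviceEndpoint": "https://old-pds.example.net",
    }],
}

# After a migration only the endpoint changes; the DID (and so all
# follows, likes and at:// links pointing to it) stays the same.
after = {**before, "service": [{**before["service"][0],
                               "serviceEndpoint": "https://new-pds.example.org"}]}

print(pds_endpoint(before), "->", pds_endpoint(after))
```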

Initially, during the limited beta in 2023, Bluesky only had one PDS, bsky.social. In November 2023, several additional PDSes were created (also under Bluesky PBC control) and existing users were quietly spread across them at random. At that point, the network was already “technically federated”, operating in the target architecture, although with access restricted to only Bluesky-run servers. This restriction was lifted in February 2024 with the public federation launch.

Since then, ATProto enthusiasts started setting up PDS servers for themselves, either creating alt/test accounts there, or moving their main accounts. As of August 2025, there are around 2000 third party PDS servers, although most of them are very small – usually hosting one person’s main and/or test accounts, and maybe those of a couple of their friends. I have a list of them on my website, and there’s also a more complete list here (mine excludes inactive PDSes and empty accounts).

As you can see there, there’s one massive PDS for Bridgy Fed, the Bluesky-Mastodon bridge service, hosting around 30-40k bridged accounts from the Fediverse, Threads, Nostr, Flipboard, or the web (blogs); then some number of small to medium PDSes for various services, and a very long tail of servers with a single-digit number of accounts. At this moment, large public PDSes in the style of Fedi instances aren’t much of a thing yet, although there are at least a few communities working on setting one up (e.g. Blacksky, Northsky, or Turtle Island). Blacksky specifically opened up for migrations just last week and now has a few hundred real accounts.

The vast majority of PDSes at the moment use the reference implementation from Bluesky (written in TypeScript), but there are a few alternative implementations at various levels of maturity (rsky by Blacksky’s Rudy Fraser, written in Rust; cocoon in Go; or millipds in Python). The official version is very easy to set up and very cheap to run – it’s bundled in Docker, and there’s basically one script you need to run and answer a few questions.

As for the Bluesky-hosted PDSes, the number is currently in high double digits, and each of them hosts a few hundred thousand accounts (!). And what’s more, they keep the record data in SQLite databases, one per account. And it works really well, go figure. The Bluesky PDSes are all given names of different kinds of mushrooms (like Amanita, Boletus or Shiitake), hence they are often called “mushroom servers”; you can see the full list e.g. here. bsky.social was left as a so-called “entryway server”, which handles shared authentication for all Bluesky-hosted PDSes (it’s a private piece of Bluesky PBC infrastructure that’s not open source and not needed for independent PDS hosters).

Relay

A relay is probably the piece of the ATProto architecture that’s most commonly misunderstood by people familiar with other networks like the Fediverse. It doesn’t help that both the Fediverse and Nostr also include servers called “relays”, but they serve a different purpose in each of them:

  • a relay in Nostr is a core piece of the architecture: your posts are uploaded to one or more relays that you have configured and are hosted there, where other users can fetch them from
  • a relay in the Fediverse is an optional helper service that redistributes posts from some number of instances who have opted in to others, in order to make content more discoverable e.g. on hashtag feeds

In ATProto, a relay is a server which combines the firehose streams from all PDSes it knows about into one massive stream that includes every change happening anywhere on the network. Such full-network firehose is then used as the input for many other services, like AppViews, labellers, or feed generators. It serves as a convenient streaming API to get e.g. all posts on the network to process them somehow, or all changes to accounts, or all content in general, from a single place.

Initially, the relay was also expected to keep a complete archive of all the data on the network, from all repos, from the beginning of time. This requirement was later removed in the updates late last year, at least partially triggered by the drastic increase in traffic in November 2024, which overwhelmed Bluesky’s and third party servers for at least a few days. Currently, Bluesky’s and other relays are generally “non-archival”, meaning that they live stream current events (+ a buffer of e.g. last 24 or 36 hours), but don’t keep a full archive of all repos (this change has massively lowered the resource requirements / cost of running a relay, making it much more accessible). An archival relay could always be set up too, but I’m not aware of any currently operating.

Bluesky operates one main relay at bsky.network, which is used as a data source for their AppView and pretty much everyone else in the ATProto ecosystem at the moment (internally, it’s really some kind of “load balancer” using the rainbow service, with a few real relay servers behind it).

The relay code is implemented in Go, and isn’t very hard to get up and running (especially the recent “1.1” update improved things quite a lot). Some people have been running alternative relay services privately for some time, and there is now e.g. a public relay run by Rudy Fraser at atproto.africa (with a custom implementation in Rust! 🦀), and a couple run by Phil @bad-example.com. I’m also running my own small relay, feeding content only from non-Bluesky PDSes.

Jetstream

There is also a variant of a relay called Jetstream – it’s a service that reads from a real CBOR relay and outputs a stream that’s JSON based, better organized, and much more lightweight (the full relay includes a lot of additional data that’s mostly used for cryptographic operations and other low-level stuff). For many simpler tools and services, it might make more sense to stream data from that one instead, if only to save bandwidth. (Bluesky runs a couple of instances listed there in the readme, but you can also run your own.)
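To give you an idea of how lightweight this is to work with, here's a Python sketch that handles a single Jetstream-style event. It doesn't connect anywhere: SAMPLE_EVENT is a hand-written example modeled on the event format from the Jetstream readme as I remember it, with placeholder values, so treat the exact field names as an assumption; summarize is a made-up helper.

```python
import json
from typing import Optional

# Hand-written sample in the (assumed) Jetstream event format.
SAMPLE_EVENT = """
{
  "did": "did:plc:example1234",
  "time_us": 1725911162329308,
  "kind": "commit",
  "commit": {
    "rev": "3l3examplerev",
    "operation": "create",
    "collection": "app.bsky.feed.post",
    "rkey": "3l3examplerkey",
    "record": {
      "$type": "app.bsky.feed.post",
      "text": "hello",
      "createdAt": "2024-09-09T19:46:02.102Z"
    },
    "cid": "bafyreiexamplecid"
  }
}
"""


def summarize(raw: str) -> Optional[str]:
    event = json.loads(raw)  # plain JSON, no CBOR decoding needed
    if event.get("kind") != "commit":
        return None  # e.g. handle changes or account status updates
    commit = event["commit"]
    uri = f"at://{event['did']}/{commit['collection']}/{commit['rkey']}"
    return f"{commit['operation']} {uri}"


print(summarize(SAMPLE_EVENT))
```

In a real tool, the same function would be called for every message received from the websocket connection.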

AppView

The terribly named AppView is the second most important piece of the network after the PDS.

The AppView is basically an API server that serves processed data to client apps. It’s an equivalent of an API backend (with the databases behind it) that you’d find on a classic social media site like Twitter. AppView streams all new data written on the network from the relay, and saves a copy of it locally in a processed, aggregated and optimized form. For example, an AppView backed by an SQL database could have a posts table with a text column, a likes table storing all likes with a foreign key post_id, probably also an integer likes_count column in posts for optimization, and so on.
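The SQL example above could be sketched in Python with an in-memory SQLite database like this. This is only a toy model of the idea and not how Bluesky's AppView is actually built; the table layout, function names and URIs are all made up for the illustration.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            uri TEXT UNIQUE NOT NULL,
            text TEXT NOT NULL,
            likes_count INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE likes (
            id INTEGER PRIMARY KEY,
            uri TEXT UNIQUE NOT NULL,
            post_id INTEGER NOT NULL REFERENCES posts(id)
        );
    """)
    return db


def index_post(db, uri: str, record: dict) -> None:
    db.execute("INSERT OR IGNORE INTO posts (uri, text) VALUES (?, ?)",
               (uri, record["text"]))


def index_like(db, uri: str, record: dict) -> None:
    row = db.execute("SELECT id FROM posts WHERE uri = ?",
                     (record["subject"]["uri"],)).fetchone()
    if row is None:
        return  # a like of a post we haven't indexed
    cur = db.execute("INSERT OR IGNORE INTO likes (uri, post_id) VALUES (?, ?)",
                     (uri, row[0]))
    if cur.rowcount:  # bump the counter only for a newly seen like
        db.execute("UPDATE posts SET likes_count = likes_count + 1 WHERE id = ?",
                   (row[0],))


def likes_count(db, post_uri: str) -> int:
    row = db.execute("SELECT likes_count FROM posts WHERE uri = ?",
                     (post_uri,)).fetchone()
    return row[0] if row else 0


db = open_db()
post_uri = "at://did:plc:alice/app.bsky.feed.post/1"
index_post(db, post_uri, {"text": "hi"})
index_like(db, "at://did:plc:bob/app.bsky.feed.like/1", {"subject": {"uri": post_uri}})
index_like(db, "at://did:plc:carol/app.bsky.feed.like/1", {"subject": {"uri": post_uri}})
print(likes_count(db, post_uri))
```

The point of the precomputed likes_count column is that a client asking for a post gets the number with a single row lookup, instead of the server counting records scattered over other people's repos.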

The AppView is designed to be able to easily give information such as:

  • the latest posts from this user
  • all the replies in a given thread organized in a tree
  • most recent posts on the network with the hashtag #rubylang or mentioning “iOS 26”
  • how many likes/reposts has a given post received, and who made them
  • how many follows/followers does a given user have, and who are they
  • is user A allowed to view or reply to a post from user B

All this data originates from users’ PDSes and has its original copy stored there, but the “raw” records don’t always allow you to access all information easily. For example, to find out how many likes a post has, you need to know all app.bsky.feed.like records referencing it from other users, and each of those like records is stored in the liking user’s repo on that user’s PDS. Same with followers, as I mentioned earlier in the section on records, or with building threads (again, different replies in one thread are hosted in different repos), or for basically any kind of search. So having this kind of API with processed data from the entire network is essential for client apps and various tools and services built around Bluesky by other people.

AppView also applies some additional rules to the data, sometimes overriding what people post into their PDSes, since anyone can technically post anything into their PDS. For example, the AppView prevents you from looking at the profiles of people who have blocked you, at least when you’re logged in. It also hides them from your followers list, even if they have a follow record referencing you, making it seem like they don’t; and if they try to make an app.bsky.feed.post replying to you (they can create such a record on their PDS!), it excludes such a reply from feeds and threads, as if it never happened. Same goes for “thread gates” which lock access to threads, and so on.
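As a toy illustration of that kind of rule, here's a Python sketch of filtering replies when building a thread. The data shapes and the function name are made up, and treating a block in either direction as hiding the reply is a simplifying assumption of this sketch, not a description of the actual AppView code.

```python
def visible_replies(replies, parent_author, blocks):
    """Drop replies whose author has a block relationship with the parent's author.

    replies: list of dicts with 'author' (DID) and 'uri' keys
    blocks:  set of (blocker_did, blocked_did) pairs known to the AppView
    """
    return [
        reply for reply in replies
        if (reply["author"], parent_author) not in blocks
        and (parent_author, reply["author"]) not in blocks
    ]
```

Note that the reply records themselves still exist in their authors' repos; the AppView just doesn't include them in what it serves.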

The AppView is one of the few components which aren’t completely open source. Initially, the AppView used Postgres as its data store; that version is still in the public repository. In late 2023, Bluesky migrated to a “v2” version, which uses the NoSQL database ScyllaDB instead, to be able to handle the massive read traffic from many millions of concurrent users. The upper layer with the “business logic” is kept in the public repository, while the so-called “dataplane” layer that interacts directly with Scylla is not. The reason is mostly that it’s built for a specific hardware setup they have and wouldn’t be directly usable by others, while it would add some unnecessary work for the team to publish it. It’s still possible to run the AppView with the old Postgres-based data layer (and I think the team uses that internally for development), it just can’t handle as much traffic as the current live version.

This is the piece that’s hardest to run yourself, and one that requires the most resources. That said, a private AppView should be possible to run right now for under $200/month – the biggest requirement is at least a few TB of disk space. The truly costly part is not collecting and storing all this data, but serving it to a huge number of users who would use it as a backend for the client app in daily use. An alternative full-network Bluesky AppView that is used by a few thousand users shouldn’t be very hard to run, but to be able to serve millions, you’ll need a lot of hardware and something more custom than the Postgres-based version.

There have also been some attempts at alternative implementations – the most advanced right now is AppViewLite, built in C#, which goes to great lengths to minimize the resource use.

CDN

A part of the AppView (at least the Bluesky one) is also a CDN for serving images & videos. The API responses from e.g. getTimeline or getPostThread generally include links to any media on the Bluesky CDN hostname, not directly on the PDS, even though you can fetch every blob from the PDS, since that’s the “source of truth” (although IIRC the Bluesky PDS implementation doesn’t set the CORS headers there). It’s recommended to access any media this way in order to not use too much bandwidth from the PDS.

Labellers

(Or “labelers” officially, but I like the British spelling more here, sue me ¯\_(ツ)_/¯)

We’re now getting to more Bluesky-specific things (i.e. specific to the Bluesky service, although some parts of it are ATProto-general and mentioned on the atproto.com site).

A labeller is a moderation service for Bluesky (or another ATProto app), which can be run by third parties. Labellers emit labels, which are assigned to an account or a record (like a post). Each labeller defines its own set of labels, depending on what it’s focusing on; then, users can “subscribe” to a labeller and choose how they want to handle the labels it assigns: you can hide the labelled posts/users, mark them with a warning badge, or ignore the given label.

Labellers were initially designed to just do community moderation of unwanted content, e.g. you can have a service focused on fighting racism, transphobia, or right-wing extremism, and that service helps protect its users from some kinds of bad actors; or you can have one marking e.g. posts with political content, users who follow 20k accounts, or who post way too many hashtags. In practice, many existing labellers are meant for self-labelling instead, letting you assign e.g. a country flag or some fun things like a D&D character class to yourself.

The way it works technically is:

  • a labeller either runs a firehose client pulling posts from the relay, or relies on reports from users and/or its operating team (usually using the Ozone tool for that)
  • labels, which are lightweight objects (not ATProto records), are emitted from the labeller’s special firehose stream (the subscribeLabels endpoint)
  • the AppView listens to the label firehoses of all labellers it knows about, in addition to the relay stream, and records all received labels in its database
  • when a logged in user pulls data like threads or timelines from the AppView, it adds relevant label info to the responses depending on which labellers the user follows
  • the specific list of labellers whose labels should be applied is passed explicitly in API requests in the atproto-accept-labelers header (there is a “soft” limit of 20 labellers you can pass at a time, which is why the official app won’t let you subscribe to more)
  • in the official app, Bluesky’s official moderation service (which is “just” another labeller) is hardcoded as one of those 20 and you can’t turn it off; when connecting from your own app or tool, you’re free to ignore it if you want
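The client side of this flow can be sketched in a few lines of Python. The header name and the limit of 20 come from the description above; the comma-separated header value, the src and val fields on a label, and the shape of the preferences dict are assumptions of this sketch, so verify them against the labels specification on atproto.com.

```python
MAX_LABELERS = 20  # the "soft" limit mentioned above


def accept_labelers_header(labeler_dids):
    """Build the request header listing the labellers whose labels we want applied."""
    return {"atproto-accept-labelers": ", ".join(labeler_dids[:MAX_LABELERS])}


def decide_visibility(labels, preferences):
    """Decide what to do with a post or account based on the user's label settings.

    labels:      list of dicts with 'src' (labeller DID) and 'val' (label name)
    preferences: dict mapping (src, val) to 'hide', 'warn' or 'ignore'
    Returns 'hide', 'warn' or 'show'; the strictest setting wins.
    """
    decision = "show"
    for label in labels:
        action = preferences.get((label["src"], label["val"]), "ignore")
        if action == "hide":
            return "hide"
        if action == "warn":
            decision = "warn"
    return decision
```

Labels without a matching preference are ignored here; a real client would more likely fall back to the default setting declared by the labeller.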

(Read more about labellers here.)

Feed generators

Custom feeds are one of the coolest features of Bluesky. They let you create any kind of feed using any algorithm and let everyone on the platform use it (even as the default feed, if they want to).

The way this system works is that you need to run a “feed generator” service on your server. In that service, you expose an API that the AppView can call, which returns a list of post at:// URIs selected by you however you want in response to a given request.

A minimal feed service can be pretty simple – the API is just three endpoints, two of which are static, and the third returns the post URIs. One “small” problem is that in order to return the post URIs, you need to have some info about posts stored up front, which in practice means that you almost always need to connect to a relay’s firehose stream and store some post data (of selected or all posts, depending on your use case).

The flow is like this:

  • a feed record is uploaded to your repo, including metadata and location of the feed generator service, which lets other users find your feed
  • when the user opens that feed in the app, the AppView makes a request to your service on their behalf
  • your service looks at the request params and headers, and returns a list of posts it selected in the form of at:// URIs
  • the AppView takes those URIs and maps them to full posts (so-called “hydration”), which it returns to the user’s app

How exactly those posts are selected to be returned in the given request is completely up to you, the only requirement is that these are posts that the AppView will have in its database, since you only send URIs, not actual post data. In most cases, feeds use some kind of keyword/regexp matching and chronological ordering, but you can even build very complex, AI-driven algorithmic “For You” style personalized feeds.
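Here's a minimal Python sketch of the "keyword/regexp matching and chronological ordering" kind of feed. It assumes you have already collected some posts from the firehose into a list of dicts; the dict keys and the function name are made up, and the returned structure (a "feed" list of objects with a "post" URI) is my understanding of the feed skeleton response format, so check it against the official docs.

```python
import re


def build_feed_skeleton(indexed_posts, pattern, limit=50):
    """Select posts matching a regexp, newest first, and return only their URIs.

    indexed_posts: list of dicts with 'uri', 'text' and 'created_at' keys, where
    'created_at' is an ISO 8601 UTC timestamp in one consistent format, so that
    plain string sorting is also chronological sorting.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    matching = [post for post in indexed_posts if regex.search(post["text"])]
    matching.sort(key=lambda post: post["created_at"], reverse=True)
    return {"feed": [{"post": post["uri"]} for post in matching[:limit]]}
```

The service returns only URIs and no post text; the AppView fills in the rest from its own database.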

You don’t necessarily have to code a feed service yourself and host it in order to have a custom feed – there are a few feed hosting services that don’t require technical knowledge to use, like SkyFeed or Graze.

Client apps

Ok, that’s technically not a server, but stay with me…

The final piece that you need to fully enjoy Bluesky is the client app – a mobile/desktop one or a web frontend. Unlike on Fedi, where an instance software like Mastodon usually includes a built-in web frontend that is your main interface for accessing the service, the PDS doesn’t include anything like that, just a database and an API (which also means it’s much more lightweight and needs fewer resources). All browsing is done through a separate client, and the client always does everything through the public API – kind of like when you run a custom web client for Mastodon like Elk or Phanpy, you connect it to your instance, and you view your timeline on elk.zone.

So when you go to bsky.app, that’s what you’re seeing – a web client that connects to your PDS (Bluesky-hosted or self-hosted) through the public API, no more, no less. The official app is built for both mobile platforms and for the web from a single React Native codebase (apparently React Native on the web and normal web React is not the same thing 🧐). This has allowed the still very small frontend team (and IIRC at first it was literally just Paul) to build the app for three platforms in any reasonable amount of time and maintain it going forward. The downside is that it’s kinda neither a great webapp nor a great mobile app… But the team is doing what they can to improve it, and it’s already much better than it used to be, and tbh more than good enough for me.

There aren’t nearly as many alternative clients as there are for Mastodon, and none of them are really great, but there are a few options; see the apps part of my Bluesky Guide blog post for links.

DMs

Notice that I haven’t mentioned DMs anywhere – that’s because they aren’t a part of the protocol at the moment. The Bluesky team wants to eventually add some properly implemented, end-to-end encrypted, secure DMs using some open standard, but they won’t be able to finish that in the short term, and a lot of people were asking for at least some simple version of DMs in the app. So they’ve decided as an interim solution to implement them as a fully centralized, closed source service. It is accessible to third-party Bluesky clients through the API (the chat.bsky.* namespace), but it’s not something you can run yourself. The team is very open about the fact that it’s not a proper replacement for something like Signal, and that for sensitive communication, you should ideally just use it for swapping contacts on Signal or iMessage and move the conversation there. They also kinda don’t want to spend too much time adding features there, because it’s considered a temporary solution, so it’s pretty basic in terms of available features.

There are also a few other closed-source helper services, like the “cardyb” they use for generating link card details, or the video service for preprocessing videos, but they’re all specific to some Bluesky use cases only and not strictly necessary to use.


How it all fits together

So the flow and hierarchy is like this:

  • the client app you use creates new records as a result of actions you take (new posts, likes, follows), and saves them into your PDS
  • your PDS emits events on its firehose with the record details
  • Bluesky relay and other relays are connected to the firehoses of each PDS they know about (your PDS generally needs to ask them to connect using the PDS_CRAWLERS ENV variable), and they pass those events to their output firehose
  • the Bluesky AppView (and other AppViews) listen to the firehose of their selected relay (though it could be multiple relays, or it could even just stream directly from PDSes, but in practice this will normally be one trusted relay)
  • the AppView gets events including your records, and if they are relevant, saves the data to its internal database in some appropriate representations
  • when other users browse Bluesky in their client apps, they load timelines, feeds and threads from the AppView, which returns info about your post from that database it saved it to
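The steps above can be modelled as a toy in-memory simulation in Python. Nothing here talks to the real network, and all class and method names are made up; plain callbacks stand in for the firehose streams, just to show which component pushes data to which.

```python
class PDS:
    def __init__(self):
        self.records = {}      # at:// URI -> record; only this PDS's own users
        self.subscribers = []  # relays connected to this PDS's firehose

    def create_record(self, did, collection, rkey, record):
        uri = f"at://{did}/{collection}/{rkey}"
        self.records[uri] = record
        for notify in self.subscribers:  # emit the event on our firehose
            notify({"uri": uri, "collection": collection, "record": record})
        return uri


class Relay:
    def __init__(self):
        self.subscribers = []  # AppViews, feed generators, labellers...

    def crawl(self, pds):
        pds.subscribers.append(self.forward)

    def forward(self, event):
        for notify in self.subscribers:  # pass the event to the output firehose
            notify(event)


class AppView:
    def __init__(self, relay):
        self.posts = {}  # the processed copy served to client apps
        relay.subscribers.append(self.index)

    def index(self, event):
        if event["collection"] == "app.bsky.feed.post":  # keep only relevant records
            self.posts[event["uri"]] = event["record"]["text"]
```

Usage: create two PDS objects and one Relay, call relay.crawl on both, and attach an AppView to the relay. A post created on either PDS then shows up in the AppView's posts, while each PDS still holds only its own records.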

Additionally:

  • feed generators run by third party feed operators also stream data from Bluesky’s or some other relay and save it locally, so they can respond to feed requests from the AppView
  • labellers also stream data from Bluesky’s or some other relay, and emit labels on their firehoses, which get sent to the AppView (note: there is no official “labeller relay” sitting between labellers and the AppView, although one third party dev wrote one)

Note:

  • PDSes do not connect to each other directly, and they don’t store posts of users from other PDSes, only their own
  • although right now basically everyone uses the Bluesky relay and AppView, anyone can set up their own alternative relays and AppViews, which feed from all or any subset of known PDSes
  • PDS chooses which relays to ask to connect, but relays can also connect by themselves to a PDS or another relay; AppView chooses which relay(s) it streams data from; and PDS chooses which AppView it loads timelines & threads from
  • it’s absolutely possible and expected that two users using different PDSes, which use separate AppViews feeding from separate relays will be able to talk to each other and see each other’s responses on their own AppView, as long as the users aren’t banned on the other user’s infrastructure

The metaphor that’s often used to describe this relationship is that PDSes are like websites which publish some blog posts, and relays & AppViews are like search engines which crawl and index the web, and then let you look up results in them. In most cases, a website should be indexed and visible in all/most available search engines.


Where to go next

And that’s about it – I think with the above, you should have a pretty good grasp of the big picture of ATProto architecture and all the specific parts of it. Now, if you want to start playing with the protocol and building some things on it, a lot will depend on what specifically you want to build and using what languages/technologies:

SDKs:

Two languages are officially supported by Bluesky:

  • JavaScript/TypeScript, in which most of their code is written (see the packages folder in the atproto repo)
  • Go, which is used in some backend pieces like the relay, or the goat command line tool used e.g. for PDS migrations (see the indigo repo)

For Python, there is a pretty full-featured SDK created by Marshal, which is the only third party SDK officially endorsed by the Bluesky team.

For other languages, I have a website called sdk.blue, which lists all libraries and SDKs I know about, grouped by language. As you can see, there is something there for most major languages; I’ve built and maintain a group of Ruby gems myself. If you want to use a language that doesn’t have any libraries yet, it’s really not that hard to make one from scratch – for most things you just need an HTTP client and a JSON parser, and maybe a websocket client.
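To show how little is needed, here's a rough Python sketch of a read-only API call using only the standard library. The /xrpc/ path prefix is how ATProto API endpoints are exposed over HTTP; the helper names are made up, and the host and endpoint in the usage note below are examples not taken from this post, so treat them as assumptions.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def xrpc_url(host, method, params=None):
    """Build the URL for calling an API method (e.g. 'app.bsky.feed.getAuthorFeed')."""
    url = f"https://{host}/xrpc/{method}"
    if params:
        url += "?" + urlencode(params)
    return url


def xrpc_get(host, method, params=None):
    """Make an unauthenticated GET request and return the parsed JSON response."""
    with urlopen(xrpc_url(host, method, params), timeout=10) as response:
        return json.load(response)
```

For example, xrpc_get("public.api.bsky.app", "app.bsky.feed.getAuthorFeed", {"actor": "bsky.app", "limit": 5}) should return that account's latest posts, assuming the public AppView host is reachable. Anything that writes data additionally needs authentication, which is where an existing library starts to pay off.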

Docs:

There is quite a lot of official documentation, although it’s a bit spread out and sometimes not easy to find.

The places to look in are:

  • atproto.com – the official AT Protocol website; a bit more formal documentation about the elements of the protocol, kind of like what I did here, but with much more info and detailed specifications of each thing
  • docs.bsky.app – more practical documentation with guides and examples of specific use cases in TS & Python (roll down the sections in the sidebar); it shows examples of how to make a post, upload a video, how to connect to the firehose, how to make a custom feed, etc.
  • docs.bsky.app/blog – developer blog with updates about protocol changes
  • HTTP reference – a reference of all the API endpoints
  • something that I also find useful is to have the atproto repo checked out locally and opened in the editor, and look things up in the JSON files from the /lexicons folder

And a few other articles that might work better for you:

Community:

Someone said recently that “bsky replies are the only real documentation for ATProto”, and honestly, they’re not wrong. We have a great community of third party developers now, building their own tools, apps, libraries, services, even organizing conferences. If you’re starting out and you have any questions, just ask and someone will probably help, and some of the Bluesky team developers are also very active in Bluesky threads, answering questions and clarifying things. So a lot of such knowledge that’s not necessarily found in the official docs can be found somewhere on Bluesky.

The two places I recommend looking at are:

  • the “ATProto Touchers” Discord chat – ping me or some other developer for an invite :)
  • my ATProto feed on Bluesky, which tries to catch any ATProto development discussions – it should include posts with any mention of “ATProto” or things like “AppView” or various API names and technical terms, or you can use the #atproto or #atdev hashtag to be sure

Also, there’s a fantastic newsletter called Connected Places (formerly Fediverse Report) by Laurens Hof, who publishes two separate editions every week, about what’s happening in Bluesky/ATProto and in the Fediverse (and *a lot* of things are happening).

Ideas:

Some easy ways to start tinkering:

  • use one of the existing libraries for your favorite language and make a website or command-line tool which loads some data from the AppView or PDS: load and print timelines, calculate statistics, browse contents of PDSes and repos, etc.
  • make a bot that posts something (not spammy!)
  • make a simple custom feed service using one of the available templates
  • connect to the relay firehose and print or record some specific types of data

Tools:

And a couple of tools which will certainly be useful in development:

  • internect.info – look up an account by handle/DID and see details like assigned PDS or handle history
  • PDSls – PDS and repository browser, lets you look up repos by account DID or records by at:// URI (there are a few others, but this one is most popular)