A dive into open chat protocols

I’m between projects right now, so as is my idiom I’m going to take some random topic that has caught me on a manic swing in my little bipolar life, and dive deeper into it for a few days. One of the low-key topics in the back of my mind is that “the world needs an open chat protocol that doesn’t suck”, and something made me start thinking seriously about XMPP again for the first time in a decade.

I used XMPP myself a fair amount in its little Golden Age of the early 2010s, I ran an XMPP server or two, and it mostly worked pretty okay. And then Google embraced and extinguished it, WhatsApp got big and then got swallowed by Facebook, and there was this weird lull where no particularly good chat service existed and we all just used IRC and Mumble until Discord exploded onto the scene in the late 2010s. I hopped on Discord and never looked back until now, as Discord becomes slowly but inevitably enshittified, and suddenly people are really interested in alternatives. The Hip Alternative is Matrix, but there’s a slowly building amount of reactionary ire against it as people try to use it, so I’m wondering: Why not Zoidberg XMPP?

In 2025, the main problems with XMPP seem to be:

- It’s Not Cool
- It uses XML and XML is grody
- There’s ten million extensions and compatibility is a nightmare
- Some of its encryption stuff is not good enough for a post-Snowden world, which is made harder by the extensions being a mess

Meanwhile Matrix has had a lot of malding written about it on lobste.rs this year, and no particularly positive endorsements. But few of those complaints actually go into any particular depth besides “it doesn’t work well and it makes me angy”. That matches my own experiences with Matrix, so a lot of that malding resonates with me, but one thing I’ve learned in life is that complaining about something might be a useful signal of a problem’s existence, but actually trying to solve the problem carries 1000x more weight.
And most of these various complaints lack anything particularly concrete or actionable. So, how do Matrix and XMPP stack up against each other, in terms of technology and social organization?

Last updated August 2025. Feedback and corrections from lobste.rs discussions incorporated.

Discord

First off, our control study. To compare Matrix and XMPP, we have to look at what we’re comparing them against. And since all my friends are on Discord and I help run a public discord server with a few thousand members, my life basically revolves around Discord.

What’s the status with Discord these days? Well, the status is that the enshittification will continue until morale improves. It’s great to use still, but the people who made it have been replaced with people who are there to milk money out of it so it’s really only going to go downhill. However, Discord is pretty damn hard to replace by now. For all its flaws (and I can list many) it does some difficult things very very well:

It does chat rooms Pretty Well, with nice graphical flourishes for those who want them. More substantially it includes things that are very nice luxuries like “upload medium-sized files” and “create HTTP link that points to a particular message in a room”, and the meta-tools around chat rooms like search, moderation and permissions range from “mediocre but Good Enough” to “pretty nice”.

It does voice and video chat very easily, including useful things like streaming a user’s screen. This is something that just keeps being out of easy reach of small server operators a lot of the time, unless you want to use some evil 3rd party service like Youtube or Twitch. SIP is supposedly how you do video calls as an open source operator, and I just have never managed to actually find a compelling way to make it work.

Simply scaling up to be as large as Discord is not easy.
https://discord.com/blog/maxjourney-pushing-discords-limits-with-a-million-plus-online-users-in-a-single-server and https://discord.com/blog/how-discord-scaled-elixir-to-5-000-000-concurrent-users for example. A federated system makes this easier on some levels, but harder on others; the large load on single servers becomes replaced with a large load for communicating between servers. Scaling out to have a strong ecosystem of separate servers is not particularly easy either… partially because it’s difficult to push new users away from matrix.org and on to other large servers. Mastodon seems to have done this pretty well, but Matrix’s record is rather more mixed. It seems to be trying harder now. Discord has an advantage here because, being centralized, they don’t try to scale out at all. Their software has to interoperate with nothing except their software, and the same goes for the management of their infrastructure and business and the enforcement of their policies. Building a good client is not easy. I could go on at length about the flaws in the Discord client, but it works on every single system in my house, including the Playstation 5, and it supports pretty damn seamless voice and video chat on all of them. It’s also basically the only system that I’ve used which comes even close to doing that apart from Signal. Discord is great, frankly, but it’s still a proprietary service.
And so the company’s CEO being swapped out recently reminded a lot of people of a very firm reality: no for-profit company has managed to make a chat system pay for itself. Ever. Look at the history and you have ICQ, Yahoo Instant Messenger, AOL Instant Messenger, Skype, MSN, Google Chat, and who knows how many others. WhatsApp will die when Facebook decides it’s no longer profitable. Signal is an interesting outlier; we’ll see how it goes. But I fully expect Discord to fall to the same problems that everything else has, sooner or later. What about other open-source chat systems? Signal, while very interesting, is a complicated only-semi-open-source topic and something I’m not qualified to talk about enough, and don’t have energy to dive into right now. TLDR it’s great but its servers are not federated and that’s on purpose, and I don’t like that. Zulip is also focused on non-federated use cases; it competes with Slack, not Discord. That’s fine, just not part of this particular problem. IRC will never go away, but will never become better either and is Not Good Enough by my standards. IRCv3 is probably an interesting story but I don’t know anything about it right now; are there people out there trying to use it? Jitsi looks promising but I’ve never dug into it either; maybe that’ll be next on my list. And that’s pretty much the list of open-source chat systems, as far as I know. At least for now. Others are being created, but seem to be squarely in the “only ever used by their creators” category, so for now let’s just focus down on just XMPP and Matrix. Edit: Other interesting things to look at include: Revolt is a straight-up Discord knockoff, and actually a pretty decent one. Not federated, no clear plans to become so, but apparently there’s some bridge software for connecting it to other systems that works pretty well. SimpleX is open-source, but claims to be distributed when to me it seems like the worst combination of distributed and federated. 
Tox on the other hand is much more convincingly peer-to-peer, and seems to work quite well from ten minutes of playing around with it. It’s young though, I’d consider it maybe-alpha levels of development. Mumble seems to still be doing well, happily sitting in its niche of easy-to-run, non-federated voice-chat for gaming.

XMPP

So, what’s actually wrong with XMPP? It’s a has-been, it’s deeply unfashionable, nobody uses it anymore. Okay, but… why?

Is it the server software?

Prosody is an excellent little program suitable for home-server or small-to-medium community use. ejabberd is the big boi that has the ability to scale up to industrial levels. I’ve run both in production, and both are actively developed and have been for a decade or two. They’re fine. One might even say they’re pretty good. There’s other servers out there like Openfire that I don’t know anything about, but most of the XMPP hosts I’ve found in my rummaging around run either Prosody or ejabberd.

Is it the client software?

Gajim is… okay. Its UI is somewhat flabby compared to something like Discord or Slack, and for some reason it doesn’t enable OMEMO end-to-end encryption by default, and it’s not exactly graceful or inspiring. I can’t figure out how to set a user profile icon. But it does work. At the risk of being uncharitable, idk maybe Gnome people like it. Let’s move on.

conversations.im on Android is honestly far better. It has a voice and video chat option, but Gajim doesn’t seem to recognize it… either that or the random server I put my test account on doesn’t know how to do it. Either way trying to use it just has the call ring forever with no notification in Gajim. Sending files both ways works fine though, between different clients on different servers. Ok see, this client is actually decent. Maybe even semi-inspiring. Not perfect, but far better.

Let’s do something a little off-beat and try out profanity, which is a console-only client.
Presumably the name is a reference to the venerable BitchX, without being quite as likely to make the women in your life consider you an inveterate neckbeard. It’s… well, it’s a TUI client that isn’t irssi and thus Feels Wrong, but it is reasonably easy to operate with a bit of digging. Like Gajim it doesn’t default to using encryption, but unlike Gajim I can actually figure out how to turn it on and make it trust another user, so that’s an improvement. On the other hand I couldn’t figure out how to make it send a file, and trying to receive one gave me an impressively unhelpful message. Sigh. Oh wait, after some more searching, Dino seems to get good reviews? It’s a GTK+ desktop client and like Gajim, its UI seems to be of the “leave out all the interesting stuff” style that people seem to think is a good idea for some damn reason. But… despite missing some of the knobs I want (like being able to force it into dark mode, or very basic-seeming stuff like “view and edit your contacts list”???) voice and video chat Just Works cross-platform with conversations.im. Fucking finally. And it uses encryption by default, for once. All right Dino, I don’t like you, but you get a pass. Best Linux XMPP client I’ve found so far, by far. Is it the feature set? Not as far as I can tell. With a good client, it did everything I could do in Discord except stream my screen. This was true even when using a couple random public servers I knew nothing about. File sending, presence and typing notifications, contact management and account metadata, profile pictures and blocks… all that just heckin’ worked, when the client supported it. I could even do kinda neat unexpected things like browse public chat channels on random servers, stick random key-value metadata in my public profile, etc. XMPP has a lot of interesting little nooks and crannies. All in all, everything I wanted was there, just not necessarily usable with a particular client as far as I could tell. 
That said, there’s definitely some features that are supposedly widely-deployed but are far more half-baked than they look, especially around encryption. The easy stuff tends to Just Work, but the hard stuff miiiiiiight turn out to be even harder than you think. Is it the protocol? Aw right, time to dig into the actual tech. The core of the XMPP protocol is three RFC’s: RFC 6120 for the framing and messaging stuff, RFC 6121 for actually using that for IM, and RFC 7622 for the address format. All of them obsolete several older RFC’s, so they’ve been revised a few times, and all of them are basically stable as of 2015. That’s honestly an encouraging-ish track record. They are 200, 110 and 27 pages respectively, so they’re meaty, but not 3000-page tomes like some technical standards are, and a fair amount of that space is taken up by examples. There’s a fair number of other RFC’s for various bits and pieces. Most features in XMPP are in the form of extensions though, called XMPP Extension Protocols (XEPs). There are currently 505 of them listed by the XMPP Foundation that stewards the protocol, of which 89 are categorized as “stable” or “final”, and 50 as “active”. About 100 are in some “this is no longer relevant” state (deprecated, obsolete, etc), and 250 of them are “experimental, deferred or dormant”. And 5 are “proposed”, with such awe-inspiring names as “Fast Authentication Streamlining Tokens” or “Pre-Authenticated In-Band Registration”. Soooo there’s about 150 XEPs that matter to an implementer, 250 that matter to protocol designers, and 100 that matter to historians. Let’s get to readin’. Well, skimming. …Lightly skimming. 
At the time of writing, most deployed servers still use the Server Dialback protocol [XEP-0220] to provide weak identity verification instead of using SASL with PKIX certificates to provide strong authentication…

bwahhahahahaha so instead of using SASL, XMPP servers talking to each other do something similar to the reverse-DNS thing email servers do, where server A opens a connection to server B and then B tries to open a connection back to A again to make sure it’s actually who it says it is. Based. Has anyone willingly used SASL since 2012? Or even before that, really? I only know SASL exists because I made an email server auth with Kerberos and LDAP in 2009, and let me tell you, it wasn’t much fun. (Update: Turns out that of course the situation is naturally a bit more complicated these days, but it mostly seems to come down to “use TLS certificates” and a bunch of details around the “what” and “how”. There’s some best practices guidance, which is reasonably up to date.)

All right, with that auspicious start, the actual protocol connections work exactly how you expect them to:

example.net <--------------> im.example.com
   ^                                ^
   |                                |
   v                                v
romeo@example.net           juliet@im.example.com

There’s provisions for the two servers to keep long-lived TCP connections open and multiplex multiple unrelated messages into it, user accounts are attached to particular servers which store metadata such as contact lists for that user, multiple client devices can be logged into the same account at the same time, and the basic operation is to shuffle a block of XML from point A to point B. No surprises here, really. Domains advertise their XMPP servers via DNS SRV records (as they should), TLS can be involved, etc.

There’s a bunch of XML fru-fru, but fundamentally you have three message types: “message”, which is a push that doesn’t require a response, “iq” (info/query) for request-response, and “presence” which is a pubsub operation for broadcasting.
Then you do actual stuff by putting more XML into one of these three message types. That seems to just about cover everything you’d want, apart from maybe like… video streams or something, idk. Upon making a connection, each side tells the other about its supported features, which may or may not be required. These are things like STARTTLS support, compression algorithms, etc. There’s a big pile of info about keepalive, error-handling, multiplexing messages over multiple streams or vice versa, with copious examples. And that’s… more or less it, for the core spec. There’s plenty of details but that’s really the overview of the 200-page RFC 6120. Clients talk to servers, servers talk to each other, they trade atomic-ish XML message chunks over TCP, and you have request, request/response and pubsub message types. It’s… Pretty Good Actually? RFC 6121 defines rosters (contact lists), presence notifications, messages, and other fairly-bare-bones chat stuff, while the RFC describing address formats is just as boring as you’d want it to be. The only really weird part of the whole setup is that there’s a “resource” identifier attached to each client session, so you might have [email protected]/desktop and [email protected]/phone and they’re technically different addresses that can have messages sent to them individually. I’m not really sure why you’d bother doing that these days? If I send a message I want [email protected] to see my message regardless of whether they look at their phone or their desktop; I could see being able to target specific devices for some use cases, but by default I’d expect messages to just go to all of them. In fact, there’s an XEP that discusses doing exactly that. Apart from that wrinkle, and the pretense that SASL matters, and the whole various ceremonial sacrifices to the dark outer gods of XML, the whole thing seems to have aged pretty decently. 
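To make the “three message types plus a resource on the address” summary a bit more concrete, here’s a minimal Python sketch. It only builds and inspects XML; it doesn’t open a stream, negotiate features, or talk to a server. The helper names (parse_jid, message, iq_get, presence) and the alice@example.com address are made up for illustration, and the address splitting skips all the validation and normalization the address RFC actually requires.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional


class Jid(NamedTuple):
    local: Optional[str]     # the "user" part, absent for bare server addresses
    domain: str
    resource: Optional[str]  # identifies one client session, e.g. "phone"


def parse_jid(text: str) -> Jid:
    """Split 'local@domain/resource' into its parts (no validation at all)."""
    bare, _, resource = text.partition("/")
    local, at, domain = bare.rpartition("@")
    return Jid(local if at else None, domain, resource or None)


def message(to: str, body: str) -> ET.Element:
    """Fire-and-forget push: no response is expected."""
    stanza = ET.Element("message", {"to": to, "type": "chat"})
    ET.SubElement(stanza, "body").text = body
    return stanza


def iq_get(to: str, request_id: str, payload_ns: str) -> ET.Element:
    """Request half of a request/response pair; the reply reuses the same id."""
    stanza = ET.Element("iq", {"to": to, "type": "get", "id": request_id})
    ET.SubElement(stanza, "query", {"xmlns": payload_ns})
    return stanza


def presence(show: str) -> ET.Element:
    """Broadcast of availability to whoever is subscribed."""
    stanza = ET.Element("presence")
    ET.SubElement(stanza, "show").text = show
    return stanza


if __name__ == "__main__":
    print(parse_jid("alice@example.com/phone"))
    for stanza in (
        message("alice@example.com", "hi"),
        iq_get("example.com", "r1", "jabber:iq:roster"),
        presence("away"),
    ):
        print(ET.tostring(stanza, encoding="unicode"))
```

Everything an extension adds is, roughly, more child elements under one of those three wrappers, which is why the core spec can stay this small.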
Soooooo unfortunately it’s time to look at the XEPs, ’cause this is where 90% of the actual useful protocol is defined. Let’s save our sanity from going through all 150 “XEPs that matter” identified above by just considering the ones conversations.im implements, of which it seems there are 21. …that list does not appear exhaustive, I find a different list elsewhere. But profanity appears to support 31 XEPs, so 20-30 is something in the ballpark of the number you need to make a reasonably-full-featured client. These can be as simple as a /me command or as sophisticated as p2p media streaming.

Actually, I don’t think I’m going to dig deep into XEPs after all. I’ve skimmed some and they seem to be on the whole sensible and easy to read; rather similar in style to IETF RFC’s. Things like multi-user chats and presence notifications seem pretty sensible, as far as I can tell. And there are also compliance suites, which are very good to have. I feel like the trend towards including tests in standards is probably one of the most important pieces of progress in defining protocols since the internet started. The only real problem with XEPs is there’s just so damn many of them, and it’s so much work to figure out which ones you actually care about.

We’ve answered the question I wanted to ask: Does XMPP have good technical foundations even if they’re somewhat old-fashioned now, like IMAP and bittorrent? Or is it a crumbly pile of crocks like IRC and email where you can’t add anything useful without breaking everything? Seems to lean more towards “good foundations”, happily. Let’s move on.

Is it the community?

Searching for “xmpp servers” gets me lots of software and few hosts. There’s a list of hosts at https://providers.xmpp.net/, which seems to have been discontinued. xmpp.org refers to several external lists, including that one. https://jabber.at/ has a list that only a nerd could love, but at least there’s data there. When was it last updated? In… 2019. Great.
Is any of that data still valid? Ssssssome of it is. Hmmm. Sure okay let’s look at the top things on those lists… https://xmpp.404.city/ greets me with a gigantic fucking skeezy “advertise on your favorite server” banner. Swell. Certainly no equivalent to https://joinmastodon.org/servers, I gotta say. Some more digging does find me a functioning server with open registrations, charmingly run by five nerds somewhere in India. Good enough for playing with test accounts, I’ll take it. Bless you, five nerds somewhere in India. The one large, official-looking, actually maintained server that I can find with an active community around it is the one run by the conversations.im XMPP client, which makes sense. There’s probably others, but if you just want to make an account somewhere, that’s probably the place to start. (Edit: https://joinjabber.org/docs/servers/ is fairly small but appears active and well-curated, so start there.) Additionally each server and client has a heckin’ laundry list of features they support and no sane person is going to comb through one at a time and compare unless they have to. Gives me flashbacks of Scheme SRFI’s. There seem to be only two real server implementations used by the general public though: ejabberd and Prosody. Both of those support far more XEPs than the clients do. So while I had Some Issues with the client software, the server software should mostly just do everything you want it to without much fuss. Again, I’ve run both ejabberd and Prosody up until 2017 or so, and both were pretty great. How active is the standards body? Without doing anything as sensible as reading the mailing list, we can just look at a few of the more recent and important-sounding XEPs, mostly at random. OMEMO encryption, which seems to be the encryption that you actually want to use, has been an “experimental” XEP for about 10 years. That’s… not a great look. 
To pick another XEP basically at random, push notifications are another feature that I’d expect to Just Kinda Be There, was started in 2015, and as of 2020 is marked “deferred”, which basically means “nobody has touched or talked about this in at least a year”. Soooo the standards body isn’t dead dead, but it’s certainly spending a lot of time in bed and not breathing too well. Oh, here’s an interesting detail: I haven’t counted, but I have to observe that almost every one of the first 250 XEPs I’ve looked at has Peter Saint-Andre’s name on it. And he retired in 2022. There’s probably a great story there; anyone wanna write a biography on the guy? I haven’t yet noticed any other key names, but I haven’t looked too close either. Oh, it looks like there’s a Modern XMPP project/group that is working on gathering and refactoring the various actually-important XEPs into one place. This looks promising! They’ve done a fair amount of work! …oh, there’s only been a single commit to their github repo since 2023 though. RIP. Matrix All right, we have our control group and one data point, on to the other data point. Again, the whole purpose for this article is that there’s been a small but noticeable swell of dissatisfaction with Matrix lately: https://lobste.rs/s/jtd0b1/giving_up_on_element_matrix_org https://lobste.rs/s/vqsran/matrix_is_cooked https://lobste.rs/s/x6qsw2/how_we_discovered_recovered_from – this one appears to be pure act-of-god misfortune, but it sure doesn’t feel good. This seems to be a mixture of cry-more-noob bitching, suspicious astroturfing, and legitimate frustration. I haven’t actually used it in anger in a couple years, but I can confirm that using Matrix has never really sparked joy in me. 
Account management is difficult, the choices of clients are ever-shifting and never good, getting multiple clients able to talk to the same account is an exercise in broken encryption and bullshit-broken cross-device-verification, and the matrix.org server is slow and laggy, especially with the official web client it offers. And it hasn’t gotten better in the last few years, as far as I’ve seen. That said, I can log into it and use it, and it’s been perfectly fine using the official client on my phone to talk to one chatroom of half a dozen people, even if we only talk once every few months. So obviously it’s not a complete lost cause. So let’s ask, what’s wrong with Matrix? Is it the server software? Maybe? The official server is Synapse, which seems to also be the only production-ready one, but there’s a handful of others listed in “beta” and at least one or two of them are in production use by their community. Synapse is a notorious resource-hog, and accounts conflict about whether it’s a fundamental problem or whether it’s due to being written in Python. Am I really interested enough in finding out that I’m going to take the time to actually install and run this stuff? I… probably should. Not gonna though. Is it the client software? Kkkkkinda? Element has been grudgingly usable for me, on mobile and on web. Other people have bigger complaints and I’ve only used it lightly. It’s going through some development bumps with an attempted rewrite though, and no other client seems to really stand up to its standards. The situation has been described as “two [official] apps, one unsupported and one incomplete”, and that’s a pretty ugly situation to be in. So far, the only issue I’ve had with Element is that sometimes screwy shit happens, like taking literally four minutes to create a chat room, or mysteriously being unable to see the messages in a different chatroom, or joining a chat and taking another 3 minutes before you can actually see the history for it. 
These aren’t rare occurrences or freaky outliers, they’re the first things that happen when I start the program and make an account somewhere. Only sometimes though! Once you start actually talking with stuff, it tends to kinda Just Work… for those servers or those users or those chats. For a while. It’s certainly not flawless, but I’m willing to consider it a decent attempt. I could just hit the “voice chat” or “video chat” buttons and have my phone and desktop start talking to each other, which is more than I can say for most of the XMPP clients I used. I’m not sure that this is Element’s fault though. See below. Is it the feature set? The feature set, frankly, is fine. As a user, it hits all the same buttons that are important in Discord. There’s probably differences, and I have no idea what the experience as a moderator is like. But between my PC and cell phone, video chat worked pretty much flawlessly without trying, which is more than can be said for XMPP. But again, see below… Is it the protocol? Time to dig into this. First off, how is the protocol defined and developed? Matrix has a single spec, and makes changes to it via Matrix Spec Proposals (or maybe Matrix Spec Changes, MSC’s). Each of those seem to be essentially a revision to the spec document, which are reviewed and then either accepted or rejected. They’re less like IETF RFC’s that build atop each other like legos, and more like patches that alter the standard incrementally. Let’s look at a random MSC to see what they look like… like uh, I dunno, MSC3916, “authentication for media”, I have no idea how I landed on that one but it was accepted around summer 2024 so it’s not quite brand-new but is fairly recent. It starts with… uh. Currently, access to media in Matrix has a number of problems including the following: The only protection for media is the obscurity of the URL, and URLs are easily leaked (eg accidental sharing, access logs). 
Anybody (including non-matrix users) can cause a homeserver to copy media into its local store. When a media event is redacted, the media it used remains visible to all. There is currently no way to delete media. If a user requests GDPR erasure, their media remains visible to all. When all users leave a room, their media is not deleted from the server. I’m biased. Okay. I know that. I’m saying that right here and now. I’m predisposed to feel like Matrix is sub-par technology because my experience with it was annoying and I read lots of people who have had annoying experiences with it. But allow me to indulge for a brief moment in saying fucking what? Anyone can decide to denial-of-service a Matrix server by filling its storage up with random crap, without even registering for it? And when the crap is deleted by a moderator or something then it’s not actually deleted? And anyone can grab that data from the server if they know where to look? How is this hard?! Contrast the XMPP implementation of the idea, which certainly has holes in it, but tries to offer some security! File retention, deletion and deciding who to show it to are explicitly out of scope for that standard; most of it is offloaded onto an external HTTP server doing the media storage, but there is a section in the spec about expiration and timeouts, CORS, and sanitizing content. A logged-in user asks a server “can I upload something?” and the server says “sure send a HTTP PUT request to this URL” and includes headers to send along with the HTTP PUT request for the client to include. You know, stuff like an auth cookie. I’m very much not an expert on web security stuff, but… this seems like all you need for the client and server to be able to talk to an external HTTP server reasonably? I think? It certainly solves item #2 from that MSC that has been an open issue since 2017, and item #1 is Actually Kinda Hard and the others are explicitly out of scope. 
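Here’s a rough Python sketch of that “ask the chat server for a slot, then PUT to the URL with the headers it gave you” flow, just to show how little machinery it needs. Everything in it is an assumption for illustration: the names (request_slot, upload_server_accepts, SERVER_SECRET, UPLOAD_BASE), the size limit, and the choice of an HMAC over the method and path as the auth header. The XEP itself doesn’t dictate how the token is made, and a real deployment would also sanitize filenames and expire slots.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass

# Hypothetical values: a key shared by the chat server and the upload host,
# the upload host's base URL, and a per-file size limit.
SERVER_SECRET = secrets.token_bytes(32)
UPLOAD_BASE = "https://upload.example.com"
MAX_SIZE = 10 * 1024 * 1024


@dataclass(frozen=True)
class Slot:
    put_url: str   # where the client uploads
    get_url: str   # what gets pasted into the chat
    headers: dict  # headers the client must send along with the PUT


def sign(method: str, path: str) -> str:
    """Token bound to one HTTP method and one path."""
    msg = f"{method} {path}".encode()
    return hmac.new(SERVER_SECRET, msg, hashlib.sha256).hexdigest()


def request_slot(user_is_logged_in: bool, filename: str, size: int) -> Slot:
    """Chat-server side: only authenticated users get a slot at all."""
    if not user_is_logged_in:
        raise PermissionError("only authenticated users get upload slots")
    if size > MAX_SIZE:
        raise ValueError("file too large")
    path = f"/{secrets.token_urlsafe(16)}/{filename}"
    url = UPLOAD_BASE + path
    return Slot(url, url, {"Authorization": sign("PUT", path)})


def upload_server_accepts(method: str, path: str, headers: dict) -> bool:
    """Upload-host side: recompute the token and compare in constant time."""
    expected = sign(method, path)
    return hmac.compare_digest(headers.get("Authorization", ""), expected)


if __name__ == "__main__":
    slot = request_slot(True, "cat.png", 1234)
    path = slot.put_url[len(UPLOAD_BASE):]
    print(upload_server_accepts("PUT", path, slot.headers))  # slot holder
    print(upload_server_accepts("PUT", path, {}))            # random stranger
```

Because the token covers the method as well as the path, the same sign() call could hand out separate tokens for other HTTP methods on the same resource without changing the shape of the exchange.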
But you could extend the same mechanism in XMPP to support HTTP DELETE requests just fine, and GET requests too tbh. Then your XMPP account and room membership becomes your authorization for being able to access that HTTP resource, via a token the server gives people in that room. When someone links an image or something in a XMPP room the server can say to its clients “oh btw, to get the data, include these headers in the GET request”. Boom, done, nobody outside the chat room can access the image. Or even certain people inside the room; the server can decide to give different people different tokens if it wants. Surprised that’s not already in the XEP, to be honest, ’cause it seems obvious.

Fine, fine, forget it. Rant over. Let’s move on and look at the actual spec. It starts with “The intention is to provide an open decentralised pubsub layer for the internet for securely persisting and publishing/subscribing JSON objects.” Sounds a lot like XMPP really. The main difference I see in intent with Matrix is persisting data as well as shuffling it. There’s also a few goals that are notably more ambitious than XMPP:

- Creation and management of fully distributed chat rooms with no single points of control or failure
- Eventually-consistent cryptographically secure synchronisation of room state across a global open network of federated servers and services
- Use of 3rd Party IDs (3PIDs) such as email addresses, phone numbers, Facebook accounts to authenticate, identify and discover users on Matrix.
- Trusted federation of identity servers for: publishing user public keys for PKI, mapping of 3PIDs to Matrix IDs

Notably, the support for history is a built-in assumption. XMPP clients and servers send each other events, Matrix clients and servers synchronize history with each other. While XMPP ties each chat room to a particular server, Matrix distributes chat rooms across multiple servers, as mentioned.
Matrix servers are explicitly a system for achieving eventual consistency of a shared history. XMPP servers just shotgun messages at each other and have a bit of support for retries and “oops this server is offline, try again later”. That means that if [email protected] and [email protected] are both in a chatroom, then that chatroom isn’t tied to example.com or example.net, but both servers have full copies of it in the form of a Merkle-DAG-ish chain of events that they synchronize between each other. There is no authoritative source of messages, just this wobbly distributed consensus that floats between multiple servers and clients. Clients update the state and tell their server about it, then the servers tell each other about it, and the servers tell their clients about it, and ’cause they all have the same data and the same consistency rules they all end up with the same result. I don’t think it’s a true Merkle DAG ’cause messages aren’t necessarily immutable, but it still all feels very Bitcoin-y. Which isn’t a surprise in retrospect since they started and grew up around the same time, starting in late 2014 and into the mid 2010s.

Okay, this is a big deal. Sorry if I’m going to hyperfixate on this a bit, but it seems to be the key point about Matrix. And I could be entirely off base here, but this sure explains why Matrix has a reputation for being a resource hog. When you send a message to a chat room on a Matrix server it does essentially a git merge to add it to the end of its eventually-consistent chain of state. Then it forwards the state changes to every other server involved with the chat which do the same work over again, then they send the messages to all the other clients involved and they do all the same work over again. (Edit: Not entirely true, apparently the clients just trust the servers, but still.)
Contrast with XMPP, which may lose messages in transit and need to resend, but each domain acts as the authoritative source of truth for each chat room on it. So clients just say “hey gimme everything between T-1 and T-2” and the server does it and the client says “cool thanks” with no work to do beyond that. Hence why you can run XMPP clients on an ESP32.

Moving on… Unlike XMPP’s raw TCP sockets with server dialback and port numbers and DNS SRV records and all the nonsense that involves, Matrix just shuffles JSON over HTTPS connections. It doesn’t seem to cleave to REST or JSONRPC or any other attempt at formalism, it just nods in a businesslike manner and straight-up uses HTTP as an RPC protocol. Has some pros and cons vs the more raw approach, but both are valid.

Let’s get more into the actual messaging…

“Typically an event has a single parent: the most recent message in the room at the point it was sent. However, homeservers may legitimately race with each other when sending messages, resulting in a single event having multiple successors. The next event added to the graph thus will have multiple parents. Every event graph has a single root event with no parent.”

Lemme tell you a secret about global event ordering in chat rooms: nobody cares. These aren’t bank transactions. If two users get the same two messages in opposite order from each other then it’s fine, even in a formal setting like a university talk or a work chatroom, and it can be fixed by the client as soon as the authoritative server decides on what ordering is correct. Messages that rely on ordering, such as replies, can just include references to previous messages piecemeal. You don’t heckin’ need a single global consistency chain that can be reproduced exactly by every single system involved even if it’s on Mars, just so that lesbian catgirls can say “mreow uwu” to each other on the internet.
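For reference, the single-parent / multiple-parent thing from the spec quote can be sketched in a few lines. This is a toy under my own assumptions: the `RoomGraph` class, its method names and the plain-string event IDs are all made up, and real homeservers do a lot more bookkeeping than this.

```python
class RoomGraph:
    """Toy event graph: each event records which events it came after."""

    def __init__(self):
        self.parents = {}   # event_id -> tuple of parent event_ids
        self.tips = set()   # events that nothing points at yet

    def add(self, event_id, parents=None):
        # A locally sent event takes every current tip as a parent.
        # Events from other servers arrive with their parents already set.
        if parents is None:
            parents = tuple(sorted(self.tips))
        self.parents[event_id] = tuple(parents)
        self.tips -= set(parents)
        self.tips.add(event_id)
        return self.parents[event_id]


g = RoomGraph()
g.add("root")                        # the single root event, no parent
g.add("m1")                          # normal case: one parent
g.add("m2a", parents=("m1",))        # two servers race...
g.add("m2b", parents=("m1",))        # ...so m1 gets two successors
merged = g.add("m3")                 # the next event has both as parents
print(merged, g.tips)
```

The fork only lasts until somebody sends the next message, at which point the graph collapses back to a single tip.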
This pattern continues to every other type of room feature: all state and messages are synchronized between every server involved in the room. While rooms are named with a domain name, it’s just a name, the room doesn’t “live” on that server. This is very cool, but it has some of the problems that Bitcoin and other distributed-ledger systems have: it means that fuckin every single server and client has a copy of the entire(ish?) state and has to merge new things into it according to some consistency algorithm. In fact that also makes security way harder ’cause how the hell are you gonna anonymize that? How can you delete the server logs? You can’t, everyone involved has a copy of everything that happened and it fundamentally can’t have messages snipped out of it; no wonder Matrix’s end-to-end encryption has always been a source of grief. The process of deleting a message doesn’t delete it either ’cause that would break the history chain, it’s “redacted” from the message log leaving an empty tombstone with only metadata saying “there was a message here”. That’s fine in theory, but I have to ask, what is the purpose of this? What do we gain from it???

Maybe I’m short-selling the redundancy here. If a Matrix server goes offline, that server’s users go offline but no chat messages from that server are lost. The rest of the members of a chat go along in their lives as if nothing ever happened. That’s pretty cool. But you know, we’re pretty good at making resilient clusters of servers by now. I’ve been on liberachat basically constantly since 2022 and it has had server problems precisely zero times in that span. If a server goes down for good it will be more likely because someone stopped maintaining it, not because of technical problems, and as far as I can tell(?) there’s nothing stopping a room’s owner from left_pad’ing the room and making it vanish out from under you anyway. (Edit: this does happen.)
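The tombstone business is easier to see in code. Here’s a minimal sketch, assuming a made-up event format where the ID is a hash over metadata only (including a hash of the body). That is my simplification, not the actual Matrix event format or hashing rules; it just shows why you can blank the body without breaking the links that later events hold.

```python
import hashlib
import json


def digest(obj):
    """Stable SHA-256 of any JSON-serializable value."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


def make_event(sender, parents, body):
    # The ID covers the metadata, which includes a hash of the body,
    # but not the body text itself.
    meta = {"sender": sender, "parents": parents, "body_hash": digest(body)}
    return {"id": digest(meta), "meta": meta, "body": body}


def redact(event):
    # Drop the body, keep the metadata the ID was computed from.
    return {"id": event["id"], "meta": event["meta"], "body": None}


def id_is_valid(event):
    return event["id"] == digest(event["meta"])


first = make_event("alice", [], "hello")
spam = make_event("mallory", [first["id"]], "buy my coins")
reply = make_event("bob", [spam["id"]], "ugh")

tombstone = redact(spam)
print(tombstone["body"], id_is_valid(tombstone))
print(reply["meta"]["parents"] == [tombstone["id"]])
```

The body is gone but the sender, the parent links and the fact that a message existed all stick around forever, on every server that ever saw it.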
Having your decentralized chat room floating in the ether of consensus between multiple servers is cool, but has costs as well. Okay moving on again… Users in Matrix are identified via their Matrix user ID. However, existing 3rd party ID namespaces can also be used in order to identify Matrix users. And then it talks about a “globally federated cluster of trusted identity servers”. This feels like a very… corporate-y feature. Why do you want this? Matrix already supports OAuth. Why does anyone want 3rd party identity providers in this? If you want an identity provider and you’re a company or government, you have an ActiveDirectory server that you can have your Matrix server pull accounts from. If you’re an individual, you have an email account and a cell phone. Both of these approaches provide a way for a particular Matrix server implementation to call ’em up and say “hey are you $PERSON?” without the protocol needing to be involved in it. What problem are we solving here? Idk, mayyyybe there is one? It’s so baffling that I want to dig into the issue tracker to find out why it exists. Oh, here’s an actually interesting detail: XMPP conversations start with a feature negotiation handshake where each side pushes feature lists to the other, while Matrix lets servers instead have a “well known URL” example.com/.well-known/... containing metadata about the server. Presumably this is so that you can have Matrix server endpoints starting in places other than the example.com/ URL. But having it be a pull rather than push model is kinda interesting. All right, let’s read about the actual messaging. While XMPP shoves multiple XML stanzas over a TCP stream, Matrix does basically the same thing via HTTP requests to various endpoints. This is theoretically less efficient ’cause each one involves a new HTTP request to the server, but in practice HTTP keepalive probably makes this a non-issue. 
Each HTTP request is probably gonna involve a fair bit of redundant information in headers again, but if the XMPP TCP stream goes poof then it needs to re-establish it and do negotiation and stuff all over again before it can start sending new messages. Stateful vs stateless; it’s probably a bit of a wash in practice, in terms of efficiency.

Ok, not to beat a dead horse, but it’s interesting how Matrix thinks about the world in a fundamentally different way than XMPP. I can’t stop thinking about this. Again, XMPP is all about events: push, request/reply, pubsub, and then state is built up out of sequences of events. Matrix is all about shared, eventually-consistent state, and events are just things that move state from one machine to another. It’s the difference between an edge-triggered vs. level-triggered view of the world, I suppose, and those are always context-dependent in terms of which is better. In this case I think that edge-triggered XMPP is the superior model. Ironically while it’s kinda less robust on the technical level (you end up needing retries, explicit state synchronization, etc), it ends up being a far more useful fit for the problem domain of chat rooms where frankly no single event is really that critical, and retries/redundant messages are cheap and automatic. Updating the latest chunk of a client’s state from the server’s really isn’t that difficult. It trades being less consistent in exchange for better scaling and smoother degradation of service. If you plop some excessive number of people in an XMPP chatroom and get them all sending messages, it’ll bog down that one server and messages will start getting lost on their way to clients or other servers, but some of them will probably make it through one way or another.
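As a rough illustration of why retries and redundant messages are cheap in the event-y model: if every message carries an ID, the receiver can just ignore anything it has already seen. This is a generic sketch with made-up names (`ChatLog`, `deliver`), not how any particular XMPP client does it.

```python
class ChatLog:
    """Client-side log that tolerates the same event arriving twice."""

    def __init__(self):
        self.seen = set()
        self.messages = []

    def deliver(self, msg_id, text):
        """Apply one incoming event; return False if it was a duplicate."""
        if msg_id in self.seen:
            return False
        self.seen.add(msg_id)
        self.messages.append(text)
        return True


log = ChatLog()
# "m1" shows up twice, e.g. because the sender retried after a timeout.
incoming = [("m1", "hi"), ("m2", "anyone here?"), ("m1", "hi"), ("m3", "yo")]
results = [log.deliver(mid, text) for mid, text in incoming]
print(results)
print(log.messages)
```

A set lookup per message is about as much consistency machinery as this model needs, which is a big part of why it degrades so gracefully.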
If you do the same thing in a Matrix server then I expect it will bog down the entire server, and both that server and any others involved will grind to a screeching halt as they try to update their shared global state with each other while users change the state faster than the consensus can propagate. Oh. Yeah, that’s the problem right there. A Matrix chatroom is shared global state, for a very literal meaning of the word “global”. And distributed solutions to shared global state are difficult and slow compared to non-distributed ones. As a general rule, if you want to update shared global state quickly, you centralize it: into a Postgres database, into an Erlang ETS object, into a ZFS filesystem. Put all your eggs into one basket, and make it a really good basket. It won’t scale you to Google levels, but it can get you pretty damn close, and that’s honestly all anyone reading this is realistically going to ever need. That’s what Discord does as far as I can tell, and they’re making it work well. Distributed networks work best for things where updates are uncommon, like DNS or Bittorrent. I should do some digging and see if I can find out more about how the Discord protocol views the world. From this article, it sounds like it’s the event-based model: “Users … communicate with remote Erlang nodes that contain guild (Discord server) processes. When anything is published in a guild, it is fanned out to every session connected to it.” They don’t say much about their network protocol, but the way they talk about it sounds event-y rather than state-y. This fan-out pattern appears again multiple times in other blog posts they make. So it sounds like they are a lot less concerned with how messages relate to each other than with how they are moved around. Anyone know more? Hmmm, as long as I’m speculating, I wonder how Matrix performs in the presence of a netsplit, where two servers stop being able to talk to each other and then re-connect sometime later? 
You would have two servers basically “fork” a chat room, each building a long and complicated conversation DAG descended from the last shared message, and then they try to re-merge with each other. XMPP’s event-based view of the world falls down hard here: you basically get corrupted state while the netsplit is in effect, and who knows if it ever comes back to anything sensible. In the shared-state-y model of the world these merge deterministically, if not necessarily cleanly or helpfully, and you can trace each independent fork separately until they are merged. That’s rather nice… but it still begs the question of “who cares?”, because the state during the netsplit is still inconsistent and not terribly useful. The state of a conversation between humans after a netsplit doesn’t actually much depend on the state during it, it’s mostly some variation of “wait where did $PERSON go?” “is this a server problem?” “am I here?” “lol this again” “test” “oh I think it’s back now” “can you hear me?” as people run their own eventual-consistency algorithms in their heads. So it seems fine to just have one server tell the other “here’s all the stuff you missed” and the other server saying “cool as far as my clients know all this stuff happened in one go at 17:23:42 UTC when we reconnected”. That’s still a valid eventual consistency algorithm, and a much simpler one.

How heavyweight is Matrix’s resolution algorithm? I read through it, but don’t have a good gut feeling, besides knowing for sure it’s more work than an XMPP server’s most braindead option of “append message to ring buffer”. (Edit: Actual information about Matrix and netsplits can be found here.)

This consistency model sounds like it might actually be fun to abuse. Fire up a few Matrix home servers and have them talk to each other and see how much one homeserver can make every single other one lag out trying to do their distributed consistency thing. I might have to try it.
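For comparison, the braindead “here’s all the stuff you missed” approach is about this much code. It’s a sketch under my own assumptions: `RoomBuffer`, `Peer` and the per-room sequence numbers are invented for illustration and aren’t taken from any XMPP spec.

```python
from collections import deque


class RoomBuffer:
    """Authoritative server side: a bounded ring buffer of (seq, text)."""

    def __init__(self, capacity=1000):
        self.buf = deque(maxlen=capacity)   # old entries fall off the front
        self.next_seq = 1

    def append(self, text):
        self.buf.append((self.next_seq, text))
        self.next_seq += 1

    def since(self, last_seen):
        return [(seq, text) for seq, text in self.buf if seq > last_seen]


class Peer:
    """The other server: remembers only the last sequence number it saw."""

    def __init__(self):
        self.last_seen = 0
        self.log = []

    def catch_up(self, room, now):
        missed = room.since(self.last_seen)
        for seq, text in missed:
            self.log.append((now, text))    # stamp with the reconnect time
            self.last_seen = seq
        return len(missed)


room, peer = RoomBuffer(), Peer()
room.append("before the split")
peer.catch_up(room, "17:00:00")
room.append("wait where did everyone go?")   # netsplit: peer misses these
room.append("test")
caught = peer.catch_up(room, "17:23:42")
print(caught, peer.log)
```

Everything missed during the split shows up stamped with the reconnect time, and anything older than the buffer’s capacity is simply gone, which is exactly the kind of lossy-but-simple tradeoff I’m arguing is fine for chat.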
Maybe the real conclusion is that everything about how XMPP behaves in degraded situations is pretty simple and straightforward, if bad, while in Matrix the degraded behaviors are complicated, nuanced questions that nobody knows good answers to yet? Or at least I don’t. Probably time to stop talking out of my ass. So, yeah. I could be 100% wrong about all this nonsense. I’m not an expert in distributed stuff, at best I am a dabbler. There’s lots of other fuzzy icky parts of the Matrix spec that make me squint suspiciously, but honestly there could be parts of the XMPP specs that are just as bad which I glossed over ’cause they weren’t called out as clear gotchas in the documents. So, I’m gonna stop my expedition here. But it’s sure been a wild ride. Is it the community? Rant redacted. Suffice to say I’m not particularly impressed either with Matrix’s critics saying “omg it sucks”, or its supporters saying “Big Brother assures you everything is fine”. Hence why I’ve written this. It’s easy to bitch about something and get popular on HackerNews, and almost as easy to play the unjustly persecuted and carry on without fixing anything. Time will tell. But either way, unless you’re going to put in actual work to provide constructive critiques and solutions to problems, you’re basically just part of the problem. One criticism of Matrix is the involvement of a for-profit company in its development. I’m sympathetic towards both sides of this argument. The one fact of life is that you gotta eat. But we’ve sure had a lot of for-profit companies pull some nasty bait-and-switch moves the last few years, even “nice” ones. And nonprofits nominally in charge of for-profit companies are no better. It’s… probably not a hopeless situation, honestly, but people are probably right to be suspicious. Trust arrives on foot and leaves on horseback. Part of the problem with Matrix is just that these problems are legitimately difficult and they’re having growing pains. 
And honestly, that’s fine. Scaling the technology gets nontrivial. There are semi-plausible reports of single XMPP servers supporting millions of users, but none I could find with 50,000 users in the same chatroom all gushing at once about the World Cup. Handling moderation, abuse, and outright crime is a human social problem and thus is difficult to automate well. Federation will always have more of a problem with spam and bad actors than a well-run and trustworthy centralized system, and federation gets rid of the one giant failure-mode of “Discord goes out of business” in exchange for lots of little failure modes as individual servers go down, come back, individually enshittify, get replaced, or don’t get replaced and just die forever. Federated systems are human systems, and human systems are complicated and messy. The distributed nature of Matrix rooms raises other questions as well though: moderation and control. Nobody having control or authority of a shared space like a chatroom sounds good right up until you need someone to have control and authority to clamp down on bad actors. How does that work in Matrix? Rooms have permissions where the creator of a room has absolute control and can pass it, or less powerful versions of it, to other people. So it’s basically a capability, that makes sense. But with no server authority to act as a final backstop, how does that work in cases of abuse? What happens if the room owner vanishes? What happens if the room owner’s homeserver vanishes and takes their admin permissions with them? What happens if the room owner’s account or homeserver gets compromised? Like Bitcoin transactions, as far as I can tell there’s no human you can talk to for recovery. There’s ways around this, if you try hard enough, but oh look now you’ve introduced single points of failure to your distributed failure-proof protocol. 
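The “creator has absolute control and can pass it, or less powerful versions of it, to other people” rule is simple enough to sketch, along with the failure mode I’m worried about. The class, the numeric levels and the exact rules here are assumptions for illustration, not the real Matrix permission model.

```python
class Room:
    """Toy capability-style permissions: you can hand out what you hold."""

    def __init__(self, creator):
        self.levels = {creator: 100}   # made-up number for "absolute control"

    def level(self, user):
        return self.levels.get(user, 0)

    def grant(self, granter, target, new_level):
        # You can never give out more power than you have yourself...
        if new_level > self.level(granter):
            raise PermissionError(f"{granter} cannot grant level {new_level}")
        # ...and you can't touch anyone at or above your own level.
        if self.level(target) >= self.level(granter):
            raise PermissionError(f"{granter} cannot change {target}")
        self.levels[target] = new_level


room = Room("alice")
room.grant("alice", "bob", 50)         # alice delegates a weaker capability

try:
    room.grant("bob", "carol", 100)    # bob tries to mint a new owner
    promoted = True
except PermissionError:
    promoted = False
print(promoted, room.levels)
```

If alice’s account or homeserver disappears, nobody left in the room can ever produce another level-100 user, because the rules are the only authority there is.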
If a system is absolutely ironclad and backed only by computers deciding what reality looks like, that means it’s resilient to some modes of failure but very vulnerable to others. Maybe I’m missing something here? (Edit: Apparently some abuse and moderation functions are handled by shared blacklists, bots, and so on. But that just turns our distributed system into a federated or centralized one again, proving my point!)

Relatedly, apparently Matrix has a spam problem? It’s certainly not alone in that, speaking as someone who bans about a dozen spambots a month on a Discord server, but it does demonstrate that “decentralized chat” is not a silver bullet against this. This seems like it might be a relatively new problem, so I’m going to cut the Matrix devs and admins some slack and say “improving this situation is WIP”. (But my brain keeps reminding me that anyone can upload images to any Matrix server, bypassing authentication. And that bug has been open since 2017. JFC. Maybe I’m not gonna cut them some slack after all. (Edit: it’s apparently been fixed now even if the issue is still open, “the issue was that you could cause any server to download media from any other server, persisting it, and hotlink that media. Now media requires a login to download (and trigger that flow).”))

Conclusions

Whew. That was a rollercoaster, but it took us some interesting and maybe-useful places. To wrap up, the big difference between XMPP and Matrix is that XMPP gives you ephemeral messages hosted on specific servers, and Matrix gives you distributed history shared between everything. In XMPP if the server gives you a message then it’s your problem now and you can do whatever you feel like with it, while Matrix makes each message the problem of every single server and user involved in its history. Both XMPP and Matrix have culture/management problems, but XMPP’s are probably bigger.
It’s easier for a maybe-kinda-toxic culture with some money to keep going than it is to resurrect a mostly-dead open standards body and get people interested in it again. Oh, it’s also worth mentioning: as far as I can tell, the Official Line of both Matrix and XMPP is that they are not competing with each other. And frankly, that’s probably the best way to view it. You can have software bridge between them freely, and it’s probably not even that hard. That’s the whole point of this: these systems are open. So unlike Discord or WhatsApp or whatever bullshit the brain-parasites are peddling today, you should be able to use Matrix and XMPP together. Okay. So, zooming out again for a bit here’s the general trends for chat systems I see for the next 10-20 years: Again, nobody is going to make money on chat networks long-term – I could be wrong here, but nobody’s done it in the last 30 years. Discord, Telegram, and Signal are just the latest turn of the wheel. The people that support chat networks long-term (cell phone networks with SMS/MMS, Google/Apple with RCS, Facebook with WhatsApp) do so as a loss-leader to sell some other product, and thus really really want to keep their vendor lock-in. Good voice and video chat is a killer feature. Why is this still hard to do for a single person running a VPS? I should check out Jitsi more. Writing good clients is wayyyyyy harder than writing good servers. We have Good Enough authentication systems, you don’t have to reinvent them just for a chat protocol. Tie into existing ones and worry more about authorization. Good c2s and s2s encryption is easy by now but people still suck at it for some reason Good end-to-end encryption is still hard, and bad end-to-end encryption is both of dubious utility and also very painful for users End-to-end encryption is not actually a killer feature for 95% of people but is critical for those 5%. Potentially life-threatening. 
And absolutely nobody does end-to-end encryption well except Signal Federated servers are easy, distributed/peer-to-peer networks are certainly possible but wayyyyy harder. There’s lots and lots of human social problems overlaid on the technical ones. Centralized systems like Discord or SMS make these Easier ’cause you have someone to blame. Distributed systems make them harder ’cause you have nobody to cry to for help. Federated systems are in between. Signal is an outlier that I don’t know much about the guts of. But even if it performs perfectly for the next 20-30 years, what’s it gonna do once Moxie Marlinspike dies of old age? There’s a nonprofit foundation around it; maybe that will remain uncompromised? …Okay, honestly, 20-30 years would be a great run for a messaging technology. I’m just pissed off that I’m not allowed to run my own Signal server for some reason. XMPP The technology, for all its early-2000s XML nonsense, seems very solid. It chugs along in low-key places like IoT systems and under the hood of WhatsApp, Jitsi and SIP. And while people have many complaints about XMPP, nobody ever seems to complain that it’s slow, laggy, or difficult to run on the operations side. More the opposite. Maybe more importantly, the decentralized-but-federated nature of authority in XMPP-land, where each server has absolute authority over what happens on that server… it isn’t perfect, but it’s sure the best solution we have right now. We know how to make it work: you let people choose whatever server they want to live on and let them move easily between them, and they’ll self-sort into a Zipfian distribution. So without much planning or coordination you end up with a few large servers/providers, a good handful of medium sized ones, and myriad tiny ones. 
Email does this (badly), the world wide web does this (despite the efforts of monopolies to stop it), Mastodon does this, VPS hosting companies do this, IRC does this, and to be fair, Matrix does this after a fashion as well. It takes constant effort to make it keep things working well this way, it’s an unstable equilibrium where large systems occasionally fail and new things pop up and fade as they become fashionable. It’s always at the risk of being over-centralized by the Googles and Facebooks of the world. But we know how to fight that as well, and it’s much easier to do when you have definite sources of authority than when you have a distributed system where shadow cartels can run it as they please. So, the problems with XMPP are mostly social with few notable technical weaknesses, while the advantages are both social and technical. The good news is that XMPP standards and discussion are deliberately structured like IETF ones, which is to say, there’s structure to the discourse but basically zero barrier to entry. If you want to do something, sign up for the mailing list, barge in, and start writing stuff. At least in theory. In reality some institutions can have very strong unwritten barriers against this, but hey, I’m tempted to try. Matrix Mannnn, it’s Bitcoin and IPFS all over again. It’s 2025 and still nobody’s managed to make a low-trust distributed system that works well except for Bittorrent, and guess what, Bittorrent works well mainly because of centralized tracker servers that the clients trust. Can we agree that these purely-peer-to-peer systems fundamentally don’t work well at large scale without some amount of authoritative system helping them along, and stop chasing the dragon of stateless distribution? Maybe in another generation the theory boffins will have come up with something better than Merkle DAG’s and Kademlia and we can try again. Again, it’s cool. 
A server with many users goes down and all its users go poof but the rest of the system keeps working as if nothing ever happened? That’s 100% a dream worth fighting for and I applaud those fighting for it.

But right now? It’s slow. Creating an account takes a minute. Creating a room takes 10 seconds. Joining a room takes 10 seconds. Sending messages lags. And sometimes it decides to just sit around and take forever for some operations at random. It’s sloooooooow. It’s heavyweight. It’s over-engineered. It’s complicated. Management of distributed consensus brings up exciting new social problems that make everything harder, even things as simple and fundamental as “delete spam images”. The complexity breeds bugs that nobody has the bandwidth to fix. Meanwhile on XMPP all of those operations are essentially as fast as a network round-trip and the server software was perfected a decade ago, and every social problem is solved by the primitive-but-robust tools of “talk to the server operator and tell them to fix it” and “make sure the server has a written policy you like and enforces it well”.

Is it worth it? I dunno. Realistically, chat servers vanishing and taking their contents with them is a problem, but to me it’s a problem about humans, not about technology. I have high hopes for the concept, by all means play with Secure Scuttlebutt and cool LoRa hacks and all that stuff… but distributed networks take all the Hard Problems that Discord deals with, both social and tech, and make them 100x harder. Plus it’s 2025 and lemme tell you, the promise of distributed systems removing power from the powerful has not panned out. Computer technology isn’t going to solve our social problems for us if we just get more of it, there is a human technology of social progress and system design that we need to focus on more. These are boring, humdrum technologies like nonprofit foundations, open source licenses, co-ops, hackerspaces, laws, and bureaucracy.
So, so much goddamn bureaucracy. Tangent time here: if you need multiple humans to cooperate with each other and make decisions involving more people than our cozy little D&D-party-sized immediate social groups, then bureaucracy is how you do it. It’s also how you make systems that outlast the participation of any one human: you have a set of rules, write them down, find people to enforce them as written, and make a mechanism to update them as necessary. That was true in China and Babylon 3000 years ago, and it’s true now. With computers we can automate some of the bureaucracy away, move it around, and optimize it to run more smoothly and consistently, but at some point you need humans in the decision-making loop instead of just a distributed consensus algorithm. And shitty as it is, bureaucracy is the best tool we have for formalizing and logging human decision-making. So, yeah. The problems with Matrix are both technological and social, while the advantages are maybe technological. Appendix: How to revive XMPP The biggest problem with XMPP is not just that it’s hard to use, but it’s also hard to develop software for. You want to write a new XMPP chat client or server? Be prepared to read 50 XEPs, and nobody even tells you which ones. Start going through each one individually, like every single client implementer has before you. The XEP process has been very good for getting people talking about cool tech-y bits and pieces for solving specific problems, but miserable for developing functional interoperable software outside of a single sub-ecosystem. IMO, XMPP should do what Vulkan does: have numbered formal releases as well as extensions, where each numbered release simply says “this MUST support this list of extensions: …”. 
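To show how little machinery “numbered releases that list required extensions” needs, here’s a toy check. The release numbers and which XEPs each one requires are entirely made up by me; only the XEP identifiers themselves are real (Service Discovery, Multi-User Chat, Stream Management, Message Archive Management, HTTP File Upload).

```python
# Hypothetical profile definitions: release -> extensions it MUST support.
PROFILES = {
    (1, 0): {"XEP-0030", "XEP-0045", "XEP-0198"},
    (1, 1): {"XEP-0030", "XEP-0045", "XEP-0198", "XEP-0313", "XEP-0363"},
}


def missing_for(release, advertised):
    """Extensions a release requires that the software doesn't advertise."""
    return sorted(PROFILES[release] - set(advertised))


def highest_release(advertised):
    """Newest release whose whole required list is covered, or None."""
    ok = [r for r in sorted(PROFILES) if not missing_for(r, advertised)]
    return ok[-1] if ok else None


server = {"XEP-0030", "XEP-0045", "XEP-0198", "XEP-0313"}
print(highest_release(server))
print(missing_for((1, 1), server))
```

A server could run a check like this against its own feature list and advertise a single version number, and a user would only ever have to compare two numbers.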
Ideally while also having the ability to (rarely) break backwards compat where necessary, and say “this MUST NOT support this broken terrible old extension: …” As far as I can tell this has been a very happy way of running a standards body for Vulkan for almost a decade. This way if a user has a client written for protocol version 1.2 and a server provides version 1.3, you know at a glance that everything should work together, and if not it’s a bug to report to the developers. If you have a version 1.3 client and version 1.2 server it has a chance of working, and the client might be able to poke around and ask for the extensions it supports and get useful answers, but that’s the client’s responsibility to do. And you sure as fuck don’t need to decode this: A normal helpful list of features for regular people. The big thing about this approach is it runs in parallel to the extension-based XEP development process. All the official release does is bundle up a bunch of extensions and say “these are now the minimum viable product”. You can even have different releases for different purposes – IM, file sharing, pubsub, IoT/automation, whatever. The protocol can still evolve and grow as normal, there’s just useful checkpoints at convenient places. Again, the goal isn’t to reinvent the world, the goal is to make it easy. Make it so that someone using or writing software has a very easy single reference to look at, along with a test suite for it. In fact, that was almost exactly what the Modern XMPP dudes tried to do not long ago. So let’s just join up with them and get ’em motivated again. There’s also Snikket, which appears to be taking the other approach of sidestepping the standards process where necessary. (Edit: Movim also looks interesting.) But that’s not everything; what else needs to be done? Well, a hell of a lot, to be honest. This is not an exhaustive list: Start talking with the XMPP Foundation again. Wake ’em the heck up! Ask difficult questions. 
Why is OMEMO still experimental? Which of the 3 different methods of video call is the one you should actually use? Can we make the HTTP file upload integration provide auth for GET and DELETE requests? How cool would it be if we channeled XMPP over QUIC instead of TCP? Do we dare replace XML with CBOR or something without changing the actual shape of the messages, so servers can just add it as a new serialization option? They need young blood, and if reading thephd’s blog has taught me anything it’s that a charismatic, determined and capable individual can do a lot to shake up a moribund standards body. Fix up the standard. It’s very much a decrepit house with good bones, so now’s a good time to fix up a bunch of deferred maintenance, remodel the kitchen and give it a coat of paint. And for the love of god modernize the encryption options and make them actually possible to use. At the risk of beating a dead horse, if I write client or server software I want a single document I can follow, with a version number, that tells me both what to do and what not to do. With test suites. And most of that exists! …in a very inconvenient form. The existing standards and processes are actually mostly fine and don’t need replacing, afaict, what they need is a direction and maybe a fresh clone of Peter Saint-Andre. You don’t need to rip out much in basic XMPP, if anything, just consolidate the useful bits and leave out the non-useful bits (like outdated encryption). Call it “XMPP IM Profile 1.0”, let the XEP process spin along as it is in the background, and go back to it in a few years to bring in revisions and new extensions for XIMP 1.1. You don’t have to touch existing server software at all as far as I can tell, besides adding a new feature flag somewhere. It seems to do everything the clients want and far more. Make it grow.
At a glance conversations.im is a really good client, so we need to ask, what do we have to do to make desktop and web clients that are as good as it is? Can someone please start a hosting business for cheap/free XMPP accounts and servers, for whatever market you think is worth it, and actually advertise it effectively? Can someone please go through the official account providers list on xmpp.org and remove everything that’s been dead since 2019? Maybe even make some protocol so that servers can advertise themselves to centralized indices that a user can browse? (Oh look, that sounds like something you could do with… a networked pubsub system. Anyone wanna write a XEP for it?) XMPP is a useful system, and a bunch of software exists for it right now, and even if that software is imperfect a lot of it is pretty good. So we should start using them. It’ll be easier than writing a new protocol from scratch again, I promise. Appendix 2: Discussion Interesting snippets from lobste.rs: