Matrix is a new-ish player in the messaging and group chat space, but the software ecosystem is stable and is becoming mature enough to be quite usable. My goal here is to introduce the platform and then take a closer look at it from an open-source intelligence (OSINT) perspective, since it is less well understood than the more mature and popular platforms out there.
This is the first post in a four-part series, so stick around even if this introduction to the platform is too basic for you. I plan to get down in the weeds a bit in a couple of the later posts, including some programming examples.
Matrix is an open-source, federated platform similar to XMPP. The Matrix API is full-featured and well documented, as is the project in general. A great place to get started is reading their guides. The platform fits in a hybrid application category: it is both a messenger for person-to-person communication and a group chat system for fairly large numbers of participants. Matrix has integrated support for voice and video chat as well, but it is a text-based messaging platform at its core. That hybrid nature does make comparisons difficult, however. It is most similar to Telegram in terms of functionality, although the two are quite different in most other ways.
This protocol is built entirely on HTTP (the protocol of the World Wide Web), so we’re dealing with the standard client-server model and proven web infrastructure like Transport Layer Security (TLS). HTTP is not only a well-understood application layer standard; this choice also lets developers use standardized code libraries, tools and testing frameworks that are mature and exist in most programming environments. I’m going to show screenshots from a graphical client, but keep in mind that virtually all user actions can be accomplished using familiar command line tools like curl. This is due to a well-designed API that features RESTful endpoints that speak JSON.
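To make that concrete, here is a minimal Python sketch of what talking to the API can look like using only the standard library. It targets the unauthenticated `/_matrix/client/versions` endpoint from the client-server spec. The helper names (`versions_url`, `fetch_json`, `supported_versions`) are made up for this example, and the base URL you pass in is an assumption about where a given homeserver exposes its client API.

```python
import json
import urllib.request


def versions_url(homeserver_base: str) -> str:
    """Build the URL of the unauthenticated 'versions' endpoint.

    homeserver_base is assumed to be the client API base URL,
    e.g. "https://matrix.org" (an assumption for illustration).
    """
    return homeserver_base.rstrip("/") + "/_matrix/client/versions"


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """GET a URL and decode the JSON body, roughly what `curl <url>` does."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def supported_versions(payload: dict) -> list:
    """Pull the list of spec versions out of a 'versions' response body."""
    return list(payload.get("versions", []))


# Example (performs a network request, so it is left commented out):
# payload = fetch_json(versions_url("https://matrix.org"))
# print(supported_versions(payload))
```

Uncommenting the last two lines should be roughly equivalent to pointing curl at the same URL and reading the `versions` key from the JSON that comes back.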
In terms of functionality, Matrix supports both private and public conversations between arbitrary numbers of people, and those conversations can be either encrypted or in the clear. There is plenty of nuance here that I’m glossing over, and I’ll mention some of it later as it pertains to OSINT investigations.
Matrix rooms are often bridged to other platforms, providing integration with chats on Telegram, IRC, Discord, WhatsApp and others. This has some implications for OSINT, including the fact that linking chats across two different protocols can only be accomplished by using unencrypted rooms. Also noteworthy: you can investigate the activity and search through the history on any of the platforms to which a chat is bridged.
The Matrix team not only develops the protocol and codes the reference implementations for the client and server, but also runs the largest homeserver out there by any measure. The typical noob onboarding process entails creating an account on a homeserver, and the matrix.org homeserver is usually the first one new users find. People also create accounts here to establish a canonical account that is far more likely to persist over time than one on a server run by a single person. More about this later.
Matrix.org is not only the largest homeserver out there, but also does a good job of making lots of data publicly accessible. A good place to begin poking around is view.matrix.org, which displays information about all public rooms that include users with accounts on the matrix.org homeserver and that have their visibility set to world readable.
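The "world readable" idea also shows up in the API itself. As a hedged sketch: the client-server API's public room directory (`/_matrix/client/v3/publicRooms`) returns a `chunk` list whose entries carry a `world_readable` boolean. The sample response below is invented for illustration (the room IDs, names and member counts are not real), and `world_readable_rooms` is a hypothetical helper, not something any Matrix library provides.

```python
# Invented sample shaped like a publicRooms response; not real data.
SAMPLE_RESPONSE = {
    "chunk": [
        {
            "room_id": "!aaa:example.org",
            "name": "Open room",
            "num_joined_members": 120,
            "world_readable": True,
        },
        {
            "room_id": "!bbb:example.org",
            "name": "Members only",
            "num_joined_members": 45,
            "world_readable": False,
        },
    ]
}


def world_readable_rooms(response: dict) -> list:
    """Keep only directory entries whose history is readable without joining."""
    return [
        room
        for room in response.get("chunk", [])
        if room.get("world_readable") is True
    ]


for room in world_readable_rooms(SAMPLE_RESPONSE):
    print(room["room_id"], room["name"], room["num_joined_members"])
```

Filtering this way is a cheap first pass for deciding which rooms can be read without joining them.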
This site allows unauthenticated users to view information about rooms, including the conversations, users, and the homeservers associated with those rooms. This is a great starting place for OSINT, whether you’re doing manual research, using bots, or pointing a scraper at it. The site is useful because it can answer so many questions like, “Where do the people in room X come from?” in all its various forms. People choose particular homeservers based on common interests, geography or language, political ideologies, technical expertise, and more.
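The "where do the people in room X come from?" question lends itself to a tiny script, because a Matrix user ID has the form `@localpart:homeserver`. Here is a minimal sketch, assuming you have already collected a room's member IDs by some means; the member list is made up and the function names are hypothetical.

```python
from collections import Counter


def homeserver_of(user_id: str) -> str:
    """Return the homeserver part of a Matrix user ID like '@alice:example.org'.

    Splits on the first colon only, since the server name may itself
    contain a colon followed by a port (e.g. 'example.org:8448').
    """
    if not user_id.startswith("@") or ":" not in user_id:
        raise ValueError(f"not a Matrix user ID: {user_id!r}")
    return user_id.split(":", 1)[1]


def tally_homeservers(user_ids) -> Counter:
    """Count how many of the given users come from each homeserver."""
    return Counter(homeserver_of(user_id) for user_id in user_ids)


# Made-up member list for illustration only.
members = [
    "@alice:matrix.org",
    "@bob:example.org",
    "@carol:matrix.org",
    "@dave:example.org:8448",
]

for server, count in tally_homeservers(members).most_common():
    print(server, count)
```

A tally like this is only a starting point, but a room dominated by one small homeserver tells you something quite different from a room dominated by matrix.org accounts.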
Matrix is still in an early phase of development and attracts lots of early-adopter types. The rooms are often populated by users who skew younger and more tech-savvy than those on other platforms. However, most of the same types of activity seen on similar platforms can be found here as well.
In the next post we’ll take a deep dive into the Matrix data model to see what data can be gathered from where, as well as what hidden or optional data to look for. We’ll closely examine users, rooms, servers and spaces, the four first-class entities in Matrix, with an emphasis on OSINT.
After that we’ll look at conducting an investigation: tools, tips, and techniques. There isn’t the sort of ecosystem of tools, websites and scripts for Matrix research that you’d expect on a more mature platform, so we’ll need to know how to investigate manually and what options are available to us.
Finally, we’ll dive down into the API, Matrix scripting basics, and the nascent frameworks and tools we can employ. We need better tools for Matrix, so that post is all about how to get started building them. Hope you stick around for the ride as we enter the Matrix!