OSINT Investigations on Matrix – Data Model

Thursday, 27 May 2021

In the previous post we gave a high-level introduction to the Matrix platform, how it operates, what characteristics it has, and why it is interesting from an OSINT perspective. In the next post in this series, we’ll examine tools, tips and techniques useful to a researcher on this platform, followed by a closer look at building tools for Matrix researchers. Right now we’re going to see what useful data is available, where, and how to find it.

MATRIX DATA MODEL FOR OSINT

The major entities in Matrix are users and accounts, rooms, homeservers, and communities/spaces. In this post, we’ll dive into specifics: how they are related, what data is accessible with and without authentication, what data is optional, where to find specific data, and a bit about understanding the relationships of users, rooms, homeservers and their data.

Here is my attempt to depict the relationship between those first-class entities in Matrix (above) and the available data we should look for that might be associated with each. I ran out of room, though. This is surely not too useful without some explanation and some examples, so let’s dive in a bit deeper.

MATRIX USER ID – MXID

Users are not first-class entities on Matrix; user accounts are. The distinction is important as it’s quite common for a user to have multiple accounts, for a variety of reasons. Each account has a unique Matrix identifier called a MXID. When I refer to user accounts from this point on, I’ll always be talking about MXIDs.

user profile as displayed in Element client

Here we see displayed a user profile for a user with display name “Richard Stallman”. Below the display name is this user’s MXID @r23:tedomum.net, which is the permanent identifier for this user account. Just like an email address, this MXID has 2 parts – a username, and a host indicating the homeserver. So this user authenticates to the homeserver running on tedomum.net. New users often create accounts on the first homeserver they find – matrix.org. Other choices are often interesting, whether the user runs their own homeserver, uses one known to be a haven for certain kinds of questionable content, etc.
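Since the homeserver portion of a MXID is often the interesting part, it helps to split the two apart programmatically. Here is a minimal sketch of that split; the names Mxid and parse_mxid are my own for illustration and do not come from any Matrix library.

```python
from typing import NamedTuple


class Mxid(NamedTuple):
    localpart: str   # the username portion, without the leading "@"
    homeserver: str  # the host portion, possibly including a port


def parse_mxid(mxid: str) -> Mxid:
    """Split a MXID such as '@r23:tedomum.net' into username and homeserver."""
    if not mxid.startswith("@") or ":" not in mxid:
        raise ValueError(f"not a valid MXID: {mxid!r}")
    # Split on the FIRST colon only: the host portion may itself contain
    # a colon when a port is given (e.g. '@alice:example.org:8448').
    localpart, homeserver = mxid[1:].split(":", 1)
    if not localpart or not homeserver:
        raise ValueError(f"not a valid MXID: {mxid!r}")
    return Mxid(localpart, homeserver)


account = parse_mxid("@r23:tedomum.net")
print(account.localpart, account.homeserver)
```

Grouping a list of collected MXIDs by the homeserver field is then a quick way to spot accounts on unusual or self-hosted servers.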

This relationship between a MXID and a homeserver is the strongest connection in Matrix, as the homeserver literally mediates all user interaction with rooms and users. Remember this is a federated system, so users do not interact with some central entity or directly peer to peer. Their homeserver actually interacts with other homeservers on their behalf.

So what information can we get from this user profile? Obviously there is (optionally) a profile pic, and like many other platforms we want to click on it to get the full, uncropped version just as the user uploaded it.
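Behind that profile pic is a content URI of the form mxc://server/media-id, which the client turns into an ordinary HTTPS download. The sketch below builds such a download URL. It assumes the /_matrix/media/v3/download path from the Matrix specification; older servers used r0 in place of v3, and newer ones may require an authenticated endpoint instead, so treat the path as an assumption to verify against the server you are looking at. The function name and the example hosts are hypothetical.

```python
from urllib.parse import urlparse


def mxc_to_download_url(mxc_uri: str, homeserver_base: str) -> str:
    """Turn 'mxc://server/media-id' into a download URL on a given homeserver."""
    parsed = urlparse(mxc_uri)
    media_id = parsed.path.lstrip("/")
    if parsed.scheme != "mxc" or not parsed.netloc or not media_id:
        raise ValueError(f"not a valid mxc URI: {mxc_uri!r}")
    # Assumed path; check which media API version the server actually exposes.
    return (
        f"{homeserver_base.rstrip('/')}"
        f"/_matrix/media/v3/download/{parsed.netloc}/{media_id}"
    )


print(mxc_to_download_url("mxc://example.org/abc123", "https://matrix.example.org"))
```

Note that the server named inside the mxc URI is where the file was originally uploaded, which is itself a small piece of metadata worth recording.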

Beneath the MXID there is a presence indicator, showing that this user is offline. This is not under the control of this user, but is a function of the homeserver. It’s usually not enabled because it imposes additional overhead on the server software, which is already overly resource-hungry at this point. It deserves mention however, because if their homeserver supports it, that might be used to infer the timezone for a user based on the time of day when they are active.

Next we see that this user has a role of “default”. This may be confusing in that it suggests what the power level (authority) of a user is within a given room. This user account has the default power level for this room, but might have more privileges in other rooms. More on power levels later, when we cover rooms.

The “verify” link has been clicked here. It expanded to show the current sessions for this user account. These often indicate client software and operating system, but cannot be considered to be completely reliable, as this can be set by the user after authenticating. There is a bit of nuance here: right after authenticating, but before the user changes it, this value is more reliable – at least to the extent that the client reports itself. After authenticating, the user is able to modify this value to be whatever they want, including disinformation. The user account shown has 2 active sessions, one of which is identified as a desktop client and the other as a mobile client.

These session identifiers will often indicate the language of the user, as well as the operating system, and in the case of the web-based client even the browser type. Browser-based Matrix clients will create a new session each time the user authenticates, without cleaning up old sessions. Users can get rid of unwanted sessions by either logging out (close current session) or by explicitly deleting old sessions.

The final piece of interesting data shown in this profile is the “Jump to read receipt” link. Clicking on this in a room navigates to the last message seen by this MXID in the current room. Most clients are going to display a timestamp next to each message, so you can tell when the (authenticated) user last visited the room. In the aggregate these timestamps do suggest the time of day a given user is online and posting messages.
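To make the time-of-day idea concrete, here is a rough sketch that buckets message timestamps by hour. It assumes you have already collected the origin_server_ts values (milliseconds since the Unix epoch) from a user's message events; the function name is hypothetical, and the result is only a hint at a timezone, not proof of one.

```python
from collections import Counter
from datetime import datetime, timezone


def active_hours_utc(timestamps_ms):
    """Count events per UTC hour of day from millisecond epoch timestamps."""
    hours = Counter()
    for ts in timestamps_ms:
        # Matrix timestamps are in milliseconds; datetime expects seconds.
        hours[datetime.fromtimestamp(ts / 1000, tz=timezone.utc).hour] += 1
    return hours


# Illustrative timestamps built from known UTC times, not real observations.
sample = [
    int(datetime(2021, 5, 27, 12, 30, tzinfo=timezone.utc).timestamp() * 1000),
    int(datetime(2021, 5, 27, 12, 45, tzinfo=timezone.utc).timestamp() * 1000),
    int(datetime(2021, 5, 27, 23, 5, tzinfo=timezone.utc).timestamp() * 1000),
]
for hour, count in sorted(active_hours_utc(sample).items()):
    print(f"{hour:02d}:00 UTC  {count}")
```

A long quiet stretch in the histogram over several weeks of data is usually a better indicator of sleeping hours than the busiest hour is of working hours.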

One related thing to mention is that a user can provide a phone number and/or email address when signing up for an account, and can make those discoverable by others who search for their friends via an identity server. There is more nuance here, but the most important takeaway is that using an identity server is optional, and the information provided may be unreliable, since it is supplied by the user.

ROOMS

Matrix rooms are the most confusing entity for many people due to the federated nature of Matrix. Rooms are ephemeral and do not have a permanent home on any particular server. Rooms are the mechanism used for group chats even if it is a private chat with only two people talking to each other. This is where the most features seem to live, and rooms have most of the remaining information we’re interested in from an OSINT perspective. Each room has a state that is persisted, with all settings stored and editable by any administrator of that room.

Some rooms are public, meaning that anyone can search for them and can join, and some are private. Some rooms have the “preview enabled” feature set, which means that anyone can view the content without joining.

In this image we see the general settings for a room for which we have visibility, as presented by this particular client. Room name, topic and perhaps avatar are helpful for public rooms to indicate what the room is about in room search results. Room names have the same two components as a MXID: a name portion and a host, e.g.
#osint-chat:matrix.org

Room topic is often used to provide rules and guidelines, link to communities, or point to related external resources like a code of conduct or website that the room is based on. View it in room settings, or by typing /topic into your client. All these buttons to click have their slash-command equivalents.

Public address is analogous to an A rec in a DNS zonefile. This is the canonical alias for a room, and the local addresses are aliases that function like CNAME recs. Local addresses must be set by a privileged MXID from that same homeserver, and that often allows you to deduce who set the alias by process of elimination. Local addresses allow the room to show up as if it were a local room on a given homeserver, i.e. when users search for rooms on their local homeserver, it will show up in the result set.

Shown here is the advanced tab for room settings. The internal room ID is the actual room identifier, assigned when the room was created, whereas the room name and the other addresses are user-changeable. The host portion of the room ID does not mean that this room lives on that homeserver or even requires that server to exist; its only function is to prevent namespace collisions. A room can always be referred to using this identifier.
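The different identifiers are easy to tell apart by their first character: @ for a MXID, # for the name:host room addresses (the Matrix spec calls these room aliases), and ! for the internal room ID. The sketch below classifies an identifier and builds the URL a client would query to translate an address into a room ID. The helper names are hypothetical, and the /_matrix/client/v3/directory/room path is taken from the current client-server spec (older servers use r0 instead of v3), so treat it as an assumption.

```python
from urllib.parse import quote

# First character of a Matrix identifier -> what kind of thing it names.
SIGILS = {
    "@": "user ID (MXID)",
    "#": "room alias",
    "!": "room ID",
    "$": "event ID",
}


def classify_identifier(identifier: str):
    """Return (kind, host) for a Matrix identifier; host is None if absent."""
    kind = SIGILS.get(identifier[:1])
    if kind is None:
        raise ValueError(f"unrecognised identifier: {identifier!r}")
    # Everything after the first colon is the host portion.
    _, _, host = identifier.partition(":")
    return kind, host or None


def alias_lookup_url(homeserver_base: str, room_alias: str) -> str:
    """URL that resolves a room alias to its internal room ID (assumed path)."""
    # The alias must be percent-encoded, since '#' would start a URL fragment.
    return (
        f"{homeserver_base.rstrip('/')}"
        f"/_matrix/client/v3/directory/room/{quote(room_alias, safe='')}"
    )


print(classify_identifier("#osint-chat:matrix.org"))
print(alias_lookup_url("https://matrix.org", "#osint-chat:matrix.org"))
```

Recording the internal room ID rather than the address is the safer habit in case notes, since addresses can be changed or removed by room administrators.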

Room version is not so useful from an OSINT perspective, although it suggests how proactive the admin team for that room is. Version 1 rooms are unstable and should always be upgraded, so seeing version 1 indicates the admins are not active or are not aware of some of the finer points of room administration. Version 6 is the current room version as of this writing.

The real treasure trove is found by clicking on the Developer Tools button, or in your client by typing /devtools. This is a GUI showing all room settings. For example, we can see all homeservers interacting with this room by clicking on the button “Show Servers in Room”. There is at least one MXID in the room for each server displayed here. For public rooms with aliases on matrix.org you can see the actual user counts per server by using the previously referenced view.matrix.org site.
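If you have the member list of a room, the same per-server view can be derived yourself. This is a small sketch that counts joined MXIDs per homeserver; it assumes the list of MXIDs has already been collected (for example from the room's joined members list in your client), and the function name and sample accounts are made up for illustration.

```python
from collections import Counter


def servers_in_room(joined_mxids):
    """Count joined accounts per homeserver from a list of MXIDs."""
    # The homeserver is everything after the first colon of the MXID.
    return Counter(mxid.split(":", 1)[1] for mxid in joined_mxids)


members = ["@a:matrix.org", "@b:matrix.org", "@r23:tedomum.net"]
for server, count in servers_in_room(members).most_common():
    print(server, count)
```

Servers with only one or two accounts in a room are often personal homeservers, which makes them good candidates for the server-level research covered later in this post.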

Another useful choice in the dev tools is to view the server ACL (access control list), the list of servers that are blocked from interacting with this room. This is on a per room basis, and reflects the sensibilities of the room administrators. This often reflects a history of abusive room members from certain homeservers, resulting in those servers being added to this list. It also might represent a list of “undesirable” servers that a room administrator blocks across a group of related rooms.
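The ACL is stored in the room state as an m.room.server_acl event whose content holds allow and deny lists of glob patterns. Here is a simplified sketch of how those lists can be evaluated for a given server name. It ignores the allow_ip_literals flag and any port handling, and the function names and example domains are hypothetical, so it is an approximation of the rules rather than a faithful implementation.

```python
import re


def _glob_to_regex(pattern: str):
    # In ACL globs '*' matches any run of characters and '?' exactly one.
    escaped = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    return re.compile(escaped)


def server_allowed(server_name: str, acl_content: dict) -> bool:
    """Approximate check of a server name against m.room.server_acl content."""
    deny = acl_content.get("deny", [])
    allow = acl_content.get("allow", [])
    # A match on the deny list always wins.
    if any(_glob_to_regex(p).fullmatch(server_name) for p in deny):
        return False
    # Otherwise the server must match the allow list (empty list allows nobody).
    return any(_glob_to_regex(p).fullmatch(server_name) for p in allow)


acl = {"allow": ["*"], "deny": ["*.evil.example", "bad.example"]}
print(server_allowed("matrix.org", acl))
print(server_allowed("hs.evil.example", acl))
```

Comparing the deny lists of several related rooms is a quick way to test whether they share an administrator or a common blocklist.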

One of the fundamental concepts in room administration is the notion of power levels. Every user has a power level assigned for each room they belong to. Room administrators can decide what power levels give users the authority to take various actions. Power level 100, written as PL100, means the user has no limits on the actions they can take in the room. PL100 also means that the user cannot be demoted or kicked out of the room. All these permission levels can be set by the room administrator(s), so which authority is assigned to what power level can vary from room to room.

By default, PL100 is an admin, PL50 is a moderator (can delete messages, kick or ban users, change some room settings) and PL0 is the default that allows users to read and post messages. It is possible to “mute” a user by changing their power level to less than zero, e.g. PL-1. All these settings are flexible. The room state includes a list of users (MXIDs) and their respective power levels.
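These levels live in the room state as an m.room.power_levels event, which you can read in the dev tools. The sketch below looks up a user's level and checks it against the level required for an action. The field names (users, users_default, events, events_default, kick, ban, redact) come from the Matrix spec, but the helper functions and sample accounts are my own, and the defaults used when a field is missing are simplified.

```python
def user_power_level(power_levels: dict, mxid: str) -> int:
    """Power level of a MXID, falling back to the room's users_default."""
    return power_levels.get("users", {}).get(
        mxid, power_levels.get("users_default", 0)
    )


def can_moderate(power_levels: dict, mxid: str, action: str) -> bool:
    """Check 'kick', 'ban' or 'redact'; each defaults to level 50 if unset."""
    if action not in ("kick", "ban", "redact"):
        raise ValueError(f"unsupported action: {action!r}")
    return user_power_level(power_levels, mxid) >= power_levels.get(action, 50)


def can_post(power_levels: dict, mxid: str) -> bool:
    """Whether the user may send ordinary messages (muted users cannot)."""
    required = power_levels.get("events", {}).get(
        "m.room.message", power_levels.get("events_default", 0)
    )
    return user_power_level(power_levels, mxid) >= required


# Illustrative content of an m.room.power_levels event.
levels = {
    "users": {"@admin:example.org": 100, "@mod:example.org": 50,
              "@muted:example.org": -1},
    "users_default": 0,
    "events_default": 0,
    "kick": 50,
}
print(can_moderate(levels, "@mod:example.org", "kick"))
print(can_post(levels, "@muted:example.org"))
```

Listing every MXID above the default level gives you the moderation team of a room at a glance.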

One interesting room property is the room ban list. This shows the MXIDs who have been banned from the room, with an optional reason comment recorded for future reference. Some insight into room administration policies can be inferred from this information.
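In the room state, a ban is simply a membership event (type m.room.member) whose membership field is set to ban, with the banned MXID as the state key. Here is a minimal sketch that pulls the ban list out of a list of state events. It assumes you already have the state events as dictionaries, for example copied from the dev tools; the function name and sample data are hypothetical.

```python
def banned_users(state_events):
    """Extract banned MXIDs, who banned them, and the optional reason."""
    bans = []
    for event in state_events:
        content = event.get("content", {})
        if (event.get("type") == "m.room.member"
                and content.get("membership") == "ban"):
            bans.append({
                "mxid": event["state_key"],        # the account that was banned
                "banned_by": event.get("sender"),  # the account that set the ban
                "reason": content.get("reason"),   # None when no reason was given
            })
    return bans


# Illustrative state events, not taken from a real room.
state = [
    {"type": "m.room.member", "state_key": "@spam:bad.example",
     "sender": "@mod:example.org",
     "content": {"membership": "ban", "reason": "spam"}},
    {"type": "m.room.member", "state_key": "@alice:example.org",
     "sender": "@alice:example.org", "content": {"membership": "join"}},
]
for ban in banned_users(state):
    print(ban)
```

The sender field is worth keeping, because it shows which moderators are actually doing the enforcement.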

There are still other room settings that are not worth exploring in depth, but we’ll mention some that affect the access you’ll have to room conversations. Room settings allow the room history to be available in part or full to members or to the general public depending on when they were invited or when they join.

Another setting that affects how much visibility you have into a room is encryption. All network traffic is encrypted using TLS, since this is all built on web protocols. However, the messages that are posted to rooms may or may not all be encrypted, depending on that room setting. If encrypted, then all users must verify other users in order to see their messages in plaintext. Let’s skip the details of what the verification process actually means or how it works because it’s not relevant to OSINT. So for investigators, this means that to see the room content in encrypted rooms you’ll not only need to be a member of the room but verify and be verified by some or all of the other members.
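Both history visibility and encryption are recorded as state events in the room (m.room.history_visibility and m.room.encryption), so a quick look at the room state tells you what kind of access to expect before you invest time in a room. The sketch below summarises them; the function name and sample events are hypothetical, and a missing history visibility event is reported as None rather than guessing the server's default.

```python
def room_access_summary(state_events):
    """Summarise history visibility and encryption from room state events."""
    summary = {"history_visibility": None, "encrypted": False, "algorithm": None}
    for event in state_events:
        content = event.get("content", {})
        if event.get("type") == "m.room.history_visibility":
            # One of: world_readable, shared, invited, joined.
            summary["history_visibility"] = content.get("history_visibility")
        elif event.get("type") == "m.room.encryption":
            # The presence of this event means the room has encryption enabled.
            summary["encrypted"] = True
            summary["algorithm"] = content.get("algorithm")
    return summary


# Illustrative state events, not taken from a real room.
state = [
    {"type": "m.room.history_visibility",
     "content": {"history_visibility": "world_readable"}},
    {"type": "m.room.encryption",
     "content": {"algorithm": "m.megolm.v1.aes-sha2"}},
]
print(room_access_summary(state))
```

A value of world_readable corresponds to rooms that can be previewed without joining, while joined means you will only see messages sent after you became a member.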

Rooms are the primary entity that users interact with conceptually, despite all interactions actually being mediated by a homeserver. Users join rooms, read conversations there, post messages to rooms, leave rooms, and so on. Related rooms can be included in communities, now re-branded as spaces.

SPACES

Spaces are the successors to communities. They are collections of rooms and other spaces that can, or will soon be able to, share group-wide room properties and default settings. Spaces can be nested in hierarchies, with public and hidden rooms included throughout, and can be hidden or public. Some examples of features being developed include restricting room membership and managing power levels across groups of rooms.

Let’s begin with the ID, which is indistinguishable from room IDs in format – #roomName:host. Spaces have a name, and a description that looks just like a room topic. Spaces can have an avatar too, just as a room can.

Spaces will be getting some special properties that allow them to act as containers where settings can be shared across rooms, and rooms will get settings allowing them to participate if they choose to. It’s not clear at this point what other data specific to spaces will be available and useful to OSINT researchers, so let’s move on.

HOMESERVERS

The Matrix federation consists of lots of homeservers. Here is one listing of free public homeservers that can serve as a good starting point, and here is another list. There are privacy and security reasons why people sometimes choose to run their own homeserver, but many do it just because they want to learn how things work. The privacy reasons people cite are the same reasons homeservers are interesting from an OSINT perspective. This is the place where metadata about users and their activity is available.

There are currently two flavors of homeserver software. The reference implementation, which is stable but resource-hungry, is called Synapse. It is written in Python and features a PostgreSQL backend data store. A newer alternative written in Go, called Dendrite, aims to be more performant and scalable, but it’s generally considered not quite ready for primetime.

I mentioned earlier that all user interactions with the rooms are actually proxied by the homeserver where that user is logged in. That is to say that a user authenticates to a homeserver, then that server sends and retrieves events/messages on behalf of the user. For example, if the user navigates to a room to read the latest messages in the ongoing conversation, behind the scenes their homeserver is actually fetching the latest messages from other homeservers to relay back to that user’s client.

Why do we care? When a user is authenticated, that server knows what IP address the user is communicating from. The homeserver is storing the various settings for this MXID in its database. All room memberships are known to the homeserver, since it needs to fetch recent content from those rooms for the user to see. In fact, since the homeserver is sending and receiving (sometimes unencrypted) messages, it potentially makes a quite detailed view of a user’s activity available to the server administrator. In the worst case, a homeserver can function as a honeypot, collecting data on its users. Hence the importance of carefully choosing a homeserver that you trust, or running your own.

So what useful information can we collect from homeservers? This is a server, so like any other sort of server software, we can use familiar tools and techniques for discovering information about the IP address, domain name, hosting service, other software running on the server, etc. Homeservers often use a dedicated subdomain for Matrix, and sometimes provide an overview or code of conduct or other information about the community.
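One Matrix-specific detail helps with finding the actual server: the host in a MXID is not always the machine running the homeserver. A domain can delegate to another host by publishing a small JSON file at /.well-known/matrix/server, and the server found there reports its software name and version at /_matrix/federation/v1/version. The sketch below only builds the URLs and parses a delegation response; the helper names and example hosts are hypothetical, it ignores the DNS SRV fallback, and it assumes the default federation port of 8448 when none is given.

```python
def well_known_url(server_name: str) -> str:
    """URL of the delegation file for the host portion of a MXID."""
    return f"https://{server_name}/.well-known/matrix/server"


def federation_endpoint(well_known_body: dict):
    """Return (host, port) from a parsed .well-known/matrix/server response."""
    delegated = well_known_body["m.server"]
    host, sep, port = delegated.rpartition(":")
    if sep and port.isdigit():
        return host, int(port)
    # No explicit port: assume the default federation port.
    return delegated, 8448


def version_url(host: str, port: int) -> str:
    """URL that reports the homeserver software name and version."""
    return f"https://{host}:{port}/_matrix/federation/v1/version"


# Illustrative response body, as it might be served for example.org.
host, port = federation_endpoint({"m.server": "matrix.example.org:443"})
print(well_known_url("example.org"))
print(version_url(host, port))
```

The delegated host is the one to feed into your usual IP, DNS and hosting lookups, and the version response tells you whether the server runs Synapse, Dendrite or something else.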

COMING NEXT

In this post we looked at how each of the four major entities in Matrix relates to the others and what data is available from each. We saw how a user profile contains some useful information, how to find it all, and what it all means. We also learned a bit about rooms, how to locate public rooms, power levels for users in a room, and how to look through the room settings to find out information about permissions and policies for that room. We talked a little about spaces, not because they’re very useful currently but because they’re the area of focus for the dev team, and the likely place to see new functionality coming soon. Lastly we looked at homeservers, noting how to find them, what to gather from them, and outlining their rather important role in making everything work.

In the next post in this series we’ll discuss tools, techniques and tips that might be employed in an investigation on the Matrix platform – stay tuned!
