OSINT Investigations on Matrix – Tools

Tuesday, 15 June 2021

Welcome back to our series on conducting OSINT investigations on the Matrix platform. This is the fourth installment, so if you haven’t read the others you probably want to go review them now. The first post was an introduction to the Matrix platform itself – a high-level description with an emphasis on items of interest to OSINT enthusiasts. After that we went through the Matrix data model in depth, exploring the available data for users, rooms and homeservers and where to find it. In the previous post, we talked about creating accounts, finding rooms and servers, and various tips and techniques.

We lamented that tools to help us automate routine tasks were few and far between, and in this post we’ll explore the RESTful API and review programming basics, as well as discussing libraries, frameworks and bots. Expect loads of links, because explaining things in more detail would result in an incredibly long post.

As mentioned earlier, the Matrix protocol is built on HTTP and exposes a nice RESTful API. You can see from the Matrix API documentation that some non-destructive, publicly available “read” operations use HTTP GET requests. These include useful OSINT interactions like fetching profile data or asking a homeserver for metadata about a user. Other API calls use the standard POST, PUT and DELETE methods.
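For readers who prefer Python to the command line, here is a minimal sketch of one of those unauthenticated GET requests, using only the standard library. It assumes the profile endpoint /_matrix/client/r0/profile/{userId} from the client-server API; the function names profile_url and fetch_profile are my own, and whether a given homeserver answers without a token depends on its configuration.

```python
import json
import urllib.parse
import urllib.request


def profile_url(homeserver: str, user_id: str) -> str:
    """Build the profile lookup URL for a Matrix user ID (MXID)."""
    # The MXID contains "@" and ":", so percent-encode it as one path segment.
    encoded = urllib.parse.quote(user_id, safe="")
    return f"{homeserver.rstrip('/')}/_matrix/client/r0/profile/{encoded}"


def fetch_profile(homeserver: str, user_id: str, timeout: float = 10.0) -> dict:
    """GET the public profile (display name, avatar URL) as a dict."""
    url = profile_url(homeserver, user_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling fetch_profile("https://somegreat.host", "@osint_dude:somegreat.host") would return whatever profile fields that server chooses to expose, typically a display name and an avatar URL.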

AUTHENTICATION EXAMPLE

As we saw in earlier posts, some information can be gathered without authenticating, but a lot of interesting data can only be collected by an authenticated user. The general technique is to first get a session token, then do the research as an authenticated user. At the command line, the HTTP request and response to authenticate as @osint_dude:somegreat.host should look something like:

curl -X POST -H 'Content-Type: application/json' -d '{
"type":"m.login.password",
"user":"osint_dude",
"password":"strongpassword",
"initial_device_display_name":"some_cool_devicename" }' \
"https://somegreat.host/_matrix/client/r0/login"

{ "access_token":"dBk4jMK2dd0dhLEdkjFvRTx77QgGd", "home_server":"somegreat.host", "user_id":"@osint_dude:somegreat.host" }

Obviously, substitute your own username, password and homeserver when POSTing to this standard endpoint, /_matrix/client/r0/login.
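The same login can be sketched in Python with nothing but the standard library. This is an illustration under assumptions, not a finished client: build_login_request and login are names I made up, the credentials are the placeholders from the curl example, and there is no error handling for bad passwords or rate limiting.

```python
import json
import urllib.request


def build_login_request(homeserver, user, password, device_name):
    """Prepare (but do not send) a password login request."""
    body = {
        "type": "m.login.password",
        "user": user,
        "password": password,
        "initial_device_display_name": device_name,
    }
    return urllib.request.Request(
        f"{homeserver.rstrip('/')}/_matrix/client/r0/login",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def login(homeserver, user, password, device_name, timeout=10.0):
    """Send the login request and return the access token from the reply."""
    req = build_login_request(homeserver, user, password, device_name)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["access_token"]
```

Splitting the request construction from the network call makes it easy to inspect exactly what will be sent before pointing it at a real homeserver.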

Those who have done this sort of thing before will quickly realize that there are API calls for all the common tasks, and that most basic tasks require combining several of them. For example, before the login shown above, you might want to check whether the homeserver is alive. This is not strictly necessary, but at the very least it lets you provide better error handling. From an OSINT perspective, the server software type and version should always be available at the standard endpoint shown below:

$ curl https://matrix.org/_matrix/federation/v1/version
You can also open this URL in a web browser, substituting your target homeserver for “matrix.org” in this example.
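As a rough sketch of that liveness check in Python, the snippet below asks for the version document and treats any network or parsing failure as "not available". It assumes the reply has the shape {"server": {"name": ..., "version": ...}} described in the federation API; the helper names are hypothetical.

```python
import json
import urllib.request


def version_url(homeserver: str) -> str:
    """URL of the federation version endpoint for a homeserver."""
    return f"{homeserver.rstrip('/')}/_matrix/federation/v1/version"


def parse_version(payload: dict) -> tuple:
    """Extract (software name, version) from a version response."""
    server = payload.get("server", {})
    return server.get("name", "unknown"), server.get("version", "unknown")


def probe_homeserver(homeserver: str, timeout: float = 10.0):
    """Return (name, version), or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(version_url(homeserver), timeout=timeout) as resp:
            return parse_version(json.load(resp))
    except (OSError, ValueError):
        # OSError covers DNS failures, timeouts and HTTP errors;
        # ValueError covers a body that is not valid JSON.
        return None
```

A None result is a reasonable cue to skip the login attempt and log the homeserver as unreachable instead.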

Also realize that while I’m showing examples using curl on the command line or in shell scripts, you can easily use Python, JavaScript or your language of choice to do this. There are examples online of using Python to interact with Matrix, as we’ll see soon. Here is an example JavaScript implementation of a lot of the basic functions.

JOINING ROOMS AND COLLECTING DATA FROM THEM

Turning back to OSINT tasks, users and rooms are obviously the most interesting entities to research. Let’s say we want to fetch a list of users from a room. Let’s also assume that you’ve already authenticated with your sock account. In an OSINT investigation we might want to join a room, then get a list of the room members.

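#!/bin/bash
# Hard-coded for illustration; use your own homeserver, token and room ID.
SERVER="https://matrix.org"
TOKEN="dBk4jMK2dd0dhLEdkjFvRTx77QgGd"
# Room ID with "!" and ":" percent-encoded so it is safe in a URL path.
ROOM="%21OGEhHVWSdvArJzumhm%3Amatrix.org"

# Join the room (empty JSON body), then fetch its membership events.
RESULT=$(curl -s -X POST -d '{}' "$SERVER/_matrix/client/r0/join/$ROOM?access_token=$TOKEN")
# In m.room.member events the state_key is the member's user ID.
MEMBERS=$(curl -s \
    "$SERVER/_matrix/client/r0/rooms/$ROOM/members?access_token=$TOKEN" \
    | jq -r '.chunk[].state_key')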

Obviously I’m taking a few more shortcuts here besides hard-coding some values, in order to convey the basic idea. For example, we should be checking the room status before joining, checking that the join succeeded before asking for the list of members, and maybe logging the results somewhere for later use.

Notice that we’re using a utility called jq to parse the JSON because we want to list the user_ids only. You’ll want to use the preferred methods in your programming environment of choice to parse JSON, unless you’re doing this via shell script. In our contrived example we’re using hard-coded values for server, token and room instead of taking those values from previous results (e.g. $TOKEN) or user input, but it should suffice to provide some useful insights.
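Outside of a shell script, the filtering that jq does here is a few lines of Python. The sketch below assumes the members response is a dict with a "chunk" list of m.room.member events, where state_key is the MXID the membership applies to and content.membership is its status. The function name is mine, and keeping only currently joined users is one choice among several, since the list can also contain invited, departed and banned users.

```python
def joined_members(members_response: dict) -> list:
    """Return the sorted MXIDs of currently joined members."""
    members = set()
    for event in members_response.get("chunk", []):
        if event.get("type") != "m.room.member":
            continue  # ignore anything that is not a membership event
        if event.get("content", {}).get("membership") == "join":
            # state_key is the user the membership event is about
            members.add(event["state_key"])
    return sorted(members)


# Invented sample data in the shape described above.
sample = {
    "chunk": [
        {"type": "m.room.member", "state_key": "@bob:example.org",
         "content": {"membership": "join"}},
        {"type": "m.room.member", "state_key": "@alice:example.org",
         "content": {"membership": "join"}},
        {"type": "m.room.member", "state_key": "@carol:example.org",
         "content": {"membership": "leave"}},
    ]
}
print(joined_members(sample))
```

Keeping the other membership values in a separate list is often worthwhile too, since a ban or a departure can be as interesting as a join.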

As long as we’ve joined the room and are collecting data, we might want to collect more information than simply a list of members. We should collect at least the room name, topic, admins and moderators – all the data items we discussed in the previous post about the Matrix data model. The entire list of Matrix API calls is extensive, and well documented.
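One way to gather those extra items is to request the room's current state (GET /_matrix/client/r0/rooms/{roomId}/state, which returns a list of state events) and pick out the relevant event types. The sketch below only does the parsing step. Treating power level 100 as "admin" and 50 as "moderator" is a common client convention that I am assuming here rather than a rule of the protocol, and summarise_room_state is a hypothetical name.

```python
ADMIN_LEVEL = 100      # assumed convention, not mandated by the spec
MODERATOR_LEVEL = 50   # assumed convention, not mandated by the spec


def summarise_room_state(state_events: list) -> dict:
    """Pull name, topic, admins and moderators out of a room's state events."""
    summary = {"name": None, "topic": None, "admins": [], "moderators": []}
    for event in state_events:
        content = event.get("content", {})
        event_type = event.get("type")
        if event_type == "m.room.name":
            summary["name"] = content.get("name")
        elif event_type == "m.room.topic":
            summary["topic"] = content.get("topic")
        elif event_type == "m.room.power_levels":
            # "users" maps MXIDs to their explicit power level
            for user_id, level in content.get("users", {}).items():
                if level >= ADMIN_LEVEL:
                    summary["admins"].append(user_id)
                elif level >= MODERATOR_LEVEL:
                    summary["moderators"].append(user_id)
    summary["admins"].sort()
    summary["moderators"].sort()
    return summary
```

Users without an entry in the power levels event simply get the room default, so they never show up in either list.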

USEFUL PYTHON RESOURCES

Although I wanted to show some basic examples of sending common requests, you’re probably going to use a language like Python or JavaScript, because you can easily reuse existing code to make asynchronous API calls, parse JSON responses, and even deal with key management for encrypted rooms. If you’re trying to make something fancy you might want to handle uploaded media and follow posted links. All this is so much easier when using a library or framework that does a lot of the heavy lifting for you.

A popular choice for Python programmers is the matrix-nio project. As you can tell from the matrix-nio documentation it’s a stable, full-featured codebase on which plenty of projects are based. Here is the equivalent to our earlier example of user authentication using matrix-nio instead of curl commands.

Another great Python resource is the mautrix-python framework from a Matrix OG named Tulir. You’ll want to investigate their work in particular if you want to take this to the next level, as they have useful code for bridging to other platforms, writing bots and more.

That’s probably where you want to start this journey, but it’s worth mentioning for more advanced users that you can create bots to help automate simple data collection tasks. In this scenario one would interact with the bot directly, providing the information required (e.g. MXID and Room ID) and the bot would perform useful tasks and return the results in the same room.

There are bot frameworks based on matrix-nio, like this tiny-matrix-bot, and even a bot framework built by Tulir called maubot. Here is a simple guide to getting started with bots using Python. To limit the scope of this article we’re not going to look at bots in detail, but if you’re going to run a homeserver you might consider building and using some basic bots to assist in your investigations.

FINAL THOUGHTS

Matrix seems stable enough for OSINT tools to perform basic research tasks, and it could definitely use some OSINT bots like those that exist on Telegram. Any reasonably high-level language or web framework makes a great choice, since the core strategy is simply to use the Matrix API.

I didn’t talk about bookmarklets, but quick JavaScript snippets are certainly a powerful way to create useful tools for the web-based Matrix clients. Taking that a step further, many clients are actually web apps (e.g. Electron apps), so there’s really no reason why an OSINT-friendly client couldn’t be put together.

I hope this series was useful to someone, and I plan to publish more in time. If you have any ideas, questions or suggestions feel free to DM me on Twitter at @mightbemike. Happy Matrixing!