Decentralized Search

Wednesday, 31 July 2019

In our last search engine post, Internet of Things Searching, we discussed services that scan the Internet every day and give users easy ways to drill down into the results. Users can run detailed searches for routers, webcams, RDP services, Nginx web servers, SCADA controllers or whatever else, and can access robust APIs so the search data can be integrated into other systems. Today we talk about basically the opposite: user-curated, decentralized general search.

YaCy is a decentralized, peer-to-peer search engine which can be run locally. Being decentralized means that it runs on volunteer nodes around the world, and the creators have no idea how many people are using it or who they are. No search engine company knows about your search data, but it also means that the quality of results depends on having a critical mass of people using it.

YaCy is a Java application, so it will run on any major platform. It can easily be used to search a local network or to crawl sites, and it comes with an easy-to-use web-based administration interface. This makes it a great choice for personal use at home, and it is configurable enough to serve as a search portal for a business.
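Besides the web interface, a local node can also be queried from a script. Here is a minimal sketch, assuming a YaCy peer listening on its default port 8090 and exposing the `yacysearch.json` endpoint with the `query`, `maximumRecords` and `resource` parameters. The function names are made up for illustration, and the response layout assumed in `extract_hits` may differ between YaCy versions, so check it against your own node.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def yacy_search_url(query, base="http://localhost:8090", count=10, resource="global"):
    """Build the URL for a JSON search request against a YaCy peer.

    resource="local" restricts the search to this node's own index
    (assumed parameter values: "global" or "local").
    """
    params = {"query": query, "maximumRecords": count, "resource": resource}
    return f"{base}/yacysearch.json?{urlencode(params)}"


def extract_hits(payload):
    """Return (title, link) pairs from a decoded YaCy JSON response.

    Assumes the layout {"channels": [{"items": [{"title": ..., "link": ...}]}]}.
    """
    hits = []
    for channel in payload.get("channels", []):
        for item in channel.get("items", []):
            hits.append((item.get("title", ""), item.get("link", "")))
    return hits


def yacy_search(query, **kwargs):
    """Send the query to the peer and return its hits (needs a running node)."""
    with urlopen(yacy_search_url(query, **kwargs), timeout=30) as response:
        return extract_hits(json.load(response))
```

With a node running, `yacy_search("open source", resource="local")` would return only what the local index knows about, which is a quick way to check what your own crawls have picked up.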

One thing to note in terms of preserving privacy is that YaCy does not use HTTPS. So if you’re using YaCy as an external search engine, you should send all data through an encrypted tunnel, for example a VPN connection.

The graph above, provided by the web interface, shows the network connections of a YaCy instance, with my local node at top center.

Another search alternative worth mentioning is SearX, an open source metasearch engine that queries dozens of other search engines and aggregates the results. There is no user tracking or profiling, and it can be queried from the Tor network.
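To make the idea of aggregation concrete, here is a toy sketch of merging ranked result lists from several engines. It is not SearX's actual scoring code: the `aggregate` function, the reciprocal-rank scoring and the engine names are all assumptions chosen for illustration.

```python
def aggregate(results_by_engine):
    """Merge per-engine result lists, deduplicate by URL, and rank.

    results_by_engine maps an engine name to its ordered list of URLs.
    Each URL earns 1/position from every engine that returned it, so a
    result reported by several engines rises to the top.
    """
    merged = {}
    for engine, urls in results_by_engine.items():
        for position, url in enumerate(urls, start=1):
            entry = merged.setdefault(url, {"url": url, "engines": [], "score": 0.0})
            entry["engines"].append(engine)
            entry["score"] += 1.0 / position
    # Highest combined score first.
    return sorted(merged.values(), key=lambda entry: entry["score"], reverse=True)


if __name__ == "__main__":
    combined = aggregate({
        "engine_a": ["https://example.org/x", "https://example.org/y"],
        "engine_b": ["https://example.org/y", "https://example.org/z"],
    })
    for entry in combined:
        print(f'{entry["score"]:.2f}  {entry["url"]}  via {", ".join(entry["engines"])}')
```

Nothing in this ranking depends on who is asking, which is the point of a non-personalized metasearch: the same query yields the same merged list for everyone.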

SearX has a flexible syntax and API, allowing you to do highly specific searches or query particular search engines. Best of all, you can easily set up and run your own instance for maximum privacy. SearX gives you raw, combined results that are never personalized.
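As a rough illustration of the API, the sketch below builds a JSON search request and summarizes the response. It assumes a self-hosted instance at `http://localhost:8888` (a hypothetical address; substitute your own) with JSON output enabled, which many public instances turn off. The helper names are invented for this example, and the engine names passed in `engines` must match engines enabled on the instance.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def searx_search_url(query, base="http://localhost:8888", engines=None):
    """Build a SearX search URL that asks for JSON output.

    engines is an optional list of engine names to restrict the query to;
    it is sent as a comma-separated "engines" parameter.
    """
    params = {"q": query, "format": "json"}
    if engines:
        params["engines"] = ",".join(engines)
    return f"{base}/search?{urlencode(params)}"


def summarize(payload):
    """Return (engine, title, url) tuples from a decoded SearX JSON response."""
    return [
        (result.get("engine"), result.get("title"), result.get("url"))
        for result in payload.get("results", [])
    ]


def searx_search(query, **kwargs):
    """Run the query against the instance (needs a reachable SearX server)."""
    with urlopen(searx_search_url(query, **kwargs), timeout=30) as response:
        return summarize(json.load(response))
```

The same engine selection can usually be expressed in the query itself with a `!` shortcut, for example `!wp yacy` to ask only Wikipedia, assuming that shortcut is configured on the instance.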

The folks behind SearX run a public instance at searx.me, and there is a list of public nodes here if you don’t feel like running your own, as well as instances running on Tor. Running your own instance is still the recommended option.

Search engines are powerful aggregators of information about us, our preferences, our devices, our health and so much more. As with any powerful tool, we should be thoughtful in how we use them. Everyone should think about what type of information they are sharing with their searches, which company that data is being shared with, and what other profile data it can be correlated with.