Search Engines and Google

Wednesday, 24 July 2019

Google has long dominated the search engine market, holding a 77% share in a recent survey, but many have expressed concerns about the search giant. The immensity and complexity of this search behemoth are nothing short of amazing.

According to internetlivestats.com, “Google now processes over 40,000 search queries every second on average, which translates to over 3.5 billion searches per day.” A single Google search query uses 1,000 servers, completes in 0.2 seconds and travels over 1,500 miles.
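As a quick sanity check on the quoted figures, the per-second rate can be converted to a daily total. This is only a back-of-the-envelope sketch; the variable names are illustrative, and the only input is the 40,000 figure from the quote above.

```python
# Convert the quoted per-second query rate into a per-day total.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day

queries_per_second = 40_000  # lower bound quoted from internetlivestats.com
queries_per_day = queries_per_second * SECONDS_PER_DAY

print(f"{queries_per_day:,} searches per day")  # 3,456,000,000
```

Exactly 40,000 queries per second works out to roughly 3.46 billion per day, so the quoted "over 3.5 billion" is consistent with a rate slightly above 40,000 per second.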

The principal concern for privacy advocates is Google’s business model, which is based on building user profiles and monetizing users’ behavioral data. The incredible power Google has to build a composite profile comes from the wide range of Google services people use.

Google gathers data from the Google search engine, Google Calendar, Gmail, Picasa photo sharing, YouTube, Blogger, Google Docs, credit card info via Google Checkout, news feeds via Reader, sites visited via Google Analytics, file storage via Google Drive, smartphone usage data via the Android operating system, app usage via Google Play, location information via Google Maps and Waze, ad viewing and click data via Google AdSense and DoubleClick, browser-specific information via Google Chrome, and more.

There are other sources as well, but these are the most important ones in our view. Over time, logged search data reveals an incredibly detailed portrait of a user. The personal and professional details of the user are revealed along with health and medical information, political preferences, interests and aspirations, hobbies, sexual orientation, financial situation, travel plans, religious beliefs, shopping interests and so much more.

Every letter typed, every phrase searched for, every result clicked on is significant and is recorded, because this data has an impact on Google’s profitability. However, when people know their activities are being recorded, they act differently. This is strikingly evident in activities like searching. For example, monitoring people’s behavior suppresses searches for health information.

Drawing conclusions from search data can be highly misleading too, as search terms do not clearly reflect user intent. A search query like “presidential assassination” does not necessarily indicate that the user is considering committing this crime; the searcher could be a student writing a history paper. A searcher who queries “growing marijuana” might be someone who wants to learn how to grow it, but could also be a parent concerned about growing marijuana use in schools.

The combination of misleading search queries, ease of government access, the lack of transparency from Google and others, and the inability of users to know how this data is used creates a problematic privacy issue.

Less obvious but problematic nevertheless is the filter bubble effect. Search results are biased based on your search history, and the results you see are those which are most likely to get clicked. So if you’re researching, you get tailored content instead of useful results. On a global scale it creates polarization, inhibits journalists from effectively researching topics, and reinforces the notion that many are living in online echo chambers – all for the sake of advertising profits.
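The filter bubble effect can be illustrated with a toy model. The sketch below is purely hypothetical and is not how Google or any real search engine ranks results: the function `personalized_rank`, the `topic` and `relevance` fields, and the scoring rule are all assumptions made up for illustration. It simply adds a boost for topics the user has clicked before, so two users issuing the same query see different orderings.

```python
from collections import Counter


def personalized_rank(results, history):
    """Order results by base relevance plus a boost for previously clicked topics.

    Hypothetical scoring rule for illustration only.
    """
    topic_clicks = Counter(history)
    total_clicks = sum(topic_clicks.values()) or 1  # avoid division by zero

    def score(result):
        # Share of the user's past clicks that fall in this result's topic.
        affinity = topic_clicks[result["topic"]] / total_clicks
        return result["relevance"] + affinity

    return sorted(results, key=score, reverse=True)


# The same three results for the same query (made-up data).
results = [
    {"title": "Study finds benefits", "topic": "pro", "relevance": 0.6},
    {"title": "Study finds risks", "topic": "con", "relevance": 0.6},
    {"title": "Neutral overview", "topic": "neutral", "relevance": 0.7},
]

# Two users with different click histories.
user_a = personalized_rank(results, ["pro", "pro", "pro"])
user_b = personalized_rank(results, ["con", "con", "neutral"])

print([r["title"] for r in user_a])
print([r["title"] for r in user_b])
```

In this toy setup the neutral overview has the highest base relevance, yet neither user sees it first: each gets the result that matches what they clicked on before, which is the echo chamber dynamic described above in miniature.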

To be fair, search engines want to analyze search data in order to improve search results; the results that get clicked on most are the ones people are essentially voting for, which is a proxy for being the most relevant or useful. Search logs may also protect the search service from attack, and provide data that helps it counter those who would game the system in order to attract more visitors. But Google keeps far more data than is needed to maintain quality results, so privacy-minded individuals should really consider using a different search service that does not operate in such an invasive manner, does not have the ability to correlate data from so many sources, and does not base its business on the resale value of personal data.

In ethical, and increasingly legal, terms, one of the principal issues is the secondary use of the data, specifically search terms and the associated user profile data. By searching, users are explicitly giving consent to use these terms or phrases to generate a search result set. However, when this search data is combined with information from other sources it becomes a powerful indicator, and the user may not be aware of what happens to a search history that is logged and aggregated over months or years.

For these reasons, many privacy-conscious people have increasingly been choosing alternative search engines in recent years. We’ve also seen a surge in specialized search engines that are doing a better job of serving niches. This started with the major search engines like Google Search and Bing offering image and video searches, and has exploded in recent years to supplement the general-purpose search services.

In the 1990s there were a variety of competitive search engines and search aggregators, but for years now this space has been dominated by Google Search. It still dominates search, yet specialized search services are thriving too. In upcoming posts, I plan to explore decentralized search engines, IoT search engines, OSINT-focused search engines, search services specializing in finding technical, medical and legal documents, and more. Stay tuned for a closer look at searching.