Facial recognition systems measure and match facial feature patterns to identify people. The only sensor required is a camera, so the technology is well suited for use with mobile phones and CCTV cameras. It works well with crowds as well as with individuals, so it is used not only by apps and devices to identify the user, but also at concerts, sporting events and airport terminals.

In the previous post, an introduction to biometric identification systems, we mentioned the complexity of evaluating different use cases for these technologies, and discussed the problems around error rates. We also mentioned inherent bias showing up in facial recognition systems, and now we get to dive in a bit deeper.

Read More

Welcome to the second part of our scapy series of posts. In the previous post we discussed creating custom packets using scapy and gave some simple examples.

In this post we’ll cover common commands for operations on packets like inspecting them and for network operations like sniffing traffic. In the next post we’ll introduce programming with scapy, as it was designed to be used in Python programs.

Read More

Authentication in computer systems means validating that you are who you say you are. There are many ways to authenticate users these days, and they have important implications for privacy. Username and password combinations are problematic, yet we don’t really have great alternatives at this point.

Fingerprints and facial recognition systems are the most common forms of biometric authentication, and are already used in many smartphones. They are certainly convenient, but come with security and privacy problems of their own. If this biometric identification data is collected and stored, it may well be altered or stolen.

Read More

In our last search engines post called Internet of Things Searching we discussed services that scan the Internet every day giving users easy ways to drill down into those results. Users can get detailed searches for routers, webcams, RDP service, Nginx web servers, SCADA controllers or whatever, as well as access robust APIs so these search data can be integrated into other systems. Today we talk about basically the opposite – user curated, decentralized general search.

YaCy is a decentralized, peer-to-peer search engine which can be run locally. Being a decentralized search engine means that it runs on volunteer nodes around the world and the creators have no idea how many people are using it or who they are. That means no search engine company knows about your search data, but also means that the quality of results depends on having a critical mass of people using it.

Read More

Yesterday’s Hacker Public Radio podcast was an introduction to Bitcoin that I recorded recently. The audio is terrible, but hopefully the content is less than terrible. I’m not sure if there’s much of an audience on HPR for Bitcoin and blockchain related content – we’ll see.

Here is what I tried to cover:

  • What is Bitcoin?
  • Blockchains and blocks
  • What are transactions?
  • What are miners and what do they do?
  • Proof of Work in Bitcoin – SHA256 hashing
  • Bitcoin consensus mechanism
  • How do wallets work?
  • Brief discussion about types of wallets and wallet security
Read More
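To make the Proof of Work item in the list above a little more concrete, here is a minimal toy sketch in Python. It is not Bitcoin's real block format: the header bytes, the `mine` helper and the `difficulty_bits` parameter are all made up for illustration. The only parts taken from Bitcoin are the ideas of hashing with SHA256 twice and searching for a nonce that pushes the hash below a target.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """Hash the data with SHA256 twice, as Bitcoin does for block headers."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def mine(header: bytes, difficulty_bits: int, max_nonce: int = 2**32):
    """Try nonces until the hash, read as a big integer, is below the target.

    Toy assumption: the target is simply 2**(256 - difficulty_bits), so a
    higher difficulty_bits means more leading zero bits are required.
    """
    target = 1 << (256 - difficulty_bits)
    for nonce in range(max_nonce):
        digest = double_sha256(header + nonce.to_bytes(4, "little"))
        if int.from_bytes(digest, "big") < target:
            return nonce, digest.hex()
    return None  # no nonce found in the allowed range


# A low difficulty so the search finishes almost instantly.
nonce, digest_hex = mine(b"toy block header", difficulty_bits=12)
print(nonce, digest_hex)
```

Finding the nonce takes many attempts, but anyone can check the result with a single double hash. That asymmetry is what miners are competing on.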

According to the introduction page on the scapy documentation website:

Scapy is a Python program that enables the user to send, sniff and dissect and forge network packets. This capability allows construction of tools that can probe, scan or attack networks.

What that doesn’t tell you is that scapy can be used interactively or imported into your Python programs. It can also do a lot more than the description above claims, such as crafting custom packets for all major network protocols (including weird ones like TFTP), reading and writing packet capture (pcap) files, establishing socket connections, dealing with encryption, sending Ethernet or wireless frames (even invalid ones), and much more.

Read More

The proliferation of cameras in the modern world is no fad; it continues unabated. Here is an image (shown below) from a 2016 patent application from Sony Corporation for a digital camera built into a contact lens. Notice that it has wireless communications capability and a storage unit to offload images and upgrade the software and firmware.

Read More

We talked about Google Search in the previous post on the topic of search engines, but now we’re going to shift gears and take a look at some search engines specialized for searching the Internet of Things (IoT). The 800-pound gorilla in the space is Shodan, which lets users search the Internet for devices by category, protocol and more. Shodan is integrated with network discovery and exploit tools, as well as web browsers.

Common searches might be searching across the Internet for webcams, traffic signals, industrial control systems, specific services running, or devices with default passwords. Results tend to have addresses, hostnames, open ports and other details.

Shodan searches can be for “default password”, SCADA, webcams, RDP services and more

Shown above are the first three results for a search for “default password” that returned 16,959 matches. The result set can be refined by choosing country, services, etc. from the menu in the left column. As you can see, the relevant information is displayed about each device and the software running on it, along with unusual things noted.

If we click on one of the results we navigate to a detail page (shown above) for that particular device. It shows the location of that device on a map (below) along with the GPS coordinates. It also shows us the open ports, along with relevant information about the services listening on those ports.

Shodan can show us entire populations of devices – anything from a model of smart TV to routers with a certain firmware version. Alerts can be set up to create a monitoring system. You might want to check Shodan after installing a new networked device.

Shodan’s command-line client can be installed locally, and the service can be used programmatically via the API. Heavy users can use the API to script common searches for specific devices or ports to great effect. You probably don’t want your insecure webcams, baby monitors or smart TVs showing up in some of these searches, so always check to ensure that you’ve taken reasonable precautions like changing default passwords.
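As a rough idea of what scripting a search might look like, the sketch below only builds the request URL for Shodan's REST search endpoint using the Python standard library. It sends nothing over the network. The `build_search_url` helper, the placeholder API key and the sample query are assumptions made for illustration, and some search filters may need a paid account, so check the official API documentation before relying on any of it.

```python
from urllib.parse import urlencode

# Search endpoint of the Shodan REST API.
SHODAN_SEARCH_URL = "https://api.shodan.io/shodan/host/search"


def build_search_url(api_key: str, query: str, page: int = 1) -> str:
    """Return the URL for one page of search results (hypothetical helper)."""
    params = {"key": api_key, "query": query, "page": page}
    return f"{SHODAN_SEARCH_URL}?{urlencode(params)}"


# Placeholder key and an example query for RDP (port 3389) in one country.
url = build_search_url("YOUR_API_KEY", 'port:3389 country:"US"')
print(url)
```

Fetching that URL with any HTTP client should return JSON describing the matching hosts. A script could loop over a list of queries and flag new results for addresses you own.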

Perhaps the leading competitor is Censys, which has a similar offering. They scan the Internet and organize their findings in a way that is easy to navigate and drill down for details. They also have a full RESTful API and are widely used by enterprise customers.

Another competitor is Thingful, whose specialties include working on locating connected vehicles. Of course they’re connected but they are often moving around, so this solution sounds ideal for operators of fleets of taxis or long-haul trucks.

Then there is Reposify, a company that aims to serve the corporate market for geographically diverse firms that want to monitor their public-facing attack surface. They too offer a robust API and scan the Internet constantly.

Another IoT search service is ZoomEye, which seems to offer very much the same feature set that Shodan does, including detailed search queries and a rich API.

As you can see there are quite a few services that scan the Internet looking for connected devices, certain services and ports, specific devices or manufacturers and much more. I’m sure there are others that we did not include as this is a popular and growing space for search. Next, we’re going to take a look at decentralized search engines – stay tuned!

Photographs have clues that are specific to a camera. Tiny clues, but given a group of photos from the same camera it quickly becomes obvious that they did come from the same source. Once that happens, every photo you ever take is identified as coming from your camera, on your phone, and associated to you.

Most readers at this point would probably think, “Why should I care?” The answer is that this does not just affect investigative journalists. It will affect us all because as AI systems get more capable, they can infer an amazing amount about your life from a collection of seemingly unrelated facts. Bots scour the Internet now, gathering and organizing collections of photos. The current AI technology is not advanced enough to definitively associate all photos to the people who took them, for various reasons. These reasons are just temporary technical roadblocks – they will be overcome in time.

One very sophisticated method of demonstrating that camera images came from the same device has to do with patterns of noise generated by heat in that device. Noise signatures in photographs come from ambient heat, and from heat generated in the camera – inside the charge-coupled device (CCD) itself.

Astrophysicists have long been interested in this, but the easiest solution for them is supercooling the camera, which is not an option for a mobile phone user. Camera manufacturers have built products that heat the CCD just before taking a picture, but to date this is not available in mobile phones.

Mobile phone cameras produce images that display patterns of random noise in the least significant bits (LSB), which basically means the colors are altered only a tiny amount. The alteration is so small that a human eye can’t detect it, but computers can see these patterns and easily determine that an image came from a certain camera.
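To show what "least significant bits" means in practice, here is a small Python sketch. The tiny grid of brightness values is invented, and the `lsb_plane` and `strip_lsb` helpers are illustrative names, not part of any forensic tool. Real analysis works on full images and on statistics across many photos.

```python
def lsb_plane(pixels):
    """Return only the lowest bit of every pixel value (0 or 1)."""
    return [[value & 1 for value in row] for row in pixels]


def strip_lsb(pixels):
    """Return the image with the lowest bit of every pixel cleared."""
    return [[value & ~1 for value in row] for row in pixels]


# A made-up 2x3 grayscale image with values in the 0-255 range.
image = [
    [200, 201, 198],
    [120, 123, 121],
]

print(lsb_plane(image))  # [[0, 1, 0], [0, 1, 1]]
print(strip_lsb(image))  # [[200, 200, 198], [120, 122, 120]]
```

Clearing the lowest bit changes each value by at most 1 out of 255, which is why the eye cannot see the difference even though the bit pattern itself is easy for software to read.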

Unfortunately, for those without a forensics lab, the easiest way to make this difficult to detect is to degrade the image. A quick and easy way to make this signature less obvious would be to save in a lossy format like JPEG, then crop or resize and save again. The obvious problem is that this noticeably makes your photos look worse.

Moreover, you would need to vary this process of saving, altering and saving each time to avoid simply changing the pattern to be detected. As if that weren’t daunting enough already, you probably need to save with fairly high compression so that it really is a lossy process. Noise fingerprinting is not something you can completely mitigate without extreme effort, but in the process of avoiding simpler techniques you can make it more difficult.

So with that out of the way, let’s look at simpler ways of determining that photos come from the same camera; ways that you have more control over. Dust and scratches are an easy way to determine that a group of images all came from the same camera. Tiny specks of dust unnoticeable to the eye can prevent certain pixels from changing in any of your photographs. Facebook has patents for identifying a camera based on faulty pixel positions caused by dust, scratches and similar conditions.
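The idea of pixels that never change can be sketched in a few lines of Python. This is a toy illustration only, not the method described in any patent: the three tiny "photos" are invented, and `constant_pixels` is a hypothetical helper that simply looks for positions holding the same value in every image.

```python
def constant_pixels(images):
    """Return (row, col) positions whose value is identical in every image."""
    first = images[0]
    positions = []
    for r, row in enumerate(first):
        for c, value in enumerate(row):
            if all(img[r][c] == value for img in images[1:]):
                positions.append((r, c))
    return positions


# Three made-up 2x3 grayscale photos. The pixel at row 1, column 2 is
# stuck at 17 in all of them, as a speck of dust might cause.
photos = [
    [[10, 52, 200], [34, 90, 17]],
    [[11, 140, 73], [201, 5, 17]],
    [[250, 8, 64], [77, 133, 17]],
]

print(constant_pixels(photos))  # [(1, 2)]
```

Two sets of photos that share the same list of stuck positions are likely to come from the same camera, and the more photos in each set, the less likely the match is a coincidence.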

Clean the lens of the phone’s cameras with a cleaning cloth made for cameras or eyeglasses to increase your chances of avoiding this issue. Also try to take pictures of a wider physical range than you actually need, and crop the image in the processing phase. Other techniques that can help are resizing the image and adding adjustment layers – both can alter the pixels to avoid those dust dead spots.

This post is a followup to the previous post Cameras, People and Photos. I have a lot more to say about this topic, so stay tuned for the next installment where I want to take a closer look at JPEG files, EXIF data and digital photographs.

Google has long dominated the field of search engines, sporting a 77% market share in a recent survey, but many have expressed concerns about the search giant. The immensity and complexity of this search behemoth are nothing short of amazing.

According to internetlivestats.com, “Google now processes over 40,000 search queries every second on average, which translates to over 3.5 billion searches per day.” A search query using Google uses 1000 servers, happens in 0.2 seconds and travels over 1500 miles.
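As a quick sanity check on the quoted figures, the arithmetic can be done in a couple of lines of Python. The only inputs are the 40,000 queries per second from the quote and the number of seconds in a day.

```python
QUERIES_PER_SECOND = 40_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

queries_per_day = QUERIES_PER_SECOND * SECONDS_PER_DAY
print(f"{queries_per_day:,}")  # 3,456,000,000
```

That comes to roughly 3.46 billion searches per day at exactly 40,000 per second. Since the quote says "over 40,000", a daily total of over 3.5 billion is consistent with it.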

The principal concern for privacy advocates is Google’s business model, which is based on building user profiles and monetizing their behavior data. The incredible power Google has to build a composite profile is based on the wide range of Google services people use.

Google gathers data from the Google search engine, Google Calendar, Gmail, Picasa photo sharing, YouTube, Blogger, Google Docs, credit card info via Google Checkout, news feeds via Reader, sites visited via Google Analytics, file storage via Google Drive, smartphone usage data via the Android operating system, app usage via Google Play, location information via Google Maps and Waze, ad viewing and click data via Google AdSense and DoubleClick, browser-specific information via Google Chrome, and more.

There are other sources as well, but these are the most important ones in our view. Search data that is logged reveals over time an incredibly detailed portrait of a user. The personal and professional details of the user are revealed along with health and medical information, political preferences, interests and aspirations, hobbies, sexual orientation, financial situation, travel plans, religious beliefs, shopping interests and so much more.

Every letter typed, every phrase searched for, every result clicked on is significant and is recorded, because this data has an impact on Google’s profitability. However, when people know their activities are being recorded, they act differently. This is strikingly evident in activities like searching. For example, monitoring people’s behavior suppresses searches for health information.

Drawing conclusions from search data can be highly misleading too, as the search terms do not clearly reflect user intent. A search query like “presidential assassination” does not necessarily indicate the user is considering committing this crime, but could be a student writing a history paper. A searcher who queries “growing marijuana” might be someone who wants to learn how to grow it, but could also be a parent concerned over growing marijuana use in schools.

The combination of misleading search queries, ease of government access, the lack of transparency Google and others provide, and the inability of users to know how this data is used creates a problematic privacy issue.

Less obvious but problematic nevertheless is the filter bubble effect. Search results are biased based on your search history, and the results you see are those which are most likely to get clicked. So if you’re researching, you get tailored content instead of useful results. On a global scale it creates polarization, inhibits journalists from effectively researching topics, and reinforces the notion that many are living in online echo chambers – all for the sake of advertising profits.

To be fair, search engines want to analyze search data in order to improve search results; those results that get clicked on most are the ones people are essentially voting on, which is a proxy for being the most relevant or useful. Search logs may also protect the search service from attack, and provide data that helps them to avoid those who would game the system in order to attract more visitors. But Google keeps far more data than what is needed to maintain quality results, so privacy minded individuals should really consider using a different search service that does not operate in such an invasive manner, does not have the ability to correlate data from so many sources, and does not base its business on the resale value of personal data.

In ethical, and increasingly legal, terms, one of the principal issues is the secondary uses of the data – specifically search terms and the associated user profile data. By searching, users are explicitly giving consent to use these terms or phrases to generate a search result set. However, when search data is combined with information from other sources it becomes a powerful indicator, and the user may not be aware of what happens to their search history that is logged and aggregated over months or years.

For these reasons many privacy conscious people have increasingly been choosing alternative search engines in recent years. We’ve also seen a surge in specialized search engines that are doing a better job serving niches. This started with the major search engines like Google Search and Bing offering image and video searches, and has exploded in recent years to supplement the general purpose search services.

In the 1990s there were a variety of search engines and search aggregators that were competitive, but for years now this space has been dominated by Google Search. They still dominate search, yet specialized search services are thriving too. In upcoming posts, I plan to explore decentralized search engines, IoT search engines, OSINT focused search engines, search services specializing in finding technical, medical and legal documents and more. Stay tuned for a closer look at searching.

Camera technology is not what it was a generation ago. Photographs and videos captured using smartphones have revolutionized how we see the world, how we share information and how we document our lives. But few understand the technologies involved and how they are analyzed. In these next few posts I want to share what I know about this topic, which is limited and will be getting outdated by the time I finish typing this introduction.

Most of us use mobile phones because they are incredibly convenient. The actual phone function is only one of many that we use though; another important feature is the camera. With built-in 12 megapixel cameras being no big deal, we take photos of food, scenery, friends, and of course selfies. What we don’t realize is that it’s important for “big marketing” to identify cameras, and the photos they take, and match them to people.

Surveillance cameras are going through an unprecedented population growth, and feed data into AI systems that identify us by our faces and our gait. They combine this with realtime location data from our phones and suddenly every detail of your activity since you stepped out of your house is known and recorded.

We post our photos to social media sites as if the world needed to know where we ate lunch. We pull them into apps to manipulate them, put cute sunglasses on our selfies and use apps to share them with the entire planet for all of recorded history. Welcome to a tiny preview of what we can expect in the future.

Image credit: Public Library of Science: https://journals.plos.org

People dramatically underestimate how much information is available to anyone analyzing their posted photographs. Aside from information about your camera which is linked to you, a lot of extraneous data can be extracted from photographs. For example, shown above is a system that identifies people not in line of sight of a camera by their reflection in someone’s cornea! In the image above, we see five individuals identified from the reflection in the eye of a person in view of a camera.

Identifying images and determining that they came from a specific camera, or were manipulated with a computer program associated with a specific individual, is important. Photographs are used as evidence in courtrooms, and forensics is used to support claims about their validity. However, if you are determined to post a selfie on a dating site, you should take precautions to not include too much evidence that identifies you.

Next we’re going to take a closer look at camera fingerprinting. You might be surprised at the sophistication of the techniques used.