All About Metadata
I'm sure many of you have heard this description before from various news outlets, but to put it succinctly: metadata is data about data. There are two kinds of metadata: structural metadata and descriptive metadata.
Structural metadata provides information about the design and specification of data structures. Think of structural metadata as the blueprints to a house. In this analogy, the house is the "data" and the blueprints are the "data about the data". Simply looking at the house, you might see a nicely painted wood paneled exterior; on the inside you might see things like light switches on walls, running water in the bathrooms, etc. The blueprints (or structural metadata) will tell you where the wires run within the walls to connect the light switch to the light, or where the pipes run under the house to enable running water - things you cannot see with the naked eye. Likewise, structural metadata provides information about the structure of the data that is not necessarily obtainable simply by observing the data.
Descriptive metadata provides information about "individual instances of application data", the data content (to borrow from Wikipedia). Continuing the blueprint analogy, descriptive metadata provides information like the name of the paint color on the nice wood paneled exterior, or the name of the style of lamp hanging from the ceiling in the dining room. Descriptive metadata provides information describing the data content (or in my blueprint analogy, provides information describing the home's content).
The information provided by metadata varies based on media type. The information provided in photo metadata, for example, is different from the information provided in video metadata, which in turn is different from the information provided in the metadata from a phone call. Photo metadata, for instance, can tell you the dimensions of the image in pixels. Since phone calls do not have dimensions or pixels, phone call metadata does not have this information, and instead provides information relevant to its medium: call duration, call location, etc.
Since my expertise is photo metadata, I will be focusing on that (though I have linked some articles at the end of this post if you are interested in learning more about metadata in other media).
When looking at photo metadata, there are several 'sections' or 'categories' with their own sets of pertinent information: File Properties, IPTC Core, IPTC Extension, Exif (Camera Data), GPS, Audio, Camera RAW (when applicable), Video, DICOM, and Mobile SWF. Not all of the information metadata can convey is recorded automatically; much of it must be entered by the user. Such is the case with IPTC Core, IPTC Extension, and DICOM. We will ignore Audio and Video, as those only factor in with audio and filmed media; the same goes for Mobile SWF. Camera RAW metadata is only applicable to images which have been exported from the RAW file format to another format (such as JPEG). The RAW file format is used by professionals and serious hobbyists, but the day-to-day user will likely never encounter it.
Thus, in looking at the information we can glean from metadata in the average user's photo, we will focus on "File Properties", "Exif", and "GPS".
The File Properties "sub-section" of photo metadata provides basic information about the image file: the file name, its format or document type (JPEG, GIF, PNG, etc.), date of creation and date last modified, file size, dimensions in pixels and inches, DPI (dots per inch), bit depth, color mode, and color profile. Below is a screenshot example of this. I have opened an image taken on my phone in the file management program Bridge, to take a look at the metadata.
This is a photo I took of my computer during the process of doing some repairs last year. On the middle right hand side of the image you will see the "Metadata" tab selected, with its first sub-section being "File Properties". You can learn some basic essential details about the image such as when it was taken and the size of the image.
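To make the File Properties idea concrete, here is a minimal sketch of how a program might read a few of those values without any photo software. It is an illustration only, not how Bridge works internally. It assumes a PNG file, because PNG keeps its pixel dimensions at a fixed position in the header, whereas a JPEG would need more parsing. The function name `read_file_properties` is made up for this example.

```python
import os
import struct
from datetime import datetime, timezone

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_file_properties(path):
    """Return basic file properties for a PNG file as a dict.

    Hypothetical helper for illustration; it handles PNG only.
    """
    info = os.stat(path)
    with open(path, "rb") as handle:
        header = handle.read(24)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG file (this sketch handles PNG only)")
    # Width and height are big-endian unsigned 32-bit integers in the IHDR chunk.
    width, height = struct.unpack(">II", header[16:24])
    return {
        "file_name": os.path.basename(path),
        "format": "PNG",
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        "width_px": width,
        "height_px": height,
    }
```

Note that the file size and modification date come from the file system rather than from inside the image, which is why they can change when a file is copied or re-saved.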
Exif stands for Exchangeable image file format, a metadata standard that catalogs detailed camera data. It covers both structural and descriptive essentials about an image: camera make and model, lens focal length, ISO, shutter speed, aperture, image resolution, etc.
Below is a screenshot from Bridge, of the same image as above, highlighting the Exif data.
Based on this information alone, you can determine not just the camera's make and model, but the type of environment the image was taken in: whether it was bright or dim, night or day.
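The bright-or-dim inference can be sketched with the standard exposure value formula, which combines aperture, shutter speed, and ISO into one number. The formula is standard photography math, but the thresholds in `describe_light` are rough assumptions chosen for illustration, and both function names are hypothetical.

```python
import math


def exposure_value(f_number, shutter_seconds, iso):
    """Return the exposure value normalized to ISO 100 (EV100)."""
    return math.log2(f_number ** 2 / shutter_seconds) - math.log2(iso / 100)


def describe_light(ev100):
    """Map EV100 to a rough lighting guess.

    The cut-off values are illustrative assumptions, not a standard.
    """
    if ev100 >= 13:
        return "bright daylight"
    if ev100 >= 9:
        return "shade, overcast, or a very bright interior"
    if ev100 >= 4:
        return "typical indoor lighting"
    return "dim or night"
```

A higher value means the camera was set up for more light, so a photo shot at f/16, 1/125 s, ISO 100 lands in the daylight range, while f/1.8, 1/30 s, ISO 1600 lands in the dim range. This only tells you what the camera was set for; a badly over- or under-exposed photo would mislead it.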
GPS is more of a concern on phones than regular cameras. If you have a smartphone with a camera, you can bet that every time you snap a photo the camera is logging your location in the image's metadata. As of right now, the same is not true of ordinary cameras; however, the push to connect ordinary cameras to wifi or a network of some sort is on the rise. Such a thing would enable GPS coordinates to be embedded into the metadata of photos taken on ordinary cameras as well as smartphones.
The specific GPS information logged by metadata includes latitude and longitude, altitude, and image direction. While latitude and longitude alone are not enough to pinpoint your location exactly, the other information provided (altitude and image direction) is enough to narrow your location to within about 10-20 feet.
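Exif records latitude and longitude as degrees, minutes, and seconds, with a separate reference letter (N, S, E, or W) for the hemisphere. Here is a minimal sketch of converting that into the decimal degrees most mapping tools expect. The function name `dms_to_decimal` is made up for this example.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees.

    Hypothetical helper for illustration. South and west come out negative.
    """
    if ref not in ("N", "S", "E", "W"):
        raise ValueError(f"unexpected hemisphere reference: {ref!r}")
    decimal = degrees + minutes / 60 + seconds / 3600
    return -decimal if ref in ("S", "W") else decimal
```

For example, 40 degrees, 26 minutes, 46 seconds north works out to roughly 40.4461 in decimal degrees.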
The NSA and Photo Metadata
Based on the information revealed in the Snowden files last year, we learned that the NSA is not simply collecting intelligence on suspicious domestic or foreign individuals with suspected or proven ties to terrorism/crime (as was the general assumption by the American public up to that point); the NSA is collecting information on US citizens, warrantlessly, and engaging in what is called "Dragnet Surveillance". Dragnet Surveillance is exactly what it sounds like: dragging a net through the miasma that is the digital world, collecting everything that falls into it. What this translates into is that data is being collected on a large number of US citizens who have done nothing wrong, and for whom there is no suspicious activity or criminal/terrorism ties to merit surveillance.
So what exactly is the NSA collecting?
-Phone call metadata
-Emails, Facebook posts, instant messages
-Large amounts of internet traffic/more metadata
In those categories, images are often and easily collected in their entirety, or the metadata is logged.
What Can Be Learned from Photo Metadata
You can learn a lot about the content of an image, as well as the environment in which it was shot, by reading metadata without ever actually looking at the content of the image itself. However, you can also learn a lot that has nothing to do with the image itself, but rather, the person taking the image.
I'll start with the first point: what metadata can say about an image (that sometimes even the image itself doesn't say). Below is the metadata from an unnamed photo. From the metadata alone, here is what someone with a competent understanding of both photography and metadata could learn.
1) This image was created by a serious hobbyist or professional, and has more than likely been edited.
The image originates from the Lightroom application, which means the JPEG file was created from an export from the program. Lightroom is a batch image editing program, designed for advanced hobbyist/enthusiasts and professionals to edit large numbers of photos at a time, then export them as a group. It is not a file management program, therefore if the image originates from Lightroom, the original file was imported, likely edited, then exported as a new JPEG file with the changes.
The image was taken on a professional grade DSLR camera (Nikon D300s), the exposure mode was set to manual, and the lens used is not a "kit" lens (a lens provided in a package bundle with a body at a discount), so the individual had to go out and purchase this lens specifically - all indicating an individual who has a competent degree of technical understanding and skill, and who is trying to achieve an image quality beyond that of the average consumer.
2) This image was taken in a well lit room indoors, or in the shade outdoors. The image is not a landscape or an environmental shot, and the subject was likely near the photographer, perhaps within 25-50 feet.
The lens used was a fixed focal length lens, a 35mm f/1.8 lens. This type of lens is very versatile and is ideal for a wide array of environments, but is not good for anything too close or too far away. However, due to the aperture, shutter speed, exposure, and ISO, it is unlikely to be a landscape or environmental shot, as landscapes reflect a large amount of light at the camera, and the settings would have needed to be drastically different. Therefore, whatever this is a photo of is likely near the photographer. Again, judging based on the aperture, shutter speed, exposure, ISO, and lack of flash, one can surmise that the environment had fairly balanced lighting: bright but not so bright as to be hard on the eyes, or to wash out the subject. Therefore, likely indoors in a well lit room, or outdoors in the shade on a sunny day.
However, the most telling piece of information listed doesn't have much to do with the tech specs: It's the body serial number. To the ordinary person this will not reveal anything. However, to an organization like the NSA, the serial number is a name. With access to purchase records, the NSA can track the body to the original purchaser. Which, in this case, is me. I'll explain why that's important in the following section.
You can see the original image below.
Those inferences, however, are made with a rather limited focus. To really squeeze all the intelligence out of photo metadata you can, you need to look at the bigger picture.
The Big Picture
Photo metadata, and all other types of metadata (phone calls, emails, etc.), can provide a lot of information about the content of the image, file, or digital event, sometimes more than can be gained by viewing or observing the source. But when you look at the metadata of several photos or files together, patterns begin to emerge.
Here's an example:
Joe is an older man, getting ready to retire next year. He isn't big into having every new smartphone, tablet (in large and small sizes, now), laptop, and TV that comes out every few months. He lives a relatively solitary life with his wife, and is a fairly private man. He isn't big into social media or sharing every little thing - he doesn't have Facebook or Google+ or Twitter.
He works on the pier. He lives nearby and when he leaves the house after getting ready, shortly before dawn, he's just in time for the sunrise. Joe does have an iPhone, and often takes a photo of the sunrise if it is particularly beautiful that day. It isn't something he shares on the web; he mostly just saves the photos to a folder on his computer, but he will occasionally email his favorites to friends and family.
If you look at the photo metadata of all the photos that are taken on Joe's phone over a period of time, you can determine a lot of things about him that have nothing to do with his photos. Firstly, you can learn that Joe has a habit: photographing sunrises on a regular basis, generally around the same area. By using the time stamps to pick out the photos taken at sunrise over the course of a month, and then looking at the location of each, you could determine the path he walks to work every day. By looking at the variation in the times the photos were taken, you can determine the window of time in which he is out walking.
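Here is a minimal sketch of the kind of pattern-finding described in Joe's example. Everything in it is hypothetical: the `PhotoRecord` type, the function names, and the three sample records (dates and coordinates are made up). It assumes the time stamp and GPS position have already been pulled out of each photo's metadata.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PhotoRecord:
    taken_at: datetime  # time stamp from the photo's metadata
    latitude: float     # decimal degrees
    longitude: float    # decimal degrees


def time_window(records):
    """Return the earliest and latest time of day across all records."""
    times = [record.taken_at.time() for record in records]
    return min(times), max(times)


def path_by_time_of_day(records):
    """Order locations by time of day to approximate the route walked."""
    ordered = sorted(records, key=lambda record: record.taken_at.time())
    return [(record.latitude, record.longitude) for record in ordered]


# Made-up sample data standing in for a month of sunrise photos.
records = [
    PhotoRecord(datetime(2014, 3, 3, 6, 12), 36.9620, -122.0230),
    PhotoRecord(datetime(2014, 3, 5, 6, 5), 36.9635, -122.0215),
    PhotoRecord(datetime(2014, 3, 10, 6, 21), 36.9605, -122.0245),
]
```

Sorting by time of day rather than by date is a deliberate simplification: if Joe walks the same route each morning, photos taken earlier in the morning should sit closer to his house and later ones closer to the pier.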
And, if you are the NSA, you have the resources to do even more.
Which brings me to my next point: cross-referencing.
Back to the example about Joe:
Joe takes his sunrise photos on an iPhone. The iPhone, like all other smartphones, provides accurate time, date, and location information within the image's metadata. With access to all of the major mobile providers' user and call metadata, it would not be difficult to cross reference the GPS data from one of Joe's sunrise images with pings off nearby cell towers around the time the image was taken, to determine which subscriber was taking that photo. Once a subscriber's name has been ascertained, they can connect that photo, and all the inferences they made from it, to the larger network that is that person's digital footprint (at least that which the NSA has mapped and connected to them): Facebook, Google+, Twitter, Myspace, Tumblr, Pinterest, email, instant messenger, Skype and FaceTime, website memberships, phone call and text message records, the list goes on.
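The cross-referencing step can be sketched as a simple match on time and distance. This is a toy illustration under stated assumptions, not a description of any real system: the `TowerPing` record, the `candidate_subscribers` function, and the default thresholds (2 km, 10 minutes) are all invented for this example. Distance is computed with the standard haversine formula.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TowerPing:
    subscriber_id: str    # hypothetical identifier for a phone subscriber
    seen_at: datetime     # when the phone was seen by the tower
    tower_latitude: float
    tower_longitude: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def candidate_subscribers(photo_time, photo_lat, photo_lon, pings,
                          max_km=2.0, max_gap=timedelta(minutes=10)):
    """Return subscriber IDs whose pings are near the photo in time and space.

    The default thresholds are arbitrary assumptions for illustration.
    """
    matches = set()
    for ping in pings:
        close_in_time = abs(ping.seen_at - photo_time) <= max_gap
        close_in_space = haversine_km(
            photo_lat, photo_lon, ping.tower_latitude, ping.tower_longitude
        ) <= max_km
        if close_in_time and close_in_space:
            matches.add(ping.subscriber_id)
    return matches
```

A single photo would likely match many subscribers near a busy tower. Repeating the match across several photos and keeping only the subscribers that appear every time is what would narrow the list down.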
Connecting the Pieces
The NSA has access to such a broad array of metadata, public data, and private user data, that they are able to make very accurate assumptions and educated inferences about people and the lives they lead. Effective mass surveillance is like putting together a puzzle. Individually each puzzle piece doesn't mean much. If someone gives you a single puzzle piece to a 200 piece puzzle of a bridge, you won't have any idea that is what it is supposed to eventually create - the part of the image on the puzzle piece is incomplete and nonsensical, with little to no meaning provided to the observer without at least some of the rest of the pieces.
However, once enough pieces have been gathered, they can start to be put in the right order, pieced together, connections drawn; a picture starts to emerge and the viewer can see how all of the separate pieces are interconnected.
Here is an example.
Look at the following three sentences:
"He has a long criminal record of petty crime."
"Jim's New Year's resolution is trying to start fresh and get his life on track."
"Sally has a new boyfriend, but she wants to keep the relationship under the radar for a while."
When taken as three individual and unconnected sentences, there is no real meaning behind them. However, if you put these three sentences together in the right way, that changes:
"Sally has a new boyfriend, but she wants to keep the relationship under the radar for a while. He has a long criminal record of petty crime, but Jim's New Year's resolution is to start fresh and get his life on track."
Suddenly, a story is told, and fairly accurate inferences can be made about Sally and Jim, as well as their situation, that could not be made before. Mass surveillance works the same way. With each new seemingly innocuous data point accumulated by the NSA, they have another piece to the puzzle. Once they have enough pieces, a picture starts to emerge. With the added ability to cross reference other data points, the picture ends up being something we ourselves can't even see (at least yet).
If you are interested in learning more about metadata and the NSA's spying/surveillance tactics check out these great articles and resources:
FAQ: What You Need to Know About the NSA’s Surveillance Programs
NSA Files: Decoded
Prism (Surveillance Program)
Global Surveillance Disclosures (2013 - Present)