A couple of weeks ago, The Guardian published new information about XKeyscore, a program operated by the NSA that gives the government the capability to search emails, social media activity and browsing history, among other data.
The leaked presentation offers fascinating insight into the program, its reach and its capabilities.
Data is collected around the world at approximately 150 sites, which appear to be clustered around Central America, Europe and the Middle East. The system is pitched as unique because of its flexibility: the operator can go “shallow” or “deep” when querying it.
“Shallow” queries make it possible to analyze large data sets, or to monitor real-time activity (tipping) when the data rate is too high for deeper analysis.
Because much of the time people spend on the web involves actions that are anonymous in some sense, XKeyscore is built to detect anomalies in the traffic that can lead to intelligence and, in turn, trigger traditional tasking.
As for querying the system, the slides show that its power lies in looking for anomalous events, and they give specific examples such as:
- Someone whose language is out of place for the geographic region they are in.
- Someone using encryption, which would signal they have something to hide.
- Someone searching the web for suspicious information.
- Show all encrypted Word documents from Iran.
- Show all encryption usage in Iran.
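To make the idea of an anomaly query a little more concrete, here is a toy sketch that filters hypothetical session records. The `Session` fields, the `EXPECTED_LANGUAGES` table and the sample data are all invented for illustration; nothing here reflects how XKeyscore actually stores or queries its data.

```python
from dataclasses import dataclass


@dataclass
class Session:
    # Hypothetical metadata record; the field names are illustrative only.
    session_id: str
    country: str     # country inferred from the IP address
    language: str    # language detected in the traffic (ISO 639-1 code)
    encrypted: bool  # whether the session used encryption


# Hypothetical table of languages considered "expected" for each country.
EXPECTED_LANGUAGES = {
    "PK": {"ur", "en", "pa"},
    "IR": {"fa", "en"},
}


def language_out_of_place(sessions):
    """Return sessions whose language is not expected for their country."""
    return [
        s for s in sessions
        if s.country in EXPECTED_LANGUAGES
        and s.language not in EXPECTED_LANGUAGES[s.country]
    ]


def encryption_usage(sessions, country):
    """Return every encrypted session originating from the given country."""
    return [s for s in sessions if s.country == country and s.encrypted]


sessions = [
    Session("a", "PK", "ur", False),
    Session("b", "PK", "de", False),  # German spoken in Pakistan
    Session("c", "IR", "fa", True),
    Session("d", "IR", "en", False),
]

print([s.session_id for s in language_out_of_place(sessions)])  # ['b']
print([s.session_id for s in encryption_usage(sessions, "IR")])  # ['c']
```

The point is that each "anomaly" is just a predicate over metadata; the hard part in a real system is the scale and the quality of the inferred fields, not the query itself.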
XKeyscore extracts and stores authorship information from documents, which gives it the capability to trace where a document originated.
According to the presentation, data volumes are so high that collected data never leaves the sites; instead, it is deleted after a finite period, once it has been run through extraction plugins that index and store the metadata.
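The split between short-lived raw content and longer-lived metadata can be sketched as a small store with pluggable extractors and a retention purge. This is a minimal sketch under stated assumptions: `SiteStore`, `email_plugin` and the three-day retention window are hypothetical choices for illustration, since the presentation only says data is deleted after "a finite time".

```python
from dataclasses import dataclass


@dataclass
class Capture:
    capture_id: int
    captured_at: float  # seconds since epoch
    content: str


def email_plugin(capture):
    """Hypothetical extraction plugin: keep tokens that look like email addresses."""
    return {"emails": [tok for tok in capture.content.split() if "@" in tok]}


class SiteStore:
    def __init__(self, plugins, retention_seconds):
        self.plugins = plugins
        self.retention_seconds = retention_seconds
        self.raw = {}       # capture_id -> Capture (full content, short-lived)
        self.metadata = {}  # capture_id -> merged plugin output (kept)

    def ingest(self, capture):
        """Store the raw capture and run every plugin over it to build metadata."""
        self.raw[capture.capture_id] = capture
        merged = {}
        for plugin in self.plugins:
            merged.update(plugin(capture))
        self.metadata[capture.capture_id] = merged

    def purge(self, now):
        """Delete raw content older than the retention window; metadata survives."""
        expired = [cid for cid, c in self.raw.items()
                   if now - c.captured_at > self.retention_seconds]
        for cid in expired:
            del self.raw[cid]
        return expired


DAY = 24 * 3600
store = SiteStore([email_plugin], retention_seconds=3 * DAY)  # arbitrary window
store.ingest(Capture(1, 0.0, "hello alice@example.com"))
store.ingest(Capture(2, 4.0 * DAY, "from bob@example.org"))

print(store.purge(now=5.0 * DAY))  # [1]
print(sorted(store.raw))           # [2]
print(store.metadata[1])           # {'emails': ['alice@example.com']}
```

After the purge, capture 1's content is gone but its indexed email address is still queryable, which is the trade-off the slides describe for coping with high data volumes.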
Traditionally, collection is triggered by a strong-selector event (for example, a known email address) when the target is already known, but the system can work back from an anomalous event to a strong selector, and can tie in with other existing systems to allow collection after the event.
The slides also outline examples of using this approach to analyze data:
- Finding a target who speaks German in Pakistan.
- Finding someone who has used Google Maps to scope out target locations.
- Finding who wrote a document, and how it has been passed around among numerous people.
- Finding all Excel spreadsheets coming out of Iraq and mapping their IP addresses. (Note: the slide for this point mentions MAC addresses, which is incorrect.)
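Tracing who wrote a document and how it travelled can be approximated by ordering every observation of the same document in time. The sketch below assumes hypothetical `Sighting` records that pair a content fingerprint with an embedded author field and a sender/recipient; it is an illustration of the idea, not the method used by XKeyscore.

```python
from dataclasses import dataclass


@dataclass
class Sighting:
    doc_hash: str   # fingerprint of the document content
    author: str     # author field from the document's embedded metadata
    sender: str
    recipient: str
    seen_at: int    # timestamp of the observation


def trace_document(sightings, doc_hash):
    """Return (author, earliest sender, ordered hand-off chain) for one document."""
    hits = sorted((s for s in sightings if s.doc_hash == doc_hash),
                  key=lambda s: s.seen_at)
    if not hits:
        return None
    chain = [(s.sender, s.recipient) for s in hits]
    return hits[0].author, hits[0].sender, chain


sightings = [
    Sighting("h1", "J. Doe", "carol", "dave", 20),
    Sighting("h1", "J. Doe", "alice", "carol", 10),
    Sighting("h2", "R. Roe", "erin", "frank", 5),
]

print(trace_document(sightings, "h1"))
# ('J. Doe', 'alice', [('alice', 'carol'), ('carol', 'dave')])
```

The earliest sighting is only the earliest *observed* hand-off; a document that first moved over an unmonitored channel would appear to originate later in the chain.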
The presentation also brings TAO (Tailored Access Operations), the NSA's offensive hacking unit, to light; its work gives XKeyscore the capability to report on all exploitable machines in a particular country.
As new web services come online, the system scans the collected metadata for usernames, which are likely reused from service to service, allowing the agency to discover new applications it had no idea about.
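The username-reuse idea reduces to an inverted index from username to the services it appears on. In this toy sketch the `(service, username)` observations, the service names and the helper functions are all hypothetical; it simply shows how a reused handle surfaces a service that was not on the known list.

```python
from collections import defaultdict


def services_by_username(observations):
    """Map each username to the set of services it was observed on."""
    index = defaultdict(set)
    for service, username in observations:
        index[username].add(service)
    return index


def discover_new_services(observations, known_services, watched_usernames):
    """Return services outside known_services on which a watched username appears."""
    index = services_by_username(observations)
    found = set()
    for username in watched_usernames:
        found |= index.get(username, set()) - known_services
    return found


observations = [
    ("mail.example", "zed42"),
    ("forum.example", "zed42"),
    ("chat.new", "zed42"),  # a service nobody was tracking yet
    ("chat.new", "other"),
]

print(discover_new_services(observations, {"mail.example", "forum.example"}, {"zed42"}))
# {'chat.new'}
```

Exact string matching is the simplest possible choice here; common handles produce false matches, so any real correlation would need more signals than the username alone.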
Planned future enhancements centered on higher speeds, better presentation, VoIP, and adding metadata from Google Earth and EXIF tags. Given that the presentation is dated 2008, that future is already here.
VoIP traffic can be collected and reconstructed, but EXIF metadata is particularly interesting. From this metadata you could query for photos taken by a particular camera in a particular geographic region on a particular date. Cross-referencing exposure with time of day could let you determine whether a photo was taken indoors or outdoors.
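The EXIF query described above can be sketched over tag values that have already been extracted into dictionaries. `Model`, `DateTimeOriginal` (stored as `"YYYY:MM:DD HH:MM:SS"`) and `ExposureTime` (in seconds) are standard EXIF tags; the `lat`/`lon` keys assume the GPS tags were already converted to decimal degrees, and the camera name, sample photos, daylight hours and the 1/250 s threshold in `guess_setting` are hypothetical values chosen only for illustration.

```python
from datetime import date, datetime


def parse_exif_datetime(value):
    # EXIF stores DateTimeOriginal as "YYYY:MM:DD HH:MM:SS".
    return datetime.strptime(value, "%Y:%m:%d %H:%M:%S")


def matches(photo, model, bbox, day):
    """True if the photo was taken by `model` inside `bbox` on `day`."""
    lat_min, lat_max, lon_min, lon_max = bbox
    taken = parse_exif_datetime(photo["DateTimeOriginal"])
    return (photo["Model"] == model
            and lat_min <= photo["lat"] <= lat_max
            and lon_min <= photo["lon"] <= lon_max
            and taken.date() == day)


def guess_setting(photo):
    """Crude, hypothetical heuristic: a short daytime exposure suggests daylight."""
    hour = parse_exif_datetime(photo["DateTimeOriginal"]).hour
    exposure = photo["ExposureTime"]  # seconds
    if 9 <= hour <= 16:
        return "outdoors" if exposure <= 1 / 250 else "indoors"
    return "unknown"


photos = [
    {"Model": "ExampleCam X1", "DateTimeOriginal": "2008:06:14 12:30:00",
     "ExposureTime": 1 / 1000, "lat": 48.85, "lon": 2.35},
    {"Model": "ExampleCam X1", "DateTimeOriginal": "2008:06:14 13:05:00",
     "ExposureTime": 1 / 30, "lat": 48.86, "lon": 2.34},
    {"Model": "OtherCam Z9", "DateTimeOriginal": "2008:06:14 12:00:00",
     "ExposureTime": 1 / 500, "lat": 48.85, "lon": 2.35},
]

bbox = (48.8, 48.9, 2.3, 2.4)
hits = [p for p in photos if matches(p, "ExampleCam X1", bbox, date(2008, 6, 14))]
print([guess_setting(p) for p in hits])  # ['outdoors', 'indoors']
```

A real classifier would also need aperture (`FNumber`) and ISO, since exposure time alone depends heavily on those settings; the single threshold here just shows how two metadata fields can be combined into an inference.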