Research suggests that surveillance agencies could use statistical tricks to peek through the encryption that protects Web browsing.
Stung by revelations about mass government surveillance, consumer Web companies are expanding their use of encryption and releasing more details of those protections to reassure wary customers. Earlier this year, for instance, Apple released details of how communications sent via its iMessage service are encrypted.
New research suggests that the U.S. National Security Agency, or any other organization capable of collecting large quantities of Web traffic, could extract private information from encrypted communications by searching for patterns in that data stream. In tests, analysis of encrypted Internet traffic could reveal which health conditions a person was researching online. Similar techniques could glean information about how a person uses iMessage, such as when they start typing or what language a message is written in. This research relies on an approach known as traffic analysis, which uses statistical techniques to find patterns in encrypted communications.
Researchers at the University of California, Berkeley, and Intel developed a particularly effective version aimed at HTTPS, the form of encryption used to protect websites and visible to Web surfers as a padlock in a browser's address bar. The technique involves having software visit the websites of interest and using machine-learning algorithms to learn the traffic patterns associated with different pages. The attacker then searches a victim's traffic trace for those patterns.
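The train-then-match idea can be illustrated with a toy classifier. This is a minimal sketch, not the Berkeley and Intel researchers' method: it swaps in a simple k-nearest-neighbor classifier over four hand-picked features (packet counts and kilobytes in each direction), and the page labels and packet traces are made-up data. It assumes an observer records each encrypted page load as a list of signed packet sizes, positive for client-to-server and negative for server-to-client.

```python
from collections import Counter
from math import sqrt
from typing import List, Tuple

# Assumption: a trace is a list of signed packet sizes in bytes.
# Positive = client -> server, negative = server -> client.
# Sizes and directions stay visible even though payloads are encrypted.

def features(trace: List[int]) -> List[float]:
    """Reduce a trace to a small feature vector (a hypothetical choice of features)."""
    outgoing = [p for p in trace if p > 0]
    incoming = [-p for p in trace if p < 0]
    return [
        float(len(outgoing)),    # packets sent
        float(len(incoming)),    # packets received
        sum(outgoing) / 1000.0,  # kilobytes sent
        sum(incoming) / 1000.0,  # kilobytes received
    ]

def distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def train(labelled: List[Tuple[str, List[int]]]) -> List[Tuple[str, List[float]]]:
    """'Training' for k-NN just stores feature vectors of traces collected
    by visiting each page of interest ourselves."""
    return [(label, features(trace)) for label, trace in labelled]

def classify(model: List[Tuple[str, List[float]]], trace: List[int], k: int = 3) -> str:
    """Label an observed trace by majority vote among its k nearest training traces."""
    f = features(trace)
    nearest = sorted(model, key=lambda item: distance(item[1], f))[:k]
    votes = Counter(label for label, _ in nearest)
    return votes.most_common(1)[0][0]

# Made-up training traces for two hypothetical pages.
TRAINING = [
    ("page-a", [300, -1500, -1500, -800, 200, -600]),
    ("page-a", [310, -1500, -1500, -820, 210, -590]),
    ("page-a", [295, -1500, -1500, -790, 190, -610]),
    ("page-b", [300, -1500, -1500, -1500, -1500, -1500, -400, 250, -1200]),
    ("page-b", [320, -1500, -1500, -1500, -1500, -1500, -380, 240, -1180]),
    ("page-b", [290, -1500, -1500, -1500, -1500, -1500, -410, 260, -1210]),
]

model = train(TRAINING)
observed = [305, -1500, -1500, -805, 205, -600]  # a hypothetical victim's trace
print(classify(model, observed))  # page-a
```

Real attacks use far richer features and much larger training sets; the point here is only that the classifier never touches decrypted content, just sizes and directions.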
The approach proved capable of identifying the pages for specific medical conditions a person was looking at on the Planned Parenthood and Mayo Clinic websites even though both sites encrypt connections with HTTPS. It could also identify what services a person accessed when he or she logged onto financial sites including Wells Fargo and Bank of America. On average, the technique was about 90 percent accurate at identifying Web pages. A paper on the Berkeley research will be presented at the Privacy Enhancing Technologies Symposium in Amsterdam next month.
Traffic analysis would be a useful tool for surveillance by government programs, such as those used by the NSA to collect and analyze encrypted Internet traffic (see NSA Leak Leaves Crypto Math Intact but Highlights Known Workarounds). Corporations with access to Internet traffic might also have motivation to use it, says Brad Miller, the PhD candidate at Berkeley who led the research.
"There are very valid use cases of this type of analysis for companies," he says. For example, an ISP might want to gain information about its customers' online activity that could be used to target ads, even if those customers have encrypted their browsing or communications. Some ISPs, such as Verizon Wireless, already sell data on their customers' browsing to third parties for such purposes.
Scott Coull, a researcher with the security company RedJack, says the Berkeley work is the latest in a series of papers showing how traffic analysis could be used against consumers. "When you look at the worst case for this kind of attack, things don't look very good," he says.
Coull recently found that traffic analysis can be very effective against messages sent via Apple's iMessage, which are encrypted from the moment they are sent to the moment they are received. "iMessage is by far the worst thing I've seen," he says. Coull was able to identify when users started or stopped typing, were sending or opening a message, the language a message was written in, and its length, with 96 percent accuracy or higher.
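Length leakage of this kind follows from any encryption scheme that preserves message length. The sketch below assumes a hypothetical scheme with a fixed 32-byte overhead per message (a 16-byte nonce plus a 16-byte tag). The toy XOR-keystream cipher is a stand-in chosen only because Python's standard library has no block cipher; it is not how iMessage encrypts messages and is not secure. It shows that an eavesdropper who knows the overhead can recover the exact plaintext length from the ciphertext alone.

```python
import hashlib
import hmac
import os

NONCE_LEN = 16
TAG_LEN = 16
OVERHEAD = NONCE_LEN + TAG_LEN  # fixed per-message overhead of this toy scheme

def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy length-preserving cipher: XOR with a SHAKE-256 keystream plus an HMAC tag.
    Illustration only; not a real or secure messaging protocol."""
    nonce = os.urandom(NONCE_LEN)
    keystream = hashlib.shake_256(key + nonce).digest(len(plaintext))
    body = bytes(p ^ s for p, s in zip(plaintext, keystream))
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()[:TAG_LEN]
    return nonce + body + tag

def estimate_plaintext_length(ciphertext: bytes) -> int:
    """What a passive observer can infer without the key: total size minus fixed overhead."""
    return len(ciphertext) - OVERHEAD

key = os.urandom(32)
for message in [b"ok", b"running late, see you at 8"]:
    ct = toy_encrypt(key, message)
    # The observer's estimate matches the true length exactly.
    print(len(message), estimate_plaintext_length(ct))
```

The ciphertext bytes look random, but their count does not; anything that correlates with length, such as a short reply versus a long one, is exposed to whoever sees the traffic.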
That, combined with the fact that the iMessage protocol transmits a unique identifier for a device, adds up to metadata similar to what the NSA has controversially collected on U.S. phone calls, says Coull. "If I had the ability to monitor a big chunk of traffic to and from the iMessage servers, I could come up with a social network of whom is messaging whom, and the language they're using and the approximate size of the messages," he says.
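Coull's social-network scenario can be sketched as a timing-and-size correlation. This is a hypothetical illustration, not Coull's method: it assumes an observer near the messaging servers sees each encrypted upload and each delivery tagged with a device identifier, a timestamp and a ciphertext size, and it pairs every upload with the earliest unused delivery that arrives within a short window and has the same size. The `Observation` record, the two-second window and the exact-size rule are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, NamedTuple

class Observation(NamedTuple):
    device_id: str  # hypothetical per-device identifier seen in the traffic
    time: float     # seconds since start of capture
    size: int       # ciphertext size in bytes

def link_messages(
    uploads: List[Observation],
    deliveries: List[Observation],
    max_delay: float = 2.0,
    max_size_diff: int = 0,
) -> Counter:
    """Return a Counter mapping (sender, recipient) device pairs to message counts."""
    edges: Counter = Counter()
    pending = sorted(deliveries, key=lambda o: o.time)
    used = set()  # indexes into `pending` already matched to an upload
    for up in sorted(uploads, key=lambda o: o.time):
        for i, down in enumerate(pending):
            if i in used:
                continue
            delay = down.time - up.time
            if 0 <= delay <= max_delay and abs(down.size - up.size) <= max_size_diff:
                edges[(up.device_id, down.device_id)] += 1
                used.add(i)
                break  # match each upload to at most one delivery
    return edges

# Made-up observations: A messages C twice, B messages A once.
uploads = [Observation("A", 0.0, 120), Observation("B", 5.0, 300), Observation("A", 10.0, 80)]
deliveries = [Observation("C", 0.4, 120), Observation("A", 5.3, 300), Observation("C", 10.2, 80)]
print(link_messages(uploads, deliveries))
```

The resulting counts are the edges of a "who talks to whom" graph. Greedy first-match pairing is the simplest possible rule; with heavy traffic, many uploads would share a window and size, so a real analysis would need statistics over many observations rather than single matches.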
The rest is here:
Statistical Tricks Extract Sensitive Data from Encrypted Communications