How deep must DPI be to incur privacy risk?

There has been some debate about just how “deep” the inspection of packets needs to be for the inspection to qualify as DPI. The strictest conception of “deep” draws a line between IP addresses and all other data in the packet, claiming that the use of any data other than the destination IP address constitutes DPI. A slightly more expansive conception makes a distinction between IP headers and the rest of the packet: inspection of any packet data other than IP header fields is considered deep. Chris Parsons classifies three levels of depth: “shallow,” which uses OSI layers 1-3, “medium,” which uses OSI layers 1-5, and “deep,” which uses data at all OSI layers. Paul Ohm has noted that a comparable distinction that telephone companies have historically made is between “content” (the actual telephone conversation) and “addressing” (the numbers dialed).
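As a rough illustration, the three-level classification attributed to Parsons above can be written as a small lookup. This is a minimal sketch: the function name `parsons_depth` is hypothetical, and treating "the highest OSI layer an inspection touches" as the deciding input is an assumption made for the example, not something the classification itself specifies.

```python
def parsons_depth(highest_osi_layer: int) -> str:
    """Map the highest OSI layer an inspection reads to a depth label.

    Thresholds follow the description in the text: layers 1-3 are "shallow",
    layers 1-5 are "medium", and anything reaching all layers is "deep".
    """
    if not 1 <= highest_osi_layer <= 7:
        raise ValueError("OSI layers are numbered 1 through 7")
    if highest_osi_layer <= 3:
        return "shallow"
    if highest_osi_layer <= 5:
        return "medium"
    return "deep"
```

Under this sketch, an inspection that stops at IP headers (layer 3) is "shallow," one that also reads TCP headers (layer 4) is "medium," and one that reads application data (layer 7) is "deep."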

Taking these definitions together with an understanding of the layered structure of Internet packets, a conception of how “deep” DPI needs to be before it takes on privacy implications begins to emerge. It seems quite clear that ISPs’ use of IP headers, at OSI layer 3, is unobjectionable. ISPs have always used IP headers to route packets to their destinations, so this continued use creates no new privacy risk.

TCP headers provide a minimal amount of additional information about an Internet user’s online activities, primarily in the form of port numbers. Many applications adhere to specific registered port numbers. Thus, by inspecting TCP headers, ISPs may glean some limited information about their subscribers’ activities that goes beyond what IP headers reveal. However, in part because web browsing is the dominant form of Internet usage, many non-web applications have migrated to port 80 in order to take advantage of web optimizations. Some applications also use non-standard ports, or they change the ports they use over time. Thus, while TCP headers provide some application-level information, when used in isolation, the privacy risk derived from inspection of TCP headers appears to be minimal.
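To make concrete how little an inspection limited to IP and TCP headers sees, here is a minimal sketch that reads only those header fields from a raw IPv4 packet and guesses the application from the destination port. The function names, the sample addresses, and the small `REGISTERED_PORTS` table are assumptions for illustration; a real port registry is far larger, and as noted above a port is only a hint about the application.

```python
import struct

# Illustrative subset of registered ports (assumption: not a complete registry).
REGISTERED_PORTS = {25: "SMTP (email)", 53: "DNS", 80: "HTTP (web)", 443: "HTTPS (web)"}


def inspect_l3_l4(packet: bytes) -> dict:
    """Read IPv4 and TCP header fields only; the payload is never touched."""
    first = packet[0]
    if first >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    ip_header_len = (first & 0x0F) * 4  # IHL field counts 32-bit words
    if packet[9] != 6:  # protocol number 6 is TCP
        raise ValueError("not a TCP segment")
    src_ip = ".".join(str(b) for b in packet[12:16])
    dst_ip = ".".join(str(b) for b in packet[16:20])
    # The first four bytes of the TCP header are the source and destination ports.
    src_port, dst_port = struct.unpack("!HH", packet[ip_header_len:ip_header_len + 4])
    return {
        "src_ip": src_ip,
        "dst_ip": dst_ip,
        "src_port": src_port,
        "dst_port": dst_port,
        # A guess only: applications may use non-standard ports or share port 80.
        "port_guess": REGISTERED_PORTS.get(dst_port, "unknown"),
    }


def build_sample_packet(dst_port: int) -> bytes:
    """Build a 40-byte IPv4+TCP packet with no payload (checksums left at zero)."""
    ip_header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 40, 0, 0, 64, 6, 0,
        bytes([192, 0, 2, 10]), bytes([198, 51, 100, 7]),
    )
    tcp_header = struct.pack("!HHIIBBHHH", 49152, dst_port, 0, 0, 5 << 4, 0x02, 65535, 0, 0)
    return ip_header + tcp_header
```

Calling `inspect_l3_l4(build_sample_packet(80))` yields the two addresses, the two ports, and the guess "HTTP (web)"; a packet sent to an unlisted port yields "unknown". Nothing in the result says which page was requested or what was sent.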

Determining the user’s privacy interest in packet payloads is a thornier task. Payloads often contain application headers, and many of these headers — such as the HTTP version type and content encodings cited earlier — are fairly innocuous from a privacy perspective. However, other kinds of headers can reveal much more sensitive information about a person’s Internet activities, such as search terms, email recipient addresses, and many other kinds of data.

Even if the level of privacy protection accorded to a particular kind of application header were clear, for most application headers it would be next to impossible for an ISP to pluck out the header information without also inspecting at least a small amount of other data within a payload. Whereas IP and TCP headers have standardized formats, there is no standard format for payloads, nor a mechanism for ISPs to know definitively from IP or TCP header information whether a particular packet will contain an HTTP request, email to/from headers, or any other specific application data. ISPs can make good guesses about what a packet contains by observing the characteristics of the traffic flow – the sequences, sizes, and timing of streams of packets – together with TCP port information, but ultimately, finding application header information within a packet requires inspecting the packet payload itself.
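The point can be sketched in code. Assuming a TCP payload that may or may not hold an HTTP/1.x request, the hypothetical function below tries to extract just the Host header. Even for that single field, it has to read the start of the payload, including the request line, which may itself carry data such as search terms. The function name and the sample request are illustrative assumptions, not a description of any ISP's actual tooling.

```python
from typing import Optional


def find_http_host(tcp_payload: bytes) -> Optional[str]:
    """Return the Host header value if the payload looks like an HTTP/1.x request."""
    # Everything before the first blank line is the request line plus headers.
    head, _, _ = tcp_payload.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    # The request line (method, target, version) must be read to recognize HTTP at all.
    request_line = lines[0].split(b" ")
    if len(request_line) != 3 or not request_line[2].startswith(b"HTTP/1."):
        return None
    for line in lines[1:]:
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            return value.strip().decode("ascii", errors="replace")
    return None


sample_request = (
    b"GET /search?q=flu+symptoms HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Accept-Encoding: gzip\r\n"
    b"\r\n"
)
```

Here `find_http_host(sample_request)` returns `"example.com"`, but only after the code has passed over the request target `/search?q=flu+symptoms`. A payload that is not HTTP returns `None`, which likewise cannot be determined without looking inside it.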

The same analysis holds true for application data, including the content that Internet users access online and generate themselves. Internet users may not have a particularly strong privacy interest in some of this data – the content of an online weather report, for example. But the breadth of activities that Internet users engage in is increasingly vast and can incorporate information at all levels of sensitivity. Email, instant messaging, VoIP, file-sharing, and the innumerable list of web-based activities – from reading the news to visiting social networks to searching for health information – each carry with them a particular set of user privacy expectations. To attempt to ferret out particular bits that an ISP could inspect without sweeping in privacy-sensitive content is likely an impossible task.

The thread tying all of these pieces together is the inspection of application-level data. Whether it exists as port numbers in TCP headers or in packet payloads, when ISPs go beyond their traditional use of IP headers to route packets, privacy risks begin to emerge. They may be minimal, as with the mere inspection of TCP headers. But beyond OSI layer 4, drawing any firm conclusions about which parts of a packet are or are not privacy-sensitive becomes exceedingly difficult. Thus, the most useful way to understand DPI is as a practice applied to application-level data — any data above OSI layer 3 that relates to an application — with an understanding that the inspection of OSI layer 4 data alone may incur minimal privacy risk.
