Privacy in Web Search

Querying Web search engines is by far the most frequent activity performed by online users, and consequently the one in which they are most likely to reveal a significant amount of personal information. Protecting the privacy of Web requesters is thus becoming increasingly important. This is often done with systems that guarantee unlinkability between the requester and her query. The most popular way to reach this objective is to use anonymous communication protocols (e.g., onion routing). However, various studies have shown that anonymity alone might not resist machine learning attacks, as the content of requests might allow an adversary to learn quasi-identifiers about the users. Thus, an adversary could link a query to its requester's public profile. Other approaches guarantee unidentifiability of the user's interests by generating noise (e.g., creating covert queries or adding extra keywords to the original queries). However, these solutions overload the network and decrease the accuracy of the results.

Our objective in this project is to allow a user to perform a private Web search that resists machine learning attacks, while keeping the relevance of the results as close as possible to that of the results for her original request.

Challenges & Contributions

To reach this objective we are investigating three main questions:

1) Is it necessary to obfuscate all the requests? If not, how can the sensitivity of a request be assessed, i.e., the probability that the request is linked with the user's public profile?

2) How to obfuscate sensitive requests without compromising accuracy?

3) How to hide the requester's identity from the search engine, without relying on costly anonymous communication protocols?

In our current work, to answer the questions above, we propose a three-stage architecture (depicted in the figure below) composed of three modules. (1) A Linkability Assessment module analyses the risk that a request is re-associated with the identity of the requester. This module compares the request formulated by the user with her local profile and with a group profile (an aggregated profile built from the other users' histories and computed in a privacy-preserving manner by our system). (2) An Obfuscator module protects the queries that the Linkability Assessment module has flagged as linkable. To minimize the impact of the obfuscation on the accuracy of the results, we rely on a basic functionality of search engines: the use of logical operators in the query. Specifically, we combine the current query with k fake queries using the logical OR operator. A post-filtering step then reduces the number of irrelevant answers introduced by the fake queries. (3) A Privacy Proxy relies on two non-colluding servers to hide the requester's identity from the search engine.
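As a rough illustration of the Obfuscator idea, the sketch below combines a real query with k fake queries using OR and then filters the returned results. It is only a minimal sketch under stated assumptions, not the project's implementation: the function names `obfuscate` and `post_filter`, the pool of fake queries, and the term-overlap filter are all hypothetical, and how the actual system generates fake queries or filters results is not specified here.

```python
import random


def obfuscate(query, fake_pool, k, rng=None):
    """Combine the real query with k fake queries using OR.

    fake_pool is a hypothetical list of candidate fake queries; how a
    real system would generate them is an assumption left open here.
    """
    rng = rng or random.Random()
    fakes = rng.sample(fake_pool, k)  # raises ValueError if k > len(fake_pool)
    parts = [query] + fakes
    rng.shuffle(parts)  # avoid revealing the real query by its position
    return " OR ".join(f"({p})" for p in parts)


def post_filter(results, query):
    """Keep results that share at least one term with the real query.

    This term-overlap test is an illustrative stand-in for the
    post-filtering step, not the filter used by the project.
    """
    terms = set(query.lower().split())
    return [r for r in results if terms & set(r.lower().split())]


if __name__ == "__main__":
    pool = ["cheap flights paris", "python tutorial",
            "weather tomorrow", "football scores"]
    print(obfuscate("diabetes symptoms", pool, k=2))
    snippets = ["Early diabetes symptoms explained",
                "Cheap flights to Paris",
                "Python tutorial for beginners"]
    print(post_filter(snippets, "diabetes symptoms"))
```

Shuffling the sub-queries is a design choice of this sketch: if the real query always came first, its position alone would reveal it to the search engine.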

Contributors

External Collaborators

Grants

The presented work is developed within the EEXCESS project funded by the EU Seventh Framework Program, grant agreement number 600601.

Selected publications