Serveur d'exploration Cyberinfrastructure

Attention, ce site est en cours de développement !
Attention, site généré par des moyens informatiques à partir de corpus bruts.
Les informations ne sont donc pas validées.

hkprivate information retrieval from privacyuncooperative queryable databases

Identifieur interne : 000595 ( Istex/Corpus ); précédent : 000594; suivant : 000596

hkprivate information retrieval from privacyuncooperative queryable databases

Auteurs : Josep Domingoferrer ; Agusti Solanas ; Jordi Castellroca

Source :

RBID : ISTEX:D232AD0CEB55B23DA54A35A2A5E395717283CF0C

Abstract

Purpose This paper aims to address the privacy problem associated with the use of internet search engines. The purpose of the paper is to propose and validate a set of methods and protocols to guarantee the privacy of users' queries. Designmethodologyapproach In this paper hkprivate information retrieval hkPIR is defined as a practical compromise between computational efficiency and privacy. Also presented are hkPIR protocols that can be used to query any database, which does not even need to know that the user is trying to preserve his or her privacy. Findings The proposed methods are able to properly protect the privacy of users' queries. When internet users apply the protocols, search engines e.g. Google are not able to determine unequivocally the real interests of their users. The quality of the results does decrease with the increase in privacy, but the obtained tradeoff is excellent. Practical implications Current private information retrieval PIR protocols suffer from two significant shortcomings their computational complexity is On where n is the number of records in the database, which precludes their use for very large databases and web search engines and they assume that the database server cooperates in the PIR protocol, which prevents deployment in reallife uncooperative settings. The proposed protocols overcome both problems. Originalityvalue This is the first set of protocols that offer practical protection for the privacy of the queries that internet users submit to an internet search engine. The proposal has been implemented and it will be released to the general public soon. It will help to protect the right to privacy of millions of internet users.

Url:
DOI: 10.1108/14684520910985693

Links to Exploration step

ISTEX:D232AD0CEB55B23DA54A35A2A5E395717283CF0C

Le document en format XML

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">hkprivate information retrieval from privacyuncooperative queryable databases</title>
<author>
<name sortKey="Domingoferrer, Josep" sort="Domingoferrer, Josep" uniqKey="Domingoferrer J" first="Josep" last="Domingoferrer">Josep Domingoferrer</name>
<affiliation>
<mods:affiliation>Department of Computer Engineering and Mathematics, Rovira i Virgili University, Tarragona, Spain</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Solanas, Agusti" sort="Solanas, Agusti" uniqKey="Solanas A" first="Agusti" last="Solanas">Agusti Solanas</name>
<affiliation>
<mods:affiliation>Department of Computer Engineering and Mathematics, Rovira i Virgili University, Tarragona, Spain</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Castellroca, Jordi" sort="Castellroca, Jordi" uniqKey="Castellroca J" first="Jordi" last="Castellroca">Jordi Castellroca</name>
<affiliation>
<mods:affiliation>Department of Computer Engineering and Mathematics, Rovira i Virgili University, Tarragona, Spain</mods:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:D232AD0CEB55B23DA54A35A2A5E395717283CF0C</idno>
<date when="2009" year="2009">2009</date>
<idno type="doi">10.1108/14684520910985693</idno>
<idno type="url">https://api.istex.fr/document/D232AD0CEB55B23DA54A35A2A5E395717283CF0C/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000595</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">hkprivate information retrieval from privacyuncooperative queryable databases</title>
<author>
<name sortKey="Domingoferrer, Josep" sort="Domingoferrer, Josep" uniqKey="Domingoferrer J" first="Josep" last="Domingoferrer">Josep Domingoferrer</name>
<affiliation>
<mods:affiliation>Department of Computer Engineering and Mathematics, Rovira i Virgili University, Tarragona, Spain</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Solanas, Agusti" sort="Solanas, Agusti" uniqKey="Solanas A" first="Agusti" last="Solanas">Agusti Solanas</name>
<affiliation>
<mods:affiliation>Department of Computer Engineering and Mathematics, Rovira i Virgili University, Tarragona, Spain</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Castellroca, Jordi" sort="Castellroca, Jordi" uniqKey="Castellroca J" first="Jordi" last="Castellroca">Jordi Castellroca</name>
<affiliation>
<mods:affiliation>Department of Computer Engineering and Mathematics, Rovira i Virgili University, Tarragona, Spain</mods:affiliation>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Online Information Review</title>
<idno type="ISSN">1468-4527</idno>
<imprint>
<publisher>Emerald Group Publishing Limited</publisher>
<date type="published" when="2009-08-07">2009-08-07</date>
<biblScope unit="volume">33</biblScope>
<biblScope unit="issue">4</biblScope>
<biblScope unit="page" from="720">720</biblScope>
<biblScope unit="page" to="744">744</biblScope>
</imprint>
<idno type="ISSN">1468-4527</idno>
</series>
<idno type="istex">D232AD0CEB55B23DA54A35A2A5E395717283CF0C</idno>
<idno type="DOI">10.1108/14684520910985693</idno>
<idno type="filenameID">2640330405</idno>
<idno type="original-pdf">2640330405.pdf</idno>
<idno type="href">14684520910985693.pdf</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">1468-4527</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract">Purpose This paper aims to address the privacy problem associated with the use of internet search engines. The purpose of the paper is to propose and validate a set of methods and protocols to guarantee the privacy of users' queries. Designmethodologyapproach In this paper hkprivate information retrieval hkPIR is defined as a practical compromise between computational efficiency and privacy. Also presented are hkPIR protocols that can be used to query any database, which does not even need to know that the user is trying to preserve his or her privacy. Findings The proposed methods are able to properly protect the privacy of users' queries. When internet users apply the protocols, search engines e.g. Google are not able to determine unequivocally the real interests of their users. The quality of the results does decrease with the increase in privacy, but the obtained tradeoff is excellent. Practical implications Current private information retrieval PIR protocols suffer from two significant shortcomings their computational complexity is On where n is the number of records in the database, which precludes their use for very large databases and web search engines and they assume that the database server cooperates in the PIR protocol, which prevents deployment in reallife uncooperative settings. The proposed protocols overcome both problems. Originalityvalue This is the first set of protocols that offer practical protection for the privacy of the queries that internet users submit to an internet search engine. The proposal has been implemented and it will be released to the general public soon. It will help to protect the right to privacy of millions of internet users.</div>
</front>
</TEI>
<istex>
<corpusName>emerald</corpusName>
<author>
<json:item>
<name>Josep DomingoFerrer</name>
<affiliations>
<json:string>Department of Computer Engineering and Mathematics, Rovira i Virgili University, Tarragona, Spain</json:string>
</affiliations>
</json:item>
<json:item>
<name>Agusti Solanas</name>
<affiliations>
<json:string>Department of Computer Engineering and Mathematics, Rovira i Virgili University, Tarragona, Spain</json:string>
</affiliations>
</json:item>
<json:item>
<name>Jordi CastellRoca</name>
<affiliations>
<json:string>Department of Computer Engineering and Mathematics, Rovira i Virgili University, Tarragona, Spain</json:string>
</affiliations>
</json:item>
</author>
<subject>
<json:item>
<lang>
<json:string>eng</json:string>
</lang>
<value>Information retrieval</value>
</json:item>
<json:item>
<lang>
<json:string>eng</json:string>
</lang>
<value>Search engines</value>
</json:item>
<json:item>
<lang>
<json:string>eng</json:string>
</lang>
<value>Internet</value>
</json:item>
</subject>
<language>
<json:string>eng</json:string>
</language>
<originalGenre>
<json:string>research-article</json:string>
</originalGenre>
<abstract>Purpose This paper aims to address the privacy problem associated with the use of internet search engines. The purpose of the paper is to propose and validate a set of methods and protocols to guarantee the privacy of users' queries. Designmethodologyapproach In this paper hkprivate information retrieval hkPIR is defined as a practical compromise between computational efficiency and privacy. Also presented are hkPIR protocols that can be used to query any database, which does not even need to know that the user is trying to preserve his or her privacy. Findings The proposed methods are able to properly protect the privacy of users' queries. When internet users apply the protocols, search engines e.g. Google are not able to determine unequivocally the real interests of their users. The quality of the results does decrease with the increase in privacy, but the obtained tradeoff is excellent. Practical implications Current private information retrieval PIR protocols suffer from two significant shortcomings their computational complexity is On where n is the number of records in the database, which precludes their use for very large databases and web search engines and they assume that the database server cooperates in the PIR protocol, which prevents deployment in reallife uncooperative settings. The proposed protocols overcome both problems. Originalityvalue This is the first set of protocols that offer practical protection for the privacy of the queries that internet users submit to an internet search engine. The proposal has been implemented and it will be released to the general public soon. It will help to protect the right to privacy of millions of internet users.</abstract>
<qualityIndicators>
<score>8</score>
<pdfVersion>1.3</pdfVersion>
<pdfPageSize>519 x 680 pts</pdfPageSize>
<refBibsNative>true</refBibsNative>
<keywordCount>3</keywordCount>
<abstractCharCount>1695</abstractCharCount>
<pdfWordCount>7016</pdfWordCount>
<pdfCharCount>44213</pdfCharCount>
<pdfPageCount>25</pdfPageCount>
<abstractWordCount>265</abstractWordCount>
</qualityIndicators>
<title>hkprivate information retrieval from privacyuncooperative queryable databases</title>
<genre>
<json:string>research-article</json:string>
</genre>
<host>
<volume>33</volume>
<publisherId>
<json:string>oir</json:string>
</publisherId>
<pages>
<last>744</last>
<first>720</first>
</pages>
<issn>
<json:string>1468-4527</json:string>
</issn>
<issue>4</issue>
<subject>
<json:item>
<value>Information & knowledge management</value>
</json:item>
<json:item>
<value>Information & communications technology</value>
</json:item>
<json:item>
<value>Internet</value>
</json:item>
<json:item>
<value>Library & information science</value>
</json:item>
<json:item>
<value>Collection building & management</value>
</json:item>
<json:item>
<value>Information behaviour & retrieval</value>
</json:item>
<json:item>
<value>Records management & preservation</value>
</json:item>
<json:item>
<value>Bibliometrics</value>
</json:item>
<json:item>
<value>Databases</value>
</json:item>
<json:item>
<value>Document management</value>
</json:item>
</subject>
<genre>
<json:string>journal</json:string>
</genre>
<language>
<json:string>unknown</json:string>
</language>
<title>Online Information Review</title>
<doi>
<json:string>10.1108/oir</json:string>
</doi>
</host>
<publicationDate>2009</publicationDate>
<copyrightDate>2009</copyrightDate>
<doi>
<json:string>10.1108/14684520910985693</json:string>
</doi>
<id>D232AD0CEB55B23DA54A35A2A5E395717283CF0C</id>
<score>0.119690284</score>
<fulltext>
<json:item>
<original>true</original>
<mimetype>application/pdf</mimetype>
<extension>pdf</extension>
<uri>https://api.istex.fr/document/D232AD0CEB55B23DA54A35A2A5E395717283CF0C/fulltext/pdf</uri>
</json:item>
<json:item>
<original>false</original>
<mimetype>application/zip</mimetype>
<extension>zip</extension>
<uri>https://api.istex.fr/document/D232AD0CEB55B23DA54A35A2A5E395717283CF0C/fulltext/zip</uri>
</json:item>
<istex:fulltextTEI uri="https://api.istex.fr/document/D232AD0CEB55B23DA54A35A2A5E395717283CF0C/fulltext/tei">
<teiHeader>
<fileDesc>
<titleStmt>
<title level="a" type="main" xml:lang="en">hkprivate information retrieval from privacyuncooperative queryable databases</title>
</titleStmt>
<publicationStmt>
<authority>ISTEX</authority>
<publisher>Emerald Group Publishing Limited</publisher>
<availability>
<p>EMERALD</p>
</availability>
<date>2009</date>
</publicationStmt>
<sourceDesc>
<biblStruct type="inbook">
<analytic>
<title level="a" type="main" xml:lang="en">hkprivate information retrieval from privacyuncooperative queryable databases</title>
<author>
<persName>
<forename type="first">Josep</forename>
<surname>DomingoFerrer</surname>
</persName>
<affiliation>Department of Computer Engineering and Mathematics, Rovira i Virgili University, Tarragona, Spain</affiliation>
</author>
<author>
<persName>
<forename type="first">Agusti</forename>
<surname>Solanas</surname>
</persName>
<affiliation>Department of Computer Engineering and Mathematics, Rovira i Virgili University, Tarragona, Spain</affiliation>
</author>
<author>
<persName>
<forename type="first">Jordi</forename>
<surname>CastellRoca</surname>
</persName>
<affiliation>Department of Computer Engineering and Mathematics, Rovira i Virgili University, Tarragona, Spain</affiliation>
</author>
</analytic>
<monogr>
<title level="j">Online Information Review</title>
<idno type="pISSN">1468-4527</idno>
<idno type="DOI">10.1108/oir</idno>
<imprint>
<publisher>Emerald Group Publishing Limited</publisher>
<date type="published" when="2009-08-07"></date>
<biblScope unit="volume">33</biblScope>
<biblScope unit="issue">4</biblScope>
<biblScope unit="page" from="720">720</biblScope>
<biblScope unit="page" to="744">744</biblScope>
</imprint>
</monogr>
<idno type="istex">D232AD0CEB55B23DA54A35A2A5E395717283CF0C</idno>
<idno type="DOI">10.1108/14684520910985693</idno>
<idno type="filenameID">2640330405</idno>
<idno type="original-pdf">2640330405.pdf</idno>
<idno type="href">14684520910985693.pdf</idno>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<creation>
<date>2009</date>
</creation>
<langUsage>
<language ident="en">en</language>
</langUsage>
<abstract>
<p>Purpose This paper aims to address the privacy problem associated with the use of internet search engines. The purpose of the paper is to propose and validate a set of methods and protocols to guarantee the privacy of users' queries. Designmethodologyapproach In this paper hkprivate information retrieval hkPIR is defined as a practical compromise between computational efficiency and privacy. Also presented are hkPIR protocols that can be used to query any database, which does not even need to know that the user is trying to preserve his or her privacy. Findings The proposed methods are able to properly protect the privacy of users' queries. When internet users apply the protocols, search engines e.g. Google are not able to determine unequivocally the real interests of their users. The quality of the results does decrease with the increase in privacy, but the obtained tradeoff is excellent. Practical implications Current private information retrieval PIR protocols suffer from two significant shortcomings their computational complexity is On where n is the number of records in the database, which precludes their use for very large databases and web search engines and they assume that the database server cooperates in the PIR protocol, which prevents deployment in reallife uncooperative settings. The proposed protocols overcome both problems. Originalityvalue This is the first set of protocols that offer practical protection for the privacy of the queries that internet users submit to an internet search engine. The proposal has been implemented and it will be released to the general public soon. It will help to protect the right to privacy of millions of internet users.</p>
</abstract>
<textClass>
<keywords scheme="keyword">
<list>
<head>Keywords</head>
<item>
<term>Information retrieval</term>
</item>
<item>
<term>Search engines</term>
</item>
<item>
<term>Internet</term>
</item>
</list>
</keywords>
</textClass>
<textClass>
<keywords scheme="Emerald Subject Group">
<list>
<label>cat-IKM</label>
<item>
<term>Information & knowledge management</term>
</item>
<label>cat-ICT</label>
<item>
<term>Information & communications technology</term>
</item>
<label>cat-INT</label>
<item>
<term>Internet</term>
</item>
</list>
</keywords>
</textClass>
<textClass>
<keywords scheme="Emerald Subject Group">
<list>
<label>cat-LISC</label>
<item>
<term>Library & information science</term>
</item>
<label>cat-CBM</label>
<item>
<term>Collection building & management</term>
</item>
<label>cat-IBRT</label>
<item>
<term>Information behaviour & retrieval</term>
</item>
<label>cat-RMP</label>
<item>
<term>Records management & preservation</term>
</item>
<label>cat-BIB</label>
<item>
<term>Bibliometrics</term>
</item>
<label>cat-DAT</label>
<item>
<term>Databases</term>
</item>
<label>cat-DOCM</label>
<item>
<term>Document management</term>
</item>
</list>
</keywords>
</textClass>
</profileDesc>
<revisionDesc>
<change when="2009-08-07">Published</change>
</revisionDesc>
</teiHeader>
</istex:fulltextTEI>
<json:item>
<original>false</original>
<mimetype>text/plain</mimetype>
<extension>txt</extension>
<uri>https://api.istex.fr/document/D232AD0CEB55B23DA54A35A2A5E395717283CF0C/fulltext/txt</uri>
</json:item>
</fulltext>
<metadata>
<istex:metadataXml wicri:clean="corpus emerald not found" wicri:toSee="no header">
<istex:xmlDeclaration>version="1.0" encoding="UTF-8"</istex:xmlDeclaration>
<istex:document><!-- Auto generated NISO JATS XML created by Atypon out of MCB DTD source files. Do Not Edit! -->
<article dtd-version="1.0" xml:lang="en" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">oir</journal-id>
<journal-id journal-id-type="doi">10.1108/oir</journal-id>
<journal-title-group>
<journal-title>Online Information Review</journal-title>
</journal-title-group>
<issn pub-type="ppub">1468-4527</issn>
<publisher>
<publisher-name>Emerald Group Publishing Limited</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.1108/14684520910985693</article-id>
<article-id pub-id-type="original-pdf">2640330405.pdf</article-id>
<article-id pub-id-type="filename">2640330405</article-id>
<article-categories>
<subj-group subj-group-type="type-of-publication">
<compound-subject>
<compound-subject-part content-type="code">research-article</compound-subject-part>
<compound-subject-part content-type="label">Research paper</compound-subject-part>
</compound-subject>
</subj-group>
<subj-group subj-group-type="subject">
<compound-subject>
<compound-subject-part content-type="code">cat-IKM</compound-subject-part>
<compound-subject-part content-type="label">Information & knowledge management</compound-subject-part>
</compound-subject>
<subj-group>
<compound-subject>
<compound-subject-part content-type="code">cat-ICT</compound-subject-part>
<compound-subject-part content-type="label">Information & communications technology</compound-subject-part>
</compound-subject>
<subj-group>
<compound-subject>
<compound-subject-part content-type="code">cat-INT</compound-subject-part>
<compound-subject-part content-type="label">Internet</compound-subject-part>
</compound-subject>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="subject">
<compound-subject>
<compound-subject-part content-type="code">cat-LISC</compound-subject-part>
<compound-subject-part content-type="label">Library & information science</compound-subject-part>
</compound-subject>
<subj-group>
<compound-subject>
<compound-subject-part content-type="code">cat-CBM</compound-subject-part>
<compound-subject-part content-type="label">Collection building & management</compound-subject-part>
</compound-subject>
<subj-group>
<compound-subject>
<compound-subject-part content-type="code">cat-BIB</compound-subject-part>
<compound-subject-part content-type="label">Bibliometrics</compound-subject-part>
</compound-subject>
<compound-subject>
<compound-subject-part content-type="code">cat-DAT</compound-subject-part>
<compound-subject-part content-type="label">Databases</compound-subject-part>
</compound-subject>
</subj-group>
</subj-group>
<subj-group>
<compound-subject>
<compound-subject-part content-type="code">cat-IBRT</compound-subject-part>
<compound-subject-part content-type="label">Information behaviour & retrieval</compound-subject-part>
</compound-subject>
</subj-group>
<subj-group>
<compound-subject>
<compound-subject-part content-type="code">cat-RMP</compound-subject-part>
<compound-subject-part content-type="label">Records management & preservation</compound-subject-part>
</compound-subject>
<subj-group>
<compound-subject>
<compound-subject-part content-type="code">cat-DOCM</compound-subject-part>
<compound-subject-part content-type="label">Document management</compound-subject-part>
</compound-subject>
</subj-group>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>
<italic>h</italic>
(
<italic>k</italic>
)‐private information retrieval from privacy‐uncooperative queryable databases</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<string-name>
<given-names>Josep</given-names>
<surname>Domingo‐Ferrer</surname>
</string-name>
<aff>Department of Computer Engineering and Mathematics, Rovira i Virgili University, Tarragona, Spain</aff>
</contrib>
<x></x>
<contrib contrib-type="author">
<string-name>
<given-names>Agusti</given-names>
<surname>Solanas</surname>
</string-name>
<aff>Department of Computer Engineering and Mathematics, Rovira i Virgili University, Tarragona, Spain</aff>
</contrib>
<x></x>
<contrib contrib-type="author">
<string-name>
<given-names>Jordi</given-names>
<surname>Castellà‐Roca</surname>
</string-name>
<aff>Department of Computer Engineering and Mathematics, Rovira i Virgili University, Tarragona, Spain</aff>
</contrib>
</contrib-group>
<pub-date pub-type="ppub">
<day>07</day>
<month>08</month>
<year>2009</year>
</pub-date>
<volume>33</volume>
<issue>4</issue>
<fpage>720</fpage>
<lpage>744</lpage>
<permissions>
<copyright-statement>© Emerald Group Publishing Limited</copyright-statement>
<copyright-year>2009</copyright-year>
<license license-type="publisher">
<license-p></license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="14684520910985693.pdf"></self-uri>
<abstract>
<sec>
<title content-type="abstract-heading">Purpose</title>
<x></x>
<p>This paper aims to address the privacy problem associated with the use of internet search engines. The purpose of the paper is to propose and validate a set of methods and protocols to guarantee the privacy of users' queries.</p>
</sec>
<sec>
<title content-type="abstract-heading">Design/methodology/approach</title>
<x></x>
<p>In this paper
<italic>h</italic>
(
<italic>k</italic>
)‐private information retrieval (
<italic>h</italic>
(
<italic>k</italic>
)‐PIR) is defined as a practical compromise between computational efficiency and privacy. Also presented are
<italic>h</italic>
(
<italic>k</italic>
)‐PIR protocols that can be used to query any database, which does not even need to know that the user is trying to preserve his or her privacy.</p>
</sec>
<sec>
<title content-type="abstract-heading">Findings</title>
<x></x>
<p>The proposed methods are able to properly protect the privacy of users' queries. When internet users apply the protocols, search engines (e.g. Google) are not able to determine unequivocally the real interests of their users. The quality of the results does decrease with the increase in privacy, but the obtained trade‐off is excellent.</p>
</sec>
<sec>
<title content-type="abstract-heading">Practical implications</title>
<x></x>
<p>Current private information retrieval (PIR) protocols suffer from two significant shortcomings: their computational complexity is
<italic>O</italic>
(
<italic>n</italic>
) where
<italic>n</italic>
is the number of records in the database, which precludes their use for very large databases and web search engines; and they assume that the database server cooperates in the PIR protocol, which prevents deployment in real‐life uncooperative settings. The proposed protocols overcome both problems.</p>
</sec>
<sec>
<title content-type="abstract-heading">Originality/value</title>
<x></x>
<p>This is the first set of protocols that offer practical protection for the privacy of the queries that internet users submit to an internet search engine. The proposal has been implemented and it will be released to the general public soon. It will help to protect the right to privacy of millions of internet users.</p>
</sec>
</abstract>
<kwd-group>
<kwd>Information retrieval</kwd>
<x>, </x>
<kwd>Search engines</kwd>
<x>, </x>
<kwd>Internet</kwd>
</kwd-group>
<custom-meta-group>
<custom-meta>
<meta-name>peer-reviewed</meta-name>
<meta-value>no</meta-value>
</custom-meta>
<custom-meta>
<meta-name>academic-content</meta-name>
<meta-value>yes</meta-value>
</custom-meta>
<custom-meta>
<meta-name>rightslink</meta-name>
<meta-value>included</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
<ack>
<p>The authors are with the UNESCO Chair in Data Privacy, but the views expressed in this article are those of the authors and do not necessarily reflect the position of UNESCO nor commit that organisation. Thanks go to Susana Bujalance for her help in implementing the
<italic>GooPIR</italic>
prototype and to Úrsula González‐Nicolás for her help in computing the outcomes of quality measures. This work was partly supported by the Spanish Government through projects CONSOLIDER INGENIO 2010 CSD2007‐00004 “ARES” and TSI2007‐65406‐C03‐01 “E‐AEGIS”, and by the Government of Catalonia under grant 2005 SGR 00446.</p>
</ack>
</front>
<body>
<sec>
<title>Introduction</title>
<p>Privacy in databases can be viewed as a three‐dimensional property (
<xref ref-type="bibr" rid="b6">Domingo‐Ferrer, 2007</xref>
), because a distinction can be made between respondent privacy (privacy of the respondents to whom the database records correspond), owner privacy (privacy of proprietary datasets used to conduct joint research between several data‐owning organisations) and user privacy (privacy of the queries submitted by database users). In the context of interactively queryable databases and in particular, internet search engines, the most rapidly growing concern is user privacy. This is especially so after scandals like the August 2006 disclosure by the AOL search engine of 20 million queries made by 658,000 users (
<xref ref-type="bibr" rid="b1">AOL, 2006</xref>
). It is estimated that over 223 million data records of US residents have been exposed due to security breaches since January 2005 (
<xref ref-type="bibr" rid="b12">Lane
<italic>et al.</italic>
, 2008</xref>
), so an attacker could access user profiles stored in the databases of search engines.</p>
<p>There are personal and corporate motivations for requiring user privacy:
<list list-type="bullet">
<list-item>
<label></label>
<p>
<italic>Private life</italic>
. Most users feel uncomfortable about being profiled by a search engine or a database.</p>
</list-item>
<list-item>
<label></label>
<p>
<italic>Industrial life</italic>
. If a company is querying a database containing patents or stock market quotations, the database manager can partially infer the company's next technological or financial moves if he or she knows the queries being submitted by the company.</p>
</list-item>
</list>
Private information retrieval (PIR) was invented in 1995 by
<xref ref-type="bibr" rid="b4 b5">Chor
<italic>et al.</italic>
(1995, 1998)</xref>
with the assumption that there are at least two copies of the same database, which do not communicate with each other. In those same papers, Chor
<italic>et al.</italic>
showed that single‐database PIR (that is, with a single copy) does not exist in the information‐theoretic sense. However, two years later,
<xref ref-type="bibr" rid="b11">Kushilevitz and Ostrovsky (1997)</xref>
presented a method for constructing single‐database PIR based on the algebraic properties of the Goldwasser‐Micali public‐key encryption scheme (
<xref ref-type="bibr" rid="b9">Goldwasser and Micali, 1984</xref>
). Subsequent developments in PIR are surveyed in
<xref ref-type="bibr" rid="b14">Ostrovsky and Skeith (2007)</xref>
.</p>
<p>In the PIR literature the database is usually modelled as a vector. The user wishes to retrieve the value of the
<italic>i</italic>
‐th component of the vector while keeping the index
<italic>i</italic>
hidden from the database. Thus it is assumed that the user knows the physical address of the sought item, which might be too strong an assumption in many practical situations. Keyword PIR (
<xref ref-type="bibr" rid="b3">Chor
<italic>et al.</italic>
, 1997</xref>
;
<xref ref-type="bibr" rid="b11">Kushilevitz and Ostrovsky, 1997</xref>
) is a more flexible form of PIR – the user can submit a query consisting of a keyword and no modification of the structure of the database is needed.</p>
<p>We claim that the PIR protocols proposed so far have two fundamental shortcomings that hinder their practical deployment:
<list list-type="order">
<list-item>
<label>1. </label>
<p>The database is assumed to contain
<italic>n</italic>
items and PIR protocols attempt to guarantee maximum privacy, that is, maximum server uncertainty on the index
<italic>i</italic>
of the record retrieved by the user. Thus the computational complexity of such PIR protocols is
<italic>O</italic>
(
<italic>n</italic>
), as proven by
<xref ref-type="bibr" rid="b4 b5">Chor
<italic>et al.</italic>
(1995, 1998)</xref>
. Intuitively, all records in the database must be “touched”; otherwise, the server could rule out some of the records when trying to discover
<italic>i</italic>
(an implicit standard assumption in PIR is that no side information is available to the server allowing it to rule out a touched record). For large databases, an
<italic>O</italic>
(
<italic>n</italic>
) computational cost is unaffordable (
<xref ref-type="bibr" rid="b2">Beimel
<italic>et al.</italic>
, 2004</xref>
).</p>
</list-item>
<list-item>
<label>2. </label>
<p>It is assumed that the database server cooperates in the PIR protocol. However, it is the user who is interested in his or her own privacy, whereas the motivation for the database server is dubious. Actually, PIR is likely to be unattractive to most companies running queryable databases, as it limits their profiling ability. This probably explains why no real instances of PIR‐enabled databases seem to exist.</p>
</list-item>
</list>
If one wishes to run PIR against a search engine, there is another shortcoming beyond the lack of server cooperation – the database cannot be modelled as a vector in which the user can be assumed to know the physical location of the keyword sought. Even keyword PIR does not really fit, as it still assumes a mapping between individual keywords and physical addresses (in fact, each keyword is used as an alias of a physical address). A search engine allowing only searches of individual keywords stored in this way would be much more limited than real engines like Google or Yahoo.</p>
<p>TrackMeNot (
<xref ref-type="bibr" rid="b10">Howe and Nissenbaum, 2009</xref>
) is a practical system based on the principle that a browser extension installed in the user's computer hides the user's actual queries in a cloud of automatic ghost queries submitted to popular search engines at different time intervals. While useful and affordable on a small scale, if the use of TrackMeNot became generalised, the overhead introduced by ghost queries would significantly degrade the performance of search engines and communications networks. Also, the submission timing of automatic ghost queries may be distinguishable from the submission timing of actual queries, which could provide an intruder with clues to identify the latter type of queries.</p>
<p>In this paper the notion of
<italic>h</italic>
(
<italic>k</italic>
)‐PIR is introduced as a pragmatic approach to private information retrieval from real search engines. Given a user's query consisting of one or several target keywords, the basic idea is to achieve some level of privacy by adding
<italic>k</italic>
−1 bogus keywords to the user's actual target keywords. In this way, the database or search engine cannot determine with certainty which is (are) the user's target keyword(s). The higher the number
<italic>k</italic>
−1 of bogus keywords and the more similar the frequency of the target and the bogus keywords, the more privacy. Then a protocol is proposed that enables a user to perform
<italic>h</italic>
(
<italic>k</italic>
)‐PIR against a web search engine or database server that is uncooperative in preserving the privacy of user queries. By “uncooperative”, we mean that the engine or server does not offer any specific PIR functionality (this is usually the case in reality); however, we assume that the server is semi‐honest, in the sense that the server may be interested in user profiling but will correctly answer any user query.</p>
</sec>
<sec>
<title>
<italic>h</italic>
(
<italic>k</italic>
)‐Private information retrieval</title>
<p>Defining privacy in terms of maximum index uncertainty is impractical if the database consists of billions of web pages indexed by a search engine. Even if the stored web pages could be modelled as a vector with
<italic>n</italic>
records,
<italic>n</italic>
would be so large that the
<italic>O</italic>
(
<italic>n</italic>
) computational complexity of a PIR query would be unaffordable.</p>
<p>Privacy requirements must be relaxed. A possible relaxation is
<italic>h</italic>
(
<italic>k</italic>
)‐PIR, whose idea is to ensure that the uncertainty of the database or search engine about which is the actual user target keyword is no less than a lower bound expressed in terms of a privacy parameter
<italic>k</italic>
. In what follows, Shannon's entropy
<xref ref-type="fn" rid="fn1">[1]</xref>
will be used to quantify this lower bound on query uncertainty, because it is the most common uncertainty measure.</p>
<sec>
<title>Definition 1:
<italic>h</italic>
(
<italic>k</italic>
)‐private information retrieval</title>
<p>Given a non‐negative integer
<italic>k</italic>
and a function
<italic>h</italic>
( · ) such that
<italic>h</italic>
(
<italic>k</italic>
)≥0, a query protocol against a queryable database or a search engine provides
<italic>h</italic>
(
<italic>k</italic>
)‐private information retrieval, or
<italic>h</italic>
(
<italic>k</italic>
)‐PIR for short, for a user query
<italic>q</italic>
<sub>0</sub>
if any intruder (including the database server or the search engine) views the user query as a random variable
<italic>Q</italic>
<sub>0</sub>
, whose Shannon's entropy satisfies
<italic>H</italic>
(
<italic>Q</italic>
<sub>0</sub>
)≥
<italic>h</italic>
(
<italic>k</italic>
).</p>
<p>In plain words, the above definition means that the query appears to the intruder as a variable
<italic>Q</italic>
<sub>0</sub>
, which can take several possible values, so that the intruder cannot unequivocally determine which was the value
<italic>q</italic>
<sub>0</sub>
actually submitted by the user.</p>
<p>
<italic>Example 1</italic>
. If in the intruder's view there are
<italic>k</italic>
queries that are equally likely candidates to be the target query
<italic>q</italic>
<sub>0</sub>
submitted by the user, then
<italic>H</italic>
(
<italic>Q</italic>
<sub>0</sub>
)=log 
<sub>2</sub>
<italic>k</italic>
and the protocol provides log 
<sub>2</sub>
<italic>k</italic>
‐privacy (the highest possible privacy with
<italic>k</italic>
candidates). If the probabilities of the
<italic>k</italic>
candidates being
<italic>q</italic>
<sub>0</sub>
are different,
<italic>H</italic>
(
<italic>Q</italic>
<sub>0</sub>
)< log 
<sub>2</sub>
<italic>k</italic>
and therefore
<italic>h</italic>
(
<italic>k</italic>
)‐privacy with
<italic>h</italic>
(
<italic>k</italic>
)< log 
<sub>2</sub>
<italic>k</italic>
is achieved. If the protocol is such that the intruder knows for sure which of the
<italic>k</italic>
candidates is
<italic>q</italic>
<sub>0</sub>
, then the protocol provides only 0‐privacy, that is, no privacy at all. Obviously, for any non‐negative
<italic>k</italic>
′ and any function
<italic></italic>
( · ) such that 0≦
<italic></italic>
(
<italic>k</italic>
′)≦
<italic>h</italic>
(
<italic>k</italic>
), a
<italic>h</italic>
(
<italic>k</italic>
)‐private protocol is also
<italic></italic>
(
<italic>k</italic>
′)‐private.</p>
<p>The computational complexity of protocols implementing
<italic>h</italic>
(
<italic>k</italic>
)‐PIR can be as low as
<italic>O</italic>
(
<italic>k</italic>
), whatever the size of the database. The user can camouflage the target query within a set of
<italic>k</italic>
equally likely or almost equally likely queries (the target query plus
<italic>k</italic>
−1 bogus queries); then the user submits the set of queries and finally filters out the results relevant to the target query.</p>
<p>Note that the relaxation introduced by
<italic>h</italic>
(
<italic>k</italic>
)‐PIR with respect to PIR is somewhat parallel to the one introduced by
<italic>k</italic>
‐anonymity (
<xref ref-type="bibr" rid="b15">Samarati and Sweeney, 1998</xref>
) with respect to total anonymity.</p>
</sec>
</sec>
<sec>
<title>Single‐user
<italic>h</italic>
(
<italic>k</italic>
)‐PIR from a privacy‐uncooperative queryable database</title>
<p>In this section we start with the assumption that queries consist of a single keyword – the query output is expected to consist of the database records or web URLs containing that keyword. (An extension for multi‐keyword queries is given later in this paper.) A relatively simple way to approach
<italic>h</italic>
(
<italic>k</italic>
)‐PIR has been suggested at the end of the previous section –
<italic>k</italic>
−1 bogus keywords are added to the target query before submitting it. Adding means OR‐ing the target keyword with the bogus keywords. Some considerations are in order here:
<list list-type="order">
<list-item>
<label>1. </label>
<p>It will be assumed that keywords belong to some designated language (e.g. English and/or a domain‐specific language).</p>
</list-item>
<list-item>
<label>2. </label>
<p>A public reference thesaurus with a large number
<italic>N</italic>
of words and proper nouns in the designated language will be used to draw bogus keywords from. The thesaurus should provide the keywords along with their relative frequencies. There are several alternatives to obtain such a thesaurus:</p>
</list-item>
</list>
<list list-type="bullet">
<list-item>
<label></label>
<p>Query logs are possibly the best option to build the thesaurus. Keyword frequencies are taken based on the keyword appearances in the log. A good example of a query log is the above‐mentioned search data released by
<xref ref-type="bibr" rid="b1">AOL (2006)</xref>
.</p>
</list-item>
<list-item>
<label></label>
<p>Online encyclopaedias, newspapers or other text collections are good alternatives for building the list of keywords in the thesaurus when no query log is available for the designated user language. Keyword frequencies can be taken based on the keyword appearances in the text collection or from the frequency output by a search engine (e.g. Google) when the keyword is looked up.</p>
</list-item>
<list-item>
<label></label>
<p>For queries in English, one can also follow the procedure suggested by
<xref ref-type="bibr" rid="b17">Staddon
<italic>et al.</italic>
(2007)</xref>
: use the British National Corpus (BNC) (
<xref ref-type="bibr" rid="b13">Leech
<italic>et al.</italic>
, 2001</xref>
) to stem lexical keywords (e.g. the keywords “use”, “used”, “uses” and “using” would all be mapped to the stem “use”); use the BNC again to associate to each keyword the frequency of the corresponding stem.</p>
</list-item>
</list>
<list list-type="order">
<list-item>
<label>1. </label>
<p>OR‐ing bogus queries to the target query results in some overhead, because the output returned by the search engine needs to be filtered out to suppress the portion related to the bogus queries.</p>
</list-item>
<list-item>
<label>2. </label>
<p>For privacy reasons, bogus query addition and output filtering must be done locally at the user's computer.</p>
</list-item>
</list>
As a first approach, a standalone user can run the following algorithm.</p>
<sec>
<title>Protocol 1 (Naïve(
<italic>q</italic>
<sub>0</sub>
,
<italic>k</italic>
))</title>
<p>
<list list-type="order">
<list-item>
<label>1. </label>
<p>N: To mask the real target keyword
<italic>q</italic>
<sub>0</sub>
, locally sample the thesaurus to randomly draw
<italic>k</italic>
−1 bogus keywords
<italic>q</italic>
<sub>1</sub>
, 
<italic>q</italic>
<sub>2</sub>
, 
<italic>Λ</italic>
, 
<italic>q</italic>
<sub>
<italic>k</italic>
−1</sub>
.</p>
</list-item>
<list-item>
<label>2. </label>
<p>Submit the query:
<xref ref-type="fig" rid="F_2640330405001">Equation 1</xref>
where
<italic>π</italic>
is a random permutation of the set {0,1,
<italic>Λ</italic>
,
<italic>k</italic>
−1}.</p>
</list-item>
<list-item>
<label>3. </label>
<p>Once the query results are received, keep those results where the keyword
<italic>q</italic>
<sub>0</sub>
appears and discard the rest.</p>
</list-item>
</list>
Note that the random permutation π in Protocol 1 is needed to hide the place of
<italic>q</italic>
<sub>0</sub>
in the submitted query.</p>
<p>After masking
<italic>q</italic>
<sub>0</sub>
with the bogus keywords
<italic>q</italic>
<sub>1</sub>
, 
<italic>q</italic>
<sub>2</sub>
, 
<italic>Λ</italic>
, 
<italic>q</italic>
<sub>
<italic>k</italic>
−1</sub>
, the database, search engine or, more generally, any intruder observing the masked query is uncertain about which one among the
<italic>k</italic>
keywords is the real target keyword
<italic>q</italic>
<sub>0</sub>
. In fact, one can say that an intruder views the target keyword as a random variable
<italic>Q</italic>
<sub>0</sub>
with possible outcomes
<italic>q</italic>
<sub>0</sub>
, 
<italic>q</italic>
<sub>1</sub>
, 
<italic>Λ</italic>
, 
<italic>q</italic>
<sub>
<italic>k</italic>
−1</sub>
. The intruder's uncertainty about the actual value of the target keyword can be measured as the entropy of
<italic>Q</italic>
<sub>0</sub>
– the more similar the frequencies of the
<italic>q</italic>
<sub>
<italic>i</italic>
</sub>
's, the higher the entropy, which makes sense because the intruder's uncertainty is higher (which is good).</p>
<p>This is formalised by the following lemma.</p>
<p>
<italic>Lemma 1</italic>
. Let
<italic>q</italic>
<sub>0</sub>
be the target keyword a user is interested in, and let
<italic>f</italic>
(
<italic>q</italic>
<sub>0</sub>
) be its relative frequency in the reference thesaurus. Let
<italic>q</italic>
<sub>1</sub>
, 
<italic>Λ</italic>
, 
<italic>q</italic>
<sub>
<italic>k</italic>
−1</sub>
be
<italic>k</italic>
−1 bogus keywords with relative frequencies
<italic>f</italic>
(
<italic>q</italic>
<sub>1</sub>
), 
<italic>Λ</italic>
<italic>f</italic>
(
<italic>q</italic>
<sub>
<italic>k</italic>
−1</sub>
), respectively, used to construct the privacy‐protected query of Expression (1). Assuming that the relative frequencies in the reference thesaurus represent well the relative frequencies of keywords in the natural or domain‐specific language used in the database queries, the privacy of the target query can be measured as the entropy:
<xref ref-type="fig" rid="F_2640330405002">Equation 2</xref>
where
<italic>Q</italic>
<sub>0</sub>
is a random variable representing the intruder's view of the target keyword and:
<xref ref-type="fig" rid="F_2640330405003">Equation 3</xref>
<italic>Proof</italic>
. Assume that an intruder tries to find the target keyword that is masked in the privacy‐preserving submitted query of Expression (1). The intruder views this target keyword as a random variable
<italic>Q</italic>
<sub>0</sub>
with
<italic>k</italic>
possible values
<italic>q</italic>
<sub>
<italic>π</italic>
(0)</sub>
,
<italic>q</italic>
<sub>
<italic>π</italic>
(1)</sub>
,
<italic>Λ</italic>
,
<italic>q</italic>
<sub>
<italic>π</italic>
(
<italic>k</italic>
−1)</sub>
with known probabilities
<italic>g</italic>
(
<italic>q</italic>
<sub>
<italic>π</italic>
(
<italic>i</italic>
)</sub>
)∈[0,1]for
<italic>i</italic>
=0 to
<italic>k</italic>
−1. Therefore, the uncertainty about the specific outcome of
<italic>Q</italic>
<sub>0</sub>
, that is, the privacy of
<italic>q</italic>
<sub>0</sub>
, coincides with the entropy of
<italic>Q</italic>
<sub>0</sub>
, that is:
<xref ref-type="fig" rid="F_2640330405004">Equation 4</xref>
Protocol 1 has at least two shortcomings:</p>
<p>It does not guarantee any lower bound for the privacy
<italic>H</italic>
(
<italic>Q</italic>
<sub>0</sub>
), which in the worst case could be as low as 0. Therefore, a priori the protocol can only guarantee trivial 0‐PIR (of course, after a particular set of bogus keywords have been chosen, the actual privacy offered is
<italic>H</italic>
(
<italic>Q</italic>
<sub>0</sub>
)).</p>
<p>If the same keyword
<italic>q</italic>
<sub>0</sub>
is queried several times, it will be masked with different random bogus keywords each time, which will make it easy for an intruder to re‐identify
<italic>q</italic>
<sub>0</sub>
. In a large thesaurus with
<italic>n</italic>
keywords, the probability of a keyword belonging to two different random samples of
<italic>k</italic>
keywords is negligible (see Lemma 2 in the Appendix), so a repeated keyword is most likely the target keyword
<italic>q</italic>
<sub>0</sub>
.</p>
<p>The user can circumvent the above shortcomings by running the following modified protocol.</p>
</sec>
<sec>
<title>Protocol 2 (Enhanced(
<italic>q</italic>
<sub>0</sub>
<italic>upwd</italic>
,
<italic>k</italic>
,
<italic>ϵ</italic>
))</title>
<p>
<list list-type="order">
<list-item>
<label>1. </label>
<p>Let
<italic>PRNG</italic>
be a cryptographically secure pseudo‐random number generator. Seed
<italic>PRNG</italic>
with the hash of the concatenation of the target keyword
<italic>q</italic>
<sub>0</sub>
and a user password
<italic>upwd</italic>
.</p>
</list-item>
<list-item>
<label>2. </label>
<p>Let the keywords in the thesaurus be ranked by increasing relative frequency.</p>
</list-item>
<list-item>
<label>3. </label>
<p>Let
<italic>f</italic>
(
<italic>q</italic>
<sub>0</sub>
) be the relative frequency of the target keyword
<italic>q</italic>
<sub>0</sub>
.</p>
</list-item>
<list-item>
<label>4. </label>
<p>Adjust
<italic>PRNG</italic>
to uniformly generate real numbers in the interval [max (0,
<italic>f</italic>
(
<italic>q</italic>
<sub>0</sub>
)−
<italic>ϵ</italic>
),
<italic>f</italic>
(
<italic>q</italic>
<sub>0</sub>
) ].</p>
</list-item>
<list-item>
<label>5. </label>
<p>Let
<italic>a</italic>
:=
<italic>PRNG</italic>
.</p>
</list-item>
<list-item>
<label>6. </label>
<p>If
<italic>a</italic>
+
<italic>ϵ</italic>
>1 or there are less than
<italic>k</italic>
keywords in the thesaurus with relative frequencies in the interval [
<italic>a</italic>
, 
<italic>a</italic>
+
<italic>ϵ</italic>
], then signal failure and exit the protocol.</p>
</list-item>
<list-item>
<label>7. </label>
<p>Let
<italic>M</italic>
<sub>1</sub>
<italic>k</italic>
be the number of keywords in the thesaurus with relative frequencies in [
<italic>a</italic>
, 
<italic>a</italic>
+
<italic>ϵ</italic>
]. Let
<italic>M</italic>
<sub>2</sub>
be the rank in the ranked thesaurus of the least frequent of the
<italic>M</italic>
<sub>1</sub>
keywords. Adjust
<italic>PRNG</italic>
to generate pseudo‐random numbers uniformly in the interval [
<italic>M</italic>
<sub>2</sub>
,
<italic>M</italic>
<sub>2</sub>
+
<italic>M</italic>
<sub>1</sub>
−1] (i.e. the interval containing the ranks of the
<italic>M</italic>
<sub>1</sub>
keywords).</p>
</list-item>
<list-item>
<label>8. </label>
<p>To mask the target keyword
<italic>q</italic>
<sub>0</sub>
, call the
<italic>PRNG</italic>
generator
<italic>K</italic>
−1 times to draw
<italic>K</italic>
−1 bogus keywords
<italic>q</italic>
<sub>1</sub>
, 
<italic>q</italic>
<sub>2</sub>
, 
<italic>Λ</italic>
, 
<italic>q</italic>
<sub>
<italic>k</italic>
−1</sub>
by (pseudo‐)randomly sampling without replacement from those keywords in the thesaurus with relative frequencies in [
<italic>a</italic>
, 
<italic>a</italic>
+
<italic>ϵ</italic>
].</p>
</list-item>
<list-item>
<label>9. </label>
<p>Submit the query
<xref ref-type="fig" rid="F_2640330405005">Equation 5</xref>
where π is a random permutation of the set {0,1,
<italic>Λ</italic>
,
<italic>k</italic>
−1}.</p>
</list-item>
<list-item>
<label>10. </label>
<p>Once the query results are received, keep those results where the keyword
<italic>q</italic>
<sub>0</sub>
appears and discard the rest.</p>
</list-item>
</list>
The complexity of Protocol 2 is clearly linear in
<italic>k</italic>
: each bogus keyword must be generated (Step 8) and the results related to the target keyword must be filtered out of the overall results for the
<italic>k</italic>
submitted keywords (Step 10). Some explanations of the rationale of the Protocol follow:
<list list-type="bullet">
<list-item>
<label></label>
<p>For a certain user with a user password
<italic>upwd</italic>
and a specific target keyword
<italic>q</italic>
<sub>0</sub>
the bogus keywords are always the same in successive queries. This prevents an intruder from finding
<italic>q</italic>
<sub>0</sub>
as the only repeated keyword in successive queries by the same user.</p>
</list-item>
<list-item>
<label></label>
<p>The user password
<italic>upwd</italic>
is used to customise the bogus keywords generated for a given target keyword
<italic>q</italic>
<sub>0</sub>
. This has a privacy‐preserving effect. Indeed, if the bogus keywords only depended on the target keyword, a user
<italic>u</italic>
<sub>1</sub>
who saw a query submitted by another user
<italic>u</italic>
<sub>2</sub>
whose
<italic>k</italic>
keywords are identical to those in a past query of his or her own would know that the target keyword in
<italic>u</italic>
<sub>2</sub>
's query is the same as in his or her own past query.</p>
</list-item>
<list-item>
<label></label>
<p>Protocol 2 finds
<italic>k</italic>
−1 bogus keywords whose relative frequencies are similar to the relative frequency of
<italic>q</italic>
<sub>0</sub>
. This guarantees the lower bound for the privacy
<italic>H</italic>
(
<italic>Q</italic>
<sub>0</sub>
) stated in Theorem 1 below.</p>
</list-item>
<list-item>
<label></label>
<p>Frequencies for the bogus items are taken in an interval [
<italic>a</italic>
, 
<italic>a</italic>
+
<italic>ϵ</italic>
] containing
<italic>f</italic>
(
<italic>q</italic>
<sub>0</sub>
). Using an interval centred in
<italic>f</italic>
(
<italic>q</italic>
<sub>0</sub>
) would be more intuitive, but certainly weaker – an intruder attempting to re‐identify
<italic>q</italic>
<sub>0</sub>
from the protected query could estimate
<italic>q</italic>
<sub>0</sub>
as the median keyword among those in Expression (3).</p>
</list-item>
</list>
The following theorem proves that nontrivial
<italic>h</italic>
(
<italic>k</italic>
)‐privacy is guaranteed
<italic>a priori</italic>
by Protocol 2. For a certain target keyword
<italic>q</italic>
<sub>0</sub>
to be protected, the lower privacy bound
<italic>h</italic>
(
<italic>k</italic>
) can guide the data protector in choosing
<italic>ϵ</italic>
and the number
<italic>k</italic>
−1 of bogus keywords in order to achieve a certain privacy level.</p>
</sec>
<sec>
<title>Theorem 1</title>
<p>Let
<italic>q</italic>
<sub>0</sub>
be the target keyword protected using Protocol 2 to generate the privacy‐protected query in Expression (3). Then the privacy
<italic>H</italic>
(
<italic>Q</italic>
<sub>0</sub>
) of
<italic>q</italic>
<sub>0</sub>
can be lower‐bounded as:
<xref ref-type="fig" rid="F_2640330405006">Equation 6</xref>
where:
<xref ref-type="fig" rid="F_2640330405007">Equation 7</xref>
<xref ref-type="fig" rid="F_2640330405008">Equation 8</xref>
with:
<xref ref-type="fig" rid="F_2640330405009">Equation 9</xref>
and
<italic>d</italic>
is the greatest integer such that (
<italic>k</italic>
<italic>d</italic>
)
<italic>A</italic>
+
<italic>dB</italic>
≦1.</p>
<p>
<italic>Proof</italic>
. Lemma 1 above can be used to write:
<xref ref-type="fig" rid="F_2640330405010">Equation 10</xref>
Using Jensen's inequality on the right‐hand side of Equation (4), we get:
<xref ref-type="fig" rid="F_2640330405011">Equation 11</xref>
The lower bound in Equation (5) is minimum when:
<xref ref-type="fig" rid="F_2640330405012">Equation 12</xref>
is maximum. Let
<italic>a</italic>
∈[max (0,
<italic>f</italic>
(
<italic>q</italic>
<sub>0</sub>
)−
<italic>ϵ</italic>
),
<italic>f</italic>
(
<italic>q</italic>
<sub>0</sub>
) ] be the pseudo‐random number computed at Step 5 of Protocol 2 such that
<italic>f</italic>
(
<italic>q</italic>
<sub>
<italic>i</italic>
</sub>
)∈[
<italic>a</italic>
,
<italic>a</italic>
+
<italic>ϵ</italic>
]⊆[0,1] for all
<italic>i</italic>
=0 to
<italic>k</italic>
−1. Then we have
<italic>g</italic>
(
<italic>q</italic>
<sub>
<italic>i</italic>
</sub>
)∈[
<italic>A</italic>
,
<italic>B</italic>
]⊆[0,1] where:
<xref ref-type="fig" rid="F_2640330405013">Equation 13</xref>
and:
<xref ref-type="fig" rid="F_2640330405014">Equation 14</xref>
On the other hand:
<xref ref-type="fig" rid="F_2640330405015">Equation 15</xref>
<xref ref-type="fig" rid="F_2640330405016">Equation 16</xref>
So Lemma 3 (see Appendix) can be used to find that the maximum of Expression (6) is reached for
<italic>g</italic>
(
<italic>q</italic>
<sub>
<italic>i</italic>
</sub>
)=
<italic>Λ</italic>
=
<italic>g</italic>
(
<italic>q</italic>
<sub>
<italic>d</italic>
</sub>
)=
<italic>B</italic>
,
<italic>g</italic>
(
<italic>q</italic>
<sub>
<italic>d</italic>
+1</sub>
)=1−
<italic>dB</italic>
−(
<italic>k</italic>
<italic>d</italic>
−1)
<italic>A</italic>
and
<italic>g</italic>
(
<italic>q</italic>
<sub>
<italic>d</italic>
+2</sub>
)=
<italic>Λ</italic>
=
<italic>g</italic>
(
<italic>q</italic>
<sub>
<italic>k</italic>
</sub>
)=
<italic>A</italic>
or any permutation of those assignments, where
<italic>d</italic>
<
<italic>k</italic>
is the greatest integer such that (
<italic>k</italic>
<italic>d</italic>
)
<italic>A</italic>
+
<italic>dB</italic>
≦1. The corresponding maximum value of Expression (6) is:
<xref ref-type="fig" rid="F_2640330405017">Equation 17</xref>
The value of Expression (7) varies depending on
<italic>A</italic>
and
<italic>B</italic>
. Actually, it is maximum when
<italic>A</italic>
is as small as possible and
<italic>B</italic>
is as large as possible, which for fixed
<italic>ϵ</italic>
and
<italic>k</italic>
happens when
<italic>a</italic>
is minimum, that is, for
<italic>a</italic>
<sub> min </sub>
=max (0,
<italic>f</italic>
(
<italic>q</italic>
<sub>0</sub>
)−
<italic>ϵ</italic>
).</p>
</sec>
<sec>
<title>An extension for multi‐keyword queries</title>
<p>Protocol 2 can easily be extended to handle multi‐keyword queries by making the following adaptations:
<list list-type="bullet">
<list-item>
<label></label>
<p>Replace the target keyword
<italic>q</italic>
<sub>0</sub>
with a Boolean expression
<italic>E</italic>
<sub>0</sub>
involving several keywords.</p>
</list-item>
<list-item>
<label></label>
<p>Replace
<italic>f</italic>
(
<italic>q</italic>
<sub>0</sub>
) with the relative frequency
<italic>f</italic>
(
<italic>E</italic>
<sub>0</sub>
) of
<italic>E</italic>
<sub>0</sub>
, computed using the standard algebra employed to find the probability of a Boolean expression of events; in particular, if keywords can be assumed to be independent and
<italic>E</italic>
<sub>0</sub>
is the logical AND of several keywords,
<italic>f</italic>
(
<italic>E</italic>
<sub>0</sub>
) can be computed as the product of the relative frequencies of the keywords.</p>
</list-item>
<list-item>
<label></label>
<p>Instead of bogus keywords
<italic>q</italic>
<sub>1</sub>
, 
<italic>q</italic>
<sub>2</sub>
, 
<italic>Λ</italic>
, 
<italic>q</italic>
<sub>
<italic>k</italic>
−1</sub>
, use bogus expressions
<italic>E</italic>
<sub>1</sub>
, 
<italic>E</italic>
<sub>2</sub>
<italic>Λ</italic>
, 
<italic>E</italic>
<sub>
<italic>k</italic>
−1</sub>
whose relative frequency lies in the interval [
<italic>a</italic>
,
<italic>a</italic>
+
<italic>ϵ</italic>
].</p>
</list-item>
<list-item>
<label></label>
<p>Submit the query:
<xref ref-type="fig" rid="F_2640330405018">Equation 18</xref>
</p>
</list-item>
<list-item>
<label></label>
<p>Once the query results are received, keep those results where
<italic>E</italic>
<sub>0</sub>
appears and discard the rest.</p>
</list-item>
</list>
</p>
</sec>
</sec>
<sec>
<title>The
<italic>GooPIR</italic>
prototype and empirical results</title>
<p>A prototype called
<italic>GooPIR</italic>
has been developed in Java JDK 6.0 Standard Edition to implement the scheme described in Protocol 2. The prototype accepts queries consisting of single keywords or queries consisting of a logical AND of several keywords (with the limitation that independence between the keywords must be a plausible assumption). GooPIR locally masks the target keyword(s), submits the masked query to the Google search engine and then locally filters the results relevant to the target keyword(s). A screenshot of the prototype can be seen in
<xref ref-type="fig" rid="F_2640330405024">Figure 1</xref>
. The figure presents the search results for the keyword “University”.</p>
<p>By clicking on the “Details” button, the following information about the execution of Protocol 2 is displayed:
<list list-type="bullet">
<list-item>
<label></label>
<p>The target query
<italic>q</italic>
<sub>0</sub>
=“University” and its relative frequency
<italic>f</italic>
(
<italic>q</italic>
<sub>0</sub>
)=0.00648 in the reference thesaurus (see below for details of how the thesaurus was obtained);</p>
</list-item>
<list-item>
<label></label>
<p>The width
<italic>ϵ</italic>
=0.001 of the frequency interval the bogus keywords are drawn from;</p>
</list-item>
<list-item>
<label></label>
<p>The interval [max (0,
<italic>f</italic>
(
<italic>q</italic>
<sub>0</sub>
)−
<italic>ϵ</italic>
),
<italic>f</italic>
(
<italic>q</italic>
<sub>0</sub>
) ]=[0.00548,0.00648], called “Alpha interval” in the prototype;</p>
</list-item>
<list-item>
<label></label>
<p>The pseudo‐random value
<italic>a</italic>
=0.00619, called “Alpha” in the prototype;</p>
</list-item>
<list-item>
<label></label>
<p>The search interval [
<italic>a</italic>
,
<italic>a</italic>
+
<italic>ϵ</italic>
]=[0.00619,0.00719];</p>
</list-item>
<list-item>
<label></label>
<p>The number
<italic>M</italic>
<sub>1</sub>
=34 of candidate bogus keywords with frequencies within the search interval;</p>
</list-item>
<list-item>
<label></label>
<p>The list of
<italic>k</italic>
−1=10 bogus keywords used for masking, as well as their frequencies;</p>
</list-item>
<list-item>
<label></label>
<p>The entropy of the masked entry, that is,
<italic>H</italic>
(
<italic>Q</italic>
<sub>0</sub>
)=3.4582 bits (for the parameters used, the a priori lower bound of the entropy guaranteed by Theorem 1 is 3.42412 bits).</p>
</list-item>
</list>
<italic>GooPIR</italic>
can work with any reference thesaurus (which may include proper nouns, non‐linguistic expressions, etc.). The results reported in this section were obtained using a reference thesaurus constructed as follows:
<list list-type="bullet">
<list-item>
<label></label>
<p>An initial thesaurus of 216,551 English words without relative frequencies was obtained from:
<ext-link ext-link-type="uri" xlink:href="http://www.ojohaven.com/fun/search.html">www.ojohaven.com/fun/search.html</ext-link>
</p>
</list-item>
<list-item>
<label></label>
<p>Prepositions, articles and conjunctions were eliminated from the thesaurus.</p>
</list-item>
<list-item>
<label></label>
<p>All news articles dated October 2007 available from the “Print Edition” section of the electronic newspaper WikiNews (
<ext-link ext-link-type="uri" xlink:href="http://en.wikinews.org/wiki/Wikinews:Print_edition">http://en.wikinews.org/wiki/Wikinews:Print_edition</ext-link>
) were converted from PDF to ASCII text.</p>
</list-item>
<list-item>
<label></label>
<p>GooPIR assigned to each word in the thesaurus its relative frequency of appearance in the news articles described above. The resulting
<italic>n</italic>
=9096 EMBED English words with non‐zero frequency were taken as the reference thesaurus with frequencies.</p>
</list-item>
</list>
<xref ref-type="fig" rid="F_2640330405026">Table I</xref>
reports additional empirical results for keywords in several frequency ranges and for several values of
<italic>ϵ</italic>
and
<italic>k</italic>
. For each choice of frequency,
<italic>ϵ</italic>
and
<italic>k</italic>
, the following information is given: search interval [
<italic>a</italic>
,
<italic>a</italic>
+
<italic>ϵ</italic>
], number
<italic>M</italic>
<sub>1</sub>
of candidate bogus keywords with frequencies in the search interval, entropy of the masked query and a priori lower bound for the entropy according to Theorem 1. Note that one gets a trivial a priori lower bound equal to zero whenever
<italic>f</italic>
(
<italic>q</italic>
<sub>0</sub>
)≦
<italic>ϵ</italic>
(in this case
<italic>a</italic>
<sub> min </sub>
=0 and the lower bound is zero too). For conciseness, the bogus keywords used for masking have been omitted; however, their sequence is always the same for a certain target keyword in successive queries (regardless of
<italic>k</italic>
). For example, for
<italic>q</italic>
<sub>0</sub>
=“theater” with the parameters in the table, we get
<italic>q</italic>
<sub>1</sub>
=“vocally” (with frequency
<italic>f</italic>
(
<italic>q</italic>
<sub>1</sub>
)=0.0001),
<italic>q</italic>
<sub>2</sub>
=“murderers” (with frequency
<italic>f</italic>
(
<italic>q</italic>
<sub>2</sub>
)=0.0001),
<italic>q</italic>
<sub>3</sub>
=“dollar” (with frequency
<italic>f</italic>
(
<italic>q</italic>
<sub>3</sub>
)=0.00098),
<italic>q</italic>
<sub>4</sub>
=“biodiversity” (with frequency
<italic>f</italic>
(
<italic>q</italic>
<sub>4</sub>
)=0.00043), etc.</p>
<sec>
<title>Quality assessment of the results</title>
<p>Offering privacy to users by hiding the target keyword among other bogus keywords would not be attractive if the toll for privacy preservation was a substantial alteration of the query results. In order to assess quality, we have compared the query results obtained with and without our method.</p>
<p>The following notation is used in this section:</p>
<p>B:
<italic>U</italic>
: The set of URLs obtained with Google.</p>
<p>
<italic></italic>
: The set of URLs obtained with GooPIR.</p>
<p>
<italic>N</italic>
<sub>
<italic>URL</italic>
</sub>
: The cardinality of
<italic>U</italic>
and
<italic></italic>
(the first
<italic>N</italic>
<sub>
<italic>URL</italic>
</sub>
URLs for both outputs are considered).</p>
<p>We propose three simple quality measures. The first measure is the percentage of coincidences between
<italic>U</italic>
and
<italic></italic>
, that is:
<xref ref-type="fig" rid="F_2640330405019">Equation 19</xref>
where
<italic>Card</italic>
( · ) is used to denote set cardinality.</p>
<p>The second measure is the average distance between coincidences, that is:
<xref ref-type="fig" rid="F_2640330405020">Equation 20</xref>
where
<italic>r</italic>
<sub>
<italic>U</italic>
</sub>
(
<italic>u</italic>
) is the rank of
<italic>u</italic>
within
<italic>U</italic>
and
<italic>r</italic>
<sub>
<italic></italic>
</sub>
(
<italic>u</italic>
) is the rank of
<italic>u</italic>
within
<italic></italic>
. The third measure is the variance of the rank differences between the results, which are both in
<italic>U</italic>
and
<italic></italic>
, that is:
<xref ref-type="fig" rid="F_2640330405021">Equation 21</xref>
The outcome of the above measures varies depending on the relative frequency of the searched keyword. To study the influence of the frequency on the quality of the results, we have defined the following cases:
<list list-type="bullet">
<list-item>
<label></label>
<p>Low‐frequency keywords, whose frequency lies in the first quartile.</p>
</list-item>
<list-item>
<label></label>
<p>Medium‐frequency keywords, whose frequency lies in the second or third quartiles.</p>
</list-item>
<list-item>
<label></label>
<p>High‐frequency keywords, whose frequency lies in the fourth quartile.</p>
</list-item>
</list>
The following algorithm has been used to assess the quality of our results for different keyword frequencies (the number of queries was
<italic>NQ</italic>
=10 and the number of query results was
<italic>N</italic>
<sub>
<italic>URL</italic>
</sub>
=25).</p>
</sec>
<sec>
<title>Algorithm 1 (Assessing the quality of GooPIR results)</title>
<p>
<list list-type="order">
<list-item>
<label>1. </label>
<p>INPUT: Number of queries (
<italic>NQ</italic>
), Frequency (
<italic>F</italic>
, Number of results
<italic>N</italic>
<sub>
<italic>URL</italic>
</sub>
</p>
</list-item>
<list-item>
<label>2. </label>
<p>
<italic>i</italic>
←0</p>
</list-item>
<list-item>
<label>3. </label>
<p>While (
<italic>i</italic>
<
<italic>NQ</italic>
):</p>
</list-item>
</list>
<list list-type="bullet">
<list-item>
<label></label>
<p>
<italic>W</italic>
<italic>SelectRandomKeyword</italic>
(
<italic>F</italic>
)</p>
</list-item>
<list-item>
<label></label>
<p>
<italic></italic>
←Send (
<italic>W</italic>
) to
<italic>GooPIR</italic>
</p>
</list-item>
<list-item>
<label></label>
<p>
<italic>U</italic>
← Send (
<italic>W</italic>
) to Google</p>
</list-item>
<list-item>
<label></label>
<p>
<italic></italic>
←GetTheFirstResults(
<italic></italic>
,
<italic>N</italic>
<sub>
<italic>URL</italic>
</sub>
)</p>
</list-item>
<list-item>
<label></label>
<p>
<italic>U</italic>
←GetTheFirstResults(
<italic>U</italic>
,
<italic>N</italic>
<sub>
<italic>URL</italic>
</sub>
)</p>
</list-item>
<list-item>
<label></label>
<p>
<italic>M</italic>
<sub>1</sub>
←Compute
<italic>QM</italic>
<sub>1</sub>
(
<italic></italic>
,
<italic>U</italic>
)</p>
</list-item>
<list-item>
<label></label>
<p>
<italic>M</italic>
<sub>2</sub>
←Compute
<italic>QM</italic>
<sub>2</sub>
(
<italic></italic>
,
<italic>U</italic>
)</p>
</list-item>
<list-item>
<label></label>
<p>
<italic>M</italic>
<sub>3</sub>
←Compute
<italic>QM</italic>
<sub>3</sub>
(
<italic></italic>
,
<italic>U</italic>
)</p>
</list-item>
<list-item>
<label></label>
<p>
<italic>i</italic>
:=
<italic>i</italic>
+1</p>
</list-item>
</list>
<list list-type="order">
<list-item>
<label>1. </label>
<p>Return set of measurements for
<italic>QM</italic>
<sub>1</sub>
,
<italic>QM</italic>
<sub>2</sub>
and
<italic>QM</italic>
<sub>3</sub>
</p>
</list-item>
</list>
<xref ref-type="fig" rid="F_2640330405027 F_2640330405028 F_2640330405029">Tables II‐IV</xref>
show the outcome of
<italic>QM</italic>
<sub>1</sub>
,
<italic>QM</italic>
<sub>2</sub>
and
<italic>QM</italic>
<sub>3</sub>
for ten randomly selected keywords with respectively low, medium and high frequencies. It can be observed that the quality of the results decreases when the number
<italic>K</italic>
−1 of bogus keywords increases (see
<xref ref-type="fig" rid="F_2640330405025">Figure 2</xref>
). However, the average of
<italic>QM</italic>
<sub>1</sub>
(percentage of coincidences) is higher than 40 per cent even when the number of bogus keywords is high (e.g.
<italic>K</italic>
=10 keywords). In addition, the average of
<italic>QM</italic>
<sub>2</sub>
(rank distance) is always lower than
<italic>QM</italic>
<sub>3</sub>
. As could be expected, the average of
<italic>QM</italic>
<sub>3</sub>
(rank variance) tends to grow with
<italic>k</italic>
. It is apparent from
<xref ref-type="fig" rid="F_2640330405025">Figure 2</xref>
that the lower the frequency the better the quality.</p>
<p>The above measures show that, overall, the search results obtained with
<italic>GooPIR</italic>
are quite good. We next focus our analysis on the search results most relevant to the user, namely those results which appear first, that is, with lowest ranks. We have defined blocks of five results each:
<list list-type="order">
<list-item>
<label>1. </label>
<p>Block 1: Results 0 to 4</p>
</list-item>
<list-item>
<label>2. </label>
<p>Block 2: Results 5 to 9</p>
</list-item>
<list-item>
<label>3. </label>
<p>Block 3: Results 10 to 14</p>
</list-item>
<list-item>
<label>4. </label>
<p>Block 4: Results 15 to 19</p>
</list-item>
<list-item>
<label>5. </label>
<p>Block 5: Results 20 to 24</p>
</list-item>
</list>
<xref ref-type="fig" rid="F_2640330405030 F_2640330405031 F_2640330405032 F_2640330405033 F_2640330405034 F_2640330405035">Tables V‐X</xref>
show some blockwise quality measurements of the
<italic>GooPIR</italic>
results for low, medium and high frequencies, respectively. We have observed that the average of
<italic>QM</italic>
<sub>1</sub>
(average percentage of coincidences) in the first block is very high (i.e. between 65 per cent and 86 per cent) and the quality of the results tends to decrease in blocks with higher ranks. In addition, the influence of the frequency seems to be the same as in the non‐blockwise analysis – better quality is obtained for lower frequencies.</p>
</sec>
</sec>
<sec>
<title>Conclusions and future work</title>
<p>In this paper the concept of
<italic>h</italic>
(
<italic>k</italic>
)‐private information retrieval or
<italic>h</italic>
(
<italic>k</italic>
)‐PIR has been introduced as a pragmatic approach to offer users some query privacy in real databases and search engines. Then two protocols for keywords (a naïve one and an enhanced one) have been presented that do not assume any cooperation from the database in protecting the privacy of user queries; the second one offers nontrivial
<italic>h</italic>
(
<italic>k</italic>
)‐PIR and can be extended to handle multi‐keyword queries. The proposed protocols rely on a public thesaurus of keywords labelled with their relative frequencies. We have also presented empirical results obtained with a prototype implementing the enhanced protocol to provide keyword
<italic>h</italic>
(
<italic>k</italic>
)‐PIR when querying internet search engines.</p>
<p>However simple, the presented approach to obtain some user privacy when querying a privacy‐uncooperative database server or search engine seems to be new in the literature.</p>
<p>Future work may include:
<list list-type="bullet">
<list-item>
<label></label>
<p>Adding semantic diversity requirements to the frequency requirements when choosing the bogus keywords used to mask a given target keyword. A possible measure of semantic diversity is the hierarchical nominal variance defined in
<xref ref-type="bibr" rid="b7">Domingo‐Ferrer and Solanas (2008)</xref>
.</p>
</list-item>
<list-item>
<label></label>
<p>Pre‐processing the thesaurus to create clusters of keywords with similar frequency, so that selecting the bogus keywords is faster. Given a target keyword, a entire cluster of thesaurus keywords is taken as bogus keywords. Clusters must consist of
<italic>k</italic>
−1 (or more) keywords, and can be computed by using microaggregation (
<xref ref-type="bibr" rid="b8">Domingo‐Ferrer
<italic>et al.</italic>
, 2008</xref>
).</p>
</list-item>
</list>
</p>
</sec>
<sec>
<fig position="float" id="F_2640330405001">
<caption>
<p>Equation 1</p>
</caption>
<graphic xlink:href="2640330405001.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405002">
<caption>
<p>Equation 2</p>
</caption>
<graphic xlink:href="2640330405002.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405003">
<caption>
<p>Equation 3</p>
</caption>
<graphic xlink:href="2640330405003.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405004">
<caption>
<p>Equation 4</p>
</caption>
<graphic xlink:href="2640330405004.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405005">
<caption>
<p>Equation 5</p>
</caption>
<graphic xlink:href="2640330405005.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405006">
<caption>
<p>Equation 6</p>
</caption>
<graphic xlink:href="2640330405006.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405007">
<caption>
<p>Equation 7</p>
</caption>
<graphic xlink:href="2640330405007.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405008">
<caption>
<p>Equation 8</p>
</caption>
<graphic xlink:href="2640330405008.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405009">
<caption>
<p>Equation 9</p>
</caption>
<graphic xlink:href="2640330405009.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405010">
<caption>
<p>Equation 10</p>
</caption>
<graphic xlink:href="2640330405010.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405011">
<caption>
<p>Equation 11</p>
</caption>
<graphic xlink:href="2640330405011.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405012">
<caption>
<p>Equation 12</p>
</caption>
<graphic xlink:href="2640330405012.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405013">
<caption>
<p>Equation 13</p>
</caption>
<graphic xlink:href="2640330405013.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405014">
<caption>
<p>Equation 14</p>
</caption>
<graphic xlink:href="2640330405014.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405015">
<caption>
<p>Equation 15</p>
</caption>
<graphic xlink:href="2640330405015.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405016">
<caption>
<p>Equation 16</p>
</caption>
<graphic xlink:href="2640330405016.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405017">
<caption>
<p>Equation 17</p>
</caption>
<graphic xlink:href="2640330405017.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405018">
<caption>
<p>Equation 18</p>
</caption>
<graphic xlink:href="2640330405018.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405019">
<caption>
<p>Equation 19</p>
</caption>
<graphic xlink:href="2640330405019.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405020">
<caption>
<p>Equation 20</p>
</caption>
<graphic xlink:href="2640330405020.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405021">
<caption>
<p>Equation 21</p>
</caption>
<graphic xlink:href="2640330405021.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405022">
<caption>
<p>Equation 22</p>
</caption>
<graphic xlink:href="2640330405022.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405023">
<caption>
<p>Equation 23</p>
</caption>
<graphic xlink:href="2640330405023.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405024">
<label>
<bold>Figure 1
<x> </x>
</bold>
</label>
<caption>
<p>The main page of the
<italic>GooPIR</italic>
prototype</p>
</caption>
<graphic xlink:href="2640330405024.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405025">
<label>
<bold>Figure 2
<x> </x>
</bold>
</label>
<caption>
<p>Average percentage of coincidences for different values of
<italic>k</italic>
and different frequencies</p>
</caption>
<graphic xlink:href="2640330405025.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405026">
<label>
<bold>Table I
<x> </x>
</bold>
</label>
<caption>
<p>
<italic>GooPIR</italic>
empirical results for keywords in several frequency ranges and several values of
<italic>ϵ</italic>
and
<italic>k</italic>
</p>
</caption>
<graphic xlink:href="2640330405026.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405027">
<label>
<bold>Table II
<x> </x>
</bold>
</label>
<caption>
<p>Quality analysis for some low‐frequency keywords</p>
</caption>
<graphic xlink:href="2640330405027.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405028">
<label>
<bold>Table III
<x> </x>
</bold>
</label>
<caption>
<p>Quality analysis for some medium‐frequency keywords</p>
</caption>
<graphic xlink:href="2640330405028.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405029">
<label>
<bold>Table IV
<x> </x>
</bold>
</label>
<caption>
<p>Quality analysis for some high‐frequency keywords</p>
</caption>
<graphic xlink:href="2640330405029.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405030">
<label>
<bold>Table V
<x> </x>
</bold>
</label>
<caption>
<p>Quality analysis for
<italic>k</italic>
=2 of the URLs returned by
<italic>GooPIR</italic>
for low‐frequency keywords</p>
</caption>
<graphic xlink:href="2640330405030.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405031">
<label>
<bold>Table VI
<x> </x>
</bold>
</label>
<caption>
<p>Quality analysis for
<italic>k</italic>
=6 of the URLs returned by
<italic>GooPIR</italic>
for low‐frequency keywords</p>
</caption>
<graphic xlink:href="2640330405031.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405032">
<label>
<bold>Table VII
<x> </x>
</bold>
</label>
<caption>
<p>Quality analysis for
<italic>k</italic>
=2 of the URLs returned by
<italic>GooPIR</italic>
for medium‐frequency keywords</p>
</caption>
<graphic xlink:href="2640330405032.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405033">
<label>
<bold>Table VIII
<x> </x>
</bold>
</label>
<caption>
<p>Quality analysis for
<italic>k</italic>
=6 of the URLs returned by
<italic>GooPIR</italic>
for medium‐frequency keywords</p>
</caption>
<graphic xlink:href="2640330405033.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405034">
<label>
<bold>Table IX
<x> </x>
</bold>
</label>
<caption>
<p>Quality analysis for
<italic>k</italic>
=2 of the URLs returned by
<italic>GooPIR</italic>
for high‐frequency keywords</p>
</caption>
<graphic xlink:href="2640330405034.tif"></graphic>
</fig>
</sec>
<sec>
<fig position="float" id="F_2640330405035">
<label>
<bold>Table X
<x> </x>
</bold>
</label>
<caption>
<p>Quality analysis for
<italic>k</italic>
=6 of the URLs returned by
<italic>GooPIR</italic>
for high‐frequency keywords</p>
</caption>
<graphic xlink:href="2640330405035.tif"></graphic>
</fig>
</sec>
</body>
<back>
<fn-group>
<title>Notes</title>
<fn id="fn1">
<p>Given a discrete random variable
<italic>X</italic>
and probability function
<italic>p</italic>
, the entropy
<italic>H</italic>
(
<italic>X</italic>
) of
<italic>X</italic>
is defined as
<italic>H</italic>
(
<italic>X</italic>
)=−∑
<sub>
<italic>i</italic>
=1</sub>
<sup>
<italic>n</italic>
</sup>
<italic>p</italic>
(
<italic>x</italic>
<sub>
<italic>i</italic>
</sub>
)
<italic>log</italic>
<sub>2</sub>
<italic>p</italic>
(
<italic>x</italic>
<sub>
<italic>i</italic>
</sub>
), where
<italic>x</italic>
<sub>1</sub>
,ċ,
<italic>x</italic>
<sub>
<italic>n</italic>
</sub>
are the outcomes of
<italic>X</italic>
having non‐zero probability (see
<xref ref-type="bibr" rid="b16">Shannon, 1948</xref>
). Shannon's entropy measures the uncertainty about the specific outcome that will be obtained when the random variable
<italic>X</italic>
is sampled.</p>
</fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="b1">
<mixed-citation>
<person-group person-group-type="author">
<string-name>AOL</string-name>
</person-group>
(
<year>2006</year>
), “AOL keyword searches” (online), available at:
<ext-link ext-link-type="uri" xlink:href="http://dontdelete.com/default.asp">http://dontdelete.com/default.asp</ext-link>
(accessed 2 December 2008).</mixed-citation>
</ref>
<ref id="b2">
<mixed-citation>
<person-group person-group-type="author">
<string-name>
<surname>Beimel</surname>
,
<given-names>A.</given-names>
</string-name>
</person-group>
,
<person-group person-group-type="author">
<string-name>
<surname>Ishai</surname>
,
<given-names>Y.</given-names>
</string-name>
</person-group>
and
<person-group person-group-type="author">
<string-name>
<surname>Malkin</surname>
,
<given-names>T.</given-names>
</string-name>
</person-group>
(
<year>2004</year>
), “
<article-title>
<italic>Reducing the servers computation in private information retrieval: PIR with preprocessing</italic>
</article-title>
”,
<source>
<italic>Journal of Cryptology</italic>
</source>
, Vol.
<volume>17</volume>
, pp.
<fpage>125</fpage>
<x></x>
<lpage>51</lpage>
.</mixed-citation>
</ref>
<ref id="b3">
<mixed-citation>
<person-group person-group-type="author">
<string-name>
<surname>Chor</surname>
,
<given-names>B.</given-names>
</string-name>
</person-group>
,
<person-group person-group-type="author">
<string-name>
<surname>Gilboa</surname>
,
<given-names>N.</given-names>
</string-name>
</person-group>
and
<person-group person-group-type="author">
<string-name>
<surname>Naor</surname>
,
<given-names>M.</given-names>
</string-name>
</person-group>
(
<year>1997</year>
),
<source>
<italic>Technical Report TR CS0917</italic>
</source>
, Private information retrieval by keywords,
<publisher-name>Department of Computer Science, Technion, Israel Institute of Technology</publisher-name>
,
<publisher-loc>Haifa</publisher-loc>
.</mixed-citation>
</ref>
<ref id="b4">
<mixed-citation>
<person-group person-group-type="author">
<string-name>
<surname>Chor</surname>
,
<given-names>B.</given-names>
</string-name>
</person-group>
,
<person-group person-group-type="author">
<string-name>
<surname>Goldreich</surname>
,
<given-names>O.</given-names>
</string-name>
</person-group>
,
<person-group person-group-type="author">
<string-name>
<surname>Kushilevitz</surname>
,
<given-names>E.</given-names>
</string-name>
</person-group>
and
<person-group person-group-type="author">
<string-name>
<surname>Sudan</surname>
,
<given-names>M.</given-names>
</string-name>
</person-group>
(
<year>1995</year>
), “Private information retrieval”, in
<italic>Proccedings of the 36th Annual Symposium on Foundations of Computer Science (FOCS), 23‐25 October, IEEE</italic>
, pp. 41‐50.</mixed-citation>
</ref>
<ref id="b5">
<mixed-citation>
<person-group person-group-type="author">
<string-name>
<surname>Chor</surname>
,
<given-names>B.</given-names>
</string-name>
</person-group>
,
<person-group person-group-type="author">
<string-name>
<surname>Goldreich</surname>
,
<given-names>O.</given-names>
</string-name>
</person-group>
,
<person-group person-group-type="author">
<string-name>
<surname>Kushilevitz</surname>
,
<given-names>E.</given-names>
</string-name>
</person-group>
and
<person-group person-group-type="author">
<string-name>
<surname>Sudan</surname>
,
<given-names>M.</given-names>
</string-name>
</person-group>
(
<year>1998</year>
), “
<article-title>
<italic>Private information retrieval</italic>
</article-title>
”,
<source>
<italic>Journal of the ACM</italic>
</source>
, Vol.
<volume>45</volume>
, pp.
<fpage>965</fpage>
<x></x>
<lpage>81</lpage>
.</mixed-citation>
</ref>
<ref id="b6">
<mixed-citation>
<person-group person-group-type="author">
<string-name>
<surname>Domingo‐Ferrer</surname>
,
<given-names>J.</given-names>
</string-name>
</person-group>
(
<year>2007</year>
), “
<article-title>
<italic>A three‐dimensional conceptual framework for database privacy, in Secure Data Management – 4th VLDB Workshop (SDM 2007)</italic>
</article-title>
”,
<source>
<italic>Lecture Notes in Computer Science</italic>
</source>
,
<volume>Vol. 4721</volume>
,
<publisher-name>Springer</publisher-name>
,
<publisher-loc>Berlin and Heidelberg</publisher-loc>
, pp.
<fpage>193</fpage>
<x></x>
<lpage>202</lpage>
.</mixed-citation>
</ref>
<ref id="b7">
<mixed-citation>
<person-group person-group-type="author">
<string-name>
<surname>Domingo‐Ferrer</surname>
,
<given-names>J.</given-names>
</string-name>
</person-group>
and
<person-group person-group-type="author">
<string-name>
<surname>Solanas</surname>
,
<given-names>A.</given-names>
</string-name>
</person-group>
(
<year>2008</year>
), “
<article-title>
<italic>A measure of variance for nominal hierarchical attributes</italic>
</article-title>
”,
<source>
<italic>Information Sciences</italic>
</source>
, Vol.
<volume>178</volume>
, pp.
<fpage>4644</fpage>
<x></x>
<lpage>55</lpage>
.</mixed-citation>
</ref>
<ref id="b8">
<mixed-citation>
<person-group person-group-type="author">
<string-name>
<surname>Domingo‐Ferrer</surname>
,
<given-names>J.</given-names>
</string-name>
</person-group>
,
<person-group person-group-type="author">
<string-name>
<surname>Sebé</surname>
,
<given-names>F.</given-names>
</string-name>
</person-group>
and
<person-group person-group-type="author">
<string-name>
<surname>Solanas</surname>
,
<given-names>A.</given-names>
</string-name>
</person-group>
(
<year>2008</year>
), “
<article-title>
<italic>A polynomial‐time approximation to optimal multivariate microaggregation</italic>
</article-title>
”,
<source>
<italic>Computers and Mathematics with Applications</italic>
</source>
, Vol.
<volume>55</volume>
, pp.
<fpage>714</fpage>
<x></x>
<lpage>32</lpage>
.</mixed-citation>
</ref>
<ref id="b9">
<mixed-citation>
<person-group person-group-type="author">
<string-name>
<surname>Goldwasser</surname>
,
<given-names>S.</given-names>
</string-name>
</person-group>
and
<person-group person-group-type="author">
<string-name>
<surname>Micali</surname>
,
<given-names>S.</given-names>
</string-name>
</person-group>
(
<year>1984</year>
), “
<article-title>
<italic>Probabilistic encryption</italic>
</article-title>
”,
<source>
<italic>Journal of Computer and Systems Science</italic>
</source>
, Vol.
<volume>28</volume>
No.
<issue>1</issue>
, pp.
<fpage>270</fpage>
<x></x>
<lpage>99</lpage>
.</mixed-citation>
</ref>
<ref id="b10">
<mixed-citation>
<person-group person-group-type="author">
<string-name>
<surname>Howe</surname>
,
<given-names>D.C.</given-names>
</string-name>
</person-group>
and
<person-group person-group-type="author">
<string-name>
<surname>Nissenbaum</surname>
,
<given-names>H.</given-names>
</string-name>
</person-group>
(
<year>2009</year>
), “
<article-title>
<italic>TrackMeNot: resisting surveillance in web search</italic>
</article-title>
”, in
<person-group person-group-type="editor">
<string-name>
<surname>Kerr</surname>
,
<given-names>I.</given-names>
</string-name>
</person-group>
,
<person-group person-group-type="editor">
<string-name>
<surname>Steeves</surname>
,
<given-names>V.</given-names>
</string-name>
</person-group>
and
<person-group person-group-type="editor">
<string-name>
<surname>Lucock</surname>
,
<given-names>C.</given-names>
</string-name>
</person-group>
(Eds),
<source>
<italic>Lessons from the Identity Trail: Privacy, Anonymity and Identity in a Networked Society</italic>
</source>
,
<publisher-name>Oxford University Press</publisher-name>
,
<publisher-loc>Oxford</publisher-loc>
.</mixed-citation>
</ref>
<ref id="b11">
<mixed-citation>
<person-group person-group-type="author">
<string-name>
<surname>Kushilevitz</surname>
,
<given-names>E.</given-names>
</string-name>
</person-group>
and
<person-group person-group-type="author">
<string-name>
<surname>Ostrovsky</surname>
,
<given-names>R.</given-names>
</string-name>
</person-group>
(
<year>1997</year>
), “Replication is not needed: single database, computationally‐private information retrieval”, in
<italic>Proceedings of the 38th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 20‐22 October, IEEE</italic>
, pp. 364‐373.</mixed-citation>
</ref>
<ref id="b12">
<mixed-citation>
<person-group person-group-type="author">
<string-name>
<surname>Lane</surname>
,
<given-names>J.</given-names>
</string-name>
</person-group>
,
<person-group person-group-type="author">
<string-name>
<surname>Heus</surname>
,
<given-names>P.</given-names>
</string-name>
</person-group>
and
<person-group person-group-type="author">
<string-name>
<surname>Mulcahy</surname>
,
<given-names>T.</given-names>
</string-name>
</person-group>
(
<year>2008</year>
), “
<article-title>
<italic>Data access in a cyber world: making use of cyberinfrastructure</italic>
</article-title>
”,
<source>
<italic>Transactions on Data Privacy</italic>
</source>
, Vol.
<volume>1</volume>
No.
<issue>1</issue>
, pp.
<fpage>2</fpage>
<x></x>
<lpage>16</lpage>
.</mixed-citation>
</ref>
<ref id="b13">
<mixed-citation>
<person-group person-group-type="author">
<string-name>
<surname>Leech</surname>
,
<given-names>G.</given-names>
</string-name>
</person-group>
,
<person-group person-group-type="author">
<string-name>
<surname>Rayson</surname>
,
<given-names>P.</given-names>
</string-name>
</person-group>
and
<person-group person-group-type="author">
<string-name>
<surname>Wilson</surname>
,
<given-names>A.</given-names>
</string-name>
</person-group>
(
<year>2001</year>
),
<source>
<italic>Word Frequencies in Written and Spoken English: Based on the British National Corpus</italic>
</source>
,
<publisher-name>Longman</publisher-name>
,
<publisher-loc>London</publisher-loc>
.</mixed-citation>
</ref>
<ref id="b14">
<mixed-citation>
<person-group person-group-type="author">
<string-name>
<surname>Ostrovsky</surname>
,
<given-names>R.</given-names>
</string-name>
</person-group>
and
<person-group person-group-type="author">
<string-name>
<surname>Skeith</surname>
,
<given-names>W.E. III</given-names>
</string-name>
</person-group>
(
<year>2007</year>
), “
<article-title>
<italic>A survey of single‐database PIR: techniques and applications, in public key cryptography – PKC 2007</italic>
</article-title>
”,
<source>
<italic>Lecture Notes in Computer Science</italic>
</source>
,
<volume>Vol. 4450</volume>
,
<publisher-name>Springer</publisher-name>
,
<publisher-loc>Berlin and Heidelberg</publisher-loc>
, pp.
<fpage>393</fpage>
<x></x>
<lpage>411</lpage>
.</mixed-citation>
</ref>
<ref id="b15">
<mixed-citation>
<person-group person-group-type="author">
<string-name>
<surname>Samarati</surname>
,
<given-names>P.</given-names>
</string-name>
</person-group>
and
<person-group person-group-type="author">
<string-name>
<surname>Sweeney</surname>
,
<given-names>L.</given-names>
</string-name>
</person-group>
(
<year>1998</year>
), Protecting privacy when disclosing information: k‐anonymity and its enforcement through generalization and suppression, Technical report, SRI International, Menlo Park, CA.</mixed-citation>
</ref>
<ref id="b16">
<mixed-citation>
<person-group person-group-type="author">
<string-name>
<surname>Shannon</surname>
,
<given-names>C.E.</given-names>
</string-name>
</person-group>
(
<year>1948</year>
), “
<article-title>
<italic>A mathematical theory of communication</italic>
</article-title>
”,
<source>
<italic>Bell Systems Technical Journal</italic>
</source>
, Vol.
<volume>14</volume>
No.
<issue>1</issue>
, pp.
<fpage>7</fpage>
<x></x>
<lpage>13</lpage>
.</mixed-citation>
</ref>
<ref id="b17">
<mixed-citation>
<person-group person-group-type="author">
<string-name>
<surname>Staddon</surname>
,
<given-names>J.</given-names>
</string-name>
</person-group>
,
<person-group person-group-type="author">
<string-name>
<surname>Golle</surname>
,
<given-names>P.</given-names>
</string-name>
</person-group>
and
<person-group person-group-type="author">
<string-name>
<surname>Zimny</surname>
,
<given-names>B.</given-names>
</string-name>
</person-group>
(
<year>2007</year>
), “Web‐based inference detection”, in
<italic>Proceedings of the 16th USENIX Security Symposium, The Advanced Computer Systems Association</italic>
, pp. 71‐86.</mixed-citation>
</ref>
</ref-list>
<app-group>
<app id="APP1">
<title>Appendix</title>
<sec>
<title>Lemma 2</title>
<p>Given a thesaurus with
<italic>N</italic>
keywords, the probability that the same keyword belongs to two independent random samples of
<italic>k</italic>
keywords drawn from the thesaurus is negligible if
<italic>N</italic>
>>
<italic>k</italic>
.</p>
</sec>
<sec>
<title>Proof</title>
<p>Let us compute the probability that there is an empty intersection between two independent random samples of size
<italic>k</italic>
drawn from the thesaurus. This can be computed as the number of favourable cases divided by the number of possible cases – the number of favourable cases is the number of different second samples that can be drawn from the thesaurus once the
<italic>k</italic>
keywords in the first sample have been removed from it, and the number of possible cases is the number of possible second samples. Therefore:
<xref ref-type="fig" rid="F_2640330405022">Equation 22</xref>
Clearly, Expression (8) tends to 1 as
<italic>N</italic>
grows, which proves the Lemma. χ</p>
</sec>
<sec>
<title>Lemma 3</title>
<p>Let [
<italic>A</italic>
,
<italic>B</italic>
]⊆[0,1] and
<italic>k</italic>
be such that
<italic>kA</italic>
<1 and
<italic>kB</italic>
>1. Let
<italic>p</italic>
<sub>
<italic>i</italic>
</sub>
∈[
<italic>A</italic>
,
<italic>B</italic>
] for
<italic>i</italic>
=1,ċ,
<italic>k</italic>
be variables such that ∑
<sub>
<italic>i</italic>
=1</sub>
<sup>
<italic>k</italic>
</sup>
<italic>p</italic>
<sub>
<italic>i</italic>
</sub>
=1. Then the maximum of:
<xref ref-type="fig" rid="F_2640330405023">Equation 23</xref>
is reached for
<italic>p</italic>
<sub>1</sub>
=
<italic>Λ</italic>
=
<italic>p</italic>
<sub>
<italic>d</italic>
</sub>
=
<italic>B</italic>
,
<italic>p</italic>
<sub>
<italic>d</italic>
+1</sub>
=1−
<italic>dB</italic>
−(
<italic>k</italic>
<italic>d</italic>
−1)
<italic>A</italic>
and
<italic>p</italic>
<sub>
<italic>d</italic>
+2</sub>
=
<italic>Λ</italic>
=
<italic>p</italic>
<sub>
<italic>k</italic>
</sub>
=
<italic>A</italic>
or any permutation of those assignments, where
<italic>d</italic>
<
<italic>k</italic>
is the greatest integer such that (
<italic>k</italic>
<italic>d</italic>
)
<italic>A</italic>
+
<italic>dB</italic>
≦1.</p>
</sec>
<sec>
<title>Proof</title>
<p>We proceed by construction. A necessary condition to maximise Expression (9) is to have as many
<italic>p</italic>
<sub>
<italic>i</italic>
</sub>
's as possible set at their maximum value
<italic>B</italic>
. At most,
<italic>d</italic>
variables
<italic>p</italic>
<sub>
<italic>i</italic>
</sub>
can be set to
<italic>B</italic>
, where
<italic>d</italic>
is the greatest integer such that (
<italic>k</italic>
<italic>d</italic>
)
<italic>A</italic>
+
<italic>dB</italic>
≦1. (Note that, by construction, (
<italic>k</italic>
−1)
<italic>A</italic>
+
<italic>B</italic>
≦1, so
<italic>d</italic>
≥1).</p>
<p>Without loss of generality, let
<italic>p</italic>
<sub>1</sub>
=
<italic>p</italic>
<sub>2</sub>
=
<italic>Λ</italic>
=
<italic>p</italic>
<sub>
<italic>d</italic>
</sub>
=
<italic>B</italic>
. Now, there is a “mass” 1−
<italic>dB</italic>
with (
<italic>k</italic>
<italic>d</italic>
)
<italic>A</italic>
≦1−
<italic>dB</italic>
left to be distributed among
<italic>p</italic>
<sub>
<italic>d</italic>
+1</sub>
,
<italic>Λ</italic>
,
<italic>p</italic>
<sub>
<italic>k</italic>
</sub>
, with the constraint that
<italic>p</italic>
<sub>
<italic>j</italic>
</sub>
<italic>A</italic>
for
<italic>j</italic>
=
<italic>d</italic>
+1,
<italic>Λ</italic>
,
<italic>k</italic>
. The following cases appear:
<list list-type="bullet">
<list-item>
<label></label>
<p>
<sub>
<italic>k</italic>
=
<italic>d</italic>
+1</sub>
. In this case the maximum is
<italic>p</italic>
<sub>1</sub>
=
<italic>Λ</italic>
=
<italic>p</italic>
<sub>
<italic>d</italic>
</sub>
=
<italic>B</italic>
and
<italic>p</italic>
<sub>
<italic>k</italic>
</sub>
=1−
<italic>dB</italic>
. (Note that
<italic>p</italic>
<sub>
<italic>k</italic>
</sub>
=1−
<italic>dB</italic>
is within range. Indeed, for
<italic>k</italic>
=
<italic>d</italic>
+1, by construction
<italic>d</italic>
is such that
<italic>A</italic>
+
<italic>dB</italic>
≦1<(
<italic>d</italic>
+1)
<italic>B</italic>
, so
<italic>A</italic>
≦1−
<italic>dB</italic>
<
<italic>B</italic>
).</p>
</list-item>
<list-item>
<label></label>
<p>
<italic>k</italic>
=
<italic>d</italic>
+2. In this case, the maximum is reached for
<italic>p</italic>
<sub>1</sub>
=
<italic>Λ</italic>
=
<italic>p</italic>
<sub>
<italic>d</italic>
</sub>
=
<italic>B</italic>
and we have to decide about the values for
<italic>p</italic>
<sub>
<italic>d</italic>
+1</sub>
and
<italic>p</italic>
<sub>
<italic>d</italic>
+2</sub>
. Both probabilities should add 1−
<italic>dB</italic>
. So maximising their sum of squares can be regarded as maximising the following univariate function
<italic>F</italic>
(
<italic>p</italic>
<sub>
<italic>d</italic>
+1</sub>
<sup>2</sup>
)=
<italic>p</italic>
<sub>
<italic>d</italic>
+1</sub>
<sup>2</sup>
+(1−
<italic>dB</italic>
<italic>p</italic>
<sub>
<italic>d</italic>
+1</sub>
)
<sup>2</sup>
<italic>with</italic>
<italic>p</italic>
<sub>
<italic>d</italic>
+1</sub>
∈[
<italic>A</italic>
,1−
<italic>dB</italic>
<italic>A</italic>
]continue with list (Including maths) The above concave function is maximised at either extreme value of
<italic>p</italic>
<sub>
<italic>d</italic>
+1</sub>
<sup>2</sup>
. Without loss of generality, let us take
<italic>p</italic>
<sub>
<italic>d</italic>
+1</sub>
=1−
<italic>dB</italic>
<italic>A</italic>
and
<italic>p</italic>
<sub>
<italic>d</italic>
+2</sub>
=
<italic>A</italic>
. (Note that
<italic>p</italic>
<sub>
<italic>d</italic>
+1</sub>
=1−
<italic>dB</italic>
<italic>A</italic>
is within range. For
<italic>k</italic>
=
<italic>d</italic>
+2, by construction 2
<italic>A</italic>
+
<italic>dB</italic>
≦1<(
<italic>d</italic>
+1)
<italic>B</italic>
, so
<italic>A</italic>
≦1−
<italic>dB</italic>
<italic>A</italic>
<
<italic>B</italic>
<italic>A</italic>
.)</p>
</list-item>
<list-item>
<label></label>
<p>
<italic>k</italic>
>
<italic>d</italic>
+1. The maximum is reached for
<italic>p</italic>
<sub>1</sub>
=
<italic>Λ</italic>
=
<italic>p</italic>
<sub>
<italic>d</italic>
</sub>
=
<italic>B</italic>
we have to decide about the values for
<italic>p</italic>
<sub>
<italic>d</italic>
+1</sub>
,
<italic>p</italic>
<sub>
<italic>d</italic>
+2</sub>
,
<italic>Λ</italic>
,
<italic>p</italic>
<sub>
<italic>k</italic>
</sub>
. To maximise
<italic>p</italic>
<sub>
<italic>d</italic>
+1</sub>
<sup>2</sup>
+
<italic>Λ</italic>
+
<italic>p</italic>
<sub>
<italic>k</italic>
</sub>
<sup>2</sup>
, we have to concentrate as much mass as possible in any single probability and set the rest to their lowest possible value
<italic>A</italic>
. Without loss of generality,
<italic>p</italic>
<sub>
<italic>d</italic>
+1</sub>
=1−
<italic>dB</italic>
−(
<italic>k</italic>
<italic>d</italic>
−1)
<italic>A</italic>
and
<italic>p</italic>
<sub>
<italic>d</italic>
+2</sub>
=
<italic>Λ</italic>
=
<italic>p</italic>
<sub>
<italic>k</italic>
</sub>
=
<italic>A</italic>
. (Note
<italic>p</italic>
<sub>
<italic>d</italic>
+1</sub>
is within range. By construction (
<italic>k</italic>
<italic>d</italic>
)
<italic>A</italic>
+
<italic>dB</italic>
≦1<(
<italic>d</italic>
+1)
<italic>B</italic>
, so
<italic>A</italic>
≦1−
<italic>dB</italic>
−(
<italic>k</italic>
<italic>d</italic>
−1)
<italic>A</italic>
<
<italic>B</italic>
−(
<italic>k</italic>
<italic>d</italic>
−1)
<italic>A</italic>
.)</p>
</list-item>
</list>
</p>
</sec>
</app>
<app id="APP6">
<title>About the authors</title>
<p>Josep Domingo‐Ferrer is Professor of Computer Science at Rovira i Virgili University of Tarragona, Spain, where he holds the UNESCO Chair in Data Privacy. He received with honors his MSc and PhD degrees in Computer Science from the Autonomous University of Barcelona in 1988 and 1991 (Outstanding Graduation Award). He also holds a MSc in Mathematics. His fields of activity are data privacy, data security and cryptographic protocols. In 2003, he was a co‐recipient of a research prize from the Association of Telecom Engineers of Catalonia. In 2004, he got the TOYPS'2004 Award from the Junior Chambers of Catalonia. In 2009, he won the ICREA Academia Research Prize, awarded by the Government of Catalonia to the 40 leading Catalan faculty members in all areas of research. He has authored three patents and over 210 publications, one of which became an ISI highly‐cited paper in early 2005. He has been the co‐ordinator of EU FP5 project CO‐ORTHOGONAL and of several Spanish‐funded and US‐funded research projects. He currently coordinates the CONSOLIDER “ARES” team on security and privacy, one of Spain's 34 strongest research teams. He has chaired or co‐chaired nine international conferences and has served on the programme committee of over 65 conferences on privacy and security. He is a co‐Editor‐in‐Chief of
<italic>Transactions on Data Privacy</italic>
and he is an Associate Editor of three international journals. In 2004, he was a Visiting Fellow at Princeton University.</p>
<p>Agusti Solanas is a tenure‐track lecturer at the Rovira i Virgili University of Tarragona (URV), Spain. He received his BSc and MSc degrees in Computer Engineering from URV in 2002 and 2004, respectively, the latter with honours (Outstanding Graduation Award). He received a PhD in Telematics Engineering from the Technical University of Catalonia in 2007 with honours. His fields of activity are data privacy, data security, clustering, neural networks and evolutionary computation. He is a participant in the CONSOLIDER‐INGENIO 2007 project. He has participated in several Spanish‐funded and Catalan‐funded research projects. He has authored over 40 publications and he has delivered several talks. He has served as Chair, programme committee member and reviewer for several conferences and journals. Agusti Solanas is the corresponding author and can be contacted at: agusti.solanas@urv.cat</p>
<p>Jordi Castellà‐Roca is a tenure‐track lecturer at Rovira i Virgili University, Spain. He is currently a member of the UNESCO Chair in Data Privacy. He got his title of Engineer in Computer Systems from University of Lleida in 1998, the title of Engineer in Computer Science from Rovira i Virgili University in 2000 and his PhD in Computer Science from the Autonomous University of Barcelona in 2005. His research focuses on the fields of cryptography (cryptographic protocols) and privacy. He has published over 30 works in international journals, book chapters, and international and national congresses. He is on the advisory board of an international magazine, and he has been a member of the scientific and organising committees of several international congresses. He has participated in Spanish‐funded and Catalan‐funded research projects. He is the author of six patents, five of them international and in operation. He is a founding partner of three technology companies.</p>
</app>
</app-group>
</back>
</article>
</istex:document>
</istex:metadataXml>
<mods version="3.6">
<titleInfo lang="en">
<title>hkprivate information retrieval from privacyuncooperative queryable databases</title>
</titleInfo>
<titleInfo type="alternative" lang="en" contentType="CDATA">
<title>hkprivate information retrieval from privacyuncooperative queryable databases</title>
</titleInfo>
<name type="personal">
<namePart type="given">Josep</namePart>
<namePart type="family">DomingoFerrer</namePart>
<affiliation>Department of Computer Engineering and Mathematics, Rovira i Virgili University, Tarragona, Spain</affiliation>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Agusti</namePart>
<namePart type="family">Solanas</namePart>
<affiliation>Department of Computer Engineering and Mathematics, Rovira i Virgili University, Tarragona, Spain</affiliation>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jordi</namePart>
<namePart type="family">CastellRoca</namePart>
<affiliation>Department of Computer Engineering and Mathematics, Rovira i Virgili University, Tarragona, Spain</affiliation>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<typeOfResource>text</typeOfResource>
<genre type="research-article" displayLabel="research-article"></genre>
<originInfo>
<publisher>Emerald Group Publishing Limited</publisher>
<dateIssued encoding="w3cdtf">2009-08-07</dateIssued>
<copyrightDate encoding="w3cdtf">2009</copyrightDate>
</originInfo>
<language>
<languageTerm type="code" authority="iso639-2b">eng</languageTerm>
<languageTerm type="code" authority="rfc3066">en</languageTerm>
</language>
<physicalDescription>
<internetMediaType>text/html</internetMediaType>
</physicalDescription>
<abstract>Purpose This paper aims to address the privacy problem associated with the use of internet search engines. The purpose of the paper is to propose and validate a set of methods and protocols to guarantee the privacy of users' queries. Designmethodologyapproach In this paper hkprivate information retrieval hkPIR is defined as a practical compromise between computational efficiency and privacy. Also presented are hkPIR protocols that can be used to query any database, which does not even need to know that the user is trying to preserve his or her privacy. Findings The proposed methods are able to properly protect the privacy of users' queries. When internet users apply the protocols, search engines e.g. Google are not able to determine unequivocally the real interests of their users. The quality of the results does decrease with the increase in privacy, but the obtained tradeoff is excellent. Practical implications Current private information retrieval PIR protocols suffer from two significant shortcomings their computational complexity is On where n is the number of records in the database, which precludes their use for very large databases and web search engines and they assume that the database server cooperates in the PIR protocol, which prevents deployment in reallife uncooperative settings. The proposed protocols overcome both problems. Originalityvalue This is the first set of protocols that offer practical protection for the privacy of the queries that internet users submit to an internet search engine. The proposal has been implemented and it will be released to the general public soon. It will help to protect the right to privacy of millions of internet users.</abstract>
<subject>
<genre>keywords</genre>
<topic>Information retrieval</topic>
<topic>Search engines</topic>
<topic>Internet</topic>
</subject>
<relatedItem type="host">
<titleInfo>
<title>Online Information Review</title>
</titleInfo>
<genre type="journal">journal</genre>
<subject>
<genre>Emerald Subject Group</genre>
<topic authority="SubjectCodesPrimary" authorityURI="cat-IKM">Information & knowledge management</topic>
<topic authority="SubjectCodesSecondary" authorityURI="cat-ICT">Information & communications technology</topic>
<topic authority="SubjectCodesSecondary" authorityURI="cat-INT">Internet</topic>
</subject>
<subject>
<genre>Emerald Subject Group</genre>
<topic authority="SubjectCodesPrimary" authorityURI="cat-LISC">Library & information science</topic>
<topic authority="SubjectCodesSecondary" authorityURI="cat-CBM">Collection building & management</topic>
<topic authority="SubjectCodesSecondary" authorityURI="cat-IBRT">Information behaviour & retrieval</topic>
<topic authority="SubjectCodesSecondary" authorityURI="cat-RMP">Records management & preservation</topic>
<topic authority="SubjectCodesSecondary" authorityURI="cat-BIB">Bibliometrics</topic>
<topic authority="SubjectCodesSecondary" authorityURI="cat-DAT">Databases</topic>
<topic authority="SubjectCodesSecondary" authorityURI="cat-DOCM">Document management</topic>
</subject>
<identifier type="ISSN">1468-4527</identifier>
<identifier type="PublisherID">oir</identifier>
<identifier type="DOI">10.1108/oir</identifier>
<part>
<date>2009</date>
<detail type="volume">
<caption>vol.</caption>
<number>33</number>
</detail>
<detail type="issue">
<caption>no.</caption>
<number>4</number>
</detail>
<extent unit="pages">
<start>720</start>
<end>744</end>
</extent>
</part>
</relatedItem>
<identifier type="istex">D232AD0CEB55B23DA54A35A2A5E395717283CF0C</identifier>
<identifier type="DOI">10.1108/14684520910985693</identifier>
<identifier type="filenameID">2640330405</identifier>
<identifier type="original-pdf">2640330405.pdf</identifier>
<identifier type="href">14684520910985693.pdf</identifier>
<accessCondition type="use and reproduction" contentType="copyright">© Emerald Group Publishing Limited</accessCondition>
<recordInfo>
<recordContentSource>EMERALD</recordContentSource>
</recordInfo>
</mods>
</metadata>
<serie></serie>
</istex>
</record>

Pour manipuler ce document sous Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Istex/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000595 | SxmlIndent | more

Ou

HfdSelect -h $EXPLOR_AREA/Data/Istex/Corpus/biblio.hfd -nk 000595 | SxmlIndent | more

Pour mettre un lien sur cette page dans le réseau Wicri

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Istex
   |étape=   Corpus
   |type=    RBID
   |clé=     ISTEX:D232AD0CEB55B23DA54A35A2A5E395717283CF0C
   |texte=   hkprivate information retrieval from privacyuncooperative queryable databases
}}

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024