OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Composite Bloom Filters for Secure Record Linkage

Internal identifier: 000146 (Pmc/Curation); previous: 000145; next: 000147

Authors: Elizabeth Ashley Durham; Murat Kantarcioglu; Yuan Xue; Csaba Toth; Mehmet Kuzu; Bradley Malin

Source:

RBID : PMC:4269299

Abstract

The process of record linkage seeks to integrate instances that correspond to the same entity. Record linkage has traditionally been performed through the comparison of identifying field values (e.g., Surname), however, when databases are maintained by disparate organizations, the disclosure of such information can breach the privacy of the corresponding individuals. Various private record linkage (PRL) methods have been developed to obscure such identifiers, but they vary widely in their ability to balance competing goals of accuracy, efficiency and security. The tokenization and hashing of field values into Bloom filters (BF) enables greater linkage accuracy and efficiency than other PRL methods, but the encodings may be compromised through frequency-based cryptanalysis. Our objective is to adapt a BF encoding technique to mitigate such attacks with minimal sacrifices in accuracy and efficiency. To accomplish these goals, we introduce a statistically-informed method to generate BF encodings that integrate bits from multiple fields, the frequencies of which are provably associated with a minimum number of fields. Our method enables a user-specified tradeoff between security and accuracy. We compare our encoding method with other techniques using a public dataset of voter registration records and demonstrate that the increases in security come with only minor losses to accuracy.
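
The tokenization-and-hashing idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the padded-bigram tokenizer, the SHA-256-based hash family, the parameters `m` and `k`, and the per-field sample sizes are all assumptions chosen for illustration. In particular, the paper's statistically-informed choice of how many bits each field contributes is not reproduced here; `sample_sizes` is simply supplied by hand.

```python
import hashlib
import random


def bigrams(value):
    """Tokenize a field value into padded, lower-cased character bigrams."""
    padded = f"_{value.lower()}_"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def bloom_encode(value, m=256, k=4):
    """Encode one field value as an m-bit Bloom filter (list of 0/1).

    Each bigram is hashed k times; the k hash functions are simulated by
    prefixing the token with a seed before applying SHA-256 (an assumption).
    """
    bits = [0] * m
    for token in bigrams(value):
        for seed in range(k):
            digest = hashlib.sha256(f"{seed}:{token}".encode("utf-8")).digest()
            bits[int.from_bytes(digest[:8], "big") % m] = 1
    return bits


def dice(a, b):
    """Dice coefficient between two equal-length bit lists."""
    common = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2 * common / total if total else 1.0


def composite_encode(record, sample_sizes, m=256, k=4, seed=0):
    """Build a record-level encoding from bits sampled out of each field filter.

    sample_sizes maps field name -> number of bits to draw (hypothetical,
    hand-picked values). The shared seed makes every record use the same
    sampled positions and the same final permutation.
    """
    rng = random.Random(seed)
    out = []
    for field, n in sample_sizes.items():
        field_bits = bloom_encode(record[field], m, k)
        positions = [rng.randrange(m) for _ in range(n)]
        out.extend(field_bits[p] for p in positions)
    rng.shuffle(out)  # permute so bit positions no longer reveal their field
    return out


if __name__ == "__main__":
    sizes = {"surname": 120, "city": 80}
    r1 = {"surname": "Smith", "city": "Nashville"}
    r2 = {"surname": "Smyth", "city": "Nashville"}
    print(dice(composite_encode(r1, sizes), composite_encode(r2, sizes)))
```

Because similar strings share most of their bigrams, their filters share most of their set bits, so a similarity measure such as the Dice coefficient tolerates small spelling differences. Mixing and permuting bits from several fields is meant to make the frequency of any single bit harder to attribute to one field.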


Url:
DOI: 10.1109/TKDE.2013.91
PubMed: 25530689
PubMed Central: 4269299

Links to Exploration step

PMC:4269299

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Composite Bloom Filters for Secure Record Linkage</title>
<author>
<name sortKey="Durham, Elizabeth Ashley" sort="Durham, Elizabeth Ashley" uniqKey="Durham E" first="Elizabeth Ashley" last="Durham">Elizabeth Ashley Durham</name>
</author>
<author>
<name sortKey="Kantarcioglu, Murat" sort="Kantarcioglu, Murat" uniqKey="Kantarcioglu M" first="Murat" last="Kantarcioglu">Murat Kantarcioglu</name>
</author>
<author>
<name sortKey="Xue, Yuan" sort="Xue, Yuan" uniqKey="Xue Y" first="Yuan" last="Xue">Yuan Xue</name>
</author>
<author>
<name sortKey="Toth, Csaba" sort="Toth, Csaba" uniqKey="Toth C" first="Csaba" last="Toth">Csaba Toth</name>
</author>
<author>
<name sortKey="Kuzu, Mehmet" sort="Kuzu, Mehmet" uniqKey="Kuzu M" first="Mehmet" last="Kuzu">Mehmet Kuzu</name>
</author>
<author>
<name sortKey="Malin, Bradley" sort="Malin, Bradley" uniqKey="Malin B" first="Bradley" last="Malin">Bradley Malin</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">25530689</idno>
<idno type="pmc">4269299</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4269299</idno>
<idno type="RBID">PMC:4269299</idno>
<idno type="doi">10.1109/TKDE.2013.91</idno>
<date when="2014">2014</date>
<idno type="wicri:Area/Pmc/Corpus">000146</idno>
<idno type="wicri:Area/Pmc/Curation">000146</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Composite Bloom Filters for Secure Record Linkage</title>
<author>
<name sortKey="Durham, Elizabeth Ashley" sort="Durham, Elizabeth Ashley" uniqKey="Durham E" first="Elizabeth Ashley" last="Durham">Elizabeth Ashley Durham</name>
</author>
<author>
<name sortKey="Kantarcioglu, Murat" sort="Kantarcioglu, Murat" uniqKey="Kantarcioglu M" first="Murat" last="Kantarcioglu">Murat Kantarcioglu</name>
</author>
<author>
<name sortKey="Xue, Yuan" sort="Xue, Yuan" uniqKey="Xue Y" first="Yuan" last="Xue">Yuan Xue</name>
</author>
<author>
<name sortKey="Toth, Csaba" sort="Toth, Csaba" uniqKey="Toth C" first="Csaba" last="Toth">Csaba Toth</name>
</author>
<author>
<name sortKey="Kuzu, Mehmet" sort="Kuzu, Mehmet" uniqKey="Kuzu M" first="Mehmet" last="Kuzu">Mehmet Kuzu</name>
</author>
<author>
<name sortKey="Malin, Bradley" sort="Malin, Bradley" uniqKey="Malin B" first="Bradley" last="Malin">Bradley Malin</name>
</author>
</analytic>
<series>
<title level="j">IEEE transactions on knowledge and data engineering</title>
<idno type="ISSN">1041-4347</idno>
<idno type="eISSN">1558-2191</idno>
<imprint>
<date when="2014">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p id="P1">The process of record linkage seeks to integrate instances that correspond to the same entity. Record linkage has traditionally been performed through the comparison of identifying field values (
<italic>e.g., Surname</italic>
), however, when databases are maintained by disparate organizations, the disclosure of such information can breach the privacy of the corresponding individuals. Various private record linkage (PRL) methods have been developed to obscure such identifiers, but they vary widely in their ability to balance competing goals of accuracy, efficiency and security. The tokenization and hashing of field values into Bloom filters (BF) enables greater linkage accuracy and efficiency than other PRL methods, but the encodings may be compromised through frequency-based cryptanalysis. Our objective is to adapt a BF encoding technique to mitigate such attacks with minimal sacrifices in accuracy and efficiency. To accomplish these goals, we introduce a statistically-informed method to generate BF encodings that integrate bits from multiple fields, the frequencies of which are provably associated with a minimum number of fields. Our method enables a user-specified tradeoff between security and accuracy. We compare our encoding method with other techniques using a public dataset of voter registration records and demonstrate that the increases in security come with only minor losses to accuracy.</p>
</div>
</front>
</TEI>
<pmc article-type="research-article">
<pmc-comment>The publisher of this article does not allow downloading of the full text in XML form.</pmc-comment>
<pmc-dir>properties manuscript</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-journal-id">9887654</journal-id>
<journal-id journal-id-type="pubmed-jr-id">36377</journal-id>
<journal-id journal-id-type="nlm-ta">IEEE Trans Knowl Data Eng</journal-id>
<journal-id journal-id-type="iso-abbrev">IEEE Trans Knowl Data Eng</journal-id>
<journal-title-group>
<journal-title>IEEE transactions on knowledge and data engineering</journal-title>
</journal-title-group>
<issn pub-type="ppub">1041-4347</issn>
<issn pub-type="epub">1558-2191</issn>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">25530689</article-id>
<article-id pub-id-type="pmc">4269299</article-id>
<article-id pub-id-type="doi">10.1109/TKDE.2013.91</article-id>
<article-id pub-id-type="manuscript">NIHMS506202</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Composite Bloom Filters for Secure Record Linkage</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Durham</surname>
<given-names>Elizabeth Ashley</given-names>
</name>
<aff id="A1">Dept. of Biomedical Informatics, Vanderbilt University, Nashville, TN 37232.
<email>ea.durham@vanderbilt.edu</email>
</aff>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Kantarcioglu</surname>
<given-names>Murat</given-names>
</name>
<role>Senior Member, IEEE</role>
<aff id="A2">Department of Computer Science, University of Texas at Dallas, Richardson, TX, 75083.
<email>muratk@utdallas.edu</email>
</aff>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Xue</surname>
<given-names>Yuan</given-names>
</name>
<role>Member, IEEE</role>
<aff id="A3">Dept. of Electrical Engineering &amp; Computer Science, Vanderbilt University, Nashville, TN 37232.
<email>yuan.xue@vanderbilt.edu</email>
</aff>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Toth</surname>
<given-names>Csaba</given-names>
</name>
<aff id="A4">Dept. of Biomedical Informatics, Vanderbilt University, Nashville, TN 37232.
<email>csaba.toth@vanderbilt.edu</email>
</aff>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Kuzu</surname>
<given-names>Mehmet</given-names>
</name>
<aff id="A5">Dept. of Computer Science, University of Texas at Dallas, Richardson, TX, 75083.
<email>mehmet.kuzu@utdallas.edu</email>
</aff>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Malin</surname>
<given-names>Bradley</given-names>
</name>
<role>Member, IEEE</role>
<aff id="A6">Depts. of Biomedical Informatics and EECS, Vanderbilt University, Nashville, TN 37232.
<email>b.malin@vanderbilt.edu</email>
</aff>
</contrib>
</contrib-group>
<pub-date pub-type="nihms-submitted">
<day>5</day>
<month>9</month>
<year>2014</year>
</pub-date>
<pub-date pub-type="ppub">
<month>12</month>
<year>2014</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>01</day>
<month>12</month>
<year>2015</year>
</pub-date>
<volume>26</volume>
<issue>12</issue>
<fpage>2956</fpage>
<lpage>2968</lpage>
<pmc-comment>elocation-id from pubmed: 10.1109/TKDE.2013.91</pmc-comment>
<permissions>
<copyright-statement>© 2013 IEEE</copyright-statement>
<copyright-year>2013</copyright-year>
</permissions>
<abstract>
<p id="P1">The process of record linkage seeks to integrate instances that correspond to the same entity. Record linkage has traditionally been performed through the comparison of identifying field values (
<italic>e.g., Surname</italic>
), however, when databases are maintained by disparate organizations, the disclosure of such information can breach the privacy of the corresponding individuals. Various private record linkage (PRL) methods have been developed to obscure such identifiers, but they vary widely in their ability to balance competing goals of accuracy, efficiency and security. The tokenization and hashing of field values into Bloom filters (BF) enables greater linkage accuracy and efficiency than other PRL methods, but the encodings may be compromised through frequency-based cryptanalysis. Our objective is to adapt a BF encoding technique to mitigate such attacks with minimal sacrifices in accuracy and efficiency. To accomplish these goals, we introduce a statistically-informed method to generate BF encodings that integrate bits from multiple fields, the frequencies of which are provably associated with a minimum number of fields. Our method enables a user-specified tradeoff between security and accuracy. We compare our encoding method with other techniques using a public dataset of voter registration records and demonstrate that the increases in security come with only minor losses to accuracy.</p>
</abstract>
<kwd-group>
<kwd>data matching</kwd>
<kwd>record linkage</kwd>
<kwd>entity resolution</kwd>
<kwd>privacy</kwd>
<kwd>security</kwd>
<kwd>Bloom filter</kwd>
</kwd-group>
</article-meta>
</front>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Pmc/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000146 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd -nk 000146 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Pmc
   |étape=   Curation
   |type=    RBID
   |clé=     PMC:4269299
   |texte=   Composite Bloom Filters for Secure Record Linkage
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Curation/RBID.i   -Sk "pubmed:25530689" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a OcrV1 

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024