SGML exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Rapidly Retargetable Approaches to De-identification in Medical Records

Internal identifier: 000175 (Pmc/Corpus); previous: 000174; next: 000176

Authors: Ben Wellner; Matt Huyck; Scott Mardis; John Aberdeen; Alex Morgan; Leonid Peshkin; Alex Yeh; Janet Hitzeman; Lynette Hirschman

Source :

RBID : PMC:1975794

Abstract

Objective

This paper describes a successful approach to de-identification that was developed to participate in a recent AMIA-sponsored challenge evaluation.

Method

Our approach focused on rapid adaptation of existing toolkits for named entity recognition using two existing toolkits, Carafe and LingPipe.

Results

The “out of the box” Carafe system achieved a very good score (phrase F-measure of 0.9664) with only four hours of work to adapt it to the de-identification task. With further tuning, we were able to reduce the token-level error term by over 36% through task-specific feature engineering and the introduction of a lexicon, achieving a phrase F-measure of 0.9736.

Conclusions

We were able to achieve good performance on the de-identification task by the rapid retargeting of existing toolkits. For the Carafe system, we developed a method for tuning the balance of recall vs. precision, as well as a confidence score that correlated well with the measured F-score.
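
For readers unfamiliar with the metric, the following is a minimal sketch of how an F-measure is computed from precision and recall, and of how a relative error reduction can be derived from two F-scores. It assumes that "error" means 1 − F; the function names are illustrative and do not come from the paper. The abstract's 36% figure refers to token-level error, which cannot be reproduced from the phrase-level scores quoted here; applied to the two phrase F-measures, the same arithmetic gives a reduction of roughly 21%.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def relative_error_reduction(f_before, f_after):
    """Fraction of the error (assumed here to be 1 - F) removed by tuning."""
    error_before = 1.0 - f_before
    error_after = 1.0 - f_after
    return (error_before - error_after) / error_before


# Phrase-level F-measures quoted in the abstract.
baseline_f = 0.9664   # "out of the box" Carafe
tuned_f = 0.9736      # after feature engineering and a lexicon

reduction = relative_error_reduction(baseline_f, tuned_f)
print(f"phrase-level error reduction: {reduction:.1%}")
```

Raising `beta` above 1 weights recall more heavily than precision, which is one common way to express the recall/precision balance mentioned in the conclusions.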


Url:
DOI: 10.1197/jamia.M2435
PubMed: 17600096
PubMed Central: 1975794

Links to Exploration step

PMC:1975794

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Rapidly Retargetable Approaches to De-identification in Medical Records</title>
<author>
<name sortKey="Wellner, Ben" sort="Wellner, Ben" uniqKey="Wellner B" first="Ben" last="Wellner">Ben Wellner</name>
<affiliation>
<nlm:aff>NONE</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff>NONE</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Huyck, Matt" sort="Huyck, Matt" uniqKey="Huyck M" first="Matt" last="Huyck">Matt Huyck</name>
<affiliation>
<nlm:aff>NONE</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mardis, Scott" sort="Mardis, Scott" uniqKey="Mardis S" first="Scott" last="Mardis">Scott Mardis</name>
<affiliation>
<nlm:aff>NONE</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Aberdeen, John" sort="Aberdeen, John" uniqKey="Aberdeen J" first="John" last="Aberdeen">John Aberdeen</name>
<affiliation>
<nlm:aff>NONE</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Morgan, Alex" sort="Morgan, Alex" uniqKey="Morgan A" first="Alex" last="Morgan">Alex Morgan</name>
<affiliation>
<nlm:aff>NONE</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff>NONE</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Peshkin, Leonid" sort="Peshkin, Leonid" uniqKey="Peshkin L" first="Leonid" last="Peshkin">Leonid Peshkin</name>
<affiliation>
<nlm:aff>NONE</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Yeh, Alex" sort="Yeh, Alex" uniqKey="Yeh A" first="Alex" last="Yeh">Alex Yeh</name>
<affiliation>
<nlm:aff>NONE</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Hitzeman, Janet" sort="Hitzeman, Janet" uniqKey="Hitzeman J" first="Janet" last="Hitzeman">Janet Hitzeman</name>
<affiliation>
<nlm:aff>NONE</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Hirschman, Lynette" sort="Hirschman, Lynette" uniqKey="Hirschman L" first="Lynette" last="Hirschman">Lynette Hirschman</name>
<affiliation>
<nlm:aff>NONE</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">17600096</idno>
<idno type="pmc">1975794</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1975794</idno>
<idno type="RBID">PMC:1975794</idno>
<idno type="doi">10.1197/jamia.M2435</idno>
<date when="2007">2007</date>
<idno type="wicri:Area/Pmc/Corpus">000175</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000175</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Rapidly Retargetable Approaches to De-identification in Medical Records</title>
<author>
<name sortKey="Wellner, Ben" sort="Wellner, Ben" uniqKey="Wellner B" first="Ben" last="Wellner">Ben Wellner</name>
<affiliation>
<nlm:aff>NONE</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff>NONE</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Huyck, Matt" sort="Huyck, Matt" uniqKey="Huyck M" first="Matt" last="Huyck">Matt Huyck</name>
<affiliation>
<nlm:aff>NONE</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mardis, Scott" sort="Mardis, Scott" uniqKey="Mardis S" first="Scott" last="Mardis">Scott Mardis</name>
<affiliation>
<nlm:aff>NONE</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Aberdeen, John" sort="Aberdeen, John" uniqKey="Aberdeen J" first="John" last="Aberdeen">John Aberdeen</name>
<affiliation>
<nlm:aff>NONE</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Morgan, Alex" sort="Morgan, Alex" uniqKey="Morgan A" first="Alex" last="Morgan">Alex Morgan</name>
<affiliation>
<nlm:aff>NONE</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff>NONE</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Peshkin, Leonid" sort="Peshkin, Leonid" uniqKey="Peshkin L" first="Leonid" last="Peshkin">Leonid Peshkin</name>
<affiliation>
<nlm:aff>NONE</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Yeh, Alex" sort="Yeh, Alex" uniqKey="Yeh A" first="Alex" last="Yeh">Alex Yeh</name>
<affiliation>
<nlm:aff>NONE</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Hitzeman, Janet" sort="Hitzeman, Janet" uniqKey="Hitzeman J" first="Janet" last="Hitzeman">Janet Hitzeman</name>
<affiliation>
<nlm:aff>NONE</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Hirschman, Lynette" sort="Hirschman, Lynette" uniqKey="Hirschman L" first="Lynette" last="Hirschman">Lynette Hirschman</name>
<affiliation>
<nlm:aff>NONE</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Journal of the American Medical Informatics Association : JAMIA</title>
<idno type="ISSN">1067-5027</idno>
<idno type="eISSN">1527-974X</idno>
<imprint>
<date when="2007">2007</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Objective</title>
<p>This paper describes a successful approach to de-identification that was developed to participate in a recent AMIA-sponsored challenge evaluation.</p>
</sec>
<sec sec-type="methods">
<title>Method</title>
<p>Our approach focused on rapid adaptation of existing toolkits for named entity recognition using two existing toolkits, Carafe and LingPipe.</p>
</sec>
<sec>
<title>Results</title>
<p>The “out of the box” Carafe system achieved a very good score (phrase F-measure of 0.9664) with only four hours of work to adapt it to the de-identification task. With further tuning, we were able to reduce the token-level error term by over 36% through task-specific feature engineering and the introduction of a lexicon, achieving a phrase F-measure of 0.9736.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>We were able to achieve good performance on the de-identification task by the rapid retargeting of existing toolkits. For the Carafe system, we developed a method for tuning the balance of recall vs. precision, as well as a confidence score that correlated well with the measured F-score.</p>
</sec>
</div>
</front>
</TEI>
<pmc article-type="research-article">
<pmc-comment>The publisher of this article does not allow downloading of the full text in XML form.</pmc-comment>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">J Am Med Inform Assoc</journal-id>
<journal-id journal-id-type="publisher-id">jamia</journal-id>
<journal-title-group>
<journal-title>Journal of the American Medical Informatics Association : JAMIA</journal-title>
</journal-title-group>
<issn pub-type="ppub">1067-5027</issn>
<issn pub-type="epub">1527-974X</issn>
<publisher>
<publisher-name>American Medical Informatics Association</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">17600096</article-id>
<article-id pub-id-type="pmc">1975794</article-id>
<article-id pub-id-type="publisher-id">564</article-id>
<article-id pub-id-type="doi">10.1197/jamia.M2435</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Paper</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Rapidly Retargetable Approaches to De-identification in Medical Records</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Wellner</surname>
<given-names>Ben</given-names>
</name>
<xref ref-type="aff" rid="aff1"> a </xref>
<xref ref-type="aff" rid="aff3"> c </xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Huyck</surname>
<given-names>Matt</given-names>
</name>
<xref ref-type="aff" rid="aff2"> b </xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Mardis</surname>
<given-names>Scott</given-names>
</name>
<xref ref-type="aff" rid="aff1"> a </xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Aberdeen</surname>
<given-names>John</given-names>
</name>
<xref ref-type="aff" rid="aff1"> a </xref>
<xref ref-type="corresp" rid="cor1"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Morgan</surname>
<given-names>Alex</given-names>
</name>
<xref ref-type="aff" rid="aff1"> a </xref>
<xref ref-type="aff" rid="aff4"> d </xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Peshkin</surname>
<given-names>Leonid</given-names>
</name>
<xref ref-type="aff" rid="aff2"> b </xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Yeh</surname>
<given-names>Alex</given-names>
</name>
<xref ref-type="aff" rid="aff1"> a </xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Hitzeman</surname>
<given-names>Janet</given-names>
</name>
<xref ref-type="aff" rid="aff1"> a </xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Hirschman</surname>
<given-names>Lynette</given-names>
</name>
<xref ref-type="aff" rid="aff1"> a </xref>
</contrib>
</contrib-group>
<aff id="aff1">
<label>a</label>
The MITRE Corporation, Bedford, MA</aff>
<aff id="aff2">
<label>b</label>
Center for Biomedical Informatics, Harvard Medical School, Boston, MA</aff>
<aff id="aff3">
<label>c</label>
Department of Computer Science, Brandeis University, Waltham, MA</aff>
<aff id="aff4">
<label>d</label>
Stanford Biomedical Informatics, Palo Alto, CA.</aff>
<author-notes>
<corresp id="cor1">
<label></label>
Correspondence and reprints: John Aberdeen, 202 Burlington Road, Bedford, MA 01730 (Email:
<email>aberdeen@mitre.org</email>
).</corresp>
</author-notes>
<pub-date pub-type="ppub">
<season>Sep-Oct</season>
<year>2007</year>
</pub-date>
<volume>14</volume>
<issue>5</issue>
<fpage>564</fpage>
<lpage>573</lpage>
<history>
<date date-type="received">
<day>13</day>
<month>3</month>
<year>2007</year>
</date>
<date date-type="accepted">
<day>11</day>
<month>6</month>
<year>2007</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright © 2007, American Medical Informatics Association</copyright-statement>
<copyright-year>2007</copyright-year>
</permissions>
<abstract>
<sec>
<title>Objective</title>
<p>This paper describes a successful approach to de-identification that was developed to participate in a recent AMIA-sponsored challenge evaluation.</p>
</sec>
<sec sec-type="methods">
<title>Method</title>
<p>Our approach focused on rapid adaptation of existing toolkits for named entity recognition using two existing toolkits, Carafe and LingPipe.</p>
</sec>
<sec>
<title>Results</title>
<p>The “out of the box” Carafe system achieved a very good score (phrase F-measure of 0.9664) with only four hours of work to adapt it to the de-identification task. With further tuning, we were able to reduce the token-level error term by over 36% through task-specific feature engineering and the introduction of a lexicon, achieving a phrase F-measure of 0.9736.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>We were able to achieve good performance on the de-identification task by the rapid retargeting of existing toolkits. For the Carafe system, we developed a method for tuning the balance of recall vs. precision, as well as a confidence score that correlated well with the measured F-score.</p>
</sec>
</abstract>
</article-meta>
</front>
</pmc>
</record>
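
The identifiers stored in `<publicationStmt>` can be read with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` on a trimmed copy of that element. It is an illustration, not part of Dilib: the `wicri:`-prefixed entries are omitted because the record as shown does not declare the `wicri` and `nlm` namespaces, so a strict parser would likely reject the full record unless those declarations are supplied. The function name `extract_identifiers` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <publicationStmt> element from the record above.
# The wicri:-prefixed <idno> entries are left out (undeclared namespace prefix).
FRAGMENT = """
<publicationStmt>
  <idno type="pmid">17600096</idno>
  <idno type="pmc">1975794</idno>
  <idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1975794</idno>
  <idno type="RBID">PMC:1975794</idno>
  <idno type="doi">10.1197/jamia.M2435</idno>
  <date when="2007">2007</date>
</publicationStmt>
"""


def extract_identifiers(xml_text):
    """Map each <idno> type attribute to its text content."""
    root = ET.fromstring(xml_text.strip())
    return {
        idno.get("type"): (idno.text or "").strip()
        for idno in root.iter("idno")
    }


ids = extract_identifiers(FRAGMENT)
year = ET.fromstring(FRAGMENT.strip()).find("date").get("when")

for kind, value in ids.items():
    print(f"{kind}: {value}")
print(f"year: {year}")
```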

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Informatique/explor/SgmlV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000175 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000175 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Informatique
   |area=    SgmlV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:1975794
   |texte=   Rapidly Retargetable Approaches to De-identification in Medical Records
}}
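
When many records need such a link, the template text can be generated rather than typed. The following is a minimal sketch that reproduces the layout of the block above. The function name `explor_link` and its argument names are hypothetical; only the template name and its parameter keys (`wiki`, `area`, `flux`, `étape`, `type`, `clé`, `texte`) are taken from the example.

```python
def explor_link(wiki, area, flux, etape, type_, cle, texte):
    """Build an {{Explor lien}} template call with aligned values."""
    fields = [
        ("wiki", wiki),
        ("area", area),
        ("flux", flux),
        ("étape", etape),
        ("type", type_),
        ("clé", cle),
        ("texte", texte),
    ]
    lines = ["{{Explor lien"]
    for key, value in fields:
        # Pad "|key=" to 10 characters so the values line up as in the example.
        lines.append("   " + f"|{key}=".ljust(10) + value)
    lines.append("}}")
    return "\n".join(lines)


link = explor_link(
    wiki="Wicri/Informatique",
    area="SgmlV1",
    flux="Pmc",
    etape="Corpus",
    type_="RBID",
    cle="PMC:1975794",
    texte="Rapidly Retargetable Approaches to De-identification in Medical Records",
)
print(link)
```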

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:17600096" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a SgmlV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jul 1 14:26:08 2019. Site generation: Wed Apr 28 21:40:44 2021