TEPAPA: a novel in silico feature learning pipeline for mining prognostic and associative factors from text-based electronic medical records
Internal identifier: 007123 (Ncbi/Checkpoint); previous: 007122; next: 007124
Authors: Frank Po-Yen Lin; Adrian Pokorny; Christina Teng; Richard J. Epstein
Source:
- Scientific Reports [ 2045-2322 ]; 2017.
Abstract
Vast amounts of clinically relevant text-based variables lie undiscovered and unexploited in electronic medical records (EMR). To exploit this untapped resource, and thus facilitate the discovery of informative covariates from unstructured clinical narratives, we have built a novel computational pipeline termed Text-based Exploratory Pattern Analyser for Prognosticator and Associator discovery (TEPAPA).
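The full abstract in the XML record describes TEPAPA as combining regular-expression patterns with statistical association testing against an outcome variable. The following is a minimal sketch of that last idea only, not the authors' implementation: it assumes a two-sided Fisher exact test (the record does not say which test TEPAPA uses), and the function names `fisher_exact_two_sided` and `pattern_association`, as well as the toy notes, are invented for illustration.

```python
import re
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x outcome-positive notes among pattern hits.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-9))


def pattern_association(pattern, notes, outcomes):
    """Cross-tabulate regex presence against a binary outcome and test it.

    Returns ((a, b, c, d), p) where a = hit and positive, b = hit and negative,
    c = no hit and positive, d = no hit and negative.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    a = b = c = d = 0
    for text, positive in zip(notes, outcomes):
        hit = regex.search(text) is not None
        if hit and positive:
            a += 1
        elif hit:
            b += 1
        elif positive:
            c += 1
        else:
            d += 1
    return (a, b, c, d), fisher_exact_two_sided(a, b, c, d)


if __name__ == "__main__":
    # Invented toy notes and outcome labels, purely for illustration.
    notes = ["left tonsil SCC", "tonsillar primary", "larynx SCC", "glottic tumour"]
    outcomes = [True, True, False, False]
    print(pattern_association(r"tonsil", notes, outcomes))
```

In a real setting many candidate patterns would be tested at once, so some correction for multiple comparisons would be needed; that step is left out of this sketch.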
DOI: 10.1038/s41598-017-07111-0
PubMed: 28761061
PubMed Central: 5537364
Links toward previous steps (curation, corpus...)
- to stream Pmc, to step Corpus: 000D66
- to stream Pmc, to step Curation: 000D66
- to stream Pmc, to step Checkpoint: 000044
- to stream Ncbi, to step Merge: 007123
- to stream Ncbi, to step Curation: 007123
Links to Exploration step
PMC:5537364
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">TEPAPA: a novel in silico feature learning pipeline for mining prognostic and associative factors from text-based electronic medical records</title>
<author><name sortKey="Lin, Frank Po Yen" sort="Lin, Frank Po Yen" uniqKey="Lin F" first="Frank Po-Yen" last="Lin">Frank Po-Yen Lin</name>
<affiliation><nlm:aff id="Aff1"><institution-wrap><institution-id institution-id-type="ISNI">0000 0000 9119 2677</institution-id>
<institution-id institution-id-type="GRID">grid.437825.f</institution-id>
<institution>Department of Oncology,</institution>
<institution>St Vincent’s Hospital &amp; The Kinghorn Cancer Centre,</institution>
</institution-wrap>
Darlinghurst, NSW Australia</nlm:aff>
<wicri:noCountry code="subfield">NSW Australia</wicri:noCountry>
</affiliation>
<affiliation><nlm:aff id="Aff2"><institution-wrap><institution-id institution-id-type="ISNI">0000 0000 9983 6924</institution-id>
<institution-id institution-id-type="GRID">grid.415306.5</institution-id>
<institution>Garvan Institute of Medical Research,</institution>
</institution-wrap>
Darlinghurst, NSW Australia</nlm:aff>
<wicri:noCountry code="subfield">NSW Australia</wicri:noCountry>
</affiliation>
</author>
<author><name sortKey="Pokorny, Adrian" sort="Pokorny, Adrian" uniqKey="Pokorny A" first="Adrian" last="Pokorny">Adrian Pokorny</name>
<affiliation><nlm:aff id="Aff1"><institution-wrap><institution-id institution-id-type="ISNI">0000 0000 9119 2677</institution-id>
<institution-id institution-id-type="GRID">grid.437825.f</institution-id>
<institution>Department of Oncology,</institution>
<institution>St Vincent’s Hospital &amp; The Kinghorn Cancer Centre,</institution>
</institution-wrap>
Darlinghurst, NSW Australia</nlm:aff>
<wicri:noCountry code="subfield">NSW Australia</wicri:noCountry>
</affiliation>
</author>
<author><name sortKey="Teng, Christina" sort="Teng, Christina" uniqKey="Teng C" first="Christina" last="Teng">Christina Teng</name>
<affiliation><nlm:aff id="Aff3"><institution-wrap><institution-id institution-id-type="ISNI">0000 0004 0527 9653</institution-id>
<institution-id institution-id-type="GRID">grid.415994.4</institution-id>
<institution>Department of Medical Oncology,</institution>
<institution>Liverpool Hospital,</institution>
</institution-wrap>
Liverpool, Sydney, NSW Australia</nlm:aff>
<wicri:noCountry code="subfield">NSW Australia</wicri:noCountry>
</affiliation>
</author>
<author><name sortKey="Epstein, Richard J" sort="Epstein, Richard J" uniqKey="Epstein R" first="Richard J." last="Epstein">Richard J. Epstein</name>
<affiliation><nlm:aff id="Aff1"><institution-wrap><institution-id institution-id-type="ISNI">0000 0000 9119 2677</institution-id>
<institution-id institution-id-type="GRID">grid.437825.f</institution-id>
<institution>Department of Oncology,</institution>
<institution>St Vincent’s Hospital &amp; The Kinghorn Cancer Centre,</institution>
</institution-wrap>
Darlinghurst, NSW Australia</nlm:aff>
<wicri:noCountry code="subfield">NSW Australia</wicri:noCountry>
</affiliation>
<affiliation><nlm:aff id="Aff2"><institution-wrap><institution-id institution-id-type="ISNI">0000 0000 9983 6924</institution-id>
<institution-id institution-id-type="GRID">grid.415306.5</institution-id>
<institution>Garvan Institute of Medical Research,</institution>
</institution-wrap>
Darlinghurst, NSW Australia</nlm:aff>
<wicri:noCountry code="subfield">NSW Australia</wicri:noCountry>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">28761061</idno>
<idno type="pmc">5537364</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5537364</idno>
<idno type="RBID">PMC:5537364</idno>
<idno type="doi">10.1038/s41598-017-07111-0</idno>
<date when="2017">2017</date>
<idno type="wicri:Area/Pmc/Corpus">000D66</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000D66</idno>
<idno type="wicri:Area/Pmc/Curation">000D66</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000D66</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000044</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">000044</idno>
<idno type="wicri:Area/Ncbi/Merge">007123</idno>
<idno type="wicri:Area/Ncbi/Curation">007123</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">007123</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">TEPAPA: a novel in silico feature learning pipeline for mining prognostic and associative factors from text-based electronic medical records</title>
<author><name sortKey="Lin, Frank Po Yen" sort="Lin, Frank Po Yen" uniqKey="Lin F" first="Frank Po-Yen" last="Lin">Frank Po-Yen Lin</name>
<affiliation><nlm:aff id="Aff1"><institution-wrap><institution-id institution-id-type="ISNI">0000 0000 9119 2677</institution-id>
<institution-id institution-id-type="GRID">grid.437825.f</institution-id>
<institution>Department of Oncology,</institution>
<institution>St Vincent’s Hospital &amp; The Kinghorn Cancer Centre,</institution>
</institution-wrap>
Darlinghurst, NSW Australia</nlm:aff>
<wicri:noCountry code="subfield">NSW Australia</wicri:noCountry>
</affiliation>
<affiliation><nlm:aff id="Aff2"><institution-wrap><institution-id institution-id-type="ISNI">0000 0000 9983 6924</institution-id>
<institution-id institution-id-type="GRID">grid.415306.5</institution-id>
<institution>Garvan Institute of Medical Research,</institution>
</institution-wrap>
Darlinghurst, NSW Australia</nlm:aff>
<wicri:noCountry code="subfield">NSW Australia</wicri:noCountry>
</affiliation>
</author>
<author><name sortKey="Pokorny, Adrian" sort="Pokorny, Adrian" uniqKey="Pokorny A" first="Adrian" last="Pokorny">Adrian Pokorny</name>
<affiliation><nlm:aff id="Aff1"><institution-wrap><institution-id institution-id-type="ISNI">0000 0000 9119 2677</institution-id>
<institution-id institution-id-type="GRID">grid.437825.f</institution-id>
<institution>Department of Oncology,</institution>
<institution>St Vincent’s Hospital &amp; The Kinghorn Cancer Centre,</institution>
</institution-wrap>
Darlinghurst, NSW Australia</nlm:aff>
<wicri:noCountry code="subfield">NSW Australia</wicri:noCountry>
</affiliation>
</author>
<author><name sortKey="Teng, Christina" sort="Teng, Christina" uniqKey="Teng C" first="Christina" last="Teng">Christina Teng</name>
<affiliation><nlm:aff id="Aff3"><institution-wrap><institution-id institution-id-type="ISNI">0000 0004 0527 9653</institution-id>
<institution-id institution-id-type="GRID">grid.415994.4</institution-id>
<institution>Department of Medical Oncology,</institution>
<institution>Liverpool Hospital,</institution>
</institution-wrap>
Liverpool, Sydney, NSW Australia</nlm:aff>
<wicri:noCountry code="subfield">NSW Australia</wicri:noCountry>
</affiliation>
</author>
<author><name sortKey="Epstein, Richard J" sort="Epstein, Richard J" uniqKey="Epstein R" first="Richard J." last="Epstein">Richard J. Epstein</name>
<affiliation><nlm:aff id="Aff1"><institution-wrap><institution-id institution-id-type="ISNI">0000 0000 9119 2677</institution-id>
<institution-id institution-id-type="GRID">grid.437825.f</institution-id>
<institution>Department of Oncology,</institution>
<institution>St Vincent’s Hospital &amp; The Kinghorn Cancer Centre,</institution>
</institution-wrap>
Darlinghurst, NSW Australia</nlm:aff>
<wicri:noCountry code="subfield">NSW Australia</wicri:noCountry>
</affiliation>
<affiliation><nlm:aff id="Aff2"><institution-wrap><institution-id institution-id-type="ISNI">0000 0000 9983 6924</institution-id>
<institution-id institution-id-type="GRID">grid.415306.5</institution-id>
<institution>Garvan Institute of Medical Research,</institution>
</institution-wrap>
Darlinghurst, NSW Australia</nlm:aff>
<wicri:noCountry code="subfield">NSW Australia</wicri:noCountry>
</affiliation>
</author>
</analytic>
<series><title level="j">Scientific Reports</title>
<idno type="eISSN">2045-2322</idno>
<imprint><date when="2017">2017</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><p id="Par1">Vast amounts of clinically relevant text-based variables lie undiscovered and unexploited in electronic medical records (EMR). To exploit this untapped resource, and thus facilitate the discovery of informative covariates from unstructured clinical narratives, we have built a novel computational pipeline termed <italic>T</italic>ext-based <italic>E</italic>xploratory <italic>P</italic>attern <italic>A</italic>nalyser for <italic>P</italic>rognosticator and <italic>A</italic>ssociator discovery (TEPAPA). This pipeline combines semantic-free natural language processing (NLP), regular expression induction, and statistical association testing to identify conserved text patterns associated with outcome variables of clinical interest. When we applied TEPAPA to a cohort of head and neck squamous cell carcinoma patients, plausible concepts known to be correlated with human papilloma virus (HPV) status were identified from the EMR text, including site of primary disease, tumour stage, pathologic characteristics, and treatment modalities. Similarly, correlates of other variables (including gender, nodal status, recurrent disease, smoking and alcohol status) were also reliably recovered. Using highly-associated patterns as covariates, a patient’s HPV status was classifiable using a bootstrap analysis with a mean area under the ROC curve of 0.861, suggesting its predictive utility in supporting EMR-based phenotyping tasks. These data support using this integrative approach to efficiently identify disease-associated factors from unstructured EMR narratives, and thus to efficiently generate testable hypotheses.</p>
</div>
</front>
</TEI>
<affiliations><list></list>
<tree><noCountry><name sortKey="Epstein, Richard J" sort="Epstein, Richard J" uniqKey="Epstein R" first="Richard J." last="Epstein">Richard J. Epstein</name>
<name sortKey="Lin, Frank Po Yen" sort="Lin, Frank Po Yen" uniqKey="Lin F" first="Frank Po-Yen" last="Lin">Frank Po-Yen Lin</name>
<name sortKey="Pokorny, Adrian" sort="Pokorny, Adrian" uniqKey="Pokorny A" first="Adrian" last="Pokorny">Adrian Pokorny</name>
<name sortKey="Teng, Christina" sort="Teng, Christina" uniqKey="Teng C" first="Christina" last="Teng">Christina Teng</name>
</noCountry>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Santé/explor/EdenteV2/Data/Ncbi/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 007123 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Ncbi/Checkpoint/biblio.hfd -nk 007123 | SxmlIndent | more
To add a link to this page in the Wicri network
{{Explor lien |wiki= Wicri/Santé |area= EdenteV2 |flux= Ncbi |étape= Checkpoint |type= RBID |clé= PMC:5537364 |texte= TEPAPA: a novel in silico feature learning pipeline for mining prognostic and associative factors from text-based electronic medical records }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/Ncbi/Checkpoint/RBID.i -Sk "pubmed:28761061" \
  | HfdSelect -Kh $EXPLOR_AREA/Data/Ncbi/Checkpoint/biblio.hfd \
  | NlmPubMed2Wicri -a EdenteV2
This area was generated with Dilib version V0.6.32.