Exploration server on OCR

Warning: this site is under development!
Warning: this site was generated by computer from raw corpora.
The information has therefore not been validated.

The Temple University Hospital EEG Data Corpus

Internal identifier: 000293 (Ncbi/Merge); previous: 000292

Authors: Iyad Obeid; Joseph Picone

Source:

RBID : PMC:4865520
Url:
DOI: 10.3389/fnins.2016.00196
PubMed: NONE
PubMed Central: 4865520

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">The Temple University Hospital EEG Data Corpus</title>
<author>
<name sortKey="Obeid, Iyad" sort="Obeid, Iyad" uniqKey="Obeid I" first="Iyad" last="Obeid">Iyad Obeid</name>
</author>
<author>
<name sortKey="Picone, Joseph" sort="Picone, Joseph" uniqKey="Picone J" first="Joseph" last="Picone">Joseph Picone</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmc">4865520</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4865520</idno>
<idno type="RBID">PMC:4865520</idno>
<idno type="doi">10.3389/fnins.2016.00196</idno>
<idno type="pmid">NONE</idno>
<date when="2016">2016</date>
<idno type="wicri:Area/Pmc/Corpus">000021</idno>
<idno type="wicri:Area/Pmc/Curation">000021</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000001</idno>
<idno type="wicri:Area/Ncbi/Merge">000293</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">The Temple University Hospital EEG Data Corpus</title>
<author>
<name sortKey="Obeid, Iyad" sort="Obeid, Iyad" uniqKey="Obeid I" first="Iyad" last="Obeid">Iyad Obeid</name>
</author>
<author>
<name sortKey="Picone, Joseph" sort="Picone, Joseph" uniqKey="Picone J" first="Joseph" last="Picone">Joseph Picone</name>
</author>
</analytic>
<series>
<title level="j">Frontiers in Neuroscience</title>
<idno type="ISSN">1662-4548</idno>
<idno type="eISSN">1662-453X</idno>
<imprint>
<date when="2016">2016</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Alotaiby, T N" uniqKey="Alotaiby T">T. N. Alotaiby</name>
</author>
<author>
<name sortKey="Alshebeili, S A" uniqKey="Alshebeili S">S. A. Alshebeili</name>
</author>
<author>
<name sortKey="Alshawi, T" uniqKey="Alshawi T">T. Alshawi</name>
</author>
<author>
<name sortKey="Ahmad, I" uniqKey="Ahmad I">I. Ahmad</name>
</author>
<author>
<name sortKey="Abd El Samie, F E" uniqKey="Abd El Samie F">F. E. Abd El-Samie</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Goldberger, A L" uniqKey="Goldberger A">A. L. Goldberger</name>
</author>
<author>
<name sortKey="Amaral, L A" uniqKey="Amaral L">L. A. Amaral</name>
</author>
<author>
<name sortKey="Glass, L" uniqKey="Glass L">L. Glass</name>
</author>
<author>
<name sortKey="Hausdorff, J M" uniqKey="Hausdorff J">J. M. Hausdorff</name>
</author>
<author>
<name sortKey="Ivanov, P C" uniqKey="Ivanov P">P. C. Ivanov</name>
</author>
<author>
<name sortKey="Mark, R G" uniqKey="Mark R">R. G. Mark</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gotman, J" uniqKey="Gotman J">J. Gotman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lebedev, M" uniqKey="Lebedev M">M. Lebedev</name>
</author>
<author>
<name sortKey="Nicolelis, M" uniqKey="Nicolelis M">M. Nicolelis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liao, L D" uniqKey="Liao L">L. D. Liao</name>
</author>
<author>
<name sortKey="Lin, C T" uniqKey="Lin C">C. T. Lin</name>
</author>
<author>
<name sortKey="Mcdowell, K" uniqKey="Mcdowell K">K. McDowell</name>
</author>
<author>
<name sortKey="Wickenden, A E" uniqKey="Wickenden A">A. E. Wickenden</name>
</author>
<author>
<name sortKey="Gramann, K" uniqKey="Gramann K">K. Gramann</name>
</author>
<author>
<name sortKey="Jung, T P" uniqKey="Jung T">T. P. Jung</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mckee, A C" uniqKey="Mckee A">A. C. McKee</name>
</author>
<author>
<name sortKey="Cantu, R C" uniqKey="Cantu R">R. C. Cantu</name>
</author>
<author>
<name sortKey="Nowinski, C J" uniqKey="Nowinski C">C. J. Nowinski</name>
</author>
<author>
<name sortKey="Hedley Whyte, E T" uniqKey="Hedley Whyte E">E. T. Hedley-Whyte</name>
</author>
<author>
<name sortKey="Gavett, B E" uniqKey="Gavett B">B. E. Gavett</name>
</author>
<author>
<name sortKey="Budson, A E" uniqKey="Budson A">A. E. Budson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Picone, J" uniqKey="Picone J">J. Picone</name>
</author>
<author>
<name sortKey="Obeid, I" uniqKey="Obeid I">I. Obeid</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ramgopal, S" uniqKey="Ramgopal S">S. Ramgopal</name>
</author>
<author>
<name sortKey="Thome Souza, S" uniqKey="Thome Souza S">S. Thome-Souza</name>
</author>
<author>
<name sortKey="Jackson, M" uniqKey="Jackson M">M. Jackson</name>
</author>
<author>
<name sortKey="Kadish, N E" uniqKey="Kadish N">N. E. Kadish</name>
</author>
<author>
<name sortKey="Sanchez Fernandez, I" uniqKey="Sanchez Fernandez I">I. Sánchez Fernández</name>
</author>
<author>
<name sortKey="Klehm, J" uniqKey="Klehm J">J. Klehm</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schalk, G" uniqKey="Schalk G">G. Schalk</name>
</author>
<author>
<name sortKey="Mcfarland, D J" uniqKey="Mcfarland D">D. J. McFarland</name>
</author>
<author>
<name sortKey="Hinterberger, T" uniqKey="Hinterberger T">T. Hinterberger</name>
</author>
<author>
<name sortKey="Birbaumer, N" uniqKey="Birbaumer N">N. Birbaumer</name>
</author>
<author>
<name sortKey="Wolpaw, J R" uniqKey="Wolpaw J">J. R. Wolpaw</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Selvaraj, T G" uniqKey="Selvaraj T">T. G. Selvaraj</name>
</author>
<author>
<name sortKey="Ramasamy, B" uniqKey="Ramasamy B">B. Ramasamy</name>
</author>
<author>
<name sortKey="Jeyaraj, S J" uniqKey="Jeyaraj S">S. J. Jeyaraj</name>
</author>
<author>
<name sortKey="Suviseshamuthu, E S" uniqKey="Suviseshamuthu E">E. S. Suviseshamuthu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Shoeb, A" uniqKey="Shoeb A">A. Shoeb</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stern, R A" uniqKey="Stern R">R. A. Stern</name>
</author>
<author>
<name sortKey="Riley, D O" uniqKey="Riley D">D. O. Riley</name>
</author>
<author>
<name sortKey="Daneshvar, D H" uniqKey="Daneshvar D">D. H. Daneshvar</name>
</author>
<author>
<name sortKey="Nowinski, C J" uniqKey="Nowinski C">C. J. Nowinski</name>
</author>
<author>
<name sortKey="Cantu, R C" uniqKey="Cantu R">R. C. Cantu</name>
</author>
<author>
<name sortKey="Mckee, A C" uniqKey="Mckee A">A. C. McKee</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tatum, W" uniqKey="Tatum W">W. Tatum</name>
</author>
<author>
<name sortKey="Husain, A" uniqKey="Husain A">A. Husain</name>
</author>
<author>
<name sortKey="Benbadis, S" uniqKey="Benbadis S">S. Benbadis</name>
</author>
<author>
<name sortKey="Kaplan, P" uniqKey="Kaplan P">P. Kaplan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, W" uniqKey="Wang W">W. Wang</name>
</author>
<author>
<name sortKey="Collinger, J L" uniqKey="Collinger J">J. L. Collinger</name>
</author>
<author>
<name sortKey="Degenhart, A D" uniqKey="Degenhart A">A. D. Degenhart</name>
</author>
<author>
<name sortKey="Tyler Kabara, E C" uniqKey="Tyler Kabara E">E. C. Tyler-Kabara</name>
</author>
<author>
<name sortKey="Schwartz, A B" uniqKey="Schwartz A">A. B. Schwartz</name>
</author>
<author>
<name sortKey="Moran, D W" uniqKey="Moran D">D. W. Moran</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Weiss, P S" uniqKey="Weiss P">P. S. Weiss</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yamada, T" uniqKey="Yamada T">T. Yamada</name>
</author>
<author>
<name sortKey="Meng, E" uniqKey="Meng E">E. Meng</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="data-paper">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Front Neurosci</journal-id>
<journal-id journal-id-type="iso-abbrev">Front Neurosci</journal-id>
<journal-id journal-id-type="publisher-id">Front. Neurosci.</journal-id>
<journal-title-group>
<journal-title>Frontiers in Neuroscience</journal-title>
</journal-title-group>
<issn pub-type="ppub">1662-4548</issn>
<issn pub-type="epub">1662-453X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmc">4865520</article-id>
<article-id pub-id-type="doi">10.3389/fnins.2016.00196</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Neuroscience</subject>
<subj-group>
<subject>Data Report</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>The Temple University Hospital EEG Data Corpus</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Obeid</surname>
<given-names>Iyad</given-names>
</name>
<xref ref-type="author-notes" rid="fn001">
<sup>*</sup>
</xref>
<uri xlink:type="simple" xlink:href="http://loop.frontiersin.org/people/10044/overview"></uri>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Picone</surname>
<given-names>Joseph</given-names>
</name>
<uri xlink:type="simple" xlink:href="http://loop.frontiersin.org/people/347163/overview"></uri>
</contrib>
</contrib-group>
<aff>
<institution>Electrical and Computer Engineering, Temple University</institution>
<country>Philadelphia, PA, USA</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>Edited by: Mikhail Lebedev, Duke University, USA</p>
</fn>
<fn fn-type="edited-by">
<p>Reviewed by: Duygu Kuzum, University of California, San Diego, USA; Ervin Sejdic, University of Pittsburgh, USA; Ivan Selesnick, New York University, USA; Xiaomu Song, Widener University, USA; Zhanpeng Jin, Binghamton University, USA</p>
</fn>
<corresp id="fn001">*Correspondence: Iyad Obeid
<email xlink:type="simple">iobeid@temple.edu</email>
</corresp>
<fn fn-type="other" id="fn002">
<p>This article was submitted to Neural Technology, a section of the journal Frontiers in Neuroscience</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>13</day>
<month>5</month>
<year>2016</year>
</pub-date>
<pub-date pub-type="collection">
<year>2016</year>
</pub-date>
<volume>10</volume>
<elocation-id>196</elocation-id>
<history>
<date date-type="received">
<day>29</day>
<month>2</month>
<year>2016</year>
</date>
<date date-type="accepted">
<day>20</day>
<month>4</month>
<year>2016</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright © 2016 Obeid and Picone.</copyright-statement>
<copyright-year>2016</copyright-year>
<copyright-holder>Obeid and Picone</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</license-p>
</license>
</permissions>
<kwd-group>
<kwd>EEG</kwd>
<kwd>database</kwd>
<kwd>machine learning</kwd>
<kwd>clinical trials as topic</kwd>
<kwd>big data</kwd>
</kwd-group>
<funding-group>
<award-group>
<funding-source id="cn001">Defense Advanced Research Projects Agency
<named-content content-type="fundref-id">10.13039/100000185</named-content>
</funding-source>
<award-id rid="cn001">D13AP00065</award-id>
</award-group>
<award-group>
<funding-source id="cn002">National Science Foundation
<named-content content-type="fundref-id">10.13039/100000001</named-content>
</funding-source>
<award-id rid="cn002">1458411</award-id>
<award-id rid="cn002">1305190</award-id>
</award-group>
</funding-group>
<counts>
<fig-count count="2"></fig-count>
<table-count count="0"></table-count>
<equation-count count="0"></equation-count>
<ref-count count="16"></ref-count>
<page-count count="5"></page-count>
<word-count count="3219"></word-count>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>Introduction</title>
<p>The electroencephalogram (EEG) is an excellent tool for probing neural function, both in clinical and research environments, due to its low cost, non-invasive nature, and pervasiveness. In the clinic, the EEG is the standard test for diagnosing and characterizing epilepsy and stroke, as well as a host of other trauma- and pathology-related conditions (Tatum et al.,
<xref rid="B12" ref-type="bibr">2007</xref>
; Yamada and Meng,
<xref rid="B15" ref-type="bibr">2009</xref>
). In research laboratories, EEG is used to study neural responses to external stimuli, motor planning and execution, and brain-computer interfaces (Lebedev and Nicolelis,
<xref rid="B4" ref-type="bibr">2006</xref>
; Wang et al.,
<xref rid="B13" ref-type="bibr">2013</xref>
). While human interpretation is still the gold standard for EEG analysis in the clinic, a host of software tools exist to assist that process or to perform predictive analyses such as seizure prediction.</p>
<p>Recently, a confluence of events has underscored the need for robust EEG tools. First, there has been a renewed push via the White House BRAIN initiative to understand neural function and disease (Weiss,
<xref rid="B14" ref-type="bibr">2013</xref>
). Second, there is increased awareness of brain injury, owing both to the influx of injured warfighters and to the numerous high-profile athletes found to have chronic brain damage (McKee et al.,
<xref rid="B6" ref-type="bibr">2009</xref>
; Stern et al.,
<xref rid="B11" ref-type="bibr">2011</xref>
). Third, a wave of consumer-grade scalp sensors has entered the market, allowing end users to monitor sleep, arousal, and mood (Liao et al.,
<xref rid="B5" ref-type="bibr">2012</xref>
).</p>
<p>All of these applications need robust signal processing tools to analyze EEG data. Historically, EEG signal processing tools have been devised either with ad hoc heuristic methods or by training pattern recognition engines on small data sets (Gotman,
<xref rid="B3" ref-type="bibr">1982</xref>
). These methods have yielded limited results, mostly because brain signals (and EEG in particular) are highly variable, and that variability can only be properly captured by statistical models built from massive amounts of data (Alotaiby et al.,
<xref rid="B1" ref-type="bibr">2014</xref>
; Ramgopal et al.,
<xref rid="B7" ref-type="bibr">2014</xref>
). Unfortunately, despite EEG being perhaps the most pervasive modality for acquiring brain signals, there is a severe lack of data in the public domain. For example, the “EEG Motor Movement/Imagery Dataset” (
<ext-link ext-link-type="uri" xlink:href="http://www.physionet.org/pn4/eegmmidb/">http://www.physionet.org/pn4/eegmmidb/</ext-link>
) contains ~1500 recordings, each 1 or 2 min long, from 109 subjects (Goldberger et al.,
<xref rid="B2" ref-type="bibr">2000</xref>
; Schalk et al.,
<xref rid="B8" ref-type="bibr">2004</xref>
). The CHB-MIT database contains data from 22 subjects, mostly pediatric (Shoeb,
<xref rid="B10" ref-type="bibr">2009</xref>
). A database from Karunya University contains 175 16-channel EEGs, each 10 s long (Selvaraj et al.,
<xref rid="B9" ref-type="bibr">2014</xref>
). One of the most extensive databases for supporting epilepsy research is the European Epilepsy Database (
<ext-link ext-link-type="uri" xlink:href="http://epilepsy-database.eu/">http://epilepsy-database.eu/</ext-link>
), which contains 250 datasets from 30 unique patients but sells for €3000. Other databases, such as ieeg.org, contain a wealth of data from more invasive modalities such as electrocorticography, but little or no EEG.</p>
<p>This lack of publicly available data is ironic considering that hundreds of thousands of EEGs are administered annually in clinical settings around the world. Relatively little of this data is available to the research community in a form that is useful for machine learning research. Massive amounts of EEG data would allow state-of-the-art machine learning algorithms to be used to discover new diagnostics and validate clinical practice. Furthermore, it is desirable that such data be collected in clinical settings rather than in tightly controlled research environments, since “clinical-grade” data is inherently more variable with respect to parameters such as electrode location, clinical environment, equipment, and noise. Capturing this variability is critical to developing robust, high-performance technology that has real-world impact.</p>
<p>In this work, we describe a new corpus, the TUH-EEG Corpus, which is an ongoing data collection effort that has recently released 14 years of clinical EEG data collected at Temple University Hospital. The records have been curated, organized, and paired with textual clinician reports that describe the patients and scans. The corpus is publicly available from the Neural Engineering Data Consortium (
<ext-link ext-link-type="uri" xlink:href="http://www.nedcdata.org">www.nedcdata.org</ext-link>
) (Picone and Obeid,
<xref rid="B16" ref-type="bibr">2016</xref>
).</p>
</sec>
<sec sec-type="methods" id="s2">
<title>Methods</title>
<p>Clinical EEG data were collected from archival records at Temple University Hospital (TUH). All work was performed in accordance with the Declaration of Helsinki and with the full approval of the Temple University IRB. All personnel in contact with privileged patient information were fully trained on patient privacy and were certified by the Temple IRB.</p>
<p>Archival EEG signal data were recovered from CD-ROMs. Files were converted from their native proprietary file format (Nicolet's NicVue) to the open European Data Format (EDF) standard. The data were then rigorously de-identified to conform to the HIPAA Privacy Rule by eliminating 18 potential identifiers, including patient names and dates of birth. Patient medical record numbers were replaced with randomized database identifiers, and the key to that mapping was saved to a secure off-line location. Importantly, our process captured instances in which the same patient received multiple EEGs over time and assigned database IDs accordingly. De-identification was performed by combining custom-designed automated software tools with manual editing and proofreading. All storage and manipulation of source files was conducted on dedicated, non-networked computers that were physically located within the TUH Department of Neurology.</p>
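<p>The replacement of medical record numbers (MRNs) with randomized database identifiers, consistent across repeat visits, can be sketched in Python. This is a minimal illustration under our own assumptions and not the authors' software. It assumes a random 8-digit ID and an in-memory dictionary standing in for the off-line key. The class PseudonymMap and its methods are hypothetical.</p>

```python
import secrets


class PseudonymMap:
    """Hypothetical MRN -> random database ID mapper (illustrative sketch).

    The same MRN always receives the same ID, so repeat EEGs from one patient
    stay linked. The mapping itself is the secret key that would be stored
    off-line.
    """

    def __init__(self, width=8):
        self.width = width
        self._key = {}       # MRN -> database ID (the secret mapping)
        self._used = set()   # IDs already issued, to avoid collisions

    def id_for(self, mrn):
        """Return the database ID for an MRN, issuing a new random one if needed."""
        if mrn not in self._key:
            while True:
                candidate = "".join(secrets.choice("0123456789") for _ in range(self.width))
                if candidate not in self._used:
                    break
            self._used.add(candidate)
            self._key[mrn] = candidate
        return self._key[mrn]

    def export_key(self):
        """Copy of the mapping, intended only for secure off-line storage."""
        return dict(self._key)
```

<p>Calling id_for twice with the same MRN returns the same ID. This is what lets multiple EEGs from one patient share a database identifier while the MRN itself never enters the corpus.</p>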
<p>We also manually paired each retrieved EEG with its corresponding clinician report. These reports are written by the neurologist after analyzing the EEG scan and are the official hospital summary of the clinical impression. They consist of unstructured text that describes the patient, relevant history, medications, and clinical impression. Reports were mined from the hospital's central electronic medical record archives and typically consisted of image scans of printed reports. Various levels of image processing were applied to improve image quality before optical character recognition (OCR) was used to convert the images into text. A combination of software and manual editing was used to scrub protected health information (PHI) from the reports and to correct OCR transcription errors. Only sessions with both an EEG and a corresponding clinician report were included in the final corpus.</p>
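<p>Report scrubbing combined software with manual editing. The Python sketch below illustrates only the automated half, by replacing a few PHI-like patterns in OCR output with placeholder tags. The patterns, tags, and the scrub function are hypothetical and cover only a small subset of the HIPAA identifiers. Regular expressions alone are not a sufficient de-identification method, which is why manual review remains necessary.</p>

```python
import re

# Hypothetical patterns covering only a few identifier types.
PHI_PATTERNS = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),                          # e.g. 12/23/2002
    (re.compile(r"\b(?:MRN|Medical Record)\s*#?:?\s*\d+\b", re.IGNORECASE), "[MRN]"),  # e.g. MRN: 1234567
    (re.compile(r"\b\d{3}-\d{3}-\d{4}\b"), "[PHONE]"),                              # e.g. 215-555-0100
]


def scrub(text):
    """Replace every match of each PHI pattern with its placeholder tag."""
    for pattern, tag in PHI_PATTERNS:
        text = pattern.sub(tag, text)
    return text
```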
<p>The corpus is organized as a hierarchical Unix-style file tree. The top folder,
<monospace>edf</monospace>
, contains 109 numbered folders, each of which contains numbered folders for up to 100 patients. Each patient folder contains sub-folders that correspond to individual recording sessions; these folder names reflect the session number and date of recording. Finally, each session folder includes one or more EEG (
<monospace>.edf</monospace>
) data files as well as the clinician report in
<monospace>.txt</monospace>
format. Figure
<xref ref-type="fig" rid="F1">1</xref>
summarizes the corpus file structure and gives examples of text and signal data.</p>
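<p>The layout described above can be traversed with a short Python sketch. The text says only that session-folder names reflect the session number and date, so the s001_2002_12_23-style pattern used here is an assumption made for illustration. The index_corpus helper is hypothetical. It also applies the inclusion rule from the Methods by keeping only sessions that have both signal data and a report.</p>

```python
import re
from pathlib import Path

# Assumed session-folder naming, e.g. "s001_2002_12_23" (session number plus
# recording date). Hypothetical: check against the actual release.
SESSION_RE = re.compile(r"^s(\d+)_(\d{4})_(\d{2})_(\d{2})$")


def index_corpus(root):
    """Yield (patient, session_no, date, edf_files, report) for each session.

    Assumed layout: root/edf/GROUP/PATIENT/SESSION/ containing *.edf and *.txt.
    """
    for group in sorted((Path(root) / "edf").iterdir()):
        if not group.is_dir():
            continue
        for patient in sorted(p for p in group.iterdir() if p.is_dir()):
            for session in sorted(s for s in patient.iterdir() if s.is_dir()):
                m = SESSION_RE.match(session.name)
                if m is None:
                    continue  # skip folders that do not match the assumed naming
                number, year, month, day = m.groups()
                edfs = sorted(session.glob("*.edf"))
                reports = sorted(session.glob("*.txt"))
                # Keep only sessions that have both signal data and a report.
                if edfs and reports:
                    yield patient.name, int(number), f"{year}-{month}-{day}", edfs, reports[0]
```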
<fig id="F1" position="float">
<label>Figure 1</label>
<caption>
<p>
<bold>Directory and file structure of the TUH-EEG database</bold>
. Data is organized by patient (orange) and then by session (yellow). Each session contains one or more signal (edf) and physician report (txt) files. To accommodate file system management issues, patients are grouped into sets of about 100 (blue).</p>
</caption>
<graphic xlink:href="fnins-10-00196-g0001"></graphic>
</fig>
</sec>
<sec sec-type="results" id="s3">
<title>Results</title>
<p>The completed corpus comprises 16,986 sessions from 10,874 unique subjects. Each session contains at least one EDF file (more in the case of long-term monitoring sessions that were broken into multiple files) and one physician report. Corpus metrics are summarized in Figure
<xref ref-type="fig" rid="F2">2</xref>
. Subjects were 51% female and ranged in age from less than 1 year to over 90 (average 51.6, stdev 55.9; see Figure
<xref ref-type="fig" rid="F2">2</xref>
bottom left). The average number of sessions per patient was 1.56, although as many as 37 EEGs were recorded for a single patient over an 8-month period (Figure
<xref ref-type="fig" rid="F2">2</xref>
top left). The number of sessions per year varies from ~1000 to 2500 (with the exception of years 2000–2002, and 2005, in which limited numbers of complete reports were found in the various electronic medical record archives; see Figure
<xref ref-type="fig" rid="F2">2</xref>
top right).</p>
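<p>The per-patient session statistics can be recomputed from (patient, session) pairs, such as those produced by a directory walk. With the reported totals, the mean is 16,986 / 10,874 ≈ 1.562 sessions per patient, which is consistent with the 1.56 quoted above. The function name sessions_per_patient is hypothetical.</p>

```python
from collections import Counter


def sessions_per_patient(pairs):
    """Given (patient_id, session_no) pairs, return (mean, histogram).

    The histogram maps "number of sessions" to "number of patients with that
    many sessions", the quantity plotted in Figure 2 (top left).
    """
    per_patient = Counter(patient for patient, _ in set(pairs))  # set() drops duplicate pairs
    mean = sum(per_patient.values()) / len(per_patient)
    histogram = Counter(per_patient.values())
    return mean, dict(sorted(histogram.items()))
```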
<fig id="F2" position="float">
<label>Figure 2</label>
<caption>
<p>
<bold>Metrics describing the TUH-EEG corpus</bold>
. [
<bold>Top left</bold>
] histogram showing number of sessions per patient; [
<bold>top right</bold>
] histogram showing number of sessions recorded per calendar year; [
<bold>bottom left</bold>
] histogram of patient ages; [
<bold>bottom right</bold>
] histogram showing number of EEG-only channels (purple); and total channels (green).</p>
</caption>
<graphic xlink:href="fnins-10-00196-g0002"></graphic>
</fig>
<p>The number of channels varied substantially across the corpus (see Figure
<xref ref-type="fig" rid="F2">2</xref>
bottom right). EDF files typically contained both EEG-specific channels and supplementary channels such as detected bursts, EKG, EMG, and photic stimuli. The most common number of EEG-only channels per EDF file was 31, although some files had as few as 20. Most of the EEG data (87%) were sampled at 250 Hz, with the remainder sampled at 256 Hz (8.3%), 400 Hz (3.8%), and 512 Hz (1%).</p>
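<p>Channel counts and sampling rates like those above can be read from each file's EDF header. The sketch below uses only the Python standard library to parse the fixed-width ASCII header defined by the EDF specification. A signal's sampling rate is its samples-per-record divided by the data-record duration. Counting EEG channels by an "EEG" label prefix is our assumption about the labeling, not something this paper states. Both function names are hypothetical.</p>

```python
def read_edf_header(raw):
    """Parse an EDF header (bytes) into record info, labels, and per-signal rates."""

    def field(start, width):
        return raw[start:start + width].decode("latin-1").strip()

    n_records = int(field(236, 8))      # number of data records
    record_s = float(field(244, 8))     # duration of one data record, seconds
    ns = int(field(252, 4))             # number of signals

    pos = 256                           # per-signal fields start after the fixed 256 bytes
    labels = [field(pos + 16 * i, 16) for i in range(ns)]
    pos += 16 * ns
    # Skip transducer (80), physical dimension (8), physical min/max (8 + 8),
    # digital min/max (8 + 8), and prefiltering (80) fields for every signal.
    pos += ns * (80 + 8 + 8 + 8 + 8 + 8 + 80)
    samples = [int(field(pos + 8 * i, 8)) for i in range(ns)]

    return {
        "n_records": n_records,
        "record_s": record_s,
        "duration_s": n_records * record_s,
        "labels": labels,
        "rates_hz": [s / record_s for s in samples],
    }


def count_eeg_channels(labels):
    # Assumption: EEG channels carry an "EEG" label prefix (e.g. "EEG FP1-REF").
    return sum(1 for lab in labels if lab.upper().startswith("EEG"))
```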
<p>An initial analysis of the physician reports reveals a wide range of medications and medical conditions. Unsurprisingly, the most commonly listed medications were anticonvulsants such as Keppra and Dilantin, as well as blood thinners such as Lovenox and heparin. Approximately 87% of the reports included the text string “
<monospace>epilep</monospace>
,” and about 12% included “
<monospace>stroke</monospace>
.” Only 48 total reports included the string “
<monospace>concus</monospace>
.”</p>
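<p>Percentages like these can be obtained with a case-insensitive substring test over the report texts. The sketch below shows one simple way to do this; it is not necessarily how the figures above were computed. Such a test counts mentions, not diagnoses: negated phrases such as "no history of epilepsy" also match. The function name fraction_containing is hypothetical.</p>

```python
def fraction_containing(reports, stem):
    """Fraction of report texts containing `stem`, compared case-insensitively."""
    if not reports:
        return 0.0
    stem = stem.lower()
    hits = sum(1 for text in reports if stem in text.lower())
    return hits / len(reports)
```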
<p>The TUH-EEG corpus v0.6.0 has been released and is freely available online at
<ext-link ext-link-type="uri" xlink:href="http://www.nedcdata.org">www.nedcdata.org</ext-link>
. Users must register with a valid email address. The uncompressed EDF files and reports together comprise 572 GB. For convenience, the website stores all data from each patient as individual
<monospace>gzip</monospace>
files with a median file size of 4.1 MB; all 10,874
<monospace>gzips</monospace>
together total 330 GB. Users who want the entire database are encouraged to mail a USB hard drive to the authors rather than download it.</p>
</sec>
<sec sec-type="discussion" id="s4">
<title>Discussion</title>
<p>This work presents the world's largest publicly available corpus of clinical EEG data, representing a total of 29.1 years of EEG data (duration summed over all EEG channels). In addition to its size, the corpus features wide variation in patient ages, diagnoses, medications, channel counts, and sampling rates. Furthermore, the corpus continues to grow at a rate of ~2500 new sessions per year.</p>
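<p>One way to make "total duration summed over all EEG channels" concrete is as follows. For each EDF file f, let C_f be its number of EEG channels, N_f its number of data records, and D_f the duration of one record in seconds; N_f and D_f are EDF header fields. The formula below assumes a 365.25-day year, which is our choice rather than one stated in the text.</p>

```latex
T = \sum_{f} C_f \, N_f \, D_f \quad \text{(channel-seconds)}, \qquad
T_{\text{years}} = \frac{T}{365.25 \times 86\,400\ \text{s}}
```

<p>Read this way, the reported 29.1 years corresponds to roughly 29.1 × 3.156 × 10^7 ≈ 9.18 × 10^8 channel-seconds.</p>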
<p>Biomedicine is entering a new age of data-driven discovery fueled by ubiquitous computing power, inexpensive data storage, the machine learning revolution, and high-speed internet connections. Access to massive quantities of properly curated data is now the critical bottleneck to advancement in many areas of biomedical research. Ironically, doctors and clinicians generate enormous quantities of data every day, but that information is almost exclusively sequestered in secure archives where the biomedical research community cannot use it. The quantity, quality, and variability of such data represent significant unrealized potential, which is doubly unfortunate considering that the cost of generating the data has already been borne. Although there has been some progress in publishing databases of patient metadata, curated
<italic>signal</italic>
databases are much less commonly available, especially in quantities that would be sufficient to train most contemporary machine learning engines.</p>
<p>In this work, we have endeavored to achieve two goals. The first is to create a corpus of clinical EEG signals and their corresponding physician reports. The second is to establish best practices for the curation and publication of clinical signal data, which is inherently different from discrete metadata. The EEG corpus we present here is the first of its kind in both volume and heterogeneity, each of which is critical for training machine learning engines. Typically, “research-grade” data is created by tightly controlling as many external factors as possible. In contrast, “clinical-grade” data is inherently heterogeneous with respect to those same external factors. Whereas certain classes of research questions can only be answered using well-controlled data, others benefit from variability. For example, an epilepsy detection algorithm that is trained using 31 specific EEG channels may not be effective if one or more of those channels are not connected, or if the electrodes are improperly located or affixed to the scalp. Algorithms that must function robustly under a wide range of conditions must be trained with correspondingly heterogeneous data.</p>
<p>Our work has shown that, although clinical signal data is ubiquitous and inherently valuable to the research community, it requires substantial manipulation before it can be released as an adequately curated data corpus. This effort is non-trivial in terms of both time and cost. Our team's activities ranged from the mundane (e.g., manually copying archival hospital data from over 1500 CD-ROMs) to more technical challenges (e.g., developing software for detecting data entry errors in the clinical records). Physician reports had to be located through one of five different electronic medical record (EMR) portals, often manually. A battery of tests was created to validate that each record was complete, unique, error-free, and completely free of privileged patient information. A rigorous accounting system was created to track and organize the tens of thousands of files and their status.</p>
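<p>Two checks from such a validation battery can be sketched in Python. A completeness check confirms that each session has at least one EDF file and one report. A uniqueness check confirms that no EDF file is byte-identical to one already seen in another session. The authors' actual tests are not described in detail. The choice of SHA-256 content hashing and the function names are our assumptions.</p>

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk=1048576):
    """Hash a file in 1 MiB chunks so large EDF files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def validate_sessions(session_dirs):
    """Return (session_dir, problem) tuples; an empty list means all checks passed."""
    problems = []
    seen = {}  # digest -> first session directory containing that EDF
    for session in map(Path, session_dirs):
        edfs = sorted(session.glob("*.edf"))
        # Completeness: at least one signal file and one report.
        if not edfs or not list(session.glob("*.txt")):
            problems.append((str(session), "incomplete"))
        # Uniqueness: flag EDF content already seen in a different session.
        for edf in edfs:
            digest = sha256_of(edf)
            if digest in seen and seen[digest] != session:
                problems.append((str(session), f"duplicate of {seen[digest]}"))
            seen.setdefault(digest, session)
    return problems
```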
<p>The cost to develop the TUH EEG Corpus has been relatively low, totaling less than $100K in direct charges. As medical record technology improves, the cost of such collection efforts can be reduced even further. On balance, these types of large-scale collections are a worthwhile investment, since their costs are minor relative to the cost of acquiring the data or conducting research on it. In general, the authors expect that a dedicated community-wide data facility would be best suited to curate data of the magnitude and complexity described here, because such an activity carries significant ongoing costs.</p>
<p>One example of these ongoing costs is annotation of the data, a critical issue for machine learning research. In most semi-supervised machine learning applications, one of the first steps is to annotate the data, a process in which important elements of the signal are marked as such. This can be performed either manually by a human domain expert or automatically with a bootstrap-style algorithm. In addition to the EEG data itself, we are releasing a collection of annotations that may be downloaded separately by interested users. Each annotation contains a start time, a stop time, and an event label, and is specific to a single channel. Six classes of events are included: three signal classes and three noise classes. The signal classes are (1) spike and/or sharp waves (SPSW), (2) periodic lateralized epileptiform discharges (PLED), and (3) generalized periodic epileptiform discharges (GPED). SPSW events are epileptiform transients that are typically observed in patients with epilepsy. PLED events are indicative of EEG abnormalities and often manifest as repetitive spike or sharp wave discharges that can be focal or lateralized over one hemisphere. These signals display quasi-periodic behavior. GPED events are similar to PLEDs and, depending on the interval between discharges, manifest as periodic short-interval diffuse discharges, periodic long-interval diffuse discharges, or suppression-burst patterns. Triphasic waves, which manifest as diffuse and bilaterally synchronous spikes with bifrontal predominance, typically at a rate of 1–2 Hz, are also included in this class.</p>
<p>Three classes are used to model background noise: (1) artifacts (ARTF) are recorded electrical activity that is not of cerebral origin, such as activity due to the equipment, patient behavior, or the environment; (2) eye movements (EYEM) are common events that are often confused with spikes; and (3) background (BCKG) is used for all other signals.</p>
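<p>The per-channel annotations described above (start time, stop time, event label) map naturally onto a small data structure. The Python sketch below also includes a filter that pulls out signal-class events, the kind of query a neurologist might run to search a long recording for anomalous behavior. The on-disk annotation format is not specified here. The field names, the use of seconds as units, and the example channel labels are all assumptions.</p>

```python
from dataclasses import dataclass
from enum import Enum


class EventClass(Enum):
    # Three signal classes
    SPSW = "spike and/or sharp wave"
    PLED = "periodic lateralized epileptiform discharge"
    GPED = "generalized periodic epileptiform discharge"
    # Three noise classes
    ARTF = "artifact"
    EYEM = "eye movement"
    BCKG = "background"


SIGNAL_CLASSES = {EventClass.SPSW, EventClass.PLED, EventClass.GPED}


@dataclass(frozen=True)
class Annotation:
    channel: str        # channel label, e.g. "FP1-F7" (illustrative)
    start_s: float      # event start, seconds from the start of the recording (assumed unit)
    stop_s: float       # event stop, seconds
    label: EventClass


def events_of_interest(annotations, min_duration_s=0.0):
    """Return signal-class (non-noise) events of at least min_duration_s, sorted by start time."""
    hits = [a for a in annotations
            if a.label in SIGNAL_CLASSES and a.stop_s - a.start_s >= min_duration_s]
    return sorted(hits, key=lambda a: (a.start_s, a.channel))
```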
<p>These six classes (three signal classes and three noise classes) were arrived at through several iterations of a study conducted with Temple University Hospital neurologists. Automatic labeling of these events allows a neurologist to rapidly search long-term EEG recordings for anomalous behavior. However, there are many more annotations that need to be developed for this data. For example, we are currently developing technology to automatically annotate seizures. There are many other events of interest that need annotation (e.g., sleep states). We expect to be continually enhancing the value of the TUH EEG Corpus.</p>
</sec>
<sec id="s5">
<title>Author contributions</title>
<p>JP led the database creation effort and co-wrote the manuscript. IO contributed to the database creation effort, performed data metrics, and co-wrote the manuscript.</p>
</sec>
<sec id="s6">
<title>Funding</title>
<p>This work has been supported by DARPA award D13AP00065, NSF awards 1305190 and 1458411, and by the Research Office and Dean of Engineering at Temple University.</p>
<sec>
<title>Conflict of interest statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Alotaiby</surname>
<given-names>T. N.</given-names>
</name>
<name>
<surname>Alshebeili</surname>
<given-names>S. A.</given-names>
</name>
<name>
<surname>Alshawi</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Ahmad</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Abd El-Samie</surname>
<given-names>F. E.</given-names>
</name>
</person-group>
(
<year>2014</year>
).
<article-title>EEG seizure detection and prediction algorithms: a survey</article-title>
.
<source>EURASIP J. Adv. Signal Process.</source>
<volume>2014</volume>
:
<issue>183</issue>
<pub-id pub-id-type="doi">10.1186/1687-6180-2014-183</pub-id>
</mixed-citation>
</ref>
<ref id="B2">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Goldberger</surname>
<given-names>A. L.</given-names>
</name>
<name>
<surname>Amaral</surname>
<given-names>L. A.</given-names>
</name>
<name>
<surname>Glass</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Hausdorff</surname>
<given-names>J. M.</given-names>
</name>
<name>
<surname>Ivanov</surname>
<given-names>P. C.</given-names>
</name>
<name>
<surname>Mark</surname>
<given-names>R. G.</given-names>
</name>
<etal></etal>
</person-group>
(
<year>2000</year>
).
<article-title>PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals</article-title>
.
<source>Circulation</source>
<volume>101</volume>
,
<fpage>e215</fpage>
<lpage>e220</lpage>
.
<pub-id pub-id-type="doi">10.1161/01.CIR.101.23.e215</pub-id>
<pub-id pub-id-type="pmid">10851218</pub-id>
</mixed-citation>
</ref>
<ref id="B3">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gotman</surname>
<given-names>J.</given-names>
</name>
</person-group>
(
<year>1982</year>
).
<article-title>Automatic recognition of epileptic seizures in the EEG</article-title>
.
<source>Electroencephalogr. Clin. Neurophysiol.</source>
<volume>54</volume>
,
<fpage>530</fpage>
<lpage>540</lpage>
.
<pub-id pub-id-type="doi">10.1016/0013-4694(82)90038-4</pub-id>
<pub-id pub-id-type="pmid">6181976</pub-id>
</mixed-citation>
</ref>
<ref id="B4">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lebedev</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Nicolelis</surname>
<given-names>M.</given-names>
</name>
</person-group>
(
<year>2006</year>
).
<article-title>Brain–machine interfaces: past, present and future</article-title>
.
<source>Trends Neurosci.</source>
<volume>29</volume>
,
<fpage>536</fpage>
<lpage>546</lpage>
.
<pub-id pub-id-type="doi">10.1016/j.tins.2006.07.004</pub-id>
<pub-id pub-id-type="pmid">16859758</pub-id>
</mixed-citation>
</ref>
<ref id="B5">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liao</surname>
<given-names>L. D.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>C. T.</given-names>
</name>
<name>
<surname>McDowell</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Wickenden</surname>
<given-names>A. E.</given-names>
</name>
<name>
<surname>Gramann</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Jung</surname>
<given-names>T. P.</given-names>
</name>
<etal></etal>
</person-group>
(
<year>2012</year>
).
<article-title>Biosensor technologies for augmented brain–computer interfaces in the next decades</article-title>
.
<source>Proc. IEEE</source>
<volume>100</volume>
,
<fpage>1553</fpage>
<lpage>1566</lpage>
.
<pub-id pub-id-type="doi">10.1109/JPROC.2012.2184829</pub-id>
</mixed-citation>
</ref>
<ref id="B6">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>McKee</surname>
<given-names>A. C.</given-names>
</name>
<name>
<surname>Cantu</surname>
<given-names>R. C.</given-names>
</name>
<name>
<surname>Nowinski</surname>
<given-names>C. J.</given-names>
</name>
<name>
<surname>Hedley-Whyte</surname>
<given-names>E. T.</given-names>
</name>
<name>
<surname>Gavett</surname>
<given-names>B. E.</given-names>
</name>
<name>
<surname>Budson</surname>
<given-names>A. E.</given-names>
</name>
<etal></etal>
</person-group>
(
<year>2009</year>
).
<article-title>Chronic traumatic encephalopathy in athletes: progressive tauopathy after repetitive head injury</article-title>
.
<source>J. Neuropathol. Exp. Neurol.</source>
<volume>68</volume>
,
<fpage>709</fpage>
<lpage>735</lpage>
.
<pub-id pub-id-type="doi">10.1097/NEN.0b013e3181a9d503</pub-id>
<pub-id pub-id-type="pmid">19535999</pub-id>
</mixed-citation>
</ref>
<ref id="B16">
<mixed-citation publication-type="webpage">
<person-group person-group-type="author">
<name>
<surname>Picone</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Obeid</surname>
<given-names>I.</given-names>
</name>
</person-group>
(
<year>2016</year>
).
<source>Temple University Hospital EEG Corpus, Neural Engineering Data Consortium, v0.6.3</source>
. Available online at:
<ext-link ext-link-type="uri" xlink:href="http://www.nedcdata.org">www.nedcdata.org</ext-link>
.</mixed-citation>
</ref>
<ref id="B7">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ramgopal</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Thome-Souza</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Jackson</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Kadish</surname>
<given-names>N. E.</given-names>
</name>
<name>
<surname>Sánchez Fernández</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Klehm</surname>
<given-names>J.</given-names>
</name>
<etal></etal>
</person-group>
(
<year>2014</year>
).
<article-title>Seizure detection, seizure prediction, and closed-loop warning systems in epilepsy</article-title>
.
<source>Epilepsy Behav.</source>
<volume>37</volume>
,
<fpage>291</fpage>
<lpage>307</lpage>
.
<pub-id pub-id-type="doi">10.1016/j.yebeh.2014.06.023</pub-id>
<pub-id pub-id-type="pmid">25174001</pub-id>
</mixed-citation>
</ref>
<ref id="B8">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schalk</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>McFarland</surname>
<given-names>D. J.</given-names>
</name>
<name>
<surname>Hinterberger</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Birbaumer</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Wolpaw</surname>
<given-names>J. R.</given-names>
</name>
</person-group>
(
<year>2004</year>
).
<article-title>BCI2000: a general-purpose Brain-Computer Interface (BCI) system</article-title>
.
<source>IEEE Trans. Biomed. Eng.</source>
<volume>51</volume>
,
<fpage>1034</fpage>
<lpage>1043</lpage>
.
<pub-id pub-id-type="doi">10.1109/TBME.2004.827072</pub-id>
<pub-id pub-id-type="pmid">15188875</pub-id>
</mixed-citation>
</ref>
<ref id="B9">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Selvaraj</surname>
<given-names>T. G.</given-names>
</name>
<name>
<surname>Ramasamy</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Jeyaraj</surname>
<given-names>S. J.</given-names>
</name>
<name>
<surname>Suviseshamuthu</surname>
<given-names>E. S.</given-names>
</name>
</person-group>
(
<year>2014</year>
).
<article-title>EEG database of seizure disorders for experts and application developers</article-title>
.
<source>Clin. EEG Neurosci.</source>
<volume>45</volume>
,
<fpage>304</fpage>
<lpage>309</lpage>
.
<pub-id pub-id-type="doi">10.1177/1550059413500960</pub-id>
</mixed-citation>
</ref>
<ref id="B10">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Shoeb</surname>
<given-names>A.</given-names>
</name>
</person-group>
(
<year>2009</year>
).
<source>Application of Machine Learning to Epileptic Seizure Onset Detection and Treatment</source>
.
<publisher-loc>Cambridge, MA</publisher-loc>
:
<publisher-name>MIT</publisher-name>
.</mixed-citation>
</ref>
<ref id="B11">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stern</surname>
<given-names>R. A.</given-names>
</name>
<name>
<surname>Riley</surname>
<given-names>D. O.</given-names>
</name>
<name>
<surname>Daneshvar</surname>
<given-names>D. H.</given-names>
</name>
<name>
<surname>Nowinski</surname>
<given-names>C. J.</given-names>
</name>
<name>
<surname>Cantu</surname>
<given-names>R. C.</given-names>
</name>
<name>
<surname>McKee</surname>
<given-names>A. C.</given-names>
</name>
</person-group>
(
<year>2011</year>
).
<article-title>Long-term consequences of repetitive brain trauma: chronic traumatic encephalopathy</article-title>
.
<source>PM R</source>
<volume>3</volume>
(
<issue>10 Suppl. 2</issue>
),
<fpage>S460</fpage>
<lpage>S467</lpage>
.
<pub-id pub-id-type="doi">10.1016/j.pmrj.2011.08.008</pub-id>
<pub-id pub-id-type="pmid">22035690</pub-id>
</mixed-citation>
</ref>
<ref id="B12">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Tatum</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Husain</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Benbadis</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Kaplan</surname>
<given-names>P.</given-names>
</name>
</person-group>
(
<year>2007</year>
).
<source>Handbook of EEG Interpretation</source>
.
<publisher-loc>New York, NY</publisher-loc>
:
<publisher-name>Demos Medical Publishing</publisher-name>
.</mixed-citation>
</ref>
<ref id="B13">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Collinger</surname>
<given-names>J. L.</given-names>
</name>
<name>
<surname>Degenhart</surname>
<given-names>A. D.</given-names>
</name>
<name>
<surname>Tyler-Kabara</surname>
<given-names>E. C.</given-names>
</name>
<name>
<surname>Schwartz</surname>
<given-names>A. B.</given-names>
</name>
<name>
<surname>Moran</surname>
<given-names>D. W.</given-names>
</name>
<etal></etal>
</person-group>
(
<year>2013</year>
).
<article-title>An electrocorticographic brain interface in an individual with tetraplegia</article-title>
.
<source>PLoS ONE</source>
<volume>8</volume>
:
<issue>e55344</issue>
.
<pub-id pub-id-type="doi">10.1371/journal.pone.0055344</pub-id>
<pub-id pub-id-type="pmid">23405137</pub-id>
</mixed-citation>
</ref>
<ref id="B14">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Weiss</surname>
<given-names>P. S.</given-names>
</name>
</person-group>
(
<year>2013</year>
).
<article-title>President Obama announces the BRAIN initiative</article-title>
.
<source>ACS Nano</source>
<volume>7</volume>
,
<fpage>2873</fpage>
<lpage>2874</lpage>
.
<pub-id pub-id-type="doi">10.1021/nn401796f</pub-id>
<pub-id pub-id-type="pmid">23607423</pub-id>
</mixed-citation>
</ref>
<ref id="B15">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Yamada</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Meng</surname>
<given-names>E.</given-names>
</name>
</person-group>
(
<year>2009</year>
).
<source>Practical Guide for Clinical Neurophysiologic Testing: EEG</source>
.
<publisher-loc>Philadelphia, PA</publisher-loc>
:
<publisher-name>Lippincott Williams &amp; Wilkins</publisher-name>
.</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
<affiliations>
<list></list>
<tree>
<noCountry>
<name sortKey="Obeid, Iyad" sort="Obeid, Iyad" uniqKey="Obeid I" first="Iyad" last="Obeid">Iyad Obeid</name>
<name sortKey="Picone, Joseph" sort="Picone, Joseph" uniqKey="Picone J" first="Joseph" last="Picone">Joseph Picone</name>
</noCountry>
</tree>
</affiliations>
</record>

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Ncbi/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000293 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd -nk 000293 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Ncbi
   |étape=   Merge
   |type=    RBID
   |clé=     PMC:4865520
   |texte=   The Temple University Hospital EEG Data Corpus
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/RBID.i   -Sk "pubmed:NONE" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd   \
       | NlmPubMed2Wicri -a OcrV1 

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024