Cyberinfrastructure exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

'Seed + expand': a general methodology for detecting publication oeuvres of individual researchers.

Internal identifier: 000023 (PubMed/Curation); previous: 000022; next: 000024

Authors: Linda Reijnhoudt [Netherlands]; Rodrigo Costas [Netherlands]; Ed Noyons [Netherlands]; Katy Börner [United States]; Andrea Scharnhorst [Netherlands]

Source: Scientometrics [ISSN 0138-9130]; 2014; vol. 101, no. 2, pp. 1403-1417

RBID : pubmed:25328257

Abstract

The study of science at the individual scholar level requires the disambiguation of author names. The creation of author's publication oeuvres involves matching the list of unique author names to names used in publication databases. Despite recent progress in the development of unique author identifiers, e.g., ORCID, VIVO, or DAI, author disambiguation remains a key problem when it comes to large-scale bibliometric analysis using data from multiple databases. This study introduces and tests a new methodology called seed + expand for semi-automatic bibliographic data collection for a given set of individual authors. Specifically, we identify the oeuvre of a set of Dutch full professors during the period 1980-2011. In particular, we combine author records from a Dutch National Research Information System (NARCIS) with publication records from the Web of Science. Starting with an initial list of 8,378 names, we identify 'seed publications' for each author using five different approaches. Subsequently, we 'expand' the set of publications in three different approaches. The different approaches are compared and resulting oeuvres are evaluated on precision and recall using a 'gold standard' dataset of authors for which verified publications in the period 2001-2010 are available.
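The abstract evaluates detected oeuvres on precision and recall against a gold standard of verified publications. As a minimal sketch of that kind of evaluation (not the authors' implementation), the function below compares a detected publication set with a verified one. The function name `precision_recall` and the publication identifiers are hypothetical and chosen only for illustration.

```python
def precision_recall(retrieved, verified):
    """Return (precision, recall) of a detected oeuvre against a verified one.

    Both arguments are iterables of publication identifiers.
    Precision: share of detected publications that are verified.
    Recall: share of verified publications that were detected.
    """
    retrieved, verified = set(retrieved), set(verified)
    true_positives = len(retrieved & verified)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(verified) if verified else 0.0
    return precision, recall


# Hypothetical identifiers, not taken from the study.
detected = {"pub-1", "pub-2", "pub-3", "pub-4"}
gold = {"pub-2", "pub-3", "pub-4", "pub-5", "pub-6"}

p, r = precision_recall(detected, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.60
```

In this toy case three of the four detected publications are verified (precision 0.75), and three of the five verified publications were found (recall 0.60).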

DOI: 10.1007/s11192-014-1256-0
PubMed: 25328257


Links to Exploration step

pubmed:25328257

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">'Seed + expand': a general methodology for detecting publication oeuvres of individual researchers.</title>
<author>
<name sortKey="Reijnhoudt, Linda" sort="Reijnhoudt, Linda" uniqKey="Reijnhoudt L" first="Linda" last="Reijnhoudt">Linda Reijnhoudt</name>
<affiliation wicri:level="1">
<nlm:affiliation>DANS, Royal Netherlands Academy of Arts and Sciences (KNAW), The Hague, The Netherlands.</nlm:affiliation>
<country xml:lang="fr">Pays-Bas</country>
<wicri:regionArea>DANS, Royal Netherlands Academy of Arts and Sciences (KNAW), The Hague</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Costas, Rodrigo" sort="Costas, Rodrigo" uniqKey="Costas R" first="Rodrigo" last="Costas">Rodrigo Costas</name>
<affiliation wicri:level="1">
<nlm:affiliation>Center for Science and Technology Studies (CWTS)-Leiden University, Leiden, The Netherlands.</nlm:affiliation>
<country xml:lang="fr">Pays-Bas</country>
<wicri:regionArea>Center for Science and Technology Studies (CWTS)-Leiden University, Leiden</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Noyons, Ed" sort="Noyons, Ed" uniqKey="Noyons E" first="Ed" last="Noyons">Ed Noyons</name>
<affiliation wicri:level="1">
<nlm:affiliation>Center for Science and Technology Studies (CWTS)-Leiden University, Leiden, The Netherlands.</nlm:affiliation>
<country xml:lang="fr">Pays-Bas</country>
<wicri:regionArea>Center for Science and Technology Studies (CWTS)-Leiden University, Leiden</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Borner, Katy" sort="Borner, Katy" uniqKey="Borner K" first="Katy" last="Börner">Katy Börner</name>
<affiliation wicri:level="2">
<nlm:affiliation>DANS, Royal Netherlands Academy of Arts and Sciences (KNAW), The Hague, The Netherlands ; Cyberinfrastructure for Network Science Center, School of Informatics and Computing, Indiana University, Bloomington, IN USA.</nlm:affiliation>
<country>États-Unis</country>
<placeName>
<region type="state">Indiana</region>
</placeName>
<wicri:cityArea>DANS, Royal Netherlands Academy of Arts and Sciences (KNAW), The Hague, The Netherlands ; Cyberinfrastructure for Network Science Center, School of Informatics and Computing, Indiana University, Bloomington</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Scharnhorst, Andrea" sort="Scharnhorst, Andrea" uniqKey="Scharnhorst A" first="Andrea" last="Scharnhorst">Andrea Scharnhorst</name>
<affiliation wicri:level="1">
<nlm:affiliation>DANS, Royal Netherlands Academy of Arts and Sciences (KNAW), The Hague, The Netherlands.</nlm:affiliation>
<country xml:lang="fr">Pays-Bas</country>
<wicri:regionArea>DANS, Royal Netherlands Academy of Arts and Sciences (KNAW), The Hague</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2014">2014</date>
<idno type="doi">10.1007/s11192-014-1256-0</idno>
<idno type="RBID">pubmed:25328257</idno>
<idno type="pmid">25328257</idno>
<idno type="wicri:Area/PubMed/Corpus">000023</idno>
<idno type="wicri:Area/PubMed/Curation">000023</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">'Seed + expand': a general methodology for detecting publication oeuvres of individual researchers.</title>
<author>
<name sortKey="Reijnhoudt, Linda" sort="Reijnhoudt, Linda" uniqKey="Reijnhoudt L" first="Linda" last="Reijnhoudt">Linda Reijnhoudt</name>
<affiliation wicri:level="1">
<nlm:affiliation>DANS, Royal Netherlands Academy of Arts and Sciences (KNAW), The Hague, The Netherlands.</nlm:affiliation>
<country xml:lang="fr">Pays-Bas</country>
<wicri:regionArea>DANS, Royal Netherlands Academy of Arts and Sciences (KNAW), The Hague</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Costas, Rodrigo" sort="Costas, Rodrigo" uniqKey="Costas R" first="Rodrigo" last="Costas">Rodrigo Costas</name>
<affiliation wicri:level="1">
<nlm:affiliation>Center for Science and Technology Studies (CWTS)-Leiden University, Leiden, The Netherlands.</nlm:affiliation>
<country xml:lang="fr">Pays-Bas</country>
<wicri:regionArea>Center for Science and Technology Studies (CWTS)-Leiden University, Leiden</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Noyons, Ed" sort="Noyons, Ed" uniqKey="Noyons E" first="Ed" last="Noyons">Ed Noyons</name>
<affiliation wicri:level="1">
<nlm:affiliation>Center for Science and Technology Studies (CWTS)-Leiden University, Leiden, The Netherlands.</nlm:affiliation>
<country xml:lang="fr">Pays-Bas</country>
<wicri:regionArea>Center for Science and Technology Studies (CWTS)-Leiden University, Leiden</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Borner, Katy" sort="Borner, Katy" uniqKey="Borner K" first="Katy" last="Börner">Katy Börner</name>
<affiliation wicri:level="2">
<nlm:affiliation>DANS, Royal Netherlands Academy of Arts and Sciences (KNAW), The Hague, The Netherlands ; Cyberinfrastructure for Network Science Center, School of Informatics and Computing, Indiana University, Bloomington, IN USA.</nlm:affiliation>
<country>États-Unis</country>
<placeName>
<region type="state">Indiana</region>
</placeName>
<wicri:cityArea>DANS, Royal Netherlands Academy of Arts and Sciences (KNAW), The Hague, The Netherlands ; Cyberinfrastructure for Network Science Center, School of Informatics and Computing, Indiana University, Bloomington</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Scharnhorst, Andrea" sort="Scharnhorst, Andrea" uniqKey="Scharnhorst A" first="Andrea" last="Scharnhorst">Andrea Scharnhorst</name>
<affiliation wicri:level="1">
<nlm:affiliation>DANS, Royal Netherlands Academy of Arts and Sciences (KNAW), The Hague, The Netherlands.</nlm:affiliation>
<country xml:lang="fr">Pays-Bas</country>
<wicri:regionArea>DANS, Royal Netherlands Academy of Arts and Sciences (KNAW), The Hague</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Scientometrics</title>
<idno type="ISSN">0138-9130</idno>
<imprint>
<date when="2014" type="published">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">The study of science at the individual scholar level requires the disambiguation of author names. The creation of author's publication oeuvres involves matching the list of unique author names to names used in publication databases. Despite recent progress in the development of unique author identifiers, e.g., ORCID, VIVO, or DAI, author disambiguation remains a key problem when it comes to large-scale bibliometric analysis using data from multiple databases. This study introduces and tests a new methodology called seed + expand for semi-automatic bibliographic data collection for a given set of individual authors. Specifically, we identify the oeuvre of a set of Dutch full professors during the period 1980-2011. In particular, we combine author records from a Dutch National Research Information System (NARCIS) with publication records from the Web of Science. Starting with an initial list of 8,378 names, we identify 'seed publications' for each author using five different approaches. Subsequently, we 'expand' the set of publications in three different approaches. The different approaches are compared and resulting oeuvres are evaluated on precision and recall using a 'gold standard' dataset of authors for which verified publications in the period 2001-2010 are available.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="Publisher" Owner="NLM">
<PMID Version="1">25328257</PMID>
<DateCreated>
<Year>2014</Year>
<Month>10</Month>
<Day>20</Day>
</DateCreated>
<DateRevised>
<Year>2014</Year>
<Month>10</Month>
<Day>22</Day>
</DateRevised>
<Article PubModel="Print-Electronic">
<Journal>
<ISSN IssnType="Print">0138-9130</ISSN>
<JournalIssue CitedMedium="Print">
<Volume>101</Volume>
<Issue>2</Issue>
<PubDate>
<Year>2014</Year>
</PubDate>
</JournalIssue>
<Title>Scientometrics</Title>
<ISOAbbreviation>Scientometrics</ISOAbbreviation>
</Journal>
<ArticleTitle>'Seed + expand': a general methodology for detecting publication oeuvres of individual researchers.</ArticleTitle>
<Pagination>
<MedlinePgn>1403-1417</MedlinePgn>
</Pagination>
<Abstract>
<AbstractText>The study of science at the individual scholar level requires the disambiguation of author names. The creation of author's publication oeuvres involves matching the list of unique author names to names used in publication databases. Despite recent progress in the development of unique author identifiers, e.g., ORCID, VIVO, or DAI, author disambiguation remains a key problem when it comes to large-scale bibliometric analysis using data from multiple databases. This study introduces and tests a new methodology called seed + expand for semi-automatic bibliographic data collection for a given set of individual authors. Specifically, we identify the oeuvre of a set of Dutch full professors during the period 1980-2011. In particular, we combine author records from a Dutch National Research Information System (NARCIS) with publication records from the Web of Science. Starting with an initial list of 8,378 names, we identify 'seed publications' for each author using five different approaches. Subsequently, we 'expand' the set of publications in three different approaches. The different approaches are compared and resulting oeuvres are evaluated on precision and recall using a 'gold standard' dataset of authors for which verified publications in the period 2001-2010 are available.</AbstractText>
</Abstract>
<AuthorList>
<Author>
<LastName>Reijnhoudt</LastName>
<ForeName>Linda</ForeName>
<Initials>L</Initials>
<AffiliationInfo>
<Affiliation>DANS, Royal Netherlands Academy of Arts and Sciences (KNAW), The Hague, The Netherlands.</Affiliation>
</AffiliationInfo>
</Author>
<Author>
<LastName>Costas</LastName>
<ForeName>Rodrigo</ForeName>
<Initials>R</Initials>
<AffiliationInfo>
<Affiliation>Center for Science and Technology Studies (CWTS)-Leiden University, Leiden, The Netherlands.</Affiliation>
</AffiliationInfo>
</Author>
<Author>
<LastName>Noyons</LastName>
<ForeName>Ed</ForeName>
<Initials>E</Initials>
<AffiliationInfo>
<Affiliation>Center for Science and Technology Studies (CWTS)-Leiden University, Leiden, The Netherlands.</Affiliation>
</AffiliationInfo>
</Author>
<Author>
<LastName>Börner</LastName>
<ForeName>Katy</ForeName>
<Initials>K</Initials>
<AffiliationInfo>
<Affiliation>DANS, Royal Netherlands Academy of Arts and Sciences (KNAW), The Hague, The Netherlands ; Cyberinfrastructure for Network Science Center, School of Informatics and Computing, Indiana University, Bloomington, IN USA.</Affiliation>
</AffiliationInfo>
</Author>
<Author>
<LastName>Scharnhorst</LastName>
<ForeName>Andrea</ForeName>
<Initials>A</Initials>
<AffiliationInfo>
<Affiliation>DANS, Royal Netherlands Academy of Arts and Sciences (KNAW), The Hague, The Netherlands.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>ENG</Language>
<GrantList>
<Grant>
<GrantID>U01 GM098959</GrantID>
<Acronym>GM</Acronym>
<Agency>NIGMS NIH HHS</Agency>
<Country>United States</Country>
</Grant>
</GrantList>
<PublicationTypeList>
<PublicationType UI="">JOURNAL ARTICLE</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2014</Year>
<Month>3</Month>
<Day>5</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<MedlineTA>Scientometrics</MedlineTA>
<NlmUniqueID>7901197</NlmUniqueID>
<ISSNLinking>0138-9130</ISSNLinking>
</MedlineJournalInfo>
<KeywordList Owner="NOTNLM">
<Keyword MajorTopicYN="N">Author disambiguation</Keyword>
<Keyword MajorTopicYN="N">Publication oeuvre</Keyword>
<Keyword MajorTopicYN="N">Scalable methods</Keyword>
</KeywordList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="received">
<Year>2013</Year>
<Month>11</Month>
<Day>18</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="epublish">
<Year>2014</Year>
<Month>3</Month>
<Day>5</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2014</Year>
<Month>10</Month>
<Day>21</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2014</Year>
<Month>10</Month>
<Day>21</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2014</Year>
<Month>10</Month>
<Day>21</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="doi">10.1007/s11192-014-1256-0</ArticleId>
<ArticleId IdType="pii">1256</ArticleId>
<ArticleId IdType="pubmed">25328257</ArticleId>
<ArticleId IdType="pmc">PMC4190454</ArticleId>
<ArticleId IdType="mid">NIHMS612248</ArticleId>
</ArticleIdList>
<pmc-dir>pmcsd</pmc-dir>
</PubmedData>
</pubmed>
</record>
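As a hedged illustration of how the `<pubmed>` part of such a record might be read programmatically, the sketch below uses Python's standard `xml.etree.ElementTree` on a trimmed fragment copied from the record above. The helper name `summarize` is hypothetical. Note that the `<TEI>` part, as displayed, uses the prefixes `wicri:` and `nlm:` without declaring them, so a namespace-aware parser is likely to reject the full record unless those declarations are supplied; the sketch therefore restricts itself to the `<pubmed>` element.

```python
import xml.etree.ElementTree as ET

# Trimmed fragment of the <pubmed> element shown above (two of five authors).
PUBMED_FRAGMENT = """
<pubmed>
<MedlineCitation Status="Publisher" Owner="NLM">
<PMID Version="1">25328257</PMID>
<Article PubModel="Print-Electronic">
<AuthorList>
<Author><LastName>Reijnhoudt</LastName><ForeName>Linda</ForeName></Author>
<Author><LastName>Börner</LastName><ForeName>Katy</ForeName></Author>
</AuthorList>
</Article>
</MedlineCitation>
<PubmedData>
<ArticleIdList>
<ArticleId IdType="doi">10.1007/s11192-014-1256-0</ArticleId>
<ArticleId IdType="pubmed">25328257</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
"""


def summarize(xml_text):
    """Extract the PMID, DOI and author names from a <pubmed> element."""
    root = ET.fromstring(xml_text.strip())
    pmid = root.findtext("MedlineCitation/PMID")
    doi = root.findtext("PubmedData/ArticleIdList/ArticleId[@IdType='doi']")
    authors = [
        f"{author.findtext('ForeName')} {author.findtext('LastName')}"
        for author in root.iter("Author")
    ]
    return {"pmid": pmid, "doi": doi, "authors": authors}


summary = summarize(PUBMED_FRAGMENT)
print(summary["pmid"], summary["doi"], summary["authors"])
```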

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/PubMed/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000023 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Curation/biblio.hfd -nk 000023 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    PubMed
   |étape=   Curation
   |type=    RBID
   |clé=     pubmed:25328257
   |texte=   'Seed + expand': a general methodology for detecting publication oeuvres of individual researchers.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Curation/RBID.i   -Sk "pubmed:25328257" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024