Exploration server on relations between France and Australia

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

The Potential of Automatic Word Comparison for Historical Linguistics

Internal identifier: 000060 (Pmc/Checkpoint); previous: 000059; next: 000061

Authors: Johann-Mattis List [France]; Simon J. Greenhill [Germany, Australia]; Russell D. Gray [Germany]

RBID : PMC:5271327

Abstract

The amount of data from languages spoken all over the world is rapidly increasing. Traditional manual methods in historical linguistics need to face the challenges brought by this influx of data. Automatic approaches to word comparison could provide invaluable help to pre-analyze data which can be later enhanced by experts. In this way, computational approaches can take care of the repetitive and schematic tasks leaving experts to concentrate on answering interesting questions. Here we test the potential of automatic methods to detect etymologically related words (cognates) in cross-linguistic data. Using a newly compiled database of expert cognate judgments across five different language families, we compare how well different automatic approaches distinguish related from unrelated words. Our results show that automatic methods can identify cognates with a very high degree of accuracy, reaching 89% for the best-performing method Infomap. We identify the specific strengths and weaknesses of these different methods and point to major challenges for future approaches. Current automatic approaches for cognate detection—although not perfect—could become an important component of future research in historical linguistics.


DOI: 10.1371/journal.pone.0170046
PubMed: 28129337
PubMed Central: 5271327


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">The Potential of Automatic Word Comparison for Historical Linguistics</title>
<author>
<name sortKey="List, Johann Mattis" sort="List, Johann Mattis" uniqKey="List J" first="Johann-Mattis" last="List">Johann-Mattis List</name>
<affiliation wicri:level="1">
<nlm:aff id="aff001">
<addr-line>Centre des Recherches Linguistiques sur l’Asie Orientale, École des Hautes Études en Sciences Sociales, 2 Rue de Lille, 75007 Paris, France</addr-line>
</nlm:aff>
<country xml:lang="fr">France</country>
<wicri:regionArea>Centre des Recherches Linguistiques sur l’Asie Orientale, École des Hautes Études en Sciences Sociales, 2 Rue de Lille, 75007 Paris</wicri:regionArea>
<wicri:noRegion>75007 Paris</wicri:noRegion>
<placeName>
<settlement type="city">Paris</settlement>
<region type="région" nuts="2">Île-de-France</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Greenhill, Simon J" sort="Greenhill, Simon J" uniqKey="Greenhill S" first="Simon J." last="Greenhill">Simon J. Greenhill</name>
<affiliation wicri:level="1">
<nlm:aff id="aff002">
<addr-line>Department for Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Kahlaische Straße 10, 07743, Jena, Germany</addr-line>
</nlm:aff>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Department for Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Kahlaische Straße 10, 07743, Jena</wicri:regionArea>
<wicri:noRegion>07743, Jena</wicri:noRegion>
<wicri:noRegion>Jena</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="aff003">
<addr-line>ARC Centre of Excellence for the Dynamics of Language, Australian National University, Canberra, 2600, Australia</addr-line>
</nlm:aff>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>ARC Centre of Excellence for the Dynamics of Language, Australian National University, Canberra, 2600</wicri:regionArea>
<wicri:noRegion>2600</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Gray, Russell D" sort="Gray, Russell D" uniqKey="Gray R" first="Russell D." last="Gray">Russell D. Gray</name>
<affiliation wicri:level="1">
<nlm:aff id="aff002">
<addr-line>Department for Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Kahlaische Straße 10, 07743, Jena, Germany</addr-line>
</nlm:aff>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Department for Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Kahlaische Straße 10, 07743, Jena</wicri:regionArea>
<wicri:noRegion>07743, Jena</wicri:noRegion>
<wicri:noRegion>Jena</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">28129337</idno>
<idno type="pmc">5271327</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5271327</idno>
<idno type="RBID">PMC:5271327</idno>
<idno type="doi">10.1371/journal.pone.0170046</idno>
<date when="2017">2017</date>
<idno type="wicri:Area/Pmc/Corpus">002964</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">002964</idno>
<idno type="wicri:Area/Pmc/Curation">002814</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">002814</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000060</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">000060</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">The Potential of Automatic Word Comparison for Historical Linguistics</title>
<author>
<name sortKey="List, Johann Mattis" sort="List, Johann Mattis" uniqKey="List J" first="Johann-Mattis" last="List">Johann-Mattis List</name>
<affiliation wicri:level="3">
<nlm:aff id="aff001">
<addr-line>Centre des Recherches Linguistiques sur l’Asie Orientale, École des Hautes Études en Sciences Sociales, 2 Rue de Lille, 75007 Paris, France</addr-line>
</nlm:aff>
<country xml:lang="fr">France</country>
<wicri:regionArea>Centre des Recherches Linguistiques sur l’Asie Orientale, École des Hautes Études en Sciences Sociales, 2 Rue de Lille, 75007 Paris</wicri:regionArea>
<placeName>
<region type="region" nuts="2">Île-de-France</region>
<settlement type="city">Paris</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Greenhill, Simon J" sort="Greenhill, Simon J" uniqKey="Greenhill S" first="Simon J." last="Greenhill">Simon J. Greenhill</name>
<affiliation wicri:level="1">
<nlm:aff id="aff002">
<addr-line>Department for Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Kahlaische Straße 10, 07743, Jena, Germany</addr-line>
</nlm:aff>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Department for Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Kahlaische Straße 10, 07743, Jena</wicri:regionArea>
<wicri:noRegion>07743, Jena</wicri:noRegion>
<wicri:noRegion>Jena</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="aff003">
<addr-line>ARC Centre of Excellence for the Dynamics of Language, Australian National University, Canberra, 2600, Australia</addr-line>
</nlm:aff>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>ARC Centre of Excellence for the Dynamics of Language, Australian National University, Canberra, 2600</wicri:regionArea>
<wicri:noRegion>2600</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Gray, Russell D" sort="Gray, Russell D" uniqKey="Gray R" first="Russell D." last="Gray">Russell D. Gray</name>
<affiliation wicri:level="1">
<nlm:aff id="aff002">
<addr-line>Department for Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Kahlaische Straße 10, 07743, Jena, Germany</addr-line>
</nlm:aff>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Department for Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Kahlaische Straße 10, 07743, Jena</wicri:regionArea>
<wicri:noRegion>07743, Jena</wicri:noRegion>
<wicri:noRegion>Jena</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j">PLoS ONE</title>
<idno type="eISSN">1932-6203</idno>
<imprint>
<date when="2017">2017</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>The amount of data from languages spoken all over the world is rapidly increasing. Traditional manual methods in historical linguistics need to face the challenges brought by this influx of data. Automatic approaches to word comparison could provide invaluable help to pre-analyze data which can be later enhanced by experts. In this way, computational approaches can take care of the repetitive and schematic tasks leaving experts to concentrate on answering interesting questions. Here we test the potential of automatic methods to detect etymologically related words (cognates) in cross-linguistic data. Using a newly compiled database of expert cognate judgments across five different language families, we compare how well different automatic approaches distinguish related from unrelated words. Our results show that automatic methods can identify cognates with a very high degree of accuracy, reaching 89% for the best-performing method
<italic>Infomap</italic>
. We identify the specific strengths and weaknesses of these different methods and point to major challenges for future approaches. Current automatic approaches for cognate detection—although not perfect—could become an important component of future research in historical linguistics.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Greenhill, Sj" uniqKey="Greenhill S">SJ Greenhill</name>
</author>
<author>
<name sortKey="Blust, R" uniqKey="Blust R">R Blust</name>
</author>
<author>
<name sortKey="Gray, Rd" uniqKey="Gray R">RD Gray</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dunn, M" uniqKey="Dunn M">M Dunn</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Greenhill, Sj" uniqKey="Greenhill S">SJ Greenhill</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kitchen, A" uniqKey="Kitchen A">A Kitchen</name>
</author>
<author>
<name sortKey="Ehret, C" uniqKey="Ehret C">C Ehret</name>
</author>
<author>
<name sortKey="Assefa, S" uniqKey="Assefa S">S Assefa</name>
</author>
<author>
<name sortKey="Mulligan, Cj" uniqKey="Mulligan C">CJ Mulligan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bowern, C" uniqKey="Bowern C">C Bowern</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fox, A" uniqKey="Fox A">A Fox</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hammarstrom, H" uniqKey="Hammarstrom H">H Hammarström</name>
</author>
<author>
<name sortKey="Forkel, R" uniqKey="Forkel R">R Forkel</name>
</author>
<author>
<name sortKey="Haspelmath, M" uniqKey="Haspelmath M">M Haspelmath</name>
</author>
<author>
<name sortKey="Bank, S" uniqKey="Bank S">S Bank</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mcmahon, A" uniqKey="Mcmahon A">A McMahon</name>
</author>
<author>
<name sortKey="Mcmahon, R" uniqKey="Mcmahon R">R McMahon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Embleton, S" uniqKey="Embleton S">S Embleton</name>
</author>
<author>
<name sortKey="Renfrew, C" uniqKey="Renfrew C">C Renfrew</name>
</author>
<author>
<name sortKey="Mcmahon, A" uniqKey="Mcmahon A">A McMahon</name>
</author>
<author>
<name sortKey="Trask, L" uniqKey="Trask L">L Trask</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Holm, Hj" uniqKey="Holm H">HJ Holm</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Holman, Ew" uniqKey="Holman E">EW Holman</name>
</author>
<author>
<name sortKey="Wichmann, S" uniqKey="Wichmann S">S Wichmann</name>
</author>
<author>
<name sortKey="Brown, Ch" uniqKey="Brown C">CH Brown</name>
</author>
<author>
<name sortKey="Velupillai, V" uniqKey="Velupillai V">V Velupillai</name>
</author>
<author>
<name sortKey="Muller, A" uniqKey="Muller A">A Müller</name>
</author>
<author>
<name sortKey="Bakker, D" uniqKey="Bakker D">D Bakker</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wheeler, Wc" uniqKey="Wheeler W">WC Wheeler</name>
</author>
<author>
<name sortKey="Whiteley, Pm" uniqKey="Whiteley P">PM Whiteley</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="J Ger, G" uniqKey="J Ger G">G Jäger</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Campbell, L" uniqKey="Campbell L">L Campbell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Greenhill, Sj" uniqKey="Greenhill S">SJ Greenhill</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sidwell, P" uniqKey="Sidwell P">P Sidwell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Trask, Rl" uniqKey="Trask R">RL Trask</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ross, Md" uniqKey="Ross M">MD Ross</name>
</author>
<author>
<name sortKey="Durie, M" uniqKey="Durie M">M Durie</name>
</author>
<author>
<name sortKey="Durie, M" uniqKey="Durie M">M Durie</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="List, Jm" uniqKey="List J">JM List</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sidwell, P" uniqKey="Sidwell P">P Sidwell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Saenko, M" uniqKey="Saenko M">M Saenko</name>
</author>
<author>
<name sortKey="Starostin, Gs" uniqKey="Starostin G">GS Starostin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mcelhanon, Ka" uniqKey="Mcelhanon K">KA McElhanon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Starostin, Gs" uniqKey="Starostin G">GS Starostin</name>
</author>
<author>
<name sortKey="Starostin, Gs" uniqKey="Starostin G">GS Starostin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="B Ij Ng, Daxue" uniqKey="B Ij Ng D">Dàxué Běijīng</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Syrj Nen, K" uniqKey="Syrj Nen K">K Syrjänen</name>
</author>
<author>
<name sortKey="Honkola, T" uniqKey="Honkola T">T Honkola</name>
</author>
<author>
<name sortKey="Korhonen, K" uniqKey="Korhonen K">K Korhonen</name>
</author>
<author>
<name sortKey="Lehtinen, J" uniqKey="Lehtinen J">J Lehtinen</name>
</author>
<author>
<name sortKey="Vesakoski, O" uniqKey="Vesakoski O">O Vesakoski</name>
</author>
<author>
<name sortKey="Wahlber, N" uniqKey="Wahlber N">N Wahlber</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="List, Jm" uniqKey="List J">JM List</name>
</author>
<author>
<name sortKey="Cysouw, M" uniqKey="Cysouw M">M Cysouw</name>
</author>
<author>
<name sortKey="Forkel, R" uniqKey="Forkel R">R Forkel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, F" uniqKey="Wang F">F Wang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="H U, J" uniqKey="H U J">J Hóu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hattori, S" uniqKey="Hattori S">S Hattori</name>
</author>
<author>
<name sortKey="Hoenigswald, Hm" uniqKey="Hoenigswald H">HM Hoenigswald</name>
</author>
<author>
<name sortKey="Langacre, Rh" uniqKey="Langacre R">RH Langacre</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhivlov, M" uniqKey="Zhivlov M">M Zhivlov</name>
</author>
<author>
<name sortKey="Starostin, Gs" uniqKey="Starostin G">GS Starostin</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bouchard Cote, A" uniqKey="Bouchard Cote A">A Bouchard-Côté</name>
</author>
<author>
<name sortKey="Hall, D" uniqKey="Hall D">D Hall</name>
</author>
<author>
<name sortKey="Griffiths, Tl" uniqKey="Griffiths T">TL Griffiths</name>
</author>
<author>
<name sortKey="Klein, D" uniqKey="Klein D">D Klein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rosvall, M" uniqKey="Rosvall M">M Rosvall</name>
</author>
<author>
<name sortKey="Bergstrom, Ct" uniqKey="Bergstrom C">CT Bergstrom</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Steiner, L" uniqKey="Steiner L">L Steiner</name>
</author>
<author>
<name sortKey="Stadler, Pf" uniqKey="Stadler P">PF Stadler</name>
</author>
<author>
<name sortKey="Cysouw, M" uniqKey="Cysouw M">M Cysouw</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Csardi, G" uniqKey="Csardi G">G Csárdi</name>
</author>
<author>
<name sortKey="Nepusz, T" uniqKey="Nepusz T">T Nepusz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Turchin, P" uniqKey="Turchin P">P Turchin</name>
</author>
<author>
<name sortKey="Peiros, I" uniqKey="Peiros I">I Peiros</name>
</author>
<author>
<name sortKey="Gell Mann, M" uniqKey="Gell Mann M">M Gell-Mann</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dolgopolsky, Ab" uniqKey="Dolgopolsky A">AB Dolgopolsky</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Levenshtein, Vi" uniqKey="Levenshtein V">VI Levenshtein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sokal, Rr" uniqKey="Sokal R">RR Sokal</name>
</author>
<author>
<name sortKey="Michener, Cd" uniqKey="Michener C">CD Michener</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kessler, B" uniqKey="Kessler B">B Kessler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Meheust, R" uniqKey="Meheust R">R Méheust</name>
</author>
<author>
<name sortKey="Zelzion, E" uniqKey="Zelzion E">E Zelzion</name>
</author>
<author>
<name sortKey="Bhattacharya, D" uniqKey="Bhattacharya D">D Bhattacharya</name>
</author>
<author>
<name sortKey="Lopez, P" uniqKey="Lopez P">P Lopez</name>
</author>
<author>
<name sortKey="Bapteste, E" uniqKey="Bapteste E">E Bapteste</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Corel, E" uniqKey="Corel E">E Corel</name>
</author>
<author>
<name sortKey="Lopez, P" uniqKey="Lopez P">P Lopez</name>
</author>
<author>
<name sortKey="Meheust, R" uniqKey="Meheust R">R Méheust</name>
</author>
<author>
<name sortKey="Bapteste, E" uniqKey="Bapteste E">E Bapteste</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lopez, P" uniqKey="Lopez P">P Lopez</name>
</author>
<author>
<name sortKey="List, Jm" uniqKey="List J">JM List</name>
</author>
<author>
<name sortKey="Bapteste, E" uniqKey="Bapteste E">E Bapteste</name>
</author>
<author>
<name sortKey="Fangerau, H" uniqKey="Fangerau H">H Fangerau</name>
</author>
<author>
<name sortKey="Geisler, H" uniqKey="Geisler H">H Geisler</name>
</author>
<author>
<name sortKey="Halling, T" uniqKey="Halling T">T Halling</name>
</author>
<author>
<name sortKey="Martin, W" uniqKey="Martin W">W Martin</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="List, Jm" uniqKey="List J">JM List</name>
</author>
<author>
<name sortKey="Pathmanathan, Js" uniqKey="Pathmanathan J">JS Pathmanathan</name>
</author>
<author>
<name sortKey="Lopez, P" uniqKey="Lopez P">P Lopez</name>
</author>
<author>
<name sortKey="Bapteste, E" uniqKey="Bapteste E">E Bapteste</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Frey, Bj" uniqKey="Frey B">BJ Frey</name>
</author>
<author>
<name sortKey="Dueck, D" uniqKey="Dueck D">D Dueck</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vlasblom, J" uniqKey="Vlasblom J">J Vlasblom</name>
</author>
<author>
<name sortKey="Wodak, Sj" uniqKey="Wodak S">SJ Wodak</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Girvan, M" uniqKey="Girvan M">M Girvan</name>
</author>
<author>
<name sortKey="Newman, Me" uniqKey="Newman M">ME Newman</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Amig, E" uniqKey="Amig E">E Amigó</name>
</author>
<author>
<name sortKey="Gonzalo, J" uniqKey="Gonzalo J">J Gonzalo</name>
</author>
<author>
<name sortKey="Artiles, J" uniqKey="Artiles J">J Artiles</name>
</author>
<author>
<name sortKey="Verdejo, F" uniqKey="Verdejo F">F Verdejo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ben Hamed, M" uniqKey="Ben Hamed M">M Ben Hamed</name>
</author>
<author>
<name sortKey="Wang, F" uniqKey="Wang F">F Wang</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Starostin, G" uniqKey="Starostin G">G Starostin</name>
</author>
<author>
<name sortKey="Krylov, P" uniqKey="Krylov P">P Krylov</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="List, Jm" uniqKey="List J">JM List</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">PLoS One</journal-id>
<journal-id journal-id-type="iso-abbrev">PLoS ONE</journal-id>
<journal-id journal-id-type="publisher-id">plos</journal-id>
<journal-id journal-id-type="pmc">plosone</journal-id>
<journal-title-group>
<journal-title>PLoS ONE</journal-title>
</journal-title-group>
<issn pub-type="epub">1932-6203</issn>
<publisher>
<publisher-name>Public Library of Science</publisher-name>
<publisher-loc>San Francisco, CA USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">28129337</article-id>
<article-id pub-id-type="pmc">5271327</article-id>
<article-id pub-id-type="publisher-id">PONE-D-16-41494</article-id>
<article-id pub-id-type="doi">10.1371/journal.pone.0170046</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Social Sciences</subject>
<subj-group>
<subject>Linguistics</subject>
<subj-group>
<subject>Languages</subject>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Physical Sciences</subject>
<subj-group>
<subject>Mathematics</subject>
<subj-group>
<subject>Applied Mathematics</subject>
<subj-group>
<subject>Algorithms</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Research and Analysis Methods</subject>
<subj-group>
<subject>Simulation and Modeling</subject>
<subj-group>
<subject>Algorithms</subject>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Social Sciences</subject>
<subj-group>
<subject>Linguistics</subject>
<subj-group>
<subject>Historical Linguistics</subject>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Social Sciences</subject>
<subj-group>
<subject>Linguistics</subject>
<subj-group>
<subject>Languages</subject>
<subj-group>
<subject>Language Families</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Physical Sciences</subject>
<subj-group>
<subject>Mathematics</subject>
<subj-group>
<subject>Applied Mathematics</subject>
<subj-group>
<subject>Algorithms</subject>
<subj-group>
<subject>Clustering Algorithms</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Research and Analysis Methods</subject>
<subj-group>
<subject>Simulation and Modeling</subject>
<subj-group>
<subject>Algorithms</subject>
<subj-group>
<subject>Clustering Algorithms</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Research and Analysis Methods</subject>
<subj-group>
<subject>Computational Techniques</subject>
<subj-group>
<subject>Split-Decomposition Method</subject>
<subj-group>
<subject>Multiple Alignment Calculation</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Social Sciences</subject>
<subj-group>
<subject>Linguistics</subject>
<subj-group>
<subject>Linguistic Morphology</subject>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Social Sciences</subject>
<subj-group>
<subject>Linguistics</subject>
<subj-group>
<subject>Phonology</subject>
</subj-group>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>The Potential of Automatic Word Comparison for Historical Linguistics</article-title>
<alt-title alt-title-type="running-head">Automatic Word Comparison</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<contrib-id authenticated="true" contrib-id-type="orcid">http://orcid.org/0000-0003-2133-8919</contrib-id>
<name>
<surname>List</surname>
<given-names>Johann-Mattis</given-names>
</name>
<xref ref-type="aff" rid="aff001">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="cor001">*</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Greenhill</surname>
<given-names>Simon J.</given-names>
</name>
<xref ref-type="aff" rid="aff002">
<sup>2</sup>
</xref>
<xref ref-type="aff" rid="aff003">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Gray</surname>
<given-names>Russell D.</given-names>
</name>
<xref ref-type="aff" rid="aff002">
<sup>2</sup>
</xref>
</contrib>
</contrib-group>
<aff id="aff001">
<label>1</label>
<addr-line>Centre des Recherches Linguistiques sur l’Asie Orientale, École des Hautes Études en Sciences Sociales, 2 Rue de Lille, 75007 Paris, France</addr-line>
</aff>
<aff id="aff002">
<label>2</label>
<addr-line>Department for Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Kahlaische Straße 10, 07743, Jena, Germany</addr-line>
</aff>
<aff id="aff003">
<label>3</label>
<addr-line>ARC Centre of Excellence for the Dynamics of Language, Australian National University, Canberra, 2600, Australia</addr-line>
</aff>
<contrib-group>
<contrib contrib-type="editor">
<name>
<surname>Berwick</surname>
<given-names>Robert C</given-names>
</name>
<role>Editor</role>
<xref ref-type="aff" rid="edit1"></xref>
</contrib>
</contrib-group>
<aff id="edit1">
<addr-line>Massachusetts Institute of Technology, UNITED STATES</addr-line>
</aff>
<author-notes>
<fn fn-type="COI-statement" id="coi001">
<p>
<bold>Competing Interests: </bold>
The authors have declared that no competing interests exist.</p>
</fn>
<fn fn-type="con">
<p>
<list list-type="simple">
<list-item>
<p>
<bold>Conceptualization:</bold>
JML SJG RDG.</p>
</list-item>
<list-item>
<p>
<bold>Data curation:</bold>
JML.</p>
</list-item>
<list-item>
<p>
<bold>Formal analysis:</bold>
JML SJG.</p>
</list-item>
<list-item>
<p>
<bold>Funding acquisition:</bold>
SJG RDG.</p>
</list-item>
<list-item>
<p>
<bold>Investigation:</bold>
JML SJG.</p>
</list-item>
<list-item>
<p>
<bold>Methodology:</bold>
JML SJG.</p>
</list-item>
<list-item>
<p>
<bold>Project administration:</bold>
RDG.</p>
</list-item>
<list-item>
<p>
<bold>Software:</bold>
JML.</p>
</list-item>
<list-item>
<p>
<bold>Validation:</bold>
JML SJG RDG.</p>
</list-item>
<list-item>
<p>
<bold>Visualization:</bold>
JML SJG.</p>
</list-item>
<list-item>
<p>
<bold>Writing – original draft:</bold>
JML.</p>
</list-item>
<list-item>
<p>
<bold>Writing – review &amp; editing:</bold>
JML SJG RDG.</p>
</list-item>
</list>
</p>
</fn>
<corresp id="cor001">* E-mail:
<email>mattis.list@lingpy.org</email>
</corresp>
</author-notes>
<pub-date pub-type="collection">
<year>2017</year>
</pub-date>
<pub-date pub-type="epub">
<day>27</day>
<month>1</month>
<year>2017</year>
</pub-date>
<volume>12</volume>
<issue>1</issue>
<elocation-id>e0170046</elocation-id>
<history>
<date date-type="received">
<day>18</day>
<month>10</month>
<year>2016</year>
</date>
<date date-type="accepted">
<day>28</day>
<month>12</month>
<year>2016</year>
</date>
</history>
<permissions>
<copyright-statement>© 2017 List et al</copyright-statement>
<copyright-year>2017</copyright-year>
<copyright-holder>List et al</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This is an open access article distributed under the terms of the
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution License</ext-link>
, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="pone.0170046.pdf"></self-uri>
<abstract>
<p>The amount of data from languages spoken all over the world is rapidly increasing. Traditional manual methods in historical linguistics need to face the challenges brought by this influx of data. Automatic approaches to word comparison could provide invaluable help to pre-analyze data which can be later enhanced by experts. In this way, computational approaches can take care of the repetitive and schematic tasks leaving experts to concentrate on answering interesting questions. Here we test the potential of automatic methods to detect etymologically related words (cognates) in cross-linguistic data. Using a newly compiled database of expert cognate judgments across five different language families, we compare how well different automatic approaches distinguish related from unrelated words. Our results show that automatic methods can identify cognates with a very high degree of accuracy, reaching 89% for the best-performing method
<italic>Infomap</italic>
. We identify the specific strengths and weaknesses of these different methods and point to major challenges for future approaches. Current automatic approaches for cognate detection—although not perfect—could become an important component of future research in historical linguistics.</p>
</abstract>
<funding-group>
<award-group id="award001">
<funding-source>
<institution-wrap>
<institution-id institution-id-type="funder-id">http://dx.doi.org/10.13039/501100001659</institution-id>
<institution>Deutsche Forschungsgemeinschaft</institution>
</institution-wrap>
</funding-source>
<award-id>261553824</award-id>
<principal-award-recipient>
<contrib-id authenticated="true" contrib-id-type="orcid">http://orcid.org/0000-0003-2133-8919</contrib-id>
<name>
<surname>List</surname>
<given-names>Johann-Mattis</given-names>
</name>
</principal-award-recipient>
</award-group>
<award-group id="award002">
<funding-source>
<institution>Max Planck Institute for the Science of Human History</institution>
</funding-source>
<award-id>Glottobank Research Group</award-id>
<principal-award-recipient>
<contrib-id authenticated="true" contrib-id-type="orcid">http://orcid.org/0000-0003-2133-8919</contrib-id>
<name>
<surname>List</surname>
<given-names>Johann-Mattis</given-names>
</name>
</principal-award-recipient>
</award-group>
<award-group id="award003">
<funding-source>
<institution-wrap>
<institution-id institution-id-type="funder-id">http://dx.doi.org/10.13039/501100000923</institution-id>
<institution>Australian Research Council</institution>
</institution-wrap>
</funding-source>
<award-id>DE120101954</award-id>
<principal-award-recipient>
<name>
<surname>Greenhill</surname>
<given-names>Simon J.</given-names>
</name>
</principal-award-recipient>
</award-group>
<funding-statement>As part of the GlottoBank Project, this work was supported by the Max Planck Institute for the Science of Human History and the Royal Society of New Zealand Marsden Fund grant 13-UOA-121. This paper was further supported by the DFG research fellowship grant 261553824 “Vertical and lateral aspects of Chinese dialect history” (JML), and the Australian Research Council’s Discovery Projects funding scheme (project number DE120101954, SJG).</funding-statement>
</funding-group>
<counts>
<fig-count count="6"></fig-count>
<table-count count="7"></table-count>
<page-count count="18"></page-count>
</counts>
<custom-meta-group>
<custom-meta id="data-availability">
<meta-name>Data Availability</meta-name>
<meta-value>The Supplementary Material contains additional results, as well as data and code to replicate the analyses. You can download it from:
<ext-link ext-link-type="uri" xlink:href="https://zenodo.org/badge/latestdoi/75610836">https://zenodo.org/badge/latestdoi/75610836</ext-link>
(DOI:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5281/zenodo.192607">10.5281/zenodo.192607</ext-link>
).</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
</pmc>
<affiliations>
<list>
<country>
<li>Allemagne</li>
<li>Australie</li>
<li>France</li>
</country>
<region>
<li>Île-de-France</li>
</region>
<settlement>
<li>Paris</li>
</settlement>
</list>
<tree>
<country name="France">
<region name="Île-de-France">
<name sortKey="List, Johann Mattis" sort="List, Johann Mattis" uniqKey="List J" first="Johann-Mattis" last="List">Johann-Mattis List</name>
</region>
</country>
<country name="Allemagne">
<noRegion>
<name sortKey="Greenhill, Simon J" sort="Greenhill, Simon J" uniqKey="Greenhill S" first="Simon J." last="Greenhill">Simon J. Greenhill</name>
</noRegion>
<name sortKey="Gray, Russell D" sort="Gray, Russell D" uniqKey="Gray R" first="Russell D." last="Gray">Russell D. Gray</name>
</country>
<country name="Australie">
<noRegion>
<name sortKey="Greenhill, Simon J" sort="Greenhill, Simon J" uniqKey="Greenhill S" first="Simon J." last="Greenhill">Simon J. Greenhill</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Asie/explor/AustralieFrV1/Data/Pmc/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000060 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Checkpoint/biblio.hfd -nk 000060 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Asie
   |area=    AustralieFrV1
   |flux=    Pmc
   |étape=   Checkpoint
   |type=    RBID
   |clé=     PMC:5271327
   |texte=   The Potential of Automatic Word Comparison for Historical Linguistics
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Checkpoint/RBID.i   -Sk "pubmed:28129337" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Checkpoint/biblio.hfd   \
       | NlmPubMed2Wicri -a AustralieFrV1 


This area was generated with Dilib version V0.6.33.
Data generation: Tue Dec 5 10:43:12 2017. Site generation: Tue Mar 5 14:07:20 2024