Exploration server on relations between France and Australia

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

The Potential of Automatic Word Comparison for Historical Linguistics.

Internal identifier: 001316 (PubMed/Corpus); previous: 001315; next: 001317

Authors: Johann-Mattis List; Simon J. Greenhill; Russell D. Gray

Source: PloS one (eISSN 1932-6203), vol. 12, no. 1, 2017

RBID : pubmed:28129337

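English descriptors

Algorithms; Cluster Analysis; Databases, Factual; History, Ancient; Humans; Language (history); Linguistics (history); Linguistics (statistics & numerical data); Semantics; Vocabulary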

Abstract

The amount of data from languages spoken all over the world is rapidly increasing. Traditional manual methods in historical linguistics need to face the challenges brought by this influx of data. Automatic approaches to word comparison could provide invaluable help to pre-analyze data which can be later enhanced by experts. In this way, computational approaches can take care of the repetitive and schematic tasks leaving experts to concentrate on answering interesting questions. Here we test the potential of automatic methods to detect etymologically related words (cognates) in cross-linguistic data. Using a newly compiled database of expert cognate judgments across five different language families, we compare how well different automatic approaches distinguish related from unrelated words. Our results show that automatic methods can identify cognates with a very high degree of accuracy, reaching 89% for the best-performing method Infomap. We identify the specific strengths and weaknesses of these different methods and point to major challenges for future approaches. Current automatic approaches for cognate detection, although not perfect, could become an important component of future research in historical linguistics.
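
As a rough illustration of what automatic cognate detection involves, the following minimal sketch groups words for one concept by surface similarity. It is not the Infomap method named in the abstract, nor any of the approaches the authors evaluate: it assumes a simple baseline (normalized edit distance with single-linkage grouping), and the lowercase orthographic word forms, the 0.5 threshold, and all function names are illustrative assumptions.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the length of the longer word (0.0 to 1.0)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def cluster_cognates(words: dict, threshold: float = 0.5) -> list:
    """Link every pair of languages whose words are closer than the
    threshold, then return the connected components (single linkage)."""
    parent = {lang: lang for lang in words}

    def find(x):
        # Follow parent links to the representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for l1, l2 in combinations(words, 2):
        if normalized_distance(words[l1], words[l2]) < threshold:
            parent[find(l1)] = find(l2)

    clusters = {}
    for lang in words:
        clusters.setdefault(find(lang), []).append(lang)
    return sorted(sorted(members) for members in clusters.values())


# Illustrative (assumed) input: lowercase spellings of the word for "hand".
WORDS = {
    "English": "hand", "German": "hand", "Dutch": "hand",
    "French": "main", "Spanish": "mano", "Italian": "mano",
}

if __name__ == "__main__":
    print(cluster_cognates(WORDS))
    # [['Dutch', 'English', 'German'], ['French'], ['Italian', 'Spanish']]
```

In this toy run French "main" ends up alone even though it is related to Spanish and Italian "mano", which shows in miniature why surface-similarity baselines miss some true cognates and why the abstract calls current methods "not perfect".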

DOI: 10.1371/journal.pone.0170046
PubMed: 28129337

Links to Exploration step

pubmed:28129337

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">The Potential of Automatic Word Comparison for Historical Linguistics.</title>
<author>
<name sortKey="List, Johann Mattis" sort="List, Johann Mattis" uniqKey="List J" first="Johann-Mattis" last="List">Johann-Mattis List</name>
<affiliation>
<nlm:affiliation>Centre des Recherches Linguistiques sur l'Asie Orientale, École des Hautes Études en Sciences Sociales, 2 Rue de Lille, 75007 Paris, France.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Greenhill, Simon J" sort="Greenhill, Simon J" uniqKey="Greenhill S" first="Simon J" last="Greenhill">Simon J. Greenhill</name>
<affiliation>
<nlm:affiliation>Department for Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Kahlaische Straße 10, 07743, Jena, Germany.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Gray, Russell D" sort="Gray, Russell D" uniqKey="Gray R" first="Russell D" last="Gray">Russell D. Gray</name>
<affiliation>
<nlm:affiliation>Department for Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Kahlaische Straße 10, 07743, Jena, Germany.</nlm:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2017">2017</date>
<idno type="RBID">pubmed:28129337</idno>
<idno type="pmid">28129337</idno>
<idno type="doi">10.1371/journal.pone.0170046</idno>
<idno type="wicri:Area/PubMed/Corpus">001316</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001316</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">The Potential of Automatic Word Comparison for Historical Linguistics.</title>
<author>
<name sortKey="List, Johann Mattis" sort="List, Johann Mattis" uniqKey="List J" first="Johann-Mattis" last="List">Johann-Mattis List</name>
<affiliation>
<nlm:affiliation>Centre des Recherches Linguistiques sur l'Asie Orientale, École des Hautes Études en Sciences Sociales, 2 Rue de Lille, 75007 Paris, France.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Greenhill, Simon J" sort="Greenhill, Simon J" uniqKey="Greenhill S" first="Simon J" last="Greenhill">Simon J. Greenhill</name>
<affiliation>
<nlm:affiliation>Department for Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Kahlaische Straße 10, 07743, Jena, Germany.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Gray, Russell D" sort="Gray, Russell D" uniqKey="Gray R" first="Russell D" last="Gray">Russell D. Gray</name>
<affiliation>
<nlm:affiliation>Department for Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Kahlaische Straße 10, 07743, Jena, Germany.</nlm:affiliation>
</affiliation>
</author>
</analytic>
<series>
<title level="j">PloS one</title>
<idno type="eISSN">1932-6203</idno>
<imprint>
<date when="2017" type="published">2017</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Cluster Analysis</term>
<term>Databases, Factual</term>
<term>History, Ancient</term>
<term>Humans</term>
<term>Language (history)</term>
<term>Linguistics (history)</term>
<term>Linguistics (statistics &amp; numerical data)</term>
<term>Semantics</term>
<term>Vocabulary</term>
</keywords>
<keywords scheme="MESH" qualifier="history" xml:lang="en">
<term>Language</term>
<term>Linguistics</term>
</keywords>
<keywords scheme="MESH" qualifier="statistics &amp; numerical data" xml:lang="en">
<term>Linguistics</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Cluster Analysis</term>
<term>Databases, Factual</term>
<term>History, Ancient</term>
<term>Humans</term>
<term>Semantics</term>
<term>Vocabulary</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">The amount of data from languages spoken all over the world is rapidly increasing. Traditional manual methods in historical linguistics need to face the challenges brought by this influx of data. Automatic approaches to word comparison could provide invaluable help to pre-analyze data which can be later enhanced by experts. In this way, computational approaches can take care of the repetitive and schematic tasks leaving experts to concentrate on answering interesting questions. Here we test the potential of automatic methods to detect etymologically related words (cognates) in cross-linguistic data. Using a newly compiled database of expert cognate judgments across five different language families, we compare how well different automatic approaches distinguish related from unrelated words. Our results show that automatic methods can identify cognates with a very high degree of accuracy, reaching 89% for the best-performing method Infomap. We identify the specific strengths and weaknesses of these different methods and point to major challenges for future approaches. Current automatic approaches for cognate detection, although not perfect, could become an important component of future research in historical linguistics.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">28129337</PMID>
<DateCreated>
<Year>2017</Year>
<Month>01</Month>
<Day>27</Day>
</DateCreated>
<DateCompleted>
<Year>2017</Year>
<Month>08</Month>
<Day>10</Day>
</DateCompleted>
<DateRevised>
<Year>2017</Year>
<Month>08</Month>
<Day>10</Day>
</DateRevised>
<Article PubModel="Electronic-eCollection">
<Journal>
<ISSN IssnType="Electronic">1932-6203</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>12</Volume>
<Issue>1</Issue>
<PubDate>
<Year>2017</Year>
</PubDate>
</JournalIssue>
<Title>PloS one</Title>
<ISOAbbreviation>PLoS ONE</ISOAbbreviation>
</Journal>
<ArticleTitle>The Potential of Automatic Word Comparison for Historical Linguistics.</ArticleTitle>
<Pagination>
<MedlinePgn>e0170046</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1371/journal.pone.0170046</ELocationID>
<Abstract>
<AbstractText>The amount of data from languages spoken all over the world is rapidly increasing. Traditional manual methods in historical linguistics need to face the challenges brought by this influx of data. Automatic approaches to word comparison could provide invaluable help to pre-analyze data which can be later enhanced by experts. In this way, computational approaches can take care of the repetitive and schematic tasks leaving experts to concentrate on answering interesting questions. Here we test the potential of automatic methods to detect etymologically related words (cognates) in cross-linguistic data. Using a newly compiled database of expert cognate judgments across five different language families, we compare how well different automatic approaches distinguish related from unrelated words. Our results show that automatic methods can identify cognates with a very high degree of accuracy, reaching 89% for the best-performing method Infomap. We identify the specific strengths and weaknesses of these different methods and point to major challenges for future approaches. Current automatic approaches for cognate detection, although not perfect, could become an important component of future research in historical linguistics.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>List</LastName>
<ForeName>Johann-Mattis</ForeName>
<Initials>JM</Initials>
<Identifier Source="ORCID">http://orcid.org/0000-0003-2133-8919</Identifier>
<AffiliationInfo>
<Affiliation>Centre des Recherches Linguistiques sur l'Asie Orientale, École des Hautes Études en Sciences Sociales, 2 Rue de Lille, 75007 Paris, France.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Greenhill</LastName>
<ForeName>Simon J</ForeName>
<Initials>SJ</Initials>
<AffiliationInfo>
<Affiliation>Department for Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Kahlaische Straße 10, 07743, Jena, Germany.</Affiliation>
</AffiliationInfo>
<AffiliationInfo>
<Affiliation>ARC Centre of Excellence for the Dynamics of Language, Australian National University, Canberra, 2600, Australia.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Gray</LastName>
<ForeName>Russell D</ForeName>
<Initials>RD</Initials>
<AffiliationInfo>
<Affiliation>Department for Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Kahlaische Straße 10, 07743, Jena, Germany.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016456">Historical Article</PublicationType>
<PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2017</Year>
<Month>01</Month>
<Day>27</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>United States</Country>
<MedlineTA>PLoS One</MedlineTA>
<NlmUniqueID>101285081</NlmUniqueID>
<ISSNLinking>1932-6203</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<CommentsCorrectionsList>
<CommentsCorrections RefType="Cites">
<RefSource>Proc Natl Acad Sci U S A. 2008 Jan 29;105(4):1118-23</RefSource>
<PMID Version="1">18216267</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>PLoS One. 2015 Oct 27;10(10):e0141563</RefSource>
<PMID Version="1">26506615</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Biol Direct. 2016 Aug 20;11:39</RefSource>
<PMID Version="1">27544206</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Proc Biol Sci. 2009 Aug 7;276(1668):2703-10</RefSource>
<PMID Version="1">19403539</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Science. 2007 Feb 16;315(5814):972-6</RefSource>
<PMID Version="1">17218491</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Proc Natl Acad Sci U S A. 2015 Oct 13;112(41):12752-7</RefSource>
<PMID Version="1">26403857</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>BMC Bioinformatics. 2009 Mar 30;10:99</RefSource>
<PMID Version="1">19331680</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Proc Natl Acad Sci U S A. 2016 Mar 29;113(13):3579-84</RefSource>
<PMID Version="1">26976593</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Proc Natl Acad Sci U S A. 2002 Jun 11;99(12):7821-6</RefSource>
<PMID Version="1">12060727</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Proc Natl Acad Sci U S A. 2013 Mar 12;110(11):4224-9</RefSource>
<PMID Version="1">23401532</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Evol Bioinform Online. 2008 Nov 03;4:271-83</RefSource>
<PMID Version="1">19204825</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Trends Microbiol. 2016 Mar;24(3):224-37</RefSource>
<PMID Version="1">26774999</PMID>
</CommentsCorrections>
</CommentsCorrectionsList>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D000465" MajorTopicYN="N">Algorithms</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D016000" MajorTopicYN="Y">Cluster Analysis</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D016208" MajorTopicYN="N">Databases, Factual</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D049690" MajorTopicYN="N">History, Ancient</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D006801" MajorTopicYN="N">Humans</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D007802" MajorTopicYN="N">Language</DescriptorName>
<QualifierName UI="Q000266" MajorTopicYN="Y">history</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D008037" MajorTopicYN="N">Linguistics</DescriptorName>
<QualifierName UI="Q000266" MajorTopicYN="N">history</QualifierName>
<QualifierName UI="Q000706" MajorTopicYN="Y">statistics &amp; numerical data</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D012660" MajorTopicYN="N">Semantics</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D014825" MajorTopicYN="N">Vocabulary</DescriptorName>
</MeshHeading>
</MeshHeadingList>
<CoiStatement>The authors have declared that no competing interests exist.</CoiStatement>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="received">
<Year>2016</Year>
<Month>10</Month>
<Day>18</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
<Year>2016</Year>
<Month>12</Month>
<Day>28</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2017</Year>
<Month>1</Month>
<Day>28</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2017</Year>
<Month>1</Month>
<Day>28</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2017</Year>
<Month>8</Month>
<Day>11</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>epublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">28129337</ArticleId>
<ArticleId IdType="doi">10.1371/journal.pone.0170046</ArticleId>
<ArticleId IdType="pii">PONE-D-16-41494</ArticleId>
<ArticleId IdType="pmc">PMC5271327</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
</record>
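
The `<pubmed>` part of the record above can be read with a standard XML parser. The sketch below is one possible way to pull out the title, authors and identifiers, using Python's `xml.etree.ElementTree`. It works on a trimmed excerpt embedded in the code, on the assumption that the `<pubmed>` subtree has been cut out of the full record first: the TEI part uses `nlm:` and `wicri:` prefixes without declaring them, which a namespace-aware parser would reject. The function name `summarize` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the <pubmed> subtree (element names and values are
# copied from the record; everything else has been left out for brevity).
PUBMED_XML = """
<pubmed>
  <MedlineCitation Status="MEDLINE" Owner="NLM">
    <PMID Version="1">28129337</PMID>
    <Article PubModel="Electronic-eCollection">
      <ArticleTitle>The Potential of Automatic Word Comparison for Historical Linguistics.</ArticleTitle>
      <AuthorList CompleteYN="Y">
        <Author ValidYN="Y"><LastName>List</LastName><ForeName>Johann-Mattis</ForeName></Author>
        <Author ValidYN="Y"><LastName>Greenhill</LastName><ForeName>Simon J</ForeName></Author>
        <Author ValidYN="Y"><LastName>Gray</LastName><ForeName>Russell D</ForeName></Author>
      </AuthorList>
    </Article>
  </MedlineCitation>
  <PubmedData>
    <ArticleIdList>
      <ArticleId IdType="pubmed">28129337</ArticleId>
      <ArticleId IdType="doi">10.1371/journal.pone.0170046</ArticleId>
      <ArticleId IdType="pmc">PMC5271327</ArticleId>
    </ArticleIdList>
  </PubmedData>
</pubmed>
"""


def summarize(xml_text: str) -> dict:
    """Return the PMID, title, author names and article identifiers."""
    root = ET.fromstring(xml_text)
    citation = root.find("MedlineCitation")
    # Author names are rebuilt as "ForeName LastName".
    authors = [
        f"{a.findtext('ForeName')} {a.findtext('LastName')}"
        for a in citation.iter("Author")
    ]
    # Map each identifier type (pubmed, doi, pmc, ...) to its value.
    ids = {e.get("IdType"): e.text for e in root.iter("ArticleId")}
    return {
        "pmid": citation.findtext("PMID"),
        "title": citation.findtext("Article/ArticleTitle"),
        "authors": authors,
        "ids": ids,
    }


if __name__ == "__main__":
    for key, value in summarize(PUBMED_XML).items():
        print(key, "->", value)
```

Keying the identifiers by `IdType` keeps the DOI, PMID and PMC number addressable by name instead of by position in `ArticleIdList`.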

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Asie/explor/AustralieFrV1/Data/PubMed/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001316 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 001316 | SxmlIndent | more
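
When the same lookup has to be repeated for many records, the command line above can be assembled programmatically. The following sketch only builds the command string and does not run it. The tool names `HfdSelect` and `SxmlIndent` and the flags `-h` and `-nk` are copied from the command shown on this page; the helper name `hfd_select_command`, its parameters, and the example path are assumptions, and the trailing `more` pager is left out because it is only useful interactively.

```python
import shlex


def hfd_select_command(explor_area: str, key: str,
                       flux: str = "PubMed", step: str = "Corpus") -> str:
    """Build the HfdSelect | SxmlIndent pipeline for one internal identifier.

    explor_area is the directory that $EXPLOR_AREA points to (assumed here
    to be passed in explicitly); key is the internal identifier, e.g. 001316.
    """
    hfd_path = f"{explor_area}/Data/{flux}/{step}/biblio.hfd"
    select = ["HfdSelect", "-h", hfd_path, "-nk", key]
    # Quote each argument so paths containing spaces survive the shell.
    return " ".join(shlex.quote(part) for part in select) + " | SxmlIndent"


if __name__ == "__main__":
    # Hypothetical location of the exploration area.
    print(hfd_select_command("/data/AustralieFrV1", "001316"))
```

The resulting string can be pasted into a shell or handed to a process runner on a machine where Dilib is installed.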

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Asie
   |area=    AustralieFrV1
   |flux=    PubMed
   |étape=   Corpus
   |type=    RBID
   |clé=     pubmed:28129337
   |texte=   The Potential of Automatic Word Comparison for Historical Linguistics.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i   -Sk "pubmed:28129337" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a AustralieFrV1 

This area was generated with Dilib version V0.6.33.
Data generation: Tue Dec 5 10:43:12 2017. Site generation: Tue Mar 5 14:07:20 2024