The Potential of Automatic Word Comparison for Historical Linguistics.
Internal identifier: 001316 (PubMed/Corpus); previous: 001315; next: 001317
Authors: Johann-Mattis List; Simon J. Greenhill; Russell D. Gray
Source:
- PloS one [ 1932-6203 ] ; 2017.
English descriptors
- KwdEn :
- MESH :
- history : Language, Linguistics.
- statistics & numerical data : Linguistics.
- Algorithms, Cluster Analysis, Databases, Factual, History, Ancient, Humans, Semantics, Vocabulary.
Abstract
The amount of data from languages spoken all over the world is rapidly increasing. Traditional manual methods in historical linguistics need to face the challenges brought by this influx of data. Automatic approaches to word comparison could provide invaluable help to pre-analyze data which can be later enhanced by experts. In this way, computational approaches can take care of the repetitive and schematic tasks leaving experts to concentrate on answering interesting questions. Here we test the potential of automatic methods to detect etymologically related words (cognates) in cross-linguistic data. Using a newly compiled database of expert cognate judgments across five different language families, we compare how well different automatic approaches distinguish related from unrelated words. Our results show that automatic methods can identify cognates with a very high degree of accuracy, reaching 89% for the best-performing method Infomap. We identify the specific strengths and weaknesses of these different methods and point to major challenges for future approaches. Current automatic approaches for cognate detection, although not perfect, could become an important component of future research in historical linguistics.
DOI: 10.1371/journal.pone.0170046
PubMed: 28129337
Links to Exploration step
pubmed:28129337
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">The Potential of Automatic Word Comparison for Historical Linguistics.</title>
<author><name sortKey="List, Johann Mattis" sort="List, Johann Mattis" uniqKey="List J" first="Johann-Mattis" last="List">Johann-Mattis List</name>
<affiliation><nlm:affiliation>Centre des Recherches Linguistiques sur l'Asie Orientale, École des Hautes Études en Sciences Sociales, 2 Rue de Lille, 75007 Paris, France.</nlm:affiliation>
</affiliation>
</author>
<author><name sortKey="Greenhill, Simon J" sort="Greenhill, Simon J" uniqKey="Greenhill S" first="Simon J" last="Greenhill">Simon J. Greenhill</name>
<affiliation><nlm:affiliation>Department for Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Kahlaische Straße 10, 07743, Jena, Germany.</nlm:affiliation>
</affiliation>
</author>
<author><name sortKey="Gray, Russell D" sort="Gray, Russell D" uniqKey="Gray R" first="Russell D" last="Gray">Russell D. Gray</name>
<affiliation><nlm:affiliation>Department for Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Kahlaische Straße 10, 07743, Jena, Germany.</nlm:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2017">2017</date>
<idno type="RBID">pubmed:28129337</idno>
<idno type="pmid">28129337</idno>
<idno type="doi">10.1371/journal.pone.0170046</idno>
<idno type="wicri:Area/PubMed/Corpus">001316</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001316</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">The Potential of Automatic Word Comparison for Historical Linguistics.</title>
<author><name sortKey="List, Johann Mattis" sort="List, Johann Mattis" uniqKey="List J" first="Johann-Mattis" last="List">Johann-Mattis List</name>
<affiliation><nlm:affiliation>Centre des Recherches Linguistiques sur l'Asie Orientale, École des Hautes Études en Sciences Sociales, 2 Rue de Lille, 75007 Paris, France.</nlm:affiliation>
</affiliation>
</author>
<author><name sortKey="Greenhill, Simon J" sort="Greenhill, Simon J" uniqKey="Greenhill S" first="Simon J" last="Greenhill">Simon J. Greenhill</name>
<affiliation><nlm:affiliation>Department for Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Kahlaische Straße 10, 07743, Jena, Germany.</nlm:affiliation>
</affiliation>
</author>
<author><name sortKey="Gray, Russell D" sort="Gray, Russell D" uniqKey="Gray R" first="Russell D" last="Gray">Russell D. Gray</name>
<affiliation><nlm:affiliation>Department for Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Kahlaische Straße 10, 07743, Jena, Germany.</nlm:affiliation>
</affiliation>
</author>
</analytic>
<series><title level="j">PloS one</title>
<idno type="eISSN">1932-6203</idno>
<imprint><date when="2017" type="published">2017</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Algorithms</term>
<term>Cluster Analysis</term>
<term>Databases, Factual</term>
<term>History, Ancient</term>
<term>Humans</term>
<term>Language (history)</term>
<term>Linguistics (history)</term>
<term>Linguistics (statistics &amp; numerical data)</term>
<term>Semantics</term>
<term>Vocabulary</term>
</keywords>
<keywords scheme="MESH" qualifier="history" xml:lang="en"><term>Language</term>
<term>Linguistics</term>
</keywords>
<keywords scheme="MESH" qualifier="statistics &amp; numerical data" xml:lang="en"><term>Linguistics</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Algorithms</term>
<term>Cluster Analysis</term>
<term>Databases, Factual</term>
<term>History, Ancient</term>
<term>Humans</term>
<term>Semantics</term>
<term>Vocabulary</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">The amount of data from languages spoken all over the world is rapidly increasing. Traditional manual methods in historical linguistics need to face the challenges brought by this influx of data. Automatic approaches to word comparison could provide invaluable help to pre-analyze data which can be later enhanced by experts. In this way, computational approaches can take care of the repetitive and schematic tasks leaving experts to concentrate on answering interesting questions. Here we test the potential of automatic methods to detect etymologically related words (cognates) in cross-linguistic data. Using a newly compiled database of expert cognate judgments across five different language families, we compare how well different automatic approaches distinguish related from unrelated words. Our results show that automatic methods can identify cognates with a very high degree of accuracy, reaching 89% for the best-performing method Infomap. We identify the specific strengths and weaknesses of these different methods and point to major challenges for future approaches. Current automatic approaches for cognate detection, although not perfect, could become an important component of future research in historical linguistics.</div>
</front>
</TEI>
<pubmed><MedlineCitation Status="MEDLINE" Owner="NLM"><PMID Version="1">28129337</PMID>
<DateCreated><Year>2017</Year>
<Month>01</Month>
<Day>27</Day>
</DateCreated>
<DateCompleted><Year>2017</Year>
<Month>08</Month>
<Day>10</Day>
</DateCompleted>
<DateRevised><Year>2017</Year>
<Month>08</Month>
<Day>10</Day>
</DateRevised>
<Article PubModel="Electronic-eCollection"><Journal><ISSN IssnType="Electronic">1932-6203</ISSN>
<JournalIssue CitedMedium="Internet"><Volume>12</Volume>
<Issue>1</Issue>
<PubDate><Year>2017</Year>
</PubDate>
</JournalIssue>
<Title>PloS one</Title>
<ISOAbbreviation>PLoS ONE</ISOAbbreviation>
</Journal>
<ArticleTitle>The Potential of Automatic Word Comparison for Historical Linguistics.</ArticleTitle>
<Pagination><MedlinePgn>e0170046</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1371/journal.pone.0170046</ELocationID>
<Abstract><AbstractText>The amount of data from languages spoken all over the world is rapidly increasing. Traditional manual methods in historical linguistics need to face the challenges brought by this influx of data. Automatic approaches to word comparison could provide invaluable help to pre-analyze data which can be later enhanced by experts. In this way, computational approaches can take care of the repetitive and schematic tasks leaving experts to concentrate on answering interesting questions. Here we test the potential of automatic methods to detect etymologically related words (cognates) in cross-linguistic data. Using a newly compiled database of expert cognate judgments across five different language families, we compare how well different automatic approaches distinguish related from unrelated words. Our results show that automatic methods can identify cognates with a very high degree of accuracy, reaching 89% for the best-performing method Infomap. We identify the specific strengths and weaknesses of these different methods and point to major challenges for future approaches. Current automatic approaches for cognate detection, although not perfect, could become an important component of future research in historical linguistics.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y"><Author ValidYN="Y"><LastName>List</LastName>
<ForeName>Johann-Mattis</ForeName>
<Initials>JM</Initials>
<Identifier Source="ORCID">http://orcid.org/0000-0003-2133-8919</Identifier>
<AffiliationInfo><Affiliation>Centre des Recherches Linguistiques sur l'Asie Orientale, École des Hautes Études en Sciences Sociales, 2 Rue de Lille, 75007 Paris, France.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y"><LastName>Greenhill</LastName>
<ForeName>Simon J</ForeName>
<Initials>SJ</Initials>
<AffiliationInfo><Affiliation>Department for Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Kahlaische Straße 10, 07743, Jena, Germany.</Affiliation>
</AffiliationInfo>
<AffiliationInfo><Affiliation>ARC Centre of Excellence for the Dynamics of Language, Australian National University, Canberra, 2600, Australia.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y"><LastName>Gray</LastName>
<ForeName>Russell D</ForeName>
<Initials>RD</Initials>
<AffiliationInfo><Affiliation>Department for Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Kahlaische Straße 10, 07743, Jena, Germany.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList><PublicationType UI="D016456">Historical Article</PublicationType>
<PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic"><Year>2017</Year>
<Month>01</Month>
<Day>27</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo><Country>United States</Country>
<MedlineTA>PLoS One</MedlineTA>
<NlmUniqueID>101285081</NlmUniqueID>
<ISSNLinking>1932-6203</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<CommentsCorrectionsList><CommentsCorrections RefType="Cites"><RefSource>Proc Natl Acad Sci U S A. 2008 Jan 29;105(4):1118-23</RefSource>
<PMID Version="1">18216267</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites"><RefSource>PLoS One. 2015 Oct 27;10(10):e0141563</RefSource>
<PMID Version="1">26506615</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites"><RefSource>Biol Direct. 2016 Aug 20;11:39</RefSource>
<PMID Version="1">27544206</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites"><RefSource>Proc Biol Sci. 2009 Aug 7;276(1668):2703-10</RefSource>
<PMID Version="1">19403539</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites"><RefSource>Science. 2007 Feb 16;315(5814):972-6</RefSource>
<PMID Version="1">17218491</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites"><RefSource>Proc Natl Acad Sci U S A. 2015 Oct 13;112(41):12752-7</RefSource>
<PMID Version="1">26403857</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites"><RefSource>BMC Bioinformatics. 2009 Mar 30;10:99</RefSource>
<PMID Version="1">19331680</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites"><RefSource>Proc Natl Acad Sci U S A. 2016 Mar 29;113(13):3579-84</RefSource>
<PMID Version="1">26976593</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites"><RefSource>Proc Natl Acad Sci U S A. 2002 Jun 11;99(12):7821-6</RefSource>
<PMID Version="1">12060727</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites"><RefSource>Proc Natl Acad Sci U S A. 2013 Mar 12;110(11):4224-9</RefSource>
<PMID Version="1">23401532</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites"><RefSource>Evol Bioinform Online. 2008 Nov 03;4:271-83</RefSource>
<PMID Version="1">19204825</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites"><RefSource>Trends Microbiol. 2016 Mar;24(3):224-37</RefSource>
<PMID Version="1">26774999</PMID>
</CommentsCorrections>
</CommentsCorrectionsList>
<MeshHeadingList><MeshHeading><DescriptorName UI="D000465" MajorTopicYN="N">Algorithms</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D016000" MajorTopicYN="Y">Cluster Analysis</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D016208" MajorTopicYN="N">Databases, Factual</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D049690" MajorTopicYN="N">History, Ancient</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D006801" MajorTopicYN="N">Humans</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D007802" MajorTopicYN="N">Language</DescriptorName>
<QualifierName UI="Q000266" MajorTopicYN="Y">history</QualifierName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D008037" MajorTopicYN="N">Linguistics</DescriptorName>
<QualifierName UI="Q000266" MajorTopicYN="N">history</QualifierName>
<QualifierName UI="Q000706" MajorTopicYN="Y">statistics &amp; numerical data</QualifierName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D012660" MajorTopicYN="N">Semantics</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D014825" MajorTopicYN="N">Vocabulary</DescriptorName>
</MeshHeading>
</MeshHeadingList>
<CoiStatement>The authors have declared that no competing interests exist.</CoiStatement>
</MedlineCitation>
<PubmedData><History><PubMedPubDate PubStatus="received"><Year>2016</Year>
<Month>10</Month>
<Day>18</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted"><Year>2016</Year>
<Month>12</Month>
<Day>28</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez"><Year>2017</Year>
<Month>1</Month>
<Day>28</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed"><Year>2017</Year>
<Month>1</Month>
<Day>28</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline"><Year>2017</Year>
<Month>8</Month>
<Day>11</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>epublish</PublicationStatus>
<ArticleIdList><ArticleId IdType="pubmed">28129337</ArticleId>
<ArticleId IdType="doi">10.1371/journal.pone.0170046</ArticleId>
<ArticleId IdType="pii">PONE-D-16-41494</ArticleId>
<ArticleId IdType="pmc">PMC5271327</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Asie/explor/AustralieFrV1/Data/PubMed/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001316 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 001316 | SxmlIndent | more
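Without a Dilib installation, the PubMed part of the record can also be read with a general-purpose XML parser. The following is a minimal sketch in Python, assuming only the standard library; the `SAMPLE` string is an abridged copy of the `<MedlineCitation>` element shown above, and the function name `summarize` is hypothetical. The full `<record>` uses `nlm:` and `wicri:` prefixes with no visible namespace declarations, so a strict parser may reject it as a whole; the sketch therefore works on the PubMed fragment only.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the <MedlineCitation> element from the record above.
SAMPLE = """<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">28129337</PMID>
<Article PubModel="Electronic-eCollection">
<Journal><Title>PloS one</Title></Journal>
<ArticleTitle>The Potential of Automatic Word Comparison for Historical Linguistics.</ArticleTitle>
<ELocationID EIdType="doi" ValidYN="Y">10.1371/journal.pone.0170046</ELocationID>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y"><LastName>List</LastName><ForeName>Johann-Mattis</ForeName></Author>
<Author ValidYN="Y"><LastName>Greenhill</LastName><ForeName>Simon J</ForeName></Author>
<Author ValidYN="Y"><LastName>Gray</LastName><ForeName>Russell D</ForeName></Author>
</AuthorList>
</Article>
</MedlineCitation>"""


def summarize(xml_text):
    """Return the main bibliographic fields of a MedlineCitation element."""
    root = ET.fromstring(xml_text)
    # Join fore name and last name for every <Author> in document order.
    authors = [
        f"{a.findtext('ForeName')} {a.findtext('LastName')}"
        for a in root.iter("Author")
    ]
    return {
        "pmid": root.findtext("PMID"),
        "title": root.findtext("Article/ArticleTitle"),
        "journal": root.findtext("Article/Journal/Title"),
        # Select the ELocationID whose EIdType attribute is "doi".
        "doi": root.findtext("Article/ELocationID[@EIdType='doi']"),
        "authors": authors,
    }


if __name__ == "__main__":
    for key, value in summarize(SAMPLE).items():
        print(f"{key}: {value}")
```

The same function could be applied to the output of `HfdSelect` once the `<pubmed>` element has been isolated, though that pipeline is not tested here.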
To link to this page within the Wicri network
{{Explor lien |wiki= Wicri/Asie |area= AustralieFrV1 |flux= PubMed |étape= Corpus |type= RBID |clé= pubmed:28129337 |texte= The Potential of Automatic Word Comparison for Historical Linguistics. }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i -Sk "pubmed:28129337" \
| HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd \
| NlmPubMed2Wicri -a AustralieFrV1
This area was generated with Dilib version V0.6.33.