La correction participative de l’OCR par crowdsourcing au profit des bibliothèques numériques
Internal identifier: 000016 (France/Extraction)
Authors: Mathieu Andro [France]; Imad Saleh [France]
Source:
- Bulletin des bibliothèques de France [ 0006-2006 ] ; 2015.
French descriptors
- mix: Correction participative de l'OCR, Crowdsourcing, Numérisation, OCR.
- Wicri:
- topic: Numérisation.
Abstract
For their digitization projects, libraries often produce OCR output containing errors, which can be corrected by service providers employing low-cost labor. Libraries may, however, also turn to web volunteers (explicit crowdsourcing), to a paid crowd (as on the Amazon Mechanical Turk marketplace), to users who correct OCR by playing games (gamification), or to internet users who do not know they are correcting OCR (implicit crowdsourcing, as with reCAPTCHA). The profitability of these experiments is compared.
Url: https://hal.archives-ouvertes.fr/hal-01164263
Links to previous steps (curation, corpus, ...)
- to stream Hal, to step Corpus: 000072
- to stream Hal, to step Curation: 000072
- to stream Hal, to step Checkpoint: 000018
- to stream Main, to step Merge: 000058
- to stream Main, to step Curation: 000028
- to stream Main, to step Exploration: 000028
Links to Exploration step
Hal:hal-01164263
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">La correction participative de l’OCR par crowdsourcing au profit des bibliothèques numériques</title>
<author><name sortKey="Andro, Mathieu" sort="Andro, Mathieu" uniqKey="Andro M" first="Mathieu" last="Andro">Mathieu Andro</name>
<affiliation wicri:level="1"><hal:affiliation type="laboratory" xml:id="struct-92981" status="VALID"><orgName>INRA</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
<listRelation><relation active="#struct-92114" type="direct"></relation>
</listRelation>
<tutelles><tutelle active="#struct-92114" type="direct"><org type="institution" xml:id="struct-92114" status="VALID"><orgName>Institut National de la Recherche Agronomique</orgName>
<orgName type="acronym">INRA</orgName>
<desc><address><country key="FR"></country>
</address>
<ref type="url">http://www.inra.fr</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
<author><name sortKey="Saleh, Imad" sort="Saleh, Imad" uniqKey="Saleh I" first="Imad" last="Saleh">Imad Saleh</name>
<affiliation wicri:level="1"><hal:affiliation type="laboratory" xml:id="struct-39850" status="VALID"><orgName>Laboratoire Paragraphe</orgName>
<desc><address><addrLine>Département Hypermédia - 2 rue de la Liberté - 93526 Saint-Denis cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://paragraphe.info/</ref>
</desc>
<listRelation><relation name="EA349" active="#struct-11141" type="direct"></relation>
</listRelation>
<tutelles><tutelle name="EA349" active="#struct-11141" type="direct"><org type="institution" xml:id="struct-11141" status="VALID"><orgName>Université Paris 8, Vincennes-Saint-Denis</orgName>
<orgName type="acronym">UP8</orgName>
<desc><address><addrLine>2 rue de la Liberté - 93526 Saint-Denis cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-paris8.fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">HAL</idno>
<idno type="RBID">Hal:hal-01164263</idno>
<idno type="halId">hal-01164263</idno>
<idno type="halUri">https://hal.archives-ouvertes.fr/hal-01164263</idno>
<idno type="url">https://hal.archives-ouvertes.fr/hal-01164263</idno>
<date when="2015">2015</date>
<idno type="wicri:Area/Hal/Corpus">000072</idno>
<idno type="wicri:Area/Hal/Curation">000072</idno>
<idno type="wicri:Area/Hal/Checkpoint">000018</idno>
<idno type="wicri:doubleKey">0006-2006:2015:Andro M:la:correction:participative</idno>
<idno type="wicri:Area/Main/Merge">000058</idno>
<idno type="wicri:Area/Main/Curation">000028</idno>
<idno type="wicri:Area/Main/Exploration">000028</idno>
<idno type="wicri:Area/France/Extraction">000016</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">La correction participative de l’OCR par crowdsourcing au profit des bibliothèques numériques</title>
<author><name sortKey="Andro, Mathieu" sort="Andro, Mathieu" uniqKey="Andro M" first="Mathieu" last="Andro">Mathieu Andro</name>
<affiliation wicri:level="1"><hal:affiliation type="laboratory" xml:id="struct-92981" status="VALID"><orgName>INRA</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
<listRelation><relation active="#struct-92114" type="direct"></relation>
</listRelation>
<tutelles><tutelle active="#struct-92114" type="direct"><org type="institution" xml:id="struct-92114" status="VALID"><orgName>Institut National de la Recherche Agronomique</orgName>
<orgName type="acronym">INRA</orgName>
<desc><address><country key="FR"></country>
</address>
<ref type="url">http://www.inra.fr</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
<author><name sortKey="Saleh, Imad" sort="Saleh, Imad" uniqKey="Saleh I" first="Imad" last="Saleh">Imad Saleh</name>
<affiliation wicri:level="1"><hal:affiliation type="laboratory" xml:id="struct-39850" status="VALID"><orgName>Laboratoire Paragraphe</orgName>
<desc><address><addrLine>Département Hypermédia - 2 rue de la Liberté - 93526 Saint-Denis cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://paragraphe.info/</ref>
</desc>
<listRelation><relation name="EA349" active="#struct-11141" type="direct"></relation>
</listRelation>
<tutelles><tutelle name="EA349" active="#struct-11141" type="direct"><org type="institution" xml:id="struct-11141" status="VALID"><orgName>Université Paris 8, Vincennes-Saint-Denis</orgName>
<orgName type="acronym">UP8</orgName>
<desc><address><addrLine>2 rue de la Liberté - 93526 Saint-Denis cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-paris8.fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
</analytic>
<series><title level="j">Bulletin des bibliothèques de France</title>
<idno type="ISSN">0006-2006</idno>
<imprint><date type="datePub">2015</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="mix" xml:lang="fr"><term>Correction participative de l'OCR</term>
<term>Crowdsourcing</term>
<term>Numérisation</term>
<term>OCR</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr"><term>Numérisation</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">For their digitization projects, libraries often produce OCR output containing errors, which can be corrected by service providers employing low-cost labor. Libraries may, however, also turn to web volunteers (explicit crowdsourcing), to a paid crowd (as on the Amazon Mechanical Turk marketplace), to users who correct OCR by playing games (gamification), or to internet users who do not know they are correcting OCR (implicit crowdsourcing, as with reCAPTCHA). The profitability of these experiments is compared.</div>
</front>
</TEI>
<affiliations><list><country><li>France</li>
</country>
</list>
<tree><country name="France"><noRegion><name sortKey="Andro, Mathieu" sort="Andro, Mathieu" uniqKey="Andro M" first="Mathieu" last="Andro">Mathieu Andro</name>
</noRegion>
<name sortKey="Saleh, Imad" sort="Saleh, Imad" uniqKey="Saleh I" first="Imad" last="Saleh">Imad Saleh</name>
</country>
</tree>
</affiliations>
</record>
To handle this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/France/Extraction
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000016 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/France/Extraction/biblio.hfd -nk 000016 | SxmlIndent | more
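For readers without a Dilib installation, the record shown above can also be read with Python's standard library. The following is a minimal sketch, not part of Dilib: it assumes the `<record>` XML has already been obtained as a string (for example, copied from this page), and the namespace URIs bound to the `hal:` and `wicri:` prefixes are placeholders, since the record uses those prefixes without declaring them. The helper names `parse_record` and `summarize` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumption): the Wicri record uses the "hal:"
# and "wicri:" prefixes without declaring them, which a namespace-aware
# parser such as ElementTree rejects with an "unbound prefix" error.
PLACEHOLDER_NS = {
    "hal": "urn:example:hal",
    "wicri": "urn:example:wicri",
}


def parse_record(text: str) -> ET.Element:
    """Parse a Wicri <record> string after binding its undeclared prefixes."""
    decls = " ".join(f'xmlns:{prefix}="{uri}"' for prefix, uri in PLACEHOLDER_NS.items())
    # Insert the declarations into the root <record> start tag only.
    fixed = text.replace("<record>", f"<record {decls}>", 1)
    return ET.fromstring(fixed)


def summarize(record: ET.Element) -> dict:
    """Return the title, author names and HAL identifier from the TEI header."""
    title_stmt = record.find("./TEI/teiHeader/fileDesc/titleStmt")
    if title_stmt is None:
        raise ValueError("no titleStmt found in record")
    # ElementTree supports simple [@attr='value'] predicates in find().
    hal_idno = record.find(".//publicationStmt/idno[@type='halId']")
    return {
        "title": title_stmt.findtext("title"),
        "authors": [name.text for name in title_stmt.findall("./author/name")],
        "halId": hal_idno.text if hal_idno is not None else None,
    }


# A trimmed copy of the record shown above (most elements omitted).
SAMPLE = """<record><TEI><teiHeader><fileDesc><titleStmt>
<title xml:lang="en">La correction participative de l’OCR par crowdsourcing au profit des bibliothèques numériques</title>
<author><name>Mathieu Andro</name>
<affiliation wicri:level="1"><hal:affiliation type="laboratory"/></affiliation>
</author>
<author><name>Imad Saleh</name></author>
</titleStmt>
<publicationStmt><idno type="halId">hal-01164263</idno></publicationStmt>
</fileDesc></teiHeader></TEI></record>"""

if __name__ == "__main__":
    info = summarize(parse_record(SAMPLE))
    print(info["halId"], "|", "; ".join(info["authors"]))
```

Running it prints the HAL identifier followed by the two author names. Binding the prefixes by string substitution keeps the sketch dependency-free; a more robust approach would obtain the real namespace URIs from the Dilib or HAL export instead of using placeholders.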
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= France |étape= Extraction |type= RBID |clé= Hal:hal-01164263 |texte= La correction participative de l’OCR par crowdsourcing au profit des bibliothèques numériques }}
This area was generated with Dilib version V0.6.32.