OCR exploration server

Intelligent Bar Chart Plagiarism Detection in Documents

Internal identifier: 000048 (Pmc/Checkpoint); previous: 000047; next: 000049

Authors: Mohammed Mumtaz Al-Dabbagh [Malaysia, Iraq]; Naomie Salim [Malaysia]; Amjad Rehman [Saudi Arabia]; Mohammed Hazim Alkawaz [Malaysia, Iraq]; Tanzila Saba [Saudi Arabia]; Mznah Al-Rodhaan [Saudi Arabia]; Abdullah Al-Dhelaan [Saudi Arabia]

RBID : PMC:4182899

Abstract

This paper presents a novel approach for mining document features that cannot be extracted via optical character recognition (OCR). By identifying the close relationship between the text and graphical components, the proposed technique extracts the Start, End, and Exact values for each bar. Furthermore, the word 2-gram and Euclidean distance methods are used to accurately detect and determine plagiarism in bar charts.


DOI: 10.1155/2014/612787
PubMed: 25309952
PubMed Central: 4182899


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Intelligent Bar Chart Plagiarism Detection in Documents</title>
<author>
<name sortKey="Al Dabbagh, Mohammed Mumtaz" sort="Al Dabbagh, Mohammed Mumtaz" uniqKey="Al Dabbagh M" first="Mohammed Mumtaz" last="Al-Dabbagh">Mohammed Mumtaz Al-Dabbagh</name>
<affiliation wicri:level="1">
<nlm:aff id="I1">Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia</nlm:aff>
<country xml:lang="fr">Malaisie</country>
<wicri:regionArea>Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor</wicri:regionArea>
<wicri:noRegion>Johor</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="I2">Faculty of Computer Sciences and Mathematics, University of Mosul, Mosul, Iraq</nlm:aff>
<country xml:lang="fr">Iraq</country>
<wicri:regionArea>Faculty of Computer Sciences and Mathematics, University of Mosul, Mosul</wicri:regionArea>
<wicri:noRegion>Mosul</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Salim, Naomie" sort="Salim, Naomie" uniqKey="Salim N" first="Naomie" last="Salim">Naomie Salim</name>
<affiliation wicri:level="1">
<nlm:aff id="I1">Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia</nlm:aff>
<country xml:lang="fr">Malaisie</country>
<wicri:regionArea>Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor</wicri:regionArea>
<wicri:noRegion>Johor</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Rehman, Amjad" sort="Rehman, Amjad" uniqKey="Rehman A" first="Amjad" last="Rehman">Amjad Rehman</name>
<affiliation wicri:level="1">
<nlm:aff id="I3">MIS Department, CBA, Salman Bin Abdulaziz University, Alkharj, Saudi Arabia</nlm:aff>
<country xml:lang="fr">Arabie saoudite</country>
<wicri:regionArea>MIS Department, CBA, Salman Bin Abdulaziz University, Alkharj</wicri:regionArea>
<wicri:noRegion>Alkharj</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Alkawaz, Mohammed Hazim" sort="Alkawaz, Mohammed Hazim" uniqKey="Alkawaz M" first="Mohammed Hazim" last="Alkawaz">Mohammed Hazim Alkawaz</name>
<affiliation wicri:level="1">
<nlm:aff id="I1">Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia</nlm:aff>
<country xml:lang="fr">Malaisie</country>
<wicri:regionArea>Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor</wicri:regionArea>
<wicri:noRegion>Johor</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="I2">Faculty of Computer Sciences and Mathematics, University of Mosul, Mosul, Iraq</nlm:aff>
<country xml:lang="fr">Iraq</country>
<wicri:regionArea>Faculty of Computer Sciences and Mathematics, University of Mosul, Mosul</wicri:regionArea>
<wicri:noRegion>Mosul</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Saba, Tanzila" sort="Saba, Tanzila" uniqKey="Saba T" first="Tanzila" last="Saba">Tanzila Saba</name>
<affiliation wicri:level="1">
<nlm:aff id="I4">College of Computer and Information Sciences (CCIS), Prince Sultan University, Riyadh, Saudi Arabia</nlm:aff>
<country xml:lang="fr">Arabie saoudite</country>
<wicri:regionArea>College of Computer and Information Sciences (CCIS), Prince Sultan University, Riyadh</wicri:regionArea>
<wicri:noRegion>Riyadh</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Al Rodhaan, Mznah" sort="Al Rodhaan, Mznah" uniqKey="Al Rodhaan M" first="Mznah" last="Al-Rodhaan">Mznah Al-Rodhaan</name>
<affiliation wicri:level="1">
<nlm:aff id="I5">Computer Science Department, College of Computer &amp; Information Sciences, King Saud University, Riyadh, Saudi Arabia</nlm:aff>
<country xml:lang="fr">Arabie saoudite</country>
<wicri:regionArea>Computer Science Department, College of Computer &amp; Information Sciences, King Saud University, Riyadh</wicri:regionArea>
<wicri:noRegion>Riyadh</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Al Dhelaan, Abdullah" sort="Al Dhelaan, Abdullah" uniqKey="Al Dhelaan A" first="Abdullah" last="Al-Dhelaan">Abdullah Al-Dhelaan</name>
<affiliation wicri:level="1">
<nlm:aff id="I5">Computer Science Department, College of Computer &amp; Information Sciences, King Saud University, Riyadh, Saudi Arabia</nlm:aff>
<country xml:lang="fr">Arabie saoudite</country>
<wicri:regionArea>Computer Science Department, College of Computer &amp; Information Sciences, King Saud University, Riyadh</wicri:regionArea>
<wicri:noRegion>Riyadh</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">25309952</idno>
<idno type="pmc">4182899</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4182899</idno>
<idno type="RBID">PMC:4182899</idno>
<idno type="doi">10.1155/2014/612787</idno>
<date when="2014">2014</date>
<idno type="wicri:Area/Pmc/Corpus">000192</idno>
<idno type="wicri:Area/Pmc/Curation">000192</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000048</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Intelligent Bar Chart Plagiarism Detection in Documents</title>
<author>
<name sortKey="Al Dabbagh, Mohammed Mumtaz" sort="Al Dabbagh, Mohammed Mumtaz" uniqKey="Al Dabbagh M" first="Mohammed Mumtaz" last="Al-Dabbagh">Mohammed Mumtaz Al-Dabbagh</name>
<affiliation wicri:level="1">
<nlm:aff id="I1">Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia</nlm:aff>
<country xml:lang="fr">Malaisie</country>
<wicri:regionArea>Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor</wicri:regionArea>
<wicri:noRegion>Johor</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="I2">Faculty of Computer Sciences and Mathematics, University of Mosul, Mosul, Iraq</nlm:aff>
<country xml:lang="fr">Iraq</country>
<wicri:regionArea>Faculty of Computer Sciences and Mathematics, University of Mosul, Mosul</wicri:regionArea>
<wicri:noRegion>Mosul</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Salim, Naomie" sort="Salim, Naomie" uniqKey="Salim N" first="Naomie" last="Salim">Naomie Salim</name>
<affiliation wicri:level="1">
<nlm:aff id="I1">Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia</nlm:aff>
<country xml:lang="fr">Malaisie</country>
<wicri:regionArea>Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor</wicri:regionArea>
<wicri:noRegion>Johor</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Rehman, Amjad" sort="Rehman, Amjad" uniqKey="Rehman A" first="Amjad" last="Rehman">Amjad Rehman</name>
<affiliation wicri:level="1">
<nlm:aff id="I3">MIS Department, CBA, Salman Bin Abdulaziz University, Alkharj, Saudi Arabia</nlm:aff>
<country xml:lang="fr">Arabie saoudite</country>
<wicri:regionArea>MIS Department, CBA, Salman Bin Abdulaziz University, Alkharj</wicri:regionArea>
<wicri:noRegion>Alkharj</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Alkawaz, Mohammed Hazim" sort="Alkawaz, Mohammed Hazim" uniqKey="Alkawaz M" first="Mohammed Hazim" last="Alkawaz">Mohammed Hazim Alkawaz</name>
<affiliation wicri:level="1">
<nlm:aff id="I1">Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia</nlm:aff>
<country xml:lang="fr">Malaisie</country>
<wicri:regionArea>Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor</wicri:regionArea>
<wicri:noRegion>Johor</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="I2">Faculty of Computer Sciences and Mathematics, University of Mosul, Mosul, Iraq</nlm:aff>
<country xml:lang="fr">Iraq</country>
<wicri:regionArea>Faculty of Computer Sciences and Mathematics, University of Mosul, Mosul</wicri:regionArea>
<wicri:noRegion>Mosul</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Saba, Tanzila" sort="Saba, Tanzila" uniqKey="Saba T" first="Tanzila" last="Saba">Tanzila Saba</name>
<affiliation wicri:level="1">
<nlm:aff id="I4">College of Computer and Information Sciences (CCIS), Prince Sultan University, Riyadh, Saudi Arabia</nlm:aff>
<country xml:lang="fr">Arabie saoudite</country>
<wicri:regionArea>College of Computer and Information Sciences (CCIS), Prince Sultan University, Riyadh</wicri:regionArea>
<wicri:noRegion>Riyadh</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Al Rodhaan, Mznah" sort="Al Rodhaan, Mznah" uniqKey="Al Rodhaan M" first="Mznah" last="Al-Rodhaan">Mznah Al-Rodhaan</name>
<affiliation wicri:level="1">
<nlm:aff id="I5">Computer Science Department, College of Computer &amp; Information Sciences, King Saud University, Riyadh, Saudi Arabia</nlm:aff>
<country xml:lang="fr">Arabie saoudite</country>
<wicri:regionArea>Computer Science Department, College of Computer &amp; Information Sciences, King Saud University, Riyadh</wicri:regionArea>
<wicri:noRegion>Riyadh</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Al Dhelaan, Abdullah" sort="Al Dhelaan, Abdullah" uniqKey="Al Dhelaan A" first="Abdullah" last="Al-Dhelaan">Abdullah Al-Dhelaan</name>
<affiliation wicri:level="1">
<nlm:aff id="I5">Computer Science Department, College of Computer &amp; Information Sciences, King Saud University, Riyadh, Saudi Arabia</nlm:aff>
<country xml:lang="fr">Arabie saoudite</country>
<wicri:regionArea>Computer Science Department, College of Computer &amp; Information Sciences, King Saud University, Riyadh</wicri:regionArea>
<wicri:noRegion>Riyadh</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j">The Scientific World Journal</title>
<idno type="ISSN">2356-6140</idno>
<idno type="eISSN">1537-744X</idno>
<imprint>
<date when="2014">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>This paper presents a novel approach for mining document features that cannot be extracted via optical character recognition (OCR). By identifying the close relationship between the text and graphical components, the proposed technique extracts the Start, End, and Exact values for each bar. Furthermore, the word 2-gram and Euclidean distance methods are used to accurately detect and determine plagiarism in bar charts.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="El Tahir Ali, Am" uniqKey="El Tahir Ali A">AM El Tahir Ali</name>
</author>
<author>
<name sortKey="Abdulla, Hmd" uniqKey="Abdulla H">HMD Abdulla</name>
</author>
<author>
<name sortKey="Snasel, V" uniqKey="Snasel V">V Snasel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rehman, A" uniqKey="Rehman A">A Rehman</name>
</author>
<author>
<name sortKey="Saba, T" uniqKey="Saba T">T Saba</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rehman, A" uniqKey="Rehman A">A Rehman</name>
</author>
<author>
<name sortKey="Saba, T" uniqKey="Saba T">T Saba</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Alzahrani, Sm" uniqKey="Alzahrani S">SM Alzahrani</name>
</author>
<author>
<name sortKey="Salim, N" uniqKey="Salim N">N Salim</name>
</author>
<author>
<name sortKey="Abraham, A" uniqKey="Abraham A">A Abraham</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Saba, T" uniqKey="Saba T">T Saba</name>
</author>
<author>
<name sortKey="Altameem, A" uniqKey="Altameem A">A Altameem</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ottenstein, Kj" uniqKey="Ottenstein K">KJ Ottenstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Parker, A" uniqKey="Parker A">A Parker</name>
</author>
<author>
<name sortKey="Hamblen, Jo" uniqKey="Hamblen J">JO Hamblen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bao, Jp" uniqKey="Bao J">JP Bao</name>
</author>
<author>
<name sortKey="Shen, Jy" uniqKey="Shen J">JY Shen</name>
</author>
<author>
<name sortKey="Liu, Xd" uniqKey="Liu X">XD Liu</name>
</author>
<author>
<name sortKey="Song, Qb" uniqKey="Song Q">QB Song</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Saba, T" uniqKey="Saba T">T Saba</name>
</author>
<author>
<name sortKey="Rehman, A" uniqKey="Rehman A">A Rehman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huang, W" uniqKey="Huang W">W Huang</name>
</author>
<author>
<name sortKey="Zong, S" uniqKey="Zong S">S Zong</name>
</author>
<author>
<name sortKey="Tan, Cl" uniqKey="Tan C">CL Tan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Saba, T" uniqKey="Saba T">T Saba</name>
</author>
<author>
<name sortKey="Rehman, A" uniqKey="Rehman A">A Rehman</name>
</author>
<author>
<name sortKey="Elarbi Boudihir, M" uniqKey="Elarbi Boudihir M">M Elarbi-Boudihir</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rehman, A" uniqKey="Rehman A">A Rehman</name>
</author>
<author>
<name sortKey="Mohammad, D" uniqKey="Mohammad D">D Mohammad</name>
</author>
<author>
<name sortKey="Sulong, G" uniqKey="Sulong G">G Sulong</name>
</author>
<author>
<name sortKey="Saba, T" uniqKey="Saba T">T Saba</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rehman, A" uniqKey="Rehman A">A Rehman</name>
</author>
<author>
<name sortKey="Saba, T" uniqKey="Saba T">T Saba</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rehman, A" uniqKey="Rehman A">A Rehman</name>
</author>
<author>
<name sortKey="Saba, T" uniqKey="Saba T">T Saba</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Saba, T" uniqKey="Saba T">T Saba</name>
</author>
<author>
<name sortKey="Rehman, A" uniqKey="Rehman A">A Rehman</name>
</author>
<author>
<name sortKey="Sulong, G" uniqKey="Sulong G">G Sulong</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Helmy, T" uniqKey="Helmy T">T Helmy</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Savva, M" uniqKey="Savva M">M Savva</name>
</author>
<author>
<name sortKey="Kong, N" uniqKey="Kong N">N Kong</name>
</author>
<author>
<name sortKey="Chhajta, A" uniqKey="Chhajta A">A Chhajta</name>
</author>
<author>
<name sortKey="Li, F F" uniqKey="Li F">F-F Li</name>
</author>
<author>
<name sortKey="Agrawala, M" uniqKey="Agrawala M">M Agrawala</name>
</author>
<author>
<name sortKey="Heer, J" uniqKey="Heer J">J Heer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Elzer, S" uniqKey="Elzer S">S Elzer</name>
</author>
<author>
<name sortKey="Carberry, S" uniqKey="Carberry S">S Carberry</name>
</author>
<author>
<name sortKey="Zukerman, I" uniqKey="Zukerman I">I Zukerman</name>
</author>
<author>
<name sortKey="Chester, D" uniqKey="Chester D">D Chester</name>
</author>
<author>
<name sortKey="Green, N" uniqKey="Green N">N Green</name>
</author>
<author>
<name sortKey="Demir, S" uniqKey="Demir S">S Demir</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Carberry, S" uniqKey="Carberry S">S Carberry</name>
</author>
<author>
<name sortKey="Elzer, S" uniqKey="Elzer S">S Elzer</name>
</author>
<author>
<name sortKey="Green, N" uniqKey="Green N">N Green</name>
</author>
<author>
<name sortKey="Mccoy, K" uniqKey="Mccoy K">K McCoy</name>
</author>
<author>
<name sortKey="Chester, D" uniqKey="Chester D">D Chester</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rahim, Msm" uniqKey="Rahim M">MSM Rahim</name>
</author>
<author>
<name sortKey="Rehman, A" uniqKey="Rehman A">A Rehman</name>
</author>
<author>
<name sortKey="Ni Matus, S" uniqKey="Ni Matus S">S Ni'matus</name>
</author>
<author>
<name sortKey="Kurniawan, F" uniqKey="Kurniawan F">F Kurniawan</name>
</author>
<author>
<name sortKey="Saba, T" uniqKey="Saba T">T Saba</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yokokura, N" uniqKey="Yokokura N">N Yokokura</name>
</author>
<author>
<name sortKey="Watanabe, T" uniqKey="Watanabe T">T Watanabe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhou, Y" uniqKey="Zhou Y">Y Zhou</name>
</author>
<author>
<name sortKey="Tan, Cl" uniqKey="Tan C">CL Tan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhou, Yp" uniqKey="Zhou Y">YP Zhou</name>
</author>
<author>
<name sortKey="Tan, Cl" uniqKey="Tan C">CL Tan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huang, W" uniqKey="Huang W">W Huang</name>
</author>
<author>
<name sortKey="Tan, Cl" uniqKey="Tan C">CL Tan</name>
</author>
<author>
<name sortKey="Leow, Wk" uniqKey="Leow W">WK Leow</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mishchenko, A" uniqKey="Mishchenko A">A Mishchenko</name>
</author>
<author>
<name sortKey="Vassilieva, N" uniqKey="Vassilieva N">N Vassilieva</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huang, W" uniqKey="Huang W">W Huang</name>
</author>
<author>
<name sortKey="Tan, Cl" uniqKey="Tan C">CL Tan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hassan, Mm" uniqKey="Hassan M">MM Hassan</name>
</author>
<author>
<name sortKey="Al Khatib, W" uniqKey="Al Khatib W">W Al Khatib</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yang, L" uniqKey="Yang L">L Yang</name>
</author>
<author>
<name sortKey="Huang, W" uniqKey="Huang W">W Huang</name>
</author>
<author>
<name sortKey="Tan, Cl" uniqKey="Tan C">CL Tan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tourassi, Gd" uniqKey="Tourassi G">GD Tourassi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chandy, Da" uniqKey="Chandy D">DA Chandy</name>
</author>
<author>
<name sortKey="Johnson, Js" uniqKey="Johnson J">JS Johnson</name>
</author>
<author>
<name sortKey="Selvan, Se" uniqKey="Selvan S">SE Selvan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Han, F" uniqKey="Han F">F Han</name>
</author>
<author>
<name sortKey="Wang, H" uniqKey="Wang H">H Wang</name>
</author>
<author>
<name sortKey="Song, B" uniqKey="Song B">B Song</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Han, F" uniqKey="Han F">F Han</name>
</author>
<author>
<name sortKey="Wang, H" uniqKey="Wang H">H Wang</name>
</author>
<author>
<name sortKey="Song, B" uniqKey="Song B">B Song</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Eissen, Sm" uniqKey="Eissen S">SM Eissen</name>
</author>
<author>
<name sortKey="Stein, B" uniqKey="Stein B">B Stein</name>
</author>
<author>
<name sortKey="Kulig, M" uniqKey="Kulig M">M Kulig</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kasprzak, J" uniqKey="Kasprzak J">J Kasprzak</name>
</author>
<author>
<name sortKey="Brandejs, M" uniqKey="Brandejs M">M Brandejs</name>
</author>
<author>
<name sortKey="K Ipa, M" uniqKey="K Ipa M">M Křipač</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Basile, C" uniqKey="Basile C">C Basile</name>
</author>
<author>
<name sortKey="Benedetto, D" uniqKey="Benedetto D">D Benedetto</name>
</author>
<author>
<name sortKey="Caglioti, E" uniqKey="Caglioti E">E Caglioti</name>
</author>
<author>
<name sortKey="Cristadoro, G" uniqKey="Cristadoro G">G Cristadoro</name>
</author>
<author>
<name sortKey="Esposti, Md" uniqKey="Esposti M">MD Esposti</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hamed, Za" uniqKey="Hamed Z">ZA Hamed</name>
</author>
<author>
<name sortKey="Mohd Hashim, Sz" uniqKey="Mohd Hashim S">SZ Mohd Hashim</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">ScientificWorldJournal</journal-id>
<journal-id journal-id-type="iso-abbrev">ScientificWorldJournal</journal-id>
<journal-id journal-id-type="publisher-id">TSWJ</journal-id>
<journal-title-group>
<journal-title>The Scientific World Journal</journal-title>
</journal-title-group>
<issn pub-type="ppub">2356-6140</issn>
<issn pub-type="epub">1537-744X</issn>
<publisher>
<publisher-name>Hindawi Publishing Corporation</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">25309952</article-id>
<article-id pub-id-type="pmc">4182899</article-id>
<article-id pub-id-type="doi">10.1155/2014/612787</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Intelligent Bar Chart Plagiarism Detection in Documents</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Al-Dabbagh</surname>
<given-names>Mohammed Mumtaz</given-names>
</name>
<xref ref-type="aff" rid="I1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="I2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Salim</surname>
<given-names>Naomie</given-names>
</name>
<xref ref-type="aff" rid="I1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">http://orcid.org/0000-0002-3817-2655</contrib-id>
<name>
<surname>Rehman</surname>
<given-names>Amjad</given-names>
</name>
<xref ref-type="aff" rid="I3">
<sup>3</sup>
</xref>
<xref ref-type="corresp" rid="cor1">*</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Alkawaz</surname>
<given-names>Mohammed Hazim</given-names>
</name>
<xref ref-type="aff" rid="I1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="I2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Saba</surname>
<given-names>Tanzila</given-names>
</name>
<xref ref-type="aff" rid="I4">
<sup>4</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Al-Rodhaan</surname>
<given-names>Mznah</given-names>
</name>
<xref ref-type="aff" rid="I5">
<sup>5</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Al-Dhelaan</surname>
<given-names>Abdullah</given-names>
</name>
<xref ref-type="aff" rid="I5">
<sup>5</sup>
</xref>
</contrib>
</contrib-group>
<aff id="I1">
<sup>1</sup>
Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia</aff>
<aff id="I2">
<sup>2</sup>
Faculty of Computer Sciences and Mathematics, University of Mosul, Mosul, Iraq</aff>
<aff id="I3">
<sup>3</sup>
MIS Department, CBA, Salman Bin Abdulaziz University, Alkharj, Saudi Arabia</aff>
<aff id="I4">
<sup>4</sup>
College of Computer and Information Sciences (CCIS), Prince Sultan University, Riyadh, Saudi Arabia</aff>
<aff id="I5">
<sup>5</sup>
Computer Science Department, College of Computer &amp; Information Sciences, King Saud University, Riyadh, Saudi Arabia</aff>
<author-notes>
<corresp id="cor1">*Amjad Rehman:
<email>ar.khan@sau.edu.sa</email>
</corresp>
<fn fn-type="other">
<p>Academic Editor: Iftikhar Ahmad</p>
</fn>
</author-notes>
<pub-date pub-type="ppub">
<year>2014</year>
</pub-date>
<pub-date pub-type="epub">
<day>17</day>
<month>9</month>
<year>2014</year>
</pub-date>
<volume>2014</volume>
<elocation-id>612787</elocation-id>
<history>
<date date-type="received">
<day>30</day>
<month>3</month>
<year>2014</year>
</date>
<date date-type="rev-recd">
<day>21</day>
<month>6</month>
<year>2014</year>
</date>
<date date-type="accepted">
<day>7</day>
<month>7</month>
<year>2014</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright © 2014 Mohammed Mumtaz Al-Dabbagh et al.</copyright-statement>
<copyright-year>2014</copyright-year>
<license license-type="open-access">
<license-p>This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<abstract>
<p>This paper presents a novel approach for mining document features that cannot be extracted via optical character recognition (OCR). By identifying the close relationship between the text and graphical components, the proposed technique extracts the Start, End, and Exact values for each bar. Furthermore, the word 2-gram and Euclidean distance methods are used to accurately detect and determine plagiarism in bar charts.</p>
</abstract>
</article-meta>
</front>
<body>
<sec id="sec1">
<title>1. Introduction</title>
<p>Detecting, assessing, and correcting plagiarism remain open problems in every sphere of documentation and copyright. The advance of information technology, represented by digital libraries and the World Wide Web, is regarded as one of the main reasons for the rapid growth of plagiarism. Because most resources are readily available in digital format, it has become effortless for a plagiarist to use or copy the work of others without acknowledging or citing them. Thus, plagiarism is regarded as an electronic crime and an intellectual theft from others' documents [
<xref rid="B1" ref-type="bibr">1</xref>
–
<xref rid="B28" ref-type="bibr">3</xref>
]. In academia, plagiarism poses a severe challenge for research institutions, universities, and even schools. Considerable effort has been dedicated to detecting different types of plagiarism in programming code and text. Plagiarism detection began in the 1970s with methods for measuring the rate of plagiarism in programs written in languages such as C and Pascal [
<xref rid="B2" ref-type="bibr">4</xref>
]. Digital documents, being the major carriers of information, require rigorous authentication of their origin and trustworthiness. The search for an accurate and efficient image forgery detection method for digital documents is ongoing. The key issue is developing a robust plagiarism detector that overcomes the limitations associated with human intervention [
<xref rid="B29" ref-type="bibr">5</xref>
].</p>
<p>Several researchers have developed algorithmic, program-based approaches to detect plagiarism in students' homework [
<xref rid="B3" ref-type="bibr">6</xref>
]. Some studies introduced detection methods, implemented as algorithms and program code, that are based on levels of plagiarism patterns [
<xref rid="B4" ref-type="bibr">7</xref>
]. Computerized and statistical approaches have been used to detect plagiarism in natural language since the 1990s. The techniques used for natural language are grammar-based, semantics-based, or grammar-semantics hybrids [
<xref rid="B1" ref-type="bibr">1</xref>
,
<xref rid="B2" ref-type="bibr">4</xref>
]. The grammar-based method is among the more restrictive: it analyzes sentences by their grammatical structure, which makes it efficient at detecting an Exact Copy of text. The semantics-based method, in contrast, uses a vector space model to calculate the similarity between texts. The grammar-semantics hybrid approach is reported to overcome the disadvantages of both and is considered one of the most versatile techniques for detecting text plagiarism [
<xref rid="B1" ref-type="bibr">1</xref>
,
<xref rid="B5" ref-type="bibr">8</xref>
,
<xref rid="B30" ref-type="bibr">9</xref>
].</p>
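<p>The vector space comparison used by semantics-based methods can be illustrated with a minimal Python sketch. It is not taken from the cited works: the whitespace tokenizer, the raw term-frequency weighting, and the function names <monospace>term_vector</monospace> and <monospace>cosine_similarity</monospace> are assumptions chosen for illustration only.</p>
<preformat>
```python
import math
from collections import Counter


def term_vector(text):
    """Map a text to a bag-of-words term-frequency vector (assumed weighting)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    shared = set(a).intersection(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


source = term_vector("plagiarism detection in bar chart images")
exact_copy = term_vector("plagiarism detection in bar chart images")
other = term_vector("hough transform for line extraction")

print(cosine_similarity(source, exact_copy))
print(cosine_similarity(source, other))
```
</preformat>
<p>Texts with identical wording score close to 1 and texts with no shared terms score 0; a practical detector would likely add stop-word handling and term weighting, which this sketch omits.</p>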
<p>A taxonomy has been introduced to explain the various types and patterns of text plagiarism [
<xref rid="B2" ref-type="bibr">4</xref>
]. Plagiarism is divided into two main classes, literal and intelligent, each consisting of several subclasses that together cover the possible forms of text plagiarism. Quantitative information is generally presented graphically through figures, charts, and tables, which display experimental results, frameworks, and statistical facts. Such data can be rendered in various forms, such as pie charts, bar charts, and 2D and 3D plots [
<xref rid="B6" ref-type="bibr">10</xref>
–
<xref rid="B32" ref-type="bibr">12</xref>
].</p>
<p>We report a new type of plagiarism detection method by highlighting the types of information that can be stolen from others' work without referencing. Firstly, different types of forged information are organized into a taxonomy of charts, figures, and tables to highlight varieties of plagiarism patterns such as Exact and Modified Copy. Secondly, plagiarism detection in bar chart images is performed using ten image features. Some of the features are extracted by an OCR tool while others are acquired from the relationship between text and graphic components [
<xref rid="B33" ref-type="bibr">13</xref>
,
<xref rid="B34" ref-type="bibr">14</xref>
]. Finally, the proposed technique extracts the features of bar chart images that OCR cannot extract and uses them to detect plagiarism. The paper is organized as follows:
<xref ref-type="sec" rid="sec2">Section 2</xref>
describes various existing techniques for extracting data from bar chart images [
<xref rid="B35" ref-type="bibr">15</xref>
]. The taxonomy of chart, figure, and table related to plagiarism is presented in
<xref ref-type="sec" rid="sec3">Section 3</xref>
.
<xref ref-type="sec" rid="sec4"> Section 4</xref>
discusses the methodology and
<xref ref-type="sec" rid="sec5">Section 5</xref>
presents the experimental results of bar chart plagiarism detection. The results are discussed in
<xref ref-type="sec" rid="sec6">Section 6</xref>
.
<xref ref-type="sec" rid="sec7"> Section 7</xref>
concludes the paper.</p>
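<p>The abstract names word 2-gram and Euclidean distance as the measures used to compare bar charts. As a minimal Python sketch of how such measures could be computed, and not the authors' implementation, the example below assumes a Jaccard overlap over word 2-gram sets for chart text and a plain Euclidean distance over equal-length lists of bar values; the function names and sample data are hypothetical.</p>
<preformat>
```python
import math


def word_bigrams(text):
    """Return the set of adjacent word pairs (word 2-grams) in a text."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def bigram_overlap(a, b):
    """Jaccard overlap of two texts' word 2-gram sets (an assumed measure)."""
    grams_a, grams_b = word_bigrams(a), word_bigrams(b)
    union = grams_a.union(grams_b)
    if not union:
        return 0.0
    return len(grams_a.intersection(grams_b)) / len(union)


def euclidean_distance(xs, ys):
    """Euclidean distance between two equal-length lists of bar values."""
    if len(xs) != len(ys):
        raise ValueError("bar value lists must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)))


# Hypothetical chart titles and bar values, for illustration only.
title_a = "annual sales by region"
title_b = "annual sales by country"
bars_a = [10.0, 20.0, 30.0]
bars_b = [10.0, 20.0, 34.0]

overlap = bigram_overlap(title_a, title_b)
distance = euclidean_distance(bars_a, bars_b)
print(overlap, distance)
```
</preformat>
<p>A high 2-gram overlap combined with a small distance between bar values would suggest a copied or lightly modified chart; any decision thresholds would have to be chosen empirically.</p>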
</sec>
<sec id="sec2">
<title>2. Related Work</title>
<p>Categorization of bar chart images refers to labeling them with one of a set of predefined geometrical or nongeometrical classes. Although the classification appears manageable, it has proven to be an extremely difficult computational problem. Hence, there is intense interest in developing automatic tools to categorize, describe, or retrieve images based on their contents.</p>
<p>Consequently, researchers attempted to extract the features and data from chart images. For automatic images categorization and description computational model is successfully introduced [
<xref rid="B20" ref-type="bibr">16</xref>
]. The model analyzes local and global image characteristics using text and image features and is capable of differentiating geometrical and ordinary images. It comprises a classifier stage, which is trained on the associated text features using advanced concepts, and a similarity matching stage.</p>
<p>Classification methods based on multiple-instance learning for chart images are also developed [
<xref rid="B6" ref-type="bibr">10</xref>
]. The ReVision system, which consists of three consecutive major stages (classification, extraction, and redesign of chart images), has also been reported [
<xref rid="B7" ref-type="bibr">17</xref>
]. In the extraction stage, two types of charts (pie and bar) are considered, and techniques are presented for extracting data and graphical marks from chart images. Indeed, understanding and recognizing chart images requires preprocessing and extraction of data and information. The available methods fall into two groups: those that work on the electronic chart directly [
<xref rid="B8" ref-type="bibr">18</xref>
–
<xref rid="B36" ref-type="bibr">20</xref>
] and those that work on the chart after conversion into a raster image [
<xref rid="B10" ref-type="bibr">21</xref>
–
<xref rid="B13" ref-type="bibr">24</xref>
]. Mishchenko and Vassilieva [
<xref rid="B21" ref-type="bibr">25</xref>
] introduced a model-based method for the classification of chart images that involves two main stages: first, predicting the location and size of the chart from the color distribution of the chart image; second, extracting and matching chart image edges to achieve the best match between query and database images.</p>
<p>Feature extraction techniques depend on the type of image, such as chart or medical images. Some techniques are applicable to two-dimensional plots, while others work well for bar chart images. The Hough transform has been introduced as an approach to extract the features of bar chart images [
<xref rid="B12" ref-type="bibr">23</xref>
]. Some investigations are based on the edges of bars to extract the features [
<xref rid="B13" ref-type="bibr">24</xref>
,
<xref rid="B14" ref-type="bibr">26</xref>
]. A learning-based method has been established to recognize chart images [
<xref rid="B11" ref-type="bibr">22</xref>
]. The features of bar chart images can be extracted by describing the height and width of each bar, which is applied on statistical images to determine the similarities [
<xref rid="B15" ref-type="bibr">27</xref>
]. Meanwhile, other techniques focused on geometric features rather than data and information of scientific bar chart images [
<xref rid="B16" ref-type="bibr">28</xref>
].</p>
<p>Currently, several techniques are developed to extract the features from medical images. The texture is one of the visual contents of a medical image used in content-based image retrieval (CBIR) to represent the image effectively for searching and recovering similar areas [
<xref rid="B22" ref-type="bibr">29</xref>
]. Gray-level statistical matrix technique is applied to extract the texture information for the content-based retrieval of mammograms from the MIAS database [
<xref rid="B24" ref-type="bibr">30</xref>
]. A 3D texture feature technique, based on the co-occurrence matrices of the gray-level, gradient, and curvature information of the nodule volume data, has been introduced to distinguish malignant from benign nodules [
<xref rid="B25" ref-type="bibr">31</xref>
,
<xref rid="B26" ref-type="bibr">32</xref>
].</p>
</sec>
<sec id="sec3">
<title>3. Plagiarism Taxonomy and Patterns</title>
<p>Three types of graphic plagiarism are important to emphasize: figure, chart, and table. Each type exhibits different levels of plagiarism. The patterns and types of plagiarism for figures, charts, and tables are presented as a taxonomy. Some kinds of text plagiarism have also been evaluated [
<xref rid="B17" ref-type="bibr">33</xref>
]. Methods for detecting plagiarized passages of text in documents without appropriate citations have also been suggested. The taxonomy has been further extended to cover other types of plagiarism [
<xref rid="B2" ref-type="bibr">4</xref>
]. The taxonomies presented in various studies mainly cover literal and intelligent plagiarism, where each kind includes many patterns. Here, however, we are interested in detecting plagiarism of charts and in the taxonomy of charts, figures, and tables. Charts (pie, bar, and line) are one of the ways to represent the data and information of experimental results or comparisons among techniques, and they can be copied from other references without citation. Plagiarism of charts can therefore take several forms that present the same information in various shapes. The taxonomy of chart plagiarism shows the many patterns and models that may be used to plagiarize the data of a chart image.</p>
<p>Plagiarism patterns of chart, figure, and table are divided into Exact Copy and Modified Copy prototypes. The Exact Copy patterns are defined as the direct quoting of data from other works without referencing, where the whole or part of the information image is copied and pasted. Simplicity is one of the important attributes of this type of plagiarism, and it does not require much time to hide the academic crime. The other type of graphic plagiarism is the Modified Copy of the information in a chart, figure, or table. It is performed more intelligently than the previous one because the same data can be formulated in many ways to present the work in a different style from the original. With these means, the plagiarist attempts to deceive readers by making changes such as translating from another language or generating another shape for the same data.</p>
<p>The Modified Copy plagiarisms are primarily divided into translation and restructuring. In this research, new types of copying are organized in a taxonomy that explains various patterns of graphic plagiarism. Furthermore, the primary focus is on bar chart images, for which the proportion of plagiarism is detected. Figures
<xref ref-type="fig" rid="fig1">1</xref>
,
<xref ref-type="fig" rid="fig2">2</xref>
, and
<xref ref-type="fig" rid="fig3">3</xref>
depict the taxonomy of chart, figure, and table plagiarism, respectively.</p>
</sec>
<sec id="sec4">
<title>4. Methodology</title>
<p>The methodology of bar chart plagiarism detection as shown in
<xref ref-type="fig" rid="fig4">Figure 4</xref>
consists of three main stages, namely, planning and collection, feature extraction, and development with system evaluation. In the planning and collection stage, various patterns of graphical plagiarism via taxonomy of chart, figures, and table are presented (Figures
<xref ref-type="fig" rid="fig1">1</xref>
,
<xref ref-type="fig" rid="fig2">2</xref>
, and
<xref ref-type="fig" rid="fig3">3</xref>
). The taxonomy of chart plagiarism explains the different formulations that can be used to plagiarize the data of bar chart images. Accordingly, a variety of bar chart images is collected into data sets. The data sets consist of 100 bar chart images stored in the database and twenty query images covering all the possibilities of plagiarism for bar chart images. They are collected from different sources such as theses and include various types of bar chart images in 2D and 3D. Both vertical and horizontal bar chart images, as shown in
<xref ref-type="fig" rid="fig5">Figure 5</xref>
are taken into account.</p>
<p>In the feature extraction stage, the features of bar chart images are used to detect plagiarism. Various types of bar chart images are analyzed to identify these features. A bar chart image is found to yield a maximum of ten features representing the information and data of the image. These features are common to different types of bar chart images, for instance, 2D and 3D images. However, the number of times each feature is used may differ from one image to another.</p>
<p>Feature extraction is an essential process for obtaining the data from images, which can then be used to detect the rate of plagiarized data. The ten features are categorized into low-level and high-level features. The low-level features are the text features of the bar chart, that is, the text used in the image to represent information and data, such as the caption of the image, the label of each bar, the labels of the coordinates, and the values on the coordinates. Generally, they are extracted from bar chart images using an OCR tool. Conversely, the high-level features are numeric features, which cannot be extracted using an OCR tool. Their extraction requires a relationship between the text and graphic components. The numeric features are the values of the bars in the image. Each bar has three numeric features, which are extracted by the proposed technique: the Start, End, and Exact values. The Start and End values represent the first and last values, while the Exact value corresponds to the real value of the bar. For instance, the text features for the image in
<xref ref-type="fig" rid="fig5">Figure 5</xref>
are
<italic> “Figure 10: Revision Design Galleries,” “Silver, Platinum,…, Cash,”</italic>
and
<italic> “0%, 5%,…, 40%”</italic>
which represent caption of image and label of coordinates
<italic> X</italic>
and
<italic> Y</italic>
, respectively. Meanwhile, the numeric features represent the value of each bar, which can be detected by Start, End, and Exact features. For example, the Start, End, and Exact features of bar Silver are 5%, 10%, and 6%, respectively.</p>
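<p>The split between OCR-derived text features and the numeric features of each bar can be pictured as a small record type. The following Python fragment is only a minimal sketch under stated assumptions and is not the authors' implementation: the class and field names (BarFeatures, ChartFeatures, axis_values) are hypothetical, and only the values quoted in the paragraph above are filled in.</p>
<preformat>
```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BarFeatures:
    """Features of a single bar (hypothetical layout)."""
    label: str    # text feature, assumed to come from OCR
    start: float  # numeric feature: Start value
    end: float    # numeric feature: End value
    exact: float  # numeric feature: Exact value

    def as_vector(self) -> List[float]:
        # Numeric features are kept as a vector for distance computations.
        return [self.start, self.end, self.exact]


@dataclass
class ChartFeatures:
    """Text features of the chart plus the per-bar features."""
    caption: str
    axis_values: List[str]
    bars: List[BarFeatures] = field(default_factory=list)


# Values for the bar "Silver" as quoted in the text (5%, 10%, 6%).
silver = BarFeatures(label="Silver", start=5.0, end=10.0, exact=6.0)
chart = ChartFeatures(
    caption="Figure 10: Revision Design Galleries",
    # Only the axis values quoted in the text; the elided ones are left out.
    axis_values=["0%", "5%", "40%"],
    bars=[silver],
)
```
</preformat>
<p>Keeping the text features as strings and the numeric features as a vector mirrors the distinction drawn in the text between the two kinds of features.</p>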
<p>These features are used to detect the proportion of plagiarism for a bar chart image. The extraction of the Start, End, and Exact values requires preprocessing of the bar chart image to locate the adjacent coordinates of the image. The image is then scanned to detect the length of each bar in order to find its numeric features. How the features are stored in the database depends on whether they are numeric or text. The features extracted by the proposed technique are represented as vectors, while the text features extracted by OCR are represented as strings.</p>
<p>The detection methods for text plagiarism are mainly categorized as character-based, semantic-based, structure-based, citation-based, cluster-based, cross-language, and syntax-based. Because a bar chart image contains far fewer textual components than a normal paragraph of text, character-based methods can be used to detect plagiarism of bar chart images. These methods rely on character matching to detect strings that are exactly or partially identical among the features of bar chart images. Various text plagiarism detection algorithms adopt the character
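<p>One way to read the step from bar length to numeric features is as a linear interpolation between the two axis values that bracket the end of the bar. The Python sketch below rests on that reading, which is our assumption based on the Silver example (Start 5%, End 10%, Exact 6%) and is not confirmed by the source; the function name, the pixel offsets, and the requirement of a linear axis with strictly ascending ticks are all hypothetical.</p>
<preformat>
```python
from bisect import bisect_right
from typing import Sequence, Tuple


def bar_numeric_features(
    bar_length_px: float,
    tick_offsets_px: Sequence[float],
    tick_values: Sequence[float],
) -> Tuple[float, float, float]:
    """Return (start, end, exact) for one bar on a linear axis.

    tick_offsets_px: pixel distance of each axis tick from the bar
    baseline, strictly ascending. tick_values: value printed at each tick.
    """
    if len(tick_offsets_px) != len(tick_values) or 2 > len(tick_values):
        raise ValueError("need at least two ticks with matching values")
    # Index of the first tick lying beyond the end of the bar.
    hi = bisect_right(tick_offsets_px, bar_length_px)
    # Clamp so that bars ending before the first or after the last tick
    # still use a valid pair of neighbouring ticks.
    hi = min(max(hi, 1), len(tick_values) - 1)
    lo = hi - 1
    span_px = tick_offsets_px[hi] - tick_offsets_px[lo]
    fraction = (bar_length_px - tick_offsets_px[lo]) / span_px
    exact = tick_values[lo] + fraction * (tick_values[hi] - tick_values[lo])
    return tick_values[lo], tick_values[hi], exact


# Hypothetical layout: ticks 0%, 5%, 10% drawn 50 px apart, bar 60 px long.
start, end, exact = bar_numeric_features(
    60.0, [0.0, 50.0, 100.0], [0.0, 5.0, 10.0]
)
```
</preformat>
<p>With these made-up pixel positions the bar falls between the 5% and 10% ticks, which reproduces the Start and End values quoted for the bar Silver.</p>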
<italic>n</italic>
-gram to identify the similarity between two strings based on the number of identical characters of features. Some researchers use 8-gram and 5-gram techniques [
<xref rid="B18" ref-type="bibr">34</xref>
,
<xref rid="B19" ref-type="bibr">35</xref>
] for matching strings to detect plagiarism. We use the 2-gram technique, applied to the text features, to detect plagiarism of bar chart images. For the numeric features, different similarity measures can be used, such as Euclidean distance, Jaccard, or cosine coefficient. The Euclidean distance is calculated by the following:
<disp-formula id="EEq1">
<label>(1)</label>
<mml:math id="M1">
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mtext>Ec</mml:mtext>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:msqrt>
<mml:mrow>
<mml:msub>
<mml:mstyle displaystyle="true">
<mml:mo stretchy="false">∑</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>|</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mn mathvariant="normal">2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mrow>
</mml:msqrt>
<mml:mo>.</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
</p>
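<p>The two comparisons described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' code: the function names are hypothetical, word-level 2-grams are used because the later sections refer to word 2-gram, and scoring the 2-gram overlap with a Jaccard ratio is our assumption, since the source does not state how the overlap is turned into a score. The distance function follows equation (1).</p>
<preformat>
```python
import math
from typing import Sequence, Set, Tuple


def word_bigrams(text: str) -> Set[Tuple[str, str]]:
    """Set of adjacent word pairs (word 2-grams), case-insensitive."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def bigram_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two 2-gram sets (assumed scoring rule)."""
    grams_a, grams_b = word_bigrams(a), word_bigrams(b)
    union = grams_a.union(grams_b)
    if not union:
        return 0.0
    return len(grams_a.intersection(grams_b)) / len(union)


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Equation (1): square root of the summed squared differences."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
```
</preformat>
<p>A similarity of 1.0 on the text features together with a distance of 0.0 on the numeric vectors would correspond to an exact copy under these assumptions.</p>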
<p>Once the proportion of plagiarism has been detected and stored, the performance of the system is evaluated. The evaluation is based on the overlap of features, using Precision and Recall as given by the following:
<disp-formula id="EEq2">
<label>(2)</label>
<mml:math id="M2">
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mtext>Recall</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mi>  </mml:mi>
<mml:mfrac>
<mml:mrow>
<mml:mtext>Relevant</mml:mtext>
<mml:mi>  </mml:mi>
<mml:mtext>Documents</mml:mtext>
<mml:mi>  </mml:mi>
<mml:mtext>Retrieved</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mtext>All</mml:mtext>
<mml:mi>  </mml:mi>
<mml:mtext>Relevant</mml:mtext>
<mml:mi>  </mml:mi>
<mml:mtext>Documents</mml:mtext>
</mml:mrow>
</mml:mfrac>
<mml:mo>,</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mtext>Precision</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mi>  </mml:mi>
<mml:mfrac>
<mml:mrow>
<mml:mtext>Relevant</mml:mtext>
<mml:mi>  </mml:mi>
<mml:mtext>Documents</mml:mtext>
<mml:mi>  </mml:mi>
<mml:mtext>Retrieved</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mtext>All</mml:mtext>
<mml:mi>  </mml:mi>
<mml:mtext>Documents</mml:mtext>
<mml:mi>  </mml:mi>
<mml:mtext>Retrieved</mml:mtext>
</mml:mrow>
</mml:mfrac>
<mml:mo>.</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
</p>
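<p>Equation (2) translates directly into set operations. The Python sketch below is a minimal illustration; the function name and the image identifiers are hypothetical, and returning 0.0 for an empty set is a convention chosen here, not something stated in the source.</p>
<preformat>
```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for one query, following equation (2)."""
    hits = len(retrieved.intersection(relevant))
    # Empty denominators are mapped to 0.0 by convention (an assumption).
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up identifiers: four images retrieved, three truly plagiarized.
precision, recall = precision_recall(
    {"img01", "img02", "img03", "img04"},
    {"img01", "img02", "img09"},
)
```
</preformat>
<p>In this made-up query, two of the four retrieved images are relevant and two of the three relevant images are retrieved.</p>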
</sec>
<sec id="sec5">
<title>5. Experimental Results</title>
<p>Bar chart plagiarism detection is carried out in four main stages: submission of query images, feature extraction from the bar chart image, plagiarism detection, and highlighting of results. In the first stage, various types of query images covering different kinds of possible plagiarism are submitted, and in the second stage the features of the query are extracted. The third stage detects plagiarism by using the word 2-gram and Euclidean distance techniques. Finally, the features of the query bar chart image that are plagiarized from others are highlighted and the proportion of similarity is displayed.</p>
<p>The bar chart plagiarism is further divided into
<italic> Exact Copy</italic>
and
<italic> Modified Copy</italic>
as explained in taxonomy.
<xref ref-type="fig" rid="fig6"> Figure 6(a)</xref>
shows the query image while
<xref ref-type="fig" rid="fig6">Figure 6(b)</xref>
depicts the plagiarized images detected by the system. The first plagiarized image contains all of the data of the query image, while the second contains the same plagiarized data presented as a horizontal bar chart. The system extracts the features of the query image and detects the proportion of plagiarism from the Start, End, and Exact values of each bar as well as the label of each bar. The system highlights the data and information that are plagiarized and provides the proportion of plagiarism.</p>
<p>One of the patterns of plagiarism derived from
<italic> Modified Copy</italic>
is copying with changed scales. The plagiarist modifies each bar by changing its Start and End values so that they differ from those of the original image.
<xref ref-type="fig" rid="fig7"> Figure 7</xref>
illustrates the significant role played by the Exact values to detect this type of plagiarism.</p>
<p>Plagiarists may combine several patterns of bar chart plagiarism to present a more complex image that carries the same data taken from other works.
<xref ref-type="fig" rid="fig8"> Figure 8</xref>
displays the query image which is modified by changing colours and scales of bars as well as changing their location via swapping. The proposed system is capable of detecting this type of plagiarism and identifies the proportion of similarity.</p>
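<p>Why the Exact values survive rescaling and swapping can be illustrated with a short Python sketch. It is not the authors' system: the function name is hypothetical, aligning bars by their labels is our assumption, and all numbers except the Silver values quoted earlier are invented for illustration.</p>
<preformat>
```python
import math
from typing import Dict, Tuple

# label -> (start, end, exact); a hypothetical storage layout
Bars = Dict[str, Tuple[float, float, float]]


def exact_value_distance(query: Bars, stored: Bars) -> float:
    """Euclidean distance over the Exact values of bars sharing a label."""
    shared = sorted(set(query).intersection(stored))
    if not shared:
        return math.inf  # nothing comparable between the two charts
    return math.sqrt(sum((query[k][2] - stored[k][2]) ** 2 for k in shared))


# Silver uses the values quoted in the text; the Cash values are invented.
original = {"Silver": (5.0, 10.0, 6.0), "Cash": (35.0, 40.0, 38.0)}
# Same data with the bars swapped and the axis redrawn with other ticks.
modified = {"Cash": (36.0, 39.0, 38.0), "Silver": (4.0, 8.0, 6.0)}

distance = exact_value_distance(original, modified)
```
</preformat>
<p>Because the bars are matched by label and only the Exact values are compared, neither the new bar order nor the changed Start and End values affect the distance in this example.</p>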
<p>
<xref ref-type="fig" rid="fig9">Figure 9</xref>
illustrates the performance of the system for plagiarism detection of
<italic> Exact Copy</italic>
and
<italic> Modified Copy</italic>
patterns, respectively.</p>
</sec>
<sec id="sec6">
<title>6. Discussion</title>
<p>The state-of-the-art graphical plagiarism techniques and patterns are presented. Graphical plagiarism is considered a form of electronic crime and theft, and the notion of such theft is relatively new. Important information and data, such as results or frameworks in academic and business settings, can be represented in graphical form. However, many text plagiarism detection systems such as Turnitin are incapable of detecting plagiarism of images. In spite of the different styles of bar chart images, the extraction of image features plays an important role in detecting plagiarism. Our proposed technique for extracting the numeric features plays an essential role in bar chart plagiarism detection. The patterns of
<italic> Exact Copy</italic>
of bar chart plagiarism, including direct copying of all or part of the data, are underscored. Plagiarism carried out by modifying the caption of an image, via restructuring or summarizing the label sentences, is emphasized. The patterns of
<italic> Modified Copy</italic>
which are regarded as more complicated than
<italic> Exact Copy</italic>
patterns are also analyzed. The difficulty of these patterns lies in changes to the image that present the same data and information in different forms. The restructuring of the information of an image within the same shape is also covered. The editing of bar chart images, including changing the bar colors or changing the bar locations either by swapping or by generating horizontal bars from vertical bars and vice versa, is discussed in detail. Moreover, more professional modifications, such as changing the scales on the coordinates so that they are completely different from the original ones, can be detected by the proposed method. In this case, the
<italic> Start</italic>
and
<italic> End</italic>
features of bars are completely different. Consequently, the
<italic> Exact</italic>
features as well as other attributes play a significant role in detecting plagiarism in bar chart images.</p>
</sec>
<sec id="sec7">
<title>7. Conclusion</title>
<p>We demonstrate the recognition of different plagiarism patterns in business documents using an intelligent bar chart detection system. The types and patterns of plagiarism are presented via a taxonomy of figure, chart, and table, which highlights the various kinds of possible plagiarism. Plagiarism of bar chart images, as one type of chart, is detected by the newly proposed technique. The technique is capable of extracting the features of a bar chart image that cannot be extracted using an OCR tool. It first recognizes the connection between the text and graphical components to extract the Start, End, and Exact values for each bar. Plagiarism is then detected using the word 2-gram and Euclidean distance methods, based on ten features. The system is capable of detecting different levels of plagiarism: not only copy and paste of a bar chart image but also modifications such as changing colors or scales. It also distinguishes other possible alterations of these images, such as swapping bar locations and changing the caption via summarizing and restructuring. The proposed technique may be useful for intelligent plagiarism detection in business and academic documents.</p>
</sec>
</body>
<back>
<ack>
<title>Acknowledgment</title>
<p>The authors extend their appreciation to the Deanship of Scientific Research at King Saud University for funding this work through Research Group no. RGP-264. The authors are also thankful to Ministry of Science and Technology Innovation (MOSTI), Malaysia, and Research Management Center (RMC), Universiti Teknologi Malaysia (UTM), Johor, Malaysia, for their technical support and expertise in conducting this research.</p>
</ack>
<sec sec-type="conflict">
<title>Conflict of Interests</title>
<p>The authors declare that there is no conflict of interests regarding the publication of this paper.</p>
</sec>
<ref-list>
<ref id="B1">
<label>1</label>
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>El Tahir Ali</surname>
<given-names>AM</given-names>
</name>
<name>
<surname>Abdulla</surname>
<given-names>HMD</given-names>
</name>
<name>
<surname>Snasel</surname>
<given-names>V</given-names>
</name>
</person-group>
<article-title>Survey of plagiarism detection methods</article-title>
<conf-name>Proceedings of the 5th Asia Modelling Symposium (AMS '05)</conf-name>
<conf-date>May 2011</conf-date>
<fpage>39</fpage>
<lpage>42</lpage>
<pub-id pub-id-type="other">2-s2.0-80052326576</pub-id>
</element-citation>
</ref>
<ref id="B27">
<label>2</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rehman</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Saba</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>Features extraction for soccer video semantic analysis: current achievements and remaining issues</article-title>
<source>
<italic>Artificial Intelligence Review</italic>
</source>
<year>2014</year>
<volume>41</volume>
<issue>3</issue>
<fpage>451</fpage>
<lpage>461</lpage>
<pub-id pub-id-type="other">2-s2.0-84894471794</pub-id>
</element-citation>
</ref>
<ref id="B28">
<label>3</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rehman</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Saba</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>Evaluation of artificial intelligent techniques to secure information in enterprises</article-title>
<source>
<italic>Artificial Intelligence Review</italic>
</source>
<year>2012</year>
<fpage>1</fpage>
<lpage>16</lpage>
<pub-id pub-id-type="other">2-s2.0-84868608498</pub-id>
</element-citation>
</ref>
<ref id="B2">
<label>4</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Alzahrani</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Salim</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Abraham</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Understanding plagiarism linguistic patterns, textual features, and detection methods</article-title>
<source>
<italic>IEEE Transactions on Systems, Man and Cybernetics C: Applications and Reviews</italic>
</source>
<year>2012</year>
<volume>42</volume>
<issue>2</issue>
<fpage>133</fpage>
<lpage>149</lpage>
<pub-id pub-id-type="other">2-s2.0-84857505386</pub-id>
</element-citation>
</ref>
<ref id="B29">
<label>5</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Saba</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Altameem</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Analysis of vision based systems to detect real time goal events in soccer videos</article-title>
<source>
<italic>Applied Artificial Intelligence</italic>
</source>
<year>2013</year>
<volume>27</volume>
<issue>7</issue>
<fpage>656</fpage>
<lpage>667</lpage>
<pub-id pub-id-type="other">2-s2.0-84882318174</pub-id>
</element-citation>
</ref>
<ref id="B3">
<label>6</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ottenstein</surname>
<given-names>KJ</given-names>
</name>
</person-group>
<article-title>An algorithmic approach to the detection and prevention of plagiarism</article-title>
<source>
<italic>SIGCSE Bulletin</italic>
</source>
<year>1976</year>
<volume>8</volume>
<fpage>30</fpage>
<lpage>41</lpage>
</element-citation>
</ref>
<ref id="B4">
<label>7</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Parker</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Hamblen</surname>
<given-names>JO</given-names>
</name>
</person-group>
<article-title>Computer algorithms for plagiarism detection</article-title>
<source>
<italic>IEEE Transactions on Education</italic>
</source>
<year>1989</year>
<volume>32</volume>
<fpage>94</fpage>
<lpage>99</lpage>
<pub-id pub-id-type="other">2-s2.0-34250902471</pub-id>
</element-citation>
</ref>
<ref id="B5">
<label>8</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bao</surname>
<given-names>JP</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>JY</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>XD</given-names>
</name>
<name>
<surname>Song</surname>
<given-names>QB</given-names>
</name>
</person-group>
<article-title>Survey on natural language text copy detection</article-title>
<source>
<italic>Journal of Software</italic>
</source>
<year>2003</year>
<volume>14</volume>
<issue>10</issue>
<fpage>1753</fpage>
<lpage>1760</lpage>
<pub-id pub-id-type="other">2-s2.0-1442352092</pub-id>
</element-citation>
</ref>
<ref id="B30">
<label>9</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Saba</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Rehman</surname>
<given-names>A</given-names>
</name>
</person-group>
<source>
<italic>Machine Learning and Script Recognition</italic>
</source>
<year>2012</year>
<publisher-name>Lambert Academic Publisher</publisher-name>
</element-citation>
</ref>
<ref id="B6">
<label>10</label>
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Huang</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Zong</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Tan</surname>
<given-names>CL</given-names>
</name>
</person-group>
<article-title>Chart image classification using multiple-instance learning</article-title>
<conf-name>Proceedings of the IEEE Workshop on Applications of Computer Vision (WACV ’07)</conf-name>
<conf-date>February 2007</conf-date>
<conf-loc>Austin, Tex, USA</conf-loc>
<fpage>27 pages</fpage>
<pub-id pub-id-type="other">2-s2.0-34547189072</pub-id>
</element-citation>
</ref>
<ref id="B31">
<label>11</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Saba</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Rehman</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Elarbi-Boudihir</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Methods and strategies on off-line cursive touched characters segmentation: a directional review</article-title>
<source>
<italic>Artificial Intelligence Review</italic>
</source>
<year>2011</year>
<pub-id pub-id-type="other">2-s2.0-79959226797</pub-id>
</element-citation>
</ref>
<ref id="B32">
<label>12</label>
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Rehman</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Mohammad</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Sulong</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Saba</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>Simple and effective techniques for core-region detection and slant correction in offline script recognition</article-title>
<conf-name>Proceedings of the IEEE International Conference on Signal and Image Processing Applications (ICSIPA '09)</conf-name>
<conf-date>November 2009</conf-date>
<conf-loc>Kuala Lumpur, Malaysia</conf-loc>
<fpage>15</fpage>
<lpage>20</lpage>
<pub-id pub-id-type="other">2-s2.0-77954468983</pub-id>
</element-citation>
</ref>
<ref id="B33">
<label>13</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rehman</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Saba</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>Document skew estimation and correction: analysis of techniques, common problems and possible solutions</article-title>
<source>
<italic>Applied Artificial Intelligence</italic>
</source>
<year>2011</year>
<volume>25</volume>
<issue>9</issue>
<fpage>769</fpage>
<lpage>787</lpage>
<pub-id pub-id-type="other">2-s2.0-84855708336</pub-id>
</element-citation>
</ref>
<ref id="B34">
<label>14</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rehman</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Saba</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>Off-line cursive script recognition: current advances, comparisons and remaining problems</article-title>
<source>
<italic>Artificial Intelligence Review</italic>
</source>
<year>2012</year>
<volume>37</volume>
<issue>4</issue>
<fpage>261</fpage>
<lpage>288</lpage>
<pub-id pub-id-type="other">2-s2.0-84864821765</pub-id>
</element-citation>
</ref>
<ref id="B35">
<label>15</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Saba</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Rehman</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Sulong</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>Improved statistical features for cursive character recognition</article-title>
<source>
<italic>International Journal of Innovative Computing, Information and Control</italic>
</source>
<year>2011</year>
<volume>7</volume>
<issue>9</issue>
<fpage>5211</fpage>
<lpage>5224</lpage>
<pub-id pub-id-type="other">2-s2.0-80052494518</pub-id>
</element-citation>
</ref>
<ref id="B20">
<label>16</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Helmy</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>A computational model for context-based image categorization and description</article-title>
<source>
<italic>International Journal of Image and Graphics</italic>
</source>
<year>2012</year>
<volume>12</volume>
<issue>1</issue>
<pub-id pub-id-type="publisher-id">1250001</pub-id>
</element-citation>
</ref>
<ref id="B7">
<label>17</label>
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Savva</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Kong</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Chhajta</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>F-F</given-names>
</name>
<name>
<surname>Agrawala</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Heer</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>ReVision: automated classification, analysis and redesign of chart images</article-title>
<conf-name>Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology (UIST '11)</conf-name>
<conf-date>October 2011</conf-date>
<conf-loc>Santa Barbara, Calif, USA</conf-loc>
<fpage>393</fpage>
<lpage>402</lpage>
<pub-id pub-id-type="other">2-s2.0-80755144018</pub-id>
</element-citation>
</ref>
<ref id="B8">
<label>18</label>
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Elzer</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Carberry</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Zukerman</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Chester</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Green</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Demir</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>A probabilistic framework for recognizing intention in information graphics</article-title>
<conf-name>Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI '05)</conf-name>
<conf-date>August 2005</conf-date>
<conf-loc>Edinburgh, UK</conf-loc>
<fpage>1042</fpage>
<lpage>1047</lpage>
<pub-id pub-id-type="other">2-s2.0-84880717848</pub-id>
</element-citation>
</ref>
<ref id="B9">
<label>19</label>
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Carberry</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Elzer</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Green</surname>
<given-names>N</given-names>
</name>
<name>
<surname>McCoy</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Chester</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>Understanding information graphics: a discourse-level problem</article-title>
<conf-name>Proceedings of the 4th SIGdial Workshop on Discourse and Dialogue (SIGDIAL '03)</conf-name>
<conf-date>July 2003</conf-date>
<conf-loc>Sapporo, Japan</conf-loc>
<fpage>1</fpage>
<lpage>12</lpage>
</element-citation>
</ref>
<ref id="B36">
<label>20</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rahim</surname>
<given-names>MSM</given-names>
</name>
<name>
<surname>Rehman</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Ni'matus</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Kurniawan</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Saba</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>Region-based features extraction in ear biometrics</article-title>
<source>
<italic>International Journal of Academic Research</italic>
</source>
<year>2012</year>
<volume>4</volume>
<issue>1</issue>
<fpage>37</fpage>
<lpage>42</lpage>
</element-citation>
</ref>
<ref id="B10">
<label>21</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Yokokura</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Watanabe</surname>
<given-names>T</given-names>
</name>
</person-group>
<person-group person-group-type="editor">
<name>
<surname>Tombre</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Chhabra</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Layout-based approach for extracting constructive elements of bar-charts</article-title>
<source>
<italic>Graphics Recognition Algorithms and Systems</italic>
</source>
<year>1998</year>
<volume>1389</volume>
<publisher-loc>Berlin, Germany</publisher-loc>
<publisher-name>Springer</publisher-name>
<fpage>163</fpage>
<lpage>174</lpage>
</element-citation>
</ref>
<ref id="B11">
<label>22</label>
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Zhou</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Tan</surname>
<given-names>CL</given-names>
</name>
</person-group>
<article-title>Learning-based scientific chart recognition</article-title>
<conf-name>Proceedings of the 4th IAPR International Workshop on Graphics Recognition (GREC '01)</conf-name>
<conf-date>2001</conf-date>
<fpage>482</fpage>
<lpage>492</lpage>
</element-citation>
</ref>
<ref id="B12">
<label>23</label>
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Zhou</surname>
<given-names>YP</given-names>
</name>
<name>
<surname>Tan</surname>
<given-names>CL</given-names>
</name>
</person-group>
<article-title>Hough technique for bar charts detection and recognition in document images</article-title>
<volume>2</volume>
<conf-name>Proceedings of the International Conference on Image Processing (ICIP '00)</conf-name>
<conf-date>September 2000</conf-date>
<conf-loc>Vancouver, Canada</conf-loc>
<fpage>605</fpage>
<lpage>608</lpage>
<pub-id pub-id-type="other">2-s2.0-0034443271</pub-id>
</element-citation>
</ref>
<ref id="B13">
<label>24</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Huang</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Tan</surname>
<given-names>CL</given-names>
</name>
<name>
<surname>Leow</surname>
<given-names>WK</given-names>
</name>
</person-group>
<person-group person-group-type="editor">
<name>
<surname>Lladós</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Kwon</surname>
<given-names>Y-B</given-names>
</name>
</person-group>
<article-title>Model-based chart image recognition</article-title>
<source>
<italic>Graphics Recognition. Recent Advances and Perspectives</italic>
</source>
<year>2004</year>
<volume>3088</volume>
<publisher-loc>Berlin, Germany</publisher-loc>
<publisher-name>Springer</publisher-name>
<fpage>87</fpage>
<lpage>99</lpage>
</element-citation>
</ref>
<ref id="B21">
<label>25</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Mishchenko</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Vassilieva</surname>
<given-names>N</given-names>
</name>
</person-group>
<person-group person-group-type="editor">
<name>
<surname>Bebis</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Boyle</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Parvin</surname>
<given-names>B</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Model-based chart image classification</article-title>
<source>
<italic>Advances in Visual Computing</italic>
</source>
<year>2011</year>
<volume>6939</volume>
<publisher-loc>Berlin, Germany</publisher-loc>
<publisher-name>Springer</publisher-name>
<fpage>476</fpage>
<lpage>485</lpage>
<series>Lecture Notes in Computer Science</series>
</element-citation>
</ref>
<ref id="B14">
<label>26</label>
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Huang</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Tan</surname>
<given-names>CL</given-names>
</name>
</person-group>
<article-title>A system for understanding imaged infographics and its applications</article-title>
<conf-name>Proceedings of the ACM Symposium on Document Engineering</conf-name>
<conf-date>August 2007</conf-date>
<conf-loc>Manitoba, Canada</conf-loc>
<fpage>9</fpage>
<lpage>18</lpage>
<pub-id pub-id-type="other">2-s2.0-37849007558</pub-id>
</element-citation>
</ref>
<ref id="B15">
<label>27</label>
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Hassan</surname>
<given-names>MM</given-names>
</name>
<name>
<surname>Al Khatib</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>Similarity searching in statistical figures based on extracted meta data</article-title>
<conf-name>Proceedings of the Computer Graphics, Imaging and Visualisation (CGIV '07)</conf-name>
<conf-date>August 2007</conf-date>
<conf-loc>Bangkok, Thailand</conf-loc>
<fpage>329</fpage>
<lpage>334</lpage>
<pub-id pub-id-type="other">2-s2.0-46449127176</pub-id>
</element-citation>
</ref>
<ref id="B16">
<label>28</label>
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Tan</surname>
<given-names>CL</given-names>
</name>
</person-group>
<article-title>Semi-automatic ground truth generation for chart image recognition</article-title>
<conf-name>Proceedings of the 7th International Conference on Document Analysis Systems</conf-name>
<conf-date>2006</conf-date>
<conf-loc>Nelson, New Zealand</conf-loc>
</element-citation>
</ref>
<ref id="B22">
<label>29</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tourassi</surname>
<given-names>GD</given-names>
</name>
</person-group>
<article-title>Journey toward computer-aided diagnosis: role of image texture analysis</article-title>
<source>
<italic>Radiology</italic>
</source>
<year>1999</year>
<volume>213</volume>
<issue>2</issue>
<fpage>317</fpage>
<lpage>320</lpage>
<pub-id pub-id-type="other">2-s2.0-0032710766</pub-id>
<pub-id pub-id-type="pmid">10551208</pub-id>
</element-citation>
</ref>
<ref id="B24">
<label>30</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chandy</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Johnson</surname>
<given-names>JS</given-names>
</name>
<name>
<surname>Selvan</surname>
<given-names>SE</given-names>
</name>
</person-group>
<article-title>Texture feature extraction using gray level statistical matrix for content-based mammogram retrieval</article-title>
<source>
<italic>Multimedia Tools and Applications</italic>
</source>
<year>2014</year>
<volume>72</volume>
<issue>2</issue>
<fpage>2011</fpage>
<lpage>2024</lpage>
</element-citation>
</ref>
<ref id="B25">
<label>31</label>
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Han</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Song</surname>
<given-names>B</given-names>
</name>
<etal></etal>
</person-group>
<article-title>A new 3D texture feature based computer-aided diagnosis approach to differentiate pulmonary nodules</article-title>
<conf-name>Medical Imaging: Computer-Aided Diagnosis</conf-name>
<conf-date>2014</conf-date>
<conf-loc>San Diego, Calif, USA</conf-loc>
<series>Proceedings of SPIE</series>
</element-citation>
</ref>
<ref id="B26">
<label>32</label>
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Han</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Song</surname>
<given-names>B</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Efficient 3D texture feature extraction from CT images for computer-aided diagnosis of pulmonary nodules</article-title>
<volume>9035</volume>
<conf-name>Proceedings of the SPIE, Medical Imaging: Computer-Aided Diagnosis</conf-name>
<conf-date>2014</conf-date>
<fpage>1</fpage>
<lpage>7</lpage>
</element-citation>
</ref>
<ref id="B17">
<label>33</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Eissen</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Stein</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Kulig</surname>
<given-names>M</given-names>
</name>
</person-group>
<person-group person-group-type="editor">
<name>
<surname>Decker</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Lenz</surname>
<given-names>H-J</given-names>
</name>
</person-group>
<article-title>Plagiarism detection without reference collections</article-title>
<source>
<italic>Advances in Data Analysis</italic>
</source>
<year>2007</year>
<publisher-loc>Berlin, Germany</publisher-loc>
<publisher-name>Springer</publisher-name>
<fpage>359</fpage>
<lpage>366</lpage>
</element-citation>
</ref>
<ref id="B18">
<label>34</label>
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Kasprzak</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Brandejs</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Křipač</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Finding plagiarism by evaluating document similarities</article-title>
<conf-name>Proceedings of the 3rd Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse (PAN '09)</conf-name>
<conf-date>September 2009</conf-date>
<conf-loc>Donostia, Spain</conf-loc>
<fpage>24</fpage>
<lpage>28</lpage>
<pub-id pub-id-type="other">2-s2.0-84887479157</pub-id>
</element-citation>
</ref>
<ref id="B19">
<label>35</label>
<element-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Basile</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Benedetto</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Caglioti</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Cristadoro</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Esposti</surname>
<given-names>MD</given-names>
</name>
</person-group>
<article-title>A plagiarism detection procedure in three steps: selection, matches and squares</article-title>
<comment>Donostia, Spain, 2009</comment>
</element-citation>
</ref>
<ref id="B23">
<label>36</label>
<element-citation publication-type="thesis">
<person-group person-group-type="author">
<name>
<surname>Hamed</surname>
<given-names>ZA</given-names>
</name>
<name>
<surname>Mohd Hashim</surname>
<given-names>SZ</given-names>
</name>
</person-group>
<source>
<italic>Hybrid particle swarm optimization and black stork foraging for functional neural fuzzy network learning enhancement [UTM Thesis]</italic>
</source>
<year>2012</year>
</element-citation>
</ref>
</ref-list>
</back>
<floats-group>
<fig id="fig1" orientation="portrait" position="float">
<label>Figure 1</label>
<caption>
<p>Taxonomy of chart plagiarism.</p>
</caption>
<graphic xlink:href="TSWJ2014-612787.001"></graphic>
</fig>
<fig id="fig2" orientation="portrait" position="float">
<label>Figure 2</label>
<caption>
<p>Taxonomy of figure plagiarism.</p>
</caption>
<graphic xlink:href="TSWJ2014-612787.002"></graphic>
</fig>
<fig id="fig3" orientation="portrait" position="float">
<label>Figure 3</label>
<caption>
<p>Taxonomy of table plagiarism.</p>
</caption>
<graphic xlink:href="TSWJ2014-612787.003"></graphic>
</fig>
<fig id="fig4" orientation="portrait" position="float">
<label>Figure 4</label>
<caption>
<p>Flowchart of methodology.</p>
</caption>
<graphic xlink:href="TSWJ2014-612787.004"></graphic>
</fig>
<fig id="fig5" orientation="portrait" position="float">
<label>Figure 5</label>
<caption>
<p>Samples of data set [
<xref rid="B7" ref-type="bibr">17</xref>
,
<xref rid="B23" ref-type="bibr">36</xref>
].</p>
</caption>
<graphic xlink:href="TSWJ2014-612787.005"></graphic>
</fig>
<fig id="fig6" orientation="portrait" position="float">
<label>Figure 6</label>
<caption>
<p>Plagiarism detection for one of the Exact Copy patterns: plagiarism of the entire data of an image.</p>
</caption>
<graphic xlink:href="TSWJ2014-612787.006"></graphic>
</fig>
<fig id="fig7" orientation="portrait" position="float">
<label>Figure 7</label>
<caption>
<p>Plagiarism detection for one of the Modified Copy patterns: copying an image while changing its scales.</p>
</caption>
<graphic xlink:href="TSWJ2014-612787.007"></graphic>
</fig>
<fig id="fig8" orientation="portrait" position="float">
<label>Figure 8</label>
<caption>
<p>Plagiarism detection for a combination of possible bar chart plagiarism patterns.</p>
</caption>
<graphic xlink:href="TSWJ2014-612787.008"></graphic>
</fig>
<fig id="fig9" orientation="portrait" position="float">
<label>Figure 9</label>
<caption>
<p>Performance evaluation (a) for
<italic>Exact Copy</italic>
patterns and (b) for
<italic>Modified Copy</italic>
patterns.</p>
</caption>
<graphic xlink:href="TSWJ2014-612787.009"></graphic>
</fig>
</floats-group>
</pmc>
<affiliations>
<list>
<country>
<li>Arabie saoudite</li>
<li>Iraq</li>
<li>Malaisie</li>
</country>
</list>
<tree>
<country name="Malaisie">
<noRegion>
<name sortKey="Al Dabbagh, Mohammed Mumtaz" sort="Al Dabbagh, Mohammed Mumtaz" uniqKey="Al Dabbagh M" first="Mohammed Mumtaz" last="Al-Dabbagh">Mohammed Mumtaz Al-Dabbagh</name>
</noRegion>
<name sortKey="Alkawaz, Mohammed Hazim" sort="Alkawaz, Mohammed Hazim" uniqKey="Alkawaz M" first="Mohammed Hazim" last="Alkawaz">Mohammed Hazim Alkawaz</name>
<name sortKey="Salim, Naomie" sort="Salim, Naomie" uniqKey="Salim N" first="Naomie" last="Salim">Naomie Salim</name>
</country>
<country name="Iraq">
<noRegion>
<name sortKey="Al Dabbagh, Mohammed Mumtaz" sort="Al Dabbagh, Mohammed Mumtaz" uniqKey="Al Dabbagh M" first="Mohammed Mumtaz" last="Al-Dabbagh">Mohammed Mumtaz Al-Dabbagh</name>
</noRegion>
<name sortKey="Alkawaz, Mohammed Hazim" sort="Alkawaz, Mohammed Hazim" uniqKey="Alkawaz M" first="Mohammed Hazim" last="Alkawaz">Mohammed Hazim Alkawaz</name>
</country>
<country name="Arabie saoudite">
<noRegion>
<name sortKey="Rehman, Amjad" sort="Rehman, Amjad" uniqKey="Rehman A" first="Amjad" last="Rehman">Amjad Rehman</name>
</noRegion>
<name sortKey="Al Dhelaan, Abdullah" sort="Al Dhelaan, Abdullah" uniqKey="Al Dhelaan A" first="Abdullah" last="Al-Dhelaan">Abdullah Al-Dhelaan</name>
<name sortKey="Al Rodhaan, Mznah" sort="Al Rodhaan, Mznah" uniqKey="Al Rodhaan M" first="Mznah" last="Al-Rodhaan">Mznah Al-Rodhaan</name>
<name sortKey="Saba, Tanzila" sort="Saba, Tanzila" uniqKey="Saba T" first="Tanzila" last="Saba">Tanzila Saba</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Pmc/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000048 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Checkpoint/biblio.hfd -nk 000048 | SxmlIndent | more

To add a link to this page in the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Pmc
   |étape=   Checkpoint
   |type=    RBID
   |clé=     PMC:4182899
   |texte=   Intelligent Bar Chart Plagiarism Detection in Documents
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Checkpoint/RBID.i   -Sk "pubmed:25309952" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Checkpoint/biblio.hfd   \
       | NlmPubMed2Wicri -a OcrV1 

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024