OCR exploration server

Warning: this site is under development!
Warning: this site is generated by automated means from raw corpora.
The information has therefore not been validated.

Resource Efficient Hardware Architecture for Fast Computation of Running Max/Min Filters

Internal identifier: 000177 (Ncbi/Merge); previous: 000176; next: 000178

Authors: Cesar Torres-Huitzil [Mexico]

Source:

RBID: PMC:3833061

Abstract

Running max/min filters on rectangular kernels are widely used in many digital signal and image processing applications. Filtering with a k × k kernel requires k² − 1 comparisons per sample for a direct implementation; thus, the computational cost scales steeply with the kernel size k. Faster computation can be achieved by kernel decomposition and by using constant-time one-dimensional algorithms on custom hardware. This paper presents a hardware architecture for real-time computation of running max/min filters based on the van Herk/Gil-Werman (HGW) algorithm. The proposed architecture uses fewer computation and memory resources than previously reported architectures when targeted to Field Programmable Gate Array (FPGA) devices. Implementation results show that the architecture is able to compute max/min filters on 1024 × 1024 images with up to 255 × 255 kernels in around 8.4 milliseconds (120 frames per second) at a clock frequency of 250 MHz. The implementation is highly scalable in the kernel size, with a good performance/area tradeoff suitable for embedded applications. The applicability of the architecture is shown for local adaptive image thresholding.


Url:
DOI: 10.1155/2013/108103
PubMed: 24288456
PubMed Central: 3833061


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Resource Efficient Hardware Architecture for Fast Computation of Running Max/Min Filters</title>
<author>
<name sortKey="Torres Huitzil, Cesar" sort="Torres Huitzil, Cesar" uniqKey="Torres Huitzil C" first="Cesar" last="Torres-Huitzil">Cesar Torres-Huitzil</name>
<affiliation wicri:level="1">
<nlm:aff id="I1">Information Technology Laboratory, CINVESTAV, Km. 5.5 Carretera Ciudad Victoria-Soto La Marina, 87130 Ciudad Victoria, TAMPS, Mexico</nlm:aff>
<country xml:lang="fr">Mexique</country>
<wicri:regionArea>Information Technology Laboratory, CINVESTAV, Km. 5.5 Carretera Ciudad Victoria-Soto La Marina, 87130 Ciudad Victoria, TAMPS</wicri:regionArea>
<wicri:noRegion>TAMPS</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">24288456</idno>
<idno type="pmc">3833061</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3833061</idno>
<idno type="RBID">PMC:3833061</idno>
<idno type="doi">10.1155/2013/108103</idno>
<date when="2013">2013</date>
<idno type="wicri:Area/Pmc/Corpus">000191</idno>
<idno type="wicri:Area/Pmc/Curation">000191</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000067</idno>
<idno type="wicri:Area/Ncbi/Merge">000177</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Resource Efficient Hardware Architecture for Fast Computation of Running Max/Min Filters</title>
<author>
<name sortKey="Torres Huitzil, Cesar" sort="Torres Huitzil, Cesar" uniqKey="Torres Huitzil C" first="Cesar" last="Torres-Huitzil">Cesar Torres-Huitzil</name>
<affiliation wicri:level="1">
<nlm:aff id="I1">Information Technology Laboratory, CINVESTAV, Km. 5.5 Carretera Ciudad Victoria-Soto La Marina, 87130 Ciudad Victoria, TAMPS, Mexico</nlm:aff>
<country xml:lang="fr">Mexique</country>
<wicri:regionArea>Information Technology Laboratory, CINVESTAV, Km. 5.5 Carretera Ciudad Victoria-Soto La Marina, 87130 Ciudad Victoria, TAMPS</wicri:regionArea>
<wicri:noRegion>TAMPS</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j">The Scientific World Journal</title>
<idno type="eISSN">1537-744X</idno>
<imprint>
<date when="2013">2013</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Running max/min filters on rectangular kernels are widely used in many digital signal and image processing applications. Filtering with a
<italic>k</italic>
×
<italic>k</italic>
kernel requires
<italic>k</italic>
<sup>2</sup>
− 1 comparisons per sample for a direct implementation; thus, the computational cost scales steeply with the kernel size
<italic>k</italic>
. Faster computation can be achieved by kernel decomposition and by using constant-time one-dimensional algorithms on custom hardware. This paper presents a hardware architecture for real-time computation of running max/min filters based on the van Herk/Gil-Werman (HGW) algorithm. The proposed architecture uses fewer computation and memory resources than previously reported architectures when targeted to Field Programmable Gate Array (FPGA) devices. Implementation results show that the architecture is able to compute max/min filters on 1024 × 1024 images with up to 255 × 255 kernels in around 8.4 milliseconds (120 frames per second) at a clock frequency of 250 MHz. The implementation is highly scalable in the kernel size, with a good performance/area tradeoff suitable for embedded applications. The applicability of the architecture is shown for local adaptive image thresholding.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Brookes, M" uniqKey="Brookes M">M Brookes</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Maragos, P" uniqKey="Maragos P">P Maragos</name>
</author>
<author>
<name sortKey="Schafer, Rw" uniqKey="Schafer R">RW Schafer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Arce, Gr" uniqKey="Arce G">GR Arce</name>
</author>
<author>
<name sortKey="Mcloughlin, Mp" uniqKey="Mcloughlin M">MP McLoughlin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Maragos, P" uniqKey="Maragos P">P Maragos</name>
</author>
<author>
<name sortKey="Schafer, Rw" uniqKey="Schafer R">RW Schafer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hodgson, R" uniqKey="Hodgson R">R Hodgson</name>
</author>
<author>
<name sortKey="Bailey, D" uniqKey="Bailey D">D Bailey</name>
</author>
<author>
<name sortKey="Naylor, M" uniqKey="Naylor M">M Naylor</name>
</author>
<author>
<name sortKey="Ng, A" uniqKey="Ng A">A Ng</name>
</author>
<author>
<name sortKey="Mcneill, S" uniqKey="Mcneill S">S McNeill</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gil, Jy" uniqKey="Gil J">JY Gil</name>
</author>
<author>
<name sortKey="Kimmel, R" uniqKey="Kimmel R">R Kimmel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yuan, H" uniqKey="Yuan H">H Yuan</name>
</author>
<author>
<name sortKey="Atallah, Mj" uniqKey="Atallah M">MJ Atallah</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Van Herk, M" uniqKey="Van Herk M">M van Herk</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gil, J" uniqKey="Gil J">J Gil</name>
</author>
<author>
<name sortKey="Werman, M" uniqKey="Werman M">M Werman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Diamantaras, Ki" uniqKey="Diamantaras K">KI Diamantaras</name>
</author>
<author>
<name sortKey="Kung, Sy" uniqKey="Kung S">SY Kung</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Torres Huitzil, C" uniqKey="Torres Huitzil C">C Torres-Huitzil</name>
</author>
<author>
<name sortKey="Arias Estrada, M" uniqKey="Arias Estrada M">M Arias-Estrada</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chien, S Y" uniqKey="Chien S">S-Y Chien</name>
</author>
<author>
<name sortKey="Ma, S Y" uniqKey="Ma S">S-Y Ma</name>
</author>
<author>
<name sortKey="Chen, L G" uniqKey="Chen L">L-G Chen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Clienti, C" uniqKey="Clienti C">C Clienti</name>
</author>
<author>
<name sortKey="Bilodeau, M" uniqKey="Bilodeau M">M Bilodeau</name>
</author>
<author>
<name sortKey="Beucher, S" uniqKey="Beucher S">S Beucher</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Deforges, O" uniqKey="Deforges O">O Déforges</name>
</author>
<author>
<name sortKey="Normand, N" uniqKey="Normand N">N Normand</name>
</author>
<author>
<name sortKey="Babel, M" uniqKey="Babel M">M Babel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Laforest, Ce" uniqKey="Laforest C">CE LaForest</name>
</author>
<author>
<name sortKey="Steffan, Jg" uniqKey="Steffan J">JG Steffan</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Urbach, Er" uniqKey="Urbach E">ER Urbach</name>
</author>
<author>
<name sortKey="Wilkinson, Mhf" uniqKey="Wilkinson M">MHF Wilkinson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pai, Y T" uniqKey="Pai Y">Y-T Pai</name>
</author>
<author>
<name sortKey="Chang, Y F" uniqKey="Chang Y">Y-F Chang</name>
</author>
<author>
<name sortKey="Ruan, S J" uniqKey="Ruan S">S-J Ruan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Trier, Od" uniqKey="Trier O">OD Trier</name>
</author>
<author>
<name sortKey="Jain, Ak" uniqKey="Jain A">AK Jain</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sezgin, M" uniqKey="Sezgin M">M Sezgin</name>
</author>
<author>
<name sortKey="Sankur, B" uniqKey="Sankur B">B Sankur</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bernsen, J" uniqKey="Bernsen J">J Bernsen</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">ScientificWorldJournal</journal-id>
<journal-id journal-id-type="iso-abbrev">ScientificWorldJournal</journal-id>
<journal-id journal-id-type="publisher-id">TSWJ</journal-id>
<journal-title-group>
<journal-title>The Scientific World Journal</journal-title>
</journal-title-group>
<issn pub-type="epub">1537-744X</issn>
<publisher>
<publisher-name>Hindawi Publishing Corporation</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">24288456</article-id>
<article-id pub-id-type="pmc">3833061</article-id>
<article-id pub-id-type="doi">10.1155/2013/108103</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Resource Efficient Hardware Architecture for Fast Computation of Running Max/Min Filters</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Torres-Huitzil</surname>
<given-names>Cesar</given-names>
</name>
<xref ref-type="aff" rid="I1"></xref>
<xref ref-type="corresp" rid="cor1">*</xref>
</contrib>
</contrib-group>
<aff id="I1">Information Technology Laboratory, CINVESTAV, Km. 5.5 Carretera Ciudad Victoria-Soto La Marina, 87130 Ciudad Victoria, TAMPS, Mexico</aff>
<author-notes>
<corresp id="cor1">*Cesar Torres-Huitzil:
<email>ctorres@tamps.cinvestav.mx</email>
</corresp>
<fn fn-type="other">
<p>Academic Editors: B. Sun and J. Zhang</p>
</fn>
</author-notes>
<pub-date pub-type="collection">
<year>2013</year>
</pub-date>
<pub-date pub-type="epub">
<day>30</day>
<month>10</month>
<year>2013</year>
</pub-date>
<volume>2013</volume>
<elocation-id>108103</elocation-id>
<history>
<date date-type="received">
<day>24</day>
<month>8</month>
<year>2013</year>
</date>
<date date-type="accepted">
<day>17</day>
<month>9</month>
<year>2013</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright © 2013 Cesar Torres-Huitzil.</copyright-statement>
<copyright-year>2013</copyright-year>
<license license-type="open-access">
<license-p>This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<abstract>
<p>Running max/min filters on rectangular kernels are widely used in many digital signal and image processing applications. Filtering with a
<italic>k</italic>
×
<italic>k</italic>
kernel requires
<italic>k</italic>
<sup>2</sup>
− 1 comparisons per sample for a direct implementation; thus, the computational cost scales steeply with the kernel size
<italic>k</italic>
. Faster computation can be achieved by kernel decomposition and by using constant-time one-dimensional algorithms on custom hardware. This paper presents a hardware architecture for real-time computation of running max/min filters based on the van Herk/Gil-Werman (HGW) algorithm. The proposed architecture uses fewer computation and memory resources than previously reported architectures when targeted to Field Programmable Gate Array (FPGA) devices. Implementation results show that the architecture is able to compute max/min filters on 1024 × 1024 images with up to 255 × 255 kernels in around 8.4 milliseconds (120 frames per second) at a clock frequency of 250 MHz. The implementation is highly scalable in the kernel size, with a good performance/area tradeoff suitable for embedded applications. The applicability of the architecture is shown for local adaptive image thresholding.</p>
</abstract>
</article-meta>
</front>
<body>
<sec id="sec1">
<title>1. Introduction</title>
<p>Running max/min filtering is an important operation that aims at selecting the maximum or minimum value from a set of signal elements. A window moves over all data items and at each point the max/min value of the data within the window is taken as output [
<xref ref-type="bibr" rid="B1">1</xref>
]. Max/min filters are widely used in tasks such as noise filtering, adaptive control, pattern recognition, and speech and image processing. The max filter, in gray-level image morphology [
<xref ref-type="bibr" rid="B2">2</xref>
], corresponds to the dilation operator over images using a flat structuring element (SE) or kernel, and the min filter corresponds to the erosion operator. These filters are very attractive since their computation requires only comparisons and no other arithmetic operations and because of their robust behavior in the presence of noise and signal nonstationarities [
<xref ref-type="bibr" rid="B3">3</xref>
<xref ref-type="bibr" rid="B5">5</xref>
].</p>
<p>For some image-based industrial applications, such as granulometries, particle size distribution, or local adaptive binarization, the filtering of high-resolution images with large two-dimensional kernels can be very time-consuming. A direct evaluation of such filters leads to
<italic>O</italic>
(
<italic>k</italic>
<sup>2</sup>
) comparisons per sample, where
<italic>k</italic>
is the size of the kernel. A possible alternative to speed up computations is to decompose large kernels into linear or simpler ones [
<xref ref-type="bibr" rid="B6">6</xref>
]. Then, linear filtering might be implemented by efficient algorithms [
<xref ref-type="bibr" rid="B7">7</xref>
] and/or by dedicated hardware structures. Under this approach, the HGW algorithm is a widely used method to compute max/min with linear kernels whose complexity is independent of the filter size [
<xref ref-type="bibr" rid="B8">8</xref>
,
<xref ref-type="bibr" rid="B9">9</xref>
]. Motivated by the advantages of kernel decomposition, the existence of efficient one-dimensional filtering algorithms, and the challenge of handling the computational cost and memory requirements, this paper presents the design of an architecture for fast computation of running max/min image filters with arbitrary-length rectangular kernels. The proposal is an efficient coarse-grain pipelined implementation of the HGW algorithm as a building block, with improved memory usage based on the distributed memory available on FPGAs, in contrast to previous architectures that use dedicated embedded Block Ram memory.</p>
<p>Running max/min filters have been realized in different implementation media such as very large scale integration (VLSI) circuits and FPGAs. Most of these hardware implementations target rather small rectangular kernels and employ a pipeline technique in which a raster-scan image is sequentially fed into a long delay line and then into an array of neighboring processing elements (PEs), while the max/min operations are carried out in parallel [
<xref ref-type="bibr" rid="B10">10</xref>
,
<xref ref-type="bibr" rid="B11">11</xref>
]. The strength of such architectures is that they can be pipelined down to a single compare-swap stage, yielding high throughput and frequency. Techniques to decrease the number of comparators required to support large SEs were subsequently introduced. For instance, in [
<xref ref-type="bibr" rid="B12">12</xref>
], the authors propose a partial-result-reuse (PRR) architecture for gray-level morphological operations with flat SEs. Partial results generated during computation are kept and reused in this architecture to reduce hardware cost. However, a considerable computation and storage cost is still incurred for large kernels.</p>
<p>In [
<xref ref-type="bibr" rid="B13">13</xref>
], the authors present an efficient hardware architecture to achieve erosion/dilation with very large linear kernels based on a slightly modified HGW algorithm. They propose a block mirroring scheme to suppress the need for backward scanning, to ease data propagation and memory access, and to minimize memory consumption. However, embedded memories represent a large part of their design. When synthesized on a Virtex 4LX60 device, the architecture uses 3 Block Rams of 18 Kbits and 700 slices. The maximum kernel size that can be supported is 1023 over a line length of 65535 pixels. The design's memory consumption is independent of the image size, but further increasing the parallelism, for instance, to process several image lines concurrently, is limited by the number of Block Rams available on FPGAs. In [
<xref ref-type="bibr" rid="B14">14</xref>
], another implementation of erosion/dilation based on SE decomposition and/or efficient 1-D algorithms is proposed. The method is based on a recursive morphological decomposition of 8-convex SEs by using only causal two-pixel SEs. The proposed architecture is generic and fully regular, built from elementary interconnected modules. It has been synthesized into an FPGA, achieving high operation frequencies for any shape and size of SE; however, for large SEs a long pipeline is required.</p>
<p>Although some architectures for max/min filters have been developed, improvements are still needed for filtering high definition video streams in real time. On one hand, embedded memories represent a considerable cost of previous designs, limiting FPGA deployment in embedded environments. On the other hand, scalability is another concern, since the architectures need major modifications when the kernel size increases. This is the primary motivation for the proposed optimized implementation, which relies on some architectural techniques used in [
<xref ref-type="bibr" rid="B13">13</xref>
].</p>
<p>The rest of the paper is organized as follows. The HGW algorithm is presented in
<xref ref-type="sec" rid="sec2">Section 2</xref>
. In
<xref ref-type="sec" rid="sec3">Section 3</xref>
, the proposed architecture is presented in detail as well as the strategy for mapping memory requirements to on-chip resources.
<xref ref-type="sec" rid="sec4"> Section 4</xref>
presents the FPGA implementation, experimental results, and local adaptive thresholding as a case study. Concluding remarks and future work are presented in
<xref ref-type="sec" rid="sec5">Section 5</xref>
.</p>
</sec>
<sec id="sec2">
<title>2. van Herk/Gil-Werman Algorithm</title>
<p>The one-dimensional version of a running max filter of order
<italic>k</italic>
can be formulated as follows. Given an input sequence of size
<italic>M</italic>
,
<italic>f</italic>
<sub>0</sub>
,
<italic>f</italic>
<sub>1</sub>
,
<italic>f</italic>
<sub>2</sub>
,…,
<italic>f</italic>
<sub>
<italic>M</italic>
−1</sub>
, the response
<italic>r</italic>
<sub>
<italic>i</italic>
</sub>
of the filter for
<italic>i</italic>
= 0,…,
<italic>M</italic>
− 1 is given by the following equation:
<disp-formula id="EEq1">
<label>(1)</label>
<mml:math id="M1">
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mi>  </mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mi>max</mml:mi>
<mml:mo></mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn mathvariant="normal">0</mml:mn>
<mml:mo>≤</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo><</mml:mo>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:munder>
<mml:msub>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>.</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
In the actual processing of the sequence, the boundaries usually receive some special treatment, for example, padding or periodic boundary conditions.</p>
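To make the cost of a direct evaluation of equation (1) concrete, the following Python sketch (illustrative only, not part of the paper) computes the causal running max with replicate padding at the right boundary, one arbitrary choice among those mentioned above:

```python
def running_max_naive(f, k):
    """Direct evaluation of eq (1): r_i = max over 0 <= j < k of f_{i+j}.
    Each output takes k - 1 comparisons, so cost grows with k.
    Replicate padding handles the right boundary (an assumption)."""
    M = len(f)
    padded = list(f) + [f[-1]] * (k - 1)  # replicate-pad the tail
    return [max(padded[i:i + k]) for i in range(M)]
```

A min filter is obtained the same way by replacing `max` with `min`.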
<p>The HGW algorithm consists of three main processing steps [
<xref ref-type="bibr" rid="B8">8</xref>
,
<xref ref-type="bibr" rid="B9">9</xref>
] as illustrated in
<xref ref-type="fig" rid="fig1">Figure 1</xref>
. First, the input sequence is split into segments of length
<italic>k</italic>
, where a forward propagation of the current max value of
<italic>f</italic>
, for
<italic>x</italic>
= 0,…,
<italic>M</italic>
− 1, is done using the following equation: </p>
<p>
<disp-formula id="EEq2">
<label>(2)</label>
<mml:math id="M2">
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mi>g</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mtext>if</mml:mtext>
<mml:mi>  </mml:mi>
<mml:mi>x</mml:mi>
<mml:mi>mod</mml:mi>
<mml:mo></mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn mathvariant="normal">0</mml:mn>
<mml:mo>,</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mi>max</mml:mi>
<mml:mo></mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn mathvariant="normal">1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mtext>otherwise</mml:mtext>
<mml:mo>.</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
In the second processing step, a backward propagation of the current max value of
<italic>f</italic>
is performed using the following equation:
<disp-formula id="EEq3">
<label>(3)</label>
<mml:math id="M3">
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mtext>if</mml:mtext>
<mml:mi>  </mml:mi>
<mml:mi>x</mml:mi>
<mml:mi>mod</mml:mi>
<mml:mo></mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn mathvariant="normal">1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mn mathvariant="normal">0</mml:mn>
<mml:mo>,</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mi>max</mml:mi>
<mml:mo></mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn mathvariant="normal">1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mtext>otherwise</mml:mtext>
<mml:mo>,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
for
<italic>x</italic>
=
<italic>M</italic>
− 1,…, 0. Note that the values in a given segment are scanned in reverse order to produce
<italic>h</italic>
(
<italic>x</italic>
) as opposed to
<italic>g</italic>
(
<italic>x</italic>
).</p>
<p> In the last processing step, the max (or min) is computed by merging the
<italic>g</italic>
and
<italic>h</italic>
arrays using the following equation: </p>
<p>
<disp-formula id="EEq4">
<label>(4)</label>
<mml:math id="M4">
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mi>max</mml:mi>
<mml:mo></mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>+</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn mathvariant="normal">2</mml:mn>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>−</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn mathvariant="normal">2</mml:mn>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mo></mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn mathvariant="normal">0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>M</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn mathvariant="normal">1</mml:mn>
<mml:mo>.</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
</p>
<p>Equations (
<xref ref-type="disp-formula" rid="EEq2">2</xref>
), (
<xref ref-type="disp-formula" rid="EEq3">3</xref>
), and (
<xref ref-type="disp-formula" rid="EEq4">4</xref>
) each require only a single comparison per array element, that is, three comparisons per sample to filter with a linear kernel of any size. The
<italic>d</italic>
-dimensional max/min filter can be computed using a kernel decomposition approach by sequentially applying the one-dimensional filter
<italic>d</italic>
times. In the two-dimensional case, only 6 comparisons per pixel are required by applying the one-dimensional HGW algorithm consecutively to the rows and columns of the input image.</p>
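The three passes of equations (2)-(4) can be modeled in software as follows. This is a hedged Python sketch: the centred-window merge with index clamping at the borders is an assumption made here for a runnable example, since the exact boundary handling of the paper is not reproduced.

```python
def hgw_running_max(f, k):
    """van Herk/Gil-Werman running max, eqs (2)-(4):
    three comparisons per sample regardless of the kernel size k.
    Window is centred, r(x) = max(g(x + k//2), h(x - k//2)),
    with indices clamped at the boundaries (an assumption)."""
    M = len(f)
    g = [0] * M
    h = [0] * M
    # Forward propagation, eq (2): reset at each segment start.
    for x in range(M):
        g[x] = f[x] if x % k == 0 else max(g[x - 1], f[x])
    # Backward propagation, eq (3): scan x = M-1, ..., 0,
    # resetting at each segment end (x mod k == k - 1).
    for x in range(M - 1, -1, -1):
        if x == M - 1 or (x + 1) % k == 0:
            h[x] = f[x]
        else:
            h[x] = max(h[x + 1], f[x])
    # Merge, eq (4): one comparison per output sample.
    half = k // 2
    return [max(g[min(x + half, M - 1)], h[max(x - half, 0)])
            for x in range(M)]
```

Away from the boundaries, the output matches the brute-force centred max over a window of k samples, while using only three comparisons per sample.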
<p>The HGW algorithm is amenable to parallelism and coarse-grain pipelining; however, the large data buffers required to store
<italic>g</italic>
(
<italic>x</italic>
),
<italic>h</italic>
(
<italic>x</italic>
) and the pipelined computation of
<italic>r</italic>
(
<italic>x</italic>
) are identified as the most challenging aspects for a hardware implementation. In this sense, the solution proposed in [
<xref ref-type="bibr" rid="B13">13</xref>
] is not fully adequate for embedded scenarios, as its memory requirements are substantial. In this paper, an FPGA-based, memory-resource-efficient architecture that exploits parallelism and pipelining is presented. The goal is to achieve an optimized embedded implementation with high throughput while reducing the dedicated on-chip Block Ram memory through efficient utilization of the distributed memory resources available in current FPGA devices.</p>
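As a back-of-the-envelope check of the throughput figures quoted in the abstract (a rough estimate assuming one pixel per clock cycle, two separable 1-D passes over the image, and neglecting pipeline-fill overhead):

```python
# Sanity check of the quoted performance: 1024 x 1024 image,
# rows pass + columns pass (kernel decomposition), 250 MHz clock.
pixels = 1024 * 1024          # samples per image
passes = 2                    # 1-D filter applied to rows, then columns
f_clk = 250e6                 # clock frequency in Hz
t = passes * pixels / f_clk   # seconds per filtered frame
print(round(t * 1e3, 2))      # -> 8.39 ms, consistent with "around 8.4 ms"
```

The corresponding frame rate, 1/t ≈ 119 frames per second, matches the roughly 120 fps stated in the abstract.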
</sec>
<sec id="sec3">
<title>3. Proposed Architecture</title>
<p>In addition to modularity and regularity, the HGW algorithm exhibits the following desirable properties for a hardware implementation [
<xref ref-type="bibr" rid="B1">1</xref>
]: (1) operations reducing to a few comparator modules, (2) local and regular data and control flow requirements, and (3) inherent pipelining and multiprocessing features. The whole architecture is examined, and its main components are described in detail in the following subsections.</p>
<sec id="sec3.1">
<title>3.1. Architecture Overview</title>
<p>A block diagram of the hardware architecture for the HGW algorithm is shown in
<xref ref-type="fig" rid="fig2">Figure 2</xref>
. A set of three comparators is required for the internal computations, which map well to the FPGA resources. The comparators are labeled forward, backward, and merge to indicate the processing step to which each belongs. The counter-based control unit synchronizes all data and control flow among the modules in the architecture. Additionally, it generates the external memory addresses, both to read data from the input memory and to store the processed data in the output memory, according to the image and kernel specifications. For simplicity, only the main control signals are shown in
<xref ref-type="fig" rid="fig2">Figure 2</xref>
.</p>
<p>The major building blocks in the proposed architecture are memory units, since the HGW algorithm is memory-centric in the sense that more resources are required for internal storage than for computation. This is an example of a so-called system-on-chip that requires frequent sharing, communication, queuing, and synchronization among distributed functional units [
<xref ref-type="bibr" rid="B16">15</xref>
].</p>
</sec>
<sec id="sec3.2">
<title>3.2. Memory Organization and Mapping</title>
<p>The architecture memory organization is based on the scheme used in [
<xref ref-type="bibr" rid="B13">13</xref>
], where three Block Ram modules were used. However, herein a different strategy is used to map logical memories to FPGA distributed memory such that function-level parallelism can be further exploited to improve scalability and performance. In [
<xref ref-type="bibr" rid="B13">13</xref>
], dual-port memories were used to ease the propagation and memory accesses. In addition, a block mirroring strategy was proposed to suppress the need for a complete backward scanning of the input stream. The mirroring scheme requires two Block Rams, RAM
<sub>2</sub>
and RAM
<sub>3</sub>
, whose depth depends on the maximum supported kernel size. The third memory unit, RAM
<sub>1</sub>
, provides temporary storage for the values computed in the forward step and synchronizes the pipeline stages, as will be explained in the following section.</p>
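To see why the block mirroring scheme removes the backward scan, the following Python sketch (an illustrative software model, not the RTL) computes the h(x) stream of equation (3) using only forward scans over mirrored k-sample segments; the handling of a short tail segment is a simplifying assumption:

```python
def h_via_mirroring(f, k):
    """Model of the block mirroring idea: reversing each k-sample
    segment turns the backward propagation of eq (3) into an ordinary
    forward running max, so the input stream is traversed only once."""
    M = len(f)
    h = [0] * M
    for start in range(0, M, k):
        seg = f[start:start + k][::-1]         # mirror the segment
        run = seg[0]
        for j, v in enumerate(seg):            # forward scan of mirrored data
            run = v if j == 0 else max(run, v)
            h[start + len(seg) - 1 - j] = run  # map back to original order
    return h
```

Each h value stored at position x equals the max of f from x to the end of its segment, exactly what the backward pass of the HGW algorithm produces.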
<p>In this paper, we use the memory resources distributed across the FPGA instead of the embedded Block Rams of fixed size (i.e., 18 Kbit for Xilinx Spartan-6 devices). Some LUTs within each configurable logic block (CLB) can optionally implement a 16 × 1-bit synchronous RAM, which can be cascaded for deeper and/or wider memories [
<xref ref-type="bibr" rid="B17">16</xref>
]. Distributed RAM writes synchronously and reads asynchronously. This property is exploited in this work to avoid the use of dual-port memories. Furthermore, the address port, in either single- or dual-port mode, is asynchronous, with an access time equivalent to a LUT logic delay.</p>
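The single-port behavior exploited here can be captured by a small behavioral model (a sketch with illustrative names, not the VHDL used in the design): because the read port is combinational, the old value at an address can be read out in the same cycle in which a new value is written in, which is why one port suffices for streaming.

```c
/* Behavioral model of a small single-port distributed RAM:
 * the write is synchronous (applied at the clock edge), while the
 * read is asynchronous (combinational). Names are illustrative. */
typedef struct { unsigned char mem[256]; } dist_ram;

unsigned char ram_read(const dist_ram *r, unsigned char addr)
{
    return r->mem[addr];               /* asynchronous read */
}

void ram_clock(dist_ram *r, unsigned char addr,
               unsigned char din, int we)
{
    if (we)                            /* synchronous write */
        r->mem[addr] = din;
}
```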
</sec>
<sec id="sec3.3">
<title>3.3. Architecture Pipeline Scheme</title>
<p>
<xref ref-type="fig" rid="fig3">Figure 3</xref>
 shows a high-level overview of the three-stage pipeline scheme used in the architecture to sustain a high output data rate. The internal memory requirements of the pipeline are met with low-overhead on-chip memory, distributed synchronous single-port RAMs, which removes the need for dual-port RAMs. The three memory units can be implemented very efficiently on FPGAs by taking advantage of concurrent synchronous writing and asynchronous reading, since only streaming operations are required on windows of at most
<italic>k</italic>
elements.</p>
<p>The coarse-grain computational stages of the pipeline can be described as follows. In the first stage, two processing tasks are performed concurrently on the incoming data stream: the max value is propagated forward, and the stream values undergo a reverse-order rearrangement in segments of size
<italic>k</italic>
. The second pipeline stage starts its operation after
<italic>k</italic>
 clock cycles; it performs the forward propagation over the previously mirrored segment, and a backward index mapping is applied to restore the original order. The third stage starts its computations after the second stage has produced
<italic>k</italic>
/2 output samples. Since the merge stage requires that the data computed by the forward and backward stages be available, its operation is delayed 3
<italic>k</italic>
/2 clock cycles; thereafter, it operates continuously on the
<italic>g</italic>
(
<italic>x</italic>
) and
<italic>h</italic>
(
<italic>x</italic>
) streams as shown in
<xref ref-type="fig" rid="fig3">Figure 3</xref>
. For synchronization purposes, the output values of the forward stage must be delayed by
<italic>k</italic>
clock cycles. This buffering is also implemented using a distributed synchronous single-port memory.</p>
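The three pipeline stages mirror the steps of the HGW algorithm. As a point of reference, the same computation can be sketched sequentially in C (this is not the hardware description: the function name, the fixed scratch size, and the restriction that the stream length be a multiple of the kernel size are our simplifications):

```c
/* Sequential sketch of the HGW running max for windows of size k.
 * g: forward max within each k-block; h: backward max within each
 * k-block; the merge step costs one comparison per output sample.
 * Assumes m is a multiple of k and m <= 4096 (sketch only). */
void hgw_max(const unsigned char *f, unsigned char *r, int m, int k)
{
    unsigned char g[4096], h[4096];        /* scratch buffers         */
    for (int x = 0; x < m; x++)            /* forward propagation     */
        g[x] = (x % k == 0) ? f[x]
                            : (f[x] > g[x - 1] ? f[x] : g[x - 1]);
    for (int x = m - 1; x >= 0; x--)       /* backward propagation    */
        h[x] = (x % k == k - 1 || x == m - 1) ? f[x]
                            : (f[x] > h[x + 1] ? f[x] : h[x + 1]);
    for (int x = 0; x + k <= m; x++)       /* merge                   */
        r[x] = h[x] > g[x + k - 1] ? h[x] : g[x + k - 1];
}
```

In the hardware, the explicit backward scan is replaced by the block-mirroring scheme so that all three steps consume data in stream order.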
<p>
<xref ref-type="fig" rid="fig4">Figure 4</xref>
shows a time diagram of an 8-bit pixel stream
<italic>f</italic>
of an input image used to illustrate the operation of the architecture when a kernel of size
<italic>k</italic>
= 5 is used. A snapshot of the main signals
<italic>g</italic>
and
<italic>h</italic>
in the data flow and computation steps of the pipelined architecture is shown in the simulation, assuming a clock frequency of 100 MHz. For simplicity, only two control signals
<italic>E</italic>
and
<italic>AddRAM</italic>
derived from the counter-based controller are shown. Note that
<italic>AddRAM</italic>
 is generated by order-reversing address counters and is used as the write and read address for the distributed memory. Each stage is active for
<italic>k</italic>
 consecutive clock cycles, and the operation of adjacent stages is delayed by
<italic>k</italic>
clock cycles. Signal
<italic>E</italic>
indicates the time when a window of the input stream has been processed. As shown in
<xref ref-type="fig" rid="fig4">Figure 4</xref>
, each comparator, after being reset by
<italic>E</italic>
, is reused for another adjacent
<italic>k</italic>
window.</p>
</sec>
<sec id="sec3.4">
<title>3.4. Parallelism Enhancement</title>
<p>Because pipelining and parallelism are naturally supported by the intrinsic resources of current FPGA devices, it is important to fully utilize them to improve performance. At a first level, the proposed architecture was divided into a set of simple functional elements that carry out the internal computations on the input stream in a pipelined fashion. However, the performance of running max/min filters on two-dimensional signals with rectangular kernels can be further improved if function-level parallelism is exploited. Thus, the HGW module can be replicated as many times as resources allow and organized in a parallel structure, as shown in
<xref ref-type="fig" rid="fig5">Figure 5</xref>
, so as to process concurrently several input streams. In this sense, the number of HGW modules depends on the capacity of the target FPGA device and the actual memory organization that provides data. A set of registers is used to store data coming from the external memory. These registers provide parallel data access to the HGW modules. A multiplexer selects the results produced by the HGW modules and sends them to the output memory.</p>
<p> To apply such a parallel scheme to running max/min filters on images, it is assumed that the input image is scanned row by row, starting from the upper-left corner sample. In addition, observing that memories can often operate much faster than the user's actual design, memory ports can be time-multiplexed to increase the number of independent accesses. In such a multipumping scheme [
<xref ref-type="bibr" rid="B16">15</xref>
], the memory system is clocked at a multiple of the main clock, providing the illusion of a multiple-port memory. Multipumping brings an area reduction if the external memory speed is significantly higher than that required by the rest of the system. Since the number of required ports and the operating frequency are modest in the proposed design, the main benefit of multipumping is a reduction of the on-chip memory area at the expense of clock frequency.</p>
</sec>
</sec>
<sec id="sec4">
<title>4. Implementation and Experimental Results</title>
<p>In this section, experimental results of the FPGA implementation, the hardware resource utilization, and the performance evaluation of the proposed architecture are presented and discussed. </p>
<sec id="sec4.1">
<title>4.1. FPGA Implementation</title>
<p>The Atlys FPGA board from Digilent Inc. has been used for prototyping and VHDL was used as the modeling language. Design parameters such as the kernel size, the image dimensions, and the number of parallel units are parameterizable, so they can be set to the appropriate values before synthesis for an optimized implementation. The architecture functionality has been validated on a set of test gray-level images from the Brodatz texture dataset.
<xref ref-type="fig" rid="fig6"> Figure 6</xref>
 shows two 1024 × 1024 test images used in the experiments and the results produced by the architecture using rectangular kernels of sizes 21 and 63 for max and min filtering. Larger kernel sizes were also tested and validated but are not presented here for space reasons.</p>
<p>
<xref ref-type="table" rid="tab1">Table 1</xref>
 shows the FPGA resource utilization and the maximum achievable frequency for three instances of the architecture using 1, 2, and 4 HGW modules. The presented results are obtained from the reports generated by the Xilinx ISE 13.1 tool suite when the design is targeted to a Spartan-6 device. The entire HGW architecture pipeline fits easily into the device thanks to the use of distributed RAM resources. Note that the three logical memory modules used in the design are mapped to LUTs in the FPGA device. Only 96 6-input LUTs are necessary to support any kernel size up to 255, that is, a 256 × 8 single-port distributed RAM. The maximum clock frequency reported by the tool is 250 MHz for a single HGW module, with less than one percent of the target device used. Thus, a potentially large number of HGW modules can be used without a considerable increase in resource utilization or speed degradation. The hardware resource utilization of a single HGW module is similar to that of the architecture proposed in [
<xref ref-type="bibr" rid="B13">13</xref>
], where 700 slices and 3 Block Rams of 18 Kbit were required. However, recall that those authors used an FPGA technology based on 4-input LUTs, whereas the target device used in this work natively supports 6-input LUTs; thus, a more compact implementation is expected. On the other hand, the use of distributed synchronous RAM makes it possible to replicate the HGW module so as to increase performance. A post-place-and-route simulation model was used to estimate the power consumption of the proposed architecture with the Xilinx XPower tool. The total power consumption of the 4-HGW design is 0.22 W, dynamic (0.18 W) plus quiescent (0.04 W) power.</p>
</sec>
<sec id="sec4.2">
<title>4.2. Performance Evaluation</title>
<p>In order to have a baseline for comparison, a straightforward implementation of min/max filtering was carried out in the C programming language. Also, the Urbach-Wilkinson algorithm [
<xref ref-type="bibr" rid="B18">17</xref>
] is used for comparison purposes, running the source code provided by its authors.
<xref ref-type="fig" rid="fig7"> Figure 7</xref>
 shows the computation times for these methods over 2160 × 1440 gray-scale images. The implementations were carried out on a MacBook Pro with an Intel Core i7 2.66 GHz processor and 4 GB of main memory, in ANSI C without multithreading, and compiled using gcc with the O3 optimization flag set. The computation time for the straightforward implementation grows prohibitively with the kernel size and is not suitable for real-time performance.</p>
<p>
<xref ref-type="fig" rid="fig8">Figure 8</xref>
 shows the near-constant processing time, around 25 milliseconds, required by the architecture to filter a 2160 × 1440 input image for different kernel sizes. Recall that the architecture must operate twice on the input image, since it uses kernel decomposition: a single HGW module clocked at 250 MHz processes the image row by row and then column by column. The architecture takes 3
<italic>k</italic>
/2 clock cycles to produce the first result.</p>
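This near-constant behavior follows from a simple first-order latency model: two full passes over the image plus the 3k/2-cycle pipeline fill. A hedged C sketch (function name is ours; memory stalls are ignored and one sample per clock cycle is assumed):

```c
/* First-order time model for one HGW module: the image is scanned
 * twice (rows, then columns) and the pipeline adds 3k/2 fill
 * cycles. Ignores memory stalls; one sample per clock assumed. */
double hgw_time_ms(int width, int height, int k, double f_mhz)
{
    double cycles = 2.0 * width * height + 1.5 * k;
    return cycles / (f_mhz * 1e3);   /* f_mhz * 1e3 cycles per ms */
}
```

For a 2160 × 1440 image at 250 MHz this gives about 24.9 ms, consistent with the measured figure of around 25 ms.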
<p>The processing time of the optimized MATLAB erosion/dilation implementation is also used for comparison. Though it is not a parallel implementation, it forms a useful baseline for this work because it uses the HGW algorithm and implements kernel decomposition.
<xref ref-type="fig" rid="fig8">Figure 8</xref>
 shows that the proposed architecture is more than 10× faster than the optimized MATLAB implementation, with a deterministic response, and that it also outperforms the Urbach-Wilkinson implementation. The very low resource utilization makes the architecture suitable for embedded applications in low-cost FPGA devices, with performance similar to that of efficient implementations on graphics processing units (GPUs) but with much lower power consumption.</p>
<p> The architecture is able to process images of different sizes, and its performance can easily be improved by replicating the HGW module.
<xref ref-type="fig" rid="fig9"> Figure 9</xref>
shows the processing times required for max/min filtering for different image sizes and for three degrees of parallelism using 1, 2, and 4 HGW modules clocked at 250 MHz, 225 MHz, and 150 MHz, respectively. Note that when 4 HGW instances are used, it is still possible to achieve more than standard real-time performance, 30 frames per second, even for high-resolution images. </p>
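To first order, replicating the HGW module divides the scan time by the number of modules, at the clock frequency achievable for that configuration. The previous time model extends naturally (function name is ours; memory contention is ignored):

```c
/* First-order time model with p parallel HGW modules clocked at
 * f_mhz: rows are distributed among the modules, so the scan time
 * divides by p, plus the 3k/2 pipeline-fill cycles. Ignores
 * memory contention. */
double hgw_time_ms_par(int w, int h, int k, int p, double f_mhz)
{
    double cycles = 2.0 * w * h / p + 1.5 * k;
    return cycles / (f_mhz * 1e3);   /* f_mhz * 1e3 cycles per ms */
}
```

With p = 4 at 150 MHz, for example, a 1920 × 1080 frame takes about 6.9 ms, well above 30 frames per second.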
</sec>
<sec id="sec4.3">
<title>4.3. Application on Local Adaptive Binarization</title>
<p> According to the results, the proposed architecture is suitable to be applied in embedded applications thanks to its real-time performance, low resource utilization, and low power consumption. In this section, a further application of the architecture for image binarization is presented. Image binarization converts gray-level or color images into binary ones in order to distinguish objects from background by finding and applying an appropriate threshold for image pixels. In document image analysis, the main goal is to extract printed characters through optical character recognition (OCR) to analyze relevant textual information from document images from sources such as books, magazines, forms, or newspapers [
<xref ref-type="bibr" rid="B19">18</xref>
]. Local thresholding methods find a threshold for each image pixel based on local characteristics and statistics of pixels within a neighborhood centered around a given pixel [
<xref ref-type="bibr" rid="B20">19</xref>
,
<xref ref-type="bibr" rid="B21">20</xref>
]. Motivated by the advantages of local adaptive thresholding and by the challenge of efficiently handling its computational cost and memory bandwidth, the proposed architecture has been applied to accelerate the computations of the Bernsen algorithm.</p>
<p>In the Bernsen algorithm [
<xref ref-type="bibr" rid="B22">21</xref>
], for each pixel of the original image with gray level
<italic>I</italic>
(
<italic>x</italic>
,
<italic>y</italic>
)∈[0,255], the local threshold,
<italic>T</italic>
(
<italic>x</italic>
,
<italic>y</italic>
), is set at the midrange value, which is the mean of the minimum and maximum gray level values in a given neighborhood:
<disp-formula id="EEq5">
<label>(5)</label>
<mml:math id="M5">
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mi>T</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn mathvariant="normal">1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn mathvariant="normal">2</mml:mn>
</mml:mrow>
</mml:mfrac>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>I</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>max</mml:mi>
<mml:mo></mml:mo>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>I</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>min</mml:mi>
<mml:mo></mml:mo>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
</p>
<p>If the contrast
<italic>c</italic>
(
<italic>x</italic>
,
<italic>y</italic>
) =
<italic>I</italic>
<sub>max⁡</sub>
(
<italic>x</italic>
,
<italic>y</italic>
) −
<italic>I</italic>
<sub>min⁡</sub>
(
<italic>x</italic>
,
<italic>y</italic>
) in the neighborhood is below a given threshold
<italic>k</italic>
, then it is assumed that the neighborhood consists of only one class, foreground or background, depending on the threshold value. Each pixel (
<italic>x</italic>
,
<italic>y</italic>
) is classified as an object pixel (indicated by value 1) or a background pixel (indicated by value 0) according to the following equation:
<disp-formula id="EEq6">
<label>(6)</label>
<mml:math id="M6">
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mi>b</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mtext>if</mml:mtext>
<mml:mi>  </mml:mi>
<mml:mi>I</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo><</mml:mo>
<mml:mi>T</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>c</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>></mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>,</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mtext>otherwise</mml:mtext>
<mml:mo>.</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
</p>
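Taken together, (5) and (6) reduce to a per-pixel decision on the local extrema. A minimal C sketch (the function name is ours; the values of Imax and Imin would be supplied by the two HGW modules):

```c
/* Bernsen per-pixel decision, following eqs. (5) and (6):
 * the threshold is the midrange of the local extrema, and pixels
 * in low-contrast neighborhoods are forced to background. */
int bernsen(int i, int imax, int imin, int k)
{
    int t = (imax + imin) / 2;         /* midrange threshold, eq. (5) */
    int c = imax - imin;               /* local contrast              */
    return (i < t && c > k) ? 1 : 0;   /* 1 = object, 0 = background  */
}
```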
<p>The bulk of the computation in this method is the calculation of the local maximum and minimum. Thus, the Bernsen algorithm maps well onto the proposed architecture, which speeds up the computations by using two HGW modules working in parallel.
<xref ref-type="fig" rid="fig10"> Figure 10</xref>
shows two input images and the corresponding binarized ones using a window of 31 × 31 and
<italic>k</italic>
 = 60. The processing time to binarize a 1024 × 1024 image with this window size is 8.4 milliseconds, that is, 120 frames per second, at a maximum clock frequency of 250 MHz, when the architecture is targeted to a Spartan-6 device. This corresponds to a throughput of about 125 Mpixels/second, enough for real-time image processing. Recall, however, that real time is a context-relative measure.</p>
</sec>
</sec>
<sec id="sec5">
<title>5. Conclusions</title>
<p>An efficient implementation of a fast algorithm for arbitrary-length max/min filters has been presented. The proposed architecture is regular and scalable, with a good resource-performance tradeoff that makes it suitable for embedding in low-cost FPGA devices. The design takes advantage of the distributed memory resources available in current programmable devices without introducing a significant performance penalty. However, for very large kernel sizes, the area devoted to distributed memory increases rapidly and the operating frequency might drop significantly; this motivates the use of the specialized Block Rams as a more efficient solution in that regime. The results show that the proposed implementation achieves the same throughput with fewer memory resources than the previously reported solution. The architecture, when targeted to a Spartan-6 device, can compute a running max/min filter over a 1024 × 1024 image with a kernel size of up to 255 in 8.4 milliseconds at a maximum clock frequency of 250 MHz. This performance is sufficient for real-time full-HD video processing. The progress of high-resolution image applications on embedded systems requires reviewing existing solutions in this context and proposing hardware accelerators that can provide practical, compact, and low-power solutions. Future work includes analyzing in detail the power consumption of the proposed implementation, further extending the applicability of the proposed hardware to local adaptive thresholding, and implementing specialized operators for gray-level image morphology.</p>
</sec>
</body>
<back>
<ack>
<title>Acknowledgment</title>
<p> The author gratefully acknowledges the partial support received from CONACyT, Mexico, through Research Grant no. 99912.</p>
</ack>
<ref-list>
<ref id="B1">
<label>1</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brookes</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Algorithms for max and min filters with improved worst-case performance</article-title>
<source>
<italic>IEEE Transactions on Circuits and Systems II</italic>
</source>
<year>2000</year>
<volume>47</volume>
<issue>9</issue>
<fpage>930</fpage>
<lpage>935</lpage>
<pub-id pub-id-type="other">2-s2.0-0034269522</pub-id>
</element-citation>
</ref>
<ref id="B2">
<label>2</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Maragos</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Schafer</surname>
<given-names>RW</given-names>
</name>
</person-group>
<article-title>Morphological filters. II. Their relations to median, order-statistic, and stack filters</article-title>
<source>
<italic>IEEE Transactions on Acoustics, Speech, and Signal Processing</italic>
</source>
<year>1987</year>
<volume>35</volume>
<issue>8</issue>
<fpage>1170</fpage>
<lpage>1184</lpage>
<pub-id pub-id-type="other">2-s2.0-0023400430</pub-id>
</element-citation>
</ref>
<ref id="B3">
<label>3</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Arce</surname>
<given-names>GR</given-names>
</name>
<name>
<surname>McLoughlin</surname>
<given-names>MP</given-names>
</name>
</person-group>
<article-title>Theoretical analysis of the max/median filter</article-title>
<source>
<italic>IEEE Transactions on Acoustics, Speech, and Signal Processing</italic>
</source>
<year>1987</year>
<volume>35</volume>
<issue>1</issue>
<fpage>60</fpage>
<lpage>69</lpage>
<pub-id pub-id-type="other">2-s2.0-0023164597</pub-id>
</element-citation>
</ref>
<ref id="B4">
<label>4</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Maragos</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Schafer</surname>
<given-names>RW</given-names>
</name>
</person-group>
<article-title>Morphological filters. I. Their set-theoretic analysis and relations to linear shift-invariant filters</article-title>
<source>
<italic>IEEE Transactions on Acoustics, Speech, and Signal Processing</italic>
</source>
<year>1987</year>
<volume>35</volume>
<issue>8</issue>
<fpage>1153</fpage>
<lpage>1169</lpage>
<pub-id pub-id-type="other">2-s2.0-0023400884</pub-id>
</element-citation>
</ref>
<ref id="B5">
<label>5</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hodgson</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Bailey</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Naylor</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Ng</surname>
<given-names>A</given-names>
</name>
<name>
<surname>McNeill</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Properties, implementations and applications of rank filters</article-title>
<source>
<italic>Image and Vision Computing</italic>
</source>
<year>1985</year>
<volume>3</volume>
<issue>1</issue>
<fpage>3</fpage>
<lpage>14</lpage>
<pub-id pub-id-type="other">2-s2.0-0022011780</pub-id>
</element-citation>
</ref>
<ref id="B6">
<label>6</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gil</surname>
<given-names>JY</given-names>
</name>
<name>
<surname>Kimmel</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Efficient dilation, erosion, opening, and closing algorithms</article-title>
<source>
<italic>IEEE Transactions on Pattern Analysis and Machine Intelligence</italic>
</source>
<year>2002</year>
<volume>24</volume>
<issue>12</issue>
<fpage>1606</fpage>
<lpage>1617</lpage>
<pub-id pub-id-type="other">2-s2.0-0036941063</pub-id>
</element-citation>
</ref>
<ref id="B7">
<label>7</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yuan</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Atallah</surname>
<given-names>MJ</given-names>
</name>
</person-group>
<article-title>Running max/min filters using 1+o(1) comparisons per sample</article-title>
<source>
<italic>IEEE Transactions on Pattern Analysis and Machine Intelligence</italic>
</source>
<year>2011</year>
<volume>33</volume>
<issue>12</issue>
<fpage>2544</fpage>
<lpage>2548</lpage>
<pub-id pub-id-type="other">2-s2.0-80054915025</pub-id>
</element-citation>
</ref>
<ref id="B8">
<label>8</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>van Herk</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>A fast algorithm for local minimum and maximum filters on rectangular and octagonal kernels</article-title>
<source>
<italic>Pattern Recognition Letters</italic>
</source>
<year>1992</year>
<volume>13</volume>
<issue>7</issue>
<fpage>517</fpage>
<lpage>521</lpage>
<pub-id pub-id-type="other">2-s2.0-0001189475</pub-id>
</element-citation>
</ref>
<ref id="B9">
<label>9</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gil</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Werman</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Computing 2-D min median and max filters</article-title>
<source>
<italic>IEEE Transactions on Pattern Analysis and Machine Intelligence</italic>
</source>
<year>1993</year>
<volume>15</volume>
<issue>5</issue>
<fpage>504</fpage>
<lpage>507</lpage>
<pub-id pub-id-type="other">2-s2.0-0027590120</pub-id>
</element-citation>
</ref>
<ref id="B10">
<label>10</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Diamantaras</surname>
<given-names>KI</given-names>
</name>
<name>
<surname>Kung</surname>
<given-names>SY</given-names>
</name>
</person-group>
<article-title>A linear systolic array for real-time morphological image processing</article-title>
<source>
<italic>Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology</italic>
</source>
<year>1997</year>
<volume>17</volume>
<issue>1</issue>
<fpage>43</fpage>
<lpage>55</lpage>
<pub-id pub-id-type="other">2-s2.0-0031222122</pub-id>
</element-citation>
</ref>
<ref id="B11">
<label>11</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Torres-Huitzil</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Arias-Estrada</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Real-time image processing with a compact FPGA-based systolic architecture</article-title>
<source>
<italic>Real-Time Imaging</italic>
</source>
<year>2004</year>
<volume>10</volume>
<issue>3</issue>
<fpage>177</fpage>
<lpage>187</lpage>
<pub-id pub-id-type="other">2-s2.0-4444342086</pub-id>
</element-citation>
</ref>
<ref id="B12">
<label>12</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chien</surname>
<given-names>S-Y</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>S-Y</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>L-G</given-names>
</name>
</person-group>
<article-title>Partial-result-reuse architecture and its design technique for morphological operations with flat structuring elements</article-title>
<source>
<italic>IEEE Transactions on Circuits and Systems for Video Technology</italic>
</source>
<year>2005</year>
<volume>15</volume>
<issue>9</issue>
<fpage>1156</fpage>
<lpage>1168</lpage>
<pub-id pub-id-type="other">2-s2.0-27644457899</pub-id>
</element-citation>
</ref>
<ref id="B13">
<label>13</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Clienti</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Bilodeau</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Beucher</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>An efficient hardware architecture without line memories for morphological image processing</article-title>
<source>
<italic>Proceedings of the 10th International Conference on Advanced Concepts for Intelligent Vision Systems (ACIVS '08)</italic>
</source>
<year>2008</year>
<volume>5259</volume>
<publisher-loc>Berlin, Germany</publisher-loc>
<publisher-name>Springer</publisher-name>
<fpage>147</fpage>
<lpage>156</lpage>
<series>Lecture Notes in Computer Science</series>
<pub-id pub-id-type="other">2-s2.0-57049131544</pub-id>
</element-citation>
</ref>
<ref id="B14">
<label>14</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Déforges</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Normand</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Babel</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Fast recursive grayscale morphology operators: from the algorithm to the pipeline architecture</article-title>
<source>
<italic>Journal of Real-Time Image Processing</italic>
</source>
<year>2010</year>
<fpage>1</fpage>
<lpage>10</lpage>
<pub-id pub-id-type="other">2-s2.0-77954993498</pub-id>
</element-citation>
</ref>
<ref id="B16">
<label>15</label>
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>LaForest</surname>
<given-names>CE</given-names>
</name>
<name>
<surname>Steffan</surname>
<given-names>JG</given-names>
</name>
</person-group>
<article-title>Efficient multi-ported memories for FPGAs</article-title>
<conf-name>Proceedings of the 18th ACM SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA ’10)</conf-name>
<conf-date>February 2010</conf-date>
<conf-loc>New York, NY, USA</conf-loc>
<publisher-name>ACM</publisher-name>
<fpage>41</fpage>
<lpage>50</lpage>
<pub-id pub-id-type="other">2-s2.0-77951616343</pub-id>
</element-citation>
</ref>
<ref id="B17">
<label>16</label>
<element-citation publication-type="book">
<collab>Inc Xilinx</collab>
<source>
<italic>Using Look-up Tables as Distributed RAM in Spartan-3 Generation FPGAs</italic>
</source>
<comment>Application Note XAPP464, March 2005</comment>
</element-citation>
</ref>
<ref id="B18">
<label>17</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Urbach</surname>
<given-names>ER</given-names>
</name>
<name>
<surname>Wilkinson</surname>
<given-names>MHF</given-names>
</name>
</person-group>
<article-title>Efficient 2-D grayscale morphological transformations with arbitrary flat structuring elements</article-title>
<source>
<italic>IEEE Transactions on Image Processing</italic>
</source>
<year>2008</year>
<volume>17</volume>
<issue>1</issue>
<fpage>1</fpage>
<lpage>8</lpage>
<pub-id pub-id-type="other">2-s2.0-39149125517</pub-id>
<pub-id pub-id-type="pmid">18229799</pub-id>
</element-citation>
</ref>
<ref id="B19">
<label>18</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pai</surname>
<given-names>Y-T</given-names>
</name>
<name>
<surname>Chang</surname>
<given-names>Y-F</given-names>
</name>
<name>
<surname>Ruan</surname>
<given-names>S-J</given-names>
</name>
</person-group>
<article-title>Adaptive thresholding algorithm: efficient computation technique based on intelligent block detection for degraded document images</article-title>
<source>
<italic>Pattern Recognition</italic>
</source>
<year>2010</year>
<volume>43</volume>
<issue>9</issue>
<fpage>3177</fpage>
<lpage>3187</lpage>
<pub-id pub-id-type="other">2-s2.0-78649322053</pub-id>
</element-citation>
</ref>
<ref id="B20">
<label>19</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Trier</surname>
<given-names>OD</given-names>
</name>
<name>
<surname>Jain</surname>
<given-names>AK</given-names>
</name>
</person-group>
<article-title>Goal-directed evaluation of binarization methods</article-title>
<source>
<italic>IEEE Transactions on Pattern Analysis and Machine Intelligence</italic>
</source>
<year>1995</year>
<volume>17</volume>
<issue>12</issue>
<fpage>1191</fpage>
<lpage>1201</lpage>
<pub-id pub-id-type="other">2-s2.0-0029547060</pub-id>
</element-citation>
</ref>
<ref id="B21">
<label>20</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sezgin</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Sankur</surname>
<given-names>B</given-names>
</name>
</person-group>
<article-title>Survey over image thresholding techniques and quantitative performance evaluation</article-title>
<source>
<italic>Journal of Electronic Imaging</italic>
</source>
<year>2004</year>
<volume>13</volume>
<issue>1</issue>
<fpage>146</fpage>
<lpage>168</lpage>
<pub-id pub-id-type="other">2-s2.0-1842422015</pub-id>
</element-citation>
</ref>
<ref id="B22">
<label>21</label>
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Bernsen</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Dynamic thresholding of gray level images</article-title>
<conf-name>Proceedings of IEEE International Conference on Pattern Recognition (ICPR ’86)</conf-name>
<conf-date>1986</conf-date>
<fpage>1251</fpage>
<lpage>1255</lpage>
</element-citation>
</ref>
</ref-list>
</back>
<floats-group>
<fig id="fig1" orientation="portrait" position="float">
<label>Figure 1</label>
<caption>
<p>A simplified view of the main steps, forward, backward, and merge, of the HGW algorithm for a linear kernel of size
<italic>k</italic>
over an input sequence
<italic>f</italic>
of size
<italic>M</italic>
.</p>
</caption>
<graphic xlink:href="TSWJ2013-108103.001"></graphic>
</fig>
<fig id="fig2" orientation="portrait" position="float">
<label>Figure 2</label>
<caption>
<p>Block diagram of the proposed hardware architecture for running max/min filters based on the HGW algorithm and its main components.</p>
</caption>
<graphic xlink:href="TSWJ2013-108103.002"></graphic>
</fig>
<fig id="fig3" orientation="portrait" position="float">
<label>Figure 3</label>
<caption>
<p>Pipeline scheme used in the HGW architecture. The computations and data flow are organized around a three-stage pipeline, forward, backward, and merge.</p>
</caption>
<graphic xlink:href="TSWJ2013-108103.003"></graphic>
</fig>
<fig id="fig4" orientation="portrait" position="float">
<label>Figure 4</label>
<caption>
<p> Timing diagram snapshot of the architecture functionality for running max filtering over the input pixel stream,
<italic>f</italic>
, using a kernel of size
<italic>k</italic>
= 5. The first output result is produced at 12 microseconds as indicated by the vertical line; then, results are produced on each clock cycle. The signal
<italic>AddRAM</italic>
shows the addresses generated through time used for accessing the synchronous single-port memories.</p>
</caption>
<graphic xlink:href="TSWJ2013-108103.004"></graphic>
</fig>
<fig id="fig5" orientation="portrait" position="float">
<label>Figure 5</label>
<caption>
<p> Organization of several HGW processing units to exploit function-level parallelism. The address generator unit works at a clock speed
<italic>n</italic>
times faster than the computational modules.</p>
</caption>
<graphic xlink:href="TSWJ2013-108103.005"></graphic>
</fig>
<fig id="fig6" orientation="portrait" position="float">
<label>Figure 6</label>
<caption>
<p>Examples of 1024 × 1024 test images used to validate the architecture functionality. ((a) and (d)) the input images and (b) max filter by 21 × 21, (c) max filter by 63 × 63, (e) min filter by 21 × 21, and (f) min filter by 63 × 63.</p>
</caption>
<graphic xlink:href="TSWJ2013-108103.006"></graphic>
</fig>
<fig id="fig7" orientation="portrait" position="float">
<label>Figure 7</label>
<caption>
<p>Processing time for the running max/min filters on a 2160 × 1440 input image with different kernel sizes for a straightforward implementation and the Urbach-Wilkinson algorithm.</p>
</caption>
<graphic xlink:href="TSWJ2013-108103.007"></graphic>
</fig>
<fig id="fig8" orientation="portrait" position="float">
<label>Figure 8</label>
<caption>
<p>Processing time for the running max/min filters on a 2160 × 1440 input image with different kernel sizes using a single HGW module in the proposed architecture clocked at 250 MHz. The processing time required for the Urbach-Wilkinson algorithm and the MATLAB implementation of the HGW algorithm is also shown for comparison.</p>
</caption>
<graphic xlink:href="TSWJ2013-108103.008"></graphic>
</fig>
<fig id="fig9" orientation="portrait" position="float">
<label>Figure 9</label>
<caption>
<p>Processing time for different image sizes using three different degrees of parallelism.</p>
</caption>
<graphic xlink:href="TSWJ2013-108103.009"></graphic>
</fig>
<fig id="fig10" orientation="portrait" position="float">
<label>Figure 10</label>
<caption>
<p>Test images used to show the applicability of the architecture for local adaptive thresholding. ((a) and (c)) The input images; ((b) and (d)) thresholded images by using the Bernsen algorithm with a 31 × 31 window and
<italic>k</italic>
= 60.</p>
</caption>
<graphic xlink:href="TSWJ2013-108103.010"></graphic>
</fig>
<table-wrap id="tab1" orientation="portrait" position="float">
<label>Table 1</label>
<caption>
<p>Summary of the hardware resource utilization for the proposed architecture targeted to a Xilinx Spartan-6 LX45 device, for different numbers of instances of the HGW module.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">Resource utilization (total available) </th>
<th align="center" rowspan="1" colspan="1"> 1-HGW </th>
<th align="center" rowspan="1" colspan="1"> 2-HGW </th>
<th align="center" rowspan="1" colspan="1"> 4-HGW </th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Slice registers (54576) </td>
<td align="center" rowspan="1" colspan="1">75 </td>
<td align="center" rowspan="1" colspan="1">150 </td>
<td align="center" rowspan="1" colspan="1">216</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Slice LUTs (27288) </td>
<td align="center" rowspan="1" colspan="1">258 </td>
<td align="center" rowspan="1" colspan="1">566 </td>
<td align="center" rowspan="1" colspan="1">910</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"> LUTs used as logic (27288) </td>
<td align="center" rowspan="1" colspan="1">159 </td>
<td align="center" rowspan="1" colspan="1">368 </td>
<td align="center" rowspan="1" colspan="1">518</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"> LUTs as memory (6408) </td>
<td align="center" rowspan="1" colspan="1">96 </td>
<td align="center" rowspan="1" colspan="1">192 </td>
<td align="center" rowspan="1" colspan="1">384 </td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">RAMB16BWERs (116) </td>
<td align="center" rowspan="1" colspan="1">0 </td>
<td align="center" rowspan="1" colspan="1">0 </td>
<td align="center" rowspan="1" colspan="1">0 </td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">RAMB8BWERs (232) </td>
<td align="center" rowspan="1" colspan="1">0 </td>
<td align="center" rowspan="1" colspan="1">0 </td>
<td align="center" rowspan="1" colspan="1">0 </td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Maximum frequency </td>
<td align="center" rowspan="1" colspan="1">250 MHz </td>
<td align="center" rowspan="1" colspan="1">232 MHz </td>
<td align="center" rowspan="1" colspan="1">215 MHz </td>
</tr>
</tbody>
</table>
</table-wrap>
</floats-group>
</pmc>
<affiliations>
<list>
<country>
<li>Mexique</li>
</country>
</list>
<tree>
<country name="Mexique">
<noRegion>
<name sortKey="Torres Huitzil, Cesar" sort="Torres Huitzil, Cesar" uniqKey="Torres Huitzil C" first="Cesar" last="Torres-Huitzil">Cesar Torres-Huitzil</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Ncbi/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000177 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd -nk 000177 | SxmlIndent | more

To add a link to this page in the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Ncbi
   |étape=   Merge
   |type=    RBID
   |clé=     PMC:3833061
   |texte=   Resource Efficient Hardware Architecture for Fast Computation of Running Max/Min Filters
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/RBID.i   -Sk "pubmed:24288456" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd   \
       | NlmPubMed2Wicri -a OcrV1 

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024