Cyberinfrastructure Exploration Server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Performance, optimization, and fitness: Connecting applications to architectures

Internal identifier: 000786 (Istex/Corpus); previous: 000785; next: 000787

Performance, optimization, and fitness: Connecting applications to architectures

Auteurs : Mohammad A. Bhuiyan ; Melissa C. Smith ; Vivek K. Pallipuram

Source:

RBID : ISTEX:E43C7B0758A3069211F596C690F59417BBC0471F

English descriptors

Abstract

Recent trends involving multicore processors and graphical processing units (GPUs) focus on exploiting task‐ and thread‐level parallelism. In this paper, we have analyzed various aspects of the performance of these architectures including NVIDIA GPUs, and multicore processors such as Intel Xeon, AMD Opteron, IBM's Cell Broadband Engine. The case study used in this paper is a biological spiking neural network (SNN), implemented with the Izhikevich, Wilson, Morris–Lecar, and Hodgkin–Huxley neuron models. The four SNN models have varying requirements for communication and computation making them useful for performance analysis of the hardware platforms. We report and analyze the variation of performance with network (problem size) scaling, available optimization techniques and execution configuration. A Fitness performance model, that predicts the suitability of the architecture for accelerating an application, is proposed and verified with the SNN implementation results. The Roofline model, another existing performance model, has also been utilized to determine the hardware bottleneck(s) and attainable peak performance of the architectures. Significant speedups for the four SNN neuron models utilizing these architectures are reported; the maximum speedup of 574x was observed in our GPU implementation. Our results and analysis show that a proper match of architecture with algorithm complexity provides the best performance. Copyright © 2010 John Wiley & Sons, Ltd.
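The Roofline model referenced in the abstract bounds a kernel's attainable performance by the lesser of the machine's peak compute rate and the product of its peak memory bandwidth and the kernel's arithmetic intensity. A minimal sketch of that calculation follows; the peak figures used are hypothetical round numbers, not the measured characteristics of the paper's test platforms:

```python
# Sketch of the Roofline performance model:
# attainable GFLOP/s = min(peak compute, peak bandwidth * arithmetic intensity)

def roofline(peak_gflops: float, peak_bw_gbs: float, ai_flops_per_byte: float) -> float:
    """Attainable performance (GFLOP/s) for a kernel of given arithmetic intensity."""
    return min(peak_gflops, peak_bw_gbs * ai_flops_per_byte)

# Hypothetical machine: 900 GFLOP/s peak compute, 150 GB/s peak memory bandwidth.
# The ridge point (intensity above which the kernel is compute-bound) is 900/150 = 6 FLOP/byte.
print(roofline(900.0, 150.0, 0.5))   # memory-bound kernel: 75.0 GFLOP/s
print(roofline(900.0, 150.0, 10.0))  # compute-bound kernel: 900.0 GFLOP/s
```

A kernel below the ridge point is limited by memory traffic, one above it by raw compute; this is the mechanism the paper uses to identify each architecture's hardware bottleneck.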

URL:
DOI: 10.1002/cpe.1688

Links to Exploration step

ISTEX:E43C7B0758A3069211F596C690F59417BBC0471F

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Performance, optimization, and fitness: Connecting applications to architectures</title>
<author>
<name sortKey="Bhuiyan, Mohammad A" sort="Bhuiyan, Mohammad A" uniqKey="Bhuiyan M" first="Mohammad A." last="Bhuiyan">Mohammad A. Bhuiyan</name>
<affiliation>
<mods:affiliation>Department of Electrical and Computer Engineering, Clemson University, Clemson, SC 29634, U.S.A.</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Smith, Melissa C" sort="Smith, Melissa C" uniqKey="Smith M" first="Melissa C." last="Smith">Melissa C. Smith</name>
<affiliation>
<mods:affiliation>Department of Electrical and Computer Engineering, Clemson University, Clemson, SC 29634, U.S.A.</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Pallipuram, Vivek K" sort="Pallipuram, Vivek K" uniqKey="Pallipuram V" first="Vivek K." last="Pallipuram">Vivek K. Pallipuram</name>
<affiliation>
<mods:affiliation>Department of Electrical and Computer Engineering, Clemson University, Clemson, SC 29634, U.S.A.</mods:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:E43C7B0758A3069211F596C690F59417BBC0471F</idno>
<date when="2011" year="2011">2011</date>
<idno type="doi">10.1002/cpe.1688</idno>
<idno type="url">https://api.istex.fr/document/E43C7B0758A3069211F596C690F59417BBC0471F/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000786</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Performance, optimization, and fitness: Connecting applications to architectures</title>
<author>
<name sortKey="Bhuiyan, Mohammad A" sort="Bhuiyan, Mohammad A" uniqKey="Bhuiyan M" first="Mohammad A." last="Bhuiyan">Mohammad A. Bhuiyan</name>
<affiliation>
<mods:affiliation>Department of Electrical and Computer Engineering, Clemson University, Clemson, SC 29634, U.S.A.</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Smith, Melissa C" sort="Smith, Melissa C" uniqKey="Smith M" first="Melissa C." last="Smith">Melissa C. Smith</name>
<affiliation>
<mods:affiliation>Department of Electrical and Computer Engineering, Clemson University, Clemson, SC 29634, U.S.A.</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Pallipuram, Vivek K" sort="Pallipuram, Vivek K" uniqKey="Pallipuram V" first="Vivek K." last="Pallipuram">Vivek K. Pallipuram</name>
<affiliation>
<mods:affiliation>Department of Electrical and Computer Engineering, Clemson University, Clemson, SC 29634, U.S.A.</mods:affiliation>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Concurrency and Computation: Practice and Experience</title>
<title level="j" type="abbrev">Concurrency Computat.: Pract. Exper.</title>
<idno type="ISSN">1532-0626</idno>
<idno type="eISSN">1532-0634</idno>
<imprint>
<publisher>John Wiley & Sons, Ltd.</publisher>
<pubPlace>Chichester, UK</pubPlace>
<date type="published" when="2011-07">2011-07</date>
<biblScope unit="volume">23</biblScope>
<biblScope unit="issue">10</biblScope>
<biblScope unit="page" from="1066">1066</biblScope>
<biblScope unit="page" to="1100">1100</biblScope>
</imprint>
<idno type="ISSN">1532-0626</idno>
</series>
<idno type="istex">E43C7B0758A3069211F596C690F59417BBC0471F</idno>
<idno type="DOI">10.1002/cpe.1688</idno>
<idno type="ArticleID">CPE1688</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">1532-0626</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Fitness model</term>
<term>GPU</term>
<term>multicore</term>
<term>optimization</term>
<term>performance</term>
</keywords>
</textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Recent trends involving multicore processors and graphical processing units (GPUs) focus on exploiting task‐ and thread‐level parallelism. In this paper, we have analyzed various aspects of the performance of these architectures including NVIDIA GPUs, and multicore processors such as Intel Xeon, AMD Opteron, IBM's Cell Broadband Engine. The case study used in this paper is a biological spiking neural network (SNN), implemented with the Izhikevich, Wilson, Morris–Lecar, and Hodgkin–Huxley neuron models. The four SNN models have varying requirements for communication and computation making them useful for performance analysis of the hardware platforms. We report and analyze the variation of performance with network (problem size) scaling, available optimization techniques and execution configuration. A Fitness performance model, that predicts the suitability of the architecture for accelerating an application, is proposed and verified with the SNN implementation results. The Roofline model, another existing performance model, has also been utilized to determine the hardware bottleneck(s) and attainable peak performance of the architectures. Significant speedups for the four SNN neuron models utilizing these architectures are reported; the maximum speedup of 574x was observed in our GPU implementation. Our results and analysis show that a proper match of architecture with algorithm complexity provides the best performance. Copyright © 2010 John Wiley & Sons, Ltd.</div>
</front>
</TEI>
<istex>
<corpusName>wiley</corpusName>
<author>
<json:item>
<name>Mohammad A. Bhuiyan</name>
<affiliations>
<json:string>Department of Electrical and Computer Engineering, Clemson University, Clemson, SC 29634, U.S.A.</json:string>
</affiliations>
</json:item>
<json:item>
<name>Melissa C. Smith</name>
<affiliations>
<json:string>Department of Electrical and Computer Engineering, Clemson University, Clemson, SC 29634, U.S.A.</json:string>
</affiliations>
</json:item>
<json:item>
<name>Vivek K. Pallipuram</name>
<affiliations>
<json:string>Department of Electrical and Computer Engineering, Clemson University, Clemson, SC 29634, U.S.A.</json:string>
</affiliations>
</json:item>
</author>
<subject>
<json:item>
<lang>
<json:string>eng</json:string>
</lang>
<value>performance</value>
</json:item>
<json:item>
<lang>
<json:string>eng</json:string>
</lang>
<value>multicore</value>
</json:item>
<json:item>
<lang>
<json:string>eng</json:string>
</lang>
<value>GPU</value>
</json:item>
<json:item>
<lang>
<json:string>eng</json:string>
</lang>
<value>optimization</value>
</json:item>
<json:item>
<lang>
<json:string>eng</json:string>
</lang>
<value>Fitness model</value>
</json:item>
</subject>
<articleId>
<json:string>CPE1688</json:string>
</articleId>
<language>
<json:string>eng</json:string>
</language>
<originalGenre>
<json:string>article</json:string>
</originalGenre>
<abstract>Recent trends involving multicore processors and graphical processing units (GPUs) focus on exploiting task‐ and thread‐level parallelism. In this paper, we have analyzed various aspects of the performance of these architectures including NVIDIA GPUs, and multicore processors such as Intel Xeon, AMD Opteron, IBM's Cell Broadband Engine. The case study used in this paper is a biological spiking neural network (SNN), implemented with the Izhikevich, Wilson, Morris–Lecar, and Hodgkin–Huxley neuron models. The four SNN models have varying requirements for communication and computation making them useful for performance analysis of the hardware platforms. We report and analyze the variation of performance with network (problem size) scaling, available optimization techniques and execution configuration. A Fitness performance model, that predicts the suitability of the architecture for accelerating an application, is proposed and verified with the SNN implementation results. The Roofline model, another existing performance model, has also been utilized to determine the hardware bottleneck(s) and attainable peak performance of the architectures. Significant speedups for the four SNN neuron models utilizing these architectures are reported; the maximum speedup of 574x was observed in our GPU implementation. Our results and analysis show that a proper match of architecture with algorithm complexity provides the best performance. Copyright © 2010 John Wiley & Sons, Ltd.</abstract>
<qualityIndicators>
<score>7.496</score>
<pdfVersion>1.3</pdfVersion>
<pdfPageSize>593.887 x 782.362 pts</pdfPageSize>
<refBibsNative>true</refBibsNative>
<keywordCount>5</keywordCount>
<abstractCharCount>1484</abstractCharCount>
<pdfWordCount>15448</pdfWordCount>
<pdfCharCount>92191</pdfCharCount>
<pdfPageCount>35</pdfPageCount>
<abstractWordCount>208</abstractWordCount>
</qualityIndicators>
<title>Performance, optimization, and fitness: Connecting applications to architectures</title>
<genre>
<json:string>article</json:string>
</genre>
<host>
<volume>23</volume>
<publisherId>
<json:string>CPE</json:string>
</publisherId>
<pages>
<total>35</total>
<last>1100</last>
<first>1066</first>
</pages>
<issn>
<json:string>1532-0626</json:string>
</issn>
<issue>10</issue>
<subject>
<json:item>
<value>Research Article</value>
</json:item>
</subject>
<genre>
<json:string>journal</json:string>
</genre>
<language>
<json:string>unknown</json:string>
</language>
<eissn>
<json:string>1532-0634</json:string>
</eissn>
<title>Concurrency and Computation: Practice and Experience</title>
<doi>
<json:string>10.1002/(ISSN)1532-0634</json:string>
</doi>
</host>
<publicationDate>2011</publicationDate>
<copyrightDate>2011</copyrightDate>
<doi>
<json:string>10.1002/cpe.1688</json:string>
</doi>
<id>E43C7B0758A3069211F596C690F59417BBC0471F</id>
<score>0.06944263</score>
<fulltext>
<json:item>
<original>true</original>
<mimetype>application/pdf</mimetype>
<extension>pdf</extension>
<uri>https://api.istex.fr/document/E43C7B0758A3069211F596C690F59417BBC0471F/fulltext/pdf</uri>
</json:item>
<json:item>
<original>false</original>
<mimetype>application/zip</mimetype>
<extension>zip</extension>
<uri>https://api.istex.fr/document/E43C7B0758A3069211F596C690F59417BBC0471F/fulltext/zip</uri>
</json:item>
<istex:fulltextTEI uri="https://api.istex.fr/document/E43C7B0758A3069211F596C690F59417BBC0471F/fulltext/tei">
<teiHeader>
<fileDesc>
<titleStmt>
<title level="a" type="main" xml:lang="en">Performance, optimization, and fitness: Connecting applications to architectures</title>
</titleStmt>
<publicationStmt>
<authority>ISTEX</authority>
<publisher>John Wiley & Sons, Ltd.</publisher>
<pubPlace>Chichester, UK</pubPlace>
<availability>
<p>Copyright © 2010 John Wiley & Sons, Ltd.</p>
</availability>
<date>2011</date>
</publicationStmt>
<sourceDesc>
<biblStruct type="inbook">
<analytic>
<title level="a" type="main" xml:lang="en">Performance, optimization, and fitness: Connecting applications to architectures</title>
<author xml:id="author-1">
<persName>
<forename type="first">Mohammad A.</forename>
<surname>Bhuiyan</surname>
</persName>
<affiliation>Department of Electrical and Computer Engineering, Clemson University, Clemson, SC 29634, U.S.A.</affiliation>
</author>
<author xml:id="author-2">
<persName>
<forename type="first">Melissa C.</forename>
<surname>Smith</surname>
</persName>
<note type="correspondence">
<p>Correspondence: Department of Electrical and Computer Engineering, Clemson University, Clemson, SC 29634, U.S.A.</p>
</note>
<affiliation>Department of Electrical and Computer Engineering, Clemson University, Clemson, SC 29634, U.S.A.</affiliation>
</author>
<author xml:id="author-3">
<persName>
<forename type="first">Vivek K.</forename>
<surname>Pallipuram</surname>
</persName>
<affiliation>Department of Electrical and Computer Engineering, Clemson University, Clemson, SC 29634, U.S.A.</affiliation>
</author>
</analytic>
<monogr>
<title level="j">Concurrency and Computation: Practice and Experience</title>
<title level="j" type="abbrev">Concurrency Computat.: Pract. Exper.</title>
<idno type="pISSN">1532-0626</idno>
<idno type="eISSN">1532-0634</idno>
<idno type="DOI">10.1002/(ISSN)1532-0634</idno>
<imprint>
<publisher>John Wiley & Sons, Ltd.</publisher>
<pubPlace>Chichester, UK</pubPlace>
<date type="published" when="2011-07"></date>
<biblScope unit="volume">23</biblScope>
<biblScope unit="issue">10</biblScope>
<biblScope unit="page" from="1066">1066</biblScope>
<biblScope unit="page" to="1100">1100</biblScope>
</imprint>
</monogr>
<idno type="istex">E43C7B0758A3069211F596C690F59417BBC0471F</idno>
<idno type="DOI">10.1002/cpe.1688</idno>
<idno type="ArticleID">CPE1688</idno>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<creation>
<date>2011</date>
</creation>
<langUsage>
<language ident="en">en</language>
</langUsage>
<abstract xml:lang="en">
<p>Recent trends involving multicore processors and graphical processing units (GPUs) focus on exploiting task‐ and thread‐level parallelism. In this paper, we have analyzed various aspects of the performance of these architectures including NVIDIA GPUs, and multicore processors such as Intel Xeon, AMD Opteron, IBM's Cell Broadband Engine. The case study used in this paper is a biological spiking neural network (SNN), implemented with the Izhikevich, Wilson, Morris–Lecar, and Hodgkin–Huxley neuron models. The four SNN models have varying requirements for communication and computation making them useful for performance analysis of the hardware platforms. We report and analyze the variation of performance with network (problem size) scaling, available optimization techniques and execution configuration. A Fitness performance model, that predicts the suitability of the architecture for accelerating an application, is proposed and verified with the SNN implementation results. The Roofline model, another existing performance model, has also been utilized to determine the hardware bottleneck(s) and attainable peak performance of the architectures. Significant speedups for the four SNN neuron models utilizing these architectures are reported; the maximum speedup of 574x was observed in our GPU implementation. Our results and analysis show that a proper match of architecture with algorithm complexity provides the best performance. Copyright © 2010 John Wiley & Sons, Ltd.</p>
</abstract>
<textClass xml:lang="en">
<keywords scheme="keyword">
<list>
<head>keywords</head>
<item>
<term>performance</term>
</item>
<item>
<term>multicore</term>
</item>
<item>
<term>GPU</term>
</item>
<item>
<term>optimization</term>
</item>
<item>
<term>Fitness model</term>
</item>
</list>
</keywords>
</textClass>
<textClass>
<keywords scheme="Journal Subject">
<list>
<head>article-category</head>
<item>
<term>Research Article</term>
</item>
</list>
</keywords>
</textClass>
</profileDesc>
<revisionDesc>
<change when="2010-07-07">Received</change>
<change when="2010-10-16">Registration</change>
<change when="2011-07">Published</change>
</revisionDesc>
</teiHeader>
</istex:fulltextTEI>
<json:item>
<original>false</original>
<mimetype>text/plain</mimetype>
<extension>txt</extension>
<uri>https://api.istex.fr/document/E43C7B0758A3069211F596C690F59417BBC0471F/fulltext/txt</uri>
</json:item>
</fulltext>
<metadata>
<istex:metadataXml wicri:clean="Wiley, elements deleted: body">
<istex:xmlDeclaration>version="1.0" encoding="UTF-8" standalone="yes"</istex:xmlDeclaration>
<istex:document>
<component version="2.0" type="serialArticle" xml:lang="en">
<header>
<publicationMeta level="product">
<publisherInfo>
<publisherName>John Wiley & Sons, Ltd.</publisherName>
<publisherLoc>Chichester, UK</publisherLoc>
</publisherInfo>
<doi registered="yes">10.1002/(ISSN)1532-0634</doi>
<issn type="print">1532-0626</issn>
<issn type="electronic">1532-0634</issn>
<idGroup>
<id type="product" value="CPE"></id>
</idGroup>
<titleGroup>
<title type="main" xml:lang="en" sort="CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE">Concurrency and Computation: Practice and Experience</title>
<title type="short">Concurrency Computat.: Pract. Exper.</title>
</titleGroup>
<selfCitationGroup>
<citation type="ancestor" xml:id="cit1">
<journalTitle>Concurrency: Practice and Experience</journalTitle>
<accessionId ref="info:x-wiley/issn/10403108">1040-3108</accessionId>
<accessionId ref="info:x-wiley/issn/10969128">1096-9128</accessionId>
<pubYear year="2000">2000</pubYear>
<vol>12</vol>
<issue>15</issue>
</citation>
</selfCitationGroup>
</publicationMeta>
<publicationMeta level="part" position="100">
<doi origin="wiley" registered="yes">10.1002/cpe.v23.10</doi>
<numberingGroup>
<numbering type="journalVolume" number="23">23</numbering>
<numbering type="journalIssue">10</numbering>
</numberingGroup>
<coverDate startDate="2011-07">July 2011</coverDate>
</publicationMeta>
<publicationMeta level="unit" type="article" position="30" status="forIssue">
<doi origin="wiley" registered="yes">10.1002/cpe.1688</doi>
<idGroup>
<id type="unit" value="CPE1688"></id>
</idGroup>
<countGroup>
<count type="pageTotal" number="35"></count>
</countGroup>
<titleGroup>
<title type="articleCategory">Research Article</title>
<title type="tocHeading1">Research Articles</title>
</titleGroup>
<copyright ownership="publisher">Copyright © 2010 John Wiley & Sons, Ltd.</copyright>
<eventGroup>
<event type="manuscriptReceived" date="2010-07-07"></event>
<event type="manuscriptRevised" date="2010-10-05"></event>
<event type="manuscriptAccepted" date="2010-10-16"></event>
<event type="xmlConverted" agent="Converter:JWSART34_TO_WML3G version:2.5 mode:FullText mathml2tex" date="2011-06-02"></event>
<event type="publishedOnlineEarlyUnpaginated" date="2010-12-28"></event>
<event type="firstOnline" date="2010-12-28"></event>
<event type="publishedOnlineFinalForm" date="2011-06-02"></event>
<event type="xmlConverted" agent="Converter:WILEY_ML3G_TO_WILEY_ML3GV2 version:3.8.8" date="2014-01-16"></event>
<event type="xmlConverted" agent="Converter:WML3G_To_WML3G version:4.1.7 mode:FullText,remove_FC" date="2014-10-16"></event>
</eventGroup>
<numberingGroup>
<numbering type="pageFirst">1066</numbering>
<numbering type="pageLast">1100</numbering>
</numberingGroup>
<correspondenceTo>Department of Electrical and Computer Engineering, Clemson University, Clemson, SC 29634, U.S.A.</correspondenceTo>
<linkGroup>
<link type="toTypesetVersion" href="file:CPE.CPE1688.pdf"></link>
</linkGroup>
</publicationMeta>
<contentMeta>
<countGroup>
<count type="figureTotal" number="26"></count>
<count type="tableTotal" number="18"></count>
<count type="referenceTotal" number="30"></count>
</countGroup>
<titleGroup>
<title type="main" xml:lang="en">Performance, optimization, and fitness: Connecting applications to architectures</title>
<title type="short" xml:lang="en">CONNECTING APPLICATIONS TO ARCHITECTURES</title>
</titleGroup>
<creators>
<creator xml:id="au1" creatorRole="author" affiliationRef="#af1">
<personName>
<givenNames>Mohammad A.</givenNames>
<familyName>Bhuiyan</familyName>
</personName>
</creator>
<creator xml:id="au2" creatorRole="author" affiliationRef="#af1" corresponding="yes">
<personName>
<givenNames>Melissa C.</givenNames>
<familyName>Smith</familyName>
</personName>
<contactDetails>
<email>smithmc@clemson.edu</email>
</contactDetails>
</creator>
<creator xml:id="au3" creatorRole="author" affiliationRef="#af1">
<personName>
<givenNames>Vivek K.</givenNames>
<familyName>Pallipuram</familyName>
</personName>
</creator>
</creators>
<affiliationGroup>
<affiliation xml:id="af1" countryCode="US" type="organization">
<unparsedAffiliation>Department of Electrical and Computer Engineering, Clemson University, Clemson, SC 29634, U.S.A.</unparsedAffiliation>
</affiliation>
</affiliationGroup>
<keywordGroup xml:lang="en" type="author">
<keyword xml:id="kwd1">performance</keyword>
<keyword xml:id="kwd2">multicore</keyword>
<keyword xml:id="kwd3">GPU</keyword>
<keyword xml:id="kwd4">optimization</keyword>
<keyword xml:id="kwd5">Fitness model</keyword>
</keywordGroup>
<abstractGroup>
<abstract type="main" xml:lang="en">
<title type="main">Abstract</title>
<p>Recent trends involving multicore processors and graphical processing units (GPUs) focus on exploiting task‐ and thread‐level parallelism. In this paper, we have analyzed various aspects of the performance of these architectures including NVIDIA GPUs, and multicore processors such as Intel Xeon, AMD Opteron, IBM's Cell Broadband Engine. The case study used in this paper is a biological spiking neural network (SNN), implemented with the Izhikevich, Wilson, Morris–Lecar, and Hodgkin–Huxley neuron models. The four SNN models have varying requirements for communication and computation making them useful for performance analysis of the hardware platforms. We report and analyze the variation of performance with network (problem size) scaling, available optimization techniques and execution configuration. A Fitness performance model, that predicts the suitability of the architecture for accelerating an application, is proposed and verified with the SNN implementation results. The Roofline model, another existing performance model, has also been utilized to determine the hardware bottleneck(s) and attainable peak performance of the architectures. Significant speedups for the four SNN neuron models utilizing these architectures are reported; the maximum speedup of 574x was observed in our GPU implementation. Our results and analysis show that a proper match of architecture with algorithm complexity provides the best performance. Copyright © 2010 John Wiley & Sons, Ltd.</p>
</abstract>
</abstractGroup>
</contentMeta>
</header>
</component>
</istex:document>
</istex:metadataXml>
<mods version="3.6">
<titleInfo lang="en">
<title>Performance, optimization, and fitness: Connecting applications to architectures</title>
</titleInfo>
<titleInfo type="abbreviated" lang="en">
<title>CONNECTING APPLICATIONS TO ARCHITECTURES</title>
</titleInfo>
<titleInfo type="alternative" contentType="CDATA" lang="en">
<title>Performance, optimization, and fitness: Connecting applications to architectures</title>
</titleInfo>
<name type="personal">
<namePart type="given">Mohammad A.</namePart>
<namePart type="family">Bhuiyan</namePart>
<affiliation>Department of Electrical and Computer Engineering, Clemson University, Clemson, SC 29634, U.S.A.</affiliation>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Melissa C.</namePart>
<namePart type="family">Smith</namePart>
<affiliation>Department of Electrical and Computer Engineering, Clemson University, Clemson, SC 29634, U.S.A.</affiliation>
<description>Correspondence: Department of Electrical and Computer Engineering, Clemson University, Clemson, SC 29634, U.S.A.</description>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Vivek K.</namePart>
<namePart type="family">Pallipuram</namePart>
<affiliation>Department of Electrical and Computer Engineering, Clemson University, Clemson, SC 29634, U.S.A.</affiliation>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<typeOfResource>text</typeOfResource>
<genre type="article" displayLabel="article"></genre>
<originInfo>
<publisher>John Wiley & Sons, Ltd.</publisher>
<place>
<placeTerm type="text">Chichester, UK</placeTerm>
</place>
<dateIssued encoding="w3cdtf">2011-07</dateIssued>
<dateCaptured encoding="w3cdtf">2010-07-07</dateCaptured>
<dateValid encoding="w3cdtf">2010-10-16</dateValid>
<copyrightDate encoding="w3cdtf">2011</copyrightDate>
</originInfo>
<language>
<languageTerm type="code" authority="rfc3066">en</languageTerm>
<languageTerm type="code" authority="iso639-2b">eng</languageTerm>
</language>
<physicalDescription>
<internetMediaType>text/html</internetMediaType>
<extent unit="figures">26</extent>
<extent unit="tables">18</extent>
<extent unit="references">30</extent>
</physicalDescription>
<abstract lang="en">Recent trends involving multicore processors and graphical processing units (GPUs) focus on exploiting task‐ and thread‐level parallelism. In this paper, we have analyzed various aspects of the performance of these architectures including NVIDIA GPUs, and multicore processors such as Intel Xeon, AMD Opteron, IBM's Cell Broadband Engine. The case study used in this paper is a biological spiking neural network (SNN), implemented with the Izhikevich, Wilson, Morris–Lecar, and Hodgkin–Huxley neuron models. The four SNN models have varying requirements for communication and computation making them useful for performance analysis of the hardware platforms. We report and analyze the variation of performance with network (problem size) scaling, available optimization techniques and execution configuration. A Fitness performance model, that predicts the suitability of the architecture for accelerating an application, is proposed and verified with the SNN implementation results. The Roofline model, another existing performance model, has also been utilized to determine the hardware bottleneck(s) and attainable peak performance of the architectures. Significant speedups for the four SNN neuron models utilizing these architectures are reported; the maximum speedup of 574x was observed in our GPU implementation. Our results and analysis show that a proper match of architecture with algorithm complexity provides the best performance. Copyright © 2010 John Wiley & Sons, Ltd.</abstract>
<subject lang="en">
<genre>keywords</genre>
<topic>performance</topic>
<topic>multicore</topic>
<topic>GPU</topic>
<topic>optimization</topic>
<topic>Fitness model</topic>
</subject>
<relatedItem type="host">
<titleInfo>
<title>Concurrency and Computation: Practice and Experience</title>
</titleInfo>
<titleInfo type="abbreviated">
<title>Concurrency Computat.: Pract. Exper.</title>
</titleInfo>
<genre type="journal">journal</genre>
<subject>
<genre>article-category</genre>
<topic>Research Article</topic>
</subject>
<identifier type="ISSN">1532-0626</identifier>
<identifier type="eISSN">1532-0634</identifier>
<identifier type="DOI">10.1002/(ISSN)1532-0634</identifier>
<identifier type="PublisherID">CPE</identifier>
<part>
<date>2011</date>
<detail type="volume">
<caption>vol.</caption>
<number>23</number>
</detail>
<detail type="issue">
<caption>no.</caption>
<number>10</number>
</detail>
<extent unit="pages">
<start>1066</start>
<end>1100</end>
<total>35</total>
</extent>
</part>
</relatedItem>
<relatedItem type="preceding">
<titleInfo>
<title>Concurrency: Practice and Experience</title>
</titleInfo>
<identifier type="ISSN">1040-3108</identifier>
<identifier type="ISSN">1096-9128</identifier>
<part>
<date point="end">2000</date>
<detail type="volume">
<caption>last vol.</caption>
<number>12</number>
</detail>
<detail type="issue">
<caption>last no.</caption>
<number>15</number>
</detail>
</part>
</relatedItem>
<identifier type="istex">E43C7B0758A3069211F596C690F59417BBC0471F</identifier>
<identifier type="DOI">10.1002/cpe.1688</identifier>
<identifier type="ArticleID">CPE1688</identifier>
<accessCondition type="use and reproduction" contentType="copyright">Copyright © 2010 John Wiley & Sons, Ltd.</accessCondition>
<recordInfo>
<recordContentSource>WILEY</recordContentSource>
<recordOrigin>John Wiley & Sons, Ltd.</recordOrigin>
</recordInfo>
</mods>
</metadata>
<enrichments>
<json:item>
<type>multicat</type>
<uri>https://api.istex.fr/document/E43C7B0758A3069211F596C690F59417BBC0471F/enrichments/multicat</uri>
</json:item>
</enrichments>
<serie></serie>
</istex>
</record>

To work with this document under Unix (Dilib):

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Istex/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000786 | SxmlIndent | more

Or:

HfdSelect -h $EXPLOR_AREA/Data/Istex/Corpus/biblio.hfd -nk 000786 | SxmlIndent | more

To add a link to this page within the Wicri network:

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Istex
   |étape=   Corpus
   |type=    RBID
   |clé=     ISTEX:E43C7B0758A3069211F596C690F59417BBC0471F
   |texte=   Performance, optimization, and fitness: Connecting applications to architectures
}}

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024