Exploration server on haptic devices

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information is therefore not validated.

Runtime and Architecture Support for Efficient Data Exchange in Multi-Accelerator Applications

Internal identifier: 000E69 (Main/Exploration); previous: 000E68; next: 000E70

Authors: Javier Cabezas; Isaac Gelado; John E. Stone; Nacho Navarro; David B. Kirk; Wen-Mei Hwu

Source:

RBID : PMC:4500157

Abstract

Heterogeneous parallel computing applications often process large data sets that require multiple GPUs to jointly meet their needs for physical memory capacity and compute throughput. However, the lack of high-level abstractions in previous heterogeneous parallel programming models forces programmers to resort to multiple code versions, complex data copy steps, and synchronization schemes when exchanging data between multiple GPU devices, which results in high software development cost, poor maintainability, and even poor performance. This paper describes the HPE runtime system and the associated architecture support, which enable a simple, efficient programming interface for exchanging data between multiple GPUs through either interconnects or cross-node network interfaces. The runtime and architecture support presented in this paper can also be used to support other types of accelerators. We show that the simplified programming interface reduces programming complexity. The research presented in this paper started in 2009. It has been implemented and tested extensively in several generations of HPE runtime systems, and has been adopted into NVIDIA GPU hardware and drivers for CUDA 4.0 and beyond since 2011. The availability of real hardware that supports key HPE features gives rise to a rare opportunity to study the effectiveness of the hardware support by running important benchmarks on a real runtime and real hardware. Experimental results show that in an exemplar heterogeneous system, peer DMA, double-buffering, pinned buffers, and related software techniques can improve the inter-accelerator data communication bandwidth by 2×. They can also improve the execution speed by 1.6× for a 3D finite difference, 2.5× for a 1D FFT, and 1.6× for merge sort, all measured on real hardware.
The proposed architecture support enables the HPE runtime to transparently deploy these optimizations under simple portable user code, allowing system designers to freely employ devices of different capabilities. We further argue that simple interfaces such as HPE are needed for most applications to benefit from advanced hardware features in practice.
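The hardware-specific techniques the abstract credits with the 2× bandwidth gain (peer DMA, pinned staging buffers) require CUDA devices, but the double-buffering pattern itself is generic: while the consumer works on one staged buffer, the next transfer is already in flight. Below is a minimal Python sketch of that scheduling idea; the names `transfer`, `compute`, and `pipelined_sum` are illustrative stand-ins, not part of the HPE API.

```python
from concurrent.futures import ThreadPoolExecutor

def transfer(chunk):
    # Stand-in for a DMA transfer into a staging buffer.
    return list(chunk)

def compute(buf):
    # Stand-in for the kernel that consumes the staged data.
    return sum(buf)

def pipelined_sum(chunks):
    """Double-buffered pipeline: while one buffer is being computed on
    (main thread), the next transfer runs in the background (worker)."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(transfer, chunks[0])   # fill the first buffer
        for nxt in chunks[1:]:
            buf = pending.result()                   # wait for current transfer
            pending = pool.submit(transfer, nxt)     # start the next transfer...
            results.append(compute(buf))             # ...while computing on this one
        results.append(compute(pending.result()))    # drain the last buffer
    return results
```

In the real system the transfer stage would be a peer-DMA copy into a pinned staging buffer and the compute stage a GPU kernel; the overlap schedule is the same.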


Url:
DOI: 10.1109/TPDS.2014.2316825
PubMed: 26180487
PubMed Central: 4500157


Affiliations:


Links toward previous steps (curation, corpus...)


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Runtime and Architecture Support for Efficient Data Exchange in Multi-Accelerator Applications</title>
<author>
<name sortKey="Cabezas, Javier" sort="Cabezas, Javier" uniqKey="Cabezas J" first="Javier" last="Cabezas">Javier Cabezas</name>
</author>
<author>
<name sortKey="Gelado, Isaac" sort="Gelado, Isaac" uniqKey="Gelado I" first="Isaac" last="Gelado">Isaac Gelado</name>
</author>
<author>
<name sortKey="Stone, John E" sort="Stone, John E" uniqKey="Stone J" first="John E." last="Stone">John E. Stone</name>
</author>
<author>
<name sortKey="Navarro, Nacho" sort="Navarro, Nacho" uniqKey="Navarro N" first="Nacho" last="Navarro">Nacho Navarro</name>
</author>
<author>
<name sortKey="Kirk, David B" sort="Kirk, David B" uniqKey="Kirk D" first="David B." last="Kirk">David B. Kirk</name>
</author>
<author>
<name sortKey="Hwu, Wen Mei" sort="Hwu, Wen Mei" uniqKey="Hwu W" first="Wen-Mei" last="Hwu">Wen-Mei Hwu</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">26180487</idno>
<idno type="pmc">4500157</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4500157</idno>
<idno type="RBID">PMC:4500157</idno>
<idno type="doi">10.1109/TPDS.2014.2316825</idno>
<date when="2014">2014</date>
<idno type="wicri:Area/Pmc/Corpus">001743</idno>
<idno type="wicri:Area/Pmc/Curation">001743</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000975</idno>
<idno type="wicri:Area/Ncbi/Merge">003A95</idno>
<idno type="wicri:Area/Ncbi/Curation">003A95</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">003A95</idno>
<idno type="wicri:doubleKey">1045-9219:2014:Cabezas J:runtime:and:architecture</idno>
<idno type="wicri:Area/Main/Merge">000E69</idno>
<idno type="wicri:Area/Main/Curation">000E69</idno>
<idno type="wicri:Area/Main/Exploration">000E69</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Runtime and Architecture Support for Efficient Data Exchange in Multi-Accelerator Applications</title>
<author>
<name sortKey="Cabezas, Javier" sort="Cabezas, Javier" uniqKey="Cabezas J" first="Javier" last="Cabezas">Javier Cabezas</name>
</author>
<author>
<name sortKey="Gelado, Isaac" sort="Gelado, Isaac" uniqKey="Gelado I" first="Isaac" last="Gelado">Isaac Gelado</name>
</author>
<author>
<name sortKey="Stone, John E" sort="Stone, John E" uniqKey="Stone J" first="John E." last="Stone">John E. Stone</name>
</author>
<author>
<name sortKey="Navarro, Nacho" sort="Navarro, Nacho" uniqKey="Navarro N" first="Nacho" last="Navarro">Nacho Navarro</name>
</author>
<author>
<name sortKey="Kirk, David B" sort="Kirk, David B" uniqKey="Kirk D" first="David B." last="Kirk">David B. Kirk</name>
</author>
<author>
<name sortKey="Hwu, Wen Mei" sort="Hwu, Wen Mei" uniqKey="Hwu W" first="Wen-Mei" last="Hwu">Wen-Mei Hwu</name>
</author>
</analytic>
<series>
<title level="j">IEEE transactions on parallel and distributed systems : a publication of the IEEE Computer Society</title>
<idno type="ISSN">1045-9219</idno>
<idno type="eISSN">1558-2183</idno>
<imprint>
<date when="2014">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p id="P1">Heterogeneous parallel computing applications often process large data sets that require multiple GPUs to jointly meet their needs for physical memory capacity and compute throughput. However, the lack of high-level abstractions in previous heterogeneous parallel programming models forces programmers to resort to multiple code versions, complex data copy steps, and synchronization schemes when exchanging data between multiple GPU devices, which results in high software development cost, poor maintainability, and even poor performance. This paper describes the HPE runtime system and the associated architecture support, which enable a simple, efficient programming interface for exchanging data between multiple GPUs through either interconnects or cross-node network interfaces. The runtime and architecture support presented in this paper can also be used to support other types of accelerators. We show that the simplified programming interface reduces programming complexity. The research presented in this paper started in 2009. It has been implemented and tested extensively in several generations of HPE runtime systems, and has been adopted into NVIDIA GPU hardware and drivers for CUDA 4.0 and beyond since 2011. The availability of real hardware that supports key HPE features gives rise to a rare opportunity to study the effectiveness of the hardware support by running important benchmarks on a real runtime and real hardware. Experimental results show that in an exemplar heterogeneous system, peer DMA, double-buffering, pinned buffers, and related software techniques can improve the inter-accelerator data communication bandwidth by 2×. They can also improve the execution speed by 1.6× for a 3D finite difference, 2.5× for a 1D FFT, and 1.6× for merge sort, all measured on real hardware.
The proposed architecture support enables the HPE runtime to transparently deploy these optimizations under simple portable user code, allowing system designers to freely employ devices of different capabilities. We further argue that simple interfaces such as HPE are needed for most applications to benefit from advanced hardware features in practice.</p>
</div>
</front>
</TEI>
<affiliations>
<list></list>
<tree>
<noCountry>
<name sortKey="Cabezas, Javier" sort="Cabezas, Javier" uniqKey="Cabezas J" first="Javier" last="Cabezas">Javier Cabezas</name>
<name sortKey="Gelado, Isaac" sort="Gelado, Isaac" uniqKey="Gelado I" first="Isaac" last="Gelado">Isaac Gelado</name>
<name sortKey="Hwu, Wen Mei" sort="Hwu, Wen Mei" uniqKey="Hwu W" first="Wen-Mei" last="Hwu">Wen-Mei Hwu</name>
<name sortKey="Kirk, David B" sort="Kirk, David B" uniqKey="Kirk D" first="David B." last="Kirk">David B. Kirk</name>
<name sortKey="Navarro, Nacho" sort="Navarro, Nacho" uniqKey="Navarro N" first="Nacho" last="Navarro">Nacho Navarro</name>
<name sortKey="Stone, John E" sort="Stone, John E" uniqKey="Stone J" first="John E." last="Stone">John E. Stone</name>
</noCountry>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/HapticV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000E69 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000E69 | SxmlIndent | more

To link to this page from the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    HapticV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     PMC:4500157
   |texte=   Runtime and Architecture Support for Efficient Data Exchange in Multi-Accelerator Applications
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i   -Sk "pubmed:26180487" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd   \
       | NlmPubMed2Wicri -a HapticV1 

Wicri

This area was generated with Dilib version V0.6.23.
Data generation: Mon Jun 13 01:09:46 2016. Site generation: Wed Mar 6 09:54:07 2024