Modern GPUs support a wide range of parallel programming models. Some of these models, such as CUDA and OpenCL, are designed for low-level optimization in C/C++.

In this paper, we introduce a benchmark set of ten autotunable kernels for important computational problems, implemented in OpenCL or CUDA. Using our Kernel Tuning Toolkit, we show that with autotuning most of the kernels reach near-peak performance on various GPUs and outperform baseline implementations on CPUs and Xeon Phis.
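To make the idea of autotuning a kernel concrete, here is a minimal sketch of an exhaustive search over a tuning-parameter space. It is not the Kernel Tuning Toolkit's API: the function names `autotune` and `blocked_sum`, the `block_size` parameter, and the pure-Python toy kernel are all assumptions made for illustration. A real GPU autotuner would compile and launch OpenCL or CUDA kernels instead.

```python
import itertools
import time


def _time_once(kernel, args, config):
    """Run the kernel once with the given configuration and return elapsed seconds."""
    start = time.perf_counter()
    kernel(*args, **config)
    return time.perf_counter() - start


def autotune(kernel, space, make_args, repeats=3):
    """Try every configuration in `space` and return (best_config, best_time).

    `space` maps each (hypothetical) tuning-parameter name to its candidate values.
    `make_args` builds fresh positional arguments for each timed run.
    """
    names = list(space)
    best_config, best_time = None, float("inf")
    for values in itertools.product(*(space[n] for n in names)):
        config = dict(zip(names, values))
        # Keep the minimum over several runs to reduce timing noise.
        elapsed = min(_time_once(kernel, make_args(), config) for _ in range(repeats))
        if elapsed < best_time:
            best_config, best_time = config, elapsed
    return best_config, best_time


def blocked_sum(data, block_size):
    """Toy stand-in for a tunable kernel: sum `data` in chunks of `block_size`."""
    total = 0
    for i in range(0, len(data), block_size):
        total += sum(data[i:i + block_size])
    return total


if __name__ == "__main__":
    space = {"block_size": [16, 64, 256, 1024]}
    config, seconds = autotune(blocked_sum, space, lambda: (list(range(100_000)),))
    print(config, f"{seconds:.6f}s")
```

Exhaustive search is only practical for small parameter spaces. For larger ones, a tuner would typically sample or use a guided search, but the structure (enumerate configurations, time each, keep the fastest) stays the same.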