Instructor | Esteban Meneses, PhD |
esteban DOT meneses AT acm DOT org | |
Institution | Instituto Tecnológico de Costa Rica |
Location | Centro Académico Barrio Amón |
Time | Thursdays 6:00-9:00pm |
Term | I semester, 2021 |
Teaching Assistant | Alex Saenz (alexsaenz AT estudiantec DOT cr) |
SESSION | DATE | TOPIC | PRESENTER |
---|---|---|---|
1 | February 18 | Introduction, History | Instructor |
2 | February 25 | Parallel Programming Design Patterns | Instructor |
3 | March 4 | Shared-memory Programming | Instructor |
4 | March 11 | Distributed-memory Programming | Instructor |
5 | March 18 | Performance Analysis, Scientific Visualization | Instructor |
6 | March 25 | Parallel Programming Models | Students |
 | April 1 | Holy Week | |
7 | April 8 | Performance Models | Students |
8 | April 15 | Midterm Exam | |
9 | April 22 | Interconnects | Students |
10 | April 29 | Parallel Algorithms | Students |
11 | May 6 | Parallel Computer Architectures | Students |
12 | May 13 | Accelerators | Students |
13 | May 20 | Fault Tolerance | Students |
14 | May 27 | Job Scheduling | Students |
15 | June 3 | Graph Processing | Students |
16 | June 10 | Cloud Computing | Students |
 | June 17 | | |
Resources
Final report sample: paper
An interesting extension to a class project
Presentation:
Prof. Gupta’s tips on presentations and reviews
Reading List
Introduction
- Parallelization of a Denoising Algorithm for Tonal Bioacoustic Signals Using OpenACC Directives (Jorge Castro and Esteban Meneses – IEEE International Work Conference on Bioinspired Intelligence, IWOBI – 2018) HTML
- Exascale Computing and Big Data (Daniel A. Reed and Jack Dongarra) HTML
Accelerators
- Demystifying GPU Microarchitecture through Microbenchmarking (Henry Wong et al – IEEE International Symposium on Performance Analysis of Systems and Software – 2010) PDF
- Reliability Lessons Learned From GPU Experience With The Titan Supercomputer at Oak Ridge Leadership Computing Facility (Devesh Tiwari et al – ACM/IEEE Supercomputing – 2015) PDF
- A Unified Programming Model for Intra- and Inter-Node Offloading on Xeon Phi Clusters (Matthias Noack et al – ACM/IEEE Supercomputing – 2014) PDF
Algorithms
- How Much Parallelism is There in Irregular Applications? (Milind Kulkarni et al – ACM Principles and Practice of Parallel Programming – 2009) PDF
- Faster Topology-aware Collective Algorithms Through Non-minimal Communication (Paul Sack and William Gropp – ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming – 2012) PDF
- Parallel Random Numbers: As Easy as 1, 2, 3 (John K. Salmon et al – ACM/IEEE Supercomputing – 2011) PDF
- Millisecond-Scale Molecular Dynamics Simulations on Anton (David E. Shaw et al – ACM/IEEE Supercomputing – 2009) PDF ALT
Architecture
- IBM POWER7 multicore server processor (B. Sinharoy et al – IBM Journal of Research and Development – 2011) PDF
- Designing reliable systems from unreliable components: the challenges of transistor variability and degradation (Shekhar Borkar – IEEE Micro – 2005) PDF
- 3D-Stacked Memory Architectures for Multi-core Processors (Gabriel H. Loh – International Symposium on Computer Architecture – 2008) PDF
- From Microprocessors to Nanostores: Rethinking Data-Centric Systems (Parthasarathy Ranganathan – IEEE Computer Magazine – 2011) PDF
Cloud Computing
- Above the Clouds: A Berkeley View of Cloud Computing (Michael Armbrust et al – White Paper) PDF
- MapReduce: Simplified Data Processing on Large Clusters (Jeffrey Dean and Sanjay Ghemawat – USENIX Symposium on Operating Systems Design & Implementation – 2004) PDF
- Improving MapReduce Performance in Heterogeneous Environments (Matei Zaharia et al – USENIX Symposium on Operating Systems Design & Implementation – 2008) PDF
Fault Tolerance
- Diskless Checkpointing (James S. Plank et al – IEEE Transactions on Parallel and Distributed Systems – 1998) PDF ALT
- Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System (Adam Moody et al – IEEE/ACM Supercomputing – 2010) PDF ALT
- Using Migratable Objects to Enhance Fault Tolerance Schemes in Supercomputers (Esteban Meneses et al – IEEE Transactions on Parallel and Distributed Systems – 2014) PDF
- Containment Domains: A Scalable, Efficient, and Flexible Resilience Scheme for Exascale Systems (Jinsuk Chung et al – IEEE/ACM Supercomputing – 2012) PDF
Graph Processing
- Pregel: a system for large-scale graph processing (Grzegorz Malewicz et al – ACM SIGMOD International Conference on Management of Data – 2010)
- GraphX: Graph Processing in a Distributed Dataflow Framework (Joseph Gonzalez et al – USENIX Symposium on Operating Systems Design & Implementation – 2014) PDF
Interconnects
- Blue Gene/L torus interconnection network (N.R. Adiga et al – IBM Journal of Research and Development – 2005) PDF
- Technology-Driven, Highly-Scalable Dragonfly Topology (John Kim et al – International Symposium on Computer Architecture – 2008) PDF
- Adaptive Routing in High-Radix Clos Network (John Kim et al – ACM/IEEE Supercomputing – 2006) PDF
- Communication Requirements and Interconnect Optimization for High-End Scientific Applications (Shoaib Kamil et al – IEEE Transactions on Parallel and Distributed Systems – 2009) PDF
Performance Models
- Isoefficiency: Measuring the Scalability of Parallel Algorithms and Architectures (Ananth Grama et al – IEEE Concurrency – 1993)
- LogP: A Practical Model of Parallel Computation (David E. Culler et al – Communications of the ACM – 1996)
- Roofline: An Insightful Visual Performance Model for Multicore Architectures (Samuel Williams et al – Communications of the ACM – 2009)
Parallel Programming Models
- Using Simple Abstraction to Reinvent Computing for Parallelism (Uzi Vishkin – Communications of the ACM – 2011)
- A Bridging Model for Parallel Computation (Leslie G. Valiant – Communications of the ACM – 1990)
- Stream Processors: Programmability with Efficiency (William J. Dally et al – ACM Queue – 2004)
Scheduling
- A fair share scheduler (J Kay and P Lauder – Communications of the ACM – 1988) PDF ALT
- Utilization, predictability, workloads, and user runtime estimates in scheduling the IBM SP2 with backfilling (A. Mu’alem and D. Feitelson – IEEE Transactions on Parallel and Distributed Systems – 2001) PDF ALT
- Core Algorithms of the Maui Scheduler (D. Jackson, Q. Snell, and M. Clement – International Workshop on Job Scheduling Strategies for Parallel Processing – 2001) PDF
- Backfilling using system-generated predictions rather than user runtime estimates (D. Tsafrir, Y. Etsion, and D. Feitelson – IEEE Transactions on Parallel and Distributed Systems – 2007) PDF ALT
Additional References
- [Accelerators] An Adaptive Performance Modeling Tool for GPU Architectures (Sara S. Baghsorkhi et al – ACM Principles and Practice of Parallel Programming – 2010) PDF
- [Accelerators] GPUs and the future of parallel computing (Stephen W Keckler et al – IEEE Micro – 2011) PDF
- [Accelerators] A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors (Mark Gebhart et al – ACM Transactions on Computer Systems – 2012) PDF
- [Algorithms] A Parallel Hashed Oct-Tree N-body Algorithm (M.S. Warren and J.K. Salmon – ACM/IEEE Supercomputing – 1993).
- [Algorithms] Data Parallel Algorithms (W. Daniel Hillis and Guy L. Steele – Communications of the ACM – 1986).
- [Algorithms] Development of Parallel Methods for a 1024-processor Hypercube (John L. Gustafson et al – SIAM Journal on Scientific and Statistical Computing – 1988).
- [Algorithms] Highly Scalable Parallel Algorithms for Sparse Matrix Factorization (Anshul Gupta et al – IEEE Transactions on Parallel and Distributed Systems – 1997).
- [Algorithms] SUMMA: scalable universal matrix multiplication algorithm (R. A. van de Geijn and J Watts – Concurrency: Practice and Experience – 1997).
- [Architecture] The MIPS R10000 Superscalar Microprocessor (Kenneth C. Yeager – IEEE Micro – 1996).
- [Architecture] The Stanford DASH Multiprocessor (Daniel Lenoski et al – IEEE Computer – 1992).
- [Architecture] Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor (Dean M. Tullsen et al – ISCA – 1996).
- [Cloud Computing] Cloud-driven HPC (Amazon Web Services – HPC Wire – 2014) PDF
- [Fault Tolerance] MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes (George Bosilca et al – IEEE/ACM Supercomputing – 2002)
- [Interconnects] Fat-trees: Universal Networks for Hardware-efficient Supercomputing (Charles E. Leiserson – IEEE Transactions on Computers – 1985).
- [Interconnects] A Survey of Wormhole Routing Techniques in Direct Networks (Lionel M. Ni and Philip McKinley – IEEE Computer – 1993).
- [Interconnects] Deadlock-free Adaptive Routing in Multicomputer Networks Using Virtual Channels (William J. Dally and Hiromichi Aoki – IEEE Transactions on Parallel and Distributed Systems – 1993).
- [Introduction] How Will Rebooting Computing Help IoT? (Bichlien Hoang and Sin-Kuen Hawkins) PDF
- [Languages] OpenMP to GPGPU: A Compiler Framework for Automatic Translation and Optimization (Seyong Lee et al – ACM Principles and Practice of Parallel Programming – 2009) PDF
- [Languages] Compilers and More: The Past, Present and Future of Parallel Loops (Michael Wolfe – HPC Wire – 2015) HTML
- [Languages] Compilers and More: MPI+X (Michael Wolfe – HPC Wire – 2014) HTML
- [Load Balancing] Carbon: Architectural Support for Fine-Grained Parallelism on Chip Multiprocessors (Sanjeev Kumar et al – ISCA – 2007).
- [Load Balancing] The Implementation of the Cilk-5 Multithreaded Language (Matteo Frigo et al – PLDI – 1998).
- [Load Balancing] A dynamic scheduling strategy for the Chare-Kernel system (Wennie Shu and Laxmikant V. Kale – IEEE/ACM Supercomputing – 1989).
- [Memory Consistency] Cohesion: a hybrid memory model for accelerators (John H. Kelm et al – ISCA – 2010).
- [Memory Consistency] Comparative Evaluation of Fine- and Coarse-Grain Approaches for Software Distributed Shared Memory (Sandhya Dwarkadas et al – HPCA – 1999).
- [Memory Consistency] Cooperative Shared Memory: Software and Hardware for Scalable Multiprocessors (Mark D. Hill et al – ACM Transactions on Computer Systems – 1993).
- [Memory System Design] Sequoia: Programming the Memory Hierarchy (Kayvon Fatahalian et al – IEEE/ACM Supercomputing – 2006).
- [Memory System Design] On-chip Memory System Optimization Design for FT64 Scientific Stream Accelerator (Mei Wen et al – IEEE MICRO – 2008).
- [Memory System Design] Comparing Memory Systems for Chip Multiprocessors (Jacob Leverich et al – ISCA – 2007).
- [Performance Models] Characterizing the Influence of System Noise on Large-Scale Applications by Simulation (Torsten Hoefler et al – IEEE/ACM Supercomputing – 2010) PDF
- [Performance Models] The Case of the Missing Supercomputer Performance: Achieving Optimal Performance on the 8192 Processors of ASCI Q (Fabrizio Petrini et al – ACM/IEEE Supercomputing – 2003) PDF
- [Programming Models] Parallel Programmability and the Chapel Language (Brad Chamberlain et al – International Journal of High Performance Computing Applications – 2007) PDF ALT
- [Programming Models] The Foundations for Scalable Multi-core Software in Intel® Threading Building Blocks (Alexey Kukanov et al – Intel Technology Journal – 2007)
- [Programming Models] A Survey of Parallel Programming Models and Tools in the Multi and Many-Core Era (Javier Diaz et al – IEEE Transactions on Parallel and Distributed Systems – 2012) PDF
Benchmarks
- HPC Graph Analysis HTML