Instructor | Esteban Meneses, PhD (esteban DOT meneses AT acm DOT org) |
Institution | Instituto Tecnológico de Costa Rica |
Location | Centro Académico Barrio Amón |
Time | Tuesdays 6:00-9:00pm |
Term | I semester, 2022 |
Teaching Assistant | Alex Saenz (alexsaenz AT estudiantec DOT cr) |
SESSION | DATE | TOPIC | READING | PRESENTER |
---|---|---|---|---|
1 | February 8 | Introduction History | | Instructor |
2 | February 15 | Parallel Computing Reasoning | INTRO1 | 1. Mariela Abdalah Instructor |
3 | February 22 | Parallel Programming Design Patterns | | Instructor |
4 | March 1 | Programming Models | PROGMOD1 PROGMOD2 PROGMOD3 | 1. Daniel Piedra 2. Gabriel Barboza 3. Javier Cordero |
5 | March 8 | Shared-memory Programming | | Instructor |
6 | March 15 | Scalability | SCAL1 SCAL2 SCAL3 | 1. Deivid Calvo 2. ---------- 3. Javier Buzano |
7 | March 22 | Interconnects | INTER1 INTER2 INTER3 | 1. Jeison Meléndez 2. Ángel Phillips 3. Andrés Vargas |
8 | March 29 | Distributed-memory Programming | | Instructor |
9 | April 5 | Performance Models | PERFMOD1 PERFMOD2 PERFMOD3 | 1. ---------- 2. ---------- 3. Jeison Meléndez |
| | April 12 | Holy Week | | |
10 | April 19 | Performance Analysis | PERFAN1 PERFAN2 PERFAN3 | 1. Ángel Phillips 2. ---------- 3. Javier Herrera |
11 | April 26 | Performance Analysis Scientific Visualization | | Instructor |
12 | May 3 | Midterm Exam | | |
13 | May 10 | Algorithms | INVITED ALG1 ALG2 | - Cristina Soto - Mariela Abdalah - Javier Buzano |
14 | May 17 | Fault Tolerance | INVITED FAULT2 | - Elvis Rojas - Gabriel Barboza |
15 | May 24 | Job Scheduling | INVITED INVITED SCHED1 SCHED2 | - Alejandro Morales - Óscar Blandino - Deivid Calvo - Javier Cordero |
16 | May 31 | Architecture | INVITED ARCH1 ARCH2 | - Diego Jiménez - Javier Herrera - Daniel Piedra |
| | June 7 | Final Presentations | | |
Reading List
Introduction
- [INTRO1] Performance vs Programming Effort between Rust and C on Multicore Architectures: Case Study in N-Body (Manuel Costanzo, Enzo Rucci, Marcelo Naiouf, Armando De Giusti - XLVII Latin American Computing Conference (CLEI) - 2021)
Programming Models
- [PROGMOD1] Models for practical parallel computation (D. B. Skillicorn - International Journal of Parallel Programming - 1991)
- [PROGMOD2] A Bridging Model for Parallel Computation (Leslie G. Valiant – Communications of the ACM – 1990)
- [PROGMOD3] A Survey of Parallel Programming Models and Tools in the Multi and Many-Core Era (Javier Diaz et al – IEEE Transactions on Parallel and Distributed Systems – 2012)
Scalability
- [SCAL1] A Case for NOW (Networks of Workstations) (Thomas E. Anderson, David E. Culler, David A. Patterson - IEEE Micro - 1995)
- [SCAL2] The Landscape of Parallel Computing Research: A View from Berkeley (Krste Asanovic et al - Technical Report, University of California at Berkeley - 2006)
- [SCAL3] A survey of high-performance computing scaling challenges (Al Geist and Daniel A Reed - IJHPCA - 2015)
Interconnects
- [INTER1] Blue Gene/L torus interconnection network (N.R. Adiga et al – IBM Journal of Research and Development – 2005)
- [INTER2] Technology-Driven, Highly-Scalable Dragonfly Topology (John Kim et al – International Symposium on Computer Architecture – 2008)
- [INTER3] Evaluating HPC Networks via Simulation of Parallel Workloads (Nikhil Jain, Abhinav Bhatele, Sam White, Todd Gamblin, Laxmikant V. Kale - ACM/IEEE Supercomputing – 2016)
Performance Models
- [PERFMOD1] Isoefficiency: Measuring the Scalability of Parallel Algorithms and Architectures (Ananth Grama et al – IEEE Concurrency – 1993)
- [PERFMOD2] LogP: A Practical Model of Parallel Computation (David E. Culler et al – Communications of the ACM – 1996)
- [PERFMOD3] Roofline: An Insightful Visual Performance Model for Multicore Architectures (Samuel Williams et al – Communications of the ACM – 2009)
Performance Analysis
- [PERFAN1] COZ: Finding Code that Counts with Causal Profiling (Charlie Curtsinger and Emery D. Berger - ACM Symposium on Operating Systems Principles - 2015)
- [PERFAN2] The Case of the Missing Supercomputer Performance: Achieving Optimal Performance on the 8192 Processors of ASCI Q (Fabrizio Petrini et al – ACM/IEEE Supercomputing – 2003)
- [PERFAN3] Scientific benchmarking of parallel computing systems: twelve ways to tell the masses when reporting performance results (Torsten Hoefler and Roberto Belli – ACM/IEEE Supercomputing – 2015)
Algorithms
- [ALG1] How Much Parallelism is There in Irregular Applications? (Milind Kulkarni et al – ACM Principles and Practice of Parallel Programming – 2009)
- [ALG2] Parallel Random Numbers: As Easy as 1, 2, 3 (John K. Salmon et al – ACM/IEEE Supercomputing – 2011)
- [ALG3] A Parallel Hashed Oct-Tree N-body Algorithm (M.S. Warren and J.K. Salmon – ACM/IEEE Supercomputing – 1993)
Architecture
- [ARCH1] Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU (Victor W Lee et al – International Symposium on Computer Architecture – 2010)
- [ARCH2] Demystifying GPU Microarchitecture through Microbenchmarking (Henry Wong et al – IEEE International Symposium on Performance Analysis of Systems and Software – 2010)
- [ARCH3] 3D-Stacked Memory Architectures for Multi-core Processors (Gabriel H. Loh – International Symposium on Computer Architecture – 2008)
Job Scheduling
- [SCHED1] A fair share scheduler (J Kay and P Lauder – Communications of the ACM – 1988)
- [SCHED2] A Comparative Study of Job Scheduling Strategies in Large-scale Parallel Computational Systems (Aftab Ahmed Chandio et al - IEEE International Conference on Trust, Security and Privacy in Computing and Communications - 2013)
- [SCHED3] Backfilling using system-generated predictions rather than user runtime estimates (D. Tsafrir, Y. Etsion, and D. Feitelson – IEEE Transactions on Parallel and Distributed Systems – 2007)
Fault Tolerance
- [FAULT1] Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System (Adam Moody et al – IEEE/ACM Supercomputing – 2010)
- [FAULT2] Assessing Fault Sensitivity in MPI Applications (Charng-da Lu and Daniel A. Reed – ACM/IEEE Supercomputing – 2004)
- [FAULT3] Reliability Lessons Learned From GPU Experience With The Titan Supercomputer at Oak Ridge Leadership Computing Facility (Devesh Tiwari et al – ACM/IEEE Supercomputing – 2015)
Class Project
Final report sample: paper
An interesting extension to a class project
Prof. Martin Schulz's list of research ideas
Prof. Abhinav Bhatele's list of research ideas
Prof. Esteban Meneses's list of research ideas
Publication Venues
- Conferencia Latinoamericana de Estudios en Informática (CLEI) 2021: https://clei2021.cr/home
- Latin America High Performance Computing Conference (CARLA) 2021: http://carla2021.org
Resources
Prof. Gupta’s tips on presentations and reviews
Additional References
- [Accelerators] An Adaptive Performance Modeling Tool for GPU Architectures (Sara S. Baghsorkhi et al – ACM Principles and Practice of Parallel Programming – 2010)
- [Accelerators] A Unified Programming Model for Intra- and Inter-Node Offloading on Xeon Phi Clusters (Matthias Noack et al – ACM/IEEE Supercomputing – 2014)
- [Accelerators] GPUs and the future of parallel computing (Stephen W Keckler et al – IEEE Micro – 2011) PDF
- [Accelerators] A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors (Mark Gebhart et al – ACM Transactions on Computer Systems – 2012) PDF
- [Algorithms] Millisecond-Scale Molecular Dynamics Simulations on Anton (David E. Shaw et al – ACM/IEEE Supercomputing – 2009).
- [Algorithms] Data Parallel Algorithms (W. Daniel Hillis and Guy L. Steele – Communications of the ACM – 1986).
- [Algorithms] Development of Parallel Methods for a 1024-processor Hypercube (John L. Gustafson et al – SIAM Journal on Scientific and Statistical Computing – 1988).
- [Algorithms] Highly Scalable Parallel Algorithms for Sparse Matrix Factorization (Anshul Gupta et al – IEEE Transactions on Parallel and Distributed Systems – 1997).
- [Algorithms] SUMMA: scalable universal matrix multiplication algorithm (R. A. van de Geijn and J Watts – Concurrency: Practice and Experience – 1997).
- [Algorithms] Faster Topology-aware Collective Algorithms Through Non-minimal Communication (Paul Sack and William Gropp – ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming – 2012)
- [Architecture] The MIPS R10000 Superscalar Microprocessor (Kenneth C. Yeager – IEEE Micro – 1996).
- [Architecture] Designing reliable systems from unreliable components: the challenges of transistor variability and degradation (Shekhar Borkar – IEEE Micro – 2005)
- [Architecture] The Stanford DASH Multiprocessor (Daniel Lenoski et al – IEEE Computer – 1992).
- [Architecture] Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor (Dean M. Tullsen et al – ISCA – 1996).
- [Architecture] From Microprocessors to Nanostores: Rethinking Data-Centric Systems (Parthasarathy Ranganathan – IEEE Computer Magazine – 2011)
- [Architecture] IBM POWER7 multicore server processor (B. Sinharoy et al – IBM Journal of Research and Development – 2011)
- [Cloud Computing] Cloud-driven HPC (Amazon Web Services – HPC Wire – 2014) PDF
- [Cloud Computing] Above the Clouds: A Berkeley View of Cloud Computing (Michael Armbrust et al – White Paper)
- [Cloud Computing] MapReduce: Simplified Data Processing on Large Clusters (Jeffrey Dean and Sanjay Ghemawat – USENIX Symposium on Operating Systems Design & Implementation – 2004)
- [Cloud Computing] Improving MapReduce Performance in Heterogeneous Environments (Matei Zaharia et al – USENIX Symposium on Operating Systems Design & Implementation – 2008)
- [Epidemic Simulations] EpiSimdemics: an efficient algorithm for simulating the spread of infectious disease over large realistic social networks (Christopher L Barrett et al – ACM/IEEE Supercomputing – 2008)
- [Epidemic Simulations] Overcoming the Scalability Challenges of Epidemic Simulations on Blue Waters (Jae-Seung Yeom et al – IEEE International Parallel and Distributed Processing Symposium – 2014)
- [Epidemic Simulations] PREEMPT: Scalable Epidemic Interventions Using Submodular Optimization on Multi-GPU Systems (Marco Minutoli et al – ACM/IEEE Supercomputing – 2020)
- [Fault Tolerance] MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes (George Bosilca et al – IEEE/ACM Supercomputing – 2002)
- [Fault Tolerance] Using Migratable Objects to Enhance Fault Tolerance Schemes in Supercomputers (Esteban Meneses et al – IEEE Transactions on Parallel and Distributed Systems – 2014)
- [Fault Tolerance] Containment Domains: A Scalable, Efficient, and Flexible Resilience Scheme for Exascale Systems (Jinsuk Chung et al – IEEE/ACM Supercomputing – 2012)
- [Fault Tolerance] Diskless Checkpointing (James S. Plank et al – IEEE Transactions on Parallel and Distributed Systems – 1998)
- [Graph Processing] Pregel: a system for large-scale graph processing (Grzegorz Malewicz et al – ACM SIGMOD International Conference on Management of Data – 2010)
- [Graph Processing] GraphX: Graph Processing in a Distributed Dataflow Framework (Joseph Gonzalez et al – USENIX Symposium on Operating Systems Design & Implementation – 2014) PDF
- [Interconnects] Fat-trees: Universal Networks for Hardware-efficient Supercomputing (Charles E. Leiserson – IEEE Transactions on Computers – 1985).
- [Interconnects] Communication Requirements and Interconnect Optimization for High-End Scientific Applications (Shoaib Kamil et al – IEEE Transactions on Parallel and Distributed Systems – 2009)
- [Interconnects] A Survey of Wormhole Routing Techniques in Direct Networks (Lionel M. Ni and Philip McKinley – IEEE Computer – 1993).
- [Interconnects] A Comparative Study of Topology Design Approaches for HPC Interconnects (Md Atiqul Mollah et al - CCGRID - 2018).
- [Interconnects] Adaptive Routing in High-Radix Clos Network (John Kim et al – ACM/IEEE Supercomputing – 2006)
- [Interconnects] Deadlock-free Adaptive Routing in Multicomputer Networks Using Virtual Channels (William J. Dally and Hiromichi Aoki – IEEE Transactions on Parallel and Distributed Systems – 1993).
- [Interconnects] There Goes the Neighborhood: Performance Degradation due to Nearby Jobs (Abhinav Bhatele et al - ACM/IEEE Supercomputing – 2013)
- [Introduction] How Will Rebooting Computing Help IoT? (Bichlien Hoang and Sin-Kuen Hawkins) PDF
- [Introduction] Exascale Computing and Big Data (Daniel A. Reed and Jack Dongarra) HTML
- [Languages] OpenMP to GPGPU: A Compiler Framework for Automatic Translation and Optimization (Seyong Lee et al – ACM Principles and Practice of Parallel Programming – 2009) PDF
- [Languages] Compilers and More: The Past, Present and Future of Parallel Loops (Michael Wolfe – HPC Wire – 2015) HTML
- [Languages] Compilers and More: MPI+X (Michael Wolfe – HPC Wire – 2014) HTML
- [Load Balancing] Carbon: Architectural Support for Fine-Grained Parallelism on Chip Multiprocessors (Sanjeev Kumar et al – ISCA – 2007).
- [Load Balancing] The Implementation of the Cilk-5 Multithreaded Language (Matteo Frigo et al – PLDI – 1998).
- [Load Balancing] A dynamic scheduling strategy for the Chare-Kernel system (Wennie Shu and Laxmikant V. Kale – IEEE/ACM Supercomputing – 1989).
- [Memory Consistency] Cohesion: a hybrid memory model for accelerators (John H. Kelm et al – ISCA – 2010).
- [Memory Consistency] Comparative Evaluation of Fine- and Coarse-Grain Approaches for Software Distributed Shared Memory (Sandhya Dwarkadas et al – HPCA – 1999).
- [Memory Consistency] Cooperative Shared Memory: Software and Hardware for Scalable Multiprocessors (Mark D. Hill et al – ACM Transactions on Computer Systems – 1993).
- [Memory System Design] Sequoia: Programming the Memory Hierarchy (Kayvon Fatahalian et al – IEEE/ACM Supercomputing – 2006).
- [Memory System Design] On-chip Memory System Optimization Design for FT64 Scientific Stream Accelerator (Mei Wen et al – IEEE MICRO – 2008).
- [Memory System Design] Comparing Memory Systems for Chip Multiprocessors (Jacob Leverich et al – ISCA – 2007).
- [Performance Models] Characterizing the Influence of System Noise on Large-Scale Applications by Simulation (Torsten Hoefler et al – IEEE/ACM Supercomputing – 2010) PDF
- [Programming Models] Parallel Programmability and the Chapel Language (Brad Chamberlain et al – International Journal of High Performance Computing Applications – 2007) PDF ALT
- [Programming Models] The Foundations for Scalable Multi-core Software in Intel® Threading Building Blocks (Alexey Kukanov et al – Intel Technology Journal – 2007)
- [Programming Models] Stream Processors: Programmability with Efficiency (William J. Dally et al – ACM Queue – 2004)
- [Programming Models] Using Simple Abstraction to Reinvent Computing for Parallelism (Uzi Vishkin - Communications of the ACM - 2011)
- [Scheduling] Utilization, predictability, workloads, and user runtime estimates in scheduling the IBM SP2 with backfilling (A. Mu’alem and D. Feitelson – IEEE Transactions on Parallel and Distributed Systems – 2001)
- [Scheduling] Core Algorithms of the Maui Scheduler (D. Jackson, Q. Snell, and M. Clement – International Workshop on Job Scheduling Strategies for Parallel Processing – 2001)
Benchmarks
- HPC Graph Analysis HTML