

# **FACULTY OF ENGINEERING & TECHNOLOGY**

Effective from Academic Batch: 2022-23

Programme: Bachelor of Technology (Information Technology)

Semester: VIII

**Course Code:** 202047806

Course Title: High Performance Computing

Course Group: Professional Elective Course - V

**Course Objectives:** This course provides an overview of the fundamentals of designing high-performance computing (HPC) systems and teaches students how to measure, assess and analyze the performance of HPC applications. In addition, students will learn parallel models of computation.

**Teaching & Examination Scheme:** 

Contact hours are per week; examination marks are given as Maximum / Passing.

| Lecture | Tutorial | Practical | Course Credits | Theory: Internal | Theory: External | J/V/P\*: Internal | J/V/P\*: External | Total |
|---------|----------|-----------|----------------|------------------|------------------|-------------------|-------------------|--------|
| 3       | 0        | 2         | 4              | 50/18            | 50/17            | 25/9              | 25/9              | 150/53 |

<sup>\*</sup> J: Jury; V: Viva; P: Practical

Detailed Syllabus

| Sr. | Contents | Hours |
|-----|----------|-------|
| 1   | Introduction to High Performance Computing (HPC): | 07 |
|     | What is HPC? Motivation, Applications, Challenges, Communication (Shared Address Space vs. Message Passing), Parallel Processing Concepts (Quick Overview), Levels of Parallelism (instruction, transaction, task, thread, memory, function), Models (SIMD, MIMD, SIMT, SPMD, Dataflow Models, Demand-driven Computation, etc.), Architectures: N-wide superscalar architectures, multi-core, multi-threaded | |
| 2   | HPC Paradigms: | 10 |
|     | Supercomputing, Cluster Computing, Grid Computing, Cloud Computing, Many-core Computing, Petascale Systems, Memory Hierarchies, Multi-core Processors: Homogeneous and Heterogeneous, Shared-memory Symmetric Multiprocessors, Vector Computers, Distributed Memory Computers, Application Accelerators / Reconfigurable Computing, Novel Computers: Stream, multithreaded, and purpose-built | |
| 3   | Parallel Algorithms: | 10 |
|     | Parallel Models: ideal and real frameworks; Basic Techniques: Balanced Trees, Pointer Jumping, Divide and Conquer, Partitioning; Regular Algorithms: Matrix Operations and Linear Algebra; Irregular Algorithms: Lists, Trees, Graphs; Randomization: Parallel Pseudo-Random Number Generators, Sorting, Monte Carlo Techniques | |
| 4   | Parallel Programming: | 07 |
|     | Introduction to OpenMP, Parallel Constructs, Runtime Library Routines, Work-sharing Constructs, Introduction to GPU Computing, CUDA Programming Model, CUDA Memory Model, Shared Memory Matrix Multiplication, Additional CUDA API Features | |
| 5   | Message Passing Interface: | 07 |
|     | Overview of MPI, MPI Constructs, OpenMP vs. MPI, MPI Introduction, Collective Communication, Data Grouping for Communication. Distributed memory parallel programming with MPI: message passing, introduction to MPI, example, messages and point-to-point communication, collective communication, nonblocking point-to-point communication, virtual topologies, MPI parallelization of a Jacobi solver, MPI implementation, performance properties | |
|     | Total | 41 |

**List of Practicals / Tutorials:** 

| Sr. | Practical / Tutorial |
|-----|----------------------|
| 1   | Multi-threaded "Hello World" parallel programs. |
| 2   | Addition of two arrays using parallel programming. |
| 3   | Matrix multiplication using parallel programs. |
| 4   | Vector multiplication using parallel programs. |
| 5   | Configure MPI and test the MPI interface. |
| 6   | Configure the Open MPI interface and run a Hello World program. |
| 7   | Write a program to perform matrix multiplication using Open MPI. |
| 8   | Write a program to perform matrix multiplication using the Message Passing Interface with a matrix size of 1000. |
| 9   | Study of the CUDA architecture and a Hello World program using CUDA programming. |
| 10  | Study of various GPU architectures and testing of performance on different GPU models. |
| 11  | Performance analysis of matrix multiplication on different GPU architectures. |

### **Reference Books:**

| Sr. | Reference |
|-----|-----------|
| 1   | Georg Hager, Gerhard Wellein, Introduction to High Performance Computing for Scientists and Engineers, Chapman & Hall / CRC Computational Science Series, 2011. |
| 2   | Victor Eijkhout, Edmond Chow, Robert van de Geijn, Introduction to High Performance Scientific Computing, 2nd edition, revision 2016. |
| 3   | Ananth Grama et al., Introduction to Parallel Computing, 2nd edition, Addison-Wesley, 2003, ISBN 0201648652. |
| 4   | Barbara Chapman, Gabriele Jost, Ruud van der Pas, Using OpenMP, MIT Press, 2008. |
| 5   | William Gropp, Ewing Lusk, Anthony Skjellum, Using MPI, MIT Press, 2014. |
| 6   | Recent publications in IPDPS, PACT, and similar venues. |
| 7   | Wen-mei W. Hwu, David B. Kirk, Programming Massively Parallel Processors: A Hands-on Approach, 3rd edition, Morgan Kaufmann. |
| 8   | Rob Farber, CUDA Application Design and Development, Morgan Kaufmann Publishers, 2013. |
| 9   | Zbigniew J. Czech, Introduction to Parallel Computing, 2nd edition, Cambridge University Press, 2016. |

| Sr. | Supplementary Learning Material |
|-----|---------------------------------|
| 1   | NPTEL: https://nptel.ac.in/courses/106108055 and https://onlinecourses.nptel.ac.in/noc20_me61/preview |
| 2   | Coursera: https://www.coursera.org/learn/introduction-high-performance-computing |

#### **Pedagogy:**

- Direct classroom teaching
- Audio-visual presentations/demonstrations
- Assignments/Quiz
- Continuous assessment
- Interactive methods
- Seminar/Poster Presentation
- Industrial/Field visits
- Course Projects

# Suggested Specification table with Marks (Theory) (Revised Bloom's Taxonomy):

Distribution of Theory Marks in %:

| R   | U   | A   | N   | E   | C |
|-----|-----|-----|-----|-----|---|
| 10% | 25% | 25% | 10% | 30% | - |

R: Remembering; U: Understanding; A: Applying; N: Analyzing; E: Evaluating; C: Creating

Note: This specification table shall be treated as a general guideline for students and teachers. The actual distribution of marks in the question paper may vary slightly from the above table.

## **Course Outcomes (CO):**

| Sr.  | Course Outcome Statements | % Weightage |
|------|---------------------------|-------------|
| CO-1 | To design, formulate, solve, and implement high-performance versions of standard single-threaded algorithms. | 20 |
| CO-2 | To demonstrate the architectural features of GPU and MIC (Many Integrated Core) hardware accelerators. | |
| CO-3 | To develop programs that extract maximum performance in a multicore, shared-memory execution environment. | 25 |
| CO-4 | To analyze and deploy large-scale parallel programs on tightly coupled parallel systems using the message-passing paradigm. | 25 |



| Curriculum Revision:           |           |  |  |  |
|--------------------------------|-----------|--|--|--|
| Version:                       | 2.0       |  |  |  |
| Drafted on (Month-Year):       | June-2022 |  |  |  |
| Last Reviewed on (Month-Year): | -         |  |  |  |
| Next Review on (Month-Year):   | June-2025 |  |  |  |