Harvard CS 242
Mondays and Wednesdays from 3:45-5:00 pm, SEC building in Allston, MA
Instructor: H.T. Kung
Scaling computation over parallel and distributed accelerators is a rapidly advancing research area attracting strong interest from both academia and industry. A recent focus is increasing the speed and energy efficiency of deep learning computations on data-center servers and small embedded devices, and industry is building a variety of innovative accelerators in this area.
In this course, students will learn principled methods of mapping computations used in machine learning and scientific computing onto parallel and distributed compute nodes implemented with CPUs, GPUs, FPGAs, and ASICs. These mapping techniques lay the foundation for understanding and designing modern computing libraries and architectures for high-performance and energy-efficient systems and devices. Students will implement these mapping techniques for fast and efficient machine learning computations. At the end of the course, students will know how to co-design machine learning models, computational algorithms, software abstractions, and underlying computing systems.
The class has the following modules:
- Parallelization. Map deep neural network computations onto parallel and distributed computing units.
- Compression. Quantize and prune neural networks, leverage low-precision arithmetic operations and stochastic rounding, use low-rank filters for approximate pattern matching, and apply knowledge distillation to derive lightweight models.
- Data reuse. Reuse local data to lessen memory access and communication costs.
- Collaborative learning. Use federated learning and simulation for collaboration among distributed clients.
- Porting to accelerators. Partition large-scale computations into small chunks and schedule them for execution on fixed-size accelerators.
- In-hardware acceleration. Explore new opportunities for using emerging memory and 3D packaging technologies in accelerators.
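As a small illustration of the compression module above, the sketch below shows uniform low-precision quantization with stochastic rounding, one of the techniques listed. It is a minimal example for intuition, not course material; the function name and parameters are illustrative.

```python
import numpy as np

def stochastic_round_quantize(x, num_bits=8, x_min=-1.0, x_max=1.0, rng=None):
    """Quantize x to 2**num_bits uniform levels on [x_min, x_max].

    Instead of rounding to the nearest level, round up with probability
    equal to the fractional distance to the next level, so the quantized
    value is unbiased in expectation (useful for low-precision training).
    """
    rng = np.random.default_rng() if rng is None else rng
    levels = 2 ** num_bits - 1
    scale = (x_max - x_min) / levels
    # Position of each clipped value on the integer quantization grid.
    pos = (np.clip(x, x_min, x_max) - x_min) / scale
    lower = np.floor(pos)
    frac = pos - lower
    # Round up with probability equal to the fractional part.
    rounded = lower + (rng.random(np.shape(x)) < frac)
    return x_min + rounded * scale
```

Averaged over many draws, the quantized values match the original (clipped) values in expectation, which is why stochastic rounding preserves small gradient updates that nearest-rounding would discard.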
Students will learn the subject through a deep dive into recent research papers as well as lectures, quizzes, programming assignments, and a final project.
Students will have the freedom to choose a final project they are passionate about. They will formulate their projects early in the course, so there will be sufficient time for discussion and iteration with the teaching staff, as well as for system design and implementation.
The course will use server clusters at Harvard as well as external computing resources in the cloud.
Industry partners may support the teaching by giving guest lectures and advising student projects.
The course provides a database of course materials containing hundreds of articles that the course instructor has collected and tagged. Students with Harvard email addresses can access the database at https://htk.redsox.jetos.com/alpha/login.php
Recommended Prep/Requirements: (1) programming experience (Python, MATLAB, or C/C++ should be OK); (2) basic knowledge of computing systems (e.g., CPU and GPU) and machine learning (e.g., convolutional neural networks); (3) familiarity with data structures; and (4) maturity in mathematics (e.g., undergraduate linear algebra and statistics).
The course instructor may waive some of these requirements for students with a strong interest in the subject matter and relevant work or research experience. Labs in the first weeks of the semester will help students quickly obtain the background necessary to excel in the course.