Harvard CS 242
Mondays and Wednesdays from 2:30-4 pm, Maxwell Dworkin G125
Instructor: H.T. Kung
Scaling computation over parallel and distributed computing systems is a rapidly advancing area of research receiving high levels of interest from both academia and industry. The objective can be scaling up computation size, e.g., for deep neural networks, with only modestly increased energy consumption. The objective can also be scaling down energy usage, e.g., for Internet of Things (IoT) devices, with only modestly decreased outcome quality. To this end, in this course students will learn principled methods of mapping prototypical computations used in machine learning, IoT, and scientific computing onto parallel and distributed compute nodes of various forms. To master the subject, students will need to appreciate the close interaction among computational algorithms, computer organization, and software abstractions. After successfully completing this course, students will have acquired an integrated understanding of these issues.
The class will be organized into the following modules:
- Big picture: use of parallel and distributed computing to scale computation size and energy usage
- End-to-end example 1: mapping nearest neighbor computation onto parallel computing units in the form of CPUs, GPUs, ASICs, and FPGAs
- Communication and I/O: latency hiding with prediction, computational intensity, lower bounds
- Computer architectures and their implications for computing: multi-cores, CPUs, GPUs, clusters, accelerators, and virtualization
- End-to-end example 2: mapping convolutional neural networks onto parallel computing units in the form of CPUs, GPUs, ASICs, FPGAs, and clusters
- Great inner loops and parallelization for feature extraction, data clustering and dimension reduction: PCA, random projection, clustering (K-means, GMM-EM), sparse coding (K-SVD), compressive sensing, FFT, etc.
- Software abstractions and programming models: MapReduce (PageRank, etc.), GraphX/Apache Spark, OpenCL and TensorFlow
- Advanced topics: autotuning and neuromorphic spike-based computing
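To give a flavor of end-to-end example 1, the sketch below (illustrative only, not course material) shows the brute-force nearest neighbor inner loop in NumPy. Expanding the pairwise squared distance as ||q - x||^2 = ||q||^2 - 2 q·x + ||x||^2 turns the computation into a single matrix product, the data-parallel form that maps naturally onto CPU SIMD units, GPUs, and other accelerators discussed in the modules above. All names here are hypothetical.

```python
import numpy as np

def nearest_neighbors(queries, database):
    """Return, for each query row, the index of its nearest database row.

    Uses the expansion ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2 so that
    the dominant cost is one dense matrix multiply (a GEMM), which
    parallel hardware executes efficiently.
    """
    q_sq = np.sum(queries ** 2, axis=1, keepdims=True)   # shape (m, 1)
    x_sq = np.sum(database ** 2, axis=1)                 # shape (n,)
    cross = queries @ database.T                         # shape (m, n), the GEMM
    dist_sq = q_sq - 2.0 * cross + x_sq                  # broadcast to (m, n)
    return np.argmin(dist_sq, axis=1)

# Small demo: perturb five database rows slightly and use them as queries;
# each query should recover the index of the row it came from.
rng = np.random.default_rng(0)
db = rng.standard_normal((1000, 64))
qs = db[:5] + 0.01 * rng.standard_normal((5, 64))
print(nearest_neighbors(qs, db))  # -> [0 1 2 3 4]
```

The same formulation carries over directly to GPU libraries, since the GEMM dominates the arithmetic and the argmin reduction parallelizes across queries.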
Students will learn the subject through lectures/quizzes, programming assignments, research paper presentations, and a final project. Students will have latitude in choosing a final project they are passionate about. They will formulate their projects early in the course, so there will be sufficient time for discussion and iterations with the teaching staff, as well as for system design and implementation. Industry partners will support the course by giving guest lectures and providing resources.
The course will use server clusters at Harvard as well as external resources in the cloud. In addition, labs will have access to state-of-the-art IoT devices and 3D cameras for data acquisition. Students will use open source tools and libraries and apply them to data analysis, modeling, and visualization problems.
Prerequisites: (1) programming experience (Python, MATLAB, or C/C++ should be fine); (2) basic knowledge of systems and machine organization; (3) familiarity with data structures and algorithms; and (4) maturity in mathematics (e.g., undergraduate linear algebra and statistics). For students with a strong interest in the subject matter and related research topics, one of these four requirements may be waived. Labs and extra support in the first weeks of the semester will help students quickly obtain the background necessary to excel in the course.