          
         David Brooks
         
        Haley Family Professor of Computer Science
        
	  
	    Maxwell Dworkin 141 
            33 Oxford Street 
            Cambridge MA 02138 
	     
	    Phone: 617-495-3989 
            Fax: 617-496-6404 
             
            E-mail:  dbrooks@eecs.harvard.edu
	  
	 
        
        Syllabus 
        Meeting time: 
	  
	    Friday 9:00–11:45AM 
	    MD 119
	  
	 
         
	
      Introduction
         The class will review
        fundamental structures in modern microprocessor and computer system
        architecture design.  Tentative topics include computer
        organization, instruction set design, memory system design, pipelining,
        and other techniques to exploit parallelism.  We will also cover
        system-level topics such as storage subsystems and the basics of
        multiprocessor systems.  The class will focus on quantitative
        evaluation of design alternatives while considering design metrics such
        as performance and power dissipation.
         Prerequisites 
         CS 141 (Computing Hardware) or equivalent; C programming
        Textbook 
        “Computer Architecture: A Quantitative Approach,” Third Edition,
        John L. Hennessy and David A. Patterson, ISBN 1-55860-596-7
        Course Readings 
        
        
          
        Lecture 1: Introduction to Computer Architecture
        Lecture 2: CPU Performance and Metrics
        Lecture 3: Instruction Set Architecture
            Readings: Ruby B. Lee, "Subword Parallelism with MAX-2," IEEE Micro, 16(4), August 1996, pp. 51-59.
            Class Notes
        Homework 1
           
          
        Lecture 4: Implementation and Pipelining
        Lecture 5: Exceptions, Multi-cycle Ops, Dynamic Scheduling
        Lecture 6: Scoreboarding Example, Tomasulo's Algorithm
        Lecture 7: Dynamic Branch Prediction
            Readings: Tse-Yu Yeh, Yale N. Patt, "A Comparison of Dynamic Branch Predictors that use Two Levels of Branch History," The 20th International Symposium on Computer Architecture, May 1993.
            Class Notes
        Lecture 8: Multiple Issue and Speculation
            Class Notes
            Readings: G. S. Sohi and S. Vajapeyam, "Instruction Issue Logic for High-performance, Interruptable Pipelined Processors," International Symposium on Computer Architecture, 1987.
            Readings: J. E. Smith and A. Pleszkun, "Implementing Precise Interrupts in Pipelined Processors," IEEE Transactions on Computers, Volume 37, Issue 5 (May 1988).
        Homework 2
           
          
        Lecture 9: Limits of ILP, Case Studies
            Class Notes
            Readings: David W. Wall, "Limits of instruction-level parallelism," Architectural Support for Programming Languages and Operating Systems (ASPLOS) 1991.
            Readings: Subbarao Palacharla, Norman P. Jouppi, James E. Smith, "Complexity-Effective Superscalar Processors," 24th International Symposium on Computer Architecture (ISCA-24), June 1997.
            Readings: Eric Rotenberg, Steve Bennett, J. E. Smith, "Trace Cache: A Low Latency Approach to High Bandwidth Instruction Fetching," 29th International Symposium on Microarchitecture (MICRO-29), Dec 1996.
        Lecture 10: Static Scheduling, Loop Unrolling, and Software Pipelining
        Homework 3
           
            
        Lecture 11: Software Pipelining and Global Scheduling
        Sample Midterm from Fall 2017
        Lecture 12: Hardware Assisted Software ILP and IA64/Itanium Case Study
        Lecture 14: Introduction to Caches
        Lecture 15: More on Caches
        Homework 4
           
            
        Lecture 16: More on Caches
        Lecture 17: Main Memory
        Lecture 18: Virtual Memory
        Lecture 19: Multiprocessors
        Homework 5
           
          
            
        Lecture 20: More Multiprocessors
            Class Notes
            Readings: D. M. Tullsen, S. J. Eggers, and H. M. Levy, "Simultaneous Multithreading: Maximizing On-Chip Parallelism," 22nd Annual International Symposium on Computer Architecture, June 1995.
            Readings: L. Barroso, K. Gharachorloo, R. McNamara, A. Nowatzyk, S. Qadeer, B. Sano, S. Smith, R. Stets, and B. Verghese, "Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing," Proceedings of the 27th Annual International Symposium on Computer Architecture (ISCA'00), June 2000.
        Lecture 21: Multithreading and I/O
        Lecture 22: More I/O
        Lecture 23: Clusters and Wrapup
            Class Notes
            Readings: L. Barroso, J. Dean, and U. Hölzle, "Web search for a planet: The Google Cluster Architecture," IEEE Micro, 23(2), March-April 2003, pp. 22-28.