CS 222 -- Algorithms at The Ends of the Wire

Preliminary Syllabus for 2016

Instructor: Michael Mitzenmacher
E-mail: michaelm AT eecs.harvard.edu
Office: Maxwell Dworkin 331
Phone: 496-7172
Office Hours: Tuesday/Thursday 11:30-12:30 (subject to change, depending on conflicts). Or by appointment.

Teaching Assistant: TBD
E-mail: TBD
Office Hours: TBD

Syllabus: www.eecs.harvard.edu/~michaelm/CS222/syllabus.html

Handouts: www.eecs.harvard.edu/~michaelm/CS222/class.html


This course is loosely based on the theme of how to deal with really big data, especially over networks. The topics change from year to year, and the description below is subject to change. The course will consist of multiple independent units, covering the major themes of information retrieval (search engines), compression, data summarization algorithms, and coding theory. Although the course will emphasize theoretical foundations, it will be a mix of theory and practice, and current issues will also be emphasized. The course is also meant to promote skills required of graduate students, such as critical and creative reading and analysis of papers, and research. The main work of the course will consist of the following: reading and analyzing a number of current and classic research papers; homework assignments based on the material; participating in a mock program committee; and undertaking a final research project.

During the semester, you will typically be reading about 2 research papers to prepare for each class. This is more work than it sounds like! You must come to class prepared consistently; if your schedule will not permit that, you should not take the class.

There will be 3 or 4 homework assignments; I expect to have one for each unit. These are described more below.

I'm hoping to try something new this year, which is to run a "mock program committee". I will choose papers from recent conferences (so ostensibly they are already good) thematically related to the course, and the class will act as a program committee to choose the best ones. This will give you an idea of how program committees work (or don't), and let you see some more up-to-date research, as class reading will be more focused on "classics". I warn again, this is new and untried.

Finally, a major component of the class will be a final project, which you will work on for approximately the last 2 months of the course. The hope is that this project may form the foundation of either a research paper or, for undergraduates, a senior thesis. Although you will need to obtain approval for your project choice, the topic of the final project will primarily be up to you. The project can be either theoretical or implementation-based. Generally people work in pairs for the final project, but this is not required.


Students should have taken at least CS 124 or its equivalent. Students should be able to program in a standard programming language; C or C++ is preferred. Knowledge of probability will be extremely helpful but not required. Generally, mathematics will be fundamental to the course, so you should expect to spend time learning some additional mathematics on your own if necessary. Similarly, some knowledge of networks and network issues will be very helpful.

For students wishing to review important aspects of probability, there are many books available. Sheldon Ross has written several excellent introductory books which should be available in the library. My personal favorite is "Introduction to Probability Models." A more advanced book for those with more background is "Elements of Information Theory" by Cover and Thomas. Another good book is "Information Theory, Inference, and Learning Algorithms" by David MacKay, which has the benefit of being available online. Of course, my completely biased opinion is that the best book for a computer scientist to buy is by Mitzenmacher and Upfal, "Probability and Computing: Randomized Algorithms and Probabilistic Analysis." I'd recommend most students get one (or more) of these books as a reference.


Your performance will be measured in four ways. (The percentage contributions to your grade given below are approximate and subject to change.)

At this time there is no intention to hold a final or midterm exam.

All assignments will be due at the beginning of class on the appropriate day. Late assignments are not acceptable without the prior consent of the instructor. Consent will generally only be given for significant events or emergencies. Being busy in other classes is not a significant event.

Collaboration policy

I would like to emphasize the rules on working with others on homework assignments. For problem sets, limited collaboration in planning and thinking through solutions to homework problems is allowed, but no collaboration is allowed in writing up solutions. You are allowed to work with a few other students currently taking the class in discussing, brainstorming, and verbally walking through solutions to homework problems. But when you are through talking, you must write up your solutions independently and may not check them against each other. There may be no passing of homework papers between collaborators; nor is it permissible for one person simply to tell another the answer. Paper summaries are meant to be done almost entirely on your own. Brief discussions with other students after you have all read the paper are permissible, but the paper summary should be entirely your own work.

If you collaborate with other students in the course in the planning and design of solutions to homework problems, then you should give their names on your homework papers.

Under no circumstances may you use solution sets to problems that may have been distributed by the course in past years, or the homework papers of students who have taken the course in past years. Nor should you look up solution sets from other similar courses.

Violation of these rules may be grounds for giving no credit for a homework paper and also for serious disciplinary action.

Required Text

Generally papers will be made available either in class or at the class Web site. No text is required.

Class Information/Notes

Class notes, homework assignments, and other information will be made available on the Web. For access go to the class web site. We will also use Piazza to disseminate further information. In many cases, the class web site may be the only location where information is posted or available, so look in from time to time!

Incomplete List of Possible Topics

The following is an incomplete list of topics based on previous course offerings. Again, the topics may change somewhat for the coming year.