CS 222 -- Algorithms at the End of the Wire
Handouts and Class Materials
Announcements
- [10/24] Please remember to sign into the mock trial system and put in paper preferences. See timeline in Assignments below.
- [9/8] Remember that Monday's class (9/12) will be by Zoom; the link is available in Canvas. Class will be in person on 9/14. The plan for 9/19 is not settled yet; if I can't make it back, I may switch things around and substitute a class reading plus a video.
- [9/8] People have asked about projects. I'm purposely not getting into details yet -- I'd like people to read papers, think a bit about possible topics, etc. -- but we will start discussing at the end of the month. In broad strokes you will first provide a project proposal (providing your team, a description of what you want to do for your project, a preliminary look at recent work), and the end result will be a paper, roughly 15-20 pages in length. I'm putting up the project description from last year so that you can have a look if you like -- it will be essentially the same (dates will change). See assignments below.
- [8/25] Read the Bloom filter paper for sure before the first class.
- [7/30] We plan to use Gradescope and Canvas to turn in assignments. Please make sure you are
signed up for both. (This probably won't be set up until a couple of weeks into the class.)
Basic Class Information
Assignments
Readings
All reading dates are tentative and will be confirmed in class.
We may go faster, we may go slower. We may have to move things
for other reasons.
You should consider the assigned papers a minimum of what you should be
reading for this class. Feel free to explore on the Web or otherwise
(and additional suggested readings for each topic will be listed as well). We are just
touching the surface of these topics; there's much more out there.
Unit 0: Fun Stuff to Start us off
Please read all these, preferably before class begins, to see if you're interested.
Unit 1: Data Sketches (and Using Predictions)
- Class 1 [8/31]: Class Introduction (syllabus, expectations, etc.)
- Class 1 [8/31]:
Network Applications of Bloom Filters: A Survey. by Broder and Mitzenmacher.
- Class Null [9/5]: Labor Day Holiday.
- Class 2 [9/7]: Approximate counting sketches
- Class 2 [9/7]:
New directions in traffic measurement and accounting: focusing
on the elephants, ignoring the mice by Estan and Varghese.
- Class 2 [9/7]:
The count-min sketch and its applications by
Cormode and Muthukrishnan.
- Discussion Questions: Compare and contrast the mice/elephants paper and the count-min sketch
paper. How do they describe and define the underlying problem(s) they are considering?
How do they formalize their solution(s)? How do they compare?
- Class 3 [9/12]: Algorithms with Predictions Introduction
- Class 3 [9/12]:
The Case for Learned Index Structures
by Kraska, Beutel, Chi, Dean, and Polyzotis.
- Class 3 [9/12]:
Algorithms with Predictions
by Mitzenmacher and Vassilvitskii.
- Discussion Questions: Explain, in your words, how learning can be used to improve algorithms and data structures. What are the goals of this approach? What are the possible benefits, and possible pitfalls? What problems seem amenable to this type of attack?
- Class 4 [9/14]: Range Filters (Eric Knorr lecture)
- Class 4 [9/14]:
Approximate Range Emptiness in Constant Time and Optimal Space
by Goswami, Gronlund, Green, and Pagh.
- Class 4 [9/14]:
Proteus: A Self-Designing Range Filter
by Knorr, Lemaire, Lim, Luo, Zhang, Idreos, and Mitzenmacher.
- Discussion Questions: Explain how to use an encoding argument
to achieve data structure lower bounds. Explain how a worst-case lower bound can be avoided by a system such as Proteus.
- Class 5 [9/19]: Still working on it; to be filled in shortly. The readings below are tentative and will be confirmed.
- Class 5 [9/19]:
A Brief History of Generative Models for Power Law and Lognormal Distributions. by Mitzenmacher.
- Class 5 [9/19]:
Power-Law Distributions in Empirical Data.
by Clauset, Shalizi, and Newman.
- Class 5 [9/19]: NOT REQUIRED, ADDITIONAL READING IF INTERESTED.
Scale-free networks are rare by Broido and Clauset.
- Class 5 [9/19]: NOT REQUIRED, ADDITIONAL READING IF INTERESTED.
Editorial: The Future of Power Law Research.
by Mitzenmacher.
- Discussion Questions:
Explain some of the controversy behind power law network/scale-free
networks in research work. Where can you go wrong when you say, "Here
is a power law" in a paper? How instead can you go right? Why is finding
a power law interesting (or not)?
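On going right: the Clauset, Shalizi, and Newman reading argues for fitting by maximum likelihood rather than by a straight line on a log-log plot, and for choosing the lower cutoff x_min by minimizing the Kolmogorov-Smirnov (KS) distance between the data and the fitted model. Below is a minimal sketch of that recipe for continuous data, using the estimator alpha_hat = 1 + n / sum_i ln(x_i / x_min). The function names, the candidate-cutoff list, and the synthetic data are illustrative assumptions; the paper's bootstrap goodness-of-fit test and its likelihood-ratio comparisons against alternatives such as the lognormal are left out, though they matter for deciding whether a power law is the right model at all.

```python
import math
import random

def fit_alpha(data, x_min):
    """Continuous power-law MLE: alpha = 1 + n / sum(ln(x / x_min)) over the tail x >= x_min."""
    tail = [x for x in data if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

def ks_distance(data, x_min, alpha):
    """Largest gap between the empirical CDF of the tail and the fitted power-law CDF."""
    tail = sorted(x for x in data if x >= x_min)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        model = 1.0 - (x / x_min) ** (1.0 - alpha)
        # The empirical CDF jumps at x, so check both sides of the step.
        d = max(d, abs(i / n - model), abs((i + 1) / n - model))
    return d

def choose_x_min(data, candidates, min_tail=50):
    """Pick the cutoff whose fitted tail has the smallest KS distance; returns (d, x_min, alpha)."""
    best = None
    for x_min in candidates:
        if sum(1 for x in data if x >= x_min) < min_tail:
            continue  # too few points above this cutoff to fit reliably
        alpha = fit_alpha(data, x_min)
        d = ks_distance(data, x_min, alpha)
        if best is None or d < best[0]:
            best = (d, x_min, alpha)
    return best

def sample_power_law(n, alpha, x_min, seed=0):
    """Synthetic test data by inverse transform: P(X > x) = (x / x_min) ** (1 - alpha)."""
    rng = random.Random(seed)
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]
```

A good fit above x_min says nothing by itself about the data below the cutoff, and a small KS distance alone does not rule out a lognormal.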
Additional useful papers/places for Unit 1
- Algorithms with Predictions Metabibliography
- Streaming, Sketching and Sufficient Statistics I [first 40 minutes,
through the Count-Min sketch] by Graham Cormode.
- Cuckoo hashing for undergraduates by Pagh.
- The Bloomier Filter by Chazelle,
Kilian, Rubinfeld, and Tal.
- Compressed Bloom Filters by Mitzenmacher.
- Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters
by Graf and Lemire.
- Encoding Arguments by Morin, Mulzer, and Reddad.
- SuRF: Practical Range Query Filtering with Fast Succinct Tries
by Zhang, Lim, Leis, Andersen, Kaminsky, Keeton, and Pavlo.
- Rosetta: A Robust Space-Time Optimized Range Filter for Key-Value Stores by Luo, Chatterjee,
Ketsetsidis, Dayan, Qin, and Idreos.
- Graham Cormode's page: lots of papers on streaming.
- My former student Justin Thaler's page: lots of papers on streaming.
- Andrew McGregor's page: lots of papers on streaming.
We're not really "done" with Unit 1 yet, but I'd like to introduce some additional topics early. We'll return to data sketches later in the class.
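To go with the Class 2 reading, here is a minimal count-min sketch: d rows of w counters with one hash function per row; an update adds to one counter in each row, and a point query returns the minimum of those counters. The hash family ((a*x + b) mod p) mod w with p = 2^61 - 1, the integer keys, and the class and method names are illustrative assumptions, not code from the paper.

```python
import random

class CountMinSketch:
    """Point-query count-min sketch over integer keys (illustrative sketch)."""

    def __init__(self, width, depth, seed=0):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]
        rng = random.Random(seed)
        self.p = (1 << 61) - 1  # Mersenne prime used by the hash family
        # One (a, b) pair per row for the hash ((a*x + b) mod p) mod width.
        self.params = [(rng.randrange(1, self.p), rng.randrange(0, self.p))
                       for _ in range(depth)]

    def _bucket(self, row, key):
        a, b = self.params[row]
        return ((a * key + b) % self.p) % self.width

    def update(self, key, count=1):
        # Add the count to one counter in every row.
        for row in range(self.depth):
            self.table[row][self._bucket(row, key)] += count

    def query(self, key):
        # Every row over-counts (for nonnegative updates), so take the minimum.
        return min(self.table[row][self._bucket(row, key)]
                   for row in range(self.depth))
```

With w = ceil(e/epsilon) and d = ceil(ln(1/delta)), Cormode and Muthukrishnan show that for nonnegative updates the estimate never falls below the true count and exceeds it by more than epsilon*N with probability at most delta, where N is the total of all counts. For example, epsilon = 0.01 and delta = e^-5 give w = 272 and d = 5.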
Unit 2: Link Information and Web History
- Class 8 [9/28]: On the Importance of Links
- Class 8 [9/28]:
Rank Aggregation Methods for the Web. by Dwork, Kumar, Naor, and Sivakumar.
An alternative version.
- Class 8 [9/28]:
The Link Prediction Problem for Social Networks
by Liben-Nowell and Kleinberg.
- Discussion Questions: Define the Spearman footrule and Kendall tau distances. Why
and how are Markov chains useful for combining rankings? According to
the paper, what are the best methods for link prediction? Can you think of
additional methods they might not have tried? How might you improve
on their methodology?
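For the first discussion question, both distances can be computed directly for two full rankings of the same items: Kendall tau counts the pairs the rankings order differently, and the Spearman footrule sums the absolute differences in each item's position. The function names and the list-of-items representation below are illustrative choices; the Rank Aggregation paper also handles partial lists, which this sketch does not.

```python
from itertools import combinations

def kendall_tau_distance(r1, r2):
    """Count item pairs that r1 and r2 place in opposite orders (O(n^2) for clarity)."""
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )

def spearman_footrule(r1, r2):
    """Sum over items of |position in r1 - position in r2|."""
    pos2 = {item: i for i, item in enumerate(r2)}
    return sum(abs(i - pos2[item]) for i, item in enumerate(r1))
```

For instance, abcd versus badc gives a Kendall distance of 2 and a footrule distance of 4. The Diaconis-Graham inequality K <= F <= 2K relates the two; for abcd versus dcba, K = 6 and F = 8.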
Additional useful papers/places for Unit 2
Unit 3: Compression and Basic Information Theory
- Class 9 [10/3]: We will be covering the basics of compression and
information theory, including Huffman coding, arithmetic coding, LZ-style coding, etc.
Some good online introductions to the material include:
        Information Theory, Inference, and Learning Algorithms, specifically Part I (Data Compression) of MacKay's book. (The whole book is good, though.)
        Introduction to Data Compression,
notes by Guy Blelloch.
Class 9 [10/3]: No discussion questions today! For today's class, please review the Blelloch notes on compression; I will lecture, and we'll work through problems in class. We will focus on Sections 1-3 and 5 of the Blelloch notes. We will also cover other material from the notes (Burrows-Wheeler, JPEG/MPEG, etc.) with papers coming up, so they make a good reference; feel free to read the rest if you like.
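As a warm-up for the in-class problems, here is a minimal sketch of Huffman coding, one of the topics listed for Class 9: repeatedly merge the two lightest subtrees, prefixing 0 to the codewords on one side and 1 on the other. Representing each subtree as a dict of partial codewords and breaking ties with a counter are implementation choices for this sketch, not something taken from the Blelloch notes.

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Return a dict mapping each symbol in text to its Huffman codeword (a bit string)."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # With one distinct symbol, still emit a 1-bit codeword.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tiebreak, {symbol: codeword suffix built so far}).
    # The integer tiebreak keeps heapq from ever comparing two dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]
```

For "abracadabra" (a:5, b:2, r:2, c:1, d:1) the code uses 23 bits in total, which equals the sum of the merged weights 2 + 4 + 6 + 11; different tie-breaking can change individual codewords but not that total.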
Class 10 [10/5]: Compressing Network Graphs
- Class 10 [10/5]: On Compressing Social Networks, by F. Chierichetti, R. Kumar, S. Lattanzi, M. Mitzenmacher, A. Panconesi, and P. Raghavan.
- Class 10 [10/5]: Permuting Web and Social Graphs, by P. Boldi, M. Santini, and S. Vigna.
- Discussion questions: How are compressing social networks and compressing the web alike, and how do they differ? What role does having an underlying model
play in determining how to compress these types of structures? What properties of the model(s) appear important?
- Class 14 [10/24]: Program Committee Preparation
- Class 14 [10/24]:
How Not to Review a Paper. by Cormode. This is just for general background; you will not need to write up anything for discussion, but look it over before today's class.
- Class 14 [10/24]:
Thoughts on Reviewing. by Allman. This is just for general background; you will not need to write up anything for discussion, but look it over before today's class.
- Class 19 [11/09]: Guest speaker Andrei Broder, 8pm, on the office-hours
Zoom link (see class announcements for the link).
- Class 19 [11/09]:
Graph Structure in the Web
by Andrei Broder and many others.
- Class 19 [11/09]:
On the resemblance and containment of documents
- Class 19 [11/09]:
A Note on Double Pooling Tests.
Discussion questions for 11/9: How does the shape of the Web graph match or not match your intuitions? Do you think it has changed since this study, and how? Do you think the resemblance mechanism described in the paper would work for plagiarism detection? How might someone try to avoid being detected as plagiarizing, and do you think the algorithm could be extended to handle your approach for avoiding detection? What do you think about the concept of double pooling?
- Class 20 [11/14]: No class; you are to be working to complete your reviews for the Mock Program Committee; all reviews should be turned in by 11/16.
- Class 21 [11/16]: Guest speaker Kapil Vaidya. (Class will be in-person live as always -- come to class!)
- Class 21 [11/16]:
Partitioned Learned Bloom Filter.
- Class 21 [11/16]:
SNARF: A Learning-Enhanced Range Filter.
- Discussion questions for 11/16: How do these works improve or differ on previous data structures you have seen for these filtering problems? Also: Kapil is a very recent PhD graduate, now working at Amazon. Please think of a question you'd like to ask him, either about the process of getting a PhD, or what it's like moving into post-graduate work.
- Class 22 [11/21]: Class cancelled for Thanksgiving week.
(No class 11/23 either.) You are expected to use this time to participate in online discussions for the Mock Program Committee.
- Class 23 [11/28]: Mock Program Committee, Day 1; Come prepared to talk about your papers
- Class 24 [11/30]: Mock Program Committee, Day 2; Come prepared to talk about your papers
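For the resemblance paper from Class 19, a minimal sketch of the idea: represent each document by its set of w-word shingles, define resemblance as the Jaccard similarity |A ∩ B| / |A ∪ B| of the shingle sets, and estimate it by the fraction of independent random orderings under which both sets have the same minimum element. Here salted BLAKE2b hashes (via Python's hashlib) stand in for random permutations, and the choice of w = 4 words, k = 200 hash functions, and the function names are all illustrative assumptions.

```python
import hashlib

def shingles(text, w=4):
    """Set of contiguous w-word sequences (word-level shingles) in the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + w]) for i in range(len(words) - w + 1)}

def exact_resemblance(a, b):
    """Jaccard similarity of two shingle sets."""
    return len(a & b) / len(a | b)

def _salted_hash(seed, shingle):
    # A salted 64-bit hash standing in for one random permutation of shingles.
    digest = hashlib.blake2b(f"{seed}:{shingle}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def minhash_signature(shingle_set, k=200):
    """Minimum hash value under each of k salted hash functions."""
    return [min(_salted_hash(seed, s) for s in shingle_set) for seed in range(k)]

def estimated_resemblance(sig_a, sig_b):
    """Fraction of hash functions on which the two minima agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
```

Each signature is just k integers, so documents can be compared without keeping their full shingle sets; the estimate's standard deviation is roughly sqrt(r(1 - r)/k) for true resemblance r. Documents with fewer than w words produce an empty shingle set, which this sketch does not handle.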
Additional useful papers/places for Unit 3