CS 222 -- Algorithms at the End of the Wire
Handouts and Class Materials
- [11/6] Assignment 3 is up.
- [10/9] You should be part of the CS222 PC, which will be getting underway soon.
Please check that you can log in.
- [10/9] Assignment 2 is up.
- [9/26] Description of project requirements now up; see assignments below.
- [8/10] I'm told the course videos are at
https://matterhorn.dce.harvard.edu/engage/ui/index.html#/2019/01/15666 . Let me know of any issues.
- [7/30] We'll have a Piazza page up. Go to
to sign up, and
for the Piazza page. If you cannot sign up, please e-mail me the address you
would like to use for Piazza and I will add you manually. Both the Harvard and Extension classes
will use the same Piazza page.
- [7/30] We plan to use Canvas for turning in assignments. Please make sure you are
signed up for Canvas. There should be separate Canvas sites set up for CS222 and the Extension class.
Basic Class Information
All reading dates are tentative and will be confirmed in class.
We may go faster, we may go slower. We may have to move things
for other reasons.
You should consider the assigned papers a minimum of what you should be
reading for this class. Feel free to explore on the Web or elsewhere;
additional suggested readings for each topic will be listed as well. We are just
scratching the surface of these topics; there's much more out there.
Unit 1: Search Engines and the Importance of Links
- 9/4 Background: Review Markov chains.
- Start with
Wikipedia. The fourth external link is to a useful book chapter
on Markov chains.
Additional useful papers for Unit 1:
- Graph Structure in the Web
by Broder et al.
- The Link Database: Fast Access to Graphs of the Web
by Randall et al.
- The EigenTrust Algorithm for Reputation Management in P2P Networks
by Kamvar, Schlosser, and Garcia-Molina.
- Trust-Based Recommendation Systems
by Andersen et al.
- PicASHOW: Pictorial Authority Search by Hyperlinks on the Web
by Lempel
- The Stochastic Approach for Link-Structure Analysis (SALSA) and the TKC Effect
by Lempel and Moran.
- Power Laws, Pareto Distributions, and Zipf's Law
by Mark Newman.
- On Power-Law Relationships of the Internet Topology
by Faloutsos, Faloutsos, and Faloutsos.
- Power-Law Distributions in Empirical Data
by Clauset, Shalizi, and Newman.
- The Anatomy of the Long Tail
by Goel, Broder, Gabrilovich, and Pang.
- Aggregating Inconsistent Information
by Ailon, Charikar, and Newman.
- Editorial: The Future of Power Law Research.
Unit 2: Compression and Basic Information Theory
- 9/18, and ongoing: We will be covering the basics of compression and
information theory, including Huffman coding, arithmetic coding, LZ-style coding, etc.
Some good online introductions to the material include:
- Information Theory, Inference, and Learning Algorithms, by MacKay, specifically part 1 (Data Compression), though the whole book is good.
- Introduction to Data Compression,
notes by Guy Blelloch.
No assigned reading summary today, but start going over background material.
- 9/27: On Compressing Social Networks, by F. Chierichetti, R. Kumar, S. Lattanzi, M. Mitzenmacher, A. Panconesi, and P. Raghavan.
- 9/27: Permuting Web and Social Graphs, by P. Boldi, M. Santini, and S. Vigna.
- Questions for 9/27: How are compressing social networks and compressing the web alike, and how are they different? What role does having an underlying model
play in determining how to compress these types of structures? What properties of the model(s) appear important?
- I expect to be out of class Oct 2 and Oct 4. This will give you some time to catch up
on reading, and there will likely be a substitute for one class in my absence. This week's work
will be updated shortly.
Additional useful papers for Unit 2:
Unit 3: Data Streams and Data Summaries
- As a general resource, my colleague Justin Thaler teaches
a fantastic, more comprehensive course on streaming algorithms; his class webpage is an excellent resource on this big topic. Start here!
- Data Streams: Algorithms and Applications, by Muthukrishnan.
- Computing on Data Streams, by Henzinger, Raghavan, and Rajagopalan.
- Questions for 10/2: How is computing on a data stream different from a "standard"
input-output algorithm? What are the goals, how do you measure
resource usage, what sorts of problems are considered? Where has
computing on data streams had successes? What lower bound techniques
are there showing that computing on data streams can be hard?
- Network Applications of Bloom Filters: A Survey, by Broder and Mitzenmacher.
- Invertible Bloom Lookup Tables, by Goodrich and Mitzenmacher.
- Questions for 10/13:
I've been told that all a Bloom filter does is save a small constant factor in space -- do you agree with this assessment, and whether you do or not, do you think this makes a Bloom filter uninteresting? How could you imagine extending the
Bloom filter? What would you wish a Bloom filter could do that it doesn't?
The IBLT paper focuses on questions relating to the synchronization of sets between
a pair of agents. Can you think of good uses for this solution? Or generalizations of this problem?
- 10/18 MapReduce: Simplified Data Processing on Large Clusters
by Dean and Ghemawat.
- 10/18 A Model of Computation for MapReduce by Karloff, Suri and Vassilvitskii.
- Questions for 10/18: Compare and contrast the theory and practice of the MapReduce paradigm. What kind of tasks might MapReduce not be good for?
Are there any eventual scaling problems for this paradigm? Suppose you had access to a large-scale MapReduce system -- what would you
most want to use it for?
- 10/30 No class, finish homework. New unit starts 11/1.
Additional useful papers for Unit 3:
Unit 4: Coding, and Network Coding
- 11/8: The Politics of Search: A Decade Retrospective
- 11/8: Access, Transparency and Control: A Proposal to Restore the Marketplace of Ideas by Regulating Search Engine Algorithms
Crossed-out parts can be skipped; highlighted ones should be focused on.
- 11/8: OPTIONAL: Shaping the Web: Why the Politics of Search Engines Matters
- Please answer the questions emailed to you and send your answers to
email@example.com. Questions are repeated here.
1. What is 'Algorithmic Transparency' as it is used in the Granka paper? (1-2 sentences)
2. Granka provides a number of reasons both in support of and against algorithmic transparency (pp. 365-366). Of the reasons listed, which do you find most compelling, and why? (It can be a reason either for or against!) (short paragraph)
3. The Brotman paper appeals to the idea of the 'Marketplace of Ideas'. What does Brotman take this concept to mean, and how does it pertain to democratic value? (short paragraph)
4. Finally, looking at the discussion of transparency on pp. 43-44 of the Brotman paper, on what grounds does Brotman claim we need transparency? Given the definition of a 'filter bubble' below, do you think Brotman would cite 'filter bubbles' as an example to support her claim in this section? (slightly longer paragraph)
Filter bubbles: A filter bubble is the intellectual isolation that can occur when websites make use of algorithms to selectively assume the information a user would want to see, and then give information to the user according to this assumption. Websites make these assumptions based on the information related to the user, such as former click behavior, browsing history, search history and location. For that reason, the websites are more likely to present only information that will abide by the user's past activity. A filter bubble, therefore, can cause users to get significantly less contact with contradicting viewpoints, causing the user to become intellectually isolated. (source: https://www.techopedia.com/definition/28556/filter-bubble)
- 11/15: [This class will either be for project discussion, or there is some chance it
will be cancelled.]
- 11/20 Network Coding, an Instant Primer, by Fragouli, Le Boudec, and Widmer.
- 11/20 Network Coding for Large Scale Content Distribution, by Gkantsidis and Rodriguez.
- Questions for 11/20: What is network coding, and how does it differ from the types of encoding (digital fountain codes, Reed-Solomon,
etc.) that we have seen previously? What potential advantages does network coding offer? What are the corresponding costs?
Is network coding being used for content distribution today? If not, why do you think that is?
- 11/29: Mock CS 222 PC Meeting
- 12/4: Mock CS 222 PC Meeting, class wrap-up