DIVISION OF ENGINEERING AND APPLIED SCIENCES
HARVARD UNIVERSITY
CS 260r. Topics in Computer Systems
Fall 2006
Final Project Presentations
Each presentation will be 10 minutes long, followed by 5 minutes for questions and changeover. We encourage you to use slides, demos, etc. during your talk. We suggest having no more than 5 slides total in your talk (excluding the title) - make it short and sweet!
Content of talks: The idea here is to give a "work in progress" (WIP) talk, similar to a WIP session held at a scientific conference. In your short allotted time you need to impress the audience with the brilliant ideas you have and what progress you have made so far on the problem, while pointing out what's left to be done. I suggest your talk be organized along the following rough outline:
- Introduction / motivation - What problem are you trying to solve? Why? What makes this an interesting and exciting problem to work on?
- Approach - What is your overall approach? Give a high level description of your design or system architecture. Discuss the main benefits of your approach (e.g., scalability, fault tolerance, etc.) and how your design achieves those goals.
- Current status - Where are you now with the project? Focus on accomplishments so far and roadblocks that you have overcome.
- Future work - What are the main challenges that lie ahead? Beyond the scope of your class project, what future directions do you envision this work taking?
Please be on time to class to avoid disturbing presentations in progress. Everyone should come to both lectures, where you'll be asked to score each talk on two axes: technical content and presentation quality.
Thursday, 12/14/06
- 2:40-2:55: Ian Fischer and Elias Torres, A Distributed Blog Search Platform
- 2:55-3:10: Manas Mittal and Michael T. Lapinski, A Mobile Interface for Browsing Large Scale Sensor Networks
- 3:10-3:25: Aaron Gibralter and Mohamed Ahmed, Load Balancing in the Cobra RSS Search Engine
- 3:25-3:40: Jonathan Hyman and Kevin M. Bombino, CitySense: Prototyping a Wireless Mesh Network of Sensor Nodes at Citywide Scale
- 3:40-3:55: Chit-Kwan Lin and Chia-Yung Su, Dynamic Query Execution on a UAV Sensor Network
Tuesday, 12/19/06
- 2:40-2:55: Kyle Buza and Darren Baker, Streaming Analysis of Wireless Network Traffic
- 2:55-3:10: Michael Lyons and Peter Webb, A TinyDB Proxy for Optimizing Borealis Query Processing in Mote Sensor Networks
- 3:10-3:25: Geoffrey Werner-Allen, Resource-Aware Data Extraction from Wireless Sensor Networks
- 3:25-3:40: Bor-rong Chen and Geoffrey Peterson, Passive Sensor Network Analysis and Monitoring
Research Project Ideas
These are only suggestions to help you get a sense of the scope and topics for research projects that would work for CS260r. As mentioned previously, projects must have some connection to the overall topic of the course, but can draw on ideas from other fields (e.g., theory, AI, languages, etc.). In fact, we encourage projects that have a "non-systems" component.
Projects are to be undertaken in groups of two students, unless you have made special arrangements with me.
Proposal Format
Research project proposals are due by 5pm on Friday, October 27. Please email your proposal in PDF format to cs260r-reviews@eecs.
The proposal should be a 2-3 page document including sections on:
- A summary of the project;
- Background and related work (specifically, describe what is novel about your project);
- A brief description of your proposed approach, and any other thoughts on how you will proceed;
- A specific timeline of milestones that you intend to accomplish for your project. This should include the initial starting point, goal for what you intend to accomplish by the project update (due November 21), and final goal for the end of the project (final project report due Friday, January 6, 2007).
Project Ideas
- Analyze overhead of fault-tolerance techniques for large-scale query systems.
The Borealis papers discuss an approach to fault tolerance based on buffering tuples and replaying portions of a query after a failure recovery. As you scale up the number of data sources and nodes and increase the failure rate, how feasible are these techniques?
Using a simulation (or, if you like, an analytical model) look at what happens to a large-scale query (thousands of data sources, many operators) as you vary the rate of failure, the duration of failure, the source data rate, and other parameters. Borealis marks tuples as 'tentative' if they are not based on stable data. What fraction of tuples are marked tentative? Can you further quantify the error in the results seen by the end user?
You can also look at the memory usage, latency, and other metrics associated with the Borealis approach. Contrast this to the approach from the "Dependable Internet Scale Sensing" paper that argues for a more best-effort approach. How much buffering and replication are needed to achieve a given level of reliability? In the end, what is the right way to design a fault tolerance approach for large-scale streaming queries?
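As a starting point for the simulation suggested above, here is a minimal sketch of a toy failure model. It is not how Borealis works internally; every name and parameter is an assumption made for illustration. Each source independently fails with probability `fail_prob` per time step and recovers with probability `1 / mean_outage` per step, and an aggregate over all sources is counted as tentative in any step where at least one source is down.

```python
import random


def tentative_fraction(num_sources, fail_prob, mean_outage, steps, seed=0):
    """Fraction of time steps whose aggregate output would be tentative.

    Toy model (assumed, not taken from Borealis): sources fail and recover
    independently, and one down source makes the whole step tentative.
    """
    rng = random.Random(seed)
    recover_prob = 1.0 / mean_outage
    up = [True] * num_sources
    tentative_steps = 0
    for _ in range(steps):
        for i in range(num_sources):
            if up[i]:
                if rng.random() < fail_prob:
                    up[i] = False
            elif rng.random() < recover_prob:
                up[i] = True
        # The step is tentative if any source is currently unavailable.
        if not all(up):
            tentative_steps += 1
    return tentative_steps / steps


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, tentative_fraction(n, fail_prob=0.001, mean_outage=20, steps=5000))
```

Sweeping `num_sources`, `fail_prob`, and `mean_outage` in a model like this gives a first feel for how quickly the tentative fraction grows with scale; a real study would also need to model buffering, replay, and per-operator state.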
- Explore migration of query operators between sensor nodes and query processing hosts.
TinyDB provides a simple SQL-based query interface for wireless sensor networks. Likewise, Borealis and other systems provide an SQL-like query model for streaming data over the Internet. Hook the two together and allow a user to state a query over multiple sensor networks using Borealis. This requires partitioning the query across the sensor nodes and the Borealis nodes, that is, pushing some of the query processing into the sensor network itself.
Develop a TinyDB "proxy" for Borealis that allows parts of a query plan to be translated into TinyDB queries and pushed into the sensor net. What is the right way to do this? Given a specific high-level query, which operations should be pushed into the sensor net, which should be run on the proxy, and which should be run on other Borealis nodes? What are the optimization goals? Bandwidth and energy usage come to mind; what about memory usage (sensor nodes have small memories) and reliability (sensor nodes can't buffer tuples persistently to accommodate link or node failures)?
Finally, test your system using MoteLab (our 190-node sensor testbed in Maxwell Dworkin), partitioning MoteLab into a number of "virtual" sensor networks, each running its own instance of TinyDB, and feed the data into your proxy nodes running on the DEAS "blade" servers. (That is, you want to make MoteLab look like a geographically distributed network rather than one big network.)
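To make the partitioning question concrete, here is a minimal sketch of one possible heuristic, under strong simplifying assumptions: the query is a linear pipeline of operators, each with an estimated selectivity (output tuples per input tuple) and a memory footprint, and the only goal is to minimize the tuple rate crossing the radio while staying within a mote's memory budget. The operator tuples and the function name are hypothetical; neither TinyDB nor Borealis exposes this interface.

```python
def best_cut(operators, input_rate, memory_budget):
    """Choose how many leading operators to run inside the sensor network.

    operators: list of (name, selectivity, memory_bytes) in pipeline order.
    Returns (k, rate): run the first k operators on the motes, with `rate`
    tuples per second crossing the radio to the proxy.
    """
    best_k, best_rate = 0, input_rate
    rate, memory_used = input_rate, 0
    for k, (_name, selectivity, memory_bytes) in enumerate(operators, start=1):
        memory_used += memory_bytes
        if memory_used > memory_budget:
            break  # the remaining operators do not fit on the mote
        rate *= selectivity
        if rate < best_rate:
            best_k, best_rate = k, rate
    return best_k, best_rate


if __name__ == "__main__":
    pipeline = [
        ("filter", 0.5, 100),
        ("map", 1.0, 50),
        ("aggregate", 0.1, 400),
        ("join", 2.0, 2000),
    ]
    print(best_cut(pipeline, input_rate=10.0, memory_budget=600))
```

A sketch like this ignores energy, multi-hop routing, and reliability, which are exactly the trade-offs the project should explore.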
- Develop a multi-application/multi-query resource management strategy.
Most work on sensor networks assumes a single application, user, or query is using the network at one time. However, if sensor networks are successful, we imagine many applications and queries running in one network at a time. This opens up many interesting questions for resource sharing, protection, and optimization. How do you ensure that multiple applications do not interfere with each other? How do you protect apps from each other? Virtual machines, like Mate, could potentially support multiple apps per network, but need to be extended to fairly allocate resources across apps, for example, to prevent one app from consuming too much energy or radio bandwidth.
A similar idea exists in a framework such as TinyDB, which could conceivably permit multiple concurrent queries from different users. In traditional databases, work on "multiquery optimization" attempts to combine multiple queries and schedule them to optimize resource usage. This is a very interesting problem in a sensor network context; for example, if many queries are all sampling the same sensors, it would be very inefficient to run them all separately. Likewise, if two queries are performing "similar" operations, it should be possible to combine them in some way. There are many interesting problems to look at in this space.
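To illustrate the shared-sampling point, here is a minimal sketch that compares the number of sensor samples taken when queries run separately against the number taken when queries on the same sensor share samples. The query representation (a sensor name plus an integer sampling period in seconds) is an assumption for illustration, not TinyDB's query format.

```python
from collections import defaultdict


def sampling_cost(queries, horizon):
    """Count samples taken over the interval [0, horizon) seconds.

    queries: list of (sensor_name, period_seconds) with integer periods.
    Returns (separate, shared): total samples if every query samples on its
    own, and total samples if each sensor is read once per distinct instant.
    """
    separate = 0
    instants = defaultdict(set)  # sensor -> set of sampling times
    for sensor, period in queries:
        times = range(0, horizon, period)
        separate += len(times)
        instants[sensor].update(times)
    shared = sum(len(times) for times in instants.values())
    return separate, shared


if __name__ == "__main__":
    queries = [("temp", 2), ("temp", 4), ("light", 3)]
    print(sampling_cost(queries, horizon=12))
```

Here the two `temp` queries overlap completely at the slower query's instants, so sharing removes those duplicate reads. Combining queries that perform "similar" operations, rather than identical samples, is the harder part of the problem.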
- Develop a system to track RSS feeds and blog postings in real time.
We have developed a system called Cobra that allows many RSS feeds to be crawled in real time and matched against a set of user keywords, delivering the results to end users as "personalized RSS feeds". It's like a distributed push-based search engine for the Blogosphere (could we be any more buzzword-compliant?). The system is an early prototype but has a lot of potential. A bunch of interesting projects come up in this space, for example:
- Could you build an interface like Google Trends to track the popularity of certain people or topics over time? How would you store this information persistently? Check out other Blog tracking sites like Technorati and the immensely interesting State of the Blogosphere report for inspiration.
- Can you extend the crawler to crawl not only the content of the RSS feeds themselves but also whatever Web pages a given story links to? You could develop something like Google's PageRank algorithm that scores each "hit" for a given topic based on how it is linked to other pages. However, blog postings are updated continuously, so how do you make this efficient enough to track updates in real time?
- Can you develop a Google Maps interface (or something similar) to show summaries of the contents of blog postings geographically? I have something in mind like the "red state"/"blue state" map.
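For a sense of the keyword-matching step described above, here is a minimal sketch that uses an inverted index from keywords to subscribers. This is not Cobra's actual implementation; the class, its methods, and the tokenization rule are all assumptions for illustration.

```python
import re
from collections import defaultdict


class KeywordMatcher:
    """Match article text against per-user keyword subscriptions."""

    def __init__(self):
        self._subscribers = defaultdict(set)  # keyword -> set of user ids

    def subscribe(self, user, keywords):
        for keyword in keywords:
            self._subscribers[keyword.lower()].add(user)

    def match(self, text):
        """Return the users who have at least one keyword in the text."""
        words = set(re.findall(r"[a-z0-9']+", text.lower()))
        users = set()
        for word in words:
            if word in self._subscribers:
                users |= self._subscribers[word]
        return users


if __name__ == "__main__":
    matcher = KeywordMatcher()
    matcher.subscribe("alice", ["sensor", "mesh"])
    matcher.subscribe("bob", ["borealis"])
    print(matcher.match("Borealis queries over a mesh network"))
```

Looking up each article word in the index keeps the per-article cost proportional to the article length rather than to the number of subscriptions, which matters when many users register keywords.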
- Extend previous work on distributed intrusion detection systems.
Several systems have been proposed for real-time, distributed intrusion and virus detection. Build your own system, preferably using a nice, high-level interface for setting up queries on this data. Setting up a honeypot is not difficult; you can also tap into various sources of data such as the PlanetFlow service running on PlanetLab nodes.
- Build a sensor network consisting of an array of 802.11 sniffers.
Corporations, universities, and even communities are interested in understanding the nature of 802.11 traffic across a geographic area. For example, just within Maxwell Dworkin, it would be very interesting to understand the kind of traffic in the network, how it breaks down across different protocols, client OSs, and so forth, and also whether we see any malicious traffic (e.g., virus propagation or worm scans) over the wireless LAN.
Look at the Jigsaw paper from UCSD for inspiration. Using a couple of laptops in different parts of the building, can you set up a system that can sniff the 802.11 network and aggregate this data to build up a high-level picture of the overall traffic? Ideally you would propose a simple query interface to the real-time data so users can specify different interests -- e.g., a summary of all traffic destined for port 80, versus a summary of traffic that appears to be portscans.
It would also be interesting to run this system at home, at a nearby Starbucks, etc. to get a picture of the network traffic from different locations.
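As a rough idea of what the aggregation side might look like, here is a minimal sketch that summarizes already-captured packet records by destination port and flags sources that look like port scanners. The record layout `(src_ip, dst_ip, dst_port, num_bytes)` and the `min_ports` threshold are assumptions for illustration; capturing and decoding real 802.11 frames is a separate step not shown here.

```python
from collections import Counter, defaultdict


def bytes_by_port(records):
    """Total bytes observed per destination port."""
    totals = Counter()
    for _src, _dst, port, num_bytes in records:
        totals[port] += num_bytes
    return totals


def suspected_scanners(records, min_ports=10):
    """Sources that contacted at least min_ports distinct ports on one host."""
    ports_seen = defaultdict(set)  # (src, dst) -> distinct destination ports
    for src, dst, port, _num_bytes in records:
        ports_seen[(src, dst)].add(port)
    return {src for (src, _dst), ports in ports_seen.items() if len(ports) >= min_ports}


if __name__ == "__main__":
    records = [
        ("10.0.0.1", "10.0.0.9", 80, 500),
        ("10.0.0.1", "10.0.0.9", 80, 300),
        ("10.0.0.2", "10.0.0.9", 22, 100),
    ] + [("10.0.0.3", "10.0.0.9", port, 40) for port in range(1, 21)]
    print(bytes_by_port(records)[80])
    print(suspected_scanners(records))
```

Each sniffer could compute summaries like these locally and ship only the aggregates, which is one way to answer the "traffic destined for port 80" and "apparent portscans" queries without centralizing raw packets.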
Keep in mind these are only suggestions! You are encouraged to come up with your own ideas ... the point is to give you a sense of what kind of projects we're expecting, and how large they should be.