I. Multiple Choice (42)

1. C
2. A, D, F
3. D
4. A, B, C, D
5. None -- all run in user mode in a virtual machine environment
6. A
7. A, D, E
8. A, B, C
9. C
10. B, D

=-=-=-=-=-

II. Short Answer (30)

1. Wait channels have spinlocks instead of sleep locks. Wait channels are also managed: a single structure holds all of the wait channels.
2. If you inadvertently attach an uninitialized block to a file, you might reveal confidential data to the new file's owner.
3. If you let a VM change the hardware mappings directly, it could peek at other VMs' memory.
4. Because it's dependent on the trap frame, which is determined by the hardware registers.
5. HW: could be faster. SW: gives more flexibility.
6. Page coloring -- placing things in the address space to reduce thrashing. The idea is that you assign colors to addresses that share something important, in this case TLB entries. Then you allocate things likely to be used together to different colors.

=-=-=-=-=-

III. Stupid or Clever (50)

1. Stupid (doesn't work)
2. Clever (lots of different processors around)
3. Clever (kind of obvious, but a nice idea)
4. Stupid (now you have to figure out how to commit across logs)
5. One could argue either way: with buffering, it's not stupid, but it might not be clever; without buffering it would be stupid.

=-=-=-=-=-

IV. Quantum Drive (38)

1. (5 points) 100. 1 TB = 1,000,000 1 MB writes. Sequentially, you can do 10,000 1 MB writes in a second, so you need 100 such streams to get to a million.

2. (9 points; 3 per challenge)
* The destructive reads are a real problem -- this means you MUST replicate. If you replicate each object twice, then whenever you read something you must immediately write it back. I'm not sure 3-way replication helps much.
* The disk works best when you can leverage parallelism and the writes are big, so we really want to transfer things to the drive in BIG BIG BIG units.
* Related to the second point above, small writes are going to be very, very bad, so we probably want to avoid them at all costs.

3. (4 points) Inode design

Design rationale: The minimum 1 MB transfer size suggests either an LFS-like design OR an extent-based design with large extents. Let's see -- an exabyte is 2^40 MB, or roughly a trillion extents. If we assume 3-way replication, that is still O(trillion) extents. So I think you take files < 1 MB and pack them together into segments, and you take files > 1 MB and represent them with 1 MB extents.

Now, we also need to replicate inodes, because you destroy them when you read them. So here is what I'll do:

For small files, do what Cedar did and write: a bunch of data blocks, ordered and sorted by file, followed by a bunch of inodes, followed by a summary. Call that a segment and write it three times (striped across parallel units).

For large files, we make the inode as large as it needs to be to hold all the extents. We write the extents (3 times each) and then pack these large-file inodes into segments and write them too. And we write segment summaries too.

Now, we have variable-sized inodes, so they will store their size as a first element, and we'll keep a map recording each inode's size and its multiple locations. This map gets written 3 times on every checkpoint as well.

In summary:
Inodes are variable size.
Files smaller than 1 MB have a single extent.
Larger files have as many 1 MB extents as necessary.
Inodes are replicated 3 times.
Inodes get packed into segments.
Inode locations are tracked with a triply replicated map.
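To make the inode design above concrete, here is a minimal C sketch of what the variable-sized inode and the replicated inode-map entry might look like. Every name and field width (qd_inode, qd_imap_entry, NREPLICAS, and so on) is an illustrative assumption; the answer above does not commit to an exact on-disk layout.

    /*
     * Hypothetical sketch of the structures described above.
     * Names and field widths are assumptions for illustration only.
     */
    #include <stdint.h>

    #define NREPLICAS 3            /* data and inodes are written 3 times */

    /* A 1 MB extent, recorded once per replica so a destructive read of
     * one copy can be repaired from the others. */
    struct qd_extent {
        uint64_t replica_addr[NREPLICAS];   /* device address of each copy */
    };

    /* Variable-sized inode: its own size comes first, as required above,
     * followed by as many 1 MB extents as the file needs. */
    struct qd_inode {
        uint32_t inode_bytes;       /* total size of this inode structure   */
        uint64_t file_bytes;        /* logical length of the file           */
        uint32_t nextents;          /* 1 for files <= 1 MB, more otherwise  */
        struct qd_extent extents[]; /* flexible array member                */
    };

    /* One entry in the triply replicated inode map written at checkpoints:
     * each inode's size and where each of its copies lives. */
    struct qd_imap_entry {
        uint64_t inum;
        uint32_t inode_bytes;
        uint64_t segment_addr[NREPLICAS];   /* segment holding each copy      */
        uint32_t segment_off[NREPLICAS];    /* byte offset within that segment */
    };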
4. (10 points) Allocation policy

The key observations are:
1. For big files, you want to exploit parallelism, so however the parallel writes happen, you really want to make sure that a TB file gets striped over all those parallel units.
2. For small files, we need to pack them into large segments a la LFS. Of course, if you do that, then you need to deal with finding things.

I'm going to view my disk kind of like a large RAID 0 device, in that I will treat the device as having 100 parallel devices (corresponding to the 100x parallelism computed in #1). Logically, I will think of 3-4 of those logical devices as holding mapping information, but I will use RAID 5 style parity striping to avoid hotspots for the metadata. (Although I want to triply store the data, I might want more copies of the metadata because of those pesky destructive reads.)

So, my allocation policy breaks down into:
* Triple replicate all data.
* Big files: stripe across parallel logical devices in 1 MB units.
* Small files: pack together into 1 MB segments.
* Inodes: pack them into segments; keep a map that gets written to the metadata logical devices.

5. (10 points)

Robustness: I will triple replicate data and quadruple replicate metadata. I will replicate on a per-1-MB-segment basis.

Recovery: I am going to do a largely LFS-like recovery. I take checkpoints, which involves writing out the mapping data to mapping extents (let's assume I know how to find all the mapping extents). So, I start from the last checkpoint and read through one copy of the data that follows it, rewriting each segment after I read it (the reads are destructive, so every segment I scan must be written back). Then I take a checkpoint.
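As a sketch of the roll-forward just described: the loop below scans one copy of each segment written since the last checkpoint, writes it straight back (because the reads are destructive), updates the inode map from the segment summary, and then takes a checkpoint. All of the function names and types here are hypothetical placeholders assumed only for illustration, not part of the answer above.

    #include <stdint.h>

    #define SEGMENT_BYTES (1024 * 1024)   /* minimum transfer unit: 1 MB */

    struct segment;   /* one segment: data blocks + inodes + summary */
    struct imap;      /* the replicated inode-location map           */

    /* Hypothetical placeholder operations, assumed to exist elsewhere. */
    uint64_t        first_segment_after(uint64_t checkpoint_addr);
    int             segment_exists(uint64_t addr);
    struct segment *read_segment(uint64_t addr);
    void            rewrite_segment(uint64_t addr, struct segment *seg);
    void            apply_summary(struct imap *map, struct segment *seg);
    void            write_checkpoint(struct imap *map);

    void recover(uint64_t last_checkpoint_addr, struct imap *map)
    {
        uint64_t addr = first_segment_after(last_checkpoint_addr);

        while (segment_exists(addr)) {
            struct segment *seg = read_segment(addr); /* reading destroys the copy... */
            rewrite_segment(addr, seg);               /* ...so write it straight back */

            /* The segment summary names the inodes the segment holds, which is
               enough to bring the in-memory inode map up to date. */
            apply_summary(map, seg);

            addr += SEGMENT_BYTES;
        }

        write_checkpoint(map);   /* persist the rebuilt map as a new checkpoint */
    }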
=-=-=-=-=-

V. Elastic OS (80)

==+== cs161 Paper Review Form
==-== DO NOT CHANGE LINES THAT START WITH "==+==" UNLESS DIRECTED!
==+== =====================================================================
==+== Begin Review
==+== Reviewer: YOURNAME GOES HERE
==+== Paper #1
==-== Title: Towards Elastic Operating Systems

==+== Review Readiness
==-== Enter "Ready" if the review is ready for others to see:

Ready

==+== A. Overall merit
==-== Choices: 1. Reject
==-==          2. Weak reject
==-==          3. Weak accept
==-==          4. Accept
==-==          5. Strong accept
==-== Enter the number of your choice:

4

(5 points for taking a position that seems consistent with the review.)

==+== B. Reviewer expertise
==-== Choices: 1. No familiarity
==-==          2. Some familiarity
==-==          3. Knowledgeable
==-==          4. Expert
==-== Enter the number of your choice:

2

(5 points for proper humility)

==+== C. Paper summary
# Write a 1-2 paragraph summary that explains this paper to those
# committee members who do not read the paper.

This paper presents a design for Elastic OS, a software layer that takes responsibility for providing resource elasticity transparently to the application. In particular, the framework is trying to relieve application developers of the burden of 1) writing their application in a granular fashion, 2) monitoring the infrastructure to know when to add/remove resources, 3) figuring out how to distribute the workload, and 4) dealing with shared state. The fundamental mechanism proposed is that of "elastic page tables," which extend traditional page tables so they can refer to pages on remote nodes. Then, by sending execution threads to the processors where the memory is found, we avoid many of the complications of DSM.

20 points: Accurate summary -- highlights what the contributions are, gets technical details correct.
15 points: Either vague or minor flaws in description.
10 points: A worthy effort, shows they read the paper, but not quite on target.

==+== D. Paper strengths
# List a few (3-5) bullets outlining the strengths of this paper

* Tackles a real problem -- designing applications that can scale seamlessly is challenging.
* The small experiments in section 2 do a nice job of demonstrating the overheads and feasibility of the approach before diving into technical details.
* The overall approach is elegantly simple.

5 points per strength, up to 15 points

==+== E. Paper weaknesses
# List a few (3-5) bullets outlining the weaknesses of this paper

* I would have liked a bit more technical detail about what is involved in the remote invocation.
* Even though this is a HotOS paper, since the authors do have some prototype implementation, I would have liked to see a bit more demonstration of how changing some of the thresholds (e.g., when to pull and when to push) affects behavior.
* I would have liked a few more words about how stretching is different from process migration.
* I would have liked a few more details about how you migrate page tables during a stretch.

5 points per weakness, up to 15 points

==+== F. Comments for author
# Provide a more detailed discussion of this paper -- highlighting
# things you thought were confusing, things you particularly liked
# about the paper and things you did not particularly like.

I would have liked more discussion about how you move page tables when you stretch a process to a new machine. Is the local machine represented in the page table's auxiliary structure the same way that a remote machine is, so that most entries are valid? You don't mention what a page table entry looks like for a page that is remote but has been swapped to disk (because in most cases you don't know), but when you first migrate a thread, you may, in fact, have pages on disk on the local node; how is this handled?

I think I know what you mean by "putting complexity in user-space" ... "increases security," but the way it's phrased feels wrong. I think avoiding putting complexity in the kernel largely improves security, but that's slightly different from adding complexity in user space (avoiding complexity altogether is a better strategy). What are the security implications of allowing the migration to take place at user level? It seems that somehow you're giving the management process access to page tables?

How do you decide when to shrink a process?

20 points here: 20/15/10/0, largely as above

==+== G. Comments for PC (hidden from authors)
# If there are things you wish to tell the program committee, but that
# you do not wish the authors to see, write it here.

==+== End Review