Computer Science 161

Midterm Exam

March 10, 2015

82 minutes/82 points

Instructions

The points allocated to each problem correspond to how much time we think the problem should take.

Please send your answers to me (margo@eecs.harvard.edu) in a PLAIN TEXT file named your-lastname.txt as an attachment - with the string "your-lastname" replaced with your last name. Please do not put your name at the top of your exam (as you'll see below I would like your name at the bottom). In a perfect world, your text file will not have lines longer than 80 characters, but I'll reformat it if it does, albeit sadly. Label the answers clearly so it is obvious what answers correspond to what questions. Please place your answers in order in the file. When you have completed the exam, please email it to me (margo). This exam is open text book, open source code, open notes, and open course web site. You may not use computers (cell phones, PDAs, or any other device that can talk to a network) during the exam (other than to record your answers, to view OS/161 or System/161 code or documentation, or to examine your notes or the course web site). You must complete the exam on your own, without assistance.

Exam Policy Statement

At the end of your exam, assuming you have followed the rules of the exam, please write the following statement and then place your name after it, asserting that you have followed the rules of the exam. "I have neither given nor received help on this exam, and I have not used any disallowed materials in the completion of this exam."

1. True False (20 points)

Answer true if the statement is always true; otherwise answer false. If you feel that there is ambiguity and that affects your answer, feel free to explain your logic.

A uniprocessor with only single-threaded processes does not need synchronization. False: Processes can still share resources (e.g., memory) and need to synchronize and the OS still needs to synchronize with devices.
Threads waiting on a CV will be granted the CV lock on a FIFO basis. False: There is no guarantee on how threads waiting on a CV will be scheduled.
If a processor supports virtual memory, then it must have a TLB. False: you could have a processor that knows how to traverse page tables in HW and then you might not have a TLB.
A drum served the same purpose as modern day harddrives. True
The job of the operating system has always been to allow the hardware to be used as efficiently as possible. False: sometime we sacrifice hardware efficiency to make people more efficient (think timesharing).
Operating systems as an area of intellectual investigation arose in the 60's when it was an open question whether one could construct a layer of software that allowed hardware to be used efficiently. True
Most of the types of problems identified in early operating systems have been solved today. False: It is surprising how many of the problems we are trying to solve today are identical to those from the 60's!
Loadable kernel modules are the modern-day solution to extensible operating systems, providing all the features that were desired by the extensible OS community. False: They are, in fact, the solution adopted; while they solve extensibility they don't provide the safety properties researchers were seeking -- turns out people didn't care so much.
The transition from user mode to supervisor mode requires hardware support. True
It is possible to handle page faults without hardware support. We gave credit for both T/F -- while you can do it using two processors instead of explicit page fault support, the second processor is, in fact, hardware

2. Short Answer & Multiple Choice (24 points)

Answer the following questions in as few sentences as necessary.

(5 points) On a trap, what information must the kernel save or figure out.
- Must know location of the trap (PC)
- Must know address for a memory trap (Address)
- Must know if you were in kernel mode
- Must know the state of interrups
- Must know what kind of trap was encountered
- Save user registers
- Find kernel stack
(5 points) You have been asked to design a TLB for a new processor. Your manager would like to understand the tradeoffs involved in deciding whether to implement a 2-way set associative TLB or a 16-way set associative TLB. What do you tell him/her? Which would you advise selecting for the design. (Assume that both TLBs will hold exactly the same number of mappings.)
- A 2-way set associative TLB:
  - is going to require a lot less hardware.
  - is probably going to allow a shorter cycle time
- A 16-way set associate TLB:
  - will probably be able to avoid thrashing under all circumstances
  - with 16 entries, the replacement policy might actually matter, while with only 2 entries it probably doesn't.
Unless they have a good reason, 16 seems like overkill
(5 points) Jack and Jill are arguing about page replacement schemes. One of them claims that MRU -- replacing the most recently used page would be a monumentally stupid replacement policy. The other argues that if you focus only on data pages, then there are workloads for which it would be optimal. What do you think?
If you are sequentially accessing something larger than memory, MRU produces ideal replacement.
(5 points) You are writing a C program and you've created a nice clean design with a reasonably short main program and a small library implemented in a separate file. You want functions in both files to share a common, global debug parameter, so you put:
```
extern int debug;
```
in both files. Each file compiles cleanly by itself, but when you go to build your program, you get the error:
```
Undefined symbols for architecture x86_64:
  "_debug", referenced from:
        _main in foo.o
```
Why are you getting this error? Explain this first in high level terms and then in lower level detail referring to the various segments in which the debug variable is/is not defined. You have declared that there exists a variable debug, but you never actually allocated space for it (defined it) -- each file thinks that it's going to be allocated somewhere else, but it never is. So, in both main and the library .o file you've got a symbol in the cross references section for debug, but when you link the two files together, you never find a reference to it in the symbol table.
(2 points) Which is true of a modern operating system kernel (select only one):
1. The kernel code is always executing.
2. The kernel only stores resource state, it does not execute code.
3. The kernel code executes when the kernel process is scheduled.
4. The kernel code executes in response to hardware interrupts and system calls.
5. None of the above
(2 points) You have a bunch of CS161 students who are frantically writing code and periodically compiling it while the CS205 students are running a bunch of long simulations. Which scheduling algorithm would be best for the workload described? (select only one):
1. FCFS
2. Lottery Scheduling
3. MLFQ
4. Round Robin

3. Synchronization (20 points)

For each synchronization problem described below, select the best synchronization primitives to solve the problem. You may use each primitive as many times as you need to. Briefly (1-2 sentences), explain why you chose the primitive you did. Select the synchronization primitive from the following list:

Counting semaphore
Binary semaphore
Lock (w/out a CV)
Lock and condition variable

A common problem in soccer collisions between players jumping for headers (this often leads to concussions). They've decided that clever CS161 students could easily solve this problem with a synchronization primitive that would arbitrate which player got to jump up for the header (such primitives would have to be very high performance). Which primitive would you suggest?
Lock -- this is classic mutual exclusion and you want the same entity to unlock the resource that locked it, so a lock is better than a binary sempahore.
Professor Seltzer was making pancakes for hungry teenagers. The pancakes come out in batches of three. She'd like a synchronization primitive that would allocate pancakes to teenagers without fighting. (There is no need to worry about leftovers between batches -- all pancakes are consumed pretty much instantly.) Suggestions?
Counting semaphore: we just need to let enough teenagers through to match the number of pancakes.
You are competing in a new Olympic event called a distributed relay. One team member has to run one lap at the Harvard track. The next team member must run a lap at the Yale track. The third team member runs a lap at Brown track and the last team member runs a lap at the Columbia track. Each runner must not start until the previous runner has completed his/her lap. Natually, the traditional passing of the baton or slapping of the hands won't work, so they've turned to you, as a savvy CS161 student, to provide the proper synchronization. What mechanism do you use?
Binary semaphore (at each handoff site) -- this is a classic case of using a binary semaphore to schedule. The problem with a lock and CV is that it is inefficient -- if you use 1 CV, then you have to broadcast to make sure that runners run at the right time; if you use multiple CVs then you're using a much heavier weight mechanism (lock+CV) when a simpler one (single semaphore) would work.
Competition for parking in Cambridge has become brutal due to the massive quantities of snow accumulating on our streets. The traditional use of space-savers (things like cones or lawn chairs that mark a spot as being "owned" because someone invested the physical labor in shoveling it out) has become too contentious and the Cambridge Department of Public Works is looking for a better solution. They've decided to have neighborhoods hand out parking tokens and any car without a visible token will be towed to the far reaches of the universe. Each neighborhood has a token checkin point and cars must drive to the checkin point to obtain a token for a specific spot and return a token when they leave a spot (cars that do not return tokens within seven minutes of leaving a spot are subject to enormous fines). They would like the checkin points to operate as efficiently as possible, allocating and deallocating spots in parallel as much as possible. What synchronization primitive(s) should they use?
You need a lock and CV here (one pair for each neighborhood) -- you need to allocate specific parking spots, so you have some shared state that tells you what spots are available and you need a blocking mechanisms in case a car arrives when all the spots are full.
It is well known that graduate students (as well as many others) are motivated by free food. After the first couple of years (once classes are complete), a graduate student's day consists mostly of doing research and periodically checking email to determine if anyone has announced any free food lately. Unfortunately, the better research is going, the less likely students are to check email, and by the time they get to the location of the free food, it's often gone. (While they could configure their mail to alert them every time a new message comes in, most messages do not concern free food, so this would be quite distracting.) There must be a better way -- can you propose use of a synchronization primitive that would let them receive/see messages only when they concerned free food?
This is another Lock + CV problem. Students want to block (do research) until food becomes available.

4. Virtual Memory (18 points)

Open RISC is a specification for a family of open-source, synthesizable RISC microprocessor cores. The OpenRISC 1000 MMU supports:

Both 32-bit and 64-bit address spaces.
Multiple page sizes.
Hardware support for page tables.
Context IDs (CID) that allow multiple processes to be translated by the hardware without changing hardware registers.

32-bit address translation takes place as follows (depicted below):

We form a 36-bit address by prepending the process's address with a four bit context ID.
The top 23 bits of the 36-bit address form the virtual page number.
The low 13 bits are the page offset.
The CID and level 1 page index are used to index into an L1 Page Directory.
The physical page number from the L1 Page Directory is added to the level 2 page index to form an index into the L2 Page Table.
The physical page number from the L2 PTE is concatenated with the page offset to form a phyiscal address

A page table entry is as shown:

where:

Bits 0-3 indicate cache behavior (you can ignore them)
Bit 4 is an accessed (use) bit.
Bit 5 is a dirty bit.
Bits 6-8 indicate the page protection.
Bit 9 is a link bit and can be ignored for now.
Bits 10-31 are the physical page number.

(2 points) What is the smallest page this architecture supports? 8K, 2^13
(2 points) The hardware allows for treating this as a segmented architecture by ignoring the L2 page tables. In such a configuration, how large are segments? 2^24 = 16 MB
(2 points) What is the largest physical address you can construct using segments? If you were using segments, then you could have a 22-bit page number from the TLB attached to a 24-bit offset for a 46 bit physical address.
(4 points) There are only 4 bits of Context ID -- is it possible to run more than 16 processes on this architecture? If so, how?
Yes -- you would have to
1. select a process whose CID you would reassign.
2. Save it's L1 page table somewhere
3. Copy a new process's L1 page table into place
(4 points) Although OpenRISC provides hardware support for page tables, it also allows software management. Name one thing that you could do using software management that you cannot do in the hardware.
1. You would not need to layout the 16 "in-context" process L1 tables contiguously.
2. You could have different sizes for the two levels of page table -- if you wanted only a few segments and then wanted to partially use them, that might let you use memory more efficiently.
3. You would not need to fully populate any page table you were using (because you could maintain a bounds).
4. If the architecture supported it, you could throw away some of the cache bits and have larger physical addresses.
(4 points) Knowing only what you know about this architecture, how would you propose dealing with the operating system's memory management? Would you run it mapped? Unmapped? Why? Since the instructions didn't say anything about address ranges that specify mapped versus unmapped translations (as the MIPS has), I would probably just allocate one context ID for the kernel (CID = 0) and run it mapped. When I started up, it's possible that I might have to allocate specific regions of physical memory to interact with hardware devices.