Computer Science 286r
Homework 4
Due 20 Mar 2008, 11:59 pm

This is a short homework that will serve as a warm-up exercise for our 
project discussion on Friday and bring together the concepts we have 
studied in our unit on machine learning / artificial intelligence tools.

There is no presentation due for this homework.


Part 1:
Understanding tools

SVM: Support Vector Machine
ANN: Artificial Neural Net
BN:  Bayesian Network
HMM: Hidden Markov Model
DBN: Dynamic Bayesian Network
KF:  Kalman Filter
GM:  GARCH models

Using class readings and other resources of your choice, write a
1-paragraph description of N of the above 7 tools, where N is 
the number of members of your group.  Describe the data 
that are inputs and outputs.  Are they continuous?  Discrete?
How does the tool deal with incomplete data?  Can the tool be 
useful in teaching us about the underlying phenomena, or is it
a "black box"?
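
As an illustration of the incomplete-data question only, here is a
minimal sketch of how one of the listed tools, a Kalman filter, can
carry on through a missing observation: it runs the predict step and
skips the update.  The scalar random-walk model, the function name
kalman_1d, and all parameter values are assumptions made for this
example; they do not come from the class readings.

```python
def kalman_1d(observations, process_var=1e-2, obs_var=1.0, x0=0.0, p0=1.0):
    """Scalar random-walk Kalman filter (illustrative assumption).

    `observations` is a list of floats; None marks a missing value.
    Returns a list of (state estimate, estimate variance) pairs.
    """
    x, p = x0, p0
    estimates = []
    for z in observations:
        # Predict: under a random walk the mean carries over
        # and the uncertainty grows by the process variance.
        p = p + process_var
        if z is not None:
            # Update: blend prediction and observation via the Kalman gain.
            k = p / (p + obs_var)
            x = x + k * (z - x)
            p = (1.0 - k) * p
        # If z is None there is no update, so only the variance has changed.
        estimates.append((x, p))
    return estimates


if __name__ == "__main__":
    for x, p in kalman_1d([1.0, None, 1.2]):
        print(f"estimate={x:.4f} variance={p:.4f}")
```

In this sketch a gap leaves the estimate unchanged and widens its
variance, which is one concrete sense in which a tool can "deal with"
incomplete data.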


Part 2:
Applying the right tool

Consider the following financial applications of (some of) the above tools:
a) Take a closed-end equities-only mutual fund with published top 10 holdings.
   Is there an arbitrage opportunity between the fund and these ten equities?

b) Describe two or more related industrial/economic sectors, then examine 
   whether there is an arbitrage opportunity between them.  For example, do 
   "Auto Parts" manufacturers fare better after declines in 
   "Auto Manufacturers - Major"?  Yahoo! Finance has an industrial sector
   browser at http://biz.yahoo.com/p/ .

c) Model the relationship between key macroeconomic variables such as
   unemployment, inflation, the Fed funds rate, GDP growth, USD versus
   major currencies, and the price (in USD) of major asset classes, such
   as commodities (oil, gold, and corn) or equity indices (Dow, S&P).

d) Model the relationship between very illiquid markets and liquid
   markets, and between markets with varying degrees of price
   transparency, i.e. markets where price information is available on
   different time scales.  You could look at two equities in the same
   sector issued by companies of very different size (one thinly
   traded, one liquid).
   Coal is a more interesting example in that it is clearly
   a huge market, but 85-90% of the market in the US takes the form
   of bilateral contracts, which have very opaque pricing.  Data on
   average prices in the bilateral market tend to be on the order of
   weekly or even monthly. The majority of the remaining market is
   over-the-counter (OTC), which has more transparent prices, but 
   limited liquidity: depending on the type of coal, a large power
   plant would have trouble buying all of its coal OTC without moving 
   the market. Data on prices in the OTC market tend to be daily. 
   NYMEX also has 3 types of coal futures based on different types 
   of coal, which are clearly transparent but are very illiquid.  
   Days can pass without any trading.

e) Detect classic "technical analysis" indicators, such as the
   "fat candlestick", "head and shoulders", "triple bottom", etc.,
   in a price time series.  Validate your results by collecting data
   from a technical analysis website that publishes when these signals
   occur.

f) Given a system that can detect such technical analysis indicators,
   determine 1) whether the presence of such an indicator predicts the
   event chartists claim it predicts, and 2) whether such events 
   precede unusual market activity.
   Then, using the same or another tool (explain your choice), 
   decide whether there is evidence to claim that the TA indicator
   "causes" the unusual market activity.


Select N of these applications (where N is the number of members of your
group) and, for each, explain which of the tools in Part 1 would be most
appropriate for the task and why.  Then name one tool that would be
definitely inappropriate for the same application and explain why.
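
Before settling on a tool, it can help to look at the data directly.
The sketch below is one possible exploratory check for the lead-lag
question in application (b): the correlation between one series'
returns and another's returns a few periods later.  It is not one of
the Part 1 tools, and the function names simple_returns and lagged_corr
are hypothetical, chosen only for this example.

```python
import math


def simple_returns(prices):
    """Period-over-period simple returns from a list of prices."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def lagged_corr(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag], for lag >= 0."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    # Pair each x value with the y value `lag` periods later.
    pairs = list(zip(x[:len(x) - lag], y[lag:]))
    n = len(pairs)
    if n < 2:
        return float("nan")
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    cov = sum((a - mx) * (b - my) for a, b in pairs)
    vx = sum((a - mx) ** 2 for a, _ in pairs)
    vy = sum((b - my) ** 2 for _, b in pairs)
    if vx == 0 or vy == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    # Made-up numbers for illustration only: y repeats x one period later.
    x = [0.1, -0.2, 0.3, 0.05, -0.1, 0.2]
    y = [0.0] + x[:-1]
    for lag in range(3):
        print(lag, lagged_corr(x, y, lag))
```

A strong correlation at a nonzero lag is only a hint that one sector
may lead another; it says nothing about causation or about whether the
effect survives trading costs.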