Computer Science 286r
Homework 4
Due 20 Mar 2008, 11:59 pm

This is a short homework that will serve as a warm-up exercise for our project discussion on Friday and bring together the concepts we have studied in our unit on machine learning / artificial intelligence tools. There is no presentation due for this homework.

Part 1: Understanding tools

SVM: Support Vector Machine
ANN: Artificial Neural Net
BN: Bayesian Network
HMM: Hidden Markov Model
DBN: Dynamic Bayesian Network
KF: Kalman Filter
GM: GARCH models

Using class readings and other resources of your choice, write a 1-paragraph description of N of the above 7 tools, where N is the number of members of your group. Describe the data that are inputs and outputs. Are they continuous? Discrete? How does the tool deal with incomplete data? Can the tool be useful in teaching us about the underlying phenomena, or is it a "black box"?

Part 2: Applying the right tool

Consider the following financial applications of (some of) the above tools:

a) Take a closed-end equities-only mutual fund with published top 10 holdings. Is there an arbitrage opportunity between the fund and these ten equities?

b) Describe two or more related industrial/economic sectors, then examine whether there is an arbitrage opportunity between them. For example, do "Auto Parts" manufacturers fare better after declines in "Auto Manufacturers - Major"? Yahoo! Finance has an industrial sector browser at http://biz.yahoo.com/p/ .

c) Model the relationship between key macroeconomic variables, such as unemployment, inflation, the Fed funds rate, GDP growth, and USD versus major currencies, and the prices (in USD) of major asset classes, such as commodities (oil, gold, and corn) or equity indices (Dow, S&P).

d) Model the relationship between very illiquid markets and liquid markets, and also between markets that have varying degrees of price transparency, i.e. where price information is available on different time scales.
You could look at two equities in the same sector from companies of very different sizes (one thinly traded, one liquid). Coal is a more interesting example in that it is clearly a huge market, but 85-90% of the market in the US takes the form of bilateral contracts, which have very opaque pricing. Data on average prices in the bilateral market tend to be available only weekly or even monthly. The majority of the remaining market is over-the-counter (OTC), which has more transparent prices but limited liquidity: depending on the type of coal, a large power plant would have trouble buying all of its coal OTC without moving the market. Data on prices in the OTC market tend to be daily. NYMEX also has 3 types of coal futures based on different types of coal, which are clearly transparent but very illiquid. Days can pass without any trading.

e) Detect classic "technical analysis" indicators, such as the "fat candlestick", "head and shoulders", "triple bottom", etc., in a price time series. Validate your results by collecting data from a technical analysis website that publishes when these signals occur.

f) Given a system that can detect such technical analysis indicators, determine 1) whether the presence of such an indicator predicts the event chartists claim it predicts, and 2) whether such events precede unusual market activity. Then, using the same or another tool (explain your choice), decide whether there is evidence to claim that the TA indicator "causes" the unusual market activity.

Select N of these applications (where N is the number of members of your group) and, for each, explain which of the tools in Part 1 would be most appropriate for the task and why. Then name one tool that would be definitely inappropriate for the same application and explain why.
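
As an illustration only, the lead-lag question in application b) ("do Auto Parts fare better after declines in Auto Manufacturers?") can be explored with a simple lagged correlation before committing to one of the Part 1 tools. The Python sketch below is not one of those tools and is not a required method. The function name lagged_correlation and both return series are hypothetical: the numbers are synthetic toy values constructed so that the second series mirrors the first one period later, not real market data.

```python
import math
from statistics import mean


def lagged_correlation(leader, follower, lag):
    """Pearson correlation between leader[t] and follower[t + lag]."""
    if len(leader) != len(follower):
        raise ValueError("series must have the same length")
    if not 0 <= lag < len(leader) - 1:
        raise ValueError("lag must leave at least two overlapping points")

    # Align the two series: drop the last `lag` points of the leader
    # and the first `lag` points of the follower.
    x = leader[:len(leader) - lag]
    y = follower[lag:]

    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Synthetic toy returns (assumption, not market data): the "parts" series
# is the negated "makers" series shifted forward by one period.
auto_makers = [0.01, -0.03, 0.02, -0.01, 0.04, -0.02]
auto_parts = [0.00, -0.01, 0.03, -0.02, 0.01, -0.04]

result = lagged_correlation(auto_makers, auto_parts, lag=1)
print(f"lag-1 correlation: {result:.3f}")
```

A strongly negative value at a positive lag would be consistent with the second sector rising after the first declines, but it says nothing about whether the effect survives transaction costs or holds out of sample, which is where the choice of tool matters.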