Harvard Theory of Computation Seminar

Sample Complexity Bounds for Link Prediction

David Choi Harvard University



Place and Time: Monday, March 30th, 2009, Piece Hall 320, 29 Oxford Street

Refreshments at 2:30pm, talk at 2:45

ABSTRACT

Link prediction is an example of recent activity in the machine learning community on relational data sets, with very visible applications such as collaborative recommendation systems (as used by companies such as Netflix and Amazon). However, the problem differs from traditional machine learning topics in some basic ways. As a result, it is unclear how the problem should be formally posed, or if a guarantee of performance theoretically exists akin to the standard learnability results that are foundational to machine learning.

 

We show that under slightly stronger assumptions than those usually posed, edge prediction can be formulated in such a way that sample complexity bounds (using the concept of VC-dimension) apply to the problem. We then examine how the theoretical result applies to some models and algorithms currently used in the field. Interestingly, results from extremal graph theory prove handy in analyzing the VC-dimension of various graph families.