Harvard Theory of Computation Seminar
Learning from Partial Observations (Information Recovery through Learning)
Loizos Michael, Harvard University
Place and Time: Monday, November 20, Maxwell Dworkin 319
Refreshments at 2:30pm, talk at 2:45pm
ABSTRACT
Consider the task of predicting missing entries in a medical database, given
the information that is already available. How does one go about making such
predictions, and what kind of guarantees might one provide on the accuracy
of these predictions? The problem with which one is faced here is that of
missing information in data, an arguably universal and multidisciplinary
problem. Standard statistical techniques fail to provide a formal treatment
for the general case of this problem, where the fact that information is
missing might be arbitrarily correlated with the actual value of the missing
information (e.g., patients exhibiting a certain symptom might be less
inclined to disclose this fact).
In this work we present a general machine learning framework for modelling
the phenomenon of missing information in data. We propose a masking process
model to capture the stochastic nature of information loss. Learning in this
context is employed as a means to recover as much of the missing information
as is recoverable. We extend the Probably Approximately Correct semantics to
the case of learning from partial observations with *arbitrarily* hidden
attributes. We establish that simply requiring learned hypotheses to be
consistent with observed values suffices to guarantee that hidden values are
recoverable to a certain accuracy; we also show that, in some sense, this is
an optimal strategy for achieving accurate recovery. We then establish that
a number of natural concept classes, including all the classes of monotone
formulas that are PAC learnable by monotone formulas, and the classes of
conjunctions, disjunctions, k-CNF, k-DNF, and linear thresholds, are
consistently learnable from partial observations. We finally show that the
concept classes of parities and monotone term 1-decision lists are not
properly consistently learnable from partial observations, if RP =/= NP.
This implies a separation of what is consistently learnable from partial
observations versus what is learnable in the complete or noisy setting.