- Series
- Combinatorics Seminar
- Time
- Friday, October 16, 2009 - 3:05pm for 1.5 hours (actually 80 minutes)
- Location
- Skiles 255
- Speaker
- Nina Balcan – Computing Science &amp; Systems, Georgia Tech
- Organizer
- Prasad Tetali

There has been substantial work on approximation algorithms for clustering
data under distance-based objective functions such as k-median, k-means, and
min-sum objectives. This work is fueled in part by the hope that
approximating these objectives well will indeed yield more accurate
solutions. That is, for problems such as clustering proteins by function, or
clustering images by subject, there is some unknown correct "target"
clustering and the implicit assumption is that clusterings that are
approximately optimal in terms of these distance-based measures are also
approximately correct in terms of error with respect to the target. In this
work we show that if we make this implicit assumption explicit -- that is, if
we assume that any c-approximation to the given clustering objective Phi is
epsilon-close to the target -- then we can produce clusterings that are
O(epsilon)-close to the target, even for values c for which obtaining a
c-approximation is NP-hard. In particular, for the k-median, k-means, and
min-sum objectives, we show that we can achieve this guarantee for any
constant c > 1.
Our results show how by explicitly considering the alignment between the
objective function used and the true underlying clustering goals, one can
bypass computational barriers and perform as if these objectives were
computationally substantially easier.
This talk is based on joint work with Avrim Blum and Anupam Gupta (SODA
2009), Mark Braverman (COLT 2009), and Heiko Roeglin and Shang-Hua Teng (ALT 2009).