Friday, September 22, 2017 - 3:00pm
1 hour (actually 50 minutes)
Clash with "The IDEaS Seminar Series": the talk by Ravi Kannan at 3pm on "Topic Modeling: Proof to Practice" might be of interest (Location: TSRB Auditorium) -- Topic Modeling is used in a variety of contexts. This talk will outline from first principles the problem and the well-known Latent Dirichlet Allocation (LDA) model before moving to the main focus of the talk: recent algorithms to solve the model-learning problem with provable worst-case error and time guarantees. We present a new algorithm which enjoys both provable guarantees and the performance to scale to corpora with billions of words on a single box. Besides corpus size, a second challenge is the growth in the number of topics. We address this with a new model in which topics lie on low-dimensional faces of the topic simplex rather than just vertices.