A collapsed Gibbs sampling LDA implementation

15 Oct 2016

Topic Model Implementation

1. Introduction

2. LDA

LDA is a generative probabilistic model for collections of grouped discrete data. Each group is described as a random mixture over a set of latent topics, where each topic is a discrete distribution over the collection’s vocabulary.

Corpus: a collection of documents

Data: words

The generative process for a document collection D under the LDA model:
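A minimal sketch of the standard LDA generative process, written as sampling code. It assumes symmetric Dirichlet priors with scalar hyperparameters `alpha` and `beta`, and a fixed document length, which is a simplification. The function names `sample_dirichlet` and `generate_corpus` and all parameter names are hypothetical and chosen only for illustration.

```python
import random


def sample_dirichlet(rng, concentration, dim):
    """Draw from a symmetric Dirichlet by normalizing Gamma(concentration, 1) draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [x / total for x in draws]


def generate_corpus(num_docs, num_topics, vocab_size, doc_length, alpha, beta, seed=0):
    """Sample a synthetic corpus from the LDA generative process (illustrative sketch)."""
    rng = random.Random(seed)

    # 1. For each topic k, draw a word distribution phi_k ~ Dirichlet(beta).
    phi = [sample_dirichlet(rng, beta, vocab_size) for _ in range(num_topics)]

    theta, z, w = [], [], []
    for _ in range(num_docs):
        # 2. For each document d, draw a topic mixture theta_d ~ Dirichlet(alpha).
        theta_d = sample_dirichlet(rng, alpha, num_topics)

        # 3. For each word position, draw a topic z ~ Categorical(theta_d) ...
        z_d = rng.choices(range(num_topics), weights=theta_d, k=doc_length)

        # ... then draw the word w ~ Categorical(phi_z) from that topic.
        w_d = [rng.choices(range(vocab_size), weights=phi[k])[0] for k in z_d]

        theta.append(theta_d)
        z.append(z_d)
        w.append(w_d)

    return w, z, theta, phi
```

For example, `generate_corpus(num_docs=5, num_topics=3, vocab_size=20, doc_length=10, alpha=0.5, beta=0.5)` returns word ids, topic assignments, and the sampled `theta` and `phi`. Words are represented as integer ids in `range(vocab_size)`.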

The generative process described above results in the following joint distribution:

$$p(w, z, \theta, \phi \mid \alpha, \beta) = p(\phi \mid \beta)\, p(\theta \mid \alpha)\, p(z \mid \theta)\, p(w \mid \phi, z)$$

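Written out per topic, document, and word, the joint distribution factorizes as sketched below. The indexing is an assumption, since the text does not define it: $K$ topics, $D$ documents, and $N_d$ words in document $d$, with $z_{d,n}$ the topic assigned to the $n$-th word $w_{d,n}$ of document $d$.

$$p(w, z, \theta, \phi \mid \alpha, \beta) = \prod_{k=1}^{K} p(\phi_k \mid \beta) \prod_{d=1}^{D} \left[ p(\theta_d \mid \alpha) \prod_{n=1}^{N_d} p(z_{d,n} \mid \theta_d)\, p(w_{d,n} \mid \phi_{z_{d,n}}) \right]$$

Here $p(\phi_k \mid \beta)$ and $p(\theta_d \mid \alpha)$ are Dirichlet densities, while $p(z_{d,n} \mid \theta_d) = \theta_{d, z_{d,n}}$ and $p(w_{d,n} \mid \phi_{z_{d,n}}) = \phi_{z_{d,n}, w_{d,n}}$ are categorical probabilities.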
