# Talk:Dirichlet process

WikiProject Statistics (Rated C-class, Mid-importance)

This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.


WikiProject Mathematics (Rated C-class, Low-importance)
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Field: Probability and statistics

## Formal definition

Is it correct that ${\displaystyle X(B_{k})}$ means ${\displaystyle P(x\in B_{k};x\sim X)}$, i.e., the probability that a random variable ${\displaystyle x}$ drawn from the distribution ${\displaystyle X}$ falls within the partition element ${\displaystyle B_{k}}$? If so, it may help to state this in the main text. Without it, other parts are hard to follow, so I think this is of high priority.
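For what it's worth, the answer to the question above is yes: a draw ${\displaystyle X}$ from a Dirichlet process is itself a random probability measure, and ${\displaystyle X(B_{k})}$ is the mass it assigns to the set ${\displaystyle B_{k}}$. A minimal sketch of evaluating such a random measure on a set, via truncated stick-breaking (the function names, the truncation level, and the uniform base measure are illustrative choices, not from the article):

```python
import random

def truncated_dp_sample(alpha, base_sampler, trunc=1000):
    """Approximate draw X ~ DP(alpha, H) via truncated stick-breaking.
    X is the discrete measure sum_k weights[k] * delta_{atoms[k]}."""
    atoms, weights, remaining = [], [], 1.0
    for _ in range(trunc):
        beta = random.betavariate(1.0, alpha)  # stick-break proportion
        weights.append(remaining * beta)
        atoms.append(base_sampler())           # atom location drawn from H
        remaining *= 1.0 - beta
    return atoms, weights

def measure_of(atoms, weights, indicator):
    """X(B): total mass the random measure assigns to the set B."""
    return sum(w for a, w in zip(atoms, weights) if indicator(a))

random.seed(0)
atoms, weights = truncated_dp_sample(alpha=2.0, base_sampler=random.random)
xB = measure_of(atoms, weights, lambda a: a < 0.5)  # X([0, 0.5))
```

Since the truncated weights sum to essentially 1, `xB` behaves like the probability of landing in the set under one fixed realization of ${\displaystyle X}$, which is exactly the reading proposed above.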

## Missing clear description of the optimization step

The main page explains how to create the initial assignment of cluster members (for example, using the Chinese restaurant process) but leaves out a clear description of how to update the cluster assignments to obtain meaningful clusters. The assignment algorithms initially assign members to clusters without regard to the members' properties/features. From my discussions here (http://metaoptimize.com/qa/questions/10731/dirichlet-process-basic-intuition) it was explained that there is a step (using MCMC/Gibbs sampling) which moves the documents around until the clusters are stable, and that this step is key to understanding how DP produces non-random results. Clearly, a detailed explanation of MCMC (and the alternatives) belongs in its own wikipages, but the DP page needs to make it clear that this optimization step is key (without it, DP seems like bad magic). — Preceding unsigned comment added by Swframe (talk • contribs) 18:34, 3 August 2012 (UTC)
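To make the missing step concrete: one common formulation of that "moving documents around" is collapsed Gibbs sampling over CRP cluster assignments. Here is a hedged sketch under stated assumptions — the function names are my own, and the toy Gaussian score is a stand-in for a proper posterior predictive, not the real thing:

```python
import math, random

def gibbs_sweep(data, z, alpha, loglik):
    """One Gibbs sweep: reassign each point given all the others.
    loglik(x, members) scores x against an existing cluster's members
    (a stand-in for the posterior predictive density)."""
    for i, x in enumerate(data):
        z[i] = None  # remove point i from its cluster
        clusters = {}
        for j, k in enumerate(z):
            if k is not None:
                clusters.setdefault(k, []).append(data[j])
        # CRP prior times likelihood, for each existing cluster and a new one
        options, logw = [], []
        for k, members in clusters.items():
            options.append(k)
            logw.append(math.log(len(members)) + loglik(x, members))
        options.append(max(clusters, default=-1) + 1)   # fresh label
        logw.append(math.log(alpha) + loglik(x, []))
        # sample an assignment from the normalized weights
        m = max(logw)
        w = [math.exp(v - m) for v in logw]
        r, acc = random.uniform(0, sum(w)), 0.0
        for k, wk in zip(options, w):
            acc += wk
            if r <= acc:
                z[i] = k
                break
        if z[i] is None:  # guard against float round-off at the boundary
            z[i] = options[-1]
    return z

def toy_loglik(x, members):
    """Illustrative score: unit-variance Gaussian around the cluster mean,
    standard normal for an empty (new) cluster."""
    mu = sum(members) / len(members) if members else 0.0
    return -0.5 * (x - mu) ** 2

random.seed(1)
data = [0.0, 0.1, -0.1, 5.0, 5.2, 4.9]
z = list(range(len(data)))  # start with every point in its own cluster
for _ in range(20):
    z = gibbs_sweep(data, z, alpha=0.5, loglik=toy_loglik)
```

Repeated sweeps merge nearby points into shared tables, which is exactly the "optimization step" the comment above says the article omits: the CRP gives the prior, and the data pull the assignments away from randomness.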

## Stick-breaking Construction

The ${\displaystyle \delta }$ in the formula is undefined. Anyone know what it is? Took 04:49, 31 October 2007 (UTC)

I've clarified that. It's the Dirac delta function. It is a function that integrates to 1 when it is evaluated on an argument equal to its index. This is just a mechanism to say that the summation will be ${\displaystyle \beta _{k}}$ whenever ${\displaystyle \theta }$ is equal to ${\displaystyle \theta _{k}}$. Rodrigo de Salvo Braz (talk) 06:10, 4 March 2009 (UTC)

Shouldn't ${\displaystyle \delta _{\theta _{k}}}$ be the Measure as opposed to the Delta Function? —Preceding unsigned comment added by 137.111.13.200 (talk) 06:02, 4 May 2011 (UTC)

Yes, it should be the Dirac measure, which doesn't integrate to 1 but rather equals 1 whenever its index lies in the argument set. Another term for its use here would be indicator function. Corrected. --Ingmar Schuster 12:23, 18 April 2013 (UTC)
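To pin down the resolved point for later readers (these are standard stick-breaking facts, not text from the article): with weights ${\displaystyle \beta _{k}}$ and atoms ${\displaystyle \theta _{k}}$, the draw and the role of the Dirac measure read

```latex
G = \sum_{k=1}^{\infty} \beta_k \, \delta_{\theta_k},
\qquad
\delta_{\theta}(A) =
\begin{cases}
1 & \text{if } \theta \in A,\\
0 & \text{otherwise,}
\end{cases}
\qquad\text{so}\qquad
G(A) = \sum_{k \,:\, \theta_k \in A} \beta_k .
```

That is, evaluating ${\displaystyle G}$ on a set ${\displaystyle A}$ just adds up the weights of the atoms that fall in ${\displaystyle A}$, which is the "indicator function" reading above.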

## Chinese restaurant process

What exactly is the relationship between the chinese restaurant process and the Dirichlet process? The article does not make it clear. Robinh (talk) 07:51, 1 August 2008 (UTC)

### Half a customer

The text with the CRP visualization states: "Additionally, a customer opens a new table with a probability proportional to the scaling parameter \alpha." However, the visualization with \alpha = 1/2 shows the new table as already being present with half a customer sitting at it. That's a very difficult restaurant to imagine and doesn't help the metaphor in any way. For example, rather than 9 customers present, it shows 9.5 total customers. Also, if alpha = 3 and 9 customers entered the restaurant, are there 12 customers in total? Someone should render the video again. Anne van Rossum (talk) 19:53, 21 June 2017 (UTC)

You are right, the half customer could be confusing. The parameter acts as a pseudo-customer, and I was calling half customers "drunken" and thus less attractive, but this gets too complicated. I'll change the animation and record a new video. Ckling (talk) 22:20, 27 June 2017 (UTC)

I'm sorry; if I increase the scaling parameter, it is unlikely that I would see only 4 tables before the tables are hidden. I would have to change my code or try many, many times until I'm lucky, and I don't want to make the animation larger. So for now, I will leave it at 0.5 customers. The code and the commands for recording the video are in the description of the file; help is appreciated. Ckling (talk) 13:24, 8 August 2017 (UTC)

## Stick-breaking Construction possible error

"The smaller α is, more of the stick will be left for subsequent values (on average)."

Shouldn't it be "less of the stick will be left..."? Took (talk) 19:34, 31 March 2009 (UTC)

Seconded: E(\beta_i) is (1 + \alpha)^{-1}, according to my understanding, so as \alpha _decreases_, E(1 - \beta_i) should _decrease_. -- pyeditor

Yes I think you're right, will change the article --mcld (talk) 11:08, 12 March 2010 (UTC)
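A short check of the corrected direction (a standard computation, with ${\displaystyle \beta _{k}}$ drawn as in the stick-breaking construction):

```latex
\beta_k \sim \operatorname{Beta}(1,\alpha)
\;\Rightarrow\;
\mathbb{E}[\beta_k] = \frac{1}{1+\alpha},
\qquad
\mathbb{E}\!\left[\prod_{i=1}^{k}(1-\beta_i)\right]
  = \left(\frac{\alpha}{1+\alpha}\right)^{\!k}.
```

The expected remainder ${\displaystyle (\alpha /(1+\alpha ))^{k}}$ shrinks as ${\displaystyle \alpha }$ decreases, so a smaller ${\displaystyle \alpha }$ leaves less of the stick for subsequent values on average, confirming the proposed fix.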

Regarding errors: the intro formula p(z_i = k |z_{1,\dots,i-1},\alpha,K) = \frac{n_k + \frac{\alpha}{K}}{i-1+\alpha} seems flawed. Shouldn't i be replaced with something like N = \sum_{k=1}^{K} n_k, here and in all the formulas below pertaining to the derivation of the DP as the limit of a Dirichlet-multinomial (DM) distribution? Bamayer (talk) 20:19, 20 November 2012 (UTC)

I see now that i is just the total number of counts given a set of 1-of-K random variables, and equal to N above. Sorry for the confusion; it just looks like i is an arbitrary index and the denominator is a function of that index. Bamayer (talk) 23:11, 20 November 2012 (UTC)
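For later readers hitting the same snag: the resolution is that ${\displaystyle i-1=\sum _{k}n_{k}}$ counts the previous draws. Written out, the standard ${\displaystyle K\to \infty }$ limit of this finite formula recovers the CRP probabilities:

```latex
p(z_i = k \mid z_{1,\dots,i-1}, \alpha, K)
  = \frac{n_k + \alpha/K}{\,i-1+\alpha\,}
\;\xrightarrow{\;K \to \infty\;}\;
\begin{cases}
\dfrac{n_k}{\,i-1+\alpha\,} & \text{for an occupied component } (n_k > 0),\\[6pt]
\dfrac{\alpha}{\,i-1+\alpha\,} & \text{total mass on the unoccupied components.}
\end{cases}
```

The per-component ${\displaystyle \alpha /K}$ vanishes, but summed over the ${\displaystyle K-K^{+}}$ empty components it leaves the familiar ${\displaystyle \alpha /(i-1+\alpha )}$ probability of opening a new table.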

## Context

The phrase

Given a set ${\displaystyle S}$ equipped with a suitable ${\displaystyle \sigma }$-algebra,

does nothing to inform the lay reader that mathematics is what the article is about. It is a terrible phrase to use as the beginning of a Wikipedia article. Michael Hardy (talk) 05:47, 22 May 2009 (UTC)

This page is still wrong. Where is the base distribution? It should be notated X ~ DP(M, P0), where M is the scale parameter and P0 is the base distribution. -- Anon

In regard to the above: simply an alternate parameterization; in the article, M is unnormalized and could be expressed equivalently as P0\times M_\text{norm}, where M_\text{norm} is a normalized measure (i.e., a distribution). It might be worthwhile noting this in the article. -- pyeditor

I am afraid to say that this alternate parametrization is inconsistent with all other literature, and I would go so far as to say, wrong. Distinguishing between the base measure and the concentration parameter is essential in practice, both from an educational point of view and from a usage point of view. When explaining a DP, the concept of it quantising an existing probability distribution is conceptually important, especially when it comes to some of the useful usage scenarios, for instance hierarchical DPs and DP mixture models. Additionally, the DP can be explained as the limit of a Dirichlet distribution going to infinitely many elements, with a prior symmetric Dirichlet distribution whose parameter is directly equivalent to the concentration parameter. The effects on real-world models as the concentration parameter is varied also warrant discussion.

In use, DPs are invariably found in Bayesian models, where the concentration parameter and base measure come from different sources - often the concentration parameter is fixed or has a prior (a Gamma prior is computationally convenient), whilst the base measure is being learnt or ultimately integrated out. In my opinion this article needs a rewrite, though unfortunately I do not have the time right now, so I can only moan about it. -- thaines —Preceding unsigned comment added by Thaines (talk • contribs) 11:48, 6 August 2010 (UTC)

## Inference and applications sections

It would be great if there were a section dedicated to inference and a section with applications - but preferably not in the style of the monster article https://en.wikipedia.org/wiki/Dirichlet-multinomial_distribution. 145.94.110.25 (talk) 11:06, 12 November 2013 (UTC)

## How do you pronounce Dirichlet?

This is super-important. I don't know how to say Dirichlet and I don't want to sound stupid... — Preceding unsigned comment added by 129.6.220.243 (talk) 14:31, 4 March 2016 (UTC)

## Introduction

What am I missing? The introduction describes drawing from a Dirichlet:

1. Draw ${\displaystyle X_{1}}$ from the distribution ${\displaystyle H}$.
2. For ${\displaystyle n>1}$:

a) With probability ${\displaystyle {\frac {\alpha }{\alpha +n-1}}}$ draw ${\displaystyle X_{n}}$ from ${\displaystyle H}$.

b) With probability ${\displaystyle {\frac {n_{x}}{\alpha +n-1}}}$ set ${\displaystyle X_{n}=x}$, where ${\displaystyle n_{x}}$ is the number of previous observations ${\displaystyle X_{j}}$, ${\displaystyle j<n}$, such that ${\displaystyle X_{j}=x}$.

How does ${\displaystyle n_{x}}$ ever increase beyond 0? If ${\displaystyle X_{1}}$ comes from ${\displaystyle H}$ then ${\displaystyle X_{2}}$ must also come from ${\displaystyle H}$ since ${\displaystyle n_{x}}$ is still 0 at ${\displaystyle n=2}$. Similarly for ${\displaystyle X_{3}}$ and so on. Why isn't the probability of setting ${\displaystyle X_{n}=x}$ just ${\displaystyle 1-{\frac {\alpha }{\alpha +n-1}}}$ for any ${\displaystyle n}$? Don't the probabilities of drawing ${\displaystyle X_{n}}$ from ${\displaystyle H}$ and setting ${\displaystyle X_{n}=x}$ have to sum to 1 for any value of ${\displaystyle n}$? What did I miss? Chafe66 (talk) 18:34, 26 April 2016 (UTC)

I'm confused about your confusion; e.g., if ${\displaystyle X_{1}=x}$ then at step 2 ${\displaystyle n_{x}=1}$ and ${\displaystyle X_{2}=x}$ with probability ${\displaystyle {\frac {1}{\alpha +n-1}}}$. Victor veitch (talk) 20:08, 27 April 2016 (UTC)
Oh--I didn't see that the first draw is ${\displaystyle x}$ by definition basically. I thought the implication was that the value ${\displaystyle x}$ was not from the distn ${\displaystyle H}$, which of course would make no sense whatsoever. In the words of Gilda Radner "nevermind." ;) Chafe66 (talk) 17:37, 6 May 2016 (UTC)
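For anyone else who hits the same snag, the urn scheme in the introduction is easy to simulate; picking uniformly among the previous draws reproduces the ${\displaystyle n_{x}/(\alpha +n-1)}$ probability in step 2b, and repeated values do accumulate. A small sketch (the function name and parameter values are illustrative):

```python
import random

def polya_urn(alpha, base_sampler, n):
    """Blackwell-MacQueen urn for DP(alpha, H): draw X_1..X_n.
    Choosing uniformly among the i-1 previous draws gives each past
    value x overall probability n_x / (alpha + i - 1), as in step 2b."""
    draws = []
    for i in range(1, n + 1):
        if random.random() < alpha / (alpha + i - 1):
            draws.append(base_sampler())        # step 2a: fresh draw from H
        else:
            draws.append(random.choice(draws))  # step 2b: repeat a past value
    return draws

random.seed(0)
xs = polya_urn(alpha=1.0, base_sampler=random.random, n=500)
n_distinct = len(set(xs))  # grows only logarithmically in n
```

Note that at ${\displaystyle i=1}$ the new-draw probability is ${\displaystyle \alpha /\alpha =1}$, so the first value always comes from ${\displaystyle H}$ - the point that resolved the confusion above - and from then on the rich-get-richer repeats take over.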

## External links modified

Hello fellow Wikipedians,

I have just modified 2 external links on Dirichlet process. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FAQ for additional information. I made the following changes:

• Added archive https://web.archive.org/web/20070524045420/http://www.cs.toronto.edu:80/~beal/npbayes/ to http://www.cs.toronto.edu/~beal/npbayes/
• Added archive https://web.archive.org/web/20120317193643/http://www.ece.sunysb.edu:80/~zyweng/dpcluster.html to http://www.ece.sunysb.edu/~zyweng/dpcluster.html

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at {{Sourcecheck}}).

As of February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete the "External links modified" sections if they want, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{sourcecheck}} (last update: 15 July 2018).

• If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
• If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot 17:35, 13 December 2016 (UTC)
