Talk:Dirichlet process
Missing clear description of the optimization step
The main page explains how to create the initial assignment of cluster members (for example, using the Chinese restaurant process) but leaves out a clear description of how to update the cluster assignments to obtain meaningful clusters. The assignment algorithms initially assign members to clusters without regard to the members' properties/features. In my discussion here (http://metaoptimize.com/qa/questions/10731/dirichletprocessbasicintuition) it was explained that there is a step (using MCMC/Gibbs sampling) which moves the documents around until the clusters are stable, and that this step is key to understanding how a DP produces non-random results. Clearly, a detailed explanation of MCMC (and the alternatives) belongs on its own wiki page, but the DP page needs to make it clear that this optimization step is key (without it, DP seems like bad magic). — Preceding unsigned comment added by Swframe (talk • contribs) 18:34, 3 August 2012 (UTC)
 The reason is that the Dirichlet process has no optimisation step. You're thinking about the Dirichlet process embedded in a Bayesian optimisation problem, but this article is about the Dirichlet process in general. mcld (talk) 14:07, 30 July 2013 (UTC)
Stick-breaking Construction
The \delta in the formula is undefined. Does anyone know what it is? Took 04:49, 31 October 2007 (UTC)
I've clarified that. It's the Dirac delta function. It is a function that integrates to 1 when it is evaluated on an argument equal to its index. This is just a mechanism to say that each term of the summation contributes its weight only where the argument is equal to the atom \theta_k. Rodrigo de Salvo Braz (talk) 06:10, 4 March 2009 (UTC)
Shouldn't \delta be the Dirac measure as opposed to the delta function? —Preceding unsigned comment added by 137.111.13.200 (talk) 06:02, 4 May 2011 (UTC)
Yes, it should be the Dirac measure, which means it doesn't integrate to 1 but actually is 1 when the index equals the argument. Another term for its use here would be indicator function. Corrected. Ingmar Schuster 12:23, 18 April 2013 (UTC)
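As an illustration of the Dirac-measure reading, here is a minimal Python sketch of a truncated stick-breaking draw. It assumes the construction G = \sum_k \beta_k \delta_{\theta_k}, with each weight obtained by breaking a Beta(1, \alpha) fraction off the remaining stick and each atom \theta_k drawn from the base measure; the function names, the truncation level, and the uniform base measure over ten labels are hypothetical choices for illustration. Because \delta_{\theta_k}(A) is just an indicator, G(A) is the total weight of the atoms that land in A.

```python
import random


def dirac(theta, subset):
    """Dirac measure delta_theta(A): 1 if theta lies in A, else 0 (an indicator)."""
    return 1.0 if theta in subset else 0.0


def stick_breaking(alpha, base_draw, truncation, rng):
    """Truncated stick-breaking draw G = sum_k beta_k * delta_{theta_k}.

    Returns a list of (weight, atom) pairs. The final weight absorbs the
    leftover stick so that the truncated weights sum to 1.
    """
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        b = rng.betavariate(1.0, alpha)  # fraction broken off the remaining stick
        weights.append(remaining * b)     # beta_k = b_k * prod_{j<k} (1 - b_j)
        remaining *= 1.0 - b
    weights.append(remaining)             # truncation: leftover mass goes to the last atom
    atoms = [base_draw(rng) for _ in weights]
    return list(zip(weights, atoms))


def measure_of(G, subset):
    """G(A) = sum_k beta_k * delta_{theta_k}(A)."""
    return sum(w * dirac(theta, subset) for w, theta in G)


rng = random.Random(0)
# Hypothetical base measure: uniform over the ten labels 0..9.
G = stick_breaking(alpha=2.0, base_draw=lambda r: r.randrange(10), truncation=50, rng=rng)
print(measure_of(G, set(range(10))))  # whole space: 1.0 up to floating-point rounding
print(measure_of(G, {0, 1, 2}))       # mass that G puts on the labels 0, 1, 2
```

Truncating at a fixed number of atoms is an approximation; putting the leftover stick on the last atom keeps G a proper probability measure.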
Chinese restaurant process
What exactly is the relationship between the Chinese restaurant process and the Dirichlet process? The article does not make it clear. Robinh (talk) 07:51, 1 August 2008 (UTC)
Half a customer
The text with the CRP visualization states: "Additionally, a customer opens a new table with a probability proportional to the scaling parameter \alpha." However, the visualization with \alpha = 1/2 shows the new table as already being present with half a customer sitting at it. That's a very difficult restaurant to imagine, and it doesn't help the metaphor in any way. For example, rather than 9 customers present, it shows 9.5 customers in total. Also, if \alpha = 3 and 9 customers have entered the restaurant, are there 12 customers in total? Someone should render the video again. Anne van Rossum (talk) 19:53, 21 June 2017 (UTC)
You are right, the half customer could be confusing. The parameters are pseudo-customers, and I was calling half customers "drunken" and thus less attractive, but this gets too complicated. I'll change the animation and record a new video. Ckling (talk) 22:20, 27 June 2017 (UTC)
I'm sorry, but if I increase the scaling parameter, it is unlikely that I see only 4 tables before the tables are hidden. I would have to change my code or try many, many times until I get lucky. I don't want to make the animation larger, so for now I will leave it at 0.5 customers. The code and the commands for recording the video are in the description of the file; help is appreciated. Ckling (talk) 13:24, 8 August 2017 (UTC)
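On the \alpha = 3 question above: \alpha acts as a pseudo-count, not as a person in the restaurant. A minimal Python sketch of the seating rule, assuming the standard CRP predictive probabilities n_k/(n + \alpha) for an occupied table k and \alpha/(n + \alpha) for a new table, shows that with 9 seated customers and \alpha = 3 the denominator is 12 while the restaurant still holds 9 customers. The function names are hypothetical.

```python
import random


def seating_probs(table_counts, alpha):
    """CRP predictive probabilities for the next customer.

    Occupied table k gets n_k / (n + alpha); a new table gets alpha / (n + alpha),
    where n = sum(table_counts) is the number of customers already seated.
    alpha only enters the arithmetic as a pseudo-count; nobody sits for it.
    """
    n = sum(table_counts)
    denom = n + alpha
    return [c / denom for c in table_counts] + [alpha / denom]


def simulate_crp(num_customers, alpha, rng):
    """Seat customers one at a time; returns the list of table occupancies."""
    tables = []
    for _ in range(num_customers):
        probs = seating_probs(tables, alpha)
        k = rng.choices(range(len(probs)), weights=probs)[0]
        if k == len(tables):
            tables.append(1)  # the last option means: open a new table
        else:
            tables[k] += 1
    return tables


# 9 real customers at tables of size 4, 3, 2: the denominator is 9.5, not 9 + "half a customer"
print(seating_probs([4, 3, 2], alpha=0.5))
print(sum(simulate_crp(9, alpha=3.0, rng=random.Random(1))))  # always 9 customers, never 12
```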
Stick-breaking Construction possible error
"The smaller α is, more of the stick will be left for subsequent values (on average)."
Shouldn't it be "less of the stick will be left..."? Took (talk) 19:34, 31 March 2009 (UTC)
Seconded: E(\beta_i) is (1 + \alpha)^{-1}, according to my understanding, so as \alpha _decreases_, E(1 - \beta_i) should _decrease_. -- pyeditor
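A quick numerical check of this point, assuming each step breaks off a fraction b ~ Beta(1, \alpha) of the remaining stick, so that E(b) = 1/(1 + \alpha). The expected leftover fraction \alpha/(1 + \alpha) shrinks as \alpha decreases, which supports "less of the stick will be left". The Monte Carlo estimate is only a sanity check against the closed form; the function names are illustrative.

```python
import random


def mean_fraction_taken(alpha):
    """E[b] for b ~ Beta(1, alpha): expected fraction broken off at each step."""
    return 1.0 / (1.0 + alpha)


def mean_fraction_left(alpha):
    """E[1 - b] = alpha / (1 + alpha): expected fraction of the stick left over."""
    return alpha / (1.0 + alpha)


def monte_carlo_left(alpha, draws, rng):
    """Empirical average of 1 - b over many Beta(1, alpha) draws."""
    return sum(1.0 - rng.betavariate(1.0, alpha) for _ in range(draws)) / draws


rng = random.Random(42)
for alpha in (0.5, 1.0, 5.0):
    # smaller alpha -> smaller expected leftover
    print(alpha, mean_fraction_left(alpha), round(monte_carlo_left(alpha, 100_000, rng), 3))
```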
Regarding errors: the intro formula p(z_i = k \mid z_{1,\dots,i-1},\alpha,K) = \frac{n_k + \frac{\alpha}{K}}{i-1+\alpha} seems flawed. Shouldn't i be replaced with something like N = \sum_{k=1}^{K} n_k, here and in all formulas below pertaining to the derivation of the DP as the limit of a Dirichlet-multinomial (DM) distribution? Bamayer (talk) 20:19, 20 November 2012 (UTC)
 I see now that i is just the number of total counts given a set of 1-of-K random variables and equal to N above. Sorry for the confusion; it just looks like i is an arbitrary index and the denominator is a function of that index. Bamayer (talk) 23:11, 20 November 2012 (UTC)
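A small sketch of the finite symmetric Dirichlet-multinomial predictive rule in exact arithmetic, assuming n_k counts how many of the first i - 1 labels equal k. Under that reading \sum_k n_k = i - 1, so the denominator i - 1 + \alpha is exactly what makes the K probabilities sum to 1. The sketch also prints the total mass on components not yet seen, which approaches \alpha/(i - 1 + \alpha) as K grows; that limit is the new-table probability of the Chinese restaurant process. The function name and the example counts are hypothetical.

```python
from fractions import Fraction


def dm_predictive(counts, alpha, K):
    """p(z_i = k | z_1..z_{i-1}, alpha, K) under a symmetric Dirichlet(alpha/K) prior.

    counts[k] = n_k is how many of the first i-1 labels equal k, so
    sum(counts) == i - 1 and the denominator is i - 1 + alpha.
    """
    assert len(counts) == K
    i_minus_1 = sum(counts)
    denom = i_minus_1 + alpha
    return [(n_k + alpha / K) / denom for n_k in counts]


alpha = Fraction(1)
for K in (5, 50, 5000):
    counts = [3, 2] + [0] * (K - 2)  # i - 1 = 5 labels seen so far, in two components
    probs = dm_predictive(counts, alpha, K)
    unseen = sum(p for p, n in zip(probs, counts) if n == 0)
    # probabilities sum to exactly 1; unseen mass tends to alpha / (5 + alpha) = 1/6
    print(K, sum(probs), float(unseen))
```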
Context
The phrase

 Given a set equipped with a suitable algebra,
does nothing to inform the lay reader that mathematics is what the article is about. It is a terrible phrase to use as the beginning of a Wikipedia article. Michael Hardy (talk) 05:47, 22 May 2009 (UTC)
This page is still wrong. Where is the base distribution? It should be notated X \sim \mathrm{DP}(M, P_0), where M is the scale parameter and P_0 is the base distribution. -- Anon
Regarding the above: this is simply an alternate parameterization. In the article, the base measure is unnormalized and could be expressed equivalently as M \times P_0, where M is a positive scalar (the total mass) and P_0 is a normalized measure (a.k.a. a distribution). It might be worthwhile noting this in the article. -- pyeditor
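For reference, the two parameterizations can be related as follows. Here H denotes the unnormalized base measure on the space S, \alpha its total mass and P_0 the normalized base distribution; these symbol names are labels chosen for this note, and the second line is the defining finite-dimensional marginal property of the DP for any measurable partition A_1, \dots, A_r of S.

```latex
\begin{aligned}
\alpha &= H(S), \qquad P_0 = \frac{H}{H(S)}, \qquad \mathrm{DP}(H) \equiv \mathrm{DP}(\alpha, P_0), \\
\big(G(A_1), \dots, G(A_r)\big) &\sim \mathrm{Dir}\big(H(A_1), \dots, H(A_r)\big)
  = \mathrm{Dir}\big(\alpha P_0(A_1), \dots, \alpha P_0(A_r)\big).
\end{aligned}
```

So both notations describe the same family; writing \alpha and P_0 separately just exposes the total mass and the shape of the base measure as distinct parameters.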
I am afraid to say that this alternate parametrization is inconsistent with all other literature, and I would go so far as to say it is wrong. Distinguishing between the base measure and the concentration parameter is essential in practice, both from an educational point of view and from a usage point of view. When explaining a DP, the concept of it quantising an existing probability distribution is conceptually important, especially for some of the useful usage scenarios, for instance hierarchical DPs and DP mixture models. Additionally, the DP can be explained as the limit of a Dirichlet distribution going to infinitely many elements, with a symmetric Dirichlet prior whose parameter is directly equivalent to the concentration parameter. The effects on real-world models as the concentration parameter is varied also warrant discussion. In use, DPs are invariably used in Bayesian models, where the concentration parameter and base measure come from different sources: often the concentration parameter is fixed or has a prior (a Gamma prior is computationally convenient), whilst the base measure is learnt or ultimately integrated out. In my opinion this article needs a rewrite, though unfortunately I do not have the time right now, so I can only moan about it. -- thaines —Preceding unsigned comment added by Thaines (talk • contribs) 11:48, 6 August 2010 (UTC)
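The finite-Dirichlet view mentioned here can be sketched in Python with the concentration \alpha and a sampler for the base distribution P_0 passed as separate arguments. This is a minimal sketch assuming the finite approximation G_K = \sum_{k=1}^K \pi_k \delta_{\theta_k}, with \pi \sim \mathrm{Dir}(\alpha/K, \dots, \alpha/K) and \theta_k drawn i.i.d. from P_0, which converges in distribution to a DP(\alpha, P_0) draw as K grows; the Dirichlet vector is drawn by normalizing independent Gamma variates. The function name and the standard-normal P_0 are illustrative assumptions.

```python
import random


def finite_dp_approx(alpha, base_draw, K, rng):
    """Draw G_K = sum_k pi_k * delta_{theta_k}, pi ~ Dirichlet(alpha/K, ..., alpha/K),
    theta_k ~ P0 i.i.d.; as K grows this approaches a DP(alpha, P0) draw.

    The concentration (alpha) and the base distribution (base_draw) are kept separate.
    """
    gammas = [rng.gammavariate(alpha / K, 1.0) for _ in range(K)]
    total = sum(gammas)
    weights = [g / total for g in gammas]  # normalized Gamma variates = Dirichlet draw
    atoms = [base_draw(rng) for _ in range(K)]
    return list(zip(weights, atoms))


rng = random.Random(7)
# Hypothetical base distribution P0: standard normal.
for alpha in (0.5, 20.0):
    G = finite_dp_approx(alpha, lambda r: r.gauss(0.0, 1.0), K=1000, rng=rng)
    top = sorted((w for w, _ in G), reverse=True)[:5]
    print(alpha, [round(w, 3) for w in top])  # five largest weights
```

Varying \alpha with P_0 fixed changes how concentrated the weights are (a small \alpha typically leaves a handful of atoms carrying most of the mass), while changing P_0 only moves where the atoms sit.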
Inference and applications sections
It would be great if there were a section dedicated to inference and a section on applications, but preferably not in the style of the monster article https://en.wikipedia.org/wiki/Dirichletmultinomial_distribution. 145.94.110.25 (talk) 11:06, 12 November 2013 (UTC)
How do you pronounce Dirichlet?
This is super important. I don't know how to say Dirichlet and I don't want to sound stupid... — Preceding unsigned comment added by 129.6.220.243 (talk) 14:31, 4 March 2016 (UTC)
Introduction
What am I missing? The introduction describes drawing from a Dirichlet:
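 Draw X_1 from the distribution H.
 For n > 1:
a) With probability \frac{\alpha}{\alpha + n - 1} draw X_n from H.
b) With probability \frac{n_x}{\alpha + n - 1} set X_n = x, where n_x is the number of previous observations X_j, j < n, such that X_j = x.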
How does ever increase beyond 0? If comes from then must also come from since is still 0 at . Similarly for and so on. Why isn't the probability of setting just for any ? Don't the probabilities of drawing from and setting have to sum to 1 for any value of ? What did I miss? Chafe66 (talk) 18:34, 26 April 2016 (UTC)
 I'm confused about your confusion; e.g. if then at step 2 and with probability Victor veitch (talk) 20:08, 27 April 2016 (UTC)
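To illustrate the reply, here is a minimal Python sketch of the Blackwell-MacQueen urn scheme as commonly stated: X_1 is drawn from the base distribution H, and at step n > 1 a fresh value is drawn from H with probability \alpha/(\alpha + n - 1), otherwise X_n copies a previous value x with probability n_x/(\alpha + n - 1). Copying a uniformly chosen earlier draw gives exactly that n_x/(\alpha + n - 1) weight, and the two cases always sum to 1. At step 2 the count for x = X_1 is already 1, so X_2 = X_1 with probability 1/(1 + \alpha). The function names and the uniform H are hypothetical.

```python
import random


def polya_urn(n, alpha, base_draw, rng):
    """Blackwell-MacQueen urn: X_1 ~ H; at step m > 1, draw a fresh value from H
    with probability alpha / (alpha + m - 1), otherwise copy a previous value
    (value x is then chosen with overall probability n_x / (alpha + m - 1))."""
    xs = []
    for _ in range(n):
        # len(xs) previous observations; at the first step this probability is 1
        if rng.random() < alpha / (alpha + len(xs)):
            xs.append(base_draw(rng))  # case (a): new draw from H
        else:
            xs.append(rng.choice(xs))  # case (b): uniform over past draws, so x has weight n_x
    return xs


def prob_second_repeats(alpha, trials, rng):
    """Estimate P(X_2 = X_1); with a continuous H, fresh draws never coincide."""
    hits = 0
    for _ in range(trials):
        x1, x2 = polya_urn(2, alpha, lambda r: r.random(), rng)
        hits += x1 == x2
    return hits / trials


rng = random.Random(3)
print(prob_second_repeats(1.0, 20_000, rng), 1 / (1 + 1.0))  # estimate vs 1/(1 + alpha)
```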
