| Title: | Project Code - Nonparametric Bayes |
|---|---|
| Description: | Basic implementation of a Gibbs sampler for a Chinese Restaurant Process along with some visual aids to help understand how the sampling works. This is developed as part of a postgraduate school project for an Advanced Bayesian Nonparametric course. It is inspired by Tamara Broderick's presentation on Nonparametric Bayesian statistics given at the Simons Institute. |
| Authors: | Erik-Cristian Seulean [aut, cre] |
| Maintainer: | Erik-Cristian Seulean <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.0.1 |
| Built: | 2026-06-05 08:34:29 UTC |
| Source: | https://github.com/cran/nonparametric.bayes |
Gibbs sampling for the Chinese Restaurant Process
Implementation details can be found in the associated paper. The algorithm stops at every 1000th iteration and prints the current cluster configuration.
cluster_datapoints(data, sd = 1, initialisation = rep(1, nrow(data)), sigma0 = matrix(c(1, 0, 0, 1), nrow = 2, byrow = TRUE))
| Argument | Description |
|---|---|
| data | An n x 2 matrix containing the datapoints. |
| sd | Prior standard deviation. |
| initialisation | Cluster initialisation for each datapoint. By default every point is placed in the same cluster. |
| sigma0 | Covariance matrix for the points. Defaults to matrix(c(1, 0, 0, 1), nrow = 2, byrow = TRUE). |
Returns the cluster assignments after the last iteration.
Examples
cluster_datapoints(generate_split_data(350, 0.5)$x, sigma0=diag(3^2, 2))
cluster_datapoints(petal, sigma0=petal_sigma0)
cluster_datapoints(width, sigma0=width_sigma0)
cluster_datapoints(mixed, sigma0=mixed_sigma0)
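To give a feel for the prior that the sampler works with, here is a minimal sketch of sequential seating under a Chinese Restaurant Process. It is not the package's implementation: `crp_draw` and its concentration argument `alpha` are names assumed for illustration, and the likelihood term that a Gibbs sampler would combine with these weights is left out.

```r
# Sequential draw from a Chinese Restaurant Process prior (illustrative only).
# `crp_draw` and `alpha` are hypothetical names, not part of the package.
crp_draw <- function(n, alpha = 1) {
  z <- integer(n)
  z[1] <- 1L                                  # first customer opens table 1
  for (i in seq_len(n)[-1]) {
    counts <- tabulate(z[seq_len(i - 1)])     # sizes of the existing tables
    # Join table k with probability n_k / (i - 1 + alpha),
    # open a new table with probability alpha / (i - 1 + alpha).
    probs <- c(counts, alpha) / (i - 1 + alpha)
    z[i] <- sample.int(length(probs), size = 1, prob = probs)
  }
  z
}

set.seed(1)
z <- crp_draw(50, alpha = 1)
table(z)  # table sizes: a few large clusters and several small ones is typical
```

Larger values of `alpha` make new tables more likely, so the number of clusters tends to grow.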
Draws from a Dirichlet distribution and shows the clusters generated by this draw. Varying the parameter a puts more or less mass on the first clusters relative to the later clusters (rhos).
generate_dirichlet_clusters(a, K)
| Argument | Description |
|---|---|
| a | Parameter that will be passed in to a Gamma distribution in order to draw from the Dirichlet distribution. |
| K | Number of clusters to draw. |
No return value
generate_dirichlet_clusters(10, 10)
generate_dirichlet_clusters(0.5, 30)
Points are generated one at a time: press Enter to generate the next point. Typing "x" stops the clustering and the function returns.
generate_dirichlet_clusters_with_sampled_points(n, a, K)
| Argument | Description |
|---|---|
| n | Number of points to be drawn in the clusters. |
| a | Parameter that will be passed in to a Gamma distribution in order to draw from the Dirichlet distribution. |
| K | Number of clusters to draw. |
No return value
generate_dirichlet_clusters_with_sampled_points(15, 0.5, 20)
Generates a dataset used to exemplify clustering. The cluster centers are set relatively far apart to see how well the algorithm performs in simple scenarios.
generate_split_data(n, sd)
| Argument | Description |
|---|---|
| n | Number of datapoints to generate. |
| sd | Standard deviation from the cluster center. |
Returns the datapoints and the cluster assignments. The cluster assignments can be used to calculate the performance of the clustering.
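One way to use the returned assignments is to compare them with the output of a clustering run. The sketch below computes the Rand index, the fraction of point pairs on which two partitions agree; it ignores how the clusters are labelled. `rand_index`, `truth` and `predicted` are hypothetical names, and because the name of the list element holding the true assignments is not documented here, the vectors are written out by hand.

```r
# Rand index between two cluster assignments (illustrative helper,
# not part of the package).
rand_index <- function(truth, predicted) {
  stopifnot(length(truth) == length(predicted), length(truth) >= 2)
  same_truth <- outer(truth, truth, "==")          # pair shares a true cluster?
  same_pred  <- outer(predicted, predicted, "==")  # pair shares a predicted cluster?
  pairs <- upper.tri(same_truth)                   # count each unordered pair once
  mean(same_truth[pairs] == same_pred[pairs])
}

truth     <- c(1, 1, 2, 2)
predicted <- c(2, 2, 1, 1)   # same partition, different label names
rand_index(truth, predicted)
```

A value of 1 means the two partitions are identical up to relabelling; lower values mean more pairs are grouped differently.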
Generate a sample from a Dirichlet distribution. Using: https://en.wikipedia.org/wiki/Dirichlet_distribution#Random_number_generation
rdirichlet(n, alpha)
| Argument | Description |
|---|---|
| n | Number of observations. |
| alpha | A vector containing the parameters for the Dirichlet distribution. |
A sample of n observations from the Dirichlet distribution.
rdirichlet(n=1, alpha=c(2, 2, 2))
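The Gamma-based method on the linked page can be sketched in a few lines of base R: draw independent Gamma(alpha_i, 1) variables and normalise each row so it sums to one. `rdirichlet_sketch` is an illustrative name, and returning one row per observation is an assumption; the package's rdirichlet may be written differently.

```r
# Dirichlet sampling via normalised Gamma draws (illustrative sketch,
# not the package's rdirichlet).
rdirichlet_sketch <- function(n, alpha) {
  k <- length(alpha)
  # matrix() fills column by column, so column j uses shape alpha[j].
  g <- matrix(rgamma(n * k, shape = rep(alpha, each = n), rate = 1),
              nrow = n, ncol = k)
  g / rowSums(g)   # each row now lies on the probability simplex
}

set.seed(1)
x <- rdirichlet_sketch(n = 3, alpha = c(2, 2, 2))
rowSums(x)   # every row sums to 1 (up to floating point error)
```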
Press Enter to keep drawing until n observations have been drawn, or type "x" to exit.
rDPM(n, alpha, mu, sigma_0, sigma)
| Argument | Description |
|---|---|
| n | Number of observations. |
| alpha | Alpha corresponding to GEM(alpha) used to draw the rho vector. |
| mu | Mean of the Normal distribution used to draw the clusters. |
| sigma_0 | Standard deviation of the Normal distribution used to draw the points around the cluster centre. |
| sigma | Standard deviation for the cluster centre. |
Returns the n observations sampled from the DPMM distribution.
rDPM(n=30, alpha=3, mu=0, sigma_0=1.5, sigma=0.7)
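As a rough illustration of the generative process these arguments describe, the sketch below draws the rho weights by stick-breaking (the GEM(alpha) construction), draws a centre for each new cluster, and then draws each point around its cluster centre. The function name `dpm_sketch`, the arguments `center_sd` and `point_sd`, and the returned list are assumptions made for this example; the sketch is non-interactive and is not the package's rDPM.

```r
# Non-interactive draw from a Dirichlet process mixture of Normals,
# using lazy stick-breaking (illustrative sketch with hypothetical names).
dpm_sketch <- function(n, alpha, mu, center_sd, point_sd) {
  cum <- numeric(0)       # cumulative stick-breaking weights drawn so far
  centers <- numeric(0)   # one cluster centre per weight
  remaining <- 1          # length of the stick not yet broken off
  z <- integer(n)
  x <- numeric(n)
  for (i in seq_len(n)) {
    u <- runif(1)
    # Break off more of the stick until the weights drawn so far cover u.
    while (length(cum) == 0 || cum[length(cum)] < u) {
      v <- rbeta(1, 1, alpha)
      remaining <- remaining * (1 - v)
      cum <- c(cum, 1 - remaining)
      centers <- c(centers, rnorm(1, mean = mu, sd = center_sd))
    }
    z[i] <- which(cum >= u)[1]                            # cluster containing u
    x[i] <- rnorm(1, mean = centers[z[i]], sd = point_sd)
  }
  list(x = x, z = z, rho = diff(c(0, cum)))
}

set.seed(42)
draw <- dpm_sketch(n = 30, alpha = 3, mu = 0, center_sd = 0.7, point_sd = 1.5)
table(draw$z)   # how many points landed in each cluster
```

Breaking the stick lazily avoids choosing a truncation level in advance: only as many weights are drawn as the sampled points actually need.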
Press Enter to keep drawing until n observations have been drawn, or type "x" to exit.
rDPM_visual(n, alpha, mu, sigma_0, sigma)
| Argument | Description |
|---|---|
| n | Number of observations. |
| alpha | Alpha corresponding to GEM(alpha) used to draw the rho vector. |
| mu | Mean of the Normal distribution used to draw the clusters. |
| sigma_0 | Standard deviation of the Normal distribution used to draw the points around the cluster centre. |
| sigma | Standard deviation for the cluster centre. |
Returns the n observations sampled from the DPMM distribution.
rDPM_visual(n=30, alpha=3, mu=0, sigma_0=1.5, sigma=0.7)