# Density Clustering

Download the data file iris.data from the UCI Machine Learning Repository. It has 150 instances and five attributes. The last column of the data is a categorical attribute for the type of Iris flower.

Write a script to implement the DENCLUE density-based clustering algorithm (Algorithm 15.2 in Chapter 15). The script should take as input a dataset $$\mathbf{D}$$, the minimum density $$\xi$$, the tolerance for convergence $$\epsilon$$, and the width $$h$$. Do not make any assumptions about the data (e.g., column names), except that the last column gives the "true" cluster id.
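As a starting point, here is a minimal sketch of the two building blocks the algorithm relies on: the kernel density estimate at a point, and the hill-climbing step that moves a point to its density attractor. It assumes a Gaussian kernel and a kernel-weighted mean update. The function names and the `max_iter` safety cap are illustrative and not part of the assignment; check the update rule against Algorithm 15.2 before relying on it.

```python
import numpy as np

def gaussian_density(x, D, h):
    """Kernel density estimate at x, assuming a Gaussian kernel of width h."""
    n, d = D.shape
    sq_dist = np.sum((D - x) ** 2, axis=1)
    kernel = np.exp(-sq_dist / (2.0 * h ** 2)) / (2.0 * np.pi) ** (d / 2.0)
    return kernel.sum() / (n * h ** d)

def find_attractor(x, D, h, eps, max_iter=1000):
    """Hill-climb from x until successive iterates differ by at most eps.

    max_iter is a hypothetical safety cap, not part of the assignment.
    """
    x_t = np.asarray(x, dtype=float)
    for _ in range(max_iter):
        sq_dist = np.sum((D - x_t) ** 2, axis=1)
        w = np.exp(-sq_dist / (2.0 * h ** 2))   # unnormalized Gaussian weights
        x_next = w @ D / w.sum()                # kernel-weighted mean of the data
        if np.linalg.norm(x_next - x_t) <= eps:
            return x_next
        x_t = x_next
    return x_t

# Toy data: two well-separated groups of three points each.
D_toy = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
attractor = find_attractor(D_toy[0], D_toy, h=0.5, eps=1e-4)
```

A point would then be kept only if `gaussian_density(attractor, D, h)` is at least $$\xi$$. Merging attractors that are density reachable from one another into clusters is left out of this sketch.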

Run your script on the iris dataset, with $$\epsilon=0.0001$$. Your script should output the following:

• The number of clusters, and the size of each cluster.

• The density attractor, followed by the set of points in that cluster.

• Purity of the clustering, based on the true id.
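For the last item, purity can be computed roughly as in the sketch below: each cluster contributes the count of its most frequent true id, and the total is divided by the number of points. The function name `purity` and its two-list interface are assumptions made for illustration. How to treat points that end up in no cluster (attractor density below $$\xi$$) is a choice the sketch does not make; it scores only the points it is given.

```python
from collections import Counter

def purity(cluster_ids, true_ids):
    """Fraction of points whose true id is the majority true id of their cluster."""
    if len(cluster_ids) != len(true_ids):
        raise ValueError("cluster_ids and true_ids must have the same length")
    if len(true_ids) == 0:
        raise ValueError("purity is undefined for an empty clustering")
    # Count true ids separately within each cluster.
    by_cluster = {}
    for c, t in zip(cluster_ids, true_ids):
        by_cluster.setdefault(c, Counter())[t] += 1
    # Each cluster contributes the size of its largest true-id group.
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(true_ids)

# Cluster 0 holds two "a" and one "b"; cluster 1 holds three "b".
score = purity([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "b"])
```

Here `score` is (2 + 3) / 6, since the majority groups of the two clusters have sizes 2 and 3.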

For Iris, use a value of $$\xi$$ that gives 3 clusters in the end, since there are 3 true clusters in the data: try different values, then report only the results for the value that gives 3 clusters. Select the value of $$h$$ empirically.

To speed up the computation for estimating the density at a point, you may want to first identify the K nearest neighbors, and use only those neighbors.
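One way to read this hint is sketched below, assuming a Gaussian kernel: the kernel sum runs over only the `k` nearest neighbors, while the normalization still uses the full dataset size so the result approximates the full estimate. The name `knn_density` and the parameter `k` are illustrative. The neighbor search here is brute force, so it still computes every distance; a real speedup would need a spatial index (for example a k-d tree such as `scipy.spatial.cKDTree`) or neighbor lists precomputed once.

```python
import numpy as np

def knn_density(x, D, h, k):
    """Approximate Gaussian kernel density at x using only its k nearest neighbors."""
    n, d = D.shape
    if k < 1:
        raise ValueError("k must be at least 1")
    k = min(k, n)
    sq_dist = np.sum((D - x) ** 2, axis=1)
    # Indices of the k smallest squared distances (in no particular order).
    nearest = np.argpartition(sq_dist, k - 1)[:k]
    kernel = np.exp(-sq_dist[nearest] / (2.0 * h ** 2)) / (2.0 * np.pi) ** (d / 2.0)
    # Normalize by the full n so the truncated sum approximates the full estimate.
    return kernel.sum() / (n * h ** d)

# Toy data: the three nearest neighbors of the first point carry almost all the mass.
D_toy = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
approx = knn_density(D_toy[0], D_toy, h=0.5, k=3)
full = knn_density(D_toy[0], D_toy, h=0.5, k=len(D_toy))
```

Because distant points contribute almost nothing under a Gaussian kernel, `approx` never exceeds `full` and is very close to it when `k` covers the local neighborhood.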