January 17, 2021


8 Links of Separation

Using Python and Sklearn’s DBSCAN to Find Core Samples of High Density | by Mahnoor Javed | Nov, 2020

We will then apply feature standardization using StandardScaler() on X. This will standardize the features by removing the mean and scaling them to unit variance.
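The data-creation step isn't reproduced in this excerpt; a minimal sketch, assuming the usual scikit-learn setup of make_blobs with 750 samples around 3 centers (the counts the article quotes later) — the exact center coordinates and cluster_std are assumptions:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Hypothetical data setup: 750 points around 3 centers, matching the
# counts quoted later in the article; exact parameters are assumptions.
centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(
    n_samples=750, centers=centers, cluster_std=0.4, random_state=0
)

# Standardize: remove each feature's mean and scale to unit variance
X = StandardScaler().fit_transform(X)
```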

Computing DBSCAN

Once we have created and standardized our data, we will deploy the DBSCAN algorithm from Sklearn with the value of epsilon = 0.3 and 10 as the minimum number of samples in a cluster.
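A sketch of that call with the article's stated parameters; the data setup above it is an assumption, repeated here so the snippet runs on its own:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Assumed data setup (not shown in the article excerpt)
centers = [[1, 1], [-1, -1], [1, -1]]
X, _ = make_blobs(n_samples=750, centers=centers, cluster_std=0.4, random_state=0)
X = StandardScaler().fit_transform(X)

# DBSCAN with the article's parameters: epsilon = 0.3 and a minimum of
# 10 samples in a point's neighborhood for it to count as a core point
db = DBSCAN(eps=0.3, min_samples=10).fit(X)
```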

We will then define an array core_samples_mask with the same shape as the labels array: 750 elements, all initialized to zero (False).

After fitting the DBSCAN model on the data, we will compute the core_samples_mask.

Then, we will store the label values of all the data points in labels array.
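The three steps above can be sketched as follows (the data and model setup are assumptions repeated so the snippet stands alone):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Assumed setup from the earlier steps
centers = [[1, 1], [-1, -1], [1, -1]]
X, _ = make_blobs(n_samples=750, centers=centers, cluster_std=0.4, random_state=0)
X = StandardScaler().fit_transform(X)
db = DBSCAN(eps=0.3, min_samples=10).fit(X)

# Labels of all data points; -1 marks noise
labels = db.labels_

# 750-element boolean array, initially all zeroes (False)
core_samples_mask = np.zeros_like(labels, dtype=bool)

# Mark the points DBSCAN identified as core samples
core_samples_mask[db.core_sample_indices_] = True
```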

There are a total of 4 distinct values in the labels array: 0, 1, 2, and -1. The values 0, 1, and 2 identify the 3 clusters found in the data, whereas -1 is assigned to noise points: data points that are neither core samples themselves nor within epsilon of one, and therefore belong to no cluster.

Number of Clusters and Noise

Let us now print the number of clusters formed (ignoring the noise data points) as well as the total number of outliers:
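A sketch of the counting, using the standard idiom from scikit-learn's DBSCAN example (setup repeated so the snippet is standalone):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Assumed setup from the earlier steps
centers = [[1, 1], [-1, -1], [1, -1]]
X, _ = make_blobs(n_samples=750, centers=centers, cluster_std=0.4, random_state=0)
X = StandardScaler().fit_transform(X)
labels = DBSCAN(eps=0.3, min_samples=10).fit(X).labels_

# Subtract 1 from the number of unique labels if the noise label (-1)
# is present, so noise is not counted as a cluster
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = list(labels).count(-1)

print(f"Estimated number of clusters: {n_clusters}")
print(f"Estimated number of noise points: {n_noise}")
```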

We have 3 clusters and 18 outliers/noise:
