Cluster validation

What is cluster validation?
The process of estimating the goodness of the clusters formed, that is, determining how accurate the obtained clustering results are.

To understand cluster validation, consider the example confusion matrix in Figure 1, obtained by the process of classification (supervised learning) or clustering (unsupervised learning).

Figure 1: Example of the confusion matrix.
Where,
CE1, CE2, CE3, CE4: expected result of clusters 1, 2, 3, and 4.
CO1, CO2, CO3, CO4: obtained result of clusters 1, 2, 3, and 4.
Row: indicates how many samples truly belong to a particular (expected) cluster.
Column: indicates how many samples were placed in a particular (obtained) cluster, and from which expected cluster each of them came.
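A confusion matrix like the one in Figure 1 can be built directly from two label lists. The following is a minimal Python sketch, assuming clusters are numbered 0 to k-1 (so C1 is index 0) and that obtained cluster i has already been matched to expected cluster i. The function name `confusion_matrix` and the labels are hypothetical, not the data of Figure 1.

```python
from typing import List, Sequence


def confusion_matrix(expected: Sequence[int], obtained: Sequence[int], k: int) -> List[List[int]]:
    """Return a k x k matrix whose entry [i][j] counts samples whose expected
    cluster is i and whose obtained cluster is j (0-based indices)."""
    if len(expected) != len(obtained):
        raise ValueError("expected and obtained must have the same length")
    matrix = [[0] * k for _ in range(k)]
    for e, o in zip(expected, obtained):
        matrix[e][o] += 1  # row = expected cluster, column = obtained cluster
    return matrix


# Hypothetical labels for 8 samples and k = 4 clusters.
expected = [0, 0, 1, 1, 2, 2, 3, 3]
obtained = [0, 1, 1, 1, 2, 0, 3, 3]
cm = confusion_matrix(expected, obtained, k=4)
for row in cm:
    print(row)
```

scikit-learn's `sklearn.metrics.confusion_matrix(y_true, y_pred)` follows the same convention (rows are true labels, columns are predicted labels).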

Example: Arrangement of library books.

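Calculating recall (the ability to retrieve all samples of a cluster) for each cluster. Let $n_{ij}$ denote the entry in row $i$, column $j$ of Figure 1, that is, the number of samples expected in cluster $i$ and placed in cluster $j$. Recall is the diagonal entry divided by its row total:

Cluster C1: $R_1 = \frac{n_{11}}{n_{11} + n_{12} + n_{13} + n_{14}}$

Cluster C2: $R_2 = \frac{n_{22}}{n_{21} + n_{22} + n_{23} + n_{24}}$

Cluster C3: $R_3 = \frac{n_{33}}{n_{31} + n_{32} + n_{33} + n_{34}}$

Cluster C4: $R_4 = \frac{n_{44}}{n_{41} + n_{42} + n_{43} + n_{44}}$

Calculating average recall:

$R = \frac{R_1 + R_2 + R_3 + R_4}{4}$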
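The per-cluster recall and average recall can be checked with a short Python sketch. The 4 x 4 matrix below is hypothetical (the values of Figure 1 are not reproduced here). Rows are expected clusters and columns are obtained clusters, and returning 0.0 for an empty row is an assumption of this sketch.

```python
from typing import List


def recall_per_cluster(matrix: List[List[int]]) -> List[float]:
    """Recall of cluster i: correctly placed samples n_ii divided by the
    row total, i.e. all samples that truly belong to cluster i."""
    recalls = []
    for i, row in enumerate(matrix):
        row_total = sum(row)
        recalls.append(matrix[i][i] / row_total if row_total else 0.0)
    return recalls


def average_recall(matrix: List[List[int]]) -> float:
    """Unweighted mean of the per-cluster recalls."""
    recalls = recall_per_cluster(matrix)
    return sum(recalls) / len(recalls)


# Hypothetical confusion matrix; index 0 corresponds to C1.
cm = [
    [8, 1, 1, 0],
    [2, 6, 1, 1],
    [0, 1, 9, 0],
    [1, 0, 2, 7],
]
print(recall_per_cluster(cm))  # [0.8, 0.6, 0.9, 0.7]
print(average_recall(cm))
```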
 
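Calculating precision for each cluster. With $n_{ij}$ the entry in row $i$, column $j$ of Figure 1, precision is the diagonal entry divided by its column total, that is, by all samples placed in that cluster:

Cluster C1: $P_1 = \frac{n_{11}}{n_{11} + n_{21} + n_{31} + n_{41}}$

Cluster C2: $P_2 = \frac{n_{22}}{n_{12} + n_{22} + n_{32} + n_{42}}$

Cluster C3: $P_3 = \frac{n_{33}}{n_{13} + n_{23} + n_{33} + n_{43}}$

Cluster C4: $P_4 = \frac{n_{44}}{n_{14} + n_{24} + n_{34} + n_{44}}$

Calculating average precision:

$P = \frac{P_1 + P_2 + P_3 + P_4}{4}$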
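Precision works the same way, but divides by the column total instead of the row total. A minimal Python sketch on a hypothetical matrix follows. Note that `average_precision` here is simply the unweighted mean of the per-cluster precisions; it is not the ranking-based "average precision" computed by scikit-learn's `average_precision_score`.

```python
from typing import List


def precision_per_cluster(matrix: List[List[int]]) -> List[float]:
    """Precision of cluster i: correctly placed samples n_ii divided by the
    column total, i.e. all samples the method placed in cluster i."""
    k = len(matrix)
    precisions = []
    for i in range(k):
        column_total = sum(matrix[j][i] for j in range(k))
        # An empty column (nothing placed in cluster i) is treated as 0.0 here.
        precisions.append(matrix[i][i] / column_total if column_total else 0.0)
    return precisions


def average_precision(matrix: List[List[int]]) -> float:
    """Unweighted mean of the per-cluster precisions."""
    precisions = precision_per_cluster(matrix)
    return sum(precisions) / len(precisions)


# Hypothetical confusion matrix (rows = expected, columns = obtained).
cm = [
    [8, 1, 1, 0],
    [2, 6, 1, 1],
    [0, 1, 9, 0],
    [1, 0, 2, 7],
]
print(precision_per_cluster(cm))
print(average_precision(cm))
```

Here the column totals are 11, 8, 13, and 8, so the precisions are 8/11, 6/8, 9/13, and 7/8.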

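In the case of a k-cluster problem, the average recall, the average precision, and the F-measure are:

$R = \frac{1}{k} \sum_{i=1}^{k} R_i$

$P = \frac{1}{k} \sum_{i=1}^{k} P_i$

$F = \frac{2PR}{P + R}$

where $R_i$ and $P_i$ are the recall and precision of cluster $C_i$. Both P and R are normalized, that is, they lie between 0 and 1, so F also lies between 0 and 1.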
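The k-cluster formulas can be sketched in Python for any square confusion matrix. The function names and the 3-cluster matrix below are hypothetical, chosen only to show that nothing depends on k = 4.

```python
from typing import List

Matrix = List[List[int]]


def macro_recall(matrix: Matrix) -> float:
    """R = (1/k) * sum of per-cluster recalls, recall_i = n_ii / row total i."""
    k = len(matrix)
    recalls = [matrix[i][i] / sum(matrix[i]) if sum(matrix[i]) else 0.0 for i in range(k)]
    return sum(recalls) / k


def macro_precision(matrix: Matrix) -> float:
    """P = (1/k) * sum of per-cluster precisions, precision_i = n_ii / column total i."""
    k = len(matrix)
    precisions = []
    for i in range(k):
        column_total = sum(matrix[j][i] for j in range(k))
        precisions.append(matrix[i][i] / column_total if column_total else 0.0)
    return sum(precisions) / k


def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision and recall: F = 2PR / (P + R)."""
    return 2 * p * r / (p + r) if p + r > 0 else 0.0


# Hypothetical k = 3 confusion matrix (rows = expected, columns = obtained).
cm = [
    [5, 0, 0],
    [1, 3, 1],
    [0, 2, 8],
]
R = macro_recall(cm)
P = macro_precision(cm)
print(f"R = {R:.3f}, P = {P:.3f}, F = {f_measure(P, R):.3f}")  # R = 0.800, P = 0.774, F = 0.787
```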

Generally, we use it when we have a two-cluster problem.

When the population is imbalanced (the clusters contain very different numbers of samples), accuracy is not a reliable measure.
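A small Python sketch shows why accuracy misleads on an imbalanced population. The two-cluster matrix is hypothetical: 95 samples truly belong to cluster 1 and 5 to cluster 2, and a trivial method places every sample in cluster 1.

```python
from typing import List

Matrix = List[List[int]]


def accuracy(matrix: Matrix) -> float:
    """Fraction of all samples on the diagonal, i.e. correctly placed."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def average_recall(matrix: Matrix) -> float:
    """Unweighted mean of n_ii / row total over all clusters (empty row -> 0.0)."""
    k = len(matrix)
    recalls = [matrix[i][i] / sum(matrix[i]) if sum(matrix[i]) else 0.0 for i in range(k)]
    return sum(recalls) / k


# Hypothetical imbalanced population: everything is placed in cluster 1.
cm = [
    [95, 0],
    [5, 0],
]
print(accuracy(cm))        # 0.95
print(average_recall(cm))  # 0.5
```

Accuracy looks excellent (0.95) even though cluster 2 is never recovered; the average recall of 0.5 exposes the failure.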

The F-measure is high only when both P and R are high, because it is their harmonic mean, which is pulled toward the smaller of the two.
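The effect of the harmonic mean can be seen by comparing it with the plain arithmetic mean for a few (P, R) pairs. This is a minimal Python sketch; the pairs are arbitrary illustrative values.

```python
def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision P and recall R: F = 2PR / (P + R)."""
    return 2 * p * r / (p + r) if p + r > 0 else 0.0


# Compare the F-measure with the arithmetic mean for a few (P, R) pairs.
for p, r in [(0.9, 0.9), (0.9, 0.1), (1.0, 0.0)]:
    arithmetic = (p + r) / 2
    print(f"P={p:.1f} R={r:.1f}  mean={arithmetic:.2f}  F={f_measure(p, r):.2f}")
```

For P = 0.9 and R = 0.1 the arithmetic mean is 0.50, but the F-measure is only 0.18.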

The F-measure (with P and R weighted equally) is also called the F1 score.

Note: These measures apply to classification, cluster recognition, and segmentation tasks, and are widely used in research projects.

The title of the next topic that I'll upload is "Techniques of dimensionality reduction".

Follow and subscribe to my blog.

Thank you for reading this article.