Try clustering the dataset into 38 clusters with k-means.

This is the result:

Cluster 0: 17382 items
Cluster 1: 1 items
Cluster 2: 1 items
Cluster 3: 1 items
Cluster 4: 79 items
Cluster 5: 1 items
Cluster 6: 223 items
Cluster 7: 3 items
Cluster 8: 1 items
Cluster 9: 166 items
Cluster 10: 1 items
Cluster 11: 1 items
Cluster 12: 1 items
Cluster 13: 5 items
Cluster 14: 2 items
Cluster 15: 2 items
Cluster 16: 1955 items
Cluster 17: 2 items
Cluster 18: 332 items
Cluster 19: 6 items
Cluster 20: 166 items
Cluster 21: 4 items
Cluster 22: 1 items
Cluster 23: 39 items
Cluster 24: 810 items
Cluster 25: 3 items
Cluster 26: 1 items
Cluster 27: 73 items
Cluster 28: 478 items
Cluster 29: 5 items
Cluster 30: 1 items
Cluster 31: 4 items
Cluster 32: 394 items
Cluster 33: 4 items
Cluster 34: 2 items
Cluster 35: 260 items
Cluster 36: 112 items
Cluster 37: 22 items
Total number of items: 22544
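
The distribution above is heavily skewed. A minimal Python sketch, using only the cluster sizes reported above, that measures how much of the data ends up in the largest cluster and how many clusters hold a single item (the variable names are illustrative, not part of the original process):

```python
# Cluster sizes copied from the k-means result above (clusters 0..37).
sizes = [17382, 1, 1, 1, 79, 1, 223, 3, 1, 166, 1, 1, 1, 5, 2, 2, 1955, 2,
         332, 6, 166, 4, 1, 39, 810, 3, 1, 73, 478, 5, 1, 4, 394, 4, 2,
         260, 112, 22]

total = sum(sizes)                             # should match the reported total
largest_share = max(sizes) / total             # fraction of items in the biggest cluster
singletons = sum(1 for s in sizes if s == 1)   # clusters holding exactly one item

print(f"total={total}, largest share={largest_share:.1%}, singletons={singletons}")
```

For this run, roughly 77% of the items fall into Cluster 0 and 11 of the 38 clusters contain a single item.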

Using Map Clustering on Labels

The answer and prediction
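
One common way to compare clusters with known labels is a majority vote: each cluster is given the most frequent true label among its members, and accuracy is the share of items whose cluster label matches their true label. The sketch below assumes that scheme; the Map Clustering on Labels operator may use a different mapping rule, and the function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict

def map_clusters_to_labels(cluster_ids, labels):
    """Assign each cluster the most frequent true label among its members."""
    members = defaultdict(Counter)
    for c, y in zip(cluster_ids, labels):
        members[c][y] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in members.items()}

def mapped_accuracy(cluster_ids, labels):
    """Fraction of items whose mapped cluster label equals the true label."""
    mapping = map_clusters_to_labels(cluster_ids, labels)
    hits = sum(mapping[c] == y for c, y in zip(cluster_ids, labels))
    return hits / len(labels)

# Toy data, made up for illustration (not taken from the experiment).
clusters = [0, 0, 0, 1, 1, 2]
truth = ["normal", "normal", "attack", "attack", "attack", "normal"]

print(map_clusters_to_labels(clusters, truth))
print(mapped_accuracy(clusters, truth))
```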

Change the k-means measure type to the default (BregmanDivergences).
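
Squared Euclidean distance is one example of a Bregman divergence. As a rough illustration of what k-means does with such a measure, here is a minimal pure-Python sketch of Lloyd's algorithm. It assumes squared Euclidean distance as the divergence and uses made-up 2-D points; it is not the implementation used in the experiment.

```python
import random

def squared_euclidean(a, b):
    """Squared Euclidean distance, a simple Bregman divergence."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)),
                key=lambda j: squared_euclidean(p, centroids[j]))
            for p in points]

def kmeans(points, k, iterations=20, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)      # pick k distinct points as initial centroids
    for _ in range(iterations):
        assignment = assign(points, centroids)
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:                    # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign(points, centroids), centroids

# Two well-separated toy blobs (illustrative data only).
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
assignment, centroids = kmeans(points, k=2)
print(assignment)
```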

Try clustering the dataset into 38 clusters with k-means (H2O).

Cluster 0: 4148 items
Cluster 1: 1 items
Cluster 2: 1 items
Cluster 3: 1 items
Cluster 4: 1 items
Cluster 5: 2 items
Cluster 6: 4 items
Cluster 7: 10 items
Cluster 8: 6 items
Cluster 9: 1 items
Cluster 10: 1 items
Cluster 11: 7 items
Cluster 12: 1 items
Cluster 13: 2222 items
Cluster 14: 1 items
Cluster 15: 1 items
Cluster 16: 4 items
Cluster 17: 57 items
Cluster 18: 267 items
Cluster 19: 1 items
Cluster 20: 4 items
Cluster 21: 4 items
Cluster 22: 1 items
Cluster 23: 3 items
Cluster 24: 6 items
Cluster 25: 1 items
Cluster 26: 800 items
Cluster 27: 4 items
Cluster 28: 2344 items
Cluster 29: 1 items
Cluster 30: 28 items
Cluster 31: 1 items
Cluster 32: 1 items
Cluster 33: 8 items
Cluster 34: 3 items
Cluster 35: 471 items
Cluster 36: 1 items
Cluster 37: 12126 items
Total number of items: 22544

Problems were found with X-Means, k-Medoids, DBSCAN, and Agglomerative Clustering: none of them could process the simple example set…

X-Means works when the label is changed to binominal (normal is true or false), but the result is very poor, as shown below.

Cluster 0: 22542 items
Cluster 1: 2 items
Total number of items: 22544
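
The label conversion mentioned above (normal is true or false) can be sketched in Python as follows. The function name and the example label values other than "normal" are hypothetical.

```python
def to_binominal(labels, positive="normal"):
    """Map each label to True if it equals `positive`, otherwise False."""
    return [label == positive for label in labels]

# Hypothetical label values, for illustration only.
labels = ["normal", "attack_a", "normal", "attack_b"]
print(to_binominal(labels))
```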

k-Medoids works when the label is changed to binominal (normal is true or false), the attributes are reduced, and the example set is reduced, but the result is still not good, as shown below.

Cluster 0: 95 items
Cluster 1: 4414 items
Total number of items: 4509

DBSCAN and Agglomerative Clustering do not let us define the expected number of clusters, so we did not try them.

The best result is chosen: k-Means (H2O) integrated with unsupervised feature selection.

Visualization of the best clustering pattern… (accuracy 61.9%)
