Data Science - K-Means and K-Medians

For more information, see this repository on my GitHub.

I completed this project on my own. In it, I implemented two popular clustering algorithms, K-Means and K-Medians, to cluster words drawn from four different files. I vary the value of k, apply l2 normalization before clustering, and evaluate the quality of each clustering using B-CUBED precision, recall, and F-score. The results are then visualized as plots. The implementation uses Python, NumPy, and Matplotlib.
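Both algorithms can share one loop that alternates an assignment step and an update step; only the distance and the centre update differ. The following is a minimal NumPy sketch of that idea, not the code in kmeans.py. The name `cluster`, its parameters, the random initialisation from data points, and the use of Manhattan distance with a coordinate-wise median for K-Medians are all assumptions made for illustration.

```python
import numpy as np


def cluster(X, k, method="means", max_iter=100, seed=0):
    """Hypothetical helper: K-Means or K-Medians via alternating assign/update steps.

    method="means":   squared Euclidean distance, centre = mean of the cluster
    method="medians": Manhattan (L1) distance, centre = coordinate-wise median
    Returns (labels, centres).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centres with k distinct data points chosen at random
    # (fancy indexing returns a copy, so X itself is never modified).
    centres = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        diff = X[:, None, :] - centres[None, :, :]    # shape (n, k, d)
        if method == "means":
            dist = (diff ** 2).sum(axis=2)             # squared Euclidean
        else:
            dist = np.abs(diff).sum(axis=2)            # Manhattan (L1)
        new_labels = dist.argmin(axis=1)               # assignment step
        if np.array_equal(new_labels, labels):
            break                                      # no change: converged
        labels = new_labels
        for j in range(k):                             # update step
            members = X[labels == j]
            if len(members) == 0:
                continue                               # empty cluster keeps its old centre
            if method == "means":
                centres[j] = members.mean(axis=0)
            else:
                centres[j] = np.median(members, axis=0)
    return labels, centres
```

For example, `cluster(np.array([[0.0], [1.0], [10.0], [11.0]]), k=2, method="medians")` puts 0 and 1 in one cluster and 10 and 11 in the other. Because the result depends on the random start, a real experiment would typically fix the seed or keep the best of several runs.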

Introduction to K-Means
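For reference, the standard K-Means objective is shown below (the notation $C_j$, $\mu_j$ is ours, not the repository's). It partitions the data into k clusters so as to minimise the within-cluster sum of squared Euclidean distances, and each centre is the mean of its cluster:

$$
\min_{C_1,\dots,C_k}\ \sum_{j=1}^{k}\sum_{x\in C_j}\lVert x-\mu_j\rVert_2^2,
\qquad
\mu_j=\frac{1}{|C_j|}\sum_{x\in C_j}x
$$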
Introduction to K-Medians
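K-Medians is commonly formulated with the Manhattan (L1) distance, for which a minimising centre is the coordinate-wise median of the cluster. Whether kmeans.py uses exactly this variant is an assumption:

$$
\min_{C_1,\dots,C_k}\ \sum_{j=1}^{k}\sum_{x\in C_j}\lVert x-m_j\rVert_1,
\qquad
m_{j,d}=\operatorname{median}\{\,x_d : x\in C_j\,\}
$$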
kmeans.py clusters words belonging to four files: animals, countries, fruits, and veggies.
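The file format is not described here. The sketch below assumes, hypothetically, that each line holds a word followed by its space-separated feature values. It uses the file a word comes from as that word's ground-truth category, which is what the B-CUBED evaluation needs. `load_categories` is an illustrative name, not a function from the repository.

```python
import numpy as np
from pathlib import Path


def load_categories(paths):
    """Hypothetical loader: the category label of each row is the index of its file.

    Assumes every non-blank line looks like: "word f1 f2 ... fd".
    """
    words, vectors, labels = [], [], []
    for label, path in enumerate(paths):
        for line in Path(path).read_text().splitlines():
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            words.append(parts[0])
            vectors.append([float(v) for v in parts[1:]])
            labels.append(label)
    return words, np.array(vectors), np.array(labels)
```

Usage, under the same assumption: `words, X, y = load_categories(["animals", "countries", "fruits", "veggies"])`.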
The standalone .py file includes:
- An implementation of the K-Means and K-Medians clustering algorithms
- Varying k from 1 to 9, computing the B-CUBED precision, recall, and F-score for each resulting set of clusters, and plotting the results
- Normalising each data object (vector) to unit l2 length before clustering, then re-running the algorithms and plotting the B-CUBED precision, recall, and F-score
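B-CUBED scores each item individually. An item's precision is the fraction of items in its cluster that share its category, and its recall is the fraction of items in its category that ended up in its cluster. Both are then averaged over all items. A minimal NumPy sketch follows. `bcubed` is a hypothetical name, and taking the harmonic mean of the averaged precision and recall as the F-score is one common convention (averaging per-item F-scores is another), so the repository may differ.

```python
import numpy as np


def bcubed(labels_true, labels_pred):
    """B-CUBED precision, recall and F-score averaged over all items.

    Builds n x n boolean matrices, so memory grows as O(n^2); fine for a few
    hundred words.
    """
    t = np.asarray(labels_true)
    p = np.asarray(labels_pred)
    same_cluster = p[:, None] == p[None, :]   # i and j are in the same cluster
    same_class = t[:, None] == t[None, :]     # i and j share a category
    # Each item counts itself, so every denominator is at least 1.
    both = (same_cluster & same_class).sum(axis=1)
    precision = np.mean(both / same_cluster.sum(axis=1))
    recall = np.mean(both / same_class.sum(axis=1))
    # Convention assumed here: harmonic mean of the averaged scores.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score
```

For example, with true categories `[0, 0, 1, 1]` and predicted clusters `[0, 0, 0, 1]`, precision is 2/3, recall is 0.75, and the F-score is 12/17 (about 0.706). Putting everything in one cluster (k = 1) always gives a recall of 1, which is why precision and recall are read together as k varies.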

You can see these demos:

Demo for K-Means:
Demo for K-Medians:
Varying k, and normalising each data object (vector) to unit l2 length before clustering, then comparing and analysing the results
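Normalising to unit l2 length divides each row by its Euclidean norm. A minimal sketch follows; `l2_normalise` and the small `eps` guard for all-zero rows are illustrative choices, not taken from the repository.

```python
import numpy as np


def l2_normalise(X, eps=1e-12):
    """Scale every row (data object) of X to unit L2 length."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, eps)  # guard: all-zero rows stay zero
```

One reason the comparison is interesting: for unit vectors $a$ and $b$, $\lVert a-b\rVert_2^2 = 2 - 2\cos(a,b)$, so after normalisation K-Means effectively groups words by the angle between their vectors rather than by their magnitude.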

Ziqi Han
