Data Science - K-Means and K-Medians
For more information, see the repository on my GitHub.
I completed this project on my own. I implemented two popular clustering algorithms, K-Means and K-Medians, to cluster words drawn from four different files. I vary the value of k, optionally normalise each vector to unit l2 length before clustering, and evaluate the quality of the clustering using B-CUBED precision, recall, and F-score. The results are then visualized in plots. The implementation uses Python, NumPy, and Matplotlib.
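As a rough illustration of how the two algorithms differ, here is a minimal sketch, not the code from the repository. The function name `cluster`, the random initialisation from data points, and the distance choices (squared Euclidean for K-Means, Manhattan for K-Medians) are my assumptions for this example.

```python
import numpy as np


def cluster(X, k, use_median=False, max_iter=100, seed=0):
    """Cluster the rows of X into k groups (illustrative sketch).

    use_median=False: K-Means   (squared Euclidean distance, mean update)
    use_median=True:  K-Medians (Manhattan distance, per-feature median update)
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: initialise the centres with k distinct data points.
    centres = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)

    for _ in range(max_iter):
        # Pairwise differences, shape (n_points, k, n_features).
        diff = X[:, None, :] - centres[None, :, :]
        if use_median:
            dist = np.abs(diff).sum(axis=2)   # Manhattan (l1) distance
        else:
            dist = (diff ** 2).sum(axis=2)    # squared Euclidean distance
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                             # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                continue                      # keep the old centre if a cluster is empty
            if use_median:
                centres[j] = np.median(members, axis=0)
            else:
                centres[j] = members.mean(axis=0)
    return labels, centres


if __name__ == "__main__":
    points = [[0, 0], [0, 1], [10, 10], [10, 11]]
    print(cluster(points, k=2)[0])
    print(cluster(points, k=2, use_median=True)[0])
```

The only differences between the two variants are the distance used in the assignment step and the statistic used in the update step, which is why a single function with a flag is enough for a sketch.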
- Introduction to K-Means
- kmeans.py clusters words belonging to four files: animals, countries, fruits and veggies.
- The standalone .py file includes:
- Implementations of the K-Means and K-Medians clustering algorithms
- Varying K from 1 to 9, calculating the B-CUBED precision, recall, and F-score for each set of clusters, and plotting the results
- Normalising each data object (vector) to unit l2 length before clustering, then re-running the algorithm and plotting the B-CUBED precision, recall, and F-score
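The normalisation and evaluation steps listed above can be sketched as follows. This is an illustrative example rather than the repository's code: the names `l2_normalise` and `bcubed` are hypothetical, and computing the F-score as the harmonic mean of the averaged precision and recall is an assumption (averaging a per-item F-score is another common convention).

```python
import numpy as np


def l2_normalise(X):
    """Scale each row of X to unit l2 length (all-zero rows are left as-is)."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero vectors
    return X / norms


def bcubed(labels, classes):
    """Return B-CUBED (precision, recall, F-score).

    labels:  cluster id assigned to each item
    classes: true category of each item (e.g. animals, countries, ...)
    """
    labels = np.asarray(labels)
    classes = np.asarray(classes)
    # Boolean n x n matrices: does item i share a cluster / a class with item j?
    same_cluster = labels[:, None] == labels[None, :]
    same_class = classes[:, None] == classes[None, :]
    # For each item: how many items share both its cluster and its class.
    correct = (same_cluster & same_class).sum(axis=1)
    precision = (correct / same_cluster.sum(axis=1)).mean()
    recall = (correct / same_class.sum(axis=1)).mean()
    # Assumption: F-score is the harmonic mean of the averaged values.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


if __name__ == "__main__":
    true_classes = ["animals", "animals", "fruits", "fruits"]
    print(bcubed([0, 0, 1, 1], true_classes))  # perfect clustering
    print(bcubed([0, 0, 0, 0], true_classes))  # everything in one cluster
    print(l2_normalise([[3, 4], [0, 0]]))
```

Putting every item in one cluster gives perfect recall but low precision, which is the kind of trade-off the plots over K are meant to show.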
The demos below show the results.
Demo for K-Means:

Demo for K-Medians:

Varying K and normalising each data object (vector) to unit l2 length before clustering, then comparing and analysing the results:

