K-means and Hierarchical Clustering of Top 50 Spotify songs: Raising awareness in the legal world


  1. Introduction

  2. Literature Review

  3. Methodology

  4. K-Means Clustering

  5. Hierarchical Clustering

  6. Comparison

Introduction


This paper aims to raise awareness of data science processes in the legal world. Finding a data science project in the legal field is a difficult task, especially if one wants to involve music data. When a researcher types different keywords about legal practice into search engines, the results that combine music, law and data science are mostly unrelated to one another. One can accept this challenge!

Some reasonable findings are not related to any side of the law (Liberatore, Quijano-Sánchez & Camacho-Collados, 2019; Martens, 2020; Coussement & Benoit, 2021). Legal practice and statistical methods do not seem to share common ground, so there is no need to elaborate on this point further. To give perspective to anyone who might be interested in the data science process, this article is dedicated to data science enthusiasts in the legal field.

To do so, Spotify's top 50 songs have been chosen. Besides a column that numbers the songs, the data set has 13 variables: 3 are qualitative and 10 are quantitative. The songs in this data set will be analyzed with K-means Clustering and Hierarchical Clustering. Clustering is applied because music has genres, which suggests that one cannot compare rock songs with rap songs. Or perhaps rock songs and rap songs can be compared after all. The clustering results will offer the readers of this article a perspective on that question.

The article proceeds as follows: Introduction, Literature Review, Methodology, K-means Clustering, Hierarchical Clustering, and a comparison of the two methods.

Literature Review

A researcher who uses the keywords below to narrow the research area might encounter some difficulties:

"Spotify" "arbitration" "k-means" "music" "hierarchical" "disputes" "litigation" "law" "legal" "data science"

Finding common ground between legal keywords, music (especially Spotify) and data science can be a gruelling journey. Here are a couple of example articles:

Chandrasekharan (2020) wrote his thesis to raise awareness of online abuse and its effect on people. He used machine learning methods to analyze his data, and one of the techniques he used is k-means clustering. Words related to music and law appear only a few times, and it is clear that this dissertation is not about the law or the music industry. However, when one searches Google Scholar for "Spotify", "arbitration" and "k-means" at the same time, only three articles contain all of those keywords.

In the next example, finding a relation between legal practice and hierarchical clustering is somewhat easier, but this time there is no music in it. The only music-related keyword appears in the references section, in an entry about hierarchical clustering and music recommendation. Zhang and Zhang (2021) studied judgment debtors in law enforcement cases. Judged by the keywords, this article is not a perfect match either: it was found by searching Google Scholar for "music", "hierarchical clustering" and "litigation", and it was chosen to show readers what those keywords return. Moreover, the word "litigation" appears only once in the article.

One thing must be grasped correctly: using different keywords gives the researcher a way to dig deep into everything she/he would like to learn. The researcher can search Google Scholar (or another academic platform) for "music", "hierarchical clustering", "data science", "Spotify" and "k-means clustering" and find far more articles, theses and dissertations than she/he needs, because academics write about those topics every day. What makes this research unique is that the scenario changes as soon as keywords related to legal practice are added. That is why this article was written: to find a combination of music, clustering methods and legal practice.

Methodology


The Spotify Top 50 most-listened songs of 2019 were selected as the data set. Beats Per Minute and Danceability are the clustering variables. The data set was found on Kaggle; thanks are due to the author who collected it. It has 50 rows and 13 variables, and the quantitative variables are listed below:

Beats Per Minute (BPM) - The tempo of the song.

Energy - The energy of a song: the higher the value, the more energetic the song.

Danceability - The higher the value, the easier it is to dance to this song.

Loudness (dB) - The higher the value, the louder the song.

Liveness - The higher the value, the more likely the song is a live recording.

Valence - The higher the value, the more positive mood for the song.

Length - The duration of the song.

Acousticness - The higher the value, the more acoustic the song is.

Speechiness - The higher the value, the more spoken word the song contains.

Popularity - The higher the value, the more popular the song is.

The data will be analyzed with both K-means Clustering and Hierarchical Clustering. Normalization will then be applied to see whether the different scales of the variables distort the results. Finally, both methods will be compared to assess which one is more suitable for this data set when Beats Per Minute and Danceability are used as the continuous variables.

K-means Clustering

The clustering explains 68.9% of the total variance. The cluster means are close to each other for Danceability, whereas the cluster means of BPM vary widely. One possible reason is that BPM can reach 200 while the Danceability score only reaches 100. Another reason is that the list covers the 50 most-listened songs in the world on Spotify, so the song categories or genres reflect listeners' choices. For example, it would not be fair to compare Electronic music (Hörschläger et al., 2015; Alspach, 2020) with Romantic music (Pérez Lloret et al., 2014) in terms of BPM. One suggestion here is that normalization could be applied (Dodge & Commenges, 2006), because the difference in scale is not limited to BPM and Danceability: some variables in this data set (Loudness (dB)) have scores below zero, and others (Length) have scores above 300.
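
The share of variance explained by a clustering is commonly computed as the between-cluster sum of squares divided by the total sum of squares. The following is a minimal sketch of that calculation, assuming a plain Lloyd-style K-means with two clusters. The (BPM, Danceability) pairs are hypothetical toy values, not rows from the Spotify data set, and the helper names (sq_dist, centroid, kmeans, explained_variance) are illustrative only.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k observations as starting centroids
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(centroid(members) if members else centroids[c])
        if updated == centroids:
            break  # converged: centroids no longer move
        centroids = updated
    return centroids, labels


def explained_variance(points, centroids, labels):
    """Between-cluster sum of squares divided by total sum of squares."""
    grand_mean = centroid(points)
    total_ss = sum(sq_dist(p, grand_mean) for p in points)
    within_ss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return (total_ss - within_ss) / total_ss


# Hypothetical (BPM, Danceability) pairs, for illustration only.
toy_songs = [
    (90, 80), (95, 75), (100, 78), (105, 70), (98, 72),
    (170, 60), (176, 65), (180, 67), (190, 62), (172, 70),
]

toy_centroids, toy_labels = kmeans(toy_songs, k=2)
ratio = explained_variance(toy_songs, toy_centroids, toy_labels)
print(f"explained variance: {ratio:.1%}")
```

In this toy example the split is driven almost entirely by BPM, because BPM spans a much wider range than Danceability. That is the scale effect that motivates normalization.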

The first observation is that 125 beats per minute is the dividing point between the clusters, which confirms the previous interpretation visually. Two observations stand out. One has 190 BPM, yet its Danceability score is only 40. The other has 85 BPM, yet its Danceability is 29. The first might be explained by its BPM: beyond a certain tempo, people might not want to dance. The second is harder to explain, because 29 Danceability is a bit strange even for a song with 85 BPM; people dance to romantic songs too. A related point is that these songs are in the Spotify top 50, so how come they have the lowest Danceability scores? The definition of Danceability might help here. The author of the data set extracted the data from OrganizeYourMusic, which describes Danceability as follows: the higher the value, the easier it is to dance to this song. This explanation does not help either, so the data set itself can be examined. The song with 190 BPM belongs to Ariana Grande, and its category is dance-pop. The other song belongs to The Chainsmokers, and its category is "EDM", which means electronic dance music. In other words, a song with 85 BPM and 29 Danceability was categorized as electronic dance music. Whoever categorized those songs deserves criticism. In fairness to the Danceability score, one should not expect people to dance to every song they listen to.

Since normalization is applied before performing Hierarchical Clustering, the same process is applied to K-means Clustering too. This step is important for a fair comparison with Hierarchical Clustering.

After normalization, the clustering explains 39.7% of the total variance.

The graph obtained from K-means Clustering is almost identical to the previous one. It looks like the scaling did not change the grouping shown in the graph.

Hierarchical Clustering

Before applying Hierarchical Clustering to this data set, the variables are normalized.

The centring values (scaled:center) are 120.06 for BPM and 71.38 for Danceability. The scaling values (scaled:scale) are 30.89839 for BPM and 11.92988 for Danceability.
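
To illustrate what the centre and scale values reported above do, here is a minimal sketch, assuming that normalization means subtracting the centre from each value and dividing by the scale (z-score standardization). The function name standardize is hypothetical. The example uses the song with 190 BPM and a Danceability of 40 that is discussed in the K-means section.

```python
def standardize(value, center, scale):
    """Return the z-score of a value given a centre and a scale."""
    return (value - center) / scale


# Centre and scale values as reported in the text.
BPM_CENTER, BPM_SCALE = 120.06, 30.89839
DANCE_CENTER, DANCE_SCALE = 71.38, 11.92988

# The song with 190 BPM and Danceability 40.
z_bpm = standardize(190, BPM_CENTER, BPM_SCALE)
z_dance = standardize(40, DANCE_CENTER, DANCE_SCALE)

print(round(z_bpm, 2), round(z_dance, 2))  # prints: 2.26 -2.63
```

After standardization both variables are expressed in the same units (distance from the centre, measured in scale units), so neither dominates the distance calculation only because of its range.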

With Single Linkage it is confusing how the observations are clustered, and the same can be said for Average Linkage. Complete Linkage, on the other hand, seems a better fit than the other two. To clarify this comment, the song numbers and their BPM and Danceability scores need to be examined. It looks like the left side of the dendrogram is driven by Danceability and the right side by BPM. On the left side, number 36 is Martin Garrix (Danceability 66), number 21 is Martin Garrix (Danceability 66), number 26 is Shawn Mendes (Danceability 69), number 10 is Billie Eilish (Danceability 70), and number 25 is Billie Eilish (Danceability 67). On the right side, number 3 is Ariana Grande (190 BPM), number 14 is Sech (176 BPM), number 7 is Lil Tecca (180 BPM), number 17 is J Balvin (176 BPM), and number 37 is Sech (176 BPM).

So, Complete Linkage divides the songs according to some of the highest Danceability and BPM scores. On the left side there is an interesting pattern: songs by the same artist sit side by side, as with the two songs by Martin Garrix and the two by Billie Eilish. How did Hierarchical Clustering recognize those artists' work without any other information about them? Could that be a coincidence? On the right side, the accuracy of Hierarchical Clustering is at its peak. At the very least, it can be stated that Hierarchical Clustering ordered the songs almost by their BPM.

The examination started from the left side: songs 36, 21, 26, 10 and 25 are on the Danceability side, and songs 3, 14, 7, 17 and 37 are on the BPM side.

Cutting Tree for 2 Clusters

Complete Linkage

Cluster 1 contains 28 songs and Cluster 2 contains 22.

Average Linkage

Cluster 1 contains 49 songs and Cluster 2 contains 1.

Single Linkage

Cluster 1 contains 49 songs and Cluster 2 contains 1.

The interpretation given above is confirmed when the trees are cut: Complete Linkage separates the observations according to their highest scores, while Average Linkage and Single Linkage put almost every song into one cluster. To obtain a more informative answer, 4 clusters are examined next.
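
Cutting a tree into k clusters is equivalent to running the agglomerative merges until only k clusters remain. The sketch below illustrates this for the three linkages, assuming Euclidean distance. The function names (linkage_distance, cut_tree) and the toy points are hypothetical; the points are spaced along one axis so the merges can be followed by hand, and they are not taken from the Spotify data set.

```python
from itertools import combinations
from math import dist


def linkage_distance(a, b, method):
    """Distance between clusters a and b (lists of points) under a linkage."""
    pairwise = [dist(p, q) for p in a for q in b]
    if method == "single":      # closest pair of members
        return min(pairwise)
    if method == "complete":    # farthest pair of members
        return max(pairwise)
    if method == "average":     # mean over all pairs of members
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage: {method}")


def cut_tree(points, k, method):
    """Merge the two closest clusters until k remain; return sizes, largest first."""
    clusters = [[p] for p in points]  # every observation starts on its own
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage_distance(clusters[ij[0]], clusters[ij[1]], method),
        )
        clusters[i] += clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return sorted((len(c) for c in clusters), reverse=True)


# Hypothetical standardized observations with steadily growing gaps.
toy_points = [(0.0, 0.0), (0.4, 0.0), (0.9, 0.0), (1.5, 0.0),
              (2.2, 0.0), (3.0, 0.0), (4.0, 0.0)]

print(cut_tree(toy_points, 2, "single"))    # prints: [6, 1]
print(cut_tree(toy_points, 2, "complete"))  # prints: [4, 3]
```

Single linkage chains neighbouring points together and leaves the most distant point on its own, which resembles the 49 and 1 split reported above. Complete linkage favours compact groups and gives a more balanced split.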

Cutting Tree for 4 Clusters

Complete Linkage

Cluster 1 contains 28 songs, Cluster 2 contains 11, Cluster 3 contains 10, and Cluster 4 contains 1.

Average Linkage

Cluster 1 contains 30 songs, Cluster 2 contains 1, Cluster 3 contains 16, and Cluster 4 contains 3.

Single Linkage

Cluster 1 contains 47 songs, Cluster 2 contains 1, Cluster 3 contains 1, and Cluster 4 contains 1.

To make the tree-cutting output above easier to grasp, the Complete, Average and Single Linkage dendrograms are drawn for 4 clusters.

After a few tries, it is clear that Complete Linkage separates the observations according to their highest scores. Finding a common logic in how Average Linkage and Single Linkage categorize the songs, on the other hand, is more than difficult. Complete Linkage also separates the observations more evenly than the other two.

Comparison


In terms of reading the observations from the graphics, Hierarchical Clustering has advantages over K-means Clustering. The most important advantage is that the dendrogram shows the song numbers, which opens many paths for the analyst. For example, finding Martin Garrix (21 and 36) and Billie Eilish (10 and 25) next to each other in Hierarchical Clustering can clarify many details. Another example is that Hierarchical Clustering starts from the higher values of BPM and Danceability. A related critique can be raised here: some people may argue that starting from the highest scores is not the expected approach. Speaking of the highest scores, K-means only divided this data set into two parts, and this division makes interpretation difficult. For that reason, Hierarchical Clustering supports more accurate interpretation. In fairness to both K-means Clustering and Hierarchical Clustering, Long, Hu and Jin (2021) conclude their article with these words: “the higher the energy, valance, tempo, count, and liveness indexes are, the more the song fits the characteristics of the Pop/Rock genre and the more popular it is by loyal fans in the field.” By contrast, only two variables were used in this study, so K-means Clustering and Hierarchical Clustering might have needed more indexes to cluster these songs.

References


  1. Alspach, G. (2020). Electronic Music Subgenres for Music Providers.

  2. Chandrasekharan, E. (2020). Combatting abusive behavior in online communities using cross-community learning (Doctoral dissertation, Georgia Institute of Technology).

  3. Coussement, K., & Benoit, D. F. (2021). Interpretable data science for decision making. Decision Support Systems, 150, 113664.

  4. Dodge, Y., & Commenges, D. (Eds.). (2006). The Oxford dictionary of statistical terms. Oxford University Press.

  5. Hörschläger, F., Vogl, R., Böck, S., & Knees, P. (2015). Addressing tempo estimation octave errors in electronic music by incorporating style information extracted from Wikipedia. In Proceedings of the Sound and Music Computing Conference (SMC), Maynooth, Ireland.

  6. Liberatore, F., Quijano-Sánchez, L., & Camacho-Collados, M. (2019). Applications of Data Science in Policing. European Law Enforcement Research Bulletin, (4 SCE), 89-96.

  7. Long, M., Hu, L., & Jin, F. (2021, March). Analysis of Main Characteristics of Music Genre Based on PCA Algorithm. In 2021 IEEE 2nd International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering (ICBAIE) (pp. 101-105). IEEE.

  8. Martens, D. (2020). FAT Flow: A data science ethics framework. University of Antwerp, Faculty of Business and Economics.

  9. Pérez Lloret, S., Diez, J. J., Domé, M. N., Alvarez Delvenne, A., Braidot, N., Cardinali, D. P., & Vigo, D. E. (2014). Effects of different "relaxing" music styles on the autonomic nervous system.

  10. Zhang, H., & Zhang, Z. (2021). Characteristic Analysis of Judgment Debtors Based on Hesitant Fuzzy Linguistic Clustering Method. IEEE Access, 9, 119147-119157.