Trend Analysis Review: Structure & Insights
1. Spotify Integration
In this project, I selected playlists from well-known Spotify creators, focusing on their relevance to trending music and social media engagement. This process yielded a collection of 866 tracks. The playlists were chosen not only for their popularity but also for their relevance to content creation, ensuring they reflect the kind of music that resonates with listeners across platforms. To keep the data accurate, I regularly update the popularity metrics to capture real-time changes in trends.
The track data was pulled in JSON format from Spotify's API, which provides detailed information such as popularity, energy, and tempo. These attributes are key for understanding how music trends evolve and what kinds of songs are gaining traction among listeners.
Spotify Playlists
- **Today's Top Hits**: Discover the hottest 50 tracks currently making waves. Stay updated with the freshest sounds that everyone is enjoying! Created by Spotify.
- **Viral Hits**: Dive into the viral hits that are trending and taking off right now. This playlist features the hottest tracks that everyone is talking about! Created by Spotify.
- **Best of TikTok 2019-2024**: Explore popular songs and trending music from TikTok, featuring top tracks from 2019 to 2023. The ultimate TikTok hits and charts in one playlist! Created by The Vibe Guide.
- **Instagram Reels Top Trending 2024**: The hottest and trending songs from Instagram Reels, all in one playlist. Stay updated with the latest Insta trends. Created by FeelQ Recordings.
- **TikTok Monthly 2024**: The top TikTok hits of 2024, including viral hits and the latest trends. Get ready to dance to the best TikTok trends and edits of the year! Created by partyfiesta!.
- **Instagram Songs 2024**: The most popular Instagram songs of 2024, featuring the hottest trends and viral tracks dominating the platform. Created by FeelQ Recordings.
- **Big on the Internet**: A collection that speaks for itself: if you know, you know! Created by Spotify.
Sample Spotify JSON
```json
{
  "album": {
    "name": "Album Name",
    "external_urls": { "spotify": "https://open.spotify.com/album/2up3OPMp9Tb4dAKM2erWXQ" },
    "images": [
      { "url": "https://i.scdn.co/image/ab67616d00001e02ff9ca10b55ce82ae553c8228", "height": 300, "width": 300 }
    ]
  },
  "artists": [
    { "name": "Artist Name", "external_urls": { "spotify": "https://open.spotify.com/artist/artist_id" } }
  ],
  "name": "Track Name",
  "popularity": 85,
  "external_urls": { "spotify": "https://open.spotify.com/track/track_id" },
  ...
}
```
```json
{
  "danceability": 0.8,
  "energy": 0.7,
  "key": 5,
  "loudness": -5.0,
  "mode": 1,
  "speechiness": 0.05,
  "acousticness": 0.1,
  "instrumentalness": 0.0,
  "liveness": 0.1,
  "valence": 0.6,
  "tempo": 120.0,
  "type": "audio_features",
  "id": "track_id",
  "uri": "spotify:track:track_id",
  "track_href": "https://api.spotify.com/v1/tracks/track_id",
  "analysis_url": "https://api.spotify.com/v1/audio-analysis/track_id",
  "duration_ms": 180000,
  "time_signature": 4
}
```
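A minimal sketch of how responses of this shape can be flattened into one record ready for database insertion. The field names mirror the Track model shown further down; the merging logic here is illustrative, not the project's actual `spotify_insertion.py`:

```python
import json

# Trimmed examples of the two Spotify API payloads used above.
track_json = """{
  "id": "track_id",
  "name": "Track Name",
  "popularity": 85,
  "album": {"name": "Album Name"},
  "artists": [{"name": "Artist Name"}],
  "external_urls": {"spotify": "https://open.spotify.com/track/track_id"}
}"""

features_json = """{
  "danceability": 0.8, "energy": 0.7, "valence": 0.6, "tempo": 120.0,
  "speechiness": 0.05, "acousticness": 0.1, "instrumentalness": 0.0,
  "liveness": 0.1
}"""

def flatten(track: dict, features: dict) -> dict:
    """Merge a track object and its audio features into one flat record."""
    return {
        "spotify_id": track["id"],
        "spotify_url": track["external_urls"]["spotify"],
        "name": track["name"],
        "album": track["album"]["name"],
        "artist": track["artists"][0]["name"],  # first listed artist only
        "popularity": track["popularity"],
        **features,
    }

record = flatten(json.loads(track_json), json.loads(features_json))
print(record["name"], record["tempo"])  # Track Name 120.0
```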
I use scripts like `spotify_api.py` and `spotify_insertion.py` to extract the JSON data shown above and import it into my Django models, which are stored in a MySQL database. This setup allows for easy data management and supports in-depth trend analysis based on the collected musical attributes. Additionally, I've set up scheduled tasks to run these scripts and others automatically, so the data stays up to date without manual intervention.
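The scheduled refresh could be wired up with something as simple as cron. The paths, virtualenv location, and cadence below are illustrative placeholders, not the project's actual configuration:

```shell
# Refresh track data every six hours; re-run the insertion step nightly.
# Assumes the scripts live in the project root with a virtualenv at .venv.
0 */6 * * *  cd /srv/trend-analysis && .venv/bin/python spotify_api.py       >> logs/spotify_api.log 2>&1
30 2 * * *   cd /srv/trend-analysis && .venv/bin/python spotify_insertion.py >> logs/insertion.log   2>&1
```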
Sample Track Model
Spotify ID | Spotify URL | Name | Album | Artist | Popularity | Danceability | Energy | Tempo | Valence | Speechiness | Acousticness | Instrumentalness | Liveness | Updated At |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3L95m6wi0vkhR9DB7GSSp9 | https://open.spotify.com/track/3L95m6wi0vkhR9DB7GSSp9 | Lobster | Lobster | RJ Pasin | 3 | 0.562 | 0.453 | 114.912 | 0.179 | 0.0396 | 0.0684 | 0.223 | 0.235 | 2024-11-26 19:03:27 |
Sample TrackFeatures Model
Track | Current Popularity | Velocity | Median Popularity | Mean Popularity | Standard Deviation Popularity | Trend | Retrieval Frequency | Updated At | RF Prediction | HGB Prediction | LR Prediction | SVM Prediction | LDA Prediction | ET Prediction | KNN Prediction | Predicted Trend |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Lobster | 3 | 0.0 | 5.0 | 5.586206896551724 | 1.7911129535476626 | stable | low | 2024-11-26 19:03:27 | stable | stable | stable | stable | stable | stable | stable | stable |
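The Predicted Trend column combines the seven per-model predictions into one label. The actual aggregation rule isn't shown in the table, but a simple majority vote, sketched below, would reproduce the sample row:

```python
from collections import Counter

def combine_predictions(predictions: list[str]) -> str:
    """Return the label predicted by the most models (ties: first seen wins)."""
    return Counter(predictions).most_common(1)[0][0]

# Per-model labels in the order RF, HGB, LR, SVM, LDA, ET, KNN (hypothetical mix).
votes = ["stable", "stable", "rising", "stable", "stable", "falling", "stable"]
print(combine_predictions(votes))  # stable
```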
Sample Popularity History Model
Date | Popularity |
---|---|
2024-11-26 | 3 |
2024-11-25 | 3 |
2024-11-24 | 3 |
2024-11-23 | 3 |
2. Features & Calculations
This project leverages two types of features: Spotify attributes and calculated variables. The Spotify attributes are retrieved directly from the Spotify API, providing data points like tempo, energy, and popularity. After initial analysis of these attributes, valence, tempo, speechiness, danceability, and liveness were identified as the most relevant features for trend analysis. By combining these key attributes with calculated variables such as velocity and trend, we can effectively assess the importance of each feature in the model selection process.
Calculations
The following derived values are computed for each track:

- Velocity
- Median Popularity
- Mean Popularity
- Standard Deviation Popularity
- Retrieval Frequency
- Trend
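A sketch of how these calculations could be implemented over a track's popularity history. The exact definitions of velocity, the trend thresholds, and the retrieval-frequency cut-offs are assumptions for illustration; the summary statistics use the standard definitions:

```python
import statistics

def velocity(history: list[int]) -> float:
    """Change between the two most recent popularity readings.
    (Assumed definition; history is ordered oldest -> newest.)"""
    if len(history) < 2:
        return 0.0
    return float(history[-1] - history[-2])

def summary_stats(history: list[int]) -> dict:
    """Median, mean, and standard deviation of the popularity history."""
    return {
        "median_popularity": statistics.median(history),
        "mean_popularity": statistics.mean(history),
        "std_popularity": statistics.stdev(history) if len(history) > 1 else 0.0,
    }

def trend(v: float, threshold: float = 1.0) -> str:
    """Label the direction of movement (threshold is illustrative)."""
    if v > threshold:
        return "rising"
    if v < -threshold:
        return "falling"
    return "stable"

def retrieval_frequency(v: float) -> str:
    """Fast-moving tracks get polled more often (cut-offs are illustrative)."""
    return "high" if abs(v) >= 5 else "medium" if abs(v) >= 1 else "low"

history = [3, 3, 3, 3]  # matches the sample popularity history above
v = velocity(history)
print(v, trend(v), retrieval_frequency(v))  # 0.0 stable low
```

With this flat history the outputs match the sample TrackFeatures row for "Lobster": velocity 0.0, trend "stable", retrieval frequency "low".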
Feature Importance
To determine the significance of each feature in predicting trends, I calculate feature importance using various machine learning algorithms. These include RandomForest, Extra Trees Classifier, HistGradientBoosting, and others. After training the models, I evaluate which features have the greatest impact on the predictions. The combined insights from both Spotify attributes and calculated variables allow us to focus on the most predictive features. The models being utilized will be discussed in further detail in the next section:
- RandomForest
- HistGradientBoosting
- LogisticRegression
- SVM
- LDA
- ExtraTrees
- KNN
Example of Stored Feature Importance
Feature | Importance |
---|---|
track__energy | -0.0011560693641619046 |
track__valence | -0.0008670520231214285 |
track__tempo | 0.0 |
track__speechiness | 0.0 |
track__danceability | -0.0007225433526011904 |
track__liveness | 0.0 |
velocity | 0.03988439306358379 |
current_popularity | -0.0005780346820809523 |
median_popularity | 0.0 |
mean_popularity | 0.0 |
std_popularity | 0.001011560693641589 |
retrieval_frequency | 0.05982658959537569 |
The above values are saved along with the model data to ensure that we can trace which features played the most critical role in the predictions.
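The permutation-importance idea behind these values can be shown without any ML library: shuffle one feature column, re-score the model, and record the accuracy drop. Near-zero or slightly negative values, like those for `track__tempo` or `track__energy` above, mean the model barely uses that feature. The toy model and data below are invented for illustration:

```python
import random

def accuracy(model, X, y):
    """Fraction of rows the model labels correctly."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, feature_idx, n_repeats=10, seed=0):
    """Mean accuracy drop after shuffling one feature column."""
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    drops = []
    for _ in range(n_repeats):
        column = [row[feature_idx] for row in X]
        rng.shuffle(column)
        X_perm = [row[:feature_idx] + [v] + row[feature_idx + 1:]
                  for row, v in zip(X, column)]
        drops.append(base - accuracy(model, X_perm, y))
    return sum(drops) / n_repeats

# Toy model: predicts "rising" when feature 0 (say, velocity) is positive,
# and ignores feature 1 entirely.
model = lambda row: "rising" if row[0] > 0 else "stable"
X = [[2, 0.5], [-1, 0.9], [3, 0.1], [-2, 0.7]]
y = ["rising", "stable", "rising", "stable"]

print(permutation_importance(model, X, y, 0))  # positive: feature 0 matters
print(permutation_importance(model, X, y, 1))  # 0.0: the model ignores it
```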
3. Model Selection and Breakdown
This section provides a comprehensive overview of the models employed in the trend analysis, along with a detailed breakdown of each model's performance, strengths, and weaknesses. We will analyze the effectiveness of various algorithms in capturing the nuances of musical trends and making predictions based on historical data.
Random Forest
Strengths: RandomForest is a versatile and robust model known for its ability to handle both classification and regression problems with high accuracy. It can automatically handle missing data and capture complex relationships between variables.
Weaknesses: One of the downsides is that it can be slower to train, especially with large datasets, and might not perform as well as other algorithms for high-dimensional data.
HistGradientBoosting
Strengths: This algorithm is particularly strong in handling large datasets and complex data relationships, often outperforming RandomForest by refining predictive power through gradient boosting.
Weaknesses: Gradient boosting can be prone to overfitting, especially if hyperparameters are not tuned properly. It also requires more training time.
Logistic Regression
Strengths: LogisticRegression is a fast, simple, and interpretable model, especially useful when the relationship between the features and the target variable is approximately linear. It works well with smaller datasets and binary classification tasks.
Weaknesses: Its simplicity limits its performance with non-linear data, and it may underperform if the features are not well-scaled.
SVM (Support Vector Machine)
Strengths: SVM is highly effective in high-dimensional spaces and can handle both classification and regression. It is powerful for datasets with clear margin separation.
Weaknesses: SVM can be computationally intensive, especially with larger datasets, and performance can vary significantly based on the choice of kernel.
LDA (Linear Discriminant Analysis)
Strengths: LDA is an excellent model for classification, particularly when the data follows a Gaussian distribution. It reduces the dimensionality while maximizing class separability.
Weaknesses: LDA assumes linear separability and equal covariance matrices across classes, which limits its performance on non-linear or complex datasets.
Extra Trees Classifier
Strengths: Similar to RandomForest, Extra Trees is efficient for large datasets and often performs better by introducing randomness in feature selection and splitting.
Weaknesses: It shares similar downsides with RandomForest, such as sensitivity to noisy data and increased computational time.
K-Nearest Neighbors (KNN)
Strengths: KNN is intuitive, simple, and effective, particularly for smaller datasets. It can capture local patterns and relationships in the data.
Weaknesses: It can struggle with larger datasets, as it is computationally expensive, and performance can degrade with irrelevant or redundant features.
Technical Insights
This section explains key technical components used in the project:

- Data Preprocessing: The dataset is processed using `SimpleImputer` for handling missing values and `StandardScaler` for scaling numerical features, ensuring models like LogisticRegression, SVM, and KNN perform optimally.
- Model Training: We utilize `GridSearchCV` to fine-tune hyperparameters for each model. For instance, the parameter grids used for RandomForest and HistGradientBoosting include varying estimators, learning rates, and depth configurations to ensure the best model fit for the data.
- Feature Importance: For models supporting feature importance (e.g., RandomForest, Extra Trees), we extract the importance values directly from the model attributes. For others, such as SVM and HistGradientBoosting, permutation importance is calculated using `permutation_importance` from scikit-learn to assess feature relevance.
- Model Saving: The `joblib` library is used to serialize and save trained models, imputation strategies, and feature names for reproducibility.
- Performance Metrics: The evaluation includes accuracy, classification reports, and confusion matrices. These metrics allow us to assess performance and identify areas for model improvement.
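The preprocessing and tuning flow described in these bullets can be condensed into a single scikit-learn pipeline. The data, parameter grid values, and class labels below are made up for illustration; only the component names (`SimpleImputer`, `StandardScaler`, `GridSearchCV`, `joblib`) come from the project description:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Tiny synthetic stand-in for (danceability, energy, tempo, valence) rows;
# one row has a missing value to exercise the imputer.
X = np.array([[0.56, 0.45, 114.9, 0.18],
              [0.73, 0.80, 128.0, 0.65],
              [0.41, np.nan, 95.0, 0.30],
              [0.62, 0.55, 120.0, 0.50]] * 5)
y = np.array(["stable", "rising", "falling", "stable"] * 5)

pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),  # fill missing audio features
    ("scaler", StandardScaler()),                 # scaling for LR / SVM / KNN
    ("model", RandomForestClassifier(random_state=0)),
])

# Illustrative grid; the project's actual grids vary estimators,
# learning rates, and depths per model.
param_grid = {"model__n_estimators": [50, 100], "model__max_depth": [None, 5]}
search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(X, y)
print(search.best_params_)

# The winning estimator can then be persisted for later reuse:
# import joblib; joblib.dump(search.best_estimator_, "model.joblib")
```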