Music Classifiers and how UMTN processes data

Alicia and James

We make our own music classifier with librosa and learn about formatting musical data into tensors we can work with.

Music Classifiers

Jupyter Notebook

To format the data for this classifier:

Download the Musicnet library
Put the folder titled “musicnet” in the music-translation folder
Create a folder titled “music_classification_data” in the “musicnet” folder
Create two folders in “music_classification_data” titled “test” and “train”
In the “test” and “train” folders, put folders with the wav files in each such that the folders are titled with the labels for the wav files (e.g. “Beethoven_Accompanied_Violin”)

Access the Jupyter Notebook here

Helpful Resources for inspiration:

Pytorch Audio Classifier Tutorial

Uses the UrbanSound8K dataset (contains 10 audio classes with over 8000 audio samples, classes such as air_conditioner, car_horn, children_playing, dog_bark, drilling, enginge_idling, gun_shot, jackhammer, siren, and street_music)
Data is formatted using torchaudio.
CNN, modeled after the M5 network architecture

Music Genre Classification with Python

Basic guide on processing music with librosa
Has basic Jupyter notebook on classifying music
Uses the GTZAN music dataset (consists of 1000 audio tracks each 30 seconds long. It contains 10 genres namely, blues, classical, country, disco, hiphop, jazz, reggae, rock, metal and pop. Each genre consists of 100 sound clips)

Musical Genre Classification, Github Project by Dohppak

Uses the GTZAN music dataset
Processes music with librosa
4 layer CNN, validation accuracy is 77%, test accuracy of this model is 83.39%

Music genre classification with LSTMs in Keras & PyTorch, Github Project by Ruohoruotsi

Uses the GTZAN music dataset
Processes music with librosa
Multiple layers of LSTM Recurrent Neural Nets

UMTN Data Notes

Use librosa for reading in audio files
Also uses h5py, which is a Python package that lets you store huge amounts of numerical data and manipulate it from Numpy

self.data = [DatasetSet(d, args.seq_len, args) for d in args.data]

This line of code is pulled from train.py and is basically how they pass in most data (args.data is passed in through the data parameters in the shell script train.sh)

DatasetSet is located in data.py. Its subclasses such as H5Dataset help take the program take in .wav and .h5 files, process/sample them depending on specific parameters, and then return a dataset. More about manipulating/pulling data from datasets can be found here.

Written on March 20, 2020