Before starting any deep learning project with MIDI files, make sure you know the difference between MIDI scores and MIDI performances!
This article is for people planning or beginning to work with MIDI files. This format is widely used in the music community, and it caught the attention of computer music researchers thanks to the availability of datasets.
However, different kinds of information can be encoded in MIDI files. In particular, there is a big difference between MIDI scores and MIDI performances. Not being aware of this can result in time wasted on a useless task or in an incorrect choice of training data and approaches.
I will provide a basic introduction to the two formats and give hands-on examples of how to start working with them in Python.
What is MIDI?
MIDI was introduced as a real-time communication protocol between synthesizers. The main idea is to send a message every time a note is pressed (note on) on a MIDI keyboard, and another message when the note is released (note off). The synthesizer on the receiving end then knows what sound to produce.
Welcome to MIDI files!
If we collect and save all these messages (making sure to add their time position), then we have a MIDI file that we can use to reproduce a piece. Besides note-on and note-off, many other kinds of messages exist, for example specifying pedal information or other controllers.
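As a rough sketch of what "collecting the messages" means, the snippet below pairs note-on and note-off messages into notes. The message tuples are hypothetical stand-ins for illustration, not a real MIDI parser (libraries such as mido or Partitura handle the actual encoding):

```python
# Minimal sketch: pair note-on / note-off messages into notes.
# Messages are hypothetical (time_in_seconds, type, pitch, velocity) tuples.

def pair_messages(messages):
    """Turn a time-ordered stream of note_on/note_off messages into
    (onset, offset, pitch, velocity) notes."""
    open_notes = {}  # pitch -> (onset_time, velocity)
    notes = []
    for time, msg_type, pitch, velocity in messages:
        # a note_on with velocity 0 is conventionally treated as a note_off
        if msg_type == "note_on" and velocity > 0:
            open_notes[pitch] = (time, velocity)
        elif pitch in open_notes:
            onset, vel = open_notes.pop(pitch)
            notes.append((onset, time, pitch, vel))
    return notes

msgs = [
    (0.00, "note_on", 60, 80),
    (0.50, "note_on", 64, 70),
    (0.90, "note_off", 60, 0),
    (1.20, "note_off", 64, 0),
]
print(pair_messages(msgs))
# -> [(0.0, 0.9, 60, 80), (0.5, 1.2, 64, 70)]
```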
You can think of plotting this information as a pianoroll.
Beware: this is not a MIDI file, but only a possible representation of its content! Some software (in this example, Reaper) adds a small piano keyboard next to the pianoroll to make it easier to interpret visually.
How is a MIDI file created?
A MIDI file can be created mainly in two ways: 1) by playing on a MIDI instrument, 2) by manually writing into a sequencer (Reaper, Cubase, GarageBand, Logic) or a musical score editor (for example MuseScore).
Each way of producing MIDI files also yields a different kind of file:
- playing on a MIDI instrument → MIDI performance
- manually writing the notes (sequencer or musical score editor) → MIDI score
We will now dive into each type, and then summarize their differences.
Before starting, a disclaimer: I will not focus specifically on how the information is encoded, but on what information can be extracted from the file. For example, when I say "time is represented in seconds", it means that we can get seconds, even though the encoding itself is more complex.
MIDI performances
We can find four kinds of information in a MIDI performance:
- When the note starts: note onset
- When the note ends: note offset (or note duration, computed as offset − onset)
- Which note was played: note pitch
- How "hard" the key was pressed: note velocity
Note onsets and offsets (and durations) are represented in seconds, corresponding to when the notes were pressed and released by the person playing the MIDI instrument.
Note pitch is encoded as an integer from 0 (lowest) to 127 (highest); note that more notes can be represented than can be played on a piano; the piano range corresponds to 21–108.
Note velocity is also encoded as an integer from 0 (silence) to 127 (maximum intensity).
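To make the pitch encoding concrete, here is a small helper (my own, not from any library) that converts a MIDI pitch number into a note name, using the common convention that middle C (pitch 60) is C4:

```python
# Convert a MIDI pitch number (0-127) to a note name, using the common
# convention where middle C (pitch 60) is called C4.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_to_name(pitch):
    octave = pitch // 12 - 1
    return f"{NOTE_NAMES[pitch % 12]}{octave}"

print(pitch_to_name(21))   # lowest piano key -> A0
print(pitch_to_name(60))   # middle C -> C4
print(pitch_to_name(108))  # highest piano key -> C8
```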
The vast majority of MIDI performances are piano performances, because most MIDI instruments are MIDI keyboards. Other MIDI instruments (for example MIDI saxophones, MIDI drums, and MIDI sensors for guitar) exist, but they are not as widespread.
The biggest dataset of human MIDI performances (classical piano music) is the Maestro dataset by Google Magenta.
The main property of MIDI performances
A fundamental characteristic of MIDI performances is that no two notes ever have exactly the same onset or duration (this is, in theory, possible but, in practice, extremely unlikely).
Indeed, even if they really try, players will not be able to press two (or more) notes at exactly the same time, since there is a limit to the precision humans can achieve. The same is true for note durations. Moreover, this is not even a priority for most musicians, since timing deviations can help produce a more expressive or groovy feeling. Finally, consecutive notes may have some silence in between, or may partially overlap.
For this reason, MIDI performances are sometimes also called unquantized MIDI. Temporal positions lie on a continuous time scale and are not quantized to discrete positions (for digital encoding reasons, it is technically a discrete scale, but an extremely fine one, so we can consider it continuous).
Hands-on example
Let's look at a MIDI performance. We will use the ASAP dataset, available on GitHub.
In your favorite terminal (I'm using PowerShell on Windows), go to a convenient location and clone the repository.
git clone https://github.com/fosfrancesco/asap-dataset
We will also use the Python library Partitura to open the MIDI files, so install it in your Python environment.
pip install partitura
Now that everything is set up, let's open the MIDI file and print the first 10 notes. Since this is a MIDI performance, we will use the load_performance_midi function.
from pathlib import Path
import partitura as pt

# set the path to the asap dataset (change it to your local path!)
asap_basepath = Path('../asap-dataset/')
# select a performance; here we use Bach's Prelude BWV 848 in C#
performance_path = Path("Bach/Prelude/bwv_848/Denisova06M.mid")
print("Loading midi file: ", asap_basepath/performance_path)
# load the performance
performance = pt.load_performance_midi(asap_basepath/performance_path)
# extract the note array
note_array = performance.note_array()
# print the dtype of the note array (helpful to understand how to interpret it)
print("Numpy dtype:")
print(note_array.dtype)
# print the first 10 notes in the note array
print("First 10 notes:")
print(note_array[:10])
The output of this Python program should look like this:
Numpy dtype:
[('onset_sec', '<f4'), ('duration_sec', '<f4'), ('onset_tick', '<i4'), ('duration_tick', '<i4'), ('pitch', '<i4'), ('velocity', '<i4'), ('track', '<i4'), ('channel', '<i4'), ('id', '<U256')]
First 10 notes:
[(1.0286459, 0.21354167, 790, 164, 49, 53, 0, 0, 'n0')
(1.03125 , 0.09765625, 792, 75, 77, 69, 0, 0, 'n1')
(1.1302084, 0.046875 , 868, 36, 73, 64, 0, 0, 'n2')
(1.21875 , 0.07942709, 936, 61, 68, 66, 0, 0, 'n3')
(1.3541666, 0.04166667, 1040, 32, 73, 34, 0, 0, 'n4')
(1.4361979, 0.0390625 , 1103, 30, 61, 62, 0, 0, 'n5')
(1.4361979, 0.04296875, 1103, 33, 77, 48, 0, 0, 'n6')
(1.5143229, 0.07421875, 1163, 57, 73, 69, 0, 0, 'n7')
(1.6380209, 0.06380209, 1258, 49, 78, 75, 0, 0, 'n8')
(1.6393229, 0.21484375, 1259, 165, 51, 54, 0, 0, 'n9')]
You can see that we have the onsets and durations in seconds, the pitch, and the velocity. The other fields are not so relevant for MIDI performances.
Onsets and durations are also represented in ticks. This is closer to the way this information is actually encoded in a MIDI file: a very short temporal duration (= 1 tick) is chosen, and all temporal information is encoded as a multiple of this duration. When dealing with music performances, you can typically ignore this information and directly use the values in seconds.
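For illustration, here is a minimal sketch of the tick-to-seconds conversion, assuming a single constant tempo (real MIDI files can contain tempo changes, which turn this into a piecewise computation):

```python
# Sketch of the tick -> seconds conversion under a single constant tempo.

def ticks_to_seconds(ticks, ppq=480, tempo_us=500_000):
    """ppq: ticks per quarter note; tempo_us: microseconds per quarter note
    (500_000 us per quarter = 120 BPM, the MIDI default)."""
    return ticks / ppq * tempo_us / 1_000_000

print(ticks_to_seconds(480))  # one quarter note at 120 BPM -> 0.5 s
print(ticks_to_seconds(960))  # two quarter notes -> 1.0 s
```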
You can verify that there are never two notes with exactly the same onset or the same duration!
MIDI scores
MIDI scores use a much richer set of MIDI messages to encode information such as time signature, key signature, bar, and beat positions.
For this reason, they resemble musical scores (sheet music), even though they still miss some essential information, for example pitch spelling, ties, dots, rests, beams, etc.
Temporal information is not encoded in seconds, but in more musically abstract units, like quarter notes.
The main property of MIDI scores
A fundamental characteristic of MIDI scores is that all note onsets are aligned to a quantized grid, defined first by bar positions and then by recursive integer divisions (mainly by 2 and 3, but other divisions such as 5, 7, 11, etc. are used for tuplets).
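If you want to check this property programmatically, here is a small sketch that tests whether a position (expressed in quarter notes) lies on such a grid. The set of divisions tested is an assumption of mine and should be adapted to the repertoire:

```python
from fractions import Fraction

# Sketch: check whether an onset (in quarter notes) lies on a quantized grid.
# Divisions of the quarter by 2, 3, 4, 6, 8, 12 cover common binary
# subdivisions and triplets; extend the tuple for quintuplets, etc.

def is_on_grid(onset_quarter, divisions=(1, 2, 3, 4, 6, 8, 12)):
    pos = Fraction(onset_quarter).limit_denominator(1000)
    # on-grid positions become integers when multiplied by some division
    return any((pos * d).denominator == 1 for d in divisions)

print(is_on_grid(0.25))        # a sixteenth-note position -> True
print(is_on_grid(1/3))         # an eighth-note-triplet position -> True
print(is_on_grid(0.24791667))  # a performed/shortened value -> False
```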
Hands-on example
We are now going to look at the score of Bach's Prelude BWV 848 in C#, which is the score of the performance we loaded before. Partitura has a dedicated load_score_midi function.
from pathlib import Path
import partitura as pt

# set the path to the asap dataset (change it to your local path!)
asap_basepath = Path('../asap-dataset/')
# select a score; here we use Bach's Prelude BWV 848 in C#
score_path = Path("Bach/Prelude/bwv_848/midi_score.mid")
print("Loading midi file: ", asap_basepath/score_path)
# load the score
score = pt.load_score_midi(asap_basepath/score_path)
# extract the note array
note_array = score.note_array()
# print the dtype of the note array (helpful to understand how to interpret it)
print("Numpy dtype:")
print(note_array.dtype)
# print the first 10 notes in the note array
print("First 10 notes:")
print(note_array[:10])
The output of this Python program should look like this:
Numpy dtype:
[('onset_beat', '<f4'), ('duration_beat', '<f4'), ('onset_quarter', '<f4'), ('duration_quarter', '<f4'), ('onset_div', '<i4'), ('duration_div', '<i4'), ('pitch', '<i4'), ('voice', '<i4'), ('id', '<U256'), ('divs_pq', '<i4')]
First 10 notes:
[(0. , 1.9958333 , 0. , 0.99791664, 0, 479, 49, 1, 'P01_n425', 480)
(0. , 0.49583334, 0. , 0.24791667, 0, 119, 77, 1, 'P00_n0', 480)
(0.5, 0.49583334, 0.25, 0.24791667, 120, 119, 73, 1, 'P00_n1', 480)
(1. , 0.49583334, 0.5 , 0.24791667, 240, 119, 68, 1, 'P00_n2', 480)
(1.5, 0.49583334, 0.75, 0.24791667, 360, 119, 73, 1, 'P00_n3', 480)
(2. , 0.99583334, 1. , 0.49791667, 480, 239, 61, 1, 'P01_n426', 480)
(2. , 0.49583334, 1. , 0.24791667, 480, 119, 77, 1, 'P00_n4', 480)
(2.5, 0.49583334, 1.25, 0.24791667, 600, 119, 73, 1, 'P00_n5', 480)
(3. , 1.9958333 , 1.5 , 0.99791664, 720, 479, 51, 1, 'P01_n427', 480)
(3. , 0.49583334, 1.5 , 0.24791667, 720, 119, 78, 1, 'P00_n6', 480)]
You can see that the note onsets all fall exactly on a grid. If we consider onset_quarter (the third column), we can see that sixteenth notes fall every 0.25 quarters, as expected.
Durations are a bit more problematic. For example, in this score, a sixteenth note should have a duration_quarter of 0.25. However, we can see from the Python output that the duration is actually 0.24791667. What happened is that MuseScore, which was used to generate this MIDI file, slightly shortened each note. Why? Just to make the audio rendition of this MIDI file sound a bit better. And it does indeed, at the cost of causing many problems for people using these files for computer music research. Similar problems also exist in widely used datasets, such as the Lakh MIDI Dataset.
Given the differences between MIDI scores and MIDI performances we have seen, let me give you some generic guidelines that can help in correctly setting up your deep learning system.
- Prefer MIDI scores for music generation systems, since the quantized note positions can be represented with a fairly small vocabulary, and other simplifications are possible, like considering only monophonic melodies.
- Use MIDI performances for systems that target the way humans play and perceive music, for example beat tracking systems, tempo estimators, and emotion recognition systems (focusing on expressive playing).
- Use both kinds of data for tasks like score-following (input: performance, output: score) and expressive performance generation (input: score, output: performance).
Further considerations
I have presented the main differences between MIDI scores and MIDI performances. However, as often happens, things can be more complex.
For example, some datasets, like the AMAPS datasets, are originally MIDI scores, but the authors introduced time changes at every note to simulate the timing deviations of real human players (note that this only happens between notes at different time positions; all the notes in a chord will still be perfectly simultaneous).
Moreover, some MIDI exporters, like the one in MuseScore, will even try to make the MIDI score more similar to a MIDI performance, again by changing the tempo indication if the piece changes tempo, by inserting a very small silence between consecutive notes (we saw this in the example above), and by playing grace notes as very short notes slightly before the reference note onset.
Indeed, grace notes constitute a very annoying problem in MIDI scores. Their duration is unspecified in musical terms; we only generically know that they should be "short". And in the score their onset is the same as that of the reference note, but this would sound very strange in an audio rendition of the MIDI file. Should we then shorten the previous note, or the next note, to make room for the grace note?
Other ornaments are also problematic, since there are no unique rules on how to play them: for example, how many notes should a trill contain? Should a mordent start from the written note or from the upper note?
MIDI files are great because they explicitly provide information about the pitch, onset, and duration of every note. This means, for example, that compared to audio files, models targeting MIDI data can be smaller and can be trained with smaller datasets.
This comes at a cost: MIDI files, and symbolically encoded music in general, are complex formats to use, since they encode so many kinds of information in many different ways.
To properly use MIDI data as training data, it is important to be aware of the kind of information it encodes. I hope this article gave you a good starting point to learn more about this topic!
[All figures are from the author.]