

We collected training data for MuseNet from many different sources. ClassicalArchives and BitMidi donated their large collections of MIDI files for this project, and we also found several collections online, including jazz, pop, African, Indian, and Arabic styles. Additionally, we used the MAESTRO dataset.

The transformer is trained on sequential data: given a set of notes, we ask it to predict the upcoming note. We experimented with several different ways to encode the MIDI files into tokens suitable for this task. First, we tried a chordwise approach that considered every combination of notes sounding at one time as an individual “chord”, and assigned a token to each chord. Second, we tried condensing the musical patterns by only focusing on the starts of notes, and tried further compressing that using a byte pair encoding scheme.
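
A rough sketch of the difference between these two schemes (the vocabularies and helper names here are hypothetical, for illustration only, not the code we actually used):

```python
# Illustrative sketch only -- token vocabularies and names are hypothetical.

chord_vocab = {}  # maps a tuple of simultaneous pitches to a token id

def chordwise_token(pitches):
    """Chordwise: every distinct combination of simultaneous notes
    gets its own token, so the vocabulary grows combinatorially."""
    chord = tuple(sorted(pitches))
    if chord not in chord_vocab:
        chord_vocab[chord] = len(chord_vocab)
    return chord_vocab[chord]

def note_start_tokens(pitches):
    """Note-start: emit one token per note onset (128 MIDI pitches),
    so the vocabulary stays small and fixed."""
    return [pitch for pitch in sorted(pitches)]  # token id == MIDI pitch

c_major = [60, 64, 67]  # C4, E4, G4 sounding together
print(chordwise_token(c_major))    # a single token for the whole chord
print(note_start_tokens(c_major))  # three tokens, one per note onset
```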

We also tried two different methods of marking the passage of time: either tokens that were scaled according to the piece’s tempo (so that the tokens represented a musical beat or fraction of a beat), or tokens that marked absolute time in seconds. We landed on an encoding that combines expressivity with conciseness: combining the pitch, volume, and instrument information into a single token.
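
One way such a combined token could be packed, purely for illustration (the actual field widths and layout aren’t specified here):

```python
# Hypothetical packing scheme -- the real field widths may differ.
NUM_PITCHES = 128      # MIDI pitch range
NUM_VOLUME_BINS = 32   # velocity quantized into coarse bins
NUM_INSTRUMENTS = 16   # illustrative instrument set

def combined_token(pitch, velocity, instrument):
    """Pack pitch, quantized volume, and instrument into one token id."""
    volume_bin = velocity * NUM_VOLUME_BINS // 128
    return (instrument * NUM_VOLUME_BINS + volume_bin) * NUM_PITCHES + pitch

def split_token(token):
    """Recover the three fields from a combined token id."""
    pitch = token % NUM_PITCHES
    rest = token // NUM_PITCHES
    return rest // NUM_VOLUME_BINS, rest % NUM_VOLUME_BINS, pitch

tok = combined_token(pitch=60, velocity=96, instrument=3)
print(tok, split_token(tok))  # instrument, volume bin, and pitch round-trip
```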

We added several different kinds of embeddings to give the model more structural context. In addition to the standard positional embeddings, we added a learned embedding that tracks the passage of time in a given sample. This way, all of the notes that sound at the same time are given the same timing embedding. We then add an embedding for each note in a chord (this mimics relative attention, since it will be easier for the model to learn that note 4 needs to look back at note 3, or else at note 4 of the previous chord). Finally, we add two structural embeddings which tell the model where a given musical sample is within the larger musical piece. One embedding divides the larger piece into 128 parts, while the second encoding is a countdown from 127 to 0 as the model approaches the (end) token.
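
Sketched in code, the combined embedding scheme might look like the following (the dimensions, table sizes, and names are assumptions, not the actual implementation):

```python
import torch
import torch.nn as nn

D = 512  # hypothetical embedding width

class StructuralEmbeddings(nn.Module):
    """Sketch of the extra embeddings described above (sizes assumed)."""
    def __init__(self, max_len=4096, max_chord=16):
        super().__init__()
        self.position  = nn.Embedding(max_len, D)    # standard positional
        self.timing    = nn.Embedding(max_len, D)    # shared by simultaneous notes
        self.in_chord  = nn.Embedding(max_chord, D)  # note index within a chord
        self.piece_pos = nn.Embedding(128, D)        # which of 128 parts of the piece
        self.countdown = nn.Embedding(128, D)        # 127..0 toward the (end) token

    def forward(self, pos, time_step, chord_idx, part, remaining):
        return (self.position(pos) + self.timing(time_step)
                + self.in_chord(chord_idx) + self.piece_pos(part)
                + self.countdown(remaining))

emb = StructuralEmbeddings()
# Three notes of one chord: same timing step, different in-chord indices.
pos       = torch.tensor([10, 11, 12])
time_step = torch.tensor([4, 4, 4])
chord_idx = torch.tensor([0, 1, 2])
part      = torch.tensor([3, 3, 3])        # piece divided into 128 parts
remaining = torch.tensor([120, 120, 120])  # countdown toward (end)
print(emb(pos, time_step, chord_idx, part, remaining).shape)  # torch.Size([3, 512])
```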

We’re excited to hear what people create! If you create a piece you like, you can upload it to a free service like Instaudio and then tweet us the link (the MuseNet demo has a tweet button to help with this).