dnb.ck

February 7th, 2023 4:45 PM

The story of music and AI is a new one and a rapidly progressing one. But let’s start at the very beginning.

One of the earliest uses of AI in music was genre classification. The first approach involved a simple feature extractor that converts a song or a portion of audio into an n-dimensional vector in which each number measures some property of the sound. For instance, one dimension might hold the spectral centroid (the center of mass of the frequency spectrum), while others might hold various MFCCs (Mel Frequency Cepstral Coefficients).

My first experiment involved training simple KNN (k-nearest neighbors) models of increasing complexity on 1000 audio samples spanning 10 genres to see how well each one performs at genre classification.

Model 0: I started with a 1-dimensional feature vector containing just the centroid. This model had an accuracy of about 17%.

Model 1: Then I added flux and RMS for a total of 3 dimensions, and it performed much better, with an accuracy of about 30%.

Model 2: Model 2 was 8 dimensions, 5 of which were MFCCs. This model had an accuracy of about 34%.

Model 3: Model 3 now includes 20 MFCCs for a total of 23 dimensions. This model had an accuracy of about 43%.

Model 4: Model 4 was the same as Model 3, with 2 more dimensions added for 25% roll-off and 75% roll-off. The accuracy was also around 43%.
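To make the classification step concrete, here is a minimal, hand-rolled sketch of k-nearest-neighbors voting in ChucK. It is not the code used for the experiments above: the 2-D points, the two placeholder genres, and the names `points`, `labels`, `dist2`, and `classify` are all made up for illustration. The real models used 1 to 25 feature dimensions and 10 genres.

```chuck
// toy 2-D training set: hypothetical feature vectors, not real data
[ [0.10, 0.20], [0.15, 0.25], [0.80, 0.90], [0.85, 0.75], [0.20, 0.10] ] @=> float points[][];
// genre label for each training point (0 and 1 are placeholder genres)
[ 0, 0, 1, 1, 0 ] @=> int labels[];
// number of distinct genres in this toy example
2 => int NUM_GENRES;

// squared Euclidean distance between two equal-length vectors
fun float dist2( float a[], float b[] )
{
    0.0 => float sum;
    for( 0 => int d; d < a.size(); d++ )
    {
        a[d] - b[d] => float diff;
        diff * diff +=> sum;
    }
    return sum;
}

// classify a query by majority vote among its k nearest training points
fun int classify( float query[], int k )
{
    // marks training points that were already picked as neighbors
    int used[points.size()];
    // vote count per genre
    int votes[NUM_GENRES];

    for( 0 => int n; n < k; n++ )
    {
        // find the closest point that has not been used yet
        -1 => int best;
        0.0 => float bestDist;
        for( 0 => int i; i < points.size(); i++ )
        {
            if( used[i] == 0 )
            {
                dist2( query, points[i] ) => float dd;
                if( best < 0 || dd < bestDist ) { i => best; dd => bestDist; }
            }
        }
        // ran out of training points
        if( best < 0 ) break;
        1 => used[best];
        1 +=> votes[labels[best]];
    }

    // pick the genre with the most votes (ties go to the lower index)
    0 => int winner;
    for( 1 => int g; g < NUM_GENRES; g++ )
    {
        if( votes[g] > votes[winner] ) g => winner;
    }
    return winner;
}

// a query that sits near the two genre-1 points
[ 0.7, 0.8 ] @=> float query[];
<<< "predicted genre:", classify( query, 3 ) >>>;
```

With k = 3, two of the three nearest neighbors of this query carry label 1, so the vote goes to genre 1. Adding feature dimensions, as in Models 1 through 4, only changes the length of each vector; the distance and voting logic stay the same.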

The second experiment was finding a way to use this classification system to generate new music. I wanted to find a way to convert my voice into a drum and bass track.

So far, my project kinda does that. Essentially the program takes the input from the microphone, finds the dnb samples (1 drum and 1 bass) that have feature vectors most similar to the input, then plays them. Here it is kinda working:

Meet dnb-synthesis-mic.ck:

I was impressed by the program's robustness and how well it mimicked what my voice sounded like. Next, I had to think about improvements that I could make to the model that I had already created.

One thing missing was the ability to tweak hyperparameters while the program is running. I wanted to be able to turn certain bass and drum tracks on and off so I could better create shifts in dynamics throughout a performance. For this, I added the ability to increase or decrease the number of bass and drum tracks played simultaneously. There could be, for example, 1 bass track running while 4 drum tracks are running.

My original program was also limited in that it had a fixed, relatively long synth window. So, in part two, I added the ability to change the length of the synth window for the bass and drum tracks individually. Now, you can synthesize the bass and drums at different rates while they still always stay in sync.
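One way to see why doubling and halving keeps the tracks aligned: if every window length is a power-of-two multiple of a shared base, the longer window always ends on a boundary of the shorter one. The sketch below is only an illustration of that idea, not the part-two code. `BASE`, `bassMult`, `drumMult`, and `track` are hypothetical names, and the 250 ms base is an arbitrary choice.

```chuck
// hypothetical base window; the real program derives its window from the FFT size
250::ms => dur BASE;
// per-track multipliers that are only ever doubled or halved
1.0 => float bassMult;
0.5 => float drumMult;

// print the start time of each window for one track
fun void track( string name, float mult, int count )
{
    repeat( count )
    {
        <<< name, "window starts at", now / ms, "ms" >>>;
        // wait for one window of this track's length
        mult * BASE => now;
    }
}

// both tracks start at the same moment
spork ~ track( "bass", bassMult, 4 );
spork ~ track( "drums", drumMult, 8 );
// keep the parent shred alive until both tracks have finished
4 * BASE + 1::ms => now;
```

Every bass window start coincides with a drum window start, because the drum window fits into the bass window exactly twice.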

Controls (on keyboard):

number of bass tracks: w increases, and s decreases

number of drum tracks: e increases, and d decreases

bass synth window size: r doubles it, and f halves it

drum synth window size: t doubles it, and g halves it
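A minimal sketch of how these eight keys could be read with ChucK's `Hid` class is shown below. This is not the actual part-two code: the variable names are hypothetical, keyboard device 0 is an assumption, and the key codes are folded to uppercase because the case reported by `msg.ascii` may differ between setups.

```chuck
// state that the keys adjust (hypothetical names, not the original variables)
1 => int numBass;
1 => int numDrums;
1.0 => float bassWindow;   // window length as a multiple of some base window
1.0 => float drumWindow;

Hid hid;
HidMsg msg;
// keyboard device 0 is an assumption; change it to match your machine
if( !hid.openKeyboard( 0 ) ) me.exit();

while( true )
{
    // wait for keyboard activity
    hid => now;
    // drain all pending messages
    while( hid.recv( msg ) )
    {
        if( msg.isButtonDown() )
        {
            msg.ascii => int key;
            // fold lowercase codes to uppercase so either case matches
            if( key >= 97 && key <= 122 ) 32 -=> key;

            if( key == 87 ) { 1 +=> numBass; }                              // w: more bass tracks
            else if( key == 83 ) { if( numBass > 0 ) 1 -=> numBass; }       // s: fewer bass tracks
            else if( key == 69 ) { 1 +=> numDrums; }                        // e: more drum tracks
            else if( key == 68 ) { if( numDrums > 0 ) 1 -=> numDrums; }     // d: fewer drum tracks
            else if( key == 82 ) { 2.0 *=> bassWindow; }                    // r: double bass window
            else if( key == 70 ) { 2.0 /=> bassWindow; }                    // f: halve bass window
            else if( key == 84 ) { 2.0 *=> drumWindow; }                    // t: double drum window
            else if( key == 71 ) { 2.0 /=> drumWindow; }                    // g: halve drum window

            <<< "bass:", numBass, "drums:", numDrums,
                "bass window:", bassWindow, "drum window:", drumWindow >>>;
        }
    }
}
```

In a full program this loop would run in its own sporked shred, with the synthesis shreds reading the shared variables at the start of each window.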

The video below demonstrates how the final version sounds. The footage in the background is kind of irrelevant, as it just shows my friend messing around with the program. The text flying across the screen shows how often each drum and bass window is synthesized.

Many things could be improved about this program; however, I do not have infinite time. If I did, I would add envelopes to the various sounds to remove any pops while it is playing, and I would also add a low pass filter that could be controlled with your voice (I think that would sound pretty cool since it would sound like it was coming out of your mouth).
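Neither improvement exists in the program yet, but a rough sketch of both ideas might look like the following. A sawtooth stands in for the drum and bass samples so that no audio files are needed, the envelope times and the 100 Hz floor are guesses, and mapping the voice's spectral centroid to the filter cutoff is just one possible choice of control signal.

```chuck
// mic analysis chain: spectral centroid of the incoming voice
adc => FFT fft =^ Centroid cent => blackhole;
1024 => fft.size;
Windowing.hann( fft.size() ) => fft.window;
// sample rate in Hz
second / samp => float srate;

// a sawtooth stands in for a drum or bass sample so no audio file is needed
SawOsc src => ADSR env => LPF lpf => dac;
110 => src.freq;
0.3 => src.gain;
// short attack and release ramps to avoid clicks (values are guesses)
env.set( 10::ms, 5::ms, 1.0, 20::ms );

// play 20 short windows, retuning the filter from the voice before each one
repeat( 20 )
{
    // analyze the most recent mic frame
    cent.upchuck();
    // Centroid reports a fraction of the Nyquist frequency; convert to Hz
    cent.fval(0) * srate / 2 => float cutoff;
    // keep the cutoff above an arbitrary floor so the sound never vanishes
    if( cutoff < 100.0 ) 100.0 => cutoff;
    cutoff => lpf.freq;

    // fade in, hold, then fade out instead of cutting the sound abruptly
    env.keyOn();
    200::ms => now;
    env.keyOff();
    env.releaseTime() => now;
}
```

The envelope ramps are what remove the pops: each window fades in and out rather than starting and stopping at full amplitude.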

But the most interesting part of this project was how it felt to create music using it. It didn’t feel like playing a smart instrument, but it also didn’t feel like playing a dumb one. There were essentially 9 inputs to the instrument: 8 keys and my voice. I had to play around with it for a while before I figured out interesting ways to transition between sections and add dynamics to arrangements. But there was also an aspect of the instrument that was inherently mysterious. Maybe I just don’t know how to play it well enough, but part of me thinks that the underlying math is inherently unintuitive. We don’t listen to a sound and hear its spectral centroid or its MFCCs, so in some regards, the sounds that came out of the program were surprising.

When using dnb-synthesis-mic.ck, it felt like I was performing while also experiencing a performance. It was, in some ways, a duet between man and machine. It was a partner dance where I took the lead, but she took me places I didn’t even consider.

It’s not a perfect tool by any means, but I think the experience of using dnb-synthesis-mic.ck perfectly walks the line between incorporating enough AI elements and maintaining control through various knobs and inputs.

Here is the code in all of its glory. Huge thank you to Ge Wang for writing most of it.

// input: pre-extracted model file
string DRUM_FEATURES_FILE;
string BASS_FEATURES_FILE;
// if have arguments, override filename
if( me.args() > 1 )
{
    me.arg(0) => DRUM_FEATURES_FILE;
    me.arg(1) => BASS_FEATURES_FILE;
}
else
{
    // print usage
    <<< "usage: chuck mosaic-synth-mic.ck:INPUT", "" >>>;
    <<< " |- INPUT: drum model file : bass model file", "" >>>;
}
//------------------------------------------------------------------------------
// unit analyzer network: *** this must match the features in the features file
//------------------------------------------------------------------------------
// audio input into a FFT
adc => FFT fft;
// a thing for collecting multiple features into one vector
FeatureCollector combo => blackhole;
// add spectral feature: Centroid
fft =^ Centroid centroid =^ combo;
// add spectral feature: Flux
fft =^ Flux flux =^ combo;
// add spectral feature: RMS
fft =^ RMS rms =^ combo;
// add spectral feature: MFCC
fft =^ MFCC mfcc =^ combo;

//-----------------------------------------------------------------------------
// setting analysis parameters -- also should match what was used during extraction
//-----------------------------------------------------------------------------
// set number of coefficients in MFCC (how many we get out)
// 13 is a commonly used value; using 20 here
20 => mfcc.numCoeffs;
// set number of mel filters in MFCC
10 => mfcc.numFilters;

// do one .upchuck() so FeatureCollector knows how many total dimensions
combo.upchuck();
// get number of total feature dimensions
combo.fvals().size() => int NUM_DIMENSIONS;

// set FFT size
// 4096 => fft.size;
15207 => fft.size;
// set window type and size
Windowing.hann(fft.size()) => fft.window;
// our hop size (how often to perform analysis)
// (fft.size()/2)::samp => dur HOP;
(fft.size())::samp => dur HOP;
// how many frames to aggregate before averaging?
// (this does not need to match extraction; might play with this number)
4 => int NUM_FRAMES;
// how much time to aggregate features for each file
fft.size()::samp * NUM_FRAMES => dur EXTRACT_TIME;

//------------------------------------------------------------------------------
// unit generator network: for real-time sound synthesis
//------------------------------------------------------------------------------
// how many max at any time?
2 => int NUM_VOICES_BASS;
2 => int NUM_VOICES_DRUMS;
// a number of audio buffers to cycle between
SndBuf buffers_bass[NUM_VOICES_BASS]; SndBuf buffers_drums[NUM_VOICES_DRUMS]; ADSR envs[NUM_VOICES_BASS];
// set parameters
for( int i; i < NUM_VOICES_BASS; i++ )
{
    // connect audio
    // buffers_bass[i] => envs[i] => pans[i] => dac;
    buffers_bass[i] => NRev rev => Pan2 pan => dac;
    0.8 => buffers_bass[i].gain;
    Math.random2f(-.75,.75) => pan.pan;
    Math.random2f(0,.5) => rev.mix;
    // set chunk size (how much to load at a time)
    // this is important when reading from large files
    // if this is not set, SndBuf.read() will load the entire file immediately
    fft.size() => buffers_bass[i].chunks;

    // randomize pan => pans[i].pan;
    // set envelope parameters
    envs[i].set( EXTRACT_TIME, EXTRACT_TIME/2, 1, EXTRACT_TIME );
}
for( int i; i < NUM_VOICES_DRUMS; i++ )
{
    // connect audio
    // buffers_bass[i] => envs[i] => pans[i] => dac;
    buffers_drums[i] => Pan2 panR => dac;
    // 0.5 => panR.pan;
    // set chunk size (how much to load at a time)
    // this is important when reading from large files
    // if this is not set, SndBuf.read() will load the entire file immediately
    fft.size() => buffers_drums[i].chunks;

}

//------------------------------------------------------------------------------
// load feature data; read important global values like numPoints and numCoeffs
//------------------------------------------------------------------------------
// values to be read from file
0 => int numPointsDrums; // number of points in data
0 => int numPointsBass;
0 => int numCoeffs; // number of dimensions in data
// file read PART 1: read over the file to get numPoints and numCoeffs
<<< "LOADING FILES" >>>;
loadFile( DRUM_FEATURES_FILE, 1 ) @=> FileIO @ fin_drum;
loadFile( BASS_FEATURES_FILE, 0 ) @=> FileIO @ fin_bass;
<<< "LOADED FILES", numPointsBass, numPointsDrums >>>;
// check
if( !fin_drum.good() ) me.exit();
if( !fin_bass.good() ) me.exit();
// check dimension at least
if( numCoeffs != NUM_DIMENSIONS )
{
    // error
    <<< "[error] expecting:", NUM_DIMENSIONS, "dimensions; but features file has:", numCoeffs >>>;
    // stop
    me.exit();
}

//------------------------------------------------------------------------------
// each AudioWindow corresponds to one line in the input file, which is one audio window
//------------------------------------------------------------------------------
class AudioWindow
{
    // unique point index (use this to lookup feature vector)
    int uid;
    // which file did this come from (in files array)
    int fileIndex;
    // starting time in that file (in seconds)
    float windowTime;

    // set
    fun void set( int id, int fi, float wt )
    {
        id => uid;
        fi => fileIndex;
        wt => windowTime;
    }
}

// array of all points in model file
AudioWindow windows[numPointsBass + numPointsDrums];
// unique filenames; we will append to this
string files[0];
// map of filenames loaded
int filename2state[0];
// feature vectors of data points
float inFeaturesBass[numPointsBass][numCoeffs];
float inFeaturesDrums[numPointsDrums][numCoeffs];
// generate array of unique indices
int uids_bass[numPointsBass]; for( int i; i < numPointsBass; i++ ) i => uids_bass[i];
int uids_drums[numPointsDrums]; for( int i; i < numPointsDrums; i++ ) i => uids_drums[i];

int uids_playing[NUM_VOICES_BASS + NUM_VOICES_DRUMS]; for( int i; i < uids_playing.size(); i++ ) -1 => uids_playing[i];

// use this for new input
float features[NUM_FRAMES][numCoeffs];
// average values of coefficients across frames
float featureMean[numCoeffs];

//------------------------------------------------------------------------------
// read the data
//------------------------------------------------------------------------------
readData( fin_drum, 1 );
readData( fin_bass, 0 );

//------------------------------------------------------------------------------
// set up our KNN object to use for classification
// (KNN2 is a fancier version of the KNN object)
// -- run KNN2.help(); in a separate program to see its available functions --
//------------------------------------------------------------------------------
KNN2 knn_drums;
KNN2 knn_bass;
// k nearest neighbors
2 => int K;
// results vector (indices of k nearest points)
int knnResultDrums[K];
int knnResultBass[K];
// knn train
knn_drums.train( inFeaturesDrums, uids_drums );
knn_bass.train( inFeaturesBass, uids_bass );


// used to rotate sound buffers
0 => int which_bass;
0 => int which_drums;

fun void synthesize_both( int uid_drums, int uid_bass, int loop_num)
{
    if (checkIfLooping(uid_drums, which_drums) == 0) {
        buffers_drums[which_drums] @=> SndBuf @ sound;
        // increment and wrap if needed
        which_drums++; if( which_drums >= buffers_drums.size() ) 0 => which_drums;

        // get a reference to the audio fragment to synthesize
        windows[uid_drums] @=> AudioWindow @ win;
        // get filename
        // chout <= files[0];
        files[win.fileIndex] => string filename;
        <<< filename, win.fileIndex, uid_drums >>>;
        // load into sound buffer
        filename => sound.read;
        chout <= filename <= " ";
        sound.loop(1);
        chout <= "synthesizing drum window:";
        chout <= win.uid <= "["
              <= win.fileIndex <= ":"
              <= win.windowTime <= ":POSITION="
              <= sound.pos() <= "]";
        chout <= IO.newline();

    } else {
        chout <= "ALREADY PLAYING" <= IO.newline();
    }

    // if (checkIfLooping(uid_bass, which_bass + NUM_VOICES_BASS) == 0) {

        buffers_bass[which_bass] @=> SndBuf @ sound;
        envs[which_bass] @=> ADSR @ envelope;
        // increment and wrap if needed
        which_bass++; if( which_bass >= buffers_bass.size() ) 0 => which_bass;

        // bass windows are stored after the drum windows
        windows[uid_bass + numPointsDrums] @=> AudioWindow @ win;
        files[win.fileIndex] => string filename;
        // load into sound buffer
        filename => sound.read;
        chout <= filename <= " ";
        0 => sound.pos;

        chout <= "synthesizing bass window:";
        chout <= win.uid <= "["
              <= win.fileIndex <= ":"
              <= win.windowTime <= ":POSITION="
              <= sound.pos() <= "]";
        chout <= IO.newline();

        envelope.keyOn();
        30000::samp => now;
        envelope.keyOff();
        envelope.releaseTime() => now;

    // } else {
    //     chout <= "ALREADY PLAYING" <= IO.newline();
    //     chout <= uid_drums, which_bass, NUM_VOICES_BASS;
    // }
}

// returns 1 if uid is already playing; otherwise records it at whichIndex and returns 0
fun int checkIfLooping(int uid, int whichIndex) {
    for (0 => int i; i < uids_playing.size(); i++) {
        if (uids_playing[i] == uid) {
            return 1;
        }
    }
    uid => uids_playing[whichIndex];
    return 0;
}

//------------------------------------------------------------------------------
// real-time similarity retrieval loop
//------------------------------------------------------------------------------
0 => int loop_num;
while( true )
{
    // aggregate features over a period of time
    for( int frame; frame < NUM_FRAMES; frame++ )
    {
        //-------------------------------------------------------------
        // a single upchuck() will trigger analysis on everything
        // connected upstream from combo via the upchuck operator (=^)
        // the total number of output dimensions is the sum of
        // dimensions of all the connected unit analyzers
        //-------------------------------------------------------------
        combo.upchuck();
        // get features
        for( int d; d < NUM_DIMENSIONS; d++)
        {
            // store them in current frame
            combo.fval(d) => features[frame][d];
        }
        // advance time
        2 * 15206::samp => now;
    }

    // compute means for each coefficient across frames
    for( int d; d < NUM_DIMENSIONS; d++ )
    {
        // zero out
        0.0 => featureMean[d];
        // loop over frames
        for( int j; j < NUM_FRAMES; j++ )
        {
            // add
            features[j][d] +=> featureMean[d];
        }
        // average
        NUM_FRAMES /=> featureMean[d];
    }

    //-------------------------------------------------
    // search using KNN2; results filled in knnResults,
    // which should be the indices of k nearest points
    //-------------------------------------------------
    knn_bass.search( featureMean, K, knnResultBass );
    knn_drums.search( featureMean, K, knnResultDrums );

    // SYNTHESIZE THIS
    // spork ~ synthesize_both( knnResultDrums[Math.random2(0,knnResultDrums.size()-1)],
    // knnResultBass[Math.random2(0,knnResultBass.size()-1)],
    // loop_num);
    spork ~ synthesize_both( knnResultDrums[0],knnResultBass[0],loop_num);
    loop_num++;
    // if (loop_num % 1 == 0) {
    //     spork ~ synthesize_bass( knnResultBass[Math.random2(0,knnResultBass.size()-1)]);
    // }
    // if (loop_num % 4 == 0) {
    //     spork ~ synthesize_drums( knnResultDrums[Math.random2(0,knnResultDrums.size()-1)]);
    // }
    // 15207::samp => now;
}
//------------------------------------------------------------------------------
// end of real-time similarity retrieval loop
//------------------------------------------------------------------------------

//------------------------------------------------------------------------------
// function: load data file
//------------------------------------------------------------------------------
fun FileIO loadFile( string filepath , int isDrums)
{
    // reset
    if (isDrums == 1) {
        0 => numPointsDrums;
    } else {
        0 => numPointsBass;
    }
    0 => numCoeffs;

    // load data
    FileIO fio;
    if( !fio.open( filepath, FileIO.READ ) )
    {
        // error
        <<< "cannot open file:", filepath >>>;
        // close
        fio.close();
        // return
        return fio;
    }

    string str;
    string line;
    // read every line, counting the non-empty ones
    while( fio.more() )
    {
        // read each line
        fio.readLine().trim() => str;
        // check if empty line
        if( str != "" )
        {
            if (isDrums == 1) {
                numPointsDrums++;
            } else {
                numPointsBass++;
            }
            str => line;
        }
    }

    // a string tokenizer
    StringTokenizer tokenizer;
    // set to last non-empty line
    tokenizer.set( line );
    // negative (to account for filePath windowTime)
    -2 => numCoeffs;
    // see how many, including label name
    while( tokenizer.more() )
    {
        tokenizer.next();
        numCoeffs++;
    }

    // see if we made it past the initial fields
    if( numCoeffs < 0 ) 0 => numCoeffs;

    // check
    if( (isDrums == 1 && numPointsDrums == 0) || (isDrums == 0 && numPointsBass == 0) || numCoeffs <= 0 )
    {
        <<< "no data in file:", filepath >>>;
        fio.close();
        return fio;
    }

    // print
    <<< "# of drum data points:", numPointsDrums, " # of bass data points: ", numPointsBass, "dimensions:", numCoeffs >>>;

    // done for now
    return fio;
}

//------------------------------------------------------------------------------
// function: read the data
//------------------------------------------------------------------------------
fun void readData( FileIO fio, int isDrums )
{
    // rewind the file reader
    fio.seek( 0 );

    // a line
    string line;
    // a string tokenizer
    StringTokenizer tokenizer;

    // points index
    0 => int index;
    // file index
    0 => int fileIndex;
    // file name
    string filename;
    // window start time
    float windowTime;
    // coefficient
    int c;
434    int c;
435    
436    // read the first non-empty line
437    while( fio.more() )
438    {
439        // read each line
440        fio.readLine().trim() => line;
441        // check if empty line
442        if( line != "" )
443        {
444            // set to last non-empty line
445            tokenizer.set( line );
446            // file name
447            tokenizer.next() => filename;
448            // window start time
449            tokenizer.next() => Std.atof => windowTime;
450            // have we seen this filename yet?
451            if( filename2state[filename] == 0 )
452            {
453                // append
454                filename => string sss;
455                files << sss;
456                // new id
457                files.size() => filename2state[filename];
458            }
459            // get fileindex
460            filename2state[filename]-1 => fileIndex;
461            // set
462            if (isDrums == 1) {
463                windows[index].set( index, fileIndex, windowTime );
464            } else {
465                windows[index + numPointsDrums].set( index, fileIndex, windowTime );
466            }
467
468            // zero out
469            0 => c;
470            // for each dimension in the data
471            repeat( numCoeffs )
472            {
473                // read next coefficient
474                if (isDrums == 0) {
475                    tokenizer.next() => Std.atof => inFeaturesBass[index][c];   
476                } else {
477                    tokenizer.next() => Std.atof => inFeaturesDrums[index][c];   
478                }
479                // increment
480                c++;
481            }
482            
483            // increment global index
484            index++;
485        }
486    }
487}