MidiVision

March 13th, 2023 4:45 PM

I can't believe no one has done this before.

One day, I was trying to play a song on my synthesizer that required both hands, but I also wanted to modulate the high-pass filter and distortion at the same time. I had run out of hands. Then I realized we have other parts of our body we can use. The most obvious candidate is our feet (with a pedal, for instance), but why not take advantage of the most expressive part of our entire body: our face?

It clicked. I just needed computer vision for facial gesture mapping, and then I could use those signals to control various MIDI inputs on my synthesizer.

But I didn't want to hardcode some function for converting mapped facial features to MIDI outputs, so instead I used Wekinator, which lets a user easily train their own shallow MLP to map from face space to MIDI space. Wekinator is great because it can interpolate complex functions from only a few training examples: the user demonstrates a few faces they might make and what they want the instrument to sound like for each, and the MLP interpolates everything in between.
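As a rough illustration of the few-shot idea (this is not Wekinator's actual algorithm, and every feature name and number below is invented for the example), here is what "interpolating between demonstrated examples" can look like, sketched in Python with inverse-distance weighting standing in for the MLP:

```python
# Sketch of few-shot mapping: demonstrate a handful of
# (face features -> MIDI targets) pairs, then blend them for unseen faces.
# NOT Wekinator's actual algorithm; values are made up for illustration.

def interpolate(examples, features):
    """Inverse-distance-weighted blend of the demonstrated MIDI targets."""
    weights = []
    for face, _ in examples:
        dist = sum((a - b) ** 2 for a, b in zip(face, features)) ** 0.5
        weights.append(float("inf") if dist == 0 else 1.0 / dist)
    if float("inf") in weights:
        # exact match with a demonstration: return its targets verbatim
        return list(examples[weights.index(float("inf"))][1])
    total = sum(weights)
    n_out = len(examples[0][1])
    return [sum(w * ex[1][k] for w, ex in zip(weights, examples)) / total
            for k in range(n_out)]

# Two demonstrations: mouth closed -> filter closed, mouth wide open ->
# filter open. Inputs are (mouth_width, mouth_height); outputs (cutoff, fx1).
examples = [
    ([0.1, 0.0], [0.0, 0.2]),
    ([0.9, 1.0], [1.0, 0.8]),
]
halfway = interpolate(examples, [0.5, 0.5])  # lands between the two targets
```

A real MLP learns a smoother, nonlinear blend, but the user-facing contract is the same: a few demonstrations in, a continuous mapping out.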

Check out what I made with it:

Here's some of the code, which I wrote in ChucK.

First, the program that relays messages from Wekinator to my synth:

0xB0 => int CONTROL_CHANGE;
73 => int MACRO_1;
75 => int MACRO_2;
19 => int FX1_AMOUNT;
16 => int FX2_AMOUNT;
17 => int FX3_AMOUNT;
74 => int CUTOFF;

[
    MACRO_1,
    MACRO_2,
    FX1_AMOUNT,
    FX2_AMOUNT,
    CUTOFF
] @=> int MIDI_LABELS[];

MIDI_LABELS.size() => int NUM_LABELS;

0 => int curr_i;
5 => int NUM_AVG;
// last index is running average
float MIDI_DATA[NUM_LABELS][NUM_AVG + 1];

MidiOut mout;
MidiMsg msgOut;
CONTROL_CHANGE => msgOut.data1;

if( !mout.open( 0 ) ) me.exit();

OscIn oin;
12000 => oin.port;
OscMsg msg;

oin.addAddress( "/wek/outputs, fffff" );

cherr <= "listening for messages on port " <= oin.port()
      <= "..." <= IO.newline();

spork ~ incoming();

while( true )
{
    .01::second => now;
    sendMIDI();
}

fun void incoming()
{
    while( true )
    {
        oin => now;
        while( oin.recv(msg) )
        {
            cherr <= "wek received" <= IO.newline();
            if( msg.address == "/wek/outputs" )
            {
                for (0 => int i; i < NUM_LABELS; i++) {
                    push(msg.getFloat(i), i);
                }
                (curr_i + 1) % NUM_AVG => curr_i;
            }
        }
    }
}

fun void push(float value, int index) {
    value => MIDI_DATA[index][curr_i];
    0 => float sum;
    for (0 => int i; i < NUM_AVG; i++) { sum + MIDI_DATA[index][i] => sum; }
    sum / NUM_AVG => MIDI_DATA[index][NUM_AVG];
}

fun int discretize(float num) {
    Math.floor(128 * num) $ int => int result;
    if (result > 127) {
        return 127;
    } else if (result < 0) {
        return 0;
    }
    return result;
}

fun void sendMIDI() {
    cherr <= "sending MIDI message: ";
    for (0 => int i; i < NUM_LABELS; i++) {
        MIDI_LABELS[i] => msgOut.data2;
        discretize(MIDI_DATA[i][NUM_AVG]) => msgOut.data3;
        cherr <= " (" <= msgOut.data2 <= ", " <= msgOut.data3 <= ")";
        mout.send( msgOut );
    }
    cherr <= IO.newline();
}
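Two pieces of that relay are easy to miss: the running average that smooths jittery Wekinator outputs, and the discretize step that clamps a float into the 7-bit 0-127 range a MIDI control change expects. Here is the same logic sketched in Python (a slight variant: it averages only the samples seen so far rather than zero-padding the history):

```python
# Illustrative Python re-statement of the relay's two transformations:
# smooth each Wekinator output over the last NUM_AVG values, then
# discretize into 0-127 for a MIDI control change data byte.

NUM_AVG = 5

def running_average(history, value):
    """Append the newest value and return the mean of the last NUM_AVG."""
    history.append(value)
    del history[:-NUM_AVG]          # keep only the most recent NUM_AVG
    return sum(history) / len(history)

def discretize(num):
    """Map a float in [0, 1] onto a 7-bit MIDI value, clamping overshoot."""
    result = int(128 * num)         # floor for non-negative inputs
    return max(0, min(127, result))

history = []
smoothed = running_average(history, 0.8)
cc_value = discretize(smoothed)     # 0.8 -> 102
```

The smoothing matters because the MLP output wobbles a little even when the face is held still; averaging five frames trades a few milliseconds of latency for a stable filter cutoff.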

And this is the program that relays the facial-tracking data to Wekinator:

OscIn oin;
8338 => oin.port;
OscMsg msg;
oin.listenAll();

"localhost" => string hostname;
6448 => int port;
OscOut xmit;
xmit.dest( hostname, port );

float MOUTH_WIDTH;
float MOUTH_HEIGHT;
float EYEBROW_LEFT;
float EYEBROW_RIGHT;
float EYE_LEFT;
float EYE_RIGHT;
float POSITION[2];
float ORIENTATION[3];

// print
cherr <= "listening for messages on port " <= oin.port()
      <= "..." <= IO.newline();

spork ~ incoming();

while( true )
{
    1::second => now;
}

// listener
fun void incoming() {
    while( true ) {
        oin => now;
        while( oin.recv(msg) ) {
            cherr <= "RECEIVED: \"" <= msg.address <= "\": ";
            printArgs(msg);
            if( msg.address == "/gesture/mouth/width" ) { msg.getFloat(0) => MOUTH_WIDTH; }
            else if( msg.address == "/gesture/mouth/height" ) { msg.getFloat(0) => MOUTH_HEIGHT; }
            else if( msg.address == "/gesture/eyebrow/right" ) { msg.getFloat(0) => EYEBROW_RIGHT; }
            else if( msg.address == "/gesture/eyebrow/left" ) { msg.getFloat(0) => EYEBROW_LEFT; }
            else if( msg.address == "/gesture/eye/left" ) { msg.getFloat(0) => EYE_LEFT; }
            else if( msg.address == "/gesture/eye/right" ) { msg.getFloat(0) => EYE_RIGHT; }
            else if( msg.address == "/pose/position" ) { msg.getFloat(0) => POSITION[0]; msg.getFloat(1) => POSITION[1]; }
            else if( msg.address == "/pose/orientation" ) { msg.getFloat(0) => ORIENTATION[0]; msg.getFloat(1) => ORIENTATION[1]; msg.getFloat(2) => ORIENTATION[2]; }
        }

        send2wek();
    }
}

fun void send2wek()
{
    xmit.start( "/wek/inputs" );

    // print
    cherr <= " *** SENDING: \"/wek/inputs/\": "
          <= MOUTH_WIDTH <= " " <= MOUTH_HEIGHT <= IO.newline();

    MOUTH_WIDTH => xmit.add;
    MOUTH_HEIGHT => xmit.add;
    EYEBROW_LEFT => xmit.add;
    EYEBROW_RIGHT => xmit.add;
    EYE_LEFT => xmit.add;
    EYE_RIGHT => xmit.add;
    POSITION[0] => xmit.add;
    POSITION[1] => xmit.add;
    ORIENTATION[0] => xmit.add;
    ORIENTATION[1] => xmit.add;
    ORIENTATION[2] => xmit.add;

    xmit.send();
}

fun void printArgs( OscMsg msg )
{
    // iterate over arguments
    for( int i; i < msg.numArgs(); i++ )
    {
        if( msg.typetag.charAt(i) == 'f' ) // float
        {
            cherr <= msg.getFloat(i) <= " ";
        }
        else if( msg.typetag.charAt(i) == 'i' ) // int
        {
            cherr <= msg.getInt(i) <= " ";
        }
        else if( msg.typetag.charAt(i) == 's' ) // string
        {
            cherr <= msg.getString(i) <= " ";
        }
    }

    // new line
    cherr <= IO.newline();
}
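printArgs works by walking the OSC type tag string in lockstep with the argument list: the i-th character ('f', 'i', or 's') tells you how to decode the i-th argument. The same dispatch sketched in Python over an already-parsed message (the function and its inputs are made up for illustration; ChucK's OscMsg does the real parsing):

```python
# Illustration of type-tag dispatch: each character of the OSC type tag
# describes the corresponding argument, so decoding is a simple
# tag-by-tag walk. Unknown tags are skipped, as in printArgs.

def format_args(typetag, args):
    """Render OSC arguments the way printArgs prints them."""
    parts = []
    for tag, arg in zip(typetag, args):
        if tag == "f":                      # float argument
            parts.append(str(float(arg)))
        elif tag == "i":                    # int argument
            parts.append(str(int(arg)))
        elif tag == "s":                    # string argument
            parts.append(str(arg))
    return " ".join(parts)
```

For example, a message with type tag "fis" carries a float, an int, and a string, in that order.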