The tech behind the AirConsole Karaoke Machine

Today we launched our online karaoke machine. Our HTML5 app transforms your big screen into the karaoke display and your smartphone becomes the remote control. You can connect any number of smartphones and everyone can queue up songs.

The smartphones are connected to the big screen using the AirConsole API. In this blog post I'm going to dive into the karaoke tech we have built rather than into how the AirConsole platform works.

One of the coolest features of our web app is that the artist can help you sing. If you forget the lyrics, the artist will take over and will help you to get back into the song. Let's look at how we do this.

Singing together with the artist

We get our songs from YouTube. We automatically crawl the most popular karaoke YouTube channels for the latest songs and also search for the original music video. So far we have about 7500 karaoke songs in our library, and a few hundred of them have the "singing together with the artist" feature enabled. In the screenshot above, you can see that the original video clip, including the artist's audio, is running on the left side of the screen, while the karaoke version without the artist's audio is running on the right side. When you sing, we mute the original video clip. When you don't sing, we mute the karaoke version and unmute the original clip so you can hear the artist.
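
Here is a minimal sketch of that switch. It assumes both clips are embedded through the YouTube IFrame Player API, whose player objects expose mute() and unMute(). The function name applySingingState and its parameters are made up for illustration and are not our production code.

```javascript
// Sketch only: `original` and `karaoke` are assumed to be YouTube IFrame API
// player objects (or anything else that offers mute() and unMute()).
// `applySingingState` is a hypothetical name.
function applySingingState(isSinging, original, karaoke) {
  if (isSinging) {
    // The user sings: silence the artist, keep the instrumental audible.
    original.mute();
    karaoke.unMute();
  } else {
    // The user is silent: let the artist take over.
    karaoke.mute();
    original.unMute();
  }
}
```

Both videos keep playing the whole time. Only their audio is toggled, so the picture on both sides never stops.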

Syncing the songs

For this to work well, the two video clips need to be perfectly in sync. Most music and karaoke videos have an intro, and the intros differ in length, so you can't just start playing both YouTube videos and expect them to be in sync. You need to time-shift one against the other. Finding the correct time shift manually would be very time-consuming, so we developed an algorithm that looks at the audio tracks of both videos and tries different time shifts to minimize the difference between them. Once the difference is minimal, the songs are in sync. Well, most of the time. We have a hit rate of about 98%, and we review the results manually.
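
To give an idea of what "trying different time shifts" can look like, here is a simplified sketch. It assumes each audio track has already been reduced to an array of loudness values (one value per short time window) and uses the mean absolute difference as the measure to minimize. Both of those choices, as well as the names meanDifference and findBestShift, are assumptions for this example and not a description of our actual algorithm.

```javascript
// Mean absolute difference between a[i] and b[i + shift], computed over the
// overlapping part only. Returns Infinity if the overlap is too small to trust.
function meanDifference(a, b, shift, minOverlap) {
  let sum = 0;
  let count = 0;
  for (let i = 0; i < a.length; i++) {
    const j = i + shift;
    if (j < 0 || j >= b.length) continue;
    sum += Math.abs(a[i] - b[j]);
    count++;
  }
  return count >= minOverlap ? sum / count : Infinity;
}

// Brute-force search over all shifts in [-maxShift, maxShift].
function findBestShift(a, b, maxShift) {
  // Assumption: require at least half of the shorter track to overlap.
  const minOverlap = Math.floor(Math.min(a.length, b.length) / 2);
  let best = { shift: 0, difference: Infinity };
  for (let shift = -maxShift; shift <= maxShift; shift++) {
    const difference = meanDifference(a, b, shift, minOverlap);
    if (difference < best.difference) best = { shift, difference };
  }
  return best;
}

// Toy data: b is the same signal as a with three extra windows of intro.
const a = [0, 0, 1, 3, 2, 5, 1, 0, 4, 2];
const b = [0, 0, 0, 0, 0, 1, 3, 2, 5, 1, 0, 4, 2];
console.log(findBestShift(a, b, 5)); // { shift: 3, difference: 0 }
```

The resulting shift is measured in windows, so multiplying it by the window length gives the offset in seconds. Real tracks never reach a difference of exactly zero, which is one reason a manual review step is still useful.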

Detecting if the user is singing

We have different mechanisms to decide whether the artist audio should be played or whether the user wants to sing on their own. The easiest is the microphone button on the smartphone. If the user presses it, we know they want to sing and we mute the artist. But we also have a more advanced way. You can hook up a real microphone to your laptop, and we use the WebRTC getUserMedia and AudioContext HTML5 functionality to read the input volume of the microphone in JavaScript. When the volume is above a threshold, we know the user is singing and doesn't need any help, so we mute the artist. We could also use the smartphone's microphone, but because iOS does not support WebRTC yet, we decided to use the laptop's microphone instead.
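
The volume check could be sketched roughly like this. The RMS measure, the threshold of 0.05, the 100 ms polling interval and all function names are assumptions chosen for the example. The browser calls (navigator.mediaDevices.getUserMedia, AudioContext, AnalyserNode) are standard Web APIs, and the watchMicrophone part only runs in a browser.

```javascript
// Root-mean-square level of one buffer of samples in the range [-1, 1].
function rmsLevel(samples) {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (const s of samples) sumSquares += s * s;
  return Math.sqrt(sumSquares / samples.length);
}

// Assumed threshold; a real value has to be tuned for microphone and room.
const SINGING_THRESHOLD = 0.05;

function isSinging(samples, threshold = SINGING_THRESHOLD) {
  return rmsLevel(samples) > threshold;
}

// Browser-only wiring: polls the microphone level and reports changes.
// `onChange` is a hypothetical callback, e.g. one that mutes the artist.
async function watchMicrophone(onChange) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const context = new AudioContext();
  const analyser = context.createAnalyser();
  analyser.fftSize = 2048;
  context.createMediaStreamSource(stream).connect(analyser);

  const buffer = new Float32Array(analyser.fftSize);
  let singing = false;
  setInterval(() => {
    analyser.getFloatTimeDomainData(buffer);
    const now = isSinging(buffer);
    if (now !== singing) {
      singing = now;
      onChange(singing);
    }
  }, 100);
}
```

A single threshold flips back and forth during short pauses between words, so in practice you would probably add a short hold time before handing the song back to the artist.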

Recording yourself

Our karaoke machine can also record a video of you singing your favorite song. We only wanted to use HTML5 technology without plugins, which turned out to be trickier than we first thought. WebRTC getUserMedia has no implemented support for recording yet. There are some great JavaScript libraries like RecordRTC that help you record your videos locally, but different browsers use different video codecs and some can only record raw WAV audio. These files are huge and can't easily be shared by the user. Another problem is that Chrome, the most popular browser on AirConsole, records audio and video into two separate files that need to be merged. So we decided to add an autoscaling server-side component that merges these files, transforms them into a compact MP4 video and emails it to the users. Only one problem left: how do you transfer huge WAV files to the server? 40MB for a 3-minute video? No thank you. It's amazing what you can do in JavaScript these days, for example transcoding a WAV file into an MP3 inside the browser, using Web Workers for multithreading. Problem solved.
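
To see why raw WAV is a problem, here is a quick back-of-the-envelope calculation. The recording parameters (44.1 kHz, stereo, 16-bit) and the 128 kbit/s MP3 bitrate are assumptions for illustration. The exact numbers depend on what the browser records, but the order of magnitude matches the figure above.

```javascript
// Size of an uncompressed PCM WAV file: a 44-byte header plus raw samples.
function wavSizeBytes(seconds, sampleRate = 44100, channels = 2, bytesPerSample = 2) {
  const HEADER_BYTES = 44;
  return HEADER_BYTES + seconds * sampleRate * channels * bytesPerSample;
}

// Approximate size of a constant-bitrate MP3 (bitrate in kbit/s).
function mp3SizeBytes(seconds, kbps = 128) {
  return (seconds * kbps * 1000) / 8;
}

const threeMinutes = 3 * 60;
console.log((wavSizeBytes(threeMinutes) / 1e6).toFixed(1) + " MB"); // 31.8 MB
console.log((mp3SizeBytes(threeMinutes) / 1e6).toFixed(1) + " MB"); // 2.9 MB
```

Under these assumptions the MP3 is roughly a tenth of the WAV size, which is what makes the upload practical.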

Now give it a try!

Now it's time to try our AirConsole Karaoke Machine. We had tons of fun building the karaoke machine and sang way too much during development. Apologies to the rest of the people here in the office.

