Multiple Audio Stream Recording

Posted by on May 18, 2017 in Hardware, Other Geeks

Don’t let the title fool you: this is a topic I’ve been chasing for some time because it’s not as simple as it sounds. In fact, I only stumbled onto what I’m describing here by happenstance.

My goal has been to find a way to get myself and at least one other person to converse, to record that conversation, and then to split each person into their own audio track. You might recognize this as something relevant to podcasting, and you’d be right: that is the ultimate goal.

The problem is that when it comes to audio on a Windows PC, there’s the local user (the person on the mic who is controlling the recording software) and there is literally everyone else. Whether it’s Skype, Discord, TeamSpeak, Mumble, Ventrilo, or whatever, all of the other participants are jumbled together into a single audio stream received by the person doing the recording.

In the worst-case scenario, the remote participants’ audio is merged with the local audio into a single track, meaning that when it comes time to edit, any cuts or filters are applied to everyone, no exceptions. That’s certainly passable, but not optimal: having each person on his or her own track would allow for discrete, person-by-person editing for volume, noise reduction, dead-space filtering, and so on.

So the other day I was trawling YouTube for videos on the Elgato Stream Deck setup when I came across a series by the silken-voiced EposVox, who spoke not only about the Stream Deck but also about OBS setup. In one video, he mentioned multiple audio sources, which, if you’ve used OBS, is nothing new. OBS allows for (at minimum) mic audio and desktop audio to be recorded alongside the video. While OBS is primarily used for streaming to Twitch/Beam/YouTube/etc., it can also record video and audio locally.

Now, I don’t know how other people do it. I suspect that a lot of folks record video using OBS or something similar, muting the mic so that they can record their voice-over in another app, like Audacity. That separates the video from the voice-over, but it then requires syncing the voice with the video, which can be unnerving if it’s even slightly off. But thanks to EposVox, I now know there’s a better way using OBS, an alternative audio output, a mic, and Audacity.

I’ll refer you to this video.

In a nutshell (if you skipped the video), OBS allows you to add additional audio inputs. You can then send each input to a different track, assuming you’re recording to any format other than FLV (so MOV, MKV, MP4, etc.). What you get in the end is a file with multiple audio tracks; depending on how you set it up, you might have one track with all the audio plus each input on its own track, or just each audio source on its own track. It’s the same tech that allows DVDs to have different language tracks.
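If you want to confirm how many audio tracks actually ended up in a recording before opening an editor, ffprobe (which ships with FFmpeg) can list them. The sketch below is a hypothetical helper, not part of OBS or Audacity: it assumes ffprobe is on your PATH, uses a made-up file name, and assumes ffprobe prints the requested fields in the order index, codec_name, channels.

```python
from typing import List, NamedTuple


class AudioStream(NamedTuple):
    """One audio stream as reported by ffprobe."""
    index: int     # absolute stream index in the container (video is usually 0)
    codec: str     # e.g. "aac" or "pcm_s16le"
    channels: int  # 1 = mono, 2 = stereo


def probe_command(path: str) -> List[str]:
    """Build an ffprobe call that prints one CSV line per audio stream."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "a",  # audio streams only
        "-show_entries", "stream=index,codec_name,channels",
        "-of", "csv=p=0",        # bare CSV lines, no "stream," prefix
        path,
    ]


def parse_probe_output(text: str) -> List[AudioStream]:
    """Turn ffprobe's CSV lines ("index,codec,channels") into AudioStream tuples."""
    streams = []
    for line in text.strip().splitlines():
        index, codec, channels = line.split(",")[:3]
        streams.append(AudioStream(int(index), codec, int(channels)))
    return streams
```

You would run the command with `subprocess.run(probe_command("recording.mkv"), capture_output=True, text=True)` and pass its `stdout` to `parse_probe_output`; an MKV with mic, Discord, and desktop audio on separate tracks should come back as three entries.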

Of course, just as a DVD player won’t play several audio tracks at once, getting at these multiple audio tracks isn’t straightforward. This threw me for a while because my video editing app doesn’t display all the audio tracks, only the first one it encounters. Since I only want the audio anyway, I learned that Audacity with the FFmpeg library installed can import the audio from a video file using the IMPORT > AUDIO option, which then lets me select the audio tracks from the file that I want to edit.
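As an alternative to Audacity’s import dialog (not what I used here), the FFmpeg command-line tool can pull each audio track out into its own WAV file with the `-map 0:a:N` option. This is a minimal sketch that only builds and prints the commands; the file names are hypothetical, and it assumes the tracks you care about are the first few audio streams in the file.

```python
import shlex
from pathlib import Path
from typing import List


def extract_track_command(src: str, track: int, dest: str) -> List[str]:
    """ffmpeg call that writes one audio track of a video file to a 16-bit WAV.

    `track` is zero-based and counts audio streams only, so 0 is the first
    audio stream in the file no matter where the video stream sits.
    """
    return [
        "ffmpeg", "-i", src,
        "-map", f"0:a:{track}",  # input 0, audio stream number `track`
        "-c:a", "pcm_s16le",     # uncompressed 16-bit PCM WAV
        dest,
    ]


def extract_all_commands(src: str, n_tracks: int) -> List[List[str]]:
    """One command per audio track, writing <name>_track1.wav, <name>_track2.wav, ..."""
    stem = Path(src).with_suffix("")
    return [
        extract_track_command(src, i, f"{stem}_track{i + 1}.wav")
        for i in range(n_tracks)
    ]


# Print the commands for a two-track recording (mic + Discord).
for cmd in extract_all_commands("recording.mkv", 2):
    print(shlex.join(cmd))
```

Each resulting WAV can then be dropped into Audacity (or any editor) as a separate track, which gets you the same per-person separation without needing FFmpeg support inside Audacity.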

I ran some tests with the Esteemed Mindstrike as my guinea pig on the other side of Discord. For my setup, I had OBS recording my Yeti mic for my voice, but I had to set Discord to output to the Yeti’s headphone output. That my mic has its own audio output is the aforementioned happenstance; otherwise, I’d need to go down the dark road of virtual audio cables to create a fake output device and send Discord’s output to that. In OBS, I set up an audio source for the mic (which was already there) and an additional audio source for the Yeti headphone output. The benefit was that I could plug the headset into the Yeti mic (duh) and listen and converse with Mindstrike as if nothing weird were going on. When OBS recorded, my mic audio went to one track, the Discord output to another, and had there been any desktop audio at the time, it would have gone to a third track (I turned off the multi-source channels for this test, just to be sure).

Once I got FFmpeg installed for Audacity, I imported my test file’s audio and got this:

The top two waveforms are my mic, and the bottom two are from Discord. Magical! For an actual production run, I’d probably want OBS to “downmix to mono,” because a single-position voice doesn’t need separate left and right channels; both the left and right output would simply come from the mono channel, leaving me with one waveform per track in Audacity to keep things clean.
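For a track that was already recorded in stereo, downmixing just means averaging the left and right samples into one. Here is a small sketch using only Python’s standard library, assuming a 16-bit stereo WAV file (for example, one exported from Audacity); it is an illustration of the idea, not how OBS implements its own option.

```python
import sys
import wave
from array import array


def downmix_int16(interleaved: array) -> array:
    """Average interleaved stereo 16-bit samples (L, R, L, R, ...) into mono."""
    left, right = interleaved[0::2], interleaved[1::2]
    # (a + b) // 2 of two 16-bit values always fits back into 16 bits.
    return array("h", ((a + b) // 2 for a, b in zip(left, right)))


def downmix_wav(src: str, dest: str) -> None:
    """Read a 16-bit stereo WAV file and write a mono copy of it."""
    with wave.open(src, "rb") as w:
        if w.getnchannels() != 2 or w.getsampwidth() != 2:
            raise ValueError("expected a 16-bit stereo WAV file")
        rate = w.getframerate()
        samples = array("h", w.readframes(w.getnframes()))
    if sys.byteorder == "big":  # WAV sample data is little-endian
        samples.byteswap()
    mono = downmix_int16(samples)
    if sys.byteorder == "big":
        mono.byteswap()
    with wave.open(dest, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(mono.tobytes())
```

When a mic’s signal is simply duplicated on both channels, the average equals either channel, so the mono track loses nothing; averaging only lowers the level when the two channels actually differ.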

Now, the obvious problem is that in a multi-user situation (me at the desk, a bunch of people in a Discord channel), I’m still only going to get two tracks: me, and everyone else. For my intended purpose, though, this is exactly what I needed. I don’t know if it’s possible for apps like Discord to pick up, send, and deliver individual voices on individual tracks; I suspect that would be horribly bandwidth- and CPU-intensive. So, for now, I’m glad I stumbled across this, and that I have hardware that just happens to support the exact situation I wanted to enable.