Remove music from live audio stream, leaving just voices - possible?

Matt Holmes

New member
Sep 3, 2021
Charlotte, NC
Greetings. I'm looking for a filter or tool that can remove the music from a live video source (e.g., a NASA broadcast), while leaving the voices and non-musical noises. It doesn't have to be perfect, but it needs to remove/mute enough of the music to avoid triggering YouTube's copyright detection, and not mess up the voices too much. To be clear: I'm not trying to circumvent copyright rules. I'm trying to remove potentially-copyrighted incidental background music from an otherwise copyright-free broadcast. This is my "holy grail". The human brain can easily distinguish between music and speech, and can "filter" out the music in order to focus on the words. Surely there must be a way to do this programmatically, possibly with some sort of AI processing. A simple frequency filter is not sufficient; there are too many common frequencies between voice and music. There could even be a processing delay of a few seconds; it doesn't have to be an instantaneous filter - but the "scrubbed" audio output needs to be sync'able with the video.

Does such a filter or tool exist? I'd even be willing to pay a clever audio-knowledgeable programmer to develop one. I myself am a developer, but I have no knowledge of audio algorithms or AI.
 
If the feed is in stereo, I've occasionally had luck by eliminating anything not panned center (perhaps combined with a frequency filter) - this can be done by splitting the stream into sum/difference (mid/side) components, and then removing the difference (side) component from the stream.
There are undoubtedly cleverer approaches that build on the speech and music recognition algorithms used by digital assistants, but those algorithms are likely to be proprietary.
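For anyone who wants to try the mid/side idea, here is a minimal sketch in Python with NumPy. The function names (`split_mid_side`, `keep_center`) and the synthetic test signals are made up for illustration; nothing here comes from a specific tool. It assumes the audio is already available as a float array of shape (samples, 2). Note the limitation: only the difference (side) component is removed, so music that is itself mixed to the center will survive, and music panned to one side is only attenuated.

```python
import numpy as np

def split_mid_side(stereo):
    """Split a (samples, 2) stereo array into mid (sum) and side (difference)."""
    left = stereo[:, 0]
    right = stereo[:, 1]
    mid = 0.5 * (left + right)    # content common to both channels (panned center)
    side = 0.5 * (left - right)   # content that differs between channels
    return mid, side

def keep_center(stereo):
    """Discard the side component and return the mid signal on both channels."""
    mid, _side = split_mid_side(stereo)
    return np.column_stack([mid, mid])

# Synthetic example (assumption): a centered "voice" plus "music" that is
# mixed with opposite polarity in the two channels, i.e. pure side content.
rate = 48_000
t = np.arange(rate) / rate
voice = 0.5 * np.sin(2 * np.pi * 220 * t)
music = 0.3 * np.sin(2 * np.pi * 440 * t)
stereo_in = np.column_stack([voice + music, voice - music])

cleaned = keep_center(stereo_in)
```

Because the operation works sample by sample, it can be applied to each incoming buffer of a live stream without adding delay, so the output stays aligned with the video.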
 
Interesting technique! I haven't tried that. Something to play with. Thank you.
 
On other forums, the same question drew comments about various plugins and SpectraLayers, stuff like that. I don't think he returned for the responses there, unlike here!
 
There is a way to reduce the music without any special tools. Split the audio of the stream or video into sum and difference components. The difference component holds whatever is not panned center, so you can isolate it and remove it from the original. I'm almost sure there are other ways to do this as well, and adding a filter could make it more manageable. I used to remove music with the method described above so that I wouldn't have any issues posting the stream. I would leave the music in only if it was license-free and wouldn't cause any copyright issues.
 
Yeah, man, it's possible. There are many platforms, programs, and applications that can separate out the part you need. Typically it works as follows: you upload the video from which you need to extract the audio, then click the appropriate function. This action is often called "extraction." If you need it often, the joincombo.com platform can be useful. The advantages I have found are its user-friendly interface and the ability to use it for various purposes and tasks. What other effective tips have you been given?
 
Such tools would work for "offline" content, but not for live streams. I need to try the "sum/difference" technique mentioned above.