I tried the Audacity noise-removal plugin recently and it's complete crap. I fed a high-quality audio stream from a Rode mic into a few different options to see which could remove the noise of my server rack. iMovie made the voice sound like a robot and Audacity barely did anything. The only thing that worked was DeepFilterNet, and it's free, open-source, and cargo-installable.

There's no reason to lock yourself into an Intel-only solution. Just use DeepFilterNet. The results on my noisy server room recording were insanely good: almost no voice dropout with 100% fan-noise removal.

https://github.com/Rikorose/DeepFilterNet

EDIT: Even more interesting, it looks like OpenVINO is just DeepFilterNet glued to Whisper.cpp and tied to Intel hardware.

https://github.com/intel/openvino-plugins-ai-audacity/tree/m...

OpenVINO is an Intel toolkit for deploying AI models. This particular project is an Intel project using the OpenVINO toolkit to package several existing models as Audacity plugins.
> an intel-only solution.

> OpenVino is just DeepFilterNet glued to Whisper.cpp and tied to Intel hardware.

Well, no.

When you want to run a model on a truly wide set of devices, you end up sort of wedged into one of ONNX, OpenVINO, TensorFlow Lite, or a few other frameworks.

They're all FOSS, and they're software libraries.

YMMV on which is best, of course, but broadly and widely: where are your users, mostly? Desktop? OpenVINO. Web? TensorFlow. Mobile and desktop? ONNX. This isn't entirely accurate because, e.g., I reach for ONNX every time simply because that's what I'm familiar with. All of them make an effort to reach every platform; e.g., OpenVINO supports ARM, and not in a trivial manner.

That all being said, TL;DR:

It is "not even wrong", in the Pauli sense, to imply OpenVINO is Intel-only, or to describe OpenVINO as "just glu[ing a model to inference code]".

You're describing three different components (a hardware-acceleration library, an inference library, and a model) and suggesting the hardware-accelerated inference library just glues together a model-specific inference library and a model. The matryoshka doll is inverted: whisper.cpp uses OpenVINO to accelerate its model-specific inference code.

The Noise Removal plugin takes a bit of getting used to, but I have had great results from it. I don't mean to point to operator error... it just has too many options for someone new to it to tune.
This only works with Intel GPUs, CPUs, and NPUs. No Nvidia support, for instance.

https://docs.openvino.ai/2024/about-openvino/release-notes-o...

I used to work at Intel doing OpenVINO stuff. Should work on AMD too; it's just not validated for it so there might be quirks.
vient · 3 weeks ago
They have some NVIDIA support in the form of external project: https://github.com/openvinotoolkit/openvino_contrib/tree/mas...
Nvidia has Broadcast, which is Windows-only:

https://www.nvidia.com/en-us/geforce/broadcasting/broadcast-...

It doesn't work on files, so it can't be used in Audacity, only on live mic audio.
Fair point! I don't suppose Windows allows PulseAudio-style routing of audio streams to use denoising on a fake microphone.
I ran it on my AMD 5950X with an RTX 3090. It crashed when I attempted GPU processing but ran fine on CPU. YMMV.
So use your CPU.
htsh · 3 weeks ago
A lot of us have Ryzen/Nvidia combos... hopefully, soon, though.
OpenVINO runs fine on AMD, last I checked.
Maybe it does; however, the system requirements page makes it look like it supports everything BUT AMD.

https://docs.openvino.ai/2024/about-openvino/release-notes-o...

It supports AMD CPUs because, if I understand correctly, AMD licenses x86 from Intel, so its CPUs share the same bits needed to run OpenVINO as Intel's CPUs.

Go look at CPU benchmarks on Phoronix; AMD Ryzen CPUs regularly trounce Intel CPUs at OpenVINO inference.

Or use the underlying open-source models directly; this is just several existing open models packaged with an Intel-specific deployment framework and wrapped as Audacity plugins.
This is a great suggestion and all, but don’t you need a frontend/pipeline to run data through these models?
There are existing frontends for these models that aren't tied to Intel hardware. They may be somewhat less convenient than having the models packaged as Audacity plugins, but they certainly exist, for people who want to use the models without being limited to Intel hardware.
It might work with AMD CPUs too.
jogu · 3 weeks ago
I've used this plugin on an AMD CPU, it definitely works.
Is there a tool that can remove a song from a very noisy audio recording, using the actual song as a reference?

I found a very old audio cassette from my childhood with me and some other kids talking while a song plays in the background. I tried subtracting the song using Audacity, but for that to work the reference song and the recording must align "perfectly", which is very, very hard. It's not just the timing (which I found can be a problem with cassettes); the loudness/frequency distribution must also align perfectly.

I found Smartsubtract https://oxfordwaveresearch.com/products/smartsubtract/ which seems to do exactly that, but it's not available for download.

Is there any tool (AI, even?) that might do that? I tried an online AI tool which claimed it could extract voices, but it returned silence. I want to try OpenVINO, but I'm not sure it will be useful with faint spoken words in a noisy environment with a song.

I don't know if there's an available tool to do it, but "assuming these sources are aligned in time, remove this reference B from that recording A" would be quite a nice undergrad problem. You'd do something like cross-correlate A with B, multiply B by the correlation coefficient and then subtract the result from A. In the frequency domain, because that makes things a little easier.

The next question on the problem would be "Give at least three reasons why this doesn't perfectly remove the reference sound," of course.
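The sketch above can be prototyped in a few lines of NumPy. This is a rough illustration, not a real tool: it assumes the recording is voice plus a gain-scaled, integer-sample-delayed copy of the reference, with no tape drift or filtering (`subtract_reference` is an illustrative name, not from any library):

```python
import numpy as np

def subtract_reference(a, b):
    """Subtract reference signal b from recording a.

    Assumes a = voice + g * (b delayed by an integer lag).
    Estimates the lag by FFT-based cross-correlation and the
    gain g by least-squares projection, then subtracts.
    """
    n = len(a)
    b = b[:n]
    # Cross-correlate via the FFT (zero-padded to 2n to avoid wrap).
    A = np.fft.rfft(a, 2 * n)
    B = np.fft.rfft(b, 2 * n)
    xcorr = np.fft.irfft(A * np.conj(B))
    lag = int(np.argmax(xcorr))
    if lag > n:            # indices above n correspond to negative lags
        lag -= 2 * n
    # Align the reference to the recording.
    b_aligned = np.roll(b, lag)
    # Least-squares gain: g = <a, b_aligned> / <b_aligned, b_aligned>.
    g = np.dot(a, b_aligned) / np.dot(b_aligned, b_aligned)
    return a - g * b_aligned
```

The "three reasons this fails" question answers itself on real tape: the lag isn't constant (wow/flutter), the gain and frequency response differ between reference and recording, and the delay isn't an integer number of samples.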

I would try openvino, it might get you part of the way there.

Other things to do would be to fix any tape warble or flutter, normalize the volume, and do simple things like high-pass filtering everything below 75 Hz (most voices don't produce audible volume at those frequencies, especially children's voices).
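For the 75 Hz cut, here is a crude NumPy-only sketch (a brick-wall FFT filter; `highpass` is an illustrative name). In practice a gentle IIR filter such as a Butterworth from scipy.signal is preferable, since a brick wall can ring audibly:

```python
import numpy as np

def highpass(audio, sr, cutoff=75.0):
    """Zero out all frequency content below `cutoff` Hz.

    Brick-wall FFT filter: simple to understand, but the sharp
    edge can cause audible ringing on real material.
    """
    spec = np.fft.rfft(audio)
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / sr)
    spec[freqs < cutoff] = 0.0          # kill the rumble bins
    return np.fft.irfft(spec, len(audio))
```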

Then I would get a spectrum analyzer plugin and see if there are any spots that are clearly music vs children speaking and zap them out.

(Audition is pretty good software for this, you might still be able to find the download and mass unlock serial key that Adobe released for version 3.0 somewhere on the internet, of course, this is only for people who bought it as a perpetual license back in the day and need to activate it now that the activation servers have gone offline, so no pirating it!)

I'm not gonna say it will be perfect, but you might do well enough to be able to hear what everyone is saying without it sounding too bizarre.

I tried Photosounder, which works on FFTs of the sounds. I expected it to have many of Photoshop's powers to crop/copy/subtract spectra, which I couldn't do with it. It has an option for layers, but it's not as intuitive as Photoshop-like tools.

Also, looking at the frequency spectrum and removing sounds from there, I learned that music and speech both contain a bunch of different frequencies. To completely eliminate the music, I'd have to remove all of them, which is not easy to see or trace manually.

From what you're saying, given Audition is an Adobe tool, it should be sufficient.

oDot · 3 weeks ago
Try iZotope RX
Thanks for the suggestion. A quick lookup suggests it should be able to do it. Will give it a try.
kmfrk · 3 weeks ago
I'm a big fan of RTX Voice, but it seems like the kind of feature you can only use in real time as a virtual audio device, not for postprocessing. Anyone know if Nvidia makes this possible?
pabs3 · 3 weeks ago
I wonder if these are open models, like RNNoise now is.

https://github.com/xiph/rnnoise