Hearted YouTube comments on the AI Search (@theAIsearch) channel.

  1. 5100
  2. 2500
  3. 2400
  4. 1400
  5. 1400
  6. 1300
  7. 1300
  8. 1200
  9. 1200
  10. 1100
  11. 948
  12. 906
  13. So I made some discoveries which could help some people.
      -Chunk: Increase this if you experience any distortion or voice lag when gaming with this. This adds more graphics processing time, and with a longer setting it won't rush out bad audio.
      -Extra: This setting gives bonus CPU usage to help iron out the audio. I found that sometimes the changer would translate an F sound to an S sound, but adding a bit of "extra" CPU to it (like the 8k setting or higher) fixes the problem. You don't want to max this out unless the only thing you're doing is using your voice, as maxing it out will use all or nearly all of your CPU.
      -Noise: I recommend the Sup2 option if you have an air conditioner or other background noise. Sup1 didn't work as well and is probably for a different frequency range, so your mileage may vary.
      When you start real-time voice changing, you'll see some info in a box with millisecond timers. The thing to watch is the "res" time. If this time starts climbing, from around 300 ms to 1,000, then 2,000, and so on, the computer can't get the voice processing out in time; you could say it's being pushed back in priority. The fix is to increase the Chunk. This gives it more time to process with the remainder of your resources, and after switching you should see the number start dropping rapidly. If it doesn't, just raise it even higher, and again keep in mind that if you're doing something CPU-intensive, you need to keep the Extra setting fairly low (around 8k).
      I have a powerful computer (10-core i9-10900KF with a reference 3080 Ti), and I found that if I'm going to play a "serious" game like GTA V or Star Citizen, it's best to have the Chunk as high as 192 or 256, with the Extra set to 8192. If you're just on Discord, or playing some very light game, you can crank the Extra up and reduce the Chunk to maintain high-quality audio but process it considerably faster.
      Hope this helps someone!! Good luck o/
    695
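The tuning advice in comment 13 (watch the "res" time and raise the Chunk when it keeps climbing) can be sketched as a small Python helper. This is only an illustration of the reasoning: the function names, the list of chunk values, and the "three rising readings" test are all assumptions, and the voice changer itself does not expose such an API. In practice you change the setting by hand in the GUI.

```python
# Hypothetical chunk steps: the comment only mentions 192 and 256,
# the other values are assumed for illustration.
CHUNK_STEPS = [64, 96, 128, 192, 256, 384, 512]


def res_is_climbing(res_ms, window=3):
    """Return True if the last `window` "res" readings rise strictly."""
    recent = res_ms[-window:]
    if len(recent) < window:
        return False
    return all(a < b for a, b in zip(recent, recent[1:]))


def suggest_chunk(current_chunk, res_ms):
    """Suggest the next larger chunk step when "res" keeps rising.

    Returns the current value unchanged if "res" is stable or if
    there is no larger step left to try.
    """
    if not res_is_climbing(res_ms):
        return current_chunk
    larger = [c for c in CHUNK_STEPS if c > current_chunk]
    return larger[0] if larger else current_chunk
```

For example, readings of 300, 1000, and 2000 ms at a chunk of 128 would suggest moving up to 192, while readings that hover around 300 ms leave the setting alone.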
  14. 617
  15. 505
  16. 499
  17. 478
  18. 457
  19. 440
  20. 420
  21. 404
  22. 387
  23. 344
  24. 331
  25. I've been experimenting with this for a bit, and I'm disappointed by how vague and incomplete the English documentation on these settings is. In an effort to remedy this, here's my breakdown of each setting:
      Response threshold: Controls the noise gate. Any sound below the threshold is suppressed. This prevents background noise and hiss from being turned into strange mumbling. Equivalent to "S. Threshold" in w-okada. Not applicable in RVC WebUI.
      Pitch settings: Applies a pitch offset to your input voice. Each multiple of 12 raises or lowers the voice by an octave; each step of 1 raises or lowers it by a semitone. Shifting by whole octaves is mainly useful for ensuring you can sing in the same key. Equivalent to "TUNE" in w-okada. Equivalent to "Transpose" in RVC WebUI.
      Index rate: When an index file is provided, this slider augments the target voice by preserving more of its accent and less of the input voice's (to reduce tone leakage). This is particularly useful for voices trained with a low epoch count (around 200 or fewer). If set too high, it can cause strange pronunciation artifacts. I usually find that something around 0.30 sounds good, but it varies by voice model. Equivalent to "INDEX" in w-okada. Equivalent to "Search feature ratio" in RVC WebUI.
      Loudness factor: Controls how much of the input performance's loudness is discarded. At 0, the loudness of the cloned voice should match the loudness of the input voice. At 1, the cloned voice will always be at full loudness. 0 is useful if you want to distinguish between whispers, talking, screaming, etc. 1 is useful to have the cloned voice always speak loudly and clearly, as loud as the loudest things it was trained on (which can include artifacts such as mic clipping, depending on the training set). Values in between provide partial volume control, biased toward being louder the closer you get to 1. There is no equivalent in w-okada. Equivalent to "volume envelope scaling" in RVC WebUI.
      Pitch detection algorithm: Different algorithms are better at different things. rmvpe is the current state of the art; it is the fastest and usually gives the highest quality. Equivalent to "F0 Det." in w-okada. Equivalent to "pitch extraction algorithm" in RVC WebUI.
      Sample length: The realtime voice changer works by sending small chunks of audio for quick conversion, then stitching them together. Longer sample lengths feed in longer chunks, making the stitches less obvious and reducing GPU requirements but increasing output latency. On a low-end GPU, setting this too low will leave the GPU unable to keep up and produce stutters. On a high-end GPU, setting this too low will cause warbling as an artifact of stitching many overly short chunks together. Equivalent to "CHUNK" in w-okada. Not applicable in RVC WebUI.
      Number of CPUs: Self-explanatory. Note, however, that rmvpe is a GPU-based pitch extractor and should be relatively unaffected by this setting. There is no equivalent in w-okada. Not applicable in RVC WebUI.
      Fade length: The length of the overlap between chunks that is crossfaded together. Longer may reduce warbling. Equivalent to "overlap" in w-okada advanced settings. Not applicable in RVC WebUI.
      Extra inference time: How much old audio to load into each chunk. The extra context usually improves voice quality for the generated chunk but is more demanding for the GPU. Equivalent to "EXTRA" in w-okada. Not applicable in RVC WebUI.
      Input noise reduction: Attempts to remove non-speech background noise from the input to prevent sounds from being turned into strange mumbling. Equivalent to "NOISE" in w-okada. Not applicable in RVC WebUI.
      Output noise reduction: Applies the same noise reduction to the output voice. Possibly good for poorly trained voices with lots of background noise. There is no equivalent in w-okada, but the usefulness of this setting is dubious. Not applicable in RVC WebUI.
      Input voice monitor: Lets you hear the audio being passed into the voice changer by sending it to the target output device. Useful to ensure you are passing in the audio you actually want, or to pass your audio through without voice changing. Comparable to the "monitor" settings in w-okada. Not applicable in RVC WebUI.
      Output converted voice: Outputs the voice conversion to the target output device.
      Main features RVC realtime has that w-okada doesn't:
      - Loudness factor controls. w-okada seems to always use a value of 0.
      - Significantly lower CPU usage at equivalent performance settings, in my experience.
      Main features that w-okada has that RVC realtime doesn't:
      - No system to save model presets.
      - Input/output gain is missing.
      - Input noise reduction is less robust compared to w-okada, which offers echo reduction and multiple noise suppression techniques.
      - Unlike w-okada, you cannot pass through to the input mic; you need a virtual audio cable to pass the cloned voice into voice calls and microphone recording programs.
      - In w-okada, when the mic loudness falls below the response threshold, the tool is paused until speech is once again loud enough, saving GPU and CPU resources. RVC realtime always passes audio whenever it is running.
      - Unlike w-okada, you cannot monitor the cloned voice while outputting it. You can work around this by using the "listen" feature in the Windows sound panel on a virtual audio cable instead.
      - No built-in recording functionality.
      - Missing most of the settings in the w-okada "advanced settings" menu.
      - No way to choose which GPU to run the voice model on. You can get around this by setting CUDA_VISIBLE_DEVICES=# in a terminal before launching the tool from there, where # is the index of your target GPU (0, 1, 2, etc.).
    319
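The "Sample length" and "Fade length" descriptions in comment 25 (convert short chunks, then crossfade them at the seams) can be illustrated with a minimal Python sketch. It assumes a plain linear crossfade and assumes each chunk begins with `fade_len` samples that overlap the end of the previous one. The actual tools may use a different window shape or overlap scheme, and `crossfade_stitch` is a made-up name, not part of RVC or w-okada.

```python
def crossfade_stitch(chunks, fade_len):
    """Join audio chunks, linearly crossfading `fade_len` samples per seam.

    `chunks` is a list of lists of float samples. Each chunk after the
    first is assumed to start with samples that overlap the tail of the
    audio stitched so far.
    """
    if not chunks:
        return []
    out = list(chunks[0])
    for chunk in chunks[1:]:
        # Never fade over more samples than either side actually has.
        n = min(fade_len, len(out), len(chunk))
        for i in range(n):
            # Weight of the incoming chunk rises across the seam.
            w = (i + 1) / (n + 1)
            tail_idx = len(out) - n + i
            out[tail_idx] = (1.0 - w) * out[tail_idx] + w * chunk[i]
        # Append the part of the chunk that lies past the overlap.
        out.extend(chunk[n:])
    return out
```

With a longer fade the transition between chunks is spread over more samples, which matches the comment's note that a longer fade length may reduce warbling. With `fade_len=0` the chunks are simply concatenated.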
  26. 263
  27. 261
  28. 229
  29. 205
  30. 193
  31. 171
  32. 169
  33. 161
  34. 138
  35. 135
  36. 132
  37. 129
  38. 128
  39. 125
  40. 120
  41. 117
  42. 116
  43. 111
  44. 110
  45. 106
  46. 100
  47. 91
  48. 90
  49. 89
  50. 88
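The GPU-selection workaround at the end of comment 25 (setting `CUDA_VISIBLE_DEVICES` before launching the tool) can also be done from a small Python launcher instead of typing it in a terminal. The sketch below only builds the environment; the helper name `env_for_gpu` and the launch command in the usage note are placeholders, since the comment does not say how the tool is started.

```python
import os


def env_for_gpu(gpu_index, base_env=None):
    """Return a copy of the environment restricted to one CUDA GPU.

    `gpu_index` is the index of the target GPU (0, 1, 2, ...), as in
    the comment. The original environment is left unmodified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


# Usage (the script name is a placeholder, not the tool's real entry point):
# import subprocess
# subprocess.run(["python", "your_launch_script.py"], env=env_for_gpu(1))
```

The variable has to be in place before the process initializes CUDA, which is why it is passed at launch. Inside the launched process, the single visible GPU then shows up as device 0.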