Comments by "Chrysippus" (@4.0.4) on "All You Need To Know About Running LLMs Locally" video.
Dunno why my comment isn't going through, but try Kobold! Better for GGUF. Current fav is "Crunchy Onion" Q4_K_M GGUF. Give it a taste! 10t/s on a 3090 and pretty smart.
For a while it was Mac-only, so it saw limited use among AI folks with Nvidia cards. If you're stuck on a Mac, though, I hear it really is the better option there.
ST can plug into Kobold, not just Ooba; notably, Kobold offers better GGUF inference. There's even a "Frankenstein" fork with a few novel 3-bit and even 2-bit (and now lower) quantization methods (below 2-bit not recommended). If you have a 3090 or 4090, don't use 7b models; you can run much better ones. Give the 8x7b models a try, such as Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss. I'd also recommend Crunchy-onion-Q4_K_M.gguf (little known, but fun).
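To see why a 24 GB card is worth pairing with larger quantized models, here's a rough back-of-envelope sketch of GGUF file/VRAM size: parameters times average bits per weight. The 4.5 bits/weight figure for Q4_K_M is an approximation (the exact ratio varies by tensor mix), and KV cache and runtime overhead are ignored.

```python
def gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough size estimate in GB: total parameters * average bits per weight.

    Ignores KV cache, context length, and runtime overhead.
    """
    return n_params * bits_per_weight / 8 / 1e9

# Q4_K_M averages roughly 4.5 bits per weight (approximation).
print(round(gguf_size_gb(7e9, 4.5), 1))     # 7b model
print(round(gguf_size_gb(46.7e9, 4.5), 1))  # Mixtral-style 8x7b (~46.7B total params)
```

A 7b model at Q4_K_M is tiny next to 24 GB of VRAM, which is why the comment suggests stepping up; an 8x7b quant lands near the card's limit, so backends like Kobold let you offload only part of the layers to the GPU.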