I created three wrapper libraries in C/C++ for speech-to-text, large-language-model (LLM) inference and text-to-speech, built on Whisper.cpp, Llama.cpp and Piper respectively.
Follow the URL for an example that shows how to combine these libraries into a single pipeline: speech-to-text, then LLM inference, then text-to-speech.