The tables are very similar, though you've added a custom calculator, which is a nice touch.
Also, for the Versus Comparison, it might be nice to have a checkbox that, when clicked, highlights at a glance the fields where each LLM comes out on top.
This page has up-to-date information on all models and providers: https://artificialanalysis.ai/leaderboards/providers On other pages we also cover Speech to Text, Text to Speech, Text to Image, and Text to Video.
Note I'm one of the creators of Artificial Analysis.
How do you see this differing from or adding to other analyses such as:
https://huggingface.co/spaces/TTS-AGI/TTS-Arena
https://huggingface.co/spaces/hf-audio/open_asr_leaderboard
https://huggingface.co/spaces/TIGER-Lab/GenAI-Arena
Great work on all the aggregation. The website is nice to navigate.
I was going to add reviews of each model but ran out of steam. Some users have messaged me saying the comparisons are still helpful to them for getting a sense of how different models respond to the same prompt, and how temperature affects the same model's output on the same prompt.
I'll try to make it as user-friendly as possible. Most of the existing websites are ugly and too technical.
I like your work visually at first glance, and God knows you're right about Gradio, even if it's irrelevant.
But peddling extremely limited, out-of-date versions of other people's data trumps that, especially with this tagline: "A website to compare every AI model: LLMs, TTSs, STTs"
It is a handful of LLMs, then one TTS model and one STT model, both with zero data. And it's worth pointing out, since this endeavor is motivated by design trumping all: every column is for LLM data.
According to this website, the cost is about half that of GPT-4o mini: $0.15 vs. $0.07 per 1M tokens.
Also, "best bang for the buck" is very subjective, since one person might only need it to work for a single use case while somebody else needs it for more.
But I think this moment mirrors financial markets during times of frenzy. When markets are volatile, one common piece of advice is to “wait and see”. Similarly, in AI, so many brilliant minds and organizations are racing to create groundbreaking innovations. Often, what you're envisioning as your next big project might already be happening, or will soon be, somewhere else in the world.
Adopting a “wait and see” strategy could be surprisingly effective. Instead of rushing in, let the dust settle, observe trends, and focus on leveraging what emerges. In a way, the entire AI ecosystem is working for you: building the foundations for your next big idea.
That said, this doesn't mean you can't integrate the state of the art into your own (working) products and services.
That being said, there is no free lunch: when you're doing this, you're more reactive than proactive. You minimize risk, but you also lose any chance to have a stake [1] in the few survivors that will remain and be extremely valuable.
Do this long enough and you'll have no idea what people are talking about in the field. Watch the latest Dwarkesh Patel episode to get a sense of what I am talking about.
[1] stake to be understood broadly as: shares in a company, knowledge as an AI researcher, etc.
That said, my perspective focuses more on strategic timing than on complete passivity. It's about staying engaged: understanding trends, staying informed, and preparing to act decisively when the right opportunity emerges. It's less about "waiting on the sidelines" and more about deliberate pacing, recognizing that it's not always necessary to be at the bleeding edge to create value.
I'll definitely check out Dwarkesh Patel’s latest episode. I assume it is the Gwern one, right? Thanks!
See https://huggingface.co/models?pipeline_tag=automatic-speech-...
Note: Text to Speech and Audio Transcription/Automatic Speech Recognition models can be trained on the same data. They currently require training separately as the models are structured differently. One of the challenges is training time as the data can run into the hundreds of hours of audio.
As for llmarena, I'll definitely add it; a lot of other people have recommended it as well.
Over time I'll make the website more descriptive and detailed!
In my own experiments with the chat models, they seem to lose the plot after about 10 replies unless constantly "refreshed", which is a tiny fraction of the supposed 128,000-token input length that 4o has. Does Gemini actually do something dramatically different, or is their 3-million-token context window pure marketing nonsense?
Anecdotally, I use NotebookLM a bit, and while that’s probably RAG plus large contexts (to be clear, this is a guess not based on inside knowledge), it seems very accurate.
Then I just continue from there, or simply use this as a seed in another fresh chat.
The website is updated, don't worry :)
As far as I know, there's Volcano Engine in China, which has impressive text-to-speech capabilities. Many local companies are using this model.
A small suggestion: a toggle to switch between "free" and hosted models.
The reason is, I'm obviously interested in seeing the cheaper models first, but I'm not interested in self-hosting, and the self-hosted models dominate the first chunk of results because they're "free".
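For what it's worth, here's a minimal sketch of what that toggle could look like on the data side, assuming each model entry carries a hosted flag and a per-1M-token price (the field names and sample values are my own guess, not the site's actual schema):

    from dataclasses import dataclass

    @dataclass
    class ModelEntry:
        name: str
        hosted: bool          # True if served by a paid API provider
        price_per_1m: float   # USD per 1M input tokens; 0.0 for self-hosted "free" models

    MODELS = [
        ModelEntry("gpt-4o-mini", hosted=True, price_per_1m=0.15),
        ModelEntry("some-open-model", hosted=False, price_per_1m=0.0),
    ]

    def cheapest_first(models, include_self_hosted=False):
        """Sort by price, optionally hiding the self-hosted 'free' entries."""
        visible = [m for m in models if include_self_hosted or m.hosted]
        return sorted(visible, key=lambda m: m.price_per_1m)

    # With the toggle off, only hosted models show up, cheapest first.
    print([m.name for m in cheapest_first(MODELS)])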
11labs, deepgram, etc.
You're missing a lot:
TTS: 11labs, PlayHT, Cartesia, iFLYTEK, AWS Polly, Deepgram Aura
STT: Deepgram (multiple models, including Whisper), Gladia Whisper, Soniox
just off the top of my head (it's my day job!)
1. Maybe explain what Chat, Embedding, Image generation, Completion, Audio transcription, and TTS (Text To Speech) mean?
2. Put a running number on the left, or at least show the total?
I've got one where "deploying" means updating a few version strings and image references in a different repo. The "build" clones that repo, makes the changes in the necessary spots, and makes a commit. Yes, the side effect I want is that the commit gets pushed--which requires my ssh key, which is not a build input--but I sort of prefer doing that bit by hand.
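For what it's worth, a rough sketch of that kind of build step in Python, assuming a hypothetical deploy repo and a values.yaml holding the image tag (the repo URL, file name, and regex are made up): it clones, rewrites the version string, and commits, but deliberately leaves the push as a manual step.

    import re
    import subprocess
    import tempfile

    DEPLOY_REPO = "git@example.com:me/deploy-config.git"  # hypothetical deploy repo
    NEW_VERSION = "1.2.3"                                 # version produced by the build

    def bump_and_commit(repo_url: str, version: str) -> str:
        workdir = tempfile.mkdtemp()
        subprocess.run(["git", "clone", repo_url, workdir], check=True)

        # Rewrite the image tag in the spot the deploy tooling reads.
        path = f"{workdir}/values.yaml"  # made-up file name
        with open(path) as f:
            text = f.read()
        text = re.sub(r"tag: .*", f"tag: {version}", text)
        with open(path, "w") as f:
            f.write(text)

        subprocess.run(["git", "-C", workdir, "commit", "-am", f"bump to {version}"], check=True)
        # No `git push` here: that needs the ssh key and stays a manual step.
        return workdir

    print("review and push from:", bump_and_commit(DEPLOY_REPO, NEW_VERSION))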
BTW impressive idea and upvoted on PH as well.
[0] https://azure.microsoft.com/en-us/pricing/details/cognitive-...
OpenAI and Azure should be the same; it's weird that it shows them as different. I'll look into fixing this.
Currently #2 on PH; any help would be appreciated!
I wonder if adding a chatbot might be a good idea. Users could ask specific questions based on their needs, and the bot could recommend the most suitable model. Perhaps this would add more value.