For a quick test, I uploaded a photo of my home office and gave it the following prompt: "Retouch this photo to fix the gray panels at the bottom that are slightly ripped, make them look brand new"
Input image (rescaled): https://i.imgur.com/t0WCKAu.jpeg
Output image: https://i.imgur.com/xb99lmC.png
I think it did a fantastic job. The output image quality is ever so slightly worse than the original but that's something they'll improve with time I'm sure.
A realtor listed a property in our area and used generative AI to "re-imagine" it, because the original owner, who had bought it in 1950 and died in it in 2023, had done zero maintenance or upgrades. People who showed up to see it were really super pissed. The realtor argued this was just the next step after staging, but it certainly didn't work here. They took it off the market and a bunch of people showed up to fix it up (presumably from the family, but one never knows).
When I last bought a used car I found it in a classified newspaper ad: there was no picture.
I looked at every car I considered in-person.
When I found one I liked I paid for an independent pre-purchase inspection, discovered a crack in the radiator, and negotiated the price down to cover my post-sale expense fixing it.
The "success" of Craiglist is that it exposes you item to a wider pool of buyers, which increases the chance that the one person who really wants it, will see it. And if they really want it they are motivated to go out of their way to get to it. But if even the pictures lie and you don't know what you're getting until you get there, your willingness to take the risk and drive out is reduced, which means people will have items that might have sold if you were trusted.
This happens on eBay too. Sellers list something that isn't as described, and fraudulent sellers will say "but it is! This buyer is trying to scam me," and eBay usually sides with the seller.
My prediction (and hey, it's just a guess) is that if people start using these tools to "enhance" the images they use to sell stuff and it becomes regular practice, then the total population of people who use Craigslist will go down and prices overall will drop as that fraud gets priced in. Sellers won't get as much as they think they should and will stop selling there. If it drops below critical mass, the service suffers.
This is not my experience at all, and I've used eBay since 2008. eBay is pro-buyer to the point that I don't sell anything on eBay (and buy everything on eBay if the price is the same).
I was going to say the same thing. The car in the picture may not have a broken headlight while the one in reality does, but if it takes the person more than two hours just to go see the car, they may still end up buying it anyway because they have already invested too much time (and possibly money) into it.
- No, but people do it anyway due to anxiety
- People can be pressured, the trick is to meet them the first time
- People say they care about faces, but don't actually care about faces
It happened to me, too. I did not find someone particularly attractive at first, but their experiences, their views on relationships, the world, and so forth somehow ended up making them look more attractive.
Especially when the house is vacant/empty, it helps to see a proposed layout so you can imagine living there.
OpenAI just yesterday added the ability to do higher-fidelity image edits with their model [1], though I'm not sure if the functionality is API-only or if their chat UI will make use of it too. Same prompt and input image: [2]
I couldn't help but notice that you can still see the shadows of the rips in the fixed version. I wonder how hard it would be to get those fixed as well.
The input image is scaled down to roughly 1 megapixel at the closest supported aspect ratio.
I ran some experiments with Kontext and added a slider so you can see the before / after of the isolated changes it makes without affecting the entire image.
https://specularrealms.com/ai-transcripts/experiments-with-f...
Incidentally, and veering off topic, I find it extremely annoying that to open both pictures I need to click numerous times to avoid receiving unwanted cookies (even if some are "legitimate", implying others are not). A further nuisance is that multiple websites have the same cookie vendor pop-up, suggesting there is a "cookies-as-a-service" vendor of some sort.
I don't know how much tool use there is these days, with the LLM "just" calling an image-generation model after a bunch of prompt reformulation for the text-to-image model, which is most likely a "steerable" diffusion model (really nice talks by Stefano Ermon on YouTube!).
Actually, multimodal models usually have a vision-encoder submodel that translates image patches into tokens, and then the pretrained LLM and vision model are jointly finetuned. I think reading the reports about Gemma or Kimi-VL will give a good idea here.
Three thumbs up. Part of the community which _knows_ what peak gaming is (and natively on ELF/Linux...)
From the "One more thing" on the Mistral Medium 3 blog post in May:
> With the launches of Mistral Small in March and Mistral Medium today, it’s no secret that we’re working on something ‘large’ over the next few weeks. With even our medium-sized model being resoundingly better than flagship open source models such as Llama 4 Maverick, we’re excited to ‘open’ up what’s to come :)
That version ought to close the gap with today's top large models enough that it no longer matters much. And the Cerebras speed will (and already does) make it feel awesome compared to ChatGPT.
We don’t need another heat source there, the glaciers are already having a hard time.
What about their infra? Is nothing running in France?
They are very good at making smaller models, not the smartest or the most knowledgeable, but you generally get pretty clean results quickly. Also, in my experience, they are less heavy-handed than others when it comes to censorship.
I fire up the IDE, switch the model, and think: oh great, this is better. Then I switch to something that worked before and, man, it sucks now.
Context-switching LLMs: Model Release Fatigue.
Fine-tuning will work better for niche business use cases than promises of AGI.
I was listening to a Taiwanese news channel earlier today and although I wasn't paying much attention, I remember hearing about how Chinese AIs are biased towards Chinese political ideas and that some programme to create a more Taiwanese-aligned AI was being put in place.
I wouldn't be surprised if, for this reason alone, at least a few different open models kept being released: even if they don't directly bring in money, several actors care more about spreading or defending their ideas, and AIs are perfect for that.
One theory is that they believe the real endpoint value will be embodied AIs (i.e. robots), where they think they'll hold a long-term competitive advantage. The models themselves will become commoditized, under the pressure of the open-source models.
Hats off to the folks who have decided to deal with the nascent versions though.
The server is basically just my Windows gaming PC, and the client is my editor on a macOS laptop.
Most of this effort is so that I can prepare for the arrival of that mythical second half of 2026!
[1] https://github.com/ollama/ollama/blob/main/docs/faq.md#how-d...
[2] https://huggingface.co/collections/Qwen/qwen25-coder-66eaa22...
Not useful though; I just like the idea of having so much compressed knowledge on my machine in just 20 GB. In fact, I disabled all Siri features because they're dogshit.
In particular it’s important to get past the whole need-to-self-host thing. Like, I used to be holding out for when this stuff would plateau, but that keeps not happening, and the things we’re starting to be able to build in 2025 now that we have fairly capable models like Claude 4 are super exciting.
If you just want locally runnable commodity “boring technology that just works” stuff, sure, cool, keep waiting. If you’re interested in hacking on interesting new technology (glances at the title of the site) now is an excellent time to do so.
If they have to enshittify, I don’t want that baked into my workflow. If they have to raise prices, that changes the local-vs-remote trade-off. If they manage to lower prices, then the cost of running locally will be reduced as well.
I’m also not sure what the LLMs that I’d want to use look like. No real deal-maker applications have shown up so far; if the good application ends up being something like “integrate it into neovim and suggest completions as you type” obviously I won’t want to hit the network for that.
Early days still.
I can understand it maybe if you’re spending hours setting it up, but to me these are download-and-go.
Maybe something like a collective that buys the GPUs together and then uses them without leaking data could work.
Maybe 128 GB of VRAM becomes the new mid-tier, and most LLMs can fit into this nicely and do everything one wants in an LLM.
Given how fast LLMs are progressing, it wouldn't surprise me if we reach this point by 2030.
I hope I'm wrong though, and we see a large bump soon. Even just 32GB in the mid tier would be huge.
I'm really tempted to try out a Mac Studio with 256+ GB Unified Memory (192 GB VRAM), but it is sadly out of my budget at the moment. I know there is a bandwidth loss, but being able to run huge models and huge contexts locally would be quite nice.
I use AI mostly for problems on my fringes. Things like manipulating some Excel table somebody sent me with invoice data from one of our suppliers, plus some moderately complex question that they (pure business) don't know how to handle, where simple formulas would not be sufficient and I would have to start learning Power Query. I can tell the AI exactly what I want in human language and don't have to learn a system that I only use because people here use it to fill holes not yet served by "real" software (databases, automated EDI data exchange, and code that automates the business processes). It works great, and it saves me hours on fringe tasks that people outsource to me but that I, too, don't really want to deal with too much.
For example, I also don't check various vendors and models against one another. I still stick to whatever the default is from the first vendor I signed up with, and so far it worked well enough. If I were to spend time checking vendors and models, the knowledge would be outdated far too quickly for my taste.
On the other hand, I don't use it for my core tasks yet. Too much movement in this space, I would have to invest many hours in how to integrate this new stuff when the "old" software approach is more than sufficient, still more reliable, and vastly more economical (once implemented).
Same for coding. I ask AI on the fringes where I don't know enough, but in the core that I'm sufficiently proficient with I wait for a more stable AI world.
I don't solve complex sciency problems, I move business data around. Many suppliers, many customers, different countries, various EDI formats; everybody has slightly different data and naming and procedures. For example, I have to deal with one vendor wanting some share of pre-payment early in the year, which I then have to apply across thousands of invoices over the year, and track when we have to pay hundreds or thousands of invoices, all with different payment conditions and timings. If I were to ask the AI, I would have to be so specific I might as well write the code.
But I love AI on the not-yet-automated edges. I'm starting to show others how they can ask an AI, and many are surprised how easy it is, when you have the right task and know exactly what you have and what you want. My last colleague-convert was someone already past retirement age (still working on the business side). I think this is a good time to gradually teach regular employees some small use cases to get them interested, rather than some big top-down approach that mostly creates more work and leaves many people rightly questioning what the point is.
As for politically tinged questions, like whether I should use an EU-made AI like the one this topic is about rather than one from a US vendor that already dominates much of the software world: I don't care at this point, because I'm not yet creating any significant dependencies. I am glad to see it happening though (as an EU-country citizen).
Another nice thing about waiting a bit: one can see how much of a penalty (if any) the EU models pay for doing things somewhat ethically. I suspect it won't be much.
I’m big on AI, but vibe coding is such a fuck around and find out situation.
But using AI tools for things like completing simple functions (Copilot) or asking questions about a codebase can still be a huge time saver. I've also had really good success having AI generate basic scripts: something that would have taken 45 minutes of work gets me a working script in 3. It's not the revolution that's been promised, but it definitely makes me faster, even though I don't like it.
(Aside: Hi Ben! If you are who I think you are, we started at the same company on the same day back in August of 2014.)
For wage workers, not learning the latest productivity tools will result in job loss. By the time it is expected of your role, if you have not learned them already, you won't be given the leniency to catch up on company time. There is no impactful resistance to this through individual protest, only by organizing your peers across the industry.
Personally, I only use Claude/Anthropic and ignore other providers because I understand it better. It's smart enough; I rarely need the latest and greatest.
One way to avoid this: stick with one LLM and bet on the company behind it (meaning, over time, they’ll always have the best offering). I’ve bet on OpenAI. Others may come to different conclusions.
While Mistral might not have the best LLM performance, their UX is IMO the best, or at least tied with OpenAI's:
- I never had any UI bugs, while these were common with Claude or OpenAI (e.g. a discussion disappearing, the LLM crashing mid-answer, long-context errors on Claude ...);
- They support most of the features I liked from OpenAI, such as libraries and projects;
- Their app is by far the fastest, thanks to their fast reply feature;
- They allow you to disable web-search.
Enough! I just paid for a year of Gemini Pro. I use gemini-cli for free for small sessions, turn on my API key for longer sessions to avoid timeouts, and most importantly: for API use I mostly just use Gemini 2.5-flash, sometimes -pro, and Moonshot’s Kimi K2. I also use local models on Ollama when they are sufficient (which is surprisingly often).
I simply decided that I no longer wanted the hobby of always trying everything. I did look again at Mistral a few weeks ago, a good option, but Google was a good option for me.
Well, OpenAI copied the Deep Research feature from Google. They even used the same name (as does Mistral).
They've recently removed (limited) use of it from the free plan, so I guess it was costing more than they were making from paid subscribers.
All of the major labs are innovating and copying one another.
Anthropic has all of the other labs trying to come up with an "agentic" protocol of their own. They also seem to be way ahead on interpretability research.
DeepSeek came up with multi-head latent attention and published an open-source model that's huge and SOTA.
DeepMind's way ahead on world models.
...
This used to be a good example of innovation that is hard to copy. But it doesn't apply anymore for two reasons:
1. Apple went from being an agile, pro-developer, creative company to an Oracle-style cash-cow company run by an old board; not much innovation is happening at Apple anymore.
2. To their surprise, much of what they call "innovative" is actually pretty easy to replicate on other platforms. It took the Flutter folks 4 hours to re-create Liquid Glass...
Steve Jobs did say they "patented the hell out of [the iPhone]" and went about saber-rattling; then came the patent wars, which proved that Apple also relies on innovation by others and that patent workarounds would still result in competitive products. Things calmed down afterwards.
Bear in mind that there are a lot of very strong _open_ STT models that Mistral's press release didn't bother to compare to, giving the impression that they are the best new open thing since Whisper. Here is an open benchmark: https://huggingface.co/spaces/hf-audio/open_asr_leaderboard . The strongest model Mistral compared to is Scribe, ranked 10th there.
This benchmark is for English, but many of those models are multilingual (e.g. https://huggingface.co/nvidia/canary-1b-flash ).
One point of comparison is OpenAI Whisper v3, which achieves 7.44 WER on the ASR leaderboard and shows up at ~8.3 WER on FLEURS in the Voxtral announcement [0]. If FLEURS runs about +1 WER on average compared to the ASR leaderboard, it would imply that Voxtral does have a lead on ASR.
Also note that Voxtral's capacity is not necessarily all devoted to speech, since it "retains the text understanding capabilities of its language model backbone".
IBM’s Granite models seem multilingual and well ranked, but I can’t find any app using them.
Anybody aware of a dictation app using one of those "better" models?
They do support Voxtral, among others.
I am a bit disappointed; the headline made me think they offer a voice mode similar to OpenAI's.
There is a lot of value in, say, engineers using these tools as a huge head start when doing tradeoff studies.
Agreed about Google, accuracy is a little better on the paid version but the reports are still frustrating to read through. They're incredibly verbose, like an undergrad padding a report to get to a certain word count.
"Be terse" is a mandatory part of the prompt now.
Either it's to increase token counts so they can charge more, or to show better usage-growth metrics internally or to shareholders, or it's just some odd effect of fine-tuning / the system prompt ... who knows.
Sites like simonwillison.net/2025/jul/ and channels like https://www.youtube.com/@aiexplained-official also cover new model releases pretty quickly, with some "out of the box thinking/reasoning" evaluations.
For me and my usage, I can really only tell once I start using a new model on the tasks I actually use these models for.
My personal benchmark andrew.ginns.uk/merbench has full code and data on GitHub if you want a starting point!
ref: scene: https://www.youtube.com/watch?v=Vm6F0mVVJBo