Something one doesn't see in news headlines. Happy to see this comment.
I taught our entry-level calculus course a few years ago and had two blind students in the class. The technology available for supporting them was abysmal then -- the toolchain for typesetting math for screen readers was unreliable (and very slow anyway), the one for braille was non-existent, and translating figures into braille meant sending material out to a vendor and waiting weeks. I would love to hear how we might better support our students in subjects like math, chemistry, physics, etc., that depend so much on visualization.
He is still active and online and has a contact page; see https://www.foneware.net. I have been a poor correspondent with him - he will not know my HN username. I will try to reach out to him.
https://www.reddit.com/r/openscad/comments/1p6iv5y/christmas...
The creator, https://www.reddit.com/user/Mrblindguardian/ has asked for help a few times in the past (I provided feedback when I could), but hasn't needed to as often of late, presumably due to using one or more LLMs.
He did a great skit with Lee Mack at the BAFTAs 2022[0], riffing on the autocue the speakers use for announcing awards.
I'm not a fan of his (nothing against him, it's just not my cup of tea when it comes to comedy, and I've mostly not been interested in other stuff he's done), but the few times I have seen him as a guest on shows it's been clear that he's a generally clever person.
I hope this wasn't a terrible pun
A call home let us know that our son had set it off learning to reverse-sear his steak.
The same arguments were made about blind people and the multitude of one-off devices that smartphones replaced: OCR to TTS, color detection, object detection in photos/camera feeds, detecting the denomination of US bills, analyzing what's on screen semantically vs. what was provided as accessible text (if any was provided at all), etc. Sure, services for the blind would come by and help arrange outfits for people, audiobook narrators and braille translation services existed, and standalone devices to detect money denominations were sold, but a phone can just do all of that now for much cheaper.
All of these accessibility AI/ML features run on-device, so the knee-jerk anti-AI crowd's chief complaints are mostly baseless anyways. And for the blind and the deaf, carrying all the potential extra devices with you everywhere is burdensome. The smartphone is a minimal and common social and physical burden.
I've worked on some audio/video alert systems. Basic threshold detectors produce a lot of false positives. It's common for parents to put white noise machines in the room to help the baby sleep. When you have a noise generating machine in the same room, you need more sophisticated detection.
False positives are the fastest way to frustrate users.
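To make that concrete, here's a minimal sketch (not from any real product; the function names and thresholds are made up) of a fixed loudness threshold versus a detector that compares against a rolling baseline:

    import numpy as np

    def naive_alert(frame, threshold=0.05):
        # Fires whenever RMS loudness exceeds a fixed level. A white noise
        # machine can keep this true all night, so it alerts constantly.
        return np.sqrt(np.mean(frame ** 2)) > threshold

    def adaptive_alert(frame, baseline, ratio=3.0, decay=0.99):
        # Fires only when the current frame is much louder than a rolling
        # baseline of recent loudness, so steady background noise is ignored.
        rms = np.sqrt(np.mean(frame ** 2))
        new_baseline = decay * baseline + (1 - decay) * rms
        return rms > ratio * new_baseline, new_baseline

Real systems go much further than this (frequency analysis, trained classifiers), but it shows why "louder than X" alone isn't enough once there's a noise machine in the room.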
Need? Probably not. I bet it helps though (false positives, etc.)
>would be cheaper, faster detection, more reliable, easier to maintain, and more.
Cheaper than the phone I already own? Easier to maintain than the phone that I don't need to do maintenance on?
From a fun hacking perspective, a different sensor & device is cool. But I don't think it's any of the things you mentioned for the majority of people.
I know this is a low quality comment, but I'm genuinely happy for you.
A question directed to GP, directly asking about their life and pointing this out is somehow virtue signalling, OK.
Maybe you’re just being defensive? I’m sure he didn’t mean an attack at you personally.
Accusing someone of “virtue signaling” is itself virtue signaling, just for a different in-group to use as a thought terminating cliche. It has been for decades. “Performative bullshit” is a great way to put it, just not in the way you intended.
If the OP had a substantive point to make they would have made it instead of using vague ad hominem that’s so 2008 it could be the opening track on a Best of Glenn Beck album (that’s roughly when I remember “virtue signaling” becoming a cliche).
Or should I, too, perhaps wait for the OP to respond?
The lens through which you're analyzing the phrase is coloring how you see it negatively, and the one I'm using is doing the opposite. There is no need to change the phrase, just how it's viewed, I think.
And when I say 'it never crosses our minds' I really mean it, there's zero thoughts between thinking about a message and having it show up in a text box.
A really great example is slurs: a lot of people have to do a double take, but zero extra neurons fire when I read them. I guess early internet culture is to blame, since all kinds of language was completely uncensored and it was very common to run into very hostile people/content.
No. It’s acknowledging that perhaps one’s opinion may not be as useful as somebody else’s in that moment. Which is often true!
Your first and third paragraphs are true, but they don’t apply to every bloody phrase.
Are you making these five mistakes when writing alt text? [1] Images tutorial [2] Alternative Text [3]
[1]: https://www.a11yproject.com/posts/are-you-making-these-five-...
For example... https://chatgpt.com/share/692f1578-2bcc-8011-ac8f-a57f2ab6a7...
There's a great app by an indie developer that uses ML to identify objects in images. Totally scriptable via JavaScript, shell script and AppleScript. macOS only.
Could be 10, 100 or 1,000 images [1].
Just saying "It's a chart" doesn't feel like it'd be useful to someone who can't see the chart. But if the other text on the page talks about the chart, then maybe identifying it as the chart is enough?
Would love to hear a good example of alt text for something like that where the data isn't necessarily clear and I also don't want to do any interpreting of the data lest I influence the person's opinion.
Yeah, I think I misunderstood the context. I understood/assumed it to be for an article/post you're writing, where you have some general point you want to make. But based on what you wrote now, it seems to be more about how to caption an image you're sending to a blind person in a conversation/discussion of some sort.
I guess at that point it'd be easier for them if you just share the data itself, rather than anything generated by the data, especially if there is nothing you want to point out.
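For what it's worth, one low-effort way to do that is to emit the chart's underlying numbers as plain text alongside (or instead of) the image. A throwaway sketch, with made-up data and helper names:

    # Hypothetical data standing in for whatever the chart plots.
    rows = [("2021", 42), ("2022", 58), ("2023", 71)]

    def data_as_text(rows, x_label="Year", y_label="Value"):
        # Emit a simple tab-separated table a screen reader can step through
        # row by row, with no interpretation layered on top of the numbers.
        lines = [f"{x_label}\t{y_label}"]
        for x, y in rows:
            lines.append(f"{x}\t{y}")
        return "\n".join(lines)

    print(data_as_text(rows, y_label="Widgets sold"))

That hands over the raw data without editorializing about what it means.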
“Why is this here? What am I trying to say?” are super important things in design and also so easy to lose track of.
[1]: https://web.archive.org/web/20130922065731/http://www.last-c...
Additionally, I've recently been a participant in accessibility studies where charts, diagrams and the like have been structured to be easier to explore with a screen reader. Those needed JS to work and some of them looked custom, but they are also an alternative way to layer data.
The number of times I've seen captions that wouldn't make sense to people who have never been able to see is staggering; I don't think most people realize how visual our typical language usage is.
Video descriptions, through PiccyBot, have made watching more visual videos, or videos where things happen that don't make sense without visuals, much easier. Of course, it'd be much better if YouTube incorporated AI audio description the same way they do captions, but that may take a good 2 years or so. I'm not holding my breath. It's hard to get more than the bare minimum of accessibility out of Google as a whole.
Looking up information like restaurant menus. Yes it can make things up, but worst-case, the waiter says they don't have that.
AI has been a boon for me and my non-tech job. I can pump out bespoke apps all day without having to get bent on $5000/yr/usr engineering software packages. I have a website for my side business that looks and functions professionally and was done with a $20 monthly AI subscription instead of a $2000 contractor.
I use AI daily as a senior coder for search and docs, and when used for prototyping you still need to be a senior coder to go from say 60% boilerplate to 100% finished app/site/whatever unless it's incredibly simple.
I know you would like to believe that, but with the tools available NOW, that's not necessarily the case. For example, by using the Playwright or Chrome DevTools MCPs, models can see the web app as it's being created, and it's pretty easy to prompt them to fix something they can see.
These models know the current frameworks and coding practices but they do need some guidance; they're not mindreaders.
Again, it's the last 5% that takes 95% of the time, and that 5% I haven't seen fixed with Claude or Gemini, because it's essentially quirks, browser errors, race conditions, visual alignment, etc. All stuff that completely goes way above any LLM's head at the moment, from what I've seen.
They can definitely bullshit a 95% working app though, but that's 95% from being done ;)
Nothing I do is in the tech industry. It's all manufacturing and all the software is for in-house processes.
Believe it or not, software is useful to everyone and no longer needs to originate from someone who only knows software.
You didn't give any examples of the valuable bespoke apps that you are creating by the hour.
I simply don't believe you, and the arrogant salesy tone doesn't help.
If your needs fit in a program that size, you are pretty much good to go.
It will not rewrite PCB_CAD 2025, but it will happily create a PCB hole alignment and conversion app, eliminating the need for the full PCB_CAD software if all you need is that one toolset from it.
Very, very few pieces of software need to be full-package enterprise productivity suites. If you just make photos black and white and resize them, you don't need Photoshop to do it. Or even MS Paint. Any LLM will make a simple free program with no ads to do it. Average people generally do very simple, dumb stuff with the expensive software they buy.
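As a concrete example of the scale of tool being described, a black-and-white batch resizer is a few lines of Pillow (the paths and sizes here are just placeholders):

    from pathlib import Path
    from PIL import Image

    def convert_folder(src="photos", dst="photos_bw", max_size=(1024, 1024)):
        out_dir = Path(dst)
        out_dir.mkdir(exist_ok=True)
        for path in Path(src).glob("*.jpg"):
            img = Image.open(path).convert("L")  # grayscale
            img.thumbnail(max_size)              # resize, keeping aspect ratio
            img.save(out_dir / path.name)

    convert_folder()

That's the whole "app" for that use case, which is the point: most people's needs fit in something this size.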
As far as enshittification goes, this was happening long before AI. It probably started with SEO and just kept going from there.
Yet we fail to see AI as a good thing but just as a jobs destroyer. Are we "better than" the people that used to fill toothpaste tubes manually until a machine was invented to replace them? They were just as mad when they got the pink slip.
https://www.microsoft.com/en-us/garage/wall-of-fame/seeing-a...
... and that was 10 years ago. I'm curious for what it could do now.
[0] https://github.com/apple/ml-starflow/blob/main/LICENSE_MODEL
As for the license, happily, Model Weights are the product of machine output and not creative works, so not copyrightable under US law. Might depend on where you are from, but I would have no problem using Model Weights however I want to and ignoring pointless licenses.
Did I miss anything?
Sure, it's smallish.
> Are other open weight video models also this small?
Apple's models are weights-available, not open weights. And yes: WAN 2.1 has 1.3B models as well as the 14B models, and WAN 2.2 has a 5B model as well as the 14B models (the WAN 2.2 VAE used by Starflow-V is specifically the one used with the 5B model). And because the WAN models are largely actually open-weights models (Apache 2.0 licensed), there are lots of downstream open-licensed derivatives.
> Can this run on a single consumer card?
Modern model runtimes like ComfyUI can run models that do not fit in VRAM on a single consumer card by swapping model layers between RAM and VRAM as needed; models bigger than this can run on single consumer cards.
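Roughly how that offloading works, as a toy PyTorch sketch (the general idea, not ComfyUI's actual code): weights live in system RAM and each layer is copied to the GPU only for its own forward pass.

    import torch

    def offloaded_forward(layers, x, device="cuda"):
        # layers: an iterable of nn.Module blocks kept on the CPU.
        x = x.to(device)
        for layer in layers:
            layer.to(device)   # copy this layer's weights into VRAM
            x = layer(x)
            layer.to("cpu")    # release VRAM before the next layer
        return x

The copying costs time, which is why bigger-than-VRAM models run on consumer cards slowly rather than not at all.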
As far as I know, this might be the most advanced text-to-video model that has been released? I'm not sure whether the license will qualify as open enough in everyone's eyes, though.
This Apple license is click wrap MIT with the rights, at least, to modify and redistribute the model itself. I suppose I should be grateful for that much openness, at least.
To extend the analogy, "closed source machine code" would be like conventional SaaS. There's an argument that shipping me a binary I can freely use is at least better than only providing SaaS.
Better to execute locally than to execute remotely where you can't change or modify any part of the model though. Open weights at least mean you can retrain or distill it, which is not analogous to a compiled executable that you can't (generally) modify.
Of course, model weights almost certainly are not copyrightable so the license isn't enforceable anyway, at least in the US.
The EU and the UK are a different matter since they have sui generis database rights which seemingly allows individuals to own /dev/random.
For a 7B model the results look pretty good! If Apple gets a model out here that is competitive with WAN or even Veo, I believe in my heart it will have been trained with images of the finest taste.
JG's recent departure and the follow-up massive reorg to get rid of AI, rumors of Tim's upcoming step-down in early 2026... All of these signals indicate that those non-ML folks have won the corporate politics battle to reduce the in-house AI efforts.
I suppose this was part of a serious effort to deliver in-house models, but the directional changes in AI strategy made them give up. What a shame... At least the approach itself seems interesting, and I hope others take a look and use it to build something useful.
They should really buy Snapchat.
> The checkpoint files are not included in this repository due to size constraints.
So it's not actually open weights yet. Maybe eventually once they actually release the weights it will be. "Soon"
> Datasets. We construct a diverse and high-quality collection of video datasets to train STARFlow-V. Specifically, we leverage the high-quality subset of Panda (Chen et al., 2024b) mixed with an in-house stock video dataset, with a total number of 70M text-video pairs.
Wonder if "iCloud backups" would be counted as "stock video" there? ;)
They shared Siri audio recordings with contractors in 2019. It became opt-in only after backlash, similar to other privacy controversies.
This shows that they clearly prioritize not being sued or caught, which is slightly different from prioritizing user choices.
But also, Starflow-V is a research model with a substandard text encoder, it doesn't have to be competitive as-is to be an interesting spur for further research on the new architecture it presents. (Though it would be nice if it had some aspect where it offered a clear improvement.)
A little bit more background for those who don't know what a VAE is (I'm simplifying here, so bear with me): it's essentially a model which turns raw RGB images into something called a "latent space". You can think of it as a fancy "color" space, but on steroids.
There are two main reasons for this. One is to make the model which does the actual useful work more computationally efficient: VAEs usually downscale the spatial dimensions of the images they ingest, so instead of having to process a 1024x1024 image, your model only needs to work on a 256x256 one. (They often increase the number of channels to compensate, but I digress.)
The other reason is that, unlike raw RGB space, the latent space is actually a higher level representation of the image.
Training a VAE isn't the most interesting part of image models, and while it is tricky, it's done entirely in an unsupervised manner. You give the VAE an RGB image, have it convert it to latent space, then have it convert it back to RGB; you take a diff between the input RGB image and the output RGB image, and that's the signal you use when training (in reality it's a little more complex, but again, I'm simplifying to keep the explanation clear). So it makes sense to reuse them and concentrate on the actually interesting parts of an image generation model.
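A toy version of that training signal, to make it concrete (the shapes and loss are illustrative only; real VAEs also use a KL term plus perceptual or adversarial losses):

    import torch
    import torch.nn as nn

    # 3x256x256 RGB -> 16x32x32 latent and back: spatially 8x smaller, more channels.
    encoder = nn.Conv2d(3, 16, kernel_size=8, stride=8)
    decoder = nn.ConvTranspose2d(16, 3, kernel_size=8, stride=8)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-4)

    image = torch.rand(1, 3, 256, 256)            # stand-in for a training image
    recon = decoder(encoder(image))               # RGB -> latent -> RGB
    loss = nn.functional.mse_loss(recon, image)   # the "diff" between input and output
    loss.backward()
    opt.step()

No labels anywhere: the image is its own training target, which is what "unsupervised" means here.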
No, using the WAN 2.2 VAE does not mean it is a WAN 2.2 edit.
> compressed to 7B.
No, if it was an edit of the WAN model that uses the 2.2 VAE, it would be expanded to 7B, not compressed (the 14B models of WAN 2.2 use the WAN 2.1 VAE; the WAN 2.2 VAE is used by the 5B WAN 2.2 model).