I got access to the preview, here's what it gave me for "A pelican riding a bicycle along a coastal path overlooking a harbor" - this video has all four versions shown:

https://static.simonwillison.net/static/2024/pelicans-on-bic...

Of the four, two were a pelican riding a bicycle. One was a pelican just running along the road, one was a pelican perched on a stationary bicycle, and one had the pelican wearing a weird sort of pelican bicycle helmet.

All four were better than what I got from Sora: https://simonwillison.net/2024/Dec/9/sora/

There's another important contender in the space: the Hunyuan model from Tencent.

My company (Nim) is hosting the Hunyuan model, so here's a quick test (first attempt) at "pelican riding a bycicle" via Hunyuan on Nim: https://nim.video/explore/OGs4EM3MIpW8

I think it's as good as, if not better than, Sora / Veo.

> A whimsical pelican, adorned in oversized sunglasses and a vibrant, patterned scarf, gracefully balances on a vintage bicycle, its sleek feathers glistening in the sunlight. As it pedals joyfully down a scenic coastal path, colorful wildflowers sway gently in the breeze, and azure waves crash rhythmically against the shore. The pelican occasionally flaps its wings, adding a playful touch to its enchanting ride. In the distance, a serene sunset bathes the landscape in warm hues, while seagulls glide gracefully overhead, celebrating this delightful and lighthearted adventure of a pelican enjoying a carefree day on two wheels.

What does it produce for “A pelican riding a bicycle along a coastal path overlooking a harbor”?

Or, what do Sora and Veo produce for your verbose prompt?

If Sora is anything like Dall-e a prompt like "A pelican riding a bicycle along a coastal path overlooking a harbor" will be extended into something like the longer prompt behind the scenes. OpenAI has been augmenting image prompts from day 1.
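As a rough sketch of what that behind-the-scenes augmentation looks like, here is a minimal two-stage pipeline in Python. The function names and the canned expansion text are hypothetical stand-ins, not OpenAI's actual API or prompt:

```python
# Hypothetical two-stage pipeline: a text model expands the user's short
# prompt before it ever reaches the image/video model. Everything here is
# an illustrative stand-in, not any vendor's real implementation.
ENHANCE_INSTRUCTION = (
    "Rewrite the user's prompt as a richly detailed scene description, "
    "adding lighting, setting, and mood, while preserving the subject."
)

def enhance_prompt(short_prompt: str) -> str:
    # Stand-in for a call to a text model; a real system would send
    # ENHANCE_INSTRUCTION plus short_prompt to an LLM and use its reply.
    return (short_prompt + ", golden-hour light, gentle sea breeze, "
            "wildflowers lining the path, cinematic wide shot")

def generate_video(short_prompt: str) -> str:
    expanded = enhance_prompt(short_prompt)
    # The generator sees `expanded`, never the user's original wording.
    return expanded

print(generate_video("A pelican riding a bicycle along a coastal path"))
```

The practical consequence is the one noted above: two users typing the same short prompt can get very different results, because the hidden expansion step injects detail they never asked for.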
Hard to say about Sora, but the video you shared is most definitely worse than Veo's.

The pelican is doing some weird flying motion, motion blur is hiding a lack of detail, the bicycle is moving fast so the background is blurred, and so on. I would even say Sora's is better, because I like the slow motion and detail, but it did do something very non-physical.

Veo is clearly the best in this example. It has high detail but also feels the most physically grounded among the examples.

The prompt asks that it flaps its wings. So it's actually really impressive how closely it adheres (including the rest of the little details in the prompt, like the scarf). Definitely the best of the three, in my opinion.
Pretty good except the backwards body and the strange wing movement. The feeling of motion is fantastic though.
arjie · 4 days ago
I was curious how it would perform with prompt enhancement turned off. Here's a single attempt (no regenerations etc.): https://www.youtube.com/watch?v=730cb2qozcM

If you'd like to replicate, the sign-up process was very easy and I was easily able to run a single generation attempt. Maybe later when I want to generate video I'll use prompt enhancement. Without it, the video appears to have lost a notion of direction. Most image-generation models I'm aware of do prompt-enhancement. I've seen it on Grok+Flow/Aurora and ChatGPT+DallE.

    Prompt: A pelican riding a bicycle along a coastal path overlooking a harbor
    Seed: 15185546
    Resolution: 720×480
taneq · 3 days ago
I mean, you didn’t SAY riding forwards…
I suppose if you reversed it, it would look OK-ish.
gcr · 4 days ago
FYI your website shows me a static image on iOS 18.2 Safari. Strangely, the progress bar still appears to “loop,” but the bird isn’t moving at all.

Turning content blockers off does not make a difference.

Fwiw, it is finicky but the video played after a couple seconds (iOS 18.2 Safari).
Reddit says it is much better than Sora. Are you hosting the full version of Hunyuan? (Your video looks great.)
Hunyuan is also open source / source-available, unless you have 100M DAU.

Then there's Lightricks LTX-1 model and Genmo's Mochi-1. Even the research CogVideoX is making progress.

Open source video AI is just getting started, and it's off to a strong start.

Our limited tests show that yes, Hunyuan is comparable or better than Sora on most prompts. Very promising model
Is it still better if you copy his whole prompt instead of half of it?
4 days ago
I mean, the pelican's body is backwards...
Here's one of a penguin paragliding and it's surprisingly realistic https://x.com/Plinz/status/1868885955597549624
This is the first GenAI video to produce an "oh shit" reflex in me.

oh, shit!

As long as at least one option is exactly what you asked for, throwing variations at you that don't conform to 100% of your prompt seems like it could be useful, since it gives the model leeway to improve the output in other aspects.
Here is my version of a pelican on a bicycle, made with hailuoai:

https://hailuoai.video/share/N9dlRd1L1o0p

It's funny having looked forward to Sora for a while and then seeing it be superseded so shortly after access to it is finally made public.
His little bike helmet is adorable
The AI safety team was really proud of that one.
I am surprised that the top-right one still shows a cut and a switch to a different scene. I would assume that's something that could be trivially filtered out of the training data, as those discontinuities don't seem useful either for these short 6-second video segments or for building an understanding of the real world.
It looks much better than Sora, but still kind of in the uncanny valley.
This is the worst it will ever be…
That is surprisingly good. We are at a point where it seems to be good enough for at least b-roll content replacing stock video clips.
rob74 · 4 days ago
Well yeah, if you look closely at the example videos on the site, one of them is not quite right either:

> Prompt: The sun rises slowly behind a perfectly plated breakfast scene. Thick, golden maple syrup pours in slow motion over a stack of fluffy pancakes, each one releasing a soft, warm steam cloud. A close-up of crispy bacon sizzles, sending tiny embers of golden grease into the air. [...]

In the video, the bacon is unceremoniously slapped onto the pancakes, while the prompt sounds like it was intended to be a separate shot, with the bacon still in the pan? Or, alternatively, everything described in the prompt should have been on the table at the same time?

So, yet again: AI produces impressive results, but it rarely does exactly what you wanted it to do...

soco · 4 days ago
Technically speaking, I'd say your expectation is definitely not laid out in the prompt, so anything goes. Believe me, I've had such requirements from users, and I, as a mere human programmer, am never quite sure what they actually want. So I take guesses, just like the AI (because simply asking doesn't get you very far; you always have to show something), and take it from there. In other words, if the AI works like me, I can pack my stuff already.
This tech is cute but the only viable outcomes are going to be porn and mass produced slop that'll be uninteresting before it's even created. Why even bother?
There will be both of those things in abundance.

But I'm also seeing some genuinely creative uses of generative video, stuff I could argue has some genuine creative validity. I am loath to dismiss an entire technique because it is mostly used to create garbage.

We'll have to figure out how to solve the slop problem. It was already an issue before AI, so maybe this is just hastening the inevitable.

The real problem is that trust in legacy media hit rock bottom right as we enter the era where we need such trust the most. Soon enough, nothing you see on video can be believed, but (perhaps more importantly) nothing has to be believed either.
Comments like this one are so predictable and incredulous. As if the current state of the art is the final form of this technology. This is just getting started. Big facepalm.
Have you already noticed the trend of image search results for porn containing inferior AI slop porn?

I have. It sucks. The world we're headed for maybe isn't one we actually wind up wanting in the end.

I like the idea of increasingly advanced video models as a technologist, but in practice, I'm noticing slop and I don't like it. Having grown up on porn, when video models are in my hands, the addiction steers me toward using the technology only to generate it. It's a slot machine as addictive as the leap from the dirty magazines of old to the world of internet porn I witnessed growing up: porn addiction on steroids. I eventually found it damaging enough to my mental health that I sold my 4090. I'm a lot better off now.

The nerd in me absolutely loves generative models from a technology perspective, but just like the era of social media before it, it's a double-edged sword.

It sounds like you have a personal problem that you’re trying to project onto the rest of society.
No, I'm providing a personal anecdote that, for some members of society who have or may develop the same or similar problems, both the (perceived) good and the bad aspects of those problems are seriously magnified by this technology. This can have personal consequences, but the consequences can also affect the lives of others.

Hence, a certain % of the population will be negatively affected by this. I personally think it's worth raising awareness of.

I hope they're right. If the technology improves to such a degree that meaningful content can be produced then it could spell global disaster for a number of reasons.

Also I just don't want to live in a world where the things we watch just aren't real. I want to be able to trust what I see, and see the human-ness in it. I'm aware that these things can co-exist, but I'm also becoming increasingly aware that as long as this technology is available and in development, it will be used for deception.

That ship sailed shortly after the invention of photography. Photos were altered for political purposes during the US Civil War.

Now, we have entire TV shows shot on green screen in virtual sets. Replacing all the actors is just the next logical step.

4 days ago
That's exactly what I mean, all of those methods take some human effort, there is a human involved in the process. Now we face a reality that it might take no human effort to do... well, anything. Which is terrifying to me.

I do believe that humans are restless, and even when there is no longer any point to create, and it is far easier to dictate, we still will, just because we are too driven not to.

You know, there are still offline artforms like concerts, theater, opera, installations, etc., so I wouldn't see it that negatively. And we have nearly 100 years of music and film we can enjoy. So maybe video is a dying artform for humans to act in, but there is so much more.
The most predictable comment is yours, especially since you completely missed the point of the original comment which had nothing to do with the video quality.
gruez · 4 days ago
AI generated slop content begets human generated slop comment.
So, even better porn?
Winning 2:1 in user preference versus Sora Turbo is impressive. It seems to have very similar limitations to Sora: for example, the leg swapping in the ice skating video, and the beekeeper picking up the jar with very unnatural acceleration (it just pops up). Though to my eye it is maybe slightly better at emulating natural movement and physics than Sora. The blog post has slightly more info:

>at resolutions up to 4K, and extended to minutes in length.

https://blog.google/technology/google-labs/video-image-gener...

It looks like Sora is actually the worst performer in the benchmarks, with Kling being the best and the others not far behind.

Anyway, I strongly suspect that the funny meme content that seems to be the practical use case of these video generators won't be possible on either Veo or Sora, because of copyright, political correctness, famous people, or other 'safety'-related reasons.

I’ve been using Kling a lot recently and been really impressed, especially by 1.5.

I was so excited to see Sora out - only to see it has most of the same problems. And Kling seems to do better in a lot of benchmarks.

I can’t quite make sense of it: what OpenAI was showing when they first launched Sora was so amazing. Was it cherry-picked? Or was it using loads more compute than what they’ve released?

The Sora model available to the public is a smaller, distilled model called Sora Turbo. What was originally shown was a more capable model that was probably too slow to meet their UX requirements for the sora.com user interface.
> the jar is at a very unnatural acceleration (like it pops up).

It does pop up. Look at where his hand is relative to the jar when he grabs it versus when he stops lifting it. The hand and the jar are both moving, but the jar is unattached to the grab in a way that isn't physical.

lukol · 4 days ago
Last time Google made a big Gemini announcement, OpenAI owned them by dropping the Sora preview shortly after.

This feels like a bit of a comeback as Veo 2 (subjectively) appears to be a step up from what Sora is currently able to achieve.

htrp · 4 days ago
Some PM is literally sitting on this release waiting for their benchmarks to finish
4 days ago
And it's going to be hard for OpenAI to do that again, now that Google's woken up.
I appreciate they posted the skateboarding video. Wildly unrealistic whenever he performs a trick - just morphing body parts.

Some of the videos look incredibly believable though.

Our only hope for verifying truth in the future is that state officials give their speeches while doing kickflips and frontside 360s.
What officials actually say doesn't make a difference anymore. People do not get bamboozled because of a lack of facts. People who get bamboozled are past facts.
Off topic from the video AI thread, but to elaborate on your point: people believe what they want, based on what they have been primed to believe from mass media. This is mainly the normal TV and paper news, filtered through institutions like government proclamations, schools, and now supercharged by social media. This is why the "narrative" exists, and news media does the consensus messaging of what you should believe (and why they hate X and other freer media sources).

By the time the politician says it, you've been soaking in it for weeks or months, if not longer. That just confirms the bias that has been implanted in you.

If anything I'd say the opposite. Look at the last US elections, a lot of the criticisms against the side that lost were things people "thought" and "felt" they were for/against, without them actually coming out and saying anything of the like. It was people criticising them for stuff that wasn't actually real on X, traditional TV, and the like that made voters "feel" like that stuff is real.

And X is really egregious, where the owner shitposts frequently and often things of dubious factuality.

You say offtopic, but I think AI video generation is the most on-topic place to bring up the subject of falsified politically charged statements. Companies showcasing these things aren't exactly lining up to include "moral" as one of the bullet point adjectives in a limitations section.
> and why they hate X and other freer media sources

I left X precisely because it was flooded with Russian propaganda/misinformation.

Such as?
Nonsense like:

- People were 'forced' into vaccinations

- Covid 19 was a testing ground for the next global pandemic so that "they" can control us

- Climate change is a hoax/Renewables are our doom

- Everything our government does is to create a totalitarian state next.

- Putin is actually the victim; it is all NATO's fault and their imperialism

15155 · 2 days ago
> People were 'forced' into vaccinations

"Take this novel vaccine primarily for someone else's benefit or lose your job: it's your choice, you totally aren't being 'forced.'"

For people like nurses, yes. That's just a normal job requirement.
15155 · 1 day ago
What about remote software engineers, who were often equally subject to this requirement?
Why is it impossible for these opinions to be homegrown? Would people be a hivemind without Putin?
It's not impossible, but of course they're not homegrown.

Putin's apologists always demand he be given the benefit of the doubt. That's akin to convicting a spy beyond a reasonable doubt. That standard is meant to favor false negatives over false positives when incarcerating people. Better to let a thousand criminals go free than to imprison an innocent person.

If we used that for spies, we'd have 1000 of them running around for each convicted one. Not to mention that they have a million ways to avoid detection. They rely on their training, on the resources of the state, and on infiltrators who sabotage detection efforts. The actual ratio would be much higher.

In the case of opinion manipulation, the balance is even more pernicious. That's because the West decided a couple decades ago to use the "it's just a flesh wound" approach to foreign interference.

The problem is that we're not just protecting gullible voters. We're also defending the reputation of democracy. Either democracy works, or it doesn't. If it doesn't, then we're philosophically no better than Russia and China.

But if it were possible to control the outcome of elections by online manipulation alone, that would imply that democracy doesn't really work. Therefore online manipulation "can't work." Officially, it might sway opinion by a few points, but a majority of voters must by definition be right. If manipulation makes little difference, then there's not much reason to fight it (or at least not too openly).

Paradoxically, when it comes to detecting Russian voter manipulation, the West and Putin are strange bedfellows. Nothing to see here, move along.

That's an interesting question.

My sense is that the "hivemind" is, in a symbiotic way, both homegrown and significantly foreign-influenced.

More specifically: the core sentiment of the hivemind (basically: anti-war/anti-interventionist mixed with a broader distrust of anything the perceived "establishment" supports) is certainly indigenous -- and it is very important to not overlook this fact.

But many of its memes, and its various nuggets of disinformation do seem to be foreign imports. This isn't just an insinuation; sometimes the lineage can actually be traced word-for-word with statements originating from foreign sources (for example, "8 years of shelling the Donbas").

The memes don't create the sentiment. But they do seem to reinforce it, and provide it with a certain muscle and kick. While all the while maintaining the impression that it's all entirely homegrown.

And the farther one goes down the "multipolar" rabbit hole, the more often one encounters not just topical memes, but signature phrases lifted directly from known statements by Putin and Lavrov themselves. E.g. that Ukraine urgently needs to "denazify". The more hardcore types even have no qualms about using that precious phrase "Special Military Operation", with a touch of pride in their voice.

It's really genuinely weird, what's happening. What people don't realize is that none of this is happening by accident. It's a very specific craft that the Russian security services (in particular) have nurtured and developed, literally across generations, to create language that pushes people's buttons in this way.

The Western agencies and institutions have their own ways of propaganda, of course, but usually it's far more bland and boring (e.g. how NATO "fosters broader European integration" and all that).

Would we have the same kind of hivemind without Putin? There's always some kind of a hivemind -- but as applies to Eastern Europe, it does seem that the general climate of discourse was quite different before his ascendancy. And that it certainly took a very sharp, weird bend in the road after the start of Special Military Operation.

What are you talking about? News media LOVE twitter/X, it is where they get all their stories from and journalists are notoriously addicted to it, to their detriment.
Sadly, it's likely that video-generation models will master this ability faster than state officials.
Remember when the iPhone came out and BlackBerry smugly advertised that their products were “tools not toys”?

I remember saying to someone at the time that I was pretty sure iPhone was going to get secure corporate email and device management faster than BlackBerry was going to get an approachable UI, decent camera, or app ecosystem.

Maybe they will do more in person talks, I guess. Back to the old times.
Once AI masters the art of creating "footage from someone's phone if they were in the crowd of the speech in this other video", we won't be able to trust even that.
This was my favorite of all of the videos. There's no uncanny valley; it's openly absurd, and I watched it 4-5 times with increasing enjoyment.
Cracks in the system are often places where artists find the new and interesting. The leg swapping of the ice skater is mesmerizing in its own way. It would be useful to be able to direct the models in those directions.
It is great to see a limitations section. What would be even more honest is a very large list of videos generated without any cherry-picking, to judge the expected quality for the average user. Anyway, the lack of more videos suggests that there might be something wrong somewhere.
The honey, Peruvian women, swimming dog, bee keeper, DJ etc. are stunning. They’re short but I can barely find any artifacts.
The prompt for the honey video mentions ending with a shot of an orange. The orange just...isn't there, though?
Just pretend it's a movie about a shape-shifting alien that's trying its best at ice skating; art is subjective like that, isn't it? I bet Salvador Dalí would have found those morphing body parts highly amusing.
cyv3r · 4 days ago
I don't know why they say the model understands physics when it still makes mistakes like that.
0xcb0 · 4 days ago
Imho it is stunning, yet what is happening here is super dangerous.

These videos will be, and may already be, too realistic.

Our society is not prepared for this kind of reality-"bending" media. These hyperrealistic videos will be the reason for hate and murder. Evil actors will use them to influence elections on a global scale, create cults around virtual characters, and deny the rules of physics and human reason. And yet there is no way for a person to instantly detect that he is watching a generated video. Maybe there is now, but in a year it will be indistinguishable from a real recorded video.

Are Apple and other phone/camera makers working on ways to "sign" a video to say it's an unedited video from a camera? Does this exist now? Is it possible?

I'm thinking of simple cryptographic signing of a file, rather than embedding watermarks into the content, but that's another option.

I don't think it will solve the fake video onslaught, but it could help.
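A minimal sketch of the signing idea, with an important caveat: to stay self-contained this uses an HMAC with a device secret, whereas a real scheme such as C2PA uses an asymmetric key pair plus a manufacturer-issued certificate, so that anyone can verify without holding the secret. The key and function names are illustrative only:

```python
import hashlib
import hmac

# Hypothetical key provisioned into the camera at manufacture time. A real
# design would keep a private signing key in secure hardware and publish
# the matching public key/certificate for verification.
DEVICE_KEY = b"secret-key-provisioned-into-camera-hardware"

def sign_video(video_bytes: bytes) -> str:
    """Hash the file at capture time and sign the digest."""
    digest = hashlib.sha256(video_bytes).digest()
    return hmac.new(DEVICE_KEY, digest, hashlib.sha256).hexdigest()

def verify_video(video_bytes: bytes, signature: str) -> bool:
    """Any edit to the bytes changes the hash, so verification fails."""
    return hmac.compare_digest(sign_video(video_bytes), signature)

original = b"...raw video bytes from the sensor..."
sig = sign_video(original)
print(verify_video(original, sig))         # True: untouched file
print(verify_video(original + b"x", sig))  # False: file was altered
```

Note that this only proves the bytes are unchanged since signing; it says nothing about whether the scene in front of the lens was real.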

The Leica M11 signs each photo ("Content Authenticity Initiative"): https://leica-camera.com/en-US/news/partnership-greater-trus...

Cute hack showing that it's kind of useless unless the user-facing UX does a better job of actually knowing whether the certificate represents the manufacturer of the sensor (the guy just uses a self-signed cert with "Leica Camera AG" as the name). Clearly cryptography literacy is lagging behind... https://hackaday.com/2023/11/30/falsified-photos-fooling-ado...

Even if the certs were properly cryptographically vetted, you could just point the camera at a high-enough resolution screen displaying false content.
ttul · 4 days ago
I think this will be a thing one day, where photos are digitally watermarked by the camera sensor in a non-repudiable manner.
This is a losing battle. You can always just record an AI video with your camera. Done, now you have a real video.
I agree it's probably a losing battle, but maybe one worth fighting. If the metadata is also signed, you can verify the time and place it was recorded as well. Of course, this requires closed/locked hardware and is still possible to spoof. Not ideal, but some assurances are better than a future where you can't trust anything.
hbn · 3 days ago
This is what I think every time I hear about AI watermarking. If anything, convincing people that AI watermarking is a real, reliable thing is just gonna cause more harm, because bad actors who want to convince people something fake is real would obviously use simple subversion tactics. Then you have a bunch of people seeing that it passes the watermark check and concluding it's therefore real.
110 · 3 days ago
Potential solutions:

1. AI video watermarks that carry over even if a video of the AI video is taken

2. Cameras that can see AI video watermarks and put an AI video watermark on the videos of any AI videos they take
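To illustrate why option 1 is the hard part, here is a toy least-significant-bit watermark over a list of 8-bit pixel values. It is trivially readable and trivially destroyed; a watermark that survives being re-filmed by a camera would need to be vastly more robust:

```python
def embed_watermark(pixels, bits):
    """Overwrite the least significant bit of the first len(bits) pixels."""
    out = list(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit
    return out

def extract_watermark(pixels, n):
    """Read the low bit back out of the first n pixels."""
    return [p & 1 for p in pixels[:n]]

frame = [200, 13, 77, 54, 128, 9]   # toy "frame" of pixel intensities
mark = [1, 0, 1, 1]
stamped = embed_watermark(frame, mark)
print(extract_watermark(stamped, 4))  # [1, 0, 1, 1]

# Re-recording a screen re-quantizes every pixel; even +/-1 of noise
# wipes out an LSB mark entirely.
noisy = [p + 1 for p in stamped]
print(extract_watermark(noisy, 4) == mark)  # False
```

Production watermarks (e.g. spread-spectrum schemes) hide the signal across many frequency components for this reason, but re-filming remains the classic attack.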

Nikon has had digital signature ability in some of their flagship cameras since at least 2007, and maybe before then. The feature is used by law enforcement when documenting evidence. I assume other brands also have this available for the same reasons.
tomp · 4 days ago
We've had realistic sci-fi and alternate history movies for a very long time.
Which take millions of dollars and huge teams to make. These take one bored person, a sentence, and a few minutes to go from idea to posting on social media. That difference is the entire concern.
If “evil actors” could really “manipulate elections” with fake video, would they really let a few million dollars stop them?

That’s not that much money.

Who says they don't? Interference is being "democratized".
Any examples of hoax videos that you can name? I’m having a hard time placing any.

I really find the threat to be overhyped.

So, nothing. Got it.
[flagged] · 3 days ago
krapp · 4 days ago
We already have hate and murder, evil actors influencing elections on a global scale, denial of physics and reason, and cults of personality. We also already have the ability to create realistic videos - not that it matters because for many people the bar of credulity isn't realism but simply confirming their priors. We already live in a world where TikTok memes and Facebook are the primary sources around which the masses base their reality, and that shit doesn't even take effort.

The only thing this changes is not needing to pay human beings for work.

Instead of calling for regulations, the big tech companies should run big campaigns educating the public, especially boomers, that they can no longer trust images, videos, and audio on the Internet. Put paid articles and ads about this in local newspapers around the world so even the least online people get educated about this.
Do we really want a world where we can't trust anything we see, hear, or read? Where people need to be educated not to trust their senses, the very things we use to interpret reality and the world around us?

I feel this kind of hypervigilance will be mentally exhausting, and not being able to trust your primary senses will have untold psychological effects

You can trust what you see and hear around you. You might be able to trust information from a party you trust. You certainly shouldn't trust digital information from unknown entities with unknown agendas.

We're already in a world where "fake news" and "alt-facts" influence our daily lives and political outcomes.

What I see and hear around me is a minuscule fraction of the outside world. To have a shared understanding of reality, of what is happening in my town, my city, my state, my country, my continent, the world, requires much more than what is available in my immediate environment.

In the grand scheme of understanding the world at large, our immediate senses are not particularly valuable. So we _have_ to rely on other streams of information. And the trend is towards more of those streams being digital.

The existence of "fake news" and "alt facts", doesn't mean we should accept a further and dramatic worsening of our ability to have a shared reality. To accept that as an inevitability is defeatist and a kind of learned helplessness.

Have you seen the Adam Curtis documentary "Hypernormalisation"? It deals with some similar themes, but on a much smaller scale (at least it is smaller in the context of current and near future tech)

One absolutely should not trust what one sees and hears around oneself. One cannot trust the opinions of others, one should not trust faith; one can only reliably develop critical analysis and employ secondary considerations to the best of one's ability, and then be cautious at every step. Trust and faith are relics of a time now gone, and it is time to realize it, to grow up and see reality.
4 days ago
I wonder if we’ll eventually see people abandoning the digital reality in favor of real-life, physical interactions wherever possible.

I recently had an issue with my mobile service provider, and I was insanely glad when I could interact with a friendly and competent shop clerk (I know I got lucky there) in a brick-and-mortar store instead of with a chatbot stuck in a loop.

Yeah, I think it's a real possibility that people will disconnect from the digital world. Though I fear the human touch will become a luxury only afforded by the wealthy. If it becomes a point of distinction, people will charge extra for it, while the rest are left pleading with brainless chatbots.
That world is already here. Nothing you can do about it, might as well democratize access to the technology.
No, it's not. We are not at the stage where reality is completely indistinguishable from fiction. We are still in the uncanny valley. Nothing is inevitable.
Do you think China will stop here?

This is like trying to hide Photoshop from the public. Realistic AI generated videos and adversary-sponsored mass disinformation campaigns are 100% inevitable, even if the US labs stopped working on it today.

So, you might as well open access to it to blunt the effect, and make sure our own labs don't fall behind on the global stage.

That is reality; that is nature. The natural world is filled with camouflaged animals and plants that prey on one another via their camouflage. This is evolution, and those unable to discriminate reality from fiction will be the casualties, as they always have been since the dawn of life.
The naturalistic fallacy is weak at best, but this is one of the weirdest deployments of it I've encountered. It's not evolution, it's nothing like it.

If it's kill or be killed, we should do away with medicine right? Only the strong survive. Why are we saving the weak? Sorry but this argument is beyond silly

Deception is a key part of life, and the inability to discriminate fact from fiction is absolutely a key metric of success. Who said "kill or be killed"? Not I. It is survival or not, flourish or not, succeed or not.
But why must the deception take place? Evolution is natural, The development of AI generated videos takes teams of people, years of effort and millions of pounds. Why should those that are more easily deceived be culled? Do you believe that the future of technology is weeding out the weak? Do you believe the future of humanity is the existence of only those that can use the technologies we develop? You might very well find yourself in a position, a long time from now, where you are easily deceived by newer technologies that you are not familiar with.
Deception takes place because it can. I'm not the gatekeeper of it; I'm just acknowledging it and some of the secondary effects that will occur due to these inevitable technologies. I don't believe the future is anything other than a hope. That hope will require those future individuals to be very discriminating of their surroundings to survive, and "all surroundings" includes all of society's information and socialization, because that is filled with misinformation too. All of it is filled with misinformation right now, and it will only get more sophisticated. That's what I'm saying.
I can't disagree with you there. It's a shame I can't. The future is a scary place to be.
Maybe people should have some of those psychological effects.

Maybe operation Timber Sycamore, which is bearing fruit in Syria right now, wouldn't have happened if the population were less trusting of the shit they see on TV.

We have evolved to trust our senses as roughly representative of reality. I'm not convinced we are able to adapt to that kind of rapid shift.

I had not heard of Timber Sycamore until this comment. After a quick look at Wikipedia, I'm struggling to see the relevance here. Can you elaborate?

Sure. No amount of perception will let you see the financing of Al-Qaeda or Al-Nusra soldiers. You can't perceive your way out of your blindness. You need to reflect.
Of all the different sci-fi futures I've encountered, I never thought we'd end up in the Philip K. Dick one.
It will also reinforce whatever bias we have already. When facing ambiguous or unknowable situations our first reaction is to go with "common sense" or "trusting our gut".

"Uh, Is that video of [insert your least favourite politician here] taking a bribe real or not? Well, I'm going to trust my instincts here..."

What would motivate "big tech" to warn people about their own products, if not regulations?
Don't forget text. You can't trust text either.

And no big tech company would run the ads you're suggesting, because they only make money when people use the systems that deliver the untrustworthy content.

onel · 3 days ago
The same things could be said when everyone could print their own newspapers or books. How would people distinguish between fake and real news?

I think we will need the same healthy media diet.

dbbk · 3 days ago
There wasn't even a healthy media diet before generative AI given the amount of 'fake news' in 2016 and 2020.
Photoshop has been a thing for over 30 years.
Isn't the whole point of OP that we're currently watching the barrier to generating realistic assets go from "spend months grinding Photoshop tutorials" to "type what you want into this box and wait a few minutes"?
dbbk · 3 days ago
I still don't really know why we're doing this. What is the upside? Democratising Hollywood? At the expense of... enormous catastrophic disinformation and media manipulation.
Society voted with its money. Google refrained from launching its early chatbots and image generation tools due to perceived risks of unsafe and misleading content being generated, and got beaten to the punch in the market. Of course now they'll launch early and often; the market has spoken.
We have constructed a society where market forces feel inevitable, but it doesn't have to be that way.
Of course; but this is the current society, and attempts to reform it, e.g. communism, failed abjectly, so by evolutionary pressure the capitalist society dominated by market forces is the best we have.
Right, but there are plenty of middle grounds between true communism and just letting markets freewheel.
And places with these systems are those that achieved the best quality of life and peace.
There's no evidence that this fearmongering over safety is actually correct. The worst thing you can do is pummel an emerging technology into the grave because of misplaced fear.

Just take a look at how many everyday things were "incredibly dangerous for society" - https://pessimistsarchive.org/

Spend any amount of time on mainstream social media and you'll see AI-generated media being shared credulously. It's not a hypothetical risk, it's already happening.

Even if you're not convinced that it's dangerous, at the very least it's incredibly annoying.

If someone dumped a trailer full of trash in your garden, you're not going to say "oh well, market forces compelled them to do that".

Eh, growing pains.
Superficially impressive but what is the actual use case of the present state of the art? It makes 10-second demos, fine. But can a producer get a second shot of the same scene and the same characters, with visual continuity? Or a third, etc? In other words, can it be used to create a coherent movie --even a 60-second commercial -- with multiple shots having continuity of faces, backgrounds, and lighting?

This quote suggests not: "maintaining complete consistency throughout complex scenes or those with complex motion, remains a challenge."

B-roll for YouTube videos.
This is still early. It's only going to get better.
Fun. Fun! I find it a lot of fun to have a computer spit out pixels based on silly ideas I have. It is very amusing to me
m3kw9 · 4 days ago
You blend them, extend the videos, and then connect enough of them for a two-minute short.
That's what I think the tech at this stage cannot do. You make two clips from the same prompt with a minor change, e.g.

> a thief threatens a man with a gun, demanding his money, then fires the gun (etc add details)

> the thief runs away, while his victim slowly collapses on the sidewalk (etc same details)

Would you get the same characters, wearing the identical clothing, the same lighting and identical background details? You need all these elements to be the same, that's what filmmakers call "continuity". I doubt that Veo or any of the generators would actually produce continuity.

Dank memes.
· 4 days ago
> "what is the actual use case of the present state of the art?"

Not much. Low quality over-saturated advertising? Short films made by untalented lazy filmmakers?

When text prompts are the only source, creativity is absent. No craft, no art. Audiences won't gravitate towards fake crap that oozes out of AI vending machines, unrefined, artistically uncontrolled.

Imagine visiting a restaurant because you heard the chef is good. You enjoy your meal but later discover the chef has a "food generator" where he prompts the food into existence. Would you go back to that restaurant?

There's one exception. Video-to-video and image-to-video, where your own original artwork, photos, drawings and videos are the source of the generated output. Even then, it's like outsourcing production to an unpredictable third party. Good luck getting lighting and details exactly right.

I see the role of this AI gen stuff as background filler, such as populating set details or distant environments via green screen.

> Imagine visiting a restaurant because you heard the chef is good. You enjoy your meal but later discover the chef has a "food generator" where he prompts the food into existence. Would you go back to that restaurant?

That's an obvious yes from me. I liked it, and not only that, I can reasonably assume it will be consistently good in the future, something lots of places can't deliver.

So you'd forgive the deception (there's no "chef" only a button pusher) and revisit the restaurant even though you could easily generate the food yourself.

You don't care about the absence of a lifetime of hard work behind your meal, or the efforts of small business owners inspired by good food and passion in the kitchen. All that matters to you is that your taste buds were satisfied?

Interesting. Perhaps we can divide the world into those who'd happily dine at "Skynet Gourmet", and those who'd seek a real restaurant.

I think it's more complex than that. At least to me, food in particular is nothing special; I truly eat just for sustenance, so if it doesn't taste bad, I'm happy.

I still believe there's a place for creative work; I just don't see why something created by something other than a human is inherently bad.

Short-video creation tools; it's a huge market.
Misinformation
FWIW it feels like Google should dominate text/image -> video since they have access to Youtube unfettered. Excited to see what the reception is here.
paxys · 4 days ago
Everyone has access to YouTube. It’s safe to assume that Sora was trained on it as well.
All you can eat? Surely they charge a lot for that, at least. And how would you even find all the videos?
Nobody in this space gives a fuck about anyone or anything further upstream than the file sitting in their ingestion queue. If they can see it, they 'own' it.
Who says they've talked to Google about it at all?

I can't speak to OpenAI but ByteDance isn't waiting for permission.

ByteDance has their own unlimited supply of videos...
They already did it, and I'm guessing they were using some of the various YouTube downloaders Google has been going after.
Does everyone have "legal" access to YouTube?

In theory that should matter to something like Open(Closed)AI. But who knows.

I mean, I have trained myself on Youtube.

Why can't a silicon being train itself on Youtube as well?

Because silicon is a robot. A camcorder can't catch a flick with me in the theater even if I dress it up like a muppet.
Not with that attitude.

A corporation "is a person" with all the rights that come along with that - free speech etc.

What if I'm part-carbon, part-silicon?

Like, a blind person with vision restored by silicon eyes?

Do I not have rights to run whatever firmware I want on those eyes, because it's part of my body?

Okay, so what if that firmware could hypothetically save and train AI models?

presumably, it should be illegal to record a movie with an inbuilt camera. capturing the data in such a way that an identical copy can automatically be reproduced breaks the social contract around the way those works are shared. the majority of media is produced by large companies that are ultimately not harmed by such activities, but individual artisans that create things shouldn't be subjected to this.

we can take this a step further: if your augmented eyes and ears can record people in a conversation, should you be allowed to produce lifelike replicas of people's appearance and voice? a person can definitely imagine someone saying/doing anything. a talented person with enough effort could even make a 3D model and do a voice impression on their own. it should be obvious that having a conversation with a stranger doesn't give them permission to clone your every detail, and shouldn't that also be true for your creations?

The difference is that you didn't need to scrape millions of videos from YouTube with residential proxy network scrapers to train yourself.
Only because I'm significantly more intelligent than ChatGPT, so I can achieve its level of competency on a lot of things with a thousand videos instead of a million videos.

If it just reduces to an issue of data efficiency, AI research will eventually get there though.

Humans have rights, machines don't.
When a company trains an AI model on something and then sells access to that model, the company, not the AI model, is the one violating copyright. If Jimmy makes an android in his garage and gives it free will, and it then trains itself on YouTube, I doubt anyone would have an issue.
golol · 4 days ago
If OpenAI training on YouTube videos violates copyright, then so does Google training on them.
In what possible way is that true? Not that I like it, but Google has its creators sign away the rights to their material for uses like this. Nobody signs a contract with OpenAI when they make their YouTube videos.
When you sign away full rights to one company, that one company can give rights to another company (for money or not).

They could also just acquire that other company.

From the creator's standpoint, signing away rights to one company is as good as gone.

Did OpenAI make a deal with Google to train on YouTube?
They also had a good chunk of the web's text indexed, millions of people's email sent every day, Google Scholar papers, and the massive Google Books project that digitized most books ever published. They even invented transformers.
xnx · 4 days ago
This looks great, but I'm confused by this part:

> Veo sample duration is 8s, VideoGen’s sample duration is 10s, and other models' durations are 5s. We show the full video duration to raters.

Could the positive result for Veo 2 mean the raters like longer videos? Why not trim Veo 2's output to 5s for a better controlled test?

I'm not surprised this isn't open to the public by Google yet, there's a huge amount of volunteer red-teaming to be done by the public on other services like hailuoai.video yet.

P.S. The skate tricks in the final video are delightfully insane.

> I'm not surprised this isn't open to the public by Google yet,

Closed models aren't going to matter in the long run. Hunyuan and LTX both run on consumer hardware and produce videos similar in quality to Sora Turbo, yet you can train them and prompt them on anything. They fit into the open source ecosystem which makes building plugins and controls super easy.

Video is going to play out in a way that resembles images. Stable Diffusion- and Flux-like players will win. There might be room for one or two Midjourney-type players, but by and large the most activity happens in the open ecosystem.

> Hunyuan and LTX both run on consumer hardware

Are there other versions than the official?

> An NVIDIA GPU with CUDA support is required.
>
> Recommended: We recommend using a GPU with 80GB of memory for better generation quality.

https://github.com/Tencent/HunyuanVideo

> I am getting CUDA out of memory on an Nvidia L4 with 24 GB of VRAM, even after using the bfloat16 optimization.

https://github.com/Lightricks/LTX-Video/issues/64

Yes you can, with some limitations

https://github.com/Tencent/HunyuanVideo/issues/109

jcims · 4 days ago
Yes. Lots of folks on reddit running it on 24gb cards.
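To put rough numbers on why 24 GB is borderline, here's a back-of-envelope sketch (assuming the roughly 13B parameters HunyuanVideo reports; this counts only the weights, not activations or video latents, so it's a lower bound, not any vendor's actual figure):

```python
def weights_vram_gb(n_params, bytes_per_param):
    """VRAM needed just to hold the model weights, in GiB.
    Activations, video latents, and CUDA overhead come on top,
    which is why a card that barely fits the weights can still OOM."""
    return n_params * bytes_per_param / 1024**3

n = 13_000_000_000             # assumed ~13B parameters
fp32_gb = weights_vram_gb(n, 4)  # full precision: ~48 GiB, out of reach for consumer cards
bf16_gb = weights_vram_gb(n, 2)  # bfloat16: ~24 GiB, right at the edge of a 24 GB card
```

That halving is all the bfloat16 optimization buys you for the weights themselves, which matches the reports above: 24 GB cards work, but only with further tricks like offloading or quantization.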
I wonder if the more decisive aspect is the data, not the model. Will closed data win over open data?

With the YouTube corpus at their disposal, I don't see how anyone can beat Google for AI video generation.

Stable Diffusion and Flux did not win, though. Midjourney and ChatGPT won.
“Won” what exactly? I have no issues running stable diffusion locally.

Since Llama 3.3 came out, it's been my first stop for coding questions, and I only use closed models when Llama 3.3 has trouble.

I think it’s fairly clear that between open weights and LLMs plateauing, the game will be who can build what on top of largely equivalent base models.

The quality of SD is nowhere near the clear leaders.
> The quality of SD is nowhere near the clear leaders.

It absolutely is. Moreover, the tools built on top of SD (and now Flux) are superior to any commercial vertical.

The second-place companies and research labs will continue to release their models as open source, which will cause further atrophy to the value of building a foundation model. Value will accrue in the product, as has always been the case.

SD will also generate what I tell it, unlike the corporate models that have all kinds of “safeguards”.
You must be stuck at SDXL to be posting something as absolutely and verifiably false as the sentence above.
OpenAI is like a super luxurious yacht, all pretty and shiny, while Google's AI department is the humongous nuclear submarine at least 5 times bigger than the yacht, with a relatively cool conning tower but not that spectacular to look at.

Or like the tanker that is still steering to fully align with the course people expect of it; they don't recognize that it will soon be there, capable of rolling over everything that comes in its way.

If OpenAI claims they're close to having AGI, Google most likely already has it and is doing its shenanigans with the US government under the radar. Meanwhile Microsoft is playing the cool guy and Amazon is still trying to get its act together.

All it took was some good old competition with the potential to steal the user base from Google's core search product. Nice to be back in the competition era of web tech.
Or, using Occam's razor: Sundar is a shit CEO and is playing catch-up with a company largely fueled by innovations created at Google but never brought to market because they would eat into ads revenue.

That, or they have a secret super human intelligence under wraps at the pentagon.

That's the conventional take, but (as far as I can tell), the TPU program was also started under Sundar, which would have been a bold investment at the time, and looks like absolute genius in retrospect.

OpenAI might be well-capitalized, but they're (1) bleeding money, (2) no clear path to profitability, and (3) competing head-to-head with a behemoth who can profitably provide a similar offering at 10-20x cheaper (literally).

Google might be slow out the blocks, but it's not like they've been sitting on their hands for the past decade.

No they’ve been acquiring layers upon layers of middle management for a decade.

That’s the core issue, and they’ve also pissed off a non-zero percentage of top talent by ditching what still existed of Google culture and going full “Corporate Megacorp” a few years ago.

Google is having to pay a ton to retain the talent they have left and it’s often not enough.

That might be true, but I'm not sure it will matter that much. They've put out two very compelling* products in the last month (Flash 2 & Veo), and hints that another Gemini Pro model is in the pipeline. When it comes to AI, they're in a very good position, no matter what middle-management shenanigans are going on behind the scenes. Their core ad business is also so absurdly profitable that overpaying for talent won't even register as pocket lint.

Google's biggest threat isn't OpenAI. It's the FTC (which I admit is a very real danger).

* from a developer/platform perspective, at least. The "consumer" facing side of things (e.g. the AI Studio UI) is still pretty awful.

· 4 days ago
google definitely does not have AGI hhaaha
ex-googler confirms :/
Yeah pretty bad example from parent but the point stands I think... I mostly just assume that for everything ChatGPT hypes/teases Google probably has something equivalent internally that they just aren't showing off to the public.
I know that Google's internal ChatGPT alternative was significantly worse than ChatGPT (confirmed both in the news and by Googlers) around a year back. So you might say they could overtake OpenAI because of more resources, but they aren't significantly ahead of OpenAI.
Just a reminder that the state of the art was Will Smith Eating Spaghetti in March of 2023.

https://arstechnica.com/information-technology/2023/03/yes-v...

We're not even done with 2024.

Just imagine what's waiting for us in 2025.

nosbo · 4 days ago
But it's the same thing, just at higher fidelity. Which is impressive, don't get me wrong. But they also look kind of bad; even their good examples have so many issues. I just don't see how this gets extrapolated into the ideas in various posts like full-length movies, custom TV shows, holodecks, or whatever else people dream up. Do we have any examples of tech that just kept improving at exponential or linear rates? Why is everyone so confident it will just keep getting better?
> Do we have any examples of tech that just kept improving at exponential or linear rates?

SD Cards?

> Why is everyone so confident it will just keep getting better?

Because there are literally thousands of avenues to explore and we've only just begun with the lowest of low hanging fruit.

nosbo · 3 days ago
What should I be looking into?
My friend, who works at a TV station, is already using these tools to generate videos for public advertising programs. It has been a blast.
This might be a dumb question to ask, but what exactly is this useful for? B-Roll for YouTube videos? I'm not sure why so much effort is being put into something like this when the applications are so limited.
If you want to train a model to have a general understanding of the physical world, one way is to show it videos and ask it to predict what comes next, and then evaluate it on how close it was to what actually came next.

To really do well on this task, the model basically has to understand physics, and human anatomy, and all sorts of cultural things. So you're forcing the model to learn all these things about the world, but it's relatively easy to train because you can just collect a lot of videos and show the model parts of them -- you know what the next frame is, but the model doesn't.

Along the way, this also creates a video generation model - but you can think of this as more of a nice side effect rather than the ultimate goal.
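As a toy illustration of that setup (the names and shapes here are made up for this sketch, nothing from any lab's actual training code): hold out frame t, show the model everything before it, and score it with mean squared error.

```python
import numpy as np

def next_frame_loss(model, frames):
    """Next-frame prediction objective: for each time t, show the
    model frames[0..t-1] only, then compare its guess against the
    held-out frames[t] with mean squared error."""
    total = 0.0
    for t in range(1, len(frames)):
        predicted = model(frames[:t])   # the model never sees frame t
        total += np.mean((predicted - frames[t]) ** 2)
    return total / (len(frames) - 1)

# A trivial baseline "model": predict that nothing moves.
repeat_last = lambda past: past[-1]

# A toy "video" with constant motion: every pixel brightens by 1 per frame.
frames = [np.full((4, 4), float(t)) for t in range(5)]

print(next_frame_loss(repeat_last, frames))  # 1.0: the baseline is always one step behind
```

The self-supervision is the point: the labels come for free from the video itself, and a model can only drive this loss down by capturing how scenes actually evolve.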

It doesn't have to understand anything; none of these demonstrate reasoning or understanding.

All these models have just "seen" enough videos of all those things to build a probability distribution and predict the next step.

This is not bad, nor does it make them inherently dumb; a major component of human intelligence is built on similar strategies. I couldn't tell you which grammatical rules are broken in a text, or which physical rules in a photograph, but I can tell something is wrong using the same methods.

Inference can take it far with large enough data sets, but sooner or later, without reasoning, you will hit a ceiling.

This is true for humans as well: plenty of people go far in life with just memorization and replication, doing a lot of jobs fairly competently, but not in everything.

Reasoning is essential for higher-order functions, and transformers are not the path to it.

That's like saying that your brain doesn't understand anything, it just analyzes the visual data coming in via your eyes and predicts the next step of reality
The brain also does that. It doesn't do it exclusively, but we do it an awful lot.

We do an extensive amount of pattern matching and drop an enormous amount of sensory input very quickly, because we expect patterns and assume a lot about our surroundings.

Unlearning this is a hard skill to pick up. There are many forms of training, from martial arts to meditation, that attempt to achieve it.

The point is that this alone is not sufficient; the other core component is reasoning and understanding, and transformers learning on data are insufficient.

Parrots and a few other animals can imitate human speech very well; that doesn't mean they understand the speech or are constructing it.

Don't get me wrong, I am not saying it is not useful. It is. But this attribution of reasoning and understanding to models that foundationally have no such building block is just being impressed by a talking parrot.

I think people are just fundamentally not willing to attribute intelligence to things that can't have conversations. This is why the incredible belief that babies or dogs don't feel pain was possible. Once AI is given some long-term memory, all these ideas that AI is just a parrot will suddenly be gone. I personally think it will probably be pretty easy to give robots memories and their own personal motivations: all you have to achieve is training them in real time, and the rest is optimization. You want the training to make sense, and to have them not store/believe every single thing they are told, etc.
There is also the corollary: we tend to attribute intelligence to things merely because they can have conversations. Since the first golden era of AI in the 1960s, that has always been the case.

Mimicking more patterns like emotion and motivation may make for a better user experience; it doesn't make the machine any smarter, just a better mime.

Your thesis is that as we mimic reality more and more, the differences will not matter. This is an idea romanticized by popular media like Blade Runner.

I believe there are classes of applications, particularly if the goal is the singularity or better-than-human superintelligence, where emulating human responses, no matter how sophisticated, won't take you there. Proponents may hand-wave this away as moving the goalposts; it is only refining the tests to reflect the models of the era.

If the proponents of AI were serious about their claims of intelligence, then they should also be pushing for AI rights. There is no such serious discourse happening, only issues related to human data privacy rights: what the models can be allowed to learn from, or where they can be allowed to work.

> If the proponents of AI were serious about their claims of intelligence, then they should also be pushing for AI rights. There is no such serious discourse happening

It's beginning to happen. Anthropic hired their first AI welfare researcher from Eleos AI, which is an organization specifically dedicated to investigating this question: https://eleosai.org/

Back when computers took up a whole room, you'd also have asked: "but what exactly is this useful for? Some simple calculations that anybody can do with a piece of paper and a pen?"

Think 5-10 years into the future, this is a stepping stone

That's comparing apples to oranges though isn't it? Generating videos is the output of the technology, not the tech itself. It would be like someone asking "this computer that takes up a whole room printed out ascii art, what is this useful for?"
All the "creative" gen AI does is a thing worse and more annoying than what exists now. The first computers did calculations faster and faster, with immediate utility (mostly for defense).
This is kind of an unfair comparison. What's the end point of generating AI videos? What can this do that is useful, contributes something to society, has artistic value, etc.? We can make educational videos with a script, but it's also pretty easy for motivated parties to do that already, and it's getting easier as cameras get better and smaller. I think asking "what's the point of this" is at least fair.
The end point is enabling people to put into video what is in their mind, like a word processor for video. When you remove the need to have a room full of VFX artists to make a movie, then anyone can make a movie. Whether this is beneficial is dubious, but that's an end goal if you are looking for one.
They’re a way firo
They were calculating missile trajectories, everybody understood what they were useful for.
We're preparing to use video generation (specifically image+text => video so we can also include an initial screenshot of the current game state for style control) for generating in-game cutscenes at our video game studio. Specifically, we're generating them at play-time in a sandbox-like game where the game plays differently each time, and therefore we don't want to prerecord any cutscenes.
Okay, so is the aim to run this locally on a client's computer or served from a cloud? How does the math work out where it's not just easier at that point to render it in game?
In its current state, it's already useful for B-roll, video backgrounds for websites, and any other sort of "generic" application where the point of the shot is just to establish mood and fill time.

But more than anything, it's useful as a stepping stone to more full-featured video generation that can maintain characters and story across multiple scenes. It seems clear that at some point tools like this will be able to generate full videos, not just shots.

TV commercials / YouTube ads. You don't need a video team anymore to make an ad.
This is a first step towards "the holodeck". You describe a scene and it exists. Imagine you could jump in and interact with it. That seems like something that could happen in 10-20 years.
mbil · 4 days ago
You and your friends gather around the TV to watch a video about the time that you all traveled abroad and met a mysterious stranger. In the film, you witness each other take incredible risks, have intimate private conversations, and change in profound ways. Of course none of it actually happened; your voices and likenesses were fed into the movie generator. And did I mention in the film you’re driving expensive cars and wearing designer clothes?
Are they that limited? It's a machine that can make videos from user input: it can ostensibly be used wherever you need video, including for creative, technical and professional applications.

Now, it may not be the best fit for those yet due to its limitations, but you've gotta walk before you can run: compare Stable Diffusion 1.x to FLUX.1 with ControlNet to see where quality and controllability could head in the future.

· 4 days ago
I have observed some musicians creating their own music videos with tools like this.
This silly music video was put together by one person in about 10 hours.

https://www.reddit.com/r/aivideo/comments/1hbnyi2/comment/m1...

Another more serious music video also made entirely by one person. https://www.youtube.com/watch?v=pdqcnRGzH5c Don't know how long it took though.

Because it's pretty cool to be able to imagine any kind of scene in your head, put it into words, then see it be made into a video file that you can actually see and share and refine.
Use your imagination.
this is perfect for the landing page of any website I make

my templates all are waiting for stock videos to be added looping in the background

you have no idea how cool I am with the lack of copyright protections afforded to these videos I will generate, I'm making my money other ways

Streaming services where there is no end to new content that matches your viewing patterns.
this sounds awful haha
It's got a lot of potential as a way for Google to get paid for other people's skills and hard work, instead of the people that made all of that "data".
It's kind of hilarious that anybody considers this "democratizing" media creation. How many people that need a video clip are going to be capable of running an open version of this themselves? The wonky "open" models aren't even close. How much do you think these services are going to cost once the introductory period financed by race-to-the-bottom money stops? OpenAI already charges $200/mo if you want to be guaranteed more than 30-60 minutes of Advanced Voice. The introductory period exists solely to get people engaged enough to push through the blatant theft of millions of artists' creative output, so they can have a beautiful tool they sell to Hollywood for a whole lot of money that's still less than traditional VFX. Meanwhile, everyone else gets to dink around in the useless free models or too-expensive-for-most prosumer tools, people with expensive video card arrays or the functional equivalent will still be niche tinkering hobbyists with inferior tooling and models, and the skilled commercial artists still employed will be paid shit because of market forces. Great job, SV. Making the world a better place.
You really think making videos with computers is not useful? Is this a joke?
The example of a "Renaissance palace chamber" is historically inaccurate by a century or two; the generated video looks a lot like a pastiche of Versailles from the Age of Enlightenment instead. I guess that's what you get by training on the internet.
ralfd · 4 days ago
I watched that 10 times because the details are bonkers, and I find it amazing that she and the candle are visible in the mirror! Speaking of inaccuracy, though, are those pencils/text markers/pens on the desk? ;)
What's inaccurate about it?
It's technically and superficially breathtaking, but on closer inspection, it's a mishmash of styles across like 500 years.

- gold everywhere is excessive - more Rococo (1730s-1760s) than Renaissance (1300-1600 roughly), which was a lot more restrained

- mirror way too big and clear. Renaissance mirrors were small polished metal or darker imperfect glass

- candelabras too ornate and numerous for Renaissance. Multi tier candleholders are more Baroque (1600-1750), and candles look suspiciously perfect, as opposed to period-appropriate uneven tallow or beeswax

- white paper too pristine (parchment or vellum would be expected), pen holders hilariously modern, gold-plated(??) desk surface is absurd

- woman's clothing looks too recent (Victorian?); sleeves and hair are theatrical

- hard to tell, but background dudes are lurking in what look like theatrical costumes rather than anything historically accurate

chrsw · 4 days ago
So, in addition to images that don't look right, the web will also be flooded with animations and videos that are disturbingly awful. Great.
I'm always curious with the examples in these announcements, how close is the training data to the sample prompts? And how much of the prompt is important or ends up ignored in the result?

The prompt for the figure running through glowing threads seems to contain a lot of detail that doesn't show up in the video.

In the first example (close-up of DJ), the last line about her captivating presence and the power of music I guess should give the video a "vibe" (compared to prescriptively describing the video). I wonder how the result changes if you leave it out?

Cynically I think that it's a leading statement there for the reader rather than the model. Like now that you mention it, her presence _is_ captivating! Wow!

I've already stopped noticing the quality differences in the photo-like images produced by each image generation model.

Now, if image or video generation models want to show off how great they are, the examples should be stickman drawings or stickman videos. As far as I know, no model has been able to do that properly yet. If a model can do it well, it will be a huge breakthrough.

Is it just me or do all these models generate everything in a weird pseudo-slow motion framerate?
I've noticed this too, it's extremely prominent to me and I'm not sure why it's not discussed frequently.
Better than the opposite. You can always skip frames to get the normal speed. But motion interpolation never looks good to me.
I mean, I'm not sure it's done deliberately but... if I was trying to guarantee video gen was always 5 seconds in a consistent manner and the gen process was highly non-deterministic then if the resultant video would have only been 3 seconds I'd stretch it out, interpolate the frames, and then send it down the pipes.

Another point to consider is that if my generative video system isn't good at maintaining world consistency, then doing a slow-motion video gives the illusion of a long video while being able to maintain a smaller "world context".
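The stretch-it-out guess above is easy to make concrete. A minimal nearest-neighbor sketch of padding a short clip to a fixed length by repeating frame indices (the function name and numbers are illustrative assumptions, not from any real video-gen pipeline, and real systems would blend rather than just duplicate):

```python
# Hypothetical sketch: stretching a short clip to a fixed frame count by
# nearest-neighbor index resampling (no blending/interpolation).

def resample_frames(frames, target_len):
    """Stretch or shrink a frame sequence to target_len frames by
    mapping each output index proportionally onto a source index."""
    if not frames:
        return []
    n = len(frames)
    return [frames[min(n - 1, int(i * n / target_len))] for i in range(target_len)]

# A 3-"second" clip at 24 fps stretched to 5 seconds: every source frame
# is repeated ~1.67x on average, which plays back as slow motion.
clip = list(range(72))                  # 72 source frames
stretched = resample_frames(clip, 120)  # 120 output frames
assert len(stretched) == 120
```

Duplicated frames played at a constant framerate are exactly the "pseudo-slow motion" look described above; proper motion interpolation would synthesize in-between frames instead, which is what tools like FFmpeg's `minterpolate` filter attempt.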

m3kw9 · 4 days ago
With unfettered access to video training data from YouTube, it isn't at all surprising they can do better than what OpenAI has with Sora. Not sure how OpenAI will respond.
Impressive but the page crashed chrome on my iPad!
Might be time for a new iPad. My old-school iPad Air has 2 GB of memory and is an absolute hog when loading content-heavy websites.
It crashed Safari on iPhone 16 Pro Max, I doubt it's the device.

The website is horrible on resources.

wruza · 4 days ago
This site is not content-heavy; it’s nothing more than a bunch of videos, pictures and text. This site was just built by useless jokers not worth their money.
zb3 · 4 days ago
We should collectively ignore these announcements of unavailable models. There are models you can use today, even in the EU.
Actually, there is a pretty significant new model announced today and available now: "MiniMax (Hailuo) Video-01-Live" https://blog.fal.ai/introducing-minimax-hailuo-video-01-live...

Although I tried that and it has the same issue all of them seem to have for me: if you are familiar with the face but they are not really famous then the features in the video are never close enough to be able to recognize the same person.

It was announced weeks ago.

50 cents per video. Far more when accounting for a cherrypick rate.
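The cherry-pick point is just division, but worth making explicit; a hypothetical back-of-envelope sketch (the keep rates are illustrative assumptions, not measured figures):

```python
# Hypothetical: expected spend per *usable* video when only a fraction
# of generations is good enough to keep.

def effective_cost(cost_per_generation: float, keep_rate: float) -> float:
    """Expected spend per kept clip: the per-generation price divided
    by the fraction of generations that are keepers."""
    return cost_per_generation / keep_rate

# At $0.50 per generation and a 1-in-4 cherry-pick rate,
# each usable clip effectively costs $2.00.
assert effective_cost(0.50, 0.25) == 2.0
```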

I don't see why, unless you think they're lying and they filmed their demos, or used some other preexisting model. I didn't ignore the JWST launch just because I haven't been granted the ability to use the telescope.
zb3 · 4 days ago
Back when Imagen was not public, they didn't properly validate whether you were a "trusted tester" on the backend, so I managed to generate a few images..

..and that's when I realized how much cherry picking we have in these "demos". These demos are about deceiving you into thinking the model is much better than it actually is.

This incentivizes not making the models available, because people then compare their extrapolation from the demo images against competitors' actual outputs. This can trick people into thinking Google is winning the game.

Just as OpenAI released a feature that hit Google where it hurts, Google released Veo 2 to utterly destroy OpenAI's Sora.

Google won.

Random fact: "Veo" means "I see" in Spanish. Take it any way you want.
Hernan Moraldo is from Argentina. That may be all there is to it.
mgnn · 4 days ago
Video without the id. You pick which definition of id.
While "video" means "I see" in Latin.
Impressive that we can do that, but, again, a hyped-up solution in search of a problem, after pouring tons of resources into it.
brap · 4 days ago
Google is killing it
It's interesting that they host these videos on YouTube, because it signals they're fine with AI-generated content. I wonder if Google forgets that the creators themselves are what makes YouTube interesting for viewers.
yoavm · 4 days ago
What makes you think that viewers wouldn't be watching AI generated content? Considering the possibilities of fake videos, I'm sure that it can be very engaging. And the costs are zero.
The costs are not zero. I recently generated a short AI video for my son in Runway Act One. That $15 balance evaporated in like 6 prompts.

Of course, it's orders of magnitude cheaper than making a video or an animation yourself.

yoavm · 4 days ago
It was a figure of speech. Compared to how much it can cost to make a non-AI video, this is basically free, and if we can learn anything from the change in LLM costs, the price will probably be ~10% of today's in ~2 years.
Search youtube for stoicism, you'll find an overwhelming amount of generated content. And a lot of other niche subjects have been colonized like that.
yoavm · 4 days ago
And that shows that viewers won't be watching AI generated content? If anything, I think that it shows exactly what I'm saying - that there are viewers and the cost is essentially zero.
Does anybody realize this is very sad?

Namely, that it takes so few neurons to get a picture into our heads.

I guess end-of-the-world scenarios may lead us to create that superintelligence with a gigantic, ultra-performant artificial "brain".

Website keeps crashing and reloading on Brave iOS.
Same here. Well, Google being Google, not surprised.
wruza · 4 days ago
A page with a bunch of videos struggles to scroll on iphone and crashes the browser for me. Google actively punches through rock bottom with its frontend teams.
Yeah but I’m sure they crushed Leetcode exercises
ible · 4 days ago
That product name sucks for Veo, the AI sports video camera company that literally makes a product called the Veo 2. (https://www.veo.co)
Judging by how they've been trying to ram AI into YouTube creators' workflows, I suppose it's only a matter of time before they try to automate the entire pipeline from idea, to execution, to "engaging" with viewers. It won't be good at any of that, but when did that ever stop them.

https://www.youtube.com/watch?v=26QHXElgrl8

https://x.com/surri01/status/1867433782992879617

They basically already have this: https://workspace.google.com/products/vids/
cj · 4 days ago
Last week I started seeing a banner in Google Docs along the lines of "Create a video based on the content of this doc!" with a call to action that brought me to Google Vids.
lukan · 4 days ago
Hey, it's AI and so it is good, right?

Seriously, it sounds like something kids can have fun with, or bored deskworkers. But a serious use case, at the current state of the art? I doubt it.

And then suddenly this won't be something that fascinates people anymore… in 10 years, as "non-synthetic" becomes the new "bio" or "artisan" or whatever you like.

Humanity has its ways of objecting to accelerationism.

Put another way, over time people devalue things which can be produced with minimal human effort. I suspect it's less about humanity's values, and more about the way money closely tracks "time" (specifically the duration of human effort).
EGreg · 4 days ago
I strongly disagree. How many clothes do you buy that have 100 thread count, and are machine-made, vs hand-knit sweaters or something?

When did you last ask people for directions, or other major questions, instead of asking Google?

You can wax poetic about wanting "the human touch", but at the end of the day the market speaks: people will just prefer everything automated. Including their partners: once your AI boyfriend can remember every little detail about you, notice everything including your pupils dilating, know exactly how and when you like it, never get angry unless it's to spice things up, and has been trained on 1000 other partners, how could you go back? The same when robots can raise children better than parents, with patience and discipline, teaching them with individual attention and 1000 ways to mold their behavior toward healthier outcomes. Everything people do is being commodified as we speak. Soon it will be humor, entertainment, nursing, etc. Then personal relations.

Just extrapolate a decade or three into the future. Best case scenario: if we nail alignment, we build a zoo for ourselves where we have zero power and are treated like animals who have sex and eat and fart all day long. No one will care about whatever you have to offer, because everyone will be surrounded by layers of bots from the time they are born.

PS: anything you write on HN can already have been written by AI, pretty soon you may as well quit producing any content at all. No one will care whether you wrote it.

> PS: anything you write on HN can already have been written by AI, pretty soon you may as well quit producing any content at all. No one will care whether you wrote it.

People theoretically would care, but the internet has already set up producing things to be pseudo-anonymous, so we have forgotten the value of actually having a human being behind content. That's why AI is so successful, and it's a damn shame.

What exactly is the value of having a human behind content if it gets to the point that content generated by AI is indistinguishable from content generated by humans?
The fact that anyone would ask this question is incredible!

It's so we can in a fraction of those cases, develop real relationships to others behind the content! The whole point of sharing is to develop connections with real people. If all you want to do is consume independently of that, you are effectively a soulless machine.

I think "indistinguishable" is a receding horizon. People are already good at picking out AI text, and AI video is even easier. Even if it looks 100% realistic on the surface, the content itself (writing, concept, etc) will have a kind of indescribable "sameness" that will give it away.

If there's one thing that connects all media made in human history, it's that humans find humans interesting. No technology (like literally no technology ever) will change that.

> People are already good at picking out AI text, and AI video is even easier.

Source? My experience has been that people at most might be “ok” at picking up completely generic output, and outright terrible at identifying anything with a modicum of effort or chance placed into it.

> My experience has been that people at most might be “ok” at picking up completely generic output, and outright terrible at identifying anything with a modicum of effort or chance placed into it.

Bold of you to assume any effort is placed into content when the entire point of using AI in the first place is to avoid this.

> Bold of you to assume any effort is placed into content when the entire point of using AI in the first place is to avoid this.

I mean, I've seen people using it in that way, yes. These are normally the same people I saw copying and pasting the first Google result they found for any search as an answer to their customers/co-workers etc. Or to whom you would say "Do not send this to the customer, this is my explanation to you, use your own words, this is just a high-level blah blah", and then five minutes later you see your response, word for word, having gone out to a customer with zero modification or review for appropriateness.

I equally see a very different kind of usage, where it's just another tool used for speeding up portions of work, not for producing a work in its totality.

Like, sadly, yes, I've now seen sales members with rando Chrome extensions that just attach AI to everything and let it do whatever the fuck it wants, which makes me want to cry... but again, these people were already effectively doing that; they are just doing it faster than ever.

What does indistinguishable even mean here?

If a fish could write a novel, would you find what it wrote interesting, or would it seem like a fish wrote it? Humans absorb information relative to the human experience, and without living a human existence the information will feel fuzzy or uncanny. AI can approximate that but can't live it for real. Since it is a derivative of an information set, it can never truly express the full resolution of its primary source.

> What exactly is the value of having a human behind content if it gets to the point that content generated by AI is indistinguishable from content generated by humans?

What would be the point of paying for AI content if nobody did anything to produce it? Just take that shit!

>PS: anything you write on HN can already have been written by AI

Yeah in some broad sense, the same as we've always had: back in the 2010s it could have been generated by a Markov chain, after all. The only difference now is that the average quality of these LLMs is much, much higher. But the distribution of their responses is still not on par with what I'd consider a good response, and so I hunt out real people to listen to. This is especially important because LLMs are still not capable of doing what I care most about: giving me novel data and insights about the real world, coming from the day to day lived experience of people like me.

HN might die but real people will still write blogs, and real people will seek them out for so long as humans are still economically relevant.

I have both machine-made and hand-knit sweaters. In general, I expect handmade clothes to be more expensive than machine-made, which kinda proves my point. I never said machine-made things had zero value. I said we will tend to devalue them relative to more human-intensive things.

Asking for directions is a bad example, because it takes very little time for both humans and machines to give you directions. Therefore it would be highly unusual for anyone to pay for this service (LOL)

Yes, exactly. Marx had this right. Money is a way to trade time.
> Humanity has its ways of objecting to accelerationism.

Actually, typically human objection only slows it down and often it becomes a fringe movement, while the masses continue to consume the lowest common denominator. Take the revival of the flip phone, typewriter, etc. Sadly, technology marches on and life gets worse.

Does life get worse for the majority of people or do the fruits of new technology rarely address any individual person’s progress toward senescence? (The latter feels like tech moves forward but life gets worse.)
Of course, it depends on how you define "worse". If you use life expectancy, infant mortality, and disease, then life has in the past gotten better (although the technology of the past 20 years has RARELY contributed to any of that).

If you use 'proximity to wild nature', 'clean air', 'more space', then life has gotten worse.

But people don't choose between these two. They choose between alternatives that give them analgesics in an already corrupt society, creating a series of descending local maxima.

Are you kidding?

TikTok is one of the easiest platforms to create for, and look at how much human attention it has sucked up.

The attention/dopamine magnet is accelerating its transformation into a gravitational singularity for human minds.

TikTok’s main attraction is the people, not just the videos. Trends, drama, etc. all involve real humans doing real human stuff, so it’s relatable.

I might be wrong, but AI videos are on the same path as AI generated images. Cool for the first year, then “ah ok, zero effort content”.

Sure, humanity has its ways of objecting to Accelerationism, but the process fundamentally challenges human identity:

"The Human Security System is structured by delusion. What's being protected there is not some real thing that is mankind, it's the structure of illusory identity. Just as at the more micro level it's not that humans as an organism are being threatened by robots, it's rather that your self-comprehension as an organism becomes something that can't be maintained beyond a certain threshold of ambient networked intelligence." [0]

See also my research project on the core thesis of Accelerationism that capitalism is AI. [1]

[0] https://syntheticzero.net/2017/06/19/the-only-thing-i-would-...

[1] https://retrochronic.com/

noch · 4 days ago
> Judging by how they've been trying to ram AI into YouTube creators workflows […]

Thanks for sharing that video and post!

One way to think about this stuff is to imagine that you are 14 and starting to create videos, art, music, etc in order to build a platform online. Maybe you dream of having 7 channels at the same time for your sundry hobbies and building audiences.

For that 14 year old, these tools are available everywhere by default and are a step function above what the prior generation had. If you imagine these tools improving even faster in usability and capability than prior generations' tools did …

If you are of a certain age you'll remember how we were harangued endlessly about "remix culture" and how mp3s were enabling us to steal creativity without making an effort at being creative ourselves, about how photobashing in Photoshop (pirated cracked version anyway) was not real art, etc.

And yet, halfway through the linked video, the speaker, who has misgivings, was laughing out loud at the inventiveness of the generated replies and I was reminded that someone once said that one true IQ test is the ability to make other humans laugh.

> laughing out loud at the inventiveness of the generated replies

Inventive is one way of putting it, but I think he was laughing at how bizarre or out-of-character the responses would be if he used them. Like the AI suggesting that he post "it is indeed a beverage that would make you have a hard time finding a toilet bowl that can hold all of that liquid" as if those were his own words.

"remix culture" required skill and talent. Not everyone could be Girl Talk or make The Grey Album or Wugazi. The artists creating those projects clearly have hundreds if not thousands of hours of practice differentiating them from someone who just started pasting MP3s together in a DAW yesterday.

If this is "just another tool" then my question is: does the output of someone who has used this tool for one thousand hours display a meaningful difference in quality to someone who just picked it up?

I have not seen any evidence that it does.

Another idea: What the pro generative AI crowd doesn't seem to understand is that good art is not about _execution_, it's about _making deliberate choices_. While a master painter or guitarist may indeed pull off incredible technical feats, their execution is not the art in and of itself; it widens the range of choices they can make. The more generative AI steps into the role of making these choices, ironically, the more useless it becomes.

And lastly: I've never met anyone who has spent significant time creating art react to generative AI as anything more than a toy.

> does the output of someone who has used this tool for one thousand hours display a meaningful difference in quality to someone who just picked it up?

Yes. A thousand hours confers a much greater understanding of what it's capable of, its constraints, and how best to take advantage of them.

By comparison, consider photography: it is ostensibly only a few controls and a button, but getting quality results requires the user to understand the language of the medium.

> What the pro generative AI crowd doesn't seem to understand is that good art is not about _execution_ it's about _making deliberate choices_. While a master painter or guitarist may indeed pull off incredible technical feats, their execution is not the art in and of itself, it is widening the amount of choices they can make.

This is often not true, as evidenced by the pre-existing fields of generative art and evolutionary art. It's also a pretty reductive definition of art: viewers can often find art in something with no intentional artistry behind it.

> I've never met anyone who has spent significant time creating art react to generative AI as anything more than a toy.

It's a big world out there, and you haven't met everyone ;) Just this last week, I went to two art exhibitions in Paris that involved generative AI as part of the artwork; here's one of the pieces: https://www.muhka.be/en/exhibitions/agnieszka-polska-flowers...

noch · 4 days ago
> Just this last week, I went to two art exhibitions in Paris that involved generative AI as part of the artwork; here's one of the pieces

The exhibition you shared is rather beautiful. Thank you for the link!

noch · 4 days ago
> "remix culture" required skill and talent.

We were told that what we were doing didn't require as much skill as whatever the previous generation were doing to sample music and make new tracks. In hindsight, of course you find it easy to cite the prominent successes that you know from the generation. That's arguing from survivorship bias and availability bias.

But those successes were never the point: the publishers and artists were pissed off at the tens of thousands of teenagers remixing stuff for their own enjoyment and forming small yet numerous communities and subcultures globally over the net. Many of us never became famous, so you can't cite fame as proof of skill, but we made money hosting parties at the local raves with beats we remixed together ad hoc and that others enjoyed.

> The artists creating those projects clearly have hundreds if not thousands of hours of practice differentiating them from someone who just started pasting MP3s together in a DAW yesterday.

But they all began as I did, by being someone who "just started pasting MP3s together" in my bedroom. Darude, Skrillex, Burial, and all the others simply kept doing it longer than those who decided they had to get an office job instead.

The teenagers today are in exactly the same position, except with vastly more powerful tools and the entire corpus of human creativity free to download, whether in the public domain or not.

I guess in response to your "required skill and talent", I'm saying that skill is something that's developed within the context of the technology a generation has available. But it is always developed, then viewed as such in hindsight.

> If this is "just another tool" then my question is: does the output of someone who has used this tool for one thousand hours display a meaningful difference in quality to someone who just picked it up?

Yes, absolutely. Not necessarily in apparent execution without knowledge of intent (though often there, too), but in the scope of meaningful choices that they can make and reflect with the tools, yes.

This is probably even more pronounced with use of open models than the exclusively hosted ones, because more choices and controls are exposed to the user (with the right toolchain) than with most exclusively-hosted models.

EGreg · 4 days ago
Who needs viewers anyway? Automate the whole thing. I just see the endgame for the internet is https://en.wikipedia.org/wiki/Dead_Internet_theory
wruza · 4 days ago
They already do that. 90% of cheerful top comments on some channels are clearly generated. They never mention content (yet) and are abstract as hell. “Very useful video, thanks”, “I’m watching this every day, love the content” and so on. It’s unclear if they have a real view count either, at least in promotion phase.
It’s telling that safety and responsibility get so many fluff words and the technical details are fairly extensive, but there’s no mention of the training data? It’s clearly relevant to both performance and the ethical discussion.

Maybe it’s just me who couldn’t find it, (the website barely works at all on FF iOS)..

Most people predicted that the second one of these companies stopped caring about safety, the others would stop as well. People hate being told what they’re not supposed to do. And now companies will go forward with abandoning their responsible-use policies.
Huge swathes of social media users are going to love this shit. It makes me so sad.
Time and money are better spent on creating actual video, animation, and art than this gen AI drivel.
Google being Google:

> VideoFX isn't available in your country yet.

Don't worry, even if it was "available" in your country, it's not really available. I am in the US and I just see a waitlist sign up.
Give it a few months and it'll get cancelled
Why would the country get cancelled?
He means the project, obviously
[flagged]
My theory as to why all the bigtech companies are investing so much money in video generation models is simple: they are trying to eliminate the threat of influencers/content creators to their ad revenue.

Think about it: almost everyone I know rarely clicks on ads or buys from ads anymore. On the other hand, a lot of people, including myself, look into buying things advertised implicitly or explicitly by content creators we follow, say a router recommended by LinusTechTips. A lot of brands have started moving their ad spending to influencers too.

Google doesn't have a lot of control over these influencers. But if they can get good video generation models, they can control this ad space too, without a human in the loop.

It's so much simpler than that:

1) AI is a massive wave right now and everyone's afraid that they're going to miss it, and that it will change the world. They're not obviously wrong!

2) AI is showing real results in some places. Maybe a lot of us are numb to what gen AI can do by now, but the fact that it can generate the videos in this post is actually astounding! 10 years ago it would have been borderline unbelievable. Of course they want to keep investing in that.

> Think about it, almost everyone I know rarely clicks on ads or buys from ads anymore.

This is a typical tech echo chamber. There is a significant number of people who make direct purchases through ads.

> But if they can get good video generations models, they can control this ad space too without having human in the loop.

This looks like it's based on a misguided assumption. Format might have a significant impact on reach, but the deciding factor is trust in the reviewer. The video format itself does not guarantee a decent CTR/CVR. It's true that ad companies find this space lucrative, but they're smart enough to acknowledge this complexity.

> This is a typical tech echo chamber. There is a significant number of people who make direct purchases through ads.

Even if it's not, TV ads, newspaper ads, magazine ads, billboards, etc. get exactly 0 clickthroughs, and yet people still bought (and continue to buy) them. Why do we act like impressions are hunky-dory for every other medium, but worthless for web ads?

> Think about it, almost everyone I know rarely clicks on ads or buys from ads anymore.

I remember saying this to a google VP fifteen years ago. Somehow people are still clicking on ads today.

wruza · 4 days ago
Sometimes it feels like we could solve most of the world’s problems by simply finding all those people and giving them a good talk. Because I know that even stupid ads may work, on you, on me, on someone else, simply by mentioning a brand's existence. But clicking on ads is signing off on your own stupidity in my book. It must be no more than a few per thousand. Maybe the world is so big that even 0.1% is enough?
I did not think about that angle yet but I have to admit, I agree. I rarely ever even pay attention to the YT ads and kind of just zone out but the recommendations by content creators I usually watch are one of the main sources I keep up with new products and decide what to buy.
> Think about it, almost everyone I know rarely clicks on ads or buys from ads anymore.

Most people have claimed not to be influenced by ads since long before networked computers were a major medium for delivering them.

Nah. They're trying to eliminate the threat of content creators, artists, designers, animators, etc getting paid for their art and hard won skill instead of google.
[dead]