David Greene: https://youtu.be/xYxQrLp4MQk
NotebookLM: https://youtu.be/AR4dRtzFvxM
I think he just has "podcast guy" voice. It's pretty generic.
Btw, are you sure that is the part David Greene is upset about? The NotebookLM hosts will vary their voice, and jump into and out of different voices in a glitchy manner sometimes.
Why is everybody so inclined to defend NotebookLM here? I've heard Chris Fisher and other Jupiter Broadcasting hosts, but also Leo Laporte (from TWiT) for example. It's obvious it is trained on a lot of open podcasting material and clones a voice every now and then.
Nor does it seem like his voice but changed "just enough" (like in pitch).
I agree, he just has a very generic-sounding "podcast guy" voice. And obviously, NotebookLM trained on tons of podcasts and is generating a highly generic, average-sounding voice. Which is why it's pitched higher, since David Greene has a lower than average pitch.
This lawsuit is either just to generate buzz to build his personal brand, or maybe he's worried about the competitive threat from AI. But there's no way he's going to win this suit. This isn't like the case with Bette Midler, where Ford intentionally hired someone to mimic her voice.
However, it does seem to copy the way he "lisps" his S's. I am not sure that is a common feature of a 'generic-sounding "podcast guy" voice'.
What you're hearing is the way microphones deal with the hissing of an "s", same as they struggle with plosives like "p", from the whoosh of air. It's an artifact of microphones close to the mouth, so it makes sense that Google replicates it.
You can use physical pop filters or digital audio filters to reduce the effect, but podcasters don't usually use the physical ones, and the level of audio processing podcasters do really depends on their level of expertise and how much they even care.
In particular, check out the pronunciation of the trailing S in the word "this" at 28 seconds in the clip of David Greene compared to 24 seconds in the NotebookLM clip. Really seemed uncannily similar to me.
To me that ‘s’ sound reminds me of the sibilance a Shure SM58 picks up without a pop filter. I hear a different side of the same idea on ‘p’ and ‘b’ as well.
I had a speech impediment as a youngster and the sound got in my head. Now I hear it on podcasts.
As the article notes, the AI doesn't even have to be trained on Greene's voice for him to have a case.
> Grimmelmann said Greene doesn’t necessarily have to show definitively that Google trained NotebookLM on his voice to have a case, or even that the voice is 100 percent identical to his. He cited a 1988 case in which the singer and actress Bette Midler successfully sued Ford Motor Company over a commercial that used a voice actor to mimic her distinctive mezzo-soprano. But Greene would then have to show that enough listeners assume it’s Greene’s voice for it to affect either his reputation or his own opportunities to capitalize on it.
There are a lot of characteristics to people's voices. Tons of people impersonate Trump purely through cadence. Same with Obama. How many singers impersonate Tom Waits?
Only the pack-a-day smokers, and we're always a dying breed.
So I would say that where there is smoke there is sometimes fire at this point.
But I’m the guy who blurts out how the voice actor for the gate guard played the brother in that movie with that guy. And I can hear what he’s complaining about. There’s a lot of elements of his voice and the tempo is pretty close.
(Usually it’s the tempo and certain phonemes that give people away to me when they are doing a different accent.)
There are going to be countless people that think AI is using their voice. Humans share remarkably similar voices, but obviously you can’t copy that (other than impersonations, obviously).
Unless there is evidence that a company intentionally went after a specific human voice to train their AI, there’s no reason to report on these people claiming AI is using their voice.
Maybe if it’s someone with a very distinctive voice. But this guy, as the OP said, just has a “generic podcast guy” voice.
Of course you can have an AI target someone else’s voice. My point is that unless there is evidence it was intentional, it’s silly to claim that just because it sounds similar to a human’s voice, that means it must’ve been intentional.
But they did. It's literally what the article, and this thread are about.
As @crazygringo said, David's voice is lower. I think it might have some of the same harmonics, but it has some lower ones too, which make the overall sound come across as lower-pitched. I'm not using technical terminology here, so perhaps someone can jump in with the appropriate terms.
As for his wife, it's possible that he speaks in a higher/friendlier register when talking to her/their kids.
But I am also very anti-AI in the artistic space, because if it weren’t for humans freely providing so much artistic content, we wouldn’t have this outcome. And I believe the only end result will be fewer humans openly sharing knowledge, because some heavily money-backed entities will just steal all the art and put it behind a paywall or advertisement.
As much as I appreciate the easy search (because actual useful search has become nonexistent since AI) and the ability to ask AI to find some metadata from a large data payload, I also dislike AI, because it has effectively broken the open internet and the willingness for humans to be open to freely sharing knowledge.
Copying does not directly deprive anyone of anything. In fact it just adds more value to the world, and makes it more available to more people.
Nobody can "copy" stuff and put it behind a paywall, because the original is still free. It's the prevention of copying that leads to expression being locked behind paywalls.
It's said that copying disincentivizes creativity and creation, but in practice it does the opposite. Just look at the incredible amount of music, fiction, software, stories, art, and information that have proliferated since the birth of the web.
What copying does do is indirectly deprive people and companies of the ability to monopolize profits on particular expressions without competition. But I'm not so sure that's a bad thing.
For example, look at the software industry. I'm extremely grateful that patents and copyright are so rarely enforced in software and UI design, and that we've all been copying the good ideas that came before us for decades with no consequence. I'm grateful the same is true of food recipes, too. I think the world would likely be a richer one if this was true for most fields and art.
So they got someone who could fake it pretty well.
Of course, fast forward to 2026 and an actor automatically sells off their face, voice and soul in perpetuity when they sign a contract.
Edit: here is an older piece, and there have been many since: [0]. It’s the 3rd voice that enters the NotebookLM clip, so it takes a minute before it comes in (I shared this clip here in late 2024 [1]).
[0] https://podverse.fm/clip/Vy4y7ZG2Rd
[1] https://hn.algolia.com/?query=NotebookLM%20Copied%20a%20Podc...
I kept listening waiting to hear the voice that was supposed to sound like him, and never did.
Was it the first one (I heard three different voices during the clip)? That one is considerably deeper than the podcaster's voice, and has different tones, too. It definitely wasn't the last one, that one was much higher pitched (and then a female voice in the middle).
Feels like a big stretch, to say the least. But I can tell a big difference between the two.
Ultimately, it's like some of the music copyright lawsuits, where they're suing over chord progression. There are a billion voices on the planet -- any AI generated voice is going to sound similar to someone else's real voice (and again, I don't hear it at all in this case).
EDIT: So it's the third voice apparently. The pitch is close, but the tones and accents still definitely feel "off" enough that it doesn't sound like they were intentionally going for this guy. It still feels like a stretch to me, but not as much as the first voice did.
But it is always possible that this is what Chris sounds like in his own head. Nobody listening to audio will hear it the way he does.
But as said, this is an old example, there have been many since (which I am too lazy to look up) that are also very clear voice clones.
As with most (all?) things we do, exposure is king. This is how we don't die from trying to process infinite dimensional reality. The brain compresses, it prunes. Things seem similar if you don't have much need to distinguish them.
Unless you've listened to hours of either NotebookLM or Greene, you simply won't be able to participate in the distinguishing of these voices with much ability.
However, the equation changes considerably when the voice becomes familiar. You can imagine it like going from CPU to an ASIC. The brain is rather good at telling when a voice is your friend or not, the evolutionary pressure should be clear. Therefore, the people most qualified to speak on this matter will be first and foremost Greene and his podcast fans. It's a matter of exposure.
It's so easy to do now. You can just grab your favorite voiceover artist's demo reel and clone it from there. The chances of getting caught are slim, and what is the (poorly paid) artist going to do? Most of them will lack the resources to fund a protracted court case to sue some anonymous users in Tajikistan making AI slop videos en masse.
There is a lot of variability on this from person to person.
A lot of people are terrible at recognizing voices out of context. I have always been able to recognize people's voices just about as easily as their faces.
(Unfortunately, while this is a neat parlor trick, I haven't found it to be a particularly valuable skill).
It doesn't matter whether it sounds distinctive to you. What matters is whether it's close enough to the real person's voice to be an infringement.
Just like it doesn't matter if you used a machine to duplicate a painting. It's still an infringement.
You can't publish a Harry Potter novel and then throw up your hands and say, "It wasn't me. The AI decided to name the characters Hagrid and Hermione and Snape."
Google says it paid a voice actor. If it provides proof of that, good. But like with a lot of AI things, we're in new territory here.
Seems like there's a market for a tool that can compare an AI voice to a library of known famous voices so that companies like Google can tweak their machines to not sound too much like someone who can be harmed by a sound-alike.
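Something like the following is roughly what I have in mind. It's only a sketch, and it assumes each voice has already been boiled down to a fixed-length speaker embedding by some model. The embeddings, the names, and the 0.85 threshold are all made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def flag_soundalikes(candidate, library, threshold=0.85):
    """Return (name, score) pairs for library voices scoring at or
    above the threshold, most similar first."""
    scored = [(name, cosine_similarity(candidate, emb))
              for name, emb in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda h: h[1], reverse=True)

# Hypothetical 4-dimensional embeddings; a real system would use far more
# dimensions and an actual speaker-embedding model.
library = {
    "host_a": [0.9, 0.1, 0.3, 0.2],
    "host_b": [0.1, 0.8, 0.2, 0.6],
}
synthetic_voice = [0.85, 0.15, 0.35, 0.2]

matches = flag_soundalikes(synthetic_voice, library)
for name, score in matches:
    print(f"{name}: {score:.3f}")
```

The hard part is picking the threshold: too low and every "podcast guy" voice gets flagged, too high and a close sound-alike slips through.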
Also not sufficient. There has to be some evidence they attempted to copy the voice rather than just found one that was eerily similar.
This comes up from time to time without AI too. Like, it's not good if a firm goes out to find someone with a voice similar to a famous person / voice actor…but it's fine if they just randomly find one that sounds exactly the same and they say “oooh let's go with this one” and not “oooh perfect this sounds just like Don LaFontaine!”
Even if it is complete chance, there's no way to peer inside and confirm that, because these things are completely opaque black boxes.
in perceptual psychology/psychophysics, there's the concept of the "just-noticeable difference" (JND), which is the smallest change to a stimulus you can make that is reliably detectable.
normally the JND is measured on physical properties like brightness, pitch, etc but there's no reason it couldn't be applied to a more abstract latent space. two points in a particular latent space may be mathematically unique, but if they're indistinguishable to humans we shouldn't treat them as distinct voices
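to make that concrete, here's a rough sketch of how you might estimate a JND from listening-test data and then use it as a "same voice" cutoff. it assumes a two-alternative forced-choice setup where 75% correct is taken as the detection criterion; the trial numbers and the idea of measuring "distance" in a voice latent space are hypothetical.

```python
def estimate_jnd(trials, criterion=0.75):
    """Estimate the just-noticeable difference from
    (stimulus_difference, proportion_correct) pairs by linear
    interpolation at the criterion level.

    Returns None if listeners never reach the criterion.
    """
    points = sorted(trials)
    if not points:
        return None
    # Smallest tested difference is already detectable: the JND is
    # at most that value, which is the best this data can say.
    if points[0][1] >= criterion:
        return points[0][0]
    for (d0, p0), (d1, p1) in zip(points, points[1:]):
        if p0 < criterion <= p1:
            return d0 + (criterion - p0) * (d1 - d0) / (p1 - p0)
    return None

def perceptually_distinct(distance, jnd):
    """Treat two voices as distinct only if they are at least a JND apart."""
    return distance >= jnd

# Hypothetical results: latent-space distance between two voices vs.
# the fraction of trials where listeners picked the odd one out.
trials = [(0.1, 0.52), (0.2, 0.60), (0.3, 0.70), (0.4, 0.80), (0.5, 0.93)]
jnd = estimate_jnd(trials)
print(jnd, perceptually_distinct(0.2, jnd), perceptually_distinct(0.45, jnd))
```

a real study would fit a proper psychometric curve instead of interpolating between two points, and the JND would almost certainly vary by listener and by region of the latent space.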
Then came the completely nonsensical HN threads with people arguing about something they hadn't heard.
Maybe don't redo that whole thing? Could we at least make sure to secure some examples of A and B, this time?
--
Statement from Scarlett Johansson on the OpenAI "Sky" voice (May 20, 2024)
https://news.ycombinator.com/item?id=40421225 (1021 comments)
OpenAI didn’t copy Scarlett Johansson’s voice for ChatGPT, records show (May 23, 2024)
https://news.ycombinator.com/item?id=40448045 (1218 comments)
https://news.ycombinator.com/item?id=40421757
I had to wade through 12 gigantic generic political subthreads to find this.
"Do you have an example of the changed voice anywhere?" (No replies.)
"Yes, I feel gaslit by the whole situation" is a great summary.
Please post a clip from the time. I'm still curious to hear how similar or not they actually were.
Turns out he still has his own voice, that one sounds like him.
The voices don't sound that similar to me, and he has what I think of as a generic mid-Atlantic accent. I'm sure it feels uncomfortably similar to him, but I think this is a mix of confirmation bias and the fact that radio and TV stations have long selected for 'average' sounding voices because listeners and viewers will call in to complain about voices they find annoying. Performers in this field cultivate that kind of generic voice, much as real estate agents aim for a friendly-but-bland look rather than trying to stand out individually.
I do feel for the guy a bit because voice generation is now so good that there's no reason to pay performers with a 'radio voice' for commercial voice-over or narration work in many cases, and I question the value of applying AI to fields of personal rather than industrial endeavor - was the cost of human vocalization such a drain on the economy that we are better off for automating those jobs away as quickly as possible? However, I don't buy his claim that his individual voice and way of speaking was stolen. It's just not very distinctive to me, in the same way that faces from thispersondoesnotexist.com will inevitably approximate the appearance of some real people.
I think a few random samples trivially show NotebookLM is higher pitched, although if you generalize to "deep male voice with vocal fry" you could lump them together with half the radio and podcast voices.
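If anyone wants to check the pitch claim on actual clips rather than by ear, a crude autocorrelation estimate is enough to get a ballpark number. The sketch below uses pure synthetic tones as stand-ins, so the 100 Hz and 125 Hz values are invented and are not measurements of either voice. Real speech would need framing and voiced/unvoiced detection on top of this.

```python
import math

def estimate_f0(samples, sample_rate, fmin=50.0, fmax=400.0):
    """Estimate fundamental frequency in Hz by picking the lag with the
    highest raw autocorrelation within a plausible speech range.

    Resolution is limited to whole-sample lags, so this is coarse.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    n = len(samples)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

def tone(freq, sample_rate, seconds=1.0):
    """Generate a pure sine tone as a stand-in for a voiced recording."""
    count = int(sample_rate * seconds)
    return [math.sin(2 * math.pi * freq * i / sample_rate) for i in range(count)]

SAMPLE_RATE = 8000
lower_voice = tone(100.0, SAMPLE_RATE)   # hypothetical "deeper" voice
higher_voice = tone(125.0, SAMPLE_RATE)  # hypothetical "higher" voice

f0_low = estimate_f0(lower_voice, SAMPLE_RATE)
f0_high = estimate_f0(higher_voice, SAMPLE_RATE)

# Pitch gap in semitones, which tracks perceived difference better than Hz.
semitones = 12 * math.log2(f0_high / f0_low)
print(f0_low, f0_high, round(semitones, 2))
```

Pitch is only one dimension, of course. Two voices can share an average pitch and still sound nothing alike, which is why it can't settle the argument by itself.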
Nobody at Google was like "we should use this guy's voice!"