1) those directly involved with the incident, or employees of the same company. They have too much to lose by circumventing the PR machine.
2) people at similar companies who operate similar systems with similar scale and risks. Those people know how hard this is and aren’t likely to publicly flog someone doing their same job based on uninformed speculation. They know their own systems are Byzantine and don’t look like what random onlookers think it would look like.
So that leaves the rest, who offer insights based on how stuff works at a small scale, or better yet, pronouncements rooted in “first principles.”
Especially in a time where the gates have come crashing down to pronouncements of, "now anybody can learn to code by just using LLMs," there is a shocking tendency to overly simplify and then pontificate upon what are actually bewilderingly complicated systems wrapped up in interfaces, packages, and layers of abstraction that hide away that underlying complexity.
It reminds me of those quantum woo people, or movies like What the Bleep Do We Know!? where a bunch of quacks with no actual background in quantum physics or science reason forth from drastically oversimplified, mathematics-free models of those theories and into utterly absurd conclusions.
This happens when your terms are underspecified: someone says "Netflix's servers are struggling under load" and people in similar efforts know that's basically just equivalent to "something is wrong", and that the whole conversation is esoteric to most people outside a few specialized teams. But these other people jump to conclusions and start having conversations based on their own experience with what is (to them) related (and usually fashionable, because that is how most smaller players figure out how to do things).
Whenever an HN thread covers subjects where I have direct professional experience I have to bite my tongue while people who have no clue can be as assertive and confidently incorrect as their ego allows them to be.
As one of those who can't help themselves: the way you phrase it feels a bit too cynical. I've always interpreted it as people wanting to help, but not wanting to offer something that's wrong. Which is basically how falsifiable science works. It's so much easier to refute the assertion that birds generate lift with tiny backpacks with turboprops attached than it is to explain the finer details of avian flight mechanics. I couldn't describe above a superficial level how flapping works, but I can confidently refute the idea of a turboprop backpack. (Everyone knows birds gave up the turboprop design during the great kerosene shortage of 1128.)
This comes from first-hand experience talking to several of their directors when consulted on how to make certain systems of theirs better.
It's not just a matter of guarantees, it's a matter of complexity.
Like right now Google search is dying and there's nothing that they can do to fix it because they have given up control.
The same thing happened with Netflix where they wanted to push too hard to be a tech company and have their tech blogs filled with interesting things.
On the back end they went too deep on the microservices complexity. And on the front end for a long time they suffered with their whole RxJS problem.
So it's not an objective matter of what's better. It's more of a cultural problem at Netflix. Plus the fact that they want to be associated with "FAANG" and yet their product is not really technology based.
"Microservices" have nothing to do with it.
Netflix regularly puts out blog articles proudly proclaiming that they process exabytes of logs per microsecond or whatever it is that their microservices Rube Goldberg machine spits out these days, patting themselves on the back for a heroic job well done.
Meanwhile, I've been able to go on the same rant year after year that they're still unable to publish more than five subtitle languages per region. These are 40 KB files! They had an employee argue with me about this in another forum, saying that the distribution of these files is "harder than I thought".
It's not hard!
They're solving the wrong problems. The problems they're solving are fun for engineers, but pointless for the business or their customers.
From a customer perspective Netflix is either treading water or noticeably getting worse. Their catalog is smaller than it was. They've lost licensing deals for movies and series that I want to watch. The series they're producing themselves are not things I want to watch any more. They removed content ratings, so I can't even pick something that is good without using my phone to look up each title manually!
Microservices solve none of these issues (or make it worse), yet this is all we hear about when Netflix comes up in technology discussions. I've only ever read one article that is actually relevant to their core business of streaming video, which was a blog about using kTLS in BSD to stream directly from the SSD to the NIC and bypassing the CPU. Even that is questionable! They do this to enable HTTPS... which they don't need! They could have just used a cryptographic signature on their static content, which the clients can verify with the same level of assurance as HTTPS. Many other large content distribution networks do this.
It's 100% certain that someone could pretend to be Elon, fire 200-500 staff from the various Netflix microservices teams and then hire just one junior tech to figure out how to distribute subtitles... and that would materially improve customer retention while cutting costs with no downside whatsoever.
Every tech company massively inflated their headcount during the leadup to the Twitter acquisition because money was free.
I interviewed at Meta in 2021 and asked an EM what he would do if given a magic wand to fix one problem at the company. His response: "I would instantly hire 10,000 more engineers."
Elon famously did the opposite and now revenue is down 80%.
From my experience, this answer usually belies someone who doesn't fully understand the system and problems of the business. The easy answer when overwhelmed is "we need more people." To use a manufacturing analogy, you can cover up a lot of quality issues with increased throughput, but it makes for an awfully inefficient system.
Borrowing costs went to nearly zero. That's not the same thing. You have to repay the money, you just don't have to repay it with interest.
I would have assumed people generally know this, but everybody (and I do mean everybody) talks like they don't know this. I would like to assume that "money is free" is just a shorthand, buuuut... again... these arguments! People like that EM talk like it was literally free money raining from the sky that could be spent (gone!) without it ever having to be repaid.
If you watched any of the long-form interviews Musk gave immediately after the acquisition, he made the point that if he hadn't bailed out Twitter, it had maybe 3 months of runway left before imploding.
Doubling headcount without a clear vision of how that would double revenues is madness. It is doubly so in orgs like Twitter or Netflix where their IT was already over-complicated.
It's too difficult for me to clearly and succinctly explain all the myriad ways in which a sudden inrush of noobs -- outnumbering the old guard -- can royally screw up something that is already at the edge of human capability due to complexity. There is just no way in which it would help matters. I could list the fundamental problems with that notion for hours.
I highly recommend everyone take a university-level financial instruments course. The math isn’t super hard, and it does a very good job of explaining how rational investors behave.
Surely they expect at a minimum that their capital investment would make them dividends (increased revenue), and also that the money wasn’t simply set on fire with nothing to show for it and no way to repay it.
If I’m wrong then Twitter - and similar companies - are little better than Ponzi schemes, with investors relying on the money of the greater fool to recover their money.
Ah, HN, where you try to explain how things work, and you get ignorant sarcasm in return.
> Surely they expect at a minimum that their capital investment would make them dividends (increased revenue), and also that the money wasn’t simply set on fire with nothing to show for it and no way to repay it.
Yes, of course. But when safe investments (e.g., Treasuries) are paying out close to zero, investors are going to tolerate lower returns than they do when Treasuries are paying out 3% or more.
It's basic arithmetic: you take the guaranteed rate, add a risk premium, and that's what investors expect from riskier investments. This is well-covered in the class I recommended.
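Something like this, roughly. The rate and premium numbers below are made up purely for illustration, not real market data:

```python
# Rough sketch of the expected-return arithmetic: guaranteed rate plus a risk premium.
# Numbers are illustrative only.

def required_return(risk_free_rate: float, risk_premium: float) -> float:
    """What an investor expects from a risky asset, given the guaranteed rate."""
    return risk_free_rate + risk_premium

# Zero-interest era: Treasuries near 0%, so a 5% premium means roughly a 5% hurdle.
print(required_return(0.001, 0.05))   # ~0.051

# Treasuries at 3%+: the same risk now has to clear a noticeably higher bar.
print(required_return(0.035, 0.05))   # ~0.085
```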
Also, not every investor thinks in terms of consistent return. A pensioner may have a need for a guaranteed 3% annual return to keep pace with inflation. A VC, on the other hand, is often content to have zero returns for years followed by a 100x payout through an IPO.
I know how all this works, but 100x payout is for the small initial investments, not after 10 years of operating at multi-billion-dollar scales.
Small amounts of money are set on fire all of the time, chasing this kind of high-risk return.
Nonetheless, there's an expectation of a return, even if only in aggregate across many small startups.
What I was observing (from the outside, at a distance) was that Twitter was still being run like a startup despite already being in an effectively monopoly position and a "mature" company. Similarly, Amazon could set money on fire while they were the growing underdog. If they doubled their headcount today without doubling either revenue or profits, the idiots responsible for that would be summarily fired.
I get that Silicon Valley and their startup culture does a few things in an unusual way, but that doesn't make US dollars not be US dollars and magically turn into monopoly money that rains from the sky just because interest rates are low.
Subtitles are also complicated because you have to deal with different media player frameworks on the +40 different players you deal with. Getting those players, which you may not own, to recognise multiple sub tracks can be a PITA.
Things look simple to a junior developer, but those experienced in building streaming platforms at scale know there are dragons when you get into the implementation. Sometimes developers and architects do over-complicate things, but smart leaders avoid writing code, so it's an assumption to say things are being made overly complicated.
I read and understood their entire technical whitepaper. I get the what, I'm just saying that the why might not make as much sense as you might assume.
> +40 different players you deal with
They own the clients. They wrote the apps themselves. This is Netflix code reading data from Netflix servers. Even if there are third-party clients (wat!?), that doesn't explain why none of Netflix's home-grown clients support more than 5 subtitle languages.
> Getting those players, which you may not own, to recognise multiple sub tracks can be a PITA.
This is a core part of the service, which everyone else has figured out. Apple TV for example has dozens of subtitle languages.[1]
With all due respect: Read what you just wrote. You're saying that an organisation that has the engineering prowess to stream at 200 Gbps per edge box and also handles terabytes of diagnostic log ingestion per hour can't somehow engineer the distribution of 40 KB text files!?
I can't even begin to outline the myriad ways in which these excuses are patent nonsense.
These are children playing with the fun toys, totally ignoring like... 1/3rd of the viewing experience. As far as the users are concerned, there's nothing else of consequence other than the video, audio, and text that they see on the screen.
"Nah, don't worry about the last one, that only affects non-English speakers or the deaf, we only care about DEI for internal hires, not customers."
[1] Just to clarify: I'm asking for there to be an option to select one language at a time from all available languages, not showing multiple languages at once, which is a tiny bit harder. But... apparently not that hard, because I have two different free, open-source video players on my PC that can do this so I can have my spouse get "full" subtitles in a foreign language while I see the "auto" English subtitles pop up in a different colour when appropriate. With Netflix I have to keep toggling between her language and my language every time some foreign non-English thing is said. Netflix is worth $362B, apparently, but hasn't figured out something put together by poor Eastern European hobbyists in their spare time.
The browser gives you a certain level of control on computers, although you have to deal with the oddities of Safari, but when you go to smart TVs it's the wild west. Netflix does provide their tested framework to TV vendors but it's still not easy, because media playback often requires hardware acceleration, but the rendering framework isn't standard.
Developing for set-top boxes, multiple generations of phones, and smart TVs comes with all sorts of oddities. You think it's easy because you haven't done it.
You lost me. Netflix built a massive CDN, a recommendation engine, did dynamic transcoding of video, and a bunch of other things, at scale, quite some years before everyone else. They may have enshittified in the last five years, but I don't see any reason why they don't have a genuinely legitimate claim to being a founding member of the FAANG club.
I have a much harder time believing that companies with AI in their name or domain are doing any kind of AI, by contrast.
You can argue whether or not that edge translates into more revenue, but the edge is objectively there.
- frequently decide that episodes I've watched are completely unwatched (with random fully-watched eps of the show mixed in).
- assume, seemingly every time I leave at the start of the end credits, that I surely must have intended to come back and watch them.
- rebuild the entire interface (progressively, slowly) when I've left the tab unfocussed for too long. Instead of letting me continue where I was, they show it for less than a second, then rebuild the world.
- keep resetting the closed-caption setting to "none", regardless of choosing "always" or "on instant replay"; worse, they sometimes still have the correct setting in the interface, but have disabled captions anyway.
Netflix has forgotten playback position or episode completion only once since they started streaming. They politely suggest when to reload the page (via a tiny footer banner), but even that might not appear for months. They usually know where end-credits really start, and count that as completion. They don't seem to mess with captions.
Maybe you're in a rural area and Netflix scaled gracefully. Maybe you're deep in SF and Netflix simply outspent to give minimal disruption to a population hub. These could both be true but don't speak to what performs better overall.
I have always wondered how do they deliver their content and what goes on behind the scenes and nobody on tech twitter or even youtubers talk about pornhub's infra for some reason. A lot of the innovation in tech has roots in people wanting to see high quality tiddies on the internet.
* The frontend itself runs on bare metal
* A lot of the backend was built out as microservices, running on top of Mesos and then later K8s
* CDN was in-house. The long tail is surprisingly long, but the front page videos were cached globally
The unifying theme behind PH is that they host everything on their own equipment, no AWS/GCP involved.
Can you explain where this is relevant to buffering issues?
Also, you are very wrong regarding failure modes. The larger the service, the more failure modes it has. Moreover, in monoliths if a failure mode can take down/degrade the whole service, all other features are taken down/degraded. Is having a single failure mode that brings down the whole service what you call fewer points of failure?
I asked nothing about Netflix. My question was directed at your remark regarding monoliths vs microservices.
Now, can you answer the question?
That service would technically be a "microservice" even if it is a large service.
I’m genuinely curious about the reasoning behind that statement. It’s very possible that you are using a different set of assumptions or definitions than I am.
Network requests (sometimes called hops) take a significant amount of time, and you don't want your streaming service taking a significant amount of time.
In microservices land, you generally try making services based on some “domain” (metaphorical, not like a literal domain name) which defines the responsibility of any given service. Defining these domains is more art than science and depends on the business needs and each team.
Video streaming might be one of those domains for Netflix.
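A toy sketch of why the hop count matters: every extra synchronous service call adds latency before the first byte of video moves. The service names and per-hop numbers below are invented purely for illustration:

```python
# Illustrative only: how synchronous hops between services stack up.
# Latencies are made-up round numbers, not measurements of any real system.

PER_HOP_MS = {
    "api_gateway": 5,
    "auth": 10,
    "playback_manifest": 15,
    "license": 20,
    "cdn_steering": 10,
}

def total_latency(call_chain: list[str]) -> int:
    """Sum the latency of a purely sequential chain of service calls."""
    return sum(PER_HOP_MS[svc] for svc in call_chain)

# A 'play' request that has to walk the whole chain before playback starts:
print(total_latency(["api_gateway", "auth", "playback_manifest",
                     "license", "cdn_steering"]))  # 60 ms before any video bytes flow
```

Which is why, whatever the domain boundaries are, you try to keep the hot playback path short.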
Tell me specifics.
Absolutely. I think a great filter for developers is determining how well they understand this. Over-simplification of problems and certainty about one’s ability to build reliable services at scale is a massive red flag to me.
I have to say some of the hardest challenges I’ve encountered were in e-commerce, too.
It’s a lot harder and more interesting than I think many people realize. I learned so much working on those projects.
In one case, the system relied on SQLite and god damn did things go sideways as the company grew its customer base. That was the fastest database migration project I’ve ever been on, haha.
I often think it could have worked today. SQLite has made huge leaps in the areas we were struggling. I’m not sure it would have been a forever solution (the company is massive now), but it would have bought us some much-needed time. It’s funny how that stuff changes. A lot of my takeaways about SQLite 10 years ago don’t apply quite the same anymore. I use it for things now that I never would have back then.
And for limit checking, how often do you write array limit handlers? And what if the BE contract doesn't specify one? Additionally, it will need a regression unit test, because who knows when the next developer will remove that limit check.
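A minimal sketch of the kind of defensive limit check and regression test I mean; the function names and the 100-item cap are made up:

```python
# Hypothetical example: clamp a backend list whose contract never specified a maximum.
MAX_ITEMS = 100  # arbitrary cap, chosen because the BE contract is silent

def limit_items(items: list) -> list:
    """Defensively cap an unbounded list coming from the backend."""
    return items[:MAX_ITEMS]

# Regression test so the next developer can't quietly remove the check.
def test_limit_items_caps_oversized_payloads():
    assert len(limit_items(list(range(10_000)))) == MAX_ITEMS
```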
An effective operational culture has methods for removing those people from the conversations that matter. Unfortunately that earns you a reputation for being “cutthroat” or “lacking empathy.”
Both of those are real things, but it’s the C players who claim they are being unfairly treated, when in fact their limelight-seeking behavior is the problem.
If all that sounds harsh, like the kitchen on The Bear, well…that’s kinda how it is sometimes. Not everyone thrives in that environment, and arguably the ones who do are a little “off.”
In one case I was doing an upgrade on an IPTV distribution network for a cable provider (15+ years ago at this point). This particular segment of subscribers totalled more than 100k accounts. I did validation of the hardware and software rev installed on the routers in question prior to my trip to the data center (2+ hour drive). I informed management that the currently running version on the router wasn't compatible with the hardware rev of card I was upgrading to. I was told that it would in fact work, that we had that same combination of hw/sw running elsewhere. I couldn't find it when I went to go look at other sites. I mentioned it in email prior to leaving; I was told to go anyway.
Long story short, the card didn't work and had to be backed out. The HA failover didn't work on the downgrade and took down all of those subscribers, as the total outage caused a cascading issue with some other gear in this facility. All in all it was during an off-peak time of day, but it was a waste of time and a hit to customer sat.
this is where you get up and leave
That’s a bold claim given that people with inside knowledge could post here without disclosing they are insiders.
Is that some kind of No True Scotsman?
For every thread like this, there are likely people who are readers but cannot be writers, even though they know a lot. That means the active posters exclude that group, by definition.
These threads often have interesting and insightful comments, so that’s cool.
GP clearly meant some people not everybody. You are the one making bold claims.
It’s a very different problem to distributing video on demand which is Netflix’s core business.
We (yep) don't know the exact details, but we do get sent snapshots of full configs and deployments to debug things... we might not see exact load patterns, but it's enough to know. And of course we can't tell, due to NDAs.
Now take this realization and apply it to any news article or forum post you read, and think about how uninformed they actually are.
The reputational damage is going to fall far more on Netflix than on the NFL if they totally flub it.
That, and this fight was likely going to draw an order of magnitude more viewers than the Christmas NFL games, if the media estimates on viewership were remotely accurate. You're talking Super Bowl type numbers vs a regular season NFL game. The problems start happening at the margin of capacity most of the time.
Most people are consumers, and at the end of the day their ability to consume a (boring) match was disrupted. If this was PPV (I don't think it was), they paid extra and didn't get the quality of product they expected. I'm not surprised they dominate the conversation.
I'm also not going to criticise my peers because they could recognise me and I might want to work with them one day.
Stuff goes wrong, random internet people jump on the opportunity to speculate and say wildly off-the-mark comments, and the engineers trying to keep the ship from sinking have to sit quietly for fear of making the PR backlash worse.
Another person was observing the interview, for training purposes, and afterwards said to me: “Do you have kids? You have so much patience!”
And looking through the comments, this is just wrong.
If you code it to utilize high-bandwidth users upload, the service becomes more available as more users are watching -- not less available.
It becomes less expensive with scale, more available, more stable.
To be more specific: if you encode the video in blocks, with each new block hash being broadcast across the network, and just manage the overhead of the block order, it should be pretty easy to stream video with boundless scale using a DHT.
Could even give high-bandwidth users a credit based upon how much bandwidth they share.
With a network like what Netflix already has, the seed-boxes would guarantee stability. There would be very little delay for realtime streams, I'd imagine 5 seconds top. This sort of architecture would handle planet-scale streams for breakfast on top of the already existing mechanism.
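A very rough sketch of what I mean; every class and method here is hypothetical, and it hand-waves away all the hard parts (NAT traversal, incentives, asymmetric upload, verification at scale):

```python
# Hand-wavy sketch of the proposed scheme: the origin publishes each new block's
# hash to a DHT, peers fetch blocks from whoever has them, and the origin
# "seed-boxes" remain the guaranteed fallback. Everything here is hypothetical.
import hashlib

class Peer:
    """Hypothetical peer or seed-box: stores the blocks it has seen."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}                       # block_hash -> bytes
    def store(self, block_hash, data):
        self.blocks[block_hash] = data
    def get(self, block_hash):
        return self.blocks.get(block_hash)

class Dht:
    """Hypothetical DHT: maps a block hash to the peers announcing it."""
    def __init__(self):
        self.table = {}                        # block_hash -> [Peer, ...]
    def announce(self, block_hash, peer):
        self.table.setdefault(block_hash, []).append(peer)
    def peers_for(self, block_hash):
        return self.table.get(block_hash, [])

def publish_block(dht: Dht, origin: Peer, data: bytes) -> str:
    """Origin encodes a new live block and announces its hash to the swarm."""
    block_hash = hashlib.sha256(data).hexdigest()
    origin.store(block_hash, data)
    dht.announce(block_hash, origin)
    return block_hash

def fetch_block(dht: Dht, origin: Peer, viewer: Peer, block_hash: str) -> bytes:
    """A viewer tries peers first, verifies the hash, and falls back to the origin."""
    for peer in dht.peers_for(block_hash):
        data = peer.get(block_hash)
        if data and hashlib.sha256(data).hexdigest() == block_hash:
            break
    else:
        data = origin.get(block_hash)
    viewer.store(block_hash, data)             # the viewer now reshares the block
    dht.announce(block_hash, viewer)
    return data
```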
But then again, I don't get paid $500k+ at a large corp to serve planet scale content, so what do I know.
The problems with using it as part of a distributed service have more to do with asymmetric connections: using all of the limited upload bandwidth causes downloads to slow. Along with firewalls.
But the biggest issue: privacy. If I'm part of the swarm, maybe that means I'm watching it?
[1]: Chainsaw: P2P streaming without trees, https://link.springer.com/chapter/10.1007/11558989_12
The torrent is an example of the system I am describing, not the same system. Torrents cannot work for live streams because the entire content is not hashable yet, so already you have to rethink how it's done. I am talking about adding a p2p layer on top of the existing streaming protocol.
The current streaming model would prioritize broadcasting to high-bandwidth users first. There should be millions of those in a world-scale stream.
Even a fraction of these millions would be enough to reduce Netflix's streaming costs by an order of magnitude. But maybe Netflix isn't interested in saving billions?
With more viewers, the availability of content increases, which reduces load on the centralized servers. This is the property of the system I am talking about, so think backwards from that.
With a livestream, you want the youngest block to take priority. You would use the DHT to manage clients and to manage stale blocks for users catching up.
The youngest block would be broadcast on the p2p network and anyone who is "live" would be prioritizing access to that block.
Torrent clients as they are now handle this case, in reverse; they can prioritize blocks closer to the current timestamp to create an uninterrupted stream.
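A sketch of the block-selection policy I'm describing: live viewers chase the newest block, catch-up viewers fill the oldest gap first. All names and numbers are invented:

```python
# Illustrative scheduling policy only: which block should a viewer request next?
def next_block(live_edge_seq: int, have: set[int], catching_up: bool) -> int | None:
    """Live viewers chase the newest block; catch-up viewers fill the oldest gap."""
    if not catching_up:
        return live_edge_seq if live_edge_seq not in have else None
    for seq in range(min(have, default=0), live_edge_seq + 1):
        if seq not in have:
            return seq
    return None

print(next_block(1000, {998, 999}, catching_up=False))  # 1000: grab the live edge
print(next_block(1000, {990, 992}, catching_up=True))   # 991: fill the stale gap
```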
The system I am talking about would likely function at any scale, which is an improvement from Netflix's system, which we know will fail -- because it did.
1. Everyone only cares about the most recent "block". By the time a "user" has fully downloaded a block from Netflix's seedbox, the block is stale, so why would any other user choose to download from a peer rather than from Netflix directly?
2. If all the users would prefer to download from netflix directly rather than a p2p user, then you already have a somewhat centralized solution, and you gain nothing from torrents.
1. I exclusively download from a peer and my stream is measurably behind
2. I switch to a peer when Netflix is at capacity and then I have to wait for the peer to download from Netflix, and then for me to download from the peer. This will cause the same buffering issue that Netflix is currently being lambasted for.
This solution doesn’t solve the problem Netflix has
1. You still get a better viewing experience without interruptions. Besides, your "measurably behind" can be an imperceptible fraction of a second?
2. Similar thing - shorter queues - the switch can happen faster due to the extra capacity
So yes, it does solve the practical problem, though not the theoretical one
But it does seem the capacity of a hybrid system of Netflix servers plus P2P would be strictly greater than either alone? It's not an XOR.
And note that in this case of "live" streaming, it still has a few seconds of buffer, which gives a bandwidth-delay product of a few MB. That's plenty to have non-stale blocks and do torrent-style sharing.
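Rough numbers for that claim; the bitrate is an assumption, not Netflix's actual encode:

```python
# Back-of-envelope: a few seconds of live-stream buffer, expressed in megabytes.
bitrate_mbps = 8          # assumed 1080p-ish stream bitrate (illustrative)
buffer_seconds = 5        # the "few seconds" of delay behind true live
buffer_mb = bitrate_mbps * buffer_seconds / 8
print(buffer_mb)          # 5.0 MB in flight per viewer, plenty of non-stale data to share
```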
If the solution to users complaining about buffering is to build a system with more inherent buffering then you are back at square one.
I think it might be helpful to look at Netflix's current system as already being a distributed video delivery system in which they control the best seeds. Adding more seeds may help, but if Netflix is underprovisioned from the start you will have users who cannot access the streams.
Hell, in the US, this setup might actually be illegal because of the VPPA[0]. The only reason why it's not illegal for the MAFIAA to catch you torrenting is because of a fun legal principle where criminals are not allowed to avail themselves of the law to protect their crimes. (i.e. you can't sue over a drug deal gone wrong)
[0] Video Privacy Protection Act, a privacy law passed which makes it illegal to ask video providers for a list of who watched what, specifically because a reporter went on a fishing expedition with video data.
[1] Music and Film Industry Association of America, a hypothetical merger of the MPAA and RIAA from a 2000s era satire article
We should always be doing (the thing we want to do)
Some examples that always get me in trouble (or at least into big heated conversations):
1. Always be building: It does not matter if code was not changed, or there have been no PRs, or whatever: build it. Something in your org or infra has likely changed. My argument is "I would rather have a build failure on software that is already released than on software I need to release".
2. Always be releasing: As before it does not matter if nothing changed, push out a release. Stress the system and make it go through the motions. I can't tell you how many times I have seen things fail to deploy simply because they have not attempted to do so in some long period of time.
There are more; I just don't have time to go into them. The point is: if you did it, and ever need to do it again in the future, then you need to continuously do it.
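A bare-bones sketch of what "always be building, always be releasing" can look like as a scheduled job. The `make build` / `make deploy` targets are stand-ins for whatever your pipeline actually does:

```python
# Hypothetical nightly job: build and release even when nothing changed,
# so the pipeline itself is continuously exercised.
import datetime
import subprocess

def run_build() -> None:
    # Stand-in for your real build; fails loudly if the toolchain or infra drifted.
    subprocess.run(["make", "build"], check=True)

def deploy(version: str) -> None:
    # Stand-in for your real release step (artifact publish, rollout, etc.).
    subprocess.run(["make", "deploy", f"VERSION={version}"], check=True)

if __name__ == "__main__":
    version = datetime.datetime.utcnow().strftime("%Y%m%d.%H%M")
    run_build()      # a failure here is on software that's already released...
    deploy(version)  # ...and a failed no-op deploy beats failing during an emergency.
```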
Consider publishing a new version of a library: you'd be bumping the version number all the time and invalidating caches, causing downstream rebuilds, for little reason. Or if clients are lazy about updating, any two clients would be unlikely to have the same version.
Or consider the case when shipping results in a software update: millions of customer client boxes wasting bandwidth downloading new releases and restarting for no reason.
Even for a web app, you are probably invalidating caches, resulting in slow page loads.
With enough work, you could probably minimize these side effects, so that releasing a new version that doesn't actually change anything is a non-event. But if you don't invalidate the caches, you're not really doing a full rebuild.
So it seems like there's a tension between doing more end-to-end testing and performance? Implementing a bunch of cache levels and then not using it seems counterproductive.
1) I want to invalidate caches, I want to know that these systems work. I want to know that my software properly handles this situation.
2) If I have lazy clients, I want to know. And I want to motivate them to update sooner, or figure out how to force-update them. I don't want to skip updating because some people are slow. I want the norm to be that it is updating, so when there is a reason to update, like a zero day, I can have some notion that the updates will work and the lazy clients will not be an issue.
I am not talking about fake or dry runs that go through some portion of motions, I want every aspect of the process to be real.
Performance means nothing if your stuff is down. And any perceived performance gained by not doing proper hygiene is just tweaking the numbers to look better than they really are.
You can try and predict everything that'll happen in production, but if you have nothing to extrapolate from, e.g. because this is your very first large live event, the chances of getting that right are almost zero.
And you can't easily import that knowledge either, because your system might have very different points of failure than the ones external experts might be used to.
2) You're assuming whatever issue happened would have been caught by testing on generic EC2 instances in AWS. In the end these streams were going to users on tons of different platforms in lots of different network environments, most of which look nothing like an EC2 instance. Maybe there was something weird with their networking stack on TCL Roku TVs that ended up making network connections reset rapidly chewing up a lot of network resources which led to other issues. What's the EC2 instance type API name for a 55" TCL Roku TV from six years ago on a congested 2.4GHz Wireless N link?
I don't know what happened in their errors. I do know I don't have enough information to say what tests they did or did not run.
These would likely have completely different network connectivity and usage patterns, especially if they don't have historical data distributions to draw from because this was their first big live event.
Systemic issues causing widespread buffering isn't "user behavior". It's a problem with how Netflix is trying to distribute video. Sure some connections aren't up to the task, and that isn't something Netflix can really control unless they are looking to improve how their player falls-back to lower bitrate video, which could also be tested.
>because this was their first big live event.
That's the point of testing. They should have already had a "big live event" that nobody paid for during automated testing. Instead they seem to have trusted that their very smart and very highly paid developers wouldn't embarrass them based on nothing more than expectations, but they failed. They could have done more rigorous "live" testing before rolling this out to the public.
> Even my small team spins up 10,000 EC2 instances on the regular.
Woah, this sounds very cool. Can you share more details?

When we do these large screenshot operations, the EC2 instances are running for maybe 15 or 20 minutes total. It's not exactly cheap, but losing clients because we broke their site is something we want to avoid. The sites are hosted on a 3rd party service, and we're rate-limited by IP address, so to get this done in a reasonable amount of time we need to spin up 10,000 EC2 instances to distribute the work. We have our own software to manage the EC2 instances. It's honestly pretty simple, but effective.
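Not their actual tooling, but a rough sketch of the pattern described: fan the IP-rate-limited work out across many short-lived instances, each with its own address. The AMI ID, instance type, and counts are placeholders:

```python
# Rough sketch of fanning IP-rate-limited work out over many short-lived instances.
# The AMI, instance type, and counts are placeholders, not the poster's setup.
import boto3

ec2 = boto3.client("ec2")

def launch_workers(count: int, ami: str = "ami-0123456789abcdef0") -> list[str]:
    """Start `count` throwaway workers; each gets its own IP / rate-limit bucket."""
    resp = ec2.run_instances(
        ImageId=ami,
        InstanceType="t3.micro",
        MinCount=count,
        MaxCount=count,
    )
    return [i["InstanceId"] for i in resp["Instances"]]

def terminate_workers(instance_ids: list[str]) -> None:
    """Tear everything down after the 15-20 minute burst of work."""
    ec2.terminate_instances(InstanceIds=instance_ids)
```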
I can't tell you the number of times things worked only because the cache was hot, and a restart or cache invalidation would actually cause an outage.
Caches must be invalidated at a regular interval. Any system that does not do this is heading for some bad days.
While I too am generally a long-term sort of engineer, it's important to understand that this is a valid argument on its own terms, so you don't try to counter it with just "piffle, that's stupid". It's not stupid. It can be shortsighted, it leads to a slippery slope where every day you make that decision it is harder to release next time, and there's a lot of corpses at the bottom of that slope, but it isn't stupid. Sometimes it is even correct, for instance, if the system's getting deprecated away anyhow why take any risk?
And there is some opportunity cost, too. No matter how slick the release, it isn't ever free. Even if it's all 100% automated it's still going to barf sometimes and require attention that not making a new release would not have. You could be doing something else with that time.
A great engineering team will identify a tax they dislike and work to remove it. Using the same example, that means improving the success rate of deployments so you have the data (the success record) to take to leadership to change the policy and remove the tax.
It is just a reframing of build vs maintain.
Simply put: you don't want to delay finding out something is broken; you want to know the second it breaks.
In the case I am suggesting, a failed release will often be deploying the same functionality, so many failure modes will result in zero outage. Not all failure modes will result in an outage.
When the software is expected to behave differently after the deployment, more systems can end up being part of the outage, such as when the new systems can't do something or the old systems can't do something.
Additionally, refactor circle jerks are terrible for back-porting subsequent bug fixes that need to be cherry picked to stable branches.
A lot of the world isn't CD, and constant releases are super expensive.
"Test what you fly, and fly what you test" (Supposedly from aviation)
"There should be one joint, and it should be greased regularly" (Referring to cryptosystems I think, but it's the same principle. Things like TLS will ossify if they aren't exercised. QUIC has provisions to prevent this.)
> 2. Always be releasing...
A good argument for this is security. Whatever libraries/dependencies you have, unpin the versions and have good unit tests. Fixes for security vulnerabilities that land upstream have to actually make it into a release; you cannot remove those vulnerabilities unless you are doing regular releases. This in turn implies having good unit tests, so you can do these builds and releases with a lower probability of releasing something broken. It also implies strong monitoring and metrics, so you can be the first to know when something breaks.
Nitpick: unit tests by definition should not be exercising dependencies outside the unit boundary. What you want are solid integration and system tests for that.
I suspect it's a bit of both Netflix issues and ISPs oversubscribing bandwidth.
So I agree the problems could have been localized to unique (region, ISP) combinations.
Also, from lurking in various threads on the topic Netflix's in app messages added to people's irritation by suggesting that they check their WiFi/internet was working. Presumably that's the default error message but perhaps that could have been adjusted in advance somehow.
That eliminates a whole raft of problems.
Unless Netflix eng decides to release a public postmortem, we can only speculate. In my time organizing small-time live streams, we always had up to 3 parallel "backup" streams (Vimeo, Cloudflare, Livestream). At Netflix's scale, I doubt they could simply summon any of these providers in, but I guess Akamai / Cloudflare would have been up for it.
A company I used to work for ran a few Super Bowl ads. The level of traffic you get during a Super Bowl ad is immense, and it all comes at you in 30 seconds, before going back to a steady-state value just as quickly. The scale pattern is like nothing else I've ever seen.
Super Bowl ads famously cost seven million dollars. That's something we simply can't repeat year over year, even if we believed it'd generate the same bump in recognition each time.
Also, "No experience in" really? You have no idea if that's really the case
Rolling Stone reported 120m for Tyson and Paul on Netflix [1].
These are very different numbers. 120m is Super Bowl territory. Could Hotstar handle 3-4 of those cricket matches at the same time without issue?
[0] https://www.the-independent.com/sport/cricket/india-pakistan...
[1] https://www.rollingstone.com/culture/culture-news/jake-paul-...
https://www.icc-cricket.com/news/biggest-cricket-world-cup-e...
Six seconds on the Google shows 58 million households in the United States. So, roughly 145,000,000 people.
You make the tech bubble mistake of believing that high speed internet is as ubiquitous as coax.
I see 68.7 million people, not households. There's my 6 seconds.
Maybe 10 minutes would give me a better truth.
>You make the tech bubble mistake of believing that high speed internet is as ubiquitous as coax.
Yes and no. Given that the top US cities contain about 8% of the population, you can cover a surprising amount of a large country with a surprisingly small amount of area coverage. So it's not as straightforward as "people in SF are in a bubble".
Don't act so surprised: streaming is a pain in the ass to figure out. People have been trained to tolerate a 3-second UI lag for every button press (seemingly all cable boxes are godawfully shitty like this; it must be the server-side UI rendering design?)
BUT! You can record your game and the cable TV DVR is dead reliable and with high quality. There is no fear of competing for Wi-Fi bandwidth with your apartment or driveway neighbors, and the DVR still works even if cable is out. And as long as you haven’t deleted the recording it won’t go away for some stupid f’ing reason.
Finally, the cable TV DVR will let you fast forward through commercials. Or you can pause live TV to take a bathroom break and make a snack, so you build up a little buffer; now you are fast-forwarding commercials on nearly-live TV. You can't fast forward commercials with most mainstream streaming anymore. Who broadcasts your big games? Big players like Paramount+ won't let you skip commercials anymore. The experience is now arguably worse. Once you settle in, the forward-30sec and back-30sec buttons work rather smoothly (that's one part of cable TV boxes that has sub-half-second latency).
Your concern about extra remotes and extra boxes and hiding wires is a vanity most don’t care about. They are grateful for how compact big-screen TVs are these days compared to the CRTs or projection TVs of the past. They probably have their kids’ game console and a DVD/BluRay player on the same TV stand anyway.
Apparently movies purchased on Roku are now on Vudu. I hope that people who bought movies on Roku were able to figure it out. This is how technology sucks. Movies purchased with my cable provider’s Video On Demand are still with me, slow as shit as navigating to them is.
They exist in places where the internet infrastructure is not adequate for constant multiple streams.
Content is king. And there's lots of content on cable that is not on streaming. Just consider local and regional news and sports.
Many residential buildings, like the one in which I live, include cable TV with the rent. Why add more clutter and expense for streaming?
There are plenty of other reasons. Your position seems to be "stop liking things I don't like."
Live news and sports is huge for a ton of people.
You have access to all the shows from the major networks. You don’t need to subscribe to Peacock and Paramount and Hulu and the TBS app and Discovery+ and…
Better yet, they’re all combined in one interface as opposed to all trying to be the only thing that you use.
Also, especially if you grew up with it, there is absolutely a simplicity in linear TV. Everyone was used to a DVR. And yeah the interface sucks, but it sucked for everyone already anyway so they’re used to it. Don’t know what you wanna watch? Turn on a channel you watch and just see what’s on. No looking at 400 things to pick between.
I’ve seen people switch off and have serious trouble because it’s such a different way of watching TV from what they were used to. They end up using something like Hulu Live or YouTube TV to try and get the experience they’re used to back.
This is definitely turning into my version of an old man rant. “Back in my day…” the main benefit of it all is I actually just don’t watch as much as I once did. The friction is too high. Or, the commitment is too high-I dont usually want to jump into some 10 episode series.
I don’t subscribe to anything that doesn’t work with my Apple TV. Netflix for example won’t integrate with it the way Hulu does. So whatever show I’m watching on Netflix? Wouldn’t show up in my show list on my Apple TV. I forget it exists.
So I don’t subscribe to it. Or anything else like that. You are NOT more important than me, service I pay for.
The only two exceptions are YouTube (which obviously works differently) and Plex for the few things that I already already owned on DVD or can’t get on any service.
It works well enough for me. But I still find myself missing a linear TV now and then.
For many people, often those with backgrounds that make them unlikely to frequent HN, the experience they're looking for is "1. get home, 2. open beer, 3. turn TV on, 4. watch."
The default state of a streaming app is to ask you what you want to watch, and then show you exactly the thing you selected. The default state of traditional TV is to show you something, and let you switch to something else if you can't stand the thing you're watching right now or have something specific in mind. Surprisingly, many people prefer the latter over the former.
The same applies to radio versus streaming, many family members of mine don't use streaming, because all it takes to turn on the radio is turning the key in the ignition, which they have to do anyway.
I watched it for the game trailers, actually shocked that it's also superbowl viewership territory.
https://variety.com/2023/digital/news/game-awards-2023-break...
It might just have been easier to start from scratch, maybe using an external partner experienced in live streaming, but the chances of that decision happening in a tech-heavy company such as Netflix that seems to pride itself on being an industry leader are close to zero.
Depending on whom you ask, the bitrate used by the stream was significantly lower than what is considered acceptable from free livestreaming services, which admittedly stream to a much, much smaller audience.
Without splitting hairs, livestreaming was never their forte, and going live with degradation elsewhere is not a great look for our distributed computing champ.
1. Netflix is a 300B company, this isn't a resources issue.
2. This isn't the first time they have done live streaming at this scale either. They already have prior failure experience, you expect the 2nd time to be better, if not perfect.
3. There was plenty of time between the first massive live stream and the second. Meaning plenty of time to learn and iterate.
Peak traffic is very expensive to run, because you're building capacity that will sit empty/unused when the event ends. Who'd pay for that? That's why it's tricky, and that's why Akamai charges these insane prices for live streaming.
A "public" secret is that the network layer is usually not redundant in your datacenter even if it's promised. To have a redundant network you'd need to double your investment, and it'll sit idle or at 50% of max capacity. For 2hr of downtime per year when you restart the high-capacity routers, it's not cost efficient for most clients.
There is no middle ground where you commit a mediocre amount of resources, end up with downtime and a mediocre experience, and then go “but we saved money.”
They indeed have a great CDN network, but it's not very good for this particular type of traffic. Maybe they will know/fix/buy better next time...
But like you stated, they don't want to spend money and their technical people couldn't deliver on time. This isn't the technical issue that a lot of people on HN and Twitter want to discuss. It is a management issue.
Buying the exclusive rights for 20Mil and putting 30Mil into streaming it wouldn't be a very smart choice. Fuckups happen, and this might be a mistake that costs them more in lost reputation than they expected to win.
Is that a surprise? They're not who I would think of first as a gold standard for high viewership live streams.
What was the previous fail?
I spoke to multiple Netflix senior technicians about this.
They said that's the whole shtick.
Live streaming is just much harder than streaming, and it takes years of work and a huge headcount to get something good.
To be clear when I said that PrimeVideo is composed of hundreds of microservices, I actually meant that it's composed of hundreds of services, themselves composed, more often than not, of multiple microservices.
Depending on your definition of a microservice, my team alone owns dozens.
Live has changed over the years from large satellite dishes beaming to a geosat and back down to the broadcast center($$$$$), to microwave to a more local broadcast center($$$$), to running dedicated fiber long haul back to a broadcast center($$$), to having a kit with multiple cell providers pushing a signal back to a broadcast center($$), to having a direct internet connection to a server accepting a live http stream($).
I'd be curious to know what their live plan was and what their redundant plan was.
This isn’t NFLX’s first rodeo in live streaming. Have seen a handful of events pop up in their apps.
There is no excuse. All of the resources and talent at their disposal, and they looked absolutely amateurish. Poor optics.
I would be amazed if they are able to secure another exclusive contract like this in the future.
"I flew into EGLL on Monday" - pilot
I don’t do it as some sort of “signaling” for “fintech bros” or anything like that
I think I subconsciously adopted it since it made my job easier. Sort of how I use YYYYMMDD format in almost everything from programming to daily communication.
It’s a lot fewer keystrokes for MS (Morgan Stanley), GS (Goldman Sachs) and MSFT (Microsoft) than it is for AAPL, but it’s a force of habit for some. Once you’re used to referring to firms by their ticker symbols, you do it all the time.
E.g. an ex trader friend still says “spot” instead of “point” when referring to decimal points, even if talking in other contexts like software versions.
I guess if we allowed the first two letter ticker symbol for the missing singles, we could send messages by mentioning a bunch of company names.
Eg "Buy: Dominion Energy, Agilent. Hold: Nano Labs. Sell: Genpact." would refer to our esteemed moderator, and "Hyatt Hotels is pleased to announce special corporate rates for Nano Labs bookings" to this site itself.
[maybe it would be better to use the companies where the corporate name and ticker letter don't match? Like US Steel for X and AT&T for T?]
On a phone, at least in iOS, you have to double tap the shift key.
It’s pointless jargon.
Just what the fuck are these people doing?
If I were a major investor in them I'd be pissed.
I was pointing out how dumb a multibillion dollar company is for getting this so wrong. Broadcasting live events is something that is underestimated by everyone who has never done it, yet the hubris of a major tech company thinking it knows better is biting them in the ass.
As many other people have commented, so many other very large events dwarfing this one have been pulled off with no hiccups visible to the viewers. I have amazing stories of major hiccups during the MLB World Series that viewers had no idea were happening, but "inside baseball" people knew. To the point that the head of the network caught something during the broadcast and called the director in the truck, saying someone is either going to be fired or get a raise, yet the audience would never have noticed if the person had ended up getting fired. They didn't, btw.
I guess we now know the limits of what "at scale" is for Netflix's live-streaming solution. They shouldn't be failing at scale on a huge stage like this.
I look forward to reading the post mortem about this.
Given that many people had no problems with the stream, it is unlikely to have been an origin problem, but more likely the mechanism to fan out quickly to OCAs. Normally latency to an OCA doesn't matter when you're replicating new catalogs in advance, but live streaming makes a bunch of code that previously "didn't need to be fast" get promoted to the hot path.
I was watching a pirated, live retransmission of the event on Twitch (in Portuguese), and there was zero buffering on my end.
Every major network can broadcast the Super Bowl without issue.
And while Netflix claims it streamed to 280 million, that’s if every single subscriber viewed it.
Actual numbers put it in the 120 million range. Which is in line with the Super Bowl.
Maybe Netflix needs to ask CBS or ABC how to broadcast
Live sports do not broadcast the event directly to a streamer. They push it to their broadcast centers. It then gets distributed from there to whatever avenues it needs to go. Trying to push a live IP stream directly from the remote live venue rarely works as expected. That's precisely why the broadcasters/networks do not do it that way
Those are multicast feeds.
> Trying to push a live IP stream directly from the remote live venue rarely works as expected.
In my experience it almost always works as expected. We have highly specialized codecs and equipment for this. The stream is actively managed with feedback from the receiver so parameters can be adjusted for best performance on the fly. Redundant connections and multiple backhauls are all handled automatically.
> That's precisely why the broadcasters/networks do not do it that way
We use fixed point links and satellite where possible because we own the whole pipe. It's less coordination and effort to setup and you can hit venues and remotes where fixed infrastructure is difficult or impossible to install.
> We use fixed point links and satellite where possible because we own the whole pipe.
Over long distance I get better reliability out of a decent internet provision than in many fixed point to point links, and certainly when comparing at a price point. The downside of the internet is you can't guarantee path separation - even if today you're routing via two different paths, tomorrow the routes might change and you end up with everything going via the same data centre or even same cable.
Which is probably done over the cableco's private network (not the public Internet) with a special VLAN used for television (as opposed to general web access). They're probably using multicast.
Not really the same as an IP service live stream where the distribution point is sending out one copy per viewer and participating in bitrate adaptation.
AFAIK, Netflix hasn't publicly described how they do live events, but I think it's safe to assume they have some amount of onsite production that outputs the master feed for archiving and live transcoding for the different bitrate targets (that part may be onsite, or at a broadcast center or something cloudy), and then goes to a distribution network. I'd imagine their broadcast center/or onsite processing feeds to a limited number of highly connected nodes that feed to most of their CDN nodes; maybe more layers. And then clients stream from the CDN nodes. Nobody would stream an event like this direct from the event; you've got to have something to increase capacity.
Over the US and Canada it mostly is, though how advanced the transition is is very regional.
The plan is to drop both analog signal and digital (QAM) to reclaim the frequencies and use them for DOCSIS internet.
Newer set top boxes from Comcast (Xfinity) run over the internet connection (in a tagged VLAN on a private network, and they communicate over a hidden wifi).
Exactly! It was a solved problem.
The Superbowl isn't even the biggest. World Cup finals bring in billions of viewers.
I guarantee the people trying to watch the fight cared more about watching the fight than how the fight was watched.
That's a very different area from transmitting live video to end users.
All of the “mixing” as you call it is done in the truck. If you’ve never seen it, it is quite impressive. In one part of the truck is the director and the technical director. The director is the one calling things like “ready camera 1”, “take 1”, etc. the TD is the one on the switcher pushing the actual buttons on the console to make it happen. Next to them is the graphics team prepping all of the stats made available to the TD to key in. In another area is the team of slomo/replay that are taking the feeds from all of the cameras to recorders that allow the operators to pull out the selects and make available for the director/TD to cut to. Typically in the back of the truck is the audio mixer that mixes all of the mics around the event in real time. All of that creates the signal you see on your screen. It leaves the back of the truck and heads out to wherever the broadcaster has better control
BT Sport are interesting: they spin up graphics, replay, etc. in an AWS environment a couple of hours before. I was impressed by their UEFA Youth League coverage a couple of years ago, and they aren't slowing down.
https://www.limitlessbroadcast.tv/portfolio/uefa-youth-leagu...
https://www.svgeurope.org/blog/headlines/stratospheric-revol...
Obviously not every broadcast, or even most, are remote now, but it’s an ever increasing number.
I don’t know how the US industry works, I suspect the heavy union presence I’ve seen at places like NAB will slow it, but in Europe remote production is increasingly the future.
Sure thing, but also, how many resources do you think Netflix threw at this event? If organizations like FOSDEM and CCC can do live events (although with way smaller viewership) across the globe without major hiccups on (relatively) tiny budgets and smaller infrastructure overall, how could Netflix not?
Haven't Asian live sports already been using P2P for two decades?
(What is the biggest Peertube livestream so far ?)
This is a major broadcast. I'd expect a full on broadcast truck/trailer. If they were attempting to broadcast this with the ($) option directly to a server from onsite, then I would demand my money back. Broadcasting a live IP signal just falls on its face so many times it's only the cheap bastard option. Get the video signal as a video signal away from the live location to a facility with stable redundant networking.
This is the kind of thinking someone only familiar with computers/software/networking would think of rather than someone in broadcasting. It's nice to think about disrupting, but this is the kind of failure that disruptors never think about. Broadcasters have been there done that with ensuring live broadcasts don't go down because an internet connection wasn't able to keep up.
The hard part is over, and people new to the problem think they are almost done, but then the next part turns out to be 100x harder.
Lots of people can encode a video.
They also have the benefit of having practiced their craft at the CCC events for more than a decade. Twice a year. (Their summer event is smaller but still fairly well known. Links to talks show up on HN every now and then.)
Funky anecdote: the video crew at Assembly have more broadcasting and live AV gear for their annual event than most medium-sized studios.
Now if they could just get audio levels and compression figured out.
Or, for that matter, Youtube (Live) and Twitch.
Based on the results, I hope it was a small team working 20% time on the idea. If you tell me they threw everything they had at it to this result, then that's even more embarrassing for them.
It was really bad. My Dad has always been a fan of boxing so I came over to watch the whole thing with him.
He has his giant inflatable screen and a projector that we hooked up on the front lawn to watch it, but everything kept buffering. We figured it was the Wi-Fi, so he packed everything up and went inside, only to find the same thing happening on ethernet.
He was really looking forward to watching it on the projector and Netflix disappointed him.
What did your Dad think about the 'boxing'?
To rephrase your question, then: what did he think of the entertainment on display?
I don't think it was good entertainment.
None of the hallmarks of a good show were present, i.e. it wasn't close, nor was it bloody, nor was there anything unexpected like, say, a KO; everything went pretty much as expected. It wasn't a nice watch at all, no skill or talent was on display. All Paul had to do was use his speed to backpedal from the slow, weak punches of a visibly older Tyson with a bum knee and land some points occasionally to win.
--
[1] There is a deeper argument here: is any spectator sport just entertainment, or is it truly about skill, talent, and competition? Boxing, however, including the matches promoted by the traditional four major associations, falls clearly on the entertainment side compared to, say, another sport like the NFL, to me.
The Masters app is the only thing that comes close imo.
Cable TV + DVR + high speed internet for torrenting is still an unmatched entertainment setup. Streaming landscape is a mess.
It's too bad the cable companies abused their position and lost any market goodwill. Copper connection direct to every home in America is a huge advantage to have fumbled.
Reliable and redundant multicast streaming is pretty much a solved problem, but it does require everyone along the way to participate. Not a problem if you're an ISP offering TV, definitely a problem if you're Netflix trying to convince every single provider to set it up for some one-off boxing match.
So far, no one seems particularly motivated.
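For anyone curious what the receiving side of plain IP multicast looks like, here is a minimal Python sketch (the group address and port are made up; a real IPTV deployment layers IGMP management, FEC, and encryption on top of this):

    import socket
    import struct

    GROUP = "239.1.2.3"   # hypothetical multicast group an ISP might assign to a channel
    PORT = 5004           # hypothetical RTP-style port

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))

    # Joining the group tells the kernel (and, via IGMP, the upstream routers)
    # to start forwarding this stream to us.
    mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

    while True:
        packet, _ = sock.recvfrom(2048)
        # In a real player these UDP payloads would be RTP/MPEG-TS packets
        # handed to a demuxer; here we just count bytes.
        print(f"got {len(packet)} bytes")

The point being: one copy of the stream leaves the source and the network replicates it, which is exactly what unicast-over-the-internet streaming cannot rely on.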
He’s definitely got issues..
This sure doesn't help with that impression, and it hasn't just been a momentary glitch but hours of instability. And the Netflix status page saying "Netflix is up! We are not currently experiencing an interruption to our streaming service." doesn't help either...
They never tried to do a live reunion again. I suppose they should have, to get the experience, because they are hitting the same problems with a much bigger-stakes event.
From a livestreaming standpoint, Netflix is 0/x - they have failed at many large events such as Love Is Blind, etc.
From a livestreaming standpoint, look to broadcast news, sports / Olympics broadcasters, etc. and you'll see technology, equipment, bandwidth, planning, and professionalism at 1000x that of Netflix.
Heck, for publicly traded companies' quarterly earnings livestreams, they book direct satellite time in addition to fiber to make sure they don't rely only on terrestrial networks, which can fail. From a business standpoint, failure during a quarterly meeting stream can mean the destruction of a company (by making shareholders who can't see and vote during the meeting mad enough to push for internal change) - so the stakes are much higher than live entertainment streaming.
Netflix is good at many things, livestreaming is not one of those things.
Perhaps Netflix still needs a dozen more microservices to get this right...
Netflix's events are small potatoes compared to other livestream stalwarts.
Imagine having to stream a cricket match internationally to the UK / India / Australia with a combined audience that crushes the Super Bowl, or a football match to all of Europe, or even livestreaming F1 racing, which has an audience multiple orders of magnitude larger than a boxing match and also has 10x the number of cameras (at 8K+ resolution) across a large physical staging area (the size of the track/course) in real time, in addition to streaming directly from the cockpits of cars racing at 200+ mph.
Livestream focused outfits do this all day, everyday.
Netflix doesn't even come close to scratching the "beginner" level of these kinds of live events.
It's a matter of competencies. We wouldn't expect Netflix to be able to serve burgers like McDonald's does - Livestreaming is a completely different discipline and it's hubris on Netflix's part to assume just because they're good at sending video across the internet they can competently do livestreaming.
yes, love is blind failed, but was definitely not the most recent attempt. they did some other golf thing too, iirc
chris rock -> love is blind -> mike tyson
they have had other, successful executions in between. the comment i was replying to had cherry picked failures and i’m trying to git rebase them onto main.
In particular, they have been revising their compensation structure to issue RSUs, add in a bunch of annoying review process, add in a bunch of leveling and titles, begin hiring down market (e.g. non-sr employees), etc.
In addition to doing this, shuffling headcount, budgets, and title quotas around has in general made the company a lot more bureaucratic.
I think, as streaming matured as a solution space, this (what is equivalent to cost-cutting) was inevitable.
If Netflix was running the same team/culture as it was 10 years ago, I'd like to say that they would have been able to pull off live streaming.
Or did they have a lot of needs that they decided didn't require top-skilled people?
Or was this a beancounter thing, of someone deciding that the company was paying more money on staffing than they needed to, without understanding it?
Netflix's edge nodes are optimized for streaming already encoded videos to end users. They have to transcode some number of formats from the source and send them all to the edge nodes to flow out. It's harder to manage a ton of different streams flowing out to the edge nodes cleanly.
I would guess YouTube, being built on google's infrastructure , has powerful enough edge nodes that they stream one video stream to each edge location and the edges transcode for the clients. Only one stream from source to edge to worry about and is much simpler to support and reason about.
But that's just my wild assed guess.
Ha, no, our edge nodes don't have anywhere near enough spare CPU to do transcoding on the fly.
We have our own issues with livestreaming, but our system's developed differently over the past 15 years compared to Netflix's. While they've historically focused on intelligent pre-placement of data (which of course doesn't work for livestreaming), such an approach was never feasible for YT with the sheer size of our catalog (thanks to user-generated content).
Netflix is still new to the space, and there isn't a good substitute for real-world experience for understanding how your systems behave under wildly different traffic patterns. Give them some time.
I get maybe 1m total of buffering per week, if that.
Seems uncharitable to complain about that.
Even the best latency is dozens of seconds behind live action.
Your ISP doesn't have enough bandwidth to the Internet (generally speaking) for all users to get their feed directly from a central location. And that central location doesn't have enough bandwidth to serve all users even if the ISP could. That said, the delay can be pretty small, e.g. the first user to hit the cache goes upstream, the others basically get the stream as it comes in to the cache. This doesn't make things worse, it makes them better.
I last saw Tyson at +500 while Jake was around -800 on DraftKings somewhere in the 6th round.
I think this could be one of the upsells Netflix could use.
Premium: get no delay
Normal users: get cache and delay
You know, like every other broadcaster, streaming platform, and company that does live content has been able to do.
Acting like this is a novel, hard problem that needs to be solved and we need to "upsell" it in tiers because Netflix is incompetent and live broadcasting hasn't been around for 80+ years is so fucking stupid.
Sample size 1, but...
I saw a ton of buffering and failure on an embedded Netflix app on a TV, including some infinite freezes.
Moved over to laptop, zero buffering.
I assume the web app runs with a lot bigger buffer than whatever is squeezed into the underpowered TV.
E.g. "give me this previous chunk" vs "send me the current stream"
They do make you code, but the questions were 1. not on HackerRank or LeetCode, 2. practical coding questions that didn't require anything more than basic hashmaps/lists/loops/recursion if you wanted. Some string parsing, etc.
They were still hard, and you had to code fast, but no tricky algorithms were required. It also felt very collaborative, like you were driving pair programming. Highly recommended even though I didn't get an offer!
Tells you the uselessness of their engineering blogs.
They stream plenty of pre recorded video, often collocated. Live streaming seems like something they aren’t yet good at.
It may be "new" to them, but they should have been ready.
I did expect that Netflix would have appropriately accounted for demand and scale, though, especially given the hype for this particular event.
Live sports and VOD movies/TV are very different beasts.
To me they're basically padding their front page.
But honestly that's most of the major streaming platforms these days. I recently cancelled Disney Plus for similar reasons. The only reasons I don't cancel prime or Netflix are because I have family members I split the memberships with to share.
It’s pretty much a two-story townhouse packed head to toe with DVDs (lots of blu rays!)
You don’t realize how limited the streaming collection is until you’re back in a movie store, looking through thousands and thousands of movies you would never find otherwise.
Since I found it, I’ve started doing movie night every week with my friends. It’s such an absolute blast. We go to the store, each pick out a random movie that looks good (or bad, in a good way) or just different.
All of a sudden, I love movies again!!
Heck, mine even have some video games; though from when I've checked they're usually pretty back-reserved.
I suspect life stage is a factor, but it does feel like there are many classes of entertainment (cinema and standup come to mind) that don't resonate like they used to.
I'm in the same boat where as soon as they make it too hard to share, I'll probably cancel it. I think the main reason their sharing crackdown hasn't been a problem so far is that I use it so seldomly, it thinks the "main" address is my parents, which makes it easy for me to pass the "are you traveling" 2FA on my own phone when I do want to watch something.
All of the streaming services do this and I hate it. Netflix is the worst of the bunch, in my experience. I already scrolled past a movie, I don't want to watch it, don't show it to me six more times.
Imagine walking through a Blockbuster where every aisle was the same movies over and over again.
It's been pretty rough the last few years. So many great films and series, not to mention kids programming, removed to make way for mediocre NeTfLiX oRiGiNaLs and Bollywood trash.
I think it got worse for sellers recently too. If I search for something, like a specific item using its description, sometimes the only result for it shows "sponsored".
It used to show up as sponsored and also unsponsored below.
If this changed, I assume it is bad for the seller. Either they pay for all search results, or their metrics are skewed because all searches were helped by "sponsorship" (and there are no longer unsponsored hits)
I exited playback and haven't gone back to finish it. I'll wait for it eventually to make it to a Blu-ray release someday.
I would put Prime Video at 2nd worst. Absolute worst IME is Paramount+.
Edit: worst for streaming quality
The Amazon originals are way better imo. They do the dark pattern crap with paid content, as one would expect from Amazon.
I always assumed youtube was top dog for performance and stability. I can’t remember the last time I had issues with them and don’t they handle basically more traffic than any other video service?
Most people pay Netflix to watch movies and tv shows, not sports. If I hadn't checked Hacker News today, I wouldn't even know they streamed sports, let alone that they had issues with it. Even now that I do, it doesn't affect how I see their core offering, which is their library of on-demand content.
Netflix's infrastructure is clearly built for static content, not live events, so it's no shock they aren't as polished in this area. Streaming anything live over the internet is a tough technical challenge compared to traditional cable.
It has been pretty useless. At the moment seems to be working only when running in non-live mode several minutes behind.
So if there are 1 million trying to stream it, that means they would lose $15 million. So.. they might only give a partial refund.
But people should push for an automatic refund instead of a class action.
I've watched ball games on streaming networks where I can also hear a local radio broadcast, and the stream is always delayed compared to the radio, sometimes by quite a lot. But you'd never know it if you were just watching the stream.
i don't bet on sports. but from friends who do: yes, it's a really really big deal.
I guess they could bet before the betting streams caught up.
It seems ridiculous to me that you can bet on individual points, but here we are.
So for the first hour it was just total frustration until I stopped trying to go back to live mode.
https://en.wikipedia.org/wiki/The_Sting
Offshore combined streaming and betting houses will be cleaning up the rubes.
It was actually great that the fight itself was so boring because it justifies never having to spend time / money on that kind of bullshit. It was a farce. A very bright, loud, sparkly, and expensive (for some people) farce.
The value I got from it was the knowledge that missing out on that kind of thing isn't really missing out on anything at all.
Live, not so much. One source that you have to fanout from and absolutely no way to get cheap redundancy. Right?
Maybe if we're not counting Youtube as 'streaming', but in my mind no one holds a candle to YT quality in (live)streaming.
Most third-party internet-based streaming solutions are overlaid on top of a point-to-point network, while broadcast is one-to-many, and even cable tends to use multicast within the cable provider's network.
You have potentially different problems, e.g. limited bandwidth / spectrum. If, say, there are multiple games going on at the same time, you can only watch whichever feed the broadcaster decides to air. And, of course, regardless of the technology in use, there are matters of acquiring rights for various events. One benefit of internet-based streaming is that one service can acquire the rights and be able to reach everyone, whereas an individual cable provider might only reach its direct subscribers.
> Some Cricket graphs of our #Netflix cache for the #PaulVsTyson fight. It has a 40 Gbps connection and it held steady almost 100% saturated the entire time.
YouTube, Twitch, Amazon Prime, Hulu, etc. have all demonstrated they can stream live to hundreds of millions simultaneously without any issues. This was Netflix's chance to do the same, and they have largely failed at it.
There are no excuses or juniors to blame this time. It shows quite some inexperience from the 'senior' engineers at Netflix, not being able to handle the scale of live streaming; they may lose contracts over the downtime across the world during this high-impact event.
Very embarrassing for a multi-billion dollar publicly traded company.
I wouldn't be surprised if lots of engineers at Netflix are now writing up a lengthy post mortem of this.
And this is from the company that created the discipline of chaos engineering for resilience.
It is clear they under-invested and took their eye off the ball with this.
This is bad, like very very bad.
We will see why it went down and to what extent they underinvested in their post mortem.
This isn't Netflix's first time they had this live streaming problem.
People will see this as an underinvestment from Netflix's part and they will reconsider going to a different streaming partner.
In my country every time there's a big football match the people who try to watch it on the internet face issues.
Not trying to downplay their complexity, but last I heard Netflix splits the shows into small data chunks and just serves them as static files.
Live streaming is a different beast
With the way that they are designed, you can even use a regular CDN.
Scale has increased but the techniques were figured out 20 years ago. There is not much left to invent in this space at the current moment so screwing up more than once is a bit unacceptable.
If they botch the NFL games, it will surely hurt their reputation.
I have a feeling Netflix said 'how hard could this be?' and is finding out right now.
But also people were saying they weren't having any issues streaming on Ziply.
[1] https://www.reddit.com/r/ZiplyFiber/comments/1gsenik/netflix...
8m vs 60m. And not in 4K. Not a great choice for comparison.
The only reasonable way to scale something like this up is probably to... scale it up.
Sure, there are probably some generic lessons, but I bet that the pain points in Netflix's architecture (historically grown over more than a decade and optimized towards highly cacheable content) are very different from Youtube, which has ramped up live content gradually over as many years.
E.g. the median engineer, excluding entry level/interns, at YouTube in 2012 was a literal genius at their niche or quite close to it.
Netflix simply can’t hire literal geniuses with mid six figure compensation packages in 2024 dollars anymore… though that may change with a more severe contraction.
It could come down to something as stupid as:
Executive: "we handled [on demand show ABCD] on day one, that was XX million"
Engineering: "live is really different"
Executive: (arguing about why it shouldn't be that different and should not need a lot of new infrastructure)
Engineering: (can't really argue with his boss about this anymore after having repeated the same conversation 3 or 4 times) -- tells the team: "We are not getting new servers or time for a new project. We have to just make do with what we have. You guys are brilliant, I know you can do it!"
Live is a lot harder than on demand, especially when you can't estimate demand (which I'm sure was hard to do here). People are definitely not understanding that. Then there is the fact that Netflix is well regarded for its engineering, almost to the point of snobbery.
What is actually interesting to me is that they went for an event like this which is very hard to predict as one of their first major forays into live, instead of something that's a lot easier to predict like a baseball game / NFL game.
I have to wonder if part of the NFL allowing Netflix to do the Christmas games was them proving out they could handle live streams at least a month before. The NFL seems to be quite particular (in a good way) about the quality of the delivery of their content so I wouldn't put it past them.
To me it speaks to how most of the top tech companies of the 2010s have degraded as of late. I see it all the time with Google hiring some of the lower performing engineers on my teams because they crushed Leetcode.
Alas, my experience with the NFL in the UK does not reflect that. DAZN have the rights to stream NFL games here, and there are aspects of their service that are very poor. My major, long-standing issue has been the editing of their full game “ad-free” replays - it is common for chunks of play to be cut out, including touchdowns and field goals. Repeated complaints to DAZN haven’t resulted in any improvements. I can’t help but think that if the NFL was serious about the quality of their offering, they’d be knocking heads together at DAZN to fix this.
I'm more comparing Thursday Night Football and the quality of the encoding than anything. Delivery glitches are a separate issue that I think they care about less.
NFL: games show up 90+ minutes after the match on NFL Gameday, and it auto-plays the most recent video for that team, which is always the post-game interview. So you load it up, go to your team, and it auto-plays the "we won" or "it was a tough loss" - like, why the f*ck am I paying for a DVR solution when you do that? NFL Sunday Ticket: you can watch the games sometime Monday after the fact, but not the Sunday night games. Good thing I paid well below half price for it with a discount.
NHL: constantly shifting networks each year with worse solutions, and not letting you get to half the previous games after a week. Totally useless for deferred viewing unless you only want to watch the game a day or more after. Fubo: you have to 'record' the game, and sometimes it's on a slightly different network and doesn't record. And their blackout system is the worst of all - who cares about your mediocre local team; sorry, you can't watch Chiefs/Bills because they overlapped by some amount.
MLB: always broken at the top of the year, constantly changing the interface. You often get stuck watching the commercial break, which is not actually commercials but just the same "Ohtani / Judge highlight video from 2 years ago" and a "stat" about the sluggers that is almost an entire season out of date. The app resets when switching from the live CDN to the on-demand one once the game ends, which often resets the game and jumps you 6 innings forward, or makes the game unavailable for 30 minutes.
And EFF you if you want to watch playoffs on any of them.
Aside from latency (which isn't much of a problem unless you are competing with TV or some other distribution system), it seems easier than on-demand, since you send the same data to everyone and don't need to handle having a potentially huge library in all datacenters (you have to distribute the data, but that's just like having an extra few users per server).
My guess is that the problem was simply that the number of people viewing Netflix at once in the US was much larger than usual and higher than what they could scale to, or alternatively a software bug was triggered.
Live content is harder because it can't really be cached, nor, due to TLS, can you really serve everyone the same stream. I think the hardest problem to solve is provisioning. If you are expecting 1 million users, and 700,000 of them get routed to a single server, that server will begin to struggle. This can happen in a couple different ways - for example an ISP who isn't a large consumer normally suddenly overloads its edge server. Even though your DC can handle the traffic just fine, the links between your DC and the ISP begin to suffer, and since the event is live, it's not like you can just wait until the cache is filled downstream.
isn't it a tree of cache servers? as origin sends the frames they're cached.
and as load grows the tree has to grow too, and when it cannot resorting to degrading bitrate, and ultimately to load shedding to keep the viewers happy?
and it seems Netflix opted to forgo the last one to avoid the bad PR of an error message saying "we are over capacity" and instead went with actually letting it burn, no?
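A toy Python sketch of the degrade-then-shed policy described above (the rungs, thresholds, and single capacity number are invented for illustration; a real system measures egress per PoP and per peering link, not one counter):

    from dataclasses import dataclass

    # Available bitrate rungs, highest first (made-up values, in kbps).
    RUNGS_KBPS = [8000, 4000, 2000, 800]

    @dataclass
    class PopStatus:
        capacity_kbps: int      # what this cache/PoP can actually push
        active_viewers: int

    def admit(pop: PopStatus) -> int | None:
        """Return the bitrate to offer a new viewer, or None to shed the request."""
        per_viewer_budget = pop.capacity_kbps / (pop.active_viewers + 1)
        for rung in RUNGS_KBPS:
            if rung <= per_viewer_budget:
                return rung        # degrade gracefully to whatever fits
        return None                # over capacity: shed instead of letting it burn

    # Example: a 40 Gbps cache box already serving 15,000 viewers can only
    # offer a new viewer the 2000 kbps rung.
    print(admit(PopStatus(capacity_kbps=40_000_000, active_viewers=15_000)))

Whether to shed with an honest "we're over capacity" message or quietly degrade everyone is exactly the PR trade-off being speculated about here.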
When I mean "cached", it means that the PoP server can serve content without contacting the origin server. (The PoP can't serve content it does not have).
>and it seems Netflix opted to forgo the last one to avoid the bad PR of an error message saying "we are over capacity" and instead went with actually letting it burn, no?
Anything other than 100% uptime is bad PR for Netflix.
With on-demand you can push the episodes out through your entire CDN at your leisure. It doesn't matter if some bottleneck means it takes 2 hours to distribute a 1 hour show worldwide, if you're distributing it the day before. And if you want to test, or find something that needs fixing? You've got plenty of time.
And on-demand viewers can trickle in gradually - so if clients have to contact your DRM servers for a new key every 15 minutes, they won't all be doing it at the same moment.
And if you did have a brief hiccup with your DRM servers - could you rely on the code quality of abandonware Smart TV clients to save you?
People using over-the-air antennas get it “live”. Getting it from cable or a streaming service meant anywhere between a few seconds and over a minute of delay.
It was absolutely common to have a friend text you about something that just happened when you haven’t even seen it yet.
You can’t even say that $some_service is fast, some of them vary over 60 seconds just between their own users.
I'd imagine with on-demand services you already have the full content and can therefore use algorithms to compress frames and perform all kinds of neat tricks, too.
With live streaming I'd imagine a lot of these algorithms are useless, as there isn't enough delay and time to properly use them, so they're required to stream every single pixel with maybe some just-in-time algorithms.
But in either case, you can put that stuff on your CDN days ahead of time. You can choose to preload it in the cache because you know a bunch of people are gonna want it. You also know that not every single individual is going to start at the exact same time.
For live, every single person wants every single byte at the same time and you can’t preload anything. Brutal.
* Encoding - low latency encoders are quite different than storage encoders. There is a tradeoff to be made in terms of the frequency of key frames vs. overall encoding efficiency. More key frames means that anyone can tune in or recover from a loss more quickly, but it is much less efficient, reducing quality. The encoder and infrastructure should emit transport streams, which are also less efficient but more reliable than container formats like mp4.
* Adaptation - Netflix normally encodes their content as a ladder of various codecs and bitrates. This ensures that people get roughly the maximum quality that their bandwidth will allow without buffering. For a live event, you need the same ladder, and the clients need to switch between rungs invisibly (a rough sketch of that switching logic follows this list).
* Buffering - for static content, you can easily buffer 30 seconds to a minute of video. This means that small latency or packet loss spikes are handled invisibly at the transport/buffering layer. You can't do this for a live event, since that level of delay would usually be unacceptable for a sporting event. You may only be able to buffer 5-10 seconds. If the stream starts to falter, the client has only a few seconds to detect and shift to a lower rung.
* Transport - Prerecorded media can use a reliable transport like TCP (usually HLS). In contrast, live video would ideally use an unreliable transport like UDP, but with FEC (forward error correction). TCP's reaction to packet loss halves the receive window, which halves bandwidth, which would have to trash the connection to shift to a lower bandwidth rung.
* Serving - pre-recorded media can be synchronized to global DCs. Live events have to be streamed reliably and redundantly to a tree of servers. Those servers need to be load balanced, and the clients must implement exponential backoff or you can have cascading failures.
* Timing - Unlike pre-recorded media, any client that has a slightly fast clock will run out of frames and either need to repeat frames and stretch audio, or suffer glitches. If you resolve this on the server side by stretching the media, you will add complication and your stream will slowly get behind the live event.
* DVR - If you allow the users to pause, rewind, catch up, etc., you now have a parallel pre-recorded infrastructure and the client needs to transition between the two.
* DRM - I have no idea how/if this works on a live stream. It would not be ideal for all clients to use the same decryption keys and receive the same streams with the same metadata, since that would make tracing the source of a pirate stream very difficult. Differentiation/watermarking adds substantial complexity, however.
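For what it's worth, the client-side rung switching behind the "Adaptation" and "Buffering" points reduces to something like the following toy Python (numbers and names are illustrative guesses, not any real player's logic):

    # Bitrate ladder in kbps, lowest to highest (illustrative values).
    LADDER = [800, 2000, 4000, 8000]

    LIVE_BUFFER_TARGET_S = 8.0   # live: only a few seconds of cushion, not 30-60
    SAFETY_FACTOR = 0.8          # never pick a rung needing 100% of measured throughput

    def choose_rung(measured_throughput_kbps: float, buffer_level_s: float) -> int:
        """Pick the highest rung we can sustain; panic-drop if the buffer is draining."""
        if buffer_level_s < LIVE_BUFFER_TARGET_S / 2:
            return LADDER[0]     # buffer is draining: drop to the floor to avoid a stall
        budget = measured_throughput_kbps * SAFETY_FACTOR
        best = LADDER[0]
        for rung in LADDER:
            if rung <= budget:
                best = rung
        return best

    # A viewer measuring ~5 Mbps with a healthy 7 s buffer gets the 4000 kbps rung;
    # the same viewer with only 2 s buffered gets dumped to 800 kbps until it recovers.
    print(choose_rung(5000, 7.0), choose_rung(5000, 2.0))

The live-specific pain is that the panic path gets exercised constantly, because there is so little buffer to absorb throughput dips.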
If anyone was waiting for the main card to tune in, I recommend tuning in now.
Also, no buffering issues on my end. Have to wonder if it's a regional issue.
A combination of hubris and groupthink.
I'd think this isn't too crazy to stress test. If you have 300 million users signed up, then your stress test should be 300 million simultaneous streams in HD for 4 hours. I just don't see how Netflix screws this up.
Maybe it was a management incompetence thing? Manager says something like "We only need to support 20 million simultaneous streams" and engineers implement to that spec even if the 20 million number is wildly incorrect.
As far as single stream, Disney's Hotstar claimed 59 million for last year's Cricket World Cup, and as far as the YT platform, the Chandrayaan-3 lunar landing hit 8 million.
100 million is a lot of streams, let alone 300. But also note that not every stream reaches a single individual.
And, as far as the 59 million concurrent streams in India, the bitrate was probably very low (I'd wager no more than 720p on average, possibly even 480p in many cases). It's again a very different problem across the board due to regional differences (such as spread of devices, quality of network, even behavioral differences).
I remember watching the last season of Game of Thrones on one streaming provider, which topped out about 3.5mbit but claimed it was "1080p".
Give me a 15mbit 640x480 over 3.5mbit of 1920x1080 for that type of material any day.
Yes, I don't think anyone's disputing that.
> I remember watching the last season of Game of Thrones on one streaming provider, which topped out about 3.5mbit but claimed it was "1080p".
Why the scare quotes? That's a perfectly reasonable bitrate using modern compression like H.265, especially for a TV show that's filmed at 24 fps.
The world cup final itself (and other major events) is distributed from the host broadcaster to either on site at the IBC or at major exchange points.
When I've done major events of that magnitude there's usually a backup scanner and even a tertiary backup. Obviously feeds get sent via all manner of routes - the international feed, for example, may be handed off at an exchange point, but the reserve is likely available on satellite for people to downlink. If the scanner goes (fire etc.), then at least some camera/sound feeds can be switched direct to these points; on some occasions there's a full backup scanner too.
Short of events that take out the venue itself, I can't think of a plausible scenario which would cause the generation or distribution of the broadcast to break on a global basis.
I don't work for OBS/HBS/etc but I can't imagine they are any worse than other broadcast professionals.
The IT part of this stuff is pretty trivial nowadays, even the complex parts like the 2110 networks in the scanner tend to be commoditised and treated as you'd treat any other single system.
The most technically challenging part is unicast streaming to millions of people at low latency (DASH etc). I wouldn't expect an enormous architectural difference between a system that can broadcast to 10 million or 100 million though.
Anyway, network cable is the only way to go!
- Months ago, the "higher ups" at Netflix struck a deal to stream the fight on Netflix. The exec that signed the deal was probably over the moon because it would get Netflix into a brand new space and bring in large audience numbers. Along the way the individuals were probably told that Netflix doesn't do livestreaming but they ignored it and assumed their talented Engineers could pull it off.
- Once the deal was signed then it became the Engineer's problem. They now had to figure out how to shift their infrastructure to a whole new set of assumptions around live events that you don't really have to think about when streaming static content.
- Engineering probably did their absolute best to pull this off but they had two main disadvantages, first off they don't have any of the institutional knowledge about live streaming and they don't really know how to predict demand for something like this. In the end they probably beefed up livestreaming as much as they could but still didn't go far enough because again, no one there really knows how something like this will pan out.
- Evening started off fine but crap hit the fan later in the show as more people tuned in for the main card. Engineering probably did their best to mitigate this but again, since they don't have the institutional knowledge of live events, they were shooting in the dark hoping their fixes would stick.
Yes Netflix as a whole screwed this one up but I'm tempted to give them more grace than usual here. First off the deal that they struck was probably one they couldn't ignore and as for Engineering, I think those guys did the freaking best they could given their situation and lack of institutional knowledge. This is just a classic case of biting off more than one can chew, even if you're an SV heavyweight.
These failures reflect very poorly on Netflix leadership. But we all know that leadership is never held accountable for their failures. Whoever is responsible for this should at least come forward and put out an apology while owning up to their mistakes.
[0] https://time.com/6272470/love-is-blind-live-reunion-netflix/
You've never heard of a CEO or other C-suite or VP getting fired?
It most definitely happens. On the other hand, people at every level make mistakes, and it's preferable that they learn from them rather than be fired, if at all possible.
We have evidence of prior failures with livestreaming from Netflix. Were the same people responsible for that failure or do we have evidence of them having learned anything between events? If anything, I'd expect the best leaders would have a track record that includes failures while showcasing their ability to overcome and learn from those mistakes. But based on what information is publicly available, this doesn't seem to be the case in this situation.
It wasn't their first live event. A previous live event had similar issues.
It is. You can hire the people who have solved it to do it for you.
"GPGPU compute is a solved problem if you buy Nvidia hardware" type comment
You're replacing the word hire with buy. That misconstrues the comment. If you need to do GPGPU compute and have never done it, you work with a team that has. (And if you want to build it in house, you scale to it.)
Which is valid? If your problem can be solved by writing a check, then it's the easiest problem to have on the planet.
Netflix didn't have to put out 3 PhD dissertations on how to improve the SOTA of live streaming, they only needed to reliably broadcast a fight for a couple hours.
That is a solved problem.
Amazon and Cloudflare do that for you as a service(!). Twitch and YouTube do it literally every day. Even X started doing it recently so.
No excuses for Netflix, tbh.
You only need a big enough check.
India got a spacecraft to Mars for a fraction of the cost it took other nations, and the ESA has never been able to pull off a landing there.
Not every cost is fungible and money isn't always the limiting factor.
People say this, but then fall in love, get divorced, get depressed, or their company might lose its mojo, get sued, or lose an unreplaceable employee. But they will still say “all risk can be costed.”
This further confirms my assertion, btw.
If I have to spell it out you're clearly debating in bad faith and we're done here.
Who cares if a thousand guys are incapable? (like Netflix, lmao)
What matters are the ones that can do it, and you even said they've done it at "a fraction of the cost".
Paraphrasing, your argument says more about the incompetence of the ESA than about the impossibility of doing such a thing.
Sure. This isn’t relevant to Netflix.
Every off-the-shelf component on the market needs institutional knowledge to implement, operate, and maintain. Even Apple's "it just works" mantra is pretty laughable in the cold light of day. Very rarely, in my experience, do you get to just benefit from someone else's hard work in production without having an idea of how to properly implement, operate, and maintain it.
And that's at my little tiny ant scale. To call the problem of streaming "solved" for Netflix... Given the guess of the context from the GP post?
I just don't think this perspective is realistic at all.
Right. They have to hire one of the companies that does this. Each of YouTube, Twitch (Amazon), Facebook and TikTok have, I believe, handled 10+ million streams. The last two don't compete with Netflix.
Offering it for sale != having solved it.
If you can't provide the service you shouldn't sell it?
Premature optimization is definitely a thing and it can massively hurt businesses (i.e. startups going under). Let's stop pretending any business would say 'no' to extra revenue even before the engineering team had full assurance there would be no latency drop.
And sure, there have probably been lots of examples where a business made promises they weren't confident about and succeeded. But there are surely also lots of examples where they didn't succeed.
So what's the moral of the story? I don't know; maybe if you take a gamble you should be prepared to lose that gamble. Sounds to me like Netflix fucked up. They sold something they couldn't provide. What are the consequences of that? I don't know, nor do I particularly care. Not my problem.
Do startups really do this? I thought the capability is built or nearly built or at least in testing already with reasonable or amazing results, THEN they go to market?
Do startups go to other startups, fortune 500 companies and public companies to make false promises with or without due diligence and sign deals with the knowledge that the team and engineers know the product doesn't have the feature in place at all?
In other words:
Company A: "We can provide web scale live streaming service around the world to 10 billion humans across the planet, even the bots will be watching."
Company B: "OK, sounds good, Yes, here is a $2B contract."
Company A: "Now team I know we don't have the capability, but how do we build, test and ship this in under 6 months???"
Next thing you know it's 9pm on a Sunday night and you're desperately trying to ship a build for a client.
Netflix isn't some scrappy company though. If I had to guess they threw money at the problem.
A much better approach would have been to slowly scale over the course of a year: maybe stream some college basketball games first, slowly picking more popular events to get some real prod experience.
Instead this is like their 3rd or 4th live stream ever. Even a pre-show a week before would have allowed for greater testing.
I'm not the CTO of a billion dollar company though. I'm just an IC who's seen a few sites go down under load.
To be fair, no one knows how it's going to go before it happens. It would have been more surprising for them to pull this off without issues... It's a matter of managing those issues. I know if I had paid $30 for a Netflix subscription to watch this specific event, I'd assume I got ripped off.
You can be totally honest and upfront that the functionality doesn't exist yet and needs to be built first, but that you think you understand the problem space and can handle the engineering, provided you can secure the necessary funding, where, by the way, getting a contract and some nominal revenue now could greatly help make this a reality...
And if the upside sounds convincing enough, a potential customer might happily sign up to cover part of your costs so they can be beta testers and observe and influence ongoing development.
Of course it happens all the time that the problem space turns out to be more difficult than expected, in which case they might terminate the partnership early and then the whole thing collapses from lack of funding.
In the enterprise sector this is rampant. Companies sell "platforms" and those missing features are supposed to be implemented by consultants after the sale. This means the buyer is the one footing the bill for the time spent, and suffering with the delays.
That's for startups that can't bootstrap (most of them). For ones which can, they may still choose to do this with customers, as you describe, because it means letting their work follow the money.
> If you can't provide the service you shouldn't sell it?
Then how will the folks in Sales get their commission?
Besides, not providing the service hasn't stopped Tesla from selling FSD, and their stock has been going gangbusters.
/s
That's why serious analysis requires a factual basis, such as science, law, and good engineering and management. You need analytics data to figure out where the performance and organizational bottlenecks are.
Before people tried to understand illness with a factual basis, they wrote speculative essays on leeching and finding 'better' ways to do it.
They failed. Full stop. There is no valid technical reason they couldn’t have had a smooth experience. There are numerous people with experience building these systems they could have hired and listened to. It isn’t a novel problem.
Here are the other companies that are peers that livestream just fine, ignoring traditional broadcasters:
- Google (YouTube live), millions of concurrent viewers
- Amazon (Thursday Night Football, Twitch), millions of concurrent viewers
- Apple (MLS)
NBC live streamed the Olympics in the US for tens of millions.
So Netflix had 2 factors outside of their control
- unknown viewership
- unknown peak capacities outside their own networks
Both are solvable, but if you serve "saved" content you optimize for a different use case than live streaming.
Live events are difficult.
I'll also add on, that the other things you've listed are generally multiple simultaneous events; when 100M people are watching the same thing at the same time, they all need a lot more bitrate at the same time when there's a smoke effect as Tyson is walking into the ring; so it gets mushy for everyone. IMHO, someone on the event production staff should have an eye for what effects won't compress well and try to steer away from those, but that might not be realistic.
I did get an audio dropout at that point that didn't self-correct, which is definitely a "should have done better".
I also had a couple of frames of block color content here and there in the penultimate bout. I've seen this kind of stuff on lots of hockey broadcasts (streams or ota), and I wish it wouldn't happen... I didn't notice anything like that in the main event though.
Experience would likely be worse if there were significant bandwidth constraints between Netflix and your player, of course. I'd love to see a report from Netflix about what they noticed / what they did to try to avoid those, but there's a lot outside Netflix's control there.
- 120m viewers [1]
- Entire Netflix CDN Traffic grew 4x when the live stream started [2]
[1] https://www.rollingstone.com/culture/culture-news/jake-paul-...
Despite their already huge presence, Amazon for example has multiple CDNs involved for capacity for live events. Same for Peacock.
https://www.geeksforgeeks.org/how-did-hotstar-managed-5-9-cr...
For what it is worth, all things being equal there would be a lot more non-engineering staff among Hotstar's 2000 employees than at a streaming company of similar size or scale of users. Hotstar operates in a challenging and fragmented market; India has 10+ major languages (and corresponding TV, music and movie markets). Technically there is not much difference in what Netflix or Disney has to do for i18n, however operationally each market needs separate sales, distribution and operations.
---
P.S. Yes, Netflix operates in more markets, including India, than anybody else; however, if you are actually using Netflix for almost any non-English content, you will know how weak their library and depth in other markets are. Their usual model in most of these markets is to have a few big, high-quality (for that market) titles rather than build depth.
P.P.S. Also yes, the Indian market is seeing consolidation in the sense that many streaming releases are multilingual and use major stars from more than one language to draw talent (not new, but growing in popularity as distribution becomes cheaper with streaming); however this is only seen in big-banner productions, as tastes are quite different in each market and it can't scale for all run-of-the-mill content.
This is the company that supplies technology to Hotstar, Hulu, MLB Live streaming, etc.
https://en.m.wikipedia.org/wiki/Disney_Streaming
Hotstar is a completely different company.
* Twitch: Essentially invented live streaming. Fantastic.
* Amazon Interactive Video Service [0]: Essentially "Twitch As A Service", built by Twitch engineers. Fantastic.
* Prime Video. Same exact situation as Netflix: original expertise is all in static content. Lots of growing pains with live video and poor reports. But they've figured it out: now there are regular live streams (NHL and NFL), and other channel providers do live streaming on Prime Video as a distribution platform.
pisses me off.
physical media for the win, tho.
It's not full stop. There are reasons why they failed, and for many it's useful and entertaining to dissect them. This is not "making excuses" and does not get in the way of you, apparently, prioritizing making a moral judgment.
I’m pretty confident that when the post mortem is done the issues are going to be way closer to the broadcast truck than the user.
At least for NFL pirate streams, it seems they tend to use "burner" tenants from Azure and AWS. Of course they get shut down, but how hard is it to spin up another one?
Antonio Brown is not “just some dude”. He’s a national treasure.
> But the real indicator of how much Sunday’s screw-up ends up hurting Netflix will be the success or failure of its next live program—and the next one, and the one after that, and so on. There’s no longer any room for error. Because, like the newly minted spouses of Love Is Blind, a streaming service can never stop working to justify its subscribers’ love. Now, Netflix has a lot of broken trust to rebuild.
And you’ll never guess which Presidential candidate they both support!
edit: literally an nginx gateway timeout screen if you view the response from the CDN... wow
> https://www.icc-cricket.com/news/biggest-cricket-world-cup-e...
An impressive achievement, and the scale Netflix failed to handle a year later.
In a protocol where an oft-repeated request goes through multiple intermediaries, usually every intermediary is able to cache the response for common queries (e.g. DNS).
In theory, ISPs would be able to do the same with HTTP, although I am not aware of anyone doing so (since it would rightfully raise concerns about privacy and tampering).
Now TLS (or other encryption) will break this abstraction. Every user, even if they request a live stream, receives a differently encrypted response.
But a live stream of a popular boxing match has nothing to do with the "confidentiality" an encryption protocol provides, only the integrity.
Do we have a protocol which allows downstream intermediaries, e.g. ISPs, to cache the content of the stream based on demand, while a digital signature or other attestation is still cryptographically verified by the client?
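A rough sketch of what that question is asking for: any untrusted box (an ISP cache, a neighbour's set-top) can hand out the bytes, and the client verifies a publisher signature over each chunk, so integrity is preserved even though confidentiality is given up. This uses Ed25519 from the Python `cryptography` package; the chunk shape is invented for illustration:

    from cryptography.hazmat.primitives.asymmetric import ed25519
    from cryptography.exceptions import InvalidSignature

    # Publisher side (e.g. the broadcaster's packager): sign each media chunk.
    signing_key = ed25519.Ed25519PrivateKey.generate()
    public_key = signing_key.public_key()      # shipped to clients out of band

    chunk = b"...4 seconds of video..."        # stand-in for a real media segment
    signature = signing_key.sign(chunk)

    # Any intermediary can now store and re-serve (chunk, signature)
    # without being able to tamper with it undetected.

    # Client side: verify before feeding the decoder.
    try:
        public_key.verify(signature, chunk)
        print("chunk is authentic, play it")
    except InvalidSignature:
        print("chunk was tampered with, discard and re-fetch")

This is essentially the Van Jacobson line of work mentioned in the reply below: secure the data rather than the pipe.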
I don't see it much mentioned the last few years, but the research groups have ongoing publications. There's an old 2006 Van Jacobson video that is a nice intro.
I assume this came down to some technical manager saying they didn't have the human and server resources for the project to work smoothly and a VP or something saying "well, just do the best you can.. surely it will be at least a little better than last time we tried something live, right?"
I think there should be a $20 million class action lawsuit, which should be settled as automatic refunds for everyone who streamed the fight. And two executives should get fired.
At least.. that's how it would be if there was any justice in the world. But we now know there isn't -- as evidenced by the fact that Jake Paul's head is still firmly attached to his body.
I have done live streaming for around 100k concurrent users. I didn't set up the infrastructure myself because it was the CloudFront CDN.
Why is it hard for Netflix? They have already figured out the CDN part, so it should not be a problem even if it is 1M or 100M, because their CDN infrastructure is already handling the load.
I have only worked with HLS live streaming, where the playlist is constantly changing compared to VOD. Live video chunks work the same as VOD. CloudFront also has a feature, request collapsing, that greatly helps live streaming.
So, my question is: if Netflix has already figured out the CDN, why is their live infrastructure failing?
Note: I am not saying my 100k is the same scale as their 100M. I am curious about which part is the bottleneck.
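To make the "constantly changing playlist" part concrete, here is a toy Python generator for a sliding-window live HLS media playlist (segment names and durations are made up; a real packager also deals with discontinuities, ad markers, low-latency parts, DRM, etc.):

    WINDOW = 5            # how many segments the live playlist advertises
    SEGMENT_SECONDS = 4

    def live_playlist(latest_seq: int) -> str:
        """Render the media playlist as it looks when segment `latest_seq` is newest."""
        first = max(0, latest_seq - WINDOW + 1)
        lines = [
            "#EXTM3U",
            "#EXT-X-VERSION:3",
            f"#EXT-X-TARGETDURATION:{SEGMENT_SECONDS}",
            f"#EXT-X-MEDIA-SEQUENCE:{first}",
        ]
        for seq in range(first, latest_seq + 1):
            lines.append(f"#EXTINF:{SEGMENT_SECONDS}.0,")
            lines.append(f"segment_{seq}.ts")
        # No #EXT-X-ENDLIST: that's what tells players this is live and to keep polling.
        return "\n".join(lines) + "\n"

    print(live_playlist(42))   # every client re-fetches this tiny file every few seconds

Those playlist re-fetches from every client, every few seconds, are themselves a load that VOD simply doesn't have.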
100k concurrents is a completely different game compared to 10 million or 100 million. 100k concurrents might translate to 200 Gbps globally for 1080p, whereas for that same quality you might be talking 20 Tbps for 10 million streams. 100k concurrents is also a size you could theoretically handle on a small single-digit number of servers, if not for latency.
> CloudFront also has a feature request collapsing that greatly help live streaming.
I don't know how much request coalescing Netflix does in practice (or how good their implementation is). They haven't needed it historically, since for SVOD, they could rely on cache preplacement off-peak. But for live, you essentially need a pull-through cache for the sake of origin offload. If you're not careful, your origin can be quickly overwhelmed. Or your backbone if you've historically relied too heavily on your caches' effectiveness, or likewise your peering for that same reason.
200 Gbps is a small enough volume that you don't really need to provision for it explicitly; 20 Tbps or 200 Tbps may need months if not years of lead time to land the physical hardware augments, sign additional contracts for space and power, work with partners, etc.
In fact, optimizing for the latter can hurt the former.
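Since request coalescing came up: the idea itself is simple, the hard part is doing it reliably at the edge under overload. A minimal single-process sketch in Python (asyncio, invented names), where concurrent cache misses for the same segment share one origin fetch:

    import asyncio

    cache: dict[str, bytes] = {}
    inflight: dict[str, asyncio.Task] = {}

    async def fetch_from_origin(key: str) -> bytes:
        await asyncio.sleep(0.1)           # stand-in for the expensive origin request
        return f"payload for {key}".encode()

    async def get(key: str) -> bytes:
        if key in cache:
            return cache[key]              # hit: the origin never sees this request
        task = inflight.get(key)
        if task is None:
            # First miss starts the origin fetch; later misses await the same task.
            task = asyncio.create_task(fetch_from_origin(key))
            inflight[key] = task
        data = await task
        cache[key] = data
        inflight.pop(key, None)
        return data

    async def main():
        # 10,000 viewers ask for the same live segment at once; origin sees one fetch.
        results = await asyncio.gather(*(get("seg_42.ts") for _ in range(10_000)))
        print(f"{len(results)} responses served from {len(set(results))} origin fetch")

    asyncio.run(main())

Without this (or with it misconfigured), a cache tier can amplify a live event straight into the origin, which is one way the pull-through model falls over.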
Would be interesting to read any postmortems on this failure. Maybe someone will be kind enough to share the technical details for the curious crowd.
I'm sure they will get it figured out.
That's the plain-text message I see when I tried to refresh the stream.
Follow-up:
My location: East SF Bay.
Now even the Netflix frontpage (post login, https://www.netflix.com/browse ) shows the same message.
The same message even in a private window when trying to visit https://www.netflix.com/browse
The first round of the fight just finished, and the issues seem to be resolved, hopefully for good. All this to say what others have noted already, this experience does not evoke a lot of confidence in Netflix's live-streaming infrastructure.
I remember when ESPN started streaming years back, it was awful. Now I almost never have problems with their live events, primarily their NHL streams.
Anyway, I wouldn’t be surprised if they were prioritizing mobile traffic because it’s more forgiving of shitty bitrate.
These kind of reports are the equivalent of saying "I have power" when you're hundreds of miles away from where a hurricane landed. It's uninteresting, it's likely you have power, and it does literally nothing for people that do not have power.
It doesn't advance the topic anywhere. There are other places to report these issues directly and in aggregate with other people -- HN (and many other forums) are not that place.
I hope you find it within yourself to treat strangers nicer.
You butted into a conversation to tell someone their contribution added no value without adding anything constructive. A comment of “your comment is useless” is pure aggression and is ironically even less useful than the one it’s deriding.
> Tone however, matters in any context.
You are getting upset because someone used a swear word. You’ll find that is just deep seated classism and working on that will let you have much more fulfilling interactions.
Tone policing never works. It’s a waste of calories and everyone’s time.
Being able to livestream a sporting event is the default now and has been for at least over a decade since HBO’s original effort to stream a Game of Thrones season opener failed because of the MSFT guy they hired, and they fixed it by handing it over to MLBAM.
Maybe that’s what Netflix should do. Buy Disney so they can get access to the MLBAM people (now Disney Streaming because they bought it from MLB).
All your attacking power comes from your legs and hips, so if his legs weren’t stable he didn’t have much attacking power.
I think he gave it everything he had in rounds 1 and 2. Unfortunately, I just don’t think it was ever going to be enough against a moderately trained 27 year old.
There is a reason that cable doesn’t stream unicast and uses multicast and QAM on a wire. We’ve just about hit the point where this kind of scale unicast streaming is feasible for a live event without introducing a lot of latency. Some edge networks (especially without local cache nodes) just simply would not have enough capacity, whether in the core or peering edge, to do the trick.
I can't see traditional DVB/ATSC surviving much beyond 2040, even accounting for the long tail.
You're right that large-scale parallel live streaming has only become feasible in the last few years. The BBC has some insights into how it had to change its approach to scale to 10 million viewers in 2021, having had technical issues in the 3 million range in 2018:
https://www.bbc.co.uk/webarchive/https%3A%2F%2Fwww.bbc.co.uk...
Personally I don't think the latency is solved yet -- TV is slow enough (about 10 seconds from camera to TV), but IP streaming tends to add another 20-40 seconds on top of that.
That's no good when you're watching the penalties. Not only will your neighbours be cheering before you as they watch on normal TV, but even if you're both on the same IPTV service you may well see 5 seconds of difference.
The total end-to-end time is important too: with 30 seconds of delay, the news push notifications, tweets, etc. on your phone will come in before you see the result.
SFP itself isn't much of an issue. SerDes is, and then secondarily the operating power envelope for the things (especially for the kinds of optics that run hot). Many tradeoffs available.
>I can't see traditional DVB/ATSC surviving much beyond 2040, even accounting for the long tail.
Tend to agree in well-developed infra, but rural and poorly-developed are well served with more traditional broadcast. Just saying “starlink!” 5 times in a dark bathroom won’t fix that part.
> Personally I don't think the latency is solved yet -- TV is slow enough (about 10 seconds from camera to TV), but IP streaming tends to add another 20-40 seconds on top of that.
I don't think it will get better. Probably worse, but with net better service quality. HLS/DASH are designed for doing the bursty networking thing. Among the good reasons for this: mobile works much better in bursts than with strict linear streams, segment caching is highly effective, etc.
But I think this makes sense: it's a server-side buffering thing that has to happen. So assuming transmuxing (no transcoding lol) and wire latency are 0, we're hitting 1-5 seconds for the segment, probably waiting for a fill of 10 seconds to produce the manifest, then buffering client-side another 10 or so. Throw in more cache boxes and it'll tick up more. It is quite high, but aside from bookies, I don't know how much people will actually care vs complain.
I'm tired of all this junk entertainment which only serves to give people second-hand emotions that they can't feel for themselves in real life. It's like, some people can't get sex so they watch porn. People can't fight so they watch boxing. People can't win in real life so they play video games or watch superhero movies.
Many people these days have to live vicariously through random people/entities; watch others live the life they wished they had and then they idolize these people who get to have everything... As if these people were an intimate projection of themselves... When, in fact, they couldn't be more different. It's like rooting for your opponent and thinking you're on the same team; when, in fact, they don't even know that you exist and they couldn't be more different from you.
You're no Marvel superhero no matter how many comic books you own. The heroes you follow have nothing to do with you. Choose different heroes who are more like you. Or better; do something about your life and give yourself a reason to idolize yourself.
Isn't live streaming at scale an already solved problem for cable companies? I've never seen ESPN go down during a critical event.
To understand how to do things correctly look at something like pornhub who handle more scale than Netflix without crying about it.
The other day I was having this discussion with somebody who was saying that distributed counter logic is hard, and I was telling them that you wouldn't even need it if Netflix hadn't gone completely mental on microservices and complexity.
Netflix dropped the ball hard
If lots of people are upvoting and downvoting the same comments, that's treated as a signal the topic is contentious and people are likely to start behaving badly.
HN is very clear they prioritize good behavior as the long term goal, and they are as a result comfortable having contentious topics fall in the ranking even if everyone involved in the discussion feels the topic is important.
Also, I have never seen any Netflix employees who are arrogant or who think they are superior to other people. What I have seen is that Netflix's engineering organization frequently describes the technical challenges they face and discusses how they solve them.
A typical streaming architecture is multi-tiered caching: source -> midtier -> edge.
We don't know what happened, but it's possible they ran out of capacity at the edge (or anywhere else along that path).
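A minimal sketch of that tiering, with made-up names (illustrative only, not Netflix's actual Open Connect layout): each tier answers from its own cache and only falls through to the next tier on a miss, which is why the source sees a tiny fraction of the client load -- until a tier runs out of capacity and requests start spilling upstream.

type Fetch = (segmentUrl: string) => Promise<Uint8Array>;

// One cache tier: answer from local cache if possible, otherwise fall through to the next tier.
function cacheTier(upstream: Fetch): Fetch {
  const cache = new Map<string, Uint8Array>();
  return async (segmentUrl) => {
    const hit = cache.get(segmentUrl);
    if (hit) return hit;                      // hit: the upstream tier never sees this request
    const data = await upstream(segmentUrl);  // miss: one request leaks upstream
    cache.set(segmentUrl, data);
    return data;
  };
}

// source -> midtier -> edge, as described above; the source here is a stand-in stub.
const fetchFromSource: Fetch = async (url) => new TextEncoder().encode(url);
const midtier = cacheTier(fetchFromSource);
const edge = cacheTier(midtier);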
> Even though widely used, this pattern has some significant drawbacks, the best illustration being the major incident that hit the BBC during the 2018 World Cup quarter-final. Our routing component experienced a temporary wobble which had a knock-on effect and caused the CDN to fail to pull one piece of media content from our packager on time. The CDN increased its request load as part of its retry strategy, making the problem worse, and eventually disabled its internal caches, meaning that instead of collapsing player requests, it started forwarding millions of them directly to our packager. It wasn’t designed to serve several terabits of video data every second and was completely overwhelmed. Although we used more than one CDN, they all connected to the same packager servers, which led to us also being unable to serve the other CDNs. A couple of minutes into extra time, all our streams went down, and angry football fans were cursing the BBC across the country.
https://www.bbc.co.uk/webarchive/https%3A%2F%2Fwww.bbc.co.uk...
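The "collapsing player requests" part of that post-mortem is the key bit: when many viewers ask a cache for the same not-yet-cached segment, a healthy CDN makes one upstream request and fans the response out. A rough sketch of the idea, often called request coalescing or singleflight (the names below are mine, not the BBC's or any CDN's):

// In-flight requests keyed by URL: concurrent viewers of the same segment share one upstream fetch.
const inFlight = new Map<string, Promise<Uint8Array>>();

async function getSegment(
  url: string,
  upstream: (u: string) => Promise<Uint8Array>
): Promise<Uint8Array> {
  const pending = inFlight.get(url);
  if (pending) return pending;  // someone is already fetching this segment: piggyback on it

  const request = upstream(url).finally(() => inFlight.delete(url));
  inFlight.set(url, request);
  return request;               // only one request per segment ever reaches the packager
}

When the CDN in the BBC incident disabled its internal caches, it effectively lost this collapsing behaviour, so every player request (plus the retries) landed directly on the packager.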
I worked in this space. All these potential failure modes and how they're mitigated are something we paid a fair amount of attention to.
And then all these sessions lag, or get orphaned and take up space, and there are so many reconnections at various points in the stream.
System getting hammered. Can't wait for this writeup.
I'd expect the NFL games to have a largely American audience, but today's boxing event attracted a global audience.
https://www.theverge.com/2024/11/16/24298338/netflix-mike-tt...
Assuming Netflix used its extensive edge cache network to distribute the streams to the ISPs, the software on those caching servers would have been updated to handle receiving and distributing live-streamed content, even if the hardware was perhaps not optimal for it (throughput vs latency is a classic networking tradeoff).
Now, inside the ISP's network, everything would probably be optimized for the 99.99% use case of the Netflix infrastructure: delivering large bulk data that is not time-sensitive. That means very large buffers to shift big gobs of packets in bulk.
As everything along the path is trying to fill up those buffers before shipping to the next router, some endpoints aware that this is a live stream start cancelling requests and asking for more recent frames...
Hilarity ensues
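Purely as an illustration of that "cancel and ask for more recent frames" behaviour (all names and timeouts below are hypothetical), the client side of the feedback loop can be as simple as:

// Hypothetical player logic: abandon a stalled segment request and ask for the newest one instead.
// Under congestion this produces *more* requests, not fewer, which is the loop described above.
async function fetchLiveSegment(liveEdgeUrl: () => string): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), 5_000); // don't wait on a stalled download
  try {
    return await fetch(liveEdgeUrl(), { signal: controller.signal });
  } catch {
    // stalled or failed: drop it and request whatever the live edge is *now*
    return fetch(liveEdgeUrl());
  } finally {
    clearTimeout(timer);
  }
}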
Unfortunately, except for the women's match, the fights were pretty lame... 4 of the 6 male boxers were out of shape. Paul and Tyson were struggling to stay awake, and if you had told me Paul was as old as Tyson, I would have believed it.
I don’t know if it’s still the case, but in the past some devices worked better than others during peak times because they used different bandwidth providers. This was the battle between Comcast and Cogent and Netflix.
Remember back in 2014 or so when Netflix users on Comcast were getting slow connections and buffering pauses? It didn't affect people who watched Netflix via Apple TV because Netflix served Apple TV users with a different network.
> In a little known, but public fact, anyone who is on Comcast and using Apple TV to stream Netflix wasn’t having quality problems. The reason for this is that Netflix is using Level 3 and Limelight to stream their content specifically to the Apple TV device. What this shows is that Netflix is the one that decides and controls how they get their content to each device and whether they do it via their own servers or a third party. Netflix decides which third party CDNs to use and when Netflix uses their own CDN, they decide whom to buy transit from, with what capacity, in what locations and how many connections they buy, from the transit provider. Netflix is the one in control of this, not Comcast or any ISP.
https://www.streamingmediablog.com/2014/02/heres-comcast-net...
Cogent just seems to love picking fights with everyone (see Hurricane Electric). Why are they still in business?
For me it was buffering and low resolution, on the current AppleTV model, hardwired, with a 1Gbps connection from AT&T. Some streaming devices may have handled whatever issues Netflix was having better than others, but this was clearly a bigger problem than just the streaming device.
I think about that every time I wait for Paramount+ to restart after it's gone black in picture-in-picture. And yet I'm still on Paramount+ and not Netflix, so maybe that advantage isn't real.
[1]https://www.marketwatch.com/story/i-opt-to-fly-private-no-ma...
I specifically found the Netflix suite for Spring very lacking, and found message oriented architectures on something like NATS a lot easier to work with.
I remember a lot of trade magazines in the late 1990's during the dot com boom talked about how important it would be.
https://en.wikipedia.org/wiki/IP_multicast
I never hear about it anymore. Is that because everyone wants to watch something different at their own time? Or is it actually working just fine now in the background? I see under the "Deployment" section it mentions IPTV in hotel rooms.
Sure, you get these black swan events that everyone wants to watch, but they're just that: really infrequent. So instead you have to provision capacity on the internet to handle big events like this. The upside is that you have a billion people able to send point-to-point to another billion people, instead of 30 companies who can send to everyone.
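For what it's worth, multicast never went away at the network level; it just never crossed the open internet, so it lives on inside single-operator networks (the hotel IPTV case mentioned in the article). Joining a group is trivial from the application side -- a minimal Node.js sketch, with the group address and port invented for illustration:

import dgram from "node:dgram";

// Hypothetical group/port; in an IPTV deployment these would come from the channel lineup.
const GROUP = "239.1.2.3";
const PORT = 5004;

const socket = dgram.createSocket({ type: "udp4", reuseAddr: true });

socket.on("message", (packet) => {
  // each datagram would carry MPEG-TS (or similar) payload for the channel
  console.log(`received ${packet.length} bytes`);
});

socket.bind(PORT, () => {
  socket.addMembership(GROUP); // tell the network we want this channel's traffic
});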
Since then I am very used to it, because our institutional websites traditionally crash whenever there is a deadline (typically taxes or school enrollment).
As for this one, my son is studying in Europe (I am also in Europe); he called me, desperate, at 5 am or so to check whether he was the only one with the problem (I am the 24/7 family support for anything plugged in). After having liberally insulted Netflix, he confirmed with his grandparents that he will be helping them at 10 :)
I had some technical experience with live video streaming over 15 years ago. It was a nightmare back then. I guess live video is still difficult in 2024. But congrats to Jake Paul and boxing fans. It was a great event. And breaking the internet just adds more hype for the next one.
I’m in the Pacific Northwest. I wonder if we got lucky on this or just some areas got unlucky.
All of the conditions were perfect for Netflix, and yet it seems the platform entirely flopped.
Is this what the chaos engineering that Netflix marketed so heavily to engineers is all about? Was the livestream supposed to go down as Netflix removed servers randomly?
Some buffering issues for us, but I bet views are off the charts. Huge for Netflix, bad for espn, paramount, etc etc
I see multiple people in the comments talking about how "cable" companies that have migrated to IPTV have solved this problem.
I'd disagree.
I'm on IPTV, and any major sporting event (World Series, Super Bowl, etc.) means horrible buffering when I try to watch on my 4K IPTV (streaming) channel. I always have to downgrade to the HD channel, and I still occasionally experience buffering.
So Netflix isn't alone in this matter.
I didn't watch it live (boxing has lost its allure for me) but vicariously lived through it via my social feed on Bluesky/Mastodon.
Billions of dollars at their disposal and they just can’t get it right. Probably laid off the highly paid engineers and teams that made their shit work.
More likely overpaying a bunch of posers without the chops, a victim of their own arrogance.
The fight itself was lame which worked in their favor. No one really cared about not being able to see every second of the "action". It's not like it was an NBA game that came down to the last second.
In boxing you are old by 32, or maybe 35 for heavyweight, and everything goes downhill very, very fast.
End of rant.
"Netflix streamed the fight to its 280 million subscribers"
Perhaps the technology was over-sold.
Oddly having watched PPV events via the high seas for years, it feels normal...
Leading up to the fight, there were many staged interactions meant to rile up the audience and generate hype and drive subscription revenue, and subsequently make ad spots a premium ($$$).
Unfortunately, American television/entertainment is so fake. I can't even be bothered to engage or invest time in it anymore.
Hopefully Netflix can share more about what they learned, I love learning about this stuff.
I had joked that I would probably cancel Netflix after the fight, since I realized other platforms seemed to have more content, both old and new.
Then the video started stuttering.
Internet live streaming is harder than cable/satellite live TV delivered over "dumb" set-top boxes. Honestly, they should not have used the internet for this. A broadcast TV signal can reach millions of people live.
Woke up at 4am (EU here) to tune in for the main event. Bought Netflix just for this. The women's fight went fine: no buffering, 4K.
As it approached time for Paul vs Tyson, the stream first dropped to 140p and then buffered constantly. Restarted my Chromecast a few times, tried from a laptop, and finally caught the stream on my mobile phone via the mobile network rather than my wifi.
The Netflix app on the TV kept blaming my internet, which kept coming back as "fast".
Ended up watching the utterly disappointing, senior-abuse live stream on my mobile phone at 360p.
Gonna cancel Netflix and never pay for it again, nor watch hyped-up boxing matches.
Netflix is clearly not designed nor prepared for scalable multi-region live streaming, no matter the number of 'senior' engineers they throw at the problem.
Well, yes. Who would think Netflix was designed for that? They do VOD. They're only trying to move into this now.
It's almost like this platform has been taken over by JavaScripters completely.
That is an incredible way to phrase the sentiment, thank you.
I have Spectrum (600 Mbps) for ISP and Verizon for mobile.
Update: Switched to the app on my phone and so far so good.
You're referring to Hooli's streaming of the UFC fight that goes awry while Gavin Belson totally loses it, lol. Great scene and totally relevant to what's happening with Netflix rn.
I guess in the year when Trump is being reelected this is hardly a surprise.
These secondary streams might be serving a couple thousand users at best.
Initial estimates are in the hundreds of millions for Netflix. That's several orders of magnitude of difference there.
I think that is the point, in fact.
Because taken at face value it's false. Any technical challenges involved in distributing a stream cannot possibly be affected by the legal status of the bits being pushed across the network.
I thought Tyson was in eldercare.
Our engineers are fucking morons. And this guy was the dumbest of the bunch. If you think Netflix hires top tier talent, you don't know Netflix.
Apparently he was smart enough to get away from the Fortune 500 company he worked at and from reporting to you, and he "got a pay raise too."
> Our engineers are fucking morons. And this guy was the dumbest of the bunch.
See above.
> If you think Netflix hires top tier talent, you don't know Netflix.
Maybe you don't know the talent within your own organization. Which is entirely understandable given your proclamation:
Our engineers are fucking morons.
Then again, maybe this person who left your organization is accurately described as such, which really says more about the Fortune 500 company employing him, and which presumably continues to employ you. IOW, either the guy left to get out from under an EM who calls him a "fucking moron", or he actually is a "fucking moron" and you failed as a manager to elevate his skills/performance to a satisfactory level.
Managers aren't teachers. They can spend some time mentoring and teaching but there's a limit to that. I've worked with someone who could not write good code and no manager could change that.
Most people I've worked with aren't like that of course (there's really only one that stands out), so maybe you've just been lucky enough to avoid them.
I do find it unlikely that all of his engineers are morons, but on the other hand I haven't worked for a typical fortune 500 company - maybe that's where all the mediocre programmers end up.
In fact, what am I even doing in this thread? - close-tab.
Sometimes managers don't have the authority to fire somebody and are forced to keep their subordinates. Yes, good managers can polish gold, but polishing poop still results in poop.
const arrayIneed = [];
// .map used purely for its side effects; the mapped array is thrown away
const arrayIdontNeed = firstArray.map(item => {
  if (item.hasProp) { arrayIneed.push(item); }
});
return arrayIneed;
the above is very much a cleaned up and elegant version of what he would actually push into the repo.
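For contrast, the idiomatic version of that snippet is a one-liner; filter already does what the map-with-side-effects dance is trying to do:

// same result, no side effects, no throwaway array from map
return firstArray.filter(item => item.hasProp);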
He left for a competitor in the same industry; this was at the second-biggest company in the industry in Denmark and he left for the biggest, so presumably he got a pay raise.
I asked the manager after he was gone, one time when I was refactoring some code of his (which in the end just meant throwing it all out and rewriting from scratch), why he had been kept on so long. The manager said there were some layoffs coming up and he would have been out with those, but because of the way things worked it didn't make sense to let him go earlier.
Incentives are fucked across the board right now.
Move on a low performer today and you'll have an uphill battle to get a backfill at all. If you get one, many companies are "level-normalizing" (read: level-- for all backfills). Or perhaps your management thinks the job could be done overseas more cheaply, or you get pushed to turn it into a set of tasks so you can farm it out to contractors.
So you keep at least some shitty devs to hold their positions, and as ballast to throw overboard when your bosses say "5% flat cut, give me your names". We all do it. If we get back to ZIRP I'll get rid of the actively bad devs when I won't risk losing their position entirely. Until then, it's all about squeezing what limited value they have out and keeping them away from anything important.
I don't think I'd want to work for you.
In fact, it would be incredibly weird to ask a close friend who at their work kicks ass and who sucks, and have them respond back, "I've never really thought about how good any of my coworkers were at their jobs."
That's not out of respect or anything, but because they're all good. I hired and mentored them, and they all passed probation.
Sure there are junior devs who are just starting, but they're getting paid less, so they're pulling their weight proportionately. They're not worse.
Example - the manager who started this sub-thread may be a pretty smart guy and able to accurately rate the intelligence of the engineers at his organization - but he had a minor momentary failing of intelligence to post on HN calling those engineers fucking morons.
You've got to gauge how often someone's intelligence fails to be able to figure out how reliable their intelligence is.
The trick is to use my massive brain to root cause several significant outages, discover that most of them originate in code written by the same employee, and notice that said employee liked to write things like
// @ts-nocheck
// obligatory disabling of typescript: static typing is hard, so why bother with it?
async function upsertWidget() {
  try {
    // await api.doSomeOtherThings({ ... })
    // 20 line block of commented-out useless code
    // pretend this went on much longer
    let result = await api.createWidget({ a, b, c })
    if (!result.ok) {
      result = await api.createWidget({ a, b }) // retries for days! except with different args, how fun
      if (!result.ok) {
        result = await api.updateWidget({ a, b, c }) // oh wait, this time we're updating
      }
    }
    // notice that api.updateWidget() can fail silently
    // also, the three function calls can each return different data, I sure am glad we disabled typescript
    return result
  } catch (error) {
    return error // I sure do love this pattern of returning errors and then not checking whether the result was an error or the intended object
  }
}

async function doSomething() {
  const widget = await upsertWidget()
}
...except even worse, because instead of createWidget the name was something far less descriptive, the nesting was deeper and involved loops, there were random assignments that made no goddamn sense, and the API calls just went to an unnecessary microservice that was only called from here and which literally just passed the data through to a third party with minor changes. Those minor changes resulted in an internal API that was actually worse than the unmodified third-party API.
I am so tired of these people. I am not a 10x rockstar engineer and not without flaws, but they are just so awful and draining, and they never seem to get caught in time to stop them ruining perfectly fine companies. Every try>catch>return is like an icy cat hand from the grave reaching up to knock my coffee off my desk.
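For anyone who hasn't hit that pattern before, the sting is that the caller has no way to tell success from failure without an explicit check. A sketch of the contrast, with hypothetical names and a stub API rather than the actual code in question:

type Widget = { id: string };

// Stand-in for the service in the code above; purely a stub so the sketch is self-contained.
const api = {
  async createWidget(args: Record<string, unknown>): Promise<Widget> {
    return { id: JSON.stringify(args) };
  },
};

// The anti-pattern: the catch block returns the error, so a caller can't tell a Widget from an Error.
async function upsertWidgetBad(): Promise<unknown> {
  try {
    return await api.createWidget({ a: 1 });
  } catch (error) {
    return error;
  }
}

// One way out: make success and failure structurally distinct so the caller has to check.
type Result<T> = { ok: true; value: T } | { ok: false; error: Error };

async function upsertWidget(): Promise<Result<Widget>> {
  try {
    return { ok: true, value: await api.createWidget({ a: 1 }) };
  } catch (error) {
    return { ok: false, error: error instanceof Error ? error : new Error(String(error)) };
  }
}

async function doSomething(): Promise<void> {
  const result = await upsertWidget();
  if (!result.ok) return;        // the failure path is now explicit...
  console.log(result.value.id);  // ...and the success path is correctly typed
}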
So again, maybe they're a bad employee but it seems like nothing's done to even try and minimize the risks they present.
Which, yes, does raise interesting questions about how someone who can't be trusted to handle errors in an API call ended up in a senior enough position to do that.
At that point it's not one person being obnoxious and never approving their team members' diffs; it's more of a team effort to do so.
But at minimum, if you have a culture of trying to improve your codebase, you'll inevitably set up tests, CI/CD with checks, etc. before any code can be deployed. That should take the weight of responsibility off any one member of the team, whether they're trying to ship code or to reject it.
Sure, there's such a thing as stupid code, but without knowing the whole context under which a bit of code was produced, unless it's utterly moronic (which is entirely possible; The Daily WTF has some shiners), it's hard to really judge it. Hindsight, as applied to code, is 20/20.
The difference for me is that this is pervasive. Yes, sometimes I might write code with a bug in error handling at 3am when I'm trying to bring a service back up, but I don't do it consistently across all the code that I touch.
I accept that the scope is hard to understand without having worked on a codebase which a genuine fucking moron has also touched. "Oh strken," you might say, "surely it can't be that bad." Yes, it can. I have never seen anything like this before. It's like the difference between a house that hasn't been cleaned in a week and a hoarder's house. If I tried to explain what hoarding is, well, maybe you'd reply that sometimes you don't do the dishes every day or mop the floor every week, and then I'd have to explain that the kitchen I'm talking about is filled from floor to roof with dirty dishes and discarded wrappers, including meat trays, and smells like a dead possum.
Have you considered that maybe you're being overly harsh about your co-workers? Maybe take the fact that one of them was hired by a top paying employer as a sign that you should improve your own ability to judge skill?
And so on
If he still works there, the "moron" who left was less of one.
The timesheets were on paper, so good luck putting your real hours down without your manager, who files them, finding out.
I’d be amazed if they ever cleaned up their act.
The reality, though, is that large companies with thousands of people generally end up with average people. Some companies may hire more PhDs, but on average those aren't better software engineers than non-PhDs. Some might hire strong competitive coders, but on average that isn't a strong signal for strong engineers either.
Once you have a mix of average people on a curve, which is the norm, the question becomes whether you have an environment where the better people can be successful. In many corporate environments this doesn't happen. Better engineers may have obstacles put in front of them, or they may be forced out of the organization. This is natural, because for most organizations this is more of a political question than a technical one.
Smaller organizations that are very successful (so they meet my two criteria) and can be highly selective, or are highly desirable, can have better teams. By their nature as smaller organizations, those teams can also be effective. As organizations grow, the talent spreads out toward average and the politics/processes/debt/legacy make those teams less effective.
I used to want to work at a FAANG-like company when I was just starting out thinking they were going to be full of amazing devs. But over the years, I've seen some of the worst devs go to these companies so that just destroyed that illusion. And the more you hear about the sort of work they do, it just sounds boring compared to startups.
I interviewed at Netflix a few years ago; with several of their engineers. One thing I cannot say is that they are morons.
Their interview process is top notch too, and while I was ultimately rejected, I used their format as the base template for how I started hiring at my company.
It can be both true that Netflix has God tier talent and a bunch of idiots. In fact, that's probably true of most places. I guess the ratio matters more.
You just have to accept that most staff at any corporation are simply average. There has to be an average for there to be better and worse.
If your "dumbest engineer" got a job and a hefty raise going to Netflix, it means he was very capable engineer who was playing the part of moron at this Fortune 500 company because he was reporting to a manager who was calling him and the entire team morons and he didn't feel the need to go above and beyond for that manager.
Also, highly likely that it was the manager that was the moron and not everyone around him.
It's also possible that there's very little correlation between capability, reputation and salary.
Don't we all know someone who is overpaid? There are more than a few well known cases of particular employers who select for overpaid employees...
Not well-known enough, apparently. Where should I be applying?
- The recent story of AWS using serverless for video processing comes to mind [1].
- Google is renowned for rest and vest.
- Many government jobs pay more than their private counterparts.
- Military contractors
- Most of the healthcare industry
- Lobbyists
Yes, usually managers.
They obviously have some really good engineers, but many low-tier ones as well. No idea how long they stay there, though.
I'm watching the fight now and have experienced the buffering issues. Bit embarrassing for a company that fundamentally only does a single thing, which is this. Also, yeah, 900k TC and whatnot but irl you get this. Mediocre.
But given how much they spend on engineering, how much time they had and how important this event is ... mediocre performance.
> a company that fundamentally only does a single thing, which is this
… isn’t true. From the couch, watching Suits and watching a live sports match may seem similar but they’re very different technical problems to solve.
It's more likely that you are bad at managing, growing and motivating your team.
Even if it was true, to refer to your team in this way makes you look like you are not ready for management.
Your duty is to get the most out of the team, and that mindset won't help you.
There's no reason to doubt what you say; probably people just identify with the mistreated one. Why?
I’ve worked with engineers where I had to wonder how they found their computer every morning. I can easily see how a few of those would make you bitter and angry.
All the engineers in MY company are morons.
They're just bureaucrats.
How are you involved in the hiring process?
> Our engineers are fucking morons. And this guy was the dumbest of the bunch.
Very indicative of a toxic culture you seem to have been pulled into, and likely have contributed to by this point, given your language and broad generalizations.
Describing a wide group of people you're also responsible for as "fucking morons" says more about you than them.
Why do you call your engineers morons? Is it a lack of intelligence, a lack of wisdom, a lack of experience, inability to meet deadlines, reading comprehension, or something else?
I wonder if Netflix is just hiring for different criteria (e.g. you want people who will make thoughtful decisions while they want people who have memorized all the leetcode problems).
I have questions..
I think this is a result of most software "engineering" having become a self-licking ice cream cone. Besides mere scaling, the techniques and infrastructure should be mostly squared away.
Yes, it's all complicated, but I don't think we should excuse ourselves when we objectively fail at what we do. I'm not saying that Netflix developers are bad people, but it doesn't matter how hard a job it is; it was their job, and what they did was inadequate, to say the least.
Jonathan Blow is right.