Coordinating the Superbowl's visual fidelity with Elixir

644
172
lawik
3 months ago
elixir-lang.org

laserbeam
·
3 months ago
·
[ - ]

Of course! Of course you have to do color correction on all the different cameras pointed from different angles at a sports event.

I absolutely love reading about hard problems that are invisible to most people.

abrookewood
·
3 months ago
·
[ - ]

Yes, it's one of those super-niche & super-important functions that are obvious once you know about them, but that you would never think about otherwise.

myst
·
3 months ago
·
[ - ]

Why is it super-important?

GiorgioG
·
3 months ago
·
[ - ]

Because events using many cameras without this type of setup would show jarring visual differences when switching from one camera/view to another during the broadcast.

HeatrayEnjoyer
·
3 months ago
·
[ - ]

Why wasn't it jarring in early Superbowl games?

danielvf
·
3 months ago
·
[ - ]

There was even more color correction happening then - cameras were worse and more analog! It just was not being controlled from offsite; instead, there was a dedicated room and engineer in the broadcast truck doing camera configuration + color correction.

The usual technique was to start by holding up a color card on the stage/floor, then use a vectorscope[1] and get all the dots to line up in the right place. Then a waveform monitor was used to set exposure. During the event, there would be fine tuning by eye, or as things drifted out of line.

[1] https://en.wikipedia.org/wiki/Vectorscope#/media/File:PAL_Ve...

PS: You can also see modern vectorscope / waveform monitor images in this photo from the cyanview blog. Look for the black and white X-ray looking things on the screens. https://www.cyanview.com/wp-content/uploads/2022/10/20221006...
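To make the vectorscope step above concrete, here is a minimal sketch of what the scope plots. It assumes digital BT.709 coefficients and normalized RGB values (an analog PAL scope would plot U/V instead), and the function names are made up for illustration. A neutral grey card should land on the centre of the display; any colour cast moves the dot off centre, and the angle tells you which way to correct.

```python
import math

# BT.709 luma coefficients (assumption: an analog PAL scope plots U/V instead).
KR, KG, KB = 0.2126, 0.7152, 0.0722


def chroma_point(r, g, b):
    """Map a normalized RGB colour (0..1) to (Cb, Cr), the 2D point a vectorscope plots."""
    y = KR * r + KG * g + KB * b
    cb = (b - y) / (2 * (1 - KB))
    cr = (r - y) / (2 * (1 - KR))
    return cb, cr


def cast(r, g, b):
    """Return (distance from centre, angle in degrees) for a patch that should be neutral."""
    cb, cr = chroma_point(r, g, b)
    return math.hypot(cb, cr), math.degrees(math.atan2(cr, cb)) % 360


# A correctly balanced grey card sits at the centre of the scope...
print(chroma_point(0.5, 0.5, 0.5))
# ...while a warm cast (more red, less blue) pushes the dot off centre.
print(cast(0.55, 0.5, 0.45))
```

Matching cameras then amounts to adjusting each one until the same card produces the same dots on every feed.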

davidbou
·
3 months ago
·
[ - ]

They used around 150 cameras for the last Super Bowl. Most of them were Sony studio cameras, controlled with Sony remotes to ensure perfect alignment. But now, they’ve added a lot of specialty cameras: probably 4 or 8 pylons, each equipped with 2 to 4 cameras, plus drones, handheld mirrorless cameras, mini high-speed cameras, and a few other mini-cams for PoV (Point of View) shots. Last year, they even had a mini-cam inside the cars driving from the Bellagio to the stadium, controlled remotely over cellular. An Elixir process ran on a RIO in the car to manage the camera and connect to a cloud server, while the remote panel was linked to the same server to complete the connection. All three ran Elixir code, with the cloud server acting as a simple data relay.

If you want the green of the grass on all the pylon cameras to match your main production cameras, adjustments are a must. And with outdoor stadiums, this is a constant task—lighting conditions change throughout the day, even when a cloud moves across the sky. When night falls, video engineers are working non-stop to keep everything perfectly aligned with the main cameras.

lgeorget
·
3 months ago
·
[ - ]

Fewer cameras, lower resolution, poor color rendering even with one camera anyway?

devmor
·
3 months ago
·
[ - ]

And less color fidelity on receiving displays I would imagine!

lo_zamoyski
·
3 months ago
·
[ - ]

I don't think he's saying the end this tech is serving is super important (televised professional sports), only that in order to televise professional sports and other events requiring similar camera work, it is important to do this kind of stuff.

pdntspa
·
3 months ago
·
[ - ]

If you've ever watched a poorly-produced porno where the colors change with every camera cut, you'll know why....

rekttrader
·
3 months ago
·
[ - ]

It’s the basis for the 99% Invisible podcast.... I loved the one about elevators.

johnisgood
·
3 months ago
·
[ - ]

[flagged]

dugmartin
·
3 months ago
·
[ - ]

You would be surprised. When my brother was a movie theater manager he got invites to a lot of pre-screenings of movies. I was able to go with him a few times and I'll never forget seeing my first movie print before it went through color correction and ADR/sound balancing. Without those steps (and I'm sure others I'm not aware of) the movie experience was very jarring (and somewhat funny).

sagacity
·
3 months ago
·
[ - ]

Your comment is dismissing the entire field of color correction. That is not just a thing for this project, it is a part of literally every movie and TV show you watch and has been since the inception of colour film.

PaulHoule
·
3 months ago
·
[ - ]

I got into color grading still photos last summer. In my case it is not "correction" to the truth but rather making a set of images conform to a brand image. (I had a day when I went out to a beauty spot and packed the wrong lens, I made up a story about another photographer who had a camera from an alternate timeline and developed a method to take distinctive pictures with a cheap lens)

Funny, the only kind of pictures that I don't color grade are sports photos, because I don't want to mess up the color of the jerseys, though if I was careful in how I did it, it would be OK.

I have been struggling to develop a reliable process for making red-cyan anaglyphs and one step of the process would be a color grade that moves colors away from reds and cyans that would all be in one eye or the other eye. I've got to figure out how to make my own LUT cubes to do it.

https://resolve.cafe/developers/luts/
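For the "make my own LUT cubes" part, a minimal sketch of a 3D `.cube` writer follows. It assumes the common text layout (a `LUT_3D_SIZE N` header followed by N^3 RGB triplets with red varying fastest, then green, then blue) and the default 0..1 domain. The `desaturate` grade is only a hypothetical placeholder, not a solution to the anaglyph problem; any function from RGB to RGB can be passed in its place.

```python
def cube_lines(size, transform, title="sketch"):
    """Yield the lines of a 3D .cube LUT; `transform` maps (r, g, b) in 0..1 to (r, g, b)."""
    yield f'TITLE "{title}"'
    yield f"LUT_3D_SIZE {size}"
    step = 1.0 / (size - 1)
    # Data order: red changes fastest, then green, then blue.
    for b in range(size):
        for g in range(size):
            for r in range(size):
                out = transform(r * step, g * step, b * step)
                # Clamp to the default 0..1 domain before formatting.
                yield " ".join(f"{min(max(c, 0.0), 1.0):.6f}" for c in out)


def desaturate(amount):
    """Hypothetical example grade: pull every colour toward its Rec. 709 luma."""
    def apply(r, g, b):
        y = 0.2126 * r + 0.7152 * g + 0.0722 * b
        return tuple(c + (y - c) * amount for c in (r, g, b))
    return apply


def write_cube(path, size, transform):
    """Write the LUT to `path` so a grading tool can load it."""
    with open(path, "w") as fh:
        fh.write("\n".join(cube_lines(size, transform)) + "\n")
```

Something like `write_cube("test.cube", 33, desaturate(0.3))` would produce a 33-point cube; starting from the identity transform is an easy way to confirm the file loads before trying a real grade.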

AlecSchueler
·
3 months ago
·
[ - ]

Sounds interesting, is there somewhere we can see your work?

jjulius
·
3 months ago
·
[ - ]

Are they dismissing it, or are they just ignorant (which is totally OK) and need to be shown the way? They've literally asked if it's "really important", perhaps we could answer that question?

johnisgood
·
3 months ago
·
[ - ]

In all fairness, I edited my comment. I did say "I don't think it's important", but that was indeed due to ignorance, and I also asked whether it really is important.

ziddoap
·
3 months ago
·
[ - ]

>perhaps we could answer that question?

Perhaps they could read the article prior to asking because some of those questions might be answered in the article?

johnisgood
·
3 months ago
·
[ - ]

Might be, or is?

ziddoap
·
3 months ago
·
[ - ]

I know of a great way that you could find out whether your questions are answered in the article!

mhb
·
3 months ago
·
[ - ]

Sorry, didn't get to the end of your comment. What's the answer?

h3half
·
3 months ago
·
[ - ]

>Sorry

Apology accepted. If you wrote anything after the apology you'll have to have your AI talk to my AI because I didn

johnisgood
·
3 months ago
·
[ - ]

I did not intend to dismiss color correction as a whole, my bad if it came across as such.

sagacity
·
3 months ago
·
[ - ]

Perhaps the reason you're dismissing it is because it is so ubiquitous (and well done) you have never really noticed that it was even a thing? :)

johnisgood
·
3 months ago
·
[ - ]

It could be the case, yes. My friend studied and works in optics (physics & CS) but we have not talked about it, not even sure color correction is something related to optics but could be. Perhaps the time to ask has come now. :P

johnmaguire
·
3 months ago
·
[ - ]

Usually color correction happens in post-production, whereas optics are developed pre-production. Definitely, poorly designed lenses will have color casting and fringing issues, but color correction is largely about balancing colors across various lighting conditions / sources. (Think Daylight vs. Warm bulbs. Now think about how many lights are in a football stadium.)

johnisgood
·
3 months ago
·
[ - ]

> Think Daylight vs. Warm bulbs. Now think about how many lights are in a football stadium.

This example helped, I think. Thanks!

PaulHoule
·
3 months ago
·
[ - ]

It's about more than color correction. The software they have lets people in the control room set all the parameters on the cameras, so instead of having a camera operator do it behind the camera they do it from the control room, which might even be on another continent.

andruby
·
3 months ago
·
[ - ]

With all the switching between camera angles during a sports broadcast, the difference in white balance, brightness and color grading would be really distracting and annoying.

mikedelfino
·
3 months ago
·
[ - ]

Perhaps it’s so important that you take it for granted, even though it took a great deal of effort from others to make sure you don’t notice the problem in the first place.

YesBox
·
3 months ago
·
[ - ]

This lovely color correction article was posted on HN years ago: https://prolost.com/blog/2010/2/15/memory-colors.html

dagi3d
·
3 months ago
·
[ - ]

If it wasn't that important no one would buy it. Doesn't matter how good your sales people are, if the product doesn't solve a real problem, it's very unlikely you will sell it in a sustainable way.

johnisgood
·
3 months ago
·
[ - ]

> if the product doesn't solve a real problem, it's very unlikely you will sell it in a sustainable way.

I do not believe this. If you look around, there are many non-issues being sold as real problems[1], and people buy it. People buy all sorts of crap, that is just consumerism in effect. If you did sales, you probably know this. Same thing with "bullshit jobs". Perhaps "sustainable" is the keyword here, but I am not so sure about that either.

[1] Snake oil comes to mind. Pretty flourishing business.

guiriduro
·
3 months ago
·
[ - ]

On the whole though, B2B sales (as this would be) are generally much more rational than consumer emotive/fantasy-driven marketing and sales. It's not like perfume sales. There are usually several people who hold each other to account and need rational justification for things, products need to demonstrate their value and meet several metrics that are reasonably objective. Not that emotion/status/gut holds no part, but that it biases the decisions to a much lower degree in B2B.

iamacyborg
·
3 months ago
·
[ - ]

AG1 is a spectacularly good example of a marketing led product that doesn’t solve any real problems.

goatlover
·
3 months ago
·
[ - ]

(Would) Solve the problem of paying labor.

chamomeal
·
3 months ago
·
[ - ]

This is sorta beside the point about color-grading, but I don't entirely agree about a product needing to solve a real problem.

I worked at a startup that had decent tech, but a shit product. Wasn't focused enough to really solve clients' issues. Maybe alleviated some issues, but also introduced more. It was disliked by the people who actually had to use it. But our sales guy was really good at convincing those people's bosses that it would make the company more money.

It was a total top-down sales approach. Throw a bunch of buzzwords at the founder/CFO/boss, they force it on the people actually doing the work. I hated it, and it worked so well that fixing the product was never a priority. It was always new "features" to slap on more buzzwords to the sales pitch. I really think it could've been a good product, too!

davidbou
·
3 months ago
·
[ - ]

We're still a rather small team of mainly tech people, we don't have a single sales person in the traditional way (not yet). Ghislain was able to develop an architecture that we could count on as being reliable while being able to quickly experiment in all directions and build on top of what was started. We were never really afraid of major failures as the system has been proven to be robust after the first 2 years (everything was started from scratch, including hardware).

As we were able to respond very quickly to customer demands for anything special that they needed, they ended up being our main sales channel by recommending the solution further. And nearly 10 years later, we're still pretty much on the same model, trying to keep up with the developments, delivering products and supporting our customers. The website is outdated and we've been trying for years to make progress there; eventually we'll succeed at that.

dr_kiszonka
·
3 months ago
·
[ - ]

Congrats on an incredibly impressive and technically complex product.

Operating such high visibility events like the Olympics sounds pretty nerve-wracking. How much of an issue is security for you? Do you experience any attacks?

davidbou
·
3 months ago
·
[ - ]

Security has been a hot topic for the past few years, but it's getting even more attention now. Fortunately, it’s mostly a concern for production facilities, and the most effective solution is often complete isolation—most production networks don’t have internet access at all.

With the rise of remote production (where the control room is located at headquarters while cameras and microphones are on-site at stadiums), broadcasters are implementing VPNs, private fiber connections, and other methods to stay largely separate from the public internet.

In our case, the only part that uses the public internet is the relay server, which is necessary when working over cellular networks. Security is one of the main reasons we haven’t expanded this service into a full cloud portal yet—it’s much easier to secure a lightweight data relay with no database, running on a single port, than to lock down a larger, more complex system.

ghislainle
·
3 months ago
·
[ - ]

I want to add that the relay server is never handling any customer secrets (so a low value target), and we have techniques in place to reduce the probability of DoS (increase the cost to the attacker).

So even if someone were able to break into the server through the small attack surface, they would not be able to change any setting on any of our customers' devices, or even read any status. Of course, if someone can break into our server, a DoS is inevitable, but so far this has never happened.
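The thread does not say how this property is achieved, so the following is only an illustration of one common approach, not Cyanview's implementation: authenticate each command end to end with a key that the panel and camera share and the relay never sees. All names here are hypothetical. This sketch covers integrity only; keeping status unreadable to the relay would additionally need encryption, and a real design would also need replay protection.

```python
import hashlib
import hmac
import json


def sign(key: bytes, command: dict) -> bytes:
    """Panel side: serialize a command and prepend an HMAC-SHA256 tag."""
    body = json.dumps(command, sort_keys=True).encode()
    return hmac.new(key, body, hashlib.sha256).digest() + body


def relay(frame: bytes) -> bytes:
    """Relay side: forwards opaque bytes and holds no key."""
    return frame


def verify(key: bytes, frame: bytes):
    """Camera side: return the command if the tag checks out, else None."""
    tag, body = frame[:32], frame[32:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    return json.loads(body)
```

With this shape, someone who controls the relay can drop or delay frames (the DoS case mentioned above) but cannot forge a setting change that the camera would accept.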

laserbeam
·
3 months ago
·
[ - ]

The article doesn't go into detail about how they solve that. But that's the key problem they highlight as being solved. It's a product which manages multiple cameras for events, and color correction is one of those "obvious in hindsight" problems to be solved.

johnisgood
·
3 months ago
·
[ - ]

Managing multiple cameras is definitely something I would consider important, but keep in mind I am not knowledgeable at all about the entertainment industry.

anttiai
·
3 months ago
·
[ - ]

I interpreted that Cyanview controls color settings in cameras, but video doesn’t run through their product. I wonder if an AI model could efficiently balance colors after the video mixer, especially if the incoming feed was in 10-bit color depth and the outgoing feed 8-bit.

dekhn
·
3 months ago
·
[ - ]

There is also a hardware dongle on the camera to give the operations team remote control to settings that aren't internet accessible.

jjulius
·
3 months ago
·
[ - ]

Upvoting this because I don't think it's fair to downvote someone for trying to understand why something might be more important than they otherwise would've thought.

sschueller
·
3 months ago
·
[ - ]

Someone tracked every single camera shot during the halftime show: https://www.youtube.com/watch?v=YXNWfFtgbNI

jcalx
·
3 months ago
·
[ - ]

You can also see what it looks like from the control room on Hamish Hamilton's YouTube channel, with the AD calling out shots and all: https://m.youtube.com/watch?v=gfjWjkTP4p8. (Hamish Hamilton has directed every Super Bowl halftime show since 2010.)

oplav
·
3 months ago
·
[ - ]

John DeMarsico directs the SNY broadcasts for the NY Mets and sometimes posts behind the scenes for how all the cameras come together into a production. I think they are pretty interesting to watch.

https://x.com/SNYtv/status/1832250958258036871

ZeWaka
·
3 months ago
·
[ - ]

> Without any marketing, it earned a reputation among seasoned professionals and became a staple at the world’s top live events.

Sounds like the entertainment industry. Everyone truly knows everyone, especially when you're working on the same show with the same crew year after year.

It's definitely a family of sorts.

pharrington
·
3 months ago
·
[ - ]

"Without any marketing" is also an obvious lie, since Cyanview has a storefront website and marketing posts on Linkedin.

davidbou
·
3 months ago
·
[ - ]

I don't think it was meant to be taken literally (we didn’t write the article). We’d actually love to do more marketing, we barely have time for it though. We don’t have a storefront website—just a basic site with outdated product info but we dedicate all our efforts to the support section. We post on LinkedIn a couple times a year to reassure everyone that we're still alive, but that’s hardly a real marketing strategy. Currently our sales come from word-of-mouth and industry connections, not much from marketing. Hopefully, we’ll find the time to step it up in the future!

pharrington
·
3 months ago
·
[ - ]

Yeah, reflecting on it, the article was obviously just being hyperbolic - I think I'm just on a hair trigger for anything bordering on falsehoods because of the current state of my country (USA). Also "storefront" was a poor word choice - I was originally going to say "professional," but decided against it for some reason.

Regardless, just keep making quality software that sells itself!

lawik
·
3 months ago
·
[ - ]

I admit to hyperbole.

The interesting part is that the main marketing and sales is by word-of-mouth and quality of product. All the hardware is not even on the website, which was very confusing to my understanding when writing. It makes sense under the resource constraints.

caseyohara
·
3 months ago
·
[ - ]

Also this article seems more like an ad for Cyanview than for Elixir. Smells like content marketing to me.

mtndew4brkfst
·
3 months ago
·
[ - ]

Most of the Elixir in Production posts on the official blog (including this one) come off this way, IMO:

https://elixir-lang.org/blog/categories.html#Elixir%20in%20P...

It's pretty much just appeal to authority. "These people are successful and they used Elixir, why don't you?"

lawik
·
3 months ago
·
[ - ]

Really?

I can spill all the juicy details as the main author and instigator.

Cyanview reached out to me to help find a dev a while back. Hearing about their customers I knew it would be a decently big splash for Elixir. I was surprised that they were unknown and had this success with big household name clients.

I like them. I like their whole deal. Small team, punching above their weight. Hardware, software, FPGAs and live broadcasts. The story has so much to it. David and team have been great sports in sharing their story.

Fundamentally I care more about Elixir adoption though, I reached out to the Elixir team and offered to interview them and write something up.

A case study about successful Elixir production deployments is definitely content marketing. But for Elixir. It is a very common question when mentioning a less common language. "Who uses this?" I thought it was a very interesting case. Glad to have it documented. The style of a case study won't suit everyone.

I suppose "without any marketing, before _this_" would have been funny.

Capricorn2481
·
3 months ago
·
[ - ]

Thanks for your work, I wish other language communities did this.

ram_rar
·
3 months ago
·
[ - ]

Great to see Elixir gaining traction in mission-critical broadcast systems! I wonder, how much of Cyanview's reliability comes from Elixir specifically versus just good implementation of MQTT? And were there any specific Elixir features that were essential and couldn't be replicated in other languages?

ghislainle
·
3 months ago
·
[ - ]

Main developer here.

We use MQTT a lot, it is really a central piece of our architecture, but Elixir brings a lot of benefits regarding the handling of many processes which are often loosely coupled. The BEAM and OTP offer a sane approach to concurrency and Elixir is a nice language on top. Here are what I find to be the most important benefits:

- Good process isolation: even the heap is per process. This allows us to have robust and mature code running alongside more experimental features without the fear of everything going down. And you still have easy communication between processes.

- The supervision tree allows easy process management. I also created a special supervisor with different restart strategies. The language allows this, and it then integrates like any other supervisor. With network connections being broken and later reconnected, the resilience of our system is tested regularly, like a physical chaos monkey.

- Immutability as implemented by the BEAM greatly simplifies writing concurrent code. Inside a process, you don't need to worry about the data changing under you; no other process can change your state. So no more mutexes/critical sections (or very little need for them). You can still have deadlocks though, so it is not a silver bullet.
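For readers who have not used OTP, the supervision idea in the list above can be loosely approximated in Python. This is an analogy under heavy assumptions, not how the BEAM works: a real supervisor monitors isolated processes rather than catching exceptions in the same thread, and the "special supervisor" mentioned above is not described in the thread. The limits of 3 restarts within 5 seconds mirror the defaults of Elixir's `Supervisor`; the class and helper names are hypothetical.

```python
import time


class Supervisor:
    """Restart a failing child up to `max_restarts` times within `max_seconds`."""

    def __init__(self, child, max_restarts=3, max_seconds=5.0):
        self.child = child
        self.max_restarts = max_restarts
        self.max_seconds = max_seconds
        self.restarts = []  # timestamps of recent restarts

    def run(self):
        while True:
            try:
                return self.child()
            except Exception as exc:
                now = time.monotonic()
                # Only restarts inside the sliding window count toward the limit.
                self.restarts = [t for t in self.restarts if now - t < self.max_seconds]
                self.restarts.append(now)
                if len(self.restarts) > self.max_restarts:
                    # Give up and let the failure propagate to whoever supervises us.
                    raise RuntimeError("restart intensity exceeded") from exc


def flaky_connection(fail_times):
    """Hypothetical child: a connection that drops `fail_times` times before succeeding."""
    attempts = {"n": 0}

    def connect():
        attempts["n"] += 1
        if attempts["n"] <= fail_times:
            raise ConnectionError("link dropped")
        return f"connected after {attempts['n']} attempts"

    return connect


print(Supervisor(flaky_connection(2)).run())
```

The point of the pattern is that the child contains no retry logic of its own: it simply fails, and the policy for what happens next lives in one place.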

somethingsome
·
3 months ago
·
[ - ]

Hey it's nice to see a very successful business in Belgium in this space!

I work at the university and we build acquisition systems with exotic cameras and screens, do you think we could meet one time to discuss possible (commercial and research) projects ?

dist-epoch
·
3 months ago
·
[ - ]

Have you looked at stuff like NATS/Jetstream instead of raw MQTT?

davidbou
·
3 months ago
·
[ - ]

MQTT is used for messaging between processes on the embedded device itself, which can be the remote control panel, or a camera node. The panel itself is driven by a microcontroller which gets all the parameters to display and requests changes through MQTT. If the camera is controlled locally, like on a LAN, then another process picks up the action and handles the communication with the camera. If the camera is remote (over cellular for example), we don't rely on the bridging functionality that some MQTT brokers provide but rather use Elixir sockets to send the data over. Typically parameter changes would be sent towards the camera and new status would be populated back to everyone. In most cases it's been a single control room, sometimes 2 at different locations, and one camera site, so the need for a widely distributed architecture hasn't been felt yet.
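The "parameter changes go toward the camera, status comes back to everyone" flow described above can be sketched with a tiny in-memory publish/subscribe broker. The topic names are invented for illustration and are not Cyanview's; the matching follows the standard MQTT wildcard rules (`+` for one level, `#` for the rest) but skips details such as `$`-prefixed topics and filter validation.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """MQTT-style matching: '+' matches one level, '#' matches the remainder."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


class Broker:
    """Minimal in-memory stand-in for an MQTT broker."""

    def __init__(self):
        self.subscriptions = []  # (filter, callback) pairs

    def subscribe(self, topic_filter, callback):
        self.subscriptions.append((topic_filter, callback))

    def publish(self, topic, payload):
        for topic_filter, callback in self.subscriptions:
            if topic_matches(topic_filter, topic):
                callback(topic, payload)


broker = Broker()
# Camera-side process: applies a requested change, then reports the new status.
broker.subscribe(
    "camera/+/set/#",
    lambda topic, value: broker.publish(topic.replace("/set/", "/status/"), value),
)
# Panel-side process: displays whatever status any camera reports.
broker.subscribe("camera/+/status/#", lambda topic, value: print(topic, value))
# The operator turns a knob on the panel.
broker.publish("camera/7/set/gain", 3)
```

Because the panel only reacts to status topics, a second control room subscribing to the same filter would see the same updates without the camera side changing.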

One of the next steps would be to have a real cloud portal where we could remotely access cameras, manage and shade them from the portal itself. In this context we have been advised to look at NATS. Remote production or REMI is now getting more traction and some of our clients handle 60+ games at the same time from a central location. That definitely creates new challenges as centralizing everything for control is a need but keeping distributed processes and hardware is key to keep the whole system up if one part fails.

travisgriggs
·
3 months ago
·
[ - ]

Which MQTT library are you using? Did you roll your own?

travisgriggs
·
3 months ago
·
[ - ]

I ask because we’ve taken up Elixir and we use MQTT (also with a custom RPC on top) to coordinate ag irrigation. But I’ve been very frustrated with the state of MQTT implementations on Elixir (or lack of good documentation). I’m wondering if I’ve just missed an obvious one. We currently use a fork of Tortoise, but it has some issues.

Feel free to contact me, details in profile.

ghislainle
·
3 months ago
·
[ - ]

We also use a fork of Tortoise, wrapped in some easier to integrate code (for our use case).

At first, we used an Erlang lib, emqtt, but it was left unmaintained and then removed from GitHub. We had to switch to something else. We're not completely happy, but it works for us.

joehosteny
·
3 months ago
·
[ - ]

Which version of emqtt are you referring to? We are successfully using 1.11.0 (with AWS IoT Core), which I believe is the "blessed" version for Elixir.

ghislainle
·
3 months ago
·
[ - ]

I do not remember all the details, but I think it was during the transition from MQTT 3.1 to 5.0. We only use 3.1 and the 5.0 code was not yet finished. emqtt had a hard-coded dependency on an old lib, which we wanted to upgrade. There was no hex package; it was pulled from GitHub. And then the repo on GitHub disappeared (or maybe moved?)

I will have a look at this new package, it looks promising

joehosteny
·
3 months ago
·
[ - ]

Ah, yes - we are pulling from the tag on GH. However, I believe the hard dependency on the older version of gun was fixed recently. IIRC, that was what prevented a proper package at the latest version from being on hex.

joehosteny
·
3 months ago
·
[ - ]

I was able to update to the latest tag of emqtt on GH (1.14.4), which was not previously possible. I don't believe there are any blockers now to a package on hex.pm, so hopefully this is made available soon.

jerf
·
3 months ago
·
[ - ]

This is Elixir/Erlang/BEAM's core use case, the thing it was designed to do: coordinating and routing a large number of realtime feeds, with failover and fallbacks. The original use case was phone calls, but other than the fact these video streams are much much larger per second, most of the principles carry over.

As much as I am a critic of the system, if this is your use case, this is out-of-the-box a very strong foundation for what you need to get done.

davidbou
·
3 months ago
·
[ - ]

Yes, this was one of our initial considerations when we first started, and the telecom analogy of the original Erlang development application was one of the main reasons we took this approach. Now, we only "stream" metadata, control data, and status. Even though we manage video pipelines and color correctors, the video stream itself is always handled separately.

For anyone interested in the video stream itself, here's a summary. On-site, everything is still SDI (HD-SDI, 3G-SDI, or 12G-SDI), which is a serial stream ranging from 1.5Gbps (HD) to 12Gbps (UHD) over coax or fiber, with no delay. Wireless transmission is typically managed via COFDM with ultra-low latency H.264/H.265 encoders/decoders, achieving less than 20ms glass-to-glass latency and converting from/to SDI at both ends, making it seamless.

SMPTE 2110 is gaining traction as a new standard for transmitting SDI data over IP, uncompressed, with timing comparable to SDI, except that video and audio are transmitted as separate independent streams. To work with HD, you need at least 10G network ports, and for UHD, 25G is required. Currently, only a few companies can handle this using off-the-shelf IT servers.
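A quick back-of-the-envelope check of those port sizes, assuming 10-bit 4:2:2 sampling (20 bits per pixel on average) and counting only the active picture, with no blanking, audio, or packet overhead:

```python
def active_video_gbps(width, height, fps, bits_per_pixel=20):
    """Active-picture bit rate in Gbit/s; 20 bits/pixel assumes 10-bit 4:2:2 sampling."""
    return width * height * bits_per_pixel * fps / 1e9


formats = {
    "1080p30": (1920, 1080, 30),
    "1080p60": (1920, 1080, 60),
    "2160p60": (3840, 2160, 60),
}
for name, (w, h, fps) in formats.items():
    print(f"{name}: {active_video_gbps(w, h, fps):.2f} Gbit/s")
```

This gives roughly 1.24, 2.49 and 9.95 Gbit/s. A single HD stream is already well beyond a 1G port, and UHD at 60 fps leaves essentially no headroom on 10G once packet overhead is added, which is consistent with the 10G and 25G figures above.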

Anything streamed over the public internet is compressed below 10 Mbps and comes with multiple seconds of latency. Most cameras output SDI, though some now offer direct streaming. However, SDI is still widely used at the end of the chain for integration with video mixers, replay servers, and other production equipment.

jerf
·
3 months ago
·
[ - ]

I was tempted to go into the fact that the video streams wouldn't pass through BEAM, because that would be crazy, but I cut it out.

AIUI, technically, the old phone switches worked the same way. BEAM handled all the metadata and directed the hardware that handled the phone call data itself, rather than the phone call data directly passing through BEAM. In 2025 it would be perfectly reasonable to handle the amount of data those switches dealt with in 2000 through BEAM, but even in 2025, and even with voice data, if you want to maximize your performance for modern times you'd still want actual voice data to be handled similarly to how you handle your video streams, for latency reliability reasons. By great effort and the work of tons of smart people, the latency sensitivity of speech data is somewhat less than it used to be, but one still does not want to "spend" your latency budget carelessly, and BEAM itself is only best-effort soft realtime.

zaik
·
3 months ago
·
[ - ]

> couldn't be replicated in other languages?

All programming languages can do any task. It's about how easy they make that task for you.

zwnow
·
3 months ago
·
[ - ]

Yeah, and with that I'd think it would be a pain in the ass trying to replicate BEAM behavior in different languages.

thibaut_barrere
·
3 months ago
·
[ - ]

This is true in general but only until it gets false.

For instance, Elixir supports compilation targeting GPUs (within exactly the same language, not a fork).

Most languages do not allow that (and for most it would be fairly hard to implement).

goatlover
·
3 months ago
·
[ - ]

For any finite, computable task, as long as the language has access to the hardware that can perform the task in practical time, assuming the language doesn't present any compilation or memory issues to take advantage of said hardware in practical time for the task to be worth computing.

dorian-graph
·
3 months ago
·
[ - ]

> and is there any specific Elixir features were essential that couldn't be replicated in other languages?

From the article:

> “Yes. We’ve seen what the Erlang VM can do, and it has been very well-suited to our needs. You don’t appreciate all the things Elixir offers out of the box until you have to try to implement them yourself.

innocentoldguy
·
3 months ago
·
[ - ]

I have implemented Elixir in critical financial applications, B2B growth intelligence applications, fraud detection applications, scan-and-go shopping applications, and several others.

In every case, like the engineering team in this article demonstrates, the developer experience and end results have exceeded expectations. If you haven’t used Elixir, you should give it a try.

Edit: Fixed an editing error.

roughly
·
3 months ago
·
[ - ]

Elixir and Erlang have always garnered a lot of respect and praise - I’m always curious why they’re not more widely used (I’m no exception - despite hearing great things for literal decades, I’ve never actually picked it up to try for a project).

solid_fuel
·
3 months ago
·
[ - ]

I've thought about this a lot, and I think that part of what hurts Erlang/Elixir adoption is the scale of the OTP. It brings a ton of fantastic tools, like supervision trees, process linking, ETS, application environments & config management, releases, and more. In some ways it's closer to adopting a new OS than a new programming language.

That's what I love about Elixir, but it means that selling it is more like convincing a developer who knows and uses CSV to switch to Postgres. There's a ton of advantages to storing data in a relational DB instead of flat files, but now you have to define a schema up front, deal with table and row locking, figure out that VACUUM thing, etc.

When you're just setting out to learn a new language, trying to understand a new OS on top hurts adoption.

AlchemistCamp
·
3 months ago
·
[ - ]

I think most people tend to stick with what they learn first or hop to very similar languages. Schools generally taught Java and then more recently Python and JS, all of which are relatively similar.

Unless someone who knows those three languages is curious or encounters a particular problem that motivates them to explore, they're unlikely to pick up an immutable, functional language.

innocentoldguy
·
3 months ago
·
[ - ]

I think you’re right. I only picked up Elixir about 10 years ago after getting frustrated with Python’s GIL and Java’s cumbersomeness, and feeling that object-oriented programming overcomplicates things and never lived up to its hype.

I have never looked back.

Elixir is an absolute joy to use. It simplifies multi-threaded programming, pattern-matching makes code easier to understand and maintain, and it is magnitudes faster to code in than Java. For me, Elixir’s version of functional programming provides the ease of development that OOP promised and failed to deliver.

In my opinion, Elixir is software engineering’s best kept secret.

joehosteny
·
3 months ago
·
[ - ]

We use it in our robotics startup, and I wholeheartedly agree.

As an example, we just rolled out a feature in our cloud offering that allows a user to remotely call a robot to a specified waypoint inside a facility, and show real time updates of the robot's position on its map of the world as it navigates there. We did this with just MQTT, LiveView, Phoenix PubSub, and a very small amount of JS for map controls. The cloud portion of this feature was basically built in 2-3 weeks by one person (minus some pre-existing code for displaying raw map PNGs from S3, existing MQTT ingress handling, etc.).

Of course you _can_ do things like this with other languages. However, the core language features are just so good that, for our use cases, it blows the other choices out of the water.

_rs
·
3 months ago
·
[ - ]

Would you be open to dropping an email address? I'd love to chat about your experience with Elixir for financial applications if you have any time

eggy
·
3 months ago
·
[ - ]

Would Gleam be practical for a similar application aside from the OTP/BEAM runtime? I am guessing you'd have to leverage Elixir libraries that are not present for Gleam yet, and you might have slower compile times due to static typing, but you'd catch runtime errors sooner. Would it be more of a debugging vs. fast dynamic iteration trade-off? I am looking to settle on either Gleam or Elixir. I liked Gleam's original ML syntax before, but I like static typing. Thoughts? I am replacing C with Zig, and I am brushing up on my assembly by adding ARM to my x64 skill set.

AlchemistCamp
·
3 months ago
·
[ - ]

> you'd catch runtime errors sooner

I don’t think there’s any evidence whatsoever that you would catch runtime bugs sooner with Gleam than with Elixir (or Erlang). Erlang’s record for reliability is stronger than many statically typed languages, including even Java.

There is a certain class of errors static types can prevent but there’s a much larger set of those it can’t. To make the case for a language like TS/Java/Swift/Golang or Gleam actually resulting in fewer runtime defects than Erlang or Elixir, I’d want to see some real world data.

eggy
·
3 months ago
·
[ - ]

It depends on what “sooner” means to you. Gleam catches more before the code runs; Elixir catches them when they happen but recovers gracefully. If you’re paranoid about bugs reaching users, I would think Gleam’s your pick, no? If you trust your tests and love dynamic freedom, Elixir should be fine. I don't have much experience with either language. I did more in Erlang 8 years ago, but not much. I am on the edge of choosing Gleam over Elixir. It's mainly subjective: I prefer the syntax in Gleam, although I liked the original ML-like syntax when it first came out.

__jonas
·
3 months ago
·
[ - ]

> There is a certain class of errors static types can prevent but there’s a much larger set of those it can’t

Maybe you can go into this more, but I don't really understand what that means, what is this larger set of runtime errors that can't be prevented by static typing?

I use a bit of Elixir, and I'd say most of the errors I'm facing at runtime are things like "(FunctionClauseError) no function clause matching", which is not only avoidable in Gleam, but actually impossible to write without dipping into FFI.
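For readers who have not run into it, the error mentioned above can be reproduced with a small sketch. The `Shape` module and its shapes are made up for illustration; the only point is that a call matching none of the function clauses fails when it runs:

```elixir
defmodule Shape do
  # Two clauses only; any other input matches neither of them.
  def area({:circle, r}), do: 3.14159 * r * r
  def area({:square, side}), do: side * side
end

# A shape that matches one of the clauses works as expected.
square_area = Shape.area({:square, 2})

# A shape arriving at runtime (built here from a list) that no clause handles.
unknown = List.to_tuple([:triangle, 3, 4])

result =
  try do
    Shape.area(unknown)
  rescue
    # The failed match surfaces as a FunctionClauseError exception.
    e in FunctionClauseError -> {:error, e}
  end

IO.inspect(square_area)
IO.inspect(result)
```

In a language whose compiler checks pattern matches for exhaustiveness, the missing case would typically be reported before the code ever runs.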

I'm excited for more static typing to come into Elixir, as it stands I'm only really confident about my Elixir code when it has good test coverage, and even then I feel uneasy when refactoring. Still a fun language to use though.

AlchemistCamp
·
3 months ago
·
[ - ]

Depending on the language and the static type system, they typically can't prevent errors related to:

- Logic errors

- Null or Undefined values (prevented in many newer languages)

- Out-of-bounds errors

- Concurrency-related issues

- Arithmetic errors (undefined operations, integer overflow, etc)

- Resource management errors

- I/O errors

- External system failures

- Unhandled exceptions (e.g., RuntimeException in Java)

If you use a language like Rust, you can get help from the type system on several of these points, but ultimately there's a limit to what type systems can do before becoming too complex.

ghislainle
·
3 months ago
·
[ - ]

One criticism I have of Elixir is the lack of typing (they are working on it now, but I have yet to use it). So yes, I think Gleam would be nice. But when we started, it was not even at version 0.1 (and I had not heard of it).

I suppose we can have a mixed-language project, with Erlang, Elixir and Gleam. Not sure about the practicality of it though.

eggy
·
3 months ago
·
[ - ]

Amazing work, and certainly for such a tentacled project good enough is good enough. I only brought up Gleam vs. Elixir because I am going to pick one to learn this year. I've played with LFE too, and as I wrote earlier, I played with Erlang for a bit.

widdershins
·
3 months ago
·
[ - ]

Gleam has a subset of OTP functionality already [1]. It also compiles extremely quickly. I haven't made any huge projects yet, but I've used some fairly chunky libraries and everything compiles super quick.

[1] https://github.com/gleam-lang/otp

nesarkvechnep
·
3 months ago
·
[ - ]

It’s subpar at the moment.

JSR_FDED
·
3 months ago
·
[ - ]

It’s always surprised me how the world of digital video is a cousin of IT yet is impenetrable to people outside the video industry. How they refer to resolutions, colors, networking, storage is (almost deliberately?) different.

davidbou
·
3 months ago
·
[ - ]

This gives an idea of the parameters we cover for roughly 200 different models of broadcast cameras we might have so far. These are only to tweak the image quality which is the job of the video engineer (vision engineer in UK). We usually don't cover all the other functions a camera has, which could be more intended for the camera operator himself. The difficulty is to bring some consistency with so many different cameras and protocols.

https://pastebin.com/cgeG2r0k

noisy_boy
·
3 months ago
·
[ - ]

Do you "normalize" the parameters to some intermediate config so that everything behind that just needs to work with that uniform intermediate config? What about settings that are unique to a given device?

davidbou
·
3 months ago
·
[ - ]

That was the idea—we started by normalizing all the standard parameters found in most cameras. The challenge came when we had to incorporate brand-specific parameters, many of which are only used by a single manufacturer. Operators also weren’t keen on having values changed from what the camera itself provided, as some settings serve as familiar reference points. For example, they know the right detail enhancement values to use for football or studio work. So, we kept normalization for the key functions where it made sense, but for other parameters, we now try to stay as close as possible to the camera’s native values.
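A rough sketch of that split between normalized and native values, purely as an illustration: the module name, camera models, parameter names and ranges below are all hypothetical and not taken from the actual product. Parameters with a known native range are mapped to a common 0.0 to 1.0 scale, and everything else is passed through untouched:

```elixir
defmodule ParamNormalizer do
  # Hypothetical native ranges for parameters that most cameras share.
  # Real ranges would depend on each manufacturer's protocol.
  @native_ranges %{
    {:model_a, :gain} => {0, 42},
    {:model_b, :gain} => {-6, 30}
  }

  # Returns {:normalized, fraction} for parameters with a known range and
  # {:native, value} unchanged for everything else.
  def to_panel(model, param, value) do
    case Map.fetch(@native_ranges, {model, param}) do
      {:ok, {min, max}} -> {:normalized, (value - min) / (max - min)}
      :error -> {:native, value}
    end
  end
end

# The same panel position on two cameras with different native scales.
a = ParamNormalizer.to_panel(:model_a, :gain, 21)
b = ParamNormalizer.to_panel(:model_b, :gain, 12)

# A brand-specific parameter keeps the value the operator already knows.
c = ParamNormalizer.to_panel(:model_a, :detail_level, 7)

IO.inspect({a, b, c})
```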

As for the topics on MQTT, they function as a kind of universal API—at least internally. Some partners and customers are already using them to automate certain functions. However, we haven’t officially released anything yet, as we wouldn’t be able to guarantee stability or prevent changes at this stage.

noisy_boy
·
3 months ago
·
[ - ]

I have noticed that you and your team's answers are detailed and insightful - much appreciated.

walrus01
·
3 months ago
·
[ - ]

People who only ever work with 'consumer' video equipment need extra training and a back-to-basics set of reading material to understand things like the difference between a 420 and 422 color space, or why serious cinema cameras record video ungraded, or what the color grading process in a post-production workflow looks like (and the different aesthetic choices of grading that might be possible). That's before even getting into things like raw yuv/y4m uncompressed video, or very-high-bitrate barely compressed video, generating proxy footage to work with in an editor because the raw is too much of a firehose of data to handle even on a serious workstation....

I would say that unless you have a professional reason, there's very little benefit to the average end-user to do a deep dive into it. If your intention is to spend $7000 on a RED camera and then $13,000 on lenses, gimbal, cage, follow focus, matte box, memory cards etc to make a small and cost effective single camera production package, then by all means, dig into it.

keane
·
3 months ago
·
[ - ]

4:2:0 vs 4:2:2 for anyone curious: https://youtu.be/7JYZDnenaGc?feature=shared&t=101 and https://www.red.com/red-101/video-chroma-subsampling

dist-epoch
·
3 months ago
·
[ - ]

Grading is abused so much these days, it's like a curse. You have a pristine video chain, only to turn it all into yellow/blue at the end.

davidbou
·
3 months ago
·
[ - ]

There's a notable difference between shading and grading. Shading is for the TV industry where you adjust all cameras to match perfectly the exposure, tone curve and colors. So when switching between camera angles you don't notice any difference in skin tone or detail, and the green of the grass and blue of the sky are all the same. Also a very important point is to get the color of the sponsor logos right, that would be where to start sometimes... There's less creativity here, you have mainly to follow the standards like ITU-R BT.709 or for HDR HLG and ITU-R BT.2020.

Grading is the creative process of adding a look to your production, which is usually handled in post production but there are now ways to do it live, although by using similar tools as the post production software. And they still re-do it in post production. This is used live for concerts and fashion shows.

There is a significant distinction between shading and grading.

Shading is essential in the TV industry, where the goal is to ensure all cameras are perfectly matched in exposure, tone curve, and colors. This ensures seamless transitions between camera angles, maintaining consistency in skin tones, fine details, and the color of grass and sky. A crucial aspect of shading is accurately reproducing sponsor logos' colors, which can sometimes be the starting point as that's where the money comes from. Creativity plays a lesser role here, as the focus is on following industry standards such as ITU-R BT.709 for SDR or ITU-R BT.2020 and HLG for HDR.

Grading, on the other hand, is a creative process meant to give a distinctive look to a production. Traditionally done in post-production, it can also now be applied in real time using tools similar to those found in post-production software. Despite this, it is often still refined further in post. Live grading is commonly used for events such as concerts and fashion shows, where you want to look different from TV productions.

aalam
·
3 months ago
·
[ - ]

TIL about shading, and am surprised how rarely I've seen this term in grading tutorials. While different, I feel like shading is something that should be learnt before grading.

PS You might have pasted two different answer drafts above. Paras 1,4 and 2,5 deliver similar information

markb139
·
3 months ago
·
[ - ]

30 odd years ago, part of my role was to colour balance cameras in a studio environment. We didn’t need computers - but at most there were only 5 cameras :)

frankfrank13
·
3 months ago
·
[ - ]

Really cool piece, this jumped out to me:

> The devices in a given location communicate and coordinate on the network over a custom MQTT protocol. Over a hundred cameras without issue on a single Remote Control Panel (RCP), implemented on top of Elixir’s network stack.

Makes sense! MQTT is, if I understand right, built on TCP. Idk if I would have found the same solution, but it's seemingly a good one.

notepad0x90
·
3 months ago
·
[ - ]

What is being used in similar broadcast setups outside of this Superbowl?

davidbou
·
3 months ago
·
[ - ]

Major events use it for all kinds of specialty cameras, as they already have the technology for the main studio cameras. So we had to develop solutions for everything that was not working. And major productions have budgets for all kinds of new toys: mini-cams, drones, cable cams, now the cinematic look from small mirrorless cameras, slow motion, etc. That opened up a whole lot of possibilities to be creative, but you have to be as reliable as the main cameras and aim for the best image quality.

Now the same products are used for very small productions that don't have the budget for any studio camera (look typically at 50k+ for a camera without lens). In that case we try to provide a similar user experience and functions but with much more affordable cameras.

Finally, more and more live productions are now handled using cine-style cameras which don't have the standard broadcast remote panels, and that's another area we cover, by combining camera control with control of many external boxes like external motors to drive manual lenses or 3D LUT video processors. Applications include fashion shows, concerts, theater, churches, studio shows, even corporate.

In the end Elixir is used for a lot of small processes which handle very low level control protocols. And then on top of that add a high-level of communication between devices either on local networks or over the cloud.

mcintyre1994
·
3 months ago
·
[ - ]

> Now the same products are used for very small productions that don't have the budget for any studio camera

Just out of curiosity, what would be examples of very small productions here? Would an independent YouTube channel with great production quality be using this?

davidbou
·
3 months ago
·
[ - ]

Typically 4-camera setups where a single remote can control all of the cameras. For a classical concert, they would use 2 PTZ robotic cameras and 2 mini-cams on some artists and instruments. There is no camera operator at the camera side (for cost reasons) so a single operator has to do it all.

One important point: if you are not live, then there's usually the possibility to adjust everything manually on the camera and then finish in post production, so our remotes are nearly never used outside the constraints of live productions.

In the opposite direction, I heard that they had around 250 cameras on Love Island, but you can pretty much control everything from one or 2 remotes as there isn't a need for a lot of changes at a single time. The action only happens in front of a few of them. That said, we still have 250 processes running and controlling these cameras continuously.
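To make the "one process per camera" idea concrete, here is a minimal sketch using only standard OTP building blocks. It is an assumption about the general shape, not the actual implementation: `CameraWorker`, `CameraRegistry`, `CameraSupervisor` and the `:iris` setting are invented names, and a real worker would speak a camera protocol instead of just holding a map.

```elixir
defmodule CameraWorker do
  use GenServer

  # Each worker registers under its camera id so it can be addressed by id.
  def start_link(camera_id) do
    GenServer.start_link(__MODULE__, camera_id, name: via(camera_id))
  end

  def set(camera_id, param, value), do: GenServer.call(via(camera_id), {:set, param, value})
  def settings(camera_id), do: GenServer.call(via(camera_id), :settings)

  defp via(camera_id), do: {:via, Registry, {CameraRegistry, camera_id}}

  @impl true
  def init(camera_id), do: {:ok, %{id: camera_id, settings: %{}}}

  @impl true
  def handle_call({:set, param, value}, _from, state) do
    # A real worker would send the change to the camera here.
    {:reply, :ok, put_in(state.settings[param], value)}
  end

  def handle_call(:settings, _from, state), do: {:reply, state.settings, state}
end

{:ok, _} = Registry.start_link(keys: :unique, name: CameraRegistry)
{:ok, _} = DynamicSupervisor.start_link(strategy: :one_for_one, name: CameraSupervisor)

# One supervised process per camera, added at runtime.
for id <- 1..250 do
  {:ok, _pid} = DynamicSupervisor.start_child(CameraSupervisor, {CameraWorker, id})
end

:ok = CameraWorker.set(17, :iris, 4.0)
counts = DynamicSupervisor.count_children(CameraSupervisor)
camera_17 = CameraWorker.settings(17)

IO.inspect(counts)
IO.inspect(camera_17)
```

With a `:one_for_one` strategy, a crash in one camera's process restarts only that worker and leaves the others running.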

lawik
·
3 months ago
·
[ - ]

The extreme upper range of YouTube channels sometimes use a RED camera. I've not seen a lot of ARRI for YouTuber behind-the-scenes. Usually they go with high-end prosumer full-frame mirrorless Sony, Canon or equivalent. Those are probably below what Cyanview's stuff is intended for or just on the edge of what gets used.

I suppose the FX30, FX3 and FX6 are in Sony's cinema line and may have all the color stuff that these systems want to tweak but I'm not sure. These cameras do get a fair bit of play on YouTube.

imjonse
·
3 months ago
·
[ - ]

According to the article this software is used for all major sporting events.

DerekL
·
3 months ago
·
[ - ]

The title is misspelled, should be “Super Bowl”, two words.

xlii
·
3 months ago
·
[ - ]

I won’t let a friend start development in Elixir.

Let me get it out: I love BEAM. OTP is awesome and one of the best systems in its kind. I was completely enamored with Elixir years ago as a modern Erlang which excited me to the bone.

It’s no longer the case. When you get into non-trivial things there are many sharp edges and paper cuts. Some off the top of my head:

- it’s impossible to disable warnings - test runs are often highly verbose because libraries ignore them and (as discussed somewhere in forums) warnings are deemed useful so they can’t be disabled

- the only way to catch some, important ones, in CI is to use "warnings-as-errors" …

- so one cannot use deprecation flags because it’s also a warning

- when having non trivial ecosystem one cannot selectively deprecate and get errors, this has to be a human process (remember to replace…)

- when doing umbrella tests on non compiled code seed influences order of compilation

- this order of compilation influences outcome and can lead to buggy code

- dependency compilation is not parallelized - takes a lot of time and uses 10% of CPU

- compilation process can break and Elixir isn’t aware that it was interrupted - this means that it doesn’t try to compile properly app/dependency but instead tries to release crippled one

- documentation is hard to use - searching shows random versions and there isn’t even a link to „open newest”

- searching in docs often doesn’t find the keywords you can actually see

- a lot of knowledge is implicit (try checking if you can dynamically add children to a Supervisor)

- sidebar with new ExDocs break for some reason so there is no navigation

- there is no way to browse navigation outside this broken ExDocs which outputs only HTML and LSP

- LSP is an afterthought, there are a few but none works well

- Dialyzer/Dialyxir typespecs are useless most of the time

- Squiggly arrow works weirdly (e.g. `~> 0.3` might match 0.99) - my colleague recently mentioned Renovate not picking it up
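On the last point, the behaviour can be checked directly with the standard library's `Version.match?/2`. This is only a sketch of the documented semantics of the `~>` requirement operator; the version numbers are arbitrary examples:

```elixir
# `~> 0.3` allows anything from 0.3.0 up to, but not including, 1.0.0.
loose_099 = Version.match?("0.99.0", "~> 0.3")
loose_100 = Version.match?("1.0.0", "~> 0.3")

# `~> 0.3.0` pins the minor version: 0.3.0 up to, but not including, 0.4.0.
tight_035 = Version.match?("0.3.5", "~> 0.3.0")
tight_099 = Version.match?("0.99.0", "~> 0.3.0")

IO.inspect({loose_099, loose_100, tight_035, tight_099})
```

So a two-segment requirement only fixes the major version, while adding a third segment also fixes the minor version.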

I could go on and on. I’m doing plenty of research so I’m working with various languages including „niche” ones like Prolog, OCaml, Clojure, Cuelang. Recently I’ve been developing tooling in Go and many core systems are developed in Rust and Elixir, and I work on the latter often.

In principle Elixir is awesome, but has the worst developer experience of all. Sometimes it takes 4h to prepare and push a release. Tooling I’m working on can do the same in 5 minutes - I’m parallelizing processes in containers, making idempotent output artifacts and heuristic failure detection to retry on flakiness. When switching between Go and OCaml you can sense how tooling cares for me and my time. Often I bounce off forums where people’s needs are shrugged off as non-essential, treating those who came as uneducated juniors (because who in their right mind would like to have parallel dependency compilation or disable compilation time warnings).

There is nothing better than BEAM, but (for me) Elixir got much worse over the years.

coastalpuma
·
3 months ago
·
[ - ]

That's funny cause in my experience, Elixir docs are some of the best I've used, and a major strength of the ecosystem which I miss when I'm not writing Elixir. HexDocs has been integrated and standardized from the start, leading to a consistent experience across all packages. The formatting is clear and visually pleasing. All libraries you would want to use have adequate docs. It also includes sections for guides and cheatsheets, avoiding a bifurcation between those and API docs. In general, it's really easy to find the function being called because Elixir mostly avoids duck typing, and you can just directly look up the module in question. All in all, I seriously miss the docs when programming for example in Ruby (slow, ugly, rdocs which many projects don't adopt or adopt half-heartedly), or especially Javascript. In the latter case, the trend of making flashy marketing sites instead of using a standardized tool is a serious pain. I really wish the community would settle on a high quality standard. Deno's JSR docs look kind of promising in how they're positioned, but I don't think the formatting and usability are that great so far.

simoncion
·
3 months ago
·
[ - ]

I agree about the content of Elixir's docs. They seem to have inherited the habit of documenting things well, which is a fuckin great habit to have.

But. Honestly, I hate the hell out of how Elixir's docs look, and am pretty unhappy with how Erlang's docs started aping the style.

Seriously, check out the EBNF-esque description of the types for this function: <https://web.archive.org/web/20170509122932/http://erlang.org...>. Notice also how some of the documentation for all of the 'request' variants fits on a single screen. Scroll to the top of the page and notice the regular formatting and prominent, clear section headings. See how the significant data types used by the module are described in one place. Scroll to the bottom and observe the "See also" section. Notice the very clear navigation on the left-hand side that gives you obvious springboards to any part of the documentation.

Compare that to this: <https://www.erlang.org/doc/apps/inets/httpc.html#request/1>. Notice how you get a function or three of typespec on a screen. There's so much scrolling to get to the function's behavioral description. And the EBNF has been replaced with raw typespecs! If you understand how to read Erlang typespecs, it's totally possible to read the function type. But, like, if you're starting out with the language, this is WAY harder to read. Not to mention the loss of the clear headings at the top of the document and the centralized list of data types, as well as the "phone style" navigation widget on the left that obscures at least as much as it reveals.

seer
·
3 months ago
·
[ - ]

About Dialyzer errors - coming from typescript I’ve had the exact opposite experience.

Now I’m pretty handy with types, especially typescript types, managed to do some pretty complex stuff like using TS types to statically verify complex OpenAPI (swagger) apis on both client and server - basically re-implementing it all for compile time checking.

When I started using Elixir/Dialyzer types I would get into situations where I was like “this stupid error here! It doesn’t understand exactly what I’m trying to do and complains for no reason”. After delving deeper though I found that in 90% of cases it was actually a bug: I had misunderstood or forgotten something.

After that I started respecting the dialyzer more. Hopefully with the new built in types it would be even better and more user friendly.

1oooqooq
·
3 months ago
·
[ - ]

"coming from typescript" in this conversations feels like bait

vendiddy
·
3 months ago
·
[ - ]

Some of your points are valid (like the LSP) but for our company I've found Elixir has been a great developer experience if you take the good with the bad.

I'm curious what you're working on with Elixir because my experience overall has not been the same.

For example our releases take 10 minutes to cut.

Or when I ask questions on Discord I tend to get answers pretty quickly.

xlii
·
3 months ago
·
[ - ]

I can only say that it’s in the definition of the "critical system". I wouldn’t risk exposing minimal implementation detail to third party (i.e. Discord).

It’s not a specific build but the whole thing, e.g. build takes 10 minutes, but test suite takes 10 minutes as well. Test suite can fail because of a bug or (more often) because of some race condition or build issue.

As I mentioned - today I’m working with Go. It’s nowhere near BEAM but it’s not critical, I never spent more than 15 minutes debugging Go race condition.

And yes, code is at fault, but I’d expect ecosystem to help fixing it, but we have none. E.g. circular dependencies in umbrella. You can have them. You can print them. There is no warning. They result in inconsistent builds and 40s LSP check loops, during which I have zero access to documentation.

But if I use arrow for map I will get a warning and a compilation error.

josevalim
·
3 months ago
·
[ - ]

> And yes, code is at fault, but I’d expect ecosystem to help fixing it, but we have none. E.g. circular dependencies in umbrella. You can have them. You can print them. There is no warning.

Can you reproduce this in any way? Because I cannot:

    mix new parent --umbrella
    cd parent
    mix new apps/foo
    mix new apps/bar

Now change `Foo.hello` to call `Bar.hello` and vice-versa. When you run `mix compile`, you will get warnings like this:

    warning: Bar.hello/0 is undefined (module Bar is not available or is yet to be defined). Make sure the module name is correct and has been specified in full (or that an alias has been defined)

But of course, the `foo` and `bar` applications do not depend on each other, you can add explicit dependencies, such as `foo` depending on `bar` or `bar` depending on `foo`, but you always get warnings. And if you literally make it a dependency cycle, the app doesn't even boot:

    ** (Mix) Could not sort dependencies. The following dependencies form a cycle: foo, bar

Apps have to be compiled in order and one will by definition be compiled before the other, so it is really unclear how you could have those circular dependencies.

But even then, let's say that somehow you have an undeclared and undefined cycle between `foo` and `bar`. The point of umbrella projects is that each app can be compiled in isolation, so you should be able to go to `bar` and compile it in isolation without `foo`, and if it is trying to invoke `foo` somehow, it will be made visible.

So yes, I would need a way to reproduce this, because there are warnings and tooling in place to deal with those. Thanks!

Etheryte
·
3 months ago
·
[ - ]

I've built things for a number of armed forces and I have a very hard time believing that what you're working on is so secret and sensitive that it's not possible to ask a third party for input on an isolated replication case of your problem. Surely you can cut away at it until nothing but dry technicalities are left? I can understand the problem in regular engineering, but in software I don't really see it.

fastball
·
3 months ago
·
[ - ]

What is your company building with Elixir?

josevalim
·
3 months ago
·
[ - ]

Thanks for the feedback! I'd like to comment on some points:

- There seems to be some confusion in relation to warnings. There are two types of warnings, compile-time warnings and runtime warnings. Compile-time warnings are emitted during compilation time and therefore should not affect test runs, aka, when you run code. Runtime warnings can be captured during tests, using `ExUnit.CaptureIO`. Deprecating modules and functions are compile-time warnings

- Indeed you need to enable `--warnings-as-errors` to halt compilation due to warnings in CI. Our philosophy here is to emit compilation warnings instead of compilation errors whenever possible, so you can run, debug, and test your code, instead of forcing your code to be pristine while you are still working on it. The focus here is precisely to provide a better developer experience. Then if you do want them to fail upfront, as in CI, you pass the flag

- "when doing umbrella tests on non compiled code seed influences order of compilation" - I am not sure what this means, sorry. Can you expand? But generally speaking our test framework randomizes test order by default, because you should not depend on order between tests or have dependencies between test files

- "dependency compilation is not parallelized" - when a dependency is compiled, the files in a dependency are parallelized, but not the dependencies themselves, so I'd be very surprised if it only used 10% CPU before. In any case, a PR adding this feature was merged this week: https://github.com/elixir-lang/elixir/pull/14340. In my machine, compiling a project like Livebook uses 350% CPU without the flag above (showing some parallelism), and with the flag above set to 4, it is about 800% (250% + 250% + 150% + 150%). Note my machine has 8 performance cores and I don't get additional gains beyond 4 partitions

- "compilation process can break and Elixir isn’t aware that it was interrupted" - our tooling has code to deal precisely with this: https://github.com/elixir-lang/elixir/blob/c5c87a661efac6809.... If it still happens, it is a bug and must be reported, so we can fix it

- "try checking if you can dynamically add children to a Supervisor" - the top 4 results for "dynamically add children to a Supervisor in elixir" in Google and Ecosia lead to correct answers in the ElixirForum, StackOverflow, and the documentation. For completeness, I have also asked Claude, which gave a perfect answer (IMO): https://claude.ai/share/9d1e2ad4-2e43-4c32-a293-6fff32dd7001

- "there isn’t even a link to „open newest”" - this has been added to ExDoc, here is one example: https://hexdocs.pm/req/0.5.9/readme.html - but note we have always listed the versions in the sidebar

- "sidebar with new ExDocs break for some reason so there is no navigation" - please give an example, as this would be a bug and should be fixed, and I am not aware of any reports at the moment. Also note docs are available in the terminal, both inside `iex` or by doing `mix help SomeModule`
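As an illustration of the first point above, a runtime warning written to stderr can be captured with `ExUnit.CaptureIO` so it does not clutter test output and can be asserted on. This is a minimal sketch run as a plain script; `LegacyApi` and its message are hypothetical, and inside a real test file the capture would sit in a `test` block instead:

```elixir
# Start ExUnit's supporting processes without scheduling a test run,
# since this sketch runs as a plain script rather than under `mix test`.
ExUnit.start(autorun: false)

defmodule LegacyApi do
  # Hypothetical function that warns at runtime by writing to stderr.
  def fetch do
    IO.puts(:stderr, "warning: LegacyApi.fetch/0 is deprecated")
    :ok
  end
end

# capture_io/2 returns whatever the function wrote to the given device.
output = ExUnit.CaptureIO.capture_io(:stderr, fn -> LegacyApi.fetch() end)

IO.inspect(output)
```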

conradfr
·
3 months ago
·
[ - ]

Great, I hope the new "go to latest" will help Google/DDG/etc link to this instead of a random version of whatever module you look for.

xlii
·
3 months ago
·
[ - ]

Hey Jose, thanks for your comment. I actually wanted you to notice, because Elixir is still one of the best languages out there (and BEAM is the best), but developer experience IMO is deteriorating over the years.

It's not a problem for people who are completely immersed and can remember most of the guidelines/policies/idiosyncrasies. Getting new people on board or even following guidelines is hard when the work process is interrupted. Some paper cuts are brought up on Elixir Forums but I've seen them evaluated as not providing enough benefit to developers or being against design - and I found them through trying to solve a very similar problem. I like some changes in the devex direction (e.g. recent LSP initiative), yet I think it's lagging behind other languages.

I often get cut by various - often small things - but there are so many. Disappointment is amplified by good experiences with other (I'll give it to them - more popular or better funded) languages. Yet given the very static download count of Jason on Hex I think that new projects are rarely started in Elixir, while Erlang's popularity is slowly but visibly growing, so I don't think I'm alone in my perception.

I will try to respond succinctly to the followup to not blow up an already large text, so let me know if you'd like more info. https://imgur.com/a/iWbTEUf I've uploaded some weirdness I experienced in recent weeks/months.

> (ExDocs breaking) ... please give an example

In screenshot - happened with Chromium, Safari and Firefox ~6 months ago. Often with OTEL libraries. Today I can't reproduce, but I also have DNS blackholing enabled. Maybe it's fixed or maybe that was analytics breaking on me.

> WRT: checking if you can dynamically add children to a Supervisor

Our case was a bug caused by a change of Dynamic to Normal (we had an app that would be replaced in a specific context, but otherwise should be supervised as usual). After that we started observing comm channel blocks due to dead connections - it was a 77 1/2 bug: Line 77 shut down the child and Line 78 deleted the child, in 77 1/2 the Supervisor restarted the child so it couldn't be deleted anymore - and it was able to pick up some comm channels. It's not hard to fix, but one has to know that. I don't like "not recommended", as many things aren't recommended but we do them given circumstances. It's better to know the difference and be able to make the decision oneself.

> Also note docs are available in the terminal, both inside `iex` or by doing `mix help SomeModule`

`mix help Ecto` shows a missing task, so I suppose it's a typo (or something I don't have). Help in `iex` (and probably `iex -S mix`) requires dependency download, build, maybe rebuild and some flag and env manipulation (so that app doesn't start entirely) and I hope I started it before breaking compilation because otherwise it won't. Yes, it's there - I agree, but it takes energy to use.

> ...but note we have always listed the versions in the sidebar

I know, however as text moves places the worst possible example of it is like: Search for A - open - notice wrong version - check which one I should be using in code - change to version B in sidebar - not linked, got bounced to home page - search in ExDocs - nothing found (searchbox often fails to return results, see screenshot) - get back to search engine - type exact version and query - click there. When it repeats multiple times it starts to become unavoidable busywork.

> - "compilation process can break and Elixir isn't aware that it was interrupted" - our tooling has code to deal precisely with this: [https://github.com/elixir-lang/elixir/blob/c5c87a661efac6809...](https://github.com/elixir-lang/elixir/blob/c5c87a661efac6809...). If it still happens, it is a bug and must be reported, so we can fix it

I'm 99% sure that it's a result of circular dependencies and maybe one pass fails but then the other starts overwriting or something. But it could also be something in the compilation pipeline (we have extra steps). I wish there was something like "mix elixir_checks.compile_consistency" (with a flag to send a bug report). Right now filing a bug means isolating and justifying it. It takes energy, especially when the codebase is complex, big and riddled with prior decisions. I considered doing that, but I think the environment is defensive, and I'm an easy person to pull into fights but don't enjoy them.

> "dependency compilation is not parallelized"

I made a mental shortcut - i.e. it's using only one core right now, and it's taking approx. 2-3 minutes. Looks like the PR would resolve it, but not sure when we'll be able to use it.

> when doing umbrella tests on non compiled code seed influences order of compilation

In short (I don't know the cause): if I run `mix test` from the umbrella I see a different compilation order of applications and their dependencies (if I hadn't compiled those before). Those applications aren't guaranteed to be in a homogeneous dependency state (in fact when I look for dependencies in `mix.exs` I can see popular libraries spread across 3-4 different major versions). An unlucky run happens and the consensus is "don't debug, `rm -fr _build`".

> Our philosophy here is to emit compilation warnings instead of compilation errors whenever possible, so you can run, debug, and test your code, instead of forcing your code to be pristine while you are still working on it.

This is a big pain point for me. I care about some warnings, but not about others (e.g. in libraries that are planned to be dropped). I also can't enable those I'd like (deprecated - so my colleagues don't use something we want to sunset in root or dreaded circular dependencies). I solved this with a complex check chain and custom filters, but in the "competition" I get those out of the box.

I won't say that Elixir is a worse technology, it's just... I know others which are better (but not BEAM, BEAM is THE BEST)

xlii
·
3 months ago
·
[ - ]

I've just noticed that one image wasn't uploaded (i.e. fulltext search failing on doc), so adding: https://imgur.com/a/lP9z8qS

josevalim
·
3 months ago
·
[ - ]

> Yet given very static download count of Jason on Hex I think that rarely new projects are started in Elixir

So you are getting the download count of one package, one that has been added to Erlang/OTP (and Elixir itself) and is more than expected to decrease in download count, to estimate the popularity of the whole language and an ecosystem of 20k+ packages? And over what time period exactly?

> (ExDocs breaking) ... please give an example In screenshot - happened with Chromium,Safari and Firefox ~6 months ago. Often with OTEL libraries.

Got it. I was reminded that there was unfortunately one version of ExDoc with a sidebar bug and they probably were still using it. If you ask the package authors to update `ex_doc` and republish the docs, it should take 2 minutes to fix it. They might already have done it though.

Regarding ExDoc's search, people have been asking for improvements, such as searching on latest version by default and searching across packages. I am glad to say there is work happening towards this area (including soon the ability to search across all of the dependencies of your own project).

The other bug in your screenshot, about Ecto.Query, please report it if you can reproduce it. It is indeed a "wat" bug but I am not sure what could be causing it. EDIT: I was told this may happen if you are using a mocking library, here is a reproduction and a bug report: https://github.com/jjh42/mock/issues/151 - if you are using mocking libraries, please double check if they can be the root cause.

> I made a mental shortcut - i.e it's using only one core right now, and it's taking approx. 2-3 minutes. Looks like PR would resolve it, but not sure when we'll be able to use it.

It should not be using one core, even if it compiles one dependency at a time. My whole point is that you get parallelism from within the dependency/project.

> when doing umbrella tests on non compiled code seed influences order of compilation

Honestly, I have no idea how this could possibly be the case. Elixir's seed is applied per process and, in this case, it is only applied to the process running your tests, which is not the process related to compilation at all. I will drop a comment in your other thread about cycles in your umbrellas, which is also not possible.

> I can see popular libraries spread across 3-4 different major versions)

This is also something that should not be possible. I mean, you can specify different major versions, but the dependency resolution will guarantee they all agree on a single one. For example, you can't have different versions of `Jason` in the same umbrella, unless there is something really unconventional or undesired on how you are building your umbrella apps. So I would need a mechanism to reproduce it in order to pinpoint what. I would double and triple check your apps configuration, it seems there is something really unexpected going on.

xlii
·
3 months ago
·
[ - ]

Random response order :)

> I would double and triple check your apps configuration, it seems there is something really unexpected going on.

More than one thing for sure. It's a big and highly heterogeneous ecosystem (multiple umbrellas) in a distributed environment with high idempotency requirements. Without safeguards, decisions were made that today make things very complicated. It's difficult to challenge long-used patterns without a hard recommendation or concrete evidence (I looked into Perceived Complexity analysis for Elixir but couldn't find anything).

My organization now builds the story of "Elixir codebase is hard to work with and unreliable - let's switch to other tools". I don't like that story because I still remember all the fun I had and all the systems I produced that stood for years with 0 maintenance. But those were small teams and small projects, and today it's an enormous Jenga tower that's risky to breathe around.

I would go as far as to say that our codebase is somewhat of a Petri dish for all kinds of issues (especially on dev/test envs, but not only). I've seen code merged to main branch because it wasn't picked up as changed and used stale cache, multiple Elixir and OTP versions used in compilation, arch spillovers and more.

> I can see popular libraries spread across 3-4 different major versions) ... This is also something that should not be possible.

We have overrides and I don't see the umbrella test helper so I guess that umbrella-level overrides don't play nice with non-compiled in-app test runs.

> I am glad to say there is work happening towards this area (including soon the ability to search across all of the dependencies of your own project).

Looking forward to it. One thing that I often check too is the changelog across libraries, so it would be nice to always have those up-to-date.

> It should not be using one core, even if it compiles one dependency at a time. My whole point is that you get parallelism from within the dependency/project.

Project or a single-dependency compilation is fine - I felt different after recent updates to our stack and won't complain. In one umbrella I have opened I see ~250 deps packages and deps.tree shows me ~6500 lines of output, some of those are compiled multiple times - I blame the loops.

I have similar CPU - 8 performance cores and 4 efficiency ones. Usually deps.compile takes less than 100% of total CPU with spike to 150%. On partitioned tests I can feel the warmth of 1100% CPU usage (it also makes me smile, because I like big numbers). Right now I'm thinking that maybe I could spawn ~250 containers, make each compile dependency and then merge output into one and see what broke ;-)

> So you are getting the download count of one package, one that has been added to Erlang/OTP (and Elixir itself) and is more than expected to decrease in download count, to estimate the popularity of the whole language and a ecosystem of 20k+ packages?

Not ideal, but the best I could find. I also looked at Ecto which is standard, but figured out that JSON is more often used in projects than a database. Given the quality of the software itself I'd expect a steady increase on the "core" libraries. But I also hear from prior projects about them being sunsetted. 2 or 3 fewer projects in Elixir, no big deal. In my current organization a few of us are actively advocating for Elixir and BEAM. We're a minority, and the first things newcomers encounter are a difficult stack setup (Erlang and Elixir versions), long compilation times, hundreds of odd warnings and an LSP that takes 40s to pick up on changes and highlight some errors.

I'm not in a position to make any demands, it's self-inflicted 99% of the time, and it's not a bug that can be fixed upstream, it's just a subjective experience, and I wish it could be better.

josevalim
·
3 months ago
·
[ - ]

I am sorry to hear. I understand it may be something out of your control but it seems you have clearly identified some "smells" that would be worth spending some time investigating.

For example, you can't have loops in deps, and therefore ~250 deps should not print a 6500-entry-long tree. At least, for this particular problem, you can isolate your project structure, without any code, and try to reproduce it externally. And, while you can override deps, the goal of an umbrella is to share dependencies, so overriding an umbrella sibling dependency is a smell too.

You said it's an enormous Jenga tower that's risky to breathe around but it seems at the same time no one wants to invest in an air purifier. If it is of any help, you can look at the Remote case on the Elixir website (https://elixir-lang.org/blog/2025/01/21/remote-elixir-case/), they have a large codebase, around 15k files, 300 engineers (several dozens being Elixir ones), and while their codebase is healthy, you can see they had to invest in some "bottlenecks" that appeared along the way, such as CI times. And the need to invest in the code base itself will be true of any language as time passes. Best of luck!

xlii
·
3 months ago
·
[ - ]

Thank you. I completely agree and I'm not going to back out (nor switch technology at this point). I hope that one day I will be able to open-source some of the work and help others who struggle. In the meantime I'm looking forward to the deps compilation time speedup and documentation tooling (not to mention 1.19). My own experiment of parallel isolated builds showed a speedup comparable to the one you mentioned, but I dislike hacks so I'd rather wait for the peer-reviewed version :)

coastalpuma
·
3 months ago
·
[ - ]

Out of curiosity, what did you move to?

xlii
·
3 months ago
·
[ - ]

We didn't and probably won't, but we're exploring the area of writing Elixir code without writing Elixir code, i.e. heavy code generation. Some of us keep fingers crossed for the upcoming type checks, some of us are working on dev tooling (we have a looooong wish list of stuff to have, like dataflow traces, maybe some lightweight proofing, distributed version tracking etc.) but it's all about devex as the app itself is rock solid and stable.

I have my own personal project that very quickly started to suffer from similar problems, and I move between tech; given very good experiences with Go, that's something I'd be looking at. I like async, and Go has fun async and great tooling to break/fix code.

widdershins
·
3 months ago
·
[ - ]

Have you given Gleam a try?

xlii
·
3 months ago
·
[ - ]

No, I did not.

There are some things that rubbed me the wrong way about it. Like typing but still giving way to runtime errors in specific scenarios, or the lack of macros (which is what makes Rust great to work with because otherwise it’s just a sea of boilerplate eventually).

IMO the more sensible decision would be moving „down”. One still has to use some Erlang in Elixir (for example for tracing) and there are small benefits I appreciate lately - like the visual differentiation between a variable and an atom.

widdershins
·
3 months ago
·
[ - ]

Runtime errors? As far as I know Gleam will never present runtime errors unless you manually add `todo`, `panic` or `let assert` statements. [1]

I feel you on the macros, I have wanted them too, but I respect the language creator's commitment to minimalism, and I don't feel that e.g. JSON decoders are too much effort. It seems the language is headed down the route of code generators rather than macros, which seems like a reasonable tradeoff to me. [2]

[1] https://tour.gleam.run/table-of-contents/ [2] https://gleam.run/news/improved-performance-and-publishing/#...

xlii
·
3 months ago
·
[ - ]

Yes, that’s let assert on pattern matches. We briefly evaluated Gleam (though I only looked at it very briefly) and the result was that we cannot port existing code in any way and integration would be very difficult (especially given missing macros).

I do, however, agree with you on macros and code generation. My hand in Rust is macro heavy (I dislike boilerplate) but in Go I learned to appreciate codegen utilities and it might be the way to go.

The topic itself is interesting, because I’ve been doing „business logic in types” and it’s impossible to pull off without invoking so much magic that the keyboard starts to emit indigo, and that puts Gleam in an awkward place: when we are at that point, maybe it’s easier to write code generation with Prolog/Cue, but instead of adding another layer, just settle for Erlang/BEAM assembly.

But my problems are more in the domain of „what happens when, during a daylight saving shift, I receive an out-of-order message that should be included in a generated report in an ordered fashion, and one of the nodes died at that point”.

cess11
·
3 months ago
·
[ - ]

"- dependency compilation is not parallelized - takes a lot of time and uses 10% of CPU"

Dependencies are commonly in foreign programming languages, where the compiler might run things concurrently outside of BEAM control. It's not uncommon that Mix is configured by dependencies to just pull a binary for the local architecture from some repo instead. Perhaps that's what you ought to do in your CI and deploy flows, instead of compiling everything.

"- a lot of knowledge is implicit (try checking if you can dynamically add children to a Supervisor)"

What do you mean, "implicit"? It's in the docs:

https://hexdocs.pm/elixir/DynamicSupervisor.html

notpublic
·
3 months ago
·
[ - ]

My experience has been just the opposite. We have moved all our apps to Elixir now. It has one of the best developer experiences to work with. Especially for concurrent programming.

I suspect OP is using an umbrella app as a shared library or something. That is the only explanation I can think of that can cause the issue with compilation order.

About documentation, not quite sure what the OP is talking about. Elixir and Erlang have really good documentation.

Anyway, to truly appreciate Elixir (and for that matter Erlang), one needs to understand OTP and the philosophy behind it. It is not just a language but a framework to build concurrent applications.

sethammons
·
3 months ago
·
[ - ]

> Dialyzer/dialyxyr typespecs are useless most of the time

I did Elixir for a year or so. I have to agree. I had to routinely jump up several layers of call sites to understand what I could do with the arguments.

Development where teams share stuff benefits so much from static types. For teams, the best I have experienced is Go.

The BEAM is great and all other systems are just partial and poor implementations of it, as they say. But k8s does a lot. If I could get typing that actually helped me, like in Go, Elixir would jump up my recommendation list

te_chris
·
3 months ago
·
[ - ]

Worth pointing out typing is coming in rapidly, built into the compiler. It’s already doing some checks, and 1.19 is actively adding more. See: https://hexdocs.pm/elixir/main/gradual-set-theoretic-types.h...

pawelduda
·
3 months ago
·
[ - ]

Dialyzer/dialyxir are just not worth it 99.9% of time, better wait a few years for a proper static typing support, or use something else if you need it right away

Munksgaard
·
3 months ago
·
[ - ]

> - there is no way to browse navigation outside this broken ExDocs which outputs only HTML and LSP

ExDoc supports outputting EPUB as well as HTML.

xlii
·
3 months ago
·
[ - ]

I never got it working properly, errors on standard libraries, readers breaking etc. Today I mostly ripgrep the code instead - faster and 75% efficient.

simoncion
·
3 months ago
·
[ - ]

> - a lot of knowledge is implicit (try checking if you can dynamically add children to a Supervisor)

What? This is very clear: <https://www.erlang.org/doc/apps/stdlib/supervisor.html#start...>, as is this: <https://hexdocs.pm/elixir/1.18.3/Supervisor.html#start_child...>. What am I missing?

> - this order of compilation influences outcome and can lead to buggy code

Do you have an example of this? It's my understanding that the big thing about both Erlang and Elixir is that they're functional languages, so it doesn't matter what's compiled when. Is this some nightmare compile-time code manipulation thing?

> ...searching [the docs] shows random versions and there isn’t even a link to „open newest”

Is scrolling to the top of the list contained in the version selector near the top left of hexdocs.pm not good enough? If not, why not?

dqv
·
3 months ago
·
[ - ]

They are probably talking about the child spec needing a unique ID if it's the same module being started.

The docs problem is more of a Google problem. For some reason Google still only shows the 1.12 docs for a lot of searches. The sidebar issue was fixed more recently, I think in the last year. But basically the sidebar wouldn't get loaded until Mermaid finished loading, so it was updated to defer loading of Mermaid. The latest version of ExDoc shouldn't have this problem.

simoncion
·
3 months ago
·
[ - ]

> They are probably talking about the child spec needing a unique ID if it's the same module being started.

No, my experience says that's not true. I have this code in one of my projects and it works just fine:

  -behavior(gen_server).

  start(Mod, Args) ->
    supervisor:start_child(secret_project_worker_sup:getName(), [{Mod, Args}]).

I can spawn as many of those guys as I like and they all become children of the named supervisor. The named supervisor is a 'simple_one_for_one' supervisor with a 'temporary' restart policy.
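For Elixir readers, the rough counterpart of that 'simple_one_for_one' setup is a `DynamicSupervisor`. What follows is only a minimal sketch, not code from my project: the module name `Sketch.Worker` and the arguments are made up for illustration.

```elixir
defmodule Sketch.Worker do
  # Hypothetical worker; :temporary mirrors the restart policy described above.
  use GenServer, restart: :temporary

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl GenServer
  def init(arg), do: {:ok, arg}
end

# An unnamed DynamicSupervisor is enough for the sketch.
{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)

# Every call starts another child from the same module; no ids are involved.
{:ok, first} = DynamicSupervisor.start_child(sup, {Sketch.Worker, :first})
{:ok, second} = DynamicSupervisor.start_child(sup, {Sketch.Worker, :second})

IO.inspect(DynamicSupervisor.count_children(sup), label: "children")
```

As with 'simple_one_for_one', children here are addressed by pid, not by id.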

I guess the thing that might trip folks up with how the docs are worded is not noticing this further up in the document

  A supervisor can have one of the following restart strategies specified with the strategy key in the above map:
  ...
  * simple_one_for_one - A simplified one_for_one supervisor, where all child processes are dynamically added instances of the same process type, that is, running the same code.

and that 'start_child/2' accepts EITHER a 'child_spec()' OR a list of terms

  -spec start_child(SupRef, ChildSpec) -> startchild_ret()
                       when SupRef :: sup_ref(), ChildSpec :: child_spec();
                   (SupRef, ExtraArgs) -> startchild_ret() when SupRef :: sup_ref(), ExtraArgs :: [term()].

and that only the 'child_spec()' type can have an identifier, so the first bullet point in the list of three in the function documentation does not apply.

Also, I find the way the docs USED to print out function types a bit easier to understand than the new style: <https://web.archive.org/web/20170509120825/http://erlang.org...>. (You will need to either close the Archive.org nav banner or scroll up a line to see the first line of the function type information, which is pretty informative.)

dqv
·
3 months ago
·
[ - ]

I'm talking about the behavior of the one_for_one supervisor:

    defmodule Testing.Application do
      use Application

      @impl Application
      def start(_type, _args) do
        children = []
        opts = [strategy: :one_for_one, name: Testing.Supervisor]
        Supervisor.start_link(children, opts)
      end
    end

    defmodule Testing.Server do
      use GenServer
      
      def start_link(_), do: GenServer.start_link(__MODULE__, [])

      @impl GenServer
      def init(_), do: {:ok, nil}
    end

When you try to start more than one child, it fails:

    Erlang/OTP 25 [erts-13.2.2.11] [source] [64-bit] [smp:14:14] [ds:14:14:10] [async-threads:1] [jit:ns]

    Interactive Elixir (1.17.3) - press Ctrl+C to exit (type h() ENTER for help)
    iex(1)> Supervisor.start_child(Testing.Supervisor, {Testing.Server, []})
    {:ok, #PID<0.135.0>}
    iex(2)> Supervisor.start_child(Testing.Supervisor, {Testing.Server, [:x]})
    {:error, {:already_started, #PID<0.135.0>}}

But defining a child spec that sets the id:

    defmodule Testing.Server do
      use GenServer
      
      def start_link(_), do: GenServer.start_link(__MODULE__, [])

      def child_spec(arg) do
        id = Keyword.get(arg, :id)
        %{id: id, start: {__MODULE__, :start_link, [[]]}}
      end
 
      @impl GenServer
      def init(_), do: {:ok, nil}
    end

solves the problem:

    Erlang/OTP 25 [erts-13.2.2.11] [source] [64-bit] [smp:14:14] [ds:14:14:10] [async-threads:1] [jit:ns]

    Interactive Elixir (1.17.3) - press Ctrl+C to exit (type h() ENTER for help)
    iex(1)> Supervisor.start_child(Testing.Supervisor, {Testing.Server, id: 1})
    {:ok, #PID<0.135.0>}
    iex(2)> Supervisor.start_child(Testing.Supervisor, {Testing.Server, id: 1})
    {:error, {:already_started, #PID<0.135.0>}}
    iex(3)> Supervisor.start_child(Testing.Supervisor, {Testing.Server, id: 2})
    {:ok, #PID<0.136.0>}
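If you would rather not write a custom `child_spec/1`, `Supervisor.child_spec/2` can override the id at the call site instead. A minimal sketch, assuming a hypothetical `Sketch.IdServer` module and made-up ids `:server_a` and `:server_b`:

```elixir
defmodule Sketch.IdServer do
  # Hypothetical server; `use GenServer` generates a child_spec/1 whose id is the module name.
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl GenServer
  def init(arg), do: {:ok, arg}
end

{:ok, sup} = Supervisor.start_link([], strategy: :one_for_one)

# Override only the :id of the generated child spec, once per child.
spec_a = Supervisor.child_spec({Sketch.IdServer, []}, id: :server_a)
spec_b = Supervisor.child_spec({Sketch.IdServer, []}, id: :server_b)

{:ok, pid_a} = Supervisor.start_child(sup, spec_a)
{:ok, pid_b} = Supervisor.start_child(sup, spec_b)

IO.inspect({pid_a, pid_b}, label: "started")
```

Both children run the same module; only the ids differ, so the one_for_one supervisor accepts both.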

simoncion
·
3 months ago
·
[ - ]

> I'm talking about the behavior of the one_for_one supervisor:

Oh, sure, you can vary the ID in non-'simple_one_for_one' supervisors there to make that work. Apologies for inducing you to write out all that transcript and code.

But, OP's claim was:

> - a lot of knowledge is implicit (try checking if you can dynamically add children to a Supervisor)

which is just not fucking true no matter how you slice it. It's true that the relevant documentation doesn't literally say "Calling 'start_child/2' is valid for any kind of 'supervisor'. That's why it's here... to dynamically add children to a 'supervisor'.", but if one bothers to read the docs on supervisors and the function in question it's clear that that's the entire point of 'start_child/2'.

xlii
·
3 months ago
·
[ - ]

You’re coming from a position of knowledge. I don’t memorize documentation because it changes often and I use 3-5 different languages over a month. For fun and profit ;)

So I go into the docs, Elixir's because that’s the primary source, and when I search for „dynamic” and „supervisor” everything points to (no fanfare) DynamicSupervisor. And yet I look at the diff where 12 months earlier my colleague changed DynamicSupervisor to Supervisor because the connection adapter tended to crash and not start back up, and that day I was debugging zombie connections.

Erlang has it clearly explained, and ultimately, over a couple of hours of research, I found a solution and fixed it, but there was no warning that between two lines of shutdown and delete BEAM could restart a child, leading to a shadowing (and - in that case - unmanaged, as the adapter manager lives in another app that has a slot for a single process only) zombie connection handler.

This is the stuff people rarely look at because many systems don’t have high idempotency requirements, but (only a parable, that’s not my industry) would you like your system to administer a double dosage of potentially lethal drugs? None is preferable, but it’s still far from happy system land.

As for Hexdocs.pm - this is once again about operating cost. Yes, I can do this, but search is not great and I more often have broad queries and search for patterns or guidelines. I rely on „map mode” (how I can find knowledge) and not on „collect mode” (i.e. I keep knowledge), due to some mostly intrinsic traits. And thanks to wonders of modern human tracking^W^Wtechnology I can show anecdotal data of the impact.

When working with Elixir I spent (one random, recent day) 40% of my time on browsing Hexdocs and 60% in editor+console.

One random recent day on Go I spent 95% time in E+C and 5% in documentation.

My first line of code written in Go was less than 6 months ago; my first line of Elixir was written years ago. I’d have to check my resume for when exactly, but something like 8 years ago, and I toyed with it when it wasn’t yet deemed production ready.

simoncion
·
3 months ago
·
[ - ]

> ...there was no warning that between two lines of shutdown and delete BEAM could restart a child...

If you're talking about calling 'terminate_child/2' followed by 'delete_child/2', there is a very explicit warning. From <https://hexdocs.pm/elixir/1.18.3/Supervisor.html#terminate_c...>

> A non-temporary child process may later be restarted by the supervisor.

and from <https://www.erlang.org/doc/apps/stdlib/supervisor.html#termi...>

> The process, if any, is terminated and, unless it is a temporary child, the child specification is kept by the supervisor. The child process can later be restarted by the supervisor.

In the English-language docs, the warning is very clear: children that a supervisor will restart on termination may be restarted before you get around to calling 'delete_child'. Children with a restart type of 'permanent' will always cause a race between 't_c' and 'd_c'. Children with a restart type of 'transient' will cause a race if they terminate abnormally... which is a term defined by the same document that the warning comes from.
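One defensive way to write the 't_c' then 'd_c' sequence is to check what 'delete_child/2' returns and try again if the child is reported as running or restarting. This is only a sketch under my own assumptions: `Sketch.ChildRemoval`, `Sketch.Conn` and the retry count of 3 are hypothetical names and values, not anything from OTP or from the OP's code.

```elixir
defmodule Sketch.ChildRemoval do
  # Hypothetical helper, not part of Elixir or OTP.
  # Terminates the child with the given id and deletes its spec, retrying a
  # few times if the supervisor reports the child as running or restarting.
  def remove(sup, id, attempts \\ 3)

  def remove(_sup, _id, 0), do: {:error, :still_running}

  def remove(sup, id, attempts) do
    with :ok <- Supervisor.terminate_child(sup, id),
         :ok <- Supervisor.delete_child(sup, id) do
      :ok
    else
      # The child came back between the two calls: try again.
      {:error, reason} when reason in [:running, :restarting] ->
        remove(sup, id, attempts - 1)

      # Unknown id or unsupported supervisor type: pass the error through.
      {:error, _reason} = error ->
        error
    end
  end
end

defmodule Sketch.Conn do
  # Hypothetical stand-in for a connection handler.
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl GenServer
  def init(arg), do: {:ok, arg}
end

{:ok, sup} = Supervisor.start_link([{Sketch.Conn, []}], strategy: :one_for_one)

# The default child id generated by `use GenServer` is the module name.
result = Sketch.ChildRemoval.remove(sup, Sketch.Conn)
IO.inspect(result, label: "remove")
```

If the child never needs to be restarted at all, giving it a restart type of 'temporary' avoids the question, since the supervisor does not keep the spec of a terminated temporary child.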

You keep insisting that this stuff isn't documented. Are you perhaps reading a poor translation of the docs?

> You’re coming from position of knowledge.

No, I'm not. I've forgotten most of the details I ever knew about how the system works and only remember broad strokes. My experience with Erlang/OTP was scattered throughout my spare time over an eight to twelve month period ten years ago. Unlike you, I've never been paid to work with it, and you've worked with it far more recently than I.

The reason I was able to direct you to the right part of the docs was because I said to myself "Wow, it would be fuckin stupid if you couldn't dynamically add children to a supervisor. I remember the Erlang docs being really, really good, so let's see if they failed to describe if and/or how this works.", and then spent like five minutes reading the docs and another couple cross-checking with the Elixir docs.

> I don’t memorize documentation because it changes often...

Do the rules for "+", "-", and (if present) "%" change often in most major languages when they are used with built-in types? [0] It's always well worth your time to learn the rules for commonly-used, bedrock parts of a language, major libraries, and runtime systems that you intend to use. Bedrock parts don't substantially change, because if they did, they would invalidate every program ever written against those parts.

OTP provides a collection of bedrock parts, and supervisors are one such part. Memorizing docs is really stupid, but having a solid understanding of how the major things you use work is always worth the time.

Seriously, how would you function as a programmer if you didn't know how addition or string concatenation worked? If you're using OTP's supervisors, having a good understanding of how they function is just as fundamental.

> One random recent day on Go I spent 95% time in E+C and 5% in documentation.

Sure, that makes some sense. OTP is far more complex and robust than anything Go offers, so it's quite a bit quicker to come up to speed with what's documented in the Go official docs than what's in the OTP docs. Also, as someone who has written Go professionally for the last five, ten years, I warn you that you're going to get turbofucked by the things that -if they are documented at all- are documented only in blog posts or random tutorials. Go is an absolute grab bag of poorly-documented sharp edges and surprising behavior.

> I rely on „map mode” (how I can find knowledge) and not on „collect mode” (i.e. I keep knowledge)... [and] I can show anecdotal data of the impact.

The impact of you failing to familiarize yourself with the necessarily-complex tools you chose [1] to use seems pretty clear to me. You got tripped up by documented behaviour that you didn't bother to understand, which caused you to spend hours looking for solutions that would have been clear after a ten minute trip to the documentation for the 'supervisor' module. Given what you've said elsewhere in this subthread about how your project is "an enormous Jenga tower that's risky to breathe around", I suspect that a significant number of your coworkers also refused to familiarize themselves with the tools they use.

[0] No, they do not.

[1] (or were obligated)

xlii
·
3 months ago
·
[ - ]

> Do the rules for "+", "-", and (if present) "%" change often in most major languages when they are used with built-in types?

In terms of change over time, or in terms of differences in behavior between languages? For the former - yes they do change, not often, but they do. Even Elixir is right now raising warnings that `-0.0` and `+0.0` will not be equal, which also implies changes in addition and subtraction (e.g. cancelling out an event's value in an event-based system might impact the system's behavior).

If that's the latter then it deserves a blog post of its own, because some can add mixed types, some are casting in a specific way, some are copying data, some are mutating data, some are doing heuristic casting, some are crashing, some leak memory, some allow modifying pragmas, some allow implicit overloading.

It's a jungle out there. ...and it reminds me of the academic joke that '2 + 2 = 5 given extremely high values of 2' - a funny one until you spend a night trying to figure out why this happens and another two planning vengeance on the person who decided that int->float->int is a good trick to use a helpful float-taking function on an otherwise perfectly fine integer.

The worth of remembering something is, as I see it, a trade-off between cost, usefulness and available memory space. I'd rather remember that in Elixir/BEAM child shutdown and removal is message driven (and thus can cause a race condition) than whether I need to use `+` or `++` for concatenating lists.

simoncion
·
3 months ago
·
[ - ]

> If that's the latter then...

It's a good thing I asked specifically about built-in types in a particular system, and didn't ask about comparisons between operators in different languages.

> For former - yes they do change, not often, but they do. Even Elixir is right now raising warnings that `-0.0` and `+0.0` will not be equal...

Sure, that's a change to optional behavior to comparison of floating-point zeros. That doesn't change how equality testing, addition, subtraction, or -if available- modular arithmetic works.

As I said:

> It's always well worth your time to learn the rules for commonly-used, bedrock parts of a language, major libraries, and runtime systems that you intend to use. Bedrock parts don't substantially change, because if they did, they would invalidate every program ever written against those parts.

xlii
·
3 months ago
·
[ - ]

> > It's always well worth your time to learn the rules for commonly-used, bedrock parts of a language, major libraries, and runtime systems that you intend to use.

I disagree. I've been around long enough to see languages sunsetted, libraries sunsetted. Big systems are standing on shamefully old versions. If your job is to work on one language - I agree, but when working with 100s of systems that go over multiple OTP versions, multiple Elixir versions, sprinkled with JavaScript, TypeScript, Ruby 1.0, Elm, Java, "oh my dear is it Python 2 running CoffeeScript?!", then memorizing anything is pointless, because chances are that the thing you memorized is:

- not yet in this project

- no longer in this project

- that tech isn't in the project

- project is written in Malbolge, everything you know is irrelevant

- is explicitly forbidden by code owner (for more or less sensible reason)

> Bedrock parts don't substantially change, because if they did, they would invalidate every program ever written against those parts.

Been there, done that, bought a t-shirt. I dislike TypeScript for exactly that reason [0], but in Elixir the same is true if you rely on the --warnings-as-errors flag, due to (in my opinion) a broken deprecation mechanism.

Software is full of leaky abstractions. Do you know that it's not guaranteed that your system clock is monotonic? [1]

[0]: https://github.com/microsoft/TypeScript/wiki/Breaking-Change... [1]: https://github.com/rust-lang/rust/blob/e2223c94bf433fc38234d...

simoncion
·
3 months ago
·
[ - ]

> Do you know that it's not guaranteed that your system clock is monotonic?

Yes. Wall-clock time is adjustable. That's why there's a monotonic clock function on any serious OS that's running on hardware that makes such a function possible.
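In Elixir, the monotonic clock is exposed through `System.monotonic_time/1`. A minimal sketch of measuring elapsed time with it instead of wall-clock time; the `Elapsed` module and its `measure/1` function are hypothetical names for this illustration:

```elixir
defmodule Elapsed do
  # Runs `fun` and returns {result, elapsed_milliseconds}.
  # System.monotonic_time/1 never goes backwards, unlike wall-clock time
  # (System.system_time/1), which can be adjusted while the code runs.
  def measure(fun) when is_function(fun, 0) do
    start = System.monotonic_time(:millisecond)
    result = fun.()
    {result, System.monotonic_time(:millisecond) - start}
  end
end

{result, elapsed_ms} =
  Elapsed.measure(fn ->
    Process.sleep(20)
    :done
  end)

IO.inspect({result, elapsed_ms})
```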

> ...but when working with 100s of systems that go over multiple OTP versions [it's not worth understanding how anything that's bedrock works]...

Welp, let's go back to the behavior of 'supervisor' in 2007... the earliest version of that page of the docs that the Wayback Machine has: <https://web.archive.org/web/20070707071556/http://www.erlang...>

Hey, look at this description and warning in 'terminate_child/2'

> Tells the supervisor SupRef to terminate the child process corresponding to the child specification identified by Id. The process, if there is one, is terminated but the child specification is kept by the supervisor. This means that the child process may be later be restarted by the supervisor. The child process can also be restarted explicitly by calling restart_child/2. Use delete_child/2 to remove the child specification.

Hell, read the rest of that document... notice that the behavior described from nearly twenty years in the past is the same as now. (And I bet you One American Nickel that the behavior described in 1997 is also the same.)

If you don't consider that to be bedrock functionality and worth familiarizing yourself with, I don't know what to tell you.

xlii
·
3 months ago
·
[ - ]

> The impact of you failing to familiarize yourself with the necessarily-complex tools you chose [1] to use seems pretty clear to me. You got tripped up by documented behaviour that you didn't bother to understand, which caused you to spend hours looking for solutions that would have been clear after a ten minute trip to the documentation for the 'supervisor' module.

As I showed in the other post, this is incorrect (at least in Elixir documentation).

Fortunately I read that last, as I'd refrained from further conversation, but regarding my situation and memory: I didn't choose so, I was born with a specific type of memory and specific traits. It's useful and it built me a rewarding career. Often problems in software are caused by assumptions and I can't have any. Thanks to that I can work on interesting systems that have hair-pulling problems.

However, this attitude of both shaming ("you should just memorize") and the "works for me" approach is one I've seen often in Elixir's community, and it's why I don't want to have such conversations in official places. I don't feel a need to be present in an environment where I'm not welcome. And yet, peculiarly, I'm often brought in as a decision maker regarding recommending, choosing or sunsetting technologies, and given lack of parameters I do fall back and it wasn't that great.

simoncion
·
3 months ago
·
[ - ]

> However this attitude of both shaming "you should just memorize" and "works for me" approach...

Okay? I'm doing neither, so I don't see why you're bringing that up. I've consistently rebutted your claims that something wasn't explicitly documented by pointing out where it's explicitly documented. I've also called memorization of documentation a fucking stupid thing to do.

> As I showed in the other post, this is incorrect (at least in Elixir documentation).

As I've mentioned in the other post, I don't see how this is incorrect, and await your detailed walkthrough.

xlii
·
3 months ago
·
[ - ]

> Children with a restart type of 'permanent' will always cause a race between 't_c' and 'd_c'. Children with a restart type of 'transient' will cause a race if they terminate abnormally... which is a term defined by the same document that the warning comes from.

This is not written anywhere explicitly in the docs - I also agree that Erlang's documentation is much better, but I’m not saying that Erlang is missing information. I’m talking about Elixir not providing this and marking it clearly - because if I need to start reading in Erlang first then why would I layer Elixir on top of it? This is exactly the thing I’m pointing out.

Because your response is long I'll only focus on this point and (hopefully) get back later.

My expectation (implicit) would be that when function is doing 2 lines the messages would be locally ordered. Yes, maybe that’s silly, but in many other languages that’s exactly the case. If I send messages to queue I’m aware that queue might not get two of those. I need to send a transaction, fine. If I broadcast or make a signal/event same happens. But here I have synchronous function with no indication or warnings that it’s a message.

If this can’t be known in documentation, isn’t caught by compiler/analysis, but requires experience or (often) reading source code it is implicit knowledge.

Yes, I possess it too now, but I think it’s a problem.

simoncion
·
3 months ago
·
[ - ]

> This is not written anywhere explicitly in the docs.

It absolutely is. I'll use the Elixir docs as my source:

> A non-temporary child process may later be restarted by the supervisor.

And, further up in the docs when talking about the circumstances under which a supervisor will restart a child that has terminated: [0]

  Restart values (:restart)
  
  The :restart option controls what the supervisor should consider to be a
  successful termination or not. If the termination is successful, the
  supervisor won't restart the child. If the child process crashed, the
  supervisor will start a new one.
  
  The following restart values are supported in the :restart option:
  
      :permanent - the child process is always restarted.
  
      :temporary - the child process is never restarted, regardless of the
      supervision strategy: any termination (even abnormal) is considered
      successful.
  
      :transient - the child process is restarted only if it terminates
      abnormally, i.e., with an exit reason other than :normal, :shutdown, or
      {:shutdown, term}.
  
  For a more complete understanding of the exit reasons and their impact, see
  the "Exit reasons and restarts" section.

And the "Exit reasons and restarts" section says: [1]

> A supervisor restarts a child process depending on its :restart configuration. For example, when :restart is set to :transient, the supervisor does not restart the child in case it exits with reason :normal, :shutdown or {:shutdown, term}.
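As an illustration of where the `:restart` value lives, here is a minimal sketch that overrides it with `Supervisor.child_spec/2`. `TransientDemo.Handler` is a hypothetical child module defined only for this example.

```elixir
defmodule TransientDemo.Handler do
  # Hypothetical child: an Agent that just holds its config.
  use Agent

  def start_link(config), do: Agent.start_link(fn -> config end)
end

# Spec as the module defines it. `use Agent` does not set :restart,
# so the supervisor applies its default for the child.
default_spec = Supervisor.child_spec({TransientDemo.Handler, :some_config}, [])

# Same spec with the restart value overridden.
transient_spec =
  Supervisor.child_spec({TransientDemo.Handler, :some_config}, restart: :transient)

IO.inspect(Map.has_key?(default_spec, :restart), label: "default spec sets :restart?")
IO.inspect(transient_spec.restart, label: "overridden restart value")
```

The restart value belongs to each child specification, so different children under one supervisor can use different values.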

You go on to say:

> But here I have synchronous function [to affect the state of a supervisor] with no indication or warnings that it’s a message.

Before I get into that, I have two questions for you:

1) How do you affect an Erlang or Elixir process without sending it a message? The docs for Processes [2] don't indicate any other way.

2) Have you never seen or written a function that does not return until it receives the response to an async operation?
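A minimal sketch of what question 2 describes: a function that looks synchronous to the caller but is built from an asynchronous send followed by a blocking receive. The `SyncCall` module and its message shapes are hypothetical, chosen only for this illustration.

```elixir
defmodule SyncCall do
  # Spawns a process that answers every ping it receives.
  def start_echo do
    spawn(fn -> loop() end)
  end

  defp loop do
    receive do
      {:ping, from, ref} ->
        send(from, {:pong, ref})
        loop()
    end
  end

  # Sends a message (asynchronous) and then blocks until the matching
  # reply arrives or the timeout expires. To the caller this looks like
  # an ordinary synchronous function.
  def ping(pid, timeout \\ 1_000) do
    ref = make_ref()
    send(pid, {:ping, self(), ref})

    receive do
      {:pong, ^ref} -> :pong
    after
      timeout -> {:error, :timeout}
    end
  end
end

echo = SyncCall.start_echo()
reply = SyncCall.ping(echo)
IO.inspect(reply)
```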

Continuing on... from the top of the Supervisor docs, we see:

> A supervisor is a process which supervises other processes, which we refer to as child processes.

"A supervisor is a process...", straight off the bat. That's super clear and explicit, but I'll keep walking through the docs to show you how else this information is communicated to the reader.

If we read on, we see that the first argument to the 'terminate_child/2' and 'delete_child/2' functions is of type 'supervisor()', which is defined as '@type supervisor() :: pid() | name() | {atom(), node()}'. What are these? Well, check the docs for how you start a Supervisor. [3] They say three interesting things:

1) The second argument to 'start_link/2' is of type 'option()', which is defined as '{:name, name()}', and 'name()' is defined as 'atom() | {:global, term()} | {:via, module(), term()}' . Keep those types in mind.

2) "If the supervisor and all child processes are successfully spawned (if the start function of each child process returns {:ok, child}, {:ok, child, info}, or :ignore), this function returns {:ok, pid}, where pid is the PID of the supervisor. If the supervisor is given a name and a process with the specified name already exists, the function returns {:error, {:already_started, pid}}, where pid is the PID of that process."

Notice how often it talks about "spawning" the supervisor and returning a PID, and saying that that PID is the PID of the supervisor you just created, or of a named supervisor that already exists.

3) "The options can also be used to register a supervisor name. The supported values are described under the "Name registration" section in the GenServer module docs."

Let's look at the "Name registration" section. [4] I'm not going to quote the whole thing because it'd be a nightmare to reformat sensibly, but the two key sections are

> Both start_link/3 and start/3 support the GenServer to register a name on start via the :name option. Registered names are also automatically cleaned up on termination. The supported values are: an atom ... {:global, term} ... {:via, module, term}...

and the last four items in the bulleted list in the section beginning with

> Once the server is started, the remaining functions in this module (call/3, cast/2, and friends) will also accept an atom, or any {:global, ...} or {:via, ...} tuples. In general, the following formats are supported:

Notice how those bullets match up to the 'name()' type that is passed in to supervisor:start_link/2, and connect that information with the fact that the docs for that function direct you here to learn about how you can register a name for your supervisor. Combine that information with the fact that the first argument to the "Tell the supervisor to do something" functions is of type 'supervisor()' and the fact that 'start_link' returns a PID, and it's really, really clear that a supervisor is another process that you can (optionally) name and refer to by name, rather than PID.

Once we understand that a supervisor is a process, and that the functions to instruct a supervisor to do things require the information required to contact a process, what other conclusion can we draw than "Communications with a supervisor is async, because communications with all processes are async."?
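A minimal sketch of the point above, assuming a supervisor registered under the hypothetical name `DemoSup`: the registered name resolves to an ordinary PID, and the `Supervisor` functions accept either one.

```elixir
# Start a supervisor with no children and register it under a name.
{:ok, sup_pid} = Supervisor.start_link([], strategy: :one_for_one, name: DemoSup)

# The name resolves to the same PID that start_link returned.
looked_up = Process.whereis(DemoSup)
IO.inspect(looked_up == sup_pid, label: "name resolves to the supervisor PID?")

# The same request can be addressed by name or by PID.
by_name = Supervisor.count_children(DemoSup)
by_pid = Supervisor.count_children(sup_pid)
IO.inspect(by_name == by_pid, label: "same answer either way?")
```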

[0] <https://hexdocs.pm/elixir/1.18.3/Supervisor.html#module-rest...>

[1] <https://hexdocs.pm/elixir/1.18.3/Supervisor.html#module-exit...>

[2] <https://hexdocs.pm/elixir/1.18.3/processes.html>

[3] <https://hexdocs.pm/elixir/1.18.3/Supervisor.html#start_link/...>

[4] <https://hexdocs.pm/elixir/1.18.3/GenServer.html#module-name-...>

xlii
·
3 months ago
·
[ - ]

I anonymized the code:

    def start_new(name, config) do
      # Logging set up
      Supervisor.start_child(
        name,
        { HandlerModule, config }
      )
    end
    
    def replace_supervisor(name, config) do
      Supervisor.terminate_child(name, HandlerModule) # Success
      Supervisor.delete_child(name, HandlerModule)    # Failure
      start_new(name, config)
    end

That is exact code. Success and failure were logged. Also (from Erlang's documentation)

> one_for_one - If one child process terminates and is to be restarted, only that child process is affected. This is the default restart strategy.

In terminate child you can read that (once again Erlang).

> If the supervisor is not simple_one_for_one, Id must be the child specification identifier. The process, if any, is terminated and, [[unless it is a temporary child, the child specification is kept by the supervisor]]. The child process can later be restarted by the supervisor.

https://www.erlang.org/doc/apps/stdlib/supervisor.html#termi...

So yeah, Elixir documentation is wrong.

simoncion
·
3 months ago
·
[ - ]

> Success and failure were logged.

Sorry, what happened after or during the call to delete_child/2 that caused you to consider it to have failed?

> So yeah, Elixir documentation is wrong.

I don't see what's wrong about the Elixir documentation. Walk me through it, please? Do remember that the default restart value for a supervised child is 'permanent', and that 'one_for_one' only ensures that the supervisor-initiated restart of one supervised child doesn't cause the supervisor to restart any other supervised children.

xlii
·
3 months ago
·
[ - ]

It was restarted by a supervisor :)

After tracing the code this is exactly what happened (in this code exactly):

    1. Terminate child X 
    2. /Supervisor restarts X/
    3. Delete child X                 {:error, :running}
    4. Supervisor.start_child Y       {:ok, PID}
    5. /X and Y are both running/

As for incorrectness:

> the supervisor does not restart the child in case it exits with reason :normal, :shutdown or {:shutdown, term}.

`terminate_child` is sending shutdown and yet it's being restarted.

And to emphasise on use case. The child is connection handler. Service node changed. It NEEDS to be restarted on crash, but has to be replaced during handoff.

I believe you start to get into "huh?" mode with me. I have a treasure trove of those. (Btw., in the Erlang repository there are plenty of notes mentioning THIS exact behavior and, if I didn't overskim, even some bugs caused by it. You can search for terminate_child.)
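For reference, a minimal sketch of the terminate/delete/start sequence under discussion, written so that each step's return value is checked instead of ignored. If any step returns something unexpected (such as `{:error, :running}`), the sequence stops there rather than starting a second child. `ReplaceDemo` and its `Handler` are hypothetical stand-ins for the anonymized code.

```elixir
defmodule ReplaceDemo.Handler do
  # Hypothetical child: an Agent holding its config.
  use Agent

  def start_link(config), do: Agent.start_link(fn -> config end)
end

defmodule ReplaceDemo do
  # Replaces the child whose id is ReplaceDemo.Handler under `sup`.
  # `with` returns the first non-matching result unchanged, so a failed
  # terminate or delete is reported to the caller and no new child starts.
  def replace_child(sup, config) do
    with :ok <- Supervisor.terminate_child(sup, ReplaceDemo.Handler),
         :ok <- Supervisor.delete_child(sup, ReplaceDemo.Handler) do
      Supervisor.start_child(sup, {ReplaceDemo.Handler, config})
    end
  end
end

{:ok, sup} =
  Supervisor.start_link([{ReplaceDemo.Handler, :old_config}], strategy: :one_for_one)

result = ReplaceDemo.replace_child(sup, :new_config)
IO.inspect(result, label: "replace_child result")
IO.inspect(Supervisor.which_children(sup), label: "children afterwards")
```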

simoncion
·
3 months ago
·
[ - ]

> It NEEDS to be restarted on crash, but has to be replaced during handoff.

I question why you're handing off things between supervisors. If this is something you actually need to do, then 'delete_child/2' so the supervisor doesn't restart the child, terminate the child yourself, and re-start the child on the new supervisor.

EDIT: Actually, no, you can't 'delete_child/2'. You need to change the child's restart type from 'permanent' to the type that does exactly what you say you need. I'll leave it to you to read the docs. /EDIT

> `terminate_child` is sending shutdown and yet it's being restarted.

Here's the context for that partial quote that you pulled from [0]:

> A supervisor restarts a child process depending on its :restart configuration. For example, when :restart is set to :transient, the supervisor does not restart the child in case it exits with reason :normal, :shutdown or {:shutdown, term}.

Re-read that first sentence that you chose to not quote. Then read about the ':restart' supervisor configuration and how it describes when a supervised child is and is not restarted. [1]

> I believe you start to get into "huh?" mode with me.

Yep. Selective quoting when it's trivial for your conversation partner to find the lies by omission definitely put me into "huh?" mode with you.

[0] <https://hexdocs.pm/elixir/1.18.3/Supervisor.html#module-exit...>

[1] <https://hexdocs.pm/elixir/1.18.3/Supervisor.html#module-rest...>

johnisgood
·
3 months ago
·
[ - ]

And how does your experience with Elixir compare to Erlang?

xlii
·
3 months ago
·
[ - ]

I'm biased because I like Prolog and syntax is similar.

I won't advocate strongly, but I think some designs are more clear in Erlang. E.g. nested structs and maps in Elixir are something I consider problem-to-be.

If one writes deeply nested structure in Erlang it looks like syntax vomit, so I'd avoid it. But I haven't been writing in Erlang for a while, so that might be just an illusion.

But it could simplify code while being perfectly compatible, so I wonder (and there's LFE too which feels like something I both want and don't want to touch)

antfarm
·
3 months ago
·
[ - ]

> - a lot of knowledge is implicit (try checking if you can dynamically add children to a Supervisor)

https://hexdocs.pm/elixir/DynamicSupervisor.html

xlii
·
3 months ago
·
[ - ]

I know about Dynamic one, but since I clicked and read (expecting some change) and noticed intro paragraph:

> The Supervisor module was designed to handle >>mostly<< static children that are started in the given order when the supervisor starts.

Emphasis mine. But I will spread knowledge that: yes, one can add child to a running Supervisor in runtime and Supervisor will try to use supervision strategy on it (as opposed to DynamicSupervisor that immediately forgets about its child). Replacing a child requires stop-remove in loop (see sibling comment response for exact case).

_randyr
·
3 months ago
·
[ - ]

> - when having non trivial ecosystem one cannot selectively deprecate and get errors, this has to be a human process (remember to replace…)

Apart from some statically typed languages (and even then), most languages have this. Very rarely in any ecosystem does a dependency upgrade not also require manual changing of stuff.

> - when doing umbrella tests on non compiled code seed influences order of compilation

> - this order of compilation influences outcome and can lead to buggy code

I've never seen compilation produce odd artifacts, especially not as a result of compilation order. If the code has the proper compile-time deps, then the result seems stable.

> - documentation is hard to use - searching shows random versions and there isn’t even a link to „open newest”

Isn't this the fault of the search engine not having the latest version indexed? There's also a version selector on the top-left of hexdocs. Navigating to hexdocs.pm/<LIBRARY_NAME> also opens the latest version. This seems like a non-issue to me.

> - a lot of knowledge is implicit (try checking if you can dynamically add children to a Supervisor)

Already covered by another commenter, but also: https://hexdocs.pm/elixir/DynamicSupervisor.html I don't think the knowledge is necessarily implicit, it's just that learning Elixir deeply also means learning the BEAM/Erlang deeply, and there's a lot of Erlang docs.

> - sidebar with new ExDocs break for some reason so there is no navigation

Not a universal problem. Perhaps look into why it's broken on your device and report it as a bug?

> - there is no way to browse navigation outside this broken ExDocs which outputs only HTML and LSP

There's iex. For example `h String.codepoints`. Aside from that, I sometimes just open the relevant library in my deps/ directory and search there.

> - Squiggly arrow works weird (i.e. ~0.3 might catch 0.99)

I genuinely don't understand what you mean by this. There's no tilde operator and ~0.3 is invalid syntax. Can you give a code sample?

> - dependency compilation is not parallelized

I think this might be related to the common pattern of code you see in libraries:

    if Code.ensure_loaded?(SomeModuleFromAnotherLib) do
       # Some lib is loaded, add code to integrate with it.
    end

I think that could only be solved if that integration were extracted into another lib with properly setup deps, but due to how common this pattern is I don't think it's ever possible to switch to parallel dep compilation.

> - Dialyzer/dialyxyr typespecs are useless most of the time

A type system is being worked on, and each release lately has added more checks.

I agree that compilation is slow & editor integration is meh, but the rest I don't agree with.

brcmthrowaway
·
3 months ago
·
[ - ]

Wait, is Elixir actually accessing color pixel data in realtime?

davidbou
·
3 months ago
·
[ - ]

No, it deals with metadata: control and status as explained in a previous reply https://news.ycombinator.com/item?id=43479094#43482362

Elixir does some computations as well, but when we had to compute 3D LUTs based on video processing algorithms, Ghislain had to write them in C to be fast enough for our needs on embedded hardware.
