> Knowing very little about USB audio processing, but having cut my teeth in college on 8-bit 8051 processors, I knew what kind of functions tended to be slow. I did a Ctrl+F for “%” and found a 16-bit modulo right in the audio processing code.
That feeling of saving days of work because you remember a clue from previous experience is so good.
But... I met someone at the gym who'd been struggling with an esoteric problem on an ancient piece of software for over a decade, and they approached me to ask if I could solve it. I said "maybe", sat on it for a few days, and then replicated the issue on my machine and solved their problem in about an hour. I asked for $50 and they gave me double, which was wildly more rewarding than being paid $100k to write React all year, not that that salary is on the table any longer.
But in business, you need to charge an honest / fair amount. (sure, sometimes that 1 hour bug fix had $100K of 'value', but we could argue about honest / fair).
You mention being an employee at $100K a year. Double that, and you get a contractor rate of $100 an hour. That is the floor of what you should be asking; as in, the absolute lowest. Another $50 or $100 an hour is still fair and honest in today's economy.
"You did awesome, but don't let it go to your head."
"Research shows" (I read long ago) that the happiest men are those take their wives' advice — that's certainly been true for my own N=1, for coming up on four decades now. I'd imagine we could replace "wives" with "spouses" and get comparable results.
E.g., for positive integers, x % 16 == x & 15. That should be trivially cheap.
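A minimal demonstration (compilers perform this strength reduction automatically for unsigned types, but it's cheap to verify by hand):

```c
#include <assert.h>
#include <stdint.h>

/* For an unsigned value and a power-of-two divisor, modulo reduces to
   a single AND: x % 2^k == x & (2^k - 1). One-cycle bitwise op instead
   of a multi-cycle (or, on an 8051, a library-call) divide. */
int main(void) {
    for (uint32_t x = 0; x < 1000000; x++)
        assert(x % 16 == (x & 15));
    return 0;
}
```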
Salary is driven by market conditions and nothing else. It is not an approximation of merit or even delivered value.
A good junior will write a hundred lines of code in a day. A good senior will delete a hundred, because they realized the user dictated a solution instead of detailing their problem, asked them about it, and figured out that the problem can be solved by changing a config variable somewhere.
The most devastating value destruction (aside from the rare intern deleting the prod DB) that I’ve seen consistently is with seniors/rockstars who introduce new tech, take credit, and move on. There’s a reason for the term resume-driven development. Think about what a negative force multiplier can do to an org.
Negative force multipliers are easily remedied: just make sure you have an even number of them.
We had a guy who would argue about everything, but he knew the CTO, so we had to tolerate him.
Then we hired a second one and they just argued with each other all the time and the rest of the team could finally make progress.
It was awesome.
Maybe it varies per company.
But that’s the entire point you’re missing. The pay is not proportional to contribution or technical skill. It’s proportional to market forces and negotiation skill.
This particular case sounds like someone got incorrectly hired as a junior. Maybe they didn't have enough "real world" corporate experience and that is why they weren't offered a senior position?
In the former case, we basically had to demand management raise his salary to the low end of his market value, over the course of six months, until they finally gave in. It was just so disgusting to us we couldn't let it go.
The reason comes down to a skill bias: it's a different skill set to navigate other people into giving you a good salary versus navigating the ins and outs of coding. The skills don't overlap, so time spent on one detracts from the other.
In the end he finally got the message we kept ramming into his head, applied to work at a brand-name tech company, and instantly more than doubled his salary. He could've done so years earlier.
This stuff is the norm. I've been a manager having eyes on salaries while also having eyes on people's performance (although unfortunately not much of a lever on the former), and rest assured it is often a very jarring experience. Like "that person should be let go immediately / that person should job hop immediately".
So it's not true, then?
The GP claims this is universally true. All I need to do is post a counterexample, and I did. Yes, there are shitty companies that try to keep salaries as low as possible, not realizing that that will lose them their best people. Don't work for those!
Understanding the business processes of your industry, how to solicit feedback from and interact with end users, how to explain things to management/sell on ideas.
If you put a junior dev in front of a panel of executives and ask them to explain the requirements for a project, odds are quite high they will info-dump tech mumbo-jumbo. A senior should be able to explain risks, benefits, timelines, and impacted areas of the business in a manner that non-technical people can easily grok.
I'm a mid-level engineer. Honestly, several staff+ engineers may not be spitting tech mumbo-jumbo, but they do dump all other kinds of BS: political BS, "tactical tornados"[1]. That doesn't necessarily mean they were good at engineering, just good with people skills. Obviously, not everyone is like that, but I would say many are.
[1] https://news.ycombinator.com/item?id=33394287
Proving it’s not a one-time thing is what pushes you up the salary and seniority rankings.
And this is what saves the day.
Code is a liability.
Maximising the fraction of the product built by people who don't know what they're doing would however explain the emergent properties of modern software.
Junior/senior isn't necessarily about skill level; I'm sure many could find a "senior" with 1 YOE repeated ten times over. It's about trust, both in technical and sociopolitical navigation of the job. That's only really gained with time and experience (and yes, it isn't perfect; hence the aforementioned 1x10 senior, still "trusted" more than a 1-year junior).
Higher level change-the-algorithm aspirations haven't really been met by sufficiently smart compilers yet, with the possible exception of scalar evolution turning loops into direct calculation of the result. E.g. I don't know of any that would turn bubble sort into a more reasonable sort routine.
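For example, this is the classic case scalar evolution does handle: at -O2, both Clang and GCC fold this loop into the closed form n*(n-1)/2, with no loop left in the output (easy to confirm on godbolt.org):

```c
/* Scalar evolution in action: the induction variable and accumulator
   are recognized, and the whole loop is replaced by a direct
   calculation of the final value. */
unsigned sum_below(unsigned n) {
    unsigned s = 0;
    for (unsigned i = 0; i < n; i++)
        s += i;
    return s;
}
```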
And more generally, with older architectures, integer division was much slower than integer multiplication, so compilers would generally transform this into a multiplication plus some shifts [0]. For context, in that timeframe MUL on Sandy Bridge introduced 3-4 cycles worth of latency (depending on the exact variant), compared to DIV introducing 20+ (per Agner Fog's excellent instruction tables [1]). So even computing x - y * (x / y) with the clever math to replace x/y would be much faster than just x%y. (It's somewhat closer today, but integer division is still fairly slow; a sketch of the multiply-and-shift trick follows the links below.)
[0] https://news.ycombinator.com/item?id=1131177 (the linked article 404s now, but it's archived: https://web.archive.org/web/20110222015211/https://ridiculou...)
[1] page 220 of https://www.agner.org/optimize/instruction_tables.pdf
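To make the transformation concrete, here's the trick for an unsigned divide by 10. The magic constant 0xCCCCCCCD = ceil(2^35 / 10) is the one GCC and Clang actually emit; the surrounding code is just an illustrative sketch, not the exact codegen:

```c
#include <assert.h>
#include <stdint.h>

/* Division by a constant as multiply + shift (Hacker's Delight style). */
static uint32_t div10(uint32_t x) {
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

/* x % 10 recovered as x - y*(x/y), as described above. */
static uint32_t mod10(uint32_t x) {
    return x - 10u * div10(x);
}

int main(void) {
    for (uint64_t x = 0; x <= 0xFFFFFFFFu; x += 997)  /* spot check */
        assert(mod10((uint32_t)x) == (uint32_t)x % 10u);
    return 0;
}
```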
That only works when y is constant. Otherwise, you need to work out what to replace x/y with… which ultimately takes longer than just using the DIV instruction.
Excellent point! That said, that was the case in this particular example.
> This 16-bit modulo was just a final check that the correct number of bytes or bits were being sent (expecting remainder zero), so the denominator was going to be the same every time.
(Libraries like libdivide allow you to memoize the magic numbers for frequently-used denominators, and if on x86 you have floating point operations with more precision than you need for integer division, you can potentially use the FPU instead of the ALU: https://lemire.me/blog/2017/11/16/fast-exact-integer-divisio...)
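If memory serves, memoizing a runtime divisor with libdivide's C API looks roughly like this (a sketch; verify the names against the current header):

```c
#include <stddef.h>
#include <stdint.h>
#include "libdivide.h"   /* https://libdivide.com */

/* Precompute the magic numbers once for a divisor known only at
   runtime, then reuse them for many divisions via multiply + shift. */
uint32_t sum_of_quotients(const uint32_t *xs, size_t n, uint32_t d) {
    struct libdivide_u32_t fast_d = libdivide_u32_gen(d);  /* memoize */
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += libdivide_u32_do(xs[i], &fast_d);         /* no DIV */
    return total;
}
```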
Did I win? Of course not. It’s hard for non-technical people to fully appreciate these things, or any sort of larger infrastructure work, especially for developer productivity, because it comes back to, well, how are you going to measure that ROI.
Anyways, this was fun to read and brought back good engineering memories. I’d also like to say, as it brought back a bug I chased forever: fuck you, ChannelFactory in C#.
When people bash gRPC today, they don't know of the horrors of the past.
That experience, among other things, is actually what led me to switch over to Product. I get it when people joke (half-joke) about considering retirement rather than going through that again.
Instead, the actual source of problems was K8S and the huge amount of institutional knowledge it required that wasn't there. I still don't think K8S is that good. It's useful, but it and containerized environments in general still have a lot of rough edges and poorly implemented design aspects to this day: involved runtimes like the .NET CLR and OpenJDK end up having to do special handling for them, because reporting of core count and available memory is still scuffed, while the storage is likely to be a network drive. The latter is not an issue in C#, where pretty much all I/O code is non-blocking, so there is no impact on application responsiveness, but it still violates many expectations. Easy horizontal scaling and a focus on lean deployments are primarily useful for worse languages with weaker runtimes that cannot scale as well within a single process.
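To illustrate the core-count problem: the usual API reports the host's CPUs, while the actual quota lives in cgroup files the runtime has to parse specially. A minimal C sketch, assuming cgroup v2 and its documented `cpu.max` format:

```c
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* What a naive process sees: the *host's* CPU count. */
    printf("host online CPUs: %ld\n", sysconf(_SC_NPROCESSORS_ONLN));

    /* The container's real quota, e.g. "200000 100000" = 2 CPUs worth
       of time, or "max 100000" for unlimited. Runtimes like the CLR
       and OpenJDK parse this (and its cgroup v1 sibling) themselves. */
    FILE *f = fopen("/sys/fs/cgroup/cpu.max", "r");
    if (f) {
        char quota[32];
        long period;
        if (fscanf(f, "%31s %ld", quota, &period) == 2)
            printf("cgroup cpu.max: quota=%s period=%ld\n", quota, period);
        fclose(f);
    }
    return 0;
}
```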
I suppose a silver lining to your situation, on the other hand, is that the developers get a PO/PM with a strong technical background, which makes so many communication issues go away.
It's strange that we haven't figured out how to trust technical leadership to assess these things for management.
In many ways, the answer is obvious: give technical leaders economic incentives for team productivity. They will then use their expertise to actually assess relevant teams.
I reran the test locally -- no failure. I changed the seed to the one used in the failing run -- nothing.
I added a loop to the code to repeat the test a hundred times -- still nothing. I ran the test in a bash loop a hundred times -- 3 failures. So this already hinted at some internal problem. I fixed every possible source of randomness and verified that all the inputs were identical between runs -- it still failed only once in a while. I started building an MWE, but the function involved in the reproduction is fairly complicated. I was left with hundreds of lines of JAX code which fails in a couple percent of cases.
I looked at the output of the compiler, and it was identical between failing and successful runs. So the problem was in the compiled code. The compiled code is ~1000 lines of HLO (not much better than assembly). Unfortunately, HLO tooling is both unfamiliar to me and not a good fit for this case (or at least I couldn't figure it out). So I started manually bisecting the code. I was finally left with ~30 lines of HLO. It fails even less often (1% maybe), but at least it runs fast. It also seems to fail in exactly the same way (i.e., the single incorrect output was identical across the 3 failing runs). Now that's something one can hope the maintainers will look at.
It turned out that matrices with the same content but different layouts were deduplicated, leading to, in my case, a transposed matrix being replaced by a non-transposed one. The hash used for storage did take layout into account, so the bug appeared only if two entries ended up in the same bucket (~3% of the time). The fix was an obvious one-liner [1].
[1] https://github.com/openxla/xla/commit/76e7353599d914546f9b30...
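The bug class generalizes: whenever a cache's equality check distinguishes less than its hash does, bad merges only happen on bucket collisions, so failures look random. A toy illustration in C (invented names, not the actual XLA code):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy version of the bug class. */
struct matrix {
    const float *data;
    size_t len;           /* element count */
    bool transposed;      /* the "layout" bit */
};

/* The hash DOES mix in layout, so same-content/different-layout
   entries usually land in different buckets... */
static uint64_t hash_matrix(const struct matrix *m) {
    const unsigned char *p = (const unsigned char *)m->data;
    uint64_t h = 14695981039346656037u;          /* FNV-1a over bytes */
    for (size_t i = 0; i < m->len * sizeof(float); i++)
        h = (h ^ p[i]) * 1099511628211u;
    return h ^ (m->transposed ? 0x9e3779b97f4a7c15u : 0);
}

/* ...but the equality check used for deduplication IGNORES layout, so
   on a rare collision a transposed matrix gets merged with a
   non-transposed one -- hence the ~3% flakiness. */
static bool equal_matrix(const struct matrix *a, const struct matrix *b) {
    return a->len == b->len &&
           memcmp(a->data, b->data, a->len * sizeof(float)) == 0;
        /* && a->transposed == b->transposed   <- the one-line fix */
}
```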
I even wrote an 8051 assembler in C, but found a good tiny-C compiler for it before it went into production.
You are not a programmer unless you’ve written key-debounce code :)
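For anyone who hasn't had the pleasure, the classic counter-based version looks roughly like this (a sketch with invented names):

```c
#include <stdbool.h>
#include <stdint.h>

/* Counter-based key debounce: only accept a new raw reading after it
   has stayed stable for DEBOUNCE_TICKS consecutive polls, filtering
   out mechanical contact bounce. */
#define DEBOUNCE_TICKS 8

typedef struct {
    bool stable;      /* last accepted (debounced) state */
    bool last_raw;    /* previous raw sample */
    uint8_t count;    /* consecutive polls raw has differed from stable */
} debounce_t;

/* Call at a fixed poll rate (e.g. every 1 ms) with the raw pin level. */
static bool debounce(debounce_t *d, bool raw) {
    if (raw == d->stable)
        d->count = 0;                    /* no change pending */
    else if (raw != d->last_raw)
        d->count = 0;                    /* still bouncing: restart timer */
    else if (++d->count >= DEBOUNCE_TICKS) {
        d->stable = raw;                 /* stable long enough: accept */
        d->count = 0;
    }
    d->last_raw = raw;
    return d->stable;
}
```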
(OTOH, some of the worst programmers I’ve ever had the displeasure of working with were amazing low-level code hackers. In olden times, it seems like you were either good at that level of abstraction, or you were good at a much different [“higher”] level, seldom both.)
I've had to do that once, and I still consider it a blessing from Satan above that a) I figured it out and b) it worked every time (plus bonus c: I explained the logic to better students in my class).
> This product was so old in fact that nobody knew how to compile the source code.
I think you mean "Management was so bad, nobody knew how to compile the source code".
There are plenty of systems out there that can, and plenty that cannot, be reproduced from source. The biggest difference is the care taken to do so, not the age.
Find dusty Perl script forgotten for years. Still works.
Not the first time that I've heard that.
(IIRC the UI scrolled twice for every mouse movement, plus you couldn't select items in the server browser with the mouse wheel, as it would skip every other one)
Ex: VSAB -> "I am attacking the enemy base!" (Voice, Self, Attacking, Base)
"I will let you know as soon as I have the fix"
If yamux's keepalive fails/times out, and you're calling Read on a demuxed stream, it blocks forever.
We spent over three months on it before finding a root cause. It was over two months before we could even understand how to measure it - we were seeing parts of the automated overnight test suite run taking longer, but every night it would be different tests that were slow. A key finding was that almost everything was slow on some boots of the device and fast on other boots of the device, and there was a reboot before each test was run. Doing some manual testing showed it being close to a 50% chance of a boot leading to slowness. Now what?
I eventually got frustrated and took the brute force / mindless approach... binary search over commits. Unfortunately, that wasn't easy because our build was 45-60 minutes, and then there was a heavily manual installation process that took 10-20 minutes, followed by several reboots to see if anything was slow. And there were several thousand commits since the last known good build (the previously shipped version of the device). The build/install/testing process was not easily automated, and we were not on git, otherwise using git-bisect would have been nice. Instead, I spent weeks doing the binary search manually.
That yielded the offending commit. The problem was that it was a massive commit (tens of thousands of lines of code) from a group in another part of the company. It was a snapshot of all of their development over the course of a couple of years. The commit message, and the authors, stated that the commit was a no-op with everything behind a disabled feature flag.
So now it was onto code level binary search. Keep deleting about half of the code in the commit, in this case by chunks that are intended to be inactive. After eventually deleting all the inactive code, there were still a few dozen lines of changes in a Linux subsystem that did window compositing. Those lines of code were all quite interdependent, so it was hard to delete much and keep things functional, so now on to walking through code. At least I could use my brain again!
Using the clue that the problem was happening about half the time and given that this code was in C, I started looking for uninitialized booleans. Sure enough, there was one called something like `enable_transparency`. Disabled code was setting it to `true`, but nothing was setting it to `false` when their system was disabled. Before their commit, there was no variable - `false` was being passed into the initializer call directly. Adding `= false` to the declaration was the fix.
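In miniature, the bug looked something like this (invented names, not the actual compositor code):

```c
#include <stdbool.h>

/* The flagged-off feature only ever wrote `true`; nothing wrote
   `false` when it was disabled, so the field held whatever garbage
   was already in memory -- true on roughly half of boots. */
struct compositor_opts {
    bool enable_transparency;          /* buggy version: no default */
};

void opts_init(struct compositor_opts *o, bool feature_enabled) {
    /* The fix: a deterministic default, equivalent to the `= false`
       added at the declaration in the real patch. */
    o->enable_transparency = false;
    if (feature_enabled)
        o->enable_transparency = true; /* the only path the old code had */
}
```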
So, well over a year of engineering hours spent to figure out the issue. The upside is that some people on the team didn't know how to proceed, so they spent their time speeding up random things that were slow. So the device ended up being noticeably faster when we were done. But it was pretty stressful as we were closing in on our launch date with little visibility into whether we'd figure it out or not.
These types of problems (undefined behavior and/or uninitialized variables) are often hard (time-consuming) to diagnose, and fairly common.
Lots of places overlook simple static analysis or built-in compiler features.
This probably wasn't an option back then with your toolchain, but it's so reassuring to know that modern compiler warnings and sanitizers (MSan, more so than ASan, for uninitialized reads) are amazing at catching this class of bugs today.
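For instance, a minimal reproduction of the bug class plus the flags that catch it (the flags are real GCC/Clang options; the snippet itself is invented):

```c
/* uninit.c
     cc -O2 -Wall -Wuninitialized uninit.c      -> compile-time warning
     clang -g -fsanitize=memory uninit.c        -> MSan runtime report
     clang -ftrivial-auto-var-init=pattern ...  -> garbage made deterministic
*/
#include <stdio.h>

int main(void) {
    int enable_transparency;          /* never initialized */
    if (enable_transparency)          /* warning / MSan report fires here */
        puts("transparency on");
    return 0;
}
```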
Love to see it. That place needs more organic growth.
"Almost every bug turns out to be a 1 that should be 0, or a 0 that should be 1"
Keeping this in mind often keeps one focused on the detail of the underlying binary values and how they are being manipulated.
Anybody know what the exact transformation is here? I searched around and found this answer, but it doesn't work:
[0] https://stackoverflow.com/questions/2566010/fastest-way-to-c...
> I can still recall the cacophony of what amounted to an elephant on cocaine slamming on a keyboard for hours on end.