One security checking tool that has genuinely impressed me recently is CodeQL. If you’re using GitHub, you can run this as part of GitHub Advanced Security.
Unlike those naïve tools, CodeQL seems to perform a real tracing analysis through the code, so its report doesn't just say you have user-provided data being used dangerously; it shows you a complete, step-by-step path through the code that connects the input to the dangerous usage. This provides useful, actionable information for assessing and fixing real vulnerabilities, and it is inherently resistant to false positives.
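To make the source-to-sink idea concrete, here's the kind of flow I mean as a hypothetical Python sketch (Flask, request.args, and the sqlite3 query are purely illustrative assumptions, not taken from any actual CodeQL report):

    # Hypothetical source-to-sink flow a taint-tracking analysis would
    # report step by step (Flask + sqlite3 assumed for illustration).
    import sqlite3
    from flask import Flask, request

    app = Flask(__name__)

    @app.route("/items")
    def items():
        name = request.args.get("name", "")                      # source: user-controlled input
        query = "SELECT * FROM items WHERE name = '%s'" % name   # taint propagates via formatting
        conn = sqlite3.connect("items.db")
        rows = conn.execute(query).fetchall()                    # sink: user input executed as SQL
        return str(rows)

The value of the path report is that each of those intermediate steps is shown, so you can see exactly where to break the chain.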
Presumably there is still a possibility of false negatives with this approach, particularly with more dynamic languages like Python where you could surely write code that is obfuscated enough to avoid detection by the tracing analysis. However, most of us don’t intentionally do that, and it’s still useful to find the rest of the issues even if the results aren’t perfect and 100% complete.
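For instance, here's a contrived sketch of the kind of dynamic dispatch that could plausibly defeat such a trace (purely hypothetical; I haven't checked how CodeQL actually handles it):

    # Contrived dynamic dispatch: the dangerous call is looked up by name
    # at runtime, so a static dataflow analysis may lose the connection
    # between user_input and the command-execution sink.
    import importlib

    def run(user_input):
        func = getattr(importlib.import_module("os"), "system")
        func("echo " + user_input)   # effectively os.system on user input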
The latest drop in the bucket was a comment adding a useless intermediate variable, with the justification being “if you do this, you’ll avoid CodeQL flagging you for the problem”.
Sounds like slight overfitting to the data!
For non-SaaS products it doesn’t matter. Your customer’s security teams have their own scanners. If you ship them vulnerable binaries, they’ll complain even if the vulnerable code is never used or isn’t exploitable in your product.
Nope.
By Rice's Theorem, I somehow doubt that.
If dataflow is not provably connected from source to sink, an alert is impossible. If a sanitization step interrupts the flow of potentially tainted data, the alert is similarly discarded.
The end-to-end precision of the detection depends on the queries executed, the models of the libraries used in the code (to e.g., recognize the correct sanitizers), and other parameters. All of this is customizable by users.
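To illustrate, here's a hypothetical Python sketch of what a recognized sanitizer barrier looks like (whether a given query actually models shlex.quote as a sanitizer depends on the library models in use):

    # If the analysis models shlex.quote as a sanitizer for command injection,
    # the tainted flow from user_input to os.system is considered interrupted
    # and no alert is raised; an unmodeled custom cleanup helper would not
    # have the same effect.
    import os
    import shlex

    def list_dir(user_input):
        safe = shlex.quote(user_input)   # sanitization step (barrier)
        os.system("ls -l " + safe)       # sink is only reached via the sanitized value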
All that can be overwhelming though, so we aim to provide sane defaults. On GitHub, you can choose between a "Default" and "Extended" suite. Those are tuned for different levels of potential FN/FP based on the precision of the query and severity of the alert.
Severities are calculated based on the weaknesses the query covers and the real CVEs those weaknesses have caused in previously disclosed vulnerabilities.
QL-language-focused resources for CodeQL: https://codeql.github.com/
Looking at the docs, I'm not really sure CodeQL is semantic in the same sense as Rice's theorem. It looks more syntactic than semantic.
E.g., breaking Rice's theorem would require it to detect that an application isn't vulnerable if it contains the vulnerability but only in paths that are unreachable. Like:
    if request.params.limit > 1000:
        raise ValueError("limit too large")
    # 1000 lines of code
    if request.params.limit > 1000:
        call_vulnerable_code()   # unreachable: the first check already raised
I'm not at a PC right now, but I'd be curious whether CodeQL thinks that's vulnerable or not. It's probably demonstrably true that there is syntactically a path to the vulnerability; I'm a little dubious that it's demonstrably true the code path is actually reachable without executing the code.
Is CodeQL special-cased for your code? I very much doubt that. Then it must work in the general case. At that point decidability is impossible, and at best either false positives or false negatives can be guaranteed to be absent, but not both (possibly neither of them!).
I don't doubt CodeQL's claims can be demonstrably true; that's still coherent with Rice's theorem. However, it does mean you'll have false negatives, that is, cases where CodeQL reports no provable claim while your code is vulnerable to some issues.
Clearly it is still possible to generate a false positive if, for example, CodeQL’s algorithm thinks it has found a path through the code where unsanitised user data can be used dangerously, but in fact there was a sanitisation step along the way that it didn’t recognise. This is the kind of situation where the theoretical result about not being able to determine whether a semantic property holds in all cases is felt in practical terms.
It still seems much less likely that an algorithm that needs to produce a specific demonstration of the problem it claims to have found will result in a false positive than the kind of naïve algorithms we were discussing before that are based on a generic look-up table of software+version=vulnerability without any attempt to determine whether there is actually a path to exploit that vulnerability in the real code.
I would love to hear what kind of local experience you're looking for and where CodeQL isn't working well today.
As a general overview:
The CodeQL CLI is developed as an open-source project and can run CodeQL basically anywhere. The engine is free to use for all open-source projects, and free for all security researchers.
The CLI is available as release downloads, in homebrew, and as part of many deployment frameworks: https://github.com/advanced-security/awesome-codeql?tab=read...
Results are stored in standard formats and can be viewed and processed by any SARIF-compatible tool. We provide tools to run CodeQL against thousands of open-source repos for security research.
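As a minimal sketch of consuming that output (field names follow the SARIF 2.1.0 layout; results.sarif is just a placeholder path):

    # Minimal SARIF reader: print rule id, message, and first location
    # for each result in each run.
    import json

    with open("results.sarif") as f:
        sarif = json.load(f)

    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            rule = result.get("ruleId", "<no rule>")
            msg = result["message"]["text"]
            locs = result.get("locations", [])
            uri = (locs[0]["physicalLocation"]["artifactLocation"]["uri"]
                   if locs else "<no location>")
            print(f"{rule}: {msg} ({uri})")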
The repo linked above points to dozens of other useful projects (both from GitHub and the community around CodeQL).
I’d be interested in what kinds of false positives you’ve seen it produce. The functionality in CodeQL that I have found useful tends to accompany each reported vulnerability with a specific code path that demonstrates how the vulnerability arises. While we might still decide there is no risk in practice for other reasons, I don’t recall ever seeing it make a claim like this that was incorrect from a technical perspective. Maybe some of the other types of checks it performs are more susceptible to false positives and I just happen not to have run into those so much in the projects I’ve worked on.
The patterns we had established were as simple, basic, and "safe" as practical, and we advised and code-reviewed the mechanics of services/apps for the other teams, like using database connections/pools correctly, using async correctly, validating input correctly, etc (while the other teams were more focused on features and business logic). Low-level performance was not really a concern, mostly just high-level db-queries or sub-requests that were too expensive or numerous. The point is, there really wasn't much of anything for CodeQL to find, all the basic blunders were mostly prevented. So, it was pretty much all false-positives.
Of course, the experience would be far different if we were more careless or working with more tricky components/patterns. Compare to the base-rate fallacy from medicine ... if there's a 99% accurate test across a population with nothing for it to find, the "1%" false positive case will dominate.
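To put rough numbers on that (all figures made up purely for illustration):

    # Base-rate arithmetic with invented numbers: 100,000 findings screened,
    # 0.1% of them real, a checker that is 99% sensitive and 99% specific.
    population = 100_000
    real = population * 0.001            # 100 genuine issues
    clean = population - real            # 99,900 non-issues

    true_positives = real * 0.99         # 99 real issues flagged
    false_positives = clean * 0.01       # 999 non-issues flagged anyway

    precision = true_positives / (true_positives + false_positives)
    print(f"{true_positives + false_positives:.0f} alerts, {precision:.0%} of them real")

So roughly one alert in eleven is real, and that's with a checker far more accurate than any real scanner.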
I also want to mention a tendency for some security teams to decide that their role is to set these things up, turn them on, cover their eyes, and point the hose at the devs. Using these tools makes sense, but these security teams think it's not practical for them to look at the output and judge the quality with their own brains, first. And it's all about the numbers: 80 criticals, 2000 highs! (except they're all the same CVE and they're all not valid for the same reason)
I completely agree about the problem of someone deciding to turn these kinds of scanning tools on and then expecting they’ll Just Work. I do think the better tools can provide a lot of value, but they still involve trade-offs and no tool will get everything 100% right, so there will always be a need to review their output and make intelligent decisions about how to use it. Scanning tools that don’t provide a way to persistently mark a certain result as incorrect or to collect multiple instances of the same issue together tend to be particularly painful to work with.
Certain languages don't have enough "rules" (forgot the term) either. This is the only open/free SAST I know of; if there are others I'd be interested as well.
My hope+dream is for Linux distros to require checks like this to pass for anything they admit to their repo.
> Dependencies should be updated according to your development cycle, not the cycle of each of your dependencies. For example you might want to update dependencies all at once when you begin a release development cycle, as opposed to when each dependency completes theirs.
The article is arguing in favor of targeted updates here.
It might surprise the younger crowd to see the number of Windows Updates you wouldn't have installed on a production machine, back when you made choices at that level. From this perspective Tesla's OTA firmware update scheme seems wildly irresponsible for the car owner.
It's just a silly historical artifact that we treat DoS as special, imo.
If the system is configured to "fail open", and it's something validating access (say anti-fraud), then the DoS becomes a fraud hole and profitable to exploit. Once discovered, this runs away _really_ quickly.
Treating DoS as affecting availability converts the issue into a "do I want to spend $X from a shakedown, or $Y to avoid being shaken down in the first place?"
Then, "what happens when people find out I pay out on shakedowns?"
The problem here isn't the DoS, it's the fail open design.
The best case is having your credit card processing fees like quadruple, and the worst case is being in a regulated industry and having to explain to regulators why you knowingly allowed a ton of transactions with 0 due diligence.
CVEs are at the discretion of the reporter.
> Then, "what happens when people find out I pay out on shakedowns?"
What do you mean? You pay to someone else than who did the DoS. You pay your way out of a DoS by throwing more resources at the problem, both in raw capacity and in network blocking capabilities. So how is that incentivising the attacker? Or did you mean some literal blackmailing??
The security team cannot explain the attack surface. In the end it is binary: fix it or take the blame.
DoS is distinct because it's only considered a "security" issue due to arbitrary conversations that happened decades ago. There's simply not a good justification today for it. If you care about DoS, you care about almost every bug, and this is something for your team to consider for availability.
That is distinct from, say, remote code execution, which not only encompasses DoS but is radically more powerful. I think it's entirely reasonable to say "RCE is worth calling out as a particularly powerful capability".
I suppose I would put it this way. An API has various guarantees. Some of those guarantees are "won't crash" or "terminates eventually", but that's actually insanely uncommon and not standard, therefore DoS is sort of pointless. Some of those guarantees are "won't let unauthorized users log in" or "won't give arbitrary code execution", which are guarantees we kind of just want to take for granted because they're so insanely important to the vast majority of users.
I kinda reject the framing that it's impossible to categorize security vulnerabilities broadly without extremely specific threat models, I just think that that's the case for DoS.
There are other issues like "is it real" ie: "is this even exploitable?" and there's perhaps some nuance, and there's issues like "this isn't reachable from my code", etc. But I do think DoS doesn't fall into the nuanced position, it's just flatly an outdated concept.
But at the same time I don't know. Pre-Cloudflare bringing cheap DDoS mitigation to the masses, I suspect most website operators would have preferred to be subject to an XSS attack over a DoS. At least XSS has a viable fix path (of course volumetric DoS is a different beast than CVE-type DoS vulns).
We have decades of history of memory corruption bugs that were initially thought to only result in a DoS, that with a little bit of work on the part of exploit developers have turned into reliable RCE.
Regardless, I don't think it matters. If you truly believe your DoS may be a likely privesc etc., label it as that. The system accounts for this. The insanely vast majority of DoS issues are blatantly not primitives for other exploits.
Strongly disagree. While it might not matter much in some / even many domains, it absolutely can be mission critical. Examples are: Guidance and control systems in vehicles and airplanes, industrial processes which need to run uninterrupted, critical infrastructure and medicine / health care.
I can produce a web server that prints hello world, and if you send it enough traffic it will crash. If I can put user input into a regex and the response time might go up by 1ms, no one will say it's suddenly a valid CVE.
Then someone will demonstrate that with a 1MB input string it takes 4ms to respond and claim they've earned a CVE for it. I disagree. If you use webpack you've probably seen a dozen of these where the "vulnerable" input was inside the webpack config file. The whole category should go in the bin.
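For anyone who hasn't seen the failure mode being argued over, here's a toy catastrophic-backtracking sketch (the disagreement above is about whether attacker-controlled input ever reaches such a pattern, not about whether the blow-up exists):

    # Toy ReDoS: nested quantifiers backtrack catastrophically when the
    # input almost matches but can't. Work roughly doubles per extra 'a'.
    import re, time

    pattern = re.compile(r"^(a+)+$")
    for n in (18, 21, 24):
        s = "a" * n + "!"                # the trailing '!' forces backtracking
        start = time.perf_counter()
        pattern.match(s)
        print(n, f"{time.perf_counter() - start:.3f}s")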
But if we no longer classed DoSes as vulnerabilities, they might.
For a product that requires functional safety, CVEs are almost entirely a marketing tool and irrelevant to the technology. Go ahead and classify them as CVEs, it means the sales people can schmooze with their customer purchasing department folks more but it's not going to affect making your airplane fly or you car drive or your cancer treatment treat any more safely.
CVEs are helpful for describing the local property of a vulnerability. DOS just isn't interesting in that regard because it's only a security property if you have a very specific threat model, and your threat model isn't that localized (because it's your threat model). That's totally different from RCE, which is virtually always a security property regardless of threat model (unless your system is, say, "aws lambda" where that's the whole point). It's just a total reversal.
Well, the Linux Kernel project actually does.
If availability isn’t part of CIA then a literal brick fulfills the requirements of security and the entire practice of secure systems is pointless.
That and it can't understand that a tool that runs as the user on their laptop really doesn't need to sanitise the inputs when it's generating a command. If the user wanted to execute the command they could without having to obfuscate it sufficient to get through the tool. Nope, gotta waste everyone's time running sanitisation methods. Or just ignore the stupid code review tool.
We also suffer from this. Although in some cases it's due to a Dev dependency. It's crazy how much noise it adds specifically from ReDoS...
I made a GitHub action that alerts if a PR adds a vulnerable call, which I think pairs nicely with the advice to only actually fix vulnerable calls.
https://github.com/imjasonh/govulncheck-action
You can also just run the stock tool in your GHA, but I liked being able to get annotations and comments in the PR.
Incidentally, the repo has dependabot enabled with auto-merge for those PRs, which is IMO the best you can do for JS codebases.
If your test suite is up to the task you’ll find defects in new updates every now and then, but for me this has even led to some open source contributions, engaging with our dependencies’ maintainers and so on. So I think overall it promotes good practices even though it can be a bit annoying at times.
For a library, you really want the widest range of "allowed" dependencies, but for the library's test suite you want to pin specific versions. I wrote a tool[1] that helps me make sure (for the npm ecosystem) my dependency specifications aren't over-wide.
For an application, you just want pinned specific dependencies. Renovate has a nice feature wherein it'll maintain transitive dependencies, so you can avoid the trap of only upgrading when forced to by more direct dependencies.
The net result is that most version bumps for my library code only affect the test environment, so I'm happy allowing them through if the tests pass. For application code, too, my personal projects will merge version bumps and redeploy automatically -- I only need to review if something breaks. This matches the implicit behaviour I see from most teams anyway, who rely on "manual review" but only actually succeed in adding toil.
My experience is that Renovate's lock file maintenance makes updates a whole lot safer than the common pattern of having ancient versions of most transitive dependencies and then upgrading a thread of packages depended on by a newer version of a single dependency.
https://docs.github.com/en/code-security/reference/supply-ch...
You can have Dependabot enabled, but turn off automatic PRs. You can then manually generate a PR for an auto-fixable issue if you want, or just do the fixes yourself and watch the issue number shrink.
The fundamental problem with Dependabot is that it treats dependency management as a security problem when it's actually a maintenance problem. A vulnerability in a function you never call is not a security issue — it's noise. But Dependabot can't distinguish the two because it operates at the version level, not the call graph level.
For Python projects I've found pip-audit with the --desc flag more useful than Dependabot. It's still version-based, but at least it doesn't create PRs that break your CI at 3am. The real solution is better static analysis that understands reachability, but until that exists for every ecosystem, turning off the noisy tools and doing manual quarterly audits might actually be more secure in practice — because you'll actually read the results instead of auto-merging them.
But I don't quite understand what Dependabot is doing for Go specifically. The vulnerability goes away without source code changes if the dependency is updated from version 1.1.0 to 1.1.1. So anyone building the software (producing an application binary) could just do that, and the intermediate packages would not have to change at all. But it doesn't seem like the standard Go toolchain automates this.
I think the bigger problem is that Github is being treated as a quasi-social-media, and these things are being viewed as a "thumbs down" or "dislike" (and vice versa). Unless you have an SLA with someone, you don't have to meet any numbers, just do your best when you feel like it, and drive your project best way you think. Just don't be a dick to people about it, or react to these social-media metrics by lashing out against your users or supporters (not claiming that in this case!).
> These PRs were accompanied by a security alert with a nonsensical, made up CVSS v4 score and by a worrying 73% compatibility score, allegedly based on the breakage the update is causing in the ecosystem.
Where did the CVSS score come from exactly? Does dependabot generate CVEs automatically?
For every boring API you can imagine someone using it to protect nuclear launch codes while exposing it to arbitrary inputs from the internet. If it's technically possible, even if unrealistically stupid, CVSS treats it the same as if it were a fact, and we get spam about the sky falling due to ReDoS.
This is made worse by GitHub's vulnerability database being a quantity-over-quality dumping ground, and by absolutely zero intelligence in Dependabot (ironic for a company aggressively inserting AI everywhere else).
It's good optimization advice, if you have the time, or suffer enough from the described pain points, to apply it.
What I do instead: monthly calendar reminder, run npm audit, update things that actually matter (security patches, breaking bugs), ignore patch bumps on stable deps. The goal isn't "every dep is always current" - it's "nothing in production has a known vulnerability". Very different targets.
I don't understand how the second part of that sentence is connected to the first.
Separately, I love the idea of the `geomys/sandboxed-step` action, but I've got such an aversion to use anyone else's actions, besides the first-party `actions/*` ones. I'll give sandboxed-step a look, sounds like it would be a nice thing to keep in my toolbox.
Hopefully I'll have something out next week.
Yeah, same. FWIW, geomys/sandboxed-step goes out of its way to use the GitHub Immutable Releases to make the git tag hopefully actually immutable.
how about `cargo-audit`?
For security vulnerabilities, I argue that updating might not be enough! What if your users’ data was compromised? What if your keys should be considered exposed? But the only way to have the bandwidth to do proper triage is by first minimizing false positives.
This is the thing that I don't really understand but that seems really popular and gaining. The article's section "Test against latest instead of updating" seems like the obvious thing to do, as in, keep a range of compatible versions of dependencies, and only restrict this when necessary, in contrast to deployment- or lockfile-as-requirement which is restricted liberally. Maybe it's just a bigger deal for me because of how disruptive UI changes are.
Are there any tools for handling these kind of CVEs contextually? (Besides migrating all our base images to chainguard/docker hardened images etc)
There never could be; these languages are simply too dynamic.
(Source: I maintain pip-audit, where this has been a long-standing feature request. We’re still mostly in a place of lacking good metadata from vulnerability feeds to enable it.)
It doesn't have the code tracing ability that my sibling is referring to, but it's better than nothing.
If you want something more structured, I’ve been playing with and can recommend Renovate (no affiliation). Renovate supports far more ecosystems, has a better community and customisation.
Having tried it, I can't believe how relatively poor Dependabot, the default tool we put up with, is. Take something simple like multi-layer Dockerfiles: this has been a Docker feature for a while now, yet it's still silently unsupported by Dependabot!
We also let renovate[bot] (similar to dependabot) merge non-major dep updates if tests pass. I hardly notice when deps have small updates.
https://github.com/search?q=org%3Amoov-io+is%3Apr+is%3Amerge...
https://fossa.com/products/fossabot/
We have some of the best JS/TS analysis out there based on a custom static analysis engine designed for this use-case. You get free credits each month and we’d love feedback on which ecosystems are next…Java, Python?
Totally agree with the author that static analysis like govulncheck is the secret weapon to success with this problem! Dynamic languages are just much harder.
We have a really cool eval framework as well that we’ve blogged about.
A search revealed the Sonatype Scan Gradle plugin. How is it?
GitHub actions is the biggest security risk in this whole setup.
Honestly not that complicated.
Absolutely wild.
https://github.com/imjasonh/go-cooldown
It's not running anymore but you get the idea. It should be very easy to deploy anywhere you want.
(I'm a Renovate maintainer)
(I agree with Filippo's post and it can also be applied to Renovate's security updates for Go modules - we don't have a way, right now, of ingesting better data sources like `govulncheck` when raising security PRs)
That just reminds me that I got a Dependabot alert for CVE-2026-25727 – "time vulnerable to stack exhaustion Denial of Service attack" – across multiple of my repositories.
Instead of, or in addition to, updating all your dependencies, perhaps it would be better to emit monkey patches that turn unsafe methods into no-ops, or raise an exception if such methods are invoked, e.g. "paste these lines at the beginning of main to ensure you are not impacted by CVE-2026-XXXX."
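A minimal sketch of what those pasted lines could look like, assuming a made-up vulnerable_pkg.parse as the affected function (the real patch would depend entirely on the actual CVE):

    # Hypothetical mitigation snippet for a made-up CVE: replace the affected
    # function so any call fails loudly instead of hitting the vulnerable path.
    import vulnerable_pkg  # placeholder name for the affected dependency

    def _blocked(*args, **kwargs):
        raise RuntimeError(
            "vulnerable_pkg.parse disabled pending fix for CVE-2026-XXXX"
        )

    vulnerable_pkg.parse = _blocked  # monkey patch: unsafe method now raises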
I think that for FOSS, the F as in Gratis is always going to be the root cause of security conflicts. If developers are not paid, security is always going to be a problem; otherwise you are trying to get something out of nothing, and the accounting equation will not balance. Exploiting someone else is precisely the act that leaves you open to exploitation (only according to Nash game theory). "158 projects need funding" IS the vector! I'm not saying that JohnDoe/react-openai-redux-widget is going to go rogue, but with what budget are they going to be able to secure their own systems?
My advice is, if it ever comes to the point where you need to install dependencies to control your growing dependency graph, consider deleting some dependencies instead.
Isn't FOSS a combination of the diverging ideas of "Open Source" and "Free Software"? The "Free" in "Free Software" very much does not mean "Gratis".
Go's tooling is exceptional here because the language was designed with this in mind - static analysis can trace exactly which symbols you import and call. govulncheck exploits this to give you meaningful alerts.
The npm ecosystem is even worse because dynamic requires and monkey-patching make static analysis much harder. You end up with dependency scanners that can't distinguish between "this package could theoretically be vulnerable" and "your code calls the vulnerable function."
The irony is that Dependabot's noise makes teams less secure, not more. When every PR has 12 security alerts, people stop reading them. Alert fatigue is a real attack surface.
govulncheck solves this if your auditor understands it. But most third-party security questionnaires still ask "how do you handle dependency vulnerabilities?" and expect the answer to involve automated patching. Explaining that you run static analysis for symbol reachability and only update when actually affected is a harder sell than "we merge Dependabot PRs within 48 hours."
We're in this space and our approach was to supplement Dependabot rather than replace it. Our app (https://www.infield.ai) focuses more on the project management and team coordination aspect of dependency management. We break upgrade work down into three swim lanes: a) individual upgrades that are required in order to address a known security vulnerability (reactive, most addressed by Dependabot) b) medium-priority upgrades due to staleness or abandonedness, and c) framework upgrades that may take several months to complete, like upgrading Rails or Django. Our software helps you prioritize the work in each of these buckets, record what work has been done, and track your libyear over time so you can manage your maintenance rotation.