This whole thing feels like an elaborate LLM fantasy. Is there any real, usable language behind these examples, or is the author just role-playing with ChatGPT?
While this may seem like a whimsical example, it is not intrinsically easier or harder for an AI model compared to solving a real-world problem from a human perspective. The model processes both simple and complex problems using the same underlying mechanism. To lessen the cognitive load for the human reader, however, we will stick to simple targeted examples in this article.
For LLMs this is blatantly false - in fact asking about "used textbooks" instead of "apples" is measurably more likely to result in an error! Maybe the (deterministic, Prolog-style) Universalis language mitigates this. But since Automind (an LLM, I think) is responsible for pre/post validation, naively I would expect it to sometimes output incorrect Universalis code and incorrectly claim an assertion holds when it does not.

Maybe I am making a mountain out of a molehill, but this bit about "lessening the cognitive load of the human reader" is kind of obnoxious. Show me how this handles a slightly nontrivial problem; don't assume I'm too stupid to understand it by trying to impress me with the happy path.
a) a game of roulette where you hope the LLM provider has RLHFed something very close to your use case, or
b) a few-shot exercise, where supplying in-context examples takes more engineering (and is still less reliable) than simply doing it yourself
In particular it's not just "the lack of a succinct, exhaustive text description," it's also a lack of English->Prolog "translations."
It seems like the LLM-Prolog community is well aware of all this (https://swi-prolog.discourse.group/t/llm-and-prolog-a-marria...) but I don't see anything in Universalis that solves the problem. Instead it's just magically invoking the LLM.
I can assure you Prolog prompting works well for at least the class of robotic planning problems (and similar discrete problems, plus potentially more advanced classes such as scheduling and financial/investment allocation planning requiring objective-function optimization), and you can easily check it out yourself with the prompting guide, or even online if you have a capable endpoint you're willing to enter [1].
Aleph? In 2025. That's just lazy, now. At the very least they should try Metagol or Popper, both with dozens of recent publications (and I'm not even promoting my own work).
How about ISO? Why was this a requirement, out of curiosity?
That said, I think Vanilla should be easy to ISO-fy. It's just a meta-interpreter, really. It uses tabling, and I'm guessing that's not ISO (it's more like the Wild West), but it is not indispensable.
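For anyone unfamiliar with the term: a "vanilla" meta-interpreter is only a few clauses of Prolog. This is the textbook version, not Vanilla's actual code, just to show why the core is small and why tabling can be treated as an optional add-on layered on top:

```prolog
% Textbook "vanilla" Prolog meta-interpreter (illustrative only; not
% the code in the Vanilla repo). In SWI-Prolog, tabling would be added
% separately, e.g. with a ":- table solve/1." directive -- which is why
% it is separable from the core and not indispensable.
solve(true) :- !.
solve((A, B)) :- !, solve(A), solve(B).
solve(Goal) :- clause(Goal, Body), solve(Body).
```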
https://github.com/stassa/vanilla/blob/master/src/vanilla.pl
I could have a look, but I have no idea what's in the ISO standard (and that's another bugbear: the ISO docs are not free. What?). I guess I could try to run it on your engine and see what the errors say, though.
P.S. Re: "academic codes": I've got documentation, logging, and custom errors, and that's more than you can say for 100% of academic prototypes, plus ~80% of industry software too. I do even have unit tests, just not many.
The focus seems to be on the paper the software was published along with, with the software as a mere attachment; I gather as much from the doc consisting of an excerpt of that paper. There's no guide on how to run a simple demo. It doesn't state license conditions. It uses tabling by default, but tabling is "Wild West" in your own words ;) The canonical tabling implementation would be XSB Prolog anyway, yet the software seems to require SWI. It uses "modules", but even if that is justified by code-base size, for Prolog code working on theories stored in the primary Prolog db specifically, such as ILP, this is just calling for trouble with predicate visibility and permissions, and it shows in idiosyncratic code where both call(X) and user:call(X) are used.
What do you really expect from putting it on GitHub? It's nice that you've got unit tests, as you say, but there aren't any in the repo. I'm sure the code serves its original purpose well as a proof-of-concept or prototype in the context of academic discourse, on the happy path, but due to the issues I mentioned, picking it up already involves a nontrivial time investment, when it isn't sure to save time as it stands compared to developing it from scratch based on widely available academic literature. In contrast, Aleph has test data where the authors have gone out of their way to publish reproducible results (as stated in TFA), end-user documentation, coverage in academic literature for e.g. troubleshooting, a version history with five (or at least two) widely used versions, a small community behind it (or at least a demonstration that more than a single person could make sense of it), and a perspective for long-term maintenance as an ISO port.
*) Not to speak of the "modules" spec having been withdrawn and considered bogus and unhelpful for a really long time now, nor of "modules" pushing Prolog as a general-purpose language when the article is very much about using basic idiomatic Prolog combinatorial search for practical planning problems in the context of code generation by LLMs.
With that in mind, I was personally more interested in ILP as a complementary technique to LLM code generation.
Re ISO: as I said, the focus here is solving practical problems in a commercial setting, with standardization as an alignment driver between newer Prolog developers/engines. You know, as opposed to people funded by public money sitting on de facto implementations for decades, which are still not great targets for LLMs. Apart from SWI this would in particular be the case for YAP Prolog, which is the system Aleph was originally developed for, making use of its heuristic term indexing; YAP, however, has been on a long-term refactoring spree such that it's difficult to compile on modern systems.
>> The focus seems to be on the paper the sw was published along with as mere attachment;
Do you mean this paper?
https://hmlr-lab.github.io/pdfs/Second_Order_SLD_ILP2024.pdf
That, and a more recent pre-print, use Vanilla but they don't go into any serious detail on the implementation. Previous Meta-Interpretive Learning (MIL) systems did not separate the learning engine from the learning system, so you would not be able to implement Vanilla from scratch based on the literature; there is no real literature on it to speak of. I feel there is very little academic interest in scholarly articles about implementation details, at least in AI and ILP where I tend to publish, so I haven't really bothered. I might at some point submit something to a Logic Programming venue, or just write up a tech report/manual; or complete Vanilla's README.
To clarify, the documentation I meant is in the structured comments accompanying the source. These can be nicely displayed in a browser with SWI-Prolog's PlDoc library. You get that automatically if you start Vanilla by consulting the `load_project.pl` project load file, which also launches the SWI IDE, or with `?- doc_browser.` if you start Vanilla in "headless" mode with `?- [load_headless].` Or you can just read the comments as text, of course. I should really have put all that in my incomplete README file.
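For anyone who hasn't seen PlDoc: it picks up structured comments of roughly this shape (a generic illustration with a made-up predicate, not Vanilla's source):

```prolog
%!  solve(+Goal) is nondet.
%
%   True when Goal is provable against the current program.
%
%   PlDoc parses "%!" structured comments like this one and renders
%   them as browsable HTML, e.g. after running ?- doc_browser.
solve(Goal) :-
    call(Goal).
```

So "reading the comments as text" and browsing the generated docs are literally the same content in two renderings.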
The current version of Vanilla definitely "requires" SWI in the sense that I developed it in SWI and I have no idea whether, or how, it will run in other Prologs. Probably not great, as usual. SWI has predicates to remove and rebuild tables on the fly, which XSB does not. That's convenient because it means you don't need to restart the Prolog session between learning runs. Still it's a long-term plan to get Vanilla to run on as many Prolog implementations as possible, so I really wouldn't expect you (or any other Prolog dev) to do anything to port it. That's my job.
Prolog modules are a shitshow of a dumpster fire, no objection from me. Still, better with, than without. Although I do have to jump through hoops and rudely hack the SWI module implementation to get stuff done. Long story short, in all the learners that come with Vanilla, a dataset is a module and all dataset modules have the same module name: "experiment_file". That is perfectly safe as long as usage instructions are followed, which I have yet to write (because I only recently figured it out myself). A tutorial is a good idea.
>> What do you really expect from putting it on github?
Feedback, I suppose. Like I say, no, you wouldn't be able to implement Vanilla by reading the literature on MIL, which is languishing a couple of years behind the point where Vanilla was created. Aleph enjoys ~30 years of publications and engagement with the community, but so does SWI and you're developing your own Prolog engine so I shouldn't have to argue about why that's OK.
From my point of view, Aleph is the old ILP that didn't work as promised: no real ability to learn recursion, no predicate invention, and Inverse Entailment is incomplete [1]. Vanilla is the new ILP that ticks all the boxes: learning recursion, predicate invention, sound and complete inductive inference. And it's even efficient [2].
Unfortunately there is zero interest in it, so I'm taking my time developing it. Nobody's paying me to do it anyway.
>> It's nice that you've got unit tests as you're saying but there aren't any in the repo.
Oh. I thought I had them included. Thanks.
>> With that in mind, I was personally more interested in ILP as a complementary technique to LLM code generation.
I have no interest in that at all! But good luck I guess. I've certainly noticed some interest in combining Logic Programming with LLMs. I think this is in the belief that the logic in Logic Programming will somehow magickally percolate up to the LLM token generation. That won't happen of course. You'd need all the logic happening before the first token is generated, otherwise you're in a generate-and-filter regime that is juuust a little bit more controllable than an LLM, and more wasteful. Think of all the poor tokens you're throwing away!
Edit: Out of curiosity, did you have to edit Aleph's code to get it to run on your engine? I had to use Aleph on the job a while ago and I couldn't get it to work, not with YAP, not with SWI (I tried a version that was supposed to be specifically for SWI, but it didn't work). I only managed to get it to run from a version that was sent to me by a colleague, which I think had been tweaked to work with SWI (but wasn't the "official" ish SWI port).
_____________
[1] Akihiro Yamamoto, Which hypotheses can be found with inverse entailment? (2005)
https://link.springer.com/chapter/10.1007/3540635149_58
[2] Without tabling.
Current SWI doesn't work with SWI's own aleph module (derived from [2]) anymore, because Aleph uses false :- ... for encoding negative examples, while false/0 is a static builtin in ISO/IEC 13211-1 (2012), as mentioned briefly in [1]. It's an unfortunate choice to break existing code by introducing pointless trivial builtins with such generic names, and I officially "complained" about this. It might also break other Prolog software, such as Stickel et al.'s PTTP. The original ISO spec, for similar reasons, didn't define consult/1 or assert/1. Quantum Prolog deliberately didn't follow ISO in this respect and doesn't define false as a builtin. For SWI, you can run an old pre-ISO-2012 version of Aleph (cf [1]) as a workaround.
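To make the clash concrete (simplified sketch; parent/2 and the replacement name neg/1 are made up for illustration):

```prolog
% Aleph-style encoding of a negative example: a clause whose head is
% false/0, stating that parent(X, X) must not hold.
%
%   false :- parent(X, X).
%
% Under ISO/IEC 13211-1:2012, and in current SWI-Prolog, false/0 is a
% static built-in, so consulting or asserting such a clause raises a
% permission error instead of adding it to the database.
%
% A portable workaround is to rename the head to a non-reserved
% functor, e.g. (hypothetical name):
neg(parent(X, X)).
```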
I should really put the ISO port of Aleph on a public repo one of these days. I was always intending to do that after the site/article went online.
[1]: https://quantumprolog.sgml.net/bioinformatics-demo/part1.htm...
It's funny there's both fail/0 and false/0 in ISO (though not sure which version) but only true/0 and no succeed/0. My guess is that an "I-mean-false" predicate is useful (e.g. for failure-driven loops) but "fail" strikes some folks as too procedural so we also got "false". But I'm only guessing.
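For reference, the failure-driven-loop idiom mentioned above, where fail/0 (or, identically, false/0) drives backtracking through a side-effecting goal:

```prolog
% Print every element of a list by forcing backtracking through
% member/2. fail/0 and false/0 behave identically here; "false" reads
% as logic ("this clause never succeeds"), "fail" reads as control
% ("backtrack now") -- which is presumably why both ended up in ISO.
print_all(List) :-
    member(X, List),
    write(X), nl,
    fail.                 % force backtracking to the next element
print_all(_).             % succeed once all elements are exhausted
```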
I'd like to blame Aleph for using "false :-" for negative examples, rather than ISO. I suspect it's a hack to make it easier to cause negative examples to fail, somewhere. Normally ":- a" should be immediately recognised as a goal with a negative literal, but since that's the syntax used in directives, it's probably not immediately obvious to most that "a" is negated. It wasn't to me for the first 10 years or so.
Is it a good thing to make this easier? We're drowning in garbage already.
We took a few decades to figure out how to specify & evolve current code to solve a certain class of problems [nothing is perfect .. but it seems to work at scale with trade-offs]. Shall watch this from a distance with pop-corn.
> "The actual implementation of Universalis uses Kotlin DataFrames"
Neat. My favorite talk at KotlinConf 2025 was the demo of the new compiler plugin that allows ad-hoc type inference from DataFrame expressions. Essentially, say that you have an input type like:
type User { name: String, age: Int }
And you do something like ".map(it => { doubledAge: it.age * 2 })". The inferred type of that intermediate operation is now:
type User_synthetic { doubledAge: Int }
Which is wild, since you essentially have TypeScript-like inference in a JVM language. Timestamp to talk: