You could look for contradictions between pages on the same subject in different languages, or different pages on related subjects.
You could synthesise new assertions based on what the current assertions imply, then render them as sentences and fact-check them.
You could use verified assertions to benchmark language parsing and comprehension for new models. Basically, unit-test NLP (sketched just below these ideas).
You could produce a list of new assertions and implications introduced when a new edit to a page is made.
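Here is a minimal sketch of the "unit test NLP" idea: check a model's true/false answers against assertions that have already been verified. The ask_model() helper is a hypothetical stand-in for whatever model API is being benchmarked, stubbed so the sketch runs on its own.

    # Benchmark a model against human-verified assertions ("unit testing NLP").
    VERIFIED_ASSERTIONS = [
        ("Paris is the capital of France.", True),
        ("The Danube flows through Madrid.", False),
    ]

    def ask_model(question: str) -> bool:
        # Hypothetical stand-in: call the model under test and parse its
        # yes/no answer. Stubbed here so the sketch is self-contained.
        return True

    def run_benchmark() -> float:
        correct = sum(
            ask_model(f"True or false: {sentence}") == expected
            for sentence, expected in VERIFIED_ASSERTIONS
        )
        return correct / len(VERIFIED_ASSERTIONS)

    print(f"accuracy: {run_benchmark():.0%}")

A real harness would draw the assertion list from the verified knowledge base and swap the stub for an actual API call.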
The way I think about it is somewhat akin to what AWS are doing here, where they talk about using automated reasoning to reduce hallucinations from LLMs:
https://aws.amazon.com/blogs/machine-learning/reducing-hallu...
I know this sounds hand-wavy, but I have had good results doing similar things when trying to pick the correct foreign key out of a table with 1000s of rows.
For example, you might have ten types of queries. Each query will have parameters and a classification/description. Get the classification and the parameters, and stick them into the corresponding template.
This gives you way more safety and predictability.
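A rough sketch of that classify-then-template approach, for concreteness. The query types, the SQL templates and the call_llm_json() helper are all made up for illustration; the point is that the model only ever chooses a classification and fills in parameters, while the templates themselves stay fixed.

    # The model picks a query type and its parameters; the SQL never comes from the model.
    QUERY_TEMPLATES = {
        "orders_by_customer": "SELECT * FROM orders WHERE customer_id = :customer_id",
        "revenue_by_month": "SELECT SUM(total) FROM orders WHERE month = :month",
    }

    def call_llm_json(prompt: str) -> dict:
        # Hypothetical helper: send the prompt to an LLM and parse its JSON reply.
        # Stubbed with a canned answer so the sketch runs standalone.
        return {"classification": "orders_by_customer", "parameters": {"customer_id": 42}}

    def build_query(user_question: str) -> tuple[str, dict]:
        prompt = (
            f"Classify this question as one of {list(QUERY_TEMPLATES)} and extract its "
            "parameters. Reply as JSON with keys 'classification' and 'parameters'.\n\n"
            + user_question
        )
        result = call_llm_json(prompt)
        # An unknown classification raises KeyError instead of running arbitrary SQL.
        return QUERY_TEMPLATES[result["classification"]], result["parameters"]

    print(build_query("Show me everything customer 42 ordered"))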
I've gone through the whole gamut doing this since first using the GPT API in early 2023.
At first I was asking for HTML with few-shot prompting, since getting reliable JSON was impossible. Then I switched to function calling with JSON mode when that was released later in 2023, but still with fallbacks, because sometimes it would spit out markdown with JSON inside, among other fun mistakes. Finally, in summer 2024, strict mode became available, which "guarantees" a well-formed JSON response corresponding to a given schema. I've run hundreds of thousands of queries through that and have not had a formatting issue (yet).
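For reference, a minimal sketch of that strict mode using the OpenAI Python SDK's JSON-schema response format; the schema, model name and prompt are illustrative, so check the current docs for exact availability.

    from openai import OpenAI

    client = OpenAI()

    schema = {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "tags": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["title", "tags"],
        "additionalProperties": False,  # strict mode requires this
    }

    response = client.chat.completions.create(
        model="gpt-4o-2024-08-06",  # illustrative; any model supporting structured outputs
        messages=[{"role": "user", "content": "Summarise this page as a title plus tags: ..."}],
        response_format={
            "type": "json_schema",
            "json_schema": {"name": "page_summary", "strict": True, "schema": schema},
        },
    )
    print(response.choices[0].message.content)  # parses against the schema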
I intend to eventually add it to my sophisticated RAG system, wdoc (https://wdoc.readthedocs.io/en/latest/).
I think it would outperform on any language task, since non-formal language requires interpretation, which LLMs excel at.
OpenCyc specifically? No. But related semantic KBs using the Semantic Web / RDF stack? Yes. That's something I'm spending a fair bit of time on right now.
And given that there is an RDF encoding of a portion of the OpenCyc data out there[1], which may make it into some of my experiments eventually, I guess the answer is more like "Not exactly, but sort of, maybe'ish" or something.
[1]: https://sourceforge.net/projects/texai/files/open-cyc-rdf/
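If anyone wants to poke at a dump like [1], here is a small sketch with rdflib; the file name and serialization format are placeholders to adjust to whatever the download actually contains.

    from rdflib import Graph

    g = Graph()
    g.parse("open-cyc.rdf", format="xml")  # placeholder path/format

    # How big is it, and what does a triple look like?
    print(len(g), "triples")
    for s, p, o in list(g)[:5]:
        print(s, p, o)

    # A simple SPARQL query over whatever rdfs:label assertions are present.
    q = """
        SELECT ?s ?label
        WHERE { ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label }
        LIMIT 5
    """
    for row in g.query(q):
        print(row.s, row.label)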
Sounds like it would probably have. If you combine that with entity recognition and automated graph construction from unstructured data, you might get something that is vaguely useful.
I think that's included in the GraphRAG project released by MS.
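For what it's worth, a crude sketch of the "entity recognition and automated graph construction" idea using spaCy and networkx, with sentence-level co-occurrence standing in for real relation extraction (GraphRAG and similar systems go well beyond this).

    # Requires: pip install spacy networkx && python -m spacy download en_core_web_sm
    import itertools
    import networkx as nx
    import spacy

    nlp = spacy.load("en_core_web_sm")
    graph = nx.Graph()

    text = "Cycorp was founded by Doug Lenat in Austin to continue the Cyc project."
    doc = nlp(text)

    for sent in doc.sents:
        entities = sorted({ent.text for ent in sent.ents})
        graph.add_nodes_from(entities)
        # Crude heuristic: entities mentioned in the same sentence get an edge.
        for a, b in itertools.combinations(entities, 2):
            graph.add_edge(a, b, sentence=sent.text)

    print(graph.nodes())
    print(graph.edges(data=True))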
Same for fiction, visual art, raising children, caring for old people, and so on, and so on.
As I write this it feels naive and reminds me of a thousand ill-thought-out A/B website tests, etc. But still :D
There are plenty of very long-term studies going on where groups are identified and then followed over years or decades.
Another approach is that today you might go looking for people with different levels of airborne lead exposure as children, and then compare their average incomes today to see if there is a correlation, etc. In these kinds of studies the treatment groups weren't wittingly part of the experiment, but you can look backwards.
Either way, in 20 years we'll probably be able to identify schools where AI texts predominated earlier than other schools and, adjusting for other factors, try and tease out the actual impact of that choice.
That seems like a bad thing.
I suppose it will happen regardless, and the backwards-looking approach will be what is used to determine the result.
Later approaches to the same idea coming out of the academic databases community reinvented this particular wheel with far better PR and branded it the Semantic Web, with 'Ontologies', RDF, OWL and their ilk.
While reasonable (pun intended) for small vertical domains, the approach has never made inroads into broader general intelligence, as IMHO it does not deal well with ambiguity, contradictions, multi-level or multi-perspective modelling, and circular referential meaning in 'real world' reasoning, and it also tends to ignore agentive, transformative and temporal situatedness.
Its ideal seems to be a single-truth, never-changing model of the universe that is simply accepted by all.
I'm also not sure that fuzzy or probabilistic properties automatically translate into reasonable transitive properties or reasoning even if the individual weights are reasonable.
(Fuzzy logic is of course exactly about formal logic and reasoning in non-absolute terms, but the idea has been around for a long time and AFAIK largely superseded by probability.)
Not that I have any deeper idea about recent work in that area. I did a couple of years' stint in semantic web stuff back in the day, and weighted relationships were one of the obvious ideas for dealing with the rigidity of explicit relationships. They also came with obvious problems and at least back then my impression was that the idea wasn't actually as useful as it initially sounded.
But as I said, I haven't really been following the field in years, so there might have been some useful developments.
You might be on to something here. The problem being there aren't billions of tokens' worth of CycL out there to train on. Or are there?
> LLMs understand context and meaning and genuine intention
This sounds like a world model with extra steps, and a rather brittle one at that.
How do you choose between two conflicting "microtheories"?
> H: The surgeon, who is the boy's father, says "I can't operate on this boy, he's my son!" Who is the surgeon of the boy?
> O1: The surgeon is the boy’s mother.
Also, just because humans don't always think rationally doesn't mean ChatGPT does.
> This is a classic riddle! The surgeon is actually the boy's mother. The riddle plays on the assumption that a surgeon is typically male, but in this case, the surgeon is the boy's mother.
> Did you enjoy this riddle? Do you have any more you'd like to share or solve?
"I was trying to find a clever twist that isn't actually there. The riddle appears to just be a straightforward statement - a father who is a surgeon saying he can't operate on his son"
More than being illogical, it seems that LLMs can be too hasty and too easily attracted by known patterns. People do the same.
Then the follow-up correction did have the effect of making it look harder at the question. It actually wrote:
"Let me look at EXACTLY what's given" (with the all caps).
It's not very different from a person who decides to focus harder on a problem after being fooled by it a couple of times because it is trickier than it seems. So yes, surprisingly human, with all its flaws.
<<Because the surgeon, who is the boy's father, says, "I can't operate on this boy, he's my son!" This indicates that there is another parent involved who is also a surgeon. Given that the statement specifies the boy's father cannot operate, the other surgeon must be the boy's mother.>> Sounds plausible and, on a first read, almost logical.
"these are the vectors that when viewed as linear functionals, annihilate every column of A. <…> Another way to view it: these are the vectors orthogonal to the row space."
It's quite obvious that vectors that "annihilate the columns" would be orthogonal to the column space, not the row space.
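Spelling that out: if y annihilates every column of A, it is orthogonal to the column space, i.e. it lies in the left null space, whereas the vectors orthogonal to the row space form the ordinary null space.

    y^{\top} a_j = 0 \;\forall j
        \;\Longrightarrow\; y^{\top}(Ax) = 0 \;\forall x
        \;\Longrightarrow\; y \perp \operatorname{col}(A),
        \;\text{i.e. } y \in N(A^{\top}) \text{ (the left null space)},

    \text{while } \{\, y : y \perp \operatorname{row}(A) \,\} = N(A) = \{\, x : Ax = 0 \,\}.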
I don't know if you think o1 is magic. It still hallucinates, just less often and less obviously.
"Support, the calculator gave a bad result for 345987*14569" // "Yes, well, also your average human would"
...That's why we do not ask "average humans"!
So the result might not necessarily be bad, it's just that the machine _can_ detect that you entered the wrong figures! By the way, the answer is 7.
> Can you please give an example of a “completely illogical statement” produced by o1 model? I suspect it would be easier to get an average human to produce an illogical statement.
And the whole point is nonsensical. If you discussed whether it would be ethically acceptable to canaries it would make more sense.
"The database is losing records...!" // "Also people forget." : that remains not a good point.
Sticking instead to the core matter: the architecture is faulty, unsatisfactory by design, and must be fixed. We are playing with the partial results of research and getting some results, even some useful tools, but it must be clear that this is not the real thing, especially since this two-plus-year-old boom has brought another horribly ugly cultural degradation ("spitting out prejudice as normal").
> For simple tasks where we would alternatively hire only ordinary humans AIs have similar error rates.
Yes, if a task requires deep expertise or great care, the AI is a bad choice. But lots of tasks don't. And for those kinds of tasks, even ordinary humans are already too expensive to be economically viable.
> But lots of tasks
Do you have good examples of tasks in which a dubious verbal output could be an acceptable outcome?
By the way, I noticed:
> AI
Do not confuse LLMs with general AI. Notably, general AI was also implemented in systems where critical failures would be intolerable, i.e., made to be reliable, or part of an ultimately reliable process.
Checking documents for compliance with a corporate style guide
It's the same as using an LLM for programming: when you have a way to evaluate the output, it's fine; if not, you can't trust the output, as it could be completely hallucinated.
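A sketch of the style-guide example above with exactly that kind of cheap evaluation bolted on: every violation the model reports must quote a snippet that actually occurs in the document, which filters out pure hallucinations. check_with_llm() is a hypothetical helper, stubbed so the sketch runs.

    def check_with_llm(document: str, style_guide: str) -> dict:
        # Hypothetical: ask an LLM for style violations as structured JSON.
        # Stubbed with a canned reply so the sketch is self-contained.
        return {"violations": [{"snippet": "utilize", "rule": "prefer 'use' over 'utilize'"}]}

    def verified_violations(document: str, style_guide: str) -> list[dict]:
        report = check_with_llm(document, style_guide)
        # Keep only findings whose quoted snippet really appears in the document.
        return [v for v in report.get("violations", [])
                if v.get("snippet", "") in document]

    print(verified_violations("Please utilize the new template.", "Prefer plain verbs."))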
LLMs are likelihood completers and classifiers. OpenCyc brings some logic and rationale into the classifiers. Without a rationale, LLMs will continue hallucinating, spitting out nonsense.
https://2ro.co/post/768337188815536128
(EZ - a language for constraint logic programming)