I think it would be interesting to use an LLM to distill Wikipedia into a set of assertions, then iterate through combinations of those assertions using OpenCyc.

You could look for contradictions between pages on the same subject in different languages, or different pages on related subjects.

You could synthesise new assertions based on what the current assertions imply, then render each one to a sentence and fact-check it.

You could use verified assertions to benchmark language parsing and comprehension for new models. Basically unit test NLP.

You could produce a list of new assertions and implications introduced when a new edit to a page is made.

Along with that, a portion of the content of Wikipedia is already available in structured assertion form, thanks to DBpedia[1] and Wikidata[2]. I don't know the exact percentage, but it's a starting point at the very least.

[1]: https://www.dbpedia.org/

[2]: https://www.wikidata.org/wiki/Wikidata:Main_Page

I've tried to use ChatGPT to produce Wikidata queries; it sounds like a great combination. Unfortunately it's pretty hard to make it produce valid queries, and hard to find the Wikidata documentation to teach it with.
My interest is a little more like flipping that around the other way. Take output from the LLM, get it into structured assertion form, and then use those assertions as part of a query / inference process which pulls in other "known good" (or "ground truth" if you will) assertions from sources like DBPedia or Wikidata. The idea being to either verify the LLM output, or possibly to extend it using inferred conclusions.
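To make that concrete, here's a minimal sketch in Python of the "verify an LLM assertion" half of that loop, against the public Wikidata SPARQL endpoint. The triple extraction and the mapping from labels to Wikidata IDs (entity linking) are assumed to happen upstream, and the IDs in the usage comment are just a well-known example:

    import requests

    WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

    def assertion_holds(subject_qid: str, property_pid: str, object_qid: str) -> bool:
        """Check whether a single (subject, property, object) assertion extracted
        from LLM output is present in Wikidata, via a SPARQL ASK query.
        The QIDs/PIDs are assumed to come from an upstream entity-linking step;
        getting that mapping right is the hard part, as noted elsewhere in this thread."""
        query = f"ASK {{ wd:{subject_qid} wdt:{property_pid} wd:{object_qid} . }}"
        resp = requests.get(
            WIKIDATA_SPARQL,
            params={"query": query, "format": "json"},
            headers={"User-Agent": "assertion-checker-sketch/0.1"},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["boolean"]

    # Example: "Douglas Adams (Q42) is an instance of (P31) human (Q5)"
    # print(assertion_holds("Q42", "P31", "Q5"))  # -> True

A failed ASK doesn't prove the LLM wrong, of course (Wikidata is incomplete), but a hit gives you a grounded source for the assertion.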

The way I think about it is somewhat akin to what AWS is doing here, where they talk about using automated reasoning to reduce hallucinations from LLMs:

https://aws.amazon.com/blogs/machine-learning/reducing-hallu...

I was looking at Prolog and having it shadow the LLM's activity in order to flag whenever the conversation trips up.
Very interesting that you've done this! I've read a few times about people talking about knowledge bases and LLMs, and there seems to be a lot of hand-waving around "and at this point the LLM produces the SPARQL query", which feels very much like "now draw the rest of the owl" to me.
If you can express the SPARQL query DSL as a transformation to and from JSON Schema, you can have GPT give you the JSON predictably in strict mode, and then reverse the transformation.
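For what it's worth, a rough sketch of what that could look like (the JSON shape here is invented for illustration, not an established format): the model only has to fill in a constrained JSON object, and a plain function reverses the transformation into SPARQL.

    import json

    # Hypothetical, deliberately tiny query shape for the LLM to fill in.
    # Real Wikidata queries would need more (OPTIONALs, the label service, etc.).
    # wd:Q146 = house cat, wdt:P31 = instance of.
    EXAMPLE_JSON = """
    {
      "select": ["item"],
      "patterns": [
        {"subject": "?item", "predicate": "wdt:P31", "object": "wd:Q146"}
      ],
      "limit": 10
    }
    """

    def json_to_sparql(spec: dict) -> str:
        """Reverse the transformation: turn the constrained JSON back into SPARQL."""
        select = " ".join(f"?{v}" for v in spec["select"])
        triples = " .\n  ".join(
            f'{p["subject"]} {p["predicate"]} {p["object"]}' for p in spec["patterns"]
        )
        limit = f"\nLIMIT {spec['limit']}" if "limit" in spec else ""
        return f"SELECT {select} WHERE {{\n  {triples} .\n}}{limit}"

    print(json_to_sparql(json.loads(EXAMPLE_JSON)))
    # SELECT ?item WHERE {
    #   ?item wdt:P31 wd:Q146 .
    # }
    # LIMIT 10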
Frankly, the problem was not in producing a correctly formed SPARQL query; the issue was the myriad of entity types and relations that the Wikidata project uses and that you need to know in detail to use the system, which often just produces nothing or times out (from the web UI). Not having any previous experience with Wikidata (or SPARQL, for that matter), I found it trickier than expected.
Interesting and good to know. The way I've handled similar kinds of things before (for SQL) is: if you have a list of predicates or entity types you need to pick from, you can populate an enum in a JSON schema. If the list is way too long (it probably is), you can take the K nearest neighbors of the entity types to the natural-language query and populate the enum with those K values (where K might be something reasonable like 10 to 20); a rough sketch is below. You'd need to have a dataset of all entity types stored locally to do this, of course.

I know this sounds hand-wavy, but I have had good results doing similar things when trying to pick the correct foreign key out of a table with 1000s of rows.
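To make it a bit less hand-wavy, here's roughly what I mean in code. This is a sketch, not production code: it assumes the OpenAI embeddings endpoint (any embedding model would do), a locally stored list of entity types, and the helper names are made up.

    import numpy as np
    from openai import OpenAI

    client = OpenAI()

    def embed(texts: list[str]) -> np.ndarray:
        resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
        return np.array([d.embedding for d in resp.data])

    def top_k_entity_types(question: str, entity_types: list[str],
                           type_vectors: np.ndarray, k: int = 15) -> list[str]:
        """Pick the K entity types closest to the natural-language query (cosine similarity)."""
        q = embed([question])[0]
        sims = type_vectors @ q / (np.linalg.norm(type_vectors, axis=1) * np.linalg.norm(q))
        return [entity_types[i] for i in np.argsort(-sims)[:k]]

    def schema_with_enum(candidates: list[str]) -> dict:
        """JSON schema whose 'entity_type' field is constrained to the K candidates."""
        return {
            "type": "object",
            "properties": {"entity_type": {"type": "string", "enum": candidates}},
            "required": ["entity_type"],
            "additionalProperties": False,
        }

    # One-time setup (hypothetical helper):
    # entity_types = load_local_entity_types(); type_vectors = embed(entity_types)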

Have you done this and had it working reliably for you in practice?
Not for SPARQL, no. But I have done it for other query DSLs like SQL and OpenSearch.
Fair enough. Any particular tricks that helped you get valid queries out of it the first time?
Think of it like this: don't ask it to construct the query, ask it for the parts of the query, in a predefined format. Then populate one of many pre-defined templates with the values.

For example, you might have ten types of queries. Each query will have parameters and a classification/description. Get the classification, and get the parameters, and stick them into your respective template.

This gives you way more safety and predictability.
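A minimal sketch of that "parts, not the query" pattern (the template names and slots here are invented for illustration): the model only ever returns a classification plus parameters, and the query text itself lives in vetted templates.

    from string import Template

    # Hand-written, pre-vetted query templates (illustrative only).
    TEMPLATES = {
        "count_by_type": Template(
            "SELECT (COUNT(?x) AS ?n) WHERE { ?x wdt:P31 wd:$type_qid . }"
        ),
        "list_by_type": Template(
            "SELECT ?x WHERE { ?x wdt:P31 wd:$type_qid . } LIMIT $limit"
        ),
    }

    def build_query(llm_output: dict) -> str:
        """llm_output is the structured response from the model, e.g.
        {"classification": "list_by_type", "parameters": {"type_qid": "Q146", "limit": "20"}}.
        The model never writes SPARQL directly; it only picks a template and fills the slots."""
        template = TEMPLATES[llm_output["classification"]]
        return template.substitute(**llm_output["parameters"])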

I've gone through the whole gamut to do this since first using the GPT's API in early 2023.

At first I was asking for HTML with few-shot prompting, since getting reliable JSON was impossible. Then I switched to function calling with JSON mode when that was released later in 2023, but still with fallbacks, because sometimes it would spit out markdown with JSON inside, and other fun mistakes. Finally, in summer 2024, strict mode became available, which "guarantees" a well-formed JSON response corresponding to a given schema. I've run hundreds of thousands of queries through that and have not had a formatting issue (yet).
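For reference, this is roughly what the strict-mode call looks like as of the 2024 API (parameter names may have shifted since, so treat it as a sketch); the schema is the same kind of "classification plus parameters" shape as above.

    from openai import OpenAI

    client = OpenAI()

    # Strict mode requires every property to be listed in "required"
    # and "additionalProperties": false on every object.
    schema = {
        "type": "object",
        "properties": {
            "classification": {"type": "string", "enum": ["count_by_type", "list_by_type"]},
            "parameters": {
                "type": "object",
                "properties": {
                    "type_qid": {"type": "string"},
                    "limit": {"type": "string"},
                },
                "required": ["type_qid", "limit"],
                "additionalProperties": False,
            },
        },
        "required": ["classification", "parameters"],
        "additionalProperties": False,
    }

    response = client.chat.completions.create(
        model="gpt-4o-2024-08-06",
        messages=[{"role": "user", "content": "How many cats are there in Wikidata?"}],
        response_format={
            "type": "json_schema",
            "json_schema": {"name": "query_parts", "strict": True, "schema": schema},
        },
    )
    # response.choices[0].message.content parses against the schema (or the model refuses).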

Very smart approach, thanks
That's exactly one of my projects: to use that on my medical Anki flashcards, as well as on my many medical PDFs. I'm sure there's a good way to do RAG on it that keeps the answers sourced.

I intend to eventually add it to my sophisticated RAG system, wdoc (https://wdoc.readthedocs.io/en/latest/).

OpenCyc always had the failing that, hidden away as it was, it never felt accessible. GraphRAG for OpenCyc, maybe?
I don't understand why you need OpenCyc for this; you could just chain LLMs for both.

I think it would outperform on any language task, as language that isn't formal requires interpretation, which LLMs excel at.

> Is anyone playing with the combination of generative AI and OpenCyc?

OpenCyc specifically? No. But related semantic KBs using the Semantic Web / RDF stack? Yes. That's something I'm spending a fair bit of time on right now.

And given that there is an RDF encoding of a portion of the OpenCyc data out there[1], and that it may make it into some of my experiments eventually, I guess the answer is more like "Not exactly, but sort of, maybe'ish" or something.

[1]: https://sourceforge.net/projects/texai/files/open-cyc-rdf/

Microsoft published quite a bit on the notion of graph RAG, which is about enhancing regular RAG (retrieval-augmented generation) with graph databases. The idea is that instead of just pulling in information semantically related to the query, it also pulls in information about things connected to those things. This gives the LLM more contextual information to work with.
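Roughly, the retrieval step looks something like this (this is just the general "expand to neighbors" idea, not Microsoft's actual GraphRAG code; it assumes a knowledge graph already loaded into networkx and some vector search that returns the initially matched entities):

    import networkx as nx

    def expand_context(graph: nx.Graph, seed_entities: list[str], hops: int = 1) -> set[str]:
        """Start from the entities matched by plain semantic search and pull in
        everything within `hops` edges of them, so the LLM also sees facts about
        related things rather than only the directly matched ones."""
        context = set(seed_entities)
        frontier = set(seed_entities)
        for _ in range(hops):
            next_frontier = set()
            for node in frontier:
                next_frontier.update(graph.neighbors(node))
            context |= next_frontier
            frontier = next_frontier
        return context

    # seeds = vector_search(query)           # hypothetical: plain RAG retrieval
    # facts = expand_context(kg, seeds)      # graph-augmented context for the prompt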

Sounds like it would probably help. If you combine that with entity recognition and automated graph construction from unstructured data, you might get something that is vaguely useful.

Combine it with Prolog to prove the reasoning.
> If you combine that with entity recognition and automated graph construction from unstructured data, you might get something that is vaguely useful.

I think that's included in the GraphRAG project released by MS.

I'll be the antiquated person here. Writing/speaking well has brought people to knowledge because the authors are uniquely positioned to use genius, humour, gentleness and generosity to bring an inquisitive but ignorant person into a new area of knowledge. When the value of that can be quantified, then we can compare AI "generation" of "efficiently written, useful knowledge" with what we had/have.

Same for fiction, visual art, raising children, caring for old people, and so on, and so on.

With big enough cohorts we can A/B test by quantifying outcomes? Treatment group A has AI-generated texts and treatment group B has the originals. We can have a questionnaire for how each group felt about the material etc., but we can also perhaps measure life outcomes over a longer period, e.g. performance at school or in the marketplace?

As I write this it feels naive and reminds me of a thousand ill-thought-out A/B website tests etc. But still :D

Exactly how long are you planning on running these tests for? Sounds like a minimum of 20 years.
Yeap :)

There are plenty of very long-term studies going on where groups are identified and then followed over years or decades.

Another approach is that today you might go looking for people with different levels of airborne lead exposure as a child, and then compare their average income today to see if there is a correlation etc. In these kinds of studies the treatment groups weren't wittingly part of the experiment, but you can look backwards.

Either way, in 20 years we'll probably be able to identify schools where AI texts predominated earlier than other schools and, adjusting for other factors, try and tease out the actual impact of that choice.

I mean, the thing is, ethically speaking, there is a strong supposition that an instructional format where AI predominates will underperform the other formats; therefore you are essentially putting large groups of people, on purpose, into learning situations that the theory says will leave them underperforming for the rest of their lives.

That seems like a bad thing.

I suppose that it will happen anyway, however, and the backwards-looking study will be what is done to determine the result.

Unsure specifically, but there's a long-standing movement to combine GOFAI symbolic approaches and modern neurally-influenced systems. https://en.wikipedia.org/wiki/Neuro-symbolic_AI
At the risk of sounding dumb: I don't understand what the practical applications of OpenCyc are. I get LLMs: you can ask questions and they'll answer, they can write an article, they can summarize documents… what are the practical applications of OpenCyc?
OpenCyc is the openly released portion of Cyc, which was based on the idea that symbolic AI (knowledge graphs / semantic networks) would lead to AGI through scaling the knowledge base. You formalize the world so you can use logic to reason about it.

Later approaches to the same idea, coming out of the academic databases community, reinvented this particular wheel with far better PR and branded it the Semantic Web, with 'ontologies' and RDF, OWL and their ilk.

While reasonable (pun intended) for small vertical domains, the approach has never made inroads into broader general intelligence, as IMHO it does not deal well with ambiguities, contradictions, multi-level or multi-perspective modeling, and circular referential meaning in 'real world' reasoning, and it also tends to ignore agentive, transformative and temporal situatedness.

Its ideal seems to be a single-truth, never-changing model of the universe that is simply accepted by all.

All true in general, but allow me to add that there has been work on incorporating probabilistic / "soft" reasoning into the Semantic Web / RDF / OWL world. For some time now there has been PR-OWL (Probabilistic OWL)[1], and the recent work on RDF* (RDF-STAR)[2][3] emphasizes its application in terms of being able to - among other things - do stuff like adding weights (confidence scores, fuzzy probabilities, what-have-you) to RDF assertions. So the need to pursue these paths is understood, although I suppose one can argue that progress has been slow and painstaking.

[1]: https://www.pr-owl.org/

[2]: https://www.w3.org/2021/12/rdf-star.html

[3]: https://w3c.github.io/rdf-star/UCR/rdf-star-ucr.html

One of the issues is probably that the weights in fuzzy or probabilistic relationships or properties are rather context-dependent, so they probably still more or less have the same problems as all general-purpose knowledge graphs: it's exceedingly difficult to explicitly model relationships so that they'd be both broadly generalizable and detailed enough to be useful for non-trivial reasoning.

I'm also not sure that fuzzy or probabilistic properties automatically translate into reasonable transitive properties or reasoning even if the individual weights are reasonable.

(Fuzzy logic is of course exactly about formal logic and reasoning in non-absolute terms, but the idea has been around for a long time and AFAIK largely superseded by probability.)

Not that I have any deeper idea about recent work in that area. I did a couple of years' stint in semantic web stuff back in the day, and weighted relationships were one of the obvious ideas for dealing with the rigidity of explicit relationships. They also came with obvious problems and at least back then my impression was that the idea wasn't actually as useful as it initially sounded.

But as I said, I haven't really been following the field in years, so there might have been some useful developments.

Cyc apparently addresses this issue with what are termed "microtheories" - in one theory something can be so, and in a different theory it can be not so: https://cyc.com/archives/glossary/microtheory/
Formalized knowledge representation, basically.
Hmm… maybe we could train/tune a model on symbolic logic, similar to or even using CycL instead of Python, and then when we have it "write code" it would be solving the problem we want it to think about, using symbolic logic?

You might be on to something here. The problem being that there aren't billions of tokens' worth of CycL out there to train on - or is there?

Haven't LLMs simply obsoleted OpenCyc? What could introducing OpenCyc add to LLMs, and why wouldn't allowing the LLM to look up Wikipedia articles accomplish the same thing?
LLMs have just ignored the fundamental problem of reasoning: symbolic inference. They haven't "solved" it, they just don't give a damn about logical correctness.
logical correctness as in formal logic is a huge step down

LLMs understand context and meaning and genuine intention

Cyc apparently addresses this issue with what are termed "microtheories" - in one theory something can be so, and in a different theory it can be not so:

https://cyc.com/archives/glossary/microtheory/

> A microtheory (Mt), also referred to as a context, is a Cyc constant denoting assertions which are grouped together because they share a set of assumptions

This sounds like a world model with extra steps, and a rather brittle one at that.

How do you choose between two conflicting "microtheories"?

I’m not familiar with Cyc/OpenCyc, but it seems that it’s not just a knowledge base, but also does inference and reasoning - while LLMs don’t reason and will happily produce completely illogical statements.
Such systems tend to be equally good at producing nonsense: mainly because it's really hard to make a consistent set of 'facts', and once you have inconsistencies, creative enough logic can produce nonsense in any part of the system.
Can you please give an example of a “completely illogical statement” produced by o1 model? I suspect it would be easier to get an average human to produce an illogical statement.
Give it anything that sounds like a riddle, but isn't. Just one example:

> H: The surgeon, who is the boy's father, says "I can't operate on this boy, he's my son!" Who is the surgeon of the boy?

> O1: The surgeon is the boy’s mother.

Also, just because humans don't always think rationally doesn't mean ChatGPT does.

Haha, you are right, I just asked Copilot, and it replied this:

> This is a classic riddle! The surgeon is actually the boy's mother. The riddle plays on the assumption that a surgeon is typically male, but in this case, the surgeon is the boy's mother.

> Did you enjoy this riddle? Do you have any more you'd like to share or solve?

Ha, good one! Claude gets it wrong too, except for apologizing and correcting itself when questioned:

"I was trying to find a clever twist that isn't actually there. The riddle appears to just be a straightforward statement - a father who is a surgeon saying he can't operate on his son"

More than being illogical, it seems that LLMs can be too hasty and too easily attracted by known patterns. People do the same.

It's amazing how well these canned apologies work at anthropomorphising LLMs. It wasn't really haste; it simply failed because the nuance fell below the noise in its training data, and you rectified it with your follow-up correction.
Well, first of all it failed twice: first it spat out the canned riddle answer, then once I asked it to "double check" it said "sorry, I was wrong: the surgeon IS the boy's father, so there must be a second surgeon..."

Then the follow up correction did have the effect of making it look harder at the question. It actually wrote:

"Let me look at EXACTLY what's given" (with the all caps).

It's not very different from a person who decides to focus harder on a problem after being fooled by it a couple of times because it is trickier than it seems. So yes, surprisingly human, with all its flaws.

But the thing is, it wasn't trickier than it seemed. It was simply an outlier entry, like the flipped-tortoise question that tripped up the android in the Blade Runner interrogation scene. It was not able to think harder without your input.
Grok gives this as an excuse for answering "The surgeon is the boy's mother." :

<<Because the surgeon, who is the boy's father, says, "I can't operate on this boy, he's my son!" This indicates that there is another parent involved who is also a surgeon. Given that the statement specifies the boy's father cannot operate, the other surgeon must be the boy's mother.>> Sounds plausible and on the first read, almost logical.

Easy, from my recent chat with o1 (asked about the left null space):

> these are the vectors that when viewed as linear functionals, annihilate every column of A. <…> Another way to view it: these are the vectors orthogonal to the row space.

It’s quite obvious that vectors that “annihilate the columns” would be orthogonal to the column space, not the row space.
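For reference, the fact in question, written out: the left null space of A is the null space of A^T, which is the orthogonal complement of the column space (not the row space). In LaTeX terms:

    y \in N(A^{T}) \iff y^{T}A = 0^{T} \iff y^{T}a_j = 0 \text{ for every column } a_j \iff y \perp \operatorname{Col}(A)

    \text{hence } N(A^{T}) = \operatorname{Col}(A)^{\perp}, \text{ while } \operatorname{Row}(A)^{\perp} = N(A) \text{ (the ordinary null space).}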

I don’t know if you think o1 is magic. It still hallucinates, just less often and less obviously.

average humans don't know what "column spaces" are or what "orthogonal" means
Average humans don't (usually) confidently give you answers to questions they do not know the meaning of. Nor would you ask them.
Ah hum. The discriminant is whether they know that they don't know. If they don't, they will happily spit out whatever comes to their mind.
Sure, average humans don’t do that, but this is Hacker News, where it’s completely normal for commenters to confidently answer questions and opine on topics they know absolutely nothing about.
And why would the "average human" count?!

"Support, the calculator gave a bad result for 345987*14569" // "Yes, well, also your average human would"

...That's why we do not ask "average humans"!

"On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."

So the result might not necessarily be bad, it's just that the machine _can_ detect that you entered the wrong figures! By the way, the answer is 7.

The average human matters here because the OP said:

> Can you please give an example of a “completely illogical statement” produced by o1 model? I suspect it would be easier to get an average human to produce an illogical statement.

> because the OP said

And the whole point is nonsensical. If you discussed whether it would be ethically acceptable to canaries it would make more sense.

"The database is losing records...!" // "Also people forget." : that remains not a good point.

Because the cost-competitive alternative to LLMs is often just ordinary humans.
Following the trail as you did originally: you do not hire "ordinary humans", you hire "good ones for the job"; going for a "cost competitive" bargain can be suicidal in private enterprise and criminal in public ones.

Sticking instead to the core matter: the architecture is faulty, unsatisfactory by design, and must be fixed. We are playing with the partial results of research and getting some results, even some useful tools, but it must be clear that this is not the real thing - especially since this two-plus-year-old boom brought another horribly ugly cultural degradation ("spitting out prejudice as normal").

I interpreted the OP's argument to be that:

> For simple tasks where we would alternatively hire only ordinary humans, AIs have similar error rates.

Yes, if a task requires deep expertise or great care, the AI is a bad choice. But lots of tasks don't. And for those kinds of tasks, even ordinary humans are already too expensive to be economically viable.

Sorry for the delay. If you are still there:

> But lots of tasks

Do you have good examples of tasks in which dubious verbal output could be an acceptable outcome?

By the way, I noticed:

> AI

Do not confuse LLMs with general AI. Notably, general AI was also implemented in systems where critical failures would be intolerable - i.e., made to be reliable, or part of a finally reliable process.

Yes, lots of low-importance tasks. E.g. assigning a provisional filename to an in-progress document.

Checking documents for compliance with a corporate style guide

LLMs don't know what is true (they have no way of knowing that), but they can babble about any topic. OpenCyc contains 'truth'. If they can be meaningfully combined, it could be good.

It's the same as using an LLM for programming: when you have a way to evaluate the output, it's fine; if not, you can't trust the output, as it could be completely hallucinated.

No, they are completely orthogonal.

LLMs are likelihood completers and classifiers. OpenCyc brings some logic and rationale into the classifiers. Without that rationale, LLMs will continue hallucinating, spitting out nonsense.

if anyone is interested:

https://2ro.co/post/768337188815536128

(EZ - a language for constraint logic programming)

How would you phrase that question in OpenCyc?
Probably, but the bitter lesson still applies.
Sometimes I think projects like Cyc are like the 3n+1 problem for AI. It's so alluring.
Isn’t Cycorp?