This is the quality that should be studied and developed in the symbolic AI approach. However, the actual symbolic AI work I know of seems to fall into one of two buckets: 1. "Let's solve a mathematical problem (e.g. winning at chess) and say that the solution is AI" (because humans can play chess, and now computers can too!) 2. "Let's make something like Prolog but with a different solver algorithm / knowledge representation". Products like Cyc and Wolfram seem to work essentially in this manner, although with lots of custom coding for specific cases to make them practical. There's also plenty of work on separate aspects of this, like temporal and other modal logics.
I see the first bucket as just applied maths, not really AI. The second bucket is actually aimed at general reasoning, but the approaches and achievements in it seem somewhat uninspiring, though maybe that's just because I don't know many of them.
So my broad question is: what is happening in such "logical AI" research/development in general? Are there any buckets I missed in the description above, or maybe my description is wrong to begin with? Are there any approaches that seem promising, and if so, how and why?
I would be grateful for suggestions of the books/blogs/other resources on the topic as well.
- Birds and mammals are inherently able to count in almost any context because they understand what numbers actually mean; GPT-4 can only be trained to count in certain contexts. GPT-4 would be like a pigeon that could count apples but not oranges, whereas biological pigeons can count anything they can see, touch, or hear. There's a profound gap in true quantitative reasoning, even if GPT-4 can fake this reasoning on specific human math problems.
- Relatedly, birds and mammals are far faster at general pattern recognition than GPT-4, unless it has been trained to recognize that specific pattern.
- Birds and mammals can spontaneously form highly complex plans; GPT-4 struggles with even the simplest plans, unless it has been trained to execute that specific plan.
The "trained to do that specific thing" is what makes GPT-4 so much dumber than warm-blooded vertebrates. When we test the intelligence of an animal in a lab, we make sure to test them on a problem they've never seen before. If you test AI like you test an animal, AI looks incredibly stupid - because it is!
There was a devastating paper back in 2019[1] proving that Google's BERT model - which at the time was world-class at "logical reasoning" - was entirely cheating on its benchmarks. And another paper from this year[2] demonstrates that LLMs definitely don't have "emergent" abilities, AI researchers are just sloppy with stats. It is amazing how much bad science and wishful thinking has been accepted by the AI community.
The purpose of the subject is, roughly speaking, to exhaustively characterize all types of reliable reasoning which can be carried out efficiently - some people say they are searching for "a logic for P". The techniques used are a mix of ideas from model theory, universal algebra, Ramsey theory, and computer science. Given the ridiculously ambitious scope of the project, I think the rate of progress (especially in the past few years) is astounding.
I am unaware of anyone who can reason to any serious depth without some version of a whiteboard, whether paper, computational, or actual.
This doesn’t seem like a particularly challenging thing to add to current shallow (but now quite wide) reasoning models.
Imagine how fast you could think if you had a mentally stable whiteboard that you could perceive as clearly as anything you see, and update as fast as you can think of the changes.
Our brains have probably been tragically speed-limited by our slow vocal & finger speeds for some time.
That will take AIs to a wide AND deep reasoning level far beyond us very quickly.
Now add mental file cabinets and an AI could trivially keep track of many goals and its progress on them. Again, not likely to be a huge challenge to add.
Now, given all that long term reasoning ability, let the AI manage instances of itself working across all the problems with speed adjusted for priority & opportunity.
Finally, have the model record every difficult problem it solved, so its fast, wide (non-whiteboard) abilities can be tuned, moving up level after level. Occasionally do a complete retraining on all data and problem-solution pairs. Again, straightforward scaling.
Along every new dimension they scale, they quickly surpass us & keep improving.
At this point, IMHO, anyone pessimistic about AI has expectations far behind the exponential curve we are in. Our minds constantly try to linearize our experiences. This is the worst time in history to be doing that.
My basic understanding is that it combines "standard" supervised learning techniques (neural nets + SGD) with a set of logical requirements (e.g. in the case of annotating autonomous driving data, things like "a traffic light cannot be red and green at the same time"). The logical requirements not only make the solution more practically useful, but can also help it learn the "right" solution with less labelled data.
[1] I don't know if they had a NeurIPS paper about this; I was talking to the authors about the NeurIPS competition they were running related to this approach: https://sites.google.com/view/road-r/home
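To make that concrete, here's a rough sketch of the general idea (my own illustration in PyTorch, not their code; the multi-label head, class indices, and weighting are made up). The requirement "not red and green at the same time" becomes a differentiable penalty added to the usual supervised loss:

    import torch
    import torch.nn.functional as F

    def constraint_penalty(logits):
        # Hypothetical multi-label head: independent probabilities for
        # "red" (index 0) and "green" (index 1) on the same traffic light.
        probs = torch.sigmoid(logits)
        p_red, p_green = probs[:, 0], probs[:, 1]
        # Fuzzy encoding of NOT(red AND green): the product is 0 exactly
        # when the constraint is satisfied, so we penalize it directly.
        return (p_red * p_green).mean()

    def loss_fn(logits, labels, lambda_logic=0.5):
        # Ordinary supervised term on whatever labels we do have...
        supervised = F.binary_cross_entropy_with_logits(logits, labels)
        # ...plus the logic term, which needs no labels at all, so it can
        # also act on unlabelled or weakly labelled examples.
        return supervised + lambda_logic * constraint_penalty(logits)

The nice part is that the constraint term is unsupervised: it pushes the network away from label combinations that are logically impossible, even on data nobody has annotated.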
Oddly, back when “expert system shells” were cool, people thought 10,000 rules were difficult to handle; now 1,000,000 might not be a problem at all. Back then the RETE algorithm was still under development, and people were using linear search rather than hash tables to do their lookups.
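Roughly the difference, in toy form (illustrative only, nothing like a real RETE network): instead of scanning every rule against every new fact on every cycle, you index the rules by the predicate they match on, so a fact only touches the handful of rules that could possibly care.

    from collections import defaultdict

    # Naive matching: every new fact is tested against every rule.
    def match_linear(rules, fact):
        return [r for r in rules if r["predicate"] == fact[0]]

    # Indexed matching: bucket rules by predicate once, then each new
    # fact is a single hash lookup instead of a scan over all rules.
    def build_index(rules):
        index = defaultdict(list)
        for r in rules:
            index[r["predicate"]].append(r)
        return index

    def match_indexed(index, fact):
        return index.get(fact[0], [])

    rules = [{"name": "r1", "predicate": "temperature"},
             {"name": "r2", "predicate": "pressure"}]
    index = build_index(rules)
    print([r["name"] for r in match_indexed(index, ("temperature", 98.6))])  # ['r1']

With a million rules, the linear scan is a million predicate tests per fact; the indexed version is one lookup.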
Also https://github.com/Z3Prover/z3
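For anyone who hasn't played with it, the z3-solver Python bindings make it easy to try; a tiny example of the flavor (my own, not from the repo):

    from z3 import Int, Solver, sat

    x, y = Int("x"), Int("y")
    s = Solver()
    s.add(x + y == 10, x > y, y >= 1)

    if s.check() == sat:
        m = s.model()
        print(m[x], m[y])  # one satisfying assignment, e.g. 9 1

You state the constraints declaratively, and the solver either hands back a model or tells you none exists.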
Note “the semantic web” is both an advance and a retreat, in that OWL is a subset of first-order logic which is actually decidable and sorta kinda fast. It can do a lot, but people aren’t really happy with what it can do.
Formal correctness is drastically different from “actual reasoning”.