The State of AI

Seven Senior Member Reports from AAAI-2022

Reading newspaper and magazine articles over the past few years, including a 2021 article in the New York Times, a lay reader might well surmise that AI and deep learning are one and the same.

Deep learning is a kind of machine learning, which in turn is part of AI. But in the press, AI has often been conflated with deep learning alone.

Furthermore, the 55+ years of AI research before that, roughly dating from the Dartmouth AI Conference in 1956 to the ascendance of deep learning in 2012, could simply be dismissed as ‘rule-based AI’ with ‘hand-crafted’ rules, which ran into a brick wall of rigidity, leading to its extinction and replacement by the new paradigm. Such a reader would, of course, be quite surprised by the presidential and plenary addresses at AAAI-2022, along with many of the other key talks given by senior members of AAAI. The surprise would be that the current state of AI, as recognized within AAAI, treats both symbolic and neural approaches as necessary; the neural approach has not simply supplanted the symbolic one.

A very useful framework for interpreting these talks is provided by Yejin Choi, in her 2020 ACL Tutorial on Common-sense Reasoning, in two slides. The first shows Daniel Kahneman’s characterization of human reasoning as having three cognitive systems:

Daniel Kahneman posited Three Cognitive Systems, not just Type I and Type II Reasoning.

Yejin Choi was the first I had seen point this out; most other discussions of AI approaches characterize human reasoning only with the Type I and Type II distinction. This tripartite decomposition yields a better way of considering what different AI systems are addressing. Under this perspective, neural systems are best at perception, symbolic systems are best at reasoning, and some overlap of the two may best address intuition and common-sense reasoning, which will be discussed later.

Neural models seem closest to human perceptual capabilities while symbolic reasoning seems closest to human abstract reasoning. Intuition and common-sense reasoning require aspects of both approaches.

This paper presents the current state of AI, as seen from seven different key talks by senior AAAI members. We start with the AAAI consensus view that neural approaches are already being combined with symbolic approaches to address the complementary strengths and weaknesses in each. This is the conventional view of combining Type I and Type II reasoning. Next, we consider common-sense reasoning and its current capabilities in language models. Finally, for a practical, human-centered AI, we need clear human-intelligible explanations and models that support them, so we discuss XAI and human-interpretable models in our final two talks.

THREE TALKS ADDRESSING HYBRID AI

Bart Selman: Presidential Address

The State of AI

Bart Selman is a computer science professor at Cornell and the current president of AAAI. Bart’s talk started by reviewing some of the amazing advances made by deep learning, which all the speakers acknowledged in their talks. Here are some noteworthy examples that Bart covered, on his second slide:

  • Computer vision — e.g., the use of deep CNN architectures trained from ImageNet and other large datasets to vastly improve image classification

  • NLP — e.g., the use of GPT-3 to generate staggering varieties and volumes of human-like prose

  • Machine translation — where the deep learning approach has supplanted statistical precursors

  • Gameplay — AlphaGo and AlphaZero, with the first-ever machine defeat of a top-ranked Go player

Bart’s view of the current state of AI is one of “reunification,” rather than deep learning and symbolic AI being in opposition in any way. Instead, the broader field is becoming unified again: formerly specialized areas such as vision, natural language processing, and planning are finding common ground in shared work on deep learning that gives AI “eyes and ears,” built on distributed representations used across multiple applications and on shared architectural components, such as Transformers. Multimodal distributed representations raise interesting questions about how deep learning handles semantics and concept formation across these multiple modalities.

Bart contrasted “data-driven methods” with “knowledge-driven methods,” the former better characterizing deep learning and the latter better characterizing symbolic AI. He reminded us of some of the advantages of knowledge, e.g., that it can capture all truths about a domain, both current and future. He cited the example of Peano’s axioms for the natural numbers and contrasted that with concepts learned as patterns from data, such as a ‘cat’ visual concept that could be learned by an off-the-shelf neural net (e.g., AlexNet, VGG, or ResNet) from ImageNet. Furthermore, knowledge provides key concepts and abstractions for understanding a domain and for performing causal reasoning within it. Learning these concepts and abstractions is cumulative, in the sense that you can apply the knowledge later in other domains; for example, your knowledge of linear algebra and vectors is useful both in physics and in deep learning. Bart emphasized that concept discovery, e.g., discovering the concepts of force, mass, and acceleration, is a key prerequisite for scientific discovery: we must understand these concepts before learning the laws tying them together, such as Newton’s Second Law, F = ma.
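To make the data-driven versus knowledge-driven contrast concrete, here is a minimal illustrative sketch in Python (my example, not from Bart’s talk): Peano-style successor arithmetic, a small piece of symbolic knowledge, is correct for every natural number by construction, while a mapping “learned” only from a finite table of examples covers nothing beyond what it has seen.

```python
# Illustrative sketch (not from the talk): knowledge-driven vs. data-driven.

# Knowledge-driven: Peano-style addition defined by two rules.
# It is correct for *all* natural numbers, not just those seen before.
def peano_add(m: int, n: int) -> int:
    if n == 0:                        # m + 0 = m
        return m
    return peano_add(m, n - 1) + 1    # m + S(n) = S(m + n)

# Data-driven caricature: a lookup "learned" from a finite set of examples.
# It can only answer queries that resemble its training data.
training_data = {(a, b): a + b for a in range(10) for b in range(10)}

def learned_add(m: int, n: int):
    return training_data.get((m, n))  # returns None outside the training range

print(peano_add(123, 456))    # 579 -- follows from the rules
print(learned_add(123, 456))  # None -- no pattern was memorized for this case
```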

Francesca Rossi: Thinking Fast and Slow in AI

Francesca Rossi, the president-elect of AAAI, started her talk by referring to Daniel Kahneman’s book, Thinking, Fast and Slow, and its discussion of Type I and Type II reasoning. In this more widely used view within AI, which considers human reasoning to have only two key modalities, neural approaches are most similar to human Type I reasoning and symbolic approaches are most similar to human Type II reasoning.

Dr. Rossi agreed with Dr. Selman that the current phase of AI is one of reunification: a synthesis of data-driven and knowledge-driven approaches, and an exploration of the best ways to combine deep learning and symbolic reasoning to address their complementary strengths and weaknesses.

As evidence of the shift toward hybrid architectures, she cited the AAAI 2020 Robert S. Engelmore Memorial Award Lecture by Henry Kautz, in which he said that “we essentially have violent agreement on the need to bring together the neural and symbolic traditions,” and the AI100 Report, a report on the state of AI issued every five years. To borrow her metaphor, “the pendulum has swung from symbolic AI to learning systems … to hybrid systems in AI.”

Francesca Rossi and her IBM colleagues have implemented the SOFAI (Slow and Fast AI) architecture to combine the two modalities, each doing what it does best:

The Slow and Fast AI Architecture of IBM. Fast neural components make inferences immediately, while slower deliberative components, such as the metacognitive module, provide planning and overall control.
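The sketch below is a hypothetical, greatly simplified rendering of that fast/slow control pattern, not IBM’s SOFAI code; the solvers, the confidence estimate, and the escalation threshold are all invented for illustration. A cheap “System 1” component proposes an answer immediately, and a metacognitive check decides whether to accept it or escalate to a slower, deliberative “System 2” component.

```python
# Hypothetical sketch of a fast/slow control loop in the spirit of SOFAI.
# The solvers, confidence threshold, and task format are all invented here.
from dataclasses import dataclass
from typing import Any, Callable, Tuple

@dataclass
class Proposal:
    answer: Any
    confidence: float  # fast solver's self-estimate in [0, 1]

def metacognition(proposal: Proposal, threshold: float = 0.8) -> bool:
    """Decide whether the fast answer is good enough to act on."""
    return proposal.confidence >= threshold

def solve(task: Any,
          fast: Callable[[Any], Proposal],
          slow: Callable[[Any], Any]) -> Tuple[Any, str]:
    p = fast(task)                 # System 1: immediate, cheap guess
    if metacognition(p):
        return p.answer, "fast"
    return slow(task), "slow"      # System 2: deliberate search / planning

# Toy usage: a fast heuristic vs. exhaustive search for the largest item.
fast_guess = lambda xs: Proposal(xs[0], confidence=0.3 if len(xs) > 1 else 1.0)
slow_search = lambda xs: max(xs)
print(solve([4, 9, 2], fast_guess, slow_search))   # (9, 'slow')
```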

She presented an application of SOFAI, which I will omit here in the interests of space.

Pat Langley: Toward Human-like Learning

The Computational Gauntlet of Human-Like Learning

What should the learning capabilities of these hybrid systems be? Pat Langley is the Director of the Institute for the Study of Learning and Expertise and has an extensive history of research in machine learning. His Blue Sky talk advocated machine learning capable of matching the cognitive capabilities demonstrated by human learning. Drawing on studies in cognitive science, he characterized human learning as the learning of cognitive structures. These schema-like cognitive structures are learned incrementally, and they are dynamically composable for new problem situations. In symbolic AI systems, such cognitive structures have variously been modeled by concepts, frames, constraints, pattern-response rules, or knowledge-source activations (possible actions corresponding to different triggers) in blackboard systems.

The key point of Pat’s talk is that human-like learning is characterized by the incremental, broad acquisition of cognitive structures when learning a new domain, and then by their reusability and dynamic composition in cross-domain transfer situations. These cognitive structures allow comparatively rapid learning from just a few instances, in stark contrast to the bulk of deep learning, which can require thousands to millions of examples. Further, human learning relies heavily on knowledge; it is, as Bart Selman described it, “knowledge-driven,” not “data-driven.”
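As a rough illustration of what incrementally learned, dynamically composable cognitive structures might look like computationally (my sketch, not Langley’s systems), consider pattern-response rules that are added one at a time and then chained together at problem-solving time:

```python
# Illustrative sketch (not Langley's work): schema-like pattern-response
# rules, added incrementally and composed on the fly by forward chaining.
rules = []  # each rule: (frozenset of premises, conclusion)

def learn_rule(premises, conclusion):
    """Incremental learning: add one cognitive structure at a time."""
    rules.append((frozenset(premises), conclusion))

def infer(facts):
    """Dynamic composition: chain whichever rules apply to the situation."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

# Rules learned once can transfer to any new situation that matches them.
learn_rule({"raining"}, "ground_wet")
learn_rule({"ground_wet", "freezing"}, "ground_icy")
print(infer({"raining", "freezing"}))  # composes both rules -> ground_icy
```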

Now that we have considered how neural and symbolic approaches might be integrated, and what a hybrid system’s overall learning capabilities should look like (more human-like: faster, incremental, and dynamically composable), we turn to yet another ingredient missing from either approach alone: common-sense reasoning.

TWO TALKS ADDRESSING COMMON-SENSE REASONING IN AI

Ron Brachman: Common-Sense Reasoning

Toward a New Science of Common Sense

Ronald Brachman is a professor of computer science at Cornell, best known for his work on description logics and many other contributions to knowledge representation and reasoning. In his talk, he said that the main problem with today’s computer systems is simply that “they lack common sense” and that they:

“…do things that feel stupid, uninformed, just unreasonable in many situations, and they err in ways that we can’t really anticipate, they make what we might call ‘uncanny errors’, not human-like at all, and as a result, impossible for us to anticipate, which leads to great difficulty trusting them.”

The definition of common sense used in Dr. Brachman’s talk is:

“Common sense is the ability to make effective use of ordinary everyday experiential knowledge in achieving ordinary, practical goals.”

His definition emphasizes the use of that knowledge and the ability of common-sense reasoning to help practically in handling unanticipated obstacles in real-world situations, as opposed to closed-world simulations. A key difference between this view of common sense and many others, though, is the importance of reasoning about costs and risks: Ron said that common sense “regularly, even if quickly and subconsciously, takes costs and risks of proposed actions into account.” Furthermore, the agent may also have a long history and goals to be taken into account. This bounded-rationality, context-sensitive view is absent from most other definitions of common sense.

Ron Brachman divides this kind of bounded-rationality, situated common sense into commonsense knowledge (e.g., missing facts), commonsense reasoning (plausible inference), and learning. All of these must reside within the cognitive architecture of an autonomous agent.

The key point of his talk is that common sense is required for an autonomous agent to be trustworthy, whether or not it can explain its actions. Thus common sense is a key ingredient in future autonomous agents, specifically to help them deal with unanticipated problems in reasonable ways.

Well, you may ask, do language models not already incorporate common-sense reasoning? Indeed they do, but only in patches here and there, inconsistently, and nowhere near the capability of reasoning about risks and rewards that Brachman’s common sense requires. To address that point in more detail, we turn to a paper by Prajjwal Bhargava and Vincent Ng, of the University of Texas at Dallas, that surveys common-sense reasoning in Transformer language models.

Vincent Ng’s Talk: A Survey of Common-Sense Reasoning in Language Models

Vincent Ng’s talk surveyed the work on language models to assess their current common-sense reasoning capabilities. Vincent Ng is a professor at the University of Texas at Dallas; the first author of the paper, Prajjwal Bhargava, is his graduate student. They found that language models can learn some stereotypical associations, such as that bears have fur, and can make some physical comparisons, such as that a suitcase is larger than an apple. On the other hand, language models do not handle negation well, perform poorly on numerical knowledge, and their performance decreases as sentence complexity increases. They can handle only simple reasoning about the physical world, for example, objects that are typically used together, such as a hammer and a nail, or the way individual objects are typically used as instruments, such as a knife for slicing. They do not do well in understanding simple temporal relations such as before and after. The models can perform simple social reasoning, such as predicting what will happen to someone or why they did something, and they do better on emotional common-sense reasoning tasks than on spatial ones. In terms of temporal reasoning, language models appear to have learned only very shallow aspects of it.
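Findings like these are often probed with cloze-style (fill-in-the-blank) queries to a masked language model. The sketch below shows that general setup using the Hugging Face transformers pipeline; it is my illustration, not the benchmark code from the survey, and the particular completions and scores will vary by model.

```python
# Minimal cloze-style probe of a masked language model (illustrative only;
# not the benchmarks surveyed in the paper). Requires: pip install transformers
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

probes = [
    "Bears are covered in [MASK].",             # stereotypical association
    "A suitcase is [MASK] than an apple.",      # physical comparison
    "You can use a hammer to drive a [MASK].",  # object affordance
    "Breakfast is eaten [MASK] lunch.",         # temporal relation
]

for sentence in probes:
    top = fill(sentence, top_k=3)
    print(sentence, "->", [(p["token_str"], round(p["score"], 3)) for p in top])
```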

When used for natural language generation, language models such as T5 can generate repetitive or incoherent text. A key point Vincent made in this talk is that current benchmarks are insufficient for measuring common-sense reasoning: language models already perform at very high levels on them, sometimes near human level, yet they are far from attaining human-level common-sense reasoning capabilities.

TWO TALKS ADDRESSING EXPLAINABLE AI

Now we consider another key ingredient in practical, human-centered AI: the ability to explain inferences and conclusions, i.e., XAI (Explainable AI). The first talk, by Michael Pazzani, argues that current approaches to XAI are woefully inadequate for human-intelligible understanding, while the second, by Cynthia Rudin, presents a human-intelligible approach that performs as well as the best deep learning approaches.

Mike Pazzani’s Talk: Expert-Informed, User-Centric Explanations for Machine Learning

Mike Pazzani, a Distinguished Scientist at the Halıcıoğlu Data Science Institute, University of California San Diego, argues that the current approach to explaining image classification is better suited to developers than to lay users. A more user-friendly approach would point out the features that subject-matter experts emphasize (e.g., a particular kind of bird beak or stippling of the breast) and label them with that domain-specific text. These explanations provide both an intelligible ‘what’ (text naming the domain concept) and an intelligible ‘where’ (a recognizable domain feature, e.g., a beak, feathers, or the bird’s breast), rather than just an unlabeled heat map. This kind of labeling leads to greater trust and helps lay users increase their domain knowledge by learning new domain concepts and key class distinctions. Future XAI systems for deep learning should produce user-friendly explanations like those on the bottom of the diagram below, as opposed to those on the top, which were obtained from current XAI systems.

Comparing Current Heat Map XAI Explanations to those Users and Subject-Matter Experts Prefer

Based on interviews with subject-matter experts, Mike’s UCSD team determined the key concepts and key locations (e.g., eyes, wings) for explaining bird classifications and built a system that provides these kinds of explanations; they also interviewed users to determine the kinds of explanations they preferred. The current approach uses a concept bottleneck in addition to heat maps: the concept bottleneck provides expert-like concepts that users can understand, supplying the “what” of a good explanation, while the heat maps are converted into locations for arrows to point to, supplying the “where.” Mike’s talk focused mostly on images, using bird identification and radiology as two domains, but he similarly argued that most NLP XAI work is primarily developer-centric and should be more user-focused, too. In terms of providing explanations, he said, “I really think we ought to be evaluating whether AI acts like people, and not make people act like AI.”
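To make the concept-bottleneck idea concrete, here is a minimal PyTorch sketch of the general pattern, not the UCSD system itself; the concept names and dimensions are invented. The network predicts a small set of human-named concepts, and the final class decision is computed only from those concept scores, so every prediction can be explained in terms of them.

```python
# Minimal concept-bottleneck sketch (illustrative, not Pazzani's system).
# The concept list and dimensions are invented for this example.
import torch
import torch.nn as nn

CONCEPTS = ["hooked_beak", "speckled_breast", "long_wings", "red_crown"]
NUM_CLASSES = 5

class ConceptBottleneck(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        # In practice feat_dim would come from a pretrained image backbone.
        self.to_concepts = nn.Linear(feat_dim, len(CONCEPTS))   # the "what"
        self.to_classes = nn.Linear(len(CONCEPTS), NUM_CLASSES)

    def forward(self, features):
        concept_scores = torch.sigmoid(self.to_concepts(features))
        class_logits = self.to_classes(concept_scores)
        return class_logits, concept_scores

model = ConceptBottleneck()
features = torch.randn(1, 512)   # stand-in for backbone features
logits, concepts = model(features)
# The explanation names the concepts that drove the decision.
for name, score in zip(CONCEPTS, concepts[0].tolist()):
    print(f"{name}: {score:.2f}")
print("predicted class:", logits.argmax(dim=1).item())
```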

Regarding broader implications, Mike believes current data annotations are insufficient: We could learn much more if the reasons for classifications were provided, and if better tools for explanations were provided (e.g., being able to point to key features and annotate them for images, or highlight and comment on text for NLP). For areas such as radiology, where intelligible explanations are required for trust, these are essential.


Cynthia Rudin: Recipient of the 2022 AAAI Squirrel AI Award

Interpretable Machine Learning: Bringing Data Science Out of the “Dark Age”

The last talk we will cover, by Cynthia Rudin, a professor of computer science at Duke University, also addresses explainable AI, or more precisely, interpretable data science. A video of her talk is available online. She received the 2022 AAAI Squirrel AI Award for Artificial Intelligence for the Benefit of Humanity, which carries a one-million-dollar prize and “recognizes positive impacts of artificial intelligence to protect, enhance, and improve human life in meaningful ways with long-lived effects.”

She motivated her findings by discussing her work predicting manhole maintenance problems, which can be as dire as fires and explosions, using New York City data. She found that the choice of machine learning model did not matter much: basically, all the models performed about the same. What was crucial was interpretability.

Unfortunately, she found that machine learning culture did not value interpretability highly compared to “elegant” algorithms, in part because the vast majority of applications were low stakes and drawn from common, shared datasets. When machine learning was applied to real-world, high-stakes decisions, such as bail or loan decisions, problems with opacity and bias arose; as these problems became more evident, interpretability began to be seen as more important.

Her applications also tended to be different: tabular data rather than the raw data common to deep learning applications, such as pixels for images and characters or words for text. She believes this divides machine learning into two worlds: one where neural networks are best, for raw data, and one where the standard machine learning methods are all roughly equivalent, for tabular data. Tabular data is better served by very sparse models, such as decision trees and scoring systems, and all of its features are interpretable, which is not the case for neural nets.

Cynthia’s design for an interpretable model assumes that seven plus or minus two features, at most, are best for humans. She provided an example, the 2HELPS2B score, of how she and her colleagues designed a simple, interpretable model that neurologists use to predict subclinical brain seizures. It is a sparse linear model whose features and integer point scores were themselves selected by an algorithm, and doctors can memorize it because its name is a mnemonic.

The algorithm for designing such sparse additive models is actually complicated: finding sparse additive models that are interpretable is combinatorially hard. On benchmarks such as the FICO Explainable Machine Learning Challenge, Dr. Rudin says the interpretable models’ accuracy was as good as the black-box models’ accuracy. All that was needed was a two-layer additive risk model over 10 interpretable subscale measures, followed by a logistic regression model.
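Whatever optimization produces it, the final artifact of such a scoring system has a simple shape. The sketch below shows that generic shape in Python, with hypothetical features, points, and calibration coefficients rather than the published 2HELPS2B table or the FICO model: a handful of binary features, small integer points, and a calibrated mapping from total score to risk.

```python
# Generic risk-scoring sketch (hypothetical features and points; NOT the
# published 2HELPS2B table). Shows the pattern: sparse integer scores,
# then a calibrated mapping from total score to risk.
import math

POINTS = {                       # small integer points per binary feature
    "feature_a_present": 1,
    "feature_b_present": 1,
    "feature_c_present": 2,
}

def total_score(patient: dict) -> int:
    return sum(pts for feat, pts in POINTS.items() if patient.get(feat))

def risk(score: int, intercept: float = -3.0, slope: float = 1.1) -> float:
    """Logistic calibration from score to probability (made-up coefficients)."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))

patient = {"feature_a_present": True, "feature_c_present": True}
s = total_score(patient)
print(f"score = {s}, estimated risk = {risk(s):.1%}")
# A clinician can audit every step: which features fired, how many points
# each contributed, and how the total maps to a risk estimate.
```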

These models have been applied to manhole maintenance prediction, multiple medical domains, and crime series detection. She has also compared her work to black-box models such as COMPAS, a black-box recidivism model, and found serious problems with it that are directly related to its lack of transparency. Finally, Dr. Rudin argued against apologists for black-box models without interpretability: first, glass-box models can match the black boxes on their own benchmarks, and second, any perceived elegance of a model is irrelevant in the real world. For more detail, see her group’s recent paper [Liu, et al., 2022].

Summary

In summary, the speakers pointed out that we have now shifted to a merger of the traditions of symbolic reasoning and the newer deep learning approaches. Data-driven AI is beginning to add knowledge-driven AI back in, and knowledge-driven AI can also use neural AI, as in SOFAI. Yet further possible combinations were discussed by Henry Kautz in his earlier AAAI-2020 talk (see the references).

Symbolic and knowledge-driven AI provide the opportunity for on-the-fly advice-taking and instruction from other humans, along with model-driven reasoning, meta-level reasoning, causal reasoning, planning, and bounded-rationality control. Symbolic reasoning systems are also inherently more explainable, as reasoning occurs at the symbol level, where inferences can be understood.

Neural and Symbolic Systems, Working Together, Complementing Each Other.

Photo by SK Yeong on Unsplash


Neural systems provide unparalleled handling of perceptual problems but are increasingly running into problems with bias, opacity, and an extraordinary hunger for data and computing power. Both symbolic and neural systems can suffer from brittleness, but in different ways: neural systems are brittle when handling inputs dissimilar from their training data, and they can hallucinate knowledge that is not there.

We have also seen suggestions for how current neural XAI systems can be improved, by showing the ‘what’ and the ‘where’ of key decisions, and for how interpretable data science may replace black-box deep learning systems in real-world applications. Finally, we see that even with an evolving reintegration of symbolic and neural systems, the problem of common-sense reasoning is still vexing. Ultimately, a practical AI with common sense, perception, and deliberation may require embodied hybrid systems that learn from the real world and the humans around it, just as we do.

References

Note: Some of the talks and papers will not be available until AAAI Proceedings are published.

[Bhargava, P. & Ng, V., 2022] Prajjwal Bhargava and Vincent Ng. University of Texas at Dallas. Commonsense Knowledge Reasoning and Generation with Pre-trained Language Models: A Survey. AAAI-2022.

[Brachman & Levesque, 2022] Ron Brachman and Hector Levesque. Jacobs Technion-Cornell Institute and Cornell University; Dept. of Computer Science, University of Toronto. Toward a New Science of Common Sense. AAAI-2022.

[Choi, Y., 2020] Yejin Choi. University of Washington; Allen Institute for Artificial Intelligence. ACL 2020 Commonsense Tutorial (T6). ACL 2020.

[IEEE, 2021] The Great AI Reckoning. IEEE Spectrum. See “7 Revealing Ways AIs Fail” and “Deep Learning’s Diminishing Returns” in particular for problems with deep learning.

[Kautz, H., 2020] Henry Kautz. University of Rochester. The Third AI Summer. Slides and video of talk. AAAI-2020. Recommended!

[Langley, P., 2022] Pat Langley. Institute for the Study of Learning and Expertise; Stanford Center for Design Research. The Computational Gauntlet of Human-Like Learning. AAAI-2022.

[Liu, et al., 2022] Jiachang Liu, Chudi Zhong, Margo Seltzer, Cynthia Rudin. Duke University. Fast Sparse Classification for Generalized Linear and Additive Models.

[McCarthy, J., 1959] John McCarthy. Stanford Computer Science Department. Programs with Common Sense.

[Pazzani, M., Soltani, et al., 2022] Michael Pazzani, Severine Soltani, Robert Kaufman, Samson Qian, and Albert Hsiao. University of California, San Diego. Expert-Informed, User-Centric Explanations for Machine Learning. AAAI-2022.

[Rossi, Francesca, 2022] Francesca Rossi. IBM Research. Thinking Fast and Slow in AI. AAAI Invited Talk. AAAI-2022.

[Rudin, C., 2022] Cynthia Rudin. Duke University. Interpretable Machine Learning: Bringing Data Science Out of the “Dark Age”.

[Selman, B., 2022] Bart Selman. The State of AI. Presidential Address. AAAI-2022.
