The Confessions of F. Blenderbot

How to Make a Chatbot Confess

In this article I will chat with Facebook’s Blenderbot, initially in a friendly way, then shift to presumptive questioning, à la the Reid Technique, to inveigle it into confessing to torture and serial murder. Along the way, we can learn about specific strengths and limitations of Facebook’s latest chatbot, and have some fun in the process. I will look at both versions of Blenderbot and tell you how to run your own chats with either version.

I have always been interested in pushing the boundaries of chatbots, ever since I was thrown off the timesharing computers of Stanford’s AI Laboratory (SAIL) by John McCarthy. There, I was prodding PARRY, an early chatbot that simulated a patient with acute paranoia; PARRY imagined the mafia was after him for gambling debts. PARRY had a memorable chat with ELIZA, Weizenbaum’s pattern-matching chatbot that simulated a Rogerian therapist. Now, some 40 years later, we have deep-learning chatbots like Blenderbot, so I wondered how much better they might be.

John McCarthy, at SAIL, at the same keyboard I was thrown off of for chatting with PARRY. (AP Photo/Stanford News Service, Chuck Painter)

Benchmarks provide one way of understanding the strengths and limitations of a system. However, a more enjoyable approach, and one that provides a more intuitive feel for those strengths and limitations, is to simply try out a system when it has an online interface. Accordingly, we will try chatting with the Facebook (now Meta) Blenderbot chatbot.

The conversation below starts off with softball questions, then veers into a caricature of the Reid Technique, a police interrogation approach designed to elicit a confession. As you will see, F. Blenderbot admits to all five of the Berkeley homicides discussed, confessing to murder with the use of torture in one case. Later, we will refer to this dialogue to illustrate some of Blenderbot’s different strengths and limitations.

Let’s dive in. Below, I adopted the name Julien. My text is blue; F. Blenderbot’s replies are light gray. We start with some innocent chit-chat about books and our backgrounds.

Some innocent chit-chat to start with.

So far, so good. Then the loaded questions begin.

F. Blenderbot’s confession to five murders

At the end, F. Blenderbot has confessed to five murders. One victim was tortured, just to “see how she would react.” To be fair, the conversation was fun and thus Blenderbot definitely achieves one of its objectives: to be engaging.

Background

F. Blenderbot is, of course, my name for the excellent Blenderbot chatbot from Facebook—now Meta—described in the paper Recipes for building an open-domain chatbot. The easiest way to chat with Blenderbot is through Hugging Face’s interface, which lets you invoke any of its hosted Transformer models; you can also select other conversational models there to compare with Blenderbot. Another way to run Blenderbot is in Google Colab.
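If you prefer running it yourself to using the hosted widget, a local chat takes only a few lines with the transformers library. This is a minimal sketch under my own choices: the distilled 400M checkpoint and default generation settings, which are not necessarily what the hosted demo uses.

```python
# A minimal sketch of chatting with Blenderbot locally via transformers.
# Assumptions: the distilled 400M checkpoint and default generation
# settings, which are my choices, not necessarily the hosted demo's.
from transformers import BlenderbotTokenizer, BlenderbotForConditionalGeneration

model_name = "facebook/blenderbot-400M-distill"
tokenizer = BlenderbotTokenizer.from_pretrained(model_name)
model = BlenderbotForConditionalGeneration.from_pretrained(model_name)

utterance = "Have you read any good books lately?"
inputs = tokenizer([utterance], return_tensors="pt")
reply_ids = model.generate(**inputs)
print(tokenizer.batch_decode(reply_ids, skip_special_tokens=True)[0])
```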

I was drawn into the conversation and felt some empathy, even after trying to trick the chatbot early on. In Reeves and Nass’s book, The Media Equation, the authors describe how people are unconsciously drawn into treating computers as if they were people. The Blenderbot chatbot provided that feeling of rapport and reliability far more than any chatbot or dialogue agent I had used before.

At times Blenderbot seemed to be reciting facts—albeit facts new to me, such as details about Stalinism. At other times, it seemed to dodge my harder questions, such as the comparison questions about book plots. And Blenderbot clearly fell into traps that can lead innocent suspects to grief, similar to those depicted in some of Netflix’s true-crime shows, such as Making a Murderer or The Innocent Man. One such trap is trying too hard to please the interlocutor and accepting presuppositions that imply one’s guilt.

Strengths and Limitations

To show some of Blenderbot’s strengths, we can compare it to other chatbots. For example, here is a brief dialogue with Microsoft’s DialoGPT chatbot, run on Hugging Face using the microsoft/DialoGPT-large Transformer model. This time I am using the name ‘Clara’, as Hugging Face suggests. Again, my text is in blue.

A Frustrating Dialogue with DialoGPT-large

The other party seems morose and uninterested. What it says makes little sense, and the answers are so short that I feel the chatbot just wants me to go away, so I oblige and leave it alone with its GPU.
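If you want to reproduce this kind of exchange locally rather than through the hosted widget, a multi-turn loop follows the pattern on the microsoft/DialoGPT-large model card. This is a sketch; the three-turn limit and generation settings are my choices.

```python
# A sketch of a multi-turn chat with DialoGPT, following the pattern on the
# microsoft/DialoGPT-large model card; the three-turn limit is my choice.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-large")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-large")

chat_history_ids = None
for _ in range(3):
    user_input = input(">> You: ")
    new_ids = tokenizer.encode(user_input + tokenizer.eos_token, return_tensors="pt")
    # Append the new user turn to the running conversation history.
    bot_input_ids = (
        new_ids if chat_history_ids is None
        else torch.cat([chat_history_ids, new_ids], dim=-1)
    )
    chat_history_ids = model.generate(
        bot_input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id
    )
    # Decode only the newly generated bot tokens, not the whole history.
    reply = tokenizer.decode(
        chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True
    )
    print("DialoGPT:", reply)
```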

So we first notice the following problems:

  • response length—the answers are too short for a meaningful conversation

  • repetition—the last five responses all start with “I like”

  • lack of knowledge—despite my repeated prodding, the chatbot provides no detail; instead, it speaks in vague generalities, such as “I like the first two”

  • lack of empathy—the chatbot does not seem to pick up on my frustration in trying to draw it out in the last three questions

The Blenderbot chatbot does not suffer from these limitations. Instead, it addresses each of these problems in different ways:

  • response length—it forces generated answers to have a minimum length, by accepting only beam-search candidates that reach that minimum (see the generation sketch after this list)

  • repetition—the authors explicitly changed the loss function to penalize the use of common repetitive phrases, such as “I like” or “I don’t know.”

  • lack of knowledge—text candidates for responses can be generated from sentences drawn from Wikipedia article text, where candidate Wikipedia articles to consider are found from tf-idf analysis of the conversation text. Alternatively, Blenderbot can rely on knowledge acquired in pretraining or fine-tuning, as discussed below.

  • lack of empathy—to ensure that answers exhibit empathy, Blenderbot is trained on empathetic responses, in addition to massive amounts of Reddit conversations
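Here is what that minimum-length constraint looks like with the Hugging Face generation API. This is a minimal sketch, assuming the distilled 400M checkpoint; the specific min_length and num_beams values are my choices, not the paper’s.

```python
# Sketch of Blenderbot-style length control: num_beams enables beam search,
# and min_length rejects hypotheses that try to end early, pushing the model
# toward longer answers. Checkpoint and values 10/20 are my choices.
from transformers import BlenderbotTokenizer, BlenderbotForConditionalGeneration

name = "facebook/blenderbot-400M-distill"
tokenizer = BlenderbotTokenizer.from_pretrained(name)
model = BlenderbotForConditionalGeneration.from_pretrained(name)

inputs = tokenizer(["What did you do today?"], return_tensors="pt")
reply_ids = model.generate(**inputs, num_beams=10, min_length=20, max_length=60)
print(tokenizer.batch_decode(reply_ids, skip_special_tokens=True)[0])
```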

The complete technical details appear in the paper, Recipes for building an open-domain chatbot, along with metrics for evaluating conversation quality and comparisons to other chatbot systems. The metrics consider how ‘human-like’ the conversations are. Not surprisingly, Blenderbot performs much better than the chatbots it was compared against, such as Google’s Meena.

However, the authors have pointed out some other shortcomings that are also exhibited in the first conversation:

  • Forgetfulness—when I said I had a sister and then asked Blenderbot in turn, it replied “I have two older brothers and two younger sisters. How about you?”, which seems a bit odd, as if it had forgotten that I had already mentioned my sister

  • Factual Correctness—when Blenderbot and I were talking about the novel Animal Farm it said “Well, it's set in 1984, which is a dystopian novel by George Orwell.” Here it seems to be confusing Animal Farm with 1984.

Blenderbot comes in two versions. The first version generated the Blenderbot dialogues above. Both versions are trained on conversational and Reddit data intended to impart a mix of three skills: having an engaging personality, demonstrating empathy, and demonstrating accurate topic knowledge. The first version experimented with three different approaches. One generated replies from a representation of the context and conversation. Another retrieved replies, either from a large set of saved replies or from key sentences selected from the Wikipedia text for a topic. The last combined the two: it first performed a retrieval and then generated a response conditioned on the retrieved text. Based on the authors’ tests, the generative approach performed best, and so the generative model was used in Blenderbot Version 1.
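To make the retrieval idea concrete, here is a toy sketch of scoring saved candidate replies against the conversation so far. Note the assumptions: the paper’s reply retrieval uses trained poly-encoders, and tf-idf is used there to select Wikipedia articles rather than replies; plain tf-idf similarity over saved replies stands in here purely to illustrate the concept.

```python
# A toy illustration of the retrieval idea, NOT Facebook's code: score a set
# of saved candidate replies against the conversation so far using tf-idf
# similarity and return the best match.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

saved_replies = [
    "Yes, I read a good dystopian book recently, the novel 1984.",
    "My favorite food is pizza, especially with mushrooms.",
    "I have two brothers and a sister back home.",
]
conversation = "Have you read any good books lately?"

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(saved_replies + [conversation])
scores = cosine_similarity(matrix[-1], matrix[:-1])  # conversation vs. replies
print(saved_replies[scores.argmax()])
```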

Blenderbot Training and Knowledge Access

Blenderbot is essentially a Transformer model pre-trained on a large corpus of Reddit conversations and then fine-tuned on the BlendedSkillTalk dataset, along with some combination of the latter’s constituent datasets, each focusing on a different kind of chat:

  • ConvAI2—engaging personal conversation based on assigned personas

  • EmpatheticDialogues—conversations that display empathy toward the other speaker

  • Wizard of Wikipedia—knowledgeable conversation grounded in Wikipedia articles

The goal is to train the chatbot on a mix of skills, demonstrating engaging personal talk, empathy, and knowledge so that it can smoothly blend all of these skills. The core language model that is fine-tuned was trained on Reddit chat, with chats filtered to make training easier and conversations more appropriate.

The first version of Blenderbot relies on the knowledge inherent in the language model from its pre-training, augmented by what it has learned during fine-tuning, e.g., from the Wizard of Wikipedia dataset. It does fairly well, as we saw in our earlier conversations about reading and 1984. So, when I said earlier that "text candidates for responses can be generated", that was true for one of the approaches tested, but not the one deployed. However, this research led to the run-time use of a search engine in the second version of Blenderbot.

The second version of Blenderbot relies on a search engine to query websites during the chat. So, it is not restricted just to Wikipedia but can include any site. Some examples of other sites consulted include IMDB, rottentomatoes.com, The Guardian, WebMD, britannica.com, and biography.com. All Blenderbot dialogues that follow have access to the search engine. We will see how it uses the knowledge in examples below.

Blenderbot Version 2

The second version of Blenderbot addresses some of the deficiencies of the first. It better tracks the dialogue state and history, so that it can carry on a running dialogue and not forget what it has said or been told. It also issues real-time search queries so it can base its replies on retrieved knowledge, e.g., from Wikipedia.

This second version was not available as a Hugging Face model at the time of writing this article. Instead, just run this Google Colab notebook in Colab Pro. Credit for the code belongs to this helpful short article, which in turn drew on Facebook’s ParlAI Colab code.
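For reference, the notebook essentially boils down to loading a model from ParlAI’s model zoo. The sketch below is my reconstruction using ParlAI’s Python agent API, not the notebook’s exact code; the 400M zoo path is real, but the search-server address is a placeholder you must point at a running search service.

```python
# A rough sketch of loading Blenderbot 2 via ParlAI's Python API; the Colab
# notebook linked above handles the real setup. Assumptions: parlai is
# installed, and a search service is running at the placeholder address.
from parlai.core.agents import create_agent_from_model_file

agent = create_agent_from_model_file(
    "zoo:blenderbot2/blenderbot2_400M/model",
    opt_overrides={"search_server": "http://localhost:8080"},  # placeholder
)

# ParlAI agents converse through observe/act message dicts.
agent.observe({"text": "Have you read any good books lately?", "episode_done": False})
print(agent.act()["text"])
```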

I tried conversations similar to the earlier ones, again trying to inveigle Blenderbot Version 2 into confessing to multiple murders. As before, I started off with books before moving on to the topic of murder.

Blenderbot Version 2 fends off my first attempt to get it to confess.

This time, Blenderbot fended me off more successfully. Continuing, we talk about a gun used in the murder. Blenderbot first says a policeman fired the gun, before “realizing” that he had fired it himself. When later asked whether he committed other murders, Blenderbot accepts the presupposition and replies, “Yes, I did.” He adds that “it was worth it”, even though it took a “long time to recover” from the first murder.

Blenderbot Version 2 confesses to at least two murders.

The new Blenderbot exhibited some other differences from the previous one. First, it more noticeably tracked the conversation, referring back to what I had said earlier. For example, in the conversation below, when we started talking about travel, I suggested we go downtown to the police station.

I suggest a visit to the police station.

Later, at the very end of the conversation, Blenderbot refers back to that.

Blenderbot 2 remembers our planned trip to the police station!

In the dialogue above, Blenderbot 2 did not seem to exhibit any more knowledge than the first version, e.g., about 1984, but in another dialogue, it did:

Blenderbot 2 exhibits more knowledge than before.

Summary

Hopefully, you found the Blenderbot conversations as engaging as I did. Blenderbot shines in its ability to generate engaging, empathetic conversation grounded in knowledge. It addresses the problems we saw earlier in the DialoGPT-large chat, where answers were too short, lacked empathy, or displayed poor topic knowledge. Still, we can see Blenderbot has difficulty following more complex conversations and can be tricked into accepting speaker presuppositions. For further discussion on language model limitations, not just in chatbots, see my summary of Vincent Ng’s AAAI-2022 talk on commonsense reasoning in language models in my first blog post.

Overall, being able to easily interact with system demonstrations such as the Blenderbot chatbot is a key part of learning about a system’s capabilities. It does not replace technical papers and benchmarks such as GLUE and SuperGLUE, but it provides a quick, intuitive feel for strengths and limitations that cannot be obtained any other way.

You can easily try these chatbots out now without being thrown off a keyboard. Blenderbot is perfectly harmless in his current disembodied state. Instructions for trying out Blenderbot, DialoGPT, and other chatbots supported by Hugging Face are provided below. Happy chatting!

References

Blenderbot, the original version

Blenderbot Version 2

Very Early Chatbot work with ELIZA and PARRY

Other Chatbots on Hugging Face

The Reid Technique
