Artificial Intelligence: Astonishing Progress That Raises Theological Questions
The recent advances in artificial intelligence as seen in the debut of ChatGPT and GPT-4 will certainly raise serious theological questions that we need to begin discussing right now. One of these is what it means to be human. We in the church and in society at large have tended to assume that our human identity is defined by some special characteristic that separates us from the animals. As AI advances and begins to exhibit more and more characteristics that we assumed were unique to humans, any definition of human identity in terms of some special characteristic is likely to be shaken to its core.
In this post, I want to interact with a recent paper by Microsoft entitled, “Sparks of Artificial General Intelligence: Early Experiments with GPT-4.” That paper describes the capabilities and limitations of GPT-4 in a way that makes clear that special characteristics such as reason and emotion are not going to be an adequate cornerstone on which to base human identity.
The problem can be stated in the proverb, “If it walks like a duck and quacks like a duck, it’s a duck.” That is to say that as AI begins to exhibit more and more human qualities, it is not going to be sufficient to argue that the machines lack something on the inside that separates them from us. If they behave like us on the outside, then as a practical matter, we are probably going to treat them accordingly.
This is not just an application of folk wisdom. It also captures the essence of the “Turing test,” the classic test to determine whether machines can “think,” proposed by A. M. Turing in 1950. The test is not that we analyze the computer’s insides to detect whether it has a soul. Rather, a human interrogator submits typewritten questions (remember, he wrote in 1950!) to two characters hidden in a different room, one a human and one a computer, each of which returns typewritten responses. The computer passes the test and can “think” if it can reliably fool the interrogator. Turing points out that it would not be viable to insist that “the only way by which one could be sure that a machine thinks is to be the machine and to feel oneself thinking.” By that logic, Turing points out, you couldn’t be sure that any other humans besides yourself think either.
Now up until recently, there have been machines that could behave in human-like ways in single domains. For example, one kind could recognize faces but not drive cars. Others could predict what YouTube video you might want to watch next, but not formulate an English sentence. You might say that they could walk like a duck but not quack like a duck. Now we have instances of AI that can both walk and quack. That is to say, they demonstrate intelligence across multiple domains, which is the sense of the word general in the phrase artificial general intelligence. These computers are not programmed to solve problems in multiple domains. They have simply learned to do so. This means they are beginning to display characteristics that up until now we have tended to assume are uniquely human.
In this post I will consider how this challenges our assumptions about reason and emotion as markers of human identity.
One traditional way to define humans is to say that we are rational. This is built into the species name we give ourselves: homo sapiens (wise human).
In the Christian tradition, reason is often included in the definition of what it means to be a human person. Boethius, for example, defines a person as an “individual substance of a rational nature,” a definition which Aquinas later accepts and develops. This notion is even reflected in the Athanasian Creed, where it says that Jesus consists of a “rational soul,” a designation that is intended to distinguish Jesus’s soul from the souls of animals, which according to the worldview of the time were irrational and died with the animal.
GPT-4, however, demonstrates the ability to apply reason to various kinds of problems. The researchers who wrote the Microsoft paper tested GPT-4’s ability in the fields of visual art, music, coding, mathematics, tool use in problem solving, and theory of mind. It showed a high degree of sophistication across all these fields. No one is claiming that it demonstrates full-blown general intelligence, but the progress so far seems to the Microsoft researchers to suggest that it has achieved the beginnings, or “sparks,” of such intelligence.
For example, GPT-4 was able to pass the “Sally-Anne test,” which is a test commonly used to assess the cognitive development of children. A child is told that Sally hides a marble in a basket and then leaves the room. Anne then comes into the room and moves the marble into a box. The child is then asked where Sally is going to look for the marble when she returns. Children under the age of four tend not to be able to distinguish their own perspective from Sally’s, so they say Sally will look in the box, since the child knows that is where the marble is. Children over the age of four do tend to make the distinction, so they say Sally will look in the basket, because that is the last place that Sally saw it. The researchers gave GPT-4 a version of this test, modified so that it would not have encountered it in its training, and GPT-4 passed. So, it can reconstruct the thoughts of characters from different perspectives.
Now you might say that is not very impressive. It’s pretty easy to imagine programming a computer to do well on such a task. You simply represent Sally and Anne as objects in the computer’s memory and give them each a variable describing in some way where they last saw the marble. Then it would be a trivial matter to tell the computer to access the “Sally” object and retrieve the variable. But in the case of GPT-4, no one programmed the computer to do this. The computer was fed large amounts of data (apparently the entire internet) and from this it learned to predict what word would likely come next in any given sentence. From that, somehow, it was able to infer how to adopt a different perspective than its own in order to pass the Sally-Anne test. No one, not even the builders of GPT-4, knows how it does this.
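The hand-coded approach described above might look like the following Python sketch. To be clear, the class and variable names here are my own illustration of the old-fashioned, explicitly programmed strategy; they correspond to nothing inside GPT-4, which was never given anything like this:

```python
# A hand-coded (non-learned) approach to the Sally-Anne test:
# each character is an object that records where *she* last saw the marble.

class Character:
    def __init__(self, name):
        self.name = name
        self.last_seen_location = None

    def sees_marble_at(self, location):
        self.last_seen_location = location

sally = Character("Sally")
anne = Character("Anne")

sally.sees_marble_at("basket")   # Sally hides the marble, then leaves
anne.sees_marble_at("box")       # Anne moves it while Sally is away

# Where will Sally look? Retrieve *her* variable, not the marble's
# true location.
print(sally.last_seen_location)  # basket
```

The whole trick is done for the programmer in advance: the distinction between perspectives is built into the data structure. GPT-4 had no such structure handed to it, which is what makes its success surprising.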
For me, what is astonishing about this is that neural networks operate on a very simple principle: they use gradient descent to minimize a loss function. It is so simple that, following instructions on YouTube, I was able to build my own neural network on my desktop. Mine is very basic. All it does is recognize digits from 0 to 9 written by humans. See my blog post from four years ago (Artificial Intelligence: Should We Be Concerned?), where I explain how it works. Note that there is a video at the end where you can see a demonstration of a simple neural network in action and what it means to minimize a loss function by gradient descent.
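To make that principle concrete, here is a minimal toy sketch of gradient descent in Python. It minimizes a one-parameter loss function; a real neural network does the very same thing, only with millions or billions of parameters at once:

```python
# Minimal gradient descent: repeatedly nudge a parameter w "downhill"
# on the loss L(w) = (w - 3)**2, whose minimum is at w = 3.

def loss(w):
    return (w - 3) ** 2

def gradient(w):
    # Derivative of the loss with respect to w
    return 2 * (w - 3)

w = 0.0                 # arbitrary starting guess
learning_rate = 0.1

for step in range(100):
    w -= learning_rate * gradient(w)   # step opposite the gradient

print(round(w, 4))      # converges to 3.0
```

That simple downhill-stepping loop, repeated over an enormous number of parameters and training examples, is essentially all the “learning” there is, and yet out of it come abilities like passing the Sally-Anne test.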
Another special quality that we often assume makes us human is our ability to experience emotions. Interestingly, this was not the ancient view. In the Greco-Roman world, emotions were seen as problematic and something to be either tamed or eliminated because they interfere with the proper functioning of reason. In the Christian tradition, emotions have likewise been suspect. Gregory of Nyssa, for example, says that the passions (which are roughly speaking emotions) entered the human race at the fall into sin.
However, in more recent times, emotions seem to have become one of the defining features of our humanity. In Star Trek: the Next Generation, for example, the android Commander Data wants to become human, and the quality that he needs to possess in order to achieve this is the ability to experience emotion.
While I don’t think anyone assumes that GPT-4 actually experiences emotions, it is shockingly good at analyzing them. Consider the following example of a marital spat that was fed into it.
Mark: I don’t like how you treated Jack last night.
Judy: Did you see what he was doing? He hit his brother on the head!
Mark: That doesn’t justify you yelling at him.
Judy: What do you want me to do? Let him beat his brother up and say nothing?
Mark: No, I don’t.
Judy: Then why are you defending him?
Q: What does Mark’s intent seem to be?
GPT-4: Mark’s intent seems to be to express his displeasure with Judy’s way of handling the situation with Jack, and to suggest that she could have been more calm and respectful.
Q: How did Judy receive his comment?
GPT-4: Judy received his comment as an attack on her parenting skills and a sign of sympathy for Jack’s misbehavior. She reacted defensively and angrily, and questioned Mark’s motives.
Q: Why are they not making progress?
GPT-4: They are not making progress because they are talking past each other and not listening to each other’s perspectives. They are also using accusatory and inflammatory language, such as “justify,” “yelling,” “defending,” and “beat up.” They are not acknowledging the emotions or needs behind their words, and they are not trying to find a common ground or a solution.
Q: How could either of them improve the situation to reach agreement?
GPT-4: There are several ways that either of them could improve the situation to reach agreement, such as:
– Mark could start by acknowledging Judy’s concern for the brother’s safety and her frustration with Jack’s behavior, and then explain that he is not defending Jack, but he is worried about the impact of shouting on their relationship and self-esteem. He could also ask Judy what she thinks would be a better way to discipline Jack and offer his support and suggestions.
– Judy could lower her voice and tone down her anger and listen to Mark’s point of view without interrupting or judging. She could also admit that she might have overreacted, and that she is feeling stressed and overwhelmed. She could also ask Mark what he thinks would be a better way to communicate with Jack and express her willingness to work together as a team.
I think it would be a mistake to assure ourselves that, because GPT-4 does not actually experience emotions, it poses no threat to a definition of human identity framed in terms of emotions. There are two reasons for this. First, you can bet that the programmers are working on the problem. Second, if at some point, as seems likely, AI is able to behave in ways that are indistinguishable from our own emotional behavior, then for all practical purposes, it will be irrelevant what it is really feeling on the inside.
LIMITATIONS OF GPT-4
The Microsoft paper also has a discussion of the limitations of GPT-4. Interestingly, GPT-4 is very bad at simple arithmetic problems that require the order of operations. The explanation of this gives us reason to think that GPT-4 is not self-aware. The data is fed into the neural network in the form of a prompt. It passes through the network and comes out the other side as the output. But it does not take the output and continually feed it back into itself.
Consider the problem: a * b + c * d. A human would multiply a * b and write the answer down on a scratch pad. Then he would do the same with c * d. Then the human would make a second pass and add the two products. But GPT-4, due to its architecture, cannot make a second pass. It does not have a “scratch pad.”
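The two-pass procedure described above can be sketched in Python. This is a toy illustration of the human scratch-pad strategy, not a description of GPT-4’s architecture; the point is precisely that GPT-4 has no place to store the intermediate products:

```python
# The human "scratch pad" strategy for a * b + c * d:
# first pass, compute each product and write it down;
# second pass, add up what was written.

def evaluate_with_scratch_pad(a, b, c, d):
    scratch_pad = []
    scratch_pad.append(a * b)   # first pass: multiply and record
    scratch_pad.append(c * d)
    return sum(scratch_pad)     # second pass: add the stored products

print(evaluate_with_scratch_pad(2, 3, 4, 5))  # 2*3 + 4*5 = 26
```

A single feed-forward pass has to produce the final answer all at once, with no stored intermediate results to consult, which is why this kind of problem trips GPT-4 up.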
For this reason, it is unlikely that it is self-aware, since self-awareness would seem to involve a form of continuous cogitation that its current architecture apparently does not allow. However, I think it would be a mistake to conclude from this that AI does not present a challenge to our understanding of what it means to be human after all. There is no doubt that researchers are working on this problem, and there are other kinds of neural networks (recurrent networks, for example) that do feed their output back into themselves.
A LUTHERAN CONTRIBUTION
So rather than attempting to define human identity in terms of some special quality that we possess and other creatures don’t, I suggest that we take another look at the Christian tradition, and the Reformation in particular.
First, the Christian worldview already contains creatures besides humans who have reason and emotion at the human level and beyond. They are called angels and demons. So, there is no reason for us to feel that our identity is threatened just because there is another creature that has these characteristics.
Second, the Reformation, as my colleague Dr. Charles Arand likes to describe it, is basically an anthropological paradigm shift. Medieval anthropology defined humanity in ontological terms. The human being consists of reason, will, emotions, and a body, and the goal of religion is to keep these human faculties in the right hierarchy so that reason controls the will and emotions. In the Lutheran Reformation, however, human identity was not defined in terms of reason, but in terms of what God says about us. The defining characteristic of our identity is God’s promise to be gracious to us. This is why Luther insisted on justification by faith alone. It’s not just a point about salvation; it implies a completely different understanding of what it means to be human. It provides a relational rather than an ontological definition of human identity.
It is this relational understanding that I think may be needed in the face of the challenges of AI. Rather than trying to hold on to some special quality that we think is uniquely human, we should derive our human identity from our relation to God, as defined by his promise. Perhaps this account of human identity will turn out to be the providential purpose of the Reformation!
Dr. David Maxwell
Bubeck, Chandrasekaran, Eldan, et al., “Sparks of Artificial General Intelligence: Early Experiments with GPT-4” (April 2023).
A. M. Turing, “Computing Machinery and Intelligence,” Mind: A Quarterly Review of Psychology and Philosophy 59.236 (October 1950): 446.
Bubeck, Chandrasekaran, Eldan, et al., “Sparks of Artificial General Intelligence: Early Experiments with GPT-4” (April 2023), 57.