Do reinforcement learning systems have valenced subjective experiences -- do they feel pain or pleasure? If so, I think they would matter morally.
Let's assume for the time being that they can have subjective experiences at all, that
there's something it's like to be them. Maybe I'll come back to that question at some point. For now, I want to present a few ideas that could bear on whether RL systems have valenced experiences. The first is an argument that Tomasik points out in his paper, but that I don't think he gives enough weight to:
Pleasure and pain aren't strongly expectation-relative; learning is not necessary for valenced experience
An argument I see frequently in favor of RL systems having positive or negative experiences relies on an analogy with animal brains: animal brains seem to learn to predict whether a situation is going to be good or bad, and dopamine bursts (known to have
something to do with how much a human likes or wants something subjectively) transmit prediction errors around the brain. For example, when an animal gets an unexpected treat, dopamine bursts cause its brain to update its predictions about when it will get such treats in the future. Brian Tomasik touches on this argument in
his paper. This might lead us to think that (1) noticing errors in predicted reward and transmitting them to the brain via dopamine might indicate valenced experience in humans, and by analogy (2) this same kind of learning might indicate valenced experience in machines.
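To make the analogy concrete, here is a minimal sketch of the kind of reward-prediction-error learning the argument has in mind. It is an illustration only: the single cue, the reward schedule, and the learning rate are all invented, and real RL systems and real brains are far more elaborate.

```python
# Minimal sketch of reward-prediction-error learning (illustrative only).
# A cue is followed by a treat; the agent keeps a prediction of the reward
# that follows the cue and updates it using the prediction error, which
# plays the role the dopamine burst plays in the analogy.

ALPHA = 0.1              # learning rate (invented for illustration)
predicted_reward = 0.0   # the agent's current reward prediction for the cue

errors = []
for trial in range(100):
    actual_reward = 1.0                       # the treat is delivered every trial
    error = actual_reward - predicted_reward  # "dopamine-like" prediction error
    predicted_reward += ALPHA * error         # learning: nudge the prediction toward reality
    errors.append(error)

print(f"error on the first trial (treat unexpected):    {errors[0]:.3f}")
print(f"error on the last trial (treat fully expected): {errors[-1]:.3f}")
```

On this picture, the quantity doing the work is the error term, which is large early on and shrinks to nearly nothing once the treat is fully predicted.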
However, I think there is a serious issue with this reward-prediction-learning story: in humans, how painful or pleasurable an experience is seems only loosely related to how painful or pleasurable it was expected to be. Expecting a pain or a pleasure might blunt the experience somewhat, but my valenced experience doesn't seem closely tied to my prediction error; a fully predicted pain or pleasure doesn't go away, and in some cases anticipating it can intensify it.
Biologically, it shouldn't be too surprising if updating our predictions of rewards isn't tightly linked to actually liking an experience. There seems to be a difference between
"liking" and "wanting" an experience, and in some extreme cases liking and wanting can come apart altogether. Predicting rewards seems very likely to be closely tied to
wanting that thing (because the predictions are used to steer us toward the thing), but seem less likely to be tied closely to liking it. It seems quite possible to enjoy something completely expected, and not learn anything new in the process.
In a nutshell, I'm saying something like:
In humans, the size of the error between actual and predicted drive satisfaction doesn't seem strongly linked to valenced experience. Valenced experience seems more strongly linked to actual drive satisfaction.
This seems to me like evidence against RL systems having valenced experience by virtue of predicting rewards and updating based on errors between predicted and actual rewards. Since this is the main thing that RL systems are doing, maybe they don't have valenced experiences.
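Put in terms of the sketch above (again, just an illustration with invented numbers), the worry is that the two candidate correlates of valence come apart once learning has converged: on a fully expected trial the prediction error is roughly zero while the actual reward is unchanged. If valence tracked the error, a fully expected pleasure should feel like nothing; if it tracks something closer to the actual reward, it shouldn't.

```python
# On a fully expected trial, the two candidate correlates diverge
# (continuing the illustrative numbers from the sketch above).
predicted_reward = 1.0                  # learned prediction after many trials
actual_reward = 1.0                     # the treat arrives as usual
error = actual_reward - predicted_reward

print(f"prediction error (what the learning story tracks): {error:.1f}")          # 0.0
print(f"actual reward (what valence seems to track):       {actual_reward:.1f}")  # 1.0
```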
If valenced experience doesn't consist of noticing differences between expected and actual rewards and updating on those differences to improve future predictions, what might it consist of? It still seems very linked to something like reward, but not linked to the use of reward in updating predictions. Maybe it's related to the production of reward signals (i.e. figuring out which biological drives aren't well-satisfied and incorporating those into a reward signal; salt tastes better when you're short on it, etc.), or maybe to some other use of rewards. One strong contender is reward's role in attention, and the relationship between attention and valenced experience.
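To illustrate what "production of reward signals" might mean here, the following is a toy sketch of a drive-based reward signal, where the same stimulus is more rewarding when the relevant drive is further from its set point. The drive names, set points, and numbers are all invented for illustration; this is one possible way a reward signal could be produced from drive states, not a claim about how brains actually do it.

```python
# Toy sketch of drive-based reward production (illustrative only):
# reward is the reduction in total drive deficit a stimulus produces,
# so the same stimulus is worth more when the relevant drive is depleted.

def drive_based_reward(drive_levels, set_points, stimulus_effects):
    """Return the total deficit reduction produced by a stimulus."""
    reward = 0.0
    for drive, level in drive_levels.items():
        deficit_before = max(0.0, set_points[drive] - level)
        new_level = level + stimulus_effects.get(drive, 0.0)
        deficit_after = max(0.0, set_points[drive] - new_level)
        reward += deficit_before - deficit_after
    return reward

set_points = {"sodium": 1.0, "water": 1.0}
salty_snack = {"sodium": 0.3}

# The same salty snack is more rewarding when sodium is depleted.
depleted = {"sodium": 0.2, "water": 1.0}
sated = {"sodium": 0.9, "water": 1.0}
print(f"reward when sodium-depleted: {drive_based_reward(depleted, set_points, salty_snack):.2f}")  # 0.30
print(f"reward when nearly sated:    {drive_based_reward(sated, set_points, salty_snack):.2f}")     # 0.10
```

The point of the sketch is only that "how rewarding" here depends on the organism's current state, not on any error between predicted and actual reward.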
The unnoticed stomachache
Consider the following situation (based on an example told to me by
Luke about twisting an ankle but not noticing right away):
A person has a stomachache for one day -- their stomach and the nerves running from their stomach to their brain are in a state normally associated with reported discomfort. However, this person doesn't ever notice that they have a stomachache, and doesn't notice any side-effects of this discomfort (e.g. lower overall mood).
Should we say that this person has had an uncomfortable or negative experience? Does the stomachache matter morally?
My intuitions here are mixed. On the one hand, if the person never notices, then I'm inclined to say that they weren't harmed, that they didn't have a bad experience, and that it doesn't matter morally -- it's as if they were anesthetized, or distracted from a pain in order to reduce it. If I had the choice of giving a Tums to one person who
did notice their stomachache or to a large number of people who
didn't, I would choose the person who did notice their stomachache.
On the other hand, I'm not totally sure, and enough elements of discomfort are present that I'd be nervous about policies that resulted in a lot of these kinds of stomachaches -- maybe there is a sense in which part of the person's brain and body
are having bad experiences, and maybe that matters morally, even though the attending/reporting part of the person never had those experiences. Imagine a human and a dog: the dog is in pain, but the human doesn't notice. Maybe part of our brain is like the dog and the attentive part of our brain is like the human, so that the dog-like part is suffering even though the rest of the brain doesn't notice. This seems a little far-fetched to me, but not totally implausible.
If the unnoticed stomachache is not a valenced experience, then I'd want to look more at the relationship between reward and attention in RL systems. If it is, then I'd want to look at other processes that produce or consume reward signals and see which ones seem to track valenced experience in humans.
Either way, I think the basic argument for RL systems having valenced experience doesn't work very well; none of their uses of reward signals "look like" pleasure or pain to me.