Friday, March 4, 2016

Updated version of an AI decision theory problem

Let's say we have a model-based RL system in a peculiar episodic environment: at the start of each episode, the system is copied onto another computer, and the copy's actions matter for what happens in the environment (e.g. the system and its copy are playing some kind of game).
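To make the setup concrete, here is a minimal sketch of the kind of environment I have in mind; the class name CopyEpisodeEnv, the agent's act() interface, and the particular payoffs are all made up for illustration, not part of the problem statement.

    import copy
    import random

    class CopyEpisodeEnv:
        # Toy version of the setup: a one-step "game" between the system and a
        # copy of it made at the start of each episode. Payoff is indexed as
        # [system's action][copy's action]; the numbers are arbitrary.
        PAYOFF = [[3, 0],
                  [5, 1]]

        def __init__(self, copy_fault_rate=0.0):
            # Probability that something goes physically wrong with the copy.
            self.copy_fault_rate = copy_fault_rate

        def run_episode(self, agent, rng=random):
            agent_copy = copy.deepcopy(agent)    # the copy made at episode start
            state = 0                            # single state in this toy game
            system_action = agent.act(state)     # assumed agent interface
            if rng.random() < self.copy_fault_rate:
                copy_action = rng.randrange(2)        # malfunctioning copy
            else:
                copy_action = agent_copy.act(state)   # faithful copy, same computation
            return self.PAYOFF[system_action][copy_action]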

A "correct" model (state, action) → (new state) seems like it should have two properties:
  1. If something causes the copy to behave differently from the system -- e.g., the copy is made incorrectly, the computer the copy runs on malfunctions, or the copy is interfered with during its decision -- the model should predict what the malfunctioning copy will actually do.
  2. "Otherwise", the model should predict that the system and the copy perform the same action; that is, when the model predicts the copy's action in order to predict new state, the copy's action should be identical to the model's input action.
There are two kinds of models that I think it is reasonable to imagine our RL system making: abstract-ish models and physical-ish models. It seems like abstract-ish models will do well with property 2, and physical-ish models will do well with property 1. I can't picture the type of model that will get both properties.
  • When the system has a fairly abstract and non-physical model (state, action) → (new state), it can simply model the copy's action as if it directly depended on the system's action, and it will correctly predict that the system and copy will always act the same way. However, this kind of model will make wrong predictions if something goes physically wrong with the copy.
  • As the system learns a model that correctly handles things going wrong with the copy, it will probably stop modeling the copy's actions as directly dependent on the system's action (in part because that independence is an accurate description of the physical setup). However, this kind of model seems like it will not predict that the system and the copy perform the same action when nothing goes wrong with the copy, since the modeled copy performs the same physical steps to make its decision regardless of which action the model is given as input (see the sketch after this list).
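As a sketch of the tension between these two kinds of models, again with made-up interfaces; neither class is meant as a proposal, just as a way of making the asymmetry explicit.

    class AbstractishModel:
        # Models the copy's action as directly depending on the system's own
        # action: "the copy does whatever I do." Property 2 holds by
        # construction, but a malfunctioning copy cannot be represented.
        def predict_copy_action(self, state, system_action):
            return system_action

    class PhysicalishModel:
        # Simulates the copy's decision procedure as a physical process
        # (copy_simulator stands in for that learned simulation). Faults can
        # be modeled, so property 1 is within reach, but the simulation never
        # looks at which input action is being evaluated, so property 2 fails.
        def __init__(self, copy_simulator):
            self.copy_simulator = copy_simulator
        def predict_copy_action(self, state, system_action):
            return self.copy_simulator(state)    # ignores system_action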
As the system learns, if things do go wrong with the copy, it will probably get evidence that pushes it toward physical-ish models, so that it can predict those things going wrong. However, it seems like there are common cases where a model fails property 2, but doesn't receive evidence to push it to correct this. For example:
Suppose the system is considering which action to take in state s. It considers actions a1, a2, a3. Using a physical-ish model, it predicts that the copy will take action a1 independently of the system's choice, so the possible (system choice, copy choice) pairs are (a1, a1), (a2, a1), (a3, a1), the second and third of which are mispredictions. If the system prefers the state resulting from (a1, a1) to the states resulting from (a2, a1) or (a3, a1), it will choose a1, and it will never receive feedback that if it had chosen a2, the copy would have chosen a2 as well (and ditto for a3).
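With some invented rewards, the missing feedback in this example looks like the following; the numbers are illustrative only.

    # Rewards the system assigns to (system choice, copy choice) outcomes;
    # (a2, a2) and (a3, a3) are the counterfactuals the system never observes.
    reward = {("a1", "a1"): 3, ("a2", "a1"): 0, ("a3", "a1"): 0,
              ("a2", "a2"): 2, ("a3", "a3"): 1}

    # The physical-ish model predicts the copy plays a1 no matter what, so
    # these are the (system choice, copy choice) pairs the system evaluates:
    predicted_pairs = [("a1", "a1"), ("a2", "a1"), ("a3", "a1")]
    choice = max(predicted_pairs, key=lambda pair: reward[pair])

    # choice == ("a1", "a1"): the system picks a1, observes the copy also
    # picking a1, and the mispredicted pairs (a2, a1) and (a3, a1) are never
    # tested against reality, so nothing corrects the model toward the true
    # counterfactuals (a2, a2) and (a3, a3).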
Intuitively, the model can't learn "what would have happened" from the feedback it actually receives, and will need to rely on generalization in order to get this right; I don't know what kind of generalization would produce a model that does. This kind of problem will be especially bad if there are equilibria that result in the system consistently choosing the same action, as in the Prisoner's Dilemma against a copy (where the system never realizes that if it cooperated, the copy would cooperate as well). However, even outside this kind of problem, I still don't know how to make a model that satisfies both properties 1 and 2.
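In the Prisoner's Dilemma case the bad equilibrium is self-confirming; here is a rough sketch with the standard payoffs, where the update rule is a stand-in for a learned physical-ish model rather than a real learning algorithm.

    # Payoff to the system for (system move, copy move); standard PD numbers.
    PD = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

    copy_prediction = "D"   # physical-ish model fit to past episodes of mutual defection
    for episode in range(10):
        # Against a fixed prediction of the copy's move, defecting looks
        # strictly better (5 > 3 and 1 > 0), so the system defects.
        choice = max(["C", "D"], key=lambda a: PD[(a, copy_prediction)])
        observed_copy_move = choice            # the faithful copy mirrors the system
        copy_prediction = observed_copy_move   # data keep confirming "the copy defects"
    # (C, C) is never sampled, so the model is never pushed toward
    # "if I cooperated, the copy would cooperate too."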

This is just a toy case, and I expect analogous problems to come up in analogous situations (e.g. situations where there is not an exact copy, or where the "copy" is another system, human, or market that is learning to predict the system's actions). I don't know how much I should expect this kind of problem to come up in model-free approaches, but it seems worth looking into, and it would be disappointing if this kind of problem blocked us from using model-based approaches at all.
