Sunday, March 20, 2016

Brilliance, blunders, and backfires

In AlphaGo's recent play, there were two kinds of moves that stood out:
  • Brilliant moves: moves that accomplish the system's goal – winning the game – in ways that a human wouldn't think of, and that might take a while to even be understandable in retrospect (or might elude our understanding altogether).
  • Blunders: moves that humans can identify (though sometimes only in retrospect) as bad for the system's goal.
As AI systems become more capable, it will be harder to tell the difference between brilliant moves and blunders until their effects are felt, and even in retrospect they may be hard to diagnose. If hard-to-understand AI systems are given safety-critical or high-impact tasks, blunders could become a source of significant harm.

However, I think we should be at least as concerned about a third kind of behavior:
  • Backfires: moves that accomplish the system's nominal goal, but that don't do what the user actually wanted or that have unintended side-effects, and that might only be identified as backfires in retrospect.
As with blunders, backfires won't easily be distinguished from brilliant moves. But backfires bring additional challenges; unlike blunders, improving a system's ability to achieve its nominal goals won't fix them, and may actually make them worse:
  • A backfire might accomplish what we really wanted, but with additional effects that we don't want – getting a ball into a hoop while also smashing every vase in the room, or making a cup of coffee while also lighting the house on fire or breaking the law. As systems become more capable, they will be able to cause broader effects, making this problem worse.
  • A backfire might accomplish the nominal goal without accomplishing what we really want, e.g. by manipulating a reward signal directly instead of by winning games of Go or Atari. As systems become more capable, they will find more ways of accomplishing their nominal goals, making this problem worse.
Backfires could happen because it's difficult to specify in full what we want a system to accomplish and what unintended consequences we want it to avoid, difficult to know in advance what means a system might use to accomplish its nominal goal, and difficult to specify goals in a way that can't be "gamed".
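As a trivial illustration of that specification gap (my own toy example, not anything from a real system), here's what it looks like when a nominal objective is scored only on the stated goal:

// Made-up plans and scores: the nominal objective only checks whether the
// ball ended up in the hoop, so it can't tell a careful plan apart from a
// vase-smashing one; only the (unstated) intended objective can.
var plans = [
 {name: "careful shot", ballInHoop: true, vasesSmashed: 0},
 {name: "smash through the room", ballInHoop: true, vasesSmashed: 12},
 {name: "do nothing", ballInHoop: false, vasesSmashed: 0}
];
function nominalScore(p) { return p.ballInHoop ? 1 : 0; }
function intendedScore(p) { return (p.ballInHoop ? 1 : 0) - p.vasesSmashed; }
plans.forEach(function (p) {
 console.log(p.name + ": nominal " + nominalScore(p) + ", intended " + intendedScore(p));
});
// "careful shot" and "smash through the room" tie on the nominal score; only
// the intended score tells them apart.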

Tuesday, March 15, 2016

A "new" Magic: the Gathering format

The rules:
  • Put all your non-land cards in a big pile. (You can include lands with cool abilities in this pile.) This pile will be both players' decks.
  • Put all your lands in a big pile (face up so you can tell the difference). This is the land deck. Whenever a player does something to a deck, including drawing cards, they get to choose the land or the non-land deck. (This includes drawing your opening hand.)
  • All lands tap for any color of mana and have all basic land names and types. They basically count as any basic land all the time.
  • Players share a graveyard.
Why I like it:
  • Basically no setup
  • Minimal mana or color problems
  • Optimized for things happening during the game, rather than, say, skillful play or interesting deck-building
I expect that this will be my default way to play in the future; it's awesome!

I thought that playing with a shared deck of cards selected kind of at random was called "Wizard's Tower" (because of the tall shared deck), but it looks like I'm wrong; I vaguely recall a similar concept called "Mass Magic" or something, but a bit of Googling gives me nothing. Dan and I added the separate land deck after the first couple of games, but it turns out that there's a format with basically these rules called fat stack. The "every land counts as all basic lands" rule doesn't appear in any shared-deck formats that I found, but I wouldn't be surprised if someone else thought of it first; it's pretty obvious.

I think I'll make my Star Wars card game (a project I haven't blogged about, but that I probably will blog about in the future) like this; it's a more streamlined experience, and realistically nobody's going to be building decks for my hobby game design project anyway :)

Sunday, March 13, 2016

Langton's ant

I've been thinking on and off about Langton's Ant. It bugs me that nobody's resolved the conjecture about the ant eventually building a highway (I'm going to assume you looked at the wikipedia page, and not explain things again here!). I thought it'd be a fun recreational math problem to poke at, and I can now at least say that it is fun!

Today, I started out wanting to implement Langton's Ant in JavaScript, just to have something to play with. Implementing things like this is always interesting, because there are typically a lot of ways to write the program that are kind of annoying and painful; for example, in this case, you could encode the 2D grid the ant lives on as a (nested) array, but then you'd have to pick a finite size up front and watch closely to make sure the ant doesn't run off the edge of the array, resizing and re-indexing whenever it does... ugh.

Instead, I thought I'd just keep a list of locations that the ant has visited. Then, you can calculate the color of the ant's current location by checking how many times the ant's been there before. However, since the ant's rules are written from the perspective of which way the ant is facing ("turn left" or "turn right"), deciding whether the ant moves up, down, left, or right depends on the last two locations (which determine the facing direction) and also the color of the square. This is only mildly annoying to code, but I'm very lazy, so I kept trying to cut down the complexity.

Next, I thought I'd keep a list of move directions, like "uldrurdrd...". This would make calculating the next step easier. Interestingly, with this representation, you don't need to keep a list of positions at all -- all of the information about which squares the ant has visited is encoded in the sequence of moves! To know the color of the current square, count the number of times it has been visited before, which is equal to the number of proper prefixes of the move sequence where (number of up moves minus number of down moves) and (number of right moves minus number of left moves) are the same as these properties of the whole string. I wrote a (probably buggy) program based on this idea (see end of post).

This got me thinking, though, about the overall problem of the highway conjecture. In this representation, the highway appears as a repeating sequence of (up, down, left, right) moves. This is kind of nice, because it makes it easier to see when the ant has hit the highway; this sequence is kind of like an attractor, or something.

We now have, instead of an ant on a grid, a system for generating the next letter in a sequence. Given a list of letters, the next letter is determined by:
- The last letter
- The parity of the number of proper prefixes with the same properties (u - d) and (r - l) as the overall letter sequence

This system could be generalized in a variety of ways. Using e.g. modulo 3 or 4 instead of parity would add more colors, and it's interesting to me that we could use properties other than (u - d) and (r - l) -- for example, ratios or more complex expressions.
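Here's a quick sketch of that generalization (mine, not anything standard): a k-color version of the move-sequence rule, where a turn-rule string like "RL" or "RLR" says which way to turn on each color:

// A k-color ant in the move-sequence representation. rule is a string over
// "L"/"R"; the current square's color is (number of previous visits) mod
// rule.length, and rule.charAt(color) gives the turn there. rule = "RL" is
// an ordinary Langton's ant.
function moveGeneral(ms, rule) {
 var dx = {u: 0, d: 0, l: -1, r: 1};
 var dy = {u: 1, d: -1, l: 0, r: 0};
 var x = 0, y = 0;
 ms.split("").forEach(function (c) { x += dx[c]; y += dy[c]; });
 // Count the prefixes of length 0..n-1 that end on the current square.
 var visits = 0, px = 0, py = 0;
 for (var t = 0; t < ms.length; t++) {
  if (px == x && py == y) visits++;
  px += dx[ms.charAt(t)];
  py += dy[ms.charAt(t)];
 }
 var color = visits % rule.length;
 var last = ms.slice(-1) || "u"; // treat the first step as if facing up
 var left = {u: "l", l: "d", d: "r", r: "u"};
 var right = {u: "r", r: "d", d: "l", l: "u"};
 return ms + (rule.charAt(color) == "L" ? left[last] : right[last]);
}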

I'm not sure, but I feel like this is incremental progress toward solving the conjecture, which makes the whole thing feel worthwhile :) If I can make more progress, you'll hear about it in a future post.

A final note: Langton's Ant is reversible, so theorems that hold into the future also hold into the past. For example, not only will the ant get arbitrarily far away from its starting point (see wikipedia), it also came from arbitrarily far away if you assume it's been running forever. If the highway conjecture is correct, ants not only eventually build highways, they came from highways, unwinding them from the infinite distance until they got to their starting point! That's kind of neat.


The program:

function move(ms) {
 // Candidate next moves given the last move: charAt(0) is the turn taken on
 // an even-parity square, charAt(1) on an odd-parity square.
 var possible = ({u:"lr", d:"rl", l:"du", r:"ud", "":"ud"})[ms.slice(-1)];
 // (x, y) starts as the net displacement of the whole sequence, i.e. the
 // ant's current position relative to the origin.
 var x = ms.split("r").length - ms.split("l").length;
 var y = ms.split("u").length - ms.split("d").length;
 var color = 0;
 ms.split("").forEach(function (c) {
  // Check before undoing each move, so that we count the prefixes of length
  // 0..n-1 that end on the current square (previous visits only) and never
  // the full sequence itself, which always trivially matches.
  if (x == 0 && y == 0) color = 1 - color;
  x = x - (c=="r"?1:c=="l"?-1:0);
  y = y - (c=="u"?1:c=="d"?-1:0);
 });
 return ms + possible.charAt(color);
}
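A quick driver for it (my addition, not part of the original program): repeatedly call move() starting from the empty sequence and look at what comes out:

// Build up the ant's first 200 moves, one letter at a time.
var ms = "";
for (var i = 0; i < 200; i++) {
 ms = move(ms);
}
console.log(ms); // with the conventions above, this starts "uldrd..."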

Non-post for March 12

I played Magic with friends instead of writing a blog post today! I regret nothing.

Friday, March 11, 2016

Projecting maps by travel time

Maps where something else is used instead of distance or area (e.g. population, GDP, travel time) are called cartograms. Apparently maps where distances correspond to travel times are called linear cartograms, but this seems like kind of a bad name. I'll call them travel-time cartograms instead.

You could think about different parts of the 2D surface of the Earth as having different speeds when you cross them by some mode of travel (trails are easy to walk on, mountains are hard to walk up but easy to walk down, and water can't be walked on at all), and try to make a map that way. The mountain case shows that when travel time depends on direction, a travel-time cartogram can't be made: the distance between the top and bottom of a mountain would have to be two things at once. I guess you could take an average, but I don't like that much -- it seems to destroy or hide a lot of information. Instead, I'll consider different modes of travel, and assume that direction doesn't matter.

I think three modes of travel (plane, car, walking) might be the sweet spot in terms of accuracy vs. difficulty, but two modes (plane plus a generic "land travel") are complicated enough to bring out the conceptual issues, and airplanes change our travel times around the world much more than cars do. Airplanes are pretty weird -- do you consider the points along the plane's trajectory to be quickly reachable, or do you imagine the plane disappearing from one airport and then reappearing at the other end? I'll use the time-delayed teleportation model, since I can't really get places by parachuting out of a plane. So, my first-approximation travel-time cartogram should have this property:
The distance between any two points is the minimum travel time by a combination of air travel and "land travel" (a continuous form of travel at 30 miles per hour, a rough average of car and foot travel)
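To make that property concrete, here's a toy calculation (my own, with made-up points and numbers): a handful of locations, land travel at 30 miles per hour between neighbors, one "flight", and a shortest-path computation giving the travel times that the cartogram's distances would have to match:

// Five points 100 miles apart along a line; points 0 and 4 have airports
// joined by a 1.5-hour flight. Land travel runs at 30 mph between neighbors.
var N = 5, hours = [];
for (var i = 0; i < N; i++) {
 hours.push([]);
 for (var j = 0; j < N; j++) hours[i].push(i == j ? 0 : Infinity);
}
for (var i = 0; i + 1 < N; i++) hours[i][i + 1] = hours[i + 1][i] = 100 / 30;
hours[0][4] = hours[4][0] = 1.5;
// Floyd-Warshall: the cartogram distance between i and j is the minimum
// travel time over any combination of land and air legs.
for (var k = 0; k < N; k++)
 for (var i = 0; i < N; i++)
  for (var j = 0; j < N; j++)
   if (hours[i][k] + hours[k][j] < hours[i][j]) hours[i][j] = hours[i][k] + hours[k][j];
console.log(hours[0][4]); // 1.5 -- just fly
console.log(hours[0][3]); // about 4.83 -- fly to 4, then backtrack overland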
What happens when I try to build this map?
  • Major cities and their neighborhoods should be near each other.
  • Far from airports, the map should approximate a globe (since only land travel is relevant there).
  • Any shortest line between two points (along the surface) should correspond to the shortest trip between them (i.e. it should point-for-point cover the same route as the shortest trip).
  • There are points on the surface between the major cities that don't correspond to points on the globe, so that travel between airports isn't immediate, but you also can't parachute out of the planes.
Instead of working with a globe, let's start with a circle (a slice of the sphere, where travellers can move only along the circle between points) and add two airports that can be moved between quickly. What we need to do, it seems to me, is bend the circle through the third dimension until the airports are near one another in 3D space, then add a line between them of length equal to the travel time. It's important to note that distance on this cartogram isn't measured in raw Euclidean terms, but instead by distance along the surface itself (just like on a globe, you measure distance along the surface instead of tunneling through the earth). If you add more airports, then it's like folding the circle up so that many points on its circumference nearly meet.

In fact, if we assume that airplanes all travel at the same speed along this circle, then you can make a separate structure corresponding only to air travel, with points corresponding to airports and edges corresponding to flights, and this structure will be curved overall like a (smaller) circle. So, we have a large "land-travel circle" and a smaller, nodes-and-edges "air-travel circle", and we can make the final map by folding the larger circle through 3D space so that its airports meet the smaller circle's airports.

To get back to the real world, "all you have to do" is make a land-travel globe (which looks like a normal globe), an air-travel "globe" (a smaller web of connections between airports that is overall curved like a sphere), then fold the land-travel globe through the fourth dimension so that its airports meet up with the air-travel globe's points. That's pretty awkward, because now we have a 4D map that is going to be really hard for humans to read and get intuitions about!

To flatten a globe into a map in a way that lets humans understand distances, we sometimes put a grid on the sphere as a guideline. Can we do a similar thing here -- put a grid on our 4D map, cut and flatten it, and then print it out? Not sure, but that seems like what we want to do!

Thursday, March 10, 2016

Lots of presents

I just noticed a funny thing. I'm traveling for work, and I'm very picky about what I bring; I'm a light packer (one shoulder bag for a week trip), but I like to be as prepared as possible. Here are some of the things I have with me:
  • Timbuktu bag (present from my parents)
  • Down vest to stay warm (present from my parents)
  • Rain jacket (present from my parents)
  • Spacepak clean/dirty clothes bag (wedding present I think?)
  • Socks (present from Killian's parents)
  • Headphones (present from Killian)
  • More socks (present from Killian's parents again!)
  • Peacock-feather-pattern dress shirt (present from my parents)
Isn't that nice? I think the majority of the things I have with me are presents from someone! And I'm staying with friends, so that's like a present, too. I have a reputation for being hard to buy presents for, but apparently people are doing great at it!

Too busy to blog again!

...however, I feel a little better about writing a post like this than missing a day entirely! (It's after midnight, but this counts for March 9th in my book.)

Instead, read this post by Jeff: when do lilacs bloom?

Tuesday, March 8, 2016

Instead of posting today...

... I messed with the template to show all of my labels in The Archives (to the left)! Maybe I'll have time to write a post later as well.

Monday, March 7, 2016

What if life arose in Conway's Game of Life?

This is a question that I've been thinking about off and on for a while, and I think it's a really interesting one. If I had an extra career's worth of time that I could do whatever I wanted with, I can easily imagine spending it on this! I really hope that I can make progress on it one day.

For now, I guess I'll write a few blog posts. I'd like to write a longer essay eventually, and this seems like a good way to write it out piece-by-piece.

A note on philosophy of blog-posting: I could spend a lot of time explaining or linking to Conway's Game of Life, the history of people thinking about life arising in Life, and all the nice progress that's been made lately (check out these forums, the Gemini "replicator", and Golly), but that's not much fun for me. So, I won't! Sorry, essay etiquette.

A spacefiller ("Max") found in 1995

Here are the thoughts about life in Life that I think it makes sense to explain first:

Fundamental laws, higher-level laws, and simulations
Life's rules can be thought of as being like the laws of physics. However, as far as we know, living organisms under our laws of physics only arise at scales orders of magnitude larger than the scale of the most "natural" entities under our laws of physics, and function mostly on the higher-level laws of chemistry and thermodynamics. That could be true for Life, too -- maybe the smallest organisms are astronomically large, and function mostly on higher-level laws that we've yet to find. (What could these laws be? I'd love to know!) However, for the purposes of this post, I'll be assuming that life in Life isn't like this, and functions on scales where the rules of Life are relevant. I think at least some of my arguments will apply at any scale, but I'm not sure.

An interesting sub-possibility is that "life" in Life occurs most frequently within simulations run on computers that naturally occur in Life, instead of in the "basement" laws -- for example, maybe it's easier to build a computer in Life that simulates our own laws of physics (which eventually give rise to life, at least in some cases) than it is to build a functional organism under Life's rules. Again, I'll assume that this isn't true, but it'd be pretty neat if we could show that it was!

Cosmology
I'm interested in when life "naturally" arises. However, Life's rules don't specify a start state, and so there's no built-in "cosmology". The setting that seems most natural to me is to start out with a random setting of each cell in an infinite plane to On or Off, with p being a "cosmological constant" that determines the probability of a cell starting On. (Maybe intelligent organisms will later be able to experimentally determine p by examining their world?) Since different p values may be more hospitable to life than others, I'd also like to let p vary gradually over the infinite plane, sort of like the different physical constants in the level-2 Tegmark multiverse.

A not-atypical view of a small region with a random starting configuration
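For concreteness, here's a minimal sketch (mine, not from any official source) of this "cosmology" on a finite patch: seed a grid at density p and apply one step of Life's rules:

// Seed an n-by-n patch with density p, then apply the standard B3/S23 rules
// (birth on exactly 3 live neighbors, survival on 2 or 3). The torus wrapping
// is just a convenience; the setting described above is an infinite plane.
function randomGrid(n, p) {
 var g = [];
 for (var i = 0; i < n; i++) {
  g.push([]);
  for (var j = 0; j < n; j++) g[i].push(Math.random() < p ? 1 : 0);
 }
 return g;
}
function step(g) {
 var n = g.length, next = [];
 for (var i = 0; i < n; i++) {
  next.push([]);
  for (var j = 0; j < n; j++) {
   var live = 0;
   for (var di = -1; di <= 1; di++)
    for (var dj = -1; dj <= 1; dj++)
     if (di != 0 || dj != 0) live += g[(i + di + n) % n][(j + dj + n) % n];
   next[i].push(g[i][j] ? (live == 2 || live == 3 ? 1 : 0) : (live == 3 ? 1 : 0));
  }
 }
 return next;
}
var world = randomGrid(50, 0.35); // p = 0.35 is an arbitrary choice
world = step(world);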


What is life anyway?
I'm going with a pretty basic definition:
In Conway's Game of Life, a region of cells is alive if, when placed in some "reasonably natural" environment, it produces two or more copies of itself.
You'll note that this definition is imprecise. I'm also leaving out some other common criteria like homeostasis, metabolism, and growth, because I think they might look quite different in Life than they do in our physics.

Abiogenesis
...just a teaser! I'll get to this next in another post.


Sunday, March 6, 2016

(This space intentionally left blank)

Not enough capacity to post today -- feeling a little exasperated, and I still have work to do today. Sorry, post-a-day challenge!

Saturday, March 5, 2016

Should we bite Occam's Bullet?

(Close-second title: Does Occam's Razor cut too deep?)

(I told Amanda I'd post some philosophy stuff, but I've spent most of my posts this week on AI because I'm certifiably obsessed. So, here's a philosophy thing, and I'll leave out the application to AI for variety.)

I'm a little perturbed about using Occam's razor as a foundation of epistemology, especially in its computational forms. Here's the kind of reasoning I'm concerned about:
Physics test: If Michael Jordan has a vertical leap of 1.29 m, then what is his takeoff speed and his hang time (total time to move upwards to the peak and then return to the ground)?
Student: According to the simplest explanation, Michael Jordan has formed randomly from thermal or quantum fluctuations, and the small bubble of order he inhabits will collapse back into background heat long before he touches the ground.
You would probably not get extra credit for rigor or consistency in answering this question!
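(For the record, the intended textbook answer is just kinematics: taking g ≈ 9.8 m/s² and ignoring air resistance, the takeoff speed is v = sqrt(2gh) = sqrt(2 × 9.8 × 1.29) ≈ 5.0 m/s, and the hang time is 2v/g ≈ 1.03 s.)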

My basic worry is that the simplest explanation for a set of observations may be something that doesn't fit with any of my normal beliefs about my situation. This is because simple explanations can expand into vast universes, and in these universes there could be many instances of my circumstances (or of something observer-independent, like the Michael Jordan problem above) that are nothing like what I believe to be my current situation; they could be in simulations, parts of programs enumerating all possible computations in order, fluctuations of some very long-lasting, near-equilibrium cosmological state, or something stranger.

(I don't think the problem goes away when you consider the set of all explanations compatible with observations, weighted by their simplicity, but I might be wrong.)

Of course, people could just as well have had my complaint when physics was just being discovered; our view of what the universe is and our place in it would probably appear extremely weird to them, violating many of their normal beliefs about their situation. Heck, the implications of quantum physics are weird enough to me now. So maybe I'm just being stubborn, and I should bite Occam's bullet and think that most of my normal beliefs about my situation are wrong.

So why don't I think we should use this kind of reasoning? I could have epistemic or instrumental reasons, I could actually be asking a different question from "what is the most likely explanation", or I could use some kind of anthropic reasoning.
  • Epistemic: I don't feel like I really believe that the most likely explanation is that I'm a Boltzmann brain; I feel like I have evidence that says otherwise. However, that evidence could be fabricated, which is a big problem -- I may just have an unjustified belief that I'm not a Boltzmann brain! Should I bite Occam's bullet?
  • Instrumental: if I am a Boltzmann brain, things I do matter only over very small timescales (until the bubble collapses).
  • Different question: maybe instead I want to know something like "conditioning on some other assumptions (like that most of my evidence is "real", whatever that means), what is the most likely explanation?" This actually doesn't seem so bad; it's the most appealing answer to me at the moment.
  • Anthropic reasoning: I'm not particularly satisfied with this, because I'd like questions about situations without observers -- e.g. physics problems like the Michael Jordan problem above (well, versions without MJ the observer!) -- to have "reasonable" answers, instead of silly ones. In fact, that might be the most interesting part of this post -- that these problems seem like they can't be answered fully by anthropics, if we want to answer observer-free questions "sensibly".
I do like the idea of re-framing the basic epistemic question ("what is the best explanation for x, and what does this imply we should expect in x's future"), but I'm not sure where to go from there. Perhaps in future posts!

Friday, March 4, 2016

Updated version of an AI decision theory problem

Let's say we have a model-based RL system in a peculiar episodic environment: at the start of each episode, the system is copied onto another computer, and the copy's actions matter for what happens in the environment (e.g. they are playing some kind of game).

A "correct" model (state, action) → (new state) seems like it should have two properties:
  1. If something causes the copy to behave differently from the system -- e.g., the copy is made incorrectly, the computer the copy runs on malfunctions, or the copy is interfered with during its decision -- the model should predict what the malfunctioning copy will actually do.
  2. "Otherwise", the model should predict that the system and the copy perform the same action; that is, when the model predicts the copy's action in order to predict new state, the copy's action should be identical to the model's input action.
There are two kinds of models that I think it is reasonable to imagine our RL system making: abstract-ish models and physical-ish models. It seems like abstract-ish models will do well with property 2, and physical-ish models will do well with property 1. I can't picture the type of model that will get both properties.
  • When the system has a fairly abstract and non-physical model (state, action) → (new state), it can simply model the copy's action as if it directly depended on the system's action, and it will correctly predict that the system and copy will always act the same way. However, this kind of model will make wrong predictions if something goes physically wrong with the copy.
  • As the system learns a model that correctly handles things going wrong with the copy, it will probably no longer model the copy's actions as directly dependent on the system's action (in part because this is an accurate model of the physical setup). However, this kind of model seems like it will not predict that the system and the copy perform the same action when nothing goes wrong with the copy, since the copy will perform the same physical steps to make its decision regardless of what the input action is.
As the system learns, if things do go wrong with the copy, it will probably get evidence that pushes it toward physical-ish models, so that it can predict those things going wrong. However, it seems like there are common cases where a model fails property 2, but doesn't receive evidence to push it to correct this. For example:
Suppose the system is considering which action to take in state s. It considers actions a1, a2, a3. Using a physical-ish model, it predicts that the copy will take action a1 independent of the system's choice, so that possible (system choice, copy choice) pairs are (a1, a1), (a2, a1), (a3, a1), the second and third of which are mispredictions. If the system prefers the state resulting from (a1, a1) to the states resulting from (a2, a1) or (a3, a1), it will choose a1, and will never receive feedback that if it had chosen a2, the copy would have chosen a2 as well, and ditto for a3.
Intuitively, the model can't learn "what would have happened" correctly, and will need to rely on generalization in order to get this right. I don't know what kind of generalization would produce a model that does this correctly. This kind of problem will be especially bad if there are equilibria that result in the system consistently choosing the same action, as in the Prisoner's Dilemma (where the system never realizes that if it cooperated, the copy would cooperate as well). However, even outside this kind of problem, I still don't know how to make a model that fulfills properties 1 and 2.

This is just a toy case, and I expect analogous problems to come up in analogous situations (e.g. situations where there is not an exact copy, or where the "copy" is another system, human, or market that is learning to predict the system's actions). I don't know how much I should expect this kind of problem to come up in model-free approaches, but it seems worth looking into, and it would be disappointing if this kind of problem blocked us from using model-based approaches at all.

Thursday, March 3, 2016

Failing to learn about counterfactuals

Let's say that we have a model-based RL system doing episodic RL in the following environment:
  • At the start of each episode, the system is copied onto another computer
  • The system and its copy play one round of the Prisoner's Dilemma game
  • At the end of the episode, the situation is reset (the copy is erased)
When the system has a fairly abstract and non-physical model that maps (state, action) → (state), it can simply model the copy's action as if it directly depended on the system's action. So, when it predicts what would happen if it cooperated, it will predict that the copy will cooperate as well, and likewise for defection.

However, we might hope that the system will gradually learn a more physically realistic model of the world. This model won't contain any physical pathway linking the system's action to the copy's action (since the copy was made before the Prisoner's Dilemma is played), and will allow for the copy system to be interfered with in a variety of ways, breaking the symmetry of the game.

Clearly, this model can't be perfect -- if the system had to predict what a perfect copy would do before it acted, it would fall into an infinite regress. The system will need some way around this, like a limited model class, a contingency for long-running models, or some way of recognizing these kinds of situations (though halting-like problems seem to keep it from recognizing all situations of a similar type). If it can successfully recognize this situation, it seems like it "should" assume that the copy will take the same action as it does, and the problem is resolved. That would be nice! However, I don't think that models the way we build them currently will do this by default.

Let's assume that the model is not totally accurate, either because of a limited model class or because the system defaults to some estimate of the next state when a model takes too long to run. Now, when the system predicts what the copy will do, this prediction is independent of the system's action. Without loss of generality, let's say that the system predicts the copy will defect.

Now, the system needs to make a decision. It will evaluate cooperation and defection, both under the prediction that the copy will defect, and in this case it will choose defection. After it makes this choice, both it and the copy will in fact defect, reinforcing the system's model. However, the situation that the system predicted would follow from the action it didn't take -- the situation where it cooperates and the copy defects -- wouldn't actually have happened! Since the system and its copy are identical, they will always behave the same way; the model has predicted incorrectly. Furthermore, it can never witness this incorrect prediction, making it very hard to correct (dependent on some kind of generalization or regularization, perhaps?).
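Here's a tiny sketch of that dynamic (my own construction, not code from any real system), with the usual Prisoner's Dilemma payoffs:

// The "physical-ish" model predicts the copy's move from observed
// frequencies, independent of the system's own choice. PD payoffs for the
// system (its move, copy's move): C/C = 3, C/D = 0, D/C = 5, D/D = 1.
var payoff = {C: {C: 3, D: 0}, D: {C: 5, D: 1}};
var copyCounts = {C: 1, D: 1}; // pseudo-counts for the copy's predicted move
function predictCopy() {
 return copyCounts.D >= copyCounts.C ? "D" : "C"; // ignores the system's move
}
function chooseAction() {
 var predicted = predictCopy();
 // Both of the system's moves are evaluated against the same predicted copy move.
 return payoff.C[predicted] >= payoff.D[predicted] ? "C" : "D";
}
for (var episode = 0; episode < 20; episode++) {
 var systemMove = chooseAction();
 var copyMove = systemMove;   // the copy is exact, so it always matches
 copyCounts[copyMove] += 1;   // the only feedback the model ever gets
}
console.log(chooseAction(), predictCopy()); // "D D": defection reinforced
// The (C, C) outcome is never observed, so the misprediction "if I cooperate,
// the copy still defects" never gets corrected.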

In the case I've given above, the system and its copy miss out on some reward as a result of this inaccuracy -- they "could have" both cooperated and gotten better rewards. However, the failure could just as well be harmless (though still unsettling) -- if the system's imprecise model predicts the copy will cooperate, it will cooperate as well, and everything will be fine. I think there are stranger situations where the system's predictions will always be wrong (for halting-problem reasons).

I hope I'll be able to post a simpler version of this problem in the future -- this one is a little too long-winded to be useful!

Wednesday, March 2, 2016

AI and self-modeling

One problem in theoretical AI that sometimes comes up is the problem of finding ways for AI systems to model themselves, or at least to act well as if they had models of themselves. I can see how this is a problem for uncomputable agents like AIXI (though I think this problem is largely solved here), but it doesn't seem to me to be a problem for computable agents -- they can learn models of themselves along with the rest of the world. I'll give an example of trouble that some kinds of systems can run into, then my reasons for not thinking this is a big problem (though I'm by no means sure!).

A problem for model-based RL

Suppose that we're using model-based RL; our system learns a model that maps states of the world and actions the system takes to next states and rewards (see e.g. this talk and the slides). This learned model is used to choose actions by building a tree of possible sequences of actions the system could take and the consequences that the model predicts would result. The situation our system is in will be as follows:

  • The system is learning to perform some episodic RL task; at the end of each episode, the environment is reset and another instance is run.
  • In this environment, the agent has an action that gives a moderately large reward, but that forces the agent to take a null action for the rest of the episode.

The interesting thing here is that the system's model won't learn anything about the bad side effect of this action, even if it impacts the system's total reward a lot. This is because the model maps (state, action) → (next state); it learns what environmental state the bad action leads to, and after that it learns a lot about the effects of the null action, but it doesn't learn that the bad action leads to the null action. Furthermore, the tree search will continue to assume that the system will be able to choose whatever action it wants, even when the system will be forced to take the null action.

This is concerning, but the fix seems straightforward: have the system learn an additional model that maps states to states, implicitly causing it to model its own action selection. Then, when the agent selects an action, have it use the (state, action) → (state) model followed by several iterations of the (state) → (state) model to see what effects that action will have. This should allow it to learn that it will be forced to take the null action, so that it chooses the high-reward action only when that actually maximises rewards.
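Here's a rough sketch of that fix (my own construction, with made-up state names and payoffs), just to pin down the mechanics:

// Evaluate an action by applying the (state, action) -> state model once,
// then rolling forward with the state -> state model, which implicitly
// includes the system's own action selection (and so the forced null action).
function evaluateAction(state, action, actionModel, selfModel, reward, horizon) {
 var s = actionModel(state, action);
 var total = reward(s);
 for (var t = 1; t < horizon; t++) {
  s = selfModel(s);
  total += reward(s);
 }
 return total;
}
// Toy stand-ins for the learned models: "cash" pays 5 once but leads to a
// locked state where the system can only sit; "work" pays 1 per step.
function actionModel(s, a) { return a == "cash" ? "justCashed" : "working"; }
function selfModel(s) { return s == "working" ? "working" : "locked"; }
function reward(s) { return s == "justCashed" ? 5 : s == "working" ? 1 : 0; }
console.log(evaluateAction("start", "cash", actionModel, selfModel, reward, 10)); // 5
console.log(evaluateAction("start", "work", actionModel, selfModel, reward, 10)); // 10 -- the lock-in is now visible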

In general, this kind of approach seems fine to me; a system can learn a model of the environment including itself, and use this model to figure out the long-term consequences of its actions. I haven't yet found a problem with this, and I might look for some kind of formal guarantee.

It's not obvious to me how this kind of problem could affect model-free systems; my feeling is that they should do fine, but I'd like to know more.

All in all, the theoretical problem involving uncomputable ideals like AIXI seems to be mostly solved, and the practical problem doesn't seem like a big deal. Am I missing something?

Tuesday, March 1, 2016

Counterpossibles and contradictory consequences

In a previous post, I proposed formalizing counterpossible reasoning with respect to a deductive system. To counterpossibly assume statement S, we'd add S to the axioms of the proof system P, and modify the inference rules so that any inference yielding not-S yields S instead, giving a modified system PS. I've found a possible issue with this, which I think has implications for counterpossibles more generally, or maybe just torpedoes this specific proposal.

Let's suppose that S is not "really true" -- that we can actually prove not-S in P (which is the interesting case for counterpossibles anyway). S would have some counterpossible consequences -- statements S' that are provable in PS, where not-S' is provable in P. If the only way to prove not-S' is via not-S, then there's no problem, since PS will just prove S instead of not-S, and proceed to prove S' instead of not-S'.

However, if there's another way to prove not-S' in P, a way that doesn't go through S, we have a conflict. P will prove not-S' either via not-S, or via this second path; PS, on the other hand, will prove S' if it goes via S, and will prove not-S' if it goes via the second proof path. (Maybe this would be easier to understand with an example, but I don't have one right now.) So, PS will be able to prove both S' and not-S'. Whoops!

There seem to be at least two options, if we want to stick with the deductive-system view of counterpossibles:
  1. A precedence rule: if PS can prove some statement via S or via some other path, then the S-path takes precedence, and the other path is ignored. This has the unfortunate effect of making PS's inference rules "non-local", since conclusions don't just depend on premises, but also on everything else the system can prove.
  2. Paraconsistent logic: we can allow PS to prove some statements both true and false, and somehow limit the explosion to make sure that the result isn't a world where every statement besides S is both true and false.
Of these two, the second is somehow more appealing to me; it really does seem like counterpossibly assuming S is not "enough" to flip not-S' to S' if there is a way to prove not-S' independent of S. This makes these kind of contradictions seem more like a desirable feature of the deductive-system formalization of counterpossibles than a bug. I would be unsatisfied with this, however, if every counterpossible consequence S' of S became both true and false. I'm not sure, generally, whether most statements in deductive proof systems have many proof pathways (which would be bad news for this method), or whether some systems have some statements that can only be proved in one way.

There seem to be a bunch of paraconsistent logics that I could use, and I don't know anything about the pros and cons, though I like the idea of rejecting disjunctive syllogism and reductio ad absurdum. Intuitively, I don't think I want to completely limit the explosion; it seems to me that statements "downstream" of S' and not-S' should also be both true and false, while statements "upstream" shouldn't be affected, but I can't say precisely what that means.


Incidentally, it feels to me like this kind of problem shouldn't affect decision-making programs that need to use counterpossible reasoning. My feeling is that a decision-making system shouldn't be able to figure out that some decision it could make would cause a contradiction, since it "should be in a position" to make any decision it would "like" to. This smells to me a little like free will -- the consequences of a decision-maker's actions irreducibly depend on the action itself, and there aren't proof pathways that circumvent the decision entirely. Maybe that provides a lead on how decision-making programs should use counterpossible reasoning, though I don't know how to cash it out formally.

However, the thoughts above certainly seem to me to be applicable to mathematical counterpossibles, like what "would be true" if 2 = 3 or if root 2 were rational -- in those cases, I think we need to use some paraconsistent logic.