Using AI to Stress-Test Your Decisions

You ask AI to challenge your decision. It challenges. You asked it to—so it did. Is that actually stress-testing, or just obedience wearing a different mask?


The Real Problem

AI agrees with you. Anthropic’s sycophancy research showed this clearly: present a wrong assumption, and models often agree rather than correct it. Ask “is my plan good?” and AI explains why it’s good.

The standard advice: prompt AI to disagree instead. Use pre-mortems. Play devil’s advocate. This works, sort of. But there’s a deeper problem nobody talks about.

When you tell AI to challenge you, it challenges you. It’s still doing what you asked. The “disagreement” is performed on command. You control when challenge happens, how much, and on what terms. That’s not the same as genuine opposition from someone with skin in the game.

This matters because stress-testing only works when you might actually change your mind. If AI-generated challenges are just boxes to check before doing what you planned anyway, you’re not stress-testing. You’re building a paper trail of due diligence.


What Actually Works

The pre-mortem reframe. Instead of “evaluate this decision,” say: “It’s 12 months from now and this decision has failed catastrophically. What went wrong?”

This works because it changes AI’s task from judgment to imagination. AI isn’t assessing probability of success (which triggers agreement bias). It’s constructing failure narratives (which it does without defensiveness).

One prompt: “I’m deciding to [decision]. Assume it’s one year later and this has failed badly. Work backward—what caused the failure? Give me five specific scenarios I’m probably not seeing.”

The steel man request. “Construct the strongest possible argument against my position. Not a weak version I can easily dismiss—the version that would be hardest for me to refute.”

This forces engagement with the best counter-arguments, not strawmen.
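
Both of these are plain prompt templates, which means you can script them. Here’s a minimal sketch using Anthropic’s Python SDK; the model name and the example decision are placeholders, not recommendations:

```python
# Minimal sketch: run the pre-mortem and steel-man prompts over one decision.
# Assumes the official `anthropic` package and an ANTHROPIC_API_KEY in your
# environment; the model name and example decision are placeholders.
import anthropic

client = anthropic.Anthropic()

PRE_MORTEM = (
    "I'm deciding to {decision}. Assume it's one year later and this has "
    "failed badly. Work backward: what caused the failure? Give me five "
    "specific scenarios I'm probably not seeing."
)

STEEL_MAN = (
    "Construct the strongest possible argument against this decision: "
    "{decision}. Not a weak version I can easily dismiss, but the version "
    "that would be hardest for me to refute."
)

def ask(prompt: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder: use your model
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

decision = "migrate our billing system to a new vendor this quarter"
print(ask(PRE_MORTEM.format(decision=decision)))
print(ask(STEEL_MAN.format(decision=decision)))
```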


When This Doesn’t Work

AI stress-testing fails in predictable ways:

You’re not actually open to changing. If you’ve already decided and want validation, you’ll dismiss every AI challenge as “unlikely” or “not applicable to my situation.” The stress-test becomes theater.

AI doesn’t know your context. Challenges based on general patterns may not apply. AI might flag risks that don’t exist in your specific situation while missing risks that do.

AI can’t assess probability. It generates failure scenarios. It can’t tell you which are likely versus remote. You must weigh scenarios yourself, which reintroduces all your original biases.

Novel risks don’t appear. AI surfaces risks that resemble patterns in its training data. Genuinely new risks—specific to your situation, your timing, your market—may not surface.


The Honesty Test

Here’s how to know if you’re actually stress-testing or just performing it:

Has AI stress-testing ever changed a significant decision you made?

Not “made me think about risks” or “added nuance.” Actually changed what you decided to do.

If never, one of two things is true: either your decisions are consistently excellent, or you’re not actually testing them. You’re seeking comfort that looks like rigor.


So What Do You Actually Do?

If AI challenges on command—making it obedience, not opposition—how do you create genuine stress-testing?

Three approaches that shift the dynamic:

1. Structural forcing.

Don’t ask AI to challenge you. Ask it to present both sides without choosing.

“Give me the strongest case FOR this decision and the strongest case AGAINST. Don’t tell me which is better. Present both at equal strength.”

Now AI isn’t agreeing or disagreeing. It’s giving you material. You choose. The judgment stays with you, where it belongs. AI’s sycophancy becomes irrelevant because you never asked for its opinion.
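
If you run this through an API rather than a chat window, pinning the no-verdict rule in the system prompt makes it harder for the model to drift into a recommendation. A sketch, with the same caveats as above (the model name is a placeholder and the system wording is mine, not a canonical incantation):

```python
# Sketch: structural forcing. The system prompt carries the no-verdict rule,
# so the request itself never asks for an opinion.
import anthropic

client = anthropic.Anthropic()

decision = "open a second office in Austin next year"  # hypothetical example
response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder: use your model
    max_tokens=1500,
    system=(
        "Present the strongest case FOR and the strongest case AGAINST the "
        "user's decision, at equal strength and equal length. Do not say "
        "which side is better, and do not hint at a recommendation."
    ),
    messages=[{"role": "user", "content": f"My decision: {decision}"}],
)
print(response.content[0].text)
```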

2. Delayed evaluation.

Make the decision. Implement it. Wait 30 days. Then ask AI: “Here’s what I decided, here’s what happened. What went wrong? What did I miss?”

At this point your ego isn’t defending an unmade choice. The decision is done. You’re not asking “should I?”—you’re asking “what can I learn?” Different question, different dynamic. AI’s analysis becomes useful feedback rather than pre-decision theater.
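
The hard part here is actually waiting. One way to enforce the gap is to log the decision at decision time and have the review step refuse to run early. A sketch; the file name, 30-day threshold, and prompt wording are all illustrative:

```python
# Sketch: enforce the 30-day gap. Log decisions at decision time; the review
# step builds the retrospective prompt and refuses to run early.
import json
from datetime import date, timedelta
from pathlib import Path

LOG = Path("decisions.json")

def log_decision(text: str) -> None:
    entries = json.loads(LOG.read_text()) if LOG.exists() else []
    entries.append({"decision": text, "date": date.today().isoformat()})
    LOG.write_text(json.dumps(entries, indent=2))

def review_prompt(entry: dict, outcome: str) -> str:
    decided = date.fromisoformat(entry["date"])
    if date.today() - decided < timedelta(days=30):
        raise ValueError("Too soon. Wait the full 30 days before reviewing.")
    return (
        f"Here's what I decided on {entry['date']}: {entry['decision']}. "
        f"Here's what happened since: {outcome}. "
        "What went wrong? What did I miss?"
    )
```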

3. Multi-AI collision.

Use two AI instances. One defends, one attacks. You referee.

Prompt for AI-1: “You are defending this decision: [X]. Make the strongest possible case. Assume you believe this is the right choice and argue for it.”

Prompt for AI-2: “You are attacking this decision: [X]. Find every weakness, every risk, every reason it will fail. Assume you believe this is wrong and argue against it.”

Compare outputs. Where do they clash? Which arguments can’t you easily dismiss? Where does the defense feel weak against the attack?

Single AI sycophancy disappears in collision. Neither instance is agreeing with you—they’re fighting each other. You watch, evaluate, decide who’s winning. Genuine opposition emerges from structure, not commands.
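
Because API calls are stateless, “two AI instances” just means two independent conversations with opposing system prompts; neither side sees the other’s argument. A sketch along those lines, again with a placeholder model name:

```python
# Sketch: multi-AI collision. Two independent calls with opposing system
# prompts; neither side sees the other's argument. You referee.
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-20250514"  # placeholder: use your model

def run(role: str, decision: str) -> str:
    response = client.messages.create(
        model=MODEL,
        max_tokens=1500,
        system=role,
        messages=[{"role": "user", "content": f"The decision: {decision}"}],
    )
    return response.content[0].text

decision = "rewrite the legacy service in Rust over the next two quarters"
defense = run(
    "You are defending this decision. Make the strongest possible case. "
    "Assume you believe this is the right choice and argue for it.",
    decision,
)
attack = run(
    "You are attacking this decision. Find every weakness, every risk, "
    "every reason it will fail. Assume you believe this is wrong and argue "
    "against it.",
    decision,
)
print("DEFENSE:\n" + defense + "\n\nATTACK:\n" + attack)
```

A natural extension is a rebuttal round: feed each side the other’s output and ask for a response. The referee role still stays with you.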


The Real Test

Pre-mortems work. Devil’s advocate prompts work. Steel-manning works. Not because they trick AI into genuine opposition—they don’t. They work because they give you raw material to work with.

The bottleneck was never the prompts. It’s whether you’ll actually change your mind when the material is good.

When was the last time you changed a decision based on counter-arguments? If you can’t remember, better prompts won’t help you.
