
Which Tasks to Give AI: A Selection Framework

Not every task belongs to AI, and not every task belongs to humans. The framework separates them systematically.


The Selection Problem

AI can attempt almost anything you ask. That’s not the same as AI doing everything well.

This article provides the decision framework. You know what AI can do (the capability map) and where delegation creates unacceptable risk (the danger zones). Now you need systematic criteria for everything in between: which tasks to delegate, which to keep, and which benefit from hybrid approaches.

The framework here is operational. Not capability analysis. Not risk warning. A practical decision tree for “should AI do this specific task?”


The Automation Potential Spectrum

McKinsey Global Institute’s research on generative AI (June 2023) quantified automation potential across work activities. With generative AI capabilities, 60-70% of current work activities now have automation potential, up from roughly 50% before these tools existed.

The key distinction: this measures task-level automation potential, not job elimination. Most roles contain a mix of automatable and non-automatable activities.

High automation potential (60-80% of task time):

  • Data collection and processing
  • Routine data entry
  • Format transformation
  • Template-based content generation
  • Pattern recognition in structured data
  • Scheduling and calendar management
  • Basic research aggregation

Moderate automation potential (30-50% of task time):

  • Analysis and interpretation
  • Content creation requiring judgment
  • Communication drafting
  • Decision support
  • Quality review assistance
  • Creative ideation

Low automation potential (under 20% of task time):

  • Strategic decision-making
  • Stakeholder management
  • Complex negotiations
  • Crisis response
  • Innovation and novel problem-solving
  • Relationship building
  • Collaboration requiring real-time adaptation

The percentages indicate how much of that activity type can typically be automated effectively, not how much AI can attempt.


The Four-Quadrant Framework

Task selection benefits from two-dimensional analysis: task complexity and error tolerance.

Quadrant 1: Low complexity, high error tolerance. Best AI fit. Routine tasks where mistakes are easy to catch and cheap to fix. Volume work. Template-based content. Data transformation. First drafts of low-stakes materials.

Approach: Full delegation with light review.

Quadrant 2: Low complexity, low error tolerance. Needs human oversight. The task itself is simple, but errors carry consequences. Customer-facing communications. Compliance documentation. Financial calculations.

Approach: AI assists, human verifies before output goes anywhere.

Quadrant 3: High complexity, high error tolerance. Experimental territory. Complex tasks where getting it wrong doesn’t hurt much. Strategic brainstorming. Creative exploration. Option generation.

Approach: Use AI to expand thinking without depending on accuracy.

Quadrant 4: High complexity, low error tolerance. Human domain. Complex judgment where errors cause real damage. Legal strategy. Medical decisions. Major financial commitments. Novel situations requiring genuine reasoning.

Approach: Human-led, AI may assist with background research only.
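The two-dimensional analysis above can be sketched as a small lookup. This is an illustrative encoding, not part of the original framework; the boolean parameters and the `Approach` labels are my own naming.

```python
from enum import Enum

class Approach(Enum):
    FULL_DELEGATION = "Full delegation with light review"
    AI_ASSISTS = "AI assists, human verifies before output goes anywhere"
    EXPAND_THINKING = "Use AI to expand thinking without depending on accuracy"
    HUMAN_LED = "Human-led, AI may assist with background research only"

def quadrant(high_complexity: bool, low_error_tolerance: bool) -> tuple[int, Approach]:
    """Map the two dimensions to a quadrant number and its recommended approach."""
    if not high_complexity:
        # Simple tasks: delegation depends only on how costly errors are.
        return (2, Approach.AI_ASSISTS) if low_error_tolerance else (1, Approach.FULL_DELEGATION)
    # Complex tasks: error cost decides between experimentation and human control.
    return (4, Approach.HUMAN_LED) if low_error_tolerance else (3, Approach.EXPAND_THINKING)
```

For example, `quadrant(high_complexity=False, low_error_tolerance=True)` places a task in Quadrant 2: simple work whose errors carry consequences, so AI assists and a human verifies.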


The Framework in Practice: Three Cases

Abstract frameworks become useful through concrete application. Here’s how three organizations applied task selection criteria in 2024.

Klarna: Full Automation Zone

Task: Customer service conversations, return processing, routine inquiries.

The selection logic: High volume (millions of conversations), low complexity (standard questions with standard answers), high error tolerance (mistakes are inconvenient but not catastrophic), easy verification (customer satisfaction scores provide immediate feedback).

Result: AI assistant handled 2.3 million conversations in its first month, work previously requiring 700 full-time agents. Resolution time dropped from 11 minutes to 2 minutes. Customer satisfaction remained equivalent to human agents.

Framework position: Quadrant 1. Full delegation with monitoring.

Morgan Stanley: Augmentation Zone

Task: Synthesizing investment insights from thousands of pages of market research.

The selection logic: High complexity (financial analysis requires judgment), low error tolerance (bad investment advice has serious consequences), but information gathering is separable from decision-making.

Result: AI assistant reduced time financial advisors spent searching for information by 90%. But the final output (investment recommendations) remained entirely human. AI gathered and organized; humans analyzed and decided.

Framework position: Quadrant 4 task split into components. Research aggregation (Quadrant 1) delegated to AI. Final judgment (Quadrant 4) retained by humans.

CarMax: Acceleration Zone

Task: Writing customer review summaries for thousands of used vehicles.

The selection logic: High volume (5,000+ vehicles), moderate complexity (summaries require synthesis), moderate error tolerance (inaccurate summaries damage trust but aren’t catastrophic), verifiable (human editors can review).

Result: AI generated summaries at scale. Human editors shifted from writing to approving, reviewing AI output rather than creating from scratch. Production timeline collapsed from years to months.

Framework position: Quadrant 2. AI produces, humans verify before publication.


The Repeatability Test

Tasks that repeat identically benefit most from AI investment.

High repeatability = high AI ROI:

  • Weekly reports with consistent structure
  • Product descriptions following templates
  • Customer service responses to common questions
  • Data transformation between consistent formats
  • Regular email communications following patterns

The prompt engineering investment spreads across many uses. Quality improvements compound. Workflow integration becomes worthwhile.

Low repeatability = question AI investment:

  • One-time strategic documents
  • Unique client situations
  • Custom creative work
  • Novel problem-solving
  • Relationship-specific communications

When each instance requires new context transfer, the per-task overhead may exceed time savings.


The Context Transfer Test

Some tasks require context that’s hard to transfer to a prompt.

Easy context transfer:

  • Format specifications
  • Length requirements
  • Tone examples
  • Structural templates
  • Factual information
  • Explicit rules

Hard context transfer:

  • Relationship history
  • Organizational politics
  • Unstated preferences
  • Industry nuance accumulated over years
  • Client personality and communication style
  • Strategic priorities that inform tactical decisions

When a task depends heavily on hard-to-transfer context, AI produces generic output that misses what matters. The output may be technically correct while being practically useless.

Ask: Can I explain in a paragraph everything AI needs to know to do this task well? If the answer is yes, AI can likely help. If the answer involves “but they need to understand…” or “it depends on context that…” the task may not transfer effectively.


The Verification Capability Test

Before delegating, assess your ability to verify output quality.

Easy verification:

  • Spelling and grammar (tools verify)
  • Format compliance (visual check)
  • Length requirements (word count)
  • Structural completeness (checklist)
  • Basic factual accuracy (quick search)

Hard verification:

  • Nuanced accuracy in specialized domains
  • Appropriateness of tone for specific audiences
  • Strategic alignment with goals you haven’t articulated
  • Legal or regulatory compliance
  • Technical correctness in areas outside your expertise

Delegate tasks you can verify. Be cautious with tasks where you would have to trust AI output because you can't evaluate it. That's inverted logic: the less you can verify, the more caution is warranted, not less.


The Stakes Assessment

Error consequence should drive delegation decisions.

Low stakes (AI-appropriate):

  • Internal drafts for iteration
  • Brainstorming materials
  • Personal productivity tasks
  • First attempts that will be reviewed and revised
  • Content for internal consumption

Medium stakes (AI with oversight):

  • Client-facing communications
  • Published content
  • Business documentation
  • Decisions affecting moderate resources
  • External representations of your work

High stakes (human primary, AI assist only):

  • Legal implications
  • Financial consequences
  • Regulatory compliance
  • Reputation-critical outputs
  • Decisions affecting people’s lives or livelihoods

The stakes test isn’t about AI capability. It’s about error cost. AI might produce good output 95% of the time. But 5% failures on high-stakes tasks create damage that 95 successes don’t offset.


Calculating Task-Level ROI

“High ROI” needs measurement, not intuition. Here’s the basic calculation:

ROI = [(Time Saved × Hourly Cost) − (AI Cost + Review Time × Hourly Cost)] / (AI Cost + Review Time × Hourly Cost)

In words: the value of the time saved, minus the total cost of delegation (AI cost plus the cost of human review time), divided by that total cost.

The variables that matter:

Time saved benchmarks (from research):

  • Coding tasks: 55% faster with AI assistance (GitHub Copilot study)
  • Routine business writing: 66% faster (Nielsen Norman Group)
  • Consulting/analysis tasks: 25% faster (Harvard/BCG study)

Costs to include:

  • AI subscription or API costs
  • Time spent writing and refining prompts
  • Time spent reviewing and correcting output
  • Time spent on iterations when first output fails

Break-even calculation: If a task takes 60 minutes manually and AI reduces it to 30 minutes but requires 10 minutes of prompt writing and 10 minutes of review, net savings is 10 minutes. If prompt development took 30 minutes upfront, you need three uses to break even.

High-repeatability tasks amortize setup costs across many uses. One-off tasks carry full setup cost against single use.


Task Categories: Specific Guidance

Content creation:

  • First drafts: AI-appropriate
  • Ideation and brainstorming: AI-appropriate
  • Variations and adaptations: AI-appropriate
  • Final quality judgment: Human required
  • Voice and brand alignment: Human review required
  • Fact-checking: Human verification required

Research:

  • Aggregating known information: AI-appropriate
  • Summarizing long documents: AI-appropriate
  • Initial literature review: AI-appropriate
  • Fact verification: Human required
  • Novel analysis: Human-led, AI assist
  • Source credibility judgment: Human required

Communication:

  • Templates and standard responses: AI-appropriate
  • Internal documentation: AI with review
  • External communications: AI draft, human review
  • Relationship-critical messages: Human primary
  • Negotiation correspondence: Human required

Analysis:

  • Data formatting and transformation: AI-appropriate
  • Pattern identification in structured data: AI-appropriate
  • Interpretation and recommendations: Human-led, AI assist
  • Strategic implications: Human required

Code:

  • Boilerplate and standard patterns: AI-appropriate
  • Documentation: AI-appropriate
  • Debugging routine issues: AI-appropriate
  • Architecture decisions: Human required
  • Security-critical code: Human review mandatory
  • Production deployment decisions: Human required


The Time Audit Approach

To identify your best AI opportunities:

Track time for one week. Note what you spend time on and categorize by:

  • Task type
  • Repeatability
  • Stakes level
  • Context requirements
  • Verification difficulty

Look for clusters in: high-time + low-stakes + high-repeatability + easy-verification. These are your highest-ROI AI delegation opportunities.

Low-time tasks rarely justify AI investment. Even if AI could help, the setup cost exceeds the time saved.

High-time + high-stakes tasks deserve careful analysis. The time savings would be meaningful, but the stakes require careful implementation.
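One way to mechanize the cluster search, assuming each audit entry is recorded as a dict with the five attributes above. The field names and the 60-minute weekly threshold are illustrative choices, not from the article.

```python
def delegation_candidates(tasks: list[dict]) -> list[dict]:
    """Keep entries in the highest-ROI cluster: high time, low stakes,
    high repeatability, easy verification."""
    return [
        t for t in tasks
        if t["weekly_minutes"] >= 60   # "high time" threshold (assumed)
        and t["stakes"] == "low"
        and t["repeatable"]
        and t["easy_to_verify"]
    ]

# A two-entry audit: one routine report, one high-stakes negotiation.
audit = [
    {"name": "weekly report", "weekly_minutes": 120, "stakes": "low",
     "repeatable": True, "easy_to_verify": True},
    {"name": "client negotiation", "weekly_minutes": 180, "stakes": "high",
     "repeatable": False, "easy_to_verify": False},
]
print([t["name"] for t in delegation_candidates(audit)])  # ['weekly report']
```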


Implementation Sequence

Start with Quadrant 1 tasks: low complexity, high error tolerance, high repeatability. Build confidence and develop skills before expanding.

Move to Quadrant 2 with explicit verification processes: low complexity, low error tolerance. Use AI but verify before anything goes out.

Experiment with Quadrant 3 for thinking expansion: high complexity, high error tolerance. AI for brainstorming and option generation without trusting accuracy.

Maintain human control of Quadrant 4: high complexity, low error tolerance. AI can research and support; humans decide.

Expanding too fast into high-stakes territory before building verification habits creates risk. Expanding too slowly into low-stakes territory wastes opportunity.


The Review Protocol

For tasks you delegate to AI, establish review protocols proportional to stakes:

Light review (low stakes): Scan for obvious errors. Check format compliance. Verify length. If nothing jumps out, proceed.

Standard review (medium stakes): Read completely. Verify key facts. Check tone appropriateness. Confirm nothing sensitive appears incorrectly. Edit as needed before use.

Thorough review (high stakes): Verify every factual claim against primary sources. Check legal/regulatory compliance. Review for brand alignment. Consider audience perception. Have second person review if possible. Treat AI output as raw material, not finished product.

No review (any stakes): Not recommended. Even low-stakes AI output benefits from a quick scan. Errors compound over time if never caught.


The Bottom Line

Task selection for AI delegation isn’t intuition. It’s systematic assessment of complexity, stakes, repeatability, context requirements, and verification capability.

Use AI for: low-complexity tasks, high-repeatability work, high-error-tolerance situations, tasks with easy context transfer, output you can verify.

Maintain human control of: high-complexity judgment, low-error-tolerance decisions, relationship-critical work, novel problems, anything you cannot verify.

The framework doesn’t require memorization. Ask five questions: How complex? How high are the stakes? How often does this repeat? Can I transfer the context? Can I verify the output? The answers guide the decision.
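The five questions can also be read as a short decision procedure. The encoding below is one plausible rendering of the article's guidance, with my own parameter names and return strings:

```python
def recommend(complex_task: bool, high_stakes: bool, repeats_often: bool,
              context_transfers: bool, can_verify: bool) -> str:
    """Turn the five answers into a delegation recommendation."""
    if complex_task and high_stakes:
        # Quadrant 4: complex judgment where errors cause real damage.
        return "human-led; AI for background research only"
    if not can_verify:
        # The less you can verify, the more caution is warranted.
        return "keep human; unverifiable output warrants caution"
    if not context_transfers:
        # Hard-to-brief context produces generic, practically useless output.
        return "keep human; output will miss the context that matters"
    if not complex_task and repeats_often:
        # Quadrants 1-2: routine, repeatable work.
        return "delegate, with review proportional to stakes"
    return "hybrid: AI drafts or explores, human judges"
```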


Sources:

  • Automation potential by activity type: McKinsey Global Institute “The Economic Potential of Generative AI: The Next Productivity Frontier” (June 2023)
  • Task complexity and AI performance: Dell’Acqua et al., “Navigating the Jagged Technological Frontier,” Harvard Business School & Boston Consulting Group (2023)
  • Klarna AI assistant results: Klarna company announcements (February 2024)
  • Morgan Stanley AI implementation: Morgan Stanley official communications, financial industry coverage (2023-2024)
  • CarMax content generation: Microsoft Azure customer case study (2023)
  • Coding productivity gains: GitHub Copilot research (2022)
  • Business writing productivity: Nielsen Norman Group AI usability studies (2023)