I spent a good chunk of last week reading a paper that made me rethink what I tell junior devs when they ask “should I be worried about AI taking my job?” The short answer used to be a vague “focus on the stuff AI can’t do.” The problem is nobody could quantify which stuff that actually was.
Now there’s a number for it. The Skill Automation Feasibility Index (SAFI) benchmarks four frontier LLMs across 263 text-based tasks covering all 35 skills in the U.S. Department of Labor’s O*NET taxonomy. The results are both obvious and surprising.
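I haven’t seen the paper’s scoring code, but the basic shape of an index like this is easy to picture: rate each model on each task, then aggregate ratings per skill. Here’s a minimal sketch, assuming a 0–100 per-task rating and a plain average; the record layout, model names, and numbers below are placeholders, not the paper’s data or method.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical task records: (skill, model, rating), where rating is a
# 0-100 judgment of how well the model completed the task. The paper's
# actual rubric and aggregation may differ; this is just the shape.
task_results = [
    ("Mathematics", "model_a", 78.0),
    ("Mathematics", "model_b", 70.4),
    ("Active Listening", "model_a", 44.3),
    ("Active Listening", "model_b", 40.1),
]

def safi_scores(results):
    """Average per-task ratings into one feasibility score per skill."""
    by_skill = defaultdict(list)
    for skill, _model, rating in results:
        by_skill[skill].append(rating)
    return {skill: round(mean(ratings), 1) for skill, ratings in by_skill.items()}

print(safi_scores(task_results))
# {'Mathematics': 74.2, 'Active Listening': 42.2}
```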
Math and programming score highest, and that’s the boring part
Mathematics scored 73.2 on the SAFI scale. Programming hit 71.8. If you’ve used any coding assistant in the last year, this won’t shock you. LLMs are genuinely good at structured, well-defined problems with clear right answers.
But here’s what caught my attention: active listening scored 42.2. Reading comprehension landed at 45.5. The skills that sound “easy” to humans are the ones these models struggle with most. Not because the models are bad, but because those tasks require a persistent understanding of context that shifts based on who’s talking and why.
I’ve been building projects where LLMs handle customer interactions, and I can confirm this gap is real. The model can parse a complaint perfectly and still miss that the customer is actually asking for reassurance, not a refund. That kind of reading-between-the-lines work is where humans still win by a wide margin.
The capability-demand inversion is the real story

This is the finding that changed how I think about the whole “AI replacing jobs” conversation. The researchers cross-referenced their SAFI scores with real-world AI adoption data from the Anthropic Economic Index, which covers 756 occupations and almost 18,000 tasks.
What they found is a “capability-demand inversion.” The skills that LLMs automate best (math, programming, formal logic) are not the skills most demanded in jobs that are heavily exposed to AI. The jobs where AI shows up most (think analyst roles, content strategists, support leads) actually lean hardest on the skills LLMs score lowest on: active listening, social perceptiveness, persuasion.
The more your job involves AI, the more it needs the human skills AI can’t replicate. That’s not a feel-good platitude. It’s what the data shows.
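If you want to poke at this pattern yourself, the cross-reference is essentially two joins and a correlation. Here’s a sketch with pandas; the 73.2, 71.8, and 42.2 scores come from the paper, but every column name, weight, and occupation row below is made up for illustration and won’t match the Anthropic Economic Index’s actual schema.

```python
import pandas as pd

# Per-skill SAFI scores (the first three scores are the paper's;
# 40.0 and the table layout are placeholders).
safi = pd.DataFrame({
    "skill": ["Mathematics", "Programming", "Active Listening", "Social Perceptiveness"],
    "safi_score": [73.2, 71.8, 42.2, 40.0],
})

# Hypothetical occupation rows: how much each job leans on each skill,
# and how often AI shows up in that job's tasks.
exposure = pd.DataFrame({
    "occupation": ["Data Analyst", "Data Analyst", "Support Lead", "Support Lead"],
    "skill": ["Mathematics", "Active Listening", "Active Listening", "Social Perceptiveness"],
    "skill_weight": [0.4, 0.6, 0.7, 0.3],
    "ai_exposure": [0.8, 0.8, 0.9, 0.9],
})

# Demand for a skill in AI-exposed work = exposure-weighted skill weight.
# (Programming has no occupation rows here, so the inner join drops it.)
merged = exposure.merge(safi, on="skill")
merged["exposed_demand"] = merged["skill_weight"] * merged["ai_exposure"]
demand = merged.groupby("skill", as_index=False).agg(
    safi_score=("safi_score", "first"),
    exposed_demand=("exposed_demand", "sum"),
)

# A negative correlation is the inversion: the skills AI-exposed jobs
# demand most are the ones LLMs score lowest on.
print(demand["safi_score"].corr(demand["exposed_demand"]))
```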
I wrote about measuring what matters in LLM reasoning a while back, and this connects directly. We keep optimizing for the tasks LLMs already do well. The real gap isn’t in making math scores go from 73 to 85. It’s in the 42-and-below range, and nobody has a clear roadmap for closing it.
What the AI Impact Matrix tells you about your own job
The paper proposes an AI Impact Matrix that positions every skill along two axes: how automatable it is (SAFI score) and how exposed the related jobs are to AI adoption. Those two axes split the skills into four quadrants.
High Displacement Risk covers skills that are both highly automatable and highly demanded in AI-exposed roles. This is a smaller quadrant than you’d think. Most high-automation skills fall into what they call “AI-Augmented” instead, meaning the AI handles the grunt work while a human still drives strategy.
The “Upskilling Required” quadrant is where things get interesting for developers specifically. These are skills with low automation feasibility but high demand in AI-adjacent jobs. Coordination and negotiation, mostly. Instructing others. If you’re wondering what to learn next that isn’t another JavaScript framework, this quadrant is your answer.
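The quadrant logic itself is one threshold per axis. Here’s a sketch; the 55-point SAFI cutoff, the 0.5 exposure cutoff, and the “Human-Led” label for the fourth quadrant are my placeholders, since the paper may draw its boundaries differently.

```python
def impact_quadrant(safi_score: float, ai_exposure: float,
                    safi_cut: float = 55.0, exposure_cut: float = 0.5) -> str:
    """Place a skill in the AI Impact Matrix. Cutoffs are illustrative."""
    automatable = safi_score >= safi_cut   # axis 1: SAFI score
    exposed = ai_exposure >= exposure_cut  # axis 2: AI exposure of related jobs
    if automatable and exposed:
        return "High Displacement Risk"
    if automatable:
        return "AI-Augmented"         # AI does grunt work, human drives strategy
    if exposed:
        return "Upskilling Required"  # low feasibility, high demand
    return "Human-Led"  # fourth-quadrant label is mine, not the paper's

print(impact_quadrant(42.2, 0.9))  # Upskilling Required
print(impact_quadrant(73.2, 0.2))  # AI-Augmented
```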
I’ve been shifting my own freelance work in exactly this direction. Less time writing boilerplate, more time on client calls figuring out what the project actually needs before any code gets written. The hourly rate for the second activity is roughly triple the first, which is its own kind of signal.
78.7% of AI interactions are augmentation, not replacement
Another number from the paper worth sitting with: 78.7% of observed AI interactions in the wild are augmentation rather than full automation. People aren’t getting replaced in a single sweep. They’re getting a copilot that handles the predictable parts while they handle everything else.
This lines up with what I see in practice. The teams I work with that get the most out of AI tools are the ones that treat them as fast interns, not replacement engineers. You still need someone who knows which questions to ask, how to validate the output, and when to throw the whole suggestion away.
The 21.3% that is full automation tends to cluster around specific micro-tasks: formatting, data entry, boilerplate generation, first-draft summarization. Useful, but not the kind of work that makes or breaks a project.
What this means if you’re learning right now
If I were a junior developer reading this in April 2026, here’s what I’d actually do this week.
Stop panicking about coding being “automated.” Yes, LLMs are good at generating functions. They’re terrible at knowing which functions to generate. System design, requirements gathering, talking to stakeholders about what they actually want – these are the skills with low SAFI scores and high job demand. Invest there.
Get good at validating AI output instead of just generating it. The paper shows that open-source LLMs like LLaMA 3.3 70B and Qwen 2.5 72B score competitively with commercial models on the SAFI benchmark. The models are commoditizing fast. The ability to evaluate what they produce is not.
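In practice, “validating” can be as mechanical as refusing to accept a generated function until it passes tests you wrote yourself. A minimal sketch of that habit; the `solve` function name and this tiny harness are illustrative, not any real tool’s API, and `exec` on untrusted output needs a proper sandbox in real use.

```python
def validate(candidate_source: str, cases: list[tuple]) -> bool:
    """Accept generated code only if it passes hand-written test cases.

    Assumes candidate_source defines a function named `solve`.
    """
    namespace: dict = {}
    try:
        exec(candidate_source, namespace)  # sandbox this for untrusted output
        solve = namespace["solve"]
        return all(solve(*args) == expected for args, expected in cases)
    except Exception:
        return False

# The test cases encode what *you* mean by correct; the model never sees them.
tests = [((2, 3), 5), ((-1, 1), 0)]
llm_output = "def solve(a, b):\n    return a + b\n"
print(validate(llm_output, tests))  # True
```

The point isn’t the harness. It’s that the acceptance criteria come from you, not the model.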
Practice the low-SAFI skills deliberately. Take the client call instead of asking someone else to do it. Write the project spec before you write the code. Run the retro. These aren’t soft skills in the dismissive sense. They’re the hard-to-automate skills that pay more precisely because they resist automation.
The paper’s data covers 1,052 model calls with a 0% failure rate, so the scores aren’t noise from dropped runs. But benchmarks measure task completion, not judgment. And judgment is the thing that’s actually scarce.