

A short field guide to a month that pushed agents from demo to default. What each release means in plain language and what to do about it.
The short answer
May 2026 was another dense month, with all four major platforms shipping meaningful updates.
- OpenAI made a faster model, GPT-5.5 Instant, the new default in ChatGPT, and turned its coding tool, Codex, into something that can run long tasks on its own.
- Anthropic released Claude Opus 4.8, with sharper agentic performance and parallel subagent workflows. (As of 10 June it has also released Fable 5 into production: its best model, a safer version of Mythos, at twice the price of Opus 4.8.)
- Google led I/O 2026 with Gemini 3.5 Flash and launched an overwhelming number of new AI and agentic products.
- Microsoft rebuilt the M365 Copilot interface and confirmed a Copilot super app is coming before the end of the year. It also launched seven of its own models, a sign of its continuing move away from OpenAI.
The month's consistent theme is that agents are no longer experimental. Read on for the details.
Results and takeaways
- OpenAI: GPT-5.5 Instant (52.5% fewer hallucinations on high-stakes prompts); Codex becomes a persistent agent runtime that also operates inside Excel and Google Sheets.
- Anthropic: Claude Opus 4.8 (agentic coding up from 64.3% to 69.2%), dynamic parallel workflows, effort controls. Same price as 4.7.
- Google: Gemini 3.5 Flash (similar results to the best models, but quicker and cheaper), Gemini Omni (multimodal with video output), Gemini Spark (personal agent), Daily Brief (morning digest from Gmail and Calendar).
- Microsoft: Adaptive M365 Copilot workspace, computer-using agents in Copilot Studio, Copilot super app confirmed.
What's new from OpenAI in May?
Two meaningful releases:

GPT-5.5 Instant became the new default in ChatGPT on 5 May. The headline number is a 52.5% reduction in hallucinated claims on high-stakes prompts in medicine, law, and finance. That's a significant reliability improvement for anyone using ChatGPT for research, drafting, or analysis where getting things wrong matters.
Codex can now run for hours, hold state across sessions, and operate directly inside Excel and Google Sheets. The model plans, writes, executes and corrects code with minimal input.
Source: openai.com
What's new from Anthropic in May?
Claude Opus 4.8 shipped on 28 May, building on 4.7 with what Anthropic describes as sharper judgement, improved honesty about uncertainty and stronger agentic coding performance. For anyone using Claude for complex tasks with multiple steps and tool use, this is a major improvement.
Claude Code now runs multiple subagents in parallel to handle complex tasks faster. There are also new effort controls that let you dial how much work Claude puts into a response, which is useful for choosing between a quick draft and a thorough analysis.
Source: anthropic.com
What's new from Google in May?

Google's I/O 2026 developer conference on 19 May was headlined by Gemini 3.5 Flash, a model Google is positioning as frontier-level quality at much lower cost and much higher speed. We put it to the test against Claude Opus 4.7 in this month's newsletter feature. We found that while the speed claim holds, speed isn't always what you need.
I/O was overflowing with new product announcements, but three other products caught our eye.
Gemini Omni extends Google's multimodal support by adding video as an output format, alongside text and images. Gemini Spark is a personal agent that works on everyday tasks in life and business, such as booking travel, browsing the internet on your behalf and keeping your files in order. Daily Brief is a personalised morning digest built from your Gmail and Calendar.
Source: blog.google
What did Microsoft ship in May?
Microsoft pushed a significant M365 Copilot update on 28 May. The interface has been rebuilt around an adaptive, task-aware workspace, with the idea being that the layout adjusts to what you're doing, rather than you switching between Copilot and your apps.
From our perspective, the Copilot ecosystem layouts keep changing so quickly that it’s hard to keep track of what is where. Microsoft could afford to be a little more deliberate about the changes. We’re finding that clients get lost, sometimes thinking their favourite tools have disappeared when they’ve actually just moved to a different spot on the screen.
Copilot Studio also gained computer-using agents. This means automation that navigates software the way a person would, without needing API access. That's a big deal for organisations with legacy systems or apps that don't have clean integration options.
Microsoft also announced that it’s building a Copilot super app that unifies GitHub Copilot, Copilot Chat, and Cowork before the end of the year.
Source: microsoft.com
What's the pattern across all four?
Every major platform is moving toward AI that acts with autonomy across your entire workflow. Codex runs for hours. Gemini Spark takes actions across your digital life. Opus 4.8 orchestrates parallel subagents. Microsoft builds agents that navigate software like a person.
The chat window is still there, but it's increasingly a fallback. The primary interface is becoming AI woven into the workflow itself.
What should you do about it?
Three takeaways for the people we work with:
Take agents seriously. This month saw agents move closer to the workplace default. If your mental model of AI is still "ask a question, get a response," it's time to update. A good starting point is to pick one task in your workflow that runs on a predictable process and assess your options for handing it to an agent.
Reliability. Gemini 3.5 Flash is fast, but GPT-5.5 Instant's 52.5% hallucination reduction is perhaps more useful for the average organisation. When choosing tools for work where errors have grave consequences (legal, financial, medical, comms), the accuracy benchmark is particularly important.
Integration. Microsoft's Copilot super app. Google's Daily Brief. Gemini Spark. The direction is AI that has access to your full workflow and can act on that information.
What we're watching in June
- Claude Fable 5 - can it live up to the hype created by Mythos?
- Will Copilot keep shipping UI changes so quickly that we can’t keep track of them?
- When will we get our hands on those Google updates like Spark and Daily Brief?
- Will the new Siri (co-built with Google) finally dig Apple out of its AI hole?
If you'd like to talk about what any of this means for your team, that's the work we do. AITC runs hands-on training and advisory engagements for organisations that want their teams to use AI safely and effectively. Get in touch
