Some swear by OpenAI, while others feel better served by its competitor Anthropic. Yet two recent examples show that the boundaries between major AI providers are becoming increasingly permeable. Both Microsoft and developers from the open-source community are now applying workflows in which models from OpenAI and Anthropic are deployed together, achieving better results than either system could on its own.
Microsoft Copilot Researcher: Two Models, One Result
Microsoft has introduced two new features for its Copilot Researcher tool, operating under the names Critique and Council. Both combine OpenAI’s GPT with Anthropic’s Claude to handle complex research tasks. The underlying problem both approaches address is the same: when a single AI model plans a research task, evaluates sources, and writes a report, no one checks the result. Errors, inaccurate citations, or fabricated information can thus go unnoticed.
Critique: Generation and Review Separated
In Critique, GPT handles the first phase. It plans the research, evaluates sources, and produces an initial draft report. Claude then steps in as a rigorous reviewer, checking the draft for factual accuracy, source quality, and completeness of content. Only after this review is the final report delivered to the user. Microsoft describes the principle as follows:
“One model leads the generation phase, plans the task, iterates through the research, and creates an initial draft, while a second model focuses on review and refinement, acting as a subject-matter reviewer before the final report is produced.”
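Conceptually, Critique is a simple sequential pipeline. The sketch below illustrates the pattern with stub functions; the function names are illustrative and stand in for real API calls to the two models, not for anything in Microsoft's product:

```python
# A minimal sketch of the Critique pattern: one model drafts, a second reviews.

def generate_draft(task: str) -> str:
    # Phase 1 (GPT stand-in): plan the research, evaluate sources, write a draft.
    return f"Draft report on: {task}"

def review_draft(draft: str) -> str:
    # Phase 2 (Claude stand-in): check factual accuracy, sources, completeness.
    return draft + " [reviewed for accuracy and completeness]"

def critique_pipeline(task: str) -> str:
    # Sequential handoff: only the reviewed draft reaches the user.
    return review_draft(generate_draft(task))

print(critique_pipeline("quantum error correction"))
```

The essential design choice is that generation and review are separate stages owned by separate systems, so the reviewer has no stake in the draft it is checking.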
Council: Parallel Perspectives, One Verdict
Council takes a different approach. Here, GPT and Claude do not work sequentially but simultaneously on the same task. Both models independently produce complete reports. A third model, acting as a judge, reads both outputs and summarizes where the models agree, where they diverge, and which aspects only one of the two has captured.
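The Council pattern can be sketched the same way. The judge below compares reports by word overlap, which is only a toy stand-in for a real judge model comparing claims; all names here are illustrative:

```python
# A minimal sketch of the Council pattern: two models answer independently,
# a third summarizes agreement and divergence.

def judge(report_a: str, report_b: str) -> dict:
    # Toy comparison by word overlap; a real judge model would compare claims.
    words_a, words_b = set(report_a.split()), set(report_b.split())
    return {
        "agree": sorted(words_a & words_b),    # points both models made
        "only_a": sorted(words_a - words_b),   # captured only by model A
        "only_b": sorted(words_b - words_a),   # captured only by model B
    }

def council(task: str, model_a, model_b) -> dict:
    # Both models produce complete, independent reports on the same task...
    report_a = model_a(task)
    report_b = model_b(task)
    # ...and a third model reads both and renders a verdict.
    return judge(report_a, report_b)

verdict = council(
    "fusion energy",
    lambda t: f"{t} is promising but unproven",
    lambda t: f"{t} is promising and well funded",
)
print(verdict)
```

Unlike Critique, neither model ever sees the other's output; only the judge does.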
The key difference between the two features can be summarized as follows:
- Critique: The models work sequentially and complement each other. GPT generates, Claude reviews.
- Council: The models work in parallel and compete with each other. A third model evaluates the differences.
Critique is the default setting in the Researcher tool. Council must be manually activated via a model selector. Both features are currently only available to participants in Microsoft’s Frontier program and require a Microsoft 365 Copilot license.
Codex Plugin for Claude Code: A Second Opinion in Everyday Development
A similar principle can also be found in software development. OpenAI has released a plugin that integrates its Codex agent directly into Claude Code, Anthropic’s popular AI-powered developer tool. The goal is to give developers a genuine second assessment from a different AI system without having to leave their familiar workflow.
Three Core Features
The plugin provides three central commands:
- /codex:review performs a standard, read-only review by Codex.
- /codex:adversarial-review initiates a more critical review in which Codex actively challenges the implementation rather than merely inspecting it.
- /codex:rescue hands a task over entirely to Codex, for example when a conversation thread has stalled.
Longer tasks can run in the background and be managed via additional commands. An optional review gate can be activated to prevent Claude Code from ending a session before Codex has completed its review. However, it is noted that this feature should be used with care, as it can lead to lengthy loops between the two systems and quickly exhaust usage limits.
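The review-gate behavior, and the loop risk it carries, can be sketched as follows. The function names are hypothetical, not the plugin's actual API; the round cap shown here is one simple way to bound the back-and-forth the documentation warns about:

```python
# Sketch of an optional review gate: the session may not end until the
# reviewer passes it, capped to avoid runaway review loops.

def gated_session_end(run_review, apply_fixes, max_rounds: int = 3) -> str:
    for round_num in range(1, max_rounds + 1):
        verdict = run_review()          # reviewer (Codex stand-in) inspects
        if verdict == "pass":
            return f"session ended after {round_num} review round(s)"
        apply_fixes(verdict)            # author (Claude stand-in) addresses findings
    # Without a cap, two disagreeing systems could ping-pong indefinitely,
    # burning through usage limits.
    return "gate released after hitting the round cap"
```

Each failed round costs a full review plus a full fix pass on both systems, which is why the source advises using the gate with care.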
Technical Foundation
The plugin does not operate as a standalone runtime environment. It delegates tasks via the locally installed Codex command line, applying the same authentication, configuration, and environment already set up for Codex. Prerequisites are a ChatGPT subscription or an OpenAI API key, as well as Node.js version 18.18 or higher.
A Common Pattern Behind Both Approaches
Both examples follow the same fundamental idea: a single AI model has blind spots. Errors that arise during generation are often not recognized by the same system that produced them. By bringing in a second model from a different provider, a more independent review instance is created.
Whether in a professional research environment or in day-to-day software development, the combination of GPT and Claude appears to be establishing itself as a practical pattern, one in which the focus is not on the superiority of a single model but on the quality of the shared result.