Today Sakana AI launched Sakana Fugu, a multi-agent orchestration system that behaves like one model. You send a request to a single endpoint, and Fugu decides how to handle it internally: it solves the task directly when that is sufficient, and it assembles and coordinates a team of expert models when needed. The complexity of the multi-agent system never reaches your code.
TL;DR
- Fugu puts a multi-agent system behind an OpenAI-compatible API.
- Fugu Ultra leads most of the published coding and reasoning benchmarks.
- The orchestrator outperforms the individual models it coordinates.
- Agent opt-outs and provider routing address compliance needs and single-vendor risk.
- Routing is proprietary, so per-query model selection remains hidden.
What is Sakana Fugu?
Fugu itself is a language model, trained to call other LLMs in its agent pool. That pool also contains instances of Fugu itself, which makes the orchestration recursive. Fugu internally handles model selection, delegation, validation, and synthesis.
Instead of relying on hard-coded roles or workflows, Fugu learns to coordinate. It decides when to delegate and how agents should communicate, then combines their work into one answer. From the outside, you call a single model; inside, a coordinated team of experts does the work.
Sakana AI designed Fugu as a defense against single-vendor dependency: if a provider restricts access, Fugu can absorb the disruption. The research team cites Anthropic's recent export controls on its Fable and Mythos models as motivation. Over time, new models may be added to the pool.
Fugu and Fugu Ultra: two models, one API
Fugu is available in two variants, both behind an OpenAI-compatible API:
- Fugu: Balances strong performance with low latency. It is the default for everyday coding, code reviews, and chatbots, and it also fits into tools like Codex. You can exclude specific agents from its pool, which helps teams meet data, privacy, and compliance requirements.
- Fugu Ultra: Designed for maximum answer quality on difficult, multi-step problems. It coordinates a deeper team of expert agents. Its pool is fixed, so opt-outs are not available. The current model ID is fugu-ultra-20260615.
The research behind the orchestrator
Fugu builds on the learned orchestration introduced in two ICLR 2026 papers, Trinity and Conductor.
- Trinity: Uses a lightweight coordinator over multiple turns. It assigns the roles of thinker, worker, or verifier to delegate the work optimally.
- Conductor: Trained with reinforcement learning. It explores natural-language coordination strategies and focused signals for diverse LLM pools.
Together, they show that systems can learn to assemble and route agents for each task, replacing hand-designed workflows.
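Neither paper's code is reproduced here. As a rough illustration of the thinker/worker/verifier idea described for Trinity, the toy sketch below runs a multi-turn loop in which a coordinator picks one role per turn. Everything in it is hypothetical: the agents are stubs rather than LLM calls, and `choose_role` is a hand-written policy standing in for the trained coordinator the papers describe.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class State:
    """Shared scratchpad the coordinator and agents read and write."""
    plan: Optional[list] = None
    answer: Optional[int] = None
    attempts: int = 0
    verified: bool = False
    trace: list = field(default_factory=list)


# Hypothetical stub agents; in a real system each would be an LLM call.
def thinker(task: str, state: State) -> None:
    # Turn "2+3+4" into a plan: the list of numbers to add.
    state.plan = [int(tok) for tok in task.split("+")]


def worker(task: str, state: State) -> None:
    # Execute the plan. The first attempt is deliberately off by one so the
    # verifier has something to reject.
    total = sum(state.plan)
    state.answer = total if state.attempts > 0 else total + 1
    state.attempts += 1


def verifier(task: str, state: State) -> None:
    # Re-derive the result independently and reject mismatches.
    state.verified = state.answer == sum(int(tok) for tok in task.split("+"))
    if not state.verified:
        state.answer = None  # send the work back to the worker


ROLES = {"thinker": thinker, "worker": worker, "verifier": verifier}


def choose_role(state: State) -> str:
    # Hand-written stand-in for a *learned* coordinator policy.
    if state.plan is None:
        return "thinker"
    if state.answer is None:
        return "worker"
    return "verifier"


def coordinate(task: str, max_turns: int = 8) -> State:
    state = State()
    for _ in range(max_turns):
        role = choose_role(state)
        state.trace.append(role)
        ROLES[role](task, state)
        if state.verified:
            break
    return state


result = coordinate("2+3+4")
print(result.answer, result.trace)
```

Running it shows the verifier rejecting the first answer and the coordinator routing the task back to the worker before accepting. The point of the research is that this routing policy is learned rather than written by hand as it is here.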
Benchmarks
Sakana AI compares Fugu with the foundation models it orchestrates. Baselines use provider-reported scores. SWE-Bench Pro (*) uses mini-SWE-agent as the scaffold.
| Benchmark | Fugu | Fugu Ultra | Opus 4.8 | Gemini 3.1 Pro | GPT 5.5 |
|---|---|---|---|---|---|
| SWE-Bench Pro* | 59.0 | 73.7 | 69.2 | 54.2 | 58.6 |
| TerminalBench 2.1 | 80.2 | 82.1 | 74.6 | 70.3 | 78.2 |
| LiveCodeBench | 92.9 | 93.2 | 87.8 | 88.5 | 85.3 |
| LiveCodeBench Pro | 87.8 | 90.8 | 84.8 | 82.9 | 88.4 |
| Humanity's Last Exam | 47.2 | 50.0 | 49.8 | 44.4 | 41.4 |
| CharXiv Reasoning | 85.1 | 86.6 | 84.2 | 83.3 | 84.1 |
| GPQA-D | 95.5 | 95.5 | 92.0 | 94.3 | 93.6 |
| SciCode | 60.1 | 58.7 | 53.5 | 58.9 | 56.1 |
| τ³ Banking | 21.7 | 20.6 | 20.6 | 8.4 | 20.6 |
| Long-context reasoning | 74.7 | 73.3 | 67.7 | 72.7 | 74.3 |
| MRCR v2 | 86.6 | 93.6 | 87.9 | 84.9 | 94.8 |
The orchestrator posts the top score on 10 of the 11 rows. Fugu Ultra tops the four coding benchmarks, CharXiv Reasoning, and Humanity's Last Exam, and it ties regular Fugu on GPQA-D. Regular Fugu leads on SciCode, τ³ Banking, and long-context reasoning. GPT 5.5 wins MRCR v2, the only baseline win here.
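The "10 of 11" tally can be checked mechanically. This short script re-enters the scores from the benchmark table and lists the top model(s) for each row; a tie counts for every model that shares the top score, which is how GPQA-D is credited to both Fugu variants.

```python
MODELS = ["Fugu", "Fugu Ultra", "Opus 4.8", "Gemini 3.1 Pro", "GPT 5.5"]

# Scores re-entered from the benchmark table (higher is better).
SCORES = {
    "SWE-Bench Pro": [59.0, 73.7, 69.2, 54.2, 58.6],
    "TerminalBench 2.1": [80.2, 82.1, 74.6, 70.3, 78.2],
    "LiveCodeBench": [92.9, 93.2, 87.8, 88.5, 85.3],
    "LiveCodeBench Pro": [87.8, 90.8, 84.8, 82.9, 88.4],
    "Humanity's Last Exam": [47.2, 50.0, 49.8, 44.4, 41.4],
    "CharXiv Reasoning": [85.1, 86.6, 84.2, 83.3, 84.1],
    "GPQA-D": [95.5, 95.5, 92.0, 94.3, 93.6],
    "SciCode": [60.1, 58.7, 53.5, 58.9, 56.1],
    "τ³ Banking": [21.7, 20.6, 20.6, 8.4, 20.6],
    "Long-context reasoning": [74.7, 73.3, 67.7, 72.7, 74.3],
    "MRCR v2": [86.6, 93.6, 87.9, 84.9, 94.8],
}


def winners(row):
    """Return every model whose score equals the row maximum (ties included)."""
    best = max(row)
    return [model for model, score in zip(MODELS, row) if score == best]


fugu_rows = 0
for bench, row in SCORES.items():
    top = winners(row)
    if any(model.startswith("Fugu") for model in top):
        fugu_rows += 1
    print(f"{bench}: {', '.join(top)}")

print(f"Fugu or Fugu Ultra on top: {fugu_rows} of {len(SCORES)}")
```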
Sakana AI says its Fugu models stand shoulder to shoulder with Anthropic's Fable 5 and Mythos Preview. Neither of those models is in Fugu's pool, because neither is publicly accessible.
Use cases
Sakana AI ran a beta with about 500 early users. The published examples focus on long, multi-step tasks.
- Automated research: An agent autonomously improved the training recipe of a small GPT model. It ran 123 experiments in about 14 hours on an H100 GPU. Fugu Ultra reached a best average validation BPB of 0.9774, with a best single run of 0.9748.
- Rubik's cube solver: Each model wrote a pure-Python solver, with no libraries allowed. Fugu Ultra solved all 300 cubes in a frozen test set with an average of 19.72 moves. One baseline came close at 19.76 moves. Two others crashed and solved none.
- Classical Japanese kana reading order: On 1,610 characters, Fugu Ultra scored a NED of 0.80. The closest baseline reached only 0.24.
- Blindfold chess: Fugu played four games from memory, without being shown the board. It defeated three frontier models and the Stockfish engine at 2100 Elo.
- Online trading: Over a 50-week window, Fugu Ultra returned an average of +19.43% across five runs. The other frontier models stayed below +15%. Sakana AI notes that past performance does not guarantee future results.
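The automated-research item reports results in BPB without defining it. The term conventionally means bits per byte: cross-entropy on held-out text, converted from nats to bits and divided by the number of UTF-8 bytes rather than tokens, which keeps runs with different tokenizers comparable. The sketch below shows that conversion, assuming the loss is summed in nats over the whole validation set; we have not confirmed that the experiment computed it exactly this way, and the function name is our own.

```python
import math


def bits_per_byte(total_loss_nats: float, total_bytes: int) -> float:
    """Convert a summed cross-entropy loss (in nats) into bits per byte.

    total_loss_nats: sum of per-token negative log-likelihoods (natural log)
                     over the whole validation set.
    total_bytes:     number of UTF-8 bytes in that same validation text.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    total_bits = total_loss_nats / math.log(2)  # nats -> bits
    return total_bits / total_bytes


# Illustrative numbers only (not from Sakana's run): 1,000 tokens with a mean
# loss of 3.0 nats, covering 4,000 bytes of validation text.
num_tokens = 1000
mean_token_loss = 3.0
text_bytes = 4000
print(bits_per_byte(mean_token_loss * num_tokens, text_bytes))
```

Lower is better: a model that needs fewer bits to encode each byte of unseen text predicts it better.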
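The kana reading-order item reports a NED score without defining it. NED usually stands for normalized edit distance: the Levenshtein distance between the predicted and reference sequences divided by the longer length, so 0 means identical. Because the article treats 0.80 as better than 0.24, the reported number is presumably a similarity such as 1 − NED; that is our assumption, not something the source states, so the sketch computes both. The toy strings are hypothetical.

```python
def levenshtein(a, b) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def ned(pred, ref) -> float:
    """Normalized edit distance in [0, 1]; 0 means the sequences are identical."""
    longest = max(len(pred), len(ref))
    return levenshtein(pred, ref) / longest if longest else 0.0


def ned_similarity(pred, ref) -> float:
    """1 - NED, so higher is better (assumed form of the reported score)."""
    return 1.0 - ned(pred, ref)


# Hypothetical reading orders: the last two characters are swapped.
reference = "あいうえお"
predicted = "あいうおえ"
print(ned(predicted, reference), ned_similarity(predicted, reference))
```

Both functions accept strings or lists, so a reading order can be scored either as a character string or as a list of glyph IDs.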
A minimal API example
Fugu exposes an OpenAI-compatible API, so no SDK migration is required. Point an existing client at the endpoint provided in your console.
```python
from openai import OpenAI

# Endpoint and key come from your Sakana console (console.sakana.ai).
client = OpenAI(
    base_url="https://<your-fugu-endpoint>/v1",  # from console.sakana.ai
    api_key="YOUR_SAKANA_API_KEY",
)

# Each message is a dict with "role" and "content" keys.
resp = client.chat.completions.create(
    model="fugu-ultra-20260615",  # or "fugu"
    messages=[
        {
            "role": "user",
            "content": "Reproduce the method in this paper and report the gap.",
        },
    ],
)

print(resp.choices[0].message.content)
```
Token usage and cost are reported per request, so you can monitor spending in real time.
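Because usage comes back with every response, spend can also be tallied on the client side. The sketch below accumulates the `prompt_tokens` and `completion_tokens` fields of the OpenAI SDK's `resp.usage` object and turns them into an estimate. The prices are hypothetical placeholders, and the article does not say which response field carries Sakana's own cost figure, so the sketch does not try to read one. `UsageTracker` is our own illustrative class.

```python
from dataclasses import dataclass
from types import SimpleNamespace

# Hypothetical prices in USD per million tokens. These are NOT Fugu's real
# rates; replace them with the pricing shown in your Sakana console.
PRICE_IN_PER_M = 1.0
PRICE_OUT_PER_M = 4.0


@dataclass
class UsageTracker:
    prompt_tokens: int = 0
    completion_tokens: int = 0

    def add(self, usage) -> None:
        # `usage` is the `resp.usage` object returned by the OpenAI SDK.
        self.prompt_tokens += usage.prompt_tokens
        self.completion_tokens += usage.completion_tokens

    @property
    def estimated_cost(self) -> float:
        # Local estimate only; prefer the provider-reported cost when available.
        return (self.prompt_tokens * PRICE_IN_PER_M
                + self.completion_tokens * PRICE_OUT_PER_M) / 1_000_000


tracker = UsageTracker()
# In practice, call tracker.add(resp.usage) after each request. Here two
# stand-in usage objects simulate two responses.
tracker.add(SimpleNamespace(prompt_tokens=1200, completion_tokens=300))
tracker.add(SimpleNamespace(prompt_tokens=800, completion_tokens=700))
print(tracker.prompt_tokens, tracker.completion_tokens, tracker.estimated_cost)
```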
Sakana Fugu – Early Community Sentiment
A manual review of public reaction on X and Hacker News, with links to each source. Captured on June 22, 2026.
12 posts reviewed, grouped as supportive, mixed, or sceptical.
Initial reaction is sceptical. The question "Is it just a router or a wrapper?" dominates, and the most visible supportive voices are affiliated with Sakana.
Method: Sentiment was hand-assigned from a small sample of public posts on June 22, 2026. This is not a statistical survey, and the breakdown may change as more responses come in. Two of the three supportive posts come from Sakana AI or its CEO. Quotes have been shortened; follow each link for the full context. The Reddit quote is as reported by VentureBeat.
Sources: X, Hacker News, VentureBeat