The autonomous agentic IDE · Windows

Your AI engineering firm.

UltraCodey doesn't autocomplete your code — it staffs the project. An engineer who plans, a designer who builds, a principal who reviews and stamps every change, and employees who work shifts while you sleep. One super app, on your machine, with your accounts, under your rules. And yes — there's a pet.

Five apps. One window. A super app.

Window tabs switch the whole surface — each one is a complete product, not a sidebar panel. Code, a daily assistant, a model lab, an analyst desk and an AI staffing agency. Every screenshot below is the real app.

  • Code: the engineering firm
  • Chat: the casual assistant
  • Benchmark: the model lab
  • Research: standing research loops
  • Employee: hire persistent AI coworkers

Code — the engineering surface: the agent outlines your repo, edits across files, runs the tests, and reports — every step visible as it happens.

Run like a firm. Reviewed like a firm.

Most agents are one model talking to itself — and shipping its own unreviewed code. UltraCodey's Code surface is organized like an engineering firm: three roles, real separation of duties, and nothing ships unstamped.

01

The Engineer

Owns the task. Reads the repo, writes the plan, breaks the work into assignments and delegates — it never touches the code itself.

02

The Designer

Implements. Edits the files, runs the commands, and iterates until the build is green and the acceptance criteria actually pass.

03

The Principal

Reviews the diff with fresh eyes and a deterministic test signal, sends weak work back, and stamps what's done. No stamp, no done.

Away from your desk? The hands-off reviewer takes over: a deterministic risk table, first-match-wins rules, and constrained reviewer sub-agents for high-risk calls — with a flight-recorder audit log of every decision made while you were gone. You come back and read the tape.

The Code surface completing a task end to end

Code

Watch a task go from sentence to shipped.

In the session above, the agent was asked for a dark-mode toggle. It outlined the settings module, made the edits across three files, ran the suite — 214 passing — and flagged a contrast issue it noticed along the way. That's the whole product in one screenshot: it finishes, it verifies, and it thinks ahead.

  • Engineer → Designer → Principal: every change reviewed and stamped before it counts
  • Autonomous build → test → fix → verify loop with milestone ledgers
  • Checkpoints before every change; /revert undoes a whole run
  • Mid-run steering — type while it works and it folds your note in
Hiring an AI employee from the character cast

Employee

Stop prompting. Start hiring.

Every other tool waits for you to ask. UltraCodey lets you build a roster: name them, pick a face from the cast, hand each one a job description. They work their shifts on schedule — in their own chats, with the same engine and permissions as everything else.

  • Standing duties: nightly test runs, morning triage, weekly dependency patrol
  • One-off assignments queued in plain English
  • Shift reports inline — read them like teammates' standups
  • Each employee has a personality that colors how they work and report
The Benchmark surface

Benchmark

Stop guessing which model is best.

Everyone argues about models; UltraCodey measures them. Pick two or more of your connected models and run a real gauntlet — coding, reasoning, instruction-following, writing — judged with rubrics.

  • Time-to-first-token, total latency, tokens and cost per result
  • Head-to-head win-rate matrix and category radar
  • Value frontier: quality per dollar, including free local models
  • Export everything as CSV, JSON or markdown
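To make the win-rate matrix and value frontier concrete, here is a minimal sketch. It is not UltraCodey's scoring code: the model names, scores and costs are made-up illustration data, and reading "value frontier" as a Pareto frontier over quality and cost is an assumption (it has the convenient property that a free local model never causes a divide-by-zero).

```python
from itertools import combinations

# Hypothetical rubric scores (0-10) per model, one entry per task. Not real results.
scores = {
    "model_a": [8.0, 6.0, 9.0],
    "model_b": [7.0, 7.0, 5.0],
    "local_c": [6.0, 5.0, 6.0],
}
# Hypothetical total run cost in dollars; a local model costs nothing.
cost_usd = {"model_a": 0.30, "model_b": 0.40, "local_c": 0.0}


def win_rate_matrix(scores):
    """Return {(a, b): fraction of tasks where a scored strictly higher than b}."""
    matrix = {}
    for a, b in combinations(scores, 2):
        pairs = list(zip(scores[a], scores[b]))
        matrix[(a, b)] = sum(1 for x, y in pairs if x > y) / len(pairs)
        matrix[(b, a)] = sum(1 for x, y in pairs if y > x) / len(pairs)
    return matrix


def value_frontier(scores, cost_usd):
    """Return models that no other model beats on both mean quality and cost."""
    mean = {m: sum(s) / len(s) for m, s in scores.items()}
    frontier = []
    for m in scores:
        dominated = any(
            mean[o] >= mean[m]
            and cost_usd[o] <= cost_usd[m]
            and (mean[o] > mean[m] or cost_usd[o] < cost_usd[m])
            for o in scores
            if o != m
        )
        if not dominated:
            frontier.append(m)
    return frontier


matrix = win_rate_matrix(scores)
frontier = value_frontier(scores, cost_usd)
```

With this sample data, model_b drops off the frontier because model_a is both better and cheaper, while the free local model stays on it despite the lowest score.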
The Research surface with loop templates

Research

Your own analyst desk, always on.

Research loops watch the web for you on a schedule — markets, your stack's ecosystem, any topic, person or product — and synthesize trend-aware reports into a feed you skim over coffee.

  • Sources are URLs or search queries; intervals from hourly to weekly
  • One-click templates: market watch, tech trends, topic monitor
  • Reports with highlights first, full markdown bodies underneath
  • Desktop notification when something genuinely moved
The Chat surface answering a travel question

Chat

And a great everyday assistant, too.

Not everything is code. Chat is a clean, fast general assistant with web access and a memory of your preferences, kept behind a hard wall from your engineering sessions so a casual question never lands in a repo.

  • Separate conversation list, separate system behavior
  • Web search and fetch built in
  • Remembers your preferences across every conversation

Plugged into everything you already use.

70 verified MCP integrations with one-click install and browser sign-in, plus 124 built-in skill playbooks the agent reaches for on its own.

The plugin gallery: GitHub, Slack, Notion, Linear, Figma, PostgreSQL and more
Plugins — GitHub, Slack, Notion, Linear, Figma, Playwright, PostgreSQL, Sentry, Cloudflare and sixty more. Connect with your browser; credentials go to the OS keychain.
The built-in skills catalog
Skills — 124 playbooks across web, mobile, devops, data, security and more. The agent picks the right one per task — and writes new ones from its own wins.

The vibe layer

The first IDE with a heartbeat.

Serious firm, playful soul. Codey is your desk pet — a tiny always-on-top companion who watches runs with you, celebrates green tests, dozes off when things are quiet, and answers when you talk to it. Boop it. It likes that.

  • 36 characters to unlock — each with its own look, voice and personality
  • Disney-style squash-and-stretch animation with a full idle life cycle
  • Voice replies with adjustable rate and pitch
  • Bonding levels — it remembers how long you've worked together
  • Strictly optional: one toggle and it's a serious IDE again
Codey, the floating desk pet, with its own chat composer
Codey Studio — pick a character, mood, voice and colors
Codey Studio — pick your companion, customize colors and voice, or earn the rarer cast members by leveling up.
The theme gallery with tiered collections and Ultra Mode

Make it yours

275 themes. 52 of them are alive.

Built for vibe coders: a generative theme platform, not a light/dark switch. Tiered collections from Arctic to Autumn, live-motion themes with animated backdrops, and Ultra Mode — the aurora flagship. Higher tiers unlock as you level.

  • Core, Classic and seasonal collections, each tuned as a set
  • 52 live-motion themes that animate without stealing focus
  • Solarized, Nord, Dracula and high-contrast for the traditionalists
The achievements journey with badge tiers

Leveling

Shipping code should feel like this.

Every message, finished run and daily streak earns XP. Levels unlock themes and characters; badge tiers stack on your profile; over a thousand achievements track the journey from First Steps to Century Club — across conversation, shipping, goals and devotion.

  • 1,100+ achievements with a profile showcase and earned titles
  • Badge tiers, daily streaks, and a lifetime "together" clock
  • Unlocks are cosmetic only — capability is never paywalled behind play

The engine under all of it.

Everything below ships in the app today. No waitlists inside the product, no cloud lock-in, no telemetry.

A real autonomous loop

Milestone ledgers with machine-checkable acceptance, forced verification turns, deterministic green/fix classification, and re-planning when a fix keeps failing. Walk away; it builds, tests, fixes, and verifies until done.
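The shape of that loop can be sketched in a few lines. This is an illustration under stated assumptions, not the shipped engine: `Milestone`, `run_ledger`, the `work` and `replan` callbacks, and the retry limits are all hypothetical names and values.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Milestone:
    name: str
    accept: Callable[[], bool]  # machine-checkable acceptance check
    status: str = "pending"     # pending -> done | blocked


def run_ledger(milestones, work, replan, max_fixes=2, max_replans=1):
    """Drive each milestone through build, verify and fix, replanning on repeated failure."""
    log = []
    for m in milestones:
        fixes = replans = 0
        while m.status == "pending":
            work(m)                       # hypothetical: the agent edits and builds
            if m.accept():                # verification turn with a deterministic result
                m.status = "done"
                log.append((m.name, "green"))
            elif fixes < max_fixes:
                fixes += 1
                log.append((m.name, "fix"))
            elif replans < max_replans:
                replans += 1
                fixes = 0                 # a fresh plan gets a fresh fix budget
                replan(m)
                log.append((m.name, "replan"))
            else:
                m.status = "blocked"      # stop instead of looping forever
                log.append((m.name, "blocked"))
    return log


# Demo with a stand-in build that only goes green on the fourth attempt.
state = {"attempts": 0}


def fake_work(milestone):
    state["attempts"] += 1


def fake_replan(milestone):
    pass


ledger = [Milestone("dark-mode toggle", accept=lambda: state["attempts"] >= 4)]
log = run_ledger(ledger, fake_work, fake_replan)
events = [event for _, event in log]
```

The point of the sketch is the classification: every turn ends in exactly one of green, fix, replan or blocked, decided by the acceptance check and counters, not by the model's own opinion of its work.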

Every provider. Your plan or your key.

Anthropic, OpenAI, Gemini, OpenRouter, xAI, DeepSeek, Mistral, Groq, Ollama and a dozen more — or any OpenAI-compatible endpoint. Sign in with Claude Pro/Max or ChatGPT the same way the official tools do, or paste a key. Keys live in the Windows keychain, never in plaintext.

Agent swarms & best-of-N

Fan work out to parallel sub-agents, round-robin across every connected provider, and race N attempts in isolated git worktrees — a judge merges only the attempt that builds green.
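As a rough sketch of the two ideas in that paragraph, round-robin assignment and a judge that only accepts green builds, consider the following. The `Attempt` record, the function names and the tie-break on passing tests are assumptions for illustration; in a real run each attempt would live in its own worktree (for example one created with `git worktree add`).

```python
from dataclasses import dataclass
from itertools import cycle, islice
from typing import List, Optional, Tuple


@dataclass
class Attempt:
    worktree: str       # path of the isolated worktree for this attempt
    provider: str
    build_green: bool
    tests_passed: int


def assign_round_robin(providers: List[str], n: int) -> List[Tuple[str, str]]:
    """Spread n attempts across providers in rotation."""
    return [(f"attempt-{i}", p) for i, p in enumerate(islice(cycle(providers), n))]


def judge(attempts: List[Attempt]) -> Optional[Attempt]:
    """Pick the green attempt with the most passing tests, or None if nothing is green."""
    green = [a for a in attempts if a.build_green]
    if not green:
        return None
    return max(green, key=lambda a: a.tests_passed)


plan = assign_round_robin(["anthropic", "openai"], 3)
winner = judge([
    Attempt("wt/attempt-0", "anthropic", build_green=False, tests_passed=220),
    Attempt("wt/attempt-1", "openai", build_green=True, tests_passed=214),
    Attempt("wt/attempt-2", "anthropic", build_green=True, tests_passed=210),
])
```

Note that the red attempt loses even though it reports the most passing tests: a build that is not green is never a merge candidate.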

A brain that actually persists

Reinforced SQLite memory scored by recency, importance, uses and reward; reflection after every run; skills it authors for itself; project profiles and file-location memory. Every new chat starts already knowing your codebase and your preferences.

Orchestrator and workers

Use multiple subscriptions at once: a manager model orchestrates while worker models execute — or set both to Auto and let the best connected model take the lead per task.

Full coding power tools

CodeMirror editor, real PTY terminal that survives tab switches, red/green diffs with a code reviewer, project-wide search, checkpoints with one-command revert, @-file mentions, live todo tracking, and a built-in browser for your dev server.

Computer use

Pixel-accurate multi-monitor mouse and keyboard control, screenshots, window management and clipboard — benchmarked to ≤1px cursor accuracy across the full virtual desktop, with a visual-verify gate.

Hands-off safety that isn't a nanny

Full Access when you want it; an auto-reviewer when you're away — a deterministic risk table, first-match rules, a constrained reviewer agent for high-risk calls, and a flight-recorder audit log of every decision.
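A first-match rule table is simple enough to sketch. The rule patterns, risk labels and call-string format below are invented for illustration and are not UltraCodey's actual table; the sketch only shows why rule order matters and what one audit record could contain.

```python
import fnmatch
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Rule:
    pattern: str  # glob matched against a tool call string (hypothetical format)
    risk: str     # "low" | "medium" | "high"
    action: str   # "allow" | "deny" | "escalate"


# Order matters: the first matching rule wins, so specific rules go first.
RULES = [
    Rule("shell:git push --force*", "high", "escalate"),
    Rule("shell:git *", "low", "allow"),
    Rule("file:write:*.env", "high", "deny"),
    Rule("*", "medium", "escalate"),  # catch-all default
]

audit_log = []  # stand-in for the flight recorder: one JSON line per decision


def decide(call, rules=RULES):
    """Return the decision of the first rule whose pattern matches the call."""
    for index, rule in enumerate(rules):
        if fnmatch.fnmatchcase(call, rule.pattern):
            return {"call": call, "rule": index, **asdict(rule)}
    raise LookupError("no rule matched; the table needs a catch-all")


def review(call):
    """Decide, record the decision, and return the action to take."""
    decision = decide(call)
    audit_log.append(json.dumps(decision, sort_keys=True))
    return decision["action"]
```

A force push matches both of the first two rules, and the result is "escalate" because the more specific rule comes first. In this sketch, "escalate" is where a constrained reviewer agent would be consulted.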

Automations and goals

Schedule any prompt on any interval, or set a /goal and the agent keeps working until the goal is verifiably complete — with cost dashboards, budgets, and run replay to audit what happened.

Hardened like a product, not a demo

Control Flow Guard, CET shadow stacks, DPAPI-encrypted secrets, SSRF and DNS-rebinding protection, debugger tamper guard, process containment via Windows Job Objects. Local-first: your code never transits anything but the model API you chose.
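For readers unfamiliar with DNS-rebinding pinning, here is a minimal sketch of the idea: resolve the hostname once, validate every address, and then connect only to those validated addresses so a second lookup cannot swap in an internal IP. The function names and the injectable `resolve` parameter are assumptions for illustration, not UltraCodey's implementation.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public(ip: str) -> bool:
    """True only for globally routable addresses (no loopback, LAN or link-local)."""
    addr = ipaddress.ip_address(ip)
    # Unwrap IPv4-mapped IPv6 such as ::ffff:127.0.0.1 before checking.
    if addr.version == 6 and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped
    return addr.is_global


def system_resolve(host: str) -> list:
    """Resolve a hostname to its unique IP addresses using the OS resolver."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def pin_addresses(url: str, resolve=system_resolve) -> list:
    """Resolve once and return the validated IPs the caller must connect to."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError("URL has no host")
    ips = resolve(host)
    if not ips or not all(is_public(ip) for ip in ips):
        raise PermissionError(f"{host} resolves to a non-public address")
    # The caller connects to these IPs directly and never re-resolves the name.
    return ips
```

The design choice that matters is the return value: handing back IPs instead of a yes/no answer forces the HTTP layer to use the addresses that were actually checked.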

Offline mode

Run fully local with Ollama — curated model presets by VRAM tier, pull progress in-app, and the same agent loop pointed at your own hardware.

Import everything. Leave nothing.

One click pulls your memory, skills and MCP servers from Claude Code, Codex, Gemini CLI, Cursor and Windsurf. It even speaks ACP (Agent Client Protocol) in both directions: drive it from Zed, or host external agents inside it.

It gets better every time you use it.

After every run UltraCodey reflects: what worked, what failed, what the reviewer rejected, what you reverted. Useful lessons are reinforced; bad ones decay; recurring playbooks become skills it writes for itself. A self-train loop consolidates everything in the background — locally, in SQLite you can open and read.

  • Reinforced memory: recency × importance × uses × reward
  • Voyager-style skills: it authors and curates its own playbooks
  • Cross-chat carryover: profiles, locations, episodes, preferences
  • Reviewable evolution: prompt changes are human-gated and reversible
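One plausible reading of the recency × importance × uses × reward product is sketched below. Only the four factor names come from the text above; the exponential half-life, the logarithmic use factor and the reward range are assumptions chosen to make the example concrete.

```python
import math


def memory_score(age_days, importance, uses, reward, half_life_days=30.0):
    """Score a memory as recency * importance * uses * reward (assumed shapes)."""
    recency = 0.5 ** (age_days / half_life_days)  # halves every half_life_days
    use_factor = 1.0 + math.log1p(uses)           # diminishing returns on reuse
    reward_factor = max(0.0, 1.0 + reward)        # reward assumed to lie in [-1, 1]
    return recency * importance * use_factor * reward_factor


fresh = memory_score(age_days=0, importance=1.0, uses=0, reward=0.0)
month_old = memory_score(age_days=30, importance=1.0, uses=0, reward=0.0)
reinforced = memory_score(age_days=30, importance=1.0, uses=5, reward=0.5)
rejected = memory_score(age_days=30, importance=1.0, uses=5, reward=-1.0)
```

Under these assumptions an unused lesson loses half its weight in a month, a lesson that keeps paying off outranks a fresh one, and a lesson the reviewer rejected outright decays to zero.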

Secured like it matters. Because it does.

An agent with shell access deserves bank-grade paranoia. UltraCodey is hardened at every layer — binary, secrets, network, process — and it keeps receipts.

  • Exploit-hardened binary: Control Flow Guard and CET shadow stacks compiled into every shipped build, plus a debugger tamper guard to deter cracking.
  • Secrets stay secret: Keys live in the Windows keychain; the fallback file is DPAPI-encrypted to your user account. Errors and logs are scrubbed of anything secret-shaped.
  • Network gates: SSRF protection with DNS-rebinding pinning. Web requests can only reach the addresses that passed validation, never your router, never cloud metadata.
  • Command firewall: Destructive commands are caught even when encoded, chained, flag-reordered or hidden behind PowerShell tricks, then denied or escalated by rule.
  • Process containment: Spawned commands run inside Windows Job Objects, so a runaway process tree dies with its run instead of outliving it.
  • The flight recorder: Every autonomous permission decision lands in an audit log you can replay: who asked, what risk, what rule fired, what happened.
  • Local-first: Your code and your database never transit anything except the model API you chose. There is no UltraCodey cloud in the middle.
  • Zero telemetry: We can't see your usage. It is not "anonymized"; it is absent. The only data we get is the email you send asking for a beta seat.
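To show what catching a destructive command through chaining, flag reordering and encoding involves, here is a deliberately small sketch that recognizes one pattern only: a recursive, forced `rm`. It is an illustration, not the product's firewall. The function names are hypothetical, and the only external fact relied on is that PowerShell's `-EncodedCommand` takes Base64 of UTF-16LE text.

```python
import base64
import re
import shlex

CHAIN = re.compile(r"&&|\|\||[;|]")  # split on &&, ||, ; and |
POWERSHELL = {"powershell", "powershell.exe", "pwsh", "pwsh.exe"}


def _rm_recursive_force(args):
    """True if rm flags include both recursive and force, in any order or grouping."""
    short, long_flags = set(), set()
    for arg in args:
        if arg.startswith("--"):
            long_flags.add(arg)
        elif arg.startswith("-"):
            short.update(arg[1:])  # "-rf", "-fr" and "-r -f" all land here
    recursive = bool(short & {"r", "R"}) or "--recursive" in long_flags
    force = "f" in short or "--force" in long_flags
    return recursive and force


def is_destructive(command: str) -> bool:
    """Check every chained part, decoding PowerShell -EncodedCommand payloads."""
    for part in CHAIN.split(command):
        try:
            tokens = shlex.split(part)
            if not tokens:
                continue
            program = tokens[0].lower()
            if program in POWERSHELL:
                for flag, value in zip(tokens[1:], tokens[2:]):
                    # PowerShell accepts prefixes such as -enc for -EncodedCommand.
                    if len(flag) > 1 and "-encodedcommand".startswith(flag.lower()):
                        decoded = base64.b64decode(value).decode("utf-16-le")
                        if is_destructive(decoded):
                            return True
            elif program == "rm" and _rm_recursive_force(tokens[1:]):
                return True
        except ValueError:
            return True  # fail closed on anything that cannot be parsed or decoded
    return False


encoded = base64.b64encode("rm -fr C:/tmp".encode("utf-16-le")).decode("ascii")
```

The sketch fails closed: input it cannot tokenize or decode is treated as destructive, which errs toward escalation at the cost of occasional false positives (for example a quoted string containing a semicolon).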

Why not just use the others?

We did. Then we built the app we actually wanted. Fair comparison, current as of June 2026.

| Capability | UltraCodey | Claude Code | Cursor | Codex |
| --- | --- | --- | --- | --- |
| Native desktop app, local-first | Yes | Terminal | Editor fork | Terminal / cloud |
| Use ANY provider, or all at once | 20+, mixed per role | Anthropic | Several, theirs | OpenAI |
| Sign in with existing paid plans | Claude + ChatGPT | Claude | No | ChatGPT |
| Persistent self-learning memory | Yes, reinforced | Files | Limited | Limited |
| Hire persistent AI employees | Yes | No | No | No |
| Built-in model benchmark lab | Yes | No | No | No |
| Standing web research loops | Yes | No | No | No |
| Computer use (real desktop control) | Yes, ≤1px | No | No | No |
| Walk-away autonomy with audit log | Yes | Partial | Partial | Partial |
| A pet that celebrates your green tests | Obviously | No | No | No |
| Price during beta | Free | Plan | $20+/mo | Plan |

Honest caveat: those are excellent tools — UltraCodey can even import their configs and host them over ACP. The difference is scope: they assist a programmer; UltraCodey staffs a project.

Join the Windows beta.

UltraCodey is in private beta. Seats open in waves — ask for one and we'll send the signed installer and a quick-start.

Windows 10/11 · 64-bit · bring your own model accounts · no telemetry