Daily brief

May 29th, 2026

~6 min · 7 items surfaced


1 What to Know Today

Tier 1 — Callstack ships Apex, a React Native-specialised coding model (Fillarup)

Callstack quietly announced Apex, a fast coding model trained specifically on React Native architecture decisions and framework-specific fixes. Verdict: research preview — private beta only, no public benchmarks beyond Callstack’s own claims that it trails frontier models on generic coding but beats them on RN perf-to-cost. Fillarup is React Native/Expo and you spend real Claude tokens on RN-specific issues every session. Action this week: request beta access via callstack.com/apex, run a Fillarup-scoped eval against Sonnet 4.6 on three real tickets, decide if it slots in as a per-feature accelerator or stays parked.

Tier 1 — OpenAI Secure MCP Tunnel ships for private MCP servers (Always-On Reeve, Ben)

OpenAI shipped Secure MCP Tunnel: a tunnel-client that connects private MCP servers to OpenAI products via outbound HTTPS, no inbound exposure, no public DNS. Verdict: verified shipped (live in OpenAI’s developer docs). This is the enterprise pattern Roy will need when Ben and the Always-On Reeve listener start talking to Xero, PaperClip, and finance@ over HTTP without you punching holes in the home network. OpenAI is the first lab to ship this — Anthropic will follow but you can validate the architecture pattern now. Action: read the tunnel docs once (15 min), check if PaperClip’s heartbeat protocol could use the same outbound-only model, log the pattern in reference_anthropic-plugin-marketplace.md for when Claude ships parity.

Tier 1 — Microsoft Copilot Studio computer-use agents hit GA (MACA, Always-On Reeve)

Microsoft GA’d computer-use agents inside Copilot Studio on May 13 — first major platform to ship production-scale computer use. Agents click and fill GUIs in legacy software with no APIs, model choice between Claude Sonnet 4.5 and OpenAI CUA, Azure Key Vault for credentials, Purview audit logs, Windows 365 Cloud PC pools, ~$0.04 per step. Verdict: verified shipped, GA. Anthropic and Google are still in preview. This matters for MACA (Meta Ads Manager has GUI-only features the API misses) and for Always-On Reeve (overnight tasks against GUI-only surfaces). Action: spin up a single Copilot Studio computer-use agent against a low-stakes Meta Ads task this weekend, measure cost-per-step against Claude direct, decide whether to wait for Anthropic GA or adopt now.
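For the cost-per-step comparison, a minimal sketch of the arithmetic. The ~$0.04 figure is the one quoted above; the function names are hypothetical, and the direct-Claude number is whatever you measure from your own run (total spend divided by steps taken).

```python
# Approximate Copilot Studio price per step, as quoted in the brief (USD).
COPILOT_STUDIO_COST_PER_STEP = 0.04


def task_cost(steps: int, cost_per_step: float) -> float:
    """Total USD cost of a task that takes `steps` GUI actions."""
    if steps < 0 or cost_per_step < 0:
        raise ValueError("steps and cost_per_step must be non-negative")
    return steps * cost_per_step


def measured_cost_per_step(total_spend_usd: float, steps: int) -> float:
    """Back out a per-step cost from a measured run (e.g. a direct API bill)."""
    if steps <= 0:
        raise ValueError("steps must be positive")
    return total_spend_usd / steps


def cheaper_option(copilot_per_step: float, direct_per_step: float) -> str:
    """Name the cheaper route per step; equal prices return 'equal'."""
    if copilot_per_step < direct_per_step:
        return "copilot-studio"
    if direct_per_step < copilot_per_step:
        return "direct"
    return "equal"
```

Run the same Meta Ads task both ways, feed the measured direct spend into `measured_cost_per_step`, and compare against `COPILOT_STUDIO_COST_PER_STEP`. Per-step price ignores the Cloud PC pool and licensing overhead, so treat the result as a first filter, not the decision.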

2 What You Already Know That Most People Don't

a16z just published the data showing Ben’s 3-tier authority is the correct architecture

a16z dropped a Narrative Violation post with Pylon data: in B2B customer support, AI resolves end-to-end only ~15% of tickets vs ~35% in B2C. Two-thirds of the time, AI silently triages to humans. When AI engages actively before handoff, it cuts human workload by ~33%. Their thesis: “in high-stakes verticals, invisible-triage and copilot patterns beat full-autonomy framing.”

You shipped this six months ago. Ben’s 3-tier authority in ~/Developer/PrevailPartners/products/agents/XeroAgent/ — Tier 1 auto-execute on routine, Tier 2 confirm-then-execute on judgment calls, Tier 3 escalate to Roy on high-risk — is exactly the invisible-triage pattern a16z is now data-marketing. Same logic in MACA’s 14-agent wave architecture where ad copy passes through human review before publish. You’re not behind the consensus; the consensus is catching up to your design choice. Worth saying out loud in the next Aria conversation: “the labs and their VCs have only just landed on the architecture we’ve been building to since March.”
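If you need to show the 3-tier pattern on a whiteboard, here is a minimal sketch of the routing logic. It is not the code in XeroAgent/. The `Action` fields, the `HIGH_RISK_AMOUNT` threshold, and the `route` function are all hypothetical stand-ins for whatever signals Ben actually uses.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    AUTO_EXECUTE = 1   # Tier 1: routine, act without asking
    CONFIRM_FIRST = 2  # Tier 2: judgment call, confirm then execute
    ESCALATE = 3       # Tier 3: high-risk, hand to a human


@dataclass(frozen=True)
class Action:
    description: str
    amount: float      # hypothetical risk signal, e.g. invoice value
    is_routine: bool   # hypothetical flag set by upstream classification


# Hypothetical threshold; a real deployment would tune this per client.
HIGH_RISK_AMOUNT = 10_000.0


def route(action: Action) -> Tier:
    """Pick the authority tier, checking the riskiest case first."""
    if action.amount >= HIGH_RISK_AMOUNT:
        return Tier.ESCALATE
    if action.is_routine:
        return Tier.AUTO_EXECUTE
    return Tier.CONFIRM_FIRST
```

Checking the high-risk condition first means a large transaction escalates even when it is flagged routine, which is the property that makes the triage safe to run invisibly.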

3 Worth a Deeper Look This Week

Simon Willison: “Anthropic and OpenAI have found product-market fit” — read it before the next pricing decision

Link: https://simonwillison.net/2026/May/27/product-market-fit/ (10 min read)

Willison’s frame: the labs are raising API prices aggressively because coding and general-purpose agent customers spend >$200/user/month, which actually covers inference costs. Chat subscriptions at $10-$20 never could. Specific angle for you: when you start pricing CourseBuilds Tier 2 ($50-120K/year embedded) and the MACA pitch, your reference point isn’t a SaaS seat or a per-month chatbot — it’s the agent-platform economy where the unit price reflects sustained per-task token spend. This article gives you the language Aria and Michael Jordan won’t be using yet, which is the language to use first.

Trajectory ($15M seed) — continual-learning AI, customers include Harvey and Decagon

Link: https://www.wired.com/story/ex-google-apple-ai-researchers-want-to-make-ai-that-gets-smarter-as-you-use-it/ (8 min read)

Ex-DeepMind / OpenAI / Apple researchers launched Trajectory: models that post-train weekly (heading toward hourly) on user corrections. Specific angle for you: Ben’s “learning from corrections” capability is the same lane — Ben currently captures corrections and updates its rules but doesn’t post-train weights. Worth 30 minutes to compare Trajectory’s approach against what’s in XeroAgent/ already, and decide if there’s a Ben v2 architectural call to be made before scaling Ben to other UBX-style clients.

4 Conversation Capital

“Cognition’s coding agent Devin went from $37 million to $492 million annualised in twelve months — they just raised a billion at a $26 billion valuation. Meanwhile Salesforce’s Agentforce hit $1.2 billion ARR in its first full year and the stock dropped 33% the day they reported, because bolt-on agents don’t lift overall revenue. The chat-subscription era is the wrong frame — we’re already in the agent-platform economy, and the winning architecture is rebuilding workflows around AI, not selling a seat.”

Use case: Drop this in any Aria, Rio Tinto, or AI-pro conversation where the framing slides toward “AI is overhyped” or “let’s pilot a chatbot.” Signals you’ve calibrated against the actual production economics (one win, one cautionary tale) and steers the conversation toward the only positioning that pays — re-architecting around AI, which is exactly the CourseBuilds Aria wedge.

5 Something You Haven't Thought About

Ramp’s “10,000 agent sessions in 8 hours” security sweep is a Saturday job for you

TLDR linked it as a quick hit, but the methodology is the part worth stealing. Ramp ran ~10,000 Inspect coding-agent sessions against its own backend in a single eight-hour stretch with a one-line prompt — “find security issues” — and surfaced real high-severity findings. Compute cost was a rounding error against the alternative (a paid pentest or a slow internal sweep).

The play for you this weekend: spin a Codex or Claude Code agent over MACA, InvoiceGen (CartQuote), and Ben — three repos that are about to face external eyes (UBX pitch, CWS resubmission, PaperClip launch). One prompt, three parallel sweeps, results into a triage doc by Sunday night. Guidance: ACT this week. It costs you a coffee in tokens and answers a question you’re going to be asked anyway by the first technical buyer who looks at the code. Don’t queue this — it’s smaller than it sounds.
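A minimal sketch of the "one prompt, three parallel sweeps, one triage doc" shape. The `run_agent` callable is a hypothetical hook: you supply whatever actually invokes Codex or Claude Code on a repo, since no specific CLI or API call is assumed here. Only the prompt text comes from the Ramp write-up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# The one-line prompt quoted from the Ramp write-up.
PROMPT = "find security issues"


def sweep(repos: list[str], run_agent: Callable[[str, str], str]) -> dict[str, str]:
    """Run the agent over each repo in parallel; return findings keyed by repo.

    `run_agent(repo, prompt)` is a hypothetical hook that returns the agent's
    findings as text. An exception from any run propagates to the caller.
    """
    with ThreadPoolExecutor(max_workers=len(repos) or 1) as pool:
        futures = {repo: pool.submit(run_agent, repo, PROMPT) for repo in repos}
        return {repo: future.result() for repo, future in futures.items()}


def triage_doc(findings: dict[str, str]) -> str:
    """Render findings as a plain-text doc with one underlined section per repo."""
    sections = []
    for repo, text in findings.items():
        body = text.strip() or "(no findings reported)"
        sections.append(f"{repo}\n{'-' * len(repo)}\n{body}")
    return "\n\n".join(sections)
```

Usage would look like `triage_doc(sweep(["MACA", "InvoiceGen", "Ben"], my_runner))`, where `my_runner` is your own wrapper. Threads are enough here because each run spends its time waiting on an external agent, not on local CPU.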

6 Skip File

[TLDR — “ElevenLabs Music v2”]: Already covered in yesterday’s brief; “licensed data” angle isn’t enough new signal.
[TLDR — “Biohub world model of protein biology”]: Major science release, not your stack.
[TLDR — “Delta Weight Sync in TRL”]: Async RL infra plumbing, irrelevant to Prevail’s surface.
[TLDR — “NVIDIA LocateAnything parallel grounding”]: Vision-language research, no project tie.
[TLDR — “LiteParse v2.0 OSS PDF parsing”]: Worth bookmarking for Ben’s invoice ingestion later, but no action this week.
[TLDR — “Hassabis: AGI 3-4 years away”]: Already covered in yesterday’s skip; same prediction, fresh repackage.
[Rundown — “OpenAI Foundation $250M for AI economic impact”]: Directional only — no action for you.
[Rundown — “GPT-5.5 becomes default, GPT-5.2 retired from Codex June 2”]: Ops change for ChatGPT users, not Roy’s daily driver.
[Rundown — “Google Coral Board on-device AI”]: Hobbyist edge hardware, not Prevail’s stack.
[Rundown — “Anthropic ships Claude Code reliability upgrades”]: QoL improvements you’ll feel passively; no action required.
[Rundown — “Robinhood Agentic Trading + Agentic Credit Card”]: Consumer fintech, not your lane.
[Rundown — “YouTube auto AI-content detection”]: Consumer platform policy.
[Practicaly — “Spotify and UMG fan remix licensing”]: Music IP curiosity, not relevant.
[Practicaly — “OpenAI 2026 election safeguards”]: Civic-tech announcement, no project tie.
[The Information — “Meta paid AI chatbot subscriptions in Singapore/Bolivia”]: Consumer experiment, irrelevant.
[The Information — “AWS Agentic Shopping Assistant (Kate Spade w/ Anthropic)”]: Notable Anthropic placement but no MACA/Prevail wedge today.
[The Information — “Snowflake $6B AWS Graviton + 13,600 AI accounts”]: Hyperscaler infrastructure deal, ambient signal only.
[The Information — “Apple to renew on-device AI push at WWDC”]: Preview tease, wait for the actual announcement.
[The Information — “Salesforce Agentforce $1.2B ARR / stock -33%”]: Used in Section 4 — no separate item needed.
[The Information — “Cognition $26B / $492M run-rate”]: Used in Section 4 — no separate item needed.
[The Information — “OpenAI infra leadership changes, Microsoft legal bench”]: Personnel inside baseball.
[a16z — “B2B AI is copilot not replacement”]: Used in Section 2 — no separate item needed.
[BagelBots — “$150 humanoid cleaning service in SF”]: Robotics curiosity, not your domain.
[BagelBots — “ClickUp 22% layoff, 3,000 internal agents”]: Already covered in yesterday’s skip.
[BagelBots — “Anti-tech extremism surveillance / data-center map”]: Important social-license context, no Prevail action.
[Neil Patel — “SEO isn’t dead webinar”]: Promo for a webinar, no fresh signal.

Brief Metadata

Sources scanned: 8 (TLDR AI, The Rundown, Practicaly.AI, The Tip, The Information, a16z, BagelBots, Neil Patel)
Items extracted: 33
Items surfaced: 7 (3 Tier 1, 1 anxiety-flip, 2 deeper-look, 1 first-mover)
Items skipped: 26
Read time: ~6 min