COODY AI TRAINING · 16 APRIL 2026

Building an AI coding toolbox
that actually works

A year of lessons from coding agents, context curation, and a personal memory bank — tailored for Coodyans and friends.

Magnus Gille · Magnus Gille Consulting
Embedded systems · Web · AI workflows
About me · 2/22
Who’s talking

A bit about me

Magnus Gille silhouette
Magnus Gille
AI practitioner · Magnus Gille Consulting
Background: Systems architect and product owner — Ericsson, Scania, Adage
Focus: Practical AI — making it work, not just talking about it
Day-to-day: I use AI tools intensively every day — this is experience from practice, not theory
Fun fact: Reigning AI Prompt Champion
Agenda · 3/22
What we’re doing today

One hour, not a monologue.

Timeline · 4/22
From reasoning to long-horizon agents

18 months — and a 36× leap in AI time horizon

2024-08-12
OpenAI releases its first reasoning model, o1
2024-11-25
Anthropic releases the MCP protocol
2025-02-03
“Vibe coding” coined by Karpathy
2025-02-24
Claude Code CLI ships as a research preview
2025-04-24
“30% of our code is written by AI” — Pichai
2025-05-18
“80% of my code is Codex” — McLaughlin, OpenAI
2025-10-03
44% of devs mainly use agents — Karpathy poll
2025-10-06
“Almost all new OpenAI code is Codex” — Altman
2025-12-26
“I’ve never felt so behind as a programmer.” — Karpathy
2026-01-12
“It was basically just Claude Code” — Cherny
[Chart: METR time horizon, log₂ time axis. Task length at 50% success rises from 20 min to 12 h across the timeline · doubles every ~4 months]
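A quick sanity check of the chart's endpoints (assuming the 20 min and 12 h data points span the 18-month timeline): the implied doubling time comes out at roughly 3.5 months, consistent with the "~4 months" caption.

```python
import math

task_min = 20      # shortest data point on the chart: 20 minutes
task_max_h = 12    # longest: 12 hours
months = 18        # span of the timeline

ratio = (task_max_h * 60) / task_min       # 36x growth in task length
doublings = math.log2(ratio)               # about 5.2 doublings
months_per_doubling = months / doublings   # about 3.5 months each

print(round(ratio), round(months_per_doubling, 1))  # prints: 36 3.5
```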
And my own kick-started journey
GitHub contribution graph showing acceleration late 2025
1,210 commits in the last year — almost all of them in the last three months.
Nuance · 5/22
But does it actually make us faster?

METR, July 2025: −19 %

What the study found

  • Experienced OSS developers on their own codebases
  • They predicted: +24 % faster
  • Reality: −19 % slower
  • Large, mature projects → the agent has to read a lot of context

Why it’s not the end of the story

  • The study measured Feb–Jun 2025, the Claude 3.5/3.7 Sonnet era
  • Q4 2025 tooling has completely different context mechanics
  • Stack Overflow 2025: trust in AI tools fell from 70% to 60%
  • The lesson: generic “use AI” ≠ value. Workflow is what decides.
We’re past the hype peak. That’s where the real craft starts — how you actually extract value in a real project.
Concept · 6/22
Where the value actually gets created

Model ≠ Harness

The model

  • Claude Opus 4.6, GPT 5.4, Gemini 3.1
  • A frozen file of 100–1000 GB
  • Knows nothing about your project
  • Knows nothing about your filesystem, terminal, or test suite
  • Without a harness: a very knowledgeable buddy in a chat window

The harness

  • Claude Code, Codex, Cursor, OpenCode
  • Gives the model tools: read file, run command, search, edit
  • Manages context, memory, agent loops
  • Defines how MCP servers, skills, and subagents work
  • Ongoing debate on where the value sits — models, harness, or both
The industry still talks a lot about models — who has the highest MMLU score. The real difference between getting 2× or 20× lives in the harness and how you use it.
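The harness pattern above can be sketched in a few lines. This is not any specific product's implementation, just the shape they all share: the model names a tool plus arguments, the harness executes it and feeds the result back as new context. Tool names here are invented.

```python
import subprocess
from pathlib import Path

# Hypothetical tool table: the model only emits tool names and
# arguments; the harness owns the actual filesystem and terminal.
TOOLS = {
    "read_file": lambda path: Path(path).read_text(),
    "run_command": lambda cmd: subprocess.run(
        cmd, shell=True, capture_output=True, text=True
    ).stdout,
}

def agent_step(tool_call):
    """Execute one model-proposed tool call and return its output."""
    name, arg = tool_call
    return TOOLS[name](arg)

# One turn of the loop: model output -> execution -> result into context.
result = agent_step(("run_command", "echo hello from the harness"))
```

Everything the deck covers later (MCP servers, skills, subagents) is layered on top of this one loop.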
Surfaces · 7/22
Where are you running it?

One model, four surfaces

The point: pick the surface by task, not by habit. Same model, completely different tool reach.
Paradigm shift · 8/22
A mental shift

From craft to factory

Craft (where we came from)

  • One developer, one task, full attention
  • The code is a personal expression
  • “It takes the time it takes”
  • Quality lives in the craftsman’s head

Factory (where we’re heading)

  • The developer orchestrates multiple agents in parallel
  • Quality lives in the process: tests, CI, context, loops
  • Repetitive work becomes automation, not grind
  • Time freed up for architecture, design, decisions
This is not a value judgement — it’s a mode of production. Craft doesn’t disappear. It moves up the stack.
Embedded · 9/22
The frustration is real

Why embedded has felt left behind

Embedded devs aren’t “behind”. They’ve been burned in areas where the tools really were bad. Six months ago, opting out was perfectly reasonable.
Embedded · 10/22
What’s changed in the last six months

The conversation has shifted

You don’t need to change careers. You need to change tools — and write your first CLAUDE.md.
Practice · 11/22
This is the heart of everything

Context → skills → subagents

Rule of thumb: anything you explain to the model twice should move into a CLAUDE.md or a skill.
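A minimal sketch of what that rule of thumb produces in practice. The project, toolchain, and commands below are illustrative, not a template you must follow:

```markdown
# CLAUDE.md

## Project
Embedded firmware for the sensor board. Toolchain: arm-none-eabi-gcc.

## Commands
- Build: `make -j`
- Unit tests: `make test` (runs on host, not target)

## Conventions
- No dynamic allocation after init
- Every bugfix gets a regression test first
```

Each line exists because someone once had to explain it to the model twice.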
Architecture · 12/22
Build for a dual audience

AI-ready architecture

Interactive · 13/22
A quick vote

Team CLI vs Team MCP

If you could only pick one abstraction to extend your agent’s abilities — which one do you take?

Team CLI
The command line
Full control. Scriptable. No abstraction to leak through. The agent learns the same tools you already know.
Team MCP
The protocol
Language-agnostic. Composable. Write once, works in Claude, Codex, ChatGPT, Cursor. Future-proof interface.
Principle · 14/22
CLI first — even when GUI is the goal

Make everything scriptable

A CLI that covers the whole domain is the cheapest MCP server you’ll ever build — the agent can already use it.
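A sketch of the principle, assuming a made-up `ticket` domain: every operation gets a scriptable verb with machine-readable output before any GUI exists, so the agent can drive the whole domain from day one.

```python
import argparse
import json

def build_parser():
    # One verb per domain operation: if the GUI can do it, the CLI can too.
    p = argparse.ArgumentParser(prog="ticket")
    sub = p.add_subparsers(dest="cmd", required=True)

    create = sub.add_parser("create", help="open a new ticket")
    create.add_argument("title")

    show = sub.add_parser("show", help="print a ticket as JSON")
    show.add_argument("id", type=int)
    return p

def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.cmd == "create":
        out = {"created": args.title}
    else:
        out = {"id": args.id, "status": "open"}
    # JSON on stdout: agents parse it, humans can still read it.
    print(json.dumps(out))
    return out

main(["create", "fix boot loop"])
```

Wrapping verbs like these as MCP tools later is mostly mechanical, which is the sense in which the CLI is the cheapest MCP server.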
UI/UX · 15/22
Agent-first, human-first, or both?

I actually prefer a clean terminal.

My 9-year-old, iterating on her game — in a terminal.
The game she’s building — top-down adventure, browser-based.

Claude Code in my terminal isn’t just a coding tool anymore. Email, calendar, files, transcription — and, since recently, noxctl for Fortnox. Same surface for writing code, reading mail, or booking an invoice.

When a 9-year-old prefers the terminal, something has shifted. The real question isn’t “CLI or GUI?” — it’s what interface works when half the user is an agent?
Technique · 16/22
The cheapest regression engine you’ll ever own

Red/green TDD with the agent

“A significant risk with coding agents is that they write code that doesn’t work, or build code that never gets used — or both.” — Simon Willison, Agentic Engineering Patterns
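The red/green loop in miniature (function and tag format invented for the example): the human writes the failing test first, the agent writes the minimal code that turns it green, and the test stays behind as a permanent regression guard.

```python
# Step 1 (red): the test exists before the code does, so it fails.
def test_parse_version():
    assert parse_version("v1.2.3") == (1, 2, 3)

# Step 2 (green): the agent writes the minimal code that passes.
def parse_version(tag: str) -> tuple[int, int, int]:
    """Parse a 'vMAJOR.MINOR.PATCH' tag into a tuple of ints."""
    major, minor, patch = tag.lstrip("v").split(".")
    return int(major), int(minor), int(patch)

# Step 3: the test stays in the suite; it is now the regression engine.
test_parse_version()
```

This directly addresses Willison's risk: code the agent writes is exercised the moment it exists.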
Principle · 17/22
Measure everything you own

Hoard the things you build yourself

“‘Storage is manageable with retention policies’ is not a plan.” — Codex, from my debate logs, March 2026
If your tool can’t answer “is it getting better?” with a number, you don’t have a tool — you have a habit.
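What "answer with a number" can look like in practice. The metric and values are illustrative: log one number per run (tokens, seconds, retries, whatever you own), and the trend question becomes a one-liner.

```python
from statistics import mean

# Illustrative log: cost per completed task, grouped by week.
runs = {
    "2026-w10": [410, 385, 420],
    "2026-w11": [360, 340, 355],
    "2026-w12": [300, 310, 295],
}

def is_improving(log: dict[str, list[float]]) -> bool:
    """Lower weekly average = better. Compare first and last week."""
    weekly = [mean(v) for _, v in sorted(log.items())]
    return weekly[-1] < weekly[0]

print(is_improving(runs))  # the tool now answers "is it getting better?"
```

The same number is also the "clear signal" the next slide's thesis depends on.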
Thesis · 18/22
Karpathy, autoresearch · 2026

The bigger thesis

AI can improve itself — as long as you have enough compute and a clear signal for “what’s good”.
It’s also the thesis behind my whole toolbox: if I have a clear signal, the agent can do the work while I sleep.
Scope · 19/22
Same toolbox, wider problem set

It’s not just code

A consultancy that uses AI only inside the IDE is leaving most of the value at the door. The same harness that edits code also reads 400-page regulations and writes the memo for the customer.
Frontier · 20/22
One agent or many?

Multi-agent orchestration

This is a very new area. Nobody has landed the pattern yet — and new model releases keep re-opening the question.

My own view is pragmatic: if it works, it works — don’t overdo it. Where I actually feel friction is data & trust: I want a coordination layer that routes tasks by sensitivity — frontier models for the public stuff, self-hosted for what can’t leave the building. That’s the orchestration problem worth solving.
Links · 21/22
Take home

Resources

Simon Willison — Agentic Engineering Patterns
simonwillison.net/guides/agentic-engineering-patterns · read the whole thing, it’s worth it.

Karpathy — autoresearch
github.com/karpathy/autoresearch · the thesis on self-improving loops.

Beningo — Why Claude Code for Firmware Development Matters
beningo.com/why-claude-code-for-firmware-development-matters · the best embedded-specific piece right now.

Chalmers / Software Center — Agentic Pipelines in Embedded SW Engineering
arXiv 2601.10220 · Swedish industry partners, relevant to you.

METR, July 2025
the −19 % study · read it before you hype.

noxctl — my Fortnox CLI + MCP server: github.com/Magnus-Gille/noxctl.

All today’s material lands at coody.gille.ai (soon).

Thank you.

Questions, thoughts, things we didn’t get to — let’s hear them now.