office.md
A markdown file that changed everything
How software is built
Marcus had been a product manager at Apes & Co. for six years, and he had the calendar scars to prove it.
Every feature started the same way. Marcus would spend two weeks interviewing users, watching support tickets pile up, sitting in on sales calls where he’d wince at the same complaints. Then he’d disappear into a Google Doc for another week and emerge with a PRD — a Product Requirements Document — that he’d spent more time formatting than thinking. Twelve pages. Executive summary. Problem statement. User stories. Edge cases. A requirements table that had never once been read completely by anyone.
Then he’d present it to Engineering. Then Engineering would ask questions that were clearly in the document. Then there’d be a design review. Then a sprint planning session. Then a kickoff. Then, roughly four months after the user had first complained about the problem, something would ship — and it would be slightly wrong in a way that required two more months of iteration.
This was not considered a broken process at Apes & Co. This was considered how software was built.
Great ideas. Good intentions
The problem that arrived on a Tuesday in March was a good one.
Apes & Co. sold analytics software to mid-market retail companies. Their customers kept asking for the same thing: a way to automatically flag products that were probably going to run out of stock before the next supply order arrived — not just based on current inventory, but based on seasonal patterns, recent sales velocity, and upcoming promotions.
Simple concept. Genuinely complex underneath.
“We should build a stock-risk alerting system,” said Priya, the Head of Customer Success, in the all-hands meeting. She’d been hearing about this problem for eight months.
“Great,” said Derek, VP of Engineering. “Marcus, write the PRD. We’ll get it into Q3 planning.”
Q3 planning. It was currently March.
Marcus nodded and opened a new Google Doc.
Standard Invisible Procedures
Meanwhile, in the operations corner of the office, sat a woman named Rosa.
Rosa had been with Apes & Co. for nine years. Her official title was Senior Operations Analyst, which meant she did approximately forty different things that didn’t fit into anyone else’s job description. She built spreadsheets that kept whole workflows alive. She wrote the company’s internal SOPs — Standard Operating Procedures — documents that explained, in plain English, how to process a refund, onboard a new customer, handle a data discrepancy.
Nobody gave Rosa awards. Her SOPs lived in a Notion folder that was technically findable but practically unfindable.
On the same Tuesday Marcus opened his Google Doc, Rosa was building a spreadsheet — manually — to help a regional shoe retailer called Hamster & Sons figure out which products were most at risk of stocking out before Easter.
She pulled the inventory data. She pulled the sales data. She applied rules she’d developed over years of doing exactly this. Flag anything below 14 days of runway at current velocity. Increase urgency if there’s a promotion in the next 21 days. Halve the confidence on seasonal items because the velocity curves are unreliable.
It took her four hours. She sent it to Priya. Priya forwarded it to the client: “Rosa worked her magic again.”
The client replied: “This is exactly what we needed. Can we get this every week?”
Rosa stared at that email. Four hours, every week, for one client. They had 200 clients.
She closed the email. She had seventeen other things to do.
Planning vs Reality
Three weeks later, Marcus was on slide four of his PRD presentation to Engineering.
“How is the score calculated?” asked Tom, the senior engineer.
“It’s based on days of inventory remaining, adjusted for upcoming promotions and seasonal trends,” said Marcus.
“Right, but what’s the formula? If there’s a promotion in ten days and fifteen days of inventory, what’s the score?”
Marcus looked at his slide. He’d written “promotion urgency should increase the risk score proportionally” — the kind of sentence that felt precise when you wrote it and revealed itself as completely hollow when someone asked a follow-up question.
“I’ll follow up on that,” said Marcus.
The meeting ended. Four weeks of work. Still no formula.
Tom walked back to his desk. “We’re building this in Q3 at the earliest.”
“Optimistic,” said his colleague Aisha.
The (overburdened) innovator
The actual change began, as many changes do, with someone being too tired to do things the normal way.
Rosa had just finished her third manual stock-risk spreadsheet of the week. It was 6pm on a Thursday. She had a new client — a home goods company called Beige & Co. — and she was not going to build another spreadsheet from scratch.
Instead, she opened the AI assistant the company had rolled out three months ago, the one most people used exclusively to write emails.
She typed, essentially, her brain.
I need to build a stock risk analysis. Here’s how I think about it:
First, pull inventory levels and calculate days of supply at current sales velocity. Flag anything under 14 days as high risk.
Second, look at upcoming promotions. If a product has a promotion in the next 21 days, double the days-of-supply threshold — so high risk becomes anything under 28 days.
Third, for seasonal products, I’m less confident in the velocity number because it varies a lot. Add a note that says “seasonal — verify manually.”
Edge cases: if a product has been added in the last 30 days, don’t flag it — there’s not enough sales history.
Output: a table with product name, current inventory, days of supply, promotion flag, seasonal flag, and risk level (high/medium/low). Sort by risk level.
One more thing: as you run this repeatedly and see which patterns are fully routine — same logic, same structure, same output every time — write out a deterministic function for those cases and save it. I don’t want you reasoning through the same thing twice if you don’t have to.
The AI asked three clarifying questions. She answered them. It produced a working analysis in about four minutes.
Rosa ran it on Beige’s data. It worked — mostly. It had misunderstood how she defined “seasonal,” and the promotion date logic had a gap. She corrected it conversationally, the way you’d correct a colleague. No — when I say seasonal, I mean products where the month-over-month sales variance over the last 12 months is more than 40%. Let me give you an example.
Twenty minutes later, it was right.
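Rosa's clarified rule can be sketched as a small check (a hypothetical illustration; reading "month-over-month variance" as the average absolute month-over-month change is an assumption, and the function name is invented):

```python
def is_seasonal(monthly_sales: list[float]) -> bool:
    """Flag a product as seasonal when its month-over-month sales swings
    over the last 12 months average more than 40%."""
    recent = monthly_sales[-12:]
    changes = []
    for prev, curr in zip(recent, recent[1:]):
        if prev > 0:
            changes.append(abs(curr - prev) / prev)
    if not changes:
        return False  # no usable history: don't flag
    return sum(changes) / len(changes) > 0.40
```

A product selling a steady 100 units a month would not be flagged; one swinging between 100 and 200 would be.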
She saved the whole thing — not just the output, but the instructions themselves. She called the file office_sop_v1.md.
Then she noticed something at the bottom of the file that the AI had added without being asked.
Under a heading that read Routine Functions — Auto-Generated, there were two short blocks of clean, readable code. The AI had identified the days-of-supply calculation and the promotion threshold check as fully deterministic — same logic every time, no judgment required — and had written them out as explicit functions. It had added a note: These cases no longer require LLM reasoning. Route directly to these functions for speed and consistency.
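Functions of the kind described, the days-of-supply calculation and the promotion threshold check, might look something like this (an invented sketch, since the story does not reproduce the actual code; names and signatures are assumptions):

```python
def days_of_supply(current_inventory: float, daily_sales_velocity: float) -> float:
    """Days until stock runs out at the current sales rate."""
    if daily_sales_velocity <= 0:
        return float("inf")  # no recent sales: effectively unlimited runway
    return current_inventory / daily_sales_velocity


def risk_threshold(promotion_within_21_days: bool, base_threshold: float = 14.0) -> float:
    """High-risk cutoff in days of supply. A coming promotion doubles the
    threshold, so more runway is needed before a product counts as safe."""
    return base_threshold * 2 if promotion_within_21_days else base_threshold
```

Both are fully deterministic: same inputs, same outputs, no judgment required, which is exactly why they are candidates for routing around the AI.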
Rosa read the note twice.
The AI had looked at its own work, recognized which parts of its thinking were repetitive, and written them out as code so it wouldn’t have to think about them again.
She sat back in her chair.
It’s doing what a good analyst does when they’ve run a process enough times, she thought. They stop reasoning through it and write a checklist. Or a formula. Or a function.
The next day, she showed it to Priya.
“Can anyone use this?” Priya asked.
“I think so. It’s basically the same logic I’ve been applying manually for years. I just wrote it down in a way the AI could understand. And now the AI is writing parts of it back out as code.”
Priya ran it on a client dataset. Seven minutes, start to finish. She sat very still for a moment.
“Rosa,” she said carefully, “have you shown this to Marcus?”
Markdown magic
Marcus’s reaction was not defensive. It was more like the feeling you get when you realize you’ve been driving the wrong direction for an hour.
He looked at the file. He looked at the output. He looked at the four-week PRD process he was still in the middle of.
“This is essentially what I’ve been trying to specify,” he said slowly.
“It is,” said Rosa.
“But you built it in an afternoon.”
“I built a version of it. There are things it probably can’t handle yet — edge cases I haven’t seen. But those are the same problems we’d have discovered after six weeks of speccing and building.”
“Except you discovered them by running it. Day one. Not month four.”
Marcus pulled up his PRD. The system shall assess inventory risk based on a combination of factors including but not limited to days of supply, promotional calendars, and seasonality indicators, weighted appropriately for each use case.
He closed the document.
“I wrote that, and it means nothing. Your file means something. It does something.”
He stared at the file name. office_sop_v1.md. A markdown file. Plain English. Running in production.
Who are we now?
Derek, the VP of Engineering, was the hardest to convince. Not because he was stubborn — Derek was actually quite thoughtful — but because he’d seen fads before.
“What happens when it breaks?” he asked, when Priya demonstrated the system to the leadership team.
“We fix the instructions,” said Rosa.
“What happens at scale? 200 clients, weekly runs. Edge cases you haven’t anticipated.”
“Yes. We’ll hit them. We just hit them in week one instead of month four.”
Derek turned to Tom, who’d been quiet the whole meeting, reading the skill file on his laptop.
“What do you make of it?” Derek asked.
Tom looked up. “The AI has already identified the routine parts of its own reasoning and written them as deterministic code. It’s not waiting for us to do that. It’s watching its own outputs and thinking: this part never changes — I should write this down so I don’t have to think about it again.”
Derek leaned in. He read the functions. They were clean. Correct. Exactly what an engineer would have written, if an engineer had ever gotten around to it.
“The code emerged from the SOP,” Derek said slowly.
“Which emerged from the conversation,” said Tom. “That’s the new architecture. It’s not designed upfront — it grows. The AI reasons through novel cases, recognizes which reasoning has become routine, writes deterministic functions for those cases, and promotes them into something reusable. Each run makes the system faster, cheaper, more consistent. The LLM handles what’s new. The code handles what’s known.”
Derek looked at the whiteboard where he usually drew system diagrams before any line of code was written.
“What do my engineers actually build in this world?” he asked.
“The infrastructure around the intelligence,” said Tom. “Rosa and people like her write the business logic, in plain language. What we build is the world those instructions run in safely. The sandbox. The test harness. The function store. The monitoring. The pipeline that says: here’s how a reasoning answer becomes a verified function, and here’s how a verified function gets deployed.”
“We build the rails,” said Aisha, from the corner. “The AI builds the train.”
In engineering we trust
Tom disappeared for two days.
He came back with a small but complete system. He’d taken office_sop_v1.md and built a wrapper around it: a testing harness that ran the instructions against known good outputs and flagged discrepancies. He’d promoted the auto-generated functions into a proper function store — versioned, tested, deployable. He’d built a simple router: when a request matched a pattern the function store already handled, it went straight to deterministic code, no AI involved. When it was something new or ambiguous, it went to the AI, which reasoned through it — and then asked itself: is this routine enough to codify? If yes, it wrote a new function and submitted it to the store for testing and promotion.
“The system gets smarter as it runs,” Tom told Derek. “The first time it sees a new edge case, the AI handles it. The tenth time, it’s a function. The hundredth time, it runs in milliseconds for fractions of a cent.”
Derek drew a diagram on the whiteboard. At the top: Rosa’s skill file, in plain English. Below it: the AI reasoning engine. Branching from that: the function store, growing over time. Below that: the monitoring and testing layer. At the bottom: production.
No PM-to-Engineering handoff. No design review. No sprint planning. Just a loop: write, run, observe, codify, improve.
“This is the architecture,” Derek said. “And we didn’t design it in advance. It just… became apparent.”
“That’s the point,” said Tom. “You can’t design this upfront because you don’t know which cases will be routine until you’ve run it. The architecture is emergent. It reveals itself through use.”
An old Skill, born anew
Rosa’s workshop arrived six weeks later. She called it: How to Write Instructions That Run.
Twenty people showed up — customer success managers, junior PMs, two people from finance who’d heard about it secondhand.
She started with a question.
“How many of you have written a process document that nobody followed?”
Every hand went up.
“Here’s why that happened,” she said. “The document described what to do, but didn’t say precisely enough when, with what inputs, under what conditions, and what to do when something unexpected happens. It was written for a human reader who could infer the gaps. Humans are good at that.”
She paused.
“The AI is different. It can reason, infer, even introspect on its own outputs and decide which parts of its thinking are worth turning into reusable code. But it reasons from exactly what you’ve written — your words are its reality. If you leave out a constraint, it might handle the gap gracefully, or it might handle it wrong, confidently and at scale.”
She wrote on the whiteboard: Precision is the new skill.
“The discipline isn’t in the AI,” she said. “The AI has plenty of capability. The discipline has to be in the writing. Your skill file needs a clear goal, real constraints, worked examples of what right looks like, and honest accounting of your edge cases. If you do that, the AI can execute it, refine it, test its own reasoning, and gradually replace the parts of itself that have become routine with faster, cheaper, deterministic code.”
A junior CS manager raised his hand. “So it’s like writing a really good SOP?”
“It’s exactly like writing a really good SOP,” said Rosa. “Except when you’re done, it runs. And then it improves itself. And the knowledge that used to live only in your head — the knowledge that walked out the door every time someone quit — becomes something the organization actually owns.”
She let that land.
“Your expertise has always had value,” she said. “Now it has a direct path into production.”
Memory. Mind. Muscle.
Three months after Tom built the first wrapper, Derek made a decision that turned out to matter more than anyone expected.
He created an internal platform he called the Skill Registry — a searchable library of every skill.md file the company had produced, alongside the auto-generated functions, test results, version history, and performance metrics: how often each skill ran, what it cost, how frequently the AI still handled cases versus the deterministic functions, and where it still struggled.
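An entry in a registry like that might track metadata along these lines (a hypothetical sketch; every field name is invented):

```python
from dataclasses import dataclass, field

@dataclass
class SkillRecord:
    """One entry in the Skill Registry (illustrative fields only)."""
    name: str                      # e.g. "office_sop_v1.md"
    version: int
    promoted_functions: list[str] = field(default_factory=list)
    total_runs: int = 0
    deterministic_runs: int = 0    # runs served by promoted code, no AI reasoning

    @property
    def deterministic_share(self) -> float:
        """Fraction of runs handled by deterministic functions."""
        if self.total_runs == 0:
            return 0.0
        return self.deterministic_runs / self.total_runs
```

Tracking the deterministic share per skill is what would let Derek quote a number like the company-wide 67% later in the story.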
Within a month, something unexpected happened. The customer success team was working on churn risk scoring when they discovered that half the logic they needed was already in the function store, extracted automatically from Rosa’s stock-risk skill. The days-of-supply calculations, the promotion-flag logic, the seasonal variance detection — all of it reusable, already tested, available to compose into something new.
“We didn’t have to rebuild it,” the CS manager told Derek. “We just pointed the new skill at the existing functions.”
Derek looked at the registry. Fourteen skills. Forty-three promoted functions. Across all production runs, 67% of processing was now handled by deterministic code — no AI reasoning required. That number had been 0% four months ago.
He thought about the server infrastructure Apes & Co.'s big competitors ran. The engineering headcount. The architectural review boards. The design docs that took months to write and were outdated before they were finished.
Then he looked at his registry.
He sent a message to Priya: This might be the most valuable thing we’ve built this year. It’s not a feature. It’s organizational memory that runs.
Organization (re-)organized
About eight months after the Thursday evening Rosa had typed her brain into a chat window, Priya gathered the original group — Marcus, Derek, Tom, Aisha, Rosa — in the small conference room they’d started calling the Workshop.
“I want to talk about what changed,” she said. “Not the numbers. The structure.”
Marcus went first. “The PRD was a translation artifact. I spent most of my time converting my thinking into something engineers could act on. Now the specification is the system. What’s left of my job is the part that was always the hard part: figuring out what to build. There’s nowhere to hide in a half-page brief.”
Tom: “We used to encode business logic. Now we build the environment that hosts it. The intelligence lives in Rosa’s files. Our job is to make sure it runs safely and improves over time.”
Aisha: “The thing that surprised me most is the architecture. We’ve never designed a system less upfront. The shape emerged from running it. The AI told us which cases were routine by writing functions for them. The architecture followed the behavior.”
Derek: “The registry is what I keep coming back to. Organizational knowledge — not in someone’s head, not in a Notion folder nobody reads — actually running, improving, composable.”
Rosa went last.
“The work that always mattered was the work of understanding,” she said. “Understanding what the problem actually was. What the edge cases were. What correct behavior looked like in the strange situations that only came up twice a year. I always did that work. For nine years. It just disappeared into spreadsheets and SOPs that nobody used. Now that understanding goes directly into something that runs. When parts of it become routine, the AI writes them into code. When the code is tested, Tom promotes it. When it’s promoted, anyone on the team can reuse it.”
She looked around the table.
“My operational knowledge used to evaporate the moment I walked out of a room. Now it compounds.”
The room was quiet.
“The document became the system,” Marcus said.
“And the system,” said Tom, “keeps writing itself.”
A brave new world
Outside, in the main office, a new engineer was being onboarded. He’d come from a large company where he’d spent two years building a single feature, moving through three layers of review, four rounds of cross-team alignment, a committee that had to approve the database schema.
His manager, Tom, handed him a folder on his first day.
Inside was a markdown file. And below the instructions, a section titled Routine Functions — Auto-Generated, with clean, tested code the AI had written for itself after recognizing which parts of its reasoning never changed.
“What’s this?” the new engineer asked.
“It’s the stock-risk system,” said Tom. “The whole thing.”
“The AI wrote some of this code?”
“It wrote all of it. Rosa wrote the instructions. The AI wrote the code. We built the infrastructure that makes sure the code is correct before it runs.”
“Where’s the system design doc?”
“You’re holding it.”
The engineer turned it over in his hands.
“At my last company, something like this would have been a hundred-page design doc, three microservices, and a six-month roadmap.”
“I know,” said Tom.
“And here it’s a markdown file that writes its own functions.”
“When it needs to, yes.”
“Does it actually work?”
Tom smiled. “Run it and see.”
The boundary between the document and the deed was never a law of nature. It was just the cost of the tools we had.
Better tools don’t just speed up the old way of working. They dissolve the walls between the people who understood and the people who built — until, eventually, understanding and building become the same act.

