Agent Apps turns skills into software

If you use an AI coding agent, you've written skills: prompt files that describe how to process, validate, generate, or decide things. Skills are the best way to tell AI what to do. But they only work when an agent is driving. There's always an LLM in the loop, whether the task needs intelligence or not. Skills today are instructions for an agent. They're not software.

Agent Apps makes them software: a skill becomes a program you can run from the command line, compose with other skills, and extend with new capabilities.

Skills become software

Let's look at a support ticket classifier. Same format you already know — markdown with YAML frontmatter — but now it runs on its own:

---
name: classify-ticket
metadata:
  params:
    type: object
    properties:
      text: { type: string }
    required: [text]
---

Classify this support ticket.

- "billing": charges, invoices, payments, refunds
- "technical": errors, crashes, bugs, performance
- "account": login, password, permissions
- "other": everything else

:arg[text]

Adding params gives the skill a typed contract that other skills and agents can depend on, but even that is optional: a skill can be a bare prompt with no frontmatter at all.

You can run it just like any other program:

$ agent-apps classify-ticket --text "500 error on data export"
{ "category": "technical", "priority": "high" }

The runtime discovers skills by name from your project directory and exposes each declared param as a command-line flag, like --text above.
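To make the typed contract concrete, here is a minimal sketch of the kind of checking a params schema enables: required keys and primitive types validated before a skill ever runs. This is illustrative only; `validateParams` is a hypothetical helper, not the runtime's actual validator.

```javascript
// Hypothetical sketch of params-contract checking against a
// JSON-Schema-style declaration. The runtime's real validator is internal.
function validateParams(schema, args) {
  const errors = [];
  // Every key listed in `required` must be present.
  for (const key of schema.required || []) {
    if (!(key in args)) errors.push(`missing required arg: ${key}`);
  }
  // Declared properties must match their primitive type.
  for (const [key, value] of Object.entries(args)) {
    const prop = (schema.properties || {})[key];
    if (!prop) continue; // unknown args pass through in this sketch
    if (typeof value !== prop.type) {
      errors.push(`arg ${key}: expected ${prop.type}, got ${typeof value}`);
    }
  }
  return errors;
}

// The classify-ticket contract from above.
const schema = {
  type: 'object',
  properties: { text: { type: 'string' } },
  required: ['text'],
};

validateParams(schema, { text: '500 error on data export' }); // → []
validateParams(schema, {});  // → ['missing required arg: text']
```

A contract like this is what lets other skills and agents call classify-ticket without guessing at its inputs.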

Skills run anywhere software runs

A program that only runs on the command line isn't very useful. Real software serves web requests, processes events, executes on a schedule. Skills can now do that, too.

The YAML frontmatter includes a field called metadata; each key you add under it activates a capability. For example, we can serve HTTP requests by adding a web config to the main skill:

# main skill
metadata:
  web:
    port: 3000
    routes:
      - method: GET
        path: /products/:id
        ref: product-page

# product-page skill
---
name: product-page
allowed-tools: file-read
---

Look up product :arg[route.id] and generate an HTML page for it.
Include the name, price, description, and a buy button.

The skill runs whenever someone visits /products/123. We didn't write any server code. The runtime handles dispatch.
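Route dispatch is the runtime's job, but it helps to see what :arg[route.id] receives. Here is a sketch of how a path pattern like /products/:id could yield route values; `matchRoute` is a hypothetical helper, not the runtime's code.

```javascript
// Hypothetical sketch of pattern matching that produces the values a
// skill reads via :arg[route.id]. Real dispatch lives inside the runtime.
function matchRoute(pattern, path) {
  const patternParts = pattern.split('/');
  const pathParts = path.split('/');
  if (patternParts.length !== pathParts.length) return null;
  const params = {};
  for (let i = 0; i < patternParts.length; i++) {
    if (patternParts[i].startsWith(':')) {
      // A `:name` segment captures the corresponding path segment.
      params[patternParts[i].slice(1)] = pathParts[i];
    } else if (patternParts[i] !== pathParts[i]) {
      return null; // literal segment mismatch: no route
    }
  }
  return params;
}

matchRoute('/products/:id', '/products/123'); // → { id: '123' }
matchRoute('/products/:id', '/orders/123');   // → null
```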

The same pattern connects skills to other channels, too. Here's a skill that processes email:

# main skill
metadata:
  email:
    region: us-east-1
    bucket: my-email-bucket
    from: agent@mydomain.com
    routes:
      - ref: support-assistant

# support-assistant skill
---
name: support-assistant
allowed-tools: file-read file-search
---

You handle inbound support email. Read the message, look up relevant
documentation, and then send a helpful reply.

Add a slack: config with routes: and the same skill also handles Slack messages. The skill doesn't change. The transport config does.

Metadata can also customize the runtime itself. Want a skill to remember conversations, require approval before dangerous tools, or run on a schedule?

metadata:
  session: { key: nonlocals.gateway.id }       # remembers conversations
  trust: { rules: "shell($deny) file-write" }  # asks permission before dangerous tools
  schedule:                                    # enables scheduling with file persistence

One line each. Skills are just as configurable as software.

Skills do everything software does

If natural language is programming, it needs to handle what real software handles. Skills can.

The web servers, email handlers, trust rules, and session storage shown so far are all tools the agent can use, too. When a skill runs, the agent sees every capability available to it and can call them directly. A skill's prompt can describe complex, multi-step behavior, and the agent will carry it out:

---
name: team-status
allowed-tools: "email-send slack-send file-read file-write event-wait"
metadata:
  session: { key: "team-status" }
---

Every weekday at 4pm, email each person on the team asking for a status
update. If someone hasn't replied by 5pm, follow up on Slack. The next
business morning, compile all the updates into a team newsletter and
email it to the distribution list.

Team list: :inline[team.md]

The agent handles the scheduling, the follow-ups, and the aggregation. It uses event-wait to set timers, email-send and slack-send to reach people, and session storage to track who has responded across invocations. The skill runs unattended, reacting to events as they arrive.

Skills can also respond to events directly. Here's a deployment monitor that watches for failures and pages the on-call:

---
name: deploy-monitor
allowed-tools: "slack-send file-read"
---

When a deployment completes, check if it succeeded or failed.
If it failed, look up the on-call rotation and notify them on Slack
with the deployment details and relevant logs.

Configure the main skill to route deploy events to this skill:

metadata:
  event:
    routes:
      - topic: deploy-complete
        ref: deploy-monitor

The skill runs only when a deploy-complete event fires. No polling loop, no wasted compute.
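Topic-based routing like this is easy to picture as a filter over the routes list. A sketch, assuming one event may fan out to several skills (hypothetical helper, not the runtime's dispatcher):

```javascript
// Sketch of topic-based event routing, mirroring the event.routes
// config above. (Illustrative; not the runtime's internals.)
function routeEvent(routes, event) {
  // Return the ref of every skill whose route matches the event's topic.
  return routes.filter((r) => r.topic === event.topic).map((r) => r.ref);
}

const routes = [{ topic: 'deploy-complete', ref: 'deploy-monitor' }];

routeEvent(routes, { topic: 'deploy-complete' }); // → ['deploy-monitor']
routeEvent(routes, { topic: 'build-started' });   // → []
```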

Skills are building blocks

We can compose skills the same way we compose software. A support workflow can use the classifier we built earlier alongside other skills:

---
name: handle-ticket
allowed-tools: classify-ticket draft-reply email-send
---

When a support email arrives, classify it using :skill[classify-ticket].
If it's urgent, draft a reply with :skill[draft-reply] and send it immediately.
Otherwise, add it to the backlog.

This skill composes three others: classify-ticket, draft-reply, and email-send. Each is a standalone skill with its own contract. You can mention them explicitly with CommonMark directives, or you can let the agent figure it out. The orchestration is plain language.

We can also pull in external tools through MCP servers. Declare them in your project config and they become callable skills:

metadata:
  mcp:
    github:
      command: npx
      args: [-y, "@modelcontextprotocol/server-github"]

Now any skill in the project can call github-create-issue or github-list-repos.

A project is a collection of skills that work together. A main skill defines the shared configuration: which model to use, which transports to enable, what trust rules apply. Every other skill inherits that configuration and adds its own behavior. One project can have an email assistant, an HTTP API, and a scheduled job all running together, each as its own skill.
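The inheritance can be pictured as a shallow cascade: the main skill supplies defaults, and a skill's own metadata wins per key. This sketch assumes simple per-key override semantics; the runtime's actual merge rules may be richer, and the model names are made up.

```javascript
// Sketch of configuration cascade: main-skill defaults, overridden
// per key by the skill's own metadata. (Assumed semantics; the
// runtime's real merge may differ. Model names are hypothetical.)
function cascade(mainMetadata, skillMetadata) {
  return { ...mainMetadata, ...skillMetadata };
}

const main = { model: 'default-model', trust: { rules: 'shell($deny)' } };
const skill = { model: 'fast-model' };

cascade(main, skill);
// → { model: 'fast-model', trust: { rules: 'shell($deny)' } }
```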

Skills are predictable

To work like software, skills need to be efficient, accurate, and cost-effective. Not every skill needs AI. You probably don't want an LLM processing every health check or serving every API response. So we let you write skills in code, too. They work exactly like markdown skills: same name-based discovery, same typed contracts, same way of being called.

Here's a health check with no AI involved:

export const frontmatter = {
  name: 'health',
};

export default async function() {
  return { status: 'ok', timestamp: Date.now() };
}

Normal web-server speed. No model call, no token cost. Code skills also let you mix AI with deterministic logic in the same workflow. You shouldn't have to use an LLM for an if statement:

export const frontmatter = { name: 'handle-ticket' };

export default async function(ctx, { text, from }) {
  const { category, priority } = await ctx.manager.invoke('classify-ticket', { text });
  if (priority === 'urgent') {
    const reply = await ctx.manager.invoke('draft-reply', { text, category });
    await ctx.manager.invoke('email-send', { to: from, body: reply });
  }
  return { category, priority, handled: priority === 'urgent' };
}

No LLM is involved in the orchestration. AI runs only for classification and drafting. You choose where that boundary falls.

Skills transform into code

Agent Apps is a natural language runtime, so you shouldn't have to write code. We can generate code skills from markdown skills automatically. When a skill's logic stabilizes and you want to stop paying for inference, compile it:

$ agent-apps skill-compile --ref product-page

The compiler reads the prompt and generates a JavaScript module that does the same thing. Here's what the product-page skill compiles to:

export default async function(ctx, { route }) {
  const content = await ctx.manager.invoke('file-read', { path: `products/${route.id}.json` });
  const product = JSON.parse(content);
  return {
    html: `<!DOCTYPE html>
<html><head><title>${product.name}</title></head>
<body>
  <h1>${product.name}</h1>
  <p>${product.description}</p>
  <p class="price">$${product.price}</p>
  <button>Buy Now</button>
</body></html>`,
  };
}

Same skill, same route, same contract. No LLM at runtime. The generated code is a build artifact: delete it, regenerate it, change it. The prompt stays the source of truth. Change the description, recompile, the code updates. Soon, compilation will be "just in time," too.

Skills have their own toolchain

If skills are software, they need a proper toolchain. Agent Apps ships with a test runner, a runtime analyzer, a compiler, and a package registry.

The test runner validates skills the way you'd test any program. Run it against a skill and it checks the frontmatter, validates the params, executes the skill on test inputs, and scores the output. When structural checks aren't enough, it brings in an AI judge to evaluate whether the output actually makes sense:

$ agent-apps skill-test --ref classify-ticket
✓ frontmatter valid
✓ params: 1 required arg (text: string)
✓ output type: object
✓ has key: category
✓ has key: priority
✓ judge: correct classification for input
  6/6 checks passed — score: 1.0

You can also snapshot a known-good output and the test runner will derive constraints from it automatically. No hand-written assertions. Change the skill, re-snapshot, the constraints update.
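One way constraint derivation could work, as a sketch: infer structural checks from the shape of a known-good output. This is illustrative only; the test runner's real derivation is richer than this hypothetical helper.

```javascript
// Sketch: derive structural checks from a known-good snapshot.
// (Illustrative; the test runner's actual derivation is richer.)
function deriveConstraints(snapshot) {
  const constraints = [`output type: ${typeof snapshot}`];
  if (snapshot && typeof snapshot === 'object') {
    // Every key present in the snapshot becomes a "has key" check.
    for (const key of Object.keys(snapshot)) {
      constraints.push(`has key: ${key}`);
    }
  }
  return constraints;
}

deriveConstraints({ category: 'technical', priority: 'high' });
// → ['output type: object', 'has key: category', 'has key: priority']
```

Those derived checks match the structural lines in the skill-test output above; re-snapshotting simply re-derives them.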

The analyzer shows you what a skill looks like from the runtime's perspective. What tools can it call? What middleware runs before it? Where does each piece of its configuration come from?

$ agent-apps skill-analyze --ref product-page
  type: markdown
  tools: file-read
  middleware: params
  directives: arg (1)
  cascade: params defined locally

Useful when a skill behaves differently than you expect and you need to understand why.

And skills are shareable. The hub is a package registry that lets you install community skills and publish your own:

$ agent-apps hub-search --query email
  email-assistant@0.2.1 — Handle inbound email with AI [email, assistant]
  email-templates@0.1.0 — Reusable email formatting skills [email, templates]

$ agent-apps hub-add --name email-templates
  ✓ installed email-templates@0.1.0 (3 skills)

Install a package and its skills become available to your project immediately. Write a skill, test it, compile it, share it.

Skills should be well-architected

Architecture matters for skills as much as it does for any software system.

Under the hood, Agent Apps is a tool runner. A tool is a callable unit of work: a markdown skill, a JavaScript function, or a capability bridged from an MCP server. Tools are generic. An agent calls them, code calls them with ctx.manager.invoke(), and metadata activates them with tool-name: { config }. The core is tiny. All functionality is layered on as tools.

When you write trust: { rules: "shell($deny)" } in your metadata, the runtime finds a tool called trust and runs it. When you write web: { port: 3000 }, it finds a tool called web and starts a server. These run as a middleware pipeline before your skill executes, the same proven pattern as Express or Koa. Because every capability is a tool, you extend the system the same way it's built. Write a skill called rate-limit, and any skill activates it by adding rate-limit: { max: 100 } to its metadata. There's no plugin API because the skill is the plugin.
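The pipeline pattern itself is small enough to sketch. Here is a minimal Koa-style compose, where each activated tool wraps the next and the skill sits at the end; this is an assumption-level illustration, not the runtime's implementation.

```javascript
// Minimal Koa-style compose: each middleware may run code before and
// after the rest of the pipeline, ending at the skill itself.
// (Sketch only; not the runtime's implementation.)
function compose(middleware, skill) {
  return (ctx) =>
    middleware.reduceRight(
      (next, mw) => () => mw(ctx, next),
      () => skill(ctx),
    )();
}

// Two stand-in middleware tools and a stand-in skill, each recording
// its turn in the pipeline.
const trust = (ctx, next) => { ctx.log.push('trust'); return next(); };
const session = (ctx, next) => { ctx.log.push('session'); return next(); };
const run = compose([trust, session], (ctx) => { ctx.log.push('skill'); return 'ok'; });

const ctx = { log: [] };
run(ctx); // → 'ok'; ctx.log is ['trust', 'session', 'skill']
```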

The same applies to the framework's core. The HTTP server, the trust system, the agent itself: all replaceable. You override any piece of the system by writing a better one. Nothing is locked down.

What Agent Apps isn't

If skills-as-context inside your IDE is working for you, you don't need a runtime. If AI is a small part of what you're building, a lower-level framework gives you more control. If you want an interactive assistant that uses tools on demand, products like Kiro, OpenClaw, Claude Code, and Quick already do that well.

Agent Apps is for building software out of natural language. You can build assistants with it, but that's not the point. The point is that skills can be programs: tested, compiled, deployed, and running without a human in the loop.

What Agent Apps is

We think natural language is becoming a real programming language. Not a replacement for code, but a layer above it. Like any language, it needs structure: a way to run, test, compose, and compile programs written in it. That's what Agent Apps is.

It earns its keep when you have multiple skills that need to work as a system. When you're composing workflows, serving users across channels, running unattended. When skills need to stop being context and start being software.

We're still early in understanding what it means to build software this way. This is a first step.

Get started

We're announcing Agent Apps internally today. It's been in active use by our team and we're ready for more people to try it.

git clone --depth=1 --branch release ssh://git.amazon.com/pkg/AgentApps.git ~/.agent-apps/cli
bash ~/.agent-apps/cli/scripts/install.sh

mkdir my-app && cd my-app
agent-apps init
agent-apps hello --name World

Drop .skill.md files in your project. They're CLI commands immediately. Add web: to serve HTTP. Add email: for inbound messages. One line of config each. Try building a ticket classifier API, an email assistant for your team, or a workflow that mixes AI judgment with deterministic code.

Node 22+. AWS credentials with Bedrock access for AI-powered skills. Code-only skills work without it.

Getting Started · Tutorial · Cookbook · Specification
