Setting Up Your Production Context Graph

This guide walks you through creating and organizing your Production Context Graph documents.

Prerequisites

Access to your Autoheal organization
At least one integration connected (recommended, so you have something to document)

Accessing the Production Context Graph

From the Autoheal sidebar, open one of the pages below, then select one of its tabs:

| Page | Tab | Purpose |
|---|---|---|
| Skills | Agent's instructions | An Agents.md file with general context about your team, services, and procedures for alert and incident response |
| Skills | Integration Skills | How your team uses specific tools (Datadog, GitHub, etc.) |
| Skills | Alert Skills | Step-by-step procedures for specific alerts or common issues |
| Catalog | Organized by entity type | Entities such as services, teams, and people that Autoheal extracts, imports, or syncs from your integrations (Table view), and the relationships between them (Graph view) |
| Decision traces | Accepted root causes | |
| Decision traces | Published postmortems | |
| Decision traces | Learned memories | |

Creating Your First Document

Select a Category Tab

Click the tab that matches the type of document you want to create. For your first document, start with the Alert Skills tab on the Skills page, which gives the agent step-by-step procedures for specific alerts.

Click New Document

Click the + button to create a new document.

Choose a Template

Select a template that fits your content:

Agent's instructions - Overview of your services and environment
Alert Skill - Procedure for a specific alert
Integration Skill - How to use a specific tool
Blank - Start from scratch

Write Your Content

Use the markdown editor to write your document. The editor supports:

Standard markdown formatting
Code blocks with syntax highlighting
Tables for structured information
Live preview

Save

Click Save. Your document is immediately available to the AI agent during investigations.

Document Categories

Alert Skills

Procedures for specific alerts. Include what the alert means, how to investigate, and how to resolve:

```markdown
# High Payment Latency

## What This Means
Payment API p99 latency exceeded 500ms for 5+ minutes.

## Investigation Steps
1. Open "Payments Deep Dive" dashboard in Datadog
2. Check if latency is isolated to specific endpoints
3. Look for correlation with database latency

## Common Causes
- Database connection pool exhaustion
- Downstream payment provider issues
- Recent deployment regression

## Resolution
- If database: Scale API replicas or kill long-running queries
- If provider: Check status.stripe.com, enable fallback
- If deployment: Roll back with `kubectl rollout undo`
```
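Before pasting a draft into the editor, you can check it locally for the sections used in the example above. This is a minimal Python sketch (3.9+), assuming your draft is a markdown string and that you choose to treat the four `##` headings from the example as required; `missing_sections` and `REQUIRED_SECTIONS` are hypothetical names, not part of Autoheal.

```python
import re

# Assumption: the four '##' sections from the High Payment Latency example are
# treated as required. This is a team convention, not an Autoheal rule.
REQUIRED_SECTIONS = ("What This Means", "Investigation Steps", "Common Causes", "Resolution")

# Matches second-level ATX headings only ('## Title', not '### Title').
SECTION_HEADING = re.compile(r"^##[ \t]+(.+?)[ \t]*$", flags=re.MULTILINE)


def missing_sections(markdown_text: str) -> list[str]:
    """Return the required '## ' sections that the draft does not contain."""
    found = {match.group(1) for match in SECTION_HEADING.finditer(markdown_text)}
    return [name for name in REQUIRED_SECTIONS if name not in found]


if __name__ == "__main__":
    draft = (
        "# High Payment Latency\n\n"
        "## What This Means\n"
        "Payment API p99 latency exceeded 500ms for 5+ minutes.\n\n"
        "## Investigation Steps\n"
        '1. Open "Payments Deep Dive" dashboard in Datadog\n'
    )
    print(missing_sections(draft))  # ['Common Causes', 'Resolution']
```

An empty list means every expected section is present; the LLM review still checks the content itself.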

Integration Skills

How your team uses specific tools—dashboards, queries, tagging conventions:

```markdown
# How We Use Datadog

## Tagging Convention
- `env:production`, `env:staging`
- `service:<service-name>`
- `team:platform`, `team:payments`

## Useful Log Queries
- All errors: `status:error env:production`
- Payment failures: `service:payments-api @error.type:PaymentFailed`

## APM Services
- `payments-api-prod` (production payments)
- `user-service-prod` (production auth)
```
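To keep the queries in your skills consistent with your tagging convention, you might generate them instead of typing them by hand. The sketch below is a hypothetical Python helper (`datadog_log_query` is an illustrative name, not a Datadog or Autoheal API). It assumes the `service:`, `status:`, and `env:` tags shown above and relies on Datadog log search combining space-separated terms with AND.

```python
from typing import Optional


def datadog_log_query(
    service: Optional[str] = None,
    status: Optional[str] = None,
    env: str = "production",
    extra: str = "",
) -> str:
    """Build a Datadog log search string from the team's tagging convention.

    Hypothetical helper: tag names follow the convention documented above.
    """
    terms = []
    if service:
        terms.append(f"service:{service}")
    if status:
        terms.append(f"status:{status}")
    terms.append(f"env:{env}")
    if extra:
        # Free-form attribute filters, e.g. "@error.type:PaymentFailed"
        terms.append(extra)
    # Datadog log search ANDs space-separated terms by default.
    return " ".join(terms)


if __name__ == "__main__":
    print(datadog_log_query(service="payments-api", status="error"))
    # service:payments-api status:error env:production
```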

Memories

Learnings from past investigations—what happened, how you found it, how to prevent it:

```markdown
# Memory: Payment Timeouts During Batch Processing

## What Happened
Payment latency spiked to 2s+ every weekday at 2pm PST.

## Root Cause
A batch job was exhausting the database connection pool.

## How We Found It
1. Noticed the pattern only occurred on weekdays at the same time
2. Correlated with batch job schedule
3. Connection pool metrics showed saturation

## Prevention
- Added connection limits to batch job
- New alert: connection pool > 80%
```

Writing Effective Documents

Be Specific

The agent can't use vague guidance. Instead of "check the dashboard," write "open the 'Platform Team Overview' dashboard in Datadog."

Vague: Check the logs for errors.

Specific: Run this Datadog query: `service:payments-api status:error env:production`
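If you want a quick pass over a draft before saving it, a simple phrase check can catch the most common vague instructions. This is a minimal Python sketch (3.9+); the `VAGUE_PATTERNS` list and `flag_vague_lines` function are hypothetical starting points to adapt to the shortcuts your team actually writes.

```python
import re

# Hypothetical starter list of phrases that give the agent nothing concrete
# to act on; extend it with the shortcuts your team tends to write.
VAGUE_PATTERNS = [
    r"\bcheck the dashboards?\b",
    r"\bcheck the logs\b",
    r"\blook into (it|this)\b",
    r"\bthe usual (place|way)\b",
]


def flag_vague_lines(text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that match any vague pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if any(re.search(p, line, flags=re.IGNORECASE) for p in VAGUE_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits


if __name__ == "__main__":
    draft = (
        "Check the logs for errors.\n"
        "Run this Datadog query: service:payments-api status:error env:production\n"
    )
    print(flag_vague_lines(draft))  # [(1, 'Check the logs for errors.')]
```

Each flagged line is a candidate for rewriting with a named dashboard, query, or command.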

Include Commands

Copy-paste-ready commands help both the agent and tired on-call engineers:

```bash
# Roll back the payments-api deployment
kubectl rollout undo deployment/payments-api -n production

# Check current replica count
kubectl get deployment payments-api -n production
```

Explain Why

Don't just document what to do—explain why. This helps the agent make better decisions in novel situations:

"Check the batch job schedule first because time-correlated latency issues are often caused by scheduled jobs competing for database connections."

Name Names

Include specific contacts, channels, and escalation paths:

Slack: #payments-oncall
Escalation: @jane-smith (payments lead)
PagerDuty: Platform → SRE → Engineering Manager

LLM Review

When you save a document, Autoheal generates a review with suggestions for improvement. You'll see feedback like:

Questions about missing information
Suggestions for more specific commands
Recommendations for better organization

Review the feedback and update your document as needed. This helps ensure your production context is as useful as possible.

Importing Existing Documentation

If you have existing skills in Confluence, Notion, or a Git repository:

Export or copy the content as markdown
Create a new document in the appropriate category
Paste and adjust the formatting
Add any missing context (specific dashboard names, commands, contacts)
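If an export bundles several runbooks into one markdown file, splitting it first gives you one document per runbook to paste. This is a minimal Python sketch (3.9+), assuming each runbook in the export starts with a top-level `# ` heading; check that against your actual Confluence, Notion, or Git export. Text before the first heading is dropped, and `#` lines inside fenced code blocks (such as shell comments) are not treated as headings. `split_runbooks` is a hypothetical name.

```python
import re

# Top-level ATX heading: '# Title' (but not '## Title').
HEADING = re.compile(r"^#[ \t]+(.+?)[ \t]*$")


def split_runbooks(markdown_text: str) -> list[tuple[str, str]]:
    """Split exported markdown into (title, body) pairs at each '# ' heading."""
    documents = []
    title = None
    lines = []
    in_fence = False

    for line in markdown_text.splitlines():
        # Toggle on ``` so '# comment' lines inside code blocks are kept as text.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            if title is not None:
                documents.append((title, "\n".join(lines).strip() + "\n"))
            title = match.group(1)
            lines = [line]
        elif title is not None:
            lines.append(line)

    if title is not None:
        documents.append((title, "\n".join(lines).strip() + "\n"))
    return documents


if __name__ == "__main__":
    export = "# High Payment Latency\nSteps here\n\n# Database Failover\nMore steps\n"
    print([title for title, _ in split_runbooks(export)])
    # ['High Payment Latency', 'Database Failover']
```

Returning a list of pairs rather than a dict keeps runbooks that share a title from overwriting each other.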

Tip: Don't try to migrate everything at once. Start with your most critical skills, the ones that get used during real incidents, and expand from there based on the gaps you notice during investigations.

Verifying Your Setup

Test that your Production Context Graph is working:

Go to Investigations
Start a new investigation

Ask about something you've documented:

How should I investigate high payment latency?

The agent should reference your skill in its response

If the agent doesn't find your document, check that:

The document was saved successfully
The content includes the keywords you're asking about
You're asking in a way that matches how you described the issue
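To check whether your question and your document share the same terms, you can compare their words locally. This is a rough Python sketch (3.9+) with a hypothetical stop-word list and exact word matching; it says nothing about how Autoheal actually finds documents. In the example, "investigate" does not match "Investigation", which suggests rewording the question or adding the term to the document.

```python
import re

# Hypothetical stop words that carry no meaning for this check.
STOP_WORDS = {"a", "an", "the", "how", "should", "i", "we", "do", "to",
              "what", "is", "for", "about", "my", "our"}


def missing_keywords(question: str, document: str) -> list[str]:
    """Return meaningful words from the question that never appear in the document."""
    question_words = re.findall(r"[a-z0-9]+", question.lower())
    document_words = set(re.findall(r"[a-z0-9]+", document.lower()))
    missing = [w for w in question_words
               if w not in STOP_WORDS and w not in document_words]
    return list(dict.fromkeys(missing))  # de-duplicate, keep order


if __name__ == "__main__":
    doc = (
        "# High Payment Latency\n\n"
        "## Investigation Steps\n"
        '1. Open "Payments Deep Dive" dashboard in Datadog\n'
    )
    print(missing_keywords("How should I investigate high payment latency?", doc))
    # ['investigate']
```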

Next Steps

Production Context Graph Overview

Learn how the agent uses your Production Context Graph and how it evolves over time.
