Setting Up Your Knowledge Base

This guide walks you through creating and organizing your Knowledge Base documents.

Prerequisites

Access to your Autoheal organization
At least one integration connected (recommended, so you have something to document)

Accessing the Knowledge Base

From the Autoheal sidebar, click Knowledge. You'll see four tabs organizing your content:

Tab	Purpose
Agent's Instructions	General context about your team, services, and procedures
Integrations	How your team uses specific tools (Datadog, GitHub, etc.)
Alert Runbooks	Step-by-step procedures for specific alerts
Memories	Learnings from past investigations

Creating Your First Document

Select a Category Tab

Click the tab matching the type of document you want to create. For your first document, start with Agent's Instructions—this gives the agent general context about your environment.

Click New Document

Click the + button to create a new document.

Choose a Template

Select a template that fits your content:

Team Context - Overview of your services and environment
Alert Runbook - Procedure for a specific alert
Integration Guide - How to use a specific tool
Blank - Start from scratch

Write Your Content

Use the markdown editor to write your document. The editor supports:

Standard markdown formatting
Code blocks with syntax highlighting
Tables for structured information
Live preview

Save

Click Save. Your document is immediately available to the AI agent during investigations.

Document Categories

Agent's Instructions

General context that applies across investigations. Start here with a team overview document:

# Platform Team Context

## Our Services
- **payments-api**: Payment processing (github.com/acme/payments-api)
- **user-service**: Authentication (github.com/acme/user-service)
- **gateway**: API gateway, routes all external traffic

## Key Datadog Dashboards
- "Platform Team Overview" - First stop for any incident
- "Payments Deep Dive" - Payment-related investigations
- "Database Performance" - Slow query issues

## On-Call Contacts
- Primary: #platform-oncall in Slack
- Escalation: Page @platform-lead after 15 minutes

Alert Runbooks

Procedures for specific alerts. Include what the alert means, how to investigate, and how to resolve:

# High Payment Latency

## What This Means
Payment API p99 latency exceeded 500ms for 5+ minutes.

## Investigation Steps
1. Open "Payments Deep Dive" dashboard in Datadog
2. Check if latency is isolated to specific endpoints
3. Look for correlation with database latency

## Common Causes
- Database connection pool exhaustion
- Downstream payment provider issues
- Recent deployment regression

## Resolution
- If database: Scale API replicas or kill long-running queries
- If provider: Check status.stripe.com, enable fallback
- If deployment: Roll back with `kubectl rollout undo`
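
Before pasting a runbook draft into the editor, it can help to confirm it has the same sections as the sample above. The sketch below is a local pre-check only and not part of Autoheal. The function name `missing_sections` and the required-section list are assumptions taken from the sample runbook, so adjust them to your own template.

```python
import re

# Section names copied from the sample runbook above (an assumption, not an Autoheal rule).
REQUIRED_SECTIONS = ["What This Means", "Investigation Steps", "Common Causes", "Resolution"]


def missing_sections(markdown_text: str) -> list[str]:
    """Return the required level-2 headings that the runbook does not contain."""
    # Collect the text of every "## Heading" line, compared case-insensitively.
    found = {
        match.group(1).strip().lower()
        for match in re.finditer(r"^##\s+(.+?)\s*$", markdown_text, flags=re.MULTILINE)
    }
    return [name for name in REQUIRED_SECTIONS if name.lower() not in found]


draft = """# High Payment Latency

## What This Means
Payment API p99 latency exceeded 500ms for 5+ minutes.

## Investigation Steps
1. Open "Payments Deep Dive" dashboard in Datadog
"""

print(missing_sections(draft))  # ['Common Causes', 'Resolution']
```

An empty list means every expected section is present; anything else is a section to write before saving.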

Integrations

How your team uses specific tools—dashboards, queries, tagging conventions:

# How We Use Datadog

## Tagging Convention
- `env:production`, `env:staging`
- `service:<service-name>`
- `team:platform`, `team:payments`

## Useful Log Queries
- All errors: `status:error env:production`
- Payment failures: `service:payments-api @error.type:PaymentFailed`

## APM Services
- `payments-api-prod` (production payments)
- `user-service-prod` (production auth)

Memories

Learnings from past investigations—what happened, how you found it, how to prevent it:

# Memory: Payment Timeouts During Batch Processing

## What Happened
Payment latency spiked to 2s+ every weekday at 2pm PST.

## Root Cause
Batch job exhausting database connection pool.

## How We Found It
1. Noticed pattern only occurred weekdays at same time
2. Correlated with batch job schedule
3. Connection pool metrics showed saturation

## Prevention
- Added connection limits to batch job
- New alert: connection pool > 80%

Writing Effective Documents

Be Specific

The agent can't use vague guidance. Instead of "check the dashboard," write "open the 'Platform Team Overview' dashboard in Datadog."

Vague: Check the logs for errors.

Specific: Run this Datadog query: `service:payments-api status:error env:production`

Include Commands

Copy-paste-ready commands help both the agent and tired on-call engineers:

# Roll back the payments-api deployment
kubectl rollout undo deployment/payments-api -n production

# Check current replica count
kubectl get deployment payments-api -n production

Explain Why

Don't just document what to do—explain why. This helps the agent make better decisions in novel situations:

"Check the batch job schedule first because time-correlated latency issues are often caused by scheduled jobs competing for database connections."

Name Names

Include specific contacts, channels, and escalation paths:

Slack: #payments-oncall
Escalation: @jane-smith (payments lead)
PagerDuty: Platform → SRE → Engineering Manager

LLM Review

When you save a document, Autoheal generates a review with suggestions for improvement. You'll see feedback like:

Questions about missing information
Suggestions for more specific commands
Recommendations for better organization

Review the feedback and update your document as needed. This helps ensure your knowledge base is as useful as possible.

Importing Existing Documentation

If you have existing runbooks in Confluence, Notion, or a Git repository:

1. Export or copy the content as markdown
2. Create a new document in the appropriate category
3. Paste and adjust the formatting
4. Add any missing context (specific dashboard names, commands, contacts)

Tip: Don't try to migrate everything at once. Start with your most critical runbooks, the ones that get used during real incidents. Expand from there based on the gaps you notice during investigations.
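
The last import step, adding missing context, is easy to skip when migrating many files. As a rough aid, the following sketch scans a folder of exported markdown files and reports which ones appear to lack a quoted dashboard name, a backticked command, or a `#channel` / `@person` contact. It runs locally and does not call Autoheal. The patterns and the names `missing_context` and `audit_export` are illustrative assumptions, so tune them to your conventions.

```python
import re
import tempfile
from pathlib import Path

# Heuristic patterns (assumptions): a quoted name followed by "dashboard",
# any inline code span, and a Slack-style #channel or @handle.
CHECKS = {
    "dashboard name": re.compile(r'"[^"\n]+"\s+dashboard', re.IGNORECASE),
    "command": re.compile(r"`[^`\n]+`"),
    "contact": re.compile(r"(?<!\S)[#@][A-Za-z][\w-]*"),
}


def missing_context(markdown_text: str) -> list[str]:
    """Return the labels of the checks that find nothing in the text."""
    return [label for label, pattern in CHECKS.items() if not pattern.search(markdown_text)]


def audit_export(export_dir: Path) -> dict[str, list[str]]:
    """Map each markdown file under export_dir to the context it appears to lack."""
    report = {}
    for path in sorted(export_dir.rglob("*.md")):
        gaps = missing_context(path.read_text(encoding="utf-8"))
        if gaps:
            report[path.relative_to(export_dir).as_posix()] = gaps
    return report


if __name__ == "__main__":
    # Demo on a throwaway directory standing in for an exported runbook folder.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "vague.md").write_text("# Latency\nCheck the dashboard and the logs.\n", encoding="utf-8")
        (root / "specific.md").write_text(
            '# Latency\nOpen "Payments Deep Dive" dashboard.\n'
            "Run `kubectl rollout undo deployment/payments-api -n production`.\n"
            "Slack: #payments-oncall\n",
            encoding="utf-8",
        )
        print(audit_export(root))  # {'vague.md': ['dashboard name', 'command', 'contact']}
```

Files that do not appear in the report passed all three heuristics. They may still need a manual read, since a pattern match does not prove the context is correct.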

Verifying Your Setup

Test that your knowledge base is working:

1. Go to Investigations
2. Start a new investigation
3. Ask about something you've documented, for example: "How should I investigate high payment latency?"
4. Check that the agent references your runbook in its response

If the agent doesn't find your document, check that:

The document was saved successfully
The content includes the keywords you're asking about
Your question uses the same terms the document uses to describe the issue
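
To work through the keyword check in the list above, you can compare the words in your question with the words in the document. This guide does not describe how Autoheal matches documents to questions, so the sketch below is only a literal word-overlap check under that assumption. The stop-word list and the function names `keywords` and `missing_keywords` are made up for illustration.

```python
import re

# Small stop-word list for the demo; an assumption, not Autoheal's behavior.
STOP_WORDS = {
    "a", "an", "the", "how", "should", "i", "do", "we", "is", "are",
    "to", "for", "of", "in", "what", "why",
}


def keywords(text: str) -> set[str]:
    """Lowercased words in the text, minus common stop words."""
    return {w for w in re.findall(r"[a-z0-9][a-z0-9-]*", text.lower()) if w not in STOP_WORDS}


def missing_keywords(question: str, document: str) -> list[str]:
    """Question keywords that never appear in the document, sorted alphabetically."""
    return sorted(keywords(question) - keywords(document))


runbook = """# High Payment Latency
Payment API p99 latency exceeded 500ms for 5+ minutes.
"""

print(missing_keywords("How should I investigate high payment latency?", runbook))  # ['investigate']
print(missing_keywords("Why are checkouts slow?", runbook))  # ['checkouts', 'slow']
```

If the second kind of question is how your team actually talks about the problem, consider adding those terms ("checkout", "slow") to the runbook's description.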

Next Steps

Knowledge Base Overview

Learn how the agent uses your knowledge base and how it evolves over time.
