Getting Started
This guide will help you set up Autoheal and run your first AI-powered incident investigation.
Prerequisites
Before you begin, ensure you have:
- An Autoheal account (contact support@autoheal.ai for access)
- Admin role in your organization
- Credentials for at least one observability tool (Datadog, Grafana, etc.)
Step 1: Log In to Autoheal
Navigate to your Autoheal instance URL (e.g., https://<tenant>.autoheal.ai) and click Log In.
Sign in using your Email / Password or your organization's SSO provider.
Step 2: Connect Your First Integration
Integrations allow the AI agent to query your observability tools. Let's connect your first one.
From the sidebar, click Integrations.
Select the integration you want to connect. We recommend starting with your primary monitoring tool (e.g., Datadog, Grafana).
Provide the required API credentials. See the Integrations section for detailed setup instructions for each tool.
Tip: make an observability tool your first integration; it provides the richest context for incident investigation.
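If you are connecting Datadog, it can save a round trip to confirm your API key is valid before pasting it into the integration form. The snippet below is an optional pre-check using Datadog's public key-validation endpoint (it assumes the US1 site, `api.datadoghq.com`; other Datadog sites use a different host, and the `DD_API_KEY` variable name is just a local convention here):

```shell
# Optional sanity check: verify a Datadog API key before entering it
# in Autoheal. Skips gracefully if no key is exported in this shell.
if [ -n "${DD_API_KEY:-}" ]; then
  # Datadog's validate endpoint returns {"valid":true} for a good key.
  curl -s -H "DD-API-KEY: ${DD_API_KEY}" \
    "https://api.datadoghq.com/api/v1/validate"
  result="checked"
else
  result="skipped: DD_API_KEY not set"
  echo "$result"
fi
```

A `403` response or `{"valid":false}` means the key is wrong or revoked; fix it in Datadog before continuing with the Autoheal setup.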
Step 3: Set Up Your Knowledge Base
The Knowledge Base stores runbooks, procedures, and learnings that the AI agent can reference during investigations.
From the sidebar, click Knowledge.
Click New Document and select Runbook as the template.
Document a common incident type your team handles. Include:
- Symptoms and how to identify the issue
- Step-by-step remediation procedures
- Escalation paths if needed
Save your runbook. It's now available to the AI agent.
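As a sketch, a runbook following the structure above might look like this (the service name, thresholds, and steps are purely illustrative):

```markdown
# Runbook: Checkout Service High Latency

## Symptoms
- p95 latency above 2s on the checkout service
- Elevated 5xx rate on downstream payment calls

## Identification
1. Check the checkout service latency dashboard.
2. Compare the spike's start time against recent deployment timestamps.

## Remediation
1. If the spike correlates with a deployment, roll it back.
2. If latency persists, scale up the checkout service replicas.

## Escalation
- Page the payments on-call if the error rate stays above 5% for 10 minutes.
```

Concrete symptoms and step-by-step commands give the agent the most to work with; vague prose ("check the usual dashboards") is much harder for it to act on.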
Step 4: Start an Investigation
Now you're ready to use the AI OnCall Agent!
From the sidebar, click Investigations.
Type a description of the incident you want to investigate. For example:
High latency alerts firing on the checkout service.
Started around 2:30 PM UTC.
The AI agent will:
- Query your connected integrations for relevant data
- Search your knowledge base for related runbooks
- Analyze patterns and correlations
- Present findings and recommendations
Ask follow-up questions to dive deeper:
Can you check the database metrics during that time?
Are there any similar incidents in the past week?
Example Investigation
Here's what a typical investigation looks like:
User Query
"We're seeing 5xx errors spike on the payment service. Can you investigate?"
Agent Response
The agent will:
- Query Datadog/Grafana for error rates and latency metrics
- Pull recent logs from the payment service
- Check for recent deployments or changes
- Search the knowledge base for payment service runbooks
- Present a summary with potential root causes
Follow-up Questions
You can then ask:
- "What changed in the last deployment?"
- "Show me the database connection pool metrics"
- "Are other services affected?"
Next Steps
- Add more observability tools for richer investigations.
- Document your runbooks and procedures.
- Add your team and configure permissions.
- See all available integrations.
Getting Help
Need assistance? Contact our support team at support@autoheal.ai or check the detailed guides in this documentation.