Tailscale Integration

The Tailscale integration lets Autoheal reach private services that live on your Tailscale tailnet — internal Grafana, Temporal, HTTP APIs, databases, or anything else only reachable from inside your network. During an investigation, the agent's sandbox joins your tailnet as a short-lived node and routes traffic through it, so you can keep those services off the public internet.

Unlike a data-source integration, Tailscale is a network connector: it does not add new query tools on its own. Instead, it provides the connectivity layer that other integrations and CLI tools (curl, temporal, psql, kubectl, gh) use to reach hosts that would otherwise be unreachable from the sandbox.

How It Works

  1. When a sandbox starts, Autoheal launches a background tailscaled process inside it, using the auth key from your integration.
  2. tailscaled runs in userspace networking mode (no elevated kernel privileges required) and joins your tailnet as an ephemeral node named autoheal-<id>.
  3. It exposes a local SOCKS5 / HTTP proxy on 127.0.0.1:1055. Sandbox tools route through this proxy to reach tailnet hosts, with DNS resolved by Tailscale (MagicDNS / split-DNS).
  4. Reachability is governed entirely by your Tailscale ACLs — the sandbox can only reach hosts you grant its tag.
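
The startup sequence above can be sketched roughly as follows. This is an illustration under stated assumptions, not Autoheal's actual launcher: the helper names (`tailscaled_command`, `tailscale_up_command`, `proxy_env`) are hypothetical, the flags are the ones Tailscale documents for userspace networking, and the in-memory state setting is a guess that matches an ephemeral node.

```python
PROXY_ADDR = "127.0.0.1:1055"  # local proxy address described in step 3


def tailscaled_command(proxy_addr: str = PROXY_ADDR) -> list[str]:
    """Build a tailscaled invocation that needs no TUN device."""
    return [
        "tailscaled",
        "--tun=userspace-networking",
        f"--socks5-server={proxy_addr}",
        f"--outbound-http-proxy-listen={proxy_addr}",
        "--state=mem:",  # assumption: keep node state in memory only
    ]


def tailscale_up_command(auth_key: str, sandbox_id: str) -> list[str]:
    """Build the command that joins the tailnet as autoheal-<id>."""
    return [
        "tailscale",
        "up",
        f"--auth-key={auth_key}",
        f"--hostname=autoheal-{sandbox_id}",
    ]


def proxy_env(proxy_addr: str = PROXY_ADDR) -> dict[str, str]:
    """Environment variables that point common CLI tools at the local proxy."""
    return {
        # socks5h asks clients such as curl to resolve hostnames via the proxy
        "ALL_PROXY": f"socks5h://{proxy_addr}",
        "HTTP_PROXY": f"http://{proxy_addr}",
        "HTTPS_PROXY": f"http://{proxy_addr}",
    }
```

Not every tool honors these variables; some (psql, for example) may need a wrapper or a tool-specific proxy setting, so treat `proxy_env` as a starting point.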

Architecture

Autoheal Sandbox
  │  tailscaled (userspace networking) → joins your tailnet
  │  local SOCKS5/HTTP proxy on 127.0.0.1:1055
  ▼
Your Tailnet (WireGuard mesh, encrypted, outbound-only from the sandbox)
  │  subnet routes / direct nodes, scoped by ACL tag
  ▼
Private services (internal Grafana, Temporal, APIs; no public endpoint)

  • The sandbox node is ephemeral: it automatically removes itself from your tailnet shortly after the sandbox stops.
  • No inbound ports, no public IP, and no NET_ADMIN capability are required — the connection is outbound-only.
  • The auth key is stored encrypted (Vault/secrets manager) and never written to disk inside the sandbox image.

Capabilities

Once connected, the agent's sandbox can:

| Capability | Description |
| --- | --- |
| Reach private hosts | Connect to any tailnet host or advertised subnet route the sandbox's ACL tag permits |
| MagicDNS resolution | Resolve internal hostnames (e.g. grafana.internal.example.com) through Tailscale DNS |
| Route any CLI | curl, temporal, psql, kubectl, gh, and other tools reach private endpoints via the local proxy |
| Ephemeral, tagged identity | Each sandbox joins as a tagged, short-lived node, auto-removed when it goes offline |

Prerequisites

  • A Tailscale tailnet (any plan that supports auth keys and tags).
  • Admin access to the Tailscale admin console to mint an auth key and edit ACLs.
  • A tag defined in your ACL policy for Autoheal sandboxes (e.g. tag:autoheal-sandbox).
  • The private services you want to reach must already be on the tailnet — either as Tailscale nodes or behind a subnet router.

Setup

Step 1: Define a tag for Autoheal sandboxes

In the Access controls editor, add a tag owner so you can mint keys carrying that tag:

{
  "tagOwners": {
    "tag:autoheal-sandbox": ["autogroup:admin"]
  }
}

Then grant that tag access to the hosts you want investigations to reach. For example, to allow HTTPS to an internal subnet fronted by a subnet router:

{
  "grants": [
    {
      "src": ["tag:autoheal-sandbox"],
      "dst": ["tag:internal-router"],
      "ip": ["443"]
    }
  ]
}

Save the policy. A broad "allow all" grant also works while you are testing, but scope it down to least privilege before using the integration in production.
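
To keep a testing-only "allow all" rule from lingering, a small check like the one below can flag grants that give the sandbox tag unrestricted reach. It is a sketch, not a Tailscale tool: the function name `broad_grants` is hypothetical, and it assumes the policy has been exported as strict JSON (the policy editor accepts HuJSON with comments and trailing commas, which `json.loads` would reject).

```python
import json

SANDBOX_TAG = "tag:autoheal-sandbox"


def broad_grants(policy: dict, tag: str = SANDBOX_TAG) -> list[dict]:
    """Return grants that give `tag` access to every destination or every port."""
    flagged = []
    for grant in policy.get("grants", []):
        src = grant.get("src", [])
        if tag not in src and "*" not in src:
            continue  # this grant does not apply to the sandbox tag
        if "*" in grant.get("dst", []) or "*" in grant.get("ip", []):
            flagged.append(grant)
    return flagged


# Example policy: the scoped grant from above plus a leftover allow-all rule.
policy = json.loads("""
{
  "tagOwners": {"tag:autoheal-sandbox": ["autogroup:admin"]},
  "grants": [
    {"src": ["tag:autoheal-sandbox"], "dst": ["tag:internal-router"], "ip": ["443"]},
    {"src": ["*"], "dst": ["*"], "ip": ["*"]}
  ]
}
""")

for grant in broad_grants(policy):
    print("overly broad grant:", json.dumps(grant))
```

Only the second grant is reported; the scoped HTTPS grant passes the check.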

Step 2: Create an auth key

  1. Go to Settings → Keys in the Tailscale admin console.
  2. Click Generate auth key.
  3. Enable these options:
    • Reusable — so every sandbox can authenticate with the same key.
    • Ephemeral — so sandbox nodes auto-remove when they go offline.
    • Tags — add tag:autoheal-sandbox.
    • Pre-authorized — if device approval is enabled on your tailnet.
  4. Set an expiry (e.g. 90 days) and click Generate key. Copy the tskey-auth-… value immediately — it is shown only once.

Note: You must be signed in as an identity that owns the tag (per tagOwners above), or Tailscale will reject the key as invalid at connect time.
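
If you would rather mint the key programmatically, the same options map onto the key-creation request in the Tailscale API. The sketch below only builds the request body and does not send it. It assumes the `capabilities.devices.create` shape used by the API's key endpoint; the function name and the description string are made up for illustration, so check the field names against the current API reference before relying on them.

```python
import json


def auth_key_request(
    tag: str = "tag:autoheal-sandbox",
    expiry_days: int = 90,
    preauthorized: bool = True,
) -> dict:
    """Build a request body for a reusable, ephemeral, tagged auth key."""
    return {
        "capabilities": {
            "devices": {
                "create": {
                    "reusable": True,    # every sandbox uses the same key
                    "ephemeral": True,   # nodes auto-remove when offline
                    "preauthorized": preauthorized,
                    "tags": [tag],
                }
            }
        },
        "expirySeconds": expiry_days * 24 * 60 * 60,
        "description": "Autoheal sandbox key",  # hypothetical label
    }


print(json.dumps(auth_key_request(), indent=2))
```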

Step 3: Add the integration in Autoheal

  1. Go to Settings → Integrations and select Tailscale.
  2. Enter a Name (e.g. "Production Tailnet").
  3. Paste the Tailscale Auth Key.
  4. Click Save. (There is no connection test — a network connector is verified the next time a sandbox starts.)

Step 4: Verify on the next investigation

Start an investigation. The sandbox will join your tailnet on startup. Confirm in the Tailscale admin Machines page that a node named autoheal-<id> appears (and disappears again after the sandbox stops).
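
The same check can be scripted against a device listing instead of the Machines page. The helper below is a hypothetical sketch that filters an already-fetched list of device records; it assumes each record carries `hostname` and `tags` fields, as the Tailscale device-list API returns them, and leaves fetching the list to you.

```python
def autoheal_nodes(
    devices: list[dict],
    prefix: str = "autoheal-",
    tag: str = "tag:autoheal-sandbox",
) -> list[dict]:
    """Return devices that look like Autoheal sandbox nodes."""
    return [
        device
        for device in devices
        if device.get("hostname", "").startswith(prefix)
        and tag in device.get("tags", [])
    ]


# Illustrative records only; real listings contain many more fields.
devices = [
    {"hostname": "autoheal-abc123", "tags": ["tag:autoheal-sandbox"]},
    {"hostname": "laptop-jane"},
    {"hostname": "autoheal-lookalike", "tags": ["tag:other"]},
]

for node in autoheal_nodes(devices):
    print("sandbox node online in tailnet:", node["hostname"])
```

Matching on both the hostname prefix and the tag avoids counting an unrelated machine that merely shares the naming pattern.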

Configuration Fields

| Field | Required | Notes |
| --- | --- | --- |
| Tailscale Auth Key | Yes | Reusable + Ephemeral + tagged key. Stored encrypted. Drives the connection. |
| Tailnet | No | Your tailnet domain (e.g. your-org.ts.net). Display-only. If left blank, the integration still works; membership is determined by the auth key. |
| Allowed Host Suffixes | No | Comma-separated host suffixes (e.g. .internal.example.com,.ts.net) for a future host allowlist. Recorded but not enforced today; host-suffix enforcement is reserved for a later release. Leave blank for no restriction. |

Verifying the Connection

From inside a running sandbox, the agent can check tailnet status and reachability:

Show me the tailscale status in the sandbox
Curl https://grafana.internal.example.com/login through the tailnet and report the HTTP status

A healthy connection shows the sandbox node with a 100.x.y.z tailnet IP (Tailscale assigns IPv4 addresses from the 100.64.0.0/10 range) and successful requests to your private hosts.
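
As a rough illustration of what "healthy" means here, the sketch below inspects the parsed output of `tailscale status --json`. The function name is hypothetical, and the field names (`BackendState`, `Self`, `TailscaleIPs`) are assumed from that command's JSON output, so verify them against the Tailscale version running in your sandbox.

```python
import ipaddress

# Tailscale hands out IPv4 addresses from the CGNAT range.
TAILSCALE_RANGE = ipaddress.ip_network("100.64.0.0/10")


def looks_healthy(status: dict) -> bool:
    """True if the node is running and holds an IPv4 address in the tailnet range."""
    self_node = status.get("Self", {})
    addresses = [ipaddress.ip_address(ip) for ip in self_node.get("TailscaleIPs", [])]
    has_tailnet_ip = any(
        addr.version == 4 and addr in TAILSCALE_RANGE for addr in addresses
    )
    return status.get("BackendState") == "Running" and has_tailnet_ip


# Illustrative status document, trimmed to the fields used above.
sample = {
    "BackendState": "Running",
    "Self": {
        "HostName": "autoheal-abc123",
        "TailscaleIPs": ["100.101.102.103", "fd7a:115c:a1e0::1"],
    },
}

print("healthy" if looks_healthy(sample) else "not connected")
```

This only covers the node's own state; reachability of a specific private host still depends on your ACL grants and subnet routes.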

Security Considerations

  • Least privilege via ACLs. The sandbox can reach only what its tag (tag:autoheal-sandbox) is granted in your ACL policy. Scope the grant to the specific hosts and ports investigations need — don't rely on a broad allow-all rule in production.
  • Ephemeral nodes. Sandbox nodes are short-lived and auto-removed, so you won't accumulate stale devices, and a sandbox can't retain tailnet access after it stops.
  • Outbound-only. The sandbox initiates the connection; nothing inbound is opened to your network.
  • Encrypted secret storage. The auth key is marked sensitive and stored in your secrets backend, never in the sandbox image or logs.
  • Key hygiene. Use a reasonable expiry and rotate the auth key periodically. Revoke it from the Tailscale admin if it is ever exposed.

Limitations

  • Reachability is bounded by your Tailscale ACLs and any subnet routes you advertise; the integration cannot reach a host your tag isn't granted.
  • The Allowed Host Suffixes field is collected but not yet enforced — host scoping today is done in your Tailscale ACL policy.
  • Ephemerality is a property of the auth key (set in the admin console), not a per-sandbox setting.

Troubleshooting

| Symptom | Likely Cause | Fix |
| --- | --- | --- |
| invalid key … not valid at startup | Auth key revoked/expired, or minted by an identity that doesn't own the tag | Mint a fresh key signed in as a tag owner; update the integration |
| Sandbox joins but can't reach a host | ACL grant missing, or no subnet route to the host | Add a grants rule for tag:autoheal-sandbox → the host/port; advertise the subnet via a subnet router |
| No autoheal-* node appears in admin | Integration not saved, or no sandbox has started since saving | Save the integration, then start a new investigation |
| Internal hostname won't resolve | MagicDNS / split-DNS not configured for the domain | Configure DNS for the domain in your Tailscale admin DNS settings |

Example

Once connected, you can ask the agent to investigate using private services:

Walk through the Frontend gRPC Latency dashboard on our internal Grafana
Check the running workflows in the agents namespace on our internal Temporal