Pipeline agents execute work close to private data sources or regulated networks. When an agent goes offline or reports degraded health, pipelines that target that agent pool stall. This page walks through heartbeat failures, network rules, logs, and safe restarts.

Agent offline

Symptoms: runs queue indefinitely, the UI shows the agent as offline, or no eligible workers are reported.
1. Confirm scope

Identify which pools, tags, and environments the failing pipeline requires. A mismatch sends work to an empty pool while another pool stays healthy.
2. Check last heartbeat

In Settings → Agents (or your org’s Compute view), open the agent record. A last-seen timestamp older than the lease interval means the control plane considers the host dead.
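As a minimal sketch of this check, assuming heartbeat timestamps are available as epoch seconds and a hypothetical 120-second lease interval (read the real interval from your agent configuration):

```shell
# Sketch: compute heartbeat age and flag the agent as presumed dead when
# the age exceeds the lease interval. The 120 s default is an assumption.
heartbeat_age() {
  local last_seen_epoch=$1 now_epoch=$2
  echo $(( now_epoch - last_seen_epoch ))
}

agent_presumed_dead() {
  local age=$1 lease_interval=${2:-120}
  [ "$age" -gt "$lease_interval" ]
}
```

For example, `agent_presumed_dead "$(heartbeat_age "$last_seen" "$(date +%s)")" 120 && echo offline` flags a host whose heartbeat is stale.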
3. Verify process and service

On the host, confirm the agent service is running and not crash-looping. Inspect systemd, Windows Service, or container restart counts.
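On systemd hosts, one way to sketch the crash-loop test is from the unit's restart counter; the service name `planasonix-agent` and the threshold of 3 are assumptions to adapt to your install:

```shell
# Sketch: decide whether a unit is crash-looping from systemd's restart
# counter. On a real host, feed this from:
#   systemctl show -p NRestarts planasonix-agent
crash_looping() {
  local show_output=$1 threshold=${2:-3}
  local restarts=${show_output#NRestarts=}   # strip the "NRestarts=" prefix
  [ "$restarts" -gt "$threshold" ]
}
```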
4. Validate clock and TLS

Large clock skew breaks token validation. Ensure NTP is healthy and TLS inspection proxies present a trusted CA to the agent.
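A hedged sketch of the skew check, assuming a 300-second token tolerance (verify the allowed skew with your identity provider); the authoritative time could come from an NTP query or the Date header of an HTTPS response:

```shell
# Sketch: flag clock skew large enough to break token validation.
clock_skew_ok() {
  local local_epoch=$1 server_epoch=$2 tolerance=${3:-300}
  local skew=$(( local_epoch - server_epoch ))
  [ "${skew#-}" -le "$tolerance" ]   # strip leading "-" for absolute value
}
```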

Heartbeat failures

Heartbeats are lightweight HTTPS calls to the Planasonix control plane.
  • Intermittent failures often trace to corporate proxies or satellite links—tune keepalive and idle timeouts on middleboxes.
  • 403 on heartbeat usually means registration token rotation or revoked enrollment—re-enroll with a fresh token from the UI.
  • Certificate pinning or custom trust stores on the agent host must include current Planasonix intermediate CAs after platform cert updates.
Graph heartbeat latency alongside packet loss on the host; rising loss predicts offline state before user-visible job failures spike.
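The failure modes above can be collapsed into a small triage table. This is a sketch, not the agent's actual logic; the probe command in the comment is an assumption, and the endpoint path is deliberately left as a placeholder:

```shell
# Sketch: map a heartbeat probe's HTTP status to a next step. A probe
# might look like:
#   curl -s -o /dev/null -w '%{http_code}' https://<control-plane>/...
triage_heartbeat() {
  case $1 in
    200) echo "ok" ;;
    403) echo "re-enroll: token rotated or enrollment revoked" ;;
    000) echo "no TLS/TCP: check proxy, trust store, and egress rules" ;;
    5*)  echo "control-plane side: retry with backoff" ;;
    *)   echo "unexpected status $1: capture debug logs" ;;
  esac
}
```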

Network configuration and firewall rules

Allow egress HTTPS from the agent to Planasonix API endpoints documented for your region. For private connectivity options, follow the VPC or PrivateLink guide your account team provides.
Only outbound TCP 443 to control-plane hosts is required; no inbound connections from the internet to the agent are needed for standard enrollment.
Asymmetric routing (different paths for forward and return traffic) causes random TCP failures. Verify SNAT and firewall rules as a pair with your network team.
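One concrete shape for the egress rule is an nftables fragment like the following. This is an illustrative sketch only: the 203.0.113.x addresses are TEST-NET placeholders, not real Planasonix endpoints; substitute the region endpoints documented for your tenant.

```
table inet pipeline_egress {
  chain output {
    type filter hook output priority 0; policy drop;
    ct state established,related accept
    udp dport 53 accept comment "DNS for endpoint resolution"
    tcp dport 443 ip daddr { 203.0.113.10, 203.0.113.11 } accept comment "control plane"
  }
}
```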

Log collection

Enable debug logging only while investigating—redact tokens before you attach files to tickets.
  • Linux: journalctl -u planasonix-agent (service name may differ).
  • Windows: Event Log or the install directory logs folder.
  • Container: Mount a volume for logs or ship to your stdout aggregator.
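Before attaching logs to a ticket, token redaction can be sketched with sed. The patterns below are assumptions; extend them to match whatever secrets actually appear in your agent's logs:

```shell
# Sketch: redact bearer tokens and token= query parameters from log text
# read on stdin.
redact_log() {
  sed -E \
    -e 's|(Bearer )[A-Za-z0-9._~+/=-]+|\1[REDACTED]|g' \
    -e 's|(token=)[^[:space:]&]+|\1[REDACTED]|g'
}
```

For example: `journalctl -u planasonix-agent | redact_log > agent.log` (service name may differ).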
Rotated logs or temporary spill files can fill the disk, after which the agent fails heartbeats. Monitor free space and inode usage on small VMs.
After upgrades or SELinux policy changes, confirm the agent user still has read/write access to its workspace and cache directories.
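These two failure modes can be combined into a pre-flight check. The 90% threshold is an assumption, the path is whatever workspace your agent actually uses, and `df --output` requires GNU coreutils:

```shell
# Sketch: check disk usage, inode usage, and workspace writability before
# blaming the network for failed heartbeats.
agent_preflight() {
  local dir=$1 max_used_pct=${2:-90}
  local used_pct inode_pct
  used_pct=$(df --output=pcent "$dir" | tail -1 | tr -dc '0-9')
  inode_pct=$(df --output=ipcent "$dir" | tail -1 | tr -dc '0-9')
  inode_pct=${inode_pct:-0}   # some filesystems report no inode data
  [ "$used_pct" -lt "$max_used_pct" ] || { echo "disk ${used_pct}% full"; return 1; }
  [ "$inode_pct" -lt "$max_used_pct" ] || { echo "inodes ${inode_pct}% used"; return 1; }
  [ -w "$dir" ] || { echo "workspace not writable"; return 1; }
  echo "ok"
}
```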

Agent restart procedures

1. Drain work

Mark the agent as draining in the UI if supported so new leases stop landing on the host.
2. Restart the service

Use your standard runbook (systemctl restart, the Services snap-in, or kubectl rollout restart).
3. Verify enrollment

Confirm heartbeat resumes and version matches the recommended release for your tenant.
4. Resume traffic

Clear draining state and watch the next scheduled or manual run complete end to end.
For clustered agents, restart one node at a time so you retain capacity during the health check period.
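The one-node-at-a-time procedure can be sketched as a skeleton. `drain_node`, `restart_node`, `node_healthy`, and `resume_node` are placeholders for your UI/API calls and runbook commands, and the 30-attempt health wait is an assumption:

```shell
# Sketch: rolling restart across a clustered pool, one node at a time,
# waiting for each node to pass health checks before moving on.
rolling_restart() {
  local node
  for node in "$@"; do
    drain_node "$node"      # stop new leases landing on the host
    restart_node "$node"
    local tries=0
    until node_healthy "$node"; do
      tries=$((tries + 1))
      [ "$tries" -lt 30 ] || { echo "$node failed health check" >&2; return 1; }
      sleep 2
    done
    resume_node "$node"     # clear draining so leases land again
  done
}
```

Because each node is fully healthy before the next restart begins, pool capacity drops by only one node at any point.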

Related pages

  • Compute — pools, sizing, and agent registration overview.
  • Connection troubleshooting — diagnose database paths from agent egress IPs.