Alert Troubleshooting

This guide covers common alert scenarios and their resolutions.

Critical Alert Playbooks

Device Offline (CONN_001)

Immediate Actions:

Check Console - Verify device shows as offline
Ping device (if on same network)
```
ping {device-ip}
```
Check physical connections - power, network cable, indicator lights

If device unreachable:

Check network switch/router
Verify VLAN configuration
Check PoE power budget

If device reachable but still showing offline:

Access camera web UI
Check Anava ACAP status (Apps → Anava Agent)
Restart ACAP if stopped
Check that the MQTT configuration hasn't changed
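
The reachable/unreachable split above can be scripted as a first pass. This is a minimal sketch, assuming a Linux host with iputils `ping` and `curl`. The function names `triage` and `check_device` are hypothetical and not part of any Anava tooling, and it is an assumption that the camera web UI answers on HTTPS.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of any Anava tooling.

# Map probe results (0 = success) to the next playbook branch.
triage() {
  local ping_status="$1" web_status="$2"
  if [ "$ping_status" -ne 0 ]; then
    echo "unreachable: check switch/router, VLAN, PoE budget"
  elif [ "$web_status" -ne 0 ]; then
    echo "pingable but web UI did not answer: verify the web UI port"
  else
    echo "reachable: check ACAP status and MQTT configuration"
  fi
}

# Probe one device. Assumes Linux iputils ping (-W is in seconds) and
# that the camera web UI answers on HTTPS (an assumption).
check_device() {
  local ip="$1" ping_status web_status
  ping -c 3 -W 2 "$ip" >/dev/null 2>&1
  ping_status=$?
  curl --silent --insecure --max-time 5 --output /dev/null "https://$ip/"
  web_status=$?
  triage "$ping_status" "$web_status"
}
```

Usage: `check_device 192.0.2.10` prints which branch of the playbook to follow next. The output is only a hint; the Console status remains the source of truth.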

Certificate Validation Failed (SEC_001)

This is a security-critical alert. Investigate immediately.

Check certificate expiry

Console → Devices → [Device] → Certificates

Verify CA chain
- Device should trust Anava root CA
- Check for CA mismatch
Investigate network path
- Could indicate MITM attempt
- Check for proxy interception
- Verify DNS resolution
If legitimate expiry:
- Initiate certificate rotation
- Console → Devices → Actions → Rotate Certificate
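
If you have a copy of the device certificate as a PEM file, the expiry check can also be done locally. This is a sketch under assumptions: it needs the `openssl` CLI and GNU `date`, the file path is whatever you exported, and `days_until` and `cert_days_left` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; assumes GNU date and the openssl CLI.

# Whole days between "now" (epoch seconds, defaults to the current time)
# and an openssl-style notAfter string such as "Jan 11 00:00:00 2030 GMT".
days_until() {
  local not_after="$1" now="${2:-$(date +%s)}" expiry
  [ -n "$not_after" ] || return 1
  expiry=$(date -u -d "$not_after" +%s) || return 1
  echo $(( (expiry - now) / 86400 ))
}

# Days left on a PEM certificate file.
cert_days_left() {
  local pem="$1" not_after
  not_after=$(openssl x509 -in "$pem" -noout -enddate | cut -d= -f2)
  days_until "$not_after"
}
```

Usage: `cert_days_left device-cert.pem`. A value of 0 or below means the certificate expires within a day or has already expired, which points to legitimate expiry rather than a CA mismatch or interception.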

Unauthorized Broker Attempt (SEC_002)

Security incident - treat as high priority.

Check ConfigGuardian logs

Console → Devices → [Device] → Logs → Filter: ConfigGuardian

Identify the attempted broker
- Alert context shows attempted hostname/IP
Determine source:
- Manual configuration change?
- Network-level redirect?
- Malicious tampering?
Response:
- Verify device is now connected to correct broker
- Check for other affected devices
- Review network security

Common Scenarios

Cluster of Devices Goes Offline

Pattern: Multiple devices in the same location go offline simultaneously

Likely causes:

Network switch/router failure
PoE switch overload
ISP/WAN outage
DHCP server issue

Investigation:

Check if all devices are in same subnet/location
Verify network infrastructure status
Check DHCP lease availability
Contact network team
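
To check quickly whether the offline devices share a subnet, their addresses can be grouped and counted. This is a minimal sketch: `offline_by_subnet` is a hypothetical helper, the input is assumed to be a plain list of IPv4 addresses you collected yourself, and grouping by /24 is an assumption that may not match your addressing plan.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads IPv4 addresses of offline devices on stdin,
# one per line, and prints "<count> <subnet>" with the largest group first.
# Grouping by /24 is an assumption; adjust to your addressing plan.
offline_by_subnet() {
  awk -F. 'NF == 4 { count[$1 "." $2 "." $3 ".0/24"]++ }
           END { for (s in count) print count[s], s }' | sort -k1,1nr -k2
}
```

Usage: `offline_by_subnet < offline-ips.txt`. If one subnet dominates the output, the shared switch, PoE budget, or DHCP scope for that subnet is the first place to look.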

Repeated Configuration Conflicts (CFG_003)

Pattern: Same device shows CFG_002 (healed) followed by CFG_003 (conflict) repeatedly

Likely causes:

Someone manually changing camera settings
Third-party integration modifying MQTT config
Script or automation fighting ConfigGuardian

Resolution:

Identify who/what is making changes
If legitimate: Update golden config through Console
If unauthorized: Investigate access, review camera audit logs
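
To confirm the heal/conflict pattern rather than eyeball it, the alert history for a device can be reduced to a count of cycles. This sketch assumes you have already extracted the alert codes into a plain list, one code per line, oldest first; that input format and the helper name `count_heal_conflict_cycles` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads alert codes on stdin, one per line, oldest
# first, and prints how many times CFG_002 (healed) was immediately
# followed by CFG_003 (conflict).
count_heal_conflict_cycles() {
  awk '$1 == "CFG_003" && prev == "CFG_002" { n++ }
       { prev = $1 }
       END { print n + 0 }'
}
```

A count that keeps growing between checks suggests something is still rewriting the configuration after each heal.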

Memory Slowly Increasing

Pattern: RES_002 alerts appearing on same device over days/weeks

Likely causes:

Memory leak in ACAP
Increasing number of concurrent operations
Camera firmware issue

Resolution:

Check ACAP version - upgrade if outdated
Review skill configuration - reduce if excessive
Schedule regular ACAP restart as workaround
Report to support with device diagnostics

Certificate Expiry Wave

Pattern: Multiple SEC_003 alerts across fleet

Likely causes:

Devices provisioned at same time, certs expiring together
Certificate rotation not running

Resolution:

Check Console → Settings → Certificates → Auto-rotation status
If disabled, enable auto-rotation
Manually rotate affected devices if urgent
Review certificate lifecycle policy

Diagnostic Commands

From Anava Console

| Action | Path |
|---|---|
| View device logs | Devices → [Device] → Logs |
| Download diagnostics | Devices → [Device] → Actions → Download Diagnostics |
| Check connectivity | Devices → [Device] → Connection Status |
| View alert history | Events → Filter by device |
| Restart ACAP | Devices → [Device] → Actions → Restart Application |

From Camera Web UI

| Action | Path |
|---|---|
| View ACAP status | Apps → Anava Agent |
| Check ACAP logs | System → Logs → Application Log |
| View network settings | System → Network |
| Check MQTT config | Settings → MQTT (if accessible) |

Direct API Queries

```
# Get device status
curl -H "Authorization: Bearer $TOKEN" \
  "https://api.anava.ai/v1/devices/ACCC8EF12345/status"

# Get recent alerts for device
curl -H "Authorization: Bearer $TOKEN" \
  "https://api.anava.ai/v1/devices/ACCC8EF12345/alerts?hours=24"

# Get alert details
curl -H "Authorization: Bearer $TOKEN" \
  "https://api.anava.ai/v1/alerts/{alertId}"
```
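
When these queries are run from a script, it helps to fail loudly on HTTP errors instead of treating an error body as data. The following is a sketch only: `device_url` and `api_get` are hypothetical wrapper names, and the URLs are simply the ones shown above.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers around the endpoints shown above.
API_BASE="${API_BASE:-https://api.anava.ai/v1}"

# Build a per-device URL, e.g. device_url ACCC8EF12345 status
device_url() {
  printf '%s/devices/%s/%s\n' "$API_BASE" "$1" "$2"
}

# GET a URL with the bearer token. --fail makes curl exit non-zero on
# HTTP 4xx/5xx responses instead of printing the error body.
api_get() {
  if [ -z "${TOKEN:-}" ]; then
    echo "TOKEN is not set" >&2
    return 2
  fi
  curl --fail --silent --show-error \
    -H "Authorization: Bearer $TOKEN" "$1"
}
```

Usage: `api_get "$(device_url ACCC8EF12345 'alerts?hours=24')"`. A non-zero exit status then separates an expired token or wrong device ID from a valid empty result.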

Alert Noise Reduction

Too Many Alerts?

Adjust thresholds
```
Console → Settings → Alert Rules → Thresholds
```
- Increase the latency threshold if it is producing false positives
- Raise the memory warning level for devices known to run at high usage

Use alert rules

```
rule:
  name: "Suppress INFO on test devices"
  condition:
    group: "test-lab"
    severity: "INFO"
  action:
    suppress: true
```

Configure quiet hours
```
Console → Settings → Notifications → Quiet Hours
```
- Suppress non-critical alerts during off-hours

Not Enough Alerts?

Check notification settings
- Verify email/Slack channels configured
- Check spam folders
Verify alert rules
- Ensure no rules suppressing needed alerts

Test alert pipeline

Console → Devices → [Device] → Actions → Send Test Alert

Escalation Guide

When to Escalate to Anava Support

| Scenario | Priority | Information to Include |
|---|---|---|
| Security incident (SEC_002) | P1 | Device ID, timestamps, network logs |
| Fleet-wide outage | P1 | Affected device list, network topology |
| Repeated ACAP crashes | P2 | Device diagnostics bundle, ACAP version |
| Unexplained alerts | P3 | Alert IDs, patterns observed |

How to Collect Diagnostics

Via Console:

Devices → [Device] → Actions → Download Diagnostics

Via API:

```
curl -H "Authorization: Bearer $TOKEN" \
  "https://api.anava.ai/v1/devices/ACCC8EF12345/diagnostics" \
  -o diagnostics.zip
```

Include:
- Device ID and firmware version
- ACAP version
- Timestamps of issues
- Steps to reproduce (if applicable)
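
For a fleet-wide escalation, bundles from several devices may be needed at once. The sketch below loops over device IDs using the diagnostics endpoint shown above; `diag_path` and `fetch_diagnostics` are hypothetical helper names, and the per-device file naming is an assumption, not an Anava convention.

```shell
#!/usr/bin/env bash
# Hypothetical batch helper built on the diagnostics endpoint above.

# Output path for one device's bundle: <dir>/<device-id>-diagnostics.zip
diag_path() {
  printf '%s/%s-diagnostics.zip\n' "${2:-.}" "$1"
}

# Download bundles for every device ID given after the directory.
# Returns non-zero if any download failed.
fetch_diagnostics() {
  local dir="$1" id out failed=0
  shift
  mkdir -p "$dir" || return 1
  for id in "$@"; do
    out="$(diag_path "$id" "$dir")"
    if curl --fail --silent --show-error \
         -H "Authorization: Bearer $TOKEN" \
         "https://api.anava.ai/v1/devices/$id/diagnostics" -o "$out"; then
      echo "saved $out"
    else
      echo "failed $id" >&2
      rm -f "$out"   # do not leave a partial or empty bundle behind
      failed=1
    fi
  done
  return "$failed"
}
```

Usage: `fetch_diagnostics ./diag ACCC8EF12345` followed by any further device IDs. Naming each file after its device ID keeps the bundles distinguishable when they are attached to a support ticket.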

Related Documentation

Overview - Alert system introduction
Alert Codes Reference - Complete code list
ConfigGuardian Alerts - Configuration-specific alerts

Last updated: December 2025
