Disaster Recovery

Anava maintains comprehensive disaster recovery procedures to ensure platform resilience and business continuity.

Overview

Our Disaster Recovery (DR) plan addresses the following SOC 2 Trust Services Criteria:

  • CC7.4 - Detection, monitoring, and response to security incidents
  • CC7.5 - Recovery from identified security incidents

Recovery Objectives

Metric   Target     Description
RTO      4 hours    Maximum acceptable downtime for critical services
RPO      1 hour     Maximum acceptable data loss
MTPD     24 hours   Maximum tolerable period of disruption
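
In practice, these targets mean that for an incident detected at 14:00, at most the data written after 13:00 may be lost (RPO), affected critical services must be restored by 18:00 (RTO), and the overall disruption must not extend beyond 14:00 the following day (MTPD).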

Disaster Recovery Flow

The following diagram illustrates the disaster recovery decision process:

[Diagram: Disaster recovery decision flow from incident detection to resolution]

Key Capabilities

Multi-Layer Redundancy

  • Firebase services with automatic failover
  • MQTT broker with container-based recovery
  • Firestore point-in-time recovery (PITR)
  • Versioned Terraform state backups in GCS (see the sketch after this list)
  • Multi-region Cloud Functions deployment
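
The versioned Terraform state backups called out above can be provisioned, for example, as a GCS bucket with object versioning plus a matching backend block. The sketch below is illustrative only; the bucket name, project ID, and state prefix are hypothetical placeholders, not the production values.

```hcl
# Minimal sketch: GCS bucket with object versioning for Terraform state backups.
# Bucket name, project ID, and prefix are hypothetical placeholders.
resource "google_storage_bucket" "tf_state" {
  name          = "anava-terraform-state"   # hypothetical bucket name
  project       = "anava-prod"              # hypothetical project ID
  location      = "US"
  force_destroy = false

  versioning {
    enabled = true                          # retains prior state versions for recovery
  }
}

terraform {
  backend "gcs" {
    bucket = "anava-terraform-state"        # hypothetical bucket name
    prefix = "prod"                         # hypothetical state prefix
  }
}
```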

Infrastructure Recovery

[Diagram: Infrastructure recovery map showing sources and target services]

Monitoring and Alerting

  • Cloud Monitoring for infrastructure health
  • Automated uptime checks every 60 seconds (see the sketch after this list)
  • Connection anomaly detection via MQTT metrics
  • Security event correlation with Cloud Logging
  • PagerDuty integration for on-call alerting
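
The 60-second uptime checks referenced above can be defined in Cloud Monitoring, for example with Terraform. The sketch below is illustrative; the project ID, host, and health-check path are hypothetical placeholders, not the actual monitored endpoints.

```hcl
# Minimal sketch: HTTPS uptime check probing a health endpoint every 60 seconds.
# Project ID, host, and path are hypothetical placeholders.
resource "google_monitoring_uptime_check_config" "api_health" {
  project      = "anava-prod"            # hypothetical project ID
  display_name = "api-health-check"
  period       = "60s"                   # matches the 60-second check interval
  timeout      = "10s"

  http_check {
    path         = "/healthz"            # hypothetical health endpoint
    port         = 443
    use_ssl      = true
    validate_ssl = true
  }

  monitored_resource {
    type = "uptime_url"
    labels = {
      project_id = "anava-prod"          # hypothetical project ID
      host       = "api.anava.ai"        # hypothetical host
    }
  }
}
```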

Communication

  • Defined escalation procedures
  • Customer notification templates
  • Status page integration at status.anava.ai

Recovery Procedures by Component

Component          Recovery Method              RTO       RPO
Cloud Functions    Redeploy from CI/CD          15 min    0
Firestore          Point-in-time recovery       2 hours   1 hour
MQTT Broker        Container restart/recreate   30 min    0
Firebase Hosting   Redeploy from Git            10 min    0
Terraform State    Restore from GCS backup      1 hour    Daily
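
The Firestore recovery path in the table above depends on point-in-time recovery being enabled before an incident occurs. The sketch below shows one way to express that precondition in Terraform; the project ID and location are hypothetical placeholders.

```hcl
# Minimal sketch: Firestore database with point-in-time recovery enabled,
# a precondition for the 2-hour RTO / 1-hour RPO row above.
# Project ID and location are hypothetical placeholders.
resource "google_firestore_database" "default" {
  project     = "anava-prod"        # hypothetical project ID
  name        = "(default)"
  location_id = "nam5"              # hypothetical multi-region location
  type        = "FIRESTORE_NATIVE"

  point_in_time_recovery_enablement = "POINT_IN_TIME_RECOVERY_ENABLED"
}
```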

Detailed Procedures

Internal Documentation

Detailed recovery runbooks, specific commands, and escalation matrices are available in the Internal Documentation.

Team members can sign in with their @anava.ai email to access these procedures.