Disaster recovery is one of those things everyone agrees is important—right up until it’s time to pay for it, test it, or document it properly. I’ve lost count of how many environments I’ve walked into where “DR exists” in theory, but no one has actually tested a failover in years.
When a real outage hits—whether it’s a ransomware incident, hardware failure, power outage, or a bad change—the truth comes out fast.
That’s where Azure Site Recovery (ASR) genuinely shines. When configured properly, it provides a practical, affordable, and well-integrated disaster recovery solution that doesn’t require maintaining a full second data centre.
This guide walks through how to configure Azure Site Recovery properly, with real-world advice on what matters, what often gets missed, and how to avoid the most common mistakes.
What Is Azure Site Recovery (And What It Isn’t)
Azure Site Recovery is Microsoft’s Disaster Recovery as a Service (DRaaS) offering. At a high level, it continuously replicates workloads from a primary location to a secondary location and allows you to fail over with minimal downtime and data loss.
Azure Site Recovery can protect:
- On-premises VMware virtual machines
- On-premises Hyper-V virtual machines
- Physical servers
- Azure virtual machines (cross-region)
What ASR is not:
- A backup replacement
- A one-click “set and forget” solution
- A substitute for proper testing and documentation
In production environments, ASR works best when it’s treated as part of a broader business continuity strategy, alongside Azure Backup, identity resilience, and documented recovery procedures.
Core Components of Azure Site Recovery (What You’re Actually Configuring)
Understanding the moving parts makes troubleshooting much easier later.
Source Environment
This is where your workloads currently run—on-premises or in Azure.
Recovery Services Vault
The control plane for Site Recovery. Policies, replication status, failover actions, and monitoring all live here.
Replication Policies
These define:
- Recovery Point Objective (RPO)
- Snapshot frequency
- App-consistent snapshot settings
Target Environment
The recovery location:
- Azure region (for on-prem → Azure or Azure → Azure)
- Recovery resource group, network, and storage
Failover & Failback
ASR handles orchestration, but you still own validation, application testing, and post-failover clean-up.
Step 1: Planning Before You Click Anything (This Is Where Most Projects Fail)
Before opening the Azure portal, you should answer a few uncomfortable questions:
- What workloads actually need DR?
- What is the acceptable RPO and RTO per system?
- Which systems have dependencies (AD, DNS, databases)?
- Who is authorised to initiate a failover?
From experience, not every VM needs Site Recovery. Protecting everything blindly increases cost, complexity, and recovery time.
Also ensure:
- You have appropriate Azure RBAC permissions
- Networking between source and Azure is planned (VPN or ExpressRoute if required)
- VM OS and disk configurations are supported by ASR
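One way to make those planning answers concrete is a small inventory script. The sketch below is plain Python, not anything ASR provides. The workload names, the RPO/RTO figures, and the `Workload` fields are all made up for illustration. It records which systems are in scope for DR and works out a recovery order that brings dependencies (AD, DNS, databases) up first.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass
class Workload:
    name: str
    rpo_minutes: int   # acceptable data loss
    rto_minutes: int   # acceptable downtime
    needs_dr: bool     # agreed with the business, not assumed
    depends_on: list[str] = field(default_factory=list)


# Hypothetical inventory, for illustration only.
INVENTORY = [
    Workload("dc01", 15, 60, True),
    Workload("dns01", 15, 60, True, ["dc01"]),
    Workload("sql01", 5, 120, True, ["dc01", "dns01"]),
    Workload("erp-app", 15, 240, True, ["sql01"]),
    Workload("print01", 1440, 2880, False),
]


def recovery_order(inventory):
    """Return DR-protected workloads with dependencies listed first."""
    protected = {w.name: w for w in inventory if w.needs_dr}
    graph = {
        name: [d for d in w.depends_on if d in protected]
        for name, w in protected.items()
    }
    return list(TopologicalSorter(graph).static_order())


def unprotected_dependencies(inventory):
    """Map each protected workload to dependencies left out of DR."""
    protected = {w.name for w in inventory if w.needs_dr}
    gaps = {}
    for w in inventory:
        missing = [d for d in w.depends_on if d not in protected]
        if w.needs_dr and missing:
            gaps[w.name] = missing
    return gaps


if __name__ == "__main__":
    print("Recovery order:", recovery_order(INVENTORY))
    print("Dependency gaps:", unprotected_dependencies(INVENTORY))
```

A non-empty result from `unprotected_dependencies` is usually the more useful output: it flags a protected application that would fail over without something it needs.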
Step 2: Create a Recovery Services Vault
The Recovery Services Vault is the heart of Azure Site Recovery.
In the Azure portal:
- Go to Create a resource
- Select Management & Governance
- Choose Recovery Services vault
- Specify:
  - Vault name
  - Subscription
  - Resource group
  - Azure region
Tip from the field:
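Place the vault in the same region as your recovery workloads, not your primary workloads. If the vault sits in the primary region, a regional outage can take your failover controls down along with the workloads you are trying to recover. For Azure-to-Azure replication, the vault cannot be in the source region at all.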
Step 3: Configure Site Recovery Infrastructure
Once the vault is created:
- Open the vault
- Select Site Recovery
- Click Replicate
You’ll be prompted to choose:
- Source environment (on-premises or Azure)
- Source location
- Target location (Azure region)
For Azure-to-Azure DR, this process is refreshingly straightforward compared to traditional DR solutions.
Step 4: Define Replication Policies (Don’t Just Accept Defaults)
Replication policies control how much data you can lose in a failover and how consistent the recovered data is.
Key settings include:
- Recovery Point Retention (commonly 24 hours)
- Snapshot frequency
- Application-consistent snapshots (critical for SQL, Exchange, ERP systems)
In real environments:
- Tier-1 systems often justify app-consistent snapshots
- Tier-2 systems usually don’t
Be realistic—tighter RPOs increase replication traffic and cost.
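To keep policy decisions reviewable, it can help to write them down as data before touching the portal. The following is a minimal Python sketch under my own assumptions: the `ReplicationPolicy` class, its field names, and the tier values are illustrative and are not ASR's actual property names or defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationPolicy:
    name: str
    retention_hours: int             # how far back recovery points are kept
    app_consistent_every_hours: int  # 0 means crash-consistent points only

    def problems(self):
        """Return a list of human-readable issues; empty means it looks sane."""
        issues = []
        if self.retention_hours <= 0:
            issues.append("retention must be positive")
        if self.app_consistent_every_hours < 0:
            issues.append("snapshot interval cannot be negative")
        if self.app_consistent_every_hours > self.retention_hours > 0:
            issues.append(
                "app-consistent interval exceeds retention, so the retained "
                "window may contain no app-consistent point"
            )
        return issues


# Hypothetical tier mapping, for illustration only.
POLICIES = {
    "tier1": ReplicationPolicy("tier1-app-consistent", 24, 1),
    "tier2": ReplicationPolicy("tier2-crash-consistent", 24, 0),
}


def policy_for(tier):
    """Look up the agreed policy for a workload tier."""
    return POLICIES[tier]


if __name__ == "__main__":
    for tier, policy in POLICIES.items():
        print(tier, policy, policy.problems() or "ok")
```

The point of the `problems` check is simply to catch a combination that looks fine in a form but would leave you without the recovery point you were counting on.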
Step 5: Enable Replication for Virtual Machines
Now you select the workloads you actually want to protect.
When enabling replication:
- Choose the correct replication policy
- Verify target resource group
- Confirm target virtual network and subnet
- Review disk inclusion (exclude unnecessary data disks)
Initial replication can take time, especially for large disks. Monitor progress under Jobs in the vault.
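For a rough sense of how long "can take time" is, a back-of-the-envelope calculation helps. The Python sketch below divides disk size by usable bandwidth. The 70% usable fraction is my assumption, and the estimate ignores data that changes while the initial copy is running, compression, and any throttling, so treat it as a planning figure rather than a prediction.

```python
def initial_replication_hours(disk_gib, uplink_mbps, usable_fraction=0.7):
    """Estimate hours to copy disk_gib over an uplink of uplink_mbps.

    usable_fraction is an assumed share of the link that replication
    actually gets; adjust it for your own environment.
    """
    if disk_gib < 0 or uplink_mbps <= 0 or not 0 < usable_fraction <= 1:
        raise ValueError("sizes and rates must be positive")
    bits_to_send = disk_gib * 1024**3 * 8
    usable_bits_per_second = uplink_mbps * 1_000_000 * usable_fraction
    return bits_to_send / usable_bits_per_second / 3600


if __name__ == "__main__":
    # 1 TiB of disks over a 100 Mbps link: roughly 35 hours.
    print(f"{initial_replication_hours(1024, 100):.1f} hours")
    # Excluding a 512 GiB scratch disk halves the estimate.
    print(f"{initial_replication_hours(512, 100):.1f} hours")
```

Running the numbers like this is also a quick way to justify excluding data disks that do not need protecting.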
Step 6: Test Failover (If You Skip This, You Don’t Have DR)
I’ll say this bluntly:
If you haven’t tested failover, you do not have disaster recovery.
Azure Site Recovery makes testing relatively painless:
- Go to Replicated items
- Select a VM
- Choose Test failover
- Select a recovery point
- Validate the VM in the isolated test network
Test failovers do not interrupt production or ongoing replication, provided the test network is isolated from your production network.
What you should validate:
- VM boots correctly
- Networking works as expected
- Applications start
- Authentication and dependencies function
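Part of that validation can be scripted so it is repeatable between tests. Below is a minimal Python sketch that only checks whether TCP ports answer from inside the test network. The IP addresses, ports, and labels are placeholders I made up, and an open port does not prove the application behind it is healthy, so this complements a manual check rather than replacing it.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical endpoints in the isolated test network.
CHECKS = [
    ("10.50.0.4", 3389, "RDP to failed-over VM"),
    ("10.50.0.5", 1433, "SQL Server listener"),
    ("10.50.0.6", 53, "DNS over TCP"),
]


def run_checks(checks, probe=port_open):
    """Run each (host, port, label) check and map label to pass/fail."""
    return {label: probe(host, port) for host, port, label in checks}


def failed(results):
    """List the labels of checks that did not pass."""
    return [label for label, ok in results.items() if not ok]


if __name__ == "__main__":
    results = run_checks(CHECKS)
    for label, ok in results.items():
        print(f"{'PASS' if ok else 'FAIL'}  {label}")
```

Keeping the output from each test failover gives you a simple record of what worked last time.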
Step 7: Planned vs Unplanned Failover (Know the Difference)
Planned Failover
Used during:
- Data centre migrations
- Planned maintenance
Ensures zero data loss by shutting down the source VMs and syncing the final changes before failing over.
Unplanned Failover
Used during:
- Outages
- Ransomware events
- Hardware failure
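May involve some data loss: anything written after the recovery point you fail over to is lost, so the size of the loss depends on the last successful replication cycle.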
From experience, runbooks matter here. The portal makes failover easy, but humans still need to make decisions under pressure.
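One decision worth rehearsing in a runbook is which recovery point to use, particularly after ransomware, where the newest point may already be compromised. The Python sketch below shows the reasoning only. `RecoveryPoint` and both functions are hypothetical names of mine, not ASR objects, and the timestamps would come from whatever the vault reports.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RecoveryPoint:
    taken_at: datetime
    app_consistent: bool


def choose_recovery_point(points, incident_at, require_app_consistent=False):
    """Pick the newest point taken at or before the incident began.

    Returns None when no point qualifies.
    """
    candidates = [
        p for p in points
        if p.taken_at <= incident_at
        and (p.app_consistent or not require_app_consistent)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.taken_at)


def data_loss_window(point, incident_at):
    """Time between the chosen point and the incident: the data you lose."""
    return incident_at - point.taken_at


if __name__ == "__main__":
    incident = datetime(2024, 1, 1, 12, 0)
    points = [
        RecoveryPoint(incident - timedelta(hours=4), True),
        RecoveryPoint(incident - timedelta(minutes=5), False),
        RecoveryPoint(incident + timedelta(minutes=5), False),  # after incident
    ]
    chosen = choose_recovery_point(points, incident)
    print(chosen, data_loss_window(chosen, incident))
```

Asking for an app-consistent point trades a longer data-loss window for a cleaner application start, which is exactly the kind of call that should be agreed before the outage.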
Step 8: Failing Back to the Primary Site
Once the primary site is restored:
- Re-establish replication in reverse
- Perform a planned failover back
- Validate workloads
- Resume normal operations
Failback is often overlooked in DR planning—and that’s a mistake. It’s usually more complex than failing over.
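The full cycle is easier to reason about as a handful of states with allowed moves between them. The Python sketch below is my own simplified model: the state and action names are invented for illustration and do not match ASR's status strings. It treats re-enabling replication towards the recovery site after failback as the final step of resuming normal operations.

```python
# (current state, action) -> next state. Names are illustrative only.
TRANSITIONS = {
    ("protected", "failover"): "running_in_dr",
    ("running_in_dr", "reprotect_to_primary"): "replicating_to_primary",
    ("replicating_to_primary", "planned_failover_to_primary"): "running_in_primary",
    ("running_in_primary", "reprotect_to_dr"): "protected",
}

# The order a failover followed by a failback would normally take.
FAILBACK_CYCLE = [
    "failover",
    "reprotect_to_primary",
    "planned_failover_to_primary",
    "reprotect_to_dr",
]


def step(state, action):
    """Apply one action, refusing moves the model does not allow."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action!r} while {state!r}") from None


def run_cycle(state, actions):
    """Apply actions in order and return the final state."""
    for action in actions:
        state = step(state, action)
    return state


if __name__ == "__main__":
    print(run_cycle("protected", FAILBACK_CYCLE))  # back to "protected"
```

Validation belongs between each of those moves. The model only makes the ordering explicit, such as not attempting a planned failover back before reverse replication is in place.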
Best Practices for Azure Site Recovery (Lessons Learned the Hard Way)
- Define RPO and RTO per workload, not globally
- Test failover at least twice per year
- Use tags to track DR-protected systems
- Monitor replication health with Azure Monitor
- Combine ASR with Azure Backup
- Document recovery steps outside the Azure portal
- Train multiple staff members—not just one
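A couple of these practices are easy to audit with a short script. The Python sketch below assumes you can export VM tags, the list of replicated VMs, and the date of each VM's last test failover from your own tooling. The `dr` tag key, the `asr` value, and the 183-day threshold (my reading of "twice per year") are assumptions.

```python
from datetime import date

MAX_DAYS_BETWEEN_TESTS = 183  # assumed interpretation of "twice per year"


def overdue_tests(last_tested, today):
    """Return names never tested, or not tested within the allowed gap."""
    return sorted(
        name for name, tested in last_tested.items()
        if tested is None or (today - tested).days > MAX_DAYS_BETWEEN_TESTS
    )


def tag_mismatches(vm_tags, replicated, key="dr", value="asr"):
    """Compare DR tags with what is actually replicated.

    Returns (tagged but not replicated, replicated but not tagged).
    """
    tagged = {name for name, tags in vm_tags.items() if tags.get(key) == value}
    return sorted(tagged - replicated), sorted(replicated - tagged)


if __name__ == "__main__":
    today = date(2024, 7, 1)
    last_tested = {
        "sql01": date(2024, 3, 1),
        "erp-app": date(2023, 10, 1),
        "dc01": None,
    }
    print("Overdue:", overdue_tests(last_tested, today))

    vm_tags = {"sql01": {"dr": "asr"}, "web01": {"dr": "asr"}, "dc01": {}}
    print("Mismatches:", tag_mismatches(vm_tags, {"sql01", "dc01"}))
```

Either list being non-empty is a prompt for a conversation, not an automatic failure: a VM may be tagged ahead of onboarding, or deliberately replicated without a tag.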
Final Thoughts: Azure Site Recovery Is Only as Good as Its Testing
Azure Site Recovery is one of Microsoft’s strongest infrastructure services when used correctly. It removes the need for expensive secondary data centres while still delivering enterprise-grade disaster recovery.
But like all DR solutions, it doesn’t save you by default—it saves you when it’s planned, tested, documented, and understood.
From a sysadmin perspective, ASR is less about clicking “Enable Replication” and more about owning the recovery outcome.
When disaster strikes, no one cares how elegant the configuration was—they care how fast systems come back online. Azure Site Recovery gives you the tools. The responsibility to use them properly is still ours.

From my early days on the helpdesk through roles as a service desk manager, systems administrator, and network engineer, I’ve spent more than 25 years in the IT world. As I transition into cyber security, my goal is to make tech a little less confusing by sharing what I’ve learned and helping others wherever I can.
