Disaster Recovery Test Checklist: What to Capture Before You Start

This article is a practical checklist of the evidence to capture before you click Restore. A disaster recovery test should not begin with someone clicking Restore. It should begin with evidence. Here is a common way these tests go sideways. The team books a window, restores a VM, watches it boot, and calls the test a success.

A week later, management asks which restore point was used, whether it met the recovery objective, and whether the application actually worked. The team did not capture the answers, because everyone was focused on the restore itself.

A disaster recovery test is not only about whether a server boots. It is about proving what existed before the test, what you selected to recover, what changed during the test, and whether the result met the recovery objective.

Key Takeaways

Capture evidence before the restore begins, not after the test is over.
A VM that boots is not proof that the application recovered.
Validate the application and its dependencies.
Confirm network isolation before you power on any recovered system.
Document which restore point you selected, where it came from, and why.
Verify credentials, encryption keys, and admin access before the test starts.
Sanitize screenshots, or recreate them in a lab, before you share or publish them.

Why Capture Evidence Before a Disaster Recovery Test?

A baseline tells you what the environment looked like before you touched it. Without it, you cannot tell whether a problem you hit during the test was caused by the recovery or was already present.

Evidence proves the backup existed and was healthy before the test. If a job failed the night before and nobody noticed, you want to know that before you start, not after a failed restore. Captured state supports audit and management review. When someone asks whether the recovery met the objective, you answer with records instead of memory.

A baseline makes rollback easier. If you need to undo a change, you already know what the original configuration was. Evidence reduces confusion during troubleshooting. When three people are staring at a recovered system that will not start a service, the pre-test notes tell you what normal looked like.

Capturing the environment first also helps you avoid accidental production impact. Writing down the production IP addresses, hostnames, and DNS records before the test forces you to plan around them.

Finally, a clean set of evidence becomes a reusable test record. The next test starts from a documented baseline instead of a blank page.

Capture the Environment Inventory

Before the test, capture the current state of the environment you are protecting. Screenshot or export the following:

Hypervisor or cloud platform dashboard, for a top-level view of the environment as it stood before the test.
Cluster, host, or node summary.
VM inventory for the protected workloads.
VM configuration: vCPU count, memory, disk layout and sizes, and network adapters.
Storage or datastore layout, including where the protected VMs live.
Restore target capacity, if the test will write recovered systems to a datastore, volume, cluster, or cloud storage target.
Network names, VLANs, subnets, port groups, bridges, or virtual switches.
Critical application dependencies, so the recovery order is clear later.
Current power state of each protected workload.

The labels differ by platform. On VMware vSphere you are looking at clusters, hosts, datastores, and port groups. On Hyper-V you have hosts, virtual switches, and VHDX files. On Proxmox VE you have nodes, storage, and Linux bridges. On Nutanix AHV you have a cluster, hosts, and AHV networks. For cloud workloads, capture the equivalent: instance configuration, attached storage, and the virtual network or subnet.

Capture the view that matches your platform. The goal is a clear record of what existed, not a product tutorial.
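
The restore target capacity item comes down to simple arithmetic: the disks of the workloads in scope, plus some headroom, must fit in the free space on the target. Here is a minimal sketch in Python. The disk labels, the sizes, and the 20 percent headroom are all hypothetical, so substitute the figures from your own inventory and your own safety margin.

```python
def required_gib(disk_sizes_gib, headroom=0.20):
    """Total space needed for the restore, including a safety margin."""
    return sum(disk_sizes_gib.values()) * (1 + headroom)


def target_has_capacity(free_gib, disk_sizes_gib, headroom=0.20):
    """True when the restore target can hold every disk in scope."""
    return required_gib(disk_sizes_gib, headroom) <= free_gib


# Hypothetical inventory: disk label -> provisioned size in GiB.
disks_in_scope = {
    "fileserver01-os": 80,
    "fileserver01-data": 500,
    "sql01-os": 100,
    "sql01-data": 300,
}

needed = required_gib(disks_in_scope)
fits = target_has_capacity(1500, disks_in_scope)
print(f"Need about {needed:.0f} GiB including headroom; fits in 1500 GiB free: {fits}")
```

The sketch uses provisioned sizes, which is the conservative choice. If your platform restores thin-provisioned disks, the real consumption may be lower, but that depends on the platform and should be confirmed rather than assumed.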

Capture the Backup Platform State

Now capture the state of the backup system itself. This is the evidence that your recovery source was healthy before you started.

Backup server or console dashboard.
The list of protected workloads.
Backup job configuration for the workloads in scope.
The last successful run for each job.
Recent job warnings and errors, so you know the history.
The restore point list for each workload you plan to recover.
Repository capacity and health, so you know the backup source is usable and not already under capacity or retention pressure.
Immutability status, if your repository uses it.
Encryption status, if backups are encrypted.
Backup copy or offsite copy status, so you know which copies exist.
A description of which credentials or service accounts the backup system uses, without exposing the passwords.

Veeam is a common example here, but the same evidence applies to any backup platform. Whatever product you use, you want proof that the job ran, the restore points exist, the repository is healthy, and any immutability or encryption is in the expected state.

Capture the job history or session history for the workload being tested.

Capture the restore point selection screen before choosing the restore point.

The image should show the available restore points for the workload, including date and time. This gives the recovery team evidence that the selected restore point existed before the test.

Capture the immutability or retention lock view only if the backup platform or repository exposes it safely.

This may come from the backup console, hardened repository settings, object storage bucket settings, or appliance dashboard. Do not expose bucket names, access keys, tenant IDs, or repository paths.

Document the Restore Point Selection

The restore point you choose decides what you actually recover. Record it precisely:

Restore point date and time.
The backup job that produced it.
The workload name.
Application consistency status, if known, such as application consistent or crash consistent.
Where the restore point lives: local, offsite, immutable, a backup copy, replicated, or archived.
Your RPO target compared with the age of the selected restore point.

After the test, you should be able to answer one question without hesitation: did we recover from the restore point we intended to use? If you cannot answer that, the test result is hard to trust.

Capture the details pane, restore wizard summary, or restore point properties view that shows the selected restore point date, time, source job, and consistency type if available.

This is especially useful when comparing the selected restore point against the RPO.
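
The comparison itself is a subtraction: the age of the restore point at the simulated outage time, checked against the RPO. The sketch below shows one way to record it. The timestamps and the four-hour RPO are hypothetical values for illustration, and the function names are not taken from any backup product.

```python
from datetime import datetime, timedelta, timezone


def restore_point_age(restore_point, reference):
    """How old the restore point is at the reference time."""
    return reference - restore_point


def meets_rpo(restore_point, reference, rpo):
    """True when the restore point is no older than the RPO allows."""
    age = restore_point_age(restore_point, reference)
    return timedelta(0) <= age <= rpo


# Hypothetical values: a 4-hour RPO and a simulated outage at 05:30 UTC.
rpo_target = timedelta(hours=4)
selected_point = datetime(2025, 1, 15, 2, 0, tzinfo=timezone.utc)
simulated_outage = datetime(2025, 1, 15, 5, 30, tzinfo=timezone.utc)

age = restore_point_age(selected_point, simulated_outage)
print(f"Restore point age: {age}")
print(f"Within RPO: {meets_rpo(selected_point, simulated_outage, rpo_target)}")
```

Using timezone-aware timestamps avoids a common mistake, where the backup console shows local time and the test log records UTC.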

Document Recovery Objectives and Success Criteria

Write down what success looks like before the test, not after.

RTO: how long the recovery is allowed to take.
RPO: how much data loss is acceptable.
Systems in scope.
Systems out of scope.
Expected recovery order.
Expected validation steps.
Who signs off on success.
What counts as a failed test.

Example success criteria

The VM boots.
Services start.
Application login works.
The data timestamp is acceptable against the RPO.
DNS resolves correctly inside the test environment.
Users, or designated test users, can reach the restored service.
No production network conflict occurs.

A disaster recovery test is not successful just because a VM powers on. A booted VM with a dead application, a stale data set, or a broken dependency is a failed test that looks like a passing one. Decide in advance which of the criteria above must be met.
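
One way to make that decision explicit is to mark each criterion as required or optional before the test and then record the outcome during validation. The following is a sketch only. The `Criterion` structure and the sample results are hypothetical, and it treats an unchecked required criterion as a failure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Criterion:
    name: str
    required: bool                 # decided before the test
    passed: Optional[bool] = None  # recorded during validation; None means not checked


def overall_result(criteria):
    """Pass only if every required criterion was checked and passed."""
    return all(c.passed is True for c in criteria if c.required)


def blocking(criteria):
    """Names of required criteria that failed or were never checked."""
    return [c.name for c in criteria if c.required and c.passed is not True]


# Hypothetical outcome: the VM is up, but the application login fails.
results = [
    Criterion("The VM boots", required=True, passed=True),
    Criterion("Services start", required=True, passed=True),
    Criterion("Application login works", required=True, passed=False),
    Criterion("No production network conflict occurs", required=True, passed=True),
    Criterion("Test users can reach the restored service", required=False),
]

print("Test passed:", overall_result(results))
print("Blocking criteria:", blocking(results))
```

In this example the VM boots and the test still fails, which is the outcome the criteria are meant to surface.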

Capture Application and Infrastructure Dependencies

Many disaster recovery tests fail because the restored workload depends on something that is not in the test. The server itself is fine. What it needs to talk to is missing. Map the dependencies before the test:

Active Directory
DNS
DHCP
NTP
Certificate services
File shares
SQL or other database servers
Application servers
Web servers
Load balancers
License servers
SMTP relay
External APIs or integrations
Firewall and NAT rules

A restored application server may boot successfully and still fail if it cannot reach DNS, authenticate against Active Directory, connect to its database server, or check out a license.

Time matters too. In Active Directory environments, Kerberos authentication depends on synchronized clocks, and the default policy tolerates only five minutes of difference between systems. If the test network cannot reach a valid time source and the recovered systems drift beyond that tolerance, authentication can fail.
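
If you record the clock of each recovered system next to a trusted reference during validation, the check is a small comparison. This sketch assumes a five-minute tolerance, which is the usual Active Directory default but should be confirmed against your own domain policy. The timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed tolerance; confirm the value in your own Kerberos policy.
MAX_SKEW = timedelta(minutes=5)


def clock_skew(system_time, reference_time):
    """Absolute difference between a system clock and the reference clock."""
    return abs(system_time - reference_time)


def within_tolerance(system_time, reference_time, max_skew=MAX_SKEW):
    """True when the system clock is close enough for authentication to work."""
    return clock_skew(system_time, reference_time) <= max_skew


# Hypothetical readings taken inside the isolated test network.
reference = datetime(2025, 1, 15, 9, 0, 0, tzinfo=timezone.utc)
recovered_app_server = datetime(2025, 1, 15, 9, 7, 30, tzinfo=timezone.utc)

skew = clock_skew(recovered_app_server, reference)
print(f"Skew: {skew}, within tolerance: {within_tolerance(recovered_app_server, reference)}")
```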

Decide how each dependency will be satisfied in the test: recovered alongside the workload, stubbed with a test service, or provided by an isolated copy.
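
Once the dependencies are written down, the recovery order follows from them: nothing starts until the things it depends on are available. A topological sort expresses this directly. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The dependency map is a hypothetical example, not a recommended design.

```python
from graphlib import TopologicalSorter

# Hypothetical map: each workload lists what must be available before it starts.
depends_on = {
    "dns": set(),
    "ntp": set(),
    "active-directory": {"dns", "ntp"},
    "sql-server": {"active-directory"},
    "app-server": {"sql-server", "active-directory"},
    "web-server": {"app-server"},
}

# static_order() yields each item only after all of its dependencies.
recovery_order = list(TopologicalSorter(depends_on).static_order())

for step, workload in enumerate(recovery_order, start=1):
    print(f"{step}. {workload}")
```

Items with no dependency between them, such as DNS and NTP here, can come out in either order. If the map contains a loop, `static_order()` raises `graphlib.CycleError`, which is itself a useful finding before the test.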

Capture a sanitized DNS view or a lab version of the required records.

Capture only the specific firewall rules needed for the test, not the entire firewall rule base.

Use a lab firewall, sanitized export, or filtered rule view where possible. The screenshot should prove that required application ports and isolation rules were reviewed before the test.

Capture the Test Network Design

Network isolation is one of the most important pre-test items. Get it wrong and the test can affect production. Review this section carefully before any recovered system is powered on. Capture:

Test network name.
VLAN or subnet for the test.
The IP addressing plan for recovered systems.
Gateway behavior: whether the test network has a gateway, and where it routes.
The DNS override strategy for the test.
Firewall isolation rules.
NAT rules, if the test needs limited outbound access.
Whether restored systems can reach production.
Whether production systems can reach restored systems.
How you will avoid duplicate hostnames and duplicate IP addresses.

Keep the test isolated

Do not connect restored systems to the production network unless the recovery plan explicitly requires it and the risks have been reviewed and approved. An isolated test network, sometimes called a bubble or fenced network, lets recovered systems run with their original IP addresses and hostnames without colliding with production.

Why isolation matters

Duplicate IP addresses cause immediate problems. If a recovered server uses the same IP address as a live production server, the result is an address conflict and unreliable connectivity. Duplicate hostnames cause confusion and can break authentication and certificates.
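
You can check the addressing plan for these collisions before anything is powered on. The sketch below uses Python's standard `ipaddress` module to flag a test subnet that overlaps a production subnet and to list recovered systems that would share an address with a live one. All names and addresses are hypothetical. In a fully fenced network an overlap is expected and harmless, so treat a hit as a prompt to confirm the isolation, not as an automatic failure.

```python
import ipaddress


def overlapping_subnets(test_subnet, production_subnets):
    """Production subnets that share address space with the test subnet."""
    test_net = ipaddress.ip_network(test_subnet)
    return [p for p in production_subnets
            if test_net.overlaps(ipaddress.ip_network(p))]


def address_conflicts(production, recovered):
    """List (ip, production_name, recovered_name) for every shared address."""
    prod_by_ip = {ipaddress.ip_address(ip): name for name, ip in production.items()}
    conflicts = []
    for name, ip in recovered.items():
        owner = prod_by_ip.get(ipaddress.ip_address(ip))
        if owner is not None:
            conflicts.append((ip, owner, name))
    return conflicts


# Hypothetical addressing plan.
production_hosts = {"fileserver01": "10.20.0.15", "sql01": "10.20.0.20"}
recovered_hosts = {"fileserver01-dr": "10.20.0.15", "app01-dr": "10.20.0.99"}

print(overlapping_subnets("10.20.0.0/24", ["10.20.0.0/16", "10.30.0.0/16"]))
print(address_conflicts(production_hosts, recovered_hosts))
```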

A recovered domain controller deserves extra caution. If it is restored incorrectly or allowed to communicate with production domain controllers during a test, you can create Active Directory replication or recovery problems. Keep domain controller recovery tests isolated unless you are following a validated forest recovery or domain controller recovery procedure.

Production DNS registration is another trap. A recovered system that can reach production DNS may create conflicting or misleading records and send clients to the wrong host. Plan a DNS override inside the test network so name resolution stays contained.

Capture the test network definition from the hypervisor, virtual switch, cloud network, recovery lab, or network virtualization platform. The screenshot should show that recovered workloads will connect to an isolated or fenced test network instead of the production network.

Capture the VLAN, subnet, or test network address plan from the network console, hypervisor networking view, or lab documentation.

Capture the rule that blocks recovered systems from reaching production, or the rule that allows only tightly controlled traffic required for validation.

This should be a narrow, sanitized view. Do not show the full firewall policy, public IP addresses, VPN details, or sensitive network objects.

Validate Access Before the Test

A recovery test can fail before it starts if the one person with the encryption key, the backup console, or the firewall access is unavailable. Confirm access ahead of time. Document who has access to:

The backup console.
The hypervisor or cloud console.
Domain admin or break-glass credentials.
Encryption keys, or the key custodian.
Repository access.
Firewall or network changes.
DNS changes.
Application admin consoles.
Monitoring tools.

Then test that the access actually works before the window opens. A documented account that nobody can log into is not access.

Security note: do not screenshot or publish passwords, tokens, secrets, license keys, customer names, public IP addresses, private infrastructure details, or sensitive diagrams unless they are fully redacted. If a screenshot cannot be safely redacted, recreate it in a lab or omit it. The evidence you keep should prove the test happened correctly, not expose how to break into the environment.

What Not to Capture

Some evidence should not be included in a public article, shared report, or customer-facing document unless it is fully redacted. Do not publish the following without sanitizing it first:

Password fields
API tokens or secrets
License keys
Public IP addresses
Customer names
Internal domain names
Full firewall rule bases
Backup repository paths
Service account names
Cloud subscription IDs or tenant IDs

If a screenshot cannot be safely sanitized, recreate the view in a lab or describe the evidence in text.
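
When the evidence is text, such as a log excerpt or an exported setting, a first pass of redaction can be scripted. The sketch below masks IPv4 addresses and values that follow a few common secret field names. The patterns and field names are assumptions chosen for illustration and will not catch everything on the list above, so a person still needs to review the output before it is shared.

```python
import ipaddress
import re

# Candidate patterns; both are illustrative and deliberately simple.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
SECRET_FIELD = re.compile(r"(?i)\b(password|token|secret|api[_-]?key)\b(\s*[:=]\s*)\S+")


def _mask_ip(match):
    """Replace the match only if it parses as a real IP address."""
    text = match.group(0)
    try:
        ipaddress.ip_address(text)
    except ValueError:
        return text  # for example 999.1.1.1: leave it untouched
    return "[REDACTED-IP]"


def sanitize(text):
    """Mask secret values first, then IP addresses."""
    text = SECRET_FIELD.sub(lambda m: f"{m.group(1)}{m.group(2)}[REDACTED]", text)
    return IPV4_CANDIDATE.sub(_mask_ip, text)


# Hypothetical log line; the address is from a documentation range.
line = "gateway 198.51.100.7 password=Hunter2! build 13.0.1.2067"
print(sanitize(line))
```

Product build numbers that merely look like addresses, such as the one at the end of the sample line, are left alone because they do not match the four-octet pattern.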

Create an Evidence Folder

Decide where the evidence goes before the test, not while you are scrambling during it. A simple, consistent folder structure works:

00-Plan
01-Before-Test
02-Backup-Evidence
03-Network-Evidence
04-During-Test
05-Validation
06-After-Test
07-Lessons-Learned

Name files consistently so they sort and make sense later. A date prefix plus a short description works well:

YYYY-MM-DD-before-backup-dashboard.png
YYYY-MM-DD-restore-point-selected-fileserver01.png
YYYY-MM-DD-test-network-firewall-rules.png
YYYY-MM-DD-application-login-validation.png

Consistent names turn a pile of screenshots into a usable record. Six months from now, the file name should tell you what it shows without opening it.
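
Both conventions are easy to script so that nobody has to type them under pressure. The sketch below creates the folder structure shown above and builds file names with the date prefix. The function names and the slug rule (lowercase, hyphens between words) are assumptions that match the examples, not a fixed standard.

```python
import re
from datetime import date
from pathlib import Path

FOLDERS = [
    "00-Plan",
    "01-Before-Test",
    "02-Backup-Evidence",
    "03-Network-Evidence",
    "04-During-Test",
    "05-Validation",
    "06-After-Test",
    "07-Lessons-Learned",
]


def create_evidence_folders(root):
    """Create the evidence folders under root and return their paths."""
    paths = [Path(root) / name for name in FOLDERS]
    for path in paths:
        path.mkdir(parents=True, exist_ok=True)
    return paths


def evidence_filename(captured, description, extension="png"):
    """Build a name like 2025-01-15-before-backup-dashboard.png."""
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"{captured.isoformat()}-{slug}.{extension}"


print(evidence_filename(date(2025, 1, 15), "Before backup dashboard"))
```

Calling `create_evidence_folders("dr-test-2025-01")` with a folder name of your choice builds the structure in one step, and running it twice is harmless because existing folders are left as they are.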

Final Pre-Test Checklist

Use this checklist before you click Restore. It stands on its own.

Item	Evidence to Capture	Complete
Recovery scope approved	Written scope listing systems in and out, approved by the owner	[ ]
Success criteria documented	RTO, RPO, and pass or fail criteria recorded	[ ]
Backup job completed successfully	Screenshot of the last successful run for each workload	[ ]
Restore point selected	Restore point date, time, and source job recorded	[ ]
Backup repository health checked	Repository status, capacity, and warning state captured	[ ]
Restore target capacity checked	Datastore, volume, cluster, or cloud target capacity verified	[ ]
Immutability verified if applicable	Immutability or lock status captured	[ ]
Encryption key access confirmed	Key custodian available and key access tested	[ ]
Test network prepared	Isolated network and VLAN or subnet defined and verified	[ ]
DNS plan documented	DNS override or test DNS strategy written down	[ ]
Firewall rules reviewed	Isolation rules confirmed with no path to production	[ ]
Credentials verified	Console, admin, and break-glass access tested	[ ]
Application dependencies mapped	Dependency list and recovery order documented	[ ]
Stakeholders notified	Notification sent and window confirmed	[ ]
Rollback plan documented	Steps to tear down the test environment recorded	[ ]
Evidence folder created	Folder structure created and ready to fill	[ ]
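
If you keep the checklist as a tab-separated file in the evidence folder, a short script can tell you whether anything is still open before the window starts. This is a sketch under two assumptions: the columns are the three shown above, and a completed item is marked `[x]`. The rows are a hypothetical excerpt.

```python
import csv
import io

# Hypothetical excerpt of the checklist, tab-separated.
CHECKLIST = (
    "Item\tEvidence to Capture\tComplete\n"
    "Recovery scope approved\tWritten scope approved by the owner\t[x]\n"
    "Restore point selected\tRestore point date, time, and source job recorded\t[x]\n"
    "Test network prepared\tIsolated network and VLAN or subnet defined and verified\t[ ]\n"
    "Credentials verified\tConsole, admin, and break-glass access tested\t[ ]\n"
)


def incomplete_items(tsv_text):
    """Return the items whose Complete column is not marked [x]."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row["Item"] for row in reader
            if row["Complete"].strip().lower() != "[x]"]


open_items = incomplete_items(CHECKLIST)
if open_items:
    print("Not ready to restore. Open items: " + ", ".join(open_items))
else:
    print("All checklist items complete.")
```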

Conclusion

A good disaster recovery test produces confidence, not just screenshots. The screenshots and notes are proof that the team understood the starting state, followed a plan, protected production, and validated the result. They turn a one-time exercise into something you can repeat and improve.

Capture evidence before the test, validate the application after the restore, and document what you learned before the next outage makes the test real.
