We recently performed a Hyper-V 2019 failover for a customer and wanted to share some insights.
- Non-clustered environment: two Hyper-V 2019 servers on the same domain, with one replicating VMs to the other.
- We had a few VMs set up to replicate from Hypervisor-1 (HV-1) to Hypervisor-2 (HV-2).
- We had some maintenance to do on HV-1, so we executed a “planned failover” to HV-2. During the planned failover, we did NOT select the option to “reverse replication”.
- We stayed failed over longer than expected (a few days), so the VMs on HV-2 became the most current copies. We did not want to fail back at this point because there would be a large delta in data between the two hosts.
- We wanted to ensure that we had replication, so we hit “reverse replication” on HV-2. After about 10-20 minutes per VM, everything had replicated from HV-2 to HV-1. During this process, Hyper-V designated the VMs on HV-2 as the Primary and the VMs on HV-1 as the Replicas.
- Replication gave us a warning about recovery points: “time duration since last successful application consistent checkpoint has exceeded the warning limit for the virtual machine”. We presume this meant that the recovery point “snapshots” did not exist yet, but replication appeared to work otherwise. To clear the warning, we set the Replication Recovery Points on the VM to keep only the latest recovery point, then later set them to “additional hourly recovery points”. After the time window passed, the warnings appeared to clear.
- To make HV-1 the primary again, we would do another planned failover from HV-2 to HV-1, then hit “reverse replication” on HV-1 to make those VMs the primary and the VMs on HV-2 the secondaries.
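
To keep the role changes straight, the sequence above can be sketched as a tiny state model. This is only an illustration in Python, not a Hyper-V API: the `ReplicatedVM` class, its method names, and the VM name `vm-example` are all hypothetical, and the model tracks nothing more than where the VM runs and which way replication flows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplicatedVM:
    """Hypothetical model of one replicated VM (not a Hyper-V API)."""
    name: str
    primary_host: str         # host currently running the VM
    replica_host: str         # host holding the other copy
    replicating: bool = True  # True when changes flow primary -> replica

    def planned_failover(self, reverse_replication: bool = False) -> None:
        # The running copy moves to the other host. Replication only
        # resumes (in the new direction) if reversal was requested.
        self.primary_host, self.replica_host = self.replica_host, self.primary_host
        self.replicating = reverse_replication

    def reverse_replication(self) -> None:
        # Start replicating from the current primary to the old primary.
        self.replicating = True

    def direction(self) -> Optional[str]:
        # Returns None while no replication is flowing.
        if not self.replicating:
            return None
        return f"{self.primary_host} -> {self.replica_host}"


vm = ReplicatedVM("vm-example", primary_host="HV-1", replica_host="HV-2")

vm.planned_failover(reverse_replication=False)
print(vm.primary_host, vm.direction())  # HV-2 None

vm.reverse_replication()
print(vm.primary_host, vm.direction())  # HV-2 HV-2 -> HV-1

# Failing back: planned failover plus reverse replication on HV-1.
vm.planned_failover(reverse_replication=True)
print(vm.primary_host, vm.direction())  # HV-1 HV-1 -> HV-2
```

The middle state is the one that caught our attention: after a planned failover without reversal, the VM runs on HV-2 but nothing replicates until “reverse replication” is triggered.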
We hope this helps any admins out there working through some of Microsoft’s confusing terminology and documentation.
Questions or comments about this? Drop us a line!