Replication of VMs can be kicked off either from the SRM interface or by right-clicking the VM, selecting vSphere Replication Options and choosing to replicate. Choose the RPO (the default is 4 hours). For this demo I went with 15 minutes (the lowest possible) so I could see everything that goes on. Once you click through the dialog, you’ll be presented with the following information message (notice the Recent Tasks pane down below). VMs must be powered on for replication to begin.
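To make the two rules from this step concrete (the RPO can't go below 15 minutes, and replication only begins once the VM is powered on), here's a toy Python check. It is not the vSphere Replication API: `ReplicationRequest`, `can_begin` and the VM name are hypothetical, and only the 15-minute minimum and 4-hour default come from the dialog described above.

```python
from dataclasses import dataclass
from datetime import timedelta

# Values from the replication dialog; everything else here is hypothetical.
MIN_RPO = timedelta(minutes=15)   # lowest RPO the dialog allows
DEFAULT_RPO = timedelta(hours=4)  # default RPO in the dialog


@dataclass
class ReplicationRequest:
    """Hypothetical stand-in for the choices made in the replication dialog."""
    vm: str
    powered_on: bool
    rpo: timedelta = DEFAULT_RPO


def can_begin(req: ReplicationRequest) -> tuple[bool, str]:
    """Return (ok, reason) describing whether replication of req.vm can start."""
    if req.rpo < MIN_RPO:
        return False, f"RPO {req.rpo} is below the {MIN_RPO} minimum"
    if not req.powered_on:
        return False, f"{req.vm} must be powered on before replication begins"
    return True, "replication can begin"


# The settings used in this demo: a 15-minute RPO on a running VM.
print(can_begin(ReplicationRequest("demo-vm", powered_on=True, rpo=MIN_RPO)))
```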
To see the progress of the replication, look at the following box. vSphere Replication first performs an initial full sync, meaning it makes a full pass over the VM, and thereafter replicates changed blocks only, so the initial data copy takes the longest to finish. In the box below, you’ll see that I am replicating from the protected site to the recovery site.
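The "full pass first, changed blocks after that" behaviour is easy to picture with a toy model. This Python sketch treats a disk as a dict of block numbers to data; it only illustrates the idea and is not how vSphere Replication is implemented internally. `ReplicatedDisk` and its methods are hypothetical names.

```python
class ReplicatedDisk:
    """Toy model: first cycle copies every block, later cycles copy dirty blocks."""

    def __init__(self, source: dict[int, bytes]) -> None:
        self.source = source                 # protected-site disk
        self.replica: dict[int, bytes] = {}  # recovery-site copy
        self.dirty: set[int] = set()         # blocks changed since the last cycle
        self.synced_once = False

    def write(self, block: int, data: bytes) -> None:
        """A guest write at the protected site marks the block as changed."""
        self.source[block] = data
        self.dirty.add(block)

    def replicate(self) -> int:
        """Run one replication cycle and return how many blocks were sent."""
        if not self.synced_once:
            to_send = set(self.source)   # initial full sync: every block
            self.synced_once = True
        else:
            to_send = set(self.dirty)    # later cycles: changed blocks only
        for block in to_send:
            self.replica[block] = self.source[block]
        self.dirty.clear()
        return len(to_send)


disk = ReplicatedDisk({i: b"\x00" for i in range(1000)})
print(disk.replicate())  # 1000: the initial full sync
disk.write(7, b"x")
disk.write(42, b"y")
print(disk.replicate())  # 2: only the changed blocks
```

The first cycle is the expensive one, which is why the initial copy takes the longest.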
Successful replication of a VM shows up as a green “Success”. Anything green’s good, yeah?!
Now we turn our attention to creating Protection Groups and Recovery Plans, testing the plans, and other bits and pieces.
A protection group is a collection of VMs and templates that can be protected by using SRM. A recovery plan can contain multiple protection groups, and the plan contains the information on how SRM will recover the VMs in those protection groups. VMs need to be in protection groups so that SRM can add them to the inventory of the vCenter Server at the recovery site.
A protection group cannot contain a mix of VMs configured for array-based replication and VMs configured for vSphere Replication. A recovery plan, however, can contain both array-based protection groups and vSphere Replication protection groups in the same plan.
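Here's a small Python model of those two rules: a protection group refuses a second replication type, while a recovery plan happily holds groups of either type. The classes are hypothetical illustrations, not SRM objects or its API.

```python
from dataclasses import dataclass, field
from enum import Enum


class ReplicationType(Enum):
    ARRAY_BASED = "array-based replication"
    VSPHERE = "vSphere Replication"


@dataclass
class VM:
    name: str
    replication: ReplicationType


@dataclass
class ProtectionGroup:
    name: str
    vms: list[VM] = field(default_factory=list)

    def add(self, vm: VM) -> None:
        """Add a VM, refusing to mix replication types inside one group."""
        if self.vms and vm.replication is not self.vms[0].replication:
            raise ValueError(
                f"{vm.name} uses {vm.replication.value}, but {self.name} "
                f"holds {self.vms[0].replication.value} VMs"
            )
        self.vms.append(vm)


@dataclass
class RecoveryPlan:
    name: str
    # No replication-type check here: one plan may hold both kinds of group.
    groups: list[ProtectionGroup] = field(default_factory=list)


vr_group = ProtectionGroup("FirstprotectedVMs")
vr_group.add(VM("web01", ReplicationType.VSPHERE))
abr_group = ProtectionGroup("ArrayGroup")
abr_group.add(VM("db01", ReplicationType.ARRAY_BASED))
plan = RecoveryPlan("MixedPlan", [vr_group, abr_group])  # allowed
```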
Array-based protection groups: For array-based groups, VMs in specific datastores can be protected. SRM groups the datastores holding those VMs into a datastore group, which contains all the files of the protected VMs. If additional VMs need to be protected, their VMDKs can be moved with Storage vMotion (SvMotion) to a datastore that’s included in SRM’s datastore group. One important thing to note: if a VM is unprotected and its files are left in the SRM-protected datastore, then recovery for all VMs in the datastore group fails!!
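The "one stray VM breaks the whole group" rule is worth checking before you find out the hard way. This Python sketch assumes you already have a mapping of each VM to the datastores holding its files; the function names and data are hypothetical, and SRM computes its datastore groups itself. The sketch just applies the rule described above.

```python
def datastore_group(vm_files: dict[str, set[str]], protected: set[str]) -> set[str]:
    """Every datastore that holds files of at least one protected VM."""
    group: set[str] = set()
    for vm in protected:
        group |= vm_files[vm]
    return group


def blocking_vms(vm_files: dict[str, set[str]], protected: set[str]) -> list[str]:
    """Unprotected VMs whose files sit inside the datastore group."""
    group = datastore_group(vm_files, protected)
    return sorted(
        vm for vm, stores in vm_files.items()
        if vm not in protected and stores & group
    )


files = {
    "web01": {"DS1"},
    "db01": {"DS1", "DS2"},   # a VMDK on DS2 pulls DS2 into the group
    "scratch": {"DS2"},       # unprotected, but lives on DS2
    "other": {"DS3"},
}
print(blocking_vms(files, {"web01", "db01"}))  # ['scratch']
```

Either protect `scratch` or Storage vMotion its files off `DS2` and the list comes back empty.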
Because I don’t have a storage array to play around with at home, I’ll talk more about vSphere Replication protection groups.
vSphere Replication protection groups: Protection groups can be organised in folders. It’s best to give protection groups unique names, because in some views of the SRM interface it’s hard to tell them apart. So:
– Name the groups uniquely
– Don’t place them in folders, so you can be sure the names are unique
I created a protection group called FirstprotectedVMs containing a couple of VMs. You can only include VMs that have been replicated to the recovery site at least once.
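If you script your SRM housekeeping, the two rules above (unique names, and only VMs that have replicated at least once) make a handy pre-check. This is a hypothetical Python sketch, not an SRM call; the case-insensitive name comparison is my own, deliberately stricter assumption.

```python
def check_new_group(
    name: str, existing: set[str], vms: list[str], replicated_once: set[str]
) -> list[str]:
    """Return reasons the group can't be created as-is (empty list = fine)."""
    problems = []
    # Compare case-insensitively so "FirstProtectedVMs" and "firstprotectedvms"
    # count as a clash (a stricter rule than SRM itself may enforce).
    if name.casefold() in {n.casefold() for n in existing}:
        problems.append(f"name '{name}' is already in use")
    for vm in vms:
        if vm not in replicated_once:
            problems.append(f"{vm} has not been replicated to the recovery site yet")
    return problems


print(check_new_group("FirstprotectedVMs", set(), ["web01", "db01"], {"web01", "db01"}))  # []
```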
Recovery plans are automated runbooks. They contain information about every stage of the recovery process: the order in which recovered VMs are powered on, the IP addresses that recovered VMs use, and so on. These plans can be customized as the situation dictates.
Recovery plans contain one or more protection groups, and one protection group can be included in more than one recovery plan. For example, one plan can handle the planned migration of services from the protected site to the recovery site, and another can handle power outages or catastrophic events.
Recovery plans can be tested without affecting services at either the protected site or the recovery site. Recovery plans also allow non-critical VMs to be suspended while the plan is being executed. Only one plan at a time can run to recover a particular protection group, so the VMs in a protection group are recovered only once: if multiple plans contain the same protection group, only one of them will actually recover the VMs. Testing a recovery plan confirms that the VMs it protects can be recovered and started up at the recovery site.
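The "one plan at a time per protection group" rule behaves like a lock keyed on the group. Here's a hypothetical Python model of that idea; `RecoveryCoordinator` and the plan names are made up, and this is not how SRM tracks plan state internally.

```python
class RecoveryCoordinator:
    """Toy model: a protection group can be recovered by one plan at a time."""

    def __init__(self) -> None:
        self.running: dict[str, str] = {}  # protection group -> plan recovering it

    def start(self, plan: str, groups: list[str]) -> None:
        """Claim every group for this plan, or fail without claiming any."""
        busy = sorted(g for g in groups if g in self.running)
        if busy:
            owners = ", ".join(f"{g} (by {self.running[g]})" for g in busy)
            raise RuntimeError(f"{plan}: already being recovered: {owners}")
        for g in groups:
            self.running[g] = plan

    def finish(self, plan: str) -> None:
        """Release every group this plan was recovering."""
        for g in [g for g, p in self.running.items() if p == plan]:
            del self.running[g]


coordinator = RecoveryCoordinator()
coordinator.start("PlannedMigration", ["FirstprotectedVMs"])
# A second plan containing the same group has to wait until the first finishes.
```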
Important: Recovery plans suspend the selected local VMs at the recovery site both when testing the plan and during an actual recovery. Apart from this, the plans are not disruptive. Recovery plans let the admin decide which VMs are deemed not critical. This is good because a blanket shutdown of VMs at the recovery site would also shut down essential VMs such as domain controllers, database servers and any other services that are essential to a customer. If vSphere Replication is used, protected VMs continue to be replicated while a recovery plan is being tested. The VRS (vSphere Replication Server) creates redo logs, which are removed when cleanup is performed after the test is complete.
Recovery plans can be run as often as needed and can be cancelled at any time. The permission to test a recovery plan is different from the permission to run an actual recovery, and the two permissions are set separately.
Step-by-step: a Recovery Plan and a Test of the Plan
Create a protection group and include the VMs you want moved over. Let’s use the one I created earlier, FirstprotectedVMs. Choose the recovery site in the first dialog box.
Create a new recovery plan and include that protection group.
Choose the appropriate Recovery Network. You want the migrated VM to have network connectivity at the recovery site (for example, through a stretched VLAN).
For the Test Network, selecting “Auto” creates a temporary, isolated network for the recovered VMs at the recovery site.
When a recovery plan is tested, a test network is created at the recovery site. Recovered VMs use this test network instead of the network that the existing VMs at the recovery site run on, which avoids any chance of disrupting those VMs. The test network is on its own vSwitch, so care must be taken to recover dependent VMs to the same network; a front-end web server and its back-end database server are a good example. The test network is chosen by using the Auto option when configuring network settings in the recovery plan, which avoids having to change the VMs’ network settings.
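Since the test network sits on its own vSwitch, it's worth checking that VMs which talk to each other land on the same one. This Python sketch assumes you've written down which test network each recovered VM attaches to and which VMs depend on which; the function and the network names are hypothetical.

```python
def split_dependencies(
    test_network: dict[str, str], depends_on: dict[str, list[str]]
) -> list[tuple[str, str]]:
    """Return (vm, dependency) pairs that would land on different test networks.

    test_network must map every recovered VM to its test network; a VM missing
    from it raises KeyError instead of silently passing the check.
    """
    return [
        (vm, dep)
        for vm, deps in depends_on.items()
        for dep in deps
        if test_network[vm] != test_network[dep]
    ]


# Hypothetical example: the web front end needs its database back end.
networks = {"web01": "srm-test-net-a", "db01": "srm-test-net-a"}
print(split_dependencies(networks, {"web01": ["db01"]}))  # []
```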
Give the plan a name and you are set to perform your migration.
Going up to the Status tab brings us to a real-time view of what’s going on.
A series of events at the protected site (let’s call this the source) shows up in that site’s vCenter Server. Basically, a round of replication is kicked off to capture the latest changes (changed blocks only).
Test complete message:
SRM does a great job of cleaning up after itself. When the test is complete, run the Cleanup operation (this removes the recovered VMs, the temporary vSwitch and the port groups that were created for the test).
Clicking Next will bring you to the Review screen.
The Recent Tasks for the ESXi hosts show the temporary networking being removed.
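The cleanup idea boils down to "remember everything the test created, then remove it all". Here's a hypothetical Python sketch of that bookkeeping; the class, plan and object names are made up, and tearing down in reverse creation order (VMs before port groups before the vSwitch) is my own design choice, not a documented SRM sequence.

```python
class PlanTestRun:
    """Toy record of the objects a recovery plan test created."""

    def __init__(self, plan: str) -> None:
        self.plan = plan
        self.created: list[tuple[str, str]] = []  # (kind, name) in creation order

    def record(self, kind: str, name: str) -> None:
        self.created.append((kind, name))

    def cleanup(self) -> list[tuple[str, str]]:
        """Forget everything the test created and return it, newest first."""
        removed = list(reversed(self.created))
        self.created.clear()
        return removed


run = PlanTestRun("FirstRecoveryPlan")
run.record("vSwitch", "srm-test-vswitch")
run.record("portgroup", "srm-test-portgroup")
run.record("vm", "web01")
for kind, name in run.cleanup():
    print("removing", kind, name)
```

Removing newest first means nothing is torn down while something created later still depends on it.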
Sometimes a planned migration is necessary. When my employer built their cloud and took on their first customer, the DR site wasn’t equipped to handle all the VMs they wanted at that site. Instead, the production site was chosen to house them for the time being, and the plan was to use SRM at a later date to move the DR VMs from production to where they should have been.
During a planned migration, SRM synchronizes the VMs at the recovery site with the VMs at the protected site so that any recent changes are captured (it’s best if the VMs aren’t doing any work at this time, so that replication time is minimised). After this round of replication is over, SRM stops replication for those VMs. It’s advisable to detach devices such as DVD and floppy drives before attempting a migration; otherwise SRM will throw an error.
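Here's the planned-migration order from above as a hypothetical Python outline: a pre-flight check for attached DVD/floppy devices, a final sync of every VM, and only then stopping replication. The device names (`cdrom`, `floppy`) and the `sync` / `stop_replication` callables are placeholders you'd supply yourself; none of this is an SRM API.

```python
from typing import Callable

REMOVABLE_DEVICES = {"cdrom", "floppy"}  # hypothetical names for DVD and floppy drives


def preflight(devices: dict[str, list[str]]) -> list[str]:
    """List removable devices still attached (empty list = good to go)."""
    return sorted(
        f"{vm}: detach {dev}"
        for vm, devs in devices.items()
        for dev in devs
        if dev in REMOVABLE_DEVICES
    )


def planned_migration(
    devices: dict[str, list[str]],
    sync: Callable[[str], None],
    stop_replication: Callable[[str], None],
) -> None:
    """Outline: pre-flight check, final sync of every VM, then stop replication."""
    problems = preflight(devices)
    if problems:
        raise RuntimeError("; ".join(problems))
    vms = sorted(devices)
    for vm in vms:       # one last round of changed-block replication
        sync(vm)
    for vm in vms:       # only after every VM has synced, stop replicating
        stop_replication(vm)


events: list[tuple[str, str]] = []
planned_migration(
    {"web01": ["disk"], "db01": ["disk", "nic"]},
    sync=lambda vm: events.append(("sync", vm)),
    stop_replication=lambda vm: events.append(("stop", vm)),
)
print(events)
```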