Replicating VMs and actually using SRM to do its thing!

Replication of VMs can be kicked off either from the SRM interface or by right-clicking the VM, selecting vSphere Replication Options and choosing to replicate. Choose the RPO (the default is 4 hours). For this demo I went with 15 minutes, the lowest possible, so I could see everything that goes on. After clicking through the dialog, you’ll be presented with the following information message (notice the Recent Tasks below). VMs must be powered on for replication to begin.

[Screenshot: replication information message and Recent Tasks]
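To make the two rules above concrete (an RPO no lower than 15 minutes with a 4-hour default, and a powered-on VM), here is a minimal Python sketch. It is only a toy model: `ReplicationRequest`, `validate` and the constants are hypothetical names made up for illustration, not part of any SRM or vSphere Replication API, and the sketch deliberately does not model an upper RPO limit.

```python
from dataclasses import dataclass

# Hypothetical constants taken from the text: a 4-hour default RPO and a
# 15-minute minimum. This is a toy model, not an SRM or vSphere API.
DEFAULT_RPO_MINUTES = 4 * 60
MIN_RPO_MINUTES = 15


@dataclass
class ReplicationRequest:
    vm_name: str
    powered_on: bool
    rpo_minutes: int = DEFAULT_RPO_MINUTES


def validate(request: ReplicationRequest) -> list[str]:
    """Return a list of problems; an empty list means replication can start."""
    problems = []
    if request.rpo_minutes < MIN_RPO_MINUTES:
        problems.append(
            f"{request.vm_name}: RPO of {request.rpo_minutes} min is below "
            f"the {MIN_RPO_MINUTES} min minimum"
        )
    if not request.powered_on:
        problems.append(f"{request.vm_name}: VM must be powered on to replicate")
    return problems


if __name__ == "__main__":
    print(validate(ReplicationRequest("web01", powered_on=True, rpo_minutes=15)))  # []
    print(len(validate(ReplicationRequest("db01", powered_on=False, rpo_minutes=5))))  # 2
```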

To see the progress of replication, check the following box. vSphere Replication first performs an initial full sync, meaning it does a full pass over the VM, and from then on copies only changed blocks. This is why the initial data copy takes the longest to finish. In the box below, you’ll see that I am replicating from the protected site to the recovery site.

[Screenshot: checking replication status]
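The full-sync-then-changed-blocks behaviour can be pictured with a small toy model in Python. `ReplicatedDisk` is a hypothetical class: vSphere Replication tracks changed blocks internally and exposes nothing like this, so treat it purely as an illustration of why the first pass takes the longest.

```python
class ReplicatedDisk:
    """Toy model: one initial full sync, then changed-block-only syncs.

    Hypothetical class for illustration; not a vSphere Replication API.
    """

    def __init__(self, blocks: list[bytes]) -> None:
        self.source = list(blocks)      # disk at the protected site
        self.target = None              # replica at the recovery site (none yet)
        self.dirty: set[int] = set()    # blocks written since the last sync

    def write(self, index: int, data: bytes) -> None:
        self.source[index] = data
        self.dirty.add(index)

    def sync(self) -> int:
        """Copy data to the replica and return how many blocks were sent."""
        if self.target is None:
            # Initial full sync: every block is copied, which is the slow part.
            self.target = list(self.source)
            copied = len(self.source)
        else:
            # Later syncs: only blocks written since the previous pass.
            for i in self.dirty:
                self.target[i] = self.source[i]
            copied = len(self.dirty)
        self.dirty.clear()
        return copied


if __name__ == "__main__":
    disk = ReplicatedDisk([b"\x00"] * 1000)
    print(disk.sync())  # 1000: the initial full sync copies every block
    disk.write(42, b"\x01")
    print(disk.sync())  # 1: only the changed block is sent
```

Writing the same block twice between syncs still costs one block copy, which is the point of tracking changed blocks rather than individual writes.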

Successful replication of a VM shows up as a green “Success”. Anything green’s good, yeah?!

Now we turn our attention to creating Protection Groups and Recovery Plans, testing the plans, and other bits and pieces.

Protection groups:

A protection group is a collection of VMs and templates that are protected together by SRM. A recovery plan can contain multiple protection groups, and it holds the information on how SRM recovers the VMs in those groups. VMs need to be in a protection group so that SRM can add them to the inventory of the vCenter Server at the recovery site.

A protection group cannot contain a mix of VMs configured for array-based replication and VMs configured for vSphere Replication. A recovery plan, however, can contain both array-based and vSphere Replication protection groups in the same plan.
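The two rules above (no mixing inside a protection group, mixing allowed inside a recovery plan) can be sketched as a small Python data model. All class names here are hypothetical and exist only to illustrate the constraint; this is not how SRM represents groups or plans internally.

```python
from dataclasses import dataclass
from enum import Enum


class ReplicationType(Enum):
    ARRAY_BASED = "array-based replication"
    VSPHERE_REPLICATION = "vSphere Replication"


@dataclass(frozen=True)
class ProtectedVM:
    name: str
    replication: ReplicationType


@dataclass
class ProtectionGroup:
    name: str
    vms: list[ProtectedVM]

    def __post_init__(self) -> None:
        # Rule 1: a single protection group may not mix replication types.
        kinds = {vm.replication for vm in self.vms}
        if len(kinds) > 1:
            raise ValueError(f"{self.name}: cannot mix replication types in one group")


@dataclass
class RecoveryPlan:
    name: str
    # Rule 2: a plan may hold groups of different replication types.
    groups: list[ProtectionGroup]


if __name__ == "__main__":
    vr = ProtectionGroup("FirstprotectedVMs",
                         [ProtectedVM("web01", ReplicationType.VSPHERE_REPLICATION)])
    abr = ProtectionGroup("ArrayVMs", [ProtectedVM("db01", ReplicationType.ARRAY_BASED)])
    plan = RecoveryPlan("FullSiteFailover", [vr, abr])  # allowed
    try:
        ProtectionGroup("Mixed", [ProtectedVM("a", ReplicationType.ARRAY_BASED),
                                  ProtectedVM("b", ReplicationType.VSPHERE_REPLICATION)])
    except ValueError as err:
        print(err)  # Mixed: cannot mix replication types in one group
```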

Array-based protection groups: For array-based groups, VMs on specific datastores can be protected. SRM places those datastores into a datastore group, which contains all the files of the protected VMs. If additional VMs need to be protected, their VMDKs can be moved with Storage vMotion to a datastore that is included in SRM’s datastore group. One important thing to note: if a VM is unprotected and its files are left on an SRM-protected datastore, then recovery fails for all VMs in the datastore group!!
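Because a single stray VM can break recovery for the whole datastore group, a pre-flight check is worth having. Here is a minimal Python sketch, assuming you have already gathered (by whatever means) which datastores each VM’s files live on and which VMs are protected; the function and datastore names are hypothetical.

```python
def unprotected_vms_in_group(
    datastore_group: set[str],
    vm_datastores: dict[str, set[str]],
    protected_vms: set[str],
) -> list[str]:
    """Return VMs with files in the datastore group that are not protected.

    Per the text, any such VM causes recovery of the whole datastore group
    to fail, so this list should be empty before you rely on the plan.
    """
    return sorted(
        vm
        for vm, datastores in vm_datastores.items()
        if datastores & datastore_group and vm not in protected_vms
    )


if __name__ == "__main__":
    group = {"DS-PROD-01", "DS-PROD-02"}
    files = {
        "web01": {"DS-PROD-01"},
        "db01": {"DS-PROD-02"},
        "scratch01": {"DS-PROD-01"},  # left behind on a protected datastore
        "lab01": {"DS-LAB-01"},       # not on a protected datastore at all
    }
    print(unprotected_vms_in_group(group, files, protected_vms={"web01", "db01"}))
    # ['scratch01']
```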

Because I don’t have an array to play around with at home, I’ll talk more about vSphere Replication groups.

vSphere Replication protection groups: Protection groups can be organised in folders. It’s best to give protection groups unique names, because in some views of the SRM interface it’s hard to tell them apart. So:

–        Name the groups uniquely

–        Don’t place them in folders, so you can be sure the names are unique

I created a protection group called FirstprotectedVMs with a couple of VMs in it. You can only include VMs that have been replicated to the recovery site at least once.
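The naming advice and the “replicated at least once” rule can both be checked before you build a group. This is a hypothetical Python sketch: the function names and the `completed_syncs` input (a count of finished replication passes per VM) are assumptions for illustration, not data SRM hands you in this form.

```python
from collections import Counter


def clashing_group_names(names: list[str]) -> list[str]:
    """Return names that collide once case and surrounding spaces are ignored.

    Comparing case-insensitively is a deliberately stricter choice: the aim
    is to catch names that are easy to confuse in the SRM views, not to
    mirror SRM's own validation rules.
    """
    counts = Counter(name.strip().casefold() for name in names)
    return sorted(name for name, count in counts.items() if count > 1)


def eligible_for_group(candidates: list[str], completed_syncs: dict[str, int]) -> list[str]:
    """Keep only VMs that have replicated to the recovery site at least once."""
    return [vm for vm in candidates if completed_syncs.get(vm, 0) >= 1]


if __name__ == "__main__":
    print(clashing_group_names(["FirstprotectedVMs", "FirstProtectedVMs", "WebTier"]))
    # ['firstprotectedvms']
    print(eligible_for_group(["web01", "db01", "new01"], {"web01": 12, "db01": 3}))
    # ['web01', 'db01']
```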

Recovery plans:

Recovery plans are automated run books. They contain information about every stage of the recovery process: the order in which recovered VMs are powered on, the IP addresses the recovered VMs use, and so on. These plans can be customised as the situation dictates.
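One way to picture the power-on order a run book encodes is as a dependency graph that gets topologically sorted, so that, say, a domain controller comes up before a database server and the database before a web server. This is a hypothetical sketch using Python’s standard `graphlib` module (Python 3.9+). SRM has its own way of defining start-up order in a recovery plan, and the VM names below are made up.

```python
from graphlib import TopologicalSorter


def power_on_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return VMs ordered so every VM starts after the VMs it depends on.

    depends_on maps a VM name to the set of VMs that must be running first.
    Raises graphlib.CycleError if the dependencies form a loop.
    """
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    plan = {
        "web01": {"db01"},  # front-end needs the database
        "db01": {"dc01"},   # database needs the domain controller
        "dc01": set(),
    }
    print(power_on_order(plan))  # ['dc01', 'db01', 'web01']
```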

Recovery plans contain one or more protection groups, and one protection group can be included in more than one recovery plan. For example, one plan can handle the planned migration of services from the protected site to the recovery site, and another can handle power outages or catastrophic events.

Recovery plans can be tested without affecting services at either the protected site or the recovery site. Recovery plans also allow non-critical VMs to be suspended while the plan is being executed. Only one plan can run at a time to recover a particular protection group, so if multiple plans contain the same protection group, only one of them will actually recover its VMs. Testing a recovery plan ensures that the VMs it protects can be recovered and started up at the recovery site.
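The “one plan per protection group at a time” rule behaves like a lock keyed by protection group. Below is a minimal Python sketch of that idea; `ProtectionGroupLocks` and the plan names are hypothetical, and the all-or-nothing claim (a plan gets every group it needs or none) is my modelling choice, not documented SRM internals.

```python
import threading


class ProtectionGroupLocks:
    """Toy model: only one recovery plan may operate on a protection group at a time."""

    def __init__(self) -> None:
        self._owner: dict[str, str] = {}  # protection group -> plan using it
        self._mutex = threading.Lock()

    def try_start(self, plan: str, groups: list[str]) -> bool:
        """Claim every group for `plan`, or claim none if another plan holds one."""
        with self._mutex:
            if any(self._owner.get(g) not in (None, plan) for g in groups):
                return False
            for g in groups:
                self._owner[g] = plan
            return True

    def finish(self, plan: str) -> None:
        """Release every group held by `plan`."""
        with self._mutex:
            for g in [g for g, owner in self._owner.items() if owner == plan]:
                del self._owner[g]


if __name__ == "__main__":
    locks = ProtectionGroupLocks()
    print(locks.try_start("PlannedMigration", ["FirstprotectedVMs"]))  # True
    print(locks.try_start("DisasterRecovery", ["FirstprotectedVMs"]))  # False
    locks.finish("PlannedMigration")
    print(locks.try_start("DisasterRecovery", ["FirstprotectedVMs"]))  # True
```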

Important: Recovery plans suspend local VMs at the recovery site, both when testing the plan and during an actual recovery. Apart from this, the plans are not disruptive. Recovery plans let the admin decide which VMs are deemed non-critical. This is good, because a blanket shutdown of VMs at the recovery site would take down essential VMs such as domain controllers, database servers and any other services that are essential to a customer. If vSphere Replication is used, protected VMs can still be replicated while a recovery plan is being tested. The vSphere Replication Server (VRS) creates redo logs, which are removed when cleanup is performed after the test is complete.

Recovery plans can be run as often as needed and can be cancelled at any time. The permission to test a recovery plan is different from the permission to run one, and the two permissions are set separately.
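Keeping “test” and “run” as separate permissions means a role can be allowed to rehearse a failover without being able to trigger a real one. A tiny Python sketch with `enum.Flag` shows the idea; the flag names are hypothetical and are not SRM’s actual privilege names.

```python
from enum import Flag


class SrmPermission(Flag):
    """Hypothetical permission flags, not SRM's real privilege names."""
    TEST_RECOVERY_PLAN = 1
    RUN_RECOVERY_PLAN = 2


def can_test(granted: SrmPermission) -> bool:
    return SrmPermission.TEST_RECOVERY_PLAN in granted


def can_run(granted: SrmPermission) -> bool:
    return SrmPermission.RUN_RECOVERY_PLAN in granted


if __name__ == "__main__":
    tester = SrmPermission.TEST_RECOVERY_PLAN
    operator = SrmPermission.TEST_RECOVERY_PLAN | SrmPermission.RUN_RECOVERY_PLAN
    print(can_test(tester), can_run(tester))      # True False
    print(can_test(operator), can_run(operator))  # True True
```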

Step-by-step Recovery plan and a Test of the plan

Create a protection group and include the VMs you want moved over. Let’s use the one I created earlier, FirstprotectedVMs. Choose the recovery site in the first dialog box.

[Screenshot: choosing the recovery site]

Create a new recovery plan and include that protection group

[Screenshot: adding the protection group to the new recovery plan]

Choose the appropriate Recovery Network. You want the migrated VM to have network connectivity (for example, a stretched VLAN) at the recovery site.

[Screenshot: selecting the recovery network]

For the Test Network, selecting “Auto” creates a temporary, isolated network for the recovered VMs at the recovery site.

[Screenshots: network used by the recovered VM; vSwitch on the host]

When a recovery plan is tested, a test network is created at the recovery site. Recovered VMs can use this test network instead of the network that the existing VMs at the recovery site run on, which avoids potentially disrupting those VMs. The test network is on its own vSwitch. Care must be taken to recover dependent VMs to the same network; a front-end web server and a back-end database server are a good example. The test network is chosen by using the Auto option when configuring network settings in the recovery plan, which avoids having to change the network settings of the VMs.
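Since dependent VMs must land on the same test network, a quick check over your planned network mappings can catch a front-end that would be cut off from its database. This Python sketch assumes you keep two hypothetical mappings yourself (VM to test network, VM to the VMs it depends on); SRM does not provide them in this form.

```python
def split_dependencies(
    network_of: dict[str, str],
    depends_on: dict[str, list[str]],
) -> list[tuple[str, str]]:
    """Return (vm, dependency) pairs that would not share a test network."""
    broken = []
    for vm, deps in depends_on.items():
        for dep in deps:
            vm_net, dep_net = network_of.get(vm), network_of.get(dep)
            # A missing mapping counts as broken: we cannot prove they are together.
            if vm_net is None or vm_net != dep_net:
                broken.append((vm, dep))
    return broken


if __name__ == "__main__":
    networks = {"web01": "test-net-A", "db01": "test-net-A", "report01": "test-net-B"}
    deps = {"web01": ["db01"], "report01": ["db01"]}
    print(split_dependencies(networks, deps))  # [('report01', 'db01')]
```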

Give the plan a name and you are set to perform your migration.

Going up to the Status tab brings us to a real-time view of what’s going on.

[Screenshots: test run; powering on VMs]

A series of events at the protected site (let’s call this the source) shows up in that site’s vCenter Server. Basically, a round of replication is kicked off to capture the latest changes (changed blocks only).

Status check:

[Screenshot: last stage of the test]

Test complete message:

[Screenshot: test complete]

Basically, the recovered VMs can now be tested to check for any errors or warnings.

SRM does a great job of cleaning up after itself. When the test is complete, run the Cleanup operation (this removes the recovered VMs, the temporary vSwitch and the port groups created as a result of the test).

[Screenshot: starting the Cleanup operation]

Clicking Next brings you to the Review screen.

[Screenshot: cleanup Review screen]

Status check

[Screenshot: cleanup status]

Recent Tasks on the ESXi hosts show the temporary networking being removed:

[Screenshot: port groups removed]

Final message

[Screenshot: all done]
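The cleanup step works because everything the test creates is temporary and can be torn down afterwards. Here is a toy Python tracker illustrating that bookkeeping; `TestArtifacts` and names like `srm-test-vswitch` are made up, SRM’s real object names differ, and removing objects newest first is my own design choice so that VMs go before the port groups and vSwitch they sit on.

```python
class TestArtifacts:
    """Toy tracker for objects a recovery-plan test creates at the recovery site."""

    def __init__(self) -> None:
        self.created: list[tuple[str, str]] = []  # (kind, name) in creation order

    def record(self, kind: str, name: str) -> None:
        self.created.append((kind, name))

    def cleanup(self) -> list[tuple[str, str]]:
        """Forget everything, newest first, and return the removal order."""
        removal_order = list(reversed(self.created))
        self.created.clear()
        return removal_order


if __name__ == "__main__":
    test = TestArtifacts()
    test.record("vSwitch", "srm-test-vswitch")
    test.record("portgroup", "srm-test-portgroup")
    test.record("vm", "web01")
    for kind, name in test.cleanup():
        print(f"removing {kind}: {name}")
```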

Planned Migration

Sometimes a planned migration is necessary. When my employer built their cloud and had their first customer, the DR site wasn’t equipped to handle all the VMs the customer wanted at that site. Instead, the production site was chosen to house them for the time being, and the plan was to use SRM at a later date to move the DR VMs from prod to where they should have been all along.

During a planned migration, SRM synchronizes the VMs at the recovery site with the VMs at the protected site so that any recent changes are captured (it’s best to have the VMs not doing any work at this time, to minimise replication time). After this round of replication is over, SRM stops replication for those VMs. It’s advisable to detach devices such as DVD and floppy drives before attempting a migration; otherwise SRM will throw an error.
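Since attached DVD and floppy drives cause the migration to error out, a pre-migration check is cheap insurance. This is a hypothetical Python sketch: `VMInfo` and the plain-string device labels (`"cdrom"`, `"floppy"`) are simplifications assumed for illustration, not the device objects vCenter actually returns.

```python
from dataclasses import dataclass, field

# Hypothetical device labels for illustration; real vCenter device objects differ.
REMOVABLE_DEVICES = {"cdrom", "floppy"}


@dataclass
class VMInfo:
    name: str
    connected_devices: list[str] = field(default_factory=list)


def premigration_issues(vms: list[VMInfo]) -> dict[str, list[str]]:
    """Map each VM name to the removable devices still attached to it."""
    issues = {}
    for vm in vms:
        attached = sorted(d for d in vm.connected_devices if d in REMOVABLE_DEVICES)
        if attached:
            issues[vm.name] = attached
    return issues


if __name__ == "__main__":
    vms = [VMInfo("web01", ["nic", "cdrom"]), VMInfo("db01", ["nic", "disk"])]
    print(premigration_issues(vms))  # {'web01': ['cdrom']}
```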
