A nested lab lets you play with just about any vSphere component, and nested VSAN is no different. I wanted to check out stretched VSAN, so I built my lab accordingly, following the config guide.
- Single cluster of 4 nested hosts called Collins Street
- Two of these nested ESXi hosts in a site I called Production
- The other two nested ESXi hosts in a site I called DR
- A VSAN Witness host running ESXi as a VM, for which I downloaded a pre-cooked OVA from here (it does not require a real flash device; a tagged one works)
- VM called SAN running Starwind iSCSI SAN to provide storage to the hosts in the Production site
- VM called SAN1 running Starwind iSCSI SAN to provide storage to the hosts in the DR site
- Flat network 10.0.0.0/24 in use; this satisfies the VSAN recommendation of a stretched L2 network for VM and VSAN traffic between the primary and secondary sites
- A domain controller running Windows Server 2008 R2 to provide directory services and DNS
- Single vCenter Server running in the Production site
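To keep myself honest about the flat network, here's a quick Python sketch (the IP addresses are hypothetical stand-ins for my lab components) confirming everything sits on the one 10.0.0.0/24 segment:

```python
import ipaddress

# Flat L2 network carrying both VM and VSAN traffic
lab_network = ipaddress.ip_network("10.0.0.0/24")

# Hypothetical addresses for the lab components described above
components = {
    "vESXi60a (Production)": "10.0.0.11",
    "vESXi60b (Production)": "10.0.0.12",
    "vESXi60c (DR)":         "10.0.0.13",
    "vESXi60d (DR)":         "10.0.0.14",
    "Witness appliance":     "10.0.0.20",
    "vCenter":               "10.0.0.5",
}

# Every component should sit on the stretched L2 segment
for name, addr in components.items():
    assert ipaddress.ip_address(addr) in lab_network, f"{name} is off-net"
print("all lab components are on", lab_network)
```

Nothing clever here, but with nested labs it's surprisingly easy to fat-finger a vmknic address onto the wrong subnet and then chase phantom VSAN partition errors.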
There are a number of requirements and recommendations; I'll list some of them as one-liners here:
- High bandwidth/low latency link between primary and secondary sites
- Lower bandwidth/higher latency link between the site where the Witness host is located (let's call it the Witness site) and the other two sites
- A maximum of three sites, with only one VSAN Witness per stretched cluster; the Witness can be a physical host or a VM
- Each site is a VSAN fault domain
- 6.0 U1 is required for both vCenter and ESXi; this level provides the Advanced edition of VSAN, which allows for a stretched VSAN cluster. The Standard edition of VSAN does not
- DRS is desirable; no, make it required in my opinion, so as to provide automatic placement and migration. If a VM lands in the wrong site, you don't want its data being read and written over the inter-site link; data locality is what I'm getting at
- Both flash and spinning disks are supported in a hybrid model
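To put rough numbers on those link requirements, here's a small Python sketch. The thresholds are the figures I recall from VMware's stretched cluster guidance for this release (10 Gbps / 5 ms RTT between data sites, 100 Mbps / 200 ms RTT to the Witness site); treat them as assumptions and verify against the current config guide:

```python
# Link guidance for a stretched VSAN cluster (figures from memory of the
# VSAN 6.1 stretched cluster guide; verify against the official docs)
DATA_SITE_MIN_BW_MBPS = 10_000   # 10 Gbps between Production and DR
DATA_SITE_MAX_RTT_MS  = 5        # <= 5 ms round trip
WITNESS_MIN_BW_MBPS   = 100      # 100 Mbps to the Witness site
WITNESS_MAX_RTT_MS    = 200      # <= 200 ms round trip

def link_ok(bw_mbps, rtt_ms, to_witness=False):
    """Return True if a measured link meets the stretched VSAN guidance."""
    if to_witness:
        return bw_mbps >= WITNESS_MIN_BW_MBPS and rtt_ms <= WITNESS_MAX_RTT_MS
    return bw_mbps >= DATA_SITE_MIN_BW_MBPS and rtt_ms <= DATA_SITE_MAX_RTT_MS

print(link_ok(10_000, 1))                  # healthy data-site link
print(link_ok(500, 150, to_witness=True))  # modest link is fine for the Witness
```

The asymmetry is the whole point of the design: writes are mirrored synchronously between the data sites, while the Witness only holds metadata, so it tolerates a far worse link.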
Things to know:
- Machines running in one site will have a second copy in the other site. There’s no VM data in the Witness site
- In the case of full site failure, HA will restart machines at the other site
- The Witness host needs 2 disks, one flash and one spinning rust. When deploying the Witness host as an appliance, ensure you choose the right size of machine. I chose 'tiny', which meant 3 disks (one 8GB disk for ESXi, a 10GB disk tagged as 'flash' and the third a 15GB disk)
- Important! Being a nested setup, you'll run into the issue of the stretched cluster wizard throwing a warning saying the Witness host does not have the right hybrid disk group. Provided you've chosen the right deployment size and not mucked with the disks it provisions for itself, it is safe to accept the warning and keep going
- Important! Check the health status of the stretched cluster after finishing setup so you can identify issues before actually playing with this
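The failure behaviour above boils down to component voting: each object has a replica in each data site plus a witness component, and it stays accessible while a majority of those components survive. A minimal Python sketch of my reading of it:

```python
# Each object in a stretched cluster has three components: one replica per
# data site plus a witness component on the Witness host (a simplified
# sketch of the quorum logic as I understand it from the config guide).
COMPONENTS = {"Production replica", "DR replica", "Witness component"}

def accessible(failed):
    """An object stays accessible while a majority of components survive."""
    surviving = COMPONENTS - set(failed)
    return len(surviving) > len(COMPONENTS) / 2

print(accessible({"DR replica"}))                       # one data site down
print(accessible({"Witness component"}))                # witness down
print(accessible({"DR replica", "Witness component"}))  # two failures
```

Losing any one of the three sites leaves two of three components, so VMs stay up (and HA restarts them in the surviving data site); losing two takes the object offline.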
How I did it:
1. Enabled VSAN on the 4 nested hosts in my Collins Street cluster. Set my disks to automatically be claimed by VSAN
2. Added 2 disks to each virtual host and tagged one disk on each host as SSD, so as to mimic a flash tier and a capacity tier. For sanity's sake, the SSD disk was 10GB while the capacity disk was 32GB (a number chosen just so it stood out). Note the disks were claimed automatically because I had my cluster set to claim disks automatically. You can change this to manual if you also have disks for your usual VMFS datastores.
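As a back-of-the-envelope check on what this disk layout buys you: the tagged-SSD disks are cache only and contribute no capacity, and every object is mirrored across the two sites, so usable capacity is roughly half the raw capacity tier (ignoring VSAN metadata and slack-space overhead):

```python
# Rough usable-capacity estimate for the nested cluster; a sketch that
# ignores VSAN metadata and slack-space overhead.
HOSTS = 4
CACHE_GB = 10      # tagged-SSD disk per host; cache tier only
CAPACITY_GB = 32   # capacity-tier disk per host

raw_gb = HOSTS * CAPACITY_GB   # cache tier adds no usable capacity
usable_gb = raw_gb / 2         # every object mirrored across sites

print(f"raw: {raw_gb} GB, usable with cross-site mirroring: {usable_gb:.0f} GB")
```

So the whole lab yields about 64GB of usable VSAN storage, which is plenty for a couple of test VMs and nothing more; size up the capacity disks if you want to do anything serious.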
3. Created a 5th nested host in a new datacenter, assigned it 50GB of local disk and deployed the Witness appliance from the OVF I downloaded from the link at the beginning. No special config was needed on the nested host. Next, I added the appliance itself as another host. It is important to note that this host and appliance must sit outside the VSAN cluster. Also note the colour of the appliance host is blue, see below:
4. The appliance has a special config for one of its vmknics (the witnessPg portgroup); follow the config guide on what to do. Briefly, this port group needs VSAN traffic enabled, and its security settings don't need to be changed.
5. Then I went on to creating the stretched cluster; the guide tells you how to do it, so I'll just point out what I noticed. As part of the config, the wizard needs to claim the hybrid disk group (in other words, the 2 bigger disks in the appliance) for protecting VMs on the stretched cluster. I gather that, this being a nested setup, the wizard throws a warning saying the disk group cannot be claimed:
After mucking around for days (during which I redeployed the appliance half a dozen times!), I just hit Next and Finish to complete the setup. Note I said complete the setup; yep, just go ahead with it. It's likely an oddity of a twice-nested Witness appliance.
6. I checked the health of the stretched cluster after the setup completed (note it took about 5 minutes to finish, during which it carries out a series of verification and configuration steps)
The controllers check threw a warning, which is a given since this is a nested setup.
7. Finally, this is how the lab ended up looking:
VSAN can run some basic checks to see if things will work when it’s put to use
A dummy VM is created on every host and immediately deleted. This checks that every host in the VSAN cluster is operational and fit for VSAN: IO can be written and read, and networking works.
My nested lab passed this test:
Multicast performance check:
The cluster checks for multicast speeds of 125MB/s or better, and I got a 'failed' result on this:
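That 125MB/s threshold is just gigabit line rate expressed in bytes, as a quick conversion shows:

```python
# 125 MB/s is simply 1 Gbps divided by 8 bits per byte, i.e. the health
# check expects the multicast path to sustain gigabit line rate.
def mbps_to_MBps(megabits_per_s):
    return megabits_per_s / 8

threshold_MBps = mbps_to_MBps(1000)   # 1 Gbps link
print(threshold_MBps)                  # 125.0
```

A nested lab funnelling several virtual hosts through one physical NIC has little hope of hitting that, so a 'failed' result here is no surprise.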
The last test available is a storage check in which a number of IO tests are performed; I did not run this, knowing it would bring my rig to its knees.
For a lot more details and an exhaustive list of tests, check this whitepaper written by Cormac Hogan.
I won't run through the failure scenarios in detail (though I know they're what this thing should be all about); I did test a full site failure, and the poor lab fell over. Briefly, though, I'll show that I have VM1 located on host vESXi60a.domain.local; note it has a replica disk on a host in the DR site, and the Witness host knows about it too: