Hyper-V and SCVMM – tales from the trenches

HyperV, SCVMM, Windows servers hosting VMs – these are some of those things that strike disbelief, apprehension and that general feeling of – nah, it’s horrible, or it’s painful to maintain. Fortunately, it’s not all so bad.

Having worked on one of the largest deployments of HyperV and SCVMM in the APAC region I think I’ve got a fair perspective of how Microsoft’s virtualization technology behaves/works. I’ll divvy the topics up so things are clearer.

HyperV, the hypervisor

On its own, HyperV isn’t bad. You install the Windows Server 2012/2016 OS (the GUI-less version is recommended), enable the HyperV role, reboot, and you’re ready to host your first VMs. Really easy to deploy. One caveat: I believe you cannot go back and forth between the GUI and GUI-less versions with 2016.
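For the PowerShell-inclined, the enable-the-role step comes down to a single cmdlet. A minimal sketch, assuming an elevated session on a freshly installed host; the VM name, VHDX path, sizes and switch name are made up for illustration:

```powershell
# Enable the Hyper-V role and its PowerShell module, then reboot the host.
Install-WindowsFeature -Name Hyper-V -IncludeManagementTools -Restart

# After the reboot: create and start a first VM.
# 'test01', the VHDX path and 'vSwitch-Prod' are hypothetical names.
New-VM -Name 'test01' -Generation 2 -MemoryStartupBytes 2GB `
    -NewVHDPath 'D:\VMs\test01\test01.vhdx' -NewVHDSizeBytes 40GB `
    -SwitchName 'vSwitch-Prod'
Start-VM -Name 'test01'
```

The virtual switch has to exist before New-VM can attach to it; leave -SwitchName off if you want to wire up networking later.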

The 2012 GUI-less (Core, that is) version lets you launch the various .msc’s from c:\Windows\System32 if you’re inclined not to use PowerShell (you’re doing it wrong if you aren’t using PoSh for most of your tasks). The 2016 version doesn’t, because the .msc’s just aren’t there. Microsoft wants you to manage your hosts and clusters remotely via PowerShell or SCVMM (more on clusters in a bit).

Management: Assuming a 2012 host, you can manage it via the stock HyperV Manager console. It lets you do the usual stuff: power VMs up/down, attach/remove disks and so on. What you cannot do from the GUI is, for instance, a Storage Migration. After all, HyperV Manager is meant to manage single hosts. The GUI does let you add multiple hosts, but you switch between them just as you would with, say, the RDP Manager console. Of course, with PowerShell you can do whatever you want, migrations of any sort included.
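To give a flavour of the PowerShell route, here is a rough sketch of a storage migration using Move-VMStorage from the Hyper-V module. The host name, VM name and destination paths are assumptions for illustration, not anything from my environment:

```powershell
# Move all of one VM's files (config, disks, checkpoints) to a new location
# while the VM keeps running. 'app01' and the path are hypothetical.
Move-VMStorage -VMName 'app01' -DestinationStoragePath 'C:\ClusterStorage\Volume2\app01'

# The same for every VM on a host, one after another.
# 'HV01' is a made-up host name.
Get-VM -ComputerName 'HV01' | ForEach-Object {
    $target = Join-Path 'C:\ClusterStorage\Volume2' $_.Name
    Move-VMStorage -VM $_ -DestinationStoragePath $target
}
```

The loop runs the moves serially on purpose; firing them all at once is a good way to find out how much headroom your storage really has.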

Failover Clustering

This is one area where, I believe, HyperV trumps ESXi: Failover Clustering. It does not require SCVMM or any other management tool. You simply build two or more hosts, spin up a cluster and bam, you have a high availability cluster ready. VMs restart on the surviving hosts in the cluster, just like you’d want them to. So a failover cluster is free to set up (read: no expensive licensing required).
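Spinning up the cluster itself is only a few lines. A minimal sketch, assuming the Failover-Clustering feature is already installed on both hosts and shared storage is presented; the node names, cluster name and IP address are invented for the example:

```powershell
Import-Module FailoverClusters

# Hypothetical node names.
$nodes = 'HV01', 'HV02'

# Run the validation tests first and read the report before going further.
Test-Cluster -Node $nodes

# Create the cluster with a static management IP (made-up name and address).
New-Cluster -Name 'HVCLUS01' -Node $nodes -StaticAddress '10.0.0.50'

# Confirm both nodes joined and are up.
Get-ClusterNode -Cluster 'HVCLUS01'
```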

But it’s not all so straightforward. You HAVE to make sure you have the right patches installed on all your hosts; don’t do this and you risk downtime. DON’T just install any patch that has the word ‘cluster’ in it. You may not need all of them or, worse, they may break things. Work with your Microsoft rep to ensure you get the right patches. If you aren’t big enough to have a rep, make sure you run the patches in a test cluster before rolling them out into production. Save your weekends and sanity (not necessarily in that order). Microsoft have these lists floating around somewhere about critical patches for clusters. Well, it turned out my deployments needed 3 more patches on top of the ones listed in those patch checklists. So much for checklists, right?! Mind you, there wasn’t anything special about my deployments, just run of the mill clusters (3 of them when the issue occurred) running a few hundred VMs with run of the mill load profiles. Microsoft clearly didn’t finish their homework.

Make sure you have configured your cluster’s networks properly. Make sure you have created a Live Migration network that’s ideally separate from your management network. If you’re new to this game, Live Migration (or vMotion for that matter) is a bursty type of traffic that tries to use up all available bandwidth. If you don’t have much to begin with, you may find your hosts disconnecting when a migration eats up all of it.
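A quick way to see what the cluster thinks its networks are, and to label the one you intend for Live Migration. This is only a sketch: 'Cluster Network 2' is the sort of default name the cluster hands out and is an assumption here, as is the new name. Which network Live Migration actually prefers is a separate setting (Live Migration Settings in Failover Cluster Manager), so treat this as the housekeeping step only:

```powershell
# List the cluster networks with their roles and subnets.
Get-ClusterNetwork | Format-Table Name, Role, Address, AddressMask

# Give the dedicated network a meaningful name and keep client traffic off it.
# 'Cluster Network 2' is an assumed default name; check the listing above first.
$lm = Get-ClusterNetwork -Name 'Cluster Network 2'
$lm.Name = 'LiveMigration'
$lm.Role = 1   # 0 = not used by the cluster, 1 = cluster traffic only, 3 = cluster and client
```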

You cannot forget setting up SMB multichannel constraints for your cluster. You might go: whattha… These tell your hosts to use only a certain network when talking SMB to other hosts in the cluster. When do hosts talk SMB? Even in an FC storage network, more often than you think. Failover Clustering in HyperV has this idea of coordinator nodes: the nodes that own CSVs (or datastores if you will). When a VM sitting on a non-coordinator node triggers a metadata change on a CSV (or when the CSV is in redirected I/O mode, in which case it’s every read and write), the non-coordinator node talks to the coordinator node over SMB, that is, over the network. And which network? Any available network. In busy environments, this quickly becomes a problem. So you need a way to tell every host in the cluster: hey dude, when you want to send SMB traffic to this other host, use this network only. This is precisely what these constraints do. MUST PUT THESE IN.
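Something along these lines sets the constraints on every node, for every other node. It’s a sketch under assumptions: the cluster name and the interface alias 'CSV-Net' are made up, and the adapter needs to carry the same alias on each host for this to work as written:

```powershell
# Hypothetical cluster name and NIC alias.
$nodes = (Get-ClusterNode -Cluster 'HVCLUS01').Name
$alias = 'CSV-Net'

foreach ($node in $nodes) {
    # Every node gets one constraint per peer.
    $peers = $nodes | Where-Object { $_ -ne $node }

    Invoke-Command -ComputerName $node -ArgumentList $peers, $alias -ScriptBlock {
        param($peers, $alias)
        foreach ($peer in $peers) {
            # Only use the named interface when talking SMB to this peer.
            New-SmbMultichannelConstraint -ServerName $peer -InterfaceAlias $alias -Force
        }
    }
}

# Check what a given host ended up with ('HV01' is a made-up name).
Invoke-Command -ComputerName 'HV01' -ScriptBlock { Get-SmbMultichannelConstraint }
```

The constraint matches on the server name the SMB client actually uses, so if your hosts address each other by FQDN you may need entries for those names too.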

CSV cache: Very nifty little feature. It lets you set aside a part of the host’s RAM for caching reads, which saves the host a trip all the way to the array every time a VM asks for something that’s already in the cache. I saw considerable gains from enabling it. What’s great is that it took a simple PowerShell command to enable it for all hosts in all clusters, and utilization of the array dropped a fair bit, both in bandwidth and in IOPS generated. Cool feature, and free.
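For reference, the setting looks roughly like this. The cluster names and the 1024 MB figure are assumptions for illustration; size it against how much RAM you can spare on each host:

```powershell
# Windows Server 2012 R2 and later: cache size in MB per node, 0 switches it off.
# 'HVCLUS01' and 'HVCLUS02' are hypothetical cluster names.
foreach ($name in 'HVCLUS01', 'HVCLUS02') {
    (Get-Cluster -Name $name).BlockCacheSize = 1024
}

# Windows Server 2012 used a different property and a per-CSV switch:
# (Get-Cluster).SharedVolumeBlockCacheSizeInMB = 512
# Get-ClusterSharedVolume | Set-ClusterParameter CsvEnableBlockCache 1

# Read the value back to confirm.
(Get-Cluster -Name 'HVCLUS01').BlockCacheSize
```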

SCVMM

I disliked SCVMM from day dot. Maybe it’s because I came from the straightforwardness of vCenter’s C# client, I don’t know. SCVMM is clunky. The way it deploys networking isn’t the greatest, though the idea is cool: you create these capabilities in SCVMM and selectively deploy them to your hosts. I won’t go into the details of setting up networks, for that is not the purpose of this post. These capabilities require WinRM to work flawlessly between the hosts and VMM. This is where the problem lies. WinRM is the black bubonic plague of HyperV virtualization. And it’s not a matter of simply opening up the right ports and creating exceptions. I believe I had all the right bits in place for WinRM and it worked, say, 70% of the time. Other times random operations on hosts would just fail. Check the events and sure enough, WinRM failed to connect. WinRM could not.. WinRM.. WinRM.. Just say your prayers to your favourite god and keep plowing forward, as in try over and over till you succeed. I had scripts fail multiple times: scripts that deployed networking, ones that created clusters, and ones that added hosts to clusters. Any and all of them failed on more than one occasion.

I also believe WinRM is the reason the view of VMs in SCVMM is usually, or at least sometimes, different from the view in Failover Cluster Manager. Since the views can differ, it pays to quickly match both up when you’re troubleshooting things. And since you’ll find yourself troubleshooting more often than not, make sure you have your tools ready!
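The try-over-and-over approach is easy enough to wrap in a helper. This is a sketch, not what my scripts looked like: Invoke-WithRetry is a made-up function name, the host name is hypothetical, and the attempt count and delay are arbitrary:

```powershell
# Hypothetical helper: run a script block, retrying on terminating errors.
function Invoke-WithRetry {
    param(
        [Parameter(Mandatory)] [scriptblock] $ScriptBlock,
        [int] $MaxAttempts = 5,
        [int] $DelaySeconds = 15
    )
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            return & $ScriptBlock
        }
        catch {
            Write-Warning "Attempt $attempt of $MaxAttempts failed: $($_.Exception.Message)"
            if ($attempt -eq $MaxAttempts) { throw }   # out of attempts, give up loudly
            Start-Sleep -Seconds $DelaySeconds
        }
    }
}

# Check that WinRM on a host answers before doing any real work.
# -ErrorAction Stop turns a failure into a terminating error so the helper can catch it.
Invoke-WithRetry { Test-WSMan -ComputerName 'HV01' -ErrorAction Stop }
```

Only wrap operations that are safe to run twice; retrying a half-finished cluster creation can leave you with a bigger mess than the original failure.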

System tasks: This requires its own little section. SCVMM wants to do all these system tasks. Refresh Host Cluster, Refresh Virtual Machine and Read Storage Provider were the ones that popped up the most. These look like normal system tasks, like a keep-alive mechanism the system uses. Fair enough. But.. they happen too often in my opinion. Most of these tasks run every 15 minutes, some sooner. Some take hours, even an entire day, to run. While these are running, you CANNOT perform any of your own actions on the same object. Say you wanted to add a host to a cluster and the Refresh Host Cluster task was already running: your operation to add the host will fail. If you wanted to edit a VM and it had a system task running, it’ll let you kick the edit off but then bomb out right away. Oh, and the Read Storage Provider task, oh my.. It takes a whole day to run, and if you want to create a new LUN using the SMI-S functionality (more on this later) in the meantime, it will let you kick off the task, which then dies immediately. To top it all off, there’s no way, via the GUI or PowerShell (rendering it powerless), to kill a system task. The exasperation..
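Since you can’t kill these tasks, about the best a script can do is wait for them. A rough sketch, assuming the VMM PowerShell module is installed; the server name, the job-name pattern and the timeout are made up, and I’m assuming the job objects expose Name and Status the way the console shows them:

```powershell
Import-Module VirtualMachineManager

# Hypothetical VMM server name and job-name pattern.
$vmm      = Get-SCVMMServer -ComputerName 'vmm01'
$pattern  = '*Refresh host cluster*'
$deadline = (Get-Date).AddMinutes(30)

# Poll until no matching job is running, or give up at the deadline.
while (Get-SCJob -VMMServer $vmm |
        Where-Object { $_.Status -eq 'Running' -and $_.Name -like $pattern }) {
    if ((Get-Date) -gt $deadline) {
        throw "Still waiting on a job matching '$pattern' after 30 minutes."
    }
    Start-Sleep -Seconds 60
}

# Safe (or at least safer) to kick off your own operation from here.
```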

SMI-S provider: During my time with SCVMM, I was able to get the SMI-S provider configured with the help of colleagues. It lets you provision LUNs, expand them, delete them and so on from the SCVMM console, without having to ask the storage team to do the storage tasks for you. People have written posts on how to get SMI-S and SCVMM going; Google around to learn more.

Memory management: From what I’ve seen, there isn’t much going on. When a VM asks for physical memory and the host has some to give away, it’ll give it out. It never claims it back, or at least doesn’t appear to. As far as I could tell, there’s no balloon driver installed on VMs. Sure, there’s this package called Integration Services that’s meant to go on every VM, but I couldn’t find a balloon driver or anything of that nature in it. Hypervisors aren’t meant to just yank memory back from VMs during times of contention; they ought to work with the guest OS, determine which pages can be reclaimed and go from there. In addition, there’s nothing like TPS going on either. There’s the Shared Memory section in Task Manager, which may be similar, but I haven’t seen the substantial savings I have with vSphere, at least in the VDI environments I’ve run. Dynamic Memory in HyperV appears to be a marketing gimmick for most production environments from what I’ve seen: it’s only dynamic when dishing memory out and doesn’t take it back. Read this article for a good explanation on this fad.
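If you want to judge this for yourself, compare what each VM has been handed against what it is actually asking for. A minimal sketch using the Hyper-V module; 'HV01' is a made-up host name:

```powershell
# For every VM with Dynamic Memory enabled, show assigned vs. demanded memory.
Get-VM -ComputerName 'HV01' |
    Where-Object { $_.DynamicMemoryEnabled } |
    Select-Object Name, State,
        @{ Name = 'AssignedGB'; Expression = { [math]::Round($_.MemoryAssigned / 1GB, 2) } },
        @{ Name = 'DemandGB';   Expression = { [math]::Round($_.MemoryDemand / 1GB, 2) } },
        MemoryStatus |
    Sort-Object AssignedGB -Descending
```

A VM whose assigned figure sits well above its demand for hours on end, on a host that is short of memory, is the behaviour I’m grumbling about.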

PowerShell: You simply have to learn enough PowerShell to keep things ticking along. Period. There are a myriad of things that can, or rather must, be done with PowerShell only. Things like a mass storage migration, an Integration Services install/update, or changing the number of simultaneous migrations a host can do all require a PowerShell script. Learn it, or get left behind. Better still, get DSC going to smooth things out for yourself. DSC takes a lot of learning and effort to get going, but once you’re there it’s worth it, from what I’ve been told by multiple sources.
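As an example of the sort of thing that only PowerShell gets done in bulk, here is a sketch that changes the simultaneous migration limits on every node of a cluster. The cluster name and the limits are assumptions; pick numbers your network and storage can actually sustain:

```powershell
# Hypothetical cluster name.
$nodes = (Get-ClusterNode -Cluster 'HVCLUS01').Name

# Allow 4 simultaneous live migrations and 2 simultaneous storage migrations per host.
Set-VMHost -ComputerName $nodes -MaximumVirtualMachineMigrations 4 -MaximumStorageMigrations 2

# Read the settings back from every host.
Get-VMHost -ComputerName $nodes |
    Select-Object Name, MaximumVirtualMachineMigrations, MaximumStorageMigrations
```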

Performance charts: Very rudimentary. This is an apt description. The performance charts in SCVMM are very basic: memory consumed, number of VMs running, how hot the CPU is on the host(s)/cluster and so on. Nothing like the in-depth reporting you may be used to seeing with vCenter. People may reckon you’ve got SCOM or Perfmon/ResMon and some PowerShell commands. Yeah, you do, but then again it comes down to the simplicity of it all, or lack thereof, with HyperV. I wish Microsoft had integrated decent charts into SCVMM; it would’ve greatly enhanced its usefulness and credibility.

I’ll close this off here. The post has been long already; hopefully I haven’t rambled much.

TLDR – HyperV isn’t all gloom. It works. It requires a lot of effort to keep the lights on. There’ll be days when you’ll doubt yourself. But when it comes to saving $$$ on licensing, you’ll want to check it out. What about the management overhead, the endless man hours spent? You decide.
