Wednesday, December 16, 2015

VMWare Distributed Switch Migration


During my configuration adventures I ended up with multiple distributed switches across my cluster. I am going to try to migrate one of my host's connections from one dswitch to another.

I suppose there are a number of ways I could do this, but I'm going to first start by going to the switch I want to move the connections to, and kicking off the wizard.

  • Actions->Add and Manage Hosts
  
  • Click on the big green plus sign, and choose the host I want to add, here it is once I pick from the list.
  • Check the first two boxes. 
  

  • I'll need to re-assign the uplinks for the ports on Dswitch_Demo to this switch, DswitchARDC_VSAN.
 
  • Choosing vmnic0 and Assign uplink, I'm going to put these on my previously setup lag ports 4 and 5.
  

  • That looks right, let's see what happens next.
  • Here vCenter is complaining that I'm going to leave vmk2 high and dry, but that's ok. It will be deleted after this. Actually, I need to move vmk2 as well, but I messed up. See below :)
  • This next screen checks impact. None here, so ok. 
 

  • And one more pointless status page from vcenter before we complete.
  
 

  • Well, it looks like everything came out alright. 
 
  • Wait a tick, that's not right! Where is my vmk2? I guess I left it back on the other switch. Let me go back into the wizard and see if I can move that over too. 
  • This time I picked "Manage Host Networking" instead of Add Host.
  • Just need to move the vmk, so only check "Manage VMKernel Adapters" on the next page.
  • Here is the page I must've skipped last time. Guess I need to pay attention to DETAIL!

  • Ok, so I selected vmk2, clicked Assign port group, and chose the port group for my new dswitch. Here is the end result.
 

  • Here goes nothing, hope it works. 
  • But first...analyze impact...show summary. (so verbose)
  • When you're done, the screen just disappears, so I guess you have to dig back into the app to verify the changes.
 
  • This looks almost right. For some reason the link light isn't on for my vmk2 on 172.20.52.242 (The vmk I just moved).
  • Well, I guess this is just another shortcoming of the web client. I refreshed the browser and it is "lit" now. Let's run a vmotion and see what happens. 
  • Wow, that was fast! Vmotion test succeeded. Looks like I'm good to go.
 
 

VSAN 10G Switch Configuration Cisco 3548 VMWare Distributed Switch Part #3


So, I have my Distributed switch setup on 1 host for vmotion testing. Here are a couple of settings that I need to verify

Top Level - Distributed Switch Configuration


  1. Make sure the port group is set to Active. Highlight the switch, Edit Settings, LACP

  

  2. Check that the MTU matches the rest of your config (if using jumbo frames).
 

  Distributed Port Groups 

  1. Advanced-> Allow VLAN
 
  2. Set VLAN (if applicable)
  

  3. Teaming and Failover. Make sure the lag group we set up is in the "Active Links"; by default it is set to "unused". You have to move all of the uplinks to unused and move the lag group to "active".
 

Testing - Vmotion 

  1. Log into the target ESXi host to test vMotion (the host that the VM is moving to)
  2. Run esxtop, then press n for network and T to sort by size
  3. Start a storage vMotion to the target host while simultaneously monitoring esxtop
Shown here is my vmk2 (vmkernel on the distributed switch). As you can tell, during the vMotion it is utilizing both vmnic4 and vmnic1 within my LACP group, so I know this is working the way it's supposed to. Esxtop can be quite helpful here. For example, if you notice traffic going over the wrong vmnics, you may have to re-check your configuration.
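The esxtop workflow above, condensed into one sketch (the key presses are interactive, and the column names are from memory, so treat the output shape as illustrative):

```
# On the target ESXi host (assumes SSH / ESXi Shell is enabled)
esxtop
#   press 'n'  -> switch to the network view
#   press 'T'  -> sort by transmit size, per the steps above
#
# Then kick off the storage vMotion and watch the TX columns for
# the vmnics in the LAG: both members should carry traffic.
```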

  
  1. To further test my distributed switch and port-channels, I disabled the links to the host one at a time (during a vMotion) to validate the failover capability. And much to my surprise, it worked perfectly.

Tuesday, December 15, 2015

VSAN 10G Switch Configuration Cisco 3548 VMWare Distributed Switch Part #1


Best practices for VMWare VSAN call for using the vcenter distributed switch. It's a funny thing about licensing, because I actually didn't know that I was able to use the distributed switch until I read around a bit on the internet. I knew that our version of vcenter (enterprise, NOT enterprise plus) wasn't licensed for the Distributed Switch, but when we purchased our VSAN licenses, the Distributed switch was included with it. So, there you have it.

I have to admit, I was a bit anxious about setting this up properly. First of all, I'm not a networking expert. I had done some remedial switch config to get the VSAN up, but Etherchannels, port-channels, VPCs and the like were not exactly my strong suit. However, I did manage to set up my two Cisco Nexus 3548 switches as peer links, so I had a working knowledge of the basics.

Just as a quick review, the starting point for this endeavor is 3 VMWare Esxi 5.5u3a hosts, with 4 10Gig connections each. Currently, they are all connected using the VMWare Standard Switch. I hope to nail down the correct configuration to have all connections setup through the distributed switch.

I'm sure VMWare mentions it somewhere, but migrating from standard switch to distributed switch seems like a lot of work. Do I configure the switch first, then the hosts? How do I test it? How do I know if it is working right?

I'm starting off with two Vcenter standard switches on each host. One is setup for VSAN traffic, and one is setup for VMotion traffic. I had a heck of a time configuring the VSAN initially, so to mitigate any issues with the VSAN, I'm going to experiment with the vmotion switch. If I mess this up, at least my VSAN config will still be working.

So my plan of attack is as follows:
  1. Setup the distributed switch on a single host for vmotion
  2. Configure the cisco switch for LACP/port channeling for the two connections 
  3. Verify storage vmotion capability for distributed switch host
  4. Move all Hosts vmotion networking to distributed switch
  5. Move all Hosts VSAN networking to distributed switch* 
*If I'm lucky, I'll get through steps 1-4 in this post and save #5 for later.

Vcenter Configuration (step 1)


First I removed the vCenter standard switch with vmnic0 and vmnic4 from my host's network configuration section.

Next I created a VCenter Distributed Switch via the web client.

On the left hand nav bar select VCenter->Distributed Switches. On the right pane click the + button for Create New Distributed Switch.

I chose a really lame name, and specified Distributed Switch version 5.5. I want the best LACP has to offer!

On the number of uplinks, I changed this to 6 (2 per host); however, I think this is configurable later, so I'm not sure it really matters. I also selected Default port group and named it vmotionportgroup2 (how unoriginal!). And last but not least, click Finish.

Here is what it looks like in web client. Nothing groundshaking here. 

Next, go to the switch and select Manage->Settings. Here, I selected the LACP tab and added a new Link Aggregation Group. For lack of a creative name, I called mine vmotionLag, 6 ports, defaults for the rest, and chose Passive for the mode. (This means it is off; once I configure the switchports, I can come back and turn this to Active.)


There are a number of ways of adding a host to a vcenter distributed switch (or is it vice versa?), but I am going to use the easiest way I can think of, the wizard. From Vcenter web client, select the distributed switch and click the Actions Tab on the right pane. From there select Add and manage Hosts

  1. Add hosts
  2. New hosts...
  3. <choose one host>, remember, I want to get distributed switch working on just one host for now. KISS!
  4. Next..
  5. Check the first two boxes for Manage physical adapters & Manage VMKernel adapters
  6. Next.

Shown above, vmnic1 and vmnic4 are free. VMWare makes it pretty easy by showing which vmnic's are free. On this step, DON'T click next until you have setup each uplink. 

  1. Highlight one of the free vmnic's. (vmnic1)
  2. Click the Assign Uplink button

  1. Since we enabled LACP we have the vmotionLag2 ports, don't use the regular uplinks.
  2. Do the same thing for vmnic4.
Here is the end result. 



  1. Click Next. This takes you to the Manage VMKernel Network Adapters page. As you can see I have my original vmk0 and vmk1 (my other 10G connections I'm keeping on standard switch for now). I want to create vmk2 which will use the distributed switch. 




  1. Click New adapter
  2. In step #1, browse for the adapter type and choose vmotionportgroup2
  3. In step #2, enable the vMotion traffic checkbox
  4. In step #3, assign a static IPv4 address/mask
  5. Finish


  1. Next
  2. Next again
  3. Finish (Finally!)
Ok, just want to hop back to my host view / networking and make sure everything is there. Everything looks ok. If you notice, my vmotionportgroup2's virtual link light is not lit up...this is still in Passive mode. I will have to come back and change this to Active once the Cisco switches are set up.





VSAN 10G Switch Configuration Cisco 3548 VMWare Distributed Switch Part #2

If you haven't seen my earlier post, the starting point here is 3 VMWare Esxi 5.5u3a VSAN clustered hosts, each with 4 10G NICs. The goal is to get all 3 working with distributed switches for VSAN and vMotion. The first step (in post 1) was to remove the vCenter standard switch from one host, add a distributed switch configured for vMotion on that host, and leave the port group in Passive mode. Thus, we move on to

Cisco Switch Configuration

The Cisco Nexus 3548 switch runs NX-OS, which is pretty easy to learn. It is well documented online, and there are tons of examples. Having said that, I will still show the exact commands and their output. Also, these switches have already been configured as peer links.

The first thing I did was create a mapping of every port on every host, to each switch port. Some people may be great at documentation and have this on hand from the install, but I used a slower and less efficient method. I shut down each port using
  1. conf t
  2. interface ethernet 1/5 
  3. shutdown
Then I would flip over to vcenter and see which link was disconnected. Then run no shutdown for the same port to turn it back on. Here are my results.
Host/vmnic/switch/switchport

On this host I have 4 NICs. I am currently using vmnic0/5 for VSAN connection. So I'll be working with the orange highlighted.
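A less disruptive alternative to the shut/no-shut method, assuming CDP is still enabled on the vSphere side (it is by default on standard switches), is to ask the switch who is on each port. The output layout below is just a sketch of the shape:

```
cmcdrswitch1# show cdp neighbors
! Each ESXi host should appear with the vmnic it presents
! on that switch port, something like:
!   Device-ID      Local Intrfce   ...   Port ID
!   esxhost1       Eth1/2          ...   vmnic0
```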

Well, I just realized something is amiss with this configuration. Just so there aren't inconsistencies with my first post, I'm going to go ahead and address this now before moving on. I have the wrong vmnics grouped together for VSAN and vMotion. The other hosts in my VSAN cluster have vmnic1 and vmnic4 on different switches. This was built in for triple redundancy (cable failure, card failure, and switch failure). However, it looks like I should be grouping together vmnic0/4 and vmnic1/5 to achieve this. Let me update my vCenter switch and I'll come back. So, in case you are wondering why part #2 of this post has a slightly different configuration than part #1, now you know. (And knowing is half the battle.)

Ok, so I changed my adapters, moving forward with vmnic0/4 for Vmotion on vsphere distributed switch.

I like to bring up two ssh sessions, put cmcdrswitch1 on the left and cmcdrswitch2 on the right. It is easy to get confused.

Here is a screenshot of the command show interface brief for the ports we'll be changing

cmcdrswitch 1 show interface brief


On this switch we will be working with interface ethernet 1/2. This is pretty much the default config. I think the only thing I changed was the VLAN id when initially setting this up.

To see the running config for a port run

cmcdrswitch 1 show running-config interface ethernet 1/2

I will run the following commands to setup the port-channel
  1. conf t
  2. interface ethernet 1/2
  3. switchport mode trunk
  4. switchport access vlan 15
  5. channel-group 22 mode active
  6. no shutdown
Now if you run show interface brief again, you can see the changes to the port. (will show Mode as trunk and Port CH# as 22).

And if you run show port-channel summary, you should see port-channel 22.
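For reference, here is the port-side config as a single sketch (VLAN 15 and channel-group 22 are my values; yours will differ). One caveat I'd flag: with switchport mode trunk, the switchport access vlan command has no effect, so if the intent is to restrict the trunk to VLAN 15, the usual command is switchport trunk allowed vlan instead:

```
conf t
interface ethernet 1/2
  switchport mode trunk
  ! on a trunk, limit the carried VLANs like this
  ! (access vlan only applies in access mode):
  switchport trunk allowed vlan 15
  channel-group 22 mode active
  no shutdown
```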



So, what have we done? Nothing yet. We need to do the exact same thing on switch2 for interface ethernet 1/1. 

I'll save the verbosity of repeating the steps above. Here is the result on switch2.


Surely this is an accomplishment we can boast of? Nope. We've just set up port-channeling on two different switches. Now is when the magic happens. On both switches, run the commands
  1. conf t
  2. interface port-channel 22 
  3. vpc 22
 Now run: show vpc brief on each switch to verify that it is up and running.




Now we've really done something. The vSphere distributed switch (with LACP enabled) will utilize both of these links for load balancing and failover. This is quite different from, and way better than, "teaming" in a vSphere standard switch. If you want the technical answer you can Google it, or you can just take my word for it.


Lastly, I need to run a few vMotions to see if this is even working right. Don't forget to change the port group mode to Active. And be sure all of the MTUs match between vMotion switches, etc. I have everything set to jumbo frames (9000), even though I don't think it makes much of a difference for VSAN.
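A quick way to verify the MTUs end to end before trusting vMotion to jumbo frames: list the vmkernel NICs, then ping the other host's vMotion vmk with a near-9000-byte don't-fragment packet. The 172.20.52.242 address is the vmk from my testing above; substitute your own.

```
# On the ESXi shell: list vmkernel NICs and their MTUs
esxcfg-vmknic -l

# -d sets don't-fragment; 8972 = 9000 minus 20 (IP) + 8 (ICMP) headers.
# If any hop has a smaller MTU, this ping fails.
vmkping -d -s 8972 172.20.52.242
```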

Next up, Testing and Validation.

Wednesday, December 9, 2015

VSAN 10G Switch Configuration Cisco 3548 SFP+


VSAN Switch Configuration. 

Part One: Physical Switch

3 Hosts, 4 10GB connections per host
2 Cisco 3548 Nexus Switches configured for failover (VPC)
Hosts using VCenter Standard Switch

Initially I got all my switches set up and plugged everything in, only to find that my datastore was segregated into three separate "Network Partition Groups". I dug around a little and found out that this had to do with the IGMP settings on my switch. VMWare uses some multicast addresses for this:

224.1.2.3 
224.2.3.4

If igmp snooping is not setup correctly on your switches, you will see this error

"Network status: Misconfiguration detected" and will most likely see Group 1, Group 2, Group 3 in the Network Partition Groups column of VSAN Disk Management. They should all be in the same group.

The switches we are using run NX-OS; here is the result of the query

show running-config | grep igmp
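If that grep comes back empty (or the partition groups persist), one common fix is to give the VSAN VLAN an IGMP snooping querier so those multicast groups actually get forwarded. A sketch only; the vlan configuration syntax is NX-OS specific, and the VLAN id and querier IP here are placeholders, not my values:

```
conf t
! IGMP snooping is on by default in NX-OS; without a multicast
! router on the VLAN, a snooping querier is usually needed so the
! VSAN groups (224.1.2.3 / 224.2.3.4) are forwarded to all hosts
vlan configuration 15
  ip igmp snooping querier 172.20.52.1
```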





In addition to changing the IGMP settings, I also had to enable Virtual Port Channeling for switch failover. Here is shown the output of
show interface brief


All of the host network interfaces are connected to Eth1/1-Eth1/6. Eth1/15 and Eth1/16 are connected from switchA to switchB.

Running the command show vpc brief  on each switch, will let you know if you've configured it properly. Here is the output from both switches



Part Two: Virtual Switch (VCenter Standard Switch)

All the documentation out there points to the VCenter Distributed Switch as the recommended implementation for VSAN. To keep things simple I decided to start first with a VCenter Standard Switch, and move to the distributed switch once I was up and running.

Each Host has 4 10GB Nics. 2 cards with 2 ports each. In order they are vmnic0, 1 and vmnic 4,5.  
First, I decided to pair vmnic0 and vmnic5 into a standard switch for VSAN traffic, and vmnic1 and vmnic4 into a separate switch dedicated to vMotion traffic. For redundancy, the two connections in each pair are routed to separate physical switches. Thus, the failure of a single link, network card, or physical switch will not undermine the integrity of my two 10G networks.


So far the performance is quite good. I can't imagine the Distributed Switch being much better, but in time I will get there and have some benchmarks ready for comparison. 

Thanks for reading.


Monday, December 7, 2015

VMWare 5.5 ScratchConfig Syslog Etc.

Syslog and ScratchConfig in VSAN

After digging into the matter, I looked up some best practices, and this is what I've come up with.

1. The Host Syslog should not be on the VSAN Datastore
2. The ScratchConfig should not be on the VSAN Datastore, or on SD card.
3. Ideally there would be some local storage for these, but if VSAN claims all my local disks, what to do?
My proposed solutions are:

1. Sacrifice a HDD from each Host for Syslog and ScratchConfig.
2. Buy some smaller HDD's that are slower and cheaper.
3. Run without a local storage (in this case VMWare uses a ramdisk and data is not saved between boots).
4. ??

As it turns out during the initial install of 1 instance of VSAN I ended up with solution #1 and another instance ended up with solution #3. It took me a long time to figure out why 1 disk was missing from my VSAN setup, and then I began to see what happened. At some point during the install, VMWare made the decision for me and sacrificed one of my disks for Syslog and ScratchConfig. How I ended up with 2 different implementations is beyond me, but it actually helped me figure out the problem.

So next came the task of choosing my path. I could convert one instance or the other, but I could not abide having two different configs, so I chose to sacrifice a drive. I know, it might be ridiculous to use a 1.2 TB drive for 4GB space, but I couldn't find a better solution.

Then I had to reclaim a disk from VSAN, format it for local use, create a filesystem, then re-point Syslog and ScratchConfig to use this new space.

First I moved all of the Vms off of the machine that I wanted to reclaim a disk from and put the host in maintenance mode. Next, I went to the disk management portion of VSAN configuration, and made a note of the UUID of the disk that I wanted to remove. Then I clicked "remove". All is well, nothing exploded.

After the disk was free, I could get it ready for local vmfs filesystem.
Here is what the disk looked like in partedUtil when part of the VSAN.



Here is what the disk looked like in partedUtil after removing from VSAN.



If you run partedUtil showGuids, it shows you what the GUIDs mean; we'll need the appropriate GUID for creating the local 4.0GB partition.



Next, I had to run
partedUtil setptbl
You might have to do a little math to get it right at 4GB.
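The math, assuming 512-byte sectors: a 4GB partition spans 4*1024^3/512 sectors. A sketch of the arithmetic; the setptbl invocation is shown only as a commented shape, with the disk name, start sector, and type GUID as placeholders:

```shell
# Number of 512-byte sectors in 4 GB
sectors=$((4 * 1024 * 1024 * 1024 / 512))
echo $sectors    # 8388608

# Shape of the setptbl call (placeholders, do not copy verbatim):
#   partedUtil setptbl /vmfs/devices/disks/<naa.id> gpt \
#     "1 <start> $((<start> + sectors - 1)) <vmfs-type-GUID> 0"
```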



Now I have a 4GB partition on a local disk that is not part of VSAN. Excellent. Without a filesystem mounted, though, it will be useless. Running
esxcli storage filesystem list
shows the currently mounted filesystems. To create a filesystem on the new partition, I used the command
vmkfstools -C vfat "/vmfs/devices/disks/naa5000c5007f95b82f:1"
where the :1 refers to the first partition.



Now, running the filesystem list again, I see my newly mounted filesystem. Make a note of the mount point; we'll need it for the next steps. Note that this matches the UUID created in the previous step.



Next, copy the mount location and paste it into the Configuration->Advanced->ScratchConfig.ConfiguredScratchLocation.
You can see here that I'm still running on /tmp/scratch (ramdisk), and will have to reboot for the changes to take effect. Time to go get a cup of coffee.



Ok, now let's go back and check that same setting.



Next, enter the value
/scratch/log
in Advanced->Syslog->Syslog.global.logdir
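Both settings can also be made from the ESXi shell instead of clicking through the client. A sketch using the esxcli namespaces; the volume path is a placeholder for the vfat volume mounted above:

```
# Point scratch at the new partition (takes effect after reboot)
esxcli system settings advanced set \
  -o /ScratchConfig/ConfiguredScratchLocation \
  -s "/vmfs/volumes/<new-vfat-volume>"

# Point syslog at the scratch log directory and reload the daemon
esxcli system syslog config set --logdir=/scratch/log
esxcli system syslog reload
```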



Now, to see if that all worked. Login to the shell and navigate to the new scratch partition/log directory. You should see all the log files.



Scratch and Syslog files are now off the VSAN Datastore. Since they are on a local datastore, and not a ramdisk, they will persist between reboots. I believe this is as close to "best practice" as it gets with VMWare VSAN. Thanks for reading.