NetApp, Data ONTAP, and PCI DSS

I’ve just found a very interesting document that I wish I had seen before taking care of these issues manually. If you’re running NetApp in a PCI DSS environment, or planning to, you should take a look at this checklist.

ESXi 4.1 and iSCSI

Coming from mostly NetApp / NFS environments, I rarely get to set up iSCSI for my ESX datastores. Recently, I was blessed (or cursed) with having to set up an iSCSI box from a manufacturer I hadn’t dealt with before. So I did all the nifty little things required, like setting up my RAID, volumes, LUNs, and whatnot, and decided to “hook it up to the ESX server”. I navigated to the Storage Adapters menu in ESX, selected my Broadcom iSCSI Adapter, and was greeted with the message:

The host bus adapter is not associated with a vmknic. To configure targets the adapter should be associated with a vmknic. Refer to the VMware documentation…

(Gotta love NFS…)

So, what’s going on here? Well, it’s quite simple actually…

All you basically need to do is bind your VMkernel NIC (vmknic) to the iSCSI adapter. Here’s where we get to log in to the handy dandy SSH / Remote Tech Support connection. (I’m going to assume that you’ve hooked up the cables to the right ports and already created a virtual switch with the kernel ports; if not, now would be the time…)

To get a good overview, I like to list everything I’ve got that could play a role:

“esxcfg-vmknic -l” returns a list of my VMkernel NICs:

~ # esxcfg-vmknic -l
Interface  Port Group/DVPort   IP Family IP Address
vmk0       Management Network  IPv4      192.168.xxx.xxx
vmk1       VMkernel-iSCSI      IPv4      192.168.xxx.xxx

“esxcli swiscsi nic list -d vmhba32” returns a listing of the vmknics that are bound to the HBA. (At this point, it should return “Errors: No nics found for this adapter.”)

With a similar command, we can see which physical vmnic we are talking about:

~ # esxcli swiscsi vmnic list -d vmhba32
vmnic2
vmnic name: vmnic2
mac address: xx:xx:xx:xx:xx:xx
mac address settable: NO
maximum transfer rate: 1000
current transfer rate: 1000
maximum frame size: 1500

Now we need to attach the vmknic to the HBA. To do so, we simply enter “vmkiscsi-tool -V -a vmk1 vmhba32” and should get a success message. By the way, the command “esxcli swiscsi nic add -n vmk1 -d vmhba32” seems to do the same, but doesn’t provide any output.

~ # vmkiscsi-tool -V -a vmk1 vmhba32
Adding NIC vmk1 …
Added successfully.

Performing the list command again will return a lot more info, and we see that things seem to be working.

~ # esxcli swiscsi nic list -d vmhba32
vmk1
pNic name: vmnic2
ipv4 address: 192.168.xxx.xxx
ipv4 net mask: 255.255.255.0
ipv6 addresses:
mac address: xx:xx:xx:xx:xx:xx
mtu: 1500
toe: false
tso: true
tcp checksum: false
vlan: true
vlanId: 0
ports reserved: 63488~65536
link connected: true
ethernet speed: 1000
packets received: 301487
packets sent: 48
NIC driver: bnx2
driver version: 2.0.7d-3vmw
firmware version: 1.9.6

After that, you can continue setting up iSCSI via the VI client.
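If you would rather check the binding from a script than by eye, the listing above is easy to parse. What follows is a minimal Python sketch, assuming the output looks like the listings in this post (a bare vmknic name followed by “key: value” lines). The names parse_nic_list and SAMPLE are made up for illustration and are not part of any VMware tool.

```python
# Sketch only: parses text shaped like the `esxcli swiscsi nic list` output
# shown in this post. The exact output format is an assumption.

SAMPLE = """\
vmk1
pNic name: vmnic2
ipv4 address: 192.168.xxx.xxx
mac address: xx:xx:xx:xx:xx:xx
link connected: true
"""


def parse_nic_list(output):
    """Return {vmknic_name: {field: value}} for every bound vmknic."""
    nics = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if not line or line.startswith("~ #"):
            continue  # skip blank lines and the shell prompt
        if ":" not in line:
            # A bare name such as "vmk1" starts a new entry.
            current = nics.setdefault(line, {})
        elif current is not None:
            # Split on the first colon only, so MAC addresses stay intact.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return nics


if __name__ == "__main__":
    for name, fields in parse_nic_list(SAMPLE).items():
        print(name, "->", fields.get("pNic name"), fields.get("link connected"))
```

An empty result means nothing is bound yet, which matches the “No nics found” case from before the vmkiscsi-tool call.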

Listing all Snapshots on an ESX Cluster / Virtual Center

As most of you know, leaving a snapshot running indefinitely on a virtual machine isn’t exactly great performance-wise. A snapshot should ideally be removed as soon as possible to prevent performance degradation on your systems. Unfortunately, what usually gets forgotten until someone reports that things aren’t running correctly? That’s right: removing the snapshot.

How to find those pesky little buggers, you ask? Well, perhaps the most obvious method would be to right-click on each VM and hover over “Snapshot”. If “Revert to current snapshot” is greyed out, the VM has no snapshot, and all is fine. However, you’re most likely running more than one VM on more than one host. Depending on the size of your environment, you can have quite a few virtual machines running, and clicking each one simply doesn’t cut the mustard. Fear not, there are easier methods of accomplishing this.

In order to do this, you’re going to need the PowerCLI, which is basically a set of PowerShell libraries and commands that allow you to interact with your ESX and Virtual Center servers. As you most likely can guess, you can get this magical set of voodoo here. Next step: install it.

OK, now we are getting there, but in order to use it, we need to start it. Navigate to where you installed it, and open it. You should get a PowerShell console with the right snap-ins loaded.

This is where things get interesting. Type “Connect-VIServer *VI-Servername*” (without the quotes and replacing the server’s name appropriately) to connect to your Virtual Center server. If your current user doesn’t have permission, you’re going to need to authenticate yourself with credentials that do.

As you know, PowerShell commands can be linked, or “piped”, into each other to pass the results from one command on to the next. You could enter “Get-Cluster *Clustername*” to get information on your ESX cluster, for example. Extend that cmdlet by piping its results into another command (“Get-Cluster *Clustername* | Get-VM”) and you get a list of all VMs in your cluster… See where I’m going?

To get a list of snapshots, you simply need to pipe the results yet again into another cmdlet: “Get-Cluster *Clustername* | Get-VM | Get-Snapshot” and voilà, you have a list of all machines that currently have snapshots. (Note: Snapshots that are currently being modified will be shown by snapshot name and not by machine.)

Now, you can even take things a bit further, and delete snapshots from there, using the command “Remove-Snapshot” with or without confirmation. Obviously, you should use such commands with care, as you can surely ruin an update process by prematurely deleting a snapshot that might be needed later.

And last but not least, you can always find out what a cmdlet does by entering “Get-Help *Cmdlet*”.

Replacing Certificates Created with MD5 on ESX

If you’ve ever done a security scan on an ESX server, you might have stumbled across the vulnerability CVE-2004-2761: SSL Certificate Signed using Weak Hashing Algorithm. Great huh? So, now you have it, how do you get rid of it?

This one is quite simple to resolve. All you need are your fingers, root permissions, this blog article, and an SSH client (or something like WinSCP if you don’t like command lines).

Step one: Connect to the server via SSH and authenticate yourself

Step two: Navigate to the current keys by typing “cd /etc/vmware/ssl/” (without the quotes of course) and move/rename the current ones by typing “mv rui.crt ./rui_bak.crt” and “mv rui.key ./rui_bak.key” just in case something goes awry…

Step three: Edit the script that generates the certs. To do this, type “nano /etc/rc.d/init.d/mgmt-vmware” to open the file, press Ctrl+W to search, type “usr/bin/openssl”, and press Enter. You should be taken to the line /usr/bin/openssl req -new -x509 -keyout "$sslDir"'/rui.key'. Simply add -sha1 to make the line look like this: /usr/bin/openssl req -new -x509 -sha1 -keyout "$sslDir"'/rui.key'. Press Ctrl+O and then Enter to save the file, and Ctrl+X to exit nano.

Step four: Reboot. Easy one – type “reboot” on the command line.  One minor hint – you might want to make sure that no virtual machines are running on the host before you do this step.

Step five: Check to see if the machine still works. Once the machine has booted again, log on via SSH and navigate to the /etc/vmware/ssl/ directory from step two. Type “ls -l” and confirm that you have the two original files (the “_bak” ones) and two new ones with a current date and timestamp.

Done.
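Step three can also be done without an interactive editor. Here is a minimal Python sketch, assuming the script contains the openssl call exactly as quoted in step three. The helper name add_sha1_flag is hypothetical, and the function only transforms text, so reading and writing /etc/rc.d/init.d/mgmt-vmware (and keeping a backup of it) is left to you.

```python
# Sketch only: inserts -sha1 into the certificate-generating openssl call.
# Assumes the call starts exactly as quoted in step three of this post.

OPENSSL_REQ = "/usr/bin/openssl req -new -x509"


def add_sha1_flag(script_text):
    """Return script_text with -sha1 added to every matching openssl line."""
    patched = []
    for line in script_text.splitlines(keepends=True):
        # Leave lines alone that already carry the flag, so that running
        # the function twice does not add it a second time.
        if OPENSSL_REQ in line and "-sha1" not in line:
            line = line.replace(OPENSSL_REQ, OPENSSL_REQ + " -sha1", 1)
        patched.append(line)
    return "".join(patched)


if __name__ == "__main__":
    before = "/usr/bin/openssl req -new -x509 -keyout \"$sslDir\"'/rui.key'\n"
    print(add_sha1_flag(before), end="")
```

Lines that do not contain the openssl call pass through unchanged, so the function can be fed the whole script.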

Call “PropertyCollector.RetrieveContents” for object “propertyCollector” on vCenter Server vCenter failed

When attempting to edit the settings of a VM template that was converted to a virtual machine, you might get the error:

“template Call “PropertyCollector.RetrieveContents” for object “propertyCollector” on vCenter Server *Servername* failed”

Thankfully, this one is pretty easy to fix – simply remove the VM from inventory and add it back.

Changing the vSphere 4 Client’s Language

In the course of history, I’ve been forced to log on to quite a few VMware systems all configured differently. If you’re like me, you might want to change the language of the interface you’re getting from the downloaded client.

To do so, simply edit the shortcut and add -locale en_US to the end of the target (after the trailing quotes). Start the client and you should notice the difference right away. Unfortunately, some things don’t always seem to translate, such as system state.

Documentation is just so much easier to find in English – now having to roughly translate should be a thing of the past…

Disabling Insecure Ciphers on Mod- / OpenSSL

In my previous post, I described how to disable insecure ciphers on Windows machines. I also mentioned that server software based on third party software can utilize stuff like Apache and OpenSSL to host applications. Changing the Windows encryption settings obviously won’t affect the encryption strength of these applications. Don’t worry, what may sound horrid at the start, really isn’t all that complicated.

As always, the first thing that needs to be done is logging on to the machine to be “fixed”. Find the Mod- / OpenSSL configuration file that needs to be modified – you can use the Windows search for “ssl.conf” to find the file. (Just remember, take a look at what you’re modifying before you do it; also backups never hurt…)

Once the file is opened, search for “SSLCipherSuite” and take a look at what’s behind that.  You guessed it – you can enable and disable cipher suites.  Put a # in front of that line to comment it and add the following line right below it:

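SSLCipherSuite HIGH:MEDIUM:!ADH

What that does is enable high and medium strength cipher suites; all others are disabled. In addition, it prevents the use of ciphers using anonymous DH key exchange (!ADH). The exclusion needs to be written as !ADH rather than as a leading -ADH: a - only removes suites that are already in the list, so at the front of the string it removes nothing, and HIGH can then add the anonymous suites right back. You can find the syntax and a bit more help in the mod_ssl documentation and the OpenSSL ciphers man page.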
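If you want a rough idea of what a cipher string expands to before restarting anything, Python’s standard ssl module can evaluate it. This is a minimal sketch with two caveats: it uses whatever OpenSSL build your Python is linked against, not the one your mod_ssl uses, so the resulting list is only an approximation, and newer builds always list their TLS 1.3 suites regardless of the string. The function name enabled_ciphers is made up for illustration. It is shown here with HIGH:MEDIUM:!ADH, a string that uses !ADH to exclude the anonymous DH suites permanently.

```python
# Sketch only: expands an OpenSSL cipher string using the OpenSSL library
# that this Python interpreter is linked against.
import ssl


def enabled_ciphers(cipher_string):
    """Return the names of the cipher suites that cipher_string enables."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.set_ciphers(cipher_string)  # raises ssl.SSLError if nothing matches
    return [cipher["name"] for cipher in ctx.get_ciphers()]


if __name__ == "__main__":
    for name in enabled_ciphers("HIGH:MEDIUM:!ADH"):
        print(name)
```

On the server itself, “openssl ciphers -v” with the same string gives the equivalent listing for the OpenSSL binary installed there.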

To enable or disable encryption protocols such as TLSv1 or SSLv2, you can modify the “SSLProtocol” section. +TLSv1 will activate TLS version 1.0; -SSLv2 will deactivate SSL version 2.0.

Once you’ve made the changes, don’t forget to restart the service.

Disabling Insecure Cipher Suites on Windows 2K3 / 2K8

I regularly do internal security scans using the well-known Nessus tool as a virtual appliance. One of the more common security problems I’ve stumbled across is the use of weak and medium strength cipher suites on Microsoft-based servers (Nessus IDs 26928 and 42873). Certain security-oriented certifications, such as PCI DSS, require that insecure ciphers not be used for encryption.

Solving this problem isn’t as hard as it seems. There’s a pretty detailed MS KB article that shows us how to solve these issues. It’s a lot of text, and I’m going to cut things short by skipping the reasons and just showing the solution.

First things first: identify the culprits.

Ciphers with key lengths below 128 bits are regarded as insecure, because they can allow communication to be decrypted or man-in-the-middle attacks to take place. As you can guess, those things are not good if you’re the victim. If you’re lucky, you have a report in front of you saying which ciphers need to be disabled. If not, just disable all ciphers and protocols regarded as weak.

Ok, now that I know what to do, can you tell me where to do it?

In the Windows registry! Open regedit and navigate to the following key: HKLM\SYSTEM\CurrentControlSet\Control\SecurityProviders\Schannel\. You’ll see a few subkeys such as Ciphers, Hashes, and Protocols. As you can guess, you can use these keys to configure the machine’s encryption strength. I’m going to concentrate on insecure ciphers.

I’m at the right place, ehm, how can I change things?

All the ciphers are listed under the Ciphers branch. Each cipher has a DWORD value called Enabled. It shouldn’t come as a surprise that a value of “0” means no, and “1” means yes. Here are the ciphers that you should set to disabled:

  • RC2 40/128
  • RC4 40/128
  • RC2 56/128
  • RC4 56/128
  • RC4 64/128
  • DES 56/56

If the registry keys don’t exist, create them and the enabled DWORD value as mentioned above.

Also, while you’re at it, you might want to disable SSLv2, which is also regarded as insecure. To do so, navigate to the following DWORD value (if it doesn’t exist, create it): [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\Schannel\Protocols\SSL 2.0\Server] "Enabled"=dword:00000000
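If you have more than one server to fix, the settings above can be written out as a .reg file instead of being clicked together in regedit. The following is a minimal Python sketch that only builds the file’s text from the cipher list and the SSL 2.0 key mentioned above. The names build_reg_file, BASE, and WEAK_CIPHERS are made up for illustration, and saving and importing the result is left to you.

```python
# Sketch only: builds .reg text that sets Enabled=0 for the weak ciphers
# listed in this post and, optionally, for the SSL 2.0 server protocol.

BASE = (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control"
        r"\SecurityProviders\Schannel")

WEAK_CIPHERS = [
    "RC2 40/128", "RC4 40/128", "RC2 56/128",
    "RC4 56/128", "RC4 64/128", "DES 56/56",
]

DISABLED = '"Enabled"=dword:00000000'


def build_reg_file(ciphers=WEAK_CIPHERS, disable_sslv2=True):
    """Return the contents of a .reg file as a single string."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for cipher in ciphers:
        # One key per cipher, each with an Enabled value of 0.
        lines += [f"[{BASE}\\Ciphers\\{cipher}]", DISABLED, ""]
    if disable_sslv2:
        lines += [f"[{BASE}\\Protocols\\SSL 2.0\\Server]", DISABLED, ""]
    return "\r\n".join(lines)  # Windows line endings


if __name__ == "__main__":
    print(build_reg_file())
```

Importing a .reg file creates any keys that are missing, so this covers the case where the cipher keys don’t exist yet.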

Now for the caveat: these changes won’t take effect until the machine is rebooted. (Yay!)

Lazy people can download this file and rename its ending to .reg. Simply copy the settings to the insecure machine, and import them. Once done successfully, reboot the server. (Note, I disable RC2 128/128 and SSLv2 in these settings.)

You might find that if you’ve changed these settings, you’re still getting these warnings in your security scan. If so, check if some third party software uses OpenSSL and some other server software to host an application over a secure channel. In that case, you’ll have to edit the ssl.conf or its equivalent.

Warning: Before changing settings in the registry, it is advisable to make a backup using the registry editor. Also, please know what you’re doing. I won’t be held responsible for any machines that become unstable after this.

Resetting a Blade Management Module via Telnet

Every now and then, I get a management module that won’t respond to HTTP(S) requests. The only solution I’ve found so far is to reset the management module using a telnet connection.

To reset the module, simply enter reset -T system:mm[1] when connected to the center. You can replace [1] with the number of the management module you want to reboot.

If you’re not sure which device you’re looking for, you can list the devices by entering list -l 2.

Of course, you can also reboot switches and blades using this command by entering reset -T system:(switch/blade)[#].

Unable to add a node to an ESXi 4.1 Cluster

While attempting to add the third node to a three node ESXi 4.1 cluster, I kept getting the following very descriptive error:

HA agent on HOST in cluster CLUSTER in DATACENTER has an error: cmd addnode failed for primary node: Internal AAM Error – agent could not start.: Unknown HA error

I checked a few KB articles but could only find stuff related to DNS and other things that were OK in this setup. Taking a deeper look, I noticed that one of the hosts previously in the cluster still had local authentication set up, and the other two were set to domain auth. I joined the server to the domain and voilà, was able to complete adding the node to the cluster.

You can change this setting per host under Configuration -> Authentication Services in the vSphere Client.