Understanding vSAN Encryption & troubleshooting tips

vSAN Encryption is becoming increasingly popular in vSAN and VMware Cloud on AWS because it provides data-at-rest encryption. Data at rest is data stored on persistent media; with vSAN Encryption enabled, that data is stored in encrypted form.

In this article, we will discuss the design, architecture, and workflow of vSAN Encryption.

Design Considerations:

  • Do not deploy the KMS server on the vSAN datastore you plan to encrypt. If production goes down, the ESXi hosts cannot contact the KMS during reboot.
  • Make sure AES-NI is enabled in the host BIOS. Encryption is a CPU-intensive operation, and AES-NI helps improve encryption performance.
  • The witness host does not participate in vSAN encryption because it stores only metadata.
  • When collecting core dumps, set a password, because core dumps are encrypted with the host key.

Consider an unsupported scenario in which the KMS server or KMS cluster is placed on the encrypted vSAN datastore. When a host reboots and comes online, it will not be able to mount its disk groups: the KEK cannot be retrieved because the KMS itself resides on the vSAN storage, which is unavailable. It is therefore always recommended to place the KMS appliance or cluster outside the vSAN datastore to avoid this kind of scenario.

VMware does not support placing KMS appliance or cluster on the encrypted datastore.
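
The circular dependency behind this restriction can be made explicit with a small Python sketch. The dependency map is a hypothetical model written for illustration (the node names are not vSAN identifiers); it only shows why the loop never resolves when the KMS lives on the datastore it protects.

```python
# Hypothetical dependency map: each item lists what must be available first.
DEPENDS_ON = {
    "vsan_datastore": ["disk_group_mount"],
    "disk_group_mount": ["kek"],
    "kek": ["kms"],
    "kms": ["vsan_datastore"],  # the unsupported placement
}

def find_cycle(graph, start):
    """Return the dependency loop reachable from start, or None."""
    path = []

    def visit(node):
        if node in path:
            # We came back to a node already on the current path: a loop.
            return path[path.index(node):] + [node]
        path.append(node)
        for dep in graph.get(node, []):
            cycle = visit(dep)
            if cycle:
                return cycle
        path.pop()
        return None

    return visit(start)

print(find_cycle(DEPENDS_ON, "disk_group_mount"))

# Moving the KMS to storage outside vSAN removes the loop.
fixed = dict(DEPENDS_ON, kms=["external_storage"])
print(find_cycle(fixed, "disk_group_mount"))  # None
```

With the KMS on the encrypted datastore, the first call reports a loop that starts and ends at the disk group mount; with the KMS on external storage, no loop is found.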


vCenter & KMS Configuration:  

Let’s discuss the configuration of the KMS with vCenter. Before enabling vSAN Encryption on the cluster, the KMS must be deployed and configured in vCenter, and mutual trust must be established between them using an appropriate certificate method. vCenter provides a central location for KMS configuration and health checks. vCenter stores the KMS configuration in the VCDB and the certificates in VECS.

The best part of the design is that, when vSAN encryption is enabled on the cluster, vCenter pushes the KMS configuration, KEK_ID, and certificates to all ESXi hosts in the encrypted vSAN cluster. Each ESXi host can then communicate with the KMS independently, without depending on vCenter being available.

Workflow:

  • vCenter requests a key and receives the KEK_ID from the KMS server.
  • vCenter then pushes the server and client certificates and the KEK_ID to all hosts in the vSAN encryption cluster.
  • ESXi hosts save this information in the esx.conf file, located at /etc/vmware/esx.conf.
  • ESXi hosts save the certificate information in /etc/vmware/ssl.
  • Using the KEK_ID, ESXi hosts contact the KMS server for the KEK.
  • Communication between ESXi and the KMS server is established through an SSL handshake using the server certificate.
  • The KMS server generates and stores the KEK, which it then provides to the ESXi hosts.

This way, ESXi hosts do not depend on vCenter: even if vCenter is down, an ESXi host can communicate with the KMS server and retrieve the key over port 5696.
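
As a rough way to confirm that a network path to the KMS exists, the following Python sketch tries to open a TCP connection to port 5696. The function name kms_port_reachable is made up for illustration, and a successful TCP connect is only a first check: it says nothing about whether the SSL handshake or the key retrieval will succeed.

```python
import socket

def kms_port_reachable(address, port=5696, timeout=3.0):
    """Return True if a TCP connection to address:port can be opened."""
    try:
        # create_connection resolves the name and attempts the TCP handshake.
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False

# Example call, using a KMS address from this article's lab setup:
# kms_port_reachable("192.168.2.130")
```

Passing an FQDN instead of an IP address also exercises name resolution, since a failed lookup is reported as unreachable.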

The KMS information pushed by vCenter appears in esx.conf:

[root@blr1:~] grep -i /vsan/k /etc/vmware/esx.conf
/vsan/kmipClusterId = "HyTrust"
/vsan/kekId = "a83f52f7-a5e6-11e8-b99e-005056b989ed"
/vsan/kmipServer/child[0001]/old = "false"
/vsan/kmipServer/child[0001]/port = "5696"
/vsan/kmipServer/child[0001]/address = "192.168.2.131"
/vsan/kmipServer/child[0001]/name = "HyTrust-Secondary"
/vsan/kmipServer/child[0001]/kmipClusterId = "HyTrust"
/vsan/kmipServer/child[0001]/kmskey = "HyTrust/HyTrust-Secondary"
/vsan/kmipServer/child[0000]/kmskey = "HyTrust/HyTrust-Primary"
/vsan/kmipServer/child[0000]/kmipClusterId = "HyTrust"
/vsan/kmipServer/child[0000]/name = "HyTrust-Primary"
/vsan/kmipServer/child[0000]/address = "192.168.2.130"
/vsan/kmipServer/child[0000]/port = "5696"
/vsan/kmipServer/child[0000]/old = "false"

[root@blr1:~] grep -i /vsan/hostk /etc/vmware/esx.conf
/vsan/hostKeyId = "3e226958-a044-11e8-b99d-005056b989ed"
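
When comparing several hosts, it can help to turn these esx.conf entries into structured data. The sketch below is one possible way to do that in Python; the function name parse_vsan_kms is hypothetical, and the sample text is an excerpt of the grep output shown above. It assumes every entry has the `key = "value"` shape seen in that output.

```python
import re

SAMPLE = '''\
/vsan/kmipClusterId = "HyTrust"
/vsan/kekId = "a83f52f7-a5e6-11e8-b99e-005056b989ed"
/vsan/kmipServer/child[0001]/port = "5696"
/vsan/kmipServer/child[0001]/address = "192.168.2.131"
/vsan/kmipServer/child[0001]/name = "HyTrust-Secondary"
/vsan/kmipServer/child[0000]/name = "HyTrust-Primary"
/vsan/kmipServer/child[0000]/address = "192.168.2.130"
/vsan/kmipServer/child[0000]/port = "5696"
'''

LINE = re.compile(r'^(/vsan/\S+)\s*=\s*"(.*)"\s*$')
CHILD = re.compile(r'^/vsan/kmipServer/child\[(\d+)\]/(\w+)$')

def parse_vsan_kms(text):
    """Split /vsan/ keys into cluster-level settings and per-server settings."""
    settings, servers = {}, {}
    for raw in text.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue  # not a /vsan/ entry
        key, value = m.groups()
        child = CHILD.match(key)
        if child:
            index, field = child.groups()
            servers.setdefault(index, {})[field] = value
        else:
            # Keep only the last path element, e.g. "kekId".
            settings[key.rsplit("/", 1)[-1]] = value
    # Return servers ordered by their child index (0000, 0001, ...).
    return settings, [servers[i] for i in sorted(servers)]

settings, servers = parse_vsan_kms(SAMPLE)
print(settings["kekId"])
for s in servers:
    print(s["name"], s["address"], s["port"])
```

On a host, the same function could be fed the contents of /etc/vmware/esx.conf instead of the sample string.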

Before going further, let’s define the KEK, DEK, and host key.

KEK: KEK stands for key encryption key; it is used to encrypt the DEK. The KMS server generates it on request and stores it persistently, and it is identified by the KEK_ID. Every time a host boots, it uses the KEK_ID to contact the KMS and retrieve the KEK so that it can unwrap the DEKs and mount the disk groups. On the ESXi host, the KEK is held only temporarily in the key cache.

DEK: DEK stands for data encryption key; it is used to encrypt data. A DEK is randomly generated for each vSAN disk in the ESXi hosts. The DEK is stored persistently on the vSAN disk, wrapped by the KEK, and its plain-text form is held temporarily in the key cache for vSAN I/O encryption and decryption.

HostKey: The host key is generated by the KMS server and is used to encrypt memory dumps. It is stored persistently by the KMS and temporarily in the ESXi host key cache.
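
The relationship between the KEK and the DEK (often called key wrapping) can be illustrated with a short Python sketch. To stay within the standard library, it uses a toy XOR keystream built from HMAC-SHA256 as a stand-in cipher; this is an assumption made purely for illustration and is not the cipher or key-wrap format vSAN uses. The variable names are also illustrative.

```python
import hashlib
import hmac
import os

def keystream_xor(key, data):
    """Toy stream cipher: XOR data with an HMAC-SHA256 counter keystream.

    Illustration only; it is NOT the cipher vSAN uses.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hmac.new(key, offset.to_bytes(8, "big"), hashlib.sha256).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)

kek = os.urandom(32)  # lives in the KMS, looked up by KEK_ID
dek = os.urandom(32)  # generated per disk

wrapped_dek = keystream_xor(kek, dek)                 # what is persisted on disk
ciphertext = keystream_xor(dek, b"guest data block")  # data at rest

# After a reboot: fetch the KEK, unwrap the DEK, then decrypt the data.
unwrapped = keystream_xor(kek, wrapped_dek)
assert unwrapped == dek
print(keystream_xor(unwrapped, ciphertext))  # b'guest data block'
```

The point of the two layers is that only the wrapped DEK sits next to the data; without the KEK from the KMS, neither the DEK nor the data can be recovered.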

Host & KMS Workflow:

As described in the workflow above, the ESXi host retrieves the KEK from the KMS server and stores it in the key cache. Let’s look at two workflows side by side: enabling encryption on a new or existing vSAN cluster, and host behavior on reboot.

  • Enabling encryption
    • The ESXi host contacts the KMS to retrieve the KEK based on the KEK_ID.
    • A rolling reformat of the disk groups takes place, which includes placing encryption metadata on the encryption layer.
    • When the disks are announced, a DEK is generated for each vSAN disk.
    • The DEKs are encrypted with the KEK, and the plain-text DEKs are stored in the key cache for data encryption and decryption.
    • The ESXi host now has the KEK, DEKs, and host key in the key cache, and the wrapped DEKs on the vSAN disks. Even if the KMS goes down, the running encrypted vSAN cluster is not impacted.
  • ESXi host boot behavior
    • The ESXi host boots and, once hostd starts, the system loads other modules such as the vSAN module and the cryptosafe module.
    • vSAN tells hostd to disable all core dumps on the host.
    • vSAN retrieves the host key from the KMS, stores it in the key cache, and uses it to encrypt host core dumps.
    • vSAN uses the KEK_ID to retrieve the KEK from the KMS and stores it in the key cache.
    • vSAN uses the KEK to unwrap the DEKs stored on the disks and mounts the disks.
    • vSAN then places all the plain-text DEKs in the key cache for encryption and decryption in the vSAN I/O path.
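
The difference between a running host and a rebooting host can be sketched as a small simulation. Everything in the Python below (the FakeKms and Host classes, the key IDs, the string values standing in for keys) is hypothetical and greatly simplified; it only models the idea that the key cache lives in memory and must be refilled from the KMS at boot.

```python
class KmsUnavailable(Exception):
    pass

class FakeKms:
    """Hypothetical stand-in for a KMS: maps key IDs to keys."""
    def __init__(self, keys):
        self.keys = keys
        self.online = True

    def get(self, key_id):
        if not self.online:
            raise KmsUnavailable(key_id)
        return self.keys[key_id]

class Host:
    def __init__(self, kms, kek_id, host_key_id):
        self.kms, self.kek_id, self.host_key_id = kms, kek_id, host_key_id
        self.key_cache = {}   # in memory only, lost on reboot
        self.mounted = False

    def boot(self):
        self.key_cache = {}   # a reboot empties the key cache
        self.mounted = False
        try:
            self.key_cache["hostkey"] = self.kms.get(self.host_key_id)
            self.key_cache["kek"] = self.kms.get(self.kek_id)
        except KmsUnavailable:
            return False      # disk groups stay unmounted
        # Placeholder for unwrapping the DEKs stored on the disks.
        self.key_cache["dek"] = "unwrapped-with-" + self.key_cache["kek"]
        self.mounted = True
        return True

    def can_do_io(self):
        return self.mounted and "dek" in self.key_cache

kms = FakeKms({"kek-1": "KEK", "hk-1": "HOSTKEY"})
host = Host(kms, "kek-1", "hk-1")
print(host.boot())       # True: keys fetched, disks mounted
kms.online = False
print(host.can_do_io())  # True: keys are still in the cache
print(host.boot())       # False: the KEK cannot be fetched
print(host.can_do_io())  # False: disk groups are not mounted
```

In this model a KMS outage is harmless until a host reboots, which matches the two workflows above.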

Troubleshooting tips:

  • Check connectivity between the ESXi hosts and the KMS server by pinging the KMS by FQDN or IP address.
  • If the KMS is registered in DNS, check that DNS is up and running and that forward and reverse resolution work.
  • Check the certificate store on the ESXi host, /etc/vmware/ssl, and make sure valid server and client certificates are present.
  • Check the expiration dates of the server and client certificates.
  • Check vmkernel.log, syslog.log, and vsansystem.log for the exact reason behind communication errors between ESXi and the KMS, and for errors related to disk group mounts.
  • Review the known issue with earlier versions of vSAN Encryption: https://kb.vmware.com/s/article/52723
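
For the certificate expiration check, one option is to print the end date with `openssl x509 -enddate -noout -in <certificate file>` and convert it to days remaining. The Python sketch below does the conversion; the function name days_until_expiry is made up for illustration, and the certificate file names under /etc/vmware/ssl are not listed in this article, so they are left for you to fill in. It assumes the `notAfter=...` line format that openssl prints.

```python
import ssl
import time

def days_until_expiry(enddate_line, now=None):
    """Days left before the notAfter date printed by openssl.

    enddate_line looks like 'notAfter=Jan  5 09:34:43 2018 GMT'.
    A negative result means the certificate has already expired.
    """
    # Drop the 'notAfter=' prefix if it is present.
    not_after = enddate_line.strip().split("=", 1)[-1]
    # Parses the 'Mon DD HH:MM:SS YYYY GMT' form into seconds since the epoch.
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return (expires - now) / 86400.0

# Example:
# days_until_expiry("notAfter=Jan  5 09:34:43 2018 GMT")
```

Running this for both the server and the client certificate gives a quick view of which one, if any, is close to expiring.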

 


Thanks for reading!!

 
