Posts filed under ‘VMware’

Advanced Troubleshooting of ESXi Server 6.x for vSphere Gurus

Hi Folks

You can refer to the attached document for hints that will help you troubleshoot ESXi environments. The document covers three main areas:

  • Which log files to review and when.
  • ESXi commands to isolate and troubleshoot issues.
  • Configuration Files.

Thanks.

Source: VMworld.

June 8, 2020 at 9:51 am

VMware PowerCLI

In this post, I am going to cover the PowerCLI module for VMware. Whenever I come across a new cmdlet, I will update this post.

First things first: you need to install PowerCLI. VMware PowerCLI is now distributed as a PowerShell module, so you can install it directly from Windows PowerShell:


PS> Install-Module -Name VMware.PowerCLI

Then import the module before using PowerCLI.

PS> Import-Module VMware.PowerCLI

# To verify the version:
PS> Get-PowerCLIVersion

# To log in to vCenter
PS> Connect-VIServer -Server "vcenterhostname"

# To suppress the certificate warning/error
PS> Set-PowerCLIConfiguration -InvalidCertificateAction Ignore -Confirm:$false

# To list the VMs with their creation date
PS> Get-VM | fl Name,CreateDate


March 23, 2020 at 1:25 pm

How to create a RHEL 7 template in VSphere ESXi 6.7

Unlike Windows, creating a RHEL-based template requires additional steps to make it work. During this process, I came across very valuable information on the linuxtechi blog. Here I summarize those steps, along with some additional steps that I followed during the whole process. (I am not covering the steps you need to follow in ESXi to convert a VM into a template.)

Source: https://www.linuxtechi.com/create-vm-template-ovirt-environment/

Environment Details:

  • RHEL 7.3
  • ESXi 6.7

+ Create a RHEL 7.3 VM

+ Install the Operating System and all other Packages needed.

+ Run yum update (if you have a valid RHEL subscription).

Thereafter, follow the steps below to generalize the VM by removing any VM-specific configuration:

+ Remove the SSH host keys
# rm -f /etc/ssh/ssh_host_*

+ Remove the hostname and set it to localhost (this is optional: if
you did not provide a hostname during the installation step, it will
already be localhost).
# hostnamectl set-hostname 'localhost'

+ Then remove any reference to UUID, HWADDR and MAC
# rm -f /etc/udev/rules.d/*-persistent-*.rules
# sed -i '/^HWADDR=/d' /etc/sysconfig/network-scripts/ifcfg-*
# sed -i '/^UUID=/d' /etc/sysconfig/network-scripts/ifcfg-*
At this point, please make sure that when you deploy VMs from this template, you create a VM customization specification that forces the IP address details to be entered. Otherwise, all the VMs will end up with the same hostid.

+ Again, this step is optional if you have not registered the VM
# rm -f /etc/sysconfig/rhn/systemid

+ Unconfigure and power off the VM (similar to Windows Sysprep)
# sys-unconfig

Update #1: If the sys-unconfig command does not work, use the virt-sysprep command instead. Detailed steps can be found in the article below:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_deployment_and_administration_guide/sect-guest_virtual_machine_disk_access_with_offline_tools-using_virt_sysprep
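
The file clean-up steps above can be collected into one small script. This is only a minimal sketch: generalize_guest and its root-directory argument are hypothetical names introduced here so the clean-up can be rehearsed against a copy of the /etc tree before running it on the real guest. It deliberately leaves out the hostnamectl and sys-unconfig steps, which have to be run on the live system.

```shell
#!/bin/sh
# Sketch only: generalize_guest is a hypothetical helper, not a RHEL command.
# Usage: generalize_guest /            (real guest, run as root)
#        generalize_guest /tmp/copy    (rehearsal against a copied tree)
generalize_guest() {
    root="${1:-/}"
    root="${root%/}"    # drop a trailing slash so "/" becomes an empty prefix

    # Remove the SSH host keys.
    rm -f "$root"/etc/ssh/ssh_host_*

    # Remove persistent udev rules that pin device names to the old hardware.
    rm -f "$root"/etc/udev/rules.d/*-persistent-*.rules

    # Strip the HWADDR and UUID lines from every interface config file.
    for cfg in "$root"/etc/sysconfig/network-scripts/ifcfg-*; do
        [ -f "$cfg" ] || continue
        sed -i -e '/^HWADDR=/d' -e '/^UUID=/d' "$cfg"
    done

    # Remove the RHN system ID (only present if the VM was registered).
    rm -f "$root"/etc/sysconfig/rhn/systemid
}
```

Running it against a scratch directory first makes it easy to confirm that only the intended files are touched.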

NOTE: A VM customization specification is mandatory to prevent the VMs from getting the same hostid. The steps are as follows:

+ Log in to vCenter.
+ Open Policies & Profiles.
+ Select VM Customization Specification.
+ Provide the details, based on your environment.
+ In the Network screen, select "Manually Select Custom Settings".
+ Click on Add.
+ In the IPv4 section, select "Prompt the user for an IPv4 address when the
specification is used".

Good Luck .

 

 

October 7, 2019 at 12:07 pm

"Reset to device, \Device\RaidPort0, was issued" error in the Windows event log

Environment: VSphere ESXi 6.7 on HP DL 380 (Single Server)

Problem: The VMs were hanging/freezing. We could not log in to Windows or issue any power-off commands. During the investigation, we found that the VMs were recording Event ID 129 with the warning message "Reset to device, \Device\RaidPort0, was issued" just before becoming unresponsive.

We referred to VMware KB https://kb.vmware.com/s/article/2063346 and confirmed that the LSI_SAS driver was updated to the latest version. Luckily, in our case this deployment was a temporary one, as we were planning to move these VMs to a stable vSphere cluster running on Nutanix. A few days after moving the VMs to the Nutanix environment, we noticed that they were functioning well without any issues.

So, if you are having a similar issue, check the underlying storage, as it could be the cause.

NOTE: During the unresponsive state, you may notice that the disk latency stays above 20. This is definitely a problem for a VM's responsiveness.


March 24, 2019 at 12:08 pm

How to enable EVC when VCenter Server is running on VM in a Nutanix Cluster

As part of the Nutanix best practices, we need to enable EVC on the vSphere cluster. When the vCenter Server is itself a VM, however, you are dragged into a chicken-and-egg situation, because you cannot add a host to an EVC-enabled cluster while it contains powered-on VMs. To overcome this, you can follow the guidelines below. (You may need to disable Admission Control temporarily and re-enable it once you have finished all the steps.)

1) Add the hosts to the datacenter.

2) Create the HA/DRS cluster.

3) Enable EVC on the cluster based on your processor architecture.

4) Pick any host and shut down its running VMs and the CVM (keep in mind that you can shut down only one CVM at a time).

5) Then drag and drop the host into the cluster; the host will be added without any hassle.

6) Power on the VMs and the CVM (wait until the CVM completes booting).

7) Now vMotion the vCenter VM to the host that is already part of the cluster.

8) That's it. Repeat steps 4, 5 and 6 for the remaining hosts.

Hint:

# If you forgot to enable EVC before putting the cluster into production, and you now need to expand your Nutanix cluster, enabling EVC becomes mandatory in order to add the new nodes to the existing ESXi cluster. In this case, you can follow the additional steps below to achieve the intended result. (Again, you may need to disable Admission Control temporarily and re-enable it once you have finished all the steps.)

 

1) Create a new cluster (without EVC).

2) Select a host and vMotion all the production VMs running on it to the remaining hosts.

3) Shut down the CVM.

4) Put the host into maintenance mode.

5) Drag and drop the host into the new cluster.

6) Exit maintenance mode and power on the CVM.

7) Then vMotion the vCenter VM and the other VMs to this host.

8) Repeat steps 2 to 6 for the remaining hosts.

9) Reconfigure your old cluster with the proper EVC mode.

10) Then repeat steps 2 to 6 for all the hosts, this time moving each host back into the old cluster.

Source:

Refer to https://www.virten.net/2013/04/intel-cpu-evc-matrix/ for guidelines on EVC modes.

Video Reference : https://www.youtube.com/watch?v=DSfzafr1ndA

 

 

 

March 18, 2019 at 2:24 pm

AsBuilt Report for VSphere

Hi Folks

Until recently, I struggled to build a proper as-built document for vSphere environments, as the manual process requires capturing screenshots and time-consuming Word document preparation.

Last week, I came across two blogs discussing the As Built Report tool for VMware, which turned out to be a very handy, must-have tool for VMware installations.

Those who want to read more about this tool can visit the two blogs listed at the bottom of this page.

You need Windows PowerShell. Once you have PowerShell ready, run the commands below to build your as-built document.

 

1) Install the PScribo module

 #Install-Module PScribo

2) Download the As Built Report PowerShell scripts from https://github.com/tpcarman/As-Built-Report

2.1) Extract them to a folder

#Import-Module C:\As-Built-Report-dev\AsBuiltReport.psd1

3) Install the PowerCLI module

#Find-Module -Name VMware.PowerCLI

#Install-Module -Name VMware.PowerCLI

3.1) Run the command below to bypass the SSL warning for vCenter/ESXi

#Set-PowerCLIConfiguration -InvalidCertificateAction Ignore -Confirm:$false

4) The command below will create the report

New-AsBuiltReport -Target vcenterip -Credential (Get-Credential) -Type vSphere -Format HTML,Word -Timestamp -Healthchecks -AsBuiltConfigPath C:\As-Built-Report-dev\Src\Public\Reports\vSphere\vsphere.json

Source:

https://www.timcarman.net/as-built-report/

As Built Report – working with it in my lab

 

January 24, 2019 at 3:29 pm

How to Capture & Analyze Network Traffic on ESXi

As an ESXi implementer or administrator, you may come across situations where you need to get your hands dirty 🙂 with deep network troubleshooting. I had such a situation a few months ago, which I would like to share in this post.

We deployed Horizon View (for VDI) in one of our customers' ESXi cluster (8 nodes) environments. The desktop users were complaining that they were not able to reach a specific network.

To investigate further, we swapped the physical adapter to the on-board Broadcom cards (1 Gbps), and were then able to re-establish the network. We decided to engage VMware Support to find the root cause and get a permanent fix. VMware Support was pretty awesome and nailed it very quickly.

First they used two built-in commands on ESXi:

  • pktcap-uw (to capture the network packets)
  • tcpdump-uw (to read the captured packets)

They ran the command below on both NICs to initially capture the traffic.

  • pktcap-uw --uplink vmnic0 --dir 0 --mac 00:00:00:00:00:00 --vlan 18 -o /tmp/f.pcap

uplink –  Name of the VMnic

dir      –  0  means RX Traffic

mac   –  MAC address of the machine which you are troubleshooting

vlan   –  The VLAN ID

Thereafter we read the  output of the above command using 

  •     tcpdump-uw -ner /tmp/f.pcap

By comparing the output from both NICs, we were able to narrow the problem down to the Mellanox cards. When tagged traffic passed through a Mellanox network card (10 Gbps), the reply packet was not tagged with the proper VLAN ID, disrupting the network traffic.
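
To compare two NICs, it helps to generate the same capture command for each uplink so that only the uplink name and the output file differ. The snippet below is a rough sketch of that idea, not what VMware Support actually ran: capture_cmd is a hypothetical helper, vmnic1 is an assumed name for the second uplink, and the /tmp/<uplink>.pcap naming is my own choice. It only prints the commands, so they can be reviewed before being pasted into the ESXi shell.

```shell
#!/bin/sh
# Sketch only: capture_cmd is a hypothetical helper that prints a pktcap-uw
# command line; it does not start a capture itself.
capture_cmd() {
    uplink="$1"; mac="$2"; vlan="$3"
    # --dir 0 selects received (RX) traffic; each uplink gets its own file.
    echo "pktcap-uw --uplink $uplink --dir 0 --mac $mac --vlan $vlan -o /tmp/$uplink.pcap"
}

# vmnic1 is an assumption; substitute the uplinks you are comparing.
for nic in vmnic0 vmnic1; do
    capture_cmd "$nic" 00:00:00:00:00:00 18
done
```

Each resulting file can then be read with tcpdump-uw -ner, for example tcpdump-uw -ner /tmp/vmnic0.pcap, and the VLAN tags compared side by side.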

 

Good Luck

Muralee

 

 

December 12, 2018 at 10:52 am

VMware HA Network Failover & Failback Delay

Hi Guys

There are lots of articles describing VMware vSwitch teaming capabilities and their configuration, but I could not find any article that explains what needs to be done to avoid failover and failback delays, or what the expected behavior is.

Recently I came across two good resources that helped me get a good idea of this area. I have listed them below for anyone who has a similar requirement.

Source 1:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1003804

Source 2: (A bit old, but still applicable to the newer versions as well.)

vmware_network_config

October 23, 2017 at 3:14 pm

ESXi 6.5 changes to HA

Hi All

With the latest release, ESXi 6.5, VMware has made lots of changes to the HA capability.

The article below provides a detailed description of these improvements:

source: http://blog.servercentral.com/high-availability-redundancy-features-vsphere-6.5.

The article also clarifies the correct method of calculating percentage-based admission control.

Screenshot extract from the article mentioned.

October 23, 2017 at 3:05 pm

ESXi Host Disconnects from vCenter Server

Hi All

Recently we had an issue in a customer environment hosting a 3-node ESXi cluster on Nutanix. Suddenly one of the hosts was showing as not responding and disconnected from vCenter. Luckily, there was no impact on the production VMs hosted on that node, since only the management network was affected. After several hours of troubleshooting, we decided to call VMware Support and found out that the issue is related to KB 2145611.

Below is the extract from the vmkernel.log
——————————————————————————-
2017-03-19T05:35:01.871Z cpu26:7190268)ALERT: hostd detected to be non-responsive
2017-03-19T06:00:01.988Z cpu2:7192142)ALERT: hostd detected to be non-responsive
2017-03-19T06:02:53.474Z cpu6:36416)StorageApdHandler: 1204: APD start for 0x4305932c3770 [8c9d039d-452d1170]
2017-03-19T06:02:53.474Z cpu6:36416)StorageApdHandler: 1204: APD start for 0x4305932c4fd0 [fa49f8b0-fa322ecd]
2017-03-19T06:02:59.369Z cpu18:32953)StorageApdHandler: 1292: APD bounce-exit for 0x4305932c4fd0 [fa49f8b0-fa322ecd]
2017-03-19T06:02:59.369Z cpu18:32953)StorageApdHandler: 1292: APD bounce-exit for 0x4305932c3770 [8c9d039d-452d1170]

2017-03-19T09:40:04.774Z cpu44:7213651)WARNING: LinuxFileDesc: 5637: Unrecoverable exec failure: Failure during exec while original state already lost
2017-03-19T09:40:06.784Z cpu24:7213652)WARNING: UserParam: 1301: could not change group to <host/vim/vimuser/terminal/ssh>: Admission check failed for memory resource
2017-03-19T09:40:06.784Z cpu24:7213652)WARNING: LinuxFileDesc: 5637: Unrecoverable exec failure: Failure during exec while original state already lost
2017-03-19T09:40:06.986Z cpu29:7213653)WARNING: UserParam: 1301: could not change group to <host/vim/vimuser/terminal/ssh>: Admission check failed for memory resource
2017-03-19T09:40:06.986Z cpu29:7213653)WARNING: LinuxFileDesc: 5637: Unrecoverable exec failure: Failure during exec while original state already lost
2017-03-19T09:41:39.969Z cpu16:37557)WARNING: LinuxThread: 340: Error cloning thread: -28 (bad0081)
2017-03-19T09:45:52.490Z cpu43:7214205)WARNING: User: 5366: Error in exec’d cartel setup: Failed to map section: Admission check failed for memory resource
2017-03-19T09:45:52.490Z cpu43:7214205)WARNING: LinuxFileDesc: 5637: Unrecoverable exec failure: Failure during exec while original state already lost
2017-03-19T09:46:06.930Z cpu30:7214223)WARNING: LinuxThread: 340: Error cloning thread: -28 (bad0081)
2017-03-19T09:46:07.236Z cpu41:7214225)WARNING: LinuxThread: 340: Error cloning thread: -28 (bad0081)
2017-03-19T09:46:46.417Z cpu22:7214286)WARNING: User: 5366: Error in exec’d cartel setup: Failed to map section: Admission check failed for memory resource
2017-03-19T09:46:46.417Z cpu22:7214286)WARNING: LinuxFileDesc: 5637: Unrecoverable exec failure: Failure during exec while original state already lost
2017-03-19T09:47:11.461Z cpu26:37558)WARNING: LinuxThread: 340: Error cloning thread: -28 (bad0081)
2017-03-19T09:49:19.688Z cpu5:7214435)WARNING: LinuxThread: 340: Error cloning thread: -28 (bad0081)
————————————————————————————-
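
To check whether another host shows the same symptoms, you can count the tell-tale messages from the extract above in its vmkernel log. This is just a quick sketch: scan_vmkernel is a hypothetical helper name, and the three patterns are simply the strings that stand out in the extract, not an official signature list from the KB.

```shell
#!/bin/sh
# Sketch only: scan_vmkernel is a hypothetical helper.
# Usage on an ESXi host: scan_vmkernel /var/log/vmkernel.log
scan_vmkernel() {
    log="$1"
    for pattern in \
        "hostd detected to be non-responsive" \
        "Admission check failed for memory resource" \
        "Error cloning thread"
    do
        # grep -c prints the number of matching lines; "|| true" keeps the
        # loop going when a pattern has no matches (grep exits non-zero).
        count=$(grep -c -- "$pattern" "$log" || true)
        echo "$count  $pattern"
    done
}
```

A non-zero count for all three patterns on the same host is a reasonable cue to look at the KB mentioned above.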

The support engineer suggested that, before applying the patch, we could try clearing the Likewise cache (where the ESXi host stores the AD authentication-related data).

The command he used (open a PuTTY session to the affected ESXi host):

# /usr/lib/vmware/likewise/bin/lw-lsa ad-cache --delete-all

The above command will produce an error (file not found) if there is no cache.

Good luck.

 

 

 

March 20, 2017 at 11:06 am
