Posts filed under ‘VMware’

“Reset to device, \Device\RaidPort0, was issued” error in the Windows event log

Environment: vSphere ESXi 6.7 on an HP DL380 (single server)

Problem: The VMs were hanging (freezing). We could not log in to Windows or issue any power-off commands. During the investigation, we found that the VMs were recording Event ID 129 with the warning message “Reset to device, \Device\RaidPort0, was issued” just before becoming unresponsive.

We referred to the VMware KB https://kb.vmware.com/s/article/2063346 and confirmed that the LSI_SAS driver was updated to the latest version. Luckily, in our case this deployment was a temporary one, as we were planning to move these VMs to a stable vSphere cluster running on Nutanix. A few days after moving the VMs to the Nutanix environment, we noticed that they were functioning well without any issues.

So, for those who are having a similar issue: check the underlying storage, as it could be the cause of problems like this.

NOTE: During the unresponsive state, you may notice that the disk latency stays above 20. This is definitely a problem for a VM’s responsiveness.
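If you export latency samples from the performance charts, a few lines of Python can point out the stretches where latency stayed high. This is only a rough sketch: it assumes the values are in milliseconds, and the function name, the `min_run` parameter and the sample numbers are all made up for illustration.

```python
from typing import List, Sequence, Tuple

# Assumption: samples are in milliseconds; 20 is the level mentioned in the note above.
LATENCY_THRESHOLD_MS = 20.0


def sustained_high_latency(
    samples: Sequence[float],
    threshold: float = LATENCY_THRESHOLD_MS,
    min_run: int = 3,
) -> List[Tuple[int, int]]:
    """Return (start_index, length) for each run of at least
    min_run consecutive samples above the threshold."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(samples):
        if value > threshold:
            if start is None:
                start = i  # a new run of high latency begins here
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - start))
            start = None
    # Close a run that is still open at the end of the data.
    if start is not None and len(samples) - start >= min_run:
        runs.append((start, len(samples) - start))
    return runs


if __name__ == "__main__":
    # Hypothetical samples, not measurements from the environment above.
    print(sustained_high_latency([5, 25, 30, 40, 8, 22, 3]))
```

Short spikes are ignored on purpose; it is the sustained runs that tend to line up with the frozen VMs.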



March 24, 2019 at 12:08 pm

How to enable EVC when vCenter Server is running as a VM in a Nutanix cluster

As part of the Nutanix best practices, we need to enable EVC on the vSphere cluster. When the vCenter Server is itself a VM, this leads to a chicken-and-egg situation, because a host that contains powered-on VMs cannot be added to an EVC-enabled cluster. To overcome this, you can follow the steps below. (You may need to disable Admission Control temporarily and enable it again once you have finished all the steps.)

1) Add the hosts to the datacenter.

2) Create the HA/DRS cluster.

3) Enable EVC on the cluster based on your processor architecture.

4) Pick any host (for the first round, one that is not running the vCenter VM) and shut down its running VMs and the CVM. Keep in mind that you can shut down only one CVM at a time.

5) Drag and drop the host into the cluster; it will be added without any hassle.

6) Power on the VMs and the CVM (wait until the CVM completes booting).

7) Now vMotion the vCenter VM to the host that is already part of the cluster.

8) That’s it. Repeat steps 4, 5 and 6 for the remaining hosts.
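To double-check the ordering before touching production, the steps above can be written out as a small planner. This is just a sketch of the sequence, not something that talks to vCenter; the function name and the host names are hypothetical.

```python
from typing import List


def rolling_evc_plan(hosts: List[str], vcenter_host: str) -> List[str]:
    """List the per-host actions in order, handling the host that
    currently runs the vCenter VM last."""
    if vcenter_host not in hosts:
        raise ValueError("vcenter_host must be one of hosts")
    if len(hosts) < 2:
        raise ValueError("at least two hosts are needed so vCenter can be moved")

    # The host running vCenter goes last, so vCenter stays up while the
    # first host is moved into the EVC cluster.
    order = [h for h in hosts if h != vcenter_host] + [vcenter_host]

    plan: List[str] = []
    for position, host in enumerate(order):
        plan.append(f"{host}: shut down the running VMs and the CVM")
        plan.append(f"{host}: drag and drop the host into the EVC cluster")
        plan.append(f"{host}: power on the CVM and the VMs, wait for the CVM to boot")
        if position == 0:
            # Step 7: once one host is in the cluster, move vCenter onto it.
            plan.append(f"vMotion the vCenter VM from {vcenter_host} to {host}")
    return plan


if __name__ == "__main__":
    for step in rolling_evc_plan(["esx1", "esx2", "esx3"], vcenter_host="esx1"):
        print(step)
```

Because the hosts are processed strictly one after another, only one CVM is down at any point in the plan.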

Hint:

If you forgot to enable EVC before putting the cluster into production, and you now need to expand your Nutanix cluster, enabling EVC becomes mandatory in order to add the new nodes to the existing ESXi cluster. In this case, you can follow the additional steps below to achieve the intended result. (Again, you may need to disable Admission Control temporarily and enable it again once you have finished all the steps.)

 

1) Create a new cluster (without EVC).

2) Select a host and vMotion all the production VMs running on it to the remaining hosts.

3) Shut down the CVM.

4) Put the host into maintenance mode.

5) Drag and drop the host into the new cluster.

6) Exit maintenance mode and power on the CVM.

7) Then vMotion the vCenter VM and the other VMs to this host.

8) Repeat steps 2 to 6 for the remaining hosts.

9) Reconfigure your old cluster with the proper EVC mode.

10) Then repeat steps 2 to 6 for all the hosts, this time moving them back into the old cluster.

Source:

Refer to https://www.virten.net/2013/04/intel-cpu-evc-matrix/ for guidelines on EVC modes.

Video reference: https://www.youtube.com/watch?v=DSfzafr1ndA

 

 

 

March 18, 2019 at 2:24 pm

AsBuilt Report for vSphere

Hi Folks

Until recently, I struggled to build a proper as-built document for vSphere environments, because the manual process requires capturing screenshots and time-consuming Word document preparation.

Last week, I came across two blogs about the AsBuilt tool for VMware, which turned out to be a very handy, must-have tool for VMware installations.

Those who want to read more about this tool can visit the two blogs listed at the bottom of this post.

You need Windows PowerShell. Once PowerShell is ready, run the commands below to build your as-built document.

 

1) Install the PScribo module

Install-Module PScribo

2) Download the AsBuilt PowerShell scripts from https://github.com/tpcarman/As-Built-Report

2.1) Extract the archive to a folder and import the module

Import-Module C:\As-Built-Report-dev\AsBuiltReport.psd1

3) Install the PowerCLI module

Find-Module -Name VMware.PowerCLI

Install-Module -Name VMware.PowerCLI

3.1) Run the command below to bypass the SSL warning for vCenter/ESXi

Set-PowerCLIConfiguration -InvalidCertificateAction Ignore -Confirm:$false

4) The command below creates the report (replace vcenterip with the address of your vCenter Server)

New-AsBuiltReport -Target vcenterip -Credential (Get-Credential) -Type vSphere -Format HTML,Word -Timestamp -Healthchecks -AsBuiltConfigPath C:\As-Built-Report-dev\Src\Public\Reports\vSphere\vsphere.json

Source:

https://www.timcarman.net/as-built-report/

As Built Report – working with it in my lab

 

January 24, 2019 at 3:29 pm

How to Capture & Analyze Network Traffic on ESXi

As an ESXi implementer or administrator, you may come across situations where you need to get your hands dirty 🙂 with deep network troubleshooting. I had such a situation a few months ago, which I would like to share in this post.

We deployed Horizon View (for VDI) in a customer’s ESXi cluster (8 nodes). The desktop users were complaining that they were not able to reach a specific network.

To investigate further, we swapped the physical adapter to the on-board Broadcom cards (1 Gbps), after which we were able to re-establish the network. We decided to engage VMware Support to find the root cause and get a permanent fix. VMware Support was pretty awesome and nailed it very quickly.

First they used two built-in commands on ESXi:

  • pktcap-uw (to capture the network packets)
  • tcpdump-uw (to read the captured packets)

They ran the command below on both NICs to capture the traffic initially.

  • pktcap-uw --uplink vmnic0 --dir 0 --mac 00:00:00:00:00:00 --vlan 18 -o /tmp/f.pcap

uplink – name of the vmnic

dir      – 0 means RX traffic

mac   – MAC address of the machine you are troubleshooting

vlan   – the VLAN ID
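Since the same capture has to be run once per NIC, it can help to generate the command lines rather than retype them. Below is a small sketch in Python that only builds the string; it uses the same options as the command above. The function name is made up, and the meaning of `--dir 1` as TX is taken from the tool's help rather than from this case.

```python
import shlex


def build_pktcap_command(
    uplink: str,
    mac: str,
    vlan: int,
    outfile: str,
    direction: int = 0,
) -> str:
    """Build a pktcap-uw command line with the options used in this post."""
    # 0 = RX as described above; 1 = TX is an assumption based on the tool's help.
    if direction not in (0, 1):
        raise ValueError("direction must be 0 (RX) or 1 (TX)")
    # The 802.1Q VLAN ID field is 12 bits wide.
    if not 0 <= vlan <= 4095:
        raise ValueError("vlan must be between 0 and 4095")
    args = [
        "pktcap-uw",
        "--uplink", uplink,
        "--dir", str(direction),
        "--mac", mac,
        "--vlan", str(vlan),
        "-o", outfile,
    ]
    # shlex.join quotes any argument that would need it in a shell.
    return shlex.join(args)


if __name__ == "__main__":
    # One capture per NIC; vmnic1 and the file names are only examples.
    for nic in ("vmnic0", "vmnic1"):
        print(build_pktcap_command(nic, "00:00:00:00:00:00", 18, f"/tmp/{nic}.pcap"))
```

Writing each NIC to its own file makes the later comparison easier.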

Thereafter, we read the output of the above command using

  •     tcpdump-uw -ner /tmp/f.pcap

By comparing the output from both NICs, we were able to narrow the problem down to the Mellanox cards. When tagged traffic passed through a Mellanox network card (10 Gbps), the reply packet was not tagged with the proper VLAN ID, which disrupted the network traffic.
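When the capture is large, eyeballing the VLAN tags gets tedious. Here is a rough Python sketch that counts how many lines of the `tcpdump-uw -ne` text output carry the expected VLAN ID. It assumes the usual tcpdump layout, where tagged frames show `vlan <id>` after the link-level header. The two sample lines are illustrative only and are not from the customer's capture.

```python
import re
from collections import Counter
from typing import Iterable

# With -e, tcpdump prints "vlan <id>" for 802.1Q-tagged frames.
VLAN_RE = re.compile(r"\bvlan (\d+)\b")

# Illustrative lines in the usual "tcpdump -ne" layout (made-up addresses).
SAMPLE_LINES = [
    "10:15:01.123456 00:50:56:aa:bb:cc > 00:50:56:dd:ee:ff, ethertype 802.1Q (0x8100), "
    "length 102: vlan 18, p 0, ethertype IPv4, 10.0.18.5 > 10.0.18.1: ICMP echo request, "
    "id 1, seq 1, length 64",
    "10:15:01.123789 00:50:56:dd:ee:ff > 00:50:56:aa:bb:cc, ethertype IPv4 (0x0800), "
    "length 98: 10.0.18.1 > 10.0.18.5: ICMP echo reply, id 1, seq 1, length 64",
]


def summarize_vlan_tags(lines: Iterable[str], expected_vlan: int) -> Counter:
    """Count frames tagged with the expected VLAN, with another VLAN, or untagged."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        match = VLAN_RE.search(line)
        if match is None:
            counts["untagged"] += 1
        elif int(match.group(1)) == expected_vlan:
            counts["expected"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    print(summarize_vlan_tags(SAMPLE_LINES, expected_vlan=18))
```

Running it against the output from each NIC and comparing the "untagged" and "other" counts gives a quick hint as to which card is dropping or changing the tag.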

 

Good Luck

Muralee

 

 

December 12, 2018 at 10:52 am

VMware HA Network Failover & Failback Delay

Hi Guys

There are lots of articles describing VMware vSwitch teaming capabilities and their configuration. But I could not find any article that explains what needs to be done to avoid the failover and failback delays and what the expected behavior is.

Recently I came across two good resources that helped me get a good idea of this area. I have listed them below for anyone with a similar requirement.

Source 1:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1003804

Source 2: (a bit of an old doc, but still applicable to the newer versions as well)

vmware_network_config

October 23, 2017 at 3:14 pm

ESXi 6.5 changes to HA

Hi All

With the latest release, ESXi 6.5, VMware has made lots of changes to the HA capability.

The article below provides a detailed description of these improvements:

source: http://blog.servercentral.com/high-availability-redundancy-features-vsphere-6.5.

This article also clarifies the correct way to calculate percentage-based Admission Control.

Screenshot extract from the article mentioned.

October 23, 2017 at 3:05 pm

ESXi Host Disconnects from vCenter Server

Hi All

Recently we had an issue in a customer environment hosting a 3-node ESXi cluster on Nutanix. Suddenly one of the hosts showed as not responding and disconnected from vCenter. Luckily, there was no impact on the production VMs hosted on that node, since only the management network was affected. After several hours of troubleshooting, we decided to call VMware Support and found out that the issue is related to KB 2145611.

Below is the extract from the vmkernel.log
——————————————————————————-
2017-03-19T05:35:01.871Z cpu26:7190268)ALERT: hostd detected to be non-responsive
2017-03-19T06:00:01.988Z cpu2:7192142)ALERT: hostd detected to be non-responsive
2017-03-19T06:02:53.474Z cpu6:36416)StorageApdHandler: 1204: APD start for 0x4305932c3770 [8c9d039d-452d1170]
2017-03-19T06:02:53.474Z cpu6:36416)StorageApdHandler: 1204: APD start for 0x4305932c4fd0 [fa49f8b0-fa322ecd]
2017-03-19T06:02:59.369Z cpu18:32953)StorageApdHandler: 1292: APD bounce-exit for 0x4305932c4fd0 [fa49f8b0-fa322ecd]
2017-03-19T06:02:59.369Z cpu18:32953)StorageApdHandler: 1292: APD bounce-exit for 0x4305932c3770 [8c9d039d-452d1170]

2017-03-19T09:40:04.774Z cpu44:7213651)WARNING: LinuxFileDesc: 5637: Unrecoverable exec failure: Failure during exec while original state already lost
2017-03-19T09:40:06.784Z cpu24:7213652)WARNING: UserParam: 1301: could not change group to <host/vim/vimuser/terminal/ssh>: Admission check failed for memory resource
2017-03-19T09:40:06.784Z cpu24:7213652)WARNING: LinuxFileDesc: 5637: Unrecoverable exec failure: Failure during exec while original state already lost
2017-03-19T09:40:06.986Z cpu29:7213653)WARNING: UserParam: 1301: could not change group to <host/vim/vimuser/terminal/ssh>: Admission check failed for memory resource
2017-03-19T09:40:06.986Z cpu29:7213653)WARNING: LinuxFileDesc: 5637: Unrecoverable exec failure: Failure during exec while original state already lost
2017-03-19T09:41:39.969Z cpu16:37557)WARNING: LinuxThread: 340: Error cloning thread: -28 (bad0081)
2017-03-19T09:45:52.490Z cpu43:7214205)WARNING: User: 5366: Error in exec’d cartel setup: Failed to map section: Admission check failed for memory resource
2017-03-19T09:45:52.490Z cpu43:7214205)WARNING: LinuxFileDesc: 5637: Unrecoverable exec failure: Failure during exec while original state already lost
2017-03-19T09:46:06.930Z cpu30:7214223)WARNING: LinuxThread: 340: Error cloning thread: -28 (bad0081)
2017-03-19T09:46:07.236Z cpu41:7214225)WARNING: LinuxThread: 340: Error cloning thread: -28 (bad0081)
2017-03-19T09:46:46.417Z cpu22:7214286)WARNING: User: 5366: Error in exec’d cartel setup: Failed to map section: Admission check failed for memory resource
2017-03-19T09:46:46.417Z cpu22:7214286)WARNING: LinuxFileDesc: 5637: Unrecoverable exec failure: Failure during exec while original state already lost
2017-03-19T09:47:11.461Z cpu26:37558)WARNING: LinuxThread: 340: Error cloning thread: -28 (bad0081)
2017-03-19T09:49:19.688Z cpu5:7214435)WARNING: LinuxThread: 340: Error cloning thread: -28 (bad0081)
————————————————————————————-
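With a longer vmkernel.log it is easier to see the pattern if you count the messages per component. Here is a small Python sketch for that. It assumes the line layout seen in the extract above (timestamp, cpu:world, optional ALERT/WARNING prefix, then the message); the "(none)" label for lines without a severity prefix is my own choice, and the sample lines are copied from the extract.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional

LINE_RE = re.compile(
    r"^(?P<ts>\S+) cpu(?P<cpu>\d+):(?P<world>\d+)\)"
    r"(?:(?P<severity>ALERT|WARNING): )?(?P<message>.*)$"
)

SAMPLE_LINES = [
    "2017-03-19T05:35:01.871Z cpu26:7190268)ALERT: hostd detected to be non-responsive",
    "2017-03-19T06:02:53.474Z cpu6:36416)StorageApdHandler: 1204: APD start for 0x4305932c3770 [8c9d039d-452d1170]",
    "2017-03-19T09:40:04.774Z cpu44:7213651)WARNING: LinuxFileDesc: 5637: Unrecoverable exec failure: Failure during exec while original state already lost",
    "2017-03-19T09:41:39.969Z cpu16:37557)WARNING: LinuxThread: 340: Error cloning thread: -28 (bad0081)",
    "2017-03-19T09:46:06.930Z cpu30:7214223)WARNING: LinuxThread: 340: Error cloning thread: -28 (bad0081)",
]


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Split one log line into timestamp, severity, component and message."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # blank lines, separators, anything in another layout
    message = match.group("message")
    # The component is the text before the first ": ", when there is one.
    component, sep, _rest = message.partition(": ")
    return {
        "timestamp": match.group("ts"),
        "severity": match.group("severity") or "(none)",
        "component": component if sep else "",
        "message": message,
    }


def count_by_component(lines: Iterable[str]) -> Counter:
    """Count parsed lines per (severity, component) pair."""
    counts: Counter = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            counts[(record["severity"], record["component"])] += 1
    return counts


if __name__ == "__main__":
    for key, total in count_by_component(SAMPLE_LINES).most_common():
        print(key, total)
```

A burst of memory admission and thread-cloning warnings around the time hostd stops responding is the kind of pattern to look for in the output.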

The support engineer suggested that, before applying the patch, we try clearing the Likewise cache (where the ESXi host stores AD authentication-related data).

The command he used (open a PuTTY session to the impacted ESXi host):

# /usr/lib/vmware/likewise/bin/lw-lsa ad-cache --delete-all

The above command will produce an error (file not found) if there is no cache.

Good luck.

 

 

 

March 20, 2017 at 11:06 am
