Posts filed under ‘Nutanix’

How to enable EVC when vCenter Server is running as a VM in a Nutanix Cluster

As part of the Nutanix best practices, we need to enable EVC on the vSphere cluster. However, when the vCenter Server itself is a VM, you are dragged into a chicken-and-egg situation: a host that contains powered-on VMs cannot be added to an EVC-enabled cluster. To overcome this, you can follow the guidelines below. (You may need to disable Admission Control temporarily and enable it again once you finish all the steps.)

1) Add the hosts to the datacenter.

2) Create the HA/DRS cluster.

3) Enable EVC on the cluster based on your processor architecture.

4) Pick any host, then shut down the running VMs and the CVM (keep in mind that you can shut down only one CVM at a time; see the sketch after this list for doing this from the CVM command line).

5) Drag and drop the host into the cluster; the host will be added without any hassle.

6) Power on the VMs and the CVM (wait until the CVM completes booting).

7) Now vMotion the vCenter VM to the host that is already part of the cluster.

8) That's it. Repeat steps 4, 5 and 6 for the remaining hosts.
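For step 4, the CVM can be checked and shut down from its own command line instead of the vSphere client. A minimal sketch, assuming you are logged in to the CVM running on the host you are about to move (verify the commands against your AOS version):

cluster status | grep -v UP (confirm no other CVM or service is already down; only one CVM may be down at a time)

cvm_shutdown -P now (gracefully shuts down this CVM)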

Hint:

# In case you forgot to enable EVC before you put the cluster into production, and you are now in a situation where you need to expand your Nutanix cluster and enabling EVC becomes mandatory to add the new nodes to the existing ESXi cluster, you can follow the additional steps given below to achieve the intended result. (Again, you may need to disable Admission Control temporarily and enable it again once you finish all the steps.)

 

1) Create a new cluster (without EVC).

2) Select a host and vMotion all the production VMs running on that host to the remaining hosts.

3) Shut down the CVM.

4) Put the host into Maintenance Mode (this can also be done from the ESXi shell; see the sketch after this list).

5) Drag and drop the host into the new cluster.

6) Exit Maintenance Mode and power on the CVM.

7) Then vMotion the vCenter VM and the other VMs to this host.

8) Repeat steps 2 – 6 for the remaining hosts.

9) Reconfigure your old cluster with the proper EVC mode.

10) Then repeat steps 2 – 6 for all the hosts.
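If you prefer to drive steps 4 and 6 from the ESXi shell rather than the vSphere client, a minimal sketch (run over SSH on the host being moved, after its CVM and local VMs are shut down, since the CVM cannot be migrated):

vim-cmd hostsvc/maintenance_mode_enter (puts the host into Maintenance Mode)

vim-cmd hostsvc/hostsummary | grep inMaintenanceMode (verify the state)

vim-cmd hostsvc/maintenance_mode_exit (exit Maintenance Mode once the host sits in the new cluster)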

Source:

Refer https://www.virten.net/2013/04/intel-cpu-evc-matrix/ for guidelines on EVC modes.

Video reference: https://www.youtube.com/watch?v=DSfzafr1ndA

 

 

 


March 18, 2019 at 2:24 pm

Latency between the Nutanix CVMs

Recently we noticed that Prism was throwing an error stating that there is latency between the CVMs. To investigate the issue we raised a support call with the Nutanix team. I am sharing the procedure followed by the Nutanix team, as it may help somebody who is facing a similar issue.

# Log in to a Controller VM

# cd ~nutanix/data/logs/sysstats (this location contains the ping_hosts and ping_gateway logs)

# tailf ping_hosts.INFO

In our case we noticed that one of the CVMs was unreachable:

x.x.x.1 : 0.187 ms

x.x.x.2 : Unreachable

x.x.x.3 : 0.028 ms

So we consulted the network team and found out that the switch port where one of the nodes is connected contained lots of errors, and we had to replace the cable.
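To spot this kind of problem quickly across the whole cluster, the same ping logs can be scanned from any CVM. A minimal sketch, assuming the log location shown above:

allssh "grep -i unreachable ~/data/logs/sysstats/ping_hosts.INFO | tail -n 5" (shows which CVMs have recently reported unreachable peers)

allssh "tail -n 20 ~/data/logs/sysstats/ping_hosts.INFO" (current latency snapshot from every CVM)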

 

That's it, the problem got resolved.

 

March 18, 2019 at 10:34 am Leave a comment

Nutanix: fatal mounting installer media

Last week, we were running Foundation on an NX-1365-G6 block. The Foundation process hung at 26% with the error "fatal mounting installer media". When this happens, the Nutanix nodes are powered off. I have attached two screenshots below that depict the problem we faced.

[Screenshots: Foundation hanging at 26% with the "fatal mounting installer media" error while IPMITool fails to restart the node]

You can see in the screenshots that IPMITool is trying to restart the server and failing to do so.

Therefore, to overcome this situation, we logged in to the IPMI on each node and performed a Unit Reset and a Factory Default via the Maintenance menu. Thereafter we restarted Foundation from scratch and it completed successfully.
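If you prefer the command line over the IPMI web UI, the BMC can also be reset with ipmitool. A rough sketch (the IP address and credentials are placeholders; a cold reset of the BMC is roughly what the web UI's Unit Reset does, while the Factory Default we still performed from the Maintenance menu):

ipmitool -I lanplus -H 10.0.0.10 -U ADMIN -P '<password>' chassis power status (check what the BMC reports as the node power state)

ipmitool -I lanplus -H 10.0.0.10 -U ADMIN -P '<password>' mc reset cold (cold-resets the BMC itself)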

February 14, 2019 at 12:28 pm

Nutanix AOS Upgrade Tips

Recently we upgraded our Nutanix cluster from AOS version 4.5.2.3 to the latest 5.9. The process was seamless and non-disruptive. I have listed the commands we used along with the Nutanix engineer during the process, for future reference.

 

Initial Checks prior to the AOS Upgrade

  • ncli cluster info
  • ncli host ls
  • ncli ru ls
  • ncli ms ls
  • ncc --version
  • cluster status | grep -v UP
  • nodetool -h 0 ring | grep -i normal | wc -l
  • svmips | wc -w

Once the output of the above checks is fine, use the Software Upgrade feature from Prism to upgrade the AOS.
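As a quick sanity check before kicking off the upgrade, the ring-member count can be compared with the CVM count. A minimal sketch, run from any CVM and built only from the commands listed above:

cluster status | grep -v UP (nothing should be reported as down)

ring_nodes=$(nodetool -h 0 ring | grep -i normal | wc -l)

cvm_count=$(svmips | wc -w)

echo "ring: ${ring_nodes} cvms: ${cvm_count}" (the two numbers should match; if they do not, stop and investigate before upgrading)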

To check the upgrade / pre-upgrade status, see which node is being picked up, and confirm the versions after the upgrade:

  • allssh ls -ltra ~/data/logs | grep -i preupgrade
  • tail -F ~/data/logs/preupgrade.out
  • use upgrade_status (to verify the status, in a less verbose mode)
  • ncli --version
  • stargate --version
  • watch -d genesis status (to check the services status after the CVM reboot)

Optional: To delete the previously uploaded ISO

  • cd ~/software_downloads/nos (use it with allssh to run it on all the CVMs)

Finally, after the AOS upgrade, the Curator replication process sometimes kicks in and takes some time to complete. Until it completes you cannot proceed with the next update, so you can check it via the command below:

  • curator_cli get_under_replication_info
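To avoid re-running that check by hand, the same command can simply be polled. A minimal sketch:

watch -n 60 "curator_cli get_under_replication_info" (refreshes every 60 seconds; proceed once nothing is reported as under-replicated)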

 

 

January 20, 2019 at 1:30 pm

Nutanix NTP Issues & Troubleshooting.

The below commands help to troubleshoot and fix NTP issues on a Nutanix cluster. You can run these commands by logging in to any of the CVMs.

To check the date on all the nodes

allssh ssh root@192.168.5.1 date

To check the NTP source

allssh ssh root@192.168.5.1 ntpq -p

To update the NTP server

allssh ssh root@192.168.5.1 service ntpd stop (stops the NTP service)

allssh ssh root@192.168.5.1 ntpdate -u 1.1.1.1 (add your NTP server IP)

allssh ssh root@192.168.5.1 service ntpd start (starts the NTP service)

(source: http://vmwaremine.com)
——————————————
Further Troubleshooting.
——————————————
In case you are bombarded with NTP alerts on Prism, such as time drift, you can run the commands below, but I would recommend contacting support. (By default, an offset of +/- 3 seconds will throw these error messages.)

To check for any communication issues with the NTP server:

1) sudo nc -vu 1.1.1.1 123 (leave it for a few minutes and press CTRL+C; if your NTP server is listening on UDP you will not get any response)

2) Read the genesis.out file and look for the offset messages (allssh grep offset ~/data/logs/genesis.out)

3) Run ntpdate -d 1.1.1.1 (to check the NTP sync data)

As Nutanix recommends, run the below cron job to force the servers to reduce the offset:

allssh '(/usr/bin/crontab -l && echo "*/1 * * * * bash -lc /home/nutanix/serviceability/bin/fix_time_drift") | /usr/bin/crontab -'

Thereafter you can monitor with the below command to observe the NTP offset being reduced:

allssh "grep offset ~/data/logs/genesis.out | tail -n10"

Finally, make sure to remove the cron job with the below command:

allssh "(/usr/bin/crontab -l | sed '/fix_time_drift/d' | /usr/bin/crontab -)"

To check the NTP sync on the AHV hosts:

hostssh ntpq -pn
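The CVM side can be checked the same way. A small sketch, assuming you are logged in to any CVM:

allssh date (the timestamps should be within a second or two of each other)

allssh "ntpq -pn" (each CVM should normally show a selected peer, marked with an asterisk)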

July 24, 2018 at 9:00 am

Virtualized Domain Controller Nightmare on Nutanix Hyper-V Cluster

Hi Guys

We recently deployed a Hyper-V Nutanix cluster, and everything looked fine until we hit the wall. If the domain controller VMs go down, you will not be able to power them on. The reason is that a Hyper-V cluster on Nutanix uses an SMB3 share as the shared storage, and unless the hypervisor is able to authenticate, access to the SMB3 share is blocked. In our case that was not possible because both DC VMs were powered off and we were unable to power them on (a chicken-and-egg situation).

Error messages I received:

[Screenshots: the error messages shown when powering on the DC VMs]

The Nutanix KB article below explains this behaviour:

Source: https://portal.nutanix.com/#/page/kbs/details?targetId=kA032000000TTGWCA4

So the conclusion is that you need either a physical domain controller or a DC VM that does not sit on the SMB3 share. Hopefully Windows Server 2016 will come up with a solution for this scenario.

Update 1: As a last resort, I changed the virtual disk path on the DC VMs from the FQDN of the cluster to the IP address of the cluster, and voila, I was able to power on the two VMs (but I do not have any clue how this worked).

December 27, 2016 at 9:00 am

Nutanix Best Practices for VMware

Hi Folks

Recently we got an opportunity to work with the Nutanix converged solution. When we deployed it, there were some customizations we needed to make to the HA/DRS cluster settings to realign the configuration with the Nutanix – VMware best practices.
(This information was provided by the support team.)

Note 1: When a Nutanix cluster is created with a single datastore, vSphere HA will pop up an error related to HA heartbeating, stating there are insufficient datastores for heartbeating; it can be suppressed as below:
[Screenshot: datastore heartbeating setting]

Note 2: In the HA cluster, the VM restart priority and the host isolation response need to be changed for the CVMs as below:
[Screenshot: cluster VM restart priority and host isolation response settings for the CVMs]

Note 3: VM Monitoring needs to be disabled for the CVMs:
[Screenshot: VM Monitoring settings for the CVMs]

Note 4: In the DRS cluster, the automation level needs to be disabled for the CVMs as well:
[Screenshot: DRS automation level settings for the CVMs]

December 11, 2016 at 2:29 pm

