Posts filed under ‘Nutanix’

Nutanix: fatal mounting installer media

Last week, we were running Foundation on an NX-1365-G6 block. The Foundation process hung at 26% with the error "fatal mounting installer media", and when this happened the Nutanix nodes were powered off. I have attached two screenshots below that depict the problem we faced.

[Screenshots: Foundation hanging at 26% with the "fatal mounting installer media" error, and ipmitool failing to restart the node]

You can see in the images that ipmitool is trying to restart the server and failing to do so.

To overcome this situation, we logged in to the IPMI web interface on each node and performed a Unit Reset and a Factory Default from the Maintenance menu. We then restarted Foundation from scratch and it completed successfully.
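
For reference, if the web UI is not handy, roughly the same thing can be done with ipmitool from any machine that can reach the BMCs. This is only a minimal sketch, assuming lanplus access with placeholder IP and credentials; there is no single CLI equivalent of the Factory Default option, so that part still needs the web UI.

# Check the current chassis power state of the node (IP and credentials are placeholders)
ipmitool -I lanplus -H 10.0.0.41 -U ADMIN -P <password> chassis power status

# Cold-reset the BMC itself, roughly what the "Unit Reset" option in the Maintenance menu does
ipmitool -I lanplus -H 10.0.0.41 -U ADMIN -P <password> mc reset cold

# Once the BMC is reachable again, power the node back on so Foundation can retry
ipmitool -I lanplus -H 10.0.0.41 -U ADMIN -P <password> chassis power on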


February 14, 2019 at 12:28 pm

Nutanix AOS Upgrade Tips

Recently we upgraded our Nutanix cluster from AOS 4.5.2.3 to the latest 5.9. The process was seamless and non-disruptive. I have listed the commands we used along with the Nutanix engineer during the process for future reference.

 

Initial Checks Prior to the AOS Upgrade

  • ncli cluster info (cluster name, version and overall details)
  • ncli host ls (lists the hosts in the cluster)
  • ncli ru ls (lists the rackable units / blocks)
  • ncli ms ls (lists the registered management servers, e.g. vCenter)
  • ncc --version (NCC version)
  • cluster status | grep -v UP (shows any CVM service that is not UP)
  • nodetool -h 0 ring | grep -i normal | wc -l (number of nodes in the NORMAL state in the metadata ring)
  • svmips | wc -w (number of CVMs; should match the ring count above)
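
As a rough sanity check (not an official Nutanix script), the last three items can be tied together on any CVM: the number of nodes reported as NORMAL in the metadata ring should match the number of CVM IPs, and no service should show up as down.

# Count the NORMAL nodes in the metadata ring and the number of CVMs - they should match
RING=$(nodetool -h 0 ring | grep -i normal | wc -l)
CVMS=$(svmips | wc -w)
if [ "$RING" -eq "$CVMS" ]; then
  echo "OK: $RING of $CVMS CVMs are NORMAL in the ring"
else
  echo "WARNING: only $RING of $CVMS CVMs are NORMAL - investigate before upgrading"
fi

# Any service that is not UP will be printed here (ideally only the per-CVM header lines appear)
cluster status | grep -v UP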

Once the output of the above checks looks fine, use the Software Upgrade feature in Prism to upgrade AOS.

To check the upgrade/pre-upgrade status, see which node is currently being upgraded, and confirm the versions after the upgrade:

  • allssh ls -ltra ~/data/logs | grep -i preupgrade
  • tail -F ~/data/logs/preupgrade.out
  • upgrade_status (to verify the status; less verbose)
  • ncli --version
  • stargate --version
  • watch -d genesis status (to check the service status after each CVM reboot)
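
If you prefer to monitor the whole thing from one SSH session, something along these lines works; it is only a sketch that wraps the same commands listed above for convenience.

# Re-run upgrade_status every 60 seconds; Ctrl+C once every node reports the new version
while true; do upgrade_status; sleep 60; done

# Afterwards, confirm each CVM reports the new AOS version
allssh "stargate --version"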

Optional: To delete the previously uploaded ISO

  • cd ~/software_downloads/nos (use it with allssh to run it on all the CVMs; see the sketch below)
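
The bullet above only changes into the directory; the actual clean-up would look something like this. The bundle name pattern below is just an example, so list the directory first and only delete files you recognise.

# List any previously uploaded AOS bundles on every CVM
allssh "ls -lh ~/software_downloads/nos"

# Example clean-up - substitute the actual file name you see in the listing above
allssh "rm -f ~/software_downloads/nos/nutanix_installer_package-release-*.tar.gz"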

Finally, after the AOS upgrade the Curator replication process sometimes kicks in and takes a while to complete. Until it completes you cannot proceed with the next update, so you can check its progress with the command below:

  • curator_cli get_under_replication_info
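
To avoid staring at the screen, a simple loop like the one below shows the backlog draining over time; this is only a sketch and the log path is just an example.

# Re-check the replication backlog every 5 minutes and keep a timestamped log of the output
# Ctrl+C once the report shows nothing left to replicate
while true; do
  date
  curator_cli get_under_replication_info
  sleep 300
done | tee ~/tmp/under_replication_watch.log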

 

 

January 20, 2019 at 1:30 pm

Nutanix NTP Issues & Troubleshooting.

The commands below help to troubleshoot and fix NTP issues on a Nutanix cluster. You can run them by logging in to any of the CVMs.

To check the date on all the nodes

allssh ssh root@192.168.5.1 date

To check the NTP source
allssh ssh root@192.168.5.1 ntpq -p
To update the NTP server
allssh ssh root@192.168.5.1 service ntpd stop (stops the NTP service)
allssh ssh root@192.168.5.1 ntpdate -u 1.1.1.1 (replace 1.1.1.1 with your NTP server IP)
allssh ssh root@192.168.5.1 service ntpd start (starts the NTP service)
(source: http://vmwaremine.com)
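The same three steps can also be run in a single pass per host; this is a minimal sketch, assuming the hosts can reach the NTP server (1.1.1.1 is just the placeholder IP from the commands above).
# Stop ntpd, force a one-off sync, then start ntpd again on every host in one pass
allssh 'ssh root@192.168.5.1 "service ntpd stop; ntpdate -u 1.1.1.1; service ntpd start"'
# Verify each host is now syncing against the expected source
allssh 'ssh root@192.168.5.1 "ntpq -p"'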
——————————————
Further Troubleshooting.
——————————————
If you are bombarded with NTP alerts in Prism, such as time drift, you can run the commands below, but I would recommend contacting support. (By default, an offset of more than +/- 3 seconds will throw these error messages.)
To check for any communication issues with the NTP server:
1) sudo nc -vu 1.1.1.1 123 (leave it for a few minutes and press CTRL+C; since NTP uses UDP you will not get a response back)
2) Read the genesis.out file and look for the offset messages (allssh "grep offset ~/data/logs/genesis.out")
3) Run ntpdate -d 1.1.1.1 (to check the NTP sync data)
As Nutanix recommends, run the cron job below to force the CVMs to reduce the offset.
allssh '(/usr/bin/crontab -l && echo "*/1 * * * * bash -lc /home/nutanix/serviceability/bin/fix_time_drift") | /usr/bin/crontab -'
Thereafter you can monitor with the command below to observe the NTP offset being reduced:
allssh "grep offset ~/data/logs/genesis.out | tail -n10"
Finally, make sure to remove the cron job with the command below.
allssh "(/usr/bin/crontab -l | sed '/fix_time_drift/d' | /usr/bin/crontab -)"
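Once the offset is back within limits, it is worth double-checking that the temporary cron job really is gone from every CVM; a quick sketch:
# Each CVM should report 0 matching cron entries once the clean-up has run
allssh "/usr/bin/crontab -l | grep -c fix_time_drift"
# And the CVM you are logged in to should again show a healthy sync source
ntpq -pn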
To check the NTP sync on the AHV hosts:
hostssh ntpq -pn

July 24, 2018 at 9:00 am

Virtualized Domain Controller Nightmare on Nutanix Hyper-V Cluster

Hi Guys

We recently deployed a Nutanix Hyper-V cluster, and everything looked fine until we hit a wall. If the domain controller VMs go down, you will not be able to power them on again. The reason is that a Hyper-V cluster on Nutanix uses an SMB3 share as its shared storage, and unless the hypervisor can authenticate against Active Directory, access to the SMB3 share is blocked. In our case authentication was not possible because both DC VMs were powered off and could not be powered on (a chicken-and-egg situation).

Error messages I received:

[Screenshots: error1, error2, error3]

The Nutanix KB article below explains this behaviour:

Source: https://portal.nutanix.com/#/page/kbs/details?targetId=kA032000000TTGWCA4
[Screenshot: nutanix-kb]

So the conclusion is that you need either a physical domain controller or a DC VM that does not sit on the SMB3 share. Hopefully Windows Server 2016 will bring a solution for this scenario.

Update 1: As a last resort, I changed the virtual disk path on the DC VMs from the FQDN of the cluster to the IP address of the cluster, and voila, I was able to power on the two VMs (though I have no clue how this worked).

December 27, 2016 at 9:00 am

Nutanix Best Practices for VMware

Hi  Folks

Recently we got an opportunity to work with the Nutanix converged solution. When we deployed it, there were some customizations we needed to make to the HA/DRS cluster settings to realign the configuration with the Nutanix-VMware best practices.
(This information was provided by the support team.)

Note 1: When a Nutanix cluster is created with a single datastore, vSphere HA will pop up an error about HA heartbeating, stating there are insufficient datastores for heartbeating; this can be suppressed as below.
[Screenshot: bestpractice_datastore_heartbeating]

Note 2: In the HA cluster, the VM restart priority and host isolation response need to be changed for the CVMs as below:
[Screenshot: bestpractice_cluster_vm_settings_1]

Note 3: VM monitoring needs to be disabled for the CVMs.
[Screenshot: bestpractice_cluster_vm_monitoring_1]

Note 4: In the DRS cluster, the automation level for the CVMs needs to be disabled as well.
[Screenshot: bestpractice_drs_cluster_vm_settings_1]

December 11, 2016 at 2:29 pm

