How to fix the disk usage warning when /home partition or /home/nutanix directory is full



This article describes ways to safely free up space if /home or /home/nutanix becomes full or does not contain enough space to facilitate an AOS upgrade or PCVM upgrade.

Versions affected:

All Prism Central versions, all AOS versions


WARNING: DO NOT treat the Nutanix CVM (Controller VM) or PCVM as a normal Linux machine. DO NOT perform “rm -rf /home” on any of the CVMs or PCVM. It could lead to data loss scenarios. Contact Nutanix Support in case you have any doubts.

This condition can be reported in two scenarios:

  • The NCC health check disk_usage_check reports that the /home partition usage is above a certain threshold (75% by default)
  • The pre-upgrade check test_nutanix_partition_space verifies that all nodes have a minimum of 5.6 GB of free space in the /home/nutanix directory before performing an upgrade
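As a rough sketch of what the health check enforces, the threshold comparison can be reproduced in shell. This is an illustrative approximation, not the actual NCC plugin code; check_home_usage is a hypothetical helper name.

```shell
#!/bin/sh
# Illustrative sketch of the disk_usage_check threshold logic; not the
# actual NCC plugin. check_home_usage is a hypothetical helper that takes
# a usage percentage so the logic can be exercised without a CVM.
THRESHOLD=75

check_home_usage() {
    pct="$1"
    if [ "$pct" -ge "$THRESHOLD" ]; then
        echo "WARN: /home usage at ${pct}% (threshold ${THRESHOLD}%)"
    else
        echo "OK: /home usage at ${pct}%"
    fi
}

# On a real CVM you would feed it the live number, e.g.:
#   check_home_usage "$(df /home | awk 'NR==2 {gsub(/%/,"",$5); print $5}')"
check_home_usage 82
check_home_usage 40
```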

The following error messages will be generated in Prism by the test_nutanix_partition_space pre-upgrade check:

Not enough space on /home/nutanix directory on Controller VM [ip]. Available = x GB : Expected = x GB
Failed to calculate minimum space required
Failed to get disk usage for cvm [ip], most likely because of failure to ssh into cvm
Unexpected output from df on Controller VM [ip]. Please refer to preupgrade.out for further information

Nutanix reserves space on the SSD-tier of each CVM for its infrastructure. These files and directories are located in the /home folder that you see when you log in to a CVM. The size of the /home folder is capped at 40 GB so that the majority of the space on SSD is available for user data.

Due to the limited size of the /home partition, it is possible for it to run low on free space and trigger Prism Alerts, NCC Health Check failures or warnings, or Pre-Upgrade Check failures. These guardrails exist to prevent /home from becoming completely full, as this causes data processing services like Stargate to become unresponsive. A cluster in which multiple CVMs have a 100% full /home partition will often experience downtime for user VMs.

The Scavenger service running on each CVM is responsible for the automated clean-up of old logs in /home and improvements to its scope were made in AOS 5.5.9, 5.10.1, and later releases. For customers running earlier AOS releases, or in special circumstances, it may be necessary to manually clean up files out of certain directories in order to bring space usage in /home down to a level that will allow future AOS upgrades.

When cleaning up unused binaries and old logs on a CVM, it is important to note that all the user data partitions on each drive associated with a given node are also mounted within /home. This is why we strongly advise against using undocumented commands like “rm -rf /home”, since this will also wipe the user data directories mounted within this path. The purpose of this article is to guide you through identifying the files that are causing the CVM to run low on free space and removing only those which can be safely deleted.


WARNING: DO NOT treat the Nutanix CVM (Controller VM) as a normal Linux machine. DO NOT perform “rm -rf /home” on any of the CVMs. It could lead to data loss scenarios. Contact Nutanix Support in case you have any doubts.

Step 1: Parsing the space usage for “/home”.

Log in to a CVM, download the script to the /home/nutanix/tmp directory, make it executable, and run it. The script performs some checks (MD5, compatibility, etc.) and deploys itself accordingly.

nutanix@cvm:~$ cd ~/tmp
nutanix@cvm:~/tmp$ wget
nutanix@cvm:~/tmp$ mv
nutanix@cvm:~/tmp$ chmod +x
nutanix@cvm:~/tmp$ ./

You can select to deploy the script to the local CVM or all CVMs.

Select package to deploy
     1 : Deploy the tool only to the local CVM
     2 : Deploy the tool to all of the CVMs in the cluster
    Selection (Cancel="c"):

Run the script to get a clear distribution of partition space usage in /home.

nutanix@cvm:~/tmp$ ./

Step 2: Check for files that can be deleted from within the list of approved directories.

PLEASE READ: The following are the ONLY directories within which it is safe to remove files. Take note of the specific guidance for removing files from each directory. Do not use any other commands or scripts to remove files. Do not use “rm -rf” under any circumstances.

  1. Removing Old Logs and Core Files
     Before removing old logs, check whether you have any open cases with pending RCAs (Root Cause Analysis). The existing logs might be necessary for resolving those cases, so check with the case owner from Nutanix Support before cleaning up /home. Only delete the files inside these directories. Do not delete the directories themselves.
     • /home/nutanix/data/cores/
     • /home/nutanix/data/binary_logs/
     • /home/nutanix/data/ncc/installer/
     • /home/nutanix/data/log_collector/
     Use this syntax for deleting files within each of these directories:
     nutanix@cvm:~$ rm /home/nutanix/data/cores/*
  2. Removing Old ISOs and Software Binaries
     Begin by confirming the version of AOS currently installed on your cluster; it appears under the "Cluster Version" field in the output of the command below. Never remove any files associated with your current AOS version.
     nutanix@cvm:~$ ncli cluster info
     Example output:
     Cluster Name    : Axxxxa
     Cluster Version : 5.10.2
     Only delete the files inside these directories. Do not delete the directories themselves.
     • /home/nutanix/software_uncompressed/ – Old versions other than the version you are currently upgrading to. The software_uncompressed folder is only in use while a pre-upgrade check is running and should be removed after a successful upgrade. On a running cluster that is not currently upgrading, it is safe to remove everything underneath software_uncompressed.
     • /home/nutanix/foundation/isos/ – Old ISOs of hypervisors or Phoenix.
     • /home/nutanix/foundation/tmp/ – Temporary files that can be deleted.
     Use this syntax for deleting files within each of these directories:
     nutanix@cvm:~$ rm /home/nutanix/foundation/isos/*
     If you see large files in the software_downloads directory that are not needed for any planned upgrades, do not remove them from the command line. Instead, use the Upgrade Software dialog in Prism. Each downloaded AOS version can consume around 5 GB; click the 'X' next to a version to delete its files. Then check the File Server, Hypervisor, NCC, and Foundation tabs for further downloads you may not require. Also note the Enable Automatic Download checkbox on the AOS tab: left unmonitored, the cluster will download multiple versions, consuming more space in the home directory.
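Before deleting anything, it can help to see how much space each approved directory is actually consuming. The following read-only sketch (report_usage is a hypothetical helper name) prints per-directory usage; on a CVM, run it as the nutanix user.

```shell
#!/bin/sh
# Read-only sketch: report usage of the approved clean-up directories
# before removing files. report_usage is a hypothetical helper name.
report_usage() {
    for d in /home/nutanix/data/cores \
             /home/nutanix/data/binary_logs \
             /home/nutanix/data/ncc/installer \
             /home/nutanix/data/log_collector \
             /home/nutanix/software_uncompressed \
             /home/nutanix/foundation/isos \
             /home/nutanix/foundation/tmp; do
        if [ -d "$d" ]; then
            du -sh "$d" 2>/dev/null    # size of the directory tree
        else
            echo "missing  $d"         # directory not present on this node
        fi
    done
}

report_usage
```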

Step 3: Check space usage in /home to see that it is now below 70%.

You can use the “df -h” command to check on the amount of free space in /home. To accommodate a potential AOS upgrade, usage should ideally be below 70%.

nutanix@cvm:~$ allssh "df -h /home"

Example output:

================== x.x.x.x =================
/dev/md2         40G  8.4G   31G  22% /home
================== x.x.x.x =================
/dev/md2         40G  8.5G   31G  22% /home
================== x.x.x.x =================
/dev/md2         40G   19G   21G  49% /home

Cleaned up files from the approved directories but still see high usage in /home?

Contact Nutanix Support and submit the script log bundle (/tmp/home_kb1540_<cvm_name>_<timestamp>.tar.gz). One of our Systems Reliability Engineers (SREs) will promptly assist you with identifying the source of and solution to the problem at hand. Under no circumstances should you remove files from any other directories aside from those found here as these may be critical to the CVM infrastructure or may contain user data.

If the home partition is exceeding its limit on a PCVM, refer to KB-8950 for troubleshooting.

September 7, 2020 at 12:11 pm

How do I flush or delete incorrect records from my recursive server cache?

Sometimes a recursive server may have incorrect records in its cache.  These may be as a result of an error made by a zone administrator, or as a result of a deliberately engineered cache poisoning attack.

To identify the faulty records, dump and inspect the cache:

rndc dumpdb -all
grep problem.domain /var/named/data/cache_dump.db

(The location of cache_dump.db may vary based on the BIND configuration.)

Or you may be able to identify which records are incorrect by querying your server directly.

dig +norec @<ip address of nameserver> <name> <type>

How to solve the problem?

rndc flushname name
  • Use the name of a domain if there are problems with the NS or MX records associated with it.
  • Use the server name, if there are problems with the addresses associated with that server name (for example a nameserver, a webserver or a mailserver).

Flush the cache for a specific name as well as all records below that name

rndc flushtree name
  • This will clear the cache, but it will not clear any names out of the ADB (BIND's internal address database), so it may not be sufficient for some needs.

If you are not sure where the problem lies, or there are too many records to delete them individually, then you might prefer to flush the entire named cache:

rndc flush && rndc reload
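Putting the identification and flush steps together, a small wrapper can flush one name and immediately re-query to confirm. This is a sketch: flush_and_verify is a hypothetical helper, the server address is assumed to be 127.0.0.1, and DRY_RUN=1 (the default here) prints the commands instead of executing them, since rndc requires a configured server and key.

```shell
#!/bin/sh
# Sketch: flush a single name from the BIND cache, then re-query it.
# DRY_RUN=1 (default) prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

flush_and_verify() {
    name="$1"
    run rndc flushname "$name"               # drop the cached records
    run dig +norec @127.0.0.1 "$name" A      # confirm what is served now
}

flush_and_verify problem.example.com
```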

August 18, 2020 at 11:48 am

How I passed the CASP+

This year I decided to complete a few certifications in the field of security. Based on this goal, I started my certification sprint with the CompTIA Advanced Security Practitioner (CASP+) exam. I chose this exam because it is a performance-based certification for practitioners, not only for managers.

CASP+ is compliant with ISO 17024 standards and approved by the US DoD to meet directive 8140/8570.01-M requirements.
Regulators and government rely on ANSI accreditation, because it provides confidence and trust in the outputs of an accredited program.

I bought the book titled "CompTIA Advanced Security Practitioner (CASP) CAS-003 Cert Guide" from the Pearson Store, authored by Robin Abernathy and Troy McMillan. I spent around 4-5 months reading the book and making sure I understood the technologies and terms in it. My goal was two-fold: to pass the exam, and to ensure that the knowledge I gained would help me handle real-world situations in a professional manner.

The exam contains approximately 90 questions (multiple-choice and performance-based). The duration is 165 minutes. The exam does not give you a scaled score; it is pass/fail only.

I am providing the link below to the book I used. Also I am willing to share the PDF version of the book with anyone who wants to attempt this exam.

August 8, 2020 at 11:12 am

DC & Exchange loses connection during VEEAM Backup


Outlook users get disconnected periodically (at the same time every day).

When we analyzed the situation, we found that the issue coincided with the backup window. Further investigation revealed that it happened exactly at the VMware snapshot removal stage, which is expected, since the VM experiences a longer stun at that point (this can be confirmed by looking into vmware.log). The stun was causing the VM (a Domain Controller) to freeze, at which point Exchange triggered a Netlogon error with event ID 5719 because it lost the connection to the domain controller. Outlook users (desktop and smartphone) were forced to re-open the email client or re-enter their credentials.


To avoid this, we converted the backup job from a VM-based job to an agent-based one. The agent-based backup uses VSS instead of VMware API-triggered VM snapshots.

After this change, no further Netlogon event ID 5719 errors appeared and the users did not complain thereafter.

Good Luck

June 25, 2020 at 1:01 pm

How to troubleshoot DNS Issues with Wireshark

Hi Folks

Until recently I was a big fan of Microsoft Message Analyzer. Unfortunately, Microsoft deprecated the product, so I decided to switch to Wireshark. I will not go through the basic operations of Wireshark, as there are plenty of good video tutorials on the Internet.

In this article, I will focus on how to capture DNS packets on a BIND server and filter the packets for known queries and response codes.

Step 1: Start the capture on the BIND server.
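The original post does not show the capture command itself, so here is one plausible way to do it with tcpdump (the interface and output path are assumptions; adjust them for your server):

```shell
#!/bin/sh
# Sketch: capture DNS traffic on the BIND server into a pcap file that can
# be opened in Wireshark. Interface and filename are assumptions.
IFACE=${IFACE:-any}
OUTFILE=${OUTFILE:-/tmp/dns_capture.pcap}
CMD="tcpdump -i $IFACE -s 0 -w $OUTFILE port 53"
echo "capture command: $CMD"
# Uncomment to actually capture (stop with Ctrl+C):
# $CMD
```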

Step 2: After running sample queries, press Ctrl+C to end the capture and transfer the .pcap file to the machine running Wireshark.

Once you open the .pcap file in Wireshark, you can use the filters below to display the required data.

** To filter based on the queried domain name **
dns.qry.name == "<queried domain>"

** To filter MX queries **
dns.qry.type == 15

** To filter SERVFAIL response **
dns.flags.rcode == 2

You can use ! to negate a filter; for example, to exclude MX queries (dns.qry.type == 15):
!(dns.qry.type == 15)
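The same display filters can also be applied from the command line with tshark instead of the Wireshark GUI. This sketch just prints the commands you would run (the pcap filename is an assumption):

```shell
#!/bin/sh
# Sketch: print tshark equivalents of the Wireshark display filters above.
# The pcap path is an assumption.
PCAP=${PCAP:-/tmp/dns_capture.pcap}

print_filters() {
    for f in 'dns.qry.type == 15' 'dns.flags.rcode == 2' '!(dns.qry.type == 15)'; do
        echo "tshark -r $PCAP -Y \"$f\""
    done
}

print_filters
```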

For a detailed list of DNS response codes and other DNS parameters, refer to the URLs below.

Good Luck.

June 17, 2020 at 2:23 pm

Advanced Troubleshooting of ESXi Server 6.x for vSphere Gurus

Hi Folks

You can refer to the attached document for hints that will help you troubleshoot ESXi environments. This document mainly covers 3 areas.

  • Which log files to review and when.
  • ESXi commands to isolate and troubleshoot issues.
  • Configuration Files.


Source: vmworld.

June 8, 2020 at 9:51 am

sudo: effective uid is not 0, is sudo installed setuid root

When messing with ACLs you may come across a situation where sudo stops functioning. When you type sudo, you may notice the error "sudo: effective uid is not 0, is sudo installed setuid root?".

To diagnose the issue:

Step 1: Check the /etc/sudoers file to confirm that the group or user name has been added, e.g. for user abc:

abc        ALL=(ALL)       NOPASSWD: ALL

Step 2: If the output of step 1 is correct, check the permissions on sudo as below (output of a working sudo):

# ls -l /usr/bin/sudo
---s--x--x 2 root root 190904 Mar 4 18:21 /usr/bin/sudo

# stat /usr/bin/sudo

Access: (4111/---s--x--x) Uid: ( 0/ root) Gid: ( 0/ root)

If the output of step 2 does not match yours, you can reset the permissions to the package defaults:

# rpm --setperms sudo
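The permission check in step 2 can be scripted. This sketch (check_mode is a hypothetical helper) classifies a four-digit octal mode; 4111 matches the working output above on RHEL-family systems.

```shell
#!/bin/sh
# Sketch: classify sudo's octal mode. check_mode is a hypothetical helper;
# 4111 (---s--x--x) matches the working output shown above.
check_mode() {
    mode="$1"
    case "$mode" in
        4*) echo "setuid bit present (mode $mode)" ;;
        *)  echo "setuid bit MISSING (mode $mode); as root, run: rpm --setperms sudo" ;;
    esac
}

# On a real system: check_mode "$(stat -c '%a' /usr/bin/sudo)"
check_mode 4111
check_mode 755
```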



May 11, 2020 at 12:35 pm

VMware PowerCLI

In this post, I am going to cover the PowerCLI module for VMware. Whenever I come across a new cmdlet, I will update this post.

First things first, you need to install PowerCLI. Windows PowerShell can install the VMware PowerCLI module directly from the PowerShell Gallery:

PS> Install-Module -Name VMware.PowerCLI

Then import it before using PowerCLI:

PS> Import-Module VMware.PowerCLI

# To verify the version:
PS> Get-PowerCLIVersion

# To login to VCenter
PS> Connect-VIServer -Server "vcenterhostname"

# To suppress the certificate warning/error
PS> Set-PowerCLIConfiguration -InvalidCertificateAction Ignore -Confirm:$false

# To list the VMs with their creation date
PS> Get-VM | fl Name,CreateDate

March 23, 2020 at 1:25 pm

How to re-configure /configure IPMI using ipmitool in ESXi

This post covers the steps needed to assign or change the IP address for IPMI without logging in to the IPMI portal or restarting the server. The tool we are going to use is ipmitool, built into ESXi.

To get the current IPMI IP details:
[root@esxi]# /ipmitool lan print 1

[root@esxi]# /ipmitool lan set 1 ipsrc static

[root@esxi]# /ipmitool lan set 1 ipaddr x.x.x.x
Setting LAN IP Address to x.x.x.x

[root@esxi]# /ipmitool lan set 1 netmask x.x.x.x
Setting LAN Subnet Mask to x.x.x.x

[root@esxi]# /ipmitool lan set 1 defgw ipaddr x.x.x.x
Setting LAN Default Gateway IP to x.x.x.x

[root@esxi]# /ipmitool lan set 1 defgw macaddr xx:xx:xx:xx:xx:xx
Setting LAN Default Gateway MAC to xx:xx:xx:xx:xx:xx

[root@esxi]# /ipmitool lan set 1 arp respond on
Enabling BMC-generated ARP responses

[root@esxi]# /ipmitool lan set 1 snmp public
Setting LAN SNMP Community String to public
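The LAN settings above can be applied in one pass. In this sketch the addresses are placeholders (apply_lan_config is a hypothetical helper) and DRY_RUN=1 (the default) prints each ipmitool command instead of executing it:

```shell
#!/bin/sh
# Sketch: apply the static IPMI LAN settings in one pass. Addresses are
# placeholders; DRY_RUN=1 (default) prints the commands instead of
# executing them.
DRY_RUN=${DRY_RUN:-1}
CHAN=1
IP=192.0.2.10
NETMASK=255.255.255.0
GW=192.0.2.1

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

apply_lan_config() {
    run /ipmitool lan set "$CHAN" ipsrc static
    run /ipmitool lan set "$CHAN" ipaddr "$IP"
    run /ipmitool lan set "$CHAN" netmask "$NETMASK"
    run /ipmitool lan set "$CHAN" defgw ipaddr "$GW"
    run /ipmitool lan set "$CHAN" arp respond on
}

apply_lan_config
```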

Change the IPMI Password

[root@esxi]# /ipmitool user list    (note down the user ID; in my case it is 2)
[root@esxi]# /ipmitool user set password 2
[root@esxi]# /ipmitool lan set 1 access on

To recreate the SSL certificate (in case the IPMI page's self-signed certificate has expired):

./ipmitool raw 0x30 0x68 0x0



March 19, 2020 at 1:11 pm

How can I create a disk partition on a disk that is greater than 2TB in size on Red Hat Enterprise Linux?

When you try to partition a disk that is larger than 2 TB, you must use the parted utility instead of fdisk. In this example, the disk is /dev/sdj.

#parted /dev/sdj
Using /dev/sdj
Welcome to GNU Parted! Type ‘help’ to view a list of commands.

(parted) mklabel    ---> This will create a GPT label on the disk.
Warning: The existing disk label on /dev/sdj will be destroyed and all data on this disk will be lost. Do you want to continue?
Yes/No? Yes
New disk label type? [gpt]? gpt

(parted) print  

Model: Linux device-mapper (dm)
Disk /dev/sdj: 5662310.4MB    ---> Note down this value; we will use it in the commands below.
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number Start End Size File system Name Flags

Create the partition:
(parted) mkpart primary 0 5662310.4MB

(parted) print    ---> Use this command to verify the partition created.

Unlike fdisk, you don't need to issue a write command to save the changes. Simply type quit to exit the parted utility. Thereafter, you can proceed with file system creation.
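The same steps can also be run non-interactively with parted -s. In this sketch the disk path is a placeholder (partition_disk is a hypothetical helper) and DRY_RUN=1 (the default) only prints the commands, since they destroy the existing partition table:

```shell
#!/bin/sh
# Sketch: GPT-label and partition a >2 TB disk non-interactively.
# DISK is a placeholder; DRY_RUN=1 (default) only prints the commands,
# which would otherwise destroy the existing partition table.
DRY_RUN=${DRY_RUN:-1}
DISK=${DISK:-/dev/sdj}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

partition_disk() {
    run parted -s "$DISK" mklabel gpt
    run parted -s "$DISK" mkpart primary 0% 100%
    run parted -s "$DISK" print
}

partition_disk
```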

Root Cause

The fdisk command only supports the legacy MBR partition table format (also known as the msdos partition table):

  • MBR partition tables use data fields with a maximum of 32-bit sector numbers; with 512 bytes/sector, that means a maximum of 2^(32+9) bytes per disk or partition is supported.
  • MBR partition tables cannot address data on disks past 2.19 TB due to the above limitation.
  • Note that some older versions of fdisk may permit a larger partition to be created, but the resulting partition table will be invalid.

The parted command can create disk labels using MBR (msdos), GUID Partition Table (GPT), Sun disk labels and many more types:

  • The GPT disk label overcomes many of the limitations of the DOS MBR, including restrictions on the size of the disk, the size of any one partition, and the overall number of partitions.
  • Note that booting from a GPT-labelled volume requires firmware support, which is not commonly available on non-EFI platforms (including x86 and x86_64 architectures).


March 15, 2020 at 12:59 pm
