The NFS 16 groups limit issue

Last Friday I ran into a curious situation while setting up an NFS server. The NFS export was mounted on a UNIX server whose user accounts were assigned to many groups, and these users worked with files and directories stored on the NFS server.

As a brief description of the situation that prompted this post: the problem occurs when UNIX users are members of more than 16 UNIX groups. In this scenario, if you use NFS (whatever version) with UNIX system authentication (AUTH_SYS), which is still quite common in spite of the security recommendations, you will get a permission denied error when accessing certain, seemingly arbitrary, files and directories. The reason is that the list of secondary groups assigned to the user is truncated by the AUTH_SYS implementation. That is simply amazing!

Well, to be honest, this is not an unknown NFS problem. The limitation has been around since the early days of modern computing. After a quick search on the Internet, I found the reason why this happens: it is not an NFS limitation but a limit specified in AUTH_SYS:

   The client may wish to identify itself, for example, as it is
   identified on a UNIX(tm) system.  The flavor of the client credential
   is "AUTH_SYS".  The opaque data constituting the credential encodes
   the following structure:

         struct authsys_parms {
            unsigned int stamp;
            string machinename<255>;
            unsigned int uid;
            unsigned int gid;
            unsigned int gids<16>;
         };
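
To make the limit tangible, here is a minimal Python sketch of how the structure quoted above could be serialized in XDR. The function names, the machine name and the ids are hypothetical, and keeping "the first 16" gids is only an assumption: what a real client does with the extra groups depends on its implementation.

```python
import struct

MAX_GIDS = 16  # upper bound of gids<16> in the structure quoted above


def xdr_uint(value):
    """Encode an unsigned 32-bit integer in XDR (big-endian)."""
    return struct.pack(">I", value)


def xdr_string(text):
    """Encode a string: 4-byte length, the bytes, zero padding to a multiple of 4."""
    data = text.encode("ascii")
    padding = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b"\x00" * padding


def pack_authsys(stamp, machinename, uid, gid, gids):
    """Build an AUTH_SYS credential body, keeping at most 16 secondary gids."""
    sent = list(gids)[:MAX_GIDS]  # assumption: anything beyond 16 is simply dropped
    body = xdr_uint(stamp) + xdr_string(machinename)
    body += xdr_uint(uid) + xdr_uint(gid)
    body += xdr_uint(len(sent)) + b"".join(xdr_uint(g) for g in sent)
    return body, sent


if __name__ == "__main__":
    groups = list(range(2000, 2020))  # 20 hypothetical secondary groups
    cred, sent = pack_authsys(0, "client01", 1000, 1000, groups)
    print(len(sent), "of", len(groups), "gids fit in the credential")
```

Whatever the client does, the server only ever sees the gids that fit in the credential, so any file that depends on one of the dropped groups is denied.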

The root cause

AUTH_SYS is the historical authentication method used by client programs that contact an RPC server. It gives the server the information it needs to decide how the client may access it and which functions are allowed. Without authentication, any client on the network that can send packets to the RPC server could access any function.

AUTH_SYS has been in use for years in many systems simply because it was the first authentication method available, but it is not a secure authentication method nowadays. In AUTH_SYS, the RPC client sends the UNIX UID and GIDs of the user, and the server implicitly trusts that the user is who the user claims to be. All this information is sent over the network without any kind of encryption or verification, so it is highly vulnerable.

Consequently, AUTH_SYS is an insecure security mode, the proverbial open lock on a door. Most technical articles on the matter strongly suggest alternatives such as NFSv4 (or even NFSv3) with Kerberos, yet AUTH_SYS is still commonly used within companies, so we must still deal with it.

Note: This article does not focus on security issues. Its main purpose is to describe a specific situation and show the possible alternatives identified while troubleshooting the issue.

Taking up the thread …

I was analysing a situation where the main issue was caused by the truncation of the UNIX secondary groups list. Before continuing, a summary of the context: a UNIX user has a primary group, defined in the passwd database, but can also be a member of many other groups, defined in the group database. Historically, UNIX systems hardcoded a limit of 16 groups that a user can be a member of. This means that, over NFS, a user can only make use of 16 group memberships. Quite poor when you deal with dozens and dozens of groups.

As we already know, the problem lies in NFS complying with the AUTH_SYS specification, which defines a data structure where the groups a user belongs to are hardcoded as an array of at most 16 identifiers (gids). Even though Linux itself now supports 65536 groups per user, it is still not possible to use more than 16 of them through AUTH_SYS.
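
A quick way to see who is affected is to count group memberships per user. The following is a minimal sketch, not a definitive tool: the function names are my own, and it only looks at the member lists returned by the local group database, so it counts secondary groups and ignores the primary group stored in passwd.

```python
import grp

AUTH_SYS_MAX_GIDS = 16  # limit imposed by the AUTH_SYS credential


def secondary_groups(user, group_db):
    """Return the sorted names of the groups that list `user` as a member."""
    return sorted(name for name, members in group_db.items() if user in members)


def users_over_limit(group_db, limit=AUTH_SYS_MAX_GIDS):
    """Map each user with more than `limit` secondary groups to the group count."""
    users = {user for members in group_db.values() for user in members}
    counts = {user: len(secondary_groups(user, group_db)) for user in users}
    return {user: count for user, count in counts.items() if count > limit}


def load_group_db():
    """Read the group database through NSS: group name -> set of member names."""
    return {entry.gr_name: set(entry.gr_mem) for entry in grp.getgrall()}


if __name__ == "__main__":
    for user, count in sorted(users_over_limit(load_group_db()).items()):
        print(f"{user}: {count} secondary groups")
```

Any user printed by this script is a candidate for the arbitrary permission denied errors described above.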

My scenario …

At this point, I had identified the same situation in my own case. I had users assigned to more than 16 secondary groups and a service using NFS for data storage but, in addition, there was some extra furniture in the room:

  • Users of the service are actual UNIX accounts. Authorization for file access is delegated to the UNIX system itself
  • I hadn’t got a common LDAP server sharing the uids and gids
  • The NFS service wasn’t under my control

This last point made my case a little more miserable, as we will see later.

Getting information from the Internet …

First of all, a brief analysis of the situation is always welcome:

– What is the actual problem? This problem occurs when a user who is a member of more than 16 groups tries to access a file or directory on an NFS mount, and the access depends on his group rights. Isn’t it?
– Yes!
– So, whatever you do should start by asking Google. If the issue has been present for all those years, the solution should be present too.
– Good idea! – I said, concluding the dialog with myself.

After a couple of minutes I had a complete list of articles, mail archives, forums and blog posts offering all kinds of information about the problem. All of them covered most of the points introduced so far in this article. Each was more or less interesting, but one stood out from the others: the article solving-the-nfs-16-group-limit-problem.

The solving-the-nfs-16-group-limit-problem article describes a similar situation and offers its own conclusions. I must admit that I largely agree with these conclusions, and I recommend the post for an in-depth read.

The silver bullet

This solution is the best case, provided that you control the NFS server and it runs Linux kernel 2.6.21 or newer. These kernels support an NFS feature that allows the server to ignore the gids sent in the RPC operations and use instead the gids assigned to the uid on the server itself:

-g or --manage-gids
Accept requests from the kernel to map user id numbers into lists of group id numbers for use in access control. An NFS request will normally (except when using Kerberos or other cryptographic authentication) contains a user-id and a list of group-ids. Due to a limitation in the NFS protocol, at most 16 groups ids can be listed. If you use the -g flag, then the list of group ids received from the client will be replaced by a list of group ids determined by an appropriate lookup on the server. Note that the 'primary' group id is not affected so a newgroup command on the client will still be effective. This function requires a Linux Kernel with version at least 2.6.21.

The key to this solution is keeping the ids synchronized between the client and the server. A common way to meet this requirement is a shared backend for the Name Service Switch (NSS). The --manage-gids option then allows the NFS server to ignore the group list sent by the client and check the groups directly against the information stored in LDAP or whatever backend NSS uses. For this to work, the NFS server and the NFS client must share the same UIDs and GIDs.

That is the approach suggested in solving-the-nfs-16-group-limit-problem. Unfortunately, it did not fit my case :-(.
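
The effect of the option can be modelled in a few lines. This is only a toy model of the decision, not the code of rpc.mountd or the kernel: the function names and the ids are hypothetical, and the primary gid is left out for brevity.

```python
AUTH_SYS_MAX_GIDS = 16


def effective_gids(client_gids, uid, server_groups, manage_gids):
    """Return the set of gids the server would use for the access check."""
    if manage_gids:
        # The list sent by the client is discarded and replaced by a
        # lookup on the server (through NSS in a real deployment).
        return set(server_groups.get(uid, ()))
    # Without the option the server only sees what fits in the credential.
    return set(client_gids[:AUTH_SYS_MAX_GIDS])


def can_access(file_gid, gids):
    """Group-based access: the file's group must be among the caller's gids."""
    return file_gid in gids


if __name__ == "__main__":
    uid = 1000
    client_gids = list(range(2000, 2020))   # 20 groups known to the client
    server_groups = {uid: client_gids}      # ids synchronized on the server
    file_gid = 2019                         # the 20th group owns the file

    for option in (False, True):
        gids = effective_gids(client_gids, uid, server_groups, option)
        print("manage-gids" if option else "plain AUTH_SYS", can_access(file_gid, gids))
```

The model also shows why synchronization matters: if the server does not know the uid, or knows it with different groups, the lookup returns the wrong list and the access is denied anyway.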

But not in my case

In my case, I had no way to synchronize the ids of the client with the ids of the NFS server. The ids on the client were obtained from a Postgres database added to NSS as one of its backends, and there was no chance of using this backend on the NFS server.

The solution

But this was not the end. Fortunately, the nfs-ngroups patches work around the 16-group limit on the NFS client side. As the README file says:

This patch is useful when users are member of more than 16 groups on a Linux NFS client. The patch bypasses this protocol imposed limit in a compatible manner (i.e. no server patching).

That was perfect! It was exactly what I was looking for. So I had to build a custom kernel with the right patch applied on the server under my control and voilà!:

# unpack the kernel sources and apply the nfs-ngroups patch
tar -xf linux-3.10.101.tar.xz
cd linux-3.10.101/
patch < ../3.10-nfs-ngroups-4.60.patch
# configure the kernel and build it as an RPM package
make oldconfig
make menuconfig
make rpm
# install the new kernel, build its initramfs and refresh the boot menu
rpm -i /root/rpmbuild/RPMS/x86_64/kernel-3.10.101-4.x86_64.rpm
dracut "initramfs-3.10.101.img" 3.10.101
grub2-mkconfig > /boot/grub2/grub.cfg

Steps for CentOS, based on these three documents: [1] [2] [3]


As I said, this post does not focus on security matters. AUTH_SYS was designed for the times before the Internet. Nowadays, the total interconnection of computer networks discourages the use of methods like AUTH_SYS. It is far too naive an authentication method for the present.

Anyway, NFS services are still quite common and many of them are still deployed with AUTH_SYS rather than Kerberos or other intermediate solutions. This post is about a specific situation in one of these deployments. Even if these services should be progressively replaced by more secure solutions, a sysadmin still needs practical knowledge about the particularities of these legacy systems.

Knowledge of the NFS limit of 16 secondary groups and of the known workarounds is still interesting as know-how. This post shows two solutions, or three if you count Kerberos, to fix this issue … and just one of them met the requirements of my particular case.

My custom settings for Zabbix Agents

For some time now I have been working with ZABBIX systems (server, agent, proxy …). I am not going to explain all the fantastic features of this software in this post; I will only list my preferred ZABBIX Agent settings and my preferred value for each one.


    List of comma delimited IP addresses (or hostnames) of ZABBIX servers.
    No spaces allowed. First entry is used for sending active checks.
    Note that hostnames must resolve hostname->IP address and
    IP address->hostname.

  • ServerPort=10051

    Server port for sending active checks


    Unique hostname. Required for active checks.
    This hostname must correspond with the name set on ZABBIX Server for this

  • ListenPort=10050

    Listen port. Default is 10050.

  • ListenIP=

    IP address to bind agent
    If missing, bind to all available IPs

  • StartAgents=5

    Number of pre-forked instances of zabbix_agentd.
    Default value is 5.
    This parameter must be between 1 and 16

  • RefreshActiveChecks=90

    How often the list of active checks is refreshed. 2 minutes by default.
    This list of checks is sent by the ZABBIX server/proxy.
    See in
    to learn more about Zabbix Agent’s protocol.
    This value can’t be lower than 60 seconds.

  • DisableActive=0

    Disable active checks. The agent will work in passive mode, listening for the server.

  • EnableRemoteCommands=1

    Enable remote commands for ZABBIX agent. By default remote commands are disabled.

  • DebugLevel=3

    Specifies debug level:

    • 0 – debug is not created
    • 1 – critical information
    • 2 – error information
    • 3 – warnings
    • 4 – information (default)
    • 5 – for debugging (produces lots of information)
  • PidFile=/var/run/zabbix-agent/

    Name of PID file

  • LogFile=/var/log/zabbix-agent/zabbix_agentd.log

    Name of log file.
    If not set, syslog will be used

  • LogFileSize=5

    Maximum size of log file in MB. Set to 0 to disable automatic log rotation.

  • Timeout=15

    Spend no more than Timeout seconds on processing
    Must be between 1 and 30

  • UserParameter=http.basicaction,wget -t 2 -T 10 -q -O - "http://server:80/StatusInfo.php" | grep -e "App:"

    Format: UserParameter=key,shell_command
    Note that shell command must not return empty string or EOL only
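
Several of the settings above come with a documented range, so it is easy to check a file before restarting the agent. Below is a minimal sketch, assuming the simple key=value format shown in this post; the function names are hypothetical and the limits are just the ones quoted above (StartAgents 1 to 16, Timeout 1 to 30, RefreshActiveChecks at least 60).

```python
LIMITS = {
    "StartAgents": (1, 16),
    "Timeout": (1, 30),
    "RefreshActiveChecks": (60, None),  # no upper bound is stated above
}


def parse_agent_conf(text):
    """Parse key=value lines, skipping blank lines and # comments.

    Repeated keys (UserParameter, for instance) keep only the last value,
    which is enough for the range checks below.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_limits(settings, limits=LIMITS):
    """Return a list of messages for values outside the quoted ranges."""
    problems = []
    for key, (low, high) in limits.items():
        if key not in settings:
            continue
        try:
            value = int(settings[key])
        except ValueError:
            problems.append(f"{key}: not an integer")
            continue
        if value < low or (high is not None and value > high):
            problems.append(f"{key}={value} is out of range")
    return problems


if __name__ == "__main__":
    sample = "StartAgents=5\nRefreshActiveChecks=90\nTimeout=15\n"
    print(check_limits(parse_agent_conf(sample)) or "all values within range")
```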


Flushing ARP table entries for one specific IP

In some cases, network elements use caching strategies to improve network throughput. In such environments we frequently use subsystems like LVS to balance one service IP between a couple of hosts. This can cause problems in an HA setup, because some network switches do not release the old ARP entry of the service IP (due to the caching effect). To avoid this, we can use arping to force the release of the old ARP entry.

  arping -v -c 1 -i eth0 -S -t ff:ff:ff:ff:ff:ff
  arping: invalid option -- '-'
  Usage: arping [-fqbDUAV] [-c count] [-w timeout] [-I device] [-s source] destination
   -f : quit on first reply
   -q : be quiet
   -b : keep broadcasting, don't go unicast
   -D : duplicate address detection mode
   -U : Unsolicited ARP mode, update your neighbours
   -A : ARP answer mode, update your neighbours
   -V : print version and exit
   -c count : how many packets to send
   -w timeout : how long to wait for a reply
   -I device : which ethernet device to use (eth0)
   -s source : source ip address
   destination : ask for what ip address

For example, we can use this tip in the network/interfaces configuration file:

auto eth0
iface eth0 inet static
  post-up arping -v -c 1 -i eth0 -S -t ff:ff:ff:ff:ff:ff
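
To see what such an announcement looks like on the wire, here is a minimal Python sketch that builds a broadcast ARP frame whose sender and target IP are both the service IP (usually called a gratuitous ARP). It is my own illustration of the idea, not what arping does internally; the MAC and IP values are placeholders, and the frame is only built, not sent, since sending requires a raw socket and root privileges.

```python
import socket
import struct

BROADCAST = b"\xff" * 6
ETHERTYPE_ARP = 0x0806


def mac_to_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def gratuitous_arp(mac, ip):
    """Build a 42-byte Ethernet frame announcing that `ip` lives at `mac`."""
    hw = mac_to_bytes(mac)
    addr = socket.inet_aton(ip)
    ethernet = BROADCAST + hw + struct.pack(">H", ETHERTYPE_ARP)
    # hardware type 1 (Ethernet), protocol 0x0800 (IPv4),
    # address lengths 6 and 4, operation 1 (request)
    arp = struct.pack(">HHBBH", 1, 0x0800, 6, 4, 1)
    arp += hw + addr          # sender MAC and sender IP
    arp += BROADCAST + addr   # target MAC (broadcast) and target IP = sender IP
    return ethernet + arp


if __name__ == "__main__":
    frame = gratuitous_arp("02:00:00:00:00:01", "192.0.2.10")  # placeholder values
    print(len(frame), "bytes:", frame.hex())
```

Switches and hosts that receive this frame update their entry for the service IP with the new MAC address, which is the effect we want after a failover.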

The best documentation is on the blackboard, … always




This diagram presides over my office. It has been drawn on a small blackboard so that it is easily visible to all members of the team. Many times, looking it up is more useful than a big mental effort.

The diagram shows the different tables that IPTables is composed of and the logical flow of IP packets through those tables, and it tries to show the relation between IPTables and the routing tables of the system.
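
For readers without a blackboard at hand, the same idea can be written down as data. This is a simplified sketch of the commonly documented traversal order, not a transcription of my drawing: it leaves out the security table and the nat INPUT chain of newer kernels, and the names PATHS and chains_of are hypothetical.

```python
ROUTING = ("routing", "decision")  # marks where the routing tables are consulted

PATHS = {
    # packet addressed to a local process
    "incoming": [
        ("raw", "PREROUTING"), ("mangle", "PREROUTING"), ("nat", "PREROUTING"),
        ROUTING,
        ("mangle", "INPUT"), ("filter", "INPUT"),
    ],
    # packet routed through this host
    "forwarded": [
        ("raw", "PREROUTING"), ("mangle", "PREROUTING"), ("nat", "PREROUTING"),
        ROUTING,
        ("mangle", "FORWARD"), ("filter", "FORWARD"),
        ("mangle", "POSTROUTING"), ("nat", "POSTROUTING"),
    ],
    # packet generated by a local process
    "outgoing": [
        ROUTING,
        ("raw", "OUTPUT"), ("mangle", "OUTPUT"), ("nat", "OUTPUT"),
        ("filter", "OUTPUT"),
        ("mangle", "POSTROUTING"), ("nat", "POSTROUTING"),
    ],
}


def chains_of(table, path):
    """List, in order, the chains of `table` visited by a packet on `path`."""
    return [chain for tbl, chain in PATHS[path] if tbl == table]


if __name__ == "__main__":
    for path, steps in PATHS.items():
        print(path + ":", " -> ".join(f"{tbl}/{chain}" for tbl, chain in steps))
```

For instance, chains_of("filter", "forwarded") returns only FORWARD, which is why rules in INPUT never see routed traffic.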