Running Debian in a Fuloong 2.0 Mini-PC (MIPS64el CPU Loongson 3A4000)

The history of the world is a continuous succession of contradictions. The announcement from MIPS Technologies about their decision to definitively abandon the MIPS architecture in favour of RISC-V is just another example. But the truth is that things are far from trivial in this area. Even when the end-of-life date for the MIPS architecture looks closer than ever, there are still infrastructures and platforms that need to keep being supported and maintained for this architecture in the meantime. To make the situation more complex, at the very moment I am writing this post Loongson Technology Ltd is announcing a new 16-core MIPS 12nm CPU for 256-core systems (Tom's Hardware news). Loongson Technology also says that they keep a strong commitment to RISC-V for the future, but that they will keep betting on MIPS64 in the meantime. So if MIPS is going to die, it is going to be a lovely death.

In this context, here at Igalia we are hosting and maintaining the CI workers of the JavaScriptCore 32-bit (MIPS) infrastructure for the WebKit web browser engine.

Build worker for JavaScriptCore 32-bit (MIPS) hosted at Igalia

No one ever said that finding end-user hardware for this kind of system is easy. The options on the market often do not reach a sufficient level of maturity, or they come with a poor set of hardware specifications. They are often not intended for long, CPU-intensive tasks, or they simply lack good OS support (maintenance, updates, custom kernels, open-source drivers …).

Nowadays we are using a parallelized cluster of MIPSel CI20 boards to run the JavaScriptCore 32-bit (MIPS) CI workers. Don't get me wrong: the CI20 boards are certainly not bad. These boards are really great for development and evaluation purposes, but even rare failures become commonplace when you run 30 of them in parallel 24/7. For this reason, some time ago we started looking for an alternative that could eventually replace them. And this was when we found the following candidate.

The candidate

We had a look at what Debian was using for their QA infrastructure and talked to the MIPS team – credits to Berto García who helped us with this – and we concluded that the Loongson 3B4000 MIPSel board was a promising option so we decided to explore it.

We started looking for information about this CPU model and we found references to the Loongson 3A4000 + Fuloong 2.0 Mini-PC. This computer is a very interesting end-user product based on the MIPS64el architecture. In particular, it uses a similar but more recent and powerful evolution of the Loongson 3B4000 processor. The Fuloong 2.0 comes in a barebone format with a Loongson-3A R4 (Loongson-3A4000) @ 1500MHz quad-core processor, 8GB of DDR4 RAM and 1TB of internal NVMe storage. These technical specifications are completed with a Realtek ALC662 sound card, 2x USB 3.0 ports + 1x USB Type-C + 4x USB 2.0, 2x HDMI video outputs, 2x Ethernet (WGI211AT), audio connectors, an M.2 slot for a WiFi module and, finally, a Vivante GC1000 GPU (OpenGL ES 2.0/1.1). These specifications are clearly far from the usual constraints of regular MIPS development boards and make it technically a serious candidate to replace the current boards used in the CI cluster.

However, the acquisition of this kind of product has some non-technical cons that are important to keep in mind before taking any decision. For example, it is very difficult to find a reseller in Europe providing this kind of machine. This means that the computer needs to be shipped directly from China, which also means that the acquisition process can suffer from the common problems of this kind of order: long delivery times (~1 month), customs paperwork, taxes, delivery tracking issues … Anyway, this post is intended to keep the focus on the technical details ;-). The fact is that, once these issues are solved, you will receive a machine similar to the one shown in the photos:

The unboxing

The machine comes with a pre-installed custom distro ("Dragon Dream F28", based on Fedora 28). This distro is quite old but it is the one provided by the manufacturer (Lemote) and, apparently, the only one that, in theory, fully supports the machine. The installed image comes with a desktop environment on top of an X server, and the distro is synced with an RPM repository hosted by Lemote. This is really convenient to start experimenting with the computer and very useful to get information about the system before taking any action on it. Here is the output of some commands:

# cat /proc/cpuinfo
system type : generic-loongson-machine
machine : loongson,generic
processor : 0
cpu model : Loongson-3 V0.4 FPU V0.1
model name : Loongson-3A R4 (Loongson-3A4000) @ 1500MHz
CPU MHz : 1500.00
BogoMIPS : 2990.15
wait instruction : yes
microsecond timers : yes
tlb_entries : 2112
extra interrupt vector : no
hardware watchpoint : no
isa : mips1 mips2 mips3 mips4 mips5 mips32r1 mips32r2 mips64r1 mips64r2
ASEs implemented : vz msa loongson-mmi loongson-cam loongson-ext loongson-ext2
shadow register sets : 1
kscratch registers : 6
package : 0
core : 0
... (x4)

dmesg:

Mar 9 12:43:19 fuloong-01 kernel: [ 2.884260] Console: switching to colour frame buffer device 240x67 
Mar 9 12:43:19 fuloong-01 kernel: [ 2.915928] loongson-drm 0000:00:06.1: fb0: loongson-drmdrm frame buffer device 
Mar 9 12:43:19 fuloong-01 kernel: [ 2.919792] etnaviv 0000:00:06.0: Device 14:7a15, irq 93 
Mar 9 12:43:19 fuloong-01 kernel: [ 2.920249] etnaviv 0000:00:06.0: model: GC1000, revision: 5037 
Mar 9 12:43:19 fuloong-01 kernel: [ 2.920378] [drm] Initialized etnaviv 1.3.0 20151214 for 0000:00:06.0 on minor 1

lsblk:

# lsblk
nvme0n1 259:0 0 477G 0 disk
├─nvme0n1p1 259:1 0 190M 0 part /boot/efi
├─nvme0n1p2 259:2 0 1,7G 0 part /boot
├─nvme0n1p3 259:3 0 7,5G 0 part [SWAP]
├─nvme0n1p4 259:4 0 46,6G 0 part /
└─nvme0n1p5 259:5 0 421,1G 0 part /home

Getting Debian into the Fuloong 2.0

The WebKitGTK and WPE WebKit CI infrastructure is entirely based on Debian Stable and/or Ubuntu LTS, in accordance with the WebKitGTK maintenance and development policy. For that reason we were quite interested in getting the machine running with Debian Stable ("buster" as of this writing). So what comes next is a description of the installation process of a pure Debian base system hybridized with the Lemote Fedora Linux kernel, using an external USB storage stick as the bootable disk. The process is a mix between the following two documents:

Those documents provide a good, detailed explanation of the steps to follow to perform the installation. Only the installation of the kernel and of grub2-efi differs a bit, but let's come back to that later. The idea is:

  • Set the EFI/BIOS to boot from the USB storage (EFI)
  • Install the base Debian OS on an external microSD card connected to the USB3-SS port
  • Keep using the internal NVMe disk for the working directories (/home, /var/lib/lxc)

The installation process is started from the pre-installed Fedora image. The first action is to mount the external USB storage (sda) in the running system as follows:

# lsblk
sda 8:0 1 14,9G 0 disk
├─sda1 8:1 1 200M 0 part /mnt/debinst/boot/efi
└─sda2 8:2 1 10G 0 part /mnt/debinst
nvme0n1 259:0 0 477G 0 disk
├─nvme0n1p1 259:1 0 190M 0 part /boot/efi
├─nvme0n1p2 259:2 0 1,7G 0 part /boot
├─nvme0n1p3 259:3 0 7,5G 0 part [SWAP]
├─nvme0n1p4 259:4 0 46,6G 0 part /
└─nvme0n1p5 259:5 0 421,1G 0 part /home

As I said, the steps to install the Debian system onto the SD card are quite straightforward. The problems begin during the installation of GRUB and the Linux kernel …
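For reference, a minimal sketch of that base installation from the running Fedora system could look like the following (device names follow the listings above; the availability of debootstrap in the Fedora image and the mirror URL are assumptions):

# Mount the target partitions of the external USB storage
mount /dev/sda2 /mnt/debinst
mkdir -p /mnt/debinst/boot/efi
mount /dev/sda1 /mnt/debinst/boot/efi

# Bootstrap the Debian buster base system for mips64el
debootstrap --arch=mips64el buster /mnt/debinst http://httpredir.debian.org/debian

# Enter the new system to finish the basic configuration (fstab, hostname, users ...)
mount --bind /dev /mnt/debinst/dev
mount -t proc proc /mnt/debinst/proc
mount -t sysfs sysfs /mnt/debinst/sys
chroot /mnt/debinst /bin/bash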

The Linux Kernel

Having followed the guide, we will reach the "Install a Kernel" step. Debian provides a Linux 4.19 kernel for the Loongson 3A/3B boards:

ii linux-image-4.19.0-14-loongson-3 4.19.171-2 mips64el Linux 4.19 for Loongson 3A/3B
ii linux-image-loongson-3 4.19+105+deb10u9 mips64el Linux for Loongson 3A/3B (meta-package)
ii linux-libc-dev:mips64el 4.19.171-2 mips64el Linux support headers for userspace development

It is quite old in comparison with the one the Lemote Fedora distro ships (5.4.63-20201012-def), so I preferred to keep the Lemote one, although it should be possible to get the machine running with the Debian kernel as well.

Grub2 EFI, first attempt trying to build it for the device

This is the main issue that I found. The first thing that I tried was to look for a GRUB package with EFI support in the mips64el Debian chroot:

root@fuloong-01:/# apt search grub | grep efi
<<empty>>

The frustration came quickly when I didn't find any GRUB candidate. It was then that I remembered that there was a grub-yeeloong package in the Debian repository that could be useful in this case. The Yeeloong is a predecessor of this Loongson machine, so what I tried next was to rebuild the GRUB package adding the mips64el architecture to the grub-yeeloong package. Something like the following:

  • Getting the Debian sources and dependencies for the grub2 packages:
    apt source grub2
    apt install debhelper patchutils python flex bison po-debconf help2man texinfo xfonts-unifont libfreetype6-dev gettext libdevmapper-dev libsdl1.2-dev xorriso parted libfuse-dev ttf-dejavu-core liblzma-dev wamerican pkg-config bash-completion build-essential
    
  • Patching the debian/control file using this patch
  • … and then building the Debian package:
    ~/debs# cd grub2-2.02+dfsg1 && dpkg-buildpackage
    
    ~/debs/grub2-2.02+dfsg1# ls ../
    grub-common-dbgsym_2.02+dfsg1-20+deb10u3_mips64el.deb
    grub-common_2.02+dfsg1-20+deb10u3_mips64el.deb
    grub-mount-udeb_2.02+dfsg1-20+deb10u3_mips64el.udeb
    grub-yeeloong-bin_2.02+dfsg1-20+deb10u3_mips64el.deb
    grub-yeeloong_2.02+dfsg1-20+deb10u3_mips64el.deb
    grub2-2.02+dfsg1
    grub2-common-dbgsym_2.02+dfsg1-20+deb10u3_mips64el.deb
    grub2-common_2.02+dfsg1-20+deb10u3_mips64el.deb
    grub2_2.02+dfsg1-20+deb10u3.debian.tar.xz
    grub2_2.02+dfsg1-20+deb10u3.dsc
    grub2_2.02+dfsg1-20+deb10u3_mips64el.buildinfo
    grub2_2.02+dfsg1-20+deb10u3_mips64el.changes
    grub2_2.02+dfsg1.orig.tar.xz
    

The .deb package builds correctly but the problem is the binary: it lacks EFI runtime support, so it is not useful in our case:

*******************************************************
GRUB2 will be compiled with following components:
Platform: mipsel-none <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
With devmapper support: Yes
With memory debugging: No
With disk cache statistics: No
With boot time statistics: No
efiemu runtime: No (only available on i386)
grub-mkfont: Yes
grub-mount: Yes
starfield theme: Yes
With DejaVuSans font from /usr/share/fonts/truetype/ttf-dejavu/DejaVuSans.ttf
With libzfs support: No (need zfs library)
Build-time grub-mkfont: Yes
With unifont from /usr/share/fonts/X11/misc/unifont.pcf.gz
With liblzma from -llzma (support for XZ-compressed mips images)
With quiet boot: No
*******************************************************

This is what happens if you still try to install it:

root@fuloong-01:~/debs/grub2-2.02+dfsg1# dpkg -i ../grub-yeeloong-bin_2.02+dfsg1-20+deb10u3_mips64el.deb ../grub-common_2.02+dfsg1-20+deb10u3_mips64el.deb ../grub2-common_2.02+dfsg1-20+deb10u3_mips64el.deb
root@fuloong-01:~/debs/grub2-2.02+dfsg1# grub-install /dev/sda
Installing for mipsel-loongson platform.
...
grub-install: warning: WARNING: no platform-specific install was performed. <<<<<<<<<<
Installation finished. No error reported.

There is no glue between EFI and GRUB. Files like BOOTMIPS.EFI, gcdmips64el.efi and grub.efi are missing, so this package is not useful at all:

root@fuloong-01:~/debs/grub2-2.02+dfsg1# ls /boot/
System.map-4.19.0-14-loongson-3 config-4.19.0-14-loongson-3 efi grub grub.elf initrd.img-4.19.0-14-loongson-3 vmlinux-4.19.0-14-loongson-3
root@fuloong-01:~/debs/grub2-2.02+dfsg1# ls /boot/grub
fonts grubenv locale mipsel-loongson
root@fuloong-01:~/debs/grub2-2.02+dfsg1# ls /boot/efi/
<<empty>>

The grub-install command will also confirm that the mips64el-efi target is not supported:

root@fuloong-01:~/debs/grub2-2.02+dfsg1# /usr/sbin/grub-install --help
Usage: grub-install [OPTION...] [OPTION] [INSTALL_DEVICE]
Install GRUB on your drive.
...
--target=TARGET install GRUB for TARGET platform
[default=mipsel-loongson]; available targets:
arm-efi, arm-uboot, arm64-efi, i386-coreboot,
i386-efi, i386-ieee1275, i386-multiboot, i386-pc,
i386-qemu, i386-xen, i386-xen_pvh, ia64-efi,
mips-arc, mips-qemu_mips, mipsel-arc,
mipsel-loongson, mipsel-qemu_mips,
powerpc-ieee1275, sparc64-ieee1275, x86_64-efi,
x86_64-xen

Second attempt, the loongson-community Grub2 EFI

Now that we know that we cannot use an official Debian package to install and configure GRUB, it is time for a bit of google-fu.

I must have a lot of practice, since it only took me a short while to find out that the Lemote Fedora distro provides its own GRUB package for the Loongson and, later, to find new hope reading this article. The article explains how to build the loongson-community GRUB with EFI support, so the next step was the obvious one: try to build it and check it:

    • git clone https://github.com/loongson-community/grub.git
      cd grub
      bash autogen.sh
      ./configure --prefix=/opt/alternative/
      make ; make install
    • The configure output looks promising:
      *******************************************************
      GRUB2 will be compiled with following components:
      Platform: mips64el-efi <<<<<<<<<<<<<<<<< Looks good.
      With devmapper support: Yes
      With memory debugging: No
      With disk cache statistics: No
      With boot time statistics: No
      efiemu runtime: No (not available on efi)
      grub-mkfont: No (need freetype2 library)
      grub-mount: Yes
      starfield theme: No (No build-time grub-mkfont)
      With libzfs support: No (need zfs library)
      Build-time grub-mkfont: No (need freetype2 library)
      Without unifont (no build-time grub-mkfont)
      With liblzma from -llzma (support for XZ-compressed mips images)
      *******************************************************

    … but unfortunately I got more and more build errors at every step. Errors like these:

cc1: error: position-independent code requires ‘-mabicalls’
grub_script.yy.c:19:22: error: statement with no effect [-Werror=unused-value]
build-grub-module-verifier: error: unsupported relocation 0x51807.

… so after several attempts I finally gave up trying to build the loongson-community GRUB with EFI support. Here is the patch with some of the modifications that I tried in the code, just in case you are better than me at solving these build errors.

Third attempt, reusing the GRUB2 EFI resources from the pre-installed system

… and the last one.

My winning horse was the simplest solution: reusing the /boot and /boot/efi directories installed by the Fedora system as the base for the new Debian system:

    • Clone the tree into the destination dir:
      cp -a /boot /mnt/debinst/boot
    • Replace the UUIDs (patch); see the sketch after the tree below

    The /boot dir in the target installation will look like this:

    [root@fuloong-01 boot]# tree /mnt/debinst/boot/
    /mnt/debinst/boot/
    ├── boot -> .
    ├── config-5.4.60-1.fc28.lemote.mips64el
    ├── e8a27b4e4fcc4db9ab7a64bd81393773
    │   └── 5.4.60-1.fc28.lemote.mips64el
    │       ├── initrd
    │       └── linux
    ├── efi
    │   ├── boot
    │   │   ├── grub.cfg
    │   │   └── grub.efi
    │   ├── EFI
    │   │   ├── BOOT
    │   │   │   ├── BOOTMIPS.EFI
    │   │   │   ├── fonts
    │   │   │   │   └── unicode.pf2
    │   │   │   ├── gcdmips64el.efi
    │   │   │   ├── grub.cfg
    │   │   │   └── grubenv
    │   │   └── fedora
    │   ├── mach_kernel
    │   └── System
    │       └── Library
    │           └── CoreServices
    │               └── SystemVersion.plist
    ├── extlinux
    ├── grub2
    │   ├── grubenv -> ../efi/EFI/BOOT/grubenv
    │   └── themes
    │       └── system
    │           ├── background.png
    │           └── fireworks.png
    ├── grub.cfg
    ├── grub.efi
    ├── initramfs-5.4.60-1.fc28.lemote.mips64el.img
    ├── loader
    │   └── entries
    │       └── e8a27b4e4fcc4db9ab7a64bd81393773-5.4.60-1.fc28.lemote.mips64el.conf
    ├── lost+found
    ├── System.map-5.4.60-1.fc28.lemote.mips64el
    ├── vmlinuz-205
    └── vmlinuz-5.4.60-1.fc28.lemote.mips64el
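To make the copied GRUB configuration point to the new Debian root, the UUIDs referenced in the copied grub.cfg files (and in the new /etc/fstab) need to be replaced with the ones of the Debian partitions. A minimal sketch of the idea (OLD_UUID/NEW_UUID are placeholders, and the exact files to touch depend on the grub.cfg layout):

# Get the UUIDs of the new Debian partitions on the external disk
blkid /dev/sda1 /dev/sda2

# Replace the old Fedora UUIDs with the new Debian ones in the copied GRUB config
sed -i 's/OLD_UUID/NEW_UUID/g' /mnt/debinst/boot/efi/boot/grub.cfg
sed -i 's/OLD_UUID/NEW_UUID/g' /mnt/debinst/boot/efi/EFI/BOOT/grub.cfg

# ... and make sure /mnt/debinst/etc/fstab refers to the new UUIDs as well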

… et voilà!

Finally we have a pure Debian Buster root base system hybridized with the Lemote Fedora Linux kernel:

root@fuloong-01:~# cat /etc/debian_version
10.8
root@fuloong-01:~# uname -a
Linux fuloong-01 5.4.60-1.fc28.lemote.mips64el #1 SMP PREEMPT Mon Aug 24 09:33:35 CST 2020 mips64 GNU/Linux
root@fuloong-01:~# cat /etc/apt/sources.list
deb http://httpredir.debian.org/debian buster main contrib non-free 
deb-src http://httpredir.debian.org/debian buster main contrib non-free 
deb http://security.debian.org/ buster/updates main contrib non-free 
deb http://httpredir.debian.org/debian/ buster-updates main contrib non-free
root@fuloong-01:~# apt update
Hit:1 http://httpredir.debian.org/debian buster InRelease
Get:2 http://security.debian.org buster/updates InRelease [65,4 kB]
Get:3 http://httpredir.debian.org/debian buster-updates InRelease [51,9 kB]
Get:4 http://security.debian.org buster/updates/main mips64el Packages [242 kB]
Get:5 http://security.debian.org buster/updates/main Translation-en [142 kB]
Fetched 501 kB in 1s (417 kB/s)                                
Reading package lists... Done
Building dependency tree       
Reading state information... Done
3 packages can be upgraded. Run 'apt list --upgradable' to see them.

With this hardware we can reasonably run native GDB directly on the machine, and we have the possibility to run other tools on the host (e.g. any monitoring agent to gather stats). Definitely, having this hardware enabled for the CI infrastructure will be a promising step towards better QA for the project.
That is all from my side. I will probably keep dedicating some time to getting buildable GRUB-EFI and Linux kernel packages that we could use for this and similar machines (e.g. for tools like perf, which needs the userspace binaries to be in sync with the kernel version). In the meantime, I really hope this can be useful to someone out there who is interested in this hardware. If you have a comment or question, or you simply wish to share your thoughts, just leave a comment.

Stay safe!

The NFS 16 groups limit issue

Last Friday I was involved in a curious situation while trying to set up an NFS server. The NFS share was mounted on a UNIX server that used UNIX user accounts assigned to many groups, and these users worked with files and directories stored on the NFS server.

As a brief description of the situation that prompted this post: the problem occurs when you are using UNIX users that are assigned to more than 16 UNIX groups. In this scenario, if you are using NFS (whatever the version) with UNIX system authentication (AUTH_SYS), which is still quite common nowadays in spite of the security recommendations, you will get a permission denied error when accessing certain arbitrary files and directories. The reason is that the list of secondary groups assigned to the user is truncated by the AUTH_SYS implementation. That is simply amazing!

Well, to be honest, this is not an unknown NFS problem. This limitation has been around since the early stages of modern computing. After a quick search on the Internet, I found the reason why this happens: it is not an NFS limitation but a limit specified in AUTH_SYS:

   The client may wish to identify itself, for example, as it is
   identified on a UNIX(tm) system.  The flavor of the client credential
   is "AUTH_SYS".  The opaque data constituting the credential encodes
   the following structure:

         struct authsys_parms {
            unsigned int stamp;
            string machinename<255>;
            unsigned int uid;
            unsigned int gid;
            unsigned int gids<16>;
         };

The root cause

AUTH_SYS is the historical method used by client programs that contact an RPC server. It allows the server to get information about how the client should be able to access the service and which functions should be allowed. Without authentication, any client on the network that can send packets to the RPC server could access any function.

AUTH_SYS has been in use for years in many systems just because it was the first authentication method available, but it is not a secure authentication method nowadays. With AUTH_SYS, the RPC client sends the UNIX UID and GIDs of the user, and the server implicitly trusts that the user is who the user claims to be. All this information is sent over the network without any kind of encryption or authentication, so it is highly vulnerable.

As a consequence, AUTH_SYS is an insecure security mode; it can end up being the proverbial open lock on a door. Overall, the technical articles about these matters strongly suggest using alternatives such as NFSv4 (or even NFSv3) with Kerberos, but AUTH_SYS is still commonly used within companies, so we must still deal with it.

Note: This article does not focus on the security issues. Its main purpose is to describe a specific situation and show the possible alternatives identified during the troubleshooting of the issue.

Taking up the thread …

I was profiling a situation where the main issue was caused by the truncation of the UNIX secondary groups list. Before continuing, a bit of context: a UNIX user has a primary group, defined in the passwd database, but can also be a member of many other groups, defined in the group database. Historically, UNIX systems hardcoded a limit of 16 groups that a user could be a member of (source). This means that, over AUTH_SYS, clients can only convey membership of 16 groups. Quite poor when you deal with dozens and dozens of groups.

As we already know, the problem lies in NFS's compliance with the AUTH_SYS specification, which defines a data structure where the groups a user has access to are hardcoded as an array of 16 identifiers (gids). Even though Linux now supports 65536 groups, it is still not possible to convey more than 16 of them over AUTH_SYS.
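A quick way to check whether a given account is affected is simply to count its groups (someuser is just a placeholder):

# More than 16 entries here means the AUTH_SYS group list will be truncated
id -G someuser | wc -w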

My scenario …

At this point, I had identified this same situation in my case: I had users assigned to more than 16 secondary groups and I had a service using NFS for its data storage but, in addition, I had some extra furniture in the room:

  • The users of the service are actual UNIX accounts, and the authorization for file access is delegated to the UNIX system itself
  • I didn't have a common LDAP server sharing the uids and gids
  • The NFS service wasn’t under my control

This last point made my case a little more miserable, as we will see later.

Getting information from the Internet …

First of all, a brief analysis of the situation is always welcome:

– What is the actual problem? This problem occurs when a user who is a member of more than 16 groups tries to access a file or directory on an NFS mount whose authorization depends on their group rights. Isn't it?
– Yes!
– So, whatever you do, you should start by asking Google. If the issue has been around for all those years, the solution should be out there as well.
– Good idea! – I said, concluding the dialogue with myself.

After a couple of minutes I had a complete list of articles, mail archives, forums and blog posts with all kinds of information about the problem. All of them covered most of the points introduced so far in this article. Each one more or less interesting, one of them stuck out above the others: the solving-the-nfs-16-group-limit-problem article from the xkyle.com blog.

The solving-the-nfs-16-group-limit-problem article describes a similar situation and offers its own conclusions. I must admit that I am pretty aligned with those conclusions and I would recommend that post for a deep read.

The silver bullet

This solution is the best case, if you control the NFS server and you are running at least Linux kernel 2.6.21. That kernel or a newer one supports an NFS server feature that ignores the gids sent in the RPC operations and instead uses the local gids assigned to the uid on the server:

-g or --manage-gids
Accept requests from the kernel to map user id numbers into lists of group id numbers for use in access control. An NFS request will normally (except when using Kerberos or other cryptographic authentication) contains a user-id and a list of group-ids. Due to a limitation in the NFS protocol, at most 16 groups ids can be listed. If you use the -g flag, then the list of group ids received from the client will be replaced by a list of group ids determined by an appropriate lookup on the server. Note that the 'primary' group id is not affected so a newgroup command on the client will still be effective. This function requires a Linux Kernel with version at least 2.6.21.

The key to this solution is keeping the ids synchronized between the client and the server. A common way to meet this requirement is a shared Name Service Switch (NSS) service. The --manage-gids option then allows the NFS server to ignore the information sent by the client and check the groups directly against the information stored in LDAP or whatever backend NSS is using. For this to work, the NFS server and the NFS client must share the UIDs and GIDs.
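On a Debian-style NFS server this is typically enabled through the rpc.mountd options; a minimal sketch, assuming the usual /etc/default/nfs-kernel-server layout (paths and service names may differ between distributions):

# /etc/default/nfs-kernel-server: make rpc.mountd ignore the AUTH_SYS group list
# and resolve the user's groups locally (via NSS) instead
RPCMOUNTDOPTS="--manage-gids"

# Restart the NFS server so rpc.mountd picks up the new option
service nfs-kernel-server restart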

That is the approach suggested in solving-the-nfs-16-group-limit-problem. Unfortunately, it was not an option in my case :-(.

But not in my case

In my case, I had no way to synchronize the ids of the client with the ids of the NFS server. The ids on the client server were obtained from a Postgres database plugged into NSS as one of its backends, and there was no chance of using that backend on the NFS server.
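Just to illustrate the kind of client setup I am talking about, here is a sketch of the relevant /etc/nsswitch.conf lines (the pgsql backend name is an assumption; it stands for whatever NSS module exposes the Postgres database):

# /etc/nsswitch.conf (sketch): users and groups come from local files first,
# then from the Postgres-backed NSS module
passwd:         files pgsql
group:          files pgsql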

The solution

But this was not the end. Fortunately, the nfs-ngroups patches developed by frankvm@frankvm.com extend the list of supplemental group identifiers beyond the protocol's 16-entry limit on the Linux NFS client side. As the README file says:

This patch is useful when users are member of more than 16 groups on a Linux NFS client. The patch bypasses this protocol imposed limit in a compatible manner (i.e. no server patching).

That was perfect! It was exactly what I was looking for. So I built a custom kernel with the right patch applied on the server under my control, et voilà:

wget https://cdn.kernel.org/pub/linux/kernel/v3.x/linux-3.10.101.tar.xz
wget http://www.frankvm.com/nfs-ngroups/3.10-nfs-ngroups-4.60.patch
tar -xf linux-3.10.101.tar.xz
cd linux-3.10.101/
patch < ../3.10-nfs-ngroups-4.60.patch
make oldconfig
make menuconfig
make rpm
rpm -i /root/rpmbuild/RPMS/x86_64/kernel-3.10.101-4.x86_64.rpm
dracut "initramfs-3.10.101.img" 3.10.101
grub2-mkconfig > /boot/grub2/grub.cfg

Steps for CentOS, based on these three documents: [1] [2] [3]

Conclusions

As I said, this post does not focus on the security side of things. AUTH_SYS is a solution designed for the times before the Internet. Nowadays, the total interconnection of computer networks discourages the use of methods like AUTH_SYS; it is far too naive an authentication method for the present.

Anyway, NFS services are still quite common and many of them are still deployed with AUTH_SYS rather than Kerberos or other intermediate solutions. This post is about a specific situation in one of these deployments. Even if these services should progressively be replaced by more secure solutions, a sysadmin should gather practical knowledge about the particularities of these legacy systems.

Knowledge about the NFS 16 secondary groups limit and the recognized workarounds is still valuable know-how. This post shows two solutions, or even three if you count the Kerberos option, to fix this issue … only one of them fulfilled the requirements of my particular case.

Deus ex virtual machine

The following text is not a plea in favour of OS-level virtualization systems such as OpenVZ and VServer and against full-virtualization systems such as Xen, VMware, VBox, KVM …
No, it is not about that. It is about the idea that, in principle, for a homogeneous environment with few hardware resources, or with services whose evolution in hardware resource consumption is hard to predict up front, using these containers personally strikes me as far more practical and versatile than the other two alternatives (no virtualization and full virtualization).

***

Pérez y Familia is a medium-sized SME (about 50 employees). It has an important IT Department made up of a Unix guru (top 9 in the world ranking of geeks) and a part-time intern. Of course, they have a yearly budget of 1200 choco-coins, mainly devoted to supporting the large family of the gentleman who faithfully and almost religiously maintains the coffee machine. The IT Department, in its data centre located right under the intern's desk, runs a small server infrastructure:

  • (A) 2 Pentium IV boxes with a few network cards acting as the corporate router in HA
  • (B) 1 dual-core PowerEdge 1890 bought back when the company started
  • (C) 2 x 8-core clone PCs with 8GB of RAM, recently bought at MarcaMedia at the explicit request of the IT Department

The IT department is given free rein to install whatever OS they fancy, as long as the overtime falls on their side. So, almost as a corporate exception, they form their own little fiefdom inside the Pérez kingdom. On the other hand, over its lifetime the company gradually incorporated software tools into its working processes:

  • At first, the IT folks set up their own servers, which are only good for consuming CPU cycles and a few other geeky things:
    • A DNS server here, a DHCP there, and a Snort on A
    • An OpenVPN on A
    • Some repositories on B
  • Then came the process-support tools:
    • A document management system on B
    • A CRM on B
  • Later on, more tools kept being added:
    • The intranet
    • The extranet
    • The music streamer
    • The code repository
    • The inventory and the service monitoring system
    • and a long etcetera …

All of this was crammed into B and C as best as possible. On top of that, every now and then something or someone asks for a piece of software to be evaluated to see whether it is a game changer. Naturally, these tests are done in the testing environments (a.k.a. the intern's laptop, the one he bought with his student grant). All of this sets the scene, preparing the climax of our story, the prelude to the long-awaited moment that will take us to the climax and will justify the ending.

And the moment came: Management demands the installation of the UltimatumTotalNG ERP. It is designed for RedHat 4.0 and only with specific versions of certain databases and so on, and Management, which is as much an expert in technical decision-making as Franco was in Einstein's Theory of Relativity, does not want to hear about any distros or versions other than the ones the handsome and sophisticated external consultant has suggested for UltimatumTotalNG.

At this point, the IT Department (that is, the intern and the Top-10 guru) suddenly find themselves between a rock and a hard place, since back in the day, in the time of the potato (that one is good, let's see who gets it), they chose Debian as their OS and kept upgrading it all the way to Squeeze, no less. What will happen? Will this be the end of our heroes? How will they get out of this predicament? Let's see how they act:

– What can we do? said the Unix Guru.

– Well, let's analyse the situation, said the intern. We have a lot of services installed, but they are not really consuming the system resources all the time (50 clients are not that many clients, after all). With the machines we have, we can perfectly well sustain all these services. However, we do have a problem with the upgrade processes: it is not the first time that, because of the upgrade of one specific service, other independent services get affected and we get an earful for it.

– Right, we could use virtualization. We move all the services that are on C″ over to C′, prepare that machine to host KVM virtual machines, and then migrate everything step by step (starting with the ERP's RedHat) until everything is virtualized.

– Perfect, you rock!

(time passes while C″ is freed of services)

– …. Right, now, how do we size the images? Ummm, well, give half a core to this VM, it is not used very often.
– Nooo, are you crazy?! The CRM is going there; between nine and half past nine the marketing people start running queries like crazy. You know they are not very careful when running their filters. They will be all over us, give it at least two cores …

(5 hours later)

– Come on, create another VM for this service.
– Oops! I think all the physical resources are already allocated.
– But how is that possible? We were providing the service perfectly well before with the same number of servers!!!
– Yes, but it is also true that we used to have the CRM and the document management system on the same machine and, even though together they used all the physical resources, one had its workload in the morning and the other in the afternoon, so the machine was well balanced. With the resource reservations of the VMs we have split the resources in half, so the minor services that also ran on that machine have been left without resources for a new VM to host them.

… (everything that followed is a succession of blame, reproaches, #epicfails, #workarounds, excuses, overtime and #fatalities) …

In the end, the IT guys were fired. The guru wrote a book and retired on the money he made, and the intern, frustrated by his first work experience, designed a jail system for Linux called VUltimatum, working very much like what we know as VServers, Zones, Jails, VZ, Containers. Thanks to this, he is highly regarded in his professional guild and, on top of that, has a decent job restocking shelves at Carrefour. You could say life is smiling on him.

I prefer OS-virtualization over full-virtualization

For some time now I have had a clear answer to the following question: what advantages do I see in OS-virtualization solutions (like OpenVZ or VServer) compared to full-virtualization solutions (like KVM or Xen)? This is my official answer to that question:

  • Performance. OS-virtualization has a lower overhead than the other one with respect to the physical machine resources; the difference against a bare-metal machine is certainly small.
  • We can take checkpoints before software updates. Very handy to avoid missteps and to ease base-system upgrades.
  • Simplicity. When we need to isolate particular software components, we can use a VZ container (OS-virtualization) in the same way as a chroot environment. For example, when I need to run a piece of software that only supports 32-bit on a specific AMD64 server.
  • You do not need a disk image on which to install the guest operating system:
    • No headaches about sizing/resizing images
    • You can set disk consumption quotas if necessary
    • You can access the files of a VServer/OpenVZ instance directly from the base host (see the sketch after this list). For example, backups only need to run on the base host, not on each virtual host. As another example, you could share directories such as the Debian package pool among multiple virtual hosts, reducing disk consumption (like Solaris Non-Global Zones).
  • If our environment is homogeneous from the OS point of view, and assuming we live in a GNU/Linux-like world, we really do not need our virtualization system to be able to virtualize BSD, SystemV, Windows and other OSes
  • Physical resources can be reserved for virtual machines, but they can also be left shared, that is, all virtual machines use the resources on demand.
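As a small illustration of those last points, with OpenVZ (assuming the default layout; the paths and the container ID are placeholders) the guest filesystem is directly reachable from the base host:

# Enter a running container directly from the host
vzctl enter 101

# Back up part of the guest filesystem without touching the guest itself
tar -czf /backup/ct101-etc.tar.gz /var/lib/vz/private/101/etc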

The story of a systems migration

A year and a half ago I embarked, together with my department colleagues, on the arduous enterprise of migrating our company server platform towards a platform where everything runs virtualized, based on OpenVZ, KVM and ProxMox. I am writing this with the advantage of doing it after its completion.

The decision to start this migration has its origin in Debian's announcement that the use and official support of VServer would be deprecated as of Debian 6.0:

http://lwn.net/Articles/357623/

The thing is that, at the time, we were making routine use of VServer technology for the company's customer-facing and internal services, under two premises:

"Keep the base system as clean and homogeneous as possible" and "One virtual machine per service"

(to avoid incompatibilities between different pieces of software under the same OS instance, to reduce service overlap, …)

The alternative we assessed was moving to a mixed OpenVZ and KVM system managed by an ad hoc web application using libvirtd, and the rationale for this alternative was the following:

  • Easy migration from VServer to OpenVZ (3 simple steps)
  • OpenVZ had two network management modes: virtual network and virtual device
    • The former had the advantage of being fast and simple to configure, and of being more restrictive regarding the guest's capabilities
    • The latter allowed multicast/broadcast and, used together with bridges, offered very seductive configurability. For example, booting via DHCP 🙂
    • Both modes allowed loading iptables rules inside the guest
  • The OpenVZ project's wiki/documentation was much larger, clearer and better organized than VServer's
  • OpenVZ offered live migration between nodes.
  • The performance of OpenVZ and VServer was fairly comparable
  • libvirtd had recently gained support for OpenVZ
  • libvirtd had bindings of its C API for Ruby and Python
  • KVM was quite comfortable to use
  • KVM was starting to show signs of robustness on AMD64
  • KVM would allow us to evaluate other platforms we did not devote time to because of how cumbersome it is to reinstall physical machines: BSDs, Solaris without using Zones, Windows …
  • Building a tool based on libvirtd + Rails for the homogeneous management of what would become our new platform was affordable

However, a crucial point along our way was the discovery of the ProxMox project. This finding made the path enormously easier since, once analysed and tested, we realized it covered our demands as users and administrators of the virtualization platform perfectly, which meant that developing a management frontend would no longer be necessary. Meanwhile, the evaluations we were carrying out of OpenVZ were very positive compared to VServer:

  • Live migration actually worked (see the sketch after this list).
    • The tested setup was between independent nodes (no shared storage)
    • The migration happens live using ssh, public keys and rsync.
    • The node switch happens with no apparent service downtime
  • A functional snapshot system.
  • Network policy
    • venet (network virtualization). It works, it is efficient and it allowed us to virtualize network segments (very interesting in our case since, until then, we were using the same network segment for services that should have been isolated at layer 2 and only reachable at layer 3 through routing). venet let us simulate this behaviour without adding extra network hardware.
    • veth (bridge). Nothing out of the ordinary compared to what was already done with KVM and Xen. It came in very handy for development environments, since it gives the feel of a real environment, and also for controlled production environments where we need, for example, multicast (such as Tomcat clusters).
  • Resource management:
    • A bit confusing before reading the documentation (very good, by the way), but very simple to manage afterwards.
    • Limits on memory pages, CPU units, disk blocks, number of processes, number of PTYs, TCP buffers, socket buffers …, quotas, kernel memory, and other configuration parameters such as network, domain …
    • Finally, the limits could also be removed, letting the machines compete among themselves for the real system's resources.
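As an illustration of the live migration point above (a sketch; the destination node name and the container ID are placeholders):

# Live-migrate OpenVZ container 101 to another node over SSH/rsync, no shared storage needed
vzmigrate --online node02.example.com 101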

Finally, after 1 year and 4 months among VZs, I can say the migration to VZ was broadly a success. True, we had a few incidents along the way that cost us some headaches, but overall the final result has been worth it, and we have been working at full capacity with the virtual platform more or less since March of last year (2009).

Broadly speaking, and by way of conclusions, I will summarize the milestones achieved:

  • Network abstraction, with venet + iproute2 + iptables, and "virtual" networks implemented on top of a single VLAN.
  • No more overlap between physical servers and virtual servers
  • A simple and reliable snapshot system based on vzdump and LVM2 snapshots
  • An acceptable appliance system (ready-to-use OS templates)
  • We went well past a hundred virtual machines running on 15 physical nodes, still with some headroom to grow further.
  • We have made restoring after data loss and/or corruption a trivial problem;
  • We have made service restoration and/or availability in the face of hardware failures trivial as well.
  • We have made cloning and replicating services trivial, both for occasional delicate upgrades that require a checkpoint and for creating replica scenarios of running services to take to fairs and demos (as a kind of branch of the "production" systems)
  • A simple and reliable web administration interface, plus access through a VNC console.
  • and most importantly … everything 100% free software and with no cost attached to the product.

ClusterSSH: A GTK parallel SSH tool

For some time now, I have been using an amazing admin tool named ClusterSSH (aka cssh). With this tool (packages are available at least for Debian-like distributions), we can interact simultaneously with a cluster of servers. This is very useful when you are performing occasional tasks on similar servers (for example, Tomcat cluster nodes, …) and you want to execute the same instructions on all of them.
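For a quick one-off session you can also just pass the target hosts on the command line (the hostnames here are placeholders); every keystroke typed in the small admin console is replicated to all the opened terminals:

cssh admin@web01 admin@web02 admin@web03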

cssh

My config file for cssh (~/.csshrc) looks much like the default settings:

auto_quit=yes
command=
comms=ssh
console_position=
extra_cluster_file=~/.clusters <<<<<<<<<
history_height=10
history_width=40
key_addhost=Control-Shift-plus
key_clientname=Alt-n
key_history=Alt-h
key_paste=Control-v
key_quit=Control-q
key_retilehosts=Alt-r
max_host_menu_items=30
method=ssh
mouse_paste=Button-2
rsh_args=
screen_reserve_bottom=60
screen_reserve_left=0
screen_reserve_right=0
screen_reserve_top=0
show_history=0
ssh=/usr/bin/ssh
ssh_args= -x -o ConnectTimeout=10
telnet_args=
terminal=/usr/bin/xterm
terminal_allow_send_events=-xrm '*.VT100.allowSendEvents:true'
terminal_args=
# terminal_bg_style=dark
terminal_colorize=1
terminal_decoration_height=10
terminal_decoration_width=8
terminal_font=6x13
terminal_reserve_bottom=0
terminal_reserve_left=5
terminal_reserve_right=0
terminal_reserve_top=5
terminal_size=80x24
terminal_title_opt=-T
title=CSSH
unmap_on_redraw=no
use_hotkeys=yes
window_tiling=yes
window_tiling_direction=right

The ~/.clusters file is the file which defines the actual clusters (see the cssh man page):

# home cluster
c-home tor@192.168.1.10 pablo@192.168.1.11

# promox-10.40.140
promox-10.40.140 10.40.140.17 10.40.140.18 10.40.140.19 10.40.140.33 10.40.140.17 10.40.140.18 10.40.140.33

# kvm-10.41.120
kvm-10.41.120 10.41.120.17 10.41.120.18

When I want to work with the c-home cluster, I execute cssh as follows:

# cssh c-home

In addition, I have written a tiny Python script that automates the generation of these cluster lines. The script is based on ICMP probes executed in parallel. That is handy when your servers are deployed in a big VLAN or there are a lot of them. In those cases, we can run the script to find the servers.

# ./cssh-clusterline-generator.py -L 200 -H 250 -d mot -n 10.40.140 >> ~/.clusters

# mot-10.40.140-range-10-150
mot-10.40.140-range-10-150 10.40.140.17 10.40.140.19 10.40.140.32 10.40.140.37

Finally, … the script:

import os
from threading import Thread
from optparse import OptionParser

class Thread_(Thread):
    def __init__(self, ip):
        Thread.__init__(self)
        self.ip = ip
        self.status = -1

    def run(self):
        # Ping the host once; os.system() returns 0 when the host answers
        self.status = os.system("ping -c 1 %s > /dev/null" % self.ip)

threads_ = []
ips = ""

parser = OptionParser()
parser.add_option("-n", "--net", dest="network", default="10.121.55",
                  help="Class C Network", metavar="NETWORK")
parser.add_option("-L", "--lowrange", dest="lowrange", default="1",
                  help="Low range", metavar="LOW")
parser.add_option("-H", "--highrange", dest="highrange", default="254",
                  help="High range", metavar="HIGH")
parser.add_option("-d", "--deploy", dest="deploy", default="Net",
                  help="Deploy name", metavar="DEPLOY")
parser.add_option("-v", "--verbose", dest="verbose",
                  default=False, action="store_true",
                  help="Verbose mode")

(options, args) = parser.parse_args()

low_range = int(options.lowrange)
high_range = int(options.highrange)
net = options.network
deploy_id = options.deploy
verbose = options.verbose

# Launch one ping thread per address in the requested range
for i in range(low_range, high_range + 1):
    ip = net + "." + str(i)
    h = Thread_(ip)
    threads_.append(h)
    h.start()

# Collect the results and build the list of reachable hosts
count = 0
for h in threads_:
    res_str = "Not found"
    h.join()
    if h.status == 0:
        count = count + 1
        res_str = "FOUND"
        ips += h.ip + " "
    if verbose:
        print("Looking for host %s ... %s" % (h.ip, res_str))

if verbose:
    print("Finished. %s hosts found" % count)

# Print the cluster line ready to be appended to ~/.clusters
print("")
print("# " + deploy_id + "-" + net + "-range-" + str(low_range) + "-" + str(high_range))
line = deploy_id + "-" + net + "-range-" + str(low_range) + "-" + str(high_range) + " " + ips
print(line)