Issues with Linux and large NUMA/COMA factor architectures



This talk will detail issues with Linux and large NUMA factor/COMA architectures.


ScaleMP vSMP Foundation is a form of aggregation virtualization: a distributed virtual machine monitor (VMM) that aggregates multiple similar x86-64 systems into a single large shared-memory system. The systems are connected by a fast commodity interconnect (currently InfiniBand). The VMM aggregates all the constituent hardware, which means it also maintains memory/cache coherency among the constituent systems. Linux is currently the only guest operating system supported by vSMP Foundation.

The vSMP Foundation VMM implements multiple inter-node coherency mechanisms. The resulting shared memory architecture is both “NUMA” and “COMA” in nature. The coherency mechanism chosen is transparent to the guest kernel, although applications can explicitly choose a coherency mechanism. Because coherency is implemented in software and existing commodity interconnects are comparatively slow, the NUMA factor of the aggregated system is fairly large. The COMA coherency domain results in a large inter-node cacheline size, usually 4 kB. For these two reasons, cache misses and cacheline ping-pongs are a major issue.

This talk will focus on solutions employed in the kernel and in applications to overcome the performance penalties caused by the NUMA factor and the large cacheline. The effects of the large cacheline show up in different ways. At one end are the classical false sharing cases, where traditional solutions based on padding can be employed. At the other are true sharing and lock contention cases, where workarounds based on Linux kernel features and userspace libraries make more sense: hugetlb, libhugetlbfs, arena-based allocation, and third-party malloc replacements. This talk will detail the workarounds and techniques used to date and some of the techniques we plan to use, and will solicit suggestions on some unsolved issues.
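As an illustration of the padding approach to false sharing, here is a minimal sketch in C11. It assumes a 4 kB inter-node coherency unit, taken from the "usually 4 kB" figure above. The names `INTERNODE_LINE_SIZE`, `padded_counter`, `counter_inc` and `counter_sum` are hypothetical and are not part of the kernel or of vSMP Foundation.

```c
#include <assert.h>    /* static_assert */
#include <stdalign.h>  /* alignas */
#include <stdint.h>

/* Assumption: the inter-node coherency unit is 4 kB. */
#define INTERNODE_LINE_SIZE 4096

/* Hypothetical number of writers; each gets its own slot. */
#define NR_WORKERS 4

/*
 * Aligning the only member to the coherency unit also rounds the
 * struct size up to that unit, so adjacent array elements never
 * share an inter-node cacheline.
 */
struct padded_counter {
    alignas(INTERNODE_LINE_SIZE) uint64_t value;
};

static_assert(sizeof(struct padded_counter) == INTERNODE_LINE_SIZE,
              "each counter must occupy exactly one coherency unit");

static struct padded_counter counters[NR_WORKERS];

/* Each worker writes only to its own slot. */
static inline void counter_inc(unsigned int worker)
{
    counters[worker].value++;
}

/* Readers pay the cost of touching every slot; writers do not. */
static inline uint64_t counter_sum(void)
{
    uint64_t sum = 0;
    for (unsigned int i = 0; i < NR_WORKERS; i++)
        sum += counters[i].value;
    return sum;
}
```

The trade-off is memory: each 8-byte counter now occupies 4 kB, so this only pays off for a small number of hot, frequently written objects. It does not help the true sharing and lock contention cases, where every writer needs the same data.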


  • Ravikiran Thirumalai

    ScaleMP Inc


    Ravikiran works for ScaleMP as the lead Linux developer. Kiran (as he likes to be called) maintains the ScaleMP-related bits in the Linux kernel and works on scalability aspects of Linux and its interactions with the ScaleMP vSMP Foundation VMM.
