What doesn't work in CRIU

One Line Summary

Why CRIU can't dump an arbitrary set of processes

Abstract

Unlike virtual machines, CRIU can’t dump and restore an arbitrary set of processes, and there are several reasons for this. The first is that CRIU doesn’t support all types of resources. Usually this means that we haven’t implemented support for them yet and don’t know how hard it will be. A more serious problem is resources that CRIU supports but can’t handle in all cases. For example, CRIU can’t restore an arbitrary process tree, an arbitrary set of mounts, etc.

The second part is about what we have to fix in the kernel. For example, when we migrate a process to another host, the logic around clock_gettime() breaks, and we need to implement a namespace for clocks and timers to fix it.
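The clock problem can be illustrated with a small Python sketch. It assumes the process keeps a deadline computed from a monotonic clock, whose value is only meaningful on the host that produced it. The uptimes and the helper seconds_until() are hypothetical values and names chosen for illustration; nothing here is CRIU code.

```python
def seconds_until(deadline: float, monotonic_now: float) -> float:
    """Return how long the process believes it must still wait."""
    return deadline - monotonic_now


# Hypothetical monotonic clock readings (seconds since boot) on each host.
SOURCE_UPTIME = 500_000.0   # the source host has been up for several days
DEST_UPTIME = 1_200.0       # the destination host booted 20 minutes ago

# Before the dump, the process arms a 30-second timeout and keeps the
# absolute deadline in its own memory, which is restored unchanged.
deadline = SOURCE_UPTIME + 30.0

before = seconds_until(deadline, SOURCE_UPTIME)  # as seen on the source host
after = seconds_until(deadline, DEST_UPTIME)     # as seen after migration

print(f"remaining on source host:      {before:.0f} s")
print(f"remaining on destination host: {after:.0f} s")
```

In this sketch the 30-second timeout turns into a wait of several days once the stored deadline is compared with the destination host's clock. If the destination had the larger uptime, the timeout would instead appear to have expired long ago.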

The third part is about how to make sure that the migration will actually work. For some use cases of CRIU, it is easy to say whether a process or container migration will succeed. The cases where CRIU will not work are usually easy to predict. But even when a checkpoint/restore should theoretically succeed, there are still many reasons it might fail. Do all files used by the restored process exist? Do all libraries used by the restored process exist? Are the used binaries there? Is there enough memory available to restore the process? Are there …?

CRIU already has a few checks, such as whether the binary on the system where the processes are restored is the same size as the binary on the source system. However, those checks only run during a full restore, which might already be too late because the process on the source system has already been paused, stopped, or aborted.
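As a rough illustration of running such checks before the source process is touched, the following Python sketch compares a manifest of file paths and expected sizes against the destination file system. The function preflight_check(), the manifest format, and the root parameter are assumptions made for this example; they are not part of CRIU and do not describe how CRIU implements its checks.

```python
import os
from typing import Dict, List


def preflight_check(manifest: Dict[str, int], root: str = "/") -> List[str]:
    """Return a list of problems found on the destination; empty means OK.

    manifest maps an absolute path (as seen by the process on the source
    host) to the file size in bytes recorded there. This is a hypothetical
    format used only for this sketch.
    """
    problems = []
    for path, expected_size in manifest.items():
        # Resolve the path below the destination root (e.g. a container rootfs).
        candidate = os.path.join(root, path.lstrip("/"))
        try:
            actual_size = os.stat(candidate).st_size
        except OSError as exc:
            problems.append(f"{path}: not accessible ({exc.strerror})")
            continue
        if actual_size != expected_size:
            problems.append(
                f"{path}: size {actual_size} != expected {expected_size}"
            )
    return problems
```

A caller could collect the manifest on the source host, run preflight_check() on the destination, and only start the dump if the returned list is empty. Comparing sizes is a cheap heuristic; it cannot detect files that differ in content but not in length.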

This talk presents what has already been implemented to help users decide whether a migration should be started at all. It should also serve as the basis for a discussion of what additional checks CRIU can provide to help users make a better decision about whether a migration should be attempted.

Tags

containers, checkpoint, CRIU, snapshot

Speakers

  • Biography

    Software developer at Red Hat.

  • Biography

    Developer in the CRIU and OpenVZ projects.

    Andrew Vagin is interested in container virtualization (LXC, OpenVZ). He started writing autotests for OpenVZ in 2006, when he was a student at the Moscow Institute of Physics and Technology (MIPT). Andrew now works on the OpenVZ kernel team. He is also an active developer in the CRIU (Checkpoint/Restore in Userspace) project.