Recover a container image from the kubelet cache
Date: 2024-03-20 · Word count: 1313 · Reading time: 7 minutes

One time I was in the awkward situation where some container images had been deleted by mistake from the registry by a buggy automation script. As expected, most of the images were pretty old and couldn’t be rebuilt (offline repositories, broken links, etc.).
The containers running those images were still safe as long as their image was in the local kubelet cache of the kubernetes worker node. So, the mission was to recover the images from the node cache and restore them to the remote registry.
There are at least two ways to do it: an easy and safe one, and a manual, tedious and much more interesting one. Needless to say which one I discovered first.
Let’s begin with the easy one.
Easy and safe way to recover an image
The tool `ctr` is the containerd CLI shipped with the container runtime on most kubernetes nodes (`crictl` is the CRI-level counterpart, but it cannot export images).
If `ctr` is available on the node, we can use it to export the cached image.
In this example, we will export the `nginx:alpine` image.
- Identify the node that has the image in its cache, then run a privileged shell in it.
The fastest way to do this is to use the `kubectl debug` command, which spawns a privileged container with the node filesystem mounted on `/host`.
# identify the node running the pod with the "missing" container image
kubectl get pods demo-nginx-78f6b68b8d-gnfgt -owide
# look at column "NODE"
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
demo-nginx-78f6b68b8d-gnfgt 1/1 Running 0 3m33s 10.42.3.77 worker1 <none> <none>
# run the privileged shell
kubectl debug node/worker1 -it --image=alpine
# once it started, chroot to /host
chroot /host
- List the available images in the node kubelet cache.
ctr -n k8s.io images list | grep "nginx:alpine"
docker.io/library/nginx:alpine application/vnd.oci.image.index.v1+json sha256:41523187cf7d7a2f2677a80609d9caa14388bf5c1fbca9c410ba3de602aaaab4 21.7 MiB linux/386,linux/amd64,linux/arm/v6,linux/arm/v7,linux/arm64/v8,linux/ppc64le,linux/riscv64,linux/s390x io.cri-containerd.image=managed
Found it. The full name is `docker.io/library/nginx:alpine`, and it is available for multiple architectures.
- Export the image to a tar archive.
ctr -n k8s.io images export nginx-alpine.tar docker.io/library/nginx:alpine --platform linux/amd64
# creates nginx-alpine.tar
ls -alh nginx-alpine.tar
-rw-r--r--. 1 root root 22M Jan 9 14:43 nginx-alpine.tar
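If you need to preserve every architecture instead of a single platform, `ctr` can export them all. A sketch, assuming your containerd version supports the `--all-platforms` flag:

ctr -n k8s.io images export nginx-alpine-all.tar docker.io/library/nginx:alpine --all-platforms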
- Move the tar file out of the node.
If the command fails because of the large image size, increase the `--retries` value.
# get the name of the pod created by "kubectl debug"
kubectl get pods
kubectl cp --retries 30 node-debugger-worker1-wqzvq:/host/nginx-alpine.tar nginx-alpine.tar
- Import the image to your local machine
podman load -i nginx-alpine.tar
Getting image source signatures
Copying blob da9db072f522 done |
Copying blob e10e486de1ab done |
Copying blob af9c0e53c5a4 done |
Copying blob b2eb2b8af93a done |
Copying blob e351ee5ec3d4 done |
Copying blob fbbf7d28be71 done |
Copying blob 471412c08d15 done |
Copying blob a2eb5282fbec done |
Copying config 91ca84b4f5 done |
Writing manifest to image destination
Loaded image: docker.io/library/nginx:alpine
podman images
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.io/library/nginx alpine f592cca94770 About a minute ago 22 MB
podman run --rm -it --name ng nginx:alpine /bin/sh
# starts a container from the recovered image
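With the image working locally, we can complete the mission and push it back to the remote registry. A minimal sketch, assuming a hypothetical target registry registry.example.com that you are allowed to push to:

# retag the recovered image for the target registry (hypothetical name)
podman tag docker.io/library/nginx:alpine registry.example.com/library/nginx:alpine
# authenticate and push
podman login registry.example.com
podman push registry.example.com/library/nginx:alpine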
- Clean up
From the pod:
# remove useless files from the node filesystem
rm nginx-alpine.tar
exit # from the chroot
exit # quit the pod
From the local machine:
rm nginx-alpine.tar
kubectl delete pod node-debugger-worker1-wqzvq
Hard way to recover an image
This method involves manually assembling the image filesystem layers into a tar file compliant with the OCI specification.
Note: Use this method only if `ctr` is not available.
An OCI container image is composed of one or more filesystem layers (each stacked on top of the previous one), produced during the image build process. In addition, the image contains metadata with information about the image name, tag, size and supported architectures.
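Before digging into the files, it helps to keep in mind how the OCI objects reference each other. A simplified sketch of the multi-arch case:

index (image entrypoint file)
└── manifests[]            one entry per architecture
    └── manifest           architecture-specific entrypoint
        ├── config         image metadata (env, cmd, history)
        └── layers[]       filesystem diffs, usually gzipped tarballs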
Recover the image layers
- First, we need to recover the image name, tag and sha256 checksum.
kubectl get pods demo-nginx-78f6b68b8d-gnfgt -oyaml | grep "image:\|imageID:"
- image: nginx:alpine
image: docker.io/library/nginx:alpine
imageID: docker.io/library/nginx@sha256:41523187cf7d7a2f2677a80609d9caa14388bf5c1fbca9c410ba3de602aaaab4
In this case:
- name: docker.io/library/nginx
- tag: alpine
- sha256: 41523187cf7d7a2f2677a80609d9caa14388bf5c1fbca9c410ba3de602aaaab4
- Get a root shell in the node using `kubectl debug` and browse to the filesystem path where the container images are stored. The path can change with the kubernetes distribution. For instance, in AKS it is /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256, while in K3s it is /var/lib/rancher/k3s/agent/containerd/io.containerd.content.v1.content/blobs/sha256.
# identify the node running the pod with the "missing" container image
kubectl get pods demo-nginx-78f6b68b8d-gnfgt -owide
# look at column "NODE"
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
demo-nginx-78f6b68b8d-gnfgt 1/1 Running 0 3m33s 10.42.3.77 worker1 <none> <none>
# run the privileged shell
kubectl debug node/worker1 -it --image=alpine
# once it started, chroot to /host
chroot /host
cd /var/lib/rancher/k3s/agent/containerd/io.containerd.content.v1.content/blobs/sha256
ls -alh
- Inspect the layers that make up the image.
For each image, there is an “entrypoint” metadata JSON file named after its sha256 checksum.
For images available for a single architecture, this file directly lists the layers that make up the image in a `layers` section.
If instead the image is available for more than one architecture, the entrypoint file contains the names of the entrypoint files of each architecture (architecture-specific entrypoints). This information is located in the `manifests` array, which pairs each architecture with its entrypoint (the `digest` field).
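A quick way to tell the two cases apart is the top-level mediaType field, which uses the standard OCI media types (when present, which is the common case):

jq -r '.mediaType' 41523187cf7d7a2f2677a80609d9caa14388bf5c1fbca9c410ba3de602aaaab4
# application/vnd.oci.image.index.v1+json    -> multi-arch, read .manifests
# application/vnd.oci.image.manifest.v1+json -> single-arch, read .layers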
# inspect the entrypoint file
cat 41523187cf7d7a2f2677a80609d9caa14388bf5c1fbca9c410ba3de602aaaab4 | jq
In our case, the entrypoint of the `nginx:alpine` image lists 16 manifests (the unknown/unknown entries below are build attestations, not real architectures):
cat 41523187cf7d7a2f2677a80609d9caa14388bf5c1fbca9c410ba3de602aaaab4 | jq '.manifests | length'
16
cat 41523187cf7d7a2f2677a80609d9caa14388bf5c1fbca9c410ba3de602aaaab4 | jq '.manifests[].platform' | head -8
{
  "architecture": "amd64",
  "os": "linux"
}
{
  "architecture": "unknown",
  "os": "unknown"
}
We are interested in `amd64/linux`, so the first `manifests` entry.
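If the entry you need is not the first one, you can select it by platform instead of by index. A jq sketch:

cat 41523187cf7d7a2f2677a80609d9caa14388bf5c1fbca9c410ba3de602aaaab4 | jq '.manifests[] | select(.platform.architecture == "amd64" and .platform.os == "linux")'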
cat 41523187cf7d7a2f2677a80609d9caa14388bf5c1fbca9c410ba3de602aaaab4 | jq '.manifests[0]'
{
  "annotations": {
    "com.docker.official-images.bashbrew.arch": "amd64",
    "org.opencontainers.image.base.digest": "sha256:a2d509cbd8a5a54c894cf518e94739f0936189631a24d05bb7c90e73ec639251",
    "org.opencontainers.image.base.name": "nginx:1.27.3-alpine-slim",
    "org.opencontainers.image.created": "2024-11-26T21:07:15Z",
    "org.opencontainers.image.revision": "d21b4f2d90a1abb712a610678872e804267f4815",
    "org.opencontainers.image.source": "https://github.com/nginxinc/docker-nginx.git#d21b4f2d90a1abb712a610678872e804267f4815:mainline/alpine",
    "org.opencontainers.image.url": "https://hub.docker.com/_/nginx",
    "org.opencontainers.image.version": "1.27.3-alpine"
  },
  "digest": "sha256:b1f7437a6d0398a47a5d74a1e178ea6fff3ea692c9e41d19c2b3f7ce52cdb371",
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "platform": {
    "architecture": "amd64",
    "os": "linux"
  },
  "size": 2498
}
Let’s take note of the architecture-specific entrypoint (field `.manifests[0].digest`): b1f7437a6d0398a47a5d74a1e178ea6fff3ea692c9e41d19c2b3f7ce52cdb371.
Finally, we can get the actual filesystem layers that make up the image for our architecture.
cat b1f7437a6d0398a47a5d74a1e178ea6fff3ea692c9e41d19c2b3f7ce52cdb371 | jq -r '.layers[].digest'
sha256:da9db072f522755cbeb85be2b3f84059b70571b229512f1571d9217b77e1087f
sha256:e10e486de1ab216956a771c782ef1adabef10b1bfd9a3765e14f79484784e9cd
sha256:af9c0e53c5a430c700d068066f35cb313945c9917bee94108bae13a933f6b6b4
sha256:b2eb2b8af93a0c4d2b5f5a70ed620869b406658462aba70b03f12f442aa40cc1
sha256:e351ee5ec3d4f55b4e3fce972c2a34a5632ede02602dfbcad85afc539b486131
sha256:fbbf7d28be71101773e4440c75dbbe7ed12767763fbb2e9c85a32a31f611169a
sha256:471412c08d15ee3b0c86b86fe91a6dd0e17d1f4d1b6d83a7f68e9b709328bf3d
sha256:a2eb5282fbec00fa3d13849dafbfd7f416b69059e527e5653b84f1d9245b8eb0
- Download the layers to the local machine.
For each layer, run (a loop over all the layers is sketched after this step):
kubectl cp --retries 30 node-debugger-worker1-shn48:/host/var/lib/rancher/k3s/agent/containerd/io.containerd.content.v1.content/blobs/sha256/layername layername
Also download:
- the architecture-specific entrypoint (in this case b1f7437a6d0398a47a5d74a1e178ea6fff3ea692c9e41d19c2b3f7ce52cdb371)
- the blob referenced by `.config.digest` in the architecture-specific entrypoint:
cat b1f7437a6d0398a47a5d74a1e178ea6fff3ea692c9e41d19c2b3f7ce52cdb371 | jq -r '.config.digest'
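Copying each layer by hand gets tedious; here is a minimal loop sketch, assuming the layer digests were saved one per line, without the sha256: prefix, in a hypothetical file layers.txt:

# copy every layer blob out of the debug pod (path shown is for K3s)
BLOBS=/host/var/lib/rancher/k3s/agent/containerd/io.containerd.content.v1.content/blobs/sha256
while read -r layer; do
  kubectl cp --retries 30 "node-debugger-worker1-shn48:$BLOBS/$layer" "$layer"
done < layers.txt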
Assemble the layers
- Create the metadata files.
Now, we need to put the layers together according to the OCI specification.
Keep in mind that the image entrypoint is now the architecture-specific one, in our case b1f7437a6d0398a47a5d74a1e178ea6fff3ea692c9e41d19c2b3f7ce52cdb371.
Create 3 files:
oci-layout
index.json
manifest.json
oci-layout:
{"imageLayoutVersion":"1.0.0"}
index.json:
This file is the description of the image; it contains the name of the entrypoint and some metadata.
{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:b1f7437a6d0398a47a5d74a1e178ea6fff3ea692c9e41d19c2b3f7ce52cdb371",
      "size": 2498,
      "annotations": {
        "io.containerd.image.name": "docker.io/library/nginx:alpine",
        "org.opencontainers.image.ref.name": "alpine"
      }
    }
  ]
}
Fill in:
- `mediaType` is the same mediaType value from the architecture-specific entrypoint:
cat b1f7437a6d0398a47a5d74a1e178ea6fff3ea692c9e41d19c2b3f7ce52cdb371 | jq -r '.mediaType'
- `digest` is the name of the architecture-specific entrypoint, prefixed with sha256:
- `size` is the size in bytes of the architecture-specific entrypoint:
du -b b1f7437a6d0398a47a5d74a1e178ea6fff3ea692c9e41d19c2b3f7ce52cdb371
- `annotations` must be edited with the original image name and tag.
manifest.json:
This file specifies the layers (in order) of the image.
[
  {
    "Config": "blobs/sha256/91ca84b4f57794f97f70443afccff26aed771e36bc48bad1e26c2ce66124ea66",
    "RepoTags": [
      "docker.io/library/nginx:alpine"
    ],
    "Layers": [
      "blobs/sha256/da9db072f522755cbeb85be2b3f84059b70571b229512f1571d9217b77e1087f",
      "blobs/sha256/e10e486de1ab216956a771c782ef1adabef10b1bfd9a3765e14f79484784e9cd",
      "blobs/sha256/af9c0e53c5a430c700d068066f35cb313945c9917bee94108bae13a933f6b6b4",
      "blobs/sha256/b2eb2b8af93a0c4d2b5f5a70ed620869b406658462aba70b03f12f442aa40cc1",
      "blobs/sha256/e351ee5ec3d4f55b4e3fce972c2a34a5632ede02602dfbcad85afc539b486131",
      "blobs/sha256/fbbf7d28be71101773e4440c75dbbe7ed12767763fbb2e9c85a32a31f611169a",
      "blobs/sha256/471412c08d15ee3b0c86b86fe91a6dd0e17d1f4d1b6d83a7f68e9b709328bf3d",
      "blobs/sha256/a2eb5282fbec00fa3d13849dafbfd7f416b69059e527e5653b84f1d9245b8eb0"
    ]
  }
]
Fill in:
- `Config` with the `.config.digest` value from the entrypoint:
cat b1f7437a6d0398a47a5d74a1e178ea6fff3ea692c9e41d19c2b3f7ce52cdb371 | jq -r '.config.digest'
- `RepoTags` with the image name and tag
- `Layers` with the layers from the entrypoint:
cat b1f7437a6d0398a47a5d74a1e178ea6fff3ea692c9e41d19c2b3f7ce52cdb371 | jq -r '.layers[].digest'
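Note that the sha256: prefix in those digests must be turned into a blobs/sha256/ path. A jq sketch to generate the Layers entries directly (sub() requires jq 1.5 or later):

cat b1f7437a6d0398a47a5d74a1e178ea6fff3ea692c9e41d19c2b3f7ce52cdb371 | jq -r '.layers[].digest | sub("sha256:"; "blobs/sha256/")'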
- Assemble the image.
Create a new folder, put the 3 files created in the previous step inside it, and create the folder blobs/sha256 in it.
Put all the downloaded blobs (the layers, the config and the architecture-specific entrypoint) inside blobs/sha256.
The directory structure must look like this:
tree
.
├── blobs
│   └── sha256
│       ├── 471412c08d15ee3b0c86b86fe91a6dd0e17d1f4d1b6d83a7f68e9b709328bf3d
│       ├── 91ca84b4f57794f97f70443afccff26aed771e36bc48bad1e26c2ce66124ea66
│       ├── a2eb5282fbec00fa3d13849dafbfd7f416b69059e527e5653b84f1d9245b8eb0
│       ├── af9c0e53c5a430c700d068066f35cb313945c9917bee94108bae13a933f6b6b4
│       ├── b1f7437a6d0398a47a5d74a1e178ea6fff3ea692c9e41d19c2b3f7ce52cdb371
│       ├── b2eb2b8af93a0c4d2b5f5a70ed620869b406658462aba70b03f12f442aa40cc1
│       ├── da9db072f522755cbeb85be2b3f84059b70571b229512f1571d9217b77e1087f
│       ├── e10e486de1ab216956a771c782ef1adabef10b1bfd9a3765e14f79484784e9cd
│       ├── e351ee5ec3d4f55b4e3fce972c2a34a5632ede02602dfbcad85afc539b486131
│       └── fbbf7d28be71101773e4440c75dbbe7ed12767763fbb2e9c85a32a31f611169a
├── index.json
├── manifest.json
└── oci-layout
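A minimal shell sketch of the assembly, assuming the three metadata files and all the downloaded blobs sit in the current directory and a hypothetical blobs.txt lists every blob file name:

# build the OCI layout directory
mkdir -p image/blobs/sha256
cp oci-layout index.json manifest.json image/
# copy the layers, the config and the architecture-specific entrypoint
while read -r blob; do
  cp "$blob" image/blobs/sha256/
done < blobs.txt
cd image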
Finally, tar the files and import the image. Test with `podman images` and run a container.
tar -cf nginx-alpine.tar *
podman load -i nginx-alpine.tar
podman images