Proposal for support for runtime composefs validation #28658
alexlarsson wants to merge 4 commits
If this is set to `check`, then we validate the image manifest against the policy every time we run the container. If instead it is set to `require` then we do the same, but we also fail if there is no policy for the image. Also, we take care to validate the actual manifest data that we decode in image.Inspect(), so that we can trust the ImageData, like the annotations.
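The difference between the two modes can be sketched as a small decision function. This is only an illustration of the semantics described above, not the code in this PR: the names `SignatureMode`, `PolicyResult`, and `decide` are hypothetical, and the policy evaluation itself is assumed to have happened elsewhere.

```go
package main

import (
	"errors"
	"fmt"
)

// SignatureMode is a hypothetical type for the two option values.
type SignatureMode string

const (
	ModeCheck   SignatureMode = "check"
	ModeRequire SignatureMode = "require"
)

// PolicyResult is a hypothetical summary of evaluating the policy for an image.
type PolicyResult int

const (
	NoPolicy PolicyResult = iota // no policy entry covers this image
	Accepted                     // a policy exists and the manifest passed
	Rejected                     // a policy exists and the manifest failed
)

var (
	ErrNoPolicy       = errors.New("no signature policy applies to image")
	ErrPolicyRejected = errors.New("manifest rejected by signature policy")
)

// decide maps a mode and a policy result to the outcome described above:
// both modes fail on a rejected manifest, but only "require" fails when
// there is no policy for the image at all.
func decide(mode SignatureMode, res PolicyResult) error {
	switch res {
	case Accepted:
		return nil
	case Rejected:
		return ErrPolicyRejected
	case NoPolicy:
		if mode == ModeRequire {
			return ErrNoPolicy
		}
		return nil
	}
	return fmt.Errorf("unknown policy result %d", res)
}

func main() {
	for _, mode := range []SignatureMode{ModeCheck, ModeRequire} {
		fmt.Printf("%s, no policy: %v\n", mode, decide(mode, NoPolicy))
	}
}
```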
This is a small script that rewrites a manifest to add the fs-verity annotations, converts to zstd::chunked, signs it with cosign and pushes it to a repo.
I think we would like this yes, especially as we know that not everyone will use quadlet files in /usr
options = append(options, libpod.WithRootFSFromImage(newImage.ID(), resolvedImageName, s.RawImageName))
if s.SignaturePolicy != "" {
I feel like there's a TOCTOU risk doing this in Specgen. The container isn't created yet, so if the image tag we want to use is replaced before it is created, this check is subverted.
To add to this, anything in specgen is by design not doing runtime validation: specgen runs once, when the container is created. For podman stop/start it will not be called again.
Yes, that does not matter for the quadlet use case, but still, if such CLI options exist they must work with all of podman, not just quadlet.
ok nvm, I fully read the code now. I think the TOCTOU does not matter, because you pass in the verity digests, and if the image was replaced in between, then at mount time the digests will be invalid and cause a failure, as they should.
Yes, the main requirement is that we validate the manifest json at some point, and then make any security-relevant decisions based on the manifest content (such as the fs-verity digests) only from exactly the same data that we validated.
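A minimal sketch of that "validate, then decode the very same bytes" rule follows. It makes several assumptions: a plain SHA-256 comparison stands in for the real signature and policy check, the `manifest` struct only models the fields needed here, and `parseValidated` and `sha256Hex` are hypothetical helpers, not functions from podman or container-libs.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"errors"
	"fmt"
)

// manifest models only the parts of an image manifest used in this sketch.
type manifest struct {
	Layers []struct {
		Annotations map[string]string `json:"annotations"`
	} `json:"layers"`
}

var errDigestMismatch = errors.New("manifest bytes do not match the validated digest")

// sha256Hex returns the hex-encoded SHA-256 of b.
func sha256Hex(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// parseValidated checks raw against a digest that is assumed to be trusted
// (standing in for signature validation) and only then decodes raw itself.
// The manifest is never re-read from storage between check and use.
func parseValidated(raw []byte, trustedSHA256 string) (*manifest, error) {
	if sha256Hex(raw) != trustedSHA256 {
		return nil, errDigestMismatch
	}
	var m manifest
	if err := json.Unmarshal(raw, &m); err != nil {
		return nil, fmt.Errorf("decoding validated manifest: %w", err)
	}
	return &m, nil
}

func main() {
	raw := []byte(`{"layers":[{"annotations":{"io.containers.composefs.digest":"abc"}}]}`)
	m, err := parseValidated(raw, sha256Hex(raw))
	if err != nil {
		fmt.Println("validation failed:", err)
		return
	}
	fmt.Println(m.Layers[0].Annotations)
}
```

Any annotation read from the returned struct comes from the bytes that were checked, so swapping the stored manifest after validation cannot change the decision.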
const verityDigestAnnotation = "io.containers.composefs.digest"
func extractVerityDigests(imageData *libimage.ImageData) ([][]string, error) {
A further reason not to do this in Specgen: You don't know what storage driver is in use by Libpod, so you could try to run this check on a system using a btrfs or zfs store. From my initial read of the code it'll probably work but you definitely aren't getting the benefits you expect if composefs isn't the backing store.
Can you expand more on your security/threat model please? Right now the digest is looked up once and then put into the container config (db) before it is passed to the storage mount. You say you use a read-only fs, but the db of course must be writable, so an attacker could try to write to the sqlite db to unset these container config fields. For policy verification we also support loading the policy via an env var that points to any file, so that could also be used by an attacker if they can set it for the podman process.
Regarding quadlets being only in ro /usr/etc, I am not sure that alone matters. We also look up /run/containers/systemd/, which I guess would be writable, and systemd itself uses /run/systemd/system, which could be used to override the service with another unit. So I am not sure the controlled env will help you that much. If the attacker can get root with write access on the system, there is nothing we could do.
It's been a while since I looked at the podman codebase, so I marked this Draft for precisely the reason that I'd like some high-level review on where this should go, and you're right that there is a potential issue with the config being stored in a db post-validation. Doing the validation later seems better from this PoV, but then it may be harder to do the actual validation. Anyway, you correctly ask for a threat model, and I can at least give you what the automotive version of this is: Suppose you have a sealed system that verifies (secure-boot style) at boot and runtime:
In such a setup we can trust everything except /var, at least from a clean boot. This is good, because the goals are:
We already support embedding containers in the system image (as a separate image store directory), which gives us some of the above features for containers. However, we would also like to extend this to being able to install and update containers in /var, separate from the bootc image. This will allow faster and partial OTA updates with less risk. And we would like to keep the above features and goals for apps using such containers.
So, let's assume we can "trust" /usr, /etc, $PATH, etc. What can we do to ensure that we can also install container images in /var and that, when we run them, we run the code that was intended? This model should include hostile root-running code trying to persist root rights across reboot.
As the most basic example, let's assume we have a quadlet in the trusted read-only bootc image. This means we have a trusted podman binary, a trusted podman config, and a trusted podman run command line, but an untrusted /var/lib/containers. The storage config enables the overlayfs backend and the features to allow composefs images, and makes the container root read-only by default (and this is not overridden in the run command). Can we, in this world, give the above guarantees?
The hope is that we can run only trusted code and config up to the container being started, and have the container image content be validated using composefs digests that are trusted because they are referenced in a signed json manifest whose signature maps to a key that is part of the read-only trusted rootfs.
The main weakness I see is that an individual container may use a volume to persist data, and an attacker could modify it to attack the container, and then further use some container escape to get root access on the host. But, ignoring that vector, I think this is doable, although my initial draft may be naive in some aspects.
Thanks. Given that, I think the current approach sounds reasonable if we move more of the validation into libpod (container start time) and then do not store the annotations as part of the container config. Once validated with policy.json, they need to be passed along the call stack in memory to the storage mount code IMO. I have no real opinion on the image design questions, i.e. layer annotations for the hashes. Of course, once an attacker gains root on the running system they could turn off the policy.json verification and/or overmount /usr/bin/podman or the quadlet file.
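The shape of that suggestion could look roughly like the sketch below: validation runs on every start, and the resulting digests travel only as function arguments, never through the persisted container config. All names here (`Container`, `VerifyFunc`, `MountFunc`, `Start`) are hypothetical and are not libpod APIs.

```go
package main

import (
	"errors"
	"fmt"
)

// Container is a hypothetical stand-in for the persisted container config.
// It deliberately has no field for verity digests, so nothing written to
// the database can influence what the mount code will accept.
type Container struct {
	ImageID string
}

// VerifyFunc validates the image manifest against policy and returns the
// allowed fs-verity digests per layer.
type VerifyFunc func(imageID string) ([][]string, error)

// MountFunc mounts the image, enforcing the digests it is handed.
type MountFunc func(imageID string, allowedDigests [][]string) error

// Start re-validates on every call and passes the digests in memory
// straight to the mount step.
func Start(c Container, verify VerifyFunc, mount MountFunc) error {
	digests, err := verify(c.ImageID)
	if err != nil {
		return fmt.Errorf("validating image %s: %w", c.ImageID, err)
	}
	return mount(c.ImageID, digests)
}

func main() {
	verify := func(id string) ([][]string, error) {
		if id == "" {
			return nil, errors.New("no image")
		}
		return [][]string{{"digest-a"}}, nil
	}
	mount := func(id string, d [][]string) error {
		fmt.Printf("mounting %s with %d layer digest list(s)\n", id, len(d))
		return nil
	}
	if err := Start(Container{ImageID: "example"}, verify, mount); err != nil {
		fmt.Println("start failed:", err)
	}
}
```

With this shape, a stop/start cycle goes through `verify` again, which is the behaviour that specgen-time validation cannot provide.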
In the automotive sphere, we're interested in having some level of runtime validation. For the rootfs we already get this from bootc using composefs. However, if we're also using containers in /var/lib/containers, those are not protected by this. However, containers/storage already (optionally) supports composefs, so we should be able to do something similar for podman.
Here is what this MR, and the related changes in container-libs, do:
- `--security-opt signature=[check,require]`. If this is set, then the signature for the manifest is validated at `podman run` time, which allows us to trust the manifest data, like the annotations.
- `verity=require` (which requires all files in the mount to be backed by a file with a fs-verity digest from the composefs blob).
- If `--security-opt verity=enforce` is passed to `podman run`, then podman looks at the per-layer annotations in the image manifest (which we ideally trust due to a signature) for `io.containers.composefs.digest` keys, where you can give a list of allowable fs-verity digests. These are then forwarded to the overlayfs driver, which validates this at mount time.
With the above, we can have a pretty robust validation of the container at runtime. There are some weak points:
All of these are fixable in a controlled environment. For example, if you have a read-only /usr and /etc like bootc, and you ship a quadlet file in /usr that has the right arguments, then you can have some trust in that the right code is running, and you can do "podman pull" to get a new image version, keeping this trust.
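To make the per-layer annotation lookup concrete, here is a simplified sketch. It is not the PR's `extractVerityDigests` (which takes a `*libimage.ImageData`): the `Layer` type and `extractLayerDigests` name are hypothetical, and the comma-separated encoding of multiple digests in one annotation value is an assumption made only for this example.

```go
package main

import (
	"fmt"
	"strings"
)

const verityDigestAnnotation = "io.containers.composefs.digest"

// Layer is a hypothetical stand-in for a manifest layer descriptor.
type Layer struct {
	Annotations map[string]string
}

// extractLayerDigests returns, for each layer in order, the list of
// allowed fs-verity digests found in its annotation.
// Assumption: multiple digests are comma-separated in the annotation value.
func extractLayerDigests(layers []Layer) ([][]string, error) {
	result := make([][]string, 0, len(layers))
	for i, l := range layers {
		raw, ok := l.Annotations[verityDigestAnnotation]
		if !ok {
			return nil, fmt.Errorf("layer %d has no %s annotation", i, verityDigestAnnotation)
		}
		var digests []string
		for _, d := range strings.Split(raw, ",") {
			if d = strings.TrimSpace(d); d != "" {
				digests = append(digests, d)
			}
		}
		if len(digests) == 0 {
			return nil, fmt.Errorf("layer %d has an empty %s annotation", i, verityDigestAnnotation)
		}
		result = append(result, digests)
	}
	return result, nil
}

func main() {
	layers := []Layer{
		{Annotations: map[string]string{verityDigestAnnotation: "aa,bb"}},
		{Annotations: map[string]string{verityDigestAnnotation: "cc"}},
	}
	digests, err := extractLayerDigests(layers)
	fmt.Println(digests, err)
}
```

Failing when a layer lacks the annotation matches the intent of an enforcing mode: a layer without a trusted digest cannot be validated at mount time.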
I have an example signed image with annotations at https://quay.io/repository/alexl42/centos-verity. See the description there for the public key used. With I can run a validated image:
You can also see how it works with an unsigned and no-verity image:
Or if I tweak the composefs blob:
I don't expect this PR to just be necessarily merged as is, but I'd like to bring this up for discussion. We'd like a feature like this in automotive, is that reasonable? Is the approach reasonable? Is the interface reasonable?
Discussion points:
Some minor notes: