mm/slub: Add disable_canary kernel cmdline#112
Open
nbouchinet-anssi wants to merge 109 commits into
Signed-off-by: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Levente Polyak <levente@leventepolyak.net>
Signed-off-by: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Daniel Micay <danielmicay@gmail.com>
It can make sense to disable this to reduce attack surface / complexity.
Signed-off-by: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Daniel Micay <danielmicay@gmail.com>
As with the other sysctl proc handlers, expose the variant that checks CAP_SYS_ADMIN on write so that other subsystems' sysctl tables can use it. Signed-off-by: Levente Polyak <levente@leventepolyak.net> [nicolas.bouchinet@ssi.gouv.fr: Constify the ctl_table argument as in commit 78eb4ea] Signed-off-by: Nicolas Bouchinet <nicolas.bouchinet@ssi.gouv.fr>
Based on the public grsecurity patches. [thibaut.sautereau@ssi.gouv.fr: Adapt to sysctl code refactoring] Signed-off-by: Thibaut Sautereau <thibaut.sautereau@ssi.gouv.fr> Signed-off-by: Levente Polyak <levente@leventepolyak.net> [thibaut.sautereau@ssi.gouv.fr: Adapt to sysctl code refactoring] Signed-off-by: Nicolas Bouchinet <nicolas.bouchinet@oss.cyber.gouv.fr>
This moves the USB-related sysctl knobs into a dedicated, USB-local sysctl table, both to clean up the global sysctl table and to allow the knob to be exported and referenced appropriately when the USB components are built as modules. Signed-off-by: Levente Polyak <levente@leventepolyak.net>
The userspace API is left intact for compatibility. Signed-off-by: Levente Polyak <levente@leventepolyak.net>
This patch adds struct user_namespace *owner_user_ns to the tty_struct. It is then set to current_user_ns() in the alloc_tty_struct function. This is done to facilitate capability checks against the original user namespace that allocated the tty, e.g. ns_capable(tty->owner_user_ns, CAP_SYS_ADMIN). Combined with the use of user namespaces, this will allow hardening protections to be built to mitigate container escapes that utilize TTY ioctls such as TIOCSTI. See: https://bugzilla.redhat.com/show_bug.cgi?id=1411256 Acked-by: Serge Hallyn <serge@hallyn.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Matt Brown <matt@nmatt.com>
This introduces the tiocsti_restrict sysctl, whose default is controlled via CONFIG_SECURITY_TIOCSTI_RESTRICT. When activated, this control restricts all TIOCSTI ioctl calls from non CAP_SYS_ADMIN users.

This patch depends on patch 1/2.

This patch was inspired by GRKERNSEC_HARDEN_TTY.

This patch would have prevented https://bugzilla.redhat.com/show_bug.cgi?id=1411256 under the following conditions:
* non-privileged container
* container run inside new user namespace

Possible effects on userland:

There could be a few user programs that would be affected by this change. See: <https://codesearch.debian.net/search?q=ioctl%5C%28.*TIOCSTI>
Notable programs are: agetty, csh, xemacs and tcsh.

However, I still believe that this change is worth it given that the Kconfig defaults to n. This will be a feature that is turned on for the same reason that people activate it when using grsecurity. Users of this opt-in feature will realize that they are choosing security over some OS features like unprivileged TIOCSTI ioctls, as should be clear in the Kconfig help message.

Threat Model/Patch Rationale:

From grsecurity's config for GRKERNSEC_HARDEN_TTY:

| There are very few legitimate uses for this functionality and it
| has made vulnerabilities in several 'su'-like programs possible in
| the past. Even without these vulnerabilities, it provides an
| attacker with an easy mechanism to move laterally among other
| processes within the same user's compromised session.

So if one process within a tty session becomes compromised, it can follow that additional processes, which are thought to be in different security boundaries, can be compromised as a result. When using a program like su or sudo, these additional processes could be in a tty session where TTY file descriptors are indeed shared over privilege boundaries.

This is also an excellent writeup about the issue: <http://www.halfdog.net/Security/2012/TtyPushbackPrivilegeEscalation/>

When user namespaces are in use, the check for the capability CAP_SYS_ADMIN is done against the user namespace that originally opened the tty.

Acked-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Matt Brown <matt@nmatt.com>
Signed-off-by: Thibaut Sautereau <thibaut.sautereau@ssi.gouv.fr>
Signed-off-by: Levente Polyak <levente@leventepolyak.net>
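
The decision described in the two TTY commit messages above can be sketched as a small userspace model. This is not the kernel code: `user_ns_model`, `tty_model`, `task_model`, `model_ns_capable_admin` and `tiocsti_allowed` are hypothetical names, and the capability check is simplified to a pointer comparison that ignores the user namespace hierarchy handled by the real `ns_capable()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a user namespace; not the kernel's struct. */
struct user_ns_model {
    int id;
};

/* Models the tty remembering the namespace that allocated it. */
struct tty_model {
    const struct user_ns_model *owner_user_ns;
};

/* Models a task holding CAP_SYS_ADMIN in at most one namespace (NULL: none). */
struct task_model {
    const struct user_ns_model *ns_with_admin;
};

/* Simplified capability check: assumes no namespace hierarchy. */
static bool model_ns_capable_admin(const struct task_model *task,
                                   const struct user_ns_model *ns)
{
    return task->ns_with_admin != NULL && task->ns_with_admin == ns;
}

/* TIOCSTI is refused only when the restriction is on and the caller lacks
 * CAP_SYS_ADMIN in the namespace that owns the tty. */
static bool tiocsti_allowed(int tiocsti_restrict,
                            const struct task_model *task,
                            const struct tty_model *tty)
{
    assert(task != NULL && tty != NULL);
    if (!tiocsti_restrict)
        return true;
    return model_ns_capable_admin(task, tty->owner_user_ns);
}
```

In this model, a task that is "root" only inside a container's user namespace is refused on a tty allocated by the host namespace, which is the container escape scenario the commit message refers to.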
Signed-off-by: Daniel Micay <danielmicay@gmail.com>
With 46c7dd5 ("modpost: always show verbose warning for section mismatch"), sec_mismatch_verbose was removed which would have printed errors for all writable function pointers during compilation if it hadn't been "#if 0"ed out for quite some time now. Let's introduce a new DEBUG_WRITABLE_FUNCTION_POINTERS_VERBOSE Kconfig option to cleanly control this linux-hardened functionality. Signed-off-by: Thibaut Sautereau <thibaut.sautereau@ssi.gouv.fr> Signed-off-by: Levente Polyak <levente@leventepolyak.net>
Commit a9cd410 ("mm/page_alloc.c: memory hotplug: free pages as higher order") changed `static void __init __free_pages_boot_core()` into `void __free_pages_core()`, causing the following section mismatch warning at compile time: WARNING: vmlinux.o(.text+0x180fe4): Section mismatch in reference from the function __free_pages_core() to the variable .meminit.data:extra_latent_entropy The function __free_pages_core() references the variable __meminitdata extra_latent_entropy. This is often because __free_pages_core lacks a __meminitdata annotation or the annotation of extra_latent_entropy is wrong. This commit is an attempt at fixing this issue. I'm not sure it's OK as we are accessing pages that are still managed by the memblock allocator. The prefetching part is not an issue as it only affects struct pages. Signed-off-by: Thibaut Sautereau <thibaut.sautereau@ssi.gouv.fr> [levente@leventepolyak.net: most of core MM initialization moved to mm/mm_init.c] Signed-off-by: Levente Polyak <levente@leventepolyak.net> [nicolas.bouchinet@ssi.gouv.fr: MAX_ORDER has been renamed to MAX_PAGE_ORDER (see 5e0a760)] Signed-off-by: Nicolas Bouchinet <nicolas.bouchinet@ssi.gouv.fr>
This has required some rework during the port to 5.13, due to da844b7 ("kasan, mm: integrate slab init_on_alloc with HW_TAGS"), and the patch is actually quite simpler now since we do not need to unpoison objects anymore. Signed-off-by: Levente Polyak <levente@leventepolyak.net> Signed-off-by: Thibaut Sautereau <thibaut.sautereau@ssi.gouv.fr> [nicolas.bouchinet@ssi.gouv.fr: pre/post-alloc hooks moved from mm/slab.h to mm/slub.c (see 6011be5)] Signed-off-by: Nicolas Bouchinet <nicolas.bouchinet@ssi.gouv.fr>
This is modified from Brad Spengler/PaX Team's code in the last public patch of grsecurity/PaX based on my understanding of the code. Changes or omissions from the original code are mine and don't reflect the original grsecurity/PaX code. TCP simultaneous connect adds a weakness in Linux's implementation of TCP that allows two clients to connect to each other without either entering a listening state. The weakness allows an attacker to easily prevent a client from connecting to a known server provided the source port for the connection is guessed correctly. As the weakness could be used to prevent an antivirus or IPS from fetching updates, or prevent an SSL gateway from fetching a CRL, it should be eliminated. This creates a net.ipv4.tcp_simult_connect sysctl that when disabled, disables TCP simultaneous connect. Reviewed-by: Thibaut Sautereau <thibaut.sautereau@ssi.gouv.fr> Reviewed-by: Levente Polyak <levente@leventepolyak.net> Signed-off-by: Levente Polyak <levente@leventepolyak.net>
When disabled, unprivileged users will not be able to create new overlayfs mounts. This reduces the attack surface when unprivileged user namespace mounts, such as those used to run rootless containers, are not required. Signed-off-by: Levente Polyak <levente@leventepolyak.net>
Trigger BUG when kfence encounters data corruption of kfence managed objects. This allows a finer-grained control instead of globally enabling panic_on_warn. Signed-off-by: Levente Polyak <levente@leventepolyak.net>
Before commit d0fe47c ("slub: add back check for free nonslab objects"), freeing a non-slab object used to trigger a BUG if CONFIG_DEBUG_VM was enabled. Now it only warns, which I think is not enough for such a memory corruption. Let's restore the previous behaviour, but tie it to CONFIG_BUG_ON_DATA_CORRUPTION as suggested by Levente. After page folios were introduced in v5.17, this patch was adapted to trigger a bug when the order of the folio is zero instead of when the page is not a compound page, which is not equivalent but respects the semantics of the conversion to page folios and follows the change made to the WARN_ON_ONCE beneath. Suggested-by: Levente Polyak <levente@leventepolyak.net> Signed-off-by: Thibaut Sautereau <thibaut.sautereau@ssi.gouv.fr> [nicolas.bouchinet@ssi.gouv.fr: kfree moved from mm/slab_common.c to mm/slub.c (see b774d3e)] Signed-off-by: Nicolas Bouchinet <nicolas.bouchinet@ssi.gouv.fr>
This forces processes to have `CAP_SYS_ADMIN` or to be in the io_uring_group in order to use io_uring. The patch alters the sysctl value range so that once it is set to "2" it cannot be lowered again. The io_uring_group sysctl option is set to -1 by default; users should define a proper group and set the sysctl accordingly if they want it configured. Signed-off-by: Nicolas Bouchinet <nicolas.bouchinet@ssi.gouv.fr>
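
The "cannot be lowered once set to 2" behaviour can be illustrated with a minimal userspace sketch. It assumes the ratchet is implemented by raising the accepted minimum to 2 once the current value is 2; the commit message does not show how the patch actually does it, and `io_uring_disabled_write` and `IO_URING_DISABLED_MAX` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define IO_URING_DISABLED_MAX 2

/* Model of a sysctl write handler with a one-way ratchet at the maximum.
 * Returns 0 on success, -EINVAL if the requested value is out of range. */
static int io_uring_disabled_write(int *cur, int requested)
{
    int min;

    assert(cur != NULL);

    /* Assumption: once the knob reaches 2, the lower bound becomes 2. */
    min = (*cur == IO_URING_DISABLED_MAX) ? IO_URING_DISABLED_MAX : 0;

    if (requested < min || requested > IO_URING_DISABLED_MAX)
        return -EINVAL;

    *cur = requested;
    return 0;
}
```

With this shape, moving between 0 and 1 stays possible, while any write below 2 fails after the knob has been set to 2.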
Since we expose proc_dointvec_minmax_sysadmin, add it to sanity checking functions. Signed-off-by: Nicolas Bouchinet <nicolas.bouchinet@ssi.gouv.fr>
Signed-off-by: Levente Polyak <levente@leventepolyak.net>
With the introduction of barns and sheaves, slab objects are used to prefill or refill sheaves, which are caches of small objects taking the form of an array of pointers to slab objects. Sheaves are then used for quick allocation and free, which consist of shrinking and growing the array index. Thus, there are two views of the allocation state of those objects: while the slab allocator sees them as allocated, the sheaf allocator sees them as free and then allocates them. We thus need to adapt the slab canary patch to avoid sanitizing object allocations and frees that go through this array. A follow-up patch will add a per-sheaf random canary value, which should allow better tracking of object overflows. Signed-off-by: Levente Polyak <levente@leventepolyak.net> Signed-off-by: Nicolas Bouchinet <nicolas.bouchinet@ssi.gouv.fr>
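
As a rough illustration of the sheaf idea described above (an array of object pointers whose index shrinks on allocation and grows on free), here is a minimal userspace sketch. `sheaf_model`, `SHEAF_CAPACITY`, `sheaf_alloc` and `sheaf_free` are hypothetical names and do not reflect the kernel's actual structures or API.

```c
#include <assert.h>
#include <stddef.h>

#define SHEAF_CAPACITY 4

/* Model of a sheaf: a fixed array of pointers to already-allocated slab
 * objects, plus the number of valid entries. */
struct sheaf_model {
    size_t size;
    void *objects[SHEAF_CAPACITY];
};

/* Freeing into the sheaf grows the index. Returns -1 when the sheaf is
 * full, in which case a real allocator would fall back to the slab. */
static int sheaf_free(struct sheaf_model *sheaf, void *obj)
{
    assert(sheaf != NULL && obj != NULL);
    if (sheaf->size == SHEAF_CAPACITY)
        return -1;
    sheaf->objects[sheaf->size++] = obj;
    return 0;
}

/* Allocating from the sheaf shrinks the index. Returns NULL when empty. */
static void *sheaf_alloc(struct sheaf_model *sheaf)
{
    assert(sheaf != NULL);
    if (sheaf->size == 0)
        return NULL;
    return sheaf->objects[--sheaf->size];
}
```

Objects sitting in `objects[]` are the ones with the "two views" problem: the slab layer already handed them out, yet from the caller's perspective they are still free.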
Sheaf allocation is an allocation cache that uses pre-allocated slab objects for faster free and allocation from a sheaf array. This patch adds a sheaf canary in order to detect small overflows and double-free of sheaf objects. Signed-off-by: Levente Polyak <levente@leventepolyak.net> Signed-off-by: Nicolas Bouchinet <nicolas.bouchinet@ssi.gouv.fr>
- Invert the canary logic: we now only track objects in their inactive state. Allocated objects are always tagged as random_active. Free objects are tagged as sheaf_random_inactive or random_inactive depending on whether they are in a sheaf or in a slab freelist. The logic inversion should make the patch much more stable.
- Fix slab_debug canary crash in early allocation state when the bootstrap sheaf is in use.
- Fix slabobj_ext offset computation when stored in objects.
- Always instrument sheaf_canary, even when slab_debug is active.
- Fix canary mismatch in some free paths.
- Adapt canary to new alloc/free paths.
- Fix kmem_cache_refill_sheaf instrumentation.

Signed-off-by: Levente Polyak <levente@leventepolyak.net>
Signed-off-by: Nicolas Bouchinet <nicolas.bouchinet@ssi.gouv.fr>
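
The tagging scheme from the first item above can be sketched as a small state model. The names random_active, random_inactive and sheaf_random_inactive come from the commit message; everything else (`canary_secrets`, `object_model`, `model_alloc`, `model_free`, the canary being a plain trailing word, and checks happening on both alloc and free) is an assumption made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical secrets; the real values would be random. */
struct canary_secrets {
    uint64_t random_active;
    uint64_t random_inactive;
    uint64_t sheaf_random_inactive;
};

/* Model object with a canary word stored after the payload. */
struct object_model {
    unsigned char payload[16];
    uint64_t canary;
};

enum free_location { IN_SLAB_FREELIST, IN_SHEAF };

/* Free objects are tagged according to where they are cached. */
static uint64_t inactive_value(const struct canary_secrets *k,
                               enum free_location where)
{
    return where == IN_SHEAF ? k->sheaf_random_inactive : k->random_inactive;
}

/* Free: the object must currently be active. A mismatch (for example a
 * double free or an overwritten canary) is reported by returning false. */
static bool model_free(const struct canary_secrets *k, struct object_model *o,
                       enum free_location where)
{
    assert(k != NULL && o != NULL);
    if (o->canary != k->random_active)
        return false;
    o->canary = inactive_value(k, where);
    return true;
}

/* Alloc: the object must carry the inactive tag matching its location;
 * on success it is always tagged random_active. */
static bool model_alloc(const struct canary_secrets *k, struct object_model *o,
                        enum free_location where)
{
    assert(k != NULL && o != NULL);
    if (o->canary != inactive_value(k, where))
        return false;
    o->canary = k->random_active;
    return true;
}
```

Keeping a single active tag means an allocated object looks the same regardless of whether it came from a sheaf or a slab freelist, so only the free side needs to know the location.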
With canary_debug, a canary mismatch prints the expected canary values and the value that was actually encountered. Signed-off-by: Levente Polyak <levente@leventepolyak.net> Signed-off-by: Nicolas Bouchinet <nicolas.bouchinet@ssi.gouv.fr>
Signed-off-by: Nicolas Bouchinet <nicolas.bouchinet@ssi.gouv.fr>
Explicitly define CONST_CAST_TREE for gcc-16. It was removed in gcc trunk; see commits c3d96ff9e916c02584aa081f03ab999292efbb50 and 458c7926d48959abcb2c1adaa22458e27459a551. Link: https://www.spinics.net/lists/kernel/msg6111050.html
Signed-off-by: Levente Polyak <levente@leventepolyak.net>
Signed-off-by: Nicolas Bouchinet <nicolas.bouchinet@ssi.gouv.fr>
a8818b3 to 85ab20b
This commit adds an option to disable the canary from the kernel command line.
Since the introduction of sheaves, the canary patch has grown in complexity and we have encountered various crashes, such as #111.
This option lets users disable the canary without disabling the whole patchset or falling back to an obsolete linux-hardened version.
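
For readers unfamiliar with boot-time flags, the following is a userspace sketch of how a bare `disable_canary` token could be recognised on a command line. It is only a model: in the kernel such options are normally registered with `__setup()` or `early_param()`, the diff itself is not shown here, and `cmdline_has_flag` is a hypothetical helper. Whether the real option takes a value is not stated in the description; a valueless flag is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if `flag` appears as a whole space-separated token in
 * `cmdline`. Prefix matches such as "disable_canary_foo" do not count. */
static bool cmdline_has_flag(const char *cmdline, const char *flag)
{
    size_t flag_len;
    const char *p;

    assert(cmdline != NULL && flag != NULL);
    flag_len = strlen(flag);
    p = cmdline;

    while (*p) {
        const char *start;

        /* Skip separators before the next token. */
        while (*p == ' ')
            p++;
        start = p;
        /* Advance to the end of the token. */
        while (*p && *p != ' ')
            p++;

        if (flag_len > 0 && (size_t)(p - start) == flag_len &&
            strncmp(start, flag, flag_len) == 0)
            return true;
    }
    return false;
}
```

A caller would typically evaluate this once at boot and store the result in a flag that the canary checks consult, so the hot paths do not re-parse the command line.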