Fix Android ARM64 build for torchao lowbit kernels #4029
Merged
meta-codesync[bot] merged 1 commit into pytorch:main on Mar 14, 2026
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4029

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Unrelated Failures
As of commit af0b021 with merge base 5e79e5e:
NEW FAILURE - The following job has failed:
BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.
SS-JIA added a commit to SS-JIA/ao that referenced this pull request Mar 9, 2026

Summary: D95224222 added torchao ARM lowbit kernel dependencies to the ExecuTorch llama runner for ARM64 builds, but the Buck targets had two issues that prevented the Android ARM64 build from succeeding.

1. `std::aligned_alloc` is not available on Android API < 28. Android's Bionic libc only added `aligned_alloc` in API 28 (Android 9 Pie). The NDK's libc++ declares `using ::aligned_alloc _LIBCPP_USING_IF_EXISTS`, which silently becomes unresolved when targeting API < 28 (the default `app_platform` is android-21). This caused a compile error in `shared_kernels/internal/memory.h`. Fixed by using `posix_memalign` (available since API 16) as a fallback when `__ANDROID_API__ < 28`.

2. The aarch64 `linear` kernels use ARM dot product intrinsics (`vdotq_s32`), which require the `+dotprod` architecture feature. The CMake build already passed `-march=armv8.4-a+dotprod`, but the Buck targets were missing this flag for Android builds. Fixed by adding `-march=armv8.2-a+dotprod` to `fbandroid_compiler_flags` in both the `aarch64/linear` target and the `op_linear_8bit_act_xbit_weight_executorch` target.

Differential Revision: D95832860
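For readers who want to see the shape of the first fix, here is a minimal sketch of an aligned allocation helper that falls back to `posix_memalign` when `__ANDROID_API__ < 28`. The function name `allocate_aligned` is hypothetical and is not taken from `shared_kernels/internal/memory.h`; the actual change in this PR may be structured differently.

```cpp
#include <stdlib.h>  // posix_memalign (POSIX), free

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>  // std::aligned_alloc, std::free

// Hypothetical helper, not the torchao implementation.
// Returns `size` bytes aligned to `alignment`, or nullptr on failure.
// The caller releases the memory with std::free in both branches.
inline void* allocate_aligned(std::size_t alignment, std::size_t size) {
  // Both APIs expect a power-of-two alignment.
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
#if defined(__ANDROID_API__) && __ANDROID_API__ < 28
  // Bionic has no aligned_alloc before API 28, so use posix_memalign.
  // posix_memalign also requires alignment to be a multiple of sizeof(void*).
  void* ptr = nullptr;
  if (posix_memalign(&ptr, alignment, size) != 0) {
    return nullptr;
  }
  return ptr;
#else
  // aligned_alloc traditionally expects size to be a multiple of alignment,
  // so round the request up before calling it.
  const std::size_t rounded = (size + alignment - 1) / alignment * alignment;
  return std::aligned_alloc(alignment, rounded);
#endif
}
```

Guarding on `__ANDROID_API__` keeps the non-Android path unchanged, which is presumably why the fix was written as a fallback rather than a blanket switch to `posix_memalign`.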
digantdesai (Contributor) requested changes Mar 9, 2026

digantdesai left a comment: Review automatically exported from Phabricator review in Meta.
JacobSzwejbka approved these changes Mar 11, 2026

6b2619d to da12a44
43510af to 02c2861
02c2861 to d734f1f
d734f1f to da8b96d
da8b96d to af0b021
Summary:
D95224222 added torchao ARM lowbit kernel dependencies to the ExecuTorch llama runner for ARM64 builds, but the Buck targets had two issues that prevented the Android ARM64 build from succeeding.

1. `std::aligned_alloc` is not available on Android API < 28. Android's Bionic libc only added `aligned_alloc` in API 28 (Android 9 Pie). The NDK's libc++ declares `using ::aligned_alloc _LIBCPP_USING_IF_EXISTS`, which silently becomes unresolved when targeting API < 28 (the default `app_platform` is android-21). This caused a compile error in `shared_kernels/internal/memory.h`. Fixed by using `posix_memalign` (available since API 16) as a fallback when `__ANDROID_API__ < 28`.

2. The aarch64 `linear` kernels use ARM dot product intrinsics (`vdotq_s32`), which require the `+dotprod` architecture feature. The CMake build already passed `-march=armv8.4-a+dotprod`, but the Buck targets were missing this flag for Android builds. Fixed by adding `-march=armv8.2-a+dotprod` to `fbandroid_compiler_flags` in both the `aarch64/linear` target and the `op_linear_8bit_act_xbit_weight_executorch` target.

Reviewed By: tanvirislam-meta

Differential Revision: D95832860
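To illustrate why the second fix needs the `+dotprod` feature, the sketch below computes a 16-element int8 dot product with `vdotq_s32` when the compiler defines `__ARM_FEATURE_DOTPROD` (as it does with a flag such as `-march=armv8.2-a+dotprod`), and with a scalar loop otherwise. The function `dot16_i8` is a hypothetical illustration, not one of the torchao `linear` kernels; those kernels use the intrinsic directly, so they fail to compile when the flag is missing.

```cpp
#include <cstdint>

#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
#include <arm_neon.h>
#endif

// Hypothetical example: dot product of two 16-element int8 vectors.
inline std::int32_t dot16_i8(const std::int8_t* a, const std::int8_t* b) {
#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
  // vdotq_s32 accumulates four 4-way int8 dot products into four int32 lanes.
  int32x4_t acc = vdupq_n_s32(0);
  acc = vdotq_s32(acc, vld1q_s8(a), vld1q_s8(b));
  // Horizontal add of the four lanes gives the full 16-element dot product.
  return vaddvq_s32(acc);
#else
  // Portable fallback used when the dot product feature is not enabled.
  std::int32_t sum = 0;
  for (int i = 0; i < 16; ++i) {
    sum += static_cast<std::int32_t>(a[i]) * static_cast<std::int32_t>(b[i]);
  }
  return sum;
#endif
}
```

Both branches return the same value for the same inputs, so the flag only changes which instructions are emitted, not the result.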