Fix Android ARM64 build for torchao lowbit kernels #4029

Merged
meta-codesync[bot] merged 1 commit into pytorch:main from SS-JIA:export-D95832860
Mar 14, 2026

SS-JIA (Contributor) commented Mar 9, 2026

Summary:
D95224222 added torchao ARM lowbit kernel dependencies to the ExecuTorch
llama runner for ARM64 builds, but two issues in the Buck targets
prevented the Android ARM64 build from succeeding.

  1. `std::aligned_alloc` is not available on Android API < 28.
    Android's Bionic libc only added `aligned_alloc` in API 28 (Android 9 Pie).
    The NDK's libc++ declares `using ::aligned_alloc _LIBCPP_USING_IF_EXISTS`,
    which silently becomes an unresolved declaration when targeting API < 28
    (the default `app_platform` is android-21), so any use of
    `std::aligned_alloc` fails to compile. This caused a compile error in
    `shared_kernels/internal/memory.h`. Fixed by falling back to
    `posix_memalign` (available since API 16) when `__ANDROID_API__ < 28`.

  2. The aarch64 `linear` kernels use ARM dot product intrinsics (`vdotq_s32`),
    which require the `+dotprod` architecture feature. The CMake build already
    passed `-march=armv8.4-a+dotprod`, but the Buck targets passed no
    `+dotprod` flag for Android builds. Fixed by adding
    `-march=armv8.2-a+dotprod` to `fbandroid_compiler_flags` in both the
    `aarch64/linear` target and the
    `op_linear_8bit_act_xbit_weight_executorch` target.
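
For readers who want to see the shape of the two fixes, here is a minimal sketch. It is not the code from `shared_kernels/internal/memory.h` or from the kernels. The helper names `aligned_alloc_compat`, `aligned_free_compat`, and `dot_i8x16` are hypothetical, and the scalar fallback in `dot_i8x16` is an illustration only; the PR itself enables the feature through `-march=armv8.2-a+dotprod` rather than adding a fallback.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <stdlib.h>  // posix_memalign lives in the global namespace

#if defined(__ARM_FEATURE_DOTPROD)
#include <arm_neon.h>  // vdotq_s32 is only usable when +dotprod is enabled
#endif

// Hypothetical helper: allocate `size` bytes aligned to `alignment`.
// `alignment` must be a power of two.
inline void* aligned_alloc_compat(std::size_t alignment, std::size_t size) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
#if defined(__ANDROID__) && defined(__ANDROID_API__) && __ANDROID_API__ < 28
  // Bionic has no aligned_alloc before API 28, so use posix_memalign.
  // posix_memalign also requires a multiple of sizeof(void*).
  if (alignment < sizeof(void*)) {
    alignment = sizeof(void*);
  }
  void* ptr = nullptr;
  if (posix_memalign(&ptr, alignment, size) != 0) {
    return nullptr;
  }
  return ptr;
#else
  // Some C libraries expect size to be a multiple of alignment,
  // so round it up before calling std::aligned_alloc.
  const std::size_t rounded = (size + alignment - 1) / alignment * alignment;
  return std::aligned_alloc(alignment, rounded);
#endif
}

// Memory from either path above is released with free().
inline void aligned_free_compat(void* ptr) { std::free(ptr); }

// Hypothetical helper: dot product of two 16-element int8 vectors.
// __ARM_FEATURE_DOTPROD is defined by the compiler when the target
// architecture includes +dotprod (for example -march=armv8.2-a+dotprod).
inline std::int32_t dot_i8x16(const std::int8_t* a, const std::int8_t* b) {
#if defined(__ARM_FEATURE_DOTPROD)
  const int32x4_t acc = vdotq_s32(vdupq_n_s32(0), vld1q_s8(a), vld1q_s8(b));
  return vaddvq_s32(acc);  // horizontal sum of the four 32-bit lanes
#else
  // Scalar path, used here only so the sketch builds on other targets.
  std::int32_t sum = 0;
  for (int i = 0; i < 16; ++i) {
    sum += static_cast<std::int32_t>(a[i]) * static_cast<std::int32_t>(b[i]);
  }
  return sum;
#endif
}
```

Without the `+dotprod` flag the compiler does not define `__ARM_FEATURE_DOTPROD`, and an unguarded call to `vdotq_s32` is rejected at compile time, which matches the failure mode described in item 2.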

Reviewed By: tanvirislam-meta

Differential Revision: D95832860

pytorch-bot Bot commented Mar 9, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4029

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Unrelated Failures

As of commit af0b021 with merge base 5e79e5e (image):

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-codesync Bot commented Mar 9, 2026

@SS-JIA has exported this pull request. If you are a Meta employee, you can view the originating Diff in D95832860.

meta-cla Bot added the CLA Signed label Mar 9, 2026
SS-JIA added a commit to SS-JIA/ao that referenced this pull request Mar 9, 2026
@SS-JIA SS-JIA force-pushed the export-D95832860 branch from b41bf73 to 0ee22e2 Compare March 9, 2026 19:15
digantdesai (Contributor) left a comment

Review automatically exported from Phabricator review in Meta.

SS-JIA added a commit to SS-JIA/ao that referenced this pull request Mar 9, 2026
@SS-JIA SS-JIA force-pushed the export-D95832860 branch from 0ee22e2 to 28fa77f Compare March 9, 2026 19:38
SS-JIA added a commit to SS-JIA/ao that referenced this pull request Mar 9, 2026
@SS-JIA SS-JIA force-pushed the export-D95832860 branch from 28fa77f to a156d7c Compare March 9, 2026 19:42
SS-JIA added a commit to SS-JIA/ao that referenced this pull request Mar 9, 2026
@SS-JIA SS-JIA force-pushed the export-D95832860 branch from a156d7c to 6b2619d Compare March 9, 2026 22:27
@meta-codesync meta-codesync Bot changed the title from "Fix Android ARM64 build for torchao lowbit kernels" to "Fix Android ARM64 build for torchao lowbit kernels (#4029)" Mar 13, 2026
@SS-JIA SS-JIA force-pushed the export-D95832860 branch from 6b2619d to da12a44 Compare March 13, 2026 19:28
SS-JIA added a commit to SS-JIA/ao that referenced this pull request Mar 13, 2026
SS-JIA added a commit to SS-JIA/ao that referenced this pull request Mar 13, 2026
@SS-JIA SS-JIA force-pushed the export-D95832860 branch 2 times, most recently from 43510af to 02c2861 Compare March 13, 2026 19:32
@meta-codesync meta-codesync Bot changed the title from "Fix Android ARM64 build for torchao lowbit kernels (#4029)" to "Fix Android ARM64 build for torchao lowbit kernels" Mar 13, 2026
@SS-JIA SS-JIA force-pushed the export-D95832860 branch from 02c2861 to d734f1f Compare March 13, 2026 21:03
@meta-codesync meta-codesync Bot changed the title from "Fix Android ARM64 build for torchao lowbit kernels" to "Fix Android ARM64 build for torchao lowbit kernels (#4029)" Mar 13, 2026
SS-JIA added a commit to SS-JIA/ao that referenced this pull request Mar 13, 2026
@SS-JIA SS-JIA force-pushed the export-D95832860 branch from d734f1f to da8b96d Compare March 13, 2026 21:17
@meta-codesync meta-codesync Bot changed the title from "Fix Android ARM64 build for torchao lowbit kernels (#4029)" to "Fix Android ARM64 build for torchao lowbit kernels" Mar 13, 2026
@SS-JIA SS-JIA force-pushed the export-D95832860 branch from da8b96d to af0b021 Compare March 13, 2026 23:54
@meta-codesync meta-codesync Bot merged commit bb8897d into pytorch:main Mar 14, 2026
18 of 22 checks passed
Labels

CLA Signed, fb-exported, meta-exported

3 participants