field: force-inline 5x52 mul and sqr #1859

Open
l0rinc wants to merge 1 commit into bitcoin-core:master from l0rinc:l0rinc/force-inline-5x52-mul-sqr

l0rinc commented May 30, 2026

Problem: The 5x52 field multiplication and squaring routines are hot in group arithmetic and scalar multiplication. Some compilers leave the thin wrappers and int128 inner helpers out of line, which keeps a call boundary in this hot path and limits scheduling of the 64x64->128 arithmetic.
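
To make the call boundary concrete, here is a minimal sketch of the pattern, not the library's actual code. An inner helper accumulates a 64x64->128 product, and a thin wrapper calls it for one column of a limb product. All `sketch_*` names are hypothetical, and the block assumes a compiler that provides `unsigned __int128` (GCC or Clang on a 64-bit target).

```c
#include <stdint.h>

/* Assumption: the compiler provides unsigned __int128 (GCC/Clang, 64-bit). */
typedef unsigned __int128 sketch_u128;

/* Inner helper: add the full 64x64->128 product a*b into the accumulator. */
static inline void sketch_u128_accum_mul(sketch_u128 *acc, uint64_t a, uint64_t b) {
    *acc += (sketch_u128)a * b;
}

/* Thin wrapper: one column of a two-limb product, a[0]*b[1] + a[1]*b[0].
 * Returns the low 52 bits and stores the remaining high bits in *carry.
 * Inputs are assumed to be small enough (roughly 52-bit limbs) that the
 * sum fits in 128 bits and the carry fits in 64 bits. */
static inline uint64_t sketch_two_limb_column(const uint64_t *a, const uint64_t *b,
                                              uint64_t *carry) {
    sketch_u128 acc = 0;
    sketch_u128_accum_mul(&acc, a[0], b[1]);
    sketch_u128_accum_mul(&acc, a[1], b[0]);
    *carry = (uint64_t)(acc >> 52);
    return (uint64_t)acc & 0xFFFFFFFFFFFFFULL; /* low 52 bits */
}
```

If the compiler keeps `sketch_u128_accum_mul` out of line, every product in the column costs a call and the multiplies cannot be interleaved across calls. With plain `inline` that decision is left to the compiler's heuristics.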

Fix: Define SECP256K1_FORCE_INLINE next to the existing inline helper and use it for the 5x52 multiplication and squaring wrappers and int128 inner helpers.

For default optimized builds, this expands to __forceinline on MSVC-compatible compilers and to __attribute__((always_inline)) on GCC-compatible compilers. It falls back to the existing inline spelling when inlining is disabled, when optimization is disabled, when optimizing for size on GCC/Clang, or when _DEBUG is defined.
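
The selection logic described above could look roughly like the following. This is a sketch under stated assumptions, not the macro from the commit: the `SKETCH_*` names and the `SKETCH_NO_FORCE_INLINE` opt-out switch are hypothetical, `SKETCH_INLINE` stands in for the library's existing inline helper, and the "optimization enabled" check relies on the `__OPTIMIZE__` and `__OPTIMIZE_SIZE__` macros that GCC and Clang predefine.

```c
#include <stdint.h>

/* Stand-in for the existing inline helper; assumes C99 or later. */
#define SKETCH_INLINE inline

#if defined(SKETCH_NO_FORCE_INLINE) || defined(_DEBUG)
/* Hypothetical opt-out switch, or a debug build: keep the plain spelling. */
#  define SKETCH_FORCE_INLINE SKETCH_INLINE
#elif defined(_MSC_VER)
/* MSVC-compatible compilers (including clang-cl). */
#  define SKETCH_FORCE_INLINE __forceinline
#elif defined(__GNUC__) && defined(__OPTIMIZE__) && !defined(__OPTIMIZE_SIZE__)
/* GCC-compatible compilers, optimizing but not for size. */
#  define SKETCH_FORCE_INLINE SKETCH_INLINE __attribute__((always_inline))
#else
/* Unoptimized, size-optimized, or unknown compiler: plain inline. */
#  define SKETCH_FORCE_INLINE SKETCH_INLINE
#endif

/* A small helper marked with the macro. */
static SKETCH_FORCE_INLINE uint64_t sketch_sqr_lo(uint64_t x) {
    return x * x; /* low 64 bits of the square */
}

/* A caller; in an optimized build both calls are expanded in place. */
uint64_t sketch_sum_of_squares(uint64_t a, uint64_t b) {
    return sketch_sqr_lo(a) + sketch_sqr_lo(b);
}
```

The fallback branches matter because forcing inlining at `-O0` or `-Os` works against what those build modes ask for; in those cases the macro behaves exactly like the existing inline spelling.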

Benchmarks: Values are relative changes in Min(us), lower is better.

| Source | Host / CPU | Compiler | ecdsa_verify | ecdh | schnorrsig_verify | field_sqr | field_mul |
|---|---|---|---:|---:|---:|---:|---:|
| local | M4-Max.local | gcc-14 14.3.0 | -9.1% | -9.0% | -9.6% | -7.0% | -4.0% |
| local | i9-ssd | GCC 16.1.0 | -5.3% | -4.1% | -5.5% | -15.7% | -11.6% |
| local | WIN-A2EHOAU4JET / Xeon E5-2637 v2 | MSVC 19.50.35728 | -2.6% | -9.3% | -2.4% | -7.4% | -7.4% |
| local | i7-hdd | GCC 14.2.0 | -10.9% | -11.1% | -10.5% | -9.4% | -21.6% |
| local | umbrel / Intel N150 | GCC 12.2.0 | -4.9% | -4.3% | -4.6% | +0.6% | -1.1% |
| local | rpi5-16-3 | GCC 14.2.0 | -0.6% | -0.7% | -0.6% | -5.5% | -1.0% |
| local | rpi4-2-1 | GCC 14.2.0 | -2.7% | -2.3% | -2.7% | -5.6% | -4.0% |
| local | nodl / Cortex-A53 | GCC 11.4.0 | -3.3% | -7.6% | -5.7% | -9.9% | -1.8% |
| andrewtoth | i9-14900HX | GCC 12.3 | -5.3% | -4.2% | -5.6% | -1.5% | -6.1% |
| theStack | Snapdragon X Elite X1E-78-100 | GCC 14.2.0 | -11.2% | n/a | -11.1% | n/a | n/a |
| sipa | Ryzen 5950X | GCC 15.2.0 | -11.4% | -10.4% | -8.4% | n/a | n/a |
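
For reference, each percentage is the relative change in Min(us), computed the same way as the `100*(a-b)/b` expression in the awk step of the Linux benchmarking script. A minimal sketch of that arithmetic, with a hypothetical function name:

```c
/* Relative change in percent between two timings.
 * Negative means the "after" build is faster (lower Min(us)). */
double sketch_relative_change_pct(double before_us, double after_us) {
    return 100.0 * (after_us - before_us) / before_us;
}
```

For the M4-Max `ecdsa_verify` row, `sketch_relative_change_pct(17.5, 15.9)` evaluates to about -9.14, which the table shows as -9.1%.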

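Tradeoffs: The speedups reproduce most consistently with GCC and MSVC. Clang was less consistently positive: in the Clang 22.1.6 run on i9-ssd, the API-level benchmarks changed by at most 1.4% and ecmult_wnaf regressed by 11.6%.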

Inlining also increases code size:

| Platform | Artifact | Before (bytes) | After (bytes) | Delta |
|---|---|---:|---:|---:|
| macOS GCC | libsecp256k1.a | 1,254,320 | 1,311,368 | +57,048 (+4.55%) |
| Linux GCC | libsecp256k1.a | 1,271,040 | 1,330,808 | +59,768 (+4.70%) |
| Windows MSVC Release | libsecp256k1-*.dll | 1,239,040 | 1,414,144 | +175,104 (+14.13%) |

Linux benchmarking script
BEFORE=8363a2d8d1b47857c437f7cf22bd11ab06c7c50f; AFTER=33b1b9c455eb2bb07eded939b36abc49859d2ccf; CC=gcc; \
API_ITERS=10000; INT_ITERS=200000; JOBS=1; \
BH=$(git rev-parse --short=12 "$BEFORE") && AH=$(git rev-parse --short=12 "$AFTER") && \
RUN=$(date +%Y%m%d%H%M%S) && \
ROOT="$PWD/.bench-builds/gcc-$BH-$AH-$RUN" && \
RAW="$PWD/.bench-results/secp-bench-gcc-$BH-$AH-$RUN.txt" && \
(set -e; \
  mkdir -p "$ROOT" "$(dirname "$RAW")"; \
  printf "host: %s, compiler: %s\n" "$(hostname)" "$("$CC" --version | sed -n '1p')" | tee "$RAW" >&2; \
  old=$(git symbolic-ref --short -q HEAD || git rev-parse HEAD); \
  trap 'git switch -q "$old" 2>/dev/null || git switch -q --detach "$old"' EXIT; \
  for side in before after; do \
    ref=$([ "$side" = before ] && printf %s "$BEFORE" || printf %s "$AFTER"); \
    git cat-file -e "$ref^{commit}" 2>/dev/null || git fetch -q origin "$ref"; \
    h=$(git rev-parse --short=12 "$ref"); \
    b="$ROOT/$side-$h"; \
    echo "== $side $h ==" >&2; \
    git switch -q --detach "$ref"; \
    cmake -S . -B "$b" -DCMAKE_C_COMPILER="$CC" -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF -DSECP256K1_BUILD_BENCHMARK=ON -DSECP256K1_BUILD_TESTS=OFF -DSECP256K1_BUILD_EXHAUSTIVE_TESTS=OFF -DSECP256K1_BUILD_CTIME_TESTS=OFF -DSECP256K1_BUILD_EXAMPLES=OFF -DSECP256K1_ENABLE_MODULE_MUSIG=OFF -DSECP256K1_VALGRIND=OFF >> "$RAW" 2>&1; \
    cmake --build "$b" -j "$JOBS" --target bench bench_internal >> "$RAW" 2>&1; \
    echo "=== $side $ref $h ===" >> "$RAW"; \
    SECP256K1_BENCH_ITERS=$API_ITERS "$b/bin/bench" ecdsa ec ecdh schnorrsig ellswift >> "$RAW"; \
    SECP256K1_BENCH_ITERS=$INT_ITERS "$b/bin/bench_internal" field group ecmult hash context >> "$RAW"; \
  done; \
  awk -F, '/^=== /{split($0,p," "); side=p[2]; next} /^[[:alnum:]_][[:alnum:]_]*[[:space:]]*,/{name=$1; val=$2+0; gsub(/^[[:space:]]+|[[:space:]]+$/,"",name); if(name!="Benchmark"){if(!(name in seen)){seen[name]=1; order[++n]=name} x[side,name]=val}} END{print "Benchmark\tBefore min(us)\tAfter min(us)\tDelta"; for(i=1;i<=n;i++){name=order[i]; b=x["before",name]; a=x["after",name]; if(b&&a) printf "%s\t%.6g\t%.6g\t%+.1f%%\n",name,b,a,100*(a-b)/b}}' "$RAW" | column -t -s $'\t'; \
  echo "raw: $RAW" >&2)
Linux size comparison script
BEFORE=8363a2d8d1b47857c437f7cf22bd11ab06c7c50f; AFTER=33b1b9c455eb2bb07eded939b36abc49859d2ccf; CC=gcc; JOBS=1; \
BH=$(git rev-parse --short=12 "$BEFORE"); AH=$(git rev-parse --short=12 "$AFTER"); RUN=$(date +%Y%m%d%H%M%S); ROOT="$PWD/.size-builds/gcc-$BH-$AH-$RUN"; \
(set -e; old=$(git symbolic-ref --short -q HEAD || git rev-parse HEAD); trap 'git switch -q "$old" 2>/dev/null || git switch -q --detach "$old"' EXIT; \
printf "host: %s, compiler: %s\n" "$(hostname)" "$("$CC" --version | sed -n '1p')"; \
for side in before after; do \
  ref=$([ "$side" = before ] && printf %s "$BEFORE" || printf %s "$AFTER"); git cat-file -e "$ref^{commit}" 2>/dev/null || git fetch -q origin "$ref"; h=$(git rev-parse --short=12 "$ref"); b="$ROOT/$side-$h"; \
  git switch -q --detach "$ref"; \
  cmake -S . -B "$b" -DCMAKE_C_COMPILER="$CC" -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF -DSECP256K1_BUILD_BENCHMARK=OFF -DSECP256K1_BUILD_TESTS=OFF -DSECP256K1_BUILD_EXHAUSTIVE_TESTS=OFF -DSECP256K1_BUILD_CTIME_TESTS=OFF -DSECP256K1_BUILD_EXAMPLES=OFF -DSECP256K1_ENABLE_MODULE_MUSIG=OFF -DSECP256K1_VALGRIND=OFF >/dev/null; \
  cmake --build "$b" -j "$JOBS" --target secp256k1 >/dev/null; \
  lib=$(find "$b" -name 'libsecp256k1.a' -print -quit); \
  bytes=$(wc -c < "$lib" | tr -d ' '); \
  printf "%s\t%s\t%s\n" "$side" "$h" "$bytes"; \
  done | awk 'BEGIN{print "Side\tCommit\tlibsecp256k1.a bytes"} {print; size[$1]=$3} END{if(size["before"]&&size["after"]) printf "Delta\t\t%+d bytes (%+.2f%%)\n",size["after"]-size["before"],100*(size["after"]-size["before"])/size["before"]}' | column -t -s $'\t')
host: M4-Max.local, compiler: gcc-14 (Homebrew GCC 14.3.0) 14.3.0
Benchmark                 Before min(us)  After min(us)  Delta
ecdsa_verify              17.5            15.9           -9.1%
ecdsa_sign                12.3            12.1           -1.6%
ec_keygen                 8.07            7.77           -3.7%
ecdh                      16.6            15.1           -9.0%
schnorrsig_sign           8.6             8.29           -3.6%
schnorrsig_verify         17.8            16.1           -9.6%
ellswift_encode           11.1            11.1           +0.0%
ellswift_decode           4.68            4.69           +0.2%
ellswift_keygen           19.4            19.1           -1.5%
ellswift_ecdh             18.5            17.1           -7.6%
field_half                0.00154         0.00155        +0.6%
field_normalize           0.00665         0.00672        +1.1%
field_normalize_weak      0.00291         0.00291        +0.0%
field_sqr                 0.00871         0.0081         -7.0%
field_mul                 0.00969         0.0093         -4.0%
field_inverse             1.57            1.58           +0.6%
field_inverse_var         0.735           0.742          +1.0%
field_is_square_var       0.994           1              +0.6%
field_sqrt                2.21            2.22           +0.5%
group_double_var          0.0502          0.0447         -11.0%
group_add_var             0.126           0.11           -12.7%
group_add_affine          0.1             0.0922         -7.8%
group_add_affine_var      0.0887          0.077          -13.2%
group_add_zinv_var        0.106           0.0902         -14.9%
group_to_affine_var       0.774           0.774          +0.0%
ecmult_wnaf               0.334           0.334          +0.0%
hash_sha256               0.12            0.12           +0.0%
hash_hmac_sha256          0.464           0.463          -0.2%
hash_rfc6979_hmac_sha256  2.55            2.55           +0.0%
context_create            1.96            1.96           +0.0%

Side    Commit                 libsecp256k1.a bytes
before  8363a2d8d1b4           1254320
after   33b1b9c455eb           1311368
Delta   +57048 bytes (+4.55%)
host: WIN-A2EHOAU4JET (Intel(R) Xeon(R) CPU E5-2637 v2 @ 3.50GHz), system: Microsoft Windows NT 10.0.20348.0, compiler: Microsoft (R) C/C++ Optimizing Compiler Version 19.50.35728 for x64
Benchmark                    Before min(us) After min(us)    Delta
ecdsa_verify                           74.1          72.2    -2.6%
ecdsa_sign                             43.3          41.4    -4.4%
ec_keygen                              32.3            30    -7.1%
ecdh                                     75            68    -9.3%
schnorrsig_sign                        34.1            32    -6.2%
schnorrsig_verify                      74.9          73.1    -2.4%
ellswift_encode                        32.3          32.5    +0.6%
ellswift_decode                        14.4          14.6    +1.4%
ellswift_keygen                        64.6          62.9    -2.6%
ellswift_ecdh                          80.2          73.7    -8.1%
field_half                          0.00378       0.00378    +0.0%
field_normalize                      0.0114        0.0114    +0.0%
field_normalize_weak                0.00389       0.00389    +0.0%
field_sqr                            0.0272        0.0252    -7.4%
field_mul                            0.0394        0.0365    -7.4%
field_inverse                          3.27          3.29    +0.6%
field_inverse_var                      2.07          2.11    +1.9%
field_is_square_var                     2.7          2.67    -1.1%
field_sqrt                             7.47          6.98    -6.6%
group_double_var                      0.245         0.207   -15.5%
group_add_var                           0.6         0.525   -12.5%
group_add_affine                      0.465         0.405   -12.9%
group_add_affine_var                  0.418         0.358   -14.4%
group_add_zinv_var                    0.458         0.403   -12.0%
group_to_affine_var                    2.25          2.26    +0.4%
ecmult_wnaf                            0.58          0.59    +1.7%
hash_sha256                           0.332         0.333    +0.3%
hash_hmac_sha256                       1.31          1.31    +0.0%
hash_rfc6979_hmac_sha256               7.23           7.2    -0.4%
context_create                         3.32          3.34    +0.6%

Side     Commit          DLL bytes
before   8363a2d8d1b4      1239040
after    a37e34e187da      1414144
Delta                      175104 (+14.13%)
host: i9-ssd, compiler: gcc (GCC) 16.1.0
Benchmark                 Before min(us)  After min(us)  Delta
ecdsa_verify              39.6            37.5           -5.3%
ecdsa_sign                27.1            26.4           -2.6%
ec_keygen                 18.2            17.5           -3.8%
ecdh                      39              37.4           -4.1%
schnorrsig_sign           19.5            18.7           -4.1%
schnorrsig_verify         40.3            38.1           -5.5%
ellswift_encode           20.1            19.9           -1.0%
ellswift_decode           8.59            8.46           -1.5%
ellswift_keygen           38.2            37.3           -2.4%
ellswift_ecdh             43.4            40.9           -5.8%
field_half                0.00275         0.00275        +0.0%
field_normalize           0.00995         0.00994        -0.1%
field_normalize_weak      0.00378         0.00378        +0.0%
field_sqr                 0.0178          0.015          -15.7%
field_mul                 0.019           0.0168         -11.6%
field_inverse             2.41            2.39           -0.8%
field_inverse_var         1.32            1.28           -3.0%
field_is_square_var       1.69            1.68           -0.6%
field_sqrt                4.21            4.16           -1.2%
group_double_var          0.121           0.115          -5.0%
group_add_var             0.309           0.272          -12.0%
group_add_affine          0.248           0.231          -6.9%
group_add_affine_var      0.216           0.194          -10.2%
group_add_zinv_var        0.245           0.213          -13.1%
group_to_affine_var       1.41            1.36           -3.5%
ecmult_wnaf               0.536           0.581          +8.4%
hash_sha256               0.29            0.286          -1.4%
hash_hmac_sha256          1.14            1.13           -0.9%
hash_rfc6979_hmac_sha256  6.3             6.21           -1.4%
context_create            2.68            2.68           +0.0%

Side    Commit        libsecp256k1.a bytes
before  8363a2d8d1b4  1271040
after   33b1b9c455eb  1330808
Delta                 +59768 bytes (+4.70%)
host: i7-hdd, compiler: gcc (Ubuntu 14.2.0-19ubuntu2) 14.2.0
Benchmark                 Before min(us)  After min(us)  Delta
ecdsa_verify              43.1            38.4           -10.9%
ecdsa_sign                28.4            27.3           -3.9%
ec_keygen                 19.3            18             -6.7%
ecdh                      43.2            38.4           -11.1%
schnorrsig_sign           20.6            19.4           -5.8%
schnorrsig_verify         43.7            39.1           -10.5%
ellswift_encode           19.9            19.7           -1.0%
ellswift_decode           8.48            8.41           -0.8%
ellswift_keygen           39.2            37.8           -3.6%
ellswift_ecdh             46.4            41.8           -9.9%
field_half                0.00275         0.00275        +0.0%
field_normalize           0.00998         0.00998        +0.0%
field_normalize_weak      0.00402         0.00402        +0.0%
field_sqr                 0.017           0.0154         -9.4%
field_mul                 0.0218          0.0171         -21.6%
field_inverse             2.49            2.46           -1.2%
field_inverse_var         1.36            1.35           -0.7%
field_is_square_var       1.66            1.67           +0.6%
field_sqrt                4.07            4.07           +0.0%
group_double_var          0.132           0.119          -9.8%
group_add_var             0.346           0.28           -19.1%
group_add_affine          0.266           0.236          -11.3%
group_add_affine_var      0.243           0.201          -17.3%
group_add_zinv_var        0.265           0.216          -18.5%
group_to_affine_var       1.46            1.44           -1.4%
ecmult_wnaf               0.554           0.604          +9.0%
hash_sha256               0.305           0.298          -2.3%
hash_hmac_sha256          1.18            1.17           -0.8%
hash_rfc6979_hmac_sha256  6.47            6.43           -0.6%
context_create            2.73            2.71           -0.7%
host: rpi5-16-3, compiler: gcc (Ubuntu 14.2.0-19ubuntu2) 14.2.0
Benchmark                 Before min(us)  After min(us)  Delta
ecdsa_verify              157             156            -0.6%
ecdsa_sign                69.5            69.3           -0.3%
ec_keygen                 57.6            57.5           -0.2%
ecdh                      149             148            -0.7%
schnorrsig_sign           59.3            59             -0.5%
schnorrsig_verify         158             157            -0.6%
ellswift_encode           44.9            44.8           -0.2%
ellswift_decode           24.2            24.2           +0.0%
ellswift_keygen           103             102            -1.0%
ellswift_ecdh             154             154            +0.0%
field_half                0.00334         0.00334        +0.0%
field_normalize           0.0143          0.0144         +0.7%
field_normalize_weak      0.00543         0.00543        +0.0%
field_sqr                 0.0654          0.0618         -5.5%
field_mul                 0.0919          0.091          -1.0%
field_inverse             4.8             4.78           -0.4%
field_inverse_var         2.24            2.24           +0.0%
field_is_square_var       2.31            2.31           +0.0%
field_sqrt                17              17             +0.0%
group_double_var          0.526           0.525          -0.2%
group_add_var             1.35            1.34           -0.7%
group_add_affine          0.988           0.984          -0.4%
group_add_affine_var      0.926           0.915          -1.2%
group_add_zinv_var        1.02            1.01           -1.0%
group_to_affine_var       2.6             2.6            +0.0%
ecmult_wnaf               0.606           0.614          +1.3%
hash_sha256               0.316           0.315          -0.3%
hash_hmac_sha256          1.2             1.2            +0.0%
hash_rfc6979_hmac_sha256  6.62            6.62           +0.0%
context_create            4.18            4.18           +0.0%
host: rpi4-2-1, compiler: gcc (Ubuntu 14.2.0-19ubuntu2) 14.2.0
Benchmark                 Before min(us)  After min(us)  Delta
ecdsa_verify              222             216            -2.7%
ecdsa_sign                111             109            -1.8%
ec_keygen                 90.4            88.6           -2.0%
ecdh                      216             211            -2.3%
schnorrsig_sign           93.4            91.4           -2.1%
schnorrsig_verify         224             218            -2.7%
ellswift_encode           64.2            64.1           -0.2%
ellswift_decode           33.5            33.5           +0.0%
ellswift_keygen           156             153            -1.9%
ellswift_ecdh             226             220            -2.7%
field_half                0.00447         0.00447        +0.0%
field_normalize           0.0215          0.0215         +0.0%
field_normalize_weak      0.00783         0.00783        +0.0%
field_sqr                 0.0871          0.0822         -5.6%
field_mul                 0.126           0.121          -4.0%
field_inverse             8.54            8.54           +0.0%
field_inverse_var         3.25            3.25           +0.0%
field_is_square_var       3.57            3.57           +0.0%
field_sqrt                22.7            22.6           -0.4%
group_double_var          0.72            0.71           -1.4%
group_add_var             1.87            1.8            -3.7%
group_add_affine          1.4             1.36           -2.9%
group_add_affine_var      1.3             1.24           -4.6%
group_add_zinv_var        1.42            1.37           -3.5%
group_to_affine_var       3.76            3.75           -0.3%
ecmult_wnaf               1.06            1.05           -0.9%
hash_sha256               0.532           0.531          -0.2%
hash_hmac_sha256          2.02            2.02           +0.0%
hash_rfc6979_hmac_sha256  11.2            11.2           +0.0%
context_create            6.8             6.8            +0.0%
host: umbrel (Intel(R) N150), compiler: gcc (Debian 12.2.0-14+deb12u1) 12.2.0
Benchmark                 Before min(us)  After min(us)  Delta
ecdsa_verify              371             353            -4.9%
ecdsa_sign                163             160            -1.8%
ec_keygen                 129             123            -4.7%
ecdh                      347             332            -4.3%
schnorrsig_sign           131             126            -3.8%
schnorrsig_verify         373             356            -4.6%
ellswift_encode           143             142            -0.7%
ellswift_decode           71.3            70.8           -0.7%
ellswift_keygen           272             268            -1.5%
ellswift_ecdh             367             352            -4.1%
field_half                0.0124          0.0124         +0.0%
field_normalize           0.0439          0.0439         +0.0%
field_normalize_weak      0.0192          0.0192         +0.0%
field_sqr                 0.168           0.169          +0.6%
field_mul                 0.182           0.18           -1.1%
field_inverse             11.2            11.2           +0.0%
field_inverse_var         8.44            8.4            -0.5%
field_is_square_var       9.56            9.55           -0.1%
field_sqrt                45              44             -2.2%
group_double_var          1.25            1.18           -5.6%
group_add_var             2.92            2.68           -8.2%
group_add_affine          2.22            2.12           -4.5%
group_add_affine_var      2.02            1.86           -7.9%
group_add_zinv_var        2.21            2.01           -9.0%
group_to_affine_var       9.25            9.13           -1.3%
ecmult_wnaf               2.51            2.45           -2.4%
hash_sha256               1.13            1.12           -0.9%
hash_hmac_sha256          4.44            4.44           +0.0%
hash_rfc6979_hmac_sha256  24.4            24.4           +0.0%
context_create            14.2            14.1           -0.7%
host: nodl (Cortex-A53), compiler: gcc (Ubuntu 11.4.0-1ubuntu1~22.04.3) 11.4.0
Benchmark                 Before min(us)  After min(us)  Delta
ecdsa_verify              632             611            -3.3%
ecdsa_sign                308             291            -5.5%
ec_keygen                 228             212            -7.0%
ecdh                      633             585            -7.6%
schnorrsig_sign           231             221            -4.3%
schnorrsig_verify         630             594            -5.7%
ellswift_encode           156             156            +0.0%
ellswift_decode           80              76.1           -4.9%
ellswift_keygen           438             455            +3.9%
ellswift_ecdh             613             599            -2.3%
field_half                0.0106          0.00985        -7.1%
field_normalize           0.0483          0.0499         +3.3%
field_normalize_weak      0.0173          0.0173         +0.0%
field_sqr                 0.202           0.182          -9.9%
field_mul                 0.278           0.273          -1.8%
field_inverse             21.3            21.1           -0.9%
field_inverse_var         7.67            7.48           -2.5%
field_is_square_var       8.73            8.91           +2.1%
field_sqrt                65.8            61.9           -5.9%
group_double_var          2.04            1.9            -6.9%
group_add_var             5.35            5.09           -4.9%
group_add_affine          3.93            3.51           -10.7%
group_add_affine_var      3.56            3.32           -6.7%
group_add_zinv_var        3.94            3.65           -7.4%
group_to_affine_var       9.56            10.4           +8.8%
ecmult_wnaf               2.37            2.48           +4.6%
hash_sha256               1.13            1.19           +5.3%
hash_hmac_sha256          5.08            4.76           -6.3%
hash_rfc6979_hmac_sha256  33.3            31.2           -6.3%
context_create            19.3            18.8           -2.6%
Reviewer measurements

andrewtoth, i9-14900HX, GCC 12.3

Benchmark                 Before min(us)  After min(us)  Delta
ecdsa_verify              22.7            21.5           -5.3%
ecdsa_sign                14.3            14.0           -2.1%
ec_keygen                 9.90            9.54           -3.6%
ecdh                      21.6            20.7           -4.2%
schnorrsig_sign           10.6            10.2           -3.8%
schnorrsig_verify         23.1            21.8           -5.6%
ellswift_ecdh             23.8            22.7           -4.6%
field_sqr                 0.00912         0.00898        -1.5%
field_mul                 0.0114          0.0107         -6.1%
field_inverse             1.23            1.24           +0.8%
field_inverse_var         0.770           0.773          +0.4%
field_is_square_var       1.06            1.05           -0.9%
field_sqrt                2.82            2.46           -12.8%
group_double_var          0.0701          0.0612         -12.7%
group_add_var             0.168           0.153          -8.9%
group_add_affine          0.132           0.123          -6.8%
group_add_affine_var      0.120           0.103          -14.2%
group_add_zinv_var        0.138           0.117          -15.2%
group_to_affine_var       0.820           0.819          -0.1%

theStack, Snapdragon X Elite X1E-78-100, GCC 14.2.0

Benchmark                 Before min(us)  After min(us)  Delta
ecdsa_verify              24.1            21.4           -11.2%
ecdsa_sign                19.0            18.5           -2.6%
schnorrsig_sign           13.0            12.7           -2.3%
schnorrsig_verify         24.4            21.7           -11.1%

Bitcoin Core subtree bench_bitcoin -filter=VerifyScript.*:

Benchmark                   Before ns/script  After ns/script  Delta
VerifyScriptP2TR_KeyPath    23679.52          20899.66         -11.7%
VerifyScriptP2TR_ScriptPath 43430.71          39280.19         -9.6%
VerifyScriptP2WPKH          23526.82          20870.22         -11.3%

sipa, Ryzen 5950X, GCC 15.2.0

Benchmark                 Before min(us)  After min(us)  Delta
ecdsa_verify              30.8            27.3           -11.4%
ecdsa_sign                18.7            17.2           -8.0%
ec_keygen                 13.6            12.2           -10.3%
ecdh                      29.8            26.7           -10.4%
ecdsa_recover             31.0            28.2           -9.0%
schnorrsig_sign           14.4            13.0           -9.7%
schnorrsig_verify         31.1            28.5           -8.4%
ellswift_encode           13.2            13.4           +1.5%
ellswift_decode           5.79            5.84           +0.9%
ellswift_keygen           26.8            25.7           -4.1%
ellswift_ecdh             32.1            29.6           -7.8%

clang:

host: i9-ssd, compiler: Ubuntu clang version 22.1.6 (++20260508084839+c0262e742787-1~exp1~20260508204859.77)
Benchmark                 Before min(us)  After min(us)  Delta
ecdsa_verify              40.1            39.8           -0.7%
ecdsa_sign                29.2            29.1           -0.3%
ec_keygen                 19.5            19.6           +0.5%
ecdh                      40.3            39.8           -1.2%
schnorrsig_sign           21              20.9           -0.5%
schnorrsig_verify         40.6            40.3           -0.7%
ellswift_encode           20.1            20.1           +0.0%
ellswift_decode           8.43            8.41           -0.2%
ellswift_keygen           39.8            39.7           -0.3%
ellswift_ecdh             44.1            43.5           -1.4%
field_half                0.0028          0.0028         +0.0%
field_normalize           0.00889         0.00891        +0.2%
field_normalize_weak      0.0037          0.0037         +0.0%
field_sqr                 0.0144          0.0144         +0.0%
field_mul                 0.021           0.019          -9.5%
field_inverse             2.6             2.64           +1.5%
field_inverse_var         1.34            1.35           +0.7%
field_is_square_var       1.73            1.73           +0.0%
field_sqrt                3.95            3.96           +0.3%
group_double_var          0.128           0.125          -2.3%
group_add_var             0.311           0.31           -0.3%
group_add_affine          0.243           0.242          -0.4%
group_add_affine_var      0.207           0.207          +0.0%
group_add_zinv_var        0.229           0.228          -0.4%
group_to_affine_var       1.43            1.44           +0.7%
ecmult_wnaf               0.536           0.598          +11.6%
hash_sha256               0.3             0.299          -0.3%
hash_hmac_sha256          1.18            1.18           +0.0%
hash_rfc6979_hmac_sha256  6.51            6.53           +0.3%
context_create            2.16            2.15           -0.5%

reindex-chainstate:

2026-05-28 | reindex-chainstate | 950059 blocks | dbcache 5000 | i9-ssd | x86_64 | Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz | 16 cores | 62Gi RAM | SSD
for DBCACHE in 5000; do \
    COMMITS="67250b1d97e6159d908ef44639b6a12471e7c717 c264526415f38afb9890003003b7de39b370b745"; \
    STOP=950059; CC=gcc; CXX=g++; \
    BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
    (echo ""; for c in $COMMITS; do git fetch -q origin "$c" 2>/dev/null || true; git log -1 --pretty='%h %s' $c || exit 1; done) && \
    (echo "" && echo "$(date -I) | reindex-chainstate | ${STOP} blocks | dbcache ${DBCACHE} | $(hostname) | $(uname -m) | $(lscpu | grep 'Model name' | head -1 | cut -d: -f2 | xargs) | $(nproc) cores | $(free -h | awk '/^Mem:/{print $2}') RAM | $(lsblk -no ROTA $(df --output=source $BASE_DIR | tail -1) | grep -q 1 && echo HDD || echo SSD)"; echo "") && \
    hyperfine \
    --sort command \
    --runs 1 \
    --export-json "$BASE_DIR/rdx-$(sed -E 's/[^ ]+/\L&/g;s/[.]/_/g;s/ /-/g'<<<"$COMMITS")-$STOP-$DBCACHE-$CC.json" \
    --parameter-list COMMIT ${COMMITS// /,} \
    --prepare "killall -9 bitcoind 2>/dev/null; rm -f ./build/bin/bitcoind; git clean -fxd; git reset --hard {COMMIT} && \
      cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=Release && ninja -C build bitcoind -j1 && \
      ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=1000 -printtoconsole=0; sleep 20; rm -f $DATA_DIR/debug.log; rm -rfd $DATA_DIR/indexes;" \
    --conclude "killall bitcoind || true; sleep 5; grep -q 'height=0' $DATA_DIR/debug.log && grep -q 'Disabling script verification at block #1' $DATA_DIR/debug.log && grep -q 'height=$STOP' $DATA_DIR/debug.log && grep 'Bitcoin Core version' $DATA_DIR/debug.log | grep -q \"\$(git rev-parse --short=12 {COMMIT})\"; \
                cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \
    "COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=$DBCACHE -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 -assumevalid=0"; \
done

67250b1d97 parallel input fetcher
c264526415 Refactor: optimize scalar reduction and arithmetic functions.

2026-05-28 | reindex-chainstate | 950059 blocks | dbcache 5000 | i9-ssd | x86_64 | Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz | 16 cores | 62Gi RAM | SSD

Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=950059 -dbcache=5000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 -assumevalid=0 (COMMIT = 67250b1d97e6159d908ef44639b6a12471e7c717)
  Time (abs ≡):        37155.108 s               [User: 375835.978 s, System: 978.929 s]

Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=950059 -dbcache=5000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 -assumevalid=0 (COMMIT = c264526415f38afb9890003003b7de39b370b745)
  Time (abs ≡):        36261.785 s               [User: 362247.387 s, System: 1002.867 s]

Relative speed comparison
        1.02          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=950059 -dbcache=5000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 -assumevalid=0 (COMMIT = 67250b1d97e6159d908ef44639b6a12471e7c717)
        1.00          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=950059 -dbcache=5000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 -assumevalid=0 (COMMIT = c264526415f38afb9890003003b7de39b370b745)

andrewtoth commented Jun 1, 2026

Ran the benchmarks on an i9-14900HX, built with GCC 12.3, and confirmed the speedups:

Results (Min us, lower is better)

| Benchmark | Before | After | Delta |
|---|---:|---:|---:|
| ecdsa_verify | 22.7 | 21.5 | -5.3% |
| ecdsa_sign | 14.3 | 14.0 | -2.1% |
| ec_keygen | 9.90 | 9.54 | -3.6% |
| ecdh | 21.6 | 20.7 | -4.2% |
| schnorrsig_sign | 10.6 | 10.2 | -3.8% |
| schnorrsig_verify | 23.1 | 21.8 | -5.6% |
| ellswift_ecdh | 23.8 | 22.7 | -4.6% |
| field_sqr | 0.00912 | 0.00898 | -1.5% |
| field_mul | 0.0114 | 0.0107 | -6.1% |
| field_inverse | 1.23 | 1.24 | +0.8% |
| field_inverse_var | 0.770 | 0.773 | +0.4% |
| field_is_square_var | 1.06 | 1.05 | -0.9% |
| field_sqrt | 2.82 | 2.46 | -12.8% |
| group_double_var | 0.0701 | 0.0612 | -12.7% |
| group_add_var | 0.168 | 0.153 | -8.9% |
| group_add_affine | 0.132 | 0.123 | -6.8% |
| group_add_affine_var | 0.120 | 0.103 | -14.2% |
| group_add_zinv_var | 0.138 | 0.117 | -15.2% |
| group_to_affine_var | 0.820 | 0.819 | -0.1% |

theStack commented Jun 2, 2026

Contributor

Seeing a ~12.5% speedup for both ECDSA and Schnorr verification and ~3% for signing on my arm64 machine (Snapdragon X Elite - X1E-78-100), using GCC 14.2.0:

master:

$ ./build/bin/bench verify sign
Benchmark                     ,    Min(us)    ,    Avg(us)    ,    Max(us)    

ecdsa_verify                  ,    24.1       ,    24.1       ,    24.5    
ecdsa_sign                    ,    19.0       ,    19.0       ,    19.1    
schnorrsig_sign               ,    13.0       ,    13.1       ,    13.2    
schnorrsig_verify             ,    24.4       ,    24.4       ,    24.5

PR:

$ ./build/bin/bench verify sign
Benchmark                     ,    Min(us)    ,    Avg(us)    ,    Max(us)    

ecdsa_verify                  ,    21.4       ,    21.4       ,    21.8    
ecdsa_sign                    ,    18.5       ,    18.5       ,    18.5    
schnorrsig_sign               ,    12.7       ,    12.7       ,    12.7    
schnorrsig_verify             ,    21.7       ,    21.7       ,    21.7
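
The "~12.5% speedup" quoted for these runs and the roughly -11% deltas that the same Min(us) values produce are two views of one measurement: a throughput gain versus a reduction in time per operation. A minimal sketch of both conventions follows; the function names are illustrative and are not part of the benchmark tooling.

```c
#include <assert.h>

/* Relative change in time per operation, in percent (negative = faster).
 * Hypothetical helper, named for illustration only. */
static double time_delta_percent(double before_us, double after_us) {
    assert(before_us > 0.0);
    return (after_us - before_us) / before_us * 100.0;
}

/* Throughput gain in percent: how many more operations fit in the same
 * wall-clock time. Hypothetical helper, named for illustration only. */
static double speedup_percent(double before_us, double after_us) {
    assert(after_us > 0.0);
    return (before_us / after_us - 1.0) * 100.0;
}
```

For ecdsa_verify (24.1 us on master, 21.4 us with the PR), `time_delta_percent` gives about -11.2 and `speedup_percent` gives about 12.6, so both ways of quoting the result describe the same data.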

Applying this change to the Bitcoin Core secp256k1 subtree (Branch apply-secp-pr1859) shows the speedup in the script verification benchmarks as well (run via ./build/bin/bench_bitcoin -filter=VerifyScript.*):

master (commit theStack/bitcoin@654a522):

| ns/script | script/s | err% | total | benchmark |
|---:|---:|---:|---:|:---|
| 23,679.52 | 42,230.58 | 0.3% | 0.01 | VerifyScriptP2TR_KeyPath |
| 43,430.71 | 23,025.18 | 0.4% | 0.01 | VerifyScriptP2TR_ScriptPath |
| 23,526.82 | 42,504.68 | 0.3% | 0.01 | VerifyScriptP2WPKH |

PR applied (commit theStack/bitcoin@494a473):

| ns/script | script/s | err% | total | benchmark |
|---:|---:|---:|---:|:---|
| 20,899.66 | 47,847.67 | 0.3% | 0.01 | VerifyScriptP2TR_KeyPath |
| 39,280.19 | 25,458.13 | 0.8% | 0.01 | VerifyScriptP2TR_ScriptPath |
| 20,870.22 | 47,915.16 | 0.6% | 0.01 | VerifyScriptP2WPKH |

@real-or-random

Contributor

Concept ACK

That's a very interesting observation. So far, we tried to stay away from guiding the compiler too much, but the ratio of added complexity vs. gains here is pretty good.

@l0rinc What I have always wanted to try is profile-guided optimization, e.g., where the profile is generated in a benchmark run that only performs signature verification (this could even be done automatically as part of the build process). I imagine there could be more low-hanging fruit. Would you be interested in looking into this as well?

Comment thread src/util.h Outdated
The 5x52 field multiplication and squaring routines are hot in group arithmetic and scalar multiplication.

Use the new `SECP256K1_FORCE_INLINE` for the thin wrappers and `int128` inner helpers so compilers can schedule the 64x64->128 arithmetic without a call boundary.
The helper uses forced inlining in optimized release-style builds, but falls back to `SECP256K1_INLINE` when no-inline, size optimization, or debug-style macros ask not to force it.

Across the measured GCC and MSVC Release builds, this improves ECDSA verification by 0.6% to 9.1%, ECDH by 0.7% to 9.3%, and Schnorr verification by 0.6% to 9.6%.
The direct field benchmarks generally show the intended effect on field squaring and multiplication, while Clang results are mostly flat and less consistently positive.

This is a code-size tradeoff: the tested static library builds grew by about 4.6% to 4.7%, and the tested Windows Release DLL grew by 14.1%.

Co-authored-by: Sebastian Falbesoner <sebastian.falbesoner@gmail.com>
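
For readers without the diff at hand, the fallback behaviour described in this commit message might look roughly like the sketch below. `DEMO_FORCE_INLINE`, `demo_mul64`, and `demo_mul_hi` are hypothetical names chosen for illustration; the actual `SECP256K1_FORCE_INLINE` definition in `src/util.h` may use different conditions. The helper assumes a GCC/Clang 64-bit target where `unsigned __int128` is available and a C99 or later compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the PR's macro, not its actual definition.
 * Forced inlining is requested only in optimized, non-debug builds that are
 * not optimizing for size; every other case falls back to plain `inline`. */
#if defined(_DEBUG) || defined(__NO_INLINE__) || defined(__OPTIMIZE_SIZE__)
#  define DEMO_FORCE_INLINE inline
#elif defined(_MSC_VER)
#  define DEMO_FORCE_INLINE __forceinline
#elif defined(__GNUC__) && defined(__OPTIMIZE__)
#  define DEMO_FORCE_INLINE inline __attribute__((always_inline))
#else
#  define DEMO_FORCE_INLINE inline
#endif

/* Inner helper: 64x64->128 multiply, split into high and low 64-bit halves.
 * Assumes `unsigned __int128` (GCC/Clang on 64-bit targets). */
static DEMO_FORCE_INLINE void demo_mul64(uint64_t a, uint64_t b,
                                         uint64_t *hi, uint64_t *lo) {
    unsigned __int128 r;
    assert(hi && lo);
    r = (unsigned __int128)a * b;
    *hi = (uint64_t)(r >> 64);
    *lo = (uint64_t)r;
}

/* Thin wrapper around the inner helper. With forced inlining, the call to
 * demo_mul64 disappears and the compiler can schedule the multiply together
 * with the caller's surrounding arithmetic. */
static DEMO_FORCE_INLINE uint64_t demo_mul_hi(uint64_t a, uint64_t b) {
    uint64_t hi, lo;
    demo_mul64(a, b, &hi, &lo);
    (void)lo;
    return hi;
}
```

The sketch checks the opt-out conditions first so that a debug, no-inline, or size-optimized build never sees a forced-inline attribute, which mirrors the ordering the commit message describes.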

sipa commented Jun 2, 2026

Contributor

Concept ACK.

Master:

Benchmark                     ,    Min(us)    ,    Avg(us)    ,    Max(us)    

ecdsa_verify                  ,    30.8       ,    31.1       ,    33.4    
ecdsa_sign                    ,    18.7       ,    18.7       ,    18.7    
ec_keygen                     ,    13.6       ,    13.6       ,    13.6    
ecdh                          ,    29.8       ,    29.9       ,    29.9    
ecdsa_recover                 ,    31.0       ,    32.3       ,    34.4    
schnorrsig_sign               ,    14.4       ,    14.4       ,    14.4    
schnorrsig_verify             ,    31.1       ,    31.1       ,    31.2    
ellswift_encode               ,    13.2       ,    13.2       ,    13.3    
ellswift_decode               ,     5.79      ,     5.80      ,     5.82   
ellswift_keygen               ,    26.8       ,    26.8       ,    26.8    
ellswift_ecdh                 ,    32.1       ,    32.1       ,    32.2    

This PR:

Benchmark                     ,    Min(us)    ,    Avg(us)    ,    Max(us)    

ecdsa_verify                  ,    27.3       ,    28.2       ,    30.3    
ecdsa_sign                    ,    17.2       ,    17.9       ,    19.4    
ec_keygen                     ,    12.2       ,    12.3       ,    12.3    
ecdh                          ,    26.7       ,    26.8       ,    26.8    
ecdsa_recover                 ,    28.2       ,    28.2       ,    28.3    
schnorrsig_sign               ,    13.0       ,    13.4       ,    14.9    
schnorrsig_verify             ,    28.5       ,    28.7       ,    28.8    
ellswift_encode               ,    13.4       ,    13.5       ,    13.5    
ellswift_decode               ,     5.84      ,     5.87      ,     5.91   
ellswift_keygen               ,    25.7       ,    25.9       ,    26.2    
ellswift_ecdh                 ,    29.6       ,    29.6       ,    29.9    

(GCC 15.2.0 on Ryzen 5950X)

l0rinc force-pushed the l0rinc/force-inline-5x52-mul-sqr branch from ac915c9 to 1c537ab on June 3, 2026 21:08

real-or-random left a comment

Contributor

utACK 1c537ab
