Add "silentpayments" module implementing BIP352 (take 4, limited to full-node scanning)#1765
Add "silentpayments" module implementing BIP352 (take 4, limited to full-node scanning)#1765theStack wants to merge 14 commits into
Conversation
|
Added the optimized version on top of this PR: For more context: |
|
Small supplementary update: I've created a corresponding Python implementation of the provided API functions based on secp256k1lab (https://github.com/theStack/secp256k1lab/blob/add_bip352_module_review_helper/src/secp256k1lab/bip352.py) (also linked in the PR description). The hope is that this makes reviewing this PR a bit easier by having a less noisy, "executable pseudo-code"-like description on what happens under the hood. The code passes the BIP352 test vectors and hence should be correct.
Thanks for rebasing on top of this PR, much appreciated! I will take a closer look within the next days. |
w0xlt
left a comment
There was a problem hiding this comment.
Nit: Not related to optimization, but the diff below removes some redundant public-key serialization code:
diff --git a/src/modules/silentpayments/main_impl.h b/src/modules/silentpayments/main_impl.h
index 106da20..922433d 100644
--- a/src/modules/silentpayments/main_impl.h
+++ b/src/modules/silentpayments/main_impl.h
@@ -21,6 +21,19 @@
/** magic bytes for ensuring prevouts_summary objects were initialized correctly. */
static const unsigned char secp256k1_silentpayments_prevouts_summary_magic[4] = { 0xa7, 0x1c, 0xd3, 0x5e };
+/* Serialize a ge to compressed 33 bytes. Keeps eckey_pubkey_serialize usage uniform
+ * (expects non-const ge*), and centralizes the VERIFY_CHECK. */
+static SECP256K1_INLINE void secp256k1_sp_ge_serialize33(const secp256k1_ge* in, unsigned char out33[33]) {
+ size_t len = 33;
+ secp256k1_ge tmp = *in;
+ int ok = secp256k1_eckey_pubkey_serialize(&tmp, out33, &len, 1);
+#ifdef VERIFY
+ VERIFY_CHECK(ok && len == 33);
+#else
+ (void)ok;
+#endif
+}
+
/** Sort an array of silent payment recipients. This is used to group recipients by scan pubkey to
* ensure the correct values of k are used when creating multiple outputs for a recipient.
*
@@ -68,13 +81,11 @@ static int secp256k1_silentpayments_calculate_input_hash_scalar(secp256k1_scalar
secp256k1_sha256 hash;
unsigned char pubkey_sum_ser[33];
unsigned char input_hash[32];
- size_t len;
int ret, overflow;
secp256k1_silentpayments_sha256_init_inputs(&hash);
secp256k1_sha256_write(&hash, outpoint_smallest36, 36);
- ret = secp256k1_eckey_pubkey_serialize(pubkey_sum, pubkey_sum_ser, &len, 1);
- VERIFY_CHECK(ret && len == sizeof(pubkey_sum_ser));
+ secp256k1_sp_ge_serialize33(pubkey_sum, pubkey_sum_ser);
secp256k1_sha256_write(&hash, pubkey_sum_ser, sizeof(pubkey_sum_ser));
secp256k1_sha256_finalize(&hash, input_hash);
/* Convert input_hash to a scalar.
@@ -85,15 +96,13 @@ static int secp256k1_silentpayments_calculate_input_hash_scalar(secp256k1_scalar
* an error to ensure strict compliance with BIP0352.
*/
secp256k1_scalar_set_b32(input_hash_scalar, input_hash, &overflow);
- ret &= !secp256k1_scalar_is_zero(input_hash_scalar);
+ ret = !secp256k1_scalar_is_zero(input_hash_scalar);
return ret & !overflow;
}
static void secp256k1_silentpayments_create_shared_secret(const secp256k1_context *ctx, unsigned char *shared_secret33, const secp256k1_ge *public_component, const secp256k1_scalar *secret_component) {
secp256k1_gej ss_j;
secp256k1_ge ss;
- size_t len;
- int ret;
secp256k1_ecmult_const(&ss_j, public_component, secret_component);
secp256k1_ge_set_gej(&ss, &ss_j);
@@ -103,12 +112,7 @@ static void secp256k1_silentpayments_create_shared_secret(const secp256k1_contex
* impossible at this point considering we have already validated the public key and
* the secret key.
*/
- ret = secp256k1_eckey_pubkey_serialize(&ss, shared_secret33, &len, 1);
-#ifdef VERIFY
- VERIFY_CHECK(ret && len == 33);
-#else
- (void)ret;
-#endif
+ secp256k1_sp_ge_serialize33(&ss, shared_secret33);
/* Leaking these values would break indistinguishability of the transaction, so clear them. */
secp256k1_ge_clear(&ss);
@@ -585,7 +589,6 @@ int secp256k1_silentpayments_recipient_scan_outputs(
secp256k1_ge output_negated_ge, tx_output_ge;
secp256k1_gej tx_output_gej, label_gej;
unsigned char label33[33];
- size_t len;
secp256k1_xonly_pubkey_load(ctx, &tx_output_ge, tx_outputs[j]);
secp256k1_gej_set_ge(&tx_output_gej, &tx_output_ge);
@@ -595,7 +598,6 @@ int secp256k1_silentpayments_recipient_scan_outputs(
secp256k1_ge_neg(&output_negated_ge, &output_ge);
secp256k1_gej_add_ge_var(&label_gej, &tx_output_gej, &output_negated_ge, NULL);
secp256k1_ge_set_gej_var(&label_ge, &label_gej);
- ret = secp256k1_eckey_pubkey_serialize(&label_ge, label33, &len, 1);
/* Serialize must succeed because the point was just loaded.
*
* Note: serialize will also fail if label_ge is the point at infinity, but we know
@@ -603,7 +605,7 @@ int secp256k1_silentpayments_recipient_scan_outputs(
* Thus, we know that label_ge = tx_output_gej + output_negated_ge cannot be the
* point at infinity.
*/
- VERIFY_CHECK(ret && len == 33);
+ secp256k1_sp_ge_serialize33(&label_ge, label33);
label_tweak = label_lookup(label33, label_context);
if (label_tweak != NULL) {
found = 1;
@@ -617,7 +619,6 @@ int secp256k1_silentpayments_recipient_scan_outputs(
secp256k1_gej_neg(&label_gej, &tx_output_gej);
secp256k1_gej_add_ge_var(&label_gej, &label_gej, &output_negated_ge, NULL);
secp256k1_ge_set_gej_var(&label_ge, &label_gej);
- ret = secp256k1_eckey_pubkey_serialize(&label_ge, label33, &len, 1);
/* Serialize must succeed because the point was just loaded.
*
* Note: serialize will also fail if label_ge is the point at infinity, but we know
@@ -625,7 +626,7 @@ int secp256k1_silentpayments_recipient_scan_outputs(
* Thus, we know that label_ge = tx_output_gej + output_negated_ge cannot be the
* point at infinity.
*/
- VERIFY_CHECK(ret && len == 33);
+ secp256k1_sp_ge_serialize33(&label_ge, label33);
label_tweak = label_lookup(label33, label_context);
if (label_tweak != NULL) {
found = 1;
w0xlt
left a comment
There was a problem hiding this comment.
nit: The following diff removes the implicit cast and clarifies that k is 4 bytes
diff --git a/src/modules/silentpayments/main_impl.h b/src/modules/silentpayments/main_impl.h
index 922433d..d94aed6 100644
--- a/src/modules/silentpayments/main_impl.h
+++ b/src/modules/silentpayments/main_impl.h
@@ -512,7 +512,8 @@ int secp256k1_silentpayments_recipient_scan_outputs(
secp256k1_xonly_pubkey output_xonly;
unsigned char shared_secret[33];
const unsigned char *label_tweak = NULL;
- size_t j, k, found_idx;
+ size_t j, found_idx;
+ uint32_t k;
int found, combined, valid_scan_key, ret;
/* Sanity check inputs */
jonasnick
left a comment
There was a problem hiding this comment.
Thanks @theStack for the new PR. I can confirm that this PR is a rebased version of #1698, with the light client functionality removed and comments addressed, except for:
- #1698 (comment)
- #1698 (comment)
- #1698 (review) (only the last one, "elemement")
Is there a reason for serializing prevouts_summary without light client functionality? If not, I think the don't do this comment is sufficient. Right now, in contrast to the docs of all other opaque objects, this is missing, however:
|
c11d30c to
445f2e8
Compare
|
@w0xlt, @jonasnick: Thanks for the reviews! I've addressed the suggested changes:
Given that this compressed-pubkey-serialization pattern shows up repeatedly also in other modules (ellswift, musig), I think it would make the most sense to add a general helper (e.g. in eckey{,_impl}.h), which could be done in an independent PR. I've opened issue #1773 to see if there is conceptual support for doing this.
Good point, I can't think of a good reason for full nodes wanting to serialize prevouts_summary. |
445f2e8 to
9103229
Compare
|
To address the open questions, I’ve reviewed the proposed changes by @w0xlt on 8d16914. I'm going to focus more on the key aspects I extracted from the review and the merits of each change, rather on the big O improvement claims, because I didn't get that far. These are multiple different changes rather than a single one, so to make the review easier I suggest to brake it in multiple commits. I would state on each of them the purpose and the real case scenario where the change would be relevant. Also, I would use clearer names for the variables or at least document their purpose. The changes I've identified so far are the following:
In general I agree with @jonasnick that we should define a clear target to benchmark and improve. As I've said before, the base case should be a wallet with a single label for change. In conclusion, from the proposed commit and the discussion around it, the only changes I've found clear enough to consider are:
|
|
Thanks @nymius for reviewing the changes, addressing the main points, and proposing a simplification. I’m currently splitting the optimization commit into smaller pieces to make it easier to review. The only part of the discussion that still feels a bit ambiguous is the “base” or “usual” case. So the goal of this optimization would be to mitigate that scenario, not the collaborative one. |
|
I ran the Without the Answering the questions: The optimized receiver implementation relies on a heuristic (the
|
|
@w0xlt, @nymius: Thanks for investigating this deeper. I've now also had a chance to look at the suggested optimizations and came to similar conclusions as stated in #1765 (comment). I particularly agree with the stated points that the changes should not increase complexity significantly and that the most important optimization candidate to consider for mitigating the worst-case scanning attack is "skip outputs that we have already found" (as previously stated by @jonasnick, see #1698 (comment) and jonasnick@311b4eb). I don't think stabilizing the sorting helps at all, since this is something that happens at the sender side, and we can't rely on the attacker using a specific implementation (even if they did, it's trivial for them to shuffle the outputs after creation). For the proposed target to benchmark, I'm proposing the following modified example that exhibits the worst-case scanning time based on a labels cache with one entry (for change outputs), by creating a tx with 23255 outputs [1] all targeted for Bob: 1df4287 Shower-thought from this morning: what if we treat the Any thoughts on this? Maybe I'm still missing something. [1] that's an upper bound of maximum outputs per block: floor(1000000/43) = 23255 |
1df4287 is a good target.
I had to increase stack size to be able to fit all N_OUTPUT size allocations in the example. Initially I preferred the |
|
@theStack Yes — if we want to keep only the adversarial-scenario optimizations, we can drop sort stabilization and the extra heads. I like your idea of avoiding dynamic memory allocation. That’s a very interesting direction. On my machine, the scan completes in about 0.4s, which feels like a good balance between simplicity and the optimization needed for the labeled case. Below are the changes I had to make for your example to run on my machine and to record the scan time. diff --git a/examples/silentpayments.c b/examples/silentpayments.c
index 5e71e73..d43332f 100644
--- a/examples/silentpayments.c
+++ b/examples/silentpayments.c
@@ -10,6 +10,7 @@
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
+#include <time.h>
#include <secp256k1_extrakeys.h>
#include <secp256k1_silentpayments.h>
@@ -112,15 +113,21 @@ const unsigned char* label_lookup(
return NULL;
}
+static secp256k1_xonly_pubkey tx_inputs[N_INPUTS];
+static const secp256k1_xonly_pubkey *tx_input_ptrs[N_INPUTS];
+static secp256k1_xonly_pubkey tx_outputs[N_OUTPUTS];
+static secp256k1_xonly_pubkey *tx_output_ptrs[N_OUTPUTS];
+static secp256k1_silentpayments_found_output found_outputs[N_OUTPUTS];
+static secp256k1_silentpayments_found_output *found_output_ptrs[N_OUTPUTS];
+static secp256k1_silentpayments_recipient recipients[N_OUTPUTS];
+static const secp256k1_silentpayments_recipient *recipient_ptrs[N_OUTPUTS];
+/* 2D array for holding multiple public key pairs. The second index, i.e., [2],
+ * is to represent the spend and scan public keys. */
+static unsigned char (*sp_addresses[N_OUTPUTS])[2][33];
+
int main(void) {
unsigned char randomize[32];
unsigned char serialized_xonly[32];
- secp256k1_xonly_pubkey tx_inputs[N_INPUTS];
- const secp256k1_xonly_pubkey *tx_input_ptrs[N_INPUTS];
- secp256k1_xonly_pubkey tx_outputs[N_OUTPUTS];
- secp256k1_xonly_pubkey *tx_output_ptrs[N_OUTPUTS];
- secp256k1_silentpayments_found_output found_outputs[N_OUTPUTS];
- secp256k1_silentpayments_found_output *found_output_ptrs[N_OUTPUTS];
secp256k1_silentpayments_prevouts_summary prevouts_summary;
secp256k1_pubkey unlabeled_spend_pubkey;
struct labels_cache bob_labels_cache;
@@ -209,11 +216,6 @@ int main(void) {
{
secp256k1_keypair sender_keypairs[N_INPUTS];
const secp256k1_keypair *sender_keypair_ptrs[N_INPUTS];
- secp256k1_silentpayments_recipient recipients[N_OUTPUTS];
- const secp256k1_silentpayments_recipient *recipient_ptrs[N_OUTPUTS];
- /* 2D array for holding multiple public key pairs. The second index, i.e., [2],
- * is to represent the spend and scan public keys. */
- unsigned char (*sp_addresses[N_OUTPUTS])[2][33];
unsigned char seckey[32];
printf("Sending...\n");
@@ -340,6 +342,9 @@ int main(void) {
* `secp256k1_silentpayments_recipient_prevouts_summary_create`
* 2. Call `secp256k1_silentpayments_recipient_scan_outputs`
*/
+ clock_t start, end;
+ double cpu_time_used;
+
ret = secp256k1_silentpayments_recipient_prevouts_summary_create(ctx,
&prevouts_summary,
smallest_outpoint,
@@ -356,14 +361,20 @@ int main(void) {
/* Scan the transaction */
n_found_outputs = 0;
+
+ start = clock();
ret = secp256k1_silentpayments_recipient_scan_outputs(ctx,
found_output_ptrs, &n_found_outputs,
- (const secp256k1_xonly_pubkey * const *)tx_output_ptrs, N_OUTPUTS,
+ (const secp256k1_xonly_pubkey **)tx_output_ptrs, N_OUTPUTS,
bob_scan_key,
&prevouts_summary,
&unlabeled_spend_pubkey,
label_lookup, &bob_labels_cache /* NULL, NULL for no labels */
);
+ end = clock();
+ cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
+ printf("Bob's scan took %f seconds\n", cpu_time_used);
+
if (!ret) {
printf("This transaction is not valid for Silent Payments, skipping.\n");
return EXIT_SUCCESS;
@@ -435,7 +446,7 @@ int main(void) {
n_found_outputs = 0;
ret = secp256k1_silentpayments_recipient_scan_outputs(ctx,
found_output_ptrs, &n_found_outputs,
- (const secp256k1_xonly_pubkey * const *)tx_output_ptrs, 1, /* dummy scan with one output (we only care about Bob) */
+ (const secp256k1_xonly_pubkey **)tx_output_ptrs, 1, /* dummy scan with one output (we only care about Bob) */
carol_scan_key,
&prevouts_summary,
&unlabeled_spend_pubkey, |
|
@nymius, @w0xlt: Thanks once again for the quick feedback and for benchmarking! Shortly after my previous comment, I've been notified about yet another approach to tackle the worst-case scanning time attack (kudos to @furszy for bringing up the idea!), that I think is even more elegant: we can use the pointers in the The only tiny drawback about these non-malloc approaches might be that something that is conceptually an "in" parameter is modified, which might be a bit unsound in a strict API design sense. On the other hand, it shouldn't matter for the user (I doubt that these lists passed in would ever be reused for anything else after by the callers), and we already do the same in the sending API for the recipients, so it's probably fine.
The way I see it currently, code paths for non-adversarial scenarios with increasing k values would be hit so rarely in practice, that I'm sceptical that it's worth it put much effort into those optimizations. When scanning, the vast majority of transactions won't have any matches in the first place. Out of those few that do have a match, the vast majority will very likely again not contain any repeated recipient (IMHO it doesn't make that much sense to do that, unless the recipient explicitly asks "I want to receive my payment split up in multiple UTXOs, but still in a single tx"?), so in the bigger picture those optimizations wouldn't matter all that much, and I'd assume that the dominant factor should be by far all the (unavoidable) ECDH computations per transaction. But that's still more of a guess and it's still good to already have optimization ideas at hand if we need them in the future. |
|
@theStack Thanks for continuing to refine the optimization. The deletion approach performs slightly better (0.40 s vs. 0.45 s), likely because deleting items shrinks the array and cuts the number of loop iterations by about 50% compared to nullifying them. |
assuming the labels cache has only one entry (for change) for now includes fixes by w0xlt in order to avoid running into a stack overflow and time measureing code, see bitcoin-core#1765 (comment)
assuming the labels cache has only one entry (for change) for now includes fixes by w0xlt in order to avoid running into a stack overflow and time measureing code, see bitcoin-core#1765 (comment)
9103229 to
650b2fb
Compare
|
To summarize, the following table shows the proposed mitigations for the worst-case scanning attack so far, with benchmark results from my machine. The previous baseline commit with the worst-case example has been updated to include @w0xlt's changes, in order to work without stack size limit changes.
The run-times of the fixes vary slightly (the removal approach "fix2" being the fastest, confirming #1765 (comment) above), but are all in the same ballpark. I don't think exact performance results matter much here, as the goal of the mitigation should be to IMHO roughly cut the run-time down from "minutes" to "seconds" (and remember, this is already for the absolute worst-case, one giant non-standard transaction filling out a whole block, and it can only slow down one specific SP recipient). Thus, I decided to pick the the simplest approach that avoids dynamic memory allocation, i.e. fix number 3 using With that tackled, I believe that all of the open questions and TODOs are addressed now (updated the PR description accordingly). The latest force-push also includes a rebase on master (to include the CI fix #1771). |
Thanks a lot, that's very helpful! If you have the chance, could you do another run that includes exercising the SP benchmarks as well, i.e. by // EDIT: nevermind, I only realize now that anyone can simply do it locally by applying each of the listed mutant changes :) |
c02e14c to
a8f297a
Compare
theStack
left a comment
There was a problem hiding this comment.
@josibake: Thanks for the review and also weighing in on past discussion topics, much appreciated! 🙏
I've force pushed with addressing #1765 (comment) and #1765 (comment). The mutation tests are pretty nice, as they uncovered missing test coverage for the returned label (and the found_with_label flag) in found outputs (thanks @brunoerg!); I've added tests for that now as well (in the test vectors we don't know the exact label that was used for creating an individual output, but checking against the set of involved labels should be good enough; at least it kills the corresponding mutants 🧟 where the label_ge assignments were modified). The remaining mutant testing findings are IMHO way less severe or there is a good explanation for them, see review comments below -- up for discussion if we want to address them as well.
| combined = (int)prevouts_summary->data[4]; | ||
| if (!combined) { | ||
| secp256k1_scalar input_hash_scalar; | ||
| secp256k1_scalar_set_b32(&input_hash_scalar, &prevouts_summary->data[5 + 64], NULL); | ||
| secp256k1_scalar_mul(&scan_key_scalar, &scan_key_scalar, &input_hash_scalar); |
There was a problem hiding this comment.
The mutation tests discovered that taking this branch unconditionally if (1 == 1) ... doesn't result in a failure. This makes sense: due to missing light client support in the PR, the prevouts_summary object is currently always created with combined = 0, i.e. there is no way of not taking the branch, unless the user tampers with the prevous_summary manually. Should we ARG_CHECK on that or return 0, or leave as-is? Not entirely sure what to do here, considering that a later PR that adds light client support might change the error behavior again (but maybe that's fine and I'm overthinking).
There was a problem hiding this comment.
Given this is for behaviour in a later PR, I'd recommend leaving as-is for now, and looking at the testing in the later PR(s).
| k_max = (n_tx_outputs < SECP256K1_SILENTPAYMENTS_RECIPIENT_GROUP_LIMIT) ? | ||
| n_tx_outputs : SECP256K1_SILENTPAYMENTS_RECIPIENT_GROUP_LIMIT; |
There was a problem hiding this comment.
The mutation tests discovered that by replacing <= with <, i.e.
| k_max = (n_tx_outputs < SECP256K1_SILENTPAYMENTS_RECIPIENT_GROUP_LIMIT) ? | |
| n_tx_outputs : SECP256K1_SILENTPAYMENTS_RECIPIENT_GROUP_LIMIT; | |
| k_max = (n_tx_outputs <= SECP256K1_SILENTPAYMENTS_RECIPIENT_GROUP_LIMIT) ? | |
| n_tx_outputs : SECP256K1_SILENTPAYMENTS_RECIPIENT_GROUP_LIMIT; |
the tests still pass. I think that's expected, as these statements are logically equivalent -- if the number of outputs is equal to K_max, it doesn't matter which of the two resultant expressions are taken. (As potential follow-up idea, maybe we could introduce a MIN macro for better readability?).
There was a problem hiding this comment.
Personally, I find (a < b) ? a : b to be readable enough, i think this is an example of a mutant false positive.
| if (ret != 0) { | ||
| return ret; | ||
| } else { | ||
| return (r1->index > r2->index) - (r1->index < r2->index); |
There was a problem hiding this comment.
The mutation tests discovered that certain changes in the inequality operators here don't result in an error. Considering that the spec doesn't guarantee a certain output ordering within a group and this tie-breaker was primarily introduced to make testing a bit more deterministic and easier, I'm not sure if it's worth it to go deeper here and put in an effort to kill the mutants.
| label_batch_idx++; | ||
| /* If the batch is filled or we have reached the last transaction output, perform batch | ||
| * inversion and check the label cache for each label candidate entry in the batch */ | ||
| if (label_batch_idx == LABEL_BATCH_SIZE || j == (n_tx_outputs-1)) { |
There was a problem hiding this comment.
The mutation tests discovered that the following change
| if (label_batch_idx == LABEL_BATCH_SIZE || j == (n_tx_outputs-1)) { | |
| if (label_batch_idx != LABEL_BATCH_SIZE || j == (n_tx_outputs-1)) { |
doesn't lead to a test failure. This change seems to effectively reduce the batch size to 1 (and sometimes 2), which weakens the performance significantly, but doesn't affect the logic. AFAICT there is no way we could catch that regression with a unit test.
| for (i = 0; i < n_seckeys; i++) { | ||
| ret = secp256k1_scalar_set_b32_seckey(&addend, seckeys[i]); | ||
| secp256k1_declassify(ctx, &ret, sizeof(ret)); | ||
| if (!ret) { |
There was a problem hiding this comment.
The mutation tests discovered that never taking this branch (i.e. if (1 == 0) ...) doesn't lead to a test failure. We do test for passing an invalid secret key (see https://github.com/theStack/secp256k1/blob/a8f297a642651835a0086e85790ab9e6a0b6ca82/src/modules/silentpayments/tests_impl.h#L268-L269), but apparently another error condition is hit later (I guess the secret key sum being zero).
| } | ||
| } | ||
| if (found_idx != -1) { | ||
| break; |
There was a problem hiding this comment.
The mutation tests discovered that continueing instead of breaking here (or never taking the branch, which results in the same logic), doesn't lead to a test error. We could fix that by moving the found_idx = -1 assignment within the outputs iteration loop (j counter), but I'm not sure if that helps readability.
There was a problem hiding this comment.
In this case, I'd prioritise readability?
|
ACK a8f297a 🤫 |
Scalar multiplication with the generator point frequently involves a
conversion to affine coordinates and clearing out the temporary Jacobian
group element object after to avoid leaking secret key material, i.e.
executing the following three steps:
- secp256k1_ecmult_gen(ctx, &rj, ...)
- secp256k1_ge_set_gej(&r, &rj)
- secp256k1_gej_clear(&rj)
This commit introduces a corresponding helper to deduplicate code
and mitigate the risk that last step is forgotten (which can easily
happen and is not detected by tests).
The idea came up during a conversation with furszy, see
bitcoin-core#1765 (comment)
… gej leaks)
Scalar multiplication with the generator point frequently involves a
conversion to affine coordinates and clearing out the temporary Jacobian
group element object after to avoid leaking secret key material, i.e.
executing the following three steps:
- secp256k1_ecmult_gen(ctx, &rj, ...)
- secp256k1_ge_set_gej(&r, &rj)
- secp256k1_gej_clear(&rj)
This commit introduces a corresponding helper to deduplicate code
and mitigate the risk that last step is forgotten (which can easily
happen and is not detected by tests).
The idea came up during a conversation with furszy, see
bitcoin-core#1765 (comment)
… leaks)
Scalar multiplication with the generator point frequently involves a
conversion to affine coordinates and clearing out the temporary Jacobian
group element object after to avoid leaking secret key material, i.e.
executing the following three steps:
- secp256k1_ecmult_gen(ctx, &rj, ...)
- secp256k1_ge_set_gej(&r, &rj)
- secp256k1_gej_clear(&rj)
This commit introduces a corresponding helper to deduplicate code
and mitigate the risk that last step is forgotten (which can easily
happen and is not detected by tests).
The idea came up during a conversation with furszy, see
bitcoin-core#1765 (comment)
…j leaks)
Scalar multiplication with the generator point frequently involves a
conversion to affine coordinates and clearing out the temporary Jacobian
group element object after to avoid leaking secret key material, i.e.
executing the following three steps:
- secp256k1_ecmult_gen(ctx, &rj, ...)
- secp256k1_ge_set_gej(&r, &rj)
- secp256k1_gej_clear(&rj)
This commit introduces a corresponding helper to deduplicate code
and mitigate the risk that last step is forgotten (which can easily
happen and is not detected by tests).
The idea came up during a conversation with furszy, see
bitcoin-core#1765 (comment)
…accidental gej leaks) 9e017e5 refactor: rename `_ecmult_gen` -> `_ecmult_gen_gej` for consistency (Sebastian Falbesoner) a3296d5 refactor: introduce `_ecmult_gen_ge` helper (preventing accidental gej leaks) (Sebastian Falbesoner) Pull request description: Scalar multiplication with the generator point frequently involves a conversion to affine coordinates and clearing out the temporary Jacobian group element object after to avoid leaking secret key material (see 765ef53 / #1579 for that last part), i.e. executing the following three functions: * `secp256k1_ecmult_gen(ctx, &rj, ...)` * `secp256k1_ge_set_gej(&r, &rj)` * `secp256k1_gej_clear(&rj)` This PR introduces a corresponding helper to deduplicate code and mitigate the risk that the last step is forgotten (which can easily happen, as it would not be detected by tests). It is applied in the code paths for ECDSA signing, Schnorr signing, public key creation and ecmult_gen blinding setup. The only remaining instance where we directly call `_ecmult_gen` is for [musig nonce generation](https://github.com/bitcoin-core/secp256k1/blob/a39093de15345a330fc8acbd09515fd150853bc8/src/modules/musig/session_impl.h#L416), as we apply batch inversion there for the two points. The idea came up during a conversation with furszy, who caught that the gej clearing was missing in the silentpayments module (sending function) as well (see #1765 (comment), [b10c mirror link](https://mirror.b10c.me/bitcoin-core-secp256k1/1765/#issuecomment-4482838033)). If this gets conceptual support, I'd be curious to hear naming suggestions, as I'm not sure if the current one is fits well to the existing terminology (maybe `ecmult_gen_ge` or `ecmult_gen_to_affine`?). ACKs for top commit: real-or-random: utACK 9e017e5 furszy: ACK 9e017e5 Tree-SHA512: e9dc96c4301622e5b258de5c2cff5bd9f27d07d3a5f881d937c3db526e2c0a7ba5772ea24585af37be359e0510136916a15b39316c7c351d068c09545c003b6e
Add a routine for the entire sending flow which takes a set of private keys,
the smallest outpoint, and list of recipients and returns a list of
x-only public keys by performing the following steps:
1. Sum up the private keys
2. Calculate the input_hash
3. For each recipient group:
3a. Calculate a shared secret
3b. Create the requested number of outputs
This function assumes a single sender context in that it requires the
sender to have access to all of the private keys. In the future, this
API may be expanded to allow for a multiple senders or for a single
sender who does not have access to all private keys at any given time,
but for now these modes are considered out of scope / unsafe.
Internal to the library, add:
1. A function for creating shared secrets (i.e., a*B or b*A)
2. A function for generating the "SharedSecret" tagged hash
3. A function for creating a single output public key
Co-authored-by: w0xlt <94266259+w0xlt@users.noreply.github.com>
Add function for creating a label tweak. This requires a tagged hash function for labels. This function is used by the receiver for creating labels to be used for a) creating labeled addresses and b) to populate a labels cache when scanning. Add function for creating a labeled spend pubkey. This involves taking a label tweak, turning it into a public key and adding it to the spend public key. This function is used by the receiver to create a labeled silent payment address. Add tests for the label API. Co-authored-by: w0xlt <94266259+w0xlt@users.noreply.github.com>
Add routine for scanning a transaction and returning the necessary spending data for any found outputs. This function works with labels via a lookup callback and requires access to the transaction outputs. Requiring access to the transaction outputs is not suitable for light clients, but light client support is enabled in a future release. Add an opaque data type for passing around the prevout public key sum and the input hash tweak (input_hash). This data is passed to the scanner before the ECDH step as two separate elements so that the scanner can multiply the scan_key * input_hash before doing ECDH. Finally, add test coverage for the receiving API. Co-authored-by: w0xlt <94266259+w0xlt@users.noreply.github.com>
This affects both the sending and scanning API functions: * Sending fails if any group is exceeding the limit. * Scanning doesn't look beyond the limit. Also add a recommendation to the API docs to shuffle the `tx_outputs` input array, which improves the worst-case by ~2x. Co-authored-by: nymius <155548262+nymius@users.noreply.github.com>
Demonstrate sending and scanning on full nodes.
Add a benchmark for a full transaction scan, both for the common case and worst-case (full-block sized tx) scenarios. Only benchmarks for scanning are added as this is the most performance critical portion of the protocol. Co-authored-by: Sebastian Falbesoner <91535+thestack@users.noreply.github.com>
This improves the worst-case scanning scenario performance by ~2.5x
and also helps notably in the common case ("no match") scenario,
especially if the number of outputs is high:
Benchmark results on parent commit (without optimization):
silentpayments_scan_nomatch_N=2 , 43.0 , 43.1 , 43.1
silentpayments_scan_nomatch_N=5 , 46.8 , 46.9 , 46.9
silentpayments_scan_nomatch_N=10 , 53.5 , 53.6 , 53.6
silentpayments_scan_nomatch_N=100 , 175.0 , 178.0 , 186.0
silentpayments_scan_nomatch_N=1000 , 1425.0 , 1430.0 , 1436.0
silentpayments_scan_nomatch_N=2323 , 3251.0 , 3260.0 , 3269.0
silentpayments_scan_nomatch_N=23250 , 32039.0 , 32167.0 , 32294.0
silentpayments_scan_worstcase_K=10 , 352706.0 , 352781.0 , 352872.0
silentpayments_scan_worstcase_K=100 , 3230214.0 , 3231089.0 , 3231796.0
silentpayments_scan_worstcase_K=1000 , 31401430.0 , 31406989.0 , 31416195.0
silentpayments_scan_worstcase_K=2323 , 70759992.0 , 70764742.0 , 70769823.0
Benchmark results on this commit (with optimization):
silentpayments_scan_nomatch_N=2 , 42.2 , 42.2 , 42.2
silentpayments_scan_nomatch_N=5 , 43.6 , 43.6 , 43.7
silentpayments_scan_nomatch_N=10 , 47.1 , 47.1 , 47.1
silentpayments_scan_nomatch_N=100 , 101.0 , 104.0 , 109.0
silentpayments_scan_nomatch_N=1000 , 654.0 , 658.0 , 664.0
silentpayments_scan_nomatch_N=2323 , 1473.0 , 1476.0 , 1479.0
silentpayments_scan_nomatch_N=23250 , 14400.0 , 14412.0 , 14425.0
silentpayments_scan_worstcase_K=10 , 157215.0 , 157242.0 , 157267.0
silentpayments_scan_worstcase_K=100 , 1440043.0 , 1440081.0 , 1440106.0
silentpayments_scan_worstcase_K=1000 , 13995233.0 , 13995646.0 , 13995923.0
silentpayments_scan_worstcase_K=2323 , 31535643.0 , 31538121.0 , 31542702.0
Add the BIP-352 test vectors. The vectors are generated with a Python script that converts the .json file from the BIP to C code: $ ./tools/tests_silentpayments_generate.py test_vectors.json > ./src/modules/silentpayments/vectors.h Co-authored-by: Ron <4712150+macgyver13@users.noreply.github.com> Co-authored-by: Sebastian Falbesoner <91535+thestack@users.noreply.github.com> Co-authored-by: Tim Ruffing <1071625+real-or-random@users.noreply.github.com>
Co-authored-by: Jonas Nick <2582071+jonasnick@users.noreply.github.com> Co-authored-by: Sebastian Falbesoner <91535+thestack@users.noreply.github.com>
Test midstate tags used in silent payments.
The worst-case scanning benchmarks and the common-case scanning benchmarks with N>10 are relatively slow, leading to signifcantly long run times of CI jobs. Avoid this by skipping these if the iters count is <= 2 (CI jobs run with SECP256K1_BENCH_ITERS=2). This is the same approach used in the ecmult benchmarks (bench_ecmult).
a8f297a to
2451607
Compare
|
Rebased on master. By taking use of the newly introduced `$ git range-diff a8f297a...2451607` |
|
Failures seem to be docker related, likely just need a restart. |
…e_var` 240578e bench: add internal benchmark for `secp256k1_fe_normalize_var` (Sebastian Falbesoner) Pull request description: While addressing the review suggestion #1765 (comment) ([b10c mirror link](https://mirror.b10c.me/bitcoin-core-secp256k1/1765/#discussion_r3238616034)), I noticed that we don't have an internal benchmark for the variable-time variant of `_fe_normalize` yet, so this PR adds one. IIUC it's fine to repeatedly apply the operation on the same (already normalized at latest after the first loop iteration) field element for benchmarking purposes and don't put in an effort to reach the [final reduction code path](https://github.com/bitcoin-core/secp256k1/blob/b11340b3ce2afac1b6ffda4ce5828c30621d2917/src/field_5x52_impl.h#L120-L132), considering how extremely unlikely it is to reach it in practice. Results on my machine: ``` $ ./build/bin/bench_internal normalize Benchmark , Min(us) , Avg(us) , Max(us) field_normalize , 0.0103 , 0.0106 , 0.0128 field_normalize_var , 0.00545 , 0.00546 , 0.00547 field_normalize_weak , 0.00352 , 0.00354 , 0.00363 ``` ACKs for top commit: real-or-random: utACK 240578e Tree-SHA512: 4480e65b24c9e3c498389c5faf807cc44ae2a421d4500dd95066f9bb4f4885c67d2a6e1875e93912b96422963fd1430f724d442f30eb152faf86302ba266bd94
|
Kicked CI |
|
reACK 2451607 |
Description
This PR implements BIP352 with scanning limited to full-nodes. Light-client scanning is planned to be added in a separate PR in the future. The following 7 API functions are currently introduced:
Sender side [BIP description]:
secp256k1_silentpayments_sender_create_outputs: given a list ofReceiver side, label creation [BIP description]:
secp256k1_silentpayments_recipient_label_create: given a scan secret key and label integer, calculate the corresponding label tweak and label objectsecp256k1_silentpayments_recipient_label_serialize: given a label object, create the corresponding 33-byte serializationsecp256k1_silentpayments_recipient_label_parse: given a 33-byte label representation, create the corresponding label objectsecp256k1_silentpayments_recipient_create_labeled_spend_pubkey: given a spend public key and a label object, create the corresponding labeled spend public keyReceiver side, scanning [BIP description]:
secp256k1_silentpayments_recipient_prevouts_summary_create: given a list ofprevouts_summaryobject needed for scanningsecp256k1_silentpayments_recipient_scan_outputs: given aprevouts_summaryobject, a recipients scan secret key and spend public key, and the relevant transaction outputs (x-only public keys), scan for outputs belonging to the recipients and and return the tweak(s) needed for spending the output(s). Optionally, a label_lookup callback function can be provided to also scan for labels.For a higher-level overview on what these functions exactly do, it's suggested to look at a corresponding Python implementation that was created based on the secp256k1lab project (it passes the test vectors, so this "executable pseudo-code" should be correct).
Changes to the previous take
Based on the latest state of the previous PR #1698 (take 3), the following changes have been made:
_prevout_summary_{parse,serialize},__recipient_create_output_pubkeys), adapted tests and benchmark accordinglyThe scope reduction isn't immediately visible in commit count (only one commit was only introducing light-client relevant functionality and could be completely removed), but the review burden compared #1698 is still significantly lower in terms of LOC, especially in the receiving commit.
Open questions / TODOs
Recent proposals of reducing the worst-case scanning time (see posts by w0xlt and jonasnick, Add BIP352 module (take 3) #1698 (comment) ff.) are not taken into account yet.➡️ solved by marking already-found outputs, see Add "silentpayments" module implementing BIP352 (take 4, limited to full-node scanning) #1765 (comment) ✔️Not providing➡️ solved by mentioning a "don't do this" comment in the API header (same phrasing as in other modules), see Add "silentpayments" module implementing BIP352 (take 4, limited to full-node scanning) #1765 (comment) ✔️prevouts_summary(de)serialization functionality yet in the API poses the risk that users try to do it anyway by treating the opaque object as "serialized". How to cope with that? Is adding a "don't do this" comment in API header sufficient?