Skip to content

Experimental: implement OpenPGP message grammar validation#18

Merged
larabr merged 5 commits intoProtonMail:mainfrom
larabr:proton-message-grammar-check
May 14, 2025
Merged

Experimental: implement OpenPGP message grammar validation#18
larabr merged 5 commits intoProtonMail:mainfrom
larabr:proton-message-grammar-check

Conversation

@larabr
Copy link
Copy Markdown

@larabr larabr commented May 9, 2025

This solution gives us the option to probe whether the grammar is too disruptive in a real-world setting, by testing it in the Proton ecosystem.

@larabr larabr marked this pull request as draft May 10, 2025 01:30
@larabr larabr force-pushed the proton-message-grammar-check branch from 90ef0d8 to c7973fd Compare May 13, 2025 12:19
@larabr larabr marked this pull request as ready for review May 13, 2025 12:20
@larabr larabr force-pushed the proton-message-grammar-check branch from 3b2219a to 1a68010 Compare May 13, 2025 14:43
Comment thread src/config/config.js Outdated
Comment thread src/util.js Outdated
Comment on lines +199 to +202
// Grammar validation cannot be run before message integrity has been enstablished,
// to avoid leaking info about the unauthenticated message structure.
const releaseUnauthenticatedStream = util.isStream(encrypted) && config.allowUnauthenticatedStream;
grammarValidator = getMessageGrammarValidator({ delayReporting: releaseUnauthenticatedStream });
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this condition is correct, because even if we're not streaming or config.allowUnauthenticatedStream is false, we should still wait with reporting grammar errors until after the MDC check has been done. So I think this should just be:

Suggested change
// Grammar validation cannot be run before message integrity has been enstablished,
// to avoid leaking info about the unauthenticated message structure.
const releaseUnauthenticatedStream = util.isStream(encrypted) && config.allowUnauthenticatedStream;
grammarValidator = getMessageGrammarValidator({ delayReporting: releaseUnauthenticatedStream });
// Grammar validation cannot be run before message integrity has been enstablished,
// to avoid leaking info about the unauthenticated message structure.
grammarValidator = getMessageGrammarValidator({ delayReporting: true });

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are not streaming unauthenticated data, all the packet bytes are authenticated before being passed to the parser, because there is a readToEnd here: https://github.com/ProtonMail/openpgpjs/blob/main/src/packet/sym_encrypted_integrity_protected_data.js#L213; so the grammar checker can throw as soon as it detects an error.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK yes, fair enough.
But then still, since we have all the data available in that case, what's the advantage of checking after every packet? We're basically checking the grammar repeatedly after every packet, even though we already have all of them. So I think it's faster to just check it once at the end, especially in the common case of the message being valid.
(But, I guess that's less critical.)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we can avoid parsing the rest of the stream if there is an issue. It's a cheap check

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes sure, but again, it doesn't really make sense to optimize for the "if there's an issue" case at the cost of the happy path

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does if the cost for the happy path is zero. It's checking the content an array of numbers that has like 5 elements at most 😅

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely convinced that it's free, there's a bunch of allocations in there and the regression test complains. But anyway, we can refactor this later if needed.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the test failed for an actual regression 👼 pushed a fix

Comment thread src/packet/sym_encrypted_integrity_protected_data.js
Comment thread src/packet/sym_encrypted_integrity_protected_data.js Outdated
Comment thread src/packet/grammar.ts Outdated
Comment thread src/packet/grammar.ts Outdated
larabr added 2 commits May 13, 2025 19:03
…mar`)

It enforces a message structure as defined in
https://www.rfc-editor.org/rfc/rfc9580.html#section-10.3
(but slightly more permissive with Padding packets allowed in all cases).
Since we are unclear on whether this change might
impact handling of some messages in the wild, generated by
odd use-cases or non-conformant implementations, we
also add the option to disable the grammar check via
`config.enforceGrammar`.

This solution gives us the option to probe whether the
grammar is too disruptive in a real-world setting,
by testing it in the Proton ecosystem.

GrammarErrors are only sensitive in the context of
unauthenticated decrypted streams.
Data is known to be authenticated at the end of the Packetlist stream parsing,
even for messages with MDC check.
@larabr larabr force-pushed the proton-message-grammar-check branch from 1a68010 to 200a1c7 Compare May 13, 2025 17:03
@larabr larabr merged commit 414df43 into ProtonMail:main May 14, 2025
12 of 13 checks passed
Comment thread openpgp.d.ts
import enums from './src/enums';
import config, { type Config, type PartialConfig } from './src/config';

export { enums, config, PartialConfig };
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

export { enums, config, **Config**, PartialConfig };

(TODO fix for upstream release)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants