Experimental: implement OpenPGP message grammar validation#18
larabr merged 5 commits into ProtonMail:main
To access the types in internally
90ef0d8 to c7973fd
3b2219a to 1a68010
// Grammar validation cannot be run before message integrity has been established,
// to avoid leaking info about the unauthenticated message structure.
const releaseUnauthenticatedStream = util.isStream(encrypted) && config.allowUnauthenticatedStream;
grammarValidator = getMessageGrammarValidator({ delayReporting: releaseUnauthenticatedStream });
I don't think this condition is correct, because even if we're not streaming or config.allowUnauthenticatedStream is false, we should still hold off on reporting grammar errors until after the MDC check has been done. So I think this should just be:
- // Grammar validation cannot be run before message integrity has been established,
- // to avoid leaking info about the unauthenticated message structure.
- const releaseUnauthenticatedStream = util.isStream(encrypted) && config.allowUnauthenticatedStream;
- grammarValidator = getMessageGrammarValidator({ delayReporting: releaseUnauthenticatedStream });
+ // Grammar validation cannot be run before message integrity has been established,
+ // to avoid leaking info about the unauthenticated message structure.
+ grammarValidator = getMessageGrammarValidator({ delayReporting: true });
If we are not streaming unauthenticated data, all the packet bytes are authenticated before being passed to the parser, because there is a readToEnd here: https://github.com/ProtonMail/openpgpjs/blob/main/src/packet/sym_encrypted_integrity_protected_data.js#L213; so the grammar checker can throw as soon as it detects an error.
OK yes, fair enough.
But then still, since we have all the data available in that case, what's the advantage of checking after every packet? We're basically checking the grammar repeatedly after every packet, even though we already have all of them. So I think it's faster to just check it once at the end, especially in the common case of the message being valid.
(But, I guess that's less critical.)
Because we can avoid parsing the rest of the stream if there is an issue. It's a cheap check.
Yes sure, but again, it doesn't really make sense to optimize for the "if there's an issue" case at the cost of the happy path
It does if the cost for the happy path is zero. It's checking the content of an array of numbers that has like 5 elements at most 😅
I'm not entirely convinced that it's free: there's a bunch of allocations in there and the regression test complains. But anyway, we can refactor this later if needed.
Yeah the test failed for an actual regression 👼 pushed a fix
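To make the `delayReporting` trade-off from this thread concrete, here is a minimal sketch of a validator that checks after every packet and either throws immediately or holds the error until the data is known to be authenticated. All names (`makeGrammarValidator`, `onPacket`, `onAuthenticated`, `violatesToyRule`) are hypothetical and the rule is a toy stand-in, so this is not the code under review.

```typescript
// Hypothetical sketch, not the openpgpjs implementation.
const LITERAL_DATA_TAG = 11; // Literal Data packet type ID (RFC 9580)

function grammarError(message: string): Error {
  const err = new Error(message);
  err.name = 'GrammarError';
  return err;
}

// Toy rule for illustration only: reject a second Literal Data packet
// in the same packet sequence.
function violatesToyRule(tags: readonly number[]): boolean {
  return tags.filter(t => t === LITERAL_DATA_TAG).length > 1;
}

function makeGrammarValidator(options: { delayReporting: boolean }) {
  const seen: number[] = [];
  let pending: Error | null = null;

  return {
    // Called once per parsed packet.
    onPacket(tag: number): void {
      seen.push(tag);
      if (pending === null && violatesToyRule(seen)) {
        const err = grammarError('unexpected packet type ' + tag);
        // Authenticated input: fail fast and skip parsing the rest.
        if (!options.delayReporting) throw err;
        // Unauthenticated stream: remember the error, report it later.
        pending = err;
      }
    },
    // Called once the packet list has ended and integrity has been checked.
    onAuthenticated(): void {
      if (pending !== null) throw pending;
    }
  };
}
```

With `delayReporting: false` the second Literal Data packet makes `onPacket` throw right away; with `delayReporting: true` nothing is reported until `onAuthenticated` is called.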
…mar`)
It enforces a message structure as defined in https://www.rfc-editor.org/rfc/rfc9580.html#section-10.3 (but slightly more permissive with Padding packets allowed in all cases).
Since we are unclear on whether this change might impact handling of some messages in the wild, generated by odd use-cases or non-conformant implementations, we also add the option to disable the grammar check via `config.enforceGrammar`.
This solution gives us the option to probe whether the grammar is too disruptive in a real-world setting, by testing it in the Proton ecosystem.
GrammarErrors are only sensitive in the context of unauthenticated decrypted streams.
Data is known to be authenticated at the end of the Packetlist stream parsing, even for messages with MDC check.
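As a rough illustration of what enforcing the RFC 9580 section 10.3 structure can look like, the sketch below validates a flat sequence of packet type IDs at one nesting level. It is an assumption-laden example, not the code merged here: the function names are hypothetical, Padding packets are simply ignored wherever they appear (matching the "more permissive" note above), and ignoring Marker packets is an extra assumption of this sketch.

```typescript
// Hypothetical sketch; packet type IDs are those assigned by RFC 9580.
const PKESK = 1, SIGNATURE = 2, SKESK = 3, ONE_PASS_SIGNATURE = 4,
  COMPRESSED_DATA = 8, SED = 9, MARKER = 10, LITERAL_DATA = 11,
  SEIPD = 18, PADDING = 21;

// Parses one "OpenPGP Message" starting at `pos`.
// Returns the index just past it, or -1 if the grammar is violated.
function parseMessage(tags: number[], pos: number): number {
  const tag = tags[pos];
  if (tag === undefined) return -1;
  switch (tag) {
    case LITERAL_DATA:    // Literal Message
    case COMPRESSED_DATA: // Compressed Message
    case SED:             // Encrypted Message without ESK packets
    case SEIPD:
      return pos + 1;
    case PKESK:
    case SKESK: {
      // Encrypted Message: ESK sequence followed by encrypted data
      let i = pos;
      while (tags[i] === PKESK || tags[i] === SKESK) i++;
      return tags[i] === SED || tags[i] === SEIPD ? i + 1 : -1;
    }
    case SIGNATURE:
      // Signed Message: Signature packet, then an OpenPGP Message
      return parseMessage(tags, pos + 1);
    case ONE_PASS_SIGNATURE: {
      // One-Pass Signed Message: OPS, OpenPGP Message, matching Signature
      const end = parseMessage(tags, pos + 1);
      return end !== -1 && tags[end] === SIGNATURE ? end + 1 : -1;
    }
    default:
      return -1;
  }
}

function isValidMessage(tags: number[]): boolean {
  // Assumption of this sketch: Padding and Marker packets are skipped.
  const significant = tags.filter(t => t !== PADDING && t !== MARKER);
  return parseMessage(significant, 0) === significant.length;
}
```

Under these assumptions, `isValidMessage([PKESK, SKESK, SEIPD])` and `isValidMessage([ONE_PASS_SIGNATURE, LITERAL_DATA, SIGNATURE])` accept, while two consecutive Literal Data packets or a one-pass signature without its trailing Signature packet are rejected. The contents of a Compressed Data or encrypted packet form their own packet list and would be validated separately.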
1a68010 to 200a1c7
import enums from './src/enums';
import config, { type Config, type PartialConfig } from './src/config';

export { enums, config, PartialConfig };
export { enums, config, **Config**, PartialConfig };
(TODO fix for upstream release)