Add config to avoid checking MD5 on get and put#928
|
Thanks for the PR. Let's remove the relation to no-check-md5, and create a new config key like: And in your case, we can use a value like: With that, in the future, we could have something like: |
|
Disabling ETag is really the only sane choice. Some background: S3Proxy builds on top of Apache jclouds, a cross-cloud library, and jclouds returns a String from |
|
Most providers implementing the Amazon S3 protocol respect the AWS S3 behavior. I say "md5", but we could say "md5s3". Even if they allow themselves to change it in the future, the algorithm is deterministic and has never changed so far. There is even a way to check the MD5 of multipart files if you use a fixed block size. Anyway, why is the ETag useful?
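For reference, the multipart check mentioned above can be sketched as follows. This is a minimal illustration, not code from s3cmd: the function name `multipart_etag` is hypothetical, and it assumes the widely observed AWS behavior where a multipart ETag is the MD5 of the concatenated binary MD5 digests of each part, followed by `-` and the part count. It also assumes that data fitting in a single part was uploaded with a plain PUT; a one-part multipart upload would carry a `-1` suffix instead.

```python
import hashlib


def multipart_etag(data: bytes, part_size: int) -> str:
    """Compute the ETag S3 is commonly observed to return for `data`.

    Assumption: every part except the last has exactly `part_size` bytes,
    and data that fits in one part was uploaded without multipart.
    """
    if part_size <= 0:
        raise ValueError("part_size must be positive")

    # Split the payload into fixed-size parts (the last one may be shorter).
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]

    # Single PUT: the ETag is the plain hex MD5 of the whole body.
    if len(parts) <= 1:
        return hashlib.md5(data).hexdigest()

    # Multipart: MD5 over the concatenated *binary* digests of each part,
    # then "-<number of parts>".
    digests = b"".join(hashlib.md5(part).digest() for part in parts)
    return f"{hashlib.md5(digests).hexdigest()}-{len(parts)}"
```

The result only matches the server when the client knows the part size used for the upload, which is why this works for a "fixed" block size and not in general.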
|
|
HTTPS should provide sufficient protection against corruption. The Azure ETag is opaque and seems to include some kind of timestamp. The other two providers do not actually return an ETag header; the S3Proxy/jclouds behavior is bogus, and it likely should not return any header. Thus there is no work to do here except adding an ignore flag as this pull request does. |
|
In my opinion, in such a case, HTTPS is not necessarily sufficient.
Regarding the patch, in my opinion it adds a new config option just to work around specific lines that conflict with servers that behave differently. It is neither clear for a user nor future-proof. That's why I think adding something more generic, stating what the ETag is supposed to mean for a specific service, could fix your issue and even add value. |
|
Checking ETag is not too useful given that most interesting uploads use multipart, whose ETag also does not have a meaningful value. Most S3 tools which check ETag have some way to disable it, for example, aws/aws-sdk-java#560. If you will not accept this pull request as-is, please close it. |
Overload existing --no-check-md5. Object stores like S3Proxy cannot return an MD5 ETag using some providers:
* Atmos returns nothing
* Azure returns an opaque ETag
* B2 returns a SHA1
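The three provider cases listed in this commit message can be told apart from a plain MD5 by shape alone. The sketch below is an illustration only and is not taken from the s3cmd source; `looks_like_md5` is a hypothetical helper name. It treats an ETag as a possible MD5 only when, after stripping quotes and an optional weak-validator prefix, it is exactly 32 hexadecimal characters.

```python
import re

# A hex-encoded MD5 digest is exactly 32 hexadecimal characters.
_MD5_HEX = re.compile(r"[0-9a-fA-F]{32}")


def looks_like_md5(etag):
    """Return True if `etag` has the shape of a hex MD5 digest.

    Hypothetical helper: a missing header, an opaque token,
    a 40-character SHA1, or a multipart "<hex>-<n>" value all return False.
    """
    if not etag:
        return False
    value = etag.strip()
    # Drop the weak-validator prefix some servers add (W/"...").
    if value.startswith("W/"):
        value = value[2:]
    # ETag header values are normally wrapped in double quotes.
    value = value.strip('"')
    return _MD5_HEX.fullmatch(value) is not None
```

The shape check is necessary but not sufficient: a 32-character hex ETag is not guaranteed to be the MD5 of the object body, so a flag such as --no-check-md5 is still needed to turn the comparison off explicitly.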
|
IMHO ETag is not a good candidate for integrity checking; there are too many cases where it is not an MD5 hash:
I think ETag can only sensibly be used to detect whether the object has changed since the last known ETag. For S3-compatible services, s3cmd is the only client that prints warnings about MD5 hashes not matching when the ETag is not an MD5 hash. The official aws cli does not do this, nor do other S3 clients. If you want to guarantee integrity on the server side, you can do this using the V4 signature using a variety of options for the There are also checksum-specific headers in the response that you can use when retrieving objects if you wish. The ETag is supposed to be treated as an opaque identifier:
|