Releases: Eventual-Inc/parquet-format-safe
Patched parquet-format-safe
fix: Resolve mismatch with Thrift compact protocol

The [Thrift compact protocol](https://github.com/apache/thrift/blob/master/doc/specs/thrift-compact-protocol.md) is used for Parquet file metadata. [parquet-format-safe](https://github.com/jorgecarleitao/parquet-format-safe) and other Rust implementations of the protocol eagerly read string/binary fields as UTF-8. However, the protocol states that

> Strings are first encoded to UTF-8, and then send as binary

so a string and a binary field look identical on the wire. Without using the schema to disambiguate the field type, a reader cannot know upfront whether a field is a string or a binary. As a result, when a field is actually binary and contains invalid UTF-8, these Rust libraries fail to read it with `File out of specification: Invalid thrift: bad data`.

To fix this, we patch the protocol implementation to correctly interpret string/binary fields as binary.
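A minimal sketch of the wire-level ambiguity, assuming only the compact protocol's documented encoding: `string` and `binary` share the same field type id (8, `BINARY`), and the value is an unsigned varint length followed by that many raw bytes. The names `read_varint`, `read_binary`, `decode_field`, `FieldKind`, `Value`, and `ReadError` are hypothetical and are not the API of parquet-format-safe; the sketch only illustrates reading the payload as bytes and leaving UTF-8 validation to a caller that knows from the schema which kind of field it is reading.

```rust
/// Errors this minimal reader can produce (hypothetical, for illustration only).
#[derive(Debug, PartialEq)]
enum ReadError {
    UnexpectedEof,
    VarintTooLong,
    InvalidUtf8,
}

/// Schema-level kind of a field whose compact type id is BINARY (8).
/// Only the schema, not the wire data, can tell these apart.
#[derive(Debug, Clone, Copy)]
enum FieldKind {
    String,
    Binary,
}

/// A decoded field value that borrows from the input buffer.
#[derive(Debug, PartialEq)]
enum Value<'a> {
    Str(&'a str),
    Bytes(&'a [u8]),
}

/// Reads an unsigned LEB128 varint, as used for lengths in the compact protocol.
fn read_varint(buf: &[u8], pos: &mut usize) -> Result<u64, ReadError> {
    let mut result: u64 = 0;
    let mut shift = 0;
    loop {
        let byte = *buf.get(*pos).ok_or(ReadError::UnexpectedEof)?;
        *pos += 1;
        // The low 7 bits carry data; the high bit signals "more bytes follow".
        result |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Ok(result);
        }
        shift += 7;
        if shift >= 64 {
            return Err(ReadError::VarintTooLong);
        }
    }
}

/// Reads a length-prefixed BINARY payload as raw bytes, with no UTF-8 check.
fn read_binary<'a>(buf: &'a [u8], pos: &mut usize) -> Result<&'a [u8], ReadError> {
    let len = read_varint(buf, pos)?;
    let len = usize::try_from(len).map_err(|_| ReadError::UnexpectedEof)?;
    let end = (*pos).checked_add(len).ok_or(ReadError::UnexpectedEof)?;
    let bytes = buf.get(*pos..end).ok_or(ReadError::UnexpectedEof)?;
    *pos = end;
    Ok(bytes)
}

/// Decodes a BINARY-typed field, validating UTF-8 only when the schema says
/// the field is a string. Binary fields with arbitrary bytes are accepted.
fn decode_field<'a>(
    buf: &'a [u8],
    pos: &mut usize,
    kind: FieldKind,
) -> Result<Value<'a>, ReadError> {
    let bytes = read_binary(buf, pos)?;
    match kind {
        FieldKind::Binary => Ok(Value::Bytes(bytes)),
        FieldKind::String => std::str::from_utf8(bytes)
            .map(Value::Str)
            .map_err(|_| ReadError::InvalidUtf8),
    }
}
```

With this split, a payload such as `[0x03, 0xff, 0xfe, 0x00]` decodes successfully as a binary field, while the same bytes read as a string field are rejected, which is the distinction an eager UTF-8 reader cannot make.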