Commit 29ba746
fix: Resolve mismatch with Thrift compact protocol
The [Thrift compact
protocol](https://github.com/apache/thrift/blob/master/doc/specs/thrift-compact-protocol.md)
is used for Parquet file metadata.
[parquet-format-safe](https://github.com/jorgecarleitao/parquet-format-safe)
and other Rust implementations of the protocol eagerly read
string/binary fields as UTF-8.
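In the compact protocol, `string` and `binary` values share one wire encoding. Each value is written as the byte length, encoded as an unsigned varint (7 bits per byte, least significant group first, high bit set on every byte but the last), followed by the raw bytes. The sketch below uses only the Rust standard library. It reads that encoding and then shows the eager approach of decoding the bytes as UTF-8 straight away. The function names and error strings are hypothetical and are not the parquet-format-safe API.

```rust
/// Reads an unsigned varint (ULEB128) starting at `*pos` and advances `*pos`.
fn read_varint(buf: &[u8], pos: &mut usize) -> Result<u64, String> {
    let mut result: u64 = 0;
    let mut shift = 0u32;
    loop {
        let byte = *buf.get(*pos).ok_or("unexpected end of input")?;
        *pos += 1;
        // Low 7 bits carry data; the high bit says "more bytes follow".
        result |= u64::from(byte & 0x7f) << shift;
        if (byte & 0x80) == 0 {
            return Ok(result);
        }
        shift += 7;
        if shift >= 64 {
            return Err("varint too long".to_string());
        }
    }
}

/// Reads a length-prefixed byte sequence (the encoding shared by string and binary).
fn read_binary<'a>(buf: &'a [u8], pos: &mut usize) -> Result<&'a [u8], String> {
    let len = usize::try_from(read_varint(buf, pos)?).map_err(|_| "length overflow")?;
    let end = (*pos).checked_add(len).ok_or("length overflow")?;
    let bytes = buf.get(*pos..end).ok_or("length exceeds input")?;
    *pos = end;
    Ok(bytes)
}

/// Eager approach (hypothetical): decode as UTF-8 immediately, failing on invalid bytes.
fn read_string_eager(buf: &[u8], pos: &mut usize) -> Result<String, String> {
    let bytes = read_binary(buf, pos)?;
    String::from_utf8(bytes.to_vec()).map_err(|_| "invalid UTF-8".to_string())
}

fn main() {
    // Length 3 (varint 0x03) followed by the bytes of "abc".
    let valid = [0x03, b'a', b'b', b'c'];
    // Length 2 followed by bytes that are not valid UTF-8.
    let invalid = [0x02, 0xff, 0xfe];

    let mut pos = 0;
    println!("{:?}", read_string_eager(&valid, &mut pos)); // Ok("abc")
    let mut pos = 0;
    println!("{:?}", read_string_eager(&invalid, &mut pos)); // Err("invalid UTF-8")
}
```

The second call fails even though `[0xff, 0xfe]` is a perfectly well-formed binary value on the wire.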
However, the protocol states that

> Strings are first encoded to UTF-8, and then send as binary

so, without using the schema to disambiguate the field type, a reader
cannot know upfront whether a field is a string or a binary. As a
result, when a field is actually binary and contains invalid UTF-8,
these Rust libraries fail to read it with `File out of
specification: Invalid thrift: bad data`.
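To see why the type cannot be recovered from the bytes alone, here is a hypothetical writer sketch. It assumes the short-form compact field header (field-id delta in the high 4 bits, type id in the low 4 bits) and compact type id 8, which the specification uses for both strings and binary. A `string` field and a `binary` field holding the same bytes encode identically, and a binary payload may contain bytes such as `0xff` that never appear in valid UTF-8. The function names are illustrative only.

```rust
/// Compact protocol type id shared by `string` and `binary` fields.
const COMPACT_BINARY: u8 = 8;

/// Writes an unsigned varint (ULEB128).
fn write_varint(mut v: u64, out: &mut Vec<u8>) {
    while v >= 0x80 {
        out.push(((v & 0x7f) as u8) | 0x80);
        v >>= 7;
    }
    out.push(v as u8);
}

/// Writes a short-form field header (delta << 4 | type) and a length-prefixed payload.
fn write_bytes_field(field_delta: u8, payload: &[u8], out: &mut Vec<u8>) {
    assert!((1..=15).contains(&field_delta), "short-form header needs a delta of 1..=15");
    out.push((field_delta << 4) | COMPACT_BINARY);
    write_varint(payload.len() as u64, out);
    out.extend_from_slice(payload);
}

/// A string is UTF-8 encoded and then written exactly like binary.
fn write_string_field(field_delta: u8, s: &str, out: &mut Vec<u8>) {
    write_bytes_field(field_delta, s.as_bytes(), out);
}

fn main() {
    let mut as_string = Vec::new();
    write_string_field(1, "abc", &mut as_string);
    let mut as_binary = Vec::new();
    write_bytes_field(1, b"abc", &mut as_binary);

    // Header byte 0x18 = delta 1 << 4 | type 8, then length 3, then "abc".
    assert_eq!(as_string, [0x18, 0x03, b'a', b'b', b'c']);
    // A reader sees the same bytes either way.
    assert_eq!(as_string, as_binary);

    // A binary field may legitimately carry non-UTF-8 bytes.
    let mut raw = Vec::new();
    write_bytes_field(1, &[0xff, 0xfe], &mut raw);
    println!("{}", std::str::from_utf8(&raw[2..]).is_err()); // true
}
```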
To fix this, we patch the protocol implementation to correctly interpret
string/binary fields as binary.

1 parent b499e46
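A minimal sketch of the idea behind the fix, assuming a std-only reader: return the raw bytes without inspecting them, and attempt UTF-8 conversion only at a call site that knows from the schema that the field is a `string`. `read_binary` and `bytes_to_string` are hypothetical names; the actual patch to the protocol implementation may be structured differently.

```rust
/// Reads an unsigned varint (ULEB128) length prefix starting at `*pos`.
fn read_varint(buf: &[u8], pos: &mut usize) -> Result<u64, String> {
    let mut result: u64 = 0;
    let mut shift = 0u32;
    loop {
        let byte = *buf.get(*pos).ok_or("unexpected end of input")?;
        *pos += 1;
        result |= u64::from(byte & 0x7f) << shift;
        if (byte & 0x80) == 0 {
            return Ok(result);
        }
        shift += 7;
        if shift >= 64 {
            return Err("varint too long".to_string());
        }
    }
}

/// Schema-agnostic read: returns the raw bytes and never inspects their content.
fn read_binary(buf: &[u8], pos: &mut usize) -> Result<Vec<u8>, String> {
    let len = usize::try_from(read_varint(buf, pos)?).map_err(|_| "length overflow")?;
    let end = (*pos).checked_add(len).ok_or("length overflow")?;
    let bytes = buf.get(*pos..end).ok_or("length exceeds input")?;
    *pos = end;
    Ok(bytes.to_vec())
}

/// Called only where the schema says the field is a `string`.
fn bytes_to_string(bytes: Vec<u8>) -> Result<String, String> {
    String::from_utf8(bytes).map_err(|e| format!("string field is not valid UTF-8: {e}"))
}

fn main() {
    // Length prefix 2 followed by bytes that are not valid UTF-8.
    let data = [0x02, 0xff, 0xfe];
    let mut pos = 0;
    let raw = read_binary(&data, &mut pos).expect("well-formed length prefix");
    assert_eq!(raw, [0xff, 0xfe]); // the read itself succeeds
    // Only a caller that knows the field is a string attempts the conversion.
    assert!(bytes_to_string(raw).is_err());
    println!("binary read ok; string conversion rejected as expected");
}
```

Keeping the decoder schema-agnostic means genuinely binary fields are never rejected, while a malformed string still surfaces as an error in the caller that expects text.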
6 files changed
Lines changed: 50 additions & 50 deletions
Changed lines per rendered diff (file names and line contents not preserved):

| Diff (page order) | Lines removed (original file) | Lines added (new file) |
|---|---|---|
| 1 | 136 | 136 |
| 2 | 327 | 327 |
| 3 | 266 | 266 |
| 4 | 165; 706-707; 735 | 165; 706-707; 735 |
| 5 | 151 | 151 |