[#10029][benchmarks] arrow-flight roundtrip as well as encode/decode #10031
👍 nice work @Rich-T-kid! pretty clean benchmarks. Left some suggestions, but otherwise this LGTM
alamb left a comment
Thanks @Rich-T-kid and @gabotechs
I think this looks good to me after we implement @gabotechs's suggestion
Pushed a revised PR @alamb. I'm not sure why the CI is failing. I'm running
Looks like the failure is https://github.com/apache/arrow-rs/actions/runs/26834183268/job/79123485734?pr=10031 Do we need to check in the Cargo.lock file? 🤔
Yeah, the error mentions the Cargo.lock file, but it doesn't exist within the arrow-flight repo.
The issue is that the Cargo.lock file on main doesn't have
😰 fixed it
alamb left a comment
Thanks @Rich-T-kid -- I started going through this one.
I suggest:
- Put the code in a single benchmark for now: `arrow-flight/benches/flight.rs`
- Name the benchmarks after whatever flight command they are doing (`put` and `exchange` for example)
- Run the benchmarks locally (via `cargo bench --bench <name>`) and profile them to make sure they show that the encoder is taking substantial time
    let mut client = FlightClient::new(channel);
    let frames = FlightDataEncoderBuilder::new().build(futures::stream::iter([Ok(batch)]));
    let _: Vec<_> = client
        .do_put(frames)
this seems to be benchmarking do_put not really flight_encode 🤔
Originally I wanted to do an end-to-end benchmark, but I think it makes more sense to have smaller, more focused benchmarks. I updated the benchmarks to include one round-trip benchmark and another that benchmarks FlightDataEncoder::encode_batch() directly. Other benchmarks can be added later organically.
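To illustrate the point about focused benchmarks, here is a minimal std-only sketch of keeping setup outside the timed region so that only the step of interest is measured. Everything in it is hypothetical: `encode_stand_in` is a placeholder that serializes `i64` values to bytes, not the arrow-flight encoder, and `time_op` is a toy timer rather than criterion.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the step under test (NOT the arrow-flight encoder):
/// serializes a column of i64 values to little-endian bytes.
fn encode_stand_in(values: &[i64]) -> Vec<u8> {
    let mut out = Vec::with_capacity(values.len() * 8);
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

/// Runs `op` `iters` times and returns the mean wall-clock time per call.
/// Only `op` is timed; anything the caller built beforehand is excluded.
fn time_op<T>(iters: u32, mut op: impl FnMut() -> T) -> Duration {
    let iters = iters.max(1); // avoid dividing by zero below
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from discarding the result.
        black_box(op());
    }
    start.elapsed() / iters
}
```

A caller would build the input once, for example `let input: Vec<i64> = (0..1024).collect();`, and then call `time_op(100, || encode_stand_in(black_box(&input)))`. A round-trip measurement would instead wrap the whole send and receive path in the closure, which is why it says little about the encoder alone.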
The majority of the time is being spent in arrow-ipc, as expected; this is what #10044 aims to address. It's not directly wired up to arrow-flight yet, but the results seem to be good so far.
CI is finally passing, sorry about that! @alamb should be good now, lmk if there's anything else
alamb left a comment
Thank you @Rich-T-kid -- while it did take some back and forth I think this benchmark will now let us improve things substantially.
I agree with your assessment that there is a lot of copying going on -- it will be great to see PRs to avoid it
Thank you also @gabotechs for your review
Which issue does this PR close?
Rationale for this change
Provides benchmarks for the arrow-flight crate: a round-trip benchmark, plus encode and decode benchmarks individually.
What changes are included in this PR?
Adds three criterion benches under arrow-flight/benchmarks/ (roundtrip.rs, flight_encode.rs, flight_decode.rs), each sweeping a tunable matrix of rows, cols, and column types (fixed Int64, variable StringArray, nested List, dict DictionaryArray) built via a shared common::build_batch helper.
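As a rough sketch of what sweeping such a matrix might look like, the std-only snippet below enumerates every combination of column type, row count, and column count and derives a readable benchmark ID for each. The names `ColumnKind`, `Case`, `sweep`, and the ID format are assumptions made for illustration; they are not taken from the PR, and the snippet does not build any Arrow data or call `common::build_batch`.

```rust
/// The four column layouts named in the description; the enum itself is illustrative.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColumnKind {
    Fixed,    // e.g. Int64
    Variable, // e.g. StringArray
    Nested,   // e.g. List
    Dict,     // e.g. DictionaryArray
}

impl ColumnKind {
    const ALL: [ColumnKind; 4] = [
        ColumnKind::Fixed,
        ColumnKind::Variable,
        ColumnKind::Nested,
        ColumnKind::Dict,
    ];

    fn label(self) -> &'static str {
        match self {
            ColumnKind::Fixed => "fixed",
            ColumnKind::Variable => "variable",
            ColumnKind::Nested => "nested",
            ColumnKind::Dict => "dict",
        }
    }
}

/// One point in the benchmark matrix.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Case {
    kind: ColumnKind,
    rows: usize,
    cols: usize,
}

impl Case {
    /// Hypothetical ID format, e.g. "fixed/rows=1024/cols=8".
    fn id(&self) -> String {
        format!("{}/rows={}/cols={}", self.kind.label(), self.rows, self.cols)
    }
}

/// Cartesian product of column kinds, row counts, and column counts.
fn sweep(rows: &[usize], cols: &[usize]) -> Vec<Case> {
    let mut cases = Vec::with_capacity(ColumnKind::ALL.len() * rows.len() * cols.len());
    for kind in ColumnKind::ALL {
        for &r in rows {
            for &c in cols {
                cases.push(Case { kind, rows: r, cols: c });
            }
        }
    }
    cases
}
```

In a real bench, each `Case` would presumably be handed to the batch-building helper and registered under its ID, so that a single slow combination can be picked out of the results by name.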
Are these changes tested?
n/a
Are there any user-facing changes?
no