diff --git a/_posts/2025-04-27-20.0.0-release.md b/_posts/2025-04-27-20.0.0-release.md new file mode 100644 index 000000000000..e86cb9e95055 --- /dev/null +++ b/_posts/2025-04-27-20.0.0-release.md @@ -0,0 +1,189 @@ +--- +layout: post +title: "Apache Arrow 20.0.0 Release" +date: "2025-04-27 00:00:00" +author: pmc +categories: [release] +--- + + +The Apache Arrow team is pleased to announce the 20.0.0 release. This release +covers over 2 months of development work and includes [**259 resolved +issues**][1] on [**327 distinct commits**][2] from [**63 distinct +contributors**][2]. See the [Install Page](https://arrow.apache.org/install/) to +learn how to get the libraries for your platform. + +The release notes below are not exhaustive and only expose selected highlights +of the release. Many other bugfixes and improvements have been made: we refer +you to the [complete changelog][3]. + +## Community + +Since the 19.0.0 release, Ed Seidl, Jean-Baptiste Onofré and Matthijs Brobbel +have been invited to become committers. Bryce Mecum, Ian Cook, Jacob Wujciak-Jens +and Rok Mihevc have been invited to join the Project Management Committee (PMC). + +Thanks for your contributions and participation in the project! + +## C++ Notes + +### Compute + +We’ve added several new compute functions: inverse_permutation/scatter ([GH-44393](https://github.com/apache/arrow/issues/44393)), pivot_wider/hash_pivot_wider ([GH-45269](https://github.com/apache/arrow/issues/45269)), rank_normal ([GH-45572](https://github.com/apache/arrow/issues/45572)), skew/kurtosis ([GH-45676](https://github.com/apache/arrow/issues/45676)), and winsorize ([GH-45755](https://github.com/apache/arrow/issues/45755)). + +### Acero + +We've significantly improved the Hash Join in terms of overflow-safety ([GH-44513](https://github.com/apache/arrow/issues/44513), [GH-45334](https://github.com/apache/arrow/issues/45334), [GH-45506](https://github.com/apache/arrow/issues/45506)), memory consumption (peak memory usage reduced by half: [GH-45551](https://github.com/apache/arrow/issues/45551)), and performance (up to several dozen times faster: [GH-45611](https://github.com/apache/arrow/issues/45611), [GH-45917](https://github.com/apache/arrow/issues/45917)). + +### Flight RPC + +- The experimental Flight over UCX feature has been removed. +([#43296](https://github.com/apache/arrow/issues/43296)) + +## C# Notes + +- Added support for the `Ordered` and `AppMetaData` fields to FlightInfo ([#45753](https://github.com/apache/arrow/pull/45753)) +- FlightClient can now be integrated with [Grpc.Net.ClientFactory](https://learn.microsoft.com/en-us/aspnet/core/grpc/clientfactory?view=aspnetcore-9.0) ([#45451](https://github.com/apache/arrow/issues/45451)) + +## Linux Packaging Notes + +https://apache.jfrog.io/ is still available but https://packages.apache.org/ is preferred because the latter uses the apache.org domain. + +## Python Notes + +Compatibility notes: +* Minimum supported Cython has been raised to 3 and higher [GH-45237](https://github.com/apache/arrow/issues/45237) . +* A subset of deprecated APIs have been removed + [GH-45680](https://github.com/apache/arrow/issues/45680): + ``PARQUET_2_0`` [GH-45848](https://github.com/apache/arrow/issues/45848), + ``use_legacy_dataset`` [GH-44790](https://github.com/apache/arrow/issues/44790), + serialize/deserialize PyArrow C++ code [GH-43587](https://github.com/apache/arrow/issues/43587) . + +New features: +* Large variable width types are supported in NumPy conversion + [GH-35289](https://github.com/apache/arrow/issues/35289). +* Biased/unbiased option are available in skew and kurtosis compute functions + [GH-45733](https://github.com/apache/arrow/issues/45733). +* Support for SAS token in the ``AzureFileSystem`` has been added + [GH-45705](https://github.com/apache/arrow/issues/45705). +* Interchange of ``decimal32``, ``decimal64`` and ``decimal256`` data type objects between + Pandas and PyArrow is now supported [GH-45582](https://github.com/apache/arrow/issues/45582), + [GH-45570](https://github.com/apache/arrow/issues/45570). +* ``pyarrow.ArrayStatistics``and ``pyarrow.Array.statistics()`` are added + [GH-45457](https://github.com/apache/arrow/issues/45457). +* Bindings for JSON streaming reader are added + [GH-14932](https://github.com/apache/arrow/issues/14932). +* Bindings for ``MemoryPool::total_bytes_allocated`` and ``MemoryPool::num_allocations`` are added. + Also allocator-specific statistics can now be printed to stderr [GH-45358](https://github.com/apache/arrow/issues/45358). +* A new ``maps_as_pydicts``parameter is introduced to ``to_pylist``, ``to_pydict`` and ``as_py`` + methods enabling deserialization into Python dictionary instead of list of tuples + [GH-39010](https://github.com/apache/arrow/issues/39010). + +Other improvements: +* Source (sdist) and binary distribution (wheels) are now uploaded to GitHub Releases + [GH-45920](https://github.com/apache/arrow/issues/45920). +* Cython code has been cleaned up as we now require at least Cython 3.0 + [GH-45433](https://github.com/apache/arrow/issues/45433). +* Building of free-threaded wheels on Windows is enabled + [GH-44421](https://github.com/apache/arrow/issues/44421) . +*Wheels for Alpine Linux are now provided [GH-18036](https://github.com/apache/arrow/issues/18036) . + + +Relevant bug fixes: +* Pandas conversion roundtrip with bytes column names error is fixed + [GH-44188](https://github.com/apache/arrow/issues/44188). +* Exceptions are raised instead of showing segfaults when users try to instantiate internal Parquet + metadata classes [GH-36628](https://github.com/apache/arrow/issues/36628). + +## R Notes + +- Binary Arrays now inherit from `blob::blob` in addition to `arrow_binary` when + [converted to R + objects](https://arrow.apache.org/docs/r/articles/data_types.html#translations-from-arrow-to-r). + This change is the first step in eventually deprecating the `arrow_binary` + class in favor of the `blob` class in the + [`blob`](https://cran.r-project.org/package=blob) package (See + [GH-45709](https://github.com/apache/arrow/issues/45709)). + +## Ruby and C GLib Notes + +### Improvements + +- `garrow_array_validate()` / `Arrow::Array#validate`: Added. + - [GH-45328](https://github.com/apache/arrow/issues/45328) +- `garrow_array_validate_full()` / `Arrow::Array#validate_full`: Added. + - [GH-45342](https://github.com/apache/arrow/issues/45342) +- `garrow_record_batch_validate()` / `Arrow::RecordBatch#validate`: Added. + - [GH-45353](https://github.com/apache/arrow/issues/45353) +- `garrow_record_batch_validate_full()` / + `Arrow::RecordBatch#validate_full`: Added. + - [GH-45386](https://github.com/apache/arrow/issues/45386) +- `garrow_table_validate()` / `Arrow::Table#validate`: Added. + - [GH-45414](https://github.com/apache/arrow/issues/45414) +- `garrow_table_validate_full()` / `Arrow::Table#validate_full`: Added. + - [GH-45468](https://github.com/apache/arrow/issues/45468) +- `GArrowArrayStatistics()` / `Arrow::ArrayStatistics`: Added. + - [GH-45490](https://github.com/apache/arrow/issues/45490) +- Changed to require Meson 0.61.2 or later. +- `GArrowBinaryViewArray()` / `Arrow::BinaryViewArray`: Added. + - [GH-45650](https://github.com/apache/arrow/issues/45650) +- `GArrowStringViewArray()` / `Arrow::StringViewArray`: Added. + - [GH-45711](https://github.com/apache/arrow/issues/45711) +- Added support for + [rubygems-requirements-system](https://rubygems.org/gems/rubygems-requirements-system) + - [GH-45976](https://github.com/apache/arrow/issues/45976) + +### Incompatible changes + +- `gparquet_arrow_file_writer_new_row_group()` / + `Parquet:ArrowFileWriter#new_row_group`: Removed `chunk_size` argument. + - [GH-45088](https://github.com/apache/arrow/issues/45088) +- `garrow_record_batch_new()` / `Arrow::RecordBatch#initialize`: + Stopped validating automatically. If you want to validate a created + record batch, call `garrow_record_batch_validate()` / + `Arrow::RecordBatch#validate` explicitly. + - [GH-45353](https://github.com/apache/arrow/issues/45353) +- `garrow_table_new()` / `Arrow::Table#initialize`: Stopped validating + automatically. If you want to validate a created table, call + `garrow_table_validate()` / `Arrow::Table#validate` explicitly. + - [GH-45414](https://github.com/apache/arrow/issues/45414) + +## Java, Go, and Rust Notes + +The Java, Go, and Rust Go projects have moved to separate repositories outside +the main Arrow [monorepo](https://github.com/apache/arrow). + +- For notes on the latest release of the [Java +implementation](https://github.com/apache/arrow-java), see the latest [Arrow +Java changelog][7]. +- For notes on the latest release of the [Rust + implementation](https://github.com/apache/arrow-rs) see the latest [Arrow Rust + changelog][5]. +- For notes on the latest release of the [Go +implementation](https://github.com/apache/arrow-go), see the latest [Arrow Go +changelog][6]. + +[1]: https://github.com/apache/arrow/milestone/65?closed=1 +[2]: {{ site.baseurl }}/release/20.0.0.html#contributors +[3]: {{ site.baseurl }}/release/20.0.0.html#changelog +[4]: {{ site.baseurl }}/docs/r/news/ +[5]: https://github.com/apache/arrow-rs/blob/main/CHANGELOG.md +[6]: https://github.com/apache/arrow-go/releases +[7]: https://github.com/apache/arrow-java/releases