Skip to content
Merged
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
193 changes: 193 additions & 0 deletions _posts/2025-04-27-20.0.0-release.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,193 @@
---
layout: post
title: "Apache Arrow 20.0.0 Release"
date: "2025-04-27 00:00:00"
author: pmc
categories: [release]
---
<!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to you under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
{% endcomment %}
-->

The Apache Arrow team is pleased to announce the 20.0.0 release. This release
covers over 2 months of development work and includes [**259 resolved
issues**][1] on [**327 distinct commits**][2] from [**63 distinct
contributors**][2]. See the [Install Page](https://arrow.apache.org/install/) to
learn how to get the libraries for your platform.

The release notes below are not exhaustive and only expose selected highlights
of the release. Many other bugfixes and improvements have been made: we refer
you to the [complete changelog][3].

## Community

Since the 19.0.0 release, Ed Seidl, Jean-Baptiste Onofré and Matthijs Brobbel
have been invited to become committers. Bryce Mecum, Ian Cook, Jacob Wujciak-Jens
and Rok Mihevc have been invited to join the Project Management Committee (PMC).

Thanks for your contributions and participation in the project!

## C++ Notes

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


### Compute

Comment thread
assignUser marked this conversation as resolved.
We’ve added several new compute functions: inverse_permutation/scatter ([GH-44393](https://github.com/apache/arrow/issues/44393)), pivot_wider/hash_pivot_wider ([GH-45269](https://github.com/apache/arrow/issues/45269)), rank_normal ([GH-45572](https://github.com/apache/arrow/issues/45572)), skew/kurtosis ([GH-45676](https://github.com/apache/arrow/issues/45676)), and winsorize ([GH-45755](https://github.com/apache/arrow/issues/45755)).

### Acero

Comment thread
assignUser marked this conversation as resolved.
We've significantly improved the Hash Join in terms of overflow-safety ([GH-44513](https://github.com/apache/arrow/issues/44513), [GH-45334](https://github.com/apache/arrow/issues/45334), [GH-45506](https://github.com/apache/arrow/issues/45506)), memory consumption (peak memory usage reduced by half: [GH-45551](https://github.com/apache/arrow/issues/45551)), and performance (up to several dozen times faster: [GH-45611](https://github.com/apache/arrow/issues/45611), [GH-45917](https://github.com/apache/arrow/issues/45917)).

### Filesystems

Comment thread
assignUser marked this conversation as resolved.
Outdated
### Flight RPC

- The experimental Flight over UCX feature has been removed.
([#43296](https://github.com/apache/arrow/issues/43296))

### Parquet

Comment thread
assignUser marked this conversation as resolved.
Outdated
## C# Notes
Comment thread
assignUser marked this conversation as resolved.
Outdated
Comment thread
assignUser marked this conversation as resolved.

- Added support for the `Ordered` and `AppMetaData` fields to FlightInfo ([#45753](https://github.com/apache/arrow/pull/45753))
- FlightClient can now be integrated with [Grpc.Net.ClientFactory](https://learn.microsoft.com/en-us/aspnet/core/grpc/clientfactory?view=aspnetcore-9.0) ([#45451](https://github.com/apache/arrow/issues/45451))

## Linux Packaging Notes

Comment thread
assignUser marked this conversation as resolved.
https://apache.jfrog.io/ is still available but https://packages.apache.org/ is preferred because the latter uses the apache.org domain.

## Python Notes

Compatibility notes:
* Minimum supported Cython has been raised to 3 and higher [GH-45237](https://github.com/apache/arrow/issues/45237) .
* A subset of deprecated APIs have been removed
[GH-45680](https://github.com/apache/arrow/issues/45680):
``PARQUET_2_0`` [GH-45848](https://github.com/apache/arrow/issues/45848),
``use_legacy_dataset`` [GH-44790](https://github.com/apache/arrow/issues/44790),
serialize/deserialize PyArrow C++ code [GH-43587](https://github.com/apache/arrow/issues/43587) .

New features:
* Large variable width types are supported in NumPy conversion
[GH-35289](https://github.com/apache/arrow/issues/35289).
* Biased/unbiased option are available in skew and kurtosis compute functions
[GH-45733](https://github.com/apache/arrow/issues/45733).
* Support for SAS token in the ``AzureFileSystem`` has been added
[GH-45705](https://github.com/apache/arrow/issues/45705).
* Interchange of ``decimal32``, ``decimal64`` and ``decimal256`` data type objects between
Pandas and PyArrow is now supported [GH-45582](https://github.com/apache/arrow/issues/45582),
[GH-45570](https://github.com/apache/arrow/issues/45570).
* ``pyarrow.ArrayStatistics``and ``pyarrow.Array.statistics()`` are added
[GH-45457](https://github.com/apache/arrow/issues/45457).
* Bindings for JSON streaming reader are added
[GH-14932](https://github.com/apache/arrow/issues/14932).
* Bindings for ``MemoryPool::total_bytes_allocated`` and ``MemoryPool::num_allocations`` are added.
Also allocator-specific statistics can now be printed to stderr [GH-45358](https://github.com/apache/arrow/issues/45358).
* A new ``maps_as_pydicts``parameter is introduced to ``to_pylist``, ``to_pydict`` and ``as_py``
methods enabling deserialization into Python dictionary instead of list of tuples
[GH-39010](https://github.com/apache/arrow/issues/39010).

Other improvements:
* Source (sdist) and binary distribution (wheels) are now uploaded to GitHub Releases
[GH-45920](https://github.com/apache/arrow/issues/45920).
* Cython code has been cleaned up as we now require at least Cython 3.0
[GH-45433](https://github.com/apache/arrow/issues/45433).
* Building of free-threaded wheels on Windows is enabled
[GH-44421](https://github.com/apache/arrow/issues/44421) .
*Wheels for Alpine Linux are now provided [GH-18036](https://github.com/apache/arrow/issues/18036) .


Relevant bug fixes:
* Pandas conversion roundtrip with bytes column names error is fixed
[GH-44188](https://github.com/apache/arrow/issues/44188).
* Exceptions are raised instead of showing segfaults when users try to instantiate internal Parquet
metadata classes [GH-36628](https://github.com/apache/arrow/issues/36628).

## R Notes
Comment thread
assignUser marked this conversation as resolved.

- Binary Arrays now inherit from `blob::blob` in addition to `arrow_binary` when
[converted to R
objects](https://arrow.apache.org/docs/r/articles/data_types.html#translations-from-arrow-to-r).
This change is the first step in eventually deprecating the `arrow_binary`
class in favor of the `blob` class in the
[`blob`](https://cran.r-project.org/package=blob) package (See
[GH-45709](https://github.com/apache/arrow/issues/45709)).

## Ruby and C GLib Notes

### Improvements

- `garrow_array_validate()` / `Arrow::Array#validate`: Added.
- [GH-45328](https://github.com/apache/arrow/issues/45328)
- `garrow_array_validate_full()` / `Arrow::Array#validate_full`: Added.
- [GH-45342](https://github.com/apache/arrow/issues/45342)
- `garrow_record_batch_validate()` / `Arrow::RecordBatch#validate`: Added.
- [GH-45353](https://github.com/apache/arrow/issues/45353)
- `garrow_record_batch_validate_full()` /
`Arrow::RecordBatch#validate_full`: Added.
- [GH-45386](https://github.com/apache/arrow/issues/45386)
- `garrow_table_validate()` / `Arrow::Table#validate`: Added.
- [GH-45414](https://github.com/apache/arrow/issues/45414)
- `garrow_table_validate_full()` / `Arrow::Table#validate_full`: Added.
- [GH-45468](https://github.com/apache/arrow/issues/45468)
- `GArrowArrayStatistics()` / `Arrow::ArrayStatistics`: Added.
- [GH-45490](https://github.com/apache/arrow/issues/45490)
- Changed to require Meson 0.61.2 or later.
- `GArrowBinaryViewArray()` / `Arrow::BinaryViewArray`: Added.
- [GH-45650](https://github.com/apache/arrow/issues/45650)
- `GArrowStringViewArray()` / `Arrow::StringViewArray`: Added.
- [GH-45711](https://github.com/apache/arrow/issues/45711)
- Added support for
[rubygems-requirements-system](https://rubygems.org/gems/rubygems-requirements-system)
- [GH-45976](https://github.com/apache/arrow/issues/45976)

### Incompatible changes

- `gparquet_arrow_file_writer_new_row_group()` /
`Parquet:ArrowFileWriter#new_row_group`: Removed `chunk_size` argument.
- [GH-45088](https://github.com/apache/arrow/issues/45088)
- `garrow_record_batch_new()` / `Arrow::RecordBatch#initialize`:
Stopped validating automatically. If you want to validate a created
record batch, call `garrow_record_batch_validate()` /
`Arrow::RecordBatch#validate` explicitly.
- [GH-45353](https://github.com/apache/arrow/issues/45353)
- `garrow_table_new()` / `Arrow::Table#initialize`: Stopped validating
automatically. If you want to validate a created table, call
`garrow_table_validate()` / `Arrow::Table#validate` explicitly.
- [GH-45414](https://github.com/apache/arrow/issues/45414)

## Java, Go, and Rust Notes

The Java, Go, and Rust Go projects have moved to separate repositories outside
the main Arrow [monorepo](https://github.com/apache/arrow).

- For notes on the latest release of the [Java
implementation](https://github.com/apache/arrow-java), see the latest [Arrow
Java changelog][7].
- For notes on the latest release of the [Rust
implementation](https://github.com/apache/arrow-rs) see the latest [Arrow Rust
changelog][5].
- For notes on the latest release of the [Go
implementation](https://github.com/apache/arrow-go), see the latest [Arrow Go
changelog][6].

[1]: https://github.com/apache/arrow/milestone/65?closed=1
[2]: {{ site.baseurl }}/release/20.0.0.html#contributors
[3]: {{ site.baseurl }}/release/20.0.0.html#changelog
[4]: {{ site.baseurl }}/docs/r/news/
[5]: https://github.com/apache/arrow-rs/blob/main/CHANGELOG.md
[6]: https://github.com/apache/arrow-go/releases
[7]: https://github.com/apache/arrow-java/releases