
docs: missing optional error_rate parameter in APPROX_COUNT_DISTINCT documentation #3296

@sundy-li


What's Missing

The APPROX_COUNT_DISTINCT aggregate function accepts an optional error rate parameter that controls the precision of the HyperLogLog estimation, but the current documentation only shows the single-argument form.

Source File

/workspace/databend/src/query/functions/src/aggregates/aggregate_approx_count_distinct.rs

What It Does

The function signature is:

APPROX_COUNT_DISTINCT(<expr> [, <error_rate>])
-- or equivalently:
APPROX_COUNT_DISTINCT(<error_rate>)(<expr>)

When error_rate is provided (a float64 value), the precision parameter p is computed as:

p = ceil(log2((1.04 / error_rate)^2))

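and clamped to the range [4, 14]. The default precision is p = 14 (approximately 0.81% error rate). A larger error_rate yields a smaller p, which means fewer bits of precision and faster computation. Because p is capped at 14, an error_rate below roughly 0.0081 gives no more precision than the default.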
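
To make the mapping concrete for the doc update, here is a minimal Python sketch of the formula and clamp described above. It is not the Databend implementation: the helper names `precision_for_error_rate` and `implied_error_rate` are hypothetical, the `ValueError` guard is an assumption of this sketch (the issue does not say how invalid values are handled), and the inverse relation `1.04 / sqrt(2^p)` is simply the formula above solved for the error rate.

```python
import math

# Bounds stated in this issue: p is clamped to [4, 14].
MIN_P = 4
MAX_P = 14


def precision_for_error_rate(error_rate: float) -> int:
    """Hypothetical helper: p = ceil(log2((1.04 / error_rate)^2)), clamped."""
    if error_rate <= 0:
        # Assumption of this sketch only; Databend's validation is not described.
        raise ValueError("error_rate must be positive")
    raw = math.ceil(math.log2((1.04 / error_rate) ** 2))
    return max(MIN_P, min(MAX_P, raw))


def implied_error_rate(p: int) -> float:
    """Hypothetical helper: the formula above solved for error_rate."""
    return 1.04 / math.sqrt(2 ** p)


if __name__ == "__main__":
    for rate in (0.5, 0.1, 0.05, 0.01, 0.001):
        p = precision_for_error_rate(rate)
        print(f"error_rate={rate} -> p={p}, implied error ~{implied_error_rate(p):.4%}")
```

Under these assumptions, the suggested example value 0.05 maps to p = 9 (an implied error of about 4.6%), and any requested rate at or below the default's 0.81% stays at p = 14.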

Current Documentation

/workspace/databend-docs/docs/en/sql-reference/20-sql-functions/07-aggregate-functions/aggregate-approx-count-distinct.md

The current doc only documents:

APPROX_COUNT_DISTINCT(<expr>)

Suggested Doc Location

/workspace/databend-docs/docs/en/sql-reference/20-sql-functions/07-aggregate-functions/aggregate-approx-count-distinct.md

The doc should be updated to show the optional error_rate parameter, explain the precision/accuracy tradeoff, and include an example using a custom error rate such as APPROX_COUNT_DISTINCT(user_id, 0.05).


Labels: documentation
