What's Missing
The APPROX_COUNT_DISTINCT aggregate function accepts an optional error rate parameter that controls the precision of the HyperLogLog estimation, but the current documentation only shows the single-argument form.
Source File
/workspace/databend/src/query/functions/src/aggregates/aggregate_approx_count_distinct.rs
What It Does
The function signature is:
APPROX_COUNT_DISTINCT(<expr> [, <error_rate>])
-- or equivalently:
APPROX_COUNT_DISTINCT(<error_rate>)(<expr>)
When error_rate is provided (a float64 value), the precision parameter p is computed as:
p = ceil(log2((1.04 / error_rate)^2))
The result is clamped to the range [4, 14]. The default precision is p = 14 (approximately 0.81% error rate, since 1.04 / sqrt(2^14) ≈ 0.0081). A higher error rate yields a smaller p, which means fewer bits of precision and faster computation.
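To make the mapping concrete, here is a minimal Python sketch of the formula and clamp described above. It is not the Rust implementation: the helper names `precision_for` and `standard_error` are hypothetical, and the `1.04 / sqrt(2^p)` error estimate is the standard HyperLogLog approximation, assumed here to be what the 0.81% figure refers to.

```python
import math

# Clamp range for the precision parameter, as stated in this report.
P_MIN, P_MAX = 4, 14


def precision_for(error_rate: float) -> int:
    """Map a requested error rate to a precision p using the formula above.

    Hypothetical helper for illustration; assumes error_rate > 0.
    """
    raw = math.ceil(math.log2((1.04 / error_rate) ** 2))
    return max(P_MIN, min(P_MAX, raw))


def standard_error(p: int) -> float:
    """Standard HyperLogLog relative error estimate for 2**p registers."""
    return 1.04 / math.sqrt(2 ** p)


if __name__ == "__main__":
    for rate in (0.5, 0.05, 0.01, 0.001):
        p = precision_for(rate)
        print(f"error_rate={rate}: p={p}, std_error={standard_error(p):.4f}")
```

Under these assumptions, an error rate of 0.05 maps to p = 9, while very loose or very tight rates (such as 0.5 or 0.001) hit the clamp at 4 and 14 respectively, so requesting an error rate below roughly 0.81% would not improve accuracy further.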
Current Documentation
/workspace/databend-docs/docs/en/sql-reference/20-sql-functions/07-aggregate-functions/aggregate-approx-count-distinct.md
The current doc only documents:
APPROX_COUNT_DISTINCT(<expr>)
Suggested Doc Location
/workspace/databend-docs/docs/en/sql-reference/20-sql-functions/07-aggregate-functions/aggregate-approx-count-distinct.md
The doc should be updated to show the optional error_rate parameter, explain the precision/accuracy tradeoff, and include an example using a custom error rate such as APPROX_COUNT_DISTINCT(user_id, 0.05).