Skip to content

reorganized stats/chunk containers#2999

Open
TjarkMiener wants to merge 2 commits into
mainfrom
container_maintenance
Open

reorganized stats/chunk containers#2999
TjarkMiener wants to merge 2 commits into
mainfrom
container_maintenance

Conversation

@TjarkMiener
Copy link
Copy Markdown
Member

We noticed in #2996 that containers dealing with chunks and stats needed some maintenance and docstring polishing.

Copy link
Copy Markdown
Member

@kosack kosack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just need a changelog entry

mexanick
mexanick previously approved these changes Apr 28, 2026
Comment thread src/ctapipe/containers.py
Comment thread src/ctapipe/containers.py
Comment on lines +1262 to +1267
class ChunkStatisticsContainer(ChunkContainer):
"""Container for descriptive statistics of the chunk distribution"""

mean = Field(None, "mean value of the chunk distribution")
median = Field(None, "median value of the chunk distribution")
std = Field(None, "standard deviation of the chunk distribution")
Copy link
Copy Markdown
Member Author

@TjarkMiener TjarkMiener Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of replacing this duplication by

class ChunkStatisticsContainer(ChunkContainer):
    """Container for descriptive statistics of the chunk distribution"""
    stats = Field(
        default_factory=StatisticsContainer,
        description="Statistical description of the chunk distribution",
    )

which would be nice but would also lead to breaking changes here and elsewhere.
@mexanick @maxnoe @kosack

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What we do elsewhere is have separate containers for the index and the data and write multiple containers to the same table.

I.e. you could have the ChunkContainer, StatsContainer and a HistogramContainer and write like this:

writer.write(table, (chunk_container, stats_container))

like this, you also don't need separate containers for the interpolated result and the chunk storage.

Copy link
Copy Markdown
Member

@maxnoe maxnoe Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See e.g.:

self._writer.write(
table_name="simulation/event/subarray/shower",
containers=[event.index, event.simulation.shower],
)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think n_events should be moved to the StatisticsContainer and then compute_stats() from the PlainAggregator and SigmaClippingAggregator should return StatisticsContainer and the compute_histo() from HistorgramAggregator from #2996 should return a HistogramContainer

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmh, that seems a bit weird. the n_events is a property of the chunk and should be the same for all chunk aggregations, regardless of whether you compute a histogram or stats or something else.

n_events should be really the number of events inside the chunk, which is very interesting in case of time-based chunks.

It should also be independent from which values are actually used due to e.g. outlier detection, sigma clipping or under/overflow.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, the number of events are related to the chunks and the number of entries are related to the number of values used in the aggregation.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maxnoe Coming back to this now; sorry for the delay! I’d slightly prefer to keep the changes as they are for now and avoid introducing breaking changes, since the aggregation and chunking components are already used across various pipelines. We could tackle a major reorganization of the containers after this PR and the dependent histogramaggregation PR #2996.

@TjarkMiener TjarkMiener changed the base branch from main to docs-build May 26, 2026 12:00
@TjarkMiener TjarkMiener force-pushed the container_maintenance branch from 621b015 to 574490c Compare May 26, 2026 12:02
Base automatically changed from docs-build to main May 26, 2026 13:24
@maxnoe maxnoe dismissed mexanick’s stale review May 26, 2026 13:24

The base branch was changed.

@TjarkMiener
Copy link
Copy Markdown
Member Author

TjarkMiener commented May 26, 2026

After the docs fix #3007 has been merged the CI is not running here and in #2996. Can you please have a look and help out, @maxnoe? Thanks a lot in advance!

EDIT: After the latest push, all CI jobs are running! Sorry for the noise!

@ctao-sonarqube
Copy link
Copy Markdown

@TjarkMiener TjarkMiener requested a review from mexanick May 26, 2026 15:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants