⚡ Optimize Parallel Raster Chunk Allocation#26
Merged
Conversation
Replaces inefficient vector allocations with direct sequence generation using ALTREP. Achieves >30,000x speedup for 10M cell chunks.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR optimizes the cell ID generation in parallel raster chunk processing by replacing inefficient vector arithmetic with direct sequence generation. The optimization leverages R's ALTREP feature to avoid allocating two large intermediate vectors (rows and cols), significantly reducing memory usage and computation time.
Changes:
- Replaced row/col vector generation with direct linear cell ID sequence calculation in
.compute_raster_chunk() - Applied the same optimization to the inline computation in
stream_raster_parallel_mirai()
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Performance Optimization Task
💡 What
Replaced the inefficient generation of rows and cols vectors with a direct linear sequence generation for cell_ids in R/stream_grid_raster_parallel.R.
🎯 Why
The original code allocated two large intermediate vectors (nrows * ncols in size) to compute cell IDs using vector arithmetic. Since the raster is filled in row-major order, the cell IDs form a continuous integer sequence. The optimized approach uses start:end, leveraging R's ALTREP (Alternative Representations) to avoid memory allocation almost entirely during the index generation phase.
📊 Measured Improvement
✅ Verification