
Performance Analysis & Optimization Guide

📊 Benchmark Results Summary

Based on our comprehensive benchmarking suite (v0.3.1), here's a detailed analysis of performance characteristics and optimization opportunities.

🚀 Performance Highlights

CRDT Operations - Excellent Performance

LWW Register Operations:

  • Creation (1 item): ~1.05 µs (roughly one microsecond)
  • Creation (10 items): ~12.0 µs (linear scaling)
  • Creation (100 items): ~113.5 µs (linear scaling)
  • Creation (1000 items): ~975.8 µs (linear scaling)

LWW Register Merges:

  • Merge (1 item): ~24.4 ns (nanosecond range)
  • Merge (10 items): ~288.2 ns (nanosecond range)
  • Merge (100 items): ~2.47 µs (microsecond range)
  • Merge (1000 items): ~23.1 µs (microsecond range)

GCounter Operations:

  • Increment (10 items): ~110.6 ns (nanosecond range)
  • Increment (100 items): ~1.26 µs (microsecond range)
  • Increment (1000 items): ~11.7 µs (microsecond range)
  • Increment (10000 items): ~119.8 µs (microsecond range)

Collection Operations - Good Performance

Insert Operations:

  • 10 items: ~1.63 µs
  • 100 items: ~13.6 µs
  • 1000 items: ~144.7 µs
  • 10000 items: ~1.44 ms

Get Operations:

  • 10 items: ~3.94 ns (nanosecond range)
  • 100 items: ~42.5 ns (nanosecond range)
  • 1000 items: ~382.7 ns (nanosecond range)

🔍 Identified Bottlenecks

1. Large-Scale Operations

Warning Signs:

Warning: Unable to complete 100 samples in 5.0s. 
You may wish to increase target time to 5.2s, enable flat sampling, 
or reduce sample count to 60.

Affected Operations:

  • LWW Register creation (1000 items): Target time exceeded
  • LWW Map operations (10000 items): Target time exceeded
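
Criterion emits this warning when 100 samples don't fit in the default measurement window, and the fix belongs in the benchmark definition itself. A minimal sketch following the warning's advice, assuming the standard Criterion harness (the group and function names here are illustrative):

use criterion::{criterion_group, criterion_main, Criterion};
use std::time::Duration;

fn bench_large_creation(c: &mut Criterion) {
    let mut group = c.benchmark_group("lww_register_creation");
    // Follow the warning's advice for large inputs: a longer target
    // time and a smaller sample count keep the statistics valid.
    group.measurement_time(Duration::from_secs(10));
    group.sample_size(60);
    group.bench_function("create_1000", |b| {
        b.iter(|| {
            // ... build the 1000-item LWW register under test ...
        })
    });
    group.finish();
}

criterion_group!(benches, bench_large_creation);
criterion_main!(benches);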

Root Cause Analysis:

  • Memory Allocation: Large vector allocations for bulk operations
  • Serialization Overhead: JSON serialization scales poorly with size
  • Hash Map Resizing: HashMap reallocation during bulk inserts

2. Storage Operations - Performance Degradation

Performance Drop:

  • Set (10 items): ~2.99 µs
  • Set (100 items): ~235.5 µs (78x slower)
  • Set (1000 items): ~4.29 ms (1435x slower)

Get Operations:

  • Get (10 items): ~427.1 ns
  • Get (100 items): ~8.55 µs (20x slower)
  • Get (1000 items): ~541.8 µs (1269x slower)

Root Cause:

  • Linear Search: O(n) complexity in storage implementations
  • Memory Fragmentation: Poor memory locality for large datasets
  • Serialization Bottleneck: JSON encoding/decoding overhead

3. Memory Usage Patterns

Memory Allocation Times:

  • LWW Register: ~9.53 µs
  • LWW Map: ~130.2 µs (13.7x slower)
  • GCounter: ~15.9 µs

Analysis:

  • HashMap Overhead: LWW Map has significant memory allocation cost
  • Metadata Storage: Additional metadata increases memory footprint
  • Fragmentation: Large data structures cause memory fragmentation

🛠️ Optimization Strategies

1. Immediate Optimizations (High Impact)

A. Bulk Operations Optimization

// Current: Individual operations
for item in items {
    collection.insert(&item.id, &item).await?;
}

// Optimized: Batch operations
let batch: Vec<_> = items.into_iter()
    .map(|item| (item.id, item))
    .collect();
collection.insert_batch(batch).await?;
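
The insert_batch call above assumes a batched API on the collection. A minimal sketch of what such a method might look like, assuming the collection wraps a lock-protected HashMap (the inner field, the lock choice, and the error type are illustrative, not the library's actual internals). It attacks the HashMap-resizing cost noted above by reserving capacity once:

use std::collections::HashMap;
use std::hash::Hash;
use tokio::sync::RwLock;

struct Collection<K, V> {
    inner: RwLock<HashMap<K, V>>,
}

impl<K: Eq + Hash, V> Collection<K, V> {
    async fn insert_batch(&self, batch: Vec<(K, V)>) -> Result<(), Box<dyn std::error::Error>> {
        // Take the write lock once for the whole batch instead of per item.
        let mut map = self.inner.write().await;
        // Reserve capacity up front to avoid incremental HashMap resizing.
        map.reserve(batch.len());
        for (key, value) in batch {
            map.insert(key, value);
        }
        Ok(())
    }
}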

B. Memory Pool for CRDTs

use std::sync::Arc;
use parking_lot::Mutex;

struct CRDTMemoryPool {
    lww_registers: Arc<Mutex<Vec<LwwRegister<String>>>>,
    lww_maps: Arc<Mutex<Vec<LwwMap<String, String>>>>,
    gcounters: Arc<Mutex<Vec<GCounter>>>,
}

impl CRDTMemoryPool {
    fn get_lww_register(&self) -> LwwRegister<String> {
        // Reuse a pooled register when available; allocate only on a miss.
        self.lww_registers.lock().pop()
            .unwrap_or_else(|| LwwRegister::new(String::new(), ReplicaId::default()))
    }
    
    fn return_lww_register(&self, register: LwwRegister<String>) {
        // Reset or clear the register's state here if the CRDT type
        // carries data between uses.
        self.lww_registers.lock().push(register);
    }
}

C. Serialization Optimization

// Use bincode instead of JSON for better performance
use bincode::{deserialize, serialize};
use serde::{de::DeserializeOwned, Serialize};

impl<T: Serialize + DeserializeOwned> CRDT<T> {
    fn to_bytes(&self) -> Result<Vec<u8>, Box<dyn std::error::Error>> {
        Ok(serialize(self)?)
    }
    
    fn from_bytes(bytes: &[u8]) -> Result<Self, Box<dyn std::error::Error>> {
        Ok(deserialize(bytes)?)
    }
}

2. Medium-Term Optimizations

A. Indexed Storage

use std::collections::{BTreeMap, HashMap};

struct IndexedStorage {
    primary: HashMap<String, Vec<u8>>,
    indices: BTreeMap<String, Vec<String>>, // "index_name:value" -> [id]
}

impl IndexedStorage {
    fn insert(&mut self, id: String, value: Vec<u8>, index_entries: &[(String, String)]) {
        // Maintain the secondary indices alongside the primary write.
        for (index_name, index_value) in index_entries {
            self.indices
                .entry(format!("{}:{}", index_name, index_value))
                .or_default()
                .push(id.clone());
        }
        self.primary.insert(id, value);
    }

    fn get_by_index(&self, index_name: &str, value: &str) -> Vec<String> {
        self.indices.get(&format!("{}:{}", index_name, value))
            .cloned()
            .unwrap_or_default()
    }
}

B. Lazy Loading

use serde::de::DeserializeOwned;
use std::sync::Arc;

struct LazyCRDT<T> {
    id: String,
    data: Option<T>,
    storage: Arc<dyn Storage>,
}

impl<T: DeserializeOwned> LazyCRDT<T> {
    async fn get(&mut self) -> Result<&T, Box<dyn std::error::Error>> {
        // Deserialize from storage only on first access, then cache.
        if self.data.is_none() {
            let bytes = self.storage.get(&self.id).await?;
            self.data = Some(bincode::deserialize(&bytes)?);
        }
        Ok(self.data.as_ref().unwrap())
    }
}

C. Compression for Large Data

use flate2::read::ZlibDecoder;
use flate2::write::ZlibEncoder;
use flate2::Compression;
use std::io::{Read, Write};

struct CompressedStorage {
    inner: Box<dyn Storage>,
    compression_level: Compression,
}

impl CompressedStorage {
    async fn set_compressed(&self, key: &str, value: &[u8]) -> Result<(), Box<dyn std::error::Error>> {
        // The encoder grows its output buffer as needed, unlike the
        // lower-level Compress::compress_vec API.
        let mut encoder = ZlibEncoder::new(Vec::new(), self.compression_level);
        encoder.write_all(value)?;
        let compressed = encoder.finish()?;
        self.inner.set(key, &compressed).await?;
        Ok(())
    }

    async fn get_decompressed(&self, key: &str) -> Result<Vec<u8>, Box<dyn std::error::Error>> {
        let compressed = self.inner.get(key).await?;
        let mut decoder = ZlibDecoder::new(compressed.as_slice());
        let mut decompressed = Vec::new();
        decoder.read_to_end(&mut decompressed)?;
        Ok(decompressed)
    }
}

3. Long-Term Architectural Improvements

A. Streaming CRDTs

use tokio::sync::mpsc;

struct StreamingCRDT<T> {
    sender: mpsc::Sender<CRDTOperation<T>>,
    receiver: mpsc::Receiver<CRDTOperation<T>>,
}

enum CRDTOperation<T> {
    Insert(String, T),
    Update(String, T),
    Delete(String),
    Merge(String, T),
}

impl<T> StreamingCRDT<T> {
    async fn process_operations(&mut self) {
        // Drain the channel until every sender has been dropped.
        while let Some(op) = self.receiver.recv().await {
            match op {
                CRDTOperation::Insert(key, value) => {
                    // Process insert
                }
                CRDTOperation::Update(key, value) => {
                    // Process update
                }
                CRDTOperation::Delete(key) => {
                    // Process delete
                }
                CRDTOperation::Merge(key, value) => {
                    // Process merge
                }
            }
        }
    }
}
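
A hypothetical way to wire this up, reusing the imports and types above (the channel capacity and payloads are arbitrary):

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel::<CRDTOperation<String>>(1024);
    let mut crdt = StreamingCRDT { sender: tx.clone(), receiver: rx };

    // Run the processor in the background; any task holding a clone
    // of the sender can feed it operations.
    tokio::spawn(async move { crdt.process_operations().await });

    if tx.send(CRDTOperation::Insert("key".into(), "value".into())).await.is_err() {
        eprintln!("processor stopped; receiver dropped");
    }
}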

B. Hierarchical CRDTs

struct HierarchicalCRDT<T> {
    root: LwwRegister<T>,
    children: LwwMap<String, Box<HierarchicalCRDT<T>>>,
}

impl<T> HierarchicalCRDT<T> {
    fn get_path(&self, path: &[String]) -> Option<&T> {
        if path.is_empty() {
            Some(self.root.value())
        } else {
            let child = self.children.get(&path[0])?;
            child.get_path(&path[1..])
        }
    }
}

📈 Performance Targets

Short Term (Next Release)

  • LWW Register Creation (1000 items): < 500 µs (2x improvement)
  • Storage Set (1000 items): < 2 ms (2x improvement)
  • Memory Allocation: at least a 50% reduction in allocation time

Medium Term (3 months)

  • Large Scale Operations: Support 100k+ items efficiently
  • Memory Usage: 30% reduction in memory footprint
  • Serialization: 5x improvement in large data serialization

Long Term (6 months)

  • Real-time Performance: < 1ms latency for 10k+ item operations
  • Memory Efficiency: 50% reduction in memory usage
  • Scalability: Support 1M+ items with linear performance

🧪 Testing Optimization Impact

Benchmark Before/After

# Before optimization: save a named baseline (a Criterion feature)
cargo bench --package leptos-sync-core --bench crdt_benchmarks -- --save-baseline before

# After optimization: run again and compare against the saved baseline
cargo bench --package leptos-sync-core --bench crdt_benchmarks -- --baseline before

Memory Profiling

# Build the benchmark binary without running it
cargo bench --package leptos-sync-core --bench crdt_benchmarks --no-run

# One option on Linux: profile heap usage with Valgrind's massif
# (the hashed file name under target/release/deps will vary)
valgrind --tool=massif target/release/deps/crdt_benchmarks-<hash> --bench
ms_print massif.out.<pid>

Performance Regression Testing

#[test]
fn test_performance_regression() {
    let start = std::time::Instant::now();
    
    // Perform operation (both registers must hold the same type, String)
    let mut crdt = LwwRegister::new("test".to_string(), ReplicaId::default());
    for i in 0..1000 {
        crdt.merge(&LwwRegister::new(format!("value_{}", i), ReplicaId::default())).unwrap();
    }
    
    let duration = start.elapsed();
    // A threshold this tight assumes a release build (cargo test --release).
    assert!(duration < std::time::Duration::from_millis(1), 
            "Performance regression: operation took {:?}", duration);
}

🎯 Implementation Priority

Phase 1: High Impact, Low Effort

  1. Bulk Operations: Implement batch insert/update operations
  2. Serialization: Switch from JSON to bincode for internal operations
  3. Memory Pool: Implement object pooling for frequently used CRDTs

Phase 2: Medium Impact, Medium Effort

  1. Indexed Storage: Add secondary indices for faster lookups
  2. Lazy Loading: Implement on-demand data loading
  3. Compression: Add compression for large data sets

Phase 3: High Impact, High Effort

  1. Streaming CRDTs: Implement async streaming operations
  2. Hierarchical CRDTs: Add tree-based data structures
  3. Advanced Caching: Implement intelligent cache invalidation

📊 Monitoring & Metrics

Key Performance Indicators

  • Operation Latency: P50, P95, P99 percentiles
  • Memory Usage: Peak and average memory consumption
  • Throughput: Operations per second at various scales
  • Error Rates: Failed operations and retry counts
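
One way to collect the latency percentiles is to feed per-operation timings into a histogram. A minimal sketch assuming the hdrhistogram crate (any percentile-capable histogram works; the function name is illustrative):

use hdrhistogram::Histogram;

fn report_latency_percentiles(samples_us: &[u64]) -> Result<(), Box<dyn std::error::Error>> {
    // 3 significant digits is plenty of resolution for latency reporting.
    let mut hist = Histogram::<u64>::new(3)?;
    for &sample in samples_us {
        hist.record(sample)?;
    }
    println!("P50: {} µs", hist.value_at_quantile(0.50));
    println!("P95: {} µs", hist.value_at_quantile(0.95));
    println!("P99: {} µs", hist.value_at_quantile(0.99));
    Ok(())
}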

Alerting Thresholds

  • Latency: > 10ms for single operations
  • Memory: > 100MB for 1k items
  • Throughput: < 1000 ops/sec for 10k items
  • Errors: > 1% failure rate
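
A sketch of how these thresholds could be encoded as a programmatic check (the Metrics struct and its field names are hypothetical, not part of the library):

use std::time::Duration;

struct Metrics {
    single_op_latency: Duration,
    bytes_per_1k_items: u64,
    ops_per_sec_at_10k: f64,
    error_rate: f64,
}

fn check_alerts(m: &Metrics) -> Vec<&'static str> {
    let mut alerts = Vec::new();
    if m.single_op_latency > Duration::from_millis(10) {
        alerts.push("latency: single operation exceeded 10ms");
    }
    if m.bytes_per_1k_items > 100 * 1024 * 1024 {
        alerts.push("memory: more than 100MB for 1k items");
    }
    if m.ops_per_sec_at_10k < 1000.0 {
        alerts.push("throughput: below 1000 ops/sec for 10k items");
    }
    if m.error_rate > 0.01 {
        alerts.push("errors: failure rate above 1%");
    }
    alerts
}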

Next Steps: Start with Phase 1 optimizations for immediate impact, then move to Phase 2 for sustained improvements. Monitor performance metrics continuously to ensure optimizations deliver expected results.