Storage Path: tmp | Username: ahmed_test | Iterations: 3 | Generated: 2025-12-23
| Operation | Throughput | Notes |
|---|---|---|
| Fresh Inserts | ~15K records/sec | Consistent across all sizes (1K - 1M) |
| Updates | 594-11,713 records/sec | Depends on existing data size (1K-50K tested) |
| Table Loading | 29K-42K records/sec | 2-3x faster than inserts, improves with size |
Realistic Use Case (20K records/month) with ZivaApp-Ring:
- Insert: ~1.4s
- Load: ~0.5s
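These estimates follow directly from the measured throughputs; a quick back-of-the-envelope check (the throughput figures are taken from the tables below, the record count is the stated monthly volume):

```python
# Estimated time to process one month of data (~20K records),
# using the measured throughputs reported in this document.
monthly_records = 20_000
insert_throughput = 15_000   # records/sec (fresh inserts)
load_throughput = 42_000     # records/sec (table loading at this size)

insert_time = monthly_records / insert_throughput  # ~1.3s (measured: ~1.4s)
load_time = monthly_records / load_throughput      # ~0.5s

print(f"insert: ~{insert_time:.1f}s, load: ~{load_time:.1f}s")
```

The small gap between the estimated ~1.3s and the measured ~1.4s insert time comes from fixed per-batch overhead not captured by a pure throughput model.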
Fresh insert performance across dataset sizes:
| Records | Avg Time (s) | Min (s) | Max (s) | Throughput (rec/s) |
|---|---|---|---|---|
| 1.0K | 0.069 | 0.066 | 0.073 | 14,496 |
| 10.0K | 0.659 | 0.632 | 0.685 | 15,202 |
| 100.0K | 6.014 | 5.875 | 6.246 | 16,641 |
| 1.0M | 61.103 | 59.544 | 63.166 | 16,376 |
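The avg/min/max columns come from repeating each run (3 iterations, per the header). A minimal harness in this style, where `insert_records` is a placeholder standing in for the storage layer under test:

```python
import time
import statistics

def benchmark(fn, iterations=3):
    """Run fn several times and report avg/min/max wall-clock seconds."""
    times = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return {
        "avg": statistics.mean(times),
        "min": min(times),
        "max": max(times),
    }

# Placeholder workload standing in for the real insert path.
def insert_records():
    _ = [i * i for i in range(100_000)]

stats = benchmark(insert_records)
print(f"avg={stats['avg']:.3f}s min={stats['min']:.3f}s max={stats['max']:.3f}s")
```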
Performance Breakdown (per 1K records):
- JSON parsing: ~8ms
- Datetime conversion: ~5ms
- Schema validation: ~9ms
- Record processing: ~15ms
- File writing: ~22ms
Note: Updates require loading all existing records into memory. Performance depends on the size of existing data.
Testing updates of 1,000 records against different amounts of existing data:
| Existing Records | Avg Update Time (s) | Min (s) | Max (s) | Throughput (rec/s) |
|---|---|---|---|---|
| 1.0K | 0.087 | 0.072 | 0.104 | 11,713 |
| 10.0K | 0.390 | 0.367 | 0.425 | 2,575 |
| 20.0K | 0.653 | 0.646 | 0.663 | 1,531 |
| 50.0K | 1.692 | 1.572 | 1.859 | 594 |
Key Insight: Update performance degrades as the existing dataset grows:
- 1K existing: ~11.7K records/sec (comparable to fresh inserts)
- 10K existing: ~2.6K records/sec (4.5x slower)
- 20K existing: ~1.5K records/sec (10x slower)
- 50K existing: ~594 records/sec (25x slower)
Bottleneck Breakdown (for 20K existing records):
- File loading: ~250ms (loading 20K existing records)
- File writing: ~370ms (rewriting updated file)
- Record processing: ~13ms (actual update logic)
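The load-modify-rewrite pattern explains this scaling: cost grows with the size of the existing file, not with the number of records being updated. A rough cost model in that spirit (the per-record constants are illustrative, derived loosely from the breakdown above, not measured directly):

```python
def estimated_update_seconds(existing_records, updated_records,
                             load_per_rec=12.5e-6,
                             write_per_rec=18.5e-6,
                             process_per_rec=13e-6):
    """Update cost = load all existing + process updates + rewrite everything.

    The dominant terms scale with existing_records, which is why updates
    slow down as the dataset grows even when updated_records stays fixed.
    """
    return (existing_records * load_per_rec
            + updated_records * process_per_rec
            + existing_records * write_per_rec)

# 20K existing, 1K updated: ~0.25s load + ~0.013s process + ~0.37s write
print(f"{estimated_update_seconds(20_000, 1_000):.3f}s")
```

Plugging in 20K existing records reproduces the measured ~0.65s total to within a few percent.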
Note: Test sizes based on realistic monthly data volumes. Max expected: ~20K records/file/month (zivaapp ring usage).
Insert vs. load performance by dataset size:
| Records | Insert (s) | Load (s) | Load Speedup | Load Throughput (rec/s) |
|---|---|---|---|---|
| 1.0K | 0.071 | 0.034 | 2.09x faster | 29,256 |
| 10.0K | 0.797 | 0.284 | 2.81x faster | 35,229 |
| 20.0K | 1.415 | 0.482 | 2.94x faster | 41,454 |
| 50.0K | 3.306 | 1.174 | 2.82x faster | 42,606 |
Key Insight: Loading throughput improves with table size (29K → 42K records/sec)
Fresh insert of 1K records vs. updating 1K records, by existing data size:
| Existing Data | Fresh Insert 1K (s) | Update 1K (s) | Update Penalty |
|---|---|---|---|
| 1K existing | 0.069 | 0.087 | 1.26x slower |
| 10K existing | 0.069 | 0.390 | 5.65x slower |
| 20K existing | 0.069 | 0.653 | 9.46x slower |
| 50K existing | 0.069 | 1.692 | 24.5x slower |
| 1M+ existing | 0.069 | 31.048 | 450x slower |
Note: The 1M+ existing records case shows extreme degradation and is not typical for zivaapp ring usage (max ~20K/month).
| Records | Fresh Insert (s) | Update with 1M+ Existing (s) | Update Penalty |
|---|---|---|---|
| 1.0K | 0.069 | 31.048 | 450x slower |
| 10.0K | 0.659 | 33.075 | 50x slower |
| 100.0K | 6.014 | 38.249 | 6x slower |
Summary: insert vs. load times by dataset size:
| Records | Insert (s) | Load (s) | Load Advantage |
|---|---|---|---|
| 1.0K | 0.071 | 0.034 | 2.09x faster |
| 10.0K | 0.797 | 0.284 | 2.81x faster |
| 20.0K | 1.415 | 0.482 | 2.94x faster |
| 50.0K | 3.306 | 1.174 | 2.82x faster |
- Consistent insert performance: ~15K records/sec regardless of size (1K - 1M)
- Fast loading: 2-3x faster than inserts, excellent for queries
- Linear scaling: Time increases proportionally with record count
- Realistic use case optimized: 20K records/month = ~1.4s insert, ~0.5s load
- Updates slow down with existing data:
  - 1K existing: ~11.7K rec/s (comparable to inserts)
  - 20K existing: ~1.5K rec/s (10x slower)
  - 50K existing: ~594 rec/s (25x slower)
- Update bottleneck: File loading + file writing (must load all existing records)
- For realistic use case (20K/month): Updates take ~0.65s for 1K records (acceptable)
- Recommendation: For very large datasets (100K+), consider batch updates or alternative strategies
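One way to act on the batch-update recommendation is to coalesce many pending updates keyed on the unique field, so the expensive load/rewrite cycle is paid once rather than once per update call. A sketch of that idea (in-memory; `records` stands in for the loaded file contents, and this is an illustration, not the storage library's actual API):

```python
def apply_updates_batched(records, updates, key="date"):
    """Merge a batch of updates into already-loaded records in one pass.

    records: list of dicts loaded from the file.
    updates: list of dicts; each must carry the unique key field.
    Returns the merged list, ready to be written back in a single rewrite.
    """
    pending = {u[key]: u for u in updates}      # last update per key wins
    merged = []
    for rec in records:
        if rec[key] in pending:
            rec = {**rec, **pending.pop(rec[key])}
        merged.append(rec)
    merged.extend(pending.values())             # unmatched updates become inserts
    return merged
```

With this shape, updating N records against M existing ones costs one load and one write of M records instead of N of each.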
The benchmarks used the following schema for the activitydetails table, which can influence performance:
```json
{
  "date": {
    "type": "int",
    "required": true,
    "unique": true,
    "datetime": true
  },
  "distance": {
    "type": "int|float"
  },
  "step": {
    "type": "int"
  },
  "calories": {
    "type": "int|float"
  },
  "arraySteps": {
    "type": "array"
  },
  "is_sync": {
    "type": "bool"
  }
}
```

Schema Characteristics:
- 6 fields total (date, distance, step, calories, arraySteps, is_sync)
- 1 unique field (date) - used for update detection
- 1 datetime field (date) - requires timestamp conversion
- 1 array field (arraySteps) - contains 10 integer elements
- Mixed types: int, float, bool, array
- No validation constraints (min/max removed for these benchmarks)
Note: Performance may vary with different schemas:
- More fields = longer validation/processing time
- More complex validation rules = longer validation time
- Larger arrays = more memory/processing overhead
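The schema validation step timed earlier checks each record against rules like these. A minimal validator for this schema format, supporting the `int|float` union syntax and `required`, is sketched below (an illustration of the rule format, not the storage library's actual code):

```python
TYPE_MAP = {"int": int, "float": float, "bool": bool, "array": list}

def validate_record(record, schema):
    """Check required fields and types; return a list of error strings."""
    errors = []
    for field, rules in schema.items():
        if field not in record:
            if rules.get("required"):
                errors.append(f"missing required field: {field}")
            continue
        # "int|float" style unions map to a tuple of Python types
        allowed = tuple(TYPE_MAP[t] for t in rules["type"].split("|"))
        value = record[field]
        # bool is a subclass of int in Python, so reject it unless allowed
        if isinstance(value, bool) and bool not in allowed:
            errors.append(f"{field}: expected {rules['type']}, got bool")
        elif not isinstance(value, allowed):
            errors.append(f"{field}: expected {rules['type']}, "
                          f"got {type(value).__name__}")
    return errors
```

Each rule lookup and `isinstance` check is cheap individually, but at one pass per field per record it adds up to the ~9ms per 1K records observed above.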
Benchmark environment:
- Storage: Local filesystem (tmp directory)
- Partitioning: Hive-style (partition_date=YYYY-MM-DD)
- File Format: Parquet
- Bucket Interval: 43200 (monthly)
- Test Methodology: Each test run 3 times, statistics calculated across iterations
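Hive-style partitioning means each record's timestamp determines a directory of the form `partition_date=YYYY-MM-DD`. A sketch of how such a partition path can be derived from the unique `date` field (assumed behavior for illustration, not the library's exact code; the `tmp` root matches the storage path above):

```python
from datetime import datetime, timezone

def partition_path(timestamp, root="tmp"):
    """Map a Unix timestamp to a Hive-style partition directory."""
    day = datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y-%m-%d")
    return f"{root}/partition_date={day}"

print(partition_path(1703289600))  # midnight 2023-12-23 UTC
```

Keeping partitions date-based is what lets loads and updates touch only the files for the relevant period rather than the whole dataset.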