This directory contains sample/synthetic energy data files that demonstrate the required data format for HEMA.
All data files in this directory are SYNTHETIC and NOT from actual households.
- energy_data_sample.csv: Synthetic energy consumption data
- appliance_thresholds_sample.csv: Synthetic threshold values
- utility_rate_sample.csv: Sample rate structure (representative only)
These files are intended for:
- Understanding required data formats
- Testing HEMA functionality without sensitive data
- Quick demonstrations and evaluation
- Development and debugging
For real analysis, replace with your own energy data from:
- Your utility provider's smart meter downloads
- Pecan Street Dataport (academic access)
- Your own home energy monitoring devices
Research Note: The manuscript evaluation used real data from Pecan Street Dataport with proper consent. This repository includes only generic samples to allow public distribution without privacy concerns.
15-minute interval energy consumption data for individual appliances.
Format:
- Column 1: Timestamp (ISO 8601 format with timezone)
- Columns 2+: Power consumption in kW for each appliance
Example:
local_15min,HVAC unit,Refrigerator,Electric water heater,Washing machine,Clothes dryer,...
2024-01-15 00:00:00+00:00,0.5,0.2,0.1,0.0,0.0,...
2024-01-15 00:15:00+00:00,0.5,0.2,0.1,0.0,0.0,...
Requirements:
- Timestamps must be consecutive at 15-minute intervals
- Power values must be non-negative
- Values can be 0 for appliances not in use
- Include columns only for appliances that exist in your home (e.g., if you don't have a pool pump, omit that column)
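The requirements above can be checked before handing a file to HEMA. The following is a minimal sketch using only the Python standard library; `validate_energy_csv` is a hypothetical helper written for this README, not part of HEMA, and it assumes the first column holds the timestamp and every other column holds kW values.

```python
import csv
import io
from datetime import datetime, timedelta

INTERVAL = timedelta(minutes=15)


def validate_energy_csv(text):
    """Return a list of format problems found in the CSV text (empty if none)."""
    problems = []
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # first column is the timestamp, the rest are appliances
    previous = None
    for line_no, row in enumerate(reader, start=2):
        timestamp = datetime.fromisoformat(row[0])
        if timestamp.tzinfo is None:
            # Rows without a timezone are reported and skipped entirely.
            problems.append(f"line {line_no}: timestamp has no timezone")
            continue
        if previous is not None and timestamp - previous != INTERVAL:
            problems.append(f"line {line_no}: not 15 minutes after the previous row")
        previous = timestamp
        for name, value in zip(header[1:], row[1:]):
            if float(value) < 0:
                problems.append(f"line {line_no}: negative power for {name}")
    return problems


sample = (
    "local_15min,HVAC unit,Refrigerator\n"
    "2024-01-15 00:00:00+00:00,0.5,0.2\n"
    "2024-01-15 00:15:00+00:00,0.5,0.2\n"
)
print(validate_energy_csv(sample))  # prints []
```

To check a file on disk, read it with `open(path, newline="").read()` and pass the text to the function.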
Typical Coverage:
- Multiple weeks of continuous data (30-90 days recommended for meaningful analysis)
- All hours of day and all seasons for comprehensive analysis
- Examples: 2023-06-30 through 2023-08-15 (summer pattern with AC usage), or full year for seasonal variation
Threshold values to automatically detect when appliances turn "on".
Used by the Analysis Agent to identify appliance usage periods.
Format:
appliance_name,threshold_kw
HVAC unit,0.5
Refrigerator,0.2
Electric water heater,0.15
Washing machine,0.1
...
Requirements:
- One row per appliance (must match appliance columns in energy data)
- Threshold should be the minimum power draw that indicates the appliance is actively running
- For always-on appliances (refrigerator), use a value that distinguishes normal operation from idle
Guidelines:
- HVAC/Heating units: 0.4-1.0 kW (compressor power)
- Electric water heater: 0.1-0.5 kW depending on capacity
- EV charger: 1.0-7.0 kW depending on charging mode
- Clothes dryer: 2.0-5.0 kW
- Washing machine: 0.05-0.5 kW
- Dishwasher: 0.1-0.5 kW
- Microwave/Oven: 1.0-3.0 kW
- Refrigerator: 0.1-0.3 kW (continuous baseline)
- Pool pump: 0.5-2.0 kW
Important: Thresholds File is OPTIONAL ✅
- The HEMA system works without appliance thresholds
- If no thresholds file is provided, the system uses a default 0.01 kW threshold for all appliances
- Thresholds improve accuracy of appliance frequency analysis and standby power detection
- For best results: provide accurate thresholds specific to your appliances
- Without thresholds: HEMA still analyzes consumption patterns, just with less appliance-level granularity
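To illustrate why a tuned threshold matters, the sketch below flags each 15-minute reading as "on" and counts off-to-on transitions. It is an assumption-laden illustration, not HEMA's Analysis Agent code: the function names are hypothetical, the readings are made up, and treating "power at or above the threshold" as "on" is an assumption based on the definition above.

```python
DEFAULT_THRESHOLD_KW = 0.01  # default stated in this README


def on_flags(readings_kw, threshold_kw=DEFAULT_THRESHOLD_KW):
    """Return one True/False flag per reading: True when power >= threshold."""
    return [kw >= threshold_kw for kw in readings_kw]


def count_on_events(flags):
    """Count how many times the appliance switches from off to on."""
    events = 0
    previous = False
    for flag in flags:
        if flag and not previous:
            events += 1
        previous = flag
    return events


# Made-up refrigerator readings (kW): low idle draw with two compressor cycles.
refrigerator = [0.02, 0.25, 0.25, 0.03, 0.22, 0.02]

print(count_on_events(on_flags(refrigerator)))       # prints 1
print(count_on_events(on_flags(refrigerator, 0.2)))  # prints 2
```

With the 0.01 kW default, the idle draw already counts as "on", so the two compressor cycles merge into one long event. A threshold of 0.2 kW separates them.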
Time-of-use (TOU) electricity rate structure.
Defines different rates for different hours and seasons.
Energy education documents for the Knowledge Agent's Retrieval-Augmented Generation (RAG) system.
The Knowledge Agent uses semantic search over these documents to answer energy-related questions. See README Knowledge Base & RAG section for details.
Directory Structure:
data/knowledge_base/
├── guides/ # Energy efficiency guides
│ └── energy-saver-guide-2022.pdf # General energy saving tips
├── utility_rates/ # Rate and pricing information
│ ├── austin_energy_rates.md # TOU rate schedules
│ └── COA-Utilities-Rates-and-Fees.pdf
└── rebates/ # Incentive programs
└── austin_energy_rebates.md # Available rebates and incentives
Adding Your Own Documents:
To include custom energy education documents:
- Place documents in the appropriate knowledge_base/ subdirectories
- Supported formats: PDF, Markdown, plain text
- On first Knowledge Agent query, the system will automatically:
  - Load all documents from knowledge_base/
  - Create semantic chunks for searching
  - Build vector embeddings (requires OPENAI_API_KEY)
  - Cache in data/vector_index/ (not tracked in git)
Examples of content to add:
- Local utility rebate programs
- ENERGY STAR appliance specifications
- Renewable energy guidelines
- Home efficiency improvement guides
- Demand response program information
Format:
hour_start,hour_end,month,day_type_id,rate_kwh,adjustment_kwh
0,5,1,1,0.095,0.0
5,9,1,1,0.121,0.0
9,14,1,1,0.095,0.0
14,20,1,1,0.215,0.0
20,23,1,1,0.095,0.0
...
Columns:
- hour_start: Hour when rate begins (0-23)
- hour_end: Hour when rate ends (0-23, exclusive)
- month: Month number (1-12)
- day_type_id: 1 = weekday, 2 = weekend/holiday
- rate_kwh: Electricity price in $/kWh
- adjustment_kwh: Grid adjustment factor (typically 0.0)
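As a rough sketch of how these columns can be used, the example below looks up the rate for a given hour and prices one 15-minute reading. The helper names are hypothetical and this is not HEMA's own rate logic. The rows are the sample rows shown above, and `adjustment_kwh` is deliberately ignored because this README does not say how it is applied.

```python
import csv
import io

RATES_CSV = """hour_start,hour_end,month,day_type_id,rate_kwh,adjustment_kwh
0,5,1,1,0.095,0.0
5,9,1,1,0.121,0.0
9,14,1,1,0.095,0.0
14,20,1,1,0.215,0.0
"""


def load_rates(text):
    """Parse rate rows into dictionaries with numeric values."""
    rates = []
    for row in csv.DictReader(io.StringIO(text)):
        rates.append({
            "hour_start": int(row["hour_start"]),
            "hour_end": int(row["hour_end"]),
            "month": int(row["month"]),
            "day_type_id": int(row["day_type_id"]),
            "rate_kwh": float(row["rate_kwh"]),
        })
    return rates


def find_rate(rates, hour, month, day_type_id):
    """Return $/kWh for the period containing `hour` (hour_end is exclusive)."""
    for rate in rates:
        if (rate["month"] == month and rate["day_type_id"] == day_type_id
                and rate["hour_start"] <= hour < rate["hour_end"]):
            return rate["rate_kwh"]
    return None  # no matching period


def interval_cost(power_kw, rate_kwh, interval_hours=0.25):
    """Cost of one reading: kW x hours gives kWh, multiplied by $/kWh."""
    return power_kw * interval_hours * rate_kwh


rates = load_rates(RATES_CSV)
peak_rate = find_rate(rates, hour=15, month=1, day_type_id=1)
print(peak_rate)                      # prints 0.215
print(interval_cost(2.0, peak_rate))  # 2.0 kW for 15 minutes at the peak rate
```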
Requirements:
- Must cover all hours of the day (0-24) for every month and day type
- No overlapping time periods for same month/day_type
- Rates should reflect actual utility rates (check your electric bill)
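The first two requirements can be checked mechanically. The sketch below is a hypothetical validator, not part of HEMA. It assumes `hour_end` is exclusive and that a complete day ends at hour 24. If your files end the last period at 23, as the sample row above does, pass `day_end=23` instead.

```python
from collections import defaultdict


def check_day_coverage(rows, day_end=24):
    """rows: list of (hour_start, hour_end, month, day_type_id) tuples.

    Return a list of gaps and overlaps for each month/day type present.
    """
    periods = defaultdict(list)
    for hour_start, hour_end, month, day_type_id in rows:
        periods[(month, day_type_id)].append((hour_start, hour_end))

    problems = []
    for (month, day_type_id), spans in sorted(periods.items()):
        label = f"month {month}, day type {day_type_id}"
        covered_until = 0
        for start, end in sorted(spans):
            if start < covered_until:
                problems.append(f"{label}: overlap at hour {start}")
            elif start > covered_until:
                problems.append(f"{label}: gap from hour {covered_until} to {start}")
            covered_until = max(covered_until, end)
        if covered_until < day_end:
            problems.append(f"{label}: no rate from hour {covered_until} to {day_end}")
    return problems


def missing_combinations(rows):
    """Return the (month, day_type_id) pairs that have no rows at all."""
    present = {(month, day_type_id) for _, _, month, day_type_id in rows}
    return [(m, d) for m in range(1, 13) for d in (1, 2) if (m, d) not in present]


january_weekday = [(0, 5, 1, 1), (5, 9, 1, 1), (9, 14, 1, 1), (14, 20, 1, 1), (20, 24, 1, 1)]
print(check_day_coverage(january_weekday))         # prints []
print(len(missing_combinations(january_weekday)))  # prints 23
```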
Common TOU Patterns:
- Off-peak (night/early morning): $0.08-0.10/kWh (typically 9 PM - 5 AM)
- Mid-peak (morning/evening): $0.10-0.12/kWh (typically 5-9 AM and 9 PM onwards)
- Peak (afternoon/early evening): $0.20-0.30/kWh (typically 2-8 PM)
- Weekend rates: Generally 10-20% lower than weekday
| Use Case | Duration | Records | File Size |
|---|---|---|---|
| Quick testing | 1 week | 672 | ~30 KB |
| Basic evaluation | 30 days | 2,880 | ~130 KB |
| Comprehensive evaluation | 90 days | 8,640 | ~390 KB |
| Full year (recommended) | 365 days | 35,040 | ~1.6 MB |
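The record counts in the table follow from 96 fifteen-minute intervals per day. A short sketch (the function name is illustrative) that reproduces them, which can also be compared against the row count of your own file:

```python
INTERVALS_PER_DAY = 24 * 60 // 15  # 96 readings per day at 15-minute resolution


def expected_records(days):
    """Number of data rows a complete file covering `days` days should have."""
    return days * INTERVALS_PER_DAY


for days in (7, 30, 90, 365):
    print(days, expected_records(days))
```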
- Export from your energy monitoring system:
  - HEMA requires appliance-level (sub-metered) data, not whole-home smart meter data
  - Use home energy monitors with circuit-level tracking (e.g., Sense, Emporia Vue, IoTaWatt)
  - Or access appliance-level datasets like Pecan Street Dataport (academic access)
- Format your data:
  - Convert timestamps to ISO 8601 format with UTC timezone
  - Organize by appliance/circuit name
  - Ensure no missing intervals (HEMA requires complete sequences)
- Set thresholds:
  - Analyze your consumption patterns to determine on/off thresholds
  - Start conservative and adjust based on HEMA's analysis results
- Verify rates:
  - Use your actual electric bill's rate schedule
  - Include all time-of-use periods from your utility
- Place in data/ directory:

      # Option A: Replace the existing files
      cp my_energy_data.csv data/home_power/energy_use_data_XXXX.csv
      cp my_thresholds.csv data/home_power/appliance_thresholds_XXXX.csv
      cp my_rates.csv data/utility_rate/utility_rate_XXX.csv

      # Option B: Keep samples and create new directory
      mkdir data/samples/my_home
      cp my_energy_data.csv data/samples/my_home/
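For the "Format your data" step, the following sketch converts local timestamps to the UTC form used in the sample file and lists any missing 15-minute intervals. Both helpers are hypothetical. For simplicity the conversion assumes a fixed UTC offset; data recorded across daylight-saving changes needs a real timezone database such as Python's `zoneinfo`.

```python
from datetime import datetime, timedelta, timezone

STEP = timedelta(minutes=15)


def to_utc_iso(local_text, utc_offset_hours):
    """Convert 'YYYY-MM-DD HH:MM:SS' local time to a UTC timestamp string."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.strptime(local_text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=local_zone)
    return local.astimezone(timezone.utc).isoformat(sep=" ")


def missing_intervals(timestamps):
    """Return the 15-minute timestamps absent from a sorted list of datetimes."""
    missing = []
    for earlier, later in zip(timestamps, timestamps[1:]):
        expected = earlier + STEP
        while expected < later:
            missing.append(expected)
            expected += STEP
    return missing


print(to_utc_iso("2024-01-14 18:00:00", -6))  # prints 2024-01-15 00:00:00+00:00
```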
- The sample data provided here is synthetic and realistic but not from an actual home
- Power values are representative of typical household appliances
- For production analysis, replace with your actual home energy data
- HEMA's Analysis Agent works best with at least 30 days of data for meaningful patterns
- For seasonal analysis (e.g., comparing summer AC usage), include full season data
HEMA's evaluation framework and all tools work correctly without appliance thresholds:
# Evaluation runs fine without thresholds
python -m evaluation.run_experiment --persona confused_newcomer --scenario understand_utility_rate
# Even without data/home_power/appliance_thresholds_sample.csv file
What happens when thresholds are missing:
- System logs a warning (not an error)
- Tools use default 0.01 kW threshold for all appliances
- Energy analysis continues normally with aggregated/total consumption metrics
- Evaluation metrics are calculated successfully
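The fallback behavior described above might look roughly like the sketch below. This is an illustration only, not HEMA's actual loader: `load_thresholds` and the logger name are hypothetical, and the column names come from the thresholds file format shown earlier.

```python
import csv
import logging
from pathlib import Path

DEFAULT_THRESHOLD_KW = 0.01  # default stated in this README
logger = logging.getLogger("thresholds_example")  # hypothetical logger name


def load_thresholds(path, appliances):
    """Return {appliance: threshold_kw}, using the default where none is given."""
    thresholds = {name: DEFAULT_THRESHOLD_KW for name in appliances}
    file = Path(path)
    if not file.exists():
        # A missing file is a warning, not an error: analysis continues with defaults.
        logger.warning("No thresholds file at %s; using %.2f kW for all appliances",
                       path, DEFAULT_THRESHOLD_KW)
        return thresholds
    with file.open(newline="") as handle:
        for row in csv.DictReader(handle):
            if row["appliance_name"] in thresholds:
                thresholds[row["appliance_name"]] = float(row["threshold_kw"])
    return thresholds


print(load_thresholds("no_such_thresholds_file.csv", ["HVAC unit", "Refrigerator"]))
```

Appliances listed in the energy data but absent from the thresholds file keep the default value.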
For Evaluation:
- The 23 evaluation metrics don't depend on threshold accuracy
- Metrics like "factual accuracy", "user questions", "response quality" are independent of thresholds
- Appliance frequency analysis would use default thresholds (still valid, just less precise)
When thresholds ARE provided:
- Appliance-specific analysis is more accurate
- Standby power filtering works better
- Frequency detection is more precise
- Users get better appliance-level insights
Testing approach:
- Test without thresholds first → validates core functionality
- Add thresholds file → improves appliance-level analysis accuracy
- Compare results to see the difference
- These samples contain no real household data
- If using your own data, consider privacy implications before sharing
- Pecan Street dataset (used in manuscript) contains real data with consent