NEMDataTools Implementation Status
This document describes the current implementation status of the NEMDataTools package. The project has achieved production readiness with comprehensive functionality.
Core Components
1. Downloader Module
The downloader module will be responsible for retrieving data from AEMO's public data sources.
Key functionalities:
Determine the correct URLs for different data types
Handle authentication if required by AEMO
Manage HTTP requests with proper error handling
Support different data formats (CSV, XML, JSON)
Implement retry logic for failed requests
Provide flexible date range handling
Implementation approach:
def fetch_data(data_type, start_date, end_date, regions=None, cache_path=None):
    """
    Main function to fetch data from AEMO.

    1. Validate input parameters
    2. Map data_type to appropriate endpoint
    3. Check cache for existing data
    4. If not cached, download from AEMO
    5. Parse response
    6. Cache results if cache_path provided
    7. Return as DataFrame
    """
2. Cache Manager ✅ IMPLEMENTED
The cache manager handles local storage of downloaded data to avoid unnecessary requests.
Implemented functionalities:
✅ Store downloaded data in structured format with metadata indexing
✅ Intelligent cache lookup with exact and partial date range matching
✅ Handle partial cache hits seamlessly
✅ Configurable cache TTL and invalidation
✅ Automatic disk space management
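To illustrate what handling a partial cache hit involves, here is a minimal sketch that works out which date ranges still need to be downloaded when only part of a request is cached. The function name `missing_ranges` and the representation of cached coverage as inclusive `(first_day, last_day)` tuples are assumptions made for this example, not the package's actual internals.

```python
from datetime import date, timedelta


def missing_ranges(start, end, cached):
    """Return the parts of [start, end] not covered by `cached`.

    `cached` is a list of (first_day, last_day) tuples, both inclusive.
    (Hypothetical helper, for illustration only.)
    """
    one_day = timedelta(days=1)
    gaps = []
    cursor = start  # first day that is not yet known to be covered
    for first, last in sorted(cached):
        if last < cursor:
            continue  # entirely before the part still needed
        if first > end:
            break  # entirely after the requested window
        if first > cursor:
            gaps.append((cursor, first - one_day))
        cursor = max(cursor, last + one_day)
        if cursor > end:
            break
    if cursor <= end:
        gaps.append((cursor, end))
    return gaps


# Request January 2024 when only 10-15 January is cached.
gaps = missing_ranges(
    date(2024, 1, 1),
    date(2024, 1, 31),
    [(date(2024, 1, 10), date(2024, 1, 15))],
)
```

Only the returned gaps would need to be fetched; the results can then be combined with the cached rows before returning them to the caller.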
Implementation approach:
class CacheManager:
    """
    Manages local caching of AEMO data.
    """

    def __init__(self, cache_dir):
        """Initialize with cache directory."""

    def get_cached_data(self, data_type, start_date, end_date, regions):
        """Check if data is cached and return it if available."""

    def cache_data(self, data_type, start_date, end_date, regions, data):
        """Store data in cache."""

    def clear_cache(self, older_than=None):
        """Clear cache entries older than specified date."""
3. Time Utilities ✅ IMPLEMENTED
The time utilities module handles all time-related operations needed for AEMO data.
Implemented functionalities:
✅ Convert between different time formats with robust parsing
✅ Generate time periods for queries and analysis
✅ Proper AEST timezone handling for NEM data
✅ Support for dispatch intervals (5-minute, 30-minute, etc.)
✅ Forecast horizon calculations for pre-dispatch data
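As a rough illustration of the timezone, interval, and horizon handling listed above, the sketch below uses only the standard library. It assumes timestamps laid out as `YYYY/MM/DD HH:MM:SS`, treats NEM time as a fixed UTC+10 offset with no daylight saving, and labels each interval by its end time. The simplified signatures (for example an integer `minutes` argument) are choices made for this example and may differ from the package's own functions.

```python
from datetime import datetime, timedelta, timezone

# Assumption: NEM market time is AEST (UTC+10) all year round.
NEM_TZ = timezone(timedelta(hours=10), "AEST")

# Assumed timestamp layout, e.g. "2024/01/01 00:05:00".
NEM_FORMAT = "%Y/%m/%d %H:%M:%S"


def convert_nem_datetime(date_string):
    """Parse a timestamp string into a timezone-aware datetime."""
    return datetime.strptime(date_string, NEM_FORMAT).replace(tzinfo=NEM_TZ)


def generate_intervals(start, end, minutes=5):
    """Yield interval-ending timestamps in (start, end]."""
    step = timedelta(minutes=minutes)
    current = start + step
    while current <= end:
        yield current
        current += step


def get_forecast_horizon(run_time, target_time):
    """Return how far ahead of the run time the target time lies."""
    return target_time - run_time


start = convert_nem_datetime("2024/01/01 00:00:00")
end = convert_nem_datetime("2024/01/01 00:15:00")
intervals = list(generate_intervals(start, end))
horizon = get_forecast_horizon(start, end)
```

Using a fixed offset rather than a named zone such as `Australia/Sydney` avoids daylight-saving shifts, which is the point of the AEST handling mentioned above.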
Implementation approach:
def generate_intervals(start_date, end_date, interval="5min"):
    """Generate time intervals between dates."""

def convert_nem_datetime(date_string):
    """Convert AEMO datetime format to Python datetime."""

def get_forecast_horizon(run_time, target_time):
    """Calculate forecast horizon between run time and target time."""
4. Data Processor
The data processor will standardize and clean the raw data from AEMO.
Key functionalities:
Normalize column names
Convert data types
Handle missing values
Reshape data if needed
Provide consistent output format
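The following pandas sketch shows one way these steps could fit together. The column names (`SETTLEMENTDATE`, `RRP`, `REGIONID`), the timestamp layout, and the choice to drop rows whose numeric values cannot be parsed are all assumptions for illustration; the real processor may treat missing values differently.

```python
import pandas as pd


def standardize(data, time_col="SETTLEMENTDATE", numeric_cols=("RRP",)):
    """Illustrative clean-up of a raw table (not the package's own code)."""
    df = data.copy()

    # Normalize column names: trim whitespace and lower-case.
    df.columns = [str(c).strip().lower() for c in df.columns]
    time_col = time_col.lower()
    numeric_cols = [c.lower() for c in numeric_cols]

    # Convert data types; unparseable numbers become NaN.
    df[time_col] = pd.to_datetime(df[time_col], format="%Y/%m/%d %H:%M:%S")
    for col in numeric_cols:
        df[col] = pd.to_numeric(df[col], errors="coerce")

    # Handle missing values (here: drop the affected rows).
    df = df.dropna(subset=numeric_cols)

    # Consistent output: time-indexed and sorted.
    return df.set_index(time_col).sort_index()


raw = pd.DataFrame(
    {
        " SETTLEMENTDATE": ["2024/01/01 00:10:00", "2024/01/01 00:05:00"],
        "RRP": ["bad", "85.2"],
        "REGIONID": ["NSW1", "NSW1"],
    }
)
clean = standardize(raw)
```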
Implementation approach:
def standardize(data, format_type=None):
    """
    Standardize raw AEMO data.

    1. Normalize column names
    2. Set appropriate index
    3. Convert data types
    4. Handle missing values
    5. Apply any specific formatting
    """

def merge_datasets(datasets, on=None):
    """Merge multiple datasets into one."""
5. Data Type Handlers ✅ IMPLEMENTED
Specialized handlers for different AEMO data types with specific processing requirements.
Implemented data types:
✅ DISPATCHPRICE - with price validation and intervention handling
✅ DISPATCHREGIONSUM - with demand aggregation and regional indexing
✅ PREDISPATCHPRICE - with forecast horizon calculations
✅ DISPATCH_UNIT_SCADA - with generator output processing
✅ Multiple additional MMSDM and static data types
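One possible way to route each data type to its handler is a small registry keyed by table name. The sketch below is an assumption about how such dispatch could be organised, not a description of the package's internals; the `register` decorator, the `HANDLERS` dictionary, and the placeholder validation rule are all hypothetical, and rows are plain dictionaries to keep the example dependency-free.

```python
HANDLERS = {}  # hypothetical registry: data type name -> handler class


def register(data_type):
    """Class decorator that records a handler for one data type."""
    def decorator(cls):
        HANDLERS[data_type] = cls
        return cls
    return decorator


@register("DISPATCHPRICE")
class DispatchPriceHandler:
    """Handler for DISPATCHPRICE rows."""

    @staticmethod
    def process(rows):
        # Placeholder validation: keep only rows with a numeric price.
        return [r for r in rows if isinstance(r.get("RRP"), (int, float))]


def process(data_type, rows):
    """Apply the registered handler, or pass rows through unchanged."""
    handler = HANDLERS.get(data_type)
    if handler is None:
        return rows
    return handler.process(rows)


rows = [{"REGIONID": "NSW1", "RRP": 85.2}, {"REGIONID": "QLD1", "RRP": None}]
valid = process("DISPATCHPRICE", rows)
```

With this layout, supporting a new data type means adding one decorated class rather than editing a central `if`/`elif` chain.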
Implementation approach:
class DispatchPriceHandler:
    """Handler for DISPATCHPRICE data type."""

    @staticmethod
    def process(data):
        """Process DISPATCHPRICE data."""

class PredispatchHandler:
    """Handler for PREDISPATCH data type."""

    @staticmethod
    def process(data):
        """Process PREDISPATCH data."""
6. Batch Commands ✅ IMPLEMENTED
The batch commands module provides efficient parallel downloading capabilities for bulk data operations.
Implemented Functionalities:
✅ Parallel downloads using ThreadPoolExecutor for efficiency
✅ Progress tracking with tqdm for user feedback
✅ Multi-table batch downloading
✅ Multi-year data fetching capabilities
✅ Error handling and retry logic for failed downloads
Configurable delays between requests (respecting AEMO rate limits)
Comprehensive error handling and logging
Flexible caching options
Backward compatibility
Implementation Details:
def download_yearly_data(years, tables, max_workers=3, delay=2,
                         overwrite=False, cache_path="data/aemo_data"):
    """
    Download multiple years/tables in parallel.

    Implementation:
    1. Validate input parameters
    2. Initialize ThreadPoolExecutor with max_workers
    3. Create futures for each year/table combination
    4. Track progress with tqdm progress bar
    5. Handle results and errors as they complete
    6. Return nested dictionary of results
    """

def download_multiple_tables(table_names, start_date, end_date, regions=None):
    """
    Download multiple tables sequentially.

    Implementation:
    1. Validate input parameters
    2. Iterate through requested tables
    3. Call fetch_data for each table
    4. Collect results in dictionary
    5. Return dictionary of DataFrames
    """
Example Usage:
# Parallel yearly downloads
from nemdatatools.batch_commands import download_yearly_data

results = download_yearly_data(
    years=[2022, 2023, 2024],
    tables=["DISPATCHPRICE", "PREDISPATCHPRICE"],
    max_workers=4,    # Number of parallel downloads
    delay=1,          # Minimum delay between requests (seconds)
    overwrite=False,  # Skip existing files
)

# Multiple table downloads
from nemdatatools.batch_commands import download_multiple_tables

results = download_multiple_tables(
    table_names=["DISPATCHPRICE", "DISPATCHREGIONSUM"],
    start_date="2024/01/01",
    end_date="2024/01/31",
    regions=["NSW1", "QLD1"],  # Optional region filter
)
Key Parameters:ο
Parameter |
Type |
Description |
Default |
---|---|---|---|
max_workers |
int |
Maximum parallel downloads |
3 |
delay |
int |
Minimum delay between requests (seconds) |
2 |
overwrite |
bool |
Force re-download existing files |
False |
cache_path |
str |
Custom download directory |
βdata/aemo_dataβ |
Error Handling Strategy:
Failed downloads are logged with full error details
None is returned for failed downloads
Other downloads continue unaffected
Progress bar continues tracking completed tasks
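The strategy above can be sketched with the standard library's `concurrent.futures`. In this example `download_all` and the `download_one` callable are hypothetical stand-ins for the package's own functions, and the progress bar is omitted to keep the sketch dependency-free.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logger = logging.getLogger(__name__)


def download_all(tasks, download_one, max_workers=3):
    """Run download_one(*task) for every task; failed tasks map to None."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the task that produced it.
        futures = {pool.submit(download_one, *task): task for task in tasks}
        for future in as_completed(futures):
            task = futures[future]
            try:
                results[task] = future.result()
            except Exception:
                # Log the full traceback, record None, and keep going.
                logger.exception("Download failed for %s", task)
                results[task] = None
    return results


def fake_download(year, table):
    """Stand-in downloader that fails for one table."""
    if table == "BAD_TABLE":
        raise RuntimeError("simulated failure")
    return f"{table}-{year}"


results = download_all(
    [(2023, "DISPATCHPRICE"), (2023, "BAD_TABLE")], fake_download, max_workers=2
)
```

Because each exception is caught per future, one failing year/table combination does not cancel the others, and callers can detect failures by checking for `None`.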
Development Timeline
See the Project Board for a detailed breakdown of tasks and milestones.
Implementation Details
AEMO Data Access
AEMO provides data through several mechanisms:
MMS Data Model
Public data files
NEMWeb portal
Our implementation will focus on accessing publicly available data without requiring special credentials. We'll use the following approach:
Map data types to appropriate public URLs
Use standard HTTP requests to fetch data
Parse returned data (typically CSV or XML)
Convert to pandas DataFrames for easy analysis
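As an illustration of the parsing step, the sketch below reads a multi-table CSV report using only the standard library. It assumes the layout commonly seen in AEMO report files, where each row starts with a record type: `C` for comment rows, `I` for a header row naming a table and its columns, and `D` for data rows, with the column values starting at the fifth field. The function name `parse_aemo_csv` and the sample text are made up for this example.

```python
import csv
import io


def parse_aemo_csv(text):
    """Split a C/I/D-style report into {table_name: [row_dict, ...]}."""
    tables = {}
    name = None
    header = None
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        if row[0] == "I":
            # Header row: fields 1 and 2 name the table, columns follow.
            name = f"{row[1]}_{row[2]}"
            header = row[4:]
            tables.setdefault(name, [])
        elif row[0] == "D" and header is not None:
            # Data row: pair values with the most recent header.
            tables[name].append(dict(zip(header, row[4:])))
        # "C" (comment) rows and anything else are ignored.
    return tables


sample = (
    "C,EXAMPLE REPORT,PUBLIC\n"
    "I,DISPATCH,PRICE,1,SETTLEMENTDATE,REGIONID,RRP\n"
    'D,DISPATCH,PRICE,1,"2024/01/01 00:05:00",NSW1,85.20\n'
    'D,DISPATCH,PRICE,1,"2024/01/01 00:05:00",QLD1,79.10\n'
    "C,END OF REPORT,4\n"
)
tables = parse_aemo_csv(sample)
```

Each list of row dictionaries can then be handed to `pandas.DataFrame` for type conversion and analysis.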
Caching Strategy
Our caching strategy will be:
Create a directory structure based on data_type, regions, and date ranges
Store data in parquet format for efficiency
Implement metadata for each cache entry
Support partial cache hits by combining cached and newly fetched data
Provide cache management functions to control disk usage
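A directory structure of this kind could be derived as in the sketch below. The exact layout (`<cache_dir>/<data_type>/<regions>/<start>_<end>.parquet`), the `ALL` placeholder for unfiltered requests, and the function name `cache_file_path` are assumptions for illustration only.

```python
from datetime import date
from pathlib import Path


def cache_file_path(cache_dir, data_type, regions, start_date, end_date):
    """Build a deterministic cache path for one request (hypothetical layout)."""
    # Sort regions so ["QLD1", "NSW1"] and ["NSW1", "QLD1"] share one entry.
    region_key = "-".join(sorted(regions)) if regions else "ALL"
    filename = f"{start_date:%Y%m%d}_{end_date:%Y%m%d}.parquet"
    return Path(cache_dir) / data_type / region_key / filename


path = cache_file_path(
    "cache", "DISPATCHPRICE", ["QLD1", "NSW1"], date(2024, 1, 1), date(2024, 1, 31)
)
```

Keeping the path a pure function of the request parameters means a cache lookup is a simple existence check, with per-entry metadata stored alongside the file.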
Error Handling
We'll implement robust error handling:
Validate all input parameters
Handle HTTP errors with appropriate retries and backoffs
Provide clear error messages for common issues
Log detailed diagnostic information
Support graceful degradation when services are unavailable
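Retries with backoff can be sketched as a small wrapper around any callable. The helper name `with_retries`, its defaults, and the choice of exponential backoff are assumptions for this example; `OSError` is used as the default retryable error because network failures raised by the standard library (such as `urllib.error.URLError`) derive from it.

```python
import time


def with_retries(func, attempts=3, base_delay=1.0, retry_on=(OSError,),
                 sleep=time.sleep):
    """Call func(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)


# Demonstration with a function that fails twice before succeeding.
calls = {"count": 0}
delays = []


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = with_retries(flaky, sleep=delays.append)
```

Passing `sleep` as a parameter lets tests record the delays instead of actually waiting.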
Future Extensions
After the initial implementation, we plan to extend the package with:
Support for more AEMO data types
Advanced visualization tools
Integration with other energy data sources
Time series analysis utilities
Forecasting tools
These extensions will be prioritized based on user feedback after the initial release.