NEMDataTools Project Board
This guide outlines the project structure and implementation plan for the NEMDataTools Python package. The project is designed to provide a comprehensive toolkit for accessing, processing, and analyzing Australian Energy Market Operator (AEMO) data.
Status Update: This project has been successfully completed and is production-ready. The implementation exceeded the original scope with comprehensive functionality, testing, and documentation.
Note: This guide provides the original structure and setup plan. Actual implementation details may vary based on requirements discovered during development.
Phase 1: Project Setup (Milestones 1-2) ✅ COMPLETED
Milestone 1: Development Environment Setup ✅ COMPLETED
Install UV
pip install uv
or use the recommended curl installation methodConfigure UV with a
.uv.toml
file
Create Project Structure
Initialize git repository
Create directory structure following the src layout:
nemdatatools/ ├── .github/workflows/ ├── docs/ ├── src/nemdatatools/ ├── tests/ ├── .gitignore ├── LICENSE ├── pyproject.toml ├── README.md ├── .uv.toml
Configure Packaging
Create
pyproject.toml
with minimal dependenciesCreate virtual environment:
uv venv
Activate the environment:
source .venv/bin/activate
Install in development mode:
uv pip install -e ".[dev]"
Milestone 2: Core Module Skeletons ✅ COMPLETED
Implement Basic Module Structure ✅ COMPLETED
Create
__init__.py
with version infoAdd full implementations for:
downloader.py
- comprehensive HTTP handling, retry logiccache.py
- metadata-based intelligent cachingtimeutils.py
- AEST timezone, dispatch intervalsprocessor.py
- comprehensive data standardization functionsbatch_commands.py
- parallel download operationsmmsdm_helper.py
- MMSDM-specific utilitiesdata_source.py
- configuration for multiple data types
Set Up Testing Framework ✅ COMPLETED
Configured pytest with coverage
Comprehensive test files for all modules
GitHub Actions CI/CD workflows with automated testing
Phase 2: Core Functionality Implementation (Milestones 3-4)
Milestone 3: Time Utilities and Cache Management
Implement Time Utilities
Date parsing and formatting
Interval generation
Time period handling for AEMO data types
Forecast horizon calculations
Write comprehensive tests
Implement Cache Management
Cache directory structure
Metadata tracking
Cache lookup and retrieval
Cache invalidation
Write tests for caching logic
Milestone 4: Data Downloading
Map AEMO Data Sources
Identify public endpoints for each data type
Create URL templates
Implement data source mapping
Implement Downloader
HTTP request handling with retries
Error handling
Authentication (if needed)
Input validation
Integration with cache
Batch downloading for multiple regions
Test Downloading
Create mock responses for testing
Implement end-to-end tests
Verify cache integration
Phase 3: Data Processing (Milestones 5-6)
Milestone 5: Basic Data Processing
Implement Data Standardization
Column normalization
Data type conversion
Date parsing
Missing value handling
Data Type Handlers
DISPATCHPRICE handler
DISPATCHREGIONSUM handler
Write tests for each handler
Milestone 6: Advanced Processing
Implement Time Series Processing
Resampling and aggregation
Rolling statistics
Time-based filtering
Region-based processing
Tests for time series functions
Implement Predispatch Handlers
PREDISPATCH processing
P5MIN processing
Tests for forecast data handling
Statistical Functions
Price statistics
Demand statistics
Data aggregation utilities
Visualization helpers
Phase 4: Documentation and Examples (Milestone 7)
API Documentation
Set up Sphinx (or other documentation tool)
Document all public functions
Create API reference
Usage Examples
Basic usage examples
Advanced usage examples
Jupyter notebooks with real-world data
Installation and Setup Guide
UV-specific installation instructions
Development setup instructions
Dependency management guide
Phase 5: Quality Assurance and Release (Milestone 8)
Code Quality
Run and fix linting issues (black, isort)
Run and fix type checking issues (mypy)
Ensure 90%+ test coverage
Comprehensive Test Suite
Add edge cases to tests
Implement integration tests
Test on multiple Python versions
Performance Optimization
Profile code for bottlenecks
Optimize data loading and processing
Implement caching improvements
Prepare for Release
Update version number
Finalize README
Create CHANGELOG
Update package metadata
CI/CD Pipeline Setup
Set up GitHub Actions workflows
Include linting, testing, and deployment steps
Automate versioning and release tagging
Release
Build package with UV
Test installation in clean environment
Publish to PyPI
Phase 6: Continuous Improvement (Ongoing)
Monitor and Fix Issues
Address bug reports
Implement feature requests
Expand Supported Data Types
Add support for additional AEMO data types
Enhance processing capabilities
Community Building
Respond to user questions
Review and merge contributions
Update documentation based on user feedback
Development Practices
Throughout the implementation, adhere to these practices:
Dependency Management with UV
Add new dependencies to pyproject.toml
Install with
uv pip install -e ".[dev,docs]"
Periodically update with
uv pip install --upgrade -e ".[dev,docs]"
Create requirements.lock file:
uv pip freeze > requirements.lock
Version Control
Commit frequently with descriptive messages
Use feature branches for development
Create pull requests for code review
Testing
Write tests before implementation (TDD)
Maintain high test coverage
Include edge cases in tests
Documentation
Document code as you write it
Keep README and user guides up-to-date
Include examples for new features
Code Quality
Run black and isort before commits
Use mypy for type checking
Follow PEP 8 guidelines