NEMDataTools Project Board

This guide outlines the project structure and implementation plan for the NEMDataTools Python package. The project is designed to provide a comprehensive toolkit for accessing, processing, and analyzing Australian Energy Market Operator (AEMO) data.

Status Update: This project has been successfully completed and is production-ready. The implementation exceeded the original scope with comprehensive functionality, testing, and documentation.

Note: This guide provides the original structure and setup plan. Actual implementation details may vary based on requirements discovered during development.

Phase 1: Project Setup (Milestones 1-2) ✅ COMPLETED

Milestone 1: Development Environment Setup ✅ COMPLETED

Install UV
- pip install uv or use the recommended curl installation method
- Configure UV with a .uv.toml file

Create Project Structure

Initialize git repository

Create directory structure following the src layout:

nemdatatools/
├── .github/workflows/
├── docs/
├── src/nemdatatools/
├── tests/
├── .gitignore
├── LICENSE
├── pyproject.toml
├── README.md
├── .uv.toml

Configure Packaging
- Create pyproject.toml with minimal dependencies
- Create virtual environment: uv venv
- Activate the environment: source .venv/bin/activate
- Install in development mode: uv pip install -e ".[dev]"

Milestone 2: Core Module Skeletons ✅ COMPLETED

Implement Basic Module Structure ✅ COMPLETED
- Create __init__.py with version info
- Add full implementations for:
  - downloader.py - comprehensive HTTP handling, retry logic
  - cache.py - metadata-based intelligent caching
  - timeutils.py - AEST timezone, dispatch intervals
  - processor.py - comprehensive data standardization functions
  - batch_commands.py - parallel download operations
  - mmsdm_helper.py - MMSDM-specific utilities
  - data_source.py - configuration for multiple data types
Set Up Testing Framework ✅ COMPLETED
- Configured pytest with coverage
- Comprehensive test files for all modules
- GitHub Actions CI/CD workflows with automated testing

Phase 2: Core Functionality Implementation (Milestones 3-4)

Milestone 3: Time Utilities and Cache Management

Implement Time Utilities
- Date parsing and formatting
- Interval generation
- Time period handling for AEMO data types
- Forecast horizon calculations
- Write comprehensive tests
Implement Cache Management
- Cache directory structure
- Metadata tracking
- Cache lookup and retrieval
- Cache invalidation
- Write tests for caching logic

Milestone 4: Data Downloading

Map AEMO Data Sources
- Identify public endpoints for each data type
- Create URL templates
- Implement data source mapping
Implement Downloader
- HTTP request handling with retries
- Error handling
- Authentication (if needed)
- Input validation
- Integration with cache
- Batch downloading for multiple regions
Test Downloading
- Create mock responses for testing
- Implement end-to-end tests
- Verify cache integration

Phase 3: Data Processing (Milestones 5-6)

Milestone 5: Basic Data Processing

Implement Data Standardization
- Column normalization
- Data type conversion
- Date parsing
- Missing value handling
Data Type Handlers
- DISPATCHPRICE handler
- DISPATCHREGIONSUM handler
- Write tests for each handler

Milestone 6: Advanced Processing

Implement Time Series Processing
- Resampling and aggregation
- Rolling statistics
- Time-based filtering
- Region-based processing
- Tests for time series functions
Implement Predispatch Handlers
- PREDISPATCH processing
- P5MIN processing
- Tests for forecast data handling
Statistical Functions

Price statistics
Demand statistics
Data aggregation utilities
Visualization helpers

Phase 4: Documentation and Examples (Milestone 7)

API Documentation
- Set up Sphinx (or other documentation tool)
- Document all public functions
- Create API reference
Usage Examples
- Basic usage examples
- Advanced usage examples
- Jupyter notebooks with real-world data
Installation and Setup Guide
- UV-specific installation instructions
- Development setup instructions
- Dependency management guide

Phase 5: Quality Assurance and Release (Milestone 8)

Code Quality
- Run and fix linting issues (black, isort)
- Run and fix type checking issues (mypy)
- Ensure 90%+ test coverage
Comprehensive Test Suite
- Add edge cases to tests
- Implement integration tests
- Test on multiple Python versions
Performance Optimization
- Profile code for bottlenecks
- Optimize data loading and processing
- Implement caching improvements
Prepare for Release
- Update version number
- Finalize README
- Create CHANGELOG
- Update package metadata
CI/CD Pipeline Setup
- Set up GitHub Actions workflows
- Include linting, testing, and deployment steps
- Automate versioning and release tagging
Release
- Build package with UV
- Test installation in clean environment
- Publish to PyPI

Phase 6: Continuous Improvement (Ongoing)

Monitor and Fix Issues
- Address bug reports
- Implement feature requests
Expand Supported Data Types
- Add support for additional AEMO data types
- Enhance processing capabilities
Community Building
- Respond to user questions
- Review and merge contributions
- Update documentation based on user feedback

Development Practices

Throughout the implementation, adhere to these practices:

Dependency Management with UV
- Add new dependencies to pyproject.toml
- Install with uv pip install -e ".[dev,docs]"
- Periodically update with uv pip install --upgrade -e ".[dev,docs]"
- Create requirements.lock file: uv pip freeze > requirements.lock
Version Control
- Commit frequently with descriptive messages
- Use feature branches for development
- Create pull requests for code review
Testing
- Write tests before implementation (TDD)
- Maintain high test coverage
- Include edge cases in tests
Documentation
- Document code as you write it
- Keep README and user guides up-to-date
- Include examples for new features
Code Quality
- Run black and isort before commits
- Use mypy for type checking
- Follow PEP 8 guidelines