A Python package for downloading historical data published by the Australian Energy Market Operator (AEMO)
- Download Windows Application (GUI)
- Documentation
- Contributing
- Support NEMOSIS
- Get Updates, Ask Questions
- Using the Python Interface (API)
Choose the exe from the latest release
- Check out the wiki
- View worked examples:
- What data is available and data column definitions
- Watch a video
- Read our paper introducing NEMOSIS
Interested in contributing? Check out the contributing instructions, which also include steps to install nemosis for development.
Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.
Cite our paper in your publications that use data from NEMOSIS.
Join the NEMOSIS forum group.
pip install nemosis
Dynamic tables contain a datetime column that allows NEMOSIS to filter their content by a start and end time.
To learn more about each dynamic table visit the wiki.
You can view the dynamic tables available by printing the NEMOSIS default settings.
from nemosis import defaults
print(defaults.dynamic_tables)
# ['DISPATCHLOAD', 'DUDETAILSUMMARY', 'DUDETAIL', 'DISPATCHCONSTRAINT', 'GENCONDATA', 'DISPATCH_UNIT_SCADA', 'DISPATCHPRICE', . . .
Your workflow may determine how you use NEMOSIS. Because the GUI relies on data being stored as strings (rather than numeric types such as integers or floats), we suggest the following:
- If you are using NEMOSIS' API in your code, or using the same cache for the GUI and API, use dynamic_data_compiler. This will allow your data to be handled by both the GUI and the API. Data read in via the API will be typed, i.e. datetime columns will be a datetime type, numeric columns will be integer/float, etc. See this section.
- If you are using NEMOSIS to cache data in feather or parquet format for use with another application, use cache_compiler. This will ensure that cached feather/parquet files are appropriately typed to make further external processing easier. It will also cache faster as it doesn't prepare a DataFrame for further analysis. See this section.
dynamic_data_compiler can be used to download and compile data from dynamic tables.
from nemosis import dynamic_data_compiler
start_time = '2017/01/01 00:00:00'
end_time = '2017/01/01 00:05:00'
table = 'DISPATCHPRICE'
raw_data_cache = 'C:/Users/your_data_storage'
price_data = dynamic_data_compiler(start_time, end_time, table, raw_data_cache)
Using the default settings of dynamic_data_compiler will download zip archives from AEMO's NEMWeb portal and save them to the raw_data_cache directory. It will then extract each CSV, create a parquet file version (parquet has excellent compression and good interop with downstream tools), and delete the extracted CSV. The downloaded zip is retained by default so that subsequent runs that need to rebuild the parquet (e.g. to add columns or switch format) do not have to re-download from AEMO. Subsequent dynamic_data_compiler calls will check if any data in raw_data_cache matches the query and load it from there. This means that subsequent dynamic_data_compiler calls will be faster so long as the cached data is available. See Caching options below to tune what stays on disk.
A number of options are available to configure filtering (i.e. what data NEMOSIS returns as a pandas DataFrame) and caching.
start_time and end_time accept three forms:
- A string in "YYYY/MM/DD HH:MM:SS" form, e.g. "2017/01/01 00:00:00".
- A datetime.datetime (timezone-naive). Timezone-aware datetimes are rejected because NEMOSIS treats all timestamps as AEMO market time.
- A datetime.date. As start_time it expands to 00:00:00 of that day; as end_time it expands to 00:00:00 of the next day (so end_time=date(2018, 5, 1) is inclusive of all of 2018-05-01).
from datetime import date, datetime
from nemosis import dynamic_data_compiler
# string
data = dynamic_data_compiler('2017/01/01 00:00:00', '2017/01/01 00:05:00', table, raw_data_cache)
# datetime
data = dynamic_data_compiler(datetime(2017, 1, 1, 0, 0), datetime(2017, 1, 1, 0, 5), table, raw_data_cache)
# date (covers all of 2017-01-01)
data = dynamic_data_compiler(date(2017, 1, 1), date(2017, 1, 1), table, raw_data_cache)
The same forms are accepted by cache_compiler.
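The expansion rules for the three accepted forms can be illustrated with a small standard-library sketch. The helper normalise_time below is hypothetical and is not part of NEMOSIS; it only mirrors the behaviour described above, and the ValueError it raises for timezone-aware input is a placeholder, since the exception NEMOSIS actually raises is not stated here.

```python
from datetime import date, datetime, timedelta


def normalise_time(value, is_end=False):
    """Hypothetical helper: convert an accepted input form to a naive datetime."""
    if isinstance(value, str):
        return datetime.strptime(value, "%Y/%m/%d %H:%M:%S")
    # datetime is a subclass of date, so it has to be checked first.
    if isinstance(value, datetime):
        if value.tzinfo is not None:
            # Placeholder exception type (assumption).
            raise ValueError("timezone-aware datetimes are not accepted")
        return value
    if isinstance(value, date):
        # An end date rolls forward to midnight of the following day.
        day = value + timedelta(days=1) if is_end else value
        return datetime(day.year, day.month, day.day)
    raise TypeError(f"unsupported time value: {value!r}")


start = normalise_time(date(2018, 5, 1))
end = normalise_time(date(2018, 5, 1), is_end=True)
print(start, end)
```

Passing the same date as both start and end therefore spans one full day, which is why the date example above covers all of 2017-01-01.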
dynamic_data_compiler can be used to filter data before returning results.
To return only a subset of a particular table's columns, use the select_columns argument.
from nemosis import dynamic_data_compiler
price_data = dynamic_data_compiler(start_time, end_time, table, raw_data_cache,
                                   select_columns=['REGIONID', 'SETTLEMENTDATE', 'RRP'])
To see what columns a table has, you can inspect NEMOSIS' defaults.
from nemosis import defaults
print(defaults.table_columns['DISPATCHPRICE'])
# ['SETTLEMENTDATE', 'REGIONID', 'INTERVENTION', 'RRP', 'RAISE6SECRRP', 'RAISE60SECRRP', 'RAISE5MINRRP', . . .
Data can also be filtered by column value. To do this, provide the columns to filter on (filter_cols) and, for each of those columns, the value or values to keep (filter_values). A column can only be filtered on if it is listed in filter_cols.
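The pairing between filter_cols and filter_values can be sketched in plain Python. This is only an illustration of the idea, not NEMOSIS's implementation: apply_filters is a hypothetical helper, the rows are made-up data, and it assumes that a row is kept only if it passes the filter for every listed column.

```python
def apply_filters(rows, filter_cols, filter_values):
    """Keep rows whose value in each filter column is in the matching list."""
    if len(filter_cols) != len(filter_values):
        raise ValueError("filter_cols and filter_values must have the same length")
    # The i-th entry of filter_values holds the allowed values for the i-th column.
    for col, allowed in zip(filter_cols, filter_values):
        rows = [row for row in rows if row[col] in allowed]
    return rows


# Made-up rows for illustration only.
rows = [
    {"REGIONID": "SA1", "INTERVENTION": 0},
    {"REGIONID": "SA1", "INTERVENTION": 1},
    {"REGIONID": "NSW1", "INTERVENTION": 0},
    {"REGIONID": "VIC1", "INTERVENTION": 0},
]
kept = apply_filters(rows, ["REGIONID", "INTERVENTION"], (["SA1", "NSW1"], [0]))
print(kept)
```

Note that filter_values is a tuple of lists, one list per entry in filter_cols, which is why a single-column filter is written as (['SA1'],).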
In the example below, the table will be filtered to only return rows where REGIONID == 'SA1'.
from nemosis import dynamic_data_compiler
price_data = dynamic_data_compiler(start_time, end_time, table, raw_data_cache, filter_cols=['REGIONID'],
                                   filter_values=(['SA1'],))
Several filters can be applied simultaneously. A common filter is to extract pricing data excluding any physical intervention dispatch runs (INTERVENTION == 0 is the appropriate filter, see here). Below is an example of filtering to get data for Gladstone Unit 1 and Hornsdale Wind Farm 2 excluding any physical intervention dispatch runs:
from nemosis import dynamic_data_compiler
unit_dispatch_data = dynamic_data_compiler(start_time, end_time, 'DISPATCHLOAD', raw_data_cache,
                                           filter_cols=['DUID', 'INTERVENTION'],
                                           filter_values=(['GSTONE1', 'HDWF2'], [0]))
By default dynamic_data_compiler uses fformat='parquet', keep_csv=False, and keep_zip=True. That is:
- The original AEMO CSVs are extracted, converted to parquet, and then deleted (lean cache). Parquet has excellent compression characteristics and good compatibility with packages for handling large in-memory/cluster datasets (e.g. Dask). This helps with local storage (especially for Causer Pays data) and file size for version control.
- The downloaded AEMO archive zips are kept on disk so that rebuilding the cache or switching format later does not require re-downloading from AEMO (see #56).
You can mix and match these two flags to control what is retained:
| keep_csv | keep_zip | What stays in raw_data_cache |
|---|---|---|
| False (default) | True (default) | parquet/feather + zip |
| False | False | parquet/feather only (leanest cache) |
| True | False | parquet/feather + CSV |
| True | True | parquet/feather + CSV + zip (full raw retention) |
For example, to keep neither the CSV nor the zip after the parquet file is written:
price_data = dynamic_data_compiler(start_time, end_time, table, raw_data_cache,
                                   keep_csv=False, keep_zip=False)
To keep the raw AEMO CSVs alongside the parquet files (e.g. for an external tool that consumes the original CSV directly):
price_data = dynamic_data_compiler(start_time, end_time, table, raw_data_cache, keep_csv=True)
If the option fformat='csv' is used then no parquet files will be created, and all caching will be done using CSVs.
price_data = dynamic_data_compiler(start_time, end_time, table, raw_data_cache, fformat='csv')
If the option fformat='feather' is provided then no parquet files will be created, and a feather file will be used instead. Feather may give faster read/write than parquet, at the cost of larger files on disk.
price_data = dynamic_data_compiler(start_time, end_time, table, raw_data_cache, fformat='feather')
keep_csv and keep_zip are also accepted by cache_compiler with the same defaults and meaning.
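To check what a given combination of options has actually left on disk, the cache directory can be summarised by file extension. The sketch below uses only the standard library. summarise_cache is a hypothetical helper, not a NEMOSIS function, and it assumes that cached files carry the usual .parquet, .feather, .csv and .zip extensions. The file names in the demonstration are placeholders.

```python
import tempfile
from collections import Counter
from pathlib import Path


def summarise_cache(cache_dir):
    """Count the files under cache_dir by lowercase file extension."""
    return Counter(
        path.suffix.lower()
        for path in Path(cache_dir).rglob("*")
        if path.is_file()
    )


# Demonstration on a throwaway directory with placeholder file names.
with tempfile.TemporaryDirectory() as demo_cache:
    for name in ("a.parquet", "a.zip", "b.parquet"):
        (Path(demo_cache) / name).touch()
    summary = summarise_cache(demo_cache)

print(dict(summary))
```

In practice the function would be pointed at raw_data_cache; with the default options one would expect parquet and zip files but no CSVs.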
cache_compiler can be used to compile a cache of parquet or feather files (parquet by default; pass fformat='feather' to switch). This may be useful if you're using NEMOSIS to build a data cache, but then process the cache using other packages or applications. It is particularly useful because cache_compiler will infer the data types of the columns before saving to parquet or feather, thereby eliminating the need to type-convert data that is obtained using dynamic_data_compiler.
cache_compiler will not run if it detects the appropriate files in the raw_data_cache directory. Otherwise, it will download zip archives from AEMO, extract the CSVs, convert them to the requested format, and then delete the CSVs. The downloaded zips are kept by default (keep_zip=True); pass keep_zip=False to delete them as well. Unlike dynamic_data_compiler, cache_compiler does not return any data.
The example below downloads parquet data into the cache.
from nemosis import cache_compiler
cache_compiler(start_time, end_time, table, raw_data_cache)
By default NEMOSIS only includes a subset of an AEMO table's columns. The full set of columns is listed in the MMS Data Model Reports, or can be seen by inspecting the CSVs in the raw data cache. Users of the Python interface can add additional columns as shown below. If you are using a feather or parquet based cache, set rebuild=True so that the additional columns are added to the cache files when they are rebuilt. With the default keep_zip=True, the rebuild re-extracts the CSV from the cached zip rather than re-downloading from AEMO. This method of adding additional columns should also work with the cache_compiler function.
from nemosis import dynamic_data_compiler
from nemosis import defaults
defaults.table_columns['BIDPEROFFER_D'] += ['PASAAVAILABILITY']
start_time = '2017/01/01 00:00:00'
end_time = '2017/01/01 00:05:00'
table = 'BIDPEROFFER_D'
raw_data_cache = 'C:/Users/your_data_storage'
volume_bid_data = dynamic_data_compiler(start_time, end_time, table, raw_data_cache, rebuild=True)
Static tables do not include a time column and cannot be filtered by start and end time.
To learn more about each static table visit the wiki.
You can view the static tables available by printing the tables in NEMOSIS' defaults:
from nemosis import defaults
print(defaults.static_tables)
# ['ELEMENTS_FCAS_4_SECOND', 'VARIABLES_FCAS_4_SECOND', 'Generators and Scheduled Loads']
Note: 'FCAS Providers' was previously listed here but has been deprecated. AEMO migrated the underlying data out of the NEM Registration and Exemption List.xlsx workbook to a weekly archive at https://www.nemweb.com.au/REPORTS/CURRENT/ANCILLARY_SERVICES_REPORTS/. See issue #92 for status. Calls to static_table("FCAS Providers", ...) now raise a UserInputError pointing at the new endpoint.
The static_table function can be used to access these tables:
from nemosis import static_table
fcas_variables = static_table('VARIABLES_FCAS_4_SECOND', raw_data_cache)
NEMOSIS uses the Python logging module to print messages to the console. If desired, these messages can be suppressed after imports, as shown below, so that only messages at warning level or above are shown.
import logging
from nemosis import dynamic_data_compiler
logging.getLogger("nemosis").setLevel(logging.WARNING)
- Data for the BIDPEROFFER_D and BIDDAYOFFER_D tables is known to be missing between March 2021 and July 2024.
- For FCAS_4_SECOND, only approximately the most recent two months of data and historical data between 2011 and 2016 are available.
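A query window can be checked against the bid data gap before requesting data. The following is a rough sketch rather than anything NEMOSIS provides: overlaps_known_gap and KNOWN_GAPS are hypothetical names, and because only months are given above, the gap is assumed to run from the start of March 2021 to the end of July 2024.

```python
from datetime import datetime

# Assumed boundaries: only months are documented, so whole months are used here.
KNOWN_GAPS = {
    "BIDPEROFFER_D": (datetime(2021, 3, 1), datetime(2024, 8, 1)),
    "BIDDAYOFFER_D": (datetime(2021, 3, 1), datetime(2024, 8, 1)),
}


def overlaps_known_gap(table, start, end):
    """Return True if the window [start, end) overlaps a listed gap for the table."""
    gap = KNOWN_GAPS.get(table)
    if gap is None:
        return False
    gap_start, gap_end = gap
    # Two half-open intervals overlap when each starts before the other ends.
    return start < gap_end and end > gap_start


in_gap = overlaps_known_gap("BIDPEROFFER_D", datetime(2022, 1, 1), datetime(2022, 1, 2))
before_gap = overlaps_known_gap("BIDPEROFFER_D", datetime(2017, 1, 1), datetime(2017, 1, 2))
print(in_gap, before_gap)
```

A check like this could be used to log a warning when a request is likely to return incomplete data.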