<p dir="ltr">This dataset contains dayside aurora observations derived from measurements by the Global-scale Observations of the Limb and Disk (GOLD) mission. The source data are the Level 1C DAY (L1C DAY) products, available on the <a href="https://gold.cs.ucf.edu/" rel="noopener" target="_new">official GOLD website</a> and at NASA’s <a href="https://spdf.gsfc.nasa.gov/pub/data/gold/" rel="noreferrer" target="_blank">Space Physics Data Facility (SPDF)</a>.</p><p dir="ltr"><br></p><p dir="ltr">GOLD is on board a geostationary satellite at 47.5°W longitude. Because of its viewing geometry and the positions of the geomagnetic poles, the southern aurora is generally not visible in the L1C DAY files. Consequently, this dataset contains products for the northern aurora only.</p><p dir="ltr">The dataset includes three main products:</p><ol><li>Raw emissions of auroral species from the L1C DAY files.</li><li>Dayglow estimates: the non-auroral emissions (light pollution) that contaminate the images.</li><li>Binary masks: estimated auroral locations in the images.</li></ol><p dir="ltr">Subtracting (2) from (1) and applying (3) gives the dayside aurora estimate with the estimated dayglow removed.</p><p dir="ltr">In total, the dataset consists of over 47,000 image-label pairs spanning October 2018 to June 2025, making it one of the largest publicly available dayside aurora datasets to date.</p><p dir="ltr">The code used to generate this dataset is available at this <a href="https://github.com/jah-26603/dayside_aurora_gold" rel="noreferrer" target="_blank">link</a> and is also included above for download.</p>
aurora_products.zip [folder]: This contains all measurements of the dayside aurora.
|----> [2018]
|----> [2019]
       |----> 001.nc
       |----> 002.nc
       |----> ...
       |----> 365.nc
|----> [2020]
|----> ...
|----> [2025]
Contains the GOLD-derived measurements that make up this dataset, from October 2018 to June 2025.
Year Folders: Each folder corresponds to a calendar year and contains the reduced daily measurement files.
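As a minimal sketch of how the year/day layout above could be navigated, the snippet below builds the expected path of one daily file from a calendar date. It assumes that year folders are named with the four-digit year and that day files are named with the zero-padded day of year, as the listing suggests. The function name day_file_path and the extraction directory aurora_products are hypothetical and are not part of the dataset or the code repository.

```python
from datetime import date
from pathlib import Path


def day_file_path(root: Path, day: date) -> Path:
    """Return the expected path of the daily file for `day`.

    Assumes the layout <root>/<YYYY>/<DDD>.nc, where DDD is the
    zero-padded day of year (an assumption based on the listing).
    """
    day_of_year = day.timetuple().tm_yday
    return root / f"{day.year}" / f"{day_of_year:03d}.nc"


# Hypothetical directory where aurora_products.zip was extracted.
root = Path("aurora_products")
example = day_file_path(root, date(2019, 2, 1))
print(example.as_posix())  # aurora_products/2019/032.nc
```

Not every day necessarily has a file, so checking example.exists() before opening it is a sensible precaution.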
Day Files (.nc): These files hold the main contents of the dataset. Each file is derived from an entire day of GOLD Level 1C DAY (L1C DAY) products. Each product is stored as an array of shape (N_scans x 52 rows x 92 columns), where N_scans is the number of scans performed on that day. Because the instrument is on board a geostationary satellite, the geographic latitude and longitude of each pixel are constant throughout the mission. Each file contains:
- Raw scans from the GOLD files for three species: 135.6 nm, 149.3 nm, and LBH emissions.
- Dayglow estimates: the non-auroral emissions (light pollution) that contaminate the images.
- Binary masks: estimated auroral locations in the images.
- Geographic Latitude & Longitude.
- Universal Time.
- Magnetic Latitude & Local Time (apex quasi-dipole).
- Solar Zenith Angle & Emission Angle.
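The dayside aurora estimate (raw emission minus dayglow estimate, restricted to the binary mask) can be sketched with NumPy as follows. This is an illustration rather than the authors' implementation: the function name dayside_aurora is hypothetical, the arrays are synthetic stand-ins for fields that would be read from a day file, and setting pixels outside the mask to NaN is an assumed convention. The variable names stored in the .nc files are not given here, so none are assumed.

```python
import numpy as np


def dayside_aurora(raw, dayglow, mask):
    """Subtract the dayglow estimate from the raw emission and keep masked pixels.

    All inputs are assumed to share the shape (N_scans, 52, 92).
    Pixels outside the mask are set to NaN (an illustrative choice).
    """
    raw = np.asarray(raw, dtype=float)
    dayglow = np.asarray(dayglow, dtype=float)
    mask = np.asarray(mask).astype(bool)
    if not (raw.shape == dayglow.shape == mask.shape):
        raise ValueError("raw, dayglow, and mask must have the same shape")
    return np.where(mask, raw - dayglow, np.nan)


# Synthetic stand-ins for arrays read from one day file (2 scans).
rng = np.random.default_rng(0)
shape = (2, 52, 92)
dayglow = rng.uniform(50.0, 100.0, size=shape)
mask = np.zeros(shape, dtype=bool)
mask[:, :5, :] = True  # pretend the aurora occupies the first 5 rows
raw = dayglow + np.where(mask, 30.0, 0.0)  # aurora adds 30 units on top of dayglow

aurora = dayside_aurora(raw, dayglow, mask)
print(aurora.shape)
print(int(np.isfinite(aurora).sum()))  # number of pixels inside the mask
```

Using NaN outside the mask keeps non-auroral pixels out of later averages when functions such as np.nanmean are used. Multiplying by the mask instead would set those pixels to zero.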
Model_Weights
The folder model_weights/ contains pretrained UNet models used to generate:
best_model_seg: Binary mask generation (segmentation).
best_model_reg: Dayglow estimation (regression).
These weights are downloaded automatically when running the code repository. No manual download is required.
Code (zip): Contains all the code used to generate the reduced dataset. High-level overview of the files and subdirectories:
- data_process.py: function that produces the data products for one full day from the raw scans.
- main.py: calls data_process.py in a loop.
- model_compare_stats.py: compares algorithm outputs against an external auroral precipitation model.
- functions [folder]: contains all of the functions developed and called in data_process.py
- deep_learning [folder]: contains all of the necessary code for data allocation and training the UNet models from scratch.
- model_comparisons [folder]: contains the .csv files needed to compare predictions against the Zhang-Paxton model.
- download_L1C_files [folder]: downloads all necessary GOLD L1C DAY files automatically.