Meeting Documents
PO.DAAC’s First Batch Release of Virtual Datasets for Rapid Access and Analysis
Presented at: AGU Annual Meeting 2025
Abstract
Technology for creating virtual datasets (VDSs) has matured to the point where NASA DAACs (Distributed Active Archive Centers) can begin to utilize it, giving users significantly easier and faster access to entire datasets (hundreds of GB to TB) in seconds, from a few lines of code, even on a modest machine such as a laptop. A VDS is a single sidecar reference file, typically a few MB to tens of MB in size, that acts as a table of contents to byte offsets/chunks across all files in a dataset. Major benefits include: (1) users can interact with the data as if they were all part of a single analysis-ready dataset object, rather than thousands of files; (2) users do not need to be in the cloud to access data via this method, although performance is enhanced there; and (3) the method does not require extensive learning.
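The "table of contents" idea can be illustrated with a toy sketch. The layout below (a JSON mapping from a chunk key to a file path, byte offset, and length) is only an assumption for illustration and is not claimed to be the format PO.DAAC publishes; the file names, the `HEADER` marker, and the helpers `write_granule` and `read_chunk` are hypothetical.

```python
import json
import os
import tempfile

# Stand-in for per-file metadata that precedes the data chunk (hypothetical).
HEADER = b"HDR:"


def write_granule(path, payload):
    """Write a toy data file and return the (offset, length) of its payload."""
    with open(path, "wb") as f:
        f.write(HEADER + payload)
    return len(HEADER), len(payload)


def read_chunk(refs, key):
    """Fetch one chunk using only the reference table: seek, then read."""
    path, offset, length = refs[key]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Build two toy "granules" and record where each chunk lives.
workdir = tempfile.mkdtemp()
refs = {}
for i, payload in enumerate([b"day-0-values", b"day-1-values"]):
    path = os.path.join(workdir, f"granule_{i}.bin")
    offset, length = write_granule(path, payload)
    refs[f"sst/{i}"] = [path, offset, length]

# The sidecar file is just the reference table, stored separately from the data.
sidecar = os.path.join(workdir, "dataset_refs.json")
with open(sidecar, "w") as f:
    json.dump(refs, f)

# A reader needs only the sidecar to locate any chunk in any file.
with open(sidecar) as f:
    loaded = json.load(f)

second_chunk = read_chunk(loaded, "sst/1")
```

Because the reader touches only the requested byte range, the cost of opening the dataset depends on the size of the reference file rather than on the size of the underlying data files.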
PO.DAAC (Physical Oceanography DAAC) has created and published its first batch of VDSs. These include Level 3/4 datasets for sea surface temperature (SST) from the OSTIA project, ocean winds (CCMP project), sea surface salinity (SMAP project), and model data (ECCO). We have also added an overview page and example notebooks to our Cookbook site to introduce users to this new method of accessing data. For example, the entire OSTIA SST dataset (~15k files totaling ~11 TB) can be lazily loaded with the xarray package in 10-20 seconds, and subsets of the data can immediately be loaded and used in computations. We present benchmarking results showing that computing a regional mean over a 10 x 10 degree window of the globe across the 40-year period takes about 20 minutes using the VDSs, whereas it would take days or weeks using traditional methods. This holds for both in-cloud and out-of-cloud usage. We present similarly significant results for the wind and model datasets.
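A regional-mean computation of the kind benchmarked above might look like the following minimal sketch. It runs on a small synthetic array rather than on a real VDS, so it shows only the subsetting and averaging step, not the lazy loading itself. The variable name `analysed_sst`, the coordinate names `lat` and `lon`, the 1-degree grid, the chosen window, and the cosine-latitude area weighting are all assumptions made for illustration; `regional_mean` is a hypothetical helper.

```python
import numpy as np
import xarray as xr


def regional_mean(sst, lat_min, lat_max, lon_min, lon_max):
    """Area-weighted mean of `sst` over a lat/lon box, one value per time step."""
    box = sst.sel(lat=slice(lat_min, lat_max), lon=slice(lon_min, lon_max))
    # Grid cells shrink toward the poles, so weight each row by cos(latitude).
    weights = np.cos(np.deg2rad(box.lat))
    return box.weighted(weights).mean(dim=("lat", "lon"))


# Synthetic stand-in for an SST dataset: 4 time steps on a 1-degree global grid.
time = np.arange(4)
lat = np.arange(-89.5, 90.0, 1.0)
lon = np.arange(-179.5, 180.0, 1.0)
# Spatially uniform field that warms by 1 K per time step (made-up values).
data = np.full((time.size, lat.size, lon.size), 290.0) + time[:, None, None]
sst = xr.DataArray(
    data,
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="analysed_sst",
)

# Mean over a 10 x 10 degree window, giving a time series of length 4.
series = regional_mean(sst, 10, 20, -100, -90)
```

With a lazily loaded dataset, the same `sel` call would restrict the computation to the chunks that overlap the window, which is why only a small fraction of the full archive would need to be read.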
Additionally, we will summarize two examples in which the VDSs are used for more extensive analyses. One example produces a 10-year time series analysis of sea surface temperature versus ocean wind speed, which depicts the anti-correlation signal between these two parameters in the Gulf of Tehuantepec, a region of strong upwelling. The other investigates the correlative characteristics of the Indian Ocean Dipole and how the east and west regions behave differently, mirroring Figure 2a,b in the paper by Wang et al., 2016 (https://doi.org/10.1007/s00382-016-2977-y).