This example demonstrates how to open NEMO files and make them compliant with xgcm.
NEMO files consist of two different types of files: 1) the domain files containing information on the grid (domain_cfg and mesh_mask files),
and 2) the nemo files containing the outputted variables (usually, the filenames are similar to XXX_01234_01234_grid_X.nc).
To create the xgcm.Grid, most of the information is located in the domain files. It is thus necessary to open both the domain and the nemo files.
These files can either be opened in two different Datasets, or combined into a single one. Both options are demonstrated in this example.
Start by importing the functions
from pathlib import Path
from os import listdir
from xnemogcm import open_domain_cfg, open_nemo, process_nemo, open_namelist, open_nemo_and_domain_cfg
from xnemogcm import __version__ as xnemogcm_version
xnemogcm_version
'0.5.0.post2'
help(open_domain_cfg)
Help on function open_domain_cfg in module xnemogcm.domcfg:
open_domain_cfg(datadir=None, files=None, add_coordinates=True)
Return a dataset containing all dataarrays of the domain_cfg*.nc / mesh_mask files.
For that, open and merge all the datasets.
The dataset is compatible with xgcm, the corresponding grid
can be create through: xgcm.Grid(domcfg)
Parameters
----------
datadir : string or pathlib.Path or None
The directory containing the 'domain_cfg' or 'mesh_mask' files
files : list or iterator or None
list of the file names that correspond to the domain_cfg and/or mesh_mask files,
e.g. 'files=Path('path/to/data').glob('*my_domcfg*.nc') if your domain_cfg files are called
'something_my_domcfg_00.nc' and 'something_my_domcfg_01.nc'
add_coordinates : bool
Whether to add the 'glamt', 'gphit', etc as coordinates of the dataset
Returns
-------
domcfg : xarray.Dataset
The domain configuration dataset, can be read by xgcm.
You can provide the file names / folder using 3 similar methods:
- Give the path to the files and xnemogcm opens the domain_cfg_out and/or mesh_mesk files
- Give the path to the data folder + the name of the files
- Give the name of the files that already contain the tree (e.g. ['/path/to/file1', '/path/to/file2']
These 3 methods are equivalent, however if your domain files don't have the standard names you need to provide them by hand.
We use one of the test folder:
datadir = Path('../../xnemogcm/test/data/4.2.0/open_and_merge/')
print(listdir(datadir))
['GYRE_1y_00010101_00011230_grid_T.nc', 'GYRE_1y_00010101_00011230_grid_U.nc', 'GYRE_1y_00010101_00011230_grid_V.nc', 'GYRE_1y_00010101_00011230_grid_W.nc', 'mesh_mask.nc']
domcfg = open_domain_cfg(datadir=datadir)
# or
domcfg = open_domain_cfg(datadir=datadir, files=['mesh_mask.nc'])
# or
domcfg = open_domain_cfg(files=datadir.glob('*mesh_mask*.nc'))
domcfg
<xarray.Dataset> Size: 324kB
Dimensions: (z_c: 4, y_c: 22, x_c: 32, x_f: 32, y_f: 22, z_f: 4)
Coordinates: (12/18)
glamt (y_c, x_c) float64 6kB dask.array<chunksize=(22, 32), meta=np.ndarray>
glamu (y_c, x_f) float64 6kB dask.array<chunksize=(22, 32), meta=np.ndarray>
glamv (y_f, x_c) float64 6kB dask.array<chunksize=(22, 32), meta=np.ndarray>
glamf (y_f, x_f) float64 6kB dask.array<chunksize=(22, 32), meta=np.ndarray>
gphit (y_c, x_c) float64 6kB dask.array<chunksize=(22, 32), meta=np.ndarray>
gphiu (y_c, x_f) float64 6kB dask.array<chunksize=(22, 32), meta=np.ndarray>
... ...
* x_f (x_f) float64 256B 0.5 1.5 2.5 3.5 4.5 ... 28.5 29.5 30.5 31.5
* y_f (y_f) float64 176B 0.5 1.5 2.5 3.5 4.5 ... 18.5 19.5 20.5 21.5
* z_f (z_f) float64 32B -0.5 0.5 1.5 2.5
* x_c (x_c) int64 256B 0 1 2 3 4 5 6 7 8 ... 23 24 25 26 27 28 29 30 31
* y_c (y_c) int64 176B 0 1 2 3 4 5 6 7 8 ... 13 14 15 16 17 18 19 20 21
* z_c (z_c) int64 32B 0 1 2 3
Data variables: (12/28)
tmask (z_c, y_c, x_c) int8 3kB dask.array<chunksize=(4, 22, 32), meta=np.ndarray>
umask (z_c, y_c, x_f) int8 3kB dask.array<chunksize=(4, 22, 32), meta=np.ndarray>
vmask (z_c, y_f, x_c) int8 3kB dask.array<chunksize=(4, 22, 32), meta=np.ndarray>
fmask (z_c, y_f, x_f) int8 3kB dask.array<chunksize=(4, 22, 32), meta=np.ndarray>
tmaskutil (y_c, x_c) int8 704B dask.array<chunksize=(22, 32), meta=np.ndarray>
umaskutil (y_c, x_f) int8 704B dask.array<chunksize=(22, 32), meta=np.ndarray>
... ...
e3u_0 (z_c, y_c, x_f) float64 23kB dask.array<chunksize=(4, 22, 32), meta=np.ndarray>
e3v_0 (z_c, y_f, x_c) float64 23kB dask.array<chunksize=(4, 22, 32), meta=np.ndarray>
e3f_0 (z_c, y_f, x_f) float64 23kB dask.array<chunksize=(4, 22, 32), meta=np.ndarray>
e3w_0 (z_f, y_c, x_c) float64 23kB dask.array<chunksize=(4, 22, 32), meta=np.ndarray>
e3uw_0 (z_f, y_c, x_f) float64 23kB dask.array<chunksize=(4, 22, 32), meta=np.ndarray>
e3vw_0 (z_f, y_f, x_c) float64 23kB dask.array<chunksize=(4, 22, 32), meta=np.ndarray>
Attributes: (12/13)
DOMAIN_dimensions_ids: [1 2]
DOMAIN_size_global: [32 22]
DOMAIN_halo_size_start: [0 0]
DOMAIN_halo_size_end: [0 0]
DOMAIN_type: BOX
CfgName: GYRE
... ...
Iperio: 0
Jperio: 0
NFold: 0
NFtype: -
VertCoord: zco
IsfCav: 0help(open_nemo)
Help on function open_nemo in module xnemogcm.nemo:
open_nemo(domcfg, datadir=None, files=None, chunks=None, parallel=False, **kwargs_open)
Open nemo dataset, and rename the coordinates to be conform to xgcm.Grid
The filenames must finish with 'grid_X.nc', with X in
['T', 'U', 'V', 'W', 'UW', 'VW', 'FW']
*OR*
the global attribute 'description' of each individual file must
be 'ocean X grid variables' with X in ['T', 'U', ...]
Parameters
----------
datadir : string or pathlib.Path
The directory containing the nemo files
domcfg : xarray.Dataset
the domcfg dataset, e.g. opened with xnemogcm.open_domain_cfg
files : list, optional
List of the files to open
chunks : dict
The chunks to use when opening the files,
e.g. chunks={'time_counter':10}
/! chunks need to be provided with the old names of dimensions
i.e. 'time_counter', 'x', etc
For more complex chunking, you may want to open without any chunks and set them up afterward.
kwargs_open : any other argument given to the xarray.open_dataset function
Returns
-------
nemo_ds : xarray.Dataset
Dataset containing all outputted variables, set on the proper
grid points (center, face, etc).
We can provide the files folder / name following the same convention as for the open_domain_cfg function. We also need to provide the domcfg dataset so xnemogcm knows how to set the variables on the proper grid position. We can also provide extra kwargs to the underlying call to xarray.open_mfdataset function.
nemo = open_nemo(domcfg=domcfg, datadir=datadir)
# or
nemo = open_nemo(domcfg=domcfg, files=datadir.glob('*grid*.nc'))
# or, using attributes from dataset and not name
datadir2 = Path('../../xnemogcm/test/data/4.2.0/nemo_no_grid_in_filename/')
nemo = open_nemo(
domcfg=domcfg, files=[
datadir2 / 'T.nc',
datadir2 / 'U.nc',
datadir2 / 'V.nc',
datadir2 / 'W.nc'
]
)
nemo
<xarray.Dataset> Size: 181kB
Dimensions: (z_c: 4, axis_nbounds: 2, t: 1, y_c: 22, x_c: 32,
x_f: 32, y_f: 22, z_f: 4)
Coordinates: (12/18)
time_centered (t) object 8B dask.array<chunksize=(1,), meta=np.ndarray>
* t (t) object 8B 0001-07-01 00:00:00
* x_c (x_c) int64 256B 0 1 2 3 4 5 6 ... 26 27 28 29 30 31
* y_c (y_c) int64 176B 0 1 2 3 4 5 6 ... 16 17 18 19 20 21
gdept_1d (z_c) float64 32B dask.array<chunksize=(4,), meta=np.ndarray>
* z_c (z_c) int64 32B 0 1 2 3
... ...
* y_f (y_f) float64 176B 0.5 1.5 2.5 3.5 ... 19.5 20.5 21.5
glamv (y_f, x_c) float64 6kB dask.array<chunksize=(22, 32), meta=np.ndarray>
gphiv (y_f, x_c) float64 6kB dask.array<chunksize=(22, 32), meta=np.ndarray>
gdepw_1d (z_f) float64 32B dask.array<chunksize=(4,), meta=np.ndarray>
* z_f (z_f) float64 32B -0.5 0.5 1.5 2.5
gdepw_0 (z_f, y_c, x_c) float64 23kB dask.array<chunksize=(4, 22, 32), meta=np.ndarray>
Dimensions without coordinates: axis_nbounds
Data variables: (12/15)
deptht_bounds (z_c, axis_nbounds) float32 32B dask.array<chunksize=(4, 2), meta=np.ndarray>
time_centered_bounds (t, axis_nbounds) object 16B dask.array<chunksize=(1, 2), meta=np.ndarray>
t_bounds (t, axis_nbounds) object 16B dask.array<chunksize=(1, 2), meta=np.ndarray>
toce (t, z_c, y_c, x_c) float32 11kB dask.array<chunksize=(1, 4, 22, 32), meta=np.ndarray>
soce (t, z_c, y_c, x_c) float32 11kB dask.array<chunksize=(1, 4, 22, 32), meta=np.ndarray>
e3t (t, z_c, y_c, x_c) float32 11kB dask.array<chunksize=(1, 4, 22, 32), meta=np.ndarray>
... ...
depthv_bounds (z_c, axis_nbounds) float32 32B dask.array<chunksize=(4, 2), meta=np.ndarray>
voce (t, z_c, y_f, x_c) float32 11kB dask.array<chunksize=(1, 4, 22, 32), meta=np.ndarray>
e3v (t, z_c, y_f, x_c) float32 11kB dask.array<chunksize=(1, 4, 22, 32), meta=np.ndarray>
depthw_bounds (z_f, axis_nbounds) float32 32B dask.array<chunksize=(4, 2), meta=np.ndarray>
woce (t, z_f, y_c, x_c) float32 11kB dask.array<chunksize=(1, 4, 22, 32), meta=np.ndarray>
e3w (t, z_f, y_c, x_c) float32 11kB dask.array<chunksize=(1, 4, 22, 32), meta=np.ndarray>
Attributes:
Conventions: CF-1.6
timeStamp: 2023-Mar-28 10:42:16 GMT
name: NEMO dataset
description: Ocean grid variables, set on the proper positions
title: Ocean grid variablesprocess_nemo¶
help(process_nemo)
Help on function process_nemo in module xnemogcm.nemo:
process_nemo(positions, domcfg, parallel=False)
Process datasets from NEMO outputs and set coordinates and attributes.
Parameters
----------
positions : list of tuples
[(ds1, 'X'), (ds2, 'Y'), (ds3, 'Z'), etc]
Here 'X', 'Y', 'Z' must me the proper positions
e.g. in ['T', 'U', 'V', 'W', 'UW', 'VW', 'FW']
*OR*
can be set to None. If None, then the corresponding dataset(s)
must have the global attribute 'description' with value
'ocean X grid variables' with X in ['T', 'U', ...]
domcfg : xarray.Dataset
the domcfg dataset
parallel : bool, default False
whether to use dask.delayed to process tasks in parallel
Returns
-------
nemo_ds : xarray.Dataset
Dataset containing all outputted variables, set on the proper
grid points (center, face, etc).
import xarray as xr
datadir2 = Path('../../xnemogcm/test/data/4.2.0/nemo_no_grid_in_filename/')
nemo = process_nemo(
positions=[
(xr.open_dataset(datadir2 / 'T.nc'), 'T'),
(xr.open_dataset(datadir2 / 'U.nc'), 'U'),
(xr.open_dataset(datadir2 / 'V.nc'), 'V'),
(xr.open_dataset(datadir2 / 'W.nc'), 'W')
],
domcfg=domcfg
)
# or, if the datasets contain the attribute 'description'
nemo = process_nemo(
positions=[
(xr.open_dataset(datadir2 / 'T.nc'), None),
(xr.open_dataset(datadir2 / 'U.nc'), None),
(xr.open_dataset(datadir2 / 'V.nc'), None),
(xr.open_dataset(datadir2 / 'W.nc'), None)
],
domcfg=domcfg
)
Open both at once¶
It is possible to open the domain and nemo output at once in one unique dataset. What happens is that 2 datasets are created and then merged. Thus all option possible for the open_nemo and open_domain_cfg functions are still possible.
help(open_nemo_and_domain_cfg)
Help on function open_nemo_and_domain_cfg in module xnemogcm.merge:
open_nemo_and_domain_cfg(nemo_files=None, domcfg_files=None, nemo_kwargs=None, domcfg_kwargs=None, linear_free_surface=False)
Open nemo_ds and domcfg with open_nemo and open_domain_cfg and merge them with _merge_nemo_and_domain_cfg.
See the respective functions docstrings for more details.
2 methods are available for nemo files and domain_cfg/mesh_mask files: 1) provide a list of the files
you want to open, 2) provide the path of the directories containing the files and xnemogcm will try
to open as much files as it can.
Arguments
---------
nemo_files : Optional, list / generator or string / Path
1) list / generator containing the nemo output files, or
2) string / Path of the directory containing the nemo output files.
Will open all files containing "grid_X" in their name, "X" being "T", "U", "V", "W", "F", etc
domcfg_files : Optional, list / generator or string / Path
1) list / generator containing the domain_cfg / mesh_mask files, or
2) string / Path of the directory containing the domain_cfg / mesh_mask output files.
Will open all files containing "domain_cfg" or "mesh_mask" in their name.
nemo_kwargs : dict
dict containing the parameters of the xnemogcm.open_nemo function
Can contain the files and/or datadir arguments of the open_nemo function
e.g. {'chunks':{'time_counter':10}}
domcfg_kwargs : dict
dict containing the parameters of the xnemogcm.open_domain_cfg function
Can contain the files and/or datadir arguments of the open_domain_cfg function
linear_free_surface : bool
True if linear free surface is used. Used by xnemogcm._merge_nemo_and_domain_cfg function
Again, multiple equivalent arguments are possible to open the data
# the simplest for simple cases, provide the path
ds = open_nemo_and_domain_cfg(nemo_files=datadir, domcfg_files=datadir)
# or provide the files
ds = open_nemo_and_domain_cfg(nemo_files=datadir.glob('*grid*.nc'), domcfg_files=datadir.glob('*mesh*.nc'))
# or use the nemo_kwargs and domcfg_kwargs dictionaries
ds = open_nemo_and_domain_cfg(nemo_kwargs=dict(datadir=datadir), domcfg_kwargs={'files':datadir.glob('*mesh*.nc')})
ds
<xarray.Dataset> Size: 426kB
Dimensions: (z_c: 4, axis_nbounds: 2, t: 1, y_c: 22, x_c: 32,
x_f: 32, y_f: 22, z_f: 4)
Coordinates: (12/20)
time_centered (t) object 8B dask.array<chunksize=(1,), meta=np.ndarray>
* t (t) object 8B 0001-07-01 00:00:00
* x_c (x_c) int64 256B 0 1 2 3 4 5 6 ... 26 27 28 29 30 31
* y_c (y_c) int64 176B 0 1 2 3 4 5 6 ... 16 17 18 19 20 21
gdept_1d (z_c) float64 32B dask.array<chunksize=(4,), meta=np.ndarray>
* z_c (z_c) int64 32B 0 1 2 3
... ...
gphiv (y_f, x_c) float64 6kB dask.array<chunksize=(22, 32), meta=np.ndarray>
gdepw_1d (z_f) float64 32B dask.array<chunksize=(4,), meta=np.ndarray>
* z_f (z_f) float64 32B -0.5 0.5 1.5 2.5
gdepw_0 (z_f, y_c, x_c) float64 23kB dask.array<chunksize=(4, 22, 32), meta=np.ndarray>
glamf (y_f, x_f) float64 6kB dask.array<chunksize=(22, 32), meta=np.ndarray>
gphif (y_f, x_f) float64 6kB dask.array<chunksize=(22, 32), meta=np.ndarray>
Dimensions without coordinates: axis_nbounds
Data variables: (12/43)
deptht_bounds (z_c, axis_nbounds) float32 32B dask.array<chunksize=(4, 2), meta=np.ndarray>
time_centered_bounds (t, axis_nbounds) object 16B dask.array<chunksize=(1, 2), meta=np.ndarray>
t_bounds (t, axis_nbounds) object 16B dask.array<chunksize=(1, 2), meta=np.ndarray>
toce (t, z_c, y_c, x_c) float32 11kB dask.array<chunksize=(1, 4, 22, 32), meta=np.ndarray>
soce (t, z_c, y_c, x_c) float32 11kB dask.array<chunksize=(1, 4, 22, 32), meta=np.ndarray>
e3t (t, z_c, y_c, x_c) float32 11kB dask.array<chunksize=(1, 4, 22, 32), meta=np.ndarray>
... ...
e3u_0 (z_c, y_c, x_f) float64 23kB dask.array<chunksize=(4, 22, 32), meta=np.ndarray>
e3v_0 (z_c, y_f, x_c) float64 23kB dask.array<chunksize=(4, 22, 32), meta=np.ndarray>
e3f_0 (z_c, y_f, x_f) float64 23kB dask.array<chunksize=(4, 22, 32), meta=np.ndarray>
e3w_0 (z_f, y_c, x_c) float64 23kB dask.array<chunksize=(4, 22, 32), meta=np.ndarray>
e3uw_0 (z_f, y_c, x_f) float64 23kB dask.array<chunksize=(4, 22, 32), meta=np.ndarray>
e3vw_0 (z_f, y_f, x_c) float64 23kB dask.array<chunksize=(4, 22, 32), meta=np.ndarray>
Attributes: (12/18)
Conventions: CF-1.6
timeStamp: 2023-Mar-28 10:42:16 GMT
name: NEMO dataset
description: Ocean grid variables, set on the proper positions
title: Ocean grid variables
DOMAIN_dimensions_ids: [1 2]
... ...
Iperio: 0
Jperio: 0
NFold: 0
NFtype: -
VertCoord: zco
IsfCav: 0Remark¶
All opening are lazy using dask, which makes files quick to open, until you actually load the data you need
Namelist¶
It can be convenient to open the namelist used for the run (e.g. to compare different runs with different parameters). This is possible using the f90nml package (it needs to be installed, this is an optional dependency).
help(open_namelist)
Help on function open_namelist in module xnemogcm.namelist:
open_namelist(datadir=None, files=None, ref=True, cfg=True)
Open the namelist and store it into a xarray.Dataset
Here you provide the folder path containing the reference and configuration namelists, or the filenames (as for nemo and domcfg). You can choose to load both, or only one of them. The configuration namelist will overwrite the default one.
For this we need to use another folder of the test data (with simplified namelists for the example):
datadir = Path('../../xnemogcm/test/data/namelist/')
print(listdir(datadir))
['namelist_ref', 'namelist_cfg']
name = open_namelist(datadir)
name
--------------------------------------------------------------------------- ModuleNotFoundError Traceback (most recent call last) Cell In[16], line 1 ----> 1 name = open_namelist(datadir) 2 name File ~/checkouts/readthedocs.org/user_builds/xnemogcm/envs/latest/lib/python3.10/site-packages/xnemogcm/namelist.py:16, in open_namelist(datadir, files, ref, cfg) 12 def open_namelist(datadir=None, files=None, ref=True, cfg=True): 13 """ 14 Open the namelist and store it into a xarray.Dataset 15 """ ---> 16 import f90nml 18 files = _dir_or_files_to_files(datadir, files, patterns=["namelist*"]) 20 if len(files) > 2: ModuleNotFoundError: No module named 'f90nml'
name.nn_it000
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Cell In[17], line 1 ----> 1 name.nn_it000 NameError: name 'name' is not defined