So you've got a new CMIP6 variable

Step, The First

You've probably received an email from a friendly scientist saying something along the lines of, "Hey, there's this handy CMIP6 variable that I need for my analysis, could you publish it?" The first thing to do is create a new entry in the E3SM variable conversion confluence page here. Identify the CMIP6 table that the variable belongs to, and add an entry in the appropriate table. If the scientist helpfully included a conversion formula, include that as well. Otherwise, reach out to the appropriate science group leader (land, atmos, ocean) and ask them to supply the conversion formula, as well as a scientist to perform a quality control check on the data.

case study: snw

New CMIP variable request:

SNOWICE variable.  For comparison I’d like to have corresponding output from the coupled (historical)
and AMIP simulations if that’s available (monthly frequency is fine). …the official CMIP6 name for this
variable is ‘snw’ and it’s part of the ‘landice’ realm, but there are no landice realm variables listed
for any of the E3SM models.  I think this corresponds to the variable ’SNOWICE’ in your model, but it
would be great to confirm that as well.

part 1:

Check the CMIP6 metadata tables for the snw variable, keeping in mind the email said "it’s part of the ‘landice’ realm", so it's probably in the LImon table instead of the Lmon table like the rest of the land data. You're going to have to track down the variable, so I suggest keeping a copy of the cmip6 tables repository on hand. Search the repo for your variable until you find the matching CMIP6 table entry. Looking at CMIP6_LImon.json, we see the entry for our variable:

"snw":
    {
        "frequency": "mon",
        "modeling_realm": "landIce land",
        "standard_name": "surface_snow_amount",
        "units": "kg m-2",
        "cell_methods": "area: mean where land time: mean",
        "cell_measures": "area: areacella",
        "long_name": "Surface Snow Amount",
        "comment": "The mass of surface snow on the land portion of the grid cell divided by the land area in the grid cell; reported as missing where the land fraction is 0; excludes snow on vegetation canopy or on sea ice.",
        "dimensions": "longitude latitude time",
        "out_name": "snw",
        "type": "real",
        "positive": "",
        "valid_min": "",
        "valid_max": "",
        "ok_min_mean_abs": "",
        "ok_max_mean_abs": ""
    },

Of particular interest here are the comment and dimensions: the first because it tells us what the variable means, and the second because it tells us we're dealing with a fairly simple 2D monthly land variable, so this should be easy.
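If you'd rather not grep the tables repository by hand, the search can be sketched in a few lines of Python. This is only a rough helper, not part of e3sm_to_cmip: the function name `find_variable` is made up for illustration, and it assumes the tables are the `CMIP6_*.json` files from your local copy of the cmip6 tables repository, each with a top-level "variable_entry" section.

```python
import json
from pathlib import Path


def find_variable(tables_dir, var_name):
    """Return {table_file_name: entry} for every CMIP6 table defining var_name.

    Hypothetical helper: assumes each table is a CMIP6_*.json file whose
    variables live under a top-level "variable_entry" object.
    """
    matches = {}
    for table_path in sorted(Path(tables_dir).glob("CMIP6_*.json")):
        with open(table_path) as handle:
            table = json.load(handle)
        # Files without a "variable_entry" section are simply skipped.
        entry = table.get("variable_entry", {}).get(var_name)
        if entry is not None:
            matches[table_path.name] = entry
    return matches
```

Calling `find_variable("path/to/cmip6-cmor-tables/Tables", "snw")` (the path is a placeholder for wherever you keep the tables) returns a dictionary keyed by table file name, so you can check the units, dimensions, and comment of each match.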

part 2:

After checking with the land scientists, it turned out that although the requesting scientist thought they knew what the source variable was, they were wrong: the correct E3SM variable is H2OSNO, not SNOWICE.

Now let's check the E3SM variable and see what its attributes are. Use the handy ESGF metagrid search or the old CoG search (or look on your filesystem) to find the raw input data of interest, in this case the historical and amip ensembles from the E3SM-1-0 model version.

~~> ncdump -h ~/Data/20181217.CNTL_CNPCTC1850_OIBGC.ne30_oECv3.edison.clm2.h0.1850-01.nc |  grep H2OSNO
        float H2OSNO(time, lndgrid) ;
            H2OSNO:long_name = "snow depth (liquid water)" ;
            H2OSNO:units = "mm" ;
            H2OSNO:cell_methods = "time: mean" ;
            H2OSNO:_FillValue = 1.e+36f ;
            H2OSNO:missing_value = 1.e+36f ;

In this case the E3SM units are H2OSNO:units = "mm" ; and the CMIP6 units are "units": "kg m-2". Although these look like they don't match, they are equivalent for liquid water: water has a density of about 1000 kg m-3, so a 1 mm layer spread over 1 m2 has a mass of 1 kg. In other words "mm" == "kg m-2" when working with water variables, so no unit conversion is required.
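If the mm-versus-kg m-2 equivalence feels like a trick, the arithmetic can be written out as a tiny sketch. The density constant and the function name are assumptions for illustration (liquid water taken as 1000 kg m-3); nothing here comes from e3sm_to_cmip.

```python
WATER_DENSITY_KG_M3 = 1000.0  # assumed density of liquid water


def mm_water_to_kg_m2(depth_mm):
    """Convert a liquid-water-equivalent depth in mm to a mass per area in kg m-2."""
    depth_m = depth_mm / 1000.0            # mm -> m
    return depth_m * WATER_DENSITY_KG_M3   # m * kg m-3 -> kg m-2
```

The factor of 1000 from millimetres to metres cancels the factor of 1000 in the density, which is why the number stored in H2OSNO can be written out as snw unchanged.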

part 3:

Now let's go to the internal confluence table and make a new entry with the info we've discovered. Document the new formula and notify the people of interest.

Step, The Second

Create a new branch of the e3sm_to_cmip repository to hold the new converter. If it's a "simple" converter, i.e. a 1-to-1 conversion from an E3SM variable to a CMIP6 variable (with perhaps a unit conversion), then this step is easy: simply add an entry in the default handler specification. Supported unit conversions are:

'g-to-kg' -> data / 1000
'1-to-%'  -> data * 100.0
'm/s-to-kg/ms' -> data * 1000
'-1' -> data * -1
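To make the list above concrete, here is one way such a lookup could be sketched in Python. This is not the e3sm_to_cmip implementation: the names `UNIT_CONVERSIONS` and `convert_units` are made up for illustration, and only the four conversion keys and their formulas come from the list above.

```python
# Hypothetical lookup table mirroring the four conversions listed above.
UNIT_CONVERSIONS = {
    "g-to-kg": lambda data: data / 1000,
    "1-to-%": lambda data: data * 100.0,
    "m/s-to-kg/ms": lambda data: data * 1000,
    "-1": lambda data: data * -1,
}


def convert_units(data, conversion=None):
    """Apply a named conversion to a number or array; pass data through if none is given."""
    if conversion is None:
        return data
    if conversion not in UNIT_CONVERSIONS:
        raise ValueError(f"unsupported unit conversion: {conversion!r}")
    return UNIT_CONVERSIONS[conversion](data)
```

In a layout like this, adding a new conversion is one more dictionary entry, and an unknown key fails loudly instead of silently writing unconverted data.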

You can add additional unit conversions here

If the new variable does not have a simple one-to-one formula, you're going to have to create a new conversion handler. Follow one of the many examples here.

case study: snw

part 1:

In the first step we identified that this is a simple handler, so this should be fairly easy. First let's make a new branch:

>> git checkout -b new-handler-snw
Switched to a new branch 'new-handler-snw'

Now all we need to do is add an entry in the default handlers file:

- cmip_name: snw
  e3sm_name: H2OSNO
  units: 'kg m-2'
  table: CMIP6_LImon.json

Looking pretty good. Let's create some sample data so we can run it.

>> ncclimo -7 --dfl_lvl=1 --no_cll_msr  -v H2OSNO  -s 1 -e 1 -o $Data/tmp/ --map=$Data/map_ne30np4_to_cmip6_180x360_aave.20181001.nc  -O $Data/timeseries --ypf=10 -i $Data/land/native/model-output/mon/ens1/v0 --sgs_frc=$Data/land/native/model-output/mon/ens1/v0/20180129.DECKv1b_piControl.ne30_oEC.edison.clm2.h0.0001-01.nc/landfrac

Started climatology splitting at Wed Aug 11 16:43:25 PDT 2021
Running climatology script ncclimo from directory /home/baldwin32/anaconda3/envs/warehouse/bin
NCO binaries version 4.9.9 from directory /home/baldwin32/anaconda3/envs/warehouse/bin
Parallelism mode = background
Timeseries will be created for only one variable
Will split data for each variable into one timeseries of length 1 years
Splitting climatology from 12 raw input files in directory /land/native/model-output/mon/ens1/v0
Each input file assumed to contain mean of one month
Native-grid split files to directory /land/180x360/time-series/mon/ens1/v0-tmp
Regridded split files to directory /land/180x360/time-series/mon/ens1/v0
Wed Aug 11 16:43:26 PDT 2021: Generated /land/180x360/time-series/mon/ens1/v0-tmp/H2OSNO_000101_000112.nc
Input #00: /land/180x360/time-series/mon/ens1/v0-tmp/H2OSNO_000101_000112.nc
Map/Wgt  : /maps/map_ne30np4_to_cmip6_180x360_aave.20181001.nc
Wed Aug 11 16:43:27 PDT 2021: Regridded /land/180x360/time-series/mon/ens1/v0/H2OSNO_000101_000112.nc
Quick plots of last timeseries segment of last variable split:
ncview /land/180x360/time-series/mon/ens1/v0/H2OSNO_000101_000112.nc &
panoply /land/180x360/time-series/mon/ens1/v0/H2OSNO_000101_000112.nc &
Completed 1-year climatology operations for input data at Wed Aug 11 16:43:27 PDT 2021
Elapsed time 0m2s

part 2:

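With this new regridded timeseries we can take the converter for a run and see how it goes. For this you will need a working installation of the e3sm_to_cmip package, as well as the CMIP6 metadata tables. The -u option points to a JSON file of user metadata for the case (here piControl_r1i1p1f1.json), which supplies the CMIP6 metadata written into the output.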

>> python -m e3sm_to_cmip -i $Data/land/180x360/time-series/mon/ens1/v0 -o $Data -t ~/projects/cmip6-cmor-tables/Tables/ -u piControl_r1i1p1f1.json -v snw --realm lnd

[*] Writing log output to: /p/user_pub/e3sm/baldwin32/warehouse_testing/converter.log
[+] Running CMOR handlers in parallel
100%|█████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.08it/s]
[+] 1 of 1 handlers complete

Step, The Third

Once you're able to produce the variable manually using the e3sm_to_cmip package, supply a sample of the variable output to the responsible scientist for quality assurance. It's best to supply them with a five-year file so there's enough data to do a thorough check. If they give you the green light, merge your changes into the e3sm_to_cmip package and create a new version tag.

Step, The Fourth

Now that the conversion handler is working and merged, you can update the warehouse dataset specification to include the new variable. Under esgfpub/warehouse/resources/ open the dataset_spec.yaml file. There are two top-level objects in the file, "tables" and "project." The first thing that needs to change is for the new variable to be added to the appropriate place under "tables." If it's an Amon variable, add it to the variable list under Amon, etc. By default, anything that shows up in those tables will now be included in the CMIP6 datasets for ALL CASES. If the raw E3SM input variable isn't included in one of the cases, then this new variable should NOT be included for that case. You will need to add the variable to the "except" section of every such case; you can see an example here.

Merge this new change into the 'master' branch and install the new change locally.

case study: snw

When we open up the dataset_spec.yaml file, we can see that the table that "snw" belongs to doesn't exist yet, so we have to add the new "LImon" table to the list:

tables:
    Amon:
        ...
    AERmon:
        ...
    CFmon:
        ...
    Lmon:
        ...
    LImon:       <-  our new table
        - snw    <-  our new variable

We don't know yet if there are any cases that don't include the H2OSNO variable, but I believe it to be a standard variable included in all the cases, so we don't need to add snw to any of the cases' exception lists.
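For a case that did turn out to be missing H2OSNO, the interplay between "tables" and a case's "except" list could be pictured roughly like this. This is a sketch under assumptions, not the warehouse's actual code: the function name `variables_for_case` and the in-memory layout (a dict of table name to variable list, plus a flat list of excepted variable names per case) are hypothetical stand-ins for whatever dataset_spec.yaml really contains.

```python
def variables_for_case(tables, case_except):
    """Return the per-table variable lists for one case, minus its exceptions.

    tables:      {table_name: [variable, ...]}, the "include by default" lists
    case_except: variable names this case cannot produce (hypothetical layout)
    """
    skip = set(case_except)
    return {
        table: [var for var in variables if var not in skip]
        for table, variables in tables.items()
    }


# Example: a made-up case whose raw output lacks H2OSNO would list snw as an exception.
example_tables = {"Lmon": ["mrso"], "LImon": ["snw"]}
print(variables_for_case(example_tables, case_except=["snw"]))
```

A case with an empty exception list gets every variable in every table, which is the "ALL CASES" default described above.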

Step, The Fifth

You can now invoke the warehouse to create your new CMIP6 datasets! This should be as simple as running warehouse postprocess -d CMIP6.*.<YOUR_NEW_VARIABLE>.* and then, after the datasets are produced, running warehouse auto -d CMIP6.*.<YOUR_NEW_VARIABLE>.* which should publish them. It's advised that you run a single case first before invoking the run-everything command, as any problems will be easier to solve with a single case than when working with all the cases at once.
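Before launching a run, it can help to sanity-check which dataset IDs a -d pattern would pick up. How the warehouse expands the pattern internally isn't covered here, so the sketch below simply assumes shell-style wildcard matching via Python's fnmatch module; the function name `select_datasets` is made up, and the second dataset ID in the example is invented for contrast.

```python
from fnmatch import fnmatchcase


def select_datasets(dataset_ids, pattern):
    """Return the dataset IDs matching a wildcard pattern (assumed shell-style)."""
    return [dataset_id for dataset_id in dataset_ids if fnmatchcase(dataset_id, pattern)]


candidate_ids = [
    "CMIP6.test.E3SM-Project.test.test.r1i1p1f1.LImon.snw.gr",
    "CMIP6.test.E3SM-Project.test.test.r1i1p1f1.Amon.tas.gr",  # invented for contrast
]
print(select_datasets(candidate_ids, "CMIP6.*.snw.*"))
```

Because the pattern requires ".snw." with dots on both sides, it selects only datasets whose variable facet is exactly snw, not variables that merely contain those letters.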

case study: snw

It's always better to take things one at a time when doing something new, so let's try the new snw variable on one case first before doing everything at once. Since we don't want to run on a whole case while debugging, we can use the --testing option and -d CMIP6.test.*.snw.* to run on a small test case. We want to use the postprocess workflow instead of auto, because we don't want to publish the testing output.

>> python -m warehouse postprocess -w $Data --status-path $status -d CMIP6.test.*.snw.*  --testing
2021/08/13 16:11:01:INFO:POSTPROCESS:initializing workflow POSTPROCESS
2021/08/13 16:11:01:INFO:POSTPROCESS:Starting with datasets ['CMIP6.test.*.snw.*']
2021/08/13 16:11:01:INFO:WAREHOUSE:Running warehouse in serial mode
2021/08/13 16:11:01:INFO:WAREHOUSE:Initializing the warehouse
2021/08/13 16:11:01:INFO:WAREHOUSE:starting listener for $status/CMIP6.test.E3SM-Project.test.test.r1i1p1f1.LImon.snw.gr.status
2021/08/13 16:11:01:INFO:WAREHOUSE:Listener setup complete
2021/08/13 16:11:01:INFO:WAREHOUSE:dataset: CMIP6.test.E3SM-Project.test.test.r1i1p1f1.LImon.snw.gr updated to state POSTPROCESS:GenerateLndMonCMIP:Ready:
2021/08/13 16:11:01:INFO:WAREHOUSE:starting job: POSTPROCESS:GenerateLndMonCMIP:CMIP6.test.E3SM-Project.test.test.r1i1p1f1.LImon.snw.gr
2021/08/13 16:11:01:INFO:DATASET:E3SM.test.test.test.land.native.model-output.mon.ens1 initialized and set to WAREHOUSE:UNINITIALIZED:
2021/08/13 16:11:01:INFO:WAREHOUSE:Job POSTPROCESS:GenerateLndMonCMIP:CMIP6.test.E3SM-Project.test.test.r1i1p1f1.LImon.snw.gr meets its input dataset requirements
2021/08/13 16:11:04:INFO:WAREHOUSE:dataset: CMIP6.test.E3SM-Project.test.test.r1i1p1f1.LImon.snw.gr updated to state POSTPROCESS:GenerateLndMonCMIP:Engaged:slurm_id=11693
2021/08/13 16:11:42:INFO:WAREHOUSE:dataset: CMIP6.test.E3SM-Project.test.test.r1i1p1f1.LImon.snw.gr updated to state POSTPROCESS:GenerateLndMonCMIP:Pass:
2021/08/13 16:11:42:INFO:WAREHOUSE:dataset: CMIP6.test.E3SM-Project.test.test.r1i1p1f1.LImon.snw.gr updated to state POSTPROCESS:Pass:
2021/08/13 16:11:42:INFO:WAREHOUSE:Dataset CMIP6.test.E3SM-Project.test.test.r1i1p1f1.LImon.snw.gr SUCCEEDED from POSTPROCESS:Pass:
2021/08/13 16:11:42:INFO:WAREHOUSE:All datasets complete, exiting
Postprocessing complete, dataset CMIP6.test.E3SM-Project.test.test.r1i1p1f1.LImon.snw.gr is in state POSTPROCESS:Pass: