[Esip-documentation] ACDD comments

John Graybeal via Esip-documentation esip-documentation at lists.esipfed.org
Fri Sep 19 13:23:14 EDT 2014


Good input!  I lied about something -- we didn't really discuss these comments today, because a lot of ACDD topics and discussion came up. We will be creating a Google spreadsheet shortly that will let us follow the status of each of these comments, as well as a number of others, and try to resolve them by an adjudication meeting in 2 weeks. So the input was timely and will be actively resolved. (Read: I'll be pestering you more soon. :->)

See other comments in-line. Thanks for continuing the engagement and discussion; it's very helpful.

By the way, I hope you don't mind that I'm copying to the list -- obviously the group will end up discussing/approving approaches.

John

On Sep 18, 2014, at 11:50, Bob Simons - NOAA Federal <bob.simons at noaa.gov> wrote:

> Thank you very much for considering my comments. Some follow-ups are below...
> 
> On 2014-09-18 10:59 AM, John Graybeal wrote:
>> As the person most involved in several of these, I guess I will respond as best I can. 
>> 
>> On Sep 18, 2014, at 09:26, Bob Simons - NOAA Federal <bob.simons at noaa.gov> wrote:
>> 
>>> I have given up trying to keep up with all of the debate regarding ACDD in the ESIP documentation cluster and trying to figure out the new system for proposing changes. But I see several big problems in the currently proposed ACDD 2.0. Can you please deal with these comments before ratification of 2.0? Thank you.
>>> 
>>> * All of the date format references just say "Use ISO 8601 date format" and point to http://en.wikipedia.org/wiki/ISO_8601,
>>> but the original ISO 8601 actually presents a large range of possible formats, including these date formats: YYYY-MM-DD, YYYYMMDD, YYYY-WWW,  YYYY-WWW-D, YYYYWWWD, YYYY-DDD, YYYYDDD, and even variants with YY instead of YYYY.  I think it is better to specify just the most common format: "ISO 8601:2004 'extended' format date time in the form YYYY-MM-DDThh:mm:ss<zone> (although ss, mm, and hh can be omitted, and <zone> can be Z, ±hh:mm, ±hh, or omitted for dates without times)".
>> 
>> This question has been argued on many standards. My impression is that smaller activities choose the narrower approach you recommend. Larger activities choose the broader one, on the principle that libraries are now widely available that address all the formats, and there are functional advantages to supporting them on some projects. I strongly prefer supporting all formats. We could check ISO compatibility, which I think would be a good test.
> Saying "ISO 8601" is ambiguous and probably misleading. The initial version and the 2nd version (8601:2000) were superseded by the 3rd version (8601:2004), which sought to simplify the previous versions and remove some of the formats which they later realized were a bad idea (like 2 digit years). Okay, you are unwilling to limit the formats as much as I would prefer. But in ACDD, please at least specify ISO 8601:2004.   And please at least give a preference for the "extended" format (YYYY-MM-DDThh:mm:ss<zone>, which includes the shortened variants), because, as the 8601:2004 standard says, "The basic format should be avoided in plain text."

That sounds perfect. (Well, it seems that way to *me*.) Thanks for the clarifications. 
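
A minimal sketch of what preferring the "extended" format could look like to a checker, written in Python. The regular expression and the function name is_extended_iso8601 are illustrative assumptions, not part of ACDD or ISO 8601:2004. The pattern only checks the shape YYYY-MM-DDThh:mm:ss<zone> with the shortened variants; it does not validate ranges (for example, month 13 would pass), and it assumes a time without a zone is acceptable.

```python
import re

# Hypothetical pattern for the "extended" form only:
# YYYY-MM-DD, optionally followed by Thh[:mm[:ss]] and an optional zone.
EXTENDED = re.compile(
    r"\d{4}-\d{2}-\d{2}"              # YYYY-MM-DD
    r"(T\d{2}(:\d{2}(:\d{2})?)?"      # optional Thh, Thh:mm, or Thh:mm:ss
    r"(Z|[+-]\d{2}(:\d{2})?)?)?"      # optional zone: Z, +hh:mm, or +hh
)


def is_extended_iso8601(value: str) -> bool:
    """Return True if value has the shape of an extended-format date/time."""
    return EXTENDED.fullmatch(value) is not None


print(is_extended_iso8601("2014-09-19T13:23:14Z"))  # extended form
print(is_extended_iso8601("20140919"))              # basic form, rejected
```

Under these assumptions the basic, week-based, and two-digit-year forms listed in the original comment are all rejected, while a bare date such as 2014-09-19 is accepted.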

>> * The summary is now recommended to include the geospatial coverage of the data, and the temporal coverage of the data. But "Maintenance of Metadata" acknowledges the importance of software tools revising the metadata, notably the geospatiotemporal attributes when the dataset is modified. It is reasonable/possible for software tools to maintain e.g., geospatial_lon_min and max, but it is not reasonable to expect software tools to maintain the same values that occur within plaintext in the summary. Please remove the green sentence above.
>> 
>> This was extensively discussed on multiple threads, and Maintenance of Metadata was the result. The plurality seem to favor leaving the green in, and I don't know of anyone other than yourself still requesting it be removed.
> Great. That doesn't mean I'm wrong. I (as the author of ERDDAP) am certainly one of very few software tool makers on this list. 
> As a human, I would love to have geospatial coverage and temporal coverage in the summary, but it is NOT POSSIBLE for software tools to update this information because it is free text. 
> It isn't maintainable metadata.

I think we need to listen very closely to what the software tool makers say -- that is a fundamental need for acceptance of the standard.  You and I agree that you can't maintain this metadata by analyzing and modifying free text -- beyond that can I talk to you off-line so we can clarify technical details?  I think we might find agreement relatively soon....
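
To make the maintainability point concrete, here is a rough Python sketch. The function refresh_lon_bounds and the sample attribute values are hypothetical and are not taken from ERDDAP or any other tool; the sketch also ignores datasets that cross the antimeridian. It shows that a tool can recompute geospatial_lon_min and geospatial_lon_max mechanically, while the coverage stated in the free-text summary is left behind.

```python
def refresh_lon_bounds(attrs: dict, longitudes: list) -> dict:
    """Return a copy of attrs with the longitude bounds recomputed from data."""
    updated = dict(attrs)
    updated["geospatial_lon_min"] = min(longitudes)
    updated["geospatial_lon_max"] = max(longitudes)
    # The summary is free text, so it is copied through unchanged.
    return updated


# Hypothetical global attributes before new data are appended.
attrs = {
    "summary": "Buoy data from 120W to 110W.",
    "geospatial_lon_min": -120.0,
    "geospatial_lon_max": -110.0,
}

# Hypothetical longitudes after the dataset is extended westward.
new_attrs = refresh_lon_bounds(attrs, [-125.0, -118.5, -110.0])
print(new_attrs["geospatial_lon_min"], new_attrs["summary"])
```

After the update the numeric attributes describe the data, but the summary still says 120W, which is the inconsistency described above.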

>>> * cdm_data_type should not be tied to
>>> http://www.unidata.ucar.edu/software/thredds/current/tds/catalog/InvCatalogSpec.html#dataType
>>> which is out-of-date and obsolete.
>>
>> 
>> Are you sure? I thought several on this list were still using it.
> They probably are. That doesn't make it right. Unidata has created several sets of terms over the years. They haven't retracted the old versions.  I'm not saying what the right list of terms is, just that that list is out-of-date. Until Unidata and CF get their act together, it is better for ACDD to not pick a winner.
> Please read this entire exchange:
> http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2010/048519.html
> which clearly indicates what John Caron (if he is in practice the decider) says about cdm_data_type, and which clearly goes beyond the list ACDD is seeking to enshrine.

Good pointer, I'll check it out, thanks.

>>> * Personally, I would change all instances of "bounding box or cube" to "bounding box". A cube is just a much more restricted bounding box (with equal length edges). So it is redundant and confusing.
>> 
>> I support this change, and propose to make it if no one disagrees.

I later realized the original concept of 'bounding box' was 2D, hence the addition of 'or cube'. I think the best approach is to reword, we'll try that.

>>> * The current statement that geospatial_lon_max may be less than lon_min invites misuse. Please add an example, e.g., "For example, a dataset which contains data in the longitude range 160 to 180 and -180 to -160 would have geospatial_lon_min=160 and geospatial_lon_max=-160." (Or perhaps I have misunderstood the use of this.)
>> 
>> An example is an excellent idea, will do this as well if no objections.
>> 
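
One possible reading of the lon_min/lon_max example above, sketched in Python. The function lon_in_range is a hypothetical helper, not something ACDD defines, and it assumes longitudes are expressed in the range -180 to 180. When geospatial_lon_max is less than geospatial_lon_min, the range is taken to wrap across the antimeridian.

```python
def lon_in_range(lon: float, lon_min: float, lon_max: float) -> bool:
    """Return True if lon lies inside the range, allowing a wrap at +/-180."""
    if lon_min <= lon_max:
        # Ordinary case: the range does not cross the antimeridian.
        return lon_min <= lon <= lon_max
    # Wrapped case: the range runs from lon_min up to 180, then -180 to lon_max.
    return lon >= lon_min or lon <= lon_max


# The dataset from the example: data in 160 to 180 and -180 to -160.
print(lon_in_range(170.0, 160.0, -160.0))
print(lon_in_range(0.0, 160.0, -160.0))
```

With geospatial_lon_min=160 and geospatial_lon_max=-160, a longitude of 170 or -170 falls inside the range and a longitude of 0 does not.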
>>> * For geospatial_bounds, WKT doesn't specify units. So it is unclear if the WKT values represent latitude and longitude, or x and y from some projection (which opens up a huge can of worms). Please either require the use of latitude and longitude (please) or make some provision for specifying units (good luck with that).
>> 
>> OK, I'm guessing we'll have to look into this. Thanks for pointing it out. WKT is not specified well in Z either, if I recall correctly.
>> 
>>> * long_name has been widely used to provide a longer, human readable, restatement of the variable's name (often with spaces, e.g., variable=par, long_name="Photosynthetically Active Radiation"). If I understand "the "long_name" attribute value will be used by THREDDS as the variable's name in the variable mapping." correctly, it means the correct name will be something like groupName1.subgroupName2.par. Is that correct? If so, then please make a new attribute name (variable_mapping?) for this new usage.
>> 
>> Not sure I understand this.
> I'm not sure I do either. But when I read the green sentence and I think about the new netcdf4/hdf5 data structures and the netcdf-java API, I interpret the green sentence as saying the long_name should have the data structure's complete name for the variable, e.g., in group "groupName1", in subgroup "subgroupName2", variable="par", would have long_name=groupName1.subgroupName2.par.
> 
> That directly conflicts with the wide-spread traditional use of long_name as a human readable (with spaces between words) longer version of the variable's name, e.g., variable=par, long_name="Photosynthetically Active Radiation"
> 
> And if my interpretation of the green sentence is incorrect, then it certainly indicates that the green sentence needs to be rewritten to be clarified and/or an example given.

Yes, I *think* these green bits about THREDDS are long-standing (old?) descriptions of THREDDS practice. I am not sure it's appropriate for ACDD to say what THREDDS does to map things -- are the communities that closely coupled? A question we'll discuss further in coming weeks.

>>> *** Deprecation is always a bad idea. It is far better to improve the definitions of existing attributes. CF understands this and has an excellent history of not deprecating terms. ACDD should follow CF's example.  Those of us who deal with the metadata for 1000's of datasets and for software really don't want changes that break the existing metadata in those datasets and in that software.
>> 
>> I think ACDD is an entirely different kind of standard than CF,
> Hmm. Is it?
>  
>> in that attributes in ACDD are all recommended, whereas you can not use a CF name that is not in the vocabulary and still be compliant.
> standard_names is an exception because of the controlled vocabulary. CF says "This standard describes many attributes (some mandatory, others optional),"

My key point (I mis-stated in my earlier hurry) was that some CF attributes are mandatory. Not so for ACDD.

>> So I don't think the analogy applies -- if someone still wants to use the old attributes, which people strongly felt had particular (conflicting) meanings, then they can still do so.
> Huh? You make it sound like ACDD is just for humans reading the metadata.

That wasn't my point, sorry for the confusion. 

> But the whole point of CF and ACDD must be to enable human understanding AND machine processing (and "understanding") of the metadata.  Everyone expects software tools to work with ACDD. Fine. Then give the software tool makers and the people maintaining 1000's of datasets that use ACDD 1.0 a break and maintain backward compatibility.  If you (collectively) make compliance with ACDD too burdensome by frequently making incompatible changes or by making the standard too big or complex, then people will be less inclined to make all the changes. 

Deprecating attributes from ACDD doesn't make ACDD 1.1 or 1.0 data sets incompatible, for two reasons: 1) The data sets might have specified they were using ACDD 1.1/1.0 in the Conventions attribute (or by virtue of not specifying can be assumed to be 1.1 or earlier), so the checkers can process that.  2) If they use a deprecated attribute, ACDD can't say "That's an illegal attribute." ACDD doesn't have a list of "must have" and "can't have" attributes; that isn't what deprecated means in ACDD. All ACDD can say is that the latest version encourages the use of these other attributes instead.  (Umm, it may be that this is not a universally held belief and that it is not explicitly documented in ACDD. In which case I propose to add it. As I understand ACDD it is all about recommending best attribute practices, not prescribing them.)

So for deprecated attributes I think the compatibility checkers should be reporting Advisories or Information only. But even for Strongly Recommended attributes, the checkers can't say "COMPLIANCE ERROR" if they aren't there, because they are *not required*. At most they should say "Warning: ACDD 2.0 strongly recommends the use of attribute X, which you did not include."

Perhaps we need to make this explicit also -- I just derived it from previous discussions and the lack of required attributes. But it isn't obvious otherwise.
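
The reporting levels described above could be sketched roughly as follows, in Python. Everything here is an illustrative assumption: the Finding type, the check function, and the two attribute lists are hypothetical (the deprecated names are the ones debated in this thread for the 2.0 draft, and the strongly recommended names are placeholders). The point is only that neither a deprecated attribute nor a missing recommended one ever produces a compliance error.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    level: str    # "ADVISORY" or "WARNING"; this sketch has no "ERROR" level
    message: str


# Illustrative lists only, not an authoritative reading of any ACDD version.
DEPRECATED = {"date_created", "date_issued", "date_modified", "institution"}
STRONGLY_RECOMMENDED = ("title", "summary")


def check(attrs: dict) -> list:
    """Report advisories and warnings for a dict of global attributes."""
    findings = []
    for name in sorted(attrs):
        if name in DEPRECATED:
            findings.append(Finding(
                "ADVISORY",
                f"attribute '{name}' is deprecated in the latest version "
                "but is still allowed"))
    for name in STRONGLY_RECOMMENDED:
        if name not in attrs:
            findings.append(Finding(
                "WARNING",
                f"ACDD 2.0 strongly recommends the use of attribute "
                f"'{name}', which you did not include"))
    return findings


findings = check({"institution": "NOAA", "title": "Example dataset"})
for finding in findings:
    print(finding.level, finding.message)
```

A file that declares an older version in its Conventions attribute could simply skip the deprecation pass; that branch is left out here to keep the sketch short.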

>> Just defining it better was not going to happen, for the reasons above.
>> 
>>> * Don't deprecate date_created. Just use the definition from date_product_available and remove date_product_available.
>>> * Don't deprecate date_issued. Just define it better.
>>> * Don't deprecate date_modified. Just use the definition from date_product_modified or date_values_modified and remove one of those newer terms.
>>> * How can you deprecate institution, which is in CF?! Just use the definition from creator_institution and remove creator_institution.
>> 
>> And people can still use it for whatever CF needs it to be. Creator_institution has a specific meaning which is not necessarily consistent with the one in CF (I need to double-check that claim.)
>>> 
>>> Thank you for considering these comments. I hope you will pursue these changes.
>> 
>> We will certainly discuss them today!
> 
> Thank you again for considering my comments.
> 
> 

> -- 
> Sincerely,
> Bob Simons 
> IT Specialist 
> Environmental Research Division 
> NOAA Southwest Fisheries Science Center 
> 1352 Lighthouse Ave 
> Pacific Grove, CA 93950-2079 
> Phone: (831)333-9878 (Changed 2014-08-20) 
> Fax: (831)648-8440 
> Email: bob.simons at noaa.gov 
> 
> The contents of this message are mine personally and 
> do not necessarily reflect any position of the 
> Government or the National Oceanic and Atmospheric 
> Administration. 
> <>< <>< <>< <>< <>< <>< <>< <>< <>< <><
> 
