[Esip-documentation] status/open issues for ACDD approval

Nan Galbraith via Esip-documentation esip-documentation at lists.esipfed.org
Fri Sep 19 10:03:04 EDT 2014


Hi Phil and all -

For files containing observational data, the descriptors 'released',
'published' and 'issued' are very nearly meaningless terms.

My NetCDF data files each have a single 'version date' that reflects the
last time the observations or derived data values were changed by editing,
or applying new calibrations or new algorithms, and a 'file date' that
includes the above plus any re-write that was due to changes in some
format spec (like ACDD) or maybe a semantic error (such as a missed
standard name).
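
As a rough sketch of that distinction (the change categories and the
function name below are made up for illustration; they are not part of
ACDD or any other spec):

```python
from datetime import date

# Hypothetical change kinds, chosen only to mirror the description above.
VALUE_CHANGES = {"edit", "calibration", "algorithm"}
FORMAT_CHANGES = {"format_spec", "semantic_fix"}


def file_dates(changes):
    """Return (version_date, file_date) from (date, kind) change records.

    version_date: last change to the observations or derived values.
    file_date: last change of any kind, including format-only re-writes.
    Either value is None when no matching change has been recorded.
    """
    value_dates = [d for d, kind in changes if kind in VALUE_CHANGES]
    all_dates = [
        d for d, kind in changes if kind in VALUE_CHANGES | FORMAT_CHANGES
    ]
    version_date = max(value_dates) if value_dates else None
    file_date = max(all_dates) if all_dates else None
    return version_date, file_date


# Illustrative change log for one file (dates are invented).
changes = [
    (date(2012, 3, 1), "calibration"),
    (date(2013, 6, 15), "edit"),
    (date(2014, 9, 2), "format_spec"),
]
version_date, file_date = file_dates(changes)
print(version_date.isoformat(), file_date.isoformat())
```

With this bookkeeping the 'file date' can never be earlier than the
'version date', since every value change also counts as a file change.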

We have no concept of a date on which something was originally available,
either - assorted subsets of our surface mooring data are released in
real time, other parts take years to be published.

Originally, I tried to use the terms you've listed, which were in Ethan's
first cut of ACDD, but I had to make up definitions for them - because
they really have no inherent meaning for my data. I think this is true of
most observational and/or in situ data.

Do we need 2 sets of 'file date' terms, one for products/models and one for
observational data sets?

Regards-
Nan Galbraith

On 9/18/14 1:56 PM, Philip Jones - NOAA Affiliate via Esip-documentation 
wrote:
> John,
>
> Thanks for considering my comments.
>
> I don't see why the descriptors "released", "published" or "issued" 
> are a problem for describing the data publication date. Most products 
> have an associated version number and by definition this attribute 
> would be the date that version was released. (Maybe we need to add an 
> attribute for product version number.) I have two concerns with the 
> current proposal, date_product_available. 1) Attributes should not 
> have a compound purpose, e.g., date the product was originally created 
> or made available. These can be two distinct dates. 2) The meaning of 
> these attributes must be obvious to users by the attribute name alone, 
> and I'm not sure the intended meaning of "available" would be obvious. 
> Note that users of these netCDF files will not refer to the ACDD pages 
> to look up the attribute definitions. The netCDF creators possibly 
> will, but not users.
>
> I very much suggest we keep an attribute that supports the original 
> create date. Just about every metadata standard, from science data to 
> documents, images and video, supports the concept of date created and 
> date modified. What do we gain by removing it from ACDD?
>
> I can discuss it more in our meeting.
>
> Phil
>
> On Thu, Aug 28, 2014 at 3:49 PM, John Graybeal 
> <jgraybeal at mindspring.com <mailto:jgraybeal at mindspring.com>> wrote:
>
>     Philip,
>
>     No worries about the late date, if we can make it noticeably
>     better I don't think anyone will mind a small delay in finalizing.
>     But push to wrap up at this next meeting if we can!
>
>     /Regarding date_product_[generated|distributed|released] /: I
>     didn't care for 'distributed' because the same product can be
>     distributed multiple times; and I didn't care for 'released'
>     because that word often has a formal meaning (in opposition to
>     unreleased). Anna and I came up with *date_product_available* --
>     how does that work for you? The definition, now with further
>     clarifications, is
>
>>     *date_product_available* : The date on which this individual data
>>     file or other product was made available (ISO 8601 format);
>>     corresponds to ISO 19115-2 CI:DateTypeCode of "publication". This
>>     can be the date the product was originally created in some
>>     systems; for others, it may be the date the product was (first)
>>     provided to a user. This means the availability date may be after
>>     the product was first created; therefore the
>>     date_content_modified and date_values_modified should be used to
>>     assess the age of the content.
>
>     Let's pose the question to the group of whether
>     *date_product_generated* adds value, for the purposes you identify
>     (provenance tracking and managing additional or replacement
>     files). I assume we are trying to assess this from the external
>     user's perspective, and allow for file and web service protocols.
>      My take: Knowing when the file was created provides no inherent
>     advantage to the user receiving that file unless he or she knows
>     the mechanism by which the system creates files, and that the
>     mechanism won't change. (Obviously the data system that creates
>     and publishes the file could tie its provenance records to the
>     file creation time, if it keeps data in files internally; but it
>     could equally well tie it to the availability time, or a unique
>     ID, or the provenance could be much more atomic than a whole data
>     file.)  I'm not sure which use case you mean by 'managing
>     additional or replacement files'; again from the user's
>     perspective, I think all the use cases for that are addressed with
>     the existing three attributes. Happy to work this through offline
>     if that helps.
>
>     /Regarding date_[content|values]_modified/ : The terms 'data' and
>     'metadata' are ambiguous in most contexts, including this one; I
>     would not like those terms myself. Assuming we are trying to
>     satisfy the primary use cases of "when did _anything_ change?" and
>     "when did the values change?", maybe we can improve the first by
>     replacing 'content' with 'product': *date_product_modified*.
>
>     I can't think of a better term than 'values'; 'variables' to many
>     includes the variable attributes, which we are explicitly trying
>     to exclude. Since the definition is the important thing, maybe we
>     can choose from a list of possible name pairs at the next meeting?
>      What choices would you add to the following?
>     1)  date_content_modified, date_values_modified
>     2)  date_product_modified, date_values_modified
>     3)  date_data_modified, date_metadata_modified
>
>
>     John
>
>
>
>
>     On Aug 22, 2014, at 06:35, Philip Jones - NOAA Affiliate
>     <philip.jones at noaa.gov <mailto:philip.jones at noaa.gov>> wrote:
>
>>     John, thanks for your responses.
>>
>>     If that is the intended meaning of date_product_generated, then I
>>     agree the attribute name should better reflect that meaning. If
>>     you want to avoid using the word "issued", then maybe use
>>     "released" or "published" in the name. For example, date_released
>>     or date_published. Because "distributed" can be understood as an
>>     ongoing activity, whereas "released" or "published" imply the
>>     initial distribution of a particular version.
>>
>>     I'm still not sure the definitions of date_content_modified and
>>     date_values_modified would be apparent to users of a data file.
>>     What about simply using date_data_modified and
>>     date_metadata_modified?
>>
>>     The rationale for deprecating the date_created attribute on the
>>     ACDD page says:
>>     "date_created:deleted in favor of date_product_generated (which
>>     used to be date_issued); we did not have a use case for knowing
>>     the date a stream or product was _first_ generated, once it has
>>     been updated"
>>     Having the producer's initial create date of a file is important
>>     for provenance tracking and for managing additional or
>>     replacement files that may be created. Only using the *_modified
>>     date attributes creates a dependency on using the history
>>     attribute correctly with change details in order to determine the
>>     original create date of a file.
>>
>>     I apologize for the late comments and do not wish to delay plans
>>     for this ACDD version. I'll try to make the next group call.
>>
>>     Phil
>>
>>
>>     On Thu, Aug 21, 2014 at 4:16 PM, John Graybeal
>>     <jgraybeal at mindspring.com <mailto:jgraybeal at mindspring.com>> wrote:
>>
>>         Hi Philip, thanks for your input. Here are my thoughts,
>>         looking for feedback from you and the list.
>>
>>>         date_product_generated:  Is this attribute intended to hold
>>>         the initial create date of the file?
>>
>>         No, it was meant to be when it was distributed (the
>>         separation you wanted). This corresponds originally to the
>>         ISO 19115-2 code
>>         /gmd:dateType/gmd:CI_DateTypeCode="publication", which says
>>         here
>>         <https://geo-ide.noaa.gov/wiki/index.php?title=ISO_19115_and_19115-2_CodeList_Dictionaries#CI_DateTypeCode> "date
>>         identifies when the resource was issued". (To me this means
>>         'visibly released' to the external users, but some say
>>         'issued' means 'produced' in the system.)
>>
>>         As I understand it, the ACDD attribute targets the use case
>>         "if I went to the site on date X, was it there yet?" This
>>         being helpful for people or computers who grab information
>>         from a site every so often, to know what they don't have
>>         to grab.
>>
>>         So I agree the word 'generated' is confusing here; I can't
>>         find a discussion where it changed from _issued to
>>         _generated, but I think it was an attempt to avoid the
>>         ambiguity of the ISO term 'issued'.
>>
>>         Perhaps this is better:
>>
>>         *date_product_distributed*: The date on which this individual
>>         data file or other product was distributed (ISO 8601 format).
>>         This may be after the product was created (but not before);
>>         therefore the date_content_modified and date_values_modified
>>         should be used to assess the age of the content.
>>
>>         (I wanted to add "If the identical data file or product is
>>         distributed multiple times, this should be the first date of
>>         distribution." But it is pretty wordy already.)
>>
>>>         date_content_modified, date_values_modified
>>>         Both definitions mention changes to the "data", which I
>>>         presume means changes to variables in the file. Can the
>>>         definitions and maybe the attribute names be clarified so
>>>         that the differences between them are clear? Suggest using
>>>         terminology from the netCDF data model
>>>         <https://www.unidata.ucar.edu/software/netcdf/docs/html_guide/netcdf_data_set_components.html>.
>>>
>>
>>         Well, that might be more precise, if we can agree. I'm a
>>         little nervous proposing a change, but let's see what people
>>         say about just changing 'data' to 'variables' and 'metadata'
>>         to 'attributes':
>>
>>         *date_content_modified*:  The date on which any of the
>>         provided content, including variables, attributes, and
>>         presented format, was last created or changed (ISO 8601 format)
>>
>>         *date_values_modified*: The date on which the provided
>>         variables' data values were last created or changed; excludes
>>         attributes and formatting changes (ISO 8601 format)
>>
>>>         can you add the original version 1 from 2005 to the wiki
>>
>>         Good suggestion. As discussed on the call, we'll add this.
>>
>>         John
>>
>>
>>
>>         On Aug 21, 2014, at 10:58, Philip Jones - NOAA Affiliate via
>>         Esip-documentation <esip-documentation at lists.esipfed.org
>>         <mailto:esip-documentation at lists.esipfed.org>> wrote:
>>
>>>         John, all,
>>>
>>>         I have a few late comments/questions on the date attributes.
>>>
>>>         Attribute:
>>>         date_product_generated
>>>             The date on which this data file or product was
>>>         produced/distributed (ISO 8601 format). While this date is
>>>         like a file timestamp, the date_content_modified and
>>>         date_values_modified should be used to assess the age of the
>>>         contents of the file or product.
>>>
>>>         Comment:
>>>         The date-time a file was "produced" (generated) is not the
>>>         same as when it was "distributed", because not all datasets
>>>         are distributed in real-time. Many datasets are
>>>         produced/generated weeks prior to their distribution. I
>>>         recommend separating produced from distributed, which
>>>         suggests that date_issued is still relevant. Is this
>>>         attribute intended to hold the initial create date of the file?
>>>
>>>         Attributes:
>>>         date_content_modified
>>>             The date on which any of the provided content, including
>>>         data, metadata, and presented format, was last created or
>>>         changed (ISO 8601 format)
>>>         date_values_modified
>>>             The date on which the provided data values were last
>>>         created or changed; excludes metadata and formatting changes
>>>         (ISO 8601 format)
>>>
>>>         Comment:
>>>         Both definitions mention changes to the "data", which I
>>>         presume means changes to variables in the file. Can the
>>>         definitions and maybe the attribute names be clarified so
>>>         that the differences between them are clear? Suggest using
>>>         terminology from the netCDF data model
>>>         <https://www.unidata.ucar.edu/software/netcdf/docs/html_guide/netcdf_data_set_components.html>.
>>>
>>>
>>>         Also, if this group is maintaining a history of all ACDD
>>>         versions, can you add the original version 1 from 2005 to
>>>         the wiki? It is no longer hosted at Unidata. Archive link
>>>         from April 2014:
>>>         https://web.archive.org/web/20140424133239/http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/formats/DataDiscoveryAttConvention.html
>>>
>>>         Phil
>>>
>>>
>>>         On Tue, Aug 19, 2014 at 7:57 PM, John Graybeal via
>>>         Esip-documentation <esip-documentation at lists.esipfed.org
>>>         <mailto:esip-documentation at lists.esipfed.org>> wrote:
>>>
>>>             Hi all,
>>>
>>>             In case we get time to consider ACDD Thursday, here are
>>>             the issues I've seen on the discussion thread and their
>>>             current status. The page at
>>>             http://wiki.esipfed.org/index.php/Attribute_Convention_for_Data_Discovery_1-2_Working contains
>>>             the recommended changes.
>>>
>>>             If there are no further concerns raised, I'd like to do
>>>             preliminary approval/consent at this call, and schedule
>>>             the final approval for next call. (If there are still
>>>             concerns, we can discuss them on-list or on the call.)
>>>             The approved document can become Version 2, since
>>>             several people have started calling it that; then
>>>             everyone is free to work on a groups-aware revision, as
>>>             they see fit.
>>>
>>>             A brief reminder: With respect to issues (1) and (2),
>>>             because ACDD attributes are all recommendations -- there
>>>             are no 'shall' statements in the document -- people are
>>>             still within the specification while not using whatever
>>>             attributes they don't like. So it isn't dysfunctional if
>>>             there are attributes that some choose to omit, or
>>>             deprecated terms that some choose to use.
>>>
>>>             === Open Topics ===
>>>
>>>             1) Deprecation of date_* attributes
>>>
>>>             This relates to the deprecation of
>>>             date_created, date_issued, date_modified
>>>             attributes, while adding (not 1 for 1)
>>>             date_content_modified, date_values_modified,
>>>             date_product_generated.
>>>
>>>             This topic was previously summarized in email; review
>>>             that summary on the talk page[1]. If there continue to
>>>             be concerns, we can vote on the best answer.
>>>
>>>             2) Adoption of summary metadata for geospatiotemporal
>>>             ranges (good, tolerable, or bad?)
>>>             Extensive discussion led to an explicit section
>>>             addressing key software principles[2], and some warning
>>>             text.  I have not received any critical comments since
>>>             the last round of changes. (I think one critic is
>>>             satisfied, another perhaps just silent. :->)  If
>>>             concerns remain, we can discuss and vote.
>>>
>>>             3) Organization of ACDD pages
>>>
>>>             There is a bit of confusion still with the current
>>>             organization. I hesitated to go wild with fixes myself,
>>>             but now that I'm co-chair with Anna, I think we can just
>>>             fix issues as they are identified. If you have an issue
>>>             with the ACDD organization, can you please send it to
>>>             the list or us, as you prefer?  With approval a lot will
>>>             become more transparent.
>>>
>>>             John
>>>
>>>
>>>
>>>
>>>
>>>
>>>             [1] Summary of date_* attribute concerns:
>>>             http://wiki.esipfed.org/index.php/Talk:Attribute_Convention_for_Data_Discovery_1-2_Working#Attributes_Discussed_and_Resolved
>>>
>>>             [2] Spatial and Temporal bounds summary recommendations:
>>>             http://wiki.esipfed.org/index.php/Talk:Attribute_Convention_for_Data_Discovery_1-2_Working#Spatial_and_Temporal_Bounds
>>>
>>>
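
For reference, the harvesting use case raised in the quoted thread ("if I
went to the site on date X, was it there yet?") could be sketched roughly
as below. This is only an illustration: the attribute names are the ones
proposed in the thread and may not match the final ACDD text, the helper
names are hypothetical, and treating timestamps without a UTC offset as
UTC is an assumption, not something the convention specifies.

```python
from datetime import datetime, timezone

# Attribute names as proposed in the thread; a final spec may differ.
DATE_ATTRS = ("date_product_available", "date_content_modified")


def parse_iso(value):
    """Parse an ISO 8601 string; assume UTC when no offset is given."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def needs_fetch(attrs, last_harvest):
    """Return True if the product may be new or changed since last_harvest.

    attrs is a plain dict of global attributes read from a file.
    If neither date attribute is present, a change cannot be ruled out,
    so the function errs on the side of fetching.
    """
    harvested = parse_iso(last_harvest)
    dates = [parse_iso(attrs[name]) for name in DATE_ATTRS if name in attrs]
    if not dates:
        return True
    return max(dates) > harvested


# Invented attribute values for one file.
attrs = {
    "date_product_available": "2014-08-28",
    "date_content_modified": "2014-09-19T10:03:04",
}
print(needs_fetch(attrs, "2014-09-01"))
print(needs_fetch(attrs, "2014-09-20"))
```

A harvester that only cares about changed data values could apply the
same comparison to date_values_modified instead.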

-- 
*******************************************************
* Nan Galbraith        Information Systems Specialist *
* Upper Ocean Processes Group            Mail Stop 29 *
* Woods Hole Oceanographic Institution                *
* Woods Hole, MA 02543                 (508) 289-2444 *
*******************************************************

