[Esip-preserve] [Infusion] Suggestion for tech infusion activity vis a vis MEaSUREs

Christopher Lynnes Chris.Lynnes at nasa.gov
Wed Apr 14 12:57:45 EDT 2010


With the complexity and diversity of some of the versioning schemes  
out there, I would advocate for using a DOI for each Dataset (i.e.,  
DataType + Version).  If a researcher used data from multiple versions  
of a dataset, then the citation of multiple DOIs will make that  
crystal clear.
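
The sketch below shows the one-DOI-per-Dataset scheme in Python. It is minimal and illustrative. It assumes a hypothetical `<DataType>.<Version>` DOI suffix, which is the `10.12345/FOO.1` form in Curt's alternative below. It also reuses the placeholder prefix `10.12345` and data type `FOO` from his example. `Dataset` and `cite_all` are illustrative names, not an existing tool. The code deduplicates the datasets a study used and emits one citation for each. That is what makes mixed-version use visible.

```python
from dataclasses import dataclass

# Registrant prefix from the FOO/MyOrg example later in this thread;
# a placeholder, not a real DOI prefix.
PREFIX = "10.12345"


@dataclass(frozen=True)
class Dataset:
    """A Dataset in the sense used above: a DataType plus a Version."""
    data_type: str  # e.g. an ESDT short name such as "FOO"
    version: int    # collection (version) number

    def doi(self) -> str:
        # Hypothetical suffix convention <DataType>.<Version>, e.g. 10.12345/FOO.1
        return f"{PREFIX}/{self.data_type}.{self.version}"


def cite_all(author: str, title: str, used: list) -> list:
    """Return one citation per distinct Dataset, so a study that mixes
    versions necessarily cites each version's DOI."""
    distinct = sorted(set(used), key=lambda d: (d.data_type, d.version))
    return [f'{author}. "{title}", {d.data_type}, DOI: {d.doi()}.' for d in distinct]


# Usage: data drawn from collections 1 and 2 of FOO yields two DOIs.
for line in cite_all("Smith, John", "Some Earth Science Data",
                     [Dataset("FOO", 2), Dataset("FOO", 1), Dataset("FOO", 2)]):
    print(line)
```

Because `Dataset` is frozen, and therefore hashable, repeated use of the same version collapses to a single citation. Distinct versions always produce distinct DOIs.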

On Apr 14, 2010, at 10:04 AM, Curt Tilmes wrote:

> On 03/23/2010 02:35 PM, Wilson, Brian D (335G) wrote:
>> We will need to formulate this consensus recommendation quickly.
>>
>> I suggest two features:
>>
>> 1) Publish the MEASUREs datasets as a dataset paper in an appropriate
>> journal so the *dataset* has a referenceable DOI.
>
> In the ESIP Preservation cluster identifiers group, we've begun to
> discuss and distinguish the concepts of "Data Type" (what EOS calls
> an ESDT) and "Dataset", which is a specific version (in EOS parlance,
> a 'Collection') of that Data Type.
>
> I've put some strawman terms and definitions (up for discussion!) here:
> http://wiki.esipfed.org/index.php/Interagency_Data_Stewardship/Identifiers#Definitions
>
> I think each of those concepts needs a referenceable identifier from
> which we can construct data citations.
>
> For example, consider ESDT FOO.  It is archived in DAAC MyOrg
> (CrossRef DOI prefix 10.12345), which has archived data from ESDT FOO for
> collection 1 (a "Closed Data Set") and is currently archiving
> collection 2 (an "Open Data Set" still being processed from current
> data).
>
> We need a citation for the general data type:
>
> Smith, John. "Some Earth Science Data", FOO, DOI: 10.12345/FOO.
>
> and a citation for each data set (each version of the data type).
> Rather than registering a new DOI for each new version (collection),
> I'm inclined to advise reusing the data type DOI:
>
> Smith, John. "Some Earth Science Data", FOO, DOI: 10.12345/FOO,
> Collection 1.
>
> This "datatype DOI" could also be the 'published paper describing the
> dataset' DOI, but I guess I'd be inclined to have separate DOIs, one
> for the paper, and one for the datatype.  Then a paper could reference
> either or both as appropriate to the nature of the use.
>
>
> Alternatively, we could register distinct DOIs for each new version:
>
> Smith, John. "Some Earth Science Data", FOO, DOI: 10.12345/FOO.1,
> Collection 1.
>
> For the "Open Data Set" case, I think we must precisely qualify the
> citation to reference the specific granule membership of the dataset.
> There are a few ways to do this, but I think the cleanest is a
> date/time stamp:
>
> Smith, John. "Some Earth Science Data", FOO, DOI: 10.12345/FOO,
> Collection 2, 2010-04-01T14:00:00.
>
>> 2) Serve the dataset granules from URLs that are as permanent as
>> possible, at both the originating sites and the receiving DAACs.
>> The real estate we grab, the root of the URL, should reference
>> MEASUREs and the institution, and should not contain the name of a
>> computer (or something else that is dumb).
>>
>> 3) As far as truly permanent URIs go, I don't know what to say.  I
>> don't think the Handle System, XRIs, or any other system has gotten
>> traction (a large market share).  This is mostly the fault of the
>> W3C, which thinks the entire problem has been solved by existing
>> URLs and URNs.  Hogwash.
>
> I like including both identifiers, datatype and dataset.  I'm leaning
> toward using DOIs for the datatype and PURLs for the precise data
> specification and locator:
>
> Smith, John. "Some Earth Science Data", FOO, DOI: 10.12345/FOO,
> Collection 2, http://purl.org/NET/MyOrg/data/FOO/2/2010-04-01T14:00:00.
>
> (Though, as Ruth points out, ARKs are nice too and have their own
> benefits.)
>
> Curt
>
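
The Python sketch below assembles the citation forms proposed in the quoted message, for comparison with the per-version scheme. It is minimal and illustrative. The prefix `10.12345`, the data type `FOO`, and the PURL base come from the example and are placeholders. The path layout `<base>/<DataType>/<collection>/<timestamp>` is an assumption modeled on the single PURL shown. The function `cite` and its parameters are hypothetical. The timestamp is rendered as ISO 8601 to the second, matching `2010-04-01T14:00:00`.

```python
from datetime import datetime
from typing import Optional

# Placeholder values from the FOO/MyOrg example in the quoted message.
PREFIX = "10.12345"
PURL_BASE = "http://purl.org/NET/MyOrg/data"


def cite(author: str, title: str, data_type: str, collection: int,
         as_of: Optional[datetime] = None, with_purl: bool = False) -> str:
    """Reuse the data type DOI and qualify it with the collection and, for
    an Open Data Set, a date/time stamp fixing its granule membership."""
    parts = [f'{author}. "{title}"', data_type,
             f"DOI: {PREFIX}/{data_type}", f"Collection {collection}"]
    stamp = as_of.isoformat(timespec="seconds") if as_of is not None else None
    if with_purl:
        # Assumed PURL layout, modeled on the example:
        # <base>/<DataType>/<collection>[/<stamp>]
        path = f"{PURL_BASE}/{data_type}/{collection}"
        parts.append(f"{path}/{stamp}" if stamp else path)
    elif stamp:
        parts.append(stamp)
    return ", ".join(parts) + "."


snapshot = datetime(2010, 4, 1, 14, 0, 0)
print(cite("Smith, John", "Some Earth Science Data", "FOO", 1))  # Closed Data Set
print(cite("Smith, John", "Some Earth Science Data", "FOO", 2, snapshot))
print(cite("Smith, John", "Some Earth Science Data", "FOO", 2, snapshot, with_purl=True))
```

Here the DOI string is identical across collections, and only the qualifiers change. That is the trade-off against minting one DOI per version.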

--
Christopher Lynnes             NASA/GSFC, Code 610.2          
301-614-5185


