[Esip-preserve] Citations guideline revisions
Bruce Barkstrom
brbarkstrom at gmail.com
Tue Jul 26 09:57:41 EDT 2011
Sounds like a workable suggestion. At this point, we're arguing theory.
It would be useful to get some practice.
It would probably be helpful to provide guidance via the Data Committee
on recommended use of the suggestion - meaning examples and procedures
for automating the process.
Bruce B.
On Tue, Jul 26, 2011 at 9:24 AM, Moses, John F. (GSFC-5860)
<john.f.moses at nasa.gov> wrote:
> Hi Bruce,
> I'd suggest we are after a solution that works for most applications - which is that of a persistent data object handle accepted by publishers. DOIs have emerged as a likely candidate for early adopters. We can decide whether a DOI maps to a collection or a single file - and it may vary based on product type. In the case of a collection, authors can get to individual files by adding qualifiers in the reference material, much like adding a page/paragraph or table/figure number. It may not be ideal, but it appears to be the best practical way to move forward.
>
>
> John F Moses
> EOSDIS Science Operations, ESDIS Project Code 423
> BLD 32, E208B
> Goddard Space Flight Center, Greenbelt, MD 20771
> voice at GSFC (301)614-5308
> fax at GSFC (301)614-5267
> Email john.f.moses at nasa.gov
>
> -----Original Message-----
> From: esip-preserve-bounces at lists.esipfed.org [mailto:esip-preserve-bounces at lists.esipfed.org] On Behalf Of Bruce Barkstrom
> Sent: Tuesday, July 26, 2011 9:04 AM
> To: Mark A. Parsons
> Cc: esip-preserve at lists.esipfed.org; Greg Janée
> Subject: Re: [Esip-preserve] Citations guideline revisions
>
> OK - again, we stumble on mental models. Mark assumes citations only need
> point to file collections to provide credit for work done in creating a
> collection. I assume citations are necessary for replication of results
> and therefore need to be done for individual files. The collection
> citation approach could induce its own pleasantries, such as GSFC wanting
> the citation to point to all MODIS files in MODAPS or GES, whereas NSIDC
> only wants the MODIS citation to go to the NSIDC collection of MODIS
> files. These might well be the same files (replicated). Of course, maybe
> NSIDC would keep some files that GSFC got rid of - which would be useful.
> In the long run, if NSIDC decided they needed to reformat their
> collection to avoid obsolescence, whereas GSFC didn't or used a different
> reformatting, citation could be interesting. Both citations could still
> point to scientifically equivalent data. [An equivalent case could be
> made for a Word document where one copy is in .doc format and another
> copy is in .docx - or .pdf. Are they really different and do they deserve
> different citations?]
>
> As a minor correction to this note, in my mental model, a "version" of a
> collection is usually a collection whose instances of time sampling are
> the same as those of a previous version but with different errors. An
> analogy is a new edition of a printed encyclopedia - except that there
> are no new articles in the new edition, so the new edition contains
> exactly the same articles, but with revised contents. I'm not sure what
> Greg's original comment assumed about how many files there might be in a
> "version" nor how a new "version" relates to the old one.
>
> Bruce B.
>
> On Tue, Jul 26, 2011 at 8:41 AM, Mark A. Parsons <parsonsm at nsidc.org> wrote:
>> I just meant that you may need multiple identifiers for different purposes. I thought that was a central conclusion of the Cluster's identifier paper that was just published. Meanwhile, you need to pick one collection-level identifier for citation. That identifier can support multiple resolution, but typically it wouldn't be pointing to individual files but rather a data set home page or some such. There are multiple considerations to address when choosing your citation identifier. One is whether the publisher will use it. Another is whether you are allowing the publishers too much control as Bruce fears. Archives need to weigh those considerations and then recommend an approach to their users.
>>
>> All said, I don't think any of this conversation changes the guideline which says use a locator in your citation and then further notes that publishers like DOIs, but please correct me if I am wrong and modify the guidelines accordingly.
>>
>> -m.
>>
>> Sent from my iPad. Pardon my brevity.
>>
>> On Jul 26, 2011, at 6:27 AM, Bruce Barkstrom <brbarkstrom at gmail.com> wrote:
>>
>>> "Multiple resolution" is what's needed. A particular archive can actually
>>> have multiple copies of a file (one in "deep storage", another on tape, a
>>> third on disk for rapid access, and a fourth being staged for production).
>>> More importantly, data files with different formats can actually be
>>> scientifically identical and stored in different locations. One example of
>>> replication of files to different storage locations is the NOAA CLASS
>>> archive which (last time I had checked) puts duplicate copies in separate
>>> locations - and the NOAA data centers might also choose to put copies in
>>> offsite locations as well.
>>>
>>> Bruce B.
>>>
>>> On Tue, Jul 26, 2011 at 2:27 AM, Greg Janée <gjanee at icess.ucsb.edu> wrote:
>>>> The Handle and DOI systems support "multiple resolution" which can be used
>>>> for, among other things, describing the multiple locations at which the
>>>> object may be found.
>>>>
>>>> I don't know how often this capability is used in practice, but multiple
>>>> resolution would seem to be a great help in thinking of an identifier as
>>>> identifying an abstract object (e.g., a version of a dataset) for which
>>>> there may be varying numbers of copies in existence at any given time.
>>>>
>>>> Regarding Mark's comment, is it ever desirable for an object to have more
>>>> than one persistent identifier? If it takes some amount of awareness and
>>>> responsibility and effort to maintain one identifier over time, doesn't that
>>>> burden get multiplied N times if there are N identifiers? And then there's
>>>> the diluting effect of having more than one identifier, which causes
>>>> confusion (which identifier should I use?), plays havoc with citation
>>>> counting and search system ranking, etc.
>>>>
>>>> -Greg
>>>>
>>>> On Jul 25, 2011, at 12:02 PM, Mark A. Parsons wrote:
>>>>>
>>>>> Yes, use as many identifiers as you like, but you should probably only use
>>>>> one in a citation. The publishers would probably prefer that be a DOI (at
>>>>> the moment at least).
>>>>>
>>>>> Cheers,
>>>>>
>>>>> -m.
>>>>>
>>>>> On 25 Jul 2011, at 12:22 PM, Bruce Barkstrom wrote:
>>>>>>
>>>>>> One question that I don't think we've addressed is whether having a
>>>>>> single source of redirection will decrease the probability of losing
>>>>>> information due to the loss of multi-site replication. Going to the
>>>>>> multi-identifier approach would be more consistent with multi-site
>>>>>> distribution of locators.
>>>>>>
>>>>>> Bruce b.
>>>>
>>>> _______________________________________________
>>>> Esip-preserve mailing list
>>>> Esip-preserve at lists.esipfed.org
>>>> http://www.lists.esipfed.org/mailman/listinfo/esip-preserve
>>>>
>>