[Esip-preserve] ESIP Citation Guidelines

alicebarkstrom at frontier.com
Tue Oct 12 09:28:46 EDT 2010


Question 3 should probably be rephrased as
(3a) Can I prove that File A and B contain the same data when the
data formats and data element order are different?

I've got an algorithm by which this can be done - but not
by using a cryptographic digest.
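
To make the distinction concrete, here is a minimal sketch of one way such a comparison could work. It is not the algorithm mentioned above, which is not described in this message; it is only an illustration under stated assumptions. It assumes each file has already been parsed into a list of records (dictionaries keyed by field name), and the names normalize, canonical_records and same_data are hypothetical. Field order and record order are ignored, and values that parse as numbers are compared numerically.

```python
from collections import Counter


def normalize(value):
    """Map a value to a comparable form: a float if it parses as a number,
    otherwise a stripped string. (Assumption: numeric equality is what
    'same data' means; missing values and NaN are not handled here.)"""
    try:
        return float(value)
    except (TypeError, ValueError):
        return str(value).strip()


def canonical_records(rows):
    """Turn an iterable of dict records into a multiset of records.
    Each record becomes a frozenset of (field, value) pairs, so the order
    of fields within a record does not matter; the Counter ignores the
    order of records while still counting duplicates."""
    return Counter(
        frozenset((field, normalize(value)) for field, value in row.items())
        for row in rows
    )


def same_data(rows_a, rows_b):
    """True if both record collections hold the same records,
    regardless of record order or field order."""
    return canonical_records(rows_a) == canonical_records(rows_b)
```

For example, records read from a CSV file with csv.DictReader and records loaded from a JSON array with json.loads could both be passed to same_data, even though a digest of the two files would differ.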

My previous e-mail suggests a more precise formulation of question (5).

Bruce B.
----- Original Message -----
From: "Christopher S. Lynnes (GSFC-6102)" <christopher.s.lynnes at nasa.gov>
To: esip-preserve at lists.esipfed.org
Sent: Tuesday, October 12, 2010 8:00:11 AM
Subject: Re: [Esip-preserve] ESIP Citation Guidelines

I'm only half Scandinavian, so my view is not so pessimistic as Bruce's. However, as a would-be practitioner watching the debate, it looks like it has gone too far down in the weeds to have practical value to Joe Data Manager.  My suggestion is to divide and conquer by coming back to specific questions that can (and cannot yet) be answered.

For example:
(1) Can I prove that File A and B are the same file?  A: a cryptographic hash can do this (most of the time)
(2) Can I prove that File A and B contain the same data?  A: yes, if they are the same file. But see next question...
(3) Can I prove that File A and B do NOT contain the same data?  A: much more difficult, due to reformatting, reordering, etc.
(4) Are Dataset A and B the same?  A:  yes, if they have the same dataset identifier (e.g., a DOI)
(5) Did researcher A and B use the same data from datasets A and B?  A:  much more difficult to determine
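
For question (1), a minimal sketch of the hash comparison, assuming SHA-256 as the digest (the choice of algorithm is an assumption, as are the function names file_digest and same_file):

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks
    so that large data files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file(path_a, path_b):
    """True if both files have the same digest, i.e. (barring a hash
    collision) the same bytes."""
    return file_digest(path_a) == file_digest(path_b)
```

A matching digest says the bytes are the same; a differing digest says only that the bytes differ, not that the data differ, which is where question (3) comes in.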

What I have seen of the debate revolves mostly around questions 3 and 5.  Even though questions 1, 2, and 4 may seem too simple, degenerate or incomplete to be interesting from an academic standpoint, they do have some practical value in today's world of data management.  Perhaps you can couch any recommendations in terms of the questions that can be answered easily vs. those that are difficult to answer?
--
Dr. Christopher Lynnes    NASA/GSFC, Code 610.2, Greenbelt, MD 20771
Phone: 301-614-5185
