[Esip-preserve] ESIP Citation Guidelines
alicebarkstrom at frontier.com
Tue Oct 12 09:28:46 EDT 2010
Question 3 should probably be rephrased as
(3a) Can I prove that File A and B contain the same data when the
data formats and data element order are different?
I've got an algorithm by which this can be done - but not
by using a cryptographic digest.
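The message does not describe the algorithm itself, so the following is only an illustrative sketch of one way question (3a) could be approached, not Bruce's method. It assumes both files can be parsed into records of named fields, and it treats two files as holding the same data when they contain the same multiset of records, regardless of row order, field order, or container format. The function names (`records_from_csv`, `records_from_json`, `canonical`, `same_data`) and the rule that numeric-looking fields compare as floats are assumptions made for the example.

```python
import csv
import io
import json
from collections import Counter


def _normalize(value):
    """Make a raw field comparable: numbers become floats, the rest stripped strings."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return str(value).strip()


def records_from_csv(text):
    """Parse CSV text with a header row into a list of field-name -> value dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def records_from_json(text):
    """Parse JSON text that holds a list of flat objects."""
    return json.loads(text)


def canonical(records):
    """Return a multiset of records that ignores row order and field order."""
    return Counter(
        frozenset((name, _normalize(value)) for name, value in record.items())
        for record in records
    )


def same_data(records_a, records_b):
    """True when both record lists hold the same records the same number of times."""
    return canonical(records_a) == canonical(records_b)
```

For instance, a CSV file with rows `A,1.50` and `B,2` under the header `station,temp` compares equal to a JSON list holding the same two records in the opposite order. The sketch compares parsed values directly rather than digests, and it does not handle nested structures, unit conversions, missing-value conventions, or NaN values.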
My previous e-mail suggests a more precise formulation of question (5).
Bruce B.
----- Original Message -----
From: "Christopher S. Lynnes (GSFC-6102)" <christopher.s.lynnes at nasa.gov>
To: esip-preserve at lists.esipfed.org
Sent: Tuesday, October 12, 2010 8:00:11 AM
Subject: Re: [Esip-preserve] ESIP Citation Guidelines
I'm only half Scandinavian, so my view is not so pessimistic as Bruce's. However, as a would-be practitioner watching the debate, it looks like it has gone too far down in the weeds to have practical value to Joe Data Manager. My suggestion is to divide and conquer by coming back to specific questions that can (and cannot yet) be answered.
For example:
(1) Can I prove that File A and B are the same file? A: a cryptographic hash can do this (barring the very unlikely case of a hash collision)
(2) Can I prove that File A and B contain the same data? A: yes, if they are the same file. But see next question...
(3) Can I prove that File A and B do NOT contain the same data? A: much more difficult, due to reformatting, reordering, etc.
(4) Are Dataset A and B the same? A: yes, if they have the same dataset identifier (e.g., a DOI)
(5) Did researcher A and B use the same data from datasets A and B? A: much more difficult to determine
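As a minimal sketch of the answer to question (1), the check below compares SHA-256 digests of two files using Python's standard `hashlib` module. The helper names `sha256_of_file` and `same_file` and the choice of SHA-256 are assumptions for illustration; the message does not name a particular hash function.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file(path_a, path_b):
    """True when both files have identical bytes, up to the chance of a hash collision."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)
```

Matching digests answer questions (1) and (2) together. Differing digests show only that the bytes differ, which is why they cannot settle question (3): a reformatted or reordered copy of the same data will hash differently.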
What I have seen of the debate revolves mostly around questions 3 and 5. Even though questions 1, 2, and 4 may seem too simple, degenerate or incomplete to be interesting from an academic standpoint, they do have some practical value in today's world of data management. Perhaps you can couch any recommendations in terms of the questions that can be answered easily vs. those that are difficult to answer?
--
Dr. Christopher Lynnes NASA/GSFC, Code 610.2, Greenbelt, MD 20771
Phone: 301-614-5185
_______________________________________________
Esip-preserve mailing list
Esip-preserve at lists.esipfed.org
http://www.lists.esipfed.org/mailman/listinfo/esip-preserve