[Esip-preserve] Possible Workaround for data identity non-uniqueness?
Lynnes, Christopher S. (GSFC-6102)
christopher.s.lynnes at nasa.gov
Wed Oct 13 08:56:23 EDT 2010
I agree with Curt's assessment that canonicalization has practical problems for data that has been reformatted in a way that does not affect the content.
Is there perhaps a workaround where the reformatting agent simply asserts that the original and reformatted files are equivalent? That is, the agent would add a metadata attribute that says, "this file is scientifically equivalent to this other file (e.g., identified by uuid)".
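To make the idea concrete, here is a minimal sketch of how such an assertion might be recorded and followed. Everything in it is an assumption for illustration: the attribute name `scientifically_equivalent_to`, the dictionary-based catalog, and the helper names are hypothetical and are not taken from any existing metadata standard or tool.

```python
import uuid

# Hypothetical attribute name; no existing convention is implied.
EQUIV_ATTR = "scientifically_equivalent_to"


def new_record(equivalent_to=None):
    """Create a metadata record with a fresh uuid and an optional equivalence assertion."""
    record = {"uuid": str(uuid.uuid4())}
    if equivalent_to is not None:
        record[EQUIV_ATTR] = equivalent_to
    return record


def root_uuid(file_uuid, catalog):
    """Follow equivalence assertions until reaching a file that asserts nothing."""
    seen = set()
    while file_uuid not in seen:
        seen.add(file_uuid)
        target = catalog.get(file_uuid, {}).get(EQUIV_ATTR)
        if target is None:
            return file_uuid
        file_uuid = target
    raise ValueError("equivalence assertions form a cycle")


def are_equivalent(uuid_a, uuid_b, catalog):
    """Two files count as equivalent if their assertion chains end at the same file."""
    return root_uuid(uuid_a, catalog) == root_uuid(uuid_b, catalog)


# An original file, a reformatted copy, a copy of that copy, and an unrelated file.
original = new_record()
first_copy = new_record(equivalent_to=original["uuid"])
second_copy = new_record(equivalent_to=first_copy["uuid"])
unrelated = new_record()
catalog = {r["uuid"]: r for r in (original, first_copy, second_copy, unrelated)}
```

Note that this only records what the reformatting agent claims; nothing here verifies the claim, so the value of the assertion depends on how much the agent is trusted.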
On Oct 13, 2010, at 8:33 AM, Curt Tilmes wrote:
> You can argue that coming up with a C(x) canonicalization isn't
> practical for our data (I won't even disagree :-) I sure don't want to
> do it myself), but your paper doesn't present that argument, or even
> address the point. Your conclusion simply assumes it is true.
>
> As Altman demonstrates for his field, it is certainly conceivable.
>
> I'm also not certain that we have to develop something that "applies
> to all Earth science data" to be useful. Perhaps we can come up with
> something reasonable for a subset, for example, annotated files in one
> of the self-describing formats (HDF/NetCDF/etc.) where the annotations
> can contribute to the canonicalization process (i.e. you tag text
> fields with a property that says "case-insensitive canonicalization of
> this field will maintain scientific equivalence").
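
For what it's worth, the annotation-driven subset Curt describes could look roughly like the sketch below. It is only a toy under stated assumptions: the fields are plain Python values rather than real HDF/NetCDF attributes, the per-field `case_insensitive` flag stands in for whatever tagging property would actually be used, and the choice of JSON serialization plus SHA-256 is mine, not something proposed in the thread.

```python
import hashlib
import json


def canonical_digest(fields):
    """Hash a set of annotated fields after applying the canonicalization each tag allows.

    `fields` maps a field name to a (value, case_insensitive) pair. Only string
    fields tagged case-insensitive are case-folded; everything else is kept verbatim.
    """
    canonical = {}
    for name, (value, case_insensitive) in fields.items():
        if case_insensitive and isinstance(value, str):
            value = value.casefold()
        canonical[name] = value
    # Sorting keys removes any dependence on the order the fields were written in.
    encoded = json.dumps(canonical, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# Same content, different field order and different case in a tagged field.
file_a = {"platform": ("AQUA", True), "comment": ("See Note A", False)}
file_b = {"comment": ("See Note A", False), "platform": ("Aqua", True)}
# Differs only in the case of an untagged field, so it must not match.
file_c = {"platform": ("AQUA", True), "comment": ("see note a", False)}
```

Here `file_a` and `file_b` produce the same digest while `file_c` does not, because its difference is in a field that was never tagged as safe to case-fold.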
--
Dr. Christopher Lynnes NASA/GSFC, Code 610.2, Greenbelt, MD 20771
Phone: 301-614-5185