[Esip-preserve] [FOO] Scientific Equivalence

Wed Oct 20 15:59:13 EDT 2010

On 10/20/10 15:36, Bruce Barkstrom wrote:
> It is much more direct to identify corresponding data elements in
> each collection (noting that one data collection being compared may
> come in a number of files, whereas the other collection might be
> grouped in a single file), identify which elements constitute the
> ones to use for scientific identity, and compare the values
> directly.

Note that in part of my example, the archive produces the granule,
then produces the SEI, then deletes the granule.  Someone else later
attempts to reproduce the workflow captured within the provenance
information, obtaining a distinct granule, with a distinct granule
identifier and distinct provenance, but with the same SEI.

The granules themselves never existed at the same time, so their
values could never have been compared directly.
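
To make that scenario concrete, here is a minimal sketch in Python.
It assumes the SEI is a hash over a chosen subset of provenance
fields; the field names, the sample values, and the use of SHA-256
are all illustrative assumptions, not a description of any actual
archive's implementation.

```python
import hashlib
import json

# Hypothetical choice of provenance fields that feed the SEI.
SEI_FIELDS = ("algorithm", "algorithm_version", "input_ids")


def compute_sei(provenance: dict) -> str:
    """Hash only the selected provenance fields, in a canonical form."""
    subset = {key: provenance[key] for key in SEI_FIELDS}
    canonical = json.dumps(subset, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The archive's original run. The granule is deleted afterwards;
# only the SEI is kept.
original = {
    "granule_id": "G-0001",
    "produced_at": "2010-01-05T00:00:00Z",
    "algorithm": "example_retrieval",
    "algorithm_version": "1.2",
    "input_ids": ["L1B-17", "L1B-18"],
}
stored_sei = compute_sei(original)

# Someone later re-runs the recorded workflow: new granule identifier,
# new production time, hence distinct total provenance.
reproduced = dict(
    original,
    granule_id="G-9042",
    produced_at="2010-10-20T00:00:00Z",
)

same_sei = compute_sei(reproduced) == stored_sei   # SEIs agree
distinct_provenance = reproduced != original       # full provenance differs
```

The comparison needs only the stored SEI, not the deleted granule.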

> There may be a philosophical issue lurking here: do we require the
> same provenance to get two measurements to have scientifically
> identical values?  I suspect not, but that's a question that
> requires a bit more deliberate thinking.  Clearly, human beings with
> exactly the same provenance must be identical (indeed, that
> identifies a unique individual).

I think I need a better term for what I am doing.

"Scientific Equivalence," I think, should be reserved for a comparison
of content (as you are doing) or the fingerprinting of content (like
that other guy).

Cavanaugh and Graham [1] break down equivalence into three terms:

Exact Equivalence - exactly the same content (what I prefer to call
identical)

Strong Provenance Equivalence - entities whose provenance
information is all the same.  With my definition of provenance, as you
point out above, it is impossible to have two entities with identical
total provenance that aren't exactly equal.

Weak Provenance Equivalence - Some specified subset of provenance
matches.  This is what I am doing.
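
As a rough illustration of how the three terms differ, the following
Python sketch checks each one for a pair of entities.  The Entity
class, the field names, and the sample values are hypothetical; this
is my reading of the three definitions above, not code from the
paper.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    content: bytes
    provenance: dict = field(default_factory=dict)


def equivalences(a: Entity, b: Entity, subset_keys) -> dict:
    """Report which of the three equivalence relations hold for a and b."""
    weak = all(
        key in a.provenance
        and key in b.provenance
        and a.provenance[key] == b.provenance[key]
        for key in subset_keys
    )
    return {
        "exact": a.content == b.content,             # same content
        "strong_provenance": a.provenance == b.provenance,  # all provenance
        "weak_provenance": weak,                     # chosen subset only
    }


# Two runs of the same workflow, produced at different times.
first = Entity(
    content=b"header:2010-01-05;values:1,2,3",
    provenance={"algorithm": "example_retrieval", "version": "1.2",
                "produced_at": "2010-01-05"},
)
second = Entity(
    content=b"header:2010-10-20;values:1,2,3",
    provenance={"algorithm": "example_retrieval", "version": "1.2",
                "produced_at": "2010-10-20"},
)

result = equivalences(first, second, subset_keys=("algorithm", "version"))
```

Here only the weak relation holds: the production date changes both
the content and the total provenance, but the specified subset still
matches.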

> However, we can affirm that two measurements with very different
> provenance chains are scientifically identical if we have a
> trustworthy method of comparison.  Perhaps we can call that
> measurement validation.

I agree.  If two independent satellites measured some geophysical
property with two different instruments operating in two different
manners using two different algorithms, and each was independently
validated against a third (known-better) dataset, we could still say
that those datasets were scientifically equivalent.  As you point out,
that is a totally different concept from what I am trying to
accomplish.

Curt

[1] "Apples and Apple-shaped Oranges: Equivalence of Data Returned on
Subsequent Queries with Provenance Information",
http://people.cs.uchicago.edu/~yongzh/papers/apples-oranges.ps