[Esip-preserve] Another Pleasantry - Unique Identifiers for "Jobs"
Curt Tilmes
Curt.Tilmes at nasa.gov
Tue Aug 17 09:21:25 EDT 2010
On 08/17/2010 08:47 AM, alicebarkstrom at verizon.net wrote:
> While we've had fun with unique identifiers for files and file
> collections, we haven't paid much attention to process
> identifiers. The math is clear: production involves a graph whose
> nodes are files and "jobs", even if we have to deal with ad hoc (or
> exploratory) production. To do provenance tracking, you have to be
> able to do a breadth-first search of the production graph, which
> means that for production history, the jobs need unique identifiers
> as much as the files (or - in some odd cases for Earth science -
> database transactions). In other words, provenance tracking is going
> to require unique identifiers for the residue of "jobs". If you
> don't have these, you can't be sure of being able to reconstruct the
> production history provenance.
There are others, but the main options are the same as for granules:
1) UUID: assign a globally unique identifier to each instance
2) URI: something persistent that is a URI (PURL, ARK, XRI, etc.)
or both. (I also like the hierarchical URN approach, as in SPASE.)
If you choose UUID, the next step is to discuss resolution, which
inevitably leads to 2. Though I do think resolution is less important
for jobs than for files, I think it is reasonable and useful to
provide it for both jobs and files (granules).
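As a minimal sketch of how option 1 leads to option 2, the following
mints a UUID for a job and wraps it in a resolvable URI. The resolver
base URL and the helper names are assumptions made for illustration;
no such service is implied to exist.

```python
import uuid

# Hypothetical resolver base; an assumption, not a real service.
RESOLVER_BASE = "https://example.org/id/"

def new_identifier():
    """Return a fresh random (version 4) UUID as a string."""
    return str(uuid.uuid4())

def to_uri(identifier, kind):
    """Build a URI for a job or granule under the assumed resolver base."""
    if kind not in ("job", "granule"):
        raise ValueError("kind must be 'job' or 'granule'")
    # Round-trip through uuid.UUID to reject malformed identifiers.
    canonical = str(uuid.UUID(identifier))
    return RESOLVER_BASE + kind + "/" + canonical

job_id = new_identifier()
job_uri = to_uri(job_id, "job")
```

The UUID alone guarantees uniqueness; the URI only adds a place to
look it up, so the same identifier can be re-homed under a different
resolver later without being reissued.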
If we just say "URI", then any of the URI-like schemes can be
accommodated. One organization (for whatever reason) may want to use
ARKs and another PURLs; as long as there is a unique, persistent,
resolvable URI, it all works.
Provenance tracking within an organization is easy (not trivial, but
at least straightforward). The bigger problem is across
organizations -- if we can address that, the solution to the local
problem falls out naturally.
If we like OPM (http://openprovenance.org/) [I do], then we can
recommend the XML or RDF serialization of graphs represented with that
model:
http://openprovenance.org/examples/pc1-time.xml (XML)
http://openprovenance.org/examples/pc1-time.n3 (RDF)
We're then talking about using our recommended granule identifiers in
the <opm:artifact id="..."> and our recommended job identifiers in the
<opm:process id="..."> part. (Either UUID or URI or something else)
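To illustrate why jobs need identifiers as much as granules do, here
is a rough sketch of the breadth-first provenance search mentioned in
the quoted message. The graph, the identifiers, and the function name
are made up for illustration; this is a plain dictionary, not an OPM
serialization.

```python
from collections import deque

# Hypothetical production graph: each node maps to what it was derived from.
# Identifiers are placeholders; in practice they would be UUIDs or URIs.
derived_from = {
    "granule:L2-001": ["job:J2"],
    "job:J2": ["granule:L1-001", "granule:ancillary-7"],
    "granule:L1-001": ["job:J1"],
    "job:J1": ["granule:L0-001"],
    "granule:ancillary-7": [],
    "granule:L0-001": [],
}

def provenance(start, graph):
    """Breadth-first walk from start back through everything it depends on."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for parent in graph.get(node, []):
            # Skip nodes already queued so shared inputs are visited once.
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return order

history = provenance("granule:L2-001", derived_from)
```

Without the "job:" nodes the walk could not get from one granule to
the granules it was made from, which is the point of giving jobs
their own identifiers.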
Curt