[Esip-preserve] Some Thoughts on OPM
Bruce Barkstrom
brbarkstrom at gmail.com
Fri Dec 10 09:41:53 EST 2010
I downloaded the OWL file for OPM and brought it up
in Protege (3.4.4). After a bit of playing and taking some
time to think about what's in the model, here are a few thoughts:
1. It looks like OPM has a reasonable structure for
handling graphs, which is sensible given its origins
in workflows.
2. The nodes in the graph appear to be unstructured, meaning
that in raw form, all nodes are treated the same.
As a result, structural typing of the nodes will have
to be introduced from "outside" OPM. As a concrete
example, this aspect of OPM would not distinguish
between different data products or different versions
of a Data Set from a particular instrument, nor between
Data Sets produced by different sources that cover
the same observation time interval.
3. As an analogy for the concerns this raises, you
could imagine building an OPM graph for the components
used in creating a house. At the atomic level of description
in OPM, you'd have individual bricks, boxes of nails,
collections of boards, boxes of shingles, and so on.
When you get to the description of how the house gets
built, you'd wind up with something like "attach board X
to board Y using a nail from box B". This isn't a very useful
description, which is why blueprints come as hierarchies.
Someplace in the description of a house, we need an
"architectural view", an "assembly view" and a "sub-assembly
view", probably with a "plumbing blueprint" and an "electrical
wiring blueprint". From my perspective, this kind of hierarchical
description of multi-faceted objects (in the data search sense
of facet, rather than the more limited use of the term to refer
to data types and such in ontology-building tools) is
absolutely critical. Without it, it's probably impossible to
control data production or to provide meaningful
search capability to users.
4. The "timestamp" in OPM is probably not a useful
label for files (or database entries) unless you're doing
inventory tracking or schedule tracking. For Earth science
data, particularly as used in climate, the key dates are
those that describe the start and end of the observation.
The time the data appear in public is not as important
in the long run as when the Earth was being observed.
This is one of those aspects of Earth science data use
that differs from the library community's experience with
books and serials, where the publication date distinguishes
between entities. In the biomedical field, where a fair amount
of OPM experience arises, I would expect that the time of
observation and the timestamp when the data are placed
in the repository are so close that they can be treated as
nearly the same. In the Earth sciences, it can be years between
the time of observation and the time of "publication". What
matters is the observation time, not the hour, day, month,
or year of public visibility.
5. In tracking time, the scientific basis should be referred
back to the original sources, which are based on the National
Bureau of Standards and the International Astronomical Union.
While ISO 8601 is the convention used by the IT community for data
transfers, data time stamping for observation times should probably
use Astronomical Julian Date and Coordinated Universal Time
(UTC) [see the Explanatory Supplement to the Astronomical
Almanac, as well as the current edition of the Astronomical
Almanac]. Note that ISO 8601 doesn't really allow for times before
the year 0000 (1 B.C.E.) unless both parties agree in advance on an
expanded year format. Also, with Julian Date and UTC, you can calculate
where the Sun is in the sky if you have longitude and latitude.
Ditto for satellite positions if you've got the ephemeris.
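To make point 2 concrete, here is a minimal Python sketch. It assumes a hypothetical typing layer kept outside OPM; `Artifact`, `DataSetFacets`, `FACETS`, and `group_by` are illustrative names and are not part of OPM or any OPM toolkit. The OPM-style node carries only an identifier. Everything that distinguishes product, version, source, and observation interval lives in the external registry.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Artifact:
    """An OPM-style artifact node: an identifier and nothing else."""
    node_id: str

@dataclass(frozen=True)
class DataSetFacets:
    """Hypothetical structural type supplied from outside OPM."""
    product: str
    version: str
    source: str
    obs_start: date
    obs_end: date

JAN_2005 = (date(2005, 1, 1), date(2005, 1, 31))

# Hypothetical external registry keyed by OPM node id.
FACETS = {
    "a1": DataSetFacets("Radiance", "v1", "Instrument-X", *JAN_2005),
    "a2": DataSetFacets("Radiance", "v2", "Instrument-X", *JAN_2005),
    "a3": DataSetFacets("Radiance", "v1", "Instrument-Y", *JAN_2005),
}

def group_by(nodes, key):
    """Group node ids by a key computed from each node's external facets."""
    groups = {}
    for node in nodes:
        groups.setdefault(key(FACETS[node.node_id]), []).append(node.node_id)
    return groups

nodes = [Artifact("a1"), Artifact("a2"), Artifact("a3")]

# Same observation interval: OPM-level information alone cannot separate these.
by_interval = group_by(nodes, lambda f: (f.obs_start, f.obs_end))
# The external facets separate versions and sources.
by_source_version = group_by(nodes, lambda f: (f.source, f.version))
```

All three artifacts cover the same January 2005 interval, so grouping by interval lumps them together; only the external facets separate the two Instrument-X versions from the Instrument-Y data set.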
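The hierarchy of views in point 3 can be sketched as a tree whose leaves are the atomic, OPM-level steps. This is a minimal Python illustration, assuming each view simply contains sub-views or atomic step descriptions. `View` and `describe` are hypothetical names, and the house data follows the analogy in point 3.

```python
from dataclasses import dataclass, field

@dataclass
class View:
    """One level of description; children are sub-views or atomic step strings."""
    name: str
    children: list = field(default_factory=list)

def describe(view, depth):
    """List what is visible when `view` is expanded `depth` levels down.

    depth 0 shows only the view's own name. Sub-views are expanded
    recursively; atomic steps appear as soon as their parent is expanded.
    """
    if depth == 0:
        return [view.name]
    out = []
    for child in view.children:
        if isinstance(child, View):
            out.extend(describe(child, depth - 1))
        else:
            out.append(child)
    return out

# Hypothetical house hierarchy following the analogy in point 3.
house = View("House", [
    View("Framing", [
        View("Wall A", [
            "attach board X to board Y using a nail from box B",
            "attach board Y to board Z using a nail from box B",
        ]),
    ]),
    View("Plumbing", ["connect pipe P1 to fitting F1"]),
])

print(describe(house, 1))  # ['Framing', 'Plumbing']
```

The "assembly view" is `describe(house, 1)`; only at the deepest level do you get the board-and-nail steps that a flat OPM graph would record.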
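For point 4, here is a minimal sketch of keeping the observation interval separate from the repository timestamp and searching on the former. `Granule`, `overlaps`, `utc`, and the sample records are hypothetical, and the observation interval is assumed to be half-open, [obs_start, obs_end).

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Granule:
    granule_id: str
    obs_start: datetime   # when the Earth was being observed
    obs_end: datetime     # exclusive end of the observation interval
    published: datetime   # OPM-style timestamp: when the file became visible

def overlaps(g, start, end):
    """True if [g.obs_start, g.obs_end) overlaps the query interval [start, end)."""
    return g.obs_start < end and start < g.obs_end

def utc(y, m, d):
    return datetime(y, m, d, tzinfo=timezone.utc)

# Hypothetical records: g1 was observed in 1998 but published in 2003.
granules = [
    Granule("g1", utc(1998, 7, 1), utc(1998, 8, 1), utc(2003, 5, 12)),
    Granule("g2", utc(2002, 1, 1), utc(2002, 2, 1), utc(2002, 3, 1)),
]

hits = [g.granule_id for g in granules
        if overlaps(g, utc(1998, 7, 15), utc(1998, 7, 16))]
published_1998 = [g.granule_id for g in granules if g.published.year == 1998]
```

The granule observed in July 1998 is found by an observation-time search, while a search on publication year 1998 finds nothing.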
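The Julian Date recommendation in point 5 can be illustrated with the standard calendar-to-Julian-Date conversion popularized by Jean Meeus in Astronomical Algorithms. This is a sketch, not a reference implementation. `julian_date` is a hypothetical helper. It ignores leap seconds and the differences between UTC and other time scales. It also uses astronomical year numbering (year 0 is 1 B.C.E.), so it reaches back to JD 0 at noon on 1 January 4713 B.C.E. (Julian calendar).

```python
import math

# First day of the Gregorian calendar; earlier dates are read as Julian-calendar dates.
GREGORIAN_START = (1582, 10, 15)

def julian_date(year, month, day, hour=0, minute=0, second=0.0):
    """Julian Date for a calendar date and time of day, after Meeus's algorithm.

    Hypothetical helper. `year` uses astronomical numbering (0 = 1 B.C.E.,
    -1 = 2 B.C.E.). Intended for JD >= 0, i.e. year >= -4712. Leap seconds
    are ignored.
    """
    gregorian = (year, month, day) >= GREGORIAN_START
    # Fold the time of day into a fractional day.
    d = day + (hour + minute / 60 + second / 3600) / 24
    # Treat January and February as months 13 and 14 of the previous year.
    y, m = (year - 1, month + 12) if month <= 2 else (year, month)
    # Gregorian leap-year correction; zero for the Julian calendar.
    if gregorian:
        a = y // 100
        b = 2 - a + a // 4
    else:
        b = 0
    return (math.floor(365.25 * (y + 4716)) + math.floor(30.6001 * (m + 1))
            + d + b - 1524.5)

# J2000.0 epoch: 2000-01-01 12:00 gives JD 2451545.0.
n = julian_date(2000, 1, 1, 12) - 2451545.0  # days since J2000.0
```

Days since J2000.0 (`n` above) is the time argument used by common low-precision solar-position formulas. By comparison, Python's ISO 8601-style `datetime` type only covers years 1 through 9999.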
Bruce B.