[Esip-preserve] Material for the Collection Structure Breakout at the Winter Meeting
Bruce Barkstrom
brbarkstrom at gmail.com
Tue Dec 24 17:29:19 EST 2013
I realize this is the height of the holiday season, but I've had my
hands full creating the material for the breakout session at the Winter
Meeting. Five files are attached that contain what I've used in putting
it together.
The core of the presentation is contained in the
Revised_Collection_Structure_... . pdf file. This is fairly close to a
PowerPoint or Impress presentation, including fades and similar effects
as points are presented. About half a dozen slides identify topics for
discussion by the attendees. I think it would help our discussion if you
could take a look at these. Note also that I've taken into account a
suggestion that PowerPoint-like presentations should show pictures, not
just long pages with bullets. [See if you can find the two
visualizations of IGBP vegetation types and decide if the underlying
data files are scientifically equivalent.]
The approach I've taken in putting together this material is to try to
create a set of concrete categories for stable objects in an archive's
collection. I've used a taxonomic classification approach that is
intended to produce an operational definition of the categories that
result. There's a preliminary categorization, and I've put together a
taxonomic definition for each of its categories. Only two of these
definitions appear in the main presentation. The remaining categories
have definitions in the file ExampleDefinitions.pdf. That file appears
as a non-working hyperlink in the presentation.
I've also tried to make the categorization crisper by applying a
bookkeeping (or accounting) approach in which accounts keep track of the
items in an archive's inventory. This raises certain labelling issues,
including how numbering and labelling are related in the chart of
accounts for such an inventory system. I've included a spreadsheet of
the account headings (and account numbers) from the U.S. Standard
General Ledger (USSGL). Usually, government agencies use the USSGL
numbers in an OID-like labelling of the accounts that appear in the
chart. Thus, the account labels actually use a hierarchy based on funds,
programs, and projects.
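
To make the OID-like labelling concrete, here is a minimal sketch in
Python. It is only an illustration: the dotted format, the order of the
levels (fund, program, project, then account number), the class name
AccountLabel, and the sample numbers are all assumptions of mine, not
something taken from the USSGL spreadsheet or from any agency's chart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccountLabel:
    """Hypothetical OID-like label: fund.program.project.account."""
    fund: int
    program: int
    project: int
    account: int  # assumed to be a USSGL-style account number

    def dotted(self) -> str:
        # Join the levels from most general to most specific.
        parts = (self.fund, self.program, self.project, self.account)
        return ".".join(str(p) for p in parts)

    @classmethod
    def parse(cls, text: str) -> "AccountLabel":
        parts = [int(p) for p in text.split(".")]
        if len(parts) != 4:
            raise ValueError("expected fund.program.project.account")
        return cls(*parts)

    def is_under(self, prefix: str) -> bool:
        # Compare whole components, so "12.3" does not match "12.30...".
        wanted = prefix.split(".")
        return self.dotted().split(".")[:len(wanted)] == wanted


# Illustrative values only.
label = AccountLabel(fund=12, program=3, project=7, account=1010)
print(label.dotted())
print(label.is_under("12.3"))
```

Comparing whole components rather than raw string prefixes is the one
design choice worth noting: it keeps every account in a fund or program
selectable by its prefix without false matches.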
One issue of definition (what part of a digital file contains Earth
science data) relies on ensuring that the data come from real
observations or measurements of the Earth. A provenance chain that
demonstrates this traceability appears in the PDF file. To understand
the traceability, you'll need to examine the figure at much higher
magnification than the normal view. I recommend that you start by
looking at the page with the figure at 400% magnification and then
increase the magnification gradually to as much as 3200%. I believe that
the PDF viewer should support moving around to various parts of the
diagram. The figure provides a blueprint of a collection structure,
including versions, for one month of processing data from a single
satellite from Level 0 (at the top of the diagram) to Level 3 (at the
bottom). As a note, both CERES and MODIS can produce much more complex
diagrams than this one figure - including millions of files. Folks who
are interested in reconstructing data collections might find it useful
to contemplate the implications of this fact of life in satellite
production. The diagram does include input files of calibration
coefficients and other kinds of numerical data that would need to be
available if someone wanted to reconstruct the original data for
verification of results.
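
The traceability idea can be sketched as a walk back through a product's
inputs. This is a toy example under my own assumptions: the file names,
the dictionary layout, and the two function names are hypothetical and
are not read off the figure, which is far larger.

```python
def trace_inputs(product, inputs_of):
    """Return every file that `product` depends on, directly or not."""
    seen = set()
    stack = [product]
    while stack:
        current = stack.pop()
        for parent in inputs_of.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_traceable_to_observations(product, inputs_of, level0_files):
    """True if the product is, or derives from, a Level 0 file."""
    if product in level0_files:
        return True
    return bool(trace_inputs(product, inputs_of) & level0_files)


# Hypothetical miniature collection: each file maps to its inputs.
inputs_of = {
    "L1_day01": ["L0_day01", "calibration_coefficients"],
    "L1_day02": ["L0_day02", "calibration_coefficients"],
    "L2_day01": ["L1_day01", "ancillary_data"],
    "L2_day02": ["L1_day02"],
    "L3_month": ["L2_day01", "L2_day02"],
}
level0_files = {"L0_day01", "L0_day02"}

print(sorted(trace_inputs("L3_month", inputs_of)))
print(is_traceable_to_observations("L3_month", inputs_of, level0_files))
```

The set returned by trace_inputs is also the list of files (calibration
coefficients included) that would have to be on hand to reconstruct the
product.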
The latter part of the presentation material deals with scientific
equivalence (which probably has strong ties to the contentious term
"uniqueness" and touches on "authenticity"). Because finding algorithms
to test whether two inventoried objects are scientifically equivalent
touches on when text strings are equivalent, I've included a spreadsheet
I've developed over the last half decade or so. It has about 2,000
terms - converted to upper case and laid out alphabetically. The terms
come from a variety of sources - and appear to make a case that
individual developer tribes use their own dialects, with little
intertribal interchange. Since users may not come from any of these
developer tribes, it isn't clear how well users will understand the
"standardized vocabularies".
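
As a small illustration of the string-equivalence question, the sketch
below converts terms to upper case, lays them out alphabetically, and
reports which terms two sources share. The function names and the sample
terms are hypothetical, and treating two strings as equivalent when they
match after this normalization is only one possible rule; it is not a
test of scientific equivalence.

```python
def normalize_term(term):
    """Collapse runs of whitespace and convert to upper case."""
    return " ".join(term.split()).upper()


def alphabetized_terms(terms):
    """Normalized, de-duplicated terms in alphabetical order."""
    return sorted({normalize_term(t) for t in terms if t.strip()})


def shared_terms(vocab_a, vocab_b):
    """Terms that two sources have in common after normalization."""
    common = set(alphabetized_terms(vocab_a)) & set(alphabetized_terms(vocab_b))
    return sorted(common)


# Hypothetical terms from two developer communities.
source_a = ["Albedo", "Cloud fraction"]
source_b = ["ALBEDO", "cloud_fraction"]

print(alphabetized_terms(source_a + source_b))
print(shared_terms(source_a, source_b))
```

Here "Cloud fraction" and "cloud_fraction" do not match, which is the
kind of dialect difference that simple case folding cannot resolve.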
I'm sorry I wasn't able to get this out to you earlier. However, it's
been a fair amount of work to put it together. I hope it will be
understandable and of some interest in trying to resolve some of the
difficult issues we have.
Bruce B.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: AlphabetizedParameterValidsV5.ods
Type: application/vnd.oasis.opendocument.spreadsheet
Size: 95452 bytes
Desc: not available
URL: <http://www.lists.esipfed.org/pipermail/esip-preserve/attachments/20131224/09231eb3/attachment-0001.ods>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ExampleDefinitions.pdf
Type: application/pdf
Size: 55094 bytes
Desc: not available
URL: <http://www.lists.esipfed.org/pipermail/esip-preserve/attachments/20131224/09231eb3/attachment-0002.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Revised_Collection_Structure_Breakout_Session.pdf
Type: application/pdf
Size: 947258 bytes
Desc: not available
URL: <http://www.lists.esipfed.org/pipermail/esip-preserve/attachments/20131224/09231eb3/attachment-0003.pdf>