[Esip-preserve] a new relation type for subset citations

Greg Janée gjanee at ucsb.edu
Fri Jul 17 14:52:01 EDT 2015


Andreas Rauber via Esip-preserve <esip-preserve at lists.esipfed.org> wrote:
> As Ruth already pointed out, the primary goal was to find a solution to one part of the challenge, and this has to be a solution that can be practically deployed without prohibitive effort or changes to existing operations.

To return to the original topic of this thread, I think it's always worthwhile to examine the implications of a proposal, and it was in that vein that I brought this topic up.  For the dynamic data citation proposal in question, the implications are that queries will need to be stored (somewhere, somehow) and that PIDs will need to be created and thereafter managed to be able to persistently refer to those queries.  The influx of new PIDs will result in citation dilution---we won't recognize and properly aggregate references to the same dataset---unless the PIDs are related.  My assertion is that this calls for a new type of relation, because these query PIDs are playing a new and distinctive role in the citation ecosystem.

Offline somebody asked me, so what's wrong with the IsPartOf relation?  While we can interpret IsPartOf as a mathematical subset, I think in common usage we most often use "part" in reference to a hierarchical decomposition of some kind.  But for these proposed query PIDs, that interpretation really breaks down, because the query PIDs are going to look like this:

PID #1: granule set { 1, 3, 17, 95, 214, ... }
PID #2: granule set { 3, 95, 514, 925, ... }
PID #3: granule set { 1, 3, 95, 214, 885, ... }
PID #4: "everything west of the Rockies"
PID #5: "everything west of the Rockies, before 2004"
PID #6: "east of the Rockies, after 2012"
PID #7: "some geographic coordinates N/S/E/W, a date range, and cloud-free"
...
ad infinitum
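
To make the "not a hierarchy" point concrete, here is a minimal sketch that compares the first three granule sets pairwise. It uses only the granule numbers visible above (the elided members are left out), and the helper name classify is made up for illustration.

```python
from itertools import combinations

# Only the granule numbers shown in the examples above; the elided
# members ("...") are deliberately left out, so this is illustrative only.
granule_sets = {
    "PID #1": {1, 3, 17, 95, 214},
    "PID #2": {3, 95, 514, 925},
    "PID #3": {1, 3, 95, 214, 885},
}

def classify(a, b):
    """Describe how two granule sets relate to each other."""
    if a <= b or b <= a:
        return "nested"        # one set wholly contains the other
    if a & b:
        return "overlapping"   # shared granules, but neither contains the other
    return "disjoint"

# Compare every pair of query PIDs.
pair_kinds = {
    (x, y): classify(granule_sets[x], granule_sets[y])
    for x, y in combinations(granule_sets, 2)
}

for pair, kind in pair_kinds.items():
    print(pair, kind)
```

On the visible members, every pair overlaps and no set contains another, so the sets do not arrange themselves into a tree of parts.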

While these all might be subsets of the same dataset, I maintain that calling them "parts" of the dataset is stretching our use of the term.  This is particularly apparent if we consider the inverse relation, HasPart.  Do we really want to say that a dataset has the above "parts"?  Hence my suggestion of IsCitedSubsetOf.
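
As a rough sketch of how such a relation could counter citation dilution, the following rolls citations of query PIDs up to the dataset they were drawn from. Everything here is assumed for illustration: the PID strings, the triple layout, and the function name aggregate are hypothetical, and IsCitedSubsetOf is the relation proposed in this message, not an existing registered relation type.

```python
from collections import Counter

# Hypothetical relation records: (query PID, relation type, dataset PID).
relations = [
    ("pid:query-1", "IsCitedSubsetOf", "pid:dataset-A"),
    ("pid:query-2", "IsCitedSubsetOf", "pid:dataset-A"),
    ("pid:query-3", "IsCitedSubsetOf", "pid:dataset-B"),
]

# Hypothetical PIDs as they appear in the reference lists of papers.
citations = [
    "pid:query-1",
    "pid:query-2",
    "pid:dataset-A",
    "pid:query-1",
    "pid:query-3",
]

def aggregate(citations, relations):
    """Count citations per dataset, following IsCitedSubsetOf links."""
    parent = {
        subject: obj
        for subject, relation, obj in relations
        if relation == "IsCitedSubsetOf"
    }
    # A PID with no recorded parent is counted under itself.
    return Counter(parent.get(pid, pid) for pid in citations)

counts = aggregate(citations, relations)
print(counts)
```

Without the relation records, the five citations would be spread over four unrelated PIDs; with them, four are credited to one dataset and one to the other.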

This doesn't get at the other issues that the dynamic data citation WG is addressing, e.g., versioning.  Perhaps we need additional relation types, or perhaps IsCitedSubsetOf satisfies those cases as well; I'm not sure.

-Greg


