[Esip-preserve] NSF data management & sharing policies

Bruce Barkstrom brbarkstrom at gmail.com
Thu Apr 28 15:47:02 EDT 2016


While a survey of policies is probably of some use, it might be equally
sensible to ask for insight from a more fundamental point of view, such
as how much work a data producer team has to do to get its data into
acceptable or "publishable" condition.  One metric is how long it has
taken similar teams to get their data into what they regard as
"acceptable" condition.  Another is how much data they are producing
over the period of the experiment (perhaps reduced to a rate metric such
as the average number of files produced per day or the peak rate at
which files are being produced).  A third is how much software the team
uses to produce its data (probably measured in Source Lines of Code or
some similar measure).  A fourth is the requirements on uncertainty that
the team must meet.
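
As a rough illustration of the rate metric, here is a minimal sketch (in
Python, with hypothetical dates rather than figures from any actual
project) of how a list of file-production dates might be reduced to an
average and a peak daily rate:

    from collections import Counter
    from datetime import date

    # Hypothetical production dates, one entry per file delivered.
    production_dates = [
        date(2016, 4, 1), date(2016, 4, 1), date(2016, 4, 2),
        date(2016, 4, 5), date(2016, 4, 5), date(2016, 4, 5),
    ]

    files_per_day = Counter(production_dates)
    span_days = (max(production_dates) - min(production_dates)).days + 1

    average_rate = len(production_dates) / span_days  # average files per day
    peak_rate = max(files_per_day.values())           # busiest single day

    print(average_rate, peak_rate)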

Generically, we might rank each of these four into rough bins (a small
sketch of how such binning might be coded follows the lists):

Average time required to remove blunders and correct errors or
inconsistencies (in calendar time):
   Short (1 day to 1 week)
   Moderate (1 week to 1 month)
   Fairly long (1 month to 6 months)
   Long (> 6 months)

Data production rate (measured in files produced or objects collected per
day):
   Small rate (1 per month, i.e. about 1/30 per day, to 1 per day)
   Moderate rate (1 per day to 10 per day)
   Fairly high rate (over 10 per day to 100 per day)
   Industrial-scale rate (over 100 per day)

Size of software used (measured in SLOC):
   Small (< 10,000)
   Moderate (10,000 to 50,000)
   Fairly large (50,000 to 100,000)
   Large (> 100,000)

Required uncertainty (measured as the ratio of the credible interval to
the range of variability):
   Large tolerance for uncertainty (> 10)
   Moderate tolerance (0.1 to 10)
   Fairly small tolerance (0.01 to 0.1)
   Highly stringent tolerance (< 0.01)
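
As a back-of-the-envelope illustration of the binning above, here is a
minimal Python sketch (not part of any existing tool; the project values
are made up, and the bin boundaries are just the rough ones listed
above):

    # Hypothetical scoring of a project against the four rough dimensions.
    def bin_value(value, boundaries, labels):
        """Return the label of the first bin whose upper bound covers value."""
        for upper, label in zip(boundaries, labels):
            if value <= upper:
                return label
        return labels[-1]

    cleanup = bin_value(
        4,                      # months to reach "acceptable" condition
        [0.25, 1, 6],           # ~1 week, 1 month, 6 months
        ["short", "moderate", "fairly long", "long"])

    rate = bin_value(
        250,                    # average files produced per day
        [1, 10, 100],
        ["small", "moderate", "fairly high", "industrial"])

    software = bin_value(
        75000,                  # source lines of code
        [10000, 50000, 100000],
        ["small", "moderate", "fairly large", "large"])

    uncertainty = bin_value(
        0.05,                   # credible interval / range of variability
        [0.01, 0.1, 10],
        ["highly stringent", "fairly small", "moderate", "large tolerance"])

    print(cleanup, rate, software, uncertainty)

The exact boundaries matter less than having an agreed, coarse ordinal
scale for each dimension.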

There are a few other factors that might be useful to allow policies to be
"bendable":
- How much leeway should be allowed for factors outside the control of the
producer,
   such as hardware failures or operational glitches?
- How much physical and mathematical knowledge does a user need in order to
   avoid erroneous interpretation of anomalies in the data?
- How willing are investigators who are not on the original science team to
   contribute without receiving any of the funding that has already been
   allocated to the team?

I wouldn't want to claim to have put a lot of quantitative thinking into the
suggested categories, but I think these or similar measures would be
helpful in
setting up guidelines for more substantive policies.

Bruce B.

On Thu, Apr 28, 2016 at 1:50 PM, Arctur, David K <david.arctur at utexas.edu>
wrote:

> Here’s another compilation I just found for DMP templates across multiple
> agencies: https://dmptool.org/guidance?method=get&scope1=all
>
> -dka
>
>
>
> On Apr 28, 2016, at 12:12 PM, Bruce Barkstrom <brbarkstrom at gmail.com>
> wrote:
>
> The policies vary not just because of the agency differences; they also
> vary because of the practices regarding data distribution
> and quality control.  For example, weather reports are published within a
> few hours of data ingest into the forecasting centers.
> In the cases I've been involved with in NASA's EOS missions, the rate of
> data production for the larger data centers can lie between 10,000 and
> probably well over 100,000 files per day, around the clock.
> The science teams on these projects undertake large-scale
> validation efforts to check data consistency and uncertainty.  Just
> gathering the intercomparison data (checking against in-situ
> measurements, for example, as well as seasonal or longitudinal averages)
> could take three to six months.  The same scale issues
> arise in climate model intercomparisons.  In the case of the investigation
> of Clouds and the Earth's Radiant Energy System (CERES),
> we did release the data (at whatever level) provided the data user was
> willing to sign on to a notice that he or she had read a caveat
> about the use of the data and would report back on findings before
> attempting to publish them.  Of course these kinds of projects
> are different from laboratory experiments in the life sciences, as well as
> other scientific disciplines.
>
> Bruce B.
>
> On Thu, Apr 28, 2016 at 12:27 AM, Arctur, David K via Esip-preserve <
> esip-preserve at lists.esipfed.org> wrote:
>
>> Folks, I have some q’s related to data management policy requirements &
>> oversight, would appreciate your feedback.
>>
>> Reviewing the policies of the individual NSF Directorates shows
>> variations in the policy - for example, Geosciences permits a 2-year
>> hold period *following collection*
>> <http://www.nsf.gov/geo/ear/2010EAR_data_policy_9_28_10.pdf> while
>> Engineering permits a 2-year hold following publication
>> <http://nsf.gov/eng/general/ENG_DMP_Policy.pdf>. Even within Geosciences
>> there is variation (e.g. the Division of Ocean Sciences requires that data
>> be archived in a specific National Data Center).
>>
>> Some questions that come up for me include:
>>
>>    - Which aspects of the current data management and sharing policy
>>    have been most effective and which aspects does the NSF hope to improve?
>>    - How does the NSF monitor whether researchers manage and share their
>>    data consistently with the policies and with their proposals?
>>    - How much flexibility do PIs of ongoing research projects have in
>>    how they satisfy data management and sharing requirements, in relation to
>>    what they included in their proposal?
>>    - Re: Engineering Directorate - I understand that data that could
>>    lead to commercialization is not subject to the publication requirement. Is
>>    this defined when a proposal is made? How has this been observed and used
>>    in practice?
>>
>> Thanks for your help and thoughts on this.
>>
>> Best, dka
>> David K Arctur
>> Research Scientist & Fellow, University of Texas at Austin
>>
>>
>>
>>