[Esip-preserve] On Earth Science Data File Uniqueness

Curt Tilmes Curt.Tilmes at nasa.gov
Wed Feb 16 10:24:26 EST 2011


On 02/16/11 09:21, Bruce Barkstrom wrote:
> Actually, I've had at least one instance that I would classify as
> "malicious".
> In ERBE, we were honest about recording instances in which we couldn't
> observe "clear skies".  Other researchers took our data, modified the values
> in those cases and then "published" their revisions as "ERBE" data.  While
> the republishers were not the security community's "black hats", the effect
> was the same.

Our typical scenario for publishing data would be the archive making
available a data granule:

ftp://aurapar2u.ecs.nasa.gov/data/s4pa///Aura_OMI_Level2/OMTO3.003//2011/047/OMI-Aura_L2-OMTO3_2011m0216t0444-o35051_v003-2011m0216t125137.he5

and its metadata:

ftp://aurapar2u.ecs.nasa.gov/data/s4pa///Aura_OMI_Level2/OMTO3.003//2011/047/OMI-Aura_L2-OMTO3_2011m0216t0444-o35051_v003-2011m0216t125137.he5.xml

which includes a field for size in bytes: 38892592
and CRC32 checksum: 148881205

After downloading that granule from the first link, and the metadata
in the second link, I re-calculate the checksum and verify that it
matches.  If it does, I have increased my confidence that the content
the archive thinks it sent to me and the content I think I received
are the same.
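
A minimal sketch of that check in Python. It assumes the archive's
CRC32 is the zlib variant reported as an unsigned decimal; the metadata
does not say which variant is used, and POSIX cksum, for one, would
give a different number. The function names and the local file name
are hypothetical.

```python
import zlib


def size_and_crc32(path, chunk_size=1 << 20):
    """Return (size in bytes, zlib CRC-32) of a file, read in chunks."""
    crc = 0
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)  # running CRC over all chunks so far
            size += len(chunk)
    return size, crc


def matches_metadata(path, expected_size, expected_crc32):
    """True if the local file agrees with the size and checksum in the metadata."""
    return size_and_crc32(path) == (expected_size, expected_crc32)


# Hypothetical usage with the values quoted from the metadata file:
# matches_metadata("granule.he5", 38892592, 148881205)
```

A match only says that what arrived is what the metadata describes. It
says nothing about whether the metadata itself can be trusted.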

With a simple checksum like CRC32, it is trivial for a malicious agent
at the DISC to replace the content on the server side with some
content that has the same CRC32 checksum.
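
To show how little work that takes, here is a sketch of the well-known
CRC reversal trick: four bytes appended to arbitrary content are enough
to force any CRC32 value. It assumes the zlib variant of CRC32, and the
function name forge_suffix is hypothetical. It makes no attempt to keep
the file size the same or to keep the result a valid data file.

```python
import struct
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial used by zlib


def _make_table():
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        table.append(c)
    return table


TABLE = _make_table()
# The top byte of each table entry is unique, so it identifies the entry.
INDEX_BY_TOP_BYTE = {entry >> 24: i for i, entry in enumerate(TABLE)}


def forge_suffix(prefix, target_crc):
    """Return 4 bytes such that zlib.crc32(prefix + suffix) == target_crc."""
    state = zlib.crc32(prefix) ^ 0xFFFFFFFF  # CRC register after prefix
    wanted = target_crc ^ 0xFFFFFFFF         # CRC register we must end on
    # Step the register backwards over four zero bytes.
    for _ in range(4):
        idx = INDEX_BY_TOP_BYTE[wanted >> 24]
        wanted = (((wanted ^ TABLE[idx]) << 8) & 0xFFFFFFFF) | idx
    # Feeding bytes D from `state` equals feeding zeros from `state ^ D`.
    return struct.pack("<I", wanted ^ state)


original = b"honest data"
tampered = b"modified data"
forged = tampered + forge_suffix(tampered, zlib.crc32(original))
assert zlib.crc32(forged) == zlib.crc32(original)
```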

With a 'broken' algorithm like MD5 or SHA-1, that malicious agent
might, with considerable work, be able to do the same.  The known
attack methods are so constrained that it may be difficult to come up
with valid data at all, but it is conceivable.

With an as-yet un-broken algorithm like SHA-256 or SHA-512, it is
computationally infeasible for that malicious agent to do the same.
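
Computing such a digest costs the user no more effort than a CRC32. A
sketch using Python's standard hashlib, assuming the archive were to
publish a hex digest alongside the granule (the metadata file above
does not carry one); the function name file_digest is hypothetical.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Hypothetical usage:
# file_digest("granule.he5")              # SHA-256
# file_digest("granule.he5", "sha512")    # SHA-512
```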

Of course, in this scenario, that malicious agent could easily replace
both the content and the metadata, including a new checksum to match.

Others who had already downloaded the metadata would still have the
old, correct copy, so conceivably someone could catch the problem and
investigate.  If the malicious party replaced the data before it was
sent out, you're out of luck.  That malicious party could probably
just mess with the algorithm to make bad data to begin with, too.

Curt

