[Esip-cloud] ESIP Cloud Computing Cluster April Telecon Second Announcement

James Coll jamesmcoll at gmail.com
Tue Apr 19 10:18:46 EDT 2022


Hello ESIP Cloud Computing Cluster Members!  We are excited to welcome
speaker Lucas Sterzinger, a PhD candidate at UC Davis, to present on
[kerchunk](https://fsspec.github.io/kerchunk/) at this month's meeting.
Find the abstract and meeting agenda below.
**Meeting Logistics!**
Topic: Kerchunk tutorial
Speaker: Lucas Sterzinger
Monday, April 25th, 10:00-11:00 am PT / 1:00-2:00 pm ET
https://us02web.zoom.us/j/86535177705?pwd=ay9yVDJ6UzNiSGRMWTFxbkNXdEJXUT09
Meeting ID: 865 3517 7705
Passcode: 354962
Find your local number: [Zoom International Dial-in Numbers](https://us02web.zoom.us/u/knxOPNBj5)
**Abstract:**
Many organizations are moving their data to cloud-hosted object storage,
which allows them greater flexibility in cost, dataset size, access, and
security. For multi-dimensional data, the Zarr format has emerged as a
popular cloud storage format, with consolidated metadata and data chunks
stored in separate objects that allow efficient parallel access.
NetCDF4/HDF5 files have been a community standard for decades and remain an
extremely popular format; however, they do not have consolidated metadata.
Without consolidated metadata, accessing this data requires many small
reads, resulting in poor performance on the cloud. Transforming the vast
existing NetCDF4/HDF5 data archives would require substantial computational
resources and create a duplicate of the dataset, doubling storage
requirements and complicating data version control, provenance, and archive
protocols. A potential solution to this problem is to create a consolidated
metadata file containing the byte-range locations of the data chunks and
use it to access the NetCDF4/HDF5 data. Kerchunk and ReferenceFileSystem,
a new part of the Intake group's fsspec project (local and remote file
system interfaces for Python), perform this task by
creating a JSON file that allows a NetCDF4/HDF5 file to look like a file
system. The data can then be read efficiently using the Zarr library
directly. Using data from the GOES-East satellite hosted on Amazon Web
Services, we demonstrate the effectiveness of this approach and provide a
pathway to improving data access for the vast existing NetCDF4/HDF5 data
archives.
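
To make the byte-range idea in the abstract concrete, here is a minimal, standard-library-only sketch. It is a toy illustration, not kerchunk's or fsspec's API: the file layout, the function names (`write_chunked_file`, `build_references`, `read_chunk`), and the chunk keys are all hypothetical. The reference layout, a JSON mapping from chunk key to `[path, offset, length]`, is assumed here only as a simple way to show how a small sidecar file can replace scanning the original file for its chunks.

```python
import json
import os
import tempfile


def write_chunked_file(path, chunks):
    """Write chunks back to back after a header; return {key: (offset, length)}."""
    index = {}
    with open(path, "wb") as f:
        f.write(b"HEADER--")  # 8-byte stand-in for the file's own metadata
        for key, payload in chunks.items():
            index[key] = (f.tell(), len(payload))
            f.write(payload)
    return index


def build_references(path, index):
    """Build a JSON-serializable mapping: chunk key -> [path, offset, length]."""
    refs = {key: [path, offset, length] for key, (offset, length) in index.items()}
    return {"version": 1, "refs": refs}


def read_chunk(references, key):
    """Read one chunk with a single seek + read, using only the references."""
    path, offset, length = references["refs"][key]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Hypothetical toy data: two 4-byte "chunks" of one variable.
workdir = tempfile.mkdtemp()
data_path = os.path.join(workdir, "toy_data.bin")
chunks = {
    "temperature/0.0": b"\x01\x02\x03\x04",
    "temperature/0.1": b"\x05\x06\x07\x08",
}

index = write_chunked_file(data_path, chunks)
references = build_references(data_path, index)

# Persist the references as a JSON sidecar file, then load them back.
refs_path = os.path.join(workdir, "toy_refs.json")
with open(refs_path, "w") as f:
    json.dump(references, f)
with open(refs_path) as f:
    loaded = json.load(f)

# The original data file is never copied or rewritten; only the byte range is read.
second_chunk = read_chunk(loaded, "temperature/0.1")
```

In this sketch the data file stays untouched and the sidecar JSON carries everything needed to fetch a chunk in one ranged read, which is the property the abstract relies on for cloud object storage.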
**Prior to the meeting, please clone the repo and set up the environment as
outlined in the README at [https://github.com/lsterzinger/2022-esip-kerchunk-tutorial](https://github.com/lsterzinger/2022-esip-kerchunk-tutorial)**
**Agenda:**
* 5-10 minutes - Announcements: ESIP Summer meeting planning, Open Call for
other announcements
  * Fill out the ESIP summer meeting interest poll at [https://forms.gle/iGjBVEjx5nUxkBRY7](https://forms.gle/iGjBVEjx5nUxkBRY7)
* 20-40 minutes - Hands-on Presentation
* 10-20 minutes - Discussion and questions
Hope to see you next week!
Jim