User Persona Research for the Throughput Annotation Database
University of Michigan School of Information, United States of America
The Throughput project is developing and populating an annotation database that will support the discovery of related-but-distributed research products for Earth science and natural history data. Researchers can use this system to link data resources via unique identifiers, add context, or provide additional information about data, software, and publications. Over time, these annotations build a network of linked data resources.
Crucial to this project's success is the development of annotation functionality and interfaces that meet users' needs and data practices. We are taking a proactive user-centered approach to our design work and conducting user needs assessments as part of our development work. Despite the importance of good user-centered design, user needs assessments and user experience research on Earth science data repositories are scarce.
Our work here can act as a model for other projects seeking to take a proactive approach to user experience research. Though we initially planned to run a series of participatory design workshops and focus groups, in light of COVID-19 stay-at-home orders we pivoted to employing user personas in our design work. Based on prior literature on the behavior of Earth science information users, we have developed 16 user personas and use cases that represent different target users' goals, needs, skill sets, motivations, actions, and frustrations. This poster presents a selection of these personas, as well as the workflow for using them to assess a user's potential experience of the current Throughput system. Through analysis with these personas, we find strong support for the Throughput project's current annotation efforts. We also identify a number of features that Throughput and other projects may wish to consider in further interface development.
Use of StraboSpot in response to Covid-19 pandemic: Using digital tools when the field is out of reach
1University of Wisconsin - Madison, United States of America; 2University of Kansas, United States of America; 3Texas A&M University, United States of America; 4Temple University, United States of America
The Covid-19 pandemic has fundamentally altered how we teach in the geological sciences. It became clear in early March that nearly all field camps, traditionally a six-week outdoor capstone experience for undergraduate majors, would be cancelled. This development presents a major problem, as many programs require field camp for an undergraduate degree, and there is no other commonly accepted capstone experience for all students. Realizing that StraboSpot provided one possible digital solution to this problem, three StraboSpot PIs and one other geoscientist organized the first community virtual meeting on March 23, 2020. This meeting has since led to a community-based effort that includes utilizing digital tools.
StraboSpot, originally designed as a research tool, is being rapidly repurposed as a digital teaching tool. The digital data system is designed to allow researchers to digitally collect, store, contextualize, and share geologic data in both the field and the laboratory. We are quickly attempting to repurpose existing accumulated data so that students can utilize Strabo's functionality to recreate the "exploration," "discovery," and "multiple hypothesis testing" modes that are part of fieldwork. Moreover, we are leveraging other funded NSF grants, specifically on the human-nature science interface, to help cognitively assess the students' learning. The benefit of StraboSpot for developing self-standing field camp modules is that it provides a "full package" in which exercises can be largely self-contained, such that multiple platforms (e.g., GoogleEarth, PowerPoint) do not need to be used simultaneously.
This episode has highlighted the importance of digital data in the curriculum. Part of the crisis results from the lack of digital tools typically utilized in field camps and in the undergraduate curriculum generally. Once field camp experiences are again available, there will still be a place for digital tools in the undergraduate curriculum: lab exercises, remote/online learning, and accessibility needs all benefit from high-quality virtual field exercises.
The Magnetics Information Consortium (MagIC) Data Repository: Interoperability with GeoCodes, EPOS, and Other Information Systems
1Scripps Institution of Oceanography; 2Oregon State University
MagIC (earthref.org/MagIC) is an organization dedicated to improving research capacity in the Earth and Ocean sciences by maintaining an open community digital data archive for rock and paleomagnetic data, with portals that allow scientists and others to access the archive, search, visualize, download, and combine versioned datasets. A recent focus of MagIC has been to make our data more accessible, discoverable, and interoperable to further this goal. In collaboration with the GeoCODES/P418 group, we have continued to add more schema.org metadata fields to our datasets, which allows for more detailed and deep automated searches. We are involved with the Earth Science Information Partners (ESIP) schema.org cluster, which is working on extending the schema.org schema to the sciences. MagIC has been focusing on geoscience issues such as standards for describing deep time. We are also collaborating with the European Plate Observing System (EPOS)'s Thematic Core Service Multi-scale laboratories (TCS MSL). MagIC is sending its contributions' metadata to TCS MSL via DataCite records for representation in the EPOS system. This collaboration should allow European scientists to use MagIC as an official repository for European rock and paleomagnetic data and help prevent the fragmentation of global paleomagnetic and rock data across many separate repositories. By having our data well described by an EarthCube-supported standard (schema.org/JSON-LD), we will be able to more easily share data with other EarthCube projects in the future.
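A minimal sketch of the kind of schema.org JSON-LD record described above. The property names come from the schema.org vocabulary; the identifiers, URL, and values here are hypothetical, not an actual MagIC contribution:

```python
import json

# Hypothetical schema.org Dataset record of the kind embedded in a
# landing page for harvesting by GeoCODES or Google Dataset Search.
record = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example paleomagnetic contribution",
    "description": "Versioned rock and paleomagnetic measurements.",
    "url": "https://earthref.org/MagIC/12345",  # hypothetical landing page
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "temporalCoverage": "2010-01-01/2015-12-31",
    "variableMeasured": [
        {"@type": "PropertyValue", "name": "magnetic declination"},
        {"@type": "PropertyValue", "name": "magnetic inclination"},
    ],
}

# Serialize for embedding in a <script type="application/ld+json"> tag.
jsonld = json.dumps(record, indent=2)
print(jsonld)
```

Crawlers that understand schema.org can then index the dataset's name, coverage, and measured variables without parsing the repository's own formats.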
The Heliophysics and Space Weather Open Knowledge Network: The Convergence Hub for the Exploration of Space Science (CHESS)
1Atmospheric & Space Technology Research Associates, L.L.C.; 2University of California Los Angeles (UCLA); 3Electric Power Research Institute; 4Georgia Institute of Technology (Georgia Tech); 5NASA Goddard Space Flight Center (GSFC)
The growing scale of Earth and space science challenges dictates new modes of discovery: discovery that embraces cross-disciplinary interactions and links between communities, between data, and between technologies.
Nowhere is the challenge more pressing than in the field of Heliophysics, in which solar energy is generated, propagates through interplanetary space, interacts with the Earth's space environment, and poses an immediate threat to our technological infrastructure and human-natural systems (i.e., space weather).
We will present a new project within the National Science Foundation Convergence Accelerator program that represents this new mode of discovery: The Convergence Hub for the Exploration of Space Science (CHESS). Our approach is to semantically link Heliophysics data through a Knowledge Graph/Network (KG). The presentation and discussion will focus on:
- What is a knowledge graph (KG)?
- In what ways are KGs poised to transform Earth and space science?
- The Convergence Hub for the Exploration of Space Science (CHESS) project and bridging to metadata and knowledge architecture efforts in Heliophysics
We will highlight linkages to the NSF EarthCube program and ongoing efforts in the geoinformatics and data science communities across e.g., NSF, NOAA, and NASA.
The Discover framework for domain knowledge supported analysis and interactive visualization of multivariate spatial-temporal data sets
University of Kansas, United States of America
The Discover Framework is a web-based visualization tool built on freeware to enable the visual representation of data and thus aid scientific and societal understanding of Earth systems. Open data sources are coalesced to, for example in the DiscoverWater application, illustrate the impacts of irrigation withdrawals on streamflow. Scientists and stakeholders are informed through synchronized time-series data plots that correlate multiple spatial-temporal data sets, an interactive time-evolving map that provides spatial context, and domain-knowledge-supported trend analysis. Together, these components elucidate trends so that the user can envision the relations between groundwater-surface water interactions, the impacts of pumping on these interactions, and the interplay of climate. Aligning data in this manner enables interdisciplinary knowledge discovery and motivates dialogue about system processes. The Discover Framework has been demonstrated using two field cases. First, DiscoverWater visualizes data sets from the High Plains aquifer, where reservoir- and groundwater-supported irrigation has affected the Arkansas River in western Kansas. Second, DiscoverHABs combines data and model results from Cheney Reservoir outside of Wichita, Kansas, to reveal environmental and biogeochemical patterns in the formation of harmful algal blooms (HABs). The Discover Framework is a powerful tool that can be applied to various Earth resource scenarios, and it is starting to expand its use cases with the goal of making it easier for scientists and stakeholders to gain knowledge from data. Therefore, it is necessary to strengthen the correlative measures among data sets, increase compatibility with other scientific tools, such as those available from EarthCube, and test the efficacy in a range of important Earth systems through the development of new Discover applications.
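The correlative trend analysis described above rests on quantifying how strongly two aligned time series co-vary. A minimal sketch using a Pearson correlation coefficient on synthetic pumping and streamflow values (illustrative numbers, not actual High Plains data):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical aligned annual series: pumping rate vs. streamflow.
pumping    = [10.0, 12.0, 15.0, 18.0, 21.0]
streamflow = [5.0, 4.6, 4.1, 3.5, 3.0]

r = pearson_r(pumping, streamflow)
print(round(r, 3))   # strong negative correlation
```

A strongly negative coefficient is the kind of signal that, combined with the framework's spatial context and domain knowledge, can prompt dialogue about the impact of withdrawals on surface water.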
Simplifying MetPy for Users Through Data Model Choices
1UCAR, United States of America; 2Iowa State University
MetPy is a Python toolkit for meteorology, encompassing tools for reading data, performing calculations, and making plots. As part of the Pangeo project, whose goal is to provide a framework for analyzing earth system model output that scales to petabyte-scale datasets, MetPy serves as a set of domain-specific functionality that rests on a foundation based on other scientific Python libraries, such as numpy and matplotlib.
In order to scale to the needs of large datasets, Pangeo has identified the need to leverage the XArray and Dask libraries as part of this foundation. XArray provides a standard data model for n-dimensional gridded data based on the netCDF data model, similar to the Common Data Model used within the netCDF-Java library. Dask provides a framework for distributed computation that greatly simplifies out-of-core computation, which is necessary for working with petabyte-scale datasets.
This presentation discusses work that has gone on in MetPy to make XArray the core data model in MetPy. This includes making XArray work with MetPy's chosen physical unit library, Pint. Other additions include modifying MetPy's interface for calculation functions to natively accept xarray DataArrays and Datasets, simplifying the information users need to provide.
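The benefit of unit-aware data can be illustrated with a toy stand-in. The small class below is not Pint's actual API, only a sketch of the dimensional checking that a unit library provides, which is what prevents silently mixing incompatible quantities in meteorological calculations:

```python
# Toy illustration of unit checking; NOT Pint's real interface.
class Quantity:
    def __init__(self, magnitude, unit):
        self.magnitude = magnitude
        self.unit = unit

    def __add__(self, other):
        # Refuse to combine values whose units disagree.
        if self.unit != other.unit:
            raise ValueError(f"cannot add {self.unit} to {other.unit}")
        return Quantity(self.magnitude + other.magnitude, self.unit)

    def __repr__(self):
        return f"{self.magnitude} {self.unit}"

temp = Quantity(288.0, "kelvin")
dewpoint = Quantity(283.0, "kelvin")
print(temp + dewpoint)        # units agree, addition succeeds

pressure = Quantity(1013.0, "hectopascal")
try:
    temp + pressure           # units disagree, raises ValueError
except ValueError as exc:
    print(exc)
```

Attaching units (and coordinate metadata, via XArray) to arrays means users pass data once and the library validates it, rather than requiring each caller to track units by hand.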
Seamless Long-Tail and Big Data Access via the EarthCube Brokering Cyberinfrastructure BALTO
1Virginia Tech; 2OPeNDAP; 3University of Colorado, Boulder
The EarthCube BALTO broker (Brokered Alignment of Long-Tail Observations) provides streamlined access to both long-tail and big data using Web Services through several distinct mechanisms. First, we updated the OPeNDAP framework Hyrax, software that serves big data from USGS, NASA, and other sources, with a BALTO extension that automatically tags dataset landing pages with JSON-LD encoding. Therefore, the big data made available through Hyrax are now searchable via EarthCube GeoCODES (formerly P418) and Google Dataset Search. The BALTO broker extension to Hyrax makes thousands of datasets easily searchable and accessible. Second, we focused our efforts on a geodynamics use-case aimed at advancing our understanding of continental rifting processes through the use of an NSF mantle convection code called ASPECT. By addressing this use-case, we implemented a web services brokering capability in ASPECT that allows for remotely accessing datasets via a URL defined in an ASPECT parameter file. Third, through another use-case in ASPECT aimed at testing hypotheses involving global mantle flow, we developed a brokering mechanism for a "plug-in" that accesses netCDF seismic tomography data from the NSF seismology facility IRIS, then transforms it into the format needed by ASPECT to run global mantle flow models constrained by seismic tomography. Fourth, we demonstrate methods to allow any scientist or citizen scientist to make their in-situ IoT-based sensor data collection efforts available to the world. Finally, we are developing a Jupyter Notebook with a GUI that allows users to search Hyrax servers for big datasets and long-tail data. These cyberinfrastructure developments comprise the entire EarthCube BALTO brokering capabilities.
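As a hedged illustration of URL-based brokered access, the sketch below assembles a DAP2-style constrained request of the general kind an OPeNDAP server accepts, where a hyperslab expression selects a subset of a variable; the server address and dataset path are hypothetical:

```python
from urllib.parse import quote

def dap_url(server, dataset, variable, slices, response="ascii"):
    """Build a DAP2-style constrained request URL (illustrative sketch).

    slices is a list of (start, stride, stop) index triples, one per
    dimension, yielding the [start:stride:stop] hyperslab syntax.
    """
    hyperslab = "".join(f"[{a}:{b}:{c}]" for a, b, c in slices)
    constraint = quote(variable) + hyperslab
    return f"{server}/{dataset}.{response}?{constraint}"

url = dap_url(
    "https://balto.example.org/opendap",  # hypothetical Hyrax endpoint
    "tomography/model.nc",
    "vs",                                 # hypothetical shear-velocity variable
    [(0, 1, 10), (0, 1, 359)],
)
print(url)
```

A URL like this, placed in an ASPECT parameter file, is sufficient for the client to request only the subset of the remote dataset it needs rather than downloading whole files.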
Sciunit: A Reproducible Container for EarthCube Community
1DePaul University, United States of America; 2DePaul University, United States of America; 3DePaul University, United States of America; 4Department of Engineering Systems & Environment, University of Virginia, Charlottesville, VA, USA; 5Department of Engineering Systems & Environment, University of Virginia, Charlottesville, VA, USA; 6Department of Civil and Environmental Engineering, Utah Water Research Laboratory, Utah State University, Logan, Utah, USA
The conduct of reproducible science improves when computations are portable and verifiable. A container provides an isolated environment for running computations and thus is useful for porting applications to new machines. Current container engines, such as Linux Containers (LXC) and Docker, however, have a high learning curve, are resource intensive, and do not address the entire reproducibility spectrum consisting of portability, repeatability, and replicability. As part of EarthCube, we have developed Sciunit (https://sciunit.run), which encapsulates application dependencies, i.e., system binaries, code, data, and environment, along with application provenance. The resulting research object can be easily shared and reused amongst collaborators. Sciunit is integrated within HydroShare's JupyterHub CUAHSI notebook environment and is available to the entire community for use. Sciunit is available as a command-line utility. In this poster, we will present three new features in Sciunit which have emerged based on community-provided use cases and discussion. We will: (1) showcase the new Sciunit API, which will allow data facilities to integrate Sciunit as a reproducible environment on portals; (2) show how a Sciunit container can transition to a Docker container and vice versa; and (3) demonstrate the ability to contrast two containers in terms of content and metadata. We will show these capabilities with the Hydrology use case of pySUMMA, a Python API for the Structure for Unifying Multiple Modeling Alternatives (SUMMA) hydrologic model.
Remote Identification of Rocks using Spectral Analysis
1Department of Astronomy and Planetary Science, Northern Arizona University; 2Department of Ecology & Evolution, Stony Brook University
The remote location of Antarctica makes it difficult to gather on-site samples of the various rock types present on the continent. Identification through remote sensing is thus a more readily available method of identifying land cover types and understanding Antarctica's surface. While spectral analysis is already used on an image-by-image basis, it has yet to be used at a wider spatial scale in conjunction with satellite imagery. We created a method of automating the collection and processing of spectral data from WorldView-2 and WorldView-3 satellite imagery in order to categorize the types of rocks present in Antarctica. By comparing these data with the on-site data available in the National Science Foundation Polar Rock Repository (PRR), we hope to identify unique parameters in the satellite data that allow us to categorize the rocks of Antarctica through satellite imagery. While we expect to characterize certain unique rock types using statistical analysis of our available data, this method has limitations, and more work will be needed to produce a more detailed map of the continent.
Remote data processing inside the ASPECT analysis tool
1OPeNDAP; 2Virginia Tech
ASPECT (Advanced Solver for Problems in Earth's ConvecTion) is an analysis tool that simulates convection in the Earth's mantle and in other planets. The BALTO project has extended the ASPECT software so it can read the data used to perform the simulations from the BALTO brokering server. These additions to the ASPECT codebase allow data to be remotely accessed and then processed as if the data were stored on the user's local computer. The additions to ASPECT can be split into two distinct sections: a URL reader and a netCDF reader/translator. The URL reader uses the Data Access Protocol (DAP) to access remote data from supported web servers. Data values are transferred from the BALTO broker and converted within the URL reader plugin to match the format that is expected by the rest of the ASPECT code. Similarly, the netCDF plugin reads data stored using netCDF from the BALTO broker and transforms these data into the sph file format required by ASPECT to perform global mantle convection. The netCDF plugin can use either local or remote data. Once the netCDF data are read, the plugin combines and formats the required variables (longitude, latitude, seismic velocity, and depth). These newly formatted values are then converted into the sph internal representation to be used as spherical harmonic data by ASPECT. The conversion and processing of data all take place within the ASPECT program. Both plugins have been integrated to allow the user to look up remote data in a seamless fashion and broaden the types of data that can be requested by the user.
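The variable-combination step described above can be sketched as follows. The arrays are tiny synthetic stand-ins (not real IRIS tomography data), and the depth-major record layout is illustrative only, not ASPECT's actual sph format:

```python
# Synthetic coordinate axes for a 2 x 2 x 2 tomography grid.
lons   = [0.0, 90.0]
lats   = [-45.0, 45.0]
depths = [100.0, 200.0]          # km

# vs[d][la][lo]: synthetic shear-velocity values on the 3-D grid.
vs = [[[4.5 + 0.01 * (d + la + lo) for lo in range(2)]
       for la in range(2)]
      for d in range(2)]

# Combine the four required variables into one flat record per grid
# point, ordered depth-major, ready for conversion downstream.
records = []
for d, depth in enumerate(depths):
    for la, lat in enumerate(lats):
        for lo, lon in enumerate(lons):
            records.append((depth, lat, lon, vs[d][la][lo]))

print(len(records))   # one record per grid point
print(records[0])
```

The real plugin performs this kind of regridding and reordering internally, so the user never handles intermediate files between the remote netCDF source and the spherical harmonic input.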
QGreenland: Enabling Science through GIS
1National Snow and Ice Data Center, University of Colorado Boulder, United States of America; 2Cooperative Institute for Research in Environmental Sciences, University of Colorado Boulder, United States of America
Geoscientists often spend significant research time identifying, downloading, and refining geospatial data before they can use it for analysis. Exploring interdisciplinary data is even more challenging because it may be difficult to evaluate data quality outside of one's expertise. QGreenland, a newly funded EarthCube project, is designed to remove these barriers for interdisciplinary Greenland-focused research and analysis via an open data, open platform Greenland GIS tool. QGreenland will combine interdisciplinary data (e.g., glaciology, human health, geopolitics, hydrology, biology) curated by an international Editorial Board into a unified, all-in-one GIS environment for offline and online use. The package is designed for the open source GIS platform QGIS. QGreenland will include multiple levels of data use: 1) a fully downloadable base package ready for offline use, 2) additional disciplinary and/or high-resolution data extension packages for select download, and 3) online-access-only data for especially large datasets and regularly updated time series. Software development has begun, and we look forward to discussing techniques to create the best open access, reproducible methods for package creation and future sustainability. We also now have a beta version available for experimentation and feedback from interested users and the Editorial Board. The version 1 public release is slated for fall 2020, with two subsequent annual updates.
As an interdisciplinary data package, QGreenland is designed to aid collaboration and discovery across fields. Along with discussing QGreenland development, we will also provide an example use case to demonstrate the potential utility of QGreenland for researchers, educators, planners, and communities.
Planet Microbe: a platform for marine microbiology to discover and analyze interconnected ‘omics and environmental data
1University of Arizona, United States of America; 2Environmental Genomics and Systems Biology Division, E.O. Lawrence Berkeley National Laboratory; 3Daniel K. Inouye Center for Microbial Oceanography: Research and Education, University of Hawaii,
In recent years, large-scale oceanic sequencing efforts have provided a deeper understanding of marine microbial communities and their dynamics. These research endeavors require the acquisition of complex and varied datasets through large, interdisciplinary and collaborative efforts. However, no unifying framework currently exists for the geoscience community to integrate sequencing data with physical, geological, and geochemical datasets. Planet Microbe is a web-based platform that enables data discovery from curated historical and ongoing oceanographic sequencing efforts. In Planet Microbe, each 'omics sample is linked with other biological and physicochemical measurements collected for the same water samples or during the same sample collection event, to provide broader environmental context. This work highlights the need for curated aggregation efforts that can enable new insights into high-quality metagenomic datasets. Planet Microbe is freely accessible from https://www.planetmicrobe.org/.
Parameters Affecting the Accuracy of Inverse Models of Quartz Crystallographic Preferred Orientation
1Sonoma State University, United States of America; 2Carleton College, United States of America
Crystallographic preferred orientation (CPO) is an important tool for drawing inferences about the deformation histories of rocks. To move beyond qualitative understanding of CPO to rigorous statistical treatment of CPO integrated with other data types and inferred deformation, we require quantitative methods, and we need to understand the strengths and limitations of those methods.
We are building a software tool that performs forward and inverse modeling of CPO according to the Taylor-Bishop-Hill theory. The software includes a method, based on recent advances in orientation statistics, for quantifying the mismatch between predicted and observed CPO. This method allows us to fit deformations to data using maximum likelihood estimation and Bayesian Markov chain Monte Carlo simulation.
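A minimal sketch of the Metropolis-style MCMC machinery mentioned above, fitting a single location parameter to synthetic observations. The Gaussian toy likelihood and all values are illustrative stand-ins, not the actual CPO misfit model:

```python
import math
import random

random.seed(42)

def log_likelihood(theta, data, sigma=1.0):
    """Gaussian log-likelihood of one location parameter (toy model)."""
    return -sum((x - theta) ** 2 for x in data) / (2.0 * sigma ** 2)

def metropolis(data, n_steps=5000, step=0.5):
    """Minimal Metropolis sampler for one parameter with a flat prior."""
    theta = 0.0
    samples = []
    for _ in range(n_steps):
        proposal = theta + random.gauss(0.0, step)
        # Accept with probability min(1, likelihood ratio).
        log_alpha = log_likelihood(proposal, data) - log_likelihood(theta, data)
        if math.log(random.random()) < log_alpha:
            theta = proposal
        samples.append(theta)
    return samples

data = [2.1, 1.9, 2.3, 2.0, 1.8]   # synthetic observations near 2
samples = metropolis(data)
posterior_mean = sum(samples[1000:]) / len(samples[1000:])
print(round(posterior_mean, 1))
```

In the real problem the scalar parameter is replaced by the shear-zone parameters (dip, angle of oblique convergence, intensity) and the Gaussian likelihood by the orientation-statistics misfit between predicted and observed CPO.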
In this presentation, we apply these techniques to synthetic quartz CPO data sets. The data sets, which include realistic uncertainties, are generated from numerical simulations of shear zones with triclinic symmetry. We explore how well the best-fit model recovers the angular (dip, angle of oblique convergence) and magnitude (intensity) parameters of the shear zone. Preliminary results suggest that magnitude is poorly fit because of attractors in the underlying Taylor-Bishop-Hill dynamical system.
Eventually, these statistical methods can be applied to, as well as further refined by, the burgeoning cyberinfrastructure of crystallographic textures hosted by the Strabo data repository. These analytical tools add significant value to our repositories and incentivize participation in the development of these indispensable resources.
Lithospheric Control of Melt Generation Beneath the Rungwe Volcanic Province and the Malawi Rift, East Africa
1Virginia Tech, USA; 2OPeNDAP, USA
The EarthCube BALTO (Brokered Alignment of Long-Tail Observations) project aims to develop new cyberinfrastructure that enables brokered access to diverse geoscience datasets. Toward this BALTO objective, we developed a plug-in for the community-extensible NSF open-source code ASPECT (Advanced Solver for Problems in Earth's Convection) that permits ASPECT to read data from the BALTO server (OPeNDAP's Hyrax open-source data server) over the web. We present a use-case of the BALTO-ASPECT client, which accesses lithospheric structures from the BALTO server to constrain 3-D lithospheric modulated convection (LMC) modeling and melt generation beneath the Rungwe Volcanic Province (RVP) and the Malawi Rift. We test the hypothesis that at least part of the melt feeding the RVP is generated by LMC. In the model, we assume a rigid lithosphere, while for the asthenosphere we use non-Newtonian, temperature-, pressure-, and porosity-dependent creep laws of peridotite. We find that a significant percentage of decompression melt from LMC occurs at a maximum depth of ~200 km beneath the axis of the Malawi Rift, consistent with the location and maximum depth of imaged low-velocity zones. At shallower depths (~100 km), the melting region is focused beneath the RVP, where there is rapid (~3 cm/yr) upwelling. Our results suggest that asthenospheric upwelling due to LMC is the main source of melt beneath the RVP and might also entrain plume-head materials with reported high 3He/4He values. We therefore propose that part of the melt beneath the northern Malawi Rift feeding the RVP can be generated by LMC without requiring plumes impinging on the base of the lithosphere at present. This use-case demonstrates the capability of the BALTO-ASPECT client to accelerate research by brokering input data from the BALTO server for modeling LMC and melt generation.
Interactive visualization, interprocess communication, and legacy software in a busy world of number crunching
1University of Miami; 2UCAR
Our software stack is lovely, but the struggle to identify and build a user community remains. Researchers' time budgets do not favor visualization and interactive work that is not automatically testable code leading to a narrowly replicable digital output object. Even a project-supported PhD student in disciplinary work ended up choosing Matlab-centric conventional workflows and crunched-number results. Meanwhile, many classroom students find Jupyter installation and use too complicated and cumbersome. The increasingly popular solution of remote JupyterHub notebook servers does not allow installation of other software on the server side -- software whose strength, in any case, is that it creates nimble, lucid interactive visualizations by running on the client side. Meanwhile, key project development talent was lost to immigration policies and other vagaries of team recruitment and retention. In short, the project is winding down with less success than we hoped. Results include a working stack of capabilities, both legacy-package improvements and project-fostered synergies, and lessons learned, both scientific and software-related. Our technology stack may yet find its energetic champion, including the PI after this late-project phase of discouragement wears off and the fine capabilities (valued highly by myself) find the right scales and scopes of application for a community of others.
ICEBERG Ahead! Application of Critical Imagery Cyberinfrastructure Calibration Techniques to Investigate Land Surface Properties
1Department of Astronomy and Planetary Science, Northern Arizona University; 2Department of Ecology & Evolution, Stony Brook University; 3Department of Computer Engineering, Rutgers University
One of the greatest challenges to the integration and processing of petabytes of remote sensing data is ensuring that data calibration and processing techniques are sufficient for the application. In this presentation, we highlight the efforts of the Imagery Cyberinfrastructure and Extensible Building blocks to Enhance Research in the Geosciences (ICEBERG) EarthCube project to expand and automate the processing, calibration, analysis, and interpretation of remote sensing data. Specifically, we will highlight the ICEBERG project's work in the coastal and mountainous regions of Antarctica, where field work and surface validation have historically been difficult. This work includes the automation of atmospheric correction techniques, the identification and removal of shadowed terrain, the calibration of multispectral images to surface reflectance, and the spectral parameterization of surfaces to (1) better visualize spectral heterogeneity in satellite images, and (2) correlate these signatures to known and quantifiable surface properties. These efforts are required for investigating regional- and continental-scale geologic and surface processes. We use samples and spectral data from the National Science Foundation Polar Rock Repository (PRR) to validate our calibration efforts and to test the efficacy of our surface reflectance calibration pipeline. These samples are also used to generate spectral parameters to assess the range of spectral heterogeneity observed in the ice-free regions of Antarctica.
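As one illustration of the reflectance-calibration step, the sketch below applies the standard top-of-atmosphere radiance-to-reflectance conversion; the gain, offset, and solar irradiance (ESUN) values are hypothetical placeholders, not actual WorldView calibration coefficients:

```python
import math

def toa_reflectance(dn, gain, offset, esun, sun_elev_deg, d_au=1.0):
    """Convert a raw digital number to top-of-atmosphere reflectance.

    Standard form: rho = pi * L * d^2 / (ESUN * cos(theta_s)), where the
    at-sensor radiance is L = dn * gain + offset, d is the Earth-Sun
    distance in AU, and theta_s is the solar zenith angle.
    """
    radiance = dn * gain + offset                     # W m-2 sr-1 um-1
    sun_zenith = math.radians(90.0 - sun_elev_deg)
    return (math.pi * radiance * d_au ** 2) / (esun * math.cos(sun_zenith))

# Hypothetical coefficients for a single band and pixel.
rho = toa_reflectance(dn=800, gain=0.05, offset=0.0,
                      esun=1550.0, sun_elev_deg=30.0)
print(round(rho, 3))
```

Converting every band to reflectance in this way puts images taken under different illumination conditions on a common scale, which is a precondition for comparing spectral parameters against PRR laboratory spectra.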
Heliophysics Events Knowledgebase support for Heliophysics and Space Weather Research
Lockheed Martin ATC, United States of America
The Heliophysics Events Knowledgebase (HEK) began full operations in 2010 in support of the Solar Dynamics Observatory (SDO) with the purpose of helping researchers navigate the daily 2TB flood of data from its 3 instruments. It consisted of three main components, along with the associated hardware and software infrastructure: an automated Event Detection System (EDS) for identifying features and events in the (primarily) SDO data stream; the Heliophysics Event Registry (HER) for capturing the metadata extracted by the EDS; and the Heliophysics Coverage Registry (HCR) for tracking subsets of the SDO datasets requested by users.
The infrastructure underlying the HER and HCR had previously been prototyped for the Hinode mission, where it was known as the Hinode Observation system, which was, at its base, an implementation of the VOEvent XML standard developed by the International Virtual Observatory Alliance (IVOA).
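A minimal sketch of a VOEvent-style record such as the HER stores. The ivorn and parameter values are hypothetical, and a real VOEvent 2.0 document carries additional required structure (Who, WhereWhen, etc.) beyond this fragment:

```python
import xml.etree.ElementTree as ET

# Hypothetical VOEvent-style record for a detected solar flare.
event = ET.Element("VOEvent", {
    "ivorn": "ivo://helio.example/HER#flare-2020-001",  # hypothetical ID
    "role": "observation",
    "version": "2.0",
})

# The What block carries the event metadata extracted by a detection
# algorithm; parameter names here are illustrative.
what = ET.SubElement(event, "What")
ET.SubElement(what, "Param", {"name": "event_type", "value": "flare"})
ET.SubElement(what, "Param", {"name": "goes_class", "value": "M1.0"})

xml_text = ET.tostring(event, encoding="unicode")
print(xml_text)
```

Because the registry stores events in a common XML schema, new event classes and detection algorithms can be added simply by registering new parameter sets, which is what makes the HEK extensible across missions.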
The HEK team realized that the issues they were addressing for SDO and Hinode would continue to be issues for new missions as Heliophysics entered the era of Big Data and as the Heliophysics System Observatory came into being. They spent considerable effort designing the HEK to be an expandable community resource. The HER can support new event classes, data sources, and algorithms, as well as concepts such as "hypotheses" or meta-events connecting other HEK events, along with community annotation and cross-linking similar to Facebook and DOIs.
This was first put to the test with the addition of the IRIS mission launched in 2013. The HCR was revamped to support the more complex datasets and to enhance and better integrate the HCR search capabilities.
The launch of the next generation of heliospheric missions, including Parker Solar Probe and Solar Orbiter, is revealing challenges in event management and mission coordination for which the HEK approach offers a straightforward solution. Here we present our recent efforts and plans to support these new heliophysics missions as well as the broader needs of heliophysics and space weather research.
Extending Ocean Drilling Pursuits [eODP]: Making Scientific Ocean Drilling Data Accessible Through Searchable Databases
1School of Earth Sciences, University of Bristol, UK; 2Academy of Natural Sciences of Drexel University, USA; 3International Ocean Discovery Program, Texas A&M University, USA; 4Department of Geoscience, University of Wisconsin, USA
Scientific ocean drilling through the International Ocean Discovery Program (IODP) and its predecessors has a far-reaching legacy. These programs have produced vast quantities of marine data, the results of which have revolutionized many geoscience subdisciplines. Yet much of the data remain heterogeneous and dispersed. Each study therefore requires reassembling a synthesis of data from numerous sources, a slow, difficult process that limits reproducibility and slows the progress of hypothesis testing and generation. A programmatically accessible repository of scientific ocean drilling data that spans the globe will allow for large-scale marine sedimentary geology and micropaleontologic studies and may help stimulate major advances in these fields.
The eODP project seeks to facilitate access to and visualization of these large ocean drilling microfossil and stratigraphic datasets. To achieve these goals, eODP will be linking and enhancing existing database structures: the Paleobiology Database (PBDB), and Macrostrat. Over the next three years, eODP will be accomplishing the following goals: (1) enable construction of sediment-grounded and flexible age models in an environment that encompasses the deep-sea and continental records; (2) expand existing lithology and age model construction approaches in this integrated offshore-onshore stratigraphically-focused environment; (3) adapt key microfossil data into the PBDB data model; (4) develop new API-driven web user interfaces for easily discovering and acquiring data; and (5) establish user working groups for community input and feedback. This project is targeting shipboard drilling-derived data, but the infrastructure will be put in place to allow the addition of other shore-based information.
Enhancing Data Quality Assessment Capabilities by Providing Unique, Authoritative, Discoverable, Referenceable Sensor Model Descriptions
1Woods Hole Oceanographic Institution, United States of America; 2Texas A&M Corpus Christi, United States of America
With observational data becoming widely available, researchers struggle to find the information needed to assess whether the data can be reliably used. A small first step toward enabling data quality assessment of observational data is to associate the data with the sensor used to make the observations and to make the sensor description machine-harvestable. In the latest additions to the X-DOMES (Cross-Domain Observational Metadata for Environmental Sensing) toolset, we have created targeted editors for creating SensorML documents that describe sensor models. The team has adjusted its delivery to enable integration of the X-DOMES content with the GEOCODES (JSON-LD/schema.org) EarthCube project. In our poster session, we will highlight the new changes and capabilities and demonstrate the use of the new X-DOMES tools.
Development of Intelligent Databases and Analysis Tools for Heliophysics
1New Jersey Institute of Technology, United States of America; 2Bay Area Environmental Research Institute, United States of America
The scope of the project is to develop and evaluate data integration tools to meet common data access and discovery needs for two types of Heliophysics data: 1) long-term synoptic activity and variability, and 2) extreme geoeffective solar events caused by solar flares and eruptions. The methodology consists of developing a data integration infrastructure and access methods capable of 1) automatic search and identification of image patterns and event data records produced by space- and ground-based observatories, 2) automatic association of parallel multi-wavelength/multi-instrument database entries with unique patterns or event identifiers, 3) automatic retrieval of such data records and pipeline processing for the purpose of annotating each pattern or event according to a predefined set of physical parameters inferable from complementary data sources, and 4) generation of a pattern or event catalog and associated user-friendly graphical interface tools that provide fast search, quick preview, and automatic data retrieval capabilities.
Deep learning methods for bedrock extraction from imagery of Antarctica
University of Colorado, United States of America
The increasing rates of contribution to the global ocean from the Antarctic ice sheet threaten coastal communities worldwide. Most of the mass loss is driven by changes to the ice dynamics within West Antarctica. Recently available large data sets such as the Reference Elevation Model of Antarctica (REMA) and associated imagery provide a means to refine and extract high-resolution modern-day observations of ice heights around nunataks and rock outcrops. Mapping bedrock extents helps establish key anchor points from which we can improve and validate observations about ice changes that affect global sea level.
Here we describe a fully automated Deep Learning pipeline for the semantic segmentation of high-resolution satellite imagery in order to extract bedrock masks in Antarctica. Using a state-of-the-art UNet architecture as the base, we train the network in a supervised manner to predict binary masks denoting bedrock cover. Validating the results against ground truth masks obtained from PGC's LIMA effort, conducted in 2007 using Landsat data, our pipeline achieves an F1 score of 0.75 and a Jaccard coefficient of 0.60. This indicates that our method can effectively map bedrock outcrops across Antarctica with about a 10-fold increase in resolution compared to LIMA. We also delineate the process of obtaining vectorized polygon geometries from these binary image masks, the hardware used, and the scalability of the processes. These can help us validate our results using photographic field evidence. Our bedrock masks provide data suitable for logistics planning in Antarctica and provide invariant search space targets for co-registration of diverse sets of imagery.
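The two reported segmentation scores are internally consistent: for a binary mask, the Jaccard coefficient (intersection over union) and the F1 (Dice) score are related by J = F1 / (2 − F1). A minimal sketch, with illustrative pixel counts chosen to reproduce the reported values:

```python
def f1_and_jaccard(tp, fp, fn):
    """Compute F1 (Dice) and Jaccard (IoU) scores from pixel-level
    true-positive, false-positive, and false-negative counts."""
    f1 = 2 * tp / (2 * tp + fp + fn)
    jaccard = tp / (tp + fp + fn)
    return f1, jaccard

# The metrics are linked by J = F1 / (2 - F1), so an F1 of 0.75
# implies a Jaccard coefficient of 0.75 / 1.25 = 0.60, matching the
# scores reported above. Counts here are illustrative, not real data.
f1, j = f1_and_jaccard(tp=60, fp=20, fn=20)
```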
DataAtRisk.org - Status update and upcoming do-a-thons
1Ronin Institute for Independent Scholarship, United States of America; 2CloudBirst Inc; 3Geological Survey of Alabama; 4Harvard Library; 5University of Notre Dame; 6University of Texas; 7DataAtRisk.org
Introduced at last year's EarthCube Community Meeting, the web-based Data Nomination Tool at DataAtRisk.org responds to the clear need for a community-building application by connecting data in need to data expertise and resources. Data owners, managers, or others nominate assets for targeted preservation action by data management experts and community volunteers, and create a community of Data Heroes who ensure the longevity of data products essential to quality research.
The concept is for the tool to have the following key functions:
Allow data to be submitted (or “nominated”) for rescue via a web form that supports opt-in anonymity for nominators who are whistleblowers or need to protect their identity for other reasons
Enable someone who can help with the data rescue activities to sign up and work on activities that meet their interests/capabilities
Share information about the DataAtRisk.org project’s background and provide ways to contribute/get involved
The tool, still in Alpha, initially focuses on Earth, Environmental, and Space science data with future intent to expand to all research data. Over the past year, considerable progress has been made in fleshing out the interface and underlying systems, particularly in response to user input. Currently three scenarios, including one with environmental data from an IS-GEO workshop event, are being piloted to validate further design assumptions, to gain further feedback on interface usability, and to actually rescue some data!
This poster will describe updates over the past year and upcoming opportunities to test drive the system and (hopefully) rescue some more data!
Data Nomination Tool is created and hosted by CloudBIRST (key contact: Joan Saez). DataAtRisk.org’s current members consist of individuals from Earth Science Information Partners (see ESIP Partners here: https://www.esipfed.org/partners) and representatives from several University Research Libraries.
Data Search and Exploration using EarthCube Data Discovery Studio
1UC San Diego, United States of America; 2US Geoscience Information Network, United States of America; 3University of Hawaii, United States of America
The EarthCube Data Discovery Studio (DDStudio) integrates several technical components into an end-to-end data discovery and exploration system. Beyond supporting dataset search across multiple data sources, it lets geoscientists explore the data using Jupyter notebooks; organize the discovered datasets into thematic collections which can be shared with other users; edit metadata records and contribute metadata describing additional datasets; and examine provenance and validate automated metadata enhancements. DDStudio provides access to 1.67 million metadata records from 40+ geoscience repositories, which are automatically enhanced and exposed via standard interfaces in both ISO-19115 and in schema.org markup; the latter can be used by commercial search engines (Google, Bing) to index DDStudio content. For geoscience end users, DDStudio provides a custom Geoportal-based user interface which enables spatio-temporal, faceted, and full-text search, and provides access to additional functions listed above. Key project accomplishments over the last year include:
- User interface improvements, based on design advice from a Science Gateways Community Institute (SGCI) usability team, who conducted user interviews, performed usability testing, and analyzed a dozen other search portals to identify the most useful features. This work resulted in a streamlined user interface, particularly in presentation of search results and in management of thematic collections.
- The earlier effort to publish DDStudio content using schema.org markup resulted in a significant usage increase. With over 900K records indexed by Google, nearly half of the roughly 1000 unique users per month are now accessing DDStudio via referrals from Google.
- The added ability to harvest and process JSON-LD metadata makes it possible to integrate EarthCube GeoCodes content into DDStudio, and work with this content using DDStudio’s user interface.
- New application domains include joint work with the library community, and interoperation with DataMed, a similar system that indexes 2.3 million biomedical datasets.
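The schema.org publishing path mentioned above relies on embedding a JSON-LD record in each dataset landing page, which commercial crawlers can read. A minimal sketch of such a record; every field value here is a hypothetical placeholder, not actual DDStudio content:

```python
import json

# Illustrative only: a minimal schema.org Dataset record of the kind
# exposed for search-engine indexing. All field values here are
# hypothetical placeholders.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example geoscience dataset",
    "description": "Placeholder description of the dataset.",
    "keywords": ["geoscience", "example"],
    "url": "https://example.org/dataset/123",
}

# Embedded in a landing page as a JSON-LD script block, this markup is
# what crawlers such as Google's read when indexing the record.
markup = '<script type="application/ld+json">\n%s\n</script>' % json.dumps(
    dataset, indent=2
)
```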
CF Conventions for netCDF
1UCAR Unidata, United States of America; 2EUMETSAT; 3Univ. of Washington/JISAO and NOAA/PMEL
The CF (Climate and Forecast) Conventions are a community-developed metadata standard for storing and describing Earth system science data in the netCDF binary data format. Numerous existing FOSS (Free and Open Source Software) and commercial software tools can explore, analyze, and visualize data that is encoded using the CF Conventions. The CF community holds annual workshops to develop, refine, and review enhancements to the CF Conventions and to manage the CF governance and processes.
The EarthCube netCDF-CF project worked with the CF community on the development of extensions to netCDF-CF. Several of these have been accepted into the CF Conventions. Work on these extensions involved broad participation by members of the existing netCDF-CF community as well as members of science domains not traditionally represented in the netCDF-CF community.
This presentation will provide an update of recent work and an overview of CF plans and future activities.
Building a Geological Cyber-infrastructure: Automatically detecting Clasts in Photomicrographs
Sonoma State University, United States of America
To incentivize participation in and contribution to the growth of an earth-science-based cyberinfrastructure, analytical environments need to be developed that allow automatic analysis and classification of data from connected data repositories. The purpose of this study is to investigate a machine learning technique for automatically detecting shear-sense-indicating clasts (i.e., sigma or delta clasts and mica fish) in photomicrographs, and finding their shear sense (i.e., sinistral (CCW) or dextral (CW) shearing). Previous work employed transfer learning, a technique in which a pre-trained Convolutional Neural Network (CNN) was repurposed, and artificially augmented image datasets to distinguish between CCW and CW shearing. Preprocessing images by denoising, a process in which noise at different scales is removed while preserving edges of an image, improved classification accuracy. However, upon randomizing the denoising parameters, the CNN model didn’t converge due to a severe lack of data. While efforts to acquire more labeled data are ongoing, this work compensated for the shortage by implementing a pre-processing “detection” system that automatically crops images to the regions containing the clasts. This is done by utilizing YOLOv3, a CNN-based image detection system that outputs a bounding box around an object of interest. YOLOv3 was trained using 93 photomicrographs containing bounding boxes of 344 shear-sense-indicating clasts. The retrained detector was tested on two sets: set A with 10 photomicrographs containing clasts and set B with 100 photomicrographs not containing clasts. All but one of the clasts in set A were correctly detected with an average confidence score of 96.6%. On set B, 72% of images correctly showed no detected clasts. On the remaining images, where clasts were incorrectly identified, an average confidence score of 78.3% was observed. By utilizing a threshold on the confidence scores, the system could be made more accurate.
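The confidence-threshold idea can be sketched as a simple post-filter over detector output. This is a minimal illustration, not the authors' implementation; the detection tuples stand in for YOLOv3 bounding-box output:

```python
def filter_detections(detections, threshold=0.9):
    """Keep only detections whose confidence meets the threshold.

    Each detection is (x, y, width, height, confidence); these tuples
    are hypothetical stand-ins for YOLOv3 bounding-box output.
    """
    return [d for d in detections if d[4] >= threshold]

# With a 0.9 threshold, a detection near the reported true-positive
# average (0.966) survives, while one near the reported false-positive
# average (0.783) is rejected.
detections = [(10, 20, 40, 30, 0.966), (55, 60, 25, 25, 0.783)]
kept = filter_detections(detections)
```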
Future work involves utilizing the bounding boxes output by the detection system to refine and improve the CNN model for classifying shear sense of clasts in photomicrographs.
Applying Multi-Region Input-Output Analysis to Marine Bioinvasions: A Scientific Paper of the Future (SPF) in Progress
1University of Auckland, New Zealand; 2Cawthron Institute; 3University of Sydney, Australia
The Scientific Paper of the Future (SPF) concept, initiated by the EarthCube-funded OntoSoft project, encourages scientists to publish not only peer-reviewed journal articles, but also all associated data, software (data processing scripts), and computational workflows, in order to enable full science reproducibility. While the SPF concept was originally aimed at geoscientists, it can also be applied to interdisciplinary projects spanning, for example, ecology, economics, and maritime shipping.
Multi-region input-output (MRIO) analysis is a method from economics for analyzing economic interdependencies between different regional entities. Entities can be countries, regions within a country, or groups of countries. MRIO can also be used to analyze other types of interdependencies, such as the environmental impact of one region’s activities on another.
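At the core of input-output analysis is the Leontief model: given a technical-coefficient matrix A (inputs required per unit of output) and a final-demand vector d, total output is x = (I − A)⁻¹ d. A two-region toy sketch, with purely illustrative coefficients rather than real trade data:

```python
# Toy two-region Leontief input-output model: x = (I - A)^-1 d.
# A[i][j] is the input from region i required per unit of output in
# region j; all numbers are illustrative, not real trade data.
A = [[0.1, 0.2],
     [0.3, 0.1]]
d = [100.0, 50.0]  # final demand in each region

# Form I - A and invert the 2x2 matrix directly.
m = [[1 - A[0][0], -A[0][1]],
     [-A[1][0], 1 - A[1][1]]]
det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
inv = [[m[1][1] / det, -m[0][1] / det],
       [-m[1][0] / det, m[0][0] / det]]

# Total output each region must produce to satisfy final demand,
# including indirect demand propagated through inter-regional links.
x = [inv[0][0] * d[0] + inv[0][1] * d[1],
     inv[1][0] * d[0] + inv[1][1] * d[1]]
```

By construction, x satisfies the balance condition x − A·x = d: each region's output covers both final demand and the intermediate inputs demanded by the other region.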
For this project, we use MRIO to analyze the global spread of marine non-indigenous species via cargo ships. Over 90% of global trade occurs by maritime shipping. Along with intended cargo, ships provide a means for marine organisms to move to locations beyond their natural ranges, mainly via hull fouling or in ballast tanks. These species can have harmful ecological and economic impacts at their destinations. By using MRIO to follow the imports and exports of commodities between countries, we can deduce the magnitude of seaborne trade connections based on physical volume of commodity traded, and therefore the magnitude and geographic distribution of marine biosecurity risk.
MRIO model construction involved incorporating a diversity of data types from ecology, economics, and shipping, and has turned out to be a surprisingly complex endeavor. My poster will demonstrate the principles of an SPF by providing a diagram of the computational workflow involved in the model’s construction, including an explanation for each dataset incorporated into the model’s input parameters and each piece of software written to process the data and assemble and run the model.
An ICEBERG Update and Request for Community Input
1Stony Brook University; 2Rutgers University; 3Northern Arizona University; 4University of California - Santa Barbara
The ICEBERG (Imagery Cyber-infrastructure and Extensible Building blocks to Enhance Research in the Geosciences) project (NSF 1740595) aims to (1) develop open source image classification tools tailored to high-resolution satellite imagery of the Arctic and Antarctic to be used on HPDC resources, (2) create easy-to-use interfaces to facilitate the development and testing of algorithms for application to specific geoscience requirements, (3) apply these tools through use cases that span the biological, hydrological, and geoscience needs of the polar community, and (4) transfer these tools to the larger non-polar community.
This paper updates the status and preliminary results of our four pilot use cases: automated detection of Antarctic seals and of penguin colonies, Greenland supraglacial stream delineation, and Antarctic land cover classification. We will provide an overview of the underlying cyberinfrastructure that has been developed for the project, discuss the CI and machine learning lessons learned, and describe the application of this technology to non-polar regions such as Florida coast mapping. Finally, we will lay out the planned software distribution mechanisms and ongoing community support.
A Cyber Workflow for Collection, Management, and Exploration of 40Ar/39Ar Geochronology Data
1Department of Geosciences, University of Wisconsin-Madison, Madison, WI 53706, USA; 2New Mexico Bureau of Geology & Mineral Resources, Socorro NM 87801, USA; 3Department of Geosciences, University of Arizona, Tucson, AZ 85721, USA
The grand challenge of developing a fully integrated 4-D digital Earth model has prompted geochronologists to implement cyberinformatics systems to collect, manage, and share data. The WiscAr lab workflow now features the PyChron data-collection and analysis software and the Sparrow data management system. Both are open-source and extensible, allowing them to evolve in sync with our analytical approaches. PyChron links location and stratigraphic metadata with samples upon receipt; these metadata are automatically retained throughout the process of sample preparation, irradiation, analysis, and data reduction. Once an age is determined, the analytical data and metadata are ingested by Sparrow through a customizable schema-based importing tool that can be adapted for any type of geochronologic data. Sparrow connects our lab-level data to synthetic databases including Macrostrat, Neotoma, and the Paleobiology Database (PBDB), as well as to individual consumers, who can search, navigate, and visualize radioisotopic age data and metadata through the WiscAr Lab Sparrow portal. This pipeline produces data that are accessible, searchable, and reusable, and which can be reassessed in the context of future refinements to decay constants and standard mineral ages.
We demonstrate how the PyChron-Sparrow pipeline generates and spatially represents geochronologic data from several tuffs spanning deposition of lacustrine sediments of the Eocene Wilkins Peak Member of the Green River Formation, WY. We use these data to prototype the integration between the lab workflow and Macrostrat. New 40Ar/39Ar and U-Pb age determinations, considered in parallel with published ages, automatically refine Macrostrat’s stratigraphic age model for the Eocene. Our findings illuminate a pathway for further integration between lab-level data systems and virtual synthetic databases, and the power of an integrated pipeline for exploring stratigraphic problems that are informed by all available geochronologic data.
@HDMIEC RCN Working Group Workshop Report: Machine Learning in Heliophysics and Space Weather Forecasting
1New Jersey Institute of Technology; 2Georgia State University
The PI Team of the EarthCube Research Coordination Network “Towards Integration of Heliophysics Data, Modeling, and Analysis Tools,” in collaboration with the leading team of the @HDMIEC RCN Working Group “Benchmark Datasets for Reproducible Data-driven Science in Solar Physics,” organized a two-day workshop on 16-17 January 2020 at the New Jersey Institute of Technology, Newark, New Jersey, that brought together 44 participants representing data providers, expert modelers, and computer and data scientists. The objective of this working group workshop was to discuss recent critical developments and prospects for applying machine and deep learning techniques to data analysis, modeling, and forecasting in Heliophysics, and to shape a strategy for further developments in the field. The workshop combined plenary sessions featuring invited introductory talks interleaved with open discussion sessions. Here we report on the main outcomes of this RCN activity, as encapsulated in a white paper submitted to the funding agency.
Assimilative Mapping of Geospace Observations (AMGeO): Data Science Tools for Collaborative Geospace Systems Science
1University of Colorado Boulder; 2Virginia Polytechnic Institute and State University; 3Johns Hopkins University / Applied Physics Laboratory
The most dynamic electromagnetic energy and momentum exchange processes between the upper atmosphere and the magnetosphere take place in the polar ionosphere, as evidenced by the aurora. Accurate specification of the constantly changing conditions of high-latitude ionospheric electrodynamics has been of paramount interest to the geospace science community. In response to this community’s need for research tools to combine heterogeneous observational data from distributed arrays of small ground-based instrumentation operated by individual investigators with global geospace data sets, an open-source Python software package and associated web applications for Assimilative Mapping of Geospace Observations (AMGeO) are being developed and deployed (https://amgeo.colorado.edu). AMGeO provides a coherent, simultaneous, and inter-hemispheric picture of global ionospheric electrodynamics by optimally combining diverse geospace observational data in a manner consistent with first principles and with rigorous consideration of the uncertainty associated with each observation. In order to engage the geospace community in collaborative geospace system science campaigns and a science-driven process of data product validation, the AMGeO software is designed to be transparent, expandable, and interoperable with established geospace community data resources and standards. This paper presents an overview of the AMGeO software development and deployment plans as part of a new NSF EarthCube project that started in September 2019.
Comparison of InSAR time series generation techniques as part of the collaborative GeoSciFramework research project
1University of Colorado Boulder, USA (CIRES); 2University of Leeds, UK (COMET); 3UNAVCO, Inc., Boulder Colorado, USA
The GeoSciFramework project (GSF), funded by the NSF Office of Advanced Cyberinfrastructure and NSF EarthCube programs, aims to improve intermediate-to-short term forecasts of catastrophic natural hazard events, allowing researchers to instantly detect when an event has occurred and reveal more subtle, long-term motions of Earth's surface at unprecedented spatial and temporal scales. These goals will be accomplished by training machine learning algorithms to recognize patterns across various data signals during geophysical events and by delivering scalable, real-time data processing capabilities for time series generation. The algorithm will employ an advanced convolutional neural network method wherein spatio-temporal analyses are informed both by physics-based models and continuous datasets, including Interferometric Synthetic Aperture Radar (InSAR), seismic, GNSS, tide gauge, and gas-emission data. The project architecture accommodates increasingly large datasets by building on software packages already proven at scale in internet search and intelligence gathering.
This talk will focus primarily on the Differential InSAR (DInSAR) time-series analysis component, which quantifies line-of-sight (LOS) ground deformation at mm-cm spatial resolution. Here, we compare time series products generated under three different processing techniques. The first is an automated version of InSAR processing using the small baseline subset (SBAS) method, performed in parallel on systems such as Generic Mapping Tool SAR (GMT5SAR) and the Generic InSAR Analysis Toolbox (GIAnT). The second method will resemble the first but will implement different processing systems for performance comparison, using the InSAR Scientific Computing Environment (ISCE) and the Miami InSAR Time Series Software in Python (MintPy). The final strategy, developed by Drs. Zheng and Zebker from Stanford University, concentrates on the topographic phase component of the SAR signal so that simple cross multiplication returns an observation sequence of interferograms in geographic coordinates [Zebker, 2017]. Our results provide high-resolution views of ground motions and measure LOS deformation over both short and long periods of time.
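The quantity underlying all three processing chains is unwrapped interferometric phase converted to line-of-sight displacement via d = −λφ/(4π). A minimal sketch of that conversion; the C-band wavelength used in the example is a typical value, and the sign convention varies between processors:

```python
import math

def phase_to_los_displacement(phase_rad, wavelength_m):
    """Convert unwrapped interferometric phase (radians) to
    line-of-sight displacement (meters): d = -lambda * phi / (4*pi).

    The sign convention (whether motion toward the satellite is
    positive or negative) varies by processor; this follows one
    common choice.
    """
    return -wavelength_m * phase_rad / (4.0 * math.pi)

# One full 2*pi fringe at a typical C-band wavelength (~5.6 cm)
# corresponds to about 2.8 cm of line-of-sight motion.
d = phase_to_los_displacement(2.0 * math.pi, 0.056)
```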
Earth Science Cookbook - Discoverability and Credit through APIs
1University of Wisconsin - Madison, United States of America; 2Consortium for Ocean Leadership; 3Concord University; 4Lamont-Doherty Earth Observatory, Columbia University; 5University of Minnesota; 6Northern Arizona University; 7University of Michigan
The Throughput EarthCube database contains information for over 2000 research data catalogs and several hundred thousand GitHub repositories linked to these resources, along with NSF Awards, language information and annotations of individual data elements.
Throughput is designed to improve discoverability of workflows related to one or more research databases, by providing a search tool for researchers who are engaged in novel research that crosses disciplines. In developing the database (http://throughputdb.org) and associated API (http://throughputdb.org/api/), patterns of use and re-use become clear, and it is possible to provide guidance to individuals developing workflows for research data resources.
This presentation will showcase the current API and user interface for Throughput, patterns of code use and re-use within the database, and lessons learned from the hand curation and scripting tasks that have added databases and data nodes to the repository. We will also indicate the ways in which this kind of knowledge graph can provide insights into the impact and breadth of current projects, and provide additional citability for researchers who are engaged heavily in tool creation.
GeoEDF: An Extensible Geospatial Data Framework for FAIR Science
1Research Computing, Purdue University; 2Lyles School of Civil Engineering, Purdue University; 3Agricultural Economics, Purdue University; 4Agricultural and Biological Engineering, Purdue University; 5Marshall University
Collaborative research in the earth sciences is increasingly conducted online in web-based research platforms or "science gateways". With the growing emphasis on FAIR (Findable, Accessible, Interoperable, Reusable) science to enable reproducible research, science gateways need to ensure that the data products used and produced by such research are compliant with these FAIR principles. In practice, earth science research often involves complex scientific workflows that comprise various data acquisition, pre-processing, analysis, and simulation tasks. However, these workflows are rarely conducted completely in the science gateway environment. They often involve a mix of non-reusable code, desktop tools, and manual, intermediate data staging steps that present a significant challenge to ensuring compliance with FAIR principles. Furthermore, the complexity of using diverse, large quantities of data from remote repositories compounds these challenges. For example, hydrologists and agricultural economists in interdisciplinary research need to not only use diverse data sets for their specialty domain but also connect their computational models through exchange of data. Data sources can range from repositories managed by NASA, USGS, etc., to sensor arrays in smart cities, and crowdsourcing. Due to the inherent massive volume, high dimensionality, heterogeneous formats, and variability in access protocols, researchers often spend a lot of time manually collecting and processing data using custom code, instead of focusing on scientific questions.
We are developing a data framework (GeoEDF) that abstracts away the complexity of acquiring and utilizing data from diverse data providers. Extensible data connectors will implement common data query and access protocols such as HTTP, OPeNDAP, FTP, and Globus, supporting both static and streaming data. Data sources can be configured by simply specifying the data location, authentication, access protocol, etc. Connectors are parameterizable, allowing reuse for subdataset, temporal, and spatial choices. Extensible data processors will implement common and domain-specific geospatial data processing such as resampling, format conversion, or a scientific simulation model. A plug-and-play workflow composer will allow users to string together data connectors and processors into declarative, reproducible workflows that can be executed in heterogeneous environments, directly from a science gateway. Automated metadata extraction and annotation will be integrated into such workflows, supporting FAIR science through ease of data discovery and reproduction. By bringing data to the science, GeoEDF will accelerate data-driven discovery.
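The connector/processor/composer separation described above can be sketched in miniature. This is a hypothetical illustration of the design, not GeoEDF's actual API; all class names, parameters, and URLs are invented for the example:

```python
# Hypothetical sketch of the connector -> processor pipeline described
# above; GeoEDF's actual class and parameter names may differ.

class DataConnector:
    """Acquires data from a configured source (location, protocol,
    auth). Parameterizable, so one connector is reusable across
    subdataset, temporal, and spatial choices."""
    def __init__(self, url, **params):
        self.url, self.params = url, params

    def fetch(self):
        # A real connector would speak HTTP/OPeNDAP/FTP/Globus here;
        # this stub just records what would be requested.
        return {"source": self.url, **self.params}

class DataProcessor:
    """Applies one geospatial processing step (e.g. resampling)."""
    def __init__(self, operation):
        self.operation = operation

    def run(self, data):
        return {**data, "processed_by": self.operation}

def compose_workflow(connector, processors):
    """String a connector and processors into one reproducible run."""
    data = connector.fetch()
    for p in processors:
        data = p.run(data)
    return data

result = compose_workflow(
    DataConnector("https://example.org/data", variable="precip",
                  start="2019-01-01", end="2019-12-31"),
    [DataProcessor("resample"), DataProcessor("format_conversion")],
)
```

Because the workflow is just declared data plus pluggable steps, the same composition can be re-executed unchanged in a different environment, which is the property that supports reproducibility.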
Guidance and Tools for Structured Data on the Web: An update on publishing, validating, and harvesting schema.org and other structured data vocabularies to support FAIR data goals
1Ocean Leadership; 2BCO-DMO, Woods Hole Oceanographic Institution; 3Biodiversity Institute, University of Kansas
The availability of machine accessible structured data on the web and the underlying digital objects, metadata records, and infrastructure are evolving rapidly to fulfill the desire to improve discovery, researcher workflow integration, and FAIR data principles in general. This evolution is exemplified by the emergence of data set centric search tools, evolving notebook and service offerings in the community, and publisher guidance such as that from the Coalition for Publishing Data in the Earth and Space Sciences (COPDESS). Mechanisms such as schema.org and similar technologies offer considerable flexibility for describing structured data resources, and therein lies a key challenge. Without broad agreements or standards of practice, information silos can emerge from inconsistencies in representation of common concepts. Publishers are frustrated by a plethora of competing, apparently similar approaches. Consumers must adjust tooling to work across disciplines.
Data managers, DevOps engineers, and other actors involved in the data pipeline benefit from publishing guidance, validation, and tooling to help address these goals. Developments in the EarthCube and ESIP communities seek to address these demands through community guidelines, best practices, and software tools that assist both publishers and consumers. The ESIP Science-on-Schema work, which evolved from EarthCube, and the recent ESIP Lab-funded 2020 Science-On-Schema.Org Validator focus on assisting publishers through easily applied guidelines with testable validation mechanisms. These are combined with tooling, exemplified by Gleaner (gleaner.io), to test the harvesting and indexing of online structured data resources.
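The validation idea can be illustrated with a toy check over a structured-data record. This is only in the spirit of the validators discussed above; the required-property set shown is illustrative, not the official Science-On-Schema.Org guideline list:

```python
def check_dataset_record(record, required=("name", "description", "url")):
    """Toy structured-data check: report which expected schema.org
    Dataset properties are missing from a parsed JSON-LD record.

    The `required` set here is illustrative only, not the official
    Science-On-Schema.Org guideline list.
    """
    if record.get("@type") != "Dataset":
        return ["@type must be Dataset"]
    return ["missing property: %s" % p for p in required if p not in record]

# A record missing two recommended properties produces two findings.
problems = check_dataset_record({
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example dataset",
})
```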
An overview of these projects, demonstrations, and approaches for engaging and integrating with them will be presented.
Integration of Reproducible Methods into Community Cyberinfrastructure
1Utah State University, United States of America; 2DePaul University; 3University of Virginia
For science to reliably support new discoveries, its results must be reproducible. This has proven to be a challenge in many fields, including fields that rely on computational methods as a means for supporting new discoveries. Reproducibility in these studies is particularly difficult because they require open, documented sharing of data and models and careful control of underlying hardware and software dependencies, so that computational procedures executed by the original researcher are portable, can be run on different hardware or software, and produce consistent results. Despite recent advances in making scientific work more findable, accessible, interoperable, and reusable (FAIR), fundamental questions in the conduct of reproducible computational studies remain: Can published results be repeated in different computing environments? If yes, how similar are they to previous results? Can we further verify and build on the results by using additional data or changing computational methods? Can these changes be automatically and systematically tracked? This presentation will describe our EarthCube project to advance computational reproducibility and make it easier and more efficient for geoscientists to preserve, share, repeat, and replicate scientific computations. Our approach is based on the Sciunit software developed by prior EarthCube projects, which encapsulates application dependencies (system binaries, code, data, environment, and application provenance) so that the resulting computational research object can be shared and re-executed on different platforms. We have deployed Sciunit within the HydroShare JupyterHub platform operated by the Consortium of Universities for the Advancement of Hydrologic Science Inc. (CUAHSI) for the hydrology research community and will present use cases that demonstrate how to preserve, share, repeat, and replicate scientific results from the field of hydrologic modeling.
While illustrated in the context of hydrology, the methods and tools developed as part of this project have the potential to be extended to other geoscience domains. They also have the potential to inform the reproducibility evaluation process as currently undertaken by journals and publishers.
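The question "how similar are repeated results to previous ones?" can be made concrete with a numerical comparison. Below is a minimal sketch (not part of the Sciunit toolchain; the tolerance and comparison metric are assumptions) of checking a repeated run against the original within a relative tolerance:

```python
import math

def results_match(original, repeated, rel_tol=1e-6):
    """Check element-wise agreement between two runs' numeric outputs.

    A bitwise-identical repeat passes trivially; runs repeated on different
    hardware/software stacks may differ in the last few digits, which a
    relative tolerance absorbs. The tolerance value is an assumption.
    """
    if len(original) != len(repeated):
        return False
    return all(math.isclose(a, b, rel_tol=rel_tol)
               for a, b in zip(original, repeated))

# Example: a repeat on a different platform with tiny floating-point drift
run_a = [1.2500000, 3.7000000, 0.0412000]
run_b = [1.2500001, 3.6999999, 0.0412000]
print(results_match(run_a, run_b))  # True
```

A stricter or looser tolerance shifts what counts as a successful replication, which is exactly the kind of choice a reproducibility evaluation would need to document.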
Leveraging BALTO to Utilize IoT Sensors and Citizen Science in Advancing FAIR-Principled Geoscience
1Virginia Tech; 2OPeNDAP; 3Montana Tech; 4University of Maryland Eastern Shore; 5NCAR; 6University of Colorado; 7Ronin Institute; 8National Renewable Energy Laboratory; 9University of Oklahoma
The EarthCube project BALTO (Brokered Alignment of Long-Tail Observations) is a natural fit for surmounting key challenges in citizen deployment of inexpensive sensors to advance the geosciences. The societal benefits of citizen science are potentially great, especially in the context of a changing climate, and the potential economies gained from inexpensive sensors from the Internet of Things (IoT) arena are substantial. However, such benefits and economies are vulnerable to weak links in the long chain that begins at the IoT sensor and ends in a data repository, where geoscientists increasingly expect adherence to the FAIR Guiding Principles: data should be Findable, Accessible, Interoperable and Reusable. This presentation features efforts of a dozen or so technical experts and geoscientists, from a remarkably diverse set of specialties, to strengthen links in the IoT-to-repository chain. Exploiting Free and Open-Source Software, and building on BALTO capabilities for cross-domain interoperability, the work presented will potentially enable workflows that collect inexpensive field observations (at yard, plot, field, and watershed scales) as FAIR-compliant input for modeling and analytic studies in basic and applied fields of agriculture, water-resources management, mountain ecology, hydrology, local air quality, and biogeochemical cycling.
Machine Learning Enhanced Cyberinfrastructure for Understanding and Predicting the Onset of Solar Eruptions
New Jersey Institute of Technology, United States of America
In this talk we present an overview and initial results of this new EarthCube project. Space weather has a significant impact on Earth; it is therefore a critical component of the geosciences and closely related to the understanding and prediction of the Earth system. Coronal mass ejections (CMEs) are among the most important sources of space weather, as they carry a tremendous amount of mass and energy with twisted magnetic fields from the Sun into interplanetary space. It is believed that the topology and evolution of magnetic fields are the determining factors in providing energy storage and triggering solar eruptions. Observations of key parameters such as magnetic fields and flows are critical for understanding and monitoring the non-potentiality and energy content in active regions (ARs) that may power solar flares and CMEs. An extended database covering a large number of events is a critical component in achieving this science goal. We therefore propose to build and utilize a science-enabling infrastructure to characterize solar ARs, apply machine learning tools to predict solar flares and CMEs, and address two key science questions: (1) Which parameters and physical processes are most important for the onset of solar eruptions? (2) What is the accuracy of using these parameters to predict solar eruptions? The project has three components:
(1) We will utilize and interface with the infrastructure developed by some team members under a previous EarthCube project, to include digitized and digital high-resolution H-alpha and white-light data from the Big Bear Solar Observatory (BBSO) from 1970 to the present, current NASA data, as well as legacy data such as SOHO from 1995 to 2011, for a more comprehensive archive of flares and associated ARs. (2) Dynamic non-potentiality properties of ARs will be derived using advanced imaging and machine learning tools. We will use deep learning techniques to trace fibril/loop structures in the chromosphere and corona; combining these with coronal field extrapolation will provide novel parameters to describe non-potentiality in ARs. Considering the evolution of ARs, we will derive two new parameters that may be critically linked to flares and CMEs: flow motions and magnetic helicity injection in flare-productive ARs. (3) Based on flare/CME properties and important parameters derived from hosting ARs, we will further adapt deep learning techniques to predict the occurrence and energy range of flares and CMEs.
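As a rough illustration of component (3), derived AR parameters could feed a scoring rule. The parameter names, weights, and threshold below are placeholders for exposition, not the project's trained deep-learning model:

```python
def flare_score(ar_params, weights=None):
    """Combine AR non-potentiality parameters into a single eruption score.

    ar_params: dict of (normalized) derived parameters, e.g. flow speed and
    helicity injection rate. The names and weights here are illustrative
    assumptions; the real predictors would come from the trained model.
    """
    weights = weights or {"flow_speed": 0.4, "helicity_injection": 0.6}
    return sum(w * ar_params.get(k, 0.0) for k, w in weights.items())

def predict_flare(ar_params, threshold=0.5):
    """Binary flare/no-flare call from the score (threshold is an assumption)."""
    return flare_score(ar_params) >= threshold

# An AR with strong shear flows and helicity injection scores 0.78 -> flare
print(predict_flare({"flow_speed": 0.9, "helicity_injection": 0.7}))  # True
```

A deep-learning classifier replaces the hand-set weights with learned ones, but the input/output contract, AR parameters in, eruption probability out, is the same.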
Role of Atmospheric Rivers in modulating upper ocean salinity variability: A study using Argovis, a next-generation platform for climate data
1University of Colorado Boulder, United States of America; 2University of California San Diego, United States of America; 3University of Maryland
Modern Web 2.0 technologies are revolutionizing Earth system science workflows by improving data accessibility for oceanic and atmospheric products. We have expanded the Argovis web app (argovis.colorado.edu) to include a visualization and data-delivery tool that displays oceanic and atmospheric gridded products, e.g. float trajectory forecasting, oceanic fields, and weather events. A rasterizing component displays gridded products on a web map and charts differences between grids for comparison. The weather event module lets users query Argo profiles co-located with a weather event of interest, using a user-defined co-location strategy. Argovis runs in the browser, so there is no need for large file storage or software installation, and users can import data directly from the app's database through an API for analysis in their preferred programming language.
As an example of how this technology accelerates Earth system science workflows, we have studied Atmospheric River (AR) events co-located with oceanic data. ARs are filamentary structures in the atmosphere that transport water vapor from the tropics to mid-latitudes over the oceans in highly episodic events. We will present results from an analysis of the role of ARs in modulating seasonal-to-interannual variability of upper ocean salinity.
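A co-location query of this kind can be pictured as building a request against the Argovis API. The endpoint path and parameter names below are illustrative assumptions, not the documented interface; consult the Argovis API documentation for the actual routes:

```python
from urllib.parse import urlencode

def build_colocated_query(base_url, start, end, polygon):
    """Build a query URL for Argo profiles inside a weather-event footprint.

    The endpoint path and parameter names are illustrative assumptions,
    not the documented Argovis API. `polygon` is a list of [lon, lat]
    pairs outlining, e.g., an atmospheric-river footprint.
    """
    params = {"startDate": start, "endDate": end, "shape": str(polygon)}
    return f"{base_url}/selection/profiles?{urlencode(params)}"

url = build_colocated_query(
    "https://argovis.colorado.edu",
    "2019-02-01", "2019-02-05",
    [[-130, 35], [-125, 35], [-125, 40], [-130, 40], [-130, 35]],
)
print(url)
```

The response (JSON profiles) can then be loaded directly into the user's analysis language of choice, which is the workflow the abstract describes.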
StraboSpot digital data system: Incorporating the long tail data of the geological field sciences
1University of Kansas, United States of America; 2Texas A&M University, United States of America; 3University of Wisconsin - Madison, United States of America; 4University of Utah, United States of America; 5Penn State University, United States of America; 6University of North Carolina, United States of America; 7Princeton University, United States of America; 8Rensselaer Polytechnic Institute, United States of America; 9MIT, United States of America
The StraboSpot digital data system is quickly expanding to many field-based disciplines in the Geological Sciences. Originally designed for structural geology field data, it allows researchers to digitally collect, store, contextualize, and share geologic data in both the field and the laboratory. With this community, we simultaneously achieved two tasks: 1) designing a digital data system that recreated the workflow of field-based structural geologists, in particular using two main concepts, spots and tags, to organize data; and 2) developing the system iteratively with the communities it serves. It became clear that the approach was sufficiently robust to use with other communities.
We pursued two different efforts to expand StraboSpot simultaneously: 1) including petrology (igneous and metamorphic) and sedimentology field data; and 2) including microscale (thin section scale) and experimental rock deformation data. Our approach to developing the data system for these communities was two-fold: first, the data system must be part of the workflow of doing science; and second, the relevant geologic communities must be involved in the development efforts from initial dreams and formulations through testing and revision. Scientists were engaged through workshops, field trips, and student outreach; see Chan et al. (this meeting) for engagement with the sedimentology community. We have also started working with the community studying volcanic deposits, which was not part of the original plan, because this community was organized and its data fit easily into the StraboSpot framework.
The strength of the StraboSpot platform is its flexibility, which accommodates the needs of a wide range of the geologic community. Utilization of the system by multiple sub-disciplines will allow integration of digital efforts across the geological sciences. The StraboSpot data system, in coordination with other digital data efforts, will allow geologists to conduct new types of science and join big data initiatives.
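The spot-and-tag organization at the heart of StraboSpot can be sketched with a minimal data model. The field names below are illustrative, not StraboSpot's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Spot:
    """A georeferenced observation: the basic unit of StraboSpot-style data.

    Field names here are illustrative assumptions, not the actual schema.
    """
    name: str
    lon: float
    lat: float
    data: dict = field(default_factory=dict)   # measurements, notes, photos
    tags: set = field(default_factory=set)     # conceptual groupings

def spots_with_tag(spots, tag):
    """Tags let related spots be retrieved across a whole field area."""
    return [s for s in spots if tag in s.tags]

outcrop = Spot("BC-01", -110.2, 39.1, {"bedding": "310/25"}, {"Blackhawk Fm"})
sample = Spot("BC-02", -110.3, 39.0, {"lithology": "sandstone"},
              {"Blackhawk Fm", "sampled"})
print([s.name for s in spots_with_tag([outcrop, sample], "Blackhawk Fm")])
```

Spots carry the spatial context; tags supply the flexible grouping that lets each sub-discipline organize the same underlying records in its own way.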
StraboSpot for Sedimentary Geology: Encouraging community involvement and feedback through workshops and field trips
University of Utah, United States of America
An NSF EarthCube-funded project supported a field-based workshop designed to evaluate and refine the sedimentology/stratigraphy portion of the StraboSpot digital data management system. Eleven academics attended the workshop, representing a spectrum of career levels and specialties. The participants teach classes in sedimentology and conduct sedimentary research, but had not previously used digital mobile apps in the field.
The field component focused on learning the basic functionality of the StraboSpot app as a method of collecting digital data in the field. On the first day, teams of 2-3 participants measured a stratigraphic section in a highly visited locality of the well-studied Book Cliffs of central Utah. Teams saw how the vocabulary and spot functionality worked to collect sedimentary field data and to generate stratigraphic columns. The second day was spent measuring a more complex mixed carbonate-clastic sequence in the San Rafael Swell (Utah). Half of the third day was spent discussing major issues with workflow and vocabulary, getting feedback on how to simplify and streamline the descriptive data collection functions (stratal attributes), and reviewing the more challenging interpretation functions (processes, depositional environments, and architecture). A major discussion point was how best to handle data collection and stratigraphic plotting of 'interbedded' intervals. As a result of the workshop, we streamlined workflow options and refined portions of the vocabulary.
This field testing followed up on two previous workshops that solicited expert advice to develop the program categories and basic vocabulary for the sedimentary community. Overall, workshop participants were enthusiastic about the potential of digital data systems and the ability to link annotated photographs and sketches to georeferenced localities. All participants indicated they were inclined to use StraboSpot in both teaching and research, particularly given its versatile and customizable options.
The Sparrow software interface for linking analytical data and metadata in laboratory archives
1University of Wisconsin – Madison, Madison, WI; 2University of Arizona, Tucson, AZ; 3New Mexico Bureau of Geology & Mineral Resources, Socorro, NM; 4Boise State University, Boise, ID
Large-scale, observation-driven digital Earth models are an emerging focus of scientific innovation. Constructing these models requires harmonization of data en masse across traditional domain boundaries. One example is the integration of geologic time into model-driven assessments of global change, which requires the calibration of Earth-system databases (e.g. Macrostrat and Neotoma) against robust global age datasets. Within geochronology, community-level standards and archival facilities have been developed to promote the accessibility and reusability of age data (e.g. Geochron.org and IGSN). However, balkanized data practices have made integrating these endpoints complex and labor-intensive; additional reporting requirements are useful but place further burdens on researchers.
To increase data interchange, reduce strain on laboratory workers, and support a rich ecosystem of data pipelines spanning the geosciences, the EarthCube Geochronology Frontiers project seeks to automate exchanges between labs and data facilities. The centerpiece of this effort is Sparrow (https://sparrow-data.org), a standardized software interface and management layer to the data archive of an individual geochronology laboratory. Sparrow is designed to sit atop current workflows for data collection, reduction, and storage; its application programming interface (API) supports access by end users and centralized archives.
To date, Sparrow has been deployed at laboratories specializing in U-Pb, 40Ar/39Ar, optically-stimulated luminescence, and cosmogenic nuclide dating. Work is ongoing to extend the system across the full archives of several high-throughput laboratory facilities and to streamline data import and management tooling. This project update showcases improvements to the Sparrow user interface that enhance laboratory data-management capabilities. These include a web-based dashboard that enables rich searching and filtering and new tools for linking analytical measurements with geological context and publication metadata. These capabilities, applied to lab archives of tens of thousands of age measurements, demonstrate Sparrow’s concrete and scalable contributions to a linked community data infrastructure.
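The kind of linking described above, attaching publication metadata to analytical measurements in a lab archive, can be sketched in a few lines. The record fields and function below are illustrative assumptions, not Sparrow's actual schema or API:

```python
def link_publication(archive, sample_ids, doi):
    """Attach a publication DOI to selected measurements in a lab archive.

    `archive` is a list of measurement records (dicts); the field names
    are illustrative assumptions, not Sparrow's actual schema.
    """
    for record in archive:
        if record["sample_id"] in sample_ids:
            record.setdefault("publications", []).append(doi)
    return archive

archive = [
    {"sample_id": "KU-001", "technique": "U-Pb", "age_ma": 91.3},
    {"sample_id": "KU-002", "technique": "40Ar/39Ar", "age_ma": 88.7},
]
link_publication(archive, {"KU-001"}, "10.1000/example-doi")
print(archive[0]["publications"])  # ['10.1000/example-doi']
```

Exposing operations like this through a standardized API is what lets end users and centralized archives query a lab's holdings without touching its internal data reduction workflow.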
A path to data-based permeability upscaling in subsurface using Digital Rocks Portal data and deep learning models
Hildebrand Department of Petroleum and Geosystems Engineering, The University of Texas at Austin, United States of America
Transport in porous media is critical to understanding the geological processes of rock formation and to applications such as the management of groundwater resources, carbon sequestration, enhanced oil recovery, and contaminant transport. Typical geological systems are composed of a broad spectrum of porous media, with properties such as permeability (the ability to transmit flow) varying by orders of magnitude. Significant scientific opportunities can be realized by developing methods that study transport within individual porous media datasets (e.g. 2D and 3D images from microscopy or tomography) and by developing approaches to aggregate observations from many different datasets.
Deep learning models, in particular convolutional neural networks (CNNs), currently provide the best solutions in image classification and segmentation. We here present one of the first successful applications of a CNN model for predicting 3D fluid velocity fields in a variety of porous media. All of the data were accessed through the Digital Rocks Portal (https://www.digitalrocksportal.org), developed through NSF EarthCube Grant 1541008. We trained the CNN using a carefully selected set of geometry descriptors, as well as a velocity field simulated using a detailed (time-consuming) simulation, in an unconsolidated granular material with a void fraction (porosity) of 36%. After training, we tested the CNN on different, more consolidated media such as sandstones (in nature, granular materials like sands are consolidated as they are buried deeper, and their porosity reduces to 8-20%). Our work demonstrates the opportunity to apply deep learning models for physical prediction of velocity fields at a single (pore) length/time scale in sands with a tremendous speed-up. This puts data-based upscaling of key subsurface flow properties, such as permeability, within reach in heterogeneous rock formations, but it requires the convergence of good data and easy-to-use simulation tools. The implementation was done using open-source software and will soon be available on GitHub.
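The porosity values quoted above are simply void fractions of the segmented image. Below is a minimal sketch of computing porosity from a binary image, paired with the classical Kozeny-Carman permeability estimate for sphere packs (a textbook correlation, shown here for illustration, not the CNN-based prediction described in the abstract):

```python
def porosity(binary_image):
    """Void fraction of a segmented image, where 1 = pore and 0 = grain.

    `binary_image` is a flat iterable of voxel labels; a segmented 3D
    tomogram would simply be flattened first.
    """
    voxels = list(binary_image)
    return sum(voxels) / len(voxels)

def kozeny_carman_permeability(phi, grain_diameter_m):
    """Classical permeability estimate (m^2) for a packing of spheres:

        k = phi^3 * d^2 / (180 * (1 - phi)^2)

    A textbook correlation, not the CNN prediction in the abstract.
    """
    return (phi ** 3) * grain_diameter_m ** 2 / (180.0 * (1.0 - phi) ** 2)

img = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]   # toy 10-voxel "image", 30% void
phi = porosity(img)
print(phi)                              # 0.3
print(kozeny_carman_permeability(phi, 2e-4))
```

Correlations like Kozeny-Carman break down in consolidated, heterogeneous rocks, which is precisely the gap that data-based (CNN) velocity prediction aims to fill.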
Combining deep learning and SAR to estimate significant wave heights in the New Jersey coastal area
Rutgers, the State University of New Jersey
Ocean waves are important to the Earth system. First, they transport energy and mass, and the resultant sea-surface roughness defines the drag coefficients that transmit wind energy to the ocean (Drennan et al., 2003). Second, they impact the shoreline, changing the shape, erosion, and landscape of coastal areas. Third, storm surge waves cause flood damage in coastal areas. Fourth, recent studies reveal that wetlands are sensitive to wave conditions, which determine the retreat or growth of the coastal ecosystem (Green and Coco, 2007; Mariotti and Fagherazzi, 2010). Finally, human activities rely on wave conditions for marine operations such as fishing, shipping, oil extraction, and offshore construction. We therefore need to understand ocean waves to improve the capability of Earth modeling, protect the coastline, predict storm surge, preserve the coastal ecosystem, and enhance offshore business. This project will explore the application of deep learning to SAR-based coastal wave estimation. HF radar data will be used as ground truth to calibrate and validate the wave height estimator. The developed code will enhance the current capability to process the satellite data and create a new platform for monitoring the coastal environment. The collected data will help further our understanding of the wave spectrum in a coastal environment, and can support other research on related topics, e.g. the interaction of waves with ice sheets, wetlands, shorelines, wind farms, and aquaculture.
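When calibrating a wave-height estimator against HF radar, the standard target quantity is the spectral significant wave height, Hs = 4 * sqrt(m0), where m0 is the zeroth moment of the wave spectrum. A minimal sketch with a made-up single-peak spectrum (the spectrum values are illustrative, not data from this project):

```python
def spectral_moment_0(freqs_hz, spectrum_m2_per_hz):
    """Zeroth moment m0 = integral of S(f) df, via the trapezoidal rule."""
    m0 = 0.0
    for i in range(len(freqs_hz) - 1):
        df = freqs_hz[i + 1] - freqs_hz[i]
        m0 += 0.5 * (spectrum_m2_per_hz[i] + spectrum_m2_per_hz[i + 1]) * df
    return m0

def significant_wave_height(freqs_hz, spectrum_m2_per_hz):
    """Standard spectral definition: Hs = 4 * sqrt(m0)."""
    return 4.0 * spectral_moment_0(freqs_hz, spectrum_m2_per_hz) ** 0.5

# Toy single-peak frequency spectrum (values are made up for illustration)
freqs = [0.05, 0.10, 0.15, 0.20]   # Hz
spec = [0.0, 2.0, 1.0, 0.0]        # m^2/Hz
print(round(significant_wave_height(freqs, spec), 3))  # 1.549
```

Comparing this quantity, computed from SAR-derived spectra and from HF radar ground truth, gives a direct scalar metric for calibrating and validating the deep-learning estimator.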