Conference Agenda


Session Overview
Poster presentations 1
Tuesday, 14/June/2022:
4:00pm - 5:30pm


OpenMindat: Transforming crowd-sourcing data into scientific discoveries in geoscience

Xiaogang Ma1, Jolyon Ralph2, Anirudh Prabhu3, Shaunna M. Morrison3, Robert M. Hazen3

1University of Idaho, United States of America;, United Kingdom; 3Carnegie Institution for Science, United States of America

Mindat is a community-driven, free-access, online database platform that records information about all known mineral species and their worldwide distribution. Started as a private database project in 1993, it has been running on the internet since 2000 and has since become a powerful resource for both education and scientific research. The OpenMindat project will, for the first time, allow automated querying and downloads from this data resource for academic research. This effort will involve moving all appropriate data under an Open Science-compatible license, building and operating a web-based platform for both automated queries and bulk data downloads, preparing all documentation on the use of these data, and building a suite of developer tools, including Python and R packages for direct data access from workflow platforms. The Mindat database has been used effectively in data-driven geoscience research since 2010 and has underpinned hundreds of scientific publications every year, but this has required Mindat to prepare individual agreements with researchers and to provide data extracts manually. The new open data developments in OpenMindat will allow researchers to access the data without such barriers, giving a powerful boost to many potential research topics in geoscience. This work is supported by the National Science Foundation EarthCube program (NSF #2126315).

The IUGS Deep-Time Digital Earth Program: Understanding the Past to Illuminate our Future

David A. Leary1, James Ogg2, Robert M. Hazen3, Jennifer McKinley4, Roland Oberhänsli5, Natarajan Ishwaran6, Susan Nash7, Chengshan Wang8

1IUGS Deep-time Digital Earth, United States of America; 2Purdue University, Dept. Earth, Atmospheric, and Planetary Sciences, Indiana; 3Carnegie Institution for Science, Earth and Planets Laboratory, Washington, DC; 4Queen’s University, School of Natural and Built Environment, Belfast; 5University of Potsdam, Institute of Earth and Environmental Sciences, Potsdam; 6IUGS Deep-time Digital Earth, Secretariat, Suzhou, China; 7American Association of Petroleum Geologists, Innovation & Emerging Science and Technology, Tulsa; 8China University of Geosciences, School of the Earth Science and Resources, Beijing

The geological record offers the best documentation of Earth's evolution beyond historical timescales. It also holds clues to some of the most pressing issues, such as climate change and the sustainable development of natural resources. However, we currently lack a systematic and effective way to integrate and analyze the enormous volumes of deep-time Earth data scattered across isolated databases and the literature. We also need a well-structured geoscience knowledge graph to harness the power of artificial intelligence in geoscience.

To address these challenges, the International Union of Geological Sciences initiated the Deep-Time Digital Earth (DDE) program in collaboration with national geological surveys, professional associations, academic institutions, and scientists worldwide. DDE aims to harmonize deep-time data, share global geoscience knowledge and facilitate data-driven discoveries about Earth's history. To this end, DDE will build on previous research to develop a systematic deep-time geoscience knowledge graph, a FAIR data infrastructure that links existing databases and makes gray data visible, and a set of tailored tools for data analysis and visualizations.

The current research effort is dedicated to building a one-stop online research platform for geoscientists. The core of the platform, Deep-time Engine, is supported by the DDE cloud, which connects various cloud computing services. Deep-time Engine further consists of four sub-engines that provide computing, knowledge, data, and research services, respectively. Through a well-curated knowledge graph, the Knowledge service enables scientists to leverage artificial intelligence in geoscience. The Data service focuses on automating information extraction from the literature and interlinking existing databases, allowing scientists to discover, access, and use deep-time Earth data more efficiently. The Computing service allows scientists to develop, reuse, modify, and share tools and models for data processing and interpretation, and it also provides geoscientists with large-scale, high-performance cloud computing. Combined, these services enable scientists to rapidly build a customized working platform for complex research and analysis scenarios using service APIs and browser components. By creating this open-access deep-time research platform, DDE holds the promise of understanding our planet's past, present, and future in new and vivid detail.

With the EarthCube program entering its tenth and final year, numerous projects provide excellent learning opportunities and valuable lessons on multiple levels, ranging from cyberinfrastructure design to data interlinking, model development, and program management. With this in mind, DDE cordially invites any interested EarthCube project leaders and researchers to share their experiences and stories with DDE in order to improve its service to the geoscience community. Meanwhile, the DDE service is open to all scientists and may provide some of the critical storage and computing resources required to sustain the current EarthCube-funded projects. Finally, we hope that the potential collaboration between DDE and EarthCube benefits both parties and helps transform Earth Science.

Project Raijin: Community Geoscience Analysis Tools for Unstructured Grids

John Clyne1, Orhan Eroglu1, Cecile Hannay1, Brian Medeiros1, Michaela Sizemore1, Paul Ullrich3, Anissa Zacharias1, Colin Zarzycki2

1NCAR, United States of America; 2Penn State University; 3UC Davis

Project Raijin is an NSF EarthCube-funded effort to develop sustainable, community-owned tools for the analysis and visualization of unstructured grid output from next-generation climate and global weather models. The primary software environment for Project Raijin is the Scientific Python Ecosystem, in particular the Xarray, Dask, and Jupyter components that make up the Pangeo stack. Working in close collaboration with atmospheric modelers, Raijin aims to: (1) develop extensible, scalable, open-source tools supporting fundamental analysis and visualization methods that operate directly (without resampling) on unstructured grid model output at global storm-resolving resolutions; and (2) establish an active, vibrant community of user-contributors committed to extending our work beyond the scope of this NSF award, helping ensure the long-term sustainability of the project. To support both goals, work on Raijin is conducted under an open development model that encourages participation in all aspects of the project. This presentation will provide an overview of the current status of Project Raijin, with an emphasis on our open development model and how the community can get involved. We will discuss current capabilities, including support for output from various models such as MPAS, CAM-SE, E3SM, and ICON, and provide a roadmap for future development.
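As a rough illustration of what operating "directly (without resampling)" on unstructured grid output can mean, the following is a minimal, stdlib-only Python sketch that computes an area-weighted mean over a mesh's native faces. It assumes a toy planar mesh given as node coordinates plus a face-node connectivity list mixing quadrilaterals and triangles. Real global model output lives on the sphere and usually carries its own cell areas, and the function names are hypothetical, not part of any Raijin API.

```python
from typing import Sequence

Point = tuple[float, float]


def polygon_area(nodes: Sequence[Point], face: Sequence[int]) -> float:
    """Planar polygon area via the shoelace formula (face = node indices in order)."""
    total = 0.0
    n = len(face)
    for i in range(n):
        x1, y1 = nodes[face[i]]
        x2, y2 = nodes[face[(i + 1) % n]]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def area_weighted_mean(nodes, faces, face_values) -> float:
    """Mean of face-centred values, each weighted by its own face area (no regridding)."""
    areas = [polygon_area(nodes, face) for face in faces]
    return sum(a * v for a, v in zip(areas, face_values)) / sum(areas)


# Hypothetical mesh: one quadrilateral and one triangle sharing an edge.
nodes = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (2.0, 0.0)]
faces = [[0, 1, 2, 3], [1, 4, 2]]
values = [2.0, 5.0]
print(area_weighted_mean(nodes, faces, values))  # 3.0
```

Because each face is weighted by its own area, no interpolation to a regular latitude-longitude grid is needed. For a global mesh, a spherical-geometry area routine would take the place of `polygon_area`.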

Bringing Data-Rich Science to the Classroom: Opportunities and Resources from the Compass Project

Sean Fox, Cailin Huyck Orr, Ellen Iverson

Carleton College, United States of America

To ensure that high-quality teaching resources connected to transformative geoscience research are brought into broad and diverse use, we are advancing Earth education resource discovery through the Compass project. Compass's approaches to this challenge offer opportunities for current and future EarthCube-allied projects to extend the reach of their work across the community of educators. The Science Education Resource Center (SERC) hosts materials from over 120 geoscience education projects and, through Compass, is working to increase the impact of these Earth education materials. A central element of Compass is improving the discoverability of resources for the 5 million visitors who explore SERC-hosted sites each year. Compass is developing new partnering mechanisms that will allow SERC visitors to find high-quality external materials (hosted outside SERC) as they use SERC search and discovery tools. For EarthCube projects that host their own educational resources, this could provide a new route to a broader audience. For projects still exploring how to achieve broader impacts for their science by reaching educators, Compass offers a number of new supporting tools and processes. Individual teaching activities contributed through SERC's Teach the Earth site can now be categorized and discovered using a new Diversity, Equity, Inclusion and Justice controlled vocabulary, which also gives easy access to a rich set of guidance on these issues. Similarly, a new set of guidelines and supporting tools is available to improve the accessibility of Earth education teaching resources. This new content joins a wide array of models and practices reflected across SERC's work in helping the community connect current, data-rich science to the classroom.

Implementing community best-practice data and metadata capture into laboratory workflows using Sparrow

Stephen C. Kuehn1, Daven P. Quinn2, Casey R. Idzikowski2, Victor Atasie1, Samson Tsega1

1Concord University; 2University of Wisconsin

An implementation of the Sparrow data system is currently being developed to support laboratory workflows for sample preparation, geochemical analysis, and SEM imaging in tephra research. Tephra, consisting of fragmental material ejected from volcanoes, has a multidisciplinary array of applications, from volcanology to geochronology, archaeology, environmental change, and more. As part of a broader effort to adopt FAIR practices, the international tephra research community has developed a comprehensive set of recommendations for data and metadata collection and reporting. Implementations of these recommendations now exist for field data via StraboSpot and for samples, analytical methods, and geochemistry via SESAR and EarthChem.

Implementing these recommended practices in Sparrow helps to (1) cover laboratory workflows between field sample collection and project data archiving and (2) address a key researcher pain point. As re-emphasized by participants in the Tephra Fusion 2022 workshop earlier this year (Wallace et al., this meeting), the huge workload currently needed to capture and organize data and metadata in preparation for archiving in community data repositories is a major obstacle to achieving FAIR practices. By capturing this information on the fly during laboratory workflows and integrating it in a single data system, researchers may be able to overcome this challenge.

We are implementing the tephra community recommendations as extensions to Sparrow’s core database schema. Data import pipelines and user interfaces to streamline metadata capture are also being developed. In the longer term, we aim to achieve interoperability with an ecosystem of tools and repositories such as StraboSpot, SESAR, EarthChem, and Throughput. The results of these developments will be applicable not just to tephra but also to other research areas that use similar laboratory and analytical methods, e.g., sedimentology, mineralogy, and petrology.
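Extending a core schema with community-specific fields can be pictured as adding an extension table keyed to an existing core table. The sketch below uses Python's built-in `sqlite3` purely for illustration rather than Sparrow's own database backend. The table and column names (`sample`, `tephra_sample_metadata`, `deposit_type`, `glass_shard_morphology`) and the sample values are hypothetical; they are neither Sparrow's actual schema nor the field names in the tephra recommendations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical stand-in for a core sample table shared by all labs.
    CREATE TABLE sample (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    -- Hypothetical community extension: one row of tephra metadata per sample.
    CREATE TABLE tephra_sample_metadata (
        sample_id INTEGER PRIMARY KEY REFERENCES sample(id),
        deposit_type TEXT,
        glass_shard_morphology TEXT
    );
    """
)

# Capture metadata at the moment the sample enters the lab workflow.
cur = conn.execute("INSERT INTO sample (name) VALUES (?)", ("T-2022-001",))
conn.execute(
    "INSERT INTO tephra_sample_metadata VALUES (?, ?, ?)",
    (cur.lastrowid, "fall deposit", "cuspate"),
)

# The core record and its extension are read back together with a join.
row = conn.execute(
    """
    SELECT s.name, t.deposit_type, t.glass_shard_morphology
    FROM sample s JOIN tephra_sample_metadata t ON t.sample_id = s.id
    """
).fetchone()
print(row)  # ('T-2022-001', 'fall deposit', 'cuspate')
```

Keeping community fields in a separate table leaves the shared core schema untouched, so other disciplines can add their own extensions alongside it.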

PBot, the Integrative Paleobotany Portal: Building beyond barriers to answer big questions in paleobotanical research

Claire Cleveland1, Ellen Currano1, Dori Contreras2, Rebecca Koll2, Douglas Meredith3, Shanan Peters4, Mark Uhen5, Andrew Zaffos3

1Department of Botany, University of Wyoming, 1000 E University Ave, Dept. 3165, Laramie, WY 82701-2000; 2Perot Museum of Nature and Science, Dallas, TX 75201; 3Arizona Geological Survey, University of Arizona, Tucson, AZ 85721; 4Department of Geosciences, University of Wisconsin - Madison, Madison, WI 53706; 5Department of Atmospheric, Oceanic, and Earth Sciences, George Mason University, Fairfax, VA 22030

Paleobotanical data is severely underrepresented in major publicly accessible databases, even though fossil plants represent the best record of ancient terrestrial environments. A major barrier to including paleobotanical data in databases at meaningful levels of taxonomy is that plant parts are most often preserved separately, with varying potential for taxonomic resolution. Paleobotanists therefore commonly use morphologically based, informal taxonomies (morphotypes) rather than traditional Linnaean classifications. The names given to a particular morphotype can be inconsistent among research groups, and there is currently no data-management infrastructure that allows morphotypes to be compared and synonymized across regions or time periods, or with published formal taxonomies. As a result, a large proportion of the millions of fossil plant specimens housed in museums worldwide, together with their spatio-temporal occurrence data, are inaccessible for studies that address big questions in paleobiology, paleoclimatology, Earth system modeling, macroevolution, and macroecology.

In 2020, our group received EarthCube funding to address these problems by creating PBot, The Integrative Paleobotany Portal. In the first year of this project, PBot has (1) organized workshops and surveys with the worldwide paleobotany community to gather data on database needs and institutional practices to guide the development of PBot, (2) formed eight community teams to develop descriptive schemas for seven plant organs, including cuticles, fruits, pollen, seeds, spores, shoots, and multiple leaf groups, (3) coordinated with the Paleobiology Database (PBDB) to adopt existing data architecture for collections, Linnaean taxonomy, localities, and references and to expand associated PBDB databanks, (4) developed backend architecture to enter and browse informal taxonomies, and (5) begun design of the online workbench for standardized fossil plant descriptions and data entry. We will continue to offer workshops and training as PBot develops.
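The morphotype-synonymy problem described above can be modeled as grouping names into equivalence classes. The following is a minimal sketch under that assumption, using a union-find structure: each synonymy opinion merges two names, and every name resolves to a single representative, with a formal Linnaean name preferred whenever one is in the group. The morphotype codes and the taxon `Examplea fossilis` are hypothetical, and this is not PBot's actual backend design.

```python
class MorphotypeSynonymy:
    """Union-find over morphotype and taxon names (illustrative sketch only)."""

    def __init__(self, formal_names=()):
        self.parent = {}
        self.formal = set(formal_names)

    def _find(self, name):
        self.parent.setdefault(name, name)
        root = name
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited name straight at the root.
        while name != root:
            nxt = self.parent[name]
            self.parent[name] = root
            name = nxt
        return root

    def synonymize(self, a, b):
        """Record an opinion that names a and b refer to the same taxon."""
        ra, rb = self._find(a), self._find(b)
        if ra == rb:
            return
        # Keep a formal (Linnaean) name as the group's representative if present.
        if rb in self.formal and ra not in self.formal:
            ra, rb = rb, ra
        self.parent[rb] = ra

    def accepted_name(self, name):
        return self._find(name)


syn = MorphotypeSynonymy(formal_names={"Examplea fossilis"})
syn.synonymize("WY-12", "MT-3")  # two groups' codes for the same leaf form
syn.synonymize("MT-3", "Examplea fossilis")
print(syn.accepted_name("WY-12"))  # Examplea fossilis
```

A production system would also need to record who made each opinion and when, so that conflicting synonymies can be reviewed rather than silently merged.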

Opening ARM: Developing Open-Science Cookbooks Leveraging Users, Data, and Infrastructure from the Community

Maxwell Grover1, Zachary Sherman1, Scott Collis1, Jitendra Kumar2, Adam Thiesen1, Robert Jackson1, Kevin Tyle3, Julia Kent4, Drew Camron5

1Argonne National Laboratory, United States of America; 2Oak Ridge National Laboratory; 3University at Albany, State University of New York; 4National Center for Atmospheric Research; 5Unidata Program Center

The Atmospheric Radiation Measurement (ARM) facility is a multi-laboratory U.S. Department of Energy (DOE) scientific user facility that collects atmospheric data from around the world and makes these data freely available with the goal of improving climate models. One common challenge in working with these data is finding reproducible workflows that use open-source tools such as Py-ART (the Python ARM Radar Toolkit). While the Py-ART documentation contains examples, there is a serious gap in documented end-to-end workflows involving the library; an end-to-end workflow here means reading in the data, applying some correction, and plotting the result. Those examples also lack links to library-specific documentation (e.g., Matplotlib, NumPy) and to tutorials developed specifically for the geoscience community. These gaps are highlighted in the recent Py-ART Roadmap, and one solution is to create a “Radar Data Cookbook” that provides end-to-end workflows and links to existing geoscience-focused foundational material (e.g., Matplotlib, NumPy).
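The read, correct, and inspect steps of such an end-to-end workflow can be sketched with the standard library alone. This is a minimal sketch under loud assumptions. The in-memory JSON "sweep" and its field names are hypothetical stand-ins for a real radar file. The correction is a simple threshold mask, conceptually similar to gate filtering in Py-ART, but no Py-ART calls are used. A one-ray range profile stands in for the plotting step to keep the example dependency-free.

```python
import json

# Hypothetical stand-in for a radar file: one ray of reflectivity (dBZ) per azimuth.
RAW = json.dumps({
    "range_km": [1.0, 2.0, 3.0, 4.0],
    "azimuth_deg": [0.0, 90.0],
    "reflectivity_dbz": [[-5.0, 12.0, 35.0, 48.0], [3.0, 20.0, 41.0, 8.0]],
})


def read_sweep(text):
    """Step 1: read the data (here from a JSON string rather than a real file)."""
    return json.loads(text)


def mask_weak_echoes(sweep, min_dbz=10.0):
    """Step 2: a simple correction that masks gates below a reflectivity threshold."""
    masked = [
        [z if z >= min_dbz else None for z in ray]
        for ray in sweep["reflectivity_dbz"]
    ]
    return {**sweep, "reflectivity_dbz": masked}


def ray_profile(sweep, azimuth):
    """Step 3 (in place of plotting): (range, dBZ) pairs along one azimuth."""
    i = sweep["azimuth_deg"].index(azimuth)
    return list(zip(sweep["range_km"], sweep["reflectivity_dbz"][i]))


sweep = mask_weak_echoes(read_sweep(RAW))
print(ray_profile(sweep, 90.0))
# [(1.0, None), (2.0, 20.0), (3.0, 41.0), (4.0, None)]
```

Keeping each step a separate function mirrors how a cookbook chapter can link each stage to its own prerequisite material.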

In this paper, we discuss the recently developed Radar Data Cookbooks, which leverage content from an existing EarthCube initiative, “Project Pythia”. We provide users with an overview of the data structure (ground-based radar data), a tutorial on how to access data from the ARM program using the ARMLive data API, and a detailed walkthrough of common scientific workflows, such as analyzing a cross section through a thunderstorm. We link to foundational material from the Project Pythia Foundations content; for example, when covering the final plotting procedure, we indicate that “Matplotlib Basics” is a recommended prerequisite, avoiding the need to create an ad hoc Matplotlib tutorial within the cookbook. The cookbook materials can be executed in a Jupyter interface using Binder, or via the JupyterLab instance hosted by the ARM program. We hope that this cookbook framework serves as an example for other projects to continue to push for open-data, open-source, and open-science practices.

Tephra Fusion 2022 workshop focused on best practices in tephra data recommends innovative computer solutions to build databases

Kristi Wallace1, Marcus Bursik2, Steve Kuehn3, Andrei Kurbatov4

1US Geological Survey/Volcano Science Center, USA; 2University at Buffalo, SUNY, Buffalo, USA; 3Concord University, Athens, WV, USA; 4University of Maine, Orono, USA

A series of international workshops held in 2014, 2017, 2019, and 2022 focused on improving tephra studies from field collection through publication and on encouraging FAIR (findable, accessible, interoperable, reusable) practices for tephra data and metadata. Two consensus needs for tephra studies emerged from the 2014 and 2017 workshops: (a) standardization of tephra field data collection, geochemical analysis, correlation, and data reporting, and (b) development of next-generation computer tools and databases to facilitate information access across multidisciplinary communities. To achieve (a), we developed a series of recommendations for best practices in tephra studies, from sample collection through analysis and data reporting. A four-part virtual workshop series was held in February and March 2022 to update the tephra community on these developments, gather community feedback, learn of unmet needs, and plan a future roadmap for open and FAIR tephra data. More than 230 people from 25 nations registered for the workshop series. The community strongly emphasized the need for better computer systems, including physical infrastructure (repositories and servers), digital infrastructure (software and tools), and human infrastructure (people, training, and professional assistance), to store, manage, and serve global tephra datasets. Desired attributes of improved computer systems include: 1) user friendliness; 2) the ability to easily ingest multiparameter tephra data (using best-practice recommended data fields); 3) interoperability with existing data repositories; 4) development of tool add-ons (plotting and statistics); 5) improved searchability; 6) development of a tephra portal with access to distributed data systems; and 7) commitments to long-term support from funding agencies, publishers, and the cyberinfrastructure community.
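One concrete reading of desired attribute 2, easy ingestion of data using best-practice recommended fields, is an ingest-time check that reports missing or implausible fields before a record enters a database. The sketch below is a minimal version of that idea. `REQUIRED_FIELDS`, `NUMERIC_RANGES`, and the example records are hypothetical stand-ins for the far larger published recommendations, and this is not any existing system's validator.

```python
# Hypothetical subset of recommended fields; the real recommendations are far larger.
REQUIRED_FIELDS = {"sample_id", "latitude", "longitude", "analytical_method"}
NUMERIC_RANGES = {"latitude": (-90.0, 90.0), "longitude": (-180.0, 180.0)}


def validate_record(record):
    """Return a list of human-readable problems; an empty list means 'ingestable'."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - set(record))]
    for field, (low, high) in NUMERIC_RANGES.items():
        if field not in record:
            continue
        try:
            value = float(record[field])
        except (TypeError, ValueError):
            problems.append(f"{field} is not a number: {record[field]!r}")
            continue
        if not low <= value <= high:
            problems.append(f"{field} out of range: {value}")
    return problems


good = {"sample_id": "UA-1234", "latitude": "61.3", "longitude": "-152.3",
        "analytical_method": "EPMA"}
bad = {"sample_id": "UA-1235", "latitude": "91"}
print(validate_record(good))  # []
print(validate_record(bad))
```

Returning a list of problems, rather than failing on the first one, lets a data provider fix a whole file in one pass.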

Geologic data standardization for database entry: preparing diverse datasets for hosting and accessibility

Leah LeVay1, Andrew Fraass2,3, Shanan Peters4, Jocelyn Sessa2, Seth Kaufman5, Wai-Yin Kwan5

1Texas A&M University; 2Drexel University; 3University of Victoria; 4University of Wisconsin, Madison; 5Whirl-i-Gig

Aggregating a large number of datasets into databases opens new research avenues and creates a more accessible pathway for information discovery. When datasets come from multiple sources spanning decades, however, the harmonization process can become quite onerous. Extending Ocean Drilling Pursuits (eODP) is an EarthCube-funded project compiling and migrating scientific ocean drilling data that span six decades into three representations: an eODP-specific aggregation database, the Macrostrat database, and the Paleobiology Database (PBDB). Sediment rock descriptions and microfossil assemblage data have been stored as flat files in various places and formats depending on the year of collection, which ultimately limits their utility. Additionally, as methodologies have evolved over the years, new and different information was captured in loose, customizable formats that require translation by content experts. A major goal of eODP is to centralize and harmonize much of this information. Preparing the raw data for ingestion into Macrostrat and the PBDB has required intensive effort, including entering taxonomic opinions for every microfossil genus, cleaning and editing microfossil names, and standardizing and cross-walking column header names.
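The header cross-walking and name cleaning described above can be illustrated with a small lookup-table approach. The following is a minimal sketch under stated assumptions. The header variants, canonical names, sample identifier, and the genus correction (`Globigerinodes` to `Globigerinoides`) are hypothetical examples, not eODP's actual crosswalk. Headers with no mapping are returned separately so that a content expert can translate them, rather than being guessed.

```python
import re

# Hypothetical crosswalk: normalized legacy header -> canonical column name.
HEADER_CROSSWALK = {
    "sample": "sample_id",
    "sample id": "sample_id",
    "top [cm]": "top_cm",
    "top(cm)": "top_cm",
    "genus": "genus",
    "genus name": "genus",
}
# Hypothetical correction table for misspelled microfossil genus names.
GENUS_FIXES = {"Globigerinodes": "Globigerinoides"}


def normalize(header):
    """Lower-case, trim, and collapse internal whitespace."""
    return re.sub(r"\s+", " ", header.strip().lower())


def crosswalk_row(row):
    """Rename columns via the crosswalk; unmapped headers are set aside for review."""
    clean, unmapped = {}, []
    for header, value in row.items():
        canonical = HEADER_CROSSWALK.get(normalize(header))
        if canonical is None:
            unmapped.append(header)  # needs translation by a content expert
            continue
        if canonical == "genus":
            value = GENUS_FIXES.get(value, value)
        clean[canonical] = value
    return clean, unmapped


legacy = {"Sample  ID": "U1234A-1H-1", "Top(cm)": "10",
          "Genus Name": "Globigerinodes", "Comments": "see hole summary"}
print(crosswalk_row(legacy))
# ({'sample_id': 'U1234A-1H-1', 'top_cm': '10', 'genus': 'Globigerinoides'}, ['Comments'])
```

Keeping the crosswalk as data rather than code means experts can extend it as new legacy header variants turn up.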

The standardization of datasets and database preparation has combined computational cleaning and formatting with manual cleaning and data entry. To format all of the files consistently, a new database, the eODP database, was created. It contains all of the retrieved files, with cross-walking completed and data staged for migration into Macrostrat and the PBDB. Before microfossil assemblage data could be transferred to the PBDB, all taxonomic opinions had to be entered, and all fossil name errors and misspellings reviewed, manually. Furthermore, stratigraphic age information and sediment rock descriptions not stored in a digital format have required manual database entry. Setting the foundation for eODP has been complex because of inconsistencies within files and the sheer volume of data, but progress toward the best outcome for the community is being made.

An Update on Project Pythia: A Community Learning Resource for Geoscientists

Kevin Tyle1, John Clyne2, Anderson Banihirwe2, Drew Camron3, Chris Cardinale1, Orhan Eroglu2, Robert Ford1, Maxwell Grover4, Julia Kent2, Alea Kootz2, Matthew Long2, Ryan May3, Kevin Paul2, Brian Rose1, Michaela Sizemore2, Anissa Zacharias2

1University at Albany, State University of New York; 2National Center for Atmospheric Research; 3Unidata; 4Argonne National Laboratory

In this presentation, we update the EarthCube community about Project Pythia, currently in its second of three years of EarthCube funding. Project Pythia is a web-based, community-owned educational resource whose aim is to help teach geoscientists of all stripes and experience levels how to make effective use of the Geoscientific Python Software Ecosystem for the analysis and visualization of their data.

Project Pythia represents the educational efforts of the Pangeo project. Pangeo is a community of academic scientists and industry software engineers whose partnership serves to "foster collaboration around the open source scientific Python ecosystem for ocean, atmosphere, land, and climate science." While Pangeo is probably best known for its scalable software stack built on Jupyter, Xarray, and Dask, the Pangeo community can proudly tout its support for and advocacy of open development practices and reproducible science. Project Pythia was funded to create a "go-to" resource to help geoscientists navigate and learn the complex scientific Python ecosystem and to help train scientists in the best practices necessary to develop and share free, open, and reproducible scientific software and datasets.

Project Pythia currently provides the following core components, all accessible through our main portal:

  1. Pythia Foundations Book: presented as an executable Jupyter Book, Foundations consists of a series of educational modules that cover the basic tools and knowledge necessary to wade into the complex geoscience Python software ecosystem. Chapters include:

    1. Introduction to Python and Jupyter

    2. Markdown

    3. Git and GitHub

    4. NumPy

    5. Matplotlib

    6. Cartopy

    7. Datetime

    8. Pandas

    9. Xarray

    10. Geoscientific data formats (e.g., NetCDF, Zarr)

  2. Pythia Resource Gallery: a curated, searchable gallery of tutorials, videos, Jupyter Books, and courses covering various aspects of the Python geoscience software ecosystem.

Besides adding to and refining the existing content served by Foundations and the Resource Gallery, we are working on the following:

  1. Pythia Cookbooks: Jupyter notebooks geared to specific use-cases, such as analysis and visualization of cloud-served datasets

  2. Pythia Platform: Development efforts currently target a lightweight, containerized, BinderHub-based platform that will make it possible to launch Cookbook content in customizable executable environments in the Cloud or on an HPC cluster with a single click.

The StraboSpot Digital Data System: New Developments in the Field App and First Release of the Laboratory Application

Jason Ash1, Julie Newman2, Basil Tikoff3, J Douglas Walker1, Randy Williams3, Nicolas Roberts3, Ellen M Nelson3, Alex Lusk3, Noah Phillips4, Jessica Novak1, Nathan Novak1

1Kansas University; 2Texas A&M University; 3University of Wisconsin - Madison, United States of America; 4Lakehead University

The StraboSpot digital data system continues to expand its capabilities and communities involved in its use. We have introduced a new interface for our field app that we call simply StraboSpot2. It is a more intuitive implementation that conforms more closely to the workflow of the field geologist while introducing new functionality taking it beyond an electronic implementation of a paper field book and map. We have also released the ability to create, edit, and share stratigraphic sections. This functionality is described in Duncan et al. (2021, Geosphere) and is built into the StraboSpot1 app but will be released this year into the StraboSpot2 app. We will also be releasing an Android version of StraboSpot2 soon. We continue to reach out to other geology communities. Our work continues to meet the needs of field geologists studying volcanic deposits (Tephra community). Image analysis in the field on rock faces can be done using the StraboTools iOS app. And we continue to reach out to the geomorphology and active tectonics community to collect field data.

One significant new development is a stand-alone application for micrographs (images taken on any type of microscope). We call this StraboMicro, and it uses the same “Spot”-based approach to handle work at different scales of resolution. The application also allows users to cross-reference and superimpose images taken on different microscopes (optical, SEM, TEM, etc.). The vocabulary for this application was developed through national and international community outreach, coordinated with the European Plate Observing System effort. StraboMicro will allow storage of all information, images, and interpretations related to micrographs in the same data system as the field app.

The other new development is a cyberinfrastructure to support the experimental rock deformation community. We refer to this as StraboExp; its user interface is being developed by the rock mechanics group at MIT. The goal is to have StraboExp work seamlessly with StraboMicro to compare naturally and experimentally produced deformation microstructures.

The pandemic has required significant modification of our community engagement approach. Previously, we ran large workshops, short courses, and field trips. As these activities were curtailed, we have funded a group of a dozen “superusers” to test the system in research settings and provide feedback. We have conducted Zoom-based webinars that are open to all users and are presented on a StraboSpot Channel on YouTube. We are also expanding our micrograph efforts to include igneous, metamorphic, and sedimentary petrology through similar virtual interactions.

The StraboSpot platform has benefited hugely from its inclusion in EarthCube. The StraboSpot data system, in coordination with other digital data efforts, has allowed geologists to conduct new types of science and join big data initiatives.

StraboMicro: Contextualizing and sharing micro-scale geologic data through a digital application

Julie Newman1, Jason Ash2, Randy Williams3, Alex Lusk3, Noah Phillips4, Basil Tikoff3

1Texas A&M University; 2Kansas University; 3University of Wisconsin - Madison, United States of America; 4Lakehead University

StraboMicro is a desktop application that provides tools for image management and for contextualizing, storing, and sharing geologic data observed and analyzed at the microstructural scale. StraboMicro is part of the StraboSpot ecosystem, a digital data system enabling the collection and storage of geologic data across scales and disciplines. One benefit of a common data system is that micro-scale data in StraboMicro can be linked to associated field data in StraboSpot or to experimental data in StraboExperimental (currently under development). Here, we demonstrate the capabilities of the StraboMicro application, which include: 1) adding images from any source (e.g., optical microscope, scanning or transmission electron microscope), with instrument metadata; 2) hierarchical organization of images and data to preserve spatial information; 3) defining scale and orientation; 4) image annotation; 5) the ability to add associated files (e.g., data spreadsheets) and links to other databases (e.g., EarthChem), repositories, or websites; 6) flexible grouping of images based on user-defined concepts; and 7) the ability to add a variety of detailed disciplinary observations and/or data types based on vocabulary defined by the microstructural community. StraboMicro is an effective way to make microstructural data FAIR (findable, accessible, interoperable, and reusable) and an important step toward a future of big data science and machine learning in micro-scale geologic disciplines.
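Capabilities 2 and 3 (hierarchical organization and defined scale and orientation) together let a point picked in a detailed image be located in its parent image. The following is a minimal sketch of that coordinate chaining, assuming each nested image stores a scale in micrometres per pixel, an offset of its origin in the parent frame, and a rotation relative to the parent. The `Micrograph` class and its fields are hypothetical and do not reflect StraboMicro's internal data model.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Micrograph:
    """Hypothetical model of an image nested inside a parent image."""
    name: str
    um_per_px: float                          # scale: micrometres per pixel
    parent: Optional["Micrograph"] = None
    offset_um: tuple[float, float] = (0.0, 0.0)  # origin position in parent frame (um)
    rotation_deg: float = 0.0                 # rotation relative to the parent

    def to_parent_um(self, x_px, y_px):
        """Scale, rotate, then translate a pixel position into the parent frame (um)."""
        x, y = x_px * self.um_per_px, y_px * self.um_per_px
        t = math.radians(self.rotation_deg)
        xr = x * math.cos(t) - y * math.sin(t)
        yr = x * math.sin(t) + y * math.cos(t)
        return xr + self.offset_um[0], yr + self.offset_um[1]

    def to_root_um(self, x_px, y_px):
        """Walk up the hierarchy to the top-level image and return micrometres there."""
        image, point = self, (x_px, y_px)
        while image.parent is not None:
            x_um, y_um = image.to_parent_um(*point)
            image = image.parent
            point = (x_um / image.um_per_px, y_um / image.um_per_px)  # parent pixels
        return point[0] * image.um_per_px, point[1] * image.um_per_px


root = Micrograph("thin-section overview", um_per_px=10.0)
sem = Micrograph("SEM detail", um_per_px=0.5, parent=root,
                 offset_um=(100.0, 200.0), rotation_deg=90.0)
print(sem.to_root_um(10, 0))  # approximately (100.0, 205.0)
```

Because every image only knows its relationship to its immediate parent, images can be nested to any depth, and a point can still be traced back to the top-level image.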

Building OpenMindat for FAIR mineralogical data access

Jolyon Ralph1, Xiaogang Ma2, Anirudh Prabhu3, Pavel Martynov1

1Hudson Institute of Mineralogy, United States of America; 2University of Idaho, United States of America; 3Carnegie Institution for Science, United States of America

Mindat has been building a crowdsourced database of mineralogical information, most notably mineral occurrence information, since October 2000. The aim of the OpenMindat project (funded under NSF grant 2126315) is to make these data available in an open-access, interoperable system so that they can be used freely and efficiently to advance geoscience research.
The database is the result of decades of work by amateur and professional volunteers working together to enter and verify information. The data have always been free for users to browse, but a machine interface for data access and download has not, until now, been established.
The data model proposed for this interface includes information about mineral taxonomy and properties, petrological taxonomy (including meteoritics), mineral occurrences (including hierarchical and non-hierarchical locality structures), and reference citations.
OpenMindat will allow discovery of and full open access to these data, along with the future opportunity to submit new data into the database for review and possible integration. Data will be offered in JSON format, and modules are being developed for both Python and R to allow direct integration of the data into Jupyter notebooks and R Markdown, along with API documentation and examples in Python, PHP, and JavaScript.
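Once occurrence data arrive as JSON, a typical first step in a notebook is to parse and aggregate them. The following is a minimal, stdlib-only sketch. The response body, its field names (`results`, `mineral`, `locality`, `country`), and the locality names are hypothetical placeholders, not the real OpenMindat schema. In practice, the text would come from an HTTP request to the OpenMindat API, whose endpoints and parameters are not specified here.

```python
import json

# Hypothetical response body; real OpenMindat field names may differ.
response_text = """
{"results": [
  {"mineral": "Quartz", "locality": "Locality A", "country": "USA"},
  {"mineral": "Quartz", "locality": "Locality B", "country": "Brazil"},
  {"mineral": "Pyrite", "locality": "Locality A", "country": "USA"}
]}
"""

records = json.loads(response_text)["results"]

# Collect the distinct localities reported for each mineral.
localities = {}
for rec in records:
    localities.setdefault(rec["mineral"], set()).add(rec["locality"])

counts = {mineral: len(locs) for mineral, locs in localities.items()}
print(counts)  # {'Quartz': 2, 'Pyrite': 1}
```

Counting distinct localities with a set, rather than raw records, avoids double-counting a locality that appears more than once for the same mineral.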

VICTOR – A new Cyber-infrastructure for Volcanology

Einat Lev1, Charles Connor2, Abani Patra3, Sylvain Charbonnier2, Laura Connor2

1Columbia University, United States of America; 2University of South Florida; 3Tufts University

Forecasting the impact of active or future volcanic eruptions and correctly interpreting the remnants of past eruptions requires access to models of eruptive processes. The volcano modeling community recognizes a need for more equitable access to models that are robust, verified, validated, and easier to use. To answer this need, we are building VICTOR, a new cyberinfrastructure for the volcano modeling community. To date, we have established a steering committee that represents our community and have connected with larger national efforts, including CONVERSE and SZ4D. We scoped a collaboration with a non-profit organization (2i2c) that will manage VICTOR's back end as a cloud-hosted JupyterHub, and we are searching for technical staff to build the platform. In anticipation of the JupyterHub deployment, we have developed Jupyter notebooks that call existing volcano models such as the lava flow code MOLASSES and the tephra dispersal code Tephra2, and we are working to build containers for these models. We are also actively implementing recently developed workflows for inversion, benchmarking, and uncertainty quantification. For example, we augmented a Vhub workflow that supports volcanic ash transport and dispersion (VATD) calculations with the puffin tool, using remote data sources for sonde data, and will transfer this workflow to VICTOR once it is ready. We also explored the use of surrogate models based on convolutional neural networks for debris flow computations and uncertainty analysis with the TITAN2D model. To ensure continuity of service for our community, we negotiated the transition of the current cyberinfrastructure to a stub group inside a separately supported new preserving current functionality for an interim period of 2 years while VICTOR is ramping up. Lastly, we developed a plan for teaching a multi-institutional graduate course on volcanic hazard modeling using VICTOR.

Toward Causal Predictive Models of Trace Element Partitioning

Aaron S. Wolf1, Erica Cung2, Gokce Ustunisik2,3, Roger Nielsen2, Paula Antoshechkina4, Kerstin A. Lehnert5, Peng Ji5, Sean Cao5, Lucia R. Profeta5

1University of Michigan; 2South Dakota School of Mines; 3American Museum of Natural History; 4Caltech; 5Columbia University

The partitioning of trace elements between minerals and melt during igneous rock formation imprints distinct geochemical fingerprints that can be used to infer rock formation histories. Interpreting these geochemical markers is challenging, however, because partitioning is sensitive to many factors that influence elemental exchange, including temperature, pressure, composition, and lattice strain. Accurately modeling partitioning thus requires fitting large experimental databases capable of distinguishing between multiple overlapping and competing effects. Good models must make accurate predictions for unobserved conditions, enabling interpolation within and extrapolation beyond the original training data. This problem presents an excellent data-driven modeling challenge, highlighting the need for integrated data acquisition, filtering, and processing together with geochemical model building and assessment [EAR2026904, EAR1948806].

Existing empirical models of trace-element partitioning typically rely on simple linear regressions against only one or a few independent variables [1,2,3]. This approach attempts to avoid overfitting and improve predictive accuracy by limiting model complexity, but it fails to account for a strong selection bias imposed directly by thermodynamic equilibrium. To appear in the dataset at all, every experiment must have both the liquid and the mineral of interest stable and present at detectable levels, so all parameters that influence phase equilibria are unavoidably correlated with one another (independent of trace-element behavior). This correlation is easily mistaken for a causal effect in overly simplified regression models, dramatically worsening prediction accuracy for new conditions, especially those outside the training data.
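The selection-bias argument can be reproduced with a toy simulation. Below, a hypothetical partition coefficient depends only on temperature `T`, and a composition index `X` is generated independently of `T`; an experiment is then kept only if `X` falls in a stability band that shifts with `T`, standing in for the phase-equilibrium requirement. This is a minimal sketch of the statistical effect, not the authors' causal model: all variables, coefficients, and the stability rule are invented, and conditioning on `T` here is only the simplest possible adjustment.

```python
import numpy as np


def simulate_selected_experiments(n=30_000, seed=0):
    """Toy data: ln D depends only on T, but a stability filter ties X to T."""
    rng = np.random.default_rng(seed)
    T = rng.uniform(1000.0, 1400.0, n)                # temperature (hypothetical units)
    X = rng.uniform(1.8, 3.0, n)                      # composition index, independent of T
    lnD = 5.0 - 0.004 * T + rng.normal(0.0, 0.05, n)  # X has no causal effect on D
    # Selection: an experiment enters the dataset only if the phase is "stable",
    # modelled here as X lying in a narrow band that shifts with T.
    keep = np.abs(X - 0.002 * T) < 0.1
    return T[keep], X[keep], lnD[keep]


def ols(y, *predictors):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


T, X, lnD = simulate_selected_experiments()
naive = ols(lnD, X)        # regress on composition alone
adjusted = ols(lnD, T, X)  # also condition on the variable that drives selection
print("naive slope on X:   ", round(naive[1], 2))     # strongly negative, though X has no effect
print("adjusted slope on X:", round(adjusted[2], 3))  # close to zero
```

In the unfiltered population `X` and `T` are uncorrelated, so the spurious slope is created entirely by which experiments survive the stability filter.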

We present a new generalized causal model for trace element partitioning based on a large, comprehensive database of published experiments (LEPR/traceDs, available online). Our model is applied to clinopyroxene, garnet, and amphibole partitioning for 53 trace elements, and is the first of its type to directly account for the biasing effect of phase equilibrium constraints using the specialized techniques of causal inference [4], which are largely unknown to the geological community. To control overfitting, we combine statistical regularization constraints with Monte Carlo analysis of correlated model uncertainties. These models are built and maintained in a simple Jupyter notebook environment that allows dynamic updates as more data are added to the underlying LEPR/traceDs database. The final result is a set of partitioning models for key igneous phases whose predictions diverge substantially from those of existing models based on simpler model forms and more limited datasets.
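Combining regularization with Monte Carlo analysis of correlated uncertainties can be sketched generically: fit an L2-regularized (ridge) regression, estimate the full coefficient covariance, then draw correlated coefficient samples and push them through a nonlinear prediction such as D = exp(ln D). The abstract does not specify the regularization scheme, so ridge regression here is an assumed stand-in, and the covariance formula assumes independent, equal-variance residuals. The data in the usage lines are synthetic.

```python
import numpy as np


def ridge_fit(A, y, lam):
    """L2-regularised least squares; the first column (intercept) is not penalised.

    Returns the coefficients and an approximate covariance matrix assuming
    independent, equal-variance residuals.
    """
    p = A.shape[1]
    penalty = lam * np.eye(p)
    penalty[0, 0] = 0.0
    AtA = A.T @ A
    M_inv = np.linalg.inv(AtA + penalty)
    beta = M_inv @ A.T @ y
    resid = y - A @ beta
    sigma2 = resid @ resid / max(len(y) - p, 1)
    cov = sigma2 * M_inv @ AtA @ M_inv
    return beta, 0.5 * (cov + cov.T)  # symmetrise against round-off


def mc_predict(x_new, beta, cov, transform=np.exp, n_draws=5000, seed=0):
    """Propagate correlated coefficient uncertainty into a (possibly nonlinear) prediction."""
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(beta, cov, size=n_draws)  # correlated samples
    preds = transform(draws @ x_new)
    return preds.mean(), preds.std()


# Synthetic example: ln D = 1 + 2*z1 - 0.5*z2 + noise.
rng = np.random.default_rng(1)
Z = rng.normal(size=(200, 2))
A = np.column_stack([np.ones(200), Z])
y = A @ np.array([1.0, 2.0, -0.5]) + rng.normal(0.0, 0.1, 200)
beta, cov = ridge_fit(A, y, lam=1.0)
mean_D, sd_D = mc_predict(np.array([1.0, 0.1, 0.2]), beta, cov)
```

Sampling from the full covariance, rather than perturbing each coefficient independently, preserves the trade-offs between coefficients that strongly correlated predictors create.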

Funding: EAR-2026904 (2026819, 2026916): A data-driven modeling infrastructure to support research and education in volcanology, geochemistry and petrology; EAR-1948806: EarthChem & SESAR - Data Infrastructure for Geochemistry and Earth Science Samples Communities

[1] Nielsen, R. L. (1990), Reviews in Mineralogy and Geochemistry, 24(1), 65-105.

[2] Bédard, J. H. (2006), Geochimica et Cosmochimica Acta, 70(14), 3717-3742.

[3] Blundy, J. D., and Wood, B. J. (1991), Geochimica et Cosmochimica Acta, 55(1), 193-209.

[4] Pearl, J. (2009), Statistics Surveys, 3, 96-146.

Air Quality from Local to Global Scales: An Introduction to the MELODIES-MONET Python Package

David Fillmore1, Rebecca Schwantes2,4, Rebecca Buchholz1, Duseong Jo1, Louisa Emmons1, Meng Li2,4, Jian He2,4, Barry Baker3, Zachary Moon3, Margaret Bruckner5

1National Center for Atmospheric Research - Atmospheric Chemistry Observations and Modeling; 2National Oceanic and Atmospheric Administration - Chemical Sciences Laboratory; 3National Oceanic and Atmospheric Administration - Air Resources Laboratory; 4Cooperative Institute for Research in Environmental Sciences - University of Colorado, Boulder; 5University of Wisconsin, Madison

MELODIES (Model Evaluation using Observations, Diagnostics and Experiments Software) is a modular Python framework for evaluating and assessing regional and global atmospheric chemistry models. Its core functionality includes the spatial and temporal alignment of diverse model and observational datasets, including observations from surface networks, aircraft, and Earth-orbiting satellites. In addition to a standard suite of spatial pattern and time series plots, MELODIES generates a set of statistical metrics and is designed to be highly extensible and customizable.
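The temporal alignment and statistics step can be sketched with pandas: each observation is matched to the nearest model time step at the same site within a tolerance, and simple metrics are computed on the matched pairs. This is a minimal illustration, not MELODIES-MONET's own pairing code; the column names (`time`, `site`, `value`), the 30-minute tolerance, and the assumption that model output has already been sampled at site locations are all hypothetical.

```python
import numpy as np
import pandas as pd


def pair_model_obs(model: pd.DataFrame, obs: pd.DataFrame, tolerance: str = "30min") -> pd.DataFrame:
    """Match each observation to the nearest model time step at the same site."""
    paired = pd.merge_asof(
        obs.sort_values("time"),
        model.sort_values("time"),
        on="time",
        by="site",                     # assumes model values already sampled at sites
        direction="nearest",
        tolerance=pd.Timedelta(tolerance),
        suffixes=("_obs", "_model"),
    )
    return paired.dropna(subset=["value_model"])  # drop observations with no model match


def basic_stats(paired: pd.DataFrame) -> dict:
    """Sample size, mean bias, RMSE (model minus observation), and Pearson correlation."""
    diff = paired["value_model"] - paired["value_obs"]
    return {
        "N": int(diff.size),
        "MB": float(diff.mean()),
        "RMSE": float(np.sqrt((diff ** 2).mean())),
        "R": float(paired["value_model"].corr(paired["value_obs"])),
    }


model = pd.DataFrame({
    "time": pd.to_datetime(["2019-08-01 00:00", "2019-08-01 01:00", "2019-08-01 02:00"]),
    "site": "A", "value": [10.0, 20.0, 30.0]})
obs = pd.DataFrame({
    "time": pd.to_datetime(["2019-08-01 00:10", "2019-08-01 01:50", "2019-08-01 05:00"]),
    "site": "A", "value": [12.0, 31.0, 40.0]})
stats = basic_stats(pair_model_obs(model, obs))
```

Here the 05:00 observation has no model time step within 30 minutes and is dropped, so the statistics use two pairs.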

MELODIES incorporates the Python package MONET (Model and Observation Evaluation Toolkit), in particular its dataset readers and plotting library. The combined MELODIES-MONET package is further built on the Python modules pandas (for time series) and xarray/NumPy (for numerical array computations). This Jupyter notebook gives an overview of the essential functionality of MELODIES-MONET and includes several examples for exploring and visualizing air quality data.

MELODIES-MONET is open-source software; its source code and documentation are hosted online.

Comparing air quality models to satellite observations with MELODIES MONET

Margaret Bruckner1, Rebecca Buchholz3, David Fillmore3, Meng Li2,4, Rebecca Schwantes2,4, Barry Baker5, Duseong Jo3, Louisa Emmons3, Jian He2,4, Zachary Moon5

1University of Wisconsin-Madison; 2National Oceanic and Atmospheric Administration - Chemical Sciences Laboratory; 3National Center for Atmospheric Research - Atmospheric Chemistry Observations and Modeling; 4Cooperative Institute for Research in Environmental Sciences - University of Colorado, Boulder; 5National Oceanic and Atmospheric Administration - Air Resources Laboratory

Validation studies for atmospheric composition models commonly include evaluation against satellite observations. The MELODIES MONET Python package aims to provide a flexible framework for statistically and visually comparing air quality models with remote sensing and in-situ observations. MELODIES MONET contains routines for temporal and spatial resampling, application of satellite averaging kernels or a priori columns, and calculation of column densities for model datasets. This notebook illustrates these capabilities by evaluating a global atmospheric composition model during the 2019 Fire Influence on Regional to Global Environments and Air Quality (FIREX-AQ) field campaign.
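The two column operations named above can be sketched in a few lines: converting model volume mixing ratios on pressure layers into partial columns (hydrostatic approximation), then applying a column averaging kernel. This is a generic sketch, not MELODIES MONET's implementation. It assumes one common averaging-kernel convention, c = Σ x_a,k + Σ A_k (x_model,k − x_a,k) on partial columns; individual satellite products define their kernels differently (some apply them directly to model partial columns), so the product documentation governs in practice. The layer values in the usage lines are hypothetical.

```python
import numpy as np

N_A = 6.02214076e23   # Avogadro constant, molecules mol^-1
G0 = 9.80665          # standard gravity, m s^-2
M_AIR = 0.0289644     # molar mass of dry air, kg mol^-1


def partial_columns(vmr, p_edges_pa):
    """Layer partial columns (molecules cm^-2) from dry-air VMR (mol/mol) and edge pressures (Pa).

    Hydrostatic approximation: moles of air per m^2 in a layer = dp / (g * M_air).
    """
    dp = np.abs(np.diff(np.asarray(p_edges_pa, dtype=float)))
    return np.asarray(vmr, dtype=float) * dp / (G0 * M_AIR) * N_A * 1e-4  # m^-2 -> cm^-2


def smoothed_column(model_partial, apriori_partial, column_ak):
    """Apply a column averaging kernel: c = sum(x_a) + sum_k A_k (x_model,k - x_a,k)."""
    model_partial = np.asarray(model_partial, dtype=float)
    apriori_partial = np.asarray(apriori_partial, dtype=float)
    return apriori_partial.sum() + np.sum(np.asarray(column_ak) * (model_partial - apriori_partial))


# Hypothetical three-layer profile (edges in Pa, mixing ratios in mol/mol).
edges = [100000.0, 80000.0, 50000.0, 10000.0]
model_pc = partial_columns([40e-9, 30e-9, 20e-9], edges)
apriori_pc = partial_columns([60e-9, 40e-9, 25e-9], edges)
print(smoothed_column(model_pc, apriori_pc, [0.8, 1.0, 1.1]))
```

As a sanity check, a mixing ratio of 1 over the whole atmosphere (101325 Pa to 0 Pa) gives about 2.15 × 10^25 molecules cm^-2, the familiar total dry-air column.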

Harnessing Large InSAR Datasets to Explore Crustal Processes along the San Andreas Fault Plate Boundary Over Time: GMTSAR in Action

Katherine Anna Guns1, Xiaohua Xu2, David Sandwell1, Yehuda Bock1

1Institute of Geophysics and Planetary Physics, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA, USA; 2Institute for Geophysics, University of Texas at Austin, Austin, TX, USA

Interferometric Synthetic Aperture Radar (InSAR) measurements collected regularly over time provide a detailed observational tool for investigating ongoing crustal deformation processes, including earthquake cycle effects, hydrological changes, and volcanic deformation. InSAR data, however, constitute one of the largest datasets in the geosciences, with over 3 petabytes (3000 TB) of high-quality data made available annually through the European Space Agency's (ESA) Sentinel-1A/B SAR satellite system (a single raw data file covering a ~30 s acquisition is ~4 GB). Moreover, future SAR missions, such as the upcoming NASA-ISRO SAR (NISAR) mission, will add even more high-quality data to what is available to the community. Harnessing datasets this large requires not only powerful computers but also innovative methods and actively maintained, robust software packages. Here, we apply the freely available, open-source software GMTSAR to create and strengthen an InSAR time series processing workflow. We use more than 92 TB of Sentinel-1A/B data from nine satellite tracks spanning California to explore tectonic and hydrologic processes along the San Andreas plate boundary over the period 2015–2021 (the most recent versions of these products are available online).
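A core step in many InSAR time series workflows is a small-baseline-style inversion: each interferogram measures the displacement difference between two acquisition dates, and a least-squares solve over the whole interferogram network recovers a displacement history per pixel. The sketch below shows that idea for a single pixel with NumPy. It is a generic illustration, not GMTSAR's implementation; it assumes a connected network (production codes typically use SVD-based solutions to handle disconnected subsets) and ignores phase unwrapping, atmospheric correction, and the phase-to-displacement sign convention. The dates and values are hypothetical.

```python
import numpy as np


def sbas_invert(pairs, ifg_values, n_dates):
    """Least-squares displacement history from a connected interferogram network.

    pairs[k] = (i, j) means interferogram k measures d[j] - d[i] (date indices).
    The first acquisition is fixed as the zero reference.
    """
    G = np.zeros((len(pairs), n_dates - 1))
    for row, (i, j) in enumerate(pairs):
        if j > 0:
            G[row, j - 1] += 1.0   # column c corresponds to date c + 1
        if i > 0:
            G[row, i - 1] -= 1.0
    d, *_ = np.linalg.lstsq(G, np.asarray(ifg_values, dtype=float), rcond=None)
    return np.concatenate([[0.0], d])


# Hypothetical single-pixel example: five dates, seven interferograms (mm).
true_d = np.array([0.0, 2.0, 5.0, 4.0, 7.0])
pairs = [(0, 1), (1, 2), (0, 2), (2, 3), (1, 3), (3, 4), (2, 4)]
ifgs = [true_d[j] - true_d[i] for i, j in pairs]
recovered = sbas_invert(pairs, ifgs, n_dates=5)  # matches true_d for noise-free input
```

With redundant interferograms, the least-squares solution also averages down independent noise in individual pairs, which is one reason dense networks are worth their data volume.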

While InSAR provides continuous spatial ground coverage, its measurements lack an absolute reference system and can be prone to artifacts from atmospheric delay, which are most significant at long spatial wavelengths. To remedy this, we integrate the InSAR time series with precise Global Navigation Satellite System (GNSS) time series in the International Terrestrial Reference Frame (ITRF), which provide corrections for long-wavelength errors as well as an absolute reference system. This combined, corrected GNSS/InSAR time series product allows us to monitor in detail the effects of recent earthquakes (namely the 2019 Ridgecrest, CA, event), subsidence and uplift from changing groundwater basin storage conditions, and changes in actively creeping fault zones within the plate boundary, all with millimeter accuracy.
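One simple way to use GNSS to correct long-wavelength InSAR errors is to project GNSS displacements into the radar line of sight (LOS), fit a low-order surface to the InSAR-minus-GNSS misfit at the stations, and subtract that surface from the InSAR grid. The sketch below uses a planar ramp as the assumed long-wavelength model; the abstract does not state the authors' exact integration method, which may use a different interpolation of the GNSS field. The look-vector sign convention (ground-to-satellite versus satellite-to-ground) also varies between processors, and all values in the usage lines are synthetic.

```python
import numpy as np


def gnss_to_los(east, north, up, look_enu):
    """Project GNSS ENU displacements onto a radar line-of-sight unit vector."""
    le, ln, lu = look_enu
    return east * le + north * ln + up * lu


def remove_ramp_with_gnss(grid_x, grid_y, insar, sta_x, sta_y, insar_at_sta, gnss_los):
    """Fit a planar ramp to the InSAR-minus-GNSS misfit at stations and remove it from the grid."""
    misfit = insar_at_sta - gnss_los
    A = np.column_stack([np.ones_like(sta_x), sta_x, sta_y])
    (c0, cx, cy), *_ = np.linalg.lstsq(A, misfit, rcond=None)
    return insar - (c0 + cx * grid_x + cy * grid_y)


# Synthetic example: true LOS signal plus a planar artifact, five GNSS stations.
xs, ys = np.meshgrid(np.arange(0.0, 100.0, 10.0), np.arange(0.0, 100.0, 10.0))
truth = np.sin(xs / 30.0)
insar = truth + (1.0 + 0.01 * xs - 0.02 * ys)
rows, cols = np.array([0, 0, 9, 5, 9]), np.array([0, 9, 0, 5, 9])
corrected = remove_ramp_with_gnss(
    xs, ys, insar, xs[rows, cols], ys[rows, cols], insar[rows, cols], truth[rows, cols]
)
```

Fitting only a few parameters to the station misfits leaves short-wavelength InSAR detail, such as fault creep or basin subsidence, untouched while anchoring the grid to the GNSS reference frame.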

Conference: 2022 EarthCube Annual Meeting