Conference Agenda

7A - Ecology and Environmental Sciences
Thursday, 08/July/2021:
9:15am - 10:45am

Session Chair: Ulfah Mardhiah
Zoom Host: Adrian Maga
Replacement Zoom Host: Dorothea Hug Peter
Virtual location: The Lounge #talk_ecology_environment

Session Topics:
Ecology, Environmental sciences

9:15am - 9:35am
ID: 147 / ses-07-A: 1
Regular Talk
Topics: Environmental sciences
Keywords: big data

startR: A tool for large multi-dimensional data processing

An-Chi Ho, Núria Pérez-Zanón, Nicolau Manubens, Francesco Benincasa, Pierre-Antoine Bretonnière

Barcelona Supercomputing Center (BSC-CNS)

The growing volume and variety of data in many scientific domains have made data analysis challenging. Simple operations like extracting data from storage and performing statistical analysis on them have to be rethought. startR is an R package developed at the Earth Sciences Department of the Barcelona Supercomputing Center (BSC-CNS) that allows users to retrieve, arrange, and process large multi-dimensional datasets automatically with a concise workflow.

startR provides a framework under which the datasets to be processed can be perceived as a single multi-dimensional array. The array is first declared; then a user-defined function can be applied to the relevant dimensions in an apply-like fashion, building up a declarative workflow that can be executed on various computing platforms. During execution, startR implements the MapReduce paradigm, chunking the data and processing the chunks either locally or remotely on high-performance computing systems, leveraging multi-node and multi-core parallelism where possible. Besides the data, metadata are preserved and extended with information about the operations applied, ensuring the reproducibility of the analysis.
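The declare-then-compute pattern described above can be sketched as follows. This is a minimal illustration of the workflow, not a complete recipe: the file path pattern, variable name, and dimension names are placeholders, and argument details may differ across startR versions.

```r
library(startR)

# Declare the datasets as one multi-dimensional array. No data are
# loaded yet; $var$ and $sdate$ in the path pattern are placeholders
# expanded from the arguments below (illustrative paths).
data <- Start(dat = "/path/to/exp/$var$_$sdate$.nc",
              var = "tas",
              sdate = c("19930101", "19940101"),
              time = "all",
              lat = "all",
              lon = "all",
              retrieve = FALSE)

# Define the user function and the dimensions it operates on:
# here, a time mean collapsing the 'time' dimension.
step <- Step(fun = function(x) mean(x, na.rm = TRUE),
             target_dims = "time",
             output_dims = NULL)

# Build the declarative workflow and trigger execution, chunking
# the array along latitude so chunks can be processed in parallel.
workflow <- AddStep(data, step)
result <- Compute(workflow, chunks = list(lat = 4))
```

The same workflow object can be dispatched to an HPC cluster instead of the local machine by passing a cluster configuration to `Compute()`.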

Several functionalities in startR, such as spatial interpolation and time manipulation, are tailored to atmospheric sciences research on climate, weather, and air quality. It is compatible with other R tools developed at BSC-CNS, forming a strong toolset for climate research, but it is potentially useful in other research fields as well. Even though netCDF is the only data format supported in the current release, adaptors for other file formats can be plugged in, enabling the tool to be exploited in any scientific domain where large multi-dimensional data are involved.


9:35am - 9:55am
ID: 193 / ses-07-A: 2
Regular Talk
Topics: Environmental sciences
Keywords: hydrology, river hydrograph, hydrograph separation, climate change, spatial analysis

grwat: a new R package for automated separation and analysis of river hydrograph

Timofey Samsonov1, Ekaterina Rets2, Maria Kireeva1

1Faculty of Geography, Lomonosov Moscow State University, Russian Federation; 2Institute of Water Problems, Russian Academy of Science, Russian Federation

`grwat` is a new R package for the analysis of river hydrographs, i.e. time series of river discharge values. The overall shape of a hydrograph is specific to each river and is heavily influenced by climatic conditions within the river basin. As the climate changes, the typical hydrograph shape of each river is transformed as well. The main goal of the grwat package is to provide automated tools to extract the genetic components of river discharge (e.g. how much discharge is due to thaws, floods, etc.), as well as graphical and statistical tools to reveal interannual and long-term changes in these components.

The core procedure that enables the extraction of genetic components is separation. The implementation of separation in `grwat` is two-stage. First, it follows the generally accepted approach of separating the discharge into quick flow and baseflow. Second, it uses temperature and precipitation time series to split the quick flow into seasonal (snowmelt), thaw-induced and flood-induced discharge, using an originally developed algorithm. The separation is programmed in standard C++17 and interfaced to `grwat` via Rcpp.

The separated hydrograph is represented as a data frame in which, for each observation, the input total discharge is distributed among several columns, each representing a genetic component. Such a data frame can be further analyzed with `grwat`, resulting in more than 30 interannual and long-term statistically tested variables characterizing the aggregated values, dates and durations of specific events and periods. Examples are seasonal flood runoff, annual groundwater discharge, the number of thaw days, and the date of seasonal flood onset. Finally, `grwat` contains convenient functions to quickly visualize one or more variables using ggplot2 graphics, and to generate high-quality R Markdown-based HTML reports that combine graphics and the results of statistical tests for all computed variables.
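To illustrate the first separation stage (quick flow vs. baseflow), here is a sketch of the classic Lyne-Hollick one-parameter digital filter in plain R. This is a generic textbook method shown only to make the concept concrete; it is not grwat's actual algorithm, which is implemented in C++ and additionally uses temperature and precipitation data.

```r
# Quick-flow/baseflow split with the Lyne-Hollick recursive filter.
# Q: vector of daily discharge values; alpha: filter parameter
# (0.925 is a commonly used value in the hydrology literature).
lyne_hollick <- function(Q, alpha = 0.925) {
  qf <- numeric(length(Q))   # filtered quick-flow signal
  qf[1] <- 0
  for (t in 2:length(Q)) {
    qf[t] <- alpha * qf[t - 1] + (1 + alpha) / 2 * (Q[t] - Q[t - 1])
    if (qf[t] < 0) qf[t] <- 0   # quick flow cannot be negative
  }
  qf <- pmin(qf, Q)             # quick flow cannot exceed total discharge
  data.frame(total = Q, quickflow = qf, baseflow = Q - qf)
}

# Toy hydrograph with one flood peak over a steady baseflow.
sep <- lyne_hollick(c(10, 10, 40, 80, 50, 25, 15, 12, 10, 10))
```

The output mirrors grwat's representation: a data frame in which each row's total discharge is distributed across component columns.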
Development is funded by the Russian Science Foundation (project 19-77-10032).


9:55am - 10:15am
ID: 190 / ses-07-A: 3
Regular Talk
Topics: Ecology
Keywords: agent-based modelling, animal, R6, simulation, OOP

Using R6 object-oriented programming to build agent-based models

Liam Daniel Bailey, Alexandre Courtiol

IZW Berlin, Germany

Agent-based (or individual-based) modelling is an invaluable tool in the biological sciences, used to understand complex topics such as conservation management, invasive species, and animal population dynamics. However, although R is one of the most common programming languages in the biological sciences, it is often considered 'unsuitable' for agent-based modelling, with other tools such as NetLogo, Java, and C++ used instead.

Here, we show how the R6 package can be used to build agent-based models and simulate complex population and evolutionary dynamics in R. R6 makes it easy to define classes with encapsulated methods and has become the package of choice behind many well-known R packages that use encapsulated object-oriented programming (e.g. shiny, dplyr, testthat). Yet, while simulations have been built in R using other class systems such as S3 and S4, the potential of R6 for such tasks remains untapped.

We provide a real-world example from our research on a large African carnivore, the spotted hyena. Object-oriented programming with R6 was easy to learn and implement, and working in R allowed us to quickly build, document, and unit-test our code by taking advantage of existing tools in R/RStudio with which we were already familiar (e.g. RStudio projects, roxygen2, testthat). Implementing agent-based models in R will allow ecologists to make easy use of this powerful tool in their research: rather than having to learn a new programming language, researchers can implement agent-based models in the same language they already use for data wrangling, statistical analysis, and data visualisation.
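The encapsulated-class pattern the talk describes can be sketched with R6 as follows. The class, fields and methods here are illustrative inventions, not taken from the authors' hyena model; the point is that R6 objects carry mutable state and methods together, which is what agent-based models need.

```r
library(R6)

# A minimal agent: one individual with an age and a survival state.
Agent <- R6Class("Agent",
  public = list(
    age = 0,
    alive = TRUE,
    initialize = function(age = 0) {
      self$age <- age
    },
    # One time step: the agent ages and dies with a fixed probability.
    step = function(mortality = 0.1) {
      self$age <- self$age + 1
      if (runif(1) < mortality) self$alive <- FALSE
      invisible(self)
    }
  )
)

# Simulate a population of 100 agents over 10 time steps. Because R6
# objects have reference semantics, a$step() updates each agent in place.
population <- lapply(seq_len(100), function(i) Agent$new())
for (t in 1:10) {
  for (a in population) if (a$alive) a$step()
}
survivors <- sum(vapply(population, function(a) a$alive, logical(1)))
```

Real models would add fields for location, energy or genotype, and methods for movement, reproduction and interaction, but they follow this same class-and-loop structure.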

10:15am - 10:35am
ID: 171 / ses-07-A: 4
Regular Talk
Topics: Environmental sciences
Keywords: data processing

Climate Forecast Analysis Tools Framework: from the storage to the HPC to get reproducible climate research results and services

Núria Pérez-Zanón1, An-Chi Ho1, Francesco Benincasa1, Pierre-Antoine Bretonnière1, Louis-Philippe Caron2, Chihchung Chou1, Carlos Delgado-Torres1, Llorenç Lledó1, Nicolau Manubens3, Lluís Palma1

1Barcelona Supercomputing Center (BSC); 2Ouranos; 3NA

Climate forecast researchers need to assess the quality of their forecasts by comparing them against reference observation datasets using state-of-the-art verification metrics. This procedure requires reading in the seasonal forecasts and reference data and restructuring them for later comparison (e.g. regridding, resampling or reordering). Only then can statistical methods be applied to assess forecast skill; finally, tailored visualization tools are employed to explore the results.

At the Earth Sciences Department of the Barcelona Supercomputing Center, expertise in seasonal forecast research has traditionally been compiled in the s2dverification R package since its first release in 2009. The package provides tools implementing all the steps of the procedure described above, allowing researchers to share their methods while reducing development and maintenance costs. However, as the department broadened its activity to include research on sub-seasonal forecasts, decadal prediction and climate projections, as well as the development of climate services for various stakeholders, new state-of-the-art tools for manipulating climate data became necessary.

As a result, the department currently maintains eight R packages. These packages can be used separately or within their common framework, and include methods for calibration, downscaling and combination in the CSTools package, climate indicators in ClimProjDiags and CSIndicators, and other climatological methods in s2dv (s2dverification's successor). The framework has been designed to be flexible and efficient. The big-data challenge inherent in climate data analysis is addressed by employing the startR and multiApply packages to seamlessly enable chunked multi-core processing, optionally leveraging multi-node parallelism on HPC platforms.
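As a small illustration of the multi-core apply pattern that multiApply provides, the sketch below computes a time mean over a named multi-dimensional array. The array shape and dimension names are invented for the example; argument details may vary between multiApply versions.

```r
library(multiApply)

# A toy forecast-like array with named dimensions (illustrative sizes).
data <- array(rnorm(12 * 20 * 30),
              dim = c(time = 12, lat = 20, lon = 30))

# Apply a time mean over the 'time' dimension of each grid point,
# optionally in parallel across cores via the ncores argument.
clim <- Apply(data = list(data),
              target_dims = "time",
              fun = function(x) mean(x),
              ncores = 2)$output1

dim(clim)   # the 'time' dimension has been collapsed, leaving lat x lon
```

In the framework described above, the same user function can be handed to startR's `Compute()` so that the chunking and parallel dispatch extend transparently to HPC platforms.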