Session

3A - Machine Learning and Data Management
Session Topics: Data mining / Machine learning / Deep Learning and AI
Session Sponsor: MemVerge
Session Slides

Presentations

8:45am - 9:05am
Talk-Live
ID: 188 / ses-03-A: 1
Regular Talk
Topics: Data mining / Machine learning / Deep Learning and AI
Keywords: Automated Machine Learning, R package, Hyperband

mlr3automl - Automated Machine Learning in R
Ludwig-Maximilians-Universität Munich

We introduce mlr3automl, an open-source framework for Automated Machine Learning in R. Based on the mlr3 Machine Learning package, mlr3automl builds robust and accurate classification and regression models for tabular data. mlr3automl provides automatic preprocessing, which guarantees stable performance in the presence of missing data, categorical and high-cardinality features, and large data sets. Preprocessing and model building are handled by a flexible pipeline implemented with mlr3pipelines. This allows mlr3automl to jointly optimize preprocessing, model selection and model hyperparameters using Hyperband. mlr3automl shows strong performance and stability on a benchmark consisting of 39 challenging classification tasks. mlr3automl successfully completed every task in the benchmark within the strict time budget, which three out of five other state-of-the-art AutoML systems failed to achieve.

9:05am - 9:25am
Talk-Video
ID: 168 / ses-03-A: 2
Regular Talk
Topics: Data mining / Machine learning / Deep Learning and AI
Keywords: exploratory data analysis

Triplot: model agnostic measures and visualisations for variable importance in predictive models that take into account the hierarchical correlation structure
MI2 Data Lab, Warsaw University of Technology

One of the key elements of the explanatory analysis of a predictive model is to assess the importance of the individual variables. The rapid development of the area of predictive model exploration (also called explainable artificial intelligence or interpretable machine learning) has led to the popularization of local (instance-level) and global (dataset-level) methods, such as Permutational Variable Importance, Shapley Values (SHAP), Local Interpretable Model-agnostic Explanations (LIME), Break Down and so on. However, these methods do not use information about the correlation between features, which significantly reduces the explainability of the model behaviour. In this work, we propose new methods to support model analysis by exploiting the information about the correlation between variables. The dataset-level aspect importance measure is inspired by the block permutations procedure, while the instance-level aspect importance measure is inspired by the LIME method. We show how to analyse groups of variables (aspects) both when they are proposed by the user and when they should be determined automatically based on the hierarchical structure of correlations between variables. Additionally, we present a new type of model visualisation, triplot, that exploits a hierarchical structure of variable grouping to produce a high information density model visualisation. This visualisation provides a consistent illustration for either local or global model and data exploration.
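
The block-permutation idea behind a dataset-level aspect importance can be illustrated with a minimal sketch. This is not the triplot package's implementation: the function names, the toy data and the stand-in model below are all hypothetical, and the loss is assumed to be mean squared error. The point shown is only that all columns of an aspect are shuffled with one shared row order, so correlations inside the group are preserved while the link to the target is broken.

```python
import random

def mse(model, X, y):
    """Mean squared error of `model` on rows X with targets y."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)

def aspect_importance(model, X, y, aspect, n_repeats=20, seed=0):
    """Mean loss increase when the columns in `aspect` are permuted together."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    increases = []
    for _ in range(n_repeats):
        order = list(range(len(X)))
        rng.shuffle(order)
        # One shared row order for the whole group: values of the aspect's
        # columns are taken from row `src`, all other columns stay in place.
        X_perm = [
            [X[src][j] if j in aspect else row[j] for j in range(len(row))]
            for row, src in zip(X, order)
        ]
        increases.append(mse(model, X_perm, y) - baseline)
    return sum(increases) / n_repeats

# Toy data (assumption): columns 0 and 1 are strongly correlated and drive y;
# column 2 is unrelated noise.
data_rng = random.Random(1)
X = []
for _ in range(200):
    a = data_rng.gauss(0, 1)
    X.append([a, a + data_rng.gauss(0, 0.1), data_rng.gauss(0, 1)])
y = [row[0] + row[1] for row in X]

def model(row):
    """Hypothetical fitted model that happens to match the data exactly."""
    return row[0] + row[1]

imp_correlated = aspect_importance(model, X, y, aspect={0, 1})
imp_noise = aspect_importance(model, X, y, aspect={2})
print(imp_correlated, imp_noise)
```

In this toy setting the correlated pair receives a clearly positive importance, while the noise column receives none because the stand-in model never reads it.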
We also show an example of real-world data with 5k instances and 37 features in which a significant correlation between variables affects the interpretation of variable importance. The proposed method is, to our knowledge, the first to allow direct use of the correlation between variables in exploratory model analysis. The triplot package for R is developed under the open-source GPL-3 licence and is available on GitHub at https://github.com/ModelOriented/triplot.

9:25am - 9:45am
Talk-Video
ID: 252 / ses-03-A: 3
Regular Talk
Topics: Data mining / Machine learning / Deep Learning and AI
Keywords: networks, embeddings, machine learning, algorithms

Getting sprung in R: Introduction to the rsetse package for embedding feature-rich networks
UCL, United Kingdom

The Strain Elevation Tension Spring embedding algorithm (SETSe) is a deterministic method for embedding feature-rich networks. The algorithm uses simple Newtonian equations of motion and Hooke's law to embed the network onto a locally Euclidean manifold. To create the embedding, SETSe converts node attributes into forces and the edge attributes into springs. SETSe finds an equilibrium position when the forces on the springs balance the forces of the nodes. The algorithm has a time complexity of O(2) and linear memory complexity; this means the algorithm avoids issues faced by other physics-based embedding methods and can be used to embed graphs with tens of thousands of nodes and more than a million edges. Applications of SETSe include analysing social networks, understanding the robustness of power grids, geographical analysis, predicting node features, understanding power dynamics between individuals and organisations, and analysis of molecular structures. This presentation will provide both a brief technical discussion of the algorithm and its implementation, as well as several use cases. The use cases describe how to embed a network and then how to interpret that embedding. There are very few options for graph embeddings using R, and this is something that rsetse seeks to address; the algorithm has been implemented in the package `rsetse` and is available on CRAN.
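
The force-and-spring picture in the abstract can be made concrete with a deliberately simplified toy. The sketch below is not the SETSe algorithm or the rsetse API: it assumes unit masses, purely linear springs acting along a single axis, and a hypothetical damping term, whereas the real method uses its own geometry. It only shows how node forces and Hooke's-law edge forces can be integrated over time until they balance.

```python
def spring_equilibrium(forces, edges, dt=0.01, damping=2.0, steps=5000):
    """Damped Newtonian relaxation of nodes on one axis.

    forces: external force per node (assumed to sum to zero)
    edges:  (i, j, k) tuples, a linear spring of stiffness k between i and j
    Returns the node positions after `steps` time steps.
    """
    n = len(forces)
    z = [0.0] * n  # positions
    v = [0.0] * n  # velocities
    for _ in range(steps):
        net = list(forces)
        for i, j, k in edges:
            pull = k * (z[j] - z[i])  # Hooke's law, pulls i towards j
            net[i] += pull
            net[j] -= pull
        for i in range(n):
            # Unit mass: acceleration = net force minus velocity damping.
            v[i] += dt * (net[i] - damping * v[i])
            z[i] += dt * v[i]
    return z

# Two nodes with opposite forces joined by one spring of stiffness 2.
# At equilibrium the spring tension equals the force, so the
# separation is force / stiffness = 0.5.
z = spring_equilibrium(forces=[1.0, -1.0], edges=[(0, 1, 2.0)])
print(z)
```

The equilibrium positions then act as the embedding: nodes pushed by larger forces, or attached by weaker springs, end up further from their neighbours.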
https://github.com/JonnoB/rSETSe
https://jonnob.github.io/rSETSe/index.html

9:45am - 10:05am
Talk-Video
ID: 137 / ses-03-A: 4
Regular Talk
Topics: Data mining / Machine learning / Deep Learning and AI
Keywords: data envelopment analysis

An R package for the implementation of Efficiency Analysis Trees and the estimation of technical efficiency
Miguel Hernández University of Elche

EAT is a new R package that includes functions to estimate production frontiers and technical efficiency measures using non-parametric techniques based on CART regression trees. The package implements the main algorithms associated with a new technique introduced to estimate the efficiency of a set of decision making units in Economics and Engineering through machine learning techniques, called Efficiency Analysis Trees (Esteve et al., 2020). It encompasses the estimation of radial measures, oriented Russell efficiency measures, the directional distance function, the weighted additive model, graphical representations of the production frontier using tree-shaped structures and the classification of input variable importance. In addition, it incorporates code to carry out an adaptation of the Random Forest algorithm to estimate technical efficiency. This work describes the methodology and application of the functions.
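
To give a feel for what a radial efficiency measure against a non-parametric step frontier looks like, here is a minimal sketch. It does not use Efficiency Analysis Trees or the EAT package: it swaps in the simpler free-disposal-hull style frontier for a single input and a single output, and the function name and data are hypothetical. The output-oriented score is the frontier output divided by the observed output, so 1.0 means the unit lies on the frontier.

```python
def step_frontier_output_efficiency(inputs, outputs, unit):
    """Radial output-oriented score of one unit against a step frontier.

    The frontier at input level x0 is the largest output observed among
    units that use no more input than x0 (free disposability).
    """
    x0, y0 = inputs[unit], outputs[unit]
    frontier = max(y for x, y in zip(inputs, outputs) if x <= x0)
    return frontier / y0

# Hypothetical decision making units: one input, one output each.
inputs = [1.0, 2.0, 3.0, 4.0]
outputs = [2.0, 5.0, 4.0, 6.0]
scores = [
    step_frontier_output_efficiency(inputs, outputs, i)
    for i in range(len(inputs))
]
print(scores)
```

Here the third unit scores 1.25 because the second unit produces more output with less input, so the third could in principle raise its output by 25% at its current input level.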