Conference Agenda

Overview and details of the sessions of this conference.

Session Overview
Session: 8C - Statistical modeling & Data Analysis 1
Time: Friday, 9 July 2021, 1:45pm - 3:15pm

Session Chair: Liz Hare
Zoom Host: Rachel Heyard
Replacement Zoom Host: Balogun Stephen
Virtual location: The Lounge #talk_stats_data_analysis_1
Session Topics: Data mining / Machine learning / Deep Learning and AI, Multivariate analysis


Presentations
1:45pm - 2:05pm
Talk-Video
ID: 176 / ses-08-C: 1
Regular Talk
Topics: Data mining / Machine learning / Deep Learning and AI
Keywords: Fairness, Bias, AI, Machine Learning, Visualization

fairmodels: A Flexible Tool For Bias Detection, Visualization, And Mitigation

Jakub Wiśniewski (1), Przemysław Biecek (1, 2)

(1) Faculty of Mathematics and Information Science, Warsaw University of Technology; (2) Faculty of Mathematics, Informatics and Mechanics, University of Warsaw

Machine learning decision systems are becoming ubiquitous in our lives. From dating apps to the rating of loan applicants, algorithms affect both our well-being and our future. However, these systems are not infallible. Moreover, complex predictive models readily learn the social biases present in historical data, which can amplify discrimination. To create models responsibly, we need tools for in-depth model validation that also cover potential discrimination.

This talk introduces fairmodels, an R package that helps validate fairness and mitigate bias in classification models in an easy and flexible way. The package takes a model-agnostic approach to bias detection, visualization, and mitigation. Its set of functions and fairness metrics enables fairness validation from different perspectives, and it includes a series of bias-mitigation methods that aim to reduce discrimination in the model.

The package is designed not only to examine a single model but also to facilitate comparisons between multiple models.
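As a rough illustration of ratio-based fairness validation, the Python sketch below computes per-group confusion-matrix rates (TPR, PPV, FPR, accuracy, and the positive-prediction rate used for statistical parity), divides each by the privileged group's rate, and flags ratios outside [epsilon, 1/epsilon]. The four-fifths threshold epsilon = 0.8, the metric set, the toy data, and the helper names `group_rates` and `ratio_fairness_report` are assumptions made for this example; this is not the fairmodels API, which is an R package.

```python
import math

# Hypothetical helpers for illustration only; this is not the fairmodels API.


def group_rates(y_true, y_pred, groups):
    """Confusion-matrix rates for each subgroup (label 1 = favorable outcome)."""

    def safe_div(a, b):
        return a / b if b else math.nan

    rates = {}
    for g in sorted(set(groups)):
        pairs = [(t, p) for t, p, gg in zip(y_true, y_pred, groups) if gg == g]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        tn = sum(1 for t, p in pairs if t == 0 and p == 0)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        rates[g] = {
            "TPR": safe_div(tp, tp + fn),          # equal opportunity
            "PPV": safe_div(tp, tp + fp),          # predictive parity
            "FPR": safe_div(fp, fp + tn),          # predictive equality
            "ACC": safe_div(tp + tn, len(pairs)),  # accuracy equality
            "STP": safe_div(tp + fp, len(pairs)),  # statistical parity
        }
    return rates


def ratio_fairness_report(y_true, y_pred, groups, privileged, epsilon=0.8):
    """Ratio of each group's rate to the privileged group's rate.

    A ratio passes if it lies in [epsilon, 1/epsilon]; undefined (NaN) ratios fail.
    """
    rates = group_rates(y_true, y_pred, groups)
    base = rates[privileged]
    report = {}
    for g, r in rates.items():
        if g == privileged:
            continue
        report[g] = {}
        for metric, value in r.items():
            ratio = value / base[metric] if base[metric] else math.nan
            passed = not math.isnan(ratio) and epsilon <= ratio <= 1 / epsilon
            report[g][metric] = (ratio, passed)
    return report


# Toy example with hypothetical data; group "a" is treated as privileged.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
report = ratio_fairness_report(y_true, y_pred, groups, privileged="a")
for metric, (ratio, ok) in report["b"].items():
    print(f"{metric}: ratio={ratio:.2f} {'pass' if ok else 'FAIL'}")
```

Because the check needs only labels, predictions, and group membership, it works for any classifier, in line with the model-agnostic approach described above; running it on the predictions of several models gives a simple side-by-side comparison.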



2:05pm - 2:25pm
Talk-Video
ID: 106 / ses-08-C: 2
Regular Talk
Topics: Data mining / Machine learning / Deep Learning and AI
Keywords: causal analysis, estimation, high-dimensional data, machine learning, statistical inference

DoubleML - Double Machine Learning in R

Philipp Bach (1), Victor Chernozhukov (2), Malte S. Kurz (1), Martin Spindler (1)

(1) University of Hamburg, Germany; (2) MIT

The R package DoubleML implements the double/debiased machine learning framework of Chernozhukov et al. (2018). DoubleML makes it possible to estimate causal parameters based on machine learning methods. The double machine learning framework consists of three key ingredients: Neyman orthogonality, high-quality machine learning estimation, and sample splitting. Nuisance components can be estimated with various state-of-the-art machine learning methods available in the mlr3 ecosystem (Lang et al., 2019). DoubleML allows users to perform inference in a variety of causal models, including partially linear and interactive regression models and their extensions to instrumental variable estimation. Its object-oriented implementation makes model specification highly flexible and the package easy to extend. This talk serves as an introduction to the double machine learning framework and the R package DoubleML. Using reproducible code examples with simulated and real data sets, we demonstrate how users of DoubleML can perform valid inference based on machine learning methods.
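To make the three ingredients concrete, here is a from-scratch NumPy sketch of the cross-fitted partialling-out estimator for the partially linear model Y = θD + g(X) + ε, D = m(X) + V, following the construction in Chernozhukov et al. (2018). It does not use DoubleML or mlr3: a plain k-nearest-neighbour regressor stands in for an mlr3 learner, and the helper names `knn_regressor` and `dml_plr`, the fold count, and the simulated data are illustrative assumptions.

```python
import numpy as np

# Hypothetical helper names for illustration; not the DoubleML or mlr3 API.


def knn_regressor(X_train, y_train, k=10):
    """Return a predict function for a plain k-nearest-neighbour mean regressor."""
    def predict(X_new):
        # Squared Euclidean distances between every new point and every training point.
        d = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
        idx = np.argsort(d, axis=1)[:, :k]
        return y_train[idx].mean(axis=1)
    return predict


def dml_plr(y, d, X, n_folds=5, learner=knn_regressor, seed=0):
    """Cross-fitted partialling-out estimate of theta in y = theta*d + g(X) + eps."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds  # random, balanced fold labels
    u = np.empty(n)  # out-of-fold residual y - E[y|X]
    v = np.empty(n)  # out-of-fold residual d - E[d|X]
    for f in range(n_folds):
        test, train = folds == f, folds != f
        ell_hat = learner(X[train], y[train])  # nuisance E[y|X]
        m_hat = learner(X[train], d[train])    # nuisance E[d|X]
        u[test] = y[test] - ell_hat(X[test])
        v[test] = d[test] - m_hat(X[test])
    # Solve the orthogonal moment mean((u - theta*v) * v) = 0 for theta.
    theta = (v @ u) / (v @ v)
    psi = (u - theta * v) * v  # score evaluated at theta_hat
    J = np.mean(v * v)
    se = np.sqrt(np.mean(psi ** 2) / J ** 2 / n)
    return theta, se


# Simulated data with nonlinear confounding through X (true theta = 0.5).
rng = np.random.default_rng(42)
n = 500
X = rng.normal(size=(n, 2))
d = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(size=n)
y = 0.5 * d + np.cos(X[:, 0]) + X[:, 1] ** 2 + rng.normal(size=n)
theta_hat, se_hat = dml_plr(y, d, X)
print(f"theta = {theta_hat:.3f} (se {se_hat:.3f})")
```

Residualizing both Y and D on X makes the moment condition insensitive to small errors in the nuisance estimates (Neyman orthogonality), and estimating the nuisances out-of-fold (sample splitting) keeps learner overfitting from biasing θ̂, so a different learner can be swapped in without changing the final step.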

References:

Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W. and Robins, J. (2018), Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21: C1-C68, URL: https://academic.oup.com/ectj/article/21/1/C1/5056401.

Lang, M., Binder, M., Richter, J., Schratz, P., Pfisterer, F., Coors, S., Au, Q., Casalicchio, G., Kotthoff, L. and Bischl, B. (2019), mlr3: A modern object-oriented machine learning framework in R. Journal of Open Source Software, doi:10.21105/joss.01903, URL: https://mlr3.mlr-org.com/.



2:25pm - 2:45pm
Talk-Video
ID: 173 / ses-08-C: 3
Regular Talk
Topics: Multivariate analysis
Keywords: statistical independence, conditional independence, variable selection, causal analysis

copent: Estimating Copula Entropy and Transfer Entropy in R

Jian MA


Statistical independence and conditional independence are two fundamental concepts in statistics and machine learning. Copula Entropy is a mathematical concept for measuring and testing multivariate statistical independence, and it has been proved to be closely related to conditional independence (or transfer entropy). It has been applied to several fundamental statistical and machine learning problems, including association discovery, structure learning, variable selection, and causal discovery. The R package copent implements a method for estimating copula entropy with rank statistics and the kNN method. This talk first introduces the theory and estimation of Copula Entropy, and then the implementation details of the package. Three examples will also be presented to demonstrate the usage of the package: one with simulated data and two with real-world data for variable selection and causal discovery. The copent package is available on CRAN and on GitHub at https://github.com/majianthu/copent/.
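The two-step idea (rank transform, then kNN entropy estimation) can be sketched in Python as below. Normalized ranks give the empirical copula, and a Kozachenko-Leonenko kNN estimator with the max-norm estimates its entropy. Copula entropy equals the negative mutual information, so the estimated mutual information should be near 0 for independent variables and near -0.5·log(1 - ρ²) for a bivariate Gaussian with correlation ρ. This is a sketch under stated assumptions, not the copent implementation: the choice of estimator, k = 3, the helper names, and the simulated data are assumptions made for this example, and transfer entropy is not covered.

```python
import math

import numpy as np

# Hypothetical helper names for illustration; this is not the copent implementation.

EULER_GAMMA = 0.5772156649015329


def digamma_int(m):
    """Digamma at a positive integer m: psi(m) = -gamma + sum_{j=1}^{m-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, m))


def empirical_copula(x):
    """Map each column of x (n x d) to normalized ranks in (0, 1]."""
    n = x.shape[0]
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1  # ranks 1..n per column
    return ranks / n


def knn_entropy(u, k=3):
    """Kozachenko-Leonenko entropy estimate (in nats) with the max-norm."""
    n, d = u.shape
    dist = np.abs(u[:, None, :] - u[None, :, :]).max(axis=2)
    np.fill_diagonal(dist, np.inf)       # a point is not its own neighbour
    r = np.sort(dist, axis=1)[:, k - 1]  # distance to the k-th nearest neighbour
    return digamma_int(n) - digamma_int(k) + d * math.log(2.0) + d * np.mean(np.log(r))


def copula_entropy(x, k=3):
    """Copula entropy estimate: entropy of the rank-transformed sample."""
    return knn_entropy(empirical_copula(np.asarray(x, dtype=float)), k)


# Simulated check: bivariate Gaussian with correlation rho versus independent data.
rng = np.random.default_rng(1)
n, rho = 800, 0.8
z = rng.normal(size=(n, 2))
x_dep = np.column_stack([z[:, 0], rho * z[:, 0] + math.sqrt(1 - rho**2) * z[:, 1]])
mi_dep = -copula_entropy(x_dep)  # mutual information = -copula entropy
mi_indep = -copula_entropy(z)
mi_true = -0.5 * math.log(1 - rho**2)
print(f"dependent: {mi_dep:.3f} (Gaussian MI {mi_true:.3f}); independent: {mi_indep:.3f}")
```

Because only ranks enter the estimate, applying a strictly increasing transform such as exp to the margins leaves the result unchanged, which is why copula entropy measures dependence alone rather than properties of the marginal distributions.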

Link to package or code repository.
https://github.com/majianthu/copent