Conference Agenda

Overview and details of the sessions of this conference.

Session Overview
19-AM-08: ST1.3 - Data Science for Innovation Challenges
Wednesday, 19/June/2019:
8:30am - 10:00am

Session Chair: Paola Belingheri, University of Pisa
Session Chair: Filippo Chiarello, University of Pisa
Session Chair: Antonella Martini, University of Pisa
Session Chair: Andrea Bonaccorsi, University of Pisa
Location: Amphi Lagarrigue

Session Abstract

The information field has changed dramatically in recent years, affecting the economy, technology, culture and society. However, these changes have left an even stronger mark on business systems (Jin 2015). Considering the mass of digital information produced in the past 10 years, companies have found themselves in a chaotic and constantly expanding digital universe. To innovate and stay competitive, companies must master methods and tools to prevent information overload, while gaining useful knowledge from the available data (Feng 2015).

The discipline of Data Science has emerged as a clear (although broad) field of research to solve data-related problems (Provost 2013). Data science is an interdisciplinary field that uses scientific methods to extract knowledge and insights from structured and unstructured data. It attracts researchers and encompasses methodologies from wide-ranging fields such as statistics, mathematics, information science, computer science, data analysis, machine learning, and communication, and is therefore an ideal tool to bridge the gap between research, industry and society (Waller 2013).

The objective of the present track is to collect works that use state-of-the-art Data Science tools and techniques to gather, transform, model and visualize data (Wickham 2014) to gain valuable information relevant for firm innovation. The aim is to use publicly available data to obtain a clearer view of which information sources contain the most untapped value and which methods and tools can be used to uncover it.

The main contributions are expected to highlight which information is relevant for different companies to build knowledge as a tool for innovation, in particular related to:

- data science for product innovation (Chiarello 2018a; Tan 2015): e.g. data-driven product development, A/B testing, patent analysis, product success evaluation, machine learning for innovation.

- data science for technology intelligence (Colladon 2018; Chiarello 2018b): e.g. brand analysis, competitor mapping, partner identification, tools for knowledge visualization and communication.

- data science for open innovation & co-creation (Hoornaert 2017): e.g. papers mapping, open analytics, IP analysis, cloud computing.

- data science for new skill identification & mapping (Frey 2017): e.g. curricula analysis, job vacancies identification, job creation, new skills for innovation.

We expect to see contributions coming from the usual sources (e.g. open databases, patents, papers, social media) but we especially welcome contributions from less-known sources. Since Data Science is broad, we expect to showcase a wide range of methodologies such as machine learning, deep learning, natural language processing, image analysis or tools for data visualization and communication, to name a few.


Leveraging Deep Learning for Image Data Anonymization in the Insurance Domain: A Methodological Approach and a Relevant Case Study

Alessandra Andreozzi1, Antonella Martini1, Lorenzo Ricciardi Celsi2

1Università di Pisa, Italy; 2ELIS Consulting & Labs, Italy


Digital transformation is triggering radical changes in terms of value proposition in the insurance industry, thanks to the emerging data-driven processes based on deep learning tools. However, unlocking valuable insight in this respect depends on the fine-tuning of suitable algorithms, and on the quality/quantity of the prepared input data.


AI encompasses deep learning techniques that are suitable for the object detection task required by image anonymization for insurance purposes: among the several available methods of implementing such computer vision strategies, convolutional neural networks have produced the best results in recent years [3].

In more detail, object detection requires deep ad-hoc architectures, and the approach to combining layers into a suitable model is generally tailored to the task to be performed. In this respect, new ways of combining layers are constantly released in order to provide improved architectures [3]: among these, YOLO [4, 5], SSD [6], Faster R-CNN [7], and RetinaNet [8, 9] were identified as the most in line with the customer’s needs.

Eventually, RetinaNet was chosen as the most suitable framework for image anonymization, mainly because its focal loss concentrates training on hard examples.
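The abstract cites focal loss as the deciding factor. As a minimal numerical sketch (not the authors' implementation; the default `alpha` and `gamma` values are the ones suggested in Lin et al. [8]), the focal loss down-weights well-classified examples via a modulating factor:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss of Lin et al. [8]:
    FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t)

    p : predicted probability of the positive class
    y : ground-truth label (0 or 1)
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

# An easy, well-classified example contributes far less loss than a hard one,
# which is what lets training focus on the difficult detections:
easy = focal_loss(0.95, 1)  # confident correct prediction
hard = focal_loss(0.10, 1)  # confident wrong prediction
```

With `gamma=0` and `alpha=0.5` this reduces (up to a constant factor) to ordinary cross-entropy; increasing `gamma` suppresses the contribution of easy examples more aggressively.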

Literature Gap

The implementation and fine-tuning of the most recent deep neural network architectures provide valuable and unprecedented performance in privacy-preserving insurance business processes. To effectively evaluate the peculiar task of anonymizing sensitive data in insurance images, ad hoc performance metrics must be identified and validated (e.g., Intersection-over-Union, recall).

Research Questions

The research aim is to design an efficient procedure for anonymizing car images, blurring sensitive data while leaving all other data unaffected (i.e., the part of the image showing the damaged objects).


The considered case study refers to a project carried out by ELIS Consulting and Labs for an Italian insurance company, within a wider strategic view aimed at fostering the automation of the claim management process through deep learning algorithms adopted for image analytics purposes [3-9]. The investigated problem consists in detecting and blurring the sensitive data appearing in photographic material picturing car accidents.

Empirical Material

The iterative design, training, and validation of the object detection algorithm for image anonymization, as well as the evaluation of the most relevant performance metrics, were carried out starting from well-known open-source packages available in several GitHub repositories. Furthermore, the algorithm implementation relies on the most recent versions of the Keras and TensorFlow Python libraries for creating and manipulating deep convolutional neural networks.

The detection task is performed on a set of more than 10,000 images of car accidents containing sensitive data (e.g., license plates, faces, and person shapes) that are subject to anonymization.

In particular, neural network training was carried out relying on GPU support, thus reaching a degree of computational performance that, based on the currently available technology, would have been otherwise unachievable.


The object detection algorithm is aimed at anonymizing car images by detecting and blurring any sensitive data while leaving all other valuable data unaffected. Thus, the part of the image showing the damaged objects is left untouched so that the subsequent categorization of the car image can be performed (e.g., assessing where the most relevant car damages are located and what the extent of the damage is).
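The abstract does not publish the anonymization code. A minimal sketch of the blurring step, assuming the detector returns pixel bounding boxes and using a plain mean filter in place of whatever production blur was actually used, might look like:

```python
import numpy as np

def blur_regions(image, boxes, k=15):
    """Anonymize an image by mean-blurring each detected bounding box.

    image : H x W x C uint8 array
    boxes : list of (x1, y1, x2, y2) pixel coordinates, e.g. boxes predicted
            by a detector trained on licence plates / faces / person shapes
    k     : odd blur kernel size (larger = stronger blur)
    """
    out = image.copy().astype(np.float64)
    pad = k // 2
    for x1, y1, x2, y2 in boxes:
        roi = out[y1:y2, x1:x2]
        # Edge-pad the region so the mean filter is defined at its borders.
        padded = np.pad(roi, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
        blurred = np.zeros_like(roi)
        for dy in range(k):
            for dx in range(k):
                blurred += padded[dy:dy + roi.shape[0], dx:dx + roi.shape[1]]
        out[y1:y2, x1:x2] = blurred / (k * k)
    return out.astype(np.uint8)
```

Pixels outside the detected boxes are copied through untouched, which is the property that keeps the damaged-object regions available for downstream categorization.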

The related deep neural network, designed according to the RetinaNet framework, was trained on a sample of more than 10,000 images stored in the customer’s database; eventually, several tests on samples of 3,000 images containing sensitive data were carried out to evaluate the algorithm performance.

The obtained results show that the performance of the trained object detection algorithm at detecting the object classes corresponding to sensitive data (e.g., license plates, vehicle identification numbers and person shapes), in terms of recall and Intersection-over-Union metrics, proves to be greater than 90%. This score is consistent with the recent computer vision literature as well as with the customer requirements, thus encouraging the adoption of the proposed tool within the process of automating claim management in the insurance domain.
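The two reported metrics have standard definitions. A small reference implementation of Intersection-over-Union and of recall at an IoU threshold (illustrative only, not the authors' evaluation code):

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def recall_at_iou(gt_boxes, pred_boxes, threshold=0.5):
    """Fraction of ground-truth boxes matched by some prediction at the threshold."""
    hits = sum(any(iou(g, p) >= threshold for p in pred_boxes) for g in gt_boxes)
    return hits / len(gt_boxes) if gt_boxes else 0.0
```

A recall above 90% then means that more than nine out of ten annotated sensitive regions were localized with sufficient overlap to be blurred.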

Contribution to Scholarship

Using deep learning based tools enables insurance companies to create value by improving their data-driven processes in line with the digital transformation strategy and maintaining a customer-centric approach. The outcome and benefits of this work are expected to be measured, in the medium/long run, in terms of positive impact on claim management and customer satisfaction KPIs (e.g., average cost per claim, customer turnover).

Contribution to Practice

Insurance companies have been heavily influenced by digital transformation with several implications in terms of effective data management, especially as regards data collection, preparation, and processing. The analysis of unstructured data (e.g., images) and subsequent knowledge extraction in order to drive business decisions urgently requires the adoption of advanced AI techniques, such as deep learning: this will not only enable the automation of processes and low value-added activities (thus leading to cost reduction) but also provide decision support at several levels (e.g., privacy protection and information sanitization in compliance with EU regulation) and improve customer experience and satisfaction [10].


The project output represents a powerful tool enabling the insurance industry to improve its knowledge base in terms of data intelligence and encouraging data-driven product/service development. The implementation of object detection solutions makes it possible to optimize cost allocation when planning and executing the whole claim management process.


[1] Waller, M. A., & Fawcett, S. E. (2013). Data science, predictive analytics, and big data: a revolution that will transform supply chain design and management. Journal of Business Logistics, vol. 34, no. 2, pp. 77-84.

[2] Daugherty, P. R., & James Wilson, H. (2018). Human + Machine. Reimagining work in the age of AI. Harvard Business Review Press.

[3] L. Liu, W. Ouyang, X. Wang, P. Fieguth, J. Chen, X. Liu, M. Pietikäinen (2018). Deep Learning for Generic Object Detection: A Survey, International Journal of Computer Vision, in press.

[4] J. Redmon and A. Farhadi (2018). YOLOv3: An Incremental Improvement, arXiv.

[5] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi (2016). You Only Look Once: Unified, Real-Time Object Detection, in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, A.C. Berg (2016). SSD: Single Shot MultiBox Detector, in Proceedings of the 2016 European Conference on Computer Vision (ECCV).

[7] T.-Y. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan, S. Belongie (2017). Feature Pyramid Networks for Object Detection, in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8] T.-Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollar (2017). Focal Loss for Dense Object Detection, in Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2980-2988.

[9] K. He, X. Zhang, S. Ren, J. Sun (2016). Deep Residual Learning for Image Recognition, in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10] M. Corlosquet-Habart, J. Janssen (2018). Big Data for Insurance Companies, Volume 1. Wiley.

Circular Economy: new paradigm or just relabelling? A Quantitative Text and Social Network Analysis on Wikipedia webpages

Dario Cottafava1, Grazia Sveva Ascione1, Ilaria Allori2

1University of Turin, Italy; 2University of Pisa, Italy


The interest in Circular Economy is rapidly growing, gathering efforts from academia, industry and policy makers, involving different disciplines such as economics [2], chemistry [3], design [4] and industrial ecology [5]. Due to its interdisciplinary but ambiguous nature, a systemic approach is required to set its boundaries.


Circular Economy has gained momentum among researchers and practitioners aiming to develop new technologies, applications, procedures and methodologies. However, despite the steep growth in Scopus publications (around 350 in 2016 and more than 1,000 in 2018) and consultancy reports, critics hold that blurriness can occur when a concept operates in significantly different worlds of thought [9]; furthermore, such a great variety of approaches might constitute a barrier to operationalizing the concept for stakeholders [7]. Therefore, the risk of collapsing or remaining in a deadlock is higher when there is conceptual contention [4].

Some authors have tried to overcome the blurriness around CE through bibliometric analysis [1,9] and comparisons of definitions. These studies resulted in increased transparency about the understanding of CE and its core principles. On the other hand, they could not provide information about the definition of individual terms or the linkages between them.

Literature Gap

Considering the necessity of systematically analysing the CE concept, it is clear that a methodology going beyond the analysis and comparison of definitions is needed. Such research could help not only to define the boundaries of CE, but also to analyse the relationships between different disciplines and spheres of knowledge.

Research Questions

The research questions are as follows:

Which network of terms exhaustively depicts the Circular Economy domain?

Does this network portray the Circular Economy as a new paradigm or does it represent a relabelling of already existent knowledge?


The methodology builds on a previous study by Chiarello et al. [8], in which a novel approach to Industry 4.0 was proposed.

As Industry 4.0 and the Circular Economy are similarly difficult to delineate, applying an analytical tool that helps map the field is of great interest.

The main steps are:

- creation of a CE dictionary, starting from a seed list of terms extracted from the scientific literature, later expanded and cleaned;

- graph generation using the aforementioned dictionary: each node represents a term and each edge represents a link between Wikipedia pages;

- representation and manual labelling of the identified clusters.

Empirical Material

The seed documents of this research work will be selected as follows:

- by taking the 10 most cited papers on Circular Economy to date, according to Scopus. The reference to CE must appear in the title, abstract or keywords;

- by selecting 10 relevant reports about CE, including non-academic sources such as Ellen MacArthur Foundation and EY reports, as well as additional technical and governmental documents.

The above-mentioned documents will be manually parsed in order to extract a seed list of terms that will serve as input for the automatic expansion phase. The latter will consist of:

- finding the Wikipedia pages corresponding to each term of the seed list;

- automatically expanding the seed list by retrieving all the hyperlinks included in each Wikipedia page.

The automatic phase will return a set of hyperlink target pages, which must be analysed and cleaned by excluding those that are not relevant to this research and classifying the remainder.

The result will be a graph in which nodes are terms and the edges represent links between Wikipedia pages; the resulting network will therefore be an interesting target for the application of graph-theoretic metrics.
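The expansion-and-graph step described above can be sketched as follows. The link table here is hard-coded and hypothetical, standing in for live Wikipedia queries, and the graph is kept as plain Python structures rather than a dedicated graph library:

```python
# Hypothetical, hard-coded hyperlink table standing in for Wikipedia API calls;
# in the real pipeline each entry would be the outgoing links of a fetched page.
LINKS = {
    "Circular economy": ["Recycling", "Industrial ecology", "Reuse"],
    "Recycling": ["Circular economy", "Waste management"],
    "Industrial ecology": ["Circular economy"],
    "Reuse": ["Recycling"],
}

def build_graph(seed_terms, links):
    """Expand a seed list into an undirected term graph:
    nodes are page titles, edges are hyperlinks between pages."""
    nodes, edges = set(seed_terms), set()
    for term in seed_terms:
        for target in links.get(term, []):
            nodes.add(target)                       # expansion: new terms join the dictionary
            edges.add(tuple(sorted((term, target))))  # undirected, deduplicated edge
    return nodes, edges

def degree(nodes, edges):
    """Degree of each node: a first graph-theoretic metric for ranking terms."""
    d = {n: 0 for n in nodes}
    for a, b in edges:
        d[a] += 1
        d[b] += 1
    return d

nodes, edges = build_graph(list(LINKS), LINKS)
```

On a real crawl the expansion would iterate (and be followed by the cleaning step described above); clustering and labelling would then run on the resulting graph.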


The main contribution of this work is the development of a heterogeneous dictionary for the Circular Economy, which will help identify the main theoretical and applied research fields involving it. Thus, the first result will be the creation of a dictionary of terms.

Furthermore, this research will generate a network of Wikipedia webpages related to the Circular Economy, in order to show the connections among different concepts and contents. Then, it will be possible to perform a graph analysis and automatically identify clusters of terms, which will later be labelled manually.

Consequently, the obtained network will shed light on the semantic structure of Circular Economy, enhancing stakeholders’ understanding of the relationships among the different domains, disciplines and technologies related to CE.

Contribution to Scholarship

The CE dictionary aims to be a helpful tool for shedding light on the main components and boundaries of this domain. In addition, the dictionary could be used in a diverse range of applications, such as the classification of patents or publications. It could also be compared with other methodologies used for similar tasks, such as IPC codes, keyword search, etc.

Contribution to Practice

The dictionary could help stakeholders achieve greater clarity about the field and so encourage them to participate in the Circular Economy. A great advantage of the dictionary lies in the fact that it will be updated automatically, so it could help evaluate trends over time in the composition and density of the network. Furthermore, identifying void regions in the network could give hints about the future evolution of the field.


Circular Economy is a research field which links research, industry and society, as the stakeholders involved in its processes range from academics to politicians to companies. In addition, this research will apply a recent methodology to this field, producing a result not found in previous works.


[1] Geissdoerfer, M., Savaget, P., Bocken, N. M., & Hultink, E. J. (2017). The Circular Economy–A new sustainability paradigm?. Journal of cleaner production, 143, 757-768.

[2] Andersen, M. S. (2007). An introductory note on the environmental economics of the circular economy. Sustainability Science, 2(1), 133-140.

[3] Clark, J. H., Farmer, T. J., Herrero-Davila, L., & Sherwood, J. (2016). Circular economy design considerations for research and process development in the chemical sciences. Green Chemistry, 18(14), 3914-3934.

[4] Bocken, N. M., de Pauw, I., Bakker, C., & van der Grinten, B. (2016). Product design and business model strategies for a circular economy. Journal of Industrial and Production Engineering, 33(5), 308-320.

[5] Yuan, Z., Bi, J., & Moriguichi, Y. (2006). The circular economy: A new development strategy in China. Journal of Industrial Ecology, 10(1‐2), 4-8.

[6] Okorie, O., Salonitis, K., Charnley, F., Moreno, M., Turner, C., & Tiwari, A. (2018). Digitisation and the Circular Economy: A Review of Current Research and Future Trends. Energies, 11(11), 3009.

[7] Kirchherr, J., Reike, D., & Hekkert, M. (2017). Conceptualizing the circular economy: An analysis of 114 definitions. Resources, Conservation and Recycling, 127, 221-232.

[8] Chiarello, F., Trivelli, L., Bonaccorsi, A., & Fantoni, G. (2018). Extracting and mapping industry 4.0 technologies using wikipedia. Computers in Industry, 100, 244-257.

[9] Gladek, E. (2017). The Seven Pillars of the Circular Economy.

Defining Definition: A Text Mining Approach to Define Innovative Technological Fields

Vito Giordano, Elena Cervelli, Filippo Chiarello

University of Pisa, Italy


New technological fields are emerging from a current landscape characterized by multidisciplinarity, turbulence and uncertainty [1]: for this reason, it is hard for researchers and practitioners to give a commonly agreed definition. This work aims at solving this problem by proposing a Text Mining approach.


The first task in mapping a new technology is field delineation. The delineation of new fields of science and technology is an issue that has been addressed since the late ‘70s, after the pioneering period of bibliometrics. Field delineation is a necessary step when existing classifications do not offer timely, reliable or comprehensive coverage of a topic, for example a new technology or a new technological field. Moving beyond existing classifications requires undertaking a search which, in general, may follow a lexical approach, a citationist approach, or a mix of the two [2]. The main approach has been based on the manual collection of keywords, to be identified in various regions of documents (title, abstract, keywords, full text of an article; title, abstract, claims, full text of a patent) and to be used as queries.

Literature Gap

Expert-based (or top-down) keyword definition is a very expensive activity [3]. Furthermore, keyword selection is based on subjective judgment, and when experts are asked to decide on relatedness measures (e.g. synonyms, hypernyms or hyponyms), they do not apply systematic rules [4].

Research Questions

With the present research we want to demonstrate that it is possible to use text mining techniques to define a fuzzy technological field, thus overcoming the issues of top-down approaches.


Our work starts by manually analyzing a set of technological field definitions taken from Wikipedia to create a set of empirical rules. This set of bottom-up generated rules defines a mathematical model of definitions, which is an enriched and actionable version of the well-known genus–differentia definition model [5]. The model makes it possible to obtain a formal definition of a target technological field using a text mining approach. The resulting definitions have a rigorous structure.
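The genus–differentia form can be illustrated concretely. The pattern below handles only one common definitional phrasing ("X is a <genus> that <differentia>") and is a toy stand-in for the richer, bottom-up rule set the abstract describes:

```python
import re

# Illustrative pattern for one common definitional form in encyclopedic text:
#   "<definiendum> is a/an <genus> that/which <differentia>"
DEF_PATTERN = re.compile(
    r"^(?P<term>.+?)\s+is\s+an?\s+(?P<genus>.+?)\s+(?:that|which)\s+(?P<differentia>.+?)\.?$",
    re.IGNORECASE,
)

def parse_definition(sentence):
    """Split a definition sentence into (term, genus, differentia), or None."""
    m = DEF_PATTERN.match(sentence.strip())
    return (m["term"], m["genus"], m["differentia"]) if m else None

parsed = parse_definition(
    "Data science is an interdisciplinary field that uses scientific "
    "methods to extract knowledge from data."
)
```

Mining candidate definitions from a paper corpus then amounts to running such rules over sentences and ranking the structured matches.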

Empirical Material

Wikipedia is a multilingual, web-based, free encyclopedia. Since it is based on a model of openly editable and viewable content, its contents are the result of the interaction of many contributors with different backgrounds. For this reason, we used Wikipedia to formalize the definition model; in particular, we extracted information from about 1,000 pages.

In addition, to search for a definition set regarding a certain technological field and to construct the best one, we rely on 40,000 abstracts available on Elsevier's Scopus, the largest abstract and citation database of peer-reviewed literature.


Our model has been tested for mining the definitions of target technological fields from Scopus papers and for identifying the best definition within this set. It represents the starting point for the construction of the automatic definition tool we propose. We applied the automatic definition tool to four case studies, namely Industry 4.0, Data Science, Artificial Intelligence and fintech, for a total of 40,000 articles. These technological fields were chosen for their multidisciplinarity, turbulence and uncertainty. For each technological field we construct a formal definition with a predetermined structure: it presents the most used synonyms and abbreviations in scientific papers, formally explains the technological field and proposes a complete list of technologies belonging to it.

We will also present a discussion of the positive and negative aspects of the proposed definitions, based on feedback from experts in the different fields considered in the case studies.

Contribution to Scholarship

The major contribution of our research is to make it possible to formalise the technological field definition process. This activity is typically time-consuming and unstructured, leading to fuzzy results in terms of the lexical structure and the level of abstraction of the definition. Our work aims to make this process systematic and controlled, thanks to the method we propose.

Contribution to Practice

The automatic definition tool is designed to help researchers, innovators and policy makers in the scope definition process. The scope involves gathering the information required to start a project and the features the product needs in order to meet the stakeholders’ requirements. For R&D activities in the context of new technological fields, the boundaries are not easily detectable and it is difficult to decide what is in scope and out of scope. Our tool helps set the boundaries of a project by providing the definition of a technological field and a complete set of technologies in that field.


This research seeks to bridge the gap between research and industry by developing and demonstrating a tool that systematically analyses scientific publications in a fuzzy technological field to derive a definition of the field itself.


[1] Chiarello, F., Trivelli, L., Bonaccorsi, A., & Fantoni, G. (2018). Extracting and mapping industry 4.0 technologies using wikipedia. Computers in Industry, 100, 244-257.

[2] Small, H. (2006). Tracking and predicting growth areas in science. Scientometrics, 68(3), 595-610.

[3] Tseng, Y. H., Lin, C. J., & Lin, Y. I. (2007). Text mining techniques for patent analysis. Information Processing & Management, 43(5), 1216-1247.

[4] Noh, H., Jo, Y., & Lee, S. (2015). Keyword selection and processing strategy for applying text mining to patent analysis. Expert Systems with Applications, 42(9), 4348-4360.

[5] Parry, W. T., & Hacker, E. A. (1991). Aristotelian Logic. SUNY Press.

Attracting Talent Through the Elimination of Gender Bias in Job Vacancies: a Preliminary Lexical Approach

Paola Belingheri, Filippo Chiarello, Antonella Martini, Andrea Bonaccorsi

University of Pisa, Italy


Language can often be considered gender-specific (Weatherall, 2005). In the workplace, this can lead to biases in recruitment processes (Bem & Bem, 1973; Gaucher et al., 2011), creating barriers for women to access male-dominated industries. Using Text Mining, Semantic and Social Network Analysis, we present a novel approach to address this.


Language can be differently interpreted by men and women and, especially in the workplace, this can lead to biases in application and recruitment processes (Bem & Bem, 1973; Gaucher et al., 2011). Subtle nuances persist in the way companies write vacancy notices, which may still influence what type of person responds, including their gender (Gaucher et al., 2011). These are often difficult to identify by those who are operating in the field since they can be hidden in the orthographical, grammatical or semantic content of the text. This is a major problem and a barrier for women to access male-dominated industries (e.g., aerospace), and the business context in general.

Starting from general purpose gender-biased lexicons and using Natural Language Processing and Named Entity Recognition (Nadeau & Sekine, 2007) to identify generic expressions in the vacancies, we produce an upgraded lexicon of gender-biased expressions, thereby supporting companies in attracting diverse talent.

Literature Gap

Research has examined the effects of gender-biased language on applicants; in contrast, we focus on whether and how gender-biased language influences recruiting processes and firm performance. To do this, a statistical model first needs to be developed to measure the degree of gender bias in vacancy notices.

Research Questions

The research objective of this paper is to identify and compile a relevant lexicon to assess the gender bias of vacancy notices, and apply it to a sample of vacancy notices in the aerospace industry. This industry has been chosen because it is historically male-dominated.


Starting from the analysis of general purpose gender-biased lexicons we will use Natural Language Processing and Named Entity Recognition (Nadeau & Sekine, 2007) to identify generic and field specific expressions in the aforementioned vacancies. Our method will make it possible to develop the dictionary in two directions:

1- Depth: identify new words (nouns, verbs, adjectives and pronouns) that are indicators of gender bias, both generically and in the field of aerospace.

2- Width: identify synonyms and periphrases of the gender-biased expressions coming both from the general purpose gender-biased lexicons and from the depth expansion made in point 1.
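The "width" direction can be illustrated with a toy sketch. The synonym table below is stubbed and hypothetical, standing in for a real lexical resource (e.g. a thesaurus or word embeddings), which is an assumption, not the paper's actual pipeline:

```python
# Stubbed synonym table standing in for a real lexical resource; the paper's
# width expansion would query such a resource for each lexicon entry.
SYNONYMS = {
    "competitive": ["driven", "ambitious"],
    "leader": ["chief", "head"],
}

def expand_lexicon(seed, synonyms):
    """Width expansion: add known synonyms of every seed expression."""
    expanded = set(seed)
    for word in seed:
        expanded.update(synonyms.get(word, []))
    return expanded

expanded = expand_lexicon({"competitive", "leader"}, SYNONYMS)
```

Depth expansion would instead add entirely new field-specific expressions discovered in the vacancy corpus itself.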

Empirical Material

As a basis for this paper we have downloaded more than 800 vacancy notices from the website, one of the most well-known job search portals in the aerospace industry. This material will be used both to test the general purpose gender-biased lexicons and to extract a new lexicon specific to the aerospace sector.


Starting from our database of sector specific vacancies, we will create a lexicon of gender-biased expressions. We will start from the analysis of general purpose gender-biased lexicons and use Natural Language Processing and Named Entity Recognition (Nadeau & Sekine, 2007) to identify generic and aerospace-specific expressions in the aforementioned vacancies.

The output will be a preliminary version of an upgraded lexicon of expressions that indicate gender bias, which will be used to measure the extent of gender bias in specific vacancy notices. This will be the first statistical model enabling us to work towards answering questions relating gender bias in vacancy notices to firm performance. Once the lexicon has been developed in the present work, we will relate the presence of gender-biased jargon to the ability of companies to attract a balanced set of potential workers, and thus to build more talented and diverse teams.
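A minimal sketch of how such a lexicon could score a vacancy notice, with tiny toy word lists in the spirit of Gaucher et al. (2011) rather than the paper's actual upgraded, aerospace-specific lexicon:

```python
# Toy word lists for illustration only; the real lexicon is far larger and
# built semi-automatically from general purpose lexicons plus corpus expansion.
MASCULINE = {"competitive", "dominant", "leader", "ambitious", "assertive"}
FEMININE = {"supportive", "collaborative", "interpersonal", "nurture", "community"}

def gender_bias_score(text):
    """Return (masculine_hits, feminine_hits, score) for a vacancy notice.

    score > 0 leans masculine-coded, < 0 feminine-coded, 0 balanced or neutral.
    """
    words = [w.strip(".,;:!?()").lower() for w in text.split()]
    m = sum(w in MASCULINE for w in words)
    f = sum(w in FEMININE for w in words)
    total = m + f
    return m, f, (m - f) / total if total else 0.0

m, f, score = gender_bias_score(
    "We seek an ambitious, competitive leader to join our team."
)
```

Aggregating such scores over a firm's notices is one simple way to relate language to applicant-pool balance in the follow-up analysis.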

Contribution to Scholarship

We will contribute to the gender, human resources management, organizational behavior, and natural language processing streams of literature. Specifically, we will develop new methodologies for the assessment of gender-biases in organizational communication and we will extend literature on the determinants of female presence in firms by assessing the role of the language they use.

Contribution to Practice

The tool we develop based on this statistical model could be used by firms to assess whether, and to what extent, gender-bias pervades their language. At the same time, based on the empirical results, we will inform practitioners on the consequences of the use of a gender-biased language, and offer insights on how to design gender-neutral recruitment campaigns. Ultimately, this project could impact society, supporting the achievement of a more balanced representation of women in the workforce.


The promotion of diversity within firms is a key asset to promote innovation. The theme of diversity indeed spans across industry, research and society and is one of the great challenges of the 21st Century.


Bem, S. L., & Bem, D. J. (1973). Does sex‐biased job advertising “aid and abet” sex discrimination? Journal of Applied Social Psychology, 3(1), 6-18.

Gaucher, D., Friesen, J., & Kay, A. C. (2011). Evidence that gendered wording in job advertisements exists and sustains gender inequality. Journal of Personality and Social Psychology, 101(1), 109.

Nadeau, D., & Sekine, S. (2007). A survey of named entity recognition and classification. Lingvisticae Investigationes, 30(1), 3-26.

Weatherall, A. (2005). Gender, Language and Discourse. Routledge.

Conference: R&D Management Conference 2019