Conference Agenda

Session Overview
Session
Workshop 4
Time:
Wednesday, 21/Feb/2024:
1:30pm - 4:30pm

Session Chair: Ji-Ping Lin, Academia Sinica, Taiwan
Location: Seminar 4 (Room 1.11)

Rheinische Fachhochschule Köln, Campus Vogelsanger Straße, Vogelsanger Str. 295, 50825 Cologne, Germany

Presentations

Why Data Science and Open Science Are Key to Build Smart Big Data: An Example Based on a Decade Research on Hard-to-Reach Population in Taiwan

Ji-Ping Lin

Academia Sinica, Taiwan

Duration of the Workshop:
2.5 hours

Target Groups:
Persons who are interested in computational social science, big data, data science, open data, and open science.

Is the workshop geared at an exclusively German or an international audience?
international audience

Workshop Language:
English

Description of the content of the workshop:
The growing availability of big data over the past decade has overcome traditional constraints in research, especially in the humanities and social sciences. It is changing our world and transforming conventional thinking about decision-making. Data science aims to cope with the issues raised by big data. By definition, it consists of three disciplines, i.e., hacking skills, advanced mathematics and statistics, and domain knowledge. Taking full advantage of big data requires not only knowledge of the fundamentals of data science, but also the ability to implement them. Big data alone do not offer us enough insight and vision. We need to go further and build smart data by enriching and integrating different sources of big data, in both quantity and quality. Meanwhile, open data and open science have emerged over the same decade in response to growing calls to examine research reproducibility.
This workshop addresses
(1) how open data and smart data sets are built by integrating hacking skills, advanced math/statistics methods, and domain knowledge of various disciplines on the basis of data science and open science;
(2) the role of online open data repositories in promoting crowd collaboration.
Of the three disciplines of data science, the workshop focuses solely on how hacking skills and advanced math/statistics are applied to build big data and smart data. It does so in the context of extracting valuable information embedded in individual-level source data and enriching the extracted information by cleaning, cleansing, crunching, reorganizing, and reshaping the source data. The data enrichment processes produce a number of data sets that contain no individual information but retain most of the information in the source data. The enriched data sets can thus be released to the public as open data.
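The step from individual records to an open, multi-dimensional table can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used for TIPD: the records, the field names (person_id, year, township, sex, age), and the helper functions age_group and build_table are all hypothetical.

```python
from collections import Counter

# Hypothetical individual-level records; every field name and value is invented.
records = [
    {"person_id": "A001", "year": 2020, "township": "T1", "sex": "F", "age": 34},
    {"person_id": "A002", "year": 2020, "township": "T1", "sex": "F", "age": 37},
    {"person_id": "A003", "year": 2020, "township": "T1", "sex": "M", "age": 61},
    {"person_id": "A004", "year": 2020, "township": "T2", "sex": "M", "age": 8},
]


def age_group(age, width=10):
    """Map an exact age to a coarse band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def build_table(rows):
    """Count persons per (year, township, sex, age band).

    The person identifier and the exact age are never copied into the
    result, so the table carries no individual-level information.
    """
    table = Counter()
    for row in rows:
        key = (row["year"], row["township"], row["sex"], age_group(row["age"]))
        table[key] += 1
    return table


table = build_table(records)
for key in sorted(table):
    print(key, table[key])
```

Each cell of the resulting table is a count over several dimensions, which is the sense in which the table retains most of the source information while dropping the individuals. A real release would also need to handle very small cells, which this sketch does not attempt.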

Because the domain knowledge about hard-to-reach population research and Taiwan Indigenous Peoples (TIPs) may be unfamiliar to the audience, the instructor will give a very brief introduction. The workshop uses a set of open data in TIPD (Taiwan Indigenous Peoples Open Research Data; for details, see https://osf.io/e4rvz/) as an example to demonstrate big data, open data, smart data, data science, and open science. TIPD complies with the FAIR (Findable, Accessible, Interoperable, Reusable) data principles.
It consists of the following categories of open data from 2007 to 2022:
(1) categorical data,
(2) multi-dimensional data,
(3) population dynamics (e.g. see TPDD: https://www.rchss.sinica.edu.tw/capas/posts/11621),
(4) temporal geocoding data (e.g. see High-resolution visualizations of population distribution, migration dynamics, traditional communities at https://www.rchss.sinica.edu.tw/capas/posts/11393),
(5) household structure data,
(6) traditional TIPs community data (TICD at https://www.rchss.sinica.edu.tw/capas/posts/11205),
(7) generalized TICD query system as a smart data (see https://TICDonGoogle.RCHSS.sinica.edu.tw),
(8) genealogical data (not open to the public).
Finally, the workshop will briefly highlight the impact of open data on promoting crowd collaboration and that of smart data on effective policy decision-making, using interactive migration dynamics derived from TIPD as an example (TIPD at https://www.rchss.sinica.edu.tw/capas/posts/11206; interactive migration visualizations at https://www1.rchss.sinica.edu.tw/jplin/TIPD_Migration/).

Goals of the workshop:
(1) illustrating methods, such as “old-school” multi-dimensional tables, that are applied to build and update big open data in automated mode;
(2) demonstrating how open data is built to comply with FAIR, ethical, and legal requirements under the principles of open science;
(3) introducing techniques in record linkage and highly precise address-matching geocoding that make it possible to enrich temporal and spatial information in big data;
(4) introducing techniques of data engineering and data sharing that enable us to build and integrate open data repositories systematically and automatically;
(5) demonstrating why online crowd collaboration to improve open data quality is an effective way to build smart data.
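
As a rough illustration of the record linkage named in goal (3), the sketch below links two hypothetical yearly registries on a normalized name and birth date. It shows simple deterministic (exact-key) linkage only and is an assumption for illustration, not the technique taught in the workshop; the registries, field names, and the functions normalize and link are invented.

```python
def normalize(text):
    """Lowercase a value and keep only letters and digits."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def link(left, right, fields):
    """Pair rows from two lists whose normalized key fields are identical."""
    index = {}
    for row in right:
        key = tuple(normalize(str(row[f])) for f in fields)
        index.setdefault(key, []).append(row)

    pairs = []
    for row in left:
        key = tuple(normalize(str(row[f])) for f in fields)
        for match in index.get(key, []):
            pairs.append((row, match))
    return pairs


# Hypothetical registries for two consecutive years; all values are invented.
registry_2019 = [
    {"name": "Lin, Mei", "birth": "1980-01-02", "township": "T1"},
]
registry_2020 = [
    {"name": "lin mei", "birth": "1980/01/02", "township": "T2"},
    {"name": "Chen Wei", "birth": "1975-05-06", "township": "T1"},
]

pairs = link(registry_2019, registry_2020, ["name", "birth"])
for before, after in pairs:
    print(before["township"], "->", after["township"])
```

Linking the same person across years is what turns two static snapshots into temporal information, here a move between townships. Exact-key matching breaks down on misspellings and shared names, which is where probabilistic or fuzzy linkage would come in.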

Necessary prior knowledge of participants:
No prior knowledge is required. Participants with knowledge of or experience in hacking skills (e.g. digital infrastructure, programming, performance tuning of computing systems, data engineering), individual-level data processing (e.g. data cleansing, record linkage), and/or spatial data structures (e.g. spatial data, attribute data, fundamentals of GIS) are particularly welcome.

Literature that participants need to read for preparation:
None

Recommended additional literature:
(1) Lin, Ji-Ping. 2017a. "Data Science as a Foundation towards Open Data and Open Science: The Case of Taiwan Indigenous Peoples Open Research Data (TIPD)," in Proceedings of 2017 International Symposium on Grids & Clouds, PoS (Proceedings of Science).

(2) Lin, Ji-Ping. 2017b. "An Infrastructure and Application of Computational Archival Science to Enrich and Integrate Big Digital Archival Data: Using Taiwan Indigenous Peoples Open Research Data (TIPD) as Example," in Proceedings of 2017 IEEE Big Data Conference, IEEE Computer Society Press.

(3) Lin, Ji-Ping. 2018. "Human Relationship and Kinship Analytics from Big Data Based on Data Science: A Research on Ethnic Marriage and Identity Using Taiwan Indigenous Peoples as Example," pp. 268-302, in Stuetzer et al. (eds.) Computational Social Science in the Age of Big Data. Concepts, Methodologies, Tools, and Applications. Herbert von Halem Verlag (Germany), Neue Schriften zur Online-Forschung of the German Society for Online Research.

(4) Lin, Ji-Ping. 2021. "Computational Archives of Population Dynamics and Migration Networks as a Gateway to Get Deep Insights into Hard-to-Reach Populations: Research on Taiwan Indigenous Peoples," Proceedings of 2021 IEEE International Conference on Big Data, IEEE Computer Society Press.

Information about the instructor:

Dr. Ji-Ping Lin received his B.Sc. in Geography from National Taiwan University (Taiwan) in 1988, his M.Sc. in Statistics from National Central University (Taiwan) in 1990, and his Ph.D. in Geography from McMaster University (Ontario, Canada) in 1998. His main research specialties and interests include migration and population studies, labor studies, survey studies, scientific and statistical computing, big and open data, data science, and open science. He is an associate research fellow at Academia Sinica, Taiwan. He previously worked as a research scientist in Taiwan’s Bureau of Statistics & Census, where he gained extensive real-world experience in processing, integrating, and enriching various sources of large-scale raw data, as well as in survey planning, sampling design, and conducting surveys. Lin has served as a consultant for a number of Taiwan’s central government agencies. Since 2013, he has devoted himself to research on hard-to-reach populations (HRP) and Taiwan Indigenous Peoples (TIPs). Building on the fundamentals of computational social science, data science, and open science, he has built a number of big open data sets and smart data sets.

Will participants need to bring their own devices in order to be able to access the Internet? Will they need to bring anything else to the workshop?
Participants are advised to bring their own laptop or tablet computer with internet access.



 
Conference: GOR 24