Conference Agenda (All times are shown in Eastern Daylight Time)

Session
Paper Session 11: Scholarly Ecosystems and Publishing and Generative AI
Time: Monday, 17/Nov/2025, 9:00am - 10:30am

Location: Potomac IV


Presentations
9:00am - 9:15am

The Wicked Problem of ChatGPT: Information Avoidance, Uncomfortable Knowledge, and AI in Scholarly Communication

H. Moulaison-Sandy, H. Thach

University of Missouri, USA

Generative artificial intelligence (GenAI) has had polarizing effects due to its content-creation capabilities and the systems in which it is integrated; at the same time, GenAI has been undertheorized in information science. By grounding its conceptual inquiry in the literature and examining the scholarly communication ecosystem, this short paper considers the role of GenAI as a “wicked problem” for scholars and publishers in scholarly communication, affecting how information is produced, selected for publication, and consumed. The concept of uncomfortable knowledge is used to explore how GenAI may be accepted or rejected out of hand. Building on this, the human information behavior (HIB) principle of information avoidance further situates the problem by examining how scholars and publishers may resist or ignore information that conflicts with their existing worldviews. By formally intersecting these concepts to analyze GenAI in scholarly communication, this paper addresses an important conceptual lacuna in information science.



9:15am - 9:30am

Library Genesis to Llama 3: Navigating the Waters of Scientific Integrity, Ethics, and the Scholarly Record

L. Ridenour1, H. Thach1, S. E. Knudsen2

1University of Missouri, USA; 2Independent Scholar, USA

This work examines the intricate connections between Generative AI (GenAI), its training data, and the scholarly record through a data-driven discourse analysis. It is common for training datasets for GenAI models to be confidential and proprietary. Consequently, questions about the quality and provenance of the data are often raised. Leaked internal documents suggest that Meta’s Llama 3 was trained using pirated data from the file-sharing platform Library Genesis (LibGen) (Reisner, 2015). Given the increasing use and popularity of GenAI in scientific and educational contexts, we investigate metadata from scientific articles in LibGen that were reportedly used to train Llama 3 and assess the potential impact of retractions and related concerns identified by Retraction Watch (Retraction Watch Database, 2018) on GenAI output. Using the LibGen API, we identified retracted articles in biomedical science and chemistry, domains known for high retraction rates, and analyzed their retraction reasons. This paper is a preliminary exploration of a complex topic and contributes to the discussion surrounding the effects of training data quality on GenAI output, with particular attention to scientific integrity and the ethical implications of data sourcing practices.



9:30am - 10:00am

Unraveling the Complexity of Carbon Footprint Research: A Framework of Sigmoid-Based Lifepaths, Regime Classification, and Topic Modeling

O. Buchel1, L. Hedayatifar1, S. Aytac2, C. Y. Tran3

1New England Complex Systems Institute; 2Long Island University, USA; 3Stony Brook University

Our study introduces a novel bibliometric methodology integrating sigmoid-parameterized lifepaths, regime classification, and topic modeling to analyze the dynamics of scholarly research. Using carbon footprint (CF) research as a case study, our approach moves beyond traditional bibliometric techniques by applying sigmoid modeling to track the research trajectories of authors, countries, and topics. This enables the identification of distinct phases, including acceleration, inflection, saturation, and decline. Our framework incorporates “time folds” to accommodate nonlinear disruptions and catalytic transitions, revealing how external factors – such as policies, economic shifts, and global events – shape research progress. Our findings highlight regional disparities in research activity, with China and India exhibiting rapid acceleration, Western nations indicating saturation, and African research remaining fragmented. Topic modeling identifies key research shifts in agriculture, construction, and carbon-free technologies, reflecting evolving global priorities. To facilitate further exploration, we provide an interactive visualization on GitHub, enabling scholars to engage with research lifepaths, analyze thematic shifts, and examine country-level trends. By making these tools and methods openly accessible, this study offers a foundation for researchers to refine and expand the framework across other research fields, ultimately supporting deeper investigations into scholarly ecosystems, emerging trends, and the impact of research policies.



10:00am - 10:30am

LISGPT: Research on the Construction of a Library and Information Science Academic LLM Based on the Boundary Knowledge Enhance Framework

Y. Zhu1, Y. Duan2, H. Hu3, J. Jin2, J. Ye1

1School of Information Management, Nanjing University, People's Republic of China; 2School of Government, Beijing Normal University, People's Republic of China; 3School of Economics and Management, China University of Geosciences, People's Republic of China

Academic large language models have demonstrated transformative potential in natural language processing tasks. However, they still face significant challenges in adequately understanding highly specialized and complex domain-specific knowledge. To address this issue, this study introduces the Boundary Knowledge Enhance (BKE) framework, which constructs a large-scale, high-quality professional question-answering dataset (n = 276,083) in the Library and Information Science (LIS) domain, specifically designed to capture the complexity of social science knowledge. Furthermore, by employing the proposed Direct Boundary Knowledge Optimization (DBKO) training method, the model’s ability to comprehend and apply specialized domain knowledge is significantly enhanced. Experimental results show that LISGPT achieves superior performance compared to state-of-the-art commercial models. In the literature keyword prediction task, it outperforms all baseline models with an F1 Score of 0.3973, ranking first. In the professional translation task, it reaches 99.1% of the performance level of DeepSeek-V3-671b, achieving an average score of 0.5971 and ranking third. Ablation studies confirm that the overall performance improvement of LISGPT after DBKO training is 2.32%. This study open-sources the large LIS training datasets and three versions of a specialized LIS academic model, offering a practical paradigm for developing open-source, efficient models in other humanities and social sciences domains.