Conference Agenda

Overview and details of the sessions of this conference. Please select a date or location to show only the sessions held on that day or at that location. Please select a single session for a detailed view (with abstracts and downloads, if available).

 
 
Session Overview
Session: Parallel 4c: Parallel Session 4c
Time: Tuesday, 27/Aug/2024, 5:30pm - 6:50pm
Session Chair: Vladimír Havlík
Location: 116, 1st floor (40 seats)

Presentations
5:30pm - 6:10pm

Probabilistic Arbitrary Reference

Matteo Nizzardo

University of St Andrews

Arbitrary Reference is the idea that we can refer to individual entities with some degree of arbitrariness. Although there are different accounts of Arbitrary Reference currently on the market, nearly all of them can be challenged on the grounds that they entail the existence of free-floating semantic facts, namely semantic facts which are not grounded in any non-semantic fact.

To see this, let w1 and w2 be two possible worlds which agree with respect to all non-semantic facts. Suppose that w1 is such that Charlie, a first-year student at Oxford University, writes down a proof to the effect that every natural number is either even or odd, starting it with the supposition "Let n be a natural number". By stipulation, w2 is also such that Charlie, a first-year student at Oxford University, writes down a proof to the effect that every natural number is either even or odd, starting it with the supposition "Let n be a natural number". (Whether the Charlie in w2 is the same individual as the Charlie in w1, or just one of his counterparts, is irrelevant here. For this reason, I will here remain silent on the problem of transworld individuals and counterparts.) Since w1 and w2 agree with respect to all non-semantic facts, the context and the use facts associated with Charlie's proof in w1 must be the same as the context and the use facts associated with Charlie's proof in w2. The challenge arises because Arbitrary Reference allows that the referent of the term 'n' in w1 might not be the same as the referent of the term 'n' in w2, even though all non-semantic facts are exactly the same in w1 and w2.

However, it is a common assumption in the Philosophy of Language that every semantic fact is grounded in some non-semantic facts. We find this assumption in virtually all theories of reference, be they internalist or externalist.

In this paper I propose a solution. First I argue that the friends of Arbitrary Reference can answer the challenge by appealing to the notion of indeterministic grounding. Indeterministic grounding happens when some low-level facts [P1] ... [Pn], which provide a full grounding base for some incompatible high-level facts [Q1] and [Q2], underdetermine which of [Q1] or [Q2] obtains. In this case, although [P1] ... [Pn] fully ground [Q1] and [Q2], the relation of grounding between low- and high-level facts is indeterministic: the obtaining of [P1] ... [Pn] alone doesn't suffice for the obtaining of [Q1] or [Q2]. Some form of chance is required, which however doesn't threaten the grounding chain: whichever of [Q1] and [Q2] obtains, it will be grounded, albeit indeterministically, in [P1] ... [Pn].

Then I propose a new account of Arbitrary Reference as a probabilistic phenomenon. Informally, the idea is that when we make suppositions like "Let n be a natural number" we introduce some probabilistic constraints on the possible referents of the instantial term 'n'. I argue that this new account should be preferred over the classical versions of Arbitrary Reference because it builds a bridge between cases of canonical and arbitrary reference and offers new insights into the phenomenon of semantic vagueness.
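
To fix intuitions, here is a minimal illustrative sketch in Python (an editorial illustration, not the author's account) of how a supposition might be read as a probabilistic constraint on the possible referents of an instantial term; the truncated domain and the uniform choice rule are simplifying assumptions.

    import random

    # Minimal sketch (not from the paper): treat the supposition
    # "Let n be a natural number" as a constraint on admissible referents,
    # with chance deciding which admissible referent 'n' actually picks out.
    # The truncated domain and the uniform choice rule are simplifying assumptions.

    def admissible_referents(constraint, domain):
        """Referents in the domain that satisfy the supposition's constraint."""
        return [x for x in domain if constraint(x)]

    def assign_referent(constraint, domain, rng=random):
        """The non-semantic facts (constraint and domain) fix the distribution
        over referents but underdetermine which one is assigned."""
        return rng.choice(admissible_referents(constraint, domain))

    # "Let n be a natural number" (domain truncated for illustration).
    is_natural = lambda x: isinstance(x, int) and x >= 0
    referent_in_w1 = assign_referent(is_natural, range(0, 100))
    referent_in_w2 = assign_referent(is_natural, range(0, 100))
    print(referent_in_w1, referent_in_w2)  # same constraints, possibly different referents

Read this way, w1 and w2 can share every non-semantic fact (the constraint and the domain) while the chancy assignment still yields different referents for 'n'.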



6:10pm - 6:50pm

Should language models be treated as models? If so, of what?

Jumbly Grindrod

University of Reading, United Kingdom

In this talk, I am going to argue that language models (including large language models such as OpenAI’s GPT series) should be treated as scientific models of external languages: languages understood as social objects, plausibly sets of linguistic conventions adopted by a community. In order to defend this view, I will first reject two related positions: the first claims that language models can be used as models of linguistic competence, while the second claims that they are (merely) models of their training data.

Many within computational linguistics, and specifically the field of distributional semantics, are excited about the prospect of language model technology as a new form of scientific inquiry into language (Baroni, 2022; Lenci, 2008; Sahlgren, 2008; Westera & Boleda, 2019). Perhaps the most vocal recent proponent of this view is Piantadosi (2023), who claims that language model technology challenges some of the core claims of the generative linguistic tradition. Although they don’t phrase it this way, the best interpretation of this view is that language models can be treated as models of linguistic competence, and so we can then inspect the model as a way of investigating linguistic competence. But the idea that deep learning neural networks could inform linguistic inquiry in this way has been criticized by Chomsky (Chomsky et al., 2023; Norvig, 2012) and others (Dupre, 2021; Veres, 2022). Roughly put, these critics worry that language models are blank-slate systems (i.e. they do not have the same innate restrictions as humans) that simulate speaker performance without emulating speaker competence. Although the emerging probing-classifier literature is producing evidence that attenuates the strength of these critics' points, I will argue that they are nevertheless right.

Many who are skeptical of the possibility of language models providing linguistic insight have instead claimed that language models are merely models of their training data. After all, language models are constructed by setting them the task of predicting new text, given what has come previously, according to the distributional properties of the data they were trained on. This is the second position I will consider. A similar view can be found in Chiang’s (2023) suggestion that language models are best thought of as compressions of their training data, as a JPEG is of a higher-resolution image. This view also has an affinity with Kilgarriff’s (1997) famous claim that word meanings only exist relative to the statistical properties of corpora. However, I will argue against this position, for in considering the success of a language model we do not evaluate its success in the language prediction task it is trained on, but set the model to work on new evaluation tasks. One of the amazing insights of language model technology is that these systems are able to perform so well across a wide range of natural language processing tasks. The nature of the evaluation tasks for such models, as well as their success in them, reveals that we are not holding models to a standard internal to the training corpus but are instead testing the extent to which they track something consistent across both their training and evaluation sets.
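
The point can be made concrete with a toy sketch (an editorial illustration, not the speaker's): a model fitted only to the next-word statistics of a training corpus is then scored on a separate cloze-style task over sentences it has never seen; the bigram model and the evaluation items below are simplifying stand-ins for a real language model and a real benchmark.

    from collections import Counter, defaultdict

    # Toy sketch only: the training objective is next-word prediction over a
    # training corpus; the evaluation is a different task over held-out sentences.
    # The bigram model and the tiny evaluation set are illustrative assumptions.

    train_corpus = [
        "the cat sat on the mat",
        "the dog sat on the rug",
        "a cat chased the dog",
    ]

    # Training: estimate which word tends to follow which.
    bigrams = defaultdict(Counter)
    for sentence in train_corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            bigrams[prev][nxt] += 1

    def predict_next(prev):
        """Most probable continuation of `prev` under the training distribution."""
        return bigrams[prev].most_common(1)[0][0] if bigrams[prev] else None

    # Evaluation: a held-out task the model was never trained on.
    eval_items = [("a dog sat on the", "mat"), ("the cat chased the", "dog")]
    correct = sum(predict_next(context.split()[-1]) == expected
                  for context, expected in eval_items)
    print(f"evaluation accuracy: {correct}/{len(eval_items)}")

The score measures how well what was learned from the training corpus carries over to sentences outside it, which is the sense in which the model is held to a standard that is not internal to its training data.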

I will argue then that language models should be thought of as models of the external language understood as a social object: the E-language in Chomsky’s (1986) terms. What language models are trained on is the actual activity of a language, where all instances across training and evaluation sets are taken to share the feature of being part of the wider language. Viewed through this lens, we are able to see the exciting possibility that language models bring, for they provide us with a way of exploring E-languages that was not available before. If an E-language is a set of social conventions, then it is undoubtedly a highly complex object, and if we acknowledge that any speaker’s cognizance of that set of conventions is going to be incomplete and imperfect, then access to that complex object has previously looked fraught with difficulty. This is partly why Chomsky (1986) has taken there to be no point in positing E-languages. But now that we are able to construct models of an E-language, and in doing so bypass the cognitive domain in a way that wasn’t possible before, we have a new and exciting way of investigating them. I will finish by drawing upon recent work in philosophy of science on the use of deep learning models in scientific practice in order to further support the positive view defended here (Creel, 2020; Shech & Tamir, 2023; Sullivan, 2022, 2023).

References

Baroni, M. (2022). On the proper role of linguistically-oriented deep net analysis in linguistic theorizing (arXiv:2106.08694). arXiv. https://doi.org/10.48550/arXiv.2106.08694

Chiang, T. (2023). ChatGPT is a blurry JPEG of the web. The New Yorker. https://www.newyorker.com/tech/annals-of-technology/chatgpt-is-a-blurry-jpeg-of-the-web

Chomsky, N. (1986). Knowledge of Language: Its nature, origin, and use. Praeger.

Chomsky, N., Roberts, I., & Watumull, J. (2023, March 8). The false promise of ChatGPT. The New York Times. https://www.nytimes.com/2023/03/08/opinion/noam-chomsky-chatgpt-ai.html

Creel, K. A. (2020). Transparency in Complex Computational Systems. Philosophy of Science, 87(4), 568–589. https://doi.org/10.1086/709729

Dupre, G. (2021). (What) can deep learning contribute to theoretical linguistics? Minds and Machines, 31(4), 617–635. https://doi.org/10.1007/s11023-021-09571-w

Kilgarriff, A. (1997). I don’t believe in word senses. Computers and the Humanities, 31(2), 91–113. https://doi.org/10.1023/A:1000583911091

Lenci, A. (2008). Distributional semantics in linguistic and cognitive research. Italian Journal of Linguistics, 20(1), 32.

Norvig, P. (2012). Colorless green ideas learn furiously: Chomsky and the two cultures of statistical learning. Significance, 9(4), 30–33. https://doi.org/10.1111/j.1740-9713.2012.00590.x

Piantadosi, S. (2023). Modern language models refute Chomsky’s approach to language. LingBuzz. https://lingbuzz.net/lingbuzz/007180

Sahlgren, M. (2008). The distributional hypothesis. Italian Journal of Linguistics, 20(1), 33–53.

Shech, E., & Tamir, M. (2023). Understanding from Deep Learning Models in Context [Preprint]. https://philsci-archive.pitt.edu/21296/

Sullivan, E. (2022). Understanding from Machine Learning Models. The British Journal for the Philosophy of Science, 73(1), 109–133. https://doi.org/10.1093/bjps/axz035

Sullivan, E. (2023). Do Machine Learning Models Represent Their Targets? Philosophy of Science, 1–11. https://doi.org/10.1017/psa.2023.151

Veres, C. (2022). Large language models are not models of natural language: They are corpus models. IEEE Access, 10, 61970–61979. https://doi.org/10.1109/ACCESS.2022.3182505

Westera, M., & Boleda, G. (2019). Don’t blame distributional semantics if it can’t do entailment. IWCS. https://doi.org/10.18653/v1/W19-0410



 