PP-Fr-am - S3 -4: Parallel papers Friday morning
Friday, 05/Jul/2019:
11:00am - 1:00pm

Session Chair: Chris Heffer
Location: Seminar room 3
Storey Hall level 7

11:00am - 11:30am

Toward linguistic explanation of idiolectal variation – understanding the black box

Krzysztof Kredens, Piotr Pezik, Lisa Rogers, Samantha Shiu

Aston University, United Kingdom

The development of powerful computing tools and easy accessibility of large quantities of language data online have sparked renewed interest in authorship analysis in a variety of domains. However, as new computational models are put forward for authorship attribution purposes and ever greater success rates reported, a vast majority of the studies remain silent on the nature and types of linguistic phenomena associated with idiolectal style. Meanwhile, in forensic authorship attribution, models should be explanatorily rich: the forensic linguist needs to be both certain of the validity of his/her findings and able to explain them to lay triers of fact; s/he needs to know what actually happens inside the black box.

This paper reports on the findings of a project using what to the best of our knowledge is the biggest corpus (ca. 3 billion words produced by over one million authors) ever used in computational author classification research. However, we are less concerned here with classification results than with harnessing the big-data capability to inform our understanding of idiolectal variation. By drawing up a typology of style markers that proved to be instrumental in correct author classification (resulting in ‘true positives’) and those that classified authors erroneously (yielding ‘false positives’), we revisit the fundamental question (e.g. Coulthard 2004, McMenamin 1993) of just what kinds of idiolectal style markers have the greatest individuating potential.


Coulthard, M. (2004), 'Author Identification, Idiolect, and Linguistic Uniqueness', Applied Linguistics 25(4), pp. 431–447.

McMenamin, G. (1993), Forensic Stylistics, Amsterdam: Elsevier.

11:30am - 12:00pm

‘NBK is closing in’: The Linguistic Repackaging of Mass Murder.

Emily Katherine Powell

Cardiff University, United Kingdom

The description of harm in terms intended to make it more palatable has been explored in relation to a range of descriptions of harmful conduct, from accounts of genocide and torture (Cohen 2001) and stories given by executioners (Osofsky et al. 2005), to language describing the slaughter of animals (Presser 2013). Bandura et al. (1996) assert that in order to cause harm and be able to live with ourselves, it is necessary to disengage our actions from our moral code. One of the key elements of this is the use of euphemistic labelling, sanitising language, and the lending of agency to nature, events and nominalisations (van Leeuwen 2008) in order to make the act more acceptable, the blame less easily placed, and possibly to reduce inhibition. This has implications for the study of offenders and the impact of their own engagement with the elements of their crimes, and the extent to which they take responsibility for them.

This study uses a corpus-aided approach to analyse the written narratives of a small group of perpetrators. It explores how they describe their future crimes in ways that may enable them to preserve their own moral code (Bandura et al. 1996), defend their future actions (Jarvinen 2000), distance themselves from the detail of their actions and responsibility for them (Cohen 2001), and potentially enable them to act.

12:00pm - 12:30pm

From Keywords to Authorship Profiling: A Keyness Approach

Shaomin Zhang

Guangdong University of Foreign Studies, People's Republic of China

Authorship profiling is used to distinguish between classes of authors. Key characteristics of an author’s profile, such as gender, age class, occupation, nationality, religious background and personal writing style, are of great significance in narrowing down suspect authors in forensic linguistic contexts. The keyness approach, through generating keywords, can give robust indications of a text’s “aboutness” (Bondi, 2010), “style” (Bondi, 2010; Hyland, 2010) and the writer’s “stance” (Hyland, 2010), which are assumed to be important pointers to an author’s profile and thus deserve exploration.

This study aims to demonstrate how a text author’s profile in terms of “aboutness”, “style” and the writer’s “stance” can be revealed through the exploration of keywords. Specifically, keywords are understood qualitatively as words that play a role in identifying important elements of a text, and quantitatively as words whose frequency in a text is statistically significant (by log-likelihood or chi-squared) when compared with a reference text (Bondi, 2010).

In the pilot study, a two-author profiling corpus is analyzed with the linguistic software package AntConc 3.5.7, using log-likelihood as the keyword statistic and p < 0.05 (with Bonferroni correction) as the keyword threshold. Two findings are noteworthy. Firstly, based on the keyword list of each author, the three types of indication could be identified in both authors’ messages. Secondly, further investigation of the collocations of the “style” indicators “that” for author 1 (who does not use “which” at all) and “which” for author 2 (who does not use “that” at all) demonstrates a very significant difference in their use.
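
As an illustration of the quantitative definition of keywords, the following is a minimal sketch of log-likelihood keyness with a Bonferroni-corrected threshold. It is not AntConc's implementation and not the authors' procedure. The function names, the whitespace-free token lists, the choice to divide the significance level by the number of word types tested, and the restriction to words overused in the target text are all assumptions made for this example.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """G2 statistic for a word occurring a times in a target corpus of
    c tokens and b times in a reference corpus of d tokens."""
    total = a + b
    expected_target = c * total / (c + d)
    expected_reference = d * total / (c + d)
    g2 = 0.0
    if a > 0:  # a zero count contributes nothing to the sum
        g2 += a * math.log(a / expected_target)
    if b > 0:
        g2 += b * math.log(b / expected_reference)
    return 2.0 * g2


def p_value(g2):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(g2, 0.0) / 2.0))


def keywords(target_tokens, reference_tokens, alpha=0.05):
    """Return (word, G2) pairs for words overused in the target text,
    keeping those significant after a Bonferroni correction (assumed here
    to divide alpha by the number of word types tested)."""
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    c, d = sum(target.values()), sum(reference.values())
    if c == 0 or d == 0:
        raise ValueError("both corpora must contain at least one token")
    vocabulary = set(target) | set(reference)
    threshold = alpha / len(vocabulary)
    results = []
    for word in vocabulary:
        a, b = target[word], reference[word]
        g2 = log_likelihood(a, b, c, d)
        overused = a / c > b / d
        if overused and p_value(g2) < threshold:
            results.append((word, g2))
    return sorted(results, key=lambda item: item[1], reverse=True)
```

Calling `keywords(author1_tokens, author2_tokens)` on two tokenized message sets (hypothetical variable names) would list the words that characterize the first author relative to the second. Swapping the arguments gives the second author's list.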

Authorship Profiling for Chinese Microblogs in Forensic Science

Wenlin Huang, Jingyang Li, Li Wang, Baoliang Sun, Hui Sheng

Institute of Forensic Science, Ministry of Public Security, People's Republic of China

This paper focuses on authorship profiling for Chinese microblogs in the field of forensic science. It analyses the language characteristics of Chinese microblogs and their authors’ basic demographic traits. Firstly, the principle of Chinese microblogs is discussed. Secondly, a corpus is collected from about 80 volunteers who differ in age, gender, geographic origin, native language, occupation and level of education. Statistics are obtained for different topics, writing patterns and lengths. The authors’ traits, such as age, gender, geographic origin, native language, occupation and level of education, are then profiled through language characteristics, communication topics, writing patterns and individual information. Finally, the new pattern of network communication gives access to more new information through the internet, such as bloggers’ characteristic information, social characteristics of network groups, territorial information and hobbies. All this information generated by the new communication pattern, together with phonetic, lexical, syntactic and textual features and rhetorical devices, provides a brand-new research angle for authorship profiling studies.

