Automated fake news detection is something of a holy grail at the moment in deception research, machine learning and AI in general. Yet, despite all the high-profile committees and investigations dedicated to it, fake news is not an isolated problem; it is part of the general epistemic malaise that has caused us to refer to our current era as ‘post-truth’. With this in mind, I have approached the problem of fake news detection by building on my work on fake review detection. This is not to trivialise the problem; despite its greater social and political impact, the production of fake news is an equally commercial operation (complete with its own writing factories, e.g. in Macedonia, Kosovo and Maine).
In 2018 I presented a paper on fake book review detection at Stanford University’s Misinformation and Misbehaviour Mining on the Web workshop. One of my key findings was that authentic reviews were significantly more likely to contrast positive and negative aspects of a book, even in 5-star reviews; positive reviewers often hedge their praise and include caveats (see examples 1 and 2 below). Fake reviews were significantly less likely to display such balance – basically, deceivers were unable to suggest good points and bad points about a book they hadn’t read. Instead, deceptive reviewers would make a single point and then continue on – elaborate – in the same vein, sometimes in a rambling or waffly manner (see example 3 below).
1. You’re not going to find endless action, shocking plot-twists, or gut-busting comedy. What you will find is a simple beautiful poetic story about life, desire and happiness.
2. Sometimes things happen a bit too conveniently to ring true, sometimes it is predictive, but in the end you won’t care.
3. This story is extremely interesting and thought provoking. It raises many questions and brings about many realizations. As you read it becomes increasingly clear we really are not so different after all. Great read!
Figure 1: Extracts from Amazon book reviews used in Popoola (2018)
Contrasting is most often (although certainly not always) signalled with ‘but’ – as in example 2 above – so a rough-and-ready technique for testing whether Contrasting is more common in truth than deception is to compare the frequency of ‘but’ in known real and fake reviews. I followed up my initial findings by analysing 1,570 true and fake book reviews and found that authentic reviews do use ‘but’ substantially more than fake reviews, and that authentic reviews are more likely to use ‘but’ to signal Contrast relations (see Figures 8 and 9 below; full findings, along with the data source, can be found in Popoola (2018)).
Use of ‘but’ in true vs. fake reviews
What does this have to do with fake news? Presenting all sides of a case or argument, in the name of objectivity and balance, is a conventional feature of the news story genre because it is fundamental to journalism ethics. Balancing and Contrasting are not the same, but linguistically they can be performed with similar language – contrastives. Contrastives include conjunctions such as ‘but’, ‘either’ and ‘or’, conjunctive adverbs such as ‘however’ and prepositions like ‘despite’. This can be contrasted with the use of additives – e.g. ‘and’, ‘also’, ‘in addition’ – for Elaborating. Contrastives and additives are two of four general linguistic strategies for connecting texts – cohesion devices (REF).
My hypothesis is that there will be variation between the different news sources in the proportion of additives vs. contrastives used – and, just like the book reviews, authentic news sources will use more contrastives. Since additives are the most common way of connecting textual information (‘and’ is the third most common word in English, six times more frequent than ‘but’ – good word frequency list here if you are into that kind of thing), I calculated the relative use of contrastives compared to additives.
I piloted this approach on a 1.7 million word corpus of political news stories downloaded from 15 news sources in Spring 2017. The 15 sources were a representative mix of legacy and contemporary news media from across the political spectrum: Bipartisan Report; Breitbart; Freedom Daily; The Daily Caller; The Daily Mail; Addicting Info; Alternative Media Syndicate; The Daily Beast; Think Progress; BBC; CBS; CNBC; CNN; The Huffington Post; The New York Times.
I used the following definitions for the cohesion strategies:
- contrastives = ‘but’ | ‘either’ | ‘or’
- additives = ‘and’ | ‘also’ | ‘in addition’.
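With those definitions, the contrastive proportion used in the scatterplot can be computed with simple whole-word matching. A rough sketch (the sample sentence is invented, and real tokenisation would need more care):

```python
import re

# Cue sets mirroring the definitions above.
CONTRASTIVES = ["but", "either", "or"]
ADDITIVES = ["and", "also", "in addition"]

def count_cues(text, cues):
    """Count whole-word/phrase occurrences of each cue (case-insensitive)."""
    total = 0
    for cue in cues:
        total += len(re.findall(r"\b" + re.escape(cue) + r"\b", text, re.IGNORECASE))
    return total

def contrastive_proportion(text):
    """Contrastive / (Contrastive + Additive) – the x-axis of Figure 2."""
    c = count_cues(text, CONTRASTIVES)
    a = count_cues(text, ADDITIVES)
    return c / (c + a) if (c + a) else 0.0

sample = "The plot is slow, but the prose is beautiful and the ending satisfies."
print(contrastive_proportion(sample))  # one contrastive, one additive -> 0.5
```

Aggregating this proportion over every article from a source gives that source’s position on the cohesion-strategy scatterplot.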
Figure 2 is a scatterplot of each news outlet’s proportion of additive and contrastive relation cues. It shows substantial variation in text cohesion strategies, with six news sources lying more than one standard deviation from the mean (i.e. outside of the yellow rectangle); additive cohesion is particularly frequent for The Daily Mail, Breitbart, Bipartisan Report and The Daily Caller, while contrastive cohesion is particularly frequent for The Daily Beast and the BBC.
Figure 2: Scatterplot of variation in text cohesion strategies in 15 online news sources. x=Contrastive / [Contrastive+Additive]; y=Additive / [Contrastive+Additive]. Coloured rectangle represents 1 SD from mean.
Example of additive textual cohesion from Breitbart
Example of contrastive textual cohesion from The Daily Beast
Full article here: https://www.thedailybeast.com/nikki-haley-steps-up-in-syria-crisis
So, we can see that textual cohesion strategies can differentiate articles within the genre. My hypothesis says that the news articles using more additive strategies are more likely to be fake – in this case, that The Daily Mail and Breitbart are more likely to produce fake news than the BBC and The Daily Beast. How do we know what is fake? Since we are looking at the overall source rather than individual articles, we can use a general scoring system. For now, we’ll use the simple ‘failed a factcheck’ test. All the news sources that have ever failed a factcheck are marked in red in Figure 4 below.
Figure 4: Scatterplot of variation in text cohesion strategies in 15 online news sources. News sources with failed factchecks marked red (source: mediabiasfactcheck.com)
As can be seen, 9 of the 15 news sources have failed a factcheck recently; factchecking by itself is not the most sensitive discriminator. However, 6 of the 15 news sources tend towards additive cohesion strategies, all 4 of the most strongly additive sources have failed factchecks, and neither of the two prototypically contrastive sources is ‘fake’ by this definition.
So, it would seem that, just like with fake book reviews, there is a tendency for fake news to lack shades of contrast. Perhaps deceivers are less likely to contrast their lies with the truth because it dilutes their deception. As you read this, I am adding more news sources to the analysis and refining the cohesion strategy specifications. Stay tuned!
Can you tell a real book review from a fake book review? Below are five tips that I have learned from my research. It is likely to be fake if:
1) It reads like a press release
2) It reads like the blurb on the back of the book
3) It is either extremely positive or extremely negative. Authentic reviews are more nuanced and tend to mention possible negatives even in 5-star reviews.
4) The reviewer talks about themselves
5) The reviewer addresses you, the reader
The more of these signs you find in a review, the more likely it is to be fake. Now take the Fake Review Challenge and see how you get on!
In the first post of this two-part linguistic investigation, we set up an unsupervised analytical approach: factor analysis to identify latent dimensions of linguistic variation in the ‘What Happened’ reviews, then feeding these dimensions into a cluster analysis in order to identify a small number of distinct text types. We know that the reviewing patterns for ‘What Happened’ displayed ‘burstiness’, i.e. a high frequency of reviews within a short period of time (see Figure 1 below). As Figure 2 below hypothesises, if there is a text type cluster that displays similar ‘burstiness’, we can infer that there was probably some level of coordination of reviewing behaviour and identify linguistic features associated with less-than-authentic reviews.
Figure 1: Quantity, frequency and rating of ‘What Happened’ book reviews in the first month after launch.
Figure 2: Hypothesis for fake review detection using cluster analysis with time series.
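The two-step pipeline above can be sketched in a few lines with scikit-learn. This is a minimal illustration using synthetic data – random numbers standing in for per-review linguistic feature counts – not the actual analysis:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
# Synthetic stand-in: 200 reviews x 12 linguistic features
# (e.g. pronoun rates, tense counts, adjective density).
X = rng.normal(size=(200, 12))

# Step 1: factor analysis to recover latent dimensions of variation
# (four factors, matching the four-dimension solution discussed below).
fa = FactorAnalysis(n_components=4, random_state=0)
scores = fa.fit_transform(StandardScaler().fit_transform(X))

# Step 2: hierarchical clustering on the factor scores to find text types.
clusters = AgglomerativeClustering(n_clusters=4).fit_predict(scores)

print(scores.shape)  # (200, 4): one score per review per dimension
```

On random data the clusters are of course meaningless; the point is only the shape of the pipeline – reviews in, factor scores out, text-type labels from the scores.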
The factor analysis found four dimensions of language variation in the ‘What Happened’ reviews: Engagement; Reflection; Emotiveness; Commentary.
Dimension 1: Engagement
One linguistic dimension of these reviews describes levels of Engagement. In engaging reviews, writers directly address (using ‘you’ pronouns) either the reader or Hillary Clinton. The style is conversational and persuasive with exclamations, questions and hypotheticals used to interact with the reader.
THANK YOU for telling your story Secretary Clinton! You have accomplished so much and are a genuine inspiration. If they weren’t so afraid of you, they wouldn’t work so hard to shut you up. Keep fighting and I will too!
It’s her side of the story. That’s what it claims to be, and that’s what it is. For those who don’t like it because you disagree with her, you’re missing the point. After reading it, did you get a better feel for who the candidate was, what she was thinking, and even what her biases were and are? If so, then the book does what it claims to do.
Non-engaging reviews are more linguistically dense, using longer words and giving complex descriptions of the content.
The second chapter describes the days after the election, when she first isolated herself from the deluge of texts and emails from well-wishers. Eventually, however, she threw herself back into the fray, writing letters of thanks to supporters, attending galas, and spending time with her family.
Dimension 2: Reflection
A second linguistic dimension sees reviewers reflect on their personal experience of reading the book. This may include autobiographical elements, narratives about the book purchase and reading occasions, as well as feelings experienced while reading. The key linguistic features here are the ‘I’ pronoun and the past tense:
Like many other people, I wondered if this book would really be worth reading. I voted for Clinton but I wondered how much value there could be in her account of the 2016 Presidential election campaign. Luckily, this book is so much more. It hit my Kindle on Tuesday and as it happens I had three airplane flights (including two very long ones) on Wednesday and Thursday, so I made it my project for those flights. I didn’t have to force myself to keep going; once I started, her narrative and the force of her ideas and anecdotes kept me reading.
Dimension 3: Emotiveness
Reviews with a high Emotiveness score were extremely positive in their praise of the book and, especially, Hillary Clinton. This was signalled by use of long strings of positive adjectives that might reasonably be considered excessive:
A funny, dark, and honest book by one of the truest public servants of her generation. Her writing on her marriage was deeply heartfelt and true. The sad little haters will never keep this woman down, and history will remember her as a trailblazer and a figure of remarkable courage.
The People’s President, Hillary delivers her heartfelt, ugly cry inducing account about What Happened when she lost the Electoral College to the worst presidential candidate in modern history. Politics aside, America lost when they elevated Russia’s Agent Orange to the presidency. Think what you will, but America missed the chance to have a level headed, intelligent and resilient leader, and yes the first female president.
Hillary’s a smart, insightful, resilient, inspiring, kind, caring, pragmatic human being. This book is a journey through her heart and soul.
Dimension 4: Commentary
Reviews with high Commentary focused on Hillary Clinton and the other actors in the election story (high use of third person pronouns). The reviews analyse and evaluate Clinton’s perspective and explanation of what happened in 2016, in a conversational manner much like a TV commentator or pundit.
I disagree with the reviewers who says Hillary doesn’t take responsibility for her mistakes. She analyzes all the reasons she thinks she lost the election–yes, she talks about Russian interference, malpractice by the FBI, and false equivalence by the mainstream press IN ADDITION TO missteps she thinks she made. My own take is that she doesn’t pay enough attention to the reasons why Bernie Sanders was able to command so strong a following with so few resources; but that is part and parcel of who she is.
Historical memoir from the first female candidate for a major political party…a unique perspective and platform to write from. She does recount her successes as well as her failures…she was mostly shut down during the campaigns by repetitious questions and by over-coverage of Trump by the media. She is intelligent and well-informed and states her case without self-pity.
Having identified these four linguistic functions in the ‘What Happened’ reviews, the trick is to see how they combine to form clusters of review text types – and whether any one of these clusters is more strongly correlated with the high-frequency, early reviews.
As Figure 4 shows, hierarchical cluster analysis identified four review text types: ‘Tribute’ reviews, the largest cluster, have high Emotiveness; ‘Pundit’ reviews have high levels of Commentary and Engagement; ‘Content descriptive’ or ‘spoiler’ reviews talk about what’s in the book in an objective manner, i.e. without Reflection or Engagement; ‘Experiential’ reviews narrate the writer’s personal Reflection on the experience of reading the book.
Figure 4: 4-Cluster solution with mean factor loadings, interpretations and percentage of total reviews.
So, we have these four review text types…do any of these correlate with the bursty reviewing patterns identified? Figure 5 below shows that the actual linguistic pattern of ‘What Happened’ reviews appears to correlate with the burstiness pattern; a large proportion of the first day reviews are Tribute reviews and most of this review type occurs within the first week before tailing off during the rest of the month. The fact that no other review type is particularly time sensitive suggests that, at the very least, Tribute reviews are correlated with early reviewing and are potentially evidence of coordinated recruitment of Hillary Clinton’s ‘fans’ as book reviewers.
Figure 5: Distribution of ‘What Happened’ review text types during first month following book launch, compared to hypothetical deceptive and non-deceptive distributions.
If Hillary Clinton’s PR team did solicit positive reviews in the early days of the book launch, perhaps it is not surprising; they would have been responding to an extensive negative campaign against her book which included manipulating review helpfulness metrics (i.e. massive upvoting of low-rated reviews) as well as writing fake negative reviews.
From an investigative linguistic perspective, this analysis shows that: a) suspicious activity can be detected using linguistic data as well as network or platform metadata; b) unqualified praise and intense positive emotions are deception indicators in the online review genre; and c) cluster analysis is an effective way of recognising linguistic deception features in an unsupervised learning setting.
September 12, 2017, was the launch day for Hillary Clinton’s autobiographical account of the 2016 election she lost to Donald Trump, definitively entitled ‘What Happened’. By midday, 1,669 reviews had been written on Amazon.com. By 3pm, over half of the reviews, all with 1-star ratings, had been deleted by Amazon and a new review page for the book had been set up. After Day 1, ‘What Happened’ had over 600 reviews and an almost perfect 5-star rating. What happened?!
Figure 1: Genuine support or fake reviews? Hillary Clinton’s ‘What Happened’ Amazon rating 1 day after launch (and after all the negative reviews were deleted)
There were good reasons to view the flood of negative reviews as suspicious. Only 20% of the reviews had a verified purchase and the ratio of 5-star to 1-star reviews – 44%-51% – was highly irregular; the vast majority of products reviewed on Amazon.com display an asymmetric bimodal (J-shaped) ratings distribution (see Hu, Pavlou and Zhang, 2009), in which there is a concentration of 4 or 5 star reviews, a number of 1-star reviews and very few 2 or 3 star reviews. The charts in Figure 2 below, originally featured in this QZ article, show the extent to which ‘What Happened’ was initially a ratings and purchase pattern outlier.
Figure 2: Two charts indicating the unusual reviewing behaviour for ‘What Happened’. Source: Ha, 2017
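The J-shaped benchmark can be turned into a rough screening rule: flag any ratings distribution where 5-star reviews do not dominate or where middle ratings are unusually common. A minimal sketch, with illustrative thresholds (not taken from Hu, Pavlou and Zhang) and invented counts:

```python
def is_j_shaped(counts):
    """Heuristic check for the asymmetric bimodal (J-shaped) ratings
    distribution: 5-star reviews dominate, 1-star forms a smaller second
    mode, and the middle ratings (2-3) stay comparatively rare.
    `counts` maps star rating -> number of reviews; the thresholds
    here are illustrative, not from the literature."""
    total = sum(counts.values())
    share = {star: n / total for star, n in counts.items()}
    middle = share.get(2, 0) + share.get(3, 0)
    return share.get(5, 0) > 0.5 and share.get(5, 0) > share.get(1, 0) and middle < 0.15

# A typical Amazon-style distribution vs. the launch-day split,
# approximating the 51%/44% one-star/five-star ratio reported above.
typical = {1: 120, 2: 30, 3: 40, 4: 180, 5: 630}
what_happened = {1: 510, 2: 20, 3: 30, 4: 0, 5: 440}
print(is_j_shaped(typical), is_j_shaped(what_happened))  # True False
```

A distribution failing this check is not proof of manipulation – only a signal that the product is a ratings outlier worth a closer look.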
Faced with accusations of pro-Clinton bias as a result of deleting only negative reviews, an Amazon spokesperson confirmed that the company, in taking action against “content manipulation”, looks at indicators such as the ‘burstiness’ of reviews (a high rate of reviews in a short time period) and the relevance of the content – but doesn’t delete reviews simply based on their rating or their verified status (Hijazi, 2017).
It would appear that Amazon have taken on board the academic literature suggesting that burstiness is a feature of review spammers and deceptive reviews (e.g. this excellent paper by Geli Fei, Arjun Mukherjee, Bing Liu et al. ) and that it is right to interpret a rush of consecutive negative reviews close to a book launch as suspicious.
But what about the subsequent burst of 600+ positive reviews? One might expect the Clinton PR machine to mobilize its own ‘positive review brigade’ in anticipation of, or in response to, a negative ‘astroturfing’ campaign against her book. One could even argue that it would be foolish not to manage perceptions of such a controversial and polarising book launch. If positive review spam is identified, should it also be deleted?
I tracked the number of Amazon reviews of ‘What Happened’ for a month after its launch on the new ‘clean’ book listing (the listings have since been merged but you can see my starting point here). Figure 3 below shows clear signs of ‘burstiness’; the rate of reviewing decreased exponentially over the first month even while the rate of 5-star reviews remained consistently high.
Figure 3: Number and frequency of ‘What Happened’ reviews in the first 30 days following its launch and deletion of negative reviews.
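As a side note, the ‘decreases exponentially’ claim is easy to check on daily counts: if the review rate decays exponentially, log(count) falls linearly with the day, and the fitted slope gives a halving time. A sketch with synthetic counts (not the actual ‘What Happened’ data):

```python
import numpy as np

# Hypothetical daily review counts: a day-1 burst of ~600 decaying
# exponentially over 30 days (decay rate 0.15 is invented).
days = np.arange(1, 31)
counts = 600 * np.exp(-0.15 * (days - 1))

# Linear fit in log space: slope recovers the decay rate.
slope, intercept = np.polyfit(days, np.log(counts), 1)
halving_time = np.log(2) / -slope  # days for the review rate to halve
print(round(halving_time, 1))
```

With real, noisy counts the fit would be approximate, and a poor fit (high residuals) would itself be evidence against the exponential-decay description.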
So, it is perfectly legitimate to ask whether the ‘What Happened’ reviews were manipulated through the ‘planting’ of ‘fake’ 5-star reviews written for financial gain or otherwise incentivised, e.g. in exchange for a free copy of the book, which would circumvent Amazon’s Verified Purchase requirement. With my investigative linguist hat on, I’m wondering whether there are any linguistic patterns associated with this irregular – and potentially deceptive – behaviour. (If there are, these could be used to aid deception detection in the absence of – or in tandem with – non-linguistic ‘metadata’.)
A line of fake review detection research has confirmed linguistic differences between authentic and deceptive reviews, although the linguistic deception cues are not consistent and vary depending on the domain and the audience (see my brief overview in this paper). Since we don’t know the deception features in advance and no ground truth has been established (i.e. we don’t know for sure if there was a deception), I’m going to use two unsupervised learning approaches appropriate for unlabeled data: factor analysis, to find the underlying dimensions of linguistic variation in all the reviews, followed by cluster analysis to segment the reviews into text types based on the dimensions with the hope of finding specific deception clusters.
If there is a text cluster that correlates with ‘burstiness’ – i.e. occurs more frequently in the reviews closest to the book launch date and/or occurs repeatedly within a short time frame – then that would suggest there are specific linguistic styles and/or strategies correlated with this deceptive reviewing behaviour. The existence of such a distinct deception cluster would strongly suggest that Clinton’s PR team gamed the Amazon review system (understandably, in order to counter the negative campaign against the book). Alternatively, different reviewing strategies might be distributed randomly across the review corpus and unrelated to its proximity to the book launch date. This would weaken the argument that linguistic variation in the reviews is a potential deception cue. The two scenarios are illustrated in Figure 4 below:
Figure 4: Hypothetical illustration of how review text types (clusters) might be distributed over a 30 day period in the case of astroturfed fake reviews (top) or genuine positive reviews (bottom).
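One simple way to operationalise the two scenarios is to measure, for each text-type cluster, how concentrated its reviews are near launch. A minimal sketch with invented posting days and cluster labels:

```python
from collections import Counter

def early_concentration(days, labels, cutoff=7):
    """For each text-type cluster, the share of its reviews posted within
    `cutoff` days of launch. One cluster far above the rest matches the
    'bursty' (astroturfed) scenario; roughly equal shares match the
    spontaneous scenario. Data here is invented for illustration."""
    totals, early = Counter(labels), Counter()
    for day, label in zip(days, labels):
        if day <= cutoff:
            early[label] += 1
    return {label: early[label] / totals[label] for label in totals}

# Hypothetical corpus: cluster 'A' piles up in week one, 'B' is spread out.
days   = [1, 1, 2, 3, 5, 12, 20, 2, 9, 16, 23, 30]
labels = ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B']
print(early_concentration(days, labels))
```

In the first scenario of Figure 4, one cluster’s early share would stand well clear of the others; in the second, all clusters would sit near the corpus-wide average.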
My prediction? Surely, Hillary Clinton’s PR team would not be so brazen as to solicit fake positive reviews in bulk and in an organised fashion. Yes, there were a disproportionate number of reviews written in the first few days, but I believe this was a spontaneous groundswell of genuine support. I do expect there to be a few different types of linguistic review style, reflecting the different ways in which books can be reviewed (e.g. focus on book content; retell personal reading experience; address the reader – these are some of the review styles I presented at the ICAME39 (2018) conference in Tampere). However, if the support is spontaneous, I would expect these review styles not to be correlated with burstiness or other deceptive phenomena but to occur randomly throughout the month.
Check back here in a few days for Part #2: Results and Discussion!
In 2006, a federal court judged four of the ‘Big Five’ US tobacco companies – Philip Morris, RJ Reynolds, British American Tobacco and Lorillard (sold to RJ Reynolds in 2014) – to have been operating for more than half a century as a de facto criminal enterprise guilty of racketeering. In November 2017, US tobacco companies finally issued, through national TV and print media, a series of statements correcting their sixty-year deception of the American public. They had been appealing the original judgement for over ten years.
The ‘racket’ was the continued sale and marketing of tobacco products in full knowledge of their addictive properties and their causal connection to lung cancer. Under the Racketeer Influenced and Corrupt Organizations (RICO) Act 1970, these tobacco companies were held to have defrauded smokers, i.e. obtained smokers’ money by dishonest means, specifically “deceiving smokers, potential smokers, and the American public about the hazards of smoking and second hand smoke, and the addictiveness of nicotine” (United States v. Philip Morris et al., 2006, p. 4).
Below is a list of deceptions maintained by the ‘Big Tobacco’ enterprise:
- false denial of the adverse health consequences of smoking
- public denial that nicotine is addictive
- concealment of research data and other evidence that nicotine is addictive
- false denial of the manipulation of nicotine levels to create and sustain addiction
- deceptive marketing and public statements suggesting ‘low tar’ cigarettes are less harmful than full-flavor cigarettes
- false denial that their marketing targets youth
- false and misleading public statements denying that environmental tobacco smoke (passive smoking) is hazardous to nonsmokers
How does one deceive the public for so long (linguistically)? My approach to this question is to first identify the agents and channels of deceptive communication. We know who received the deceptive messages but who were the senders? Did they use agents/messengers? What channels did they use?
The ‘Big Tobacco’ enterprise built an infrastructure of deception by establishing a number of front organizations i.e. groups that appear to independently support or be motivated by one particular purpose but are actually a ‘front’ for another group whose covert agenda they secretly serve. Front organizations are agent-messengers that appear to be senders. Examples of the forms that front organizations might take include think tanks, associations of consumers or workers, and single-issue interest groups.
Chief among these was the Tobacco Industry Research Committee (TIRC), founded in 1954, which later changed its name to the Council for Tobacco Research (CTR). TIRC/CTR was established on the strategic recommendation of Hill & Knowlton, public relations counsel to the Big Tobacco enterprise. Backed by money from the enterprise, it ran a multi-million-dollar research programme providing substantial grants for ‘independent’ scientific research into the health effects of smoking. This produced a body of research that obfuscated the link between smoking and cancer and left their causal connection as an “open question”. TIRC/CTR also funded research that diverted discourse away from the dangers of smoking by suggesting alternative causes of cancer such as air pollution, diet and genetics.
This programme of decoy research clearly had a deceptive purpose. However, the research itself was not necessarily deceptive – it created doubt rather than false belief by challenging the anti-smoking research attracting the attention of the American government and health organisations at that time. Neither was this research directly responsible for the mass deception of the American public since the public was not the audience for scientific research.
In the 2006 judgement, Judge Kessler noted that the Big Tobacco front organisations disseminated ‘commentaries’ on both pro- and anti-smoking research through a variety of publication channels, including:
- management commentaries in annual reports, read by business professionals
- newsletters and booklets targeting medical professionals with favourable research summaries
- public statements and comments made by tobacco-friendly scientists discrediting research that linked smoking with cancer
Whilst annual reports and newsletters helped communicate the tobacco deception to professional outgroups, as shown in Figure 1 below, press releases were the most influential channel for reaching the general public. Every publication and statement made by a scientist connected to the enterprise was accompanied by a press release. This would be sent out to thousands of editors and then transmitted to the general public through newspapers and popular magazines.
Thus, the press release plays a doubly deceptive role: it reports the deceptive framing of the discourse around tobacco/cancer research and then amplifies that interpretation through popular media. Indeed, PR agency Hill & Knowlton prided themselves on their ability to spin “obscure scientific reports favourable to the industry into headline news across the country”.
Figure 1: Mass deception; infrastructure, channels and genres
Below (Figure 2) is one example of press release distortion of research in the tobacco domain. This 1955 study from the British Empire Cancer Campaign (a forerunner of Cancer Research), published in the British Medical Journal, reports a nuanced set of findings in relation to the carcinogenic properties of smoking. It reports findings indicating that: i) tar is not carcinogenic in mice, ii) condensation from tobacco smoke is carcinogenic in mice and iii) carcinogenic effects vary between species so more research is needed.
This nuance is lost in the press release, which seizes on the first finding, related to tar, and extends it to smoking and tobacco in general. The press release, authored by Hill & Knowlton, draws its authority from reporting the statement made by TIRC chairman Timothy Hartnett and uses repetition to reinforce its point three times on the first page – “Outstanding British scientists could not induce cancer in experimental animals with tobacco smoke derivatives” / “Experiments conducted at several leading British medical research institutions had failed to induce any cancers” / “18 month experiments fail to show any connection between cancer and smoking”.
Figure 2: Comparison of British Medical Journal article and TIRC press release
The fact that this repetitive message itself references a message that is practically self-authored (considering the close relationship between TIRC and Hill & Knowlton, who even shared the same office at one stage) is indicative of the low information quotient in this press release. Yet providing information is arguably the main purpose of the press release genre. What we have here, then, is a deficient or deviant genre communication – inauthenticity (deception) has compromised the integrity of the genre.
The same has been suggested about the annual reports produced by the TIRC, “which read much like industry position papers” (United States v. Philip Morris et al., 2006, p. 58). The extract below, from the 1958 TIRC annual report, is illustrative:
A problem may well be obscured, and its solution delayed, by the soothing acceptance of an oversimplified and immature [tobacco theory] hypothesis. The proponents of the tobacco theory have generated increasingly intensive and extensive propaganda. As a result, a non-scientific atmosphere, conducive to prematurity, unbalance, and inadequacy of public judgement, has pervaded the whole field. The prohibition concept discounts or ignores all considerations of smoking benefits in terms of pleasure, relaxation, relief of tension or other functions.
Once again, in this case by allowing bias to enter annual reporting, a genre is compromised through performing a deceptive purpose. These two examples suggest that the tobacco deception was partly sustained by ‘genre fronts’ – communications that appear to serve one conventional purpose but in fact fulfill functions of another genre or are simply deficient.
Press releases are more central to mass deception than annual reports because they are more frequent and produced for immediate impact in popular media. Press release spin is picked up by editors and transferred to newspapers and magazines – sometimes wholesale, sometimes with additional fervour. This resulted in a news environment that actively facilitated the disinformation strategies of the Big Tobacco enterprise leading to a massively misinformed general public.
The selection of headlines, news articles and editorials in Figure 3 below reflect the general tenor of the ‘decoy discourse’ maintained by the tobacco industry, which involved: attacking the integrity of government scientists, casting doubt on environmental controversies such as the use of pesticides and climate change, and linking cancer to spurious causes. The confluence of tobacco advocates and climate change skeptics is a striking feature of this mass deception.
Figure 3: Selection of newspaper and magazine items used to support tobacco advocacy. Taken from ‘Bad Science: A Resource Book’
Newspaper items come in a variety of sub-genres, for example news articles whose purpose is to present factual information and editorials that present an opinion. Editorials themselves can be written by newspaper staff or invited writers as ‘op-eds’. Both of these sub-genres can be used for deceptive means i.e. the presentation of false facts in news articles and the advancement of an undisclosed agenda in the case of editorials. In such cases, the news genre loses its features of objectivity and transparency and becomes distorted as ‘fake news’.
Taking the ‘tobacco deception’ as a case study, it would seem that the infrastructure for maintaining this deception was built on the deviant use of a variety of genres – annual reports, press releases, newspaper/magazine items – that were connected by channels invisible to the general public i.e. front organizations and funded scientists. The schema presented above will be used in future posts to evaluate similar deception controversies in the environmental and health domains.
If you are ever arrested and asked to make a statement by the police on US soil, be careful with your pronouns. Law enforcement officers in the US are likely to have received training in analysing statements for deception from Mark McClish or Don Rabon, which means the pronouns you use will be inspected very closely. McClish and Rabon come in for a lot of stick from forensic linguists working in academia, due to the lack of citations and over-generalisations in their blogs and best-selling books.
But to be fair, many of the leading academics working on lie detection have said similar things about first person pronoun use as an indicator of veracity. Newman, Pennebaker and colleagues, in their seminal work ‘Lying Words: Predicting Deception From Linguistic Styles’, provided some empirical support for the correlation between low self-reference and deception; it has subsequently been found to hold broadly in online dating profiles, business communication and criminal narratives (but, interestingly, not in consumer reviews, which use ‘reader engagement’ as a deception strategy).
It should be noted, however, that Newman and Pennebaker’s prediction rate was 67% – that is, wrong one time in three. Simply counting first person pronouns is therefore not, by itself, the magic cue for deception detection. There are a number of different first person pronouns – I, me, my, mine, myself, we, us, our, ours, ourselves. Not only do these carry different strengths in terms of socio-psychological notions of distance and commitment (compare ‘I was hit’ with ‘The car hit me’), they also behave differently linguistically. For example, ‘I’ and ‘me’ correlate with verbs and ‘my’ with nouns; ‘I’ also correlates with stance and modal verbs (‘I thought’, ‘I tried’, ‘I would’), making it a more ‘active’ first person pronoun than the passive ‘me’.
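By way of illustration, the kind of fine-grained count argued for above can be sketched in a few lines of Python. The category labels and the crude regex tokenizer are my own simplification for demonstration purposes, not the method used by Newman and Pennebaker:

```python
import re
from collections import Counter

# Illustrative grouping (my own, not from the studies cited):
# first person pronouns split by number and grammatical role.
FIRST_PERSON = {
    "i": "singular-subject", "me": "singular-object",
    "my": "singular-possessive", "mine": "singular-possessive",
    "myself": "singular-reflexive",
    "we": "plural-subject", "us": "plural-object",
    "our": "plural-possessive", "ours": "plural-possessive",
    "ourselves": "plural-reflexive",
}

def pronoun_profile(text: str) -> Counter:
    """Tally first person pronouns by category – one step beyond a
    single aggregate count, though still blind to clusivity, which
    is pragmatic and cannot be read off the word form."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(FIRST_PERSON[t] for t in tokens if t in FIRST_PERSON)

line = ("I have a house, which we used to live in, "
        "which we now let out while we are living in Downing Street.")
print(pronoun_profile(line))
```

Run on Cameron’s sentence from the transcript, the profile separates the single ‘I’ from the three tokens of ‘we’ – but, as the next paragraph shows, deciding *which* ‘we’ each token is requires pragmatic analysis, not counting.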
The above also applies to ‘we’, ‘us’ and ‘our’. First person plural pronouns carry the additional pragmatic parameter of clusivity, which determines who exactly is included in the ‘we’ (see Figure 1 below). And there is the linguistic phenomenon of nosism, which includes the royal and editorial ‘we’ (see Ben Zimmer’s excellent 2010 article to fully appreciate the complexities of ‘we’).
Figure 1: Referential parameters of ‘we’: inclusive (left), exclusive (right). By LucaLuca. Reproduced under Creative Commons licence
A case in point is ex-UK Prime Minister David Cameron’s response to questions about his alleged use of off-shore tax havens to avoid paying tax, as revealed in the Panama Papers leak. In April 2016, 10 weeks before the EU Referendum and his subsequent resignation, Cameron was taking questions about the upcoming referendum and speaking in support of remaining in the EU at a town-hall style Q&A event held at PricewaterhouseCoopers’ Birmingham offices.
Figure 2 is a transcript of David Cameron’s response to an unexpected question from Sky News journalist Faisal Islam regarding the controversy over his connection to an offshore investment company (Blairmore Holdings) owned by Cameron’s late father. Cameron denied owning any shares or offshore investments but was roundly criticised for his evasive answer (Cameron restricts his answer to the present tense, despite Faisal Islam’s specific temporal reference to the past and the future – lines 4-5). Five days later, under public pressure to resign, Cameron was forced to admit that he had owned shares in his father’s business (which he sold at a profit shortly before taking office as Prime Minister).
Figure 2: Transcript of David Cameron’s first public response on the Panama Papers allegations. Given during a Q&A with workers at accountancy firm PWC in Birmingham, 5 April 2016.
David Cameron’s use of pronouns is an excellent example of linguistic duplicity, shifting between inclusive ‘we’ (yellow), exclusive ‘we’ (red) and ‘I’, all in reference to himself; the identities referred to by ‘I’ are themselves split between David Cameron as a UK citizen and David Cameron in his role as Prime Minister.
Furthermore, the scope of ‘we’ varies within the text and is sometimes unclear. In answer to a question about “you and your family” with regard to his late father’s financial business, one might expect ‘we’ to refer to some aspect of family. However, Cameron initially moves to include the whole audience (and viewers) in his personal financial affairs, referring to ‘we’ as a nation with the contextual reference “our tax authority” and later “our own country” (line 8). In the middle of his response, a different (exclusive) ‘we’ appears mid-sentence – “I have a house, which we used to live in, which we now let out while we are living in Downing Street” (lines 16-17). There is no explicit reference to the scope of this ‘we’, but the reference to personal property ownership means one can assume that it is not the national ‘we’.
If one assumes that it is the ‘we’ originally asked for in the reporter’s question – i.e. “you and your family” – then Cameron has violated the right frontier constraint (Webber, 1988), which stipulates that anaphoric elements such as pronouns are interpreted in ambiguous cases by reference to information at the end of the previous discourse unit i.e. the right frontier (for languages with left-to-right scripts). That Cameron returns to ‘we as a nation’ for the remaining text further highlights his dynamic use of pronominal reference.
The linguistic duplicity displayed by David Cameron above is in stark contrast to the language he used when owning up to his involvement with Blairmore Holdings in a hastily-arranged national TV interview. As the transcript shows, Cameron doesn’t use ‘we’ at all in response to similar questions. This case study shows that assessing veracity and potential deception by tracking pronoun use is valid, but more complex than simple counting; the capacity for linguistic duplicity is embedded in a wider system of deceptive pragmatics.