Virginia Roberts Giuffre 2015 Court Testimony [Case 9:08-cv-80736-KAM]

In 2015, Virginia Roberts gave testimony about her alleged life as a trafficking victim of Jeffrey Epstein to a Florida court in a failed attempt to join a civil suit against the US government. Two victims of Epstein were suing the US government over a widely criticised plea deal granted to Epstein in 2008, in which multiple charges of sex-trafficking and child sex abuse were watered down to a single count of soliciting prostitution from an underage girl (BBC report). Virginia Roberts (‘Jane Doe 1’ in the case filing) had wanted to join that case as a victim; Judge Kenneth Marra ruled she could appear in any subsequent Epstein case as a witness but not as a victim.

In addition to evidence of trafficking by Epstein and his partner Ghislaine Maxwell, Roberts’ testimony contained claims about having sex with lawyer and Harvard Professor Alan Dershowitz and Prince Andrew, Duke of York, on several occasions. Both vehemently deny the claims to this day (see Prince Andrew Newsnight interview analysis blog). Roberts gave her testimony under oath to a Florida court and had legal representation.

Virginia Roberts, with Prince Andrew (left)
Patrick McMullan Archives
Jeffrey Epstein and Ghislaine Maxwell
Jeffrey Epstein and Alan Dershowitz

Outliar™ ‘Linguistic Polygraph’ Methodology

The Outliar Linguistic Polygraph is based on principles of deceptive communication drawn from Information Manipulation Theory (McCornack et al. 2014): that lies are built on truth and therefore deception most often produces texts that are a strategic mixture of truth and lies. Using this insight, the Outliar methodology utilizes the most sensitive linguistic deception cues (LDCs) drawn from the academic literature (see Hauch et al. 2015 for a good overview), as well as LDCs used on investigator training programmes, in order to identify and separate credible and suspicious content (see Popoola (2017) for a case study). Disclaimer: Outliar is not a lie detector. It is an investigative linguistic tool that highlights credible text segments and identifies suspicious text segments as ‘points of interest’ deserving further investigation, i.e. loci of potential deception.

Virginia Roberts Giuffre testimony

virginia giuffre outliar analysis

Credible testimony segments

Segment 1: Roberts describes how she met Epstein and Maxwell and how she was groomed. There are verifiable details as well as sincere reflection on her motivation and willingness to work for Epstein (“they were promising me the world,” line 22). Roberts alludes to the fact that her father accompanied her (“My father was not allowed upstairs,” line 15). This embarrassing detail (it is the first and only mention of her father) increases credibility.

VR segment 1

Segment 6: Roberts’ description of meeting Prince Andrew the first time contains clear details of places (Ghislaine Maxwell’s house; Club Tramp), people (Prince Andrew’s security detail), conversations (Sarah “Fergie” Ferguson; Epstein’s instructions) and intention (“exceed everything”- line 157).

VR segment 6

Segment 8: Roberts describes her second and third encounters with ‘Andy’ with additional verifiable details including a new co-witness (Johanna Sjoberg), another alleged Epstein conspirator (Jean-Luc Brunel) and recall of conversations. Roberts is specific about the location of events and freely offers time-specific and contextual information, indicating she is confident of her information and not afraid of incriminating herself.

vr segment8b

Segment 10: Roberts mentions a number of named entities and specific details – Epstein, Brunel, Maxwell, Sarah Kellen, U.S. Virgin Islands, Palm Beach, New Mexico – to give a concise summary of the sex trafficking operation; the main actors  and their motivations are outlined and the varying levels of involvement are conveyed with a measured tone. Rumours that former US President Bill Clinton was also among the public and elite figures that made up Epstein’s alleged clientele are evenly rebutted; this distinction adds credibility to the accusations Roberts does make.

VR segment 10

Segment 12: At the end of her testimony, Roberts admits that she has omitted details in relation to sexual activities to focus her testimony on the trafficking operation (global travel, glamorous events and powerful people). Indications of potentially suspicious or low-credibility testimony should be interpreted in the context of this admission and Roberts’ willingness to divulge further details if legally required. The combination of this ‘omission admission’ and the truth declaration made under penalty of perjury adds to Roberts’ credibility.

VR segment 12

Suspicious testimony segments

Segment 2: Here, Roberts is distancing herself from events and removing self-agency (high ratio of ‘me’ to ‘I’ in comparison to other segments); Roberts may be downplaying her level of willingness (complicity?) in the circumstances described. Admission of some level of agency (e.g. admitting that she was not literally imprisoned) would aid credibility.
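As a rough illustration of the ‘me’ to ‘I’ comparison mentioned above, the sketch below computes that ratio for a segment of text. It is not the Outliar tool itself: the function name `me_to_i_ratio` and the two sample segments are invented for illustration (they are not quotations from the testimony), and a real analysis would need proper tokenisation and a baseline drawn from the other segments.

```python
import re

def me_to_i_ratio(text: str) -> float:
    """Ratio of object pronoun 'me' to subject pronoun 'I' in one segment.

    Hypothetical helper: 'I' is matched case-sensitively so that a stray
    lowercase 'i' is not counted; 'me' is matched case-insensitively.
    """
    i_count = len(re.findall(r"\bI\b", text))
    me_count = len(re.findall(r"\bme\b", text, flags=re.IGNORECASE))
    if i_count == 0:
        # No 'I' at all: the ratio is undefined, so report infinity
        # when 'me' occurs and 0.0 when neither pronoun occurs.
        return float("inf") if me_count else 0.0
    return me_count / i_count

# Invented example segments, written only to show the contrast.
agentive = "I met them and I agreed to go. I thought it was a job."
distanced = "He told me to stay. They flew me there and paid me. I went."

print(me_to_i_ratio(agentive))   # low ratio: the speaker is the actor
print(me_to_i_ratio(distanced))  # high ratio: things happen to the speaker
```

A segment is only notable relative to the speaker's own baseline, which is why the analysis compares this ratio against the other segments rather than against a fixed cut-off.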

The role of Ghislaine Maxwell is unclear; it is odd that Roberts describes being fearful of Epstein but not Maxwell although they were allegedly both involved in the training. This lack of coherence may mean pertinent information about Maxwell has been omitted. Contrasting Epstein and Maxwell would aid credibility; nuance and the ability to draw distinctions are signs of veracity.

VR segment 2

Segment 7: High use of ‘we’ indicates Virginia is emphasising her willingness to be included in the company. It is also noteworthy that Virginia uses ‘we’ in reference to the claimed ‘sexual activities’ with Prince Andrew. This, and use of the familiar ‘Andy’, also suggests willingness/complicity. This tone is out of kilter with the general communication of a regimented trafficking operation.

The reference to the ‘sexual activities’ is notably brief compared to the rest of the story and is referenced impersonally. This could indicate omission of pertinent information. Roberts later admits pleasant surprise at receiving $15,000 from Epstein on this occasion with ‘Andy’ (“When I got back from my trip, Epstein paid me more than he had paid me to be with anyone else – approximately $15,000”; see lines 196-198 in my analysis transcript).

Considering the credibility of the surrounding text, one reason the analysis flags this segment as a suspicious point of interest is that Roberts may be omitting information that would explain why she was rewarded so ‘generously’ for the sexual activities alluded to here (see note on omissions in the Segment 12 analysis above).

VR segment 7

Summary and Postscript

This is generally credible testimony because the suspicious segments are actually explained in the credible segments; regarding Segment 2’s elision of the role of Ghislaine Maxwell, this is developed to some extent in other parts of the testimony. Also, the main focus of this testimony was the ongoing investigation of Epstein, so it may be understandable that Roberts’ focus was not on Maxwell.

Similarly, although the segment relating to Roberts’ first sexual encounter with Prince Andrew is flagged as suspicious, the subsequent narration of the 2nd and 3rd encounters restores credibility. It may be that the disparity between what Roberts was allegedly paid for the 1st encounter with Prince Andrew, $15,000, and the 3-figure sums received on subsequent occasions (see line 208 in my analysis transcript) reflects differences in the service provided that were omitted.

Ambiguity in relation to Roberts’ motivation and incentivisation should be investigated. Questioning should probe the psychological and financial aspect of her ‘working’ relationship with Epstein as well as Roberts’ relationship with Ghislaine Maxwell.

Ghislaine Maxwell latest
Ghislaine Maxwell. Evidence has been offered that this picture, reportedly taken in a Los Angeles burger joint in August 2019, was staged or photoshopped.


Prince Andrew Newsnight Interview: Deception Analysis

The Duke of York interviewed by Emily Maitlis on Newsnight (©BBC 2019)

HRH Prince Andrew, the Duke of York, was interviewed by Emily Maitlis in a BBC Newsnight special broadcast on 16 November 2019. The topic of the interview was Prince Andrew’s relationship with Jeffrey Epstein, a billionaire and convicted pedophile who died in prison whilst being investigated for multiple sex trafficking charges. Prince Andrew was also asked about allegations made by Virginia Giuffre (née Roberts), one of the women who claimed to have been trafficked and abused by Epstein and his partner Ghislaine Maxwell. Giuffre testified under oath in a 2015 court deposition that she had been trafficked to Prince Andrew by Epstein and Maxwell, and that she had “engaged in sexual activities” with Prince Andrew on three occasions.

Top: Extract from Virginia Roberts Giuffre 2015 court deposition. Below: Prince Andrew and Jeffrey Epstein in Central Park in 2010.

Newsnight negotiated the interview with Buckingham Palace over a six-month period. After initially refusing an interview with Newsnight in May due to a reluctance to talk about Epstein, Prince Andrew and Buckingham Palace agreed to the interview after Epstein’s death in August (see GQ interview with Newsnight producer Sam McAlister).

Prince Andrew and the palace agreed that no questions regarding the Epstein-related allegations would be off limits; neither were questions agreed in advance. Considering these circumstances of the interview, it seems possible that Prince Andrew was motivated by a desire to clear his name. It is also likely that Buckingham Palace were convinced that the allegations made against Prince Andrew were false. These circumstances point to a strong desire to appear credible and justify a presumption of truth.

Outliar™ ‘Linguistic Polygraph’ Methodology

The Outliar Linguistic Polygraph is based on principles of deceptive communication drawn from Information Manipulation Theory (McCornack et al. 2014): that lies are built on truth and therefore deception most often produces texts that are a strategic mixture of truth and lies. Using this insight, the Outliar methodology utilizes the most sensitive linguistic deception cues (LDCs) drawn from the academic literature (see Hauch et al. 2015 for a good overview), as well as LDCs used on investigator training programmes, in order to identify and separate credible and suspicious content (see Popoola (2017) for a case study). Disclaimer: Outliar is not a lie detector. It is an investigative linguistic tool that highlights credible text segments and identifies suspicious text segments as ‘points of interest’ deserving further investigation, i.e. loci of potential deception.

Prince Andrew’s Interview


Credible interview segments

Segment 3: This is Prince Andrew at his most reflective. He admits that he thought visiting Epstein after his conviction was honourable rather than inappropriate (“I felt that doing it over the telephone was the chicken’s way of doing it” lines 167-168); he notes that he took the decision to visit Epstein himself and against the advice of at least some of his team (“I had a number of people counsel me in both directions…“). Reflective engagement with different perspectives and self-questioning is a cognitively complex stance that is difficult to maintain during deception.

interview segment 3

Segments 6 and 7: These contain a host of ostensibly verifiable details – information about a Pizza Express visit (lines 371-375), details of a medical condition that prevents sweating (lines 387-393), and the kinds of clothes he usually wears when traveling (lines 470-472). In the age of ‘deep fakes’, even his skepticism as to the provenance of the photograph with a 17-year-old Virginia Roberts (see above) comes across as reasonable (lines 476-477). As well as verifiable facts, Prince Andrew provides reasons and explanations; all this adds to the credibility.

interview segment 6

interview segment 7

Suspicious interview segments

Prince Andrew’s language register shifts significantly in Segment 8; his coherence disappears and he becomes increasingly vague. Up until this point, he has been able to somewhat plausibly deny specific occasions of meeting Virginia Roberts; however, he is unable to convincingly deny knowing Virginia Roberts at all. Prince Andrew doesn’t offer any reasons for not knowing her or concessions towards the fact that people might think he has met her. This runs counter to the reflective and conciliatory register of the remainder of the interview.

interview segment 8

In general, liars don’t like to directly speak lying words. Here, each of Prince Andrew’s propositions in lines 577-580 – ‘I don’t remember meeting her’, ‘I don’t remember a photograph being taken’, ‘I’ve said [many times] that we never had…sexual contact’ – can be taken literally as true (i.e. not remembering, saying something frequently). However, this illustrates a key difference between lying, i.e. speaking falsehoods, and deception, i.e. creating false belief; deception is often executed through exploiting presupposition and perceived credibility cues, without explicitly stating false facts. A more credible answer would include a concession (e.g. “I wish I could remember”) or explanation (e.g. “I rarely, if ever, have met young girls in a casual setting so it’s extremely unlikely…”).

This segment of the interview is particularly awkward for two further reasons. Firstly, Prince Andrew’s pattern of register change indicates a strong distancing strategy; Andrew literally puts a barrier between himself and Virginia Roberts (“I don’t have a message for her because I have to have a thick skin”, line 589), and quite disparagingly refers to Roberts as just “somebody making allegations” (lines 589-90). This negativity is in stark contrast to the tone of the interview up until now. Liars are more likely to express unmoderated negativity when omitting pertinent information (whereas they become more verbose and personal when exaggerating or falsifying). Secondly, Prince Andrew’s suggestion that a man always remembers having sex because it is a “positive act” is vague and unconvincing; Andrew is trying to emphasize the extent to which he doesn’t remember, but it is difficult to prove a negative, i.e. that you don’t remember something (just as it is difficult to disprove a negative).

It is most suspicious that Prince Andrew does not address the second half of Maitlis’ double-question: “Is there any way you could have had sex with that young woman or any young woman trafficked by Jeffrey Epstein in any of his residences?” (line 594). In answer to this question, Prince Andrew’s persistent use of the pronoun ‘it’ to refer to the alleged incident with Virginia Roberts, rather than to the more general allegations, is clear avoidance of the second part of Maitlis’ question.

Summary and Postscript

Andrew’s denial of any knowledge of the existence of Virginia Roberts is unconvincing. Questions about her motivation are denied vociferously but incoherently and with negativity and lack of engagement. This is out of step with the general register of the interview, which has a reflective and considered (prepared?) tone. Although Prince Andrew may not have had sex with Virginia Roberts, he is likely to have more information on why she might be making these allegations.

The photograph below, of Prince Andrew and Jeffrey Epstein on a yacht with a number of scantily-clad young females, indicates that the aforementioned unanswered question may be a key to the Prince Andrew – Epstein mystery.

andrew epstein yacht

Prince Andrew with Jeffrey Epstein. Phuket, 2001. Credit: Jason Fraser

Prince Andrew Newsnight Interview with Emily Maitlis – Transcript







Genres of mass deception

In 2006, a federal court judged four of the ‘Big Five’ US tobacco companies – Philip Morris, RJ Reynolds, British American Tobacco, Lorillard (sold to RJ Reynolds in 2014) – to have been operating for more than half a century as a de facto criminal enterprise guilty of racketeering. In November 2017, US tobacco companies finally issued, through national TV and print media, a series of statements correcting their sixty-year deception of the American public. They had been appealing the original judgement for over ten years.


The ‘racket’ was the continued sale and marketing of tobacco products in full knowledge of their addictive properties and their causal connection to lung cancer. Under the Racketeer Influenced and Corrupt Organisations (RICO) Act 1970, these tobacco companies were held to have defrauded smokers, i.e. obtained smokers’ money by dishonest means, specifically “deceiving smokers, potential smokers, and the American public about the hazards of smoking and second hand smoke, and the addictiveness of nicotine” (United States vs. Philip Morris et al, 2006, p4).

Below is a list of deceptions maintained by the ‘Big Tobacco’ enterprise:

  • false denial of the adverse health consequences of smoking
  • public denial that nicotine is addictive
  • concealment of research data and other evidence that nicotine is addictive
  • false denial of the manipulation of nicotine levels to create and sustain addiction
  • deceptive marketing and public statements suggesting ‘low tar’ cigarettes are less harmful than full-flavor cigarettes
  • false denial that their marketing targets youth
  • false and misleading public statements denying that environmental tobacco smoke (passive smoking) is hazardous to nonsmokers

How does one deceive the public for so long (linguistically)? My approach to this question is to first identify the agents and channels of deceptive communication. We know who received the deceptive messages but who were the senders? Did they use agents/messengers? What channels did they use?

The ‘Big Tobacco’ enterprise built an infrastructure of deception by establishing a number of front organizations i.e. groups that appear to independently support or be motivated by one particular purpose but are actually a ‘front’ for another group whose covert agenda they secretly serve. Front organizations are agent-messengers that appear to be senders. Examples of the forms that front organizations might take include think tanks, associations of consumers or workers, and single-issue interest groups.

Chief among these was the Tobacco Industry Research Committee (TIRC), founded in 1954, which later changed its name to the Council for Tobacco Research (CTR). TIRC/CTR was established on the strategic recommendation of Hill & Knowlton, public relations counsel to the Big Tobacco enterprise. Backed by money from the Big Tobacco enterprise, it ran a multi-million dollar research programme providing substantial grants for ‘independent’ scientific research into the health effects of smoking. This produced a body of research that obfuscated the link between smoking and cancer and left their causal connection as an “open question”. TIRC/CTR also funded research that diverted discourse away from the dangers of smoking by suggesting alternative causes of cancer such as air pollution, diet and genetics.

This programme of decoy research clearly had a deceptive purpose. However, the research itself was not necessarily deceptive – it created doubt rather than false belief by challenging the anti-smoking research attracting the attention of the American government and health organisations at that time. Neither was this research directly responsible for the mass deception of the American public since the public was not the audience for scientific research.

In the 2006 judgement, Judge Kessler noted that the Big Tobacco front organisations disseminated ‘commentaries’ on both pro- and anti-smoking research through a variety of publication channels, including:

  • management commentaries in annual reports, read by business professionals
  • newsletters and booklets targeting medical professionals with favourable research summaries
  • public statements and comments made by tobacco-friendly scientists discrediting research that linked smoking with cancer

Whilst annual reports and newsletters helped communicate the tobacco deception to professional outgroups, as shown in Figure 1 below, press releases were the most influential channel for reaching the general public. Every publication and statement made by a scientist connected to the enterprise was accompanied by a press release. This would be sent out to thousands of editors and then transmitted to the general public through newspapers and popular magazines.

Thus, the press release plays a doubly deceptive role; it reports the deceptive framing of the discourse around tobacco/cancer research and then amplifies its interpretation through popular media. Indeed, PR agency Hill & Knowlton prided themselves on their ability to spin “obscure scientific reports favourable to the industry into headline news across the country”.


Figure 1: Mass deception; infrastructure, channels and genres

Below (Figure 2) is one example of press release distortion of research in the tobacco domain. This 1955 study from the British Empire Cancer Campaign (a forerunner of Cancer Research), published in the British Medical Journal, reports a nuanced set of findings in relation to the carcinogenic properties of smoking. It reports findings indicating that: i) tar is not carcinogenic in mice, ii) condensation from tobacco smoke is carcinogenic in mice and iii) carcinogenic effects vary between species so more research is needed.

This nuance is lost in the press release, which seizes on the first finding related to tar and extends it to smoking and tobacco in general. The press release, authored by Hill & Knowlton, draws its authority by reporting the statement made by the TIRC chairman Timothy Hartnett and uses repetition to reinforce its point three times in the first page – “Outstanding British scientists could not induce cancer in experimental animals with tobacco smoke derivatives” / “Experiments conducted at several leading British medical research institutions had failed to induce any cancers” / “18 month experiments fail to show any connection between cancer and smoking”.


Figure 2: Comparison of British Medical Journal article and TIRC press release

The fact that this repetitive message itself references a message that is practically self-authored (considering the close relationship between TIRC and Hill & Knowlton, who even shared the same office at one stage) is indicative of the low information quotient in this press release. Yet providing information is arguably the main purpose of the press release genre. What we have here, then, is a deficient or deviant genre communication – inauthenticity (deception) has compromised the integrity of the genre.

The same has been suggested about the annual reports produced by the TIRC, “which read much like industry position papers” (USA vs Philip Morris et al, 2006, p58). The extract below, from the 1958 TIRC annual report, is illustrative:

A problem may well be obscured, and its solution delayed, by the soothing acceptance of an oversimplified and immature [tobacco theory] hypothesis. The proponents of the tobacco theory have generated increasingly intensive and extensive propaganda. As a result, a non-scientific atmosphere, conducive to prematurity, unbalance, and inadequacy of public judgement, has pervaded the whole field. The prohibition concept discounts or ignores all considerations of smoking benefits in terms of pleasure, relaxation, relief of tension or other functions.

Once again, in this case by allowing bias to enter annual reporting, a genre is compromised through performing a deceptive purpose. These two examples suggest that the tobacco deception was partly sustained by ‘genre fronts’ – communications that appear to serve one conventional purpose but in fact fulfill functions of another genre or are simply deficient.

Press releases are more central to mass deception than annual reports because they are more frequent and produced for immediate impact in popular media. Press release spin is picked up by editors and transferred to newspapers and magazines – sometimes wholesale, sometimes with additional fervour. This resulted in a news environment that actively facilitated the disinformation strategies of the Big Tobacco enterprise, leading to a massively misinformed general public.

The selection of headlines, news articles and editorials in Figure 3 below reflects the general tenor of the ‘decoy discourse’ maintained by the tobacco industry, which involved: attacking the integrity of government scientists, casting doubt on environmental controversies such as the use of pesticides and climate change, and linking cancer to spurious causes. The confluence of tobacco advocates and climate change skeptics is a striking feature of this mass deception.


Figure 3: Selection of newspaper and magazine items used to support tobacco advocacy. Taken from ‘Bad Science: A Resource Book’

Newspaper items come in a variety of sub-genres, for example news articles whose purpose is to present factual information and editorials that present an opinion. Editorials themselves can be written by newspaper staff or invited writers as ‘op-eds’. Both of these sub-genres can be used for deceptive means i.e. the presentation of false facts in news articles and the advancement of an undisclosed agenda in the case of editorials. In such cases, the news genre loses its features of objectivity and transparency and becomes distorted as ‘fake news’.

Taking the ‘tobacco deception’ as a case study, it would seem that the infrastructure for maintaining this deception was built on the deviant use of a variety of genres – annual reports, press releases, newspaper/magazine items – that were connected by channels invisible to the general public i.e. front organizations and funded scientists. The schema presented above will be used in future posts to evaluate similar deception controversies in the environmental and health domains.



Who is ‘we’? Investigating pronouns for deception.

If you are ever arrested and asked to make a statement by the police on US soil, be careful with your pronouns. Law enforcement officers in the US are likely to have received training in analysing statements for deception from Mark McClish or Don Rabon, which means the pronouns you use will be inspected very closely. McClish and Rabon come in for a lot of stick from forensic linguists working in academia, due to the lack of citations and over-generalisations in their blogs and best-selling books. 


But to be fair, many of the leading academics working on lie detection have said similar things about the use of first person pronouns being an indicator of veracity. Newman, Pennebaker and colleagues, in their seminal work ‘Lying Words: Predicting Deception From Linguistic Styles’, provided some empirical support for the correlation between low self-reference and deception; subsequently it has been found to broadly hold in online dating profiles, business communication and criminal narratives (but, interestingly, not in consumer reviews, which use ‘reader engagement’ as a deception strategy).

It should be noted, however, that Newman and Pennebaker’s prediction rate was 67% – that is, wrong 1 in 3 times. Consequently, simply counting the use of any first person pronouns is not by itself the magic cue for deception detection. There are a number of different first person pronouns – I, me, my, mine, myself, we, us, our, ours, ourselves. Not only do these have different strengths in terms of socio-psychological ideas of distance and commitment (compare ‘I was hit’ with ‘The car hit me’) but they work differently linguistically. For example, ‘I’ and ‘me’ will correlate with verbs, ‘my’ with nouns; also ‘I’ correlates with stance and modal verbs (‘I thought’, ‘I tried’, ‘I would’) so is a more ‘active’ first person pronoun than the passive ‘me’.
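To make the point about different first person pronouns concrete, here is a minimal sketch that counts them by grammatical role instead of lumping them into one total. The category labels and the function name `first_person_profile` are my own illustrative choices, not part of any published method, and the crude tokeniser will miscount in edge cases (for example, ‘mine’ used as a noun, or the ‘i’ in ‘i.e.’).

```python
import re
from collections import Counter

# Illustrative mapping from pronoun form to role (labels are assumptions).
FIRST_PERSON = {
    "i": "singular subject",
    "me": "singular object",
    "my": "singular possessive",
    "mine": "singular possessive",
    "myself": "singular reflexive",
    "we": "plural subject",
    "us": "plural object",
    "our": "plural possessive",
    "ours": "plural possessive",
    "ourselves": "plural reflexive",
}

def first_person_profile(text: str) -> Counter:
    """Count first person pronouns in `text`, grouped by role."""
    # Lowercase and keep alphabetic runs only, so "I'm" yields "i" and "m".
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(FIRST_PERSON[t] for t in tokens if t in FIRST_PERSON)

profile = first_person_profile("I was hit. The car hit me. We sold our house.")
for role, count in sorted(profile.items()):
    print(f"{role}: {count}")
```

Two texts with the same overall number of first person pronouns can produce very different profiles, which is the information a single count throws away.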

The above also applies to ‘we’, ‘us’ and ‘our’. In addition, first person plural pronouns have the pragmatic parameter of clusivity, which distinguishes who exactly is included in the ‘we’ (see Figure 1 below). And there is the linguistic phenomenon of nosism, which includes the royal and editorial ‘we’ (see Ben Zimmer’s excellent 2010 article to fully appreciate the complexities of ‘We’).

clusivity we.png

Figure 1: Referential parameters of ‘we’: inclusive (left), exclusive (right). By LucaLuca. Reproduced under Creative Commons licence

A case in point is ex-UK Prime Minister David Cameron’s response to questions about his alleged use of off-shore tax havens to avoid paying tax, as revealed in the Panama Papers leak. In April 2016, 10 weeks before the EU Referendum and his subsequent resignation, Cameron was taking questions about the upcoming referendum and speaking in support of remaining in the EU at a town-hall style Q&A event held at PriceWaterhouseCoopers’ Birmingham offices.

Figure 2 is a transcript of David Cameron’s response to an unexpected question from Sky News journalist Faisal Islam regarding the controversy over his connection to an offshore investment company (Blairmore Holdings) owned by Cameron’s late father. Cameron denied owning any shares or offshore investments but was roundly criticised for his evasive answer (Cameron restricts his answer to the present tense, despite Faisal Islam’s specific temporal reference to the past and the future – lines 4-5). Five days later, under public pressure to resign, Cameron was forced to admit that he had owned shares in his father’s business (which he sold at a profit shortly before taking office as Prime Minister).

Cameron Panama response 1

Figure 2: Transcript of David Cameron’s first public response on the Panama Papers allegations. Given during a Q&A with workers at accountancy firm PWC in Birmingham, 5 April 2016.

David Cameron’s use of pronouns is an excellent example of linguistic duplicity, shifting between inclusive ‘we’ (yellow), exclusive ‘we’ (red) as well as ‘I’, all in reference to himself; the identities referred to by ‘I’ are split between David Cameron as a UK citizen and his role as Prime Minister.

Furthermore, the scope of ‘we’ varies within the text and is sometimes unclear. In answer to a question about “you and your family” with regard to the financial business of one’s late father, one might expect ‘we’ to refer to some aspect of family. However, Cameron initially moves to include the whole audience (and viewers) in his personal financial affairs by referring to ‘we’ as a nation with the contextual reference “our tax authority” and later “our own country” (line 8). In the middle of his response, a different (exclusive) ‘we’ appears mid-sentence – “I have a house, which we used to live in, which we now let out while we are living in Downing Street” (lines 16-17). There is no explicit reference to the scope of this ‘we’ but the reference to personal property ownership means one can assume that it is not the national ‘we’.

If one assumes that it is the ‘we’ originally asked for in the reporter’s question – i.e. “you and your family” – then Cameron has violated the right frontier constraint (Webber, 1988), which stipulates that anaphoric elements such as pronouns are interpreted in ambiguous cases by reference to information at the end of the previous discourse unit i.e. the right frontier (for languages with left-to-right scripts). That Cameron returns to ‘we as a nation’ for the remaining text further highlights his dynamic use of pronominal reference.

The linguistic duplicity displayed by David Cameron above is in stark contrast to the language he used when owning up to his involvement with Blairmore Holdings in a hastily-arranged national TV interview. As the transcript shows, Cameron doesn’t use ‘we’ at all in response to similar questions. This case study shows that assessing veracity and potential deception by tracking pronoun use is valid but more complex than simply counting; the inherent capacity for linguistic duplicity is contained within a complex system of deceptive pragmatics.

What does honesty look like (statistically)?

Certain linguistic features (e.g. reference, modality) facilitate deception because they are malleable to context and flexible to interpretation. My first blog post showed that deceptive communication contains ‘outliars’, portions of texts with an unusually high concentration of these linguistic features; in the second post we saw that the linguistic hotspots where these features cluster can be taken as ‘points of interest’ worthy of further investigation. Of course, liars do not have a monopoly on the use of modals! Furthermore, truth-tellers can sometimes be mistaken for liars due to nervousness, fear of disbelief, or perceptions of powerlessness (known as the ‘Othello error’). So what does honest (non-deceptive) communication look like?


In my Stanford Decepticon 2017 conference paper I tested the ‘Outliar’ investigative linguistic methodology on honest admissions of doping – true confessions – by the following five sports persons and professionals: Maria Sharapova, Dwain Chambers, Victor Conte, Floyd Landis and Levi Leipheimer.

The Maria Sharapova case took the tennis world by surprise (she was one of the highest-profile tennis players ever to fail a drug test). In 2016, Sharapova was banned from competition after testing positive for meldonium during the Australian Open in January of that year. Meldonium is a heart medication that was found by the World Anti-Doping Agency (WADA) to be particularly popular amongst sports persons from Russia and Eastern Europe, perhaps due to its reputed ability to improve blood flow and endurance. Having placed meldonium on a watch list in 2015, WADA had fully prohibited the substance from January 1 2016, two weeks before the Australian Open. Following the failed drug test, Sharapova admitted she had been taking meldonium as medication since 2006 and stated that she had negligently and inexcusably missed the communications from WADA prohibiting its use.

Linguistic analysis of the explanation Sharapova gave to fans via her Facebook page shows two ‘outliars’ at the beginning and end of the post (see Figure 3 below).


[1] I want to reach out to you to share some information, discuss the latest news, and let you know that there have been things that have been reported wrong in the media, and I am determined to fight back. You have shown me a tremendous outpouring of support, and I’m so grateful for it.

[13] I have been honest and upfront. I won’t pretend to be injured so I can hide the truth about my testing. I look forward to the ITF hearing at which time they will receive my detailed medical records. I hope I will be allowed to play again. But no matter what, I want you, my fans, to know the truth and have the facts.

Figure 3: Outliar analysis of Maria Sharapova’s 2016 Facebook post and outlier extracts.

Sharapova begins her post by suggesting she has been a victim of unjust media coverage. It had been widely reported that she had received five ‘warnings’ about the upcoming change to the WADA regulations. Sharapova agreed that she had received newsletters with links to the WADA rule changes but argued that these were ‘communications’ rather than warnings, and that one had to “hunt, click, hunt, click, hunt, click, scroll and read” through them in order to find information about the prohibition. Sharapova ends her post by strongly maintaining that she is being honest about her genuine mistake (of using meldonium as medication after the ban).

These anomalous extracts are particularly emotional when compared to the main body of this post, in which Sharapova gives specific details about all the communications she did receive (see yellow highlighted text in Figure 4 below). There is a lot of literature that suggests specific details are a strong indicator of veracity in legal genres such as witness statements. (Professor Aldert Vrij’s research on Criteria Based Content Analysis is a good place to start.) Flagging these extracts as suspicious could therefore just be an ‘Othello error’, one that confuses emotional intensity with deception.


Figure 4: Maria Sharapova Facebook post, March 2016. Last accessed 21/7/2018

Accounting for the ‘Othello error’ is one reason a full ‘Outliar’ analysis uses an additional measure of language change within a text – intratextual language variation – when assessing text veracity. Texts can range from having a uniform style with consistent use of features throughout – a stable text – to displaying marked changes in language style at several points – variable or ‘spiky’ text. Outliar captures this by summing the amount of change shown in a text.
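The idea of ‘summing the amount of change’ can be sketched in a few lines. The exact Outliar formula is given in the conference paper and is not reproduced here. As a stand-in, this sketch uses the sum of absolute differences between consecutive segment scores, which is an assumption chosen only to show how a stable text and a spiky text separate. The function name and the scores are invented for illustration.

```python
def total_variation(scores):
    """Sum of absolute changes between consecutive segment scores (a stand-in measure)."""
    return sum(abs(b - a) for a, b in zip(scores, scores[1:]))

# Made-up per-segment feature scores, for illustration only.
stable_text = [1.0, 1.2, 0.9, 1.1, 1.0]
spiky_text = [1.0, 3.5, 0.5, 4.0, 0.8]

print(total_variation(stable_text))
print(total_variation(spiky_text))
```

A text whose scores jump up and down accumulates a much larger total than one that drifts gently, even if both have a similar average.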

Figure 5 is an example of this. It compares ‘Outliar’ analysis of Sharapova’s Facebook post (left) with one of a Lance Armstrong TV interview in which he falsely denied doping (see previous blog for more discussion of Armstrong’s deception). Visually, you can see that Lance Armstrong’s language use displays high variability, in comparison to which Sharapova’s language is relatively stable.

Figure 5: Comparison of the ‘Outliar’ analysis of Maria Sharapova’s Facebook post (left) and Lance Armstrong’s ESPN interview (right).

Figure 6 below shows a statistical measure of intratextual language variation for five false doping denials vs. five true doping confessions (see p7 here for the formula). It can be seen that the deceptive communications show more language change than the honest ones. So, combining outlier text detection with an overall measure of language variability can be helpful in distinguishing honesty from dishonesty. Frequent and marked language style change is a signal of potential deception.


Figure 6: Analysis of intratextual variation. Y-axis = total intratextual variation measured as aggregate z-score for each text; X-axis represents ten texts in total – five deceptive texts (false denials by: 1) Barry Bonds; 2) Linford Christie; 3) Lance Armstrong; 4) Alex Rodriguez; 5) Marion Jones) and five honest texts (true confessions by: 1) Maria Sharapova; 2) Dwain Chambers; 3) Victor Conte; 4) Floyd Landis; 5) Levi Leipheimer).

In Sharapova’s case, the tribunal were satisfied she had not intended to cheat (although she was found to have also taken the drug to enhance her performance) and her relatively light ban (reduced from two years to 15 months on appeal) reflected the fact that she had been negligent but not deceptive. I would argue that the (relatively) stable language of both Sharapova’s Facebook post and the initial press conference where she announced her drug test failure supports the tribunal finding. The press conference video is below – judge for yourself.



Linguistic Pointing and Deception Detection

So here’s the thing. You can tell somebody is lying – or more correctly, deceiving – by the words they use. I’m not talking about gesture or disguise or other types of non-verbal deception. I mean when there are words and text – either written or spoken – those words will reveal deception if you know what to look for.

“Listen, nobody believes in doping controls more than me.” — Lance Armstrong

Does that mean it’s possible to detect deception by reading a text or transcript or listening to someone speak? Not exactly. Factors such as human truth bias and our reliance on heuristics to process information mean that judgement derived from our senses is not entirely reliable (although it can be improved by education and training).

Deception detection is possible by processing a text. Now, unless you are some kind of artificial intelligence, you will rely on an automated tool for computational and statistical analysis. Non-verbal deceptions such as credit card or other financial fraud are already detected using statistical algorithms and other data mining techniques. Advances in linguistics mean that texts can also be processed as data and then classified and grouped together – all without being read or heard.

There are different features of language that can be analysed. Words, word sequences, types of words, grammar, syntax and so on. These features can be analysed individually or as groups that represent underlying concepts (e.g. ‘certainty’ or ‘complexity’). You can analyse known true and deceptive texts for these language features, compare the frequencies and distributions, and find linguistic tendencies that correlate with deception and truth. But what are these linguistic features?
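As a rough sketch of that comparison step, the code below computes how often each feature group occurs per 100 words and averages the rates over two labelled sets of texts. The feature groups, word lists, function names and example sentences are all hypothetical and invented for illustration; real studies rely on much larger, validated lexicons and corpora, and no empirical claim is made by the toy numbers.

```python
import re

# Hypothetical feature groups; real lexicons are far larger.
FEATURE_GROUPS = {
    "certainty": {"always", "never", "definitely", "certainly"},
    "first_person": {"i", "me", "my", "we", "us", "our"},
}

def feature_rates(text):
    """Relative frequency of each feature group, per 100 words."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1  # avoid dividing by zero on empty input
    return {
        group: 100 * sum(w in lexicon for w in words) / total
        for group, lexicon in FEATURE_GROUPS.items()
    }

def mean_rates(texts):
    """Average the per-text rates across a collection of texts."""
    rates = [feature_rates(t) for t in texts]
    return {g: sum(r[g] for r in rates) / len(rates) for g in FEATURE_GROUPS}

# Toy stand-ins for labelled corpora, invented for illustration.
truthful = ["I took the medicine on Monday.", "We met the doctor at noon."]
deceptive = ["I never, never did that.", "We definitely always followed our rules."]

print(mean_rates(truthful))
print(mean_rates(deceptive))
```

Normalising by text length matters here, because raw counts would simply reward longer texts.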

There are many sets of linguistic features that have been used for deception detection (see Hauch et al.’s 2015 meta-analysis for a comprehensive list of experiments and linguistic features used). The features that are most effective are the ones that enable the linguistic act of deception.

Take the following:

– “Car.”

By itself, this single common noun cannot be a lie. If I said ‘car’ and pointed to a bicycle or a phone then that could be a lie. There are different linguistic resources for ‘pointing’:

– “That is a car.”
– “I have a car”
– “My car”

In fancy linguistic terminology, ‘pointing’ is known as referential indexicality. Other types of ‘pointing’ can be to a particular place, period of time, event, assumption, thought and so on – even to the text itself. Drawing on the linguistic theory and influence of the Prague School, I use a set of these linguistic features to analyse texts for ‘textual hotspots’ – a linguistic equivalent of the non-verbal hotspots such as micro-expression, gesture and voice identified by the psychologist Dr Paul Ekman (on whom the TV show ‘Lie to Me’ is based).

I have developed a tool for identifying these linguistic hotspots using anomaly detection techniques adapted from banking fraud. Research has shown that these linguistic deception features cluster together when deceptive language is being used. Ergo, anomalous clusters of linguistic features that point to deception are areas of potential deception. Not to steal Dr Ekman’s thunder, I am calling these anomalous textual hotspots ‘outliars’.

I’ve tested this hypothesis on a number of known deceptions and the results, which are promising, were presented at the Decepticon 2017 conference held at Stanford University.  I chose statements made by sportspersons about the use of performance-enhancing drugs and doping because these are high-stakes deceptions and so more likely to leave linguistic traces.


One of the most famous examples of doping deception is Lance Armstrong. How did Lance Armstrong successfully deceive so many millions for so long? Bullying, good lawyers and a fairytale narrative of cancer recovery and global charity certainly played their part. But the key to Armstrong maintaining this deception – through various testimony, interviews, biographies – was his sustained verbal performance.

One classic example is Armstrong’s 2005 interview with Bob Ley on the ESPN show ‘Outside the Lines’, an investigative TV series that takes a critical look at American sports issues. Armstrong was a year into his first retirement, after winning his 7th Tour de France in 2005, and had just been cleared of doping allegations after a lengthy trial. The show is renowned for its tough questioning and investigative slant, and its usual anchorman, Bob Ley, did not hold back. Below is a transcript of the interview.

Figure 1 shows my ‘Outliar’ analysis of the responses. The analysis picks out two linguistic hotspots of potential deception – sections 5 and 12. These are highlighted in the transcript but I’m going to lay them out here for analysis.


Figure 1: Outliar analysis of Lance Armstrong’s interview responses. Armstrong’s interview responses are represented as a time series on the x-axis (c.30 second chunks). The y-axis measures the relative frequency of linguistic deception features; text segments scoring over 3.5 are recorded as anomalous (the Iglewicz-Hoaglin method).
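For readers unfamiliar with the Iglewicz-Hoaglin method named in the caption, here is a minimal sketch of the modified z-score it is built on: 0.6745 × (x − median) / MAD, where MAD is the median absolute deviation. The function names and segment scores are invented for illustration and are not the data behind Figure 1. Flagging only the high side is an assumption on my part, since the interest here is in unusually high concentrations of features; the standard test uses the absolute value.

```python
from statistics import median

def modified_z_scores(values):
    """Iglewicz-Hoaglin modified z-scores: 0.6745 * (x - median) / MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        raise ValueError("MAD is zero; modified z-scores are undefined")
    return [0.6745 * (v - med) / mad for v in values]

def flag_outliers(values, threshold=3.5):
    """Return 1-based segment numbers whose modified z-score exceeds the threshold.

    Assumption: only unusually HIGH scores are flagged (one-sided test).
    """
    scores = modified_z_scores(values)
    return [i for i, z in enumerate(scores, start=1) if z > threshold]

# Invented per-segment feature rates with two obvious spikes.
segment_rates = [2.0, 2.2, 6.0, 2.1, 1.9, 2.0, 2.3, 5.5]
print(flag_outliers(segment_rates))
```

Because it uses the median rather than the mean, this score is not dragged upwards by the very spikes it is trying to detect, which suits short transcripts with only a dozen or so segments.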

Segment 5 contains the following extract. Bob Ley had asked whether it was true Armstrong had made a phone call to Dr Prentice Steffen threatening “to spend a lot of money to make your life miserable” if Steffen did not retract comments accusing Armstrong of doping [transcript lines 46-50].

ARMSTRONG: Not true. Steven, er Prentice Steffen I think was his name, was not part of the team when I was there, I hardly know him. The only interaction I ever had with him I think was when he was a team doctor with the Mercury cycling team and I helped one of their young riders I think get care for testicular cancer. That’s the first and only interaction I ever had with him.

In this outlier extract, Armstrong denies the accusation by distancing himself from Dr Prentice Steffen (Steven, Stefan, what was his name again?). Instead he foregrounds his charity work for an anonymous sick rider. The sentence beginning “The only interaction” introduces three new referents – a young rider, Mercury and cancer. Such a topic shift and introduction of third party issues is a pragmatic technique for diverting attention. The cluster of pronouns picked out by the analysis – ‘he’, ‘their’, ‘I’ – facilitates the diversion and leaves the final ‘him’ ambiguous (technically this ‘him’ should refer to the nearest qualifying noun i.e. ‘one of their young riders’).

In contrast, the following extract from segment 8 is representative of Armstrong’s ‘baseline reading’. Bob Ley had asked whether it was true Armstrong had made a phone call to Greg LeMond, threatening to smear him: “I can produce 10 people that say you took EPO”:

With regards to Greg LeMond I have to say as a young guy, I did idolize him in 1989, I think we all remember that incredible story coming back after getting shot and winning the tour by 8 seconds, the smallest margin ever. I mean he was a guy that quite literally put all of us into cycling, because he was appealing to us at a young age. But, er, for a past champion and a great champion, one of the greatest athletes of all time, to be so involved in a case, I mean I ask you Bob, I ask the viewers, why would you be so involved?

In answering this, Armstrong appears to show rare humility; he acknowledges his own inspiration and even someone else’s achievements. However, a closer reading reveals a mocking tone in which Armstrong draws attention to the narrowness of LeMond’s victory – “winning the tour by 8 seconds, the smallest margin ever”. The transcript shows that taunting accusers is Armstrong’s baseline linguistic behaviour in this interview, which is why the reticent language used when discussing Dr Prentice Steffen in segment 5 stands out as deceptive.

The analysis also flags the following segment 12 as a linguistic hotspot of potential deception. Here Ley has asked Armstrong about his attempts to shut down the World Anti-Doping Agency (WADA) investigation that was shining a light on doping in cycling and Armstrong at that time.

Now there’s two people involved in this process. There’s the athletes and there’s the people who police the athletes. And both of them have to be ethical. Listen, nobody believes in doping controls more than me. I’ve submitted to all of them, whether in competition or out of competition. Now listen, I’m not saying my best defence is I’ve never tested positive. All I’m saying is that the last few years when you were supposed to tell the investigators and the drug testers everywhere you were everyday of the year, I did it.

Here, rather than taunting accusers, Armstrong again points the linguistic finger. He insinuates that the drug testing process and its ubiquitous participants (‘people’, ‘investigators’, ‘testers’) may not be ethical and he portrays himself as a willing (and perhaps slightly persecuted) subject to the testing regime. However, with the assertion “nobody believes in doping controls more than me”, Armstrong leaks the fact that he has been expert at manipulating the drug testing system. He immediately realises this ‘slip’ and moves to deny its implicature that his “best defence is I’ve never tested positive”. The final ‘it’ is ambiguous and difficult to resolve – a deception strategy we also saw in the above segment 5.

There are more examples like this in my Decepticon 2017 Stanford presentation (including an interesting connection between doping and asthma!) so take a look at that if you are interested in more detail on the method (or write to me). But the real value of this method, I think, is as an investigative linguistic tool which can identify ‘points of interest’ and thus aid forensic and journalistic investigations. So future blog posts will probe the public statements and testimony related to the key events, scandals and crimes of this post-truth era.