What does honesty look like (statistically)?

Certain linguistic features (e.g. reference, modality) facilitate deception because they are malleable to context and flexible to interpretation. My first blog post showed that deceptive communication contains ‘outliars’, portions of text with an unusually high concentration of these linguistic features; in the second post we saw that the linguistic hotspots where these features cluster can be taken as ‘points of interest’ worthy of further investigation. Of course, liars do not have a monopoly on the use of modals! Furthermore, truth-tellers can sometimes be mistaken for liars due to nervousness, fear of disbelief, or perceptions of powerlessness (known as the ‘Othello error’). So what does honest (non-deceptive) communication look like?


In my Stanford Decepticon 2017 conference paper I tested the ‘Outliar’ investigative linguistic methodology on honest admissions of doping – true confessions – by the following five sports persons and professionals:

[Image: table of true doping confessions]

The Maria Sharapova case took the tennis world by surprise (she was the first high-profile female tennis player to fail a drug test). In 2016, Sharapova was banned from competition after testing positive for meldonium during the Australian Open in January of that year. Meldonium is a heart medication that was found by the World Anti-Doping Agency (WADA) to be particularly popular amongst sports persons from Russia and Eastern Europe, perhaps due to its ability to block the body’s conversion of testosterone to oestrogen. Having placed meldonium on a watch list in 2015, WADA had fully prohibited the substance from January 1, 2016, two weeks before the Australian Open. Following the failed drug test, Sharapova admitted she had been taking meldonium as medication since 2006 and stated that she had negligently and inexcusably missed the communications from WADA prohibiting its use.

Linguistic analysis of the explanation Sharapova gave to fans via her Facebook page shows two ‘outliars’ at the beginning and end of the post (see Figure 3 below).


[1] I want to reach out to you to share some information, discuss the latest news, and let you know that there have been things that have been reported wrong in the media, and I am determined to fight back. You have shown me a tremendous outpouring of support, and I’m so grateful for it.

[13] I have been honest and upfront. I won’t pretend to be injured so I can hide the truth about my testing. I look forward to the ITF hearing at which time they will receive my detailed medical records. I hope I will be allowed to play again. But no matter what, I want you, my fans, to know the truth and have the facts.

Figure 3: Outliar analysis of Maria Sharapova’s 2016 Facebook post and outlier extracts.

Sharapova begins her post by suggesting she has been a victim of unjust media coverage. It had been widely reported that she had received five ‘warnings’ about the upcoming change to the WADA regulations. Sharapova agreed that she had received newsletters with links to the WADA rule changes but argued that these were ‘communications’ rather than warnings, and that one had to “hunt, click, hunt, click, hunt, click, scroll and read” through them in order to find information about the prohibition. Sharapova ends her post by strongly maintaining that she is being honest about her genuine mistake (of using meldonium as medication after the ban).

These anomalous extracts are particularly emotional when compared to the main body of the post, in which Sharapova gives specific details about all the communications she did receive (see yellow highlighted text in Figure 4 below). A large body of literature suggests that specific details are a strong indicator of veracity in legal genres such as witness statements. (Professor Aldert Vrij’s research on Criteria-Based Content Analysis is a good place to start.) These anomalous extracts could simply be ‘Othello errors’, in which emotional intensity is mistaken for deception.


Figure 4: Maria Sharapova Facebook post, March 2016. Last accessed 21/7/2018

Accounting for the ‘Othello error’ is one reason a full ‘Outliar’ analysis uses an additional measure of language change within a text – intratextual language variation – when assessing text veracity. Texts can range from having a uniform style with consistent use of features throughout (a stable text) to displaying marked changes in language style at several points (a variable or ‘spiky’ text). Outliar captures this by summing the amount of change shown in a text.

Figure 5 is an example of this. It compares the ‘Outliar’ analysis of Sharapova’s Facebook post (left) with that of a Lance Armstrong TV interview in which he falsely denied doping (see previous blog for more discussion of Armstrong’s deception). Visually, you can see that Lance Armstrong’s language use displays high variability, whereas Sharapova’s language is relatively stable.

Figure 5: Comparison of the ‘Outliar’ analysis of Maria Sharapova’s Facebook post (left) and Lance Armstrong’s ESPN interview (right).

Figure 6 below shows a statistical measure of intratextual language variation for five false doping denials vs. five true doping confessions (see p7 here for the formula). It can be seen that the deceptive communications show more language change than the honest ones. So, combining outlier text detection with an overall measure of language variability can be helpful in distinguishing honesty from dishonesty. Frequent and marked language style change is a signal of potential deception.


Figure 6: Analysis of intratextual variation. Y-axis = total intratextual variation measured as aggregate z-score for each text; X-axis represents ten texts in total – five deceptive texts (false denials by: 1) Barry Bonds; 2) Linford Christie; 3) Lance Armstrong; 4) Alex Rodriguez; 5) Marion Jones) and five honest texts (true confessions by: 1) Maria Sharapova; 2) Dwain Chambers; 3) Victor Conte; 4) Floyd Landis; 5) Levi Leipheimer).
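To make the idea of "summing the amount of change" concrete, here is a minimal Python sketch. It is not the formula from the paper: it assumes each text has already been reduced to a list of per-chunk feature rates, and it takes one plausible reading of an "aggregate" score by adding up the absolute chunk-to-chunk changes, expressed in units of a reference standard deviation. The function name `total_variation`, the parameter `ref_sd` and the example numbers are all hypothetical.

```python
def total_variation(rates, ref_sd):
    """Sum of absolute chunk-to-chunk changes in feature rate.

    rates  -- per-chunk feature rates for one text (assumed precomputed)
    ref_sd -- standard deviation of feature rates in some comparison
              corpus (an assumption, used only to put texts on one scale)
    """
    if ref_sd <= 0:
        raise ValueError("ref_sd must be positive")
    # Each step is the change between neighbouring chunks, in SD units.
    steps = [abs(b - a) / ref_sd for a, b in zip(rates, rates[1:])]
    return sum(steps)


# Invented illustrative data: one stable text and one 'spiky' text.
stable_text = [0.10, 0.11, 0.10, 0.12, 0.11]
spiky_text = [0.05, 0.20, 0.04, 0.22, 0.06]

stable_total = total_variation(stable_text, ref_sd=0.05)
spiky_total = total_variation(spiky_text, ref_sd=0.05)
print(stable_total < spiky_total)  # True
```

Because this version is a plain sum, longer texts accumulate more change simply by having more chunks; dividing by the number of transitions would give a per-chunk average instead.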

In Sharapova’s case, the tribunal were satisfied she had not intended to cheat (although she was found to have also taken the drug to enhance her performance) and her relatively light ban (reduced from two years to 15 months on appeal) reflected the fact that she had been negligent but not deceptive. I would argue that the (relatively) stable language of both Sharapova’s Facebook post and the initial press conference where she announced her drug test failure supports the tribunal’s finding. The press conference video is below – judge for yourself.

 

 

The case of the asthmatic cyclists: deception detection as investigative linguistics

Did you know that Sir Bradley Wiggins, Sir Mo Farah and I-thought-he-was-a-sir Chris Froome all suffer from asthma? As do a number of other successful sports persons accused of using performance-enhancing drugs? What I call ‘investigative linguistics’ led me down this particular rabbit hole. My investigative linguistic approach examines texts for ‘points of interest’ (POIs). It uses deception detection tools on communications with unknown veracity in order to automatically identify POIs. One benefit is that you can approach a topic without any prior knowledge or biases and quickly find avenues that are objectively worth exploring.
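As a rough illustration of how POIs might be flagged automatically, the Python sketch below splits a text into chunks, measures how densely each chunk uses a set of feature words, and marks chunks whose rate sits unusually far above the rest of the text. Everything specific here is an assumption for illustration: the tiny `FEATURE_WORDS` list, the z-score threshold of 1.5 and the function names are hypothetical, and the real ‘Outliar’ feature set and scoring are not given in this post.

```python
import re
from statistics import mean, pstdev

# Hypothetical, deliberately tiny feature list (some modals and pronouns).
FEATURE_WORDS = {"would", "could", "should", "might", "may", "must",
                 "can", "will", "i", "me", "my", "you", "we"}


def feature_rate(chunk):
    """Share of tokens in a chunk that appear in FEATURE_WORDS."""
    tokens = re.findall(r"[a-z']+", chunk.lower())
    if not tokens:
        return 0.0
    return sum(t in FEATURE_WORDS for t in tokens) / len(tokens)


def z_scores(values):
    """Standardise values against the text's own mean and spread."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def points_of_interest(chunks, threshold=1.5):
    """Indices of chunks with an unusually high feature rate."""
    rates = [feature_rate(c) for c in chunks]
    return [i for i, z in enumerate(z_scores(rates)) if z > threshold]


# Invented example chunks; only the last one is dense in feature words.
chunks = [
    "the report was filed on monday morning",
    "the results arrived at the lab in march",
    "the team trained in spain for a week",
    "the letter listed the banned substances",
    "the flight left the airport late at night",
    "i would say you could maybe think we might",
]
print(points_of_interest(chunks))  # [5]
```

A flagged chunk is only a prompt to look more closely at that part of the text; as the Othello error shows, a spike in these features is not in itself evidence of deception.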

After analysing the known deceptions of Lance Armstrong (see previous blog), I collected a bunch of statements made by sports people admitting or denying their use of performance-enhancing drugs. Based on currently available evidence, these statements were divided into three categories: a) false denials, b) true confessions and c) presumed-to-be-true denials (see my Stanford Decepticon 2017 presentation for full details). For the ‘true’ denials category, I picked five recent high-profile cases: two relating to the cycling controversy around Sir Bradley Wiggins, Sir David Brailsford and Team Sky (video explainer), and three connected to the controversy around the infamous athletics coach Alberto Salazar and his Nike Oregon Project, which engulfed Sir Mo Farah, the Canadian Cameron Levins and, indirectly, Paula Radcliffe MBE.

[Image: table of the ‘true denials’ dataset]

It was a surprise to me that each of the interviews (graphed below) mentions asthma and related issues. If I had done some prior research I would have realised that the provision and use of asthma medication during and around major sport events was a key issue in sports doping. Still, it shows that deception detection techniques can be used for exploration, i.e. to find the ‘points of interest’ worthy of further investigation. Furthermore, a ‘naive’ approach helps to avoid unconscious bias affecting the analysis.


(‘Outliar’ analysis of four ‘true denials’. Interview responses are represented as a time series on the x-axis (c.30-60 second chunks). Green shading indicates ‘outliar’ text. Asthma mentions marked with ∇)

In Bradley Wiggins’ Guardian interview (given to counter suggestions of illegal doping during the 2012 Tour de France), the analysis highlights inconsistencies in Wiggins’ stance towards his asthma allergy:

[14] I was paranoid about making excuses: “Ah, my allergies have kicked in.” I’d learned to live with this thing. It wasn’t something I was going to shout from the rooftops and use as an excuse and say, “my allergies have started off again”. That’s convenient isn’t it Brad, your allergies started when you got dropped.

[17] I didn’t mention it in the book. I’d come off a season of … I’d won everything that year. When I was writing the book I wasn’t sat there thinking, “I’d better bring my allergies up”. I was flying on cloud nine after dominating the sport all year. It wasn’t something that I brought to mind.

In these two extracts, asthma is simultaneously a big deal and a non-issue for Wiggins. While this does not in any way confirm deception or guilt, it does indicate a defensive stance that is worth investigating. This can be contrasted with Mo Farah’s discussion of his own asthma:

[4] “This picture has been painted of me. It’s not right. I am 100% clean. I love what I do. I want to continue winning medals. But I want people to know that I am 100%, I am not on any drugs, I am not on thyroids, I am not on any other medication. The only medication that I am on, I am on asthma and I have had that since I was a child. That’s just a normal use. I am on TUE [therapeutic use exemption] where you have … it’s just the normal stuff. And that’s it.” – Sky Sports interview, 2015

In contrast to Wiggins, Mo Farah volunteers information in a non-defensive fashion about his asthma and use of Therapeutic Use Exemptions (TUEs – a doctor’s note and prescription). Canadian runner Cameron Levins’ response to questions about his use of prescription drugs registers as more ‘interesting’ than Farah’s, although not as ‘interesting’ as Wiggins’:

Interviewer: No prescription drugs?

Levins: I have some medication I take for my asthma, but that is something that is wrong with me. I’m asthmatic.

Interviewer: Was that before you came on with the (Nike Oregon) Project?

Levins: Yeah, I was dealing with it before I joined the project actually. A little bit after the London Olympics I started having quite a bit of difficulty with it. So it was before I joined the project.

Levins later goes on to explain that “adult onset asthma is pretty common”. Obviously, having asthma since childhood is easier to defend, which may explain the higher ‘interestingness’ score of Levins’ response compared to Farah’s.

In a 2016 interview with Sky Sports News, David Brailsford, director of Team Sky (whose riders included Bradley Wiggins and Chris Froome), offered the following highly ‘interesting’ reply when asked about his team’s covert use of Therapeutic Use Exemptions to obtain otherwise prohibited drugs:

[5] We’ve reviewed this over the years as we’ve moved forward. We have changed our policy, we’ve changed the way we do it, and in the future going forward, I think we’re going to take the next step, which has been debated on a wider basis across the whole of the TUE process, and look at having the consent of the riders to make all TUEs transparent

In this segment, Brailsford tries to draw a line under anything that may have occurred in the past, using many words related to looking to the future. Without presuming anything illegal on the part of Brailsford or Team Sky, previous policy would clearly be a ‘point of interest’.

So, this analysis suggests that cyclists’ use of asthma medication is more ‘interesting’ than that of track athletes. (In Farah’s interview, the analysis flags his comments about missed drug tests rather than specific doping allegations.) Understanding the reasons for this can then provide a focus for further investigation. As Chris Froome’s recent successful appeal shows, the asthma issue may be due to faulty regulations based on models with a tendency to generate ‘false positives’ – itself a form of deviance if not deception.

[Image: Froome inhaler]

(picture © BBC/Getty Images, 2018)

For this naive analyst, investigative linguistics revealed an important connection between asthma and sports doping that is clearly ‘interesting’. Application of the same techniques to the domains of business, politics and finance will definitely be interesting…