This analysis was run at 2018-01-09 16:59:15 and only captures statement ratings PolitiFact published before that time. The figures update automatically each time I re-run the R script.
According to its website, PolitiFact is a “fact-checking website that rates the accuracy of claims by elected officials and others who speak up in American politics.” The goal of this project is to analyze all of the ratings that PolitiFact has issued since it began in 2007 and to look for trends and other curiosities in the data. Besides strengthening my data analysis and mark-up skills, my motivations for this project have been my admiration of PolitiFact’s non-partisan fact-checking and my own simple curiosity.
The most recent statements evaluated by PolitiFact appear on this page (http://www.politifact.com/truth-o-meter/statements/). As of the date and time I ran this code, PolitiFact had issued a total of 14,181 ratings. [^1]
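As a rough illustration of how these listing pages can be pulled into R, here is a minimal rvest sketch; the CSS selectors are placeholders and may not match PolitiFact's actual markup or the selectors my script uses.

```r
library(rvest)

# Illustrative sketch only: pull one page of the Truth-O-Meter listing.
# The CSS selectors below are placeholders and may not match PolitiFact's
# actual page markup (or the selectors used in my script).
page <- read_html("http://www.politifact.com/truth-o-meter/statements/?page=1")

claims  <- page %>% html_nodes(".statement__text") %>% html_text(trim = TRUE)
rulings <- page %>% html_nodes(".meter img")       %>% html_attr("alt")

head(claims)
head(rulings)
```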
On its site, PolitiFact evaluates statements using either its Truth-O-Meter or its Flip-O-Meter system. In short, the Truth-O-Meter is used to evaluate statements for their accuracy (did the speaker say the truth, a falsehood, or something in between?), while the Flip-O-Meter is used to evaluate an official’s consistency on an issue (did they maintain their position, or partly/completely change their stance on a topic?).
From here on, my analysis will focus only on those statements evaluated by the Truth-O-Meter since they are much more numerous. There are 13,950 (98.4%) Truth-O-Meter Ratings and 231 (1.6%) Flip-O-Meter Ratings.
PolitiFact assigns each statement it evaluates with the Truth-O-Meter one of six possible ratings:
TRUE - The statement is accurate and there’s nothing significant missing.
MOSTLY TRUE - The statement is accurate but needs clarification or additional information.
HALF TRUE - The statement is partially accurate but leaves out important details or takes things out of context.
MOSTLY FALSE - The statement contains an element of truth but ignores critical facts that would give a different impression.
FALSE - The statement is not accurate.
PANTS ON FIRE - The statement is not accurate and makes a ridiculous claim.
For more information on how PolitiFact selects and evaluates statements, see here.
If we consider “truths” to be those statements PolitiFact rated as Mostly True or True and “falsehoods” to be those rated as Pants on Fire!, False, or Mostly False, then 34.7% of the total statements rated were truths, 45.3% were falsehoods, and the remaining 20.0% were rated Half-True. As such, PolitiFact has rated more statements as falsehoods than as truths; whether this reflects selection bias on PolitiFact’s part or the nature of American political rhetoric cannot be determined from this data alone.
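A minimal dplyr sketch of this breakdown, assuming the scraped ratings sit in a data frame called `ratings` with a `ruling` column holding the six rating labels (both names are illustrative):

```r
library(dplyr)

# Sketch: share of statements falling into "truths", "falsehoods", and Half-True.
# `ratings` and `ruling` are assumed names for the scraped data.
ratings %>%
  mutate(group = case_when(
    ruling %in% c("Mostly True", "True")                     ~ "Truth",
    ruling %in% c("Mostly False", "False", "Pants on Fire!") ~ "Falsehood",
    TRUE                                                     ~ "Half-True"
  )) %>%
  count(group) %>%
  mutate(share = n / sum(n))
```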
PolitiFact is made up of various “Editions” that focus on different sources for the statements they rate. There are 25 editions in total, which I have grouped into two main categories: “State” Editions that focus on statements made by officials or people in a particular U.S. state (21 such editions) and “Not State” Editions that do not focus on a single state (the remaining 4 editions). These 21 State Editions cover states with a total of 345 electoral votes, equal to 64.1% of the total electoral votes available in presidential elections.
As we see from this table, PolitiFact National was the source for over one-third of the total statements evaluated, likely due in part to it being the earliest edition founded. The other “Not State” editions are PunditFact, which evaluates statements made by political pundits; PolitiFact Global News Service, which evaluates statements about health and development; and PolitiFact NBC, a partnership between PolitiFact and NBC.
Rank | Issuer | Founded | Type | Total Ratings | Percentage of Total Ratings | Electoral Votes |
---|---|---|---|---|---|---|
1 | PolitiFact National | 2007 | Not State | 4,662 | 33.4% | NA |
2 | PolitiFact Florida | 2009 | State | 1,435 | 10.3% | 29 |
3 | PolitiFact Texas | 2010 | State | 1,354 | 9.7% | 38 |
4 | PolitiFact Wisconsin | 2010 | State | 1,259 | 9.0% | 10 |
5 | PunditFact | 2013 | Not State | 948 | 6.8% | NA |
6 | PolitiFact Georgia | 2010 | State | 862 | 6.2% | 16 |
7 | PolitiFact Ohio | 2010 | State | 591 | 4.2% | 18 |
8 | PolitiFact Rhode Island | 2010 | State | 544 | 3.9% | 4 |
9 | PolitiFact Virginia | 2010 | State | 537 | 3.8% | 13 |
10 | PolitiFact New Jersey | 2011 | State | 395 | 2.8% | 14 |
11 | PolitiFact Oregon | 2010 | State | 390 | 2.8% | 7 |
12 | PolitiFact New Hampshire | 2011 | State | 153 | 1.1% | 4 |
13 | PolitiFact California | 2015 | State | 121 | 0.9% | 55 |
14 | PolitiFact Global News Service | 2016 | Not State | 91 | 0.7% | NA |
15 | PolitiFact Missouri | 2015 | State | 90 | 0.6% | 10 |
16 | PolitiFact New York | 2016 | State | 88 | 0.6% | 29 |
17 | PolitiFact North Carolina | 2016 | State | 88 | 0.6% | 15 |
18 | PolitiFact Pennsylvania | 2016 | State | 80 | 0.6% | 20 |
19 | PolitiFact Tennessee | 2012 | State | 76 | 0.5% | 11 |
20 | PolitiFact Illinois | 2016 | State | 63 | 0.5% | 20 |
21 | PolitiFact Nevada | 2016 | State | 41 | 0.3% | 6 |
22 | PolitiFact Arizona | 2016 | State | 38 | 0.3% | 11 |
23 | PolitiFact Colorado | 2016 | State | 29 | 0.2% | 9 |
24 | PolitiFact Iowa | 2015 | State | 12 | 0.1% | 6 |
25 | PolitiFact NBC | 2016 | Not State | 3 | 0.0% | NA |
Given that a small subset of the PolitiFact Editions is responsible for the vast majority of the statement ratings, I am going to focus on the Editions that were each individually responsible for more than 1% of the Truth-O-Meter ratings. These top 12 PolitiFact Editions were cumulatively responsible for 94.0% of the total statements rated.
Looking at the most important PolitiFact Editions, I have set up a metric to measure the average truthfulness of the statements they have rated. Any statement rated True is assigned a Truthfulness Score of +2; likewise, any statement PolitiFact has rated False is assigned a Truthfulness Score of -2. Mostly True, Half-True, and Mostly False statements correspond to scores of +1, 0, and -1 respectively. Pants on Fire! claims are “False” statements that are especially ridiculous, so I have given them a Truthfulness Score of -3 (see the table below).
Statement Rating | Truthfulness Score |
---|---|
True | 2 |
Mostly True | 1 |
Half-True | 0 |
Mostly False | -1 |
False | -2 |
Pants on Fire! | -3 |
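A minimal sketch of this mapping in R, reusing the assumed `ratings` data frame and `ruling` column from the earlier sketch:

```r
# Sketch of the rating-to-score mapping from the table above.
# `ratings` and its `ruling` column are assumed names.
truth_scores <- c("True"           =  2,
                  "Mostly True"    =  1,
                  "Half-True"      =  0,
                  "Mostly False"   = -1,
                  "False"          = -2,
                  "Pants on Fire!" = -3)

ratings$truth_score <- unname(truth_scores[ratings$ruling])
```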
Using the top 12 PolitiFact Editions, I have generated a Mean Truthfulness Score for the statements each Edition has examined. For each Edition, I multiply the number of statements with each rating (e.g. True, Mostly True, etc.) by that rating’s Truthfulness Score, sum those values, and divide by the Edition’s total number of ratings. In other words, this metric answers the question “What is the average truthfulness of a statement evaluated by each PolitiFact Edition?”.
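A sketch of that calculation with dplyr, assuming an `edition` column and the `truth_score` column created above:

```r
library(dplyr)

# Sketch: Mean Truthfulness Score by edition, keeping only editions responsible
# for more than 1% of all Truth-O-Meter ratings. `edition` is an assumed column.
edition_scores <- ratings %>%
  group_by(edition) %>%
  summarise(n_ratings  = n(),
            mean_score = mean(truth_score)) %>%
  filter(n_ratings / sum(n_ratings) > 0.01) %>%
  arrange(mean_score)
```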
As we see from the graph below, PunditFact has the lowest average Truthfulness Score at -1.148, which means that the average statement PunditFact rates is roughly Mostly False. Rounding out the bottom three, PolitiFact Wisconsin and PolitiFact National were the Editions with the next lowest average Truthfulness Scores. On the other end of the spectrum, PolitiFact Georgia and PolitiFact Ohio were the only Editions with a positive Mean Truthfulness Score, meaning that the average statement they rated landed on the truthful side of Half-True.
Given that PolitiFact provides the date on which it rated each statement, it may be interesting to explore whether the relative truthfulness of the statements rated has changed over time. Since mid-2010, PolitiFact has generally issued about 3-4 statement ratings per day. In the graph below, I have taken the Mean Truthfulness Score for all of the statements PolitiFact evaluated each month (excluding months in which PolitiFact rated fewer than 30 statements) and added a smoothed line to estimate the trend in the Mean Truthfulness Score over time.
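A sketch of the monthly aggregation and smoothed trend line, assuming a `rating_date` column (ggplot2's default loess smoother stands in for whatever smoother the actual graph uses):

```r
library(dplyr)
library(ggplot2)
library(lubridate)

# Sketch of the monthly aggregation; `rating_date` is an assumed column name.
monthly <- ratings %>%
  mutate(month = floor_date(rating_date, "month")) %>%
  group_by(month) %>%
  summarise(n_ratings  = n(),
            mean_score = mean(truth_score)) %>%
  filter(n_ratings >= 30)            # drop months with fewer than 30 ratings

ggplot(monthly, aes(month, mean_score)) +
  geom_line() +
  geom_smooth() +                    # loess line estimates the trend over time
  labs(x = NULL, y = "Mean Truthfulness Score")
```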
Some interesting trends present themselves.
Seeing how quickly PolitiFact’s monthly Mean Truthfulness Score turns sharply negative after the 2016 presidential election, I decided to separate the ratings into three mutually exclusive categories to figure out why:
I took the subtotal of the Truthfulness Scores within each of these three statement categories and divided it by the total number of ratings PolitiFact issued that month. These steps allow me to determine how much each of the three categories contributed to PolitiFact’s total Mean Truthfulness Score for each month.
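A sketch of that decomposition, assuming a `category` column holding the three groups and reusing the `month` and `truth_score` columns created above:

```r
library(dplyr)

# Sketch of the contribution calculation. `category` is an assumed column name;
# `month` and `truth_score` come from the earlier sketches.
monthly_totals <- ratings %>%
  group_by(month) %>%
  summarise(total_ratings = n())

contributions <- ratings %>%
  group_by(month, category) %>%
  summarise(score_sum = sum(truth_score)) %>%
  ungroup() %>%
  left_join(monthly_totals, by = "month") %>%
  mutate(contribution = score_sum / total_ratings)

# Within any given month, the three contributions sum to that month's
# overall Mean Truthfulness Score.
```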
For example, in December 2017, PolitiFact as a whole rated 78 statements with a Mean Truthfulness Score of -1.359. This value corresponds to the average statement being at least as false as Mostly False, and this month has the lowest Mean Truthfulness Score since PolitiFact was founded, as you can see again in the graph below. Of that -1.359 Mean Truthfulness Score:
Since December 2016, ratings of statements made by websites have been the single largest contributor to the recent, highly negative Mean Truthfulness Scores. This coincides with the start of PolitiFact’s partnership with Facebook to fact-check claims made on the social media site, and likely reflects a broader push to rate the accuracy of more claims made online. [^2] Since December 2016, PolitiFact has rated an average of 18 online statements per month, and these statements have an average score of -2.766, which is closest to Pants on Fire!. Prior to December 2016, PolitiFact rated an average of only 4.2 statements from websites per month, and with an average score of -1.782 those ratings were closer to False.
PolitiFact groups together many of the statements it rates by the subject the statement addresses (e.g. Abortion, Patriotism, Obama Birth Certificate, etc.) here. PolitiFact also notes the frequency with which it assigns its six ratings (i.e. True, Mostly True, Half-True, Mostly False, False, and Pants on Fire!) to statements in each subject. Currently there are 149 different subjects. The most frequently discussed subjects and their Mean Truthfulness Scores are below:
Rank | Subject | Total Ratings | Mean Truthfulness Score |
---|---|---|---|
1 | Health Care | 1,580 | -0.582 |
2 | Economy | 1,508 | -0.067 |
3 | Taxes | 1,340 | -0.278 |
4 | Education | 996 | -0.102 |
5 | Jobs | 962 | -0.166 |
6 | Federal Budget | 958 | -0.170 |
7 | State Budget | 920 | -0.216 |
8 | Candidate Biography | 830 | -0.488 |
9 | Elections | 826 | -0.317 |
10 | Immigration | 747 | -0.503 |
Looking at the most truthfully discussed and most falsely discussed subjects, it comes as no surprise that topics such as “Fake news” and Obama’s Birth Certificate have very negative Mean Truthfulness Scores. I leave it to others to ponder why topics such as “Population”, “Redistricting”, and “Gambling” are more truthfully discussed.[^3] I believe part of it has to do with uneven statement sampling.
The last part of my analysis looks at various elected U.S. politicians’ individual records with the Truth-O-Meter. My intent has been to capture the most important U.S. politicians, whom I have defined to include current and recently former Presidents, Vice-Presidents, Presidential Candidates, Senators, Representatives, and Governors. [^4] In total, I will focus on the 467 Democratic and Republican U.S. politicians from these groups who have made statements that PolitiFact has rated.
Interestingly, PolitiFact has rated statements by 19.2% more Republican politicians than Democratic ones (254 vs. 213) but it has rated 62.7% more statements by Republican politicians than statements by Democratic ones (4,085 vs. 2,511). See the table below:
Party | Number of Politicians Rated | Pants on Fire! | False | Mostly False | Half-True | Mostly True | True | Total Ratings |
---|---|---|---|---|---|---|---|---|
Democratic | 213 | 69 | 308 | 331 | 596 | 682 | 525 | 2,511 |
Republican | 254 | 306 | 890 | 802 | 822 | 693 | 572 | 4,085 |
Next, I will look at which politicians have the highest Mean Truthfulness score by party. The five Democratic politicians with the highest Truthfulness Score are Sherrod Brown, Tim Kaine, Hillary Clinton, Bill Clinton, and Barack Obama while the five Democrats with the lowest Truthfulness Score are Terry McAuliffe, Nancy Pelosi, Debbie Wasserman Schultz, Tammy Baldwin, and Joe Biden. The five Republican politicians with the highest Truthfulness Score are Rob Portman, Nathan Deal, John Kasich, Jeb Bush, and Rand Paul while the five Republicans with the lowest truthfulness score are Michele Bachmann, Donald Trump, Ted Cruz, Newt Gingrich, and Rick Santorum. The five least truthful Democrats all have a higher Mean Truthfulness Score than the five least truthful Republicans.
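A sketch of how such a ranking could be produced, with assumed `speaker` and `party` columns, an assumed `major_politicians` vector of matched names, and an illustrative 20-rating cutoff (not necessarily the threshold used for the lists above):

```r
library(dplyr)

# Sketch: average Truthfulness Score per politician, split by party.
# `speaker`, `party`, and `major_politicians` are assumed names; the
# 20-rating minimum is an illustrative cutoff.
politician_scores <- ratings %>%
  filter(speaker %in% major_politicians) %>%
  group_by(party, speaker) %>%
  summarise(n_ratings  = n(),
            mean_score = mean(truth_score)) %>%
  filter(n_ratings >= 20)

politician_scores %>% top_n(5, mean_score)    # five highest scores per party
politician_scores %>% top_n(-5, mean_score)   # five lowest scores per party
```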
Currently, among those with 30 or more statements rated by PolitiFact, there are 0 major U.S. politicians who have not made a False or Pants on Fire! statement. The adage that “All politicians lie” seems to be accurate, but it should be followed up with “but some lie a lot more than others.”
To conclude, I have grouped all of the major U.S. politicians who have received ratings from PolitiFact by party. This may be an accurate reflection of which party is more “truthful”, or it may reflect selection bias by me or by PolitiFact. Nonetheless, after performing a z-test on statements by major U.S. politicians from the Democratic and Republican parties, I can say with greater than 99% confidence that the Mean Truthfulness Score for statements made by major U.S. Democratic politicians is statistically higher than that for major U.S. Republican politicians. [^5]
This project is simply an effort to improve my data wrangling and analysis skills as well as to create a personal, demonstrable product of my abilities. While I have no connection to PolitiFact, I deeply appreciate the work they do and I encourage others to do the same. I apologize for using up their server space while scraping their site, but hopefully the exposure of their work, along with financial contributions from myself and others, more than repays that.
For information on joining PolitiFact, see here.
If you have any suggestions for extending this analysis, please share them. You are free to share this GitHub site (attributing me as the author) or to use any of the R code I have written for your own private, non-commercial use. Simply put, please respect the time and effort I put into this project.
[^1] Not until I had done a lot of data wrangling did I realize that PolitiFact’s site appears to be missing all ratings issued in November 2008. While it has ratings through October 31, 2008 and beginning again on December 1, 2008, there are no ratings at all for November 2008. Visit this page and those nearby to verify (http://www.politifact.com/truth-o-meter/statements/?page=669).
[^2] PolitiFact’s fact-checks of claims made on Facebook do not seem to be included in its Truth-O-Meter ratings, which puts them beyond the scope of my analysis. When I discuss PolitiFact’s ratings of statements made by websites, I only analyze those that appear in its Truth-O-Meter ratings.
[^3] Not all subjects listed on PolitiFact’s subjects page are actual issues. While subjects such as Abortion and Islam capture statements that refer to Abortion or Islam, the subject “This Week - ABC News” captures statements instead made by politicians and pundits while actually on that television show.
[^4] PolitiFact has rated statements by 3,666 different entities. In my analysis of statements by major politicians, grouped by political party, I used the following lists:
When I refer to ‘Democrats’ and ‘Republicans’ in my analysis, I am referring to politicians from the above lists whom I have successfully been able to match to individuals with statements rated by PolitiFact. Because the matching process is difficult, I cannot guarantee that every individual who falls into these categories has been accounted for, but the result should include enough politicians to make my conclusions and analysis valid. I furthermore sought to ensure that every qualifying U.S. politician who has had at least 10 statements rated by PolitiFact was included. The political party designations come from the above-mentioned sources.
The remaining entities whose individual records I have intentionally chosen not to analyze include:
Keep in mind that many politicians who did fit my criteria may at some point have held a position on the above list.
[^5] A z-test is used when the population variance and standard deviation are known. Because I am considering the population to be only the statements that PolitiFact has rated (rather than all statements all politicians have made), I believe a z-test is more appropriate than a t-test. For Democrats, the standard deviation of the Truthfulness Scores of their statements is 1.388 and for Republicans it is 1.519 (all numbers in this footnote are rounded to three digits). In the z-test, the null hypothesis is that the true difference between the Mean Truthfulness Scores of Democrats and Republicans is equal to zero; currently the difference between those means is 0.637. With 99 percent confidence, the true difference between these means is estimated to be between 0.543 and 0.731.
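For reference, the figures in this footnote can be reconstructed from the party totals in the table above; a minimal base-R sketch:

```r
# Sketch of the two-sample z-test, rebuilt from the party totals in the table above.
scores <- c(-3, -2, -1, 0, 1, 2)                  # Pants on Fire! ... True
dem <- c(69, 308, 331, 596, 682, 525)             # Democratic counts by rating
rep <- c(306, 890, 802, 822, 693, 572)            # Republican counts by rating

mean_d <- sum(scores * dem) / sum(dem)            # ~  0.230
mean_r <- sum(scores * rep) / sum(rep)            # ~ -0.407
sd_d   <- sqrt(sum(dem * (scores - mean_d)^2) / sum(dem))   # ~ 1.388
sd_r   <- sqrt(sum(rep * (scores - mean_r)^2) / sum(rep))   # ~ 1.519

diff <- mean_d - mean_r                           # ~ 0.637
se   <- sqrt(sd_d^2 / sum(dem) + sd_r^2 / sum(rep))
z    <- diff / se                                 # test statistic
diff + c(-1, 1) * qnorm(0.995) * se               # 99% CI, ~ (0.543, 0.731)
```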