PolitiFact - 10+ Years of Fact Checking

This analysis was conducted at 2018-01-09 16:59:15 and only captures statement ratings PolitiFact published before that time. This project will auto-update each time that I re-run the R script.

1. Background

According to its website, PolitiFact is a “fact-checking website that rates the accuracy of claims by elected officials and others who speak up in American politics.” The goal of this project is to analyze all of the ratings that PolitiFact has issued since it began in 2007 and to look for trends and other curiosities in the data. Besides strengthening my data analysis and mark-up skills, my motivations for this project have been my admiration of PolitiFact’s non-partisan fact-checking and my own simple curiosity.

The most recent statements evaluated by PolitiFact appear at this page, (http://www.politifact.com/truth-o-meter/statements/). As of the date and time I ran this code, PolitiFact has evaluated a total of 14,181 ratings. [^1]

1.1 Truth-O-Meter vs. Flip-O-Meter

On its site, PolitiFact evaluates statements using either its Truth-O-Meter or its Flip-O-Meter system. In short, the Truth-O-Meter is used to evaluate statements for their accuracy (did the speaker say the truth, a falsehood, or something in between?), while the Flip-O-Meter is used to evaluate an official’s consistency on an issue (did they maintain their position, or partly/completely change their stance on a topic?).

From here on, my analysis will focus only on those statements evaluated by the Truth-O-Meter since they are much more numerous. There are 13,950 (98.4%) Truth-O-Meter Ratings and 231 (1.6%) Flip-O-Meter Ratings.

PolitiFact assigns statements one of six possible ratings to Truth-O-Meter Ratings:

TRUE - The statement is accurate and there’s nothing significant missing.
MOSTLY TRUE - The statement is accurate but needs clarification or additional information.
HALF TRUE - The statement is partially accurate but leaves out important details or takes things out of context.
MOSTLY FALSE - The statement contains an element of truth but ignores critical facts that would give a different impression.
FALSE - The statement is not accurate.
PANTS ON FIRE - The statement is not accurate and makes a ridiculous claim.

For more information on how PolitiFact selects and evaluates statements, see here.

2. Truth-O-Meter Summary Statistics

If we consider “truths” to be those statements PolitiFact rated as Mostly True or True and “falsehoods” to be those rated as Pants on Fire!, False, or Mostly False, then 34.7% of the total statements rated were truths, 45.3% of the total statements rated were falsehoods, and the remaining 20.0% statements were Half-True . As such, PolitiFact has rated more statements as falsehoods than as truths; whether this represents a selection bias on PolitiFact’s part or the nature of American political rhetoric is something not possible to say with this data alone.

2.1 Rating Issuer

PolitiFact is made up of various “Editions” that focus on different sources for the statements that they will rate. There are a total of 25 editions, which I have grouped into two main categories: “State” Editions that focus on statements made by officials or people in a certain U.S. state (21 such editions) and “Non-State” Editions that do not focus on a a single state (the remaining 4 editions). These 21 State Editions cover states with a total of 345 electoral votes, equal to 64.1% of the total electoral votes available for presidential elections.

As we see from this table, PolitiFact National was the source for over one-third of the total statements evaluated. This is likely due in part to the fact that it was the earliest edition founded. The other “Not State” editions include PunditFact, which evaluates statements made by political pundits, PolitiFact Global News Service, which evaluates statements made about Health and Development, and PolitiFact NBC, which is a partnership between PolitiFact and NBC.

The Various PolitiFact Editions
Rank	Issuer	Founded	Type	Total Ratings	Percentage of Total Ratings	Electoral Votes
1	PolitiFact National	2007	Not State	4,662	33.4%	NA
2	PolitiFact Florida	2009	State	1,435	10.3%	29
3	PolitiFact Texas	2010	State	1,354	9.7%	38
4	PolitiFact Wisconsin	2010	State	1,259	9.0%	10
5	PunditFact	2013	Not State	948	6.8%	NA
6	PolitiFact Georgia	2010	State	862	6.2%	16
7	PolitiFact Ohio	2010	State	591	4.2%	18
8	PolitiFact Rhode Island	2010	State	544	3.9%	4
9	PolitiFact Virginia	2010	State	537	3.8%	13
10	PolitiFact New Jersey	2011	State	395	2.8%	14
11	PolitiFact Oregon	2010	State	390	2.8%	7
12	PolitiFact New Hampshire	2011	State	153	1.1%	4
13	PolitiFact California	2015	State	121	0.9%	55
14	PolitiFact Global News Service	2016	Not State	91	0.7%	NA
15	PolitiFact Missouri	2015	State	90	0.6%	10
16	PolitiFact New York	2016	State	88	0.6%	29
17	PolitiFact North Carolina	2016	State	88	0.6%	15
18	PolitiFact Pennsylvania	2016	State	80	0.6%	20
19	PolitiFact Tennessee	2012	State	76	0.5%	11
20	PolitiFact Illinois	2016	State	63	0.5%	20
21	PolitiFact Nevada	2016	State	41	0.3%	6
22	PolitiFact Arizona	2016	State	38	0.3%	11
23	PolitiFact Colorado	2016	State	29	0.2%	9
24	PolitiFact Iowa	2015	State	12	0.1%	6
25	PolitiFact NBC	2016	Not State	3	0.0%	NA

Given that only a subset of all of the PolitiFact Editions are responsible for the vast majority of the statement ratings, I am going to focus on those Editions which individually were responsible for more than 1% of the Truth-O-Meter ratings. These top 12 PolitiFact Editions were cumulatively responsible for 94.0% of the total statements rated.

Looking at the most important PolitiFact Editions, I have set up a metric to measure the average truthfulness of the statements that they have rated. Any statement that is rated as True, I have assigned a Truthfulness Score of +2. Likewise, any statement that PolitiFact has rated False, I have assigned a Truthfulness value of -2. Mostly True, Half-True, and Mostly False statements thus correspond to scores of +1, 0, and -1 respectively. Pants on Fire! claims are “False” statements that are especially ridiculous, so I have given them a Truthfulness Score of -3 (see Table below).

Truthfulness Score per Statement Rating
Statement Rating	Truthfulness Score
True	2
Mostly True	1
Half-True	0
Mostly False	-1
False	-2
Pants on Fire!	-3

Using then the top 12 PolitiFact Editions, I have generated a Mean Truthfulness Score for all of the statements PolitiFact has examined. This number takes the total number of statements by rating (e.g. True, Mostly True, etc.), the Truthfulness Score I have assigned each rating, and then divides the sum of those Truthfulness values by the total number of ratings that each Edition has made. In other words, this metric says “What is the average truthfulness of a statement evaluated by each PolitiFact Edition?”.

As we see from the graph below, PunditFact has the lowest average Truthfulness score at -1.148, which means that the average statement that PunditFact rates is Mostly False. Rounding out the top three, PolitiFact Wisconsin and PolitiFact National were the next Editions with the lowest average Truthfulness Scores. On the other end of the spectrum PolitiFact Georgia and PolitiFact Ohio were the only Editions who had a positive truthfulness rating, meaning that the average statement they rated was at least Half-True.

2.2 Ratings Over Time

Given that PolitiFact provides the date that it has rated each statement, it may be interesting to explore if the relative truthfulness of the statements rated has changed over time. Since mid-2010, PolitiFact has generally issued about 3-4 statement ratings per day. In the below graph, I have taken the Mean Truthfulness Score for all of the statements that PolitiFact has evaluated each month (excluding months where PolitiFact rated fewer than 30 statements) and a smooth line to estimate the average Mean Truthfulness Score over time.

Some interesting trends present themselves.

The Mean Truthfulness Score for each month is generally negative (the overall rating for all statements PolitiFact has rated was -0.33). There were only two occasions where the Mean Truthfulness Score was notably positive for a few months in a row: September and October of 2007 (early in PolitiFact’s rating history) and around December 2012 / January 2013. To a smaller extent, this same positive blip also occurred in December 2014 / January 2015. Maybe the PolitiFact staff around the holiday season join in the spirit of giving by becoming more generous with their ratings?
I had thought that the presidential campaign season, which for the sake of argument I have set as one-year prior to the presidential election, would have seen several months with particularly low Mean Truthfulness Scores. My thinking was that close to the presidential election, politicians would be less truthful in order to capture undecided voters at the last minute. At least from this graph, these factors appear to be uncorrelated though that may be due to the fact that there are not many presidential candidates per election and the number of them decreases as the election date approaches which in turn generates fewer statements for PolitiFact to rate.
The Mean Truthfulness Score for each month has become remarkably negative since April 2017, with the record low in December 2017 at -1.359 as you can see in the graph below. Each month since April (except October) has broken the previously low record set in August 2009. Why is this so? Some possible ideas may be:
1. Increased political polarization whereby politics has gotten to the point where candidates will say anything to win
2. An artefact of the post-truth era some pundits think we have entered into?
3. Donald Trump and his notoriety for telling falsehoods
4. Or is there some other explanation?

2.3 Recent Ratings

Seeing as how quickly PolitiFact’s monthly Mean Truthfulness Score becomes very negative after the 2016 presidenital election, I decided to separate the ratings into three mutually exclusive categories to figure out why:

I flagged as Website any rating of a statement made by an entity ending in .com, .net, .org, .us, .edu, .site, .gov or that had the word email, facebook, blog, tweet, or image in its name.
I flagged as Trump any rating of a statement made by Donald Trump.
I flagged all other ratings as Other.

I took the subtotal Mean Truthfulness values for these three statement categories and divided by the total number of ratings issued each month by PolitiFact. These steps allows me to determine what each of the three categories contributed to PolitiFact’s total Mean Truthfulness Score for each month.

For example, in December 2017 for all of PolitiFact there were 78 total statements rated with a Mean Truthfulness Score of -1.359. This value correlates to the average statement being at least Mostly False, and this month has the lowest Mean Truthfulness Score since PolitiFact was founded as you can again see in the graph below. Of that -1.359 Mean Truthfulness Score:

With 15 statements rated, Websites contributed -0.5 to this month’s Mean Truthfulness Score, equal to 36.8% of it
With 7 statements rated, Donald Trump contributed -0.115 to this month’s Mean Truthfulness Score, equal to 8.46% of it
And the remaining 56 statements rated contributed -0.744 to this month’s Mean Truthfulness Score, equal to the remaining 54.7% of it

Since December 2016, ratings of statements made by websites have been the single largest contributor to the highly negative Mean Truthfulness Scores of late. This coincides with the founding of PolitiFact’s partnership with Facebook to fact-check claims made on the social media site, and likely represents a push overall to rate the accuracy of more claims made online. [^2] Since December 2016, PolitiFact has rated an average of 18 statements made online per month and these statements have an average rating of -2.766, which is closest to Pants on Fire!. Prior to December 2016, PolitiFact only rated an average of 4.2 statements from websites per month, and with an average rating of -1.782 these ratings were closer to False.

3. Subject

PolitiFact groups together many of the statements it rates by the subject that statement addresses (e.g. Abortion, Patriotism, Obama Birth Certificate, etc.) here. PolitiFact also notes the frequency with which it assigns its six ratings (i.e. True, Mostly True, Half-True, Mostly False, False, and Pants on Fire!) to statements in each subject. Currently there are 149 different subjects. The most frequently discussed subjects and their Mean Truthfulness Scores are below:

Most Discussed Subjects in PolitiFact Ratings
Rank	Subject	Total Ratings	Mean Truthfulness Score
1	Health Care	1,580	-0.582
2	Economy	1,508	-0.067
3	Taxes	1,340	-0.278
4	Education	996	-0.102
5	Jobs	962	-0.166
6	Federal Budget	958	-0.170
7	State Budget	920	-0.216
8	Candidate Biography	830	-0.488
9	Elections	826	-0.317
10	Immigration	747	-0.503

Looking at the most truthfully discussed and most falsely discussed subjects, it comes as no surprise that topics such as “Fake news” and Obama’s Birth Certificate have very negative Mean Truthfulness Scores. I leave it to others to ponder why topics such as “Population”, “Redistricting”, and “Gambling” are more truthfully discussed.[^3] I believe part of it has to do with uneven statement sampling.

4. Individuals

And finally, the last part of my analysis looks at various U.S. elected politicians’ individual records with the Truth-O-Meter. My intent has been to capture the most important U.S. politicians, whom I have defined to include current and recently former Presidents, Vice-Presidents, Presidential Candidates, Senators, Representatives, and Governors. [^4] In total, I will focus on the 467 Democratic and Republican U.S. politicians from these groups who have made statements that PolitiFact has rated.

Interestingly, PolitiFact has rated statements by 19.2% more Republican politicians than Democratic ones (254 vs. 213) but it has rated 62.7% more statements by Republican politicians than statements by Democratic ones (4,085 vs. 2,511). See the table below:

Number of Statements by Major U.S. Politicians by Party
Party	Number of Politicians Rated	Pants on Fire!	False	Mostly False	Half-True	Mostly True	True	Total Ratings
Democratic	213	69	308	331	596	682	525	2,511
Republican	254	306	890	802	822	693	572	4,085

Next, I will look at which politicians have the highest Mean Truthfulness score by party. The five Democratic politicians with the highest Truthfulness Score are Sherrod Brown, Tim Kaine, Hillary Clinton, Bill Clinton, and Barack Obama while the five Democrats with the lowest Truthfulness Score are Terry McAuliffe, Nancy Pelosi, Debbie Wasserman Schultz, Tammy Baldwin, and Joe Biden. The five Republican politicians with the highest Truthfulness Score are Rob Portman, Nathan Deal, John Kasich, Jeb Bush, and Rand Paul while the five Republicans with the lowest truthfulness score are Michele Bachmann, Donald Trump, Ted Cruz, Newt Gingrich, and Rick Santorum. The five least truthful Democrats all have a higher Mean Truthfulness Score than the five least truthful Republicans.

Currently, there are 0 major U.S. Politicians who have not made a False or Pants on Fire! statement, among those with 30 or more statements rated by PolitiFact. The adage that “All politicians lie” seems to be accurate, but it should be followed up with “but some lie a lot more than others.”

4.1 Overall Party Mean Truthfulness Score

The final part of my analysis includes grouping all of the major U.S. politicians who have received ratings from PolitiFact by party. This may be an accurate reflection of which party is more “truthful” or it may reflect selection-bias by me or by PolitiFact. Nonetheless, after performing a z-test on statements by major U.S. politicians from the Democratic and Republican party, I can say with greater than 99% confidence that the Mean Truthfulness Score for statements made by major U.S. Democratic politicians is statistically higher than that for major U.S. Republican politicians. [^5]

5. Final Words

This project is simply an effort to improve my data wrangling and analysis skills as well to create a personal and demonstrable product of my abilities. While I have no connection to PolitiFact, I deeply appreciate the work they do and I encourage others to do the same. I apologize for using up their server space while scraping their site, but hopefully any exposure of their work and financial contributions from myself and others can more than repay that.

For information on joining PolitiFact, see here.

If you have any suggestions about ideas to extend this analysis, please share them. You are free to share this github site (attributing me as the author) or to use any of the R code I have written for your own private, non-commercial use. Simply put, please respect the time and effort I put into this project.

Footnotes

[^1] Not until I had done a lot data wrangling did I realize that PolitiFact seems to be missing from its site any ratings issued in November 2008. While it has ratings through October 31, 2008 and beginning again on December 1, 2008, there are no ratings at all for November 2008. Visit this page and nearby ones to verify (http://www.politifact.com/truth-o-meter/statements/?page=669)

[^2] It appears that PolitiFact’s efforts to fact-check claims appearing on Facebook do not appear in its Truth-O-Meter ratings, which puts them beyond the scope of my analysis. When I discuss PolitiFact’s ratings of statements made by websites, I only analyze those that appear in PolitiFact’s Truth-O-Meter ratings.

[^3] Not all subjects listed on PolitiFact’s subjects page are actual issues. While subjects such as Abortion and Islam capture statements that refer to Abortion or Islam, the subject “This Week - ABC News” captures statements instead made by politicians and pundits while actually on that television show.

[^4] PolitiFact has rated statements by 3,666 different entities. In my analysis of statements by major politicians by their political party, I used the following lists:

U.S. Senators and Representatives from the 109th to the 115th Congressional Sessions, available here. I included all of the sessions that have overlapped with PolitiFact’s existence, which began in 2007.
Current U.S. State Governors, available here
Currently living former U.S. State Governors, available here
2008, 2012, and 2016 presidential candidates
The eight Presidents and Vice-Presidents from 1993 to Present (PolitiFact has not yet rated any statements by presidents prior to Bill Clinton).

When I refer to ‘Democrats’ and ‘Republicans’ in my analysis, I am referring to politicians from the above lists who I have been successfully able to match to individuals who have had statements rated by PolitiFact. Because the matching process is difficult, I cannot guarantee that every individual who falls into the following categories has been accounted for but this should include enough politicians to make my conclusions and analysis valid. I furthermore sought to ensure that every valid U.S. politician who has had at least 10 statements rated by PolitiFact was included. The political party designation used comes from the above mentioned sources.

The remaining entities whose individual records I have intentionally not chosen to analyze include:

Organizations, civic initiatives, PACs, campaigns, party committees and the like
Websites
Pundits such as Rush Limbaugh, Sean Hannity, Rachel Maddow, etc.
And Politicians whose only positions thus far are:
- At the state level
- In a non-elected appointment (i.e. to the cabinet level, ambassadorship, or the Supreme Court)
- Unsuccessful campaigns for a political seat (other than U.S. president)
- Not-yet assumed office (i.e. Governor-elect, Senator-elect, etc., but I do include Presidents-elect)
- Or not clearly associated with either the Democratic or Republican parties (e.g. I have excluded Gary Johnson, Ralph Nader, and Jesse Ventura, but not Bernie Sanders or Joe Lieberman).

Keep in mind that many politicians who did fit my criteria may at some point have held a position on the above list.

[^5] A z-test is used when the population variance and standard-deviation are known. Because I am considering the population only to be the statements that PolitiFact has rated (rather than all statements all politicians have made), I believe a z-test is more appropriate than a t-test. For Democrats, the standard deviation of their Mean Truthfulness Score is 1.388 and for Republicans it is 1.519 (all numbers in this footnote are rounded to three digits). In the z-test, the null hypothesis is that the true difference between the Mean Truthfulness Score of Democrats and Republicans is equal to zero; currently the difference between those means is 0.637. With 99 percent confidence, the true difference between these means is estimated to be between 0.543 and 0.731.