# US Presidential Debate Sep 26 2016

This is a Jupyter notebook detailing an analysis of the language used in the first debate.

Prepared by Doug Blank, Bryn Mawr College 
For full discussion, see: http://blankversusblank.blogspot.com/2016/09/post-debate-analysis.html

Data from http://www.nytimes.com/2016/09/27/us/politics/transcript-debate.html

The transcript contained some errors, which I corrected; the corrected version is available here: [first_debate.txt](first_debate.txt).

First, we read the text into a list of lines, replacing punctuation with spaces:

In [165]:
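# Read the transcript (assuming UTF-8, since it contains curly quotes) and
# replace punctuation with spaces, giving one cleaned string per line.
with open("first_debate.txt", encoding="utf-8") as fp:
    text = [line.strip().replace(".", " ").replace("?", " ")
                .replace("“", " ").replace("”", " ").replace(":", " ")
                .replace(",", " ").replace("—", " ").replace("-", " ")
            for line in fp]
# The whole debate as a single string.
text_all = " ".join(text)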
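
The chain of `replace` calls above can also be written as a single translation table. The following is a minimal sketch, not the notebook's own code: the names `PUNCTUATION`, `TO_SPACES`, and `clean_line` are hypothetical, and the sample line is invented for illustration.

```python
# Hypothetical names; the characters are the same ones replaced above.
PUNCTUATION = ".?“”:,—-"
TO_SPACES = str.maketrans({ch: " " for ch in PUNCTUATION})

def clean_line(line):
    """Strip surrounding whitespace, then turn each punctuation mark into a space."""
    return line.strip().translate(TO_SPACES)

# Invented sample line in the same "SPEAKER: text" shape as the transcript.
sample = "HOLT: Good evening, from Hofstra University.\n"
cleaned = clean_line(sample)
print(cleaned.split())
```

One table lookup per character avoids building a new intermediate string for every punctuation mark, which is mostly a readability gain at this text size.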

A sample to see what it looks like:

In [166]:
text[0]

'HOLT Good evening from Hofstra University in Hempstead New York I’m Lester Holt anchor of NBC Nightly News I want to welcome you to the first presidential debate '

Now, we break it down by speaker:

In [167]:
holt = ""
clinton = ""
trump = ""

current = None
for line in text:
    if not line:
        continue
    elif line in ["(APPLAUSE)", "(CROSSTALK)", "(LAUGHTER)"]:
        continue
    elif line.startswith("HOLT"):
        current = "HOLT"
        holt += line[4:] + " "
    elif line.startswith("TRUMP"):
        current = "TRUMP"
        trump += line[5:] + " "
    elif line.startswith("CLINTON"):
        current = "CLINTON"
        clinton += line[7:] + " "
    else:
        if current == "HOLT":
            holt += line + " "
        elif current == "TRUMP":
            trump += line + " "
        elif current == "CLINTON":
            clinton += line + " "
        else:
            raise Exception("No speaker?!")
 
holt = holt.lower()
clinton = clinton.lower()
trump = trump.lower()

# Collapse runs of spaces (left behind by the punctuation replacement)
# into single spaces.
clinton = clinton.strip()
while "  " in clinton:
    clinton = clinton.replace("  ", " ")
holt = holt.strip()
while "  " in holt:
    holt = holt.replace("  ", " ").strip()
trump = trump.strip()
while "  " in trump:
    trump = trump.replace("  ", " ").strip()
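
The lowercasing and space-collapsing step can also be done in one pass with `split` and `join`. This is a sketch of an alternative, assuming that collapsing every run of whitespace (not only spaces) is acceptable; the helper name `normalize` and the sample string are hypothetical.

```python
def normalize(text):
    """Lowercase the text and collapse any run of whitespace into one space."""
    # str.split() with no argument splits on whitespace runs and drops empties,
    # so joining the pieces with a single space also trims both ends.
    return " ".join(text.lower().split())

# Invented sample with runs of several spaces between words.
example = "Good" + " " * 3 + "evening" + " " * 2 + "from Hofstra "
normalized = normalize(example)
print(normalized)
```

Applied to each speaker's string, this would replace the three loops with calls such as `clinton = normalize(clinton)`.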

## Number of characters spoken

In [168]:
len(holt), len(trump), len(clinton)

(10400, 42263, 33173)

Next, we split each speaker's text into words:

In [169]:
clinton_words = clinton.split(" ")
trump_words = trump.split(" ")
holt_words = holt.split(" ")

## Number of total "words" spoken

In [170]:
len(clinton_words), len(trump_words), len(holt_words)

(6237, 8139, 1878)

## Number of unique words spoken

In [171]:
clinton_set = set(clinton_words)
trump_set = set(trump_words)
holt_set = set(holt_words)

In [172]:
len(clinton_set), len(trump_set)

(1379, 1269)
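
One rough way to put the unique-word counts in context is to divide them by the total words each candidate spoke (sometimes called a type-token ratio). The sketch below only reuses the numbers printed above; the dictionary names are hypothetical, and the ratio is sensitive to how much each person spoke, so it should be read as a coarse comparison rather than a measure of vocabulary.

```python
# Counts copied from the outputs above.
total_words = {"clinton": 6237, "trump": 8139}
unique_words = {"clinton": 1379, "trump": 1269}

# Unique words as a fraction of all words spoken, per candidate.
ratios = {name: unique_words[name] / total_words[name] for name in total_words}
for name, ratio in ratios.items():
    print("%s: %.3f" % (name, ratio))
```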

## Number of each word spoken, ranked from highest to lowest

In [173]:
def make_dict(words):
    d = {}
    for word in words:
        count = d.get(word, 0)
        d[word] = count + 1
    return d
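
The standard library's `collections.Counter` performs the same tally as `make_dict`. A minimal sketch follows, using an invented word list rather than the debate text:

```python
from collections import Counter

# Invented sample; in the notebook this would be clinton_words or trump_words.
sample_words = "we need jobs and we need a plan".split(" ")

counts = Counter(sample_words)    # maps each word to its number of occurrences
top_two = counts.most_common(2)   # (word, count) pairs, highest counts first
print(top_two)
```

A `Counter` also returns 0 for words it has never seen, so looking up a missing word does not raise `KeyError`.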

In [174]:
clinton_dict = make_dict(clinton_words)
trump_dict = make_dict(trump_words)

In [175]:
common_words = ["the", "to", "and", "or", "that", "of", "a", "in", "have", "it", "be",
                "am", "are", "was", "were", "been", "be", "being", "is", "do", "would",
                "but", "what", "so", "with", "about", "at", "on", "has", "can", "as",
                "because", "when", "by", "an", "for", "this"]

In [184]:
for pair in sorted([items for items in clinton_dict.items()
                    if items[0] not in common_words and items[1] > 2],
                   key=lambda pair: pair[1], reverse=True):
    print("%s: %s" % pair)

i: 138
we: 122
you: 76
he: 56
our: 42
not: 40
think: 38
well: 36
people: 32
they: 31
know: 28
going: 27
donald: 26
your: 25
one: 24
need: 23
us: 22
who: 21
more: 21
will: 21
that’s: 21
them: 21
it’s: 21
really: 20
there: 20
their: 19
want: 19
his: 18
from: 18
said: 17
if: 17
we’re: 17
country: 16
just: 16
good: 16
we’ve: 16
jobs: 16
lot: 16
tax: 16
make: 15
get: 15
new: 15
out: 15
up: 15
business: 15
got: 14
how: 14
work: 14
some: 14
should: 14
go: 13
all: 13
economy: 13
very: 13
also: 12
american: 12
had: 12
nuclear: 11
see: 11
he’s: 11
no: 11
down: 10
debt: 10
don’t: 10
look: 10
million: 10
my: 10
too: 10
into: 10
i’ve: 10
actually: 10
many: 10
kind: 10
fact: 10
important: 10
now: 9
put: 9
other: 9
police: 9
deal: 9
did: 9
again: 9
information: 9
middle: 9
say: 9
wealthy: 8
those: 8
first: 8
isis: 8
iran: 8
let’s: 8
plan: 8
president: 8
talk: 8
back: 8
over: 8
then: 8
support: 8
years: 8
much: 7
i’m: 7
lester: 7
ever: 7
things: 7
why: 7
me: 7
class: 7
something: 7
even: 7
paid: 7
sta

In [177]:
clinton_dict["china"], clinton_dict["plan"]

(3, 8)

In [178]:
trump_dict["china"], trump_dict["plan"]

(9, 3)
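
Indexing a plain dictionary with a word that a speaker never used raises `KeyError`. A small sketch of a safer lookup is shown here; `compare` is a hypothetical helper, and the two miniature dictionaries only hold the four counts printed above.

```python
# Miniature stand-ins for clinton_dict and trump_dict, using the counts above.
clinton_counts = {"china": 3, "plan": 8}
trump_counts = {"china": 9, "plan": 3}

def compare(word, first, second):
    """Return the word's count in each dictionary, using 0 when it is absent."""
    return first.get(word, 0), second.get(word, 0)

china = compare("china", clinton_counts, trump_counts)
# A word missing from these miniature dictionaries yields zeros, not an error.
missing = compare("hofstra", clinton_counts, trump_counts)
print(china, missing)
```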

In [185]:
for pair in sorted([items for items in trump_dict.items()
                    if items[0] not in common_words and items[1] > 2],
                   key=lambda pair: pair[1], reverse=True):
    print("%s: %s" % pair)

i: 229
you: 189
we: 109
it’s: 72
they: 71
very: 65
our: 55
not: 50
country: 46
going: 43
all: 41
they’re: 40
look: 40
me: 39
said: 37
think: 37
them: 35
just: 35
she: 33
don’t: 32
will: 32
say: 32
i’m: 29
that’s: 29
people: 28
no: 28
doing: 27
out: 27
know: 27
get: 26
secretary: 25
clinton: 25
should: 25
one: 25
we’re: 24
many: 24
want: 23
years: 22
now: 21
thing: 21
things: 21
did: 21
their: 20
like: 20
your: 20
can’t: 20
if: 20
much: 20
other: 20
companies: 20
her: 20
jobs: 19
good: 19
really: 19
well: 19
way: 19
my: 19
some: 19
go: 18
these: 18
from: 17
money: 17
tell: 17
new: 17
over: 16
great: 16
you’re: 16
into: 15
believe: 15
time: 15
lot: 15
tax: 15
us: 15
up: 15
could: 15
leaving: 15
bad: 14
ever: 14
world: 14
deal: 14
isis: 14
he: 14
against: 14
agree: 13
got: 13
more: 13
first: 13
bring: 13
which: 13
back: 13
war: 12
trillion: 12
she’s: 12
i’ll: 12
right: 12
lester: 12
even: 12
than: 12
wrong: 12
countries: 12
where: 12
see: 11
done: 11
had: 11
how: 11
tremendous: 11
take: 1

## How often did they speak

In [180]:
text_all.count("TRUMP")

126

In [181]:
text_all.count("CLINTON")

93
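
Note that `text_all.count(...)` counts every occurrence of the uppercase name anywhere in the text. A sketch of a stricter count, which only looks at lines that begin with the speaker label, is given below. The helper `count_turns` and the sample lines are hypothetical and are not taken from the transcript.

```python
# Invented lines in the same cleaned format as the `text` list.
sample_lines = [
    "HOLT first question",
    "CLINTON first answer",
    "(APPLAUSE)",
    "TRUMP first answer",
    "continuation of the same answer",
    "CLINTON second answer",
]

def count_turns(lines, speaker):
    """Count the lines that begin with the given uppercase speaker label."""
    return sum(1 for line in lines if line.startswith(speaker))

clinton_turns = count_turns(sample_lines, "CLINTON")
trump_turns = count_turns(sample_lines, "TRUMP")
print(clinton_turns, trump_turns)
```

On the real data this would be called as `count_turns(text, "TRUMP")`; whether it gives the same numbers as the counts above depends on whether the uppercase labels ever appear mid-line.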