US Presidential Debate Sep 26 2016

This is a Jupyter notebook detailing an analysis of the language used in the first debate.

Prepared by Doug Blank, Bryn Mawr College
For full discussion, see: http://blankversusblank.blogspot.com/2016/09/post-debate-analysis.html

Data from http://www.nytimes.com/2016/09/27/us/politics/transcript-debate.html

The transcript contained some errors, which I corrected; the corrected version is available here as first_debate.txt.

First, we read the text into a list of lines, replacing punctuation with spaces:

In [165]:
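# Replace punctuation with spaces so words separate cleanly when split later.
# The with-block closes the file once the lines have been read.
with open("first_debate.txt") as f:
    text = [line.strip().replace("\n", " ").replace(".", " ").replace("?", " ")
            .replace("“", " ").replace("”", " ").replace(":", " ")
            .replace(",", " ").replace("—", " ").replace("-", " ")
            for line in f]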
text_all = " ".join(text)
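The chain of replace calls above could also be written with a translation table. The following is a minimal sketch of that alternative, not the notebook's own code; the names CHARS_TO_BLANK, TABLE, and clean_line are made up for illustration, and the sample line is invented.

```python
# Hypothetical alternative to the chained .replace() calls:
# map each character in this set to a single space.
CHARS_TO_BLANK = "\n.?“”:,—-"
TABLE = str.maketrans({ch: " " for ch in CHARS_TO_BLANK})

def clean_line(line):
    """Strip the line, then replace each listed character with a space."""
    return line.strip().translate(TABLE)

sample = "HOLT: Good evening, from Hofstra University."  # invented sample line
print(repr(clean_line(sample)))
```

Because each character is replaced by exactly one space, the doubled spaces seen in the sample output below would still appear with this version.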

A sample to see what it looks like:

In [166]:
text[0]
Out[166]:
'HOLT  Good evening from Hofstra University in Hempstead  New York  I’m Lester Holt  anchor of  NBC Nightly News   I want to welcome you to the first presidential debate '

Now, we break it down by speaker:

In [167]:
holt = ""
clinton = ""
trump = ""

current = None
for line in text:
    if not line:
        continue
    elif line in ["(APPLAUSE)", "(CROSSTALK)", "(LAUGHTER)"]:
        continue
    elif line.startswith("HOLT"):
        current = "HOLT"
        holt += line[4:] + " "
    elif line.startswith("TRUMP"):
        current = "TRUMP"
        trump += line[5:] + " "
    elif line.startswith("CLINTON"):
        current = "CLINTON"
        clinton += line[7:] + " "
    else:
        if current == "HOLT":
            holt += line + " "
        elif current == "TRUMP":
            trump += line + " "
        elif current == "CLINTON":
            clinton += line + " "
        else:
            raise Exception("No speaker?!")
            
def normalize_spaces(s):
    # Lowercase, trim, and collapse runs of spaces into a single space.
    s = s.lower().strip()
    while "  " in s:
        s = s.replace("  ", " ")
    return s

holt = normalize_spaces(holt)
clinton = normalize_spaces(clinton)
trump = normalize_spaces(trump)
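The speaker-splitting logic can also be expressed with a dictionary keyed by speaker label instead of three separate strings. This is a sketch under assumptions rather than the notebook's method: split_by_speaker, SPEAKERS, and STAGE_DIRECTIONS are hypothetical names, the demo lines are invented, and str.split() is used to collapse whitespace, which also collapses tabs (the cell above only collapses spaces).

```python
from collections import defaultdict

SPEAKERS = ("HOLT", "TRUMP", "CLINTON")
STAGE_DIRECTIONS = {"(APPLAUSE)", "(CROSSTALK)", "(LAUGHTER)"}

def split_by_speaker(lines):
    """Group cleaned transcript lines under the most recent speaker label."""
    spoken = defaultdict(list)
    current = None
    for line in lines:
        if not line or line in STAGE_DIRECTIONS:
            continue
        for name in SPEAKERS:
            if line.startswith(name):
                current = name
                line = line[len(name):]  # drop the label itself
                break
        if current is None:
            raise ValueError("No speaker?!")
        spoken[current].append(line)
    # Lowercase and collapse whitespace for each speaker's text.
    return {name: " ".join(" ".join(parts).lower().split())
            for name, parts in spoken.items()}

# Invented demo lines in the same shape as the cleaned transcript.
demo = ["HOLT  Good evening ", "", "(APPLAUSE)",
        "CLINTON  Thank you  Lester ", "How are you  Donald ",
        "TRUMP  Thank you "]
result = split_by_speaker(demo)
print(result)
```

A line without a label is attributed to whoever spoke last, which matches the behavior of the else branch in the cell above.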

Number of characters spoken by each (Holt, Trump, Clinton):

In [168]:
len(holt), len(trump), len(clinton)
Out[168]:
(10400, 42263, 33173)

And split the text into words:

In [169]:
clinton_words = clinton.split(" ")
trump_words = trump.split(" ")
holt_words = holt.split(" ")

Total number of "words" spoken (Clinton, Trump, Holt):

In [170]:
len(clinton_words), len(trump_words), len(holt_words)
Out[170]:
(6237, 8139, 1878)

Number of unique words spoken (Clinton, Trump):

In [171]:
clinton_set = set(clinton_words)
trump_set = set(trump_words)
holt_set = set(holt_words)
In [172]:
len(clinton_set), len(trump_set)
Out[172]:
(1379, 1269)
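One way to put the unique-word counts in context is to divide them by the total word counts reported earlier. This ratio is not part of the original analysis; it is a small worked calculation on the figures shown above, and unique_ratio is a hypothetical helper name. Since the ratio tends to fall as a text gets longer, it should be read with caution when the two speakers' totals differ.

```python
# Counts copied from the outputs above: (total words, unique words).
counts = {"clinton": (6237, 1379), "trump": (8139, 1269)}

def unique_ratio(total, unique):
    """Fraction of the spoken words that are distinct."""
    return unique / total

for name, (total, unique) in counts.items():
    print(name, round(unique_ratio(total, unique), 3))
```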

Number of times each word was spoken, ranked from highest to lowest, first for Clinton. Common words and any word used fewer than three times are excluded:

In [173]:
def make_dict(words):
    d = {}
    for word in words:
        count = d.get(word, 0)
        d[word] = count + 1
    return d
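The standard library's collections.Counter does the same tallying as make_dict. The sketch below uses an invented sentence rather than the debate text, and is offered as an equivalent, not as what the notebook ran.

```python
from collections import Counter

# Invented sample; the notebook would pass clinton_words or trump_words.
words = "we need jobs and we need growth".split(" ")
counts = Counter(words)

print(counts["we"], counts["jobs"])
print(counts["china"])  # a Counter reports 0 for words it has not seen
```

A Counter is a dict subclass, so code that calls .items() on the result should work unchanged.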
In [174]:
clinton_dict = make_dict(clinton_words)
trump_dict = make_dict(trump_words)
In [175]:
common_words = ["the", "to", "and", "or", "that", "of", "a", "in", "have", "it", "be",
                "am", "are", "was", "were", "been", "being", "is", "do", "would",
                "but", "what", "so", "with", "about", "at", "on", "has", "can", "as",
                "because", "when", "by", "an", "for", "this"]
In [184]:
for pair in sorted([items for items in clinton_dict.items() if items[0] not in common_words and
                   items[1] > 2],
                   key=lambda pair: pair[1], reverse=True):
    print("%s: %s" % pair)
i: 138
we: 122
you: 76
he: 56
our: 42
not: 40
think: 38
well: 36
people: 32
they: 31
know: 28
going: 27
donald: 26
your: 25
one: 24
need: 23
us: 22
who: 21
more: 21
will: 21
that’s: 21
them: 21
it’s: 21
really: 20
there: 20
their: 19
want: 19
his: 18
from: 18
said: 17
if: 17
we’re: 17
country: 16
just: 16
good: 16
we’ve: 16
jobs: 16
lot: 16
tax: 16
make: 15
get: 15
new: 15
out: 15
up: 15
business: 15
got: 14
how: 14
work: 14
some: 14
should: 14
go: 13
all: 13
economy: 13
very: 13
also: 12
american: 12
had: 12
nuclear: 11
see: 11
he’s: 11
no: 11
down: 10
debt: 10
don’t: 10
look: 10
million: 10
my: 10
too: 10
into: 10
i’ve: 10
actually: 10
many: 10
kind: 10
fact: 10
important: 10
now: 9
put: 9
other: 9
police: 9
deal: 9
did: 9
again: 9
information: 9
middle: 9
say: 9
wealthy: 8
those: 8
first: 8
isis: 8
iran: 8
let’s: 8
plan: 8
president: 8
talk: 8
back: 8
over: 8
then: 8
support: 8
years: 8
much: 7
i’m: 7
lester: 7
ever: 7
things: 7
why: 7
me: 7
class: 7
something: 7
even: 7
paid: 7
state: 7
states: 7
him: 7
working: 7
pay: 7
trade: 7
world: 7
time: 7
proposed: 7
only: 7
communities: 7
home: 7
where: 7
taken: 6
everyone: 6
top: 6
national: 6
young: 6
percent: 6
everything: 6
having: 6
thing: 6
man: 6
money: 6
called: 6
which: 6
they’ve: 6
part: 6
we’ll: 6
made: 6
different: 6
nations: 6
weapons: 6
take: 6
you’re: 6
any: 6
trying: 6
government: 6
after: 6
worked: 6
able: 6
better: 6
against: 6
cyber: 5
returns: 5
obama: 5
two: 5
future: 5
plans: 5
come: 5
iraq: 5
there’s: 5
right: 5
heard: 5
united: 5
kinds: 5
never: 5
these: 5
together: 5
maybe: 5
give: 5
fair: 5
system: 5
believe: 5
number: 5
done: 5
federal: 5
sure: 5
law: 5
long: 5
both: 5
justice: 5
use: 5
incomes: 5
looked: 5
hope: 5
let: 5
doing: 4
like: 4
secretary: 4
saying: 4
benefit: 4
facing: 4
started: 4
facts: 4
troops: 4
york: 4
black: 4
5: 4
does: 4
clean: 4
says: 4
families: 4
hack: 4
clear: 4
provide: 4
went: 4
finally: 4
worst: 4
best: 4
zero: 4
america: 4
determines: 4
debate: 4
deals: 4
means: 4
security: 4
gun: 4
try: 4
understand: 4
they’re: 4
way: 4
real: 4
donald’s: 4
family: 4
taxes: 4
nato: 4
military: 4
help: 4
foreign: 4
attacks: 4
question: 4
trillion: 4
buy: 4
sometimes: 4
making: 4
here: 4
biggest: 4
you’ve: 4
job: 4
reasons: 4
around: 4
russia: 4
hard: 4
year: 4
invest: 4
most: 4
off: 4
same: 4
growth: 4
energy: 4
businesses: 4
small: 4
add: 4
under: 4
away: 4
could: 4
trickle: 4
great: 4
create: 4
still: 4
second: 4
issues: 4
problems: 4
recession: 3
street: 3
father: 3
grow: 3
seen: 3
intelligence: 3
40: 3
another: 3
asked: 3
half: 3
americans: 3
deserve: 3
course: 3
vote: 3
opportunities: 3
building: 3
lose: 3
installers: 3
financial: 3
white: 3
start: 3
private: 3
live: 3
muslim: 3
near: 3
may: 3
she: 3
before: 3
keep: 3
every: 3
race: 3
sailors: 3
stand: 3
life: 3
prepared: 3
responsibilities: 3
college: 3
matter: 3
remember: 3
took: 3
her: 3
release: 3
budget: 3
crime: 3
face: 3
criminal: 3
investments: 3
china: 3
racist: 3
though: 3
war: 3
build: 3
met: 3
policy: 3
voted: 3
tried: 3
health: 3
men: 3
education: 3
efforts: 3
leadership: 3
putin: 3
lie: 3
$5: 3
abroad: 3
absolutely: 3
rising: 3
enough: 3
else: 3
word: 3
problem: 3
barack: 3
can’t: 3
intend: 3
african: 3
happen: 3
share: 3
east: 3
unfortunately: 3
ways: 3
defeat: 3
For comparison, how often each candidate said "china" and "plan":

In [177]:
clinton_dict["china"], clinton_dict["plan"]
Out[177]:
(3, 8)
In [178]:
trump_dict["china"], trump_dict["plan"]
Out[178]:
(9, 3)
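Indexing a plain dict with a word that a speaker never used raises KeyError. A minimal sketch of a safer lookup follows; word_count is a hypothetical helper, and sample_dict holds only the two Clinton counts shown above.

```python
def word_count(counts, word):
    """Return how often `word` occurs, or 0 if it was never spoken."""
    return counts.get(word, 0)

# Only the two Clinton counts shown above; the real dict has many more keys.
sample_dict = {"china": 3, "plan": 8}

print(word_count(sample_dict, "china"), word_count(sample_dict, "plan"))
print(word_count(sample_dict, "tremendous"))  # absent from sample_dict
```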
The same ranking for Trump:

In [185]:
for pair in sorted([items for items in trump_dict.items() if items[0] not in common_words and
                   items[1] > 2],
                    key=lambda pair: pair[1], reverse=True):
    print("%s: %s" % pair)
i: 229
you: 189
we: 109
it’s: 72
they: 71
very: 65
our: 55
not: 50
country: 46
going: 43
all: 41
they’re: 40
look: 40
me: 39
said: 37
think: 37
them: 35
just: 35
she: 33
don’t: 32
will: 32
say: 32
i’m: 29
that’s: 29
people: 28
no: 28
doing: 27
out: 27
know: 27
get: 26
secretary: 25
clinton: 25
should: 25
one: 25
we’re: 24
many: 24
want: 23
years: 22
now: 21
thing: 21
things: 21
did: 21
their: 20
like: 20
your: 20
can’t: 20
if: 20
much: 20
other: 20
companies: 20
her: 20
jobs: 19
good: 19
really: 19
well: 19
way: 19
my: 19
some: 19
go: 18
these: 18
from: 17
money: 17
tell: 17
new: 17
over: 16
great: 16
you’re: 16
into: 15
believe: 15
time: 15
lot: 15
tax: 15
us: 15
up: 15
could: 15
leaving: 15
bad: 14
ever: 14
world: 14
deal: 14
isis: 14
he: 14
against: 14
agree: 13
got: 13
more: 13
first: 13
bring: 13
which: 13
back: 13
war: 12
trillion: 12
she’s: 12
i’ll: 12
right: 12
lester: 12
even: 12
than: 12
wrong: 12
countries: 12
where: 12
see: 11
done: 11
had: 11
how: 11
tremendous: 11
take: 11
better: 10
nato: 10
i’ve: 10
also: 10
hillary: 10
politicians: 10
why: 10
there: 10
let: 10
stop: 10
president: 10
doesn’t: 10
taken: 9
there’s: 9
trade: 9
job: 9
didn’t: 9
you’ve: 9
never: 9
him: 9
china: 9
times: 9
maybe: 9
whether: 8
big: 8
trump: 8
come: 8
taking: 8
far: 8
then: 8
almost: 8
campaign: 8
russia: 8
company: 8
community: 8
talking: 8
deals: 8
iran: 8
need: 8
sean: 8
nuclear: 8
happened: 8
regulations: 7
help: 7
obama: 7
greatest: 7
who: 7
saying: 7
nafta: 7
started: 7
down: 7
work: 7
nobody: 7
haven’t: 7
defend: 7
give: 7
experience: 7
nothing: 7
before: 7
30: 7
middle: 7
able: 7
worst: 7
under: 7
paying: 7
make: 7
fact: 7
hannity: 7
day: 7
business: 7
losing: 7
wait: 7
taxes: 7
last: 6
seen: 6
korea: 6
mean: 6
everybody: 6
endorsed: 6
ok: 6
he’s: 6
website: 6
long: 6
another: 6
$20: 6
oil: 6
respond: 6
spent: 6
something: 6
problem: 6
little: 6
debate: 6
political: 6
mess: 6
happen: 6
watch: 6
mexico: 6
release: 6
east: 6
important: 6
laws: 5
cyber: 5
somebody: 5
hundreds: 5
terms: 5
strongly: 5
put: 5
probably: 5
billions: 5
nation: 5
york: 5
million: 5
fed: 5
biggest: 5
north: 5
after: 5
debt: 5
everything: 5
anywhere: 5
used: 5
called: 5
stamina: 5
land: 5
dollars: 5
income: 5
question: 5
major: 5
formed: 5
single: 5
percent: 5
off: 5
true: 5
disaster: 5
only: 5
cannot: 5
fight: 5
i’d: 5
10: 5
old: 5
talks: 5
we’ve: 5
approve: 5
getting: 5
name: 5
made: 5
list: 5
african: 5
murders: 5
totally: 5
four: 5
lots: 5
returns: 5
temperament: 5
care: 5
certainly: 5
states: 4
wealthy: 4
places: 4
certificate: 4
different: 4
interest: 4
supposed: 4
500: 4
number: 4
read: 4
asked: 4
american: 4
thinking: 4
ask: 4
soon: 4
frisk: 4
around: 4
raise: 4
produce: 4
place: 4
sent: 4
advantage: 4
perhaps: 4
year: 4
during: 4
lists: 4
lawsuit: 4
best: 4
any: 4
billion: 4
proud: 4
expand: 4
anybody: 4
gave: 4
actually: 4
iraq: 4
cut: 4
we’ll: 4
find: 4
minute: 4
airports: 4
relationships: 4
does: 4
once: 4
two: 4
audited: 4
brought: 4
japan: 4
kind: 4
real: 4
budget: 4
birth: 4
excuse: 4
what’s: 4
week: 4
talk: 4
they’ve: 4
needs: 4
inner: 4
cities: 4
left: 4
learn: 4
tough: 4
too: 4
build: 3
beautiful: 3
night: 3
myself: 3
his: 3
article: 3
audit: 3
feel: 3
15: 3
coming: 3
concerned: 3
reason: 3
energy: 3
yes: 3
wonder: 3
avenue: 3
met: 3
releases: 3
trying: 3
created: 3
unbelievable: 3
happy: 3
strong: 3
defective: 3
quickly: 3
5: 3
fine: 3
banks: 3
800: 3
course: 3
shows: 3
small: 3
call: 3
interview: 3
200: 3
$650: 3
badly: 3
michael: 3
control: 3
blumenthal: 3
mine: 3
life: 3
roads: 3
terror: 3
000: 3
looks: 3
e: 3
bit: 3
start: 3
thousands: 3
dnc: 3
hard: 3
telling: 3
arabia: 3
communities: 3
schedule: 3
pennsylvania: 3
history: 3
opening: 3
united: 3
plants: 3
isn’t: 3
economy: 3
cost: 3
millions: 3
told: 3
numbers: 3
own: 3
assets: 3
thinks: 3
same: 3
impact: 3
reporter: 3
estate: 3
every: 3
credit: 3
went: 3
anything: 3
saudi: 3
within: 3
approved: 3
person: 3
d: 3
keeping: 3
election: 3
end: 3
admirals: 3
fifth: 3
manager: 3
donald: 3
yeah: 3
mails: 3
since: 3
internet: 3
certain: 3
mainstream: 3
fault: 3
plan: 3
order: 3
ohio: 3
shear: 3
ahead: 3
ago: 3
winning: 3
win: 3
signed: 3
love: 3
nice: 3
report: 3
truth: 3
defending: 3
terrible: 3
fly: 3
obama’s: 3
family: 3
sell: 3
rates: 3
pay: 3
frankly: 3
oh: 3
most: 3
story: 3
power: 3
audit’s: 3
worth: 3
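Both rankings use the same filter-and-sort expression, so it could be factored into one function. This is a sketch with hypothetical names (top_words, min_count) and invented sample counts; min_count=3 mirrors the "count greater than 2" condition in the cells above.

```python
def top_words(counts, stop_words, min_count=3):
    """Return (word, count) pairs, most frequent first, skipping stop words."""
    stop = set(stop_words)  # set membership is faster than scanning a list
    pairs = [(word, n) for word, n in counts.items()
             if word not in stop and n >= min_count]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

# Invented counts for illustration only.
sample_counts = {"the": 50, "we": 12, "jobs": 5, "china": 3, "plan": 2}
ranking = top_words(sample_counts, ["the", "and"])
for pair in ranking:
    print("%s: %s" % pair)
```

With the notebook's variables, the calls would be top_words(clinton_dict, common_words) and top_words(trump_dict, common_words).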

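How often did each candidate speak? Counting occurrences of each uppercase speaker label in the transcript approximates the number of speaking turns: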

In [180]:
text_all.count("TRUMP")
Out[180]:
126
In [181]:
text_all.count("CLINTON")
Out[181]:
93
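Counting substrings also counts any uppercase "TRUMP" or "CLINTON" that appears inside a sentence. A stricter variant, sketched here with a hypothetical count_turns function and invented demo lines, counts only lines that begin with the label.

```python
def count_turns(lines, speaker):
    """Count lines that begin with the given speaker label."""
    return sum(1 for line in lines if line.startswith(speaker))

# Invented lines; the third one contains the label mid-sentence.
demo_lines = ["TRUMP  Thank you  Lester ",
              "CLINTON  Well  Donald ",
              "CLINTON  the sign said TRUMP ",
              "TRUMP  Wrong "]

print(count_turns(demo_lines, "TRUMP"))
print(" ".join(demo_lines).count("TRUMP"))
```

On the notebook's data the call would be count_turns(text, "TRUMP"); whether it differs from the substring count depends on the transcript.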