So Many Hardballs, So Few over the Plate
Conclusions from our “Debate” with Donald Foster
Ward E.Y. Elliott (1), Robert J. Valenza (2)
(1) Claremont (ward.elliott@claremontmckenna.edu)
Abstract. Foster’s critique of our work is overdrawn and has left our findings 99.9% intact.
Key words. Stylometry, Shakespeare authorship, Elizabethan poems, Elizabethan plays
Editor’s note
In 1996 and 1998, CHum published several articles in a heated
debate between Donald Foster and the authors of this article. This note is the
authors’ final response. -- N.I. and
E.M.
1. Was Our Study Foul Vapor?
This is the
last of four “Responses”, two by Donald Foster, two by us, to the final report
of our Claremont Shakespeare Clinic, published by CHum in 1996/97 after Foster had warned us that publishing our
results would destroy our reputations. When we persevered, Foster arranged for CHum to publish his Response pronouncing
our work “worthless”, “madness”, and “foul vapor”.
We checked out his charges and found
him to be dead right -- on four small points.
We missed three whenas’s, one whereas, and an I’m in Shakespeare’s plays, and there was a minor glitch in one of
our programs, Textcruncher, which threw off some of our counts by a percent or
two. All of these combined amounted to
an error of a tenth of a percent in our original findings. We fixed them all
and published the corrected figures in our First Response (our 1998/1999) with
no changes in our overall conclusions. None of Foster’s other charges were
substantiated. Some were plainly wrong.
Foster’s 21-page Second Response
(his 1998/1999) was much longer than his first, but equally harsh in its rhetoric, and equally
lacking in substantiation. CHum has
strictly limited our response to it here to eight pages, but there is a longer
version on our webpage.[1] In his Second Response, Foster reaffirmed his
prior charges of our “cherry picking” and “stacking the deck” and added harsh
new complaints of our “idiocy”, our “arbitrary and chaotic handling of ...
data”, our “astonish[ing] ... methodological sloppiness”, our “toddling toward
a precipice from day one”, our “silent and extensive alteration of data”, our
“defamatory” and “assaultive” article full of “invented quotations”, our
“gerrymandering”, and our “ventriloquized self-flagellation”.
Little of this invective can stand
scrutiny. Foster has had six years to
specify which of our quotations are “invented” and what he thinks he actually
did say, but he has never done so. He
has quietly redefined his charges of “cherry-picking,” “deck-stacking,” and
“gerrymandering”, but it hasn’t helped his case. The old problem was that we “banished” plays
he himself had told us were of questionable authorship from our Shakespeare
baseline. He dismissed our recollection
on this point as “quite mistaken” but immediately conceded that all but one of
our “banished” plays were “widely considered by scholars to be
non-Shakespearean” (his 1998/1999, 509, n. 12). Just so. He was right on the concession, wrong on the
charge.
His new concept of “deck-stacking” seems to be that A Lover's Complaint and FE pass many of the “original 54 tests
for which Venus and Adonis, The Rape of
Lucrece, and the Sonnets receive
‘not-Shakespeare’ rejections”. The problem here, however, is not us stacking
the deck but him trying to deal CHum
readers cards from his own deck as if they came from ours. All quantitative tests are sensitive to
sample size. The bigger the sample, the
more variance it averages out, and the more tests can be used. Many tests that
work on big, 20,000-word play-sized blocks don’t work on little, 3,000-word
blocks and should not be so misapplied. We have made this point four times to CHum readers (our 1996, 204-05, our
1998/1999, 430-31, 435, 442), and carefully avoided confusing large-block
validations with small-block. Foster was
not so careful. He is blaming us for a
test we did not use and should not have used.
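The sample-size point can be put in concrete terms with a small simulation. This is our own illustrative sketch, not any of the Clinic’s actual tests; the marker rate and block sizes are hypothetical. It shows that the observed rate of a rare marker word swings far more widely in 3,000-word blocks than in 20,000-word blocks, so an acceptance band calibrated on play-sized samples will misjudge many genuine small samples.

```python
import random

random.seed(0)
P = 0.005  # hypothetical per-word rate of some rare marker word

def observed_rate(n_words):
    """Observed marker rate in one simulated block of n_words."""
    hits = sum(1 for _ in range(n_words) if random.random() < P)
    return hits / n_words

def spread(n_words, trials=200):
    """Range of observed rates across many same-sized authentic blocks."""
    rates = [observed_rate(n_words) for _ in range(trials)]
    return max(rates) - min(rates)

# Small blocks scatter far more widely around the true rate, so an
# acceptance band tuned on 20,000-word blocks wrongly rejects many
# genuine 3,000-word blocks.
assert spread(3_000) > spread(20_000)
```

The averaging-out of variance with block size is exactly why a test validated only on play-sized samples should not be applied to poem-sized ones.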
The problem with our sinister-sounding “silent and extensive alteration
of data” turns out to be that, after several years of corrections and
refinements, our tests have gotten better
and better at accepting Shakespeare and rejecting non-Shakespeare. Now we don’t just get 100% reliability in accepting core Shakespeare and rejecting non-Shakespeare from all three test rounds combined; we also get better than 95% reliability in rejecting non-Shakespeare from each round separately, while still getting 100% reliability in accepting core Shakespeare. This growing redundancy and robustness should be
cause for congratulation, not condemnation.
As for the “defamatory personal charges,” in our “assaultive article”,
Foster has it backwards. We haven’t
called his work idiocy or foul vapor, nor have we falsely accused him of
deck-stacking or inventing quotations. Dat veniam corvis, vexat censura columbas: censure pardons the crows but harries the doves.
So much for his four most damning charges. We are not remotely guilty of any of
them. He also makes nine lesser
charges: that our “literary advisors” dismissed the Clinic as idiocy; that our
copytexts were “never…edited”; that we improperly “suppressed” BoB4 but failed
to suppress our other BoB tests; that we used tests like O v. Oh, reflecting an editor’s, not the author’s, preference; that we
didn’t provide tallies for relative clauses but nonetheless miscounted them by
“as much as 50%”; that our claimed Textcruncher error for no + not is not 9/10 of a percent, but nine percent; that we
“simply forgot” the two not’s in FE’s prose dedication; that we “clearly
misunderstood” our leaning microphrase tests; and that we miscounted whereas’s, whenas’s, first- and last-word it’s,
hark’s, list’s, and see’s.
The first six of these are completely wrong. The people he referred to were not our
literary advisors and did not bail, as he erroneously claims, though they did
supply – but not substantiate -- the spicy adjectives he gleefully embraces.
Our comparison texts – unlike Foster’s in his Elegy by W.S. -- were carefully edited to commonize spelling,
standardize sample size, and separate prose from verse. But they were not aggressively repunctuated,
as Foster wants us to do, in keeping with his own aggressive editing of FE.
He raised the Elegy’s average sentence length by 44 percent, and more
than doubled its percentage of run-on (enjambed) lines. Then he announced that
its resultant long sentences and high enjambment rates were sure signs that
Shakespeare must have written it! We
think the hazards of such editing far outweigh its benefits.
We covered both of his BoB points in our First Response (our 1998/1999,
pp. 432-37) and shall not repeat our arguments here. He has responded to none of the points we
made. His Oh v. O and two related criticisms again attack us for a test we
didn’t use in our CHum report. It might have been a good point in 1990 when
we did try these tests, before discarding them for the reasons he cites. It’s not such a good point nine years later.
His relative-clause charges are self-contradictory and wholly without
substantiation. 9/1,000’s is nine-tenths of a percent, just as we said, not
nine percent, as he maintains. He thinks we should have counted two not’s in FE’s prose dedication. But
he has elsewhere chastised us for our supposed “disregard for prosody and genre”
(his 1996, 248, his 1998/1999, 505). We
would urge him to follow his own rules, as we did in this case, and count only
the indicators in the body of the poem.
Foster made our leaning-microphrase tests a rhetorical hot-spot issue
in his First Response, dismissing them as “foul vapor.” In his hardball Second
Response he used similar harsh language, but, at the end – to our surprise –
seems to have conceded the point and dismissed the whole question as “much ado
about nothing” (his 1998/1999, 505).
In our First Response we conceded that Foster was partially right about
his last point. Using fuller context and reclassifying some of the disputed counts, we corrected the handful of miscounts involved in our First Response.
If debates have any value, it is to expose both sides’ arguments and
evidence to an opposing viewpoint, revealing otherwise-undetected errors, which
can then be corrected. By our count,
over the whole debate, Foster has charged us with over 30 errors, most of them
serious enough, we take it, to confirm his diagnosis of idiocy on our part. We
have admitted to four of these errors, but they are all trivial. They are the Textcruncher glitch, the BoB
dating qualification, and our undercounts of I’m’s by one and whereas/whenas’s
by four. These four, all of which we corrected in our First Response, made no
more than a 1/10 of one percent change in our overall results, a rather modest
change, we would think, considering the harshness of the accusations it must be
measured against. In all, Foster got only four of his 30-plus hardballs over
the plate, a disturbingly high error rate for someone as intolerant of errors
as he professes to be.
He, by contrast, has made over 40 errors of his own, about half in each response, and about half of them major. We have already discussed his first-response errors (our 1998/1999, 440-44). Among the major errors in his Second Response, he: falsely claimed our copytexts were “unedited”; wrongly held that BoB4 was improperly “suppressed” and that the other BoB’s were redundant and should have been suppressed; wrongly held that we “clearly misunderstand” our own leaning microphrase tests; failed repeatedly, again, to allow for sample size; dredged up his previous false charges of “deck-stacking” our baseline; falsely tried to deny his own 1987 Dubitanda selections; never seems to have thought about how his criticism of our tests might bear on his own tests; and -- about a half-dozen times -- tried to nail us for tests we didn’t use.
The bad news from this exchange is that we and our students got two
undeserved bashings for ten years of good work and have had to spend five more
years picking off the mud, scrutinizing it, and, in the end, defending
ourselves, at CHum’s request, much
more briefly and bluntly than we would have preferred. The good news is that the debate did force us
to reexamine our methods and findings (and Foster’s), especially those most
pertinent to FE. It highlighted important differences between our
approach and his. We have commonized our comparative samples systematically; he
hasn’t. We have controlled for sample
and baseline size; he hasn’t. We have
controlled consistently for date and genre; he hasn’t. We have explained and
supported every step of our analysis. He
hasn’t. When good evidence shows we have made a substantive mistake, we have
admitted it and fixed it. He hasn’t.
Above all, we have
used “silver-bullet” tests, tending to disprove common authorship, while he has
used “smoking-gun” tests, seeking to prove it. Silver-bullet tests are orders
of magnitude more reliable, both in theory and in practice. If your foot is size five and fits
Cinderella’s glass slipper, it does not prove that you are Cinderella. You could just as well be Little Miss
Muffet. But, if your foot is size ten,
it is strong evidence that you are not
Cinderella (our 1997, 183-85). “Could be” is never as strong a proof as
“couldn’t be” is a disproof.
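The asymmetry can be stated in rough likelihood-ratio terms. The numbers below are our illustrative assumptions, not measured values: combined odds multiply across independent tests, so a few strong “couldn’t be” rejections swamp many weak “could be” matches.

```python
# Hedged sketch with hypothetical likelihood ratios: each test
# multiplies the running odds of common authorship.
smoking_gun_matches = [2.0] * 10        # ten matches, each a weak 2:1 in favor
silver_bullet_misses = [1 / 50.0] * 4   # four misses, each a strong 50:1 against

odds = 1.0
for lr in smoking_gun_matches + silver_bullet_misses:
    odds *= lr

# 2**10 / 50**4 = 1024 / 6,250,000: overwhelmingly against common
# authorship, despite the many individual matches.
assert odds < 1e-3
```

On these assumed numbers, ten consistent matches cannot rescue an ascription that has taken even four strong silver-bullet hits, which is the sense in which such tests are orders of magnitude more powerful.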
The other good news is
that, after two years of determined bashing by an authorship blackbelt who was
not pulling any punches, our work did not shatter. 99.9% of our original results still stand,
and we have fixed the .1% that were wrong with no change in the bottom
line. This is not, of course, to say
that Foster’s failed assault has eliminated all our expectations of further
erosion. Some erosion still seems
inevitable for methods as sweeping, novel, and experimental as ours, but it is
more likely to come from close focus on one or two disputed texts than from a
global assault like Foster’s CHum
Responses.
2. Did Shakespeare Write the Funeral Elegy?
What about the other
half of the debate, the one arguing whether or not Shakespeare wrote the Elegy? This one, never far below the surface of the CHum debate, was conducted explicitly in
PMLA and in Elegy by W.S. by Foster, and in The
Shakespeare Quarterly and Literary and Linguistic Computing by us (Foster, 1996a, 1989; our
1997, 2001). Here, too, despite Foster’s
ringing proclamations that FE is
“Shakespeare’s beyond all reasonable doubt”, it seemed to us that his
ascription was in big trouble. It now
seems so to him as well (below). FE
is indeed loaded with “smoking-gun” features that W.S. shared with
Shakespeare. But it is even more loaded
with features not shared with
Shakespeare: enclitic, proclitic, and no/no+not
scores far below Shakespeare’s range, and odd, un-Shakespearean usages, such as
adventer instead of adventure, an husband, instead of a
husband, thank (noun) instead of thanks (noun), and none other instead of no
other (our 1997). Each of these is a
silver bullet in the Shakespeare ascription. The patient might survive two or
three such hits, but a dozen such hits are far too many for a cheerful
prognosis. Brian Vickers (forthcoming) and
Gilles Monsarrat (2002) now argue from smoking-gun evidence that FE is also much more loaded with
features shared with John Ford than with features shared with Shakespeare. Our studies show that the odds of Shakespeare
authorship are 3,000 times worse than the odds for Ford (our 2001). In June,
2002, without having read Vickers and with no direct mention of us, but
supposedly convinced by Monsarrat, Foster finally publicly conceded that Ford
was the obvious author.
Does this mean that
Foster’s Elegy by W.S. is no longer a
trove of authorship lore, that SHAXICON is worthless, that Foster’s golden ear
for authorship is a myth? Not at all. Elegy by W.S. is wrong about the Elegy,
but right about much else. SHAXICON
remains an inspired idea not yet verified.
Foster’s computer-aided ear was gold for the author of Primary Colors and the scholarly readers
of his manuscript for Elegy by W.S. It was tin for the Elegy and for the author
of the JonBenét Ramsey ransom note.[2] We are less wedded than Foster to the maxim, erratum in uno, erratum in omnibus (an error in one thing, an error in everything)[3]
and more inclined to believe that a certain amount of error and uncertainty
come with the territory and with the novel methods we (and Foster) were trying
out.
For us, the good part
of the debate was tracking Shakespeare with powerful new tools. The bad part was scraping off the mud. Looking for Shakespeare’s hand in a dreary,
pietistic Ford Elegy is hardly as inspirational as looking for it in the Sonnets, or even looking for it in the
poems of Spenser, Donne, or Marlowe. We
once likened our controversy with Foster over the Elegy to a land war in Asia
over the literary equivalent of the
On the other hand, the
Notes

[1] The longer rejoinder, our 2000, available in hardcopy on request, can be found at http://govt.claremontmckenna.edu/welliott/hardball.htm. For a history of the Shakespeare Clinic, see http://govt.claremontmckenna.edu/welliott/shakes.htm.

[2] As reported on CBS, 48 Hours, 8 April 1999. See also the “perpetrator”’s web page: http://www.jameson245.com/foster_page.htm (accessed June 7, 2001).

[3] Cf. these Foster pronouncements: (1) “I know you [Patsy Ramsey] are innocent—know it absolutely and unequivocally. I will stake my professional reputation on it, indeed my faith in humanity”. (2) [months later] “It is not possible that any individual except Patsy Ramsey wrote the ransom note”. (3) “In the ten years I have done textual analysis, I’ve never made a mistaken attribution”. And, perhaps most telling, (4) “All I need to do is get one attribution wrong ever, and it will discredit me not just as an expert witness … but also in the academy” (see http://www.jameson245.com/foster_page.htm; Crain, 1998, p. 30).

References

Crain, Caleb. (1998). “The Bard’s Fingerprints.” Lingua Franca, July/August, p. 29.

Foster, Donald. (1989). “Elegy” by W.S.: A Study in Attribution.

Monsarrat, Gilles. (2002). “A Funeral Elegy: Ford, W.S., and Shakespeare.” Review of English Studies, 53, 186.

Vickers, Brian. (Forthcoming). Counterfeiting Shakespeare.