So Many Hardballs, So Few Over the Plate
Conclusions from our “Debate” with Donald Foster
Ward E.Y. Elliott, Robert J. Valenza
Abstract. Foster’s critique of our work is overdrawn and has left our findings 99.9% intact.
Key words. Stylometry, Shakespeare authorship, Elizabethan poems, Elizabethan plays
In 1996 and 1998, CHum published several articles in a heated debate between Donald Foster and the authors of this article. This note is the authors’ final response. -- N.I. and E.M.
1. Was Our Study Foul Vapor?
This is the last of four “Responses”, two by Donald Foster, two by us, to the final report of our Claremont Shakespeare Clinic, published by CHum in 1996/97 after Foster had warned us that publishing our results would destroy our reputations. When we persevered, Foster arranged for CHum to publish his Response pronouncing our work “worthless”, “madness”, and “foul vapor”.
We checked out his charges and found him to be dead right -- on four small points. We missed three whenas’s, one whereas, and an I’m in Shakespeare’s plays, and there was a minor glitch in one of our programs, Textcruncher, which threw off some of our counts by a percent or two. All of these combined amounted to an error of a tenth of a percent in our original findings. We fixed them all and published the corrected figures in our First Response (our 1998/1999) with no changes in our overall conclusions. None of Foster’s other charges were substantiated. Some were plainly wrong.
Foster’s 21-page Second Response (his 1998/1999) was much longer than his first, but equally harsh in its rhetoric, and equally lacking in substantiation. CHum has strictly limited our response to it here to eight pages, but there is a longer version on our webpage. In his Second Response, Foster reaffirmed his prior charges of our “cherry picking” and “stacking the deck” and added harsh new complaints of our “idiocy”, our “arbitrary and chaotic handling of ... data”, our “astonish[ing] ... methodological sloppiness”, our “toddling toward a precipice from day one”, our “silent and extensive alteration of data”, our “defamatory” and “assaultive” article full of “invented quotations”, our “gerrymandering”, and our “ventriloquized self-flagellation”.
Little of this invective can stand scrutiny. Foster has had six years to specify which of our quotations are “invented” and what he thinks he actually did say, but he has never done so. He has quietly redefined his charges of “cherry-picking,” “deck-stacking,” and “gerrymandering”, but it hasn’t helped his case. The old problem was that we “banished” plays he himself had told us were of questionable authorship from our Shakespeare baseline. He dismissed our recollection on this point as “quite mistaken” but immediately conceded that all but one of our “banished” plays were “widely considered by scholars to be non-Shakespearean” (his 1998/1999, 509, n. 12). Just so. He was right on the concession, wrong on the charge.
His new concept of “deck-stacking” seems to be that A Lover's Complaint and the Funeral Elegy (FE) pass many of the “original 54 tests for which Venus and Adonis, The Rape of Lucrece, and the Sonnets receive ‘not-Shakespeare’ rejections”. The problem here, however, is not us stacking the deck but him trying to deal CHum readers cards from his own deck as if they came from ours. All quantitative tests are sensitive to sample size. The bigger the sample, the more variance it averages out, and the more tests can be used. Many tests that work on big, 20,000-word play-sized blocks don’t work on little, 3,000-word blocks and should not be so misapplied. We have made this point four times to CHum readers (our 1996, 204-05, our 1998/1999, 430-31, 435, 442), and carefully avoided confusing large-block validations with small-block. Foster was not so careful. He is blaming us for a test we did not use and should not have used.
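The sample-size point can be illustrated with a small simulation (a toy sketch, not any of the Clinic’s actual tests, and with an invented word rate): the measured rate of a common function word fluctuates far more in 3,000-word blocks than in 20,000-word blocks, so a pass/fail range validated on play-sized samples will misfire on small ones.

```python
import random

# Toy illustration: how much the measured rate of a function word
# varies by block size. TRUE_RATE is invented for the example.
random.seed(42)

TRUE_RATE = 0.01  # suppose the author uses the word once per 100 words


def sample_rates(block_size, n_blocks=100):
    """Measured rate of the word in n_blocks simulated text blocks."""
    rates = []
    for _ in range(n_blocks):
        hits = sum(random.random() < TRUE_RATE for _ in range(block_size))
        rates.append(hits / block_size)
    return rates


def spread(rates):
    """Standard deviation of the measured rates."""
    mean = sum(rates) / len(rates)
    var = sum((r - mean) ** 2 for r in rates) / len(rates)
    return var ** 0.5


small = spread(sample_rates(3_000))    # small, poem-sized blocks
large = spread(sample_rates(20_000))   # big, play-sized blocks

# Binomial theory predicts the spread shrinks like 1/sqrt(block size):
# roughly 2.6x tighter at 20,000 words than at 3,000.
print(f"std dev at  3,000 words: {small:.5f}")
print(f"std dev at 20,000 words: {large:.5f}")
```

A rejection range calibrated from the tight 20,000-word spread would reject many genuine samples at 3,000 words, which is exactly why a large-block validation cannot be transferred to small blocks.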
The problem with our sinister-sounding “silent and extensive alteration of data” turns out to be that, after several years of corrections and refinements, our tests have gotten better and better at accepting Shakespeare and rejecting non-Shakespeare. Now we don’t just get 100% reliability from all three test rounds combined in accepting core Shakespeare and rejecting non-Shakespeare, we also get better than 95% reliability in rejecting non-Shakespeare from each round separately, while still getting 100% reliability in accepting core Shakespeare. This growing redundancy and robustness should be cause for congratulation, not condemnation.
As for the “defamatory personal charges,” in our “assaultive article”, Foster has it backwards. We haven’t called his work idiocy or foul vapor, nor have we falsely accused him of deck-stacking or inventing quotations. Dat veniam corvis, vexat censura columbas.
So much for his four most damning charges. We are not remotely guilty of any of them. He also makes nine lesser charges: that our “literary advisors” dismissed the Clinic as idiocy; that our copytexts were “never…edited”; that we improperly “suppressed” BoB4 but failed to suppress our other BoB tests; that we used tests like O v. Oh, reflecting editors’, not authors’, preference; that we didn’t provide tallies for relative clauses but nonetheless miscounted them by “as much as 50%”; that our claimed Textcruncher error for no + not is not 9/10 of a percent, but nine percent; that we “simply forgot” the two not’s in FE’s prose dedication; that we “clearly misunderstood” our leaning microphrase tests; and that we miscounted whereas’s, whenas’s, first- and last-word it’s, hark’s, list’s, and see’s.
The first six of these are completely wrong. The people he referred to were not our literary advisors and did not bail, as he erroneously claims, though they did supply -- but not substantiate -- the spicy adjectives he gleefully embraces. Our comparison texts -- unlike Foster’s in his Elegy by W.S. -- were carefully edited to commonize spelling, standardize sample size, and separate prose from verse. But they were not aggressively repunctuated, as Foster would have us do, in keeping with his own aggressive editing of FE. He raised the Elegy’s average sentence length by 44 percent, and more than doubled its percentage of run-on (enjambed) lines. Then he announced that its resultant long sentences and high enjambment rates were sure signs that Shakespeare must have written it! We think the hazards of such editing far outweigh its benefits.
We covered both of his BoB points in our First Response (our 1998/1999, pp. 432-37) and shall not repeat our arguments here. He has responded to none of the points we made. His Oh v. O and two related criticisms again attack us for a test we didn’t use in our CHum report. It might have been a good point in 1990 when we did try these tests, before discarding them for the reasons he cites. It’s not such a good point nine years later. His relative-clause charges are self-contradictory and wholly without substantiation. 9/1,000’s is nine-tenths of a percent, just as we said, not nine percent, as he maintains. He thinks we should have counted two not’s in FE’s prose dedication. But he has elsewhere chastised us for our supposed “disregard for prosody and genre” (his 1996, 248, his 1998/1999, 505). We would urge him to follow his own rules, as we did in this case, and count only the indicators in the body of the poem.
Foster made our leaning-microphrase tests a rhetorical hot-spot issue in his First Response, dismissing them as “foul vapor.” In his hardball Second Response he used similar harsh language, but, at the end – to our surprise – seems to have conceded the point and dismissed the whole question as “much ado about nothing” (his 1998/1999, 505).
In our First Response we conceded that Foster was partially right about his last point. Using fuller context and reclassifying some
If debates have any value, it is to expose both sides’ arguments and evidence to an opposing viewpoint, revealing otherwise-undetected errors, which can then be corrected. By our count, over the whole debate, Foster has charged us with over 30 errors, most of them serious enough, we take it, to confirm his diagnosis of idiocy on our part. We have admitted to four of these errors, but they are all trivial. They are the Textcruncher glitch, the BoB dating qualification, and our undercounts of I’m’s by one and whereas/whenas’s by four. These four, all of which we corrected in our First Response, made no more than a one-tenth of one percent change in our overall results, a rather modest change, we would think, considering the harshness of the accusations it must be measured against. In all, Foster got only four of his 30-plus hardballs over the plate, a disturbingly high error rate for someone as intolerant of errors as he professes to be.
He, by contrast, has made over 40 errors of his own, about half in each response, and about half of them major. We have already discussed his first-response errors (our 1998/1999, 440-44). Among the major errors in his Second Response, he falsely claimed our copytexts were “unedited”; got it wrong that BoB4 was improperly “suppressed” and that the other BoB’s were redundant and should have been suppressed; got it wrong that we “clearly misunderstand” our own leaning microphrase tests; failed repeatedly, again, to allow for sample size; dredged up his previous false charges of “deck-stacking” our baseline; falsely tried to deny his own 1987 Dubitanda selections; never seems to have considered how his criticism of our tests might bear on his own; and -- about a half-dozen times -- tried to nail us for tests we didn’t use.
The bad news from this exchange is that we and our students got two undeserved bashings for ten years of good work and have had to spend five more years picking off the mud, scrutinizing it, and, in the end, defending ourselves, at CHum’s request, much more briefly and bluntly than we would have preferred. The good news is that the debate did force us to reexamine our methods and findings (and Foster’s), especially those most pertinent to FE. It highlighted important differences between our approach and his. We have commonized our comparative samples systematically; he hasn’t. We have controlled for sample and baseline size; he hasn’t. We have controlled consistently for date and genre; he hasn’t. We have explained and supported every step of our analysis. He hasn’t. When good evidence shows we have made a substantive mistake, we have admitted it and fixed it. He hasn’t.
Above all, we have used “silver-bullet” tests, tending to disprove common authorship, while he has used “smoking-gun” tests, seeking to prove it. Silver-bullet tests are orders of magnitude more reliable, both in theory and in practice. If your foot is size five and fits Cinderella’s glass slipper, it does not prove that you are Cinderella. You could just as well be Little Miss Muffet. But, if your foot is size ten, it is strong evidence that you are not Cinderella (our 1997, 183-85). “Could be” is never as strong a proof as “couldn’t be” is a disproof.
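The asymmetry between “could be” and “couldn’t be” can be put in rough probabilistic terms. The numbers below are invented for illustration and are not the Clinic’s actual model: a match on a feature many authors share barely shifts the authorship odds, while a score far outside a candidate’s validated range shifts them enormously against him.

```python
# Toy Bayesian sketch with invented numbers, not the Clinic's model.
# Likelihood ratio = P(evidence | Shakespeare) / P(evidence | someone else).

prior_odds = 1.0  # start at even odds of Shakespeare authorship

# "Smoking gun": a feature found in 90% of Shakespeare samples but
# also in 60% of other Elizabethan authors' samples.
smoking_gun_lr = 0.90 / 0.60           # = 1.5: a weak nudge toward him

# "Silver bullet": a score outside a range that core Shakespeare stays
# inside 99% of the time, while other authors land outside it half the time.
silver_bullet_lr = 0.01 / 0.50         # = 0.02: a 50-to-1 shove away

odds_after_gun = prior_odds * smoking_gun_lr       # 1.5 : 1 for
odds_after_bullet = prior_odds * silver_bullet_lr  # 1 : 50 against

print(f"odds after one smoking-gun match:  {odds_after_gun:.2f} : 1")
print(f"odds after one silver-bullet miss: {odds_after_bullet:.3f} : 1")
```

On these made-up figures, a dozen independent silver-bullet misses would multiply to odds on the order of 0.02 to the twelfth power, which is why a text that takes many such hits has no cheerful prognosis, however many smoking-gun matches it also shows.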
The other good news is that, after two years of determined bashing by an authorship blackbelt who was not pulling any punches, our work did not shatter. 99.9% of our original results still stand, and we have fixed the .1% that were wrong with no change in the bottom line. This is not, of course, to say that Foster’s failed assault has eliminated all our expectations of further erosion. Some erosion still seems inevitable for methods as sweeping, novel, and experimental as ours, but it is more likely to come from close focus on one or two disputed texts than from a global assault like Foster’s CHum Responses.
2. Did Shakespeare Write the Funeral Elegy?
What about the other half of the debate, the question of whether Shakespeare wrote the Elegy? This one, never far below the surface of the CHum debate, was conducted explicitly in PMLA and in Elegy by W.S. by Foster, and in The Shakespeare Quarterly and Literary and Linguistic Computing by us (Foster, 1996a, 1989; our 1997, 2001). Here, too, despite Foster’s ringing proclamations that FE is “Shakespeare’s beyond all reasonable doubt”, it seemed to us that his ascription was in big trouble. It now seems so to him as well (below). FE is indeed loaded with “smoking-gun” features that W.S. shared with Shakespeare. But it is even more loaded with features not shared with Shakespeare: enclitic, proclitic, and no/no+not scores far below Shakespeare’s range, and odd, un-Shakespearean usages, such as adventer instead of adventure, an husband instead of a husband, thank (noun) instead of thanks (noun), and none other instead of no other (our 1997). Each of these is a silver bullet in the Shakespeare ascription. The patient might survive two or three such hits, but a dozen such hits are far too many for a cheerful prognosis. Brian Vickers (forthcoming) and Gilles Monsarrat (2002) now argue from smoking-gun evidence that FE is also much more loaded with features shared with John Ford than with features shared with Shakespeare. Our studies show that the odds of Shakespeare authorship are 3,000 times worse than the odds for Ford (our 2001). In June, 2002, without having read Vickers and with no direct mention of us, but supposedly convinced by Monsarrat, Foster finally publicly conceded that Ford was the obvious author.
Does this mean that Foster’s Elegy by W.S. is no longer a trove of authorship lore, that SHAXICON is worthless, that Foster’s golden ear for authorship is a myth? Not at all. Elegy by W.S. is wrong about the Elegy, but right about much else. SHAXICON remains an inspired idea not yet verified. Foster’s computer-aided ear was gold for the author of Primary Colors and the scholarly readers of his manuscript for Elegy by W.S. It was tin for the Elegy and for the author of the JonBenét Ramsey ransom note. We are less wedded than Foster to the maxim, erratum in uno, erratum in omnibus, and more inclined to believe that a certain amount of error and uncertainty come with the territory and with the novel methods we (and Foster) were trying out.
For us, the good part of the debate was tracking Shakespeare with powerful new tools. The bad part was scraping off the mud. Looking for Shakespeare’s hand in a dreary, pietistic Ford Elegy is hardly as inspirational as looking for it in the Sonnets, or even looking for it in the poems of Spenser, Donne, or Marlowe. We once likened our controversy with Foster over the Elegy to a land war in Asia over the literary equivalent of the
On the other hand, the
Crain, Caleb. (1998): “The Bard’s Fingerprints.” Lingua Franca, July/August, p. 29.
Foster, Donald. (1989): “Elegy” by W.S.: A Study in Attribution.
Monsarrat, Gilles. (2002): “A Funeral Elegy: Ford, W.S., and Shakespeare.” Review of English Studies, 53, 186.
 The longer rejoinder, our 2000, available in hardcopy on request, can be found at http://govt.claremontmckenna.edu/welliott/hardball.htm. For a history of the Shakespeare Clinic, see http://govt.claremontmckenna.edu/welliott/shakes.htm.
 Cf. these Foster pronouncements: (1) “I know you [Patsy Ramsey] are innocent—know it absolutely and unequivocally. I will stake my professional reputation on it, indeed my faith in humanity”. (2) [months later] “It is not possible that any individual except Patsy Ramsey wrote the ransom note”. (3) “In the ten years I have done textual analysis, I've never made a mistaken attribution”. And, perhaps most telling, (4) “All I need to do is get one attribution wrong ever, and it will discredit me not just as an expert witness … but also in the academy” (see http://www.jameson245.com/foster_page.htm; Crain, 1998, p. 30).