SO MUCH HARDBALL, SO LITTLE OF IT OVER THE PLATE
CONCLUSIONS FROM OUR “DEBATE” WITH DONALD FOSTER

Ward E.Y. Elliott and Robert J. Valenza
Claremont McKenna College, Claremont, CA 91711, USA
(e-mail: ward.elliott@claremontmckenna.edu)

October 26, 2002

A false balance is abomination to the Lord: but a just weight is his delight. Proverbs 11:1

Anything that’s false is bad. Steven Ballmer, 23 Sept. 1999
 

Abstract. This article was originally written to be the last of four “Responses”, two by Donald Foster, two by us, in a “debate” over the final report of our Claremont Shakespeare Clinic, published by Computers and the Humanities (CHum) in 1996/97. CHum’s editor limited our last response (but not Foster’s) to six pages. We complied, reluctantly, in the published version (our 2002), but promised to post this, the full original, on the web to provide fuller substantiation.

    Foster was not pleased with our conclusion that Funeral Elegy by W.S. and a couple of confirming texts were not Shakespeare’s work, and he thought our methods left much to be desired.  He deplored our “worthless figures”, “cherry-picking”, “gerrymandering”, “silent and extensive alteration of data”, and “arbitrary and chaotic handling of data.” He decried our work as “idiocy”, “madness”, and “foul vapor”.  And he reproached us for being “assaultive” and “defamatory”.

    A few small points aside, we are not persuaded. He is right about three whenas’s, one whereas, and an I’m that we missed, and about a glitch in our analytical software which threw some of our counts off by a percent or so. All of these combined amount to an error of a tenth of a percent in our findings. We published the corrected figures in our First Response (our 1998/99) with no changes in our overall conclusions. As far as we can tell, the other Foster charges are all entirely groundless or wrong. So, as Foster himself has since conceded (Abrams and Foster, 2002), is his Shakespeare ascription for the Funeral Elegy.

Key Words: authorship, Shakespeare, attribution, claimants, apocrypha, Elegy by W.S., stylometry
 
 


1. Introduction
 

1.1. BACKGROUND
 

    In 1987 a team of Claremont College students in the Claremont Shakespeare Clinic set out with computers to shorten the list of 58 “claimants” to the true authorship of Shakespeare’s poems and plays by ruling out authors whose works did not match Shakespeare’s. We were the Clinic’s faculty advisors and the ones who refined, developed, and wrote up its results. In January of 1997, several generations of students later, we published the Clinic’s “final report” in CHum, concluding that none of the 37 testable claimants, and none of the 27 plays in the Shakespeare Apocrypha, matched Shakespeare (our 1996, appeared 1997). Several “Shakespeare” poems “discovered” in the 1980s, including Shall I die?, As This is Endless, and Funeral Elegy by W.S. (FE), also flunked our Shakespeare tests. So did A Lover’s Complaint, which most recent scholars have considered Shakespeare’s. So did Timon of Athens, Henry VI, Part 1, and parts of Pericles, Henry VIII, and Two Noble Kinsmen, texts in the Canon that most scholars believe are not by Shakespeare alone. So, too, did two other plays from the Shakespeare canon, Henry VI, Part 3 and Titus Andronicus. There is no strong scholarly consensus on Shakespeare’s authorship of these two plays. Our tests suggested that both were co-authored with someone else. We cautioned our readers that our methods were novel, experimental, and applied wholesale, with seven-league boots, to millions of words of text. We expected some erosion by researchers who studied shorter texts more closely. And we did not take our conclusions to be the last word on the subject. Nevertheless, we thought we had made significant advances in using quantitative evidence to test authorship, and we were pleased and proud to have the Clinic’s findings and evidence appear in CHum.

1.2. FOSTER’S FIRST RESPONSE, 1996/97

   We were not so pleased to find that Donald Foster, our onetime advisor-in-chief, had quietly appointed himself our censor-in-chief in the very same issue. Foster had warned us a year earlier that publishing our results would destroy our reputations. When we persevered, he somehow managed, without our knowledge or consent, and supposedly without a go-ahead or a scholarly review from either CHum’s old editor or its then-new ones, to get our article repackaged as a “debate” with him. His opinion of our character and competence had taken a turn for the worse. In his hardball First Response, he accused us of “rigorous cherry-picking”, “playing with a stacked deck”, and “conveniently exiling ... inconvenient evidence.” He thought our methods were “madness” and “foul vapor”, our numbers “worthless” and “no good”, and our conclusions “quite simply wrong.” “This is no way”, he concluded, “to conduct attributional scholarship” (Foster, 1996a).

1.3. OUR FIRST RESPONSE, 1998/99

    Within a few weeks, in consultation with CHum’s then-new editors, we thought we had settled on some terms for the “debate” we found ourselves in. Besides our Final Report, 1996, each side would get two Responses, a long First and a shorter Second of “a few pages”, both sides’ Seconds to be “of equal length”, everything to be published “as soon as possible”. Foster’s First had already appeared simultaneously with our original Report (his 1996a). Our First appeared simultaneously with his Second (our 1998/99, his 1998/99). His Second, at 21 pages, was more than twice as long as his First and no less harsh and false. The present article was written to be our Second (and final) Response. Unfortunately, CHum’s editor did not feel bound by the “equal length” terms of the debate agreement, and “as soon as possible” turned out to be almost six years after the “debate” was initiated. Our published Second Response, she said, would be limited to four printed pages (later increased to eight, and ultimately reduced to six) or it would not appear at all. We acceded, reluctantly, and wrote a six-page Second Response (published in October 2002), with the understanding that readers would be referred to this longer, more documented original, which we posted on our webpage in July 2000.

    In our First Response we concluded that Foster had had trouble getting his First-Response hardballs across the plate. We examined all of his assertions and found them, in a nutshell, to be 90% false and 10% true but trivial (our 1998/99). Of 21 Foster-charged gaffes in our 54 tests, only five had any validity at all, all minor. We thought they called for very slight qualifications or adjustments in five of the 54 tests, in no case invalidating the test questioned, and that they left the other 49 tests wholly intact. By our first estimate, total erosion of our findings could have amounted, at most, to half of one percent.

    We also published revised results, including all Foster-inspired corrections, plus some ongoing corrections of our own (our 1998/99). These already-published results can serve, for practical purposes, as our second and final estimate because they included all our corrections responding to Foster’s Second Response, as well as to his First, albeit with little or no discussion of how we arrived at our Second-Response corrections. We promised such discussion in this, our own Second Response, and we deliver it below. A comparison of our corrected results with our original ones makes it clear that the actual changes have, in fact, turned out to be less than a tenth of a percent, measured by total Foster-inspired changes to three of our test ranges for Shakespeare’s plays.[1] We made three changes in our 54 tests, all of them small. Considering the Folio Shakespeare, not just the Riverside, we now count nine I’m’s in Shakespeare’s plays, not five, as we had previously counted, nor six, as Foster seems to have counted (see our 1998/99, pp. 436-37). We now believe that six of the when as’s in the Riverside Shakespeare Canon can legitimately be counted as whenas’s, not three, as we had previously reckoned, nor eight, as Foster reckons (see below). And we fixed a glitch in Textcruncher, our student-designed analytical program, which threw off some of our counts by as much as a percent or two (our 1998/99, pp. 431-32).[2] Not a single test was invalidated by any of these corrections. No changes at all were called for in our I’m and whenas/whereas Shakespeare profiles, and the slight changes from fixing Textcruncher made next to no change in our overall results. We still got 100% passes for Core Shakespeare works; 100% rejections for Claimant and Apocrypha works. 49 of our tests remained unchallenged by Foster, and not one of Foster’s 40-odd other charges of “methodological madness” had enough substance to justify further changes in our data or conclusions.

    Nor had a single one of Foster’s sensational First-Response accusations of cherry-picking, deck-stacking, and evidence-exiling withstood scrutiny. We were, and we remain, puzzled why he was so fierce in accusation and so threadbare in proof, and we were concerned that some of the hardballs were nowhere near the strike zone. Why had he tarred us for tests we hadn’t used? Why was he blaming us for “exiling” texts to the Dubitanda that he himself had told us were dubious? Why had he made such a show of lambasting us, not just for points we had rebutted in our last letter to him, but also for points we had conceded? “This”, we said, “was a good point the first time around, good enough for us to concede it immediately and acknowledge the concession by reference in our article. It is not such a good point if he continues to belabor us over our supposed stubborn failure to grasp it” (our 1998/99, p. 437). Dead errors don’t count.

1.4. FOSTER’S SECOND RESPONSE

    Foster’s Second Response was another display of his hardball skills, no closer to the plate than the First. He did express momentary regret for “a few of [his] less delicate phrases.” But his second set of phrases, this one vetted with CHum’s editors and scholarly readers and presumably more carefully considered, seems scarcely more delicate than his first, or more accurate. Our “idiocy” and “madness” (his 1998/99, p. 492) led his new list, approvingly quoted from two of our supposed “literary advisors”, followed by:
 

Our “arbitrary and chaotic handling of ... data” (his 492).
Our “fuming” (his 492).
Our “fiasco” (his 492).
Our “squandered opportunity” (his 492).
Our “massive sloppiness” (his 492).
Our “assaultive article, full of invented quotations” (his 492).
Our “defamatory personal charges” (his 493).
Our “dinning repetition of ... prose” (his 494).
Our “toddling toward a precipice from day one” (his 494).
Our “silent and extensive alteration of data” (his 496).
Our “iron indifference” to perpetuated mistabulations (his 499).
Our “astonish[ing] ... methodological sloppiness” (his 497).
Our “rubber hatchet” (his 506).
Our “ritualized reenactment of ventriloquized self-flagellation” (his note 3, p. 508).

    Foster reaffirmed his prior charges of “cherry-picking” and “stacking the deck” (but tried to define and justify them differently) and added a new one, “badly gerrymander[ing]” (his 1998/99, pp. 500, 507). None of these new charges are consistent with his declared intention “not to start a public spat” and to “welcome scrutiny of [his] own work.” And none of them are true.
 
 
 

2. “The Mud”

    Let us start, as he does, with his four charges most suggestive of duplicity and defamation on our part, the ones he candidly and appropriately calls “The Mud” (his 1998/99, p. 492). He alleges that many of our quotations of him “are, in fact, invented ... and no key to what I have actually said or written” (his 1998/99, note 3); that we cherry-picked, stacked the deck, and gerrymandered our texts (his 1998/99, pp. 500, 507); that we have made “defamatory personal charges” in our “assaultive article” (his 1998/99, p. 492); and that we have engaged in “silent and extensive alteration of data” (his p. 496).

2.1. “INVENTED QUOTATIONS”

    He has had six years to tell us which of our quotations are “invented” and what he thinks he actually did say, but he has never seen fit to do so. Why has he been so persistently evasive? And how could he possibly hope to get this issue cleared up without specifying what exactly it is and how it should be resolved? Till he does, we count it as yet another pairing of harsh accusation with weak proof, or, in this case, no proof at all.

2.2. “CHERRY-PICKING”

    His renewed charges of cherry-picking and deck-stacking, and his new one of gerrymandering (his 1998/99, pp. 500, 507), as we read them, have themselves been “silently and extensively altered”, but are no better supported in his Second Response than they had been in his First. In his First Response, the alleged problem was our “conveniently exiling ... inconvenient data”, our “banishing” of plays in the Canon supposed to be jointly- or other-authored from our core Shakespeare baseline. Since it was Foster himself who told us which plays were suspect, we thought he had little standing to bash us for doubting them ourselves. But he did, nonetheless (our 1998/99, pp. 433-34). He has since dismissed our recollection on this point as “quite mistaken” (his 1998/99, note 12, p. 509) -- but has then gone on, in the very next breath, to concede that all but one play of his original dubitanda list were “widely considered by scholars to be non-Shakespearean”![3] He was right on the second breath, wrong on the first.

    His new concept of “deck-stacking” seems to be that A Lover’s Complaint and FE “pass many of the original 54 tests for which Venus and Adonis, The Rape of Lucrece, and the Sonnets receive ‘not-Shakespeare’ rejections’ [as silently and misleadingly redefined by himself]. In attributional work [he continues] this is called stacking the deck” (his 1998/99, p. 507). Wrong. This is not us stacking the deck but him trying to palm off some bad cards from his own deck as if they came from ours. We scrupulously separated our 14 tests that worked on 3,000-word poem blocks from our 51 tests that worked on 20,000-word play blocks because any block-and-profile test like ours, or his (1989, pp. 148-153), is sensitive to block size. Giant, 20,000-word blocks average out much more variance than small, 3,000-word ones and permit a much wider range of valid tests. Big-block play-validation and small-block poem-validation are two separate issues; they must be treated separately, and they were by us. We devoted a whole section of “And Then There Were None” to the distinction (our 1996, pp. 204-05) and re-emphasized the point three times in our First Response (our 1998/99, pp. 430-31, 435, 442). But, despite all our warnings, it doesn’t seem to have registered, and he keeps trying to nail us by confusing the two and “rigorously applying” (i.e., misapplying) our big-block, play-validated tests to small poem blocks. Unfortunately, he is still blaming us for a test we did not use and should not have used.
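
    The statistics behind this point are easy to demonstrate. Here is a minimal Python simulation (ours, with arbitrary illustrative numbers, not Clinic code or data) of how the spread of a per-1,000-word feature rate tightens as block size grows:

        import random

        random.seed(1)
        RATE = 0.01  # a feature occurring about ten times per 1,000 words

        def block_rates(block_size, blocks=200):
            # Simulated per-1,000-word rates for many equal-sized blocks.
            return [sum(random.random() < RATE for _ in range(block_size))
                    * 1000 / block_size
                    for _ in range(blocks)]

        def spread(rates):
            # Standard deviation of the observed rates.
            mean = sum(rates) / len(rates)
            return (sum((r - mean) ** 2 for r in rates) / len(rates)) ** 0.5

        print(spread(block_rates(3_000)))   # small blocks: wide scatter
        print(spread(block_rates(20_000)))  # big blocks: much tighter scatter

    On such a simulation the 20,000-word blocks scatter roughly two and a half times less (the square root of the block-size ratio), which is why tests validated at one block size cannot simply be transferred to the other.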

2.3. “DEFAMATORY PERSONAL CHARGES”

    What about those “defamatory personal charges” he claims to have found in our “assaultive article”? He is referring to our 1997 Shakespeare Quarterly article, in which we bent over backwards to be both polite and accurate. In it we called his Elegy by W.S. “a treasure trove of Shakespeare identifiers” and his “Best-speaking witnesses” “an impressive example of what can be done with ... old-tech and new-tech methods” (our 1997, p. 179). But we also, in a postscript, correctly described our parting of the ways with Foster, who had informed us that our tests were “certainly wrong” and our work unfit for publication. Any attempt to publish our findings or present them at conferences on the Elegy, he said, would result in the destruction of our reputation -- by whom, we wondered? He “put us on notice that he did not think the world was big enough or uncertain enough to accommodate both our findings and his; that we were the ones who were ‘certainly wrong’; and that our choice was to put up or shut up” (our 1997, p. 206). As “defamatory personal charges”, none of this mild language begins to compare with what he had already said, and continues to say, of us and our work: “idiocy”, “madness”, “foul vapor”, “invented quotations”, “deck stacking”, and the rest -- topped off with his astonishing declaration that we are the ones who are assaultive! Dat veniam corvis, vexat censura columbas.[4]

2.4. “SILENT AND EXTENSIVE ALTERATION OF DATA”

    What about our sinister-sounding “silent and extensive alteration of data”? Here his evidence turns out to be differences between different iterations of our summarized data from one year to the next. For example, we found 78%/94%/92% rejection of whole Claimant plays in the three rounds of our 1995 report and 100%/94%/98% rejections in our 1996 report (his 1998/99, p. 496). If he had looked at our further-revised data reported in our 1998/99, the figures would change yet again, slightly, to 98%/94%/98% rejections. If he had taken it back to our 1994 summary, the figures would have been considerably lower, 82%/91%/67%. Over the years, after many rounds of new corrections and refinements, our tests have gotten better and better at rejecting non-Shakespeare. Now we not only get 100% reliability from all three test rounds combined in accepting core Shakespeare and rejecting non-Shakespeare, we also get 100% reliability in accepting core Shakespeare from each round separately, and better than 95% reliability, in most cases, in rejecting non-Shakespeare. This growing redundancy and robustness, “overkill”, if you will, should be cause for congratulation, not condemnation. It helps explain why we have not found the prospect of erosion of some of our tests particularly alarming. You could knock out an entire round, a third of our 51 play tests, and still have zero false negatives -- “couldn’t-be’s” -- for Shakespeare and zero false positives -- “could-be’s” -- for others. You could even knock out two entire rounds and still have zero false Shakespeare couldn’t-be’s and not much more than five percent false non-Shakespeare could-be’s. Foster’s criticisms, which showed us we were off by a tenth of a percent, have come nowhere near to producing such levels of erosion.

    Hence, our “silent and extensive alteration of data” and our “suppression” of weak or redundant tests are actually evidence of diligence and success on our part, not the duplicity and “squandering of opportunity” that Foster’s heated rhetoric would suggest. They are exactly what you should expect to happen if you continue to recheck data, look for errors, redundancies, imprecisions, and inconsistencies, and correct them -- and if, as is looking more and more likely, the tests are good. We did this rechecking relentlessly throughout the Clinic, went on doing so long after the Clinic closed down, and shall doubtless continue to do so as long as we or others find enough new problems to justify the effort. When you refine and rerun tests, you make changes in profiles, pass rates, and rejection rates, and you have to change your summaries correspondingly. It’s proper, not sinister, to recheck and refine your data, and it’s encouraging, not sinister, if the refinements make them look better and better.

    There are some hazards to this process of self-correction. The more judgmental your test, the more important it is to have the same rules in mind across the board while you are running it, to have it done, if possible, in the same way, at the same time, by the same person. When you revisit the same test years later, even if the tests were your own, and not those of some cherished student long since departed, the old rules are not as fresh in mind as they once were. You have to approximate them as best you can or rerun the entire database with new ones. Piecemeal corrections months or years later can do more harm than good. Even if you have perfect replicability, which you normally don’t with manual tests, every time you make changes in a large spreadsheet with lots of hidden columns and mass copying you introduce the possibility -- a statistician would say the inevitability -- of making new transcription errors. We have gone through four generations of spreadsheet and four pairs of glasses since the Shakespeare Clinic started. With our old DOS spreadsheets, and our old bifocals, we had to guess where the column and row headings were. We lived in dread of getting lost, missing or misplacing an entry, and getting an entire column or row wrong, as Foster seems to have done in Elegy by W.S. (1989, pp. 149-50, last two columns) -- or of saving an old spreadsheet to track errors and then confusing it with the current one, or of misplacing a decimal point while converting raw Textcruncher scores like “The (2nd lws) / #”, “3.81679389312977E-02” into 38 second-to-last-word the’s per thousand sentences in the Elegy. Foster has blamed us twice for misplacing decimals (his 1996b, p. 248; his 1998/99, note 11) but turns out in both cases to have misplaced them himself (our 1998/99, p. 429; see 3.1 (9) below). Both his complaints are of a piece with that of the man protesting his speeding ticket: “But officer! How could I have been doing seventy miles an hour? I’ve only been driving for ten minutes”. If we were doing this again from scratch, and were willing to risk the transcription errors, we might well put all our rates per thousand into rates per hundred to minimize such confusion. It is harder to get lost in our new Excel spreadsheets (with a new set of bifocals) but, with Excel, the greater worry has been learning to use the new system, to bypass some of its rigidities and to get it to accept DOS conversions without wholesale freezeups and wiping out of data.
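
    The decimal hazard is concrete enough to show in a few lines of Python (our illustration of the conversion just described, not Textcruncher’s own code):

        # A raw Textcruncher-style score is a proportion in scientific
        # notation; multiplying by 1,000 gives the rate per thousand.
        raw = "3.81679389312977E-02"
        per_thousand = float(raw) * 1000
        print(round(per_thousand))  # 38 second-to-last-word the's per 1,000 sentences

    Misread the exponent by one place and 38 per thousand becomes 3.8 or 380 -- exactly the kind of slip described above.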

    Despite these problems, we believe that correcting and refining data, over and over again, with any modern electronic spreadsheet is a hundred times quicker and easier, and considerably more reliable, than trying to do it without one. Any of the Iron Men of pre-1980s stylometrics, who achieved amazing accuracy with slow hand counts and eye-glazing, data-corrupting, old-style manual spreadsheets, would laugh at our grumbling about electronic spreadsheets and appropriately liken us to the Princess and the Pea. Our hope in taking on the hazards of constant refinement has been to remove more errors than we have introduced, and we think we have done so. One indicator that we have succeeded is the very improvement in discrimination that Foster now deplores with his talk of silent and extensive alteration.
 
 
 

3. Method or Madness?

    Foster, in his new bill of particulars against our “methodological madness”, makes the following charges of error against us:
 

(1) Our own “literary advisors” dismissed the Clinic as madness (his 1998/99, p. 492);
(2) Our copytexts “were never commonized” and “incredib[ly]” remained “unedited” throughout the life of the Clinic (his 495);
(3) We improperly “suppressed” one of our “bundle of badges” tests, BoB4 (his 496);
(4) We improperly failed to suppress our other BoB tests, which were redundant and structurally flawed (his 495);
(5) We used some tests, like O vs. Oh, exclamation marks, and ’d vs. -ed endings, more reflective of an editor’s preference than of an author’s (his 496);
(6) We improperly performed “silent and extensive alteration of data” (his 496);
(7) Our doubts about Shakespeare’s authorship of A Lover’s Complaint, Funeral Elegy, and Hand D of Sir Thomas More, and some early and late plays in the Shakespeare Canon, despite their many resemblances to Shakespeare, prove that we were “cherry-picking” and “stacking the deck” (his 500), and that our test regime was “badly gerrymandered” (his 507);
(8) We provided no separate tallies for relative clauses and miscounted them by “as much as 50%” (his 502-03);
(9) Our claimed Textcruncher error for no/no+not is not 9/10 of a percent, but nine percent (his note 11, p. 509);
(10) Our count of 41 not’s in Funeral Elegy is off by two because we have “simply forgotten” the two not’s in the prose dedication (his note 11, p. 509);
(11) We “clearly misunder[stood]” our leaning microphrase tests (his 503-06);
(12) We miscounted whereas’s and whenas’s (his 498-501);
(13) We miscounted it’s as first and last word of sentences (his 498);
(14) We miscounted hark’s, list’s and see’s (his 501).


3.1. ELEVEN SHORT REJOINDERS

    Do any of these have merit? For the first eleven charges the answer is plainly no, and the explanation can be brief.

    (1) Our supposed “advisors” were not advisors and did not “bail.” Two of our colleagues did describe our work in less than adulatory terms, but neither one was an advisor to the Clinic, as Foster erroneously claims, and neither one “bailed”, as Foster erroneously declares (his 1998/99, p. 496). For them, as for Foster, the proper concern should not be how spicy are the names they called us, but how good is their evidence that the names describe anything that is real. Of that he has told us nothing, possibly because there is nothing there worth telling. Name-calling does not pass for argument or evidence in our fields, even if it is done by persons of the greatest distinction. Alternatively, Foster could be pulling rank on us, as the Dauphin did, sending a deprecatory tun of tennis balls to the fledgling Henry V (1.02.255-97). Perhaps it is his way of reminding us and CHum readers that we are only Little-League Shakespeare students “toddling toward a precipice from day one” (his 1998/99, p. 494) and lacking the credentials to play with the big boys in the Lit Department. If so, he is certainly right in a sense -- but not so right as to keep us from publishing “Glass Slippers” in the big boys’ top journal (our 1997). Which of our sharp-tongued colleagues can say as much?

    (2) Our texts were far from “unedited.” Foster’s charge that we never commonized or edited our texts is absurdly misconceived, especially coming from one who has effusively and repeatedly thanked us, even in the midst of both his hardball barrages, for our gift of commonized texts to the Vassar Archive. Gathering and commonizing the necessary texts was among the greatest labors of the Clinic. We spent more hours at it than we did on any other single Clinic activity, normally with hired students doing the initial work and one of us going over their drafts personally with a Riverside Shakespeare spellchecker specially created for the purpose. Like that of the Riverside itself, our commonization was less than perfect, and its focus -- appropriately, given the time available, the tests that got our greatest attention, and the risks of introducing spurious indicators of our own -- was on spelling, not punctuation or hyphenation. Those we generally took as they came.

    Moreover, once we had punctuation-based tests of our own, we were wiser to avoid heavy repunctuation than we would have been to get aggressive with it, as Foster insists we should have done. Aggressive editing has risks of its own when the editor has a stake in the outcome. It is one thing to rely on the judgment of outside professional editors and warn of its hazards, as we did. It is quite another to reedit aggressively ourselves, as Foster did with the Elegy, raising its average sentence length by 44 percent, more than doubling its percentage of enjambed lines, and then tell the world that its long sentences and resultant high rate of enjambment are sure signs that Shakespeare must have written it (our 1997, pp. 203-06)! We think the hazards of such editing far outweigh its benefits. 

    Finally, how serious are the real risks of leaving punctuation alone? We mentioned some (our 1996, pp. 208-09), but, with two exceptions (whereas/whenas and first- and last-word it’s), considered in the next section, Foster has merely listed our tests that might be invalidated by inconsistent punctuation, but not supplied any new evidence that the editing actually is inconsistent enough to degrade the test. Nor has he mentioned the implications of his criticisms for his own work. For example, we have seen no sign that he has done a hundredth as much commonization to the 100,000 words or so of elegiac poetry to which he compared FE (his 1989, pp. 148-150) as we did to the millions of words in the Claremont Archive, yet the same commonization issues he sees as “incredible” oversights on our part pertain to some of his tests as well: sentence length, open lines, compound words, and so on. Is he ready to retract these? Many of these issues are discussed in our 1996, pp. 208-09, and we have seen nothing in Foster’s evidence to change the conclusion we came to then: “Our objective was not to get the ‘noise’ level to zero, but to get it low enough not to obscure our tests’ discrimination, and we believe we have generally succeeded in doing so.”

    (3 and 4) BoB4 was dropped, and the others kept, for good reason. Foster faults us (his 496) for “suppressing” BoB4, which we considered redundant, and for not suppressing BoB3-7, which he considers redundant and structurally flawed (his 495); we consider them neither. We covered both issues at length in our 1998/99, pp. 432-37, and will not repeat our arguments here. He has responded to none of the points we made.

    (5) Dead errors don’t count. This is another ill-starred attempt to nail us for tests we didn’t use in the Final Report he is supposedly critiquing. In this case he criticizes us for using O vs. Oh, exclamation marks, and ‘d vs. -ed endings as authorship tests, when they could as well have come from the editor as from the author. This would have been a good point in 1990, when we did consider these tests -- before we discarded them for exactly the reasons he cites. It’s not such a good point nine years later, in a supposed response to our 1996 final report, which used none of these tests. The same is true of the news accounts he cites of various sensational discoveries our students seemed on the verge of making during the life of the Clinic, but which got dropped later when we checked them out. None of these appear in the final report either, though you would never guess it from reading Foster’s critiques. Surely we must have made enough real, live errors that he shouldn’t have to rely so heavily on imaginary or dead ones.

    (6 and 7) “Silent alteration” and “gerrymandering” charges: not substantiated. We have already discussed his rhetoric of “silent and extensive alteration of data”, “cherry-picking”, “gerrymandering”, and “stacking the deck” (Section 2 above). None of these are substantiated; all are wrong.

    (8) Relative clauses charges: not substantiated. Our supposed failure to provide separate tallies of relative clauses (his 1998/99, pp. 502-03)? More wrong than right. We did provide them in our 1990 annual report, which he has, but we did not repeat the information in the final report. If it’s an error, it is not a substantive one, nor one he or anyone else should have trouble checking out -- as he promptly reveals by telling readers that we miscounted relative clauses by “as much as 50%.” That would be a substantive error if he had supported it with evidence. But again he didn’t.

    (9) The Textcruncher no+not score was off by 9/10 of a percent, not nine percent. Foster claims it was really nine percent (111 instead of 120), not nine-tenths of a percent, as we claimed (his 1998/99, note 11). But the denominator of the 111 and the 120 is 1,000, not 100 (our 1996, p. 215). 9/1,000ths is nine-tenths of a percent, just as we said -- and, in any case, the small glitch in Textcruncher has now been fixed with no noticeable change in overall outcome.
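
    To spell the arithmetic out: the published rates are per 1,000, so the discrepancy Foster points to is (120 - 111) / 1,000 = 9/1,000 = 0.009, which is nine-tenths of one percent, not nine percent.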

    (10) A Funeral Elegy has 41 not’s, not 43. Foster thinks we should have counted two not’s in the Elegy’s prose dedication (his 1998/99, note 11). But he has also chastised us elsewhere for our supposed “disregard for prosody and genre” (his 1996b, p. 248; his 1998/99, p. 505; see 11 below). We would urge him to follow his own rules, as we did in this case, keep the genres separated, and count only the indicators in the body of the poem.

    (11) Leaning microphrases: conceded by Foster. These complicated, effective tests are interesting for several reasons. Foster made them a hot-spot rhetorical issue in his First Response. He lambasted our “undependable figures” and “subjective judgments” and our supposed “disregard for prosody and genre.” He castigated the tests as “at best, dubious, and at worst, foul vapor” (his 1996b, pp. 248-49). In his Second Response he couched his three pages of discussion in much the same hardball language, announcing, among other things, that we “clearly misunderstand even what it was [we] were testing” (his 1998/99, pp. 503-06). But at the end, to our surprise, he conceded the point as to FE and A Lover’s Complaint, the only such concession he has given us in the entire debate, and dismissed the whole question as “much ado about nothing” (his 1998/99, p. 505).

    We could take issue with the three pages of hardball arguments before and after the FE/LC concession. He did leave it in, after all, and it is full of mistakes. We, who devised it, are not the ones who misunderstood this test -- and Foster’s citation of an uncommonized passage from Venus and Adonis (his 1998/99, p. 503) is not in keeping with his previous outrage at our “incredible” (and imaginary) failure to commonize our texts (his 1998/99, p. 495). He was right the first time. His failure to commonize the passage concealed its abundance of hyphenated compound words not countable as leaning microphrases and did needlessly muddy the waters. There are many more such mistakes in this section. But we have already been over this complicated subject twice (our 1996, pp. 201-02; our 1998/99, Section 3.2, pp. 428-30) and we, too, would not mind getting on to the other issues. And we certainly don’t want to cast too quarrelsome an eye over Foster’s one grudging concession. We’ll take it, both on the merits and in the hope that it might one day lead to a lifting of the dark cloud that has fallen over our relations with him.

3.2. THREE MORE-STUDIED REJOINDERS

    That leaves three points that call for closer scrutiny and might actually warrant concessions on our part. Let us start with the one where we think we stand corrected.

    (12) Foster is right about three when as’s in 3H6, and we were wrong -- but it does not change our Shakespeare profile. This test was another rhetorical hot spot in Foster’s First Response. He thought we had undercounted Robert Greene’s whereas’s and whenas’s by a “full 80%”, and that our “figures are wrong so often as to be worthless” (his 1996a, p. 254). His arithmetic was badly mistaken about the 80% (our 1998/99, note 2), but that still left a 44% difference between his Greene counts and ours. The reason: he long-counted every when as as a whenas, while we, more conservatively, did not -- except for clear instances in our Shakespeare corpus where the when as could not possibly be a when, as. We argued that either system would do, so long as consistently applied; that our figures are not worthless simply because they are conservative, any more than Foster’s are worthless for being liberal; and that his liberal counts for Greene, far from undermining our already-ample evidence that Greene is an unlikely Shakespeare claimant, made it even stronger (our 1998/99, pp. 438-39). These arguments, as far as we can tell, are still sound.

    In his Second Response Foster repeats his original arguments and complains of our “iron indifference” (his 1998/99, p. 499) to his exhortations to use the long count. If we did, he says, we would find eight more whereas’s and whenas’s that we had overlooked in Greene’s Selimus, and yet more in plays by Kyd, Jonson, and Fletcher (his 1998/99, p. 499). So far, so bad. Selimus is not among the Greene plays we analyzed, so it should hardly be surprising that Foster found more whenas’s in it than we did. And he has given us no reason to inflate the count for the others, other than that it would get each play a rejection if we kept the Shakespeare-rejection cutoff at zero. But, if we switched to his liberal counting system for other people’s plays, we should do it for Shakespeare, too, raising the Shakespeare-rejection cutoff to one and cancelling out most of the new, long-counted Jonson, Kyd, and Fletcher rejections, so the actual changes in rejections would be small.

    But Foster’s arguments get more telling when he addresses them to the Shakespeare Canon and supports them with commendably detailed line citations (his 1998/99, p. 499). With heavy whenas-using claimants like Greene it makes little difference whether you count long or short; either way you get a clear Shakespeare rejection. But with very light whereas- and whenas-users like Shakespeare, it does make a difference which system you use. Whichever one you use, you should use it consistently. When we first counted whereas’s and whenas’s, we did it with a KWIC stringfinder called Srch77, principally because it was much clearer and many hours faster than our early-1990s word processor. Srch77 could pick out whereas’s, where as’s, whenas’s, and when as’s in 50 plays in seconds -- but it would give you only one line of context and could conceivably miss some where as’s and when as’s split between two lines. With the then-new 486 computers and the Word Perfect of the day, it took ten hours to do what Srch77 did in ten minutes, and we did not consider the extra ounce of context worth the ton of extra time required.
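
    For readers curious about the mechanics, a few lines of Python can mimic the kind of one-line-at-a-time search Srch77 performed (a sketch of the general technique only, with made-up sample lines; Srch77 itself differed in its details):

        import re

        # Match whereas/whenas and their two-word variants. A form split
        # across a line break is missed, exactly as described above.
        PATTERN = re.compile(r"\b(where ?as|when ?as)\b", re.IGNORECASE)

        def kwic(lines):
            # Yield (line number, matched form, one line of context).
            for number, line in enumerate(lines, start=1):
                for match in PATTERN.finditer(line):
                    yield number, match.group(1), line.strip()

        sample = [
            "Whenas we search a line at a time, this form is found;",
            "but a form split across a line break, like this when",
            "as, goes uncounted.",
        ]
        for number, form, context in kwic(sample):
            print(number, form, context)  # finds only the first "Whenas"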

    But times change, computers and word processors get faster, Foster for once had given us a well-marked trail to follow, and we went back to the Canon with a word processor and with less of a groan than we would have uttered in 1994. Foster long-counts 15 whereas and whenas variants in “canonical Shakespeare” (his 1998/99, p. 499), by which he seems to mean the plays only. We long-count six whereas’s, one where as, and eight when as’s in the Riverside plays, for a total of 15, plus two when as’s in the Riverside poems. Neither of the when as’s in the poems, Ven 999 and Son 49.3, strikes us as a clear whenas, but the point is moot (and Foster’s strange notion that Elegy gets a “clear pass”, but Son and Ven do not (his 1998/99, p. 500), is mistaken), since we did not use the whereas/whenas test on poems.

    Like Foster, we counted all six whereas’s in the plays: 1H6 1.02.84, 2.05.76, and 5.05.64; Per 1.02.34 and 1.04.70; and 2H6 4.04.34. We did not count the where as in 2H6 1.02.58 as a clear whereas in 1994 but now, with two more lines of context, think Foster is right to count it, even under our short-count rules. But 1H6 and Per, acts 1 and 2, are widely doubted as Shakespeare’s work and were so doubted by Foster in 1987, though he seems since to have changed his mind about 1H6. Our tests confirm these doubts by firmly rejecting Shakespeare as sole author. Thus, five of the seven whereas equivalents are in “canonical Shakespeare”, since they appear in the Canon, but not “core Shakespeare”, since we, and most editors, doubt they are Shakespeare’s work. The last two, in 2H6, are outliers for us, but they are the only whereas equivalents in all of core Shakespeare. Since we had already given 2H6 a rejection for its one whereas, it did not get another one for its second whereas equivalent. 2H6’s overall rejection total is still only two, low enough to fit well within our Shakespeare profile. Foster is wrong in saying that we have concluded that 2H6 is another doubtful or collaborative play (his 1998/99, p. 500). We haven’t.

    The biggest discrepancies between our short counts and Foster’s long ones are with the when as’s. There are eight of these (and no whenas’s) in the Riverside play canon: 3H6 1.02.74, 2.01.46, and 5.07.34; Cym 5.04.138 and 5.05.435; Err 4.04.137; Wiv 3.01.24; and Tit 4.04.92. Foster counts them all as whenas’s. We counted only the ones in Cym and Tit as whenas’s, considering the ones in 3H6, Err and Wiv unclear. Hence, Foster long-counts eight whenas equivalents, while we had short-counted only three. With more lines of context, we are no longer so sure about Cym, which could easily be two when, as’s, but we continue, conservatively, to count both its when as’s as whenas’s. Hence, we don’t differ with Foster on these. The passages in Err and Wiv are interrupted and don’t have enough context to count as clear when as’s under our rules. We do differ with him on these. But surely two, and probably all three of the when as’s in 3H6 should be whenas’s even by our short-counting conventions. We no longer differ with him on these. Hence, our revised whenas short count, following our own previous rules, has been doubled from three to six, and we differ from Foster only as to the two interrupted when as’s in Err and Wiv. To be both conservative and consistent with our prior short-counting rules, we have not tried to squeeze out any more when as’s in our short-counted claimant or apocrypha plays. If we did, as noted previously, it would only make more and firmer rejections than the ones we reported, not fewer or weaker ones.

    The crucial question is not so much what our three new whenas equivalents in 3H6 mean for the claimants, but what they mean for our Shakespeare profiles. Do they loosen the profile? Foster’s answer, we take it, would be yes, that he and the “Shakespeare scholarly consensus” ascribe 3H6 solidly and entirely to Shakespeare, and that the play’s three whenas equivalents, together with five more in other canonical plays, show that whenas’s are not nearly as rare in Shakespeare as we make them out to be. We say no. Scholarly consensus on authorship of the entire H6 series is anything but solid (Evans, 1974, p. 588;[5] Wells and Taylor, 1987, p. 112).[6] And our other tests show 3H6, with six other rejections, to be much more likely a collaborative work than Shakespeare’s sole creation. The same is true of Titus Andronicus, which Foster seems to believe (as other scholars do not; see Wells and Taylor, 1987, pp. 113-15) is exclusively Shakespeare’s. It, too, has six other rejections by our tests. Its whenas equivalent, likewise, makes a seventh rejection, further weakening the case for sole Shakespeare authorship. Putting aside non-clear when as’s (in Err and Wiv) and non-core-Shakespeare when as’s, the only remaining countable whenas equivalents in all our core-Shakespeare plays are the two rather doubtful ones in Cym. These are enough to make Cym a Shakespeare outlier on this test, but, like the whereas equivalents in 2H6, not quite enough to justify changing our profile.

    (13-14) Foster’s challenges to our it-as-first-or-last-word-of-sentence counts and our hark/listen counts are not persuasive. Counting it’s should be among the easiest and least subjective of tests. Foster has faithfully tried to do so, with commendable attention to substantiation, and gotten much of it right -- apart from what look like two major problems. The first problem is that none of his 1H6 citations for first- or last-word it’s actually has a first- or last-word it. We would guess that one of those dreaded transcription errors has crept in, the kind of thing that Foster might have called “astonish[ing] ... methodological sloppiness”, had he found it in our work. The second problem is that he appears to have badly miscounted or, more likely, transposed first-word it’s and last-word it’s in Titus Andronicus. With the Tit transposition fixed, most of his counts for first-word or last-word it’s produce frequency rates within one or two points of ours.

    Why aren’t they identical? Surely not because it’s are hard to count. Could it be because of our “astonish[ing] methodological sloppiness”, as Foster proclaims (his 1998/99, p. 497)? Or could it be that counting sentences -- exactly as Foster has implicitly warned us with his demands for aggressive re-punctuation of copytexts -- is not so cut and dried as counting it’s?[7] First-time users of electronic word counters are always pleased with how much quicker and more accurate they are than manual counts, but then dismayed to find that no two word counters agree with each other, thanks to minor differences in the computer’s definition of a word. Computer-defined sentences are a step or two more complicated than computer-defined words. Our favorite sentence counter, Grammatik 4 for DOS, gets 1,365 sentences for 1H6, while whatever Foster is using (which probably does not use his own computer-resistant definition of sentence)[8] gets only 1,311. The same play, counted by Word Perfect 8 and Word97 for Windows, respectively, gets readings of 2,839 and 2,829 sentences! Textcruncher counts sentences implicitly but does not report its results.
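
    The definitional sensitivity is easy to reproduce. A minimal Python sketch (ours; none of the counters named above works exactly this way) shows two defensible definitions of “sentence” disagreeing over the same text:

        import re

        text = "O, stay! -- What news, my lord? 'Tis so; believe it."

        # Definition A: a sentence ends at . ! or ?
        count_a = len(re.findall(r"[.!?]+", text))
        # Definition B: semicolons and dashes also end a "sentence".
        count_b = len(re.findall(r"[.!?;]+|--", text))

        print(count_a, count_b)  # 3 versus 5, from identical text

    Per-sentence rates computed with counter A and counter B will differ accordingly, even where the it-counts themselves agree exactly.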

    We would guess from the small, 0-2 point differences between Textcruncher’s counts and Foster’s counts of first- and last-word it’s, that its sentence counts are different from Foster’s, but much closer to Foster’s than to Word97’s. Where the sentence counters are in the same ballpark, as they normally are and appear to be in this case, the prudent course is to pick one and stick to it throughout the test run. If the test has quirks, as most do, at least they are the same quirks for the entire test run. If Foster sees a pressing need to use manual counts and a different sentence-counter, he is welcome to do so. It would be a different, but probably sufficient way of doing the same thing, getting similar, but not identical results. But it would also be much more vulnerable to mistakes, judging from what we have seen of it in his article. It hardly proves that our way was wrong, far less that it was so gravely wrong as to be “astonish[ing] methodological sloppiness.”

    Foster’s critique of our hark/listen and see counts suffers from the same mistaken premise: that, because his counts differ from ours by a point or two, ours are unpardonably sloppy. This mistake is badly compounded by another, which makes the differences between his counts and ours look larger than they actually are. Our published figures are standardized to rates per 20,000 words and are clearly and repeatedly so described; for example, see our 1996, pp. 200, 222, and 224; our 1998/99, p. 436. Despite all the warnings, Foster has persisted in reading them as if they were raw numbers and then telling us, erroneously, that we have miscounted (his 1996b, p. 252; our 1998/99, p. 436; his 1998/99, p. 501), and that therefore “our figures are wrong so often as to be worthless” (his 1996a, p. 254).

    This time he says that we counted 15 hark’s in The Tempest; that there are really only 11; and that therefore “nothing can be inferred” from our “sloppy tabulation” (his 1998/99, p. 502). But what we actually reported was 15 hark adversions per 20,000 words, which amounts to 12 in a play the length of The Tempest (about 16,000 words). We went back to The Tempest, recounted hark adversions, and found 11 or 12, depending on whether you count “wilt thou be pleas’d to hearken” (Tmp 3.02.39) as a hark adversion. We probably did so count it in 1994, since it is a “usage which could invite someone else’s attention” (see our 1996, pp. 201, 229), but you could argue it either way. More important, you could get usable results either way, as long as you do it consistently.
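
    Since Foster read our standardized figures as raw counts, it may help to set out the conversion explicitly (a minimal Python sketch of the arithmetic we describe, not Clinic code):

        # Standardize a raw count to a rate per 20,000 words, and back.
        def per_20k(raw_count, play_length):
            return raw_count * 20_000 / play_length

        def raw_from_rate(rate, play_length):
            return rate * play_length / 20_000

        print(per_20k(12, 16_000))        # 15.0 -- the figure we published
        print(raw_from_rate(15, 16_000))  # 12.0 -- the raw count in The Tempest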

    These are manual counts; they do involve some exercise of judgment, and it should not be surprising if different people, or even the same person at different times, got slightly different results. Discrepancies between two different, but sufficient counting regimes are not per se proofs that one is unimpeachably right and the other intolerably wrong, only that the test, like many, is not immune to interpretational wobble. If the test gives you some discrimination -- and this one does -- it is generally better to limit the wobble by having the same person do the same test at the same time by the same rules, as we did, than “to have dumped the test altogether”, as Foster would prefer (his 1998/99, p. 500).
 
 
 

4. The Foster Scorecard

4.1. FIVE HITS, NO RUNS

    In our First Response we asked who was winning the “debate” and encouraged CHum readers, great quantifiers that they are, to devise a scoring system. We also suggested one of our own. We proposed to give Foster a “hit” for every significant new qualification to our tests that we should properly admit because of his response; a “run” for every test he forced us to drop entirely; and an “error” for any clear methodological mistake he made. We gave him three hits, one for each of three BoB tests we should qualify, no runs, and 23 errors (our 1998/99, pp. 440-44), 11 of them serious. In retrospect, we should actually have given him four hits, counting his identification of the Textcruncher glitch as his biggest hit.

    His Second Response, together with our own ongoing corrections and our publication of revised results after our Textcruncher fix, gives us marginally sharper rates for no+not’s and some BoB’s. It also gives us four more I’m’s and three more whenas equivalents than we had before, and one more whereas equivalent, none of these changing the pertinent Shakespeare profile. We are grateful to Foster for these Foster-inspired refinements. We would give him another hit for helping us refine our whereas/whenas results, but we would make no other changes to his prior hit or run scores, because the changes are so small. Cumulative Foster score for entire debate: five hits, no runs.

4.2. ERRORS: FOUR FOR US, FORTY-FOUR FOR FOSTER

    (1) Ours. What about errors? Let us start with our own. In his First Response, Foster charged us with 19 substantive errors and was partially right as to three (our no+not and I’m counts, and our insufficient cautions about chronology discounts to BoB tests), though not right enough to invalidate any test, or materially alter our Shakespeare profiles. We thought he was completely wrong on the other 16 charged errors, but we now know from checking his Second Response that he was partially right about our whenas counts. Hence, he was actually completely wrong on only 15 of the 19 substantive errors he originally charged us with. But 15 of 19 should still be a disturbingly high error rate -- 79% -- for someone as intolerant of errors, great or small, as Foster claims to be. By this measure, he got only 21% of his hardballs over the plate. We were troubled by three features of the first Foster Response: the seriousness of his charges of incompetence and malfeasance on our part, the flimsiness of his supporting evidence, and his strange practice of ignoring both the concessions and the non-concessions we had sent him in our April 1996 letter, discussing five of our tests which rejected the Elegy. “No one”, we said, “likes to be publicly berated for stubborn incomprehension of a point already conceded, or for points where ‘inconvenient’ counterargument and counterevidence has already been offered, but ignored” (our 1998/99, p. 444).

    By the time our First Response appeared, in April 1999, we had seen all of Foster’s case, checked out all his error charges, fixed the ones that were genuine, including the Textcruncher glitch, rerun every test affected, and sent the revised results to CHum. At this point we could accurately count and assess our errors and see how much difference each one in fact had made. Our final error count for the entire debate was four: Textcruncher glitch; not enough warning of BoB chronology problem; undercounted I’m’s by four, whenas equivalents by three, and whereas equivalents by one. Whereas and whenas were counted as one test and, hence, also counted as one error.

    Of Foster’s 14 charges of error against us in his 1998/99 Response, ten have turned out to be wrong (1-7, 9-11 above); three are unsubstantiated (8, 13-14); and one, whenas, is partially correct (12). In other words, this time Foster got only one of his 14 hardballs anywhere near the plate (seven percent), an error rate even worse than that of his First Response. Total substantive Elliott-Valenza errors in entire debate: four, all minor -- and all fixed.

    (2) Foster’s. In our own First Response we listed a dozen clear-but-minor Foster errors (such as his problems with reading, spelling, and arithmetic) and 11 clear-and-serious ones (such as lambasting us for banishing “inconvenient” texts to the Dubitanda which he himself had identified as dubious) (our 1998/99, pp. 443-44). In his Second Response, he responded, again erroneously, to one of our minor error charges, and perhaps to as many as half of the serious ones. His minor-point response claimed, in effect, that nine-thousandths is the same as nine percent; it is not (see 3.1 (9) above). If we read his leaning-microphrase concession broadly, he has also tacitly conceded the first five of our 11 charged major errors: falsely saying we didn’t consider prosody, wrongly knocking us for inconclusive results from a test we did not use, misunderstanding the difference between natural stress and metric stress, falsely claiming we had no safeguards against subjectivity, and absurdly denouncing our counts per thousand lines because “neither poem is even 1000 lines long.” But officer! How could I have been doing 70? I’ve only been out ten minutes. If we read his concession narrowly, as we would guess he actually intended, then he hasn’t conceded the errors, but he hasn’t acknowledged or responded to our contrary evidence either. In either case, these are still manifest, gross errors and still belong in Foster’s error totals.

    Foster, in his Second Response, did reply to another of our previous error charges: that he erred in blasting us for banishing “inconvenient” texts to the Dubitanda which he himself had identified as dubious. But his reply is not persuasive. In his 1998/99, note 12, he announced with one breath that our recollection on this point is “quite mistaken” -- but then, with the very next breath, as good as admitted that it wasn’t mistaken after all -- except for 1H6 (Section 2 above, third paragraph). Not much of a refutation, we would say. In 1987 Foster told us in great and welcome detail which parts of the Canon should be considered dubious. 1H6 was, in fact, on his list, as it should have been. We pulled all the dubious material so identified out of our core Shakespeare baseline, as we should have to get a clean baseline. He nonetheless castigated us in his First Response for following his own instructions (his 1996b, pp. 252-54, “conveniently exiling ... inconvenient data”), and, in his Second Response, seems to have done it again (his 1998/99, p. 507, “badly gerrymandered”) -- and then tried ineffectually to deny it (his 1998/99, note 12). Wrong, wrong, and wrong. The only question on this one is not whether it should count as a gross error at all but whether it should count as one error or three or six, for its many and insistent reiterations -- or should we use Foster’s own terms, “dinning repetitions” and “iron indifference” to contrary evidence (his 1998/99, pp. 494, 499)? We would give him at least three errors for his persistence in pressing a charge that seems to us untenable.

    Foster did not respond at all to the remaining five of our First-Response charges of major error: falsely denouncing us for our stubborn incomprehension of a point we conceded to him three times in the two preceding years; falsely bashing us for “deck-stacking” for using badges chosen from Shakespeare’s middle plays but validated for all his plays; and repeatedly ignoring the need to control for sample size (three times). All five of these errors are still clear and still major. Bottom line for Foster’s First-Response errors: no change, still 23 errors, 11 major. Apart (perhaps) from his leaning-microphrase concession, none of these have been fixed.

    What about Foster’s Second-Response errors? We would add eight minor and thirteen major ones to the previous totals of 12 and 11. The minor ones would be: thinks 9/1000ths is nine percent; improperly counted not’s in the prose dedication to the Elegy; got it wrong about our “literary advisors” “bailing” from the Clinic; wrongly thinks we rejected 2H6; wrongly claims we undercounted whenas’s in Selimus; didn’t catch our clear rounding warning about whenas’s in Cymbeline (his 1998/99, p. 500, our 1998/99, p. 438); thinks FE flunked only five of our tests (see below); and got all his it citations in 1H6 wrong -- but went ahead and bashed us for these, his errors, as if they were ours. The major errors are: falsely claimed our copytexts were “unedited”; got it wrong that BoB4 was improperly “suppressed” and that the other BoB’s were redundant and should have been suppressed; got it wrong that we “clearly misunderstand” our own leaning microphrase tests; failed repeatedly, again, to allow for sample size; reaffirmed his previous false charges of “deck-stacking” our baseline; falsely tried to deny his 1987 Dubitanda selections; never seems to have thought about how his criticism of our tests might bear on his own tests; and repeatedly -- that is, about a half-dozen times -- tried to nail us for tests we didn’t use. The last two problems are particularly flagrant. Many of Foster’s criticisms, if true of our work, would be devastating for his own tests, too (see below). Is he ready to retract them? His misleading misuse of our play tests on poems, our standardized figures as if they were raw, and our discarded Clinic tests as if we still relied on them, is, as he would put it, “no way to conduct attributional scholarship.” Dead errors don’t count. Neither do factitious ones.

    Foster is also clearly mistaken in thinking that his counting conventions for whereas, whenas, sentence-beginning and -ending it’s, hark’s, list’s, and see’s are the only legitimate ones. They are not. But our counts are not the only legitimate ones either, so, unlike Foster, we don’t count his alternative versions as errors and, in one case, have actually changed our counts to include three of his whenas’s and one whereas in the Shakespeare Canon. We don’t count the many errors in his Leaning Microphrase section, where he ultimately admitted we were right about FE and LC. And we don’t count several unsubstantiated, and probably false, Foster charges as his errors, because he has given us no way to verify or refute them. New Foster errors: 21; total substantive Foster errors in the debate: 44, 24 of them major. Bottom line for the two Foster innings: four hits, no runs, 44 errors. For someone who can’t abide “sloppy tabulation” (his 1998/99, p. 502), this seems to us an awful lot of hardballs nowhere near the plate. We have seen that Foster may have tacitly acknowledged as many as five of his errors, or he may not; it’s not clear from his concession. But, unlike us, he hasn’t fixed any of them, with the possible exception of his leaning-microphrase errors with the Elegy and A Lover’s Complaint. Nor has he otherwise acknowledged or regretted any of his First-Response curve balls. Instead, he has continued to hurl them in his Second Response, and even more wildly.
 
 

5. Lessons from the debate

5.1. CONTRASTING METHODOLOGIES

    Can any lessons be learned from this fierce exchange? We think so. We and Foster are in complete agreement that authorship matters and that, when you are dealing with Shakespeare, it is important to get it right. We also agree with him that, if authorship matters, quantitative internal evidence matters too. Far more fervently than we, judging from his rhetoric, he subscribes to the Proverbist’s belief that a false weight is abomination to the Lord, and to Steve Ballmer’s notion that anything false is bad. Where we differ from him is in the ways we have tried to keep our weights from getting too abominable. We have commonized our comparative samples systematically; he hasn’t. We have controlled for sample and baseline size; he hasn’t. We have controlled consistently for date and genre; he hasn’t. We have explained and supported every step of our analysis; he hasn’t, especially with Shaxicon. When good evidence shows we have made a substantive mistake, we have admitted it and, if possible, fixed it. He hasn’t.

    Above all, we have used “silver-bullet” tests, while he has used “smoking-gun” tests.  Silver-bullet tests are orders of magnitude more reliable, both in theory and in practice. We have not found, nor has Foster or anyone else we know of, a single perfect stylometric identifier like a fingerprint, free from false positives and negatives. Where evidence is imperfect, we have always believed a priori that the “silver-bullet” negative evidence we use to disprove common authorship is drastically more reliable than the “smoking-gun” evidence analysts like Foster use to try to prove common authorship. Perrault’s fairy tale to the contrary, if your foot is size five and fits Cinderella’s glass slipper, it does not really prove that you are Cinderella. By this test, you could just as well be Little Miss Muffet, or even Tiny Tim. But, if your foot is size ten, it is strong proof that you are not Cinderella (our 1997, pp. 183-85). By nature, “could be” is never as strong a proof as “can’t be” is a disproof.
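    To make the asymmetry concrete, here is a minimal sketch in Python. The feature names are hypothetical, and the two ranges are borrowed from the corrected no/no+not and BoB1 figures in our endnotes purely for illustration; this is the shape of a silver-bullet test, not our actual battery.

        # A minimal sketch of silver-bullet (exclusion) logic. Feature
        # names are hypothetical; the ranges are borrowed from our
        # corrected no/no+not and BoB1 figures purely for illustration.
        SHAKESPEARE_RANGES = {
            "no_ratio": (242, 358),   # no/(no+not) per 1,000
            "bob1":     (284, 758),   # a BoB-style composite per 1,000
        }

        def verdict(sample):
            """'Can't be' on any out-of-range score; else only 'could be'."""
            for name, (lo, hi) in SHAKESPEARE_RANGES.items():
                if not lo <= sample[name] <= hi:
                    return f"can't be ({name}={sample[name]}, outside {lo}-{hi})"
            return "could be -- but so could many other size-five feet"

        print(verdict({"no_ratio": 300, "bob1": 500}))  # inside both ranges
        print(verdict({"no_ratio": 120, "bob1": 500}))  # silver bullet fires

    The point of the sketch is the asymmetry: the second verdict is close to conclusive; the first is not.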

5.2. WERE OUR TESTS FOUL VAPOR?

    With the evidence of both sides now in and weighed, we think the results of the CHum debate bear out our a priori position. Was our work foul vapor? Clearly not. After two years of determined assault by an authorship blackbelt who was not pulling any punches, our work did not shatter like Don Quixote’s visor. On the contrary, it was barely scratched. 99.9% of our original results still stand, and we have fixed the 0.1% that were wrong with no change in the bottom line. If anything has been shattered, it is whatever the blackbelt was smiting us with. This is not, of course, to say that Foster’s failed assault has eliminated all our expectations of further erosion. Some erosion still seems inevitable for methods as sweeping, novel, and experimental as ours, but we now suspect it is more likely to come from close focus on one or two disputed texts than from a global assault like Foster’s CHum Responses. Our follow-on work on the Elegy, discussed below, may fit this close-focus description. That work long ago resulted in valid discounts to two of our tests: BoB5, for an elegy for a man, where subject matter may well inflate the ratio of masculine to feminine pronouns; and grade level, for texts like the Elegy, where the editor, in this case Foster, has heavily inflated average sentence length (our 1998/99, pp. 437, 440-41).

    Having since worked with other poets’ baselines much smaller than our Shakespeare baseline, we are also reminded that our block-and-profile tests are sensitive not only to sample size, where larger samples give you lower variance for good reasons, but also to baseline size, where smaller baselines give you narrower ranges for not-so-good reasons. These issues are discussed in our 2001, pp. 209-11. The smaller the baseline, the greater the need for a safety factor of more and firmer rejections before ruling out someone’s authorship. If, for example, we had not known that Shakespeare wrote Hamlet, or had known it but kept the play aside to test how well it matched the rest of Shakespeare, our Shakespeare play range would have narrowed by as much as 5% on some tests. Hamlet would then have gotten multiple rejections and become a doubtful Shakespeare ascription by our strict rules. To accommodate such cases, where a sizeable Shakespeare work may not be in our baseline, we would add a 5% safety factor to some or all of our ranges and increase the allowable total of rejections for a Shakespeare “could-be” from three to four, as sketched below. These two small adaptations would be enough to accommodate the omission of a play the size and shape of Hamlet -- that is, to accept it as a Shakespeare could-be while still consistently and conclusively rejecting non-Shakespeare plays. For baselines smaller than our sizeable Shakespeare ones, larger safety allowances would be called for.
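    The two adaptations can be sketched in a few lines of Python. The 5% padding and the cap of four rejections are the figures proposed above; the three ranges in the example are placeholders, not our published values.

        # The two adaptations proposed above: pad each baseline range by
        # a 5% safety factor, and allow up to four rejections (instead
        # of three) before a text stops being a Shakespeare "could-be".
        # The ranges below are placeholders, not our published values.
        def widen(lo, hi, safety=0.05):
            """Stretch a (lo, hi) range outward by 5% of its width."""
            pad = (hi - lo) * safety
            return lo - pad, hi + pad

        def could_be(sample, ranges, max_rejections=4):
            """Count out-of-range scores; 'could-be' while at or under the cap."""
            rejections = sum(1 for name, (lo, hi) in ranges.items()
                             if not lo <= sample[name] <= hi)
            return rejections <= max_rejections, rejections

        raw = {"t1": (232, 758), "t2": (150, 487), "t3": (249, 365)}
        padded = {name: widen(lo, hi) for name, (lo, hi) in raw.items()}
        ok, n = could_be({"t1": 760, "t2": 148, "t3": 300}, padded)
        print(f"rejections: {n}; still a could-be: {ok}")

    Without the padding, the first two scores would have drawn rejections; with it, a Hamlet-sized omission from the baseline no longer pushes a genuine Shakespeare text over the line.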

5.3. DID SHAKESPEARE WRITE THE ELEGY?

    What about the other half of the debate, the one over whether Shakespeare wrote the Elegy? This question, never far below the surface of the CHum debate, was argued explicitly by Foster in PMLA and in Elegy by W.S., and by us in Shakespeare Quarterly and in Literary and Linguistic Computing (Foster, 1996, 1989; our 1997, 2001). Here, too, despite Foster’s ringing proclamations that the Elegy was “Shakespeare’s beyond all reasonable doubt”, it seemed to us that his ascription was in trouble, and Foster himself has since abandoned it (below). The Elegy is indeed loaded with “smoking-gun” features that W.S. shared with Shakespeare. But it is even more loaded with features not shared with Shakespeare: enclitic, proclitic, and no/no+not scores far below Shakespeare’s range, and odd, un-Shakespearean usages, such as adventer instead of adventure, an husband instead of a husband, thank (noun) instead of thanks (noun), and none other instead of no other (our 1997, 2001). Each of these is a silver bullet against the Shakespeare ascription. The patient might survive two or three such hits, but fifteen or twenty are too many for a cheerful prognosis. Gilles Monsarrat (2002) and Brian Vickers (forthcoming) argue from smoking-gun evidence that the Elegy is far more loaded with features shared with the Jacobean playwright John Ford than with features shared with Shakespeare. This view was originally advanced by Richard Kennedy in a SHAKSPER posting (Kennedy, 1996) and tartly dismissed by Foster: “Ears (as Bottom reminds us) come in various sizes. We will all hear better, and more clearly, when the shrill tone of Mr. Kennedy subsides….” (Foster, 1996b).

    Our study of the Ford hypothesis, using silver-bullet tests for both the Shakespeare and the Ford ascriptions, showed that the Elegy has far too many of’s, noun of noun’s, whiles’s, and such as’s to be Shakespeare’s, but not too many to be Ford’s. All in all, counting firm rejections only, the Elegy flunks 16 of 33 validated Shakespeare tests and only one of 29 validated Ford tests (our 2001, pp. 217-18). Such an outcome is trillions of times more likely for Ford than for Shakespeare (Elliott and Valenza, 2002a). In June 2002, without having read Vickers and with no direct mention of us, but supposedly convinced by Monsarrat, Foster (joined by his Watson, Rick Abrams) surprised many with a public concession that Ford was the obvious author: “I know good evidence when I see it and I predict that Monsarrat will carry the day…. No one who cannot rejoice in the discovery of his own mistakes deserves to be called a scholar…. [!]” (Abrams and Foster, 2002).
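    The trillions figure comes from Valenza’s hyperspheric analysis (Elliott and Valenza, 2002a), which we will not reproduce here. But even a naive back-of-envelope binomial sketch -- assuming independent tests and a nominal five-percent false-rejection rate per test for the true author, both assumptions ours and purely illustrative -- shows why 16 rejections in 33 tests and one in 29 put the two candidates many orders of magnitude apart.

        # A back-of-envelope binomial sketch, NOT the hyperspheric
        # analysis of our 2002a. Assumes independent tests and a nominal
        # 5% chance that a validated test falsely rejects the true author.
        from math import comb

        def binom_pmf(k, n, p):
            """Probability of exactly k rejections in n independent tests."""
            return comb(n, k) * p**k * (1 - p)**(n - k)

        P = 0.05  # assumed per-test false-rejection rate (illustrative)

        p_shak = binom_pmf(16, 33, P)  # Elegy flunks 16 of 33 Shakespeare tests
        p_ford = binom_pmf(1, 29, P)   # Elegy flunks 1 of 29 Ford tests

        print(f"P(16/33 | Shakespeare): {p_shak:.1e}")         # about 7e-13
        print(f"P( 1/29 | Ford):        {p_ford:.1e}")         # about 3e-01
        print(f"odds favoring Ford:     {p_ford/p_shak:.1e}")  # about 5e+11

    Even under these crude assumptions the odds run to hundreds of billions to one; the tests are not fully independent, of course, which is why the fuller analysis in our 2002a is the one to rely on.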

5.4. PROBLEMS WITH FOSTER’S “SMOKING-GUN” INDICATORS

    Even before this striking concession, Foster’s “smoking-gun” Shakespeare tests were in deep disrepair. More than half the “distinctively high” percentages of run-on lines he reported for the Elegy, and many of the Elegy’s “distinctively long” sentences, were put there by Foster himself, not by W.S., whoever he was. Foster used different and inconsistent rules to set Shakespeare’s rates of run-on lines and feminine endings. Of the “nine, and only nine [common words] that never deviate in the plays by more than a third from their respective mean” (Foster, 1989, p. 141), Foster used only the five that passed the Elegy as Shakespeare’s, not the four that rejected it (Jackson, 1991, p. 259; our 1997, p. 188). If Foster was right in assailing our use of uncommonized copytexts as “toddling toward a precipice” (his 1998/99, p. 494), then he, too, toddled off it with his hyphenated-compound-word tests and his (word)-like tests on uncommonized comparison texts -- but, unlike us, he did not warn his readers of the problem. The unique “stylistic thumbprints” he claimed to find only in Shakespeare and the Elegy -- redundant more or most, incongruous who, and hendiadys -- all turned out not to be unique at all, but shared plentifully with Ford, and hence useless for distinguishing Ford’s work from Shakespeare’s (our 2001). And none of Foster’s 17 original 1989 tests was controlled for sample size. After these discounts, only about three of his 17 tests remained usable at all as silver-bullet tests -- hyphenated compound words, run-on lines, and feminine endings -- and the last two didn’t even reliably say “could be” to Shakespeare as FE’s author (our 1997, pp. 203-05). Another three -- incongruous who, redundant comparative or superlative, and hendiadys -- might still have some value as “smoking-gun” tests for Shakespeare ascription against some other authors, but hardly against all. None of them can distinguish reliably between Shakespeare and Ford (our 1997, 2001). Foster’s Shaxicon, which was supposed to have clinched his case for the Shakespeare ascription by showing matching “spikes” of rare word usage in the Elegy (his 1996), did not really do so, because Foster never showed it to be immune from false positives (our 1997, pp. 185-86).

5.5. IMPLICATIONS FOR FOSTER’S OTHER IDENTIFICATIONS

    Does this mean that Elegy by W.S. is no longer a treasure trove of authorship lore, that Shaxicon is worthless, that Foster’s golden ear for authorship is a myth, or that he is not worth the $250 an hour he gets as a popular expert witness on authorship questions? Not necessarily. Elegy by W.S. was wrong about the Elegy, but it is loaded with interesting information from Foster’s twelve years of indefatigable, pioneering investigations of incongruous who’s, hendiadys, mutual borrowings, play-performance cycles, printers, compositors, and so on. All of these made him and his book invaluable resources for our work in the Shakespeare Clinic. They still make Elegy by W.S. worthwhile reading, despite its many serious errors. Shaxicon was an inspired idea which Foster never got around to sharing enough for others to test -- inspired enough that it would be almost as interesting to know for sure that it does not work as that it does. When Foster’s case for Shakespeare crashed and burned, Shaxicon, its principal prop, presumably crashed with it -- or maybe there is some way of retrieving and rehabilitating some of it. We won’t know for sure till Foster lets it out of the closet.

    We are less wedded than Foster seems to be to the maxim erratum in uno, erratum in omnibus (see Foster, 1996a, passim; 1998/99, passim; Crain, 1998, pp. 30, 36). “In the ten years I have done textual analysis, I’ve never made a mistaken analysis.” “All I need to do is get one attribution wrong ever, and it will discredit me not just as an expert witness but also in the academy.” If this were really so, Foster would be utterly ruined. His centerpiece ascription of the Elegy to Shakespeare, resoundingly proclaimed, turned out to be resoundingly wrong, and his “flawless” supporting methodology riddled with errors. So was his misidentification of “Jameson,” a middle-aged housewife interested in the JonBenét Ramsey case, as the young male killer. So, we suppose, was either his resounding exculpation of Patsy Ramsey -- “I know you [Patsy Ramsey] are innocent -- know it absolutely and unequivocally. I will stake my professional reputation on it, indeed my faith in humanity” -- or his equally resounding indictment of her a few months later: “It is not possible that any individual except Patsy Ramsey wrote the ransom note”.[9]

    But for us the problem is not that he makes mistakes. Everybody makes mistakes, ourselves not excepted. But not everybody oversells their own mistakes, or denounces the supposed mistakes of others, as resoundingly, absolutely, unequivocally, and routinely as Foster. Too many of his claimed proofs have far outrun his supporting evidence. His compulsion to oversell breeds skepticism about his good insights as well as his bad ones -- unfortunately, because he has had some good ones. He was right about Primary Colors, the Unabomber, and the scholarly readers of his manuscript for Elegy by W.S. (Crain, 1998). He was also insightful and right about many of the individual pieces of his case for Shakespeare as the author of FE, if not about the case in chief, and even that central error had redeeming features. If he had not been privately convinced in 1984, when he first saw the Elegy, that it was written by Shakespeare, we would have had none of the valuable fruits of his labors. No one would have spent so much time and energy on a bad poem by Ford as Foster did while supposing it was a bad poem by Shakespeare. We are not grateful to Foster for his resounding, false charges of our idiocy and madness, but we remain grateful to him for his many labors on the Elegy, even though his ascription turned out to be wrong, and for his devoted, cheerful, and gracious aid to us in the many years before things went sour. You can always savor the trip, even if it’s to the wrong destination. We still read his work with interest, as we would the work of an intelligent anti-Stratfordian, but with much more skepticism than we once did, especially when he passionately insists that he knows something absolutely, unequivocally, and beyond all reasonable doubt. In our experience, his “passionate truths” have been more passionate than true.

 
  6. Conclusions

    Was the debate worth the trouble? In one sense, hardly. Trying to get Shakespeare right is much more interesting, inspiring, and worthwhile than trying to get Foster right. The Shakespeare Clinic was like mining a rich, new Mother Lode of gold -- hard work, but deeply satisfying, right on the steep part of the learning curve, with lots of discovery -- and, if you had to read something closely, it was mostly Shakespeare. Our CHum debate with Foster has been more like six years of scrubbing off tar and feathers: hard work, too, but on the flat part of the learning curve, with almost no payoff in new discovery. Re-explaining why your complicated tests are not idiocy or foul vapor is not much fun, and it still seems like an awful lot of effort just to retrieve those four I’m’s, three whenas’s, and one whereas that we missed the first time. Looking for Shakespeare’s hand in a dreary, pietistic Ford Elegy is hardly as inspirational as looking for it in the Sonnets, or even in the poems of Spenser, Donne, or Marlowe. We once likened our controversy with Foster over the Elegy to a land war in Asia over the literary equivalent of the Spratly Islands. It’s all too true.

    On the other hand, the Spratly Islands would be important if they had oil, and authorship does matter if there is any chance it could be Shakespeare’s. Getting such questions settled accurately is not a complete waste of our time, or Foster’s, or that of CHum’s readers. The old claimant-apocrypha questions we chose to address on our own, and the new Elegy questions Foster forced us to address, are both important enough to want to get right. Whether Shakespeare wrote the Elegy is no longer an issue, but it was a hot question in the 1990’s, with American editors saying “could be”, based on Foster’s quantitative evidence, and British ones saying “couldn’t be”, because the poem does not sound like Shakespeare and because they don’t trust quantitative evidence. Foster did force us to pay more attention to the Elegy than we otherwise would have done; the result, we believe, was a clear, quantitative disproof of the Shakespeare ascription and a presentation of our evidence to a wider and less technical audience than CHum’s. It also got our work an immediate, comprehensive, highly adversarial going-over by an authorship blackbelt, which it weathered with astonishingly low erosion. If you have made errors, you often learn more about them from your critics than from your friends. If Foster’s time is really worth $250 an hour, we would hate to think what we must owe him for his services in preparing the tun of hardballs and mud he has thrown at us. But in the end we still feel like the man in the Mark Twain story who was tarred, feathered, and ridden out of town on a rail. If it weren’t for the honor of the thing, we would just as soon have walked.
 
 
References

Abrams, Rick, and Foster, Donald. 2002. “Abrams and Foster on ‘A Funeral Elegy’” (online posting, 12 June 2002, SHAKSPER: The Global Electronic Conference, SHK 13.1514, 13 June 2002, http://www.shaksper.net/archives/2002/1484.html).

Crain, Caleb. 1998. “The Bard’s Fingerprints.” Lingua Franca, July/August, 29.

Elliott, Ward, and Valenza, Robert. 1996. “And Then There Were None: Winnowing the Shakespeare Claimants.” Computers and the Humanities 30, 191.  Dated 1996, appeared 1997.

Elliott, Ward, and Valenza, Robert. 1997. “Glass Slippers and Seven-League Boots: C-Prompted Doubts About Ascribing A Funeral Elegy and A Lover’s Complaint to Shakespeare.” Shakespeare Quarterly 48, 177.

Elliott, Ward, and Valenza, Robert. 1998/99. “The Professor Doth Protest Too Much, Methinks: Problems with the Foster ‘Response.’” Computers and the Humanities 32, 435. Dated 1998, appeared 1999.

Elliott, Ward, and Valenza, Robert. 2001. “Smoking Guns, Silver Bullets: Could John Ford Have Written the Funeral Elegy?” Literary and Linguistic Computing 16: 205.

Elliott, Ward, and Valenza, Robert. 2002. “So Many Hardballs, So Few of Them Over the Plate: Conclusions from our ‘Debate’ with Donald Foster.” Computers and the Humanities 36, 455.

Elliott, Ward, and Valenza, Robert. 2002a. “Elliott Notes on Valenza’s Hyperspheric Analysis.” Claremont McKenna College manuscript, March 4, 2002.

Evans, G. Blakemore, ed. 1974. The Riverside Shakespeare. Boston: Houghton Mifflin Company.

Foster, Donald. 1989. “Elegy” by W.S.: A Study in Attribution. Cranbury, NJ: Associated University Presses.

Foster, Donald. 1996. “A Funeral Elegy: W[illiam] S[hakespeare]’s ‘Best-Speaking Witnesses’.” PMLA 111, 1082.

Foster, Donald. 1996a. “Response to Elliot and Valenza, ‘And Then There Were None.’” Computers and the Humanities, 30, 247.  Dated 1996, appeared 1997.

Foster, Donald. 1996b. “Funeral Elegy” (online posting, 6 March 1996, SHAKSPER: The Global Electronic Conference, SHK 7.0172, 6 March 1996  http://www.shaksper.net/archives/1996/0172.html.)

Foster, Donald. 1998/99. “The Claremont Shakespeare Authorship Clinic: How Severe Are the Problems?” Computers and the Humanities 32, 491. Dated 1998, appeared 1999.

Kennedy, Richard. 1996. “Re: Funeral Elegy” (online posting, 2 March 1996, SHAKSPER: The Global Electronic Conference, SHK 7.0152, 1 March 1996, http://www.shaksper.net/archives/1996/0152.html).

Monsarrat, Gilles.  2002.  “A Funeral Elegy: Ford, W.S., and Shakespeare,” Review of English Studies 53: 186.

Vickers, Brian. forthcoming. Counterfeiting Shakespeare. Cambridge University Press.

Wells, Stanley, and Taylor, Gary. 1987. William Shakespeare: A Textual Companion. Oxford: Clarendon Press.

ENDNOTES

1.   These tiny changes are also a tribute to the care and consistency of the editing of the 1974 Riverside Shakespeare (Evans, 1974). Fixing the Textcruncher glitch changed the no/no+not range from 249-365/1,000 to 242-358/1,000.

2.   The BoB1 range changed from 232-758/1,000 to 284-758/1,000, and the BoB5 range from 150-487/1,000 to 159-487/1,000. That is to say, in round numbers, that three of our old minima were off by 1%, 1%, and 5%, and one maximum by 1%. If all these discrepancies were assigned to a single range, they would add up to 7.3% in discrepancy: .7%, .7%, 5%, and .9%. But a 7.3% discrepancy in one of 102 maxima and minima amounts to only .07% (7.3% ÷ 102) total erosion of our 51 play tests.

3.  The missing play from his 1998/99 dubitanda list is 1H6, which he did, in fact, identify for us as dubious in 1987, following the lead of “most editors since [Lewis] Theobald [1688-1744]” (Wells and Taylor, 1987, p. 112).  Our tests say that he and the other editors were probably right the first time. 1H6 has ten rejections in 51 tests, seven more than any core Shakespeare play. Has he changed his mind? Whatever he now thinks about 1H6, his listing of the other dubious plays as “non-Shakespearean” more than supports our point that they don’t belong in a clean core Shakespeare baseline.

4.   “His censure spares the crows, but scourges the doves.”  Juvenal, Saturae, II, 63.

5.   “Unless new data come to light ... educated guesses must take the place of knowledge.”

6.    “...pending further investigation Shakespeare’s responsibility for every scene of the play (3H6) should be regarded as uncertain.”

7.    Foster’s confusion of our standardized rates per 20,000 words and raw rates per play (below) did not become an issue with 1H6 and Tit.  Both plays are close enough to 20,000 words to make rounded-off raw and corrected counts identical.

8.   Foster writes: “We may define a ‘sentence’ in this case, not as the distance between capital letter and period, since this depends often on editorial preference, but more narrowly as a single independent clause, inclusive of all parenthetical and subordinate elements, but excluding other independent clauses attached to it with a conjunction or semicolon.”   Foster, 1989, p. 123.

9.   As reported on CBS’s 48 Hours, 8 April 1999. “Jameson,” Foster’s supposed perpetrator, was completely exonerated. She described the various contretemps, including Foster’s about-face on Patsy Ramsey, on a web site: http://www.jameson245.com/foster_page.htm (accessed June 7, 2001).