Glass Slippers and Seven-League Boots:

C-Prompted Doubts About Ascribing A Funeral Elegy and

A Lover's Complaint to Shakespeare

Ward E. Y. Elliott and Robert J. Valenza

April 2, 1997

Summary: Donald Foster, using computers and the Vassar Electronic Text archive, has devised several impressive "green-light, must-be-Shakespeare" inclusion-tests to show that A Funeral Elegy shares rare words and rare stylistic quirks with Shakespeare. But his conclusion that the Elegy is therefore "Shakespeare's beyond a reasonable doubt" is overstated and underproved, since his inclusion-tests have not been shown to be immune to "false positives." His "must-be's" turn out to be more like "could-be's." Other computer-aided tests, newly developed or adapted by the Claremont Shakespeare Clinic, are "red-light, can't-be-Shakespeare" exclusion-tests. Several of these show both the Elegy and A Lover's Complaint falling far outside of Shakespeare's range. Like Foster's tests, the Claremont tests are novel and imperfect and will no doubt be further qualified under closer scrutiny. But exclusion-tests can be imperfect and still reliably disprove authorship in ways that inclusion-tests cannot reliably prove it. A battery of valid "can't-be" test outcomes does more to disprove authorship than a battery of valid "could-be" outcomes does to prove it.

1. Background: Foster's Computer-Aided "Proof" that Shakespeare Wrote A Funeral Elegy

By 1989, after five years of work and three hundred pages of detailed, mostly manual analysis, Donald Foster had long been privately convinced that A Funeral Elegy by W.S. (hereafter FE) "looks to me like the work of Shakespeare." But he admitted that he had not come up with conclusive proof of Shakespeare's authorship: "My methods have been terribly rudimentary in some respects, due to the lack of available tools." In contrast to other Shakespeare discoveries in the 1980s, "[which] were trumpeted in the press and quickly rejected by the academic community," his "discovery" was not one to be oversold. He warily concluded that "[t]here is a possibility, perhaps even a strong possibility, that [the elegy] was written by Shakespeare."

As he noted ruefully, the world yawned at his equivocation. But he pushed on, perhaps because powerful tools to help resolve the doubts were already appearing on researchers' desks. By 1989 the desktop-computer era was in full course. In 1987-88, for the first time ever, all of the Shakespeare canon had become commonly available on disk, permitting researchers to compare Shakespeare's work systematically and comprehensively with troves of other texts, some having been laboriously keypunched into feeble old mainframe computers, others newly, and much more conveniently, prepared with desktop word-processors or even scanners. By 1989, one could compose a Riverside Shakespeare spellchecker and scan, beg, buy, commonize, and computer-analyze texts with unprecedented speed, accuracy, and ease. "Available tools" had suddenly turned into seven-league boots, which could cover the same ground in a hundredth of the time required by the old methods. Foster was quick to put on the boots and to soar with them back into his project with boldness, sophistication, and persistence -- or perhaps we should more properly say "to wade with them," because conquering one level of computational tedium brings you all too quickly to a new one just as demanding of time and more demanding of sophistication. Foster put together the Vassar Electronic Text archive, "a machine-readable collection that comprises dozens of authors, hundreds of texts, millions of words," which "may be (for the moment, anyway) the single largest collection of late medieval and Renaissance texts on disk." With prodigious labor and a lot of trial and error, he developed and deployed SHAXICON, an innovative play-by-play, part-by-part breakdown of the Shakespeare canon. SHAXICON's capabilities and limitations are just now beginning to be explored, but it could well turn out to be a major breakthrough in textual analysis. At the same time, he continued to pursue the old-tech literary and historical analysis at which, compared to us, he is a master. Computers cannot provide this kind of analysis, and without it, no consideration of authorship is really complete.

In December 1995, after six years of pioneering, new-tech research, Foster and a panel of peers flung equivocation to the winds, triumphantly announced his finding that FE was Shakespeare's "beyond all reasonable doubt," and promised that a formal written proof would soon follow. This time nobody yawned; the "discovery" was a media sensation worldwide. A five-page spread, featuring a full-page color portrait of Foster in academic regalia and holding a skull, parodiant Hamlet, appeared in L'Express: "Shakespeare: vrai ou faux?" The announcement also caused a sensation in the academic world. It certainly did not quiet all scholarly doubt -- far from it -- but FE will nevertheless appear as a possible Shakespeare work in all three forthcoming editions of Shakespeare's complete works.

2. The Current Case for Shakespeare Ascription

The promised written proof finally appeared in October 1996, making the following points:

1. W.S. and Shakespeare appear to have borrowed from the same sources and from each other;

2. people who borrowed from Shakespeare also appear to have borrowed from W.S.;

3. "W.S.'s use of language -- his diction, grammatical accidence, syntax, prosody -- was significantly, often distinctively, Shakespearean"; that is, W.S. shared such rare Shakespeare quirks as a fondness for incongruent who's (such as pillow who or books who) and for hendiadys (e.g., cups and gold, instead of golden cups);

4. SHAXICON-based comparison shows that W.S. and Shakespeare -- and only W.S. and Shakespeare among the texts tested -- share rare-word "spikes" with plays in which Shakespeare was performing at the time of composition, the spikes being particularly pronounced for rare words used in parts presumptively played by Shakespeare.

It should be noted that, while some of this new evidence may supersede some of the 300 pages of conventionally gathered evidence in Elegy by W.S., the main thrust of the new is to confirm and amplify the old, which, despite considerable subsequent questioning of its key tests, is still a treasure trove of Shakespeare identifiers. The old evidence showed, and we think still shows, that Shakespeare was exceptionally fond of feminine endings, run-on lines, occasional very long sentences, and hyphenated compound words, as well as hendiadys, incongruent who's, and redundant comparatives and superlatives (such as most unkindest). Shakespeare also seemed to have predictable ranges of certain common words: and, but, like, most, not, do, and so. These tests have not fared well since 1989 because Foster appears to have used only the ones that passed FE (Section 7.1 below) and failed to control for sample size (Appendix 1 and Postscript below); but at the time they looked like strong evidence that W.S.'s text, alone among forty samples of elegiac verse, fit the Shakespeare profile on all seventeen tests. Of the other samples, only two passed as many as five of the tests.

To our not-so-practiced eyes (neither of us is a Lit Department regular), Foster's new analysis likewise looks like an impressive example of what can be done with a combination of old- and new-tech methods. It does look like a shoring-up of his somewhat-eroded previous case for FE as Shakespeare's, though we don't think it is any more proof against erosion than the old case was. Even if his new conclusion is wrong (as we think it probably is), one can learn much about Shakespeare's style from the ways in which he has tried to prove his case. Moreover, we agree completely with him and his supporters -- and, indeed, with his detractors -- that, when dealing with a question of Shakespeare ascription, getting the authorship right does matter.

3. The Shakespeare Clinic's Computer-Aided "Disproof" of Shakespeare Authorship of Funeral Elegy and Scores of Other Works

But to say that Foster's evidence is sophisticated and illuminating is not to say that any of his proofs are perfect, or that, individually or collectively, they prove Shakespeare's authorship beyond all reasonable doubt. We cannot join with those who have hailed his methodology as "flawless" or even with those who believe he has gotten his Shakespeare ascription right, let alone right beyond all reasonable doubt.

Our hesitance to climb on the bandwagon is based neither on FE's stylistic shortcomings (most lay readers and many professional ones find it too plodding, abstract, and moralistic for Shakespeare) nor on the absence of external evidence connecting Shakespeare with William Peter, the elegy's subject. Foster concedes that style and external evidence do little to prove Shakespeare authorship -- but he contends, perhaps plausibly, that neither precludes it. Tastes differ; preconceptions are often wrong; it took a century for the Sonnets to be accepted as Shakespeare's. If Homer can nod, why can't Shakespeare plod?

Our problem is with Foster's notion that FE's sharing of rare words and quirks with Shakespeare, even a lot of rare words and quirks, proves Shakespearean authorship, and also with his tacit implication that no strong contrary internal evidence exists. Neither, in our view, is true.

Our own approach to authorship questions derives from our experience as faculty advisors to the Claremont Colleges Shakespeare Clinic. This Clinic, which ran intermittently from 1987 to 1994, consisted of a series of teams of Claremont Colleges students using computers to shorten the list of fifty-eight "claimants" to the "true authorship" of Shakespeare's poems and plays. It was sponsored by the Sloan Foundation, the Irvine Foundation, and the Claremont McKenna College Practicum program; and it reported to the Shakespeare Authorship Roundtable, a generally Stratford-skeptical California authorship study group. The Clinic's final report appeared in January 1997, in the April 1996 issue of Computers and the Humanities -- together with a scorching, but (in our view) poorly-supported critique from Foster. Our postscript, below, summarizes this controversy from our perspective.

4. The Claremont Exclusions

4.1. Shakespeare Claimants

Many Shakespeare Quarterly readers will doubtless be shocked at the thought of anyone spending ten years shortening the list of "credible" anti-Stratfordian claimants, not a one of whom, after all, commands the support even of most anti-Stratfordians. Many would find it even more dismaying that we tried to do it with computers. But few would be terribly surprised by our main results. Only thirty-seven of the fifty-eight claimants turned out to have poems or plays that could be tested against Shakespeare, and not a one of these -- including the frontrunners Bacon, Marlowe, and the Earl of Oxford -- came anywhere near to fitting our Shakespeare profiles. If our tests are right, there has been carnage in the ranks of the "claimants."

4.2. Plays and Poems from the Shakespeare Apocrypha

Nor would many readers be surprised at most of our secondary results. We tested all 27 plays of the Shakespeare Apocrypha; we tested various poems ascribed to Shakespeare, including FE; and, where they were large enough, we tested the Shakespeare Dubitanda, plays and parts of plays in the canon where Shakespeare's authorship has been disputed. None of the Apocrypha plays, nor any of the Dubitanda plays conventionally ascribed to others, nor any of the poems latterly ascribed to Shakespeare (including FE) matched Shakespeare. More surprising, though hardly shocking, is that Henry VI, Part 3, Titus Andronicus, and most of the Dubitanda sections which are ascribed to Shakespeare, while they come much closer to fitting within Shakespeare's profile than the parts ascribed to others, still show what we consider strong signs of co-authorship. Most surprising, though still hardly shocking given its history of disputed authorship, is that A Lover's Complaint (hereafter LC) does not match Shakespeare's other poems or tested play verse.

4.3. Computer-testing the Claremont Electronic Text Archive

We came to these results through a process in some ways similar to that used by Foster. We, too, put together a corpus of Renaissance poems and plays with dozens of authors, hundreds of texts, and millions of words. Like Foster's, ours, too, may well have been the biggest and best of its kind in the world at one time or another. And ours, too, has probably now been eclipsed, in size at least, by Chadwyck-Healey's large, costly collections of electronic texts on CD-ROM. If we had thought of it, our collection, too, would probably have had a resplendent name, such as "the Claremont Electronic Text Archive." Text crunching was (and still is) so new and uncrowded a field that it was not terribly hard for low-budget, small-college pioneers to get to the frontier and lay a plausible claim to a few superlatives. And we certainly crunched our texts every way we could think of, using both traditional tests (including some from Foster's Elegy by W.S.) and fancy new tests of our own devising.

4.4. Test Criteria: Fifty-one Tests, All "Valid," None Perfect

But our methods also differed from Foster's in several respects. Foster favored a mixture of old- and new-tech tests, lightly standardized, if at all, and mostly designed -- at least where FE was at issue -- as "green lights" to include as Shakespeare matches those works that shared some rare quirk supposedly peculiar to Shakespeare. We were partial to new technology, standardization, and "red light" tests designed to exclude as mismatches works that lack a trait ubiquitous in Shakespeare. Our standardized layout was intended to permit easy comparison of each test's results with those of other tests. With Foster's help, we set up clean "core Shakespeare" baselines, excluding, as best we could, all texts where any other writer's hand was suspected. Since one expects more variance among small samples than among large ones, we sought to compare texts in blocks of roughly the same size: whole plays against whole plays, or shorter blocks against other shorter blocks, but never, if we could help it, short blocks against long or long against short. We looked for consistency, that is, for a Shakespeare profile that would pass at least ninety-five percent of our core Shakespeare baseline works or blocks. Any block of text that fell outside Shakespeare's normal profile on any given test would get a "rejection" and would be considered a statistical "outlier," even if we were sure it was Shakespeare's. One or two such rejections, by themselves, are not enough to disprove Shakespeare's authorship. Of our sixty-nine core Shakespeare baseline verse blocks, none had more than two rejections. Only five (seven percent) had even two. For a poem the size of FE or LC, three rejections in fourteen tests are enough to justify serious doubts of Shakespeare's hand. Where there are six rejections in fifteen tests, as with LC, or thirteen rejections in twenty-two tests, as with FE (see sections 7-9, 10.5 below), the doubts become more than serious.
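The rejection-counting procedure just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the Clinic's software: the function names, test names, ranges, and scores below are all hypothetical, chosen simply to show the mechanics of scoring one sample against a set of Shakespeare profiles.

```python
def is_rejection(score, profile):
    """A score outside the (low, high) Shakespeare range counts as a rejection."""
    low, high = profile
    return not (low <= score <= high)


def count_rejections(sample_scores, profiles):
    """Count the tests on which a sample falls outside the profile.

    Both arguments are dicts keyed by test name.
    """
    return sum(is_rejection(sample_scores[name], profiles[name])
               for name in profiles)


# Hypothetical ranges and scores, for illustration only.
profiles = {"test_a": (2, 14), "test_b": (0.8, 1.6), "test_c": (10, 25)}
sample = {"test_a": 23, "test_b": 1.1, "test_c": 31}

rejections = count_rejections(sample, profiles)  # test_a and test_c fall outside
print(rejections, "rejections in", len(profiles), "tests")
```

On this made-up battery the sample draws two rejections in three tests. Whether such a tally is damning depends, as noted above, on how many rejections the core baseline blocks themselves draw on the same tests.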

We also looked for a modicum of distinctiveness -- that is, for tests that would reject a minimum of ten percent of our non-Shakespeare samples. The least discriminating of our tests rejected ten to twenty percent of non-Shakespeare samples; the most discriminating rejected over sixty percent; the average test rejected about a third of non-Shakespeare. None of our tests, not even the best of them, was a perfect identifier in the way that fingerprints, DNA analysis, and Cinderella's glass slipper are supposed to be perfect identifiers, fitting only the true Cinderella and no others.
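The two admission criteria for a test, consistency on Shakespeare and a modicum of distinctiveness against everyone else, could be checked along the following lines. Again this is a hedged sketch: the two thresholds come from the text above, but the function names and all of the scores are hypothetical.

```python
def fraction_inside(scores, profile):
    """Share of scores falling within the (low, high) range."""
    low, high = profile
    return sum(1 for s in scores if low <= s <= high) / len(scores)


def is_usable_test(shakespeare_scores, other_scores, profile,
                   min_consistency=0.95, min_discrimination=0.10):
    """Keep a test only if it passes at least 95 percent of the core
    Shakespeare blocks and rejects at least 10 percent of the others."""
    low, high = profile
    consistency = fraction_inside(shakespeare_scores, profile)
    rejected_others = sum(1 for s in other_scores if s < low or s > high)
    discrimination = rejected_others / len(other_scores)
    return (consistency >= min_consistency
            and discrimination >= min_discrimination)


# Hypothetical scores, for illustration only.
shakespeare = [2, 3, 3, 4, 5, 5, 6, 6, 7, 7,
               8, 8, 9, 9, 10, 11, 12, 13, 14, 20]
others = [1, 4, 9, 12, 16, 18, 22, 6, 11, 13]

usable = is_usable_test(shakespeare, others, (2, 14))
too_wide = is_usable_test(shakespeare, others, (0, 30))
print(usable, too_wide)
```

The second call shows the trade-off: a range wide enough to pass every Shakespeare block also passes every other sample, so it rejects nothing and fails the distinctiveness criterion.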

5. Cinderella or Bertha Broadfoot?

Tests that claim to include conclusively have to be perfect; tests that claim to exclude conclusively do not. This means that we had to pay much more attention to negative evidence than to positive because, with less-than-perfect identifiers, strong exclusionary evidence normally trumps strong inclusionary evidence. Imperfect tests are more like shoe sizes than like fingerprints. Only in a fairy tale can the tester actually identify Cinderella because she alone fits a size-five slipper. In the real world there are too many other size-five "false positives" -- Little Miss Muffets, say -- for the test to be conclusive. But, although the shoe-size test can't reliably distinguish Cinderella from Miss Muffet, it can reliably distinguish her from Bertha Broadfoot, who is a size ten. Hence, imperfect tests are intrinsically better at disproving authorship than at proving it. And perfect tests are extremely rare for projects like ours. We would love to have found a rare-quirk test that says "must be Shakespeare," but we have yet to see one documented. In the meantime, for our purposes, "could be Shakespeare" or "couldn't be Shakespeare" has been serviceable enough. In fifty-one tests our core Shakespeare baseline (32 plays) had an average of only one rejection per play, with a range of zero to three rejections per play. With the same tests, Apocrypha and claimant plays had an average of sixteen and nineteen rejections, respectively, and a combined range of eight to twenty-eight rejections. Even the most Shakespeare-discrepant of our Shakespeare baseline plays had little more than a third as many rejections (three) as the least Shakespeare-discrepant of the other plays tested (eight).
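One way to see why a rejection weighs more than a pass is to compare likelihood ratios. The sketch below is offered only as an illustration of the argument, not as a calculation from the Clinic's report. It assumes a test that passes ninety-five percent of Shakespeare blocks and rejects one third of non-Shakespeare samples (the average figure given in section 4.4), and it treats the tests as independent, which is a simplifying assumption. The variable names are hypothetical.

```python
from fractions import Fraction
import math

p_pass_shakespeare = Fraction(95, 100)  # consistency rule from section 4.4
p_pass_other = Fraction(2, 3)           # average test rejects about a third of others

# How much more likely each outcome is under one hypothesis than the other.
lr_pass = p_pass_shakespeare / p_pass_other                # favors Shakespeare
lr_reject = (1 - p_pass_other) / (1 - p_pass_shakespeare)  # favors someone else

# Passes needed to offset a single rejection, if the tests were independent
# (an assumption made here for illustration; real tests may be correlated).
passes_to_offset = math.log(float(lr_reject)) / math.log(float(lr_pass))

print(float(lr_pass), float(lr_reject), passes_to_offset)
```

Under these assumptions a pass multiplies the odds of Shakespeare authorship by a little over 1.4, while a rejection divides them by more than 6, so it takes between five and six passes to cancel one rejection. A test with a perfect zero false-positive rate would behave quite differently, which is the glass-slipper case.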

6. Have Any of the Foster Tests Achieved Perfection?

We believe that our emphasis on exclusionary "red-light" rather than inclusionary "green-light" tests is a principal difference between our approach and Foster's. Though he does not avoid exclusionary tests altogether, his more typical approach is to make a detailed showing that FE has "textual and linguistic fabric indistinguishable from that of canonical Shakespeare" because it passes the inclusion test for a given rare Shakespeare quirk. He then adds that the Vassar archive supplies few or no non-Shakespeare examples of the quirk at issue, notes that the exceptions, if any, were "playwrights who learned to write, as it were, on Shakespeare's knee", and proceeds to the next point. Though he does not openly claim that the presence of any given Shakespeare quirk shows a work to be Shakespeare's, or that its absence shows that a work is not Shakespeare's, he strongly implies that many such quirk-tests seem so close to perfect in their immunity to false positives that a work, if it passes enough of them, "must be Shakespeare's."

6.1. "Sum-of-the-Quirks" Tests Can Mislead and Should Require Very Strong Proof

However, the "sum-of-the-quirks" standard presupposes near-heroic levels of due diligence: the user must show that the quirk is in fact unique to Shakespeare and that every contrary indicator has been ruled out. We are skeptical of such presuppositions. Anti-Stratfordians have put together thousands of pages listing seemingly rare quirks, many of them pointing with what looks like amazing specificity to one claimant or another. The Earl of Oxford "bore the canopy" (Sonnet 125), was lame (Sonnets 37, 89), was almost Henry Wriothesley's father-in-law, was a dead ringer for Hamlet, wrote letters packed with Shakespearean imagery, and so on. But not only can't all the claimants be Shakespeare; if we did it right, none of the thirty-seven we tested can be Shakespeare. In his Cardenio, or the Second Maiden's Tragedy, Charles Hamilton has given us what must surely be the greatest of all sums of Shakespeare quirks: 18,000-plus words of manuscript, every one to all appearances in Shakespeare's own handwriting, written at almost the right time to be Shakespeare and Fletcher's long-lost Cardenio, containing material from the Cardenio story, and certified by a master graphoanalyst. "Every word doth almost tell my name!" Should The Second Maiden's Tragedy therefore take its place beside FE in the new Shakespeare canon? Not at all. It is crammed with Middleton quirks and has twenty-eight Shakespeare rejections, more than any other play we tested. From our perspective, even if we grant its 18,000-plus points of resemblance to Shakespeare, it is the twenty-eight points of difference that tell us who probably did write it and tell us for sure who did not write it.

We say "probably" on the inclusionary side, both because our tests are validated for Shakespeare, not Middleton, and because we can't be quite sure that Middleton's own quirks are unique. If our archive is representative, Middleton was inordinately, maybe uniquely fond of contractions. Middleton plays account for only a tenth of the words in our commonized collection of plays with identified authors, but Middleton nevertheless contributed three quarters of the I'm's and you're's, and ninety-five percent of the we're's, I've's, and you've's. It seems plausible to credit him, as most scholars do, with the authorship of The Revenger's Tragedy, A Yorkshire Tragedy, and The Puritan, along with The Second Maiden's Tragedy, since all of these are loaded with these rare-quirky marker words. But what of Woodstock, which is also loaded with Middleton's rare quirk words? Should we follow the 170-odd well-documented quirk words which say "this must be Middleton," or should we follow common sense in noting that Middleton was only twelve to fifteen years old when Woodstock is thought to have appeared and seems an improbable author? For us, and, we believe, for most people, the one strong piece of exclusionary evidence would prevail over the scores of otherwise persuasive-looking pieces of inclusionary evidence.

Summing the inclusionary quirks, a common feature of authorship studies, has been much deplored by scholars, Foster included. As he puts it:

In the past, attributional debate was a game in which "striking verbal parallels" were advanced without proof of their distinction, only to be dismissed as "unconvincing" or "commonplace" without proof of their insignificance. It is no longer safe to make casual pronouncements of this sort, for researchers can now test such matters objectively, by mapping the recorded language of an archived writer against the linguistic system shared by a community.

6.2. Has Extraordinary Proof Been Provided?

How testable are Foster's own proofs of distinction? One answer might be "not so easily as you might suppose from the passage just quoted." Computers have drastically lowered the time costs of many tests but hardly of all tests and not to zero. Take, for example, Foster's claims about Shakespeare and W.S.'s use of hendiadys and redundant comparatives or superlatives. In testing such claims, computers can spare you scanning all of the thousands of pages of text in the Vassar archive by taking you swiftly from one "and" to the next and from one "more" or "most" to the next. But tens of thousands of and's, more's, and most's remain in the archive, several per page, and the net burden of checking all of those manually from the screen is scarcely less daunting than that of physically leafing through thousands of pages. How many reviewers, especially the ones most skeptical of Shakespeare provenance in the first place, would undertake such a task to test the claim that "neither past scholarship nor the Vassar text archive has yielded a single poet besides Shakespeare and W.S. who makes frequent use of ... 'Shakespearean hendiadys'"? We won't say that only a true believer would perform such heroically tedious tests, but it makes much more sense for a true believer, who expects to hit pay dirt, than for a skeptic who does not, to do such tests -- or, like Don Quixote with his second helmet, to consider them as good as done.

6.3. SHAXICON

Something similar may be said of SHAXICON. Highlighting some striking patterns of correspondence between W.S. and Shakespeare is a remarkable accomplishment, which by itself could be worth the price of admission. But it is not the same as showing that SHAXICON's rare-word "spikes" are immune to false positives. Spotchecking can be done by anyone having access to SHAXICON (which most people don't), but making a compact, comprehensive case for perfect immunity from false positives is another heroically tedious task, and this one has barely begun. Foster is probably right in supposing that "it may take years before this evidence is scrutinized and fully digested," and we wish him luck in his hopes of getting SHAXICON licensed for others to try out on the Internet. SHAXICON seems to have made such stylometric exercises a hundred times more doable, and it might well become a powerful tool for ascribing and dating texts. Or it might not. Foster's own preliminary summary of SHAXICON results is more tentative than what he later presented in "Best-Speaking Witnesses," and even that preliminary summary has been strongly criticized for giving Shakespeare many more acting roles than seem justifiable by the historical record. Once again, "doable" is not necessarily the same as "done."

6.4. Incongruent Who's: Limited to the Shakespeare Camp?

We have not checked every text in the Vassar archive, but we did make a small spot-check of his second main quirk-test, Shakespeare's incongruent who (as in books who). According to Foster,

In Shakespeare's canonical poems and in W.S.'s elegy, twenty percent of the clauses beginning with who or whom are of this idiosyncratic form. Elsewhere in English literature, the percentage is immeasurably small. The Vassar archive supplies only a few non-Shakespearean texts in which Shakespearean who appears more than once. Two instances occur in the anonymous play The Reign of King Edward III, both of them in . . . scenes commonly ascribed to Shakespeare. The remaining plays are by George Wilkins and John Ford, junior playwrights who learned to write, as it were, on Shakespeare's knee.

When we were testing claimant poems in 1990, we did not use the "incongruent who" test because, once we standardized Shakespeare into 3,000-word poem blocks, we could not exclude anything with it. Of our fifteen Shakespeare poem blocks, three had no incongruent who's. Our rules allowed us only one outlier from Shakespeare's normal range for an acceptably consistent, usable Shakespeare profile (see section 4.4 above). Larger blocks of play verse -- or perhaps even of play prose -- might have shown greater consistency (or they might not have), but the Clinic was all but over by the time we had the plays sufficiently edited for this kind of analysis. In retrospect, it is possible that our rigid, exclusionary test standards may have cost us an otherwise illuminating test.

Foster's more flexible, inclusionary methods permitted him to use and showcase the "incongruent who" test, but he has not dispelled questions about the test's immunity to false positives. While not every incongruent-looking who is clear-cut, our counts for Shakespeare's poems seem broadly to agree with his. What does not agree as well with his thesis is a clutch of what looks to us like false positives from a brief spot-check of poem samples in the Claremont archive. This yielded two or three "Shakespearean who's" each from Heywood, Daniel, Lodge, and Greene, and perhaps one from Drayton. The percentages here may still be too small for Shakespeare -- though no one can know what is "too small for Shakespeare" without some kind of Claremont-style standardization: until, that is, you have defined the test and run it on all of Shakespeare's core works, standardizing for text length and genre, and found some kind of Shakespeare profile. We see no real sign that Foster has done this. It may be, of course, that we need better guidance in recognizing "Shakespearean who's"; or it may be that some or all of the poets named learned to write on Shakespeare's knee, or he on theirs. On the other hand, it may be that a more comprehensive search would have turned up yet more false positives. We don't know. What we do know is that each of the apparent false positives is at least a yellow light and maybe a red light -- a warning that the test may not be as errorproof as it is made out to be and, again, that the full substantiation which an inclusion test requires may be doable but has not yet been done.

7. Contrary Evidence for Funeral Elegy: MacDonald Jackson and the Shakespeare Clinic

So much for the evidence that FE must be by Shakespeare. We are impressed by the analysis and would like to see more of it, but we don't think the case for the affirmative has been proved. What of the evidence that it is not by Shakespeare? In one sense Foster is right in asserting that "no one [has] attempted a systematic rebuttal of my evidence for Shakespearean authorship of the elegy." Our discussion thus far may have shown why. An all-out, internal-evidence rebuttal could require several disproofs a step more heroic than the "proofs" supplied by Foster, plus more access to SHAXICON than has yet been made available, and FE skeptics are much less likely than FE believers to consider the poem worth such an effort. But this is not to say that there is no contrary evidence worth considering. As we have seen, there have already been sharp assaults on FE for its lack of literary distinction and externally evidenced connection with Shakespeare. There have also been two stylometric critiques of Foster's case for FE, MacDonald Jackson's and ours. Let us start with his.

7.1. MacDonald Jackson and Common-Word Frequencies

In 1991, MacDonald Jackson, like us, admired Foster for his methodological enterprise, but he did not subscribe to every Foster test or to Foster's then-tentative conclusion that FE was by Shakespeare. In a five-page review of Elegy by W.S., Jackson noted FE's un-Shakespearean "air of dull abstraction" and asked a stylometric question that Foster has never answered: why was Foster so wedded to common-word tests that passed FE, yet so loath to use other obvious common-word tests that would have flunked it? Having picked out the "nine, and only nine [common words] -- and, but, by, in, not, so, that, to, with -- that never deviate in the plays by more than a third from their respective mean [Shakespeare] frequencies," Foster discarded the four prepositions (by, in, to, with) where FE fell outside Shakespeare's range, while testing only for the remaining five words where FE fit the Shakespeare profile. Foster's only explanation for discarding four FE-rejecting tests was that he thought they had "little if any value as indices of style." Jackson concluded that "it seems probable that there has been a degree of unconscious bias in the selection and application of the tests." He went on to note that FE also fell outside of Shakespeare's poem frequency range in its use of the collocation in the. Shakespeare's poem maximum was 2.3 in the's per thousand words; FE's rate was 5.5. "If such solid-looking blocks in the case that Foster constructs can be shown to be hollow, the whole edifice seems in danger of collapsing."
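The "within a third of the mean" criterion, and the effect of reporting only the words that pass it, can be sketched as follows. This is a minimal illustration under stated assumptions: the mean rates and sample counts are invented for the example and are not Foster's or Jackson's figures, and the function names are hypothetical.

```python
def rate_per_thousand(count, total_words):
    """Occurrences per thousand words of text."""
    return 1000 * count / total_words


def within_a_third(sample_rate, shakespeare_mean):
    """True if the sample rate deviates from the mean by no more than a third."""
    return abs(sample_rate - shakespeare_mean) <= shakespeare_mean / 3


# Hypothetical mean rates (per thousand words) and sample counts.
shakespeare_means = {"and": 30.0, "but": 6.0, "by": 4.5, "in": 12.0}
sample_counts = {"and": 110, "but": 30, "by": 40, "in": 90}
sample_words = 4300

verdicts = {
    word: within_a_third(rate_per_thousand(sample_counts[word], sample_words), mean)
    for word, mean in shakespeare_means.items()
}

passed = [word for word, ok in verdicts.items() if ok]
print(f"{len(passed)} of {len(verdicts)} words pass:", passed)
```

With these invented numbers the sample passes on two words and fails on two. A report confined to the two passing words would show a perfect score, which is the selection problem Jackson raised.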

7.2. Funeral Elegy Has Too Many In The's for a Shakespeare Poem

Such common-word tests are only the beginnings of our kind of test regime: first, establish a Shakespeare profile; second, see if the test sample fits within it; and third, give the sample a "could be" if it passes, a rejection if it doesn't (section 4.4 above). When these procedures are applied to Foster's common-word tests, FE gets five rejections out of eleven instead of six passes out of six. The next steps, under our procedures, would be to broaden the verse baseline by adding some play verse, to break the broadened baseline into blocks of roughly the same size as that of the sample texts to be tested, and to run the test against a range of samples to find if the trait tested is unique to Shakespeare. While reblocking our entire Shakespeare verse baseline to 4,300-word blocks is a labor bigger than we consider appropriate just to try out five new tests on one poem, we did do a spot-check of one of the Jackson tests, in the's per FE-sized block of Shakespeare's verse -- ten 4,300-word blocks of poems and nineteen blocks of verse from a selection of five late plays. The result is a range of 2-14 in the's per block of Shakespeare poems and 1-8 for verse blocks from Shakespeare plays. FE, with 23 in the's on our counter in 4,300 words, is a clear rejection (see Table 2 below).
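The spot-check described above, counting "in the" per fixed-size block and comparing a sample against the baseline range, can be sketched as follows. The tokenizer, the function names, and the handling of leftover words are assumptions made for illustration, not a description of the counter we actually used; the only figures taken from the text are the ranges of 2-14 and 1-8 and FE's count of 23.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words (a crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def split_into_blocks(words, block_size=4300):
    """Cut a word list into full blocks; a short tail is discarded (assumption).

    Collocations that straddle a block boundary are not counted.
    """
    n_full = len(words) // block_size
    return [words[i * block_size:(i + 1) * block_size] for i in range(n_full)]


def count_collocation(words, first="in", second="the"):
    """Count adjacent occurrences of `first` followed by `second`."""
    return sum(1 for a, b in zip(words, words[1:]) if a == first and b == second)


def block_range(blocks):
    """Lowest and highest collocation count across a set of blocks."""
    counts = [count_collocation(block) for block in blocks]
    return min(counts), max(counts)


# Tiny made-up text, just to exercise the functions.
words = tokenize("In the night, in the day, the sun in The sky.")
demo_blocks = split_into_blocks(words, block_size=4)
demo_counts = [count_collocation(block) for block in demo_blocks]

# Figures reported in the text: poem blocks 2-14, play-verse blocks 1-8, FE 23.
fe_count = 23
broadest_low, broadest_high = min(2, 1), max(14, 8)
fe_rejected = not (broadest_low <= fe_count <= broadest_high)
print(demo_counts, fe_rejected)
```

Even against the broadest range that the two baselines allow (1 to 14), a count of 23 falls well outside.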

7.3. Shakespeare Clinic Rejections of Funeral Elegy and A Lover's Complaint

We shall resist the temptation to describe all fifty-one of our play tests, or even all fourteen of the tests we validated for our 213,000-word sample of Shakespeare's poems and selected play verse, duly broken down into (roughly) 3,000-word blocks, other than to note that both FE and LC passed some of our toughest, most glorious, and highest-tech -- though still imperfect -- tests. So did many other poems and blocks of poems not by Shakespeare. Nor shall we spend much time here on two of our tests which rejected FE but which we discarded or discounted for FE in response to what we considered plausible objections from Foster. These are (1) one of our "Bundles of Badges" tests, based on ratios between Shakespeare-preferred and other-preferred words, and (2) a "grade-level" test, based on word length and sentence length. Because they raise interesting questions about possible blind spots in both teams' approaches, we discuss them separately in Appendix 1; but here we shall focus on the six original tests that rejected FE and LC -- and which we have not yet found good reason to discard or discount.

Table 1 summarizes these results. The top line gives our Shakespeare range for each test; the middle line gives LC's scores; the bottom line gives FE's scores. The Shakespeare ranges are the overlapped highs and lows of our two sets of verse blocks, one with fifteen poem blocks, the other with fifty-five play-verse blocks, using the higher of the two highs and the lower of the two lows. It is the broadest range estimate we considered justifiable by the data. Shaded cells show rejections from our regular menu, six for LC, three for FE. For comparison, apart from LC, only five (seven percent) of our seventy Shakespeare verse blocks had even two rejections. Only LC had more than two. Its six rejections, in our view, are more than enough to revive old doubts as to whether it is Shakespeare's, and either to isolate it, under our ninety-five percent Shakespeare consistency rule, as an extreme, one-percent outlier or, better, under our clean-baseline rule, to remove it from baseline altogether. In either case, the Shakespeare all-test profile for 3,000-word blocks should not allow more than two rejections. If FE's three remaining rejections are good, even ignoring for the moment the additional in the rejection noted by MacDonald Jackson and confirmed by us above, they do put FE's Shakespeare ascription in jeopardy.
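
The bookkeeping behind the overlapped ranges and the two-rejection ceiling might look like the sketch below. The helper names are hypothetical, and because Table 1 is not reproduced here, the figures are the ones quoted in sections 7.2 and 7.4. They mix a 4,300-word-block test with a 3,000-word-block test, which a real run would not do; they serve only to show the mechanics.

```python
def combined_range(*ranges):
    """Broadest range: the lowest of the lows and the highest of the highs."""
    lows, highs = zip(*ranges)
    return min(lows), max(highs)

def rejections(scores, profiles):
    """Names of the tests on which a sample falls outside the profile."""
    return [test for test, score in scores.items()
            if not profiles[test][0] <= score <= profiles[test][1]]

# Ceiling suggested in the text for a 3,000-word Shakespeare block.
MAX_REJECTIONS = 2

# (low, high) pairs: play-verse blocks first, poem blocks second (illustrative only).
profiles = {
    "no_per_1000_no_plus_not": combined_range((172, 432), (184, 520)),
    "in_the_per_block": combined_range((1, 8), (2, 14)),
}

fe_scores = {"no_per_1000_no_plus_not": 89, "in_the_per_block": 23}
fe_rejections = rejections(fe_scores, profiles)
```

A sample is flagged as falling outside the all-test profile only when len(fe_rejections) exceeds MAX_REJECTIONS; on these two illustrative tests alone FE collects two rejections, so the flag would depend on the remaining tests.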

[Table 1 about here]

Table 1. A Lover's Complaint falls outside six of fourteen original Shakespeare verse profiles (Shakespeare rejections shaded). Elegy by W.S. falls outside three. Of sixty-nine other Shakespeare verse blocks tested, none had more than two rejections.

The listed tests are "enclitic and proclitic microphrases" (described below); "percentages of sentences with with as second-to-last word;" "no's per thousand no's and not's;" and two Thisted-Efron tests, "slope" and "new word." The easiest of these to describe compactly (and also to run, understand, and critique) are "with as second-to-last word" and "no's per thousand no's and not's." They are simple ratios; they can be crunched out in minutes; they do show Shakespeare profiles; FE falls well outside one profile; and LC falls well outside both.

7.4. No/no plus not

The with test passes FE, and Foster has not yet objected to it. However, he does object to the no/no plus not test, arguing that one should expect higher ratios of no to no plus not where there is a lot of dialogue, as in Venus and Adonis, The Rape of Lucrece, and the plays, than in an elegy where little or no dialogue appears. Although this notion seems plausible, it does not fit the available Shakespeare evidence very well. Three-thousand-word blocks of Shakespeare's play-verse have essentially the same range and average scores as his poems: 172-432 range, 300 average for plays; 184-520 range, 348 average for poems. The only difference perhaps worth mentioning is that plays score slightly lower than poems. Between-poem comparisons become more problematical as the baseline text corpora get shorter, and presumably more variable, but Venus and Adonis and The Rape of Lucrece, with lots of dialogue, still fall into about the same range and average brackets as the Sonnets, which have little or no dialogue. The Venus and Adonis/The Rape of Lucrece range is 233-520, 397 average; the Sonnets range is 316-476, 323 average. LC, which is mostly dialogue of sorts, has a score of 111 on this test; FE, with little or no dialogue, has a score of 89, barely half of Shakespeare's lowest score. Controlling for dialogue content does little to move FE or LC into the Shakespeare column on this test.
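
For readers who want to try the ratio themselves, a minimal sketch follows. It assumes a lowercase word list as input and counts only the standalone words no and not; whether forms such as cannot or nothing should count is a decision the text does not spell out, so they are left out here. The function name is hypothetical.

```python
def no_per_thousand(tokens):
    """Occurrences of 'no' per thousand occurrences of 'no' plus 'not'."""
    no_count = tokens.count("no")
    not_count = tokens.count("not")
    if no_count + not_count == 0:
        raise ValueError("sample contains neither 'no' nor 'not'")
    return 1000 * no_count / (no_count + not_count)

# Figures quoted in the text: lowest Shakespeare block score and FE's score.
shakespeare_low = min(172, 184)
fe_score = 89
fe_fraction_of_low = fe_score / shakespeare_low  # about 0.52, "barely half"
```

A sample with one no and three not's scores 250, comfortably inside the Shakespeare ranges quoted above.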

7.5. Thisted-Efron Slope and New-Word Tests

Thisted-Efron tests -- the brainchildren of statisticians Ronald Thisted and Brad Efron and the original inspiration for the Shakespeare Clinic -- are novel, fancy, high-tech tests which are much harder to explain compactly in non-technical terms than the simple ratio tests just described. These tests tempted us into a ten-year labor with the colossally mistaken notion that pressing a few buttons might be all you had to do to tell what was Shakespeare and what was not.

Thisted and Efron cleverly adapted a methodology famous for predicting the discovery of butterfly species to the task of calculating Shakespeare's "latent vocabulary" from the then-new Spevack Shakespeare Concordance. How many "latent" words did Shakespeare know that he did not write down? How many would appear in a hypothetical, newly discovered Shakespeare poem or play of a given length? The answer was something like "about 20 to 25 new (to Shakespeare) words, plus 320 of his own rare words, per 1,500 more words written down."

When Shall I die? was "discovered" to be Shakespeare's, Thisted tested it and found that it did have the right number of Shakespeare-new words for a Shakespeare work. Furthermore, it also had about the right number of Shakespeare's rare words (used less than 100 times in the canon) and the right degree of "slope" from the rarest of rare words to the most common of rare words. Thisted's tests, in order to be valid, turned out to require a much larger baseline and longer samples than he used on Shall I die?, and, as is appropriate with imperfect tests, his "could be" was not presented as a "must be." Hence, though it would not be easy to find many today who think Shakespeare wrote Shall I die?, that fact does not invalidate the test but merely confirms that, like all other tests we have encountered, it is not immune to false positives. Using a larger baseline and longer samples, we eventually did validate all three Thisted-Efron tests for Shakespeare's plays. Two of the play-validated tests, "slope" and "new word," also worked on Shakespeare poems, and we now consider them among the best of our tests. Matched against our all-Shakespeare baseline, the tests give "could be's" to all of Shakespeare's poems except LC (as they do to FE and to many non-Shakespeare blocks), but they give "couldn't be's" to LC. Since the scores measure discrepancies between the test sample's counts and what would normally be expected from the Shakespeare baseline, LC's strong negative scores mean that its slope is too shallow for Shakespeare, and that it has too many new-to-Shakespeare words. FE's case might be marginally strengthened by getting "could be's" in these tests, but it is indirectly weakened by LC's rejections. If LC, with its many Shakespeare rejections, is considered merely a small bundle of Shakespeare aberrations, it means that Shakespeare's own writing was inconsistent and his stylometric range a wide one, maybe wide enough to encompass other aberrant poems, such as FE. 
On the other hand, the larger a bundle of rejections LC accumulates, the harder it becomes to believe that Shakespeare could have written it, and the more it looks as though Shakespeare's verse-writing style is the narrow, consistent one revealed in the sixty-nine other Shakespeare verse blocks. FE falls far outside this narrow, consistent range, and, past a point, its Shakespeare attribution is indirectly damaged by strong new LC rejections.
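
The raw tallies that the Thisted-Efron tests start from can be sketched as follows. This is emphatically not the Thisted-Efron estimator itself, which compares such tallies with expected values derived from the baseline; it only shows, under our own simplifying assumptions, how a sample's word types might be sorted by how often each occurs in the baseline. The function name and the default cutoff of 100 (taken from the "rare word" definition above) are illustrative.

```python
from collections import Counter

def rare_word_profile(sample_tokens, baseline_tokens, rare_cutoff=100):
    """Map baseline frequency -> number of distinct sample words with that frequency.

    Key 0 holds the words new to the baseline; keys 1..rare_cutoff-1 hold rare words.
    """
    baseline = Counter(baseline_tokens)
    tallies = Counter()
    for word in set(sample_tokens):
        freq = baseline[word]  # 0 if the baseline never uses the word
        if freq < rare_cutoff:
            tallies[freq] += 1
    return tallies
```

In this simplified picture the "new word" test looks at the tally under key 0, and the "slope" test looks at how the tallies change as the baseline frequency rises from the rarest rare words toward the cutoff.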

7.6. "Leaning Microphrase" Tests

The last of the Clinic's decisive reported rejections, both of LC and of FE, are enclitic and proclitic "leaning" microphrases. These forbidding-sounding tests are adapted from the work of Marina Tarlinskaja, a "Russian-School" verse analyst from the University of Washington. These tests have nothing to do with computers; they are slow, manual, judgment-requiring tests, much like the more heroic of Donald Foster's quirk-tests -- such as hendiadys and incongruent who's. We have previously admired such slow, judgmental tests for their ingenuity but deplored them for not being as quick, clean, and replicable as the "crunchable" tests we normally favor. In our wardrobe of seven-league silicon-laced boots, such plodding, road-bound, leathern relics are an anomaly. Moreover, they are complicated and even harder to describe compactly than the Thisted-Efron tests. But we could not resist trying them because they do work. Anyone interested in Shakespeare verse attribution needs to know something about them.

The long, definitive version of what needs to be known can be found in Chapter 6 of Tarlinskaja's Shakespeare's Verse. A much shorter summary can be found in section 6 of our "And Then There Were None." The gist is that certain "clinging monosyllables," stressed in normal speech, get bent out of stress for metrical reasons in iambic-pentameter verse. Some versifiers, such as Fletcher, were much more inclined than Shakespeare to do such bending; others, such as Marlowe and Peele, were much less inclined. If the stress-losing "clinging monosyllable" precedes the stressed syllable, the microphrase is proclitic, "leaning forward." Proclitic microphrases in the passage from Sonnet 29 below are defined by the stress-losing "sings" in "sings hymns" and "sweet" in "sweet love," the second syllable of each pair (hymns, love) being the stressed one. With enclitic microphrases the stress-losing monosyllable follows the stressed syllable, "leaning backward." Example: "wealth" in "such wealth" in the same passage.

...(Like to the lark at break of day arising

From sullen earth) sings hymns at heaven's gate;

For thy sweet love rememb'red such wealth brings

That then I scorn to change my state with kings.

In Tarlinskaja's counting rules only "notional" words are counted: nouns, verbs, adjectives, adverbs, impersonal pronouns, "this" as object. "Grammatical" or "form" words that have no stress in common speech -- articles, prepositions, personal pronouns, possessives, conjunctions, and indefinite pronouns -- are not counted. Only eight kinds of phrase are counted:

1. subject and predicate (love thrives)

2. modifier and modified (sweet heart, love's pains)

3. verb and adverb (sink down)

4. adverb and verb (well said)

5. adverb and adverb or adjective (more strong)

6. verb and object (give ear)

7. adverb modifier not connected with first word (so then)

8. apposition or title (Lord Sands)

And tight links prevail over loose ones. In the phrase "such wealth brings," "such wealth" makes a tighter link than "wealth brings" and thus makes the countable microphrase.

Is this clear? If so, please accept our congratulations for mastering a powerful, difficult authorship test on the first try. If not, you are not alone. We found that this complicated classification system takes some learning, and we have conceded that our classifier (Elliott) is not as good at it as Tarlinskaja, typically producing counts five or ten percent below hers. Foster cites this as evidence that the test is subjective, nonreplicable, and perhaps incomprehensible -- and, even apart from these objections, that it would still not be appropriate for comparing poems of different genre, subject matter, stanzaic structure and rhyme schemes. In typically vivid language he dismisses the test as "at best, dubious, and at worst, foul vapor." We are not convinced. Shakespeare's Sonnets, play verse, and erotic epyllia all have different subject matter, rhyme schemes, and stanzaic layout, but they also have essentially similar ranges of leaning-microphrase scores. The replicability of results constitutes a problem for us, but hardly one big enough to invalidate the test. For one thing, ninety-to-ninety-five-percent replicability is, shall we say, not bad for government-department work; in fact, we see it as strong evidence that the test is not incomprehensible. If a mere government-department amateur can achieve such levels of apparent comprehension of these tests, imagine what serious literature-department professionals could do if they set their minds to it. Our guess is that even this level of replicability compares favorably with that of some of Foster's favorite tests -- incongruous who or hendiadys, for example. Moreover, how much difference could a ten-percent margin of error make in explaining away FE's enclitic scores, which are barely half of Shakespeare's late-verse minimum, let alone explaining LC's enclitic scores, which are barely a quarter of Shakespeare's minimum? 
Finally, none of the most crucial counts, those for LC, FE, and our Shakespeare-verse baseline counts, are Elliott counts. They were all done by Marina Tarlinskaja herself, meeting the MacDonald Jackson standard for a test involving judgment -- the same person implementing the same test at the same time. These safeguards may not bring judgmental wobble down to zero, but they should get it as low as can reasonably be expected of a manual-recognition test, certainly low enough that test results like those of FE and LC -- both of which are drastically lower than Shakespeare's lowest late-verse block score -- can't plausibly be blamed on methodological noise.

8. Two Follow-On Tests: t' plus verb, I-compounds

Besides the tests described in the Shakespeare Clinic report, we also tried two (actually three, counting the one adapted from MacDonald Jackson) follow-on tests on 4,300-word blocks of Shakespeare's poems and verse from five late plays. These are: t' plus verb per block (as in t'enoble), and I-combinations, such as I'll, I shall, I will, or I do (not counting I do not), per I (Table 2). FE flunked both tests: too many t's, too few I-combinations per I, as it also flunked our MacDonald Jackson in the test discussed earlier. LC passed the t' plus verb and in the tests, but it escaped a clear rejection for having too few I-combinations per I only because this particular comparison was with FE-sized 4,300-word blocks, which are substantially longer than LC and therefore presumably less prone to variability. However, LC does fall outside Shakespeare's I-combination range when compared against 3,000-word blocks of Shakespeare closer to its actual length of about 2,600 words. We have not run these tests on the full range of claimant poems or play verse, but they show three more ways that FE does not match comparably sized Shakespeare verse blocks.
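
Both follow-on counts are simple enough to sketch with regular expressions. The patterns below are our own assumptions about how such counts might be automated, not a record of how the Clinic ran them: the t' pattern counts any word beginning with t' and cannot tell verbs from non-verbs, and the I pattern treats every capital I standing alone as the pronoun, so stray Roman numerals would need to be removed first.

```python
import re

# "t'" immediately followed by letters, as in t'enoble; verb status is NOT checked.
T_PLUS_WORD = re.compile(r"\b[tT]['\u2019][A-Za-z]+")

# I'll, I shall, I will, or I do (but not "I do not").
I_COMBINATION = re.compile(r"\bI(?:'ll|\s+shall|\s+will|\s+do(?!\s+not\b))\b")

# Every standalone capital I, including the I of I'll.
ANY_I = re.compile(r"\bI\b")

def t_plus_word_count(text):
    """Number of t'-prefixed words in the text."""
    return len(T_PLUS_WORD.findall(text))

def i_combinations_per_i(text):
    """Share of all I's that begin one of the listed I-combinations."""
    total = len(ANY_I.findall(text))
    if total == 0:
        raise ValueError("sample contains no 'I'")
    return len(I_COMBINATION.findall(text)) / total
```

For the sentence group "I'll go. I shall stay. I do not know. I do love. I am." the second function finds three combinations among five I's and returns 0.6.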

[Table 2 about here]

Table 2. Elegy by W.S. falls outside of Shakespeare's verse profile in three follow-on tests (Shakespeare rejections shaded), A Lover's Complaint in one.

9. Anomalous Word Choice

FE offers yet further anomalies where W.S.'s word-choice patterns seem to run counter to Shakespeare's. When Shakespeare wants to say adventure (or adventures, adventuring, adventurous, and so on, but not peradventure), he (or his compositors) uses adventure 30 times, adventer only once. W.S. says adventer once (355), adventure never. Shakespeare says a husband 35 times, an husband never. W.S. says an husband once (526), a husband never. As Robert Gross notes, when Shakespeare refers to gratitude as a noun, he uses the word thanks over 100 times, thank no more than once and only in a situation where the s would normally disappear through elision with 's (a contraction of is). W.S. says thank twice as a noun, neither in a situation involving elision (ll. 249, 433), thanks never. Shakespeare says no other 54 times, none other only once. W.S. says none other once (566, "none other prop"), no other never. Shakespeare never uses the contraction 'em in his nondramatic verse. W.S. does (552). Stated differently, Shakespeare was 99-percent consistent with himself in his choices whether to use one of these variants or the other; but W.S. got it wrong four times in four tries -- five, if you count the 'em as a single try (Table 3). Table 3 is only a start. In his plays Shakespeare was two or three times more likely to say while, whilst, or whilest than whiles. In his poems he was twenty-four times more likely to choose while or whilst over whiles. W.S. defied the odds with two while's and nine whiles's. Shakespeare chose till over until in his plays four times out of five, in his poems, nineteen times out of twenty. W.S. again defied the odds with one till and two until's. Shakespeare chose because over for that (meaning "because") in his poems and plays about six times out of seven. W.S. offered six for that's in this sense, and no because's (ll. 34, 110, 155, 371, 374, and 456). Are these striking discrepancies just happenstance? 
Or do they, too, raise doubts as to whether FE is "formed from textual and linguistic fabric indistinguishable from that of canonical Shakespeare?"
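
The arithmetic behind the "99-percent consistent" figure can be checked with the counts given in this section. In the sketch below each pair is (preferred variant, dispreferred variant); the dictionary layout and function names are our own, and Shakespeare's "over 100" uses of thanks is entered as 100, a lower bound.

```python
# (preferred, dispreferred) counts as quoted in the text.
shakespeare = {
    "adventure / adventer": (30, 1),
    "a husband / an husband": (35, 0),
    "thanks / thank (noun)": (100, 1),  # "over 100" entered as a lower bound
    "no other / none other": (54, 1),
}
ws = {
    "adventure / adventer": (0, 1),
    "a husband / an husband": (0, 1),
    "thanks / thank (noun)": (0, 2),
    "no other / none other": (0, 1),
}

def consistency(pairs):
    """Share of all choices that went to the Shakespeare-preferred variant."""
    preferred = sum(p for p, _ in pairs.values())
    total = sum(p + d for p, d in pairs.values())
    return preferred / total

def contrary_pairs(pairs):
    """Number of pairs in which the dispreferred variant outnumbers the preferred one."""
    return sum(1 for p, d in pairs.values() if d > p)
```

On these counts Shakespeare chose the preferred variant 219 times out of 222, or about 98.6 percent, which rounds to the 99 percent cited; W.S. runs contrary to Shakespeare's preference in all four pairs.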

[Table 3 about here]

Table 3. Shakespeare strongly preferred adventure to adventer, a husband to an husband, thanks (noun) to thank (noun), and no other to none other. He never used 'em in place of them in his nondramatic verse. W.S. got it "wrong" at least four times in four tries (Shakespeare rejections shaded).

10. Conclusions

We draw several lessons from our years of testing:

10.1. Authorship matters.

Most people who have read this far would suppose that it matters whether Rembrandt painted A Polish Rider. If he did -- or if he did not -- it makes major differences as to how we should judge the painter's tastes, contacts, and stylistic range and how we should view the painting. It also matters whether Shakespeare wrote FE. If he did, we would have a range of moods and styles hitherto unrevealed or at least unrecognized. Perhaps we would have a clue to the identities of the Fair Youth and the Dark Lady of the Sonnets. In the case of Shakespeare, such prospects should and do excite. Literature department regulars may be understandably weary of the abundance, persistence, and intensity of Shakespeare-authorship controversies, too many of which are far-fetched, but they should not be surprised at their existence nor offended at the public's continuing and consuming interest in them.

10.2. Some idées fixes can have redeeming social value even if mistaken.

Nor should they always take umbrage if a given researcher has a blind spot or two and hasn't made all the right calls. What would have happened if Foster had looked at FE in 1984 and had not become privately convinced that Shakespeare wrote it? It seems hardly likely that we would have had Elegy by W.S., the Vassar Electronic Text archive, SHAXICON, or Foster's twelve years of relentless, indefatigable, pioneering investigations of incongruent who's, hendiadys, mutual borrowings, play performance cycles, relationships with Thomas Thorpe, and so on, which have made him an invaluable resource for our investigations. Who would dream of spending so much time on, say, vindicating a private conviction that W.S. was not Shakespeare, far less that he was someone like Simon Wastell or William Sclater, other FE claimants? If Foster had arrived too quickly at our skeptical conclusions about FE's provenance, we and the world would have been much the poorer for it.

The same, we suppose, could be said of our own implicit preconceptions that one or another of the Shakespeare claimants might have been the True Shakespeare and that a year or two of pressing buttons might show which one it was. Unlike Foster, we abandoned our preconceptions after our own three to seven years of relentless text-gathering and innovative button-pressing showed both our preconceptions to be untenable. But had we not started with our own idées fixes, it seems hardly likely that we would have gone to the trouble of gathering and commonizing the Claremont Text archive; learning how to search it efficiently for new words, leaning microphrases, bundles of badges, and so on; and coming up with validated tests which we could apply to FE. We would never have gone to such lengths just to test FE.

10.3. However, if an ounce of mistaken conviction can be good, a ton of it is not necessarily better.

Moreover, in a field as conjectural as authorship studies, the right call is not always clear. Scholarly consensus on LC, for example, has shifted back and forth many times over the years and may well do so again as people think of new ways to test it. It seems natural to expect that computers would narrow the zone of controversy somewhat, though one would hardly guess so from the controversy over FE and LC, where the two high-tech teams have come to opposing conclusions. If the controversy were a bit lower-key, and if the focus were on the other ninety-eight percent of our findings, or on a comparable percentage of the Foster exclusions, we suspect that there would be substantial agreement between the two teams, which did work closely together for many years; from our perspective, both profited greatly from the association. Certainly ours did. Furthermore, since most of these new methods are experimental, it is foolish to expect any of them to be the last word on the subject at this stage. Trials of this kind do produce error; and you can't normally expect to find and correct all your own errors without sharing results with others and seeing how they look from a different perspective. Our work has made us skeptical of many Shakespeare attributions; it has also made us cautious about claiming that our own disproofs of authorship have ended all argument, far less that someone else's proofs of authorship have done so.

10.4. Shakespeare is of a piece.

Nevertheless, in test after test, we were able to find a consistent Shakespeare profile that seemed tight enough at least to distinguish him from some others. If Shakespeare's works were written by a committee, as some anti-Stratfordians claim, the committee was an astonishingly consistent one. If they were written by any of the claimants we tested, or by the same person who wrote any of the apocryphal plays and poems we tested, that person was astonishingly inconsistent.

10.5. Despite some striking Shakespeare resemblances, FE and LC fail too many Shakespeare tests to look much like Shakespeare.

Donald Foster has developed and deployed many Shakespeare tests. Some of them, such as hendiadys, incongruent who's, and especially SHAXICON (from what we know of it) are glorious ones -- clever, subtle, innovative, telling, but not necessarily conclusive. Some tests are more mundane. We are in his debt for both kinds. Few people will want to try a Shakespeare attribution without taking most of the Foster tests into consideration. Taken as a whole, these formidable-looking tests do tell us that, whatever we may think of FE's literary quality, the poem is loaded with green lights for rare quirks that it shares with Shakespeare.

Our problem is not with the green lights but with the red ones, which we deem more important for authorship determination. Red-light tests are not as well-developed in the Foster arguments as green-light tests, and we believe that his neglect of the red has led to overconfidence that his green-light evidence is enough to prove his case. Any test that purports to say "this can only be Shakespeare" carries a heavy burden of proof that it unfailingly says no to everyone else. We do not think that Foster has met this burden, either for SHAXICON, which is still in the early stages of its development, or for incongruent who's. In the latter case there are enough contrary examples to suggest that something still seems to be squishy, either the definition of a "Shakespearean who" or the definition of "immeasurably small," or the comprehensiveness of the Vassar archive, or the exhaustiveness with which it was canvassed. And, if incongruent who's, which are relatively easy to check, still seem squishy, what is one then to make of the equally sweeping but ten-times-harder-to-check claim that hendiadys is "something like Shakespeare's private property?" We remain wary of such claims.

We also remain skeptical of the parallel assertions and intimations that the "lack of systematic rebuttal" of FE's supporting evidence means that there is no contrary evidence. Admittedly, neither MacDonald Jackson nor we gave FE much more than passing attention. But it is not true that the passing attention we did pay failed to turn up any red lights. Jackson turned up a string of red lights; so did we. We took two of our red lights off the list when, under closer scrutiny, they turned out not to be red enough for a clear exclusion of this particular poem. As Section 10.3 suggests, finding and fixing errors is a normal part of the game, especially a game involving novel, experimental techniques. Thanks to prior spot-checking, more often than not by someone like Foster whose ox was gored by one of our tests, we have refined or discounted some of our tests, added to a long list of caveats urged on our readers, and thanked the spot-checker for his or her help with the larger quest. This process is not over; and we hope that Shakespeare Quarterly readers will continue it, even if their ox is not gored. Second opinions can help -- as long as one is open to them.

At this point, with a first round of rejoinders in hand, it seems to us, despite Foster's opinion that they are "certainly wrong," that all six of the LC rejections in Table 1 are still valid, as are all three of the FE rejections. We have not heard from him on the three new FE rejections -- in the, t' plus verb, and I combinations -- nor on W.S.'s very un-Shakespearean choice ratios of a half-dozen "interchangeable" words and phrases. Of course, it would be surprising if all of our broad-brush, seven-league-validated tests were to escape erosion and refinement as others try them out in detail on the ground. But FE defies Shakespeare's customary stylistic odds in so many ways that it would also be surprising if enough of them could be eroded to allow FE a spot in the canon which is beyond all reasonable doubt. For now, if even half our tests are good, W.S. has far too many size-ten outcomes to be Shakespeare and looks more like Bertha Broadfoot than like Cinderella. We look forward to the next round of rejoinders -- or to a firmer showing that the Foster rare-quirk tests are, in fact, impervious to false positives. It is hard to see how our exclusory evidence and Foster's inclusory evidence could both be right beyond doubt. In the meantime, Shakespeare Quarterly readers, distressed at the thought that "flawless" computer-based methods have irrefutably shown Shakespeare's paternity of such a dowdy-looking poem as FE, might take comfort in knowing that there are two schools of thought on the matter. A team other than Foster's, using different methods, doubts that the evidence of Shakespeare's paternity would be strong enough to prevail even in a civil action, which requires only a preponderance of the evidence, let alone in a criminal action, which requires proof beyond a reasonable doubt.

10.6. Our tests do not say who did write FE and LC, but they do say that George Chapman probably did not write LC.

Our exclusionary tests, designed to tell us what Shakespeare did not write, are not at all designed to tell us who did write FE and LC. We don't have any Simon Wastell poems, or William Strode poems, or William Sclater poems, far less the sizable body of their poems which we would need in order to test these FE "claimants" against FE. Others are far more fit, and better disposed, to pursue this question. We do, however, have enough of George Chapman's work to permit a field-expedient test of the traditional notion that, if Shakespeare did not write LC, the true author must have been Chapman. If our Shakespeare-validated tests are also good for Chapman (and they do show consistency, though in a much smaller baseline of Chapman works than of Shakespeare's), Chapman seems an even less likely author than Shakespeare. Measured against Chapman profiles, LC fails nine of our fourteen poem tests; against Shakespeare profiles it failed only six. Some discounts could be applied to some of the tests, but with so many tests to discount, a very aggressive discounting effort would be required to put Chapman into the "could be" column for LC (see Table 4).

[Table 4 about here]

Table 4. A Lover's Complaint falls outside of Chapman's profile on nine of fourteen verse tests (Chapman rejections shaded).

10.7. "Due diligence" now requires attention to computer evidence in authorship studies.

Many literary people of Whitmanesque sensitivities may find it distressing that anyone would spend even one second, let alone ten years, crunching poems and plays when one could be reading them instead. We feel their pain. It does seem more like taking butterflies and grinding them up to find matching nucleotides than like looking up at them in perfect silence, like the stars. But grinding and matching nucleotides does tell us things that we could otherwise never know about butterflies -- how closely one population is related to another, how swiftly each population mutates, and so on. No lepidopterist could make a competent genetic-drift assessment without reference to it. Likewise, collecting and crunching poems and plays tells us things we could never know without computers. If authorship matters, computers have to matter, too. Literary people certainly should not stop reading poems or poring over old papers in dusty libraries, any more than butterfly people should stop observing live butterflies in their natural habitats. The new seven-league analytical boots are a supplement to, not a substitute for, traditional, labor-intensive research methods. In practice, they seldom even make the overall process less laborious, because, while new methods do make many tasks that used to take months take only seconds, they don't do so for all tasks, and they often lead on to new, thitherto impracticable tasks (such as sorting out SHAXICON) which can still take months or years to get done. Nonetheless, it seems hard to imagine serious, comprehensive authorship work in the future taking place without reference to computer-aided analysis. "Due diligence" will require it. Both the Shakespeare Clinic and the Vassar project are hints of what such a future might look like. We think it will have a lot of gadgetry in it but a lot of discovery, too. Like the Thisted-Efron tests, "It all began with butterflies" -- and seems to end there, too.

10.8. Time will tell.

Computer-aided authorship analysis is in its infancy. We know enough about its capabilities to rate it a "must-consider" in authorship studies, yet reasonable people may still differ as to whether and what it proves beyond doubt. In 1989 Foster rested his case with commendable caution:

If we have learned anything at all from the recent controversy over "Shall I die, shall I fly," it is that scholars ought not to let their enthusiasm triumph over a reasoned and reasonable skepticism. Today's great discoveries too often become tomorrow's great embarrassments. Under no circumstances should the Elegy be admitted to the Shakespeare canon, or be included in forthcoming editions of his collected works, without having first been subjected to the most rigorous cross-examination. Many talented scholars will find it preposterous that Shakespeare could be credited with such a poem. Their voice needs to be heard. I do not think that I have blinded myself to any scrap of contrary evidence, yet it may be that there yet lurks in the Elegy itself, or elsewhere, some weighty evidence that Shakespeare could have had no part in writing this poem. One must allow also for a time of probation, to permit the scholarly community to digest ... contrary evidence ..., and to find where my own discussion of evidence may be objectionable, my methodology defective, or my figures inaccurate.... [I]t is a worldwide community of readers, not I..., who will have the final word.

Much has changed since then. New techniques, new evidence, new conclusions, the old cautions jettisoned. We are impressed with Foster's new evidence. But we think his old cautions are still very much in order.

Appendix 1

Two Discarded or Discounted Tests

This Appendix considers two tests that rejected FE, but which were discarded or discounted in response to what we considered plausible objections from Foster. The two tests were "Bundle of Badges 5," which subtracted a bundle of Middleton high-frequency words from a bundle of Shakespeare high-frequency words, and "Grade Level," which used a commercial grammar-checker to calculate from word-length and sentence-length the grade-level a reader would need in order to comprehend the text. BoB5 did show a clear Shakespeare pattern tight enough to reject not only most of Middleton's plays but also most of Fletcher's plays, and it also worked well enough on our 3,000-word verse blocks to reject 20 percent of the non-Shakespeare blocks tested, while rejecting only three (4 percent) of the 70 Shakespeare blocks tested. But it included he, his, and him among its Shakespeare-frequent words ("badges") and she and her among its Shakespeare less-frequent words ("flukes") and could well have produced a false rejection of a man's elegy, where expected poetic concern with the opposite sex would not be at its strongest. We have not tested other elegies, but it seems reasonable to expect more masculine pronouns and possessives in a man's elegy, and fewer feminine ones, than in a baseline of sonnets, erotic epyllia, and play verse. Hence, the difference could easily be one of subject matter, not authorship. The test seems useful for poems generally but maybe not for elegies.
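The subject-matter objection can be made concrete with a minimal sketch in Python. It is not the Clinic's BoB5 program: the text names only five of the test's words, so the two bundles below are deliberately partial, and the per-thousand-words scaling and the function name badge_minus_fluke_rate are assumptions made for illustration.

```python
import re

# Only these five words are named in the text; the full BoB5 bundles are
# not given there, so these sets are partial and purely illustrative.
BADGES = {"he", "his", "him"}  # more frequent in Shakespeare, per the text
FLUKES = {"she", "her"}        # less frequent in Shakespeare, per the text


def badge_minus_fluke_rate(text, badges=BADGES, flukes=FLUKES):
    """Badge occurrences minus fluke occurrences, per 1,000 words.

    The per-1,000 scaling is an assumption; the text says only that one
    bundle was subtracted from the other.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        raise ValueError("text contains no words")
    badge_count = sum(1 for w in words if w in badges)
    fluke_count = sum(1 for w in words if w in flukes)
    return 1000.0 * (badge_count - fluke_count) / len(words)
```

On any such count, a text about a man scores higher than the same text about a woman, whoever wrote it. That is the sense in which a rejection of FE on this test could reflect subject matter and not authorship.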

We are also inclined to discount, if not discard, a "grade-level" rejection for FE, where Shakespeare's poem range was tenth-to-fourteenth grade and FE's long sentences put it at the twenty-second-grade level. But it is hard to know just how much to make the discount. In this case, Foster ascribes the difference to his own "light pointing" in editing FE and also to FE's non-stanzaic structure, found nowhere else in Shakespeare's nondramatic verse. Because it lacks stanzas, FE lacks a strong natural constraint on sentence length, especially on maximum sentence length. Longer sentences and higher grade-level, he argues, should therefore be "right on the money" for what Shakespeare should have done without stanzas.
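How strongly pointing alone can move a grade-level score may be seen in a short Python sketch. We do not know the commercial checker's formula, so the sketch substitutes the published Automated Readability Index, which is built from the same two ingredients, word length and sentence length. The substitution and the function names are assumptions, and the sketch is not a reconstruction of the test we ran.

```python
import re


def ari_from_counts(chars, words, sentences):
    """Automated Readability Index from raw counts.

    Used here only as a stand-in for the unnamed commercial formula.
    """
    return 4.71 * chars / words + 0.5 * words / sentences - 21.43


def ari_grade_level(text):
    """Estimate a grade level, treating '.', '!' and '?' as sentence ends."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not words or not sentences:
        raise ValueError("text must contain at least one word")
    # Count letters only, so apostrophes do not inflate word length.
    chars = sum(len(w.replace("'", "")) for w in words)
    return ari_from_counts(chars, len(words), len(sentences))
```

Under this formula the words themselves never change when an editor repoints a text. Turning a period into a comma lowers the sentence count, lengthens the average sentence, and raises the grade level, so the score depends in part on the editor and not only on the poet.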

Foster's pointing was indeed light, compared to that of his FE source quarto, which he describes as "badly printed, with botched punctuation throughout. The printshop compositors -- often bewildered, it seems, by the poet's highly enjambed verse and light pointing -- heavily overpunctuated the text." He lengthened FE's quarto-text sentences, on average, by forty-four percent, raised its grade-level by six, and more than doubled its percentage of open or "enjambed" lines, that is, lines not ended by a piece of punctuation. The results generally make more sense to the modern eye than the original. But whose is the extra enjambment, W.S.'s or Foster's? (See below for our answer.) By contrast, G.B. Evans, in the 1974 Riverside Shakespeare, shortened the original-text Sonnets sentences by nine percent, while, like Foster, almost doubling their percentage of open lines. If one restored both FE and the Riverside Shakespeare baseline poems to original punctuation, in FE-sized blocks -- yet another task made convenient by computers -- it would cut FE's grade-level from twenty-second to sixteenth, while bumping Shakespeare's range up slightly, just enough for one of the ten Shakespeare poem blocks to overlap with FE -- though no longer with A Lover's Complaint.

Thus "restored," FE would barely pass our test, with a ten-percent overlap with our Shakespeare poem-block sample. However, the "restoration" is not especially helpful to the case for FE. For one thing, a ten-percent overlap means a ninety-percent mismatch with Shakespeare, which is not quite enough to trigger a rejection under our rules (we looked for ninety-five percent), but is hardly what most people would consider a smoking gun for FE. For another, it raises questions about Foster's use of open lines and longest sentences as main tests for Shakespeare authorship in Elegy by W.S. Could aggressive de-pointing and systematic inflation of FE's sentence-length have inflated its performance on these tests?

For open lines the answer is "could well be," though there are ways of avoiding a conclusive rejection. A perfect comparison would use texts that are identically spelled and punctuated, preferably by the same editor at the same time, and matched for metrics, genre, date of composition, length, structure, and subject matter. Attaining such perfection, however, is seldom possible or even prudent because tightening a match in one particular often requires loosening it in another. To try out any given test in a reasonable time frame with limited resources, the tester must normally use experience, judgment, and conjecture to know when and how to choose the best tradeoff(s) of many which could reasonably be made. It helps if the choice of tradeoff is consistent, or, if inconsistent, that the discrepancy be pointed out and explained. For his two line-ending tests, Foster chose two plausible, but mutually inconsistent tradeoffs, one tightly matched for time but loosely matched for genre, the other, for no apparent reason, tightly matched for genre but loosely matched for time. In both cases, a pass for FE resulted, but, had he reversed his choice of tradeoffs, or used original punctuation, FE's Shakespeare attribution would have been rejected.

For his open-lines tests Foster chose to compare FE not to Shakespeare's poems but rather to Shakespeare's iambic-pentameter verse in his last plays. This choice made sense at the time because (1) Shakespeare's percentage of open lines quadrupled in his lifetime; (2) tight time-matching could therefore be more important than tight genre-matching for texts of widely differing dates of composition; and (3) scholarly consensus, when Foster was writing Elegy by W.S., held that the Sonnets, though published in 1609, were mostly written in the 1590's, when sonnets were most in fashion -- and long before the elegy was written. SHAXICON and other rare-word tests have since pointed to a later date for many of the Sonnets, but these developments are too recent (and even now perhaps too tentative) to have guided Elegy by W.S. The net effect of Foster's tight time-matching and loose genre-matching was to put the Foster-repointed FE, with its "extraordinarily high," forty-six-percent open lines, squarely within the Riverside-punctuated range of forty-four to fifty-two percent for Shakespeare's last plays.

But Foster incongruously chose not to time-match for his other line-ending test, feminine endings. These tripled during Shakespeare's lifetime, and they raise the same issues as open lines. If it makes sense to time-match for open lines, it also makes sense to time-match for feminine endings. Had this been done, FE, with twelve percent feminine endings, would have fallen far below Shakespeare's expected range based on late-play verse -- thirty to thirty-five percent for the same very late plays Foster used for his open-lines comparison, and measured, as he did, for all the verse in each play. In general, Foster paid little attention to matching by text length, an avoidable source of imprecision which could and should be corrected if -- as we hope -- he does further comprehensive work on Elegy by W.S. If he had matched for text length, it would probably have broadened the pertinent Shakespeare-poem ranges and tended to help his case for putting FE into a widened Shakespeare range. But it would also have hurt it irreparably by making the range so wide that it could no longer exclude other poems. Such a widening could invalidate two-thirds of his seventeen 1989 exclusion tests in Table 1.19 of Elegy by W.S. Open lines would still be a valid test for size-matched testing on FE-sized blocks, but the resultant Shakespeare range, though widened, would still not be nearly wide enough to encompass FE's open-line percentages by other matching rules as plausible as the ones he picked. If, for example, he had matched for genre more than date, and compared his repointed FE to the Sonnets, or to all of Shakespeare's poems (both comparisons in 4,300-word blocks to Riverside-punctuated Shakespeare), FE would have had more than twice as many open lines -- 46 percent -- as Shakespeare's respective maxima of 18 and 20 percent. 
If, on the other hand, he had converted both FE and Shakespeare's poem blocks back to original punctuation, FE would have fallen to twenty-two percent open lines, still far outside the Sonnets' original-punctuation range (seven to twelve percent open lines) and the all-poems original-punctuation range (two to fifteen percent). With a few days of editing to segregate verse, stage directions and speech headings, the same experiment could no doubt be performed on Shakespeare's late plays in Folio punctuation. But this brief sketch should be enough to illustrate some basic practical points about "pointing" and making matched comparisons: (1) quantitative comparisons can be done in several different ways; (2) choosing among the ways requires some exercise of judgment/conjecture which can strongly influence the results; (3) the Foster choices for testing line endings, though individually defensible, are inconsistent with one another, and, like some others of his "green-light" tests, both seem to say "FE could be Shakespeare's," where several other plausible choices say "FE couldn't be Shakespeare's."
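The open-line count and the range comparisons described above are simple enough to sketch in Python. This is an illustration only, not the program used by the Clinic or by Foster. The set of characters counted as line-ending punctuation and the function names are assumptions. The percentages are the ones quoted in the text, with no lower bound entered where the text gives only a maximum.

```python
# Characters treated as line-ending punctuation (an assumption; the text
# says only "a piece of punctuation").
END_PUNCTUATION = set(".,;:!?")


def open_line_percent(lines):
    """Percentage of non-blank lines that do not end in punctuation."""
    verse = [ln.rstrip() for ln in lines if ln.strip()]
    if not verse:
        raise ValueError("no verse lines supplied")
    open_count = sum(1 for ln in verse if ln[-1] not in END_PUNCTUATION)
    return 100.0 * open_count / len(verse)


def within(value, low, high):
    """True if value lies in [low, high]; low=None means no minimum stated."""
    return (low is None or value >= low) and value <= high


# (comparison, FE percent, Shakespeare minimum, Shakespeare maximum)
COMPARISONS = [
    ("repointed FE vs. late plays, Riverside", 46, 44, 52),
    ("repointed FE vs. Sonnets, Riverside", 46, None, 18),
    ("repointed FE vs. all poems, Riverside", 46, None, 20),
    ("original-punctuation FE vs. Sonnets", 22, 7, 12),
    ("original-punctuation FE vs. all poems", 22, 2, 15),
]

if __name__ == "__main__":
    for label, fe, low, high in COMPARISONS:
        verdict = "could be" if within(fe, low, high) else "couldn't be"
        print(f"{label}: {verdict}")
```

Only the first pairing, the one Foster chose, puts FE inside Shakespeare's range. The other four, each resting on an equally plausible matching rule, put it outside.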

For sentence length the impact of repointing is also clouded, for two reasons. First, the Foster test is not based on the length of the average sentence but on the length of the longest sentence; that is, on maybe one or two percent of the sentences in a poem of FE's length. From the Vassar perspective these exceptional whoppers would be "rare quirks," perhaps revealing the hand of the Bard by a tendency to stray from his norm in a certain way. From the Claremont perspective they would be "outliers," atypical of an author's prevailing habits of self-expression. We prefer tests grounded in an author's normal habits, not his rare aberrations. Second, Foster's definition of "sentence" is a rather specialized one that is not easy to test with a computer word-counter:

We may define a "sentence" in this case, not as the distance between a capital letter and period, since this depends on editorial preference, but more narrowly as a single independent clause, inclusive of all parenthetical and subordinate elements, but excluding other independent clauses attached to it with a conjunction or semicolon.

Whether this reduces dependence on editorial preference or increases it for most Shakespeare Quarterly readers is not for us to say, but it certainly increases it for people like us who rely on computers. It precludes quick, objective verification of the test with present-generation computer programs, and it leaves the impression that the test, for practical purposes, compares unusually long "sentences" created by Foster's conjectural re-editing of FE against other unusually long Shakespeare "sentences" that only he can or will define and count. Once again, under closer inspection, the test seems squishier, more judgmental, and less convincing as an objective Shakespeare identifier than it did at first glance.

Finally, one may wonder how firm the historical evidence supporting the Foster "restoration" actually is. Simultaneously to defend his repointing and to claim the elegy as "right on the money" for Shakespeare takes a long line of countervailing and highly speculative assumptions: (1) that FE's original pointing was light (presumably -- there being no FE manuscript -- because Shakespeare must have written both FE and the lightly-pointed "hand D" portion of the anonymous manuscript for Sir Thomas More, which has been conjecturally ascribed to Shakespeare); (2) that Thomas Thorpe's compositors -- different ones, maybe, from those who had managed not to botch the not-so-enjambed Sonnets and LC two or three years previously -- were nonetheless befuddled by FE's enjambment, half of which was "restored" by Foster; and (3) that, because our tests show a ten-point difference between Thomas Heywood's stanzaic Oenone and Paris and his nonstanzaic Troia Britanica (as they do), there should be the same difference for whoever wrote FE. Our tests also show a seven-point difference between Marlowe's sestiads of Hero and Leander and Chapman's sestiads, which are stanzaically identical to Marlowe's. This discrepancy argues that there is more to sentence-length than stanzaic structure, and that controlling for stanzas (especially where both examples are poets other than Shakespeare) does not necessarily put you "on the money" for Shakespeare. Maybe such speculation as to how FE should have been pointed, and how long Shakespeare's longest sentences should have been if he hadn't written in stanzas, can spare FE an outright rejection, but we don't see it as the stuff that proofs beyond reasonable doubt are made of.

Postscript

An Alliance Gone Bad

Knowledgeable readers may be puzzled over signs of a major shift in our relations with Donald Foster. From 1987 through December 1995 he was a close friend and advisor, effusively grateful to us for our "invaluable assistance" to his project. In January 1996 the alliance went bad. He had just announced his "proof beyond all reasonable doubt" that FE was by Shakespeare; but our final report had been accepted by Computers and the Humanities, still quietly reporting five rejections, three of which we were not prepared to disavow. Foster abruptly became "exasperated that problems with accuracy and with the validity of testing were never addressed, only multiplied." He informed us that (1) FE was "Shakespeare's beyond all reasonable doubt;" (2) all our tests rejecting FE were "not just doubtful, but certainly wrong;" (3) none of our work was ready for publication; and (4) any attempt to publish our findings or present them at conferences on FE would destroy our reputation and besmirch his by association. Would we please remove his name from our acknowledgements? This "moment of weariness," as he describes it, marked the end of his written correspondence with us.

It also, in a sense, put us on notice that he did not think the world was big enough or uncertain enough to accommodate both our findings and his; that we were the ones who were "certainly wrong;" and that our choice was to put up or shut up. It forced us to abandon the low profile on FE that we took in the Shakespeare Clinic Report and to pursue the focused study of FE and LC which led to this article. We did not, however, abandon our low profile toward our old ally Foster, for several reasons. To abandon it would seem ungrateful and would clutter up our article with peripheral matters; to keep it might help avoid a land war in Asia with him over what many regard as the literary equivalent of the Spratly Islands.

By and large, we have stuck to this position, though now it is almost solely to avoid clutter. In January 1997, as we prepared this article for publication, our Clinic Report finally appeared -- along with a blistering, blustering "Response" which Foster had quietly gotten slipped into the same issue of Computers and the Humanities. Foster thought that, despite our claim to have eliminated every Shakespeare claimant tested, we were still peddling our false, anti-Stratfordian screed; that our figures were "worthless;" that we were "cherry-picking" and "playing with a stacked deck;" that we had "exiled" inconvenient data; that our tests were "alas, no good;" and that the whole project, like one of its tests, was "at best dubious, at worst, foul vapor." Here, it seemed, was our land war right in the same package with our "last chapter."

We did not see this "Response" as Foster's best work. We likened it to an exorcism of our fancied demons by the old conjuror Glendower in Henry IV, Part I, claiming to call up "foul vapors" from our vasty deep. None had come at Foster's command, and we concluded that he was trying to win too many points with bluster that he could not win honestly on the facts. Readers who wonder how much heat one can work up over proclitic microphrases might wish to consult both Foster's "Response" and our reply, which is available on request. Others might prefer to settle for the present version, which, we think, generally shuns confrontation. It may not be as exciting as a nice pitched battle, but we think it raises less dust and is more appropriate for a discussion of experimental methodologies. We also think the tests are better understood as simple authorship tests than as bludgeons to belabor those who disagree with us. We hope that Foster will see it that way, too, but we can't say that we are holding our breath.

Notes