The view that biotechnology patenting has reached unsustainable levels is well accepted among many legal scholars. This Article presents the first comprehensive empirical study of biotechnology patents designed to test this hypothesis. Our analysis reveals the striking rise and fall in biotechnology patenting, the surprisingly diffuse pattern of patent ownership, and the consistent influx of new entrants conducting biotechnology research and development. This Article finds little evidence that the rise in biotechnology patenting is adversely affecting innovation. Counting patents, as it turns out, offers few insights on its own. One must also have a measure of the geographic scope of the scientific commons and the distribution of patents within it. These findings lead to a cautionary corollary for patent metrics generally: certain fundamental uncertainties associated with the statistics of innovative success cannot be overcome using even the most sophisticated empirical methods. Ironically, the current enthusiasm for empirical work may have caused academics to reify simple patent metrics over the manifest complexity of innovative processes.
I. Introduction
The debate over the dramatic rise in patents issued each year in the United States springs from a deceptively simple question: Do we have too much of a good thing? For many commentators, the numbers speak for themselves.1 As Jon Dudas, Director of the U.S. Patent and Trademark Office (PTO), has noted, "the [PTO] issued more patents last year . . . than it did during the first 40 years of its existence."2 Yet, the economic theory that frames this debate, the tragedy of the anticommons, is a matter of relative scale: fifty patents distributed over a narrow field of invention may be grounds for concern whereas fifty patents of analogous scope scattered over a broad field will not.3 Large numbers alone are thus meaningless. One must have a measure of the geographic scope of the relevant scientific commons and the distribution of patents within it to assess the impact of rising patent numbers.
Patent metrics are central to the debate over the 1990s patent bubble. The simplest metric, patent counts, has dominated theorizing about the effects of the extraordinary rise in U.S. patenting on innovation. The most prominent example of this approach, Michael Heller and Rebecca Eisenberg's elegant anticommons theory, posits that transaction costs can spiral out of control as patent numbers increase.4 Other patent metrics, such as the number of claims or citations in a patent, are also important, most notably as indicators of patent value. Commentators have proposed that patent characteristics be used by the PTO as metrics for enhancing the efficiency of PTO reviews and reducing the burden of rising patent application numbers.5 Patent metrics thus play an integral role in efforts to diagnose and mitigate the impacts of the recent surge in U.S. patenting.
This Article challenges the widely held belief that the rapid growth in biotechnology patenting over the last decade is impeding innovation. It argues that the misuse of patent metrics has both fostered dire predictions and created unrealistic expectations about the capacity of patent data to guide policy. The Article's focus on biotechnology is motivated by its singular role in the debate: for many commentators, biotechnology is the proverbial canary in the coal mine signaling the need for major reforms.6 The unique status of biotechnology follows from the extensive overlap between public and private biotechnology research, the importance of patents to the biotechnology industry, and the high social value accorded to biomedical research.7 These traits have dramatized the tensions between the ownership model of patent law and the open-access principles of science.
The centerpiece of the Article is a comprehensive empirical study of biotechnology patenting in the United States.8 Counter to much legal scholarship, the study finds little evidence that the recent growth in biotechnology patenting is threatening innovation. This analysis is based on a data set comprised of biotechnology patents granted in the United States from January 1990 through December 2004, more than 52,000 patents in all.9 The years encompassed by the data set offer a rich context for examining patent policy. The data cover the period of the most dramatic rise in biotechnology patenting, important shifts in PTO policy towards more stringent standards for obtaining patents on genetic sequences, and dramatic growth followed by a significant retrenching of the biotechnology financial markets.10
The study is designed pragmatically to take advantage of several readily available categories of data, including investigations of broad patent trends, patterns of patent ownership, and the distribution of patents across PTO patent subclasses.11 One of the Article's most significant findings is the degree to which ownership of biotechnology patents is diffuse. Even the largest companies, on average, are granted fewer than thirty biotechnology patents per year, and the number of entities obtaining biotechnology patents has consistently increased over the fifteen years covered by the data set.12 Interpreting these trends is necessarily impressionistic, but the lack of concentrated control, the rising number of patent applications, and the continuous record of new market entrants provide strong evidence that biotechnology patenting is not adversely affecting innovation.
These results are consistent with an emerging schism between current theorizing about biotechnology patenting and recent empirical work. The existing empirical studies find few clear signs that the patenting of biotechnology inventions is adversely affecting biomedical innovation.13 This Article suggests that the divergence between data and theory is traceable to an overreliance on patent counts, which prove to be a very weak metric. Subsequent commentary has broadened Heller and Eisenberg's anticommons theory, which drew on two specific scenarios that highlighted the potential negative effects of upstream patents on downstream research and products.14 This discourse has morphed the anticommons theory into one that associates rising patent numbers almost inexorably with patent anticommons, transforming Heller and Eisenberg's contextually delimited theory into a generalized model premised on a relatively simple relationship existing between patent counts and transaction costs.15
Little reason exists for accepting the simplified model implicit in the generalized variant of Heller and Eisenberg's theory. Unlike Garrett Hardin's uniform agricultural commons for which a simple metric (number of cattle) was available,16 patent policy must contend with a much more complex environment. Science lacks a unique set of spatial dimensions; its geography is too heterogeneous and multidimensional, and numerous points of reference exist from which to assess the distribution and size of patented enclosures. Further, the scope of patents themselves is both highly variable and exceedingly difficult to quantify. Direct patent counts, as a consequence, are unlikely to be consistently correlated with the patent anticommons on which the generalized theory is predicated.
This Article exposes a distinct set of interpretive challenges for metrics based on patent characteristics. The study data display remarkably broad variances for the number of claims, citations made by or to a patent, and the time spent by PTO examiners prosecuting (i.e., reviewing) patent applications. To give just one example, in any given year of our study the number of claims in a patent ranged from one to several hundred. The broad variances we observe effectively rule out using these characteristics to predict other attributes of a patent, such as its technological field, ownership status (e.g., government, corporation), or economic value.
The practical ramifications of this finding are far-reaching. Chief among the issues implicated is fashioning an effective response to the dramatic rise in patenting, which has created a large backlog of patents at the PTO.17 Legal commentators have singled out the substantial majority of patents (some data suggest more than ninety-five percent18) that have little economic value as a target for economizing PTO resources.19 Under this scheme, PTO resources would be conserved by subjecting valuable patents to careful scrutiny and all other patents to light reviews. Our analysis shows that this triage scheme is unworkable for the simple reason that valuable patents cannot be identified ex ante.
This result has a noteworthy theoretical corollary. The economist F.M. Scherer was one of the first people to show that innovative success is chaotic and that the distribution of valuable inventions defies standard statistical methods.20 A recent empirical study, Valuable Patents, challenges, at least implicitly, the empirical work on which Scherer's findings are based.21 We reexamine the Valuable Patents study in light of our own data and conclude that Scherer's work ultimately withstands the challenge. Regrettably, this result implies that one cannot avoid the statistical uncertainties exposed by Scherer and other economists.
The Article is divided into two central sections and an Appendix that describes the data and analysis in detail. Part II examines the general trends in biotechnology patenting including patent counts, patent-ownership patterns, and the distribution of biotechnology patents across distinct areas of research and development. This analysis finds few tangible signs of the negative impacts presumed to be associated with patent anticommons. Part III assesses the characteristics of biotechnology patents and evaluates our results comparatively with three recent studies of U.S. patents. This analysis reveals the methodological obstacles impeding empirical work on patents: the uncertainties prove to be greater than Scherer estimated. The Article concludes with a short discussion of opportunities for refining current empirical methods.
II. Searching for Signs of a Biotechnology Anticommons
The spectacular rise in patenting since the mid-1980s has inspired a flurry of writing by scholars concerned about its negative impacts.22 Critics of the current patent bubble, inspired by Heller and Eisenberg's work, have cultivated and generalized their potent metaphor, the anticommons, to illustrate how patents can deter innovation.23 Consistent with the image evoked by the anticommons metaphor, expansive patenting is believed to fragment the scientific commons such that no single entity has sufficient patent stock to pursue its program of research and development.
Heller and Eisenberg took care, however, to identify two specific scenarios in which patents may unduly increase the transaction costs of downstream product development. In the first, patents on numerous "upstream" technologies, or research tools, act like "tollbooths on the road to product development, adding to the costs and slowing the pace of downstream biomedical innovation."24 In the second, reach-through license agreements on patented upstream technologies are used to obtain "rights in subsequent downstream technologies" (e.g., royalties on sales, licenses on future discoveries).25 The direct signs that such localized anticommons are burdening a scientific field will be reduced scientific output (e.g., papers, patents, data) and rising patent licensing and equipment costs. Indirect effects may include reduced private sector investment, diminished entry of new scientists and companies, and increased concentration of patent ownership (i.e., patent portfolio races).26
Despite the widespread attention that the generalized anticommons theory has garnered, few empirical studies are available either to confirm or refute its predictions. The best studies have been conducted by social scientists who have surveyed scientists working in the public and private sectors about the impacts of patents on their work.27 These studies have found little clear evidence that patenting of biotechnology inventions is impeding biomedical innovation.28 Setting aside often repeated anecdotal evidence,29 only one empirical study of biotechnology patenting has found evidence of generalized anticommons scenarios. However, that study relied on an indirect measure (aggregate citation rates to scientific articles covered by a patent) and found at most a "quantitatively modest" effect.30
The lack of supporting empirical work has both conceptual and practical origins. From the outset of the debate, the intuitive appeal of the generalized anticommons theory has obscured the limited value of patent counts in determining whether an anticommons exists. This perspective marginalizes the denominator (the scope of the field of invention at issue) by focusing attention on seemingly impressive patent statistics. In so doing, the generalized theory avoids the far more difficult problems entailed in defining the relevant fields of invention and identifying accurate metrics for the scope of the patents at issue.
To appreciate the significance of this oversight, it is useful to return to Hardin's seminal theory on the tragedy of the commons. Hardin's economic parable describes how individual self-interest leads inexorably to unsustainable exploitation of resources held in common.31 When applied to intellectual property, however, Hardin's storyline inverts because intellectual property is an inexhaustible, nonexcludable resource. The problem thus becomes inadequate internalization of positive externalities, and the tragic result becomes underexploitation of the resource. The traditional remedy in both cases is privatization.
Hardin's theory has important shortcomings that derive from his simple model of the commons as a homogenous resource used by several operators for the same purpose.32 Subsequent commentators have exposed the limits of his model by describing counterexamples (e.g., rivers, parks) for which privatization is not the optimal legal regime.33 They show that the physical characteristics of the resource and the ways in which it is exploited (e.g., consumptively versus nonconsumptively) have important implications for selecting an efficient legal regime.34 Above all, these examples demonstrate that policies for managing property, intellectual or otherwise, cannot be assessed in the abstract. One must have a detailed understanding of the characteristics and uses of the underlying resource itself.
Proponents of the generalized anticommons theory either ignore the characteristics of the scientific commons altogether or base their views on questionable assumptions about it, such as the assumption that upstream patents will inevitably restrict access to essential research tools for which no alternatives exist.35 This neglect causes the generalized anticommons theory to be incomplete, and may account for the divergence between the social science data and the cautionary predictions of its proponents.36
The subparts that follow employ several complementary methods to assess the potential significance of the generalized anticommons theory for biotechnology research and development. The first subpart explores the interplay of patent statistics, shifts in PTO policies, and changing economic conditions. This analysis affords a number of insights into trends in biotechnology patenting and their relationships to external factors. The second subpart evaluates ownership patterns of biotechnology patents and assesses the implications of the small portfolios found for most entities. The third subpart argues that biotechnology, at this stage of its development, is generally not congested, but finds that determining whether localized anticommons exist raises irreducibly complex problems for studies based on patent counts. The generalized anticommons theory is shown to be empirically elusive.
A. The Rise and Fall of Biotechnology Patenting
At the broadest level, we find that the number of biotechnology patents issued per year peaked at 5,977 patents in 1998 and then declined to 4,324 patents (a twenty-eight percent drop) by 2004.37 The same basic progression, a peak in the late 1990s followed by a flattening or significant decline in the number of patents issued, is mirrored in virtually all of the data. This trend is observed for each of the five biotechnology subfields we defined,38 individual PTO subclasses drawn from each of the biotechnology subfields,39 collectively the thirty PTO subclasses with the highest numbers of patents,40 the four groupings of biotechnology companies we created,41 and the three categories of assignees42 (i.e., the federal government, universities, and corporations).43
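As a quick arithmetic check, the decline between the two annual totals quoted above can be recomputed directly. The sketch below is illustrative only: the helper name `percent_decline` is an assumption introduced here, and the only inputs are the two figures stated in the text.

```python
# Figures quoted in the text: patents issued in the 1998 peak year and in 2004.
PEAK_1998 = 5_977
ISSUED_2004 = 4_324


def percent_decline(start: int, end: int) -> float:
    """Return the decline from start to end as a percentage of start."""
    return (start - end) / start * 100


drop = percent_decline(PEAK_1998, ISSUED_2004)
print(f"Decline from peak: {drop:.1f}%")
```

The calculation treats the 1998 peak as the base, which is the convention implied by describing the change as a drop from the peak.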
The other major trends we observe concern assignee types and differences between biotechnology subfields. As one would anticipate, corporate ownership of biotechnology patents dominates, accounting for an average of eighty percent of the patents issued from 1990 through 2004. However, this overall average obscures the growth, in absolute and relative terms, of university and government patenting over this period.44 These gains represent about a tenfold increase in patents issued to universities and the federal government between 1990 and 1998-1999, although the peak numbers were followed by a decline of more than twenty-five percent.45
Contrary to our expectations, the division of patents between the five biotechnology subfields is similar for corporations, universities, and the federal government.46 Corporations patent more in the proteins subfield and less in the nucleotides and immunological subfields, but the differences are nominal.47 Patenting of genetically modified organisms is the one area of substantial divergence. Universities and the federal government received fully twenty-nine percent of the patents, but the absolute numbers are quite low.48 We anticipated that corporations would receive a higher proportion of patents covering measuring and testing processes, as one might expect corporations to focus on applied work, but no such differentiation between basic and applied patenting is observable in our data. This finding adds credence to concerns that universities are very actively patenting biotechnology research tools.
The observed rise and fall in the number of biotechnology patents issued is consistent with the generalized anticommons theory. One could interpret the fall-off in patents issued after 1999 as a drop in innovative output brought about by the fragmenting effects of thousands of patents and growing patent tolls on research and development. The turnaround in 1999 thus could represent a dramatic tipping point beyond which spiraling licensing costs outweighed the incentives patents provide.
A closer examination of the data quickly reveals the deficiencies of this simple storyline. For one, the number of applications received (as opposed to issued) by the PTO for biotechnology patents rose substantially post-1999.49 This observation on its own suggests that the anticommons theory cannot be accepted before eliminating several alternative explanations.
Changing economic conditions are an obvious factor to evaluate. The most dramatic economic decline over the fifteen-year period of the study took place in 2002,50 but this followed the peak in biotechnology patenting by more than a year.51 The 1998 slump in equity financing of biotechnology companies is more promising,52 as it aligns with the peak in biotechnology patenting. It also runs into trouble, though. A narrow market contraction cannot explain the concurrent drops in the number of biotechnology patents issued to universities and the federal government.53 More importantly, research and development funding in the private and public sectors continued to grow after 1998. The largest increases in research and development funding among public companies occurred after 1999,54 and federal funding for the life sciences climbed steadily through 2004.55 These trends are at odds with an economic study linking growth in research and development funding to rises in the number of patents issued.56 Faltering economic conditions are therefore unable to account for the late-1990s decline in biotechnology patenting.57
1. Tipping Points: Lagging PTO Resources and Shifting Patent Terms. - The PTO's decision to strengthen the utility requirements in 1999 stands out as a legal reform that could explain the dramatic leveling off of biotechnology patenting.58 This rule change reversed the PTO's 1995 decision to liberalize its utility guidelines, which led to much freer granting of patents on DNA and polypeptide sequences.59 Both the relaxing of the utility doctrine in 1995 and the PTO's reversal in 1999 correlate well with the dramatic rise in biotechnology patenting that began in 1994 and the leveling off (or decline) that occurred in 1998-1999.60 The obvious defect in this explanation is that the changes in the utility doctrine target a subset, albeit an important subset, of biotechnology inventions.61 Yet, the same trend is observed for biotechnology inventions that do not directly claim isolated nucleotide or polypeptide sequences.62
On its face, the narrow scope of the change in the utility guidelines suggests that it cannot possibly explain the rise and fall in biotechnology patenting that occurred during the 1990s. However, if the PTO's move (first to liberalize and then to strengthen its utility guidelines) represented a general policy to relax standards followed by a renewed vigilance in reviewing biotechnology patent applications, it could explain, at least partially, the late-1990s inflection. Moreover, if these changes did in fact reflect a broader shift in PTO policy, more stringent review of biotechnology patents would likely impact the average time for patent prosecutions and the rate of patent application denials, both of which are measurable.
We do in fact observe a significant increase (about a year) in the average prosecution time during this period.63 Evidence also exists that denials of biotechnology patents increased after 1999: the number of biotechnology patent applications filed with the PTO increased by about forty percent while the number of biotechnology patents issued declined by almost thirty percent.64 The striking temporal correlation and apparent rise in the number of application denials for biotechnology patents support an inference that the shift in PTO policy contributed to the observed decline in the number of biotechnology patents granted.
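The inference about rising denials rests on simple arithmetic that can be made explicit. The sketch below is a rough illustration under stated assumptions: it uses the rounded percentages quoted above, it ignores the lag between filing and issuance, and the helper `relative_grant_ratio` is a name introduced here for illustration, not a measure used in the study.

```python
# Rounded changes quoted in the text for the period after 1999.
APPLICATIONS_CHANGE = 0.40   # "about forty percent" more applications filed
GRANTS_CHANGE = -0.30        # "almost thirty percent" fewer patents issued


def relative_grant_ratio(apps_change: float, grants_change: float) -> float:
    """Ratio of the later grants-per-application rate to the earlier rate.

    Assumes grants and applications can be compared within the same period,
    which ignores the multi-year prosecution lag.
    """
    return (1 + grants_change) / (1 + apps_change)


ratio = relative_grant_ratio(APPLICATIONS_CHANGE, GRANTS_CHANGE)
print(f"Grants per application relative to the earlier period: {ratio:.2f}")
```

On these rounded inputs, the number of patents issued per application filed falls to roughly half its earlier level, which is why the two trends together suggest more denials or a growing backlog rather than falling inventive output.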
Two transient factors, however, appear to be much more important influences on the declining patent numbers. First, the change in June 1995 from the seventeen-year patent term to the twenty-year term caused a large jump in biotechnology patent applications: 4,602 applications were filed in 1994; 7,626 in 1995; and 4,045 in 1996.65 This discontinuity magnified the rise in patents issued during the late 1990s and, as a one-time infusion of applications, contributed to the leveling off and decline observed post-1998.66
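The size of the 1995 discontinuity can be gauged with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that the average of the adjacent years (1994 and 1996) is a reasonable baseline for 1995; that baseline is an assumption introduced here, not an estimate from the study.

```python
# Application counts quoted in the text.
applications = {1994: 4_602, 1995: 7_626, 1996: 4_045}

# Hypothetical baseline: mean of the two neighboring years.
baseline_1995 = (applications[1994] + applications[1996]) / 2

# Applications in 1995 above that baseline.
excess_1995 = applications[1995] - baseline_1995

print(f"Baseline for 1995 (assumed): {baseline_1995:.1f}")
print(f"Excess applications in 1995: {excess_1995:.1f}")
```

Under this assumption, the excess is on the order of three thousand applications, a one-time pulse large enough to distort issuance counts for several subsequent years.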
Second, the falloff in the number of biotechnology patents issued likely reflects a saturation of examiner resources. In other words, the PTO cannot process more biotechnology patents, such that the rate-limiting step in issuing patents is no longer inventive output but the PTO itself. Tellingly, this predicament is not unprecedented. In the 1970s, the leveling off and slight decline observed at that time in the total number of patents issued was traced back to inadequate PTO resources.67 Circumstantial evidence also supports this theory now. The existence of a resource shortfall is consistent with recent calls from the PTO for more resources and growing alarm about the backlog of patents waiting for review.68
Several factors undoubtedly account for the decline in biotechnology patents issued each year. Although our data cannot resolve the key causative factor unequivocally, we suspect that the shortage in PTO resources is, as in the 1970s, the dominant factor. Even ignoring the much larger total number of biotechnology patent applications filed with the PTO, the highest number of biotechnology patents issued in a year, about 5,900, is far lower than the numbers of biotechnology patents filed in 1995, 1999, and 2000 that were also ultimately issued. These data suggest that the PTO's maximum review capacity lags current application rates by hundreds of patents, which, consistent with our data, would affect all categories of biotechnology patents. This explanation has the virtue of being simple, and particularly given that biotechnology patent applications have continued to rise, it is also more plausible than the scenario predicted by the anticommons theory.
2. (Relatively) Low Patent Rates for Genes and Proteins. - The anticommons debate has fixated on gene patents. Yet, the biotechnology subfield with by far the largest number of patents is measuring and testing processes. This subfield accounts for almost fifty percent of the biotechnology patents granted from 1990 through 2004, and it has the single most populated PTO subclass (approximately eleven percent of the total), measuring and testing processes involving nucleic acids.69 Patents specifically directed at nucleotide sequences and genetically modified organisms account for only nine percent and three percent, respectively, of the total.70 At the mid-level, patents on protein sequences and immunological inventions account for about twenty-six percent and twelve percent, respectively.
The specific types of inventions that dominate biotechnology patenting include the following:
* Measuring and testing devices (e.g., involving nucleic acids, viruses, bacteria, enzymes, microorganisms)
* Methods for making proteins and polypeptides71 (e.g., microorganisms, molecular vectors)
* Polypeptides of various lengths and protein sequences (e.g., enzymes)
* Polynucleotides of various lengths and DNA/RNA fragments72
* Methods related to polymerase chain reaction (PCR)73
* Immunological testing methods (e.g., monoclonal antibody-based methods)
Patents on complete gene sequences are notably absent from this list. Moreover, while gene fragments are among the more highly patented biomolecules, the absolute number of patents issued per year recently has been, on average, fewer than 800 patents.74 A similar trend is observed for patents on protein sequences. Almost 160 patents directed at protein sequences were issued in 1998, although this number has since declined to about 120 patents per year.75 To put these numbers in perspective, genetic markers number in the millions,76 and scientists estimate that there are likely about one million human proteins.77 The number of potential polypeptide and protein sequences is certain to be far greater. In this light, the number of gene and protein patents currently being issued appears to be much less threatening.78
The dominance of the measuring and testing processes subfield is a constant throughout the fifteen-year period of our study. By contrast, patents on protein and polypeptide sequences experienced almost a fifty percent drop in their relative share while patents on genetically modified organisms, nucleotide sequences, and immunological processes and compounds almost tripled their share of biotechnology patents. This relative drop in the number of protein and polypeptide patents could reflect a shift in research and development priorities. During this period, dramatic technological advances were achieved in genome mapping, and the Human Genome Project, which directed substantial funds and scientific talent towards resolving the nucleotide sequence of the human genome, was launched and completed.79
Overall, these data run contrary to fears that upstream patents on gene and protein sequences are privatizing the human genome and proteome.80 The relatively low numbers of patents on genetic and protein sequences suggest that worries about excessive patenting of genes and proteins may be overblown. While interpreting the stabilization in the number of patents granted on genetic and protein sequences is difficult given that limited PTO resources appear to be a major factor, there are other signs, particularly recent announcements of large databases of gene- and protein-sequence data being dedicated to the public domain, that speculative patenting in this area is not wildly proliferating.81 More studies and consistent monitoring undoubtedly will be needed before this less ominous view can be accepted, but the existing trends provide grounds for hope rather than cynicism.
B. Biotechnology's Diffuse Patent Ownership and Diminutive Patent Portfolios
One of the most striking features of our data is the large number of entities that own biotechnology patents.82 Figure 8, which displays the number of assignees over time, mirrors the trend observed in the number of biotechnology patents issued each year: a rapid rise during the mid-1990s followed by a stabilization period. Almost identical trends are observed for each of the five biotechnology subfields,83 and for the data for assignee corporations and universities.84 These trends indicate that biotechnology patents are spread broadly across an expanding number of patent owners.85
The distribution of patents over a large number of assignees is also reflected in the low averages for the number of patents received annually per assignee.86 The average total number of patents obtained for all three types of assignees is about twelve, or fewer than one per year, and almost fifty percent of assignees obtained no more than twenty-five patents over the fifteen-year period. Even among the top thirty patent owners (excluding the federal government and the University of California), assignees obtained on average just 440 patents over the fifteen years of the study, or about 29 per year, and they account for twenty-eight percent of the total patents issued. These numbers are tiny in comparison to information technology and electronics industries, where the largest companies (e.g., IBM Corporation, Canon Kabushiki Kaisha, Samsung Electronics Corporation) regularly obtain more than one thousand patents each year.87
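The per-year figures in this paragraph follow from dividing the fifteen-year portfolio averages by the length of the study period. The sketch below reproduces that arithmetic using only numbers quoted in the text; the variable names are introduced here for illustration.

```python
STUDY_YEARS = 15  # January 1990 through December 2004

# Average fifteen-year portfolio sizes quoted in the text.
avg_portfolio_all_assignees = 12
avg_portfolio_top_thirty = 440

# Annualized averages.
per_year_all = avg_portfolio_all_assignees / STUDY_YEARS
per_year_top_thirty = avg_portfolio_top_thirty / STUDY_YEARS

# How many times larger the top-thirty average is than the overall average.
top_to_overall_ratio = avg_portfolio_top_thirty / avg_portfolio_all_assignees

print(f"All assignees: {per_year_all:.1f} patents per year")
print(f"Top thirty:    {per_year_top_thirty:.1f} patents per year")
print(f"Top thirty vs. overall average: {top_to_overall_ratio:.1f}x")
```

The annualized values line up with the text's "fewer than one per year" and "about 29 per year," and the ratio between the two portfolio averages is somewhat above thirty-five.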
We disaggregated the data to evaluate the trends for the university and corporate assignee classes and biotechnology subfields. The highest annual average observed for corporations, about 4.5 patents in one year, occurred in 1998, and the current average is only about 3.0 patents per year. The ten corporations with the largest biotechnology patent portfolios obtained an average of just 27 patents per year and have total biotechnology patent portfolios that average 409 patents.88 Interestingly, the numbers for our large, successful biotechnology companies are significantly lower, with the average annual rate being about 14 patents and portfolio sizes averaging 264 patents. We observe a slightly higher concentration of patent ownership among top universities, with the ten largest universities averaging about thirty-one patents per year, although if the two largest multi-campus systems (California and Texas) are removed, the average drops to twenty-three patents per year.89
Only the top few assignees have patent portfolios for the fifteen-year period that are quite large. The federal government and the University of California have, by far, the largest numbers with 1,322 and 1,287 patents, respectively; however, neither is a single entity in the common or practical sense of this term. Consistent with these findings, the average fifteen-year patent portfolio sizes for corporations and universities are about nine and twenty-four patents, respectively. Similarly, the fifteen-year averages for each biotechnology subfield vary from nine, for measuring and testing processes patents, to about three, for genetically engineered organisms.
The average patenting rates for the top assignees provide an important calibration point for understanding the implications of our findings. More than a factor of thirty-five separates the aggregate average from that of the top thirty assignees. But, as we have already noted, even the top assignees obtained modest numbers of patents per year and maintained patent portfolios typically in the range of several hundred patents.90 Further, no single entity owns more than a small percentage of the biotechnology patents issued over the past fifteen years. The low aggregate and disaggregated averages that we calculate expose the low ownership density of biotechnology patents. The modest sizes of current biotechnology patent portfolios suggest that no single entity has the patent capital necessary to dominate biotechnology research and development.
The small patent portfolio sizes also provide indirect evidence that patent anticommons are uncommon in the biotechnology sector.91 Where patent anticommons are pervasive, such as the semiconductor industry, they have caused portfolio sizes to balloon.92 One would anticipate a similar phenomenon to be reflected in our data. Similarly, although the large number of patentees could exacerbate the risk of patent anticommons-diffuse ownership implies that the number of licenses required ought to be higher-the existing empirical studies suggest otherwise, as the number of patents that must typically be licensed tends to be very low (i.e., not more than a handful).93 Finally, the rising number of entrants also runs contrary to the anticipated effects of a patent anticommons, which would deter new entrants and cause funding for new ventures to dry up, particularly with the increasingly widespread practice of venture capitalists requiring intellectual property due diligence before investing in a company.
Other factors, of course, could be influencing portfolio sizes, and large portfolios may not be an inevitable byproduct of patent anticommons.94 Similarly, the low level of licensing currently observed is based on relatively modest studies that may not be adequately representative, and in any event conditions could change. Nevertheless, an inconsistency exists between the existing data and the patenting trends often associated with patent anticommons, which, at the very least, ought to be explained if the generalized anticommons theory were to be accepted.
C. Overlooking Scale and Distribution in the Anticommons Debate
Studies of general patent trends illuminate the aggregate effects of rising biotechnology patent numbers. The question at the heart of this debate-are patents deterring innovation?-ultimately turns on the size and distribution of the scientific spaces enclosed by biotechnology patents. This subpart focuses on these two elements. Drawing on an earlier article by Adelman,95 we argue that patents occupy a small region of the biomedical science commons and show that this general lack of congestion militates against patent anticommons emerging.96
The discussion then turns to the distribution of biotechnology patents. It is here that our analysis focuses on the structure of the scientific commons. We begin by examining the heterogeneous and multidimensional character of biomedical science using the PTO's classification system. This analysis shows that no single analytical framework can provide a definitive picture of biotechnology patenting or the distribution of patents within the biomedical science commons. Common objections to the arbitrariness of the PTO classification system97 therefore do not necessarily derive solely from limitations that are unique to the PTO's approach to classifying patents.98
A point on methodology is worth making here. Two primary approaches exist for analyzing patent data-one narrow and finely grained, the other broad and based on general patent characteristics. Several excellent studies have been conducted using the narrow, fine-grained approach; in the legal literature, most notably by John Allison and various coauthors.99 Our study adopts the broad approach that is often found in the economics literature, which sacrifices detail for completeness. We opted for this approach because completeness is central to our objective of assessing the potential value of patent counts as a metric for the prevalence of anticommons problems.
1. Biotechnology's Uncongested Commons.-Proponents of the generalized anticommons theory presume, as they must, that the commons for biomedical science is strictly finite and congested.100 Yet, a characteristic of biomedical science that stands out is its unbounded scope. Biotechnology methods have produced vast quantities of genetic data, but scientists have not been able to keep up with the explosion of new information.101 The opportunities for biotechnology consequently far exceed the capacities of the scientific community.102 Research scientists' accounts affirm this view through their observations that, in the great majority of cases, patents can be avoided by undertaking parallel lines of research.103 It is this disparity between resources and opportunities that makes biomedical science an unbounded and uncongested resource.104
The complexity of human biology is in large part responsible for the open-ended nature of biotechnology science. Many biological systems have built-in redundancies that protect against failures of specific processes, and this redundancy is more prevalent the more important the process.105 Further, diversity is found in the huge range of genetic variants scientists are discovering and the multigenic nature of common diseases.106 This complexity belies an atomistic, gene-by-gene analysis of disease processes.107 Common diseases will, as a consequence, be associated with multiple pathways or molecules, implying that most important diseases will have numerous potential drug targets.108 Thus, by both affording numerous opportunities for research and a variety of treatment options, the complexity of biological processes reduces the potential for anticommons problems to emerge.
The unbounded nature of biomedical science transforms the anticommons debate. Hardin's original storyline once again illustrates why relative capacity is critically important. Under the traditional commons scenario, individual self-interest inexorably leads to overexploitation. An obvious, though often ignored, assumption implicit in Hardin's tale is that individual actions must be capable of collectively overexploiting a resource. If overexploitation is impossible, the tragedy of the commons disappears.109 The generalized anticommons theory is subject to the same logic-a large scientific commons with relatively small regions of patenting will afford numerous opportunities for scientists to circumvent potential blocking patents and, in effect, make patent anticommons less problematic and less likely to arise.110
This argument requires an important qualification. Biomedical science has numerous sources of heterogeneity, including external market factors, fundamental biological characteristics, and the vagaries of research priorities. At any given time, isolated areas of dense patenting are therefore bound to exist. Our argument does not, and cannot, rule out localized areas of congestion, such as those that could emerge around powerful technologies (e.g., genetic replication methods) or areas with large market potential (e.g., stem cells, diabetes, monoclonal antibodies).111 It demonstrates only that the ambient conditions of biotechnology patenting militate against patent anticommons being a pervasive problem. Detecting isolated regions of dense patenting requires either narrow studies or a mapping of the distribution of biotechnology patents in the scientific commons, which is the subject of the subpart that follows.
2. The Multiple Dimensions of Biomedical Science.-The current focus of the academic debate on patent counts ignores the challenges of identifying reliable metrics for areas of intense patenting.112 Unlike patent-ownership patterns, for which an obvious base unit exists (the assignee), the very novel and ill-defined boundaries of most scientific disciplines defy simple conventions. As a consequence, defining the scope of specific fields of invention will generally be far from trivial-one need only consider the byzantine PTO classification system to appreciate the difficulty of this task.113 Typically these obstacles are dealt with by omitting the denominator-the commons itself-from the discussion altogether, which amounts to treating ignorance as a virtue.
The root question is not simply one of classification, but one also of gauging the potential (i.e., currently unrealized) scope of the specific discipline or area of technological development. The complexity and rapidly developing nature of biotechnology heighten this uncertainty and complicate the interpretation of our data. For example, at first blush, it would seem that patents are heavily concentrated in certain subfields and subclasses. About three-quarters of the biotechnology patents issued over the past fifteen years fall into our methods (forty-eight percent) and protein-sequence (twenty-seven percent) subfields. Similarly, approximately fifty percent of the patents in our study fall into just 30 of the more than 700 PTO subclasses represented in our biotechnology patent database.114
Yet, in absolute terms, only thirty percent of the PTO subclasses covered by our study have more than one hundred patents, and only about one percent have more than one thousand patents.115 Further, if one considers the top thirty PTO subclasses, on average two-thirds of them increased by fewer than fifty patents per year. In fact, for the vast majority of PTO subclasses, fewer than 100 patents were issued within them over the fifteen-year period of our study.116 Like so many features of inventive activity, the distribution of patents across the PTO subclasses is highly skewed-most subclasses have small populations, while a few super subclasses are vast.
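The two measures used in this passage, the fraction of subclasses above a size threshold and the share of all patents held by the largest subclasses, can be computed from any list of per-subclass patent counts. The following Python sketch is illustrative only and is not the procedure used in this study: the counts are synthetic draws from a heavy-tailed (Pareto) distribution, and the function names share_above and top_k_share are hypothetical.

```python
import random

def share_above(counts, threshold):
    """Fraction of subclasses holding more than `threshold` patents."""
    return sum(1 for c in counts if c > threshold) / len(counts)

def top_k_share(counts, k):
    """Fraction of all patents that fall in the k largest subclasses."""
    ranked = sorted(counts, reverse=True)
    return sum(ranked[:k]) / sum(ranked)

# Synthetic, illustrative data only (an assumption, not the study's data):
# 704 subclasses whose sizes follow a heavy-tailed distribution.
rng = random.Random(0)
counts = [int(10 * rng.paretovariate(1.1)) for _ in range(704)]

print(f"subclasses with > 100 patents: {share_above(counts, 100):.1%}")
print(f"subclasses with > 1,000 patents: {share_above(counts, 1000):.1%}")
print(f"share of patents in top 30 subclasses: {top_k_share(counts, 30):.1%}")
```

In a skewed distribution of this kind, a small threshold-exceeding fraction and a large top-thirty share can coexist, which is why neither figure alone describes how densely any particular subclass is patented.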
Of the 704 PTO subclasses covered by our study, 113 contain more than 100 patents. The challenges involved in interpreting these data are illuminated by Table 1, which provides a list of the thirty PTO subclasses with the largest numbers of patents granted between 1990 and 2004. While the PTO classification system lacks the bizarre qualities of the infamous Chinese Encyclopedia Celestial Emporium of Benevolent Knowledge,117 in which animals are divided into classes that include "those that belong to the Emperor," "embalmed ones," "fabulous ones," "those that from a long way off look like flies," and "stray dogs"118-to name just a few-it does not lack for seemingly arbitrary and obviously overlapping categories.
Table 1: The Top Thirty PTO Subclasses for Biotechnology Patents
Even this short list of subclasses illustrates the erratic scope of the PTO classification scheme. Almost half of the subclasses describe specific classes of biomolecules, particularly proteins, polypeptides, and nucleotides (i.e., DNA and RNA sequences) of certain lengths or having certain characteristics. The scope of many of these molecular classes is enormous, and equally importantly, they are not unique to any particular area of biotechnology. For example, should issuance of 1,462 patents on the subclass "protein sequences having more than 100 amino acid residues" over the past fifteen years be troubling when scientists estimate that there are approximately one million human proteins and most of the proteins in this subclass will be biologically unrelated?121 One could reasonably make the case that we should be more worried about 300-plus patents issued on glycoproteins, which encompass a much narrower, although still broad and important, class of compounds.
The two largest PTO subclasses in our study cover biotechnology methods and their essential materials, such as biomolecules used in genetic engineering, techniques for manipulating DNA, and isolated animal cells and microorganisms. These subclasses are, if anything, more amorphous and potentially broader than those for classes of biomolecules (e.g., polypeptide sequences of a specific length). The subclass "measuring and testing processes involving nucleic acids," which dwarfs all other subclasses, is vast and can include anything from commercial tests (e.g., genetic tests for infants, DNA tests used in law enforcement), to genetic tests for microorganisms, to cutting-edge microarray technologies used in monitoring gene expression levels in living organisms. This single subclass encompasses a vast area of technological development that is applicable to a huge range of biological problems.
It is possible that insights into localized conditions may be gained through case studies of well-defined areas of technological development. We attempted two preliminary studies along these lines, focusing on patents related to diabetes and HIV-AIDS research, each of which is at a different stage of research and development. The data are somewhat informative insofar as we found that patenting in these areas encompassed a relatively broad range of technologies, as measured by the number of PTO subclasses in which the patents issued. We observe a close correlation between the number of patents issued annually and the number of PTO subclasses they cover, which is evidence that their distribution is diffuse. While these results are enticing, they demonstrate the limits of relying on technology categories and generic patent data. Even knowing the field of invention, it is difficult to assess the significance of patent numbers in each of the PTO subclasses without more detailed information on the specific patents at issue.
The preceding examples illustrate the variable scope of PTO subclasses and the poor basis patent counts provide for making inferences about patent densities.122 An obvious question raised by these problems is whether it is possible to do better than the PTO. If one simply wishes to improve the logic and rigor of the PTO's classification system, the answer is probably yes. For example, an approach that more systematically draws on established functional taxonomies could be implemented.123 However, if the objective is to establish a system of categories with carefully calibrated scope, the answer is almost certainly no. The novelty of inventions and the evolving, multidimensional nature of science preclude such a grid-like approach, and multiple frameworks (e.g., biotechnology patents could be classified by biochemical pathway, organ, or disease) will always exist for classifying inventions.
Judging whether the volume of patenting is good, bad, or indifferent requires concrete estimates of the distribution and scope of the patents at issue. Our analysis demonstrates the difficulties of interpreting patent-count data given the impossibility of constructing a synoptic quantitative framework for analyzing them. Unsurprisingly, biomedical science bears little resemblance to Hardin's vision of the public commons, with its relatively simple spatial metrics and model of competition. With this added complexity, assessing specific regions in the scientific commons for potential congestion will require detailed analysis of related groups of patents that is clearly beyond the capacities of high-level patent studies and will be far from trivial for even localized areas of research and development. In the absence of quantitative measures for patent density or distribution, the analysis will largely turn on how patents are categorized, which, as we have seen, will be subject to significant uncertainties and debate.
D. Towards Reliable Metrics for the Scientific Commons
Using patent counts as a metric for gauging patent policy offers important insights when used in conjunction with other information. Our broad surveys of biotechnology patenting provide a reasonably coherent picture. The lack of concentrated ownership, continuous growth in patent applications, rising research and development expenditures, and consistent infusion of new patent owners all suggest that biotechnology patenting is not adversely affecting innovation. These findings are consistent with the existing social science survey data that find little clear evidence that patents are interfering with scientists' research work.124 They are also consistent with our characterization of biotechnology as a still-emerging field in which research opportunities far outstrip current scientific resources. Taken together, these observations suggest that anticommons problems ought to be the exception rather than the rule for biotechnology research and development.
Reliance on patent counts runs into trouble when it becomes the sole metric. Increasingly, the debate over the growth in biotechnology patenting has been conducted as though large numbers alone signal the imminent emergence of patent anticommons.125 In order for the current debate to progress, those concerned that innovation is threatened by expansive patenting will have to move beyond this one-dimensional model. To do so, however, will require commentators to jettison the rhetorically potent metaphors that have framed the debate over biotechnology patenting over the last decade.
It will also, we hope, force academics and policymakers to develop new, more accurate strategies for assessing the impacts of patents on innovation. This could entail, for example, conducting detailed studies using random samples of patents taken from carefully specified research domains. Identifying areas for such focused studies could be based on a combination of factors, including scientific importance, economic potential, and prevalence of patenting in related PTO subclasses. The broad metrics utilized in this study, such as trends in the numbers and types of entities obtaining patents, also provide important information about the dynamics of patenting. Analogizing to analytical strategies used in the ecological sciences,126 which must contend with similar types of complexity, these dynamical measures may prove to be more reliable metrics than simple patent counts. The concluding section of this Article returns to these questions and provides some initial suggestions for future empirical work.
III. The Statistical Insignificance of Patent Metrics as Predictive Tools
Only the most churlish cynic could fail to be excited by the wave of empirical studies animating patent scholarship today. Recent studies have transformed our understanding of patent trends in the United States and abroad.127 Exciting new work combining patent citation data and network theory is being conducted to analyze the relative importance of patents and their relationship to each other.128 More provocatively, a 2004 study by Allison et al., Valuable Patents, purports to have demonstrated that it is possible to identify valuable patents reliably using certain patent characteristics, such as citations to a patent.129
On the other hand, new empirical work should not be embraced uncritically. The rising interest in patent metrics among legal scholars follows decades of work by economists,130 who have long viewed patents as a unique, though difficult-to-interpret, source of information on national innovative output.131 Their experience suggests that patent metrics, and particularly patent characteristics, must be analyzed with great care. Economists' efforts have often been frustrated by the highly skewed (i.e., not bell-shaped) distribution of patent characteristics.132 The vast majority of patents, for example, have very low economic value, whereas a small number of patents (the skewed tail of the distribution) accounts for a disproportionate share of the wealth generated by innovation.133 This distributional feature of patent data, as we will show, creates significant analytical challenges.134
It is also important to recognize the potential pitfalls of patent studies. Unlike atoms, which have perfectly fungible characteristics, patents resemble living organisms, with their highly variable characteristics that can evolve over time. The value of patented inventions, for example, changes as technological fields develop-advances can lead to improved methods that replace patented inventions, or synergistic technologies may arise that make the patented invention more valuable. While statistical analyses can capture broad trends and important relationships, these numbers represent composite pictures made up of thousands of distinct inventions. The broad, highly skewed distributions found for most patent characteristics evidence this underlying diversity and undermine efforts to use statistical methods to construct predictive models for classifying individual patents (e.g., based on economic value, assignee type, or industry).
The discussion that follows begins by presenting our data and examining their limiting features. This analysis focuses on the statistical trends in three distinct patent characteristics: the number of claims, the length of PTO patent prosecutions, and the number of citations made in and received by a patent.135 This work is unique in that it focuses on the statistical limits of evaluating individual patents, as opposed to samples or populations of them. Drawing on these results, we reevaluate the findings of several recent empirical studies, most notably the studies reported in Valuable Patents136 and a recent article by John Allison and Mark Lemley.137 Contrary to the conclusions of these studies, we find no support for the view that broad statistical regularities in patent metrics have practical utility for predicting patent value.
A. Skewed Distributions and Uncharacteristic Patents
Our data suggest that few generalizations can be made about the characteristics of biotechnology patents. With the exception of the recent increases in prosecution times, the data display weak trends, virtually all of which are tiny relative to the broad distribution of the data.138 Further, median values for the three patent characteristics differ only slightly between our biotechnology subfields and assignee types. In short, we find little evidence that the characteristics of biotechnology patents vary meaningfully over time or between assignee types and biotechnology subfields.
1. Mismeasures of Biotechnology Patents: Prosecution Times, Number of Claims, and Citation Counts.-The data on patent prosecution times offer some limited insights.139 We find that the median patent prosecution time diminished between 1990 and 1994 from 2.7 to 2.1 years and then began to climb (most dramatically since 2002), such that by 2004 the median prosecution time for a biotechnology patent was 3.2 years.140 This rise, which constitutes a forty-three-percent increase in median prosecution time from the low in 1994, is consistent with recent warnings from the PTO that prosecution times and backlogs for patents involving complex technologies have increased substantially.141 Its significance is also borne out by statistical simulations we conducted to obtain a measure of the year-to-year random variation of the mean prosecution time.142
Aggregate statistics alone do not provide a complete account of the data. If one is concerned about the implications of the data for individual patents-such as for purposes of discriminating between patents based on prosecution time-one must take into account the exceptionally broad range of prosecution times.143 The median values and the quartiles presented in Figure 10a are the relevant factors for this analysis. By these measures, the recent rise in patent prosecution time clearly falls within the box plots for each year, which demarcate the range containing fifty percent of the data. This overlap implies that the observed median values for each year all have at least a fifty-fifty chance of being observed in one year as another. Simply stated, the range of likely prosecution times for a patent is indistinguishable from year to year over the fifteen years of our study. The broad dispersion of the data, in effect, creates a statistical haze that makes it impossible to discern any trend at all.
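The comparison described here, asking whether one year's median falls inside the interquartile box of another year, can be expressed in a few lines. The Python sketch below is a minimal illustration under stated assumptions, not a reconstruction of Figure 10a: the prosecution times are synthetic log-normal draws centered on the reported medians (2.7, 2.1, and 3.2 years), the dispersion parameter is an assumption chosen only for illustration, and the function names are hypothetical.

```python
import math
import random
import statistics

def quartile_box(values):
    """Return (q1, median, q3), the box of a box plot."""
    q1, med, q3 = statistics.quantiles(values, n=4)
    return q1, med, q3

def medians_inside_all_boxes(groups):
    """True if every group's median lies within every group's interquartile box."""
    boxes = [quartile_box(vals) for vals in groups.values()]
    return all(
        other_q1 <= med <= other_q3
        for _, med, _ in boxes
        for other_q1, _, other_q3 in boxes
    )

# Synthetic data (assumption): medians match those reported in the text,
# but the spread (sigma = 0.8 on the log scale) is purely illustrative.
rng = random.Random(1)
reported_medians = {1990: 2.7, 1994: 2.1, 2004: 3.2}
years = {
    year: [rng.lognormvariate(math.log(med), 0.8) for _ in range(2000)]
    for year, med in reported_medians.items()
}

for year, vals in years.items():
    q1, med, q3 = quartile_box(vals)
    print(f"{year}: box = [{q1:.2f}, {q3:.2f}], median = {med:.2f}")
print("all medians inside all boxes:", medians_inside_all_boxes(years))
```

The check says nothing about whether the shift in medians is real in the aggregate; it addresses only whether a single patent's prosecution time can be attributed to one year rather than another.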
The small differences in median patent prosecution times between the three assignee types and the five biotechnology subfields lead to similar findings.144 For the three assignee groups, the difference in median prosecution time is largest between corporations and the government, with the former having a median prosecution time of 3.0 years and the latter 3.5 years. By contrast, the upper and lower bounds of the fifty-percent interquartiles in the box plots span 1.5 years and 1.7 years for corporations and the government, respectively. Similarly, the largest difference in the median patent prosecution times between biotechnology subfields is only 0.7 years (2.7 years for polypeptide and protein patents versus 3.4 years for immunological and genetically-modified-organism patents), whereas the smallest fifty-percent interquartile span is 1.5 years. In both cases, the differences fall well within the broad distribution of the data.
These results confirm our intuitions about the dynamics of the patent prosecution process. At least, there is little reason to believe that patent prosecution time is sensitive to technical differences between biotechnology subfields. First, while patent review may stretch over more than two years, the actual time spent on patent applications by examiners averages out to be just eighteen hours.145 In other words, because much of the delay derives from external factors, one would expect the total time for prosecuting a patent to correlate weakly with the actual time a PTO examiner and a patentee's representative(s) spend on the prosecution process. As recent appeals for greater resources suggest,146 PTO backlogs caused by limited resources, more complex patents, and a rising number of applications have a much more direct effect on time for patent review than differences between technologies.
Second, while backlogs may vary between technology fields because of differences in the numbers of patents filed and the complexity of the technology, the influence of these factors will be obscured by the diverse characteristics of inventions within a given field.147 It is obvious, for example, that some biotechnology inventions are complex, while others are elegantly simple. Similarly, many patents involve follow-on inventions that represent minor variations on a basic theme within an established area of research and development, whereas others are truly novel and groundbreaking. The range of inventions, whether characterized by novelty, complexity, or predictability, is reflected in the broad variability of the patent prosecution times for each of our biotechnology subfields.148 The underlying diversity of inventions within a field thus blurs the slight association that might exist between patent prosecution time and substantive differences between fields of research and development.
The analysis of our data for number of claims and patent citations mirrors that for prosecution time data.149 The overall trends are straightforward enough. The median number of claims per patent decreased slightly until 1994, increased through 2000, and then leveled off through 2004.150 Similar to the prosecution time data, the aggregate values suggest a slight trend toward an increasing number of claims, ranging from a low in the median of nine claims to a high of fourteen claims. The modest nature of this trend is highlighted by the fact that the number of claims in a patent can exceed 400. As above, when considered on a patent-by-patent basis, these changes once again fall well within the fifty-percent interquartile of the box plots, and thus, are meaningless for predictive purposes.151 Nor do we observe any meaningful differences in median claim numbers between biotechnology subfields or assignee types.152
These results also bear out our intuitions. No grounds exist to anticipate that the number of claims in a patent correlates in a systematic way with its issue date or area of invention. Valuable inventions, which may justify more expansive (and therefore expensive) claims, arise in all areas of biotechnology research and development. Further, claiming strategies are as variable as the inventions they are designed to protect, which range from multifaceted to streamlined, from complex to simple, and from paradigm-shifting to cumulative. These differences transcend variations over time and between subfields; cumulative to transformative work occurs across time and between specific areas of research and development. Our data provide quantitative support for this view. The variation in the number of claims is almost as broad within discrete patent subpopulations as it is for biotechnology as a whole, suggesting that differences within subfields or years are likely to overwhelm differences between them.153
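One conventional way to compare variation within groups to variation between them is to split the total variance into the two components. The Python sketch below illustrates that decomposition; it is an assumption about how such a comparison could be run, not the calculation underlying our data, and the claim counts and subfield labels in it are hypothetical.

```python
import statistics

def variance_decomposition(groups):
    """Split total (population) variance into between- and within-group parts."""
    all_values = [v for vals in groups.values() for v in vals]
    n = len(all_values)
    grand_mean = statistics.fmean(all_values)
    # Between: spread of the group means around the grand mean, size-weighted.
    between = sum(
        len(vals) * (statistics.fmean(vals) - grand_mean) ** 2
        for vals in groups.values()
    ) / n
    # Within: size-weighted average of each group's own variance.
    within = sum(
        len(vals) * statistics.pvariance(vals) for vals in groups.values()
    ) / n
    return between, within

# Hypothetical claim counts per patent for three illustrative subfields.
claims = {
    "methods": [4, 9, 11, 14, 22, 60],
    "proteins": [3, 8, 12, 15, 25, 71],
    "organisms": [5, 10, 13, 16, 19, 48],
}

between, within = variance_decomposition(claims)
print(f"between-group share of variance: {between / (between + within):.1%}")
print(f"within-group share of variance:  {within / (between + within):.1%}")
```

Because claim counts are highly skewed, a variance-based split is sensitive to a few very large patents; the same comparison could be made with interquartile ranges instead.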
These observations expose another closely related point. Commentators have become increasingly prone to categorizing certain fields of invention as being dominated by specific types of inventive activity. Biotechnology has been characterized, for example, as being research oriented and distinguished by its discrete technological leaps.154 We find that almost fifty percent of biotechnology patents cover new measuring and testing processes,155 many of which are more analogous to devices than to drugs, which typically provide the model for biotechnology research and development.156 While these sorts of generalizations can be useful, they risk distorting legal policies by propagating a one-dimensional image of biotechnology that marginalizes important innovative work.157 Our data show that the potential for misunderstanding is particularly high for biotechnology.
The citation data are the least illuminating of the study.158 We find that the number of citations made in biotechnology patents exhibits no trends whatsoever. Almost across the board, whether by year, assignee type, or biotechnology subfield, the median number of citations is two; indeed, only a few instances exist in which it deviates to one or three citations. Yet, the range in the number of citations made is quite broad and apparently becoming broader. The data on citations received are much more limited because of the poor temporal overlap between our data and the citation data we integrated from the Hall Study.159 The median number of citations received was highest (four citations) for patents that issued in 1990 and 1991, after which it diminishes monotonically.160
Citation data are arguably most interesting for what they could reveal about successful assignees and possible differences between the public and private sectors. Intuitively one would expect the number of citations a patent receives to correlate with the value or, perhaps less significantly, the scope of the invention, and the most successful assignees to own more patents that receive a greater number of citations.161 The data display no difference between the median value for our "big ten" (i.e., the largest and most successful) biotechnology companies and the median value for all other entities. This result must be treated skeptically, as it could be an artifact of the short time period covered or of the citations in biotechnology patents tending to be to nonpatent sources.162 Alternatively, it may be that successful companies also fail often or perhaps that the existing measures of patent value are too narrow.163 In any event, our citation data for biotechnology patents prove to be decidedly mute.
2. An Empirical Haze in Common.-Two recent studies of U.S. patent trends, the Hall Study and an earlier study by Allison and Lemley,164 purport to reach opposite conclusions about the insights that can be gleaned from patent characteristics.165 On closer inspection, however, it becomes apparent that the authors gloss over several significant limitations of their findings.166 This is particularly true of certain aspects of the Allison-Lemley study. For example, the authors investigate average patent prosecution times for the fourteen technology fields defined in their study and from these results claim that the "patent prosecution system . . . spends much more time and attention on some sorts of patents than others."167
The basis for this conclusion is a regression analysis,168 which reveals that the association between the technology field and the prosecution time is statistically significant (i.e., greater than zero169) for five of the fourteen technology fields.170 But the authors fail to report the magnitude of these associations, which is essential to interpreting the practical significance of their findings. This omission is equivalent to saying that today's temperature will not be zero without giving any indication of how far from zero it is predicted to deviate.171 Their data thus do not disclose whether much more time is actually spent on certain classes of patents.
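The distinction between statistical significance and magnitude can be illustrated with a simple simulation. The Python sketch below is a hypothetical example and does not use the Allison-Lemley data: it draws prosecution times for two invented technology fields whose true means differ by only 0.1 years, then reports both a test statistic (Welch's t) and a standardized effect size (Cohen's d). All parameter values and function names are assumptions chosen for illustration.

```python
import math
import random
import statistics

def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / se

def cohens_d(a, b):
    """Standardized mean difference using a simple pooled standard deviation."""
    pooled = math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / pooled

# Hypothetical fields (assumption): true means 3.1 vs. 3.0 years, same spread.
rng = random.Random(2)
field_a = [rng.gauss(3.1, 1.5) for _ in range(20000)]
field_b = [rng.gauss(3.0, 1.5) for _ in range(20000)]

t = welch_t(field_a, field_b)
d = cohens_d(field_a, field_b)
print(f"Welch t statistic: {t:.2f}")
print(f"Cohen's d (effect size): {d:.3f}")
```

With samples this large, a difference of roughly a month can clear conventional significance thresholds while remaining negligible relative to the spread of the data, which is why a reported association needs its magnitude alongside its significance.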
Allison and Lemley single out their citation data as being particularly compelling.172 They base this conclusion on the purportedly divergent results among their fourteen technology categories. Their data disclose that the overall median number of references for the fourteen technology categories is ten references173 and that the range of median values for the technology fields varies from eight references, for semiconductor- and communications-related inventions, to seventeen references, for biotechnology inventions.174 However, all but three of the technology fields are clustered around the median of ten references, and each of the divergent categories has a small sample size.175 Accordingly, far from showing dramatic differences, the broad variances of their data make it difficult to infer that these differences are real, and where the differences appear noteworthy, the data are too thin for one to have confidence in the result.176
The Allison-Lemley study displays the same limitations we observe in our own data. The most consistent features are the large variance of the data and the nominal differences observed between classes of patents-differences between classes are much less than the differences within them. This disparity limits the insights that these studies can provide. As we argue above, intuitive grounds exist for expecting these results, but a more fundamental reason may also exist. It may be that few "general characteristics" of patents exist, and this variability may be particularly true of rare high-value patents. The analytical challenges created by this go beyond the descriptive studies analyzed here. They also hamper predictive studies, like the Valuable Patents study discussed below, that seek to identify useful associations between specific characteristics of a patent and its economic value or likelihood of being litigated.
B. The Elusive Search for Metrics of Innovative Success
Metrics for predicting patent value are receiving significant attention from economists and lawyers following the construction of new databases.177 Economists have released pioneering studies correlating patent-renewal statistics and citation rates to patent value, and they were the first to use litigated patents to determine whether valuable patents have distinct characteristics.178 Studies by lawyers soon followed with the creation of major databases on litigated patents and comparative work on litigated and nonlitigated patents that exploit the new data.179 This work has generated important insights but, as we will show, metrics based on patent characteristics have severe limitations that have gone unrecognized.
Among legal scholars, this empirical work is closely associated with concerns about the PTO's growing backlog of patents. Legal commentators have suggested that PTO resources could be economized by limiting reviews of the substantial majority of patents that have little or no economic value.180 Their reasoning is simple. Because the slim tail of high-value patents accounts disproportionately for the success of the U.S. patent system, valuable patents ought to be subject to careful review by the PTO, whereas the masses of so-called valueless patents ought to be reviewed lightly.181
This triage strategy raises practical and theoretical problems. At the practical level, it requires that valuable patents be identifiable ex ante. We show that the broad variance of patent characteristics makes this exceedingly difficult. At the theoretical level, it implicates the work of Scherer, a Harvard economist who revealed that innovative success is chaotic and resistant to statistical methods.182 Scherer's work has not dampened optimism. To the contrary, proponents of a patent triage system, particularly the authors of Valuable Patents, all but reject the teachings of Scherer's studies.183 We reassess the empirical bases of this approach to determine whether it succeeds in contravention of Scherer's work.184
Valuable Patents begins with two provocative statistics: patentees spend a total of $4.33 billion per year obtaining patents, but ninety-nine percent of U.S. patents are never enforced through litigation.185 The authors do not find the imbalance between dollars spent on prosecuting patents and the lack of enforcement to be problematic. Instead, they posit that "[s]ome patents are intrinsically more valuable than others."186 For them, the important issue is identifying the distinctive characteristics of valuable patents.187 Their approach is elegant-they use litigation as a proxy for value on the premise that a patent will not be litigated unless it has significant economic or strategic value to the patentee.188 Litigation status provides the authors with a reliable criterion for collecting a sample of valuable litigated patents to compare against otherwise valueless nonlitigated patents.189
The article contains two independent empirical studies, one large and one small. The large study uses the Hall Study data set (i.e., every patent that issued between 1963 and 1999) as the putative population of valueless patents.190 Their sample of valuable patents is drawn from patent suits that terminated during the period of January 1999 to December 2000.191 The smaller, and more detailed, second study uses the thousand patents collected by Allison and Lemley (discussed above) as the putatively valueless patent sample.192 The valuable patents are drawn from the same sample of lawsuits used in the first study.193 For each study, the authors independently evaluate the characteristics of litigated and nonlitigated patents to determine whether statistically significant differences exist between the two sets of patents.
The authors claim that their "data conclusively demonstrate that valuable patents differ in substantial ways from ordinary patents both at the time the applications are filed and during their prosecution. This suggests that valuable patents can be identified before-hand, at least in the aggregate."194 The authors find that litigated patents have numerous stable characteristics,195 although they single out number of claims, prior citations made, and citations received as "unambiguously strong predictors of patent litigation."196 To clarify the bases of their findings, we reexamine the data on these three characteristics and patent prosecution times.197
The analytical limits we identified in the preceding subpart apply equally here.198 The authors focus exclusively on statistical significance. Yet, as already noted, statistical significance does not have a direct bearing on the strength of association between, say, patent value and number of patent claims; it only demonstrates that it is very unlikely that no association exists at all.199 Weak associations, though statistically significant, will have little or no predictive value for either individual patents or samples of modest size; their only value will be in estimating the relative proportion of valuable patents in a large sample.
1. Statistical Significance Without Predictive Power.-Virtually all of the authors' statistical analyses are conveyed in two tables at the end of the article.200 In Tables 2 and 3 below, their findings are presented to ensure that the results of the two studies are readily comparable. For our purposes, two of the factors in the tables are of particular interest: (1) "p-values," which typically must be less than five percent (0.05) for the result to be statistically significant, and (2) "unit increases," which represent the percentage change for each unit increase in a descriptive variable (e.g., number of patent claims).201
Table 2: Large Study Statistics
Table 3: Sample Study Statistics
Average Time for PTO Patent Prosecution. The association between prosecution time and probability of litigation is marginal. For their large and small studies, we calculate a three percent and fourteen percent difference, respectively, between nonlitigated and litigated patents.208 The large study exemplifies the limited value of statistical significance for large samples, which because of their high statistical power cause even nominal differences (three percent) to be found statistically significant. The smaller, less powerful study identifies a larger, though still modest, association between prosecution time and probability of litigation. However, the estimated value, though larger, is not statistically significant, meaning that zero association cannot be rejected. In either case, the slight differences in patent-review times are plainly inadequate for predicting whether a patent is likely to be litigated.
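The contrast between the two studies can be sketched numerically. The figures below are hypothetical (a three percent difference in mean prosecution time, a common standard deviation of one year, and invented sample sizes); none are taken from Valuable Patents. The sketch also assumes a simple two-sample z-test rather than the regression the authors ran. It shows only that the same small difference can pass or fail a significance test depending on sample size.

```python
import math

def two_sample_z(mean_a, mean_b, sd, n_a, n_b):
    """Return (z, two-sided p-value) for a difference in means,
    assuming both groups share a known standard deviation `sd`."""
    se = sd * math.sqrt(1.0 / n_a + 1.0 / n_b)
    z = (mean_b - mean_a) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return z, p

# Hypothetical inputs: mean prosecution time in years for nonlitigated
# and litigated patents, differing by three percent.
mean_nonlit, mean_lit, sd = 2.00, 2.06, 1.0

# Same difference, very different sample sizes (both invented).
z_big, p_big = two_sample_z(mean_nonlit, mean_lit, sd, n_a=1_000_000, n_b=5_000)
z_small, p_small = two_sample_z(mean_nonlit, mean_lit, sd, n_a=1_000, n_b=300)

print(f"large samples: z = {z_big:.2f}, p = {p_big:.5f}")
print(f"small samples: z = {z_small:.2f}, p = {p_small:.3f}")
```

Under these assumptions the magnitude of the difference is identical in both runs; only the p-value changes. A p-value therefore conveys little about predictive strength.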
Average Number of Claims. This characteristic is also a poor predictor of whether a patent is likely to be litigated. While the association is statistically significant, its magnitude is small-just a 1.4% increase per claim for the large study and 9.5% increase per independent claim (only 1.3% increase per dependent claim) in the sample study. The weakness of the association is evident in the small difference in the probability of litigation-less than twenty percent-between the average number of claims for nonlitigated and litigated patent samples.209 These weak associations simply reflect the fact that counting claims does not tell us much about the nature of a patent.
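A short calculation helps convey how small these per-claim increases are. The sketch below assumes that each "unit increase" compounds multiplicatively, as it would in a log-linear model; that is our reading of the reported percentages, not something the source states. The helper names are illustrative.

```python
import math

def relative_change(unit_increase, extra_units):
    """Multiplicative change in litigation probability after `extra_units`
    additional claims, assuming each claim compounds by `unit_increase`."""
    return (1.0 + unit_increase) ** extra_units

def units_to_double(unit_increase):
    """Number of additional claims needed to double the probability
    under the same compounding assumption."""
    return math.log(2.0) / math.log(1.0 + unit_increase)

# Per-claim increases reported in the text: 1.4% (large study, all claims)
# and 9.5% (sample study, independent claims only).
print(f"claims needed to double at 1.4% per claim: {units_to_double(0.014):.1f}")
print(f"independent claims needed to double at 9.5%: {units_to_double(0.095):.1f}")
```

Under this assumption, roughly fifty additional claims would be needed merely to double a probability of litigation that starts near one percent.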
Average Number of Citations Made and Received. The data on citations made by patents are, at best, ambiguous. The large study suggests quite a strong association, about a seventy percent difference between the average values for litigated and nonlitigated patents. By contrast, the small study finds the differences to be nominal, only about ten percent, and not statistically significant. The large differences between the two studies are indicative of the instability of these estimates and grounds for caution when assessing whether the number of citations made by a patent is a reliable metric.210
Both studies reveal a consistent and sizeable association between average number of citations to a patent and its litigation status. The large study, because of its much higher average,211 demonstrates that the number of citations received is well correlated with the litigation status of a patent. Litigated patents on average receive 1.6 times more citations.212 The large study thus provides solid grounds for a significant association existing between the number of citations a patent receives and its litigation status.
Status of Owner and Technology Areas. It is worth commenting briefly on owner status and the differences observed between technology areas. Both studies reveal strong positive associations if the patent owner is an individual, and the sample study shows a strong association when the owner is a small business.213 Similarly, strong associations are evident for certain technology fields, particularly mechanics (small study) and computers and communications (large study).214 Several jarring differences between the two studies nevertheless exist.215 For example, the small study finds strong positive associations for electronics and chemistry, whereas the large study predicts modest negative associations for similar fields.216 The strength of the association for individual inventors is also markedly different (more than a factor of three) between the two studies.217 As above, these differences suggest that one should be cautious about reading the strength of these associations too literally.
Our reanalysis clearly affirms two of the authors' central conclusions: litigation status is strongly associated both with the number of citations a patent receives and with the status of the patent owner. It also finds solid evidence for concluding that mechanical patents are more likely to be litigated than the average patent. This interpretation of the Valuable Patents data departs from that of the authors with respect to the three standard patent characteristics (prosecution time, number of claims, and citations made), which we find to be poor predictors of whether a patent is likely to be litigated.
If our analysis were to stop here, some prospect, albeit limited, would remain for prioritizing the PTO's patent prosecution process. Unfortunately, as the next section will show, a more fundamental problem also exists.218 The small fraction of litigated patents makes it exceedingly difficult to exploit differences that may exist between nonlitigated and litigated patents. Other than extreme values (e.g., patents with fifty or more citations), the distribution of nonlitigated patents will overwhelm that of the litigated patents. This finding implies that, for the vast majority of patents, even patent characteristics strongly associated with economic value will have only nominal predictive power.
2. Base Rates and the Slim Tail of Innovative Success.-The economist Joseph Schumpeter once famously speculated that a small number of inventions account for a disproportionate share of the total profits from inventive activities.219 Scherer was among the first economists to provide a sound empirical basis for Schumpeter's prediction.220 In addition to confirming Schumpeter's hypothesis, Scherer showed that "it is very hard or maybe even impossible to secure stable average profit returns by pursuing portfolio strategies, eg [sic], pooling many research and development projects into a portfolio."221 In other words, the pooling of risks permitted by standard statistical methods is ineffectual in the face of the variability that is characteristic of innovative success.
In this tail-wags-dog universe, statistical measures, such as averages and standard deviations, are often erratic and unreliable. This failure arises because statistical analyses are driven by the center of a sample distribution, not its periphery. Consistency and predictability ordinarily emerge because large numbers of small, random events cancel each other out, exposing the systematic influences on the population as a whole. Patents defy this basic model because a small number of extremely valuable patents are disproportionately important to aggregate innovative success; it is the aberrant patents that make the difference.
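The instability of averages in a heavily skewed population can be illustrated with a small simulation. The sketch below is an assumption-laden illustration, not a model of patent values: it compares repeated sample means drawn from a thin-tailed normal distribution with those drawn from a heavy-tailed Pareto distribution. The shape parameter (1.1), sample sizes, and seed are all arbitrary choices.

```python
import random

def spread_of_sample_means(draw, n_samples=200, sample_size=1000):
    """Ratio of the largest to the smallest sample mean across
    `n_samples` independent samples of `sample_size` draws each."""
    means = []
    for _ in range(n_samples):
        total = sum(draw() for _ in range(sample_size))
        means.append(total / sample_size)
    return max(means) / min(means)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Thin tails: many small deviations cancel, so sample means barely move.
thin_tailed = spread_of_sample_means(lambda: rng.gauss(10.0, 2.0))

# Heavy tails: a few extreme draws dominate each sample's total.
heavy_tailed = spread_of_sample_means(lambda: rng.paretovariate(1.1))

print(f"normal: largest / smallest sample mean = {thin_tailed:.2f}")
print(f"Pareto: largest / smallest sample mean = {heavy_tailed:.2f}")
```

In the thin-tailed case the sample means cluster tightly. In the heavy-tailed case they vary widely from sample to sample, which is the sense in which averages become unreliable when a few outliers carry most of the total.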
Scherer's results are at odds with recent empirical work on valuable patents-if general descriptive statistics fail, statistical analyses of specific patent characteristics will fail for the same reason. Scherer likens the market for innovation to a sweepstakes, "the innovation lottery," that randomly bestows huge prizes on a very small number of winners.222 In Valuable Patents, the authors make their objections to Scherer's lottery theory explicit: "if valuable patents can be reliably identified . . . the lottery theory runs into difficulty. At best, it becomes only a partial explanation."223 The article in fact turns on Scherer being wrong, and the authors acknowledge that PTO policy cannot be rationally calibrated and targeted towards certain classes or types of patents if Scherer's findings are accurate.224
The findings of statistical significance presented in Valuable Patents are interpreted as refutations of Scherer's work.225 These proofs fall short of the mark. All that the article establishes is that weak associations between patent value (i.e., litigation status) and certain patent characteristics are unlikely to be the result of random variation. Their findings-unlike strong, consistent predictors of patent value-are not incompatible with Scherer's empirical findings. To the contrary, Scherer's work implies only that statistical measures, such as average values, are unstable over time and unreliable in the long run. Evidence of slight, often uneven associations from two samples of patents cannot refute Scherer's findings. Moreover, absent adequate disproof that inventive processes are chaotic, one would be foolish to rely on their results as trustworthy, long-term estimates of the associations they purport to identify.
The authors' characterization of the results in Valuable Patents is also somewhat misleading. They fail to acknowledge the practical problems inherent in their approach. As we have pointed out several times, a finding of statistical significance is not proof that an association has predictive power.226 This oversight is ultimately secondary, though, as a much more acute obstacle exists. The process of identifying valuable patents presents a standard "base-rate" problem because they constitute such a small minority of the patents issued.227
Making predictions with low base rates is akin to looking for a rare brand of needle, which may or may not have distinctive attributes, in a needle stack. A slightly different example will illustrate this point. Assume you are attempting to identify a small subset of valuable objects from a much larger collection based solely on their size and color. Low base rates, essentially the disparity in numbers between the valuable subset and the full collection, imply that the only valuable objects from the subset capable of being identified unequivocally will be those with colors and sizes that are very rarely found in the broader collection.228 By the same logic, if the distributions of sizes and colors in the small subset overlap significantly with those of the broader collection, few if any objects in the subset will be reliably identifiable.
The citation data from the large study reported in Valuable Patents illustrate the low base-rate problem well. Recall that the averages for number of citations received by the nonlitigated and litigated patents were 4.1 and 12.2, respectively.229 We also found that patents receiving 12.2 citations are about 1.6 times more likely to be litigated than those that receive 4.1 citations.230 Using this regression analysis, we calculate that litigated patents will constitute approximately 1.4% of all patents with twelve citations, whereas more than 98% will be valueless.231 Even among patents with fifty citations, litigated patents will constitute only about twenty-five percent of the total, and only beyond fifty-five citations are litigated patents projected to exceed the number of nonlitigated patents. The implications of this example are clear-even if reliable, the weak associations identified in Valuable Patents will not, as a practical matter, be able to predict whether an individual patent is valuable.
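The arithmetic behind the base-rate problem can be sketched in a few lines. This is an illustration under stated assumptions and does not reproduce the regression discussed above: it takes a one percent base rate (echoing the statistic that ninety-nine percent of patents are never litigated) and treats "1.6 times more likely" as a multiplier on the odds. At rates this low, an odds multiplier and a probability multiplier are nearly the same. The function name is hypothetical.

```python
def share_litigated(base_rate, relative_odds):
    """Fraction of patents in a group that are litigated, given the
    overall base rate and the group's odds relative to that baseline."""
    prior_odds = base_rate / (1.0 - base_rate)
    odds = prior_odds * relative_odds
    return odds / (1.0 + odds)

BASE_RATE = 0.01  # assumed: about one patent in a hundred is litigated

# A characteristic that makes litigation 1.6 times as likely still leaves
# the overwhelming majority of such patents nonlitigated.
print(f"share litigated at 1.6x odds: {share_litigated(BASE_RATE, 1.6):.4f}")

# Odds multiplier required before litigated patents reach an even split.
break_even = (1.0 - BASE_RATE) / BASE_RATE
print(f"odds multiplier needed for a 50/50 split: {break_even:.0f}")
```

Under these assumptions a characteristic must raise the odds of litigation roughly a hundredfold before a patent displaying it is more likely than not to be litigated. This is why even statistically robust associations do little to single out individual patents.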
The base-rate problem and the highly skewed distributions of most patent metrics create formidable obstacles to effective empirical studies of U.S. patents. These statistical barriers demonstrate that studies of general patent characteristics will offer few insights into the nature of individual patents or even fairly large ensembles of patents. The hope that patent policy, and particularly PTO review of patent applications, can be rationalized and calibrated using simple patent metrics is undone by implication. Instead, the value of most empirical work on patents will be limited to the broad aggregate trends that it reveals.
It is important to recognize that the analytical barriers identified here stem from the phenomena being studied, not solely from methodological shortcomings. The enigmatic nature of invention is not reducible to a few variables. In this light, patent metrics have more in common with ecology than atomic weights and measures, whose methods and theory are powerful because the phenomena are (relatively speaking) easy to study and simple.232 Inventive success is simply a hard phenomenon to study given our current level of knowledge and the subject matter's inherent complexity-the inherent heterogeneity of patents themselves, the nonlinearities of innovative success, and the dynamic nature of science.
The current excitement about using empirical methods has obscured these basic limits. Given the void of data that we typically confront, this oversight is understandable-in the policy realm, "patent statistics loom up as a mirage of wonderful plentitude and objectivity."233 High hopes alone, of course, cannot overcome the innate unpredictability of innovation and the highly skewed distribution of its benefits. Ironically, it may be a naïve enthusiasm for the rigor of empirical work that has caused academics to reify simple statistical metrics over the self-evident complexity of inventive processes.
Empirical studies are as humbling as they are alluring. It is therefore with some caution that we close by suggesting possible directions for new empirical work. Our analysis shows the difficulties entailed in interpreting patent data and the value of drawing on multiple types of data to construct a composite picture of patenting in a field. These findings suggest the need for more integrative studies that bring together patent data, economic studies, independent information on innovative output (e.g., publication data), and knowledge about the nature of research and development in the relevant field. The deficiencies that we have identified also demonstrate the importance of expanding the methods available to study the interplay between patents and innovation. To this end, recent work using network theory and citation data is particularly important and promising.234
At a more fundamental level, this study has caused us to reassess whether the current focus on patents is ignoring other more fruitful opportunities for empirical work. The core issue is, of course, fostering innovation. Yet, as economists have long recognized, patents are a weak, though still valuable, measure of innovative output. The strategy of this Article rests, in part, on utilizing complementary metrics-such as research and development funding and numbers of entities entering the field-that are readily measurable, less subject to the severe interpretive challenges of patent-count data, and arguably more directly representative of the dynamics of innovation markets and networks. Importantly, this approach shifts the focus from aggregate innovative output to the dynamics of competition and cooperation between individuals and institutions, whether public or private, engaged in research and development.
Ecology and evolutionary biology, which involve the study of natural processes of innovation and competition, may have a lot to teach us about these dynamics.235 Like innovation markets, ecosystems must balance the opposing processes of natural selection (i.e., competition that ensures resources are used efficiently) and species diversification (i.e., natural innovation), which are essential to their long-term resiliency and adaptability to environmental change. This balance is maintained by, among other things, the maintenance of ecological niches, which spare less fit species from competition that they are destined to lose.236 Although the relationship is not a simple one, the balance between dominant and niche species, as well as of overall species diversity, is an important indicator of ecosystem productivity and health. Similar indicators should exist for innovation markets and networks. Indeed, one can imagine how patents could preserve such a balance-by giving small players space to operate-or undermine it-by giving already dominant entities rights that give them complete control.237
In one sense, this type of approach is unremarkable. Legal scholars and economists have long been concerned about the interplay between patents and innovation, such as the impacts of patents on important follow-on research derivatives of a keystone patent. In another sense, it differs fundamentally from patents-oriented models in that it shifts the focus to the dynamics of innovation markets and networks.238 For example, rather than asking seemingly intractable questions about the general characteristics of valuable inventions, we might focus instead on determining the characteristics of vibrant innovation markets, such as the balance between large and small companies or the number of entities in a field, and how these dynamics change as a technology matures.239 This kind of strategy has the potential to identify better metrics that are less likely to be subject to the chaotic behavior that is characteristic of patent metrics.
IV. Conclusions
The question motivating this Article - do we have too much of a good thing? - turns on the distribution of patents across distinct areas of research and development. Our data reveal the striking rise and fall in biotechnology patenting and the surprisingly diffuse and expanding patterns of patent ownership. We conclude that the lack of concentrated control, the rising number of patent applications, and the continuous influx of new patent owners suggest that overall biotechnology innovation is not being impaired by the growth in patents issued each year.
Our analysis also reveals the many pitfalls of seeking to resolve this question at a synoptic level using simple metrics. In this sense, both the advocates of the anticommons theory and enthusiasts of patent characteristics err by oversimplifying the multidimensional character of patent dynamics. This Article has shown that identifying putative areas of dense patenting based on patent counts alone tells very little about whether anticommons problems are likely to exist. Similarly, patent metrics based on general patent characteristics have only nominal predictive value as applied to individual patents. These observations reveal the risks of overly reductive theories and empirical methods, and their close parallels with each other. The analytical obstacles we have described place a premium on developing new, creative approaches to studying U.S. patents. Complementary sources of information beyond legal sources can no doubt aid in filling some of these gaps. However, patents offer a unique source of data - particularly given the strong incentives companies have to withhold information - and they represent one of the most direct links between legal policies and innovation. It is therefore worth thinking carefully about how patents can be used more effectively in empirical work, and perhaps more ambitiously, about how rules governing patent content can be redesigned to facilitate research on and monitoring of patent trends.
V. Figures
Figure 1: Number of Biotechnology Patents Issued Yearly, 1990-2004 (n = 52,039)
Figure 2: Number of Biotechnology Patents Issued Yearly, 1990-2004, by Technology Group (n = 52,039)
Figure 3: Number of Patents Issued Yearly in the Top PTO Biotechnology Subclasses, 1990-2004 (n = 52,039)
Figure 4: Number of Patents Issued Yearly in the Top 30 Biotechnology Subclasses, 1990-2004 (n = 52,039)
Figure 5: Number of Biotechnology Patents Issued Yearly to Four Classes of Companies, 1990-2004
Figure 6: Biotechnology Patents Issued Yearly to University, Corporate, and Government Assignees, 1990-2004
Figure 7: Assignments of Biotechnology Patents to Corporations, Universities, and Government by Technology Group, 1990-2004
Figure 8: Number of Assignees of Biotechnology Patents Yearly, 1990-2004
Figure 9a: Number of Assignees by Technology Group Yearly, 1990-2004
Figure 9b: Number of University and Corporate Assignees Yearly, 1990-2004
Figure 10a: Prosecution Time Yearly, 1990-2004
Figure 10b: Prosecution Time by Assignee Group, 1990-2004
Figure 10c: Prosecution Time by Technology Group, 1990-2004
Figure 11a: Number of Claims per Patent Yearly, 1990-2004
Figure 11b: Number of Claims per Patent by Assignee Group, 1990-2004
Figure 11c: Number of Claims per Patent by Technology Group, 1990-2004
David E. Adelman* & Kathryn L. DeAngelis**
* Associate Professor, James E. Rogers College of Law, University of Arizona. This Article greatly benefited from the suggestions, comments, and criticisms of Graeme Austin, John Barton, Mark Lemley, Greg Mandel, Rob Merges, Michael Meurer, Marc Miller, Barak Orbach, F.M. Scherer, and Kathy Strandburg, as well as invaluable assistance with the data analysis from Roger DeAngelis. The authors also benefited from the feedback they received from participants in the conference on Frontiers of Intellectual Property held at the University of Texas at Austin, and would particularly like to thank the discussants John Allison, Rebecca Eisenberg, and Margaret Sampson.
** Graduate, James E. Rogers College of Law, 2005.
Appendix: Data and Methodology
The biotechnology patent database consists of 52,039 patents that were issued between January 1990 and December 2004. These patents were collected from a larger patent database consisting of all patents, more than 950,000 in total, that were issued between January 1990 and March 2005. These data were obtained directly from the PTO, which has electronic files dating back to at least 1963.240 Once the data for all patents issued between January 1990 and March 2005 were collected, they were converted to a consistent format for analysis.241
The primary issue to resolve in constructing the database was the criteria we would use to determine whether to include or exclude a patent. As other commentators have acknowledged in similar contexts,242 the process of categorizing inventions into subject areas is part art and part science that entails difficult judgment calls, and one is unavoidably confronted with inventions that defy simple categorizing.243 Making these decisions proved to be particularly difficult in areas where pharmaceutical research and development closely parallels, or overlaps, biotechnology work. We therefore focused our attention on ensuring that our data were not confounded by patents on inventions associated with traditional pharmaceutical research.
We adopted a two-stage approach: first, we constructed an unambiguously overinclusive database of biotechnology-related patents using general PTO classes; second, we pared down this overbroad data set using several criteria. The first broad database was drawn from forty-nine PTO classes and contained a total of 89,619 patents.244 For construction of the second, narrower database we employed three complementary strategies. We examined the PTO subclasses in which well-established biotechnology companies were obtaining patents,245 reviewed the subclasses that the PTO treats as biotechnology fields, and undertook our own independent assessment of potentially relevant PTO subclasses to determine their relevance. The process was an iterative one in which we started with the first strategy and then compared these results against the PTO classification system. Where there were differences, or where we had our own reservations about specific subclasses, we undertook our own detailed analysis.
After several rounds of this process, our analysis converged on a final database comprised of patents whose primary PTO classification falls under one of 704 PTO subclasses.246 As a general rule, we sought to be as inclusive as possible, but where a subclass might include a few biotechnology inventions but otherwise be dominated by inventions in other fields, we opted to exclude those subclasses.247 Most differences between the subclasses included in our database and the PTO's biotechnology-art-unit-based designation of biotechnology subclasses derived from our efforts to avoid overlap with pharmaceutical inventions and our decision to exclude agricultural biotechnology inventions from the database.248
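The two-stage construction described above can be sketched as a pair of filters. The record layout, class labels, and subclass labels below are hypothetical placeholders, not actual PTO codes or the authors' data format. The sketch shows only the logic of an overinclusive first cut followed by a narrower subclass-based cut.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Patent:
    number: str            # placeholder identifier
    primary_class: str     # placeholder for a general PTO class
    primary_subclass: str  # placeholder for a PTO subclass within that class

def build_database(patents, broad_classes, included_subclasses):
    """Return (broad, narrow): the overinclusive first-stage set and the
    pared-down second-stage set keyed on primary classification."""
    # Stage 1: deliberately overinclusive cut by general class.
    broad = [p for p in patents if p.primary_class in broad_classes]
    # Stage 2: keep only patents whose primary subclass survived review.
    narrow = [
        p for p in broad
        if (p.primary_class, p.primary_subclass) in included_subclasses
    ]
    return broad, narrow

# Toy records with invented labels.
patents = [
    Patent("P1", "A", "1"),
    Patent("P2", "A", "2"),
    Patent("P3", "B", "1"),
    Patent("P4", "C", "9"),
]
broad, narrow = build_database(
    patents,
    broad_classes={"A", "B"},
    included_subclasses={("A", "1"), ("B", "1")},
)
print([p.number for p in broad])
print([p.number for p in narrow])
```

Keying the second stage on the primary classification mirrors the rule stated above that a patent is included when its primary PTO classification falls within a retained subclass.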
The database contains general information on each patent issued, and can be easily updated to include patent data as it is added to the PTO Web site. The specific categories of information include the following: patent number, application number, patent title, issue date, assignee, primary subclass, other subclasses, art unit, number of claims, number of figures, primary examiner, assistant examiner, and search classes. We also augmented the data we received from the PTO by integrating data on patents issued from January 1990 through December 1999 from a database compiled by several researchers associated with the National Bureau of Economic Research (NBER), which is called the NBER Patent Citations Data File.249 These data gave us two additional categories: (1) number of citations made by each patent granted during the 1990s, and (2) number of citations received by each patent during the 1990s.
We organized the data into discrete data tables to run comparative analyses. The first of these was structured around distinct, albeit still somewhat artificial, biotechnology subfields. Drawing on intermediate categories evident in the PTO classification system, we divided the data set into five distinct areas of biotechnology research and development: (1) measuring and testing processes (MET);250 (2) polypeptide (i.e., short protein subsequences) and protein sequences (PSQ); (3) nucleotide (i.e., DNA, gene) sequences (NSQ); (4) immunological processes and compounds (IGG);251 and (5) genetically modified organisms (GMO). These categories were chosen because they aligned reasonably well with the patents in the database, because they made sense scientifically, and because of their importance in the field of biotechnology research and development.252 Each category is treated as exclusive, although some inventions could be placed in more than one subfield. While this ordering could bias our results, we believe that, given the large number of patents studied, its effects are likely marginal.
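The potential for bias when a patent could fall into more than one subfield is easiest to see with a concrete assignment rule. The sketch below assumes, purely for illustration, that exclusivity is enforced by testing the subfields in a fixed priority order; the text does not specify the rule actually applied, and the function name and the candidate sets are hypothetical. Only the five subfield labels are taken from the discussion above.

```python
# Subfield labels from the text; the priority order itself is an assumption.
SUBFIELD_ORDER = ["MET", "PSQ", "NSQ", "IGG", "GMO"]


def assign_subfield(candidate_subfields, order=SUBFIELD_ORDER):
    """Return the first subfield in `order` that the patent could belong to.

    `candidate_subfields` is the set of subfields a patent plausibly fits.
    Returns None when the patent fits none of them.
    """
    for label in order:
        if label in candidate_subfields:
            return label
    return None


# A hypothetical patent that fits both nucleotide sequences and GMOs.
ambiguous = {"NSQ", "GMO"}
default_choice = assign_subfield(ambiguous)
reversed_choice = assign_subfield(ambiguous, order=list(reversed(SUBFIELD_ORDER)))
```

Under the default order the ambiguous patent is counted as NSQ; under the reversed order it is counted as GMO. Any such rule shifts counts toward whichever subfields are tested first, and the size of the shift depends on how many patents are genuinely ambiguous.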
We created one final subdivision of the data for our analysis. Patents were categorized according to their ownership status, that is, whether the assignee of the patent was the federal government, a university, or a corporation.253 In addition, to obtain a sense of possible substructures within the corporate category, which is by far the largest one, we created four groups of companies: (1) ten large biotechnology companies based on an examination of revenue, profits, and number of employees;254 (2) ten, somewhat randomly chosen, mid-range biotechnology companies, virtually all of which were operating at a loss;255 (3) the ten biotechnology companies with the largest patent portfolios;256 and (4) the ten pharmaceutical companies (i.e., not solely biotechnology) with the largest patent portfolios.257 These groupings are included in comparative analyses.
We conducted three central analyses of the data.258 First, as in a number of existing studies, we examined trends for specific characteristics of the patents. These included the number of claims, the length of patent prosecution, and the number of citations made and received.259 Second, we analyzed trends based on assignee type (i.e., corporations, universities, federal government) and assignee size, as well as the thirty assignees with the largest number of patents. Third, we evaluated patent trends based on the five biotechnology subfields listed above: measuring and testing processes, protein sequences, nucleotide sequences, immunological processes and compounds, and genetically modified organisms. All of the data were evaluated to determine trends over time, and several studies were conducted to examine the interplay among three patent characteristics (i.e., number of claims, citations, length of patent prosecution).
The analyses of general trends were supplemented by two studies designed to evaluate the distribution of biotech patenting among patent owners and across distinct areas of research and development. These studies ranged in scope from discrete areas of research and development (e.g., diabetes research) to the full complement of biotechnology patents in the database. Patent-ownership patterns were analyzed using our data on assignees. The distribution of patents across different subject areas was evaluated using PTO subclasses. In addition, we utilized a variety of statistical methods, particularly standard hypothesis testing and linear regression, to examine the results of three existing studies of U.S. patents. The details of these methods and their implications are integrated into our discussion of the data.