UNITED STATES DEPARTMENT OF COMMERCE
PATENT AND TRADEMARK OFFICE

PUBLIC HEARING ON PATENTING OF NUCLEIC ACID SEQUENCES

Crystal Park Two
2121 Crystal Drive
Commissioner's Conference Room, Suite 912
Arlington, Virginia

Tuesday, April 23, 1996

The hearing in the above-entitled matter commenced, pursuant to notice, at 9:00 a.m.

B E F O R E:

BRUCE A. LEHMAN
Assistant Secretary of Commerce and Commissioner of Patents and Trademarks

LAWRENCE J. GOFFNEY
Acting Deputy Assistant Secretary of Commerce and Acting Deputy Commissioner of Patents and Trademarks

EDWARD R. KAZENSKE
Deputy Assistant Commissioner for Patents

STEPHEN G. KUNIN
Deputy Assistant Commissioner for Patent Policy and Projects

NANCY J. LINCK
Solicitor

C O N T E N T S

ORAL TESTIMONY BY:

Margaret Smith
Genetics Computer Group
Vice President
University Research Park
575 Science Drive
Madison, Wisconsin 53711-1060

Eli Mintz
Compugen Ltd.
17 Hamacabim St.
Petach-Tikva, 49220, Israel

Hollie L. Baker
Chair, Committee 1001 (Biotechnology) of the ABA Section of Intellectual Property Law
Hale and Dorr
Suite 1000, 1455 Pennsylvania Avenue, N.W.
Washington, D.C. 20004

Gary M. Pace, Ph.D.
Ciba Agricultural Biotechnology Research Unit
Staff Scientist/Patent Liaison
P.O. Box 12257
Research Triangle Park, North Carolina 27709

Michael Fannon
Human Genome Sciences, Inc.
Director of IS Department
9410 Key West Avenue
Rockville, Maryland 20850

Herb Jervis
Smith-Kline Beecham

Albert Shpuntoff

P R O C E E D I N G S

MR. LEHMAN: Good morning. My name is Bruce A. Lehman. I am the Assistant Secretary of Commerce and the Commissioner of Patents and Trademarks. Joining me at this Hearing today are Lawrence J. 
Goffney, to my immediate left, Acting Deputy Assistant Secretary of Commerce and Acting Deputy Commissioner of Patents and Trademarks; Edward R. Kazenske, Deputy Assistant Commissioner for Patents, on my immediate right; Stephen Kunin, Deputy Assistant Commissioner for Patent Policy and Projects, on my far left; and Nancy Linck, the Solicitor of the Patent and Trademark Office. This is a hearing to receive public comment on a serious problem that is currently facing the Patent and Trademark Office related to patent protection for nucleic acid sequences. The public was invited to comment on this issue in a notice that we published in the Federal Register on March 12, 1996. For over a decade, the PTO has been examining and granting patents to claims reciting nucleic acid sequences. Scientific and technological advances have permitted rapid identification of large numbers of genes or gene fragments. The ease of utilizing automated techniques for sequencing nucleic acid fragments has resulted in the filing of a growing, although still relatively small, number of patent applications, each of which claims thousands of nucleic acid sequences. Statistics reveal that the number of these applications is growing and, based on the number of organisms and genes still to be discovered, such growth will continue for the near future. In Fiscal Year 1991, the Scientific and Technical Information Center of the PTO searched about 4,000 sequences. In Fiscal Year 1995, they searched about 22,000 sequences. Currently, we have over 200,000 sequences claimed in at least 70 patent applications awaiting search and examination. Our estimates show that the search of 100 sequences requires about 15 hours of computing time, but the evaluation of the search results for those 100 sequences requires about 65 hours of examiner time. 
The PTO currently has two massively parallel processor computers and could run the searches in about two years, with the computers running twenty-four hours a day, seven days a week. To examine this relatively small number of patent applications only with respect to the prior art, however, would require over 90 senior-level staff years. Thus, in order to process these applications, the entire staff of the Biotechnology Patent Examining Group 1800 would have to work for more than nine months exclusively on these applications. These applications present a challenge to the PTO and we need help and suggestions on how we can address this problem. The United States is a leader in the rapidly growing field of biotechnology, which is a growth industry important to the economic health of this country. The PTO has taken a very active role in working with its customers to simplify policies and procedures in ways that encourage and promote the growth of this industry. We are committed to improving the responsiveness of the PTO to its customers and to more effectively address the needs of the industry. We must find ways to search and examine the pending applications and provide these applicants with the appropriate patent protection for their inventions without creating an imbalance in the appropriation of the resources within and among the technologies and Patent Examining Groups. The policies established must permit the timely and thorough examination of all applications which require the same resources for completion. We are currently working in partnership with the applicants of these applications in order to explore innovative mechanisms and to accomplish the required work in processing the applications. We appreciate the time each person who is here today has taken to attend the hearing and to provide us with your input into the solutions to these problems. 
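The workload figures quoted in these opening remarks can be checked with a short calculation. The sketch below is an editorial illustration rather than part of the testimony. The inputs are the numbers stated above (200,000 pending sequences, 15 computing hours and 65 examiner hours per 100 sequences, two computers); the assumption that the machines run every hour of a 365-day year is ours.

```python
# Back-of-the-envelope check of the workload figures quoted in the opening remarks.
SEQUENCES_PENDING = 200_000
COMPUTE_HOURS_PER_100 = 15
EXAMINER_HOURS_PER_100 = 65
COMPUTERS = 2
HOURS_PER_YEAR_24X7 = 24 * 365  # assumption: machines run around the clock


def workload(sequences=SEQUENCES_PENDING):
    """Return (compute hours, examiner hours, elapsed years on the two machines)."""
    batches = sequences / 100
    compute_hours = batches * COMPUTE_HOURS_PER_100
    examiner_hours = batches * EXAMINER_HOURS_PER_100
    machine_years = compute_hours / (COMPUTERS * HOURS_PER_YEAR_24X7)
    return compute_hours, examiner_hours, machine_years


compute_hours, examiner_hours, machine_years = workload()
print(compute_hours, examiner_hours, round(machine_years, 2))
```

Under these assumptions the searches take 30,000 computing hours, or roughly 1.7 years on two machines, which is consistent with the "about two years" stated above. The evaluation takes 130,000 examiner hours; the quoted figure of over 90 staff years would then correspond to about 1,440 examining hours per staff year, which is an inference and not a number given at the hearing.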
A transcript of the hearing will be prepared and will be made available for purchase by the public approximately 10 days after this hearing. Copies will also be available directly for purchase from the stenographer. The name of the stenographer service today is Miller Reporting Company. Their telephone number is (202) 546-6666. That is Miller Reporting, (202) 546-6666. I assume you can probably talk to the stenographer here too about that. We have received eleven written comments and seven requests to appear orally this morning. However, any persons who wish to speak and who have not previously informed us of their desire to testify are encouraged to add their names to the list located at the credenza at the rear of the room. In order to permit all persons requesting to appear orally, including those people that may be signing up today, to present testimony, we would request that each speaker limit their comments to 15 minutes. You don't have to take your full 15 minutes, keep in mind. Those persons who wish to provide additional comments must submit their comments to us in writing no later than April-- well, that is today, April 23rd-- no later than today. The speakers have been listed in the order in which the requests were received by us. You may also pick up at the table at the rear of the room copies of the "Official Gazette" publication of the Notice of the Hearings and the Request for Comments on the issues relating to patent protection for nucleic acid sequences. When you present your comments, we would request that you please give your name and address and tell us whether these comments are your own, or whether they are those of your law firm or company, or whether you represent an organization and are presenting comments on their behalf. We would appreciate it also if the comments could be limited to the questions that were presented in the Federal Register publication of March 12, 1996. 
The first speaker that we have listed is Michael Langan, but I was informed that he doesn't appear to be here yet. Is Michael Langan here? (No response.) MR. LEHMAN: If not, is Margaret Smith here? MS. SMITH: Yes. MR. LEHMAN: Why don't you come forward, Ms. Smith? MS. SMITH: My name is Margaret Smith and I represent Genetics Computer Group, GCG. I am the chief operating officer and one of the owners and founders of GCG. The mission of GCG is to serve biologists by discovering, implementing, publishing and supporting algorithms in the area of sequence analysis. This is the only thing that we do. This is our main focus. We also think that standards are very important, and that is one of the arguments that I would like to use as a basis for my comments. That is, GCG, which is widely used for analysis and searching and also as a central management tool for public databases that are installed locally, is built on standards such as Fortran, C and Motif, runs on standard platforms such as Sun, SGI and Digital, and also incorporates standards such as FASTA, Smith-Waterman, BLAST and SRS, the public databases like GenBank, PIR, Swiss-Prot and EMBL, other important files like REBASE and PROSITE, as well as standard comparison tables. The software from our company is also a standard on which other people build tools. Within institutions there are modifications so that longer sequences can be used or that searches can be run in a different manner or a set of searches can be run in a certain way. The GCG software has also been a platform upon which other tools have been built. For example, in Europe there is something called extended GCG, which is another package built on top where there are modifications and extra programs. Compugen has built a strategy. 
Compugen, which is another company, has taken the basics and looked at the problems of high throughput and how to speed up searches, and has therefore taken on problems that are probably very similar to those of the Patent Office and that are faced by the pharmaceutical companies and other high throughput laboratories where many sequences must be analyzed. Also, based upon our standards, our standard package, people are building interfaces so that users, scientists, can access just the programs of interest. The European Patent Office, for instance, uses our software and has a set menu that their examiners use, and therefore they are limiting themselves to just the part of the package that is most appropriate for them. So, the basis of the argument is that there is a set of standards and we have incorporated many of the standards and become a standard. I would like to specifically address the issues of the search strategy. As I mentioned before, the pharmaceutical companies face this issue, and there is a proposal of a filtering system. If you have a lot of sequences to go through, you can first use, and I will give the analogy, something like a coarse-grain filter that is very quick. Things move through it very quickly, but you will catch the most obvious. An example might be BLAST, the BLAST algorithm. Then moving on to things that are more particular, the next part of the filter would be something along the lines of Smith-Waterman or FASTA perhaps. And the next level of filter could be something like FrameSearch, which is a program that allows you to take a protein sequence and search against a nucleic acid database or vice versa. I think an important part of this filter system is to have machine readable output. There are standards. 
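The filtering idea described above, a quick coarse stage followed by a slower rigorous stage, can be sketched in a few lines. This is an editorial illustration under stated assumptions, not the software discussed at the hearing: the coarse stage is a simple shared-word (k-mer) test standing in for a fast heuristic such as BLAST, the rigorous stage is a plain Smith-Waterman local alignment score with a linear gap penalty, and the function names, scoring values and thresholds are all hypothetical.

```python
def kmers(seq, k):
    """Set of all overlapping words of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def coarse_filter(query, subject, k=4, min_shared=2):
    """Fast, insensitive stage: keep subjects sharing enough exact k-mers."""
    return len(kmers(query, k) & kmers(subject, k)) >= min_shared


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Rigorous stage: best local alignment score (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def cascade(query, database, min_score=12):
    """Run the cheap filter first; align only the survivors."""
    survivors = {name: s for name, s in database.items() if coarse_filter(query, s)}
    scored = {name: smith_waterman_score(query, s) for name, s in survivors.items()}
    return {name: score for name, score in scored.items() if score >= min_score}


database = {
    "hit": "TTGACCGTAAGCTTGGA",        # contains the query exactly
    "unrelated": "CCCCCCCCCCCCCCCCC",  # shares no 4-mer with the query
}
print(cascade("GACCGTAAGC", database))
```

The design point is that the expensive alignment runs only on sequences that pass the cheap test, so a finer filter can be slotted in after any stage without changing the stages before it.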
The tools to make up the filter are there, but I think the part where new research is needed -- not necessarily research, but new work -- is to look at it in terms of machine readable output and also normalization, so that there can be an acceptable level of identity before moving on to the next step in the filter. An acceptable level of identity is recognized. I know that some of the issues seem to deal with ambiguity within sequences, and there are also some other programs that can deal with ambiguity within sequences, such as ProfileSearch, which allows you to indicate regions of high ambiguity or regions of high stringency before the search is done. There are also pattern matching searches that can be done. There is really probably a circumscribed set of tools that can be placed in different slots of the filter, depending upon what the problem is. But I think the common theme is machine readable output, as well as normalization of scores. There is work being done on this already, but I think that with this high volume of data that is coming, even though we projected that it would come, everybody, not just GCG, projected that it would come, we are still caught off-base by how quickly it has come. So, work is being done by us and by others on machine readable output, as well as normalization of scores, and I think a lot of that work has been done already. I would also like to suggest that, based on the last ten years of experience that the Patent Office has had with sequence patents, perhaps guidelines can be provided for customers to provide a better set of prior art search results, from searches that the applicants have done, to be submitted with the application. This would move some of the work, and perhaps more of it, off to the applicant to be brought in as prior art information. 
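The two recurring themes above, normalization and machine readable output, can be illustrated with a small sketch. It is an editorial example, not a description of any GCG program: hits are normalized to percent identity over the aligned columns, only hits at or above an agreed level are kept, and the result is written as JSON. The field names, the gap symbol and the 90 percent threshold are assumptions chosen for illustration.

```python
import json


def percent_identity(aligned_query, aligned_subject):
    """Identity over aligned columns; '-' marks a gap and never counts as a match."""
    if not aligned_query or len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned strings must be non-empty and of equal length")
    matches = sum(q == s and q != "-" for q, s in zip(aligned_query, aligned_subject))
    return 100.0 * matches / len(aligned_query)


def report(hits, threshold=90.0):
    """Keep hits at or above the identity threshold and emit them as JSON."""
    kept = []
    for name, (query, subject) in hits.items():
        identity = percent_identity(query, subject)
        if identity >= threshold:
            kept.append({"subject": name, "identity_pct": round(identity, 1)})
    kept.sort(key=lambda hit: -hit["identity_pct"])
    return json.dumps(kept)


hits = {
    "seqA": ("ACGTACGTAC", "ACGTACGTAC"),  # identical
    "seqB": ("ACGTACGTAC", "ACGTAC-TAC"),  # one gap in ten columns
    "seqC": ("ACGTACGTAC", "TTGTACGAAC"),  # three mismatches in ten columns
}
print(report(hits))
```

Because the output is structured rather than a printout, the next stage of a filter, or an examiner's tool, can read it directly and apply a different threshold without rerunning the search.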
An example might be the type of search and parameters that were used, the date of the search, the data set and the results in a machine readable form, and then the examiner can further refine this. I would also like to point out that the PTO and GCG actually share a large set of customers. For example, if you look at the companies that have a high number of patents, such as Tequita, Immunex, Hoffmann-La Roche, Ciba-Geigy, Santori, Cetus and Eli Lilly, those are shared customers, as well as NIH, University of Washington, University of Texas and New York State University. These are also public institutions that have a high number of patents in this area. Therefore those people already have many of these tools, and I think that if they don't, there can also be public access to these tools. But there is an incentive to develop new tools that can be supported within this community, and these new tools would be in the area of automatic annotation and normalization. I would also like to state that the standard tools, which are well supported through GCG software, and Compugen, which supports high throughput solutions, are already working with pharmaceutical companies and addressing many of these issues of how to work with many sequences. I would also like to add that GCG has a long-term working relationship with public database providers. This is also a key point, in that keeping the data up to date ensures that searches are run against the data appropriate to the time of the application. Also GCG has relationships with other software developers and vendors if new algorithms, perhaps in an adjacent area, are also required. I think that is it. MR. 
LEHMAN: Is it your view that apparently we are not using these tools right now that you are describing in our searches and that if we were to do so, we would very substantially reduce these processing times that we are talking about? MS. SMITH: I think that some of the tools are being used. Like, for instance, the Smith-Waterman. It is my understanding that that is being used. It's hard to say exactly which tools. The tools I think are interchangeable depending upon the problem that you are searching. I am making the argument that it's a standard set of tools that can be provided, where you can pick and choose the appropriate filter to slip into the grid of filters that you are creating and not have to go out and look for the new one outside and try to figure out how to work that into the solution. So, I think you have many of the tools already. It's working them into a filtering system and also the machine readable output and normalization. MR. LEHMAN: So, in theory, if we are using the most efficient filter, the quickest filter, we would be likely to catch in enough cases a prior art that would invalidate the patent application or the claims that we could just throw that one out and not have to go further. Is that the point, that it would save us a lot of time? MS. SMITH: Right. Right. MR. LEHMAN: The difficulty is though if one assumes that these patent applications will be more and more and more eloquent and won't be able to be thrown off that easily, we will still be stuck with the problem of having to do a more comprehensive search and have finer filters. MS. SMITH: Right. And it's the ability to slip in those finer filters at an appropriate spot that I think is good about working with standards. The advantage of the standards is that a large community has tested them and they have kind of been through the mill, so to speak, in that errors have been detected and been fixed and people respect the results. MR. 
LEHMAN: Now your company makes the software -- MS. SMITH: Correct. MR. LEHMAN: -- basically that is used. I assume the primary purpose of that software isn't really to find the prior art in a patent search, it's to aid in the scientific research itself. MS. SMITH: Our software is used for many different things, but it's really just basic sequence analysis. So, if you can represent sequences as text, and in the case of DNA you have four nucleotides and in proteins 20 amino acids, that can each be represented by a single letter, our software is basically a large text processor. So, we are continually adding tools that have been recognized as standards or have come up through much testing, and many of these tools have come from public institutions. So, it's a solution which you can build around. So, it's a core that will allow you to slip in new tools. In fact, that is what some people do at our customer institutions. They will build new tools on top. Because we don't know what everybody needs. So we try and build a common set based on standards that have been tested and in a sense approved by public opinion and then people build things on top of it because we provide the source code and allow that. MR. LEHMAN: Any other questions? MR. KUNIN: I would like to ask -- is this mike on? MR. LEHMAN: Yes. MR. KUNIN: The presentation that you made, and I didn't quite understand the full impact of what you were saying, seemed to emphasize a substantial amount of front end processing with a discussion of machine readable output. One of the components that we have indicated here as a significant problem of ours is the post processing. And that is to say that in many instances our problem is not in identifying hits, but our problem is having very large numbers of hits and then having a very difficult time, with Examiners having to sort through the outcomes and make the actual patentability determinations. 
Do you have any recommendations in terms of taking the machine readable output and doing some post processing that would help take those results and put them in a more orderly way to help the examiner on the back end? MS. SMITH: I think it would depend upon what level of identity you are willing to accept or not accept. I am making an assumption that the large number of sequences are due to much ambiguity in an original sequence being allowed, but maybe that is a wrong assumption. But if a lot of ambiguity is allowed, then I think it's going to be very hard to attack this. But if the level of identity that you want can be addressed and put at a certain level, then I think the machine readable output is very good at going through and finding if you attain that level of identity, and then only bringing to the examiner the pertinent information that is at the appropriate level of identity. MR. LEHMAN: Anybody else? (No response.) MR. LEHMAN: If not, thank you very much. MS. SMITH: Thank you. MR. LEHMAN: Next I would like to ask Eli Mintz, please, if he is here, to come forward. MR. MINTZ: Good morning. My name is Eli Mintz. I am the CO of Compugen and speaking on Compugen's behalf. Compugen manufactures hardware for accelerating sequence searches, homology and similarity searches. I am also speaking here today on behalf of Silicon Graphics International, SGI, and the points I am going to talk about have received their approval. I would like to raise four points that I think should help the Patent Office to solve some of its problems. Some of these points are the same as Maggie Smith has raised. The four points are: (1) Cost effective scalable hardware. (2) Filtering strategy for searches. (3) Automating work flow based on rule based systems. (4) Distributed prefiltering at U.S. PTO customer sites. I would like to speak briefly about each point. We think that a cost effective scalable hardware solution can be built for doing the searching that the U.S. 
PTO requires. A basic building block would cost $10 per hour and would enable exhaustive searching of 100 sequences, each 500 base pairs long, against a database of one billion base pairs in less than 15 hours. So, relative to the numbers published in the announcement, it's an improvement of about a factor of ten in cost effectiveness. The technology that Silicon Graphics has, for example, on which the hardware is based, and the GCG software that is used to integrate all this, are evolving all the time, and we believe that in the years to come there will be still substantial advancement and the cost of searching will go down. So, I really don't think that searching will be that much of a problem in the future. I am talking here about the rigorous searches, such as Smith-Waterman. Going back to the filtering strategy, I think it is an important strategy and will bear fruit, say, into the future because eventually everything will be prior art. Then the less rigorous searches will quickly fish out results. So it wouldn't make sense to start out with the most rigorous or the more sensitive algorithm right away, but it would be a good idea to prescreen the applications with an algorithm such as BLAST that is very quick, not rigorous, but will pick up a lot of homologies. Let's say in the next few years this may not be the case, but four or five years down the road I am quite confident that most of the filtering will be done at that stage. The third point, which I think is the one where the U.S. PTO can save a lot, is automating the work flow. Recently there has been some work done by Chris Sander's group at EBI, the European Bioinformatics Institute, and what they have done, I believe, is a proof of concept of what can be done. Let me just read this out. The system that they built is called GeneQuiz. 
It is an integrated system for large scale biological sequence analysis that goes from a protein sequence to biochemical function using a variety of search and analysis methods and up to date protein and DNA databases. Applying an expert system module to the results of the different methods, GeneQuiz creates a compact summary of findings. It focuses on deriving a predicted protein function based on the available evidence, including the evaluation of the similarity to the closest homologue in the database: identical, clear, tentative, marginal. The analysis yields everything that can possibly be extracted from the databases, including three dimensional models by homology when the structure can be reliably calculated. GeneQuiz consists of four modules: the database update, the search system, the interpretation module and a visualization and browsing system. The modules are driven by programs, and a front end program for visualization based on a WWW browser is available. The principal design requirement is the complete automation of all repetitive actions: database updates, efficient sequence similarity searches, and the automated evaluation and interpretation of the results using expert knowledge coded in rules. This system has actually been run on a 64 processor Silicon Graphics Power Challenge array, and it has been used to analyze 6,000 protein sequences from the yeast genome, and they plan to analyze the whole yeast genome when it becomes available, probably later this month. I think this is a proof of concept. The problem they are trying to solve is more difficult than the problem that the U.S. PTO faces because they want to predict function of a sequence, not just to determine if it is prior art or not. And therefore I think that the technology is available today to allow the U.S. PTO to substantially decrease the Examiner time that it requires and use such automated tools. The fourth point is, as I said, distributed pre-filtering at U.S. PTO customer sites. 
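The rule-based evaluation described under the third point can be sketched as follows. This is an editorial illustration of the general technique, an ordered list of rules that grades each hit, and not the logic of the system that was read out. The category labels follow the testimony as best they can be made out; the testimony gives no cut-offs, so every threshold, field name and function name below is hypothetical.

```python
# Ordered rules: the first rule that matches decides the category.
# All thresholds are hypothetical; the testimony names categories but no cut-offs.
RULES = [
    ("identical", lambda h: h["identity_pct"] == 100.0 and h["coverage_pct"] == 100.0),
    ("clear", lambda h: h["identity_pct"] >= 90.0 and h["coverage_pct"] >= 80.0),
    ("tentative", lambda h: h["identity_pct"] >= 70.0),
]


def classify(hit):
    """Return the category of a single hit, falling back to 'marginal'."""
    for label, rule in RULES:
        if rule(hit):
            return label
    return "marginal"


def triage(hits):
    """Group hit names by category so the strongest evidence is reviewed first."""
    groups = {}
    for hit in hits:
        groups.setdefault(classify(hit), []).append(hit["name"])
    return groups


hits = [
    {"name": "h1", "identity_pct": 100.0, "coverage_pct": 100.0},
    {"name": "h2", "identity_pct": 93.0, "coverage_pct": 85.0},
    {"name": "h3", "identity_pct": 75.0, "coverage_pct": 40.0},
    {"name": "h4", "identity_pct": 35.0, "coverage_pct": 20.0},
]
print(triage(hits))
```

Keeping the rules in a data structure rather than in nested conditionals means an office could publish or revise them without touching the code that applies them, which also bears on the fourth point: applicants could run the same published rules before filing.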
I think the right way to go is to move most of the work to the sites of the people that send in the applications. Because if the U.S. PTO has a defined set of actions that it takes on each application, it would make sense to have the people sending in the applications do this before having sent in the application. And, in my opinion, this can substantially lower the amount of sequences that reach the U.S. PTO, and when they do reach the U.S. PTO, there will already be behind them substantial work done and maybe some of the work will not have to be repeated. Especially if, let's say that third parties take the definitions supplied by the U.S. PTO and industry group, it doesn't matter, and build on top of them some software that can be sold or provided to the community, and the U.S. PTO will know that if you use this software, the results are in a format that is acceptable to it, then a lot of the work can be moved from a centralized location towards the people sending in the applications. That is basically it. MR. LEHMAN: I take it your assumption is that if we moved this prefiltering out to the applicant site, that they would be able to do it sufficiently rapidly that they would be able to get it into us quickly. Because, of course, one of the difficulties is if the invention should end up being disclosed in some way, and keep in mind a lot of this research is being done in governmental institutions and some that aren't filing patent applications and maybe publishing the work, that if they are spending a lot of time doing this presearch themselves before they get to us, they would take a risk of having the invention ultimately go into the public domain, certainly in other countries, if not here. So, would this be able to be done quickly enough, do you think, so that people would not lose an advantage of giving everything they have to the PTO right away? MR. MINTZ: I believe so. 
Because the problem that each applicant faces is small relative to the problem that the U.S. PTO faces. Because you have to analyze all the sequences and the applicants have only to analyze their specific sequences. I think this can be a pretty quick process. Most of the sites that I am aware of can easily handle just things in house without any problems. They have the capabilities, they have the hardware, they have the software. But they have the building blocks of the software. There needs to be a definition of what exactly should be done with the sequence before sending it into the U.S. PTO. MR. LEHMAN: Does anybody else have any questions? (No response.) MR. LEHMAN: If not, thank you very much. Did you come all the way from Israel for this? MR. MINTZ: Yes. MR. LEHMAN: Next on our list is Hollie Baker. MS. BAKER: My name is Hollie Baker. I am a senior partner at the law firm of Hale and Dorr in Boston, Massachusetts. I am here on behalf of the American Bar Association, the Intellectual Property Section, where I am Chair of the section's biotechnology committee. On behalf of the section, I want to thank the Patent Office for having these hearings. We understand how much hard work and time has gone into the hearings, particularly on the reverse side in working with the committee members who have devoted their time and effort in reading the notice and providing some comments to us and preparing our written report which we have submitted to the Patent Office. We also understand the enormous amount of time that is going to be involved in finding solutions to these issues on nucleotide sequencing. It wasn't that long ago that I sat in the Patent Office Conference Room with other patent practitioners and with patent examiners where we were first exploring the issue of submitting nucleotide sequences at the first instance. And at my former law firm, we were one of the test beds for evaluating the patent program. Technology has moved a long way since that time. 
Because at the time we were filing patent applications primarily devoted to sequencing of proteins, or perhaps particular sequences with some modifications to make some second generation drugs, vaccines, particular probes for diagnostic uses, and now we are sequencing genomes. So, just as the technology has moved forward in biotechnology, I think the computer industry has moved forward significantly. That goes into the first comment that we would like to make. The Patent Office listed three issues that they wanted to have addressed, and I would like to address the last issue first, which does move into the first issue. Our section recommends that the Patent Office establish a panel of experts on computer searching to work with the Patent Office to evaluate the Patent Office computer searching and analysis of computer results and to make recommendations for improving the quality and lowering the cost of the searches. With the establishment of this panel, the section believes that it can make the appropriate recommendations for improving the Patent Office search capabilities. Now, since our committee is not an expert on computers, most of us are patent practitioners and are primarily involved with the patent application evaluation and examination process, we consulted after the April 3rd hearing with a computer company, actually it's a consulting company, who had worked with the SEC, for example, in developing their Edgar program, which you may be familiar with. This was only to make sure that we were making appropriate recommendations to the Patent Office in our recommendation that they establish a panel of experts to put together something new for the Patent Office searching capabilities. Everything that has been mentioned by the previous two speakers was included in a report that they prepared for us, and I have encouraged this company to submit the report as written remarks to the panel. I believe they will be doing that today. 
One of the things that was not addressed by the previous two speakers, which was a question by the panel, was: How do you analyze this data once you have prescreened it and once you have prefiltered it? How do you put this together? This company is not a gene sequencing company. What they were rather appalled by is the amount of data that had to be analyzed in these printouts. They really encourage putting together this information as visual data, as an example, so that perhaps in a graph chart form it can be analyzed in a very particular way. Apparently this is doable. I don't have the expertise to know how this is done. But this is something that I had not thought of, but certainly is currently capable of being done with the data that is being generated. In addressing the second issue on underwriting the cost, the section does oppose the imposition of higher fees for patent applications containing a large number of nucleotide and amino acid sequences or long nucleotide or amino acid sequences. The committee is very aware, the section is very aware as well, that the Patent Office is required under 35 United States Code, Section 41(d), to recover the fees on an estimated average cost of its processing services and materials associated with the patent applications. While we now know that this hearing is directed to address the problem of a particularly large number of sequences or long sequences, our section based on that makes some recommendations under the nucleotide sequence rules. Current rules were established that nucleotide submissions had to be made if a biotechnology patent application contained as few as 10 nucleotides or four amino acids, which are 12 nucleotides. This requirement applies equally to known prior art sequences, sequences used as probes, sequences used in making constructs, even random sequences that have absolutely no other identifying capability other than their nucleotide sequences. 
It also, of course, applies to the claimed invention. And although we are not sure, the Patent Office is obviously more aware of this than we are, we believe it may be contributing to an unwieldy database. We think amendment of the nucleotide sequence rules may eliminate some of the clutter that is in this current database. The section recommends amending the rules governing the submissions to eliminate the requirement that all nucleotides, amino acid sequences and a fragments disclose be submitted and require only those sequences that are claimed to be submitted. That obviously doesn't get you around the problem that you are having now because apparently all these sequences are being claimed, but it may help the database. Alternatively, or perhaps additionally, the section recommends amending these rules so that sequence submission will identify new sequences, whether or not those sequences are being claimed, and identify prior art sequences. If new sequences are then added to claims during prosecution, the applicant could identify the sequence identification numbers for those sequences and the Examiner could then search those sequences. This concludes my remarks. Do you have questions? MR. LEHMAN: When you said that the rules should be amended to provide that the applicant provide prior art sequences, are you in a sense suggesting that the applicant provide the prior art that we would use to examine the application, that we wouldn't need to go beyond that? MS. BAKER: Well, most prior art sequences are read already in the public domain somewhere. What we were recommending is that we identify which ones are prior art and perhaps can give a reference as to where those could be located rather than submitting on a computer sequence submission the prior art sequences. Because that is already somewhere in the public domain. MR. LEHMAN: Thank you very much. MS. BAKER: Thank you. MR. LEHMAN: Is Gary Grace here? MR. PACE: Pace. MR. LEHMAN: Pace. Sorry. Gary Pace. MR. 
PACE: My name is Gary Pace. I am with the Agricultural Biotechnology Research Unit of Ciba Geigy Corporation, which is located in Research Triangle Park, North Carolina. I come here today as a scientist with 13 years of corporate research experience and 4 years of experience as a patent liaison, including considerable experience with sequence searching. This combination of experience has given me, I think, a reasonable perspective on the problem facing the PTO regarding issues relating to patent protection for nucleic acid sequences. The PTO is faced with the problem of how to determine obviousness for applications which disclose sequences. Based on the recent demonstrations to the public, it would appear that a comprehensive structurally based sequence search with extensive reporting of results is being done to examine these cases. The PTO has predicted that examination of sequences for obviousness would soon swamp the resources of Group 1800, and I submit would also swamp the applicant's ability to underwrite the examination. It is my opinion that this exhaustive approach is not required to adequately examine applications disclosing sequences. First, I would like to briefly address the established standards for determining obviousness and then, second, apply them to the specific subject of searching sequences. The Federal Circuit held in, In re Vaeck, that a prima facie case of obviousness was established by showing that the prior art, first, suggests making the claimed invention and, second, reveals that a skilled artisan would have a reasonable expectation of success in attaining the claimed invention. I suggest that this standard can be applied to the problem faced by the PTO with regard to sequences. How would the standard manifest itself in a search strategy? Because of the degeneracy of the genetic code, nucleic acid sequences which bear as little as 70 percent identity, can still encode the same protein. 
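[Illustration added to the transcript: the degeneracy point can be sketched in a few lines of Python. The two nine-base sequences and the partial codon table below are hypothetical, chosen only for the example; real genes encoding the same protein rarely diverge as far as this toy pair does.]

```python
# Partial codon table: only the codons used in this toy example.
# (Assumption: the two sequences below are invented for illustration.)
CODON_TABLE = {
    "CTT": "L", "TTA": "L",   # leucine
    "TCT": "S", "AGC": "S",   # serine
    "CGT": "R", "AGA": "R",   # arginine
}


def translate(dna):
    """Translate a DNA string codon by codon using the partial table above."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def percent_identity(a, b):
    """Percentage of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


seq_a = "CTTTCTCGT"  # Leu-Ser-Arg
seq_b = "TTAAGCAGA"  # Leu-Ser-Arg, written with different codons

print(translate(seq_a), translate(seq_b))
print(round(percent_identity(seq_a, seq_b), 1))
```

Both sequences translate to the same three-residue peptide even though most of their bases differ, which is why nucleotide identity alone says little about whether two sequences encode the same protein.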
The Federal Circuit held in the Deuel and Bell cases that a partial or complete protein sequence, combined with a general method of cloning, does not make obvious a specific DNA coding sequence. Therefore, by searching protein names or sequences, then identifying any DNA sequences disclosed therein, one will have ascertained those sequences which are obvious under this standard. If the applicant's disclosure recites sequences which are different, then the applicant's sequences would be nonobvious under this standard. Hence, for genes of known proteins, at most only a protein search should be sufficient to examine the application. It should be kept in mind that sequence searching algorithms will always find some similarity between the subject sequence and those present in a database. This creates what one of my colleagues calls meaningless homology. The key question, however, is what constitutes meaningful homology. I suggest that meaningful homology for the purposes of examination would be those values above which there is a reasonable expectation of success in obtaining the claimed invention. This means that there should be some rational basis, other than numerical homology, for concluding that one sequence is obvious in view of another. If one searched the databases for sequences with meaningful homology, it would mean, for example, that a larger word size could be used in the search algorithm. This would significantly speed up the search and reduce the quantity of search results. On another point, excessively long sequences often arise from genomic cloning methods. Such sequences, when disclosed, frequently identify open reading frames or ORFs. A straightforward way of searching databases during the examination would be to conduct a search on an ORF by ORF basis rather than by breaking this large sequence up into arbitrary pieces. In addition, multiple strategies creating an exhaustive search are likely not needed.
First, GenBank is available as a combined source with EMBL, which in turn has incorporated the contents of other databases. The only other database which is reported to be significantly different is GENESEQ, which specializes in sequences disclosed in published applications or issued patents. Hence, for purposes of prior art searching, only at most two databases need be searched. Second, in the case of GenBank/EMBL the database is broken up into smaller libraries by species, order or some other taxonomic unit. Searching within specific libraries is much less time consuming than searching the entire database. Third, the choice of search program must be an educated one. For example, it has been reported that one commonly used program will find similarities between sequences even where there is not one single identical residue between them. This would be the stereotypical example of meaningless homology. Without choosing carefully the parameters to use in a search, one obtains reams of output which have limited relevance to the question raised. Lastly, specialized databases are being developed. For example, a database called dbEST has been created which contains more than 200,000 expressed sequence tags from 26 different organisms. Examination of an application disclosing ESTs might be better accomplished through searching this specialized database. I would also like to reinforce some of the particular suggestions that have been discussed in the Biotechnology Industry Organization community for the last several weeks regarding sequence listings in general. First, requiring a sequence to be listed simply because it is disclosed and of a certain length leads to the unnecessary listing of linkers and primers. While these linkers and primers may form part of an enabling disclosure, they are rarely part of the claimed invention. I advocate that only those sequences to be claimed should be required to be part of a formal sequence listing.
Second, it often happens that sequences which are disclosed in a specification are already publicly available. For example, if an applicant has compared their claimed sequence to a sequence in the prior art, the sequence rules currently require the prior art sequence also to be listed. This prior art sequence need not be listed since it will not be claimed. Each of these suggestions will reduce the resources needed by the PTO and the applicant to process formal listings. Lastly, as one who has prepared and repaired many sequence listings, I would like to thank the PTO for distributing their program checker. This program has reduced the frequency of error in our submitted listings, for which we are grateful, since some of our listings have exceeded 50, sometimes reaching 100 pages in length. To conclude, guidelines and rules are required to define an obviousness standard for sequences and to ensure the conduct of informed searches. I believe that the guidance and direction provided by the specification can be turned into an effective search strategy, as is done for key word searches of the prior art. By doing so, the search can be limited to certain criteria based on an understanding of the biology of the claimed invention. How the claim is drafted should also dictate the search strategy. This approach might require specialized skills, but this is exactly the direction that Group 1800 has taken by hiring examiners with these very skills. The simplest suggestion that I can make is not to be deceived by an application which discloses sequences. The presence of a sequence, even a claimed sequence, does not necessarily mean that a sequence search is required to adequately determine whether the claimed invention is obvious or not. In many cases the use of gene names, protein names, source organisms, et cetera, can result in an effective search of the prior art for the purposes of examination.
This information can generally be found in the background section of the application. In addition, most applicants already conduct a search of sequence databases prior to filing. By permitting a submission of sequence searches along the lines of an information disclosure statement, the Examiners may thereby be presented with a useful search strategy and useful search results. The problems identified by the PTO in regard to these issues require solutions. I submit that some solutions can be achieved quickly and guidelines should be published similar to the approach taken in resolving utility issues. It should also be realized, however, that biotechnology is one of those areas where technology is outpacing the law. Therefore, it is my belief that these issues will need continual study. Consequently, I suggest that the PTO establish a working group to develop solutions to these problems and to monitor development. I understand that many offers of assistance have already been made and I wish to add mine to this growing list. This is precisely the type of cooperation needed to address these issues now and in the future. Thank you. MR. LEHMAN: Thank you. Your working group would be presumably outside people, would be industry people, that would be called upon? MR. PACE: Well, my idea was the working group would be composed of PTO personnel, applicants, scientists, inventors, bioinformatics experts, to work on the problem collaboratively and jointly. MR. LEHMAN: I am not saying that is not an excellent suggestion, but there is a potential legal problem. We also heard another recommendation for that. The Federal Advisory Committee Act limits our capacity. There is an overall public policy against having a million different advisory committees to advise government agencies. We have certain limitations sometimes on our capacity to do that. So we would have to see how that affects these suggestions. Thank you. Is there anybody else? (No response.) MR.
LEHMAN: Thank you very much. Is Michael Fannon here? MR. FANNON: Good morning. My name is Mike Fannon. I am the Director of Bioinformatics at Human Genome Sciences. As many of you may be aware, HGS is responsible for much of the backlog in terms of these large sequence patents. So I am really pleased to be here to be able to work with you this morning. I have often wondered as we put these things together just how the Patent Office was going to work with these. I am pleased to be able to offer my comments here. I don't claim to have any expertise in the patent specific issues. However, we have developed substantial expertise in searching and analyzing DNA sequences, and I think much of that can apply in this context. There are a number of issues associated with searching DNA sequences, and in my work with HGS we, in scaling up our sequencing capacity, really undertook this by a baptism of fire. And there are a number of tradeoffs that you can make in terms of how we actually approach the question of whether the sequence exists in the public domain and how we keep track of use of the sequences. In particular, I will have some comments about the sequence search algorithms, the methodology by which we determine whether or not a sequence is similar or related to other sequences, the databases themselves and how we choose what we search against, the organization of the results and the interpretation of the results. What we find is that you can get reams of output, as we have heard from some of the other speakers, as a result of the searching techniques. We have in our context applied various types of database and user interface techniques to help our scientists wade through those and it may very well be appropriate in this context. And then citation searches, doing some of that in an on line fashion and linking those automatically with the search results.
The issues associated with sequence search algorithms really involve a trade off, and the trade off is pretty simply stated as speed versus sensitivity. That is, the more rigorous the calculation, the longer it is going to take in terms of computational capacity. When we are looking for searches against prior art, we are really looking for in most cases near identical matches to what is known, in which case many of these techniques can be optimized to search for the near or identical matches, and in this case to reduce the number of these misleading or misinformative matches that you can get by extending these algorithms too far and having them find things that really don't exist or don't have a biological context. In many cases we found it helpful to combine the algorithms, to use the faster algorithm as a pre-screen to sort through the hundreds of thousands of sequences to create a data set that is much smaller to work with and then to use the more rigorous algorithms to analyze that subset. So, really in many cases these types of algorithms tend to be a religious issue in the bioinformatics community. But what we found is the practical solution is to use them, as appropriate, to use the faster algorithms with less sensitivity to do the high speed screening and then to analyze the result set that is the output from that. Now, the composition of the database you search against is also a very significant issue in any problem that involves large quantities of DNA sequences. Certainly we have all seen in this business the rapid growth of the databases, and, again, HGS has been a very great contributor there as well, in that the technology now enables us to discover sequences at a very high rate. 
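[Illustration added to the transcript: the pre-screen-then-refine strategy described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the HGS pipeline: the fast stage counts shared fixed-length words (k-mers), the rigorous stage is a Smith-Waterman local alignment score with a linear gap penalty, and the sequences, thresholds and scoring values are all hypothetical.]

```python
def kmers(seq, k):
    """Set of all length-k words in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def prescreen(query, database, k=4, min_shared=2):
    """Fast, less sensitive stage: keep entries sharing enough words with the query."""
    query_words = kmers(query, k)
    return [name for name, seq in database.items()
            if len(query_words & kmers(seq, k)) >= min_shared]


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Rigorous stage: best local alignment score (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def two_stage_search(query, database, k=4, min_shared=2):
    """Run the slow scorer only on the entries that survive the fast screen."""
    candidates = prescreen(query, database, k, min_shared)
    scored = [(name, smith_waterman_score(query, database[name]))
              for name in candidates]
    return sorted(scored, key=lambda item: -item[1])


# Hypothetical toy data.
query = "ACGTACGTGA"
database = {
    "near_identical": "TTACGTACGTGATT",
    "unrelated": "GGGGCCCCGGGG",
    "partial": "ACGTTTTTTT",
}
print(two_stage_search(query, database))
```

Raising `k` or `min_shared` makes the screen faster and stricter, at the cost of missing weaker matches; that is the speed-versus-sensitivity tradeoff in miniature.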
There is also a well-known redundancy inherent in the public databases and any ability that we can put together to reduce the redundancy of the set that we use as the search set is certainly going to reduce the amount of computational effort required, as well as again helping the Examiners avoid chasing false positives, things that look like hits but really are inappropriate matches of one sort or another. Another way we could think of subsetting the database for the searches is to organize by date and by species. In the case of Human Genome Sciences, the bulk of our work is in the human domain, so we tend to analyze our data first against other human sequences and then secondarily against evolutionary relationships we may pick up with other species. The other example of organizing that comes to mind is by date. If the application has a certain submission date, if we run that application against the sequences that were known as of that date, then much of the work the Examiner does to determine which came first is in a sense already done. The matches against newly submitted sequences won't show up in the listing because they weren't in the database to begin with, and by putting the extra effort into the thinking that goes into what do I search against, we find, I think, that on the back end the Examiner time required to evaluate these matches will go down proportionately. How the results are organized is an issue that HGS faces on a regular basis as well. We have the capacity to run thousands of sequences in a week and we run these various analytical techniques. The output itself from running these searches now becomes the input to our database. That is, we use database management techniques to analyze the search results.
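[Illustration added to the transcript: subsetting the search set by date and by species, as described above, might look like the following Python sketch. The record layout, accession numbers, dates and the "on or before the filing date" cutoff are assumptions made for the example, not a statement of the legal test for prior art.]

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Entry:
    accession: str   # hypothetical accession number
    species: str
    deposited: date  # date the sequence entered the public database
    sequence: str


def search_set(entries, filing_date, species=None):
    """Keep entries known as of the filing date, optionally for one species."""
    return [
        e for e in entries
        if e.deposited <= filing_date
        and (species is None or e.species == species)
    ]


# Hypothetical records.
entries = [
    Entry("X00001", "Homo sapiens", date(1993, 5, 1), "ACGTACGT"),
    Entry("X00002", "Mus musculus", date(1994, 2, 10), "ACGTTCGT"),
    Entry("X00003", "Homo sapiens", date(1996, 1, 15), "ACGAACGT"),
]

filing = date(1995, 6, 30)
human_first = search_set(entries, filing, species="Homo sapiens")
all_species = search_set(entries, filing)
print([e.accession for e in human_first])
print([e.accession for e in all_species])
```

The entry deposited after the filing date never reaches the search at all, so it cannot appear as a hit that the Examiner then has to rule out by hand.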
I think much of the state-of-the-art prior to the large sequencing efforts at the genome centers and from companies like HGS and Incyte has been involved in really working on the sequence searching technology. What we are finding is that many of the big shops, such as us, now take the results of those analyses and put them into relational database form. Then we can create listings and create queries that allow the Examiners to really identify those things that are the potential trouble spots, that is, the ones that really are hitting high quality matches to known sequences and the ones that require further evaluation, and to bring those to the foreground very quickly through database management techniques. Of course, reviewing the search results itself is an extremely time-consuming task. So we would suggest that the Patent Office undertake an initiative to develop some tools for reviewing the database of search results that I am talking about. This is pretty much standard practice with companies like Incyte, HGS and Millennium, the ones that do the large scale sequencing: their scientists really don't on the first level interact directly with the search results, but they interact with automatically summarized and catalogued search results using database management techniques. A logical, consistent, responsive user interface to that database then would be required to enable the Examiners to make very quick decisions, and in many cases to figure out which ones obviously don't qualify, if we can very quickly go through the set and classify these into groups that identify those things that really are novel, because they don't show sufficient sequence similarity, and just to really, again, help use the computer technology to narrow down the work that the Examiner is required to perform.
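[Illustration added to the transcript: loading search results into a relational database and querying for the trouble spots can be sketched with Python's built-in sqlite3 module. The table layout, identifiers, thresholds and sample rows are hypothetical.]

```python
import sqlite3

# In-memory database holding one row per search hit (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE hits (
           query_id TEXT,
           subject_id TEXT,
           percent_identity REAL,
           aligned_length INTEGER
       )"""
)
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?, ?)",
    [
        ("SEQ1", "X00001", 99.2, 450),
        ("SEQ1", "X00002", 71.0, 120),
        ("SEQ2", "X00003", 62.5, 80),
        ("SEQ3", "X00004", 96.0, 300),
    ],
)


def strong_hits(conn, min_identity=95.0, min_length=100):
    """Hits that look like near-identical matches to known sequences."""
    cur = conn.execute(
        "SELECT query_id, subject_id, percent_identity FROM hits "
        "WHERE percent_identity >= ? AND aligned_length >= ? "
        "ORDER BY percent_identity DESC",
        (min_identity, min_length),
    )
    return cur.fetchall()


def queries_without_strong_hits(conn, min_identity=95.0, min_length=100):
    """Claimed sequences with no near-identical match in the result set."""
    cur = conn.execute(
        "SELECT DISTINCT query_id FROM hits WHERE query_id NOT IN ("
        "  SELECT query_id FROM hits "
        "  WHERE percent_identity >= ? AND aligned_length >= ?) "
        "ORDER BY query_id",
        (min_identity, min_length),
    )
    return [row[0] for row in cur.fetchall()]


print(strong_hits(conn))
print(queries_without_strong_hits(conn))
```

The first query brings likely prior-art matches to the foreground; the second lists the sequences that can be set aside quickly because nothing similar turned up.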
A very useful technique that has been explored very much in the public domain by the National Center for Biotechnology Information is to link the search results with on line databases for citations. Evaluating results requires a good deal of judgment on the part of the Examiner. What we would envision here is something similar to the BLAST searching methodology that is available in the public domain that is sponsored by the National Center for Biotechnology Information, whereby the results of the search are automatically linked then to GenBank. GenBank is cross-referenced with the MEDLINE references to, again, reduce the time, to be able to come up with an evaluation of the match in a highly integrated manner without doing a separate sort of literature search. I do believe that the technology is available to really help streamline the problem and I certainly can empathize with the problem you are facing here since we deal with large quantities of DNA sequences and it's very challenging. There is no question about it. The problem domain is very technically demanding and we really applaud your efforts in trying to get a handle on this. I think, in summary, a combination of techniques is appropriate to streamline the process and I think every stage of the game, starting from the method that we use to do the searches to what characterizes the database, to how we interact with the search results and make some conclusions, you know, can be developed into an integrated system that would substantially streamline the process as I understand it now. That is all I have. MR. LEHMAN: Thank you. Are there any other questions? (No response.) MR. LEHMAN: Thank you very much. MR. LEHMAN: Is Herb Jervis here? MR. JERVIS: Good morning. My name is Herb Jervis. I am the Associate Patent Counsel at SmithKline Beecham Corporation in Philadelphia. SmithKline Beecham spends about a billion dollars a year on research and development and has an R&D workforce of over 4000 developing new medicines.
DNA research is a major platform upon which our discovery programs are built. In addition to our important alliances with Human Genome Sciences and The Institute for Genomic Research, SB also collaborates with over 140 research institutions and companies involved wholly or partly in DNA research. Meaningful intellectual property protection is fundamental to such an investment. While other speakers, here today and in San Diego last week, have focused on some of the technical solutions to DNA sequence searching problems cited in the PTO's March 12th Notice and illustrated in the demonstration held on April 3rd, I will confine my remarks to some prosecution-based strategies for easing the searching burden outlined by the PTO. While the temptation may be present in new areas of technology to search for "quick fixes" to prosecution problems by implementing special rules, such actions I think generally should be resisted because of the isolating effect special regulations and legislation have on the future development of the law with respect to a particular area of technology. The value of an exceedingly large body of precedent (even if it sometimes appears to be irreconcilable) is that it has resulted from the application of the law to particular facts and has yielded a rich tapestry of illustrations illuminated by accompanying reasoning. It is against this background that the next case can properly be viewed. It appears from my reading of cases such as Amgen v. Chugai and Fiers v. Sugano that the Court of Appeals for the Federal Circuit is trying very hard not to treat biotechnological inventions very differently. I think it is fair to characterize the Federal Circuit's view as being that, because DNA is a chemical polymer, albeit a complex one, the rules of chemical practice should apply.
Having made the case against special treatment, I will qualify my remarks and say when viewed rationally and carefully, there should be provisions to assist both the PTO and the applicant in the prosecution of patent applications in certain areas of technology. Certainly, drawing requirements and the rules regarding their representations in mechanical cases and rules concerning the representation of color when it is a distinguishing feature in plant patents illustrate the effectiveness of such an approach. I have been practicing biotech patent law since the early 1980s and it is from that, dare I say historical, perspective that I wish to cast some more remarks today. It seems just like yesterday that a young patent examiner named Jim Martinell (who by the way used to have his name spelled out in genetic code words on his office door) and I struggled with the appropriate language for a declaration to support a deposited microorganism, which would be consistent with the requirements of 35 U.S.C. 112 as we then understood them. I thank you for indulging me in that bit of reminiscence, but I do so to make a point, and that is that cooperation in this area is key. The PTO, and Group 1800 in particular, have often cooperated in an open and effective manner with the patent bar in respect of many of these problems surrounding biotech inventions, including the deposit declaration practice referred to and the rules governing the representation of nucleic acid sequences. The parties even worked together fostering a legislative initiative (103(b)) thought to be necessary to overcome perceived judicial impediments to biotech process patents. I am not sure the world will forgive us for that. Biotechnology even has its own section of the MPEP (Chapter 2400). These hearings represent the most recent continuation of that important interaction. Before making a comment or two about prosecution based strategies, let me make one comment about the technical solutions we have heard today.
I think historically extremely effective interactions relating to biotech prosecution were fostered by the so-called "Group 1800 roundtables" sponsored by the PTO and organizations such as AIPLA, BIO and IPO. It seems to me that the technical aspects of the searching problem would be an ideal topic to be addressed in such a forum. The success of specialized approaches, notwithstanding, it would be prudent to examine current prosecution practice to see if some solutions are evident before additional specialized examination regulations are contemplated. Having said that, let me start by recommending against an approach that has been proffered to simplify searching, and that is a strong reliance on a restriction requirement as a limiting approach. Such an approach I think is fundamentally flawed. First of all, the problems associated with searching sequences in the context of, for example, the applicants' reliance on functional language, as pointed out in the PTO Notice, does not change if there is a single sequence or a hundred sequences. Secondly, the over zealous restriction practice is already a subject of much concern within the patent bar. I think further encouragement to utilize restriction to solve these sequencing searching problems would make the problem worse. In a post-GATT world US inventors are severely disadvantaged by having to file and prosecute large numbers of divisionals, not only in terms of fees, but in terms of ultimate loss of patent term. I realize some would argue that we are now on the same footing as the rest of the world, except that the rest of the world operates on a much less restrictive unity of invention basis. Turning to a more positive suggestion, I would like to have the following proposal considered; a two-part examination where non-art issues and art issues are treated separately. I realize that 37 C.F.R. 
1.105 and the MPEP Section 707 suggest that office actions be complete and not piecemeal, but I would submit that an office action based on an inneffective search is hardly complete. If such an approach were to be adopted, issues of utility, enablement of a particular claim scope, adequacy of written description, definiteness of claim language and the like could be resolved before undertaking the search, thus providing a more focused search inquiry. I realize that such a suggestion may have the potential for protracting prosecution, but again in a post-GATT world there would seem to be sufficient incentive to resolve such issues quickly. An efficient 112 dialog between the Examiner and applicant should result in a well-drafted and supported claim of definite scope which will lend themselves to more manageable prior art searching. At the time of the art-based examination, I would envision even greater cooperation between the applicant and the Examiner. It is hard for me to imagine that an applicant would go to the time and expense of filling on a sequence without having done some searching. Nominally, the results of that search appear as one or more references in an Information Disclosure Statement. If it were helpful to the Examining Corps, why not provide the search itself? I stress if it was helpful because, from my perspective, the most disappointing fact to come out of these discussions so far is that there is not a uniform searching procedure employed. It would seem to me critical that the PTO, and the patent bar, if appropriate, establish some uniform searching guidelines. If that were so, then some of the search burden may, in fact, be shifted to the applicant. For example, in the two-part process I have outlined above, should the applicant wish to speed up the prosecution of the search, maybe the search then could be provided by the applicant employing the PTO established protocol. 
The rules currently in place for making applications special or advancing prosecution would appear to provide a framework for such a process. Finally, it hasn't escaped my notice that my two-part examination proposal may eliminate one of the Group 1800 Examiners' favorite indoor sports, known in the profession as the old 112/103 squeeze. This feat of intellectual legerdemain occurs when an Examiner seeking to reduce the claim scope alleges that the art to which the invention relates is highly unpredictable, thus the claims must be limited to working examples. Then in the next paragraph when the Examiner is suggesting that the invention is obviously in view of two tenuously related pieces of prior art, all of a sudden the art to which the invention relates becomes predictable, giving rise to a reasonable likelihood of success. I won't say this two-part examination process will remove this habit entirely, but separating the grounds of rejection temporally may reduce the temptation. Thank you for the opportunity of speaking today. I remain ready to assist in resolution of this problem. MR. LEHMAN: If I understand you right, you are suggesting that by having a two-part examination that we would, first, prior in some cases to doing the search, we would deal with the Section 112 and maybe the Section 101 issues before getting to the search to determine obviousness. Is that correct? MR. JERVIS: Yes. I understood one of the problems is in some of the claim language where the claim recites that all sequences hybridizable therewith, and there is no definition of what that is. It makes a search almost impossible if there is even a single sequence in the application, let alone 100. MR. LEHMAN: In the software area we recently issued guidelines or we sort of moved in the opposite direction because there was a concern in the industry that we were in effect trying to get rid of work sometimes by, first, tormenting some of our applicants on 112 and 101 issues. 
And so we sort of revised our procedures in the form of guidelines -- Nancy can correct me if I am wrong about this -- to more integrate the process. So this would be sort of moving in the other direction. Would it? MS. LINCK: We certainly have taken the position that a search should be done prior to the entry of any Section 101 rejections. However, in working with the examining core, it really turns out that the 112 issues do need to be addressed. Although they are not to be entered as a rejection, they need to be addressed prior to conducting this search. So perhaps maybe the order leading into the application and examining the application would be a compromise between actually having two discreet steps. I don't know. We hadn't thought about this with respect to the biotech area. MR. JERVIS: If the 112 issues were resolved first, it may resolve the problem of doing the adequate search in terms of timing and still maintaining a priority date as well because that could be postponed to sometime later. MS. LINCK: I think we found with the software guidelines that it will be required that the 112 issues be looked into early on. So, that would be consistent with what Mr. Jervis is suggesting. MR. JERVIS: Thank you. MR. KUNIN: You indicated that you opposed restriction requirements. You also seemed to try to draw a line between U.S. restriction practice and, as I believe you characterized it, a more liberal international standard of unity of invention. But isn't it the case that in applying the international unity of invention standard in places like Japan and the European patent office and elsewhere, that for these types of cases they are holding that the sequences lack the unity of invention? MR. JERVIS: Well, my personal experience, I haven't actually prosecuted the mega types of sequence cases in the European patent office yet. I have never received in the European patent office a 218 waive restriction requirement, I must say, or unit invention requirement. 
I have gotten those from the U.S. Patent Office. What I am saying is that I think it's the application of those rules that needs to be tightened a bit. But, in general, we file many less divisional applications in Europe than we do in the U.S. as a general rule in biotechnology, not just sequence cases. MR. LEHMAN: But in this particular category of cases, I think it's our understanding that you would have to file 218 individual patent applications in the European Patent Office. MR. KUNIN: I believe in the European Patent Office they are making you take them one sequence at a time and saying that each sequence is an independent compound. MR. JERVIS: Not in my experience they haven't so far. MR. LEHMAN: Any other questions? (No response.) MR. LEHMAN: Thank you very much. MR. SHPUNTOFF: Thank you. My name is Albert Shpuntoff. That is S-H-P-U-N-T-O-F-F, being the only name so far that isn't easily spellable from pronunciation. I work as a consultant very closely with MasPar. I was formerly the education manager and leader of the post implementation bioinformatics professional services for MasPar. Having listened to the comments here and in La Jolla, I found a lot of commonality with some of the things we have done with customers and felt a need perhaps to avoid having it seem like we were blindsiding the people we have worked with at the Patent Office, as the incumbent in the massively parallel processing area, with new applications, as one of my colleagues presented at La Jolla. So, first I would like to respond to a couple of things that were said earlier today. The wide use of a program does not immediately make it a standard for the industry. There is no standards organization at this point that is saying what hardware or software is standard for this industry. For a number of competitors to claim a standard at this point is aggressive marketing, which should be seen as such. 
I am an admirer of aggressive marketing, but I think one should avoid claiming standards until they truly are standards by an appropriate accrediting organization or industry panel. I have been involved in implementing a number of automated sequence processing schemes at a number of MasPar's customers and have been active in advising others who are at the point of starting up an operation. We have seen quite a variety of strategies for dealing with the overwhelming number of sequences that are to be processed in the discovery phase, as well as for deciding whether there is any point in pursuing it to the Patent Office. So I have seen many of the issues from the other side. No matter how much processing gets increased, the amount of proposed discovery increases at least as fast. So while we propose ways of working smarter and working harder to obtain results quicker, I think it's fair to say that the workload does tend to increase as the computing power increases. And as the availability of attractive computing to do bioinformatics in the industry has increased, so certainly has the number of novel results to be presented to the Patent Office. We have customers who have chosen to do annotation searches on a relational database version of the standard databases to produce a custom version of the database to be searched. Because it is possible quickly on the MasPar to produce a searchable database for Smith-Waterman searching, we have a number of customers who have chosen to use annotation to restrict the search space and thereby to do more efficient searches, making better use of the parallel time available. We have a number of sites who have chosen to preprocess databases so as to eliminate potential matches that you have to look at over and over again with different annotations. There have been a number of mentions of standard software from NCBI that people are recommending be used. 
One of the products available from them is a product called nrdb, which allows you to take multiple databases and remove identical sequences from additional copies of the database. There has been a lot of comment at these hearings about the issue of meaningless homology, false hits, large amounts of data coming out of the individual search runs on machines. We would like to support the idea that many of the false matches are involved with such things as repetitive DNA elements, which are parts of sequences that are to be submitted, such things as Alu repeats. There are databases of just the Alu repeats which might well be an additional very useful augmentation to the search to help to identify in perhaps an automated fashion what are the meaningless homology hits during the search process. In setting up high throughput screening for several pharmaceutical companies this has been sufficiently important to make the operation practical that we have made it the second step in an automated processing. After identifying that the sequence being looked at is of sufficient quality to even proceed with analysis, the next step has been to mask out repetitive DNA and Alu sequences with the idea of providing more high quality hits and reducing the number of false hits that have to be followed through by an Examiner. So, this has been an important part. I believe, in the La Jolla hearings, the gentleman from Incyte spoke about this also being part of their automated sequence processing. To the degree that the Patent Examiners are spending too much time following hits based on repetitive sequences not key to the art, but which show up as statistical artifacts, it is possible to develop a process which involves masking those prior to running searches. MR. LEHMAN: Does that have to be done like on an individual basis, each time you have to sort of program? You call it annotation. 
Do you have to annotate the program for the particular search that you are doing so that the Examiner would have to become proficient in that search technology or is that something you could standardize? MR. SHPUNTOFF: Where we have done this and had it contribute, we have used a database which is available at NCBI called Repbase, which is a database of repetitive DNA elements. It is possible to run a relatively simple search. Even FASTA or BLAST are sufficient to identify these repetitive sequences. You don't particularly care to have the most sensitive search for things that are going to be masked out. But even if it's just to run an additional search of these to give the Examiner a general indication of what in this sequence is in the Alu base or repetitive DNA base already, it would aid in not following with additional searches those things which are relatively unlikely to be of significance. In the case that we have automated we actually make this a formal first step and some small percentage of the sequences being searched are almost immediately ruled out as being completely Alus. But for the most part this allows for the output of the first round of searches to be of higher significance by eliminating the most common matches. Another focus of the work we have been doing has been to post process the outputs of MPsrch. As we have heard, there is a lot of interest in what can be done in terms of post processing. Frequently our customers have been taking the outputs and putting them into either relational databases for maintaining sequence databases or feeding them back into lab management systems based on standard database systems. 
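The repeat-masking step described in this testimony can be illustrated with a minimal sketch. It is not the witness's pipeline and does not use Repbase, FASTA, or BLAST: it assumes a small in-memory list of repeat elements and masks any stretch of the query that shares an exact word of length `k` with that list. The function names, the word length, and the choice of `N` as the mask character are all assumptions made for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k words in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mask_repeats(query, repeat_library, k=8, mask_char="N"):
    """Mask positions of query covered by any k-word found in repeat_library.

    Hypothetical helper: exact word matching stands in for a real
    similarity search against a repeat database.
    Returns (masked_sequence, fraction_of_positions_masked).
    """
    query = query.upper()
    repeat_words = set()
    for rep in repeat_library:
        repeat_words |= kmers(rep.upper(), k)

    masked = [False] * len(query)
    for i in range(len(query) - k + 1):
        if query[i:i + k] in repeat_words:
            for j in range(i, i + k):
                masked[j] = True

    out = "".join(mask_char if m else c for c, m in zip(query, masked))
    frac = sum(masked) / len(query) if query else 0.0
    return out, frac


if __name__ == "__main__":
    repeat = "GGCCGGGCGCGGTGGC"  # made-up repeat element, for illustration only
    masked, frac = mask_repeats("ATATAT" + repeat + "CACACA", [repeat])
    print(masked, round(frac, 2))
```

A query that comes back almost entirely masked corresponds to the sequences the testimony says are ruled out early, and the unmasked remainder is what would go on to the more sensitive search.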
We are able to parse the reports that are produced using the statistics that are an output of the Smith-Waterman technique to provide keys to what further searches might be done, and in a number of the lab settings have been able to choose, based on whether or not we have high matches of a particular sort early during the process, to automatically generate additional searches which are appropriate based on the kind of match that occurred at the beginning. So, in some settings this has led to the equivalent of a family of paradigm searches which Examiners might use through something like the existing graphic interface to deal with sort of standard searches based on particular paradigms where requests fall within particular sets of parameters. I think I will stop with that. MR. LEHMAN: Thank you. Steve. MR. KUNIN: Could you comment on the suggestion that was made in San Diego by the representative from MasPar with respect to the clustering technique? There was an indication that perhaps a number of the sequences had a fairly high degree of similarity so that they could be clustered together and perhaps treated as a batch. Do you have any additional comments along those lines? MR. SHPUNTOFF: I think that what John Burke was talking about is very current research that is being used throughout the community. There will be a number of conferences in which various clustering techniques will be talked about in the very near future with the idea of improving the way with which we do searches. The databases at present contain much redundancy, many attempts to sequence things, which have resulted in what should be identified as a cluster and for which there could be a consensus gene representation generated through techniques such as clustering and multiple sequence alignment. 
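As a rough illustration of the clustering idea raised in this exchange, the sketch below groups sequences greedily by the fraction of fixed-length words they share with a cluster's first member. It is a toy under stated assumptions, not the software discussed at the hearing: the function names, the word length `k`, and the similarity threshold are hypothetical choices, and real tools use more careful statistics and handle reverse complements.

```python
def words(seq, k):
    """Return the set of length-k words in seq (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b):
    """Shared-word fraction relative to the smaller word set."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def greedy_cluster(seqs, k=6, threshold=0.8):
    """Assign each sequence to the first cluster whose representative
    (its first member) shares at least `threshold` of its words;
    otherwise start a new cluster. Returns lists of input indices."""
    clusters = []  # each entry: (representative word set, member indices)
    for idx, s in enumerate(seqs):
        w = words(s, k)
        for rep_words, members in clusters:
            if similarity(w, rep_words) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append((w, [idx]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    seqs = [
        "ACGTACGTTAGCTAGGCTA",
        "CGTACGTTAGCTAGG",      # fragment of the first sequence
        "TTTTTTGGGGGGCCCCCC",   # unrelated
    ]
    print(greedy_cluster(seqs))
```

Each resulting cluster could then be searched once through a representative or consensus sequence rather than once per member, which is the batching Mr. Kunin asks about.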
The database that was mentioned earlier today, dbEST, is an attempt to try to provide a cleaner database for expressed sequence tags with the attempt being to do the sort of clustering analysis and multiple sequence alignment prior to presenting the database to be searched. Whether the processing is done within the Patent Office to maintain an indexed and smaller database, whether the clustering technique is used to help understand the commonality in sequences being provided by an applicant, we feel it's a useful technique which has been parallelized for the machine and one which should be integrated into the processes. The discussion earlier about word-based search and the possibility of using longer word size in doing searches that would be a little bit faster would be relevant to the particular software he was discussing, d2_cluster, which was developed at the University of Houston and of which he is one of the authors. MR. LEHMAN: Thank you very much. Has Michael Langan arrived yet? (No response.) MR. LEHMAN: If not, that concludes the witnesses who I have signed up. Has anyone else signed or is there anyone else who would like to testify at this time? (No response.) MR. LEHMAN: If not, that concludes our hearing. Let me mention that the written comments on the notice of hearings and request for public comment must be submitted by today, I guess, by the end of business today. And a transcript of this hearing will be made available as soon as we can make it available, as well as all the written comments that we have received, and that will be available for inspection and review on or about May 13, 1996 in Room 520 of Crystal Park 1. That is at 2011 Crystal Drive. And it will be available also on the Internet at ftp.uspto.gov. Ftp.uspto.gov. All written comments and oral comments made here today will be taken into consideration before we implement any policy on this issue. 
We can't promise though that any comments received after today will be taken into consideration. I would like to remind those present today that the next public hearing will be held on May 2nd from 9:00 a.m. to 5:00 p.m. in this room on issues related to patent protection for therapeutic and diagnostic methods. And notice for comments and public hearing on that issue was published in the March 13, 1996 Federal Register, 61 Federal Register 10320 and also in the Official Gazette, 1185 OG 64. That concludes the hearing today. I would like to thank all the witnesses that came for their help and all the interested parties that came to hear what they had to say. Thank you. (Whereupon, at 10:40 a.m. the proceedings adjourned.)