Department of Commerce
Patent and Trademark Office
37 CFR Part 1
[Docket No: 960828235-6235-01]
RIN: 0651-AA88

Changes Implementing Nucleotide and/or Amino Acid Sequence Listings

Agency: Patent and Trademark Office, Commerce.

Action: Notice of Proposed Rulemaking and Request for Comments.

Summary: The Patent and Trademark Office (PTO) is proposing to amend the rules for submitting nucleic acid or amino acid sequences in computer readable form (CRF) for patent applications to simplify the requirements of the rules, to rearrange portions of the rules for better understanding and to establish consistent rules to permit a single internationally acceptable computer readable form. The Sequence Listing will be presented in an international, language neutral format using numeric identifiers rather than the current subject headings and the paper Sequence Listing will be a separately numbered section of the patent application. Sequences which contain fewer than four (4) specifically identified nucleotides or amino acids will no longer be required to be submitted in computer readable form.

Date: Written comments must be received by December 3, 1996.

Addresses: Address written comments to: Box Comments - Patents, Assistant Commissioner for Patents, Washington, DC 20231, Attention: Esther M. Kepplinger or by Fax to (703) 305-3601 to her attention. Comments may be sent by mail message over the Internet addressed to seqrule@uspto.gov. The written comments will be available for public inspection in Suite 520, Crystal Park One, 2011 Crystal Drive, Arlington, Virginia.

For Further Information Contact: Esther M. Kepplinger, by telephone at (703) 308-2339 or by mail addressed to: Box Comments - Patents, Assistant Commissioner for Patents, Washington, DC 20231 marked to her attention or by Fax to (703) 305-3601 or by electronic mail at ekepplin@uspto.gov.
Supplemental Information: The existing sequence rules (37 CFR 1.821-1.825) provide a standardized format for the description of nucleotide and amino acid sequence data in patent applications and require the submission of such sequences in computer readable form (CRF). The existing sequence rules have provided the following benefits to the PTO: (1) improved search capabilities; (2) improved interference detection; (3) more efficient examination; (4) cost savings for the input of the sequence data; (5) more efficient and accurate printing of sequences in patents; (6) exchange of the sequence data with other patent offices electronically and (7) improved public access to the sequences electronically. In an effort to streamline and reduce the procedural requirements of the existing rules and to respond to the needs of our customers while establishing an internationally acceptable standard, the PTO proposes to modify the current rules requiring the submission of computer readable forms for nucleotide and amino acid sequences. To decrease the burden on applicants who file applications containing nucleotide and amino acid sequence information under the Patent Cooperation Treaty (PCT), the PTO entered into discussions at the PCT Meeting of International Authorities (MIA) in November 1994 on changing the applicable rules for submission and transfer of Sequence Listings. Under the current PCT rules, each International Searching Authority and national Office may set the standard for submission of the paper and electronic Sequence Listing information. This may impose a burden on applicants of providing several different formats of Sequence Listings in different languages during the international and national phases of the PCT procedure. Under the current PCT practice, the applicant serves as the data repository for requests during each stage of the PCT practice for new electronic copies of the Sequence Listings. 
Under national practice, a Sequence Listing may be required to be translated into the national language at considerable cost and posing the danger that the data could be inadvertently altered. At the November 1994 MIA to address these problems, rule changes were proposed to require a language neutral Sequence Listing submission which would suffice for PCT and national stage sequence information processing. Initial Trilateral meetings and correspondence suggest that such a sequence submission would be acceptable under European Patent Office (EPO) and Japanese Patent Office (JPO) procedures, thus further lessening the burden on applicants. These sequence rules are proposed to be revised in concert with World Intellectual Property Organization (WIPO) International Standards ST.23 and ST.24 for the paper and electronic submission of sequence information in patent applications, as well as PCT requirements. This should result in an applicant having to produce a single Sequence Listing that would satisfy the filing requirements in all countries, as well as permitting an applicant to submit only a single electronic Sequence Listing in PCT applications. In an effort to profit from the experiences of the nucleotide database information providers which pioneered the electronic submission of sequence information, the PTO discussed with them the possible simplification of the PTO sequence submission rules. In response to their advice (which confirmed the PTO experience), the number of mandatory data elements is proposed to be reduced. 
Thus, the proposed rule changes include: (1) use of numeric identifiers to replace the language subject headings within the submission; (2) elimination of unnecessary and confusing data elements; (3) movement of the paper Sequence Listing to the end of the application as a section with separately numbered pages; (4) modification of 37 CFR 1.77 to include the paper Sequence Listing as a part of the specification and to provide a place for the paper Sequence Listing in the printed patent; (5) elimination of the requirement to provide a submission for sequences with fewer than four specifically defined nucleotides or amino acids; (6) use of lower-case one-letter codes for nucleotide bases; (7) rearrangement of portions of the rules to improve their context; and (8) clarification and simplification of the rules to aid in understanding of the requirements that they set forth.

Request For Comments: The PTO is particularly interested in receiving comments on three queries. Currently sequences containing D-amino acids need not be provided in the "Sequence Listing", but the PTO has accepted voluntary submissions of sequences containing D-amino acids. The commercially available sequence searching software used to search prior art databases is not capable of discerning D-amino acids since they do not have distinct designators. It is for this reason that the rules do not require a computer readable form for the disclosure of sequences which contain D-amino acids. Those seeking to volunteer the information in accordance with these rules might be seeking assurance that a machine search of the closest prior art will be conducted by the PTO or they consider the information useful and wish it to be in the database. If the PTO does not accept voluntary submissions, that would exclude information from the databases that at least some applicants believe to be valuable information.
The potential conflict created by accepting these D-amino acid-containing sequences is that the published database will contain sequences with D-amino acids and those using the published database may be operating on the assumption that it does not, given the indication in 1.821(a)(2) that D-amino acid-containing sequences are not intended to be included. For this reason, there may be an advantage to having the D-amino acids indicated by Xaa to alert the user that the Feature section must be consulted. A disadvantage of voluntary submissions is that they will result in the generation of a database which is incomplete and cannot be relied upon to provide a complete search of the U.S. patent literature including sequences containing D-amino acids. The PTO seeks comments on the following query: (1) Should the PTO accept voluntary submissions of computer readable forms and Sequence Listings where a D-amino acid is contained in the sequence? If such voluntary submissions are accepted, should there be a restriction on the choice of identifying a D-amino acid by an Xaa or by its L-amino acid counterpart abbreviation? Section 1.821(c) will continue to require that all sequence information contained in a disclosure, including in the specification, drawings or claims, be presented in the Sequence Listing in accordance with 1.821 - 1.825. This provision does not discriminate between prior art sequences and "new" sequences. The PTO has received comments in the past and is seeking additional comments on this issue. The suggestion has been made that sequences which are prior art, and/or are contained in a database at the time of filing, need not be provided to the PTO in computer readable form since the sequence information is obtainable by other means. 
Responsive to these public comments, the PTO is considering amending the rules to permit omission of some sequences from the Sequence Listing if these sequences are admitted prior art to applicant and are in a publicly available, electronic, sequence database and the database accession number is supplied. The suggestion to exclude prior art sequences was made when 1.821 - 1.825 were originally adopted. 55 FR 18230, 18237 (1990). The final rules, however, required the submission of all sequence information in computer readable form. The reasons for that decision include: 1) the assessment of whether a particular sequence falls within the requirements of the current rules is simple; 2) the general public is assured that all patents which contain any sequence information contain all of the sequence information in the Sequence Listing and all sequences are available in a computer accessible form; 3) as a publication, the contextual association of new and old information is potentially unique to the patent and very valuable to anyone assessing the state of the art at the time of a patented invention, and thus are desirable to be present in electronic form in association with that patent; and 4) these rules do not require any information to be disclosed in the form of a sequence, but rather require a particular format whenever information is presented in the form of a sequence. These reasons continue to be relevant. The PTO is concerned about how such a provision would be drafted without creating difficult questions. A provision which excludes sequences whenever a sequence is prior art and has previously been included in a publicly available, electronic, sequence database appears to be straightforward; however, many technical and legal issues would result. What constitutes a publicly available, electronic, sequence database? Would the USPTO and the other patent offices which have similar rules be required to produce a list of internationally accepted databases? 
What would be the criteria for such acceptance? An additional issue would exist involving electronic records maintenance: is there any assurance that once information is contained in a database that it will be retained and available indefinitely without alteration? Changes to the information in nucleic acid sequence databases resulting from the discovery of sequencing errors are well-known. Does the mere existence of the sequence information in such a record constitute reasonable means of retrieval? Would not one need some text basis or other identifier to retrieve the information? Concerns have been voiced that the redundancy of including old sequences in the PTO database creates electronic searching problems, such as increased cost and reduced speed. Upon investigation, it has been found that requiring all disclosed sequences to be included in the Sequence Listing does not cause search processing problems at the PTO or incur increased costs. The PTO seeks comments on the following query: (2) Should the provisions of 37 CFR 1.821(c) be altered to exclude some prior art sequences from inclusion in the Sequence Listing even though they are presented in a patent application disclosure as sequences? Should the reference to an accession number of an admitted prior art sequence in a publicly available, electronic, sequence database suffice and exclude that sequence from the requirements of the sequence rules? At the November 1994 MIA, it was proposed that the Sequence Listings submitted in an international application filed under the PCT would no longer be published on paper. It was suggested that the Sequence Listings be published electronically and be available in the electronic form from several sequence repositories throughout the world. These repositories would have the Sequence Listings available in electronic form at the time of publication of the PCT pamphlet. 
The PTO seeks comments on the following query: (3) Should Sequence Listings filed in an international application filed under the PCT be published only electronically and made available for retrieval electronically by an accession number from several sequence repositories? Written comments will be available for public inspection and will be available on the Internet (address: www.uspto.gov). Commentators should note that since their comments will be made publicly available, information that is not desired to be made public, such as the address and phone number of the commentator, should not be included in the comments. A public hearing will not be conducted.

Discussion of Specific Rules

Section 1.77 is proposed to be amended by revising paragraph (g), which would provide for a reference to a Sequence Listing Annex, if any exists. In the application as filed, on a separate page immediately before the claims, reference would be made to a Sequence Listing Annex and the Sequence Listing would be provided as a separately numbered section or Annex to the application. In a printed patent the Sequence Listing would appear immediately before the claims. Section 1.77 is proposed to be amended to redesignate existing paragraphs (g) - (j) as paragraphs (h) - (k) and add an additional paragraph (l) Sequence Listing Annex. In the application as filed, the Sequence Listing would be provided by applicants as a separately numbered section or Annex of the application. The pages of the Sequence Listing Annex should be numbered independently from the specification using sequential integers preceded by "A" to identify them as a part of the Annex and to prevent any confusion which might arise from using numbers already used in the specification. In a printed patent the Sequence Listing would be printed immediately before the claims. In cases where the Sequence Listing is voluminous, the files are difficult to handle.
This change would permit easier storage of very large Sequence Listings apart from the main part of the application during pendency. The presentation of the Sequence Listing as a separate Annex would also facilitate compliance with PCT requirements and other national patent office rules. Sections 1.821(a)(1) and (2) are proposed to be amended by referring to sections in World Intellectual Property Organization (WIPO) Handbook on Industrial Property Information and Documentation, Standard ST.23, paragraphs 8 through 12, April 1994, herein incorporated by reference, rather than to paragraphs in 1.822. The WIPO Standard ST. 23 (April 1994) is consistent with 1.822 except for certain corrections which are noted herein and the requirement of the use of the lower case for the one-letter code for nucleotide bases. The proposed rule states that the incorporation has been approved. This language is required by the Federal Register. This incorporation by reference will be reviewed by the Director of the Federal Register in accordance with 5 U.S.C. 552(a) and 1 CFR part 51 before any Final Rule is adopted. Copies may be obtained from the World Intellectual Property Organization; 34 chemin des Colombettes; 1211 Geneva 20 Switzerland. Copies may be inspected at the Patent Search Room; Crystal Plaza 3, Lobby Level; 2021 South Clark Place; Arlington, VA 22202; or at the Office of the Federal Register, 800 North Capitol Street, NW, Suite 700, Washington, DC 20408. Section 1.821(a) is proposed to be amended so that sequences with fewer than four specifically defined amino acids or nucleotides would be expressly excluded from this rule. "Specifically defined" means those amino acids other than "Xaa" and those nucleotide bases other than "N" defined in accordance with WIPO Standard ST.23. This change is being proposed to reduce the burden on applicants for those sequences that contain only a minimal amount of sequence information. 
For example, if an amino acid sequence is disclosed as being entirely "Xaa" residues, the 1990 version of the sequence rules would require this sequence to be submitted in computer readable form. However, this sequence has no value as sequence information because each of the positions is represented as a "wild card." Such low-information sequences are not very useful in any sequence matching and alignment algorithm. In order to minimize the inclusion of such low-information-value sequence data in the database and to relieve the burden on applicants to submit low-information-value sequences, the Office proposes this change to the sequence rules. If applicants should wish to voluntarily submit a CRF for such sequences, they would be accepted and entered in the PTO's database. It is not necessary that any of the non-N or non-Xaa residues be adjacent to any other non-N or non-Xaa residue in order for a sequence to be subject to 1.821(a). Sections 1.821(a)(2) and 1.822(b) are proposed to be amended by changing "elsewhere in the `Sequence Listing'" to "in the Feature section." The purpose of this change is to enhance clarity of the rule. The only place in the "Sequence Listing" where additional information is permitted is in the Feature section. The current language implies that there are other acceptable portions of the "Sequence Listing" appropriate for additional information and thus is ambiguous and misleading. Section 1.821(a)(2) will continue to indicate that sequences containing D-amino acids need not comply with the provisions of 1.822 - 1.825. To date, the PTO has accepted voluntary submissions of sequences which contain D-amino acids. The sequence information has either indicated an Xaa at each occurrence of a D-amino acid or has indicated the amino acid (or imino acid) by abbreviation as if it were an L-amino acid (or imino acid) and explained the existence of the D-amino acid in the Feature section associated with that sequence. 
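The proposed exclusion in 1.821(a) for sequences with fewer than four specifically defined residues can be illustrated with a short sketch. It assumes nucleotide sequences are written in one-letter code and amino acid sequences in whitespace-separated three-letter code; the function names and the "N"/"A" type argument are hypothetical and are not part of the proposed rule. The sketch covers only the "specifically defined" count and no other requirement of 1.821(a).

```python
def count_specifically_defined(sequence, seq_type):
    """Count residues other than the wild cards "N" (nucleotide) or "Xaa" (amino acid).

    seq_type is "N" for a one-letter nucleotide sequence or "A" for a
    whitespace-separated three-letter amino acid sequence (an assumption
    of this sketch, not a requirement quoted from the rule).
    """
    if seq_type == "N":
        # Ignore spaces and position numbers; compare case-insensitively.
        residues = [ch for ch in sequence if ch.isalpha()]
        return sum(1 for ch in residues if ch.lower() != "n")
    if seq_type == "A":
        return sum(1 for token in sequence.split() if token != "Xaa")
    raise ValueError("seq_type must be 'N' or 'A'")


def meets_four_residue_threshold(sequence, seq_type):
    """True when the sequence has at least four specifically defined residues."""
    return count_specifically_defined(sequence, seq_type) >= 4


if __name__ == "__main__":
    # Entirely "Xaa": no specifically defined residues.
    print(meets_four_residue_threshold("Xaa Xaa Xaa Xaa Xaa", "A"))
    # Four defined residues, none adjacent to another: still counted.
    print(meets_four_residue_threshold("Ala Xaa Gly Xaa Ser Xaa Thr", "A"))
```

Because the count ignores position, non-adjacent defined residues are treated the same as adjacent ones, which matches the statement above that adjacency is not necessary.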
Section 1.821(c) is proposed to be amended by clarifying and establishing a language neutral format sequence listing. Specifically, the use of integer identifiers is proposed for identifying sequences. Where a sequence integer identifier is intentionally omitted, it must be noted by applicant to avoid confusion in the published document. Section 1.821(d) is proposed to be amended by changing "assigned identifier" to "integer identifier" to be consistent with the term used in 1.821(c). Section 1.821(d) is proposed to be amended by adding the phrase, "preceded by `SEQ ID NO:' ". This change is necessitated by the change to 1.821(c). Since the integer identifier in the "Sequence Listing" would be defined now as a numeral only, it is necessary that any reference to a particular sequence in the specification and claims be preceded by "SEQ ID NO:". It is not acceptable to use only a numeric identifier, such as "<200>" or "<400>" (see the Sequence Listing table, infra), in the description or the claims because one reading a patent may not reasonably be presumed to be familiar with the meanings of numeric identifiers. Section 1.821(e) is proposed to be amended by setting forth the procedure for transferring an accepted computer readable Sequence Listing from one application to a subsequently filed application. The existing rules did not adequately describe the process of transferring a computer readable Sequence Listing into a new application if an identical CRF was previously accepted by the PTO for another application. A further description of the intended procedures has been added for purposes of clarity. This section is intended to describe that if a computer readable Sequence Listing is identical to one that is error-free and already on file at the PTO, an applicant has two options.
A new diskette may be submitted, or an applicant may submit a statement clearly directing the PTO to use the previously submitted CRF since they are identical, and that the paper copy of the Sequence Listing in the new application is identical to the disk in the previous application. Section 1.821(g) is proposed to be amended by correcting the reference to 35 U.S.C. 111(a) applications. Section 1.821(h) is proposed to be amended by clarifying that this rule applies to all international applications searched and examined by the PTO. In addition to international applications filed in the United States Receiving Office, the United States is a competent International Searching Authority (ISA) for applications filed in receiving Offices of, or acting for, Brazil, Israel, Mexico, and Trinidad and Tobago. The United States is also a competent ISA for applications filed in the International Bureau where at least one of the applicants is a resident or national of the United States or a resident or national of Barbados. In addition, the United States acts as an International Preliminary Examining Authority for certain applications searched in the EPO. The language change regarding the time limit for compliance and statement accompanying the submission are necessary to conform with the language found in PCT Rule 13ter.1. Section 1.822 is proposed to be revised for clarity and better organization and to accommodate an international request for the use of lower case one-letter codes for nucleotide bases. Section 1.822 (b) is proposed to be amended to refer to WIPO Standard ST.23 (April 1994) and incorporate the information therein. The reorganization groups all nucleotide and all amino acid formats together. Section 1.822 (c)(1) is proposed to be amended by requiring the use of lower case one-letter code for the nucleotide bases. This change would put the PTO requirements in conformance with most large databases. 
Additionally, the use of lower case letters in a sequence makes the confusion of "g" for "c" and vice versa less likely. Current paragraph (d) is proposed to be redesignated as a part of paragraph (c)(3) and current paragraph (e) is proposed to be deleted with the substance of the paragraph being incorporated into (d)(1). Current paragraph (f) is proposed to be redesignated as paragraph (c)(2); current paragraph (g) is proposed to be redesignated as paragraph (c)(3) and amended to incorporate current paragraph (d). Current paragraph (h) is proposed to be redesignated as paragraph (d)(2). Current paragraphs (i) and (j) are proposed to be redesignated as (c)(4) and (c)(5). Current paragraph (k) is proposed to be redesignated as (d)(3). Current paragraph (l) is proposed to be redesignated as (c)(6) and current paragraph (m) is proposed to be redesignated as (d)(4). Current paragraph (n) is proposed to be redesignated as (c)(7) and amended to delete a sentence, the substance of which is incorporated into (d)(4). Paragraph (d)(1) is proposed to be added to include a reference to WIPO Standard ST.23 (April 1994). Paragraphs (d)(2-4) incorporate the material from current paragraphs (h), (k), (m) and a sentence of (n). Paragraph (d)(5) is proposed to be added to clarify that the use of terminator symbols is not acceptable in amino acid sequences either as "internal" terminator symbols or following the carboxy terminal amino acid of a peptide or polypeptide. Current paragraph (o) is proposed to be redesignated as paragraph (e) and amended to recite integer identifier to be consistent with 1.821 (c) and to permit the language neutral submission. Current paragraph (p) is proposed to be deleted. 
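The proposed lower case requirement of 1.822(c)(1) can be sketched as a small normalization step. The sketch assumes the sixteen one-letter symbols of WIPO Standard ST.23, paragraph 8, and treats spaces and position numbers as layout rather than sequence data; the names `ST23_NUCLEOTIDE_SYMBOLS` and `normalize_nucleotides` are hypothetical and do not come from the rule or from any PTO software.

```python
# The sixteen one-letter nucleotide symbols of ST.23, paragraph 8, in lower case.
ST23_NUCLEOTIDE_SYMBOLS = set("agcturymkswbdhvn")


def normalize_nucleotides(sequence):
    """Return the sequence in lower case one-letter code.

    Whitespace and digits (for example, position counters in a printed
    listing) are dropped. Any remaining character that is not one of the
    sixteen symbols raises ValueError.
    """
    cleaned = "".join(
        ch for ch in sequence if not ch.isspace() and not ch.isdigit()
    )
    lowered = cleaned.lower()
    unknown = sorted(set(lowered) - ST23_NUCLEOTIDE_SYMBOLS)
    if unknown:
        raise ValueError(f"symbols not in ST.23 paragraph 8: {unknown}")
    return lowered


if __name__ == "__main__":
    print(normalize_nucleotides("ACGT NNRY 8"))
```

Rejecting unknown symbols, rather than silently mapping them to "n", keeps a data-entry error from being recorded as a wild card.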
The lists of nucleic acid and amino acid abbreviations and the lists of modified base controlled vocabulary and the modified and unusual amino acids would be replaced by reference to WIPO Standard ST.23 RECOMMENDATION FOR THE PRESENTATION OF NUCLEOTIDE AND AMINO ACID SEQUENCE LISTINGS IN PATENT APPLICATIONS AND IN PUBLISHED PATENT DOCUMENTS (April 1994) to simplify and shorten the rules. This information will also appear in an appropriate section of the Manual of Patent Examining Procedure to assist applicants in preparing Sequence Listings. For purposes of facilitating review of these proposed rule changes, appropriate corrected excerpts of paragraphs 8, 9, 11 and 12 of WIPO Standard ST.23 are provided below.

WIPO Standard ST.23, paragraph 8, provides that the bases of a nucleotide sequence should be represented using the following one-letter code for nucleotide sequence characters.

Symbol  Meaning                                      Origin of designation
A       A                                            Adenine
G       G                                            Guanine
C       C                                            Cytosine
T       T                                            Thymine
U       U                                            Uracil
R       G or A                                       puRine
Y       T/U or C                                     pYrimidine
M       A or C                                       aMino
K       G or T/U                                     Keto
S       G or C                                       Strong interactions 3H-bonds
W       A or T/U                                     Weak interactions 2H-bonds
B       G or C or T/U                                not A
D       A or G or T/U                                not C
H       A or C or T/U                                not G
V       A or G or C                                  not T, not U
N       (A or G or C or T/U) or (unknown or other)   aNy

WIPO Standard ST.23, paragraph 9, provides: Modified bases may be represented as the corresponding unmodified bases in the sequence itself if the modified base is one of those listed below and the modification is further described elsewhere in the Sequence Listing. The codes from the list below may be used in the description or the Sequence Listing but not in the sequence itself.
Symbol    Meaning
ac4c      4-acetylcytidine
chm5u     5-(carboxyhydroxylmethyl)uridine
cm        2'-O-methylcytidine
cmnm5s2u  5-carboxymethylaminomethyl-2-thiouridine
cmnm5u    5-carboxymethylaminomethyluridine
d         dihydrouridine
fm        2'-O-methylpseudouridine
gal q     *beta, D-galactosylqueosine
gm        2'-O-methylguanosine
i         inosine
i6a       N6-isopentenyladenosine
m1a       1-methyladenosine
m1f       1-methylpseudouridine
m1g       1-methylguanosine
m1i       1-methylinosine
m22g      2,2-dimethylguanosine
m2a       2-methyladenosine
m2g       2-methylguanosine
m3c       3-methylcytidine
m5c       5-methylcytidine
m6a       N6-methyladenosine
m7g       7-methylguanosine
mam5u     5-methylaminomethyluridine
mam5s2u   5-methoxyaminomethyl-2-thiouridine
man q     *beta, D-mannosylqueosine
mcm5s2u   5-methoxycarbonylmethyl-2-thiouridine
mcm5u     5-methoxycarbonylmethyluridine
mo5u      5-methoxyuridine
ms2i6a    2-methylthio-N6-isopentenyladenosine
ms2t6a    N-((9-beta-D-ribofuranosyl-2-methylthiopurine-6-yl)carbamoyl)threonine
mt6a      N-((9-beta-D-ribofuranosylpurine-6-yl)N-methylcarbamoyl)threonine
mv        uridine-5-oxyacetic acid-methylester
o5u       uridine-5-oxyacetic acid (v)
osyw      wybutoxosine
p         pseudouridine
q         *queosine
s2c       2-thiocytidine
s2t       5-methyl-2-thiouridine
s2u       2-thiouridine
s4u       4-thiouridine
t         5-methyluridine
t6a       N-((9-beta-D-ribofuranosylpurine-6-yl)carbamoyl)threonine
tm        2'-O-methyl-5-methyluridine
um        2'-O-methyluridine
yw        wybutosine
x         3-(3-amino-3-carboxy-propyl)uridine, (acp3)u

* Indicates a correction of minor typographical errors.

WIPO Standard ST.23, paragraph 11, provides that the amino acids should be represented using the following three-letter code with the first letter as a capital.
Symbol  Meaning
Ala     Alanine
Cys     Cysteine
Asp     Aspartic Acid
Glu     Glutamic Acid
Phe     Phenylalanine
Gly     Glycine
His     Histidine
Ile     Isoleucine
Lys     Lysine
Leu     Leucine
Met     Methionine
Asn     Asparagine
Pro     Proline
Gln     Glutamine
Arg     Arginine
Ser     Serine
Thr     Threonine
Val     Valine
Trp     Tryptophan
Tyr     Tyrosine
Asx     Asp or Asn
Glx     Glu or Gln
Xaa     unknown or other

WIPO Standard ST.23, paragraph 12, provides: Modified and unusual amino acids may be represented as the corresponding unmodified amino acids in the sequence itself if the modified amino acid is one of those listed below and the modification is further described elsewhere in the Sequence Listing. The codes from the list below may be used in the description or the Sequence Listing but not in the sequence itself.

Symbol  Meaning
Aad     2-Aminoadipic acid
bAad    3-aminoadipic acid
bAla    beta-Alanine, beta-Aminopropionic acid
Abu     2-Aminobutyric acid
4Abu    4-Aminobutyric acid, piperidinic acid
Acp     6-Aminocaproic acid
Ahe     2-Aminoheptanoic acid
Aib     2-Aminoisobutyric acid
bAib    3-Aminoisobutyric acid
Apm     2-Aminopimelic acid
Dbu     *2,4-Diaminobutyric acid
Des     Desmosine
Dpm     2,2'-Diaminopimelic acid
Dpr     2,3-Diaminopropionic acid
EtGly   N-Ethylglycine
EtAsn   N-Ethylasparagine
Hyl     Hydroxylysine
aHyl    allo-Hydroxylysine
3Hyp    3-Hydroxyproline
4Hyp    4-Hydroxyproline
Ide     Isodesmosine
*aIle   allo-Isoleucine
MeGly   N-Methylglycine, sarcosine
*MeIle  N-Methylisoleucine
MeLys   6-N-Methyllysine
MeVal   N-Methylvaline
Nva     Norvaline
Nle     Norleucine
Orn     Ornithine

* Indicates a correction of a minor typographical error.

Section 1.823(a) is proposed to be amended to provide for a reference to a Sequence Listing Annex in the application immediately before the claims and to provide the paper Sequence Listing as an Annex, which is a separately numbered section of the application. This is an internationally desired change and also would facilitate easier storage of very large Sequence Listings separate from the main part of the file during pendency of the application.
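The paragraph 11 three-letter codes above, together with proposed paragraph 1.822(d)(5) barring terminator symbols in amino acid sequences, suggest a simple allow-list check, sketched here. It assumes the sequence is written as whitespace-separated three-letter codes. The function name is hypothetical, and the tokens "***" and "Ter" in the usage example are invented stand-ins for terminator symbols, since the notice does not say what form such symbols take.

```python
# Three-letter codes from WIPO Standard ST.23, paragraph 11, as excerpted above.
ST23_AMINO_ACIDS = {
    "Ala", "Cys", "Asp", "Glu", "Phe", "Gly", "His", "Ile",
    "Lys", "Leu", "Met", "Asn", "Pro", "Gln", "Arg", "Ser",
    "Thr", "Val", "Trp", "Tyr", "Asx", "Glx", "Xaa",
}


def invalid_residues(sequence):
    """Return (1-based position, token) for each token not in paragraph 11.

    The comparison is case-sensitive because the codes are written with
    the first letter as a capital.
    """
    return [
        (position, token)
        for position, token in enumerate(sequence.split(), start=1)
        if token not in ST23_AMINO_ACIDS
    ]


if __name__ == "__main__":
    # "***" and "Ter" are hypothetical terminator symbols used for illustration.
    print(invalid_residues("Met Ala *** Gly Ter"))
```

An allow-list catches an internal or trailing terminator symbol without needing to know which symbol was used. The paragraph 12 codes are deliberately left out of the allow-list because they may not appear in the sequence itself.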
Section 1.823(b) is proposed to be amended to insert a table to depict items of information (data elements) which are to be included in the Sequence Listing and to indicate whether they are mandatory or optional. The proposed revisions reflect the change to a language neutral submission. The English language data element headings would be replaced by numeric identifiers. The numeric identifiers are similar to INID codes ("Internationally agreed Numbers for the Identification of Data" as per WIPO Standard ST.9, December 1990) already utilized internationally in patent documents. This change would facilitate a single international standard which would eliminate the need for translations into non-English languages. Large portions of Section 1.823(b) are proposed to be deleted to lessen the burden on applicants and to eliminate the collection of material which is of limited use to the Office. The following items are typical of material which would be deleted: (1)(vi)(C) CLASSIFICATION; (2)(i)(C) STRANDEDNESS; (2)(ii) MOLECULE TYPE through (2)(vii)(C) UNITS; and (2)(ix)(C) IDENTIFICATION METHOD. In order to clarify the rule, the proposed change would identify specifically those items which can be enumerated once in a Sequence Listing. It is proposed that the recommended designation be eliminated, leaving only mandatory and optional elements. Accordingly, it is proposed to change element <140> Correspondence Address and elements <150> through <154> from mandatory to optional. Elements <100> General Information, <200> Information for SEQ ID NO, and <400> Sequence Description: SEQ ID NO have been clarified as mandatory. In element <193>, it is proposed to change TELEX to Electronic mail address to be current with technology. It is proposed to eliminate Strandedness because the information is of limited use to the Office. It is proposed to limit the response for Topology to linear or circular because any other response does not permit an adequate search.
Because it is essential to the search to know whether the sequence is circular, providing one of these two responses to this data element is mandatory in the Sequence Listing. Consistent with the international desire for eliminating language in the Sequence Listing, Topology would be identified as L (linear) or C (circular), and sequence Type would be N (nucleotide) or A (amino acid). It is proposed to change Feature from a recommended to a mandatory element if the sequence contains "N", "Xaa", a modified or unusual L-amino acid or a modified base. This change would highlight the presence of an unusual residue in the sequence which is important to anyone using Sequence Listing information. Section 1.824 is proposed to be amended by revising the current paragraphs (a) through (h) into paragraphs (a) through (c). Specifically, the following changes are proposed for 1.824: Current 1.824, paragraph (a), is proposed to be redesignated as paragraph (a)(1). In addition, the term "series of diskettes" would be added to indicate the acceptability of receiving numerous disks for large submissions. Current paragraph (b) is proposed to be redesignated as paragraph (a)(2). Current paragraph (c) is proposed to be redesignated as paragraph (a)(3). Current paragraph (d) is proposed to be deleted because it is incorporated into paragraph (a)(1). Current paragraph (e) is proposed to be deleted since the PTO has not found it to be necessary and feels it should not be a requirement placed on the applicant, although the applicant may optionally continue the practice of using write-protection if desired. In proposed paragraph (a)(4), a "compressed file" format would be introduced as an acceptable means to submit a large sequence listing, and in proposed paragraph (a)(5), directions on suppressing page numbering on the computer readable form version would be added for clarity. 
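The proposed single-letter codes for Topology and sequence Type, and the condition under which Feature would become mandatory, can be sketched as follows. The sketch assumes one-letter nucleotide sequences and whitespace-separated three-letter amino acid sequences. Because a modified base or a modified or unusual amino acid is represented in the sequence itself by its unmodified counterpart, it cannot be detected from the residues, so the hypothetical `has_modified_residue` flag must be supplied by the caller. All names here are illustrative and are not taken from the proposed rule.

```python
# Language neutral codes described in the discussion of 1.823(b).
TOPOLOGY_CODES = {"L": "linear", "C": "circular"}
TYPE_CODES = {"N": "nucleotide", "A": "amino acid"}


def describe_topology(code):
    """Return the meaning of a Topology code, or raise ValueError."""
    if code not in TOPOLOGY_CODES:
        raise ValueError("Topology must be 'L' (linear) or 'C' (circular)")
    return TOPOLOGY_CODES[code]


def feature_is_mandatory(sequence, seq_type, has_modified_residue=False):
    """True when the sequence would need a Feature entry under the proposal.

    Feature is treated as mandatory when the sequence contains "N", "Xaa",
    or (as reported by the caller) a modified base or a modified or
    unusual L-amino acid.
    """
    if seq_type not in TYPE_CODES:
        raise ValueError("Type must be 'N' (nucleotide) or 'A' (amino acid)")
    if has_modified_residue:
        return True
    if seq_type == "N":
        return "n" in sequence.lower()
    return "Xaa" in sequence.split()


if __name__ == "__main__":
    print(describe_topology("C"))
    print(feature_is_mandatory("acgnacgt", "N"))
    print(feature_is_mandatory("Met Ala Gly", "A"))
```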
The text of current paragraph (f) is proposed to be deleted, but the list of computer readable files is proposed to be redesignated as paragraphs under new (b) and (c). In proposed paragraph (b), the explanation for "pagination" is proposed to be revised to reflect the correct format required. Proposed paragraph (b)(1) is proposed to be revised by deleting diskettes from PS/2 operating system as an accepted format. In proposed paragraph (c), the diskette requirements are proposed to be rearranged so that the most common diskette size used for submissions is at the top of the list. Also in proposed paragraph (c)(2), "format" is proposed to be amended to accommodate the current PTO equipment, and in proposed new paragraphs (c)(3), (4), and (5), additional items would be added to the list of acceptable media types due to the changes in available equipment at the PTO. Current paragraph (g) is proposed to be redesignated as paragraph (d). Current paragraph (h) is proposed to be deleted because the text is proposed to be incorporated into paragraph (a)(6). The label requirements would be rewritten more concisely than with the previous rules. In addition, fewer items would be required to be placed on the label under this proposed paragraph because the other items are no longer deemed necessary by the PTO. Current Appendix A is proposed to be rewritten to reflect the correct format of a Sequence Listing. The proposed Appendix A is presented to provide a sample listing in the correct format as described in the Table of amended 1.823(b). This sample includes the use of numeric identifiers which reflect the change to a language neutral submission. Current Appendix B is proposed to be deleted as the information it presents is no longer valid under changes in this proposed rule. 
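The Topology, Type and Feature changes discussed above for proposed 1.823(b) can be illustrated with a short sketch. It is only an illustration under stated assumptions, not a PTO tool or an official validation procedure: the function names, the argument layout and the wording of the messages are hypothetical, and modified or unusual residues are signalled by a caller-supplied flag because such modifications are not shown explicitly in the sequence itself.

```python
# Illustrative sketch only; names and messages are hypothetical.
VALID_TOPOLOGY = {"L", "C"}  # L = linear, C = circular
VALID_TYPE = {"N", "A"}      # N = nucleotide, A = amino acid


def needs_feature(seq_type, residues, has_modified_residue=False):
    """Return True when the Feature element would be mandatory.

    Feature is proposed to be mandatory if the sequence contains "N",
    "Xaa", a modified or unusual L-amino acid, or a modified base.
    """
    if has_modified_residue:
        return True
    if seq_type == "N":
        # One-letter base codes; "n" marks an unknown or other base.
        return "n" in residues.lower()
    if seq_type == "A":
        # Three-letter abbreviations separated by spaces.
        return "Xaa" in residues.split()
    return False


def check_sequence_entry(seq_type, topology, residues, has_feature,
                         has_modified_residue=False):
    """Return a list of problems found for one sequence entry."""
    problems = []
    if seq_type not in VALID_TYPE:
        problems.append("Type must be N (nucleotide) or A (amino acid)")
    if topology not in VALID_TOPOLOGY:
        problems.append("Topology must be L (linear) or C (circular)")
    if needs_feature(seq_type, residues, has_modified_residue) and not has_feature:
        problems.append("Feature is mandatory for N, Xaa or modified residues")
    return problems


if __name__ == "__main__":
    # A nucleotide sequence with an unknown base and no Feature entry.
    print(check_sequence_entry("N", "C", "acgtnacgta", has_feature=False))
```

An entry that uses any response other than L or C for Topology, such as the "both" or "unknown" responses permitted under the current rule, would be reported as a problem by this sketch.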
Review Under the Paperwork Reduction Act of 1995 This proposed rule change contains information collection requirements which are subject to review by the Office of Management and Budget (OMB) under the Paperwork Reduction Act of 1995, 44 U.S.C. 3501, et seq. The title, description and respondent description of the information collection is shown below with an estimate of the annual reporting burdens. Included in the estimate is the time for reviewing instructions, gathering and maintaining the data needed, and completing and reviewing the collection of information. With respect to the following collection of information, the PTO invites comments on: (1) whether the proposed collection of information is necessary for the proper performance of the PTO's functions, including whether the information will have practical utility; (2) the accuracy of the PTO's estimate of the burden of the proposed collection of information, including the validity of the methodology and assumptions used; (3) ways to enhance the quality, utility, and clarity of the information to be collected; and (4) ways to minimize the burden of the collection of information on respondents, including through the use of automated collection techniques, when appropriate, and other forms of information technology. Notwithstanding any other provision of law, no person is required to respond to nor shall a person be subject to a penalty for failure to comply with a collection of information subject to the requirements of the Paperwork Reduction Act unless that collection of information displays a currently valid OMB control number. 
OMB Number: 0651-0024 Title: Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence Disclosures Form Numbers: None Type of Review: Revision of currently approved collection Affected Public: Individuals or households, business or other for-profit institutions, not-for-profit institutions, and Federal Government Estimated Number of Respondents: 4,600 Estimated Time Per Response: 80 minutes Estimated Total Annual Burden Hours: 6,133 Needs and Uses: The PTO requires biotechnology patent applicants to submit sequence information to enable the PTO to properly examine and process their applications. As required by the Paperwork Reduction Act of 1995, 44 U.S.C. 3507(d), the PTO has submitted a copy of this proposed rulemaking to OMB for its review of this information collection. Interested persons are requested to send comments regarding these information collections, including suggestions for reducing this burden, to the Office of Information and Regulatory Affairs of OMB, New Executive Office Bldg., 725 17th Street, N.W., Room 10235, Washington, D.C. 20503, Attn: Desk Officer for the Patent and Trademark Office. OMB is required to make a decision concerning the collection of information in these proposed regulations between 30 and 60 days after the publication of this document in the Federal Register. Therefore, a comment to OMB is best assured of having its full effect if OMB receives it within 30 days of publication. This does not affect the deadline for the public to comment to the PTO on the proposed regulations. Other Considerations This proposed rule change is in conformity with the requirements of the Regulatory Flexibility Act (5 U.S.C. 601 et seq.), Executive Order 12612, and the Paperwork Reduction Act of 1995, 44 U.S.C. 3501 et seq. It has been determined that this proposed rule is not significant for the purposes of Executive Order 12866. 
The Assistant General Counsel for Legislation and Regulation of the Department of Commerce has certified to the Chief Counsel for Advocacy, Small Business Administration, that this proposed rule change would not have a significant economic impact on a substantial number of small entities (Regulatory Flexibility Act, 5 U.S.C. 601 et seq.). The principal effect of this rule change is to simplify and clarify the rules governing the submission of Sequence Listings for patent applications containing nucleic acid and/or amino acid sequences. The PTO has also determined that this proposed rule change has no Federalism implications affecting the relationship between the National Government and the States as outlined in Executive Order 12612. List of Subjects in 37 CFR Part 1 Administrative practice and procedure, Courts, Freedom of Information, Inventions and patents, Reporting and record-keeping requirements, Small businesses. For the reasons set forth in the preamble and under the authority granted to the Commissioner of Patents and Trademarks by 35 U.S.C. 6, the PTO proposes to amend 37 CFR Part 1 as set forth below. Removals are indicated by brackets ([]) and additions indicated by arrows (><). Part 1 - Rules of Practice in Patent Cases 1. The authority citation for 37 CFR Part 1 would continue to read as follows: Authority: 35 U.S.C. 6 unless otherwise noted. 2. Section 1.77 is proposed to be amended by redesignating current paragraphs (g) through (j) as paragraphs (h) through (k) and by adding new paragraphs (g) and (l) to read as follows: 1.77 Arrangement of application elements. * * * * * >(g) Reference to Sequence Listing Annex.< [(g)]>(h)< Claim or claims. [(h)]>(i)< Abstract of the disclosure. [(i)]>(j)< Signed oath or declaration. [(j)]>(k)< Drawings. >(l) Sequence Listing Annex.< 3. Section 1.821 is proposed to be amended by revising paragraphs (a) and (c)-(h) to read as follows: 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. 
(a) Nucleotide and/or amino acid sequences as used in 1.821 through 1.825 are interpreted to mean an unbranched sequence of four or more amino acids or an unbranched sequence of ten or more nucleotides. Branched sequences are specifically excluded from this definition. >Sequences with fewer than four specifically defined nucleotides or amino acids are specifically excluded from this rule. "Specifically defined" means those amino acids other than "Xaa" and those nucleotide bases other than "N" defined in accordance with the World Intellectual Property Organization (WIPO) Handbook on Industrial Property Information and Documentation, Standard ST.23: Recommendation for the Presentation of Nucleotide and Amino Acid Sequence Listings in Patent Applications and in Published Patent Documents, paragraphs 8 through 12, April 1994, herein incorporated by reference. (Hereinafter "WIPO Standard ST.23 (April, 1994)"). This incorporation by reference was approved by the Director of the Federal Register in accordance with 5 U.S.C. 552(a) and 1 CFR part 51. Copies of ST.23 may be obtained from the World Intellectual Property Organization; 34 chemin des Colombettes; 1211 Geneva 20 Switzerland. Copies of ST.23 may be inspected at the Patent Search Room; Crystal Plaza 3, Lobby Level; 2021 South Clark Place; Arlington, VA 22202; or at the Office of the Federal Register, 800 North Capitol Street, NW, Suite 700, Washington, D.C. < Nucleotides and amino acids are further defined as follows: (1) Nucleotides are intended to embrace only those nucleotides that can be represented using the symbols set forth in [ 1.822(b)(1)] >WIPO Standard ST.23 (April 1994), paragraph 8<. Modifications, e.g., methylated bases, may be described as set forth in [ 1.822(b)] >WIPO Standard ST.23 (April 1994), paragraph 9< , but shall not be shown explicitly in the nucleotide sequence. 
(2) Amino acids are those L-amino acids commonly found in naturally occurring proteins and are listed in [ 1.822(b)(2)] >WIPO Standard ST.23 (April 1994), paragraph 11<. Those amino acid sequences containing D-amino acids are not intended to be embraced by this definition. Any amino acid sequence that contains post-translationally modified amino acids may be described as the amino acid sequence that is initially translated using the symbols shown in [ 1.822(b)(2)] >WIPO Standard ST.23 (April 1994), paragraph 11< with the modified positions; e.g., hydroxylations or glycosylations, being described as set forth in [ 1.822(b)] >WIPO Standard ST.23 (April 1994), paragraph 12<, but these modifications shall not be shown explicitly in the amino acid sequence. Any peptide or protein that can be expressed as a sequence using the symbols in [ 1.822(b)(2)] >WIPO Standard ST.23 (April 1994), paragraph 11< in conjunction with a description [elsewhere in the "Sequence Listing"] >in the Feature section< to describe, for example, modified linkages, cross links and end caps, non-peptidyl bonds, etc., is embraced by this definition. (b) * * * (c) Patent applications which contain disclosures of nucleotide and/or amino acid sequences must contain, as a separate part of the disclosure on paper copy, hereinafter referred to as the "Sequence Listing," a disclosure of the nucleotide and/or amino acid sequences and associated information using the symbols and format in accordance with the requirements of 1.822 and 1.823. Each sequence disclosed must appear separately in the "Sequence Listing." Each sequence set forth in the "Sequence Listing" shall be assigned a separate >integer< identifier [written as SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, etc]. >The integer identifiers shall begin with 1 and increase sequentially by integers. 
If no sequence is present for an integer identifier, the words "This sequence omitted" shall appear following the integer identifier.< (d) Where the description or claims of a patent application discuss a sequence listing that is set forth in the "Sequence Listing" in accordance with paragraph (c) of this section, reference must be made to the sequence by use of the [assigned] >integer< identifier, >preceded by "SEQ ID NO:"< in the text of the description or claims, even if the sequence is also embedded in the text of the description or claims of the patent application. (e) A copy of the "Sequence Listing" referred to in paragraph (c) of this section must also be submitted in computer readable form in accordance with the requirements of 1.824. The computer readable form is a copy of the "Sequence Listing" and will not necessarily be retained as a part of the patent application file. If the computer readable form of a new application is to be identical with the computer readable form of another application of the applicant on file in the Office, reference may be made to the other application and computer readable form in lieu of filing a duplicate computer readable form in the new application >if the computer readable form in the other application was compliant with all of the requirements of these rules<. The new application shall be accompanied by a letter making such reference to the other application and computer readable form, both of which shall be completely identified. 
>In the new application, applicant must also request the use of the compliant computer readable "Sequence Listing" that is already on file for the other application and must state that the paper copy of the "Sequence Listing" in the new application is identical to the computer readable copy filed for the other application.< (f) In addition to the paper copy required by paragraph (c) of this section and the computer readable form required by paragraph (e) of this section, a statement that the content of the paper and computer readable copies are the same must be submitted with the computer readable form. Such a statement must be a verified statement if made by a person not registered to practice before the Office. (g) If any of the requirements of paragraphs (b) through (f) of this section are not satisfied at the time of filing under 35 U.S.C. 111 >(a)< or at the time of entering the national stage under 35 U.S.C. 371, applicant has one month from the date of a notice which will be sent requiring compliance with the requirements in order to prevent abandonment of the application. Any submission in response to a requirement under this paragraph must be accompanied by a statement that the submission includes no new matter. Such a statement must be a verified statement if made by a person not registered to practice before the Office. 
(h) If any of the requirements of paragraphs (b) through (f) of this section are not satisfied at the time of filing [,in the United States Receiving Office,] an international application under the Patent Cooperation Treaty (PCT) [applicant has one month from the date of a notice which] >, which application is to be searched by the United States International Searching Authority or examined by the United States International Preliminary Examining Authority, applicant< will be sent >a notice< requiring compliance with the requirements [,or such other time as may be set by the Commissioner, in which to comply] >within a prescribed time period<. Any submission in response to a requirement under this paragraph must be accompanied by a statement that the submission does not include [new] matter [or go] >which goes< beyond the disclosure in the international application as filed. Such a statement must be a verified statement if made by a person not registered to practice before the Office. If applicant fails to timely provide the required computer readable form, the United States International Searching Authority shall search only to the extent that a meaningful search can be performed >and the United States International Preliminary Examining Authority shall examine only to the extent that a meaningful examination can be performed<. * * * * * 4. Section 1.822 is proposed to be revised to read as follows: 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data shall conform to the requirements of paragraphs (b) through [(p)] >(e)< of this section. (b) The code for representing the nucleotide and/or amino acid sequence characters shall conform to the code set forth in the tables in [paragraphs (b)(1) and (b)(2) of this section] >WIPO Standard ST.23 (April 1994), paragraphs 8 and 11. 
This incorporation by reference was approved by the Director of the Federal Register in accordance with 5 U.S.C. 552(a) and 1 CFR part 51. Copies of ST.23 may be obtained from the World Intellectual Property Organization; 34 chemin des Colombettes; 1211 Geneva 20 Switzerland. Copies of ST.23 may be inspected at the Patent Search Room; Crystal Plaza 3, Lobby Level; 2021 South Clark Place; Arlington, VA 22202; or at the Office of the Federal Register, 800 North Capitol Street, NW, Suite 700, Washington, DC <. No code other than that specified in [this section] >these sections< shall be used in nucleotide and amino acid sequences. A modified base or >modified or unusual< amino acid may be presented in a given sequence as the corresponding unmodified base or amino acid if the modified base or >modified or unusual< amino acid is one of those listed in [paragraphs (p)(1) or (p)(2) of this section] >WIPO Standard ST.23 (April 1994), paragraphs 9 and 12< and the modification is also set forth [elsewhere in the Sequence Listing (for example, FEATURES 1.823(b)(2)(ix))] >in the Feature section<. Otherwise, all bases or amino acids not appearing in paragraphs [(b)(1) or (b)(2) of this section] >8 and 11 of the WIPO Standard ST.23 (April 1994)< shall be listed in a given sequence as "N" or "Xaa," respectively, with further information, as appropriate, given [elsewhere in the Sequence Listing] >in the Feature section<. 
[ (1) Base codes:
Symbol    Meaning
A         A; adenine
C         C; cytosine
G         G; guanine
T         T; thymine
U         U; uracil
M         A or C
R         A or G
W         A or T/U
S         C or G
Y         C or T/U
K         G or T/U
V         A or C or G; not T/U
H         A or C or T/U; not G
D         A or G or T/U; not C
B         C or G or T/U; not A
N         (A or C or G or T/U) or (unknown or other)
(2) Amino acid three-letter abbreviations:
Abbreviation    Amino acid name
Ala             Alanine
Arg             Arginine
Asn             Asparagine
Asp             Aspartic Acid
Asx             Aspartic Acid or Asparagine
Cys             Cysteine
Glu             Glutamic Acid
Gln             Glutamine
Glx             Glutamine or Glutamic Acid
Gly             Glycine
His             Histidine
Ile             Isoleucine
Leu             Leucine
Lys             Lysine
Met             Methionine
Phe             Phenylalanine
Pro             Proline
Ser             Serine
Thr             Threonine
Trp             Tryptophan
Tyr             Tyrosine
Val             Valine
Xaa             Unknown or other ]
(c) >Format representation of nucleotides: (1)< A nucleotide sequence shall be listed using the >lower-case letter for representing the< one-letter code for the nucleotide bases[, as] >set forth< in [paragraph (b)(1) of this section] >WIPO Standard ST.23 (April 1994), paragraph 8<. [(d) The amino acids corresponding to the codons in the coding parts of a nucleotide sequence shall be typed immediately below the corresponding codons. Where a codon spans an intron, the amino acid symbol shall be typed below the portion of the codon containing two nucleotides. (e) The amino acids in a protein or peptide sequence shall be listed using the three-letter abbreviation with the first letter as an upper case character, as in paragraph (b)(2) of this section.] [(f)] >(2)< The bases in a nucleotide sequence (including introns) shall be listed in groups of 10 bases except in the coding parts of the sequence. Leftover bases, fewer than 10 in number, at the end of noncoding parts of a sequence shall be grouped together and separated from adjacent groups of 10 or 3 bases by a space. [(g)] >(3)< The bases in the coding parts of a nucleotide sequence shall be listed as triplets (codons). 
>The amino acids corresponding to the codons in the coding parts of a nucleotide sequence shall be typed immediately below the corresponding codons. Where a codon spans an intron, the amino acid symbol shall be typed below the portion of the codon containing two nucleotides.< [(h) A protein or peptide sequence shall be listed with a maximum of 16 amino acids per line, with a space provided between each amino acid.] [(i)] >(4)< A nucleotide sequence shall be listed with a maximum of 16 codons or 60 bases per line, with a space provided between each codon or group of 10 bases. [(j)] >(5)< A nucleotide sequence shall be presented, only by a single strand, in the 5' to 3' direction, from left to right. [(k) An amino acid sequence shall be presented in the amino to carboxy direction, from left to right, and the amino and carboxy groups shall not be presented in the sequence.] [(l)] >(6)< The enumeration of nucleotide bases shall start at the first base of the sequence with number 1. The enumeration shall be continuous through the whole sequence in the direction 5' to 3'. The enumeration shall be marked in the right margin, next to the line containing the one-letter codes for the bases, and giving the number of the last base of that line. [(m) The enumeration of amino acids may start at the first amino acid of the first mature protein, with the number 1. The amino acids preceding the mature protein, e.g., pre-sequences, pro-sequences, pre-pro-sequences and signal sequences, when presented, shall have negative numbers, counting backwards starting with the amino acid next to number 1. Otherwise, the enumeration of amino acids shall start at the first amino acid at the amino terminal as number 1. It shall be marked below the sequence every 5 amino acids.] 
[(n)] >(7)< For those nucleotide sequences that are circular in configuration, the enumeration method set forth in paragraph [(l)] >(c)(6)< of this section remains applicable with the exception that the designation of the first base of the nucleotide sequence may be made at the option of the applicant. [The enumeration method for amino acid sequences that is set forth in paragraph (m) of this section remains applicable for amino acid sequences that are circular in configuration.] >(d) Representation of amino acids: (1) The amino acids in a protein or peptide sequence shall be listed using the three-letter abbreviation with the first letter as an upper case character, as in WIPO Standard ST.23 (April 1994), paragraph 11. (2) A protein or peptide sequence shall be listed with a maximum of 16 amino acids per line, with a space provided between each amino acid. (3) An amino acid sequence shall be presented in the amino to carboxy direction, from left to right, and the amino and carboxy groups shall not be presented in the sequence. (4) The enumeration of amino acids may start at the first amino acid of the first mature protein, with the number 1. The amino acids preceding the mature protein, e.g., pre-sequences, pro-sequences, pre-pro-sequences and signal sequences, when presented, shall have negative numbers, counting backwards starting with the amino acid next to number 1. Otherwise, the enumeration of amino acids shall start at the first amino acid at the amino terminal as number 1. It shall be marked below the sequence every 5 amino acids. The enumeration method for amino acid sequences that is set forth in this section remains applicable for amino acid sequences that are circular in configuration. (5) An amino acid sequence that contains internal terminator symbols, e.g., "Ter", "*", or ".", etc., may not be represented as a single amino acid sequence, but shall be presented as separate amino acid sequences. 
(e)< [(o)] A sequence with a gap or gaps shall be presented as a plurality of separate sequences, with separate [sequence] >integer< identifiers, with the number of separate sequences being equal in number to the number of continuous strings of sequence data. A sequence that is made up of one or more noncontiguous segments of a larger sequence or segments from different sequences shall be presented as a separate sequence. [(p) The code for representing modified nucleotide bases and modified or unusual amino acids shall conform to the code set forth in the tables in paragraphs (p)(1) and (p)(2) of this section. The modified base controlled vocabulary in paragraph (p)(1) of this section and the modified and unusual amino acids in paragraph (p)(2) of this section shall not be used in the nucleotide and/or amino acid sequences; but may be used in the description and/or the "Sequence Listing" corresponding to, but not including, the nucleotide and/or amino acid sequence.
(1) Modified base controlled vocabulary:
Abbreviation    Modified base description
ac4c            4-acetylcytidine.
chm5u           5-(carboxyhydroxylmethyl)uridine.
cm              2'-O-methylcytidine.
cmnm5s2u        5-carboxymethylaminomethyl-2-thioridine.
cmnm5u          5-carboxymethylaminomethyluridine.
d               dihydrouridine.
fm              2'-O-methylpseudouridine.
galq            beta,D-galactosylqueosine.
gm              2'-O-methylguanosine.
i               inosine.
i6a             N6-isopentenyladenosine.
m1a             1-methyladenosine.
m1f             1-methylpseudouridine.
m1g             1-methylguanosine.
ml1             1-methylinosine.
m22g            2,2-dimethylguanosine.
m2a             2-methyladenosine.
m2g             2-methylguanosine.
m3c             3-methylcytidine.
m5c             5-methylcytidine.
m6a             N6-methyladenosine.
m7g             7-methylguanosine.
mam5u           5-methylaminomethyluridine.
mam5s2u         5-methoxyaminomethyl-2-thiouridine.
manq            beta,D-mannosylqueosine.
mcm5s2u         5-methoxycarbonylmethyluridine.
mo5u            5-methoxyuridine.
ms2i6a          2-methylthio-N6-isopentenyladenosine.
ms2t6a          N-((9-beta-D-ribofuranosyl-2-methylthiopurine-6-yl)carbamoyl)threonine.
mt6a            N-((9-beta-D-ribofuranosylpurine-6-yl)N-methylcarbamoyl)threonine.
mv              uridine-5-oxyacetic acid methylester.
o5u             uridine-5-oxyacetic acid (v).
osyw            wybutoxosine.
p               pseudouridine.
q               queosine.
s2c             2-thiocytidine.
s2t             5-methyl-2-thiouridine.
s2u             2-thiouridine.
s4u             4-thiouridine.
t               5-methyluridine.
t6a             N-((9-beta-D-ribofuranosylpurine-6-yl)carbamoyl)threonine.
tm              2'-O-methyl-5-methyluridine.
um              2'-O-methyluridine.
yw              wybutosine.
x               3-(3-amino-3-carboxypropyl)uridine, (acp3)u.
(2) Modified and unusual amino acids:
Abbreviation    Modified and unusual amino acid
Aad             2-Aminoadipic acid.
bAad            3-Aminoadipic acid.
bAla            beta-Alanine, beta-Aminopropionic acid.
Abu             2-Aminobutyric acid.
4Abu            4-Aminobutyric acid, piperidinic acid.
Acp             6-Aminocaproic acid.
Ahe             2-Aminoheptanoic acid.
Aib             2-Aminoisobutyric acid.
bAib            3-Aminoisobutyric acid.
Apm             2-Aminopimelic acid.
Dbu             2,4-Diaminobutyric acid.
Des             Desmosine.
Dpm             2,2'-Diaminopimelic acid.
Dpr             2,3-Diaminopropionic acid.
EtGly           N-Ethylglycine.
EtAsn           N-Ethylasparagine.
Hyl             Hydroxylysine.
aHyl            allo-Hydroxylysine.
3Hyp            3-Hydroxyproline.
4Hyp            4-Hydroxyproline.
Ide             Isodesmosine.
aIle            allo-Isoleucine.
MeGly           N-Methylglycine, sarcosine.
MeIle           N-Methylisoleucine.
MeLys           6-N-Methyllysine.
MeVal           N-Methylvaline.
Nva             Norvaline.
Nle             Norleucine.
Orn             Ornithine. ]
5. Section 1.823 is proposed to be revised to read as follows: 1.823 Requirements for nucleotide and/or amino acid sequences as part of the application papers. (a) The "Sequence Listing" required by 1.821(c), setting forth the nucleotide and/or amino acid sequences, and associated information in accordance with paragraph (b) of this section, must begin on a new page and be titled "Sequence Listing" [and appear] >. On a separate page of the application specification,< immediately prior to the claims [.]>, there shall be a reference to the presence of the "Sequence Listing" in a "Sequence Listing Annex." 
The "Sequence Listing" shall appear in the "Sequence Listing Annex," which is numbered independently of the numbering of the remainder of the application and shall be placed in the application file. Upon printing the application as a patent, the "Sequence Listing Annex" containing the paper "Sequence Listing" shall be printed immediately before the patented claims.< Each page of the "Sequence Listing" shall contain no more than 66 lines and each line shall contain no more than 72 characters. A fixed-width font shall be used exclusively throughout the "Sequence Listing." (b) The "Sequence Listing" shall, except as otherwise indicated, include, in addition to and immediately preceding the actual nucleotide and/or amino acid sequence, the [following items of information.] > numeric identifiers and their accompanying information as shown in the following table. The numeric identifier shall be used only in the "Sequence Listing."< The order and presentation of the items of information in the "Sequence Listing" shall conform to the arrangement given below [,except that parenthetical explanatory information following the headings (identifiers) is to be omitted]. Each item of information shall begin on a new line [, enumerated with the number/numeral/letter in parentheses as shown below, with the heading (identifier) in upper case characters, followed by a colon, and then followed by the information provided] > beginning with the numeric identifier enclosed in angle brackets as shown<. Except as allowed below, no item of information shall occupy more than one line. [Those items of information that are applicable for all sequences shall only be set forth once in the "Sequence Listing."] The submission of those items of information designated with an "M" is mandatory. [The submission of those items of information designated with an "R" is recommended, but not required.] The submission of those items of information designated with an "O" is optional. 
>Numeric identifiers <100> through <193> shall only be set forth at the beginning of the "Sequence Listing."< Those items designated with "rep" may have multiple responses and, as such, the item may be repeated in the "Sequence Listing." [(1) GENERAL INFORMATION (Application, diskette/tape and publication information): (i) APPLICANT (maximum of first ten named applicants; specify one name per line: SURNAME comma OTHER NAMES and/or INITIALS - M/rep): (ii) TITLE OF INVENTION (title of the invention, as elsewhere in application, four lines maximum - M): (iii) NUMBER OF SEQUENCES (number of sequences in the "Sequence Listing" (M): (iv) CORRESPONDENCE ADDRESS (M): (A) ADDRESSEE (name of applicant, firm, company or institution, as may be appropriate): (B) STREET (correspondence street address, as elsewhere in application, four lines maximum): (C) CITY (correspondence city address, as elsewhere in application): (D) STATE (correspondence state, as elsewhere in application): (E) COUNTRY (correspondence country, as elsewhere in application): (F) ZIP (correspondence zip or postal code, as elsewhere in application): (v) COMPUTER READABLE FORM (M): (A) MEDIUM TYPE (type of diskette/tape submitted): (B) COMPUTER (type of computer used with diskette/tape submitted): (C) OPERATING SYSTEM (type of operating system used): (D) SOFTWARE (type of software used to create computer readable form): (vi) CURRENT APPLICATION DATA (M, if available): (A) APPLICATION NUMBER (U.S application number, including a series code, a slash and a serial number, or U.S. PCT application number, including the letters PCT, a slash, a two-letter code indicating the U.S. as the Receiving Office, a two-digit indication of the year, a slash and a five-digit number, if available): (B) FILING DATE (U.S. 
or PCT application filing date, if available; specify as dd-MMM-yyyy): (C) CLASSIFICATION (IPC/US classification or F-term designation, where F-terms have been developed, if assigned, specify each designation, left justified, within an eighteen-position alpha numeric field - rep, to a maximum of ten classification designations): (vii) PRIOR APPLICATION DATA (prior domestic, foreign priority or international application data, if applicable - M/rep): (A) APPLICATION NUMBER (application number; specify as two-letter country code and an eight-digit application number; or if a PCT application, specify as the letters PCT, a slash, a two-letter code indicating the Receiving Office, a two-digit indication of the year, a slash and a five-digit number): (B) FILING DATE (document filing date, specify as dd-MMM-yyyy): (viii) ATTORNEY/AGENT INFORMATION (O): (A) NAME (attorney/agent name; SURNAME comma OTHER NAMES and/or INITIALS): (B) REGISTRATION NUMBER (attorney/agent registration number): (C) REFERENCE/DOCKET NUMBER (attorney/agent reference or docket number): (ix) TELECOMMUNICATION INFORMATION (O): (A) TELEPHONE (telephone number of applicant or attorney/agent): (B) TELEFAX (telefax number of applicant or attorney/agent): (C) TELEX (telex number of applicant or attorney/agent): (2) INFORMATION FOR SEQ ID NO: X (rep): (i) SEQUENCE CHARACTERISTICS (M): (A) LENGTH (sequence length, expressed as number of base pairs or amino acid residues): (B) TYPE (sequence type, i.e., whether nucleic acid or amino acid): (C) STRANDEDNESS (if nucleic acid, number of strands of source organism molecule, i.e., whether single-stranded, double-stranded, both or unknown to applicant): (D) TOPOLOGY (whether source organism molecule is circular, linear, both or unknown to applicant): (ii) MOLECULE TYPE (type of molecule sequenced in SEQ ID NO:X (at least one of the following should be included with subheadings, if any, in Sequence Listing - R)): - Genomic RNA; - Genomic DNA; - mRNA - tRNA; - rRNA; - 
snRNA; - scRNA; - preRNA; - cDNA to genomic RNA; - cDNA to mRNA; - cDNA to tRNA; - cDNA to rRNA; - cDNA to snRNA; - cDNA to scRNA; - Other nucleic acid; (A) DESCRIPTION (four lines maximum): - protein and - peptide. (iii) HYPOTHETICAL (yes/no - R): (iv) ANTI-SENSE (yes/no - R): (v) FRAGMENT TYPE (for proteins and peptides only, at least one of the following should be included in the Sequence Listing - R): - N-terminal fragment; - C-terminal fragment and - internal fragment. (vi) ORIGINAL SOURCE (original source of molecule sequenced in SEQ ID NO:X - R): (A) ORGANISM (scientific name of source organism): (B) STRAIN: (C) INDIVIDUAL ISOLATE (name/number of individual/isolate): (D) DEVELOPMENTAL STAGE (give developmental stage of source organism and indicate whether derived from germ-line or rearranged developmental pattern): (E) HAPLOTYPE: (F) TISSUE TYPE: (G) CELL TYPE: (H) CELL LINE: (I) ORGANELLE: (vii) IMMEDIATE SOURCE (immediate experimental source of the sequence in SEQ ID NO:X - R): (A) LIBRARY (library-type, name): (B) CLONE (clone(s)): (viii) POSITION IN GENOME (position of sequence in SEQ ID NO:X in genome - R): (A) CHROMOSOME/SEGMENT (chromosome/segment - name/number): (B) MAP POSITION: (C) UNITS (units for map position, i.e., whether units are genome percent, nucleotide number or other/specify): (ix) FEATURE (description of points of biological significance in the sequence in SEQ ID NO:X - R/rep): (A) NAME/KEY (provide appropriate identifier for feature - four lines maximum): (B) LOCATION (specify location according to syntax of DDBJ/EMBL/GenBank Feature Tables Definition, including whether feature is on complement of presented sequence; where appropriate state number of first and last bases/amino acids in feature - four lines maximum): (C) IDENTIFICATION METHOD (method by which the feature was identified, i.e., by experiment, by similarity with known sequence or to an established consensus sequence, or by similarity to some other pattern - four lines
maximum): (D) OTHER INFORMATION (include information on phenotype conferred, biological activity of sequence or its product, macromolecules which bind to sequence or its product, or other relevant information - four lines maximum): (x) PUBLICATION INFORMATION (Repeat section for each relevant publication - O/rep): (A) AUTHORS (maximum of first ten named authors of publication; specify one name per line: SURNAME comma OTHER NAMES and/or INITIALS - rep): (B) TITLE (title of publication): (C) JOURNAL (journal name in which data published): (D) VOLUME (journal volume in which data published): (E) ISSUE (journal issue number in which data published): (F) PAGES (journal page numbers in which data published): (G) DATE (journal date in which data published; specify as dd-MMM-yyyy, MMM-yyyy or Season-yyyy): (H) DOCUMENT NUMBER (document number, for patent type citations only; specify as two-letter country code, eight-digit document number (right justified), one letter and as appropriate, one number or a space as a document type code; or if a PCT application specify as the letters PCT, a slash, a two-letter code indicating the Receiving Office, a two-digit indication of the year, a slash and a five-digit number; or if a PCT publication, specify as the two letters WO, a two-digit indication of the year, a slash and a five-digit publication number): (I) FILING DATE (document filing date, for patent-type citations only; specify as dd-MMM-yyyy): (J) PUBLICATION DATE (document publication date; for patent-type citations only, specify as dd-MMM-yyyy): (K) RELEVANT RESIDUES IN SEQ ID NO:X (rep): FROM (position) TO (position) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:X:]
>
Numeric     Definition                Comments and Format                             Mandatory (M) or
Identifier                                                                            Optional (O)

<100>       General Information       Leave blank after <100>                         M
<110>       Applicant                 Max. of 10 names; one name per line; use        M
                                      format: Surname, Other Names and/or Initials;
                                      rep
<120>       Title of Invention        Four lines maximum                              M
<130>       Number of Sequences       Use an integer as a response                    M
<140>       Correspondence Address    <140> must be present if subheadings            O
                                      <141>-<146> are used
<141>       Addressee                                                                 O
<142>       Street                    Four lines maximum                              O
<143>       City                                                                      O
<144>       State or Province                                                         O
<145>       Country                                                                   O
<146>       Zip or Postal Code                                                        O
<150>       Computer Readable Form    Leave blank after <150>                         O
<151>       Medium Type               Type of diskette/tape submitted                 O
<152>       Computer                  Type of computer used to create diskette/tape   O
<153>       Operating System          Type of operating system on computer            O
<154>       Software                  Type of software used to create computer        O
                                      readable form
<160>       Current Application Data  Leave blank after <160>; <160> must be          M, if available
                                      present if subheadings <161> & <162> are used
<161>       Application Number        Specify as: US 07/999,999 or PCT/US96/99999     M, if available
<162>       Filing Date               Specify as: dd-MMM-yyyy                         M, if available
<170>       Prior Application Data    Insert heading/subheadings only if              M, if applicable
                                      applicable; leave blank after <170>; <170>
                                      must be present if subheadings <171> & <172>
                                      are used; rep.
<171>       Application Number        Specify as: US 07/999,999 or PCT/US96/99999     M, if applicable
<172>       Filing Date               Specify as: dd-MMM-yyyy                         M, if applicable
<180>       Attorney/Agent            Leave blank after <180>                         O
            Information
<181>       Name                      Use format: Surname, Other Names and/or         O
                                      Initials
<182>       Registration Number                                                       O
<183>       File Reference/Docket                                                     O
            Number
<190>       Telecommunication         Leave blank after <190>                         O
            Information
<191>       Telephone                                                                 O
<192>       Telefax                                                                   O
<193>       Electronic mail address                                                   O
<200>       Information for           Response shall be an integer representing       M
            SEQ ID NO:#:              the SEQ ID NO shown; rep.
<210>       Sequence Characteristics  Leave blank after <210>                         M
<211>       Length                    Respond with an integer expressing the          M
                                      number of bases or amino acid residues
<212>       Type                      Whether presented sequence molecule is          M
                                      nucleotide or amino acid, indicated by N or A
<214>       Topology                  Whether presented sequence molecule is          M
                                      linear or circular, indicated as L or C
<290>       Feature                   Description of points of biological             M, if "N", "Xaa", or a modified or
                                      significance in the sequence; leave blank       unusual L-amino acid or modified base
                                      after <290>; rep.                               was used in the sequence
<291>       Name/Key                  Provide appropriate identifier for feature;     M, if "N", "Xaa", or a modified or
                                      four lines maximum                              unusual L-amino acid or modified base
                                                                                      was used in the sequence
<292>       Location                  Specify location within sequence; where         M, if "N", "Xaa", or a modified or
                                      appropriate state number of first and last      unusual L-amino acid or modified base
                                      bases/amino acids in feature; four lines        was used in the sequence
                                      maximum
<294>       Other Information         Other relevant information; four lines          M, if "N", "Xaa", or a modified or
                                      maximum                                         unusual L-amino acid or modified base
                                                                                      was used in the sequence
<300>       Publication Information   Leave blank after <300>; rep.                   O
<301>       Authors                   Maximum of ten named authors of publication;    O
                                      specify one name per line; use format:
                                      Surname, Other Names and/or Initials
<302>       Title                                                                     O
<303>       Journal                                                                   O
<304>       Volume                                                                    O
<305>       Issue                                                                     O
<306>       Pages                                                                     O
<307>       Date                      Journal date in which data published; specify   O
                                      as dd-MMM-yyyy, MMM-yyyy or Season-yyyy
<308>       Patent Document Number    Document number; for patent-type citations      O
                                      only
<309>       Filing Date               Document filing date, for patent-type           O
                                      citations only; specify as dd-MMM-yyyy
<310>       Publication Date          Document publication date, for patent-type      O
                                      citations only; specify as dd-MMM-yyyy
<311>       Relevant Residues         FROM (position) TO (position)                   O
<400>       Sequence Description:     Response shall be an integer representing       M
            SEQ ID NO:#:              the SEQ ID NO shown; rep.
<

6.
Section 1.824 is proposed to be revised to read as follows:

1.824 Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

(a) The computer readable form required by 1.821(e) shall [contain a printable copy of the "Sequence Listing," as defined in 1.821(c), 1.822 and 1.823, recorded as] >meet the following specifications:
(1) The computer readable form shall contain< a single [file on] >"Sequence Listing" as< either a diskette, [or a magnetic tape] >series of diskettes, or other permissible media outlined in 1.824(c)<. [The computer readable form shall be encoded and formatted such that a printed copy of the "Sequence Listing" may be recreated using the print commands of the computer/operating-system configurations specified in paragraph (f) of this section.]
[(b)] >(2)< The [file] >"Sequence Listing"< in paragraph (a) >(1)< of this section shall be [encoded in a subset of the] >submitted in< American Standard Code for Information Interchange (ASCII) >text<. [This subset shall consist of all printable ASCII characters including the ASCII space character plus line-termination, pagination and end-of-file characters associated with the computer/operating-system configurations specified in paragraph (f) of this section.] No other [characters] >formats< shall be allowed.
[(c)] >(3)< The computer readable form may be created by any means, such as word processors, nucleotide/amino acid sequence editors or other custom computer programs; however, it shall [be readable by one of the computer/operating-system configurations specified in paragraph (f) of this section, and shall] conform to [the] >all< specifications [in paragraphs (a) and (b) of] >detailed in< this section.
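As an illustration only, the ASCII text requirement in proposed paragraph (a)(2) could be checked before a file is copied to the submission medium. The sketch below is not part of the proposed rule. It assumes that "ASCII text" means every byte of the file falls in the 7-bit range 0x00-0x7F, and the function names are hypothetical.

```python
def non_ascii_positions(data: bytes):
    """Return (offset, byte value) for every byte outside the 7-bit ASCII range.

    Assumption: "ASCII text" is read here as "no byte above 0x7F".
    """
    return [(offset, value) for offset, value in enumerate(data) if value > 0x7F]


def is_ascii_text(data: bytes) -> bool:
    """True when the file content contains no byte above 0x7F."""
    return not non_ascii_positions(data)
```

A caller would read the candidate file in binary mode and pass its bytes to `is_ascii_text`; the offsets returned by `non_ascii_positions` point to the characters that would need to be replaced.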
[(d) The entire printable copy of the "Sequence Listing" shall be contained within one file on a single diskette or magnetic tape unless it is shown to the satisfaction of the Commissioner that it is not practical or possible to submit the entire printable copy of the "Sequence Listing" within one file on a single diskette or magnetic tape.
(e) The submitted diskette or tape shall be write-protected such as by covering or uncovering diskette holes, removing diskette write tabs or removing tape write rings.
(f) As set forth in paragraph (c), above, any means may be used to create the computer readable form, as long as the following conditions are satisfied. A submitted diskette shall be readable on one of the computer/operating-system configurations described in paragraphs (1) through (3), below. A submitted tape shall satisfy the format specifications described in paragraph (4), below.]
>(4) File compression is acceptable when using diskette media, so long as the compressed file is in a self-extracting format that will decompress on one of the systems described in paragraph (b) of this section.
(5) Page numbering shall not appear within the computer readable form version of the "Sequence Listing" file.
(6) All computer readable forms shall have a label permanently affixed thereto on which has been hand-printed or typed: the name of the applicant, the title of the invention, the name and type of computer and operating system used, and application serial number and filing date, if known.
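As an illustration only, the line terminator and pagination requirements stated in paragraph (b) of this section (ASCII Carriage Return plus ASCII Line Feed; continuous file with no "hard page break" codes) could be checked mechanically. The sketch below is not part of the proposed rule. It assumes that a hard page break is encoded as the ASCII Form Feed character (0x0C), and the function name is hypothetical.

```python
def format_problems(data: bytes):
    """Return a list of human-readable problems found in the file content."""
    problems = []
    # Assumption: a "hard page break" code is the ASCII Form Feed byte (0x0C).
    if b"\x0c" in data:
        problems.append("form feed (hard page break) present")
    # Remove every well-formed CR+LF pair; any CR or LF that remains
    # is a bare terminator that does not match the required pair.
    remainder = data.replace(b"\r\n", b"")
    if b"\r" in remainder or b"\n" in remainder:
        problems.append("line terminator other than CR+LF present")
    return problems
```

An empty list means that neither problem was detected; it does not establish that the file meets every other requirement of this section.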
(b) Computer readable form files submitted must meet these format requirements:<
(1) Computer: IBM PC/XT/AT, >or compatibles< [IBM PS/2 or compatibles]>, or Apple Macintosh<;
[(i)]>(2)< Operating System: [PC-DOS or] MS-DOS [(Versions 2.1 or above)]>, Unix or Macintosh<;
[(ii)]>(3)< Line Terminator: ASCII Carriage Return plus ASCII Line Feed;
[(iii)]>(4)< Pagination: [ASCII Form Feed or Series of Line Terminators] >Continuous file (no "hard page break" codes permitted)<;
[(iv) End-of-File: ASCII SUB (Ctrl-Z); (v) Media:]
>(c) Computer readable form files submitted may be in any of the following media:<
[(A) Diskette - 5.25 inch, 360 Kb storage; (B) Diskette - 5.25 inch, 1.2 Mb storage; (C) Diskette - 3.50 inch, 720 Kb storage; (D) Diskette - 3.5 inch, 1.44 Mb storage;]
>(1) Diskette: 3.50 inch, 1.44 Mb storage; 3.50 inch, 720 Kb storage; 5.25 inch, 1.2 Mb storage; 5.25 inch, 360 Kb storage;<
[(vi) Print Command: PRINT filename.extension;
(2) Computer: IBM PC/XT/AT, IBM PS/2 or compatibles; (i) Operating system: Xenix; (ii) Line Terminator: ASCII Carriage Return; (iii) Pagination: ASCII Form Feed or Series of Line Terminators; (iv) End-of-File: None; (v) Media: (A) Diskette - 5.25 inch, 360 Kb storage; (B) Diskette - 5.25 inch, 1.2 Mb storage; (C) Diskette - 3.50 inch, 720 Kb storage; (D) Diskette - 3.5 inch, 1.44 Mb storage; (vi) Print Command: lpr filename;
(3) Computer: Apple Macintosh; (i) Operating System: Macintosh; (ii) Macintosh File Type: text with line termination; (iii) Line Terminator: Pre-defined by text type file; (iv) Pagination: Pre-defined by text type file; (v) End-of-File: Pre-defined by text type file; (vi) Media: (A) Diskette - 3.50 inch, 400 Kb storage; (B) Diskette - 3.50 inch, 800 Kb storage; (C) Diskette - 3.50 inch, 1.4 Mb storage; (vii) Print Command: Use PRINT command from any Macintosh Application that processes text files, such as Mac-Write or TeachText;
(4) Magnetic tape: 0.5 inch, up to 2400 feet; (i) Density: 1600 or 6250 bits per inch, 9
track; (ii) Format: raw, unblocked; (iii) Line Terminator: ASCII Carriage Return plus optional ASCII Line Feed; (iv) Pagination: ASCII Form Feed or Series of Line Terminators; (v) Print Command (Unix shell version given here as sample response - mt/dev/rmt0; lpr/dev/rmt0):]
>(2) Magnetic tape: 0.5 inch, up to 24000 feet; Density: 1600 or 6250 bits per inch, 9 track; Format: Unix tar command; specify blocking factor (not "block size"); Line Terminator: ASCII Carriage Return plus ASCII Line Feed;
(3) 8mm Data Cartridge: Format: Unix tar command; specify blocking factor (not "block size"); Line Terminator: ASCII Carriage Return plus ASCII Line Feed;
(4) CD-ROM: Format: ISO 9660 or High Sierra Format;
(5) Magneto Optical Disk: Size/Storage Specifications: 5.25 inch, 640 Mb<
[(g)]>(d)< Computer readable forms that are submitted to the Office will not be returned to the applicant.
[(h) All computer readable forms shall have a label permanently affixed thereto on which has been hand-printed or typed, a description of the format of the computer readable form as well as the name of the applicant, the title of the invention, the date on which the data were recorded on the computer readable form and the name and type of computer and operating system which generated the files on the computer readable form. If all this information cannot be printed on a label affixed to the computer readable form, by reason of size or otherwise, the label shall include the name of the applicant and the title of the invention and a reference number, and the additional information may be provided on a container for the computer readable form with the name of the applicant, the title of the invention, the reference number and the additional information affixed to the container. If the computer readable form is submitted after the date of filing under 35 U.S.C. 111, after the date of entry in the national stage under 35 U.S.C.
371 or after the time of filing, in the United States Receiving Office, an international application under the PCT, the labels mentioned herein must also include the date of the application number, including series code and serial number.]

7. Section 1.825 is proposed to be amended by revising paragraphs (a), (b) and (d) to read as follows:

1.825 Amendments to or replacement of sequence listing and computer readable copy thereof.

(a) Any amendment to the paper copy of the "Sequence Listing" (§ 1.821(c)) must be made by the submission of substitute sheets. Amendments must be accompanied by a statement that indicates support for the amendment in the application, as filed, and a statement that the substitute sheets include no new matter. Such a statement must be a verified statement if made by a person not registered to practice before the Office.
(b) Any amendment to the paper copy of the "Sequence Listing," in accordance with paragraph (a) of this section, must be accompanied by a substitute copy of the computer readable form (§ 1.821(e)) including all previously submitted data with the amendment incorporated therein, accompanied by a statement that the copy in computer readable form is the same as the substitute copy of the "Sequence Listing." Such a statement must be a verified statement if made by a person not registered to practice before the Office.
(c) * * *
(d) If, upon receipt, the computer readable form is found to be damaged or unreadable, applicant must provide, within such time as set by the Commissioner, a substitute copy of the data in computer readable form accompanied by a statement that the substitute data is identical to that originally filed. Such a statement must be a verified statement if made by a person not registered to practice before the Office.

8.
Appendix A to Subpart G is proposed to be revised to read as follows: Appendix A To Subpart G Of Part 1 - Sample Sequence Listing [(1) GENERAL INFORMATION: (i) APPLICANT: Doe, Joan X, Doe, John Q (ii) TITLE OF INVENTION: Isolation and Characterization of a Gene Encoding a Protease from Paramecium sp. (iii) NUMBER OF SEQUENCES: 2 (iv) CORRESPONDENCE ADDRESS: (A) ADDRESSEE: Smith and Jones (B) STREET: 123 Main Street (C) CITY: Smalltown (D) STATE: Anystate (E) COUNTRY: USA (F) ZIP: 12345 (v) COMPUTER READABLE FORM: (A) MEDIUM TYPE: Diskette, 3.50 inch, 800 Kb storage (B) COMPUTER: Apple Macintosh (C) OPERATING SYSTEM: Macintosh 5.0 (D) SOFTWARE: MacWrite (vi) CURRENT APPLICATION DATA: (A) APPLICATION NUMBER: 09/999,999 (B) FILING DATE: 28-FEB-1989 (C) CLASSIFICATION: 999/99 (vii) PRIOR APPLICATION DATA: (A) APPLICATION NUMBER: PCT/US88/99999 (B) FILING DATE: 01-MAR-1988 (viii) ATTORNEY/AGENT INFORMATION: (A) NAME: Smith, John A (B) REGISTRATION NUMBER: 00001 (C) REFERENCE/DOCKET NUMBER: 01-0001 (ix) TELECOMMUNICATIONS INFORMATION: (A) TELEPHONE: (909) 999-001 (B) TELEFAX: (909) 999-0002 (2) INFORMATION FOR SEQ ID NO: 1: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 954 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULE TYPE: genomic DNA (iii) HYPOTHETICAL: yes (iv) ANTI-SENSE: no (vi) ORIGINAL SOURCE: (A) ORGANISM: Paramecium sp (C) INDIVIDUAL/ISOLATE: XYZ2 (G) CELL TYPE: unicellular organism (vii) IMMEDIATE SOURCE: (A) LIBRARY: genomic (B) CLONE: Para-XYZ2/36 (x) PUBLICATION INFORMATION: (A) AUTHORS: Doe, Joan X, Doe, John Q (B) TITLE: Isolation and Characterization of a Gene Encoding a Protease from Paramecium sp. 
(C) JOURNAL: Fictional Genes (D) VOLUME: I (E) ISSUE: 1 (F) PAGES: 1-20 (G) DATE: 02-MAR-1988 (K) RELEVANT RESIDUES IN SEQ ID NO: 1: FROM 1 TO 954 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: ATCGGGATAG TACTGGTCAA GACCGGTGGA CACCGGTTAA CCCCGGTTAA GTACCGGTTA 60 TAGGCCATTT CAGGCCAAAT GTGCCCAACT ACGCCAATTG TTTTGCCAAC GGCCAACGTT 120 ACGTTCGTAC GCACGTATGT ACCTAGGTAC TTACGGACGT GACTACGGAC ACTTCCGTAC 180 GTACGTACGT TTACGTACCC ATCCCAACGT AACCACAGTG TGGTCGCAGT GTCCCAGTGT 240 ACACAGACTG CCAGACATTC TTCACAGACA CCCC ATG ACA CCA CCT GAA CGT CTC 295 Met Thr Pro Pro Glu Arg Leu -30 TTC CTC CCA AGG GTG TGT GGC ACC ACC CTA CAC CTC CTC CTT CTG GGG 343 Phe Leu Pro Arg Val Cys Gly Thr Thr Leu His Leu Leu Leu Leu Gly -25 -20 -15 CTG CTG CTG GTT CTG CTG CCT GGG GCC CAT GTGAGGCAGC AGGAGAATGG 393 Leu Leu Leu Val Leu Leu Pro Gly Ala His -10 -5 GGTGGCTCAG CCAAACCTTG AGCCCTAGAG CCCCCCTCAA CTCTGTTCTC CTAG GGG Gly 450 CTC ATG CAT CTT GCC CAC AGC AAC CTC AAA CCT GCT GCT CAC CTC ATT 498 Leu Met His Leu Ala His Ser Asn Leu Lys Pro Ala Ala His Leu Ile 1 5 10 15 GTAAACATCC ACCTGACCTC CCAGACATGT CCCCACCAGC TCTCCTCCTA CCCCTGCCTC 558 AGGAACCCAA GCATCCACCC CTCTCCCCCA ACTTCCCCCA CGCTAAAAAA AACAGAGGGA 618 GCCCACTCCT ATGCCTCCCC CTGCCATCCC CCAGGAACTC AGTTGTTCAG TGCCCACTTC 678 TAC CCC AGC AAG CAG AAC TCA CTG CTC TGG AGA GCA AAC ACG GAC CGT 726 Tyr Pro Ser Lys Gln Asn Ser Leu Leu Trp Arg Ala Asn Thr Asp Arg 20 25 30 GCC TTC CTC CAG GAT GGT TTC TCC TTG AGC AAC AAT TCT CTC CTG GTC 774 Ala Phe Leu Gln Asp Gly Phe Ser Leu Ser Asn Asn Ser Leu Leu Val 35 40 45 TAGAAAAAAT AATTGATTTC AAGACCTTCT CCCCATTCTG CCTCCATTCT GACCATTTCA 834 GGGGTCGTCA CCACCTCTCC TTTGGCCATT CCAACAGCTC AAGTCTTCCC TGATCAAGTC 894 ACCGGAGCTT TCAAAGAAGG AATTCTAGGC ATCCCAGGGG ACCCACACCT CCCTGAACCA 954 (2) INFORMATION FOR SEQ ID NO: 2: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 82 amino acids (B) TYPE: amino acid (C) TOPOLOGY: linear (ii) MOLECULE TYPE: protein (ix) FEATURE: (A) NAME/KEY: signal sequence (B) LOCATION: -34 to -1 (C) IDENTIFICATION METHOD: 
similarity to other signal sequences, hydrophobic (D) OTHER INFORMATION: expresses protease (x) PUBLICATION INFORMATION: (A) AUTHORS: Doe, Joan X, Doe, John Q (B) TITLE: Isolation and Characterization of a Gene Encoding a Protease from Paramecium sp. (C) JOURNAL: Fictional Genes (D) VOLUME: I (E) ISSUE: 1 (F) PAGES: 1-20 (G) DATE: 02-MAR-1988 (H) RELEVANT RESIDUES IN SEQ ID NO:2: FROM -34 TO 48 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: Met Thr Pro Pro Glu Arg Leu Phe Leu Pro Arg Val Cys Gly Thr Thr -30 -25 -20 Leu His Leu Leu Leu Leu Gly Leu Leu Leu Val Leu Leu Pro Gly Ala -15 -10 -5 His Gly Leu Met His Leu Ala His Ser Asn Leu Lys Pro Ala Ala His 1 5 10 Leu Ile Tyr Pro Ser Lys Gln Asn Ser Leu Leu Trp Arg Ala Asn Thr 15 20 25 30 Asp Arg Ala Phe Leu Gln Asp Gly Phe Ser Leu Ser Asn Asn Ser Leu 35 40 45 Leu Val ] > <100> <110> Doe, Joan X, Doe, John Q <120> Isolation and Characterization of a Gene Encoding a Protease from Paramecium sp. <130> 2 <140> <141> Smith and Jones <142> 123 Main Street <143> Smalltown <144> Anystate <145> USA <146> 12345 <150> <151> Floppy disk <152> IBM PC compatible <153> PC-DOS/MS-DOS <154> PatentIn Release #2.00 <160> <161> 09/999,999 <162> 28-FEB-1989 <170> <171> PCT/US/88/99999 <172> 01-MAR-1988 <180> <181> Smith, John A <182> REGISTRATION NUMBER: 00001 <183> 01-0001 <190> <191> (909) 999-0001 <192> (909) 999-0002 <200> 1 <210> <211> 954 base pairs <212> N <214> L <290> <291> CDS <292> join(275..373, 448..498, 679..774) <290> <291> mat_peptide <292> join(451..498, 679..774) <300> <301> Doe , Joan X, Doe, John Q <302> Isolation and Characterization of a Gene Encoding a Protease from Paramecium sp. 
<303> Fictional Genes <304> 1 <305> 1 <306> 1-20 <307> 02-MAR-1988 <308> FROM 1 TO 957 <400> 1 atcgggatag tactggtcaa gaccggtgga caccggttaa ccccggttaa gtaccggtta 60 taggccattt caggccaaat gtgcccaact acgccaattg ttttgccaac ggccaacgtt 120 acgttcgtac gcacgtatgt acctaggtac ttacggacgt gactacggac acttccgtac 180 gtacgtacgt ttacgtaccc atcccaacgt aaccacagtg tggtcgcagt gtcccagtgt 240 acacagactg ccagacattc ttcacagaca cccc atg aca cca cct gaa cgt 292 Met Thr Pro Pro Glu Arg -30 ctc ttc ctc cca agg gtg tgt ggc acc acc cta cac ctc ctc ctt ctg 340 Leu Phe Leu Pro Arg Val Cys Gly Thr Thr Leu His Leu Leu Leu Leu -25 -20 -15 ggg ctg ctg ctg gtt ctg ctg cct ggg gcc cat gtgaggcagc aggagaatgg 393 Gly Leu Leu Leu Val Leu Leu Pro Gly Ala His -10 -5 ggtggctcag ccaaaccttg agccctagag cccccctcaa ctctgttctc ctag ggg 450 Gly ctc atg cat ctt gcc cac agc aac ctc aaa cct gct gct cac ctc att 498 Leu Met His Leu Ala His Ser Asn Leu Lys Pro Ala Ala His Leu Ile 1 5 10 15 gtaaacatcc acctgacctc ccagacatgt ccccaccagc tctcctccta cccctgcctc 558 aggaacccaa gcatccaccc ctctccccca acttccccca cgctaaaaaa aacagaggga 618 gcccactcct atgcctcccc ctgccatccc ccaggaactc agttgttcag tgcccacttc 678 tac ccc agc aag cag aac tca ctg ctc tgg aga gca aac acg gac cgt 726 Tyr Pro Ser Lys Gln Asn Ser Leu Leu Trp Arg Ala Asn Thr Asp Arg 20 25 30 gcc ttc ctc cag gat ggt ttc tcc ttg agc aac aat tct ctc ctg gtc 774 Ala Phe Leu Gln Asp Gly Phe Ser Leu Ser Asn Asn Ser Leu Leu Val 35 40 45 tagaaaaaat aattgatttc aagaccttct ccccattctg cctccattct gaccatttca 834 ggggtcgtca ccacctctcc tttggccatt ccaacagctc aagtcttccc tgatcaagtc 894 accggagctt tcaaagaagg aattctaggc atcccagggg acccacacct ccctgaacca 954 <200> 2 <210> <211> 82 amino acids <212> A <214> L <400> 2 Met Thr Pro Pro Glu Arg Leu Phe Leu Pro Arg Val Cys Gly Thr Thr -30 -25 -20 Leu His Leu Leu Leu Leu Gly Leu Leu Leu Val Leu Leu Pro Gly Ala -15 -10 -5 His Gly Leu Met His Leu Ala His Ser Asn Leu Lys Pro Ala Ala His 1 5 10 Leu Ile Tyr Pro Ser Lys Gln Asn Ser Leu Leu Trp Arg Ala Asn Thr 15 20 
25 30 Asp Arg Ala Phe Leu Gln Asp Gly Phe Ser Leu Ser Asn Asn Ser Leu 35 40 45 Leu Val <

9. Appendix B to Subpart G is proposed to be removed.

[Appendix B To Subpart G of Part 1 - Headings For Information Items In 1.823
(1) GENERAL INFORMATION:
(i) APPLICANT:
(ii) TITLE OF INVENTION:
(iii) NUMBER OF SEQUENCES:
(iv) CORRESPONDENCE ADDRESS: (A) ADDRESSEE: (B) STREET: (C) CITY: (D) STATE: (E) COUNTRY: (F) ZIP:
(v) COMPUTER READABLE FORM: (A) MEDIUM TYPE: (B) COMPUTER: (C) OPERATING SYSTEM: (D) SOFTWARE:
(vi) CURRENT APPLICATION DATA: (A) APPLICATION NUMBER: (B) FILING DATE: (C) CLASSIFICATION:
(vii) PRIOR APPLICATION DATA: (A) APPLICATION NUMBER: (B) FILING DATE:
(viii) ATTORNEY/AGENT INFORMATION: (A) NAME: (B) REGISTRATION NUMBER: (C) REFERENCE/DOCKET NUMBER:
(ix) TELECOMMUNICATIONS INFORMATION: (A) TELEPHONE: (B) TELEFAX: (C) TELEX:
(2) INFORMATION FOR SEQ ID NO: X:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: (B) TYPE: (C) STRANDEDNESS: (D) TOPOLOGY:
(ii) MOLECULE TYPE: - Genomic RNA; - Genomic DNA; - mRNA; - tRNA; - rRNA; - snRNA; - scRNA; - preRNA; - cDNA to genomic RNA; - cDNA to mRNA; - cDNA to tRNA; - cDNA to rRNA; - cDNA to snRNA; - cDNA to scRNA; - Other nucleic acid; (A) DESCRIPTION: - protein and - peptide.
(iii) HYPOTHETICAL:
(iv) ANTI-SENSE:
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE: (A) ORGANISM: (B) STRAIN: (C) INDIVIDUAL ISOLATE: (D) DEVELOPMENTAL STAGE: (E) HAPLOTYPE: (F) TISSUE TYPE: (G) CELL TYPE: (H) CELL LINE: (I) ORGANELLE:
(vii) IMMEDIATE SOURCE: (A) LIBRARY: (B) CLONE:
(viii) POSITION IN GENOME: (A) CHROMOSOME/SEGMENT: (B) MAP POSITION: (C) UNITS:
(ix) FEATURE: (A) NAME/KEY: (B) LOCATION: (C) IDENTIFICATION METHOD: (D) OTHER INFORMATION:
(x) PUBLICATION INFORMATION: (A) AUTHORS: (B) TITLE: (C) JOURNAL: (D) VOLUME: (E) ISSUE: (F) PAGES: (G) DATE: (H) DOCUMENT NUMBER: (I) FILING DATE: (J) PUBLICATION DATE: (K) RELEVANT RESIDUES:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:X: ]

September 23, 1996
BRUCE A. LEHMAN
Assistant Secretary of Commerce and Commissioner of Patents and Trademarks
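As an illustration only, the numeric identifier format proposed for 1.823 and shown in the Appendix A sample lends itself to mechanical reading. The sketch below is not part of the proposed rule and is not the Office's software. It assumes that every identifier appears as three digits in angle brackets and that no response value itself contains such a token; the function names and the choice of <100>, <110>, <120> and <130> as the general identifiers to check are illustrative, based on the entries marked "M" without a condition in the table of identifiers.

```python
import re

# Matches a numeric identifier such as <110> and captures its three digits.
IDENTIFIER = re.compile(r"<(\d{3})>")

# General-information identifiers marked "M" without a condition in the table.
MANDATORY_GENERAL = (100, 110, 120, 130)


def parse_identifiers(text: str):
    """Return (identifier, value) pairs in the order they appear in the listing."""
    # re.split with a capturing group yields:
    # [text before first id, id, value, id, value, ...]
    parts = IDENTIFIER.split(text)
    pairs = []
    for index in range(1, len(parts), 2):
        # Collapse runs of whitespace so wrapped responses become one line.
        value = " ".join(parts[index + 1].split())
        pairs.append((int(parts[index]), value))
    return pairs


def missing_mandatory(pairs):
    """Return the mandatory general identifiers that do not appear at all."""
    present = {identifier for identifier, _ in pairs}
    return [identifier for identifier in MANDATORY_GENERAL if identifier not in present]
```

Heading identifiers such as <100>, which the table directs be left blank, come back with an empty string as their value, so a caller can tell headings from responses.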