DEPARTMENT OF COMMERCE

Patent and Trademark Office

37 CFR Part 1

[Docket No: 960828235-6235-01]
RIN 0651-AA88


Changes Implementing Nucleotide and/or Amino Acid Sequence Listings

AGENCY: Patent and Trademark Office, Commerce.

ACTION: Notice of proposed rulemaking and request for comments.

-----------------------------------------------------------------------

SUMMARY: The Patent and Trademark Office (PTO) is proposing to amend
the rules for submitting nucleic acid or amino acid sequences in
computer readable form (CRF) for patent applications to simplify the
requirements of the rules, to rearrange portions of the rules for
better understanding and to establish consistent rules to permit a
single internationally acceptable computer readable form. The Sequence
Listing will be presented in an international, language neutral format
using numeric identifiers rather than the current subject headings and
the paper Sequence Listing will be a separately numbered section of the
patent application. Sequences which contain fewer than four (4)
specifically identified nucleotides or amino acids will no longer be
required to be submitted in computer readable form.

DATE: Written comments must be received by December 3, 1996.

ADDRESSES: Address written comments to: Box Comments--Patents,
Assistant Commissioner for Patents, Washington, DC 20231, Attention:
Esther M. Kepplinger or by Fax to (703) 305-3601 to her attention.
Comments may be sent by electronic mail message over the Internet to
seqrule@uspto.gov. The written comments will be available for public
inspection in Suite 520, Crystal Park One, 2011 Crystal Drive,
Arlington, Virginia.

FOR FURTHER INFORMATION CONTACT: Esther M. Kepplinger, by telephone at
(703) 308-2339 or by mail addressed to: Box Comments--Patents,
Assistant Commissioner for Patents, Washington, DC 20231 marked to her
attention or by Fax to (703) 305-3601 or by electronic mail at
ekepplin@uspto.gov.

SUPPLEMENTARY INFORMATION: The existing sequence rules (37 CFR 1.821-
1.825) provide a standardized format for the description of nucleotide
and amino acid sequence data in patent applications and require the
submission of such sequences in computer readable form (CRF). The
existing sequence rules have provided the following benefits to the
PTO: (1) Improved search capabilities; (2) improved interference
detection; (3) more efficient examination; (4) cost savings for the
input of the sequence data; (5) more efficient and accurate printing of
sequences in patents; (6) exchange of the sequence data with other
patent offices electronically and (7) improved public access to the
sequences electronically.
    In an effort to streamline and reduce the procedural requirements
of the existing rules and to respond to the needs of our customers
while establishing an internationally acceptable standard, the PTO
proposes to modify the current rules requiring the submission of
computer readable forms for nucleotide and amino acid sequences.
    To decrease the burden on applicants who file applications
containing nucleotide and amino acid sequence information under the
Patent Cooperation Treaty (PCT), the PTO entered into discussions at
the PCT Meeting of International Authorities (MIA) in November 1994 on
changing the applicable rules for submission and transfer of Sequence
Listings. Under the current PCT rules, each International Searching
Authority and national Office may set the standard for submission of
the paper and electronic Sequence Listing information. This may impose
a burden on applicants of providing several different formats of
Sequence Listings in different languages during the international and
national phases of the PCT procedure.
    Under the current PCT practice, the applicant serves as the data
repository for requests during each stage of the PCT practice for new
electronic copies of the Sequence Listings.
    Under national practice, a Sequence Listing may be required to be
translated into the national language at considerable cost, with the
danger that the data could be inadvertently altered.
    To address these problems, rule changes were proposed at the
November 1994 MIA to require a language neutral Sequence Listing
submission which would suffice for PCT and national stage sequence
information processing. Initial Trilateral meetings and correspondence
suggest that
such a sequence submission would be acceptable under European Patent
Office (EPO) and Japanese Patent Office (JPO) procedures, thus further
lessening the burden on applicants.
    These sequence rules are proposed to be revised in concert with
World Intellectual Property Organization (WIPO) International Standards
ST.23 and ST.24 for the paper and electronic submission of sequence
information in patent applications, as well as PCT requirements. This
should result in an applicant having to produce a single Sequence
Listing that would satisfy the filing requirements in all countries, as
well as permitting an applicant to submit only a single electronic
Sequence Listing in PCT applications.
    In an effort to profit from the experiences of the nucleotide
database information providers which pioneered the electronic
submission of sequence information, the PTO discussed with them the
possible simplification of the PTO sequence submission rules. In
response to their advice (which confirmed the PTO experience), the
number of mandatory data elements is proposed to be reduced.
    Thus, the proposed rule changes include:
    (1) Use of numeric identifiers to replace the language subject
headings within the submission;
    (2) Elimination of unnecessary and confusing data elements;
    (3) Movement of the paper Sequence Listing to the end of the
application as a section with separately numbered pages;
    (4) Modification of 37 CFR Sec. 1.77 to include the paper Sequence
Listing as a part of the specification and to provide a place for the
paper Sequence Listing in the printed patent;
    (5) Elimination of the requirement to provide a submission for
sequences with fewer than four specifically defined nucleotides or
amino acids;
    (6) Use of lower-case one-letter codes for nucleotide bases;
    (7) Rearrangement of portions of the rules to improve their
context; and
    (8) Clarification and simplification of the rules to aid in
understanding of the requirements that they set forth.

Request for Comments

    The PTO is particularly interested in receiving comments on three
queries. Currently sequences containing D-amino acids need not be
provided in the ``Sequence Listing'', but the PTO has accepted
voluntary submissions of sequences containing D-amino acids.
    The commercially available sequence searching software used to
search prior art databases is not capable of discerning D-amino acids
since they do not have distinct designators. It is for this reason that
the rules do not require a computer readable form for the disclosure of
sequences which contain D-amino acids.
    Those seeking to volunteer the information in accordance with these
rules might be seeking assurance that a machine search of the closest
prior art will be conducted by the PTO or they consider the information
useful and wish it to be in the database. If the PTO does not accept
voluntary submissions, that would exclude information from the
databases that at least some applicants believe to be valuable
information.
    The potential conflict created by accepting these D-amino acid-
containing sequences is that the published database will contain
sequences with D-amino acids and those using the published database may
be operating on the assumption that it does not, given the indication
in Sec. 1.821(a)(2) that D-amino acid-containing sequences are not
intended to be included. For this reason, there may be an advantage to
having the D-amino acids indicated by Xaa to alert the user that the
Feature section must be consulted. A disadvantage of voluntary
submissions is that they will result in the generation of a database
which is incomplete and cannot be relied upon to provide a complete
search of the U.S. patent literature including sequences containing D-
amino acids.
    The PTO seeks comments on the following query:

    (1) Should the PTO accept voluntary submissions of computer
readable forms and Sequence Listings where a D-amino acid is
contained in the sequence? If such voluntary submissions are
accepted, should there be a restriction on the choice of identifying
a D-amino acid by an Xaa or by its L-amino acid counterpart
abbreviation?

    Section 1.821(c) will continue to require that all sequence
information contained in a disclosure, including in the specification,
drawings or claims, be presented in the Sequence Listing in accordance
with Secs. 1.821--1.825. This provision does not discriminate between
prior art sequences and ``new'' sequences. The PTO has received
comments in the past and is seeking additional comments on this issue.
The suggestion has been made that sequences which are prior art, and/or
are contained in a database at the time of filing, need not be provided
to the PTO in computer readable form since the sequence information is
obtainable by other means. Responsive to these public comments, the PTO
is considering amending the rules to permit omission of some sequences
from the Sequence Listing if these sequences are admitted prior art to
applicant and are in a publicly available, electronic, sequence
database and the database accession number is supplied.
    The suggestion to exclude prior art sequences was made when
Secs. 1.821-1.825 were originally adopted. 55 FR 18230, 18237 (1990).
The final rules, however, required the submission of all sequence
information in computer readable form. The reasons for that decision
include: (1) The assessment of whether a particular sequence falls
within the requirements of the current rules is simple; (2) the general
public is assured that all patents which contain any sequence
information contain all of the sequence information in the Sequence
Listing and all sequences are available in a computer accessible form;
(3) as a publication, the contextual association of new and old
information is potentially unique to the patent and very valuable to
anyone assessing the state of the art at the time of a patented
invention, and thus is desirable to be present in electronic form in
association with that patent; and (4) these rules do not require any
information to be disclosed in the form of a sequence, but rather
require a particular format whenever information is presented in the
form of a sequence. These reasons continue to be relevant.
    The PTO is concerned about how such a provision would be drafted
without creating difficult questions. A provision which excludes
sequences whenever a sequence is prior art and has previously been
included in a publicly available, electronic, sequence database appears
to be straightforward; however, many technical and legal issues would
result. What constitutes a publicly available, electronic, sequence
database? Would the USPTO and the other patent offices which have
similar rules be required to produce a list of internationally accepted
databases? What would be the criteria for such acceptance? An
additional issue would exist involving electronic records maintenance:
is there any assurance that once information is contained in a database
that it will be retained and available indefinitely without alteration?
Changes to the information in nucleic acid sequence databases resulting
from the discovery of sequencing errors are well-known. Does the mere
existence of the sequence information in such a record constitute
reasonable means of retrieval? Would not one need some text basis or
other identifier to retrieve the information?
    Concerns have been voiced that the redundancy of including old
sequences in the PTO database creates electronic searching problems,
such as increased cost and reduced speed. Upon investigation, it has
been found that requiring all disclosed sequences to be included in the
Sequence Listing does not cause search processing problems at the PTO
or incur increased costs. The PTO seeks comments on the following
query:

    (2) Should the provisions of 37 CFR 1.821(c) be altered to
exclude some prior art sequences from inclusion in the Sequence
Listing even though they are presented in a patent application
disclosure as sequences? Should the reference to an accession number
of an admitted prior art sequence in a publicly available,
electronic, sequence database suffice and exclude that sequence from
the requirements of the sequence rules?

    At the November 1994 MIA, it was proposed that the Sequence
Listings submitted in an international application filed under the PCT
would no longer be published on paper. It was suggested that the
Sequence Listings be published electronically and be available in the
electronic form from several sequence repositories throughout the
world. These repositories would have the Sequence Listings available in
electronic form at the time of publication of the PCT pamphlet.
    The PTO seeks comments on the following query:

    (3) Should Sequence Listings filed in an international
application filed under the PCT be published only electronically and
made available for retrieval electronically by an accession number
from several sequence repositories?

    Written comments will be available for public inspection and will
be available on the Internet (address: www.uspto.gov). Commentators
should note that since their comments will be made publicly available,
information that is not desired to be made public, such as the address
and phone number of the commentator, should not be included in the
comments. A public hearing will not be conducted.

Discussion of Specific Rules

    Section 1.77 is proposed to be amended by revising paragraph (g),
which would provide for a reference to a Sequence Listing Annex, if any
exists. In the application as filed, on a separate page immediately
before the claims, reference would be made to a Sequence Listing Annex
and the Sequence Listing would be provided as a separately numbered
section or Annex to the application. In a printed patent the Sequence
Listing would appear immediately before the claims.
    Section 1.77 is proposed to be amended to redesignate existing
paragraphs (g)-(j) as paragraphs (h)-(k) and add an additional
paragraph (l) Sequence Listing Annex. In the application as filed, the
Sequence Listing would be provided by applicants as a separately
numbered section or Annex of the application. The pages of the
Sequence Listing Annex
should be numbered independently from the specification using
sequential integers preceded by ``A'' to identify them as a part of the
Annex and to prevent any confusion which might arise from using numbers
already used in the specification. In a printed patent the Sequence
Listing would be printed immediately before the claims. In cases where
the Sequence Listing is voluminous, the files are difficult to handle.
This change would permit easier storage of very large Sequence Listings
apart from the main part of the application during pendency. The
presentation of the Sequence Listing as a separate Annex would also
facilitate compliance with PCT requirements and other national patent
office rules.
    Sections 1.821(a)(1) and (2) are proposed to be amended by
referring to sections in World Intellectual Property Organization
(WIPO) Handbook on Industrial Property Information and Documentation,
Standard ST.23, paragraphs 8 through 12, April 1994, herein
incorporated by reference, rather than to paragraphs in Sec. 1.822. The
WIPO Standard ST. 23 (April 1994) is consistent with Sec. 1.822 except
for certain corrections which are noted herein and the requirement of
the use of the lower case for the one-letter code for nucleotide bases.
The proposed rule states that the incorporation has been approved. This
language is required by the Federal Register. This incorporation by
reference will be reviewed by the Director of the Federal Register in
accordance with 5 U.S.C. 552(a) and 1 CFR part 51 before any Final Rule
is adopted. Copies may be obtained from the World Intellectual Property
Organization; 34 chemin des Colombettes; 1211 Geneva 20 Switzerland.
Copies may be inspected at the Patent Search Room; Crystal Plaza 3,
Lobby Level; 2021 South Clark Place; Arlington, VA 22202; or at the
Office of the Federal Register, 800 North Capitol Street, NW, Suite
700, Washington, DC 20408.
    Section 1.821(a) is proposed to be amended so that sequences with
fewer than four specifically defined amino acids or nucleotides would
be expressly excluded from this rule. ``Specifically defined'' means
those amino acids other than ``Xaa'' and those nucleotide bases other
than ``N'' defined in accordance with WIPO Standard ST.23.
    This change is being proposed to reduce the burden on applicants
for those sequences that contain only a minimal amount of sequence
information. For example, if an amino acid sequence is disclosed as
being entirely ``Xaa'' residues, the 1990 version of the sequence rules
would require this sequence to be submitted in computer readable form.
However, this sequence has no value as sequence information because
each of the positions is represented as a ``wild card.'' Such low-
information sequences are not very useful in any sequence matching and
alignment algorithm. In order to minimize the inclusion of such low-
information-value sequence data in the database and to relieve the
burden on applicants to submit low-information-value sequences, the
Office proposes this change to the sequence rules. If applicants should
wish to voluntarily submit a CRF for such sequences, they would be
accepted and entered in the PTO's database.
    It is not necessary that any of the non-N or non-Xaa residues be
adjacent to any other non-N or non-Xaa residue in order for a sequence
to be subject to Sec. 1.821(a).
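    As an illustration only, the counting rule described above can be
sketched in Python. The function names, the ``kind'' argument and the
input representation (a string of one-letter codes for nucleotides, a
list of three-letter codes for amino acids) are assumptions made for
this sketch and are not part of the proposed rules. The sketch follows
the text literally: every residue other than ``N'' or ``Xaa'' counts
as specifically defined, and adjacency is not considered.

```python
def count_specifically_defined(residues, kind):
    """Count residues that are not the wild card for the given kind.

    kind is "nucleotide" (wild card "N") or anything else, which is
    treated as amino acid (wild card "Xaa"). Both are assumptions of
    this sketch, not terms defined by the rules.
    """
    wildcard = "n" if kind == "nucleotide" else "xaa"
    return sum(1 for residue in residues if residue.lower() != wildcard)


def subject_to_rule(residues, kind, minimum=4):
    """Return True when at least `minimum` residues are specifically defined."""
    return count_specifically_defined(residues, kind) >= minimum


# The defined residues need not be adjacent to one another.
print(subject_to_rule("annngnncnnt", "nucleotide"))
# A sequence made up entirely of Xaa carries no sequence information.
print(subject_to_rule(["Xaa"] * 10, "amino acid"))
```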
    Sections 1.821(a)(2) and 1.822(b) are proposed to be amended by
changing ``elsewhere in the `Sequence Listing' '' to ``in the Feature
section.'' The purpose of this change is to enhance clarity of the
rule. The only place in the ``Sequence Listing'' where additional
information is permitted is in the Feature section. The current
language implies that there are other acceptable portions of the
``Sequence Listing'' appropriate for additional information and thus is
ambiguous and misleading.
    Section 1.821(a)(2) will continue to indicate that sequences
containing D-amino acids need not comply with the provisions of
Secs. 1.822-1.825. To date, the PTO has accepted voluntary submissions
of sequences which contain D-amino acids. The sequence information has
either indicated an Xaa at each occurrence of a D-amino acid or has
indicated the amino acid (or imino acid) by abbreviation as if it were
an L-amino acid (or imino acid) and explained the existence of the D-
amino acid in the Feature section associated with that sequence.
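    The two conventions described above for voluntary submissions can
be sketched as follows. This is a hypothetical illustration: the
function name, the (code, is_d) input pairs and the form of the
Feature notes are assumptions of the sketch, not a format prescribed
by the rules.

```python
def encode_d_residues(residues, use_xaa):
    """Encode a peptide that may contain D-amino acids.

    residues: list of (three_letter_code, is_d) pairs.
    use_xaa:  True  -> show each D-amino acid as "Xaa";
              False -> show it by its L-amino acid abbreviation.
    Returns the sequence tokens and a list of (position, note) pairs
    that would be explained in the Feature section.
    """
    sequence, features = [], []
    for position, (code, is_d) in enumerate(residues, start=1):
        if is_d:
            sequence.append("Xaa" if use_xaa else code)
            features.append((position, f"D-{code}"))
        else:
            sequence.append(code)
    return sequence, features


peptide = [("Ala", False), ("Phe", True), ("Gly", False)]
print(encode_d_residues(peptide, use_xaa=True))
print(encode_d_residues(peptide, use_xaa=False))
```

    In either convention the Feature note is what records that the
residue is a D-amino acid; only the token shown in the sequence itself
differs.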
    Section 1.821(c) is proposed to be amended by clarifying and
establishing a language neutral format sequence listing. Specifically,
the use of integer identifiers is proposed for identifying sequences.
Where a sequence integer identifier is intentionally omitted, it must
be noted by applicant to avoid confusion in the published document.
    Section 1.821(d) is proposed to be amended by changing ``assigned
identifier'' to ``integer identifier'' to be consistent with the term
used in Sec. 1.821(c).
    Section 1.821(d) is proposed to be amended by adding the phrase,
``preceded by `SEQ ID NO:' ''. This change is necessitated by the
change to Sec. 1.821(c). Since the integer identifier in the ``Sequence
Listing'' would be defined now as a numeral only, it is necessary that
any reference to a particular sequence in the specification and claims
be preceded by ``SEQ ID NO:''. It is not acceptable to use only a
numeric identifier, such as ``<200>'' or ``<400>''--see infra Sequence
Listing table, in the description or the claims because one reading a
patent may not reasonably be presumed to be familiar with the meanings
of numeric identifiers.
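    A minimal sketch of this convention is given below. The helper
names are hypothetical, the pattern assumes that numeric identifiers
take the three-digit bracketed form shown in the examples above, and
the absence of a space after the colon is likewise an assumption.

```python
import re

# Bare numeric identifiers of the assumed form <200>, <400>, etc.
BARE_IDENTIFIER = re.compile(r"<\d{3}>")


def format_reference(integer_identifier):
    """Build a reference to a sequence for use in the description or claims."""
    return f"SEQ ID NO:{integer_identifier}"


def find_bare_identifiers(text):
    """Return numeric identifiers used in text in place of 'SEQ ID NO:'."""
    return BARE_IDENTIFIER.findall(text)


print(format_reference(3))
print(find_bare_identifiers("the peptide listed under <400> 3"))
```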
    Section 1.821(e) is proposed to be amended by setting forth the
procedure for transferring an accepted computer readable Sequence
Listing from one application to a subsequently filed application. The
existing rules did not adequately describe the process of transferring
a computer readable Sequence Listing into a new application if an
identical CRF was previously accepted by the PTO for another
application. A further description of the intended procedures has been
added for purposes of clarity. This section is intended to describe
that if a computer readable Sequence Listing is identical to one that
is error-free and already on file at the PTO, an applicant has two
options. A new diskette may be submitted, or an applicant may submit a
statement clearly directing the PTO to use the previously submitted CRF
since they are identical, and that the paper copy of the Sequence
Listing in the new application is identical to the disk in the previous
application.
    Section 1.821(g) is proposed to be amended by correcting the
reference to 35 U.S.C. 111(a) applications. Section 1.821(h) is
proposed to be amended by clarifying that this rule applies to all
international applications searched and examined by the PTO. In
addition to international applications filed in the United States
Receiving Office, the United States is a competent International
Searching Authority (ISA) for applications filed in receiving Offices
of, or acting for, Brazil, Israel, Mexico, and Trinidad and Tobago. The
United States is also a competent ISA for applications filed in the
International Bureau where at least one of the applicants is a resident
or national of the United States or a resident or national of Barbados.
In addition, the United States acts as an International Preliminary
Examining Authority for certain applications searched in the EPO. The
language changes regarding the time limit for compliance and the
statement accompanying the submission are necessary to conform with the
language found in PCT Rule 13ter.1.
    Section 1.822 is proposed to be revised for clarity and better
organization and to accommodate an international request for the use of
lower case one-letter codes for nucleotide bases.
    Section 1.822 (b) is proposed to be amended to refer to WIPO
Standard ST.23 (April 1994) and incorporate the information therein.
The reorganization groups all nucleotide and all amino acid formats
together.
    Section 1.822 (c)(1) is proposed to be amended by requiring the use
of lower case one-letter code for the nucleotide bases. This change
would put the PTO requirements in conformance with most large
databases. Additionally, the use of lower case letters in a sequence
makes the confusion of ``g'' for ``c'' and vice versa less likely.
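    For applicants whose existing data use upper-case codes, the
conversion is mechanical. The following sketch is illustrative only:
the function name is hypothetical, and the handling of whitespace and
of unrecognized symbols is an assumption rather than a requirement of
the rules.

```python
# One-letter nucleotide symbols, in lower case.
VALID_BASE_SYMBOLS = set("agcturymkswbdhvn")


def normalize_bases(sequence):
    """Remove whitespace and convert one-letter base codes to lower case."""
    lowered = "".join(sequence.split()).lower()
    invalid = sorted(set(lowered) - VALID_BASE_SYMBOLS)
    if invalid:
        raise ValueError(f"unrecognized base symbols: {invalid}")
    return lowered


print(normalize_bases("ATG CCN"))
```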
    Current paragraph (d) is proposed to be redesignated as a part of
subparagraph (c)(3) and current paragraph (e) is proposed to be deleted
with the substance of the paragraph being incorporated into (d)(1).
Current paragraph (f) is proposed to be redesignated as subparagraph
(c)(2); current paragraph (g) is proposed to be redesignated as
subparagraph (c)(3) and amended to incorporate current paragraph (d).
Current paragraph (h) is proposed to be redesignated as subparagraph
(d)(2). Current paragraphs (i) and (j) are proposed to be redesignated
as (c)(4) and (c)(5). Current paragraph (k) is proposed to be
redesignated as (d)(3). Current paragraph (l) is proposed to be
redesignated as (c)(6) and current paragraph (m) is proposed to be
redesignated as (d)(4). Current paragraph (n) is proposed to be
redesignated as (c)(7) and amended to delete a sentence, the substance
of which is incorporated into (d)(4).
    Paragraph (d)(1) is proposed to be added to include a reference to
WIPO Standard ST.23 (April 1994). Paragraphs (d)(2-4) incorporate the
material from current paragraphs (h), (k), (m) and a sentence of (n).
Paragraph (d)(5) is proposed to be added to clarify that the use of
terminator symbols is not acceptable in amino acid sequences either as
``internal'' terminator symbols or following the carboxy terminal amino
acid of a peptide or polypeptide.
    Current paragraph (o) is proposed to be redesignated as paragraph
(e) and amended to recite integer identifier to be consistent with
Sec. 1.821 (c) and to permit the language neutral submission.
    Current paragraph (p) is proposed to be deleted.
    The lists of nucleic acid and amino acid abbreviations and the
lists of modified base controlled vocabulary and the modified and
unusual amino acids would be replaced by reference to WIPO Standard
ST.23 RECOMMENDATION FOR THE PRESENTATION OF NUCLEOTIDE AND AMINO ACID
SEQUENCE LISTINGS IN PATENT APPLICATIONS AND IN PUBLISHED PATENT
DOCUMENTS (April 1994) to simplify and shorten the rules. This
information will also appear in an appropriate section of the Manual of
Patent Examining Procedure to assist applicants in preparing Sequence
Listings. For purposes of facilitating review of these proposed rule
changes, appropriate corrected excerpts of paragraphs 8, 9, 11 and 12
of WIPO Standard ST.23 are provided below.
    WIPO Standard ST.23, paragraph 8, provides that the bases of a
nucleotide sequence should be represented using the following one-
letter code for nucleotide sequence characters.

------------------------------------------------------------------------
       Symbol                  Meaning            Origin of designation
------------------------------------------------------------------------
A...................  A.......................  Adenine.
G...................  G.......................  Guanine.
C...................  C.......................  Cytosine.
T...................  T.......................  Thymine.
U...................  U.......................  Uracil.
R...................  G or A..................  puRine.
Y...................  T/U or C................  pYrimidine.
M...................  A or C..................  aMino.
K...................  G or T/U................  Keto.
S...................  G or C..................  Strong interactions 3H-
                                                 bonds.
W...................  A or T/U................  Weak interactions 2H-
                                                 bonds.
B...................  G or C or T/U...........  not A.
D...................  A or G or T/U...........  not C.
H...................  A or C or T/U...........  not G.
V...................  A or G or C.............  not T, not U.
N...................  (A or G or C or T/U) or   aNy.
                       (unknown or other).
------------------------------------------------------------------------
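
    By way of illustration, the table above can be expressed as a
lookup from each symbol to the bases it may stand for. The names used
are hypothetical. The sketch treats ``T/U'' as either T or U and does
not model the ``unknown or other'' meaning of N.

```python
# Symbol -> bases it may represent, transcribed from the table above.
BASE_CODES = {
    "A": "A", "G": "G", "C": "C", "T": "T", "U": "U",
    "R": "GA", "Y": "TUC", "M": "AC", "K": "GTU",
    "S": "GC", "W": "ATU",
    "B": "GCTU", "D": "AGTU", "H": "ACTU", "V": "AGC",
    "N": "AGCTU",
}


def symbol_matches(symbol, base):
    """Return True if `base` is one of the bases `symbol` may represent."""
    return base.upper() in BASE_CODES[symbol.upper()]


print(symbol_matches("R", "a"))  # R is G or A
print(symbol_matches("B", "a"))  # B is "not A"
```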

    WIPO Standard ST.23, paragraph 9, provides: Modified bases may be
represented as the corresponding unmodified bases in the sequence
itself if the modified base is one of those listed below and the
modification is further described elsewhere in the Sequence Listing.
The codes from the list below may be used in the description or the
Sequence Listing but not in the sequence itself.

----------------------------------------------------------------------------------------------------------------
                              Symbol                                                  Meaning
----------------------------------------------------------------------------------------------------------------
ac4c.............................................................  4-acetylcytidine.
chm5u............................................................  5-(carboxyhydroxylmethyl)uridine.
cm...............................................................  2'-O-methylcytidine.
cmnm5s2u.........................................................  5-carboxymethylaminomethyl-2-thiouridine.
cmnm5u...........................................................  5-carboxymethylaminomethyluridine.
d................................................................  dihydrouridine.
fm...............................................................  2'-O-methylpseudouridine.
gal q............................................................  *beta, D-galactosylqueosine.
gm...............................................................  2'-O-methylguanosine.
i................................................................  inosine.
i6a..............................................................  N6-isopentenyladenosine.
m1a..............................................................  1-methyladenosine.
m1f..............................................................  1-methylpseudouridine.
m1g..............................................................  1-methylguanosine.
m1i..............................................................  1-methylinosine.
m22g.............................................................  2,2-dimethylguanosine.
m2a..............................................................  2-methyladenosine.
m2g..............................................................  2-methylguanosine.
m3c..............................................................  3-methylcytidine.
m5c..............................................................  5-methylcytidine.
m6a..............................................................  N6-methyladenosine.
m7g..............................................................  7-methylguanosine.
mam5u............................................................  5-methylaminomethyluridine.
mam5s2u..........................................................  5-methoxyaminomethyl-2-thiouridine.
man q............................................................  *beta, D-mannosylqueosine.
mcm5s2u..........................................................  5-methoxycarbonylmethyl-2-thiouridine.
mcm5u............................................................  5-methoxycarbonylmethyluridine.
mo5u.............................................................  5-methoxyuridine.
ms2i6a...........................................................  2-methylthio-N6-isopentenyladenosine.
ms2t6a...........................................................  N-((9-beta-D-ribofuranosyl-2-methylthiopurine-
                                                                    6-yl) carbamoyl) threonine.
mt6a.............................................................  N-((9-beta-D-ribofuranosylpurine-6-yl)N-
                                                                    methylcarbamoyl) threonine.
mv...............................................................  uridine-5-oxyacetic acid-methylester.
o5u..............................................................  uridine-5-oxyacetic acid (v).
osyw.............................................................  wybutoxosine.
p................................................................  pseudouridine.
q................................................................  *queosine.
s2c..............................................................  2-thiocytidine.
s2t..............................................................  5-methyl-2-thiouridine.
s2u..............................................................  2-thiouridine.
s4u..............................................................  4-thiouridine.
t................................................................  5-methyluridine.
t6a..............................................................  N-((9-beta-D-ribofuranosylpurine-6-yl)-
                                                                    carbamoyl)threonine.
tm...............................................................  2'-O-methyl-5-methyluridine.
um...............................................................  2'-O-methyluridine.
yw...............................................................  wybutosine.
x................................................................  3-(3-amino-3-carboxy-propyl)uridine, (acp3)u.

----------------------------------------------------------------------------------------------------------------
*Indicates a correction of minor typographical errors.

    WIPO Standard ST.23, paragraph 11, provides that the amino acids
should be represented using the following three-letter code with the
first letter as a capital.

------------------------------------------------------------------------
               Symbol                               Meaning
------------------------------------------------------------------------
Ala.................................  Alanine.
Cys.................................  Cysteine.
Asp.................................  Aspartic Acid.
Glu.................................  Glutamic Acid.
Phe.................................  Phenylalanine.
Gly.................................  Glycine.
His.................................  Histidine.
Ile.................................  Isoleucine.
Lys.................................  Lysine.
Leu.................................  Leucine.
Met.................................  Methionine.
Asn.................................  Asparagine.
Pro.................................  Proline.
Gln.................................  Glutamine.
Arg.................................  Arginine.
Ser.................................  Serine.
Thr.................................  Threonine.
Val.................................  Valine.
Trp.................................  Tryptophan.
Tyr.................................  Tyrosine.
Asx.................................  Asp or Asn.
Glx.................................  Glu or Gln.
Xaa.................................  unknown or other.
------------------------------------------------------------------------

    WIPO Standard ST.23, paragraph 12, provides: Modified and unusual
amino acids may be represented as the corresponding unmodified amino
acids in the sequence itself if the modified amino acid is one of those
listed below and the modification is further described elsewhere in the
Sequence Listing. The codes from the list below may be used in the
description or the Sequence Listing but not in the sequence itself.

------------------------------------------------------------------------
               Symbol                               Meaning
------------------------------------------------------------------------
Aad.................................  2-Aminoadipic acid.
bAad................................  3-Aminoadipic acid.
bAla................................  beta-Alanine, beta-Aminopropionic
                                       acid.
Abu.................................  2-Aminobutyric acid.
4Abu................................  4-Aminobutyric acid, piperidinic
                                       acid.
Acp.................................  6-Aminocaproic acid.
Ahe.................................  2-Aminoheptanoic acid.
Aib.................................  2-Aminoisobutyric acid.
bAib................................  3-Aminoisobutyric acid.
Apm.................................  2-Aminopimelic acid.
Dbu.................................  *2,4-Diaminobutyric acid.
Des.................................  Desmosine.
Dpm.................................  2,2'-Diaminopimelic acid.
Dpr.................................  2,3-Diaminopropionic acid.
EtGly...............................  N-Ethylglycine.
EtAsn...............................  N-Ethylasparagine.
Hyl.................................  Hydroxylysine.
aHyl................................  allo-Hydroxylysine.
3Hyp................................  3-Hydroxyproline.
4Hyp................................  4-Hydroxyproline.
Ide.................................  Isodesmosine.
*aIle...............................  allo-Isoleucine.
MeGly...............................  N-Methylglycine, sarcosine.
*MeIle..............................  N-Methylisoleucine.
MeLys...............................  6-N-Methyllysine.
MeVal...............................  N-Methylvaline.
Nva.................................  Norvaline.
Nle.................................  Norleucine.
Orn.................................  Ornithine.
------------------------------------------------------------------------
* Indicates a correction of a minor typographical error.
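
    As an illustration only, and not as part of the proposed rule, the
restriction quoted above from paragraph 12 can be sketched as a simple
check: the three-letter codes of paragraph 11 are accepted in the
sequence itself, while the paragraph 12 codes are flagged for
description elsewhere. The function name and the abbreviated list of
modified codes are assumptions made for this sketch.

```python
# Three-letter codes from the paragraph 11 table (allowed in the sequence itself).
STANDARD_CODES = {
    "Ala", "Cys", "Asp", "Glu", "Phe", "Gly", "His", "Ile", "Lys", "Leu",
    "Met", "Asn", "Pro", "Gln", "Arg", "Ser", "Thr", "Val", "Trp", "Tyr",
    "Asx", "Glx", "Xaa",
}

# A few codes from the paragraph 12 table (abbreviated here for illustration).
MODIFIED_CODES = {"Aad", "bAla", "Abu", "Hyl", "4Hyp", "MeGly", "Nle", "Orn"}


def check_sequence_codes(residues):
    """Return (position, code, reason) for each residue that is not allowed."""
    problems = []
    for position, code in enumerate(residues, start=1):
        if code in STANDARD_CODES:
            continue
        if code in MODIFIED_CODES:
            reason = "paragraph 12 code; describe it outside the sequence"
        else:
            reason = "not a paragraph 11 code"
        problems.append((position, code, reason))
    return problems


print(check_sequence_codes(["Met", "4Hyp", "Gly"]))
```

    The codes are compared case-sensitively because paragraph 11 calls
for the first letter to be a capital.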

    Section 1.823(a) is proposed to be amended to provide for a
reference to a Sequence Listing Annex in the application immediately
before the claims and to provide the paper Sequence Listing as an
Annex, which is a separately numbered section of the application. This
is an internationally desired change and also would make it easier to
store very large Sequence Listings separately from the main part of the
file during pendency of the application.
    Section 1.823(b) is proposed to be amended to insert a table to
depict items of information (data elements) which are to be included in
the Sequence Listing and to indicate whether they are mandatory or
optional. The proposed revisions reflect the change to a language
neutral submission. The English-language data element headings would
be replaced by numeric identifiers. The numeric identifiers are similar
to INID codes (``Internationally agreed Numbers for the Identification
of Data'' as per WIPO Standard ST.9, December 1990) already utilized
internationally in patent documents. This change would facilitate a
single international standard which would eliminate the need for
translations into non-English languages.
    Large portions of Section 1.823(b) are proposed to be deleted to
lessen the burden on applicants and to eliminate the collection of
material that is of limited use to the Office. The following items are
typical of material which would be deleted:
    (1)(vi)(C) CLASSIFICATION;
    (2)(i)(C) STRANDEDNESS;
    (2)(ii) MOLECULE TYPE through (2)(vii)(C) UNITS; and
    (2)(ix)(C) IDENTIFICATION METHOD.
    In order to clarify the rule, the proposed change would identify
specifically those items which can be enumerated once in a Sequence
Listing. It is proposed that the recommended designation be eliminated,
leaving only mandatory and optional elements. Accordingly, it is
proposed to change element <140> Correspondence Address
and elements <150> through <154> from mandatory to optional. Elements
<100> General Information, <200> Information for SEQ ID NO, and <400>
Sequence Description: SEQ ID NO have been clarified as mandatory. In
element <193>, it is proposed to change TELEX to Electronic mail
address to be current with technology.
    It is proposed to eliminate Strandedness because the information is
of limited use to the Office. It is proposed to limit the response for
Topology to linear or circular because any other response does not
permit an adequate search. Because it is essential to the search to
know whether the sequence is circular, providing one of these two
responses to this data element is mandatory in the Sequence Listing.
Consistent with the international desire for eliminating language in
the Sequence Listing, Topology would be identified as L (linear) or C
(circular), and sequence Type would be N (nucleotide) or A (amino
acid).
    It is proposed to change Feature from a recommended to a mandatory
element if the sequence contains ``N'', ``Xaa'', a modified or unusual
L-amino acid or a modified base. This change would highlight the
presence of an unusual residue in the sequence which is important to
anyone using Sequence Listing information.
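    As an illustration only, the Topology, Type and Feature proposals
described above might be checked along the following lines. The record
layout (a dictionary with the keys ``topology'', ``type'', ``residues''
and ``feature'') is hypothetical and is not part of the proposed rule.
The sketch can only detect ``N'' and ``Xaa''; a modified residue shown
as its unmodified counterpart cannot be detected from the sequence
alone.

```python
TOPOLOGY_CODES = {"L": "linear", "C": "circular"}
SEQUENCE_TYPE_CODES = {"N": "nucleotide", "A": "amino acid"}


def check_entry(entry):
    """Return a list of problems found in one hypothetical sequence record."""
    problems = []
    if entry.get("topology") not in TOPOLOGY_CODES:
        problems.append("Topology must be L or C")
    if entry.get("type") not in SEQUENCE_TYPE_CODES:
        problems.append("Type must be N or A")
    # Feature becomes mandatory when an unknown residue appears in the sequence.
    has_unknown = any(r in ("n", "N", "Xaa") for r in entry.get("residues", []))
    if has_unknown and not entry.get("feature"):
        problems.append("Feature is mandatory when the sequence contains N or Xaa")
    return problems


print(check_entry({"topology": "L", "type": "N",
                   "residues": list("acgtn"), "feature": ""}))
```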
    Section 1.824 is proposed to be amended by revising the current
paragraphs (a) through (h) into paragraphs (a) through (c).
    Specifically, the following changes are proposed for Sec. 1.824:
    Current Sec. 1.824, paragraph (a), is proposed to be redesignated
as paragraph (a)(1). In addition, the term ``series of diskettes''
would be added to indicate the acceptability of receiving numerous
disks for large submissions. Current paragraph (b) is proposed to be
redesignated as paragraph (a)(2). Current paragraph (c) is proposed to
be redesignated as paragraph (a)(3). Current paragraph (d) is proposed
to be deleted because it is incorporated into subparagraph (a)(1).
Current paragraph (e) is proposed to be deleted since the PTO has not
found it to be necessary and feels it should not be a requirement
placed on the applicant, although the applicant may optionally continue
the practice of using write-protection if desired. In proposed
paragraph (a)(4), a ``compressed file'' format would be introduced as
an acceptable means to submit a large sequence listing, and in proposed
paragraph (a)(5), directions on suppressing page numbering on the
computer readable form version would be added for clarity.
    The text of current paragraph (f) is proposed to be deleted, but
the list of computer readable files is proposed to be redesignated as
subparagraphs under new (b) and (c). In proposed paragraph (b), the
explanation for ``pagination'' is proposed to be revised to reflect the
correct format required. Proposed paragraph (b)(1) would be revised by
deleting diskettes from the PS/2 operating system as an accepted
format. In proposed paragraph (c), the diskette requirements are
proposed to be rearranged so that the most common diskette size used
for submissions is at the top of the list. Also in proposed paragraph
(c)(2), ``format'' is proposed to be amended to accommodate the current
PTO equipment, and in proposed new paragraphs (c)(3), (4), and (5),
additional items would be added to the list of acceptable media types
due to the changes in available equipment at the PTO.
    Current paragraph (g) is proposed to be redesignated as paragraph
(d).
    Current paragraph (h) is proposed to be deleted because the text is
proposed to be incorporated into paragraph (a)(6). The label
requirements would be rewritten more concisely than in the current
rules. In addition, fewer items would be required to be placed on the
label under this proposed paragraph because the other items are no
longer deemed necessary by the PTO.
    Current Appendix A is proposed to be rewritten to reflect the
correct format of a Sequence Listing. The proposed Appendix A is
presented to provide a sample listing in the correct format as
described in the Table of amended Sec. 1.823(b). This sample includes
the use of numeric identifiers which reflect the change to a language
neutral submission. Current Appendix B is proposed to be deleted as the
information it presents is no longer valid under changes in this
proposed rule.

Review Under the Paperwork Reduction Act of 1995

    This proposed rule change contains information collection
requirements which are subject to review by the Office of Management
and Budget (OMB) under the Paperwork Reduction Act of 1995, 44 U.S.C.
3501, et seq. The title, description and respondent description of the
information collection are shown below with an estimate of the annual
reporting burdens. Included in the estimate is the time for reviewing
instructions, gathering and maintaining the data needed, and completing
and reviewing the collection of information. With respect to the
following collection of information, the PTO invites comments on: (1)
Whether the proposed collection of information is necessary for the
proper performance of the PTO's functions, including whether the
information will have practical utility; (2) the accuracy of the PTO's
estimate of the burden of the proposed collection of information,
including the validity of the methodology and assumptions used; (3)
ways to enhance the quality, utility, and clarity of the information to
be collected; and (4) ways to minimize the burden of the collection of
information on respondents, including through the use of automated
collection techniques, when appropriate, and other forms of information
technology.
    Notwithstanding any other provision of law, no person is required
to respond to nor shall a person be subject to a penalty for failure to
comply with a collection of information subject to the requirements of
the Paperwork Reduction Act unless that collection of information
displays a currently valid OMB control number.
    OMB Number: 0651-0024.
    Title: Requirements for Patent Applications Containing Nucleotide
Sequence and/or Amino Acid Sequence Disclosures.
    Form Numbers: None.
    Type of Review: Revision of currently approved collection.
    Affected Public: Individuals or households, business or other for-
profit institutions, not-for-profit institutions, and Federal
Government.
    Estimated Number of Respondents: 4,600.
    Estimated Time Per Response: 80 minutes.
    Estimated Total Annual Burden Hours: 6,133.
    Needs and Uses: The PTO requires biotechnology patent applicants to
submit sequence information to enable the PTO to properly examine and
process their applications.
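    The burden figures above appear to be related by simple arithmetic.
The sketch below assumes one response per respondent per year and that
any fractional hour is dropped; neither assumption is stated in this
notice.

```python
def annual_burden_hours(respondents, minutes_per_response):
    """Total annual burden in whole hours, dropping any fractional hour."""
    return (respondents * minutes_per_response) // 60


# 4,600 respondents at 80 minutes each is 368,000 minutes.
print(annual_burden_hours(4600, 80))  # prints 6133
```
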
    As required by the Paperwork Reduction Act of 1995, 44 U.S.C.
3507(d), the PTO has submitted a copy of this proposed rulemaking to
OMB for its review of this information collection. Interested persons
are requested to send comments regarding this information collection,
including suggestions for reducing this burden, to the Office of
Information and Regulatory Affairs of OMB, New Executive Office Bldg.,
725 17th Street, NW., Room 10235, Washington, D.C. 20503, Attn: Desk
Officer for the Patent and Trademark Office.
    OMB is required to make a decision concerning the collection of
information in these proposed regulations between 30 and 60 days after
the publication of this document in the Federal Register.
Therefore, a comment to OMB is best assured of having its full effect
if OMB receives it within 30 days of publication. This does not affect
the deadline for the public to comment to the PTO on the proposed
regulations.

Other Considerations

    This proposed rule change is in conformity with the requirements of
the Regulatory Flexibility Act (5 U.S.C. 601 et seq.), Executive Order
12612, and the Paperwork Reduction Act of 1995, 44 U.S.C. 3501 et seq.
It has been determined that this proposed rule is not significant for
the purposes of Executive Order 12866.
    The Assistant General Counsel for Legislation and Regulation of the
Department of Commerce has certified to the Chief Counsel for Advocacy,
Small Business Administration, that this proposed rule change would not
have a significant economic impact on a substantial number of small
entities (Regulatory Flexibility Act, 5 U.S.C. 601 et seq.). The
principal effect of this rule change is to simplify and clarify the
rules governing the submission of Sequence Listings for patent
applications containing nucleic acid and/or amino acid sequences.
    The PTO has also determined that this proposed rule change has no
Federalism implications affecting the relationship between the National
Government and the States as outlined in Executive Order 12612.

List of Subjects in 37 CFR Part 1

    Administrative practice and procedure, Courts, Freedom of
Information, Inventions and patents, Reporting and record-keeping
requirements, Small businesses.

    For the reasons set forth in the preamble and under the authority
granted to the Commissioner of Patents and Trademarks by 35 U.S.C. 6,
the PTO proposes to amend 37 CFR part 1 as set forth below. Removals
are indicated by brackets ( [] ) and additions indicated by arrows (>
<).

PART 1--RULES OF PRACTICE IN PATENT CASES

    1. The authority citation for 37 CFR part 1 would continue to read
as follows:

    Authority: 35 U.S.C. 6 unless otherwise noted.

    2. Section 1.77 is proposed to be amended by redesignating current
paragraphs (g) through (j) as paragraphs (h) through (k) and by adding
new paragraphs (g) and (l) to read as follows:

Sec. 1.77  Arrangement of application elements.

* * * * *
    >(g) Reference to Sequence Listing Annex.<
    [(g)]>(h)< Claim or claims.
    [(h)]>(i)< Abstract of the disclosure.
    [(i)]>(j)< Signed oath or declaration.
    [(j)]>(k)< Drawings.
    >(l) Sequence Listing Annex.<
    3. Section 1.821 is proposed to be amended by revising paragraphs
(a) and (c) through (h) to read as follows:

Sec. 1.821  Nucleotide and/or amino acid sequence disclosures in patent
applications.

    (a) Nucleotide and/or amino acid sequences as used in Secs. 1.821
through 1.825 are interpreted to mean an unbranched sequence of four or
more amino acids or an unbranched sequence of ten or more nucleotides.
Branched sequences are specifically excluded from this definition.
>Sequences with fewer than four specifically defined nucleotides or
amino acids are specifically excluded from this rule. ``Specifically
defined'' means those amino acids other than ``Xaa'' and those
nucleotide bases other than ``N'' defined in accordance with the World
Intellectual Property Organization (WIPO) Handbook on Industrial
Property Information and Documentation, Standard ST.23: Recommendation
for the Presentation of Nucleotide and Amino Acid Sequence Listings in
Patent Applications and in Published Patent Documents, paragraphs 8
through 12, April 1994, herein incorporated by reference. (Hereinafter
``WIPO Standard ST.23 (April 1994)''). This incorporation by reference
was approved by the Director of the Federal Register in accordance with
5 U.S.C. 552(a) and 1 CFR part 51. Copies of ST.23 may be obtained from
the World Intellectual Property Organization; 34 chemin des
Colombettes; 1211 Geneva 20 Switzerland. Copies of ST.23 may be
inspected at the Patent Search Room; Crystal Plaza 3, Lobby Level; 2021
South Clark Place; Arlington, VA 22202; or at the Office of the Federal
Register, 800 North Capitol Street, NW, Suite 700, Washington, DC. <
Nucleotides and amino acids are further defined as follows:
    (1) Nucleotides are intended to embrace only those nucleotides that
can be represented using the symbols set forth in [Sec. 1.822(b)(1)]
>WIPO Standard ST.23 (April 1994), paragraph 8<. Modifications, e.g.,
methylated bases, may be described as set forth in [Sec. 1.822(b)]
>WIPO Standard ST.23 (April 1994), paragraph 9< , but shall not be
shown explicitly in the nucleotide sequence.
    (2) Amino acids are those L-amino acids commonly found in naturally
occurring proteins and are listed in [Sec. 1.822(b)(2)] >WIPO Standard
ST.23 (April 1994), paragraph 11<. Those amino acid sequences
containing D-amino acids are not intended to be embraced by this
definition. Any amino acid sequence that contains post-translationally
modified amino acids may be described as the amino acid sequence that
is initially translated using the symbols shown in [Sec. 1.822(b)(2)]
>WIPO Standard ST.23 (April 1994), paragraph 11< with the modified
positions; e.g., hydroxylations or glycosylations, being described as
set forth in [Sec. 1.822(b)] >WIPO Standard ST.23 (April 1994),
paragraph 12<, but these modifications shall not be shown explicitly in
the amino acid sequence. Any peptide or protein that can be expressed
as a sequence using the symbols in [Sec. 1.822(b)(2)] >WIPO Standard
ST.23 (April 1994), paragraph 11< in conjunction with a description
[elsewhere in the ``Sequence Listing''] >in the Feature section< to
describe, for example, modified linkages, cross links and end caps,
non-peptidyl bonds, etc., is embraced by this definition.
    (b) * * *
    (c) Patent applications which contain disclosures of nucleotide
and/or amino acid sequences must contain, as a separate part of the
disclosure on paper copy, hereinafter referred to as the ``Sequence
Listing,'' a disclosure of the nucleotide and/or amino acid sequences
and associated information using the symbols and format in accordance
with the requirements of Secs. 1.822 and 1.823. Each sequence disclosed
must appear separately in the ``Sequence Listing.'' Each sequence set
forth in the ``Sequence Listing'' shall be assigned a separate
>integer< identifier [written as SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3,
etc]. >The integer identifiers shall begin with 1 and increase
sequentially by integers. If no sequence is present for an integer
identifier, the words ``This sequence omitted'' shall appear following
the integer identifier.<
    (d) Where the description or claims of a patent application discuss
a sequence listing that is set forth in the ``Sequence Listing'' in
accordance with paragraph (c) of this section, reference must be made
to the sequence by use of the [assigned] >integer< identifier,
>preceded by ``SEQ ID NO:''< in the text of the description or claims,
even if the sequence is also embedded in the text of the description or
claims of the patent application.
    (e) A copy of the ``Sequence Listing'' referred to in paragraph (c)
of this section must also be submitted in computer readable form in
accordance with the requirements of Sec. 1.824. The computer readable
form is a copy of the ``Sequence Listing'' and will not necessarily be
retained as a part of the patent application file. If the computer
readable form of a new application is to be identical with the computer
readable form of another application of the applicant on file in the
Office, reference may be made to the other application and computer
readable form in lieu of filing a duplicate computer readable form in
the new application >if the computer readable form in the other
application was compliant with all of the requirements of these rules<.
The new application shall be accompanied by a letter making such
reference to the other application and computer readable form, both of
which shall be completely identified. >In the new application,
applicant must also request the use of the compliant computer readable
``Sequence Listing'' that is already on file for the other application
and must state that the paper copy of the ``Sequence Listing'' in the
new application is identical to the computer readable copy filed for
the other application.<
    (f) In addition to the paper copy required by paragraph (c) of this
section and the computer readable form required by paragraph (e) of
this section, a statement that the content of the paper and computer
readable copies are the same must be submitted with the computer
readable form. Such a statement must be a verified statement if made by
a person not registered to practice before the Office.
    (g) If any of the requirements of paragraphs (b) through (f) of
this section are not satisfied at the time of filing under 35 U.S.C.
111 >(a), which application is to be searched by the United
States International Searching Authority or examined by the United
States International Preliminary Examining Authority, applicant< will
be sent >a notice< requiring compliance with the requirements [,or such
other time as may be set by the Commissioner, in which to comply]
>within a prescribed time period<. Any submission in response to a
requirement under this paragraph must be accompanied by a statement
that the submission does not include [new] matter [or go] >which goes<
beyond the disclosure in the international application as filed. Such a
statement must be a verified statement if made by a person not
registered to practice before the Office. If applicant fails to timely
provide the required computer readable form, the United States
International Searching Authority shall search only to the extent that
a meaningful search can be performed >and the United States
International Preliminary Examining Authority shall examine only to the
extent that a meaningful examination can be performed<.
* * * * *
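
    As an illustration only, the thresholds in proposed paragraph (a) of
Sec. 1.821 can be sketched as follows. The function and constant names
are hypothetical, the input is assumed to be an unbranched sequence
already split into residue symbols, and the reading of ``specifically
defined'' (anything other than ``Xaa'' or ``N'') follows the proposed
text above.

```python
MIN_LENGTH = {"amino_acid": 4, "nucleotide": 10}
UNDEFINED = {"amino_acid": {"xaa"}, "nucleotide": {"n"}}
MIN_SPECIFICALLY_DEFINED = 4


def is_covered(residues, kind):
    """True if an unbranched sequence meets both thresholds of paragraph (a)."""
    # Count residues other than Xaa (amino acids) or N (nucleotides).
    defined = sum(1 for r in residues if r.lower() not in UNDEFINED[kind])
    long_enough = len(residues) >= MIN_LENGTH[kind]
    return long_enough and defined >= MIN_SPECIFICALLY_DEFINED


print(is_covered(["Met", "Xaa", "Xaa", "Gly"], "amino_acid"))  # two defined residues
print(is_covered(list("acgtnnnnnn"), "nucleotide"))            # four defined bases
```

    Branched sequences and D-amino acids, which the rule also excludes,
are not modeled in this sketch.
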
    4. Section 1.822 is proposed to be revised to read as follows:

Sec. 1.822  Symbols and format to be used for nucleotide and/or amino
acid sequence data.

    (a) The symbols and format to be used for nucleotide and/or amino
acid sequence data shall conform to the requirements of paragraphs (b)
through [(p)] >(e)< of this section.
    (b) The code for representing the nucleotide and/or amino acid
sequence characters shall conform to the code set forth in the tables
in [paragraphs (b)(1) and (b)(2) of this section] >WIPO Standard ST.23
(April 1994), paragraphs 8 and 11. This incorporation by reference was
approved by the Director of the Federal Register in accordance with 5
U.S.C. 552(a) and 1 CFR part 51. Copies of ST.23 may be obtained from
the World Intellectual Property Organization; 34 chemin des
Colombettes; 1211 Geneva 20 Switzerland. Copies of ST.23 may be
inspected at the Patent Search Room; Crystal Plaza 3, Lobby Level; 2021
South Clark Place; Arlington, VA 22202; or at the Office of the Federal
Register, 800 North Capitol Street, NW, Suite 700, Washington, DC<. No
code other than that specified in [this section] >these sections< shall
be used in nucleotide and amino acid sequences. A modified base or
>modified or unusual< amino acid may be presented in a given sequence
as the corresponding unmodified base or amino acid if the modified base
or >modified or unusual< amino acid is one of those listed in
[paragraphs (p)(1) or (p)(2) of this section] >WIPO Standard ST.23
(April 1994), paragraphs 9 and 12< and the modification is also set
forth [elsewhere in the Sequence Listing (for example, FEATURES
Sec. 1.823(b)(2)(ix))] >in the Feature section<. Otherwise, all bases
or amino acids not appearing in paragraphs [(b)(1) or (b)(2) of this
section] >8 and 11 of the WIPO Standard ST.23 (April 1994)< shall be
listed in a given sequence as ``N'' or ``Xaa,'' respectively, with
further information, as appropriate, given [elsewhere in the Sequence
Listing] >in the Feature section<.
    [(1) Base codes:

------------------------------------------------------------------------
               Symbol                               Meaning
------------------------------------------------------------------------
A...................................  A; adenine.
C...................................  C; cytosine.
G...................................  G; guanine.
T...................................  T; thymine.
U...................................  U; uracil.
M...................................  A or C.
R...................................  A or G.
W...................................  A or T/U.
S...................................  C or G.
Y...................................  C or T/U.
K...................................  G or T/U.
V...................................  A or C or G; not T/U.
H...................................  A or C or T/U; not G.
D...................................  A or G or T/U; not C.
B...................................  C or G or T/U; not A.
N...................................  (A or C or G or T/U) or (unknown
                                       or other).
------------------------------------------------------------------------

    (2) Amino acid three-letter abbreviations:

------------------------------------------------------------------------
            Abbreviation                        Amino acid name
------------------------------------------------------------------------
Ala.................................  Alanine.
Arg.................................  Arginine.
Asn.................................  Asparagine.
Asp.................................  Aspartic Acid.
Asx.................................  Aspartic Acid or Asparagine.
Cys.................................  Cysteine.
Glu.................................  Glutamic Acid.
Gln.................................  Glutamine.
Glx.................................  Glutamine or Glutamic Acid.
Gly.................................  Glycine.
His.................................  Histidine.
Ile.................................  Isoleucine.
Leu.................................  Leucine.
Lys.................................  Lysine.
Met.................................  Methionine.
Phe.................................  Phenylalanine.
Pro.................................  Proline.
Ser.................................  Serine.
Thr.................................  Threonine.
Trp.................................  Tryptophan.
Tyr.................................  Tyrosine.
Val.................................  Valine.
Xaa.................................  Unknown or other].
------------------------------------------------------------------------
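
    As an illustration only, the base codes in the table above (which
the proposal would replace with a reference to WIPO Standard ST.23,
paragraph 8) can be written as a lookup from each symbol to the bases it
permits. The names below are hypothetical. T and U are treated as
interchangeable, following the ``T/U'' notation of the table, and the
``unknown or other'' meaning of N is not modeled.

```python
# Each symbol maps to the concrete bases it may stand for ("T" means T/U).
BASE_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}


def matches(code, base):
    """True if the concrete base (A, C, G, T or U) is permitted by the code."""
    base = "T" if base.upper() == "U" else base.upper()
    return base in BASE_CODES[code.upper()]


print(matches("R", "g"), matches("B", "a"))
```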

    (c) >Format representation of nucleotides:
    (1)< A nucleotide sequence shall be listed using the >lower-case
letter for
representing the< one-letter code for the nucleotide bases[, as] >set
forth< in [paragraph (b)(1) of this section] >WIPO Standard ST.23
(April 1994), paragraph 8<.
    [(d) The amino acids corresponding to the codons in the coding
parts of a nucleotide sequence shall be typed immediately below the
corresponding codons. Where a codon spans an intron, the amino acid
symbol shall be typed below the portion of the codon containing two
nucleotides.
    (e) The amino acids in a protein or peptide sequence shall be
listed using the three-letter abbreviation with the first letter as an
upper case character, as in paragraph (b)(2) of this section.]
    [(f)] >(2)< The bases in a nucleotide sequence (including introns)
shall be listed in groups of 10 bases except in the coding parts of the
sequence. Leftover bases, fewer than 10 in number, at the end of
noncoding parts of a sequence shall be grouped together and separated
from adjacent groups of 10 or 3 bases by a space.
    [(g)] >(3)< The bases in the coding parts of a nucleotide sequence
shall be listed as triplets (codons). >The amino acids corresponding to
the codons in the coding parts of a nucleotide sequence shall be typed
immediately below the corresponding codons. Where a codon spans an
intron, the amino acid symbol shall be typed below the portion of the
codon containing two nucleotides.<
    [(h) A protein or peptide sequence shall be listed with a maximum
of 16 amino acids per line, with a space provided between each amino
acid.]
    [(i)] >(4)< A nucleotide sequence shall be listed with a maximum of
16 codons or 60 bases per line, with a space provided between each
codon or group of 10 bases.
    [(j)] >(5)< A nucleotide sequence shall be presented, only by a
single strand, in the 5' to 3' direction, from left to right.
    [(k) An amino acid sequence shall be presented in the amino to
carboxy direction, from left to right, and the amino and carboxy groups
shall not be presented in the sequence.]
    [(l)] >(6)< The enumeration of nucleotide bases shall start at the
first base of the sequence with number 1. The enumeration shall be
continuous through the whole sequence in the direction 5' to 3'. The
enumeration shall be marked in the right margin, next to the line
containing the one-letter codes for the bases, and giving the number of
the last base of that line.
    [(m) The enumeration of amino acids may start at the first amino
acid of the first mature protein, with the number 1. The amino acids
preceding the mature protein, e.g., pre-sequences, pro-sequences, pre-
pro-sequences and signal sequences, when presented, shall have negative
numbers, counting backwards starting with the amino acid next to number
1. Otherwise, the enumeration of amino acids shall start at the first
amino acid at the amino terminal as number 1. It shall be marked below
the sequence every 5 amino acids.]
    [(n)] >(7)< For those nucleotide sequences that are circular in
configuration, the enumeration method set forth in paragraph [(l)]
>(c)(6)< of this section remains applicable with the exception that the
designation of the first base of the nucleotide sequence may be made at
the option of the applicant. [The enumeration method for amino acid
sequences that is set forth in paragraph (m) of this section remains
applicable for amino acid sequences that are circular in
configuration.]
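    As an illustration only, the layout called for in paragraph (c) for
a noncoding nucleotide sequence (lower-case letters, groups of 10
bases, at most 60 bases per line, and the number of the last base of
each line in the right margin) might be produced as sketched below. The
function name and the margin width are assumptions; coding regions,
which are listed as triplets with amino acids beneath them, are not
handled.

```python
def format_noncoding(seq, group=10, per_line=60):
    """Return display lines for a noncoding nucleotide sequence."""
    seq = seq.lower()
    # Width of a full line: 60 bases plus the spaces between groups of 10.
    width = per_line + per_line // group - 1
    lines = []
    for start in range(0, len(seq), per_line):
        chunk = seq[start:start + per_line]
        groups = [chunk[i:i + group] for i in range(0, len(chunk), group)]
        last_base = start + len(chunk)  # number of the last base on this line
        lines.append(f"{' '.join(groups):<{width}} {last_base:>6}")
    return lines


for line in format_noncoding("ACGT" * 18 + "ACG"):
    print(line)
```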
    >(d) Representation of amino acids:
    (1) The amino acids in a protein or peptide sequence shall be
listed using the three-letter abbreviation with the first letter as an
upper case character, as in WIPO Standard ST.23 (April 1994), paragraph
11.
    (2) A protein or peptide sequence shall be listed with a maximum of
16 amino acids per line, with a space provided between each amino acid.
    (3) An amino acid sequence shall be presented in the amino to
carboxy direction, from left to right, and the amino and carboxy groups
shall not be presented in the sequence.
    (4) The enumeration of amino acids may start at the first amino
acid of the first mature protein, with the number 1. The amino acids
preceding the mature protein, e.g., pre-sequences, pro-sequences, pre-
pro-sequences and signal sequences, when presented, shall have negative
numbers, counting backwards starting with the amino acid next to number
1. Otherwise, the enumeration of amino acids shall start at the first
amino acid at the amino terminal as number 1. It shall be marked below
the sequence every 5 amino acids. The enumeration method for amino acid
sequences that is set forth in this section remains applicable for
amino acid sequences that are circular in configuration.
    (5) An amino acid sequence that contains internal terminator
symbols, e.g., ``Ter'', ``*'', or ``.'', etc., may not be represented
as a single amino acid sequence, but shall be presented as separate
amino acid sequences.
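
    The amino acid rules of paragraphs (d)(2), (d)(4) and (d)(5) of
this section can be sketched the same way. This is a minimal
illustration, not a prescribed implementation: the function names, the
four-character cell used to place numbers below the residues and the
closed set of terminator symbols are all assumptions of the sketch.

```python
# Terminator symbols named as examples in paragraph (d)(5). The rule's
# list is open-ended ("etc."), so this closed set is an assumption.
TERMINATORS = {"Ter", "*", "."}


def split_at_terminators(residues):
    """Split a list of three-letter codes into separate sequences at
    each internal terminator symbol, dropping the symbol itself."""
    sequences, current = [], []
    for residue in residues:
        if residue in TERMINATORS:
            if current:
                sequences.append(current)
            current = []
        else:
            current.append(residue)
    if current:
        sequences.append(current)
    return sequences


def number_residues(count, mature_start=0):
    """Number `count` residues so that the residue at index
    `mature_start` is 1 and earlier residues are -1, -2, ... counting
    backwards. There is no residue numbered 0."""
    return [i - mature_start + 1 if i >= mature_start else i - mature_start
            for i in range(count)]


def format_amino_acid_lines(residues, mature_start=0, per_line=16):
    """Return alternating lines: up to `per_line` space-separated
    residues, then a line marking every fifth residue number below."""
    numbers = number_residues(len(residues), mature_start)
    out = []
    for start in range(0, len(residues), per_line):
        out.append(" ".join(residues[start:start + per_line]))
        # Each residue plus its trailing space occupies four characters,
        # so a four-character cell keeps each number under its residue.
        marks = "".join(f"{n:<4}" if n % 5 == 0 else " " * 4
                        for n in numbers[start:start + per_line])
        out.append(marks.rstrip())
    return out
```

    For example, `number_residues(8, mature_start=3)` numbers a three-
residue signal sequence -3, -2, -1 and the mature protein 1 through 5.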
    (e)< [(o)] A sequence with a gap or gaps shall be presented as a
plurality of separate sequences, with separate [sequence] >integer<
identifiers, with the number of separate sequences being equal in
number to the number of continuous strings of sequence data. A sequence
that is made up of one or more noncontiguous segments of a larger
sequence or segments from different sequences shall be presented as a
separate sequence.
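
    The gap rule above can be sketched as follows. The sketch assumes
that gaps are marked with a hyphen in the applicant's working data,
which the rule does not specify; the function name and the `first_id`
parameter are likewise hypothetical.

```python
import re


def split_on_gaps(sequence, gap_chars="-", first_id=1):
    """Split `sequence` at runs of gap characters and pair each
    continuous string with its own integer identifier."""
    pattern = "[" + re.escape(gap_chars) + "]+"
    parts = [part for part in re.split(pattern, sequence) if part]
    # One identifier per continuous string of sequence data.
    return list(enumerate(parts, start=first_id))


if __name__ == "__main__":
    for seq_id, part in split_on_gaps("acgt--ggca-tt"):
        print(seq_id, part)
```

    The number of pairs returned equals the number of continuous
strings of sequence data, however long each gap is.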
    [(p) The code for representing modified nucleotide bases and
modified or unusual amino acids shall conform to the code set forth in
the tables in paragraphs (p)(1) and (p)(2) of this section. The
modified base controlled vocabulary in paragraph (p)(1) of this section
and the modified and unusual amino acids in paragraph (p)(2) of this
section shall not be used in the nucleotide and/or amino acid
sequences; but may be used in the description and/or the ``Sequence
Listing'' corresponding to, but not including, the nucleotide and/or
amino acid sequence.
    (1) Modified base controlled vocabulary:

----------------------------------------------------------------------------------------------------------------
                           Abbreviation                                      Modified base description
----------------------------------------------------------------------------------------------------------------
ac4c.............................................................  4-acetylcytidine.
chm5u............................................................  5-(carboxyhydroxylmethyl)uridine.
cm...............................................................  2'-O-methylcytidine.
cmnm5s2u.........................................................  5-carboxymethylaminomethyl-2-thiouridine.
cmnm5u...........................................................  5-carboxymethylaminomethyluridine.
d................................................................  dihydrouridine.
fm...............................................................  2'-O-methylpseudouridine.
galq.............................................................  beta,D-galactosylqueosine.
gm...............................................................  2'-O-methylguanosine.
i................................................................  inosine.
i6a..............................................................  N6-isopentenyladenosine.
m1a..............................................................  1-methyladenosine.
m1f..............................................................  1-methylpseudouridine.
m1g..............................................................  1-methylguanosine.
m1i..............................................................  1-methylinosine.
m22g.............................................................  2,2-dimethylguanosine.
m2a..............................................................  2-methyladenosine.
m2g..............................................................  2-methylguanosine.
m3c..............................................................  3-methylcytidine.
m5c..............................................................  5-methylcytidine.
m6a..............................................................  N6-methyladenosine.
m7g..............................................................  7-methylguanosine.
mam5u............................................................  5-methylaminomethyluridine.
mam5s2u..........................................................  5-methoxyaminomethyl-2-thiouridine.
manq.............................................................  beta,D-mannosylqueosine.
mcm5s2u..........................................................  5-methoxycarbonylmethyluridine.
mo5u.............................................................  5-methoxyuridine.
ms2i6a...........................................................  2-methylthio-N6-isopentenyladenosine.
ms2t6a...........................................................  N-((9-beta-D-ribofuranosyl-2-methylthiopurine-
                                                                    6-yl) carbamoyl)threonine.
mt6a.............................................................  N-((9-beta-D-ribofuranosylpurine-6-yl)N-
                                                                    methyl-carbamoyl)threonine.
mv...............................................................  uridine-5-oxyacetic acid methylester.
o5u..............................................................  uridine-5-oxyacetic acid (v).
osyw.............................................................  wybutoxosine.
p................................................................  pseudouridine.
q................................................................  queosine.
s2c..............................................................  2-thiocytidine.
s2t..............................................................  5-methyl-2-thiouridine.
s2u..............................................................  2-thiouridine.
s4u..............................................................  4-thiouridine.
t................................................................  5-methyluridine.
t6a..............................................................  N-((9-beta-D-ribofuranosylpurine-6-
                                                                    yl)carbamoyl) threonine.
tm...............................................................  2'-O-methyl-5-methyluridine.
um...............................................................  2'-O-methyluridine.
yw...............................................................  wybutosine.
x................................................................  3-(3-amino-3-carboxypropyl)uridine, (acp3)u.
----------------------------------------------------------------------------------------------------------------

    (2) Modified and unusual amino acids:

------------------------------------------------------------------------
            Abbreviation                Modified and unusual amino acid
------------------------------------------------------------------------
Aad.................................  2-Aminoadipic acid.
bAad................................  3-aminoadipic acid.
bAla................................  beta-Alanine, beta-Aminopropionic
                                       acid.
Abu.................................  2-Aminobutyric acid.
4Abu................................  4-Aminobutyric acid, piperidinic
                                       acid.
Acp.................................  6-Aminocaproic acid.
Ahe.................................  2-Aminoheptanoic acid.
Aib.................................  2-Aminoisobutyric acid.
bAib................................  3-Aminoisobutyric acid.
Apm.................................  2-Aminopimelic acid.
Dbu.................................  2,4-Diaminobutyric acid.
Des.................................  Desmosine.
Dpm.................................  2,2'-Diaminopimelic acid.
Dpr.................................  2,3-Diaminopropionic acid.
EtGly...............................  N-Ethylglycine.
EtAsn...............................  N-Ethylasparagine.
Hyl.................................  Hydroxylysine.
aHyl................................  allo-Hydroxylysine.
3Hyp................................  3-Hydroxyproline.
4Hyp................................  4-Hydroxyproline.
Ide.................................  Isodesmosine.
aIle................................  allo-Isoleucine.
MeGly...............................  N-Methylglycine, sarcosine.
MeIle...............................  N-Methylisoleucine.
MeLys...............................  6-N-Methyllysine.
MeVal...............................  N-Methylvaline.
Nva.................................  Norvaline.
Nle.................................  Norleucine.
Orn.................................  Ornithine.]
------------------------------------------------------------------------

    5. Section 1.823 is proposed to be revised to read as follows:

Sec. 1.823  Requirements for nucleotide and/or amino acid sequences as
part of the application papers.

    (a) The ``Sequence Listing'' required by Sec. 1.821(c), setting
forth the nucleotide and/or amino acid sequences, and associated
information in accordance with paragraph (b) of this section, must
begin on a new page and be titled ``Sequence Listing'' [and appear] >.
On a separate page of the application specification,< immediately prior
to the claims [.]>, there shall be a reference to the presence of the
``Sequence Listing'' in a ``Sequence Listing Annex.'' The ``Sequence
Listing'' shall appear in the ``Sequence Listing Annex,'' which is
numbered independently of the numbering of the remainder of the
application and shall be placed in the application file. Upon printing
the application as a patent, the ``Sequence Listing Annex'' containing
the paper ``Sequence Listing'' shall be printed immediately before the
patented claims.< Each page of the ``Sequence Listing'' shall contain
no more than 66 lines and each line shall contain no more than 72
characters. A fixed-width font shall be used exclusively throughout the
``Sequence Listing.''
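
    The page limits in paragraph (a) lend themselves to a mechanical
check. The following sketch assumes the ``Sequence Listing'' is
available as a list of text lines; the function name is hypothetical
and nothing here is prescribed by the rule.

```python
def paginate(lines, max_lines=66, max_chars=72):
    """Split listing lines into pages of at most `max_lines` lines,
    rejecting any line longer than `max_chars` characters."""
    for number, line in enumerate(lines, start=1):
        if len(line) > max_chars:
            raise ValueError(
                f"line {number} has {len(line)} characters "
                f"(maximum {max_chars})"
            )
    return [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]


if __name__ == "__main__":
    pages = paginate(["x" * 72] * 100)
    print(len(pages), [len(page) for page in pages])
```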
    (b) The ``Sequence Listing'' shall, except as otherwise indicated,
include, in addition to and immediately preceding the actual nucleotide
and/or amino acid sequence, the [following items of information.] >
numeric identifiers and their accompanying information as shown in the
following table. The numeric identifier shall be used only in the
``Sequence Listing.''< The order and presentation of the items of
information in the ``Sequence Listing'' shall conform to the
arrangement given below [,except that parenthetical explanatory
information following the headings (identifiers) is to be omitted].
Each item of information shall begin on a new line [, enumerated with
the number/numeral/letter in parentheses as shown below, with the
heading (identifier) in upper case characters, followed by a colon, and
then followed by the information provided] > beginning with the numeric
identifier enclosed in angle brackets as shown<. Except as allowed
below, no item of information shall occupy more than one line. [Those
items of information that are applicable for all sequences shall only
be set forth once in the ``Sequence Listing.''] The submission of those
items of information designated with an ``M'' is mandatory. [The
submission of those items of information designated with an ``R'' is
recommended, but not required.] The submission of those items of
information designated with an ``O'' is optional. >Numeric identifiers
<100>
through <193> shall only be set forth at the beginning of the
``Sequence Listing.''< Those items designated with ``rep'' may have
multiple responses and, as such, the item may be repeated in the
``Sequence Listing.''
    [(1) GENERAL INFORMATION (Application, diskette/tape and
publication information):
    (i) APPLICANT (maximum of first ten named applicants; specify one
name per line: SURNAME comma OTHER NAMES and/or INITIALS--M/rep):
    (ii) TITLE OF INVENTION (title of the invention, as elsewhere in
application, four lines maximum--M):
    (iii) NUMBER OF SEQUENCES (number of sequences in the ``Sequence
Listing''--M):
    (iv) CORRESPONDENCE ADDRESS (M):
    (A) ADDRESSEE (name of applicant, firm, company or institution, as
may be appropriate):
    (B) STREET (correspondence street address, as elsewhere in
application, four lines maximum):
    (C) CITY (correspondence city address, as elsewhere in
application):
    (D) STATE (correspondence state, as elsewhere in application):
    (E) COUNTRY (correspondence country, as elsewhere in application):
    (F) ZIP (correspondence ZIP or postal code, as elsewhere in
application):
    (v) COMPUTER READABLE FORM (M):
    (A) MEDIUM TYPE (type of diskette/tape submitted):
    (B) COMPUTER (type of computer used with diskette/tape submitted):
    (C) OPERATING SYSTEM (type of operating system used):
    (D) SOFTWARE (type of software used to create computer readable
form):
    (vi) CURRENT APPLICATION DATA (M, if available):
    (A) APPLICATION NUMBER (U.S. application number, including a series
code, a slash and a serial number, or U.S. PCT application number,
including the letters PCT, a slash, a two-letter code indicating the
U.S. as the Receiving Office, a two-digit indication of the year, a
slash and a five-digit number, if available):
    (B) FILING DATE (U.S. or PCT application filing date, if available;
specify as dd-MMM-yyyy):
    (C) CLASSIFICATION (IPC/US classification or F-term designation,
where F-terms have been developed, if assigned, specify each
designation, left justified, within an eighteen-position alpha numeric
field--rep, to a maximum of ten classification designations):
    (vii) PRIOR APPLICATION DATA (prior domestic, foreign priority or
international application data, if applicable--M/rep):
    (A) APPLICATION NUMBER (application number; specify as two-letter
country code and an eight-digit application number; or if a PCT
application, specify as the letters PCT, a slash, a two-letter code
indicating the Receiving Office, a two-digit indication of the year, a
slash and a five-digit number):
    (B) FILING DATE (document filing date, specify as dd-MMM-yyyy):
    (viii) ATTORNEY/AGENT INFORMATION (O):
    (A) NAME (attorney/agent name; SURNAME comma OTHER NAMES and/or
INITIALS):
    (B) REGISTRATION NUMBER (attorney/agent registration number):
    (C) REFERENCE/DOCKET NUMBER (attorney/agent reference or docket
number):
    (ix) TELECOMMUNICATION INFORMATION (O):
    (A) TELEPHONE (telephone number of applicant or attorney/agent):
    (B) TELEFAX (telefax number of applicant or attorney/agent):
    (C) TELEX (telex number of applicant or attorney/agent):
    (2) INFORMATION FOR SEQ ID NO: X (rep):
    (i) SEQUENCE CHARACTERISTICS (M):
    (A) LENGTH (sequence length, expressed as number of base pairs or
amino acid residues):
    (B) TYPE (sequence type, i.e., whether nucleic acid or amino acid):
    (C) STRANDEDNESS (if nucleic acid, number of strands of source
organism molecule, i.e., whether single-stranded, double-stranded, both
or unknown to applicant):
    (D) TOPOLOGY (whether source organism molecule is circular, linear,
both or unknown to applicant):
    (ii) MOLECULE TYPE (type of molecule sequenced in SEQ ID NO:X (at
least one of the following should be included with subheadings, if any,
in Sequence Listing--R)):

--Genomic RNA;
--Genomic DNA;
--mRNA;
--tRNA;
--rRNA;
--snRNA;
--scRNA;
--preRNA;
--cDNA to genomic RNA;
--cDNA to mRNA;
--cDNA to tRNA;
--cDNA to rRNA;
--cDNA to snRNA;
--cDNA to scRNA;
--Other nucleic acid;

    (A) DESCRIPTION (four lines maximum):

--protein and
--peptide.

    (iii) HYPOTHETICAL (yes/no--R):
    (iv) ANTI-SENSE (yes/no--R):
    (v) FRAGMENT TYPE (for proteins and peptides only, at least one of
the following should be included in the Sequence Listing--R):

--N-terminal fragment;
--C-terminal fragment and
--internal fragment.

    (vi) ORIGINAL SOURCE (original source of molecule sequenced in SEQ
ID NO:X--R):
    (A) ORGANISM (scientific name of source organism):
    (B) STRAIN:
    (C) INDIVIDUAL ISOLATE (name/number of individual/isolate):
    (D) DEVELOPMENTAL STAGE (give developmental stage of source
organism and indicate whether derived from germ-line or rearranged
developmental pattern):
    (E) HAPLOTYPE:
    (F) TISSUE TYPE:
    (G) CELL TYPE:
    (H) CELL LINE:
    (I) ORGANELLE:
    (vii) IMMEDIATE SOURCE (immediate experimental source of the
sequence in SEQ ID NO:X--R):
    (A) LIBRARY (library -type, name):
    (B) CLONE (clone(s)):
    (viii) POSITION IN GENOME (position of sequence in SEQ ID NO:X in
genome--R):
    (A) CHROMOSOME/SEGMENT (chromosome/segment--name/number):
    (B) MAP POSITION:
    (C) UNITS (units for map position, i.e., whether units are genome
percent, nucleotide number or other/specify):
    (ix) FEATURE (description of points of biological significance in
the sequence in SEQ ID NO:X--R/rep):
    (A) NAME/KEY (provide appropriate identifier for feature--four lines
maximum):
    (B) LOCATION (specify location according to syntax of DDBJ/EMBL/
GenBank Feature Tables Definition, including whether feature is on
complement of presented sequence; where appropriate state number of
first and last bases/amino acids in feature--four lines maximum):
    (C) IDENTIFICATION METHOD (method by which the feature was
identified, i.e., by experiment, by similarity with known sequence or
to an established consensus sequence, or by similarity to some other
pattern--four lines maximum):
    (D) OTHER INFORMATION (include information on phenotype conferred,
biological activity of sequence or its product, macromolecules which
bind to sequence or its product, or other relevant information--four
lines maximum):
    (x) PUBLICATION INFORMATION (Repeat section for each relevant
publication--O/rep):
    (A) AUTHORS (maximum of first ten named authors of publication;
specify one name per line: SURNAME comma OTHER NAMES and/or INITIALS--
rep):
    (B) TITLE (title of publication):
    (C) JOURNAL (journal name in which data published):
    (D) VOLUME (journal volume in which data published):
    (E) ISSUE (journal issue number in which data published):
    (F) PAGES (journal page numbers in which data published):
    (G) DATE (journal date in which data published; specify as dd-MMM-
yyyy, MMM-yyyy or Season-yyyy):
    (H) DOCUMENT NUMBER (document number, for patent type citations
only; specify as two-letter country code, eight-digit document number
(right justified), one letter and as appropriate, one number or a space
as a document type code; or if a PCT application specify as the letters
PCT, a slash, a two-letter code indicating the Receiving Office, a two-
digit indication of the year, a slash and a five-digit number; or if a
PCT publication, specify as the two letters WO, a two-digit indication
of the year, a slash and a five-digit publication number):
    (I) FILING DATE (document filing date, for patent-type citations
only; specify as dd-MMM-yyyy):
    (J) PUBLICATION DATE (document publication date; for patent-type
citations only, specify as dd-MMM-yyyy):
    (K) RELEVANT RESIDUES IN SEQ ID NO:X (rep): FROM (position) TO
(position)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:X:]

----------------------------------------------------------------------------------------------------------------
                                                                                       Mandatory (M) or Optional
     Numeric identifier               Definition              Comments and format                 (O)
----------------------------------------------------------------------------------------------------------------
<100>......................  General Information........  Leave blank after <100>...  M
<110>......................  Applicant..................  Max. of 10 names; one name  M
                                                           per line; use format:
                                                           Surname, Other Names and/
                                                           or Initials; rep.
<120>......................  Title of Invention.........  Four lines maximum........  M
<130>......................  Number of Sequences........  Use an integer as a         M
                                                           response.
<140>......................  Correspondence Address.....  <140> must be present if    O
                                                           subheadings <141>-<146>
                                                           are used.
<141>......................  Addressee..................  ..........................  O
<142>......................  Street.....................  Four lines maximum........  O
<143>......................  City.......................  ..........................  O
<144>......................  State or Province..........  ..........................  O
<145>......................  Country....................  ..........................  O
<146>......................  Zip or Postal Code.........  ..........................  O
<150>......................  Computer Readable Form.....  Leave blank after <150>...  O
<151>......................  Medium Type................  Type of diskette/tape       O
                                                           submitted.
<152>......................  Computer...................  Type of computer used to    O
                                                           create diskette/tape.
<153>......................  Operating System...........  Type of operating system    O
                                                           on computer.
<154>......................  Software...................  Type of software used to    O
                                                           create computer readable
                                                           form.
<160>......................  Current Application Data...  Leave blank after <160>;    M, if available.
                                                           <160> must be present if
                                                           subheadings <161> & <162>
                                                           are used.
<161>......................  Application Number.........  Specify as: US 07/999,999   M, if available.
                                                           or PCT/US96/99999.
<162>......................  Filing Date................  Specify as: dd-MMM-yyyy...  M, if available.
<170>......................  Prior Application Data.....  Insert heading/subheadings  M, if applicable
                                                           only if applicable; leave
                                                           blank after <170>; <170>
                                                           must be present if
                                                           subheadings <171> & <172>
                                                           are used; rep.
<171>......................  Application Number.........  Specify as: US 07/999,999   M, if applicable.
                                                           or PCT/US96/99999.
<172>......................  Filing Date................  Specify as: dd-MMM-yyyy...  M, if applicable.
<180>......................  Attorney/Agent Information.  Leave blank after <180>...  O
<181>......................  Name.......................  Use format: Surname, Other  O
                                                           Names and/or Initials.
<182>......................  Registration Number........  ..........................  O
<183>......................  File Reference/Docket        ..........................  O
                              Number.
<190>......................  Telecommunication            Leave blank after <190>...  O
                              Information.
<191>......................  Telephone..................  ..........................  O
<192>......................  Telefax....................  ..........................  O
<193>......................  Electronic mail address....  ..........................  O
<200>......................  Information for SEQ ID       Response shall be an        M
                              NO:#:.                       integer representing the
                                                           SEQ ID NO shown; rep.
<210>......................  Sequence Characteristics...  Leave blank after <210>...  M
<211>......................  Length.....................  Respond with an integer     M
                                                           expressing the number of
                                                           bases or amino acid
                                                           residues.
<212>......................  Type.......................  Whether presented sequence  M
                                                           molecule is nucleotide or
                                                           amino acid, indicated by
                                                           N or A.
<214>......................  Topology...................  Whether presented sequence  M
                                                           molecule is linear or
                                                           circular, indicated as L
                                                           or C.
<290>......................  Feature....................  Description of points of    M, if ``N'', ``Xaa'', or a
                                                           biological significance     modified or unusual L-
                                                           in the sequence; leave      amino acid or modified
                                                           blank after <290>; rep.     base was used in the
                                                                                       sequence.
<291>......................  Name/Key...................  Provide appropriate         M, if ``N'', ``Xaa'', or a
                                                           identifier for feature;     modified or unusual L-
                                                           four lines maximum.         amino acid or modified
                                                                                       base was used in the
                                                                                       sequence.
<292>......................  Location...................  Specify location within     M, if ``N'', ``Xaa'', or a
                                                           sequence; where             modified or unusual L-
                                                           appropriate state number    amino acid or modified
                                                           of first and last bases/    base was used in the
                                                           amino acids in feature;     sequence.
                                                           four lines maximum.
<294>......................  Other Information..........  Other relevant              M, if ``N'', ``Xaa'', or a
                                                           information; four lines     modified or unusual L-
                                                           maximum.                    amino acid or modified
                                                                                       base was used in the
                                                                                       sequence.
<300>......................  Publication Information....  Leave blank after <300>;    O
                                                           rep.
<301>......................  Authors....................  Maximum of ten named        O
                                                           authors of publication;
                                                           specify one name per
                                                           line; use format:
                                                           Surname, Other Names and/
                                                           or Initials.
<302>......................  Title......................  ..........................  O
<303>......................  Journal....................  ..........................  O
<304>......................  Volume.....................  ..........................  O
<305>......................  Issue......................  ..........................  O
<306>......................  Pages......................  ..........................  O
<307>......................  Date.......................  Journal date in which data  O
                                                           published; specify as dd-
                                                           MMM-yyyy, MMM-yyyy or
                                                           Season-yyyy.
<308>......................  Patent Document Number.....  Document number; for        O
                                                           patent-type citations
                                                           only.
<309>......................  Filing Date................  Document filing date, for   O
                                                           patent-type citations
                                                           only; specify as dd-MMM-
                                                           yyyy.
<310>......................  Publication Date...........  Document publication date,  O
                                                           for patent-type citations
                                                           only; specify as dd-MMM-
                                                           yyyy.
<311>......................  Relevant Residues..........  FROM (position) TO          O
                                                           (position).
<400>......................  Sequence Description: SEQ    Response shall be an        M
                              ID NO:#:.                    integer representing the
                                                           SEQ ID NO shown; rep.
----------------------------------------------------------------------------------------------------------------
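
    As a rough illustration of how the table might be applied, the
sketch below renders one item of information per line, beginning with
the numeric identifier in angle brackets, and reports unconditionally
mandatory identifiers that are missing. The single space between the
identifier and the response, the function names and the sample values
are assumptions of this sketch. Identifiers that are mandatory only
``if available'' or ``if applicable'' (<160>-<172>, <290>-<294>) are
deliberately left out of the check.

```python
# Identifiers marked plain "M" in the table above.
MANDATORY_GENERAL = (100, 110, 120, 130)
MANDATORY_PER_SEQUENCE = (200, 210, 211, 212, 214, 400)


def render_item(identifier, response=""):
    """Render one item: the numeric identifier in angle brackets,
    followed by the response, or nothing where the table says to
    leave the line blank after the identifier."""
    return f"<{identifier}> {response}".rstrip()


def missing_mandatory(identifiers, required):
    """Return the required identifiers absent from `identifiers`,
    in table order."""
    present = set(identifiers)
    return [ident for ident in required if ident not in present]


if __name__ == "__main__":
    # Hypothetical values, for layout only.
    items = [(100, ""), (110, "Doe, Jane"), (130, "2")]
    for ident, response in items:
        print(render_item(ident, response))
    print("missing:", missing_mandatory([i for i, _ in items],
                                        MANDATORY_GENERAL))
```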

    6. Section 1.824 is proposed to be revised to read as follows:

Sec. 1.824  Form and format for nucleotide and/or amino acid sequence
submissions in computer readable form.

    (a) The computer readable form required by Sec. 1.821(e) shall
[contain a printable copy of the ``Sequence Listing,'' as defined in
Secs. 1.821(c), 1.822 and 1.823, recorded as] >meet the following
specifications:
    (1) The computer readable form shall contain< a single [file on]
>``Sequence Listing'' as< either a diskette, [or a magnetic tape]
>series of diskettes, or other permissible media outlined in
Sec. 1.824(c)<. [The computer readable form shall be encoded and
formatted such that a printed copy of the ``Sequence Listing'' may be
recreated using the print commands of the computer/operating-system
configurations specified in paragraph (f) of this section.]
    [(b)] >(2)< The [file] >``Sequence Listing''< in paragraph (a)
>(1)< of this section shall be [encoded in a subset of the] >submitted
in< American Standard Code for Information Interchange (ASCII) >text<.
[This subset shall consist of all printable ASCII characters including
the ASCII space character plus line-termination, pagination and end-of-
file characters associated with the computer/operating-system
configurations specified in paragraph (f) of this section.] No other
[characters] >formats< shall be allowed.
    [(c)] >(3)< The computer readable form may be created by any means,
such as word processors, nucleotide/amino acid sequence editors or
other custom computer programs; however, it shall [be readable by one
of the computer/operating-system configurations specified in paragraph
(f) of this section, and shall] conform to [the] >all< specifications
[in paragraphs (a) and (b) of] >detailed in< this section.
    [(d) The entire printable copy of the ``Sequence Listing'' shall be
contained within one file on a single diskette or magnetic tape unless
it is shown to the satisfaction of the Commissioner that it is not
practical or possible to submit the entire printable copy of the
``Sequence Listing'' within one file on a single diskette or magnetic
tape.
    (e) The submitted diskette or tape shall be write-protected such as
by covering or uncovering diskette holes, removing diskette write tabs
or removing tape write rings.
    (f) As set forth in paragraph (c), above, any means may be used to
create the computer readable form, as long as the following conditions
are satisfied. A submitted diskette shall be readable on one of the
computer/operating-system configurations described in paragraphs (1)
through (3), below. A submitted tape shall satisfy the format
specifications described in paragraph (4), below.]
    >(4) File compression is acceptable when using diskette media, so
long as the compressed file is in a self-extracting format that will
decompress on one of the systems described in paragraph (b) of this
section.
    (5) Page numbering shall not appear within the computer readable
form version of the ``Sequence Listing'' file.
    (6) All computer readable forms shall have a label permanently
affixed thereto on which has been hand-printed or typed: the name of
the applicant, the title of the invention, the name and type of
computer and operating system used, and application serial number and
filing date, if known.
    (b) Computer readable form files submitted must meet these format
requirements:<
    (1) Computer: IBM PC/XT/AT, >or compatibles< [ IBM PS/2 or
compatibles]>, or Apple Macintosh<;
    [(i)]>(2)< Operating System: [PC-DOS or] MS-DOS [(Versions 2.1 or
above)] >, Unix or Macintosh<;
    [(ii)]>(3)< Line Terminator: ASCII Carriage Return plus ASCII Line
Feed;
    [(iii)]>(4)< Pagination: [ASCII Form Feed or Series of Line
Terminators] >Continuous file (no ``hard page break'' codes
permitted)<;
    [(iv) End-of-File: ASCII SUB (Ctrl-Z);
    (v) Media:]
    >(c) Computer readable form files submitted may be in any of the
following media:<
    [(A) Diskette--5.25 inch, 360 Kb storage;
    (B) Diskette--5.25 inch, 1.2 Mb storage;
    (C) Diskette--3.50 inch, 720 Kb storage;
    (D) Diskette--3.5 inch, 1.44 Mb storage;]
    >(1) Diskette: 3.50 inch, 1.44 Mb storage;
    3.50 inch, 720 Kb storage;
    5.25 inch, 1.2 Mb storage;
    5.25 inch, 360 Kb storage;<
    [(vi) Print Command: PRINT filename.extension;
    (2) Computer: IBM PC/XT/AT, IBM PS/2 or compatibles;
    (i) Operating system: Xenix;
    (ii) Line Terminator: ASCII Carriage Return;
    (iii) Pagination: ASCII Form Feed or Series of Line Terminators;
    (iv) End-of-File: None;
    (v) Media:
    (A) Diskette--5.25 inch, 360 Kb storage;
    (B) Diskette--5.25 inch, 1.2 Mb storage;
    (C) Diskette--3.50 inch, 720 Kb storage;
    (D) Diskette--3.5 inch, 1.44 Mb storage;
    (vi) Print Command: lpr filename;
    (3) Computer: Apple Macintosh;
    (i) Operating System: Macintosh;
    (ii) Macintosh File Type: text with line termination;
    (iii) Line Terminator: Pre-defined by text type file;
    (iv) Pagination: Pre-defined by text type file;
    (v) End-of-File: Pre-defined by text type file;
    (vi) Media:
    (A) Diskette--3.50 inch, 400 Kb storage;
    (B) Diskette--3.50 inch, 800 Kb storage;
    (C) Diskette--3.50 inch, 1.4 Mb storage;
    (vii) Print Command: Use PRINT command from any Macintosh
Application that processes text files, such as Mac-Write or TeachText;
    (4) Magnetic tape: 0.5 inch, up to 2400 feet;
    (i) Density: 1600 or 6250 bits per inch, 9 track;
    (ii) Format: raw, unblocked;
    (iii) Line Terminator: ASCII Carriage Return plus optional ASCII
Line Feed;
    (iv) Pagination: ASCII Form Feed or Series of Line Terminators;
    (v) Print Command (Unix shell version given here as sample
response--mt/dev/rmt0; lpr/dev/rmt0):]
    >(2) Magnetic tape: 0.5 inch, up to 24000 feet;
    Density: 1600 or 6250 bits per inch, 9 track;
    Format: Unix tar command; specify blocking factor (not ``block
size'')
    Line Terminator: ASCII Carriage Return plus ASCII Line Feed;
    (3) 8mm Data Cartridge:
    Format: Unix tar command; specify blocking factor (not ``block
size'')
    Line Terminator: ASCII Carriage Return plus ASCII Line Feed;
    (4) CD-ROM:
    Format: ISO 9660 or High Sierra Format
    (5) Magneto Optical Disk:
    Size/Storage Specifications: 5.25 inch, 640 Mb<
    [(g)]>(d)< Computer readable forms that are submitted to the Office
will not be returned to the applicant.
    [(h) All computer readable forms shall have a label permanently
affixed thereto on which has been hand-printed or typed, a description
of the format of the computer readable form as well as the name of the
applicant, the title of the invention, the date on which the data were
recorded on the computer readable form and the name and type of
computer and operating system which generated the files on the computer
readable form. If all this information cannot be printed on a label
affixed to the computer readable form, by reason of size or otherwise,
the label shall include the name of the applicant and the title of the
invention and a reference number, and the additional information may be
provided on a container for the computer readable form with the name of
the applicant, the title of the invention, the reference number and the
additional information affixed to the container. If the computer
readable form is submitted after the date of filing under 35 U.S.C.
111, after the date of entry in the national stage under 35 U.S.C. 371
or after the time of filing, in the United States Receiving Office, an
international application under the PCT, the labels mentioned herein
must also include the date of the application and the application
number, including series code and serial number.]
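
The file-format requirements proposed in Sec. 1.824(a)(2), (b)(3) and (b)(4) above (ASCII text, Carriage Return plus Line Feed line terminators, and a continuous file without hard page breaks) lend themselves to a simple automated check. The following is an illustrative sketch only and is not a tool of the Office: the function name is hypothetical, and treating an ASCII Form Feed as the ``hard page break'' code is an assumption. It does not attempt to detect page numbering under paragraph (a)(5).

```python
def check_crf_bytes(data: bytes) -> list:
    """Return a list of problems found in the raw bytes of a CRF file."""
    problems = []

    # Sec. 1.824(a)(2): the listing must be ASCII text.
    for offset, byte in enumerate(data):
        if byte > 0x7F:
            problems.append(
                f"non-ASCII byte 0x{byte:02X} at offset {offset}")
            break  # reporting the first occurrence is enough for a sketch

    # Sec. 1.824(b)(4): continuous file, no hard page breaks.
    # Assumption: a hard page break is an ASCII Form Feed (0x0C).
    if b"\x0c" in data:
        problems.append("form feed (hard page break) present")

    # Sec. 1.824(b)(3): lines end with Carriage Return plus Line Feed.
    # After removing every CR+LF pair, no lone CR or LF may remain.
    remainder = data.replace(b"\r\n", b"")
    if b"\r" in remainder or b"\n" in remainder:
        problems.append("line terminator other than CR+LF present")

    return problems
```

A caller might read a file in binary mode and pass its contents to check_crf_bytes; an empty list means none of the three conditions tested here was violated.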
    7. Section 1.825 is proposed to be amended by revising paragraphs
(a), (b) and (d) to read as follows:

Sec. 1.825  Amendments to or replacement of sequence listing and
computer readable copy thereof.

    (a) Any amendment to the paper copy of the ``Sequence Listing''
(Sec. 1.821(c)) must be made by the submission of substitute sheets.
Amendments must be accompanied by a statement that indicates support
for the amendment in the application, as filed, and a statement that
the substitute sheets include no new matter. Such a statement must be a
verified statement if made by a person not registered to practice
before the Office.
    (b) Any amendment to the paper copy of the ``Sequence Listing,'' in
accordance with paragraph (a) of this section, must be accompanied by a
substitute copy of the computer readable form (Sec. 1.821(e)) including
all previously submitted data with the amendment incorporated therein,
accompanied by a statement that the copy in computer readable form is
the same as the substitute copy of the ``Sequence Listing.'' Such a
statement must be a verified statement if made by a person not
registered to practice before the Office.
    (c) * * *
    (d) If, upon receipt, the computer readable form is found to be
damaged or unreadable, applicant must provide, within such time as set
by the Commissioner, a substitute copy of the data in computer readable
form accompanied by a statement that the substitute data is identical
to that originally filed. Such a statement must be a verified statement
if made by a person not registered to practice before the Office.
    8. Appendix A to subpart G is proposed to be revised to read as
follows:

Appendix A To Subpart G Of Part 1--Sample Sequence Listing

    [(1) GENERAL INFORMATION:
(i) APPLICANT: Doe, Joan X, Doe, John Q
(ii) TITLE OF INVENTION: Isolation and Characterization of a Gene
Encoding a Protease from Paramecium sp.
(iii) NUMBER OF SEQUENCES: 2
(iv) CORRESPONDENCE ADDRESSES:
    (A) ADDRESSEE: Smith and Jones
    (B) STREET: 123 Main Street
    (C) CITY: Smalltown
    (D) STATE: Anystate
    (E) COUNTRY: USA
    (F) ZIP: 12345
(v) COMPUTER READABLE FORM:
    (A) MEDIUM TYPE: Diskette, 3.50 inch, 800 Kb storage
    (B) COMPUTER: Apple Macintosh
    (C) OPERATING SYSTEM: Macintosh 5.0
    (D) SOFTWARE: MacWrite
(vi) CURRENT APPLICATION DATA:
    (A) APPLICATION NUMBER: 09/999,999
    (B) FILING DATE: 28-FEB-1989
    (C) CLASSIFICATION: 999/99
(vii) PRIOR APPLICATION DATA:
    (A) APPLICATION NUMBER: PCT/US88/99999
    (B) FILING DATE: 01-MAR-1988
(viii) ATTORNEY/AGENT INFORMATION:
    (A) NAME: Smith, John A
    (B) REGISTRATION NUMBER: 00001
    (C) REFERENCE/DOCKET NUMBER: 01-0001
(ix) TELECOMMUNICATIONS INFORMATION:
    (A) TELEPHONE: (909) 999-0001
    (B) TELEFAX: (909) 999-0002
    (2) INFORMATION FOR SEQ ID NO: 1:
(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 954 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: genomic DNA
(iii) HYPOTHETICAL: yes
(iv) ANTI-SENSE: no
(vi) ORIGINAL SOURCE:
    (A) ORGANISM: Paramecium sp
    (C) INDIVIDUAL/ISOLATE: XYZ2
    (G) CELL TYPE: unicellular organism
(vii) IMMEDIATE SOURCE:
    (A) LIBRARY: genomic
    (B) CLONE: Para-XYZ2/36
(x) PUBLICATION INFORMATION:
    (A) AUTHORS: Doe, Joan X, Doe, John Q
    (B) TITLE: Isolation and Characterization of a Gene Encoding a
Protease from Paramecium sp.
    (C) JOURNAL: Fictional Genes
    (D) VOLUME: I
    (E) ISSUE: 1
    (F) PAGES: 1-20
    (G) DATE: 02-MAR-1988
    (K) RELEVANT RESIDUES IN SEQ ID NO: 1: FROM 1 TO 954

[GRAPHIC] [TIFF OMITTED] TP04OC96.056

    (2) INFORMATION FOR SEQ ID NO: 2:
(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 82 amino acids
    (B) TYPE: amino acid
    (C) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(ix) FEATURE:
    (A) NAME/KEY: signal sequence
    (B) LOCATION: -34 to -1
    (C) IDENTIFICATION METHOD: similarity to other signal sequences,
hydrophobic
    (D) OTHER INFORMATION: expresses protease
(x) PUBLICATION INFORMATION:
    (A) AUTHORS: Doe, Joan X, Doe, John Q
    (B) TITLE: Isolation and Characterization of a Gene Encoding a
Protease from Paramecium sp.
    (C) JOURNAL: Fictional Genes
    (D) VOLUME: I
    (E) ISSUE: 1
    (F) PAGES: 1-20
    (G) DATE: 02-MAR-1988
    (H) RELEVANT RESIDUES IN SEQ ID NO:2: FROM -34 TO 48

[GRAPHIC] [TIFF OMITTED] TP04OC96.057

>
    <100>
    <110> Doe, Joan X, Doe, John Q
    <120> Isolation and Characterization of a Gene Encoding a
Protease from Paramecium sp.
    <130> 2
    <140>
    <141> Smith and Jones
    <142> 123 Main Street
    <143> Smalltown
    <144> Anystate
    <145> USA
    <146> 12345
    <150>
    <151> Floppy disk
    <152> IBM PC compatible
    <153> PC-DOS/MS-DOS
    <154> PatentIn Release #2.00
    <160>
    <161> 09/999,999
    <162> 28-FEB-1989
    <170>
    <171> PCT/US/88/99999
    <172> 01-MAR-1988
    <180>
    <181> Smith, John A
    <182> 00001
    <183> 01-0001
    <190>
    <191> (909) 999-0001
    <192> (909) 999-0002
    <200> 1
    <210>
    <211> 954 base pairs
    <212> N
    <214> L
    <290>
    <291> CDS
    <292> join(275..373, 448..498, 679..774)
    <290>
    <291> mat_peptide
    <292> join(451..498, 679..774)
    <300>
    <301> Doe, Joan X, Doe, John Q
    <302> Isolation and Characterization of a Gene Encoding a
Protease from Paramecium sp.
    <303> Fictional Genes
    <304> 1
    <305> 1
    <306> 1-20
    <307> 02-MAR-1988
    <308> FROM 1 TO 957
    <400> 1

atcgggatag tactggtcaa gaccggtgga caccggttaa ccccggttaa gtaccggtta 60
taggccattt caggccaaat gtgcccaact acgccaattg ttttgccaac ggccaacgtt 120
acgttcgtac gcacgtatgt acctaggtac ttacggacgt gactacggac acttccgtac 180
gtacgtacgt ttacgtaccc atcccaacgt aaccacagtg tggtcgcagt gtcccagtgt 240
acacagactg ccagacattc ttcacagaca cccc atg aca cca cct gaa cgt 292
Met Thr Pro Pro Glu Arg
-30
ctc ttc ctc cca agg gtg tgt ggc acc acc cta cac ctc ctc ctt ctg 340
Leu Phe Leu Pro Arg Val Cys Gly Thr Thr Leu His Leu Leu Leu Leu
-25 -20 -15
ggg ctg ctg ctg gtt ctg ctg cct ggg gcc cat gtgaggcagc aggagaatgg 393
Gly Leu Leu Leu Val Leu Leu Pro Gly Ala His
-10 -5
ggtggctcag ccaaaccttg agccctagag cccccctcaa ctctgttctc ctag ggg 450
Gly
ctc atg cat ctt gcc cac agc aac ctc aaa cct gct gct cac ctc att 498
Leu Met His Leu Ala His Ser Asn Leu Lys Pro Ala Ala His Leu Ile
1 5 10 15
gtaaacatcc acctgacctc ccagacatgt ccccaccagc tctcctccta cccctgcctc 558
aggaacccaa gcatccaccc ctctccccca acttccccca cgctaaaaaa aacagaggga 618
gcccactcct atgcctcccc ctgccatccc ccaggaactc agttgttcag tgcccacttc 678
tac ccc agc aag cag aac tca ctg ctc tgg aga gca aac acg gac cgt 726
Tyr Pro Ser Lys Gln Asn Ser Leu Leu Trp Arg Ala Asn Thr Asp Arg
20 25 30
gcc ttc ctc cag gat ggt ttc tcc ttg agc aac aat tct ctc ctg gtc 774
Ala Phe Leu Gln Asp Gly Phe Ser Leu Ser Asn Asn Ser Leu Leu Val
35 40 45
tagaaaaaat aattgatttc aagaccttct ccccattctg cctccattct gaccatttca 834
ggggtcgtca ccacctctcc tttggccatt ccaacagctc aagtcttccc tgatcaagtc 894
accggagctt tcaaagaagg aattctaggc atcccagggg acccacacct ccctgaacca 954

[GRAPHIC] [TIFF OMITTED] TP04OC96.058

<200> 2
<210>
<211> 82 amino acids
<212> A
<214> L
<400> 2
Met Thr Pro Pro Glu Arg Leu Phe Leu Pro Arg Val Cys Gly Thr Thr
-30 -25 -20
Leu His Leu Leu Leu Leu Gly Leu Leu Leu Val Leu Leu Pro Gly Ala
-15 -10 -5
His Gly Leu Met His Leu Ala His Ser Asn Leu Lys Pro Ala Ala His
1 5 10
Leu Ile Tyr Pro Ser Lys Gln Asn Ser Leu Leu Trp Arg Ala Asn Thr
15 20 25 30
Asp Arg Ala Phe Leu Gln Asp Gly Phe Ser Leu Ser Asn Asn Ser Leu
35 40 45
Leu Val <
[GRAPHIC] [TIFF OMITTED] TP04OC96.059
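
The numeric identifiers and feature locations in the sample listing above can be read and cross-checked programmatically. The following is a minimal sketch under stated assumptions, not a prescribed method: the function names are hypothetical, a line lacking an identifier is assumed to continue the previous entry (as with the wrapped <120> title), and a location is assumed to consist only of inclusive start..end ranges.

```python
import re

_FIELD_RE = re.compile(r"\s*<([0-9]{3})>\s*(.*)")
_RANGE_RE = re.compile(r"([0-9]+)\.\.([0-9]+)")


def parse_fields(lines):
    """Turn '<NNN> value' lines into (identifier, value) pairs.

    Assumption: a non-blank line without an identifier continues the
    value of the preceding identifier.
    """
    fields = []
    for line in lines:
        match = _FIELD_RE.fullmatch(line.rstrip())
        if match:
            fields.append((match.group(1), match.group(2)))
        elif fields and line.strip():
            code, value = fields[-1]
            fields[-1] = (code, (value + " " + line.strip()).strip())
    return fields


def location_length(location: str) -> int:
    """Sum the inclusive lengths of every start..end range in a location."""
    return sum(int(end) - int(start) + 1
               for start, end in _RANGE_RE.findall(location))


sample = [
    "    <290>",
    "    <291> CDS",
    "    <292> join(275..373, 448..498, 679..774)",
]
fields = parse_fields(sample)
cds_length = location_length(fields[2][1])  # 99 + 51 + 96 nucleotides
```

For the sample, the CDS ranges total 246 nucleotides, or 82 codons, which agrees with the 82 amino acids given under <211> for SEQ ID NO 2. The mat_peptide ranges total 144 nucleotides, or 48 codons, which agrees with residues 1 through 48 of that sequence.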

    9. Appendix B to Subpart G is proposed to be removed.

[Appendix B To Subpart G of Part 1--Headings For Information Items
In Sec. 1.823

    (1) GENERAL INFORMATION:
(i) APPLICANT:
(ii) TITLE OF INVENTION:
(iii) NUMBER OF SEQUENCES:
(iv) CORRESPONDENCE ADDRESS:
    (A) ADDRESSEE:
    (B) STREET:
    (C) CITY:
    (D) STATE:
    (E) COUNTRY:
    (F) ZIP:
(v) COMPUTER READABLE FORM:
    (A) MEDIUM TYPE:
    (B) COMPUTER:
    (C) OPERATING SYSTEM:
    (D) SOFTWARE:
(vi) CURRENT APPLICATION DATA:
    (A) APPLICATION NUMBER:
    (B) FILING DATE:
    (C) CLASSIFICATION:
(vii) PRIOR APPLICATION DATA:
    (A) APPLICATION NUMBER:
    (B) FILING DATE:
(viii) ATTORNEY/AGENT INFORMATION:
    (A) NAME:
    (B) REGISTRATION NUMBER:
    (C) REFERENCE/DOCKET NUMBER:
(ix) TELECOMMUNICATIONS INFORMATION:
    (A) TELEPHONE:
    (B) TELEFAX:
    (C) TELEX:
    (2) INFORMATION FOR SEQ ID NO: X:
(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH:
    (B) TYPE:
    (C) STRANDEDNESS:
    (D) TOPOLOGY:
(ii) MOLECULE TYPE:
    --Genomic RNA;
    --Genomic DNA;
    --mRNA;
    --tRNA;
    --rRNA;
    --snRNA;
    --scRNA;
    --preRNA;
    --cDNA to genomic RNA;
    --cDNA to mRNA;
    --cDNA to tRNA;
    --cDNA to rRNA;
    --cDNA to snRNA;
    --cDNA to scRNA;
    --Other nucleic acid;
    (A) DESCRIPTION:
    --protein and
    --peptide.
(iii) HYPOTHETICAL:
(iv) ANTI-SENSE:
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
    (A) ORGANISM:
    (B) STRAIN:
    (C) INDIVIDUAL ISOLATE:
    (D) DEVELOPMENTAL STAGE:
    (E) HAPLOTYPE:
    (F) TISSUE TYPE:
    (G) CELL TYPE:
    (H) CELL LINE:
    (I) ORGANELLE:
(vii) IMMEDIATE SOURCE:
    (A) LIBRARY:
    (B) CLONE:
(viii) POSITION IN GENOME:
    (A) CHROMOSOME/SEGMENT:
    (B) MAP POSITION:
    (C) UNITS:
(ix) FEATURE:
    (A) NAME/KEY:
    (B) LOCATION:
    (C) IDENTIFICATION METHOD:
    (D) OTHER INFORMATION:
(x) PUBLICATION INFORMATION:
    (A) AUTHORS:
    (B) TITLE:
    (C) JOURNAL:
    (D) VOLUME:
    (E) ISSUE:
    (F) PAGES:
    (G) DATE:
    (H) DOCUMENT NUMBER:
    (I) FILING DATES:
    (J) PUBLICATION DATE:
    (K) RELEVANT RESIDUES:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:X: ]

    Dated: September 23, 1996.
Bruce A. Lehman,
Assistant Secretary of Commerce and Commissioner of Patents and Trademarks.
[FR Doc. 96-25074 Filed 10-3-96; 8:45 am]
BILLING CODE 3510-16-P