2422    Nucleotide and/or Amino Acid Sequence Disclosures in Patent Applications [R-07.2015]

37 CFR 1.821  Nucleotide and/or amino acid sequence disclosures in patent applications.

  • (a) Nucleotide and/or amino acid sequences as used in §§ 1.821 through 1.825 are interpreted to mean an unbranched sequence of four or more amino acids or an unbranched sequence of ten or more nucleotides. Branched sequences are specifically excluded from this definition. Sequences with fewer than four specifically defined nucleotides or amino acids are specifically excluded from this section. “Specifically defined” means those amino acids other than “Xaa” and those nucleotide bases other than “n” defined in accordance with the World Intellectual Property Organization (WIPO) Handbook on Industrial Property Information and Documentation, Standard ST.25: Standard for the Presentation of Nucleotide and Amino Acid Sequence Listings in Patent Applications (1998), including Tables 1 through 6 in Appendix 2, herein incorporated by reference. (Hereinafter “WIPO Standard ST.25 (1998)”). This incorporation by reference was approved by the Director of the Federal Register in accordance with 5 U.S.C. 552(a) and 1 CFR part 51. Copies of WIPO Standard ST.25 (1998) may be obtained from the World Intellectual Property Organization; 34 chemin des Colombettes; 1211 Geneva 20 Switzerland. Copies of ST.25 may be inspected at the Patent Search Room; Crystal Plaza 3, Lobby Level; 2021 South Clark Place; Arlington, VA 22202. Copies may also be inspected at the Office of the Federal Register, 800 North Capitol Street, NW, Suite 700, Washington, DC. Nucleotides and amino acids are further defined as follows:
    • (1) Nucleotides: Nucleotides are intended to embrace only those nucleotides that can be represented using the symbols set forth in WIPO Standard ST.25 (1998), Appendix 2, Table 1. Modifications, e.g., methylated bases, may be described as set forth in WIPO Standard ST.25 (1998), Appendix 2, Table 2, but shall not be shown explicitly in the nucleotide sequence.
    • (2) Amino acids: Amino acids are those L-amino acids commonly found in naturally occurring proteins and are listed in WIPO Standard ST.25 (1998), Appendix 2, Table 3. Those amino acid sequences containing D-amino acids are not intended to be embraced by this definition. Any amino acid sequence that contains post-translationally modified amino acids may be described as the amino acid sequence that is initially translated using the symbols shown in WIPO Standard ST.25 (1998), Appendix 2, Table 3 with the modified positions; e.g., hydroxylations or glycosylations, being described as set forth in WIPO Standard ST.25 (1998), Appendix 2, Table 4, but these modifications shall not be shown explicitly in the amino acid sequence. Any peptide or protein that can be expressed as a sequence using the symbols in WIPO Standard ST.25 (1998), Appendix 2, Table 3 in conjunction with a description in the Feature section to describe, for example, modified linkages, cross links and end caps, non-peptidyl bonds, etc., is embraced by this definition.
  • (b) Patent applications which contain disclosures of nucleotide and/or amino acid sequences, in accordance with the definition in paragraph (a) of this section, shall, with regard to the manner in which the nucleotide and/or amino acid sequences are presented and described, conform exclusively to the requirements of §§ 1.821 through 1.825.
  • (c) Patent applications which contain disclosures of nucleotide and/or amino acid sequences must contain, as a separate part of the disclosure, a paper copy disclosing the nucleotide and/or amino acid sequences and associated information using the symbols and format in accordance with the requirements of §§ 1.822 and 1.823. This paper copy is hereinafter referred to as the “Sequence Listing.” Each sequence disclosed must appear separately in the “Sequence Listing.” Each sequence set forth in the “Sequence Listing” shall be assigned a separate sequence identifier. The sequence identifiers shall begin with 1 and increase sequentially by integers. If no sequence is present for a sequence identifier, the code “000” shall be used in place of the sequence. The response for the numeric identifier <160> shall include the total number of SEQ ID NOs, whether followed by a sequence or by the code “000.”
  • (d) Where the description or claims of a patent application discuss a sequence that is set forth in the “Sequence Listing” in accordance with paragraph (c) of this section, reference must be made to the sequence by use of the sequence identifier, preceded by “SEQ ID NO:” in the text of the description or claims, even if the sequence is also embedded in the text of the description or claims of the patent application.
  • (e) A copy of the “Sequence Listing” referred to in paragraph (c) of this section must also be submitted in computer readable form in accordance with the requirements of § 1.824. The computer readable form is a copy of the “Sequence Listing” and will not necessarily be retained as a part of the patent application file. If the computer readable form of a new application is to be identical with the computer readable form of another application of the applicant on file in the Patent and Trademark Office, reference may be made to the other application and computer readable form in lieu of filing a duplicate computer readable form in the new application if the computer readable form in the other application was compliant with all of the requirements of these rules. The new application shall be accompanied by a letter making such reference to the other application and computer readable form, both of which shall be completely identified. In the new application, applicant must also request the use of the compliant computer readable “Sequence Listing” that is already on file for the other application and must state that the paper copy of the “Sequence Listing” in the new application is identical to the computer readable copy filed for the other application.
  • (f) In addition to the paper copy required by paragraph (c) of this section and the computer readable form required by paragraph (e) of this section, a statement that the content of the paper and computer readable copies are the same must be submitted with the computer readable form, e.g., a statement that “the information recorded in computer readable form is identical to the written sequence listing.”
  • (g) If any of the requirements of paragraphs (b) through (f) of this section are not satisfied at the time of filing under 35 U.S.C. 111(a) or at the time of entering the national stage under 35 U.S.C. 371, applicant will be notified and given a period of time within which to comply with such requirements in order to prevent abandonment of the application. Any submission in reply to a requirement under this paragraph must be accompanied by a statement that the submission includes no new matter.
  • (h) If any of the requirements of paragraphs (b) through (f) of this section are not satisfied at the time of filing an international application under the Patent Cooperation Treaty (PCT), which application is to be searched by the United States International Searching Authority or examined by the United States International Preliminary Examining Authority, applicant will be sent a notice necessitating compliance with the requirements within a prescribed time period. Any submission in reply to a requirement under this paragraph must be accompanied by a statement that the submission does not include matter which goes beyond the disclosure in the international application as filed. If applicant fails to timely provide the required computer readable form, the United States International Searching Authority shall search only to the extent that a meaningful search can be performed without the computer readable form and the United States International Preliminary Examining Authority shall examine only to the extent that a meaningful examination can be performed without the computer readable form.
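The numbering rule in paragraph (c) above lends itself to a mechanical check. The following Python sketch is illustrative only: the function name `check_sequence_ids` and the `(seq_id, sequence_text)` input shape are assumptions made for this example and are not part of the rules or of any USPTO tool. It checks that identifiers start at 1 and increase by one, that an identifier with no sequence carries the code “000”, and that the total reported under numeric identifier <160> counts every SEQ ID NO.

```python
def check_sequence_ids(entries, declared_total):
    """Check the numbering rules of 37 CFR 1.821(c) on a parsed listing.

    entries        -- list of (seq_id, sequence_text) tuples in listing order
                      (hypothetical input shape chosen for this sketch)
    declared_total -- the value given for numeric identifier <160>
    Returns a list of problem descriptions; an empty list means no problem found.
    """
    problems = []
    for position, (seq_id, sequence_text) in enumerate(entries, start=1):
        # Identifiers begin with 1 and increase sequentially by integers.
        if seq_id != position:
            problems.append(f"expected SEQ ID NO: {position}, found {seq_id}")
        # An identifier with no sequence must carry the code "000".
        if not sequence_text.strip():
            problems.append(f"SEQ ID NO: {seq_id} has no sequence; use the code 000")
    # <160> counts every SEQ ID NO, including those followed by "000".
    if declared_total != len(entries):
        problems.append(
            f"<160> declares {declared_total} but the listing has {len(entries)} SEQ ID NOs"
        )
    return problems
```

For example, a listing with identifiers 1, 2, and 3, where the second carries “000”, and a <160> value of 3 produces no problems under this sketch.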
I. INCORPORATION BY REFERENCE OF WIPO ST.25 (1998) IN 37 CFR 1.821

37 CFR 1.821 incorporates by reference the World Intellectual Property Organization (WIPO) Handbook on Industrial Property Information and Documentation, Standard ST.25 (1998), including Tables 1 through 6 of Appendix 2. Copies may be obtained from the World Intellectual Property Organization; 34 chemin des Colombettes; 1211 Geneva 20 Switzerland. Copies may also be inspected at the Office of the Federal Register, 800 North Capitol Street, NW, Suite 700, Washington, DC 20408. These tables are reproduced below. The 1998 version of WIPO ST.25 is available online at www.wipo.int/standards/en/archives.html. Note that the standard was revised in December 2009, and the current version is available online at www.wipo.int/export/sites/www/standards/en/pdf/03-25-01.pdf.

WIPO Standard ST.25 (1998), Appendix 2, Table 1, provides that the bases of a nucleotide sequence should be represented using the following one-letter symbol for nucleotide sequence characters:

Table 1: List of Nucleotides
Symbol  Meaning                                 Origin of designation
a       a                                       adenine
g       g                                       guanine
c       c                                       cytosine
t       t                                       thymine
u       u                                       uracil
r       g or a                                  purine
y       t/u or c                                pyrimidine
m       a or c                                  amino
k       g or t/u                                keto
s       g or c                                  strong interactions 3H-bonds
w       a or t/u                                weak interactions 2H-bonds
b       g or c or t/u                           not a
d       a or g or t/u                           not c
h       a or c or t/u                           not g
v       a or g or c                             not t, not u
n       a or g or c or t/u, unknown, or other   any
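Table 1 can be expressed as a lookup from each symbol to the bases it stands for. The Python sketch below is a minimal illustration: the names `NUCLEOTIDE_SYMBOLS`, `invalid_symbols`, and `matches` are hypothetical, and accepting only lower-case letters is a choice made for this sketch because Table 1 lists lower-case symbols.

```python
# Symbol -> set of bases it may represent, transcribed from Table 1.
# "n" additionally covers "unknown, or other", which a set cannot express.
NUCLEOTIDE_SYMBOLS = {
    "a": {"a"}, "g": {"g"}, "c": {"c"}, "t": {"t"}, "u": {"u"},
    "r": {"g", "a"},            # purine
    "y": {"t", "u", "c"},       # pyrimidine
    "m": {"a", "c"},            # amino
    "k": {"g", "t", "u"},       # keto
    "s": {"g", "c"},            # strong interactions
    "w": {"a", "t", "u"},       # weak interactions
    "b": {"g", "c", "t", "u"},  # not a
    "d": {"a", "g", "t", "u"},  # not c
    "h": {"a", "c", "t", "u"},  # not g
    "v": {"a", "g", "c"},       # not t, not u
    "n": {"a", "g", "c", "t", "u"},
}


def invalid_symbols(sequence):
    """Return the sorted characters of `sequence` that are not Table 1 symbols.

    Whitespace is ignored; upper-case letters are reported as invalid
    (an assumption of this sketch, since Table 1 lists lower-case symbols).
    """
    return sorted({ch for ch in sequence
                   if not ch.isspace() and ch not in NUCLEOTIDE_SYMBOLS})


def matches(symbol, base):
    """True when `base` is one of the bases that `symbol` may represent."""
    return base in NUCLEOTIDE_SYMBOLS[symbol]
```

Under these assumptions, `invalid_symbols("acgtn ryk")` returns an empty list, and `matches("r", "a")` is true because "r" stands for "g or a".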

WIPO Standard ST.25 (1998), Appendix 2, Table 2, provides that modified bases may be represented as the corresponding unmodified bases in the sequence itself, if the modification is further described in numeric identifier <223> of the Feature section of the sequence listing. The symbols from the list below may be used in the description (i.e., the specification and drawing, or in the Feature section of the sequence listing) but these symbols may not be used in the sequence itself. Modifications not listed in Table 2 may also be represented as the corresponding unmodified base in the sequence itself, and the modification should be described using its full chemical name in the Feature section of the sequence listing.

Table 2: List of Modified Nucleotides
Symbol Meaning
ac4c 4-acetylcytidine
chm5u 5-(carboxyhydroxymethyl)uridine
cm 2'-O-methylcytidine
cmnm5s2u 5-carboxymethylaminomethyl-2-thiouridine
cmnm5u 5-carboxymethylaminomethyluridine
d dihydrouridine
fm 2'-O-methylpseudouridine
gal q beta, D-galactosylqueuosine
gm 2'-O-methylguanosine
i inosine
i6a N6-isopentenyladenosine
m1a 1-methyladenosine
m1f 1-methylpseudouridine
m1g 1-methylguanosine
m1i 1-methylinosine
m22g 2,2-dimethylguanosine
m2a 2-methyladenosine
m2g 2-methylguanosine
m3c 3-methylcytidine
m5c 5-methylcytidine
m6a N6-methyladenosine
m7g 7-methylguanosine
mam5u 5-methylaminomethyluridine
mam5s2u 5-methoxyaminomethyl-2-thiouridine
man q beta, D-mannosylqueuosine
mcm5s2u 5-methoxycarbonylmethyl-2-thiouridine
mcm5u 5-methoxycarbonylmethyluridine
mo5u 5-methoxyuridine
ms2i6a 2-methylthio-N6-isopentenyladenosine
ms2t6a N-((9-beta-D-ribofuranosyl-2-methylthiopurine-6-yl)carbamoyl)threonine
mt6a N-((9-beta-D-ribofuranosylpurine-6-yl)N-methylcarbamoyl)threonine
mv uridine-5-oxyacetic acid-methylester
o5u uridine-5-oxyacetic acid
osyw wybutoxosine
p pseudouridine
q queuosine
s2c 2-thiocytidine
s2t 5-methyl-2-thiouridine
s2u 2-thiouridine
s4u 4-thiouridine
t 5-methyluridine
t6a N-((9-beta-D-ribofuranosylpurine-6-yl)carbamoyl)threonine
tm 2'-O-methyl-5-methyluridine
um 2'-O-methyluridine
yw wybutosine
x 3-(3-amino-3-carboxy-propyl)uridine, (acp3)u

WIPO Standard ST.25 (1998), Appendix 2, Table 3, provides that the amino acids should be represented using the following three-letter symbols with the first letter as a capital.

Table 3: List of Amino Acids
Symbol Meaning
Ala Alanine
Cys Cysteine
Asp Aspartic Acid
Glu Glutamic Acid
Phe Phenylalanine
Gly Glycine
His Histidine
Ile Isoleucine
Lys Lysine
Leu Leucine
Met Methionine
Asn Asparagine
Pro Proline
Gln Glutamine
Arg Arginine
Ser Serine
Thr Threonine
Val Valine
Trp Tryptophan
Tyr Tyrosine
Asx Asp or Asn
Glx Glu or Gln
Xaa unknown or other
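Sequences are often held in one-letter form, while Table 3 calls for three-letter symbols with the first letter capitalized. The Python sketch below converts between the two. The one-letter codes are the conventional ones and are not part of Table 3 as reproduced here; the names `ONE_TO_THREE` and `to_three_letter` are hypothetical.

```python
# Conventional one-letter code -> Table 3 three-letter symbol.
# The one-letter codes are an assumption brought in for this sketch;
# only the three-letter symbols appear in Table 3.
ONE_TO_THREE = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
    "B": "Asx", "Z": "Glx", "X": "Xaa",
}


def to_three_letter(one_letter_sequence):
    """Convert a contiguous one-letter sequence to space-separated Table 3 symbols.

    Raises ValueError for any letter that has no Table 3 symbol.
    """
    try:
        return " ".join(ONE_TO_THREE[ch] for ch in one_letter_sequence.upper())
    except KeyError as exc:
        raise ValueError(f"no Table 3 symbol for {exc.args[0]!r}") from None
```

With this mapping, `to_three_letter("MAX")` yields `Met Ala Xaa`.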

WIPO Standard ST.25 (1998), Appendix 2, Table 4, provides that modified and unusual amino acids may be represented as the corresponding unmodified amino acids in the sequence itself if the modification is further described in numeric identifier <223> of the Feature section of the sequence listing. The symbols from the list below may be used in the description (i.e., the specification and drawings, or in the Feature section of the sequence listing) but these symbols may not be used in the sequence itself. Modifications not listed in Table 4 may also be represented as the corresponding unmodified amino acid in the sequence itself, and the modification should be described using its full chemical name in the Feature section of the sequence listing.

Table 4: List of Modified and Unusual Amino Acids
Symbol Meaning
Aad 2-Aminoadipic acid
bAad 3-Aminoadipic acid
bAla beta-Alanine, beta-Aminopropionic acid
Abu 2-Aminobutyric acid
4Abu 4-Aminobutyric acid, piperidinic acid
Acp 6-Aminocaproic acid
Ahe 2-Aminoheptanoic acid
Aib 2-Aminoisobutyric acid
bAib 3-Aminoisobutyric acid
Apm 2-Aminopimelic acid
Dbu 2,4-Diaminobutyric acid
Des Desmosine
Dpm 2,2'-Diaminopimelic acid
Dpr 2,3-Diaminopropionic acid
EtGly N-Ethylglycine
EtAsn N-Ethylasparagine
Hyl Hydroxylysine
aHyl allo-Hydroxylysine
3Hyp 3-Hydroxyproline
4Hyp 4-Hydroxyproline
Ide Isodesmosine
aIle allo-Isoleucine
MeGly N-Methylglycine, sarcosine
MeIle N-Methylisoleucine
MeLys 6-N-Methyllysine
MeVal N-Methylvaline
Nva Norvaline
Nle Norleucine
Orn Ornithine

WIPO Standard ST.25 (1998), Appendix 2, Table 5, provides for feature keys related to DNA sequences.

Table 5: List of Feature Keys Related to Nucleotide Sequences
Key Description
allele a related individual or strain contains stable, alternative forms of the same gene which differs from the presented sequence at this location (and perhaps others)
attenuator (1) region of DNA at which regulation of termination of transcription occurs, which controls the expression of some bacterial operons; (2) sequence segment located between the promoter and the first structural gene that causes partial termination of transcription
C_region constant region of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; includes one or more exons depending on the particular chain
CAAT_signal CAAT box; part of a conserved sequence located about 75 bp up-stream of the start point of eukaryotic transcription units which may be involved in RNA polymerase binding; consensus=GG (C or T) CAATCT
CDS coding sequence; sequence of nucleotides that corresponds with the sequence of amino acids in a protein (location includes stop codon); feature includes amino acid conceptual translation
conflict independent determinations of the “same” sequence differ at this site or region
D-loop displacement loop; a region within mitochondrial DNA in which a short stretch of RNA is paired with one strand of DNA, displacing the original partner DNA strand in this region; also used to describe the displacement of a region of one strand of duplex DNA by a single stranded invader in the reaction catalyzed by RecA protein
D-segment diversity segment of immunoglobulin heavy chain, and T-cell receptor beta chain
enhancer a cis-acting sequence that increases the utilization of (some) eukaryotic promoters, and can function in either orientation and in any location (upstream or downstream) relative to the promoter
exon region of genome that codes for portion of spliced mRNA; may contain 5'UTR, all CDSs, and 3'UTR
GC_signal GC box; a conserved GC-rich region located upstream of the start point of eukaryotic transcription units which may occur in multiple copies or in either orientation; consensus=GGGCGG
gene region of biological interest identified as a gene and for which a name has been assigned
iDNA intervening DNA; DNA which is eliminated through any of several kinds of recombination
intron a segment of DNA that is transcribed, but removed from within the transcript by splicing together the sequences (exons) on either side of it
J_segment joining segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains
LTR long terminal repeat, a sequence directly repeated at both ends of a defined sequence, of the sort typically found in retroviruses
mat_peptide mature peptide or protein coding sequence; coding sequence for the mature or final peptide or protein product following post-translational modification; the location does not include the stop codon (unlike the corresponding CDS)
misc_binding site in nucleic acid which covalently or non-covalently binds another moiety that cannot be described by any other Binding key (primer_bind or protein_bind)
misc_difference feature sequence is different from that presented in the entry and cannot be described by any other Difference key (conflict, unsure, old_sequence, mutation, variation, allele, or modified_base)
misc_feature region of biological interest which cannot be described by any other feature key; a new or rare feature
misc_recomb site of any generalized, site-specific or replicative recombination event where there is a breakage and reunion of duplex DNA that cannot be described by other recombination keys (iDNA and virion) or qualifiers of source key (/insertion_seq, /transposon, /proviral)
misc_RNA any transcript or RNA product that cannot be defined by other RNA keys (prim_transcript, precursor_RNA, mRNA, 5'clip, 3'clip, 5'UTR, 3'UTR, exon, CDS, sig_peptide, transit_peptide, mat_peptide, intron, polyA_site, rRNA, tRNA, scRNA, and snRNA)
misc_signal any region containing a signal controlling or altering gene function or expression that cannot be described by other Signal keys (promoter, CAAT_signal, TATA_signal, -35_signal, -10_signal, GC_signal, RBS, polyA_signal, enhancer, attenuator, terminator, and rep_origin)
misc_structure any secondary or tertiary structure or conformation that cannot be described by other Structure keys (stem_loop and D-loop)
modified_base the indicated nucleotide is a modified nucleotide and should be substituted for by the indicated molecule (given in the mod_base qualifier value)
mRNA messenger RNA; includes 5' untranslated region (5'UTR), coding sequences (CDS, exon) and 3' untranslated region (3'UTR)
mutation a related strain has an abrupt, inheritable change in the sequence at this location
N_region extra nucleotides inserted between rearranged immunoglobulin segments
old_sequence the presented sequence revises a previous version of the sequence at this location
polyA_signal recognition region necessary for endonuclease cleavage of an RNA transcript that is followed by polyadenylation; consensus=AATAAA
polyA_site site on an RNA transcript to which will be added adenine residues by post-transcriptional polyadenylation
precursor_RNA any RNA species that is not yet the mature RNA product; may include 5' clipped region (5'clip), 5' untranslated region (5'UTR), coding sequences (CDS, exon), intervening sequences (intron), 3' untranslated region (3'UTR), and 3' clipped region (3'clip)
prim_transcript primary (initial, unprocessed) transcript; includes 5' clipped region (5'clip), 5' untranslated region (5'UTR), coding sequences (CDS, exon), intervening sequences (intron), 3' untranslated region (3'UTR), and 3' clipped region (3'clip)
primer_bind non-covalent primer binding site for initiation of replication, transcription, or reverse transcription; includes site(s) for synthetic, for example, PCR primer elements
promoter region on a DNA molecule involved in RNA polymerase binding to initiate transcription
protein_bind non-covalent protein binding site on nucleic acid
RBS ribosome binding site
repeat_region region of genome containing repeating units
repeat_unit single repeat element
rep_origin origin of replication; starting site for duplication of nucleic acid to give two identical copies
rRNA mature ribosomal RNA; the RNA component of the ribonucleoprotein particle (ribosome) which assembles amino acids into proteins
S_region switch region of immunoglobulin heavy chains; involved in the rearrangement of heavy chain DNA leading to the expression of a different immunoglobulin class from the same B-cell
satellite many tandem repeats (identical or related) of a short basic repeating unit; many have a base composition or other property different from the genome average that allows them to be separated from the bulk (main band) genomic DNA
scRNA small cytoplasmic RNA; any one of several small cytoplasmic RNA molecules present in the cytoplasm and (sometimes) nucleus of a eukaryote
sig_peptide signal peptide coding sequence; coding sequence for an N-terminal domain of a secreted protein; this domain is involved in attaching nascent polypeptide to the membrane; leader sequence
snRNA small nuclear RNA; any one of many small RNA species confined to the nucleus; several of the snRNAs are involved in splicing or other RNA processing reactions
source identifies the biological source of the specified span of the sequence; this key is mandatory; every entry will have, as a minimum, a single source key spanning the entire sequence; more than one source key per sequence is permissible
stem_loop hairpin; a double-helical region formed by base-pairing between adjacent (inverted) complementary sequences in a single strand of RNA or DNA
STS Sequence Tagged Site; short, single-copy DNA sequence that characterizes a mapping landmark on the genome and can be detected by PCR; a region of the genome can be mapped by determining the order of a series of STSs
TATA_signal TATA box; Goldberg-Hogness box; a conserved AT-rich septamer found about 25 bp before the start point of each eukaryotic RNA polymerase II transcript unit which may be involved in positioning the enzyme for correct initiation; consensus=TATA(A or T)A(A or T)
terminator sequence of DNA located either at the end of the transcript or adjacent to a promoter region that causes RNA polymerase to terminate transcription; may also be site of binding of repressor protein
transit_peptide transit peptide coding sequence; coding sequence for an N-terminal domain of a nuclear-encoded organellar protein; this domain is involved in post-translational import of the protein into the organelle
tRNA mature transfer RNA, a small RNA molecule (75-85 bases long) that mediates the translation of a nucleic acid sequence into an amino acid sequence
unsure author is unsure of exact sequence in this region
V_region variable region of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; codes for the variable amino terminal portion; can be made up from V_segments, D_segments, N_regions, and J_segments
V_segment variable segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; codes for most of the variable region (V_region) and the last few amino acids of the leader peptide
variation a related strain contains stable mutations from the same gene (for example, RFLPs, polymorphisms, etc.) which differ from the presented sequence at this location (and possibly others)
3'clip 3'-most region of a precursor transcript that is clipped off during processing
3'UTR region at the 3' end of a mature transcript (following the stop codon) that is not translated into a protein
5'clip 5'-most region of a precursor transcript that is clipped off during processing
5'UTR region at the 5' end of a mature transcript (preceding the initiation codon) that is not translated into a protein
-10_signal Pribnow box; a conserved region about 10 bp upstream of the start point of bacterial transcription units which may be involved in binding RNA polymerase; consensus=TAtAaT
-35_signal a conserved hexamer about 35 bp upstream of the start point of bacterial transcription units; consensus=TTGACa [ ] or TGTTGACA [ ]

WIPO Standard ST.25 (1998), Appendix 2, Table 6 provides for feature keys related to protein sequences.

Table 6: List of Feature Keys Related to Protein Sequences
Key Description
CONFLICT different papers report differing sequences
VARIANT authors report that sequence variants exist
VARSPLIC description of sequence variants produced by alternative splicing
MUTAGEN site which has been experimentally altered
MOD_RES post-translational modification of a residue
ACETYLATION N-terminal or other
AMIDATION generally at the C-terminal of a mature active peptide
BLOCKED undetermined N- or C-terminal blocking group
FORMYLATION of the N-terminal methionine
GAMMA-CARBOXYGLUTAMIC ACID HYDROXYLATION of asparagine, aspartic acid, proline or lysine
METHYLATION generally of lysine or arginine
PHOSPHORYLATION of serine, threonine, tyrosine, aspartic acid or histidine
PYRROLIDONE CARBOXYLIC ACID N-terminal glutamate which has formed an internal cyclic lactam
SULFATATION generally of tyrosine
LIPID covalent binding of a lipidic moiety
MYRISTATE myristate group attached through an amide bond to the N-terminal glycine residue of the mature form of a protein or to an internal lysine residue
PALMITATE palmitate group attached through a thioether bond to a cysteine residue or through an ester bond to a serine or threonine residue
FARNESYL farnesyl group attached through a thioether bond to a cysteine residue
GERANYL-GERANYL geranyl-geranyl group attached through a thioether bond to a cysteine residue
GPI-ANCHOR glycosyl-phosphatidylinositol (GPI) group linked to the alpha-carboxyl group of the C-terminal residue of the mature form of a protein
N-ACYL DIGLYCERIDE N-terminal cysteine of the mature form of a prokaryotic lipoprotein with an amide-linked fatty acid and a glyceryl group to which two fatty acids are linked by ester linkages
DISULFID disulfide bond; the ‘FROM’ and ‘TO’ endpoints represent the two residues which are linked by an intra-chain disulfide bond; if the ‘FROM’ and ‘TO’ endpoints are identical, the disulfide bond is an interchain one and the description field indicates the nature of the cross-link
THIOLEST thiolester bond; the ‘FROM’ and ‘TO’ endpoints represent the two residues which are linked by the thiolester bond
THIOETH thioether bond; the ‘FROM’ and ‘TO’ endpoints represent the two residues which are linked by the thioether bond
CARBOHYD glycosylation site; the nature of the carbohydrate (if known) is given in the description field
METAL binding site for a metal ion; the description field indicates the nature of the metal
BINDING binding site for any chemical group (co-enzyme, prosthetic group, etc.); the chemical nature of the group is given in the description field
SIGNAL extent of a signal sequence (prepeptide)
TRANSIT extent of a transit peptide (mitochondrial, chloroplastic, or for a microbody)
PROPEP extent of a propeptide
CHAIN extent of a polypeptide chain in the mature protein
PEPTIDE extent of a released active peptide
DOMAIN extent of a domain of interest on the sequence; the nature of that domain is given in the description field
CA_BIND extent of a calcium-binding region
DNA_BIND extent of a DNA-binding region
NP_BIND extent of a nucleotide phosphate binding region; the nature of the nucleotide phosphate is indicated in the description field
TRANSMEM extent of a transmembrane region
ZN_FING extent of a zinc finger region
SIMILAR extent of a similarity with another protein sequence; precise information, relative to that sequence is given in the description field
REPEAT extent of an internal sequence repetition
HELIX secondary structure: Helices, for example, Alpha-helix, 3(10) helix, or Pi-helix
STRAND secondary structure: Beta-strand, for example, Hydrogen bonded beta-strand, or Residue in an isolated beta-bridge
TURN secondary structure: Turns, for example, H-bonded turn (3-turn, 4-turn, or 5-turn)
ACT_SITE amino acid(s) involved in the activity of an enzyme
SITE any other interesting site on the sequence
INIT_MET the sequence is known to start with an initiator methionine
NON_TER the residue at an extremity of the sequence is not the terminal residue; if applied to position 1, this signifies that the first position is not the N-terminus of the complete molecule; if applied to the last position, it signifies that this position is not the C-terminus of the complete molecule; there is no description field for this key
NON_CONS non consecutive residues; indicates that two residues in a sequence are not consecutive and that there are a number of unsequenced residues between them
UNSURE uncertainties in the sequence; used to describe region(s) of a sequence for which the authors are unsure about the sequence assignment
II. FILING INTERNATIONALLY

The requirements of 37 CFR 1.821 through 37 CFR 1.825 are the result of an effort to harmonize the USPTO requirements with international sequence listing requirements to the extent possible. The requirements of 37 CFR 1.821 through 37 CFR 1.825 substantially correspond to the requirements of WIPO Standard ST.25. PatentIn Version 3.5.1 software (see MPEP § 2430) generates sequence listings that meet all of the requirements of WIPO Standard ST.25. The requirements of 37 CFR 1.821 through 37 CFR 1.825, however, are less stringent than the requirements of WIPO Standard ST.25. Thus, applicants who wish to file in countries which adhere to WIPO Standard ST.25 should consider the following when not using PatentIn Version 3.5.1:

  • (A) The data in numeric identifier <221> must use selections from Tables 5 and 6 of WIPO Standard ST.25 (2009) to comply with that standard. The terms from these Tables are considered language neutral vocabulary;
  • (B) Where the sequence listing forming part of the international application contains free text, e.g., free text in numeric identifier <223>, any such free text shall be repeated in the main part of the description in the language thereof. It is recommended that the free text in the language of the main part of the description be put in a specific section of the description called “Sequence Listing Free Text”;
  • (C) A sequence listing filed after the international filing date is generally not considered to be part of the disclosure and usually will not be published as part of the international application publication (see PCT Article 34 and PCT Rules 26 and 91 for exceptions);
  • (D) Paragraphs 4(v) and 4bis(iv) of WIPO Standard ST.25 (2009) require the specific wording “the information recorded in electronic form furnished under PCT Rule 13ter is identical to the sequence listing”; and
  • (E) WIPO Standard ST.25 (2009), paragraph 24, requires a blank line between numeric identifiers in the sequence listing when the digit in the first or second position of the numeric identifier changes.
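The blank-line rule in item (E) can be applied mechanically. The Python sketch below is illustrative only: `add_group_breaks` is a hypothetical helper, the numeric identifiers in the usage example are sample values, and the logic follows the paraphrase in item (E) rather than the full text of paragraph 24, which should be consulted for the exact layout.

```python
import re

# Captures the first and second digits of a three-digit numeric identifier
# at the start of a line, e.g. "<210>" -> ("2", "1").
IDENTIFIER = re.compile(r"^<(\d)(\d)\d>")


def add_group_breaks(lines):
    """Insert a blank line wherever the first or second digit of the
    numeric identifier changes from one identifier line to the next.

    Lines that do not start with a numeric identifier (for example,
    sequence data) are passed through unchanged.
    """
    out = []
    previous = None
    for line in lines:
        match = IDENTIFIER.match(line)
        if match:
            current = match.group(1, 2)
            if previous is not None and current != previous:
                out.append("")
            previous = current
        out.append(line)
    return out
```

Applied to the lines `<210> 1`, `<211> 12`, `<220>`, `<223> primer`, `<400> 1`, this sketch inserts a blank line before `<220>` and before `<400> 1`, and none between `<210>` and `<211>` or between `<220>` and `<223>`.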

Requirements related to the submission of sequence listings may also differ between filing in the United States and filing internationally. For example, where an international application is filed in paper, the sequence listing part of the international application must also be provided in paper, although the search copy must be filed in electronic form, e.g., on a CD or, in the RO/US, as an ASCII text file via EFS-Web. Also, any tables filed in an international application must be an integral part of the application, i.e., cannot be submitted as a separate file in text format.

2422.01   Nucleotide and/or Amino Acid Disclosures Requiring a Sequence Listing [R-07.2015]

I. LENGTH THRESHOLDS

37 CFR 1.821(a) presents a definition for “nucleotide and/or amino acid sequences.” This definition sets forth limits, in terms of numbers of amino acids and/or numbers of nucleotides, at or above which compliance with the sequence rules is required. Nucleotide and/or amino acid sequences as used in 37 CFR 1.821 through 37 CFR 1.825 are interpreted to mean an unbranched sequence of four or more amino acids or an unbranched sequence of ten or more nucleotides. Branched sequences are specifically excluded from this definition. Sequences with fewer than ten specifically defined nucleotides or four specifically defined amino acids are specifically excluded from this section. “Specifically defined” means those amino acids other than “Xaa” and those nucleotide bases other than “n” defined in accordance with the World Intellectual Property Organization (WIPO) Handbook on Industrial Property Information and Documentation, Standard ST.25: Standard for the Presentation of Nucleotide and Amino Acid Sequence Listings in Patent Applications (1998), including Tables 1 through 6 in Appendix 2 (see MPEP § 2422).

The limit of four or more amino acids was established for consistency with limits in place for industry database collections whereas the limit of ten or more nucleotides, while lower than certain industry database limits, was established to encompass those nucleotide sequences to which the smallest probe will bind in a stable manner.
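
The threshold test described above can be sketched in code. The following Python sketch is illustrative only: the function name, the "kind" labels, and the way residues are passed in (a string of one-letter nucleotide symbols, or a list of three-letter amino acid symbols) are assumptions made for this example and are not prescribed by the rules. It treats “n” and “Xaa” as the only symbols that are not specifically defined, and applies the four-residue minimum for specifically defined residues stated in 37 CFR 1.821(a).

```python
from typing import Sequence

MIN_NUCLEOTIDES = 10           # unbranched sequence of ten or more nucleotides
MIN_AMINO_ACIDS = 4            # unbranched sequence of four or more amino acids
MIN_SPECIFICALLY_DEFINED = 4   # fewer than four defined residues: excluded


def requires_sequence_listing(residues: Sequence[str], kind: str,
                              branched: bool = False) -> bool:
    """Hypothetical helper: does a disclosed sequence meet the length thresholds?

    residues: one symbol per residue, e.g. "acgtn..." or ["Ala", "Xaa", ...].
    kind: "nucleotide" or "amino_acid" (labels chosen for this sketch).
    """
    if branched:
        # Branched sequences are excluded from the definition.
        return False
    if kind == "nucleotide":
        min_length, placeholder = MIN_NUCLEOTIDES, "n"
    elif kind == "amino_acid":
        min_length, placeholder = MIN_AMINO_ACIDS, "xaa"
    else:
        raise ValueError("kind must be 'nucleotide' or 'amino_acid'")
    if len(residues) < min_length:
        return False
    # Count residues other than the "n" / "Xaa" placeholders.
    defined = sum(1 for symbol in residues if symbol.lower() != placeholder)
    return defined >= MIN_SPECIFICALLY_DEFINED
```

For instance, under these assumptions a ten-base string with only three bases other than “n” would not meet the threshold, while a five-residue peptide containing a single “Xaa” would. Whether a borderline sequence is actually subject to the rules remains a legal determination, not something this check decides.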

II. REPRESENTATION OF NUCLEIC ACIDS AND AMINO ACIDS

37 CFR 1.821(a)(1) and 37 CFR 1.821(a)(2) present further definitions for those nucleotide and amino acid sequences that are intended to be embraced by the sequence rules. Situations in which the applicability of the rules is in issue will be resolved on a case-by-case basis.

Nucleotide sequences are further limited to those that can be represented by the symbols set forth in 37 CFR 1.822(b), which incorporates by reference WIPO Standard ST.25 (1998), Appendix 2, Table 1 (see MPEP § 2422). The presence of other than typical 5' to 3' phosphodiester linkages in a nucleotide sequence does not render the rules inapplicable. The Office does not want to exclude linkages of the type commonly found in naturally occurring nucleotides, e.g., eukaryotic end capped sequences.

Amino acid sequences are further limited to those listed in 37 CFR 1.822(b), which incorporates by reference WIPO Standard ST.25 (1998), Appendix 2, Table 3 (see MPEP § 2422), and those L-amino acids that are commonly found in naturally occurring proteins. The presence of one or more D-amino acids in a sequence will exclude that sequence from the scope of the rules. Voluntary compliance is, however, encouraged in these situations; the symbol “Xaa” can be used to represent D-amino acids. The sequence rules embrace “[a]ny peptide or protein that can be expressed as a sequence using the symbols in WIPO Standard ST.25 (1998), Appendix 2, Table 3 in conjunction with a description in the Feature section to describe, for example, modified linkages, cross links and end caps, non-peptidyl bonds, etc.” 37 CFR 1.821(a)(2).

With regard to amino acid sequences, the use of the terms “peptide or protein” implies, however, that the amino acids in a given sequence are linked by at least three consecutive peptide bonds. Accordingly, an amino acid sequence is not excluded from the scope of the rules merely due to the presence of a single non-peptidyl bond. If an amino acid sequence can be represented by a string of amino acid abbreviations, with reference, where necessary, to a features table to explain modifications in the sequence, the sequence comes within the scope of the rules. However, the rules are not intended to encompass the subject matter that is generally referred to as synthetic resins.

III. SEQUENCES DISCLOSED IN APPLICATION TEXT

The requirement for compliance in 37 CFR 1.821(c) is directed to “disclosures of nucleotide and/or amino acid sequences.” (Emphasis added.) All sequence information, whether claimed or not, that meets the length thresholds in 37 CFR 1.821(a) is subject to the rules. The goal of the Office is to build a comprehensive database that can be used for, inter alia, the purpose of assessing the prior art. It is therefore essential that all sequence information, whether only disclosed or also claimed, be included in the database. In those instances in which prior art sequences are only referred to in a given application by name and a publication or accession reference, they need not be included as part of the sequence listing, unless the referred-to sequence is “essential material” per MPEP § 608.01(p). However, if the applicant presents the sequence as a string of particular nucleotide bases or amino acids, it is necessary to include the sequence in the sequence listing regardless of whether the applicant considers the sequence to be prior art. In general, any sequence that is disclosed and/or claimed as a sequence, i.e., as a string of particular nucleotide bases or amino acids, and that otherwise meets the criteria of 37 CFR 1.821(a), must be set forth in the sequence listing.

IV. VARIANTS OF A PRESENTED SEQUENCE

It is generally acceptable to present a single, primary sequence in the specification and sequence listing by enumeration of its residues in accordance with the sequence rules (“primary sequence”) and to discuss and/or claim variants of that primary sequence without presenting each variant as a separate sequence in the sequence listing. However, the primary sequence should be annotated in the sequence listing to reflect such variants. By way of example only, the following types of sequence disclosures would be treated as noted herein by the Office. With respect to a primary sequence and “conservatively modified variants thereof,” the sequences may be described as SEQ ID NO:X (the primary sequence) and “conservatively modified variants thereof,” if desired. With respect to a sequence that “may be deleted at the C-terminus by 1, 2, 3, 4, or 5 residues,” none of the implied variations needs to be included in the sequence listing. In this latter example, only the sequence without deletions needs to be included in the sequence listing; however, applicant is encouraged to annotate the sequence to indicate that deletions have been made at the C-terminus by 1, 2, 3, 4, or 5 residues.

The Office's database will only contain the unmodified sequence. It is strongly recommended that any sequences appearing in the claims, or sequences that are considered essential to understanding the invention, be included in the sequence listing as a separate sequence.

V. SEQUENCE IDENTIFIER

37 CFR 1.821(c) requires that each disclosed nucleic acid or amino acid sequence in the application appear separately in the sequence listing, with each sequence further being assigned a sequence identifier, referred to as “SEQ ID NO.” The sequence identifiers must begin with 1 and increase sequentially by integers. The requirement for sequence identifiers, at a minimum, requires that each sequence be assigned a different number for purposes of identification. However, where practical and for ease of reference, sequences should be presented in the sequence listing in numerical order and in the order in which they are discussed in the application.

37 CFR 1.821(d) requires that where the description or claims of a patent application discuss a sequence that is set forth in the sequence listing, a reference to the sequence identifier of that sequence is required at all occurrences, even if in the text of the description or claims that sequence is set forth by enumeration of its residues. This requirement is also intended to permit references elsewhere in the application (e.g., specification, claims, or drawings) to sequences set forth in the sequence listing by the use of assigned sequence identifiers without repeating the sequence. Sequence identifiers can also be used to discuss and/or claim parts or fragments of a properly presented sequence. For example, language such as “residues 14 to 243 of SEQ ID NO:23” is permissible and the fragment need not be separately presented in the sequence listing. Where a sequence that meets the length thresholds of 37 CFR 1.821(a) is disclosed by enumeration of its residues anywhere in an application, it must be presented in a sequence listing in a manner that complies with the requirements of the sequence rules.
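
As a drafting aid, the numbering and cross-reference requirements above can be checked mechanically. The sketch below is a minimal illustration, not an Office tool: the function names are hypothetical, and the regular expression assumes references are written in the “SEQ ID NO:X” style used in this section (optionally with a space after the colon).

```python
import re
from typing import Iterable, List

# Matches "SEQ ID NO:23" and "SEQ ID NO: 23" (assumed reference style).
SEQ_ID_PATTERN = re.compile(r"SEQ ID NO:\s*(\d+)")


def identifiers_are_sequential(listing_ids: Iterable[int]) -> bool:
    """True if the identifiers begin with 1 and increase by one each time."""
    ids = list(listing_ids)
    return ids == list(range(1, len(ids) + 1))


def undefined_references(text: str, listing_ids: Iterable[int]) -> List[int]:
    """Identifiers referenced in the text that are absent from the listing."""
    referenced = {int(number) for number in SEQ_ID_PATTERN.findall(text)}
    return sorted(referenced - set(listing_ids))
```

A fragment reference such as “residues 14 to 243 of SEQ ID NO:23” is picked up as a reference to identifier 23; the fragment itself needs no identifier of its own.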

The rules do not alter, in any way, the requirements of 35 U.S.C. 112. The implementation of the rules has had no effect on disclosure and/or claiming requirements. The rules, in general, or the use of sequence identifiers throughout the specification and claims, specifically, should not raise any issues under 35 U.S.C. 112(a) or 35 U.S.C. 112(b). The use of sequence identifiers (SEQ ID NO:X) only provides a shorthand way for applicants to discuss and claim their inventions. These identification numbers do not in any way restrict the manner in which an invention can be claimed.

2422.02   The Requirement for Exclusive Conformance; Sequences Presented in Drawing Figures [R-07.2015]

For all applications that disclose nucleic acid and/or amino acid sequences that fall within the definition set forth in 37 CFR 1.821(a), 37 CFR 1.821(b) requires exclusive conformance to the requirements of 37 CFR 1.821 through 37 CFR 1.825 with regard to the manner in which the disclosed nucleic acid and/or amino acid sequences are presented and described. This requirement is necessary to minimize any confusion that could result if more than one format for representing sequence data was employed in a given application.

Pursuant to 37 CFR 1.83(a), sequences that are included in sequence listings should not be duplicated in the drawings. However, many significant sequence characteristics may only be demonstrated by a figure. This is especially true in view of the fact that the representation of double stranded nucleotides is not permitted in the sequence listing and many significant nucleotide features, such as “sticky ends” and the like, may only be shown effectively by reference to a drawing figure. Further, the similarity or homology between/among sequences may only be depicted in an effective manner in a drawing figure. Similarly, drawing figures are recommended for use with amino acid sequences to depict structural features of the corresponding protein, such as finger regions and Kringle regions. The situations discussed herein are given by way of example only and there may be many other reasons for including a sequence in a drawing. However, when a sequence is presented in a drawing, the sequence must still be included in the sequence listing if the sequence falls within the definition set forth in 37 CFR 1.821(a), and the sequence identifier (“SEQ ID NO:X”) must be used, either in the drawing or in the Brief Description of the Drawings.

2422.03   Sequence Listing Submission [R-07.2015]

37 CFR 1.821(c) requires that applications containing disclosures of nucleotide and/or amino acid sequences that fall within the definitions of 37 CFR 1.821(a) contain, as a separate part of the disclosure, a disclosure of the nucleotide and/or amino acid sequences, and associated information, using the format and symbols that are set forth in 37 CFR 1.822 and 37 CFR 1.823. This separate part of the disclosure is referred to as the sequence listing. The sequence listing required pursuant to 37 CFR 1.821(c) may be submitted as an ASCII text file via EFS-Web, on compact disc, as a PDF submitted via EFS-Web, or on paper. The sequence listing required by 37 CFR 1.821(c) is the official copy of the sequence listing. Note that 37 CFR 1.821(e) requires that a copy of the sequence listing referred to in 37 CFR 1.821(c) must also be submitted in computer readable form (CRF) in accordance with the requirements of 37 CFR 1.824.

The Office strongly suggests filing the sequence listing required by 37 CFR 1.821(c) as a text file via EFS-Web. If a new application is filed via EFS-Web with an ASCII text file sequence listing that complies with the requirements of 37 CFR 1.824(a)(2)-(6) and (b), and applicant has not filed a sequence listing in a PDF file, the text file will serve as both the paper copy required by 37 CFR 1.821(c) and the computer readable form (CRF) required by 37 CFR 1.821(e). Note that the specification must contain a statement in a separate paragraph that incorporates by reference the material in the ASCII text file identifying the name of the ASCII text file, the date of creation, and the size of the ASCII text file in bytes. See MPEP § 2422.03(a) for additional information pertaining to EFS-Web submission of sequence listings.

If submitted on paper, the sequence listing is a separate part of the disclosure which must begin on a new page within the specification. A plurality of sequences may, if feasible, be presented on a single page; the separate presentation of both nucleotide and amino acid sequences on the same page is also permitted.

If the official copy of the sequence listing as required by 37 CFR 1.821(c) is submitted on compact disc, the specification must contain an incorporation by reference of the material on the compact disc in a separate paragraph, identifying each compact disc by the names of the file(s) contained on each of the compact discs, their date of creation and their sizes in bytes (37 CFR 1.52(e)). The total number of compact discs including duplicates and the files on each compact disc shall be specified (37 CFR 1.77(b)(5)). The sequence listing must be a single document, but the document may be split using software designed to divide a file, that is too large to fit on a single compact disc, into multiple concatenated files. If the user breaks up a sequence listing so that it may be submitted on multiple compact discs, the compact discs must be labeled to indicate their order (e.g., “1 of X”, “2 of X”).

The compact disc used to submit the sequence listing may also contain table information if the table has more than 50 pages of text. See 37 CFR 1.823(a)(2) and 1.52(e)(1)(iii). The compact disc and duplicate copy must be labeled “Copy 1” and “Copy 2,” respectively, and a statement stating that the copies are identical must be included. If the two compact discs are not identical, the Office will use the disc labeled “Copy 1” for further processing (37 CFR 1.52(e)(4)). See also MPEP § 608.05.

If the sequence listing under 37 CFR 1.821(c) is submitted on compact disc, applicant is still required to submit a separate CRF of the sequence listing pursuant to 37 CFR 1.821(e) and 37 CFR 1.824. If the CRF is also submitted on compact disc, applicants will need to submit a total of three copies of the sequence listing (two pursuant to 37 CFR 1.821(c), and one pursuant to 37 CFR 1.821(e)). The compact disc with the CRF of the sequence listing may be identical to the compact disc submitted under 37 CFR 1.821(c) if the latter compact disc includes only the sequence listing (i.e., no additional content, such as tables).

2422.03(a)   Sequence Listings Submitted as ASCII Text Files via EFS-Web [R-07.2015]

The EFS-Web Legal Framework (www.uspto.gov/patents-application-process/applying-online/legal-framework-efs-web-06april11) and MPEP § 502.05 provide detailed information pertaining to filing applications and other documents via EFS-Web. The information below is specific to sequence listing submissions via EFS-Web.

Pursuant to the EFS-Web Legal Framework, applicants may submit a sequence listing under 37 CFR 1.821 as an ASCII text file via EFS-Web instead of on compact disc, provided the specification contains a statement in a separate paragraph (preferably on the first page) that incorporates by reference the material in the ASCII text file identifying the name of the ASCII text file, the date of creation, and the size of the ASCII text file in bytes. The requirements of 37 CFR 1.52(e)(3) - (6) for documents submitted on compact disc are not applicable to sequence listings submitted as ASCII text files via EFS-Web. However, each text file must be in compliance with ASCII and have a file name with a “.txt” extension.
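
The file-level conditions in the preceding paragraph (ASCII content, a “.txt” extension, and a statement naming the file, its date of creation, and its size in bytes) lend themselves to a quick pre-filing check. The following Python sketch is a hedged illustration: both function names are hypothetical, and the wording of the generated statement is an assumption offered as a placeholder, not language prescribed by the Office. The date of creation is passed in by the caller because file systems do not report it consistently.

```python
from typing import List


def check_text_file(path: str) -> List[str]:
    """Return a list of problems found with a sequence listing text file."""
    problems = []
    if not path.lower().endswith(".txt"):
        problems.append('file name lacks a ".txt" extension')
    with open(path, "rb") as handle:
        data = handle.read()
    # bytes.isascii() is True only if every byte is in the range 0x00-0x7F.
    if not data.isascii():
        problems.append("file contains non-ASCII bytes")
    return problems


def incorporation_statement(file_name: str, creation_date: str,
                            size_bytes: int) -> str:
    """Build a draft incorporation-by-reference paragraph (assumed wording)."""
    return (
        f'The material in the ASCII text file "{file_name}", '
        f"created on {creation_date} and having a size of {size_bytes} bytes, "
        "is incorporated herein by reference."
    )
```

In practice the size passed to the statement would be the byte count of the exact file uploaded, since any later edit to the file changes both the size and, typically, the date.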

I. ASCII TEXT FILE SUBMITTED VIA EFS-WEB MAY SERVE AS BOTH PAPER COPY AND CRF

It is recommended that a sequence listing be submitted in an ASCII text file via EFS-Web rather than in a PDF file. See subsection IV, below, for information regarding filing an international application (PCT) with a sequence listing text file via EFS-Web.

If a sequence listing ASCII text file submitted via EFS-Web on the application filing date complies with the requirements of 37 CFR 1.824(a)(2)-(6) and (b), and applicant has not filed a sequence listing in a PDF file (or on paper) on the same day, the text file will serve as both the paper copy required by 37 CFR 1.821(c) and the computer readable form (CRF) required by 37 CFR 1.821(e). Thus, the following are not required and should not be submitted: (1) a second copy of the sequence listing in a PDF file; (2) a statement under 37 CFR 1.821(f) (indicating that the paper copy and CRF copy of the sequence listing are identical); and (3) a request to use a compliant computer readable form of the sequence listing that is already on file for another application pursuant to 37 CFR 1.821(e). If such a request is filed, the USPTO will not carry out the request but will use the sequence listing submitted in the ASCII text file with the application via EFS-Web. See MPEP § 2422.05. Checker software that may be used to check a sequence listing for compliance with the requirements of 37 CFR 1.824 is available on the USPTO website at www.uspto.gov/patents-getting-started/patent-basics/types-patent-applications/utility-patent/checker-version-446. The User Notes on the Checker website should be consulted for an explanation of errors that are not indicated, and content that is not verified, by the Checker software.

If a user submits a sequence listing (under 37 CFR 1.821(c) and (e)) as an ASCII text file via EFS-Web in response to a requirement under 37 CFR 1.821(g) or (h), the sequence listing text file must be accompanied by a statement that the submission does not include any new matter which goes beyond the disclosure of the application as filed. In addition, if a user submits an amendment to, or a replacement of, a sequence listing (under 37 CFR 1.821(c) and (e)) as an ASCII text file via EFS-Web, the sequence listing text file must be accompanied by: (1) a statement that the submission does not include any new matter, and (2) a statement that indicates support for the amendment in the application, as filed. See 37 CFR 1.825.

Submission of the sequence listing in a PDF file on the application filing date is not recommended. Applicant must still provide the CRF required by 37 CFR 1.821(e), and the sequence listing in the PDF file will not be excluded when determining the application size fee. The USPTO prefers the submission of a sequence listing in an ASCII text file via EFS-Web on the application filing date because as stated above, if applicant has not filed a second copy of the sequence listing in a PDF file (or on paper) on the same day, the text file will serve as both the paper copy required by 37 CFR 1.821(c) and the CRF required by 37 CFR 1.821(e). Any sequence listing submitted in PDF format (or on paper) on the application filing date is treated as the paper copy required by 37 CFR 1.821(c). If applicant submits a sequence listing in both a PDF file and an ASCII text file via EFS-Web on the application filing date, a statement that the sequence listing content of the PDF copy and the ASCII text file copy are identical is required. In situations where applicant files the sequence listing in PDF format and requests the use of the CRF of another application under 37 CFR 1.821(e), applicant must submit a letter and request in compliance with 37 CFR 1.821(e) and a statement that the PDF copy filed in the new application is identical to the CRF filed in the other application. See MPEP § 2422.05.

II. APPLICATION SIZE FEE

Any sequence listing submitted as an ASCII text file via EFS-Web that is otherwise in compliance with 37 CFR 1.52(e) and 37 CFR 1.824(a)(2)-(6) and (b) will be excluded when determining the application size fee required by 37 CFR 1.16(s) or 1.492(j) as per 37 CFR 1.52(f)(1). A sequence listing submitted as a PDF file via EFS-Web will not be excluded when determining the application size fee.

Regarding a table submitted as an ASCII text file via EFS-Web that is part of the specification or drawings, each three kilobytes of content submitted will be counted as a sheet of paper for purposes of determining the application size fee required by 37 CFR 1.16(s) or 1.492(j). Each table should be submitted as a separate text file. Further, the file name for each table should indicate which table is contained therein.
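
The three-kilobyte rule for tables can be illustrated with a short calculation. The sketch below rests on two assumptions that the text above does not settle and that should be confirmed before relying on the result: that a kilobyte is taken as 1,024 bytes (the parameter can be set to 1,000 instead), and that a partial three-kilobyte block counts as a full sheet. The function name is hypothetical.

```python
KILOBYTES_PER_SHEET = 3  # each three kilobytes counts as one sheet of paper


def table_sheet_equivalent(size_bytes: int, bytes_per_kilobyte: int = 1024) -> int:
    """Estimate the sheet count attributed to a table text file.

    Assumptions (not stated in the source): kilobyte size is configurable,
    and any partial three-kilobyte block is rounded up to a whole sheet.
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    bytes_per_sheet = KILOBYTES_PER_SHEET * bytes_per_kilobyte
    # Ceiling division without floating point.
    return -(-size_bytes // bytes_per_sheet)
```

Under these assumptions a 3,072-byte table counts as one sheet and a 3,073-byte table as two. A compliant sequence listing text file is excluded from the count altogether, so it would not be passed to this function.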

See subsection IV, below, for additional information regarding application size fees in an international application (PCT).

III. SIZE LIMIT FOR TEXT FILES

One hundred (100) megabytes is the size limit for sequence listing text files submitted via EFS-Web. If a user wishes to submit an electronic copy of a sequence listing text file that exceeds 100 megabytes, it is recommended that the user file the application without the sequence listing using EFS-Web to obtain the application number and confirmation number, and then file the sequence listing on compact disc in accordance with 37 CFR 1.52(e) on the same day by using Priority Mail Express® from the USPS in accordance with 37 CFR 1.10, or hand delivery, in order to secure the same filing date for all parts of the application. Alternatively, a user may submit the application on paper and include the electronic copy of the sequence listing text file on compact disc in accordance with 37 CFR 1.52(e). Sequence listing text files may not be partitioned into multiple files for filing via EFS-Web as the EFS-Web system is not currently capable of handling such submissions. If the sequence listing is filed on a compact disc, the sequence listing must be a single document, but the document may be split using software designed to divide a file, that is too large to fit on a single compact disc, into multiple concatenated files. If the user breaks up a sequence listing so that it may be submitted on multiple compact discs, the compact discs must be labeled to indicate their order (e.g., “1 of X”, “2 of X”).

See subsection IV.B, below, for information regarding submission of a sequence listing text file that exceeds 100 megabytes in an international application (PCT) filed via EFS-Web.

For all other file types, 25 megabytes is the size limit. If a user wishes to submit a table that is larger than 25 megabytes, it is recommended that the electronic copy be submitted on compact disc via Priority Mail Express® from the USPS in accordance with 37 CFR 1.10 on the date of the corresponding EFS-Web filing in accordance with 37 CFR 1.52(e) if the user wishes the electronic copy to be considered to be part of the application as filed. Alternatively, the user may submit the application in paper and include the electronic copies on compact disc in accordance with 37 CFR 1.52(e). Another alternative would be for the user to break up a computer program listing or table file that is larger than 25 megabytes into multiple files that are no larger than 25 megabytes each and submit those smaller files via EFS-Web. If the user chooses to break up a table file so that it may be submitted electronically, the file names must indicate their order (e.g., “1 of X”, “2 of X”).
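
The option of breaking a large table file into ordered parts can be planned ahead of time. The sketch below only computes how many parts are needed and proposes file names that indicate their order; it does not split any file. The naming pattern, the function name, and the treatment of a megabyte as 1,000,000 bytes are assumptions made for illustration.

```python
from typing import List

MEGABYTE = 1_000_000               # assumption: decimal megabyte
OTHER_FILE_LIMIT = 25 * MEGABYTE   # size limit for non-sequence-listing files


def plan_table_parts(base_name: str, size_bytes: int,
                     limit: int = OTHER_FILE_LIMIT) -> List[str]:
    """Propose ordered file names for a table split into parts within the limit."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    # Ceiling division; a file at or under the limit needs a single part.
    parts = max(1, -(-size_bytes // limit))
    if parts == 1:
        return [f"{base_name}.txt"]
    # Hypothetical naming pattern echoing the "1 of X" convention.
    return [f"{base_name}_{index}_of_{parts}.txt" for index in range(1, parts + 1)]
```

For a 60,000,000-byte table named "table1", this proposes three parts, "table1_1_of_3.txt" through "table1_3_of_3.txt". The same approach is not available for sequence listing text files, which may not be partitioned for filing via EFS-Web.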

See subsection IV.C, below, for information regarding submission of tables in an international application (PCT) filed via EFS-Web.

IV. FILING SEQUENCE LISTINGS IN INTERNATIONAL APPLICATIONS (PCT) VIA EFS-WEB
A. Sequence Listing Must Be Presented as a Separate Part of the Application

Under PCT Rule 5.2(a), the sequence listing must always be presented as a separate part of the description. When filing an international application (PCT) using EFS-Web, the sequence listing part of the description may be submitted either as a single ASCII text file with a ".txt" extension (e.g., "seqlist.txt") or as a PDF file. Note that 100 megabytes is the size limit for submitting a sequence listing text file via EFS-Web. See subsection IV.B, below.

If the sequence listing is submitted as an ASCII text file, applicant need not and should not submit any additional copies. The single ASCII text file is preferred because the ASCII text file will serve both as the sequence listing part of the description under PCT Rule 5.2 and the electronic form under PCT Rule 13ter.1(a) in the absence of a PDF sequence listing file. The check list of the PCT Request provided via EFS-Web together with the international application (PCT) must indicate that the sequence listing forms part of the international application. Furthermore, the statement as set forth in paragraph 4(v) of the AI Annex C (Administrative Instructions under the PCT, Annex C), that “the information recorded in electronic form furnished under PCT Rule 13ter is identical to the sequence listing as contained in the international application,” is not required. Also, the sequence listing in an ASCII text file will not be taken into account when calculating the application sheet count, i.e., no excess sheet fee will be required for the sequence listing text file.

Submission of the sequence listing part of the description in a PDF file is not recommended because the applicant would also be required to supply a copy of the sequence listing in an ASCII text file for purposes of international search and/or international preliminary examination in accordance with paragraph 40 of AI Annex C. When a sequence listing is filed via EFS-Web in a new PCT international application in both a PDF file and an ASCII text file, the PDF copy of the sequence listing will be considered to form part of the application and the ASCII text file will be used for search purposes and will be transmitted to the International Bureau with the record copy.

The calculation of the international filing fee for an international application (PCT), including a sequence listing, filed via EFS-Web is determined based on the type of sequence listing file. A sequence listing filed in an ASCII text file will not be included in the sheet count of the international application (PCT). A sequence listing filed in a PDF file will be included in the sheet count of the international application (PCT). Therefore, the sheet count for an EFS-Web filed international application (PCT) containing both a PDF file and a text file sequence listing will be calculated to include the number of sheets of the PDF sequence listing.

B. File Size and Quantity Limits

One hundred (100) megabytes is the size limit for sequence listing text files submitted via EFS-Web. Sequence listing text files must not be partitioned into multiple files for filing via EFS-Web as the EFS-Web electronic filing system is not currently capable of handling such submissions. For all other file types EFS-Web is currently not capable of accepting files that are larger than 25 megabytes. Additionally, a single EFS-Web submission may include no more than 60 electronic files. Note that regarding the 60 electronic file limit, an applicant may upload and validate in sets of up to 20 files each, with a limit of three sets of 20. If applicant chooses to divide a file into multiple parts using the multi-doc feature, each part is counted as one file.
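
The limits in the preceding paragraph can be gathered into a simple pre-submission check. This is a sketch under stated assumptions rather than a description of how EFS-Web itself validates uploads: the `UploadFile` record and both function names are hypothetical, a megabyte is taken as 1,000,000 bytes, and the presence of more than one sequence listing text file is treated here as a sign that the listing has been partitioned.

```python
from typing import List, NamedTuple

MEGABYTE = 1_000_000                    # assumption: decimal megabyte
SEQUENCE_LISTING_LIMIT = 100 * MEGABYTE
OTHER_FILE_LIMIT = 25 * MEGABYTE
MAX_FILES = 60                          # per EFS-Web submission
SET_SIZE = 20                           # files uploaded and validated per set


class UploadFile(NamedTuple):
    name: str
    size_bytes: int
    is_sequence_listing_text: bool = False


def check_submission(files: List[UploadFile]) -> List[str]:
    """Return a list of problems with a planned single submission."""
    problems = []
    if len(files) > MAX_FILES:
        problems.append(f"more than {MAX_FILES} files in one submission")
    listings = [f for f in files if f.is_sequence_listing_text]
    if len(listings) > 1:
        problems.append("sequence listing text file appears to be partitioned")
    for f in files:
        limit = SEQUENCE_LISTING_LIMIT if f.is_sequence_listing_text else OTHER_FILE_LIMIT
        if f.size_bytes > limit:
            problems.append(f"{f.name} exceeds the {limit // MEGABYTE} megabyte limit")
    return problems


def upload_sets(files: List[UploadFile]) -> List[List[UploadFile]]:
    """Group files into consecutive sets of up to 20 for upload and validation."""
    return [files[i:i + SET_SIZE] for i in range(0, len(files), SET_SIZE)]
```

A plan that fails the check would point toward one of the follow-on or compact disc routes described in the paragraphs that follow.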

The need to submit unusually large sequence listings and/or numerous electronic files may prevent applicant from making a complete international application (PCT) filing in a single EFS-Web submission. Applicant may use EFS-Web to file part of the international application (PCT) and to obtain the international application (PCT) number and the confirmation number, and then file the remainder of the international application (PCT) on the same day as one or more follow-on submissions using EFS-Web, in order to secure the same filing date for all parts of the international application (PCT). However, applicant is not permitted to file part of the international application (PCT) electronically via EFS-Web, and then file the remainder of the international application (PCT) on paper to secure a filing date of all parts of the international application (PCT).

In the situation where applicant needs to file a sequence listing that is over one hundred (100) megabytes, applicant may use EFS-Web to file the international application (PCT) without the sequence listing to obtain the international application (PCT) number and the confirmation number, and then file the sequence listing on compact discs on the same day by using Priority Mail Express® from the USPS in accordance with 37 CFR 1.10, or hand delivery, in order to secure the same filing date for all parts of the international application (PCT). Priority Mail Express® from the USPS and hand-carried submissions must not contain PDF files and must fully comply with the guidelines for filing a sequence listing on electronic media. The check list of the PCT Request provided via EFS-Web together with the international application (PCT) must indicate that the sequence listing part of the description will be filed separately on physical data carrier(s), on the same day and in the form of an Annex C/ST.25 text file. The sequence listing must be a single document, but the document may be split using software designed to divide a file, that is too large to fit on a single compact disc, into multiple concatenated files. If the user breaks up a sequence listing into multiple concatenated files so that it may be submitted on multiple compact discs, the compact discs must be labeled to indicate their order (e.g., “1 of X”, “2 of X”).

C. Tables Related to a Sequence Listing

Tables related to a sequence listing must be an integral part of the description of the international application (PCT), and must not be included in the sequence listing part or the drawing part. Such tables will be taken into account when calculating the application sheet count, and excess sheet fees may be required. When applicant submits tables related to a sequence listing in an international application (PCT) via EFS-Web, the tables must be in a PDF file. If applicant submits tables related to a sequence listing in a text file, such tables will not be accepted as part of the international application (PCT). For more information, see Sequence Listings and Tables Related Thereto in International Applications Filed in the United States Receiving Office, 1344 Off. Gaz. Pat. Office 50 (July 7, 2009).

2422.04   The Requirement for a Computer Readable Copy of the Official Copy of the Sequence Listing [R-07.2015]

37 CFR 1.821(e) requires the submission of a copy of the sequence listing in computer readable form. The computer readable form may be submitted on the electronic media permitted by 37 CFR 1.824, or may be submitted as an ASCII text file via EFS-Web. The information on the computer readable form will be entered into the Office’s database for searching and printing nucleotide and amino acid sequences. This electronic database will also enable the Office to provide published sequence data, in electronic form, to the National Center for Biotechnology Information (NCBI) for publication in GenBank, and enable NCBI to exchange data with the DNA Data Bank of Japan (DDBJ) and the European Bioinformatics Institute (EBI). It should be noted that the Office’s database complies with the confidentiality requirement imposed by 35 U.S.C. 122. Unpublished pending application sequences are maintained in the database separately from published or patented sequences. That is, the Office will not exchange or make public any information on any sequence until the patent application containing that information is published or matures into a patent, or as otherwise allowed by 35 U.S.C. 122.

The Office may permit correction of the official copy of the sequence listing submitted pursuant to 37 CFR 1.821(c), whether on paper or compact disc, at the least, during the pendency of a given application by reference to the computer readable copy thereof submitted pursuant to 37 CFR 1.821(e) if both the official copy and computer readable form were submitted at the time of filing of the application and the totality of the circumstances otherwise substantiate the proposed correction. A mere discrepancy between the official copy and the computer readable form may not, in and of itself, be sufficient to justify a proposed correction. In this regard, the Office will assume that the computer readable form has been incorporated by reference into the application when the official copy and computer readable form were submitted at the time of filing of the application. The Office will attempt to accommodate or address all correction issues, but it must be kept in mind that the real burden rests with the applicant to ensure that any discrepancies between the official copy and the computer readable form are eliminated or minimized. Applicants should be aware that there will be instances where the applicant may have to suffer the consequences of any discrepancies between the two. If a new application is filed via EFS-Web with an ASCII text file sequence listing that complies with the requirements of 37 CFR 1.824(a)(2) - (6) and 37 CFR 1.824(b), and applicant has not filed a sequence listing in a PDF file, the text file will serve as both the paper copy required by 37 CFR 1.821(c) and CRF required by 37 CFR 1.821(e), eliminating any chance for discrepancies between the official copy and the CRF.

The Office does not desire to be bound by a requirement to permanently preserve computer readable forms for support, priority or correction purposes. For example, the Office will make corrections, where appropriate, by reference to the CRF as long as the CRF is still available to the Office. However, once use of the CRF by the Office for processing has ended, i.e., once the Office has entered the data contained on the computer readable form into the appropriate database, the Office does not intend to further preserve the CRF submitted by the applicant.

2422.05   Request for Transfer of Computer Readable Form [R-07.2015]

37 CFR 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications.

*****

  • (e) A copy of the “Sequence Listing” referred to in paragraph (c) of this section must also be submitted in computer readable form (CRF) in accordance with the requirements of § 1.824. The computer readable form must be a copy of the “Sequence Listing” and may not be retained as a part of the patent application file. If the computer readable form of a new application is to be identical with the computer readable form of another application of the applicant on file in the Office, reference may be made to the other application and computer readable form in lieu of filing a duplicate computer readable form in the new application if the computer readable form in the other application was compliant with all of the requirements of this subpart. The new application must be accompanied by a letter making such reference to the other application and computer readable form, both of which shall be completely identified. In the new application, applicant must also request the use of the compliant computer readable “Sequence Listing” that is already on file for the other application and must state that the paper or compact disc copy of the “Sequence Listing” in the new application is identical to the computer readable copy filed for the other application.

*****

Where the computer readable form (CRF) of the sequence listing of a new application is to be identical with the CRF of another application of the applicant on file in the Office, 37 CFR 1.821(e) provides a mechanism for applicant to request a transfer of the CRF from the application already on file to the new application in limited circumstances. However, the Office strongly recommends that applicant submit an ASCII text copy of a sequence listing in the new application rather than request a transfer of a previously filed CRF to avoid the need to file a PDF or paper copy of the sequence listing (which is included in the calculation of the application size fee) and to avoid delays that may be introduced by defective transfer requests. Applicant may be able to retrieve a copy of the sequence listing in ASCII text format in another application of the applicant from applicant's records, public or private PAIR via the Supplemental Content Tab, or from PATENTSCOPE (WIPO website) when provided in an international application.

I. REQUIREMENTS OF A TRANSFER REQUEST

First, the application in which the request for a transfer is submitted must have been filed with (or include via an amendment that does not add new matter) a paper copy or PDF of a sequence listing. Second, the CRF of the previous application must be identical to the sequence listing contained in the new application and the request for transfer must include a statement to this effect. Note that applicant may only request transfer of a CRF that complies with 37 CFR 1.824(a)(2) - (6) and 37 CFR 1.824(b) (i.e., a CRF that is a compliant sequence listing ASCII text file). Third, the previous application and the CRF to be transferred must be completely and clearly identified in the transfer request. Necessary identifying information includes the application number, filing date of the application, and submission date of the CRF that is to be transferred.
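The three requirements above can be read as a checklist. The following Python sketch is illustrative only: the `TransferRequest` class, its field names, and the `missing_items` helper are hypothetical and do not correspond to Form PTO/SB/93 or to any Office system.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class TransferRequest:
    """Hypothetical record of a CRF transfer request under 37 CFR 1.821(e)."""
    new_app_has_paper_or_pdf_listing: bool  # first requirement
    identity_statement_included: bool       # second requirement
    prior_crf_is_compliant: bool            # 37 CFR 1.824(a)(2)-(6) and 1.824(b)
    prior_application_number: Optional[str] = None
    prior_filing_date: Optional[date] = None
    prior_crf_submission_date: Optional[date] = None


def missing_items(req: TransferRequest) -> List[str]:
    """Return the requirements that the request does not yet satisfy."""
    problems = []
    if not req.new_app_has_paper_or_pdf_listing:
        problems.append("paper copy or PDF of the sequence listing in the new application")
    if not req.identity_statement_included:
        problems.append("statement that the prior CRF is identical to the new sequence listing")
    if not req.prior_crf_is_compliant:
        problems.append("prior CRF that is a compliant sequence listing ASCII text file")
    # Third requirement: complete identification of the prior application and CRF.
    if not req.prior_application_number:
        problems.append("application number of the previous application")
    if req.prior_filing_date is None:
        problems.append("filing date of the previous application")
    if req.prior_crf_submission_date is None:
        problems.append("submission date of the CRF to be transferred")
    return problems
```

An empty list means only that the items named in this subsection are present; whether a request is actually accepted remains for the Office to decide.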

Form PTO/SB/93 (www.uspto.gov/forms/sb0093.pdf) should be used to request a transfer of a CRF under 37 CFR 1.821(e) to facilitate processing of the request.

If a user submits a sequence listing ASCII text file via EFS-Web and concurrently requests the Office to use a compliant computer readable sequence listing that is already on file for another application pursuant to 37 CFR 1.821(e), the Office will not carry out the request but will use the sequence listing submitted with the application as originally filed via EFS-Web.

II. REPLY TO A DEFECTIVE TRANSFER REQUEST NOTICE

Applicant's reply to a notice of a defective transfer request preferably includes a CRF (an ASCII text file submitted via EFS-Web or on compact disc); however, a new transfer request that corrects the noted deficiencies is also permitted. As an example, if applicant requested transfer of a CRF into a new application that does not include a sequence listing and such request is defective, the response to a defective transfer request notice may be a CRF of the sequence listing. If it is not, then the response must include a new transfer request, a PDF or paper copy of the sequence listing, and an amendment entering the sequence listing in the application.
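The reply options described above can be summarized as a small decision function. This is a sketch under stated assumptions: `reply_contents` and its parameters are hypothetical names, and the function reflects only the two paths named in this subsection.

```python
from typing import List


def reply_contents(submits_crf: bool, application_has_listing: bool) -> List[str]:
    """List what a reply to a defective transfer request notice would include.

    Hypothetical helper: models only the alternatives described in this subsection.
    """
    if submits_crf:
        # Preferred path: supply the CRF directly.
        return ["CRF (ASCII text file via EFS-Web or on compact disc)"]
    # Alternative path: renew the transfer request and cure the defects.
    items = ["new transfer request correcting the noted deficiencies"]
    if not application_has_listing:
        # The application lacks a sequence listing, so one must be entered.
        items.append("PDF or paper copy of the sequence listing")
        items.append("amendment entering the sequence listing in the application")
    return items
```

The first branch returns a single item because supplying the CRF is the preferred reply and, per the example above, may suffice even where the application did not include a sequence listing.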

2422.06   Requirement for Statement Regarding Content of Official and Computer Readable Copies of Sequence Listing [R-07.2015]

37 CFR 1.821(f) requires that the official sequence listing (submitted on paper or compact disc pursuant to 37 CFR 1.821(c)) and computer readable copies of the sequence listing (submitted pursuant to 37 CFR 1.821(e)) be accompanied, at the time when the computer readable form is submitted, by a statement that the content of the official and computer readable copies is the same. Such a statement may be made by a registered practitioner, the applicant, an inventor, or the person who actually compares the sequence data on behalf of the aforementioned. See MPEP § 2428 for further information and Sample Statements.

Note that if the sequence listing is filed in a new application as an ASCII text file via EFS-Web, and applicant has not filed a sequence listing in a PDF file, the text file will serve as both the paper copy required by 37 CFR 1.821(c) and the computer readable form (CRF) required by 37 CFR 1.821(e). See MPEP § 2422.03(a), subsections I and IV, for additional information. Thus, the following are not required and should not be submitted: (1) a second copy of the sequence listing in a PDF file; and (2) a statement under 37 CFR 1.821(f) (indicating that the paper copy and CRF copy of the sequence listing are identical).
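The condition in the preceding note can be expressed as a simple predicate. The sketch below is illustrative: the function name and parameters are hypothetical, and it assumes the application contains a sequence listing that is subject to 37 CFR 1.821 through 1.825.

```python
def statement_under_1_821f_required(
    new_application: bool,
    ascii_text_via_efs_web: bool,
    pdf_listing_filed: bool,
) -> bool:
    """Return True when a 37 CFR 1.821(f) statement would accompany the CRF.

    Hypothetical helper: assumes a sequence listing subject to the rules is present.
    """
    # A text file filed in a new application via EFS-Web, with no PDF listing,
    # serves as both the 1.821(c) copy and the 1.821(e) CRF.
    text_file_serves_as_both = (
        new_application and ascii_text_via_efs_web and not pdf_listing_filed
    )
    return not text_file_serves_as_both
```

When the function returns False, there is only one copy of the listing, so there is nothing for a statement of identity to compare.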

2422.07   Requirements for Compliance, Statements Regarding New Matter, and Sanctions for Failure to Comply [R-07.2015]

37 CFR 1.821(g) requires compliance with the requirements of 37 CFR 1.821(b) through (f), as discussed above, if they are not satisfied at the time of filing under 35 U.S.C. 111(a) or at the time of entering the national stage of an international application under 35 U.S.C. 371, within the period of time set in a notice requiring compliance. Failure to comply will result in the abandonment of the application. When applicant files an amendment to comply with the requirements of 37 CFR 1.821(g) and that amendment adds or amends a compact disc(s) or ASCII text file submitted via EFS-Web, applicant is required to update or insert in the specification an appropriate incorporation by reference statement describing the compact disc and the files contained thereon or the description of the ASCII text file submitted via EFS-Web. See 37 CFR 1.77(b)(5) and 37 CFR 1.52(e)(5). Submissions in reply to requirements under 37 CFR 1.821(g) must be accompanied by a statement that the submission includes no new matter. Such a statement may be made by a registered practitioner, the applicant, an inventor, or the person who actually compares the sequence data on behalf of the aforementioned. Extensions of time in which to reply to a requirement under this paragraph are available pursuant to 37 CFR 1.136. Note, however, that patent applications filed under 35 U.S.C. 111 on or after December 18, 2013, and international patent applications in which the national stage commenced under 35 U.S.C. 371 on or after December 18, 2013, may be subject to reductions in patent term adjustment pursuant to 37 CFR 1.704(c)(13) if they are not in condition for examination within eight months from the filing date or date of commencement, respectively. “In condition for examination” includes compliance with 37 CFR 1.821 through 1.825 (see 37 CFR 1.704(f)).
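As a rough aid for tracking the eight-month period mentioned above, the following Python sketch computes a calendar date eight months after a filing or commencement date. The function names are hypothetical, and the month arithmetic (same day of the month, clamped to the last day of a shorter month) is an assumption of this sketch, not a statement of how the Office computes the period under 37 CFR 1.704(c)(13).

```python
import calendar
from datetime import date
from typing import Optional

# Applications filed, or national stage commenced, on or after this date
# may be subject to the reduction described above.
PTA_RULE_START = date(2013, 12, 18)


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of the target month."""
    total = d.month - 1 + months
    year = d.year + total // 12
    month = total % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def eight_month_date(start: date) -> Optional[date]:
    """Return the date eight months after `start`, or None if `start`
    precedes December 18, 2013 (the provision would not apply)."""
    if start < PTA_RULE_START:
        return None
    return add_months(start, 8)
```

For example, `eight_month_date(date(2014, 1, 15))` evaluates to `date(2014, 9, 15)` under this convention.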

Provisional applications filed under 35 U.S.C. 111(b) need not comply with 37 CFR 1.821 through 1.825; however, applicants are encouraged to file a sequence listing as defined in 37 CFR 1.821(c) for ease of identification of the sequence information contained in the provisional application.

If any of the requirements of 37 CFR 1.821(b) - (f) are not satisfied at the time of filing an international application under the Patent Cooperation Treaty (PCT), which application is to be searched by the United States International Searching Authority or examined by the United States International Preliminary Examining Authority, applicant will be sent a notice necessitating compliance with the requirements within a prescribed time period. Submissions in reply to requirements under this paragraph must be accompanied by a statement that the submission does not include matter which goes beyond the disclosure in the international application as filed. Such a statement may be made by a registered practitioner, the applicant, an inventor, or the person who actually compares the sequence data on behalf of the aforementioned. International applications that fail to comply with any of the requirements of 37 CFR 1.821(b) - (f) will be searched and/or examined to the extent possible without the benefit of the information in computer readable form. See PCT Administrative Instructions Section 513(c).

The requirement to submit a statement that a submission in reply to the requirements of this section does not include new matter or matter which goes beyond the disclosure in the application as filed is not the first instance in which the applicant has been required to ensure that there is not new matter upon amendment. The requirement is analogous to that found in 37 CFR 1.125 regarding substitute specifications. When a substitute specification is required because the number or nature of amendments would make it difficult to examine the application, the applicant must include a statement that the substitute specification includes no new matter. The necessity of requiring a substitute sequence listing, or pages thereof, is similar to the necessity of requiring a substitute specification and, likewise, the burden is on the applicant to ensure that no new matter is added. Applicants have a duty to comply with the statutory prohibition (35 U.S.C. 132 and 35 U.S.C. 251) against the introduction of new matter.

The correction of errors in sequencing, or of any other errors made in describing an invention, is subject to the statutory prohibition (35 U.S.C. 132 and 35 U.S.C. 251) against the introduction of new matter.

2422.08   Presumptions Regarding Compliance [R-08.2012]

Neither the presence nor absence of information which is not required under the sequence rules will create a presumption that such information is necessary to satisfy any of the requirements of 35 U.S.C. 112. Further, the grant of a patent on an application that is subject to 37 CFR 1.821 through 37 CFR 1.825 constitutes a presumption that the granted patent complies with the requirements of these rules.

2422.09   Box Sequence; Hand Delivery of Sequence Listings and Computer Readable Forms [R-07.2015]

To facilitate administrative processing of all papers and compact discs associated with sequence rule compliance, all computer readable forms, compact discs, fees, and papers accompanying them filed in the Office should be marked “Box SEQUENCE.”

Correspondence relating to the sequence rules may also be hand-delivered to the Customer Service Window. In cases of hand delivery to the Customer Service Window, the computer readable form should be placed in a protective mailer labeled with at least the application number, if available. The labeling requirements of 37 CFR 1.52(e) and 1.824(a)(6) must also be complied with. The use of staples and clips, if any, should be confined to carefully attaching the mailer to the submitted papers without contact or compression of the media. In no situations should additional or complimentary electronic copies be delivered to examiners or other Office personnel.


United States Patent and Trademark Office
This page is owned by Patents.
Last Modified: 11/04/2015 11:01:47