uspto.gov

2412 The Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence Disclosures to Include a Sequence Listing in XML file format [R-07.2022]

[Editor Note: This section is applicable to all applications filed on or after July 1, 2022, having disclosures of nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b).]

Patent applications that contain disclosures of nucleotide and/or amino acid sequences, as defined in 37 CFR 1.831(b), must present the associated biological sequence data in a standardized electronic eXtensible Markup Language (XML) format, or a “Sequence Listing XML,” as a separate part of the specification. In particular, World Intellectual Property Organization (WIPO) Standard ST.26 permits applicants to submit a single, internationally acceptable sequence listing in a language-neutral format using specified International Nucleotide Sequence Database Collaboration (INSDC) identifiers in international applications filed under the Patent Cooperation Treaty (PCT) and in national and regional applications in the intellectual property offices (IPOs) of WIPO member states. As a result, a single sequence listing in compliance with WIPO Standard ST.26 can be prepared for use in the IPOs of WIPO member states. The regulatory provisions found at 37 CFR 1.831 - 1.835 implement WIPO Standard ST.26 in the USPTO and set forth requirements for presenting sequence data in patent applications filed on or after July 1, 2022, containing disclosures of nucleotide sequences and/or amino acid sequences.

WIPO Standard ST.26 is incorporated by reference into the USPTO regulations by including new regulatory text at 37 CFR 1.839:

37 CFR 1.839 Incorporation by reference.

  • (a) Certain material is incorporated by reference into this subpart with the approval of the Director of the Federal Register under 5 U.S.C. 552(a) and 1 CFR part 51. All approved incorporation by reference (IBR) material is available for inspection at the USPTO and at the National Archives and Records Administration (NARA). Contact the USPTO’s Office of Patent Legal Administration at 571–272–7701. For information on the availability of this material at NARA, email fr.inspection@nara.gov or go to www.archives.gov/federal-register/cfr/ibr-locations.html. The material may be obtained from the source(s) in paragraph (b) of this section.
  • (b) World Intellectual Property Organization (WIPO), 34 chemin des Colombettes, 1211 Geneva 20 Switzerland, www.wipo.int.
    • (1) WIPO Standard ST.26. WIPO Handbook on Industrial Property Information and Documentation, Standard ST.26: Recommended Standard for the Presentation of Nucleotide and Amino Acid Sequence Listings Using XML (eXtensible Markup Language) including Annexes I–VII, version 1.5, approved November 5, 2021; IBR approved for §§ 1.831 through 1.834.
    • (2) [Reserved]

For ease of access, WIPO Standard ST.26 can be found at: www.wipo.int/export/sites/www/standards/en/pdf/03-26-01.pdf

A link to WIPO Standard ST.26 is also found on the USPTO’s Sequence Listing Resource Center:

www.uspto.gov/patents/apply/sequence-listing-resource-center/learning-and-resources#standard26

2412.01 Overview of the Sequence Rules [R-07.2022]

[Editor Note: This section is applicable to all applications filed on or after July 1, 2022, having disclosures of nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b).]

Under the sequence listing rules, an applicant is required to submit sequence data, relating to certain nucleotide and/or amino acid sequences disclosed in patent applications that were filed on or after July 1, 2022, in eXtensible Markup Language (XML) format, where the XML file of the sequence information conforms to the requirements of 37 CFR 1.831 - 1.834, which specify requirements of particular paragraphs of WIPO Standard ST.26.

2412.02 Definition of “Sequence Listing XML” [R-07.2022]

[Editor Note: This section is applicable to all applications filed on or after July 1, 2022, having disclosures of nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b).]

37 CFR 1.831 Requirements for patent applications filed on or after July 1, 2022, having nucleotide and/or amino acid sequence disclosures.

  • (a) Patent applications disclosing nucleotide and/or amino acid sequences by enumeration of their residues, as defined in paragraph (b) of this section, must contain, as a separate part of the disclosure, a computer readable Sequence Listing in XML format (a “Sequence Listing XML”). Disclosed nucleotide or amino acid sequences that do not meet the definition in paragraph (b) of this section must not be included in the “Sequence Listing XML.” The “Sequence Listing XML” contains the information of the nucleotide and/or amino acid sequences disclosed in the patent application using the symbols and format in accordance with the requirements of §§ 1.832 through 1.834.
  • *****

For 35 U.S.C. 111 applications and international applications filed on or after July 1, 2022, that disclose nucleotide and/or amino acid sequences by enumeration of their residues, the associated sequence data must be presented as a separate part of the disclosure in a computer readable XML format in accordance with WIPO Standard ST.26, as implemented by 37 CFR 1.831 - 1.834. This sequence listing is referred to as a “Sequence Listing XML” in order to distinguish it from a “Sequence Listing” submitted in an application having a filing date BEFORE July 1, 2022. For applications having a filing date before July 1, 2022, that contain disclosures of nucleotide and/or amino acid sequences, the associated sequence data is presented as a separate part of the disclosure as an ASCII plain text file, as PDF sheets of the specification, or on physical sheets of paper. See 37 CFR 1.821(c) and 1.821(e)(1). See also MPEP §§ 2421.01 and 2421.02.

2412.02(a) “Enumeration of its residues” [R-07.2022]

[Editor Note: This section is applicable to all applications filed on or after July 1, 2022, having disclosures of nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b).]

37 CFR 1.831 Requirements for patent applications filed on or after July 1, 2022, having nucleotide and/or amino acid sequence disclosures.

  • *****

  • (d) “Enumeration of its residues” means disclosure of a nucleotide or amino acid sequence in a patent application by listing, in order, each residue of the sequence, where the residues are represented in the manner as defined in paragraph 3(c)(i) or (ii) of WIPO Standard ST.26 (incorporated by reference, see § 1.839).
  • *****

WIPO Standard ST.26 specifies that “enumeration of its residues” means “disclosure of a sequence in a patent application by listing, in order, each residue of the sequence, wherein [either] (i) the residue is represented by a name, abbreviation, symbol, or structure (e.g., HHHHHHQ or HisHisHisHisHisHisGln); or (ii) multiple residues are represented by a shorthand formula (e.g., His6Gln)” (WIPO Standard ST.26, paragraph 3(c)).
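To illustrate that the two forms of enumeration describe the same residues, the following Python sketch expands a shorthand formula such as His6Gln into its one-letter form. The function name, the parsing pattern, and the deliberately partial three-letter lookup are illustrative assumptions; none of them comes from ST.26 or the rules.

```python
import re

# Partial three-letter to one-letter lookup, for illustration only
# (an assumption of this sketch; it is not the full Table 3 symbol list).
THREE_TO_ONE = {"His": "H", "Gln": "Q", "Ala": "A", "Gly": "G"}


def expand_shorthand(formula: str) -> str:
    """Expand a shorthand formula such as 'His6Gln' into 'HHHHHHQ'."""
    residues = []
    position = 0
    # Each token is a three-letter residue name with an optional repeat count.
    for match in re.finditer(r"([A-Z][a-z]{2})(\d*)", formula):
        if match.start() != position:
            raise ValueError(f"unparsed text at offset {position}")
        name, count = match.group(1), match.group(2)
        if name not in THREE_TO_ONE:
            raise ValueError(f"residue name not in the sketch lookup: {name}")
        residues.append(THREE_TO_ONE[name] * (int(count) if count else 1))
        position = match.end()
    if position != len(formula):
        raise ValueError(f"unparsed text at offset {position}")
    return "".join(residues)
```

Under these assumptions, both `expand_shorthand("His6Gln")` and `expand_shorthand("HisHisHisHisHisHisGln")` yield the same seven-residue string, which is why either form counts as an enumeration of residues.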

2412.03 Nucleotides and Amino Acids Included and Excluded From a “Sequence Listing XML” [R-07.2022]

[Editor Note: This section is applicable to all applications filed on or after July 1, 2022, having disclosures of nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b).]

37 CFR 1.831 Requirements for patent applications filed on or after July 1, 2022, having nucleotide and/or amino acid sequence disclosures.

  • *****

  • (b) Nucleotide and/or amino acid sequences, as used in this section and §§ 1.832 through 1.835, encompass:
    • (1) An unbranched sequence or linear region of a branched sequence containing 4 or more specifically defined amino acids, wherein the amino acids form a single peptide backbone; or
    • (2) An unbranched sequence or linear region of a branched sequence of 10 or more specifically defined nucleotides, wherein adjacent nucleotides are joined by:
      • (i) A 3' to 5' (or 5' to 3') phosphodiester linkage; or
      • (ii) Any chemical bond that results in an arrangement of adjacent nucleobases that mimics the arrangement of nucleobases in naturally occurring nucleic acids (i.e., nucleotide analogs).
  • *****

  • (j) A “Sequence listing XML” must not include any sequences having fewer than 10 specifically defined nucleotides, or fewer than 4 specifically defined amino acids.

Generally, the data associated with nucleotide sequences that are an unbranched sequence or constitute a linear portion of a branched sequence of 10 or more specifically defined nucleotides are required to be listed in a “Sequence Listing XML.” See MPEP §§ 2412.03(d) and (e) for definitions of nucleotide and modified nucleotide, respectively. See MPEP § 2412.03(a) for definition of “specifically defined.”

Similarly, the data associated with amino acid sequences that are an unbranched sequence or constitute a linear portion of a branched sequence of 4 or more specifically defined amino acids are required to be listed in a “Sequence Listing XML.” See MPEP §§ 2412.03(b) and (c) for definitions of amino acid and modified amino acid, respectively. See MPEP § 2412.03(a) for definition of “specifically defined.”

37 CFR 1.831(b) sets forth the nucleotides and amino acids which must be included in a “Sequence Listing XML”. 37 CFR 1.831(j) specifies that any sequences having fewer than 10 specifically defined nucleotides, or fewer than 4 specifically defined amino acids must be excluded from any “Sequence listing XML.”
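The length thresholds of 37 CFR 1.831(b) and (j) can be sketched as a simple count. This minimal Python example assumes the sequence is already an unbranched sequence (or a linear region of a branched one) written in ST.26 symbols, and that "specifically defined" excludes only "n" for nucleotides and "X" for amino acids, as 37 CFR 1.831(e) provides. The function names and the `kind` parameter are hypothetical, and the sketch does not check linkage chemistry or branching.

```python
def count_specifically_defined(seq: str, kind: str) -> int:
    """Count residues other than 'n' (nucleotide) or 'X' (amino acid)."""
    if kind not in ("nucleotide", "amino_acid"):
        raise ValueError("kind must be 'nucleotide' or 'amino_acid'")
    placeholder = "n" if kind == "nucleotide" else "X"
    return sum(1 for residue in seq if residue != placeholder)


def must_be_listed(seq: str, kind: str) -> bool:
    """True when the sequence meets the 10-nucleotide or 4-amino-acid threshold."""
    count = count_specifically_defined(seq, kind)
    threshold = 10 if kind == "nucleotide" else 4
    return count >= threshold
```

For example, `must_be_listed("acgtnnnnacgtac", "nucleotide")` is true because ten residues are specifically defined even though four are "n", while `must_be_listed("MKXL", "amino_acid")` is false because only three are.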

2412.03(a) “Specifically Defined” [R-07.2022]

[Editor Note: This section is applicable to all applications filed on or after July 1, 2022, having disclosures of nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b).]

37 CFR 1.831 Requirements for patent applications filed on or after July 1, 2022, having nucleotide and/or amino acid sequence disclosures.

  • *****

  • (e) “Specifically defined” means any amino acid or nucleotide as defined in paragraph 3(k) of WIPO Standard ST.26.
  • *****

WIPO Standard ST.26, paragraph 3(k), provides that “specifically defined” means any nucleotide other than those represented by the symbol “n” and any amino acid other than those represented by the symbol “X,” shown below in Table 1 for nucleotide symbols and Table 3 for amino acid symbols.

Table 1: List of Nucleotides Symbols
Symbol Definition
a adenine
c cytosine
g guanine
t thymine in DNA/uracil in RNA (t/u)
m a or c
r a or g
w a or t/u
s c or g
y c or t/u
k g or t/u
v a or c or g; not t/u
h a or c or t/u; not g
d a or g or t/u; not c
b c or g or t/u; not a
n a or c or g or t/u; “unknown” or “other”

Reproduced from WIPO Standard ST.26, Annex I, Section 1

Table 3: List of Amino Acids Symbols
Symbol Definition
A Alanine
R Arginine
N Asparagine
D Aspartic acid (Aspartate)
C Cysteine
Q Glutamine
E Glutamic acid (Glutamate)
G Glycine
H Histidine
I Isoleucine
L Leucine
K Lysine
M Methionine
F Phenylalanine
P Proline
O Pyrrolysine
S Serine
U Selenocysteine
T Threonine
W Tryptophan
Y Tyrosine
V Valine
B Aspartic acid or Asparagine
Z Glutamine or Glutamic acid
J Leucine or Isoleucine
X A or R or N or D or C or Q or E or G or H or I or L or K or M or F or P or O or S or U or T or W or Y or V; “unknown” or “other”

Reproduced from WIPO Standard ST.26, Annex I, Section 3

2412.03(b) “Amino Acid” [R-07.2022]

[Editor Note: This section is applicable to all applications filed on or after July 1, 2022, having disclosures of nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b). Formatting representations of XML (eXtensible Markup Language) elements in this section appear different than shown in Standard ST.26, which may be accessed at: www.wipo.int/export/sites/www/standards/en/pdf/03-26-01.pdf.]

37 CFR 1.831 Requirements for patent applications filed on or after July 1, 2022, having nucleotide and/or amino acid sequence disclosures.

  • *****

  • (f) “Amino acid” includes any D- or L-amino acid or modified amino acid as defined in paragraph 3(a) of WIPO Standard ST.26.
  • *****

WIPO Standard ST.26, paragraph 3(a), defines “amino acid” to mean any amino acid that can be represented using any of the symbols shown in Table 3: List of Amino Acid Symbols (reproduced in MPEP § 2412.03(a)). Such amino acids include, inter alia, D-amino acids and amino acids containing modified or synthetic side chains. Amino acids will be construed as unmodified L-amino acids unless further described in a feature table. A peptide nucleic acid (PNA) residue is not considered an amino acid, but is considered a nucleotide.

A modified amino acid must be further described in a feature table. Where applicable, the feature keys “CARBOHYD” or “LIPID” should be used together with the qualifier “note”. The feature key “MOD_RES” should be used for other post-translationally modified amino acids together with the qualifier “note”; otherwise the feature key “SITE” together with the qualifier “note” should be used. The value for the qualifier “note” must either be an abbreviation set forth in Table 4 below or the complete, unabbreviated name of the modified amino acid. The abbreviations set forth in Table 4 or the complete, unabbreviated names must not be used in the sequence itself.

Table 4: List of Modified Amino Acids
Abbreviation Modified Amino acid
Aad 2-Aminoadipic acid
bAad 3-Aminoadipic acid
bAla beta-Alanine, beta-Aminopropionic acid
Abu 2-Aminobutyric acid
4Abu 4-Aminobutyric acid, piperidinic acid
Acp 6-Aminocaproic acid
Ahe 2-Aminoheptanoic acid
Aib 2-Aminoisobutyric acid
bAib 3-Aminoisobutyric acid
Apm 2-Aminopimelic acid
Dbu 2,4-Diaminobutyric acid
Des Desmosine
Dpm 2,2'-Diaminopimelic acid
Dpr 2,3-Diaminopropionic acid
EtGly N-Ethylglycine
EtAsn N-Ethylasparagine
Hyl Hydroxylysine
aHyl allo-Hydroxylysine
3Hyp 3-Hydroxyproline
4Hyp 4-Hydroxyproline
Ide Isodesmosine
aIle allo-Isoleucine
MeGly N-Methylglycine, sarcosine
MeIle N-Methylisoleucine
MeLys 6-N-Methyllysine
MeVal N-Methylvaline
Nva Norvaline
Nle Norleucine
Orn Ornithine

Reproduced from WIPO Standard ST.26, Annex I, Section 4

2412.03(c) “Modified Amino Acid” [R-07.2022]

[Editor Note: This section is applicable to all applications filed on or after July 1, 2022, having disclosures of nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b).]

37 CFR 1.831 Requirements for patent applications filed on or after July 1, 2022, having nucleotide and/or amino acid sequence disclosures.

  • *****

  • (g) “Modified amino acid” includes any amino acid as described in paragraph 3(e) of WIPO Standard ST.26.
  • *****

WIPO Standard ST.26, paragraph 3(e), identifies “modified amino acid” to mean any amino acid as described in the definition of “amino acid”, other than L-alanine, L-arginine, L-asparagine, L-aspartic acid, L-cysteine, L-glutamine, L-glutamic acid, L-glycine, L-histidine, L-isoleucine, L-leucine, L-lysine, L-methionine, L-phenylalanine, L-proline, L-pyrrolysine, L-serine, L-selenocysteine, L-threonine, L-tryptophan, L-tyrosine, or L-valine. See MPEP § 2412.03(b).

2412.03(d) “Nucleotide” [R-07.2022]

[Editor Note: This section is applicable to all applications filed on or after July 1, 2022, having disclosures of nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b).]

37 CFR 1.831 Requirements for patent applications filed on or after July 1, 2022, having nucleotide and/or amino acid sequence disclosures.

  • *****

  • (h) “Nucleotide” includes any nucleotide, nucleotide analog, or modified nucleotide as defined in paragraphs 3(f) and 3(g) of WIPO Standard ST.26.
  • *****

WIPO Standard ST.26, paragraphs 3(f) and (g), identify a “nucleotide” to mean any nucleotide or nucleotide analogue, including “modified nucleotides” (see MPEP § 2412.03(e)), that can be represented using any of the symbols set forth in Table 1: List of Nucleotides Symbols (see MPEP § 2412.03(a)), wherein the nucleotide or nucleotide analogue contains:

  • (i) a backbone moiety selected from:
    • (1) 2’ deoxyribose 5’ monophosphate (the backbone moiety of a deoxyribonucleotide) or ribose 5’ monophosphate (the backbone moiety of a ribonucleotide); or
    • (2) an analogue of a 2’ deoxyribose 5’ monophosphate or ribose 5’ monophosphate, which when forming the backbone of a nucleic acid analogue, results in an arrangement of nucleobases that mimics the arrangement of nucleobases in nucleic acids containing a 2’ deoxyribose 5’ monophosphate or ribose 5’ monophosphate backbone, wherein the nucleic acid analogue is capable of base pairing with a complementary nucleic acid; examples of backbone moieties include amino acids as in peptide nucleic acids, glycol molecules as in glycol nucleic acids, threofuranosyl sugar molecules as in threose nucleic acids, morpholine rings and phosphorodiamidate groups as in morpholinos, and cyclohexenyl molecules as in cyclohexenyl nucleic acids; and
  • (ii) the backbone moiety is either:
    • (1) joined to a nucleobase, including a modified or synthetic purine or pyrimidine nucleobase; or
    • (2) lacking a purine or pyrimidine nucleobase when the nucleotide is part of a nucleotide sequence, referred to as an “AP site” or an “abasic site”.

2412.03(e) “Modified Nucleotide” [R-07.2022]

[Editor Note: This section is applicable to all applications filed on or after July 1, 2022, having disclosures of nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b).]

37 CFR 1.831 Requirements for patent applications filed on or after July 1, 2022, having nucleotide and/or amino acid sequence disclosures.

  • *****

  • (i) “Modified nucleotide” includes any nucleotide as described in paragraph 3(f) of WIPO Standard ST.26.
  • *****

WIPO Standard ST.26, paragraph 3(f), identifies that a “modified nucleotide” means any “nucleotide” as explained in MPEP § 2412.03(d) other than deoxyadenosine 3’-monophosphate, deoxyguanosine 3’-monophosphate, deoxycytidine 3’-monophosphate, deoxythymidine 3’-monophosphate, adenosine 3’-monophosphate, guanosine 3’-monophosphate, cytidine 3’-monophosphate, or uridine 3’-monophosphate.

2412.04 Use of Sequence Identifiers to Denote Sequences Disclosed in the Description or Claims [R-07.2022]

[Editor Note: This section is applicable to all applications filed on or after July 1, 2022, having disclosures of nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b).]

37 CFR 1.831 Requirements for patent applications filed on or after July 1, 2022, having nucleotide and/or amino acid sequence disclosures.

  • *****

  • (c) Where the description or claims of a patent application discuss a sequence that is set forth in the “Sequence Listing XML” in accordance with paragraph (a) of this section, reference must be made to the sequence by use of the sequence identifier, preceded by “SEQ ID NO:” or the like in the text of the description or claims, even if the sequence is also embedded in the text of the description or claims of the patent application. Where a sequence is presented in a drawing, reference must be made to the sequence by use of the sequence identifier (§ 1.832(a)), either in the drawing or in the Brief Description of the Drawings, where the correlation between multiple sequences in the drawing and their sequence identifiers (§ 1.832(a)) in the Brief Description is clear.

  • *****

37 CFR 1.831(c) requires that each nucleotide and/or amino acid sequence set forth in a “Sequence Listing XML” in accordance with 37 CFR § 1.831(a) must be referenced by a sequence identifier, preceded by the notation “SEQ ID NO:” or the like, when in the text of the description or claims. Additionally, where a sequence is presented in a drawing, reference must be made using the sequence identifier from the “Sequence Listing XML” associated with the particular sequence either in the drawing or in the Brief Description of the Drawings. The sequence identifiers in the disclosure must correspond to sequence identifiers set forth in the “Sequence Listing XML” as defined in 37 CFR 1.832(a).

37 CFR 1.831(c) requires that where the description or claims of a patent application discuss a sequence that is set forth in the “Sequence Listing XML,” a reference to the sequence using the sequence identifier of that sequence is required at all occurrences, even if the text of the description or claims includes the sequence by enumeration of its residues. This requirement is also intended to permit references elsewhere in the application (e.g., specification, claims, or drawings) to sequences set forth in the “Sequence Listing XML” by the use of assigned sequence identifiers without repeating the sequence. Sequence identifiers can also be used to discuss and/or claim parts or fragments of a properly presented sequence. For example, language such as “residues 14 to 243 of SEQ ID NO:23” is permissible and the fragment need not be separately presented in the “Sequence Listing XML.” Where a nucleotide and/or amino acid sequence that meets the length thresholds of 37 CFR 1.831(b) is disclosed by enumeration of its residues anywhere in an application, it must be presented in a “Sequence Listing XML” in a manner that complies with the requirements of 37 CFR §§ 1.831 - 1.834.
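A drafting aid for the correspondence between sequence identifiers in the text and those in the “Sequence Listing XML” might look like the following Python sketch. The regular expression, the function names, and the `total_sequences` parameter are assumptions made for illustration. The pattern captures only the first number after each “SEQ ID NO” and would miss the later numbers in a phrase such as “SEQ ID NOs: 1 and 2.”

```python
import re

# Matches "SEQ ID NO:23", "SEQ ID NO: 4", "SEQ ID NO 7" (illustrative pattern).
SEQ_REF = re.compile(r"SEQ\s+ID\s+NOs?[:.]?\s*(\d+)", re.IGNORECASE)


def referenced_ids(text: str) -> set:
    """Collect the sequence identifier numbers cited in a passage of text."""
    return {int(number) for number in SEQ_REF.findall(text)}


def unmatched_references(text: str, total_sequences: int) -> list:
    """List cited identifiers that fall outside 1..total_sequences."""
    return sorted(
        seq_id
        for seq_id in referenced_ids(text)
        if not 1 <= seq_id <= total_sequences
    )
```

With a listing of ten sequences, `unmatched_references("residues 14 to 243 of SEQ ID NO:23 and SEQ ID NO: 4", 10)` would flag identifier 23 as having no counterpart in the listing.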

The rules do not alter, in any way, the requirements of 35 U.S.C. 112. The implementation of the rules has had no effect on disclosure and/or claiming requirements. The rules, in general, or the use of sequence identifiers throughout the specification and claims, specifically, should not raise any issues under 35 U.S.C. 112(a) or 35 U.S.C. 112(b). The use of sequence identifiers (“SEQ ID NO:” or the like) only provides a shorthand way for applicants to discuss and claim their inventions. These identifiers do not in any way restrict the manner in which an invention can be claimed.

2412.05 Representation and Symbols for Nucleotide and/or Amino Acid Sequences [R-07.2022]

[Editor Note: This section is applicable to all applications filed on or after July 1, 2022, having disclosures of nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b).]

WIPO Standard ST.26 sets forth specific symbols for representing nucleotide and/or amino acid residues in a sequence. The USPTO rules incorporate those specific symbols and the associated formatting requirements.

2412.05(a) Use of Sequentially Numbered Sequence Identifiers in the “Sequence Listing XML” [R-07.2022]

[Editor Note: This section is applicable to all applications filed on or after July 1, 2022, having disclosures of nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b). Formatting representations of XML (eXtensible Markup Language) elements in this section appear different than shown in Standard ST.26, which may be accessed at: www.wipo.int/export/sites/www/standards/en/pdf/03-26-01.pdf.]

37 CFR 1.832 Representation of nucleotide and/or amino acid sequence data in the “Sequence Listing XML” part of a patent application filed on or after July 1, 2022.

  • (a) Each disclosed nucleotide or amino acid sequence that meets the requirements of § 1.831(b) must appear separately in the “Sequence Listing XML.” Each sequence set forth in the “Sequence Listing XML” must be assigned a separate sequence identifier. The sequence identifiers must begin with 1 and increase sequentially by integers as defined in paragraph 10 of WIPO Standard ST.26 (incorporated by reference, see § 1.839).
  • *****

In accordance with 37 CFR 1.832(a), the sequence identifiers in the “Sequence Listing XML” must begin with 1 and increase sequentially by integers. The requirement for sequence identifiers, at a minimum, requires that each sequence be assigned a different number for purposes of identification. However, where practical and for ease of reference, sequences should be presented in the “Sequence Listing XML” in numerical order and in the order in which they are discussed in the application.
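The numbering requirement of 37 CFR 1.832(a) can be checked mechanically. The following Python sketch assumes the identifiers have already been read from a listing into a list of integers in document order. The function name and message format are hypothetical.

```python
def identifier_problems(ids):
    """Return a message for each identifier that breaks the 1, 2, 3, ... pattern."""
    problems = []
    # The identifier at the k-th position (counting from 1) should equal k.
    for expected, actual in enumerate(ids, start=1):
        if actual != expected:
            problems.append(
                f"position {expected}: expected {expected}, found {actual}"
            )
    return problems
```

An empty result means the identifiers begin with 1 and increase sequentially by integers. A listing numbered 1, 3, 4 would produce two messages, one for each identifier after the gap.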

WIPO Standard ST.26, paragraph 10, requires each “sequence” be assigned a separate sequence identifier, including a sequence which is identical to a region of a longer sequence. Such a “sequence” is one that is disclosed anywhere in an application by enumeration of its residues and can be represented as:

  • (a) an unbranched sequence or a linear region of a branched sequence containing ten or more specifically defined nucleotides, wherein adjacent nucleotides are joined by:
    • (i) a 3’ to 5’ (or 5’ to 3’) phosphodiester linkage; or
    • (ii) any chemical bond that results in an arrangement of adjacent nucleobases that mimics the arrangement of nucleobases in naturally occurring nucleic acids; or
  • (b) an unbranched sequence or a linear region of a branched sequence containing four or more specifically defined amino acids, wherein the amino acids form a single peptide backbone, i.e. adjacent amino acids are joined by peptide bonds. (WIPO Standard ST.26, paragraph 7).

Where no sequence is present for a sequence identifier, i.e. an intentionally skipped sequence, “000” must be used in place of a sequence. The total number of sequences must be indicated in the “Sequence Listing XML” and must equal the total number of sequence identifiers, whether followed by a sequence or by “000”.

For purposes of intentionally skipped sequences, such sequences must be included in the “Sequence Listing XML” and represented as follows:

  • (a) the element SequenceData and its attribute sequenceIDNumber, with the sequence identifier of the skipped sequence provided as the value;
  • (b) the elements INSDSeq_length, INSDSeq_moltype, INSDSeq_division, present but with no value provided;
  • (c) the element INSDSeq_feature-table must not be included; and
  • (d) the element INSDSeq _sequence with the string “000” as the value. (WIPO Standard ST.26, paragraph 58)
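The four conditions above can be sketched with Python's standard `xml.etree.ElementTree` module. The function name is hypothetical. Nesting the INSDSeq_* elements inside an `INSDSeq` wrapper is an assumption based on the ST.26 document type definition, since this section does not state it. The sketch builds only the fragment for one skipped sequence, not a complete or validated “Sequence Listing XML.”

```python
import xml.etree.ElementTree as ET


def skipped_sequence(seq_id: int) -> ET.Element:
    """Build a SequenceData element for an intentionally skipped sequence."""
    # (a) SequenceData carries the identifier of the skipped sequence.
    data = ET.Element("SequenceData", {"sequenceIDNumber": str(seq_id)})
    # Wrapper element: assumed from the ST.26 DTD, not stated in this section.
    insdseq = ET.SubElement(data, "INSDSeq")
    # (b) These elements are present but carry no value.
    for tag in ("INSDSeq_length", "INSDSeq_moltype", "INSDSeq_division"):
        ET.SubElement(insdseq, tag)
    # (c) INSDSeq_feature-table is deliberately not created.
    # (d) The sequence element holds the string "000".
    ET.SubElement(insdseq, "INSDSeq_sequence").text = "000"
    return data
```

Calling `ET.tostring(skipped_sequence(3), encoding="unicode")` serializes the fragment for inspection. Conformance of a full listing would still need to be confirmed against the ST.26 DTD.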

2412.05(b) Representation and Symbols of Nucleotide Sequence Data [R-07.2022]

[Editor Note: This section is applicable to all applications filed on or after July 1, 2022, having disclosures of nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b). Formatting representations of XML (eXtensible Markup Language) elements in this section appear different than shown in Standard ST.26, which may be accessed at: www.wipo.int/export/sites/www/standards/en/pdf/03-26-01.pdf.]

37 CFR 1.832 Representation of nucleotide and/or amino acid sequence data in the “Sequence Listing XML” part of a patent application filed on or after July 1, 2022.

  • *****

  • (b) The representation and symbols for nucleotide sequence data shall conform to the requirements of paragraphs (b)(1) through (4) of this section.
    • (1) A nucleotide sequence must be represented in the manner described in paragraphs 11–12 of WIPO Standard ST.26.
    • (2) All nucleotides, including nucleotide analogs, modified nucleotides, and “unknown” nucleotides, within a nucleotide sequence must be represented using the symbols set forth in paragraphs 13–16, 19, and 21 of WIPO Standard ST.26.
    • (3) Modified nucleotides within a nucleotide sequence must be described in the manner discussed in paragraphs 17, 18, and 19 of WIPO Standard ST.26.
    • (4) A region containing a known number of contiguous “a,” “c,” “g,” “t,” or “n” residues for which the same description applies may be jointly described in the manner described in paragraph 22 of WIPO Standard ST.26.
  • *****

I. REPRESENTATION OF NUCLEOTIDE SEQUENCE

WIPO Standard ST.26, paragraph 11, provides that a nucleotide sequence must be represented only by a single strand, in the 5’ to 3’ direction from left to right, or in the direction from left to right that mimics the 5’ to 3’ direction. The designations 5’ and 3’ or any other similar designations must not be included in the sequence. A double-stranded nucleotide sequence disclosed by enumeration of its residues of both strands must be represented as:

  • (a) a single sequence or as two separate sequences, each assigned its own sequence identifier, where the two separate strands are fully complementary to each other, or
  • (b) two separate sequences, each assigned its own sequence identifier, where the two strands are not fully complementary to each other.
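Whether option (a) or option (b) applies turns on whether the two strands are fully complementary. A minimal Python sketch of that test follows, assuming both strands are written 5’ to 3’ and contain only the unambiguous symbols “a”, “c”, “g”, and “t”. The function names are hypothetical, and ambiguity symbols and modified nucleotides are outside the sketch.

```python
# Watson-Crick pairs for the four unambiguous symbols (sketch assumption).
COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, also written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def fully_complementary(strand_a: str, strand_b: str) -> bool:
    """True when strand_b pairs with every residue of strand_a and vice versa."""
    return strand_b == reverse_complement(strand_a)
```

If `fully_complementary` returns true, the applicant may present one sequence or two. If it returns false, for example because of an overhang or a mismatch, each strand needs its own sequence identifier.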

WIPO Standard ST.26, paragraph 12, provides that the first nucleotide presented in the sequence is residue position number 1. When nucleotide sequences are circular in configuration, applicant must choose the nucleotide in residue position number 1. Numbering is continuous throughout the entire sequence in the 5’ to 3’ direction, or in the direction that mimics the 5’ to 3’ direction. The last residue position number must equal the number of nucleotides in the sequence.

II. SYMBOLS FOR A NUCLEOTIDE SEQUENCE

WIPO Standard ST.26, paragraph 13, provides that all nucleotides in a sequence must be represented using the symbols set forth in Table 1: List of Nucleotides Symbols (see MPEP § 2412.03(a)). Only lower-case letters must be used. Any symbol used to represent a nucleotide is the equivalent of only one residue.
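A symbol check along the lines of paragraph 13 can be sketched in Python as follows. The constant and function names are hypothetical. The permitted set is taken from Table 1, and positions are reported starting at 1 to match the residue numbering convention of paragraph 12.

```python
# The fifteen lower-case symbols of Table 1: List of Nucleotides Symbols.
NUCLEOTIDE_SYMBOLS = set("acgtmrwsykvhdbn")


def invalid_positions(seq: str) -> list:
    """Return (position, character) pairs for characters not in Table 1."""
    return [
        (position, char)
        for position, char in enumerate(seq, start=1)
        if char not in NUCLEOTIDE_SYMBOLS
    ]
```

Upper-case letters and “u” are both reported, since Table 1 uses “t” for uracil in RNA and permits only lower-case symbols.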

WIPO Standard ST.26, paragraph 14, sets forth that the symbol “t” will be construed as thymine in deoxyribonucleic acid (DNA) and uracil in ribonucleic acid (RNA). Uracil in DNA or thymine in RNA is considered a modified nucleotide and must be further described in a feature table. See MPEP § 2413.01(g), subsection I for more detail regarding a “feature table.”

WIPO Standard ST.26, paragraph 15, provides that where an ambiguity symbol (representing two or more alternative nucleotides) is appropriate, the most restrictive symbol should be used, as listed in Table 1: List of Nucleotides Symbols (see MPEP § 2412.03(a)). For example, if a nucleotide in a given position could be “a” or “g”, then “r” should be used, rather than “n”. The symbol “n” will be construed as any one of “a”, “c”, “g”, or “t/u” except where it is used with a further description in a feature table. The symbol “n” must not be used to represent anything other than a nucleotide. A single modified or “unknown” nucleotide may be represented by the symbol “n”, together with a further description in a feature table. See MPEP § 2413.01(g), subsection I, for more detail regarding a “feature table.” For representation of sequence variants, i.e., alternatives, deletions, insertions or substitutions, see MPEP § 2412.05(c); and also MPEP § 2413.01(g), subsection XII for information on variants.
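Because Table 1 assigns exactly one symbol to each non-empty combination of “a”, “c”, “g”, and “t/u”, choosing the most restrictive symbol reduces to an exact lookup. The Python sketch below illustrates this. The names are hypothetical, and treating an input of “u” as “t” is a convenience assumed for the example.

```python
# One Table 1 symbol per non-empty subset of {a, c, g, t}.
AMBIGUITY = {
    frozenset("a"): "a", frozenset("c"): "c",
    frozenset("g"): "g", frozenset("t"): "t",
    frozenset("ac"): "m", frozenset("ag"): "r", frozenset("at"): "w",
    frozenset("cg"): "s", frozenset("ct"): "y", frozenset("gt"): "k",
    frozenset("acg"): "v", frozenset("act"): "h",
    frozenset("agt"): "d", frozenset("cgt"): "b",
    frozenset("acgt"): "n",
}


def most_restrictive_symbol(alternatives) -> str:
    """Map a collection of alternative bases to its single Table 1 symbol."""
    # Treat 'u' as 't', since Table 1 uses 't' for both (sketch convenience).
    bases = frozenset("t" if base == "u" else base for base in alternatives)
    if bases not in AMBIGUITY:
        raise ValueError(f"no Table 1 symbol for {sorted(bases)}")
    return AMBIGUITY[bases]
```

For the example in the text, `most_restrictive_symbol("ag")` gives “r” rather than “n”. The symbol “n” results only when all four bases are possible.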

WIPO Standard ST.26, paragraph 16, sets forth that modified nucleotides should be represented in the sequence as the corresponding unmodified nucleotides, i.e., “a”, “c”, “g” or “t” whenever possible. Any modified nucleotide in a sequence that cannot otherwise be represented by any other symbol in Table 1: List of Nucleotides Symbols (see MPEP § 2412.03(a)), i.e., an “other” nucleotide, such as a non-naturally occurring nucleotide, must be represented by the symbol “n”. The symbol “n” is the equivalent of only one residue.

WIPO Standard ST.26, paragraph 19, specifies that uracil in DNA or thymine in RNA are considered modified nucleotides and must be represented in the sequence as “t” and be further described in a feature table using the feature key “modified_base”, the qualifier “mod_base” with “OTHER” as the qualifier value and the qualifier “note” with “uracil” or “thymine”, respectively, as the qualifier value. See MPEP § 2413.01(g), subsection I for more detail regarding a “feature table.”
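The annotation described in paragraph 19 can be sketched as a feature table entry built with Python's standard `xml.etree.ElementTree` module. The feature key and qualifier values come from the text above. The function name is hypothetical, and the element names other than INSDFeature (INSDFeature_key, INSDFeature_location, INSDFeature_quals, INSDQualifier, INSDQualifier_name, INSDQualifier_value) are assumptions drawn from the ST.26 document type definition rather than from this section.

```python
import xml.etree.ElementTree as ET


def modified_t_feature(position: int, in_dna: bool = True) -> ET.Element:
    """Describe a 't' that is uracil in DNA (or thymine in RNA) at a position."""
    feature = ET.Element("INSDFeature")
    ET.SubElement(feature, "INSDFeature_key").text = "modified_base"
    # Location given as a single residue position (assumed location format).
    ET.SubElement(feature, "INSDFeature_location").text = str(position)
    quals = ET.SubElement(feature, "INSDFeature_quals")
    note_value = "uracil" if in_dna else "thymine"
    for name, value in (("mod_base", "OTHER"), ("note", note_value)):
        qualifier = ET.SubElement(quals, "INSDQualifier")
        ET.SubElement(qualifier, "INSDQualifier_name").text = name
        ET.SubElement(qualifier, "INSDQualifier_value").text = value
    return feature
```

The residue itself still appears as “t” in the sequence. Only the feature table entry records that it is uracil in a DNA molecule or thymine in an RNA molecule.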

WIPO Standard ST.26, paragraph 21, provides that any “unknown” nucleotide must be represented by the symbol “n” in the sequence. An “unknown” nucleotide should be further described in a feature table using the feature key “unsure”. The symbol “n” is the equivalent of only one residue. See MPEP § 2413.01(g), subsection I, for more detail regarding a “feature table.”

III. DESCRIPTION OF MODIFIED NUCLEOTIDES WITHIN A NUCLEOTIDE SEQUENCE

WIPO Standard ST.26, paragraph 17, specifies that a modified nucleotide must be further described in a feature table (see MPEP § 2413.01(g), subsection I, for more detail regarding a “feature table”) using the feature key “modified_base” and the mandatory qualifier “mod_base” in conjunction with a single abbreviation from Table 2: List of Modified Nucleotides in subsection IV, below, as the qualifier value. See MPEP § 2413.01(g) subsections II and III, for more information regarding use of a feature key; and MPEP § 2413.01(g) subsections V and VI, for more information regarding use of a qualifier. If the abbreviation is “OTHER”, the complete unabbreviated name of the modified nucleotide must be provided as the value in a “note” qualifier. For a listing of alternative modified nucleotides, the qualifier value “OTHER” may be used in conjunction with a further “note” qualifier. The abbreviations (or full names) provided in Table 2 must not be used in the sequence itself.

WIPO Standard ST.26, paragraph 18, describes that a nucleotide sequence including one or more regions of consecutive modified nucleotides that share the same backbone moiety must be further described in a feature table as required for a modified nucleotide. See MPEP § 2413.01(g), subsection I, for information regarding a feature table and MPEP § 2412.03(e) regarding modified nucleotides. The modified nucleotides of each such region may be jointly described in a single INSDFeature element as provided in accordance with 37 CFR 1.832(b)(4). See MPEP § 2413.01(g), subsection I, for information regarding INSDFeature elements of a feature table. The most restrictive unabbreviated chemical name that encompasses all of the modified nucleotides in the range or a list of the chemical names of all the nucleotides in the range must be provided as the value in the “note” qualifier. For example, a glycol nucleic acid sequence containing “a”, “c”, “g”, or “t” nucleobases may be described in the “note” qualifier as “2,3-dihydroxypropyl nucleosides.” Alternatively, the same sequence may be described in the “note” qualifier as “2,3-dihydroxypropyladenine, 2,3-dihydroxypropylthymine, 2,3-dihydroxypropylguanine, or 2,3-dihydroxypropylcytosine.” Where an individual modified nucleotide in the region includes an additional modification, then the modified nucleotide must also be further described in a feature table as required for a modified nucleotide. See MPEP § 2413.01(g), subsection I, for more detail regarding a “feature table”.

WIPO Standard ST.26, paragraph 19, provides that uracil in DNA or thymine in RNA are considered modified nucleotides and must be represented in the sequence as “t” and be further described in a feature table using the feature key “modified_base”, the qualifier “mod_base” with “OTHER” as the qualifier value and the qualifier “note” with “uracil” or “thymine”, respectively, as the qualifier value.
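
As an illustration of the feature table entry described above, the following Python sketch builds one feature for a modified nucleotide using the standard library. It is a minimal sketch, not an official tool. The function name `modified_base_feature` is hypothetical, and the element names other than INSDFeature and INSDFeature_location are assumed from the ST.26 document type definition and should be checked against MPEP § 2413.01(g).

```python
import xml.etree.ElementTree as ET


def modified_base_feature(location, mod_base, note=None):
    """Build an INSDFeature element describing one modified nucleotide."""
    # The abbreviation "OTHER" must be accompanied by the full name in "note".
    if mod_base == "OTHER" and not note:
        raise ValueError('mod_base "OTHER" needs the full name in a "note" qualifier')
    feature = ET.Element("INSDFeature")
    ET.SubElement(feature, "INSDFeature_key").text = "modified_base"
    ET.SubElement(feature, "INSDFeature_location").text = location
    quals = ET.SubElement(feature, "INSDFeature_quals")
    pairs = [("mod_base", mod_base)]
    if note:
        pairs.append(("note", note))
    for name, value in pairs:
        qual = ET.SubElement(quals, "INSDQualifier")
        ET.SubElement(qual, "INSDQualifier_name").text = name
        ET.SubElement(qual, "INSDQualifier_value").text = value
    return feature


# Uracil at position 7 of a DNA sequence (shown as "t" in the sequence itself).
feature = modified_base_feature("7", "OTHER", note="uracil")
print(ET.tostring(feature, encoding="unicode"))
```

The sketch does not check that `mod_base` is one of the abbreviations in Table 2; a fuller version would validate against that list.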

IV. JOINTLY DESCRIBING A REGION OF A NUCLEOTIDE SEQUENCE

WIPO Standard ST.26, paragraph 22, specifies that a region containing a known number of contiguous “a”, “c”, “g”, “t”, or “n” residues for which the same description applies may be jointly described using a single INSDFeature element with the syntax “x..y” as the location descriptor in the element INSDFeature_location. See MPEP § 2413.01(g) subsection I, for description of INSDFeature elements in a Feature Table. For representation of sequence variants, i.e., alternatives, deletions, insertions or substitutions, see MPEP § 2412.05(c) and MPEP § 2413.01(g), subsection XII, for information on variants.
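
The “x..y” location syntax can be sketched as follows. This Python example is illustrative only: the function name `joint_locations` is hypothetical, positions are assumed to be 1-based, and a run of a single residue is assumed to be written as a bare position. Whether the same description actually applies to every residue in a run is a judgment the code cannot make.

```python
import re


def joint_locations(sequence, symbol="n"):
    """Return a location descriptor for each run of `symbol` in `sequence`.

    Positions are 1-based; a run of one residue is given as "x",
    a longer run as "x..y".
    """
    locations = []
    for match in re.finditer(re.escape(symbol) + "+", sequence):
        start, end = match.start() + 1, match.end()  # 1-based, inclusive
        locations.append(str(start) if start == end else f"{start}..{end}")
    return locations


print(joint_locations("acgtnnnnnacgtnac"))  # ['5..9', '14']
```

The same helper can be applied to contiguous “X” residues in an amino acid sequence by passing `symbol="X"`.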

Table 2: List of Modified Nucleotides
Abbreviation | Definition
ac4c | 4-acetylcytidine
chm5u | 5-(carboxyhydroxymethyl)uridine
cm | 2'-O-methylcytidine
cmnm5s2u | 5-carboxymethylaminomethyl-2-thiouridine
cmnm5u | 5-carboxymethylaminomethyluridine
dhu | dihydrouridine
fm | 2'-O-methylpseudouridine
gal q | beta, D-galactosylqueuosine
gm | 2'-O-methylguanosine
i | inosine
i6a | N6-isopentenyladenosine
m1a | 1-methyladenosine
m1f | 1-methylpseudouridine
m1g | 1-methylguanosine
m1i | 1-methylinosine
m22g | 2,2-dimethylguanosine
m2a | 2-methyladenosine
m2g | 2-methylguanosine
m3c | 3-methylcytidine
m4c | N4-methylcytosine
m5c | 5-methylcytidine
m6a | N6-methyladenosine
m7g | 7-methylguanosine
mam5u | 5-methylaminomethyluridine
mam5s2u | 5-methoxyaminomethyl-2-thiouridine
man q | beta, D-mannosylqueuosine
mcm5s2u | 5-methoxycarbonylmethyl-2-thiouridine
mcm5u | 5-methoxycarbonylmethyluridine
mo5u | 5-methoxyuridine
ms2i6a | 2-methylthio-N6-isopentenyladenosine
ms2t6a | N-((9-beta-D-ribofuranosyl-2-methylthiopurine-6-yl)carbamoyl)threonine
mt6a | N-((9-beta-D-ribofuranosylpurine-6-yl)N-methylcarbamoyl)threonine
mv | uridine-5-oxyacetic acid-methylester
o5u | uridine-5-oxyacetic acid
osyw | wybutoxosine
p | pseudouridine
q | queuosine
s2c | 2-thiocytidine
s2t | 5-methyl-2-thiouridine
s2u | 2-thiouridine
s4u | 4-thiouridine
m5u | 5-methyluridine
t6a | N-((9-beta-D-ribofuranosylpurine-6-yl)-carbamoyl)threonine
tm | 2'-O-methyl-5-methyluridine
um | 2'-O-methyluridine
yw | wybutosine
x | 3-(3-amino-3-carboxy-propyl)uridine, (acp3)u
OTHER | (requires note qualifier)

(Reproduced from WIPO Standard ST. 26, Annex I, Section 2)

2412.05(c) Representation and Inclusion of Variants [R-07.2022]

[Editor Note: This section is applicable to all applications filed on or after July 1, 2022, having disclosures of nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b). Formatting representations of XML (eXtensible Markup Language) elements in this section appear different than shown in Standard ST.26, which may be accessed at: www.wipo.int/export/sites/www/standards/en/pdf/03-26-01.pdf.]

A primary sequence and any variant of that sequence, each disclosed by enumeration of its residues and specifically defined to meet the definition in 37 CFR 1.831(a) and 1.831(b), must each be included in the “Sequence Listing XML” and assigned its own sequence identifier. Any variant sequence, disclosed as a single sequence with enumerated alternative residues at one or more positions, must be included in the “Sequence Listing XML” and should be represented by a single sequence, wherein the enumerated alternative residues are represented by the most restrictive ambiguity symbol. Any variant sequence, disclosed only by reference to deletion(s), insertion(s), or substitution(s) in a primary sequence, should be included in the “Sequence Listing XML”. The table below indicates the proper use of feature keys and qualifiers for nucleic acid and amino acid sequence variants:

List of Feature Keys and Qualifiers
Type of sequence | Feature Key | Qualifier | Use
Nucleic acid | variation | replace or note | Naturally occurring mutations and polymorphisms, e.g., alleles, RFLPs.
Nucleic acid | misc_difference | replace or note | Variability introduced artificially, e.g., by genetic manipulation or by chemical synthesis.
Amino acid | VAR_SEQ | note | Variant produced by alternative splicing, alternative promoter usage, alternative initiation and ribosomal frameshifting.
Amino acid | VARIANT | note | Any type of variant for which VAR_SEQ is not applicable.

(Reproduced from paragraph 96 of WIPO Standard ST.26).

For additional information about the representation of sequence variants in a “Sequence Listing XML,” see MPEP § 2413.01(g), subsection XII.

2412.05(d) Representation and Symbols of Amino Acid Sequence Data [R-07.2022]

[Editor Note: This section is applicable to all applications filed on or after July 1, 2022, having disclosures of nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b). Formatting representations of XML (eXtensible Markup Language) elements in this section appear different than shown in Standard ST.26, which may be accessed at: www.wipo.int/export/sites/www/standards/en/pdf/03-26-01.pdf.]

37 CFR 1.832 Representation of nucleotide and/or amino acid sequence data in the “Sequence Listing XML” part of a patent application filed on or after July 1, 2022.

  • *****

  • (c) The representation and symbols for amino acid sequence data shall conform to the requirements of paragraphs (c)(1) through (4) of this section.
    • (1) The amino acids in an amino acid sequence must be represented in the manner described in paragraphs 24 and 25 of WIPO Standard ST.26.
    • (2) All amino acids, including modified amino acids and “unknown” amino acids, within an amino acid sequence must be represented using the symbols set forth in paragraphs 26–29 and 32 of WIPO Standard ST.26.
    • (3) Modified amino acids within an amino acid sequence must be described in the manner discussed in paragraphs 29 and 30 of WIPO Standard ST.26.
    • (4) A region containing a known number of contiguous “X” residues for which the same description applies may be jointly described in the manner described in paragraph 34 of WIPO Standard ST.26.
  • *****

I. REPRESENTATION OF AN AMINO ACID SEQUENCE

WIPO Standard ST.26, paragraph 24, specifies that the amino acids in an amino acid sequence must be represented in the amino to carboxy direction from left to right. The amino and carboxy groups must not be represented in the sequence.

WIPO Standard ST.26, paragraph 25, indicates that the first amino acid in the sequence is residue position number 1, including amino acids preceding the mature protein, for example, pre-sequences, pro-sequences, pre-pro-sequences and signal sequences. When an amino acid sequence is circular in configuration and the ring consists solely of amino acid residues linked by peptide bonds, i.e., the sequence has no amino and carboxy termini, applicant must choose the amino acid in residue position number 1. Numbering is continuous through the entire sequence in the amino to carboxy direction.

II. SYMBOLS FOR AN AMINO ACID SEQUENCE

WIPO Standard ST.26, paragraph 26, specifies that all amino acids in a sequence must be represented using the symbols set forth in Table 3, above. Only uppercase letters must be used. Any symbol used to represent an amino acid is the equivalent of only one residue.

WIPO Standard ST.26, paragraph 27, indicates that where an ambiguity symbol (representing two or more amino acids in the alternative) is appropriate, the most restrictive symbol should be used, as listed in Table 3: List of Amino Acids Symbols (MPEP § 2412.03(a)). For example, if an amino acid in a given position could be aspartic acid or asparagine, the symbol “B” should be used, rather than “X”. The symbol “X” will be construed as any one of “A”, “R”, “N”, “D”, “C”, “Q”, “E”, “G”, “H”, “I”, “L”, “K”, “M”, “F”, “P”, “O”, “S”, “U”, “T”, “W”, “Y”, or “V”, except where it is used with a further description in the feature table. The symbol “X” must not be used to represent anything other than an amino acid. A single modified or “unknown” amino acid may be represented by the symbol “X”, together with a further description in a feature table (see MPEP § 2413.01(g), subsection I, for more detail regarding a “feature table”). For representation and inclusion of sequence variants, see MPEP § 2412.05(c). For details of how to represent variants in a “Sequence Listing XML,” see MPEP § 2413.01(g), subsection XII.
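
The choice of the most restrictive amino acid symbol can be sketched in Python as follows. This is an illustration under stated assumptions, not an official tool. The function name `most_restrictive_amino_acid_symbol` is hypothetical, and the three ambiguity symbols (“B”, “Z”, “J”) are assumed to match Table 3: List of Amino Acids Symbols (see MPEP § 2412.03(a)).

```python
# Ambiguity symbols (assumed from Table 3), keyed by the alternatives they cover.
_AMBIGUITY_AA = {
    frozenset("DN"): "B",  # aspartic acid or asparagine
    frozenset("EQ"): "Z",  # glutamic acid or glutamine
    frozenset("IL"): "J",  # isoleucine or leucine
}
# The 22 specific amino acid symbols that "X" is construed to cover.
_SPECIFIC = frozenset("ARNDCQEGHILKMFPOSUTWYV")


def most_restrictive_amino_acid_symbol(alternatives):
    """Return the narrowest symbol covering every alternative amino acid."""
    residues = frozenset(alternatives)
    if not residues or not residues <= _SPECIFIC:
        raise ValueError(f"unsupported alternatives: {sorted(alternatives)}")
    if len(residues) == 1:
        return next(iter(residues))
    # Fall back to "X" when no narrower ambiguity symbol exists.
    return _AMBIGUITY_AA.get(residues, "X")


print(most_restrictive_amino_acid_symbol("DN"))  # B
```

When the fallback “X” stands for fewer alternatives than the full set, the position would also need a further description in the feature table; the sketch does not generate that description.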

WIPO Standard ST.26, paragraph 28, specifies that disclosed amino acid sequences separated by internal terminator symbols, represented for example by “Ter” or asterisk “*” or period “.” or a blank space, must be included as separate sequences for each amino acid sequence that contains at least four specifically defined amino acids and is encompassed by the description of sequences found in MPEP § 2412.05(a), referencing paragraph 7 of WIPO Standard ST.26. Each such separate sequence must be assigned its own sequence identifier. Terminator symbols and spaces must not be included in a “Sequence Listing XML”. This means that the element INSDSeq_sequence must disclose the sequence using only the symbols appropriate for the sequence, as set forth in Table 1: List of Nucleotides Symbols and Table 3: List of Amino Acids Symbols (reproduced in MPEP § 2412.03(a), above). The sequence must not include numbers, punctuation or whitespace characters (WIPO Standard ST.26, paragraph 57).
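
The splitting rule can be sketched in Python as follows. This is a minimal illustration, not an official tool. The function name `split_at_terminators` is hypothetical, and “specifically defined” is assumed to mean any residue other than “X”.

```python
import re

# Terminator markers named in the text: "Ter", "*", ".", or blank space.
_TERMINATOR = re.compile(r"Ter|[*.\s]+")


def split_at_terminators(disclosed, min_defined=4):
    """Split a disclosed amino acid string at internal terminators.

    Keeps only segments with at least `min_defined` specifically defined
    residues, assumed here to be residues other than "X".
    """
    segments = [s for s in _TERMINATOR.split(disclosed) if s]
    return [s for s in segments if sum(r != "X" for r in s) >= min_defined]


print(split_at_terminators("MKTAYIAK*GSXXQTerLVPRGSHM"))
# ['MKTAYIAK', 'LVPRGSHM']
```

Each returned segment would be listed as a separate sequence with its own sequence identifier. The middle segment “GSXXQ” is dropped because it has only three specifically defined residues.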

WIPO Standard ST.26, paragraph 29, specifies that modified amino acids, including D-amino acids, should be represented in the sequence as the corresponding unmodified amino acids whenever possible. Any modified amino acid in a sequence that cannot otherwise be represented by any other symbol in Table 3: List of Amino Acids Symbols (reproduced in MPEP § 2412.03(a)), i.e., an “other” amino acid, must be represented by “X”. The symbol “X” is the equivalent of only one residue.

Any “unknown” amino acid must be represented by the symbol “X” in the sequence. An “unknown” amino acid designated as “X” must be further described in a feature table (see MPEP § 2413.01(g), subsection I, for more detail regarding a “feature table”) using the feature key “UNSURE” and optionally the qualifier “note.” The symbol “X” is the equivalent of only one residue (WIPO Standard ST.26, paragraph 32).

III. DESCRIPTION OF MODIFIED AMINO ACIDS WITHIN AN AMINO ACID SEQUENCE

WIPO Standard ST.26, paragraph 29, specifies that modified amino acids, including D-amino acids, should be represented in the sequence as the corresponding unmodified amino acids whenever possible. Any modified amino acid in a sequence that cannot otherwise be represented by any other symbol in Table 3: List of Amino Acids Symbols (reproduced in MPEP § 2412.03(a)), i.e., an “other” amino acid, must be represented by “X”. The symbol “X” is the equivalent of only one residue.

WIPO Standard ST.26, paragraph 30, provides that a modified amino acid must be further described in a feature table (see MPEP § 2413.01(g), subsection I, for more detail regarding a “feature table”). Where applicable, the feature keys “CARBOHYD” or “LIPID” should be used together with the qualifier “note”. The feature key “MOD_RES” should be used for other post-translationally modified amino acids together with the qualifier “note”; otherwise the feature key “SITE” together with the qualifier “note” should be used. The value for the qualifier “note” must either be an abbreviation set forth in Table 4: List of Modified Amino Acids (reproduced in MPEP § 2412.03(b)), above, or the complete, unabbreviated name of the modified amino acid. The abbreviations set forth in Table 4, or the complete, unabbreviated names must not be used in the sequence itself.

IV. JOINTLY DESCRIBING A REGION OF AN AMINO ACID SEQUENCE

WIPO Standard ST.26, paragraph 34, provides that a region containing a known number of contiguous “X” residues for which the same description applies may be jointly described using the syntax “x..y” as the location descriptor in the element INSDFeature_location (see MPEP § 2413.01(g) subsection IV, for information regarding INSDFeature_location). For representation and inclusion of sequence variants, see MPEP § 2412.05(c). For details of how to represent variants in a “Sequence Listing XML,” see MPEP § 2413.01(g), subsection XII.

2412.05(e) Presentation of Special Situations [R-07.2022]

[Editor Note: This section is applicable to all applications filed on or after July 1, 2022, having disclosures of nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b).]

37 CFR 1.832 Representation of nucleotide and/or amino acid sequence data in the “Sequence Listing XML” part of a patent application filed on or after July 1, 2022.

  • *****

  • (d) A nucleotide and/or amino acid sequence that is constructed as a single continuous sequence derived from one or more non-contiguous segments of a larger sequence or of segments from different sequences must be listed in the “Sequence Listing XML” in the manner described in paragraph 35 of WIPO Standard ST.26.
  • (e) A nucleotide and/or amino acid sequence that contains regions of specifically defined residues separated by one or more regions of contiguous “n” or “X” residues, wherein the exact number of “n” or “X” residues in each region is disclosed, must be listed in the “Sequence Listing XML” in the manner described in paragraph 36 of WIPO Standard ST.26.
  • (f) A nucleotide and/or amino acid sequence that contains regions of specifically defined residues separated by one or more gaps of an unknown or undisclosed number of residues must be listed in the “Sequence Listing XML” in the manner described in paragraph 37 of WIPO Standard ST.26.

WIPO Standard ST.26, paragraph 35, describes that a sequence disclosed by enumeration of its residues that is constructed as a single continuous sequence from one or more non-contiguous segments of a larger sequence or of segments from different sequences must be included in the “Sequence Listing XML” and assigned its own sequence identifier.

WIPO Standard ST.26, paragraph 36, describes that a sequence that contains regions of specifically defined residues separated by one or more regions of contiguous “n” or “X” residues, wherein the exact number of “n” or “X” residues in each region is disclosed, must be included in the “Sequence Listing XML” as one sequence and assigned its own sequence identifier.

WIPO Standard ST.26, paragraph 37, describes that a sequence that contains regions of specifically defined residues separated by one or more gaps of an unknown or undisclosed number of residues must not be represented in the “Sequence Listing XML” as a single sequence. Each region of specifically defined residues (as encompassed by the definitions in 37 CFR 1.831(b)) must be included in the “Sequence Listing XML” as a separate sequence and assigned its own sequence identifier.
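
The difference between paragraphs 36 and 37 can be sketched in Python as follows. This is an illustration only: the function name `sequences_for_listing` and its input convention (a gap length, or `None` for an unknown or undisclosed gap) are hypothetical.

```python
def sequences_for_listing(regions, gaps, filler="n"):
    """Combine defined regions according to the gaps between them.

    `regions` holds the specifically defined regions in order; `gaps[i]` is
    the number of residues between regions[i] and regions[i + 1], or None
    when that number is unknown or undisclosed.
    """
    if len(gaps) != len(regions) - 1:
        raise ValueError("need exactly one gap between each pair of regions")
    sequences = [regions[0]]
    for gap, region in zip(gaps, regions[1:]):
        if gap is None:
            # Unknown gap: the next region starts a new, separate sequence.
            sequences.append(region)
        else:
            # Known gap: stay in the same sequence, bridging with filler.
            sequences[-1] += filler * gap + region
    return sequences


print(sequences_for_listing(["acgtacgtac", "ggccttaagg", "ttgacctgaa"], [3, None]))
# ['acgtacgtacnnnggccttaagg', 'ttgacctgaa']
```

For amino acid sequences, `filler="X"` would be used instead. The sketch does not check whether each resulting sequence meets the definition in 37 CFR 1.831(b); that check would still be needed before assigning sequence identifiers.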

2412.06 The Requirement for Exclusive Conformance; Sequences Presented in Drawing Figures [R-07.2022]

[Editor Note: This section is applicable to all applications filed on or after July 1, 2022, having disclosures of nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b).]

For all applications that disclose nucleic acid and/or amino acid sequences that fall within the definition set forth in 37 CFR 1.831(b), 37 CFR 1.831(a) requires conformance to the requirements of 37 CFR 1.832 through 37 CFR 1.834 with regard to the manner in which the disclosed nucleotide and/or amino acid sequences are presented and described. This requirement is necessary to minimize any confusion that could result if more than one format for representing sequence data were employed in a given application.

Pursuant to 37 CFR 1.83(a), sequences that are included in the “Sequence Listing XML” should not be duplicated in the drawings. With the use of feature keys and qualifiers in a “Sequence Listing XML” to represent and describe features of a nucleotide or amino acid sequence, the need to re-present a sequence in a drawing is less critical. However, many significant sequence characteristics may only be demonstrated by a figure. This is especially true in view of the fact that the representation of double stranded nucleotides is not permitted in the “Sequence Listing XML” and many significant nucleotide features, such as “sticky ends” and the like, may only be shown effectively by reference to a drawing figure. Further, the similarity or homology between/among sequences may only be depicted in an effective manner in a drawing figure. Similarly, drawing figures are recommended for use with amino acid sequences to depict structural features of the corresponding protein, such as epitopes and interaction domains. The situations discussed herein are given by way of example only and there may be many other reasons for including a sequence in a drawing. However, when a sequence is presented in a drawing, the sequence must still be included in the “Sequence Listing XML” if the sequence falls within the definition set forth in 37 CFR 1.831(b), and a sequence identifier (“SEQ ID NO:X” or the like) must be used, either in the drawing itself or in the Brief Description of the Drawings.

2412.07 Examination of Patent Applications Claiming Large Numbers of Nucleotide Sequences [R-07.2022]

Content regarding the examination of patent applications claiming large numbers of nucleotide sequences is located in MPEP § 2434.

United States Patent and Trademark Office
This page is owned by Patents.
Last Modified: 02/16/2023 12:58:23