What is a sequence listing?
A sequence listing provides a standardized means of presenting the entirety of biological sequence data that is disclosed in a patent application in a single document. More specifically, it includes a list of the nucleotide (DNA or RNA) and/or amino acid protein sequences that are described in a patent application by enumeration of their residues and that meet sequence length thresholds.
Sequence listings enable biological sequence data in patent applications to be collected and transmitted to search databases that are used by the United States Patent and Trademark Office (USPTO) and by the public. It is therefore important that all of the biological sequences that meet the criteria set forth in the relevant standard (see below), regardless of whether they are naturally occurring, artificially synthesized, or randomly generated for exemplary purposes, be included in the sequence listing. Biological sequences serve as prior art and as references for future research and innovation. The presentation of the biological sequence data in the standardized format of a sequence listing facilitates both publication and inclusion in searchable databases.
Currently, the international standard for this sequence listing is World Intellectual Property Organization (WIPO) Standard ST.25. The current USPTO regulations regarding sequence listings (see 37 CFR 1.821 – 1.825) are based on ST.25. However, a new international standard, WIPO Standard ST.26, is being implemented internationally. The USPTO has adopted this standard and revised its regulations accordingly (see 37 CFR 1.831-1.835).
Does ST.25 apply or does ST.26 apply?
The application filing date determines whether a sequence listing must comply with ST.25 or ST.26.
All applications with a filing date or international filing date BEFORE July 1, 2022 MUST file sequence listings in ST.25 format.
- For 111(a) applications, the relevant date is the “official filing date” i.e., the date all the requirements for granting a filing date are met.
- For U.S. national phase (371) applications, the relevant date is the PCT filing date, NOT the 371(c) date.
- You cannot choose to file in ST.26.
All applications with a filing date or international filing date ON OR AFTER July 1, 2022 MUST file sequence listings in ST.26 format.
- An application with benefit or priority to an earlier filed application (under 35 USC 119, 120, 121 or 365) that may have contained a sequence listing in accordance with ST.25 will nonetheless be REQUIRED to submit a compliant sequence listing in XML file format in accordance with 37 CFR 1.831-1.835 (i.e., be in ST.26 format; there will be no “grandfathering”).
- Provisional applications are not required to include a sequence listing; however, after July 1, 2022, if an applicant chooses to submit a sequence listing in a provisional application, such sequence listing must comply with 37 CFR 1.831-1.835 (i.e., be in ST.26 format).
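The date rule above can be sketched as a small function. This is only an illustration: the name `required_standard` and its return strings are hypothetical, and the input is assumed to be the official filing date for a 111(a) application or the PCT international filing date for a 371 application.

```python
from datetime import date

# Cutoff stated above: filing dates on or after July 1, 2022 fall under ST.26.
ST26_EFFECTIVE_DATE = date(2022, 7, 1)


def required_standard(filing_date: date) -> str:
    """Return the sequence listing standard implied by the filing date.

    `filing_date` is assumed to be the official filing date (111(a)) or the
    PCT international filing date (371), not the 371(c) date.
    """
    return "ST.26" if filing_date >= ST26_EFFECTIVE_DATE else "ST.25"


print(required_standard(date(2022, 6, 30)))  # ST.25
print(required_standard(date(2022, 7, 1)))   # ST.26
```

A benefit or priority claim to an earlier application does not change the result, so the function deliberately takes no priority date.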
Differences between ST.25 and ST.26 sequence listings
| ST.25 | ST.26 |
|---|---|
| ASCII text format with numeric identifiers | XML format, encoded in UTF-8 (Unicode), with elements and attributes |
| Not required to include: | Must include: |
| Annotation of sequences: Feature keys only | Annotation of sequences: Feature keys and qualifiers |
| Permitted to include sequences with: | Prohibited sequences: Note: “specifically defined” means any nucleotide other than “n” and any amino acid other than “X” |
| All priority applications may be included | Only the earliest priority application can be included |
| All applicant and inventor names may be included | Only one applicant name and optionally one inventor name may be included. These should be the first named or primary applicant or inventor |
| Only one invention title included | Multiple invention titles may be included, each one in a different language |
| Applicant/inventor names and invention titles must be in basic Latin characters | Applicant/inventor names may be included using any valid Unicode character along with a basic Latin translation or transliteration; invention titles may be included using any valid Unicode character |
| Sequences identified as DNA, RNA, or PRT only | Sequences identified as DNA, RNA, or AA along with a mandatory mol_type qualifier to further describe the type of molecule |
| Organism names: | Organism names: |
| “u” represents uracil in nucleotide sequences | The symbol "u" is not a valid nucleotide symbol in ST.26. The symbol "t" represents uracil in RNA sequences and thymine in DNA sequences. Uracil in a DNA sequence or thymine in an RNA sequence are considered modified nucleotide bases, must be represented by the symbol "t", and further must be described in a feature table using the feature key "modified_base" |
| Amino acid sequences are represented by three letter abbreviations | Amino acid sequences are represented by one letter abbreviations |
| “n” and “Xaa” variable residues must have a definition provided in a feature | Default value assumed for “n” and “X” variable residues with no definition. Further definition is required when ‘n’ or ‘X’ represent residues other than the default value |
| Feature location format not clearly defined | Strictly defined feature location formats; permits use of “<” and “>” in all sequence types, and “^”, “join”, “order”, and “complement” in nucleotide sequences |
| “Mixed mode” sequence–nucleotide sequence with amino acid translation shown below–is permitted | No “mixed mode” sequence; a translation of a nucleotide may be included in a “translation” qualifier |
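Two of the symbol changes in the table (three-letter to one-letter amino acid codes, and "u" to "t") can be illustrated with a minimal sketch. The function names are hypothetical, the mapping covers only the 20 standard amino acids plus Xaa, and the sketch does not add the "modified_base" annotation or variable-residue definitions that ST.26 may additionally require.

```python
# Standard IUPAC three-letter to one-letter amino acid codes, plus Xaa -> X.
# Partial table for illustration; other symbols are not covered here.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Xaa": "X",
}


def protein_to_one_letter(three_letter_seq: str) -> str:
    """Convert whitespace-separated three-letter residues to one-letter codes.

    Raises KeyError for any residue not in the partial table above.
    """
    return "".join(THREE_TO_ONE[res] for res in three_letter_seq.split())


def rna_u_to_t(nucleotides: str) -> str:
    """Lowercase the sequence and replace 'u' with 't'."""
    return nucleotides.lower().replace("u", "t")


print(protein_to_one_letter("Met Ala Xaa Lys"))  # MAXK
print(rna_u_to_t("augcuu"))                      # atgctt
```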
ST.25 information
ST.25 sequence listing format
“ST.25” refers to an international standard, adopted in 2009, that describes how nucleotide and amino acid sequences are to be presented in a sequence listing. The U.S. sequence rules (37 CFR 1.821 – 1.825) are based upon WIPO Standard ST.25. All applications requiring sequence listings with a filing date or international filing date before July 1, 2022 MUST contain sequence listings in ST.25 format.
The <110> through the <170> fields of an ST.25 sequence listing are collectively known as the “header” fields of the sequence listing. These header fields are only included once, at the beginning of the sequence listing, and relate to all of the sequences included in that sequence listing. As sequence listings are often filed as documents separate from the specification of a patent application, the header fields serve to associate the sequence data in the sequence listing with the appropriate patent application.
Each sequence in an ST.25 sequence listing is assigned a numbered sequence identifier. The sequence identifiers begin with “1” and increase sequentially by integers. The sequence identifier for each sequence is found in the <210> field and in the first line of the <400> field. In the description, claims, or drawings of a patent application, sequences are referred to by the sequence identifier to which each is assigned in the sequence listing, preceded by “SEQ ID NO:”.
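As a rough illustration of the numbering rule, the following sketch checks that the <210> values in a listing run 1, 2, 3, and so on. The function name is hypothetical, the sample text is an abbreviated fragment rather than a compliant listing, and the check is no substitute for the USPTO's Checker software.

```python
import re


def seq_ids_are_sequential(listing_text: str) -> bool:
    """Return True if <210> values start at 1 and increase by one each time."""
    ids = [
        int(value)
        for value in re.findall(r"^<210>[ \t]+(\d+)", listing_text, flags=re.MULTILINE)
    ]
    # Require at least one sequence, numbered 1..n with no gaps or repeats.
    return bool(ids) and ids == list(range(1, len(ids) + 1))


# Abbreviated fragment for illustration only (mandatory fields are omitted).
sample = """<210>  1
<211>  12
<212>  DNA
<400>  1
atgcatgcat gc

<210>  2
<211>  4
<212>  PRT
<400>  2
Met Ala Lys Val
"""

print(seq_ids_are_sequential(sample))  # True
```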
See MPEP 2424 for a definition of, and additional details for, each field in a sequence listing as well as direction as to which fields are mandatory and which are optional.
The USPTO provides a computer program, PatentIn version 3.5.1, that generates sequence listings that comply with ST.25 formatting requirements. See the section on PatentIn for guidance on how to download and use PatentIn version 3.5.1.
See an example of the ST.25 sequence listing format.
Creating an ST.25 ASCII text format sequence listing
Because the sequence rules have very specific format requirements, the USPTO developed software to make it easier to create compliant sequence listings. It is possible to create such listings manually, but doing so is extremely time consuming, and it is very difficult to ensure that the listing is compliant with ST.25 and 37 CFR 1.821 – 1.825. The USPTO software programs are PatentIn and Checker.
Creation of a sequence listing with PatentIn starts with compiling the sequences that form a part of the invention and placing them into a text file. PatentIn can import the sequence information in text format, which can then be refined as needed. PatentIn puts the information into the format that will comply with ST.25.
Checker is verification software provided by the USPTO for preliminary evaluation of a sequence listing for compliance with a subset of Standard ST.25 formatting requirements. While it is not identical to the software used within the agency, it can detect errors that would cause the USPTO validation software to reject the sequence listing, and thereby permits applicants to preemptively correct these errors. Since Checker is unable to validate whether the information in free text fields is proper, it is possible for a sequence listing to pass Checker yet still be found not compliant by the USPTO validation software.
You can download both programs from the USPTO website:
How to file an ST.25 sequence listing in a U.S. national application (35 U.S.C. 111(a))
Applications that disclose specifically enumerated sequences of nucleotides and/or amino acids are required by the USPTO to contain a sequence listing. A sequence listing is a separate part of the disclosure of the application that represents disclosed nucleotide and/or amino acid sequences and associated information, and uses the format and symbols set forth in 37 CFR 1.822 and 1.823. In accordance with 37 CFR 1.821(c)(1)-37 CFR 1.821(c)(3), the sequence listing can be submitted:
- As an ASCII .txt file submitted through the USPTO's patent electronic filing system or on read-only optical discs (where the specification contains an incorporation by reference statement of the ASCII plain text file),
- As a PDF file via the USPTO's electronic filing system or
- On physical sheets of paper
When submitting an ASCII .txt file (the first option above), the specification must contain an incorporation by reference statement of the content of the ASCII .txt file in a separate paragraph, preferably on the first page, identifying:
- the name of the ASCII .txt file
- the size of the ASCII .txt file in bytes and
- the date of creation
For example,
REFERENCE TO AN ELECTRONIC SEQUENCE LISTING
The contents of the electronic sequence listing (sequencelisting.txt; Size: 107,643 bytes; and Date of Creation: February 28, 2021) is herein incorporated by reference in its entirety.
____________________
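The three required items can be assembled into a statement programmatically. The sketch below simply mirrors the wording of the example above; the function name is hypothetical, and the wording is one possible form, not mandated text.

```python
from datetime import date


def incorporation_statement(file_name: str, size_bytes: int, created: date) -> str:
    """Build an incorporation by reference paragraph from the three required items."""
    # Month name comes from strftime and is English under the default C locale.
    created_text = f"{created:%B} {created.day}, {created.year}"
    return (
        f"The contents of the electronic sequence listing ({file_name}; "
        f"Size: {size_bytes:,} bytes; and Date of Creation: {created_text}) "
        "is herein incorporated by reference in its entirety."
    )


print(incorporation_statement("sequencelisting.txt", 107643, date(2021, 2, 28)))
```

In practice the size could be read from the file itself, for example with `os.path.getsize`, so that the statement and the submitted file cannot drift apart.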
Further guidance for incorporating a sequence listing by reference into a specification is provided in the Patent Center, MPEP § 502.05 L(l), and MPEP 2422.03(a).
If the sequence listing is supplied as a PDF file or on physical sheets of paper, 37 CFR 1.821(e) requires submission of a copy of the sequence listing in computer readable form (CRF) where the CRF meets the requirements of 37 CFR 1.824.
Use the PatentIn software program to create the ASCII plain text file (37 CFR 1.821(e) or CRF version) of the sequence listing.
There is a 100 MB file size limit for file uploads to the USPTO electronic filing system. For those CRFs that are greater than 100 MB in size, the 37 CFR 1.821(c) and 1.821(e) versions of the sequence listing will need to be submitted separately; i.e., the CRF version filed via the USPTO electronic filing system will not count as both the 37 CFR 1.821(c) version and the 1.821(e) version.
There is a 100 MB file size limit for sequence listing text file uploads to the USPTO electronic filing system. For those ASCII text files that are greater than 100 MB in size, a single copy of the ASCII plain text file sequence listing should be submitted on read-only optical discs.
Checklist for filing a ST.25 sequence listing on read-only optical discs
The following checklist includes guidance from 37 CFR 1.52(e) with respect to filing the sequence listing, in compliance with 37 CFR 1.821-1.825, on read-only optical discs. The submission must:
- Be saved as a single ASCII text file on one or more read-only optical discs.
- Utilize a disc spanning feature of CD/DVD burner software to save the single file across multiple discs, if the ASCII text file is too large to include on a single read-only optical disc.
- Be a compressed file in accordance with 1.824(b)(2)(ii)-(iv), if file compression is necessary.
- Have each read-only optical disc enclosed in a hard read-only optical disc case within an unsealed padded and protective mailing envelope.
- Include a transmittal letter for read-only optical disc submissions that lists, for each read-only optical disc:
  - First-named inventor (if known),
  - Title of the invention,
  - Attorney docket or file reference number (if applicable),
  - Operating system (e.g., MS-DOS®, MS-Windows®, Mac OS®, or Unix®/Linux®) used to produce the disc, and
  - A list of files contained on the read-only optical disc including the name of the file, size of the file in bytes, and dates of creation.
- Ensure that the specification contains an incorporation by reference of the sequence listing ASCII plain text file submitted on the read-only optical disc as set forth in 37 CFR 1.821(c)(1), identifying, as required by 37 CFR 1.52(e)(8):
  - The name of each file,
  - The date of creation of each file, and
  - The size in bytes of each file.
Recognize that read-only optical discs submitted will not be returned to the applicant.
Amending a U.S. nonprovisional application filed under 35 U.S.C. 111(a) to include an ST.25 sequence listing
Here are some examples of when an application may need to be amended to include a sequence listing:
- A sequence listing is filed after the initial application filing date (e.g., in response to a Notice to File Missing Parts), or
- The sequence listing in ASCII text format submitted under 37 CFR 1.821(c) was found to be noncompliant with Standard ST.25 (i.e. CRFD).
A number of statements are required to support amending an application to add a sequence listing (37 CFR 1.825(a)) or to amend a sequence listing (37 CFR 1.825(b)) after the original filing date. These include:
- A “no new matter” statement as per 37 CFR 1.825(a)(4) or 1.825(b)(5),
- A statement under 37 CFR 1.825(a)(3) or 37 CFR 1.825(b)(4) indicating the basis (with specific references to particular parts of the application) for all new or amended sequence data in the application as filed,
- A statement under 37 CFR 1.825(b)(3) that identifies the locations of all deletions, replacements, or additions to the sequence listing, if amended, and
- An incorporation by reference statement under 37 CFR 1.825(a)(2)(i) or 1.825(b)(2)(i).
Other statements may also be required, e.g., if an amended or added sequence listing is submitted on physical sheets of paper or as a PDF file, then a separate CRF of the sequence listing is required. In that case, a statement that the paper or PDF sequence listing is the same as the CRF sequence listing copy would be required under 37 CFR 1.825(a)(6) or 1.825(b)(7).
Reminder: When amending an application to include a sequence listing in ASCII text format as a part of the description, an incorporation by reference paragraph will either need to be added to the specification or, if already present, updated to reflect the new sequence listing file name, date of creation, and size in bytes. Keep in mind that the provisions of 37 CFR § 1.111, § 1.116 and § 1.312 apply to making such an amendment to the specification.
When the USPTO mails a Notice to Comply regarding a Sequence Listing requirement, in response, along with the required “Sequence Listing,” a substitute specification in accordance with 37 CFR 1.125(a) will be required to include the new or updated incorporation by reference statement.
ST.26 information
ST.26 sequence listing format
“ST.26” refers to an international standard, adopted in October 2021 and set to be implemented on July 1, 2022, that describes how nucleotide and amino acid sequences are to be presented in a sequence listing using an XML format. The U.S. sequence rules (37 CFR 1.831 – 1.835) are based upon WIPO Standard ST.26. All applications requiring sequence listings with a filing date or international filing date on or after July 1, 2022, MUST contain sequence listings in ST.26 XML format.
An ST.26 sequence listing in XML format must be presented as a single file in XML 1.0 format, must be encoded with Unicode UTF-8, and must comply with the WIPO Standard ST.26 Document Type Definition (DTD) in Annex II of the WIPO Standard ST.26.
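Two of these requirements, UTF-8 encoding and XML well-formedness, can be checked with Python's standard library alone. The sketch below is a partial pre-check under those assumptions: the function name and the placeholder root element are hypothetical, and it does not validate against the ST.26 DTD, which the standard library parser cannot do.

```python
import xml.etree.ElementTree as ET


def basic_xml_checks(raw: bytes) -> list[str]:
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
    try:
        ET.fromstring(raw)  # raises ParseError if the XML is not well-formed
    except ET.ParseError as exc:
        problems.append(f"not well-formed XML: {exc}")
    return problems


# Placeholder document; <listing> is not the real ST.26 root element.
good = b'<?xml version="1.0" encoding="UTF-8"?><listing><item>1</item></listing>'
bad = b'<?xml version="1.0" encoding="UTF-8"?><listing><item>1</listing>'

print(basic_xml_checks(good))       # []
print(len(basic_xml_checks(bad)))   # 1
```

A file that passes these checks may still fail validation in WIPO Sequence or at the USPTO, so this is best treated as an early sanity check only.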
An ST.26 sequence listing in XML format is composed of two basic sections:
- a general information part, which contains the bibliographic data pertaining to the application, such as applicant name, inventor name, application number, filing date, invention title, and the earliest priority application. See WIPO Standard ST.26, paragraphs 45-49 and
- a sequence data part, which contains one or more sequences and all of the features describing those sequences. See WIPO Standard ST.26, paragraphs 50-100.
MPEP updates to address WIPO Standard ST.26 are forthcoming.
A desktop software tool, WIPO Sequence, was developed to support authoring, validating, and generating ST.26-compliant, XML format sequence listings. WIPO Sequence is downloadable for free from: https://www.wipo.int/standards/en/sequence/index.html
See an example of the ST.26 sequence listing in XML format.
Creating an ST.26 XML format sequence listing
Because the ST.26 sequence rules have very specific format requirements (see 37 CFR 1.831 – 1.835), WIPO Sequence was developed by WIPO in collaboration with patent offices around the world to support authoring, validating, and generating ST.26-compliant, XML format sequence listings. WIPO Sequence simplifies the creation of ST.26 XML format sequence listings with a user-friendly interface so there is no need to directly edit an XML file. Using WIPO Sequence to produce ST.26-compliant, XML format sequence listings is not required, but is highly recommended.
WIPO Sequence allows a user to:
- Accept and store application and sequence information for multiple projects
- Add feature keys and qualifiers to sequences by selecting from easy-to-use, drop down menus
- Validate project data and generate a compliant XML sequence listing
- Validate an existing XML sequence listing
- Generate a “human readable” version of project data for easy review (note that the “human readable” sequence listing should NOT be filed with an application as it is not ST.26 compliant)
- Store custom applicant and inventor information
- Store custom organism names
- Import data from multiple file types, such as ST.25 sequence listings, ST.26 sequence listings, ST.26 projects, .raw files, multisequence format, and FASTA files
- Facilitate creating a translation of free text qualifier values
WIPO Sequence is downloadable for free from: https://www.wipo.int/standards/en/sequence/index.html
Transforming an ST.25 sequence listing into ST.26 XML format using WIPO Sequence
In addition to creating a new sequence listing in XML format, WIPO Sequence will also facilitate the transformation of an existing ST.25 text format sequence listing into an ST.26 XML format sequence listing. Users can import an existing ST.25 sequence listing ASCII text file to initiate a new WIPO Sequence project that can be the basis for an ST.26 XML format sequence listing.
Important points to consider when transforming an ST.25 sequence listing into an ST.26 project:
- The imported ST.25 sequence listing must be valid and compliant with the ST.25 rules. Importing an invalid ST.25 sequence listing into WIPO Sequence could result in unexpected consequences and loss of data.
- Import of an ST.25 sequence listing will result in the creation of an ST.26 project, not an ST.26 XML format sequence listing. There are mandatory elements in an ST.26 sequence listing that are not present in an ST.25 sequence listing, such as the “mol_type” qualifier. Users must input additional information into the resulting ST.26 project before generating a valid ST.26 XML sequence listing.
- It is imperative that users carefully review every note in the comprehensive “Import Report” created by WIPO Sequence upon ST.25 text file import to understand what data transformations occurred, and take any necessary steps to ensure no loss of data.
- It is highly recommended that every user read Annex VII as found in WIPO Standard ST.26. Annex VII provides detailed guidance on transformation of ST.25 sequence listings into ST.26 XML format using WIPO Sequence.
How to file an ST.26 sequence listing XML in a U.S. nonprovisional application filed under 35 U.S.C. 111(a)
37 CFR 1.831(a) requires that patent applications, which contain disclosures of nucleotide and/or amino acid sequences that fall within the definitions of 37 CFR 1.831(b), must contain a "Sequence Listing XML" as a separate part of the disclosure. This “Sequence Listing XML” presents the nucleotide and/or amino acid sequences and associated information using the symbols and format in accordance with the requirements of 37 CFR 1.831-1.834.
This "Sequence Listing XML" part of the disclosure may be submitted:
- As an XML file submitted through the USPTO's Patent Center electronic filing system or
- As an XML file submitted on read-only optical disc(s), as permitted by 37 CFR 1.52(e)(1)(ii) and labeled according to 37 CFR 1.52(e)(5).
When submitting via either option above, an incorporation by reference statement of the material in the XML file in a separate paragraph of the specification is required by 37 CFR 1.834(c)(1), identifying:
- the name of the XML file,
- the date of creation, and
- the size of the XML file in bytes.
For example,
REFERENCE TO AN ELECTRONIC SEQUENCE LISTING
The contents of the electronic sequence listing (sequencelisting.xml; Size: 107,643 bytes; and Date of Creation: February 28, 2021) is herein incorporated by reference in its entirety.
____________________
Use the WIPO Sequence desktop software tool, which can be downloaded for free from https://www.wipo.int/standards/en/sequence/index.html, to create an ST.26 XML format sequence listing.
There is a 100 MB file size limit for sequence listing XML file uploads to the USPTO Patent Center electronic filing system. For those XML files that are greater than 100 MB in size, the sequence listing XML must be submitted on read-only optical discs.
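The size rule can be expressed as a simple check before filing. In this sketch the function name and return strings are hypothetical, and "100 MB" is assumed to mean 100 × 1024 × 1024 bytes; the text above does not say which byte convention the filing system uses, so a file close to the limit should be checked against the system itself.

```python
# Assumption: "100 MB" is read as 100 * 1024 * 1024 bytes.
LIMIT_BYTES = 100 * 1024 * 1024


def submission_route(size_bytes: int) -> str:
    """Suggest a submission route from the file size in bytes."""
    if size_bytes <= LIMIT_BYTES:
        return "electronic filing system"
    # Files greater than the limit must go on read-only optical discs.
    return "read-only optical disc"


print(submission_route(107_643))            # electronic filing system
print(submission_route(150 * 1024 * 1024))  # read-only optical disc
```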
Checklist for filing an ST.26 sequence listing XML on read-only optical discs
The following checklist includes guidance from 37 CFR 1.52(e) with respect to filing the sequence listing XML, in compliance with 37 CFR 1.831-1.835, on read-only optical discs. The submission must comply with the following:
- Be saved as a single XML file
- Permitted “Read-only optical discs” are:
  - Compact Disc-Read-Only Memory (CD-ROM),
  - Compact Disc-Recordable (CD-R), and
  - Digital Video Disc-Recordable (DVD-R or DVD+R)
- A file that is not compressed must be contained on a single read-only optical disc
- If necessary, the XML file may be compressed using WinZip®, 7-Zip, or Unix®/Linux®Zip; the compressed file must be non-self-extracting
- A compressed file that does not fit on a single read-only optical disc may be split into multiple file parts in accordance with the target read-only disc size
- Each disc must be labeled with the following information:
  - the first named inventor, if known,
  - the title of the invention,
  - attorney docket number and filing date (if known),
  - application number and filing date (if known),
  - date on which the data were recorded on the read-only optical disc; and
  - if multiple discs are submitted, the label must indicate their order (e.g., "1 of X")
- Each read-only optical disc must be enclosed in a hard, read-only optical disc case within an unsealed, padded and protective mailing envelope
- Include a transmittal letter for read-only optical disc submission that lists, for each read-only optical disc:
  - the first named inventor, if known,
  - the title of the invention,
  - attorney docket number and filing date (if known),
  - the operating system (MS-DOS®, MS-Windows®, Mac OS®, or Unix®/Linux®) used to produce the disc, and
  - a list of files contained on the read-only optical disc including their names, sizes in bytes, and dates of creation
- Ensure that the specification contains an incorporation by reference of the sequence listing XML file submitted on the read-only optical disc(s) identifying:
  - The name of the file,
  - The date of creation of the file, and
  - The size of the file in bytes.
Recognize that read-only optical discs will not be returned to the applicant and will not be retained as part of the patent application file.
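The checklist items on splitting a compressed file and labeling discs in order can be pictured with a generic sketch. It splits raw bytes at a fixed capacity and pairs each part with a "1 of X" style label; the function name is hypothetical, and this is not the multi-part format produced by WinZip, 7-Zip, or Zip, which should be used for an actual submission.

```python
def split_into_parts(data: bytes, disc_capacity: int) -> list[tuple[str, bytes]]:
    """Split `data` into chunks of at most `disc_capacity` bytes.

    Returns (label, chunk) pairs, with labels in the "1 of X" form.
    """
    if disc_capacity <= 0:
        raise ValueError("disc_capacity must be positive")
    chunks = [data[i:i + disc_capacity] for i in range(0, len(data), disc_capacity)]
    total = len(chunks)
    return [(f"{n} of {total}", chunk) for n, chunk in enumerate(chunks, start=1)]


parts = split_into_parts(b"abcdefghij", 4)
print([label for label, _ in parts])  # ['1 of 3', '2 of 3', '3 of 3']
```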
Amending a U.S. nonprovisional application filed under 35 U.S.C. 111(a) to include an ST.26 sequence listing XML
Here are some examples of when an application may need to be amended to include a sequence listing XML:
- A sequence listing is filed after the initial application filing date (e.g., in response to a Notice to File Missing Parts), or
- The sequence listing in XML format submitted under 37 CFR 1.831 was found to be noncompliant with Standard ST.26 (i.e. CRFD).
The amendment must include:
- A sequence listing XML in compliance with 37 CFR 1.831-1.834,
- An incorporation by reference statement in a separate paragraph of the specification as required by 37 CFR 1.835(a)(2) or 1.835(b)(2),
- A statement that the "Sequence Listing XML" includes no new matter as required by 37 CFR 1.835(b)(5),
- A statement that indicates support for the “Sequence Listing XML” in the application, as filed, as required by 37 CFR 1.835(b)(4) and
- If the new sequence listing is replacing a previously submitted sequence listing, a statement that identifies the location of all additions, deletions or replacements of sequence information relative to the replaced “Sequence Listing XML” as required by 37 CFR 1.835(b)(3).
Reminder: When amending an application to include a sequence listing in XML format as a part of the description, an incorporation by reference paragraph will either need to be added to the specification or, if already present, updated to reflect the new sequence listing file name, date of creation, and size in bytes.
When the USPTO mails a Notice to Comply regarding a sequence listing requirement, in response, along with the required “Sequence Listing XML”, a substitute specification in accordance with 37 CFR 1.125(a) will be required to include the new or updated incorporation by reference statement.

