Sequence listing FAQs

Frequently asked questions (FAQs) and their answers related to World Intellectual Property Organization (WIPO) Standard ST.26 for filing sequence listings in eXtensible Markup Language (XML) format are found below. These FAQs regularly reference WIPO Sequence, a desktop software tool that was developed to support authoring, validating, and generating ST.26-compliant, XML format sequence listings. The most current version of WIPO Sequence is downloadable for free from the WIPO Sequence Suite page.

WIPO also publishes its own answers in FAQs: Implementation of WIPO ST.26, which cover general questions, WIPO Sequence, PCT filings, and national/regional filings.

Applicability of WIPO Standard ST.26

A1. Do the new ST.26 sequence rules that went into effect on July 1, 2022 apply to all sequence listings filed on/after July 1 at the United States Patent and Trademark Office (USPTO)?

No. Whether to use the ST.25 standard (see 37 C.F.R. 1.821-1.825) or the new ST.26 standard (see 37 C.F.R. 1.831-1.835) is governed by the application’s filing date and not the date when the sequence listing is being filed.

For applications filed under 35 U.S.C. 111(a): If the application has a filing date under 37 C.F.R. 1.53(b) before July 1, 2022, then the ST.25 standard applies (see 37 C.F.R. 1.821-1.825). If the application filing date is on July 1, 2022 or later, then the ST.26 standard applies (see 37 C.F.R. 1.831-1.835).

For national stage applications filed under 35 U.S.C. 371: If the application has an international filing date (the filing date of the PCT application that is the basis for the 371 U.S. national stage application) before July 1, 2022, then the ST.25 standard applies (see 37 C.F.R. 1.821-1.825). If the application has an international filing date of July 1, 2022 or later, then the ST.26 standard applies (see 37 C.F.R. 1.831-1.835). OF NOTE: The date indicated as the “FILING or 371(c) DATE” on any form issued by the USPTO in a National Stage application filed under 35 U.S.C. 371 has no bearing on which sequence listing format standard is applicable; the sole controlling date is the international filing date.

For international (PCT) applications filed with the U.S. receiving office: If the application has an international filing date before July 1, 2022, then the ST.25 standard applies. If the application has an international filing date of July 1, 2022 or later, then the ST.26 standard applies.

In all of the above application types, when filing a new or replacement sequence listing, the required sequence listing format, ST.25 or ST.26, is governed by the date of filing of the application as above and NOT by the date when the sequence listing itself is provided/filed.
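The date rule in A1 can be sketched in a few lines of Python. This is only an illustration of the logic described above, not an official tool: the function name `required_standard` and the constant `ST26_EFFECTIVE` are hypothetical, and the caller is assumed to pass the controlling date for the application type (the filing date under 37 C.F.R. 1.53(b) for a 111(a) application, or the international filing date for a 371 national stage or PCT application).

```python
from datetime import date

# Date on which the ST.26 rules took effect (from A1 above).
ST26_EFFECTIVE = date(2022, 7, 1)


def required_standard(controlling_date: date) -> str:
    """Return the sequence listing standard for an application.

    controlling_date is the application's filing date (111(a)) or its
    international filing date (371 / PCT). The date on which the sequence
    listing itself is filed is deliberately not a parameter, because it
    does not affect the result.
    """
    return "ST.26" if controlling_date >= ST26_EFFECTIVE else "ST.25"


print(required_standard(date(2022, 6, 30)))  # application filed before July 1, 2022
print(required_standard(date(2022, 7, 1)))   # application filed on July 1, 2022
```

A replacement sequence listing filed in 2024 for an application with a June 30, 2022 filing date would therefore still be evaluated with `date(2022, 6, 30)`.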

As detailed in the Summary of Changes section of 87 Fed. Reg. 30806:

“[a]pplications pending prior to July 1, 2022, will not have to comply with WIPO Standard ST.26; rather, such applications will require the submission of a ‘‘Sequence Listing,’’ as defined in 37 CFR 1.821(a), in compliance with 37 CFR 1.821 through[sic] 1.825.”

A2. It is after July 1, 2022 and I am filing a continuation (CON) of an application that was filed before July 1, 2022. Can I file a copy of all the documents at the USPTO, including the ST.25 sequence listing, to be compliant?

No. You must file an ST.26 (XML) sequence listing in any application having a filing date on or after July 1, 2022, even though the application contains a benefit or priority claim to a prior application with a filing date before July 1, 2022. OF NOTE: Such a continuation filing will not be an exact copy of the papers filed in the parent application because the sequence listing in .txt format must be transformed into a sequence listing in .xml format.

As detailed in the Applicability section of 87 Fed. Reg. 30806:

“an application that has a filing date on or after July 1, 2022, will be required to provide a ‘‘Sequence Listing XML’’ …. This includes applications having an international filing date on or after July 1, 2022, that claim benefit or priority to applications with filing dates before July 1, 2022. Such applications include, but are not limited to, applications having one or more benefit or priority claims under 35 U.S.C. 119(e) (claiming the benefit of a provisional), 35 U.S.C. 120 (claiming the benefit as a continuation and/or continuation-in-part), 35 U.S.C. 121 (claiming the benefit as a divisional), 35 U.S.C. 365(c) (claiming the benefit as a continuing application to a PCT application), or 35 U.S.C. 119(a)–(d) or 35 U.S.C. 365(a) (claiming the priority to a foreign filed application or a prior filed PCT). If a prior application to which benefit or priority is claimed contains a ‘‘Sequence Listing’’ in Standard ST.25 format (in compliance with 37 CFR 1.821 through 1.825), the applicant will be required to convert that ‘‘Sequence Listing’’ to WIPO Standard ST.26 format (a ‘‘Sequence Listing XML’’ in compliance with 37 CFR 1.831 through 1.835) for inclusion in the new application filed on or after July 1, 2022”

In other words, there is no “grandfathering” per the rules in 37 C.F.R. 1.831-1.835. This applies to applications claiming benefit or priority under 35 U.S.C. 119, 120, 121, or 365 to an earlier (before July 1, 2022) filed application.

Filing an ST.26-compliant sequence listing XML

F1. I erroneously filed a sequence listing in ST.25 (ASCII text) format in a U.S. application requiring a sequence listing in ST.26 (XML) format. What will happen to my application?

You will receive a notice entitled “Notification to Comply With Requirements for Patent Applications Containing Nucleotide And/Or Amino acid Sequence Disclosure,” specifying that 37 CFR 1.831-1.835 (the rules that implement WIPO Standard ST.26) apply to the present application and compliance is required. Further, the ST.25 sequence listing, as present in the file wrapper, may be used to provide support for the submission of a compliant ST.26 sequence listing XML. For details, see the Applicability section in 87 Fed. Reg. 30806.

F2. Is a verified statement, such as the “listing does not go beyond the disclosure of the application as filed”, still needed for an ST.26 compliant sequence listing?

It depends. For applications filed under 35 U.S.C. 111(a) and 35 U.S.C. 371, the U.S. regulations do not require a statement that the sequence listing “does not go beyond the disclosure of the application as filed.”

However, such a statement is required in an international (PCT) application when applicant provides a sequence listing under PCT Rules 13ter.1, 13ter.2, and 45bis.5(c). Pursuant to PCT Rule 13ter.1(e), any sequence listing submitted for purposes of international search shall be accompanied by a statement to the effect that the sequence listing does not go beyond the disclosure in the international application as filed.

F3. When I submit my international (PCT) application for entry into the national stage under 35 U.S.C. 371, do I need to submit the sequence listing XML?

There are only four circumstances under which an ST.26 sequence listing XML should be submitted at the time of entry into the national phase under 35 U.S.C. 371:

i) The international phase application (PCT) discloses sequences but does not contain a sequence listing as “forming part of the international application” (Annex C of the Administrative Instructions under the PCT), as required under PCT Rule 5.2(a); or

ii) The international phase application (PCT) discloses sequences and contains a sequence listing as “forming part of the international application”, as required under PCT Rule 5.2(a), but the PCT application was neither filed with the USPTO as receiving Office nor published by the International Bureau at the time of entry; or

iii) A translation of the sequence listing as “forming part of the international application” is necessary because the values of one or more qualifiers that contain language-dependent free text were not provided in English in the ST.26 sequence listing XML of the PCT application; or

iv) Applicant desires to amend the sequence listing as “forming part of the international application”.

Applicant is cautioned that submission of the sequence listing in the national stage application, unless provided under circumstance ii) as part of a required copy of the international application pre-publication, or under circumstance iii) as part of a required translation of the international application, is considered to be an amendment of the application. See 37 C.F.R. 1.835(a)-(b).

F4. Upon entering national phase from a PCT application, is it possible that a national office will require a translation of my sequence listing into a language used by that office?

Yes, an intellectual property office (IPO) might require an applicant to provide a translation of the sequence listing such that the language dependent free text qualifier values are in the language used by that Office. In the United States, 37 CFR 1.52(b)(1)(ii) requires that an application be in the English language or be accompanied by a translation into English. Therefore, language-dependent free text elements not in English must be translated into English for a sequence listing XML. Such a translated “Sequence Listing XML” is not, however, considered an added or amended submission as explained in 37 CFR 1.835(d)(2).

F5. If my PCT application contains an ST.26 sequence listing XML, is it necessary to include an incorporation by reference statement regarding this sequence listing at entry into national phase at the USPTO?

Where the PCT application contains an ST.26 sequence listing XML and the sequence listing forms part of the PCT application, under PCT Rule 5.2(a), then the sequence listing is already considered part of the description, and incorporation by reference is redundant and unnecessary. An ST.26 sequence listing XML is considered to be “forming part of the international application” (see Annex C of the Administrative Instructions under the PCT) when:

i) The sequence listing was included with the original filing by being filed in the PCT application on the international filing date and indicated as part of the application within Box No. IX of the Request (PCT/RO/101) checklist; or

ii) The sequence listing was added after the international filing date by being:

- included in the application under PCT Rule 20.5(b) or (c), or PCT Rule 20.5bis(b) or (c); or

- considered to have been contained in the application under PCT Rule 20.6(b); or

- corrected under PCT Rule 26; or

- rectified under PCT Rule 91; or

- added as an amendment under Article 34.

OF NOTE: where the ST.26 sequence listing XML forms part of the international application and the international publication has occurred, the USPTO will obtain the ST.26 sequence listing XML as part of the Article 20 documents from the International Bureau for the national stage application.

On the other hand, if the PCT application contains an ST.26 sequence listing XML that was only provided as a search tool under PCT Rule 13ter, then applicant must provide a sequence listing XML upon submission of the national stage application, and amend the description to contain an incorporation by reference statement. Accordingly, the submission must be in compliance with 37 C.F.R. 1.835(a) or (b).

F6. Will the PCT Request Form (PCT/RO/101) provided by WIPO be updated to reflect ST.26?

The PCT Request Form (PCT/RO/101) available to users of ePCT has been revised as of July 2022, in view of the implementation of ST.26. Applicants should no longer generate a request form using PCT-SAFE software, which is no longer supported, as of July 2022. Rather, applicants can use ePCT to generate a .zip file containing the validated Request Form or use a fillable pdf version of the July 2022 Request Form, which can be obtained online from WIPO.

Formatting for WIPO Standard ST.26 – general

G1. May I include the same sequence multiple times within a sequence listing?

While ST.26 does not prohibit including a sequence two or more times in a single sequence listing, each time with a separate SEQ ID number, this practice is not recommended. It can lead to confusion if different SEQ ID NOs are used to describe the same sequence in the disclosure.

G2. If I want to provide the values of the language dependent qualifiers in two languages, must one of the languages always be English?

Yes. One of the languages must be English.

G3. For what purpose would one add multiple titles in different languages?

If an applicant knows that the sequence listing will be filed in multiple countries that require different languages, the applicant can include all of the required languages in the initial sequence listing. That way, correction or translation is not required later.

G4. May I use a common name such as “human” or “mouse” as the “organism” qualifier value?

No. ST.26 does not permit the use of common names as the “organism” qualifier value (see WIPO Standard ST.26, paragraph 78). Only the following formats are permitted in the “organism” qualifier value:

i) “Genus species” must be used when the sequence is naturally occurring and the Latin genus and species are known, for example, “Mus musculus”;

ii) “Genus sp.” must be used when the sequence is naturally occurring and the Latin genus is known but the species is unknown, for example, “Mus sp.”;

iii) An acceptable scientific name must be used when the sequence is naturally occurring but the organism does not have a Latin genus and species name, such as a virus, for example, “Torque teno virus”;

iv) “unidentified” must be used when the sequence is naturally occurring but the Latin genus and species designation is unknown. Any taxonomic information known about the origins of the sequence should be included in a “note” qualifier attached to the “source” feature. For example, if the only information about a sequence is that it is of primate origin, the value of the “note” qualifier can say “Order Primates”;

v) “synthetic construct” must be used when the sequence is not naturally occurring, i.e., man-made or constructed in vitro.

If desired, a common name may be included as the value of a “note” qualifier attached to the “source” feature, but it must not be used as the “organism” qualifier value.

G5. Can I include strain information in the “organism” qualifier value?

No. Strain information must not be included in the “organism” qualifier value.

The “organism” qualifier value is limited to include genus/species names, virus names, “unidentified”, or “synthetic construct”. Other identifying information, such as strain, substrain, subspecies, serovar, isolate, or cultivar should be included in an appropriate qualifier value. Please see WIPO Standard ST.26, Annex I, Section 5.37 for a list of qualifiers permitted for use with the “source” feature. For example, if the source organism is “Mus musculus strain 129S1/SvImJ”, the correct way to include this information is to use “Mus musculus” as the “organism” qualifier value and “strain 129S1/SvImJ” as the value of a “strain” qualifier.
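As an illustration of the split described above, the following Python sketch builds a “source” feature in which the organism name and the strain are carried by separate qualifiers. It is a sketch under assumptions: the helper name `source_feature` is hypothetical, the element names (`INSDFeature`, `INSDQualifier`, and so on) reflect our reading of the ST.26 DTD, and the sequence length of 120 and the “genomic DNA” molecule type are invented for the example (a nucleotide sequence is assumed).

```python
import xml.etree.ElementTree as ET


def source_feature(length, mol_type, organism, extra_qualifiers=None):
    """Build a "source" feature element for a nucleotide sequence.

    organism must hold only the organism name; strain and similar details
    go into extra_qualifiers as separate name/value pairs.
    """
    feature = ET.Element("INSDFeature")
    ET.SubElement(feature, "INSDFeature_key").text = "source"
    # The source feature spans the whole sequence.
    ET.SubElement(feature, "INSDFeature_location").text = f"1..{length}"
    quals = ET.SubElement(feature, "INSDFeature_quals")

    pairs = [("mol_type", mol_type), ("organism", organism)]
    pairs += list((extra_qualifiers or {}).items())
    for name, value in pairs:
        qualifier = ET.SubElement(quals, "INSDQualifier")
        ET.SubElement(qualifier, "INSDQualifier_name").text = name
        ET.SubElement(qualifier, "INSDQualifier_value").text = value
    return feature


# Hypothetical 120-residue genomic DNA sequence from the mouse strain above.
feature = source_feature(
    120, "genomic DNA", "Mus musculus", {"strain": "strain 129S1/SvImJ"}
)
print(ET.tostring(feature, encoding="unicode"))
```

In practice, WIPO Sequence generates this XML; the sketch only shows where each piece of information belongs.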

G6. My sequence can be found in multiple organisms. Can I include more than one organism name in the “organism” qualifier value?

No. The “organism” qualifier value must only contain a single Latin genus/species name, virus name, “unidentified” or “synthetic construct.” If an applicant wants to include additional organism names, they can be included in a “note” qualifier attached to the “source” feature.

G7. If I use “synthetic construct” or “unidentified” as the “organism” qualifier value of a sequence, am I required to include additional descriptive information, similar to what was required to describe “artificial sequence” and “unknown” sequences under ST.25 rules?

No. ST.26 does not require further description of a sequence indicated as a “synthetic construct” or “unidentified”. This is a difference from ST.25, which required a further description for all sequences described as “artificial sequence” or “unknown”. OF NOTE: it is always recommended that applicants provide as much information as possible in their sequence listing. Particularly for “unidentified” sequences, any taxonomic information that is known should be provided in a note qualifier. For example, if all that is known about the sequence is that it is mammalian, then “unidentified” should be used as the value for the “organism” qualifier and “mammalian” should be included as the value of a “note” qualifier.

G8. If free text exceeds the 1,000 character limit in a single qualifier, can the text be continued in additional qualifiers to encompass these additional characters?

Yes. Additional qualifiers can be added to the feature to include information that exceeds the 1,000 character limit.
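One way to prepare long free text for several qualifiers is sketched below in Python. The function name `split_free_text` is hypothetical, and breaking at word boundaries is our own choice for readability; the answer above only requires that each qualifier value stay within the 1,000 character limit.

```python
import textwrap

MAX_QUALIFIER_CHARS = 1000  # limit for a single qualifier value (from G8)


def split_free_text(text, limit=MAX_QUALIFIER_CHARS):
    """Split free text into pieces of at most `limit` characters.

    Pieces are broken at whitespace where possible; each piece is meant
    to become the value of its own qualifier on the same feature.
    """
    return textwrap.wrap(text, width=limit, break_long_words=True)


long_note = "word " * 500  # 2,500 characters of placeholder text
pieces = split_free_text(long_note)
print(len(pieces), [len(piece) for piece in pieces])
```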

Formatting sequences for WIPO Standard ST.26 – either nucleotide or amino acid sequences

NA/AA1. Are feature keys required to be presented in order in the sequence listing, from left to right along the length of the molecule?

No. There is no requirement in ST.26 that feature keys appear in any particular order. For a DNA/RNA hybrid molecule, each DNA and RNA segment must be accounted for with a “misc_feature” feature key (i.e. every residue must be included in a “misc_feature” feature key), but no particular order is required. However, for clarity and ease of understanding, we do recommend including features in sequential order, where possible.

NA/AA2. Can an enumerated sequence be identified in the specification as 'residues of' another sequence instead of being assigned a new Sequence Identification number? For example, 'residues 1-10 of SEQ ID NO:1' instead of being assigned its own identifier?

No. Any sequence that is enumerated by its residues and meets the length requirements of WIPO Standard ST.26, paragraph 7, must be given its own SEQ ID number and included in the sequence listing – even if that sequence is a subsequence of another sequence. “Enumerated by its residues” means “disclosure of a sequence in a patent application by listing, in order, each residue of the sequence.” See 37 CFR 1.831(d).

NA/AA3. If my patent application discloses a single sequence with a residue that is repeated consecutively (i.e., 20 glycines in a row), can I “shorten” the sequence in the sequence listing by including one glycine and adding an annotation to indicate that it is repeated 20 times?

WIPO Standard ST.26 treats multiple residues represented by a shorthand formula as if each residue was separately enumerated (see paragraph 3(c)(ii), and Annex VI, Introduction and Example 3(c)-2). Therefore, a disclosure with a shorthand formula such as “His₆Gln” will be treated as if the disclosure was written out in long form: “His His His His His His Gln”. Accordingly, the sequence must be represented in the sequence listing as “HHHHHHQ”.

A sequence with repeating residues, such as a string of 5 glycines, should be included in the sequence listing as “GGGGG” and not with a single instance of the repeating residue combined with an annotation stating that the residue is repeated ‘x’ number of times. All residues must be represented individually in the sequence listing in this circumstance.
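The expansion of a shorthand formula into individually represented residues can be sketched as follows. This is a minimal Python illustration, not part of WIPO Sequence: the function name `expand_shorthand` is hypothetical, and the parser is assumed to handle only three-letter amino acid codes, each optionally followed by a repeat count written as ordinary or subscript digits.

```python
import re

# Three-letter to one-letter codes for the 22 conventional amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C", "Gln": "Q",
    "Glu": "E", "Gly": "G", "His": "H", "Ile": "I", "Leu": "L", "Lys": "K",
    "Met": "M", "Phe": "F", "Pro": "P", "Pyl": "O", "Ser": "S", "Sec": "U",
    "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
_SUBSCRIPTS = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")
_TOKEN = re.compile(r"([A-Z][a-z]{2})(\d*)")


def expand_shorthand(formula):
    """Write out every residue of a formula such as "His6Gln"."""
    text = formula.translate(_SUBSCRIPTS).replace("-", "").replace(" ", "")
    if not re.fullmatch(r"(?:[A-Z][a-z]{2}\d*)+", text):
        raise ValueError(f"unrecognised shorthand: {formula!r}")
    residues = []
    for code, count in _TOKEN.findall(text):
        if code not in THREE_TO_ONE:
            raise ValueError(f"unknown amino acid code: {code!r}")
        # A missing count means the residue appears once.
        residues.append(THREE_TO_ONE[code] * int(count or 1))
    return "".join(residues)


print(expand_shorthand("His₆Gln"))  # HHHHHHQ
print(expand_shorthand("Gly5"))     # GGGGG
```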

OF NOTE: feature keys and qualifiers can be used to annotate variants of a primary sequence that differ in the number of times that a subsequence is repeated. For example, a sequence with a repeated region can be annotated (e.g., using feature keys and qualifiers such as REPEAT, rpt_type, rpt_unit_range, or rpt_unit_seq) to describe variants with alternative numbers of repeats.

NA/AA4. If I include a primary sequence and several variants of the primary sequence as separate sequences in my sequence listing, how can I annotate the sequences to describe their relationship?

While not required, applicants are always encouraged to include as much sequence annotation as possible in their sequence listing. An easy way to identify a specific variant as related to a primary generic sequence is to include a note qualifier in the source feature that describes the relationship of the sequences.

NA/AA5. How should I represent a sequence with a variable length in my sequence listing? For example, how should the following sequence be represented and annotated in a sequence listing: P[R/K]TN-X_2-4-MTF-X_1-3-SQNCE-X_0-1-I[D/E]?

In WIPO Standard ST.26, where a sequence that meets the length requirements of paragraph 7 is disclosed by enumeration of its residues only once, but the length of the sequence may vary (for example, due to copy number variation), the longest embodiment of the sequence is considered the “most encompassing” sequence and is the single sequence that should be included in the sequence listing. For example, consider a sequence containing a repeated region that can vary from 2 to 5 copies as enumerated. The embodiment with 5 copies of the repeat is the most encompassing sequence and should be included in the sequence listing.

Regarding the sequence “P[R/K]TN-X_2-4-MTF-X_1-3-SQNCE-X_0-1-I[D/E]” - since the disclosed sequence is a consensus sequence based on natural variation, the feature key ‘VARIANT’ along with a note qualifier should be used to indicate variations.

A. With respect to the “R” or “K” (position 2) and the “D” or “E” (position 22), there are two ways to present this:

i) Use “X” at positions 2 and 22 (i.e., PXTNXXXXMTFXXXSQNCEXIX), annotate positions 2 and 22 with feature key ‘VARIANT’, and add note qualifiers with the values “R or K” and “D or E”, respectively; or

ii) Use a more prevalent amino acid in the sequence (e.g. “R” at position 2 and “D” at position 22, PRTNXXXXMTFXXXSQNCEXID), annotate positions 2 and 22 with feature key ‘VARIANT’, and add a note qualifier to indicate that “R” can be replaced by “K” and “D” can be replaced by “E”.

B. With respect to the variable numbers of “X” residues, it is recommended to include the maximum number of residues in the sequence (the “most encompassing” sequence; see WIPO Standard ST.26, Introduction to Annex VI) and then annotate to indicate one or more may be absent (see Annex VI, Example 36-3). Thus, for the sequence PXTNXXXXMTFXXXSQNCEXIX, the feature key ‘VARIANT’ can be used with a note qualifier describing how many residues can be absent.

For example:

i) feature key ‘VARIANT’ at position 5..8, and a note qualifier with the value “one or two X residues may be absent”;

ii) feature key ‘VARIANT’ at position 12..14, and a note qualifier with the value “one or two X residues may be absent”; and

iii) feature key ‘VARIANT’ at position 20, and a note qualifier with the value “may be absent”.
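Because the residue positions are easy to miscount by hand, the Python sketch below assembles the most encompassing sequence from its parts and derives each ‘VARIANT’ location by counting residues. The helper name `most_encompassing` and the `PARTS` layout are hypothetical; each variable region is entered at its maximum length, as recommended above, and the first option from part A (“X” at the R/K and D/E positions) is assumed.

```python
# Each part is (residues, note); note is None for fixed residues.
PARTS = [
    ("P", None),
    ("X", "R or K"),
    ("TN", None),
    ("XXXX", "one or two X residues may be absent"),  # X_2-4 at its maximum
    ("MTF", None),
    ("XXX", "one or two X residues may be absent"),   # X_1-3 at its maximum
    ("SQNCE", None),
    ("X", "may be absent"),                           # X_0-1 at its maximum
    ("I", None),
    ("X", "D or E"),
]


def most_encompassing(parts):
    """Return the full-length sequence and its VARIANT annotations.

    Locations are 1-based and inclusive, written as "start..end" for a
    range or as a single number for one residue.
    """
    sequence = ""
    features = []
    for residues, note in parts:
        start = len(sequence) + 1
        sequence += residues
        end = len(sequence)
        if note is not None:
            location = str(start) if start == end else f"{start}..{end}"
            features.append(("VARIANT", location, note))
    return sequence, features


sequence, features = most_encompassing(PARTS)
print(sequence, len(sequence))
for key, location, note in features:
    print(key, location, note)
```

For this example the sketch yields the 22-residue sequence PXTNXXXXMTFXXXSQNCEXIX with ‘VARIANT’ features at 2, 5..8, 12..14, 20, and 22.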

The example does not state what amino acids the “X” variables can represent. WIPO Standard ST.26 provides that, absent a specific definition of “X”, it will be construed as any one of “A”, “R”, “N”, “D”, “C”, “Q”, “E”, “G”, “H”, “I”, “L”, “K”, “M”, “F”, “P”, “O”, “S”, “U”, “T”, “W”, “Y”, or “V” (see paragraph 27). If the definition of X is different, then a note qualifier should be used to provide the desired definition consistent with the disclosure in the application.

NA/AA6. How can a sequence variant with a single point deletion be represented in the sequence listing? Can I represent it as a single sequence using the “replace” qualifier with an empty value?

The number of sequences that are required to be included in the sequence listing will depend on how the variants are disclosed in the application. WIPO Standard ST.26, paragraphs 93-95, should be consulted for the representation of variants, in general. If the two sequences are separately enumerated (see paragraph 93), then both must be included in the sequence listing. If only a single sequence is enumerated, and the variant with a point deletion is described in prose (see paragraph 94), then only one sequence must be included in the sequence listing. This primary sequence must be annotated with a “replace” qualifier with an empty value to indicate the location of the deletion.

OF NOTE: it is recommended that the sequences of all variants, which are important to the invention being claimed, be included separately in the sequence listing, even if inclusion is not required by the Standard.

NA/AA7. Is annotation of modified nucleotides and modified amino acids mandatory in a sequence listing?

WIPO Standard ST.26 requires that modified residues are annotated per paragraph 17 for nucleotides and per paragraph 30 for amino acids.

The Standard defines “modified amino acid” as any amino acid as described in paragraph 3(a) other than L-alanine, L-arginine, L-asparagine, L-aspartic acid, L-cysteine, L-glutamine, L-glutamic acid, L-glycine, L-histidine, L-isoleucine, L-leucine, L-lysine, L-methionine, L-phenylalanine, L-proline, L-pyrrolysine, L-serine, L-selenocysteine, L-threonine, L-tryptophan, L-tyrosine, or L-valine (WIPO ST.26, paragraph 3(e)).

Similarly, the Standard defines “modified nucleotide” as any nucleotide as described in paragraph 3(g) other than deoxyadenosine 3’-monophosphate, deoxyguanosine 3’-monophosphate, deoxycytidine 3’-monophosphate, deoxythymidine 3’-monophosphate, adenosine 3’-monophosphate, guanosine 3’-monophosphate, cytidine 3’-monophosphate, or uridine 3’-monophosphate (WIPO ST.26, paragraph 3(f)).

Residues with modifications to the “side chains”, i.e., nucleotide nucleobases or amino acid R groups, are “modified nucleotides” or “modified amino acids” since these residues are not the conventional amino acid residues or nucleotide residues recited in paragraphs 3(e) and 3(f) of the Standard. Therefore, these modified residues must be annotated.

Additionally, residues with modifications to the sequence “backbone”, i.e., the sugar-phosphate backbone of a nucleotide sequence or a polypeptide backbone containing conventional amide linkages, may or may not result in “modified nucleotides” or “modified amino acids” depending on the nature of the modification. Backbone modifications that change the chemical structure of the residue within a sequence are “modified nucleotides” or “modified amino acids” and must be annotated. Examples include nucleotide analogs such as peptide nucleic acids (PNAs) and glycol nucleic acids (GNAs), and D-amino acids.

NA/AA8. When should modified nucleotides or modified amino acids be included in the sequence listing as their corresponding unmodified nucleotide or unmodified amino acid? For example, how should a 2’ O-methyl adenosine be represented in a sequence listing? How should a beta-alanine be represented in a sequence listing? Are these modified residues considered specifically defined?

WIPO Standard ST.26 indicates that modified nucleotides and amino acids should be represented in the sequence listing as the corresponding unmodified residue whenever possible (paragraphs 16 and 29). OF NOTE: this recommendation is a “should” – a “strongly encouraged approach, but not a requirement” (paragraph 4(d)). It is up to the discretion of the applicant to decide if a modified residue will be represented by the corresponding unmodified residue or the variables “n” or “X”.

As a general rule of thumb – if a residue is modified by the addition of a moiety, such as methylation or acetylation, and the structure of the unmodified residue is generally unchanged, then representation by the unmodified residue is recommended. For example, a methylated adenosine should preferably be represented by an “a” symbol in the sequence listing. However, when the modified residue is structurally different from any unmodified residue, then an “n” or an “X” is recommended. For example, norleucine is an isomer of leucine, and its side chain is a linear structure of 4 carbons. Leucine also has a 4-carbon side chain, but it is branched at the second carbon. Therefore, norleucine is not simply the result of a modification added to a leucine, but a completely different (although related) structure. It is therefore recommended that norleucine be represented by an “X” in a sequence listing.

A nucleotide is “specifically defined” when it is represented by anything other than “n”, and an amino acid is “specifically defined” when it is represented by anything other than “X” (see paragraph 3(k)). Therefore, a 2’ O-methyl adenosine represented by an “a” in the sequence is specifically defined, whereas norleucine represented by “X” in the sequence is NOT specifically defined.
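The definition above lends itself to a simple count. The Python sketch below counts specifically defined residues and compares the count with the length thresholds of paragraph 7 as summarised elsewhere in these FAQs (10 or more specifically defined nucleotides, or 4 or more specifically defined amino acids). The function names are hypothetical, and the check covers only the length test; it does not address other conditions, such as how the sequence is disclosed.

```python
# Minimum number of specifically defined residues (paragraph 7 thresholds).
THRESHOLDS = {"nucleotide": 10, "amino acid": 4}
# Symbols that are NOT specifically defined.
PLACEHOLDERS = {"nucleotide": "n", "amino acid": "X"}


def count_specifically_defined(residues, molecule):
    """Count residues represented by anything other than "n" or "X"."""
    # Nucleotides are compared in lower case, amino acids in upper case.
    text = residues.lower() if molecule == "nucleotide" else residues.upper()
    return sum(1 for symbol in text if symbol != PLACEHOLDERS[molecule])


def meets_length_requirement(residues, molecule):
    """True when the sequence reaches the paragraph 7 length threshold."""
    return count_specifically_defined(residues, molecule) >= THRESHOLDS[molecule]


print(meets_length_requirement("acgtnnacgt", "nucleotide"))  # 8 defined residues
print(meets_length_requirement("HHHHHHQ", "amino acid"))     # 7 defined residues
```

Under this sketch, a 2’ O-methyl adenosine entered as “a” counts toward the threshold, while a norleucine entered as “X” does not.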

NA/AA9. If there is an abbreviation for a modified residue in ST.26 Annex I, Table 2 or Table 4, is it mandatory that they are used as the value for the mod_base qualifier (for nucleotide sequences) or as the value of the note qualifier (for amino acid sequences), or can the unabbreviated residue name be used instead?

Modified nucleotides are annotated using the feature key “modified_base” and the “mod_base” qualifier. The only permitted values for the mod_base qualifier are those abbreviations listed in Annex I, Table 2. If a particular modified nucleotide does not have an abbreviation listed in Annex I, Table 2, then the value “OTHER” must be used for the mod_base qualifier, along with an additional note qualifier that includes the complete, unabbreviated name of the modified residue. If the modified nucleotide does have an abbreviation listed in Annex I, Table 2, it is recommended (but not required) that this abbreviation be used for the value of the mod_base qualifier instead of using “OTHER” along with a note qualifier.

Modified amino acids are annotated using the feature keys “MOD_RES”, “SITE”, “CARBOHYD”, or “LIPID” along with a note qualifier. The value of the note qualifier must be an abbreviation listed in Annex I, Table 4, or the complete, unabbreviated name of the modification. If the modified amino acid in question has an abbreviation listed in Annex I, Table 4, it is recommended (but not required) that the abbreviation be used for the value of the note qualifier, rather than the unabbreviated name. Note that abbreviations not listed in Annex I, Table 4 are not permitted as values for the note qualifier.

NA/AA10. What qualifiers are mandatory for branched or cyclic sequences?

Linear regions of branched sequences that meet the requirements of WIPO Standard ST.26, paragraph 7 (i.e., 10 or more specifically defined nucleotides or 4 or more specifically defined amino acids) must be included in a sequence listing. “Modified residues” must be annotated according to paragraphs 17 and 30.

If the cyclization of a sequence results in a “modified nucleotide” or “modified amino acid,” then that residue must be annotated. However, if cyclization of a sequence occurs via normal phosphodiester bonds (for nucleic acids) or peptide bonds (for proteins), then an annotation is not required.

The requirements for annotation of branched sequences will depend on the nature of the branching. For example, branching that occurs via linkages to amino acid side chains will result in a “modified residue”, and therefore require annotation. This concept is exemplified in Example 7(b)-3 of the Guidance Document in Annex VI of WIPO Standard ST.26. Within this example, the lysine in peptide 1 must be annotated because its side chain was modified to be linked to the C terminus of another peptide. This lysine residue does not use conventional peptide linkages to conjugate the other peptide sequence. OF NOTE: the glycine in peptide 2 in this example does not need to be annotated because it is simply linked to the other peptide using a conventional peptide linkage, and the chemical structure of the glycine has not changed (refer to the answer to question NA/AA11, below, for more detail on terminal modifications).

It is always recommended that feature keys and qualifiers are included to indicate the nature of branching, crosslinks, or circularization, even if they are not required.

NA/AA11. Is annotation of “end” modifications (N- or C-terminal modifications in protein sequences or 5’ or 3’ modifications in nucleotide sequences) mandatory under ST.26?

WIPO Standard ST.26 requires that modified residues are annotated per paragraph 17 for nucleotides and per paragraph 30 for amino acids. If the end modification results in a modified nucleotide or modified amino acid, then it must be annotated.

Terminal modifications within a sequence may or may not change the chemical structure of the terminal residue. One must look at the terminal modification and determine whether the modification changes the chemical structure of the residue such that the residue falls outside the exceptions set forth within WIPO Standard ST.26, paragraphs 3(e) and 3(f). For example, a peptide in which the C terminal residue is linked to a structure (such as part of a branched sequence – see peptide #2 in example 7(b)-3) via a conventional amide linkage is not considered a modified residue and therefore is not required to be annotated. Similarly, a peptide in which the N terminal residue is amide bonded to biotin is not considered a modified residue and therefore is not required to be annotated. In both scenarios, the structure of the residue involved in the C-terminal or N-terminal linkage is not changed from the conventional amino acids recited in paragraph 3(e) of the Standard.

In contrast, terminal modifications that change the chemical structure of the residue are considered “modified residues” and must be annotated. For example, the methylation of the C-terminus in Example 3(c)-1 does change the chemical structure of the terminal residue, since the methyl group replaces the hydroxyl normally found at the alpha carboxyl group. Therefore, this methylated lysine must be annotated as a “modified residue”.

OF NOTE: it will be up to the applicant to evaluate each terminal residue modification within an enumerated sequence and decide whether the structure of the terminal residue is changed. If the modified residue structure is different from the conventional amino acids or nucleotides recited in paragraphs 3(e) and 3(f) of the Standard, then the modification must be annotated.

Finally, it is always recommended that applicants include as much information as reasonable in their sequence listings to represent their disclosures as accurately as possible. Therefore, even if an end modification isn’t required to be annotated, it should be included.

Formatting sequences for WIPO Standard ST.26 – specifically amino acid sequences

AA1. Why is there a specific amino acid symbol for leucine or isoleucine (J), but not for other similar amino acids such as serine and threonine?

ST.26 was designed to ensure sequence data was in a format compliant with INSDC (International Nucleotide Sequence Database Collaboration) requirements. Therefore, the amino acid symbols permitted in ST.26 are the amino acid symbols defined by the INSDC. The INSDC does not have a symbol for Ser/Thr, but does have a symbol “J” for Leu/Ile.

AA2. If there is an X in an amino acid sequence that can be “any amino acid”, must I use a feature key with a note qualifier indicating that it is “any amino acid”?

WIPO Standard ST.26 defines a default value for “X” in paragraph 27:

The symbol “X” will be construed as any one of “A”, “R”, “N”, “D”, “C”, “Q”, “E”, “G”, “H”, “I”, “L”, “K”, “M”, “F”, “P”, “O”, “S”, “U”, “T”, “W”, “Y”, or “V”, except where it is used with a further description in the feature table.

If an “X” is equal to the default value, then no feature key with a note qualifier is required. If an “X” is equal to anything other than the default value, then a feature key with a note qualifier describing the variable is required. OF NOTE: the value “any amino acid” is broader than the default value as defined in WIPO Standard ST.26, so if any possible amino acid is permitted at the position, a feature key with a note qualifier is required.
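The comparison described above can be sketched in a few lines of Python. This is an illustration only: the helper name `x_needs_annotation` is hypothetical, and it assumes the residues permitted at the position can be written as one-letter symbols. A position that allows "any amino acid" cannot be written that way and always needs the annotation.

```python
# Default meaning of "X", copied from the ST.26 paragraph 27 text quoted above.
DEFAULT_X = frozenset("ARNDCQEGHILKMFPOSUTWYV")


def x_needs_annotation(permitted):
    """Return True when an "X" position needs a feature key with a note qualifier.

    `permitted` is an iterable of one-letter amino acid symbols allowed at the
    position. Hypothetical helper, not part of WIPO Sequence.
    """
    return frozenset(permitted) != DEFAULT_X


# "X" meaning exactly the default set needs no annotation.
print(x_needs_annotation("ARNDCQEGHILKMFPOSUTWYV"))
# "X" meaning only Ser or Thr differs from the default and must be described.
print(x_needs_annotation("ST"))
```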

AA3. How should a modified amino acid residue be represented in a sequence listing if the modification is not listed in Annex I, Table 4?

A modified amino acid in a sequence should be represented with the corresponding unmodified amino acid whenever possible. Where a modified amino acid cannot be represented by a specific, one-letter symbol as found in Annex I, Table 3 (see WIPO Standard ST.26), the modified amino acid must be represented by the symbol “X.” The modified amino acid must then be annotated using a “MOD_RES” feature key for post-translational modifications, or a “SITE” feature key for modifications that are not post-translational. Both of these feature keys require a note qualifier that must indicate the nature of the modification as its value. Annex I, Table 4 lists abbreviations of common modifications for use in the “note” qualifier. When a modified amino acid is not listed with an abbreviation in ST.26, Annex I, Table 4, the value for the note qualifier should be the complete, unabbreviated name of the modified amino acid (see paragraph 30).

AA4. What is the definition of a “branched amino acid” sequence?

A branched amino acid sequence is one where one or more amino acid sequences “branch” off the main peptide backbone via conjugation to an amino acid side chain of an amino acid within the main peptide backbone. For example, branched sequences can include those sequences where one or more amino acid sequences are linked to a main peptide backbone using a peptide bond linkage to an amine group on an amino acid side chain. Branched sequences may also include cyclic peptides with a tail, where all the bonds adjacent to the amino acid from which the tail emanates are normal peptide bonds. A representative example of a branched amino acid sequence can be found in WIPO Standard ST.26, Annex VI, Example 7(b)-3.

AA5. If I have a branched amino acid sequence where the main amino acid sequence has 4 or more specifically defined amino acids and one branch sequence has <4 specifically defined amino acids, I understand that the branch sequence cannot be included in the sequence listing as a separate sequence. How can I annotate the branch sequence such that the structure is adequately described?

If a branched amino acid sequence has a branch sequence with <4 specifically defined amino acids, that particular branch cannot be included in the sequence listing as a separate sequence. However, the sequence listing must include any linear region of the branched amino acid sequence that contains 4 or more specifically defined amino acids.

Where the branched amino acid sequence included in the sequence listing links to a branch with <4 specifically defined amino acids, the amino acid (of the branched amino acid sequence) involved in the linkage must be described using the feature key “SITE” and note qualifier. The note qualifier indicates that the residue is “bonded to a peptide of the sequence ”.

The feature key “MOD_RES” is used only for post-translational modifications. If the linkage to the branch with <4 specifically defined amino acids is a result of a post-translational modification, then the feature key “MOD_RES” must be used with a note qualifier for the amino acid that links the branch with <4 specifically defined amino acids.

Formatting sequences for WIPO Standard ST.26 – specifically nucleotide sequences

NA1. I want to include a coding sequence (CDS) that is only a fragment of a larger CDS in my sequence listing. How do I represent this sequence when it does not begin with a start codon or end with a stop codon?

CDS features should begin with the start codon and must end with the stop codon (see WIPO Standard ST.26, paragraph 89). However, it is possible to correctly represent a CDS feature that is a fragment of a larger CDS and does not include a start codon or a stop codon.

To include a CDS that does not include the start codon, use the symbol “<” before the first position in the CDS location. For example, the location “<1..200” indicates that the CDS begins at some residue previous to residue 1 and continues to and includes residue 200.

To include a CDS that does not include the stop codon, use the symbol “>” before the last position in the CDS location. For example, the location “1..>200” indicates that the CDS continues beyond residue 200.

A CDS that is missing both the start codon and the stop codon can include the “<” and the “>” symbols. The location “<1..>200” indicates that the CDS begins at some residue previous to 1 and continues beyond residue 200.
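The three location patterns above can be produced by one small helper. This is a minimal sketch; `cds_location` and its parameter names are hypothetical and do not come from WIPO Sequence. Because “<” and “>” are reserved characters in XML, anyone writing the XML by hand (rather than through a tool that escapes them automatically) would need to escape them, which the standard-library `escape` function does.

```python
from xml.sax.saxutils import escape


def cds_location(first, last, has_start_codon=True, has_stop_codon=True):
    """Build a CDS location string, marking a missing start or stop codon.

    Hypothetical helper for illustration; positions are 1-based residue numbers.
    """
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    start = str(first) if has_start_codon else f"<{first}"
    end = str(last) if has_stop_codon else f">{last}"
    return f"{start}..{end}"


fragment = cds_location(1, 200, has_start_codon=False, has_stop_codon=False)
print(fragment)          # location as described in the answer above
print(escape(fragment))  # the same location escaped for use as XML text
```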

NA2. What is the difference between “uracil in DNA or thymine in RNA” (ST.26 paragraph 19) and a “combined DNA/RNA molecule” (ST.26 paragraph 55)?

WIPO Standard ST.26, paragraph 19, refers to “uracil in DNA” and “thymine in RNA.” The requirements of this paragraph apply when:

i) The backbone of the nucleotide sequence (the sugar-phosphate backbone) is DNA, but one or more nucleobases are uracil; OR

ii) The backbone of the nucleotide sequence is RNA, but one or more nucleobases are thymine.

These residues are considered “modified nucleotides” and must be annotated according to paragraph 19.

In contrast, paragraph 55 refers to a “combined DNA/RNA molecule”. The requirements of paragraph 55 apply when a single nucleotide sequence has a backbone that is partially DNA and partially RNA.

NA3. When annotating a DNA/RNA hybrid molecule, must I specifically identify DNA segments using a “misc_feature” feature key even though the molecule type is DNA?

Where an application discloses a DNA/RNA hybrid sequence, the sequence must be included in the sequence listing with the molecule type “DNA”, and the value for the mandatory “mol_type” qualifier of the “source” feature key is “other DNA”. WIPO Standard ST.26, paragraph 55, specifically requires that “[e]ach DNA and RNA segment of the combined DNA/RNA molecule must be further described with the feature key “misc_feature” and the qualifier “note”, which indicates whether the segment is DNA or RNA.” Thus, in a DNA/RNA hybrid molecule, the DNA segments must be indicated with a “misc_feature” despite the fact that the molecule type is DNA. This will ensure a complete and accurate description of every segment in the molecule.
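As a rough sketch of the paragraph 55 requirement, the Python below turns a list of segments into one “misc_feature” record per segment, including the DNA segments. The function name, the dictionary keys, and the example positions are all hypothetical; the sketch only models the annotations and does not write ST.26 XML.

```python
def hybrid_segment_features(segments):
    """Return one misc_feature record per DNA or RNA segment.

    `segments` is a list of (first, last, kind) tuples with kind "DNA" or "RNA".
    Hypothetical data model for illustration only.
    """
    features = []
    for first, last, kind in segments:
        if kind not in ("DNA", "RNA"):
            raise ValueError(f"unexpected segment type: {kind!r}")
        features.append({
            "feature_key": "misc_feature",
            "location": f"{first}..{last}",
            "note": kind,  # the note states whether the segment is DNA or RNA
        })
    return features


# Hypothetical 30-residue hybrid: DNA, then RNA, then DNA again.
for feature in hybrid_segment_features([(1, 10, "DNA"), (11, 20, "RNA"), (21, 30, "DNA")]):
    print(feature)
```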

NA4. How should I annotate a nucleotide sequence with one or more phosphorothioate internucleotide linkages?

A nucleotide sequence with one or more phosphorothioate internucleotide linkages can be annotated as a nucleotide analog under ST.26. The residues linked with phosphorothioate bonds must be identified using a “modified_base” feature key, a “mod_base” qualifier with the value “OTHER”, and a note qualifier that describes the phosphorothioate bond.

A single phosphorothioate bond between two adjacent residues can be described in a “modified_base” feature with the location format x^y, where x and y are the positions of the residues linked by the phosphorothioate bond. A region of contiguous residues that are all linked by phosphorothioate bonds can be described in a “modified_base” feature with the location format x..y, where x is the first residue in the region and y is the last residue in the region.
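The choice between the two location formats can be sketched as follows. The helper name `phosphorothioate_location` is hypothetical, and it assumes the caller passes the first and last residue of a run in which every adjacent pair is joined by a phosphorothioate bond.

```python
def phosphorothioate_location(first, last):
    """Return the location string for a run of phosphorothioate-linked residues.

    Two adjacent residues (a single bond) use the x^y format; a longer
    contiguous region uses the x..y format. Hypothetical helper.
    """
    if first < 1 or last <= first:
        raise ValueError("expected 1 <= first < last")
    if last == first + 1:
        return f"{first}^{last}"
    return f"{first}..{last}"


print(phosphorothioate_location(5, 6))   # single bond between residues 5 and 6
print(phosphorothioate_location(5, 12))  # residues 5 through 12 all linked
```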

Creating an ST.26-compliant sequence listing XML using the WIPO Sequence tool

C1. Can I use PatentIn to create an ST.26-compliant sequence listing XML? If not, what can I use?

No. PatentIn does not create sequence listing XMLs that comply with ST.26 (37 C.F.R. 1.831-1.835).

WIPO Sequence, developed by WIPO with assistance from numerous IP offices including the USPTO, is a global, free-to-download software tool to assist applicants in creating ST.26-compliant sequence listing XMLs. Be sure to visit WIPO Sequence regularly to ensure you have the most up-to-date version of the software.

C2. Am I required to use WIPO Sequence to create my sequence listing XML?

No. Applicants are not required to use WIPO Sequence to create a sequence listing XML, but its use is highly recommended. You may create your sequence listing XML using any software you wish, but it must be valid according to the Document Type Definition (DTD) in Annex II of WIPO ST.26 and comply with WIPO ST.26.

When using WIPO Sequence, applicants must make certain that the latest version is used to generate a sequence listing XML. Using an outdated version of WIPO Sequence may result in a non-compliant sequence listing. See WIPO Sequence Suite for the most up-to-date version.

C3. Where is the list of organisms provided in WIPO Sequence derived from?

The pre-defined list of organisms in WIPO Sequence is derived from the scientific names at the species level and at the genus level as listed in the Integrated Taxonomic Information System (ITIS) database and in the International Committee on Taxonomy of Viruses (ICTV) Master Species List. This list will be updated annually.

C4. When I import an ST.25 sequence listing into WIPO Sequence, will the tool automatically convert “u” symbols to “t” symbols in nucleotide sequences?

When importing an RNA sequence from an ST.25 sequence listing, all “u” symbols will be replaced by “t” symbols. This change will be noted in the import report.

When importing a DNA sequence from an ST.25 sequence listing, any “u” symbols will be maintained and will NOT be converted to “t”. WIPO Sequence cannot determine if a “u” in a DNA sequence is a modified residue (a uracil nucleobase on a DNA backbone) or an RNA segment of a DNA/RNA hybrid molecule. Therefore, maintaining the “u” symbols in DNA sequences will result in errors when the project is validated. These errors will require the user to manually change the “u” symbols to “t” symbols and prompt the inclusion of required feature keys and qualifiers that explain the presence of uracil in a DNA sequence.
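The import behavior described above can be modeled in a few lines. This is an illustrative sketch only, not WIPO Sequence's actual code; the function name and the wording of the notes are hypothetical.

```python
def import_st25_nucleotides(residues, molecule_type):
    """Model the "u" handling described above; return (sequence, notes).

    RNA: every "u" becomes "t" and the change is noted.
    DNA: "u" symbols are kept and flagged for manual review.
    """
    notes = []
    if molecule_type == "RNA":
        replaced = residues.count("u")
        if replaced:
            notes.append(f"replaced {replaced} 'u' symbol(s) with 't'")
        return residues.replace("u", "t"), notes
    if molecule_type == "DNA":
        positions = [i for i, symbol in enumerate(residues, start=1) if symbol == "u"]
        if positions:
            notes.append(f"'u' kept at position(s) {positions}; manual review needed")
        return residues, notes
    raise ValueError("molecule_type must be 'DNA' or 'RNA'")


print(import_st25_nucleotides("acgu", "RNA"))
print(import_st25_nucleotides("acgu", "DNA"))
```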

C5. When I import an ST.25 sequence listing into WIPO Sequence to convert it to an ST.26 sequence listing, will the tool automatically highlight the <10 nucleic acid sequences and the <4 amino acid sequences so that I may remove them to be compliant with ST.26 (they had been permitted in ST.25)?

Not exactly. WIPO Sequence will import nucleotide sequences with fewer than 10 specifically defined residues and amino acid sequences with fewer than 4 specifically defined residues; however, they will automatically be marked as “intentionally skipped sequences” (WIPO Standard ST.26 paragraph 3(d)). Intentionally skipped sequences function as placeholders to preserve the numbering of sequences in the disclosure, but no sequence data will be present in the sequence listing. All sequences that are marked as “intentionally skipped sequences” will be listed in the import report.

OF NOTE: it is not necessary to remove intentionally skipped sequences before generating the sequence listing XML. These sequences will appear in the sequence listing as empty sequences (WIPO Standard ST.26 paragraph 58-59). If they are removed, be aware that all subsequent sequences will be renumbered. Take care to ensure that SEQ ID numbers in the application disclosure are updated appropriately.
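To predict which sequences in an ST.25 listing would be marked as skipped, the length thresholds can be checked ahead of import. The sketch below is not the tool's own logic. It assumes that a “specifically defined” residue is any nucleotide other than “n” and any amino acid other than “X”, and the function names are hypothetical.

```python
# Minimum counts of specifically defined residues stated in this FAQ.
MIN_DEFINED = {"nucleotide": 10, "amino_acid": 4}
# Assumption: these symbols are the ones that are NOT specifically defined.
UNDEFINED_SYMBOL = {"nucleotide": "n", "amino_acid": "X"}


def count_specifically_defined(residues, kind):
    """Count residues other than the undefined symbol for this sequence kind."""
    undefined = UNDEFINED_SYMBOL[kind]
    return sum(1 for symbol in residues if symbol != undefined)


def is_skipped_on_import(residues, kind):
    """Return True when the sequence falls below the minimum for its kind."""
    return count_specifically_defined(residues, kind) < MIN_DEFINED[kind]


print(is_skipped_on_import("acgtnnnnnn", "nucleotide"))  # only 4 defined residues
print(is_skipped_on_import("MKVL", "amino_acid"))        # exactly 4 defined residues
```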

C6. What is the file size limit for WIPO Sequence? How large of a sequence listing can I create?

WIPO has published approximate file size limits for import, XML generation, and validation on its WIPO Sequence and ST.26 Knowledge Base.

Always be sure to use the most recent version of WIPO Sequence to get the best performance when working with large numbers of sequences. The most up-to-date software can be downloaded at the WIPO Sequence Suite.

C7. In WIPO Sequence, what is the difference between the “originalFreeTextLanguageCode” attribute and the “nonEnglishFreeTextLanguageCode” attribute?

WIPO ST.26 allows qualifier values for language dependent qualifiers to be presented in two languages – English and a non-English language. The “nonEnglishFreeTextLanguageCode” attribute is used to indicate the language code of the non-English qualifier values included in a given sequence listing.

The “originalFreeTextLanguageCode” attribute is used to indicate the single language that the language dependent free text qualifier values were originally prepared in.

Consider the following scenario: an inventor prepares an initial patent application and sequence listing in the German language. Subsequently, a related PCT application is filed. The inventor wants to enter national phase in the United States, which requires English, and Korea, which requires Korean. A second sequence listing is prepared to accompany the PCT application and contains the language dependent free text qualifier values in English and Korean. In this second sequence listing, the “originalFreeTextLanguageCode” is “de” to indicate that the sequence listing was originally prepared in German; the “nonEnglishFreeTextLanguageCode” is “ko” to indicate that the non-English qualifier values are presented in Korean.
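The two attributes from this scenario can be read with Python's standard XML parser. The sketch below is an assumption-laden illustration: it assumes both attributes sit on the root element, the root element name `ST26SequenceListing` and the one-line sample are stand-ins, and a real sequence listing carries a DOCTYPE, further required attributes, and the sequence data itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for the second sequence listing above.
SAMPLE = (
    '<ST26SequenceListing originalFreeTextLanguageCode="de" '
    'nonEnglishFreeTextLanguageCode="ko"/>'
)


def free_text_languages(xml_text):
    """Return the two free text language codes found on the root element."""
    root = ET.fromstring(xml_text)
    return {
        "original": root.get("originalFreeTextLanguageCode"),
        "non_english": root.get("nonEnglishFreeTextLanguageCode"),
    }


print(free_text_languages(SAMPLE))
```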

C8. How does WIPO Sequence handle alternative start codons and alternative stop codons when generating translations of CDS regions?

WIPO Sequence will not automatically detect alternative start codons and alternative stop codons when generating a translation. However, alternative start codons and alternative stop codons can be indicated using the “transl_except” qualifier (see WIPO Standard ST.26, Annex I, section 6.77).

An alternative start codon should be annotated with a “transl_except” qualifier with the value “(pos:,aa:Met)”, where “” is the position of the codon in the format “x..y”.

An alternative stop codon should be annotated with a “transl_except” qualifier with the value “(pos:,aa:TERM)”, where “” is the position of the codon in the format “x..y”. Note that a stop codon must ONLY be in the last position of the CDS.
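As a small sketch of how such values are assembled, the helper below formats a “transl_except” value from a codon position and an amino acid abbreviation. The function name and the example positions are hypothetical; only the overall “(pos:x..y,aa:...)” shape is taken from the description above.

```python
def transl_except_value(first, last, amino_acid):
    """Format a transl_except qualifier value for the codon at first..last.

    `amino_acid` is the abbreviation to apply, e.g. "Met" for an alternative
    start codon or "TERM" for an alternative stop codon. Hypothetical helper.
    """
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    return f"(pos:{first}..{last},aa:{amino_acid})"


# Hypothetical CDS spanning residues 1..600.
print(transl_except_value(1, 3, "Met"))        # alternative start codon
print(transl_except_value(598, 600, "TERM"))   # alternative stop codon, last codon of the CDS
```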

C9. I am importing an ST.25 sequence listing into WIPO Sequence, and it contains hundreds of nucleotide sequences. Is there a fast and easy way to add the required “mol_type” qualifier value to all of these sequences in bulk?

WIPO Sequence allows bulk editing of the “mol_type” qualifier in nucleotide sequences.

After importing your ST.25 sequence listing, open up the newly created project, and scroll down to the “Sequences” section. Select the “Bulk Edit” button.

A bulk edit window will appear. In the “Type of bulk edit” drop down menu, choose “Qualifier molecule type.” In the “Select Range of Sequence IDs” box, enter the range of sequences that you want to bulk edit. OF NOTE: all of the sequences you enter here will be given the same qualifier molecule type. FURTHER NOTE: DNA sequences and RNA sequences have different qualifier molecule type options, so they must be done separately. Do not mix DNA and RNA SEQ ID numbers in the “Select Range of Sequence IDs” box.

In the “Molecule Type” drop down menu, choose either DNA or RNA. The molecule type should match the molecule type of the sequence ID numbers you entered into the “Select Range of Sequence IDs” box.

Select the desired qualifier molecule type from the “Qualifier Molecule Type” drop down menu, and click on “Edit sequences.” All of the sequences entered into the “Select Range of Sequence IDs” box will be given this qualifier molecule type.

You can repeat this bulk editing multiple times for each qualifier molecule type you want to select.

Validating an ST.26 sequence listing XML using WIPO Sequence

V1. Does WIPO Sequence have a counterpart like “Checker” (as in PatentIn and Checker) that will validate sequence listing XML files?

Yes. WIPO Sequence allows you to validate sequence listing XMLs that were previously created. On the main projects page of WIPO Sequence, click on “Validate Sequence Listing” in the upper right-hand corner of the screen, then use the pop-up box to browse to your XML file.

WIPO Sequence will validate project data before generating a sequence listing XML file. However, it is best practice to validate your XML file using the validation function of WIPO Sequence, described above, before filing.

V2. If an ST.26 sequence listing XML passes validation by the WIPO Sequence validator, are IP offices guaranteed to accept it? Are there any ST.26 requirements that the WIPO Sequence validator cannot check?

The validator that is integrated into the WIPO Sequence desktop authoring tool is able to check most, but not all, of the requirements of ST.26.

For example, the validator will list an error if a feature key is missing a mandatory qualifier or if a qualifier with a defined list of value choices has an inappropriate value. The validator will perform several checks on CDS features, ensuring that the value in the translation qualifier matches the theoretical translation of the CDS feature, taking into account any “transl_except”, “codon_start”, or “transl_table” qualifiers. It will also ensure that the value of the protein identified in the “protein_id” qualifier matches the value of the “translation” qualifier. The validator will also check location formats for feature keys to ensure the format is compliant. However, it is ultimately the responsibility of the applicant to ensure that a sequence listing accurately and completely describes sequences disclosed in a patent application.

The WIPO Sequence validator cannot guarantee an error-free sequence listing because certain values cannot be checked in an automated way and require human review. For example, if the user errs in the entry of a custom organism name, the validator will only generate a warning to the user to confirm that the entry is correct. By definition, custom organism names are not included in the comprehensive list of scientific organism names contained in WIPO Sequence and, accordingly, cannot be verified by the tool. During human review of the sequence listing after submission to an IP office, the error in the custom organism name could be found, resulting in a rejected sequence listing.

V3. Will WIPO Sequence generate a warning or error if feature keys or qualifiers are missing from modified residues?

WIPO Sequence will not detect that a residue is modified until the user adds a feature key and qualifier indicating the modification. For example, if a user wants to include an amino acid sequence with a single D-alanine in a sequence listing, the sequence that is “introduced” (typed-in, pasted in, or imported in) will simply have an “A” at the position of the D-alanine. Until the user adds a feature and qualifier indicating that the alanine is a D-alanine, WIPO Sequence will simply treat the “A” as an L-alanine.

If a user enters a feature key into a sequence, such as “modified_base”, but does not include the mandatory qualifier “mod_base”, WIPO Sequence will list an error on validation.

Variables “n” and “X” have default values under Standard ST.26 rules; therefore, if a user includes the variable “n” in a nucleotide sequence or the variable “X” in an amino acid sequence and does not include an annotation that describes the value of the “n” or “X” residue, WIPO Sequence will not list an error or warning on validation. It is perfectly acceptable to have these variable residues in a sequence and not include a description. The user must ensure that all “n” and “X” residues are properly described, if they are not equivalent to the default value.

V4. I validated my project and saw a warning about an organism name. Does that mean I cannot use that particular organism name?

Not necessarily. A warning about an unrecognized organism name simply means that the name entered is not already contained in the WIPO Sequence organism name list. It doesn’t mean that you cannot use that name. The organism name list integrated into WIPO Sequence is large, but it does not contain all valid scientific organism names. If you see this warning on a validation report, simply check that the organism name you entered is an accepted genus/species name or virus name. If you are confident that your organism name complies with the requirements of WIPO Standard ST.26 paragraphs 77-83, then you can safely ignore the warning.