1. How do you determine the exemplary drawing displayed on the front page?
2. How do you control live links embedded in patent applications?
3. How do I interpret the document number "0941"?
4. SIRs (Statutory Invention Registrations) have invalid patent numbers and external file names.
5. How do you determine that a patent is subject to a terminal disclaimer?
8. Should the B578US tag be used for multiple exemplary claims in a single instance, such as "1,20"?
10. How do you reconcile paragraph types across Red Book, Green Book, and Blue Book?
11. Where can I get the mathmlAlias and mathmlExtra entity files?
12. Should the paragraph level attribute be delimited with quotes?
13. Why did the line breaks change within SDOBI?
14. Why is the B130 tag empty?
15. Should the B210 field length be 8 or 10 characters?
16. Will italics be utilized within citations?
18. Does MathML support character pullouts?
21. Where are the character pullouts and diacritical characters?
22. Where do I get information regarding the TIFF image file specifications?
24. Is there any situation where B582 Field of Search may be legitimately missing?
25. Why is the SIR information missing?
26. Will Red Book utilize CML (Chemical Markup Language) similar to its MathML implementation?
28. What became of the LREP (Legal attorney or representative information) Green book tags?
30. Why are the Assignee Addresses empty?
31. How do I resolve/render the USPTO specific special characters?
33. Is the ENTRY element being used to identify when a new column begins in a row of a CALS Table?
34. Why do Reissue patents sometimes drop claims or group claims?
35. Why is highlighting sometimes incorrect around numeric data?
36. There are several problems with the B570 data element.
37. Why do Design Patents sometimes appear as "D. 99999", "D0099999", or "Des. 99999"?
39. What are the differences between versions 1.8 and 1.9 of the Red Book DTDs?
40. Will math and tables be internal or external files?
42. Are the third party ML's included in your DTD (i.e. CALS MILS-M-28001 TABLEPAK) XML-compliant?
45. Does the USPTO still want Blue Book users to switch to Green Book?
46. Within CWUs are embedded images (EMIs) optional rather than a necessary feature of the patent?
47. Is it possible to receive a sample of a re-issue of a re-issue?
48. Were the 13 Red Book examples used in production?
50. What model of DLT tape is going to be used to create Red Book? Why not use CD-ROM?
52. What is this "FOR" class code? As in 320FOR123?
56. Where is the "Rule 47 provision" in the Red Book?
57. In tables how are column widths measured?
58. Since doctype declarations are not allowed in DTDs, why is it in the version 1.8 DTD?
60. How do you interpret the terminal disclaimer statements?
61. Element B540 has incorrect highlighting.
Red Book does not currently identify or include the exemplary drawing. In addition, the exemplary drawing differs from other drawings in that it is a single image consisting of one or more figures with the figure reference(s) removed.
As of Build 20000307 the exemplary drawing is included within the SDODR element and is identified by a sequence number of all zeroes, as in US06037034-20000314-D00000.TIF for the exemplary drawing and US06037034-20000314-D00001.TIF for the first drawing. A future release of the Red Book DTD will also include an element to identify the existence of an exemplary drawing.
Prior to tagging the patent application data, "&", "<", and ">" are converted to "&amp;", "&lt;", and "&gt;" respectively in order to avoid conflicts with the element and entity tagging syntax. This conversion also deactivates embedded URLs when the data is rendered within an HTML browser. This conversion is not performed for Blue Book data.
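The conversion described above can be sketched in a few lines of Python. The function name is hypothetical and the actual Red Book conversion routine is not shown in this FAQ; the sketch only illustrates why the ampersand has to be converted before the angle brackets.

```python
def escape_markup(text: str) -> str:
    """Replace the three markup-significant characters with entity references."""
    # "&" must be handled first; otherwise the "&" introduced by "&lt;" and
    # "&gt;" would itself be converted a second time.
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    return text


if __name__ == "__main__":
    # An embedded link is no longer live markup after the conversion.
    print(escape_markup('see <a href="http://example.com/?a=1&b=2">'))
```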
This is a software error within Red Book related application processing and is not a data capture problem. This problem was identified and fixed several months ago in Red Book conversion routine relapp.pl, but does exist within some of the distributed sample data. Reference Build 20000307 for corrected sample data.
The problem was corrected 2000/03/06 in Red Book conversion routine cleanup.xom. Reference Build 20000307 for corrected sample data. A similar problem may exist with Design SIR (Constant "HD" followed by a 6 digit numeric, right justified, with leading zeros), and Plant SIR (Constant "HP" followed by a digit numeric, right justified, with leading zeros).
Terminal disclaimer information is captured and retained by the data-capture contractor, but was not passed on to Red Book generation. The data-capture contractor corrected this problem in Build 20000307.
This problem was detected as part of the Red Book to Green Book conversion, and was corrected in January 2000 in Red Book conversion routine fdc.xom. Reference Build 20000307 for corrected sample data.
As part of the data capture process, "plus E" codes are used to define composition type attributes for one or more characters. The leading code is of the form +E,xxx where xxx defines the composition type attribute, and the terminating code is +EE. Plus E codes can also be embedded within one another, so the first +EE occurrence may not be the terminator paired with a given leading code.
Since these codes span a set of characters, the initial Red Book conversion simply replaced the character set and codes with a descriptive text as described below:
- +E,uml convert to text of the form: {umlaut over (c...c)}
- +E,acu convert to text of the form: {acute over (c...c)}
- +E,gra convert to text of the form: {grave over (c...c)}
- +E,cir convert to text of the form: {circumflex over (c...c)}
- +E,dot convert to text of the form: {dot over (c...c)}
- +E,otl convert to text of the form: {tilde over (c...c)}
- +E,utl convert to text of the form: {tilde under (c...c)}
- +E,hac convert to text of the form: {haeck over (c...c)}
- +E,rar convert to text of the form: {right arrow over (c...c)}
- +E,ovs convert to text of the form: {overscore (c...c)}
- +E,dos convert to text of the form: {double overscore (c...c)}
- +E,crc convert to text of the form: {circle around (c...c)}
- +E,rad convert to text of the form: {square root over (c...c)}
- +E,fra convert to text of the form: {fraction (c...c/...c)}
However, for a single (or repeated) character set, there may be simpler character mappings such as ö instead of {umlaut over (o)}, or ää instead of {umlaut over (aa)}. But there are many characters with simple implementations, and addressing all cases is not achievable at this time. Consequently, simpler conversions will be implemented as they are identified. If you have encountered additional simple character mappings that need to be updated, please identify the existing Red Book text and the recommended replacement text (SGML entities) and send them to USPTO Red Book Contacts. Reference Build 20000307 for "{umlaut over (o)}" corrected sample data. "{umlaut over (aa)}" has yet to be implemented.
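For readers who want to apply such simple mappings themselves, the following Python sketch rewrites the descriptive text into ISO Latin 1 entity references where one exists. It is not the Red Book conversion routine: the function name, the choice of accents handled, and the set of supported base letters are assumptions made for illustration, and anything outside that set is left untouched.

```python
import re

# Accent name used in the descriptive text -> ISO entity-name suffix.
SUFFIX = {"umlaut": "uml", "acute": "acute", "grave": "grave",
          "circumflex": "circ", "tilde": "tilde"}

# Base letters for which an ISO Latin 1 entity exists with each suffix.
SUPPORTED = {
    "uml": set("aeiouyAEIOU"),
    "acute": set("aeiouyAEIOUY"),
    "grave": set("aeiouAEIOU"),
    "circ": set("aeiouAEIOU"),
    "tilde": set("anoANO"),
}

PATTERN = re.compile(
    r"\{(umlaut|acute|grave|circumflex|tilde) over \(([^()]+)\)\}")


def simplify(text: str) -> str:
    """Replace e.g. '{umlaut over (o)}' with '&ouml;' when an entity exists."""
    def replace(match: re.Match) -> str:
        suffix = SUFFIX[match.group(1)]
        chars = match.group(2)
        if all(c in SUPPORTED[suffix] for c in chars):
            # A repeated character set such as (aa) becomes one entity each.
            return "".join(f"&{c}{suffix};" for c in chars)
        return match.group(0)  # no simple mapping: keep the descriptive text
    return PATTERN.sub(replace, text)


if __name__ == "__main__":
    print(simplify("K{umlaut over (o)}ln and {umlaut over (aa)}"))
```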
Each claim number should be tagged separately with B578US. This is a conversion oversight that was corrected in January 2000 in Red Book conversion routine fdc.xom. Reference Build 20000307 for corrected sample data.
No, the B583US tags should be used instead. This is a conversion oversight that was corrected in January 2000 in Red Book conversion routine fdc.xom. Reference Build 20000307 for corrected sample data.
Red Book was not accurately tracking paragraph types (specifically, PAL/+PS and PA5/+P5 codes) and has been modified as follows:
Blue Book   Red Book          Green Book
"+P "       <PARA LVL="0">    PAR
"+P0 "      <PARA LVL="1">    PA0
"+P1 "      <PARA LVL="2">    PA1
"+P2 "      <PARA LVL="3">    PA2
"+P3 "      <PARA LVL="4">    PA3
"+P4 "      <PARA LVL="5">    PA4
"+P5 "      <PARA LVL="6">    PA5
"+PS "      <PARA LVL="7">    PAL
"+PA "      <PARA LVL="0">    PAR (abstract paragraph; within Red Book it is associated with the <SDOAB> tag)
"+CL "      <H LVL="1">       PAC (centerline/header)
This required a DTD change (implemented within V1.9) as follows:
<!ATTLIST PARA ID ID #IMPLIED LVL (0 | 1 | 2 | 3 | 4 | 5 | 6 | 7) #IMPLIED>
The paragraph problem was corrected within the Red Book conversion routine fdc.xom. Blue Book +P5 code (LVL 6) is currently not used. Reference Build 20000307 for corrected sample data.
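The paragraph-type correspondence above is a plain lookup, which can be sketched in Python as follows. The dictionary contents come from the mapping in this answer; the variable and function names are hypothetical.

```python
# Blue Book code -> (Red Book start tag, Green Book tag).
# The Blue Book codes include their trailing space, as quoted in this FAQ.
PARAGRAPH_TYPES = {
    "+P ":  ('<PARA LVL="0">', "PAR"),
    "+P0 ": ('<PARA LVL="1">', "PA0"),
    "+P1 ": ('<PARA LVL="2">', "PA1"),
    "+P2 ": ('<PARA LVL="3">', "PA2"),
    "+P3 ": ('<PARA LVL="4">', "PA3"),
    "+P4 ": ('<PARA LVL="5">', "PA4"),
    "+P5 ": ('<PARA LVL="6">', "PA5"),
    "+PS ": ('<PARA LVL="7">', "PAL"),
    "+PA ": ('<PARA LVL="0">', "PAR"),  # abstract paragraph, under <SDOAB>
    "+CL ": ('<H LVL="1">', "PAC"),     # centerline/header
}


def red_book_tag(blue_book_code: str) -> str:
    """Return the Red Book start tag for a Blue Book paragraph code."""
    return PARAGRAPH_TYPES[blue_book_code][0]


def green_book_tag(blue_book_code: str) -> str:
    """Return the Green Book tag for a Blue Book paragraph code."""
    return PARAGRAPH_TYPES[blue_book_code][1]


if __name__ == "__main__":
    print(red_book_tag("+PS "), green_book_tag("+PS "))
```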
The Red Book Catalog file identifies the files required to parse and validate Red Book, including mathmlAlias and mathmlExtra. Starting with the issue 2000-03-28, the data-capture contractor will also include all of the DTD and entity files used to generate Red Book within the delivered issue tape in directories DTDS and ENTITY. The Red Book Catalog file is included within the DTD directory.
Yes. Claim steps, paragraphs, and header attributes were not being delimited (embedded within double quotes). The problem was corrected within the Red Book conversion routine fdc.xom. Reference Build 20000307 for corrected sample data.
Line breaks between tag sets are included for readability within an ASCII editor and have no impact on the data's SGML/XML validity. However, we do appreciate the potential impact on import/translation utilities. If they continue to be a problem, then we suggest that you remove all newlines from the document instance prior to import, thus ensuring a consistent format.
Removing the line breaks will be more challenging when sequence listing data is included in tagged form (as opposed to the current CALS table format). This is because the listing data includes newlines as part of the data, the removal of which will significantly alter the intended display layout of the data.
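One cautious way to normalize an instance before import is to remove only the newlines that sit between one tag and the next, leaving any newline inside data content alone. The Python sketch below takes that approach; it is a narrower variant of the "remove all newlines" suggestion above, and the function name is hypothetical.

```python
import re

# A tag end, optional whitespace containing at least one newline, a tag start.
BETWEEN_TAGS = re.compile(r">[ \t\r]*\n\s*<")


def strip_inter_tag_newlines(instance: str) -> str:
    """Remove line breaks that separate two tags; keep newlines inside data."""
    return BETWEEN_TAGS.sub("><", instance)


if __name__ == "__main__":
    sample = "<A>\n<B>line one\nline two</B>\n</A>"
    print(repr(strip_inter_tag_newlines(sample)))
```

Because newlines that are part of the data are preserved, this approach should remain usable when sequence listing data is delivered in tagged form, although that has not been verified against such data.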
This is a problem in fdc.xom resulting from not testing for all document kinds, and has been corrected. Reference Build 20000307 for corrected sample data.
For U.S. patent applications, the document number is a fixed length of eight positions. The first two positions are the series code and the following six positions are the serial number, left padded with zeros. Two programs (fdc.xom and relapp.pl) were modified to include the series code (with leading zero if required) within the B210 tag. Reference Build 20000307 for corrected sample data.
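The eight-position layout can be illustrated with a short Python sketch. The function name and the sample values are hypothetical; only the two-position series code and the six-position zero-padded serial number come from the description above.

```python
def format_b210(series_code: int, serial_number: int) -> str:
    """Build the 8-position application number: 2-digit series + 6-digit serial."""
    if not 0 <= series_code <= 99:
        raise ValueError("series code must fit in two positions")
    if not 0 <= serial_number <= 999999:
        raise ValueError("serial number must fit in six positions")
    # Both parts are left padded with zeros to their fixed widths.
    return f"{series_code:02d}{serial_number:06d}"


if __name__ == "__main__":
    print(format_b210(9, 123456))
```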
Within the typeset copies of a granted patent, italics, boldface and other highlighting are used to indicate text added or deleted as the result of amendments or other modifications to a patent. But within Red Book, changes are tracked utilizing the DEL-S, DEL-E, INS-S, and INS-E elements which stand for, respectively, delete start, delete end, insert start, and insert end. When rendering a Red Book document instance, the styling application will be responsible for applying the highlighting and inserting required text delimiters such as "[" and "]". This allows the highlighting of Red Book content such as bold, underscore, italics, etc. without conflicting with amendment processing. With respect to citations, content will be transcribed exactly as presented within the application.
Be advised that the next Red Book DTD release will implement INSERT and DELETE elements that have both start and end tags. This differs from the current implementation that utilizes 4 empty elements with a terminating element pointing to the initiating element.
B861 contains a DOC structure, which contains, among other elements, the KIND element. The KIND element should always contain either a kind code in conformance with WIPO Standard ST.16, or text describing the document kind. "00" has no defined meaning for this element.
The kind code is captured by the data-capture contractor, but not passed on to Red Book generation. If the PCT application has yet to be published, 00 is used. The data-capture contractor corrected this problem within Build 20000307.
Unfortunately not, and consequently the tools used by the data-capture contractor are unable to accurately export character pullouts. The data-capture contractor uses Mathematica to generate the math complex work units (CWU). When a CWU contains custom characters (characters not defined within an existing entity file) the character is scanned and brought into Mathematica as a TIFF file. The TIFF image is integrated into the exported notebook and EPS files, but not within the exported MathML, since that would create parsing errors within the MathML markup.
The data-capture contractor and Mathematica have defined a means to capture character pullout content within MathML as defined entities within the document instance, but this will not likely be implemented until late April 2000, and problems will still remain with character pullouts within the test data published earlier.
The sample data initially distributed was converted to Red Book from Green Book. Since the citations are not "structured" in Green Book, the conversion software made no attempt to populate the various citation elements in Red Book. In addition, the Red Book DTD has since been changed to eliminate the structure for citations in favor of transcribing the citation exactly as presented in the application. This avoids errors in interpretation of the abbreviations and other aspects of a citation by the data-capture contractor. Consequently the ARTCIT, BOOKCIT, DBASECIT, OTHCIT, and subordinate elements have been replaced with simpler CIT, NCIT, and PCIT elements.
The data-capture contractor does not currently use Red Book data to compose the printed patent document. Nevertheless, it is the intention of the USPTO that Red Book will replace Blue Book as a complete and reliable source of the content of published patent grants. Obviously, we still have some way to go to achieve that goal. With the continuing evolution of SGML/XML rendering tools it will also be possible to render the patent from the Red Book file with appropriate style sheets. Tests with current Red Book data and WordPerfect 9 indicate that it is possible to render a patent in a style that differs from the printed product in only minor ways.
The initial Red Book test data was feature poor, primarily because it was derived from Green Book data. The test data has steadily improved, with character pullouts (custom characters), diacritical characters, page pullouts, highlighting, chemical, math and table CWUs, etc., all implemented in 1999, and continuing with Sequence Listing data currently in development.
Tagged Image File Format (TIFF) Image Specifications
Image files will be Tagged Image File Format (TIFF) revision 6.0 with CCITT Group 4 compression. There will be only one TIFF image per file and no private data fields will be used. They are defined within a document instance using the embedded image element (EMI) as described below:
<EMI ID="EMI-nnnnnn" WI="mmm" HE="mmm" FILE="USxxxxxxxx-yyyymmdd-nnnnnn.TIF">
The embedded image record attributes identify the size of the image and link it to its physical location. The embedded image is referred to as a callout. An embedded image record will be created for each complex work unit (CWU) and each drawing sheet. Each embedded image is sequentially numbered from the beginning of the patent.
Embedded Image Record Attributes:
- ID="EMI-nnnnnn" is a required attribute that identifies the sequence of the image within the patent. nnnnnn is a 6-position sequence number.
- HE="mmm" is an optional attribute that identifies the height of the image in millimeters. mmm is variable length numeric.
- WI="mmm" is an optional attribute that identifies the width of the image in millimeters. mmm is variable length numeric.
- FILE="USxxxxxxxx-yyyymmdd-nnnnnn.TIF" is a required attribute that identifies the logical file name of the TIFF file that contains the embedded image. xxxxxxxx is the patent number, yyyymmdd is the issue date, and nnnnnn is a 6-position sequence number.
Each embedded image record must also have a corresponding entity record that associates the "USxxxxxxxx-yyyymmdd-nnnnnn.TIF" logical file name to the TIFF file. The entity record appears within the patent <!DOCTYPE > record and is created as follows:
<!ENTITY USxxxxxxxx-yyyymmdd-nnnnnn.TIF SYSTEM " USxxxxxxxx-yyyymmdd-nnnnnn.TIF " NDATA TIF>
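As an illustration of how the EMI attributes and the logical file name fit together, the following Python sketch pulls the attributes out of an EMI start tag and splits the FILE value into its parts. The function name and the sample tag values are hypothetical, and the check for the exemplary drawing relies on the all-zero sequence "D00000" mentioned earlier in this FAQ.

```python
import re

ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')
FILE_NAME = re.compile(
    r"US(?P<patent>.{8})-(?P<issued>\d{8})-(?P<seq>.{6})\.TIF")


def parse_emi(tag: str) -> dict:
    """Return the EMI attributes plus the parts of the logical file name."""
    attrs = dict(ATTRIBUTE.findall(tag))
    for required in ("ID", "FILE"):
        if required not in attrs:
            raise ValueError(f"EMI is missing required attribute {required}")
    match = FILE_NAME.fullmatch(attrs["FILE"])
    if match is None:
        raise ValueError(f"unexpected file name: {attrs['FILE']}")
    attrs.update(match.groupdict())
    # Assumption: the exemplary drawing carries the all-zero sequence D00000.
    attrs["exemplary"] = match["seq"] == "D00000"
    return attrs


if __name__ == "__main__":
    sample = ('<EMI ID="EMI-D00001" WI="100" HE="50" '
              'FILE="US06037034-20000314-D00001.TIF">')
    print(parse_emi(sample))
```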
Width/Height Initialization
Red Book currently hard codes the image width and height, even though the information is readily available within the header of the TIFF files. The next release of the Red Book DTD will omit the width and height attributes, requiring that the rendering tool interrogate the TIFF header for the units and file sizes.
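A rendering tool that has to obtain the size from the TIFF header itself could do so along the lines of the Python sketch below. It reads only the first image file directory and only the tags needed for the size (ImageWidth, ImageLength, XResolution, YResolution, ResolutionUnit, as defined in TIFF 6.0). The function name is hypothetical, and the sketch assumes the resolution unit is inches or centimeters; it is not a general TIFF reader.

```python
import struct

# TIFF 6.0 tag numbers used below.
IMAGE_WIDTH, IMAGE_LENGTH = 256, 257
X_RESOLUTION, Y_RESOLUTION, RESOLUTION_UNIT = 282, 283, 296
MM_PER_UNIT = {2: 25.4, 3: 10.0}  # 2 = inch, 3 = centimeter


def tiff_size_mm(data: bytes):
    """Return (width_mm, height_mm) from the first IFD of a TIFF file."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    magic, ifd = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    (count,) = struct.unpack(order + "H", data[ifd:ifd + 2])
    tags = {RESOLUTION_UNIT: 2}  # TIFF default when the tag is absent
    for i in range(count):
        entry = data[ifd + 2 + 12 * i: ifd + 14 + 12 * i]
        tag, kind, _count = struct.unpack(order + "HHI", entry[:8])
        if kind == 3:    # SHORT: a single value sits in the first two bytes
            tags[tag] = struct.unpack(order + "H", entry[8:10])[0]
        elif kind == 4:  # LONG: a single value fills the four bytes
            tags[tag] = struct.unpack(order + "I", entry[8:12])[0]
        elif kind == 5:  # RATIONAL: the four bytes are an offset to num/den
            (offset,) = struct.unpack(order + "I", entry[8:12])
            num, den = struct.unpack(order + "II", data[offset:offset + 8])
            tags[tag] = num / den
    scale = MM_PER_UNIT[tags[RESOLUTION_UNIT]]
    return (tags[IMAGE_WIDTH] / tags[X_RESOLUTION] * scale,
            tags[IMAGE_LENGTH] / tags[Y_RESOLUTION] * scale)
```

Typical use would be `tiff_size_mm(open(path, "rb").read())`, where `path` names one of the delivered TIF files.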
Multiple Pages
There will be only one TIFF image per file. If a CWU image spans multiple pages, there will be one file per page.
Odd Byte Widths/Heights
Most likely yes. The standard viewers we have tested are able to display the images regardless of whether the width or height is an odd or even number of bytes.
Unfortunately no. The current Red Book generation software does not initialize this field. Term extension information under 35 USC 154 is captured and retained by the data-capture contractor, but not processed by Red Book generation. The data-capture contractor has corrected this problem in Build 20000307.
This is usually an omission by the examiner that was not caught by the data-capture contractor. Empty B582 tags are generated to avoid parsing errors. All US patent documents are required to have a field of search, and the data-capture contractor is required to return the file wrapper to the USPTO if the information is missing so that it can be supplied. The USPTO is currently investigating what method will be used to correct the defective documents. The data-capture contractor is revising procedures to define this condition as requiring immediate corrective action.
A SIR is identified by the B130 element containing an "H". In this case, the identical boiler plate text is included within the printed patent. Reference FAQ # 14 for additional discussions about the B130 element.
This is definitely in Red Book's future but will likely wait until CML stabilizes and Red Book migrates to XML. ChemDraw has already produced a Beta version of ChemDraw 6 and the data-capture contractor has been testing its CML export capabilities.
Yes, the data-capture contractor is currently working on converting the sequence-listing data contained within tables to the SEQ-LST CWU.
Element B740 (Identification of legal representation, that is, attorneys, agents, or representatives associated with the document) contains multiple B741 (Attorney address) tags, which in turn contains a PARTY-US tag that contains a name, optional address, optional place of residence, optional descriptive text, optional country of residence, and optional country of nationality. They will map to the AAT, AGT, REG, STR, CTY, STA, CNT, ZIP Green Book tags.
The mapping of related patent application documents between the Green/Blue Book records and the Red Book SGML DTD is somewhat complex and warrants further explanation. In the Green/Blue Book, the related application document (RLAP) records express a linked parent/child relationship as a sequence of record sets, with the relationship defined by the Parent Code (COD) in the record set. For example, given the relationship:
Document A is a continuation-in-part of document B and document C, each of which is a division of document D.
Green/Blue Book would encode this as follows:
- Document A is the base document (i.e., the one in the PATN record group).
- A series of (RLAP) record sets would then occur in the following order by Parent Code (COD):
- COD=72 (Continuation-in-part of) Document B
- COD=90 (and) Document C
- COD=92 (,each) No Document
- COD=84 (,which is a division of) Document D
In the Red Book DTD, by contrast, the Parent/Child relationship is expressed explicitly in the PARENT model group. The PARENT model group is expressed as (DNUM, PDOC, PSTA?, PPUB), corresponding to the Child document number, Parent Document, Status Code, and Publication Date respectively. Red Book would encode the above example as:
- <B632><DNUM>Document A</DNUM><PDOC><DNUM>Document B</DNUM></PDOC> ... and so on
- <B632><DNUM>Document A</DNUM><PDOC><DNUM>Document C</DNUM></PDOC> ... and so on
- <B620><DNUM>Document B</DNUM><PDOC><DNUM>Document D</DNUM></PDOC> ... and so on
- <B620><DNUM>Document C</DNUM><PDOC><DNUM>Document D</DNUM></PDOC> ... and so on
Thus, there is a correspondence between the tagging schemes, although it is not necessarily a simple one. There is a general relationship between the following Green/Blue Book and Red Book constructs:
Green/Blue Book Parent Code (COD)    Red Book Element
74, 84                               B620 (Division)
71, 81, 91                           B631 (Continuation)
72, 82                               B632 (Continuation-in-part)
73                                   B660 (Substitution)
86, 89                               Not mapped
92 (, each)                          Spawns multiple records (i.e., one for each document up to the previous record which is not an AND)
90 (And)                             Spawns multiple records of the previous type (e.g., multiple Continuations if the previous non-And record was a continuation)
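The expansion of a Green/Blue Book RLAP sequence into explicit Red Book parent/child records can be sketched in Python as shown below. This is an illustration, not the USPTO conversion code. It rests on two assumptions that this FAQ does not state: that codes in the 70s describe the base document, and that other mapped codes describe the document named just before them (or every document of the preceding "and" group when a COD 92 intervenes). All names are hypothetical.

```python
# Parent Code (COD) -> Red Book element, from the correspondence above.
COD_TO_ELEMENT = {
    74: "B620", 84: "B620",            # Division
    71: "B631", 81: "B631", 91: "B631",  # Continuation
    72: "B632", 82: "B632",            # Continuation-in-part
    73: "B660",                        # Substitution
}
AND, EACH = 90, 92


def expand_rlap(base, records):
    """Turn (cod, document) pairs into (element, child, parent) triples."""
    triples = []
    children = [base]  # documents the current relationship is stated about
    group = []         # parents named so far by the current relationship
    element = None
    each = False
    for cod, doc in records:
        if cod == EACH:
            each = True  # the next relationship applies to the whole group
            continue
        if cod == AND:
            if element is None:
                raise ValueError("COD 90 with no preceding relationship")
            group.append(doc)
        elif cod in COD_TO_ELEMENT:
            if cod // 10 == 7:
                children = [base]  # assumption: 7x codes describe the base
            elif group:
                children = group if each else group[-1:]
            element = COD_TO_ELEMENT[cod]
            group = [doc]
            each = False
        else:
            continue  # e.g. 86 and 89 are not mapped
        triples.extend((element, child, doc) for child in children)
    return triples


if __name__ == "__main__":
    for triple in expand_rlap("A", [(72, "B"), (90, "C"), (92, None), (84, "D")]):
        print(triple)
```

Run on the example from this answer, the sketch yields two B632 records for Document A (parents B and C) followed by two B620 records for Documents B and C (parent D), matching the Red Book encoding listed above.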
The data-capture contractor captures the assignee address information exactly as it appears on the PTOL85b form, and from nowhere else per current rules. If the address is incomplete or omitted, then that is the way it appears in the published patent.
The uspto.ent file contains a set of special characters that have Blue and Green Book codes, but no public SGML/XML entities. Listed below are the problem characters:
Green Book          Red Book            Blue Book
.dotbhalfcircle.    &Dotbhalfcircle;    # 511
.dotthalfcircle.    &Dotthalfcircle;    # 510
.dotlhalfcircle     &Dotlhalfcircle;    # 508
.dotrhalfcircle.    &Dotrhalfcircle;    # 509
.dottedcircle.      &Dottedcircle;      # 505
.lhalfcircle.       &Lhalfcircle;       # 503
.quadbond           &Quadbond;          # 185
.rhalfcircle.       &Rhalfcircle;       # 504
.centerline.        &Centerline;        # 551
.asterisk-pseud.    &Asteriskpseud;     # 553
.rect-ver-solid.    &Rectversolid;      # 563
.rect-solid.        &Rectsolid;         # 564
.oval-hollow.       &Ovalhollow;        # 571
.oval-solid.        &Ovalsolid;         # 572
.circle-solid.      &Circlesolid;       # 574
.h-slashed.         &Hslashed;          # 528
.paren-open-st.     &Parenopenst;       # 545
.paren-close-st.    &Parenclosest;      # 546
.brket-open-st.     &Brketopenst;       # 547
.brket-close-st.    &Brketclosest;      # 548
.BHorizBrace.       &Bothorzbrace;      # 507
.THorizBrace.       &Tophorzbrace;      # 506
The data-capture contractor has captured these characters as glyphs for rendering, and as of issue 20000328 they are now included within the entity directory distributed with the Red Book media.
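For users converting between the formats, the correspondence above amounts to a string substitution. The Python sketch below maps a few of the Green Book codes to their Red Book entities; the dictionary is deliberately a subset of the list above, and the names are hypothetical.

```python
import re

# A subset of the Green Book code -> Red Book entity correspondence.
GREEN_TO_RED = {
    ".dotbhalfcircle.": "&Dotbhalfcircle;",
    ".dottedcircle.": "&Dottedcircle;",
    ".lhalfcircle.": "&Lhalfcircle;",
    ".rhalfcircle.": "&Rhalfcircle;",
    ".centerline.": "&Centerline;",
    ".h-slashed.": "&Hslashed;",
}

# Longest codes first so that a longer code is never shadowed by a shorter one.
PATTERN = re.compile("|".join(
    re.escape(code) for code in sorted(GREEN_TO_RED, key=len, reverse=True)))


def green_to_red(text: str) -> str:
    """Replace known Green Book special-character codes with Red Book entities."""
    return PATTERN.sub(lambda match: GREEN_TO_RED[match.group(0)], text)


if __name__ == "__main__":
    print(green_to_red("ring .rhalfcircle. joined by .centerline."))
```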
No, but it does exist in many locations. The USPTO is investigating.
Yes, it is.
This is a result of the reissue insert and delete tags starting in one claim and terminating in another claim. If it starts at the end of the previous claim, then the problem was corrected in February 2000.
Reference Build 20000307 for corrected sample data.
Performance changes implemented in late 1999 introduced a problem with un-bolding numeric data. The problem was corrected in December 1999. Reference Build 20000307 for corrected sample data.
This is a problem discussed in FAQ# 8.
Revision elements are used only within Reissued patents. In addition, within the bibliographic information, it is only used with changes to the title and abstract. In the cited example, the front page of the published reissued patent also had no indication that the exemplary claim had changed. The USPTO will investigate whether this is a policy or a mistake.
The USPTO is investigating.
This problem existed in the first issue of 2000 and was corrected in subsequent issues.
Within the Red Book DTD are comments detailing the revision history. The V1.8 to V1.9 changes are listed below, as well as a list of V1.8 to V1.9 content model changes.
Revisions 2000-03-07:
- Element B580 changed from (B582+,B583US*) to (B582 | B583US)+ to allow for any order of structured or unstructured national classification
- Element F, added PTEXT to allow non-CWU in-line formulae
- Element H, removed F (now part of STEXT)
- Element PAR, LVL attribute changed from (0 | 1 | 2 | 3 | 4 | 5) to (0 | 1 | 2 | 3 | 4 | 5 | 6 | 7) in order to capture all known paragraph types.
- Element PTEXT added F to allow non-CWU in-line formula
- Changed version number to 1.9 and date to 2000-03-07
Content model changes:
Element - B580
ST32-US-Grant-018.DTD content model: (B581*,B582+,B583US*)
ST32-US-Grant-019.DTD content model: (B581*,(B582 | B583US)+)
Element - F
ST32-US-Grant-018.DTD content model: (MATH)
ST32-US-Grant-019.DTD content model: (MATH | PTEXT)
Element - H
ST32-US-Grant-018.DTD content model: (STEXT | F)+
ST32-US-Grant-019.DTD content model: (STEXT+)
Element - PTEXT
ST32-US-Grant-018.DTD content model: (B830 | CIT | CLREF | CRF | CWU | DFREF | DNUM | FGREF | FOO | FOR | HIL | IMG | LST | LSTREF | PAREF | PDAT | SEQREF | TBLREF)+
ST32-US-Grant-019.DTD content model: (B830 | CIT| CLREF | CRF | CWU | DFREF | DNUM | F | FGREF | FOO | FOR | HIL| IMG | LST | LSTREF | PAREF | PDAT | SEQREF | TBLREF)+
Element - STEXT
ST32-US-Grant-018.DTD content model: (PDAT | FOR | IMG | HIL)+
ST32-US-Grant-019.DTD content model: (PDAT | F | FOR | IMG | HIL)+
Earlier versions of Red Book proposed both internal and external complex work units (CWUs). A compromise was reached with Red Book ST.32 US Patent Grant V1.8 1999-08-26 DTD supporting inline tables (CALS) and inline math structures (SGML tailored MathML), but also allowing for external TIFF image file references for all CWUs.
CALS tables are easily rendered via a composition tool like Corel WordPerfect, and math can be displayed from the associated TIFF file. The MathML content is included to support content searching.
The Red Book ST.32 US Patent Grant V1.9 2000-03-07 DTD was further modified to allow for simple text tagged as in-line math without the MathML structure. However, simple in-line math structures like E=MC2 can be rendered using highlighting and not be associated with math content. This is appropriate for H2O Systems Inc., but not for true math structures.
Currently, the DTD is SGML formatted and sample documents delivered within the first year will be SGML. However, the DTD is readily converted to XML by removing tag minimization indicators and substituting an XML version of CALS markup (all other requirements of XML having been met already). Document instances differ from XML in minor ways, such as empty tag syntax, Unicode versus ISO character set references, and CALS table markup syntax.
Red Book will likely migrate to XML in the next few years, possibly at the time that applications published at eighteen months are first allowed and ready for publication as grants. It is unlikely that the test data currently distributed would be converted by the USPTO to XML. It is much more likely that production data in SGML will be converted to XML.
The CALS Table markup is not XML but SGML. In migrating to XML, a new table model such as a forthcoming XML version of CALS will likely be used.
MathML is XML but the referenced MathML DTD was modified to conform to the SGML syntax. The MathML data content is modified on import to Red Book for both empty tag syntax, and character entity declarations.
When an XML Red Book product is available, the corresponding DTDs will also be provided. As of Build 20000307, equivalent SGML information is available within the distributed DTD and ENTITY directories.
We have not yet created any XML Red Book document instances.
The data-capture contractor is parsing the SGML instances using James Clark's SP parser with the following warnings activated:
- Warn about mixed content models that do not allow #PCDATA anywhere.
- Warn about various non-compliance recommendations made in ISO.
- Warn about defaulted references.
- Warn about undefined elements: elements used in the DTD but not defined.
- Warn about various dubious constructions in the SGML declaration.
- Warn about unused short reference maps.
- Warn about parameter entities that are defined but not used in a DTD.
- Warn about empty start and end tags.
- Warn about unclosed start and end tags.
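Users who want to validate instances with comparable settings can assemble a similar command line for nsgmls, the command-line parser distributed with SP. The Python sketch below builds such a command; the mapping of the warnings listed above to nsgmls `-w` names is an assumption based on the nsgmls documentation, the file names are placeholders, and the exact invocation used by the data-capture contractor is not given in this FAQ.

```python
import subprocess

# nsgmls -w warning names assumed to correspond to the list above.
WARNINGS = [
    "mixed", "should", "default", "undefined", "sgmldecl",
    "unused-map", "unused-param", "empty", "unclosed",
]


def nsgmls_command(instance: str, catalog: str) -> list:
    """Build an nsgmls command that validates `instance` against `catalog`."""
    command = ["nsgmls", "-s", "-c", catalog]  # -s: errors only, no output
    command += [f"-w{name}" for name in WARNINGS]
    command.append(instance)
    return command


def validate(instance: str, catalog: str) -> bool:
    """Run nsgmls (which must be installed) and report whether it succeeded."""
    result = subprocess.run(nsgmls_command(instance, catalog),
                            capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    # Placeholder file names; substitute a delivered instance and catalog.
    print(" ".join(nsgmls_command("patent.sgm", "catalog")))
```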
Blue Book is provided by the data-capture contractor and converted to Green Book by the USPTO. Since Red Book will replace Blue Book, a Red to Green conversion tool is being developed by the USPTO to extend the life of Green Book beyond Blue Book.
Red Book is still a maturing product. It represents the USPTO's first major step in the direction of standard, generalized markup. Both Blue Book production data and Red Book sample data will be delivered for customer review and feedback until such time as the USPTO is satisfied that Red Book meets the needs of Blue Book customers.
At such time that production Green Book is generated from Red Book rather than Blue Book, customers will be notified in advance. When it takes place, this change will not delay the delivery of Green Book.
Early versions of Red Book omitted EMIs within CWUs for the reason that there were no TIFF files available for those documents converted from Green Book to Red Book. All EMIs within the CWUs were consequently optional.
Now that Red Book is being generated from production data, the current version of Red Book DTD (ST.32 US Patent Grant V1.9 2000-03-07) enforces TIFF images for chemical and math CWUs, but still allows optional images for both tables and sequence listings. The data-capture contractor does not generate TIFF images for these CWU types at present. The USPTO will ask the data-capture contractor to create images of all tables, in addition to the markup. Images of sequence listings are under investigation.
Reissue markup utilizes a set of unpaired insert and delete tags that track changes made to the document. As of release ST.32 US Patent Grant V1.9 2000-03-07 two issues remain with reissue tagging:
- Date attributes associated with the insert and delete tags are not being initialized by the data-capture contractor, and consequently tracking changes to changes is compromised.
- The USPTO is proposing paired insert and delete tags for the next release. This will eliminate the current problem of overlapping reissue tags.
Due to the likelihood of difficulties with a reissue of a reissued patent in Red Book, the USPTO continues to investigate this problem.
The 13 sample patents are derived from Green Book using an early release of Red Book and do maintain the integrity of the table as coded in Green Book. This, as you noted, was not the source of the typeset patent. We recommend that you look at tables in the current DTD (ST.32 US Patent Grant V1.9 2000-03-07), starting with issue 20000328.
With respect to the maximum record size:
The weekly issue tapes consist of a variety of file types of which only one is ASCII, the SGML document instance for each patent. The record length within the SGML file is unlimited (i.e. a paragraph may be contained within a single record and there is no limit on the paragraph record size). However, as of release ST.32 US Patent Grant V1.9 2000-03-07, the SGML file is relatively newline/record insensitive, and can be re-blocked as long as record breaks are not introduced within leading and trailing tags. This may not be true in future releases (i.e. sequence listing data will include record breaks as part of data content).
With respect to the logical format of Red Book:
The logical format of the Red Book is best understood by traversing the DTD. The last version of the Red Book DTD (ST.32 US Patent Grant V1.8 1999-08-26) is documented in an HTML format that permits you to walk the tree starting with element PATDOC, and is available at the following URL:
http://www.uspto.gov/web/offices/ac/ido/oeip/sgml/st32/redbook/st32g018/index.html
Similar documentation for the current DTD (ST.32 US Patent Grant V1.9 2000-03-07) is being generated and will be available in the near future.
With respect to the physical format of Red Book:
Physically the Red Book weekly issue tape contains the full text, drawings, and complex work units (tables, mathematical expressions, sequence data, and chemical structures) of each patent issued. The file format is Standard Generalized Markup Language (SGML) in accordance with the ST.32 US Patent Grant V1.9 2000-03-07 Document Type Definition (DTD). Tables and sequence data are included using CALS SGML markup. Mathematical expressions are included using MathML XML markup and external Mathematica Notebook (NB) files. Chemical structures are represented by external CS ChemDraw (CDX) files and MDL Information Systems (MOL) files. Drawings, mathematical expressions, and chemical structures also include external Tagged Image File Format (TIFF) Revision 6.0 with CCITT Group 4 Compression files.
Each weekly update contains approximately 3,000 patents (800 megabytes) on one HP DLT IIIXT (TK85XT) tape. All files associated with a specific patent are compressed and zipped into a single patent zip file. Zipped patent files are grouped by type within a pre-determined directory scheme and re-zipped with path information (but not compressed) into a single weekly update file. The weekly update file is then copied to a DLT tape using the UNIX TAR facility.
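The nested packaging described above (patent zip files stored inside one weekly update zip file) can be walked with a short script. The following is only a minimal sketch in Python, assuming the weekly update file has already been extracted from the TAR archive; the function name iter_patent_files is hypothetical and is not part of any USPTO tooling.

```python
import io
import zipfile


def iter_patent_files(weekly_zip):
    """Yield (patent_zip_path, member_name, data) for every file found in
    every patent zip nested inside the weekly update zip.

    weekly_zip may be a file system path or a binary file-like object.
    """
    with zipfile.ZipFile(weekly_zip) as outer:
        for entry in outer.namelist():
            # Skip directory entries and non-zip files (for example, the
            # contents of the DTDS and ENTITIES sub directories).
            if not entry.upper().endswith(".ZIP"):
                continue
            # Read the inner patent zip into memory and open it in turn.
            inner_bytes = outer.read(entry)
            with zipfile.ZipFile(io.BytesIO(inner_bytes)) as inner:
                for member in inner.namelist():
                    yield entry, member, inner.read(member)
```

Reading each inner zip into memory keeps the sketch simple; since a whole weekly issue is roughly 800 megabytes, processing one patent zip at a time in this way should stay well within ordinary memory limits.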
Grouping is based on the following directory tree:
YYYYMMDD
  |-UTIL0601
  |   |-US0601nnnn-YYYYMMDD.ZIP
  |   |-US0601nnnn-YYYYMMDD.ZIP
  |   |- . . .
  |-UTIL0602
  |   |-US0602nnnn-YYYYMMDD.ZIP
  |   |- . . .
  |-UTIL0603 . . .
  |-PLANT
  |   |-USP0nnnnnn-YYYYMMDD.ZIP
  |   |- . . .
  |-DESIGN
  |   |-USD0nnnnnn-YYYYMMDD.ZIP
  |   |- . . .
  |-REISSUE
  |   |-USREnnnnnn-YYYYMMDD.ZIP
  |   |- . . .
  |-SIR
  |   |-USH0nnnnnn-YYYYMMDD.ZIP
  |   |- . . .
  |-DTDS
  |-ENTITIES
Where:
- The root directory is the issue date;
- Utility patents are distributed into sub directories "UTIL" plus the first four characters of the patent number. This assures a maximum of 1000 zipped patent files within a single directory.
- Plant, Design, Reissue, and SIR patents are distributed into their respective directories listed above. Note that if the weekly issue does not contain a specific patent type, then that patent type's sub directory will be omitted.
- Sub directory DTDS contains the DTDs and catalogs used to parse the issue.
- Sub directory ENTITIES contains the entity files and glyphs referenced by the Red Book DTD.
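The grouping rules above can be expressed as a small lookup. The following Python sketch is an illustration only: the function name subdirectory_for is hypothetical, and it assumes file names follow the patterns shown in the directory tree (US, then either an all-digit utility number or a number prefixed with P, D, RE, or H, then a hyphen and the issue date).

```python
def subdirectory_for(zip_name):
    """Return the sub directory for a patent zip file name such as
    'US06012345-20000328.ZIP', following the grouping rules above."""
    stem = zip_name.upper().split("-", 1)[0]  # e.g. 'US06012345'
    if not stem.startswith("US"):
        raise ValueError(f"unexpected file name: {zip_name!r}")
    number = stem[2:]  # e.g. '06012345', 'D0123456', 'RE012345'
    if number.startswith("RE"):
        return "REISSUE"
    if number.startswith("D"):
        return "DESIGN"
    if number.startswith("P"):
        return "PLANT"
    if number.startswith("H"):
        return "SIR"
    if number[:4].isdigit():
        # Utility patents: "UTIL" plus the first four characters
        # of the patent number.
        return "UTIL" + number[:4]
    raise ValueError(f"unrecognized patent number in: {zip_name!r}")
```

For example, subdirectory_for("US06012345-20000328.ZIP") returns "UTIL0601", and subdirectory_for("USD0123456-20000328.ZIP") returns "DESIGN".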
The drives used to create Red Book tapes are HP DLT 30e and 40e using no hardware compression and HP DLT IIIXT (TK85XT) or equivalent media. Although some issues would fit, CD-R does not have sufficient capacity for every issue.
The 13 Red Book examples do not adequately capture the paragraph types. This is a result of treating paragraph formatting as a style issue. However, in subsequent releases of Red Book, both claim step processing and paragraph types are accurately captured. Reference the latest version of Red Book (ST.32 US Patent Grant V1.9 2000-03-07).
The FOR should be handled the same way as DIG. FOR refers to a collection of foreign art (non-US patents) that exists in a subclass.
ADR is for correspondence and RESIDENCE indicates where the inventor lives, which sometimes might be a different country than the correspondence address. In any case the RESIDENCE element is not a complete address but only indicates which branch of the military or a city (with or without a state or country). The ADR element gives what should be a complete address.
Art Units are now called Industry Sectors. The B474US element has been defined to cover the variations expected in this field. Numbers can be up to four digits. The space in the third position was an error.
The DTD allows zero or more assistant examiners. This is not a change in policy.
There is now an empty tag indicating that Rule 47 was invoked. It might not be in any of the examples, but it is in the DTD. When present, this tag signifies that the application was filed under Rule 47, which indicates that the applicant(s) refused to execute the application or could not be found.
<!ELEMENT B221US - O EMPTY >
Table column widths are based on 72 points per inch. 120 PT therefore would be 1.67 inches.
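The conversion is a single division. As a small Python sketch (the function name points_to_inches is hypothetical):

```python
POINTS_PER_INCH = 72  # table column widths are based on 72 points per inch


def points_to_inches(points):
    """Convert a column width given in points to inches."""
    return points / POINTS_PER_INCH


print(round(points_to_inches(120), 2))  # 1.67
```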
Some parsers or SGML applications require the DOCTYPE declaration while others complain when it is included. For example, the data-capture contractor comments out the declaration as follows:
<!-- DOCTYPE PATDOC [ -->
...
<!-- ]> -->
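For a parser that requires the DOCTYPE declaration, the commented-out form shown above can be reversed before parsing. The following Python sketch assumes the declaration was commented out exactly in the style shown (one comment around the opening line and one around the closing line); the function name restore_doctype is hypothetical, and other commenting styles would need different patterns.

```python
import re

# Matches the commented-out opening line, e.g. "<!-- DOCTYPE PATDOC [ -->"
_OPEN = re.compile(r"<!--\s*DOCTYPE\s+(\w+)\s*\[\s*-->")
# Matches the commented-out closing line, e.g. "<!-- ]> -->"
_CLOSE = re.compile(r"<!--\s*\]>\s*-->")


def restore_doctype(text):
    """Turn the commented-out DOCTYPE wrapper back into a live declaration.

    Only the first occurrence of each wrapper line is rewritten; everything
    between them (the internal declaration subset) is left untouched.
    """
    text = _OPEN.sub(r"<!DOCTYPE \1 [", text, count=1)
    text = _CLOSE.sub("]>", text, count=1)
    return text
```

Text that contains neither wrapper line is returned unchanged, so the function can be applied to every instance regardless of how it was delivered.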
This element is currently not initialized in Build 20000307. The USPTO is investigating.
In general, the presence of a terminal disclaimer is a signal that one must pursue further investigations to determine whether a patent is or is not in force. Nothing that appears on the front page of a patent grant at time of issue can be taken at its face value with respect to this question since any of it could have changed after the file wrapper was closed for printing.
The USPTO is investigating.
In order to accurately report the status of a specific Red Book document instance, the data-capture contractor is now identifying both the DTD version and the build date within the instance as follows:
<PATDOC DTD="1.9" STATUS="BUILD 20000307">
The DTD attribute defines the version of the Red Book DTD and the STATUS attribute defines the build date of the application software that created the instance. Instances that do not contain the STATUS attribute were generated prior to implementing the build controls.
Since problem resolutions are now reported by build, the build number can be used to associate data problems and their resolution with a specific version of the Red Book build software. Weekly issue 20000328 is the first issue created with "Build 20000307".
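To associate an instance with a build, the two attributes can be read from the PATDOC start tag. The following Python sketch is an assumption-laden illustration, not USPTO software: the function name patdoc_version is hypothetical, it assumes attribute values are delimited with double quotes as in the example above, and it assumes the STATUS value has the form "BUILD" followed by an eight-digit date.

```python
import re

_PATDOC = re.compile(r"<PATDOC\b([^>]*)>", re.IGNORECASE)
_ATTR = re.compile(r'(\w+)\s*=\s*"([^"]*)"')
_BUILD = re.compile(r"BUILD\s+(\d{8})", re.IGNORECASE)


def patdoc_version(instance_text):
    """Return (dtd_version, build_date) from the PATDOC start tag.

    build_date is None when the STATUS attribute is absent, which the text
    above says marks instances generated before build controls existed.
    """
    tag = _PATDOC.search(instance_text)
    if tag is None:
        raise ValueError("no PATDOC start tag found")
    attrs = {name.upper(): value for name, value in _ATTR.findall(tag.group(1))}
    build = None
    status = attrs.get("STATUS")
    if status is not None:
        match = _BUILD.fullmatch(status.strip())
        if match:
            build = match.group(1)
    return attrs.get("DTD"), build
```

Applied to the start tag shown above, patdoc_version returns ("1.9", "20000307"); for a start tag with only DTD="1.8" it returns ("1.8", None).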
Please address questions about the Red Book data to Ed Johnson and questions about the Red Book DTD to Bruce Cox.