ST.32 US Patent Grant V1.9 2000-03-07 (Red Book)

Frequently Asked Questions (FAQ)

2000 April 12

Questions:

1. How do you determine the exemplary drawing displayed on the front page?

2. How do you control live links embedded in patent applications?

3. How do I interpret the document number "0941"?

4. SIRs (Statutory Invention Registrations) have invalid patent numbers and external file names.

5. How do you determine that a patent is subject to a terminal disclaimer?

6. Reissue patents are missing the B640 record (Earlier document of which the present document is a reissue).

7. Why do names sometimes contain "{umlaut over (o)}" versus "ö" or "{umlaut over (aa)}" versus "ää"?

8. Should the B578US tag be used for multiple exemplary claims in single instance, such as "1,20"?

9. Should the B582 tag be used for unstructured US classifications such as a combination of classes, subclasses, ranges, etc.?

10. How do you reconcile paragraph types across Red Book, Green Book, and Blue Book?

11. Where can I get the mathmlAlias and mathmlExtra entity files?

12. Should the paragraph level attribute be delimited with quotes?

13. Why did the line breaks change within SDOBI?

14. Why is the B130 tag empty?

15. Should the B210 field length be 8 or 10 characters?

16. Will italics be utilized within citations?

17. What does "00" mean in the KIND Codes of the B861 (PCT document) and B871 (Published document) identification records?

18. Does MathML support character pullouts?

19. Will Journal and Book citations be structured as in the ARTCIT and BOOKCIT elements? Why can't I find any examples?

20. Can a full and complete representation of the patent be derived from data contained in the Red Book format?

21. Where are the character pullouts and diacritical characters?

22. Where do I get information regarding the TIFF image file specifications?

23. Is B474US (indicates that the term of the patent has been extended under 35 USC 154(b)) ever utilized?

24. Is there any situation where B582 Field of Search may be legitimately missing?

25. Why is the SIR information missing?

26. Will Red Book utilize CML (Chemical Markup Language) similar to its MathML implementation?

27. Patent 6,010,895 contains tables that consist of sequence listing data. Will they be converted to the SEQ-LST element?

28. What became of the LREP (Legal attorney or representative information) Green book tags?

29. Related Application data seem to match in Blue Book and Green Book but what happened to Red Book?

30. Why are the Assignee Addresses empty?

31. How do I resolve/render the USPTO specific special characters?

32. Should there be a slash between the series code and the application number with the DNUM element?

33. Is the ENTRY element being used to identify when a new column begins in a row of a CALS Table?

34. Why do Reissue patents sometimes drop claims or group claims?

35. Why is highlighting sometimes incorrect around numeric data?

36. There are several problems with the B570 data element.

37. Why do Design Patents sometimes appear as "D. 99999", "D0099999", or "Des. 99999"?

38. The IPC Edition element B516 is at variance with the printed copies; the Red Book says one thing, the printed patent something else.

39. What are the differences between versions 1.8 and 1.9 of the Red Book DTDs?

40. Will math and tables be internal or external files?

41. Has the USPTO thought about using XML for Red Book, or converting Red Book to XML? Is the USPTO planning to offer the Red Book as a separate XML product at any point in the future?

42. Are the third party ML's included in your DTD (i.e. CALS MILS-M-28001 TABLEPAK) XML-compliant?

43. Will you provide the XML equivalent to the SGML Red Book DTD as well as referenced third party DTDs?

44. Have you tried to parse the XML-ized Red Book documents using an off-the-shelf XML parser or have you developed your own?

45. Does the USPTO still want Blue Book users to switch to Green Book?

46. Within CWUs are embedded images (EMIs) optional rather than a necessary feature of the patent?

47. Is it possible to receive a sample of a re-issue of a re-issue?

48. Were the 13 Red Book examples used in production?

49. Would you please provide details for the physical and logical format of the Red Book weekly issue?

50. What model of DLT tape is going to be used to create Red Book? Why not use CD-ROM?

51. In the past, i.e. using Blue Book data, it was possible to easily format the claims the same way we see them on the printed patent complete with hanging indents. How can we do this using the 13 examples?

52. What is this "FOR" class code? As in 320FOR123?

53. Inventor information contains the inventor address as well as the inventor residence. What is the difference between the two?

54. The art unit was always a three digit number (NNN) but now it is a variable length number with a space between the 2nd and 3rd digit (e.g. NN N). Why the difference?

55. The DTD claims that the assistant examiner may be repeated although I have never seen more than one in the samples. Has this changed?

56. Where is the "Rule 47 provision" in the Red Book?

57. In tables how are column widths measured?

58. Since doctype declarations are not allowed in DTDs, why is it in the version 1.8 DTD?

59. We find that the data element B122US is missing in SIRs. This keeps us from placing the appropriate "boiler-plate text" on the front page.

60. How do you interpret the terminal disclaimer statements?

61. Element B540 has incorrect highlighting.

 

Answers:

1. How do you determine if there is an exemplary drawing to be displayed on the front page?

Red Book does not currently identify or include the exemplary drawing. In addition, the exemplary drawing differs from other drawings in that it is a single image consisting of one or more figures with the figure reference(s) removed.

As of Build 20000307 the exemplary drawing is included within the SDODR element identified with sequence number of all zeroes, as in US06037034-20000314-D00000.TIF for the exemplary drawing and US06037034-20000314-D00001.TIF for the first drawing. A future release of the Red Book DTD will also include an element to identify the existence of an exemplary drawing.
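The naming convention described here can be checked mechanically. The following Python sketch assumes that drawing file names follow the pattern seen in the two examples (a US prefix, an 8-character document number, an 8-digit issue date, and D plus a 5-digit sequence number). The regular expression and function name are illustrative and are not part of Red Book.

```python
import re

# Pattern inferred from the two example file names in this answer; it is an
# assumption, not a published specification.
DRAWING_NAME = re.compile(
    r"^US(?P<docnum>[0-9A-Z]{8})-(?P<date>\d{8})-D(?P<seq>\d{5})\.TIF$"
)

def is_exemplary_drawing(file_name: str) -> bool:
    """Return True when the drawing sequence number is all zeroes."""
    match = DRAWING_NAME.match(file_name)
    if match is None:
        raise ValueError(f"unrecognized drawing file name: {file_name!r}")
    return int(match.group("seq")) == 0
```

A rendering tool could use such a check to pick the front-page image out of the SDODR drawings until the DTD gains a dedicated element.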

2. How do you control live links embedded in patent applications?

Prior to tagging the patent application data, "&", "<", and ">" are converted to "&amp;", "&lt;", and "&gt;" respectively in order to avoid conflicts with the element and entity tagging syntax. This conversion also deactivates embedded URLs when the data is rendered within an HTML browser. This conversion is not performed for Blue Book data.
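As an illustration only, the conversion can be sketched in Python as follows. The function name is hypothetical and this is not the contractor's actual routine; the point it shows is that the ampersand has to be replaced first.

```python
def escape_markup(text: str) -> str:
    """Replace &, <, and > with the corresponding entity references."""
    # The ampersand is handled first; otherwise the "&" introduced by
    # "&lt;" and "&gt;" would itself be escaped a second time.
    return (
        text.replace("&", "&amp;")
            .replace("<", "&lt;")
            .replace(">", "&gt;")
    )

sample = 'See <a href="http://www.example.com/?a=1&b=2">this page</a>'
print(escape_markup(sample))
```

Because the angle brackets no longer form a tag, an HTML browser displays the anchor as literal text instead of an active link.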

3. How do I interpret the document number "0941"?

This is a software error within Red Book related application processing and is not a data capture problem. This problem was identified and fixed several months ago in Red Book conversion routine relapp.pl, but does exist within some of the distributed sample data. Reference Build 20000307 for corrected sample data.

4. SIRs (Statutory Invention Registrations) have invalid patent numbers and external file names. 

Utility SIR patent numbers are supposed to be 8 characters in length consisting of a constant "H" followed by a 7-digit number, right justified, with leading zeroes. Patent 24001826 is incorrect and should be H0001826. External file name US24001826-19991123-D00005.TIF is incorrect and should be USH0001826-19991123-D00005.TIF.

The problem was corrected 2000/03/06 in Red Book conversion routine cleanup.xom. Reference Build 20000307 for corrected sample data. A similar problem may exist with Design SIR (Constant "HD" followed by a 6 digit numeric, right justified, with leading zeros), and Plant SIR (Constant "HP" followed by a digit numeric, right justified, with leading zeros).
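The utility and design SIR formats stated here can be expressed as a small formatting helper. This is a sketch under the widths given in this answer; the function name is hypothetical, and Plant SIRs are left out because the digit count is not stated in the text.

```python
# Number of digits that follow each constant prefix, as stated in this answer.
SIR_DIGITS = {"H": 7, "HD": 6}

def format_sir_number(prefix: str, number: int) -> str:
    """Return the 8-character SIR number, right justified with leading zeroes."""
    if prefix not in SIR_DIGITS:
        raise ValueError(f"unsupported SIR prefix: {prefix!r}")
    digits = SIR_DIGITS[prefix]
    if not 0 <= number < 10 ** digits:
        raise ValueError(f"number does not fit in {digits} digits: {number}")
    return f"{prefix}{number:0{digits}d}"

patent = format_sir_number("H", 1826)
print(patent)                                # H0001826
print(f"US{patent}-19991123-D00005.TIF")     # USH0001826-19991123-D00005.TIF
```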

5. How do you determine that a patent is subject to a terminal disclaimer? The terminal disclaimer date is no longer captured, but the disclaimer condition is via the B473US empty element tag. However, patent number 6,009,592 of Jan 4, 2000, does not have the terminal disclaimer condition defined.

Terminal disclaimer information is captured and retained by the data-capture contractor, but was not passed on to Red Book generation. The data-capture contractor corrected this problem in Build 20000307.

6. Reissue patents are missing the B640 record (Earlier document of which the present document is a reissue).

This problem was detected as part of the Red Book to Green Book conversion, and was corrected in January 2000 in Red Book conversion routine fdc.xom. Reference Build 20000307 for corrected sample data.

7. Why do names sometimes contain "{umlaut over (o)}" versus "&ouml;" or "{umlaut over (aa)}" versus "&auml;&auml;"?

As part of the data capture process, "plus E" codes are used to define composition type attributes for one or more characters. The leading code is of the form +E,xxx where xxx defines the composition type attribute, and the terminating code is +EE. Plus E codes can also be nested, which means that the first +EE occurrence may not be the terminator that pairs with a given leading code.

Since these codes span a set of characters, the initial Red Book conversion simply replaced the character set and codes with a descriptive text as described below:

However, for a single (or repeated) character set, there may be simpler character mappings such as &ouml; instead of {umlaut over (o)}, or &auml;&auml; instead of {umlaut over (aa)}. But there are many characters with simple implementations, and addressing all cases is not achievable at this time. Consequently, simpler conversions will be implemented as they are identified. If you have encountered additional simple character mappings that need to be updated, please identify the existing Red Book text and the recommended replacement text (SGML entities) and send them to the USPTO Red Book Contacts. Reference Build 20000307 for "{umlaut over (o)}" corrected sample data. "{umlaut over (aa)}" has yet to be implemented.
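Users who need the simpler form before it appears in the data could post-process the descriptive text themselves. The Python sketch below handles only the umlaut case discussed in this answer and assumes the descriptive text always has the exact shape "{umlaut over (...)}"; the function name and the set of supported letters are assumptions.

```python
import re

# Matches the descriptive text for the umlaut case only.
UMLAUT_TEXT = re.compile(r"\{umlaut over \(([A-Za-z]+)\)\}")

# Letters for which a "&<letter>uml;" entity is assumed to be available
# (the ISO 8879 Latin 1 set); anything else is left untouched.
UMLAUT_LETTERS = set("aeiouyAEIOU")

def simplify_umlauts(text: str) -> str:
    """Replace '{umlaut over (o)}' style text with '&ouml;' style entities."""
    def replace(match):
        letters = match.group(1)
        if not all(ch in UMLAUT_LETTERS for ch in letters):
            return match.group(0)  # no simple mapping known; keep as is
        return "".join(f"&{ch}uml;" for ch in letters)
    return UMLAUT_TEXT.sub(replace, text)

print(simplify_umlauts("M{umlaut over (o)}ller"))
print(simplify_umlauts("J{umlaut over (aa)}skel{umlaut over (a)}inen"))
```

Repeated characters such as "(aa)" are expanded one entity per letter, which matches the "&auml;&auml;" form mentioned in the question.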

8. Should the B578US tag be used for multiple exemplary claims in single instance, such as "1,20"?

Each claim number should be tagged separately with B578US. This is a conversion oversight that was corrected in January 2000 in Red Book conversion routine fdc.xom. Reference Build 20000307 for corrected sample data.
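For data produced before the fix, a consumer could split a combined value such as "1,20" on its own. The following is a minimal Python sketch; the function name is hypothetical, and any inner markup that the DTD requires inside B578US is deliberately omitted.

```python
from typing import List

def tag_exemplary_claims(value: str) -> List[str]:
    """Split a comma-separated claim list into one B578US element per claim."""
    claims = [part.strip() for part in value.split(",") if part.strip()]
    # Inner markup required by the DTD is omitted in this simplified sketch.
    return [f"<B578US>{claim}</B578US>" for claim in claims]

print(tag_exemplary_claims("1,20"))
```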

9. Should the B582 tag be used for unstructured US classifications such as a combination of classes, subclasses, ranges, etc.?

No, the B583US tags should be used instead. This is a conversion oversight that was corrected in January 2000 in Red Book conversion routine fdc.xom. Reference Build 20000307 for corrected sample data.

10. How do you reconcile paragraph types across Red Book, Green Book, and Blue Book?

Red Book was not accurately tracking paragraph types (specifically, PAL/+PS and PA5/+P5 codes) and has been modified as follows:

Blue Book   Red Book          Green Book
"+P "       <PARA LVL="0">    PAR
"+P0 "      <PARA LVL="1">    PA0
"+P1 "      <PARA LVL="2">    PA1
"+P2 "      <PARA LVL="3">    PA2
"+P3 "      <PARA LVL="4">    PA3
"+P4 "      <PARA LVL="5">    PA4
"+P5 "      <PARA LVL="6">    PA5
"+PS "      <PARA LVL="7">    PAL
"+PA "      <PARA LVL="0">    PAR (abstract paragraph; within Red Book it is associated with the <SDOAB> tag)
"+CL "      <H LVL="1">       PAC (centerline/header)

This required a DTD change (implemented within V1.9) as follows:

<!ATTLIST PARA ID ID #IMPLIED LVL (0 | 1 | 2 | 3 | 4 | 5 | 6 | 7) #IMPLIED>

The paragraph problem was corrected within the Red Book conversion routine fdc.xom. Blue Book +P5 code (LVL 6) is currently not used. Reference Build 20000307 for corrected sample data.
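For conversion software, the paragraph rows of the table above reduce to a simple lookup. The Python sketch below covers only the eight PARA rows (the abstract and header rows are omitted); the function names are hypothetical.

```python
# Paragraph rows of the table: (Blue Book code, Red Book LVL, Green Book tag).
PARAGRAPH_TYPES = [
    ("+P", 0, "PAR"),
    ("+P0", 1, "PA0"),
    ("+P1", 2, "PA1"),
    ("+P2", 3, "PA2"),
    ("+P3", 4, "PA3"),
    ("+P4", 5, "PA4"),
    ("+P5", 6, "PA5"),
    ("+PS", 7, "PAL"),
]
BLUE_TO_LVL = {blue: lvl for blue, lvl, _ in PARAGRAPH_TYPES}
LVL_TO_GREEN = {lvl: green for _, lvl, green in PARAGRAPH_TYPES}

def red_book_para_tag(blue_code: str) -> str:
    """Return the Red Book start tag for a Blue Book paragraph code."""
    lvl = BLUE_TO_LVL[blue_code.strip()]  # Blue Book codes carry a trailing space
    return f'<PARA LVL="{lvl}">'

def green_book_tag(lvl: int) -> str:
    """Return the Green Book paragraph tag for a Red Book LVL value."""
    return LVL_TO_GREEN[lvl]

print(red_book_para_tag("+PS "))
print(green_book_tag(1))
```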

11. Where can I get the mathmlAlias and mathmlExtra entity files?

The Red Book Catalog file identifies the files required to parse and validate Red Book, including mathmlAlias and mathmlExtra. Starting with the issue 2000-03-28, the data-capture contractor will also include all of the DTD and entity files used to generate Red Book within the delivered issue tape in directories DTDS and ENTITY. The Red Book Catalog file is included within the DTD directory.

12. Should the paragraph level attribute be delimited with quotes?

Yes. Claim steps, paragraphs, and header attributes were not being delimited (embedded within double quotes). The problem was corrected within the Red Book conversion routine fdc.xom. Reference Build 20000307 for corrected sample data.

13. Why did the line breaks change within SDOBI?

Line breaks between tag sets are included for readability within an ASCII editor and have no impact on the data's SGML/XML validity. However, we do appreciate the potential impact on import/translation utilities. If they continue to be a problem, then we suggest that you remove all newlines from the document instance prior to import, thus ensuring a consistent format.

Removing the line breaks will be more challenging when sequence listing data is included in tagged form (as opposed to the current CALS table format).  This is because the listing data includes newlines as part of the data, the removal of which will significantly alter the intended display layout of the data.
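One way to normalize line breaks without touching newlines that are part of the data is to remove only those that sit directly between two tags. This Python sketch is a more conservative variant of the "remove all newlines" suggestion, not a USPTO utility, and the function name is hypothetical.

```python
import re

# A line break (with optional surrounding whitespace) between ">" and "<".
INTER_TAG_BREAK = re.compile(r">[ \t\r]*\n\s*<")

def strip_inter_tag_newlines(instance: str) -> str:
    """Remove line breaks between tags; newlines inside text data are kept."""
    return INTER_TAG_BREAK.sub("><", instance)

sample = "<B100>\n<B110>x</B110>\n</B100>"
print(strip_inter_tag_newlines(sample))
```

Newlines embedded in character data, as expected for tagged sequence listings, are left in place because they are not bounded by a closing and an opening angle bracket.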

14. Why is the B130 tag empty?

This is a problem in fdc.xom resulting from not testing for all document kinds, and has been corrected. Reference Build 20000307 for corrected sample data.

15. Should the B210 field length be 8 or 10 characters?

For U.S. patent applications, the document number is a fixed length of eight positions. The first two positions are the series code and the following six positions are the serial number, left padded with zeros. Two programs (fdc.xom and relapp.pl) were modified to include the series code (with leading zero if required) within the B210 tag. Reference Build 20000307 for corrected sample data.
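The fixed-length layout can be sketched in Python as follows. The function name is hypothetical; the widths are the ones stated in this answer.

```python
def format_application_number(series_code: int, serial_number: int) -> str:
    """Return the 8-position document number: 2-digit series code + 6-digit serial."""
    if not 0 <= series_code <= 99:
        raise ValueError(f"series code out of range: {series_code}")
    if not 0 <= serial_number <= 999999:
        raise ValueError(f"serial number out of range: {serial_number}")
    # Both parts are left padded with zeros to their fixed widths.
    return f"{series_code:02d}{serial_number:06d}"

print(format_application_number(9, 12345))   # 09012345
```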

16. Will italics and boldface be utilized within citations?

Within the typeset copies of a granted patent, italics, boldface and other highlighting are used to indicate text added or deleted as the result of amendments or other modifications to a patent. But within Red Book, changes are tracked utilizing the DEL-S, DEL-E, INS-S, and INS-E elements which stand for, respectively, delete start, delete end, insert start, and insert end. When rendering a Red Book document instance, the styling application will be responsible for applying the highlighting and inserting required text delimiters such as "[" and "]". This allows the highlighting of Red Book content such as bold, underscore, italics, etc. without conflicting with amendment processing. With respect to citations, content will be transcribed exactly as presented within the application.

Be advised that the next Red Book DTD release will implement INSERT and DELETE elements that have both start and end tags. This differs from the current implementation that utilizes 4 empty elements with a terminating element pointing to the initiating element.

17. What does "00" mean in the KIND Codes of the B861 (PCT document) and B871 (Published document) identification records?

B861 contains a DOC structure, which contains, among other elements, the KIND element. The KIND element should always contain either a kind code in conformance with WIPO Standard ST.16, or text describing the document kind. "00" has no defined meaning for this element.

The kind code is captured by the data-capture contractor, but not passed on to Red Book generation. If the PCT application has yet to be published, 00 is used. The data-capture contractor corrected this problem within Build 20000307.

18. Does MathML support character pullouts?

Unfortunately not, and consequently the tools used by the data-capture contractor are unable to accurately export character pullouts. The data-capture contractor uses Mathematica to generate the math complex work units (CWU). When a CWU contains custom characters (characters not defined within an existing entity file) the character is scanned and brought into Mathematica as a TIFF file. The TIFF image is integrated into the exported notebook and EPS files, but not within the exported MathML, since that would create parsing errors within the MathML markup.

The data-capture contractor and Mathematica have defined a means to capture character pullout content within MathML as defined entities within the document instance, but this will not likely be implemented until late April 2000, and problems will still remain with character pullouts within the test data published earlier.

19. Will Journal and Book citations be structured as in the ARTCIT and BOOKCIT elements? Why can't I find any examples?

The sample data initially distributed was converted to Red Book from Green Book. Since the citations are not "structured" in Green Book, the conversion software made no attempt to populate the various citation elements in Red Book. In addition, the Red Book DTD has since been changed to eliminate the structure for citations in favor of transcribing the citation exactly as presented in the application. This avoids errors in interpretation of the abbreviations and other aspects of a citation by the data-capture contractor. Consequently the ARTCIT, BOOKCIT, DBASECIT, OTHCIT, and subordinate elements have been replaced with simpler CIT, NCIT, and PCIT elements.

20. Can a full and complete representation of the patent be derived from data contained in the Red Book format?

The data-capture contractor does not currently use Red Book data to compose the printed patent document. Nevertheless, it is the intention of the USPTO that Red Book will replace Blue Book as a complete and reliable source of the content of published patent grants. Obviously, we still have some way to go to achieve that goal. With the continuing evolution of SGML/XML rendering tools it will also be possible to render the patent from the Red Book file with appropriate style sheets. Tests with current Red Book data and WordPerfect 9 indicate that it is possible to render a patent in a style that differs from the printed product in only minor ways.

21. Where are the character pullouts and diacritical characters?

The initial Red Book test data was feature poor, primarily because it was derived from Green Book data. The test data has steadily improved, with character pullouts (custom characters), diacritical characters, page pullouts, highlighting, chemical, math and table CWUs, etc., all implemented in 1999, and continuing with Sequence Listing data currently in development.

22. Where do I get information regarding the TIFF image file specifications? Why are TIFF image files identical in horizontal and vertical measurement, in spite of the fact that the patents themselves show them differently? Will CWUs have multiple page TIFF files? Will TIFF images have odd byte widths/heights?

Tagged Image File Format (TIFF) Image Specifications

Image files will be Tagged Image File Format (TIFF) revision 6.0 with CCITT Group 4 compression. There will be only one TIFF image per file and no private data fields will be used. They are defined within a document instance using the embedded image element (EMI) as described below:

<EMI ID="EMI-nnnnnn" WI="mmm" HE="mmm" FILE="USxxxxxxxx-yyyymmdd-nnnnnn.TIF">

The embedded image record attributes identify the size of the image and link it to its physical location. The embedded image is referred to as a callout. An embedded image record will be created for each complex work unit (CWU) and each drawing sheet. Each embedded image is sequentially numbered from the beginning of the patent.

Embedded Image Record Attributes:

Each embedded image record must also have a corresponding entity record that associates the "USxxxxxxxx-yyyymmdd-nnnnnn.TIF" logical file name to the TIFF file. The entity record appears within the patent <!DOCTYPE > record and is created as follows:

<!ENTITY USxxxxxxxx-yyyymmdd-nnnnnn.TIF SYSTEM " USxxxxxxxx-yyyymmdd-nnnnnn.TIF " NDATA TIF>
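To illustrate how the EMI record and its entity record relate, the Python sketch below extracts the attributes from an EMI start tag and builds the matching entity declaration. The regular expressions and function names are assumptions, the attribute values in the example are made up, and the declaration is written without the padding spaces that appear inside the quotes above.

```python
import re

EMI_TAG = re.compile(r"<EMI\s+([^>]*)>")
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')

def parse_emi(tag: str) -> dict:
    """Return the attributes of an EMI start tag as a dictionary."""
    match = EMI_TAG.search(tag)
    if match is None:
        raise ValueError("no EMI start tag found")
    return dict(ATTRIBUTE.findall(match.group(1)))

def entity_declaration(file_name: str) -> str:
    """Build the entity record that ties the logical name to the TIFF file."""
    return f'<!ENTITY {file_name} SYSTEM "{file_name}" NDATA TIF>'

# Hypothetical attribute values, for illustration only.
emi = '<EMI ID="EMI-D00001" WI="100" HE="100" FILE="US06037034-20000314-D00001.TIF">'
attributes = parse_emi(emi)
print(attributes["FILE"])
print(entity_declaration(attributes["FILE"]))
```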

Width/Height Initialization

Red Book currently hard codes the image width and height, even though the information is readily available within the header of the TIFF files. The next release of the Red Book DTD will omit the width and height attributes, requiring that the rendering tool interrogate the TIFF header for the units and file sizes.
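A rendering tool can obtain the pixel dimensions from the TIFF file itself. The following Python sketch reads the ImageWidth (tag 256) and ImageLength (tag 257) fields from the first image file directory as laid out in TIFF 6.0. It is a minimal reader written for illustration: it ignores the resolution fields, does not handle malformed files gracefully, and the function name is hypothetical.

```python
import struct

IMAGE_WIDTH, IMAGE_LENGTH = 256, 257
# TIFF field types that may hold a width or height: SHORT (3) and LONG (4).
FIELD_FORMATS = {3: "H", 4: "I"}

def tiff_dimensions(data: bytes) -> tuple:
    """Return (width, height) in pixels from the first IFD of a TIFF file."""
    if data[:2] == b"II":
        endian = "<"          # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        endian = ">"          # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file")
    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    found = {}
    for index in range(entry_count):
        start = ifd_offset + 2 + 12 * index      # each IFD entry is 12 bytes
        tag, field_type, _count = struct.unpack(
            endian + "HHI", data[start:start + 8]
        )
        if tag in (IMAGE_WIDTH, IMAGE_LENGTH) and field_type in FIELD_FORMATS:
            # A single SHORT or LONG is stored left justified in the
            # 4-byte value field that follows the count.
            fmt = endian + FIELD_FORMATS[field_type]
            size = struct.calcsize(fmt)
            (found[tag],) = struct.unpack(fmt, data[start + 8:start + 8 + size])
    if IMAGE_WIDTH not in found or IMAGE_LENGTH not in found:
        raise ValueError("width or height field missing")
    return found[IMAGE_WIDTH], found[IMAGE_LENGTH]
```

In practice the function would be called with the bytes of an image file, for example the contents returned by open(path, "rb").read().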

Multiple Pages

There will be only one TIFF image per file. If a CWU image spans multiple pages, there will be one file per page.

Odd Byte Widths/Heights

Most likely yes. The standard viewers we have tested are able to display the images regardless of whether the width or height is an odd or even number of bytes.

23. Is B474US (indicates that the term of the patent has been extended under 35 USC 154(b)) ever utilized?

Unfortunately no. The current Red Book generation software does not initialize this field. Term extension information under 35 USC 154 is captured and retained by the data-capture contractor, but not processed by Red Book generation. The data-capture contractor has corrected this problem in Build 20000307.

24. Is there any situation where B582 Field of Search may be legitimately missing? Reference 6,009,708 in issue 20000104. Over the last 6 months Red Book has encountered several patents with empty or omitted B582 (a U.S. Patent Classification searched by the examiner) records.

This is usually an omission by the examiner that was not caught by the data-capture contractor. Empty B582 tags are generated to avoid parsing errors. All US patent documents are required to have a field of search, and the data-capture contractor is required to return the file wrapper to the USPTO if the information is missing so that it can be supplied. The USPTO is currently investigating what method will be used to correct the defective documents.  The data-capture contractor is revising procedures to define this condition as requiring immediate corrective action.

25. Why is the SIR information missing? The Statutory Invention Registration (SIR) data element is missing. It appears that the data-capture contractor knows how to trigger the boiler-plate statement since it appears on the printed patent, but it is not in the data tape.

A SIR is identified by the B130 element containing an "H". In this case, the identical boiler plate text is included within the printed patent. Reference FAQ # 14 for additional discussions about the B130 element.

26. Will Red Book utilize CML (Chemical Markup Language) similar to its MathML implementation?

This is definitely in Red Book's future but will likely wait until CML stabilizes and Red Book migrates to XML. ChemDraw has already produced a Beta version of ChemDraw 6 and the data-capture contractor has been testing its CML export capabilities.

27. Patent 6,010,895 contains tables that consist of sequence listing data. Will they be converted to the SEQ-LST element?

Yes, the data-capture contractor is currently working on converting the sequence-listing data contained within tables to the SEQ-LST CWU.

28. What became of the LREP (Legal attorney or representative information) Green book tags?

Element B740 (Identification of legal representation, that is, attorneys, agents, or representatives associated with the document) contains multiple B741 (Attorney address) tags, each of which in turn contains a PARTY-US tag that contains a name, optional address, optional place of residence, optional descriptive text, optional country of residence, and optional country of nationality. These map to the AAT, AGT, REG, STR, CTY, STA, CNT, and ZIP Green Book tags.

29. Related Application data seem to match in Blue Book and Green Book but what happened to Red Book?

The mapping of related patent application documents between the Green/Blue Book records and the Red Book SGML DTD is somewhat complex and warrants further explanation. In the Green/Blue Book, the related application document (RLAP) records express a linked parent/child relationship as a sequence of record sets, with the relationship defined by the Parent Code (COD) in the record set. For example, given the relationship:

Document A is a continuation-in-part of document B and document C, each of which is a division of document D.

Green/Blue Book would encode this as follows:

  1. Document A is the base document (i.e., the one in the PATN record group).
  2. A series of (RLAP) record sets would then occur in the following order by Parent Code (COD):
  3. COD=72 (Continuation-in-part of) Document B
  4. COD=90 (and) Document C
  5. COD=92 (,each) No Document
  6. COD=84 (,which is a division of) Document D

In the Red Book DTD, by contrast, the Parent/Child relationship is expressed explicitly in the PARENT model group. The PARENT model group is expressed as (DNUM, PDOC, PSTA?, PPUB), corresponding to the Child document number, Parent Document, Status Code, and Publication Date respectively. Red Book would encode the above example as:

  1. <B632><DNUM>Document A</DNUM><PDOC><DNUM>Document B</DNUM></PDOC> ... and so on
  2. <B632><DNUM>Document A</DNUM><PDOC><DNUM>Document C</DNUM></PDOC> ... and so on
  3. <B620><DNUM>Document B</DNUM><PDOC><DNUM>Document D</DNUM></PDOC> ... and so on
  4. <B620><DNUM>Document C</DNUM><PDOC><DNUM>Document D</DNUM></PDOC> ... and so on

Thus, there is a correspondence between the tagging schemes, although it is not necessarily a simple one. There is a general relationship between the following Green/Blue Book and Red Book constructs:

Green/Blue Book Parent Code (COD)   Red Book Element
74, 84                              B620 (Division)
71, 81, 91                          B631 (Continuation)
72, 82                              B632 (Continuation-in-part)
73                                  B660 (Substitution)
86, 89                              Not mapped
92 (, each)                         Spawns multiple records (i.e., one for each document up to the previous record which is not an "And")
90 (And)                            Spawns multiple records of the previous type (e.g., multiple Continuations if the previous non-And record was a continuation)
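The direct rows of this correspondence can be captured as a lookup, with the "And" and "each" codes flagged for separate handling. The Python sketch below encodes only what the table states; the names are hypothetical, and the record-spawning logic for codes 90 and 92 is not implemented.

```python
# Parent Codes that map directly to a Red Book element.
COD_TO_ELEMENT = {
    74: "B620", 84: "B620",                 # Division
    71: "B631", 81: "B631", 91: "B631",     # Continuation
    72: "B632", 82: "B632",                 # Continuation-in-part
    73: "B660",                             # Substitution
}
UNMAPPED_CODES = {86, 89}
SPAWNING_CODES = {90, 92}   # "And" and ", each": expand into several records

def classify_parent_code(cod: int) -> str:
    """Return the Red Book element for a Parent Code, or how it is handled."""
    if cod in COD_TO_ELEMENT:
        return COD_TO_ELEMENT[cod]
    if cod in UNMAPPED_CODES:
        return "not mapped"
    if cod in SPAWNING_CODES:
        return "spawns multiple records"
    raise ValueError(f"unknown Parent Code: {cod}")

for code in (72, 90, 92, 84):
    print(code, classify_parent_code(code))
```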

30. Why are the Assignee Addresses empty?

We find a set of patents in Jan 4, 2000 that have empty data elements where the assignee addresses should be. The ADR element has a start and end tag, but no data. Some of these instances, such as 6,011,767, are fairly obscure foreign assignees, in this case "3dcd, L.L.C.", though we might surmise that we are dealing with a UK company because the inventor is from there. There are some 21 other patents like this, some foreign, some domestic but not even the country is given in the ADR element.

In one other instance, 6,010,871, there are two assignees, the first has an empty address while the second has "Tokyo, Japan". Looking at it as printed one might assume that both are from Tokyo, but of course anyone trying to build a database of assignees and their addresses from Red Book will be frustrated here. So the inference that both are from Tokyo is unwarranted, and in any case makes for bad software practice to do a "look ahead" and assign addresses where they are missing with such scanty evidence.

Finally, in 6,009,665 we have a third situation where, although there is an empty ADR, the Assignee on the printed page reads: "Southpac Trust International, Inc.; as Trustee of The Family Trust, U/T/A dated 12/8/95". This "Trustee" statement can be found nowhere in Red Book, it is not in DTXT, so we have no idea where it came from.

The data-capture contractor captures the assignee address information exactly as it appears on the PTOL85b form, and from nowhere else per current rules. If the address is incomplete or omitted, then that is the way it appears in the published patent.

31. How do I resolve/render the USPTO specific special characters?

The uspto.ent file contains a set of special characters that have Blue and Green Book codes, but no public SGML/XML entities. Listed below are the problem characters:

Green Book          Red Book           Blue Book
.dotbhalfcircle.    &Dotbhalfcircle;   # 511
.dotthalfcircle.    &Dotthalfcircle;   # 510
.dotlhalfcircle     &Dotlhalfcircle;   # 508
.dotrhalfcircle.    &Dotrhalfcircle;   # 509
.dottedcircle.      &Dottedcircle;     # 505
.lhalfcircle.       &Lhalfcircle;      # 503
.quadbond           &Quadbond;         # 185
.rhalfcircle.       &Rhalfcircle;      # 504
.centerline.        &Centerline;       # 551
.asterisk-pseud.    &Asteriskpseud;    # 553
.rect-ver-solid.    &Rectversolid;     # 563
.rect-solid.        &Rectsolid;        # 564
.oval-hollow.       &Ovalhollow;       # 571
.oval-solid.        &Ovalsolid;        # 572
.circle-solid.      &Circlesolid;      # 574
.h-slashed.         &Hslashed;         # 528
.paren-open-st.     &Parenopenst;      # 545
.paren-close-st.    &Parenclosest;     # 546
.brket-open-st.     &Brketopenst;      # 547
.brket-close-st.    &Brketclosest;     # 548
.BHorizBrace.       &Bothorzbrace;     # 507
.THorizBrace.       &Tophorzbrace;     # 506

The data-capture contractor has captured these characters as glyphs for rendering, and as of issue 20000328 they are now included within the entity directory distributed with the Red Book media.

32. Should there be a slash between the series code and the application number with the DNUM element?

No, but it does exist in many locations. The USPTO is investigating.

33. Is the ENTRY element being used to identify when a new column begins in a row of a CALS Table?

Yes, it is.

34. Why do Reissue patents sometimes drop claims or group claims?

This is a result of the reissue insert and delete tags starting in one claim and terminating in another claim. If it starts at the end of the previous claim, then the problem was corrected in February 2000.

Reference Build 20000307 for corrected sample data.

35. Why is highlighting sometimes incorrect around numeric data?

Performance changes implemented in late 1999 introduced a problem with un-bolding numeric data. The problem was corrected in December 1999. Reference Build 20000307 for corrected sample data.

36. There are several problems with the B570 data element. 

First, we think the content model is incorrect in that it permits the exemplary claim number to be optional. We have never seen a patent without at least one exemplary claim, so we think this should be changed to show that there must always be at least one.

Design patents have exactly one claim, which is, obviously, the exemplary claim. In Blue Book, therefore, the exemplary claim for design patents is implied, with the result that it is not recorded in the data from which Red Book is built. Therefore, the exemplary claim number must be optional in the DTD since it does not appear in a significant number of documents.

Second, the DTD seems to suggest that the <B578US> Exemplary claim number will be repeated whenever there are multiple exemplary claims. In practice however, this is not the case, as multiple exemplary claim numbers are all lumped into a single data element, separated by commas. Thus, commas, instead of appropriate SGML, are being used as a form of structural markup.

This is the problem discussed in FAQ #8.

Third, we get the impression from the DTD that "revision markers" may be used, presumably in the case of reissues. But when we examine the way this data is handled for reissues, we find no such markers for insertion or deletion. See for example Re. 36,480 in Jan 4, 2000. Since the exemplary claim here is numbered "9" and since 9 through 29 are all new material in the reissue, we know for sure that this cannot have been the original exemplary claim. Yet there is no indication here that the previous exemplary claim has been removed and a new one inserted in its place. 

Revision elements are used only within Reissued patents. In addition, within the bibliographic information, it is only used with changes to the title and abstract. In the cited example, the front page of the published reissued patent also had no indication that the exemplary claim had changed.  The USPTO will investigate whether this is a policy or a mistake.

37. Why do Design Patents sometimes appear as "D. 99999", "D0099999", or "Des. 99999"?

The USPTO is investigating.

38. The IPC Edition element B516 is at variance with the printed copies; the Red Book says one thing, the printed patent something else.

This problem existed in the first issue of 2000 and was corrected in subsequent issues.

39. What are the differences between versions 1.8 and 1.9 of the Red Book DTDs?

Within the Red Book DTD are comments detailing the revision history. The V1.8 to V1.9 changes are listed below, as well as a list of V1.8 to V1.9 content model changes.

Revisions 2000-03-07:

Content model changes:

Element - B580

ST32-US-Grant-018.DTD content model: (B581*,B582+,B583US*)

ST32-US-Grant-019.DTD content model: (B581*,(B582 | B583US)+)

Element - F

ST32-US-Grant-018.DTD content model: (MATH)

ST32-US-Grant-019.DTD content model: (MATH | PTEXT)

Element - H

ST32-US-Grant-018.DTD content model: (STEXT | F)+

ST32-US-Grant-019.DTD content model: (STEXT+)

Element - PTEXT

ST32-US-Grant-018.DTD content model: (B830 | CIT | CLREF | CRF | CWU | DFREF | DNUM | FGREF | FOO | FOR | HIL | IMG | LST | LSTREF | PAREF | PDAT | SEQREF | TBLREF)+

ST32-US-Grant-019.DTD content model: (B830 | CIT| CLREF | CRF | CWU | DFREF | DNUM | F | FGREF | FOO | FOR | HIL| IMG | LST | LSTREF | PAREF | PDAT | SEQREF | TBLREF)+

Element - STEXT

ST32-US-Grant-018.DTD content model: (PDAT | FOR | IMG | HIL)+

ST32-US-Grant-019.DTD content model: (PDAT | F | FOR | IMG | HIL)+

40. Will math and tables be internal or external files? How will composition/display tools render them? For simple inline math markup, how would such things as corporate name "H(superscript: 2)0 Systems Inc." be tagged?

Earlier versions of Red Book proposed both internal and external complex work units (CWUs). A compromise was reached with Red Book ST.32 US Patent Grant V1.8 1999-08-26 DTD supporting inline tables (CALS) and inline math structures (SGML tailored MathML), but also allowing for external TIFF image file references for all CWUs.

CALS tables are easily rendered via a composition tool like Corel WordPerfect, and math can be displayed from the associated TIFF file. The MathML content is included to support content searching.

The Red Book ST.32 US Patent Grant V1.9 2000-03-07 DTD was further modified to allow for simple text tagged as in-line math without the MathML structure. However, simple in-line math structures like E=MC2 can be rendered using highlighting and not be associated with math content. This is appropriate for H2O Systems Inc., but not for true math structures.

41. Has the USPTO thought about using XML for Red Book, or converting Red Book to XML? Is the USPTO planning to offer the Red Book as a separate XML product at any point in the future?

Currently, the DTD is SGML formatted and sample documents delivered within the first year will be SGML. However, the DTD is readily converted to XML by removing tag minimization indicators and substituting an XML version of CALS markup (all other requirements of XML having been met already). Document instances differ from XML in minor ways, such as empty tag syntax, Unicode versus ISO character set references, and CALS table markup syntax.

Red Book will likely migrate to XML in the next few years, possibly at the time that applications published at eighteen months are first allowed and ready for publication as grants.  It is unlikely that the test data currently distributed would be converted by the USPTO to XML. It is much more likely that production data in SGML will be converted to XML.

42. Are the third party ML's included in your DTD (i.e. CALS MILS-M-28001 TABLEPAK) XML-compliant?

The CALS Table markup is not XML but SGML. In migrating to XML, a new table model such as a forthcoming XML version of CALS will likely be used.

MathML is XML, but the referenced MathML DTD was modified to conform to SGML syntax. The MathML data content is modified on import to Red Book for both empty tag syntax and character entity declarations.

43. Will you provide the XML equivalent to the SGML Red Book DTD as well as referenced third party DTDs?

When an XML Red Book product is available, the corresponding DTDs will also be provided. As of Build 20000307, equivalent SGML information is available within the distributed DTD and ENTITY directories.

44. Have you tried to parse the XML-ized Red Book documents using an off-the-shelf parser or have you developed your own?

We have not yet created any XML Red Book document instances.

The data-capture contractor is parsing the SGML instances using James Clark's SP parser with the following warnings activated:

45. Does the USPTO still want Blue Book users to switch to Green Book? Some time ago, the USPTO appeared to want Blue Book users to switch over to Green Book. Since Green Book will eventually be derived from Red Book and Red Book appears to have many problems, how stable will the new Green Book be? As good as Blue? or Red? Also, does this impact the timing of Green Book delivery? A few hours makes a real difference.

Blue Book is provided by the data-capture contractor and converted to Green Book by the USPTO. Since Red Book will replace Blue Book, a Red to Green conversion tool is being developed by the USPTO to extend the life of Green Book beyond Blue Book.

Red Book is still a maturing product. It represents the USPTO's first major step in the direction of standard, generalized markup. Both Blue Book production data and Red Book sample data will be delivered for customer review and feedback until such time as the USPTO is satisfied that Red Book meets the needs of Blue Book customers.

At such time that production Green Book is generated from Red Book rather than Blue Book, customers will be notified in advance. When it takes place, this change will not delay the delivery of Green Book.

46. Within CWUs are embedded images (EMIs) optional rather than a necessary feature of the patent? We presumed that tables, chemical structures, DNA sequences, and equations will always have accompanying TIFF images.

Early versions of Red Book omitted EMIs within CWUs for the reason that there were no TIFF files available for those documents converted from Green Book to Red Book. All EMIs within the CWUs were consequently optional.

Now that Red Book is being generated from production data, the current version of Red Book DTD (ST.32 US Patent Grant V1.9 2000-03-07) enforces TIFF images for chemical and math CWUs, but still allows optional images for both tables and sequence listings. The data-capture contractor does not generate TIFF images for these CWU types at present. The USPTO will ask the data-capture contractor to create images of all tables, in addition to the markup. Images of sequence listings are under investigation.

47. Is it possible to receive a sample of a re-issue of a re-issue? We have concluded that the re-issuance 'thread' should be separated from the other data if one is to reconstruct a cogent statement. Is this the case?

Reissue markup utilizes a set of unpaired insert and delete tags that track changes made to the document. As of release ST.32 US Patent Grant V1.9 2000-03-07 two issues remain with reissue tagging:

Due to the likelihood of difficulties with a reissue of a reissued patent in Red Book, the USPTO continues to investigate this problem.

48. Were the 13 Red Book examples used in production? After a little head scratching we concluded that the few dozen tables contained in the sample of 13 could not possibly have been used to typeset the patents in question. Can we obtain a patent that makes proper use of tables? We would hope to see a table that has several columns in it and perhaps a chemical structure in one of them.

The 13 sample patents are derived from Green Book using an early release of Red Book and do maintain the integrity of the table as coded in Green Book. This, as you noted, was not the source of the typeset patent. We recommend that you look at tables in the current DTD (ST.32 US Patent Grant V1.9 2000-03-07), starting with issue 20000328.

49. Would you please provide details for the physical and logical format of the Red Book weekly issue? We need to read the tape on PC-based equipment and our software has some constraints regarding record size. Can you tell us how large the records are?

With respect to the maximum record size:

The weekly issue tapes consist of a variety of file types of which only one is ASCII, the SGML document instance for each patent. The record length within the SGML file is unlimited (i.e. a paragraph may be contained within a single record and there is no limit on the paragraph record size). However, as of release ST.32 US Patent Grant V1.9 2000-03-07, the SGML file is relatively newline/record insensitive, and can be re-blocked as long as record breaks are not introduced within leading and trailing tags. This may not be true in future releases (i.e. sequence listing data will include record breaks as part of data content).

With respect to the logical format of Red Book:

The logical format of the Red Book is best understood by traversing the DTD. The previous version of the Red Book DTD (ST.32 US Patent Grant V1.8 1999-08-26) is documented in an HTML format that permits you to walk the tree starting with element PATDOC, and is available at the following URL:

http://www.uspto.gov/web/offices/ac/ido/oeip/sgml/st32/redbook/st32g018/index.html

Similar documentation for the current DTD (ST.32 US Patent Grant V1.9 2000-03-07) is being generated and will be available in the near future.

With respect to the physical format of Red Book:

Physically the Red Book weekly issue tape contains the full text, drawings, and complex work units (tables, mathematical expressions, sequence data, and chemical structures) of each patent issued. The file format is Standard Generalized Markup Language (SGML) in accordance with the ST.32 US Patent Grant V1.9 2000-03-07 Document Type Definition (DTD). Tables and sequence data are included using CALS SGML markup. Mathematical expressions are included using MathML XML markup and external Mathematica Notebook (NB) files. Chemical structures are represented by external CS ChemDraw (CDX) files and MDL Information Systems (MOL) files. Drawings, mathematical expressions, and chemical structures also include external Tagged Image File Format (TIFF) Revision 6.0 with CCITT Group 4 Compression files.

Each weekly update contains approximately 3,000 patents (800 megabytes) on one HP DLT IIIXT (TK85XT) tape. All files associated with a specific patent are compressed and zipped into a single patent zip file. Zipped patent files are grouped by type within a pre-determined directory scheme and re-zipped with path information (but not compressed) into a single weekly update file. The weekly update file is then copied to a DLT tape using the UNIX TAR facility.
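
The nested packaging described above (one compressed zip per patent, collected into a single uncompressed weekly zip) can be read without unpacking everything to disk. The following is a minimal Python sketch, assuming the weekly update file has already been restored from tape with tar. The function name is hypothetical, and the names of the files inside each patent zip are not specified here, so the sketch simply yields whatever it finds.

```python
import io
import zipfile


def iter_patent_files(weekly_zip):
    """Yield (patent_zip_path, member_name, data) for each file in a weekly update.

    weekly_zip is a path or a binary file-like object for the outer zip,
    which is assumed to hold one zip file per patent.
    """
    with zipfile.ZipFile(weekly_zip) as outer:
        for name in outer.namelist():
            if not name.upper().endswith(".ZIP"):
                continue  # skip anything that is not a per-patent zip
            # Open the inner patent zip from memory.
            with zipfile.ZipFile(io.BytesIO(outer.read(name))) as inner:
                for member in inner.namelist():
                    yield name, member, inner.read(member)
```

Reading each inner zip into memory keeps the sketch short. For very large patents, extracting the inner zip to a temporary file first may be preferable.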

Grouping is based on the following directory tree:

YYYYMMDD
 |-UTIL0601
 |   |-US0601nnnn-YYYYMMDD.ZIP
 |   |-US0601nnnn-YYYYMMDD.ZIP
 |   |- . . .
 |-UTIL0602
 |   |-US0602nnnn-YYYYMMDD.ZIP
 |   |- . . .
 |-UTIL0603 . . .
 |-PLANT
 |   |-USP0nnnnnn-YYYYMMDD.ZIP
 |   |- . . .
 |-DESIGN
 |   |-USD0nnnnnn-YYYYMMDD.ZIP
 |   |- . . .
 |-REISSUE
 |   |-USREnnnnnn-YYYYMMDD.ZIP
 |   |- . . .
 |-SIR
 |   |-USH0nnnnnn-YYYYMMDD.ZIP
 |   |- . . .
 |-DTDS
 |-ENTITIES

Where:

  1. The root directory is the issue date;
  2. Utility patents are distributed into sub directories "UTIL" plus the first four characters of the patent number. This assures a maximum of 1000 zipped patent files within a single directory.
  3. Plant, Design, Reissue, and SIR patents are distributed into their respective directories listed above. Note that if the weekly issue does not have a specific patent type, then the patent type sub directory will be omitted.
  4. Sub directory DTDS contains the DTDs and catalogs used to parse the issue.
  5. Sub directory ENTITIES contains the entity files and glyphs referenced by the Red Book DTD.
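
The directory rules above can be sketched as a small Python function that maps a patent zip file name to its location in the tree. This is an illustration only: the function name is hypothetical, and the regular expression assumes file names follow the patterns shown in the tree exactly (US plus an optional P, D, RE, or H prefix, digits, a hyphen, and the issue date).

```python
import re

# Patent number prefix -> sub directory, as listed in the directory tree.
_TYPE_DIRS = {"P": "PLANT", "D": "DESIGN", "RE": "REISSUE", "H": "SIR"}

_NAME = re.compile(r"^US(RE|[PDH])?([0-9]+)-([0-9]{8})\.ZIP$", re.IGNORECASE)


def issue_path(zip_name):
    """Return 'YYYYMMDD/<subdir>/<zip_name>' for a patent zip file name."""
    m = _NAME.match(zip_name)
    if m is None:
        raise ValueError("unrecognized patent zip name: " + zip_name)
    kind, number, issue_date = m.groups()
    if kind is None:
        # Utility patent: "UTIL" plus the first four characters of the number.
        subdir = "UTIL" + number[:4]
    else:
        subdir = _TYPE_DIRS[kind.upper()]
    return issue_date + "/" + subdir + "/" + zip_name
```

For example, issue_path("US06010243-20000104.ZIP") gives "20000104/UTIL0601/US06010243-20000104.ZIP".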

50. What model of DLT tape is going to be used to create Red Book? Why not use CD-ROM?

The drives used to create Red Book tapes are HP DLT 30e and 40e using no hardware compression and HP DLT IIIXT (TK85XT) or equivalent media. Although some issues would fit, CD-R does not have sufficient capacity for every issue.

51. In the past, i.e. using Blue Book data, it was possible to easily format the claims the same way we see them on the printed patent complete with hanging indents. How can we do this using the 13 examples?

The 13 Red Book examples do not adequately capture the paragraph types. This is a result of treating paragraph formatting as a style issue. However, in subsequent releases of Red Book, both claim step processing and paragraph types are accurately captured. Reference the latest version of Red Book (ST.32 US Patent Grant V1.9 2000-03-07).

52. What is this "FOR" class code? As in 320FOR123?

The FOR should be handled the same way as DIG. FOR refers to a collection of foreign art (non-US patents) that exists in a subclass.

53. Inventor information contains the inventor address as well as the inventor residence. What is the difference between the two?

ADR is for correspondence and RESIDENCE indicates where the inventor lives, which sometimes might be in a different country than the correspondence address. In any case, the RESIDENCE element is not a complete address; it indicates only a branch of the military or a city (with or without a state or country). The ADR element gives what should be a complete address.

54. The art unit was always a three digit number (NNN) but now it is a variable length number with a space between the 2nd and 3rd digit (e.g. NN N). Why the difference?

Art Units are now called Industry Sectors. The B748US element has been defined to cover the variations expected in this field. Numbers can be up to four digits. The space in the third position was an error.

55. The DTD claims that the assistant examiner may be repeated although I have never seen more than one in the samples. Has this changed?

The DTD allows zero or more assistant examiners. This is not a change in policy.

56. Where is the "Rule 47 provision" in the Red Book?

There is now an empty tag indicating that Rule 47 was invoked. It might not be in any of the examples, but it is in the DTD. When present, this tag signifies that the application was filed under Rule 47, meaning that the applicant(s) refused to execute the application or could not be found.

<!ELEMENT B221US - O EMPTY >

57. In tables, how are column widths measured? I.e., what does COLWIDTH="120PT" equate to?

Table column widths are based on 72 points per inch. 120 PT therefore would be 1.67 inches.
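
The conversion is a simple division by 72. A minimal Python sketch follows, assuming COLWIDTH values always take the form of a number followed by PT, as in the question. The function name is hypothetical.

```python
import re

POINTS_PER_INCH = 72


def colwidth_to_inches(colwidth):
    """Convert a COLWIDTH value such as "120PT" to inches."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*PT\s*", colwidth, re.IGNORECASE)
    if m is None:
        raise ValueError("unsupported COLWIDTH value: " + colwidth)
    return float(m.group(1)) / POINTS_PER_INCH
```

With this sketch, colwidth_to_inches("120PT") rounds to 1.67.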

58. Since doctype declarations are not allowed in DTDs, why is it in the version 1.8 DTD? 

The way you have enclosed the entire Red Book DTD v1.8 within <!DOCTYPE PATDOC [....]> is the method to use when a DTD is concatenated with a document instance conforming to that DTD. It is also useful during development of the DTD, to avoid error messages like "'ELEMENT' declaration not allowed in prolog" when parsing the DTD file alone. However, not removing the DOCTYPE declaration when distributing the file renders the file unusable as it stands for parsing document instances in the conventional way, that is, parsing an SGML instance that refers to the DTD rather than being concatenated with it. This is because DOCTYPE declarations are not allowed in DTDs.

Some parsers or SGML applications require the DOCTYPE declaration while others complain when it is included. For example, the data-capture contractor comments out the declaration as follows:

<!-- DOCTYPE PATDOC [ -->

...

<!-- ]> -->
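
For tools that reject the DOCTYPE wrapper, the same commenting-out can be applied mechanically. The following is a minimal Python sketch, assuming the DTD file opens the wrapper with <!DOCTYPE PATDOC [ and that the last "]>" in the file closes it. The function name is hypothetical, and this is not the contractor's actual procedure.

```python
import re


def comment_out_doctype(dtd_text):
    """Comment out a <!DOCTYPE PATDOC [ ... ]> wrapper around DTD declarations."""
    text, count = re.subn(
        r"<!DOCTYPE\s+PATDOC\s*\[",
        "<!-- DOCTYPE PATDOC [ -->",
        dtd_text,
        count=1,
    )
    if count == 0:
        return dtd_text  # no wrapper present, leave the file unchanged
    # Assumption: the final "]>" in the file is the end of the wrapper.
    head, sep, tail = text.rpartition("]>")
    if not sep:
        return text
    return head + "<!-- ]> -->" + tail
```

A file without the wrapper is returned unchanged, so the function can be run over every file in the DTDS directory.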

59. We find that the data element B122US is missing in SIRs. This keeps us from placing the appropriate "boiler-plate text" on the front page.

This element is currently not initialized in Build 20000307. The USPTO is investigating.

60. How do you interpret the terminal disclaimer statements? 

While looking at the printed copy of Design Patents we find an aspect relating to terminal disclaimers that seems to be incorrect. Design patents with terminal disclaimers have the statement [*] Notice: This patent is subject to a terminal disclaimer. But they also have an explicit statement on the front page saying "[**] Term: 14 years". It seems to us that to have both statements simultaneously present is inconsistent and misleading since one tells us that the term is not the usual 14 years while the other tells us that it is.

In general, the presence of a terminal disclaimer is a signal that one must pursue further investigations to determine whether a patent is or is not in force. Nothing that appears on the front page of a patent grant at time of issue can be taken at its face value with respect to this question since any of it could have changed after the file wrapper was closed for printing.

61. Element B540 has incorrect highlighting.

Among the markup problems in element B540 (title of the invention), we are finding many examples of incorrect and improper highlighting markup in Red Book files. As one example from the issue week of Jan. 4, 2000, we offer the title of Pat. No. 6010243 (Hessler et al.):

Method of zero point setting of a thermal conductivity detector system in a chamber especially for CO<HIL><SB>2 </SB></HIL><HIL><BOLD>measuring in a controlled atmosphere incubator</BOLD></HIL>

The first HIL element, the subscript in the chemical symbol for carbon dioxide, is valid and correct; the second HIL element, which throws the phrase "measuring in a controlled atmosphere incubator" into boldface, is invalid and incorrect.

As another we offer the title of Pat. No. 6010629:

<HIL><ITALIC>Microthrix Parvicella </ITALIC></HIL><HIL><BOLD>forming and bulking controlling method for waste water treatment</BOLD></HIL>

Here too the first HIL element is a valid italicization of a Latin scientific name; the boldfacing of the remainder of the title is not.

Finally we wish to point out that the trailing space in BOTH of these examples should NOT be subscripted or italicized structurally. The fact that the government's own vendor happens to typeset an italicized space and a subscripted space identically with a light-roman space of unhighlighted attributes should not influence the structural markup of Red Book and will indeed produce improper appearances in more modern typesetting systems due to variations in size.

The USPTO is investigating.

 

Build 20000307

In order to accurately report the status of a specific Red Book document instance, the data-capture contractor is now identifying both the DTD version and the build date within the instance as follows:

<PATDOC DTD="1.9" STATUS="BUILD 20000307">

The DTD attribute defines the version of the Red Book DTD and the STATUS attribute defines the build date of the application software that created the instance. Instances that do not contain the STATUS attribute were generated prior to implementing the build controls.

Since problem resolutions are now reported by build, the build number can be used to associate data problems and their resolution with a specific version of the Red Book build software. Weekly issue 20000328 is the first issue created with "Build 20000307".
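
Software that consumes Red Book instances can read these two attributes to decide which known problems apply. The following is a minimal Python sketch, assuming the attributes are always double-quoted as in the example above and that the STATUS value ends with the build date. The function name is hypothetical.

```python
import re

_PATDOC_TAG = re.compile(r"<PATDOC\b([^>]*)>", re.IGNORECASE)
_ATTR = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def patdoc_version(sgml_text):
    """Return (dtd_version, build_date) from the PATDOC start tag.

    build_date is None for instances generated before build controls,
    which carry no STATUS attribute.
    """
    tag = _PATDOC_TAG.search(sgml_text)
    if tag is None:
        raise ValueError("no PATDOC start tag found")
    attrs = {name.upper(): value for name, value in _ATTR.findall(tag.group(1))}
    status = attrs.get("STATUS")
    # "BUILD 20000307" -> "20000307"
    build = status.split()[-1] if status else None
    return attrs.get("DTD"), build
```

For the start tag shown above, the function returns ("1.9", "20000307").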

 

USPTO Red Book Contacts

Please address questions about the Red Book data to Ed Johnson and questions about the Red Book DTD to Bruce Cox.

 
