



Part of the Commerce Business Apps Challenge

We're challenging developers to look for innovative ways to utilize DOC and other publicly available data to help businesses identify opportunities, grow, enhance productivity, and create jobs.

$10,000 USD in prizes (1st: $5,000; 2nd: $3,000; 3rd: $2,000)

Ends: April 30, 2012 @ 11:59 PM EDT

DOC/USPTO Apps for Innovation

Webinar: Thursday, March 29, 2012, 11:00 to 11:30 AM EDT



Mike Kruger, DOC, Director of Digital Strategy (Host), [email protected]

Christopher Leithiser (pronounced "LightHizer"), USPTO, IT Specialist (Presenter), [email protected], (703) 756-1244 (office)

If you have questions regarding the USPTO Patent and Trademark Bulk Data available at no charge from Google, Inc., send them to: [email protected]



Open Government Initiative / Google, Inc.

(2) Datasets

Innovative Ideas




Open Government Initiative / Google, Inc.:

History: The USPTO had been working toward electronic distribution of information without physical media.

Examples: In the 1980s, Patent Grant Bibliographic information was available for download at no charge from a 286 computer via a bulletin board system (BBS). Foreign exchange agreements, such as the Trilateral Agreement among the USPTO, EPO, and JPO, moved to medialess exchange (the Tumbleweed system). In the early 2000s, Trademarks asked us to provide their information at no charge via the Internet, so we set up an FTP site. USPTO Security later had us shut down the FTP site; we now have an HTTPS site.

Approximately three years ago, President Obama launched the Open Government Initiative (OGI), which emphasizes transparency. This resulted in a new website for U.S. Federal Government data, which went live in May 2009. Although the site does not hold all of the U.S. Government's data, it links to data sources and provides searchable metadata. The USPTO now has 31 datasets listed on the site, all free to the public, and it released all of its datasets more than a year earlier than originally planned.

On September 4, 2009, the USPTO released a Request for Information (RFI) (Solicitation # SS-PAPT-09-10008, Public Data Dissemination) and held public meetings on the East Coast and the West Coast. We received six responses.

On July 6, 2010, the USPTO released a Request for Proposals (RFP) (Solicitation # DOC52PAPT1000025, Public Data Dissemination), which closed on August 12, 2010. The USPTO received no responses.

Subsequently, the USPTO made a two-year sole-source procurement: a no-cost service agreement with Google, Inc. The agreement was just extended for an additional year and now ends February 28, 2013. The USPTO plans to release another RFP in the next couple of months, based on the earlier one; it will include data hosting and Public PAIR extraction/hosting, but not the infrastructure piece.


Datasets: U.S. Patent Grant Bibliographic Text (2001 to Present), Part 1 of 2: Contains the bibliographic text (i.e., front page) of each patent grant issued weekly (Tuesdays) from January 2001 to the present (excludes images/drawings).

The file formats are Standard Generalized Markup Language (SGML), in accordance with the U.S. Patent Grant Version 2.4 Document Type Definition (DTD), and eXtensible Markup Language (XML), in accordance with the U.S. Patent Grant Version 2.5, 4.0 International Common Element (ICE), 4.1 ICE, and 4.2 ICE Document Type Definitions (DTDs).

XML Resources at the USPTO: (These are being updated).

If you extract a single document from a Patent Grant Bibliographic Text file, place it in a directory with the correct DTD, and then double-click it, Internet Explorer will open it successfully. NOTE: You may receive a warning about ActiveX controls.
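Extracting one document by hand is fine for spot checks. The Python sketch below automates it under an assumption the slide does not state: that a weekly file is a series of complete XML documents, each starting with its own `<?xml ...?>` declaration (which would explain why the whole file does not open as a single XML document). The function name `split_weekly_xml` is hypothetical.

```python
import re
from typing import Iterator

# Assumption: each document in the weekly file begins with an XML declaration
# such as <?xml version="1.0" encoding="UTF-8"?>.
_XML_DECL = re.compile(r"<\?xml\s[^>]*\?>")


def split_weekly_xml(text: str) -> Iterator[str]:
    """Yield each concatenated XML document (declaration included) from a weekly file."""
    starts = [m.start() for m in _XML_DECL.finditer(text)]
    for i, start in enumerate(starts):
        # A document runs until the next declaration, or to the end of the file.
        end = starts[i + 1] if i + 1 < len(starts) else len(text)
        yield text[start:end].strip()
```

For example, read a weekly file with `pathlib.Path(...).read_text(encoding="utf-8")` (UTF-8 is also an assumption) and write each yielded document to its own file in the directory that holds the matching DTD.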

Additionally, if you extract a single document from a Patent Grant Bibliographic Text file and open it in MS Excel as an XML list, Excel will import the data under column headings taken from the XML tags.
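A rough standard-library equivalent of Excel's XML list import is sketched below in Python: it walks one document and maps each leaf element's tag path (the "column heading") to its text values. Attributes are ignored, and `flatten_document` is a hypothetical name. Because ElementTree does not load the external DTD, entity references defined only in the DTD may raise a parse error.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def flatten_document(xml_text: str) -> dict[str, list[str]]:
    """Map each leaf element's tag path to the list of text values found there."""
    root = ET.fromstring(xml_text)
    columns: dict[str, list[str]] = {}

    def walk(elem: ET.Element, path: str) -> None:
        children = list(elem)
        if not children:
            text = (elem.text or "").strip()
            if text:
                # Repeated elements (e.g. several inventors) share one column.
                columns.setdefault(path, []).append(text)
            return
        for child in children:
            walk(child, f"{path}/{child.tag}")

    walk(root, root.tag)
    return columns
```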

NOTE: All Patent Grant Bibliographic Text files open successfully in MS Word, Notepad, WordPad, TextPad, and UltraEdit.

This product includes a or file for each week [where "yyyymmdd" is a Tuesday issue date and "nn" is a two-digit, fixed-length number (with leading zero) representing the sequentially-numbered week of the year].
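The date part of the weekly naming rule can be sketched in Python. This computes only the "yyyymmdd" and "nn" tokens, not complete file names, and it assumes "nn" counts the issue dates sequentially from 01 within each year; the slide does not say whether that matches ISO week numbering. The function name is hypothetical.

```python
from __future__ import annotations

from datetime import date, timedelta

TUESDAY = 1  # date.weekday(): Monday is 0, so Tuesday is 1 and Thursday is 3


def weekly_tokens(year: int, weekday: int = TUESDAY) -> list[tuple[str, str]]:
    """Return (yyyymmdd, nn) pairs for every `weekday` in `year`.

    Assumption: nn counts those weekdays sequentially within the year,
    zero-padded to two digits (01, 02, ...).
    """
    day = date(year, 1, 1)
    day += timedelta(days=(weekday - day.weekday()) % 7)  # first matching weekday
    tokens = []
    week = 1
    while day.year == year:
        tokens.append((day.strftime("%Y%m%d"), f"{week:02d}"))
        day += timedelta(days=7)
        week += 1
    return tokens
```

Since January 3, 2012 was the first Tuesday of 2012, `weekly_tokens(2012)` starts with `("20120103", "01")`. Passing `weekday=3` gives the Thursday dates used for patent application publications.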

Each weekly zip file contains three files: pgbyyyymmdd.xml or ipgbyyyymmdd.xml (bibliographic information in XML ICE); pgbyyyymmddlst.txt or ipgbyyyymmddlst.txt (list of patent grant numbers in ascending order); and pgbyyyymmddrpt.txt or ipgbyyyymmddrpt.html (statistical/summary report).
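A small Python sketch that opens a weekly zip file and identifies the three members using the naming patterns listed above. The helper names are hypothetical, matching is done on file names only (not contents), and case-insensitive matching is an assumption.

```python
from __future__ import annotations

import re
import zipfile
from typing import IO, Iterable, Union

# The optional leading "i" accepts both the pgb... and ipgb... variants.
_MEMBER_PATTERNS = {
    "bibliographic": re.compile(r"^i?pgb\d{8}\.xml$", re.IGNORECASE),
    "number_list": re.compile(r"^i?pgb\d{8}lst\.txt$", re.IGNORECASE),
    "report": re.compile(r"^(?:pgb\d{8}rpt\.txt|ipgb\d{8}rpt\.html)$", re.IGNORECASE),
}


def classify_members(names: Iterable[str]) -> dict[str, str]:
    """Map each expected role to the member name that matches it."""
    roles = {}
    for name in names:
        base = name.rsplit("/", 1)[-1]  # ignore any folder inside the zip
        for role, pattern in _MEMBER_PATTERNS.items():
            if pattern.match(base):
                roles[role] = name
    return roles


def weekly_zip_members(path_or_file: Union[str, IO[bytes]]) -> dict[str, str]:
    """Open a weekly zip (path or binary file object) and classify its members."""
    with zipfile.ZipFile(path_or_file) as zf:
        return classify_members(zf.namelist())
```

A role missing from the returned dictionary signals that an expected file is absent from that week's zip.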

Approximately 4,000 patent grants per week.

Approximately 5 MB per weekly zipfile.

Available from Google: or

Available directly from the USPTO: • • •


Datasets: U.S. Patent Grant Bibliographic Text (1976 to 2001), Part 2 of 2: Contains the bibliographic text (i.e., front page) of each patent grant issued weekly (Tuesdays) from January 1976 to December 2001 (excludes images/drawings).

The file format is a subset of the Green Book, ASCII text:

It includes patent number, series code and application number, type of patent, filing date, title, issue date, inventor information, assignee name at time of issue, foreign priority information, related US patent documents, classification information, U.S. and foreign references, attorney, agent or firm/legal representative, Patent Cooperation Treaty (PCT) information, abstract, and if present Statement of U.S. Government Interest.
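The slide lists the fields but not the record layout. The Python sketch below assumes the commonly described Green Book arrangement: a line holding only `PATN` starts each patent record; other lines carry a tag of up to four characters in columns 1 to 4 and the value from column 6; a tag with no value (such as `INVT`) starts a group; and a line beginning with spaces continues the previous value. Check these details and the sample tags (`WKU`, `TTL`, `NAM`) against the Green Book documentation; `parse_green_book` is a hypothetical name.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def parse_green_book(lines: Iterable[str]) -> Iterator[dict[str, list[str]]]:
    """Yield one dict per patent, mapping field keys to lists of values.

    Fields inside a group are keyed "GROUP/TAG" (e.g. "INVT/NAM") so that
    repeated tags in different groups do not collide.
    """
    record = None
    group = ""
    last_key = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue
        if line.startswith(" "):
            # Continuation line: append to the previous field's last value.
            if record is not None and last_key is not None:
                record[last_key][-1] += " " + line.strip()
            continue
        tag, value = line[:4].strip(), line[5:].strip()
        if tag == "PATN":
            if record is not None:
                yield record
            record, group, last_key = {}, "", None
        elif record is None:
            continue  # skip anything before the first PATN line
        elif not value:
            group, last_key = tag, None  # group header such as INVT or ASSG
        else:
            key = f"{group}/{tag}" if group else tag
            record.setdefault(key, []).append(value)
            last_key = key
    if record is not None:
        yield record
```

Because it is a generator, passing it an open file object for an annual file such as 1976.dat streams the data one patent at a time instead of loading the whole year into memory.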

NOTE: All Patent Grant Bibliographic Text files open successfully in MS Word, Notepad, WordPad, TextPad, and UltraEdit.

This product includes one file for each year (1976 to 2001); all of that year's weekly files were concatenated into the annual file.

Each annual zip file contains one file: yyyy.dat (bibliographic information in ASCII).

EXCEPTION 1: Beginning 09/03/1996, we also began providing weekly zip files (for example, the file for 09/03/1996 contains pba19960903.txt).

EXCEPTION 2: Beginning 01/07/1997, each weekly zip file contains three files: pbayyyymmdd.txt (bibliographic information in ASCII); pbayyyymmddlst.txt (list of patent grant numbers in ascending order); and pbayyyymmddrpt.txt (statistical/summary report).
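The date rules above can be captured in a small Python helper that lists which files should cover a given 1976 to 2001 issue date. This is a sketch: `expected_files` is hypothetical, it only builds the file names described on this slide, and it does not check that the date is actually a Tuesday issue date.

```python
from __future__ import annotations

from datetime import date

WEEKLY_FILES_START = date(1996, 9, 3)     # EXCEPTION 1: weekly pba...txt files begin
LIST_AND_REPORT_START = date(1997, 1, 7)  # EXCEPTION 2: lst and rpt files are added


def expected_files(issue: date) -> list[str]:
    """Names of the files that should cover a 1976-2001 issue date."""
    if not 1976 <= issue.year <= 2001:
        raise ValueError("this dataset covers issue dates from 1976 to 2001")
    files = [f"{issue.year}.dat"]  # the annual concatenated file always applies
    if issue >= WEEKLY_FILES_START:
        stamp = issue.strftime("%Y%m%d")
        files.append(f"pba{stamp}.txt")
        if issue >= LIST_AND_REPORT_START:
            files += [f"pba{stamp}lst.txt", f"pba{stamp}rpt.txt"]
    return files
```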

Approximately 4,000 patent grants per week.

Approximately 1.6 GB total.

Available from Google: or

Available directly from the USPTO: • • •


Datasets: U.S. Patent Application Publication Bibliographic Text (March 15, 2001 to Present):

Contains the bibliographic text (i.e., front page) of each patent application publication (non-provisional utility and plant) published weekly (Thursdays) from March 15, 2001 to Present (excludes images/drawings).

The file formats are eXtensible Markup Language (XML), in accordance with the U.S. Patent Application Version 1.5, 1.6, 4.0 International Common Element (ICE), 4.1 ICE, and 4.2 ICE Document Type Definitions (DTDs).

XML Resources at the USPTO: (These are being updated).

If you extract a single document from a Patent Application Publication Bibliographic Text file, place it in a directory with the correct DTD, and then double-click it, Internet Explorer will open it successfully. NOTE: You may receive a warning about ActiveX controls.

Additionally, if you extract a single document from a Patent Application Publication Bibliographic Text file and open it in MS Excel as an XML list, Excel will import the data under column headings taken from the XML tags.

NOTE: All Patent Application Publication Bibliographic Text files open successfully in MS Word, Notepad, WordPad, TextPad, and UltraEdit.

This product includes a or file for each week [where "yyyymmdd" is a Thursday publication date and "nn" is a two-digit, fixed-length number (with leading zero) representing the sequentially-numbered week of the year].

Each weekly zip file contains three files: pabyyyymmdd.xml or ipabyyyymmdd.xml (bibliographic information in XML ICE); pabyyyymmddlst.txt or ipabyyyymmddlst.txt (list of published patent application numbers in ascending order); and pabyyyymmddrpt.txt or ipabyyyymmddrpt.html (statistical/summary report).
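Because the list file gives the document numbers in ascending order, it can serve as a simple integrity check against the number of documents in the XML file. The Python sketch below assumes one number per non-blank line and fixed-width numbers (so text order matches numeric order); neither detail is stated on the slide, and the function names are hypothetical.

```python
from __future__ import annotations


def read_number_list(text: str) -> list[str]:
    """Read a pabyyyymmddlst.txt file, assuming one document number per non-blank line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def check_number_list(numbers: list[str], document_count: int) -> list[str]:
    """Return the problems found; an empty list means both checks passed."""
    problems = []
    if len(numbers) != document_count:
        problems.append(
            f"list has {len(numbers)} numbers but the XML file has {document_count} documents"
        )
    # Assumes fixed-width numbers, so string order matches numeric order.
    if numbers != sorted(numbers):
        problems.append("numbers are not in ascending order")
    return problems
```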

Approximately 5,000 patent application publications per week.

Approximately 2.7 MB per weekly zipfile.

Available from Google: or

Available directly from the USPTO: • • •


Innovative Ideas:

Homogenize the patent grant bibliographic text data (i.e., convert all years to a single format).

Do the same for the patent application publication bibliographic data.
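One way to start on homogenization is to define a single target record and write one small adapter per source format. In this Python sketch, the `BiblioRecord` fields, the Green Book tags (`WKU`, `TTL`, `ISD`), and the XML field names are assumptions for illustration, and the input dictionaries are assumed to have already been extracted from the source files.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BiblioRecord:
    """Hypothetical common record shared by every year and format."""
    doc_number: str
    title: str
    date: str  # ISO 8601, e.g. "1976-01-06"
    source_format: str


def _iso_date(yyyymmdd: str) -> str:
    return datetime.strptime(yyyymmdd, "%Y%m%d").date().isoformat()


def from_green_book(fields: dict[str, str]) -> BiblioRecord:
    # Assumed Green Book tags: WKU (patent number), TTL (title), ISD (issue date).
    return BiblioRecord(fields["WKU"], fields["TTL"], _iso_date(fields["ISD"]), "green-book")


def from_xml_ice(fields: dict[str, str]) -> BiblioRecord:
    # Hypothetical keys for values already pulled out of an XML ICE document.
    return BiblioRecord(
        fields["doc-number"], fields["invention-title"], _iso_date(fields["date"]), "xml-ice"
    )
```

Keeping the source format on each record makes it easy to trace any normalization problem back to the file it came from.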

Capture patent grant bibliographic text data from 1790 to 1975 using the image data.

Build a text-searchable database (updated weekly) that includes both datasets discussed today, in which search queries can be saved and result sets can be saved, extracted, and tailored.
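A minimal sketch of this idea in Python, using an in-memory inverted index (term to document numbers) with AND queries and named saved queries. A real service would more likely rely on a database's full-text search or a dedicated search engine; the class and method names here are hypothetical.

```python
from __future__ import annotations

import re
from collections import defaultdict

_TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> set[str]:
    return set(_TOKEN.findall(text.lower()))


class BiblioIndex:
    """In-memory inverted index: term -> set of document numbers."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)
        self.saved_queries: dict[str, str] = {}

    def add(self, doc_number: str, text: str) -> None:
        # Call once per document when each weekly file arrives.
        for term in tokenize(text):
            self.postings[term].add(doc_number)

    def search(self, query: str) -> list[str]:
        """Document numbers containing every query term (AND search), sorted."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)

    def save_query(self, name: str, query: str) -> None:
        self.saved_queries[name] = query

    def run_saved(self, name: str) -> list[str]:
        return self.search(self.saved_queries[name])
```

Storing the query text rather than its results means a saved query automatically picks up documents added in later weeks.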

Build a text-searchable database (updated weekly) that includes subsets of both datasets discussed today (e.g., documents related to Green Technology).
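Subsets can be carved out with a filter applied before indexing. This Python sketch keeps records whose title or abstract mentions a word from a small, hypothetical Green Technology keyword list; a real subset would need a vetted definition (for example, one based on classification codes) rather than this list. The record keys `title` and `abstract` are also assumptions.

```python
from __future__ import annotations

import re
from typing import Iterable, Iterator

# Hypothetical keyword list, for illustration only.
GREEN_KEYWORDS = frozenset({"solar", "wind", "photovoltaic", "geothermal", "biofuel"})
_WORD = re.compile(r"[a-z]+")


def green_subset(
    records: Iterable[dict[str, str]], keywords: frozenset[str] = GREEN_KEYWORDS
) -> Iterator[dict[str, str]]:
    """Yield records whose title or abstract contains at least one keyword."""
    for record in records:
        text = f"{record.get('title', '')} {record.get('abstract', '')}".lower()
        if keywords & set(_WORD.findall(text)):
            yield record
```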

Apply the same ideas as above, but use the full-text data (75 MB/104 MB per week) or full text with embedded images (1.4 GB/1.5 GB per week):
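Before choosing full text or full text with images, it helps to estimate yearly storage from the weekly sizes quoted above. This Python sketch simply multiplies by 52 weekly issues (some years have 53) and uses 1 GB = 1,000 MB.

```python
WEEKS_PER_YEAR = 52  # some years have 53 weekly issue dates


def annual_gb(weekly_mb: float, weeks: int = WEEKS_PER_YEAR) -> float:
    """Rough yearly storage in GB (1 GB = 1,000 MB) from a weekly size in MB."""
    return weekly_mb * weeks / 1000


WEEKLY_SIZES_MB = {
    "full text (75 MB/week)": 75,
    "full text (104 MB/week)": 104,
    "full text with images (1.4 GB/week)": 1400,
    "full text with images (1.5 GB/week)": 1500,
}

for label, size_mb in WEEKLY_SIZES_MB.items():
    print(f"{label}: about {annual_gb(size_mb):.1f} GB per year")
```

This works out to roughly 3.9 GB and 5.4 GB per year for full text, and roughly 73 GB and 78 GB per year with embedded images, before any indexing overhead.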



Combine USPTO applicant/inventor information with other USPTO datasets (e.g., with USPTO assignments (ownership) data): or
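Combining datasets usually comes down to a join on a shared key. This Python sketch attaches assignment (ownership) records to grant records by document number; the field names (`doc_number`, `assignee`) are hypothetical, and a real join would first need both sources to format document numbers the same way.

```python
from __future__ import annotations

from collections import defaultdict


def attach_assignees(
    grants: list[dict[str, str]], assignments: list[dict[str, str]]
) -> list[dict[str, object]]:
    """Left join: every grant is kept, with a (possibly empty) list of assignees."""
    assignees_by_number: dict[str, list[str]] = defaultdict(list)
    for record in assignments:
        assignees_by_number[record["doc_number"]].append(record["assignee"])
    return [
        {**grant, "assignees": list(assignees_by_number.get(grant["doc_number"], []))}
        for grant in grants
    ]
```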

Combine USPTO patent grants and patent application publications with other DOC data (e.g., Census or Economic data).
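As a sketch of combining patent data with Census data, the Python function below computes patents per 100,000 residents by state. Both inputs are assumptions: a list with one state code per patent (for example, the first-named inventor's state) and a state-to-population mapping taken from a Census table. No real figures are included, and the function name is hypothetical.

```python
from __future__ import annotations

from collections import Counter


def patents_per_100k(patent_states: list[str], population: dict[str, int]) -> dict[str, float]:
    """Patents per 100,000 residents for each state with a positive population."""
    counts = Counter(patent_states)  # states with no patents count as 0
    return {
        state: counts[state] * 100_000 / pop
        for state, pop in population.items()
        if pop > 0
    }
```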
