Industry Documents Library API and Data Sets

API

The Industry Documents Library uses Solr to index the document corpus. Users who want to access document metadata or search the full text of the documents programmatically can query the Industry Documents Solr server directly through our application programming interface (API). The API lets users query the index and export the metadata of matching documents so that their own programs can process the search results. Search results can be exported in these formats: XML, JSON, Python, Ruby, PHP, and CSV.

Download documentation
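
As a rough illustration of what a programmatic query might look like, the Python sketch below builds a request URL using the standard Solr parameters `q` (query), `wt` (response format), `rows`, and `start`. The endpoint `https://solr.example.org/solr/select`, the function names, and the search term are placeholders chosen for this example, not the actual IDL values; the real server address and any available field names should be taken from the API documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# ASSUMPTION: placeholder endpoint, not the real IDL Solr address.
# Replace it with the URL given in the API documentation.
SOLR_SELECT_URL = "https://solr.example.org/solr/select"

# Export formats listed on this page, written as Solr "wt" values.
SUPPORTED_FORMATS = {"xml", "json", "python", "ruby", "php", "csv"}


def build_query_url(query, wt="json", rows=10, start=0, base_url=SOLR_SELECT_URL):
    """Return a Solr select URL for `query` (hypothetical helper)."""
    if wt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {wt!r}")
    params = {"q": query, "wt": wt, "rows": rows, "start": start}
    # urlencode percent-encodes the values so the query is safe in a URL.
    return f"{base_url}?{urlencode(params)}"


def fetch_json(query, rows=10, start=0, base_url=SOLR_SELECT_URL):
    """Run the query and parse the JSON response (hypothetical helper)."""
    url = build_query_url(query, wt="json", rows=rows, start=start, base_url=base_url)
    with urlopen(url, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only builds and prints the URL; no network request is made here.
    print(build_query_url("nicotine", wt="csv", rows=5))
```

Paging through a large result set would typically mean calling `fetch_json` repeatedly while increasing `start` by `rows` each time, assuming the server permits it.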

Data Sets

For researchers who would prefer to work with Industry Documents Library (IDL) metadata and optical character recognition (OCR) text within their own database systems, IDL has made these files available for free download via the link below. Please consult the included readme file for instructions. Note that the IDL website's user interface provides access to the most current data: new documents are released every month, and the data sets are updated monthly as needed. These files are provided on a do-it-yourself basis. IDL is unable to provide individual technical support for downloading files or for setting up a database in which to ingest them. We do welcome feedback; please contact IDL directly via email or phone.

Access data sets

Opioid Industry Documents Data Sets

For researchers interested in opioid industry documents in particular, Johns Hopkins University hosts the Opioid Industry Documents Archive Toolbox, which supports the use of computational methods on these documents and their metadata.

Visit Opioid Industry Documents Archive Toolbox

Other Data Sets

A growing number of research projects using industry documents have made their data sets publicly available, including:

If you are aware of other publicly available data sets that use IDL documents or metadata and could be added to this list, please contact us at industrydocuments@ucsf.edu.