Frequently Asked Questions
Can I download a large batch of PDF documents all at once?
Please contact us at industrydocuments@ucsf.edu if you are interested in large downloads of PDFs. For batch downloads of records/metadata, you can query our Solr index directly using an API. We also provide data sets containing metadata and OCR text for our entire corpus. Please see Industry Documents Library API and Data Set for more information and documentation.
Can I use documents or media clips in my project?
You can use these materials in a non-commercial project if the use falls under ‘Fair Use.’ Please see Copyright and Fair Use for more information.
Do you make your entire Data Set available? Is there an API?
You can query our Solr index directly using an API. We also provide data sets containing metadata and OCR text for our entire corpus. Please see Industry Documents Library API and Data Set for more information and documentation.
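As a rough sketch of what a direct query might look like, the Python below builds a Solr select URL without sending it. The endpoint SOLR_SELECT_URL and the `id` field name are hypothetical placeholders; the real endpoint, field names, and any access requirements are described on the API documentation page. The parameters q, rows, start, wt, and fl are standard Solr request parameters, and dd is the date field used in the date query examples in this FAQ.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; use the endpoint given on the API documentation page.
SOLR_SELECT_URL = "https://example.org/solr/select"

def build_solr_query(q, rows=100, start=0, fields=None):
    """Build a Solr select URL from standard request parameters."""
    params = {"q": q, "rows": rows, "start": start, "wt": "json"}
    if fields:
        params["fl"] = ",".join(fields)  # return only these fields per record
    # urlencode escapes characters such as ':', '[', ']' and spaces in the query.
    return SOLR_SELECT_URL + "?" + urlencode(params)

# Second page of 50 results for documents dated in 1981.
print(build_solr_query("dd:[19810101 TO 19811231]", rows=50, start=50,
                       fields=["id", "title"]))
```

To page through a large result set, increase start by rows on each request.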
What is the format for a date query?
Dates in a query are expressed as YYYYMMDD.
-The year (YYYY) must be between 1760 and the current year.
-The month (MM) must be between 01 and 12 (note the leading zero).
-The day (DD) must be between 01 and 31 (note the leading zero).
Examples of valid date queries: dd:19810123 or dd:[19810123 TO 20101111]
Examples of invalid date queries: dd:00000000 or dd:[00000000 TO 19899999] (each contains a year, month, or day outside the ranges above)
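The date rules are simple enough to check before sending a query. The Python below is a minimal sketch that assumes queries take exactly the two forms shown in the examples: a single dd:YYYYMMDD term or a dd:[YYYYMMDD TO YYYYMMDD] range. The function names are hypothetical. Like the rules themselves, it checks only the numeric ranges, so a date such as 19810231 passes even though February 31 does not exist.

```python
import re
from datetime import date

def is_valid_date_term(term, today=None):
    """Check one YYYYMMDD term against the year, month, and day ranges."""
    if not re.fullmatch(r"[0-9]{8}", term):
        return False
    year, month, day = int(term[:4]), int(term[4:6]), int(term[6:])
    current_year = (today or date.today()).year
    return 1760 <= year <= current_year and 1 <= month <= 12 and 1 <= day <= 31

def is_valid_date_query(query, today=None):
    """Accept dd:YYYYMMDD or dd:[YYYYMMDD TO YYYYMMDD]; reject anything else."""
    single = re.fullmatch(r"dd:([0-9]{8})", query)
    if single:
        return is_valid_date_term(single.group(1), today)
    rng = re.fullmatch(r"dd:\[([0-9]{8}) TO ([0-9]{8})\]", query)
    if rng:
        # Both ends of a range must satisfy the rules.
        return all(is_valid_date_term(t, today) for t in rng.groups())
    return False

for q in ["dd:19810123", "dd:[19810123 TO 20101111]",
          "dd:00000000", "dd:[00000000 TO 19899999]"]:
    print(q, is_valid_date_query(q))
```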
How do I search for variant spellings of names or terms (fuzzy search)?
Fuzzy search is useful when you are not sure how a name or term is spelled in the documents, or when you suspect a word is misspelled in the text.
Add the ~ operator to the end of a term, for example teen~
A fuzzy search like teen~ finds words spelled similarly to teen. Similarity is measured by edit distance: the number of single-character edits needed to turn one word into the other. Each edit is an insertion (teens), a deletion (ten), or a substitution (teem).
You can specify the maximum edit distance. For instance, teen~1 returns only words that are at most 1 edit away from teen.
cigarettes~ will return documents where the term is spelled cigaretes (1 edit away) or cigarretes (2 edits away). The more conservative cigarettes~1 will return cigaretes but not cigarretes.
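To see how edit distance is counted, here is a small Python implementation of Levenshtein distance (insertions, deletions, and substitutions) applied to the example words. It is an illustration, not the search engine's own code. Lucene's fuzzy query, which Solr uses, also counts swapping two adjacent letters as a single edit by default; this sketch does not.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if letters match)
            ))
        prev = curr
    return prev[-1]

for word in ["teens", "ten", "teem"]:
    print("teen ->", word, edit_distance("teen", word))
for word in ["cigaretes", "cigarretes"]:
    print("cigarettes ->", word, edit_distance("cigarettes", word))
```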
If you do not specify a number, the system searches for teen~0.5, which returns words that are about 50% similar to teen (in this case, up to 2 edits away).
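One way to see how a fraction such as 0.5 becomes a number of edits is sketched below. It assumes the index follows the convention of Lucene's fuzzy query: allowed edits = (1 - similarity) * length of the term, rounded down and capped at 2. The cap and the function name are assumptions for illustration; the FAQ itself only states the teen~0.5 case, which this formula reproduces.

```python
# Assumption: Lucene's convention for fractional fuzzy values, with at most
# 2 edits allowed (Lucene's supported maximum).
MAX_EDITS = 2

def similarity_to_edits(similarity, term):
    """Translate the number after ~ into a maximum edit distance for term."""
    if similarity >= 1:
        return min(int(similarity), MAX_EDITS)  # whole numbers are edit counts
    if similarity == 0:
        return 0  # ~0 means an exact match
    # A fraction is read as the share of the term that must stay the same.
    return min(int((1 - similarity) * len(term)), MAX_EDITS)

for term in ["ten", "teen", "cigarettes"]:
    print(term, similarity_to_edits(0.5, term))
```

Because the edit count scales with term length, the same ~0.5 is stricter on short words (ten allows 1 edit) than on longer ones, until the cap applies.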
How do you identify "Potential Duplicates"?
Potential duplicates are identified when a document matches another in the following fields:
collection, title, documentdate, pages, availability
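A minimal Python sketch of this kind of check, assuming two records count as potential duplicates only when all five listed fields are exactly equal (the FAQ does not say whether values are normalized first). The `id` field and the sample records are hypothetical.

```python
from collections import defaultdict

# Fields named in the FAQ; exact matching on all five is an assumption.
DUPLICATE_FIELDS = ("collection", "title", "documentdate", "pages", "availability")

def find_potential_duplicates(records):
    """Group record ids whose five key fields are all identical."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record.get(field) for field in DUPLICATE_FIELDS)
        groups[key].append(record["id"])
    # Only keys shared by two or more records indicate potential duplicates.
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical sample data: a1 and a2 match; b1 differs in page count.
sample_records = [
    {"id": "a1", "collection": "collection-A", "title": "Sales memo",
     "documentdate": "19810123", "pages": 2, "availability": "public"},
    {"id": "a2", "collection": "collection-A", "title": "Sales memo",
     "documentdate": "19810123", "pages": 2, "availability": "public"},
    {"id": "b1", "collection": "collection-A", "title": "Sales memo",
     "documentdate": "19810123", "pages": 3, "availability": "public"},
]
print(find_potential_duplicates(sample_records))
```

Grouping on a tuple key finds all matches in one pass instead of comparing every pair of records.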
What is the "More Like This" feature when viewing a document?
"More Like This" returns documents that are similar to the currently viewed document.
The results include two types of documents:
- The public version of a "restricted" document (if it has a public counterpart).
- Recommendations based on matches in title and author, with slightly more weight given to title.
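The recommendation half of this feature can be illustrated with a toy weighted score. Solr's own MoreLikeThis component selects "interesting" terms by tf-idf; this sketch swaps in a simpler Jaccard word-overlap score so the title-versus-author weighting is easy to see. The weights 1.2 and 1.0, the field names, and the sample documents are hypothetical; the FAQ says only that title gets slightly higher weight.

```python
import re

# Hypothetical weights: the FAQ says only that title gets "slightly higher" weight.
TITLE_WEIGHT = 1.2
AUTHOR_WEIGHT = 1.0

def tokens(text):
    """Lowercased alphanumeric words in a field value."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Share of distinct words two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def more_like_this(current, candidates, top_n=5):
    """Rank other documents by weighted title and author overlap."""
    title, author = tokens(current["title"]), tokens(current["author"])
    scored = []
    for doc in candidates:
        if doc["id"] == current["id"]:
            continue  # never recommend the document being viewed
        score = (TITLE_WEIGHT * jaccard(title, tokens(doc["title"]))
                 + AUTHOR_WEIGHT * jaccard(author, tokens(doc["author"])))
        if score > 0:
            scored.append((score, doc["id"]))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties by id
    return [doc_id for _, doc_id in scored[:top_n]]

current = {"id": "d1", "title": "Smoking and health report", "author": "Smith"}
candidates = [
    current,
    {"id": "d2", "title": "Smoking and health", "author": "Jones"},
    {"id": "d3", "title": "Annual sales report", "author": "Smith"},
    {"id": "d4", "title": "Marketing plan", "author": "Lee"},
]
print(more_like_this(current, candidates))
```

With these weights, an exact title match outranks an exact author match, but a strong author match can still outrank a partial title match.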
What is the "Previous/Next Bates" feature when viewing a document?
"Previous/Next Bates" allows you to view documents in order of Bates number, a sequential number stamped on most litigation documents.
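Ordering by Bates number means comparing the numeric part as a number rather than as text, so that TI99 comes before TI100. A minimal Python sketch, assuming each document is keyed by a single Bates number made of an optional letter prefix followed by digits (real Bates formats vary, and documents often carry a begin/end range). The function names and sample values are hypothetical.

```python
import re

# Assumption: a Bates number is an optional letter prefix followed by digits.
BATES_RE = re.compile(r"([A-Za-z]*)([0-9]+)")

def bates_key(bates):
    """Sort key: (prefix, numeric part), so TI99 sorts before TI100."""
    match = BATES_RE.fullmatch(bates)
    if match is None:
        raise ValueError(f"unrecognized Bates number: {bates!r}")
    prefix, number = match.groups()
    return (prefix.upper(), int(number))

def prev_next(bates, all_bates):
    """Return the (previous, next) Bates numbers around bates, or None at the ends."""
    ordered = sorted(all_bates, key=bates_key)
    i = ordered.index(bates)
    previous_doc = ordered[i - 1] if i > 0 else None
    next_doc = ordered[i + 1] if i + 1 < len(ordered) else None
    return previous_doc, next_doc

print(prev_next("TI0100", ["TI0100", "TI0099", "TI0105", "AB0001"]))
```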
What is the "Browse" feature when viewing a document?
"Browse" allows you to view documents in the order they were ingested into the archive as part of a contextual set.