As 2024 comes to a close, we’d like to share our gratitude for all of you in the IDL community and your ongoing support and connection to our work.
Here are some of the achievements you helped us reach in 2024:
22,459,816 documents now available through IDL!
If you’re able, please consider making a tax-deductible donation to the Industry Documents Library to help us preserve and provide access to the collections for years to come.
From all of us at the IDL, we wish you a peaceful holiday season, and a healthy and hopeful New Year ahead.
Kate, Rachel, Rebecca, Sven, Melissa, J.A., Emma, and Julie
Truth Tobacco Industry Documents
Juul Labs Collection
117,000+ new documents were posted to the Juul Labs Collection today. This brings the collection to over 2.9 million documents and includes social media reports, marketing campaigns, product complaint logs, product design materials, and more.
In partnership with the University of North Carolina at Chapel Hill Libraries, the IDL continues to process and make available documents subject to public disclosure under JUUL Labs’s 2021 settlement with North Carolina.
Check out “Giant Companies Took Secret Payments to Allow Free Flow of Opioids,” a wonderful in-depth investigation (and use of Opioid Archive documents!) on pharmacy benefit managers (PBMs) by Chris Hamby for the New York Times. Learn more about interactions between PBMs and opioid manufacturers like Purdue and Mallinckrodt.
A compilation of OIDA documents cited in the article:
For another perspective on PBMs and the opioid crisis, read Catherine Dunn's October article in Barron's, Confidential Files Detail PBMs’ Backroom Negotiations—and Their Role in the Opioid Crisis.
OIDA staff added 259,000+ documents to its newest collection, the Teva and Allergan Documents. This batch brings the collection to more than 848,000 documents and includes sales training presentations, marketing communications, and more.
The Teva and Allergan collection will encompass about 1.9 million documents when complete. Processed documents are being made public on a rolling basis with monthly releases expected from 2024-2026.
Announcing the OIDA Data Products
Explore our newest resource, OIDA Data Products — tools that can facilitate and inspire research.
We created these datasets to provide access points for data analysis of Opioid Industry Documents. Researchers get a running start on exploring data, benefiting from our work to curate and deduplicate documents, provide a glossary of spreadsheet column names, and more. Users can craft queries online or select a subset of the data for download, allowing them to interact with OIDA data before dedicating time and resources to a full analysis.
“OIDA Data Products reduces some of the barriers to working with OIDA data, helping researchers get a sense of the many gems hidden among OIDA’s millions of documents,” said Kevin Hawkins, OIDA program director for Johns Hopkins University. “Working with data wranglers, statisticians, and developers, we hope these data products will facilitate new research, helping us to better understand the opioid crisis.”
To learn more and access OIDA Data Products, visit https://data.oida-resources.jhu.edu/.
151,000+ new documents were posted to the Juul Labs Collection today!
This new batch of documents includes social media presence reports, marketing campaigns, focus group findings, product design, and more.
In partnership with the University of North Carolina at Chapel Hill Libraries, the IDL continues to process and make available documents subject to public disclosure under JUUL Labs’s 2021 settlement with North Carolina.
The Opioid Industry Documents Archive (OIDA), a collaborative undertaking between the University of California, San Francisco and Johns Hopkins University, invites you to explore our newest resource, OIDA Data Products—tools that can facilitate and inspire research.
We created these datasets to provide access points for data analysis of OIDA documents. Researchers get a running start on exploring data, benefiting from our work to curate and deduplicate documents, provide a glossary of spreadsheet column names, and more. Users can craft queries online or select a subset of the data for download, allowing them to interact with OIDA data before dedicating time and resources to a full analysis.
“OIDA Data Products reduces some of the barriers to working with OIDA data, helping researchers get a sense of the many gems hidden among OIDA’s millions of documents,” said Kevin Hawkins, OIDA program director for Johns Hopkins University. “Working with data wranglers, statisticians, and developers, we hope these data products will facilitate new research, helping us to better understand the opioid crisis.”
Current OIDA Data Products include:
OIDA was launched by UCSF and Johns Hopkins in March 2021 as a free public resource. The digital repository includes publicly disclosed documents arising from litigation brought against opioid manufacturers, distributors, pharmacies and consultants by local and state governments and tribal communities.
The Archive contains more than 17.9 million pages in 3.8 million documents and is expected to continue to grow for years to come. Documents are full-text searchable and include an array of relevant materials from many different companies, including emails, memos, presentations, sales reports, budgets, audit reports, Drug Enforcement Administration briefings, meeting agendas and minutes, expert witness reports and trial transcripts.
OIDA may be of use to many different parties, including individuals and communities harmed by the opioid crisis, as well as the media, health care practitioners, students, lawyers, and researchers. Major news outlets such as the Washington Post and New York Times and academic resources like Health Affairs Scholar and the American Journal of Public Health have published investigative reports and analysis using OIDA documents.
To learn more and access OIDA Data Products, visit https://data.oida-resources.jhu.edu/.
The team behind the UCSF-JHU Opioid Industry Documents Archive was pleased to release its latest OIDA resource in late October: the OIDA Image Collection. This incredible new resource highlights images extracted from documents created by the opioid industry. Many of these documents were designed for internal company audiences and board members, while others were targeted to prescribers and consumers. The images provide insight into corporate practices that shaped the opioid crisis.
The OIDA Image Collection currently features 3,907 images extracted from PowerPoint and Excel documents in OIDA. How did we select these images? And how did we create metadata like titles, descriptions, and categories to help you browse and find images of interest? It was a complicated process involving a mix of automated and manual steps.
First, we processed all PowerPoint and Excel documents in OIDA as of December 2023 to extract every image embedded in these documents – roughly 4 million images!
Second, we ran some code that removed:
This yielded roughly 100,000 images.
Third, we deduplicated the images found, tracking the source documents so that the OIDA Image Collection website could link to every document in the archive where each image was found. This step left us with just 13,688 images. This dramatic drop is due in part to the fact that OIDA often contains more than one copy of the same email attachment, as well as each version of a document that company employees circulated.
Fourth, our metadata librarian reviewed these 13,688 images to decide which to keep according to our subjective criteria for inclusion. The criteria were refined over time with input from a number of OIDA team members, but in the end, we decided to discard the following:
Finally, we were left with the 3,907 images that we’ve made available in the collection.
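To make the extraction and deduplication steps more concrete, here is a minimal sketch in Python. It is not the OIDA team's code. It assumes the documents are modern .pptx or .xlsx files, which are ZIP archives that conventionally store embedded media under `ppt/media/` and `xl/media/`, and it assumes duplicates are byte-identical, so a SHA-256 hash is enough to detect them. The function names, document ids, and placeholder image bytes are all hypothetical, and the filtering and human review steps are left out.

```python
import hashlib
import io
import zipfile
from collections import defaultdict

# Folders where .pptx and .xlsx files conventionally keep embedded media.
MEDIA_PREFIXES = ("ppt/media/", "xl/media/")


def extract_embedded_images(file_bytes):
    """Yield the raw bytes of each media file embedded in an OOXML document."""
    with zipfile.ZipFile(io.BytesIO(file_bytes)) as archive:
        for name in archive.namelist():
            if name.startswith(MEDIA_PREFIXES) and not name.endswith("/"):
                yield archive.read(name)


def deduplicate(documents):
    """Map each distinct image (by SHA-256) to the ids of all its source documents.

    `documents` maps a document id to that document's file bytes.
    """
    sources_by_hash = defaultdict(set)
    for document_id, file_bytes in documents.items():
        for image_bytes in extract_embedded_images(file_bytes):
            digest = hashlib.sha256(image_bytes).hexdigest()
            sources_by_hash[digest].add(document_id)
    return {digest: sorted(ids) for digest, ids in sources_by_hash.items()}


def make_demo_document(images):
    """Build a tiny in-memory ZIP that mimics a .pptx layout (demo data only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("ppt/slides/slide1.xml", "<slide/>")
        for index, image_bytes in enumerate(images, start=1):
            archive.writestr(f"ppt/media/image{index}.png", image_bytes)
    return buffer.getvalue()


# Made-up document ids and placeholder "image" bytes, for illustration only.
demo_documents = {
    "doc-a": make_demo_document([b"chart", b"logo"]),
    "doc-b": make_demo_document([b"chart"]),
}
unique_images = deduplicate(demo_documents)
print(len(unique_images), "unique images")
for digest, document_ids in unique_images.items():
    print(digest[:12], document_ids)
```

Keeping a set of document ids per hash is what lets a single image record link back to every document that contains it. Older binary .ppt and .xls files, and images that differ only slightly (for example after resizing), would need a different approach.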
We used a mix of AI models and human expert review to generate metadata to help users browse and search for images.
The OIDA Image Collection website runs on an instance of WordPress, with the images themselves served from a content delivery network (CDN) to increase performance of the site and ease the process of updating the website. We hired Mission Media, a web agency familiar with creating polished websites that follow university branding requirements, to build the website.
The website’s search feature uses not just the metadata fields but also the text within the image, which we generated using optical character recognition (OCR).
AI models are rapidly developing, so we expect to get better results when we use them in the future. One way you can help us with that is by participating in our collaboration with Hugging Face to test multiple AI models for writing image descriptions (captions). While no model is perfect, we want to know which is our best starting point for generating descriptions for images in the future.
We are also considering adding new features to the website, like allowing user corrections and annotations and improving the “related images” feature to be based on the image files themselves rather than just the metadata. And we might adjust the entropy score used at the filtering stage to be less aggressive in removing images before human review.
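The post does not say how the entropy score is computed. One common choice, assumed here, is the Shannon entropy of an image's pixel intensities: an image that is a single flat color scores zero, while an image with varied content scores higher. The sketch below uses hypothetical function names and an arbitrary threshold to show how lowering the threshold would make the filter less aggressive.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a sequence of pixel intensity values."""
    total = len(pixels)
    if total == 0:
        return 0.0
    counts = Counter(pixels)
    # Sum p * log2(1 / p) over the observed intensity values.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def keep_for_review(pixels, threshold):
    """Keep an image only if its entropy meets the (hypothetical) threshold."""
    return shannon_entropy(pixels) >= threshold


# Toy grayscale "images" as flat lists of intensities (illustrative data only).
blank_slide = [255] * 16                  # one flat color
two_tone_logo = [0, 255] * 8              # two equally common values
detailed_chart = [0, 85, 170, 255] * 4    # four equally common values

for name, pixels in [
    ("blank_slide", blank_slide),
    ("two_tone_logo", two_tone_logo),
    ("detailed_chart", detailed_chart),
]:
    print(name, shannon_entropy(pixels), keep_for_review(pixels, threshold=1.5))
```

With a threshold of 1.5, only the most varied toy image passes. Dropping the threshold to 1.0 would also send the two-tone image on to human review, which is the kind of adjustment the paragraph above describes.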
Over the past few months, the OIDA team has added nearly 600,000 documents from Teva to the archive, with many more to come from this and other companies. We also plan to expand our image detection and classification techniques beyond PowerPoint and Excel documents to the many other file formats found in OIDA. We hope to mine the archive for many more images to add to the OIDA Image Collection, providing an even richer view of how opioids and their effects were represented or misrepresented to patients and prescribers and, more broadly, how the drug crisis was imagined and perpetuated by the industry while it unfolded.
OIDA staff added 218,267 documents to its newest collection, the Teva and Allergan Documents. This batch brings the collection to more than 588,000 documents and includes sales training presentations, interviews with prescribers, reports on focus groups, product communications, and more.
The Teva and Allergan collection will encompass about 1.9 million documents when complete. Processed documents are being made public on a rolling basis with monthly releases expected from 2024-2026.
Announcing the OIDA Image Collection and How You Can Help!
We are proud to introduce the OIDA Image Collection, a website created to highlight images within the OIDA documents. Images provide unique entry points to understand a visual narrative of the opioid industry and gain insight into harmful corporate and marketing practices that contributed to the opioid crisis. Researchers can browse, limit their results by filters, and search by keyword. By viewing the source documents, you can see the images in their original context.
The OIDA team used artificial intelligence (AI) to write captions for highlighted images within the OIDA Image Collection, but we could use your help! We have generated captions using two different AI models and need to decide which AI-generated caption is better for use in the OIDA Image Collection. Thanks to support from Hugging Face, a platform for collaborating on models and datasets for machine learning, and its Argilla data annotation tool, we have created a handy interface for voting on the quality of image captions. To help us out, you’ll just need to create a free Hugging Face account.
Your image labeling efforts will contribute to an open preference dataset, crucial for "steering" AI models towards generating more useful outputs in specific domains. Please email opioidarchive@jh.edu with any questions.
117K new documents were posted to the Juul Labs Collection today!
This new batch of documents includes social media presence reports, marketing campaigns, focus group findings, product design, and more.
In partnership with the University of North Carolina at Chapel Hill Libraries, the IDL continues to process and make available documents subject to public disclosure under JUUL Labs’s 2021 settlement with North Carolina.
2019 SRNT slide deck.
World Digital Preservation Day – 7 November 2024
Each year, the Digital Preservation Coalition promotes World Digital Preservation Day, which falls on the first Thursday of November.
In line with this year’s theme of 'Preserving Our Digital Content: Celebrating Communities,' the UC Libraries’ Digital Preservation Working Group (DPWG) is hosting a community-building Open House event which presents an opportunity for everyone to learn more about digital preservation while also sharing their own stories and experiences in this space.
Please join the event online on November 7, 2024, from 11 AM to 12 PM, where you’ll hear digital preservation stories from us at the UCSF Industry Documents Library as well as the UC & Jepson Herbaria.
Register via Zoom.
Annual Tobacco and Other Industry Documents Workshop: Recording Now Available!
IDL and the UCSF Center for Tobacco Control Research and Education (CTCRE) held the "Annual Tobacco and Other Industry Documents Workshop" virtually on October 8th from 9 am to 12:15 pm PT.
If you didn't have a chance to join us, the event recording is now available.