Friday, February 07, 2025

January 2025 Updates - Tobacco, Opioid and Chemical Industry Documents

Collection Updates


Opioid Industry Documents Archive
Teva and Allergan Documents

OIDA staff added 226,880 documents to its newest collection, the Teva and Allergan Documents. This batch brings the collection to more than 1.3 million documents and includes sales training presentations, marketing communications, and more.
The Teva and Allergan collection will encompass about 1.9 million documents when complete. Processed documents are being made public on a rolling basis with monthly releases expected through 2025.


Truth Tobacco Industry Documents
JUUL Labs Collection

2,800+ new documents were posted to the Juul Labs Collection today! In partnership with the University of North Carolina at Chapel Hill Libraries, the IDL has processed and made available documents subject to public disclosure under Juul Labs’s 2021 settlement with North Carolina. The IDL is pleased to announce that processing of these documents is nearly complete! Since the project began in December 2023, our archivists have released an average of 240,000 documents every month to our public website. This January’s release is significantly smaller: it consists of documents that required more time-consuming and complicated PII redactions, or that posed technical challenges we saved for the end. The small size of this release indicates that the majority of the North Carolina Juul Labs documents are now fully available online to our research communities.

In the coming months, the IDL archiving team will work through what is left of the NC Juul documents: files that were originally part of large ZIP archives whose structure was disrupted, so that the contents came to the IDL as separate individual records. We have observed that these small files, unfortunately, do not offer much value without the greater context of the original ZIP, so we will work to reconstruct that original structure and release the files accordingly.


New California JUUL Documents Coming Soon
Although we have neared the end of the North Carolina Juul documents, the IDL will soon release additional documents from the California Juul multistate settlement, which was negotiated by the California Department of Justice and six other states in 2023. These forthcoming releases will not be duplicates of the approximately 3 million Juul Labs records already in the IDL but rather are new additions that will further enrich the Juul Labs Collection. Our first release of the new California Juul documents will be coming next month.


Depositions and Trial Transcripts (DATTA)
57 new transcripts of tobacco trial testimony and depositions by Robert Proctor were added.


Chemical Industry Documents Archive: The Forever Pollution Project Collection

In February 2023, five European countries proposed a PFAS "universal restriction" under the EU chemical regulation REACH (Registration, Evaluation, Authorization and Restriction of Chemicals). The ban would include the entire PFAS chemical 'universe', with some derogations until alternatives are developed. In response, hundreds of industry players have been lobbying decision-makers across Europe to undermine and perhaps kill the proposal.

Over the course of a year, a team of 46 journalists in 16 countries investigated the lobbying and disinformation campaign by the PFAS industry and its allies. This cross-border, interdisciplinary investigation known as the Forever Lobbying Project collected over 14,000 unpublished documents on PFAS, constituting the world’s largest collection to date on the topic. The majority originate from 184 freedom of information requests, 66 of which were shared with the group by the EU lobby watchdog, Corporate Europe Observatory.

This unique trove of documents was donated by the Forever Lobbying Project and is now available to the public in our new Forever Pollution Project Collection.


Purdue/Sackler settlement under consideration includes document disclosure requirement:

The proposed $7.4 billion settlement with members of the Sackler family and their company, Purdue Pharma (Purdue), includes a provision for document disclosure, which would require Purdue to make public more than 30 million documents related to Purdue and the Sacklers’ opioid business.

According to the Office of the Massachusetts Attorney General, if the settlement is approved, the documents are “expected to be added to the existing public document repository” (UCSF-JHU Opioid Industry Documents Archive) that already houses millions of documents from multiple industries responsible for the crisis.

UCSF and Johns Hopkins University are pleased that these vitally significant documents are one step closer to being made public. The Opioid Industry Documents Archive provides evidence on how and why this crisis happened, so that this type of tragedy can be prevented from occurring again.

We look forward to having the opportunity to contribute our expertise in public health, digital archives, and information technology to enable timely and free public access to these important documents.


Education & Research Updates

Center to End Corporate Harm Launches at UCSF

We are very excited to announce the new UCSF Center to End Corporate Harm!

Products including fossil fuels, chemicals, alcohol, tobacco, and ultra-processed foods are now responsible for approximately one in three deaths worldwide. In the US, a rise in chronic diseases, including cancer (175%), diabetes (283%), Parkinson’s (133%), and dementias (75%), has led to what scientists call an “industrial epidemic” of disease.

The Center to End Corporate Harm brings together scientists, researchers, and physicians who study various health-harming industries and, in collaboration with the UCSF Industry Documents Library, are working to identify, analyze, and prevent industry-driven disease and develop strategies to counter the destructive influence of polluters and poisoners.


Could You Be the 2025 UCSF Library Artist in Residence?

The UCSF Library Archives and Special Collections and Makers Lab are accepting proposals for the sixth annual UCSF Library Artist in Residence program. The UCSF Library Artist in Residence award, valued at $8,000, will be given annually to one candidate with a degree in studio arts or a related field or a history of exhibiting artistic work in professional venues. The 2025 residency will begin on July 1, 2025 and end on June 30, 2026.
For more information and the application process, please visit the UCSF Library site.


UC Love Data Week

The UC Love Data Week is a week-long offering of presentations and workshops focused on data access, management, security, sharing, and preservation. All members of the University of California community are welcome to attend.

The IDL will be featured in the Friday, February 14th session at 3pm: Unlocking image, audio, and video data in the Industry Documents Library: a Python based, open source stack for audio transcription, text extraction, sentiment analysis, and topic classification

Thursday, December 19, 2024

Industry Documents Library - 2024 in Review

Season’s Greetings from the UCSF Industry Documents Library!

As 2024 comes to a close, we’d like to share our gratitude for all of you in the IDL community and your ongoing support and connection to our work.
Here are some of the achievements you helped us reach in 2024:

22,459,816 documents now available through IDL!

  • In collaboration with Johns Hopkins University, we continued to acquire and make public millions of documents disclosed in opioid litigation through the UCSF-JHU Opioid Industry Documents Archive (OIDA), including a major new collection of Teva and Allergan materials. There are now over 4 million opioid industry documents available!
  • We launched the Juul Labs Collection in partnership with the University Libraries at the University of North Carolina at Chapel Hill. We’ve added close to 3 million documents to the collection this year and it will continue to expand with additional Juul Labs documents in 2025.
  • We welcomed Emma James and Julie Hillpot to the IDL Team: Emma is our project archivist for the Juul Labs Collection, and Julie is supporting our data annotation and quality control workflows for opioid industry documents.
  • We delivered multiple webinars, workshops, and presentations, including the annual Tobacco and Other Industry Documents Workshop in partnership with the UCSF Center for Tobacco Control Research and Education.
  • We continued to make significant progress on redesigning and rebuilding the IDL website to add new features and make it easier to search. Stay tuned for more news about this next year!
  • We continued our Student Data Science Summer Fellowship in collaboration with the UCSF Library Archives & Special Collections and the Data Science and Open Scholarship team.
  • We added 33 new publications which cite industry documents to our Bibliography, bringing the total number of citations to 1,209!

If you’re able, please consider making a tax-deductible donation to the Industry Documents Library to help us preserve and provide access to the collections for years to come.

From all of us at the IDL, we wish you a peaceful holiday season, and a healthy and hopeful New Year ahead.

Kate, Rachel, Rebecca, Sven, Melissa, J.A., Emma, and Julie

Thursday, December 19, 2024

New Teva and Juul Labs Documents Posted!

Opioid Industry Documents Archive
Teva and Allergan Documents

OIDA staff added 235,705 documents to its newest collection, the Teva and Allergan Documents. This batch brings the collection to over 1 million documents and includes sales training presentations, marketing communications, and more.

The Teva and Allergan collection will encompass about 1.9 million documents when complete. Processed documents are being made public on a rolling basis with monthly releases expected through 2025.

Truth Tobacco Industry Documents
Juul Labs Collection

117,000+ new documents were posted to the Juul Labs Collection today. This brings the collection to over 2.9 million documents and includes social media reports, marketing campaigns, product complaint logs, product design materials, and more.

In partnership with the University of North Carolina at Chapel Hill Libraries, the IDL continues to process and make available documents subject to public disclosure under JUUL Labs’s 2021 settlement with North Carolina.

Tuesday, December 17, 2024

Opioid Archive Documents on PBMs

Check out “Giant Companies Took Secret Payments to Allow Free Flow of Opioids,” a wonderful in-depth investigation (and use of Opioid Archive documents!) on pharmacy benefit managers (PBMs) by Chris Hamby for the New York Times. Learn more about interactions between PBMs and opioid manufacturers like Purdue and Mallinckrodt.

A compilation of OIDA documents cited in the article:

Screenshot of New York Times website, 2024-12-17, including a headline and image illustrating an article about PBMs and opioids.

For another perspective on PBMs and the opioid crisis, read Catherine Dunn's October article in Barron's, Confidential Files Detail PBMs’ Backroom Negotiations—and Their Role in the Opioid Crisis.

Thursday, November 21, 2024

November 2024 Updates - New Opioid and JUUL Documents

Opioid Industry Documents Archive - Teva and Allergan Documents


OIDA staff added 259,000+ documents to its newest collection, the Teva and Allergan Documents. This batch brings the collection to more than 848,000 documents and includes sales training presentations, marketing communications, and more.

The Teva and Allergan collection will encompass about 1.9 million documents when complete. Processed documents are being made public on a rolling basis with monthly releases expected from 2024-2026.


Announcing the OIDA Data Products

Explore our newest resource, OIDA Data Products — tools that can facilitate and inspire research.

We created these datasets to provide access points for data analysis of Opioid Industry Documents. Researchers get a running start on exploring data, benefiting from our work to curate and deduplicate documents, provide a glossary of spreadsheet column names, and more. Users can craft queries online or select a subset of the data for download, allowing them to interact with OIDA data before dedicating time and resources to a full analysis.

“OIDA Data Products reduces some of the barriers to working with OIDA data, helping researchers get a sense of the many gems hidden among OIDA’s millions of documents,” said Kevin Hawkins, OIDA program director for Johns Hopkins University. “Working with data wranglers, statisticians, and developers, we hope these data products will facilitate new research, helping us to better understand the opioid crisis.”

To learn more and access OIDA Data Products, visit https://data.oida-resources.jhu.edu/.


Tobacco Industry Documents Archive - Juul Labs Collection

151,000+ new documents were posted to the Juul Labs Collection today!
This new batch of documents includes social media presence reports, marketing campaigns, focus group findings, product design, and more.

In partnership with the University of North Carolina at Chapel Hill Libraries, the IDL continues to process and make available documents subject to public disclosure under JUUL Labs’s 2021 settlement with North Carolina.


An update regarding our audio-visual files:

The IDL partners with the excellent Internet Archive to host the audio and video files found in our industry documents archives.

As you may be aware, the Internet Archive has recently faced a series of cyberattacks, prompting them to enhance security measures, strengthen firewalls, and update software. Unfortunately, these challenges have temporarily prevented the upload of multimedia items, impacting our last two document releases (October and November 2024).

We are closely monitoring the situation and maintaining communication with the Internet Archive team. Once uploading can resume, we will begin posting the audio and video files from our latest collection additions.


New Papers and Publications


Wednesday, November 20, 2024

Introducing OIDA Data Products

The Opioid Industry Documents Archive (OIDA), a collaborative undertaking between the University of California, San Francisco and Johns Hopkins University, invites you to explore our newest resource, OIDA Data Products—tools that can facilitate and inspire research.

We created these datasets to provide access points for data analysis of OIDA documents. Researchers get a running start on exploring data, benefiting from our work to curate and deduplicate documents, provide a glossary of spreadsheet column names, and more. Users can craft queries online or select a subset of the data for download, allowing them to interact with OIDA data before dedicating time and resources to a full analysis.

“OIDA Data Products reduces some of the barriers to working with OIDA data, helping researchers get a sense of the many gems hidden among OIDA’s millions of documents,” said Kevin Hawkins, OIDA program director for Johns Hopkins University. “Working with data wranglers, statisticians and developers, we hope these data products will facilitate new research helping us to better understand the opioid crisis.”

Current OIDA Data Products include:

The webpage will grow as OIDA continues to identify and develop new OIDA Data Products.

OIDA was launched by UCSF and Johns Hopkins in March 2021 as a free public resource. The digital repository includes publicly disclosed documents arising from litigation brought against opioid manufacturers, distributors, pharmacies and consultants by local and state governments and tribal communities.

The Archive contains more than 17.9 million pages in 3.8 million documents and is expected to continue to grow for years to come. Documents are full-text searchable and include an array of relevant materials from many different companies, including emails, memos, presentations, sales reports, budgets, audit reports, Drug Enforcement Administration briefings, meeting agendas and minutes, expert witness reports and trial transcripts.

OIDA may be of use to many different parties, including individuals and communities harmed by the opioid crisis, as well as the media, health care practitioners, students, lawyers, and researchers. Major news outlets such as the Washington Post and New York Times and academic resources like Health Affairs Scholar and the American Journal of Public Health have published investigative reports and analysis using OIDA documents.

To learn more and access OIDA Data Products, visit https://data.oida-resources.jhu.edu/.

Monday, November 11, 2024

Behind the Scenes of the OIDA Image Collection

The team behind the UCSF-JHU Opioid Industry Documents Archive was pleased to release its latest OIDA resource in late October: the OIDA Image Collection. This incredible new resource highlights images extracted from documents created by the opioid industry. Many of these documents were designed for internal company audiences and board members, while others were targeted to prescribers and consumers. The images provide insight into corporate practices that shaped the opioid crisis.

The OIDA Image Collection currently features 3,907 images extracted from PowerPoint and Excel documents in OIDA. How did we select these images? And how did we create metadata like titles, descriptions, and categories to help you browse and find images of interest? It was a complicated process involving a mix of automated and manual steps.

Selecting the images

First, we processed all PowerPoint and Excel documents in OIDA as of December 2023 to extract every image embedded in these documents – roughly 4 million images!
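
The post does not describe the tooling used for this step. As a rough illustration only, here is a minimal Python sketch that assumes the documents are in the modern Office formats (.pptx and .xlsx), which are ZIP packages that store embedded media under ppt/media/ and xl/media/. The function name and the list of image extensions are hypothetical, and legacy .ppt and .xls files would need a different approach.

```python
import zipfile
from pathlib import PurePosixPath

# Folders where .pptx and .xlsx packages keep embedded media files.
MEDIA_PREFIXES = ("ppt/media/", "xl/media/")

# Assumed set of extensions to treat as images; not taken from the post.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".bmp",
                    ".tif", ".tiff", ".emf", ".wmf"}


def extract_embedded_images(document):
    """Return (original filename, raw bytes) for each embedded image.

    `document` is a path or a binary file-like object for a .pptx/.xlsx file.
    """
    images = []
    with zipfile.ZipFile(document) as package:
        for member in package.namelist():
            name = PurePosixPath(member)
            if (member.startswith(MEDIA_PREFIXES)
                    and name.suffix.lower() in IMAGE_EXTENSIONS):
                images.append((name.name, package.read(member)))
    return images
```

Keeping the original filename alongside the bytes matters here because the later filtering step inspects that filename.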

Second, we ran some code that removed:

  • images whose original filename included “thumb” (likely indicating a low-resolution thumbnail of little value)
  • images with a width or height of less than 200 pixels (likely too small to be useful)
  • images with an entropy score of less than 6.0 (likely indicating a gradient horizontal rule, background image, or other meaningless decoration)

This yielded roughly 100,000 images.
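
The three filtering rules above can be sketched in a few lines of Python. This is a hedged illustration, not our production code: the thresholds (200 pixels, entropy 6.0) come from the list above, but the `ExtractedImage` class, the case-insensitive filename match, and the choice to measure entropy as Shannon entropy over an 8-bit grayscale histogram (which ranges from 0.0 to 8.0) are assumptions made for the example.

```python
import math
from collections import Counter
from dataclasses import dataclass

MIN_SIDE_PX = 200   # from the post: drop images narrower or shorter than this
MIN_ENTROPY = 6.0   # from the post: drop images scoring below this


@dataclass
class ExtractedImage:
    """Hypothetical record for one extracted image."""
    filename: str       # original filename inside the source document
    width: int
    height: int
    gray_pixels: bytes  # assumed form: one 8-bit grayscale value per pixel


def shannon_entropy(gray_pixels: bytes) -> float:
    """Shannon entropy, in bits, of the grayscale value histogram."""
    if not gray_pixels:
        return 0.0
    total = len(gray_pixels)
    counts = Counter(gray_pixels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def keep_image(img: ExtractedImage) -> bool:
    """Apply the three rules; True means the image survives the filter."""
    if "thumb" in img.filename.lower():   # case-insensitive match is assumed
        return False
    if img.width < MIN_SIDE_PX or img.height < MIN_SIDE_PX:
        return False
    return shannon_entropy(img.gray_pixels) >= MIN_ENTROPY
```

A flat background scores near zero because almost every pixel has the same value, while a photograph or detailed chart spreads its pixels across many values and scores much higher, which is why a single entropy threshold removes most decorative images.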

Third, we deduplicated the images found, tracking the source documents so that in the OIDA Image Collection website we could link to all documents in the archive where the image was found. This step left us with just 13,688 images. This dramatic drop is due in part to the fact that OIDA often contains more than one copy of the same email attachment, as well as each version of a document that company employees circulated.
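
To show the idea of deduplicating while keeping track of source documents, here is a minimal Python sketch. It assumes duplicates are byte-identical files and groups them by SHA-256 digest; the post does not say which comparison method was used, and near-duplicates (resized or re-encoded copies) would need a different technique. The function name and document IDs are hypothetical.

```python
import hashlib
from collections import defaultdict


def deduplicate(images):
    """Group identical images and remember every document they came from.

    `images` is an iterable of (document_id, image_bytes) pairs.
    Returns {sha256 hex digest: sorted list of source document ids}.
    """
    sources = defaultdict(set)
    for document_id, image_bytes in images:
        digest = hashlib.sha256(image_bytes).hexdigest()
        sources[digest].add(document_id)
    return {digest: sorted(ids) for digest, ids in sources.items()}
```

Each key in the result stands for one unique image, and its list of document IDs is what lets a website link a single image back to every document where it appears.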

Fourth, our metadata librarian reviewed these 13,688 images to decide which to keep according to our subjective criteria for inclusion. The criteria were refined over time with input from a number of OIDA team members, but in the end, we decided to discard the following:

  • purely decorative images
  • bare logos and branding templates
  • headshots
  • text-heavy images
  • computer screenshots
  • illegible or meaningless images
  • sexually explicit images

Finally, we were left with the 3,907 images that we’ve made available in the collection.

Describing the images

We used a mix of AI models and human expert review to generate metadata to help users browse and search for images.

  • Title: So far we have only been creating these by hand, so not all images have a Title at this time. We are considering using AI to generate titles in the future.
  • Description: We generated these using Microsoft’s Florence-2 AI model, and our metadata librarian, with help from our collection archivist, is reviewing these for typos, nonsensical statements, inaccuracies, and more. When we make small corrections, we leave them labeled with the “AI” badge in the website interface, but if we rewrite from scratch, we remove the AI badge.
  • Type: We generated these using OpenAI’s CLIP AI model, but our metadata librarian reviewed these and made corrections to about 40% of the AI-assigned values.
  • Category: Our metadata librarian assigned these by hand.

Making the images available online

The OIDA Image Collection website runs on an instance of WordPress, with the images themselves served from a content delivery network (CDN) to increase performance of the site and ease the process of updating the website. We hired Mission Media, a web agency familiar with creating polished websites that follow university branding requirements, to build the website.

The website’s search feature uses not just the metadata fields but also the text within the image, which we generated using optical character recognition (OCR).
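
As a simplified illustration of searching across both metadata and OCR text, the Python sketch below combines all fields into one searchable string per image and returns the images that contain every query term. It is not how the WordPress site implements search; the `ImageRecord` class, its field names, and the substring matching are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ImageRecord:
    """Hypothetical record combining metadata fields with OCR output."""
    image_id: str
    title: str = ""
    description: str = ""
    image_type: str = ""
    category: str = ""
    ocr_text: str = ""   # text recognized inside the image itself

    def searchable_text(self) -> str:
        fields = [self.title, self.description, self.image_type,
                  self.category, self.ocr_text]
        return " ".join(fields).lower()


def search(records, query):
    """Return ids of records whose metadata or OCR text has every term."""
    terms = query.lower().split()
    return [r.image_id for r in records
            if all(term in r.searchable_text() for term in terms)]
```

Including the OCR text means an image can be found by words that appear only inside the picture, even when its title or description is missing.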

Next steps

AI models are rapidly developing, so we expect to get better results when we use them in the future. One way you can help us with that is by participating in our collaboration with Hugging Face to test multiple AI models for writing image descriptions (captions). While no model is perfect, we want to know which is our best starting point for generating descriptions for images in the future.

We are also considering adding new features to the website, like allowing user corrections and annotations and improving the “related images” feature to be based on the image files themselves rather than just the metadata. And we might adjust the entropy score used at the filtering stage to be less aggressive in removing images before human review.

Over the past few months, the OIDA team has added nearly 600,000 documents from Teva to the archive, with many more to come from this and other companies. We also plan to expand our image detection and classification techniques beyond PowerPoint and Excel documents to the many other file formats found in OIDA. So we hope to mine the archive for many more images to add to the OIDA Image Collection, providing an even richer view on how opioids and their effects were represented or misrepresented to patients and prescribers, and more broadly how the drug crisis was imagined and perpetuated by the industry while it unfolded.
