
OIDA staff added 218,267 documents to its newest collection, the Teva and Allergan Documents. This batch brings the collection to more than 588,000 documents and includes sales training presentations, interviews with prescribers, reports on focus groups, product communications, and more.
The Teva and Allergan collection will encompass about 1.9 million documents when complete. Processed documents are being made public on a rolling basis with monthly releases expected from 2024-2026.
Announcing the OIDA Image Collection and How You Can Help!
We are proud to introduce the OIDA Image Collection, a website created to highlight images within the OIDA documents. Images provide unique entry points to understand a visual narrative of the opioid industry and gain insight into harmful corporate and marketing practices that contributed to the opioid crisis. Researchers can browse, limit their results by filters, and search by keyword. By viewing the source documents, you can see the images in their original context.
The OIDA team used artificial intelligence (AI) to write captions for highlighted images within the OIDA Image Collection but we could use your help! We have generated captions using two different AI models and need to decide which AI-generated caption is better for use in the OIDA Image Collection. Thanks to support from Hugging Face, a platform for collaborating on models and datasets for machine learning, and its Argilla data annotation tool, we have created a handy interface for voting on the quality of image captions. To help us out, you’ll just need to create a free Hugging Face account.
Your image labeling efforts will contribute to an open preference dataset, crucial for "steering" AI models towards generating more useful outputs in specific domains. Please email opioidarchive@jh.edu with any questions.
117K new documents were posted to the Juul Labs Collection today!
This new batch of documents includes social media presence reports, marketing campaigns, focus group findings, product design, and more.
In partnership with the University of North Carolina at Chapel Hill Libraries, the IDL continues to process and make available documents subject to public disclosure under JUUL Labs’s 2021 settlement with North Carolina.
Above image: 2019 SRNT slide deck.
World Digital Preservation Day – 7 November 2024
Each year, the Digital Preservation Coalition promotes World Digital Preservation Day, which falls on the first Thursday of November.
In line with this year’s theme of 'Preserving Our Digital Content: Celebrating Communities,' the UC Libraries’ Digital Preservation Working Group (DPWG) is hosting a community-building Open House event which presents an opportunity for everyone to learn more about digital preservation while also sharing their own stories and experiences in this space.
Please join the event online on November 7, 2024, from 11 AM to 12 PM, where you’ll hear digital preservation stories from us at the UCSF Industry Documents Library as well as the UC & Jepson Herbaria.
Register via Zoom.
Annual Tobacco and Other Industry Documents Workshop: Recording Now Available!
IDL and the UCSF Center for Tobacco Control Research and Education (CTCRE) held the "Annual Tobacco and Other Industry Documents Workshop" virtually on October 8th from 9 am-12:15 pm PT.
If you didn't have a chance to join us, the event recording is now available.
The recently launched OIDA Image Collection highlights images found within OIDA and includes a description of each image, but it would have taken an enormous amount of time to write a description for each image. Therefore, the OIDA team used artificial intelligence (AI) to write captions for these images to make them more discoverable in the OIDA Image Collection.
But we need your help! We have generated captions using two different AI models and need to decide which AI-generated caption is better for use in the OIDA Image Collection. Thanks to support from Hugging Face, a platform for collaborating on models and datasets for machine learning, and its Argilla data annotation tool, we have created a handy interface for voting on the quality of image captions. To help us out, you’ll just need to create a free Hugging Face account.
Your image labeling efforts will contribute to an open preference dataset, crucial for "steering" AI models towards generating more useful outputs in specific domains.
“Projects like this ensure AI becomes useful for a wider range of audiences, aligning with Hugging Face’s mission to democratize machine learning and make AI more accessible and impactful across diverse fields,” said Daniel van Strien, machine learning librarian at Hugging Face and OIDA National Advisory Committee member.
Vision Language Models (VLMs) represent a cutting-edge field in AI, and your contributions will enable the development of more specialized models for important applications such as captioning large archival image collections. By participating, you're not just helping OIDA – you're shaping the future of AI to better serve specialized communities and enhance visual information accessibility for a wider range of document types.
The Opioid Industry Documents Archive (OIDA), a collaborative undertaking between the University of California, San Francisco and Johns Hopkins University, announces the launch of the OIDA Image Collection, a website created to highlight images within the documents, providing a new way to explore the Archive that will generate fundamental new knowledge about the U.S. opioid epidemic.
The OIDA Image Collection highlights images extracted from publicly disclosed industry documents—many originally created for internal company audiences and board members, others targeted to prescribers and consumers. These images provide insight into corporate practices that shaped the opioid crisis.
“The images in this collection offer unique insight into the strategies the pharmaceutical industry used to expand its reach and present opioid products as a safe and effective solution for nearly all forms of pain,” said Cecília Tomori, PhD, an associate professor at the Johns Hopkins School of Nursing, with a joint appointment in the Department of Population, Family and Reproductive Health at the Johns Hopkins Bloomberg School of Public Health. “The images played a powerful role in efforts to mislead health professionals and the public as the industry sought ever-increasing profits.”
Computer scientists working with OIDA archivists and librarians extracted images embedded in Microsoft Office documents and then filtered them based on properties such as size and color variation. The team removed exact duplicates and then manually sorted images into high-level categories, such as Business Strategy, Sales and Marketing, and Pain Management. The processes of extracting, filtering and deduplicating images were carried out using SciServer with support from the Johns Hopkins Institute for Data-Intensive Engineering and Science.
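For readers curious how such a pipeline might look, here is a minimal sketch of the extraction and exact-duplicate steps. It is not the OIDA team's code. It assumes the newer Office formats (.docx, .pptx, .xlsx), which are ZIP archives that keep embedded media under a `media/` folder. The function name and the `min_bytes` threshold are hypothetical. The sketch filters on file size only and does not implement the color-variation filter.

```python
import hashlib
import io
import zipfile

# Folders where .docx, .pptx, and .xlsx files store embedded media.
MEDIA_PREFIXES = ("word/media/", "ppt/media/", "xl/media/")


def extract_unique_images(office_bytes, min_bytes=1024, seen=None):
    """Return {sha256_hex: (member_name, image_bytes)} for embedded media.

    min_bytes is a hypothetical size threshold, not a value used by OIDA.
    Pass the same `seen` set for every document to drop exact duplicates
    across a whole corpus rather than within one file.
    """
    seen = set() if seen is None else seen
    unique = {}
    with zipfile.ZipFile(io.BytesIO(office_bytes)) as archive:
        for name in archive.namelist():
            if not name.startswith(MEDIA_PREFIXES):
                continue  # skip slides, styles, and other non-media parts
            data = archive.read(name)
            if len(data) < min_bytes:
                continue  # too small to be a meaningful image
            digest = hashlib.sha256(data).hexdigest()
            if digest in seen:
                continue  # byte-for-byte duplicate of an earlier image
            seen.add(digest)
            unique[digest] = (name, data)
    return unique
```

Hashing the raw bytes only catches exact duplicates. Resized or re-encoded copies of the same picture would still pass through, which is one reason a manual sorting step remains useful.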
Using the OIDA Image Collection, researchers can browse compelling images that tell stories on their own, and they may continue on to investigate the source documents to study the broader context. All audiences, including those directly impacted by the epidemic, can use the OIDA Image Collection to interact with OIDA in a more visual and accessible way.
OIDA was launched by UCSF and Johns Hopkins in March 2021 as a free public resource. The digital repository includes publicly disclosed documents arising from litigation brought against opioid manufacturers, distributors, pharmacies and consultants by local and state governments and tribal communities.
The Archive contains more than 16.2 million pages in 3.5 million documents and is expected to continue to grow for years to come. Documents are full-text searchable and include an array of relevant materials from many different companies, including emails, memos, presentations, sales reports, budgets, audit reports, Drug Enforcement Administration briefings, meeting agendas and minutes, expert witness reports and trial transcripts.
OIDA may be of use to many different parties, including individuals and communities harmed by the opioid crisis, as well as the media, health care practitioners, students, lawyers, and researchers. Major news outlets such as the Washington Post and New York Times and academic resources like Health Affairs Scholar and the American Journal of Public Health have published investigative reports and analysis using OIDA documents.
To learn more and access the OIDA Image Collection, visit images.oida-resources.jhu.edu/.
The UCSF Industry Documents Library is pleased to highlight the work of 2024 Summer Fellow Gordon Lichtstein. Gordon is an incoming MIT student with an interest in natural language processing (NLP), at the intersection of computer science and linguistics, and in applying NLP for the betterment of humanity, such as in environmental sustainability or the digital humanities.
Over the course of the 8-week internship, Gordon crafted and completed four distinct projects that leverage natural language processing and data science within the context of our JUUL Labs Collection and the broader IDL. Project One investigates the optical character recognition (OCR) accuracy of low-quality and handwritten documents in the absence of ground truth data. Project Two explores the implementation of embedding search algorithms and visualizations aimed at enhancing the relevance of document recommendations for users. Project Three employs txt-ferret to conduct a thorough scan of a substantial corpus of industry documents to identify sensitive information, including credit card numbers. Finally, Project Four assesses the biases present in large language model (LLM) summarization through the lens of sentiment analysis.
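To give a sense of what scanning text for credit card numbers involves, here is a minimal sketch of one widely used approach: find digit runs of plausible length, then keep only those that pass the Luhn checksum. This is an independent illustration and does not use or reproduce txt-ferret's API. The function names and the 13 to 19 digit range are assumptions made for the example.

```python
import re

# Candidate: 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_like_numbers(text):
    """Return digit strings in text that look like card numbers."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

The checksum step matters in a large corpus: without it, order numbers, phone lists, and spreadsheet values would flood the results with false positives.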
Read Gordon's entire report and reflection via eScholarship.
The IDL staff is deeply appreciative of Gordon's thoughtful and comprehensive contributions, as well as his engagement in team meetings and Amazon Web Services workshops. His projects and use of NLP techniques with our document corpus have greatly enriched our understanding.
OIDA staff added 123K documents to its newest collection, the Teva and Allergan Documents. This batch brings the collection to more than 370,000 documents and includes sales training presentations, interviews with prescribers on Fentora message testing, reports on focus group responses to REMS messaging, and more.
The Teva and Allergan collection will encompass about 1.9 million documents when complete. Processed documents are being made public on a rolling basis with monthly releases expected from 2024-2026.
213,762 new documents were posted to the JUUL Labs Collection today!
This new batch of documents includes social media presence reports, marketing campaigns, focus group findings, product design, and more.
In partnership with the University of North Carolina at Chapel Hill Libraries, the IDL continues to process and make available documents subject to public disclosure under JUUL Labs’s 2021 settlement with North Carolina.
Above image: 2019 JUUL Early Primary State Focus Group Findings slide deck.
The UCSF Library and five community-based partner organizations have received a $97,000 grant from the California State Library to co-create the Opioid Crisis Community Archive (OCCA).
This archive, the first of its kind, will document the impact of the opioid crisis on communities and community-based service organizations in Northern California. While OIDA documents illuminate the corporate and business realities of the opioid crisis, the OCCA aims to close the gap in information about the community response. A key output of the OCCA project is the inclusion of underrepresented voices in the historical record to curate the archive.
Learn more: California State Library Funds Opioid Crisis Community Archive
Annual Tobacco and Other Industry Documents Workshop
IDL and the UCSF Center for Tobacco Control Research and Education (CTCRE) will hold the "Annual Tobacco and Other Industry Documents Workshop" virtually on October 8th from 9 am-12:15 pm PT.
This event offers an in-depth exploration of our Industry Documents Library and our new JUUL Labs documents collection and features talks from experts on how document collections can be used for teaching, research, and public health advocacy. Please join us!
Register at: https://tiny.ucsf.edu/TIDWS2024
New to the 'Archives as Data' world? The DHH program is here to help! The UCSF DHH now offers virtual courses to help with digital humanities projects, provides 1:1 research consultations, and has created a new 'Archives as Data' research guide, a centralized resource containing descriptions of datasets that have been prepared from UCSF archival collections, including the following:
The Opioid Industry Documents Archive (OIDA), a collaborative undertaking between the University of California, San Francisco and Johns Hopkins University, today announced the launch of the OIDA Toolbox, a website created to promote data exploration and visualization of OIDA, providing new ways for the public to make sense of its massive collection of more than 3 million documents.
The OIDA Toolbox provides a central access point for documentation and ready-to-run code to help users access the raw data behind OIDA, which includes metadata, the documents themselves in various file formats, and the text extracted from the documents.
To supplement the UCSF Industry Documents Library’s Solr API, researchers now have the option to access and work with all of OIDA’s raw data through Amazon Web Services or through Johns Hopkins University’s SciServer virtual environment.
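As a rough illustration of what querying a Solr API looks like, the sketch below builds a search URL from standard Solr request parameters (`q`, `rows`, `start`, `wt`). The endpoint shown is a placeholder and not the real IDL address, and the function name is hypothetical. Consult the Industry Documents Library's own API documentation for the actual URL and searchable fields.

```python
from urllib.parse import urlencode

# Placeholder endpoint: this is NOT the real IDL Solr address.
EXAMPLE_BASE = "https://solr.example.org/solr/documents/select"


def build_solr_query_url(query, rows=10, start=0, base_url=EXAMPLE_BASE):
    """Build a Solr search URL using standard Solr request parameters."""
    params = {
        "q": query,      # the search query
        "rows": rows,    # number of results per page
        "start": start,  # offset of the first result, for paging
        "wt": "json",    # ask Solr for a JSON response
    }
    return f"{base_url}?{urlencode(params)}"


url = build_solr_query_url("sales training", rows=5)
print(url)
```

Paging through results with `start` and `rows` works for modest result sets. For analysis across millions of documents, the bulk access routes described above are likely the better fit.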
“The volume of data can be overwhelming—how can researchers review millions of documents efficiently? We can help!” said Kevin Hawkins, OIDA program director for Johns Hopkins University. “The toolbox will continue to grow as users make requests and OIDA identifies new ways to engage with the data.”
The OIDA Toolbox promotes original scholarship in disciplines such as public health, communications, sociology, history, linguistics and computer science. Ultimately, discoveries made with the OIDA Toolbox can help to improve and safeguard public policy and public health, and to ensure that the opioid-related harms that have taken place never occur again.
“Our ability to mine the documents for information that will assist the public health community should be greatly enhanced with this technology,” said Dr. Christopher K. Haddock, chief data and analytics officer and senior scientist at NDRI-USA, who is one of the first to try out these tools. He added, “You’ve already saved us months of work!”
OIDA was launched by UCSF and Johns Hopkins in March 2021 as a free public resource. The digital repository includes previously internal documents made public through legal settlements to enable multiple audiences to explore and investigate information which shines a light on the opioid crisis.
The Archive contains more than 15.3 million pages in 3.4 million documents and is expected to continue to grow for years to come. Documents are full-text searchable and include an array of relevant materials from many different companies, including emails, memos, presentations, sales reports, budgets, audit reports, Drug Enforcement Administration briefings, meeting agendas and minutes, expert witness reports and trial transcripts.
OIDA may be of use to many different parties, including families harmed by the opioid crisis, as well as the media, health care practitioners, students, lawyers, and researchers. Major news outlets such as the Washington Post and New York Times and academic resources like Evidence & Policy and the American Journal of Public Health have published investigative reports and analysis using OIDA documents.
To learn more and access the OIDA Toolbox, visit https://oida-resources.jhu.edu/oida-toolbox/. Email opioidarchive@jh.edu with questions or for more information.
Teva and Allergan Documents
OIDA staff added approx. 125,000 documents to its newest collection, Teva and Allergan Documents. This second batch brings the collection to more than 247,000 documents and includes adverse drug event reports, sales representative field coaching reports, research articles and more.
The Teva and Allergan collection will encompass about 1.9 million documents when complete. Processed documents are being made public on a rolling basis with monthly releases expected from 2024-2026.
Above image: Slide deck laying out the goals of a proposed PROTECT committee composed of key opinion leaders, developed to support risk management efforts for Fentora (a fentanyl buccal tablet). The verbiage notably omits that Fentora is indicated for cancer patients, shifting the emphasis to chronic pain disorders.
JUUL Labs Collection
171,047 new documents were posted to the JUUL Labs Collection today!
This new batch of documents includes influencer reports and presentations, social media presence, targeted marketing campaigns, and more.
In partnership with the University of North Carolina at Chapel Hill Libraries, the IDL continues to process and make available documents subject to public disclosure under JUUL Labs’s 2021 settlement with North Carolina.
Above image: 2018 JUUL Brand Guide slide deck.
We are excited to introduce the 2024 UCSF Library Artist in Residence, Ruth Tabancay, and welcome her back to UCSF. Ruth commenced her year-long residency on July 1, 2024.
As the UCSF Library Artist in Residence, Ruth will be continuing her research into the adverse effects of global warming and plastic accumulation on the planet. She will be digging into the IDL's Fossil Fuel and Chemical Industry Documents Archive Collections to find early communications within those industries, responses to government regulations, and how these industries presented themselves to their stockholders and the public.
The artwork she creates for her final exhibition will incorporate plastic discarded at UCSF to highlight the contributions that medical centers make to the growing mass of plastic waste. In addition, Ruth will look at current literature to see how Big Oil and the plastic industry perpetuate the myth of plastic recycling.
Ruth will share updates on her project and upcoming workshops on the UCSF Library Artist in Residence webpage.
Learn more about Ruth’s work: www.ruthtabancay.com • @ruth_tabancay on Instagram • ruth.tabancay on Facebook