The UCSF Industry Documents Library is pleased to highlight the work of 2024 Summer Fellow Gordon Lichtstein. Gordon is an incoming MIT student interested in the intersection of computer science and linguistics in natural language processing (NLP), and in applying NLP for the betterment of humanity, for example in environmental sustainability or the digital humanities.
Over the course of the 8-week internship, Gordon crafted and completed four distinct projects that leverage natural language processing and data science within the context of our JUUL Labs Collection and the broader IDL. Project One investigates the optical character recognition (OCR) accuracy of low-quality and handwritten documents in the absence of ground truth data. Project Two explores the implementation of embedding search algorithms and visualizations aimed at enhancing the relevance of document recommendations for users. Project Three employs txt-ferret to conduct a thorough scan of a substantial corpus of industry documents to identify sensitive information, including credit card numbers. Finally, Project Four assesses the biases present in large language model (LLM) summarization through the lens of sentiment analysis.
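To give a sense of what a scan like Project Three involves, here is a minimal Python sketch of one common approach to flagging credit card numbers in text: find digit runs of plausible length, then keep only those that pass the Luhn checksum. This is an illustration under stated assumptions, not txt-ferret's implementation or Gordon's code; the pattern and the function names `luhn_valid` and `find_card_numbers` are hypothetical.

```python
import re

# Assumed candidate pattern: 13 to 19 digits, optionally separated
# by single spaces or hyphens. Real scanners may use stricter rules.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit strings in text that look like card numbers and pass Luhn."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


if __name__ == "__main__":
    sample = "Card 4111 1111 1111 1111 on file; ref 4111 1111 1111 1112."
    print(find_card_numbers(sample))
```

The checksum step matters in a corpus like this one because long digit strings such as invoice or Bates-style numbers are common, and a pattern match alone would flag many of them. A checksum still lets some false positives through, so flagged documents would need human review.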
Read Gordon's entire report and reflection via eScholarship.
The IDL staff is deeply appreciative of Gordon's thoughtful and comprehensive contributions, as well as his engagement in team meetings and Amazon Web Services workshops. His projects and use of NLP techniques with our document corpus have greatly enriched our understanding.
Opioid Industry Documents Archive
We added 127,511 documents to the UCSF-JHU Opioid Industry Documents Archive's Insys Litigation Documents collection. These documents, which arise from Insys’s early years bringing the fentanyl spray Subsys to market (2012–2013), shed new light on the genesis of the company’s speaker program and reimbursement center (see the Insys At a Glance page for more information), both of which have featured prominently in litigation against Insys.
This release is the fourth batch of Insys documents to be added to OIDA; the Insys collection ultimately will contain several million documents that are currently being processed chronologically. Processed documents will be made public on a rolling basis with monthly releases expected in 2023–2024. Information arising from a December 2022 release (UCSF News, Johns Hopkins University News) served as the basis for reporting from USA Today.
Opioid Industry Documents Archive National Advisory Committee Update
We are pleased to welcome four new members to our National Advisory Committee, a group that supports the Archive through expert recommendations on the project’s development and sustainability pertaining to use, transparency, accessibility, impact, and other measures: Sandy Alexander (former Massachusetts Assistant Attorney General), Michelle Muffett-Lipinski (recovery advocate and Founding Principal, Northshore Recovery High School), Melina Sherman (communications scholar, Knology), and Anthony Ryan Hatch (Professor of the Science in Society Program, Wesleyan University). Many thanks to our outgoing NAC member Beth Macy (author of Raising Lazarus and Dopesick) for her remarkable service.
3,600+ New USRTK Food Industry Documents Added
The 3,634 new documents posted today were donated by USRTK and acquired in their ongoing investigations into the influence of large food and beverage companies on academic partnerships and government regulatory processes around sugary beverages and obesity, among other topics.
Postdoctoral Fellowship in Opioid Industry Documents Research and Community Data Engagement
The UCSF OIDA Postdoctoral Fellow will pursue original, publishable research using materials housed in OIDA and work closely with the archive research team to enhance the accessibility and usability of archival materials for a diverse array of communities, with a particular focus on racial and health equity. Fellows will work on a multidisciplinary team including faculty, other postdoctoral fellows and research assistants and will be mentored by and work closely with researchers and information specialists at UCSF. Fellows will be based at the UCSF Center for Tobacco Control Research and Education (https://tobacco.ucsf.edu/) and participate fully in the fellowship program. Fellows will also be affiliated with the Department of Humanities and Social Sciences at the UCSF School of Medicine (https://humsci.ucsf.edu/).
Postdoctoral Fellowship in Tobacco Control Research
The CTCRE Postdoctoral Fellowship offers diverse educational and research opportunities, including a grant writing seminar, graduate research positions, advocacy training, and individualized documents training. Work spans policy and historical research, economics, and science. Fellows are recruited from a variety of fields, including the basic sciences, social sciences, public health practice, clinical fields, political science, history, economics, law, and marketing.
Fellowship stipends range from $55,500 to $66,600, depending on years of postdoctoral experience.
More about the fellowships and application submission
The Digital Health Humanities Pilot (DHHP) will facilitate new insights into historical health data. Participants from all disciplines (including faculty, staff, and other learners) will learn how to evaluate and integrate digital methods and “archives as data” into their research through a range of offerings and trainings utilizing datasets from holdings within the UCSF Archives and Special Collections (including the AIDS History Project and Industry Documents Library, among others).
Check out the workshops and sign up!
UC Love Data Week (February 13-17)
Want more information on working with data?
The UC-wide Love Data Week offers free sessions on topics such as data access, management, security, sharing, and preservation.
As 2022 comes to a close, we’d like to say a big THANK YOU to all of you for your continuing support and connection to the Industry Documents Library.
We’re grateful for your interest in industry documents and for your participation in the IDL community, whether that’s through documents research, workshops and trainings, project partnerships, or strategic planning and guidance.
This year we celebrated 20 years (!!!) of making industry documents available online and we appreciate all the ways you’ve worked with us to make the IDL stronger.
Here are some of the achievements you helped us reach in 2022:
17,508,831 documents now available through IDL!
We added 2.3 million new documents to the collections in 2022.
If you’re able, please consider making a tax-deductible donation to the Industry Documents Library to help us preserve and provide access to the collections for years to come.
From all of us at the IDL, we wish you a safe and festive holiday season, and a healthy and hopeful New Year ahead.
Kate, Rachel, Rebecca, Sven, Melissa and Erik
As a junior associate at a big New York City law firm in the early 2000s, I spent my days, and nights, conducting document review. It was not considered to be a choice assignment, but it quickly grew into an obsession. With the latest release of Mallinckrodt and McKinsey documents, journalists and the public have a unique opportunity to further explore the cause of accountability in the opioid epidemic.
The “document review” process, circa 2004, began when the firm received boxes upon boxes upon boxes of company records. We hired contractors to scan them into PDF form, then worked with our tech specialists to devise scanning protocols (indicating how we wanted folders, stapled documents, paper-clipped documents or loose assortments to be represented, as well as the numbers to be stamped at the bottom). These “Bates” numbers, popping up on the bottom right corner of every page through which we subsequently rifled, populated our nightmares. I still think of all this work today: there is so much invisible and essential labor in the creation of an electronic collection of images that are not only relevant but presented in their authentic form.
Once the documents had been uploaded into the software program we used for review, we designed protocols for hierarchical tagging systems, which meant every sheet of paper would be categorized, analyzed and ranked by its content, creator, date, topic, relevance or “hotness.” Senior partners could then review their black binders of critical documents with the knowledge that everything below had been filtered by everyone below. As we made our way through each set of documents, we worked with the tech team to roll out “productions” in paper and electronic form, with accompanying privilege logs and cover letters. The entire process was rife with potential dangers: one missed redaction, one unintended enclosure, and your reputation would be toast. I still tremble at the memory of a few all-caps email messages I received in the middle of the night, on my BlackBerry, taking me to task for a Bates number error or an unintentionally blank sheet.
My obsession with document review thus originated from self-preservation and fear. It developed into a passion when I prepared my first deposition. The defendant was a former New York State corrections officer who’d been accused of raping a female inmate. Though I had my binder of “hot docs” at the ready, it was my understanding of the body of documents we had obtained—their substance, their deficiencies and their gaps—that allowed me to maintain a sense of direction and achieve our deposition goals.
A few years later, as an Assistant District Attorney in Manhattan, I was working with a junior ADA on an investigation. It was “impossible,” he complained, to get through all the cell phone records we’d obtained in the time we had. It took me a moment to understand that he, unlike me, was not in a state of grateful enchantment before this trove of information.
Before long, I was applying my document review skills to a landmark opioid prosecution. My case—involving a corrupt physician who sold controlled substance prescriptions in exchange for cash—was one of many that followed in the wake of Purdue Pharma’s fraudulent marketing of OxyContin. Without my “BigLaw” document skills, I never would have been able to marshal the volume of documentary evidence we needed to build the homicide, insurance fraud and reckless endangerment angles of the case. And ironically, I’ve since applied those skills—developed at the law firm of Debevoise & Plimpton LLP, which represents Side B of the Sackler family—to support activists fighting against the Sacklers.
Documents are a gift to researchers, justice-seekers and students. Everything and everyone leaves a trace. The truth may not always be what you expect, but skillful, careful and honest document review will allow you to make your peace with it.
The possibilities for research and reporting using OIDA documents are rich, as we’ve already seen with Washington Post, Salon, and New York Times articles. I hope someone will choose to go through these documents with an eye to the contributions of the sales representatives and their supervisors, as well as the middle-level executives and the marketing teams. We have, justly, focused on members of the Sackler family and other high-profile industry leaders in seeking accountability for the opioid epidemic, but where are their foot soldiers now? In which industries are they peddling their skills at selling lies? I’ve taken just a few names and followed them into high-level pharmaceutical positions: who will conduct a more thorough examination? Have any of these individuals been held to account for their choices? Admitted their mistakes? Contributed any part of their ill-gotten gains to victims or any of their time to hard-hit communities?
Documents allow us to verify facts and rebuild a disappeared world: in the case of the opioid epidemic, the OIDA documents may well help us repair the damage and prevent this tragedy from ever happening again.
The Opioid Industry Documents Archive, hosted by University of California, San Francisco and Johns Hopkins University, is a free and public digital archive of opioid litigation documents, including previously unseen evidence on how and why the opioid epidemic happened — shedding light on this tragedy so that a crisis like this will never happen again.
Today we added 1.4 million documents to the Opioid Industry Documents Archive from Mallinckrodt, a leading generic opioid manufacturer now in bankruptcy. The company is one of many in the opioid industry currently implicated in the deaths of hundreds of thousands of people due to misleading marketing, sales, distribution, dispensing, and governance practices. The Mallinckrodt company agreed to release documents produced during litigation as part of their settlement in recent legal cases with the plaintiffs.
Starting today, the documents in the archive are available to and searchable by the public, including families impacted by the opioid crisis as well as the media, healthcare practitioners, students, lawyers, and researchers. We invite everyone to search the archives for the truth.
Read the press releases (May 10, 2022): “Opioid Industry Archive Releases 1.4 Million Documents from Leading Opioid Maker Implicated in Drug Crisis,” via UCSF News and Johns Hopkins University News Releases.
Today’s disclosure of more than a million documents from Mallinckrodt Pharmaceuticals, one of the country’s most prolific opioid sellers, is an important step to expose the truth and prevent a manmade crisis like the opioid epidemic from ever happening again.
Drug companies profited by pushing dangerous prescription opioids, and Americans have become the biggest users of opioids in the world. Communities across our nation suffered the consequences as a result: addiction, overdose, and death.
Families most impacted by the crisis have led the way in advocating for justice. Parents whose own children died because of the opioid crisis have dedicated years of their lives to protect others. They demanded that lawbreakers be held accountable, failed systems be reformed, and urgent investments be made for harm reduction, treatment, recovery, and prevention.
State Attorneys General heard the calls for action and acted. Working together, across party lines and across the nation, our teams conducted a searching investigation of illegal conduct throughout the opioid industry. We filed lawsuits and won verdicts from judges and juries, forcing companies to pay tens of billions of dollars that will be dedicated to address the crisis.
An essential part of justice is exposing the truth. Our teams pursued that truth for years. Our efforts resulted in the public disclosure of millions of documents and of the critical facts revealed by witnesses ranging from drug sales reps to company presidents.
We rejected the companies’ attempts to keep the evidence sealed, or to hand it back to the perpetrators. Instead, we posted it online.
For the first time in a generation, since the landmark tobacco cases, an industry’s secrets are being turned over to the public. Under orders entered by courts throughout the nation, millions of opioid industry documents will be posted in a free public archive, in perpetuity.
The families who suffered in this crisis will be able to see for themselves the evidence that we uncovered – the company emails, board minutes, and business plans that changed so many lives.
Journalists, filmmakers, artists, and scholars will tell the story of this epidemic using the real words and actions of the people who drove the opioid business.
Policymakers throughout the country will be informed by what went wrong.
Executives, directors, and employees in every industry will know that, if they break the law and endanger the public, the whole world may see what they did.
Today is a step toward justice. We are grateful to the advocates who demanded action in the face of a devastating crisis, to our staff who work every day to serve the public, and to the archivists at the University of California San Francisco and Johns Hopkins who will preserve this evidence for the public good.