Practical Protections

Deidentifying qualitative data

In the era of open science, researchers are increasingly sharing deidentified data from quantitative studies, allowing secondary scientists to build on their initial findings. But sharing data from qualitative studies can be far more challenging.

Qualitative research typically includes highly personal interviews with participants, Michigan State University psychologist Rebecca Campbell and colleagues explain in a 2023 article for Advances in Methods and Practices in Psychological Science (AMPPS), and sharing these transcripts requires researchers to strike a careful balance between making the data anonymous and keeping it useful for professional peers and collaborators.

In their AMPPS article, Campbell and colleagues shared how they approached the process of deidentifying qualitative data from interviews with participants who had survived sexual assault and had pressed charges against their assailant. 

“There’s understandable concern about protecting the privacy, confidentiality, and safety of our research participants, particularly those who have experienced traumatic events,” said Campbell, an ecological/community psychologist who studies violence against women, in an interview with the Observer. “For any researchers who thought ‘this can’t be done’ or ‘I don’t know how to do this,’ we hope our paper provides a useful step-by-step guide for the key issues and decisions that need to be made along the way.” 

Required sharing 

Campbell and colleagues secured funding for their research from the U.S. Department of Justice, which requires researchers who receive federal support to share deidentified data through the National Archive of Criminal Justice Data. They first asked participants, attorneys associated with their criminal cases, and advocates from a victim service agency to help them determine what information they should remove or redact from the transcripts before sharing on the archive. Participants were informed that names, dates, locations, and details about the assault and trial would be removed by default. But they could also request additional information be removed at the end of the study, though only two out of 30 participants did so. 

Generally, Campbell and colleagues explained, their decisions about which information to remove were guided by three central questions: “Who else would know that information?” “How would they know that information?” and “What other records contain that information?” 

An Adversarial Case Study of Data Sharing

A few years ago, three psychological scientists prepared a dyadic dataset for sharing through a database available only to other researchers. The effort ultimately resulted in an adversarial discussion. 

Samantha Joel (University of Utah) and APS Fellows Paul W. Eastwick (University of California, Davis) and Eli J. Finkel (Northwestern University) shared their experiences and differing opinions in a 2018 AMPPS article.

The article sprang from an earlier study Joel, Eastwick, and Finkel published in Psychological Science. The researchers had used machine learning to analyze how 300 participants’ personal preferences influenced their perception of potential partners at a speed-dating event. Participants responded to over 100 self-report measures about everything from their favorite television show to what they were looking for in a future spouse, as well as how likable and attractive they found their dates to be.

After confirming that their participant consent form gave them permission to share the data with other researchers, Joel, Eastwick, and Finkel proceeded to anonymize the data by removing identifying information about participants, including their ages, ethnicities, and birth dates. Additionally, because the researchers’ original analysis focused on group-level effects of personal preferences, they chose to share only aggregate data instead of participants’ individual responses. 

Initially, Joel, Eastwick, and Finkel agreed that this anonymization would be enough to ensure the participants’ privacy, especially because the speed-dating study data had been collected over a decade prior to the publication of their Psychological Science article. No one was likely to be interested in identifying the participants.  

Employing multiple forms of deidentification—in this case, removing identifying information about participants and aggregating their responses—further decreases the likelihood of someone invading participants’ privacy, Joel explained in her section of the AMPPS article. 

“My view is that it is important to consider the use of these safeguards in combination,” she wrote. “Assuming that the chance of each protection failing is independent, the risk of a confidentiality breach decreases exponentially with each new protection that is added.”  
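
Joel’s reasoning can be written out as a short calculation. This is only an illustrative sketch: the symbols p_i and k and the example numbers are assumptions introduced here, not taken from the AMPPS article, and it assumes that a breach requires every safeguard to fail.

```latex
% p_i : assumed probability that safeguard i fails (hypothetical symbol)
% k   : number of safeguards, assumed to fail independently
P(\text{breach}) = \prod_{i=1}^{k} p_i

% If every safeguard has the same failure probability p:
P(\text{breach}) = p^{k}

% Illustrative numbers only: with p = 0.1,
% k = 1 gives 0.1, k = 2 gives 0.01, k = 3 gives 0.001
```

Under these assumptions, each added safeguard multiplies the breach risk by another factor of p. If the failures are not independent, the product no longer applies.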

But just because these measures were appropriate for this study doesn’t mean they can be safely applied to all dyadic datasets, Finkel argued. 

“My sense is that our procedures—using our best intuition to answer self-interrogations—may be excessively risky for the vast majority of nonindependent data in psychological science,” he wrote. 

One thing each member of this adversarial collaboration agreed upon is that researchers need additional institutional guidance about how and when to share data. 

“Our discipline is still in need of clear, prescriptive guidelines that address these issues at the intersection of confidentiality and open-data practices, so that researchers are not relying so much on their own intuitions when making these decisions,” they wrote.  

References 

Joel, S., Eastwick, P. W., & Finkel, E. J. (2018). Open sharing of data on close relationships and other sensitive social psychological topics: Challenges, tools, and future directions. Advances in Methods and Practices in Psychological Science, 1(1), 86–94. https://doi.org/10.1177/2515245917744281

Joel, S., Eastwick, P. W., & Finkel, E. J. (2017). Is romantic desire predictable? Machine learning applied to initial romantic attraction. Psychological Science, 28(10), 1478–1489. https://doi.org/10.1177/0956797617714580

It is particularly important to address these questions when deidentifying data from dyadic research in which a second party (in Campbell and colleagues’ study, the perpetrator of the sexual assault) may try to identify a participant from their interview responses, the researchers noted. Furthermore, unredacted court records that are available to the general public could be cross-referenced against the study’s interview transcripts to identify participants.

To “blur” data as much as possible, Campbell and colleagues aimed to replace identifying information with less specific text (for example, by replacing an individual’s age with an age range) instead of redacting it entirely.

“Blurring tries to preserve as much detail and context as possible while acknowledging that the remediation could decrease the usability of the data,” the researchers explained.
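
As a rough illustration of the blurring idea described above, the sketch below replaces an exact age in a transcript excerpt with an age range. It is not the researchers’ codebook: the function names, the 10-year bin width, the regular expression, and the sample sentence are all hypothetical choices made for this example.

```python
import re


def blur_age(age: int, width: int = 10) -> str:
    """Map an exact age to a range such as '30-39' (hypothetical binning rule)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Hypothetical pattern: a number directly followed by "years old" or "-year-old".
AGE_PATTERN = re.compile(r"\b(\d{1,3})(?=[ -]years?[ -]old\b)")


def blur_ages_in_text(text: str) -> str:
    """Replace exact ages in a transcript excerpt with bracketed age ranges."""
    return AGE_PATTERN.sub(lambda m: f"[{blur_age(int(m.group(1)))}]", text)


# Invented sample sentence, not drawn from any study transcript.
excerpt = "I was 34 years old when the trial started."
print(blur_ages_in_text(excerpt))
```

A pattern-based pass like this could only ever be a first step. The process the researchers describe relied on human coders, supervisors, and agency staff reviewing each transcript.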

These decisions were made through an iterative process. A team of research staff, some of whom had been involved in interviewing the participants, blurred and redacted information from each transcript using a rules-based codebook. Their recommendations were then reviewed by a pair of supervisors who could request additional changes. Finally, once the coding staff and supervisors reached an agreement, staff members from the victim service agency reviewed each transcript to determine whether further changes were needed to deidentify the participants. 

In many cases, the researchers noted, participants themselves ideally would review the transcripts to alleviate any remaining concerns about their privacy. Because of the sensitive nature of this study, however, Campbell and colleagues chose to work with the victim service agency to avoid retraumatizing the participants by asking them to read about their own assaults. Though minimizing the impact of deidentifying qualitative data on participants is the priority, it’s also important to remember that exposure to such difficult experiences can take a toll on researchers too, Campbell and colleagues added. 

“For researchers who will be deidentifying data that addresses traumatic content, we recommend that teams pay attention to the risk of vicarious trauma from in-depth exposure to upsetting material,” Campbell said. “There are many existing resources for addressing [vicarious trauma] within research teams, and we recommend that researchers proactively plan for giving team members adequate support and time to do this work carefully.” 

Participants’ views on data sharing 

Researchers aren’t the only ones with a vested interest in sharing qualitative datasets. Many participants view data sharing as a way to amplify their contributions to science and society by making full use of the information they’ve chosen to share with researchers. In a study of 30 people who had participated in qualitative interviews concerning sensitive topics like substance abuse and sexual health, bioethicist Jessica Mozersky (Washington University in St. Louis) and colleagues found that 28 participants supported sharing their data with other researchers. 

“For many participants, sharing data is a way to amplify the societal benefits of participating in research and to maximize their contribution to the research enterprise at large,” Mozersky and colleagues wrote.

This support came with certain caveats, however. Though most participants were open to sharing their data with government agencies, other researchers, and students, many did not want them to be available to the general public, and they also expected that their data would be deidentified to preserve their anonymity (Mozersky et al., 2020).

Despite widespread support for data sharing amongst participants, researchers who work with qualitative data often don’t share these datasets, social psychologist Bobby Lee Houtkoop (University of Amsterdam) and colleagues wrote in a 2018 AMPPS article. To understand why, Houtkoop and colleagues surveyed 600 researchers who had published articles in psychology journals, asking them about their data-sharing practices.

“Respondents considered data sharing to be both desirable and profitable for their particular research fields, but somewhat less desirable and profitable in the case of their own current research projects,” Houtkoop and colleagues explained. 

Although the survey did not distinguish between researchers who worked with qualitative and quantitative data, respondents raised many of the same concerns that arise in relation to deidentifying qualitative datasets. In addition to concerns about preserving participants’ anonymity, researchers reported holding back because of time constraints. They also wanted to prevent secondary researchers from “scooping” them by publishing findings based on the shared data before they had the opportunity to do so themselves. Furthermore, many respondents reported that their participant consent form, institutional review board (IRB), or other legal constraints prevented them from sharing their data. 

In another study, Mozersky and colleagues interviewed 90 data repository curators, IRB members, and qualitative researchers about their knowledge of and experiences with sharing qualitative data. Researchers reported being the least familiar with qualitative data sharing. Moreover, only half of curators and IRB members combined reported having any experience with sharing qualitative datasets.

“IRB members and data curators are not prepared to advise researchers on legal and regulatory matters, potentially leaving researchers who have the least knowledge with no guidance,” Mozersky and colleagues wrote. “These findings are not surprising—[qualitative data sharing] is relatively new, uncharted, and many have not yet experienced it” (Mozersky et al., 2020).

Academic institutions, funders, and journals could help address this uncertainty by providing researchers with guidelines for qualitative data sharing. More respondents in the study by Houtkoop and colleagues indicated that they would probably share their data if they could get additional grant funding or if a journal required them to do so as part of the publication process. 

“Our findings suggest that although researchers perceive barriers to data sharing, at least some important barriers can be overcome relatively easily,” Houtkoop and colleagues concluded. “Strong encouragement from institutions, journals, and funders will be particularly effective in overcoming these barriers, in combination with educational materials that demonstrate where and how data can be shared effectively.” 

With all of these considerations, sharing data from qualitative studies is no simple matter, but overcoming these hurdles can pay dividends for psychological science as a whole. 

“Deidentifying narrative data can be a time-consuming process, but one that ultimately helped us understand our data more deeply,” said Campbell. 

References 

Campbell, R., Javorka, M., Engleton, J., Fishwick, K., Gregory, K., & Goodman-Williams, R. (2023). Open-science guidance for qualitative research: An empirically validated approach for de-identifying sensitive narrative data. Advances in Methods and Practices in Psychological Science, 6(4). https://doi.org/10.1177/25152459231205832

Houtkoop, B. L., Chambers, C., Macleod, M., Bishop, D. V. M., Nichols, T. E., & Wagenmakers, E.-J. (2018). Data sharing in psychology: A survey on barriers and preconditions. Advances in Methods and Practices in Psychological Science, 1(1), 70–85. https://doi.org/10.1177/2515245917751886

Mozersky, J., Parsons, M., Walsh, H., Baldwin, K., McIntosh, T., & DuBois, J. M. (2020). Research participant views regarding qualitative data sharing. Ethics & Human Research, 42(2), 13–27. https://doi.org/10.1002/eahr.500044

Mozersky, J., Walsh, H., Parsons, M., McIntosh, T., Baldwin, K., & DuBois, J. M. (2020). Are we ready to share qualitative research data? Knowledge and preparedness among qualitative researchers, IRB members, and data repository curators. IASSIST Quarterly, 43(4), 1–23. https://doi.org/10.29173/iq952

Comments

Yet to be satisfactorily addressed are questions related to data sovereignty. In my experience, data depositories give lip service to sovereignty principles, but do not have practices in place that they can refer to. This means that investigators cannot in good faith provide assurance to tribal organizations that tribal wishes will be honored or even whether tribes would be notified in the case of requests to do secondary analyses.
Issues of individual participant privacy also require extra care whenever participants reside in small, close-knit communities. Because I analyze data on fairly large samples using both qualitative and quantitative methods, I am able to provide raw numbers of code frequencies. At this time, I am not comfortable providing transcripts. I second Dr. Finkel’s call for clearer guidelines, while emphasizing that repositories have to pair those guidelines with procedures that can protect participant privacy and tribal sovereignty.

