Part Three: Three Ways to Use Databases as Tools for Psychological Research

DAVID H. JOHNSON

December 3, 2001

Previous Articles

Sharing Data: It’s Time to End Psychology’s Guild Approach
October 2001

Three Objections to Databases Answered
November 2001

Though databases as tools for psychological research are in their infancy, thinking about their ideal organization is surprisingly far along. Three schools of thought have emerged. But database technology is advancing so rapidly that it is premature to judge any approach as best. In fact, a hybrid containing elements of all three is becoming possible thanks to technological developments and the work of some far-thinking scientists.

THE DEFAULT SCHOOL

The first of these three schools of thought could be called the “default” school, though some consider it the most desirable course. The premise is that when a discipline needs a database, the people working in that area will create it. This position is the default because it is the current state of databases in psychology. For example, a desire to gather research on women was the impetus for creation of the Murray Center database (www.radcliffe.edu/murray/data/registra.htm). The need to preserve the records of psychological societies underlies the contributions from psychology to the Akron archive (www.uakron.edu/ahap). The need to share hard-to-get audio material led developmental psychologists to create their database. A desire to delineate the differences between fluid and crystallized intelligence over time and space led to the database of ability test results at the University of Virginia. The need to know the abilities of Air Force recruits led to the cognitive science database that was long housed at San Antonio’s Brooks Air Force Base. And the high cost of fMRI research led to the database of brain images at Dartmouth (www.fmridc.org).

There is an organic quality to the default position that is naturally endearing. No coercion went into the creation of the existing databases. No fiat from a government official or journal editor dictated their being. They grow because those in the field support their growth. Psychologists, and probably most human beings with great expertise in a subject area, are within their rights to believe that they have deep insight into what is, or is not, needed to aid progress in their subject area. Thus, the perceived need for these databases has been powerful motivation for their creation and maintenance. It is probably their strongest survival asset.

The conditional weakness of such databases is that they are “stovepiped.” That is, they exist in isolation from other information systems, like so many unconnected stovepipes. They serve a small, specialized set of clients, namely, those who carry out research of the kind whose data are contained in the database. A limited client base has value. It has particular value when the science itself is in a reductionist phase. In such a phase, the primary pursuit of most researchers is the gathering of diverse bits of knowledge that might eventually enable an integrationist phase in the science. But it is when a science has sufficient bits of knowledge to begin an integrative phase that stovepiping becomes a weakness: It contributes to the field’s slowness to recognize its integrative capabilities.

A second weakness of stovepiping is that it can separate the keepers of the database from innovations in informatics. In the default approach, it is usually one or more scientists within the discipline who recognize the need for a database and build it. The set of cases in which an information management specialist, realizing that a discipline not his or her own needs a better way of managing data, creates the database as an unsolicited gift is a null set. If a database eventually comes to be supported by federal or private grants, then an informatics specialist might be added to the staff as a luxury. Otherwise, the researchers do double duty as both scientists and database managers. To ask that they stay at the cutting edge of information management innovation even as they pursue their own research and teaching careers is asking too much. The likely result is that those served by the stovepiped database receive a product that is not the early, and sometimes not even the eventual, beneficiary of informatics advancements.

THE SELF-ARCHIVING SCHOOL

One would think that the second approach to databases, self-archiving, would suffer from technical and methodological obsolescence even more than the default approach. But that is not necessarily the case. Stevan Harnad is the psychologist most associated with the self-archiving approach.¹ The core position of this school of thought is that there is no need to wait for an appropriate database to come along before archiving one’s data. It is perfectly possible to make it available to others by posting it at one’s own web site. If everyone did this, proponents point out, the data from psychological research could be archived essentially instantly. Harnad also advocates posting one’s journal articles along with the data.

A pioneer in several areas including electronic journal publication (he created Psycoloquy, the first electronic journal in psychology), Harnad has been part of a group of scientists from many disciplines who are developing common protocols for databases. Their goal is not just to achieve interoperability among databases within a discipline. They want compatibility across all of science. The group was formed under support from the Los Alamos National Labs. The first iteration of their work was called the Santa Fe Convention for the Open Archives and was released early in 2000. That has now been superseded by the Open Archives Initiative (www.openarchives.org/index.html). Among the items flowing out of this work are templates for information entry in archives and templates for information retrieval from those archives. Beta testing of the protocols has been underway for over a year. If each scientist in each discipline were to self-archive in line with the protocols, then it should be possible for an individual knowing the protocols to access current information from all scientific disciplines. The process would be aided by a variety of services, which might include specialized search engines, systems for interlinking documents, and online catalogs of accessible datasets. Were the conventions universally accepted and applied, it would matter little, for purposes of retrieval at least, whether the physical repositories for the data were thousands of university-based computers around the world or a few large, centralized repositories.
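To make the idea of shared entry and retrieval templates concrete, here is a minimal sketch in Python. It is an illustration only: the field names, the validation rules, and the server names are assumptions invented for this example and are not taken from the Open Archives documents. It shows how one retrieval routine can run unchanged over records held on different machines, provided every archivist has filled in the same template.

```python
from dataclasses import dataclass

# Hypothetical entry template. These field names are illustrative
# assumptions, not the actual Open Archives conventions.
@dataclass(frozen=True)
class ArchiveRecord:
    identifier: str
    title: str
    creator: str
    discipline: str
    year: int
    data_url: str
    keywords: tuple = ()

REQUIRED_TEXT_FIELDS = ("identifier", "title", "creator", "discipline", "data_url")

def validate(record):
    """Return a list of problems; an empty list means the record fits the template."""
    problems = [f"missing {name}" for name in REQUIRED_TEXT_FIELDS
                if not getattr(record, name).strip()]
    if record.year < 1800:
        problems.append("implausible year")
    return problems

def search(archives, keyword):
    """Retrieval template: the same query runs unchanged over every archive."""
    wanted = keyword.lower()
    return [rec for records in archives.values() for rec in records
            if wanted in (k.lower() for k in rec.keywords)]

# Two self-archived collections on different (made-up) university servers.
archives = {
    "psych.example.edu": [
        ArchiveRecord("psy-001", "Memory span in older adults", "A. Researcher",
                      "psychology", 2001,
                      "http://psych.example.edu/data/psy-001.csv",
                      ("aging", "memory")),
    ],
    "socio.example.org": [
        ArchiveRecord("soc-117", "Social networks of the oldest old", "B. Scholar",
                      "sociology", 2000,
                      "http://socio.example.org/data/soc-117.csv",
                      ("aging", "social support")),
        ArchiveRecord("soc-118", "Commuting patterns", "C. Author",
                      "sociology", 1999,
                      "http://socio.example.org/data/soc-118.csv",
                      ("transport",)),
    ],
}

hits = search(archives, "aging")
print([rec.identifier for rec in hits])  # ['psy-001', 'soc-117']
```

Because the records share one template, the search function needs no knowledge of where each record is stored, which is the sense in which the physical location of the repositories would matter little for retrieval.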

The critical assumption in the self-archiving approach is that individual scientists who choose to archive will adopt the Open Archives conventions. Barring that, users would be presented with the daunting task of figuring out first how to understand and then how to use every archive they access. Presumably, continuing revisions of the Open Archives protocols would be the means by which innovations in archiving would be passed along to the archiving community. But the idea that scientist-archivists will adopt, unprompted, a prescribed set of protocols for data entry and for ease of data retrieval by others is a large assumption with little in the way of precedent to inspire confidence that it will come to pass.

THE KNOWLEDGE MANAGEMENT SCHOOL

The third school of thought on database building grows out of work on what has been labeled “knowledge management” in the fields of business management and library science. The key idea in this school of thinking is that the utility of knowledge can be dramatically increased if knowledge units can be organized electronically within a relational structure that allows access to any piece of knowledge from any starting point within the system, that allows unique recombinations of knowledge elements, and that can aid the user in seeking aggregations of knowledge elements whose value in aggregated form has the potential to be greater than the individual value of its parts.
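A toy version of such a relational structure can be sketched in a few lines of Python using the standard sqlite3 module. The table names, item labels, and links below are assumptions made up for illustration; a real system would be far richer. The sketch stores articles and datasets as items, records links between them, and then retrieves everything connected to any chosen starting point, following links in either direction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, kind TEXT, label TEXT);
CREATE TABLE link (src INTEGER REFERENCES item(id),
                   dst INTEGER REFERENCES item(id));
""")

# Illustrative knowledge units (hypothetical titles).
items = [
    (1, "article", "Quality of life in the oldest old"),
    (2, "dataset", "Survey responses, adults 85+"),
    (3, "article", "Cognitive decline and social contact"),
    (4, "dataset", "Longitudinal ability test scores"),
]
conn.executemany("INSERT INTO item VALUES (?, ?, ?)", items)

# Article 1 and article 3 both draw on dataset 2; article 3 also uses dataset 4.
conn.executemany("INSERT INTO link VALUES (?, ?)", [(1, 2), (3, 2), (3, 4)])

def reachable(conn, start_id):
    """Labels of every item connected to start_id, in either link direction."""
    rows = conn.execute("""
        WITH RECURSIVE
          edge(a, b) AS (SELECT src, dst FROM link
                         UNION
                         SELECT dst, src FROM link),
          seen(id) AS (
            SELECT ?
            UNION
            SELECT edge.b FROM edge JOIN seen ON edge.a = seen.id
          )
        SELECT item.label FROM item JOIN seen ON item.id = seen.id
        WHERE item.id != ?
        ORDER BY item.id
    """, (start_id, start_id))
    return [label for (label,) in rows]

# Starting from a single article, the reader reaches a dataset and an
# article that a search of that article alone would not have surfaced.
for label in reachable(conn, 1):
    print(label)
```

The same call works from any item, so a user who begins with a dataset arrives at the articles built on it just as easily as a user who begins with an article.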

Now, you ask yourself, what did that last paragraph mean? Imagine this: You are a social psychologist interested in influences on the quality of life of the oldest old. Normally you would search the social psychological literature to see what others had done: The window you have available for looking at your subject is small. What you will see is only a fraction of what you ought to see if you are to do justice to your topic of interest. What if you could sit at your computer and have access to the full corpus of research on aging from all the subfields of psychology? What if you could access not just the full text of journal articles from around the world relevant to your topic, but in addition, you had access to the raw data from the research and also could recombine it and carry out additional analyses on it? And what if the software that gathered that knowledge for you was also able, based on your query, to select from literature and data you wouldn’t have thought to access, knowledge that has relevance for your comprehensive understanding of your topic? If that were possible, your power as an investigator to master your topic of inquiry would be greatly enhanced.

Moreover, the ability of psychology generally to make sense of the knowledge it is producing would exceed what is possible without such a tool. That is the goal of the knowledge management approach to databases. The outspoken proponents of this approach are the current author and several colleagues: Michel Sabourin of the University of Montreal, Kurt Pawlik of the University of Hamburg, and Gary VandenBos and Wade Pickren from the American Psychological Association.²

The critical shortcoming of the knowledge management approach is that it requires a level of inter-society support, sustained funding, and international cooperation that is unprecedented in psychology. The field of psychology has grown in a fragmented way. Scientists in each specialty area are most active in the small societies that reflect their research and publish their major journals.

Though the situation has changed greatly in the past two decades, psychologists from different countries remain somewhat isolated from each other. If a comprehensive knowledge management structure were to be built for psychology, it would have to happen through a complex knitting together of disparate groups. For example, it would require agreements among publishers to standardize the format in which articles, journals and books are made available electronically and tagged for relational purposes. It would require similar agreements about first encouraging members of different psychological societies to archive data at a few centralized locations and then about data entry structures that would make techniques for data access and secondary analysis similar, as well as similarly user friendly, across repository sites. Whether as a field we can rise to that level of complex integration is open to debate.

A HYBRID PROPOSAL FOR PROGRESS

What frustrates proponents of databases, regardless of the school of thought to which they subscribe, is the understanding that the technological capabilities to perform wonders are within our grasp. That was not the case only a few years ago. But the social impediments to building the tool that can perform the wonders are humbling. There is little demand for the tools in the research community. There is no clamor on the part of researchers for funding agencies to devote significant resources to this kind of infrastructure building. In fact, if anything, the funding community has been trying with varying degrees of success to use its funding mechanisms to demonstrate to researchers that they ought to be clamoring for more such support.

Without adequate funding and without a visceral craving on the part of psychologists to have these wondrous tools, what can the proponents of databases do? Though probably not the ideal course of action, it may be possible to make progress by borrowing concepts and technology from the three existing database schools to rig a system capable of showing the unconvinced the power of a deliberate plan to manage the knowledge of the field.

Thanks to the default school, a number of databases are in operation. But they are not currently linked. Social scientists once faced this problem. To achieve coherence, they created a coordinating body for their major databases. It is called the Inter-University Consortium for Political and Social Research (ICPSR). For relatively low cost, the managers of the existing psychological databases could create an equivalent of ICPSR with the goal of forging agreements on protocols for data entry and data access that could make the databases similar in appearance and in methods of use. Such coordinated attention to user friendliness would mean that potential users would not be scared away from databases by their first encounter with them. Since those in the self-archiving school have already designed and are testing templates for data entry and retrieval, it might be possible for this suggested ad hoc coordinating body to align its protocols with those being adopted in sciences other than psychology. That would at least place psychology in step with measures being taken in other sciences to make scientific data universally accessible.
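One way to picture what such agreements would accomplish is the following small Python sketch. Everything in it is hypothetical: the archive names, the local column names, and the shared field names are invented for illustration and describe no existing database. Each archive keeps its own internal layout, and a simple mapping translates its records into the agreed-upon common form.

```python
# Hypothetical shared field names a coordinating body might agree on.
COMMON_FIELDS = ("study_id", "investigator", "year", "sample_size")

# Hypothetical per-archive mappings: local column name -> shared field name.
FIELD_MAPS = {
    "archive_a": {"StudyNo": "study_id", "PI": "investigator",
                  "Yr": "year", "N": "sample_size"},
    "archive_b": {"id": "study_id", "author": "investigator",
                  "pub_year": "year", "n_subjects": "sample_size"},
}

def to_common(archive, local_record):
    """Translate one archive's record into the shared layout."""
    mapping = FIELD_MAPS[archive]
    common = {mapping[key]: value
              for key, value in local_record.items() if key in mapping}
    missing = [name for name in COMMON_FIELDS if name not in common]
    if missing:
        raise ValueError(f"{archive} record lacks: {', '.join(missing)}")
    # Normalize numeric fields so every archive reports them the same way.
    common["year"] = int(common["year"])
    common["sample_size"] = int(common["sample_size"])
    return common

record_a = to_common("archive_a",
                     {"StudyNo": "A-12", "PI": "Smith", "Yr": "1998", "N": "240"})
record_b = to_common("archive_b",
                     {"id": "B-7", "author": "Lee", "pub_year": 2000,
                      "n_subjects": 85, "notes": "local-only field"})

# Both records now have identical keys and types, whatever their origin.
print(sorted(record_a) == sorted(record_b))  # True
```

The design choice here is that no archive has to rebuild its holdings. Only the thin translation layer needs to be agreed on, which is part of why the coordinating effort could be relatively inexpensive.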

If such an ad hoc group were to be formed, it could also become the device for giving guidance to individual scientists who wish to self-archive. Thus, it might be possible to put together an archival system consisting of the existing special purpose archives and the archives of individual scientists willing to join the movement even in the absence of a special purpose archive for their own brand of research. One advantage of this approach would be the distribution of the costs of archiving. Folding such costs into the equipment costs of universities around the country (whose computers would be hosting the archives of faculty) would make them more bearable.

If all participating archivists subscribed to agreed-upon protocols, it would enable the barest beginning of knowledge management for psychology. Lacking in this scheme would be the linkage to full text articles, to abstracts, and to search engines designed for intelligently surveying the body of psychological knowledge and the data underlying it.

The maintenance capabilities attendant on centralized storage would not be distributed evenly across the system, perhaps causing uneven service to users. Lacking also would probably be the archival materials of scientists outside the United States. But maybe these features would come in time – when enough eyes are opened to the possibilities.

¹See Harnad, S. 1994. “Publicly retrievable FTP archives for esoteric science and scholarship: A subversive proposal.” In: Okerson, A. and O’Donnell, J. (eds.) Scholarly Journals at the Crossroads: A Subversive Proposal for Electronic Publishing. Washington, D.C. Association of Research Libraries, June 1995. See also Harnad, S. 1991. “Post-Gutenberg galaxy: The fourth revolution in the means of production of knowledge.” Public-Access Computer Systems Review. 2 (1), 39-53.

²A detailed treatment of the approach is found in Johnson, DH and Sabourin, ME. 2001. “Universally accessible databases in the advancement of knowledge from psychological research.” International Journal of Psychology, 36 (3), 212-220.
