Though databases as tools for psychological research are in their infancy, thinking about their ideal organization is surprisingly far along. Three schools of thought have emerged. But database technology is advancing so rapidly that it is premature to judge any approach as best. In fact, technological developments and the work of some forward-thinking scientists are making possible a hybrid that combines elements of all three.
THE DEFAULT SCHOOL
There is an organic quality to the default position that is naturally endearing. No coercion went into the creation of the existing databases. No fiat from a government official or journal editor dictated their being. They grow because those in the field support their growth. Psychologists, and probably most human beings with great expertise in a subject area, are within their rights to believe that they have deep insight into what is, or is not, needed to aid progress in their subject area. Thus, the perceived need for these databases has been powerful motivation for their creation and maintenance. It is probably their strongest survival asset.
The conditional weakness of such databases is that they are “stovepiped.” That is, they exist in isolation from other information systems, like so many unconnected stovepipes. They serve a small, specialized set of clients, namely, those who carry out research of the kind whose data are contained in the database. A limited client base has value. It has particular value when the science itself is in a reductionist phase. In such a phase, the primary pursuit of most researchers is the gathering of diverse bits of knowledge that might eventually enable an integrationist phase in the science. But it is when a science has sufficient bits of knowledge to begin an integrative phase that stovepiping becomes a weakness: It contributes to the field’s slowness to recognize its integrative capabilities.
A second weakness of stovepiping is that it can separate the keepers of the database from innovations in informatics. In the default approach, it is usually one or more scientists within the discipline who recognize the need for a database and build it. The set of cases in which an information management specialist, realizing that a discipline not his or her own needs a better way of managing data, creates the database as an unsolicited gift is a null set. If a database eventually comes to be supported by federal or private grants, then an informatics specialist might be added to the staff as a luxury. Otherwise, the researchers do double duty as both scientists and database managers. To ask that they stay at the cutting edge of information management innovation even as they pursue their own research and teaching careers is asking too much. The likely result is that those served by the stovepiped database receive a product that is not the early, and sometimes not even the eventual, beneficiary of informatics advancements.
THE SELF-ARCHIVING SCHOOL
Stevan Harnad, a pioneer in several areas including electronic journal publication (he created Psycoloquy, the first electronic journal in psychology), has been part of a group of scientists from many disciplines who are developing common protocols for databases. Their goal is not just to achieve interoperability among databases within a discipline. They want compatibility across all of science. The group was formed with support from Los Alamos National Laboratory. The first iteration of their work, called the Santa Fe Convention for the Open Archives, was released early in 2000. It has since been superseded by the Open Archives Initiative (www.openarchives.org/index.html). Among the products of this work are templates for entering information into archives and templates for retrieving information from them. Beta testing of the protocols has been under way for over a year. If every scientist in every discipline were to self-archive in line with the protocols, an individual who knows the protocols should be able to access current information from all scientific disciplines. The process would be aided by a variety of services, which might include specialized search engines, systems for interlinking documents, and online catalogs of accessible datasets. Were the conventions universally accepted and applied, it would matter little, at least for purposes of retrieval, whether the physical repositories for the data were thousands of university-based computers around the world or a few large, centralized repositories.
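To make the retrieval side of this concrete, here is a minimal Python sketch using the request format the Open Archives Initiative later published as OAI-PMH version 2.0, a newer revision than the beta protocols described above. What it illustrates is that one uniformly formed request (`verb=ListRecords`, `metadataPrefix=oai_dc`) works against any compliant archive, wherever it is hosted. The repository base URLs and the sample record are hypothetical, the helper names are illustrative, and the sketch parses a canned response rather than contacting a real server.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace for Dublin Core elements (used inside oai_dc records).
NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request; only base_url differs between archives."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date  # harvest only records added or changed since this date
    return base_url + "?" + urlencode(params)


def titles_from_response(xml_text):
    """Return the Dublin Core titles found in a ListRecords response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iterfind(".//dc:title", NS)]


# Hypothetical, abridged response (real responses also carry a responseDate,
# a request echo, and a header for each record).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example record title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    # Hypothetical base URLs standing in for university-hosted archives.
    for base in ("https://archive.example.edu/oai", "https://psych.example.org/oai"):
        print(list_records_url(base, from_date="2001-01-01"))
    print(titles_from_response(SAMPLE_RESPONSE))
```

Because only the base URL changes from archive to archive, a harvester or search service could loop over thousands of university-hosted repositories as easily as over a few central ones.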
The critical assumption in the self-archiving approach is that individual scientists who choose to archive will adopt the Open Archives conventions. Barring that, users would be presented with the daunting task of figuring out first how to understand and then how to use every archive they access. Presumably, continuing revisions of the Open Archives protocols would be the means by which innovations in archiving would be passed along to the archiving community. But the idea that scientist-archivists will adopt, unprompted, a prescribed set of protocols for data entry and for ease of data retrieval by others is a large assumption with little in the way of precedent to inspire confidence that it will come to pass.
THE KNOWLEDGE MANAGEMENT SCHOOL
What would such an approach mean in practice? Imagine this: You are a social psychologist interested in influences on the quality of life of the oldest old. Normally you would search the social psychological literature to see what others had done. But the window you have for looking at your subject is small, and what you see through it is only a fraction of what you ought to see if you are to do justice to your topic. What if you could sit at your computer and have access to the full corpus of research on aging from all the subfields of psychology? What if you could access not just the full text of relevant journal articles from around the world, but also the raw data from the research, and could recombine those data and carry out additional analyses on them? And what if the software that gathered that knowledge for you could also, based on your query, select from literature and data you would not have thought to consult the knowledge that bears on a comprehensive understanding of your topic? If that were possible, your power as an investigator to master your topic of inquiry would be greatly enhanced.
Moreover, psychology's general ability to make sense of the knowledge it produces would exceed what is possible without such a tool. That is the goal of the knowledge management approach to databases. Its most outspoken proponents are the current authors: Michel Sabourin of the University of Montreal, Kurt Pawlik of the University of Hamburg, and Gary VandenBos and Wade Pickren of the American Psychological Association.²
The critical shortcoming of the knowledge management approach is that it requires a level of inter-society support, sustained funding, and international cooperation that is unprecedented in psychology. The field has grown in a fragmented way. Scientists in each specialty area are most active in the small societies that represent their research areas and publish their principal journals.
Though the situation has changed greatly in the past two decades, psychologists from different countries remain somewhat isolated from one another. A comprehensive knowledge management structure for psychology would therefore have to be built by knitting together many disparate groups. For example, it would require publishers to agree on a standard format in which articles, journals, and books are made available electronically and tagged for relational purposes. It would require similar agreements, first to encourage members of different psychological societies to archive data at a few centralized locations, and then to establish data entry structures that make techniques for data access and secondary analysis similar, and similarly user-friendly, across repository sites. Whether the field can rise to that level of complex integration is open to debate.
A HYBRID PROPOSAL FOR PROGRESS
Without adequate funding and without a visceral craving on the part of psychologists to have these wondrous tools, what can the proponents of databases do? Though probably not the ideal course of action, it may be possible to make progress by borrowing concepts and technology from the three existing database schools to rig a system capable of showing the unconvinced the power of a deliberate plan to manage the knowledge of the field.
Thanks to the default school, a number of databases are in operation, but they are not currently linked. Social scientists once faced the same problem. To achieve coherence, they created a coordinating body for their major databases, the Inter-University Consortium for Political and Social Research (ICPSR). At relatively low cost, the managers of the existing psychological databases could create an equivalent of ICPSR, with the goal of forging agreements on protocols for data entry and data access that would make the databases similar in appearance and in methods of use. Such coordinated attention to user-friendliness would mean that potential users would not be scared away by their first encounter with a database. Because those in the self-archiving school have already designed, and are testing, templates for data entry and retrieval, this ad hoc coordinating body might be able to align its protocols with those being adopted in sciences other than psychology. That would at least place psychology in step with measures being taken in other sciences to make scientific data universally accessible.
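A shared data-entry protocol of the kind proposed here can be pictured as a common record template plus, for each existing database, a mapping from its native field names onto that template. The Python sketch below assumes a hypothetical four-field template (`title`, `investigator`, `year`, `variables`); the field names, the `to_common_record` helper, and the sample records are illustrative only and do not represent an existing ICPSR or Open Archives format.

```python
from dataclasses import dataclass, field

# Hypothetical shared template a coordinating body might agree on.
REQUIRED_FIELDS = ("title", "investigator", "year", "variables")


@dataclass
class DatasetRecord:
    """One dataset described in the (hypothetical) common format."""
    title: str
    investigator: str
    year: int
    variables: list = field(default_factory=list)


def to_common_record(raw, field_map):
    """Translate one archive's native field names into the shared template.

    `field_map` maps each shared field name to the name this archive uses.
    Raises KeyError listing any required field the archive cannot supply.
    """
    missing = [f for f in REQUIRED_FIELDS if field_map.get(f) not in raw]
    if missing:
        raise KeyError(f"missing required fields: {missing}")
    return DatasetRecord(
        title=str(raw[field_map["title"]]),
        investigator=str(raw[field_map["investigator"]]),
        year=int(raw[field_map["year"]]),  # some archives may store years as text
        variables=list(raw[field_map["variables"]]),
    )


# Two hypothetical archives that name the same information differently.
ARCHIVE_A_MAP = {"title": "study_title", "investigator": "pi",
                 "year": "yr", "variables": "vars"}
ARCHIVE_B_MAP = {"title": "Title", "investigator": "Author",
                 "year": "Year", "variables": "Measures"}

if __name__ == "__main__":
    a = {"study_title": "Study A", "pi": "Investigator A",
         "yr": "1998", "vars": ["age", "recall"]}
    b = {"Title": "Study B", "Author": "Investigator B",
         "Year": 2000, "Measures": ("age", "reaction_time")}
    for raw, fmap in ((a, ARCHIVE_A_MAP), (b, ARCHIVE_B_MAP)):
        print(to_common_record(raw, fmap))
```

Leaving each archive's internal structure untouched and agreeing only on the mapping keeps the coordinating body's demands on volunteer database keepers small.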
If such an ad hoc group were formed, it could also guide individual scientists who wish to self-archive. It might then be possible to assemble an archival system consisting of the existing special-purpose archives plus the archives of individual scientists willing to join the movement even where no special-purpose archive exists for their kind of research. One advantage of this approach would be that the costs of archiving would be distributed. Folding those costs, in effect, into the equipment costs of universities around the country, whose computers would host their faculty members' archives, would make them more bearable.
If all participating archivists subscribed to agreed-upon protocols, the result would be the barest beginning of knowledge management for psychology. Missing from this scheme would be links to full-text articles and abstracts, and search engines designed to survey intelligently the body of psychological knowledge and the data underlying it. Nor would the system have the uniform maintenance that centralized storage makes possible; maintenance capabilities would vary from site to site, perhaps causing uneven service to users. Probably missing, too, would be the archival materials of scientists outside the United States. But perhaps these features would come in time, once enough eyes are opened to the possibilities.