This article is adapted from an invited address given at the annual meeting of the American Educational Research Association, held April 2004 in San Diego, California.
My intent in this paper is to offer a personal perspective on the events that led to a major change in the college admissions test known as the SAT. The new test is now in place for all students nationwide who must take the SAT as part of the admissions process for the college class entering in the fall of 2006. I hope this account will be useful to those trying to change policies and practices deeply entrenched in our society.
Before I begin, let me introduce some terminology. By “standardized test,” I mean simply a test administered under controlled conditions and carefully monitored to prevent cheating. I will also use the terms “aptitude test” and “achievement test.” Achievement tests are designed to measure mastery of a specific subject. In contrast, aptitude tests are designed to predict an individual’s ability to profit from a particular type of training or instruction. For example, an algebra test given at the end of a course would be classified as an achievement test, whereas a test given prior to the course — designed to predict the student’s performance in the algebra course — would be classified as an aptitude test. In actual practice, the distinction between achievement and aptitude tests is not as neat as these definitions might suggest, but the conceptual difference is useful.
A Brief History
After World War II, colleges and universities in the United States gradually adopted standardized tests as part of their admissions process. The test that was most widely selected was the Scholastic Aptitude Test, known as the SAT. Some schools used the American College Testing Program, or ACT, but most institutions, particularly the more selective ones, chose the SAT.
The College Board (the non-profit organization that owns the SAT) has made a series of changes in the test since its inception. The original SAT became the SAT I — a three-hour test that continued to focus on verbal aptitude but added a quantitative section covering mathematical topics typically taught in grades one through eight. In addition, the College Board developed 23 one-hour SAT II tests designed to measure a student’s achievement in specific subjects such as physics, chemistry, history, mathematics, writing, and foreign languages. Most colleges and universities required only the SAT I, but some required the SAT I plus two or three SAT II tests.
Today, when the SAT is mentioned in the media, the reference is invariably to the SAT I. The test has become a key factor in determining who is admitted — and who is rejected — at the more selective institutions.
Concerns About the SAT
My concerns about the SAT date back to the late 1940s, when I was an undergraduate at the University of Chicago. Many of the Chicago faculty were outspoken critics of the SAT and viewed it as nothing more than a multiple-choice version of an IQ test; they argued forcefully for achievement tests in the college admissions process. Their opposition may have been influenced to some degree by school rivalry; the leading force behind the SAT at that time was James B. Conant, the president of Harvard University. Eventually Chicago adopted the SAT, but not without controversy.
In the years after leaving the University of Chicago, I followed the debates about the SAT and IQ tests with great interest. I knew that Carl Brigham, a psychologist at Princeton who created the original SAT, modeled the test after earlier IQ tests and regarded it as a measure of innate mental ability. But, years later, he expressed doubts about the validity of the SAT and worried that preparing for the test distorted the educational experience of high school students. Conant also expressed serious reservations about the test later in his life.
When students asked me about IQ testing, I frequently referred them to Stephen Jay Gould’s book The Mismeasure of Man, published in 1981; it is a remarkable piece of scholarship that documented the widespread misuse of IQ tests. I knew both Dick Herrnstein at Harvard and Art Jensen at the University of California, Berkeley personally, and kept track of their controversial work on IQ. And, of course, I was a long-term faculty member at Stanford University, where the Stanford-Binet Intelligence Scales were developed.
Over the intervening years, my views about IQ testing proved to be mixed. In the hands of a trained clinician, tests like the Wechsler Intelligence Scales or the Stanford-Binet Intelligence Scales are useful instruments in the diagnosis of learning problems; they can often identify someone with potential who, for whatever reason, is failing to live up to that potential. However, such tests do not have the necessary validity or reliability to justify ranking individuals of normal intelligence, let alone to make fine judgments among highly talented individuals. My views are similar to those of Alfred Binet, the French psychologist who, in the early years of the 20th century, devised the first IQ tests. Binet was very clear that these tests could be useful in a clinical setting, but rejected the idea that they provided a meaningful measure of mental ability that could be used to rank individuals. Unfortunately, his perspective was soon forgotten, as the IQ testing industry burst onto the American scene.
A Defining Moment
My involvement with the SAT began in the early 1990s, when I served as chair of the Board on Testing and Assessment. BOTA is a board of the National Research Council charged with advising the federal government on issues of testing and assessment. BOTA has done a tremendous service integrating and interpreting research findings in order to advise the government on a wide range of testing and assessment problems for virtually every federal agency.
Serving on BOTA focused my attention on college admissions tests and their effects on a student’s high school education and subsequent career. However, the defining moment for me occurred at a meeting of BOTA in Washington, DC, where representatives of the College Board and the Educational Testing Service, or ETS, presented their views on college admissions tests. I left that meeting less than satisfied. The College Board and ETS have a superb record both on the technical aspects of test development and on administering tests and ensuring their security. But at that meeting, the notion that the SAT I was a “true measure of intelligence” dominated their perspective. Further, they seemed oblivious to several studies suggesting that achievement tests were a better predictor of college success than aptitude tests.
On my way home I stopped in Florida to visit my grandchildren. I found my granddaughter, then in sixth grade, already diligently preparing for the SAT by testing herself on long lists of verbal analogies. She had a corpus of quite obscure words to memorize, and then she proceeded to construct analogies using the words. I was amazed at the amount of time and effort involved, all in anticipation of the SAT. Was this how I wanted my granddaughter to spend her study time?
On the plane trip back to California I drafted an op-ed piece about college admissions tests. It was not focused on the University of California, but on college admissions in general. It made a series of points. One was that admissions tests should not try to measure “innate intelligence” (whatever that is), but should focus on achievement, that is, on what the student actually learned during the high school years. In addition, such tests should have an essay component requiring the student to produce an actual writing sample. And the tests should cover more mathematics than simply an eighth-grade introduction to algebra.
And, finally, I said that an important aspect of admissions tests was to convey to students, as well as their teachers and parents, the importance of learning to write and the necessity of mastering at least eighth- through tenth-grade mathematics.
The draft op-ed piece was handwritten. I shared it with a few close friends, decided that the time was not right to raise the issue, and placed it in my desk drawer. But later, when the SAT controversy erupted, a reporter learned of the draft and requested it under the Freedom of Information Act. To my chagrin, the UC general counsel declared that it was a university document and had to be turned over to the reporter.
A Leak Becomes a Deluge
When I was asked to give the keynote address at the annual meeting of the American Council on Education, or ACE, in February 2001, a colleague of mine at the office of the president, Pat Hayashi, suggested that we use the op-ed draft as the basis for the speech. Pat had been the admissions officer at UC Berkeley for a number of years, and at the time was serving on the board of trustees of the College Board. He has been an important influence on my thinking about admissions issues in general and the SAT in particular.
Although as UC president I already had plenty of controversies to contend with, I liked Pat’s suggestion and we proceeded to redo the op-ed piece, this time focusing on the University of California (the speech can be found at the UC Office of the President Web site at www.ucop.edu/news/sat/speech1.html). In a nutshell, I said that I intended to recommend to the faculty that the university cease using the SAT I and rely on SAT IIs until an appropriate achievement-oriented test could be developed to replace the SAT I. The text of that speech was a closely held secret; I shared it with only a few trusted colleagues.
I flew to Washington, DC, on a Friday, with the speech scheduled for Sunday afternoon. I checked into my hotel Friday evening. The next morning I woke up, planning to spend an enjoyable Saturday visiting the Hirshhorn Gallery. When I opened my hotel door, there in the hallway was the Washington Post. The front-page story, above the fold, read: “Key SAT Test Under Fire in Calif.; University President Proposes New Admissions Criteria.” I rushed out to retrieve copies of the Los Angeles Times and the Chicago Tribune and found the same thing: front-page stories. The New York Times had a long story, also starting on the front page, with a headline that read “Head of U. of California Seeks To End SAT Use in Admissions.” The story was particularly interesting because it reproduced almost half of the speech word for word.
I will take a moment to explain how this happened. A young man in the UC press office was about to take another job, and he had friends at the Associated Press. The computer system in my office was not as secure as we had assumed, and he was able to obtain the next-to-last draft of the speech. I know this because at the last moment Pat Hayashi convinced me to add a paragraph on “comprehensive review”; namely, that the University of California should stress the importance of multiple factors in the admissions process and not rely too heavily on test scores. So I said, “OK, draft a paragraph and put it in.” And he did. When I saw the paragraph, I was satisfied except that he used the term “holistic review.” I dislike the word “holistic” with its various connotations and quickly changed it to “comprehensive review.” But the New York Times carried the term “holistic,” because they had the penultimate draft of the speech. That term continues to plague me even to this day. Apparently, some people still refer to the original New York Times account.
The Public Response
I never made it to the Hirshhorn on Saturday. Most of the day was spent trying to dodge reporters and frantic calls from UC officials. When I arrived at the ACE meetings on Sunday afternoon, the auditorium was packed, as were the overflow rooms. The place was alive with reporters. There were TV cameras and satellite feeds everywhere; it was truly a chaotic scene. Stan Ikenberry, the president of ACE, was absolutely delighted. This was the biggest crowd and the most media coverage ACE had ever had. No one seemed disturbed that the speech had been leaked to the press the day before.
The audience’s response was wonderful. I had expected to attract some attention in the higher education community, but I was unprepared for the general public’s response. Clearly, the topic hit a deep chord in the American psyche.
Over the course of the next several months, I received hundreds of letters from people describing their experiences with the SAT. I was on “The News Hour With Jim Lehrer” and was in a debate on “Good Morning America.” The major magazines, such as Newsweek and U.S. News & World Report, had cover stories. The one I liked best was Time magazine; they devoted a large part of an issue to the subject of college admissions testing. Nicholas Lemann, a reporter who authored the book The Big Test: The Secret History of the American Meritocracy, wrote one of the Time magazine articles that I particularly like. The piece includes a photograph of me on one page and facing me on the opposite page is the President of the United States, George W. Bush. The question over the photos is, “What Do These Two Men Have In Common?” Lemann’s answer was that we both supported the idea of standardized testing. A few clever souls speculated that what the two of us had in common was the same SAT score. Fortunately, I was able to respond, “No, that’s not the case. I was a student at the University of Chicago which, at that time, had its own entrance exam, and it certainly wasn’t the SAT.”
Some people assumed that I was arguing for no testing at all; they hadn’t bothered to read the actual speech. For a few weeks anti-testing groups saw me as a hero, until they realized that I was not proposing a ban on standardized testing.
Unfortunately, in one discussion with reporters I described the impact of my granddaughter’s experience on my thinking, and after that she was often mentioned in their stories. She was embarrassed by the attention and not too happy with her grandfather. I’ll return to her views on this matter later.
A Ticking Time Bomb
The College Board’s response to my speech was less than enthusiastic. There were some sharp exchanges in the press and a number of SAT supporters wrote scathing articles; a few got a little too personal. Some of the articles were written by college admissions officers who failed to disclose that they had been paid consultants to the College Board. And efforts were made to enlist key UC faculty to oppose the proposal. But, as I will explain later, the College Board did, in the end, agree to totally overhaul the SAT. The president of the College Board, Gaston Caperton, deserves much of the credit for what took place. He had served as the governor of West Virginia and in that role had been particularly effective in improving K-12 education. As the SAT debate evolved, he showed remarkable leadership. Some of the senior people at the College Board wanted to maintain the status quo, but as Caperton immersed himself in the issue, his perspectives changed and he concluded that a major overhaul of the test was needed. I admire Caperton greatly. He showed courage and leadership, and change in the SAT I would not have occurred without his involvement.
Buried in the ACE speech was a very brief paragraph — five sentences that were overlooked by most people. It noted that the University of California had used the SAT I and three SAT IIs for a number of years, and that several small-scale UC studies indicated that the SAT II was the better predictor of college performance. Just a brief paragraph, hardly noticed, but it was a ticking time bomb.
At this point it will be useful to provide some history. The UC faculty, under the university’s tradition of shared governance, have responsibility for the admissions process. That responsibility is exercised by the Board on Admissions and Relations with Schools, or BOARS, of the UC Academic Senate. In 1960, when many universities had already adopted the SAT, UC still did not require the test in its admissions process. At that time, BOARS launched a study to compare the SAT and several achievement tests as predictors of college performance. The results were mixed. The achievement tests proved a more useful predictor of success than did the SAT, but the benefit of both tests appeared marginal. BOARS decided not to introduce admissions tests and to continue to rely on high school grades.
In 1968, UC began requiring the SAT I and three SAT II achievement tests, although the applicant’s SAT scores were not considered in the regular admissions process. However, in special cases, high SAT scores were a way of admitting promising students whose high school grades fell below the UC standard. UC requires applicants to take a specific set of courses in high school; poor grades in these courses could be offset by high SAT scores. In his book The Big Test, Lemann asserts that UC’s adoption of the SAT was a turning point for the College Board. Once UC required the test, the SAT became the gold standard for admissions tests. To this day, UC receives applications from more SAT takers than any other institution.
By 1979, UC faced increasing enrollment pressures and finally adopted the SAT as a formal part of the regular admissions process. That year, BOARS established UC’s Eligibility Index: a sliding scale combining the high school grade point average, or GPA, with the SAT I score to determine whether a student is UC eligible. The Eligibility Index was established because several studies showed that UC was accepting students who ranked well below the top 12.5 percent of statewide high school graduates, the pool from which it is mandated to draw. Note that only the SAT I score was included in the Eligibility Index, even though applicants were still required to take three SAT II tests. All eligible students were guaranteed acceptance at one of the UC campuses, but not necessarily the campus of their choice. Campus admissions officers at each of the UC campuses used the full array of data, including the SAT II scores, in making individual campus decisions.
In 1995, shortly after I became president, BOARS — with my strong endorsement — redefined the Eligibility Index to include GPA plus scores on the SAT I and three SAT IIs (writing, mathematics, and a third test of the student’s choice). This was done on the basis of several small-scale studies suggesting that the SAT IIs were good predictors of college success. BOARS established a weighting scheme that had the principal weight on the GPA, but with a relative weight of one on the SAT I compared with a weight of three on the SAT IIs. So, in 1995, the word went out to high school students and their counselors that the SAT II had taken on a new significance.
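The relative weighting described above can be illustrated with a small sketch. The text gives only the one-to-three ratio between the SAT I and the SAT IIs, so everything else here is an assumption made for illustration: the function name, the rescaling of each score to a range of 0 to 1, and the use of a simple weighted average. It is not the formula BOARS adopted, and the GPA component, which carried the principal weight, is left out because its weight is not stated.

```python
def weighted_test_component(sat1_scaled, sat2_scaled, w_sat1=1.0, w_sat2=3.0):
    """Combine two test measures using relative weights (hypothetical sketch).

    sat1_scaled, sat2_scaled: scores rescaled to the range 0..1. This rescaling
    is an assumption; the article does not say how scores were put on a
    common scale.
    w_sat1, w_sat2: relative weights; only the 1-to-3 ratio comes from the article.
    """
    for value in (sat1_scaled, sat2_scaled):
        if not 0.0 <= value <= 1.0:
            raise ValueError("scores must be rescaled to the range 0..1")
    total_weight = w_sat1 + w_sat2
    return (w_sat1 * sat1_scaled + w_sat2 * sat2_scaled) / total_weight


# Two made-up applicants whose SAT I and SAT II results are mirror images.
strong_sat2 = weighted_test_component(sat1_scaled=0.5, sat2_scaled=0.9)
strong_sat1 = weighted_test_component(sat1_scaled=0.9, sat2_scaled=0.5)
print(strong_sat2 > strong_sat1)
```

Under these assumed inputs, the applicant who is stronger on the SAT IIs receives the higher test component, which is the practical meaning of the message sent to high school students in 1995.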
Data Make the Difference
By the time I gave my ACE speech, we had four years of data under the new policy on all freshmen who were admitted and subsequently enrolled at a UC campus. We had approximately 78,000 student protocols. A protocol included the student’s high school grades, SAT I scores (verbal and quantitative), three SAT II scores, family income, family educational background, the quality of the high school the student attended, race/ethnicity, and several other variables. And, of course, the protocol included the grade record of the student in her or his freshman year at a UC campus.
When I gave my ACE speech, an analysis of the UC data was not yet available. However, a few months later, two researchers at the UC Office of the President, Saul Geiser and Roger Studley, completed a seminal study on predictive validity using the data set. The study examined the effectiveness of high school grades and various combinations of SAT I and SAT II scores in predicting success in college. A full account of the study has been published in the journal Educational Assessment and is available on the UC Web site (www.ucop.edu/sas/research/researchandplanning/pdf/sat_study.pdf).
In brief, the study shows that the SAT II is a far better predictor of college grades than the SAT I. The combination of high school grades and the three SAT IIs accounts for 22.2 percent of the variance in first-year college grades. When the SAT I is added to the combination of high school grades and the SAT IIs, the explained variance increases from 22.2 percent to 22.3 percent, a trivial increment.
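The size of that increment can be checked with simple arithmetic. The sketch below uses only the two percentages reported above; the function name is an illustrative choice and is not taken from the study.

```python
def incremental_variance(r2_baseline, r2_full):
    """Return the absolute and relative gain in explained variance.

    Both arguments are proportions of variance explained (R-squared),
    for example 0.222 for 22.2 percent.
    """
    if not 0.0 < r2_baseline <= r2_full <= 1.0:
        raise ValueError("expected 0 < r2_baseline <= r2_full <= 1")
    absolute_gain = r2_full - r2_baseline
    relative_gain = absolute_gain / r2_baseline
    return absolute_gain, relative_gain


# Figures reported in the text: 22.2 percent without the SAT I, 22.3 percent with it.
absolute_gain, relative_gain = incremental_variance(0.222, 0.223)
print(f"absolute gain: {absolute_gain * 100:.1f} percentage points")
print(f"relative gain: {relative_gain * 100:.2f} percent of the baseline")
```

With the reported figures, adding the SAT I raises the explained variance by 0.1 percentage point, which is less than half a percent of what high school grades and the SAT IIs already explain.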
The data indicate that the predictive validity of the SAT II is much less affected by differences in socioeconomic background than is the SAT I. After controlling for family income and parents’ education, the predictive power of the SAT II is undiminished, whereas the relationship between SAT I scores and UC grades virtually disappears. The SAT II is not only a better predictor, but also a fairer test, insofar as it is demonstrably less sensitive than the SAT I to differences in family income and parents’ education.
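What it means for a relationship to “virtually disappear” after controlling for background can be illustrated with a first-order partial correlation. This is a generic textbook formula, offered as a sketch. It is not necessarily the method used in the Geiser-Studley study, and the correlation values below are invented for illustration rather than taken from the UC data.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after removing the linear effect of z."""
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denominator == 0.0:
        raise ValueError("z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denominator


# Invented values: x = a test score, y = college grades, z = a background measure.
# Test A is closely tied to the background measure; test B is not.
test_a = partial_correlation(r_xy=0.30, r_xz=0.60, r_yz=0.50)
test_b = partial_correlation(r_xy=0.30, r_xz=0.10, r_yz=0.50)
print(f"test A after controlling for z: {test_a:.2f}")
print(f"test B after controlling for z: {test_b:.2f}")
```

In this made-up case both tests have the same raw correlation with grades, but the first test's relationship is accounted for by its link to background, while the second keeps most of its predictive relationship once background is held constant.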
These findings for the full UC data set hold equally well for three major disciplinary subsets of the data, namely for: 1) Physical Sciences/Mathematics/Engineering, 2) Biological Sciences, and 3) Social Sciences/Humanities. Across these disciplinary areas, SAT II is consistently a better predictor of student performance than SAT I.
Analyses with respect to the racial-ethnic impact of SAT I versus SAT II indicate that, in general, there are only minor differences between the tests. The SAT II is a better predictor of UC grades for most racial-ethnic groups than the SAT I, but both tests tend to “over-predict” freshman grades for underrepresented minorities to a small but measurable extent. Eliminating SAT I in favor of SAT II would have little effect on rates of UC eligibility and admissions for students from different racial-ethnic groups.
The UC data yield another interesting result. Of the various tests that make up the SAT I (verbal and quantitative) and the three SAT IIs, the best single predictor of student performance was the SAT II writing test. Given the importance of writing ability at the college level, it should not be surprising that a test of actual writing skills correlates strongly with college grades.
Once the Geiser-Studley study was made public, opposition to a change in the SAT I quickly died out, and the UC faculty were fully engaged in planning for a new admissions test. In March 2002, Gaston Caperton, in his role as president of the College Board, announced that the College Board would eliminate the SAT I as it then stood and replace it, on a nationwide basis, with a new test very much in accord with my original proposal and the planning that the UC faculty had already done.
Adjusting to the Changes
Since then, the College Board has consulted with UC faculty and other groups around the country about the new test. The new SAT I includes a 25-minute essay requiring students to produce an actual writing sample, a more substantial mathematics section assessing higher-level mathematical skills, and a reading comprehension section that does not include verbal analogies. I believe this is an ideal solution that reflects the changes called for in my ACE speech. UC’s plan is to use the new SAT I and to continue to augment it with two SAT II tests.
When I look back, I’m amazed at the speed with which change has occurred. The ACE speech was in February 2001, the College Board made its decision to overhaul the SAT I in March of 2002, and the new test is now in use for students entering college in fall 2006. In a brief time, college admissions will have undergone a revolutionary change — a change that will affect millions of young people.
My granddaughter is in the first group of high school students to take the new SAT I. As a high school sophomore she took the PSAT — a test preparatory to taking the old SAT I — and did brilliantly. She was not hesitant to accuse me of complicating her future. Her high school quickly adjusted to the proposed changes, and now has students writing a 25-minute essay once a week in preparation for the new test.
One of the clear lessons of history is that colleges and universities, through their admissions requirements, strongly influence what is taught in the schools. From my viewpoint, the most important reason for changing the SAT is to send a clear message to K-12 students, their teachers, and parents that learning to write and mastering a solid background in mathematics are of critical importance. The changes in the SAT go a long way toward accomplishing that goal.