Use Big Samples Online to Increase Replicability

Nate Kornell

Letter/Observer Forum

December 27, 2012

Hostess Brands — makers of Wonder Bread, Twinkies, Ding Dongs, and other products — recently filed for bankruptcy. There are many complex reasons the company ran into trouble, but there is also an obvious one: If a company wants to deliver more bread to the shelf, it has to pay for more workers, equipment, supplies, shipping, and so on. Food businesses do not easily increase in scale.

Google, on the other hand, is not considering bankruptcy. In 2004, Google received around 300 million search queries per day. As of 2011, that number had increased by more than an order of magnitude, to over 3 billion. A search algorithm that works for one search can work just as well for 10. There are costs associated with increasing a website’s scale, but they are relatively minor compared with those of most businesses. Online businesses are relatively easy to scale.

As a psychological scientist, my research used to resemble the Hostess business model. To run twice as many participants in a study generally required that I and/or my research assistants spend twice as much time in the lab, and I had to wait about twice as long to get the data.

Now I run the majority of my studies online, mainly using Amazon’s Mechanical Turk to recruit participants. If I offer participants $2.00 for a 30-minute study, I can reliably get 20, or 200, or more people to complete my study within 24 hours. In other words, my research now scales like Google’s search. If I want to run more participants, I just type a larger number into the box on the Amazon webpage. The costs exist — I have to sort out (and sometimes code) the data, and I have to pay more participants — but they tend to be minimal compared with the costs of running studies in the lab.

The world of psychology is awash with concerns about replicability. One way to increase replicability is to run more participants. Running studies online makes doing so eminently feasible. Those of us who began our careers running subjects in the lab tend to have a Hostess mindset: We don’t want to run any more participants than we need to. If I would have run 40 participants in a study I conducted in 2005, my first instinct is to do the same in 2013. Our science would benefit from adopting a Google mindset: If I would have run 40 participants in 2005, why not run 100 in 2013? Running more participants increases statistical power, which reduces the risk of Type II errors, and it also makes a significant result less likely to be a false positive (a Type I error).
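To put rough numbers on the 40-versus-100 comparison, here is a minimal sketch in Python. It rests on assumptions that are not in the letter: a two-group between-subjects design with participants split evenly, a hypothetical medium effect (Cohen's d = 0.5), and a normal approximation in place of an exact t-test calculation. The function name `approx_power` is made up for this illustration.

```python
from statistics import NormalDist


def approx_power(n_per_group: int, effect_size: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-group comparison.

    Uses a normal (z-test) approximation, so it slightly overstates
    the power of a real t-test at small sample sizes.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected z statistic when the true standardized difference is effect_size
    expected_z = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the observed statistic lands in either rejection region
    return (1 - z.cdf(z_crit - expected_z)) + z.cdf(-z_crit - expected_z)


# Hypothetical effect size; the letter does not report one.
for total_n in (40, 100):
    power = approx_power(total_n // 2, effect_size=0.5)
    print(f"N = {total_n}: approximate power = {power:.2f}")
```

Under these assumptions, the 40-participant study detects the effect in roughly a third of attempts, and the 100-participant study in roughly seven attempts out of ten. The exact figures depend on the assumed effect size and design.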

I recently reaped the benefit of using a large sample. I conducted a study that looked very promising after we had collected data from about 40 participants. Having recently read about the dangers of running subjects until p < .05 and then stopping, I decided to run more participants. Unfortunately, the effect faded away; fortunately, I found out the truth. Adopting a Google mindset, and using larger samples online, will not solve all replicability problems. But it can help.
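The danger of running subjects until p < .05 and then stopping can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reconstruction of the study described above: it simulates studies in which the true effect is zero, uses a one-sample z-test with known standard deviation for simplicity, and checks the result after every 10 participants up to 100. The function names and parameter values are chosen for illustration only.

```python
import random
from statistics import NormalDist


def p_value(samples):
    """Two-sided p-value for a one-sample z-test of mean 0 with known SD 1."""
    n = len(samples)
    z = (sum(samples) / n) * n ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(peek_every, max_n, n_sims=2000, alpha=0.05, seed=1):
    """Share of null-effect studies that reach p < alpha at any look.

    Data are tested after every `peek_every` participants, and the study
    stops at the first significant result. `max_n` should be a multiple
    of `peek_every`.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        data = []
        while len(data) < max_n:
            # The true mean is 0, so any significant result is a false positive
            data.extend(rng.gauss(0, 1) for _ in range(peek_every))
            if p_value(data) < alpha:
                hits += 1
                break
    return hits / n_sims


peeking = false_positive_rate(peek_every=10, max_n=100)
fixed = false_positive_rate(peek_every=100, max_n=100)
print(f"stop at first p < .05: {peeking:.3f}")
print(f"test once at n = 100:  {fixed:.3f}")
```

Testing once at the planned sample size keeps the false positive rate near the nominal 5 percent, while stopping at the first significant look pushes it well above that. Committing to a larger sample in advance avoids the problem.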
