Human Subject: An Investigational Memoir



14. Farther Along the Investigation Superhighway

“Every experimental science requires a laboratory. It is there that the scientist withdraws to try to understand, by means of experimental analysis, the phenomena he has observed in nature.” (Claude Bernard)

In addition to being a virtual research center, the Web acts as a global bulletin board for researchers to use in recruiting subjects. In one metropolitan area, on one day, there were 14 research recruitment posts on craigslist (which, in case you just awoke from an eight-year coma, is an online information exchange where you can find everything from a new vacuum cleaner to a new mate). Conditions being studied included asthma, schizophrenia, eczema, depression, anxiety, low libido, and nasal congestion.

The subject line for one of the studies said, “Female Research Participants Needed!” It turned out to be a study on domestic violence. By completing the 25-minute survey, I could obtain a $5 coffee gift card (good for about a latte and a half). I figured that $12 per hour was a decent wage. Unfortunately, after I’d finished the survey, I had to spend another 25 minutes writing to the researcher in detail about the various errors and glitches I had found, ranging from missing option buttons on one form to bad grammar (I pointed out that the question “Has anyone forced you to have sex with them?” seemed to raise the notion of gang rape). One thing I’d noticed was that there was no debriefing page. I mentioned this in my email, but later I remembered that debriefing is only required for psychologists, not social workers (further proof that either this ethics stuff is really complicated or my blood pressure is really low).

I was glad I had sent the student my suggestions and corrections, because she was very grateful for my input. She made the changes right away, although she was puzzled by my debriefing question; I explained that I had been temporarily confused, and then I educated her as to what debriefing is. When I got my Starbucks gift card in the mail a week later, I was a little disappointed that she hadn’t added some extra dollars for my extra effort.

The main beneficiary of gift cards is the retailer, who earns interest on all that money while the card sits unused in someone’s wallet. So, to do my part in thwarting corporate wealth, I rushed out to Starbucks and exchanged my card plus sixty cents for half a pound of coffee. I had reasoned that someone who can’t even keep a job as a research subject has no business buying fancy coffee drinks.

Not all of the studies posted on craigslist that day were being conducted by people at institutions of higher learning. A few of them had purely commercial motives. One of the ads, posted by a large market-research firm, said that it would pay me $125 for two hours of my time. Participants needed to be health-conscious women above a certain age. I figured that hypochondria and anorexia were legitimate forms of health-consciousness, so I called.

First the screener told me which sessions they had available, and she asked which one I would prefer to attend. After I’d made my choice, she asked me a series of screening questions dealing with how much I cared about my health, where I bought my groceries, and what I did to stay healthy. In the last category, she read a long list of brand-name products, asking if I had used any of them. She read the list fairly quickly, and I had never heard of any of the products, but I managed to jot down a few of the names:

After I’d answered all the questions, the screener told me that I didn’t qualify for the study, but that they would call me if there was another one for which I was qualified. I never heard from them again. I guess they weren’t interested in someone who had never taken advantage of the health-promoting substances engineered by concerned and caring food manufacturers.


One of the more intriguing ways that people can now recruit subjects online is through a contracting service called Amazon Mechanical Turk. Yes, the same startup-turned-behemoth that brought us the ability to read informative customer reviews of Bali bras (“The MUST HAVE bra!!!”), Black & Decker drills (“Nice drill, nice price”), and Godiva dark chocolate (“how can you go wrong”), and to purchase them all with one click, also runs an online clearinghouse where people who are willing to work literally for pennies can find ridiculously low-paying jobs.

Mechanical Turk gets its name from an 18th-century hoax involving an alleged chess-playing machine that actually had a human being inside. Amazon is capitalizing on the notion, advanced by Luis von Ahn and others, that there are certain tasks that human beings can do better than computers.

The jobs available at Mechanical Turk are called HITs, short for Human Intelligence Tasks. On one morning in June 2007, there were 223 sets of HITs available. Here are a few representative examples, including the amount one could earn by completing one HIT in each group:

In addition to the demographic question above, there were a few other survey-type HITs, including “Quick Survey on Family History” (5 cents, 1 HIT available). Many of the HITs appeared to be posted by large companies seeking cheap labor, but some were posted by individuals. This one set off my illegal-activity detector:

A series of paragraphs, one per HIT, was to be paraphrased, using the same concepts and style as the original but with different words. It looked to me as if someone wanted to copy an essay or article without having the result look plagiarized. I reported this suspected violation, using the Mechanical Turk online contact form. A few minutes later, when I looked again for those HITs, they were gone. Probably some Turker, as the regulars call themselves, had claimed them all, but I preferred to think that the Mechanical Turk police had acted swiftly to expel the offender. (My bubble was burst when I finally got a thank-you-we’ll-look-into-it email several weeks later.)

A few months earlier Bob (the friend who corresponded at length with InvestiGuard in chapter 10) had used Mechanical Turk to recruit 200 participants for his online study. The study took about 25 minutes to complete, and each participant got paid the princely sum of 75 cents. Volunteers were also solicited from several online and email forums in subject areas related to the project.

The 200 slots for paid subjects, at less than $2 per hour, filled up within 24 hours, while fewer than 200 unpaid volunteers signed up in the course of five days. This would seem to indicate that money, even paltry amounts of it, is a greater incentive for participating in research than interest in the topic. However, the results revealed a drawback to relying on paid subjects: they worked faster, but they were less careful and less accurate than the unpaid volunteers. In the academic research arena, there’s a widely held view that it’s OK to pay participants for their time and inconvenience, but that very large sums are an “undue inducement.” So what to make of the study subject who leaps at the chance to earn a pittance?

What kind of people will slave over a hot computer for mere pennies per hour? Surprisingly enough, they aren’t all unemployed 30-something loners living in their parents’ basements. Nor can they be people who live in countries where 50 cents an hour puts you in a high-income bracket, because Amazon requires that you have a U.S. bank account. No, the Turkers seem to be just regular folks like you and me. (Well, like you anyway.)

I found a very unscientific poll on a discussion board for Turkers, where people were asked “What do you do for your ‘day job’?” Only 8.3 percent said they were unemployed. The working Turkers included students, teachers, and people in the legal field. There were even some people who worked in information technology; apparently they couldn’t get enough interaction with computers on the job. One woman said that she ran a daycare and computer consulting business “at the same time,” which seemed like an impractical combination. (“Try rebooting. I’ll call you back after I tie Emma’s shoes and give Jacob his bottle.”) All of the participants in the discussion seemed to get great satisfaction, if not great wealth, from Turking.

I briefly tried out the Turker lifestyle for myself. I can’t remember any of the HITs I did, but here’s the weekly report I got after a few hours of Turking:

Your HIT activity for this week:
—Number of HITs accepted: 5
—Number of HITs returned: 0
—Number of HITs abandoned: 1
—Number of HITs submitted: 4
Approvals and payments that occurred this week:
—Number of HITs approved: 3
—Number of HITs rejected: 0
—HIT reward earned: $0.24
—Total Amount earned this week: $0.24

I never did figure out how I could have submitted four HITs, had no rejections, and had only three HITs approved, but I didn’t think it was worth the eight cents or so to investigate. However, I did give the service one more try. This time I accepted a HIT that involved transcribing what was described as a 5-minute audio file. The pay was 80 cents, and the task seemed pretty simple.

I knew this task was bad news when the first three seconds of audio were an unintelligible jumble of mumbling voices. After that it got a little easier to identify actual sentences and specific speakers, but I still had to replay most sentences at least once. It didn’t help that the conversation was about shopping at trendy stores, something I only do once every decade or so (and then only when my mother is paying). What I was transcribing, I soon realized, was a market-research focus-group session.

My instructions had said that when the facilitator was talking, he was to be labeled as “Man,” and when any of the women were talking, I was to call them all “Woman.” This lack of regard for each woman’s individuality seemed like a form of objectification, which I was sure would rankle many feminists.

The conversation included exchanges like this:

Man: So do any of you besides Katherine shop at Ann Taylor?

Woman: Yeah, they have nice suits.

Woman: —dresses you can wear to a club.

Woman: They cater to the working woman.

Man: OK. And what about the Loft? How is it different? Does anyone know the difference?

Woman: I didn’t really know—

Woman: You explained it just now. I thought it was an outlet store, but it’s —

Man: Yeah.

Woman: It’s got more variety.

Woman: It’s cheaper. It has pants for work; it has tops; it has shorts.

Woman: And you can dress it up and wear it to a club.

After half an hour I had barely made it halfway through the audio file. I decided it wasn’t worth 80 cents an hour to further the research agenda of corporate America. So I clicked on the button that said “Return HIT” to let some other sucker work on it.

Amazon Mechanical Turk is at the crossroads of several kinds of research. First there are the kinds of opportunities posted by the occasional academic researcher. In a related category are the many market-research survey questions. Both of these types of research involve human subjects and are designed to contribute to generalizable knowledge, but only the former requires institutional review.

Then there are HITs where someone is willing to pay a few cents for some information, like “Find the email address of the owner/manager of Powerhouse Gym—Daytona Beach FL.” This wouldn’t meet anyone’s definition of human-subjects research. It’s just a case, like the one I encountered in the HIT described above, of a company that would rather pay pennies to contractors than real wages to an employee. And, sadly, that’s the main kind of HIT that’s now available at Mechanical Turk.

Perhaps Amazon’s greatest contribution to research is the very existence of the Mechanical Turk service. Not only is it an experiment in the field of human computation, i.e., giving humans the work that computers would find difficult or impossible, but there could be much to study in the evolving ways it has been used, in the economics of pricing and selecting HITs, and in the characteristics of the people who use it for finding either cheap labor or low-paid drudgery.


A few of the Mechanical Turk HITs are in the category of usability testing. That is, someone has a product or a Web site that is at some stage of development, and actual users are recruited to try it out. For instance, one day there was this lucrative opportunity:

Test a mobile/cellphone web page—3 cents

I suspected that usability testing was another area where private industry could get away with inflicting all manner of emotional harm on subjects, while those in government or academia were required to get IRB approval. So I was a bit surprised to learn that the Usability Professionals’ Association (http://www.usabilityprofessionals.org/) has a voluntary code of conduct.

This shouldn’t have surprised me. After all, what group doesn’t have some sort of ethics code these days? The Center for the Study of Ethics in the Professions (http://ethics.iit.edu/codes/) lists hundreds of codes in dozens of disciplines, including the Enron Code of Ethics (65 pages—I guess no one had time to read it) and the Tamaki Makau-rau Accord on the Display of Human Remains and Sacred Objects.

The UPA code incorporates the following principles:

—Act in the best interest of everyone.
—Be honest with everyone.
—Do no harm and if possible provide benefits.
—Act with integrity.
—Avoid conflicts of interest.
—Respect privacy, confidentiality, and anonymity.
—Do not plagiarize.

Although there is no mention of human subjects or generalizable knowledge, these seven principles are similar to the ones behind the Common Rule. Under the privacy principle, there’s even a requirement that participants provide “informed consent for use of all data collected.”

I don’t know if any UPA members worked on the document “Usability testing of voting systems,” published by the U.S. Election Assistance Commission in October 2003 (http://www.eac.gov/docs/usability.pdf). This commission was created by the 2002 Help America Vote Act, which was supposed to ensure that there wouldn’t be a repeat of the kinds of punch-card mishaps that tainted the 2000 election in Florida. (Instead we now have voting-machine malfunctions and shortages.) Among other tasks, the EAC is required to develop a program “for the testing, certification, and decertification of voting systems.” The guide to usability testing was intended for both election officials and the manufacturers of the voting systems.

Alluding vaguely to “appropriate guidelines for human subjects protection,” the usability guide states that test administrators “should take care to guard the test participants against physical and emotional harm.” It further notes that “government and/or institutional rules and regulations may require test administrators to pass the test plan through an Internal Review Board.” I suppose that if a local election board gets funds from a federal agency that has adopted the Common Rule, they might indeed have to get their plan approved, but the most likely funding agencies would be either the EAC itself or the Federal Election Commission, and neither agency has adopted the Common Rule. As for the hapless employee of Diebold Election Systems (or any other manufacturer of voting machines) who comes across this paragraph, she would probably be flummoxed by the concept of human-subject protection.

The usability guide suggests compensating subjects for their time and travel. It also states that it’s “customary” to have the participant sign a statement outlining his or her rights, including the right to withdraw at any time without forfeiting compensation. This provision is actually more generous than most of the consent forms that I’ve seen or signed. Most just say that the subject can withdraw at any time; sometimes the phrase “with no penalty” is added. I’ve never seen one that promised payment even if the subject quits before the study is over. When the question of whether to pay a quitting subject arose on the IRB Forum a few years ago, there was, as usual, much disagreement as to what was ethical or appropriate. But most of those weighing in agreed that as long as the rules are spelled out in the consent form, it’s up to the individual PI or, more likely, the IRB, to decide if subjects who quit early will get paid.

During the actual testing of the voting system, the guide warns, subjects may become frustrated while attempting to perform assigned tasks. The test administrator should establish a time limit after which someone either offers the subject help or makes the decision to move on to the next task.

This reminded me of a book I had recently seen about usability testing of library Web sites. In addition to practical advice about how long to let the subject try to perform a task, this book suggested that testers use wording designed to spare the subject any feelings of inadequacy (Norlin & Winters, 2002). For example, when the subject can’t figure out even the most basic task on a Web page, the tester would say, “I guess we didn’t design this page very well, did we, Mr. Smith?” Leave it to librarians to stroke others’ egos while disparaging their own abilities.

The EAC guide recommends videotaping test sessions so that they can be reviewed later. It also suggests making a “highlights” video to accompany written reports of test results. Despite its apparent concern with research ethics, the EAC doesn’t mention any privacy or confidentiality issues that may arise from using videotaped sessions for promotional or other purposes.

When determining whether usability studies constitute research, as defined by the Common Rule, one important question is whether the studies are designed to contribute to “generalizable knowledge.” As you may recall from way back in chapter 2, HHS chose not to define this term, and there is no agreement among researchers as to what it means.

Information obtained during testing with a few users is always generalized to apply to the entire group of potential users. Is that enough generalization to fit the definition of research? Maybe not. But what if the information gathered forms the basis for a general theory of system design, implementation, or testing, and what if those who develop this theory then disseminate their findings at conferences or in journal articles? In that case, the researchers could claim that the original studies were never designed for that purpose. In fact, that’s a tactic used by at least one researcher in an effort to circumvent the IRB (as described in chapter 2). The Common Rule allows you to analyze human-subject data that was previously gathered, as long as the subjects are not identifiable.

Just when I thought I had exhausted the subject of usability testing, without actually engaging in any myself, I received an email announcement offering just such an opportunity. A graduate student in computer science, whom I’ll call Gina, had designed a Web browser extension to help people save and organize Internet content. Now she needed people to test it, and she was going to pay them a dollar a day for their help. Heck, that was more than I ever made as a Turker.

By agreeing to install and use the software on my computer, I was also acknowledging that, whenever I was using it, Gina would be keeping track of every Web page I visited. Because she was a Big U student, she had needed to get this minimal-risk study approved by InvestiGuard, a process that took two months. The consent form included these sentences, in a section called “Risks, Stress, or Discomfort”: “We do not intend for this experiment to cause you stress” and “If any of the logged data, surveys, or interviews make you uncomfortable, you may stop your participation.”

At first I tried to use the browser extension every day, fearing that I might not get my dollar for the days when I didn’t use it. But it wasn’t really all that helpful, and I was so busy with other things that I would forget to use it for days at a time. I couldn’t totally forget, though, because every week Gina would send all her subjects a Web research assignment, and once a week a survey popped up with a series of “short questions”—as if the length of the questions dictated the length of the answers. (“What’s the meaning of life?” Now there’s a short question for you.)

After a while, I started noticing that my experimentally enhanced Web browser had become sluggish and cantankerous. That seemed like enough of an adverse event for me to discontinue my participation. But then Gina developed an improved version of the software, which didn’t seem to cause as many problems. For the first time in my computing life, I actually wished that an application would misbehave. If only my browser would become dysfunctional, I thought, I could tell Gina that I had tested her invention, had found it lacking in usability, and was now ready to get on with my life.

Even though the consent form said I could withdraw at any time, I was too conscientious to do so without a good reason. I wondered how many other study subjects felt an obligation to continue in a study when they really wanted to quit. I looked for information related to this question, but no one seems to have studied it.

