## What Statistics Can Do

How to Ask Sensitive Questions without Getting Punched in the Nose

1. Introduction

Have you had a homosexual experience? Are you an atheist? Many people might well refuse to answer these questions because they consider such matters personal or confidential. Sometimes, however, the answers are legitimately other people’s business. For example, suppose military officials want to estimate the percentage of servicemen using hard drugs in order to decide whether the rehabilitation program needs to be expanded. The officials will need to interview the men with a method that elicits truthful answers to this sensitive question. Such a method is explained below, after some elementary definitions.

2. Definitions

• Statistical inference: a collection of methods by which we make inferences about a population based on information gained from a sample
• Population: the set of all possible observations of some characteristic of interest
• Sample: a portion or subset of a population
• Simple random sampling: a procedure to draw a sample of size n from a population of size N in which every possible sample of size n has the same chance of being selected
• Simple random sample: the sample obtained from simple random sampling
• Population parameter: a numerical value that gives information about a population

For instance, suppose one wishes to study the Grade Point Average (GPA) of accounting majors attending a certain college. The set of GPAs of all the accounting majors is the population, and the set of GPAs of all junior accounting majors constitutes a sample. The highest GPA among all accounting majors and the average GPA for all accounting majors are two examples of population parameters.
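The definitions above can be made concrete with a minimal Python sketch. The ten GPA values are hypothetical, invented purely for illustration; the sketch computes two population parameters and then draws a simple random sample with `random.sample`, which selects without replacement so that every subset of the requested size is equally likely.

```python
import random
from statistics import mean

# Hypothetical GPAs for a population of N = 10 accounting majors (illustrative only).
population = [3.9, 2.7, 3.1, 3.5, 2.4, 3.8, 3.0, 2.9, 3.6, 3.3]

# Population parameters: computed from every member of the population.
highest_gpa = max(population)
average_gpa = mean(population)

# Simple random sample of size n = 4: random.sample draws without replacement,
# so every possible subset of 4 GPAs is equally likely to be selected.
rng = random.Random(0)  # fixed seed only to make the sketch repeatable
sample = rng.sample(population, 4)

print("population maximum:", highest_gpa)
print("population mean:", round(average_gpa, 2))
print("sample:", sample, "sample mean:", round(mean(sample), 2))
```

The sample mean will generally differ from the population mean; statistical inference is about using the former to say something about the latter.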

3. The Randomized Response Method

This method is based on the idea that the less an answer reveals, the more willing the interviewee will be to cooperate. How does the method work?

Let us designate people with a certain sensitive characteristic as Group A. First, the interviewer is furnished with a deck of cards that are identical except that a known fraction P of them is marked with the number 1 and the remaining fraction 1-P is marked with the number 2.

The interviewer chooses a random sample of size n. The following procedure is repeated with each of the n people. The interviewee is instructed to choose a card at random from the full deck and note the number appearing on the card. The number is NOT shown to the interviewer. The interviewee is then shown the two statements:

• Statement 1: I am a member of Group A
• Statement 2: I am not a member of Group A

He is asked to respond “yes” or “no” to the statement that corresponds to the number he has chosen.

Let us introduce the following notation:

x: proportion of the population belonging to A

1-x: proportion of the population not belonging to A

P: probability that a card number 1 is chosen

1-P: probability that a card number 2 is chosen

y: probability of a “yes” response

m: number of “yes” responses

n: size of sample

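Using the addition and multiplication theorems from probability theory,

P(“yes” response) = P(card 1 chosen and “yes” reported) + P(card 2 chosen and “yes” reported)

P(“yes” response) = P(card 1 chosen) × P(“yes” answer | card 1 chosen) + P(card 2 chosen) × P(“yes” answer | card 2 chosen)

A truthful interviewee holding card 1 answers “yes” exactly when he belongs to A (probability x), and one holding card 2 answers “yes” exactly when he does not (probability 1-x), so

y = Px + (1-P)(1-x) = (2P-1)x + (1-P)

Estimating y by the observed proportion m/n and solving for x, we have

x = (P - 1 + m/n)/(2P - 1),          P ≠ 1/2     (1)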
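A small simulation can illustrate why this estimator works. The sketch below is only an illustration under stated assumptions: every interviewee answers truthfully, and the values of n, P and the true proportion x are hypothetical. The function names `estimate_group_a` and `simulate_yes_count` are introduced here and are not part of the original method description.

```python
import random

def estimate_group_a(yes_count, n, p_card1):
    """Equation (1): x = (P - 1 + m/n) / (2P - 1), undefined at P = 1/2."""
    if p_card1 == 0.5:
        raise ValueError("P must differ from 1/2")
    return (p_card1 - 1 + yes_count / n) / (2 * p_card1 - 1)

def simulate_yes_count(true_x, n, p_card1, rng):
    """Simulate n truthful interviewees and return the number of 'yes' responses."""
    yes_count = 0
    for _ in range(n):
        in_group_a = rng.random() < true_x      # membership, unknown to the interviewer
        drew_card_1 = rng.random() < p_card1    # card drawn in private
        # Card 1: "I am a member of Group A"; card 2: "I am not a member of Group A".
        says_yes = in_group_a if drew_card_1 else not in_group_a
        yes_count += says_yes
    return yes_count

rng = random.Random(42)                     # fixed seed for repeatability
n, p_card1, true_x = 100_000, 0.75, 0.20    # hypothetical values
m = simulate_yes_count(true_x, n, p_card1, rng)
print("observed yes rate:", m / n)          # theory: Px + (1-P)(1-x) = 0.35
print("estimated x:", estimate_group_a(m, n, p_card1))
```

The interviewer only ever sees the total count m, yet the estimate lands close to the true proportion. Note that the denominator 2P - 1 shrinks as P approaches 1/2, so the estimate becomes noisier the closer the deck is to an even split.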

Example:

A study is to be designed to estimate the proportion of seniors in a college who have cheated on a final examination at some time during their college career. The interviewer chooses a simple random sample of 200 seniors. In the randomization deck, ¾ of the cards are marked 1 and ¼ are marked 2. Each interviewee is shown the two statements:

• Statement 1: I have cheated on a final examination during my college career
• Statement 2: I have not cheated on a final examination during my college career

The interviewee is then asked to choose a card at random from the deck, read the statement that corresponds to the number he drew, and respond “yes” or “no”. Repeating this with each of the 200 seniors, the interviewer receives 60 “yes” answers. Using equation (1), we get

x = (3/4 – 1 + 60/200)/(2 × 3/4 – 1) = 0.05/0.5

x = 0.1

So we estimate that 10% of the seniors at the college have cheated on a final examination during their college careers.
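The arithmetic in this example can be checked with exact fractions. This is just a sketch of the calculation in Python; the variable names are chosen here for illustration.

```python
from fractions import Fraction

P = Fraction(3, 4)   # fraction of cards marked 1
m, n = 60, 200       # "yes" answers and sample size from the example

y_hat = Fraction(m, n)                 # observed proportion of "yes" answers
x_hat = (P - 1 + y_hat) / (2 * P - 1)  # equation (1)

print(y_hat, x_hat, float(x_hat))      # prints: 3/10 1/10 0.1
```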

4. Unrelated Question Model

This method is a modification of the preceding one. Here, the interviewee randomly chooses and answers one of the following questions:

• Question 1: Are you a member of Group A?
• Question 2: Are you a member of Group B?

Group A represents those people possessing the sensitive characteristic. Group B is chosen to be a non-sensitive group for which the proportion of the population belonging to it is known. For example, membership in Group B could be determined by “Were you born in December?” or “Is the last digit of your ID number odd?”

By almost the same reasoning, we have

y = Px + (1-P)b,

where b is the proportion of the population belonging to B.

Then, estimating y by m/n and solving for x, x = [(P-1)b + m/n]/P,          P > 0     (2)
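Solving y = Px + (1-P)b for x can be sketched as a small Python function. The function name `estimate_unrelated` and the numbers in the round-trip check are hypothetical, chosen only to show that the estimator inverts the relation between x and y.

```python
def estimate_unrelated(yes_count, n, p_question1, b):
    """Solve y = P*x + (1 - P)*b for x, with y estimated by m/n. Requires P > 0."""
    if not 0 < p_question1 <= 1:
        raise ValueError("P must satisfy 0 < P <= 1")
    y_hat = yes_count / n
    return (y_hat - (1 - p_question1) * b) / p_question1

# Round-trip check with hypothetical values: build y from a known x, then recover x.
P, b, x_true = 0.75, 0.5, 0.2
y = P * x_true + (1 - P) * b           # expected "yes" probability
n = 1000
m = y * n                              # expected number of "yes" answers
print(estimate_unrelated(m, n, P, b))  # approximately 0.2, up to floating-point rounding
```

Unlike the first model, this estimator remains usable at P = 1/2, since the denominator is P rather than 2P - 1.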

Example:

We wish to estimate the proportion of female students at a college who have had a homosexual experience. The interviewees are presented with the following questions:

• Question 1: Have you ever had a homosexual experience?
• Question 2: Is the last digit of your ID number even?

The randomization device is again a deck in which ¾ of the cards are marked 1 and ¼ are marked 2. A simple random sample of size 100 is taken and the interviewer gets 18 “yes” answers. In this case, b is ½. By equation (2) we have

x = [(3/4 – 1) × 1/2 + 18/100]/(3/4) = 0.055/0.75 ≈ 0.073

Therefore, we estimate that about 7% of the female students at the college have had a homosexual experience.
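As a check on the arithmetic, the estimate can be recomputed with exact fractions, starting from y = Px + (1-P)b with y replaced by m/n. The variable names below are chosen for illustration.

```python
from fractions import Fraction

P = Fraction(3, 4)   # fraction of cards marked 1
b = Fraction(1, 2)   # proportion with an even last ID digit
m, n = 18, 100       # "yes" answers and sample size from the example

# x = [(P - 1) b + m/n] / P, obtained by solving y = Px + (1 - P) b for x
x_hat = ((P - 1) * b + Fraction(m, n)) / P

print(x_hat, round(float(x_hat), 3))   # prints: 11/150 0.073
```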

After learning this topic, many students become very interested in studying statistics when they find out that statistical methods can elicit honest answers to questions on sensitive topics. The following quote sums up their general attitude when they see randomized response methods in action:

“But, Dr. X, how can you tell how many of us smoke pot, without even knowing which of us, if any, answered your question? If statistics can do that, I might even get interested enough to learn some statistics for myself!”

So, how about you? Are you now interested enough, too?

Maceli, J., How to ask sensitive questions without getting punched in the nose. Modules in Applied Mathematics, Vol. 2, edited by W. F. Lucas, Springer-Verlag, New York, 197