Homework for *Cognitive Science I: Perception* at [TU Darmstadt](https://www.tu-darmstadt.de/cogsci/studying_cogsci/index.en.jsp). This homework replaces the lab work during the 2020 corona pandemic.
Signal detection theory is one of the most successful theories in all of cognitive science and psychology. We assume you've already covered the basics of signal detection theory elsewhere. In the course we use the following text:
Wickens, T.D. (2002). *Elementary Signal Detection Theory*. Oxford University Press.
For this homework the most relevant sections are 1.1-1.3 (p. 3-15), 2.1-2.3 (p. 17-26), 3.1 (p. 39-42) and 3.3 (p. 45-48).
Here, you will consolidate the basics of signal detection theory by running a little signal detection experiment on yourself and analyzing the data. To collect the data, follow the instructions in `signal_detection_data_collection.ipynb`. After you've collected your data, start on the homework exercises in `signal_detection_homework.ipynb`.
## Installation
...
...
Then start jupyter notebook:

```
jupyter notebook
```
and click on `signal_detection_data_collection.ipynb` in the browser window that will open. Now you're ready to go.

## Additional Reading
TU Darmstadt has an [online copy](https://hds.hebis.de/ulbda/Record/HEB379323249) of the Wickens book where you can read and copy the relevant sections. Please don't borrow the whole e-book, because nobody else in class can read it while you have it on loan.
An excellent paper to get an overview of why you should care about signal detection theory beyond the simple experiment in this tutorial is
Swets, J.A., Dawes, R.M., & Monahan, J. (2000). Psychological Science Can Improve Diagnostic Decisions. *Psychological Science in the Public Interest*, 1(1):1-26, DOI: [10.1111/1529-1006.001](https://doi.org/10.1111/1529-1006.001).
## Working with jupyter notebooks and git
Just in case you're working with git: to keep the version history clean, it is better not to commit the output of the cells, the execution counts, and so forth.
For a clean, readable diff it helps to strip all this information before committing.
With the right configuration this is done automatically on commit, leaving your local copy intact.
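One way to set this up (there are others, e.g. a custom clean filter based on `jq`) is the `nbstripout` tool, which registers itself as a git filter:

```
# install nbstripout and register it as a git filter for this repository
pip install nbstripout
nbstripout --install
# from now on, notebook outputs and execution counts are stripped from commits,
# while the outputs in your local working copy stay untouched
```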
"Now, you'll run 5 conditions where `p=[0.1,0.7,0.5,0.3,0.9]` with the same `intensity`. In each conditions you will do at least 5 blocks of 50 trials. This experiment should take abour an hour. Make sure you have enough time and energy before you start. Also make sure to have breaks in between as appropriate so that you really do your best. If you want you can also do more trials, of course. Just re-run each condition.\n",
"Now, you'll run 5 conditions where `p=[0.1,0.75,0.5,0.25,0.9]` with the same `intensity`. In each conditions you will do at least 5 blocks of 50 trials. This experiment should take less than 1.5 hours. Make sure you have enough time and energy before you start. Also make sure to have breaks in between as appropriate so that you really do your best. If you want you can also do more trials, of course. Just re-run each condition.\n",
"\n",
"When you start the first condition with `p=0.1`, keep in mind that there is a signal only in 10% percent of trials!"
Signal detection theory is among the most successful theories in all of cognitive science and psychology. You will run a little signal detection experiment on yourself and analyze the data.
When you're at the ear doctor, she will increase the intensity of a tone until you can hear it, until you can detect it. She does this to measure how sensitive your hearing is. Now imagine you are a radiologist who has to decide whether or not there is a tumor on an x-ray. This is also a detection task: you have to detect the tumor. We will simulate this last task with our stimuli, but what we will learn applies more generally to all detection tasks.
%% Cell type:markdown id: tags:
## Stimuli
Instead of detecting a tumor, in our task you have to detect the letter *A* in a noisy image. Running the code in the next cell will generate a plot. In it you see a noisy image *without* the letter *A* on the left. We call this the noise-only condition. On the right you see a noisy image *with* the letter *A*. This is the signal-plus-noise condition.
%% Cell type:code id: tags:
``` python
# we need psychopy and a few other libraries (pandas, numpy, matplotlib, etc.)
# they are all imported in `signal_detection.py` in the same folder
from signal_detection import *
# press shift+enter to run only this cell and generate the plot!
imshow_stimuli(125)
```
%% Cell type:markdown id: tags:
The one parameter of `imshow_stimuli` lets you change the intensity of the stimulus. Play with it! When it is zero there is no signal (just like on the left). At an intensity of `125` the *A* is clearly visible, and you can probably still just about see it when you set the parameter to `50`. Note that you're looking at grayscale images where `0` is black and `255` is white. The noise-only pixels have a mean of `127`, i.e. gray, and the standard deviation of the noise is set to `25`. The pixels of the *A* have a mean of `127` plus the intensity you've set, with the same standard deviation.
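If you're curious how such a stimulus can be generated, here is a minimal sketch with numpy and matplotlib. It is *not* how `imshow_stimuli` is implemented in `signal_detection.py` (in particular, the letter mask below is only a rough placeholder), but it follows the description above: a gray background with mean `127`, pixel noise with standard deviation `25`, and the intensity added to the letter pixels.

``` python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
intensity, noise_sd = 50, 25

# gray background (mean 127) with Gaussian pixel noise (sd 25)
image = 127 + noise_sd * rng.standard_normal((64, 64))

# crude placeholder mask (two strokes and a crossbar) instead of a real letter "A"
mask = np.zeros((64, 64), dtype=bool)
mask[16:48, 22:26] = True
mask[16:48, 38:42] = True
mask[30:34, 22:42] = True

image[mask] += intensity          # with intensity 0 this would be a noise-only image
image = np.clip(image, 0, 255)

plt.imshow(image, cmap='gray', vmin=0, vmax=255)
plt.axis('off')
plt.show()
```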
%% Cell type:markdown id: tags:
## Some Cautionary Notes
The question we want to answer is: how well can you detect the *A*? Or, more generally, how can we measure your sensitivity in any detection task? In a second step, which we will not address here, we could then compare your sensitivity to everyone else's in the class to find out who's the best *A*-detector (or the best radiologist, or who has a hearing problem). The reason we won't attempt such a comparison is that you're all doing this task on different computers and with different monitors. Some monitors might be better than others and allow for a higher contrast, and your detection performance will also depend on the luminance of the monitor and the lighting in your room. Hence, if we measure your performance on the *A*-detection task, we're really measuring your performance *on your setup*. Any comparison between people is therefore not meaningful psychologically (but note that some radiologists might also have better setups than other radiologists, so in some applications we might care about the setup, too).

Since the lighting in your room plays a role as well, try to keep all conditions as constant as possible, e.g. by sitting at the same desk, closing the curtains and switching off all lights. Now is also a good opportunity to clean your screen. Also make sure that you're well rested and able to concentrate while you're doing the experiments: we want to measure your best possible performance, not your performance when you're tired or worn out. Psychological data are noisy enough as they are; we don't want to introduce additional, unnecessary variability.

With these comments out of the way: how do we measure sensitivity?
%% Cell type:markdown id: tags:
## The Yes-No Task
The first idea one could have is the following: we choose an intensity, show the *A* with this intensity to a subject 100 times and record how often the subject reports seeing the *A*. The trouble with this experiment is that the subject could just lie and always say "I see it". A simple fix is to introduce catch trials. On each trial we flip a coin and there either is (signal trial) or is not (noise-only or catch trial) an *A*. In this way we can assess the performance objectively.
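To see why the catch trials help, here is a tiny simulation (an illustration only, not part of the experiment code): a "subject" who always answers *yes* ends up with only about 50% correct when half of the trials are catch trials.

``` python
import numpy as np

rng = np.random.default_rng(1)
n_trials, p_signal = 10_000, 0.5

signal_present = rng.random(n_trials) < p_signal   # coin flip on every trial
response_yes = np.ones(n_trials, dtype=bool)       # a subject who always says "yes"

pc = np.mean(response_yes == signal_present)       # correct = response matches the trial type
print(f"Proportion correct for the always-yes strategy: {pc:.2f}")   # close to 0.5
```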
A block in the experiment will always consist of 50 trials. In each trial you will first see a fixation cross that you should look at. Then, very briefly, the stimulus will show up. With probability `p=0.5` there will be an *A*. Your task is then to press either `y` (for yes) or `n` (for no) to indicate whether you detected the *A*. Let's try this:
%% Cell type:code id: tags:
``` python
data = run_block(subject='test', intensity=100)  # run this cell with shift+enter
```
%% Cell type:markdown id: tags:
This was an easy block with a high intensity (`intensity=100`). Let's look at how well you did.
%% Cell type:code id: tags:
``` python
data = load_data('test')
summarize(data, group='block').T
```
%% Cell type:markdown id: tags:
The first row (block) shows the blocks you've done; the other rows show how many trials you did and how well you did in each block. The first thing to look at is the row $N$: it says you did 50 trials. The row $pc$ shows the proportion correct. Since this was an easy block, that number will hopefully be close to 1. If not, you need to practice this task a little more until you make only 1 or 2 mistakes in a block of 50 trials. These mistakes are not because you didn't see the *A* but because you accidentally pressed the wrong button. As the task is pretty fast-paced this can happen, and it's really hard to bring the number of mistakes down to zero even if you see all the *A*s. Since we want to measure how well you can see the *A*, it's important that you really learn the response mapping (which button is which) and make as few "finger errors" as possible. So do practice this task by re-running the `run_block` cell above (with `subject='test'` and a clearly visible `intensity=100`) before you move on.
%% Cell type:markdown id: tags:
## Psychometric Functions
The obvious experiment to do now is to vary the intensity (our independent variable) and see how that affects the proportion of correct responses (our dependent variable). Because we keep the stimuli constant within each block, i.e. we show either a catch trial or a stimulus with one fixed intensity, this method of measuring detection performance is called the *method of constant stimuli*. I usually start the experiment with a warm-up block that is pretty easy for subjects to get right. For example, set `intensity=60`:
%% Cell type:code id: tags:
``` python
subject = 'psychometric_xy'  # change xy to your initials
```
%% Cell type:code id: tags:
``` python
data = run_block(subject, intensity=60)
```
%% Cell type:code id: tags:
``` python
data = load_data(subject)
summarize(data).T
```
%% Cell type:markdown id: tags:
That was easy. Note that instead of arranging the table by block we have now arranged it by intensity. Let's make it harder by re-running the last two cells with `intensity=40` and see how hard that is. For me (on my setup) that was still pretty easy. So I made it harder and went down to `intensity=30`. Still very easy although I made 1 or 2 real mistakes. So I then tried `intensity=20`. That was already quite difficult for me with about 75% correct. I often had to guess. Still 75% seems above chance level. So I made it even harder with `intensity=10`. With 48% correct I was pretty much at chance level. So I made it a little easier again, `intensity=15`. When you do this yourself you want to choose the intensities such that some data points are close to chance level and some will give you perfect performance and you also want some data points in between. It's a little cumbersome to always look at the table so let's make a plot instead.
%% Cell type:code id: tags:
``` python
thresholds = psychometric_function(data)
```
%% Cell type:markdown id: tags:
We have also fit a function to your data. This function is called a *psychometric function*. A psychometric function has some property of the stimulus that we vary on the x-axis; here that's the stimulus intensity. On the y-axis there's always some proportion of the subject's responses, usually the proportion of correct responses. The points where the dotted lines intersect the x-axis are the 55%, 65%, 75%, 85% and 95% thresholds, i.e. the stimulus intensities that are necessary to achieve the respective percentage of correct responses. You should have at least 5 data points around these performance levels and a couple of data points with lower and higher performance, so that your data cover the full psychometric function. In total you should have collected at least 500 trials with good coverage of all performance levels to get a good estimate of the psychometric function.
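The fitting itself happens inside `psychometric_function`, and its exact model is not shown here. A common choice for a yes-no task with 50% catch trials is a cumulative Gaussian scaled to run from chance level (50%) to 100%. The sketch below fits such a curve with scipy to made-up data and reads off the thresholds; all numbers and parameter names are illustrative assumptions, not taken from `signal_detection.py`.

``` python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def pf(x, mu, sigma):
    """Psychometric function for a yes-no task with 50% catch trials:
    proportion correct rises from 0.5 (guessing) to 1.0 with intensity."""
    return 0.5 + 0.5 * norm.cdf(x, loc=mu, scale=sigma)

# made-up (intensity, proportion correct) data, for illustration only
intensities = np.array([10, 15, 20, 30, 40, 60])
pc = np.array([0.52, 0.60, 0.74, 0.90, 0.97, 1.00])

(mu, sigma), _ = curve_fit(pf, intensities, pc, p0=[20, 10],
                           bounds=([0, 1e-3], [100, 100]))

# invert the fitted curve to read off the thresholds
for level in [0.55, 0.65, 0.75, 0.85, 0.95]:
    threshold = mu + sigma * norm.ppf(2 * level - 1)
    print(f"{level:.0%} threshold: intensity {threshold:.1f}")
```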
%% Cell type:markdown id: tags:
## Hits and False Alarms
Back to our original question: how do we measure the sensitivity of a subject? We just did, by measuring the proportion of correct responses in a yes-no task. *pc* was our measure of sensitivity. However, measuring sensitivity by only looking at *pc* is theoretically a little unsatisfying. The data are actually a little more complicated. By only looking at *pc* we're ignoring the fact that there are two kinds of trials: signal trials and noise-only (or catch) trials. There are two ways to be right and two ways to be wrong, as the following table shows:

|                  | respond "yes" | respond "no"      |
|------------------|---------------|-------------------|
| signal trial     | hit           | miss              |
| noise-only trial | false alarm   | correct rejection |

If you say yes in a signal trial you score a hit. If you say yes in a noise-only trial you produce a false alarm. It turns out we only need to count the hits and false alarms, because every signal trial that wasn't a hit was a miss, and every noise-only trial that wasn't a false alarm was a correct rejection. The correct trials are the sum of the hit trials and the correct-rejection trials. Let's look at your data from before by splitting it up into signal trials and noise-only trials.
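If you want to count these four outcomes yourself from raw trial data, something like the following works in principle. The data frame and its column names (`signal`, `response`) are hypothetical; check what `load_data` actually returns.

``` python
import pandas as pd

# hypothetical raw trials: was a signal shown, and did you answer "yes"?
trials = pd.DataFrame({
    'signal':   [True, True, False, True, False, False],
    'response': [True, False, False, True, True, False],
})

# rows: trial type (noise-only / signal), columns: response (no / yes)
print(pd.crosstab(trials['signal'], trials['response']))

hits = (trials['signal'] & trials['response']).sum()
false_alarms = (~trials['signal'] & trials['response']).sum()
correct = (trials['signal'] == trials['response']).sum()   # hits + correct rejections
print(hits, false_alarms, correct)
```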
%% Cell type:code id: tags:
``` python
data = load_data(subject)
summarize(data, mode='roc').T
```
%% Cell type:markdown id: tags:
The column $N_1$ tells you the number of signal trials you did for each intensity. The column $h$ gives the hit rate, i.e. the proportion of signal trials in which you scored a hit. Similarly, $N_0$ is the number of noise-only trials and $f$ is the false alarm rate, i.e. the proportion of noise-only trials in which you incorrectly said you detected something.
The data are obviously a little more complicated than it first seemed when we only looked at the proportion of correct responses. In each block we measured two numbers, the hit rate and the false alarm rate, and we summarized the performance by only looking at the proportion of correct responses. But is the proportion of correct responses really a suitable measure of sensitivity? Or is there a better way to summarize the 2x2 response table into one measure of sensitivity?
## Manipulating Response Biases
A potential problem with using the proportion of correct responses to measure sensitivity is that it confounds the sensitivity measurement with the so-called *response bias*. Why might this be problematic? Imagine two subjects with the same sensitivity, both doing a yes-no task. The first subject tends to answer *no* rather than *yes* when in doubt; she has a response bias towards *no*. The second subject is an unbiased subject who has no such preference. In a trial where the first subject is in doubt, she does not know whether it is a noise-only or a signal trial. Hence, compared to the second, unbiased subject, she will have a higher number of correct rejections in noise-only trials, simply because she says *no* more often. But, for the same reason, she will also have a lower number of hits in signal trials. The proportion of correct responses will only be the same for both subjects if the increase in correct rejections for our biased subject exactly compensates for her loss in hits. Whether this is really the case is an empirical question. So let's do an experiment to find out (spoiler alert: it's not the case).
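To make the trade-off concrete with made-up numbers: suppose both subjects do 100 trials, 50 with a signal and 50 without. The unbiased subject gets 40 hits and 40 correct rejections, i.e. $40+40=80$ correct responses. The biased subject says *no* more often and gains, say, 5 extra correct rejections (45 in total), but her extra *no* responses also cost her some hits. Only if she loses exactly 5 hits (down to 35, so again $45+35=80$) does her proportion correct stay the same; if she loses 8 hits instead, she ends up with $45+32=77$ correct. Nothing guarantees that the gain and the loss cancel exactly.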
Instead of imagining two subjects with the same sensitivity, we will make the measurements on one subject: You. Assuming that your sensitivity is constant, we will try and induce a change in your response bias by an experimental manipulation. There are several ways to do this. For example, we could give you 10€ for each hit but only 1€ for each correct rejection. In this case you'll probably be biased to answer *yes* when in doubt. Or we could give you 1¢ for each correct answer and an unpleasant electric shock for each false alarm but not so for the misses. In this case you would probably quickly develop a bias towards *no*. The simplest manipulation, however, is to change the relative probabilities for signal and noise-only trials. If only 1% of all the trials are signal trials then in cases where you're in doubt it's much more likely that the stimulus was a noise-only stimulus, hence you should have a strong bias towards *no* responses. Conversely, if 99% of the trials are signal trials you should have a strong bias towards *yes* responses.
When we measured the psychometric function we held the probability for a signal trial constant at `p=0.5` and varied the stimulus intensity. Now, we'll keep the `intensity` fixed and vary the so-called *prior probability* `p`. Let's choose the stimulus intensity for the new experiment to be roughly where you got 75% correct:
%% Cell type:code id: tags:
``` python
print('Your 75%%-threshold was at %d.' % round(thresholds[2]))
```
%% Cell type:markdown id: tags:
Make sure you change the subject variable and the intensity in the next cell before you proceed.
%% Cell type:code id: tags:
``` python
# from here on we will start a new experiment and therefore re-set some variables
from signal_detection import *
subject = 'roc_xy'  # change xy to your initials
intensity = 20      # put in the 75% threshold by hand the first time you get here
                    # so that it's not recalculated if you rerun the script
n = 5               # number of blocks for roc measurements
```
%% Cell type:markdown id: tags:
Now you'll run 5 conditions with `p=[0.1,0.75,0.5,0.25,0.9]` and the same `intensity`. In each condition you will do at least 5 blocks of 50 trials. This experiment should take less than 1.5 hours. Make sure you have enough time and energy before you start, and take breaks in between as appropriate so that you can really do your best. If you want, you can of course also do more trials; just re-run each condition.

When you start the first condition with `p=0.1`, keep in mind that there is a signal in only 10% of the trials!
%% Cell type:code id: tags:

``` python
# `df` is the summary table of the roc conditions (one row per prior probability `p`)
df = df.droplevel('intensity')  # intensity is always the same anyway
df
```
%% Cell type:markdown id: tags:
First check the proportion correct column *pc*. This is your average performance over all the blocks you did. It's likely that even in the condition with $p=0.5$ your performance was better than 75%; at least that was the case for me. Perhaps that's because I have become better at the task through learning, or perhaps it's just random variation; it doesn't really matter. In any case, we'll assume that your sensitivity stayed relatively constant during this experiment -- since the stimuli stayed constant. We'll get back to the question of how to interpret the *pc* column in a second.
Before that, check that your false alarm rate $f$ and your hit rate $h$ go up as $p$ increases. As they should! You have adapted your response bias to the prior probability of there being a signal. A good plot for inspecting the raw data of this experiment is the so-called *receiver operating characteristic*, or *ROC* for short. It's a plot of the hit rate as a function of the false alarm rate.
%% Cell type:code id: tags:
``` python
plot_roc(df)
```
%% Cell type:markdown id: tags:
The small text labels tell you from which condition each data point originated. In the plot you also see approximate 95%-confidence intervals for each hit rate and false alarm rate (±2 standard errors of the mean). These intervals give you some idea of the range the true hit rate and false alarm rate could lie in, given the limited number of trials you did.
Remember that our motivation was to manipulate your response bias and see how this influences your hit rate and false alarm rate. Apparently you can trade off hits against false alarms: a higher hit rate always goes together with a higher false alarm rate. That much we suspected beforehand. But the ROC-plot shows you more: it shows you quantitatively how you traded off hits against false alarms. You might find the name ROC strange. It's a term from signal processing. Say you want to send a signal over a cable, and the cable adds noise to the signal. At the other end of the cable there is a receiver. The receiver (a human or a machine) has to do a yes-no task and decide whether the signal was there or not. Different receivers might have different operating characteristics (i.e. they can achieve different trade-offs between hits and false alarms). Hence the name.
When we measured psychometric functions we used the proportion correct as our measure of sensitivity. A good measure of sensitivity should not depend on other factors, like the prior probability of the signal or a subject's response bias. We did this experiment to check whether *pc* changes with our response bias manipulation. So is *pc* constant? Unfortunately, looking at the *pc* column in the data table above is not very helpful for answering this question. The values are hard to compare: when $p=0.1$, you would get 90% correct even if you did not see anything and just always responded *no*, whereas in the condition with $p=0.25$ the same strategy would only give you 75% correct. So we first need to correct *pc* for the different prior probabilities. Since we measured the conditional probabilities $f$ and $h$, we can compute the hypothetical *pc* that you would have gotten if half the trials had been signal trials and half noise-only trials:

$$
pc = \frac{h + (1-f)}{2}
$$
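In code this correction is a one-liner. The sketch below assumes the summary table `df` has columns named `h` and `f`, as in the ROC table above; the actual column names in your `summarize` output may differ.

``` python
# bias-corrected proportion correct from hit rate and false alarm rate
df['pc_corrected'] = (df['h'] + (1 - df['f'])) / 2
df[['h', 'f', 'pc_corrected']]
```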
This new proportion correct still varies across the different values of $p$, but it's hard to say whether that variation is due to *pc* changing with our experimental manipulation or just due to chance. If *pc* really were constant even when a subject's response bias changes, then any increase in hits would have to go along with exactly the same decrease in correct rejections (or, equivalently, the same increase in false alarms). Whether this is really the case will be easier to judge if we plot the ROC-curve that goes with a certain *pc*. Rearranging the last equation
$$
h = (2pc-1)+f
$$
gives the hit rate $h$ as a function of the false alarm rate $f$. The combinations of $h$ and $f$ that lie on this straight line have the same proportion of correct responses but with different trade-offs between hits and false alarms. We can now plot this equal-*pc* line on top of the data and try to judge whether it's a good fit.
%% Cell type:code id: tags:
``` python
plot_roc(df)
# plot the roc-curve for a constant pc of 0.75
k = 2*0.75 - 1
plot([0, 1-k], [k, 1], 'k:')
# plot the roc-curve for a constant pc with the mean of the data
```

%% Cell type:markdown id: tags:
First look at the black dotted line. This is the line where *pc* is constant at 75%. If your performance did not change from the first experiment, where we measured your psychometric function, this is the line we would expect if *pc* were indifferent to our experimental manipulation of the prior probability. Convince yourself that *pc* really is 75% for the points on this line. Since we expect that, due to learning or fatigue, your *pc* in the second experiment might be slightly different from the first, the blue line shows the roc-line for the mean *pc* over all experimental conditions. In my data there were two data points that were clearly very far away from this line, i.e. their 95%-confidence intervals did not overlap with it. If that's not the case for you, you will have to do some more trials in the experiment to get more precise measurements of $h$ and $f$ and see where they actually lie in the plot.
With enough trials you will, hopefully, also conclude that *pc* is not invariant under our experimental manipulation of the prior probabilities, and hence is not a good measure of a subject's perceptual sensitivity. In fact, my five data points don't look like they lie on a straight line at all. If only we could describe the exact shape of the ROC-curve theoretically... the homework for next week is to do exactly that using signal detection theory, or more precisely the *equal-variance Gaussian model* of signal detection theory (see `signal_detection_homework.ipynb`).