An Introduction to Statistical Learning

I highly recommend the book An Introduction to Statistical Learning (ISL).
I used the "with applications in R" version, but I imagine the "with applications in Python" version is just as useful. The PDF, the code used in the book, and videos that explain each chapter are all available online for free.

ISL quickly teaches you the fundamentals of statistical thinking, beginning with the philosophy of science and the philosophy of statistics. Basically: we model things, but there is always error and the models are not reality.

ISL teaches you the General Linear Model (GLM).
Undergraduates are often taught in a way that seems backwards to me. You may learn about t-tests, then ANOVAs, but these are often taught as if they were completely different tests. Undergraduates tend to be unaware that these tests are special cases of the GLM.

By reading ISL, you will understand the GLM first, then learn its special cases.
This way, you will understand the fundamentals of the most common tests and how they are all related to basic correlations.

When you only understand t-tests and ANOVAs, you don't understand the GLM.
When you understand the GLM, you understand t-tests and ANOVAs automatically.
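
To make that relationship concrete, here is a minimal Python sketch. It is my own illustration, not code from ISL. The scores are made up, the function names are hypothetical, and it uses only the standard library. It computes the classic equal-variance t-test, then fits a linear model with group membership coded as 0 or 1, then runs a one-way ANOVA on the same two groups.

```python
import math
from statistics import mean

# Made-up scores for two groups (illustrative data only).
group_a = [4.1, 5.0, 5.6, 6.2, 4.8, 5.3]
group_b = [6.0, 6.8, 7.1, 5.9, 7.4, 6.5]


def pooled_t(a, b):
    """Classic independent-samples t-test (equal variances assumed)."""
    na, nb = len(a), len(b)
    ss_a = sum((v - mean(a)) ** 2 for v in a)
    ss_b = sum((v - mean(b)) ** 2 for v in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(b) - mean(a)) / se


def regression_slope_t(x, y):
    """t-statistic for the slope in the linear model y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # least-squares slope
    b0 = y_bar - b1 * x_bar             # least-squares intercept
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(rss / (n - 2) / sxx)
    return b1 / se_b1


def one_way_anova_f(groups):
    """F-statistic for a one-way ANOVA across any number of groups."""
    scores = [v for g in groups for v in g]
    grand = mean(scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Dummy-code group membership: 0 = group A, 1 = group B.
x = [0] * len(group_a) + [1] * len(group_b)
y = group_a + group_b

t_from_ttest = pooled_t(group_a, group_b)
t_from_regression = regression_slope_t(x, y)
f_from_anova = one_way_anova_f([group_a, group_b])

print(t_from_ttest, t_from_regression, f_from_anova)
```

The two t-statistics agree up to floating-point rounding, and the ANOVA F-statistic equals the square of that t. With a 0/1 predictor, the regression slope is simply the difference between the two group means, which is exactly what the t-test is testing.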

Analogy to Number Theory

The logic of learning about the GLM first is like learning about Real numbers straight away rather than learning about Natural numbers, then learning about Real numbers later.

Real numbers are any numbers that can be written as a decimal, including decimals that never end,
e.g. {-1.1, 0, 1, √2, pi, ¼, 5.555..., etc.}
Natural numbers are just the positive integers,
e.g. {1, 2, 3, 4, 5, etc.}

When you understand Real numbers, you automatically understand Natural numbers because all Natural numbers are Real numbers (just like all t-tests and ANOVAs are GLMs).

The reverse is not true: when you only understand Natural numbers, you still don't understand all Real numbers because many Real numbers are not Natural numbers (just like learning t-tests first doesn't help you understand GLMs in general).

Info

ISL has sufficient information for you to become a well-educated undergraduate in psychology. If you want to learn the deeper math behind the statistics, you should check out The Elements of Statistical Learning (ESL). ESL is not necessary for psychology.

What to read in ISL

If you pick up ISL, don't get overwhelmed by the table of contents.
You don't need to learn everything in the book!

As a student of psychology, you would start at the beginning of Chapter 1, probably stop in the middle of Chapter 4, and then skip ahead to Chapter 13.

Chapter 1: Introduction. Yes, read this section. It is actually really good. This introduction section made me fall in love with statistics.
Chapter 2: Statistical Learning. Yes, read this section.
Chapter 3: Linear Regression. Yes, read this section. Linear regression will be your "bread and butter" fundamentals.
Chapter 4: Classification. Yes, start reading this section, but you can stop after 4.3 Logistic Regression. You probably won't need the rest of this section during undergrad and most people won't need anything further for grad school, either.
Chapter 13: Multiple Testing. Yes, read this section. This portion of the book is new to the Second Edition. It focuses on hypothesis testing, p-values, errors, and corrections for errors. If you read this section, you will have a better understanding of p-values and null-hypothesis significance testing (NHST) than your peers and many published researchers.
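
To show why corrections for errors matter, here is a minimal Python sketch. It is my own illustration, not code from ISL. It assumes independent tests and a significance level of 0.05, the p-values are made up, and the function names are hypothetical. It computes the chance of at least one false positive across many tests, then applies two standard corrections: Bonferroni and Holm.

```python
def familywise_error_rate(alpha, m):
    """Chance of at least one false positive across m independent tests
    when every null hypothesis is actually true."""
    return 1 - (1 - alpha) ** m


def bonferroni_reject(p_values, alpha=0.05):
    """Bonferroni: reject a null only when its p-value is <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm_reject(p_values, alpha=0.05):
    """Holm step-down: compare the sorted p-values against
    alpha / m, alpha / (m - 1), ... and stop at the first failure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # every larger p-value is also not rejected
    return reject


# Made-up p-values from five hypothetical tests.
p_values = [0.001, 0.012, 0.020, 0.040, 0.300]

fwer_1 = familywise_error_rate(0.05, 1)    # a single test
fwer_20 = familywise_error_rate(0.05, 20)  # twenty tests
bonferroni = bonferroni_reject(p_values)
holm = holm_reject(p_values)

print(fwer_1, fwer_20)
print(bonferroni)
print(holm)
```

With one test the false-positive chance is 5%, but with twenty independent tests it rises to roughly 64%. On these made-up p-values, four of the five fall below 0.05 uncorrected, yet Bonferroni keeps only the smallest and Holm keeps the two smallest.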

If you want to read into more complex statistical methods, you might read on a bit:

Chapter 5: Resampling Methods. Maybe, but you won't need this in undergrad. You probably wouldn't even use this in grad school.
Chapter 6: Linear Model Selection and Regularization. Maybe, but you won't need this in undergrad. You might be able to use the overview of Principal Components Analysis in 6.3.1 Principal Components Regression in grad school and beyond.
Chapter 11: Survival Analysis and Censored Data. Maybe, but you won't need this in undergrad. This chapter might come in handy in grad school, depending on the type of research you do.

Most of the rest of ISL (Decision Trees, Support Vector Machines, Deep Learning, Unsupervised Learning) is not what you need to learn in undergraduate psychology or even graduate psychology. These topics may come up under special circumstances, but you are unlikely to need them. You would only learn them if you had a specific project where they became relevant. On the other hand, if you are curious and learn them, doing so may spark an idea for research questions you could address using these less common techniques. Using such techniques could set you apart as a researcher. Definitely not required reading, though.

R or Python

Personally, I recommend using the R version.

You might start out uncomfortable with coding, but such is life.
You need to push through ignorance to learn. There are plenty of free introductory courses for learning R, and R is a skill that will transfer even if you pursue other careers.

If you already know Python, feel free to use the Python book.
The same goes if you are volunteering in a lab that uses Python. Python is also a transferable skill and it doesn't really matter which you use. I only recommend R because it tends to be easier for beginners who are new to programming.

Multilevel Modelling

ISL does not teach multilevel modelling (MLM).

MLM goes by a variety of names and is standard practice in most psychological research these days.
MLM is an extension of the GLM that takes into account sub-groups within the data that share variance within their sub-group. Conceptually, this is a bit like "within participants" versus "between participants", but for participant-groups. For example, participants are nested in classrooms, which are nested within schools, which are nested within districts; employees are nested within teams, which are nested within workplaces, which are nested within countries. In these cases, researchers would use MLM to account for the shared variance of belonging to the same sub-group.
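
To give a feel for what "shared variance" means, here is a minimal Python sketch. It does not fit a multilevel model. It only computes the intraclass correlation (ICC), which is one common way to quantify how much of the total variance comes from belonging to the same sub-group. The classroom scores are made up, the function name is hypothetical, and the formula used is the one-way ANOVA estimator, which assumes equally sized groups.

```python
from statistics import mean

# Made-up test scores for students nested in three classrooms.
classrooms = {
    "class_1": [52, 55, 58, 54],
    "class_2": [68, 71, 66, 70],
    "class_3": [80, 77, 83, 79],
}


def intraclass_correlation(groups):
    """ICC from a one-way ANOVA, assuming every group has the same size."""
    k = len(groups[0])                  # students per classroom
    n_groups = len(groups)
    grand = mean([v for g in groups for v in g])
    # Variability of classroom means around the grand mean.
    ms_between = k * sum((mean(g) - grand) ** 2 for g in groups) / (n_groups - 1)
    # Variability of students around their own classroom mean.
    ms_within = sum((v - mean(g)) ** 2 for g in groups for v in g) / (
        n_groups * (k - 1)
    )
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


icc = intraclass_correlation(list(classrooms.values()))
print(round(icc, 3))
```

An ICC close to 1 means students in the same classroom score much more alike than students from different classrooms, so treating all twelve scores as independent observations would be misleading. That dependence is what MLM is designed to account for.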

I learned MLM during my PhD statistics course.
Unfortunately, I do not have a good resource for learning MLM to share with you. MLM is not commonly taught to undergraduate psychology students, even though MLM is the most commonly used statistical baseline in real psychology research. Perhaps someday MLM will be taught to undergrads, but not today.
