Regression with very small sample size

I want to run a regression with 4 to 5 explanatory variables, but I have only 15 observations. Since I cannot assume these variables are normally distributed, is there a non-parametric or other valid regression method I could use?

asked Sep 20, 2014 at 8:12

$\begingroup$ There's no assumption that any of the explanatory variables are normal. There's no assumption about the marginal distribution of the response either. If you're doing CIs or hypothesis tests, the usual inference assumes conditional normality of the response. More important are the assumptions of linearity and constant variance. What does your response consist of (/why won't it be normal)? $\endgroup$

Commented Sep 20, 2014 at 8:20

$\begingroup$ No. You don't have enough data. This is exploratory analysis. You may well see suggestive relationships. But you should avoid p-values, confidence intervals and hypothesis testing. $\endgroup$

Commented Sep 20, 2014 at 15:52

1 Answer

$\begingroup$

@Glen_b is right about the nature of the normality assumption in regression [1].

I think your bigger problem is going to be that you don't have enough data to support 4 to 5 explanatory variables. The standard rule of thumb [2] is that you should have at least 10 observations per explanatory variable, i.e. 40 or 50 observations in your case (and this is for ideal situations where there isn't any question about the assumptions). Because your model would not be completely saturated [3] (you have more observations than parameters to fit), you can get parameter (slope, etc.) estimates, and under ideal circumstances the estimates are asymptotically unbiased. However, it is quite likely that your estimates will be a long way off from the true values and your SEs / CIs will be very large, so you will have very little statistical power. Note that using a nonparametric, or other alternative, regression analysis will not get you out of this problem.
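To make the point about noisy estimates concrete, here is a rough simulation sketch. Everything in it is an assumption chosen for illustration, not something from your data: independent standard normal predictors, true slopes all equal to 1, unit error variance, and the helper name `slope_spread`. It fits ordinary least squares to many simulated data sets and compares how much the first slope estimate varies with 15 versus 50 observations.

```python
import numpy as np

def slope_spread(n, p=5, n_sims=2000, seed=0):
    """Mean and SD of the first OLS slope estimate across simulated data sets.

    Illustrative assumptions: independent N(0, 1) predictors, true slopes
    all equal to 1, and N(0, 1) errors.
    """
    rng = np.random.default_rng(seed)
    beta = np.ones(p)                      # assumed true slopes
    estimates = np.empty(n_sims)
    for i in range(n_sims):
        X = rng.normal(size=(n, p))
        y = X @ beta + rng.normal(size=n)
        design = np.column_stack([np.ones(n), X])   # add an intercept column
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        estimates[i] = coef[1]             # slope on the first predictor
    return estimates.mean(), estimates.std(ddof=1)

mean_15, sd_15 = slope_spread(15)
mean_50, sd_50 = slope_spread(50)
print(f"n=15: mean estimate {mean_15:.3f}, SD {sd_15:.3f}")
print(f"n=50: mean estimate {mean_50:.3f}, SD {sd_50:.3f}")
```

In a setup like this the estimates average out close to the true slope at both sample sizes, but any single estimate from 15 observations is scattered much more widely around it, and that is with every assumption satisfied by construction.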

What you will need to do here is either pick a single explanatory variable (before looking at your data!) based on prior theories in your field or your hunches, or combine your explanatory variables. A reasonable strategy for the latter option is to run a principal components analysis (PCA) and use the first principal component as your explanatory variable.
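The PCA option could be sketched as follows. The data here are simulated purely for illustration (15 observations, 5 correlated predictors driven by one hypothetical latent factor), and the variable names are my own. The predictors are standardized, the first principal component is extracted with an SVD, and the response is regressed on that single score.

```python
import numpy as np

# Simulated stand-in data (assumption): 15 observations, 5 correlated predictors.
rng = np.random.default_rng(1)
n, p = 15, 5
latent = rng.normal(size=n)
X = latent[:, None] + 0.5 * rng.normal(size=(n, p))
y = 2.0 * latent + rng.normal(size=n)

# Standardize each predictor so that no variable dominates because of its scale.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# PCA via SVD: the rows of Vt are the principal directions.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
pc1 = Z @ Vt[0]                           # scores on the first principal component
explained = s[0] ** 2 / np.sum(s ** 2)    # share of total variance captured by PC1

# Simple regression of y on the single combined predictor.
design = np.column_stack([np.ones(n), pc1])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

print(f"PC1 explains {explained:.0%} of the predictor variance")
print(f"intercept {coef[0]:.3f}, slope on PC1 {coef[1]:.3f}")
```

Note that the sign of a principal component is arbitrary, so the sign of the slope only means something once you check how the original variables load on the component. Because the component is built without looking at the response, this fits the "before looking at your data" spirit of the first option.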