  • Checking assumptions and categorical predictors
    Inferential Statistics from Amsterdam 2021. 6. 21. 09:46

    In this post, I'm going to talk about regression assumptions and categorical predictors.

     

    1. Checking assumptions : when I dealt with linear regression, I said that there are four assumptions that have to be met: linearity, normality, homoscedasticity and independence. Multiple regression also requires these assumptions before you can trust the tests, so here I'm going to check what the assumptions are and how to verify them.

     1-1) Linearity : the relation between each predictor and the response variable is linear for any given combination of values of the other predictors. To check this, we look at the residuals plotted against each predictor. The residuals should be scattered around zero; if you see a curved pattern, the relation is not linear. (There's a code sketch of these residual checks after item 1-5 below.)

     1-2) Homoscedasticity : this is checked the same way as linearity, by eyeballing the residual plots. The assumption requires that, for each predictor, the variability of the residuals is the same over the entire range of values of that predictor. So the variation in prediction error should be the same for young cats and old cats. If the residuals fan out at some point, the assumption of homoscedasticity is violated and regression analysis shouldn't be performed at all.

     1-3) Independence of errors : the residuals can't be related to each other. Random sampling, or random assignment in experiments, usually ensures this assumption.

     1-4) Normality : this assumption requires the residuals to be normally distributed. To check this, we look at a histogram of the residuals.

     ** Note that deviation from normality is not a problem as long as the sample is large enough and we use two-sided tests when we conduct the individual t-tests. If your sample is small and the distribution is highly skewed, then you shouldn't perform multiple linear regression.

     ** As a rule of thumb, a technical requirement is that you need at least ten observations for each predictor in your model.

     1-5) The absence of influential regression outliers : regression outliers can substantially alter the values of the regression coefficients. As a rule of thumb, you should inspect standardized residuals more extreme than plus or minus 3, and only remove an extreme case if there's a clear reason why the data are invalid and should not be in the data set.
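
     ** To make these checks concrete, here is a minimal sketch in Python with statsmodels and matplotlib. The cat data set and its age and weight columns are made up for illustration (just echoing the young/old cats example above); this is my own sketch, not part of the original material.

```python
# A minimal sketch of the residual checks above, using a made-up cat
# data set: "age" and "weight" are hypothetical column names, echoing
# the young/old cats example, not real data.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(42)
cats = pd.DataFrame({"age": rng.uniform(0.5, 15, 200)})
cats["weight"] = 3 + 0.15 * cats["age"] + rng.normal(0, 0.4, 200)

# Fit weight on age and pull out the residuals.
X = sm.add_constant(cats[["age"]])
model = sm.OLS(cats["weight"], X).fit()

# 1-1 and 1-2: residuals against the predictor. A curved pattern means
# non-linearity; residuals fanning out means heteroscedasticity.
plt.scatter(cats["age"], model.resid)
plt.axhline(0, color="red")
plt.xlabel("age")
plt.ylabel("residual")
plt.show()

# 1-4: histogram of the residuals, which should look roughly normal.
plt.hist(model.resid, bins=20)
plt.xlabel("residual")
plt.show()

# 1-5: flag standardized residuals more extreme than plus/minus 3.
standardized = model.get_influence().resid_studentized_internal
print(cats[np.abs(standardized) > 3])
```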

     

    2. Categorical predictors : I've said a lot about linear regression so far, and I always assumed that all the variables are numerical. In this part, I'm going to talk about categorical predictors and the special binary case called an indicator.

     2-1) categorical predictors : in linear regression, predictors have to be quantitative. Categorical predictors are not allowed as such, because a regression coefficient indicates by what amount the response variable changes with a one-unit change in the predictor. There is one exception: if the categorical variable is binary, there is just one distance between the two scale points. Such a binary predictor is called an indicator. An indicator represents not the quantity of a measured property, but its quality.

     2-2) dummy variables : the indicators used to represent the categories are referred to as dummy variables (see the sketch at the end of this section).

     ** Remember, you always need one less dummy variable than there are categories to represent. If you include an extra dummy variable, you'll violate one of the technical requirements of regression: there should be no redundancy in the predictors.

     ** Note that multiple regression with a dummy variable forces the fitted lines for the categories to be parallel. In other words, it assumes the effects of the predictors are additive, with no interaction between them.
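
     ** Here is a minimal sketch of dummy coding in Python; the three-level color variable is a hypothetical example of mine, not from the original material. It shows the k - 1 rule and the parallel-lines idea in one place.

```python
# A minimal sketch of dummy coding, assuming a hypothetical three-level
# categorical predictor "color"; all names here are made up.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
cats = pd.DataFrame({
    "age": rng.uniform(0.5, 15, 90),
    "color": rng.choice(["black", "ginger", "white"], 90),
})
cats["weight"] = 3 + 0.15 * cats["age"] + rng.normal(0, 0.4, 90)

# Three categories need 3 - 1 = 2 dummy variables; drop_first=True
# drops one level ("black"), which becomes the reference category.
dummies = pd.get_dummies(cats["color"], drop_first=True, dtype=float)
X = sm.add_constant(pd.concat([cats[["age"]], dummies], axis=1))
model = sm.OLS(cats["weight"], X).fit()

# Each dummy coefficient shifts the intercept relative to the reference
# category, while the slope for age is shared -- so the fitted lines
# for the three colors are parallel.
print(model.params)
```

     Including all three dummies alongside the constant would make the predictors perfectly redundant, which is exactly the violation mentioned in the note above.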
