Regressions

In our discussion of charts and graphs, we observed that states with lower minimum wage rates tend to have lower employment rates.

Here, we use regression analysis to estimate the effect of state minimum wage rates on state employment rates while controlling for inflation, the price level, average annual pay, and state and year fixed effects.

Those estimates (summarized in the table below) suggest that we can reject the null hypothesis of zero correlation between state employment rates and state minimum wage rates.

In addition to the positive correlation between state employment rates and state minimum wage rates, there is also a positive correlation between state minimum wage rates and average annual pay. (For details, see my working paper.) The positive correlations among state employment rates, minimum wage rates and average annual pay are reflected in the first regression in the table below.


Log Odds of Employment – 50 USA States
method: two-step weighted least squares

                            (2001-2016)     (2001-2016)

Policy:
  ln( State Min. Wage )      0.076 **        0.063 *
                            (0.025)         (0.027)
Economy:
  Inflation Rate             0.070           0.003
                            (0.045)         (0.048)
  ln( CPI index )           –1.071 ·         0.620
                            (0.614)         (0.642)
  ln( Avg. Annual Pay )      0.788 ***       – –
                            (0.071)

observations                 800             800
fixed effects            state & year    state & year
R^2                          0.956           0.948

Because it is possible that state employment rates and average annual pay are simultaneously determined, I also ran a regression which excludes average annual pay from the model. In the second regression, the coefficient on the state minimum wage is smaller (0.063 versus 0.076), but the positive correlation between state minimum wage rates and state employment rates remains.
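To get a feel for the size of these coefficients, the arithmetic below reads the first-column estimate (0.076) as a change in the log odds of employment. It is only an illustrative sketch: the 10 percent increase in the minimum wage and the baseline employment rate of p = 0.60 are assumptions chosen for the example, not figures from the analysis.

```latex
\begin{align*}
\Delta \ln\frac{p}{1-p} &\approx 0.076 \times \ln(1.10)
   \approx 0.076 \times 0.0953 \approx 0.0072 \\
\Delta p &\approx p\,(1-p) \times 0.0072
   = 0.60 \times 0.40 \times 0.0072 \approx 0.0017
\end{align*}
```

Under those assumptions, a 10 percent higher minimum wage would be associated with an employment rate that is roughly 0.17 percentage points higher.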

In a future analysis, I will use instrumental variables estimation to further explore the relationship between employment rates and average annual pay.

But for now, let's focus on the proportions data in this model.

Whenever our dependent variable is a proportion or percentage, we must model the effect of the explanatory variables on the underlying "true" probability that the proportion reflects. Here, the state employment rate reflects the employment probabilities of each working-age adult in the state.

We wish to estimate those probabilities, but because regression assumes linearity, we must convert the state employment rates to log odds and estimate the effect of the explanatory variables on the log odds of employment.
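As a minimal sketch of that conversion, the lines below transform a few made-up employment rates to log odds and back. The values are hypothetical, and I am assuming that a variable like `lno_emp` in the model below is constructed in this way.

```r
## hypothetical employment rates (proportions), for illustration only
emp_rate <- c( 0.55 , 0.60 , 0.65 )

## convert the proportions to log odds:  ln( p / (1-p) )
lno_emp <- log( emp_rate / (1 - emp_rate) )

## base R's qlogis() computes the same transformation
same_as_qlogis <- all.equal( lno_emp , qlogis(emp_rate) )

## plogis() inverts it, recovering the original proportions
emp_back <- plogis( lno_emp )
```

Because the log odds range over the whole real line, a linear model of the log odds can never predict a probability below zero or above one.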

And to account for the heteroskedasticity present in such a model, we first write a function that implements a two-step weighted least squares estimation strategy:

## function to run two-step weighted least squares
wls <- function( formula , data ) {

    ## run the first-step regression
    StepOne <- lm( formula = formula , data = data )

    ## compute the weights for step two
    data$StepOneProbs <- exp(StepOne$fitted.values) / (1+exp( StepOne$fitted.values ))
    data$StepTwoWghts <- data$CivPop * data$StepOneProbs * (1-data$StepOneProbs)

    ## run the second-step regression
    StepTwo <- lm( formula = formula , data = data , weights = StepTwoWghts )

    ## return the results
    list( Results = StepTwo , formula = formula , data = data )
}
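The form of the step-two weights can be motivated by a delta-method approximation. The sketch below assumes that each state's employment rate is a binomial proportion with N trials and that `CivPop` (presumably the civilian population) plays the role of N. These are my reading of the function, not something stated in the text.

```latex
\begin{align*}
\operatorname{Var}(\hat{p}) &= \frac{p\,(1-p)}{N}, \qquad
g(p) = \ln\frac{p}{1-p}, \qquad
g'(p) = \frac{1}{p\,(1-p)} \\
\operatorname{Var}\bigl( g(\hat{p}) \bigr)
  &\approx g'(p)^2 \, \operatorname{Var}(\hat{p})
   = \frac{1}{N\,p\,(1-p)}
\end{align*}
```

Weighted least squares weights each observation by the inverse of its variance, which gives N p (1-p). That matches the product of `CivPop`, `StepOneProbs` and `(1-StepOneProbs)` in the function, with the step-one fitted probabilities standing in for the unknown p.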

Then we specify a regression model:

## make regression model formula
rmodel_apay <- paste(
    "lno_emp ~ ln_state_minw + cpi_inflation + ln_cpi_index + ln_avg_annl_pay ",
    StateDums , YearDums_later , sep = " + ")

And run the regression:

## run the regression
yrs_later <- which(dta$year > 2000)
tswls_apay <- wls( formula = rmodel_apay , data = dta[yrs_later,] )
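To see the whole procedure run from start to finish without the original dataset, here is a small sketch on simulated data. The data frame `toy`, the variable `x` and the coefficient values are all made up for illustration. Only the `wls` function (repeated so the sketch runs on its own) and the `CivPop` and `lno_emp` column names come from the text.

```r
## function from the text, repeated so that this sketch runs on its own
wls <- function( formula , data ) {
    StepOne <- lm( formula = formula , data = data )
    data$StepOneProbs <- exp(StepOne$fitted.values) / (1+exp( StepOne$fitted.values ))
    data$StepTwoWghts <- data$CivPop * data$StepOneProbs * (1-data$StepOneProbs)
    StepTwo <- lm( formula = formula , data = data , weights = StepTwoWghts )
    list( Results = StepTwo , formula = formula , data = data )
}

## simulate hypothetical "states" with different population sizes
set.seed(1)
n <- 200
toy <- data.frame( CivPop = round( runif( n , 1e5 , 1e6 ) ) , x = rnorm( n ) )

## the true log odds are linear in x (intercept 0.4 , slope 0.1)
true_p <- plogis( 0.4 + 0.1 * toy$x )

## observed employment rates add binomial sampling noise
toy$emp_rate <- rbinom( n , size = toy$CivPop , prob = true_p ) / toy$CivPop
toy$lno_emp <- qlogis( toy$emp_rate )

## run the two-step estimator and look at the coefficient table
fit <- wls( formula = "lno_emp ~ x" , data = toy )
coef_table <- summary( fit$Results )$coefficients
print( coef_table )
```

The regression object is stored in the `Results` element of the returned list, so for the model in the text the same coefficient table would come from `summary( tswls_apay$Results )`.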

More detailed examples can be found in the R script that I wrote for this analysis. And in our discussion of the Perl language, we will identify and edit patterns in text to assemble the dataset used for this analysis.

If we can assume that the trends in our data are linear and that the error terms of our regression model are not correlated with the explanatory variables, then our least squares estimates of the regression coefficients will be unbiased. Under such conditions, regression analysis provides us with unbiased estimates of the marginal effects of the explanatory variables.
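The unbiasedness claim can be sketched in matrix notation. The symbols below (y, X, β, ε) are standard textbook notation rather than names from this analysis, and the derivation uses the slightly stronger textbook assumption that the errors have mean zero conditional on the explanatory variables.

```latex
\begin{align*}
y &= X\beta + \varepsilon \\
\hat{\beta} &= (X'X)^{-1} X'y = \beta + (X'X)^{-1} X'\varepsilon \\
\operatorname{E}\bigl[ \hat{\beta} \mid X \bigr]
  &= \beta + (X'X)^{-1} X' \operatorname{E}\bigl[ \varepsilon \mid X \bigr] = \beta
\end{align*}
```

The first line is the linearity assumption. The last step is where the assumption about the errors does its work.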

And if we can also assume that the error terms exhibit constant variance and are uncorrelated with each other, then we can also use regression analysis to test hypotheses about the relationships between one or more explanatory variables and the dependent variable.
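In standard textbook notation (not names taken from this analysis), constant variance and uncorrelated errors give the familiar variance formula on which the usual test statistics rest:

```latex
\begin{align*}
\operatorname{Var}( \varepsilon \mid X ) &= \sigma^2 I \\
\operatorname{Var}\bigl( \hat{\beta} \mid X \bigr)
  &= (X'X)^{-1} X' \, \sigma^2 I \, X (X'X)^{-1} = \sigma^2 (X'X)^{-1} \\
t_j &= \frac{\hat{\beta}_j}{\operatorname{se}(\hat{\beta}_j)}, \qquad
\operatorname{se}(\hat{\beta}_j) = \sqrt{ \hat{\sigma}^2 \bigl[ (X'X)^{-1} \bigr]_{jj} }
\end{align*}
```

As a rough illustration of the ratio, the minimum wage coefficient in the first regression (0.076 with a standard error of 0.025) gives a ratio of about 3.0.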

But as datasets grow larger, the need to make these assumptions (the Gauss-Markov assumptions) grows smaller. In large datasets (like the Vision Zero data, which we will explore next), we can use cross-tabulations to empirically observe trends and distributions. We do not have to make any assumptions about the shape of the relationship or the distribution of the errors.
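As a small sketch of what a cross-tabulation looks like in R, the lines below count and tabulate a handful of made-up records. The data frame and its `group` and `outcome` columns are hypothetical and are not drawn from the Vision Zero data.

```r
## hypothetical records, for illustration only
toy <- data.frame(
    group   = c( "A" , "A" , "B" , "B" , "B" , "C" ),
    outcome = c( "yes" , "no" , "no" , "yes" , "yes" , "no" )
)

## cross-tabulate the counts of each outcome within each group
xt <- table( toy$group , toy$outcome )
print( xt )

## convert the counts to row proportions (each row sums to one)
row_props <- prop.table( xt , margin = 1 )
print( row_props )
```

With enough observations in each cell, the row proportions describe the relationship directly, without imposing a functional form.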

Copyright © 2002-2024 Eryk Wdowiak