The meta-analytic results for recent years are [...] counter-intuitive
November 11, 2023 11:23 AM

A meta-analysis of stability and change in gender discrimination over time. How widespread is gender discrimination in hiring and selection, and have at least some human societies experienced meaningful change towards greater equality of opportunity? These intertwined questions represent two of the most theoretically rich, practically important, and politically controversial scientific issues of our time.
posted by Sebmojo (19 comments total) 12 users marked this as a favorite
 
I ain't a statistician....but a single link to a technical meta-analysis in human behavior - probably one of the hardest fields to do good quant studies in - w/o any commentary? Really?

If you read the twitters of the study authors, they seem legit (albeit legit behavioral economists).....but I want some peer commentary for the laypeople, from people without axes to grind. Which is all you can find if you look for, for example, "Gender Audits Forecasting Collaboration".......
posted by lalochezia at 11:56 AM on November 11, 2023


A meta-analysis of studies taken from a field going through a replication crisis is probably meaningless. But then economists also thought if you build a big enough mountain of bad loans you can rate it triple A.
posted by interogative mood at 1:12 PM on November 11, 2023 [3 favorites]


This is a fascinating meta-analysis, and I would not have found out about it without this post.

A few notes for those who are interested. First, it only concerns callback rates for job applications, not hiring, nor promotion, nor termination rates. Second, the article considers studies from across the globe, so it's impossible (as far as I can tell) to draw conclusions about any particular country. Third, counteracting biases may mask one another, it appears. As the authors say, "there exists wide variability in current hiring practices such that discrimination against women is present in some contexts and organizations, and discrimination against men in others."

The methods sections are really interesting -- at least for such sections. Most notably, the researchers hired a separate "red team" of five experts to critique their methods prior to conducting the study. So, neither the authors nor the hired devil's advocates knew the results before conducting the study. They also tried to assess whether there is publication bias (i.e. the possibility that certain studies were not published because the results were "uninteresting" to the editors of scholarly journals). The data, the authors say, is available for replication of their results.

The abstract summarizes the top-level results of the study. I found it interesting that "forecasters," both lay and scientific, underappreciated how much progress has been made in eliminating gender bias in the application process of both mixed-gender and male-stereotypical jobs. Again, this study says nothing about who is hired, promoted, or fired, but (in my opinion) it's OK to celebrate some progress.
posted by ferdydurke at 2:28 PM on November 11, 2023 [19 favorites]


What I found odd about this article was what seemed like a mismatch between the rigour across different aspects of the study. I mean they're all "pre-registered! Red-team! Meta-analysis!" and then they use the word "job" to mean both "job" and "occupation," which is a pretty significant distinction and it's bizarre that researchers working in this field wouldn't distinguish. There are lots of gender-balanced occupations that are nonetheless not well balanced at the job level. And then they threw all the countries into the same models. I mean even with country-level controls, unless you're interacting country by each variable (which is then mathematically the same as having separate models), this seems iffy. It's weird their red team wouldn't mention it.

Also, I would love to have seen some sort of measure of job-level (entry-level, experienced, senior, etc.). I know that one finding you see with female-dominated (note that's different than female-stereotyped, but they can go together) jobs is that men in those jobs are promoted more quickly. The hypothesis when this was first discovered was that people perceive men as the people who should be in charge in those workplaces, not the ones who should be doing the grunt work.

Including author gender in some of the models feels icky AF to me.

Anyway, this was interesting.
posted by If only I had a penguin... at 2:31 PM on November 11, 2023 [4 favorites]


Also, I'm unaware of a replication crisis in OB. It seems like a field that wouldn't necessarily lend itself to a replication crisis because the kind of findings made aren't necessarily expected to replicate?
posted by If only I had a penguin... at 2:49 PM on November 11, 2023 [1 favorite]


And then they threw all the countries into the same models

There’s no harm here. They use a multilevel model.
posted by MisantropicPainforest at 5:07 PM on November 11, 2023 [2 favorites]


Again, this study says nothing about who is hired, promoted, or fired, but (in my opinion) it's OK to celebrate some progress.

You can’t ethically do an audit experiment to answer those questions.
posted by MisantropicPainforest at 5:08 PM on November 11, 2023 [1 favorite]


There’s no harm here. They use a multilevel model.

Right but not all multilevel models are the same. Unless I'm missing it in a note or citation (and I'm on my phone so maybe I am) they don't state their full model. Given that they say they use the multilevel model to account for the nested nature of the data, that sounds like a multilevel model that is more about partitioning the error than about allowing all the parameters in the model to vary by country (the interact-every-variable-by-country approach).

Ok...I just went back to make extra sure they don't spell out their full model before I embarrass myself. They don't spell out the model. But they do say it's a random effects model. All that does is give each country a different intercept; it doesn't allow the coefficient parameters to vary by country ...

Ok, quick Google to again make sure I don't embarrass myself, and it seems the term random effects is used differently in different fields and different packages and sometimes does mean random slopes (I've published random effects models in OB-adjacent journals and it was only the intercepts that clustered). But I note that it seems beyond implausible that the changes over time haven't varied across countries, which a random slopes model would have shown (though not described), and it would be very odd if their models showed different trends by country and they didn't mention that.
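To make the intercept-vs-slope distinction concrete, here's a toy sketch. Everything in it is made up: two hypothetical countries, invented log odds ratios, and plain least squares with per-country intercepts standing in for a real random-intercept model. It is not the paper's model or data. It only shows that when each country gets its own intercept but all share one slope, opposite country trends can average out to a flat line.

```python
# Toy illustration: one shared slope with per-country intercepts
# versus a separate slope per country. All numbers are invented.

def ols_slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def within_slope(groups):
    """Common slope after giving each group its own intercept.

    Centering x and y inside each group removes the group intercepts,
    so the pooled fit estimates a single slope shared by all groups.
    """
    xs_centered, ys_centered = [], []
    for xs, ys in groups:
        mx = sum(xs) / len(xs)
        my = sum(ys) / len(ys)
        xs_centered += [x - mx for x in xs]
        ys_centered += [y - my for y in ys]
    return ols_slope(xs_centered, ys_centered)

years = [0, 1, 2, 3, 4]

# Hypothetical log odds ratios of callbacks (men vs. women) by year.
country_a = [0.40 - 0.10 * t for t in years]   # trend going down
country_b = [-0.20 + 0.10 * t for t in years]  # trend going up

shared = within_slope([(years, country_a), (years, country_b)])
slope_a = ols_slope(years, country_a)
slope_b = ols_slope(years, country_b)

print("shared slope:", round(shared, 3))
print("country A slope:", round(slope_a, 3))
print("country B slope:", round(slope_b, 3))
```

The shared slope comes out at essentially zero even though each country has a clear trend of its own, which is the thing a slopes-vary-by-country specification would surface.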

I think this is likely good-ish news (I mean no evidence of discrimination in callbacks would be better) but it's a very strange study nonetheless.
posted by If only I had a penguin... at 5:38 PM on November 11, 2023 [3 favorites]


But they do say it's a random effects model. All that does is give each country a different intercept; it doesn't allow the coefficient parameters to vary by country ...

You can also often set mixed models to model interactions between random and fixed effects, or Bayesian style "x given y" type effects. Obviously you want to use a good approach for model selection, but I don't think it's beyond the pale to use this approach.

I am way too tired to read the article in detail today though, altho I would really like to...
posted by sciatrix at 6:25 PM on November 11, 2023 [1 favorite]


Behavioral Economics has been hard hit by the “reproducibility crisis” because a lot of their models were based on economic research and/or psychology research looking at behavior and decision making that later proved to be made up or not reproducible. Here is a recent example
posted by interogative mood at 6:41 PM on November 11, 2023 [1 favorite]


This is not behavioural economics, this is organizational behaviour which is an entirely different and not particularly related field.

And yes, multilevel models can have interactions so you have different slopes. But like I said, they didn't spell out their model so who knows if they did that. But if they had done that it seems certain that they would have found different slopes in different countries and if they found different slopes in different countries, it would be very odd not to report and discuss that.

But none of that is what I came back to say. What I came back to say is this: Let's say they did all their multilevel setup perfectly for mixing all the countries together. Let's go even further and say they did the methodologically near-impossible and controlled for gender composition of jobs instead of gender-associations of occupations. The ultimate question here is "How has the overall-rate of gender-based discrimination in call-backs in the whole world changed?" [or rather that subset of the world that does audit studies]. I don't think this is a particularly useful question to answer. We could get the world's overall odds ratio for men/women getting callbacks to 1 and still have messed up stuff going on in every single country. "The world's overall rate" is just not a very elucidating dependent variable.
posted by If only I had a penguin... at 6:55 PM on November 11, 2023 [1 favorite]


We could get the world's overall odds ratio for men/women getting callbacks to 1 and still have messed up stuff going on in every single country.

Yeah that’s not fair. The goal of this paper and research isn’t to get rid of or even measure all the messed up stuff going on in a country.
posted by MisantropicPainforest at 7:01 PM on November 11, 2023 [2 favorites]


Yeah that’s not fair. The goal of this paper and research isn’t to get rid of or even measure all the messed up stuff going on in a country.

I understand that. My point is that the thing that IS their point is not very useful, because it's not useful or interesting information to know the world-wide odds ratio of men:women getting callbacks and how it's changing (which I think is the goal of the paper). I mean they can't even speak to the two questions in the text of the post:

"How widespread is gender discrimination in hiring and selection, and have at least some human societies experienced meaningful change towards greater equality of opportunity?"
posted by If only I had a penguin... at 7:15 PM on November 11, 2023 [2 favorites]


I guess I think it’s important whether there’s gender parity in who gets through the first stage of a hiring process? I fail to see how it’s a useless thing to know.
posted by MisantropicPainforest at 7:19 PM on November 11, 2023


I guess I think it’s important whether there’s gender parity in who gets through the first stage of a hiring process? I fail to see how it’s a useless thing to know.

But we don't know that when we lump everything together. If half the countries (or occupations, or workplaces) give a 25% boost to one sex and the other half give a 25% boost to the other then we would have overall parity. Or overall parity might mean that gender never predicts the odds of getting through. But if you just calculate the overall level you don't know which of those is going on. So it's a number that can't tell you what's going on. It can't tell you "how widespread is gender discrimination" (because an odds ratio of 1 would be consistent with both no sex discrimination AND widespread discrimination) and it can't tell you "have at least some societies experienced meaningful change" (ok, some findings on this might give you a "yes probably" answer to this, but you can't get a "no" and you can't get anything beyond "yes probably" (like which societies, what has the change been, etc.)).

The point is that this overall number is consistent with so many different things and can't distinguish between those things. That's why it's not useful.
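Here's the arithmetic behind that, as a quick sketch with invented counts (1,000 applications per sex in each of two hypothetical contexts, with a 25% callback boost going opposite ways). None of these numbers come from the paper.

```python
# Toy arithmetic: opposite 25% boosts in two contexts pool to an odds ratio of 1.
# All counts are invented for illustration.

def odds_ratio(callbacks_m, apps_m, callbacks_w, apps_w):
    """Odds ratio of a callback for men relative to women."""
    odds_m = callbacks_m / (apps_m - callbacks_m)
    odds_w = callbacks_w / (apps_w - callbacks_w)
    return odds_m / odds_w

# (callbacks_men, applications_men, callbacks_women, applications_women)
context_1 = (125, 1000, 100, 1000)  # men get 25% more callbacks
context_2 = (100, 1000, 125, 1000)  # women get 25% more callbacks

or_1 = odds_ratio(*context_1)
or_2 = odds_ratio(*context_2)

# Pool by simply adding up the counts across both contexts.
pooled_counts = tuple(a + b for a, b in zip(context_1, context_2))
or_pooled = odds_ratio(*pooled_counts)

print("context 1 OR:", round(or_1, 3))
print("context 2 OR:", round(or_2, 3))
print("pooled OR:", round(or_pooled, 3))
```

Both contexts discriminate (odds ratios of roughly 1.29 and 0.78), yet the pooled figure is exactly 1, the same value you'd get if nobody discriminated at all.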
posted by If only I had a penguin... at 7:56 PM on November 11, 2023 [1 favorite]


They don’t lump everything together. In the SI they have the odds ratio over time and plot the individual treatment effects. The argument that an ATE is irrelevant because there may be treatment effect heterogeneity is an odd one and I’ve never encountered it during a decade of doing applied causal inference research. There are going to be some ITEs that are higher than the ATE and some that are lower, but that’s by definition going to happen. Their models are all spelled out in the supplementary appendix, R code and all.
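For anyone following along, a rough sketch of how an average effect and its heterogeneity can be reported side by side: an inverse-variance weighted mean of study-level log odds ratios, plus Cochran's Q and I² as standard heterogeneity summaries. The four effects and variances below are invented, and this is a generic textbook calculation, not necessarily what the authors' R code does.

```python
# Generic meta-analysis summary: pooled effect plus heterogeneity statistics.
# Study effects and variances are invented for illustration.

log_odds_ratios = [0.25, -0.25, 0.10, -0.10]  # hypothetical study-level effects
variances = [0.01, 0.01, 0.01, 0.01]          # hypothetical sampling variances

# Inverse-variance weights: more precise studies count for more.
weights = [1.0 / v for v in variances]

# Weighted average effect across studies.
pooled = sum(w * y for w, y in zip(weights, log_odds_ratios)) / sum(weights)

# Cochran's Q: weighted squared deviations of each study from the pooled effect.
q_stat = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_odds_ratios))
df = len(log_odds_ratios) - 1

# I^2: share of total variation attributed to between-study heterogeneity.
i_squared = max(0.0, (q_stat - df) / q_stat) if q_stat > 0 else 0.0

print("pooled log OR:", round(pooled, 3))
print("Cochran's Q:", round(q_stat, 2), "on", df, "df")
print("I^2:", round(i_squared, 2))
```

With these made-up inputs the pooled effect is about zero while Q is far above its degrees of freedom, so an average near parity and substantial spread around it can both be true at once.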
posted by MisantropicPainforest at 8:13 PM on November 11, 2023 [4 favorites]


Appreciating the discussion here, as I am not a statistician, and had an ambivalent response to the methods of TFA that I do have experience with.
posted by rrrrrrrrrt at 8:31 PM on November 11, 2023 [1 favorite]


Thanks, didn't look at the appendix. The point isn't that there's heterogeneity, the point is that it's systematic. It's not "some workplaces discriminate and some don't." Like ok, imagine you're testing a drug. You give the drug to a bunch of sick people and then look at the overall 5 year survival rate. You're really fine with just looking at the overall and not breaking it down by what disease they had? If the people who take the drug have the same 5 year survival as the control, you don't want to know that people with diabetes had 30% lower odds of dying and people with high blood pressure dropped dead as soon as they opened the bottle? This is the same thing: The "treatment" by its nature is going to have different effects in different contexts and one of the relevant context variables (disease, country) is pretty easily measured.
posted by If only I had a penguin... at 8:42 PM on November 11, 2023 [1 favorite]


Seems like that is a different research question, for a different paper. I don’t think it’s fair to critique papers for asking Y when you want them to ask X.
posted by MisantropicPainforest at 5:20 AM on November 12, 2023 [4 favorites]

