Google's Disconcerting Project Aristotle
NYT <a href="https://www.nytimes.com/2016/02/28/magazine/what-google-learned-from-its-quest-to-build-the-perfect-team.html?smid=pl-share">released an article</a> reporting Google's findings from an internal research project called Project Aristotle. The study found that technical skill doesn't matter much in team building; the key factor, they concluded, was psychological safety. This article discusses some reasons I find it hard to trust Google's results.
Four key issues are front and center:
- The actual data is inaccessible
- Replication fails
- There is evidence that Google developed a pattern of rejecting evidence-based research and systematically selects progressive results
- Google seems not to be using their own findings
1. The actual data is inaccessible
This is straightforward. On <a href="https://rework.withgoogle.com/guides/understanding-team-effectiveness/steps/introduction/">Google's team effectiveness reWork page</a>, they describe Project Aristotle and link the NYT article, but they do not provide access to the source data, which they consider proprietary. This makes replicating, and even understanding, Google's results exceedingly difficult. Eight important questions come to mind:
- What do they mean when they say psychological safety is the key factor?
- What share of what outcome(s) were explained?
- How robust is the finding?
- Compared to what other factors?
- What is the degree of confidence?
- What actual measures were used to operationalize the concept?
- What is the optimal financial amount to invest into psychological safety?
- Is this factor even subject to standard diminishing marginal return?
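The last two questions can be made concrete with a toy model. Suppose, purely for illustration, that the benefit of investing in psychological safety follows a logarithmic (diminishing-returns) curve; the benefit function, scale, and dollar figures below are my own assumptions, not anything reported by Project Aristotle. The optimal spend is then the point where the marginal benefit of another dollar falls below a dollar:

```python
# Toy model of diminishing marginal returns on psychological-safety spending.
# The log benefit curve and all dollar figures are illustrative assumptions.
import math

def team_benefit(spend: float, scale: float = 100_000.0) -> float:
    """Hypothetical dollar benefit from `spend` dollars of investment."""
    return scale * math.log1p(spend / 10_000.0)

def optimal_spend(step: float = 1_000.0, budget: float = 1_000_000.0) -> float:
    """Increase spend while each extra `step` dollars returns more than it costs."""
    spend = 0.0
    while spend < budget:
        marginal = team_benefit(spend + step) - team_benefit(spend)
        if marginal < step:  # marginal benefit no longer covers marginal cost
            break
        spend += step
    return spend

print(optimal_spend())  # → 90000.0 under these assumed parameters
```

Without open data, there is no way to know whether Google's factor behaves anything like this curve, which is exactly the problem.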
A result is trustworthy to the degree that it is replicable and well-defined, which entails that trustworthy results must be accompanied by open data. As a market leader in technology and data, Google knows this, and open data is entirely feasible for them. The fact that they omit such data makes the result suspect above and beyond the simple fact that replication fails anyway.
2. Replication Fails
Google isn't the only company that studies team effectiveness. I have a bit of data about programmers I work with. I have also contributed to hiring and firing decisions. I can confidently state the following:
- New hire technical skill causes management trust, and therefore causes the preconditions for psychological safety.
- Technical skill is the most important factor in team effectiveness for development projects, where effectiveness means:
  - Completion of work requirements on time
  - High customer satisfaction
  - Achieving a reasonable profit
- Effectiveness in this sense is driven by:
  - Understanding requirements
  - Client research activities
  - Contracting defined scope early on
  - Preventing scope creep
  - Iterative adaptation
  - Developer effectiveness, including:
    - Technical skill
    - Work ethic
    - Communication skill
While there is some role for soft factors including psychological safety, they are largely non-central. Simply contrast these two extremes:
- A friendly, communicative team with no technical skill
- A technically skilled, hard-working team that is subject to high penalties for unapproved activities (i.e., low psychological safety)
My team measures developer effectiveness in a number of ways including:
- Story points completed per time
- Inverse bugs generated per time
- Ratio of accepted to rejected pull requests
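A minimal sketch of how metrics like these could be combined into a single score. The field names, the equal weighting, and the 40-point normalization are illustrative assumptions of mine, not my team's actual formula:

```python
# Sketch: combine the three listed metrics into one effectiveness score.
# Field names, equal weights, and the 40-point normalization are assumptions.
from dataclasses import dataclass

@dataclass
class DevStats:
    story_points: float   # story points completed in the period
    bugs_generated: int   # bugs attributed to the developer in the period
    prs_accepted: int
    prs_rejected: int

def effectiveness(s: DevStats) -> float:
    """Average of: normalized velocity, inverse bugs, and PR acceptance ratio."""
    inverse_bugs = 1.0 / (1 + s.bugs_generated)   # fewer bugs -> closer to 1
    total_prs = s.prs_accepted + s.prs_rejected
    acceptance = s.prs_accepted / total_prs if total_prs else 0.0
    velocity = s.story_points / 40.0              # normalize to a nominal 40 pts
    return (velocity + inverse_bugs + acceptance) / 3.0

print(round(effectiveness(DevStats(40, 1, 9, 1)), 3))  # → 0.8
```

The point is not that this particular formula is right, but that objective measures can be defined, weighted, and debated in the open, which is what Google declined to do.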
Contrast Google's approach: "They looked at lines of code written, bugs fixed, customer satisfaction, and more. But Google’s leaders, who had initially pushed for objective effectiveness measures, realized that every suggested measure could be inherently flawed - more lines of code aren’t necessarily a good thing and more bugs fixed means more bugs were initially created. Instead, the team decided to use a combination of qualitative assessments and quantitative measures."
3. There is evidence that Google developed a pattern of rejecting evidence-based research and systematically selects progressive results
I love how the explanation at the end of the last section describes the selection of the definition of effectiveness: Google's leaders themselves called for the change from objective to subjective measures. This fits my theory perfectly, namely that company leadership is selecting preferred results, and it seems to indicate that the objective measures produced some other result, such that a change in study design was required. There is also no justification for the final mix of qualitative and quantitative outcomes selected to define effectiveness. The research team's justification is essentially, "Well, that's what leadership picked."
While this appears to be grossly non-rigorous, it's not the first example of such anti-scientific behavior. I note four additional points in this section, for a total of five, which is enough for me to claim a pattern.
- Academic research, such as Epstein and Robertson 2017, has identified a left bias in Google search results
- Tangentially, there are reports that Facebook is similarly biased, suggesting this may be a trend in Silicon Valley and among other big-name tech firms as well.
- Shapiro notes that rotating Google doodles throughout the year favor leftist causes.
- Project Oxygen concluded that soft skills matter more than STEM for top employees.
- Some of the incorrectness of this result is attributable to misreporting in the media. For example, Project Oxygen specifically looked at managers rather than all employees or developers in particular. No one expects STEM to be more important than a creative writing degree for a professional writer, but most people think STEM matters for developers.
- Even so, Google once again failed to permit data access or provide clear meaning about certain factors.
- They also employed leadership-directed, leading research questions (leading in the sense of determining the answer beforehand). As an example, Michelle Donovan, one of the original researchers, said the guiding question shifted from “Do managers matter?” to “What if every Googler had an awesome manager?”
- Their findings basically amount to "We discovered good managers don't piss off developers," which should have revived the original question of whether managers as a role are desirable at all. Instead, leadership decided of their own accord that a management layer would exist and that the research team needed to figure out how to ensure it was composed of good managers. They didn't consider alternatives.
- Google fired the author of the Google Memo and Google's leadership participated in his public humiliation, despite the fact that the Google Memo is largely correct.
- Also see Vice/Motherboard and Wikipedia for memo references, other than the actual PDF copy linked above.
“I value diversity and inclusion, am not denying that sexism exists, and don’t endorse using stereotypes...Google’s political bias has equated the freedom from offense with psychological safety, but shaming into silence is the antithesis of psychological safety...This silencing has created an ideological echo chamber where some ideas are too sacred to be honestly discussed...Differences in distributions of traits between men and women may in part explain why we don't have 50% representation of women in tech and leadership...Discrimination to reach equal representation is unfair, divisive, and bad for business...Of course, I may be biased and only see evidence that supports my viewpoint. In terms of political biases, I consider myself a classical liberal and strongly value individualism and reason. I'd be very happy to discuss any of the document further and provide more citations.”
The memo alleges a left bias, where left bias is defined according to the included chart, and it seems accurate to me:
<img class="aligncenter size-full wp-image-6507" src="http://www.afterecon.com/wp-content/uploads/2018/01/left-bias.png" alt="" width="556" height="125" />
4. Google seems not to be using their own findings
Google claims to be concerned about psychological safety, but they fired James Damore for writing the Google Memo. That's an example of thought policing, and thought intolerance reduces diversity as well as psychological safety. Google also continues to require candidates to pass technical tests in order to get an interview, even though Project Aristotle supposedly showed that technical skill doesn't matter much.