Early Thoughts on Optimal Performance Review

John Vandivier

This article lists some thoughts on optimal hiring, firing, and performance review policy.

  1. I do like what Netflix does: At any time a report can bring a competing offer letter to their manager, and the manager is free to adjust compensation in response.
    1. In their words: "To help us attract and retain stunning colleagues, we pay employees at the top of their personal market."
    2. I generally like their culture of "unusual freedom."
  2. Broadly, aggressive policies may or may not improve productivity, but they do increase social anxiety (ie, reduce psychological safety)
    1. I'm not saying such policies shouldn't be used. Rather, I'm saying careful, holistic measurement is needed to verify marginal utility and to justify the extent and kind of use.
      1. I might argue they make bad default policies and shouldn't be used under a veil, ie without evidence that they work in a particular organization. To obtain that evidence, I would introduce such policies as a gradual experiment on a small, ideally somewhat randomized, subsample of employees.
      2. Perhaps you could take numeric employee IDs and shard by a single digit (eg the last digit, which stays roughly uniform when IDs are sequential) to obtain ten sampling groups, then aggregate evens and odds for a 50% sample. You could gradually deploy a policy from 10% to 50% and then 100% (a sampling sketch follows at the end of item 2).
    2. Unregretted attrition (URA) targets are one such policy. A 0% URA target has psychological safety benefits, but it may also cost some productivity relative to a higher target.
      1. 0% URA is nice in that it makes the employer responsible for upskilling and caring for low performers. Of course, this comes at a cost.
      2. Higher URA potentially increases access for new hires, which is nice in a different way. Of course, raising URA too much creates general social anxiety, which is not only suboptimal but quite painful, and it carries firm reputation implications alongside it.
      3. As a reference, Amazon's URA has reportedly floated around 4-6%, and among software engineering leaders the company is notorious for a poor employee experience and a PIP culture causally connected to such a high number.
    3. Forced ranking is another aggressive performance review process.
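
As a minimal sketch of the gradual rollout idea in item 2, assuming numeric employee IDs (every ID and stage threshold below is hypothetical), sharding by one digit of the ID gives ten stable sampling groups that can be rolled out at 10%, then 50%, then 100%:

```python
# Minimal sketch of the gradual policy rollout idea above; all IDs are hypothetical.
# Sharding by the last digit keeps the ten groups roughly even when IDs are sequential.

def shard(employee_id: int) -> int:
    """Map an employee ID to one of ten stable sampling groups (0-9)."""
    return employee_id % 10

def rollout_cohort(employee_ids: list[int], percent: int) -> list[int]:
    """Select the employees covered at a given rollout stage."""
    stages = {
        10: {7},                # one shard = ~10% of employees
        50: {1, 3, 5, 7, 9},    # the odd shards = ~50%
        100: set(range(10)),    # everyone
    }
    if percent not in stages:
        raise ValueError("supported rollout stages: 10, 50, 100")
    return [eid for eid in employee_ids if shard(eid) in stages[percent]]

if __name__ == "__main__":
    ids = [1042, 2381, 3717, 4555, 5023, 6674, 7898, 8109, 9467, 1770]
    print(rollout_cohort(ids, 10))   # ~10% experimental cohort
    print(rollout_cohort(ids, 50))   # ~50% cohort, a superset of the 10% cohort
```

Because the group is a pure function of the ID, cohorts stay stable across stages: the 10% group is always contained in the 50% group, which keeps before/after comparisons clean.
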
  3. Many tech companies, perhaps prototypically including Google, are known for leveraging employee perks to increase a variety of things such as productivity and retention.
    1. Sometimes these perks are seen as employer gimmicks or ways for an employer to essentially underpay employees.
    2. I disagree. I think employers can and should leverage their scale to obtain bulk deals where they believe it would be a win-win, ie where the employee values the benefit above what the employer must pay given bulk pricing.
    3. I do think employers need to be careful about locking themselves in with these. One ex-Google employee disclosed that their manager was unable to pay them more in exchange for fewer perks, even though this particular employee didn't value the perks. The perk game becomes culture-reinforcing and tends to repel individuals with organizationally abnormal preferences or performance patterns. If perks get too voluminous, this becomes a D&I concern, an organizational agility concern, and an access concern (ie, we don't hire juniors because they don't contribute enough to justify the perk expenditure).
  4. There are tools that collect code metrics and anecdotally many engineers I've spoken with hate the things.
    1. Employee satisfaction drives retention and has major value, so it might be rational to avoid such tools simply on the grounds that engineers tend not to like them.
    2. Alternatively, they could be quite useful and could potentially be modified so that they don't irritate engineers so much. For example, fully automated metric collection that doesn't require any data input from engineers.
    3. Below are a few such tools I haven't investigated too deeply. We can also use git directly and create custom metrics (a git-based sketch follows at the end of item 4). I think a multi-metric approach is key: virtually any lone metric or pair of metrics is gameable, but the picture becomes insightful and hard to game once three or more metrics are involved. This would supplement typical anecdote-oriented review, not replace it.
      1. A few interesting metrics are commit rate, PR rate, PR acceptance share, PR participation rate, lines of code written (a relatively poor metric, but not obviously zero-value), and percent of authorship (one of my favorites; far better than LOC, I think).
      2. Team-level metrics and agile metrics are important too eg defect density, number and severity of production issues, velocity, cycle time, revenue, cost, etc
      3. Ideally, individual performance would tie back to individual compensation. Creating level rubrics and pinning individuals to levels is a kind of purely inefficient indirection that adds statistical noise if we try to predict performance or optimize compensation toward w = MPL (wage equal to marginal product of labor).
      4. https://techcrunch.com/2020/12/24/reading-between-the-lines-how-data-will-optimize-your-developers-annual-reviews/
      5. https://dev.to/austinblade/see-how-much-you-coded-in-the-past-year-with-git-85l
      6. https://linearb.io/blog/git-analytics/
      7. https://www.gitclear.com/content/6_ways_gitclear_can_improve_your_developer_performance_reviews
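
As a rough sketch of the "use git directly" idea in item 4 (not a vetted tool; it assumes the `git` CLI is on PATH and that you run it from a repository root), the snippet below computes two of the metrics mentioned above: commit counts per author and percent of authorship via `git blame`:

```python
# Rough sketch: per-author commit counts and percent of authorship computed
# directly from git. Run from the root of a repository; assumes `git` is on PATH.
import subprocess
from collections import Counter

def git(*args: str) -> str:
    """Run a git command and return its stdout."""
    return subprocess.run(["git", *args], capture_output=True, text=True, check=True).stdout

def commit_counts() -> Counter:
    """Commits per author (a crude activity metric, roughly the 'commit rate' numerator)."""
    return Counter(git("log", "--pretty=format:%an").splitlines())

def authorship_share() -> dict[str, float]:
    """Percent of lines in the current tree attributed to each author via git blame."""
    lines_by_author: Counter = Counter()
    for path in git("ls-files").splitlines():
        try:
            blame = git("blame", "--line-porcelain", path)
        except subprocess.CalledProcessError:
            continue  # skip files blame cannot process
        for line in blame.splitlines():
            if line.startswith("author "):
                lines_by_author[line[len("author "):]] += 1
    total = sum(lines_by_author.values()) or 1
    return {author: 100.0 * n / total for author, n in lines_by_author.items()}

if __name__ == "__main__":
    print(commit_counts().most_common(10))
    for author, pct in sorted(authorship_share().items(), key=lambda x: -x[1])[:10]:
        print(f"{author}: {pct:.1f}% of current lines")
```

Note that blame-based authorship only counts lines that survive in the current tree, which is part of why it is harder to game than raw LOC; combining it with two or more other metrics, as suggested above, makes gaming harder still.
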
  5. I think two reviews per year is a great cadence.
    1. In part because reviews and raises spaced too far apart become a retention issue.
    2. Maybe one review per year works if you pair it with Netflix's personal top of market philosophy.
  6. Interview process thoughts
    1. Loving Karat atm
    2. I like how Facebook let me pick my own interview style (backend or frontend engineering), but I didn't like that the frontend path had poor internal and external prep material, and that pass rates couldn't be shared for either path. Under a veil, I think the strategy is to use industry norms; as we grow our own data we can empower candidates to pick what works for them, while making sure we don't trick them by letting them pick an option where we know they will perform worse.
    3. I like Google's Four Interview Rule, but a firm's own data might derive a different interview curve, and its own preferences might require more or less than 86% rigour (a sketch follows at the end of this list).
    4. I'm generally a fan of code test screening tools. Usually we can tailor the questions to be easier or harder as needed, so as long as we are mindful not to set the bar too high, I think they're an efficient way to eliminate very low performers.
      1. I do mindfully take this approach: use screening tools to eliminate low talent; don't try to use them to identify top talent. Screening tools over-optimize on a specific coding game, miss essentially all soft skills, and test most hard skills poorly. I'm currently a fan of work trials to identify top talent, with structured interviews as a fallback. Again: automated code tests are valuable because they weed out very poor performers, so the bar should be set low. They are not valuable for identifying top performers, in which case setting the bar low would be counterproductive or mistaken.
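
Finally, a hypothetical sketch of the "own-data interview curve" idea in 6.3. Nothing here comes from Google's actual methodology; the data is synthetic, and the 1-5 score scale, the 3.0 threshold, and the candidate model are assumptions purely for illustration. The idea is to measure, for each panel size n, how often the mean of the first n interview scores agrees with the final decision, so a firm can see where extra interviews stop adding signal:

```python
# Hypothetical sketch of deriving an interview curve from a firm's own data.
# All data below is synthetic; thresholds and scales are illustrative assumptions.
import random

def interview_curve(candidates: list[dict], max_interviews: int) -> dict[int, float]:
    """For each panel size n, the share of candidates where the mean of the
    first n interview scores (>= 3.0 on a 1-5 scale) matches the final decision."""
    curve = {}
    for n in range(1, max_interviews + 1):
        hits = 0
        for c in candidates:
            mean_score = sum(c["scores"][:n]) / n
            hits += (mean_score >= 3.0) == c["hired"]
        curve[n] = hits / len(candidates)
    return curve

if __name__ == "__main__":
    random.seed(0)
    # Synthetic candidates: a latent quality drives both the noisy per-interview
    # scores and the eventual hire decision.
    candidates = []
    for _ in range(500):
        quality = random.uniform(1, 5)
        scores = [min(5.0, max(1.0, random.gauss(quality, 1.0))) for _ in range(8)]
        candidates.append({"scores": scores, "hired": quality >= 3.0})
    for n, agreement in interview_curve(candidates, 8).items():
        print(f"{n} interview(s) -> {agreement:.0%} agreement with the final decision")
```

With real interview scores and real downstream outcomes in place of the synthetic decision, the same curve would show whether four interviews is the right stopping point for a given firm or whether its preferences call for more or fewer.
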