
Figure 1a shows that substantially fewer incidents occurred on Saturday and Sunday, even though traffic to the site remains consistent throughout the week. Figure 1b shows a six-month period during which there were only two weeks with no incidents: the week of Christmas and the week when employees are expected to write peer reviews for each other.

These two data points seem to suggest that when Facebook employees are not actively making changes to infrastructure because they are busy with other things (weekends, holidays, or even performance reviews), the site experiences higher levels of reliability.

(seen here)
