Joel on Software

Evidence Based Scheduling


by Joel Spolsky Friday, October 26, 2007

Software developers don’t really like to make schedules. Usually, they try to get away without one. “It’ll be done when it’s done!” they say, expecting that such a brave, funny zinger will reduce their boss to a fit of giggles, and in the ensuing joviality, the schedule will be forgotten.


Most of the schedules you do see are halfhearted attempts. They’re stored on a file share somewhere and completely forgotten. When these teams ship, two years late, that weird guy with the file cabinet in his office brings the old printout to the post mortem, and everyone has a good laugh. “Hey look! We allowed two weeks for rewriting from scratch in Ruby!”


Hilarious! If you’re still in business.


You want to be spending your time on things that get the most bang for the buck. And you can’t figure out how much buck your bang is going to cost without knowing how long it’s going to take. When you have to decide between the “animated paperclip” feature and the “more financial functions” feature, you really need to know how much time each will take.


Why won’t developers make schedules? Two reasons. One: it’s a pain in the butt. Two: nobody believes the schedule is realistic. Why go to all the trouble of working on a schedule if it’s not going to be right?


Over the last year or so at Fog Creek we’ve been developing a system that’s so easy even our grouchiest developers are willing to go along with it. And as far as we can tell, it produces extremely reliable schedules. It’s called Evidence-Based Scheduling, or EBS. You gather evidence, mostly from historical timesheet data, that you feed back into your schedules. What you get is not just one ship date: you get a confidence distribution curve, showing the probability that you will ship on any given date. It looks like this:


The steeper the curve, the more confident you are that the ship date is real.


Here’s how you do it.


1)    Break ‘er down


When I see a schedule measured in days, or even weeks, I know it’s not going to work. You have to break your schedule into very small tasks that can be measured in hours. Nothing longer than 16 hours.


This forces you to actually figure out what you are going to do. Write subroutine foo. Create this dialog box. Parse the Fizzbott file. Individual development tasks are easy to estimate, because you’ve written subroutines, created dialogs, and parsed files before.


If you are sloppy, and pick big three-week tasks (e.g., “Implement Ajax photo editor”), then you haven’t thought about what you are going to do. In detail. Step by step. And when you haven’t thought about what you’re going to do, you can’t know how long it will take.


Setting a 16-hour maximum forces you to design the damn feature. If you have a hand-wavy three week feature called “Ajax photo editor” without a detailed design, I’m sorry to be the one to break it to you but you are officially doomed. You never thought about the steps it’s going to take and you’re sure to be forgetting a lot of them.
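
To make the rule concrete, here is a minimal Python sketch that flags anything over the limit. The task names and hours are made up, and `oversized_tasks` is a hypothetical helper, not part of any real scheduling tool.

```python
MAX_TASK_HOURS = 16  # the ceiling suggested in the text


def oversized_tasks(tasks):
    """Return the names of tasks whose estimate exceeds the ceiling."""
    return [name for name, hours in tasks.items() if hours > MAX_TASK_HOURS]


# Hypothetical task list; names and hours are invented for illustration.
tasks = {
    "Write subroutine foo": 4,
    "Create crop dialog box": 8,
    "Parse the Fizzbott file": 12,
    "Implement Ajax photo editor": 120,  # three weeks: not designed yet
}

for name in oversized_tasks(tasks):
    print(f"Break this down further: {name} ({tasks[name]}h)")
```

Anything the check reports is a feature that still needs designing, not a task that is ready to be estimated.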


2)   Track elapsed time


It’s hard to get individual estimates exactly right. How do you account for interruptions, unpredictable bugs, status meetings, and the semiannual Windows Tithe Day when you have to reinstall everything from scratch on your main development box? Heck, even without all that stuff, how can you tell exactly how long it’s going to take to implement a given subroutine?

You can’t, really.



So, keep timesheets. Keep track of how long you spend working on each task. Then you can go back and see how long things took relative to the estimate. For each developer, you’ll be collecting data like this:


Each point on the chart is one completed task, with the estimate and actual times for that task. When you divide estimate by actual, you get velocity: how fast the task was done relative to estimate.
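
In code, the calculation is about as small as it sounds. Here is a minimal Python sketch; the `CompletedTask` class and the timesheet entries are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class CompletedTask:
    name: str
    estimate_hours: float
    actual_hours: float

    @property
    def velocity(self) -> float:
        # velocity = estimate / actual
        return self.estimate_hours / self.actual_hours


# Hypothetical timesheet entries for one developer.
timesheet = [
    CompletedTask("Write subroutine foo", 4, 8),
    CompletedTask("Create dialog box", 8, 8),
    CompletedTask("Parse Fizzbott file", 6, 10),
]

velocities = [round(task.velocity, 2) for task in timesheet]
print(velocities)
```

A velocity below 1.0 means the task took longer than estimated; above 1.0, it went faster.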


Over time, for each developer, you’ll collect a history of velocities.


·   The mythical perfect estimator, who exists only in your imagination, always gets every estimate exactly right. So their velocity history is {1, 1, 1, 1, 1, …}

·   A typical bad estimator has velocities all over the map, for example {0.1, 0.5, 1.7, 0.2, 1.2, 0.9, 13.0}

·   Most estimators get the scale wrong but the relative estimates right. Everything takes longer than expected, because the estimate didn’t account for bug fixing, committee meetings, coffee breaks, and that crazy boss who interrupts all the time. This common estimator has very consistent velocities, but they’re below 1.0. For example, {0.6, 0.5, 0.6, 0.6, 0.5, 0.6, 0.7, 0.6}

As estimators gain more experience, their estimating skills improve. So throw away any velocities older than, say, six months.


If you have a new estimator on your team, who doesn’t have a track record, assume the worst: give them a fake history with a wide range of velocities, until they’ve finished a half-dozen real tasks.
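
One way to express those two rules, sketched in Python. The 183-day cutoff, the contents of the placeholder history, and the function name are all assumptions; the text only says “six months,” “a wide range,” and “a half-dozen real tasks.” Treating anyone with fewer than six recent tasks like a newcomer is also a simplification made for this sketch.

```python
from datetime import date, timedelta

MAX_AGE = timedelta(days=183)   # "say, six months" (assumed exact cutoff)
MIN_REAL_TASKS = 6              # "a half-dozen real tasks"
# Assumed placeholder history: deliberately wide, numbers are made up.
PESSIMISTIC_HISTORY = [0.1, 0.25, 0.5, 0.75, 1.0, 2.0]


def usable_velocities(history, today):
    """history: list of (completion_date, velocity) pairs for one developer."""
    recent = [v for done, v in history if today - done <= MAX_AGE]
    if len(recent) < MIN_REAL_TASKS:
        return list(PESSIMISTIC_HISTORY)
    return recent


today = date(2007, 10, 26)
veteran = [(today - timedelta(days=30 * i), 0.6) for i in range(1, 7)]
veteran.append((date(2006, 1, 15), 13.0))   # too old, gets thrown away
newcomer = [(today - timedelta(days=3), 0.9)]

print(usable_velocities(veteran, today))
print(usable_velocities(newcomer, today))
```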


3) Simulate the future


Rather than just adding up estimates to get a single ship date, which sounds right but gives you a profoundly wrong result, you’re going to use the Monte Carlo method to simulate many possible futures. In a Monte Carlo simulation, you can create 100 possible scenarios for the future. Each of these possible futures has 1% probability, so you can make a chart of the probability that you will ship by any given date.


While calculating each possible future for a given developer, you’re going to divide each task’s estimate by a randomly-selected velocity from that developer’s historical velocities, which we’ve been gathering in step 2. Here’s one sample future:
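
A single round can be sketched in a few lines of Python. The task estimates are invented for illustration, the velocity history is the “common estimator” example from above, and `simulate_once` is a hypothetical helper name rather than anything from a real EBS tool.

```python
import random


def simulate_once(estimates, velocity_history, rng):
    """Return the total simulated hours for one possible future."""
    return sum(est / rng.choice(velocity_history) for est in estimates)


estimates = [4, 8, 2, 8, 16]   # hours per task, hypothetical
velocity_history = [0.6, 0.5, 0.6, 0.6, 0.5, 0.6, 0.7, 0.6]
rng = random.Random(2007)      # seeded only so the run is repeatable

total = simulate_once(estimates, velocity_history, rng)
print(f"Estimated {sum(estimates)} hours, simulated {total:.1f} hours")
```

Every task gets its own random draw, so a different seed gives a different, equally plausible total.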


Do that 100 times; each total has 1% probability, and now you can figure out the probability that you will ship on any given date.
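
Here is a rough Python sketch of the full loop, counting what fraction of the simulated futures finish within a given number of working hours. The estimates, the fixed seed, and the function names are assumptions made for the example.

```python
import random


def simulate_totals(estimates, velocity_history, rounds=100, seed=0):
    """Run `rounds` simulated futures and return their totals, sorted."""
    rng = random.Random(seed)
    return sorted(
        sum(est / rng.choice(velocity_history) for est in estimates)
        for _ in range(rounds)
    )


def probability_done_within(totals, hours):
    """Fraction of simulated futures that finish within `hours`."""
    return sum(1 for t in totals if t <= hours) / len(totals)


estimates = [4, 8, 2, 8, 16]   # hypothetical task estimates, in hours
history = [0.6, 0.5, 0.6, 0.6, 0.5, 0.6, 0.7, 0.6]
totals = simulate_totals(estimates, history)

for hours in (55, 65, 75):
    print(f"{hours}h: {probability_done_within(totals, hours):.0%}")
```

With 100 rounds, each future contributes exactly one percentage point to the curve.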

Now watch what happens:



·   In the case of the mythical perfect estimator, all velocities are 1. Dividing by a velocity which is always 1 has no effect. Thus, all rounds of the simulation give the same ship date, and that ship date has 100% probability. Just like in the fairy tales!

·   The bad estimator’s velocities are all over the map. 0.1 and 13.0 are just as likely. Each round of the simulation is going to produce a very different result, because when you divide by random velocities you get very different numbers each time. The probability distribution curve you get will be very shallow, showing an equal chance of shipping tomorrow or in the far future. That’s still useful information to get, by the way: it tells you that you shouldn’t have confidence in the predicted ship dates.

·   The common estimator has a lot of velocities that are pretty close to each other, for example, {0.6, 0.5, 0.6, 0.6, 0.5, 0.6, 0.7, 0.6}. When you divide by these velocities you increase the amount of time something takes, so in one iteration, an 8-hour task might take 13 hours; in another it might take 15 hours. That compensates for the estimator’s perpetual optimism. And it compensates precisely, based exactly on this developer’s actual, proven, historical optimism. And since all the historical velocities are pretty close, hovering around 0.6, when you run each round of the simulation, you’ll get pretty similar numbers, so you’ll wind up with a narrow range of possible ship dates.
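
Those three cases are easy to check with a small Python experiment. This is only a sketch: the ten 8-hour tasks and the seed are made up, and the velocity histories are the three examples from the list above.

```python
import random


def simulate_totals(estimates, velocity_history, rounds=100, seed=0):
    """Run `rounds` simulated futures and return their totals, sorted."""
    rng = random.Random(seed)
    return sorted(
        sum(est / rng.choice(velocity_history) for est in estimates)
        for _ in range(rounds)
    )


estimates = [8] * 10   # ten hypothetical 8-hour tasks
histories = {
    "perfect": [1, 1, 1, 1, 1],
    "bad": [0.1, 0.5, 1.7, 0.2, 1.2, 0.9, 13.0],
    "common": [0.6, 0.5, 0.6, 0.6, 0.5, 0.6, 0.7, 0.6],
}

spread = {}
for name, history in histories.items():
    totals = simulate_totals(estimates, history)
    spread[name] = totals[-1] - totals[0]   # latest minus earliest finish
    print(f"{name:8s} min={totals[0]:6.1f}h  max={totals[-1]:6.1f}h")
```

The perfect estimator’s spread is zero, the common estimator’s is narrow and shifted later than the raw 80 hours, and the bad estimator’s covers a range wide enough to be useless as a date.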

In each round of the Monte Carlo simulation, of course, you have to convert the hourly data to calendar data, which means you have to take into account each developer’s work schedule, vacations, holidays, etc. And then you have to see, for each round, which developer is finishing last, because that’s when the whole team will be done. These calculations are painstaking, but luckily, painstaking is what computers are good at.
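
A bare-bones version of that conversion might look like the Python below. It assumes eight-hour days and a Monday-to-Friday week, and the developers’ hours and vacation dates are invented; a real calendar would need holidays and part-time schedules too.

```python
from datetime import date, timedelta


def finish_date(start, hours, hours_per_day=8, days_off=frozenset()):
    """Walk forward from `start`, spending `hours` on working days only."""
    day, remaining = start, hours
    while True:
        if day.weekday() < 5 and day not in days_off:  # Mon-Fri, not on vacation
            remaining -= hours_per_day
            if remaining <= 0:
                return day
        day += timedelta(days=1)


start = date(2007, 10, 29)                   # a Monday
round_hours = {"Jane": 40, "Milton": 32}     # simulated hours from one round
vacations = {"Milton": {date(2007, 10, 30), date(2007, 10, 31)}}

finishes = {
    dev: finish_date(start, hours, days_off=vacations.get(dev, frozenset()))
    for dev, hours in round_hours.items()
}
team_done = max(finishes.values())           # the last finisher sets the date
print(finishes, team_done)
```

Milton has fewer hours of work than Jane in this round, but his two days off push him past the weekend, so he is the one who determines when the team is done.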


Obsessive-compulsive disorder not required


What do you do about the boss who interrupts you all the time with long-winded stories about his fishing trips? Or the sales meetings you’re forced to go to even though you have no reason to be there? Coffee breaks? Spending half a day helping the new guy get his dev environment set up?


When Brett and I were developing this technique at Fog Creek, we worried a lot about things that take real time but can’t be predicted in advance. Sometimes, this all adds up to more time than writing code. Should you have estimates for this stuff too, and track it on a time sheet?


Well, yeah, you can, if you want. And Evidence Based Scheduling will work.


But you don’t have to.


It turns out that EBS works so well that all you have to do is keep the clock running on whatever task you were doing when the interruption occurred. As disconcerting as this may sound, EBS produces the best results when you do this.


Let me walk you through a quick example. To make this example as simple as possible, I’m going to imagine a very predictable programmer, John, whose whole job is writing those one-line getter and setter functions that inferior programming languages require. All day long this is all he does:


private int width;
public int getWidth () { return width; }
public void setWidth (int _width) { width = _width; }

I know, I know… it’s a deliberately dumb example, but you know you’ve met someone like this.


Anyway. Each getter or setter takes him 2 hours. So his task estimates look like this:


{2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, … }

Now, this poor guy has a boss who interrupts him every once in a while with a two-hour conversation about marlin fishing. Now, of course, John could have a task on his schedule called “Painful conversations about marlin,” and put that on his timesheet, but this might not be politically prudent. Instead, John just keeps the clock running. So his actual times look like this:


{2, 2, 2, 2, 4, 2, 2, 2, 2, 4, 2, … }

And his velocities are:


{1, 1, 1, 1, 0.5, 1, 1, 1, 1, 0.5, 1, … }

Now think about what happens. In the Monte Carlo simulation, the probability that each estimate will be divided by 0.5 is exactly the same as the probability that John’s boss would interrupt him during any given feature. So EBS produces a correct schedule!
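
The arithmetic is easy to verify. The Python sketch below uses the first ten entries of John’s history from above; the variable names are made up for the example.

```python
estimates = [2] * 10
actuals = [2, 2, 2, 2, 4, 2, 2, 2, 2, 4]   # clock kept running during marlin talk

velocities = [est / act for est, act in zip(estimates, actuals)]

# How often John was actually interrupted...
interrupted_share = sum(1 for act in actuals if act > 2) / len(actuals)
# ...versus how often a random draw from his history picks velocity 0.5.
slow_draw_share = velocities.count(0.5) / len(velocities)

# Average simulated hours for one 2-hour task, over the whole history.
expected_hours = sum(2 / v for v in velocities) / len(velocities)
actual_average = sum(actuals) / len(actuals)

print(interrupted_share, slow_draw_share)
print(expected_hours, actual_average)
```

Both shares come out to 0.2, and the expected simulated time per task matches the 2.4 hours John really averaged, interruptions included.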


In fact, EBS is far more likely to have accurate evidence about these interruptions than even the most timesheet-obsessive developer. Which is exactly why it works so well. Here’s how I explain this to people. When developers get interrupted, they can either


1.   make a big stink about putting the interruption on their timesheet and in their estimates, so management can see just how much time is being wasted on fishing conversation, or

2.   make a big stink about refusing to put it on their timesheet, just letting the feature they were working on slip, because they refuse to pad their estimates which were perfectly correct with stupid conversation about fishing expeditions to which they weren’t even invited,

… and in either case, EBS gives the same, exactly correct results, no matter which type of passive-aggressive developer you have.


4) Manage your projects actively


Once you’ve got this set up, you can actively manage projects to ship on time. For example, if you sort features out into different priorities, it’s easy to see how much it would help the schedule if you could cut the lower priority features.
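
As a sketch of that what-if, the Python below reruns the simulation with lower-priority features dropped and compares the median outcome. The import wizard feature, the priorities, the hours, and the helper name are all hypothetical.

```python
import random


def median_hours(estimates, velocity_history, rounds=100, seed=0):
    """Median total hours across `rounds` simulated futures."""
    rng = random.Random(seed)
    totals = sorted(
        sum(est / rng.choice(velocity_history) for est in estimates)
        for _ in range(rounds)
    )
    return totals[rounds // 2]


# (feature, priority, estimated hours); 1 is the highest priority.
features = [
    ("More financial functions", 1, 16),
    ("Import wizard", 2, 12),
    ("Animated paperclip", 3, 16),
]
history = [0.6, 0.5, 0.6, 0.6, 0.5, 0.6, 0.7, 0.6]

medians = {}
for cutoff in (3, 2, 1):
    kept = [hours for _, priority, hours in features if priority <= cutoff]
    medians[cutoff] = median_hours(kept, history)
    print(f"Keep priority <= {cutoff}: about {medians[cutoff]:.0f} hours")
```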


You can also look at the distribution of possible ship dates for each developer:


Some developers (like Milton in this picture) may be causing problems because their ship dates are so uncertain: they need to work on learning to estimate better. Other developers (like Jane) have very precise ship dates that are just too late: they need to have some of their work taken off their plate. Other developers (me! yay!) are not on the critical path at all, and can be left in peace.
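
That triage can be roughed out in Python. Everything specific here is an assumption made for the sketch: the workloads, the 100 hours available, and especially the rule that a 5%-to-95% spread wider than half the median counts as “uncertain.”

```python
import random


def percentiles(estimates, velocity_history, rounds=100, seed=0):
    """Return the (5%, 50%, 95%) totals in hours from simulated futures."""
    rng = random.Random(seed)
    totals = sorted(
        sum(est / rng.choice(velocity_history) for est in estimates)
        for _ in range(rounds)
    )
    return totals[rounds * 5 // 100], totals[rounds // 2], totals[rounds * 95 // 100]


def diagnose(estimates, velocity_history, hours_available, max_relative_spread=0.5):
    low, mid, high = percentiles(estimates, velocity_history)
    if high - low > max_relative_spread * mid:
        return "uncertain: needs estimating practice"
    if mid > hours_available:
        return "overloaded: move some work elsewhere"
    return "fine: leave in peace"


steady = [0.6, 0.5, 0.6, 0.6, 0.5, 0.6, 0.7, 0.6]
erratic = [0.1, 0.5, 1.7, 0.2, 1.2, 0.9, 13.0]
team = {
    "Milton": ([8] * 5, erratic),    # hypothetical workloads
    "Jane": ([8] * 15, steady),
    "Joel": ([8] * 3, steady),
}

report = {
    dev: diagnose(estimates, history, hours_available=100)
    for dev, (estimates, history) in team.items()
}
for dev, verdict in report.items():
    print(f"{dev}: {verdict}")
```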


Scope creep

Assuming you had everything planned down to the last detail when you started work, EBS works great. To be honest, though, you may do some features that you hadn’t planned. You get new ideas, your salespeople sell features you don’t have, and somebody on the board of directors comes up with a cool new idea to make your golf cart GPS application monitor EKGs while golfers are buzzing around the golf course. All this leads to delays that could not have been predicted when you did the original schedule.



Ideally, you have a bunch of buffer for this. In fact, go ahead and build buffer into your original schedule for:


1.   New feature ideas

2.   Responding to the competition

3.   Integration (getting everyone’s code to work together when it’s merged)

4.   Debugging time

5.   Usability testing (and incorporating the results of those tests into the product).

6.   Beta tests

So now, when new features come up, you can slice off a piece of the appropriate buffer and use it for the new feature.


What happens if you’re still adding features and you’ve run out of buffer? Well, now the ship dates you get out of EBS start slipping. You should take a snapshot of the ship date confidence distribution every night, so that you can track this over time:


The x-axis is when the calculation was done; the y-axis is the ship date. There are three curves here: the top one is the 95% probability date, the middle is 50% and the bottom is 5%. So, the closer the curves are to one another, the narrower the range of possible ship dates.


If you see ship date getting later and later (rising curves), you’re in trouble. If it’s getting later by more than one day per day, you’re adding work faster than you’re completing work, and you’ll never be done. You can also look and see if the ship date confidence distribution is getting tighter (the curves are converging), which it should be if you’re really converging on a date.
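
Both checks are simple to automate once you keep the snapshots. In this Python sketch the snapshot dates are invented, and the dictionary layout and function names are assumptions; only the first and last snapshots are compared.

```python
from datetime import date


def slip_rate(snapshots):
    """Days the 50% ship date moved per calendar day, first snapshot to last."""
    first, last = snapshots[0], snapshots[-1]
    elapsed = (last["taken"] - first["taken"]).days
    return (last["p50"] - first["p50"]).days / elapsed


def spread_days(snapshot):
    """Gap between the 5% and 95% ship dates, in days."""
    return (snapshot["p95"] - snapshot["p5"]).days


# Hypothetical snapshots, oldest first.
snapshots = [
    {"taken": date(2007, 10, 1), "p5": date(2008, 1, 2),
     "p50": date(2008, 1, 15), "p95": date(2008, 2, 20)},
    {"taken": date(2007, 10, 8), "p5": date(2008, 1, 14),
     "p50": date(2008, 1, 25), "p95": date(2008, 2, 25)},
    {"taken": date(2007, 10, 15), "p5": date(2008, 1, 26),
     "p50": date(2008, 2, 4), "p95": date(2008, 3, 1)},
]

rate = slip_rate(snapshots)
converging = spread_days(snapshots[-1]) < spread_days(snapshots[0])
print(f"Slipping {rate:.2f} days per day; converging: {converging}")
```

In this made-up history the curves are tightening, but the median date is moving out faster than the calendar, which is the “you’ll never be done” case.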


While we’re at it


Here are a few more things I’ve learned over the years about schedules.


1) Only the programmer doing the work can create the estimate. Any system where management writes a schedule and hands it off to programmers is doomed to fail. Only the programmer who is going to implement a feature can figure out what steps they will need to take to implement that feature.

2) Fix bugs as you find them, and charge the time back to the original task. You can’t schedule a single bug fix in advance, because you don’t know what bugs you’re going to have. When bugs are found in new code, charge the time to the original task that you implemented incorrectly. This will help EBS predict the time it takes to get fully debugged code, not just working code.

3) Don’t let managers badger developers into shorter estimates. Many rookie software managers think that they can “motivate” their programmers to work faster by giving them nice, “tight” (unrealistically short) schedules. I think this kind of motivation is brain-dead. When I’m behind schedule, I feel doomed and depressed and unmotivated. When I’m working ahead of schedule, I’m cheerful and productive. The schedule is not the place to play psychological games.

Why do managers try this?


When the project begins, the technical managers go off, meet with the business people, and come up with a list of features they think would take about three months, but which would really take twelve. When you think of writing code without thinking about all the steps you have to take, it always seems like it will take n time, when in reality it will probably take more like 4n time. When you do a real schedule, you add up all the tasks and realize that the project is going to take much longer than originally thought. The business people are unhappy.


Inept managers try to address this by figuring out how to get people to work faster. This is not very realistic. You might be able to hire more people, but they need to get up to speed and will probably be working at 50% efficiency for several months (and dragging down the efficiency of the people who have to mentor them).


You might be able to get 10% more raw code out of people temporarily at the cost of having them burn out 100% in a year. Not a big gain, and it’s a bit like eating your seed corn. Of course, when you overwork people, debugging time doubles and a late project becomes later. Splendid karma.


But you can never get 4n from n, ever, and if you think you can, please email me the stock symbol for your company so I can short it.


4) A schedule is a box of wood blocks. If you have a bunch of wood blocks, and you can’t fit them into a box, you have two choices: get a bigger box, or remove some blocks. If you wanted to ship in six months, but you have twelve months on the schedule, you are either going to have to delay shipping, or find some features to delete. You just can’t shrink the blocks, and if you pretend you can, then you are merely depriving yourself of a useful opportunity to actually see into the future by lying to yourself about what you see there.

Now that I mention it, one of the great benefits of realistic schedules is that you are forced to delete features. Why is this good?


Suppose you have two features in mind. One is really useful and will make your product really great. The other is really easy and the programmers can’t wait to code it up (“Look! <blink>!”), but it serves no useful purpose.


If you don’t make a schedule, the programmers will do the easy/fun feature first. Then they’ll run out of time, and you will have no choice but to slip the schedule to do the useful/important feature.


If you do make a schedule, even before you start working, you’ll realize that you have to cut something, so you’ll cut the easy/fun feature and just do the useful/important feature. By forcing yourself to choose some features to cut, you wind up making a more powerful, better product with a better mix of good features that ships sooner.


Way back when I was working on Excel 5, our initial feature list was huge and would have gone way over schedule. “Oh my!” we thought. “Those are all super important features! How can we live without a macro editing wizard?”


As it turns out, we had no choice, and we cut what we thought was “to the bone” to make the schedule. Everybody felt unhappy about the cuts. To make people feel better, we told ourselves that we weren’t cutting the features, we were simply deferring them to Excel 6.


As Excel 5 was nearing completion, I started working on the Excel 6 spec with a colleague, Eric Michelman. We sat down to go through the list of “Excel 6” features that had been punted from the Excel 5 schedule. Guess what? It was the shoddiest list of features you could imagine. Not one of those features was worth doing. I don’t think a single one of them ever was. The process of culling features to fit a schedule was the best thing we could have done. If we hadn’t done this, Excel 5 would have taken twice as long and included 50% useless crap features that would have had to be supported, for backwards compatibility, until the end of time.




Using Evidence-Based Scheduling is pretty easy: it will take you a day or two at the beginning of every iteration to produce detailed estimates, and it’ll take a few seconds every day to record when you start working on a new task on a timesheet. The benefits, though, are huge: realistic schedules.


Realistic schedules are the key to creating good software. They force you to do the best features first and allow you to make the right decisions about what to build. Which makes your product better, your boss happier, delights your customers, and, best of all, lets you go home at five o’clock.



