This post is a bit of a capstone. It uses all of the tools for making video games scientifically that I covered in Parts 1-6 of “Game Planning With Science”. Make sure you’ve reviewed those weighty tomes before digging in here. In this post, I’m going to walk you through how to use capacity charts, story points, user stories, variance, and the central limit theorem to forecast development timelines.
The article image for “Scheduling Video Games Scientifically! – Game Planning With Science! Part 7” is from GraphicStock. Used under license.
By Reading This Post, You’ll Learn:
- Step-by-step plans for using your data to forecast both asset and feature development over time
- How to quantify those estimates in terms of expected values and confidence intervals
- What to do if you lack data and need to make assumptions to get started
But First, A Quick Recap
Before we dive in, let’s recap some key points:
- You can use Little’s Law to calculate your average flow time per unit in a pipeline flow
- Your critical path determines the time it takes to complete an entire pipeline cycle, but your bottleneck determines the rate that the pipeline outputs
- The mean value of a data set only tells you the central tendency of that data, but tells you nothing of the variance
- Variance is calculated in nonsensical squared values, but the more interpretable square root of the variance is the standard deviation
- The central limit theorem demonstrates that averages or large sums taken from a distribution of any shape will assume a normal distribution
- Because of this, the central limit theorem allows us to make predictions about averages and large sums
- A story point is an estimate of scope
- Story point velocities (units of scope per unit of time) are easier to calculate and use than development time per feature (units of time per unit of scope)
- User stories are feature requests that account for who wants a feature and why, the technical requirements of the feature, and the acceptance criteria
Estimating Asset Creation
I’m going to start with art asset forecasting, as it’s a little more straightforward than features.
Step 1: Tally Up the Assets You Need To Create, and Group Them Logically into Separate Pipelines
To start, you need to decide how many pipelines you have and what art asset/s each processes.
The definition of a logical grouping is subjective to your studio. It may be that each asset class (levels, characters, items) should have a distinct pipeline. Or some assets may have the same series of activities (character models and weapon models may require exactly the same sequence). The trade-off you are making here is between simplicity and consistency.
On one extreme, if every asset class has its own pipeline, forecasting is going to be more complex, especially if you have team members working on multiple pipelines. But your observed flow time per pipeline will have much lower variability (and thus a narrower confidence interval).
On the other hand, if you track every asset class through one consolidated pipeline, you’re going to have a much easier time with tracking, but your output will be highly variable.
Like I said in Part 6, don’t think in terms of good or bad. Balance the trade-off to best match your situation.
Step 2: Create a Capacity Chart
Go through the capacity chart exercise I discussed in Part 2 for each of the pipelines you identified in Step 1 above. If you have data on mean flow time and variance, great. Otherwise just go with the throughput (R) the capacity chart spits out.
Also, remember to take special cases into account as outlined at the end of Part 2.
Step 3: Forecast!
The math in this part is a little tricky, so bear with me.
First, take the inverse of the overall pipeline throughput that each capacity chart spits out (1/R). Little’s Law tells us that the inverse of the throughput is the flow time between assets for that pipeline. In other words, the inverse of the throughput is the time that will elapse between when one asset is complete and the next asset is complete.
Now, take the average flow time for the pipeline (or the theoretical flow time if you don’t have data on the actual average flow time). That is the timeline to produce the first unit. Then multiply the inverse of your throughput by one less than the total number of assets the pipeline needs to produce. Add that product to the first unit’s flow time. That’s your expected timeline.
That’s a little confusing so here’s the above paragraph in equation form:
Expected Time Line To Complete All Assets = (Time to Complete First Asset) + (All of the Remaining Assets * Elapsed Time Between Assets)
In other words, once the pipeline kicks into gear, each asset will have the same theoretical flow time. But you will have multiple assets in the pipeline at once (unless you only have one person running the entire process), so those flow times will overlap. So you just need to account for the entire flow time of the very first asset in the pipeline and then the time that elapses between assets.
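The forecast above can be sketched in a few lines of code. The numbers here are hypothetical examples, not real pipeline data:

```python
def expected_timeline(first_flow_time_days, throughput_per_day, total_assets):
    """Time to finish the first asset, plus the gap between completions
    (1/R, per Little's Law) for each remaining asset."""
    gap = 1.0 / throughput_per_day  # days that elapse between completed assets
    return first_flow_time_days + (total_assets - 1) * gap

# e.g. the first character takes 10 days of flow time, the pipeline outputs
# 0.5 characters per day, and you need 20 characters in total
print(expected_timeline(10, 0.5, 20))  # -> 48.0 days
```

Note how little of the total comes from the first asset: once the pipeline is primed, the 1/R gap dominates the schedule, which is why the bottleneck matters so much.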
Step 4: Calculate Your Confidence Interval
If you have enough data to calculate a meaningful confidence interval (30 or so data points) for any pipeline, then go for it just like in Part 4. First decide what confidence level you want. Do you want to be 95% certain your confidence interval encapsulates the right date? 80%? Remember the important caveat: higher confidence levels mean wider, less precise, confidence intervals.
Then take that desired confidence level and the number of samples for each pipeline, boot up Excel, and calculate your t-statistic: “=TINV(1-Confidence Level, # of Samples-1)”. Note that TINV takes the significance level (one minus your confidence level), not the confidence level itself. So if you want to be 90% confident in your confidence interval and you have 30 data points, you would enter “=TINV(10%,29)”. Calculate a separate t-statistic for each pipeline.
Then calculate the upper and lower bounds of your confidence interval:
- X is the expected time line you calculated in Step 3
- t is the t-statistic you calculated in Excel
- s is the standard deviation you’ve seen in your data
- A is the number of outstanding assets of that class you need to produce
- n is the number of data points you have for that asset
The lower bound is the earliest date you can expect to hit at the confidence level you used for your t-statistic. The upper bound is the latest date. Refine this confidence interval over time as you gather more data.
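Putting those variables together, here’s a hedged sketch of the interval calculation. The exact formula isn’t spelled out in the text above, so this assumes one common construction: the central limit theorem’s √A scaling for a sum of A flow times, combined with the usual √n standard-error term. All the numbers are hypothetical:

```python
import math

def confidence_interval(X, t, s, A, n):
    """Assumed form: CLT sqrt(A) scaling for a sum of A flow times,
    divided by sqrt(n) for the sample size. This is a sketch, not
    necessarily the exact formula from the original article."""
    margin = t * s * math.sqrt(A) / math.sqrt(n)
    return X - margin, X + margin

# Hypothetical inputs: 48-day expected timeline, t = 2.045 (Excel TINV(5%,29)
# for 95% confidence with 30 samples), s = 1.5 days per asset,
# A = 20 outstanding assets, n = 30 data points
lower, upper = confidence_interval(48.0, 2.045, 1.5, 20, 30)
print(round(lower, 1), round(upper, 1))
```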
Estimating Your Feature Backlog
Step 1: Scope Out Your Backlog
Get a deck of planning poker cards and start scoping out your backlog. The more the better, but this is also a time-consuming exercise. The scope of or need for various user stories will change or evaporate over time, so don’t invest days and days going through everything.* Estimate enough that you have a grasp on the mean user story size; 30-40 stories is a good amount of scoping to get a sense for the mean. For all of the stories you don’t directly estimate, just apply the mean user story size.º
Step 2: Calculate Your Average Velocity
Determine the time unit that is most useful for you (day, week, fortnight, month) and determine the average story points your team completes per that unit of time. I tend to stick with fortnights. The entire per-feature development cycle tends to be longer than a week, but going month to month is too lagging of an indicator. Two-week intervals tend to hit the sweet spot for me, but your needs and experience may vary. Some people swear by three weeks.
How many weeks of data should you include? This is another of those subjective, art-versus-science judgment calls. In general, more data means smaller variances and thus narrower confidence intervals, all of which is a good thing. However, too much data might bias your calculations. If your team doubled in size in a short time span, including velocity data from before the increase will bias your velocity downward, in which case you might not want to include it.
Alternatively, if your team regularly ramps up and down (through heavy use of contractors, for instance) you might want to leave all that variability in there, to allow for similar variance in the future. Or if you have one or two outlier weeks where your velocity dramatically decreased (everyone got the same flu virus and had to call out), you could simply drop those weeks from your calculation.
Again, statistics is about story telling. With your average velocity calculation you are attempting to tell a story about how quickly your team churns through scope. So include the information that enhances that story (for better or worse) and ignore the info that just muddies the waters.
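As a concrete sketch of Step 2, here are hypothetical fortnightly velocities with one flu-outbreak outlier dropped, as discussed above (all numbers are made up):

```python
import statistics

# Hypothetical story points completed per fortnight; the 12-point fortnight
# was the one where everyone caught the flu, so we drop it from the
# "typical velocity" story, per the discussion above.
velocities = [42, 38, 45, 12, 40, 44, 39]
flu_fortnight = 12
cleaned = [v for v in velocities if v != flu_fortnight]

avg = statistics.mean(cleaned)   # average story point velocity per fortnight
sd = statistics.stdev(cleaned)   # sample standard deviation of that velocity
print(round(avg, 2), round(sd, 2))
```

Whether dropping that outlier is the right call is exactly the storytelling judgment the text describes: keep it if flu-season weeks are a recurring fact of life for your team.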
Step 3: Map That Average Velocity Against Your Backlog
Once you have an average story point velocity per your preferred unit of time, divide the total number of story points in your backlog by that velocity. The resulting quotient is your forecast for the expected time line to complete all of your features (X).
As in Step 4 for art asset forecasting, calculate the confidence interval for that timeline, as we did in Part 4. First decide what confidence level you want. Again, remember that higher confidence levels mean wider, less precise, confidence intervals.
Second, tally up the number of velocity samples you have. If you are measuring story point velocity per week, how many weeks of data do you have? Tally up whatever that number is.
Then take that desired confidence level and the number of samples, boot up Excel, and calculate your t-statistic: “=TINV(1-Confidence Level, # of Samples-1)”. So if you want to be 95% confident in your confidence interval and you have 45 weeks of data, you would enter “=TINV(5%,44)”.
Take the resulting t-statistic and plug it into the confidence interval formula. To recap, X is the expected value you calculated by dividing your backlog by your average velocity, t is the t-statistic you just calculated, s is the standard deviation you see in your velocity data, BL is the number of user stories in your backlog, and n is the number of samples of data you have (45 in the t-statistic calculation above).
The results of these two equations tell you the range of dates that will encapsulate your actual delivery date, based on the confidence level you choose.
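The whole feature forecast can be sketched end to end. The inputs are hypothetical, and since the exact interval formula isn’t reproduced in the text, this sketch takes one reasonable approach: compute a standard t-interval on the velocity itself, then propagate it to the timeline:

```python
import math

backlog_points = 600   # total story points in the backlog (hypothetical)
avg_velocity = 40.0    # story points per fortnight (hypothetical)
n = 45                 # fortnights of velocity data
s = 6.0                # standard deviation of velocity (hypothetical)
t = 2.015              # Excel TINV(5%,44), i.e. 95% confidence with 45 samples

# Expected fortnights to empty the backlog
X = backlog_points / avg_velocity

# t-interval on the velocity, propagated to the timeline: a faster velocity
# bound gives the earlier date, a slower one the later date.
v_low = avg_velocity - t * s / math.sqrt(n)
v_high = avg_velocity + t * s / math.sqrt(n)
lower, upper = backlog_points / v_high, backlog_points / v_low
print(X, round(lower, 1), round(upper, 1))
```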
Narrowing Your Confidence Intervals
If you don’t have a lot of data, or if you have highly variable data, your confidence interval may be really wide. That information is useful from a project management standpoint of understanding risk. But, if you tried to pitch that to a publisher, investor, or backer, he or she would baulk.
If you want to narrow your confidence interval without reducing your confidence level and without waiting around for weeks and weeks to collect data, you need to focus on reducing variability. There are three approaches you can use, by themselves or in tandem:
- Push to hit your target average number of story points each week or more. By hitting a consistent number of story points, you reduce the variability of your velocity, which will tighten the confidence interval around the expected value.
- Groom user stories to have a consistent scope. Limiting the variance in scope should result in less variant output from your developers and thus less variant overall velocity.
- Groom smaller user stories. Smaller user stories have fewer story points, so individual stories that take longer than expected have a smaller impact on your overall velocity. Example: if your developer has a 3-point story he can’t complete until the beginning of next week, that will have a far smaller impact on your velocity for this week than a 21-point whopper would.
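The last bullet is simple arithmetic, assuming a hypothetical 30-point weekly target:

```python
# A slipped story's impact on this week's velocity, against a hypothetical
# 30-point weekly target.
target = 30

def velocity_after_slip(slipped_points):
    return target - slipped_points

small_slip = velocity_after_slip(3)   # a 3-point story slips to next week
big_slip = velocity_after_slip(21)    # a 21-point "whopper" slips

print(small_slip, big_slip)  # 27 vs 9: a 10% dip versus a 70% dip
```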
I Smell Snake Oil: Does This Hocus Pocus Really Work?
Yes, and quite well at that. I’ve seen this work on game development and web development, and with remarkable accuracy.
Why does it work? One reason is that it gives clear production benchmarks: to hit a designated date you need to churn through X amount of scope per Y amount of time. It’s really easy to compare your actual development velocity against that benchmark.
A second reason is that the natural variance of the Fibonacci sequence cushions you against inaccuracies in estimation. Some stories will be low-balled and some will be high-balled. Disparities between estimates and actuality won’t constantly disrupt schedules the way they would if you scheduled every hour of work.
What If I’m Just Getting Started?
Then it’s time to make some assumptions! Despite what prevailing wisdom would have you believe, there is nothing asinine about assumptions, provided that:
- The assumption is reasonable
- You re-evaluate and adjust the assumption as you collect data
My favorite term that I learned in business school is “complex rectal extraction”. Entrepreneurs pull numbers out of their asses all the time when pitching ideas to investors. And they’re not alone. Corporate America also constantly makes assumptions. It makes assumptions about economic conditions, growth rates, conversion rates, how many years a product will be on the market. Sometimes you just don’t have the data. It’s okay.
There are a few pieces of information you need to make a forecast when making video games:
- How many user stories are in the backlog
- The mean story point size of your user stories
- The mean and standard deviation of your story point velocity
- The number of art assets in all of their varieties
- The throughput/flow time of those various assets
If you have an established studio, you probably have data to inform some or all of those. If you’re a brand new studio, it’s a different story. Here’s how you can approach mapping out the unknown. Skip any step for which you already have real data.
First Assumption: Average User Story Size
Spec out and estimate a handful of user stories (10-20) and then calculate the average story point value.
How to improve this estimate over time
Simply recalculate as you gather more data
Second Assumption: The Number of User Stories in Your Backlog
This is the most complex-rectal-extraction-y part of the process.
First chop up your design into the biggest chunks of related features (what, in agile parlance, are known as epics). If you were to perform this process on Mass Effect 3, it might look like this:
- Special abilities
- Experience system
- Weapon Crafting
- Dialog System
- Galaxy Navigation
- Crucible Metagame
Likewise, break your own game into these large, logical categories. Then…well…take a guess. How much scope does each entail? It’s helpful to think of these in terms of relative scope, just as you would individual user stories. E.g., Epic 1 is twice the size of Epic 2, but half the size of Epic 3. You can even apply the Fibonacci sequence at this larger scale to simplify your comparisons. T-shirt sizes can also work well at the Epic level: small, medium, large, XL, etc.
Then take whichever Epic is the smallest and take a stab at how many story points it encompasses. Then scale that estimate proportionally to all of the other Epics.
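Here’s a hedged sketch of that scaling step. The epic names come from the Mass Effect 3 example above, but the sizes, multipliers, and point values are all hypothetical guesses of the kind the text describes:

```python
# Hypothetical t-shirt sizes mapped to rough relative-scope multipliers.
size_multiplier = {"S": 1, "M": 2, "L": 4, "XL": 8}

# Hypothetical sizing of the example epics.
epics = {
    "Special abilities": "M",
    "Experience system": "S",
    "Weapon crafting": "M",
    "Dialog system": "L",
    "Galaxy navigation": "S",
    "Crucible metagame": "XL",
}

# Your stab at the smallest (size S) epic, in story points.
smallest_epic_points = 40

# Scale that stab proportionally across every epic.
estimated_backlog = sum(
    smallest_epic_points * size_multiplier[size] for size in epics.values()
)
print(estimated_backlog)
```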
And don’t overthink this. You’re not going to get super-accurate estimates; you just don’t know enough about the game yet. No sense sinking tons of time into chasing an accuracy that’s impossible. Do the best you can and move on.
How to improve this estimate over time
Keep track of the actual size of each epic when you complete it. Then use that to inform high-level estimates for your next game. For instance, if the current game had a mix of small, medium, and large epics, calculate the average scope for each size. Then, apply those average scope values to the epics from your next game.
Third Assumption: Average Velocity and Standard Deviation
Pick a number that sounds reasonable, sustainable, and has some headroom. Don’t estimate velocity based on your redline, live-at-the-office level of productivity. If anyone wants to pull a date in, velocity is always the first knob they want to turn. Pick a value that leaves room for increased velocity without killing yourself.
You can also make a reasonable assumption of standard deviation. +/- 3 points? +/-15%? Again, pick something that sounds reasonable and don’t over think it.
How to improve this estimate over time
As with average story point value, simply gather data.
Fourth Assumption: The Number of Art Assets You Need To Create (characters, NPCs, levels, items)
This is probably the most straightforward assumption to make, particularly in the case of levels. If you’re not sure, look at comparable games and count how many assets and variants (palette swaps or altered geometry) you see. The great thing about assets is that they scale much more easily. If your assumption results in a schedule that is untenable, scale back the number of assets. You can always scale back up if your pipelines move faster than expected.
How to improve this estimate over time
Record the number of each kind of asset you actually made. Then, possibly more importantly, record the number you wanted to make. If there were particular assets you felt were too scarce, take note of why. Were there too few enemy NPCs to keep combat interesting? Not enough components to give crafting depth? Or not enough weapons to satisfy loot whores? Maybe there weren’t enough levels to avoid filler back-tracking?
Use these assessments to gauge how many more assets you would need to fix those deficits. 10% more? 30%? Then perform the same exercise for any assets that were overly abundant and for which you could have made do with less.
Fifth Assumption: The Throughput & Flow Time of Your Art Asset Pipelines
Talk to your artists, and get their feedback on the critical steps of the pipeline and the average time to complete each. Ideally, get estimates for each activity from the person who will actually perform it. If not, then have your most senior artist give the estimates.
Alternatively, if you’re really in a primordial stage of your studio and you don’t have an artist on the founding team, reach out to another studio. Or attend a game dev-related meetup.com event in your area. Or hit up a relevant thread on reddit. If all else fails, simply search for artists using the “#gamedev” hashtag on Twitter, Facebook, or Google+ and see if any of them are willing to help out.
And don’t be bashful: one of the highest compliments you can pay any person is to ask him or her for his/her advice. So get to complimenting!
How to improve this estimate over time
Gather data. Focus on measuring throughput as accurately as you can (it’s probably easier to measure than flow time), including mean and standard deviations.
BENCHMARK, BENCHMARK, BENCHMARK
One of the most famous quotes in all of business is from management luminary Peter Drucker: “What’s measured improves”***. Don’t set and forget your estimates. Gather data, and refine your estimates and forecasts over time. And when we’re talking about gathering and leveraging data, it’s vital to use that information responsibly. Specifically, you should use it to avoid falling for the dreaded sunk-cost fallacy. Click the link to read on!
- Use capacity charts to estimate average flow time per art asset type
- Multiply that average flow time by the number of assets to estimate the time to complete all of your assets
- Groom your backlog to estimate your average user story size
- Sample your story point velocity to estimate your average velocity and standard deviation
- Divide your estimated backlog size by your average velocity to get your expected development timeline
- Use your velocity’s standard deviation to calculate your confidence interval
- Don’t hesitate to make assumptions when you lack data