Over at AITS.org — Failure is not Optional

My latest is at this link at AITS.org with the provocative title: “Failure is not Optional: Why Project Failure is OK.”  The theme and specifics of the post, however, are not that simple and I continue with a sidebar on Grant’s conduct of the Overland Campaign entitled “How Grant Leveraged Failure in the Civil War.”  A little elaboration is in place once you read the entire post.

I think what we deal with in project management are shades of failure.  It is important to understand this because we rely too often on projections of performance that oftentimes turn out to be unrealistic within the framing assumptions of project management.  In this context our definition of what defines success turns out to be fluid.

To provide a simplistic example of other games of failure, let’s take the game of American baseball.  A batter who hits safely more than 30% of the time is deemed to be skilled in the art of hitting a baseball.  A success.  Yet, when looked at it from a total perspective what this says is that 70% failure is acceptable.  A pitcher who gives up between 2 and 4 earned runs a game is considered to be skilled in the art of pitching.  Yet, this provides a range of acceptable failure under the goal of giving up zero runs.  Furthermore, if your team wins 9-4 you’re considered to be a winning pitcher.  If you lose 1-0 you are a losing pitcher, and there are numerous examples of talented pitchers who were considered skilled in their craft who had losing records because of lack of run production by his team.  Should the perception of success and failure be adjusted based on whether one pitched for the 1927 or 1936 or 1998 Yankees, or the 1963 Dodgers, or 1969 Mets?  The latter two examples were teams built on just enough offense to provide the winning advantage, with the majority of pressure placed on the pitching staff.  Would Tom Seaver be classified as less extraordinary in his skill if he averaged giving up half a run more?  Probably.

Thus, when we look at the universe of project management and see that the overwhelming majority of IT projects fail, or that the average R&D contract realizes a 20% overrun in cost and a significant slip in schedule, what are we measuring?  We are measuring risk in the context of games of failure.  We handle risk to absorb just enough failure and noise in our systems to both push the envelope on development without sacrificing the entire project effort.  To know the difference between transient and existential failure, between learning and wasted effort, and between intermediate progress and strategic position requires a skillset that is essential to ultimate achievement of the goal, whether it be deployment of a new state-of-the-art aircraft, or a game-changing software platform.  The noise must pass what I have called the “so-what?” test.

I have listed a set of skills necessary to the understanding these differences in the article that you may find useful.  I have also provided some ammunition for puncturing the cult of “being green.”

Technical Ecstacy — Technical Performance and Earned Value

As many of my colleagues in project management know, I wrote a series of articles on the application of technical performance risk in project management back in 1997, one of which made me an award recipient from the institution now known as Defense Acquisition University.  Over the years various researchers and project organizations have asked me if I have any additional thoughts on the subject and the response up until now has been: no.  From a practical standpoint, other responsibilities took me away from the domain of determining the best way of recording technical achievement in complex projects.  Furthermore, I felt that the field was not ripe for further development until there were mathematics and statistical methods that could better approach the behavior of complex adaptive systems.

But now, after almost 20 years, there is an issue that has been nagging at me since publication of the results of the project studies that I led from 1995 through 1997.  It is this: the complaint by project managers in resisting the application of measuring technical achievement of any kind, and integrating it with cost performance, the best that anyone can do is 100%.  “All TPM can do is make my performance look worse”, was the complaint.  One would think this observation would not only not face opposition, especially from such an engineering dependent industry, but also because, at least in this universe, the best you can do is 100%.*  But, of course, we weren’t talking about the same thing and I have heard this refrain again at recent conferences and meetings.

To be honest, in our recommended solution in 1997, we did not take things as far as we could have.  It was always intended to be the first but not the last word regarding this issue.  And there have been some interesting things published about this issue recently, which I noted in this post.

In the discipline of project management in general, and among earned value practitioners in particular, the performance being measured oftentimes exceeds 100%.  But there is the difference.  What is being measured as exceeding 100% is progress against both a time-based and fiscally-based linear plan.  Most of the physical world doesn’t act nor can it be measured this way.  When measuring the attributes of a system or component against a set of physical or performance thresholds, linearity against a human-imposed plan oftentimes goes out the window.

But a linear progression can be imposed on the development toward the technical specification.  So then the next question is how do we measure progress during the development curve and duration.

The short answer, without repeating a summarization of the research (which is linked above) is through risk assessment, and the method that we used back in 1997 was a distribution curve that determined the probability of reaching the next step in the technical development.  This was based on well-proven systems engineering techniques that had been used in industry for many years, particularly at pre-Lockheed Martin Martin Marietta.  Technical risk assessment, even using simplistic 0-50-80-100 curves, provides a good approximation of probability and risk between each increment of development, though now there are more robust models.  For example, the use of Bayesian methodology, which introduces mathematical rigor into statistics, as outlined in this post by Eliezer Yudkowsky.  (As an aside, I strongly recommend his blogs for anyone interested in the cutting edge of rational inquiry and AI).

So technical measurement is pretty well proven.  But the issue that then presents itself (and presented itself in 1997) was how to derive value from technical performance.  Value is a horse of a different color.  The two bugaboos that were presented as being impassible roadblocks were weight and test failure.

Let’s take weight first.  On one of my recent trips I found myself seated in an Embraer E-jet.  These are fairly small aircraft, especially compared to conventional commercial aircraft, and are lightweight.  As such, they rely on a proper distribution and balance of weight, especially if one finds oneself at 5,000 feet above sea level with the long runway shut down, a 10-20 mph crosswind, and a mountain range rising above the valley floor in the direction of takeoff.  So the flight crew, when the cockpit noted a weight disparity, shifted baggage from belly stowage to the overhead compartments in the main cabin.  What was apparent is that weight is not an ad hoc measurement.  The aircraft’s weight distribution and tolerances are documented–and can be monitored as part of operations.

When engineering an aircraft, each component is assigned its weight.  Needless to say, weight is then allocated and measured as part of the development of subsystems of the aircraft.  One would not measure the overall weight of the aircraft or end item without ensuring that the components and subsystems did not conform to the weight limitations.  The overall weight limitation of an aircraft will very depending on mission and use.  If a commercial-type passenger airplane built to takeoff and land from modern runways, weight limitations are not as rigorous.  If the aircraft in question is going to takeoff and land from a carrier deck at sea then weight limitations become more critical.  (Side note:  I also learned these principles in detail while serving on active duty at NAS Norfolk and working with the Navy Air Depot there).  Aside from aircraft weight is important in a host of other items–from laptops to ships.  In the latter case, of which I am also intimately familiar, weight is important in balancing the ship and its ability to make way in the water (and perform its other missions).

So given that weight is an allocated element of performance within subsystem or component development, we achieve several useful bits of information.  First off, we can aggregate and measure weight of the entire end item to track if we are meeting the limitations of the item.  Secondly, we can perform trade-off.  If a subsystem or component can be made with a lighter material or more efficiently weight-wise, then we have more leeway (maybe) somewhere else.  Conversely, if we need weight for balance and the component or subsystem is too light, we need to figure out how to add weight or ballast.  So measuring and recording weight is not a problem. Finally, we allocate and tie performance-wise a key technical specification to the work, avoiding subjectivity.

So how to do we show value?  We do so by applying the same principles as any other method of earned value.  Each item of work is covered by a Work Breakdown Structure (WBS), which is tied (hopefully) to an Integrated Master Schedule (IMS).  A Performance Management Baseline (PMB) is applied to the WBS (or sometimes thought a resource-loaded IMS).  If we have properly constructed our Integrated Management Plan (IMP) prior to the IMS, we should clearly have tied the relationship of technical measures to the structure.  I acknowledge that not every program performs an IMP, but stating so is really an acknowledgement of a clear deficiency in our systems, especially involving complex R&D programs.  Since our work is measured in short increments against a PMB, we can claim 100% of a technical specification but be ahead of plan for the WBS elements involved.

It’s not as if the engineers in our industrial activities and aerospace companies have never designed a jet aircraft or some other item before.  Quite a bit of expertise and engineering know-how transfers from one program to the next.  There is a learning curve.  The more information we collect in that regard, the more effective that curve.  Hence my emphasis in recent posts on data.

For testing, the approach is the same.  A test can fail, that is, a rocket can explode on the pad or suffer some other mishap, but the components involved will succeed or fail based on the after-action report.  At that point we will know, through allocation of the test results, where we are in terms of technical performance.  While rocket science is involved in the item’s development, recording technical achievement is not rocket science.

Thus, while our measures of effectiveness, measures of performance, measures of progress, and technical performance will determine our actual achievement against a standard, our fiscal assessment of value against the PMB can still reflect whether we are ahead of schedule and below budget.  What it takes is an understanding of how to allocate more rigorous measures to the WBS that are directly tied to the technical specifications.  To do otherwise is to build a camel when a horse was expected or–as has been recorded in real life in previous programs–to build a satellite that cannot communicate, a Navy aircraft that cannot land on a carrier deck, a ship that cannot fight, and a vaccine that cannot be delivered and administered in the method required.  We learn from our failures, and that is the value of failure.

 

*There are colloquial expressions that allow for 100% to be exceeded, such as exceeding 100% of the tolerance of a manufactured item or system, which essentially means to exceed its limits and, therefore, breaking it.

Over at AITS.org — Black Swans: Conquering IT Project Failure & Acquisition Management

It’s been out for a few days but I failed to mention the latest article at AITS.org.

In my last post on the Blogging Alliance I discussed information theory, the physics behind software development, the economics of new technology, and the intrinsic obsolescence that exists as a result. Dave Gordon in his regular blog described this work as laying “the groundwork for a generalized theory of managing software development and acquisition.” Dave has a habit of inspiring further thought, and his observation has helped me focus on where my inquiries are headed…

To read more please click here.

Frame by Frame: Framing Assumptions and Project Success or Failure

When we wake up in the morning we enter the day with a set of assumptions about ourselves, our environment, and the world around us.  So too when we undertake projects.  I’ve just returned from the latest NDIA IPMD meeting in Washington, D.C. and the most intriguing presentation at the meeting was given by Irv Blickstein regarding a RAND root cause analysis of major program breaches.  In short, a major breach in the cost of a program is defined by the Nunn-McCurdy amendment that was first passed in 1982, in which a major defense program breaches its projected baseline cost by more than 15%.

The issue of what constitutes programmatic success and failure has generated a fair amount of discussion among the readers of this blog.  The report, which is linked above, is full of useful information regarding Major Defense Acquisition Program (also known as MDAP) breaches under Nunn-McCurdy, but for purposes of this post readers should turn to page 83.  In setting up a project (or program), project/program managers must make a set of assumptions regarding the “uncertain elements of program execution” centered around cost, technical performance, and schedule.  These assumptions are what are referred to as “framing assumptions.”

A framing assumption is one in which there are signposts along the way to determine if an assumption regarding the project/program has changed over time.  Thus, according to the authors, the precise definition of a framing assumption is “any explicit or implicit assumption that is central in shaping cost, schedule, or performance expectations.”  An interesting aspect of their perspective and study is that the three-legged stool of program performance relegates risk to serving as a method that informs the three key elements of program execution, not as one of the three elements.  I have engaged in several conversations over the last two weeks regarding this issue.  Oftentimes the question goes: can’t we incorporate technical performance as an element of risk?  Short answer:  No, you can’t (or shouldn’t).  Long answer: risk is a set of methods for overcoming the implicit invalidity of single point estimates found in too many systems being used (like estimates-at-complete, estimates-to-complete, and the various indices found in earned value management, as well as a means of incorporating qualitative environmental factors not otherwise categorizable), not an element essential to defining the end item application being developed and produced.  Looked at another way, if you are writing a performance specification, then performance is a key determinate of program success.

Additional criteria for a framing assumption are also provided in the RAND study.  These are that the assumptions must be determinative, that is, the consequences of the assumption being wrong significantly affects the program in an essential way.  They must also be unmitigable, that is, the consequences of the assumption being wrong are unavoidable.  They must be uncertain, that is, the outcome or certainty of it being right or wrong cannot be determined in advance.  They must be independent and not dependent on another event or series of events.  Finally, they must be distinctive, in setting the program apart from other efforts.

RAND then applied the framing assumption methodology to a number of programs.  The latest NDIA meeting was an opportunity to provide an update of conclusions based on the work first done in 2013.  What the researchers found was that framing assumptions which are kept at a high level, be developed early in a program’s life cycle, and should be reviewed on a regular basis to determine validity.  They also found that a program breached the threshold when a framing assumption became invalid.  Project and program managers, and requirements personnel have at least intuitively known this for quite some time.  Over the years, this is the reason given for requirements changes and contract modifications over the course of development that result in cost, performance, and schedule impacts.

What is different about the RAND study is that they have outlined a practical process for making these determinations early enough for a project/program to be adjusted with changing circumstances.  For example, the numbers of framing assumptions of all MDAPs in the study could be boiled down to four or five, which are easily tested against reality during the milestone and other reviews held over the course of a program.  This is particularly important given the lengthened time-frames of major acquisitions from development to production.

Looking at these results, my own observation is that this is a useful tool for identifying course corrections that are needed before they manifest into cost and schedule impacts, particularly given that leadership at PARCA has been stressing agile acquisition strategies.  The goal here, it seems, is to allow for course corrections before the inertia of the effort leads to failure or–more likely–the development and deployment of an end item that does not entirely meet the needs of the Defense Department.  (That such “disappointments” often far outstrip the capabilities of our adversaries is a topic for a different post).

I think the court is still out on whether course corrections, given the inertia of work and effort already expended at the point that a framing assumption would be tested as invalid, can ever truly be offsetting to the point of avoiding a breach, unless we then rebrand the existing effort as a new program once it has modified its structure to account for new framing assumptions.  Study after study has shown that project performance is pretty well baked in at the 20% mark.  For MDAPs, much of the front-loaded efforts in technology selection and application have been made.  After all, systems require inputs and to change a system requires more inputs, not less, to overcome the inertia of all of the previous effort, not to mention work in progress.   This is basic physics whether we are dealing with physical systems or complex adaptive (economic) systems.

Certainly, more efficient technology that affects the units of measurement within program performance can result in cost savings or avoidance, but that is usually not the case.  There is a bit of magical thinking here: that commercial technologies will provide a breakthrough to allow for such a positive effect.  This is an ideological idea not borne out by reality.  The fact is that most of the significant technological breakthroughs we have seen over the last 70 years–from the microchip to the internet and now to drones–have resulted from public investments, sometimes in public-private ventures, sometimes in seeded technologies that are then released into the public domain.  The purpose of most developmental programs is to invest in R&D to organically develop technologies (utilizing the talents of the quasi-private A&D industry) or provide economic incentives to incorporate technologies that do not currently exist.

Regardless, the RAND study has identified an important concept in determining the root causes of overruns.  It seems to me that a formalized process of identifying framing assumptions should be applied and done at the inception of the program.  The majority of the assessments to test the framing assumptions should then need to be made prior to the 20% mark as measured by program schedule and effort.  It is easier and more realistic to overcome the bow-wave of effort at that point than further down the line.

Note: I have modified the post to clarify my analysis of the “three-legged stool” of program performance in regard to where risk resides.