Technical Foul — It’s Time for TPI in EVM

For more than 40 years the discipline of earned value management (EVM) has gone through a number of changes in its descriptions, governance, and procedures.  During that same time its community has been resistant to improvements in its methodology or to changes that extend its value when taking into account other methods that either augment its usefulness, or that potentially provide more utility in the area of performance management.  This has been especially the case where it is suggested that EVM is just one of many methodologies that contribute to this assessment under a more holistic approach.

Instead, it has been asserted that EVM is the basis for integrated project management.  (I disagree–and solely on the evidence that if it was so, then project managers would more fully participate in its organizations and conferences.  This would then pose the problem that PMs might then propose changes to EVM that, well…default to the second sentence in this post).  As evidence it need only be mentioned that there has been resistance to such recent developments in using earned schedule, technical performance, and risk–most especially risk based on Bayesian analysis).

Some of this resistance is understandable.  First, it took quite a long time just to get to a consensus on the application of EVM, though its principles and methods are based on simple and well proven statistical methods.  Second, the industries in which EVM has been accepted are sensitive to risk, and so a bureaucracy of practitioners have grown to ensure both consensus and compliance to accepted methods.  Third, the community that makes up practitioners of EVM consist mostly of cost analysts, trained in simple accounting, arithmetic, and statistical methodology.  It is thus a normal human bias to assume that the path of one’s previous success is the way to future success, though our understanding of the design space (reality) that we inhabit has been enhanced through new knowledge.  Fourth, there is a lot of data that applies to project management, and the EVM community is only now learning of the ways that this other data impacts our understanding of measuring project performance and the probability of reaching project goals in rolling out a product.  Finally, there is the less defensible reason that a lot of people and firms have built their careers that depends on maintaining the status quo.

Our ability to integrate disparate datasets is accelerating on a yearly basis thanks to digital technology, and the day in achieving integration of all relevant factors in project and enterprise performance is inevitable.  To be frank, I am personally engaged in such projects and am assisting organizations in moving in this direction today.  Regardless, we can make an advance in the discipline of performance management by pulling down low hanging fruit.  The most reachable one, in my opinion, is technical performance measurement.

The literature of technical performance has come quite a long way, thanks largely to the work of the Institute for Defense Analyses (IDA) and others, particularly the National Defense Industrial Association through the publication of their predictive measures guide.  This has been a topic of interest to me since its study was part of my duties back when I was still wearing a uniform.  The early results of these studies resulted in a paper that proposed a method of integrating technical performance, earned value, and risk.  A pretty comprehensive overview of the literature and guidance for technical performance can be found at this presentation by Glen Alleman and Tom Coonce given at EVM World in 2015.  It must be mentioned that Rick Price of Lockheed Martin also contributed greatly to this literature.

Keep in mind what is meant when we decide to assess technical performance within the context of R&D.  It is an assessment against expected or specified:

a.  Measures of Effectiveness (MoE)

b.  Measures of Performance (MoP), and

c.  Key Performance Parameters (KPP)

The opposition from the project management community to widespread application of this methodology took two forms.  First, it was argued, the method used to adjust the value of earned (CPI) seemed always to have a negative impact.  Second, there are technical performance factors that transcend the WBS, and so it is hard to properly adjust the individual control accounts based on the contribution of technical performance.  Third, some performance measures defy an assessment of value in a time-phased manner.  The most common example has been tracking weight of aircraft, which has contributors from virtually all components that go into it.

Let’s take these in order.  But lest one think that this perspective is an artifact from 1997, just a short while ago, in the A&D community, the EVM policy office at DoD attempted to apply a somewhat modest proposal of ensuring that technical performance was included as an element in EVM reporting.  Note that the EIA 748 standard states this clearly and has done so for quite some time.  Regardless, the same three core objections were raised in comments from the industry.  Thus, this caused me to ask some further in-depth questions and my revised perspective follows below.

The first condition occurred, in many cases, due to optimism bias in registering earned value, which often occurs when using a single point estimate of percent complete by a limited population of experts contributing to an assessment of the element.  Fair enough, but as you can imagine, its not a message that a PM wants to hear or will necessarily accept or admit, regardless of the merits.  There are more than enough pathways to second guessing and testing selection bias at other levels of reporting.  Glen Alleman in his Herding Cats blog post of 12 August has a very good post listing the systemic reasons for program failure.

Another factor is that the initial methodology did possess a skewing toward more pessimistic results.  This was not entirely apparent at the time because the statistical methods applied did not make that clear.  But, to critique that first proposal, which was the result of contributions from IDA and other systems engineering technical experts, the 10-50-90 method in assessing probability along the bandwidth of the technical performance baseline was too inflexible.  The graphic that we proposed is as follows and one can see that, while it was “good enough”, if rolled up there could be some bias that required adjustment.

TPM Graphic

 

Note that this range around 50% can be interpreted to be equivalent to the bandwidth found in the presentation given by Alleman and Coonce (as well as the Predictive Measures Guide), though the intent here was to perform an assessment based on a simplified means of handicapping the handicappers–or more accurately, performing a probabilistic assessment on expert opinion.  The method of performing Bayesian analysis to achieve this had not yet matured for such applications, and so we proposed a method that would provide a simple method that our practitioners could understand that still met the criteria of being a valid approach.  The reason for the difference in the graphic resides in the fact that the original assessment did not view this time-phasing as a continuous process, but rather an assessment at critical points along the technical baseline.

From a practical perspective, however, the banding proposed by Alleman and Coonce take into account the noise that will be experienced during the life cycle of development, and so solves the slight skewing toward pessimism.  We’ll leave aside for the moment how we determine the bands and, thus, acceptable noise as we track along our technical baseline.

The second objection is valid only so far as any alignment of work-related indicators vary from project to project.  For example, some legs of the WBS tree go down nine levels and others go down five levels, based on the complexity of the work and the organizational breakdown structure (OBS).  Thus where we peg within each leg of the tree the control account (CA) and work package (WP) level becomes relative.  Do the schedule activities have a one-to-one relationship or many-to-one relationship with the WP level in all legs?  Or is the lowest level that the alignment can be made in certain legs at the CA level?

Given that planning begins with the contract spec and (ideally) proceed from IMP –> IMS –> WBS –> PMB in a continuity, then we will be able to determine the contributions of TPM to each WBS element at their appropriate level.

This then leads us to another objection, which is that not all organizations bother with developing an IMP.  That is a topic for another day, but whether such an artifact is created formally or not, one must achieve in practice the purpose of the IMP in order to get from contract spec to IMS under a sufficiently complex effort to warrant CPM scheduling and EVM.

The third objection is really a child of the second objection.  There very well may be TPMs, such as weight, with so many contributors that distributing the impact would both dilute the visibility of the TPM and present a level of arbitrariness in distribution that would render its tracking useless.  (Note that I am not saying that the impact cannot be distributed because, given modern software applications, this can easily be done in an automated fashion after configuration.  My concern is in regard to visibility on a TPM that could render the program a failure).  In these cases, as with other indicators that must be tracked, there will be high level programmatic or contract level TPMs.

So where do we go from here?  Alleman and Coonce suggest adjusting the formula for BCWP, where P is informed by technical risk.  The predictive measures guide takes a similar approach and emphasizes the systems engineering (SE) domain in getting to an assessment to determine the impact of reported EVM element performance.  The recommendation of the 1997 project that I headed in assignments across Navy and OSD, was to inform performance based on a risk assessment of probable achievement at each discrete performance milestone.  What all of these studies have in common, and in common with common industry practice using SE principles, is an intermediate assessment, informed by risk, of a technical performance index against a technical performance baseline.

So let’s explore this part of the equation more fully.

Given that we have MoE, MoP, and KPP are identified for the project, different methods of determining progress apply.  This can be a very simplistic set of TPMs that, through the acquisition or fabrication of compliant materials, meet contractual requirements.  These are contract level TPMs.  Depending on contract type, achievement of these KPPs may result in either financial penalties or financial reward.  Then there are the R&D-dependent MoEs, MoPs, and KPPs that require more discrete time-phasing and ties to the physical completion of work documented by through the WBS structure.  As with EVM on the measurement of the value of work, our index of physical technical achievement can be determined through various methods: current EVM methods, simulated Monte Carlo technical risk, 10-50-90 risk assessment, Bayesian analysis, etc.  All of these methods are designed to militate against selection bias and the inherent limitations of limited sample size and, hence, extreme subjectivity.  Still, expert opinion is a valid method of assessment and (in cases where it works) better than a WAG or coin flip.

Taken together these TPMs can be used to determine the technical achievement of the project or program over time, with a financial assessment of the future work needed to bring it in line.  These elements can be weighted, as suggested by Coonce, Alleman, and Price, through an assessment of relative risk to project success.  Some of these TPIs will apply to particular WBS elements at various levels (since their efforts are tied to specific activities and schedules via the IMS), and the most important project and program-level TPMs are reflected at that level.

What about double counting?  A comparison of the aggregate TPIs and the aggregate CPI and SPI will determine the fidelity of the WBS to technical achievement.  Furthermore, a proper baseline review will ensure that double counting doesn’t occur.  If the element can be accounted for within the reported EVM elements, then it need not be tracked separately by a TPI.  Only those TPMs that cannot be distributed or that represent such overarching risk to project success need be tracked separately, with an overall project assessment made against MR or any reprogramming budget available that can bring the project back into spec.

My last post on project management concerned the practices at what was called Google X.  There incentives are given to teams that identify an unacceptably high level of technical risk that will fail to pay off within the anticipated planning horizon.  If the A&D and DoD community is to become more nimble in R&D, it needs the necessary tools to apply such long established concepts such as Cost-As-An-Independent-Variable (CAIV), and Agile methods (without falling into the bottomless pit of unsupported assertions by the cult such as elimination of estimating and performance tracking).

Even with EVM, the project and program management community needs a feel for where their major programmatic efforts are in terms of delivery and deployment, in looking at the entire logistics and life cycle system.  The TPI can be the logic check of whether to push ahead, or finishing the low risk items that are remaining in R&D to move to first item delivery, or to take the lessons learned from the effort, terminate the project, and incorporate those elements into the next generation project or related components or systems.  This aligns with the concept of project alignment with framing assumptions as an early indicator of continued project investment at the corporate level.

No doubt, existing information systems, many built using 1990s technology and limited to line-and-staff functionality, do not provide the ability to do this today.  Of course, these same systems do not take into account a whole plethora of essential information regarding contract and financial management: from the tracking of CLINs/SLINs, to work authorization and change order processing, to the flow of funding from TAB to PMB/MR and from PMB to CA/UB/PP, contract incentive threshold planning, and the list can go on.  What this argues for is innovation and rewarding those technology solutions that take a more holistic approach to project management within its domain as a subset of program, contract, and corporate management–and such solutions that do so without some esoteric promise of results at some point in the future after millions of dollars of consulting, design, and coding.  The first company or organization that does this will reap the rewards of doing so.

Furthermore, visibility equals action.  Diluting essential TPMs within an overarching set of performance metrics may have the effect of hiding them and failing to properly identify, classify, and handle risk.  Including TPI as an element at the appropriate level will provide necessary visibility to get to the meat of those elements that directly impact programmatic framing assumptions.

No Bucks, No Buck Rogers — Project Work Authorizations, Change Control, and Cash Flow

As I’ve written here most recently, the most significant proposal coming out of the Integrated Program Management Conference (IPMC) this year was the comprehensive manner of integrating all essential elements of a project, presented by Glen Alleman et al.  In their presentation, Alleman, Coonce, and Price, present a process flow (which, in my estimation, should be mirrored in data and information flow) in which program artifacts were imbued with measures of effectiveness, measures of performance, and measures of progress, to achieve an organic integration of all parts of the project that allow the project team to make a valid assessment of achievement against the plan, informed by risk and opportunity.  (Emphasis my own).  The three-legged stool of cost, schedule, and technical performance are thereby integrated properly at the appropriate level of the project structure, and done in such a way as to overcome the rigidity and fallacy of the single point estimate.

But, as is always the case with elegant models, while they replicate a sufficient portion of reality to allow us to make our assessments using statistical methods, there are other elements that we have purposely left out because our present models do not incorporate them into the normal and normative process.  They are considered situational, and so lie just outside of the process flow, though they insert themselves when necessary–and much more frequently than desired.  I am referring to the availability of money and resources, and the manner in which they affect the project: through work authorizations (WADs) and baseline change requests (BCRs).

I have seen situations where fully 90% of the effort in project management is devoted to manage and adjust the plan based on baseline changes.  This is particularly the case where estimates are poorly developed due to the excuse of uncertainty.  Of course there is uncertainty–that’s the purpose of developing a plan.  The issue isn’t the presence of risk (and opportunity) but that our risks are educated ones, that is, informed by familiarity with similar efforts, engineering assessment, core competency, and other empirical factors.  This is where the most radical elements of the Agile Cult gets it wrong–in focusing on risk and assuming that the only way to realize opportunity is to forgo the empirical process.  This is not only a misreading of risk and opportunity assessment in project management, it is a sort of neo-Luddite position regarding scientific management.

The environment in which a project operates undergoes change.  The framing assumptions of the project determine the expectations of scope, cost, and what defines success.  The concept of framing assumptions was fully developed in a RAND study that I covered in a previous blog post.  Most often, but not always, the change in framing assumptions is reflected in the WAD and BCR process, most often in the latter.  Thus, we have a means of determining and taking account of changes in framing assumptions.  This is in the normal process of project management, as opposed to the more obvious examples of a complete replan or over target baseline (OTB).

So where do we track WADs and BCRs in our processes that will provide us sufficient indicators in our measures of effectiveness, performance, and progress that our resources (both size and type) many not be sufficient or that these changes are sufficient enough that our framing assumptions have changed?  I would argue that the linkage for resources must also be made through the Integrated Master Plan (IMP) and reflect in the IMS, cross-referenced to the PMB.  Technology can provide the remainder of the ability to integrate these elements and provide the process flow necessary to provide early warning.  This integration goes beyond the traditional focus on cost and schedule (and the newly reintroduced emphasis on technical achievement).  It involves integration with resource management systems (personnel, skillset assignments, etc.) as well as financial management systems to determine the availability of money (both its sufficiency and “color”*) being applied to the right place at the right time.

Integrating these elements together then allows for more sophisticated methods of determining project success through the introduction of metrics that provide correlations between the elements.  It also answers, absent politics, the optimum level of both analysis and reporting.

*The “color” of money applies mostly to public investments in which monies appropriated are designed by their purpose:  operations, maintenance, engineering, R&D, etc.

Note: This post was modified to add a point of clarification in applying WADs and BCRs to the PMB.

Frame by Frame: Framing Assumptions and Project Success or Failure

When we wake up in the morning we enter the day with a set of assumptions about ourselves, our environment, and the world around us.  So too when we undertake projects.  I’ve just returned from the latest NDIA IPMD meeting in Washington, D.C. and the most intriguing presentation at the meeting was given by Irv Blickstein regarding a RAND root cause analysis of major program breaches.  In short, a major breach in the cost of a program is defined by the Nunn-McCurdy amendment that was first passed in 1982, in which a major defense program breaches its projected baseline cost by more than 15%.

The issue of what constitutes programmatic success and failure has generated a fair amount of discussion among the readers of this blog.  The report, which is linked above, is full of useful information regarding Major Defense Acquisition Program (also known as MDAP) breaches under Nunn-McCurdy, but for purposes of this post readers should turn to page 83.  In setting up a project (or program), project/program managers must make a set of assumptions regarding the “uncertain elements of program execution” centered around cost, technical performance, and schedule.  These assumptions are what are referred to as “framing assumptions.”

A framing assumption is one in which there are signposts along the way to determine if an assumption regarding the project/program has changed over time.  Thus, according to the authors, the precise definition of a framing assumption is “any explicit or implicit assumption that is central in shaping cost, schedule, or performance expectations.”  An interesting aspect of their perspective and study is that the three-legged stool of program performance relegates risk to serving as a method that informs the three key elements of program execution, not as one of the three elements.  I have engaged in several conversations over the last two weeks regarding this issue.  Oftentimes the question goes: can’t we incorporate technical performance as an element of risk?  Short answer:  No, you can’t (or shouldn’t).  Long answer: risk is a set of methods for overcoming the implicit invalidity of single point estimates found in too many systems being used (like estimates-at-complete, estimates-to-complete, and the various indices found in earned value management, as well as a means of incorporating qualitative environmental factors not otherwise categorizable), not an element essential to defining the end item application being developed and produced.  Looked at another way, if you are writing a performance specification, then performance is a key determinate of program success.

Additional criteria for a framing assumption are also provided in the RAND study.  These are that the assumptions must be determinative, that is, the consequences of the assumption being wrong significantly affects the program in an essential way.  They must also be unmitigable, that is, the consequences of the assumption being wrong are unavoidable.  They must be uncertain, that is, the outcome or certainty of it being right or wrong cannot be determined in advance.  They must be independent and not dependent on another event or series of events.  Finally, they must be distinctive, in setting the program apart from other efforts.

RAND then applied the framing assumption methodology to a number of programs.  The latest NDIA meeting was an opportunity to provide an update of conclusions based on the work first done in 2013.  What the researchers found was that framing assumptions which are kept at a high level, be developed early in a program’s life cycle, and should be reviewed on a regular basis to determine validity.  They also found that a program breached the threshold when a framing assumption became invalid.  Project and program managers, and requirements personnel have at least intuitively known this for quite some time.  Over the years, this is the reason given for requirements changes and contract modifications over the course of development that result in cost, performance, and schedule impacts.

What is different about the RAND study is that they have outlined a practical process for making these determinations early enough for a project/program to be adjusted with changing circumstances.  For example, the numbers of framing assumptions of all MDAPs in the study could be boiled down to four or five, which are easily tested against reality during the milestone and other reviews held over the course of a program.  This is particularly important given the lengthened time-frames of major acquisitions from development to production.

Looking at these results, my own observation is that this is a useful tool for identifying course corrections that are needed before they manifest into cost and schedule impacts, particularly given that leadership at PARCA has been stressing agile acquisition strategies.  The goal here, it seems, is to allow for course corrections before the inertia of the effort leads to failure or–more likely–the development and deployment of an end item that does not entirely meet the needs of the Defense Department.  (That such “disappointments” often far outstrip the capabilities of our adversaries is a topic for a different post).

I think the court is still out on whether course corrections, given the inertia of work and effort already expended at the point that a framing assumption would be tested as invalid, can ever truly be offsetting to the point of avoiding a breach, unless we then rebrand the existing effort as a new program once it has modified its structure to account for new framing assumptions.  Study after study has shown that project performance is pretty well baked in at the 20% mark.  For MDAPs, much of the front-loaded efforts in technology selection and application have been made.  After all, systems require inputs and to change a system requires more inputs, not less, to overcome the inertia of all of the previous effort, not to mention work in progress.   This is basic physics whether we are dealing with physical systems or complex adaptive (economic) systems.

Certainly, more efficient technology that affects the units of measurement within program performance can result in cost savings or avoidance, but that is usually not the case.  There is a bit of magical thinking here: that commercial technologies will provide a breakthrough to allow for such a positive effect.  This is an ideological idea not borne out by reality.  The fact is that most of the significant technological breakthroughs we have seen over the last 70 years–from the microchip to the internet and now to drones–have resulted from public investments, sometimes in public-private ventures, sometimes in seeded technologies that are then released into the public domain.  The purpose of most developmental programs is to invest in R&D to organically develop technologies (utilizing the talents of the quasi-private A&D industry) or provide economic incentives to incorporate technologies that do not currently exist.

Regardless, the RAND study has identified an important concept in determining the root causes of overruns.  It seems to me that a formalized process of identifying framing assumptions should be applied and done at the inception of the program.  The majority of the assessments to test the framing assumptions should then need to be made prior to the 20% mark as measured by program schedule and effort.  It is easier and more realistic to overcome the bow-wave of effort at that point than further down the line.

Note: I have modified the post to clarify my analysis of the “three-legged stool” of program performance in regard to where risk resides.