Ground Control from Major Tom — Breaking Radio Silence: New Perspectives on Project Management

Since I began this blog I have used it as a means of testing out and sharing ideas about project management, information systems, as well to cover occasional thoughts about music, the arts, and the meaning of wisdom.

My latest hiatus from writing was due to the fact that I was otherwise engaged in a different sort of writing–tech writing–and in exploring some mathematical explorations related to my chosen vocation, aside from running a business and–you know–living life.  There are only so many hours in the day.  Furthermore, when one writes over time about any one topic it seems that one tends to repeat oneself.  I needed to break that cycle so that I could concentrate on bringing something new to the table.  After all, it is not as if this blog attracts a massive audience–and purposely so.  The topics on which I write are highly specialized and the members of the community that tend to follow this blog and send comments tend to be specialized as well.  I air out thoughts here that are sometimes only vaguely conceived so that they can be further refined.

Now that that is out of the way, radio silence is ending until, well, the next contemplation or massive workload that turns into radio silence.

Over the past couple of months I’ve done quite a bit of traveling, and so have some new perspectives that and trends that I noted and would like to share, and which will be the basis (in all likelihood) of future, more in depth posts.  But here is a list that I have compiled:

a.  The time of niche analytical “tools” as acceptable solutions among forward-leaning businesses and enterprises is quickly drawing to a close.  Instead, more comprehensive solutions that integrate data across domains are taking the market and disrupting even large players that have not adapted to this new reality.  The economics are too strong to stay with the status quo.  In the past the barrier to integration of more diverse and larger sets of data was the high cost of traditional BI with its armies of data engineers and analysts providing marginal value that did not always square with the cost.  Now virtually any data can be accessed and visualized.  The best solutions, providing pre-built domain knowledge for targeted verticals, are the best and will lead and win the day.

b.  Along these same lines, apps and services designed around the bureaucratic end-of-month chart submission process are running into the new paradigm among project management leaders that this cycle is inadequate, inefficient, and ineffective.  The incentives are changing to reward actual project management in lieu of project administration.  The core fallacy of apps that provide standard charts based solely on user’s perceptions of looking at data is that they assume that the PM domain knows what it needs to see.  The new paradigm is instead to provide a range of options based on the knowledge that can be derived from data.  Thus, while the options in the new solutions provide the standard charts and reports that have always informed management, KDD (knowledge discovery in database) principles are opening up new perspectives in understanding project dynamics and behavior.

c.  Earned value is *not* the nexus of Integrated Project Management (IPM).  I’m sure many of my colleagues in the community will find this statement to be provocative, only because it is what they are thinking but have been hesitant to voice.  A big part of their hesitation is that the methodology is always under attack by those who wish to avoid accountability for program performance.  Thus, let me make a point about Earned Value Management (EVM) for clarity–it is an essential methodology in assessing project performance and the probability of meeting the constraints of the project budget.  It also contributes data essential to project predictive analytics.  What the data shows from a series of DoD studies (currently sadly unpublished), however, is that it is planning (via a Integrated Master Plan) and scheduling (via an Integrated Master Schedule) that first ties together the essential elements of the project, and will record the baking in of risk within the project.  Risk manifested in poorly tying contract requirements, technical performance measures, and milestones to the plan, and then manifested in poor execution will first be recorded in schedule (time-based) performance.  This is especially true for firms that apply resource-loading in their schedules.  By the time this risk translates and is recorded in EVM metrics, the project management team is performing risk handling and mitigation to blunt the impact on the performance management baseline (the money).  So this still raises the question: what is IPM?  I have a few ideas and will share those in other posts.

d.  Along these lines, there is a need for a Schedule (IMS) Gold Card that provides the essential basis of measurement of programmatic risk during project execution.  I am currently constructing one with collaboration and will put out a few ideas.

e.  Finally, there is still room for a lot of improvement in project management.  For all of the gurus, methodologies, consultants, body shops, and tools that are out there, according to PMI, more than a third of projects fail to meet project goals, almost half to meet budget expectations, less than half finished on time, and almost half experienced scope creep, which, I suspect, probably caused “failure” to be redefined and under-reported in their figures.  The assessment for IT projects is also consistent with this report, with CIO.com reporting that more than half of IT projects fail in terms of meeting performance, cost, and schedule goals.  From my own experience and those of my colleagues, the need to solve the standard 20-30% slippage in schedule and similar overrun in costs is an old refrain.  So too is the frustration that it need take 23 years to deploy a new aircraft.  A .5 CPI and SPI (to use EVM terminology) is not an indicator of success.  What this indicates, instead, is that there need to be some adjustments and improvements in how we do business.  The first would be to adjust incentives to encourage and reward the identification of risk in project performance.  The second is to deploy solutions that effectively access and provide information to the project team that enable them to address risk.  As with all of the points noted in this post, I have some other ideas in this area that I will share in future posts.

Onward and upward.

Technical Ecstacy — Technical Performance and Earned Value

As many of my colleagues in project management know, I wrote a series of articles on the application of technical performance risk in project management back in 1997, one of which made me an award recipient from the institution now known as Defense Acquisition University.  Over the years various researchers and project organizations have asked me if I have any additional thoughts on the subject and the response up until now has been: no.  From a practical standpoint, other responsibilities took me away from the domain of determining the best way of recording technical achievement in complex projects.  Furthermore, I felt that the field was not ripe for further development until there were mathematics and statistical methods that could better approach the behavior of complex adaptive systems.

But now, after almost 20 years, there is an issue that has been nagging at me since publication of the results of the project studies that I led from 1995 through 1997.  It is this: the complaint by project managers in resisting the application of measuring technical achievement of any kind, and integrating it with cost performance, the best that anyone can do is 100%.  “All TPM can do is make my performance look worse”, was the complaint.  One would think this observation would not only not face opposition, especially from such an engineering dependent industry, but also because, at least in this universe, the best you can do is 100%.*  But, of course, we weren’t talking about the same thing and I have heard this refrain again at recent conferences and meetings.

To be honest, in our recommended solution in 1997, we did not take things as far as we could have.  It was always intended to be the first but not the last word regarding this issue.  And there have been some interesting things published about this issue recently, which I noted in this post.

In the discipline of project management in general, and among earned value practitioners in particular, the performance being measured oftentimes exceeds 100%.  But there is the difference.  What is being measured as exceeding 100% is progress against both a time-based and fiscally-based linear plan.  Most of the physical world doesn’t act nor can it be measured this way.  When measuring the attributes of a system or component against a set of physical or performance thresholds, linearity against a human-imposed plan oftentimes goes out the window.

But a linear progression can be imposed on the development toward the technical specification.  So then the next question is how do we measure progress during the development curve and duration.

The short answer, without repeating a summarization of the research (which is linked above) is through risk assessment, and the method that we used back in 1997 was a distribution curve that determined the probability of reaching the next step in the technical development.  This was based on well-proven systems engineering techniques that had been used in industry for many years, particularly at pre-Lockheed Martin Martin Marietta.  Technical risk assessment, even using simplistic 0-50-80-100 curves, provides a good approximation of probability and risk between each increment of development, though now there are more robust models.  For example, the use of Bayesian methodology, which introduces mathematical rigor into statistics, as outlined in this post by Eliezer Yudkowsky.  (As an aside, I strongly recommend his blogs for anyone interested in the cutting edge of rational inquiry and AI).

So technical measurement is pretty well proven.  But the issue that then presents itself (and presented itself in 1997) was how to derive value from technical performance.  Value is a horse of a different color.  The two bugaboos that were presented as being impassible roadblocks were weight and test failure.

Let’s take weight first.  On one of my recent trips I found myself seated in an Embraer E-jet.  These are fairly small aircraft, especially compared to conventional commercial aircraft, and are lightweight.  As such, they rely on a proper distribution and balance of weight, especially if one finds oneself at 5,000 feet above sea level with the long runway shut down, a 10-20 mph crosswind, and a mountain range rising above the valley floor in the direction of takeoff.  So the flight crew, when the cockpit noted a weight disparity, shifted baggage from belly stowage to the overhead compartments in the main cabin.  What was apparent is that weight is not an ad hoc measurement.  The aircraft’s weight distribution and tolerances are documented–and can be monitored as part of operations.

When engineering an aircraft, each component is assigned its weight.  Needless to say, weight is then allocated and measured as part of the development of subsystems of the aircraft.  One would not measure the overall weight of the aircraft or end item without ensuring that the components and subsystems did not conform to the weight limitations.  The overall weight limitation of an aircraft will very depending on mission and use.  If a commercial-type passenger airplane built to takeoff and land from modern runways, weight limitations are not as rigorous.  If the aircraft in question is going to takeoff and land from a carrier deck at sea then weight limitations become more critical.  (Side note:  I also learned these principles in detail while serving on active duty at NAS Norfolk and working with the Navy Air Depot there).  Aside from aircraft weight is important in a host of other items–from laptops to ships.  In the latter case, of which I am also intimately familiar, weight is important in balancing the ship and its ability to make way in the water (and perform its other missions).

So given that weight is an allocated element of performance within subsystem or component development, we achieve several useful bits of information.  First off, we can aggregate and measure weight of the entire end item to track if we are meeting the limitations of the item.  Secondly, we can perform trade-off.  If a subsystem or component can be made with a lighter material or more efficiently weight-wise, then we have more leeway (maybe) somewhere else.  Conversely, if we need weight for balance and the component or subsystem is too light, we need to figure out how to add weight or ballast.  So measuring and recording weight is not a problem. Finally, we allocate and tie performance-wise a key technical specification to the work, avoiding subjectivity.

So how to do we show value?  We do so by applying the same principles as any other method of earned value.  Each item of work is covered by a Work Breakdown Structure (WBS), which is tied (hopefully) to an Integrated Master Schedule (IMS).  A Performance Management Baseline (PMB) is applied to the WBS (or sometimes thought a resource-loaded IMS).  If we have properly constructed our Integrated Management Plan (IMP) prior to the IMS, we should clearly have tied the relationship of technical measures to the structure.  I acknowledge that not every program performs an IMP, but stating so is really an acknowledgement of a clear deficiency in our systems, especially involving complex R&D programs.  Since our work is measured in short increments against a PMB, we can claim 100% of a technical specification but be ahead of plan for the WBS elements involved.

It’s not as if the engineers in our industrial activities and aerospace companies have never designed a jet aircraft or some other item before.  Quite a bit of expertise and engineering know-how transfers from one program to the next.  There is a learning curve.  The more information we collect in that regard, the more effective that curve.  Hence my emphasis in recent posts on data.

For testing, the approach is the same.  A test can fail, that is, a rocket can explode on the pad or suffer some other mishap, but the components involved will succeed or fail based on the after-action report.  At that point we will know, through allocation of the test results, where we are in terms of technical performance.  While rocket science is involved in the item’s development, recording technical achievement is not rocket science.

Thus, while our measures of effectiveness, measures of performance, measures of progress, and technical performance will determine our actual achievement against a standard, our fiscal assessment of value against the PMB can still reflect whether we are ahead of schedule and below budget.  What it takes is an understanding of how to allocate more rigorous measures to the WBS that are directly tied to the technical specifications.  To do otherwise is to build a camel when a horse was expected or–as has been recorded in real life in previous programs–to build a satellite that cannot communicate, a Navy aircraft that cannot land on a carrier deck, a ship that cannot fight, and a vaccine that cannot be delivered and administered in the method required.  We learn from our failures, and that is the value of failure.

 

*There are colloquial expressions that allow for 100% to be exceeded, such as exceeding 100% of the tolerance of a manufactured item or system, which essentially means to exceed its limits and, therefore, breaking it.

For What It’s Worth — More on the Materiality and Prescriptiveness Debate and How it Affects Technological Solutions

The underlying basis on the materiality vs. prescriptiveness debate that I previously wrote about lies in two areas:  contractual compliance, especially in the enforcement of public contracts, and the desired outcomes under the establishment of a regulatory regime within an industry.  Sometimes these purposes are in agreement and sometimes they are in conflict and work at cross-purposes to one another.

Within a simple commercial contractual relationship, there are terms and conditions established that are based on the expectation of the delivery of supplies and services.  In the natural course of business these transactions are usually cut-and-dried: there is a promise for a promise, a meeting of the minds, consideration, and performance.  Even in cases that are heavily reliant on services, where the terms are bit more “fuzzy,” the standard is that the work being performed be done in a “workmanlike” or “professional” manner, usually defined by the norms of the trade or profession involved.  There is some judgment here depending on the circumstances, and so disputes tend to be both contentious and justice oftentimes elusive where ambiguity reigns.

In research and development contracts the ambiguities and contractual risks are legion.  Thus, the type of work and the ability to definitize that work will, to the diligent contract negotiator, determine the contract type that is selected.  In most cases in the R&D world, especially in government, contract types reflect a sharing and handling of risk that is reflected in the use of cost-plus type contracts.

Under this contract type, the effort is reimbursed to the contractor, who must provide documentation on expenses, labor hours, and work accomplished.  Overhead, G&A, and profit is negotiated based on a determination of what is fair and reasonable against benchmarks in the industry, which will be ultimately determined through negotiation of the parties.  A series of phases and milestones are established to mark the type of work that is expected to be accomplished over time.  The ultimate goal is the produce a prototype end item application that meets the needs of the agency, whether that agency is the Department of Defense or some other civilian agency in the government.

The period of performance of the contracts in these cases, depending on the amount of risk in research and development in pushing the requisite technology, usually involving several years.  Thus, the areas of concern given the usually high dollar value, inherent risk, and extended periods, involve:

  1. The reliability, accuracy, quality, consistency, and traceability of the underlying systems that report expenditures, effort, and progress;
  2. Measures that are indicative of whether all of the aspects of the eventual end item will meet elements that constitute expectations and standards of effectiveness, performance, and technical achievement.  These measures are conducted within the overall cost and schedule constraints of the contracted effort;
  3. Assessment over the lifecycle of the contract regarding external, as well as internal technical, qualitative, and quantitative risks of the effort;
  4. The ability of items 1 through 3 above to provide an effective indication or early warning that the contractual vehicle will significantly vary from either the contractual obligations or the established elements outlining the physical, performance, and technical characteristics of the end item.
  5. The more mundane, but no less important, verification of contractual performance against the terms and conditions to avoid a condition of breach.

Were these the only considerations in public contracting related to project management our work in evaluating these relationships, while challenging, would be fairly cut-and-dried given that they would be looked at from a contracting perspective.  But there is also a systemic purpose for a regulatory regime.  These are often in conflict with one another.  Such requirements as compliance, surveillance, process improvement, and risk mitigation are looking at the same systems, but from different perspectives with, ultimately, differing reactions, levels of effectiveness, and results.  What none of these purposes includes is a punitive purpose or result–a line oftentimes overstepped, in particular, by private parties.  This does not mean that some regulations that require compliance with a law do not come with civil penalties, but we’ll get to that in a moment.

The underlying basis of any regulatory regime is established in law.  The sovereign–in our case the People of the United States through the antecedent documents of governance, including the U.S. Constitution and Constitutions of the various states, as well as common law–possesses an inherent right to regulate the health, safety, and welfare of the people.  The Preamble of the U.S. Constitution actually specifies this purpose in writing, but in broader terms.  Thus, the purposes of a regulatory regime when it comes to this specific issue are what are at issue.

The various reasons usually are as follows:

  1. To prevent an irreversible harm from occurring.
  2. To enforce a particular level of professional conduct.
  3. To ensure compliance with a set of regulations or laws, especially where ambiguities in civil and common law have yielded judicial uncertainty.
  4. To determine the level of surveillance of a regulated system that is needed based on a set of criteria.
  5. To encourage particular behaviors.
  6. To provide the basis for system process improvement.

Thus, in applying a regulation there are elements that go beyond the overarching prescriptiveness vs. materiality debate.  Materiality only speaks to relevance or significance, while prescriptiveness relates to “block checking”–the mindless work of the robotic auditor.

For example, let’s take the example of two high profile examples of regulation in the news today.

The first concerns the case of where Volkswagen falsified its emissions test results for a good many of its vehicles.  The role of the regulator in this case was to achieve a desired social end where the state has a compelling interest–the reduction of air pollution from automobiles.  The regulator–the Environmental Protection Agency (EPA)–found the discrepancy and issued a notice of violation of the Clean Air Act.  The EPA, however, did not come across this information on its own.  Since we are dealing with multinational products, the initial investigation occurred in Europe under a regulator there and the results passed to the EPA.  The corrective action is to recall the vehicles and “make the parties whole.”  But in this case the regulator’s remedy may only be the first line of product liability.  It will be hard to recall the pollutants released into the atmosphere and breach of implicit contract with the buyers of the automobiles.  Whether a direct harm can be proven is now up to the courts, but given that executives in an internal review (article already cited) admitted that executives knew about deception, the remedies now extend to fraud.  Regardless of the other legal issues,

The other high profile example is the highly toxic levels of lead in the drinking water of Flint, Michigan.  In this case the same regulator, the EPA, has issued a violation of federal law in relation to safe drinking water.  But as with the European case, the high levels of lead were first discovered by local medical personnel and citizens.  Once the discrepancy was found a number of actions were required to be taken to secure proper drinking water.  But the damage has been done.  Children in particular tend to absorb lead in their neurological systems with long term adverse results.  It is hard to see how the real damage that has been inflicted will make the damaged parties whole.

Thus, we can see two things.  First, the regulator is firmly within the tradition of regulating the health, safety, and welfare, particularly the first category and second categories.  Secondly, the regulatory regime is reactive.

While obviously the specific illnesses caused by the additional pollution form Volkswagen vehicles is probably not directly traceable to harm, the harm in the case of elevated lead levels in Flint’s water supply is both traceable and largely irreversible.

Thus, in comparing these two examples, we can see that there are other considerations than the black and white construct of materiality and prescriptiveness.  For example, there are considerations of irreversible harm, prevention, proportionality, judgment, and intentional results.

The first reason for regulation listed above speaks to irreversible harm.  In these cases proportionality and prevention are the main concerns.  Ensuring that those elements are in place that will prevent some catastrophic or irreversible harm through some event or series of events is the primary goal in these cases.  When I say harm I do not mean run of the mill, litigious, constructive “harm” in the legal or contractual sense, but great harm–life and death, resulting disability, loss of livelihood, catastrophic market failure, denial of civil rights, and property destruction kind of harm.  In enforcing such goals, these fall most in line with prescriptiveness–the establishment of particular controls which, if breached, would make it almost impossible to fully recover without a great deal of additional cost or effort.  Furthermore, when these failures occur a determination of culpability or non-culpability is determined.  The civil penalties in these cases, where not specified by statute, are ideally determined by proportionality of the damage.  Oftentimes civil remedies are not appropriate since these often involve violations of law.  This occurs, in real life, from the two main traditional approaches to audit and regulation being rooted in prescriptive and judgmental approaches.

The remainder of the reasons for regulation provide degrees of oversight and remedy that are not only proportional to the resulting findings and effects, but also to the goal of the regulation and its intended behavioral response.  Once again, apart from the rare and restricted violations given in the first category above, these regulations are not intended to be enforced in a punitive manner, though there can be penalties for non-compliance.  Thus, proportionality, purpose, and reasonableness are additional considerations to take into account.  These oftentimes fall within the general category of materiality.

Furthermore, going beyond prescriptiveness and materiality, a paper entitled Applying Systems-Thinking to Reduce Check-the-Box Decisions in the Audit of Complex Estimates, by Anthony Bucaro at the University of Illinois at Urbana-Champaign, proposes an alternative auditing approach that also is applicable to other types of regulation, including contract management.  The issue that he is addressing is the fact that today, in using data, a new approach is needed to shift the emphasis to judgment and other considerations in whether a discrepancy warrants a finding of some sort.

This leads us, then, to the reason why I went down this line of inquiry.  Within project management, either a contractual or management prerogative already exists to apply a set of audits and procedures to ensure compliance with established business processes.  Particular markets are also governed by statutes regulating private conduct of a public nature.  In the government sphere, there is an added layer of statutes that prescribe a set of legal and administrative guidance.  The purposes of these various rules varies.  Obviously breaking a statute will garner the most severe and consequential penalties.  But the set of regulatory and administrative standards often act at cross purposes, and in their effect, do not rise to the level of breaking a law, unless they are necessary elements in complying with that law.

Thus, a whole host of financial and performance data assessing what, at the core, is a very difficult “thing” to execute (R&D leading to a new end item), offers some additional hazards under these rules.  The underlying question, outside of statute, concerns what the primary purpose should be in ensuring their compliance.  Does it pass the so-what? test if a particular administrative procedure is not followed to the letter?

Taking a broader approach, including a data-driven and analytical one, removes much of the arbitrariness when judgment and not box-checking is the appropriate approach.  Absent a consistent and wide pattern that demonstrates a lack of fidelity and traceability of data within the systems that have been established, auditors and public policymakers must look at the way that behavior is affected.  Are there incentives to hide or avoid issues, and are there sufficient incentives to find and correct deficiencies?  Are the costs associated with dishonest conclusions adequately addressed, and are there ways of instituting a regime that encourages honesty?

At the core is technology–both for the regulated and the regulator.  If the data that provides the indicators of compliance come, unhindered, from systems of record, then dysfunctional behaviors are minimized.  If that data is used in the proper manner by the regulator in driving a greater understanding of the systemic conditions underlying the project, as well as minimizing subjectivity, then the basis for trust is established in determining the most appropriate means of correcting a deficiency.  The devil is in the details, of course.  If the applied technology simply reproduces the check-block mentality, then nothing has been accomplished.  Business intelligence and systems intelligence must be applied in order to achieve the purposes that I outlined earlier.

 

Measure for Measure — Must Read: Dave Gordon Is Looking for Utilitarian Metrics at AITS.org

Dave Gordon at his AITS.org blog deals with the issue of metrics and what makes them utilitarian, this is, “actionable.”  Furthermore at his Practicing IT Project Management blog he challenges those in the IT program management community to share real life examples.  The issue of measures and whether they pass the “so-what?” test in an important one, since chasing, and drawing improper conclusions from, the wrong ones are a waste of money and effort at best, and can lead one to make very bad business decisions at worst.

In line with Dave’s challenge, listed below are the types of metrics (or measures) that I often come across.

1.  Measures of performance.  This type of metric is characterized by actual performance against a goal for a physical or functional attribute of the system being developed.  It can be measured across time as one of the axes, but the ultimate benchmark against what is being measured is against the requirement or goal.  Technical performance measurements often fall into this category, though I have seen instances where these TPM is listed in its own category.  I would argue that such separation is artificial.

2.  Measures of progress.  This type of metric is often time-based, oftentimes measured against a schedule or plan.  Measurement of schedule variances in terms of time or expenditure rates against a budget often fall into this category.

3.  Measures of compliance.  This type of metric is one that measures systemic conditions that must be met which, if not, indicates a fatal error in the integrity of the system.

4.  Measures of effectiveness.  This type of metric tracks against those measures related to the operational objectives of the project, usually specified under particular conditions.

5.  Measures of risk.  This type of metric measures quantitatively the effects of qualitative, systemic, and inherent risk.  Oftentimes qualitative and quantitative risk are separated, which is the means of identification and whether that means is recorded either indirectly or directly.  But, in reality, they are measuring different aspects and causes of the same phenomenon.

6.  Measures of health.  This type of metric measures the relative health of a system against a set of criteria.  In medicine there are a set of routine measures for biological subjects.  Measures of health distinguish themselves from measures of compliance in that any variation, while indicative of a possible problem, is not necessarily fatal.  Thus, a range of acceptable indicators or even some variation within the indicators can be acceptable.  So while these measures may point to a system issue, borderline areas may warrant additional investigation.

In any project management system, there are often correct and incorrect ways of constructing these measures.  The basis for determining whether they are correct, I think, is whether the end result metric possesses materiality and traceability to a particular tangible state or criteria.  According to Dave and others, a test of a good metric is whether it is “actionable”.  This is certainly a desirable characteristic, but I would suggest not a necessary one and is contained within materiality and traceability.

For example, some metrics are simply indicators, which suggest further investigation; others suggest an action when viewed in combination with others.  There is no doubt that the universe of “qualitative” measures is shrinking as we have access to bigger and better data that provide us with quantification.  Furthermore as stochastic and other mathematical tools develop, we will have access to more sophisticated means of measurement.  But for the present there will continue to be some of these non-quantifiable measures only because, with experience, we learn that there are different dimensions in measuring the behavior of complex adaptive systems over time that are yet to be fully understood, much less measured.

I also do not mean for this to be an exhaustive list.  Others that have some overlap to what I’ve listed come to mind, such as measures of efficiency (different than effectiveness and performance in some subtle ways), measures of credibility or fidelity (which has some overlap with measures of compliance and health, but really points to a measurement of measures), and measures of learning or adaptation, among others.

Frame by Frame: Framing Assumptions and Project Success or Failure

When we wake up in the morning we enter the day with a set of assumptions about ourselves, our environment, and the world around us.  So too when we undertake projects.  I’ve just returned from the latest NDIA IPMD meeting in Washington, D.C. and the most intriguing presentation at the meeting was given by Irv Blickstein regarding a RAND root cause analysis of major program breaches.  In short, a major breach in the cost of a program is defined by the Nunn-McCurdy amendment that was first passed in 1982, in which a major defense program breaches its projected baseline cost by more than 15%.

The issue of what constitutes programmatic success and failure has generated a fair amount of discussion among the readers of this blog.  The report, which is linked above, is full of useful information regarding Major Defense Acquisition Program (also known as MDAP) breaches under Nunn-McCurdy, but for purposes of this post readers should turn to page 83.  In setting up a project (or program), project/program managers must make a set of assumptions regarding the “uncertain elements of program execution” centered around cost, technical performance, and schedule.  These assumptions are what are referred to as “framing assumptions.”

A framing assumption is one in which there are signposts along the way to determine if an assumption regarding the project/program has changed over time.  Thus, according to the authors, the precise definition of a framing assumption is “any explicit or implicit assumption that is central in shaping cost, schedule, or performance expectations.”  An interesting aspect of their perspective and study is that the three-legged stool of program performance relegates risk to serving as a method that informs the three key elements of program execution, not as one of the three elements.  I have engaged in several conversations over the last two weeks regarding this issue.  Oftentimes the question goes: can’t we incorporate technical performance as an element of risk?  Short answer:  No, you can’t (or shouldn’t).  Long answer: risk is a set of methods for overcoming the implicit invalidity of single point estimates found in too many systems being used (like estimates-at-complete, estimates-to-complete, and the various indices found in earned value management, as well as a means of incorporating qualitative environmental factors not otherwise categorizable), not an element essential to defining the end item application being developed and produced.  Looked at another way, if you are writing a performance specification, then performance is a key determinate of program success.

Additional criteria for a framing assumption are also provided in the RAND study.  These are that the assumptions must be determinative, that is, the consequences of the assumption being wrong significantly affects the program in an essential way.  They must also be unmitigable, that is, the consequences of the assumption being wrong are unavoidable.  They must be uncertain, that is, the outcome or certainty of it being right or wrong cannot be determined in advance.  They must be independent and not dependent on another event or series of events.  Finally, they must be distinctive, in setting the program apart from other efforts.

RAND then applied the framing assumption methodology to a number of programs.  The latest NDIA meeting was an opportunity to provide an update of conclusions based on the work first done in 2013.  What the researchers found was that framing assumptions which are kept at a high level, be developed early in a program’s life cycle, and should be reviewed on a regular basis to determine validity.  They also found that a program breached the threshold when a framing assumption became invalid.  Project and program managers, and requirements personnel have at least intuitively known this for quite some time.  Over the years, this is the reason given for requirements changes and contract modifications over the course of development that result in cost, performance, and schedule impacts.

What is different about the RAND study is that they have outlined a practical process for making these determinations early enough for a project/program to be adjusted with changing circumstances.  For example, the numbers of framing assumptions of all MDAPs in the study could be boiled down to four or five, which are easily tested against reality during the milestone and other reviews held over the course of a program.  This is particularly important given the lengthened time-frames of major acquisitions from development to production.

Looking at these results, my own observation is that this is a useful tool for identifying course corrections that are needed before they manifest into cost and schedule impacts, particularly given that leadership at PARCA has been stressing agile acquisition strategies.  The goal here, it seems, is to allow for course corrections before the inertia of the effort leads to failure or–more likely–the development and deployment of an end item that does not entirely meet the needs of the Defense Department.  (That such “disappointments” often far outstrip the capabilities of our adversaries is a topic for a different post).

I think the court is still out on whether course corrections, given the inertia of work and effort already expended at the point that a framing assumption would be tested as invalid, can ever truly be offsetting to the point of avoiding a breach, unless we then rebrand the existing effort as a new program once it has modified its structure to account for new framing assumptions.  Study after study has shown that project performance is pretty well baked in at the 20% mark.  For MDAPs, much of the front-loaded efforts in technology selection and application have been made.  After all, systems require inputs and to change a system requires more inputs, not less, to overcome the inertia of all of the previous effort, not to mention work in progress.   This is basic physics whether we are dealing with physical systems or complex adaptive (economic) systems.

Certainly, more efficient technology that affects the units of measurement within program performance can result in cost savings or avoidance, but that is usually not the case.  There is a bit of magical thinking here: that commercial technologies will provide a breakthrough to allow for such a positive effect.  This is an ideological idea not borne out by reality.  The fact is that most of the significant technological breakthroughs we have seen over the last 70 years–from the microchip to the internet and now to drones–have resulted from public investments, sometimes in public-private ventures, sometimes in seeded technologies that are then released into the public domain.  The purpose of most developmental programs is to invest in R&D to organically develop technologies (utilizing the talents of the quasi-private A&D industry) or provide economic incentives to incorporate technologies that do not currently exist.

Regardless, the RAND study has identified an important concept in determining the root causes of overruns.  It seems to me that a formalized process of identifying framing assumptions should be applied and done at the inception of the program.  The majority of the assessments to test the framing assumptions should then need to be made prior to the 20% mark as measured by program schedule and effort.  It is easier and more realistic to overcome the bow-wave of effort at that point than further down the line.

Note: I have modified the post to clarify my analysis of the “three-legged stool” of program performance in regard to where risk resides.

Mo’Better Risk — Tournaments and Games of Failure Part II

My last post discussed economic tournaments and games of failure in how they describe the success and failure of companies, with a comic example for IT start-up companies.  Glen Alleman at his Herding Cats blog has a more serious response in handily rebutting those who believe that #NoEstimates, Lean, Agile, and other cult-like fads can overcome the bottom line, that is, apply a method to reduce inherent risk and drive success.  As Glen writes:

“It’s about the money. It’s always about the money. Many want it to be about them or their colleagues, or the work environment, or the learning opportunities, or the self actualization.” — Glen Alleman, Herding Cats

Perfectly good products and companies fail all the time.  Oftentimes the best products fail to win the market, or do so only fleetingly.  Just think of the roles of the dead (or walking dead) over the years:  Novell, WordPerfect, Visicalc, Harvard Graphics; the list can go on and on.  Thus, one point that I would deviate from Glen is that it is not always EBITDA.  If that were true then both Facebook and Amazon would not be around today.  We see tremendous payouts to companies with promising technologies acquired for outrageous sums of money, though they have yet to make a profit.  But for every one of these there are many others that see the light of day for a moment and then flicker out of existence

So what is going on and how does this inform our knowledge of project management?  For the measure of our success is time and money, in most cases.  Obviously not all cases.  I’ve given two cases of success that appeared to be failure in previous posts to this blog: the M1A1 Tank and the ACA.  The reason why these “failures” were misdiagnosed was that the agreed measure(s) of success were incorrect.  Knowing this difference, where, and how it applies is important.

So how do tournaments and games of failure play a role in project management?  I submit that the lesson learned from these observations is that we see certain types of behaviors that are encouraged that tend to “bake” certain risks into our projects.  In high tech we know that there will be a thousand failures for every success, but it is important to keep the players playing–at least it is in the interest of the acquiring organization to do so, and is in the public interest in many cases as well.  We also know that most IT projects by most measures–both contracted out and organic–tend to realize a high rate of failure.  But if you win an important contract or secure an important project, the rewards can be significant.

The behaviors that are reinforced in this scenario on the part of the competing organization is to underestimate the cost and time involved in the effort; that is, so-called “bid to win.”  On the acquiring organization’s part, contracting officers lately have been all too happy to award contracts they know to be too low (and normally out of the competitive range) even though they realize it to be significantly below the independent estimate.  Thus “buying in” provides a significant risk that is hard to overcome.

Other behaviors that we see given the project ecosystem are the bias toward optimism and requirements instability.

In the first case, bias toward optimism, we often hear project and program managers dismiss bad news because it is “looking in the rear view mirror.”  We are “exploring,” we are told, and so the end state will not be dictated by history.  We often hear a version of this meme in cases where those in power wish to avoid accountability.  “Mistakes were made” and “we are focused on the future” are attempts to change the subject and avoid the reckoning that will come.  In most cases, however, particularly in project management, the motivations are not dishonest but, instead, sociological and psychological.  People who tend to build things–engineers in general, software coders, designers, etc.–tend to be an optimistic lot.  In very few cases will you find one of them who will refuse to take on a challenge.  How many cases have we presented a challenge to someone with these traits and heard the refrain:  “I can do that.”?  This form of self-delusion can be both an asset and a risk.  Who but an optimist would take on any technically challenging project?  But this is also the trait that will keep people working to the bitter end in a failure that places the entire enterprise at risk.

I have already spent some bits in previous posts regarding the instability of requirements, but this is part and parcel of the traits that we see within this framework.  Our end users determine that given how things are going we really need additional functionality, features, or improvements prior to the product roll out.  Our technical personnel will determine that for “just a bit more effort” they can achieve a higher level of performance or add capabilities at marginal or tradeoff cost.  In many cases, given the realization that the acquisition was a buy-in, project and program managers allow great latitude in accepting as a change an item that was assumed to be in the original scope.

There is a point where one or more of these factors is “baked in” into the course that the project will take.  We can delude ourselves into believing that we can change the course of the trajectory of the system through the application of methods: Agile, Lean, Six Sigma, PMBOK, etc. but, in the end, if we exhaust our resources without a road map on how to do this we will fail.  Our systems must be powerful and discrete enough to note the trend that is “baked in” due to factors in the structure and architecture of the effort being undertaken.  This is the core risk that must be managed in any undertaking.  A good example that applies to a complex topic like Global Warming was recently illustrated by Neil deGrasse Tyson in the series Cosmos:

In this example Dr. Tyson is climate and the dog is the weather.  But in our own analogy Dr. Tyson can be the trajectory of the system with the dog representing the “noise” of periodic indicators and activity around the effort.  We often spend a lot of time and effort (which I would argue is largely unproductive) on influencing these transient conditions in simpler systems rather than on the core inertia of the system itself.  That is where the risk lies. Thus, not all indicators are the same.  Some are measuring transient anomalies that have nothing to do with changing the core direction of the system, others are more valuable.  These latter indicators are the ones that we need to cultivate and develop, and they reside in an initial measurement of the inherent risk of the system largely based on its architecture that is antecedent to the start of the work.

This is not to say that we can do nothing about the trajectory.  A simpler system can be influenced more easily.  We cannot recover the effort already expended–which is why even historical indicators are important.  It is because they inform our future expectations and, if we pay attention to them, they keep us grounded in reality.  Even in the case of Global Warming we can change, though gradually, what will be a disastrous result if we allow things to continue on their present course.  In a deterministic universe we can influence the outcomes based on the contingent probabilities presented to us over time.  Thus, we will know if we have handled the core risk of the system by focusing on these better indicators as the effort progresses.  This will affect its trajectory.

Of course, a more direct way of modifying these risks is to make systemic adjustments.  Do we really need a tournament-based system as it exists and is the waste inherent in accepting so much failure really necessary?  What would that alternative look like?