Sledgehammer: Pisano Talks!

My blogging hiatus is coming to an end as I take a sledgehammer to the writer’s block wall.

I’ve traveled far and wide over the last six months to various venues across the country and have collected a number of new and interesting perspectives on the issues of data transformation, integrated project management, and business analytics and visualization. As a result, I have developed some very strong opinions regarding the trends that work and those that don’t regarding these topics and will be sharing these perspectives (with the appropriate supporting documentation per usual) in following posts.

To get things started this post will be relatively brief.

First, I will be speaking along with co-presenter John Collins, who is a Senior Acquisition Specialist at the Navy Engineering & Logistics Office, at the Integrated Program Management Workshop at the Hyatt Regency in beautiful downtown Baltimore’s Inner Harbor 10-12 December. So come on down! (or over) and give us a listen.

The topic is “Unlocking Data to Improve National Defense Systems”. Today anyone can put together pretty visualizations of data from Excel spreadsheets and other sources–and some have made quite a bit of money doing so. But accessing the right data at the right level of detail, transforming it so that its information content can be exploited, and contextualizing it properly through integration will provide the most value to organizations.

Furthermore, our presentation will make a linkage to what data is necessary to national defense systems in constructing the necessary artifacts to support the Department of Defense’s Planning, Programming, Budgeting and Execution (PPBE) process and what eventually becomes the Future Years Defense Program (FYDP).

Traditionally information capture and reporting has been framed as a question of oversight, reporting, and regulation related to contract management, capital investment cost control, and DoD R&D and acquisition program management. But organizations that fail to leverage the new powerful technologies that double processing and data storage capability every 18 months, allowing for both the depth and breadth of data to expand exponentially, are setting themselves up to fail. In national defense, this is a condition that cannot be allowed to occur.

If DoD doesn’t collect this information, which we know from the reports of cybersecurity agencies that other state actors are collecting, we will be at a serious strategic disadvantage. We are in a new frontier of knowledge discovery in data. Our analysts and program managers think they know what they need to be viewing, but adding new perspectives through integration provide new perspectives and, as a result, will result in new indicators and predictive analytics that will, no doubt, overtake current practice. Furthermore, that information can now be processed and contribute more, timely, and better intelligence to the process of strategic and operational planning.

The presentation will be somewhat wonky and directed at policymakers and decisionmakers in both government and industry. But anyone can play, and that is the cool aspect of our community. The presentation will be non-commercial, despite my day job–a line I haven’t crossed up to this point in this blog, but in this latter case will be changing to some extent.

Back in early 2018 I became the sole proprietor of SNA Software LLC–an industry technology leader in data transformation–particularly in capturing datasets that traditionally have been referred to as “Big Data”–and a hybrid point solution that is built on an open business intelligence framework. Our approach leverages the advantages of COTS (delivering the 80% solution out of the box) with open business intelligence that allows for rapid configuration to adapt the solution to an organization’s needs and culture. Combined with COTS data capture and transformation software–the key to transforming data into information and then combining it to provide intelligence at the right time and to the right place–the latency in access to trusted intelligence is reduced significantly.

Along these lines, I have developed some very specific opinions about how to achieve this transformation–and have put those concepts into practice through SNA and delivered those solutions to our customers. Thus, the result has been to reduce both the effort and time to capture large datasets from data that originates in pre-processed data, and to eliminate direct labor and the duration to information delivery by more than 99%. The path to get there is not to apply an army of data scientists and data analysts that deals with all data as if it is flat and to reinvent the wheel–only to deliver a suboptimized solution sometime in the future after unnecessarily expending time and resources. This is a devolution to the same labor-intensive business intelligence approaches that we used back in the 1980s and 1990s. The answer is not to throw labor at data that already has its meaning embedded into its information content. The answer is to apply smarts through technology, and that’s what we do.

Further along these lines, if you are using hard-coded point solutions (also called purpose-built software) and knitted best-of-breed, chances are that you will find that you are poorly positioned to exploit new technology and will be obsolete within the next five years, if not sooner. The model of selling COTS solutions and walking away except for traditional maintenance and support is dying. The new paradigm will be to be part of the solution and that requires domain knowledge that translates into technology delivery.

More on these points in future posts, but I’ve placed the stake in the ground and we’ll see how they hold up to critique and comment.

Finally, I recently became aware of an extremely informative and cutting-edge website that includes podcasts from thought leaders in the area of integrated program management. It is entitled InnovateIPM and is operated and moderated by a gentleman named Rob Williams. He is a domain expert in project cost development, with over 20 years of experience in the oil, gas, and petrochemical industries. Robin has served in a variety of roles throughout his career and is now focuses on cost estimating and Front-End Loading quality assurance. His current role is advanced project cost estimator at Marathon Petroleum’s Galveston Bay Refinery in Texas City.

Rob was also nice enough to continue a discussion we started at a project controls symposium and interviewed me for a podcast. I’ll post additional information once it is posted.

Rear View Mirror — Correcting a Project Management Fallacy

“The past is never dead. It’s not even past.” —  William Faulkner, Requiem for a Nun

Over the years I and others have briefed project managers on project performance using KPPs, earned value management, schedule analysis, business analytics, and what we now call predictive analytics. Oftentimes, some set of figures will be critiqued as being ineffective or unhelpful; that the analytics “only look in the rear view mirror” and that they “tell me what I already know.”

In approaching this critique, it is useful to understand Faulkner’s oft-cited quote above.  When we walk down a street, let us say it is a busy city street in any community of good size, we are walking in the past.  The moment we experience something it is in the past.  If we note the present condition of our city street we will see that for every building, park, sidewalk, and individual that we pass on that sidewalk, each has a history.  These structures and the people are as much driven by their pasts as their expectations for the future.

Now let us take a snapshot of our street.  In doing so we can determine population density, ethnic demographics, property values, crime rate, and numerous other indices and parameters regarding what is there.  No doubt, if we stop here we are just “looking in the rear view mirror” and noting what we may or may not know, however certain our anecdotal filter.

Now, let us say that we have an affinity for this street and may want to live there.  We will take the present indices and parameters that noted above, which describe our geographical environment, and trend it.  We may find that housing pricing are rising or falling, that crime is rising or falling, etc.  If we delve into the street’s ownership history we may find that one individual or family possesses more than one structure, or that there is a great deal of diversity.  We may find that a Superfund site is not too far away.  We may find that economic demographics are pointing to stagnation of the local economy, or that the neighborhood is becoming gentrified.  Just by time-phasing and delving into history–by mapping out the trends and noting the significant historical background–provides us with enough information to inform us about whether our affinity is grounded in reality or practicality.

But let us say that, despite negatives, we feel that this is the next up-and-coming neighborhood.  We would need signs to make that determination.  For example, what kinds of businesses have moved into the neighborhood and what is their number?  What demographic do they target?  There are many other questions that can be asked to see if our economic analysis is valid–and that analysis would need to be informed by risk.

The fact of the matter is that we are always living with the past: the cumulative effect of the past actions of numerous individuals, including our own, and organizations, groups of individuals, and institutions; not to mention larger economic forces well beyond our control.  Any desired change in the trajectory of the system being evaluated must identify those elements that can be impacted or influenced, and an analysis of the effort that must be expended to bring about the change, is also essential.

This is a scientific fact, proven countless times by physics, biology, and other disciplines.  A deterministic universe, which provides for some uncertainty at any given point at our level of existence, drives the possible within very small limits of possibility and even smaller limits of probability.  What this means in plain language is that the future is usually a function of the past.

Any one number or index, no doubt, does not necessarily tell us something important.  But it could if it is relevant, material, and prompts further inquiry essential to project performance.

For example, let us look at an integrated master schedule that underlies a typical medium-sized project.

 

We will select a couple of metrics that indicates project schedule performance.  In the case below we are looking at task hits and misses and Baseline Execution Index, a popular index that determines efficiency in meeting baseline schedule planning.

Note that the chart above plots the performance over time.  What will it take to improve our efficiency?  So as a quick logic check on realism, let’s take a look at the work to date with all of the late starts and finishes.

Our bow waves track the cumulative effort to date.  As we work to clear missed starts or missed finishes in a project we also must devote resources to the accomplishment of current work that is still in line with the baseline.  What this means is that additional resources may need to be devoted to particular areas of work accomplishment or risk handling.

This is not, of course, the limit to our analysis that should be undertaken.  The point here is that at every point in history in every system we stand at a point of the cumulative efforts, risk, failure, success, and actions of everyone who came before us.  At the microeconomic level this is also true within our project management systems.  There are also external constraints and influences that will define the framing assumptions and range of possibilities and probabilities involved in project outcomes.

The shear magnitude of the bow waves that we face in all endeavors will often be too great to fully overcome.  As an analogy, a bow wave in complex systems is more akin to a tsunami as opposed to the tidal waves that crash along our shores.  All of the force of all of the collective actions that have preceded present time will drive our trajectory.

This is known as inertia.

Identifying and understanding the contributors to the inertia that is driving our performance is important to knowing what to do.  Thus, looking in the rear view mirror is important and not a valid argument for ignoring an inconvenient metric that may only require additional context.  Furthermore, knowing where we sit is important and not insignificant.  Knowing the factors that put us where we are–and the effort that it will take to influence our destiny–will guide what is possible and not possible in our future actions.

Note:  All charted data is notional and is not from an actual project.

You Know I’m No Good: 2016 Election Polls and Predictive Analytics

While the excitement and emotions of this past election work themselves out in the populace at large, as a writer and contributor to the use of predictive analytics, I find the discussion about “where the polls went wrong” to be of most interest.  This is an important discussion, because the most reliable polling organizations–those that have proven themselves out by being right consistently on a whole host of issues since most of the world moved to digitization and the Internet of Things in their daily lives–seemed to be dead wrong in certain of their predictions.  I say certain because the polls were not completely wrong.

For partisans who point to Brexit and polling in the U.K., I hasten to add that this is comparing apples to oranges.  The major U.S. polling organizations that use aggregation and Bayesian modeling did not poll Brexit.  In fact, there was one reliable U.K. polling organization that did note two factors:  one was that the trend in the final days was toward Brexit, and the other is that the final result was based on turnout, where greater turnout favored the “stay” vote.

But aside from these general details, this issue is of interest in project management because, unlike national and state polling, where there are sufficient numbers to support significance, at the micro-microeconomic level of project management we deal with very small datasets that expand the range of probable results.  This is not an insignificant point that has been made time-and-time again over the years, particularly given single-point estimates using limited time-phased data absent a general model that provides insight into what are the likeliest results.  This last point is important.

So let’s look at the national polls on the eve of the election according to RealClear.  IBD/TIPP Tracking had it Trump +2 at +/-3.1% in a four way race.  LA Times/USC had it Trump +3 at the 95% confidence interval, which essentially means tied.  Bloomberg had Clinton +3, CBS had Clinton +4, Fox had Clinton +4, Reuters/Ipsos had Clinton +3, ABC/WashPost, Monmouth, Economist/YouGov, Rasmussen, and NBC/SM had Clinton +2 to +6.  The margin for error for almost all of these polls varies from +/-3% to +/-4%.

As of this writing Clinton sits at about +1.8% nationally, the votes are still coming in and continue to confirm her popular vote lead, currently standing at about 300,000 votes.  Of the polls cited, Rasmussen was the closest to the final result.  Virtually every other poll, however, except IBD/TIPP, was within the margin of error.

The polling that was off in predicting the election were those that aggregated polls along with state polls, adjusted polling based on non-direct polling indicators, and/or then projected the chances of winning based on the probable electoral vote totals.  This is where things were off.

Among the most popular of these sites is Nate Silver’s FiveThirtyEight blog.  Silver established his bonafides in 2008 by picking winners with incredible accuracy, particularly at the state level, and subsequently in his work at the New York Times which continued to prove the efficacy of data in predictive analytics in everything from elections to sports.  Since that time his significant reputation has only grown.

What Silver does is determine the probability of an electoral outcome by using poll results that are transparent in their methodologies and that have a high level of confidence.  Silver’s was the most conservative of these types of polling organizations.  On the eve of the election Silver gave Clinton a 71% chance of winning the presidency. The other organizations that use poll aggregation, poll normalization, or other adjusting indicators (such as betting odds, financial market indicators, and political science indicators) include the New York Times Upshot (Clinton 85%), HuffPost (Clinton 98%), PredictWise (Clinton 89%), Princeton (Clinton >99%), DailyKos (Clinton 92%), Cook (Lean Clinton), Roth (Lean Clinton), and Sabato (Lean Clinton).

In order to understand what probability means in this context, the polls were using both bottom-up state polling to track the electoral college combined with national popular vote polling.  But keep in mind that, as Nate Silver wrote over the course of the election, that just a 17% chance of winning “is the same as your chances of losing a “game” of Russian roulette”.  Few of us would take that bet, particularly since the result of losing the game is finality.

Still, except for FiveThirtyEight, none of the other methods using probability got it right.  None, except FiveThirtyEight, left enough room for drawing the wrong chamber.  Also, in fairness, the Cook, Rothenberg, and Sabato projections also left enough room to see a Trump win if the state dominoes fell right.

The place that the models failed were in the states of Florida, North Carolina, Pennsylvania, Michigan, and Wisconsin.  In particular, even with Florida (result Trump +1.3%) and North Carolina (result Trump +3.8%), Trump would not win if Pennsylvania (result Trump +1.2%), Michigan (result Trump +.3), and Wisconsin (result Trump +1.0)–supposed Clinton firewall states–were not breached.  So what happened?

Among the possible factors are the effect of FBI Director Comey’s public intervention, which was too soon to the election to register in the polling; ineffective polling methods in rural areas (garbage in-garbage out), bad state polling quality, voter suppression, purging, and restrictions (of the battleground states this includes Florida, North Carolina, Wisconsin, Ohio, and Iowa), voter turnout and enthusiasm (aside from the factors of voter suppression), and the inability to peg the way the high level of undecided voters would go at the last minute.

In hindsight, the national polls were good predictors.  The sufficiency of the data in drawing significance, and the high level of confidence in their predictive power is borne out by the final national vote totals.

I think that where the polling failed in the projections of the electoral college was from the inability to take into account non-statistical factors, selection bias, and that the state poll models probably did not accurately reflect the electorate in the states given the lessons from the primaries.  Along these lines, I believe that if pollsters look at the demographics in the respective primaries that they will find that both voter enthusiasm and composition provide the corrective to their projections. Given these factors, the aggregators and probabilistic models should all have called the race too close to call.  I think both Monte Carlo and Bayesian methods in simulations will bear this out.

For example, as one who also holds a political science degree and so will put on that hat.  It is a basic tenet that negative campaigns depress voter participation.  This causes voters to select the lesser of two evils (or lesser of two weevils).  Voter participation was down significantly due to a unprecedentedly negative campaign.  When this occurs, the most motivated base will usually determine the winner in an election.  This is why midterm elections are so volatile, particularly after a presidential win that causes a rebound of the opposition party.  Whether this trend continues with the reintroduction of gerrymandering is yet to be seen.

What all this points to from a data analytics perspective is that one must have a model to explain what is happening.  Statistics by themselves, while correct a good bit of the time, will cause one to be overconfident of a result based solely on the numbers and simulations that give the false impression of solidity, particularly when one is in a volatile environment.  This is known as reification.  It is a fallacious way of thinking.  Combined with selection bias and the absence of a reasonable narrative model–one that introduces the social interactions necessary to understand the behavior of complex adaptive systems–one will often find that invalid results result.