The Monster Mash — Zombie Ideas in Project and Information Management

Just completed a number of meetings and discussions among thought leaders in the area of complex project management this week, and I was struck by a number of zombie ideas in project management, especially related to information, that just won’t die.  The use of the term zombie idea is usually attributed to the Nobel economist Paul Krugman from his excellent and highly engaging (as well as brutally honest) posts at the New York Times, but for those not familiar, a zombie idea is “a proposition that has been thoroughly refuted by analysis and evidence, and should be dead — but won’t stay dead because it serves a political purpose, appeals to prejudices, or both.”

The point is that to a techie–or anyone engaged in intellectual honesty–is that they are often posed in the form of question begging, that is, they advance invalid assumptions in the asking or the telling.  Most often they take the form of the assertive half of the same coin derived from “when did you stop beating your wife?”-type questions.  I’ve compiled a few of these for this post and it is important to understand the purpose for doing so.  It is not to take individuals to task or to bash non-techies–who have a valid reason to ask basic questions based on what they’ve heard–but propositions put forth by people who should know better based on their technical expertise or experience.  Furthermore, knowing and understanding technology and its economics is really essential today to anyone operating in the project management domain.

So here are a few zombies that seem to be most common:

a.  More data equals greater expense.  I dealt with this issue in more depth in a previous post, but it’s worth repeating here:  “When we inform Moore’s Law by Landauer’s Principle, that is, that the energy expended in each additional bit of computation becomes vanishingly small, it becomes clear that the difference in cost in transferring a MB of data as opposed to a KB of data is virtually TSTM (“too small to measure”).”  The real reason why we continue to deal with this assertion is both political in nature and also based in social human interaction.  People hate oversight and they hate to be micromanaged, especially to the point of disrupting the work at hand.  We see behavior, especially in regulatory and contractual relationships, where the reporting entity plays the game of “hiding the button.”  This behavior is usually justified by pointing to examples of dysfunction, particularly on the part of the checker, where information submissions lead to the abuse of discretion in oversight and management.  Needless to say, while such abuse does occur, no one has yet to point quantitatively to data (as opposed to anecdotally) that show how often this happens.

I would hazard to guess that virtually anyone with some experience has had to work for a bad boss; where every detail and nuance is microscopically interrogated to the point where it becomes hard to make progress on the task at hand.  Such individuals, who have been advanced under the Peter principle must, no doubt, be removed from such a position.  But this often happens in any organization, whether it be in private enterprise–especially in places where there is no oversight, check-and-balances, means of appeal, or accountability–or government–and is irrelevant to the assertion.  The expense item being described is bad management, not excess data.  Thus, such assertions are based on the antecedent assumption of bad management, which goes hand-in-hand with…

b. More information is the enemy of efficiency.  This is the other half of the economic argument to more data equals greater expense.  And I failed to mention that where the conflict has been engaged over these issues, some unjustifiable figure is given for the additional data that is certainly not supported by the high tech economics cited above.  Another aspect of both of these perspectives also comes from the conception of non-techies that more data and information is equivalent to pre-digital effort, especially in conceptualizing the work that often went into human-readable reports.  This is really an argument that supports the assertion that it is time to shift the focus from fixed report formatting functionality in software based on limited data to complete data, which can be formatted and processed as necessary.  If the right and sufficient information is provided up-front, then additional questions and interrogatories that demand supplemental data and information–with the attendant multiplication of data streams and data islands that truly do add cost and drive inefficiency–are at least significantly reduced, if not eliminated.

c.  Data size adds unmanageable complexity.  This was actually put forth by another software professional–and no doubt the non-techies in the room would have nodded their heads in agreement (particularly given a and b above), if opposing expert opinion hadn’t been offered.  Without putting too fine a point on it, a techie saying this to an open forum is equivalent to whining that your job is too hard.  This will get you ridiculed at development forums, where you will be viewed as an insufferable dilettante.  Digitized technology for well over 40 years has been operating under the phenomenon of Moore’s Law.  Under this law, computational and media storage capability doubles at least every two years under the original definition, though that equation has accelerated to somewhere between 12 and 24 months.  Thus, what was considered big data, say, in 1997 when NASA first coined the term, is not considered big data today.  No doubt, what is considered big data this year will not be considered big data two years from now.  Thus, the term itself is relative and may very well become archaic.  The manner in which data is managed–its rationalization and normalization–is important in successfully translating disparate data sources, but the assertion that big is scary is simply fear mongering because you don’t have the goods.

d.  Big data requires more expensive and sophisticated approaches.  This flows from item c above as well and is often self-serving.  Scare stories abound, often using big numbers which sound scary.  All data that has a common use across domains has to be rationalized at some point if they come from disparate sources, and there are a number of efficient software techniques for accomplishing this.  Furthermore, support for agnostic APIs and common industry standards, such as the UN/CEFACT XML, take much of the rationalization and normalization work out of a manual process.  Yet I have consistently seen suboptimized methods being put forth that essentially require an army of data scientists and coders to essentially engage in brute force data mining–a methodology that has been around for almost 30 years: except that now it carries with it the moniker of big data.  Needless to say this approach is probably the most expensive and slowest out there.  But then, the motivation for its use by IT shops is usually based in rice bowl and resource politics.  This is flimflam–an attempt to revive an old zombie under a new name.  When faced with such assertions, see Moore’s Law and keep on looking for the right answer.  It’s out there.

e.  Performance management and assessment is an unnecessary “regulatory” expense.  This one keeps coming up as part of a broader political agenda beyond just project management.  I’ve discussed in detail the issues of materiality and prescriptiveness in regulatory regimes here and here, and have addressed the obvious legitmacy of organizations to establish one in fiduciary, contractual, and governmental environments.

My usual response to the assertion of expense is to simply point to the unregulated derivatives market largely responsible for the financial collapse, and the resulting deep economic recession that followed once the housing bubble burst.  (And, aside from the cost of human suffering and joblessness, the expenses related to TARP).  Thus we know that the deregulation of banking had gone so well.  Even after the Band-Aid of Dodd-Frank the situation probably requires a bit more vigor, and should include the ratings agencies as well as the real estate market.  But here is the fact of the matter: such expenses cannot be monetized as additive because “regulatory” expenses usually represent an assessment of the day-to-day documentation, systems, and procedures required when performing normal business operations and due diligence in management.  I attended an excellent presentation last week where the speaker, tasked with finding unnecessary regulatory expenses, admitted as much.

Thus, what we are really talking about is an expense that is an essential prerequisite to entry in a particular vertical, especially where monopsony exists as a result of government action.  Moral hazard, then, is defined by the inherent risk assumed by contract type, and should be assessed on those terms.  Given the current trend is to raise thresholds, the question is going to be–in the government sphere–whether public opinion will be as forgiving in a situation where moral hazard assumes $100M in risk when things head south, as they often do with regularity in project management.  The way to reduce that moral hazard is through sufficiency of submitted data.  Thus, we return to my points in a and b above.

f.  Effective project assessment can be performed using high level data.  It appears that this view has its origins in both self-interest and a type of anti-intellectualism/anti-empiricism.

In the former case, the bias is usually based on the limitations of either individuals or the selected technology in providing sufficient information.  In the latter case, the argument results in a tautology that reinforces the fallacy that absence of evidence proves evidence of absence.  Here is how I have heard the justification for this assertion: identifying emerging trends in a project does not require that either trending or lower level data be assessed.  The projects in question are very high dollar value, complex projects.

Yes, I have represented this view correctly.  Aside from questions of competency, I think the fallacy here is self-evident.  Study after study (sadly not all online, but performed within OSD at PARCA and IDA over the last three years) have demonstrated that high level data averages out and masks indicators of risk manifestation, which could have been detected looking at data at the appropriate level, which is the intersection of work and assigned resources.  In plain language, this requires integration of the cost and schedule systems, with risk first being noted through consecutive schedule performance slips.  When combined with technical performance measures, and effective identification of qualitative and quantitative risk tied to schedule activities, the early warning is two to three months (and sometime more) before the risk is reflected in the cost measurement systems.  You’re not going to do this with an Excel spreadsheet.  But, for reference, see my post  Excel is not a Project Management Solution.

It’s time to kill the zombies with facts–and to behead them once and for all.

Let’s Get Physical — Pondering the Physics of Big Data

I’ll have a longer and less wonky article on this and related topics next week at’s Blogging Alliance, but Big Data has been a hot topic of late.  It also concerns the business line in which I engage and so it is time to sweep away a lot of the foolishness concerning it: what it can do, its value, and its limitations.

As a primer a useful commentary on the ethical uses of Big Data was published today at in an excerpt from Jacob Silverman’s book, Terms of Service: Social Media and the Price of Constant Connection.  Silverman takes a different approach from the one that I outline in my article, but he tackles the economics of new media that were identified years ago by Brad DeLong and A. Michael Froomkin back in the late 1990s and first decade of the 21st century.  This article on First Monday from 2000 regarding speculative microeconomics emerging from new media nicely summarizes their thesis.  Silverman rejects reforming the system in economic terms, entering the same ethical terrain on personal data collection that was explored by Rebecca Skloot on the medical profession’s genetic collection and use of tissue during biopsies in the book, The Immortal Life of Henrietta Lacks.

What Silverman’s book does make clear–and which is essential in understanding the issue–is that not all big data is the same.  To our brute force machines data is data absent the means of software to distinguish it, since they are not yet conscious in the manner that would pass a Turing test.  Even with software such machines still cannot pass such a test, though I personally believe that strong AI is inevitable.

Thus, there is Big Data that is swept up–often without deliberate consent by the originator of the data–from the larger pool of society at large by commercial companies that have established themselves as surveillance “statelets” in gathering data from business transactions, social media preferences, and other electronic means.

And there is data that is deliberately stored and, oftentimes shared, among conscious actors for a specific purpose.  These actors are often government agencies, corporations, and related organizations that cooperatively share business information from their internal processes and systems for the purpose of developing predictive systems toward a useful public purpose, oftentimes engaged in joint enterprises toward the development of public goods and services.  It is in this latter domain that I operate.  I like to call this “small” Big Data, since we operate in what can realistically be characterized as closed systems.

Data and computing has a physical and mathematical basis.  For anyone who has studied the history of computing (or has coded) this is a self-evident fact.  But for the larger community of users it appears–especially if one listens to the hype of our industry–that the sky is the limit.  But perhaps that is a good comparison after all, for anyone who has flown in a plane knows that the sky does indeed have limits.  To fly requires a knowledge of gravity, the atmosphere, lift, turbulence, aerodynamics, and propulsion, among other disciplines and sciences.  All of these have their underpinnings in physics and mathematics.

The equation that we use in computing is known as Landauer’s Principle.  It is as follows:

kT In 2,

where k is the Boltzmann constant, T is the temperature of the circuit in Kelvins, and In 2 is the natural logarithm of 2.

This equation follows those in thermodynamics established earlier in physics.  What this means is that the inherent entropy in a system–its onward inevitable journey toward a state of disorder–cannot be reduced, it can only be expelled from the system. For Landauer, who worked at IBM in physical computing, entropy is expelled in the form of heat and energy.  For the longest time, given the close correlation and applied proofs of the Principle, this was seen as a physical law, but modern computing seems to be undermining the manner in which entropy is expelled.

Big Data runs up against the physics identified in Landauer’s Principle because heat and energy are not the only ways to expel entropy.  For really Big Data entropy is expelled by the iron law of Boltzmann’s Constant: the calculation of probable states of disorder in the system. The larger the system, the larger the probable states of disorder, and the more our results in processing such information become a function of probability.  This may or many not matter, depending on the fidelity of the probabilistic methods and their application.

For “small” Big Data, the acceptability of variations from the likely outcome is much narrower.  We need to approach being 100% correct, 100% of the time, though small variations are acceptable depending on the type of system.  So, for example, in project management systems, we can be a percent or two off on rolling up data, since accountability is not an issue.  Financial systems compliance is a different matter.

In “small” Big Data, entropy can be expelled by pre-processing the data in the form of effort expended toward standardization, normalization, and rationalization.  Our equation, kT In 2, is the lower bound, that is, it identifies the minimum state of entropy that need be expelled in order to process a bit.  In reality we will never reach this lower bound, but we can approach it until the difference between the lower bound of entropy and the “cost” of processing data is vanishingly small.  Once we have expelled entropy by limiting the states of instability in the data, expelling the cost of entropy through the data pipeline, we can then process the data to derive its significant with a high degree of confidence.

But this is only the start.  For once “small” Big Data undergoes a process to ensure its fidelity, the same pattern recognition algorithms used in Big Data can be applied, but to more powerful and credible effect.  Early warning “signatures” of project performance can be collected and applied to provide decision-makers with information early enough to affect the outcome of efforts before risk is fully manifested, with the calculated probabilities of cost, schedule, and technical impacts possessing a higher level of certainty.