Both Sides Now — The Value of Data Exploration

Over the last several months I have authored a number of stillborn articles that just did not live up to the standards that I set for this blog site. After all, sometimes we just have nothing important to add to the conversation. In a world dominated by narcissism, it is not necessary to constantly have something to say. Some reflection and consideration are necessary, especially if one is to be as succinct as possible.

A quote ascribed to Woodrow Wilson, possibly apocryphal though it appears in two of his biographies, was his response to being lauded for making a number of short, succinct, and informative speeches. When asked how he was able to do this, President Wilson is supposed to have replied:

“It depends. If I am to speak ten minutes, I need a week for preparation; if fifteen minutes, three days; if half an hour, two days; if an hour, I am ready now.”

An undisciplined mind has a lot to say about nothing in particular, with varying degrees of fidelity to fact or truth. In normal conversation we most often free ourselves from the discipline expected of more rigorous thinking. This is not necessarily a bad thing if we are saying nothing of consequence, and there are gradations, of course. Even the most disciplined mind gets things wrong. We all need editors and fact checkers.

While I am pulling forth possibly apocryphal quotes, the one most applicable that comes to mind is the comment by Hemingway as told by his deckhand in Key West and Cuba, Arnold Samuelson. Hemingway was supposed to have given this advice to the aspiring writer:

“Don’t get discouraged because there’s a lot of mechanical work to writing. There is, and you can’t get out of it. I rewrote the first part of A Farewell to Arms at least fifty times. You’ve got to work it over. The first draft of anything is shit. When you first start to write you get all the kick and the reader gets none, but after you learn to work it’s your object to convey everything to the reader so that he remembers it not as a story he had read but something that happened to himself.”

Though it deals with fiction, Hemingway’s advice applies to any sort of writing and rhetoric. Dr. Roger Spiller, who more than anyone mentored me as a writer and historian, once told me, “Writing is one of those skills that, with greater knowledge, becomes harder rather than easier.”

As a result of some reflection over the last few months, I had to revisit the reason for the blog. Its purpose remains the same: it is a way to validate ideas and hypotheses with other professionals and interested amateurs in my areas of interest. I try to keep uninformed opinion in check, as all too many blogs turn out to be rants. Thus, a great deal of research goes into each of these posts, most of it from primary sources and from interactions with practitioners in the field. Opinions and conclusions are my own; my reasoning, for good or ill, is exposed for all the world to see, and I take responsibility for it.

This being said, part of my recent silence has also been due to my workload: the effort involved in my day job of running a technology company, and my recent role, since late last summer, as Managing Editor of the College of Performance Management’s publication, the Measurable News. Our emphasis in the latter case has been to find new contributions to the literature on business analytics and to define the concept of integrated project, program, and portfolio management. Stepping slightly over the line to make a pitch, I encourage anyone interested in contributing to the publication to submit an article. The submission guidelines can be found here.

Both Sides Now: New Perspectives

That out of the way, I recently saw, again on the small screen, the largely underrated movie about Neil Armstrong and the Apollo 11 moon landing, “First Man”, and was struck by this scene:

Unfortunately, the first part of the interview has been edited out of this clip, and I cannot find the full scene. When asked “why space,” Armstrong prefaces his comments by stating that the Earth’s atmosphere seems very large when viewed from the ground but that, having touched the edge of space in his experience as a test pilot of the X-15, he learned that it is actually very thin. He then posits that looking at the Earth from space will give us a new perspective. His conclusion to this observation is provided in the clip.

Armstrong’s words were prophetic in that the space program provided a new perspective and a new way of looking at things that were in front of us the whole time. Our spaceship Earth is a blue dot in a sea of space and, at least for a time, the people of our planet came to understand both our loneliness in space and our interdependence.

Earth from Apollo 8. Photo courtesy of NASA.


The impact of the Apollo program resulted in great strides being made in environmental and planetary sciences, geology, cosmology, biology, meteorology, and in day-to-day technology. The immediate effect was to inspire the environmental and human rights movements, among others. All of these advances taken together represent a new revolution in thought equal to that during the initial Enlightenment, one that is not yet finished despite the headwinds of reaction and recidivism.

It’s Life’s Illusions I Recall: Epistemology–Looking at and Engaging with the World

In his book Darwin’s Dangerous Idea, Daniel Dennett posited that what is “dangerous” about Darwinism is that it acts as a “universal acid” that, when touching other concepts and traditions, transforms them in ways that change our world-view. I accept Dennett’s position on the strength of his argument and the evidence in front of us: Darwinism, the insight that species evolve over time through natural selection, has transformed our perspective of the world and left the old ways of looking at things both reconstructed and unrecognizable.

In his work Time’s Arrow, Time’s Cycle, Stephen Jay Gould noted that Darwinism is one of the three great reconstructions of human thought through which, quoting Sigmund Freud, “Humanity…has had to endure from the hand of science…outrages upon its naive self-love.” These outrages include the Copernican revolution that removed the Earth from the center of the universe; Darwinism and the origin of species, including the descent of humanity; and what John McPhee coined as the concept of “deep time.”

But, and there is a “but,” I would propose that Darwinism and the other great reconstructions noted above are different ingredients of a larger and broader, though compatible, innovation in the way the world is viewed and approached: a more powerful universal acid. That innovation in thought is empiricism.

It is this approach to understanding that eats through the many ills of human existence that lead to self-delusion and folly. Though you may not know it, if you are in the field of information technology or any of the sciences, you are part of this way of viewing and interacting with the world. Married with rational thinking, this epistemology, rooted in Charles Sanders Peirce’s astronomical observations of planets and other heavenly bodies and further refined by William James, John Dewey, and others, has come down to us as what is known as Pragmatism. (Note that the word pragmatism in this context is not the same as the more colloquial use of the word; for this reason Peirce preferred the term “pragmaticism.”) For an interesting and popular account of the development of modern thought and of Pragmatism written for the general reader, I highly recommend the Pulitzer Prize-winning The Metaphysical Club by Louis Menand.

At the core of this form of empiricism is the proposition that the collection of data, that is, recording, observing, and documenting the universe and nature as they are, will lead us to an understanding of things that we otherwise would not see. In our more mundane systems, such as business systems and organized efforts applying disciplined project and program management techniques and methods, we can likewise learn more about these complex adaptive systems through the enhanced collection and translation of data.

I Really Don’t Know Clouds At All: Data, Information, Intelligence, and Knowledge

The term “knowledge discovery in data,” or KDD for short, is an aspirational goal and so, in terms of understanding that goal, a point of departure from the practice of information management and science. I take this stance because the technology industry uses terminology that, as with most language, was originally designed to accurately describe a specific phenomenon or set of methods in order to advance knowledge, only to find that the terminology has been watered down to the point where it obfuscates the issues at hand.

As I traveled to locations across the U.S. over the last three months, I found general agreement on this state of affairs among IT professionals who are dealing with the issues of “Big Data,” data integration, and the aforementioned KDD. In almost every case there is hesitation to use this terminology because it has been appropriated and abused by mainstream literature, much as physicists rail against the misuse of the concept of relativity by non-scientific domains.

This confusion in terminology has caused organizations to make decisions in which the terminology is employed to describe a nebulous end-state, without the initiators having any idea of the effort or scope involved. The danger here, of course, is that for every small innovative company out there, there is also a potential Theranos (probably several). For an in-depth understanding of the psychology and double-speak that has infiltrated our industry, I highly recommend the HBO documentary “The Inventor: Out for Blood in Silicon Valley.”

The reason why semantics are important (as they always have been despite the fact that you may have had an associate complain about “only semantics”) is that they describe the world in front of us. If we cloud the meanings of words and the use of language, it undermines the basis of common understanding and reveals the (poor) quality of our thinking. As Dr. Spiller noted, the paradox of writing and in gathering knowledge is that the more you know, the more you realize you do not know, and the harder writing and communicating knowledge becomes, though we must make the effort nonetheless.

Thus KDD is oftentimes not quite the discovery of knowledge in the sense the term was intended to convey. It is, instead, a discovery of associations that may lead us to knowledge. Knowing this distinction is important because the corollary processes of data mining, machine learning, and the early applications of AI in which we find ourselves are really processes of finding associations, correlations, trends, patterns, and probabilities in data, approached as if all information were flat, thereby obliterating its context. This is not knowledge.

We can measure the information content of any set of data, but the real unlocked potential in that information content will come with the processing of it that leads to knowledge. To do that requires an underlying model of domain knowledge, an understanding of the different lexicons in any given set of domains, and a Rosetta Stone that provides a roadmap that identifies those elements of the lexicon that are describing the same things across them. It also requires capturing and preserving context.
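The “Rosetta Stone” idea above can be made concrete with a small sketch: a mapping that records, for each underlying concept, the term each domain’s lexicon uses for it. All domain and term names here are hypothetical, purely for illustration.

```python
# A toy cross-domain "Rosetta Stone": for each underlying concept, record
# the term each domain's lexicon uses for it, so that terms describing the
# same thing can be identified across domains. Names are hypothetical.

ROSETTA = {
    "planned_cost": {
        "scheduling": "budgeted_cost_of_work_scheduled",
        "finance": "baseline_budget",
    },
    "actual_cost": {
        "scheduling": "actual_cost_of_work_performed",
        "finance": "expenditures",
    },
}

def translate(term, source_domain, target_domain):
    """Find the target-domain term describing the same concept, if any."""
    for concept, lexicons in ROSETTA.items():
        if lexicons.get(source_domain) == term:
            return lexicons.get(target_domain)
    return None

print(translate("baseline_budget", "finance", "scheduling"))
# budgeted_cost_of_work_scheduled
```

The point of the sketch is that the mapping carries domain context with the data rather than flattening it away: a term is never interpreted outside the lexicon it came from.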

For example, when I use the chat on my iPhone it attempts to anticipate what I want to write. I am given three choices of words if I want to use this shortcut. In most cases the iPhone guesses wrong, despite presenting three choices and having at its disposal (at least presumptively) a larger vocabulary than the writer. Oftentimes it seems to take control, assuming that I have misspelled or misidentified a word and choosing the wrong one for me, turning my message into nonsense.

If one were to believe the hype surrounding AI, one would think that there is magic there. As Arthur C. Clarke noted in what is known as Clarke’s Third Law: “Any sufficiently advanced technology is indistinguishable from magic.” Familiar with the new technologies as we are, we know that there is no magic there, and that they are wrong a good deal of the time. But many individuals come to rely upon the technology nonetheless.

Despite the gloss of something new, the long-established methods of epistemology, code-breaking, statistics, and calculus apply, as do standards of establishing fact and truth. Despite a large set of data, the iPhone is wrong because it does not understand, does not possess knowledge, and so cannot know why it is wrong. As an aside, its dictionary is also missing a good many words.

A Segue and a Conclusion–I Still Haven’t Found What I’m Looking For: Why Data Integration?…and a Proposed Definition of the Bigness of Data

As with the question put to Neil Armstrong, so with the question on data; and the answer is the same. When we look at any set of data under the particular structure of a domain, the information we derive provides us with a manner of looking at the world. In economic systems, businesses, and projects that data provides us with a basis for interpretation, but oftentimes falls short of allowing us to effectively describe and understand what is happening.

Capturing interrelated data across domains allows us to look at the phenomena of these human systems from a different perspective, providing us with the opportunity to derive new knowledge. But in order to do this, we have to be open to this possibility. It also calls for us to, as I have hammered home in this blog, reset our definitions of what is being described.

For example, there are guides in project and program management that refer to statistical measures as “predictive analytics.” This further waters down the intent of the phrase. Measures of earned value are not predictive. They note trends and a single-point outcome. Absent further analysis and processing, the statistical fallacy of extrapolation can be baked into our analysis. The same applies to any index of performance.
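To make the extrapolation point concrete: the common single-point estimate at completion in earned value management, EAC = BAC / CPI, simply projects the cumulative cost efficiency observed to date onto all remaining work. The sketch below uses the standard formula; the numbers are illustrative, not from any real project.

```python
def estimate_at_completion(bac, cpi):
    """Single-point EAC extrapolation: EAC = BAC / CPI.
    This projects the cost efficiency observed so far onto all remaining
    work -- a trend extended forward, not a prediction."""
    return bac / cpi

bac = 1_000_000  # budget at completion (illustrative)

# The same project yields very different "predictions" depending on when
# the cumulative CPI is sampled -- the extrapolation is not stable.
for cpi in (0.95, 0.90, 0.80):
    print(round(estimate_at_completion(bac, cpi)))
```

This is exactly the statistical fallacy of extrapolation at work: the index summarizes what has happened, while the “prediction” rests entirely on the assumption that the future repeats the past.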

Furthermore, these indices and indicators–for that is all they are–do not provide knowledge, which requires a means of not only distinguishing between correlation and causation but also applying contextualization. All systems operate in a vector space. When we measure an economic or social system we are really measuring its behavior in the vector space that it inhabits. This vector space includes the way it is manifested in space-time: the equivalent of length, width, depth (that is, its relative position, significance, and size within information space), and time.

This then provides us with a hint of a definition of what often goes by the name of “big data.” As noted in previous posts, the term was first used at NASA in 1997 by Cox and Ellsworth (not, as Wikipedia credits it, by John Mashey with the dishonest qualifier “popularized”) and was simply a statement meaning “datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.”

This is a relative term given Moore’s Law. But we can begin to peel back a real definition of the “bigness” of data. It is important to do this because too many approaches to big data assume the data is flat and then apply probabilities and pattern recognition in ways that undermine both contextualization and knowledge. Thus…

The Bigness of Data (B) is a function (f) of the entropy expended (S) to transform data into information, or to extract its information content: B = f(S).
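The function f above is a conceptual measure, but the information content it refers to can be quantified concretely. As a rough illustration only, and not the measure B itself, Shannon entropy gives the standard information-theoretic notion that underlies it:

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy, in bits per value, of the empirical distribution
    of the observed values."""
    counts = Counter(values)
    n = len(values)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

# A column of identical values carries no information content...
print(shannon_entropy(["A"] * 8))         # 0.0
# ...while eight equally likely symbols carry 3 bits per value.
print(shannon_entropy(list("ABCDEFGH")))  # 3.0
```

Note that this measures only the statistical spread of the values; the contextual content discussed throughout this post is precisely what such a flat measure leaves out.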

Information evolves. It evolves toward greater complexity, just as life evolves toward greater complexity. The universe is built on coded bits of information that, taken together and combined in almost unimaginable ways, provide different forms of life and matter. Our limited ability to decode and understand this information, and our interactions within it, is important to us both individually and collectively.

Much entropy is already expended in the creation of the data that describes the activity being performed. Its context is part of its information content. Obliterating the context inherent in that information content renders all of that previously expended entropy valueless. Thus, in approaching any set of data, the inherent information content must be taken into account in order to avoid unnecessary (and erroneous) interpretation of the data.

More to follow in future posts.

Driver’s Seat — How Software Normalization Can Drive Process Improvement

Travel, business, and family obligations have interrupted regular blogging–many apologies but thanks for hanging in there and continuing to read the existing posts.

Over the past couple of weeks I have taken note of two issues that regularly pop up: the lack of consistency in how compliance is applied by oversight organizations within both industry and government, especially in cases of government agencies with oversight responsibility in project management; and the lack of consistency in the data and information that inform project management systems.

That the same condition exists in both areas is not, I believe, a coincidence, and points to a great deal of hair-pulling, scapegoating, and finger-pointing that could otherwise have been avoided over the years. I am not saying that continual process improvement without technology is not essential, but it is undoubtedly true that there is a limit to how effectively information can be processed and consumed in a pre-digitized system compared to a post-digitized one. The difference is in multiples, not only in the amount of data but also in the quality of the data after processing.

Thus, one insight I have observed as we apply new generations of software technology to displace the first and second waves of project management technologies is an improved ability to apply consistency and standardization in oversight. Depending on where you stand, this is either a good or a bad thing. If you are a traditional labor-intensive accounting organization that expects a team of personnel to disrupt the organization by going through the details of a paper trail, then you are probably unhappy, because your business model will soon be found to be obsolete (actually it already is, depending on where your customer sits on the scale of digitization). If you are interested, however, in the optimization of systems, then you are probably somewhere on the positive end of that scale.

Software companies are mainly interested in keeping their customers tied to their technology.  For example, try buying the latest iPhone if you have an existing plan with a carrier but want to switch to someone else.  This is why I am often puzzled by how anyone in the economics or political science professions cannot understand why we have new technological robber barons with the resulting concentration in wealth and political power.  One need only look at how the railroads and utilities tied entire swaths of the country into knots well into the 20th century prior to anti-trust enforcement.  The technology is new but the business model is the same.

The condition of establishing technological islands of code and software is what creates diseconomies in processes. The use of multiple applications to address different business processes increases costs and reduces efficiency, not only because proprietary idiosyncrasies create duplicative training, maintenance, and support requirements, but also because of the costs of reconciling and integrating interrelated data, usually accomplished manually. On the systems validation and oversight side, this lack of consistency in data drives inconsistency in the way the effectiveness of project management systems is assessed.

Years of effort, training, policy writing, and systems adjustments have met with the law of diminishing returns while ignoring the underlying and systemic cause of inconsistency in interdependent factors. Yet, when presented with a system in which otherwise proprietary data is normalized and easily reconciled, ensuring not only consistency but quality, the variation in how the data is assessed and viewed diminishes very quickly. This should be no surprise but, despite the obvious advantages and economies being realized, resistance still exists, largely based in fear.

The fear is misplaced only because it lies in the normal push and pull of management/employee and customer/contractor relations. Given more information, and more qualitatively insightful information, the argument goes, oversight will become more disruptive. That this condition exists today because of sub-optimization and lack of consistency does not seem to occur to the proponents of this argument. Both sides, like two wrestlers locked in a hold that cannot produce a decision, are each loath to ease their grip for fear that the other will take advantage of the situation. Yet technology will be the determining factor as the economic pressures become too hard to resist. It is time to address these fears and reestablish the lines of demarcation in our systems based on good leadership and management practices, skills that seem to be disappearing as more people and companies become focused on 1s and 0s.


Better Knock-Knock-Knock on Wood — The Essential Need for Better Schedule-Cost Integration

Back in the early-to-mid 1990s, when NSFNET was making the transition to the modern internet, I was just finishing my second assignment as an IT project manager and transitioning to a full-blown Program Executive Office (PEO) Business Manager and CIO at a major Naval Systems Command. The expanded potential of a more open internet was on everyone’s mind, in particular how barriers to previously stove-piped data could be broken down in order to optimize the use of that data (after processing it into usable intelligence). The next step was to take that information, now opened to a larger audience previously excluded from it, and juxtapose and integrate it with other essential data (processed into intelligence) to provide insights not previously realized.

Here we are almost 20 years later, and I am disappointed to see in practice that the old barriers to information optimization still exist in many places where technology should long ago have broken this mindset. Recently I have discussed cases, at conferences and among PM professionals, where the Performance Management Baseline (PMB), that is, the plan used to measure the financial value of the work performed, is constructed separately from and without reference to the Integrated Master Schedule (IMS) until well after the fact. This is a challenge to common sense.

Project management is based on the translation of a contract specification into a plan to build something. The basic steps, after many years of professional development, are so tried and true that they should be rote by now: Integrated Master Plan (IMP) –> Integrated Master Schedule (IMS) with Schedule Risk Assessment (SRA) –> Resource assignments with negotiated rates –> Work packages linked to financials and rolled up through the WBS –> Performance Management Baseline (PMB). The arrows represent the relationships between the elements. Feel free to adjust semantics and add items to the process, such as a technical performance baseline, testing and evaluation plans, systems descriptions to ensure traceability, milestone tracking, and so on. But the basic elements of project planning and execution remain the same; that’s all there is, folks. The complexity and time spent going through the steps vary with the complexity of the scope being undertaken. For a long-term project involving millions or billions of dollars the interrelationships and supporting documentation are quite involved; for short-term efforts the process may live entirely in the mind of the person doing the job. But in the end, regardless of terminology, these are the basic elements of PM.
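The planning chain above can be sketched as a simple ordered structure in which each artifact must be informed by its predecessor. The artifact names follow the steps described here; the data structure itself is purely illustrative.

```python
# The planning chain as an ordered pipeline: each artifact must be
# derived from the one before it. The list order encodes the arrows.

PLANNING_CHAIN = [
    "Integrated Master Plan (IMP)",
    "Integrated Master Schedule (IMS) with Schedule Risk Assessment (SRA)",
    "Resource assignments with negotiated rates",
    "Work packages linked to financials, rolled up through the WBS",
    "Performance Management Baseline (PMB)",
]

def predecessor(artifact):
    """Return the artifact that must inform the given one, or None for the IMP."""
    i = PLANNING_CHAIN.index(artifact)
    return PLANNING_CHAIN[i - 1] if i > 0 else None

print(predecessor("Performance Management Baseline (PMB)"))
```

The design choice worth noting is that the ordering is explicit: a PMB built without reference to its predecessor is, in this structure, simply invalid, which is exactly the discipline the process is meant to enforce.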

When one breaks this cycle and decides to build each of the elements independently of the others, it is akin to building a bridge in sections without an overarching plan. Result: it’s not going to meet in the center. One can argue that it is perfectly fine to build the PMB concurrently with the IMS if the former is informed by the latter, but in practice I find that this is rarely the case. So what we have is a bridge imperfectly matched where the two sections meet in the middle, requiring constant readjustment and realignment. Furthermore, the manner in which schedule activities are aligned with the budget varies from project to project, even within the same organization. So not only do we fail to use a common plan in building our notional bridge, we decide to avoid standardization of bolts and connectors too, just to make it that much more interesting.

The last defense of this sub-optimized environment is: well, if we are adjusting it every month through the project team, what difference does it make? Isn’t this integration nonetheless? Response #1: No. Response #2: THIS-IS-THE-CHALLENGE-THAT-DIGITAL-SYSTEMS-ARE-DESIGNED-TO-OVERCOME. The reason this is not integration is that it simultaneously ignores the lessons learned in the SRA and prevents the insights gained through optimization. If our planning documents are contingent on a month-to-month basis, then the performance measured against them is of little value and always open to question, and not just on the margins. Furthermore, the utilization of valuable project management personnel to perform what is essentially clerical work in today’s environment is indefensible. If there are economic incentives for doing this, it is time for project stakeholders and policymakers to end them.

It is time to break down the artificial barriers that separate cost and schedule analysts. Either you know project and program management or you don’t. There is no magic wall between the two disciplines, given that one cannot exist without the other. Furthermore, more standardization, not less, is called for. For anyone who has tried to decipher a schedule in which smiley-faces and multiple, non-standard structures are in use, defying any reference to a cost control account, it is clear that both the consulting and project management communities are failing to instill professionalism.

Otherwise, as in my title, it’s like knocking on wood.