Both Sides Now — The Value of Data Exploration

Over the last several months I have authored a number of stillborn articles that just did not live up to the standards that I set for this blog site. After all, sometimes we just have nothing important to add to the conversation. In a world dominated by narcissism, it is not necessary to constantly have something to say. Some reflection and consideration are necessary, especially if one is to be as succinct as possible.

A quote ascribed to Woodrow Wilson, which may be apocryphal, though it does appear in two of his biographies, was in response to being lauded by someone for making a number of short, succinct, and informative speeches. When asked how he was able to do this, President Wilson is supposed to have replied:

“It depends. If I am to speak ten minutes, I need a week for preparation; if fifteen minutes, three days; if half an hour, two days; if an hour, I am ready now.”

An undisciplined mind has a lot to say about nothing in particular with varying degrees of fidelity to fact or truth. In normal conversation we most often free ourselves from the discipline expected of more rigorous thinking. This is not necessarily a bad thing if we are saying nothing of consequence, and there are gradations, of course. Even the most disciplined mind gets things wrong. We all need editors and fact checkers.

While I am pulling forth possibly apocryphal quotes, the one most applicable that comes to mind is the comment by Hemingway as told by his deckhand in Key West and Cuba, Arnold Samuelson. Hemingway was supposed to have given this advice to the aspiring writer:

“Don’t get discouraged because there’s a lot of mechanical work to writing. There is, and you can’t get out of it. I rewrote the first part of A Farewell to Arms at least fifty times. You’ve got to work it over. The first draft of anything is shit. When you first start to write you get all the kick and the reader gets none, but after you learn to work it’s your object to convey everything to the reader so that he remembers it not as a story he had read but something that happened to himself.”

Though it deals with fiction, Hemingway’s advice applies to any sort of writing and rhetoric. Dr. Roger Spiller, who more than anyone mentored me as a writer and historian, once told me, “Writing is one of those skills that, with greater knowledge, becomes harder rather than easier.”

As a result of some reflection over the last few months, I had to revisit the reason for the blog. This is still its purpose: it is a way to validate ideas and hypotheses with other professionals and interested amateurs in my areas of interest. I try to keep uninformed opinion in check, as all too many blogs turn out to be rants. Thus, a great deal of research goes into each of these posts, most from primary sources and from interactions with practitioners in the field. Opinions and conclusions are my own, and my reasoning, for good or bad, is exposed for all the world to see, and I take responsibility for it.

This being said, part of my recent silence has also been due to my workload: the effort involved in my day job of running a technology company and in my recent role, since late last summer, as the Managing Editor of the College of Performance Management’s publication, the Measurable News. Our emphasis in the latter case has been to find new contributions to the literature regarding business analytics and to define the concept of integrated project, program, and portfolio management. Stepping slightly over the line to make a pitch, I encourage anyone interested in contributing to the publication to submit an article. The submission guidelines can be found here.

Both Sides Now: New Perspectives

That out of the way, I recently saw, again on the small screen, the largely underrated movie about Neil Armstrong and the Apollo 11 moon landing, “First Man”, and was struck by this scene:

Unfortunately, the first part of the interview has been edited out of this clip and I cannot find a full scene. When asked “why space” he prefaces his comments by stating that the atmosphere of the Earth seems to be so large from the perspective of looking at it from the ground but that, having touched the edge of space previously in his experience as a test pilot of the X-15, he learned that it is actually very thin. He then goes on to posit that looking at the Earth from space will give us a new perspective. His conclusion to this observation is then provided in the clip.

Armstrong’s words were prophetic in that the space program provided a new perspective and a new way of looking at things that were in front of us the whole time. Our spaceship Earth is a blue dot in a sea of space and, at least for a time, the people of our planet came to understand both our loneliness in space and our interdependence.

Earth from Apollo 8. Photo courtesy of NASA.


The Apollo program resulted in great strides being made in environmental and planetary sciences, geology, cosmology, biology, meteorology, and in day-to-day technology. The immediate effect was to inspire the environmental and human rights movements, among others. All of these advances taken together represent a new revolution in thought equal to that during the initial Enlightenment, one that is not yet finished despite the headwinds of reaction and recidivism.

It’s Life’s Illusions I Recall: Epistemology–Looking at and Engaging with the World

In his book Darwin’s Dangerous Idea, Daniel Dennett posited that what was “dangerous” about Darwinism is that it acts as a “universal acid” that, when touching other concepts and traditions, transforms them in ways that change our world-view. I accept Dennett’s position on the strength of the convincing argument he makes and the evidence in front of us, and it is true that Darwinism–the insight into the evolution of species over time through natural selection–has transformed our perspective of the world and left the old ways of looking at things both reconstructed and unrecognizable.

In his work, Time’s Arrow, Time’s Cycle, Stephen Jay Gould noted that Darwinism is one of the three great reconstructions of human thought in which, quoting Sigmund Freud, “Humanity…has had to endure from the hand of science…outrages upon its naive self-love.” These outrages include the Copernican revolution that removed the Earth from the center of the universe, Darwinism and the origin of species, including the descent of humanity, and the concept that John McPhee termed “deep time.”

But–and there is a “but”–I would propose that Darwinism and the other great reconstructions noted are but different ingredients of a larger and broader, though compatible, type of innovation in the way the world is viewed and how it is approached–a more powerful universal acid. That innovation in thought is empiricism.

It is this approach to understanding that eats through the many ills of human existence that lead to self-delusion and folly. Though you may not know it, if you are in the field of information technology or any of the sciences, you are part of this way of viewing and interacting with the world. Married with rational thinking, this epistemology–coming from the perspectives of the astronomical observations of planets and other heavenly bodies by Charles Sanders Peirce, with further refinements by William James, John Dewey, and others–has come down to us in what is known as Pragmatism. (Note that the word pragmatism in this context is not the same as the more generally used colloquial form of the word. For this reason Peirce preferred the term “pragmaticism.”) For an interesting account of the development of modern thought and of Pragmatism, written for the general reader, I highly recommend the Pulitzer Prize-winning The Metaphysical Club by Louis Menand.

At the core of this form of empiricism is the idea that the collection of data, that is, recording, observing, and documenting the universe and nature as it is, will lead us to an understanding of things that we otherwise would not see. In our more mundane systems, such as business systems and organized efforts applying disciplined project and program management techniques and methods, we also can learn more about these complex adaptive systems through the enhanced collection and translation of data.

I Really Don’t Know Clouds At All: Data, Information, Intelligence, and Knowledge

The term “knowledge discovery in data”, or KDD for short, is an aspirational goal and so, in terms of understanding that goal, is a point of departure from the practice of information management and science. I’m taking this stance because the technology industry uses terminology that, as with most language, was originally designed to accurately describe a specific phenomenon or set of methods in order to advance knowledge, only to find that that terminology has been watered down to the point where it obfuscates the issues at hand.

As I traveled to locations across the U.S. over the last three months, I found general agreement about this state of affairs among IT professionals who are dealing with the issues of “Big Data”, data integration, and the aforementioned KDD. In almost every case there is hesitation to use this terminology because it has been appropriated and abused by mainstream literature, much as physicists rail against the misuse of the concept of relativity in non-scientific domains.

This confusion in terminology has caused organizations to make decisions in which the terminology is employed to describe a nebulous end-state, without the initiators having an idea of the effort or scope involved. The danger here, of course, is that for every small innovative company out there, there is also a potential Theranos (probably several). For an in-depth understanding of the psychology and double-speak that has infiltrated our industry I highly recommend the HBO documentary, “The Inventor: Out for Blood in Silicon Valley.”

The reason why semantics are important (as they always have been despite the fact that you may have had an associate complain about “only semantics”) is that they describe the world in front of us. If we cloud the meanings of words and the use of language, it undermines the basis of common understanding and reveals the (poor) quality of our thinking. As Dr. Spiller noted, the paradox of writing and of gathering knowledge is that the more you know, the more you realize you do not know, and the harder writing and communicating knowledge becomes, though we must make the effort nonetheless.

Thus KDD is oftentimes not quite the discovery of knowledge in the sense that the term was intended to mean. It is, instead, a discovery of associations that may lead us to knowledge. Knowing this distinction is important because the corollary processes of data mining, machine learning, and the early application of AI in which we find ourselves are really processes of finding associations, correlations, trends, patterns, and probabilities in data that is approached as if all information is flat, thereby obliterating its context. This is not knowledge.

We can measure the information content of any set of data, but the real unlocked potential in that information content will come with the processing of it that leads to knowledge. To do that requires an underlying model of domain knowledge, an understanding of the different lexicons in any given set of domains, and a Rosetta Stone that provides a roadmap that identifies those elements of the lexicon that are describing the same things across them. It also requires capturing and preserving context.
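As a minimal sketch of what "measuring the information content" can mean, the Python below computes Shannon entropy over the characters of a message. The function name and the sample message are illustrative assumptions, and character-level entropy is only one of several possible measures, not necessarily the one intended here. It does show why measurement alone falls short: the same characters in sorted order score the same, although the meaning is gone.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Average information content, in bits per symbol, of a sequence."""
    counts = Counter(symbols)
    total = len(symbols)
    # Sum p * log2(p) over the observed symbols, then negate.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical message, chosen only for illustration.
message = "schedule slipped because the supplier was late"
# Same characters, sorted: the word order (the context) is destroyed.
scrambled = "".join(sorted(message))

print(shannon_entropy(message))
print(math.isclose(shannon_entropy(message), shannon_entropy(scrambled)))
```

The two scores agree because this measure looks only at symbol frequencies. Whatever separates the readable sentence from its scrambled twin has to come from a model of the domain and from preserved context.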

For example, when I use the chat on my iPhone it attempts to anticipate what I want to write. I am given three choices of words if I want to use this shortcut. In most cases, the iPhone guesses wrong, despite presenting three choices and having at its disposal (at least presumptively) a larger vocabulary than the writer. Oftentimes it seems to take control, assuming that I have misspelled or misidentified a word and choosing the wrong one for me, so that my message becomes nonsense.
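To make the point concrete, here is a toy next-word suggester in Python. It is not how the iPhone keyboard works, which is not documented here. It is a deliberately simple stand-in, assuming a bigram frequency count, and the corpus, function names, and three-suggestion limit are all hypothetical. It offers the three words that most often followed the previous word, and nothing else.

```python
from collections import Counter, defaultdict


def build_bigram_model(corpus):
    """Count, for each word, which words follow it and how often."""
    model = defaultdict(Counter)
    words = corpus.lower().split()
    for current, following in zip(words, words[1:]):
        model[current][following] += 1
    return model


def suggest(model, word, k=3):
    """Return up to k of the most frequent followers of `word`."""
    followers = model.get(word.lower(), Counter())
    return [candidate for candidate, _ in followers.most_common(k)]


# Hypothetical training text, for illustration only.
corpus = (
    "the schedule slipped and the schedule held and the schedule moved "
    "while the budget grew and the budget shrank but the cost rose"
)
model = build_bigram_model(corpus)
print(suggest(model, "the"))
```

Whatever the sentence is actually about, the suggestions after "the" never change, because the model holds associations between adjacent words and no context. That is the sense in which treating information as flat produces confident wrong guesses.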

If one were to believe the hype surrounding AI, one would think that there is magic there but, as Arthur C. Clarke noted (known as Clarke’s Third Law): “Any sufficiently advanced technology is indistinguishable from magic.” Familiar with the new technologies as we are, we know that there is no magic there, and also that it is consistently wrong a good deal of the time. But many individuals come to rely upon the technology nonetheless.

Despite the gloss of something new, the long-established methods of epistemology, code-breaking, statistics, and calculus apply–as do standards of establishing fact and truth. Despite a large set of data, the iPhone is wrong because the iPhone does not understand–does not possess the knowledge–to know why it is wrong. As an aside, its dictionary is also missing a good many words.

A Segue and a Conclusion–I Still Haven’t Found What I’m Looking For: Why Data Integration?…and a Proposed Definition of the Bigness of Data

As with the question to Neil Armstrong, so the question on data. And so the answer is the same. When we look at any set of data under a particular structure of a domain, the information we derive provides us with a manner of looking at the world. In economic systems, businesses, and projects, that data provides us with a basis for interpretation, but oftentimes falls short of allowing us to effectively describe and understand what is happening.

Capturing interrelated data across domains allows us to look at the phenomena of these human systems from a different perspective, providing us with the opportunity to derive new knowledge. But in order to do this, we have to be open to this possibility. It also calls for us to, as I have hammered home in this blog, reset our definitions of what is being described.

For example, there are guides in project and program management that refer to statistical measures as “predictive analytics.” This further waters down the intent of the phrase. Measures of earned value are not predictive. They note trends and a single-point outcome. Absent further analysis and processing, the statistical fallacy of extrapolation can be baked into our analysis. The same applies to any index of performance.
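A short Python sketch shows why these measures amount to a single-point extrapolation. It uses the common textbook formulas for the cost performance index (CPI = EV / AC), the schedule performance index (SPI = EV / PV), and one widely used estimate at completion (EAC = BAC / CPI). The function name and the figures are hypothetical, chosen only for illustration.

```python
def earned_value_indices(bac, pv, ev, ac):
    """Compute basic earned value indices from cumulative values.

    bac: budget at completion, pv: planned value,
    ev: earned value, ac: actual cost.
    """
    cpi = ev / ac  # cost efficiency to date
    spi = ev / pv  # schedule efficiency to date
    # Assumes cost efficiency to date persists unchanged for all
    # remaining work: a straight-line extrapolation, not a prediction.
    eac = bac / cpi
    return {"CPI": cpi, "SPI": spi, "EAC": eac}


# Hypothetical project snapshot, for illustration only.
snapshot = earned_value_indices(bac=1000.0, pv=500.0, ev=400.0, ac=500.0)
print(snapshot)
```

The estimate at completion is simply the past ratio projected forward. Nothing in the calculation knows why performance was poor or whether the cause will persist, which is where further analysis and context have to come in.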

Furthermore, these indices and indicators–for that is all they are–do not provide knowledge, which requires a means of not only distinguishing between correlation and causation but also applying contextualization. All systems operate in a vector space. When we measure an economic or social system we are really measuring its behavior in the vector space that it inhabits. This vector space includes the way it is manifested in space-time: the equivalent of length, width, depth (that is, its relative position, significance, and size within information space), and time.

This then provides us with a hint of a definition of what often goes by the name of “big data.” As noted in previous blogs, the term big data was first used at NASA in 1997 by Cox and Ellsworth (not as credited to John Mashey on Wikipedia with the dishonest qualifier “popularized”) and was simply a statement meaning “datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.”

This is a relative term given Moore’s Law. But we can begin to peel back a real definition of the “bigness” of data. It is important to do this because too many approaches to big data assume it is flat and then apply probabilities and pattern recognition to the data in ways that undermine both contextualization and knowledge. Thus…

The Bigness of Data (B) is a function (f) of the entropy expended (S) to transform data into information, or to extract its information content.

Information evolves. It evolves toward greater complexity just as life evolves toward greater complexity. The universe is built on coded bits of information that, taken together and combined in almost unimaginable ways, provide different forms of life and matter. Our limited ability to decode and understand this information–and our interactions in it–is important to us both individually and collectively.

Much entropy is already expended in the creation of the data that describes the activity being performed. Its context is part of its information content. Obliterating the context inherent in that information content causes all previous entropy to be of no value. Thus, in approaching any set of data, the inherent information content must be taken into account in order to avoid the unnecessary (and erroneous) application of data interpretation.

More to follow in future posts.

Sunday Contemplation — Finding Wisdom: The Epimenides Paradox

The liar’s paradox, as it is often called, is a fitting subject for our time. For those not familiar with the paradox, it was introduced to me by the historian Gordon Prange when I was a young Navy enlisted man attending the University of Maryland. He introduced the paradox to me as a comedic rejoinder to the charge of a certain bias in history that he considered to be without merit. He stated it this way: “I heard from a Cretan that all Cretans are liars.”

The origin of this form of the liar’s paradox has many roots. It is discussed as a philosophical conundrum by Aristotle in ancient Greece as well as by Cicero in Rome. A version of it appears in the Christian New Testament and it was a source of study in Europe during the Middle Ages.

When I have introduced the paradox in a social setting and asked for a resolution to it by the uninitiated, usually a long conversation ensues. The usual approach is as a bi-polar proposition, accepting certain assumptions from the construction of the sentence, that is, if the Cretan is lying then all Cretans tell the truth which cannot be the case, but if the Cretan is telling the truth then he is lying, but he could not be telling the truth since all Cretans lie…and the circular contradiction goes on ad infinitum.

But there is a solution to the paradox and what it requires is thinking about the Cretan and breaking free of bi-polar thinking, which we often call, colloquially, “thinking in black and white.”

The solution.

The assumption in the paradox is that the Cretan in question can speak for all Cretans. This assumption could be false. Thus not all Cretans are liars and, thus, the Cretan in question is making a false statement. Furthermore, the Cretan making the assertion is not necessarily a liar–the individual could just be mistaken. We can test the “truthiness” of what the Cretan has said by testing other Cretans on a number of topics and seeing if they are simply ignorant, uninformed, or truly liars on all things.

Furthermore, there is a difference between something being a lie and a not-lie. Baked into our thinking by absolutist philosophies, ideologies, and religions is black and white thinking that clouds our judgement. A lie must have intent and be directed to misinform, misdirect, or to cloud a discussion. There are all kinds of lies and many forms of not-lies. Thus, the opposite of “all Cretans are liars” is not that “all Cretans are honest” but that “some Cretans are honest and some are not.”

Only if we assume the original assertion to be true is this truly a paradox, and we have no reason to assume so. If we show that Cretans do not lie all of the time then we are not required to reach the high bar that “all Cretans are honest”; we need only conclude that the Cretan making the assertion has made a false statement or is, instead, the liar.
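The reasoning can be checked by brute force. The Python sketch below assumes the strictest reading of the puzzle, which is narrower than the resolution described here: every Cretan is either a liar whose statements are always false or an honest person whose statements are always true, with no room for honest mistakes. The function name and the population sizes are hypothetical. It lists every assignment of liar and honest under which the speaker's statement is consistent with the speaker's own character.

```python
from itertools import product


def consistent_worlds(n_cretans):
    """List liar/honest assignments consistent with the speaker's claim.

    Each world is a tuple of booleans, True meaning "is a liar".
    Index 0 is the Cretan who says "all Cretans are liars".
    """
    worlds = []
    for liars in product([True, False], repeat=n_cretans):
        statement_is_true = all(liars)
        speaker_is_liar = liars[0]
        # A liar's statement must be false; an honest speaker's must be true.
        if statement_is_true != speaker_is_liar:
            worlds.append(liars)
    return worlds


for world in consistent_worlds(3):
    print(world)
```

Under these assumptions, every surviving world has the speaker as a liar and at least one other Cretan as honest, which is the "some are honest and some are not" outcome. The contradiction becomes inescapable only when the speaker is the sole Cretan, in which case the function returns an empty list.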

In sum, our solution in avoiding falling into the thinking of the faulty or dishonest Cretan is not to accept the premises as they have been presented to us, but to use our ability to reason out the premises and to look at the world as it is as a “reality check.” The paradox is not truly a paradox, and the assertion is false.

(Note that I have explained this resolution without going into the philosophical details of the original syllogism, the mathematics, and an inquiry on the detailed assumptions. For a fuller discussion of liar’s paradoxes I recommend this link.)

Why Care About the Paradox?

We see versions of the paradox used all of the time. This includes the use of ad hominem attacks on people, that is, charges of guilt by association with an idea, a place, an ethnic group, or another person. “Person X is a liar (or his/her actions are suspect or cannot be trusted) because they adhere to Y idea, group, or place.” Oftentimes these attacks are joined with insulting or demeaning catchphrases and (especially racial or ethnic) slurs.

What we attribute to partisanship or prejudice or bias often uses this underlying type of thinking. It is a simplification born of ignorance and all simplifications are a form of evil in the world. This assertion was best articulated by Albert Camus in his book The Plague.

“The evil that is in the world always comes of ignorance, and good intentions may do as much harm as malevolence, if they lack understanding. On the whole, men are more good than bad; that, however, isn’t the real point. But they are more or less ignorant, and it is this that we call vice or virtue; the most incorrigible vice being that of an ignorance that fancies it knows everything and therefore claims for itself the right to kill. The soul of the murderer is blind; and there can be no true goodness nor true love without the utmost clear-sightedness.”

Our own times are not much different in their challenges from those Camus faced during the rise of fascism in Europe, for fascism’s offspring have given rise to a new generation that has insinuated itself into people’s minds.

Aside from my expertise in technology and the military arts and sciences, the bulk of my formal academic education is as an historian and political scientist. The world is currently in the grip of a plague that eschews education and Camus’ clear-sightedness in favor of materialism, ethnic hatred, nativism, anti-intellectualism, and ideological propaganda.

History is replete with similar examples, both large and small, of this type of thinking, which should teach us that this is an aspect of human character wired into our brains that requires eternal vigilance to guard against. Such examples as the Spanish Inquisition, the Reformation and Counter Reformation, the French Revolution, the defense of slavery in the American Civil War and the subsequent terror of Jim Crow, 18th and 19th century imperialism, apartheid after the Boer War, the disaster of the First World War, the Russian Revolutions, the history of anti-Jewish pogroms and the Holocaust, the rise of Fascism and Nazism, Stalinism, McCarthyism in the United States, Mao and China’s Cultural Revolution, Castro’s Cuba, Pinochet’s Chile, the Pathet Lao, the current violence and intolerance born of religious fundamentalism–and the list can go on–teach us that our only salvation and survival as a species lies in our ability to overcome ignorance and self-delusion.

We come upon more pedestrian examples of this thinking all of the time. As Joseph Conrad wrote in Heart of Darkness, “The mind of man is capable of anything—because everything is in it, all the past as well as all the future.”

We must perform this vigilance first on ourselves–and it is a painful process because it shatters the self-image that is necessary for us to continue from day-to-day: that narrative thread that connects the events of our existence and that guides our actions as best, and in as limited ways, as they can be guided, without falling into the abyss of nihilism. Only knowledge, and the attendant realization of the necessary components of human love, acceptance, empathy, sympathy, and community–that is understanding–the essential connections that make us human–can overcome the darkness that constantly threatens to envelop us. But there is something more.

The United States was born on the premise that the practical experiences of history and its excesses could be guarded against and that such “checks and balances” would be woven, first, into the thread of its structure, and then, into the thinking of its people. This is the ideal, and it need not be said that, given that it was a construction of flawed men, despite their best efforts at education and enlightenment compared to the broad ignorance of their time, these ideals for many continued to be only that. This ideal is known as the democratic ideal.

Semantics Matter

It is one that is under attack as well. We often hear the argument against it dressed up in academic clothing as being “only semantics” on the difference between a republic and a democracy. But as I have illustrated regarding the Epimenides Paradox, semantics matter.

For the democratic ideal is about self-government, which was a revolutionary concept in the 18th century and remains one today, which is why it has been and continues to be under attack by authoritarians, oligarchs, dictators, and factions pushing their version of the truth as they define it. But it goes further than a mechanical process of government.

The best articulation of democracy in its American incarnation probably was written by the philosopher and educator John Dewey in his essay On Democracy. Democracy, says Dewey, is more than a special political form: it is a way of life, social and individual, that allows for the participation of every mature human being in forming the values that regulate society toward the twin goals of ensuring the general social welfare and full development of human beings as individuals.

While what we call intelligence be distributed in unequal amounts, it is the democratic faith that it is sufficiently general so that each individual has something to contribute, whose value can be assessed only as enters into the final pooled intelligence constituted by the contributions of all. Every authoritarian scheme, on the contrary, assumes that its value may be assessed by some prior principle, if not of family and birth or race and color or possession of material wealth, then by the position and rank a person occupies in the existing social scheme. The democratic faith in equality is the faith that each individual shall have the chance and opportunity to contribute whatever he is capable of contributing and that the value of his contribution be decided by its place and function in the organized total of similar contributions, not on the basis of prior status of any kind whatever.

In such a society there is no place for “I heard from a Cretan that all Cretans lie.” For democracy to work, however, requires not only vigilance but a dedication to education that is further dedicated to finding knowledge, however inconvenient or unpopular that knowledge may turn out to be. The danger has always been in lying to ourselves, and allowing ourselves to be seduced by good liars.

Note: This post has been updated for grammar and for purposes of clarity from the original.