Innervisions: The Connection Between Data and Organizational Vision

During my day job I provide a number of fairly large customers with support to determine their needs for software that meets the criteria from my last post. That is, I provide software that takes an open data systems approach to data transformation and integration. My team and I deliver this capability with an open user interface based on Windows and .NET components augmented by time-phased and data management functionality that puts SMEs back in the driver’s seat of what they need in terms of analysis and data visualization. In virtually all cases our technology obviates the need for the extensive, time consuming, and costly services of a data scientist or software developer.

Over the course of my career both as a consumer and a provider of technology solutions, I have seen an evolution in software that began with simple point solutions being developed to automate particular manual processes, to more sophisticated solutions that are designed to automate a complex function. In most of these cases, a customer has identified a gap or deficiency in their requirements that represents an inefficiency or sub-optimization of their processes and then seek a software “tool” to acquire in order to address that specific purpose. The application of these “tools” combine to meet the overall vision of the organization or sub-system within the organization.

What Do You Do With A Problem Like “Tools”

The capabilities of software in terms of data handling capabilities and functionality double every 12-18 months in today’s environment. The use of the term “tools” for software, which is really based on a pre-2000 concept, is that in the mind’s eye software is analogous to any other tool. In the literature, particularly in that authored by consultants, this analogy is oftentimes extended to common household or construction tools: a wrench, a screwdriver, or a power drill. Under this concept each tool has a specific purpose and it is up to the SME to determine which tool is best for a specific job.

The problem with this concept is that not only is it obsolete, but it does great harm financially to the organization in terms of overhead costs, organizational efficiency and effectiveness.

First of all, most physical tools are fairly static in their specific use. A hammer is still a hammer, even if some sort of power is extended to give it power. It’s purpose remains to use force to insert a connective fastener, like a nail, into a medium, like a piece of wood. A nail gun, for instance, is a type of hammer. It is more powerful and efficient but, still, it is a glorified hammer. It is a superior tool in construction because it is more efficient, provides a consistency in quality, and is faster. It also eliminates the factors of arm strength, physical coordination, and visual alignment skills of the user; as anyone who has experienced a sore thumb as a result of a misaligned strike can attest. But a nail gun is still restricted to its specific function–sinking nails for the purpose of fastening.

Software, as it has evolved, was similarly based on the concept of a tool. The physical functions of a specific vocation were the first to undergo digitization: accountants and business operations personnel had spreadsheet software applications, secretarial and clerical staffs (yes, they used to exist) had word processing software, marketing and middle management could relay their ideas with presentation software, and the list went on.

As the power of software improved it followed the functions of traditional line-and-staff organizations. Many of these were built to replace the physical calculation of formulae and concepts that required a slide rule and, later, a scientific calculator. Soon scheduling software replaced manual GANTT planning, earned value software automated the calculation of basic EVM analytics, and risk software allowed for the complex formulation involved in assessing risk for the branch of a plan using simulated Monte Carlo analysis.

Each of these software applications targeted a specific occupation, and incorporated specific knowledge (functionality) required of that occupation.

Organizational software for multiple functions usually consisted of a suite of tools under the rubric of an ERP or Business Intelligence System. Modules and “bolt-ons” consisted of tying together business processes and point software requirements augmented by large software consulting staffs to customize the solutions. In actual practice, however, these were software tools tied together though a common brand and operating environment. Oftentimes the individual bolt-ons and tools weren’t even authored by the same development team with a common vision in mind, but a reaction to market forces that required a gap be filled through acquisition of a company or intellectual property.

Needless to say, these “enterprise” solutions aren’t that at all. Instead, they are a business-driven means to penetrate a vertical by providing scattershot functionality. Once inside a company or organization the other bolt-ons and modules are marketed in order to take over other business processes. Integration is achieved across domains through data transfer or other interpretive methods.

This approach has been successful, as it has been since the halcyon days when IBM dominated the computing market, especially among the larger software firms. It also meets many of the emotional and psychic needs of many senior managers. After all, the software firm–given its economic size–feels solid. The numbers of specialists introduced into the organization to augment staff provide a feeling of safety and accomplishment. C-level management and stockholders feel that risk is handled given that their software needs are being met at some level.

What this approach did not, and does not, meet is genuine data integration, especially given the realization that the data we have been using has been inadequate and artificially restricted based on what software providers were convincing their customers was the art of the possible. The term “Big Data” began to be introduced into the lexicon, and with it the economic realization that capturing and integrating datasets that were previously “impossible” to capture and integrate was (and presently is) an economic imperative.

But the approach of incumbents, whose priority is to remain “sticky” and to defend territory against new technologies, was to respond: “we have a tool for that.” Thus, the result has been the further introduction of inefficient individual applications with their inability to fully exploit data. Among these tools are largely “dumb”–that is, viewing data flat–data visualization tools that essentially paint pretty pictures from Excel or, when they need to be applied on a larger scale, default to the old business intelligence brute force approach of applying labor to derive the importance in data. Old habits are hard to change and what one person has done another can do. But this is the economic equivalent of what is called rent-seeking behavior. That is, it is inefficient and exploitative.

After all, if you buy what was advertised as a sports car you expect to see an engine under the hood and a transmission connected to a drivetrain and a pretty powerful one at that. What one does not expect is to buy the car but have to design and build the features of these essential systems while a team of individuals are paid by the hour to push us to where we want to go. Yet, organizations (and especially consultants) seem to be happy with this model when it comes to information management.

Thus, when a technology company like mine comes across a request for proposal, an informal invitation to participate in market research, or in exploratory professional meetings (largely virtual as of this writing), the emphasis and terminology is on software “tools”, which limits the ability of consumers to exploit technology because it mentally paints a picture that limits the definition of what software should do and can do.

This mindset, however, is beginning to change and, no doubt, our current predicament under the Coronavirus crisis will accelerate that transition.

To take our analogy one step further, we are long past the time when we must buy each component of an automobile individually and then assemble it in our own garage. Point solutions, which are set and inelastic, are like individual parts of the car.

Enterprise solutions consisting of different modules and datasets, oftentimes constructed from incompatible foundations, exacerbate this situation and add the element of labor to a supposedly automated process, like buying OEM products and having to upgrade the automobile we supposed bought to do its job, but still needed (with the help of a mechanic) to perform the normal functions of steering, stopping, and accelerating.

Open systems solutions provide more flexibility, but they can be both a blessing and a curse. The challenge is to provide the right balance of out-of-the-box point solution-type functionality while still providing enough flexibility for adaptability. Taking a common data approach is key to achieving this balance. This will require the abandonment of the concept of software “tools” and shifting the focus on data.

Data and Information Take Over: Two Models

The economic imperative for data integration and optimization developed from the needs of the organization and its practitioners–whether it be managers, analysts, or auditors working in a company, a business unit, a governmental agency, or a program or project organization–is to be positioned facing forward.

In order to face forward one must first establish a knowledge-based organization or, as oftentimes identified, a data-driven organization. What this means in real terms is that data is captured, processed, and contextualized so that its importance and meaning can be derived in a timely manner so that something can be done about what is happening. During our own present situation this is not just an economic imperative, but for public health an existential one for many of us.

Thus, we are faced with several key dimensions that must be addressed: size, manner of integration, contextualization, timeliness, and target. This applies to both known and unknown datasets.

Our known datasets are those that are already being used and populated in existing systems. We know, for example, that in program and project management that we require an estimate and plan, a schedule, a manner of organizing and tracking our progress, financial management and material management systems and others. These represent our pool of structured data, and understanding the lexicon of these systems is what is necessary to normalize and rationalize the data through a universal translator.

Our unknown datasets are those that require collection but, when done, is collected and processed in an ad hoc manner. Usually the need for this data collection is learned through the school of hard knocks. In other cases, the information is not collected at all or accidentally, such as when management relies on outside experts and anecdotal information. This is the equivalent of an organizational JOHARI window shown below.

Overview of Johari Window with quadrants
showing the relationships of self-knowledge and understanding

The Johari Window explains our perceptions and our relationship to the outside world. Our universe is not a construction of our own making or imagination. We cannot make our own reality nor are there “alternative facts.” The most colorful example of refuting this specious philosophical mind game is relayed to us in Boswell’s Life of Samuel Johnson.

After we came out of the church, we stood talking for some time together of Bishop Berkeley’s ingenious sophistry to prove the nonexistence of matter, and that every thing in the universe is merely ideal. I observed, that though we are satisfied his doctrine is not true, it is impossible to refute it. I never shall forget the alacrity with which Johnson answered, striking his foot with mighty force against a large stone, till he rebounded from it — “I refute it thus.”

We can deny what we do not know, or construct magical thinking. but reality is unmoved. In the case of Johnson he kicked the stone and the stone, also unmoved, kicked back in the form of the pain that Johnson felt when he “rebounded from it”.

Nor are the quadrants equal in our perceptual windows. Some people and organizations are very well informed and others less so, but the tension and conflict of our lives–both internally and externally–relates to expanding the “open” and “facade” portions of the Johari window so that we are not only informed of how others register us, but also to uncover the unknown, and to attempt to control how others perceive us in our various roles and guises.

We see this playing out in tracking the current Coronavirus pandemic. The absence of reliable widespread tests and testing infrastructure has impeded an understanding of the virus and the most effective strategies to deploy in dealing with it. Absent data, health and governmental agencies have been left with no choice but to use the same social distancing and travel restrictions deployed during the 1918 Influenza Pandemic and then, if lifting some of these, hope for the best.

This is the situation despite the fact that national risk assessments and risk registers, such as the U.S. National Security Council Pandemic Playbook and the U.K. National Risk Register, outlined measures to be taken given certain particular indicators. No doubt there are lessons to be learned here, but at the core lesson is the fact that, absent reliable and timely data that is converted into information that can be used in a decisive and practical manner, an organization, a state, or a nation risks its survival when it fails to imagine what information it needs to collect, absent the prosaic information that comes from performing the day-to-day routine.

Admittedly, there is no great insight here regarding this need (or, at least, there shouldn’t be). This condition is the reason why intelligence systems and agencies were created in the first place. It is why military and health services imagine scenarios and war-game them, and why organizations deploy brain-storming. Individuals and organizations that go into the world uninformed or self-deluded do not last long, and history is replete with such examples. Blanche DuBois relied on the kindness of strangers and we are best served by her experience as an archetype.

And yet, we still find ourselves struggling to properly collect, integrate, and utilize information at the same time that we have come to the realization that we need to collect and process information from larger pools of data. The root cause of this condition, as asserted above, rests in the mental framing of how to approach data and the problem that needs to be solved. It requires us to change the conceptual framework that relies on the concept of “tools.”

We can make this adjustment by realigning the object of the challenge so that it conforms with what we imagine to be the desired end-state. But, still, how do we determine what we need to collect? This is first a question of perception as opposed to one regarding knowledge: what one views as not only necessary but within the realm of possibility.

Once again, this dilemma is best served by models and, in this case, it is not unlike the Overton Window. Those preferring to eschew Wikipedia entries can also find a more detailed and nuanced definition at the source through the Mackinac Center for Public Policy website.

Overton Windows showing degrees of acceptability as modified by Joshua Trevino

Joseph Overton described the window as one of defining acceptable political policies in the mind of the public. He used the terms “more free” and “less free” to describe policies that think tanks recommend to describe the amount of government intervention, avoiding the left-right comparisons used by polemicists. Various adjustments and variations to the basic window have been proposed since his original use of the model, but it has been expanded to describe public perceptions in general on a host of socioeconomic concerns.

As with the Johari Window, I would posit that there is an analogous Overton Window in relation to information that frames what is viewed as the art of the possible. These perceptions influence the actions of decision-makers in assessing the risk involved in buying software solutions. When it comes to the rapidly developing field of data capture, transformation, and effective utilization, the perception from the start suggests some degree of risk and the danger of moving too quickly. For those in the field of data optimization, given that new technology capacity increases exponentially in shorter periods of time, the barrier here is to shift the informational Overton Window so that the market is educated on the risk-reward equation.

A Unified Model for Aligning Our Data

We have discussed two models up to this point in our exploration: an Informational Johari Window and an Informational Overton Window. Each of these models, using a simplified method, isolates different dimensions of the problem of data, which when freed of the concept of “tools” unlocking it, provides us with a clearer picture of the essential nature of its capture and utilization, and to what purposes.

We are now ready to take the next step in defining how to approach data to serve the strategic interests of the enterprise or organization.

For those of us in the information field, especially in the early years when applying solutions to line-and-staff organizations, what we found is that the very introduction of the new technology changed both the structure and nature of the organization. Initially we noted a sophisticated and accelerated version of the Hawthorne Effect. But there was something more elemental and significant going on.

Digital technology is amazingly attuned, especially when properly designed and deployed, to extend the functions of human knowledge gathering and processing. In this way it can be interpreted as an extension of human evolution–of the nature of human society acting as a complex adaptive system. In fact, there are so many connections between early physical, methodological, and industrial societal developments to digitization, such as the connection between the development of the Jacquard Loom to the development of the computer punch card (and there are others) that it seems that human society would have found a way to get to this point regardless of the existence of the intervening human pioneers, though their actual contributions are clear. (For further information on the waves of development see the books Future Shock and The Third Wave by Alvin Toffler.)

When many of us first applied digitized technology to knowledge workers (in my case in the field of contract management) we found that the very introduction of the technology changed perceptions, work habits, and organizational structures in very essential ways. Like the effect of the idea of evolution as described by Daniel Dennett, the application of digital evolution is like a universal acid–it eats through and transforms everything it touches.

For example, a report that, in the past, would have taken a week or two to complete, mostly because of the research required, now took a day or so. Procurement Action Lead Times (PALT) realized significant improvements since information previously only available in paper form was now provided on-line. At the same time, systems were now able to handle greater volumes of demand. As a result, customers’ expectations changed so much that they no longer felt that they had to hold back requests for fear of overloading the system and depend on human intervention. Suppliers, seeing many commodities experiencing steady and stable growth, reverted to just-in-time manufacturing.

Over time, typing pools and secretarial staffs, the former being commonplace well into the 1980s and the latter into the 1990s, except as symbols of privilege or prestige, disappeared. Middle management and many support staffs followed this trend in the early 2000s. Today, consulting services consisting of staffing personnel to apply non-value added manual solutions such as Excel spreadsheets and PowerPoint slides to display data that has already been captured and processed, still manage to hold on in isolated pockets. That this model is not sustainable nor efficient should be obvious except for the continued support these models lend to the self-serving concept of “tools.”

Thus, the next step in the alignment of data capture and utilization to organizational vision is the interplay between our models. Practical experience suggests, though anecdotal, that as forward-facing organizations adopt more powerful digitized technologies designed to capture more and larger datasets, and to better utilize that data, that they tend to move to expand their self-awareness–their Informational Johari Window.

This, in turn, allows them to distinguish between structured and unstructured data and the value–the qualitative information content–of these datasets. This knowledge is then applied to reduce the labor and custom code required for larger data capture and utilization. In the end, these developments then determine what is the art of the possible by moving and expanding the Informational Overton Window.

Combining these concepts from a data perspective results in a combined model as illustrated below from the perspective of the subject:

Data Window of Perception and Possibility (Subject)

Extending this concept to the external subject (object or others) results in the following:

Data Window of Perception and Possibility (Object or Others)

This simplistic model describes several ways of looking at the problem of data and how to align it with its use to serve our purposes. When we gather data from the world the result can be symmetrical or asymmetrical. That is, each of us does not have the capacity to collect the same data that may be relevant to our existence or the survival of our organizations or institutions.

This same concept of symmetry and asymmetry applies to our ability to process data into information and–further–to properly apply information to when it will contribute to a decisive outcome in terms of knowledge, understanding, insight, or action.

As with the psychological Johari Window, our model takes it account the unknown within the much larger data space. Think of our Big Blue Ball (which is not so big) within the context of space. All of space represents the data of the universe. We are finding that the secrets of vast space-time are found in quanta as well in the observations of large and distant celestial events and objects. Data is everywhere. Yet, we can perceive only a small part of the universe. That is why our Data Window does not encompass the entire data space.

The quadrants, of course, are rarely co-equal, but for purposes of simplicity they are shown as such. As with the psychological Johari Window of self-awareness, the tension and conflict within the individual and its relationship with the external world is in the adjustment of the sizes of the quadrants that, hopefully, tend toward more self-awareness and openness. From the perspective of data, the equivalent is toward the expansion of the physical expansion of the Data Window while the quadrants within the window expand to minimize asymmetry of external knowledge and the unknown.

The physical limitations of symmetry, asymmetry, and the unknown portions of the data space is further limited by our perceptions. Our understanding of what is possible, acceptable, sensible, radical, unthinkable, and impossible is influenced by these perceptions. Those areas of information management that fall within some mean or midpoint of the limitations of our perceptions represent current practice and which, as with the original Johari Window, I label as “policy,” though a viable alternative label would be “practice.”

Note that there perceptions vary by the position of the subject. In the case of our own perceptions, as for those reading this post, the first variation of the model is aligned vertically. For the case of the perceptions of others, which are important in understanding their position when advocating a particular course of action, the perception model is aligned horizontally across the quadrants.

The interplay of the quadrants within the Data Window directly affect how we perceive the use of data and its potential. Thus, I have labeled the no-man’s-land portion that pushes into areas that are unknown to the subject and external object is labeled as “The Frontier.”

To an American a “frontier” is an unexplored country while, historically, in the Old World a “frontier” is a border. The former promises not only risk, but, also opportunity and invites exploration. The latter is a limitation. No doubt, my use of the term is culturally biased to the first definition.

Intellectually and physically, as we enter the frontier and learn what secrets await us there, we learn. For data we may first see a Repository of Babel and deal with it as if it were flat. But, given enough exploration we will learn its lexicon and underlying structure and, eventually, learn how to process it into information and harness its content. This, in turn, will influence the size of the Data Window, the relative sizes of the quadrants, and our perceptions of the art of the possible.

Conception to Application

This model, I believe, is a useful antecedent concept in approaching and making comprehensible what is often called Big Data. The model also helps us be more precise in how we perceive and define the term as technology changes, given that exponential increases in hardware storage and processing capabilities expand our Data Window.

Furthermore, understanding the interplay of how wee approach data, and the consequences of our perceptions of it, allow us to weigh the risk when looking at new technologies and the characteristics they need to possess in order to meet organizational goals and vision. The initial bias, as noted by Paul Kahneman in his book Thinking, Fast and Slow, is for people to stick with the status quo or the familiar–the devil they know–in lieu of something new and innovative, even when the advantages of adoption of the new innovation are clearly obvious. It requires a reorientation of thinking to allow the acceptance of the new.

Our familiar patterns when thinking about information is to look for solutions that are “tools.” The new, unfamiliar concept that we find challenging is the understanding that we do not know what we do not know when it come to data and its potential–that we must push into the frontier in order to do so–and doing so will require not only new technology that is oriented toward the optimization of data, its processing from information to knowledge, and its use, but also a new way of thinking about it and how it will align with our organizational strategy.

This can only be done by first starting with a benchmark–to practically take stock–of where we individually as organizations and where we need to be in terms of understanding our mission or purpose. For project controls and project management there is no area more at odds with this alignment.

Recently, Dave Gordon in his blog The Practicing IT Project Manager argued why project managers needed to align their projects with organizational strategy. He noted that in 2015, during the development of the “Talent Triangle” that the Project Management Institute found that a major deficiency noted by organizations was that project managers needed to take an active role in aligning their projects with organizational strategy.

As I previously noted, there are a number of project management tools on the market today and a number of data visualization tools. Yet, there are significant gaps not only in the capture, quality, and processing of data, but also in the articulation of a consistent data strategy that aligns with the project organization and the overarching organization’s business strategy, goals, and priorities.

For example, in government, program managers spend a large portion of the year defending their programs to show that they are effectively and efficiently overseeing the expenditure of resources: that they are “executing program.” Failure to execute program will result in a budget mark, or worse, result in a re-baseline, or possible restructuring or cancellation. Projected production may be scaled back in favor of more immediate priorities.

Yet, none of our so-called “tools” fully capture program execution as it is defined by agencies and Congress. We have performance management tools, earned value tools, and the list can go on. A typical program manager in government spends almost five months assessing and managing program execution, and defending program and only a few minutes each month reviewing performance. This fact alone should be indicative that our priorities are misaligned.

The intersection of organizational alignment and program management in this case is related to resource utilization and program execution. No doubt, project controls and performance management contribute to our understanding of program execution, but they are removed from informing both the program manager and the organization in a comprehensive manner about execution, risk, and opportunity–and whether those elements conflict with or align with the agency’s goals. They are even further removed from an understanding of decisions related to program execution on the interrelationships across spectrum of the project and program portfolio.

The reason for this condition is that the data is currently not being captured and processed in a comprehensive manner to be positioned for its effective exploitation and utilization in meeting the needs of the various levels of the organization, nor does the perception of the specific data needed align with organizational needs.

Correspondingly, in construction and upstream oil and gas, project managers and stakeholders are most concerned with scope, timeliness, and the inevitable questions of claims–especially the avoidance or equitable settlement of the last.

As with government, our data strategy must align with our organizational goals and vision from the perspective of all stakeholders in the effort. At the heart of this alignment is data and those technologies “fitted” to exploit it and align it with our needs.

Potato, Potahto, Tomato, Tomahto: Data Normalization vs. Standardization, Why the Difference Matters

In my vocation I run a technology company devoted to program management solutions that is primarily concerned with taking data and converting it into information to establish a knowledge-based environment. Similarly, in my avocation I deal with the meaning of information and how to turn it into insight and knowledge. This latter activity concerns the subject areas of history, sociology, and science.

In my travels just prior to and since the New Year, I have come upon a number of experts and fellow enthusiasts in these respective fields. The overwhelming numbers of these encounters have been productive, educational, and cordial. We respectfully disagree in some cases about the significance of a particular approach, governance when it comes to project and program management policy, but generally there is a great deal of agreement, particularly on basic facts and terminology. But some areas of disagreement–particularly those that come from left field–tend to be the most interesting because they create an opportunity to clarify a larger issue.

In a recent venue I encountered this last example where the issue was the use of the phrase data normalization. The issue at hand was that the use of “data normalization” suggested some statistical methodology in reconciling data into a standard schema. Instead, it was suggested, the term “data standardization” was more appropriate.

These phrases do not describe the same thing, but they do describe processes that are symbiotic, not mutually exclusive. So what about data normalization? No doubt there is a statistical use of the term, but we are dealing with the definition as used in digital technology here, just as the use of “standardization” was suggested in the same context. There are many examples of technical terminology that do not have the same meaning when used in different contexts. Here is the definition of normalization applied to data science from Technopedia, which is the proper use of the term in this case:

Normalization is the process of reorganizing data in a database so that it meets two basic requirements: (1) There is no redundancy of data (all data is stored in only one place), and (2) data dependencies are logical (all related data items are stored together). Normalization is important for many reasons, but chiefly because it allows databases to take up as little disk space as possible, resulting in increased performance.

Normalization is also known as data normalization

This is pretty basic (and necessary) stuff. I have written at length about data normalization, but also pair it with two other terms. This is data rationalization and contextualization. Here is a short definition of rationalization:

What is the benefit of Data Rationalization? To be able to effectively exploit, manage, reuse, and govern enterprise data assets (including the models which describe them), it is necessary to be able to find them. In addition, there is (or should be) a wealth of semantics (e.g. business names, definitions, relationships) embedded within an organization’s models that can be exposed for improved analysis and knowledge transfer. By linking model objects (across or within models) it is possible to discover the higher order conceptual objects for any given object. Conversely, it is possible to identify what implementation artifacts implement a higher order model object. For example, using data rationalization, one can traverse from a conceptual model entity to a logical model entity to a physical model table to a database table, etc. Similarly, Data Rationalization enables understanding of a database table by traversing up through the different model levels.

Finally, we have contextualization. Here is a good definition using Wikipedia:

Context or contextual information is any information about any entity that can be used to effectively reduce the amount of reasoning required (via filtering, aggregation, and inference) for decision making within the scope of a specific application.[2] Contextualisation is then the process of identifying the data relevant to an entity based on the entity’s contextual information. Contextualisation excludes irrelevant data from consideration and has the potential to reduce data from several aspects including volume, velocity, and variety in large-scale data intensive applications

There is no approximation of reflecting the accuracy of data in any of these terms wihin the domain of data and computer science. Nor are there statistical methods involved to approximate what needs to be accomplished precisely. The basic skill required to accomplish these tasks–knowing that the data is structured and pre-conditioned–is to reconcile the various lexicons from differing sources, much as I reconcile in my avocation the meaning of words and phrases across periods in history and across languages.

In this discussion we are dealing with the issue of different words used to describe a process or phenomenon. Similarly, we find this challenge in data.

So where does this leave data standardization? In terms of data and computer science, this describes a completely different method. Here is a definition from Wikipedia, which is the proper contextual use of the term under “Standard data model”:

A standard data model or industry standard data model (ISDM) is a data model that is widely applied in some industry, and shared amongst competitors to some degree. They are often defined by standards bodies, database vendors or operating system vendors.

In the context of project and program management, particularly as it relates to government data submission and international open standards across vendors in an industry, is the use of a common schema. In this case there is a DoD version of a UN/CEFACT XML file currently set as the standard, but soon to be replaced by a new standard using the JSON file structure.

In any event, what is clear here is that, while standardization is a necessary part of a data policy to allow for sharing of information, the strength of the chosen schema and the instructions regarding it will vary–and this variation will have an effect on the quality of the information shared. But that is not all.

This is where data normalization, rationalization, and contextualization come into play. In order to create data for the a standardized format, it is first necessary to convert what is an otherwise opaque set of data due to differences into a cohesive lexicon. In data, this is accomplished by reconciling data dictionaries to determine which items are describing the same thing, process, measure, or phenomenon. In a domain like program management, this is a finite set. But it is also specialized knowledge and where the value is added to any end product that is produced. Then, once we know how to identify the data, we must be able to map those terms to the standard schema but, keeping on eye on the use of the data down the line, must be able to properly structure and ensure interrelationships of the data are established and/or maintained to ensure its effective use. This is no mean task and why all data transformation methods and companies are not the same.

Furthermore, these functions can be accomplished efficiently or inefficiently. The inefficient method is to take the old-fashioned business intelligence method that has been around since the 1980s and before, where a team of data scientists and analysts deal with data as if it is flat and, essentially, reinvents the wheel in establishing the meaning and proper context of the data. Given enough time and money anything can be accomplished, but brute force labor will not defeat the Second Law of Thermodynamics.

In computing, which comes close to minimizing that physical law, we know that data has already been imbued with meaning upon its initial processing. In lieu of brute force labor we apply intelligence and knowledge to accomplish this requirement. This is called normalization, rationalization, and contextualization of data. It requires a small fraction of other methods in terms of time and effort, and is infinitely more transparent.

Using these methods is also where innovation, efficiency, performance, accuracy, scalability, and anticipating future requirements based on the latest technology trends comes into play. Establishing a seamless flow of data integration allows, for example, the capture of more data being able to be properly structured in a database, which lays the ground for the transition from 2D to 3D and 4D (that is, what is often called integrated) program management, as well as more effective analytics.

The term “standardization” also suffers from a weakness in data and computer science that requires that it be qualified. After all, data standardization in an enterprise or organization does not preclude the prescription of a propriety dataset. In government, this is contrary to both statutory and policy mandates. Furthermore, even given an effective, open standard, there will be a large pool of legacy and other non-conforming data that will still require capture and transformation.

The Section 809 Panel study dealt directly with this issue:

Use existing defense business system open-data requirements to improve strategic decision making on acquisition and workforce issues…. DoD has spent billions of dollars building the necessary software and institutional infrastructure to collect enterprise wide acquisition and financial data. In many cases, however, DoD lacks the expertise to effectively use that data for strategic planning and to improve decision making. Recommendation 88 would mitigate this problem by implementing congressional open-data mandates and using existing hiring authorities to bolster DoD’s pool of data science professionals.

Section 809 Volume 3, Section 9, p.477

As operating environment companies expose more and more capability into the market through middleware and other open systems methods of visualizing data, the key to a system no longer resides in its ability to produce charts and graphs. The use of Excel as an ad hoc data repository with its vulnerability to error, to manipulation, and for its resistance to the establishment of an optimized data management and corporate knowledge environment is a symptom of the larger issue.

Data and its proper structuring is at the core of organizational success and process improvement. Standardization alone will not address barriers to data optimization. According to RAND studies in 2015 and 2017* these are:

  • Data Quality and Discontinuities
  • Data Silos and Underutilized Repositories
  • Timeliness of Data for use by SMEs and Decision-makers
  • Lack of Access and Contextualization
  • Traceability and Auditability
  • Lack of the Ability to Apply Discovery in the Data
  • The issue of Contractual Technical Data and Proprietary Data

That these issues also exist in private industry demonstrates the universality of the issue. Thus, yes, standardize by all means. But also ensure that the standard is open and that transformation is traceable and auditable from the the source system to the standard schema, and then into the target database. Only then will the enterprise, the organization, and the government agency have full ownership of the data it requires to efficiently and effectively carry out its purpose.

*RAND Corporation studies are “Issues with Access to Acquisition Data and Information in the DoD: Doing Data Right in Weapons System Acquisition” (RR880, 2017), and “Issues with Access to Acquisition Data and Information in the DoD: Policy and Practice (RR1534, 2015). These can be found here.

Open: Strategic Planning, Open Data Systems, and the Section 809 Panel

Sundays are usually days reserved for music and the group Rhye was playing in the background when this topic came to mind.

I have been preparing for my presentation in collaboration with my Navy colleague John Collins for the upcoming Integrated Program Management Workshop in Baltimore. This presentation will be a non-proprietary/non-commercial talk about understanding the issue of unlocking data to support national defense systems, but the topic has broader interest.

Thus, in advance of that formal presentation in Baltimore, there are issues and principles that are useful to cover, given that data capture and its processing, delivery, and use is at the heart of all systems in government, and private industry and organizations.

Top Data Trends in Industry and Their Relationship to Open Data Systems

According to Shohreh Gorbhani, Director, Project Control Academy, the top five data trends being pursued by private industry and technology companies. My own comments follow as they relate to open data systems.

  1. Open Technologies that transition from 2D Program Management to 3D and 4D PM. This point is consistent with the College of Performance Management’s emphasis on IPM, but note that the stipulation is the use of open technologies. This is an important distinction technologically, and one that I will explore further in this post.
  2. Real-time Data Capture. This means capturing data in the moment so that the status of our systems is up-to-date without the present delays associated with manual data management and conditioning. This does not preclude the collection of structured, periodic data, but also does include the capture of transactions from real-time integrated systems where appropriate.
  3. Seamless Data Flow Integration. From the perspective of companies in manufacturing and consumer products, technologies such as IoT and Cloud are just now coming into play. But, given the underlying premises of items 1 and 2, this also means the proper automated contextualization of data using an open technology approach that flows in such a way as to be traceable.
  4. The use of Big Data. The term has lost a good deal of its meaning because of its transformation into a buzz-phrase and marketing term. But Big Data refers to the expansion in the depth and breadth of available data driven by the economic forces that drive Moore’s Law. What this means is that we are entering a new frontier of data processing and analysis that will, no doubt, break down assumptions regarding the validity and strength of certain predictive analytics. The old assumptions that restrict access to data due to limitations of technology and higher cost no longer apply. We are now in the age of Knowledge Discovery in Data (KDD). The old approach of reporting assumed that we already know what we need to know. The use of data challenges old assumptions and allows us to follow the data where it will lead us.
  5. AI Forecasting and Analysis. No doubt predictive AI will be important as we move forward with machine learning and other similar technologies. But this infant is not yet a rug rat. The initial experiences with AI are that they tend to reflect the biases of the creators. The danger here is that this defeats KDD, which results in stagnation and fugue. But there are other areas where AI can be taught to automate mundane, value-neutral tasks relating to raw data interpretation.

The 809 Panel Recommendation

The fact that industry is the driving force behind these trends that will transform the way that we view information in our day-to-day work, it is not surprising that the 809 Panel had this to say about existing defense business systems:

“Use existing defense business system open-data requirements to improve strategic decision making on acquisition and workforce issues…. DoD has spent billions of dollars building the necessary software and institutional infrastructure to collect enterprise wide acquisition and financial data. In many cases, however, DoD lacks the expertise to effectively use that data for strategic planning and to improve decision making. Recommendation 88 would mitigate this problem by implementing congressional open-data mandates and using existing hiring authorities to bolster DoD’s pool of data science professionals.”

Section 809 Volume 3, Section 9, p. 477

At one point in my military career, I was assigned as the Materiel, Fuels, and Transportation Officer of Naval Air Station, Norfolk. As a major naval air base, transportation hub, and home to a Naval Aviation Depot, we shipped and received materiel and supplies across the world. In doing so, our transportation personnel would use what at the time was new digital technology to complete an electronic bill of lading that specified what and when items were being shipped, the common or military carrier, the intended recipient, and the estimated date of arrival, among other essential information.

The customer and receiving end of this workflow received an open systems data file that contained these particulars. The file was an early version of open data known as an X12 file, for which the commercial transportation industry was an early adopter. Shipping and receiving activities and businesses used their own type of local software: and there were a number of customized and commercial choices out there, as well as those used by common carriers such various trucking and shipping firms, the USPS, FEDEX, DHS, UPS, and others. The X12 file was the DMZ that made the information open. Software manufacturers, if they wanted to stay relevant in the market, could not impose a proprietary data solution.

Furthermore, standardization of terminology and concepts ensured that the information was readable and comprehensible wherever the items landed–whether across receiving offices in the United States, Japan, Europe, or even Istanbul. Understanding that DoD needs the skillsets to be able to optimize data, it didn’t require an army of data scientists to achieve this end-state. It required the right data science expertise in the right places, and the dictates of transportation consumers to move the technology market to provide the solution.

Over the years both industry and government have developed a number of schema standards focused on specific types of data, progressing from X12 to XML and now projected to use JSON-based schemas. Each of them in their initial iterations automated the submission of physical reports that had been required by either by contract or operations. These focused on a small subset of the full dataset relating to program management and project controls.

This progression made sense.

When digitized technology is first introduced into an intensive direct-labor environment, the initial focus is to automate the production of artifacts and their underlying processes in order to phase in the technology’s acceptance. This also allows the organization to realize immediate returns on investment and improvements in productivity. But this is the first step, not the final one.

Currently for project controls the current state is the UN/CEFACT XML for program performance management data, and the contract cost and labor data collection file known as the FlexFile. Clearly the latter file, given that the recipient is the Office of the Secretary of Defense Cost Assessment and Program Evaluation (OSD CAPE), establish it as one of many feedback loops that support that office’s role in coordinating the planning, programming, budgeting, and evaluation (PPBE) system related to military strategic investments and budgeting, but only one. The program performance information is also a vital part of the PPBE process in evaluation and in future planning.

For most of the U.S. economy, market forces and consumer requirements are the driving force in digital innovation. The trends noted by Ms. Gorbhani can be confirmed through a Google search of any one of the many technology magazines and websites that can be found. The 809 Panel, drawn as it was from specialists and industry and government, were tasked “to provide recommendations that would allow DoD to adapt and deliver capability at market speeds, while ensuring that DoD remains true to its commitment to promote competition, provide transparency in its actions, and maintain the integrity of the defense acquisition system.”

Given that the work of the DoD is unique, creating a type of monopsony, it is up to leadership within the Department to create the conditions and mandates necessary to recreate in microcosm the positive effects of market forces. The DoD also has a very special, vital mission in defending the nation.

When an individual business cobbles together its mission statement it is that mission that defines the necessary elements in data collection that are then essential in making decisions. In today’s world, best commercial sector practice is to establish a Master Data Management (MDM) approach in defining data requirements and practices. In the case of DoD, a similar approach would be beneficial. Concurrent with the period of the 809 Panel’s efforts, RAND Corporation delivered a paper in 2017 (link in the previous sentence) that made recommendations related to data governance that are consistent with the 809 Panel’s recommendations. We will be discussing these specific recommendations in our presentation.

Meeting the mission and readiness are the key components to data governance in DoD. Absent such guidance, specialized software solution providers, in particular, will engage in what is called “rent-seeking” behavior. This is an economic term that means that an “entity (that) seeks to gain added wealth without any reciprocal contribution of productivity.”

No doubt, given the marketing of software solution providers, it is hard for decision-makers to tell what constitutes an open data system. The motivation of a software solution is to make itself as “sticky” as possible and it does that by enticing a customer to commit to proprietary definitions, structures, and database schemas. Usually there are “black-boxed” portions of the software that makes traceability impossible and that complicates the issue of who exactly owns the data and the ability of the customer to optimize it and utilize it as the mission dictates.

Furthermore, data visualization components like dashboards are ubiquitous in the market. A cursory stroll through a tradeshow looks like a dashboard smorgasbord combined with different practical concepts of what constitutes “open” and “integration”.

As one DoD professional recently told me, it is hard to tell the software systems apart. To do this it is necessary to understand what underlies the software. Thus, a proposed honest-broker definition of an open data system is useful and the place to start, given that this is not a notional concept since such systems have been successfully been established.

The Definition of Open Data Systems

Practical experience in implementing open data systems toward the goal of optimizing essential information from our planning, acquisition, financial, and systems engineering systems informs the following proposed definition, which is based on commercial best practice. This proposal is also based on the principle that the customer owns the data.

  1. An open data system is one based on non-proprietary neutral schemas that allow for the effective capture of all essential elements from third-party proprietary and customized software for reporting and integration necessary to support both internal and external stakeholders.
  2. An open data system allows for complete traceability and transparency from the underlying database structure of the third-party software data, through the process of data capture, transformation, and delivery of data in the neutral schema.
  3. An open data system targets the loading of the underlying source data for analysis and use into a neutral database structure that replicates the structure of the neutral schema. This allows for 100% traceability and audit of data elements received through the neutral schema, and ensures that the receiving organization owns the data.

Under this definition, data from its origination to its destination is more easily validated and traced, ensuring quality and fidelity, and establishing confidence in its value. Given these characteristics, integration of data from disparate domains becomes possible. The tracking of conflicting indicators is mitigated, since open system data allows for its effective integration without the bias of proprietary coding or restrictions on data use. Finally, both government and industry will not only establish ownership of their data–a routine principle in commercial business–but also be free to utilize new technologies that optimize the use of that data.

In closing, Gahan Wilson, a cartoonist whose work appeared in National Lampoon, The New Yorker, Playboy, and other magazines recently passed.

When thinking of the barriers to the effective use of data, I came across this cartoon in The New Yorker:

Open Data is the key to effective integration and reporting–to the optimal use of information. Once mandated and achieved, our defense and business systems will be better informed and be able to test and verify assumed knowledge, address risk, and eliminate dogmatic and erroneous conclusions. Open Data is the driver of organizational transformation keyed to the effective understanding and use of information, and all that entails. Finally, Open Data is necessary to the mission and planning systems of both industry and the U.S. Department of Defense.

Both Sides Now — The Value of Data Exploration

Over the last several months I have authored a number of stillborn articles that just did not live up to the standards that I set for this blog site. After all, sometimes we just have nothing important to add to the conversation. In a world dominated by narcissism, it is not necessary to constantly have something to say. Some reflection and consideration are necessary, especially if one is to be as succinct as possible.

A quote ascribed to Woodrow Wilson, which may be apocryphal, though it does appear in two of his biographies, was in response to being lauded by someone for making a number of short, succinct, and informative speeches. When asked how he was able to do this, President Wilson is supposed to have replied:

“It depends. If I am to speak ten minutes, I need a week for preparation; if fifteen minutes, three days; if half an hour, two days; if an hour, I am ready now.”

An undisciplined mind has a lot to say about nothing in particular with varying degrees of fidelity to fact or truth. When in normal conversation we most often free ourselves from the discipline expected for more rigorous thinking. This is not necessarily a bad thing if we are saying nothing of consequence and there are gradations, of course. Even the most disciplined mind gets things wrong. We all need editors and fact checkers.

While I am pulling forth possibly apocryphal quotes, the one most applicable that comes to mind is the comment by Hemingway as told by his deckhand in Key West and Cuba, Arnold Samuelson. Hemingway was supposed to have given this advice to the aspiring writer:

“Don’t get discouraged because there’s a lot of mechanical work to writing. There is, and you can’t get out of it. I rewrote the first part of A Farewell to Arms at least fifty times. You’ve got to work it over. The first draft of anything is shit. When you first start to write you get all the kick and the reader gets none, but after you learn to work it’s your object to convey everything to the reader so that he remembers it not as a story he had read but something that happened to himself.”

Though it deals with fiction, Hemingway’s advice applies to any sort of writing and rhetoric. Dr. Roger Spiller, who more than anyone mentored me as a writer and historian, once told me, “Writing is one of those skills that, with greater knowledge, becomes harder rather than easier.”

As a result of some reflection, over the last few months, I had to revisit the reason for the blog. Thus, this is still its purpose: it is a way to validate ideas and hypotheses with other professionals and interested amateurs in my areas of interest. I try to keep uninformed opinion in check, as all too many blogs turn out to be rants. Thus, a great deal of research goes into each of these posts, most from primary sources and from interactions with practitioners in the field. Opinions and conclusions are my own, and my reasoning for good or bad are exposed for all the world to see and I take responsibility for them.

This being said, part of my recent silence has also been due to my workload in–well–the effort involved in my day job of running a technology company, and in my recent role, since late last summer, as the Managing Editor of the College of Performance Management’s publication known as the Measurable News. Our emphasis in the latter case has been to find new contributions to the literature regarding business analytics and to define the concept of integrated project, program, and portfolio management. Stepping slightly over the line to make a pitch, I recommend anyone interested in contributing to the publication to submit an article. The submission guidelines can be found here.

Both Sides Now: New Perspectives

That out of the way, I recently saw, again on the small screen, the largely underrated movie about Neil Armstrong and the Apollo 11 moon landing, “First Man”, and was struck by this scene:

Unfortunately, the first part of the interview has been edited out of this clip and I cannot find a full scene. When asked “why space” he prefaces his comments by stating that the atmosphere of the earth seems to be so large from the perspective of looking at it from the ground but that, having touched the edge of space previously in his experience as a test pilot of the X15, he learned that it is actually very thin. He then goes on to posit that looking at the earth from space will give us a new perspective. His conclusion to this observation is then provided in the clip.

Armstrong’s words were prophetic in that the space program provided a new perspective and a new way of looking at things that were in front of us the whole time. Our spaceship Earth is a blue dot in a sea of space and, at least for a time, the people of our planet came to understand both our loneliness in space and our interdependence.

Earth from Apollo 8. Photo courtesy of NASA.

 

The impact of the Apollo program resulted in great strides being made in environmental and planetary sciences, geology, cosmology, biology, meteorology, and in day-to-day technology. The immediate effect was to inspire the environmental and human rights movements, among others. All of these advances taken together represent a new revolution in thought equal to that during the initial Enlightenment, one that is not yet finished despite the headwinds of reaction and recidivism.

It’s Life’s Illusions I Recall: Epistemology–Looking at and Engaging with the World

In his book Darwin’s Dangerous Idea, Daniel Dennett posited that what was “dangerous” about Darwinism is that it acts as a “universal acid” that, when touching other concepts and traditions, transforms them in ways that change our world-view. I have accepted this position by Dennett through the convincing argument he makes and the evidence in front of us, and it is true that Darwinism–the insight in the evolution of species over time through natural selection–has transformed our perspective of the world and left the old ways of looking at things both reconstructed and unrecognizable.

In his work, Time’s Arrow, Time’s Cycle, Stephen Jay Gould noted that Darwinism is part of one of the three great reconstructions of human thought that, in quoting Sigmund Freud, where “Humanity…has had to endure from the hand of science…outrages upon its naive self-love.” These outrages include the Copernican revolution that removed the Earth from the center of the universe, Darwinism and the origin of species, including the descent of humanity, and what John McPhee, coined as the concept of “deep time.”

But–and there is a “but”–I would propose that Darwinism and the other great reconstructions noted are but different ingredients of a larger and more broader, though compatible, type of innovation in the way the world is viewed and how it is approached–a more powerful universal acid. That innovation in thought is empiricism.

It is this approach to understanding that eats through the many ills of human existence that lead to self-delusion and folly. Though you may not know it, if you are in the field of information technology or any of the sciences, you are part of this way of viewing and interacting with the world. Married with rational thinking, this epistemology–coming from the perspectives of the astronomical observations of planets and other heavenly bodies by Charles Sanders Peirce, with further refinements by William James and John Dewey, and others have come down to us in what is known as Pragmatism. (Note that the word pragmatism in this context is not the same as the more generally used colloquial form of the word. For this type of reason Peirce preferred the term “pragmaticism”). For an interesting and popular reading of the development of modern thought and the development of Pragmatism written for the general reader I highly recommend the Pulitzer Prize-winning The Metaphysical Club by Louis Menand.

At the core of this form of empiricism is that the collection of data, that is, recording, observing, and documenting the universe and nature as it is will lead us to an understanding of things that we otherwise would not see. In our more mundane systems, such as business systems and organized efforts applying disciplined project and program management techniques and methods, we also can learn more about these complex adaptive systems through the enhanced collection and translation of data.

I Really Don’t Know Clouds At All: Data, Information, Intelligence, and Knowledge

The term “knowledge discovery in data”, or KDD for short, is an aspirational goal and so, in terms of understanding that goal, is a point of departure from the practice information management and science. I’m taking this stance because the technology industry uses terminology that, as with most language, was originally designed to accurately describe a specific phenomenon or set of methods in order to advance knowledge, only to find that that terminology has been watered down to the point where it obfuscates the issues at hand.

As I traveled to locations across the U.S. over the last three months, I found general agreement among IT professionals who are dealing with the issues of “Big Data”, data integration, and the aforementioned KDD of this state of affairs. In almost every case there is hesitation to use this terminology because it has been absconded and abused by mainstream literature, much as physicists rail against the misuse of the concept of relativity by non-scientific domains.

The impact of this confusion in terminology has caused organizations to make decisions where this terminology is employed to describe a nebulous end-state, without the initiators having an idea of the effort or scope. The danger here, of course, is that for every small innovative company out there, there is also a potential Theranos (probably several). For an in-depth understanding of the psychology and double-speak that has infiltrated our industry I highly recommend the HBO documentary, “The Inventor: Out for Blood in Silicon Valley.”

The reason why semantics are important (as they always have been despite the fact that you may have had an associate complain about “only semantics”) is that they describe the world in front of us. If we cloud the meanings of words and the use of language, it undermines the basis of common understanding and reveals the (poor) quality of our thinking. As Dr. Spiller noted, the paradox of writing and in gathering knowledge is that the more you know, the more you realize you do not know, and the harder writing and communicating knowledge becomes, though we must make the effort nonetheless.

Thus KDD is oftentimes not quite the discovery of knowledge in the sense that the term was intended to mean. It is, instead, a discovery of associations that may lead us to knowledge. Knowing this distinction is important because the corollary processes of data mining, machine learning, and the early application of AI in which we find ourselves is really the process of finding associations, correlations, trends, patterns, and probabilities in data that is approached in a manner as if all information is flat, thereby obliterating its context. This is not knowledge.

We can measure the information content of any set of data, but the real unlocked potential in that information content will come with the processing of it that leads to knowledge. To do that requires an underlying model of domain knowledge, an understanding of the different lexicons in any given set of domains, and a Rosetta Stone that provides a roadmap that identifies those elements of the lexicon that are describing the same things across them. It also requires capturing and preserving context.

For example, when I use the chat on my iPhone it attempts to anticipate what I want to write. I am given three choices of words to choose if I want to use this shortcut. In most cases, the iPhone guesses wrong, despite presenting three choices and having at its disposal (at least presumptively) a larger vocabulary than the writer. Oftentimes it seems to take control, assuming that I have misspelled or misidentified a word and chooses the wrong one for me, where my message becomes a nonsense message.

If one were to believe the hype surrounding AI, one would think that there is magic there but, as Arthur C. Clarke noted (known as Clarke’s Third Law): “Any sufficiently advanced technology is indistinguishable from magic.” Familiar with the new technologies as we are, we know that there is no magic there, and also that it is consistently wrong a good deal of the time. But many individuals come to rely upon the technology nonetheless.

Despite the gloss of something new, the long-established methods of epistemology, code-breaking, statistics, and Calculus apply–as do standards of establishing fact and truth. Despite a large set of data, the iPhone is wrong because the iPhone does not understand–does not possess knowledge–to know why it is wrong. As an aside, its dictionary is also missing a good many words.

A Segue and a Conclusion–I Still Haven’t Found What I’m Looking For: Why Data Integration?…and a Proposed Definition of the Bigness of Data

As with the question to Neil Armstrong, so the question on data. And so the answer is the same. When we look at any set of data under a particular structure of a domain, the information we derive provides us with a manner of looking at the world. In economic systems, businesses, and projects that data provides us with a basis for interpretation, but oftentimes falls short of allowing us to effectively describe and understand what is happening.

Capturing interrelated data across domains allows us to look at the phenomena of these human systems from a different perspective, providing us with the opportunity to derive new knowledge. But in order to do this, we have to be open to this possibility. It also calls for us to, as I have hammered home in this blog, reset our definitions of what is being described.

For example, there are guides in project and program management that refer to statistical measures as “predictive analytics.” This further waters down the intent of the phrase. Measures of earned value are not predictive. They note trends and a single-point outcome. Absent further analysis and processing, the statistical fallacy of extrapolation can be baked into our analysis. The same applies to any index of performance.

Furthermore, these indices and indicators–for that is all they are–do not provide knowledge, which requires a means of not only distinguishing between correlation and causation but also applying contextualization. All systems operate in a vector space. When we measure an economic or social system we are really measuring its behavior in the vector space that it inhabits. This vector space includes the way it is manifested in space-time: the equivalent of length, width, depth (that is, its relative position, significance, and size within information space), and time.

This then provides us with a hint of a definition of what often goes by the definition of “big data.” Originally, as noted in previous blogs, big data was first used in NASA in 1997 by Cox and Ellsworth (not as credited to John Mashey on Wikipedia with the dishonest qualifier “popularized”) and was simply a statement meaning “datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.”

This is a relative term given Moore’s Law. But we can begin to peel back a real definition of the “bigness” of data. It is important to do this because too many approaches to big data assume it is flat and then apply probabilities and pattern recognition to data that undermines both contextualization and knowledge. Thus…

The Bigness of Data (B) is a function (f ) of the entropy expended (S) to transform data into information, or to extract its information content.

Information evolves. It evolves toward greater complexity just as life evolves toward greater complexity. The universe is built on coded bits of information that, taken together and combined in almost unimaginable ways, provides different forms of life and matter. Our limited ability to decode and understand this information–and our interactions in it– are important to us both individually and collectively.

Much entropy is already expended in the creation of the data that describes the activity being performed. Its context is part of its information content. Obliterating the context inherent in that information content causes all previous entropy to be of no value. Thus, in approaching any set of data, the inherent information content must be taken into account in order to avoid the unnecessary (and erroneous) application of data interpretation.

More to follow in future posts.

Post-Blogging NDIA Blues — The Latest News (Project Management Wonkish)

The National Defense Industrial Association’s Integrated Program Management Division (NDIA IPMD) just had its quarterly meeting here in sunny Orlando where we braved the depths of sub-60 degrees F temperatures to start out each day.

For those not in the know, these meetings are an essential coming together of policy makers, subject matter experts, and private industry practitioners regarding the practical and mundane state-of-the-practice in complex project management, particularly focused on the concerns of the the federal government and the Department of Defense.  The end result of these meetings is to publish white papers and recommendations regarding practice to support continuous process improvement and the practical application of project management practices–allowing for a cross-pollination of commercial and government lessons learned.  This is also the intersection where innovation among the large and small are given an equal vetting and an opportunity to introduce new concepts and solutions.  This is an idealized description, of course, and most of the petty personality conflicts, competition, and self-interest that plagues any group of individuals coming together under a common set of interests also plays out here.  But generally the days are long and the workshops generally produce good products that become the de facto standard of practice in the industry. Furthermore the control that keeps the more ruthless personalities in check is the fact that, while it is a large market, the complex project management community tends to be a relatively small one, which reinforces professionalism.

The “blues” in this case is not so much borne of frustration or disappointment but, instead, from the long and intense days that the sessions offer.  The biggest news from an IT project management and application perspective was twofold. The data stream used by the industry in sharing data in an open systems manner will be simplified.  The other was the announcement that the technology used to communicate will move from XML to JSON.

Human readable formatting to Data-focused formatting.  Under Kendall’s Better Buying Power 3.0 the goal of the Department of Defense (DoD) has been to incorporate better practices from private industry where they can be applied.  I don’t see initiatives for greater efficiency and reduction of duplication going away in the new Administration, regardless of what a new initiative is called.

In case this is news to you, the federal government buys a lot of materials and end items–billions of dollars worth.  Accountability must be put in place to ensure that the money is properly spent to acquire the things being purchased.  Where technology is pushed and where there are no commercial equivalents that can be bought off the shelf, as in the systems purchased by the Department of Defense, there are measures of progress and performance (given that the contract is under a specification) that are submitted to the oversight agency in DoD.  This is a lot of data and to be brutally frank the method and format of delivery has been somewhat chaotic, inefficient, and duplicative.  The Department moved to address this by a somewhat modest requirement of open systems submission of an application-neutral XML file under the standards established by the UN/CEFACT XML organization.  This was called the Integrated Program Management Report (IMPR).  This move garnered some improvement where it has been applied, but contracts are long-term, so incorporating improvements though new contractual requirements tends to take time.  Plus, there is always resistance to change.  The Department is moving to accelerate addressing these inefficiencies in their data streams by eliminating the unnecessary overhead associated with specifications of formatting data for paper forms and dealing with data as, well, data.  Great idea and bravo!  The rub here is that in making the change, the Department has proposed dropping XML as the technology used to transfer data and move to JSON.

XML to JSON. Before I spark another techie argument about the relative merits of each, there are some basics to understand here.  First, XML is a language, JSON is simply data exchange format.  This means that XML is specifically designed to deal with hierarchical and structured data that can be queried and where validation and fidelity checks within the data are inherent in the technology. Furthermore, XML is known to scale while maintaining the integrity of the data, which is intended for use in relational databases.  Furthermore, XML is hard to break.  It is meant for editing and will maintain its structure and integrity afterward.

The counter argument encountered is that JSON is new! and uses fewer characters! (which usually turns out to be inconsequential), and people are talking about it for Big Data and NoSQL! (but this happened after the fact and the reason for shoehorning it this way is discussed below).

So does it matter?  Yes and no.  As a supplier specializing in delivering solutions that normalize and rationalize data across proprietary file structures and leverage database capabilities, I don’t care.  I can adapt quickly and will have a proof-of-concept solution out within 30 days of receiving the schema.

The risk here, which applies to DoD and the industry, is that the decision to go to JSON is made only because it is the shiny new thing used by gamers and social networking developers.  There has also been a move to adapt to other uses because of the history of significant security risks that had been found in Java, so much so that an entire Wikipedia page is devoted to them.  Oracle just killed off Java applets, though Java hangs on.  JSON, of course, isn’t Java, but it was designed from birth as JavaScript Object Notation (hence the acronym JSON), with the purpose of handling relatively small bits of data across web servers in a number of proprietary settings.

To address JSON deficiencies relative to XML, a number of tools have been and are being developed to replicate the fidelity and reliability found in XML.  Whether this is sufficient to be effective against a structured LANGUAGE is to be seen.  Much of the overhead that technies complain about in XML is due to the native functionality related to the power it brings to the table.  No doubt, a bicycle is simpler than a Formula One racer–and this is an apt comparison.  Claiming “simpler” doesn’t pass the “So What?” test knowing the business processes involved.  The technology needs to be fit to the solution.  The purpose of data transmission using APIs is not only to make it easy to produce but for it to–you know–achieve the goals of normalization and rationalization so that it can be used on the receiving end which is where the consumer (which we usually consider to be the customer) sits.

At the end of the day the ability to scale and handle hierarchical, structured data will rely on the quality and strength of the schema and the tools that are published to enforce its fidelity and compliance.  Otherwise consuming organizations will be receiving a dozen different proprietary JSON files, and that does not address the present chaos but simply adds to it.  These issues were aired out during the meeting and it seems that everyone is aware of the risks and that they can be addressed.  Furthermore, as the schema is socialized across solutions providers, it will be apparent early if the technology will be able handle the project performance data resulting from the development of a high performance aircraft or a U.S. Navy destroyer.

Something New (Again)– Top Project Management Trends 2017

Atif Qureshi at Tasque, which I learned via Dave Gordon’s blog, went out to LinkedIn’s Project Management Community to ask for the latest tends in project management.  You can find the raw responses to his inquiry at his blog here.  What is interesting is that some of these latest trends are much like the old trends which, given continuity makes sense.  But it is instructive to summarize the ones that came up most often.  Note that while Mr. Qureshi was looking for ten trends, and taken together he definitely lists more than ten, there is a lot of overlap.  In total the major issues seem to the five areas listed below.

a.  Agile, its hybrids, and its practical application.

It should not surprise anyone that the latest buzzword is Agile.  But what exactly is it in its present incarnation?  There is a great deal of rising criticism, much of it valid, that it is a way for developers and software PMs to avoid accountability. Anyone ready Glen Alleman’s Herding Cat’s Blog is aware of the issues regarding #NoEstimates advocates.  As a result, there are a number hybrid implementations of Agile that has Agile purists howling and non-purists adapting as they always do.  From my observations, however, there is an Ur-Agile that is out there common to all good implementations and wrote about them previously in this blog back in 2015.  Given the time, I think it useful to repeat it here.

The best articulation of Agile that I have read recently comes from Neil Killick, whom I have expressed some disagreement on the #NoEstimates debate and the more cultish aspects of Agile in past posts, but who published an excellent post back in July (2015) entitled “12 questions to find out: Are you doing Agile Software Development?”

Here are Neil’s questions:

  1. Do you want to do Agile Software Development? Yes – go to 2. No – GOODBYE.
  2. Is your team regularly reflecting on how to improve? Yes – go to 3. No – regularly meet with your team to reflect on how to improve, go to 2.
  3. Can you deliver shippable software frequently, at least every 2 weeks? Yes – go to 4. No – remove impediments to delivering a shippable increment every 2 weeks, go to 3.
  4. Do you work daily with your customer? Yes – go to 5. No – start working daily with your customer, go to 4.
  5. Do you consistently satisfy your customer? Yes – go to 6. No – find out why your customer isn’t happy, fix it, go to 5.
  6. Do you feel motivated? Yes – go to 7. No – work for someone who trusts and supports you, go to 2.
  7. Do you talk with your team and stakeholders every day? Yes – go to 8. No – start talking with your team and stakeholders every day, go to 7.
  8. Do you primarily measure progress with working software? Yes – go to 9. No – start measuring progress with working software, go to 8.
  9. Can you maintain pace of development indefinitely? Yes – go to 10. No – take on fewer things in next iteration, go to 9.
  10. Are you paying continuous attention to technical excellence and good design? Yes – go to 11. No – start paying continuous attention to technical excellent and good design, go to 10.
  11. Are you keeping things simple and maximising the amount of work not done? Yes – go to 12. No – start keeping things simple and writing as little code as possible to satisfy the customer, go to 11.
  12. Is your team self-organising? Yes – YOU’RE DOING AGILE SOFTWARE DEVELOPMENT!! No – don’t assign tasks to people and let the team figure out together how best to satisfy the customer, go to 12.

Note that even in software development based on Agile you are still “provid(ing) value by independently developing IP based on customer requirements.”  Only you are doing it faster and more effectively.

With the possible exception of the “self-organizing” meme, I find that items through 11 are valid ways of identifying Agile.  Given that the list says nothing about establishing closed-loop analysis of progress says nothing about estimates or the need to monitor progress, especially on complex projects.  As a matter of fact one of the biggest impediments noted elsewhere in industry is the inability of Agile to scale.  This limitations exists in its most simplistic form because Agile is fine in the development of well-defined limited COTS applications and smartphone applications.  It doesn’t work so well when one is pushing technology while developing software, especially for a complex project involving hundreds of stakeholders.  One other note–the unmentioned emphasis in Agile is technical performance measurement, since progress is based on satisfying customer requirements.  TPM, when placed in the context of a world of limited resources, is the best measure of all.

b.  The integration of new technology into PM and how to upload the existing PM corporate knowledge into that technology.

This is two sides of the same coin.  There is always  debate about the introduction of new technologies within an organization and this debate places in stark contrast the differences between risk aversion and risk management.

Project managers, especially in the complex project management environment of aerospace & defense tend, in general, to be a hardy lot.  Consisting mostly of engineers they love to push the envelope on technology development.  But there is also a stripe of engineers among them that do not apply this same approach of measured risk to their project management and business analysis system.  When it comes to tracking progress, resource management, programmatic risk, and accountability they frequently enter the risk aversion mode–believing that the less eyes on what they do the more leeway they have in achieving the technical milestones.  No doubt this is true in a world of unlimited time and resources, but that is not the world in which we live.

Aside from sub-optimized self-interest, the seeds of risk aversion come from the fact that many of the disciplines developed around performance management originated in the financial management community, and many organizations still come at project management efforts from perspective of the CFO organization.  Such rice bowl mentality, however, works against both the project and the organization.

Much has been made of the wall of honor for those CIA officers that have given their lives for their country, which lies to the right of the Langley headquarters entrance.  What has not gotten as much publicity is the verse inscribed on the wall to the left:

“And ye shall know the truth and the truth shall make you free.”

      John VIII-XXXII

In many ways those of us in the project management community apply this creed to the best of our ability to our day-to-day jobs, and it lies as the basis for all of the management improvement from Deming’s concept of continuous process improvement, through the application of Six Sigma and other management improvement methods.  What is not part of this concept is that one will apply improvement only when a customer demands it, though they have asked politely for some time.  The more information we have about what is happening in our systems, the better the project manager and the project team is armed with applying the expertise which qualified the individuals for their jobs to begin with.

When it comes to continual process improvement one does not need to wait to apply those technologies that will improve project management systems.  As a senior management (and well-respected engineer) when I worked in Navy told me; “if my program managers are doing their job virtually every element should be in the yellow, for only then do I know that they are managing risk and pushing the technology.”

But there are some practical issues that all managers must consider when managing the risks in introducing new technology and determining how to bring that technology into existing business systems without completely disrupting the organization.  This takes–good project management practices that, for information systems, includes good initial systems analysis, identification of those small portions of the organization ripe for initial entry in piloting, and a plan of data normalization and rationalization so that corporate knowledge is not lost.  Adopting systems that support more open systems that militate against proprietary barriers also helps.

c.  The intersection of project management and business analysis and its effects.

As data becomes more transparent through methods of normalization and rationalization–and the focus shifts from “tools” to the knowledge that can be derived from data–the clear separation that delineated project management from business analysis in line-and-staff organization becomes further blurred.  Even within the project management discipline, the separation in categorization of schedule analysts from cost analysts from financial analyst are becoming impediments in fully exploiting the advantages in looking at all data that is captured and which affects project performance.

d.  The manner of handling Big Data, business intelligence, and analytics that result.

Software technologies are rapidly developing that break the barriers of self-contained applications that perform one or two focused operations or a highly restricted group of operations that provide functionality focused on a single or limited set of business processes through high level languages that are hard-coded.  These new technologies, as stated in the previous section, allow users to focus on access to data, making the interface between the user and the application highly adaptable and customizable.  As these technologies are deployed against larger datasets that allow for integration of data across traditional line-and-staff organizations, they will provide insight that will garner businesses competitive advantages and productivity gains against their contemporaries.  Because of these technologies, highly labor-intensive data mining and data engineering projects that were thought to be necessary to access Big Data will find themselves displaced as their cost and lack of agility is exposed.  Internal or contracted out custom software development devoted along these same lines will also be displaced just as COTS has displaced the high overhead associated with these efforts in other areas.  This is due to the fact that hardware and processes developments are constantly shifting the definition of “Big Data” to larger and larger datasets to the point where the term will soon have no practical meaning.

e.  The role of the SME given all of the above.

The result of the trends regarding technology will be to put the subject matter expert back into the driver’s seat.  Given adaptive technology and data–and a redefinition of the analyst’s role to a more expansive one–we will find that the ability to meet the needs of functionality and the user experience is almost immediate.  Thus, when it comes to business and project management systems, the role of Agile, while these developments reinforce the characteristics that I outlined above are made real, the weakness of its applicability to more complex and technical projects is also revealed.  It is technology that will reduce the risk associated with contract negotiation, processes, documentation, and planning.  Walking away from these necessary components to project management obfuscates and avoids the hard facts that oftentimes must be addressed.

One final item that Mr. Qureshi mentions in a follow-up post–and which I have seen elsewhere in similar forums–concerns operational security.  In deployment of new technologies a gatekeeper must be aware of whether that technology will not open the organization’s corporate knowledge to compromise.  Given the greater and more integrated information and knowledge garnered by new technology, as good managers it is incumbent to ensure these improvements do not translate into undermining the organization.