Open: Strategic Planning, Open Data Systems, and the Section 809 Panel

Sundays are usually days reserved for music and the group Rhye was playing in the background when this topic came to mind.

I have been preparing for my presentation in collaboration with my Navy colleague John Collins for the upcoming Integrated Program Management Workshop in Baltimore. This presentation will be a non-proprietary/non-commercial talk about understanding the issue of unlocking data to support national defense systems, but the topic has broader interest.

Thus, in advance of that formal presentation in Baltimore, there are issues and principles that are useful to cover, given that data capture and its processing, delivery, and use is at the heart of all systems in government, and private industry and organizations.

Top Data Trends in Industry and Their Relationship to Open Data Systems

According to Shohreh Gorbhani, Director, Project Control Academy, the top five data trends being pursued by private industry and technology companies. My own comments follow as they relate to open data systems.

  1. Open Technologies that transition from 2D Program Management to 3D and 4D PM. This point is consistent with the College of Performance Management’s emphasis on IPM, but note that the stipulation is the use of open technologies. This is an important distinction technologically, and one that I will explore further in this post.
  2. Real-time Data Capture. This means capturing data in the moment so that the status of our systems is up-to-date without the present delays associated with manual data management and conditioning. This does not preclude the collection of structured, periodic data, but also does include the capture of transactions from real-time integrated systems where appropriate.
  3. Seamless Data Flow Integration. From the perspective of companies in manufacturing and consumer products, technologies such as IoT and Cloud are just now coming into play. But, given the underlying premises of items 1 and 2, this also means the proper automated contextualization of data using an open technology approach that flows in such a way as to be traceable.
  4. The use of Big Data. The term has lost a good deal of its meaning because of its transformation into a buzz-phrase and marketing term. But Big Data refers to the expansion in the depth and breadth of available data driven by the economic forces that drive Moore’s Law. What this means is that we are entering a new frontier of data processing and analysis that will, no doubt, break down assumptions regarding the validity and strength of certain predictive analytics. The old assumptions that restrict access to data due to limitations of technology and higher cost no longer apply. We are now in the age of Knowledge Discovery in Data (KDD). The old approach of reporting assumed that we already know what we need to know. The use of data challenges old assumptions and allows us to follow the data where it will lead us.
  5. AI Forecasting and Analysis. No doubt predictive AI will be important as we move forward with machine learning and other similar technologies. But this infant is not yet a rug rat. The initial experiences with AI are that they tend to reflect the biases of the creators. The danger here is that this defeats KDD, which results in stagnation and fugue. But there are other areas where AI can be taught to automate mundane, value-neutral tasks relating to raw data interpretation.

The 809 Panel Recommendation

The fact that industry is the driving force behind these trends that will transform the way that we view information in our day-to-day work, it is not surprising that the 809 Panel had this to say about existing defense business systems:

“Use existing defense business system open-data requirements to improve strategic decision making on acquisition and workforce issues…. DoD has spent billions of dollars building the necessary software and institutional infrastructure to collect enterprise wide acquisition and financial data. In many cases, however, DoD lacks the expertise to effectively use that data for strategic planning and to improve decision making. Recommendation 88 would mitigate this problem by implementing congressional open-data mandates and using existing hiring authorities to bolster DoD’s pool of data science professionals.”

Section 809 Volume 3, Section 9, p. 477

At one point in my military career, I was assigned as the Materiel, Fuels, and Transportation Officer of Naval Air Station, Norfolk. As a major naval air base, transportation hub, and home to a Naval Aviation Depot, we shipped and received materiel and supplies across the world. In doing so, our transportation personnel would use what at the time was new digital technology to complete an electronic bill of lading that specified what and when items were being shipped, the common or military carrier, the intended recipient, and the estimated date of arrival, among other essential information.

The customer and receiving end of this workflow received an open systems data file that contained these particulars. The file was an early version of open data known as an X12 file, for which the commercial transportation industry was an early adopter. Shipping and receiving activities and businesses used their own type of local software: and there were a number of customized and commercial choices out there, as well as those used by common carriers such various trucking and shipping firms, the USPS, FEDEX, DHS, UPS, and others. The X12 file was the DMZ that made the information open. Software manufacturers, if they wanted to stay relevant in the market, could not impose a proprietary data solution.

Furthermore, standardization of terminology and concepts ensured that the information was readable and comprehensible wherever the items landed–whether across receiving offices in the United States, Japan, Europe, or even Istanbul. Understanding that DoD needs the skillsets to be able to optimize data, it didn’t require an army of data scientists to achieve this end-state. It required the right data science expertise in the right places, and the dictates of transportation consumers to move the technology market to provide the solution.

Over the years both industry and government have developed a number of schema standards focused on specific types of data, progressing from X12 to XML and now projected to use JSON-based schemas. Each of them in their initial iterations automated the submission of physical reports that had been required by either by contract or operations. These focused on a small subset of the full dataset relating to program management and project controls.

This progression made sense.

When digitized technology is first introduced into an intensive direct-labor environment, the initial focus is to automate the production of artifacts and their underlying processes in order to phase in the technology’s acceptance. This also allows the organization to realize immediate returns on investment and improvements in productivity. But this is the first step, not the final one.

Currently for project controls the current state is the UN/CEFACT XML for program performance management data, and the contract cost and labor data collection file known as the FlexFile. Clearly the latter file, given that the recipient is the Office of the Secretary of Defense Cost Assessment and Program Evaluation (OSD CAPE), establish it as one of many feedback loops that support that office’s role in coordinating the planning, programming, budgeting, and evaluation (PPBE) system related to military strategic investments and budgeting, but only one. The program performance information is also a vital part of the PPBE process in evaluation and in future planning.

For most of the U.S. economy, market forces and consumer requirements are the driving force in digital innovation. The trends noted by Ms. Gorbhani can be confirmed through a Google search of any one of the many technology magazines and websites that can be found. The 809 Panel, drawn as it was from specialists and industry and government, were tasked “to provide recommendations that would allow DoD to adapt and deliver capability at market speeds, while ensuring that DoD remains true to its commitment to promote competition, provide transparency in its actions, and maintain the integrity of the defense acquisition system.”

Given that the work of the DoD is unique, creating a type of monopsony, it is up to leadership within the Department to create the conditions and mandates necessary to recreate in microcosm the positive effects of market forces. The DoD also has a very special, vital mission in defending the nation.

When an individual business cobbles together its mission statement it is that mission that defines the necessary elements in data collection that are then essential in making decisions. In today’s world, best commercial sector practice is to establish a Master Data Management (MDM) approach in defining data requirements and practices. In the case of DoD, a similar approach would be beneficial. Concurrent with the period of the 809 Panel’s efforts, RAND Corporation delivered a paper in 2017 (link in the previous sentence) that made recommendations related to data governance that are consistent with the 809 Panel’s recommendations. We will be discussing these specific recommendations in our presentation.

Meeting the mission and readiness are the key components to data governance in DoD. Absent such guidance, specialized software solution providers, in particular, will engage in what is called “rent-seeking” behavior. This is an economic term that means that an “entity (that) seeks to gain added wealth without any reciprocal contribution of productivity.”

No doubt, given the marketing of software solution providers, it is hard for decision-makers to tell what constitutes an open data system. The motivation of a software solution is to make itself as “sticky” as possible and it does that by enticing a customer to commit to proprietary definitions, structures, and database schemas. Usually there are “black-boxed” portions of the software that makes traceability impossible and that complicates the issue of who exactly owns the data and the ability of the customer to optimize it and utilize it as the mission dictates.

Furthermore, data visualization components like dashboards are ubiquitous in the market. A cursory stroll through a tradeshow looks like a dashboard smorgasbord combined with different practical concepts of what constitutes “open” and “integration”.

As one DoD professional recently told me, it is hard to tell the software systems apart. To do this it is necessary to understand what underlies the software. Thus, a proposed honest-broker definition of an open data system is useful and the place to start, given that this is not a notional concept since such systems have been successfully been established.

The Definition of Open Data Systems

Practical experience in implementing open data systems toward the goal of optimizing essential information from our planning, acquisition, financial, and systems engineering systems informs the following proposed definition, which is based on commercial best practice. This proposal is also based on the principle that the customer owns the data.

  1. An open data system is one based on non-proprietary neutral schemas that allow for the effective capture of all essential elements from third-party proprietary and customized software for reporting and integration necessary to support both internal and external stakeholders.
  2. An open data system allows for complete traceability and transparency from the underlying database structure of the third-party software data, through the process of data capture, transformation, and delivery of data in the neutral schema.
  3. An open data system targets the loading of the underlying source data for analysis and use into a neutral database structure that replicates the structure of the neutral schema. This allows for 100% traceability and audit of data elements received through the neutral schema, and ensures that the receiving organization owns the data.

Under this definition, data from its origination to its destination is more easily validated and traced, ensuring quality and fidelity, and establishing confidence in its value. Given these characteristics, integration of data from disparate domains becomes possible. The tracking of conflicting indicators is mitigated, since open system data allows for its effective integration without the bias of proprietary coding or restrictions on data use. Finally, both government and industry will not only establish ownership of their data–a routine principle in commercial business–but also be free to utilize new technologies that optimize the use of that data.

In closing, Gahan Wilson, a cartoonist whose work appeared in National Lampoon, The New Yorker, Playboy, and other magazines recently passed.

When thinking of the barriers to the effective use of data, I came across this cartoon in The New Yorker:

Open Data is the key to effective integration and reporting–to the optimal use of information. Once mandated and achieved, our defense and business systems will be better informed and be able to test and verify assumed knowledge, address risk, and eliminate dogmatic and erroneous conclusions. Open Data is the driver of organizational transformation keyed to the effective understanding and use of information, and all that entails. Finally, Open Data is necessary to the mission and planning systems of both industry and the U.S. Department of Defense.

Sledgehammer: Pisano Talks!

My blogging hiatus is coming to an end as I take a sledgehammer to the writer’s block wall.

I’ve traveled far and wide over the last six months to various venues across the country and have collected a number of new and interesting perspectives on the issues of data transformation, integrated project management, and business analytics and visualization. As a result, I have developed some very strong opinions regarding the trends that work and those that don’t regarding these topics and will be sharing these perspectives (with the appropriate supporting documentation per usual) in following posts.

To get things started this post will be relatively brief.

First, I will be speaking along with co-presenter John Collins, who is a Senior Acquisition Specialist at the Navy Engineering & Logistics Office, at the Integrated Program Management Workshop at the Hyatt Regency in beautiful downtown Baltimore’s Inner Harbor 10-12 December. So come on down! (or over) and give us a listen.

The topic is “Unlocking Data to Improve National Defense Systems”. Today anyone can put together pretty visualizations of data from Excel spreadsheets and other sources–and some have made quite a bit of money doing so. But accessing the right data at the right level of detail, transforming it so that its information content can be exploited, and contextualizing it properly through integration will provide the most value to organizations.

Furthermore, our presentation will make a linkage to what data is necessary to national defense systems in constructing the necessary artifacts to support the Department of Defense’s Planning, Programming, Budgeting and Execution (PPBE) process and what eventually becomes the Future Years Defense Program (FYDP).

Traditionally information capture and reporting has been framed as a question of oversight, reporting, and regulation related to contract management, capital investment cost control, and DoD R&D and acquisition program management. But organizations that fail to leverage the new powerful technologies that double processing and data storage capability every 18 months, allowing for both the depth and breadth of data to expand exponentially, are setting themselves up to fail. In national defense, this is a condition that cannot be allowed to occur.

If DoD doesn’t collect this information, which we know from the reports of cybersecurity agencies that other state actors are collecting, we will be at a serious strategic disadvantage. We are in a new frontier of knowledge discovery in data. Our analysts and program managers think they know what they need to be viewing, but adding new perspectives through integration provide new perspectives and, as a result, will result in new indicators and predictive analytics that will, no doubt, overtake current practice. Furthermore, that information can now be processed and contribute more, timely, and better intelligence to the process of strategic and operational planning.

The presentation will be somewhat wonky and directed at policymakers and decisionmakers in both government and industry. But anyone can play, and that is the cool aspect of our community. The presentation will be non-commercial, despite my day job–a line I haven’t crossed up to this point in this blog, but in this latter case will be changing to some extent.

Back in early 2018 I became the sole proprietor of SNA Software LLC–an industry technology leader in data transformation–particularly in capturing datasets that traditionally have been referred to as “Big Data”–and a hybrid point solution that is built on an open business intelligence framework. Our approach leverages the advantages of COTS (delivering the 80% solution out of the box) with open business intelligence that allows for rapid configuration to adapt the solution to an organization’s needs and culture. Combined with COTS data capture and transformation software–the key to transforming data into information and then combining it to provide intelligence at the right time and to the right place–the latency in access to trusted intelligence is reduced significantly.

Along these lines, I have developed some very specific opinions about how to achieve this transformation–and have put those concepts into practice through SNA and delivered those solutions to our customers. Thus, the result has been to reduce both the effort and time to capture large datasets from data that originates in pre-processed data, and to eliminate direct labor and the duration to information delivery by more than 99%. The path to get there is not to apply an army of data scientists and data analysts that deals with all data as if it is flat and to reinvent the wheel–only to deliver a suboptimized solution sometime in the future after unnecessarily expending time and resources. This is a devolution to the same labor-intensive business intelligence approaches that we used back in the 1980s and 1990s. The answer is not to throw labor at data that already has its meaning embedded into its information content. The answer is to apply smarts through technology, and that’s what we do.

Further along these lines, if you are using hard-coded point solutions (also called purpose-built software) and knitted best-of-breed, chances are that you will find that you are poorly positioned to exploit new technology and will be obsolete within the next five years, if not sooner. The model of selling COTS solutions and walking away except for traditional maintenance and support is dying. The new paradigm will be to be part of the solution and that requires domain knowledge that translates into technology delivery.

More on these points in future posts, but I’ve placed the stake in the ground and we’ll see how they hold up to critique and comment.

Finally, I recently became aware of an extremely informative and cutting-edge website that includes podcasts from thought leaders in the area of integrated program management. It is entitled InnovateIPM and is operated and moderated by a gentleman named Rob Williams. He is a domain expert in project cost development, with over 20 years of experience in the oil, gas, and petrochemical industries. Robin has served in a variety of roles throughout his career and is now focuses on cost estimating and Front-End Loading quality assurance. His current role is advanced project cost estimator at Marathon Petroleum’s Galveston Bay Refinery in Texas City.

Rob was also nice enough to continue a discussion we started at a project controls symposium and interviewed me for a podcast. I’ll post additional information once it is posted.

Both Sides Now — The Value of Data Exploration

Over the last several months I have authored a number of stillborn articles that just did not live up to the standards that I set for this blog site. After all, sometimes we just have nothing important to add to the conversation. In a world dominated by narcissism, it is not necessary to constantly have something to say. Some reflection and consideration are necessary, especially if one is to be as succinct as possible.

A quote ascribed to Woodrow Wilson, which may be apocryphal, though it does appear in two of his biographies, was in response to being lauded by someone for making a number of short, succinct, and informative speeches. When asked how he was able to do this, President Wilson is supposed to have replied:

“It depends. If I am to speak ten minutes, I need a week for preparation; if fifteen minutes, three days; if half an hour, two days; if an hour, I am ready now.”

An undisciplined mind has a lot to say about nothing in particular with varying degrees of fidelity to fact or truth. When in normal conversation we most often free ourselves from the discipline expected for more rigorous thinking. This is not necessarily a bad thing if we are saying nothing of consequence and there are gradations, of course. Even the most disciplined mind gets things wrong. We all need editors and fact checkers.

While I am pulling forth possibly apocryphal quotes, the one most applicable that comes to mind is the comment by Hemingway as told by his deckhand in Key West and Cuba, Arnold Samuelson. Hemingway was supposed to have given this advice to the aspiring writer:

“Don’t get discouraged because there’s a lot of mechanical work to writing. There is, and you can’t get out of it. I rewrote the first part of A Farewell to Arms at least fifty times. You’ve got to work it over. The first draft of anything is shit. When you first start to write you get all the kick and the reader gets none, but after you learn to work it’s your object to convey everything to the reader so that he remembers it not as a story he had read but something that happened to himself.”

Though it deals with fiction, Hemingway’s advice applies to any sort of writing and rhetoric. Dr. Roger Spiller, who more than anyone mentored me as a writer and historian, once told me, “Writing is one of those skills that, with greater knowledge, becomes harder rather than easier.”

As a result of some reflection, over the last few months, I had to revisit the reason for the blog. Thus, this is still its purpose: it is a way to validate ideas and hypotheses with other professionals and interested amateurs in my areas of interest. I try to keep uninformed opinion in check, as all too many blogs turn out to be rants. Thus, a great deal of research goes into each of these posts, most from primary sources and from interactions with practitioners in the field. Opinions and conclusions are my own, and my reasoning for good or bad are exposed for all the world to see and I take responsibility for them.

This being said, part of my recent silence has also been due to my workload in–well–the effort involved in my day job of running a technology company, and in my recent role, since late last summer, as the Managing Editor of the College of Performance Management’s publication known as the Measurable News. Our emphasis in the latter case has been to find new contributions to the literature regarding business analytics and to define the concept of integrated project, program, and portfolio management. Stepping slightly over the line to make a pitch, I recommend anyone interested in contributing to the publication to submit an article. The submission guidelines can be found here.

Both Sides Now: New Perspectives

That out of the way, I recently saw, again on the small screen, the largely underrated movie about Neil Armstrong and the Apollo 11 moon landing, “First Man”, and was struck by this scene:

Unfortunately, the first part of the interview has been edited out of this clip and I cannot find a full scene. When asked “why space” he prefaces his comments by stating that the atmosphere of the earth seems to be so large from the perspective of looking at it from the ground but that, having touched the edge of space previously in his experience as a test pilot of the X15, he learned that it is actually very thin. He then goes on to posit that looking at the earth from space will give us a new perspective. His conclusion to this observation is then provided in the clip.

Armstrong’s words were prophetic in that the space program provided a new perspective and a new way of looking at things that were in front of us the whole time. Our spaceship Earth is a blue dot in a sea of space and, at least for a time, the people of our planet came to understand both our loneliness in space and our interdependence.

Earth from Apollo 8. Photo courtesy of NASA.

 

The impact of the Apollo program resulted in great strides being made in environmental and planetary sciences, geology, cosmology, biology, meteorology, and in day-to-day technology. The immediate effect was to inspire the environmental and human rights movements, among others. All of these advances taken together represent a new revolution in thought equal to that during the initial Enlightenment, one that is not yet finished despite the headwinds of reaction and recidivism.

It’s Life’s Illusions I Recall: Epistemology–Looking at and Engaging with the World

In his book Darwin’s Dangerous Idea, Daniel Dennett posited that what was “dangerous” about Darwinism is that it acts as a “universal acid” that, when touching other concepts and traditions, transforms them in ways that change our world-view. I have accepted this position by Dennett through the convincing argument he makes and the evidence in front of us, and it is true that Darwinism–the insight in the evolution of species over time through natural selection–has transformed our perspective of the world and left the old ways of looking at things both reconstructed and unrecognizable.

In his work, Time’s Arrow, Time’s Cycle, Stephen Jay Gould noted that Darwinism is part of one of the three great reconstructions of human thought that, in quoting Sigmund Freud, where “Humanity…has had to endure from the hand of science…outrages upon its naive self-love.” These outrages include the Copernican revolution that removed the Earth from the center of the universe, Darwinism and the origin of species, including the descent of humanity, and what John McPhee, coined as the concept of “deep time.”

But–and there is a “but”–I would propose that Darwinism and the other great reconstructions noted are but different ingredients of a larger and more broader, though compatible, type of innovation in the way the world is viewed and how it is approached–a more powerful universal acid. That innovation in thought is empiricism.

It is this approach to understanding that eats through the many ills of human existence that lead to self-delusion and folly. Though you may not know it, if you are in the field of information technology or any of the sciences, you are part of this way of viewing and interacting with the world. Married with rational thinking, this epistemology–coming from the perspectives of the astronomical observations of planets and other heavenly bodies by Charles Sanders Peirce, with further refinements by William James and John Dewey, and others have come down to us in what is known as Pragmatism. (Note that the word pragmatism in this context is not the same as the more generally used colloquial form of the word. For this type of reason Peirce preferred the term “pragmaticism”). For an interesting and popular reading of the development of modern thought and the development of Pragmatism written for the general reader I highly recommend the Pulitzer Prize-winning The Metaphysical Club by Louis Menand.

At the core of this form of empiricism is that the collection of data, that is, recording, observing, and documenting the universe and nature as it is will lead us to an understanding of things that we otherwise would not see. In our more mundane systems, such as business systems and organized efforts applying disciplined project and program management techniques and methods, we also can learn more about these complex adaptive systems through the enhanced collection and translation of data.

I Really Don’t Know Clouds At All: Data, Information, Intelligence, and Knowledge

The term “knowledge discovery in data”, or KDD for short, is an aspirational goal and so, in terms of understanding that goal, is a point of departure from the practice information management and science. I’m taking this stance because the technology industry uses terminology that, as with most language, was originally designed to accurately describe a specific phenomenon or set of methods in order to advance knowledge, only to find that that terminology has been watered down to the point where it obfuscates the issues at hand.

As I traveled to locations across the U.S. over the last three months, I found general agreement among IT professionals who are dealing with the issues of “Big Data”, data integration, and the aforementioned KDD of this state of affairs. In almost every case there is hesitation to use this terminology because it has been absconded and abused by mainstream literature, much as physicists rail against the misuse of the concept of relativity by non-scientific domains.

The impact of this confusion in terminology has caused organizations to make decisions where this terminology is employed to describe a nebulous end-state, without the initiators having an idea of the effort or scope. The danger here, of course, is that for every small innovative company out there, there is also a potential Theranos (probably several). For an in-depth understanding of the psychology and double-speak that has infiltrated our industry I highly recommend the HBO documentary, “The Inventor: Out for Blood in Silicon Valley.”

The reason why semantics are important (as they always have been despite the fact that you may have had an associate complain about “only semantics”) is that they describe the world in front of us. If we cloud the meanings of words and the use of language, it undermines the basis of common understanding and reveals the (poor) quality of our thinking. As Dr. Spiller noted, the paradox of writing and in gathering knowledge is that the more you know, the more you realize you do not know, and the harder writing and communicating knowledge becomes, though we must make the effort nonetheless.

Thus KDD is oftentimes not quite the discovery of knowledge in the sense that the term was intended to mean. It is, instead, a discovery of associations that may lead us to knowledge. Knowing this distinction is important because the corollary processes of data mining, machine learning, and the early application of AI in which we find ourselves is really the process of finding associations, correlations, trends, patterns, and probabilities in data that is approached in a manner as if all information is flat, thereby obliterating its context. This is not knowledge.

We can measure the information content of any set of data, but the real unlocked potential in that information content will come with the processing of it that leads to knowledge. To do that requires an underlying model of domain knowledge, an understanding of the different lexicons in any given set of domains, and a Rosetta Stone that provides a roadmap that identifies those elements of the lexicon that are describing the same things across them. It also requires capturing and preserving context.

For example, when I use the chat on my iPhone it attempts to anticipate what I want to write. I am given three choices of words to choose if I want to use this shortcut. In most cases, the iPhone guesses wrong, despite presenting three choices and having at its disposal (at least presumptively) a larger vocabulary than the writer. Oftentimes it seems to take control, assuming that I have misspelled or misidentified a word and chooses the wrong one for me, where my message becomes a nonsense message.

If one were to believe the hype surrounding AI, one would think that there is magic there but, as Arthur C. Clarke noted (known as Clarke’s Third Law): “Any sufficiently advanced technology is indistinguishable from magic.” Familiar with the new technologies as we are, we know that there is no magic there, and also that it is consistently wrong a good deal of the time. But many individuals come to rely upon the technology nonetheless.

Despite the gloss of something new, the long-established methods of epistemology, code-breaking, statistics, and Calculus apply–as do standards of establishing fact and truth. Despite a large set of data, the iPhone is wrong because the iPhone does not understand–does not possess knowledge–to know why it is wrong. As an aside, its dictionary is also missing a good many words.

A Segue and a Conclusion–I Still Haven’t Found What I’m Looking For: Why Data Integration?…and a Proposed Definition of the Bigness of Data

As with the question to Neil Armstrong, so the question on data. And so the answer is the same. When we look at any set of data under a particular structure of a domain, the information we derive provides us with a manner of looking at the world. In economic systems, businesses, and projects that data provides us with a basis for interpretation, but oftentimes falls short of allowing us to effectively describe and understand what is happening.

Capturing interrelated data across domains allows us to look at the phenomena of these human systems from a different perspective, providing us with the opportunity to derive new knowledge. But in order to do this, we have to be open to this possibility. It also calls for us to, as I have hammered home in this blog, reset our definitions of what is being described.

For example, there are guides in project and program management that refer to statistical measures as “predictive analytics.” This further waters down the intent of the phrase. Measures of earned value are not predictive. They note trends and a single-point outcome. Absent further analysis and processing, the statistical fallacy of extrapolation can be baked into our analysis. The same applies to any index of performance.

Furthermore, these indices and indicators–for that is all they are–do not provide knowledge, which requires a means of not only distinguishing between correlation and causation but also applying contextualization. All systems operate in a vector space. When we measure an economic or social system we are really measuring its behavior in the vector space that it inhabits. This vector space includes the way it is manifested in space-time: the equivalent of length, width, depth (that is, its relative position, significance, and size within information space), and time.

This then provides us with a hint of a definition of what often goes by the definition of “big data.” Originally, as noted in previous blogs, big data was first used in NASA in 1997 by Cox and Ellsworth (not as credited to John Mashey on Wikipedia with the dishonest qualifier “popularized”) and was simply a statement meaning “datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.”

This is a relative term given Moore’s Law. But we can begin to peel back a real definition of the “bigness” of data. It is important to do this because too many approaches to big data assume it is flat and then apply probabilities and pattern recognition to data that undermines both contextualization and knowledge. Thus…

The Bigness of Data (B) is a function (f ) of the entropy expended (S) to transform data into information, or to extract its information content.

Information evolves. It evolves toward greater complexity just as life evolves toward greater complexity. The universe is built on coded bits of information that, taken together and combined in almost unimaginable ways, provides different forms of life and matter. Our limited ability to decode and understand this information–and our interactions in it– are important to us both individually and collectively.

Much entropy is already expended in the creation of the data that describes the activity being performed. Its context is part of its information content. Obliterating the context inherent in that information content causes all previous entropy to be of no value. Thus, in approaching any set of data, the inherent information content must be taken into account in order to avoid the unnecessary (and erroneous) application of data interpretation.

More to follow in future posts.

Back to School Daze Blogging–DCMA Investigation on POGO, DDSTOP, $600 Ashtrays,and Epistemic Sunk Costs

Family summer visits and trips are in the rear view–as well as the simultaneous demands of balancing the responsibilities of a, you know, day job–and so it is time to take up blogging once again.

I will return to my running topic of Integrated Program and Project Management in short order, but a topic of more immediate interest concerns the article that appeared on the website for pogo.org last week entitled “Pentagon’s Contracting Gurus Mismanaged Their Own Contracts.” Such provocative headlines are part and parcel of organizations like POGO, which have an agenda that seems to cross the line between reasonable concern and unhinged outrage with a tinge conspiracy mongering. But the content of the article itself is accurate and well written, if also somewhat ripe with overstatement, so I think it useful to unpack what it says and what it means.

POGO and Its Sources

The source of the article comes from three sources regarding an internal Defense Contract Management Agency (DCMA) IT project known as the Integrated Workflow Management System (IWMS). These consist of a September 2017 preliminary investigative report, an April 2018 internal memo, and a draft of the final report.

POGO begins the article by stating that DCMA administers over $5 trillion in contracts for the Department of Defense. The article erroneously asserts that it also negotiates these contracts, apparently not understanding the process of contract oversight and administration. The cost of IWMS was apparently $46.6M and the investigation into the management and administration of the program was initiated by the then-Commander of DCMA, Lieutenant General Wendy Masiello, shortly before she retired from the government in May 2017.

The implication here, given the headline, seems to be that if there is a problem in internal management within the agency, then that would translate into questioning its administration of the $5 trillion in contract value. I view it differently, given that I understand that there are separate lines of responsibility in the agency that do not overlap, particularly in IT. Of the $46.6M there is a question of whether $17M in value was properly funded. More on this below, but note that, to put things in perspective, $46.6M is .000932% of DCMA’s oversight responsibility. This is aside from the fact that the comparison is not quite correct, given that the CIO had his own budget, which was somewhat smaller and unrelated to the $5 trillion figure. But I think it important to note that POGO’s headline and the introduction of figures, while sounding authoritative, are irrelevant to the findings of the internal investigation and draft report. This is a scare story using scare numbers, particularly given the lack of context. I had some direct experience in my military career with issues inspired by the POGO’s founders’ agenda that I will cover below.

In addition to the internal investigation on IWMS, there was also an inspector general (IG) investigation of thirteen IT services contracts that resulted in what can only be described as pedestrian procedural discrepancies that are easily correctable, despite the typically overblown language found in most IG reports. Thus, I will concentrate on this post on the more serious findings of the internal investigation.

My Own Experience with DCMA

A note at this point on full disclosure: I have done business with and continue to do business with DCMA, both as a paid supplier of software solutions, and have interacted with DCMA personnel at publicly attended professional forums and workshops. I have no direct connection, as far as I am aware, to the IWMS program, though given that the assessment is to the IT organization, it is possible that there was an indirect relationship. I have met Lieutenant General Masiello and dealt with some of her subordinates not only during her time at DCMA, but also in some of her previous assignments in Air Force. I always found her to be an honest and diligent officer and respect her judgment. Her distinguished career speaks for itself. I have talked on the telephone to some of the individuals mentioned in the article on unrelated matters, and was aware of their oversight of some of my own efforts. My familiarity with all of them was both businesslike and brief.

As a supplier to DCMA my own contracts and the personnel that administer them were, from time-to-time, affected by the fallout from what I now know to have occurred. Rumors have swirled in our industry regarding the alleged mismanagement of an IT program in DCMA, but until the POGO article, the reasons for things such as a temporary freeze and review of existing IT programs and other actions were viewed as part and parcel of managing a large organization. I guess the explanation is now clear.

The Findings of the Investigation

The issue at hand is largely surrounding the method of source selection, which may have constituted a conflict of interest, and the type of money that was used to fund the program. In reading the report I was reminded of what Glen Alleman recently wrote in his blog entitled “DDSTOP: The Saga Continues.” The acronym DDSTOP means: Don’t Do Stupid Things On Purpose.

There is actually an economic behavioral principle for DDSTOP that explains why people make and double down on bad decisions and irrational beliefs. It is called epistemic sunk cost. It is what causes people to double down in gambling (to the great benefit of the house), to persist in mistaken beliefs, and, as stated in the link above, to “persist with the option which they have already invested in and resist changing to another option that might be more suitable regarding the future requirements of the situation.” The findings seem to document a situation that fits this last description.

In going over the findings of the report, it appears that IWMS’s program violated the following:

a. Contractual efforts in the program that were appropriate for the use of Research, Development, Test and Evaluation (R,D,T & E) funds as opposed to those appropriate for O&M (Operations and Maintenance) funds. What the U.S. Department of Defense calls “color of money.”

b. Amounts that were expended on contract that exceeded the authorized funding documents, which is largely based on the findings regarding the appropriate color of money. This would constitute a serious violation known as an Anti-Deficiency Act violation which, in layman’s terms, is directed to punish public employees for the misappropriation of government funds.

c. Expended amounts of O&M that exceeded the authorized levels.

d. Poor or non-existent program management and cost performance management.

e. Inappropriate contracting vehicles that, taken together, sidestepped more stringent oversight, aside from the award of a software solutions contract to the same company that defined the agency’s requirements.

Some of these are procedural and some are serious, particularly the Anti-deficiency Act (ADA) violations, are serious. In the Contracting Officer’s rulebook, you can withstand pedestrian procedural and administrative findings that are part and parcel of running an intensive contracting organization that acquires a multitude of supplies and services under deadline. But an ADA violation is the deadly one, since it is a violation of statute.

As a result of these findings, the recommendation is for DCMA to lose acquisition authority over the DoD micro-contracting level ($10,000). Organizationally and procedurally, this is a significant and mission-disruptive recommendation.

The Role and Importance of DCMA

DCMA performs an important role in contract compliance and oversight to ensure that public monies are spent properly and for the intended purpose. They perform this role mostly on contracts that are negotiated and entered into by other agencies and the military services within the Department of Defense, where they are assigned contract administration duties. Thus, the fact that DCMA’s internal IT acquisition systems and procedures were problematic is embarrassing.

But some perspective is necessary because there is a drive by some more extreme elements in Congress and elsewhere that would like to see the elimination of the agency. I believe that this would be a grave mistake. As John F. Kennedy is quoted as having said: “You don’t tear your fences down unless you know why they were put up.”

For those of you who were not around prior to the formation of DCMA or its predecessor organization, the Defense Contract Management Command (DCMC), it is important to note that the formation of the agency is a result of acquisition reform. Prior to 1989 the contract administration services (CAS) capabilities of the military services and various DoD offices varied greatly in capability, experience, and oversight effectiveness.Some of these duties had been assigned to what is now the Defense Logistics Agency (DLA), but major acquisition contracts remained with the Services.

For example, when I was on active duty as a young Navy Supply Corps Officer as part of the first class that was to be the Navy Acquisition Corps, I was taught cradle-to-grave contracting. That is, I learned to perform customer requirements development, economic analysis, contract planning, development of a negotiating position, contract negotiation, and contract administration–soup to nuts. The expense involved in developing and maintaining the skill set required of personnel to maintain such a broad-based expertise is unsustainable. For analogy, it is as if every member of a baseball club must be able to play all nine positions at the same level of expertise; it is impossible.

Furthermore, for contract administration a defense contractor would have contractual obligations for oversight in San Diego, where I was stationed, that were different from contracts awarded in Long Beach or Norfolk or any of the other locations where a contracting office was located. Furthermore, the military services, having their own organizational cultures, provided additional variations that created a plethora of unique requirements that added cost, duplication, inconsistency, and inter-organizational conflict.

This assertion is more than anecdotal. A series of studies were commissioned in the 1980s (the findings of which were subsequently affirmed) to eliminate duplication and inconsistency in the administration of contracts, particularly major acquisition programs. Thus, DCMC was first established under DLA and subsequently became its own agency. Having inherited many of the contracting field office, the agency has struggled to consolidate operations so that CAS is administered in a consistent manner across contracts. Because contract negotiation and program management still resides in the military services, there is a natural point of conflict between the services and the agency.

In my view, this conflict is a healthy one, as all power in the hands of a single individual, such as a program manager, would lead to more fraud, waste, and abuse, not less. Internal checks and balances are necessary in proper public administration, where some efficiency is sacrificed to accountability. It is not just the goal of government to “make the trains run on time”, but to perform oversight of the public’s money so that there is accountability in its expenditure, and integrity in systems and procedures. In the case of CAS, it is to ensure that what is being procured actually gets delivered in conformance to the contract terms and conditions designed to reduce the inherent risk in complex acquisition programs.

In order to do its job effectively, DCMA requires innovative digital systems to allow it to perform its CAS function. As a result, the agency must also possess an acquisition capability. Given the size of the task at hand in performing CAS on over $5 trillion of contract effort, the data involved is quite large, and the number of personnel geographically distributed. The inevitable comparisons to private industry will arise, but few companies in the world have to perform this level of oversight on such a large economic scale, which includes contracts comprising every major supplier to the U.S. Department of Defense, involving detailed knowledge of the management control systems of those companies that receive the taxpayer’s money. Thus, this is a uniquely difficult job. When one understands that in private industry the standard failure rate of IT projects is more than 70% percent, then one cannot help but be unimpressed by these findings, given the challenge.

Assessing the Findings and Recommendations

There is a reason why internal oversight documents of this sort stay confidential–it is because these are preliminary/draft findings and there are two sides to every story which may lead to revisions. In addition, reading these findings without the appropriate supporting documentation can lead one to the wrong impression and conclusions. But it is important to note that this was an internally generated investigation. The checks and balances of management oversight that should occur, did occur. But let’s take a close look at what the reports indicate so that we can draw some lessons. I also need to mention here that POGO’s conflation of the specific issues in this program as a “poster child” for cost overruns and schedule slippage displays a vast ignorance of DoD procurement systems on the part of the article’s author.

Money, Money, Money

The core issue in the findings revolves around the proper color of money, which seems to hinge on the definition of Commercial-Off-The-Shelf (COTS) software and the effort that was expended using the two main types of money that apply to the core contract: RDT&E and O&M.

Let’s take the last point first. It appears that the IWMS effort consisted of a combination of COTS and custom software. This would require acquisition, software familiarization, and development work. It appears that the CIO was essentially running a proof-of-concept to see what would work, and then incrementally transitioned to developing the solution.

What is interesting is that there is currently an initiative in the Department of Defense to do exactly what the DCMA CIO did as part of his own initiative in introducing a new technological approach to create IWMS. It is called Other Transactional Authority (OTA). The concept didn’t exist and was not authorized until the 2016 NDAA and is given specific statutory authority under 10 U.S.C. 2371b. This doesn’t excuse the actions that led to the findings, but it is interesting that the CIO, in taking an incremental approach to finding a solution, also did exactly what was recommended in the 2016 GAO report that POGO references in their article.

Furthermore, as a career Navy Supply Corps Officer, I have often gotten into esoteric discussions in contracts regarding the proper color of money. Despite the assertion of the investigation, there is a lot of room for interpretation in the DoD guidance, not to mention a stark contrast in interpreting the proper role of RDT&E and O&M in the procurement of business software solutions.

When I was on the NAVAIR staff and at OSD I ran into the difference in military service culture where what Air Force financial managers often specified for RDT&E would never be approved by Navy financial managers where, in the latter case, they specified that only O&M dollars applied, despite whether development took place. Given that there was an Air Force flavor to the internal investigation, I would be interested to know whether the opinion of the investigators in making an ADA determination would withstand objective scrutiny among a panel of government comptrollers.

I am certain that, given the differing mix of military and civil service cultures at DCMA–and the mixed colors of money that applied to the effort–that the legal review that was sought to resolve the issue. One of the principles of law is that when you rely upon legal advice to take an action that you have a defense, unless your state of mind and the corollary actions that you took indicates that you manipulated the system to obtain a result that shows that you intended to violate the law. I just do not see that here, based on what has been presented in the materials.

It is very well possible that an inadvertent ADA violation occurred by default because of an improper interpretation of the use of the monies involved. This does not rise to the level of a scandal. But going back to the confusion that I have faced from my own experiences on active duty, I certainly hope that this investigation is not used as a precedent to review all contracts under the approach of accepting a post-hoc alternative interpretation by another individual who just happens to be an inspector long after a reasonable legal determination was made, regardless of how erroneous the new expert finds the opinion. This is not an argument against accountability, but absent corruption or criminal intent, a legal finding is a valid defense and should stand as the final determination for that case.

In addition, this interpretation of RDT&E vs. O&M relies upon an interpretation of COTS. I daresay that even those who throw that term around and who are familiar with the FAR fully understand what constitutes COTS when the line between adaptability and point solutions is being blurred by new technology.

Where the criticism is very much warranted are those areas where the budget authority would have been exceeded in any event–and it is here that the ADA determination is most damning. It is one thing to disagree on the color of money that applies to different contract line items, but it is another to completely lack financial control.

Part of the reason for lack of financial control was the absence of good contracting practices and the imposition of program management.

Contracts 101

While I note that the CIO took an incremental approach to IWMS–what a prudent manager would seem to do–what was lacking was a cohesive vision and a well-informed culture of compliance to acquisition policy that would avoid even the appearance of impropriety and favoritism. Under the OTA authority that I reference above as a new aspect of acquisition reform, the successful implementation of a proof-of-concept does not guarantee the incumbent provider continued business–salient characteristics for the solution are publicized and the opportunity advertised under free and open competition.

After all, everyone has their favorite applications and, even inadvertently, an individual can act improperly because of selection bias. The procurement procedures are established to prevent abuse and favoritism. As a solution provider I have fumed quite often where a selection was made without competition based on market surveys or use of a non-mandatory GSA contract, which usually turn out to be a smokescreen for pre-selection.

There are two areas of fault on IMWS from the perspective of acquisition practice, and another in relation to program management.

These are the initial selection of Apprio, which had laid out the initial requirements and subsequently failed to have the required integration functionality, and then, the selection of Discover Technologies under a non-mandatory GSA Blanket Purchase Agreement (BPA) contract under a sole source action. Furthermore, the contract type was not appropriate to the task at hand, and the arbitrary selection of Discover precluded the agency finding a better solution more fit to its needs.

The use of the GSA BPA allowed managers, however, to essentially spit the requirements to stay below more stringent management guidelines–an obvious violation of acquisition regulation that will get you removed from your position. This leads us to what I think is the root cause of all of these clearly avoidable errors in judgment.

Program Management 101

Personnel in the agency familiar with the requirements to replace the aging procurement management system understood from the outset that the total cost would probably fall somewhere between $20M and $40M. Yet all effort was made to reduce the risk by splitting requirements and failing to apply a programmatic approach to a clearly complex undertaking.

This would have required the agency to take the steps to establish an acquisition strategy, open the requirement based on a clear performance work statement to free and open competition, and then to establish a program management office to manage the effort and to allow oversight of progress and assessment of risks in a formalized environment.

The establishment of a program management organization would have prevented the lack of financial control, and would have put in place sufficient oversight by senior management to ensure progress and achievement of organizational goals. In a word, a good deal of the decision-making was based on doing stupid things on purpose.

The Recommendations

In reviewing the recommendations of the internal investigation, I think my own personal involvement in a very similar issue from 1985 will establish a baseline for comparison.

As I indicated earlier, in the early 1980s, as a young Navy commissioned officer, I was part of the first class of what was to be the Navy Acquisition Corps, stationed at the Supply Center in San Diego, California. I had served as a contracting intern and, after extensive education through the University of Virginia Darden School of Business, the extended Federal Acquisition Regulation (FAR) courses that were given at the time at Fort Lee, Virginia, and coursework provided by other federal acquisition organizations and colleges, I attained my warrant as a contracting officer. I also worked on acquisition reform issues, some of which were eventually adopted by the Navy and DoD.

During this time NAS Miramar was the home of Top Gun. In 1984 Congressman Duncan Hunter (the elder not the currently indicted junior of the same name, though from the same San Diego district), inspired by news of $7,600 coffee maker and a $435 hammer publicized by the founders of POGO, was given documents by a disgruntled employee at the base regarding the acquisition of replacement E-2C ashtrays that had a cost of $300. He presented them to the Base Commander, which launched an investigation.

I served on the JAG investigation under the authority of the Wing Commander regarding the acquisitions and then, upon the firing of virtually the entire chain of command at NAS Miramar, which included the Wing Commander himself, became the Officer-in-Charge of Supply Center San Diego Detachment NAS Miramar. Under Navy Secretary Lehman’s direction I was charged with determining the root cause of the acquisition abuses and given 60-90 days to take immediate corrective action and clear all possible discrepancies.

I am not certain who initiated the firings of the chain of command. From talking with contemporaneous senior personnel at the time it appeared to have been instigated in a fit of pique by the sometimes volcanic Secretary of Defense Caspar Weinberger. While I am sure that Secretary Weinberger experienced some emotional release through that action, placed in perspective, his blanket firing of the chain of command, in my opinion, was poorly advised and counterproductive. It was also grossly unfair, given what my team and I found as the root cause.

First of all, the ashtray was misrepresented in the press as a $600 ashtray because during the JAG I had sent a sample ashtray to the Navy industrial activity at North Island with a request to tell me what the fabrication of one ashtray would cost and to provide the industrial production curve that would reduce the unit price to a reasonable level. The figure of $600 was to fabricate one. A “whistleblower” at North Island took this slice of information out of context and leaked it to the press. So the $300 ashtray, which was bad enough, became the $600 ashtray.

Second, the disgruntled employee who gave the files to Congressman Hunter had been laterally assigned out of her position as a contracting officer by the Supply Officer because of the very reason that the pricing of the ashtray was not reasonable, among other unsatisfactory performance measures that indicated that she was not fit to perform those duties.

Third, there was a systemic issue in the acquisition of odd parts. For some reason there was an ashtray in the cockpit of the E-2C. These aircraft were able to stay in the air an extended period of time. A pilot had actually decided to light up during a local mission and, his attention diverted, lost control of the aircraft and crashed. Secretary Lehman ordered corrective action. The corrective action taken by the squadron at NAS Miramar was to remove the ashtray from the cockpit and store them in a hangar locker.

Four, there was an issue of fraud. During inspection the spare ashtrays were removed and deposited in the scrap metal dumpster on base. The tech rep for the DoD supplier on base retrieved the ashtrays and sold them back to the government for the price to fabricate one, given that the supply system had not experienced enough demand to keep them in stock.

Fifth, back to the systemic issue. When an aircraft is to be readied for deployment there can be no holes representing missing items in the cockpit. A deploying aircraft with this condition is then grounded and a high priority “casuality report” or CASREP is generated. The CASREP was referred to purchasing which then paid $300 for each ashtray. The contracting officer, however, feeling under pressure by the high priority requisition, did not do due diligence in questioning the supplier on the cost of the ashtray. In addition, given that several aircraft deploy, there were a number of these requisitions that should have led the contracting officer to look into the matter more closely to determine price reasonableness.

Furthermore, I found that buying personnel were not properly trained, that systems and procedures were not established or enforced, that the knowledge of the FAR was spotty, and that procurements did not go through multiple stages of review to ensure compliance with acquisition law, proper documentation, and administrative procedure.

Note that in the end this “scandal” was born by a combination of systemic issues, poor decision-making, lack of training, employee discontent, and incompetence.

I successfully corrected the issues at NAS Miramar during the prescribed time set by the Secretary of the Navy, worked with the media to instill public confidence in the system, built up morale, established better customer service, reduced procurement acquisition lead times (PALT), recommended necessary disciplinary action where it seemed appropriate, particularly in relation to the problematic employee, recovered monies from the supplier, referred the fraud issues to Navy legal, and turned over duties to a new chain of command.

NAS Miramar procurement continued to do its necessary job and is still there.

What the higher chain of command did not do was to take away the procurement authority of NAS Miramar. It did not eliminate or reduce the organization. It did not close NAS Miramar.

It requires leadership and focus to take effective corrective action to not only fix a broken system, but to make it better while the corrective actions are being taken. As I outlined above, DCMA performs an essential mission. As it transitions to a data-driven approach and works to reduce redundancy and inefficiency in its systems, it will require more powerful technologies to support its CAS function, and the ability to acquire those technologies to support that function.

(Data) Transformation–Fear and Loathing over ETL in Project Management

ETL stands for data extract, transform, and load. This essential step is the basis for all of the new capabilities that we wish to acquire during the next wave of information technology: business analytics, big(ger) data, interdisciplinary insight into processes that provide insights into improving productivity and efficiency.

I’ve been dealing with a good deal of fear and loading regarding the introduction of this concept, even though in my day job my organization is a leading practitioner in the field in its vertical. Some of this is due to disinformation by competitors in playing upon the fears of the non-technically minded–the expected reaction of those who can’t do in the last throws of avoiding irrelevance. Better to baffle them with bullshit than with brilliance, I guess.

But, more importantly, part of this is due to the state of ETL and how it is communicated to the project management and business community at large. There is a great deal to be gained here by muddying the waters even by those who know better and have the technology. So let’s begin by clearing things up and making this entire field a bit more coherent.

Let’s start with the basics. Any organization that contains the interaction of people is a system. For purposes of a project management team, a business enterprise, or a governmental body we deal with a special class of systems known as Complex Adaptive Systems: CAS for short. A CAS is a non-linear learning system that reacts and evolves to its environment. It is complex because of the inter-relationships and interactions of more than two agents in any particular portion of the system.

I was first introduced to the concept of CAS through readings published out of the Santa Fe Institute in New Mexico. Most noteworthy is the work The Quark and the Jaguar by the physicist Murray Gell-Mann. Gell-Mann is received the Nobel in physics in 1969 for his work on elementary particles, such as the quark, and is co-founder of the Institute. He also was part of the team that first developed simulated Monte Carlo analysis during a period he spent at RAND Corporation. Anyone interested in the basic science of quanta and how the universe works that then leads to insights into subjects such as day-to-day probability and risk should read this book. It is a good popular scientific publication written by a brilliant mind, but very relevant to the subjects we deal with in project management and information science.

Understanding that our organizations are CAS allows us to apply all sorts of tools to better understand them and their relationship to the world at large. From a more practical perspective, what are the risks involved in the enterprise in which we are engaged and what are the probabilities associated with any of the range of outcomes that we can label as success. For my purposes, the science of information theory is at the forefront of these tools. In this world an engineer by the name of Claude Shannon working at Bell Labs essentially invented the mathematical basis for everything that followed in the world of telecommunications, generating, interpreting, receiving, and understanding intelligence in communication, and the methods of processing information. Needless to say, computing is the main recipient of this theory.

Thus, all CAS process and react to information. The challenge for any entity that needs to survive and adapt in a continually changing universe is to ensure that the information that is being received is of high and relevant quality so that the appropriate adaptation can occur. There will be noise in the signals that we receive. What we are looking for from a practical perspective in information science are the regularities in the data so that we can make the transformation of receiving the message in a mathematical manner (where the message transmitted is received) into the definition of information quality that we find in the humanities. I believe that we will find that mathematical link eventually, but there is still a void there. A good discussion of this difference can be found here in the on-line publication Double Dialogues.

Regardless of this gap, the challenge of those of us who engage in the business of ETL must bring to the table the ability not only to ensure that the regularities in the information are identified and transmitted to the intended (or necessary) users, but also to distinguish the quality of the message in the terms of the purpose of the organization. Shannon’s equation is where we start, not where we end. Given this background, there are really two basic types of data that we begin with when we look at a set of data: structured and unstructured data.

Structured data are those where the qualitative information content is either predefined by its nature or by a tag of some sort. For example, schedule planning and performance data, regardless of the idiosyncratic/proprietary syntax used by a software publisher, describes the same phenomena regardless of the software application. There are only so many ways to identify snow–and, no, the Inuit people do not have 100 words to describe it. Qualifiers apply in the humanities, but usually our business processes more closely align with statistical and arithmetic measures. As a result, structured data is oftentimes defined by its position in a hierarchical, time-phased, or interrelated system that contains a series of markers, indexes, and tables that allow it to be interpreted easily through the identification of a Rosetta stone, even when the system, at first blush, appears to be opaque. When you go to a book, its title describes what it is. If its content has a table of contents and/or an index it is easy to find the information needed to perform the task at hand.

Unstructured data consists of the content of things like letters, e-mails, presentations, and other forms of data disconnected from its source systems and collected together in a flat repository. In this case the data must be mined to recreate what is not there: the title that describes the type of data, a table of contents, and an index.

All data requires initial scrubbing and pre-processing. The difference here is the means used to perform this operation. Let’s take the easy path first.

For project management–and most business systems–we most often encounter structured data. What this means is that by understanding and interpreting standard industry terminology, schemas, and APIs that the simple process of aligning data to be transformed and stored in a database for consumption can be reduced to a systemic and repeatable process without the redundancy of rediscovery applied in every instance. Our business intelligence and business analytics systems can be further developed to anticipate a probable question from a user so that the query is pre-structured to allow for near immediate response. Further, structuring the user interface in such as way as to make the response to the query meaningful, especially integrated with and juxtaposed other types of data requires subject matter expertise to be incorporated into the solution.

Structured ETL is the place that I most often inhabit as a provider of software solutions. These processes are both economical and relatively fast, particularly in those cases where they are applied to an otherwise inefficient system of best-of-breed applications that require data transfers and cross-validation prior to official reporting. Time, money, and effort are all saved by automating this process, improving not only processing time but also data accuracy and transparency.

In the case of unstructured data, however, the process can be a bit more complicated and there are many ways to skin this cat. The key here is that oftentimes what seems to be unstructured data is only so because of the lack of domain knowledge by the software publisher in its target vertical.

For example, I recently read a white paper published by a large BI/BA publisher regarding their approach to financial and accounting systems. My own experience as a business manager and Navy Supply Corps Officer provide me with the understanding that these systems are highly structured and regulated. Yet, business intelligence publishers treated this data–and blatantly advertised and apparently sold as state of the art–an unstructured approach to mining this data.

This approach, which was first developed back in the 1980s when we first encountered the challenge of data that exceeded our expertise at the time, requires a team of data scientists and coders to go through the labor- and time-consuming process of pre-processing and building specialized processes. The most basic form of this approach involves techniques such as frequency analysis, summarization, correlation, and data scrubbing. This last portion also involves labor-intensive techniques at the microeconomic level such as binning and other forms of manipulation.

This is where the fear and loathing comes into play. It is not as if all information systems do not perform these functions in some manner, it is that in structured data all of this work has been done and, oftentimes, is handled by the database system. But even here there is a better way.

My colleague, Dave Gordon, who has his own blog, will emphasize that the identification of probable questions and configuration of queries in advance combined with the application of standard APIs will garner good results in most cases. Yet, one must be prepared to receive a certain amount of irrelevant information. For example, the query on Google of “Fun Things To Do” that you may use if you are planning for a weekend will yield all sorts of results, such as “50 Fun Things to Do in an Elevator.”  This result includes making farting sounds. The link provides some others, some of which are pretty funny. In writing this blog post, a simple search on Google for “Google query fails” yields what can only be described as a large number of query fails. Furthermore, this approach relies on the data originator to have marked the data with pointers and tags.

Given these different approaches to unstructured data and the complexity involved, there is a decision process to apply:

1. Determine if the data is truly unstructured. If the data is derived from a structured database from an existing application or set of applications, then it is structured and will require domain expertise to inherit the values and information content without expending unnecessary resources and time. A structured, systemic, and repeatable process can then be applied. Oftentimes an industry schema or standard can be leveraged to ensure consistency and fidelity.

2. Determine whether only a portion of the unstructured data is relative to your business processes and use it to append and enrich the existing structured data that has been used to integrate and expand your capabilities. In most cases the identification of a Rosetta Stone and standard APIs can be used to achieve this result.

3. For the remainder, determine the value of mining the targeted category of unstructured data and perform a business case analysis.

Given the rapidly expanding size of data that we can access using the advancing power of new technology, we must be able to distinguish between doing what is necessary from doing what is impressive. The definition of Big Data has evolved over time because our hardware, storage, and database systems allow us to access increasingly larger datasets that ten years ago would have been unimaginable. What this means is that–initially–as we work through this process of discovery, we will be bombarded with a plethora of irrelevant statistical measures and so-called predictive analytics that will eventually prove out to not pass the “so-what” test. This process places the users in a state of information overload, and we often see this condition today. It also means that what took an army of data scientists and developers to do ten years ago takes a technologist with a laptop and some domain knowledge to perform today. This last can be taught.

The next necessary step, aside from applying the decision process above, is to force our information systems to advance their processing to provide more relevant intelligence that is visualized and configured to the domain expertise required. In this way we will eventually discover the paradox that effectively accessing larger sets of data will yield fewer, more relevant intelligence that can be translated into action.

At the end of the day the manager and user must understand the data. There is no magic in data transformation or data processing. Even with AI and machine learning it is still incumbent upon the people within the organization to be able to apply expertise, perspective, knowledge, and wisdom in the use of information and intelligence.

Move It On Over — Third and Fourth Generation Software: A Primer

While presenting to organizations regarding business intelligence and project management solutions I often find myself explaining the current state of programming and what current technology brings to the table. Among these discussions is the difference between third and fourth generation software, not just from the perspective of programming–or the Wikipedia definition (which is quite good, see the links below)–but from a practical perspective.

Recently I ran into someone who asserted that their third-generation software solution was advantageous over a fourth generation one because it was “purpose built.” My response was that a fourth generation application provides multiple “purpose built” solutions from one common platform in a more agile and customer-responsive environment. For those unfamiliar with the differences, however, this simply sounded like a war of words rather than the substantive debate that it was.

For anyone who has used a software application they are usually not aware of the three basic logical layers that make up the solution. These are the business logic layer, the application layer, and the database structure. The user interface delivers the result of the interaction of these three layers to the user–what is seen on the screen.

Back during the early advent of the widespread use of PCs and distributed computing on centralized systems, a group of powerful languages were produced that allowed the machine operations to be handled by an operating system and for software developers to write code to focus on “purpose built” solutions.

Initially these efforts concentrated on automated highly labor-intensive activities to achieve maximum productivity gains in an organization, and to leverage those existing systems to distribute information that previously would require many hours of manual effort in terms of mathematical and statistical calculation and visualization. The solutions written were based on what were referred to as third generation languages, and they are familiar even to non-technical people: Fortran, Cobol, C+, C++, C#, and Java, among others. These languages are highly structured and require a good bit of expertise to correctly program.

In third generation environments, the coder specifies operations that the software must perform based on data structure, application logic, and pre-coded business logic.These three levels of highly integrated and any change in one of them requires that the programmer trace the impact of that change to ensure that the operations in the other two layers are not affected. Oftentimes, the change has a butterfly effect, requiring detailed adjustments to take into the account the subtleties in processing. It is this highly structured, interdependent, “purpose built” structure that causes unanticipated software bugs to pop up in most applications. It is also the reason why software development and upgrade configuration control is also highly structured and time-consuming–requiring long lead-times to even deliver what most users view as relatively mundane changes and upgrades, like a new chart or graph.

In contrast, fourth generation applications separate the three levels and control the underlying behavior of the operating environment by leveraging a standard framework, such as .NET. The .NET operating environment, for example, controls both a library of interoperability across programming languages (known as a Framework Class Library or FCL), and virtual machine that handles exception handling, memory management, and other common functions (known as Common Language Runtime or CLR).

With the three layers separated, with many of the more mundane background tasks being controlled by the .NET framework, a great deal of freedom is provided to the software developer that provides real benefits to customers and users.

For example, the database layer is freed from specific coding from the application layer, since the operating environment allows libraries of industry standard APIs to be leveraged, making the solution agnostic to data. Furthermore, the business logic/UI layer allows for table-driven and object-oriented configuration that creates a low code environment, which not only allows for rapid roll-out of new features and functionality (since hard-coding across all three layers is eschewed), but also allows for more precise targeting of functionality based on the needs of user groups (or any particular user).

This is what is meant in previous posts by new technology putting the SME back in the driver’s seat, since pre-defined reports and objects (GUIs) at the application layer allow for immediate delivery of functionality. Oftentimes data from disparate data sources can be bound together through simple query languages and SQL, particularly if the application layer table and object functionality is built well enough.

When domain knowledge is incorporated into the business logic layer, the distinction between generic BI and COTS is obliterated. Instead, what we have is a hybrid approach that provides the domain specificity of COTS (‘purpose built”), with the power of BI that reduces the response time between solution design and delivery. More and better data can also be accessed, establishing an environment of discovery-driven management.

Needless to say, properly designed Fourth Generation applications are perfectly suited to rapid application development and deployment approaches such as Agile. They also provide the integration potential, given the agnosticism to data, that Third Generation “purpose built” applications can only achieve through data transfer and reconciliation across separate applications that never truly achieve integration. Instead, Fourth Generation applications can subsume the specific “purpose built” functionality found in stand-alone applications and deliver it via a single platform that provides one source of truth, still allowing for different interpretations of the data through the application of differing analytical approaches.

So move it on over nice (third generation) dog, a big fat (fourth generation) dog is moving in.

Learning the (Data) — Data-Driven Management, HBR Edition

The months of December and January are usually full of reviews of significant events and achievements during the previous twelve months. Harvard Business Review makes the search for some of the best writing on the subject of data-driven transformation by occasionally publishing in one volume the best writing on a critical subject of interest to professional through the magazine OnPoint. It is worth making part of your permanent data management library.

The volume begins with a very concise article by Thomas C. Redman with the provocative title “Does Your Company Know What to Do with All Its Data?” He then goes on to list seven takeaways of optimizing the use of existing data that includes many of the themes that I have written about in this blog: better decision-making, innovation, what he calls “informationalize products”, and other significant effects. Most importantly, he refers to the situation of information asymmetry and how this provides companies and organizations with a strategic advantage that directly affects the bottom line–whether that be in negotiations with peers, contractual relationships, or market advantages. Aside from the OnPoint article, he also has some important things to say about corporate data quality. Highly recommended and a good reason to implement systems that assure internal information systems fidelity.

Edd Wilder-James also covers a theme that I have hammered home in a number of blog posts in the article “Breaking Down Data Silos.” The issue here is access to data and the manner in which it is captured and transformed into usable analytics. His recommended approach to a task that is often daunting is to find the path of least resistance in finding opportunities to break down silos and maximize data to apply advanced analytics. The article provides a necessary balm that counteracts the hype that often accompanies this topic.

Both of these articles are good entrees to the subject and perfectly positioned to prompt both thought and reflection of similar experiences. In my own day job I provide products that specifically address these business needs. Yet executives and management in all too many cases continue to be unaware of the economic advantages of data optimization or the manner in which continuing to support data silos is limiting their ability to effectively manage their organizations. There is no doubt that things are changing and each day offers a new set of clients who are feeling their way in this new data-driven world, knowing that the promises of almost effort-free goodness and light by highly publicized data gurus are not the reality of practitioners, who apply the detail work of data normalization and rationalization. At the end it looks like magic, but there is effort that needs to be expended up-front to get to that state. In this physical universe under the Second Law of Thermodynamics there are no free lunches–energy must be borrowed from elsewhere in order to perform work. We can minimize these efforts through learning and the application of new technology, but managers cannot pretend not to have to understand the data that they intend to use to make business decisions.

All of the longer form articles are excellent, but I am particularly impressed with the Leandro DalleMule and Thomas H. Davenport article entitled “What’s Your Data Strategy?” from the May-June 2017 issue of HBR. Oftentimes when addressing big data at professional conferences and in visiting businesses the topic often runs to the manner of handling the bulk of non-structured data. But as the article notes, less than half of an organization’s relevant structured data is actually used in decision-making. The most useful artifact that I have permanently plastered at my workplace is the graphic “The Elements of Data Strategy”, and I strongly recommend that any manager concerned with leveraging new technology to optimize data do the same. The graphic illuminates the defensive and offensive positions inherent in a cohesive data strategy leading an organization to the state: “In our experience, a more flexible and realistic approach to data and information architectures involves both a single source of truth (SSOT) and multiple versions of the truth (MVOTs). The SSOT works at the data level; MVOTs support the management of information.” Elimination of proprietary data silos, elimination of redundant data streams, and warehousing of data that is accessed using a number of analytical methods achieve the necessary states of SSOT that provides the basis for an environment supporting MVOTs.

The article “Why IT Fumbles Analytics” by Donald A. Marchand and Joe Peppard from 2013, still rings true today. As with the article cited above by Wilder-James, the emphasis here is with the work necessary to ensure that new data and analytical capabilities succeed, but the emphasis shifts to “figuring out how to use the information (the new system) generates to make better decisions or gain deeper…insights into key aspects of the business.” The heart of managing the effort in providing this capability is to put into place a project organization, as well as systems and procedures, that will support the organizational transformation that will occur as a result of the explosion of new analytical capability.

The days of simply buying an off-the-shelf silo-ed “tool” and automating a specific manual function are over, especially for organizations that wish to be effective and competitive–and more profitable–in today’s data and analytical environment. A more comprehensive and collaborative approach is necessary. As with the DalleMule and Davenport article, there is a very useful graphic that contrasts traditional IT project approaches against Analytics and Big Data (or perhaps “Bigger” Data) Projects. Though the prescriptions in the article assume an earlier concept of Big Data optimization focused on non-structured data, thereby making some of these overkill, an implementation plan is essential in supporting the kind of transformation that will occur, and managers act at their own risk if they fail to take this effect into account.

All of the other articles in this OnPoint issue are of value. The bottom line, as I have written in the past, is to keep a focus on solving business challenges, rather than buying the new bright shiny object. Alternatively, in today’s business environment the day that business decision-makers can afford to stay within their silo-ed comfort zone are phasing out very quickly, so they need to shift their attention to those solutions that address these new realities.

So why do this apart from the fancy term “data optimization”? Well, because there is a direct return-on-investment in transforming organizations and systems to data-driven ones. At the end of the day the economics win out. Thus, our organizations must be prepared to support and have a plan in place to address the core effects of new data-analytics and Big Data technology:

a. The management and organizational transformation that takes place when deploying the new technology, requiring proactive socialization of the changing environment, the teaching of new skill sets, new ways of working, and of doing business.

b. Supporting transformation from a sub-optimized silo-ed “tell me what I need to know” work environment to a learning environment, driven by what the data indicates, supporting the skills cited above that include intellectual curiosity, engaging domain expertise, and building cross-domain competencies.

c. A practical plan that teaches the organization how best to use the new capability through a practical, hands-on approach that focuses on addressing specific business challenges.