Open: Strategic Planning, Open Data Systems, and the Section 809 Panel

Sundays are usually days reserved for music and the group Rhye was playing in the background when this topic came to mind.

I have been preparing for my presentation in collaboration with my Navy colleague John Collins for the upcoming Integrated Program Management Workshop in Baltimore. This presentation will be a non-proprietary/non-commercial talk about understanding the issue of unlocking data to support national defense systems, but the topic has broader interest.

Thus, in advance of that formal presentation in Baltimore, there are issues and principles that are useful to cover, given that data capture and its processing, delivery, and use is at the heart of all systems in government, and private industry and organizations.

Top Data Trends in Industry and Their Relationship to Open Data Systems

According to Shohreh Gorbhani, Director, Project Control Academy, the top five data trends being pursued by private industry and technology companies. My own comments follow as they relate to open data systems.

  1. Open Technologies that transition from 2D Program Management to 3D and 4D PM. This point is consistent with the College of Performance Management’s emphasis on IPM, but note that the stipulation is the use of open technologies. This is an important distinction technologically, and one that I will explore further in this post.
  2. Real-time Data Capture. This means capturing data in the moment so that the status of our systems is up-to-date without the present delays associated with manual data management and conditioning. This does not preclude the collection of structured, periodic data, but also does include the capture of transactions from real-time integrated systems where appropriate.
  3. Seamless Data Flow Integration. From the perspective of companies in manufacturing and consumer products, technologies such as IoT and Cloud are just now coming into play. But, given the underlying premises of items 1 and 2, this also means the proper automated contextualization of data using an open technology approach that flows in such a way as to be traceable.
  4. The use of Big Data. The term has lost a good deal of its meaning because of its transformation into a buzz-phrase and marketing term. But Big Data refers to the expansion in the depth and breadth of available data driven by the economic forces that drive Moore’s Law. What this means is that we are entering a new frontier of data processing and analysis that will, no doubt, break down assumptions regarding the validity and strength of certain predictive analytics. The old assumptions that restrict access to data due to limitations of technology and higher cost no longer apply. We are now in the age of Knowledge Discovery in Data (KDD). The old approach of reporting assumed that we already know what we need to know. The use of data challenges old assumptions and allows us to follow the data where it will lead us.
  5. AI Forecasting and Analysis. No doubt predictive AI will be important as we move forward with machine learning and other similar technologies. But this infant is not yet a rug rat. The initial experiences with AI are that they tend to reflect the biases of the creators. The danger here is that this defeats KDD, which results in stagnation and fugue. But there are other areas where AI can be taught to automate mundane, value-neutral tasks relating to raw data interpretation.

The 809 Panel Recommendation

The fact that industry is the driving force behind these trends that will transform the way that we view information in our day-to-day work, it is not surprising that the 809 Panel had this to say about existing defense business systems:

“Use existing defense business system open-data requirements to improve strategic decision making on acquisition and workforce issues…. DoD has spent billions of dollars building the necessary software and institutional infrastructure to collect enterprise wide acquisition and financial data. In many cases, however, DoD lacks the expertise to effectively use that data for strategic planning and to improve decision making. Recommendation 88 would mitigate this problem by implementing congressional open-data mandates and using existing hiring authorities to bolster DoD’s pool of data science professionals.”

Section 809 Volume 3, Section 9, p. 477

At one point in my military career, I was assigned as the Materiel, Fuels, and Transportation Officer of Naval Air Station, Norfolk. As a major naval air base, transportation hub, and home to a Naval Aviation Depot, we shipped and received materiel and supplies across the world. In doing so, our transportation personnel would use what at the time was new digital technology to complete an electronic bill of lading that specified what and when items were being shipped, the common or military carrier, the intended recipient, and the estimated date of arrival, among other essential information.

The customer and receiving end of this workflow received an open systems data file that contained these particulars. The file was an early version of open data known as an X12 file, for which the commercial transportation industry was an early adopter. Shipping and receiving activities and businesses used their own type of local software: and there were a number of customized and commercial choices out there, as well as those used by common carriers such various trucking and shipping firms, the USPS, FEDEX, DHS, UPS, and others. The X12 file was the DMZ that made the information open. Software manufacturers, if they wanted to stay relevant in the market, could not impose a proprietary data solution.

Furthermore, standardization of terminology and concepts ensured that the information was readable and comprehensible wherever the items landed–whether across receiving offices in the United States, Japan, Europe, or even Istanbul. Understanding that DoD needs the skillsets to be able to optimize data, it didn’t require an army of data scientists to achieve this end-state. It required the right data science expertise in the right places, and the dictates of transportation consumers to move the technology market to provide the solution.

Over the years both industry and government have developed a number of schema standards focused on specific types of data, progressing from X12 to XML and now projected to use JSON-based schemas. Each of them in their initial iterations automated the submission of physical reports that had been required by either by contract or operations. These focused on a small subset of the full dataset relating to program management and project controls.

This progression made sense.

When digitized technology is first introduced into an intensive direct-labor environment, the initial focus is to automate the production of artifacts and their underlying processes in order to phase in the technology’s acceptance. This also allows the organization to realize immediate returns on investment and improvements in productivity. But this is the first step, not the final one.

Currently for project controls the current state is the UN/CEFACT XML for program performance management data, and the contract cost and labor data collection file known as the FlexFile. Clearly the latter file, given that the recipient is the Office of the Secretary of Defense Cost Assessment and Program Evaluation (OSD CAPE), establish it as one of many feedback loops that support that office’s role in coordinating the planning, programming, budgeting, and evaluation (PPBE) system related to military strategic investments and budgeting, but only one. The program performance information is also a vital part of the PPBE process in evaluation and in future planning.

For most of the U.S. economy, market forces and consumer requirements are the driving force in digital innovation. The trends noted by Ms. Gorbhani can be confirmed through a Google search of any one of the many technology magazines and websites that can be found. The 809 Panel, drawn as it was from specialists and industry and government, were tasked “to provide recommendations that would allow DoD to adapt and deliver capability at market speeds, while ensuring that DoD remains true to its commitment to promote competition, provide transparency in its actions, and maintain the integrity of the defense acquisition system.”

Given that the work of the DoD is unique, creating a type of monopsony, it is up to leadership within the Department to create the conditions and mandates necessary to recreate in microcosm the positive effects of market forces. The DoD also has a very special, vital mission in defending the nation.

When an individual business cobbles together its mission statement it is that mission that defines the necessary elements in data collection that are then essential in making decisions. In today’s world, best commercial sector practice is to establish a Master Data Management (MDM) approach in defining data requirements and practices. In the case of DoD, a similar approach would be beneficial. Concurrent with the period of the 809 Panel’s efforts, RAND Corporation delivered a paper in 2017 (link in the previous sentence) that made recommendations related to data governance that are consistent with the 809 Panel’s recommendations. We will be discussing these specific recommendations in our presentation.

Meeting the mission and readiness are the key components to data governance in DoD. Absent such guidance, specialized software solution providers, in particular, will engage in what is called “rent-seeking” behavior. This is an economic term that means that an “entity (that) seeks to gain added wealth without any reciprocal contribution of productivity.”

No doubt, given the marketing of software solution providers, it is hard for decision-makers to tell what constitutes an open data system. The motivation of a software solution is to make itself as “sticky” as possible and it does that by enticing a customer to commit to proprietary definitions, structures, and database schemas. Usually there are “black-boxed” portions of the software that makes traceability impossible and that complicates the issue of who exactly owns the data and the ability of the customer to optimize it and utilize it as the mission dictates.

Furthermore, data visualization components like dashboards are ubiquitous in the market. A cursory stroll through a tradeshow looks like a dashboard smorgasbord combined with different practical concepts of what constitutes “open” and “integration”.

As one DoD professional recently told me, it is hard to tell the software systems apart. To do this it is necessary to understand what underlies the software. Thus, a proposed honest-broker definition of an open data system is useful and the place to start, given that this is not a notional concept since such systems have been successfully been established.

The Definition of Open Data Systems

Practical experience in implementing open data systems toward the goal of optimizing essential information from our planning, acquisition, financial, and systems engineering systems informs the following proposed definition, which is based on commercial best practice. This proposal is also based on the principle that the customer owns the data.

  1. An open data system is one based on non-proprietary neutral schemas that allow for the effective capture of all essential elements from third-party proprietary and customized software for reporting and integration necessary to support both internal and external stakeholders.
  2. An open data system allows for complete traceability and transparency from the underlying database structure of the third-party software data, through the process of data capture, transformation, and delivery of data in the neutral schema.
  3. An open data system targets the loading of the underlying source data for analysis and use into a neutral database structure that replicates the structure of the neutral schema. This allows for 100% traceability and audit of data elements received through the neutral schema, and ensures that the receiving organization owns the data.

Under this definition, data from its origination to its destination is more easily validated and traced, ensuring quality and fidelity, and establishing confidence in its value. Given these characteristics, integration of data from disparate domains becomes possible. The tracking of conflicting indicators is mitigated, since open system data allows for its effective integration without the bias of proprietary coding or restrictions on data use. Finally, both government and industry will not only establish ownership of their data–a routine principle in commercial business–but also be free to utilize new technologies that optimize the use of that data.

In closing, Gahan Wilson, a cartoonist whose work appeared in National Lampoon, The New Yorker, Playboy, and other magazines recently passed.

When thinking of the barriers to the effective use of data, I came across this cartoon in The New Yorker:

Open Data is the key to effective integration and reporting–to the optimal use of information. Once mandated and achieved, our defense and business systems will be better informed and be able to test and verify assumed knowledge, address risk, and eliminate dogmatic and erroneous conclusions. Open Data is the driver of organizational transformation keyed to the effective understanding and use of information, and all that entails. Finally, Open Data is necessary to the mission and planning systems of both industry and the U.S. Department of Defense.

The Future — Data Focus vs. “Tools” Focus

The title in this case is from the Leonard Cohen song.

Over the last few months I’ve come across this issue quite a bit and it goes to the heart of where software technology is leading us.  The basic question that underlies this issue can be boiled down into the issue of whether software should be thought of as a set of “tools” or an overarching solution that can handle data in a way that the organization requires.  It is a fundamental question because what we call Big Data–despite all of the hoopla–is really a relative term that changes with hardware, storage, and software scalability.  What was Big Data in 1997 is not Big Data in 2016.

As Moore’s Law expands scalability at lower cost, organizations and SMEs are finding that the dedicated software tools at hand are insufficient to leverage the additional information that can be derived from that data.  The reason for this is simple.  A COTS tools publisher will determine the functionality required based on a structured set of data that is to be used and code to that requirement.  The timeframe is usually extended and the approach highly structured.  There are very good reasons for this approach in particular industries where structure is necessary and the environment is fairly stable.  The list of industries that fall into this category is rapidly becoming smaller.  Thus, there is a large gap that must be filled by workarounds, custom code, and suboptimized use of Excel.  Organizations and people cannot wait until the self-styled software SMEs get around to providing that upgrade two years from now so that people can do their jobs.

Thus, the focus must be shifted to data and the software technologies that maximize its immediate exploitation for business purposes to meet organizational needs.  The key here is the arise of Fourth Generation applications that leverage object oriented programming language that most closely replicate the flexibility of open source.  What this means is that in lieu of buying a set of “tools”–each focused on solving a specific problem stitched together by a common platform or through data transfer–that software that deals with both data and UI in an agnostic fashion is now available.

The availability of flexible Fourth Generation software is of great concern, as one would imagine, to incumbents who have built their business model on defending territory based on a set of artifacts provided in the software.  Oftentimes these artifacts are nothing more than automatically filled in forms that previously were filled in manually.  That model was fine during the first and second waves of automation from the 1980s and 1990s, but such capabilities are trivial in 2016 given software focused on data that can be quickly adapted to provide functionality as needed.  What this development also does is eliminate and make trivial those old checklists that IT shops used to send out in a lazy way of assessing relative capabilities of software to simplify the competitive range.

Tools restrict themselves to a subset of data by definition to provide a specific set of capabilities.  Software that expands to include any set of data and allows that data to be displayed and processed as necessary through user configuration adapts itself more quickly and effectively to organizational needs.  They also tend to eliminate the need for multiple “best-of-breed” toolset approaches that are not the best of any breed, but more importantly, go beyond the limited functionality and ways of deriving importance from data found in structured tools.  The reason for this is that the data drives what is possible and important, rather than tools imposing a well-trod interpretation of importance based on a limited set of data stored in a proprietary format.

An important effect of Fourth Generation software that provides flexibility in UI and functionality driven by the user is that it puts the domain SME back in the driver’s seat.  This is an important development.  For too long SMEs have had to content themselves with recommending and advocating for functionality in software while waiting for the market (software publishers) to respond.  Essential business functionality with limited market commonality often required that organizations either wait until the remainder of the market drove software publishers to meet their needs, finance expensive custom development (either organic or contracted), or fill gaps with suboptimized and ad hoc internal solutions.  With software that adapts its UI and functionality based on any data that can be accessed, using simple configuration capabilities, SMEs can fill these gaps with a consistent solution that maintains data fidelity and aids in the capture and sustainability of corporate knowledge.

Furthermore, for all of the talk about Agile software techniques, one cannot implement Agile using software languages and approaches that were designed in an earlier age that resists optimization of the method.  Fourth Generation software lends itself most effectively to Agile since configuration using simple object oriented language gets us to the ideal–without a reliance on single points of failure–of releasable solutions at the end of a two-week sprint.  No doubt there are developers out there making good money that may challenge this assertion, but they are the exceptions to the rule that prove the point.  An organization should be able to optimize the pool of contributors to solution development and rollout in supporting essential business processes.  Otherwise Agile is just a pretext to overcome suboptimized developmental approaches, software languages, and the self-interest of developers that can’t plan or produce a releasable product in a timely manner within budgetary constraints.

In the end the change in mindset from tools to data goes to the issue of who owns the data: the organization that creates and utilizes the data (the customer), or the proprietary software tool publishers?  Clearly the economics will win out in favor of the customer.  It is time to displace “tools” thinking.

Note:  I’ve revised the title of the blog for clarity.

Living in the Material World — A Call for Open Data from Materials Science

Since advocating for transparent and open data schemas and data repositories, I’ve run into other high tech professionals who have opined that such data requirements are only applicable to my core competency in the U.S. Aerospace & Defense vertical.  Now, in the 26 November on-line edition of the journal Science, comes an opinion piece from a number of Chinese corrosion scientists under the title “Materials science: Share corrosion data” have advocated the development of open data infrastructures to make the age and life of existing metallurgic infrastructure easily accessible to technology in a non-proprietary format.

As the article states, “corrosion costs six cents for every dollar of gross domestic product in the United States. Globally, that amounts to more than US$4 trillion a year — equivalent to damages from 40 Hurricane Katrinas. Half of that cost is in corrosion prevention and control, the other half in damages and lost productivity.”

In the software technology industry, the recent issues regarding the Trans-Pacific Partnership (TPP) has revealed strains in U.S.-Chinese relations on intellectual property and proprietary systems.  So the danger is always that the source of the advocacy will be ignored, despite the clear economic argument in favor of what the editorial proposes.

But the concern is transnational in nature.  As the piece points out, the U.S. Department of Energy has initiated its own program in alignment with the U.S. National Institute of Standards and Technology (NIST) Materials Genome Initiative (MGI) to establish open repositories of data in support of the alternative energy industry.  Given the aging U.S. infrastructure and the dangers from sudden failure from these engineering systems, it seems wise to track and share data for materials corrosion and lifespans.  Think of the sudden bridge failures that have hit headlines over the last several years.

Since the cost of not doing so can be measured in lives as well as economic cost, such data normalization and rationalization in this area takes on the role of being an essential element of governance.