Sundays are usually days reserved for music and the group Rhye was playing in the background when this topic came to mind.
I have been preparing for my presentation in collaboration with my Navy colleague John Collins for the upcoming Integrated Program Management Workshop in Baltimore. This presentation will be a non-proprietary/non-commercial talk about understanding the issue of unlocking data to support national defense systems, but the topic has broader interest.
Thus, in advance of that formal presentation in Baltimore, there are issues and principles that are useful to cover, given that data capture and its processing, delivery, and use is at the heart of all systems in government, and private industry and organizations.
Top Data Trends in Industry and Their Relationship to Open Data Systems
According to Shohreh Gorbhani, Director, Project Control Academy, the top five data trends being pursued by private industry and technology companies. My own comments follow as they relate to open data systems.
- Open Technologies that transition from 2D Program Management to 3D and 4D PM. This point is consistent with the College of Performance Management’s emphasis on IPM, but note that the stipulation is the use of open technologies. This is an important distinction technologically, and one that I will explore further in this post.
- Real-time Data Capture. This means capturing data in the moment so that the status of our systems is up-to-date without the present delays associated with manual data management and conditioning. This does not preclude the collection of structured, periodic data, but also does include the capture of transactions from real-time integrated systems where appropriate.
- Seamless Data Flow Integration. From the perspective of companies in manufacturing and consumer products, technologies such as IoT and Cloud are just now coming into play. But, given the underlying premises of items 1 and 2, this also means the proper automated contextualization of data using an open technology approach that flows in such a way as to be traceable.
- The use of Big Data. The term has lost a good deal of its meaning because of its transformation into a buzz-phrase and marketing term. But Big Data refers to the expansion in the depth and breadth of available data driven by the economic forces that drive Moore’s Law. What this means is that we are entering a new frontier of data processing and analysis that will, no doubt, break down assumptions regarding the validity and strength of certain predictive analytics. The old assumptions that restrict access to data due to limitations of technology and higher cost no longer apply. We are now in the age of Knowledge Discovery in Data (KDD). The old approach of reporting assumed that we already know what we need to know. The use of data challenges old assumptions and allows us to follow the data where it will lead us.
- AI Forecasting and Analysis. No doubt predictive AI will be important as we move forward with machine learning and other similar technologies. But this infant is not yet a rug rat. The initial experiences with AI are that they tend to reflect the biases of the creators. The danger here is that this defeats KDD, which results in stagnation and fugue. But there are other areas where AI can be taught to automate mundane, value-neutral tasks relating to raw data interpretation.
The 809 Panel Recommendation
The fact that industry is the driving force behind these trends that will transform the way that we view information in our day-to-day work, it is not surprising that the 809 Panel had this to say about existing defense business systems:
“Use existing defense business system open-data requirements to improve strategic decision making on acquisition and workforce issues…. DoD has spent billions of dollars building the necessary software and institutional infrastructure to collect enterprise wide acquisition and financial data. In many cases, however, DoD lacks the expertise to effectively use that data for strategic planning and to improve decision making. Recommendation 88 would mitigate this problem by implementing congressional open-data mandates and using existing hiring authorities to bolster DoD’s pool of data science professionals.”Section 809 Volume 3, Section 9, p. 477
At one point in my military career, I was assigned as the Materiel, Fuels, and Transportation Officer of Naval Air Station, Norfolk. As a major naval air base, transportation hub, and home to a Naval Aviation Depot, we shipped and received materiel and supplies across the world. In doing so, our transportation personnel would use what at the time was new digital technology to complete an electronic bill of lading that specified what and when items were being shipped, the common or military carrier, the intended recipient, and the estimated date of arrival, among other essential information.
The customer and receiving end of this workflow received an open systems data file that contained these particulars. The file was an early version of open data known as an X12 file, for which the commercial transportation industry was an early adopter. Shipping and receiving activities and businesses used their own type of local software: and there were a number of customized and commercial choices out there, as well as those used by common carriers such various trucking and shipping firms, the USPS, FEDEX, DHS, UPS, and others. The X12 file was the DMZ that made the information open. Software manufacturers, if they wanted to stay relevant in the market, could not impose a proprietary data solution.
Furthermore, standardization of terminology and concepts ensured that the information was readable and comprehensible wherever the items landed–whether across receiving offices in the United States, Japan, Europe, or even Istanbul. Understanding that DoD needs the skillsets to be able to optimize data, it didn’t require an army of data scientists to achieve this end-state. It required the right data science expertise in the right places, and the dictates of transportation consumers to move the technology market to provide the solution.
Over the years both industry and government have developed a number of schema standards focused on specific types of data, progressing from X12 to XML and now projected to use JSON-based schemas. Each of them in their initial iterations automated the submission of physical reports that had been required by either by contract or operations. These focused on a small subset of the full dataset relating to program management and project controls.
This progression made sense.
When digitized technology is first introduced into an intensive direct-labor environment, the initial focus is to automate the production of artifacts and their underlying processes in order to phase in the technology’s acceptance. This also allows the organization to realize immediate returns on investment and improvements in productivity. But this is the first step, not the final one.
Currently for project controls the current state is the UN/CEFACT XML for program performance management data, and the contract cost and labor data collection file known as the FlexFile. Clearly the latter file, given that the recipient is the Office of the Secretary of Defense Cost Assessment and Program Evaluation (OSD CAPE), establish it as one of many feedback loops that support that office’s role in coordinating the planning, programming, budgeting, and evaluation (PPBE) system related to military strategic investments and budgeting, but only one. The program performance information is also a vital part of the PPBE process in evaluation and in future planning.
For most of the U.S. economy, market forces and consumer requirements are the driving force in digital innovation. The trends noted by Ms. Gorbhani can be confirmed through a Google search of any one of the many technology magazines and websites that can be found. The 809 Panel, drawn as it was from specialists and industry and government, were tasked “to provide recommendations that would allow DoD to adapt and deliver capability at market speeds, while ensuring that DoD remains true to its commitment to promote competition, provide transparency in its actions, and maintain the integrity of the defense acquisition system.”
Given that the work of the DoD is unique, creating a type of monopsony, it is up to leadership within the Department to create the conditions and mandates necessary to recreate in microcosm the positive effects of market forces. The DoD also has a very special, vital mission in defending the nation.
When an individual business cobbles together its mission statement it is that mission that defines the necessary elements in data collection that are then essential in making decisions. In today’s world, best commercial sector practice is to establish a Master Data Management (MDM) approach in defining data requirements and practices. In the case of DoD, a similar approach would be beneficial. Concurrent with the period of the 809 Panel’s efforts, RAND Corporation delivered a paper in 2017 (link in the previous sentence) that made recommendations related to data governance that are consistent with the 809 Panel’s recommendations. We will be discussing these specific recommendations in our presentation.
Meeting the mission and readiness are the key components to data governance in DoD. Absent such guidance, specialized software solution providers, in particular, will engage in what is called “rent-seeking” behavior. This is an economic term that means that an “entity (that) seeks to gain added wealth without any reciprocal contribution of productivity.”
No doubt, given the marketing of software solution providers, it is hard for decision-makers to tell what constitutes an open data system. The motivation of a software solution is to make itself as “sticky” as possible and it does that by enticing a customer to commit to proprietary definitions, structures, and database schemas. Usually there are “black-boxed” portions of the software that makes traceability impossible and that complicates the issue of who exactly owns the data and the ability of the customer to optimize it and utilize it as the mission dictates.
Furthermore, data visualization components like dashboards are ubiquitous in the market. A cursory stroll through a tradeshow looks like a dashboard smorgasbord combined with different practical concepts of what constitutes “open” and “integration”.
As one DoD professional recently told me, it is hard to tell the software systems apart. To do this it is necessary to understand what underlies the software. Thus, a proposed honest-broker definition of an open data system is useful and the place to start, given that this is not a notional concept since such systems have been successfully been established.
The Definition of Open Data Systems
Practical experience in implementing open data systems toward the goal of optimizing essential information from our planning, acquisition, financial, and systems engineering systems informs the following proposed definition, which is based on commercial best practice. This proposal is also based on the principle that the customer owns the data.
- An open data system is one based on non-proprietary neutral schemas that allow for the effective capture of all essential elements from third-party proprietary and customized software for reporting and integration necessary to support both internal and external stakeholders.
- An open data system allows for complete traceability and transparency from the underlying database structure of the third-party software data, through the process of data capture, transformation, and delivery of data in the neutral schema.
- An open data system targets the loading of the underlying source data for analysis and use into a neutral database structure that replicates the structure of the neutral schema. This allows for 100% traceability and audit of data elements received through the neutral schema, and ensures that the receiving organization owns the data.
Under this definition, data from its origination to its destination is more easily validated and traced, ensuring quality and fidelity, and establishing confidence in its value. Given these characteristics, integration of data from disparate domains becomes possible. The tracking of conflicting indicators is mitigated, since open system data allows for its effective integration without the bias of proprietary coding or restrictions on data use. Finally, both government and industry will not only establish ownership of their data–a routine principle in commercial business–but also be free to utilize new technologies that optimize the use of that data.
In closing, Gahan Wilson, a cartoonist whose work appeared in National Lampoon, The New Yorker, Playboy, and other magazines recently passed.
When thinking of the barriers to the effective use of data, I came across this cartoon in The New Yorker:
Open Data is the key to effective integration and reporting–to the optimal use of information. Once mandated and achieved, our defense and business systems will be better informed and be able to test and verify assumed knowledge, address risk, and eliminate dogmatic and erroneous conclusions. Open Data is the driver of organizational transformation keyed to the effective understanding and use of information, and all that entails. Finally, Open Data is necessary to the mission and planning systems of both industry and the U.S. Department of Defense.