Post-Blogging NDIA Blues — The Latest News (Project Management Wonkish)

The National Defense Industrial Association’s Integrated Program Management Division (NDIA IPMD) just had its quarterly meeting here in sunny Orlando, where we braved sub-60-degree (F) temperatures to start out each day.

For those not in the know, these meetings are an essential coming together of policy makers, subject matter experts, and private industry practitioners regarding the practical and mundane state of the practice in complex project management, particularly focused on the concerns of the federal government and the Department of Defense.  The end result of these meetings is to publish white papers and recommendations on practice to support continuous process improvement and the practical application of project management methods–allowing for a cross-pollination of commercial and government lessons learned.  This is also the intersection where innovations from the large and the small alike are given equal vetting and an opportunity to introduce new concepts and solutions.  This is an idealized description, of course, and most of the petty personality conflicts, competition, and self-interest that plague any group of individuals coming together under a common set of interests also play out here.  But the days are long and the workshops generally produce good products that become the de facto standard of practice in the industry.  Furthermore, the check that keeps the more ruthless personalities in line is the fact that, while it is a large market, the complex project management community tends to be a relatively small one, which reinforces professionalism.

The “blues” in this case are borne not so much of frustration or disappointment as of the long and intense days that the sessions entail.  The biggest news from an IT project management and application perspective was twofold.  First, the data stream used by the industry to share data in an open-systems manner will be simplified.  Second, the technology used to communicate that data will move from XML to JSON.

Human-readable formatting to data-focused formatting.  Under Kendall’s Better Buying Power 3.0, the goal of the Department of Defense (DoD) has been to incorporate better practices from private industry where they can be applied.  I don’t see initiatives for greater efficiency and reduction of duplication going away in the new Administration, regardless of what a new initiative is called.

In case this is news to you, the federal government buys a lot of materials and end items–billions of dollars’ worth.  Accountability must be put in place to ensure that the money is properly spent to acquire the things being purchased.  Where technology is pushed and there are no commercial equivalents that can be bought off the shelf, as in the systems purchased by the Department of Defense, there are measures of progress and performance (given that the contract is under a specification) that are submitted to the oversight agency in DoD.  This is a lot of data, and to be brutally frank the method and format of delivery have been somewhat chaotic, inefficient, and duplicative.  The Department moved to address this with a somewhat modest requirement: open-systems submission of an application-neutral XML file under the standards established by the UN/CEFACT organization.  This was called the Integrated Program Management Report (IPMR).  The move garnered some improvement where it has been applied, but contracts are long-term, so incorporating improvements through new contractual requirements takes time.  Plus, there is always resistance to change.  The Department is now moving to accelerate the elimination of these inefficiencies in its data streams by dropping the unnecessary overhead associated with formatting data for paper forms and dealing with data as, well, data.  Great idea and bravo!  The rub here is that in making the change, the Department has proposed dropping XML as the technology used to transfer data and moving to JSON.

XML to JSON.  Before I spark another techie argument about the relative merits of each, there are some basics to understand here.  First, XML is a language; JSON is simply a data exchange format.  XML is specifically designed to deal with hierarchical and structured data that can be queried, and validation and fidelity checks are inherent in the technology.  XML is also known to scale while maintaining the integrity of the data, which makes it well suited to feeding relational databases.  Finally, XML is hard to break: it is meant for editing and will maintain its structure and integrity afterward.
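To make the contrast concrete, here is a minimal sketch, using only Python’s standard library, of the same hypothetical cost-performance record expressed both ways.  The element and field names (WBSElement, BCWS, BCWP, ACWP) are illustrative only, not drawn from the actual IPMR or UN/CEFACT schema.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical cost-performance record; element names are illustrative,
# not taken from the actual UN/CEFACT or IPMR schema.
xml_doc = """
<ContractPerformance contractId="HYPO-0001">
  <ReportingPeriod end="2017-01-31">
    <WBSElement code="1.1.1">
      <BCWS>1200.0</BCWS>
      <BCWP>1100.0</BCWP>
      <ACWP>1250.0</ACWP>
    </WBSElement>
  </ReportingPeriod>
</ContractPerformance>
"""

root = ET.fromstring(xml_doc)  # malformed XML raises ParseError immediately
for wbs in root.iter("WBSElement"):
    print(wbs.get("code"), wbs.findtext("BCWP"))

# The same record as JSON: terser, but nothing in the format itself says
# which fields are required, what their types are, or how deep the
# hierarchy may go.
json_doc = json.dumps({
    "contractId": "HYPO-0001",
    "reportingPeriod": {
        "end": "2017-01-31",
        "wbsElements": [
            {"code": "1.1.1", "BCWS": 1200.0, "BCWP": 1100.0, "ACWP": 1250.0}
        ],
    },
})
record = json.loads(json_doc)
print(record["reportingPeriod"]["wbsElements"][0]["BCWP"])
```

The JSON is perfectly serviceable as a container, but the structural guarantees have to come from somewhere else, which is exactly the point at issue.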

The counterarguments I encounter are that JSON is new! and uses fewer characters! (which usually turns out to be inconsequential), and that people are talking about it for Big Data and NoSQL! (which happened after the fact; the reason for shoehorning it this way is discussed below).

So does it matter?  Yes and no.  As a supplier specializing in delivering solutions that normalize and rationalize data across proprietary file structures and leverage database capabilities, I don’t care.  I can adapt quickly and will have a proof-of-concept solution out within 30 days of receiving the schema.

The risk here, which applies to both DoD and the industry, is that the decision to go to JSON is being made only because it is the shiny new thing used by gamers and social networking developers.  There has also been a move toward it for other uses because of Java’s history of significant security vulnerabilities, so many that an entire Wikipedia page is devoted to them.  Oracle just killed off Java applets, though Java itself hangs on.  JSON, of course, isn’t Java, but it was designed from birth as JavaScript Object Notation (hence the acronym), with the purpose of handling relatively small bits of data between web servers and browsers in a number of proprietary settings.

To address JSON’s deficiencies relative to XML, a number of tools have been and are being developed to replicate the fidelity and reliability found in XML.  Whether this is sufficient to be effective against a structured LANGUAGE remains to be seen.  Much of the overhead that techies complain about in XML is native functionality tied to the power it brings to the table.  No doubt, a bicycle is simpler than a Formula One racer–and this is an apt comparison.  Claiming “simpler” doesn’t pass the “So what?” test once you know the business processes involved.  The technology needs to be fit to the solution.  The purpose of data transmission using APIs is not only to make the data easy to produce but for it to–you know–achieve the goals of normalization and rationalization so that it can be used on the receiving end, which is where the consumer (usually the customer) sits.
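The tool most often offered as the answer is JSON Schema.  Here is a minimal sketch of what that bolted-on validation looks like, using the third-party jsonschema package and the same illustrative (not official) field names as above.

```python
# Requires the third-party "jsonschema" package (pip install jsonschema).
from jsonschema import validate, ValidationError

# Illustrative schema, not the proposed DoD schema: every WBS element must
# carry a code string and numeric BCWS/BCWP/ACWP values.
schema = {
    "type": "object",
    "required": ["code", "BCWS", "BCWP", "ACWP"],
    "properties": {
        "code": {"type": "string"},
        "BCWS": {"type": "number"},
        "BCWP": {"type": "number"},
        "ACWP": {"type": "number"},
    },
    "additionalProperties": False,
}

good = {"code": "1.1.1", "BCWS": 1200.0, "BCWP": 1100.0, "ACWP": 1250.0}
bad = {"code": "1.1.1", "BCWP": "eleven hundred"}  # missing fields, wrong type

validate(instance=good, schema=schema)  # passes silently
try:
    validate(instance=bad, schema=schema)
except ValidationError as err:
    print("rejected:", err.message)
```

Note that, unlike a schema bound to an XML document, this check is entirely external to the data: nothing obliges a submitter to run it, which is why the strength of the published schema and its enforcement tooling matters so much.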

At the end of the day, the ability to scale and handle hierarchical, structured data will rely on the quality and strength of the schema and the tools that are published to enforce its fidelity and compliance.  Otherwise consuming organizations will be receiving a dozen different proprietary JSON files, which does not address the present chaos but simply adds to it.  These issues were aired out during the meeting, and it seems that everyone is aware of the risks and believes they can be addressed.  Furthermore, as the schema is socialized across solution providers, it will be apparent early whether the technology will be able to handle the project performance data resulting from the development of a high-performance aircraft or a U.S. Navy destroyer.

Stay Open — Open and Proprietary Databases (and Why It Matters)

The last couple of weeks have been fairly intense workwise, so blogging has lagged a bit.  Along the way the matter of databases came up at a customer site: what constitutes open data and what comprises proprietary data.  The reason this issue matters to customers rests on several foundations.

First, in any particular industry or niche there is a wide variety of specialized apps that have blossomed.  This is largely due to Moore’s Law.  Looking at the number of hosted and web apps alone can be quite overwhelming, particularly given the opaqueness of what one is buying at any particular time when it comes to software technology.

Second, given this explosion, it goes without saying that the market will apply its scythe ruthlessly in thinning it.  Despite the ranting of ideologues, this thinning applies to both good and bad ideas, both sound and unsound businesses equally.  The few that remain are lucky, good, or good and lucky.  Oftentimes it is being first to market on an important market discriminator, regardless of the quality in its initial state, that determines winners.

Third, most of these technology solutions will run their software on proprietary database structures.  This undermines the concept that the customer owns the data.

The reasons why software solution providers do this are multifaceted.  For example, the database structure is often established to enhance the functionality and responsiveness of the application, where the structure is leveraged to work optimally with the application’s logic.

But there are also more prosaic reasons for proprietary database structures.  First, the targeted vertical or segment may not be very structured regarding the type of data, so there is wide variation in database configuration and structure.  But there is also a baser underlying motivation to keep things this way:  the database structure is designed to protect the application’s data from easy access by third-party tools and, as a result, to make the solution “sticky” within the market segment that is captured.  That is, database structure is a way to build barriers to competition.

For incumbents that are stable, the main disadvantage to the customer lies in the use of the database as a means of tying them to the solution, a barrier to exit; at the same time, incumbents erect artificial barriers to entry around their data.  For software markets with many new entrants and much innovation, which will lead to some thinning, picking the wrong solution built on proprietary data structures can lead to real problems when attempting to transition to more stable alternatives.  For example, in the case of hosted applications, not only is the data not on the customer’s own database servers, but that data could be located far from the worksite or even geographically dispersed outside the physical control of the customer.

Open APIs, data mining and its variations, and the unstructured, non-relational databases that the shamans of Big Data prescribe have served, at least in everyone’s mind, to minimize such proprietary concerns.  After all, it is thought, we can just crack open the data–right?  Well…not so fast.  Given enough data scientists, data analysts, and open API tools, the mainframe types can regain the status they lost with the introduction of the PC and spend months building systems that will eventually rationalize data that has been locked in proprietary prisons.  Or perhaps not.  The bigger the data, the bigger the problem.  The bigger the question, the more one must bring in those who understand the difference between correlation and causation.  In the end it comes down to the mathematics and to valid methods of determining, in real terms, the behavior of systems.

Or if you are a small or medium-sized business or organization you can just decide that the data is irretrievable, or effectively so, since the ROI is not there to make it retrievable.

Or you can avoid the inevitable and, if you do business in a highly structured market, such as project management, utilize an open standard such as the UN/CEFACT XML.  Then, when communicating with the market in choosing a COTS solution, specify that databases must, at a minimum, conform to the open standard in their design.  This provides maximum flexibility to the customer, who can then perform value analysis on competing products based on an analysis of functionality, flexibility, and sustainability.

This places the customer back into the role of owning the data.

The Water is Wide — Data Streams and Data Reservoirs

I’ll have an article that elaborates on some of the ramifications of data streams and data reservoirs on AITS.org, so stay tuned there.  In the meantime, I’ve had a lot of opportunities lately, in a practical way, to focus on data quality and approaches to data.  There is some criticism in our industry about using metaphors to describe concepts in computing.

Like any form of literature, however, there are good and bad metaphors.  Opposing them in general, I think, is contrarian posing.  Metaphors, after all, often allow us to discover insights into an otherwise opaque process, clarifying in our mind’s eye what is being observed through the process of deriving similarities to something more familiar.  Strong metaphors allow us to identify analogues among the phenomena being observed, providing a ready path to establishing a hypothesis.  Having served this purpose, we can test that hypothesis to see if the metaphor serves our purposes in contributing to understanding.

I think we have a strong set of metaphors in the case of data streams and data reservoirs.  So let’s define our terms.

Traditionally, a data stream in communications theory is a set of data packets that are submitted in sequence.  For the purposes of systems theory, a data stream is data that is submitted between two entities either sequentially in real time or on a regular periodic basis.  A data reservoir is just what it sounds like.  Streams can be diverted to feed a reservoir, which retains data for a specific purpose.  Thus, the reservoir is a repository of all data from the selected streams, and from any alternative streams, including legacy data.  The usefulness of the metaphors is found in the way in which we treat these data.
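A toy sketch of the metaphor, under the assumption that a stream is simply a sequence of periodic submissions and a reservoir is whatever we choose to retain from the streams we divert into it (the field names are made up for the example):

```python
from typing import Dict, Iterable, List

# A "stream" is an iterable of periodic submissions; a "reservoir" retains
# whatever is diverted into it. Field names are illustrative.
def monthly_submissions() -> Iterable[Dict]:
    yield {"period": "2016-11", "BCWP": 900.0, "ACWP": 950.0}
    yield {"period": "2016-12", "BCWP": 1000.0, "ACWP": 1040.0}
    yield {"period": "2017-01", "BCWP": 1100.0, "ACWP": 1250.0}

class Reservoir:
    """Repository of all data diverted from the selected streams."""
    def __init__(self) -> None:
        self.records: List[Dict] = []

    def divert(self, stream: Iterable[Dict]) -> None:
        for record in stream:
            self.records.append(record)

reservoir = Reservoir()
reservoir.divert(monthly_submissions())  # the current stream
reservoir.divert(iter([{"period": "2016-10", "BCWP": 800.0, "ACWP": 790.0}]))  # legacy data
print(len(reservoir.records), "periods retained")
```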

So, for example, data streams in practical terms in project and business management are the artifacts that represent the work being performed.  This can be data relating to planning, production, financial management and execution, earned value, scheduling, technical performance, and risk for each period of measurement.  This data, then, requires real-time analysis, inference, and distribution to decision makers.  Over time, this data provides trending and other important information that measures the inertia of the effort, providing leading and predictive indicators.
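As a simplified illustration of what that period-by-period analysis looks like, the standard earned value indices can be computed as each submission arrives.  The numbers below are invented for the example; the formulas (CPI = BCWP/ACWP, SPI = BCWP/BCWS) are the conventional ones.

```python
# Cost and schedule performance indices computed per reporting period as the
# stream arrives: CPI = BCWP / ACWP, SPI = BCWP / BCWS. Values are invented.
periods = [
    {"period": "2016-11", "BCWS": 950.0,  "BCWP": 900.0,  "ACWP": 950.0},
    {"period": "2016-12", "BCWS": 1020.0, "BCWP": 1000.0, "ACWP": 1040.0},
    {"period": "2017-01", "BCWS": 1200.0, "BCWP": 1100.0, "ACWP": 1250.0},
]

for p in periods:
    cpi = p["BCWP"] / p["ACWP"]  # cost efficiency for the period
    spi = p["BCWP"] / p["BCWS"]  # schedule efficiency for the period
    print(f'{p["period"]}: CPI={cpi:.2f} SPI={spi:.2f}')
```

A declining CPI or SPI trend across periods is the kind of leading indicator the stream is meant to surface early.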

Efficiencies can be realized by identifying duplication in data streams, especially if the data being provided into the streams is derived from a common dataset.  Streams can be modified to expand the data that is submitted so as to eliminate alternative streams that add little value on their own, that is, streams that are stovepiped and suboptimized contrary to the maximum efficiency of the system.
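A sketch of how duplication across streams might be spotted, assuming records derived from a common dataset share a natural key (here period plus WBS code, both illustrative):

```python
from collections import defaultdict

# Two hypothetical streams derived from the same underlying dataset.
stream_a = [{"period": "2017-01", "wbs": "1.1.1", "BCWP": 1100.0}]
stream_b = [{"period": "2017-01", "wbs": "1.1.1", "BCWP": 1100.0},
            {"period": "2017-01", "wbs": "1.1.2", "BCWP": 400.0}]

# Record which streams reported each (period, wbs) key.
seen = defaultdict(list)
for name, stream in (("A", stream_a), ("B", stream_b)):
    for record in stream:
        seen[(record["period"], record["wbs"])].append(name)

duplicates = {key: sources for key, sources in seen.items() if len(sources) > 1}
print(duplicates)  # {('2017-01', '1.1.1'): ['A', 'B']}
```

Keys that show up in more than one stream are candidates either for consolidation into a single expanded stream or for elimination of the redundant source.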

In the case of data reservoirs, what these contain is somewhat different from the large repositories of metadata that must be mined.  On the contrary, a data reservoir contains a finite set of data, since what is contained in the reservoir is derived from the streams.  As such, these reservoirs contain much of the essential historical information needed to derive parametrics, and sufficient data from which to derive organizational knowledge and lessons learned.  Rather than processing data in real time, the handling of data reservoirs is done to append to the historical record of existing efforts, providing a fuller picture of performance and trending, and of closed-out efforts that can inform systems approaches to similar future efforts.  While not quite fitting into the category of Big Data, such reservoirs can probably best be classified as Small Big Data.

Efficiencies from the streams into the reservoir can be realized if the data can be further definitized through the application of structured schemas, combined with flexible Data Exchange Instructions (DEIs) that standardize the lexicon, allowing for both data normalization and rationalization.  Still, there may be data that is not incorporated into such schemas, especially if the legacy metadata predates the schema specified for the applicable data streams.  In this case, data rationalization must be undertaken, combined with standard APIs, to provide consistency and structure to the data.  Even then, given the finite data set, and since the data is specific to a system that uses a fairly standard lexicon, such rationalization will yield valid results.
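A minimal sketch of the rationalization step, assuming a simple renaming table stands in for the lexicon portion of a DEI (the legacy field names and the mapping are hypothetical):

```python
# Map hypothetical legacy field names onto a standard lexicon before the
# record enters the reservoir; unknown fields pass through untouched.
LEXICON_MAP = {
    "budgeted_cost_work_scheduled": "BCWS",
    "budgeted_cost_work_performed": "BCWP",
    "actual_cost_work_performed": "ACWP",
    "rpt_period": "period",
}

def rationalize(legacy_record: dict) -> dict:
    """Rename known legacy fields to the standard lexicon."""
    return {LEXICON_MAP.get(key, key): value for key, value in legacy_record.items()}

legacy = {"rpt_period": "2015-06",
          "budgeted_cost_work_performed": 750.0,
          "actual_cost_work_performed": 800.0}
print(rationalize(legacy))
# {'period': '2015-06', 'BCWP': 750.0, 'ACWP': 800.0}
```

In practice the mapping is driven by the published schema and DEI rather than a hand-built dictionary, but the principle is the same: a finite, fairly standard lexicon makes the translation tractable.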

Needless to say, applications that are agnostic to data and that provide on-the-fly flexibility in UI configuration by calling standard operating environment objects–also known as fourth-generation software–have the greatest applicability to this new data paradigm.  This is because they most effectively leverage both the flexibility in the evolution of the data streams toward maximum efficiency, and the lessons learned from integrating data that was previously walled off from complementary data, an integration that will identify and clarify systems interdependencies.

 

Over at AITS.org Dave Gordon takes me to task on data normalization — and I respond with Data Neutrality

Dave Gordon at AITS.org takes me to task on my post recommending the use of common schemas for certain project management data.  Dave’s alternative is to specify common APIs instead.  I am not one to dismiss alternative methods of reconciling disparate and, in their natural state, non-normalized data to find the most elegant solution.  My initial impression, though, is: been there, done that.

Regardless of the method used to derive significance from disparate sources of data of a common type, one still must obtain the cooperation of the players involved.  The ANSI X12 standard has been in use in the transportation industry for quite some time and has worked quite well, leaving the choice of proprietary solution up to the individual shippers.  The rule has been, however, that if you are going to write solutions for that industry, you need to allow the shipping information needed by any receiver to conform to a particular format so that it can be read regardless of the software involved.

Recently the U.S. Department of Defense, which had used certain ANSI X12 formats for particular data for quite some time, has published and required a new set of schemas for a broader set of data under the rubric of the UN/CEFACT XML.  Thus, it has established the same approach as the transportation industry: taking an agnostic stance regarding software preferences while specifying that submitted data must conform to a common schema, so that one proprietary file type is not given preference over another.

A little background is useful.  In developing major systems, contractors are required to provide project performance data in order to ensure that public funds are being expended properly for the contracted effort.  This is the oversight responsibility portion of the equation.  The other side concerns project and program management.  Given the cost-plus contract type most often used, the government program management office, in cooperation with its commercial counterpart, looks to identify the manifestation of cost, schedule, and/or technical risk early enough to allow that risk to be handled as necessary.  Also at the end of this process, which is only now being explored, is the usefulness of years of historical data across contract types, technologies, and suppliers that can be used to benefit the public interest: demonstrating which contractors perform better, showing the inherent risk associated with particular technologies through parametric methods, and a host of other insights that can be derived through econometric project management trending and modeling.

So let’s assume that we specify APIs for requesting the data in lieu of specifying that the customer receive an application-agnostic file that can be read by any application conforming to the data standard.  What is the difference?  My immediate observation is that it reverses the relationship of who owns the data.  In the case of the API, the proprietary application becomes the gatekeeper.  In the case of an agnostic file structure, the data is open to everyone and the consumer owns it.
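The difference shows up even in a trivial sketch of the two access patterns; the file path and vendor endpoint below are hypothetical, and the file format (XML or JSON) matters less here than who sits between the consumer and the data.

```python
import json
import urllib.request

def read_agnostic_file(path: str) -> dict:
    """The consumer holds the file; any conforming tool can read it directly."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)

def read_via_vendor_api(endpoint: str, api_key: str) -> dict:
    """The vendor's service sits between the consumer and the data: access,
    rate limits, and the shape of the response are the vendor's decision."""
    request = urllib.request.Request(
        endpoint, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

In the first case the standard governs the file and the consumer keeps it; in the second, continued access depends on the goodwill, pricing, and longevity of the application vendor.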

In the API scenario, large players can do what they want to limit competition and extensions to their functionality.  Since they can black-box the manner in which the data is structured, it also becomes increasingly difficult to make qualitative selections from the data.  The very example that Dave uses–the plethora of one-off mobile apps–shows this: each usually exists only within its own ecosystem.

So it seems to me that the real issue isn’t that Big Brother wants to control data structure.  What it comes down to is that specifying an open data structure prevents one solution provider, or a group of them, from controlling the market through restrictions on access to data.  This encourages maximum competition and innovation in the marketplace–Data Neutrality.

I look forward to additional information from Dave on this issue.  None of the methods of achieving Data Neutrality is an end in itself.  Any method that is less structured and provides more flexibility is welcome.  I’m just not sure that we’re there yet with APIs.