Over at AITS.org Dave Gordon takes me to task on data normalization — and I respond with Data Neutrality

Dave Gordon at AITS.org takes me to task over my post recommending common schemas for certain project management data.  Dave’s alternative is to specify common APIs instead.  I am not one to dismiss alternative methods of reconciling disparate and, in their natural state, non-normalized data; the aim is to find the most elegant solution.  My initial impression, though, is: been there, done that.

Regardless of the method used to derive significance from disparate sources of data of a common type, one still must obtain the cooperation of the players involved.  The ANSI X12 standard has been in use in the transportation industry for quite some time and has worked quite well, leaving the choice of proprietary solution up to the individual shippers.  The rule has been, however, that if you write solutions for that industry, the shipping information needed by any receiver must conform to a particular format so that it can be read regardless of the software involved.

Recently the U.S. Department of Defense, which had used certain ANSI X12 formats for particular data for quite some time, has published and required a new set of schemas for a broader set of data under the rubric of UN/CEFACT XML.  Thus, it has taken the same approach as the transportation industry: an agnostic stand regarding software preferences, while specifying that submitted data must conform to a common schema so that no proprietary file type is given preference over another.

A little background is useful.  In developing major systems, contractors are required to provide project performance data in order to ensure that public funds are being expended properly for the contracted effort.  This is the oversight portion of the equation.  The other side concerns project and program management.  Given the cost-plus contract type most often used, the government program management office, in cooperation with its commercial counterpart, looks to identify the manifestation of cost, schedule, and/or technical risk early enough to allow that risk to be handled as necessary.  Finally, at the end of this process lies something that is only now being explored: the usefulness of years of historical data across contract types, technologies, and suppliers.  That data can benefit the public interest by demonstrating which contractors perform better, by showing the inherent risk associated with particular technologies through parametric methods, and by yielding a host of insights through econometric project management trending and modeling.

So let’s assume that we can specify APIs for requesting the data in lieu of specifying that the customer receive an application-agnostic file that can be read by any application that conforms to the data standard.  What is the difference?  My immediate observation is that it reverses the relationship of who owns the data.  In the case of the API, the proprietary application becomes the gatekeeper.  In the case of an agnostic file structure, the data is open to everyone and the consumer owns it.
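To make the agnostic-file side of that comparison concrete, here is a minimal sketch in Python.  The element names (`performance_report`, `record`, `contract_id`, and so on) and the sample values are hypothetical, made up for illustration; they are not taken from the UN/CEFACT schemas or from ANSI X12.  The point is only that once the data sits in a file conforming to a published structure, any consumer can read it with standard tools, with no vendor application in between.

```python
import xml.etree.ElementTree as ET

# Hypothetical common schema: every record must carry these fields.
# These names are illustrative only, not part of any real standard.
REQUIRED_FIELDS = ("contract_id", "period", "budgeted_cost", "actual_cost")

# A sample submission, as it might arrive from any contractor's software.
SAMPLE = """<performance_report>
  <record>
    <contract_id>C-001</contract_id>
    <period>2015-01</period>
    <budgeted_cost>1000.0</budgeted_cost>
    <actual_cost>1150.0</actual_cost>
  </record>
  <record>
    <contract_id>C-002</contract_id>
    <period>2015-01</period>
    <budgeted_cost>500.0</budgeted_cost>
    <actual_cost>480.0</actual_cost>
  </record>
</performance_report>"""


def read_records(xml_text):
    """Parse a submission and reject records missing a required field."""
    root = ET.fromstring(xml_text)
    records = []
    for node in root.findall("record"):
        record = {}
        for field in REQUIRED_FIELDS:
            value = node.findtext(field)
            if value is None:
                raise ValueError(f"record is missing required field: {field}")
            record[field] = value.strip()
        records.append(record)
    return records


def cost_variance(record):
    """Budgeted cost minus actual cost; negative means an overrun."""
    return float(record["budgeted_cost"]) - float(record["actual_cost"])


if __name__ == "__main__":
    for rec in read_records(SAMPLE):
        print(rec["contract_id"], rec["period"], cost_variance(rec))
```

Nothing here depends on which application produced the file.  In the API scenario the equivalent of `read_records` would be a call into the vendor's service, and the vendor would decide what comes back and in what shape.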

In the API scenario large players can do what they want to limit competition and extensions to their functionality.  Since they can black box the manner in which data is structured, it also becomes increasingly difficult to make qualitative selections from the data.  The very example that Dave uses, the plethora of one-off mobile apps, illustrates the point: such apps usually can exist only in their own ecosystem.

So it seems to me that the real issue isn’t that Big Brother wants to control data structure.  What it comes down to is that specifying an open data structure defeats the ability of one solution provider, or a group of them, to control the market through restrictions on accessing data.  This encourages maximum competition and innovation in the marketplace: Data Neutrality.

I look forward to additional information from Dave on this issue.  None of the methods of achieving Data Neutrality is an end in itself.  Any method that is less structured and provides more flexibility is welcome.  I’m just not sure that we’re there yet with APIs.

3 thoughts on “Over at AITS.org Dave Gordon takes me to task on data normalization — and I respond with Data Neutrality

  1. I really don’t have much to add to this subject, other than to note that APIs are becoming a common engineering approach for exposing distributed data for external analysis. “Pulling” on demand, for analytics and decision support, is rapidly gaining traction in business back-office functions, CRM, and even manufacturing.

    For example: Workday is about to release the capability for a manager to identify which of their team members are “flight risks,” by comparing data from external job sites with internal data on skills, education, experience, compensation, and so on. They already have the ability to benchmark compensation against other employers; with the ability to tap into job boards to look for openings, they can identify which of their folks might be tempted, and develop a retention strategy, if warranted. And yes, this will work from a manager self-service app on an iPad, because the heavy crunching is done on some server, somewhere in the cloud.

    As Watson and other tools for natural language processing are applied to this sort of analysis, I think we’ll see some truly remarkable tools for management decision making.

