The last couple of weeks have been fairly intense workwise and so blogging has lagged a bit. Along the way the matter of databases came up at a customer site: what constitutes open data and what constitutes proprietary data. The reason why this issue matters to customers rests on several foundations.
First, in any particular industry or niche there is a wide variety of specialized apps that have blossomed. This is largely due to Moore’s Law. Looking at the number of hosted and web apps alone can be quite overwhelming, particularly given the opaqueness of what one is buying at any particular time when it comes to software technology.
Second, given this explosion, it goes without saying that the market will apply its scythe ruthlessly in thinning it. Despite the ranting of ideologues, this thinning applies to both good and bad ideas, both sound and unsound businesses equally. The few that remain are lucky, good, or good and lucky. Oftentimes it is being first to market on an important market discriminator, regardless of the quality in its initial state, that determines winners.
Third, most of these technology solutions will run their software on proprietary database structures. This undermines the concept that the customer owns the data.
The reasons why software solutions providers do this are multifaceted. For example, the database structure may be designed to enhance the functionality and responsiveness of the application, so that the structure works optimally with the application’s logic.
But there are also more prosaic reasons for proprietary database structures. First, the targeted vertical or segment may not be very structured regarding the type of data, so there is wide variation in database configuration and structure. But there is also a more base underlying motivation to keep things this way: the database structure is designed to protect the application’s data from easy access by third-party tools and, as a result, to make the solution “sticky” within the market segment that is captured. That is, database structure is a way to build barriers to competition.
For incumbents that are stable, the main disadvantage to the customer lies in the use of the database as a means of tying them to the solution as a barrier to exit. At the same time incumbents erect artificial barriers to data entry. For software markets with a great deal of new entrants and innovation that will lead to some thinning, picking the wrong solution using proprietary data structures can lead to real problems when attempting to transition to more stable alternatives. For example, in the case of hosted applications, not only is the data not on the customer’s own database servers, but that data could be located far from the worksite or even geographically dispersed, outside of the physical control of the customer.
Open APIs, used in data mining and its variations as the Shamans of Big Data prescribe unstructured and non-relational databases, have served to minimize such proprietary concerns, at least in everyone’s mind. After all, it is thought, we can just crack open the data, right? Well… not so fast. Given a number of data scientists, data analysts, and open API object tools, mainframe types can regain the status they lost with the introduction of the PC and spend months building systems that will eventually rationalize data that has been locked in proprietary prisons. Or perhaps not. The bigger the data, the bigger the problem. The bigger the question, the more one must bring in those who understand the difference between correlation and causation. In the end it comes down to the mathematics and valid methods of determining in real terms the behavior of systems.
Or if you are a small or medium-sized business or organization you can just decide that the data is irretrievable, or effectively so, since the ROI is not there to make it retrievable.
Or you can avoid the inevitable and, if you do business in a highly structured market, such as project management, utilize some open standard such as the UN/CEFACT XML. Then, when choosing a COTS solution in communicating with the market, determine that databases must, at a minimum, conform to the open standard in database design. This provides maximum flexibility to the customer, who can then perform value analysis on competing products, based on an analysis of functionality, flexibility, and sustainability.
This places the customer back into the role of owning the data.
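To make the idea of conforming to an open format a little more concrete, here is a minimal sketch in Python of dumping a database table into vendor-neutral XML that any third-party tool could read. Everything named here is an assumption for illustration only: the `tasks` table, its columns, and the `Records`/`Record` element names are hypothetical placeholders and are not taken from the actual UN/CEFACT schema. A real export would map each column to the element names the chosen standard defines.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_rows_to_xml(conn, table, root_tag="Records", row_tag="Record"):
    """Dump every row of `table` to an XML string.

    The element names are hypothetical placeholders, NOT the real
    UN/CEFACT schema; column names are reused as child element names.
    """
    # Table names cannot be bound as SQL parameters, so reject anything
    # that is not a plain identifier before interpolating it.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")

    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]

    root = ET.Element(root_tag)
    for row in cursor:
        record = ET.SubElement(root, row_tag)
        for name, value in zip(columns, row):
            child = ET.SubElement(record, name)
            # NULL values become empty elements.
            child.text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


def demo():
    """Build a tiny in-memory example database and export it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tasks (id INTEGER, name TEXT, budget REAL)")
    conn.execute("INSERT INTO tasks VALUES (1, 'Design', 1200.5)")
    conn.execute("INSERT INTO tasks VALUES (2, 'Build', NULL)")
    return export_rows_to_xml(conn, "tasks")


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is only that once the data can be written out in a documented, tool-independent structure, moving it to a competing product becomes a mapping exercise rather than a reverse-engineering project.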