Post-Blogging NDIA Blues — The Latest News (Project Management Wonkish)

The National Defense Industrial Association’s Integrated Program Management Division (NDIA IPMD) just had its quarterly meeting here in sunny Orlando where we braved the depths of sub-60 degrees F temperatures to start out each day.

For those not in the know, these meetings are an essential coming together of policy makers, subject matter experts, and private industry practitioners regarding the practical and mundane state-of-the-practice in complex project management, particularly focused on the concerns of the the federal government and the Department of Defense.  The end result of these meetings is to publish white papers and recommendations regarding practice to support continuous process improvement and the practical application of project management practices–allowing for a cross-pollination of commercial and government lessons learned.  This is also the intersection where innovation among the large and small are given an equal vetting and an opportunity to introduce new concepts and solutions.  This is an idealized description, of course, and most of the petty personality conflicts, competition, and self-interest that plagues any group of individuals coming together under a common set of interests also plays out here.  But generally the days are long and the workshops generally produce good products that become the de facto standard of practice in the industry. Furthermore the control that keeps the more ruthless personalities in check is the fact that, while it is a large market, the complex project management community tends to be a relatively small one, which reinforces professionalism.

The “blues” in this case is not so much borne of frustration or disappointment but, instead, from the long and intense days that the sessions offer.  The biggest news from an IT project management and application perspective was twofold. The data stream used by the industry in sharing data in an open systems manner will be simplified.  The other was the announcement that the technology used to communicate will move from XML to JSON.

Human readable formatting to Data-focused formatting.  Under Kendall’s Better Buying Power 3.0 the goal of the Department of Defense (DoD) has been to incorporate better practices from private industry where they can be applied.  I don’t see initiatives for greater efficiency and reduction of duplication going away in the new Administration, regardless of what a new initiative is called.

In case this is news to you, the federal government buys a lot of materials and end items–billions of dollars worth.  Accountability must be put in place to ensure that the money is properly spent to acquire the things being purchased.  Where technology is pushed and where there are no commercial equivalents that can be bought off the shelf, as in the systems purchased by the Department of Defense, there are measures of progress and performance (given that the contract is under a specification) that are submitted to the oversight agency in DoD.  This is a lot of data and to be brutally frank the method and format of delivery has been somewhat chaotic, inefficient, and duplicative.  The Department moved to address this by a somewhat modest requirement of open systems submission of an application-neutral XML file under the standards established by the UN/CEFACT XML organization.  This was called the Integrated Program Management Report (IMPR).  This move garnered some improvement where it has been applied, but contracts are long-term, so incorporating improvements though new contractual requirements tends to take time.  Plus, there is always resistance to change.  The Department is moving to accelerate addressing these inefficiencies in their data streams by eliminating the unnecessary overhead associated with specifications of formatting data for paper forms and dealing with data as, well, data.  Great idea and bravo!  The rub here is that in making the change, the Department has proposed dropping XML as the technology used to transfer data and move to JSON.

XML to JSON. Before I spark another techie argument about the relative merits of each, there are some basics to understand here.  First, XML is a language, JSON is simply data exchange format.  This means that XML is specifically designed to deal with hierarchical and structured data that can be queried and where validation and fidelity checks within the data are inherent in the technology. Furthermore, XML is known to scale while maintaining the integrity of the data, which is intended for use in relational databases.  Furthermore, XML is hard to break.  It is meant for editing and will maintain its structure and integrity afterward.

The counter argument encountered is that JSON is new! and uses fewer characters! (which usually turns out to be inconsequential), and people are talking about it for Big Data and NoSQL! (but this happened after the fact and the reason for shoehorning it this way is discussed below).

So does it matter?  Yes and no.  As a supplier specializing in delivering solutions that normalize and rationalize data across proprietary file structures and leverage database capabilities, I don’t care.  I can adapt quickly and will have a proof-of-concept solution out within 30 days of receiving the schema.

The risk here, which applies to DoD and the industry, is that the decision to go to JSON is made only because it is the shiny new thing used by gamers and social networking developers.  There has also been a move to adapt to other uses because of the history of significant security risks that had been found in Java, so much so that an entire Wikipedia page is devoted to them.  Oracle just killed off Java applets, though Java hangs on.  JSON, of course, isn’t Java, but it was designed from birth as JavaScript Object Notation (hence the acronym JSON), with the purpose of handling relatively small bits of data across web servers in a number of proprietary settings.

To address JSON deficiencies relative to XML, a number of tools have been and are being developed to replicate the fidelity and reliability found in XML.  Whether this is sufficient to be effective against a structured LANGUAGE is to be seen.  Much of the overhead that technies complain about in XML is due to the native functionality related to the power it brings to the table.  No doubt, a bicycle is simpler than a Formula One racer–and this is an apt comparison.  Claiming “simpler” doesn’t pass the “So What?” test knowing the business processes involved.  The technology needs to be fit to the solution.  The purpose of data transmission using APIs is not only to make it easy to produce but for it to–you know–achieve the goals of normalization and rationalization so that it can be used on the receiving end which is where the consumer (which we usually consider to be the customer) sits.

At the end of the day the ability to scale and handle hierarchical, structured data will rely on the quality and strength of the schema and the tools that are published to enforce its fidelity and compliance.  Otherwise consuming organizations will be receiving a dozen different proprietary JSON files, and that does not address the present chaos but simply adds to it.  These issues were aired out during the meeting and it seems that everyone is aware of the risks and that they can be addressed.  Furthermore, as the schema is socialized across solutions providers, it will be apparent early if the technology will be able handle the project performance data resulting from the development of a high performance aircraft or a U.S. Navy destroyer.

Takin’ Care of Business — Information Economics in Project Management

Neoclassical economics abhors inefficiency, and yet inefficiencies exist.  Among the core issues that create inefficiencies is the asymmetrical nature of information.  Asymmetry is an accepted cornerstone of economics that leads to inefficiency.  We can see in our daily lives and employment the effects of one party in a transaction having more information than the other:  knowing whether the used car you are buying is a lemon, measuring risk in the purchase of an investment and, apropos to this post, identifying how our information systems allow us to manage complex projects.

Regarding this last proposition we can peel this onion down through its various levels: the asymmetry in the information between the customer and the supplier, the asymmetry in information between the board and stockholders, the asymmetry in information between management and labor, the asymmetry in information between individual SMEs and the project team, etc.–it’s elephants all the way down.

This asymmetry, which drives inefficiency, is exacerbated in markets that are dominated by monopoly, monopsony, and oligopoly power.  When informed by the work of Hart and Holmström regarding contract theory, which recently garnered the Nobel in economics, we have a basis for understanding the internal dynamics of projects in seeking efficiency and productivity.  What is interesting about contract theory is that it incorporates the concept of asymmetrical information (labeled as adverse selection), but expands this concept in human transactions at the microeconomic level to include considerations of moral hazard and the utility of signalling.

The state of asymmetry and inefficiency is exacerbated by the patchwork quilt of “tools”–software applications that are designed to address only a very restricted portion of the total contract and project management system–that are currently deployed as the state of the art.  These tend to require the insertion of a new class of SME to manage data by essentially reversing the efficiencies in automation, involving direct effort to reconcile differences in data from differing tools. This is a sub-optimized system.  It discourages optimization of information across the project, reinforces asymmetry, and is economically and practically unsustainable.

The key in all of this is ensuring that sub-optimal behavior is discouraged, and that those activities and behaviors that are supportive of more transparent sharing of information and, therefore, contribute to greater efficiency and productivity are rewarded.  It should be noted that more transparent organizations tend to be more sustainable, healthier, and with a higher degree of employee commitment.

The path forward where there is monopsony power, where there is a dominant buyer, is to impose the conditions for normative behavior that would otherwise be leveraged through practice in a more open market.  For open markets not dominated by one player as either supplier or seller, instituting practices that reward behavior that reduces the effects of asymmetrical information, and contracting disincentives in business transactions on the open market is the key.

In the information management market as a whole the trends that are working against asymmetry and inefficiency involve the reduction of data streams, the construction of cross-domain data repositories (or reservoirs) that allow for the satisfaction of multiple business stakeholders, and the introduction of systems that are more open and adaptable to the needs of the project system in lieu of a limited portion of the project team.  These solutions exist, yet their adoption is hindered because of the long-term infrastructure that is put in place in complex project management.  This infrastructure is supported by incumbents that are reinforcing to the status quo.  Because of this, from the time a market innovation is introduced to the time that it is adopted in project-focused organizations usually involves the expenditure of several years.

This argues for establishing an environment that is more nimble.  This involves the adoption of a series of approaches to achieve the goals of broader information symmetry and efficiency in the project organization.  These are:

a. Instituting contractual relationships, both internally and externally, that encourage project personnel to identify risk.  This would include incentives to kill efforts that have breached their framing assumptions, or to consolidate progress that the project has achieved to date–sending it as it is to production–while killing further effort that would breach framing assumptions.

b. Institute policy and incentives on the data supply end to reduce the number of data streams.  Toward this end both acquisition and contracting practices should move to discourage proprietary data dead ends by encouraging normalized and rationalized data schemas that describe the environment using a common or, at least, compatible lexicon.  This reduces the inefficiency derived from opaqueness as it relates to software and data.

c.  Institute policy and incentives on the data consumer end to leverage the economies derived from the increased computing power from Moore’s Law by scaling data to construct interrelated datasets across multiple domains that will provide a more cohesive and expansive view of project performance.  This involves the warehousing of data into a common repository or reduced set of repositories.  The goal is to satisfy multiple project stakeholders from multiple domains using as few streams as necessary and encourage KDD (Knowledge Discovery in Databases).  This reduces the inefficiency derived from data opaqueness, but also from the traditional line-and-staff organization that has tended to stovepipe expertise and information.

d.  Institute acquisition and market incentives that encourage software manufacturers to engage in positive signalling behavior that reduces the opaqueness of the solutions being offered to the marketplace.

In summary, the current state of project data is one that is characterized by “best-of-breed” patchwork quilt solutions that tend to increase direct labor, reduces and limits productivity, and drives up cost.  At the end of the day the ability of the project to handle risk and adapt to technical challenges rests on the reliability and efficiency of its information systems.  A patchwork system fails to meet the needs of the organization as a whole and at the end of the day is not “takin’ care of business.”

Big Time — Elements of Data Size in Scaling

I’ve run into additional questions about scalability.  It is significant to understand the concept in terms of assessing software against data size, since there are actually various aspect of approaching the issue.

Unlike situations where data is already sorted and structured as part of the core functionality of the software service being provided, this is in dealing in an environment where there are many third-party software “tools” that put data into proprietary silos.  These act as barriers to optimizing data use and gaining corporate intelligence.  The goal here is to apply in real terms the concept that the customers generating the data (or stakeholders who pay for the data) own the data and should have full use of it across domains.  In project management and corporate governance this is an essential capability.

For run-of-the-mill software “tools” that are focused on solving one problem, this often is interpreted as just selling a lot more licenses to a big organization.  “Sure we can scale!” is code for “Sure, I’ll sell you more licenses!”  They can reasonably make this assertion, particularly in client-server or web environments, where they can point to the ability of the database system on which they store data to scale.  This also comes with, usually unstated, the additional constraint that their solution rests on a proprietary database structure.  Such responses, through, are a form of sidestepping the question, nor is it the question being asked.  Thus, it is important for those acquiring the right software to understand the subtleties.

A review of what makes data big in the first place is in order.  The basic definition, which I outlined previously, came from NASA in describing data that could not be held in local memory or local storage.  Hardware capability, however, continues to grow exponentially, so that what is big data today is not big data tomorrow.  But in handling big data, it then becomes incumbent on software publishers to drive performance to allow their customers to take advantage of the knowledge contained in these larger data sets.

The elements that determine the size of data are:

a.  Table size

b.  Row or Record size

c.  Field size

d.  Rows per table

e.  Columns per table

f.  Indexes per table

Note the interrelationships of these elements in determining size.  Thus, recently I was asked how many records are being used on the largest tables being accessed by a piece of software.  That is fine as shorthand, but the other elements add to the size of the data that is being accessed.  Thus, a set of data of say 800K records may be just as “big” as one containing 2M records because of the overall table size of fields, and the numbers of columns and indices, as well as records.  Furthermore, this question didn’t take into account the entire breadth of data across all tables.

Understanding the definition of data size then leads us to understanding the nature of software scaling.  There are two aspects to this.

The first is the software’s ability to presort the data against the database in such as way as to ensure that latency–the delay in performance when the data is loaded–is minimized.  The principles applied here go back to database management practices back in the day when organizations used to hire teams of data scientists to rationalize data that was processed in machine language–especially when it used to be stored in ASCII or, for those who want to really date themselves, EBCDIC, which were incoherent by today’s generation of more human-readable friendly formats.

Quite simply, the basic steps applied has been to identify the syntax, translate it, find its equivalents, and then sort that data into logical categories that leverage database pointers.  What you don’t want the software to do is what used to be done during the earliest days of dealing with data, which was smaller by today’s standards, of serially querying ever data element in order to fetch only what the user is calling.  Furthermore, it also doesn’t make much sense to deal with all data as a Repository of Babel to apply labor-intensive data mining in non-relational databases, especially in cases where the data is well understood and fairly well structured, even if in a proprietary structure.  If we do business in a vertical where industry standards in data apply, as in the use of the UN/CEFACT XML convention, then much of the presorting has been done for us.  In addition, more powerful industry APIs (like OLE DB and ODBC) that utilize middleware (web services, XML, SOAP, MapReduce, etc.) multiply the presorting capabilities of software, providing significant performance improvements in accessing big data.

The other aspect is in the software’s ability to understand limitations in data communications hardware systems.  This is a real problem, because the backbone in corporate communication systems, especially to ensure security, is still largely done over a wire.  The investments in these backbones is usually categorized as a capital investment, and so upgrades to the system are slow.  Furthermore, oftentimes backbone systems are embedded in physical plant building structures.  So any software performance is limited by the resistance and bandwidth of wiring.  Thus, we live in a world where hardware storage and processing is doubling every 12-18 months, and software is designed to better leverage such expansion, but the wires over which data communication depends remains stuck in the past–constrained by the basic physics of CAT 6 or Fiber Optic cabling.

Needless to say, software manufacturers who rely on constant communications with the database will see significantly degraded performance.  Some software publishers who still rely on this model use the “check out” system, treating data like a lending library, where only one user or limited users can access the same data.  This, of course, reduces customer flexibility.  Strategies that are more discrete in handling data are the needed response here until day-to-day software communications can reap the benefits of physical advancements in this category of hardware.  Furthermore, organizations must understand that the big Cloud in the sky is not the answer, since it is constrained by the same physics as the rest of the universe–with greater security risks.

All of this leads me to a discussion I had with a colleague recently.  He opened his iPhone and did a query in iTunes for an album.  In about a second or so his query selected the artist and gave a list of albums–all done without a wire connection.  “Why can’t we have this in our industry?” he asked.  Why indeed?  Well, first off, Apple iTunes has sorted its data to optimize performance with its app, and it is performed within a standard data stream and optimized for the Apple iOS and hardware.  Secondly, the variables of possible queries in iTunes are predefined and tied to a limited and internally well-defined set of data.  Thus, the data and application challenges are not equivalent as found in my friend’s industry vertical.  For example, aside from the challenges of third party data normalization and rationalization, iTunes is not dealing with dynamic time-phased or trending data that requires multiple user updates to reflect changes using predictive analytics which is then served to different classes of users in a highly secure environment.  Finally, and most significantly, Apple spent more on that system than the annual budget of my friend’s organization.  In the end his question was a good one, but in discussing this it was apparent that just saying “give me this” is a form of magical thinking and hand waving.  The devil is in the details, though I am confident that we will eventually get to an equivalent capability.

At the end of the day IT project management strategy must take into account the specific needs of classes of users in making determinations of scaling.  What this means is a segmented approach: thick-client users, compartmentalized local installs with subsets of data, thin-client, and web/mobile or terminal services equivalents.  The practical solution is still an engineered one that breaks the elephant into digestible pieces that lean forward to leveraging advances in hardware, database, and software operating environments.  These are the essential building blocks to data optimization and scaling.

I Can See Clearly Now — Knowledge Discovery in Databases, Data Scalability, and Data Relevance

I recently returned from a travel and much of the discussion revolved around the issues of scalability and the use of data.  What is clear is that the conversation at the project manager level is shifting from a long-running focus on reports and metrics to one focused on data and what can be learned from it.  As with any technology, information technology exploits what is presented before it.  Most recently, accelerated improvements in hardware and communications technology has allowed us to begin to collect and use ever larger sets of data.

The phrase “actionable” has been thrown around quite a bit in marketing materials, but what does this term really mean?  Can data be actionable?  No.  Can intelligence derived from that data be actionable?  Yes.  But is all data that is transformed into intelligence actionable?  No.  Does it need to be?  No.

There are also kinds and levels of intelligence, particularly as it relates to organizations and business enterprises.  Here is a short list:

a. Competitive intelligence.  This is intelligence derived from data that informs decision makers about how their organization fits into the external environment, further informing the development of strategic direction.

b. Business intelligence.  This is intelligence derived from data that informs decision makers about the internal effectiveness of their organization both in the past and into the future.

c. Business analytics.  The transformation of historical and trending enterprise data used to provide insight into future performance.  This includes identifying any underlying drivers of performance, and any emerging trends that will manifest into risk.  The purpose is to provide sufficient early warning to allow risk to be handled before it fully manifests, therefore keeping the effort being measured consistent with the goals of the organization.

Note, especially among those of you who may have a military background, that what I’ve outlined is a hierarchy of information and intelligence that addresses each level of an organization’s operations:  strategic, operational, and tactical.  For many decision makers, translating tactical level intelligence into strategic positioning through the operational layer presents the greatest challenge.  The reason for this is that, historically, there often has been a break in the continuity between data collected at the tactical level and that being used at the strategic level.

The culprit is the operational layer, which has always been problematic for organizations and those individuals who find themselves there.  We see this difficulty reflected in the attrition rate at this level.  Some individuals cannot successfully make this transition in thinking. For example, in the U.S. Army command structure when advancing from the battalion to the brigade level, in the U.S. Navy command structure when advancing from Department Head/Staff/sea command to organizational or fleet command (depending on line or staff corps), and in business for those just below the C level.

Another way to look at this is through the traditional hierarchical pyramid, in which data represents the wider floor upon which each subsequent, and slightly reduced, level is built.  In the past (and to a certain extent this condition still exists in many places today) each level has constructed its own data stream, with the break most often coming at the operational level.  This discontinuity is then reflected in the inconsistency between bottom-up and top-down decision making.

Information technology is influencing and changing this dynamic by addressing the main reason for the discontinuity existing–limitations in data and intelligence capabilities.  These limitations also established a mindset that relied on limited, summarized, and human-readable reporting that often was “scrubbed” (especially at the operational level) as it made its way to the senior decision maker.  Since data streams were discontinuous, there were different versions of reality.  When aspects of the human equation are added, such as selection bias, the intelligence will not match what the data would otherwise indicate.

As I’ve written about previously in this blog, the application of Moore’s Law in physical computing performance and storage has pushed software to greater needs in scaling in dealing with ever increasing datasets.  What is defined as big data today will not be big data tomorrow.

Organizations, in reaction to this condition, have in many cases tended to simply look at all of the data they collect and throw it together into one giant pool.  Not fully understanding what the data may say, a number of ad hoc approaches have been taken.  In some cases this has caused old labor-intensive data mining and rationalization efforts to once again rise from the ashes to which they were rightly consigned in the past.  On the opposite end, this has caused a reliance on pre-defined data queries or hard-coded software solutions, oftentimes based on what had been provided using human-readable reporting.  Both approaches are self-limiting and, to a large extent, self-defeating.  In the first case because the effort and time to construct the system will outlive the needs of the organization for intelligence, and in the second case, because no value (or additional insight) is added to the process.

When dealing with large, disparate sources of data, value is derived through that additional knowledge discovered through the proper use of the data.  This is the basis of the concept of what is known as KDD.  Given that organizations know the source and type of data that is being collected, it is not necessary to reinvent the wheel in approaching data as if it is a repository of Babel.  No doubt the euphemisms, semantics, and lexicon used by software publishers differs, but quite often, especially where data underlies a profession or a business discipline, these elements can be rationalized and/or normalized given that the appropriate business cross-domain knowledge is possessed by those doing the rationalization or normalization.

This leads to identifying the characteristics* of data that is necessary to achieve a continuity from the tactical to the strategic level that will achieve some additional necessary qualitative traits such as fidelity, credibility, consistency, and accuracy.  These are:

  1. Tangible.  Data must exist and the elements of data should record something that correspondingly exists.
  2. Measurable.  What exists in data must be something that is in a form that can be recorded and is measurable.
  3. Sufficient.  Data must be sufficient to derive significance.  This includes not only depth in data but also, especially in the case of marking trends, across time-phasing.
  4. Significant.  Data must be able, once processed, to contribute tangible information to the user.  This goes beyond statistical significance noted in the prior characteristic, in that the intelligence must actually contribute to some understanding of the system.
  5. Timely.  Data must be timely so that it is being delivered within its useful life.  The source of the data must also be consistently provided over consistent periodicity.
  6. Relevant.  Data must be relevant to the needs of the organization at each level.  This not only is a measure to test what is being measured, but also will identify what should be but is not being measured.
  7. Reliable.  The sources of the data be reliable, contributing to adherence to the traits already listed.

This is the shorthand that I currently use in assessing a data requirements and the list is not intended to be exhaustive.  But it points to two further considerations when delivering a solution.

First, at what point does the person cease to be the computer?  Business analytics–the tactical level of enterprise data optimization–oftentimes are stuck in providing users with a choice of chart or graph to use in representing such data.  And as noted by many writers, such as this one, no doubt the proper manner of representing data will influence its interpretation.  But in this case the person is still the computer after the brute force computing is completed digitally.  There is a need for more effective significance-testing and modeling of data, with built-in controls for selection bias.

Second, how should data be summarized to the operational and strategic levels so that “signatures” can be identified that inform information?  Furthermore, it is important to understand what kind of data must supplement the tactical level data at those other levels.  Thus, data streams are not only minimized to eliminate redundancy, but also properly aligned to the level of data intelligence.

*Note that there are other aspects of data characteristics noted by other sources here, here, and here.  Most of these concern themselves with data quality and what I would consider to be baseline data traits, which need to be separately assessed and tested, as opposed to antecedent characteristics.

 

Living in the Material World — A Call for Open Data from Materials Science

Since advocating for transparent and open data schemas and data repositories, I’ve run into other high tech professionals who have opined that such data requirements are only applicable to my core competency in the U.S. Aerospace & Defense vertical.  Now, in the 26 November on-line edition of the journal Science, comes an opinion piece from a number of Chinese corrosion scientists under the title “Materials science: Share corrosion data” have advocated the development of open data infrastructures to make the age and life of existing metallurgic infrastructure easily accessible to technology in a non-proprietary format.

As the article states, “corrosion costs six cents for every dollar of gross domestic product in the United States. Globally, that amounts to more than US$4 trillion a year — equivalent to damages from 40 Hurricane Katrinas. Half of that cost is in corrosion prevention and control, the other half in damages and lost productivity.”

In the software technology industry, the recent issues regarding the Trans-Pacific Partnership (TPP) has revealed strains in U.S.-Chinese relations on intellectual property and proprietary systems.  So the danger is always that the source of the advocacy will be ignored, despite the clear economic argument in favor of what the editorial proposes.

But the concern is transnational in nature.  As the piece points out, the U.S. Department of Energy has initiated its own program in alignment with the U.S. National Institute of Standards and Technology (NIST) Materials Genome Initiative (MGI) to establish open repositories of data in support of the alternative energy industry.  Given the aging U.S. infrastructure and the dangers from sudden failure from these engineering systems, it seems wise to track and share data for materials corrosion and lifespans.  Think of the sudden bridge failures that have hit headlines over the last several years.

Since the cost of not doing so can be measured in lives as well as economic cost, such data normalization and rationalization in this area takes on the role of being an essential element of governance.

Stay Open — Open and Proprietary Databases (and Why It Matters)

The last couple of weeks have been fairly intense workwise and so blogging has lagged a bit.  Along the way the matter of databases came up at a customer site and what constitutes open data and what comprises proprietary data.  The reason why this issue matters to customers rests of several foundations.

First, in any particular industry or niche there is a wide variety of specialized apps that have blossomed.  This is largely due to Moore’s Law.  Looking at the number of hosted and web apps alone can be quite overwhelming, particularly given the opaqueness of what one is buying at any particular time when it comes to software technology.

Second, given this explosion, it goes without saying that the market will apply its scythe ruthlessly in thinning it.  Despite the ranting of ideologues, this thinning applies to both good and bad ideas, both sound and unsound businesses equally.  The few that remain are lucky, good, or good and lucky.  Oftentimes it is being first to market on an important market discriminator, regardless of the quality in its initial state, that determines winners.

Third, most of these technology solutions will run their software on proprietary database structures.  This undermines the concept that the customer owns the data.

The reasons why software solutions providers do this is multifaceted.  For example, the database structure is established to enhance the functionality and responsiveness of the application where the structure is leveraged to work optimally with the application’s logic.

But there are also more prosaic reasons for proprietary database structures.  First, the targeted vertical or segment may not be very structured regarding the type of data, so there is wide variation on database configuration and structure.  But there is also a more base underlying motivation to keep things this way:  the database structure is designed to protect the application’s data from easy access from third party tools and, as a result, make their solution “sticky” within the market segment that is captured.  That is, database structure is a way to build barriers to competition.

For incumbents that are stable, the main disadvantages to the customer lie in the use of the database as a means of tying them to the solution as a barrier to exit.  At the same time incumbents erect artificial barriers to data entry.  For software markets with a great deal of new entries and innovation that will lead to some thinning, picking the wrong solution using proprietary data structures can lead to real problems when attempting to transition to more stable alternatives.  For example, in the case of hosted applications not only is data not on the customer’s own database servers, but that data could be located far from the worksite or even geographically dispersed outside of the physical control of the customer.

Open APIs in using data mining and variations of it as the Shaman of Big Data prescribe unstructured and non-relational databases has served to, at least in everyone’s mind, minimize such proprietary concerns.  After all, it thought, we can just crack open the data–right?  Well…not so fast.  Given a number of data scientists, data analysts, and open API object tools mainframe types can regain the status they lost with the introduction of the PC and spend months building systems that will eventually rationalize data that has been locked in proprietary prisons.  Or perhaps not.  The bigger the data the bigger the problem.  The bigger the question the more one must bring in those who understand the difference between correlation and causation.  In the end it comes down to the mathematics and valid methods of determining in real terms the behavior of systems.

Or if you are a small or medium-sized business or organization you can just decide that the data is irretrievable, or effectively so, since the ROI is not there to make it retrievable.

Or you can avoid the inevitable and, if you do business in a highly structured market, such as project management, utilize some open standard such as the UN/CEFACT XML.  Then, when choosing a COTS solution in communicating with the market, determine that databases must, at a minimum, conform to the open standard in database design.  This provides maximum flexibility to the customer, who can then perform value analysis on competing products, based on a analysis of functionality, flexibility, and sustainability.

This places the customer back into the role of owning the data.

When You’re a Jet You’re a Jet all the Way — Software as a Change Agent for Professional Development

Earlier in the week Dave Gordon at his blog responded to my post on data normalization and rightly introduced the need for data rationalization.  I had omitted the last concept in my own post, but strongly implied that the two were closely aligned in my broad definition of normalization beyond the boundaries of eliminating redundancies.  In the end in thinking about this, I prefer Dave’s dichotomy because it more clearly defines what we are doing.

Later in the week I found myself elaborating on these issues in discussions with customers and other professionals in the project management discipline.  In the projects in which I am involved, what I have found is that the process of normalizing and rationalizing data, even historical data which, contrary to Dave’s assertion can be maintained–at least in my business–acts as a change agent in defining the agnostic characteristics what defines the type of data being normalized and rationalized.

What I mean here is that, for instance, a time-phased CPM schedule that eventually becomes an integrated master schedule has an analogue.  For years we have been told, mostly by marketing types working for software manufacturers, that there is a secret sauce that they provide that cannot be reconciled against their competitors.  As a result, entire professional organizations, conferences, white papers, and presentations have been given to prove this assertion.  When looking at the data, however, the assertion is invalid.

The key differentiator between CPM scheduling applications is the optimization engine.  That is the secret sauce and the black box where the valuable IP lies.  It is the algorithms in the optimization that identifies for us those schedule activities that are on the critical and near critical paths.  But when you run these engines side by side on the same schedule, their results are well within one standard deviation of one another.

Keep in mind that I’m talking about differences in data related to normalization and rationalization and whether these differences can be reconciled.  There are other differences in features between the applications that do make a difference in their use and functionality: whether they can lock down a baseline, manage multiple baselines, prevent future work from being planned and executed in the past (yes, this happens), handle hammocks, scale properly, etc.  Because of these functional differences the same data may have been given a different value in the table or the file.  As Dave Gordon rightly points out, reconciling what on the surface are irreconcilable values requires specialized knowledge.  Well, if you have that specialized knowledge then you can achieve what otherwise seems impossible.  Once you achieve this “impossible” feat, it quickly becomes apparent that the features and functions involved are based on a very limited number of values that are common across CPM scheduling applications.

This should not be surprising for those of you out there that have been doing this a long time.  Back in the 1980s we would use visual display boards to map out short segments of the schedule.  We would manually construct schedules in very rudimentary (by today’s standards) mainframe computers and get very long dot matrix representations of the schedule to tape to the “War Room” walls.  The resources, risks, etc. had to be drawn on the schedule.  This manual process required an understanding of CPM schedule construction similar to someone still using long division today.  There was actually a time when people had to memorize their log and square root tables.  it was not very efficient, but the deep understanding of the analogue schedule has since been lost with the introduction of new technology.  This came to mind when I saw on LinkedIn a question of the types of questions that should be asked of a master scheduler in an interview.

As a result of new technology, schedulers aligned themselves into camps based on the application they selected or was selected for them in their job.  Over time I have seen brand loyalty turn into partisanship.  Once again, this should not be surprising.  If you spent ten years of your career on a very popular scheduling application, anything that may undermine one’s investment in that choice–and which makes employment possible–will be deemed as a threat.

I first came upon this behavior years ago when I was serving as CIO for a project management organization.  Some PMOs could not share information–not even e-mail and documents–because most were using PCs and some were using Macs.  Problem was that the key PMO was using Mac.  This was before Microsoft and Apple got together and solved this for us.  Needless to say this undermined organizational effectiveness.  My attempt to get everyone on the same page in terms of operating system compatibility sparked a significant backlash.  Luckily for me Microsoft soon introduced its first solution to address this issue.  So, in the end, the “Macintites”, as we good naturedly called them, could use their Macs for business common to other parts of the organization.

This almost cultish behavior finds itself in new places today: in the iPhone and Droid wars, in the use of Agile, and among CPM scheduling application partisans.  It is true that those of us in the software industry certainly want to see brand loyalty.  It is one of the key measures of success in proving the product’s value and effectiveness.  But it need not undermine the fact that a scheduler is a key specialist in the project management discipline.  If you are a Jet, you don’t need to be a Jet all the way.

Since creating generic analogues of schedules from submitted third-party data, I have found that insights into project performance can be noted that previously were not available.  The power of digitization along with normalization and rationalization allows the data to be effectively integrated at the proper point of intersection with other dimensions of project performance such as cost performance and risk.  Freed from the shackles of having to learn the specific idiosyncrasies of particular applications, the deep understanding of scheduling is being reintroduced.  This is a long time in coming.