Money for Nothing — Project Performance Data and Efficiencies in Timeliness

I operate in a well-regulated industry focused on project management. What this means practically is that there are data streams that flow from the R&D activities, recording planning and progress, via control and analytical systems to both management and customer. The contract type in most cases is Cost Plus, with cost and schedule risk often flowing to the customer in the form of cost overruns and schedule slippages.

Among the methodologies used to determine progress and project eventual outcomes is earned value management (EVM). Of course, this is not the only type of data that flows in performance management streams, but oftentimes EVM is used as shorthand to describe all of the data captured and submitted to customers in performance management. Other planning and performance management data includes time-phased scheduling of tasks and activities, cost and schedule risk assessments, and technical performance.

Previously in my critique regarding the differences between project monitoring and project management (before Hurricane Irma created some minor rearranging of my priorities), I pointed out that “looking in the rear view mirror” was often used as an excuse for by-passing unwelcome business intelligence. I followed this up with an intro to the synergistic economics of properly integrated data. In the first case I answered the critique demonstrating that it is based on an old concept that no longer applies. In the second case I surveyed the economics of data that drives efficiencies. In both cases, new technology is key to understanding the art of the possible.

As I have visited sites in both government and private industry, I find that old ways of doing things still persist. The reasons for this are several. First, technology is developing so quickly that there is fear that one’s job will be eliminated with the introduction of technology. Second, the methodology of change agents in introducing new technology often lacks proper socialization across the various centers of power that inevitably exist in any organization. Third, the proper foundation to clearly articulate the need for change is not laid. This last is particularly important when stakeholders perform a non-rational assessment in their minds of cost-benefit. They see many downsides and cannot accept the benefits, even when they are obvious. For more on this and insight into other socioeconomic phenomena I strongly recommend Daniel Kahneman’s Thinking, Fast and Slow. There are other reasons as well, but these are the ones that are most obvious when I speak with individuals in the field.

The Past is Prologue

For now I will restrict myself to the one benefit of new technology that addresses the “looking in the rear view mirror” critique. It is important to do so because the critique is correct in application (for purposes that I will outline) if incorrect in its cause-and-effect. It is also important to focus on it because the critique is so ubiquitous.

As I indicated above, there are many sources of data in project management. They derive from the following systems (in brief):

a. The planning and scheduling applications, which measure performance through time in the form of discrete activities and events. In the most sophisticated implementations, these applications will include the assignment of resources, which requires the integration of these systems with resource management. Sometimes simple costs are also assigned and tracked through time as well.

b. The cost performance (earned value) applications, which ideally are aligned with the planning and scheduling applications, providing cross-integration with WBS and OBS structures, but focused on work accomplishment defined by the value of work completed against a baseline plan. These performance figures are tied to work accomplishment through expended effort collected by and, ideally, integrated with the financial management system. It involves the proper application of labor rates and resource expenditures in the accomplishment of the work to provide not only a statistical assessment of performance to date, but also a projection of likely cost performance outcomes at completion of the effort.

c. Risk assessment applications which, depending on their sophistication and ease of use, provide analysis of possible cost and schedule outcomes, identify the sensitivity of particular activities and tasks, provide an assessment of alternative driving and critical paths, and apply different models of baseline performance to predict future outcomes.

d. Systems engineering applications that provide an assessment of technical performance to date and the likely achievement of technical parameters within the scope of the effort.

e. The financial management applications that provide an accounting of funds allocation, cash-flow, and expenditure, including planning information regarding expenditures under contract and planned expenditures in the future.

These are the core systems of record upon which performance information is derived. There are others as well, depending on the maturity of the project such as ERP systems and MRP systems. But for purposes of this post, we will bound the discussion to these standard sources of data.

In the near past, our ability to understand the significance of the data derived from these systems required manual processing. I am not referring to the sophistication of human computers of the 1960s and before, dramatized to great effect in the uplifting movie Hidden Figures. Since we are dealing with business systems, these methodologies were based on simple business metrics and other statistical methods, including those that extended the concept of earned value management.

With the introduction of PCs in the workplace in the 1980s, desktop spreadsheet applications allowed this data to be entered, usually from printed reports. Each analyst not only used standard methods common in the discipline, but also developed their own methods to process and derive importance from the data, transforming it into information and useful intelligence.

Shortly after this development, simple analytical applications were introduced to the market that allowed for paring back the amount of data deriving from some of these systems and performing basic standard calculations, rendering redundant calculations unnecessary. Thus, for example, instead of a person having to calculate multiple estimates to complete, the application could perform those calculations as part of its functionality and deliver them to the analyst for use in, hopefully, their own more extensive assessments.
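To make that kind of calculation concrete, here is a minimal Python sketch of the standard earned value variances and indices, plus three commonly used estimates at completion and the estimates to complete that follow from them. The class and field names (`ControlAccount`, `bac`, `pv`, `ev`, `ac`) and the sample figures are my own illustrative assumptions, not the output or data model of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class ControlAccount:
    """Cumulative-to-date figures for one control account (hypothetical layout)."""
    bac: float  # budget at completion
    pv: float   # planned value: budgeted cost of work scheduled
    ev: float   # earned value: budgeted cost of work performed
    ac: float   # actual cost of work performed


def evm_summary(ca: ControlAccount) -> dict:
    """Compute variances, indices, and three common estimates at completion.

    Assumes pv, ev, and ac are all non-zero; a real tool would have to guard
    against accounts with no work scheduled or performed yet.
    """
    cpi = ca.ev / ca.ac          # cost performance index
    spi = ca.ev / ca.pv          # schedule performance index
    remaining = ca.bac - ca.ev   # budgeted value of work not yet earned

    eac = {
        # Remaining work is assumed to be performed at the budgeted rate.
        "budgeted_rate": ca.ac + remaining,
        # Remaining work continues at the cost efficiency seen so far.
        "cpi": ca.ac + remaining / cpi,
        # Remaining work is penalized by both cost and schedule efficiency.
        "cpi_x_spi": ca.ac + remaining / (cpi * spi),
    }
    return {
        "cost_variance": ca.ev - ca.ac,
        "schedule_variance": ca.ev - ca.pv,
        "cpi": cpi,
        "spi": spi,
        "eac": eac,
        # Estimate to complete is whatever each EAC implies beyond actuals.
        "etc": {name: value - ca.ac for name, value in eac.items()},
    }


if __name__ == "__main__":
    summary = evm_summary(ControlAccount(bac=1000.0, pv=500.0, ev=400.0, ac=500.0))
    for name, value in summary["eac"].items():
        print(f"EAC ({name}): {value:,.1f}")
```

The point of the sketch is only that these are mechanical calculations: once the four inputs exist in a system of record, there is no reason for each analyst to rebuild them by hand.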

But even in this case, the data flow was limited to the EVM silo. The data streams relating to schedule, risk, SE, and FM were left to their own devices, oftentimes requiring manual methods or, in the best of cases, cut-and-paste, to incorporate data from reports derived from these systems. In the most extreme cases, for project oversight organizations, this caused analysts to acquire a multiplicity of individual applications (with the concomitant overhead and complexity of understanding differing lexicons and software application idiosyncrasies) in order to read proprietary data types from the various sources just to perform simple assessments of the data before even considering integrating it properly into the context of all of the other project performance data that was being collected.

The bottom line of outlining these processes is to note that, given a combination of manual and basic automated tools, putting together and reporting on this data takes time, and time, as Mr. Benjamin Franklin noted, is money.

By itself the critique that “looking in the rear view mirror” has no value and attributing it to one particular type of information (EVM) is specious. After all, one must know where one has been and presently is before one can figure out where to go and how to get there, and EVM is just one dimension of a multidimensional space.

But there is a utility value associated with the timing and locality of intelligence and that is the issue.

Contributors to time

Time when expended to produce something is a form of entropy. For purposes of this discussion at this level of existence, I am defining entropy as a measure of the energy in a system that is unavailable to do work. The work in this case is the processing and transformation of data into information, and the further transformation of information into usable intelligence.

There are different levels and sub-levels when evaluating the data stream related to project management. These are:

a. Within the supplier/developer/manufacturer

(1) First tier personnel such as Control Account Managers, Schedulers (if separate), Systems Engineers, Financial Managers, and Procurement personnel, among others, actually recording and verifying the work accomplishment;

(2) Second tier personnel that includes various levels of management, either across teams or in typical line-and-staff organizations.

b. Within customer and oversight organizations

(1) Reporting and oversight personnel tasked with evaluating the fidelity of specific business systems;

(2) Counterpart project or program officer personnel tasked with evaluating progress, risk, and any factors related to scope execution;

(3) Staff organizations designed to supplement and organize the individual project teams, providing a portfolio perspective to project management issues that may be affected by other factors outside of the individual project ecosystem;

(4) Senior management at various levels of the organization.

Given the multiplicity of data streams it appears that the issue of economies is vast until it is understood that the data that underlies the consumers of the information is highly structured and specific to each of the domains and sub-domains. Thus there are several opportunities for economies.

For example, cost performance and scheduling data have a direct correlation and are closely tied. Thus, these separate streams in the A&D industry were combined under a common schema, first using the UN/CEFACT XML, and now transitioning to a more streamlined JSON schema. Financial management has gone through a similar transition. Risk and SE data are partially incorporated into project performance schemas, but the data is also highly structured and possesses commonalities to be directly accessed using technologies that effectively leverage APIs.
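As a rough illustration of what combining the two streams under a common schema buys you, the following Python sketch merges schedule and cost records into a single JSON document keyed on the WBS element. The field names (`wbs`, `activities`, `cost`, `period_end`, and so on) and the sample records are hypothetical and are not taken from the UN/CEFACT XML or from any published JSON schema; the sketch only shows the principle of one shared key carrying both domains.

```python
import json


def combine_by_wbs(schedule_rows, cost_rows):
    """Merge schedule and cost records into one record per WBS element."""
    combined = {}

    def entry_for(wbs):
        # Create the shared record the first time either stream mentions it.
        return combined.setdefault(wbs, {"wbs": wbs, "activities": [], "cost": None})

    for row in schedule_rows:
        activity = {key: value for key, value in row.items() if key != "wbs"}
        entry_for(row["wbs"])["activities"].append(activity)

    for row in cost_rows:
        entry_for(row["wbs"])["cost"] = {
            key: value for key, value in row.items() if key != "wbs"
        }

    # Plain string sort; fine for this sample, not for ids such as "1.10".
    return [combined[wbs] for wbs in sorted(combined)]


# Hypothetical sample data standing in for two separate submissions.
schedule_rows = [
    {"wbs": "1.1", "activity": "Design review",
     "baseline_finish": "2017-06-30", "forecast_finish": "2017-07-21"},
    {"wbs": "1.2", "activity": "Prototype build",
     "baseline_finish": "2017-09-15", "forecast_finish": "2017-09-15"},
]
cost_rows = [
    {"wbs": "1.1", "pv": 500.0, "ev": 400.0, "ac": 500.0},
    {"wbs": "1.3", "pv": 200.0, "ev": 0.0, "ac": 50.0},
]

submission = {
    "period_end": "2017-07-31",
    "elements": combine_by_wbs(schedule_rows, cost_rows),
}
print(json.dumps(submission, indent=2))
```

Note that the merge immediately exposes mismatches: element 1.2 has schedule data but no cost, and 1.3 has cost but no activities. With separate submissions those gaps tend to surface only when someone reconciles the reports by hand.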

Back to the Future

The current state, despite advances in the data formats that allow for easy rationalization and normalization of data that breaks through proprietary barriers, still largely is based on a slightly modified model of using a combination of manual processing augmented by domain-specific analytical tools. (Actually, these are sub-domain analytical tools that support sub-optimization of data and are a barrier to the cross-domain integration necessary to create credible project intelligence.)

Thus, it is not unusual at the customer level to see project teams still accepting a combination of proprietary files, hard copy reports, and standard schema reports. Usually the data in these sources is manually entered into Excel spreadsheets or a combination of Excel and some domain-specific analytical tool (and oftentimes several sub-specialty analytical tools). After processing, the data is oftentimes exported or built in PowerPoint in the form of graphs or standard reporting formats. This is information management by Excel and PowerPoint.

In sum, in all too many cases the project management domain, in terms of data and business intelligence, continues to party like it is 1995. This condition also fosters and reinforces insular organizational domains, as if the project team is disconnected from and can possess goals antithetical and/or in opposition to the efficient operation of the larger organization.

A typical timeline goes like this:

a. Supplier provides project performance data 15-30 days after the close of a period. (Some contract clauses give more time). Let’s say the period closed at the end of July. We are now effectively in late August or early September.

b. Analysts incorporate stove-piped domain data into their Excel spreadsheets and other systems another week or so after submittal.

c. Analysts complete processing and analyzing data and submit in standard reporting formats (Excel and PowerPoint) for program review four to six weeks after incorporation of the data.

Items a through c now put a typical project office at project review for July information at the end of September or beginning of October. Furthermore, this information is focused on individual domains, and given the lack of cross-domain knowledge, can be contradictory.
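The arithmetic behind that timeline is simple enough to sketch in Python. The function name `review_window` is hypothetical, and two assumptions go beyond the text: the period is taken to close on 31 July 2017, and "another week or so" is taken as exactly seven days.

```python
from datetime import date, timedelta


def review_window(period_close, submit_days=(15, 30), load_days=(7, 7),
                  analysis_days=(28, 42)):
    """Return the earliest and latest program review dates for a period.

    Each argument is a (minimum, maximum) lag in calendar days:
    supplier submittal, loading into spreadsheets, then analysis (4-6 weeks).
    """
    shortest = submit_days[0] + load_days[0] + analysis_days[0]
    longest = submit_days[1] + load_days[1] + analysis_days[1]
    return (period_close + timedelta(days=shortest),
            period_close + timedelta(days=longest))


period_close = date(2017, 7, 31)  # assumed year; the text only says "end of July"
earliest, latest = review_window(period_close)
print(f"Review of July data falls between {earliest} and {latest}")
print(f"Data age at review: {(earliest - period_close).days} "
      f"to {(latest - period_close).days} days")
```

Under these assumptions the data is between 50 and 79 days old by the time it reaches a program review, which brackets the end of September or beginning of October noted above.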

This system is broken.

Even suppliers who have direct access to systems of record all too often rely on domain-specific solutions to be able to derive significance from the processing of project management data. The larger suppliers seem to have recognized this problem and have been moving to address it, requiring greater integration across solutions. But the existence of a 15-30 day reconciliation period after the end of a period, and formalized in contract clauses, is indicative of an opportunity for greater efficiency in that process as well.

The Way Forward

But there is another way.

The opportunities for economy in the form of improvements in time and effort are in the following areas, given the application of the right technology:

  1. In the submission of data, especially by finding data commonalities and combining previously separate domain data streams to satisfy multiple customers;
  2. In retrieving all data so that it is easily accessible to the organization at the level of detail required by the task at hand;
  3. In processing this data so that it can be converted by the analyst into usable intelligence;
  4. In properly accessing, displaying, and reporting properly integrated data across domains, as appropriate, to each level of the organization regardless of originating data stream.

Furthermore, there are opportunities to realize business value by improving these processes:

  1. By extending expertise beyond a limited number of people who tend to monopolize innovations;
  2. By improving organizational knowledge by incorporating innovation into the common system;
  3. By gaining greater insight into more reliable predictors of project performance across domains instead of the “traditional” domain-specific indices that have marginal utility;
  4. By developing a project focused organization that breaks down domain-centric thinking;
  5. By developing a culture that ties cross-domain project knowledge to larger picture metrics that will determine the health of the overarching organization.

It is interesting how often, when I visit the field, it is asserted that “the technology doesn’t matter, it’s process that matters”.

Wrong. Technology defines the art of the possible. There is no doubt that in an ideal world we would optimize our systems prior to the introduction of new technology. But that assumes that the most effective organization (MEO) is achievable without technological improvements to drive the change. If one cannot effectively and efficiently integrate all submitted cross-domain information using Excel in any scenario (after all, it’s a lot of data), then the key is the introduction of new technology that can do that very thing.

So what technologies will achieve efficiency in the use of this data? Let’s go through the usual suspects:

a. Will more effective use of PowerPoint reduce these timelines? No.

b. Will a more robust set of Excel workbooks reduce these timelines? No.

c. Will an updated form of a domain-specific analytical tool reduce these timelines? No.

d. Will a NoSQL solution reduce these timelines? Yes, given that we can afford the customization.

e. Will a COTS BI application that accepts a combination of common schemas and APIs reduce these timelines? Yes.

The technological solution must be fitted to its purpose and time. Technology matters because we cannot avoid the expenditure of time or energy (entropy) in the processing of information. We can perform these operations using a large amount of energy in the form of time and effort, or we can conserve time and effort by substituting the power of computing and information processing. While we will never get to the point where we completely eliminate entropy, our application of appropriate technology makes it seem as if effort in the form of time is significantly reduced. It’s not quite money for nothing, but it’s as close as we can come and is an obvious area of improvement that can be made for a relatively small investment.

Back in the Saddle Again — Putting the SME into the UI Which Equals UX

“Any customer can have a car painted any colour that he wants so long as it is black.”  — Statement by Henry Ford in “My Life and Work”, by Henry Ford, in collaboration with Samuel Crowther, 1922, page 72

The Henry Ford quote, which he made half-jokingly to his sales staff in 1909, is relevant to this discussion because the information sector has developed along the lines of the auto and many other industries.  The statement was only half-joking because Ford’s cars could be had in three colors.  But in 1909 Henry Ford had found a massive market niche that would allow him to sell inexpensive cars to the masses.  His competition wasn’t so much other auto manufacturers, many of whom catered to the whims of the rich and more affluent members of society, as the main means of individualized transportation at the time–the horse and buggy.  Color was not as important to this market as simplicity and utility.

Since the widespread adoption of the automobile and the expansion of the market with multiple competitors, high speed roadways, a more affluent society anchored by a middle class, and the impact of industrial and information systems development in shaping societal norms, the automobile consumer has, over time, become more varied and sophisticated.  Today automobiles have introduced a number of new features (and color choices)–from backup cameras, to blind spot and back-up warning signals, to lane control, auto headlight adjustment, and many other innovations.  Enhancements to industrial production that began with the introduction of robotics into the assembly line back in the late 1970s and early 1980s, through to the adoption of Just-in-Time (JiT) and Lean principles in overall manufacturing, provide consumers a multitude of choices.

We are seeing a similar evolution in information systems, which leads me to the title of this post.  During the first waves of information systems development and introduction into our governing and business systems, the process has been one in which software is developed first to address an activity that is completed manually.  There would be a number of entries into a niche market (or for more robustly capitalized enterprises into an entire vertical).  The software would be fairly simplistic and the features limited, the objects (the way the information is presented and organized on the screen, the user selections, and the charts, graphs, and analytics allowed to enhance information visibility) well defined, and the UI (user interface) structured along the lines of familiar formats and views.

To include the input of the SME into this process, without specific soliciting of advice, was considered both intrusive and disruptive.  After all, software development largely was an activity confined to a select and highly trained specialty involving sophisticated coding languages that required a good deal of talent to be considered “elegant”.  I won’t go into a definition of elegance here, which I’ve addressed in previous posts, but for a short definition it is this:  the fewest bits of code possible that both maximizes computing power and provides the greatest flexibility for any given operation or set of operations.

This is no mean feat and a great number of software applications are produced in this way.  Since the elegance of any line of code varies widely by developer and organization, the process of update and enhancement can involve a great deal of human capital and time.  Thus, the general rule has been that the more sophisticated any software application is, the more effort it takes to change and thus the less flexibility it possesses.  Need a new chart?  We’ll update you next year.  Need a new set of views or user settings?  We’ll put your request on the road-map and maybe you’ll see something down the road.

It is not as if the needs and requests of users have always been ignored.  Most software companies try to satisfy the needs of their customer, balancing the demands of the market against available internal resources.  Software websites, such as at UXmatters in this article, have advocated the ways that the SME (subject-matter expert) needs to be at the center of the design process.

The introduction of fourth-generation adaptive software environments–that is, those systems that leverage underlying operating environments and objects such as .NET and WinForms, that are open to any data through OLE DB and ODBC, and that leave the UI open to simple configuration languages that leverage these underlying capabilities and place them at the feet of the user–puts the idea of the SME at the center of the design process into practice.

This is a development in software as significant as the introduction of JiT and Lean in manufacturing, since it removes both the labor and time-intensiveness involved in rolling out software solutions and enhancements.  Furthermore, it goes one step beyond these processes by allowing the SME to roll out multiple software solutions from one common platform that is only limited by access to data.  It is as if each organization and SME has a digital printer for software applications.

Under this new model, software application manufacturers have a flexible environment to pre-configure the 90% solution to target any niche or market, allowing their customers to fill in any gaps or adapt the software as they see fit.  There is still IP involved in the design and construction of the “canned” portion of the solution, but the SME can be placed into the middle of the design process for how the software interacts with the user–and to do so at the localized and granular level.
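A toy Python sketch may help show what "pre-configure the 90% solution and let the customer fill the gaps" means in practice. Everything here is hypothetical (`DEFAULT_VIEW`, `SME_OVERRIDES`, `effective_view`, `render`, and the sample rows) and it is not how any particular product implements configuration; the point is only that the view is described as data the SME can change, while the rendering code stays untouched.

```python
# The supplier ships a default view definition as plain data.
DEFAULT_VIEW = {
    "title": "Cost Performance",
    "columns": ["wbs", "ev", "ac"],
    "sort_by": "wbs",
}

# The SME adds a column and changes the sort without touching any code.
SME_OVERRIDES = {
    "columns": ["wbs", "ev", "ac", "cpi"],
    "sort_by": "cpi",
}


def effective_view(default, overrides):
    """Local settings win over the shipped defaults, key by key."""
    return {**default, **overrides}


def with_derived_fields(row):
    """Add fields the platform can always derive from the raw data."""
    derived = dict(row)
    derived["cpi"] = round(row["ev"] / row["ac"], 2)
    return derived


def render(rows, view):
    """Generic renderer: knows nothing about any specific view."""
    prepared = sorted((with_derived_fields(r) for r in rows),
                      key=lambda r: r[view["sort_by"]])
    lines = [view["title"], " | ".join(view["columns"])]
    for row in prepared:
        lines.append(" | ".join(str(row[col]) for col in view["columns"]))
    return "\n".join(lines)


rows = [
    {"wbs": "1.2", "ev": 300, "ac": 250},
    {"wbs": "1.1", "ev": 400, "ac": 500},
]
print(render(rows, effective_view(DEFAULT_VIEW, SME_OVERRIDES)))
```

A request for a new column becomes a change to a few lines of configuration owned by the person who understands the data, not an item on a supplier road-map.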

This is where we transform UI into UX, that is, the total user experience.  So what is the difference?  In the words of Dain Miller in a Web Designer Depot article from 2011:

UI is the saddle, the stirrups, and the reins.

UX is the feeling you get being able to ride the horse, and rope your cattle.

As we adapt software applications to meet the needs of the users, the role of the SME can answer many of the questions that have vexed many software implementations for years such as user perceptions and reactions to the software, real and perceived barriers to acceptance, variations in levels of training among users, among others.  Flexible adaptation of the UI will allow software applications to be more successfully localized to not only meet the business needs of the organization and the user, but to socialize the solution in ways that are still being discovered.

In closing this post a bit of full disclosure is in order.  I am directly involved in such efforts through my day job and the effects that I am noting are not simply notional or aspirational.  This is happening today and, as it expands throughout industry, will disrupt the way in which software is designed, developed, sold and implemented.

New Directions — Fourth Generation apps, Agile, and the New Paradigm

The world is moving forward and Moore’s Law is accelerating in interesting ways on the technology side, which opens new opportunities, especially in software.  In the past I have spoken of the flexibility of Fourth Generation software, that is, software that doesn’t rely on structured hardcoding, but instead, is focused on the data to deliver information to the user in more interesting and essential ways.  I work in this area for my day job, and so using such technology has tipped over more than a few rice bowls.

The response from entrenched incumbents and those using similar technological approaches in the industry focused on “tools” capabilities has been to declare vices as virtues.  Hard-coded applications that require long-term development and structures, built on proprietary file and data structures are, they declare, the right way to do things.  “We provide value by independently developing IP based on customer requirements,” they declare.  It sounds very reasonable, doesn’t it?  Only one problem: you have to wait–oh–a year or two to get that chart or graph you need, to refresh that user interface, to expand functionality, and you will almost never be able to leverage the latest capabilities afforded by the doubling of computing capability every 12 to 24 months.  The industry is filled with outmoded, poorly supported, and obsolete “tools” already.  Guess it’s time for a new one.

The motivation behind such assertions, of course, is to slow things down.  Not possessing the underlying technology to provide more, better, and more powerful functionality to the customer quicker and more flexibly based on open systems principles, that is, dealing with data in an agnostic manner, they use their position to try to keep disruptive entrants from leaving them far behind.  This is done, especially in the bureaucratic complexities of A&D and DoD project management, through professional organizations such as the NDIA that are used as thinly disguised lobbying opportunities by software suppliers, or by appeals to contracting rules that they hope will undermine the introduction of new technologies.

All of these efforts, of course, are blowing into the wind.  The economics of the new technologies is too compelling for anyone to last long in their job by partying like it’s still 1997 under the first wave of software solutions targeted at data silos and stove-piped specialization.

The new paradigm is built on Agile and those technologies that facilitate that approach.  In case my regular readers think that I have become one of the Cultists, bowing before the Manifesto That May Not Be Named, let me assure you that is not the case.  The best articulation of Agile that I have read recently comes from Neil Killick, with whom I have expressed some disagreement on the #NoEstimates debate and the more cultish aspects of Agile in past posts, but who published an excellent post back in July entitled “12 questions to find out: Are you doing Agile Software Development?”

Here are Neil’s questions:

  1. Do you want to do Agile Software Development? Yes – go to 2. No – GOODBYE.
  2. Is your team regularly reflecting on how to improve? Yes – go to 3. No – regularly meet with your team to reflect on how to improve, go to 2.
  3. Can you deliver shippable software frequently, at least every 2 weeks? Yes – go to 4. No – remove impediments to delivering a shippable increment every 2 weeks, go to 3.
  4. Do you work daily with your customer? Yes – go to 5. No – start working daily with your customer, go to 4.
  5. Do you consistently satisfy your customer? Yes – go to 6. No – find out why your customer isn’t happy, fix it, go to 5.
  6. Do you feel motivated? Yes – go to 7. No – work for someone who trusts and supports you, go to 2.
  7. Do you talk with your team and stakeholders every day? Yes – go to 8. No – start talking with your team and stakeholders every day, go to 7.
  8. Do you primarily measure progress with working software? Yes – go to 9. No – start measuring progress with working software, go to 8.
  9. Can you maintain pace of development indefinitely? Yes – go to 10. No – take on fewer things in next iteration, go to 9.
  10. Are you paying continuous attention to technical excellence and good design? Yes – go to 11. No – start paying continuous attention to technical excellence and good design, go to 10.
  11. Are you keeping things simple and maximising the amount of work not done? Yes – go to 12. No – start keeping things simple and writing as little code as possible to satisfy the customer, go to 11.
  12. Is your team self-organising? Yes – YOU’RE DOING AGILE SOFTWARE DEVELOPMENT!! No – don’t assign tasks to people and let the team figure out together how best to satisfy the customer, go to 12.
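Neil’s list reads like a small flowchart, and it can be written down as one. The Python sketch below is my own transcription, not anything Neil published: the `ON_NO` table records where each "No" sends you, and the `walk` function and its `answer` callback are hypothetical names used only for illustration.

```python
# Where each question sends you on a "No". None means the walk ends.
# Transcribed from the list above; note that question 6 sends you back to 2.
ON_NO = {1: None, 2: 2, 3: 3, 4: 4, 5: 5, 6: 2,
         7: 7, 8: 8, 9: 9, 10: 10, 11: 11, 12: 12}


def walk(answer, max_steps=100):
    """Walk the twelve questions.

    `answer(question, visit)` returns True for "Yes"; `visit` counts how many
    times that question has been asked so far, so an answer can change once
    the suggested fix has been applied. Returns (outcome, path).
    """
    question, path, visits = 1, [], {}
    for _ in range(max_steps):
        path.append(question)
        visits[question] = visits.get(question, 0) + 1
        if answer(question, visits[question]):
            if question == 12:
                return "doing agile", path
            question += 1
        elif ON_NO[question] is None:
            return "goodbye", path
        else:
            question = ON_NO[question]
    # Ran out of steps while still looping on an unresolved question.
    return "still working on it", path


# A team that is unmotivated the first time question 6 comes up.
outcome, path = walk(lambda question, visit: not (question == 6 and visit == 1))
print(outcome, path)
```

Written this way, the structure of the list is easy to see: every "No" except the first keeps you in the loop until the answer changes, so the only ways out are to reach the end or to decide you do not want to do Agile at all.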

Note that even in software development based on Agile you are still “provid(ing) value by independently developing IP based on customer requirements.”  Only you are doing it faster and more effectively.

Now imagine a software technology that is agnostic to the source of data, that does not require a staff of data scientists, development personnel, and SMEs to care for and feed it; that allows multiple solutions to be released from the same technology; that allows for integration and cross-data convergence to gain new insights based on Knowledge Discovery in Databases (KDD) principles; and that provides shippable, incremental solutions every two weeks or as often as can be absorbed by the organization, but responsively enough to meet multiple needs of the organization at any one time.

This is what is known as disruptive value.  There is no stopping this train.  It is the new paradigm and it’s time to take advantage of the powerful improvements in productivity, organizational effectiveness, and predictive capabilities that it provides.  This is the power of technology combined with a new approach to “small” big data, or structured data, that is effectively normalized and rationalized to the point of breaking down proprietary barriers, hewing to the true meaning of making data–and therefore information–both open and accessible.

Furthermore, such solutions using the same data streams produced by the measurement of work can also be used to evaluate organizational and systems compliance (where necessary), and effectiveness.  Combined with an effective feedback mechanism, data and technology drive organizational improvement and change.  There is no need for another tool to layer with the multiplicity of others, with its attendant specialized training, maintenance, and dead-end proprietary idiosyncrasies.  On the contrary, such an approach is an impediment to data maximization and value.

Vices are still vices even in new clothing.  Time to come to the side of the virtues.

Do You Believe in Magic? — Big Data, Buzz Phrases, and Keeping Feet Planted Firmly on the Ground

My alternative title for this post was “Money for Nothing,” which is along the same lines.  I have been engaged in discussions regarding Big Data, which has become a bit of a buzz phrase of late in both business and government.  Under the current drive to maximize the value of existing data, every data source, stream, lake, and repository (and the list goes on) has been subsumed by this concept.  So, at the risk of being a killjoy, let me point out that not all large collections of data are “Big Data.”  Furthermore, once a category of data gets tagged as Big Data, the further one seems to depart from the world of reality in determining how to approach and use the data.  So for those of you who find yourselves in this situation, let’s take a collective deep breath and engage our critical thinking skills.

So what exactly is Big Data?  Quite simply, as noted by this article in Forbes by Gil Press, the term is a relative one, but generally means, from a McKinsey study, “datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.”  This subjective definition is a purposeful one, since Moore’s Law tends to change what is viewed as simply digital data as opposed to big data.  I would add some characteristics to assist in defining the term based on present challenges.  Big data at first approach tends to be unstructured, variable in format, and does not adhere to a schema.  Thus, not only is size a criterion for the definition, but also the chaotic nature of the data that makes it hard to approach.  For once we find a standard means of normalizing, rationalizing, or converting digital data, it no longer is beyond the ability of standard database tools to effectively use it.  Furthermore, the very process of taming it thereby renders it non-big data, or, if an exceedingly large dataset, perhaps “small big data.”

Thus, having defined our terms and the attributes of the challenge we are engaging, we now can eliminate many of the suppositions that are floating around in organizations.  For example, there is a meme that I have come upon that asserts that disparate application file data can simply be broken down into its elements and placed into database tables for easy access by analytical solutions to derive useful metrics.  This is true in some ways but both wrong and dangerous in its apparent simplicity.  For there are many steps missing in this process.

Let’s take, for example, the least complex case: structured data submitted as proprietary files.  On its surface this is an easy challenge to solve.  Once someone begins breaking the data into its constituent parts, however, greater complexity is found, since the indexing inherent to data interrelationships and structures is necessary for its effective use.  Furthermore, there will be corruption and non-standard use of user-defined and custom fields, especially in data that has not undergone domain scrutiny.  The originating third-party software is pre-wired to be able to extract this data properly.  But relying on multiple proprietary applications, each of which must be learned and used with its concomitant idiosyncrasies, issues of sustainability, and overhead, defeats the goal of establishing a data repository in the first place by keeping the data in silos, preventing integration.  The indexing across, say, financial systems or planning systems is different.  So how do we solve this issue?

In approaching big data, or small big data, or datasets from disparate sources, the core concept in realizing return on investment and finding new insights is known as Knowledge Discovery in Databases or KDD.  This was all the rage about 20 years ago, but its tenets are solid and proven and have evolved with advances in technology.  Back then, the means of achieving KDD from existing databases was the use of data mining.

The necessary first step in the data mining approach is pre-processing of data.  That is, once you get the data into tables it is all flat.  Every piece of data is the same–it is all noise.  We must add significance and structure to that data.  Keep in mind that we live in this universe, so there is a cost to every effort known as entropy.  Computing is as close as you’ll get to defeating entropy, but only because it has shifted the burden somewhere else.  For large datasets it is pushed to pre-processing, either manual or automated.  In the brute force world of data mining, we hire data scientists to pre-process the data, find commonalities, and index it.  So let’s review this “automated” process.  We take a lot of data and then add a labor-intensive manual effort to it in order to derive KDD.  Hmmm..  There may be ROI there, or there may not be.

But twenty years is a long time and we do have alternatives, especially in using Fourth Generation software that is focused on data usage without the limitations of hard-coded “tools.”  These alternatives apply when using data on existing databases, even disparate databases, or file data structured under a schema with well-defined data exchange instructions that allow for a consistent manner of posting that data to database tables. The approach in this case is to use APIs.  The API, like OLE DB or the older ODBC, can be used to read and leverage the relative indexing of the data.  It will still require some code to point it in the right place and “tell” the solution how to use and structure the data, and its interrelationship to everything else.  But at least we have a means for reducing the cost associated with pre-processing.  Note that we are, in effect, still pre-processing data.  We just let the CPU do the grunt work for us, oftentimes very quickly, while giving us control over the decision of relative significance.
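As a minimal sketch of what “reading the relative indexing” can look like in practice, the following uses Python’s built-in sqlite3 module as a stand-in for an OLE DB or ODBC connection.  The table names, columns, and helper functions are hypothetical and chosen only for illustration; a real source would expose its own catalog, and the calls for reading it would differ by driver.

```python
import sqlite3

# Hypothetical in-memory database standing in for a project data source.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wbs (wbs_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE task (
        task_id INTEGER PRIMARY KEY,
        wbs_id  INTEGER NOT NULL REFERENCES wbs(wbs_id),
        name    TEXT NOT NULL
    );
    CREATE INDEX idx_task_wbs ON task(wbs_id);
""")


def describe_relationships(conn):
    """Return {table: [(column, referenced_table, referenced_column), ...]}."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    relationships = {}
    for table in tables:
        # PRAGMA foreign_key_list columns: id, seq, table, from, to, ...
        fks = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
        relationships[table] = [(fk[3], fk[2], fk[4]) for fk in fks]
    return relationships


def list_indexes(conn, table):
    """Return the names of the explicit indexes defined on a table."""
    # PRAGMA index_list columns: seq, name, unique, origin, partial
    return [row[1] for row in conn.execute(f"PRAGMA index_list({table})")]


relationships = describe_relationships(conn)
task_indexes = list_indexes(conn, "task")
print(relationships)
print(task_indexes)
```

The point of the sketch is that the relationships are read from the source rather than rediscovered by hand; deciding which of them are significant is still left to the person who knows the domain.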

So now let’s take the meme that I described above and add greater complexity to it.  You have all kinds of data coming into the stream in all kinds of formats including specialized XML, open, black-boxed data, and closed proprietary files.  This data is non-structured.  It is then processed and “dumped” into a non-relational (NoSQL) database.  How do we approach this data?  The answer has been to return to a hybrid of pre-processing, data mining, and the use of APIs.  But note that there is no silver bullet here.  These efforts are long-term and extremely labor intensive at this point.  There is no magic.  I have heard time and again from decision makers the question: “why can’t we just dump the data into a database to solve all our problems?”  No, you can’t, unless you’re ready for a significant programmatic investment in data scientists, database engineers, and other IT personnel.  At the end, what they deploy, when it gets deployed, may very well be obsolete and have wasted a good deal of money.

So, once again, what are the proper alternatives?  In my experience we need to get back to first principles.  Each business and industry has commonalities that transcend proprietary software limitations by virtue of the professions and disciplines that comprise them.  Thus, it is domain expertise to the specific business that drives the solution.  For example, in program and project management (you knew I was going to come back there) a schedule is a schedule, EVM is EVM, financial management is financial management.

Software manufacturers will, apart from issues regarding relative ease of use, scalability, flexibility, and functionality, attempt to defend their space by establishing proprietary lexicons and data structures.  Not being open, while not serving the needs of customers, helps incumbents avoid disruption from new entries.  But there often comes a time when it is apparent that these proprietary definitions are only euphemisms for a well-understood concept in a discipline or profession.  Cat = Feline.  Dog = Canine.

For a cohesive and well-defined industry the solution is to make all data within particular domains open.  This is accomplished through the acceptance and establishment of a standard schema.  For less cohesive industries, but where the data or incumbents through the use of common principles have essentially created a de facto schema, APIs are the way to extract this data for use in analytics.  This approach has been applied on a broader basis for the incorporation of machine data and signatures in social networks.  For closed or black-boxed data, the business or industry will need to execute gap analysis in order to decide whether database access to such legacy data is truly essential to its business, or whether, given the specification of a more open standard from “time-now” forward, the suboptimization in data will eventually work itself out.
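To illustrate the “Cat = Feline” point, here is a minimal sketch of mapping two proprietary lexicons onto one shared schema.  The vendor names, field labels, and canonical names are all hypothetical; the only assumption drawn from the discipline is that BCWS, BCWP, and ACWP are the older EVM terms for planned value, earned value, and actual cost.

```python
# Hypothetical vendor lexicons, each mapped onto the same canonical names.
CANONICAL_FIELDS = {
    "vendor_a": {"BCWS": "planned_value", "BCWP": "earned_value",
                 "ACWP": "actual_cost"},
    "vendor_b": {"Budget": "planned_value", "Earned": "earned_value",
                 "Actuals": "actual_cost"},
}


def normalize(record, source):
    """Rename a record's fields from a vendor lexicon to the shared schema."""
    mapping = CANONICAL_FIELDS[source]
    unknown = sorted(set(record) - set(mapping))
    if unknown:
        # Unmapped fields are surfaced, not silently dropped, so that a
        # domain expert can decide what they mean.
        raise KeyError(f"unmapped fields from {source}: {unknown}")
    return {mapping[name]: value for name, value in record.items()}


row_a = normalize({"BCWS": 100.0, "BCWP": 90.0, "ACWP": 95.0}, "vendor_a")
row_b = normalize({"Budget": 100.0, "Earned": 90.0, "Actuals": 95.0}, "vendor_b")
print(row_a == row_b)
```

Raising an error on unmapped fields is a deliberate choice in this sketch: user-defined and custom fields are exactly where non-standard use tends to hide.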

Most important of all and in the end, our results must provide metrics and visualizations that can be understood and that are valid, important, material, and right.

Rise of the Machines — Drivers of Change in Business and Project Management

Last week I found myself in business development mode, as I often am, in explaining to a prospective client our future plans in terms of software development.  The point that I was making was that it was not our goal to simply reproduce the functionality that every other software solution provider offered, but to improve how the industry does business by making the drive for change through the application of appropriate technology so compelling through efficiencies, elimination of redundancy, and improved productivity, that not making the change would be deemed foolish.  In sum, we are out to take a process and improve on it through the application of disruptive technology.  I highlighted my point by stating:  “It is not our goal to simply reproduce functionality so we can party like it’s 1998, it’s been eight software generations since that time and technology has provided us smarter and better ways of doing things.”

I received the usual laughter and acknowledgement by some of the individuals to whom I was making this point, but one individual rejoined: “well, I don’t mind doing things like we did it in 1998,” or words to that effect.  I acknowledged the comment, but then reiterated that our goal was somewhat more proactive.  We ended the conversation in a friendly manner and I was invited to come back and show our new solution upon release to market.

Still, the rejoinder of being satisfied with things the way they are has stuck with me.  No doubt being a nerd (and my years as a U.S. Navy officer) has inculcated a drive in me for constant process improvement.  My default position going into a discussion is that the individuals that I am addressing share that attitude with me.  But that is not always the case.

The kneejerk position of other geeks is often one of derision when confronted by resistance to change.  But not every critic or skeptic is a Luddite, and it is important to understand the basis for both criticism and skepticism.  For many of our colleagues in the project management world, software technology is a software application, something that “looks into the rear glass window.”  This meme is pervasive out there, but it is wrong.  Understanding why it is wrong is important in addressing the concerns behind it in an appropriate manner.

This view is wrong because the first generations of software that serve this market simply replicated the line and staff, specialization, and business process and analysis regime that existed prior to digitization.  Integration of data that could provide greater insight was not possible at a level of detail needed to establish confidence.  The datasets upon which we derived our data were not flexible, nor did they allow for widespread distribution of more advanced corporate and institutional knowledge.  In fact, the first software generation in project management often supported and sustained the subject matter expert (SME) framework, in which only a few individuals possessed advanced knowledge of methods and analytics, upon which the organization had to rely.

We still see this structure in place in much of industry and government–and it is self-sustaining, since it involves not only individuals within the organization that possess this attribute, but also a plethora of support contractors and consultants who have built their businesses to support it.

Additional resistance comes from individuals who have dealt with new entries in the past that turned out to be only incremental or marginal improvements on what is already in place, not to mention the few bad actors that come along.  Established firms in the market take this approach in order to defend market share and, like the SME structure, it is self-sustaining by attempting to establish a barrier to new entrants into the market.  At the same time they establish an environment of stability and security from which buyers are hesitant to leave; thus the prospective customer is content to “party like it’s 1998.”

Value proposition alone will not change the mind of those who are content.  You sell what a prospective customer needs, not usually solely what they want.  For those introducing disruptive innovation, the key is to be at the forefront in shifting what defines the basis of market need.

For example, in business and project systems, the focus has always been on “tools.”  Given the engineering domain that is dominant in many project management organizations, such terminology provides a comfortable and familiar way of addressing technology.  Getting the “right set of tools” and “using the right tool for the work” are the implicit assumptions in using such simplistic metaphors.  This has caused many companies and organizations to issue laundry lists of features and functionality in order to compare solutions when doing market surveys.  Such lists are self-limiting, supporting the self-reinforcing systems mentioned above.  Businesses that rely on this approach to the technology market are not open to leveraging the latest capabilities in improving their systems.  The metaphor of the “tool” is out of date.

The shift, which is accelerating in the commercial world, is emphasis on software technology that is focused on the capabilities inherent in the effective use of data.  In today’s world data is king, and the core issue is who owns the data.  I have referred to some of the new metaphors in data in my last post and, no doubt, new ones will arise.  What is important to know about the shift to an emphasis on data and its use is that it is driving organizational change that not only breaks down the “tool”-based approach to the market, but also undermines the software market emphasis on tool functionality, and on the organizational structure and support market built on the SME.

There is always fear surrounding such rapid change, and I will not argue against the fact that some of it needs to be addressed.  For example, the rapid displacement through digitization of human-centered manual work that previously required expertise and paid well will soon become one of the most important challenges of our time.  I am optimistic that the role of the SME simply needs to follow the shift, but I have no doubt that the shift will require fewer SMEs.  This highlights, however, that the underlying economics of the shift will make it both compelling and necessary.

Very soon, it will be impossible to “party like it’s 1998” and still be in business.

The Water is Wide — Data Streams and Data Reservoirs

I’ll have an article that elaborates on some of the ramifications of data streams and data reservoirs on AITS.org, so stay tuned there.  In the meantime, I’ve had a lot of opportunities lately, in a practical way, to focus on data quality and approaches to data.  There is some criticism in our industry about using metaphors to describe concepts in computing.

Like any form of literature, however, there are good and bad metaphors.  Opposing them in general, I think, is contrarian posing.  Metaphors, after all, often allow us to discover insights into an otherwise opaque process, clarifying in our mind’s eye what is being observed through the process of deriving similarities to something more familiar.  Strong metaphors allow us to identify analogues among the phenomena being observed, providing a ready path to establishing a hypothesis.  Having served this purpose, we can test that hypothesis to see if the metaphor serves our purposes in contributing to understanding.

I think we have a strong set of metaphors in the case of data streams and data reservoirs.  So let’s define our terms.

Traditionally a data stream in communications theory is a set of data packets that are submitted in sequence.  For the purpose of systems theory, a data stream is data that is submitted between two entities either sequentially in real time or on a regular periodic basis.  A data reservoir is just what it sounds like it is.  Streams can be diverted to feed a reservoir, which diverts data for a specific purpose.  Thus, the reservoir is a repository of all data from the selected streams, and any alternative streams, including legacy data.  The usefulness of the metaphors is found in the way in which we treat these data.

So, for example, data streams in practical terms in project and business management are the artifacts that represent the work that is being performed.  This can be data relating to planning, production, financial management and execution, earned value, scheduling, technical performance, and risk for each period of measurement.  This data, then, requires real time analysis, inference, and distribution to decision makers.  Over time, this data provides trending and other important information that measures the inertia of the efforts in providing leading and predictive indicators.
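As a rough sketch of how period-by-period stream data turns into a trend, the following computes the standard cumulative EVM indices, CPI (earned value divided by actual cost) and SPI (earned value divided by planned value), for each period of measurement.  The period labels and dollar figures are invented for illustration and do not come from any real project.

```python
from itertools import accumulate

# Illustrative submissions: (period, planned value, earned value, actual cost)
periods = [
    ("P1", 100.0, 90.0, 100.0),
    ("P2", 100.0, 95.0, 110.0),
    ("P3", 100.0, 85.0, 115.0),
]


def cumulative_indices(periods):
    """Return (period, cumulative CPI, cumulative SPI) for each period."""
    cum_pv = accumulate(p[1] for p in periods)
    cum_ev = accumulate(p[2] for p in periods)
    cum_ac = accumulate(p[3] for p in periods)
    return [
        (p[0], round(ev / ac, 3), round(ev / pv, 3))
        for p, pv, ev, ac in zip(periods, cum_pv, cum_ev, cum_ac)
    ]


trend = cumulative_indices(periods)
for period, cpi, spi in trend:
    print(period, cpi, spi)
```

In this made-up series the cost index falls every period while the schedule index wobbles, which is the kind of pattern that only becomes visible once the submissions are kept and compared over time.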

Efficiencies can be realized by identifying duplication in data streams, especially if the data being provided into the streams are derived from a common dataset.  Streams can be modified to expand the data that is submitted, so as to eliminate alternative streams of data that add little value on their own, that is, that are stovepiped and suboptimized contrary to the maximum efficiency of the system.
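One simple way to start looking for such duplication is to list which data elements are carried by more than one stream.  This is only a sketch: the stream names and field names are hypothetical, and matching on field name alone assumes the lexicon has already been standardized.

```python
from collections import defaultdict

# Hypothetical streams and the data elements each one submits.
streams = {
    "cost_report": {"wbs_id", "period", "planned_value", "earned_value",
                    "actual_cost"},
    "schedule_report": {"wbs_id", "period", "task_id", "start", "finish"},
    "funds_report": {"wbs_id", "period", "actual_cost", "commitments"},
}


def duplicated_fields(streams):
    """Map each field carried by two or more streams to those streams."""
    carriers = defaultdict(list)
    for stream in sorted(streams):
        for field in streams[stream]:
            carriers[field].append(stream)
    return {field: names for field, names in carriers.items() if len(names) > 1}


overlap = duplicated_fields(streams)
for field in sorted(overlap):
    print(field, overlap[field])
```

Shared keys such as the WBS identifier and the period are expected, since they are what allow the streams to be joined.  It is the repeated measures, actual cost in this example, that suggest one stream could be expanded and another retired.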

In the case of data reservoirs, what these contain is somewhat different from the large repositories of metadata that must be mined.  On the contrary, a data reservoir contains a finite set of data, since what is contained in the reservoir is derived from the streams.  As such, these reservoirs contain much essential historical information to derive parametrics and sufficient data from which to derive organizational knowledge and lessons learned.  Rather than processing data in real time, the handling of data reservoirs is done to append the historical record of existing efforts to provide a fuller picture of performance and trending, and of closed out efforts that can inform systems approaches to similar future efforts.  While not quite fitting into the category of Big Data, such reservoirs can probably best be classified as Small Big Data.

Efficiencies from the streams into the reservoir can be realized if the data can be further definitized through the application of structured schemas, combined with flexible Data Exchange Instructions (DEIs) that standardize the lexicon, allowing for both data normalization and rationalization.  Still, there may be data that is not incorporated into such schemas, especially if the legacy metadata predates the schema specified for the applicable data streams.  In this case, data rationalization must be undertaken combined with standard APIs to provide consistency and structure to the data.  Even in this case, however, since the data is a finite set specific to a system that uses a fairly standard lexicon, such rationalization will yield results that are valid.

Needless to say, applications that are agnostic to data and that provide on-the-fly flexibility in UI configuration by calling standard operating environment objects–also known as fourth generation software–have the greatest applicability to this new data paradigm.  This is because they most effectively leverage both the flexibility in the evolution of the data streams to reach maximum efficiency and the lessons learned that are derived from the integration of data that was previously walled off from complementary data, an integration that will identify and clarify systems interdependencies.

 

Let’s Get Physical — Pondering the Physics of Big Data

I’ll have a longer and less wonky article on this and related topics next week at AITS.org’s Blogging Alliance, but Big Data has been a hot topic of late.  It also concerns the business line in which I engage and so it is time to sweep away a lot of the foolishness concerning it: what it can do, its value, and its limitations.

As a primer, a useful commentary on the ethical uses of Big Data was published today at Salon.com in an excerpt from Jacob Silverman’s book, Terms of Service: Social Media and the Price of Constant Connection.  Silverman takes a different approach from the one that I outline in my article, but he tackles the economics of new media that were identified by Brad DeLong and A. Michael Froomkin in the late 1990s and the first decade of the 21st century.  This article on First Monday from 2000 regarding speculative microeconomics emerging from new media nicely summarizes their thesis.  Silverman rejects reforming the system in economic terms, entering the same ethical terrain on personal data collection that Rebecca Skloot explored, with regard to the medical profession’s collection and use of tissue and genetic material during biopsies, in the book The Immortal Life of Henrietta Lacks.

What Silverman’s book does make clear–and which is essential in understanding the issue–is that not all big data is the same.  To our brute force machines data is data absent the means of software to distinguish it, since they are not yet conscious in the manner that would pass a Turing test.  Even with software such machines still cannot pass such a test, though I personally believe that strong AI is inevitable.

Thus, there is Big Data that is swept up–often without deliberate consent by the originator of the data–from the larger pool of society at large by commercial companies that have established themselves as surveillance “statelets” in gathering data from business transactions, social media preferences, and other electronic means.

And there is data that is deliberately stored and, oftentimes, shared among conscious actors for a specific purpose.  These actors are often government agencies, corporations, and related organizations that cooperatively share business information from their internal processes and systems for the purpose of developing predictive systems toward a useful public purpose, oftentimes engaged in joint enterprises toward the development of public goods and services.  It is in this latter domain that I operate.  I like to call this “small” Big Data, since we operate in what can realistically be characterized as closed systems.

Data and computing have a physical and mathematical basis.  For anyone who has studied the history of computing (or has coded) this is a self-evident fact.  But for the larger community of users it appears–especially if one listens to the hype of our industry–that the sky is the limit.  But perhaps that is a good comparison after all, for anyone who has flown in a plane knows that the sky does indeed have limits.  To fly requires a knowledge of gravity, the atmosphere, lift, turbulence, aerodynamics, and propulsion, among other disciplines and sciences.  All of these have their underpinnings in physics and mathematics.

The equation that we use in computing is known as Landauer’s Principle.  It is as follows:

kT ln 2,

where k is the Boltzmann constant, T is the temperature of the circuit in kelvins, and ln 2 is the natural logarithm of 2.

This equation follows those in thermodynamics established earlier in physics.  What this means is that the inherent entropy in a system–its onward inevitable journey toward a state of disorder–cannot be reduced, it can only be expelled from the system. For Landauer, who worked at IBM in physical computing, entropy is expelled in the form of heat and energy.  For the longest time, given the close correlation and applied proofs of the Principle, this was seen as a physical law, but modern computing seems to be undermining the manner in which entropy is expelled.
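To give a sense of scale, here is a small worked calculation of the Landauer bound.  The choice of 300 K as the circuit temperature and the gigabyte example are my own assumptions for illustration; the Boltzmann constant is the exact SI value.

```python
import math

BOLTZMANN_K = 1.380649e-23  # joules per kelvin (exact SI value)


def landauer_limit_joules(temperature_kelvin):
    """Minimum energy, in joules, to erase one bit at the given temperature."""
    if temperature_kelvin <= 0:
        raise ValueError("temperature must be positive, in kelvins")
    return BOLTZMANN_K * temperature_kelvin * math.log(2)


# Assumed operating temperature of 300 K, roughly room temperature.
per_bit = landauer_limit_joules(300.0)   # roughly 2.87e-21 joules
per_gigabyte = per_bit * 8e9             # 8e9 bits in a decimal gigabyte
print(f"{per_bit:.3e} J per bit, {per_gigabyte:.3e} J per gigabyte")
```

The figure is minuscule next to what real hardware dissipates per bit, which is why the bound is best read as a floor and not as an estimate of actual cost.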

Big Data runs up against the physics identified in Landauer’s Principle because heat and energy are not the only ways to expel entropy.  For really Big Data entropy is expelled by the iron law of Boltzmann’s Constant: the calculation of probable states of disorder in the system. The larger the system, the larger the probable states of disorder, and the more our results in processing such information become a function of probability.  This may or may not matter, depending on the fidelity of the probabilistic methods and their application.

For “small” Big Data, the acceptability of variations from the likely outcome is much narrower.  We need to approach being 100% correct, 100% of the time, though small variations are acceptable depending on the type of system.  So, for example, in project management systems, we can be a percent or two off on rolling up data, since accountability is not an issue.  Financial systems compliance is a different matter.

In “small” Big Data, entropy can be expelled by pre-processing the data in the form of effort expended toward standardization, normalization, and rationalization.  Our equation, kT ln 2, is the lower bound, that is, it identifies the minimum state of entropy that need be expelled in order to process a bit.  In reality we will never reach this lower bound, but we can approach it until the difference between the lower bound of entropy and the “cost” of processing data is vanishingly small.  Once we have expelled entropy by limiting the states of instability in the data, expelling the cost of entropy through the data pipeline, we can then process the data to derive its significance with a high degree of confidence.

But this is only the start.  For once “small” Big Data undergoes a process to ensure its fidelity, the same pattern recognition algorithms used in Big Data can be applied, but to more powerful and credible effect.  Early warning “signatures” of project performance can be collected and applied to provide decision-makers with information early enough to affect the outcome of efforts before risk is fully manifested, with the calculated probabilities of cost, schedule, and technical impacts possessing a higher level of certainty.