Data-Driven Decision Making

The Argument for Open

A computer science prof conducts his own study of proprietary versus open source BI, and stands by his 'open' results. By Rob Byrd

Just last year, business intelligence consultant David Wells, former director of education at The Data Warehousing Institute, remarked in Campus Technology that enterprise-level open source BI (OS BI) was not quite ready for prime time (see "Open for Business," CT February 2007). Still, he urged us neither to wait five years nor to implement a proprietary system in the meantime: OS BI would be ready in a year or two, he predicted.

He was right.

Proprietary BI vendors have been merging and shuffling their product lines in confusing and sometimes disruptive ways, producing disarray in product stability (not to mention licensing). Meanwhile, companies such as Pentaho, Talend, and Jaspersoft have made real advances in integrating their OS BI tools. Together, these trends have moved OS BI tools to the top of the list for many organizations that want to implement a BI solution. In fact, over the past five years, technology publications and organizations such as CT and TDWI have covered a growing number of OS BI options. It's now time to launch that BI initiative you've been contemplating, and to launch it with open source.

There are several categories of OS BI solutions: complete suites with commercial support and partnered consulting; individual modules or services developed specifically to work with other BI-related modules; frameworks and APIs intended as a starting point for in-house programmers; and even office suites, such as OpenOffice, with its built-in database connectivity and a reporting extension, Sun Report Builder. In addition, there are many open source reporting tools, such as Pentaho Reporting, JasperReports, DataVision, OpenRPT, and Eclipse BIRT. No matter what your experience level or budget, there's an open source solution that can work for your organization.

I investigated three OS BI toolsets thoroughly: I installed the tools myself, interviewed technologists at academic institutions that had implemented them, and discussed the products with their developers and vendors. The three products I installed (in addition to the OpenOffice solution already running on my personal desktop) were Pentaho BI Suite, Jaspersoft BI Suite, and Talend Open Studio. None of the installations took longer than 15 minutes, after which I had real products accessing real databases, not just simulated demos. For Jaspersoft, I caved in and downloaded the 30-day trial rather than the four individual modules of the community suite, just to save a few minutes of downloading and setup. Otherwise, the products I installed were free and open source and required no subscription, yet they were enterprise-level applications.

I interviewed technologists at four institutions. Each organization I interviewed had evaluated a number of products, ranging from only a couple to seven or eight, including both open source and proprietary options.

Which Product?

With all this talk about standalone BI tools and packages, what about your enterprise resource planning (ERP) package and its BI modules? Maybe you can relate to the unnamed higher ed institution that purchased proprietary BI components along with its other ERP apps eight years ago. Additional time, money, and effort were spent on proprietary training, and then technologists worked for three years to get the online analytical processing (OLAP) module into production before abandoning the BI project altogether. Yes, the project was eventually restarted: The proprietary OLAP module and enterprise data warehouse (EDW) are now being implemented, and this time the project is becoming operational. Not completed; just operational.

Luc Boudreau is a BI architect at the University of Montreal, Department of the Registrar Bureau, with previous experience implementing BI and a solid understanding of the open source paradigm. He puts it simply: "It's a myth that proprietary commercial software solutions are going to just 'work.'" In fact, I have served on software selection teams where proprietary vendors showed off their top-of-the-line features yet quoted only the base price. Not surprisingly, after a few of these meetings, I began to look at open source applications, and I noticed some dramatic differences between open source and proprietary apps. Many of the differences I describe below are not peculiar to BI tools, but the complexity of BI and its reach across the institution compound them, so they are worth summarizing in this context.

The Open Source Difference: Compounded in BI

Open is open. While it may seem either obvious or a non-issue, OS BI tools are, well, open source, and that's not trivial. Proprietary tools are essentially black boxes; institutions must trust their internal functionality. If there are implementation problems, clients are forced to trust the code (and, ultimately, the vendor) until, after many headaches and much overtime, they can prove beyond a shadow of a doubt that the problem lies within the black box.

Language advantages. On the other hand, when the source code is readily available (for free; no small point) and written in a standard, popular language rather than a unique proprietary one, it is simple to look into the "clear box" and verify the code when necessary. What's more, given the sheer number of components in a BI toolset, it is imperative to validate the configuration as it is built, rather than test only the completed system. (It's a combinatorics thing: the possible interactions multiply far faster than the components themselves.) And because open source code is written in widely used languages (Java, XML, PHP, etc.), not only is it easy to find staff who know them, but the code has an entire community of developers and testers continually finding faults and improving it: double assurance. (The proprietary BI tool, by contrast, is often written in a proprietary language, a black box inside another black box.)
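The case for validating a BI configuration as it is built, rather than testing only the finished system, rests on how quickly interactions multiply. What follows is a minimal sketch under two hypothetical assumptions: each pair of components shares one interface worth checking, and every component has the same number of configuration options. Under those assumptions, pairwise interfaces grow as n(n-1)/2, while whole-system configurations grow as k^n. The component list and the figure of three options per component are illustrative only and do not come from the evaluations described here.

```python
from math import comb


def pairwise_interfaces(n_components: int) -> int:
    """Count the distinct component pairs whose interface might need validating."""
    return comb(n_components, 2)


def system_configurations(n_components: int, options_per_component: int) -> int:
    """Count whole-system configurations when every component has the same number of options."""
    return options_per_component ** n_components


if __name__ == "__main__":
    # Hypothetical BI stack, built up two layers at a time
    stack = ["ETL", "warehouse", "OLAP", "reporting", "dashboard", "data mining"]
    for n in range(2, len(stack) + 1, 2):
        print(f"{n} components: {pairwise_interfaces(n)} interfaces, "
              f"{system_configurations(n, 3)} configurations")
```

With six components of three options each, this gives 15 pairwise interfaces but 729 whole-system configurations. Checking each new interface as a component is added keeps the work incremental, whereas exhaustively testing only the finished system confronts the full exponential count.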

Extensibility. Open code and a community that continually strengthens it are two OS BI factors that make the product completely extensible. It's not merely that campus developers or technologists can change the code to add a feature, enhancement, or correction. Because of the community paradigm of open source, you can prototype your enhancement and submit it to the open source developers. They, in turn, make it available to the community for testing and refinement. If you simply added it to your own installation, it might not remain compatible with the next open source release. With the open source BI paradigm, your enhancement (when fed back to the community) becomes part of the next release, and you don't have to maintain the enhancement code yourself. Additionally, you benefit from other users' enhancements as they submit their work to the community, which fosters a much faster pace of improvement for the OS BI toolset than for its proprietary counterparts.

The 'integration' factor. Only a few years ago, proprietary vendors used terms like "internally integrated" and "monolithic software" as if these characteristics gave them an advantage over what they called "patched-together" open source software. Most proprietary vendors now know better, especially after all the recent buyouts and mergers. From the very beginning, open source communities have used open standards and modular (rather than monolithic) designs, allowing best-of-breed components to be integrated as needed. For example, the LAMP stack (Linux, Apache, MySQL, PHP) became an early mainstay in web development circles. And although LAMP is still a powerful combination, replacing PHP with Perl or Ruby is not a difficult chore, because the stack components are modular and based on open architectures. Because OS BI tools share this modular, open architecture, you can typically pick and choose individual components; mix existing tools with in-house solutions; or export information to other databases and warehouses farther up the institutional chain, or to other external heterogeneous systems.

Support and M&As. Many proprietary vendors are now integrating separately developed products as well, but not necessarily because they see how well that approach works in the open source paradigm. The truth is, it's more a matter of streamlining and company downsizing. Can you say patched together? If a company that has an extraction, transformation, and loading (ETL) component merges with another company that has an ETL component, eventually one of the components will have to be retired. If you are the lucky customer, maybe your ETL (or dashboard, reporting, data mining, or analytics) tool is the one the vendor keeps.

Yet "lucky" may not be the correct word: Even if the company keeps your tool, you may lose features or support as the vendor works to assimilate the new product suite. And since the individual features or tools were not developed to stand alone, time-consuming issues and lost effectiveness may linger indefinitely. There is not much you can do to avoid such a fate, since the vendor owns the code and you are only permitted to use it. If you stop paying the annual support fee of 12 to 25 percent of the original purchase price (some end users call it a tax), you most assuredly won't receive any updates and, depending on the vendor, you may even be prohibited from using your currently installed software. When it comes to BI, vendor lock-in should be a primary concern. After all, the one thing you cannot afford is for all of your institution's strategic information and operational data to be held hostage by an external party.
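The 12 to 25 percent annual support fee adds up quickly. What follows is a minimal sketch, assuming a flat fee charged each year as a fixed fraction of the original purchase price (real contracts may escalate or be renegotiated). The $100,000 license price is a hypothetical figure, not one from the article.

```python
def cumulative_support_cost(purchase_price: float, annual_rate: float, years: int) -> float:
    """Total support fees paid over `years` at a flat fraction of the original price."""
    if not 0.0 <= annual_rate <= 1.0:
        raise ValueError("annual_rate must be a fraction between 0 and 1")
    if years < 0:
        raise ValueError("years must be non-negative")
    return purchase_price * annual_rate * years


def years_to_repurchase(annual_rate: float) -> float:
    """Years of support fees needed to equal the original purchase price."""
    if annual_rate <= 0.0:
        raise ValueError("annual_rate must be positive")
    return 1.0 / annual_rate


if __name__ == "__main__":
    price = 100_000  # hypothetical license price
    for rate in (0.12, 0.25):  # low and high ends of the quoted range
        total = cumulative_support_cost(price, rate, years=5)
        print(f"{rate:.0%}: ${total:,.0f} over 5 years; "
              f"fees equal the price after {years_to_repurchase(rate):.1f} years")
```

At the 25 percent end of the range, four years of fees equal the original license cost. At 12 percent, it takes a little over eight years.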

OS BI Advantages Over Proprietary: A Snapshot
  • Openness. No black box to have to trust.
  • Language. Code is written in common languages (Java, XML, etc.) that programmers already know or can learn.
  • Extensibility. OS BI is completely extensible.
  • Support. Multiple levels of support options with no vendor lock-in; can change or drop support at any time.
  • Standalone or module. Standalone modules integrate well with each other.
  • Database integration. Easier to integrate with external databases of choice.
  • Best-of-breed. Ability to use best-of-breed components.
  • Tested enhancement availability. Large community of developers/users providing enhancement features.
  • Product improvement. Faster product development.

Lance Walter, marketing VP for Pentaho, may have an open source bias, but it's hard to argue with his point: "'Single application' is quickly becoming a myth, given BI vendor consolidation. Cognos, Business Objects, and Hyperion all executed multiple acquisitions in the year before they were acquired by IBM, SAP, and Oracle, respectively, and in some cases their acquiring vendor had significant product overlap. It's possible to have a single vendor for business intelligence, but a single application on a large scale is pretty hard to achieve with the consolidators. Oracle has Brio [via the Hyperion acquisition], Sqribe [acquired by Brio in 1999], Siebel Analytics, and Hyperion in its portfolio. SAP has Business Objects, Cartesis [acquired by Business Objects in 2007], SRC [acquired by Business Objects in 2005], Crystal Reports [acquired by Business Objects in 2004], Xcelsius [acquired via Business Objects' buyout of Infomersion in 2005], etcetera. There's a wide range of functionality available, but there's no way you're going to just deploy a single application or technology infrastructure that can provide all of it."

This is a problem peculiar to proprietary software vendors, because with open source vendors you can drop support at any time and find another firm to support your software. (Yes, even with a complex toolset like BI.) In addition, you can opt not to purchase any support at all, keep the software, and still get version updates from the project community.

The Sales Connection

Impress the big guy (or gal). Another issue unique to dealing with proprietary vendors is the relationship that may develop between executive administrators and the sales representative. In my direct experience, campus executives generally want to retain decision authority, yet they are torn when it comes to purchasing decisions because of the apparent information disparity between vendor pitches and the recommendations and clarifications of internal IT staff. Let's face it: Vendor sales reps are trained to present more information, and to argue it more powerfully, than internal IT staffers, who may not be experienced at business arguments or internal marketing. And here's the sales rep, with intimate knowledge of his product and the industry sector, displaying the product's eye candy as if it were part of the basic system, when it actually adds another 20 percent to the price. Unfair, you say? Hey, I'm not making it up.

The OS BI pitch. But here's how that same scenario might play out with a typical commercial open source BI vendor: The software is free. The support, if purchased, is similar to the proprietary vendor's support, except that enhancement requests usually won't be billed at a $100-plus/hour development rate. Multiple vendor partners, whose role is to help the institution understand how to use the product and where it can add value for that particular institution, are available if desired. Both support providers and vendor partners know that their services can be terminated at any time without adverse effect on the institution. It is therefore in the vendors' best interest to be upfront, open, and honest about what the product currently can (or cannot) do, and about how it should be deployed to add the most value for the institution.

Looking at OS BI Implementations

David Jordan is a data warehouse architect at the Lineberger Comprehensive Cancer Center at The University of North Carolina at Chapel Hill, which employs about 300 staff. Jordan says he is happy with his decision to use open source for his BI toolset and would make the same choice again because "it is at least an order of magnitude cheaper for what we believe are equivalent features [compared to proprietary commercial BI products]." But cost isn't his only reason for the open source choice: Jordan also cites extensibility and vendor support.

The BI product evaluation. As for my own OS BI evaluation and installation experience, I saw the benefits starting right up front, with the self-assessment and product-evaluation stage. Typically, with proprietary enterprise-level applications, the requirements assessment and software evaluation are time-consuming processes. Technologists ordinarily look at all possible options relative to the assessed requirements, and then create a short list of three or four products worthy of detailed scrutiny.

With open source products, however, the evaluation process is relatively painless: Download it, install it, and run the demo. In my evaluation of the Pentaho BI suite, for instance, the installation process took less than 15 minutes in all, including looking at its online help (a wiki) to make sure I was doing everything correctly. (Had I actually read the instructions first, total installation time probably would have been closer to 10 minutes.) While I was on the download page, I also found an alternate demo version of the BI suite that didn't even require an installed database server. The complete database auto-install and configuration was included in one package that took less than five minutes to download. With my Firefox browser (also open source), I pointed to localhost:8080/pentaho as directed and was in the BI dashboard. I don't like calling it a demo, however, as this was not a simulation or a detailed image of a dashboard, but a dynamic web page that was driven by data from the warehouse. Jaspersoft and Talend products installed just as easily.

OS BI Implementation Checklist

This checklist differs from the one you'd use for proprietary BI. Can you spot the differences?

  • Determine need, and level of need
  • Demo a few products
  • Gain familiarity with OS BI components and community
  • Foster and obtain administrative buy-in
  • Find administrative champion or sponsor
  • Quantify current data distribution and mapping, if not already known
  • Agree on implementation priority and timeline
  • Form implementation team (including users of various levels of experience)
  • Decide which BI tool(s) to implement
  • Begin training on the system (whether you're a spender or a budget miser)
  • Validate current data integrity and accuracy before proceeding
  • Refine and standardize metadata
  • Begin extracting, transforming, and loading (ETL) data into BI system (could also be implemented after the next three modules)
  • Implement basic reporting capability
  • Implement analysis (analytics) capability
  • Implement dashboard services
  • Implement data mining
  • Implement workflow services
  • Get a raise and/or a big bonus

The point is this: Getting this much of a demo from a proprietary vendor can take several days of coordination and organization-wide schedule juggling, and, frankly, none of that even takes place until weeks or months after initial contact with the vendor's sales team. What's more, vendors usually prefer to coordinate with the VP for operations or the president of the school, and they usually want two days to show the product, with meetings scheduled over the course of those days.

When searching for a BI tool for his own center, Jordan relates, "Several vendors would not even allow us to do a hands-on evaluation. One vendor was going to charge us $15,000 for a 30-day evaluation."

Real-world pros and cons. And what about the extensibility that was so important to Jordan? Speaking of his open source solution, he explains, "Its architecture is designed to be extensible, allowing you to incorporate your own Java code if you need to. While we have added some JavaScript scripts in the ETL tool, we have not yet needed to do any enhancements via Java. But it is comforting to know that we can, if we need to. And since [our] software is implemented in Java, it easily runs in any of our computing environments."

As to roadblocks or obstacles to implementation, few were mentioned, but Boudreau at the University of Montreal does admit, "Data migration requires a bit of work; you have to get your hands in there." (He also sees that as an advantage over the proprietary black box "that you have to trust will work. We're talking about strategic information here.") And he points out that, as in any implementation, "Multiple issues will arise from the competence of your employees; you are not the only one working on a project. It requires a willingness to learn how the [open source] community works."

What about "selling" the administration on open source software? Says Boudreau, "It depends mostly on their perspectives on software: Some of our administrative people are aware that open source works. But you also have to be able to integrate the proprietary with the open source." It is usually the proprietary "black box" product that makes the integration difficult.

Jordan's only reservation: "There is a lack of sufficient documentation in some areas." But this, he claims, is offset by the training, which he terms "effective," and by the fact that his vendor representative was a pleasure to work with.

Joe Burden, health information systems project manager at the Jim Ayers Institute for Precancer Detection and Diagnosis at Vanderbilt University (TN), has over a decade of technology experience and has been implementing an open source BI solution for about six months. Burden says, "We chose open source because of a need for a low entry point [both in terms of funding and personnel] and to be able to share information and collaborate with others." He cites open standards, market stability, and flexible licensing as important considerations in his final decision to go with an open source product, and although the implementation is still under way, he is pleased with the product and support.

On the international front, Bernhard Pfeifer heads the Research Group for Biomedical Modeling at the Institute of Biomedical Engineering, University for Health Sciences, Medical Informatics, and Technology (Austria), which has about 110 staff and 18 researchers in bioinformatics. Pfeifer chose Talend OS BI and says, "It was an easy decision after evaluating two other proprietary products and another open source solution." He explains: "You can look into and adapt to all situations. With open source, you get a deeper insight, allowing the general development to go faster." Pfeifer applauds Talend's support: "They really cooperate with our special needs and goals."

Best Practices for OS BI Implementation

Heed these tips for successful open source-- or proprietary, for that matter-- deployments:

Begin early in the project to establish an internal team of competent, innovative thinkers representing various user, maintainer, and supervisory roles; individuals who have a willingness to work for the good of the project and not for personal agendas or departmental-only gains. Work with your superiors to gain their trust by showing integrity and thoroughness in your evaluations, observations, and recommendations. Without adhering to these two core tenets, there is little chance of success for any such complex project.

Get the OS BI message across. If you have "sole responsibility" for the campus BI project yet lack the final say on vendor selection, I feel for you. Remember, the proprietary vendor sales force tends to focus on those known to make final purchasing decisions, whether or not those individuals have a thorough understanding of IT. While open source products (now common in campus infrastructures) are generally accepted for less-than-enterprise user applications, there is still often resistance from top-level administration to open source at the enterprise-wide level. This is understandable, given the traditional approach to ERP evaluation and the low level at which program development once took place (which made programs extremely complex and time-consuming to write, hard to turn into a reliable product, and hard for users to understand). Today's high levels of abstraction and the elaborate, efficient open source community development paradigm simplify the development process.

But it is important that you, the technologist or campus executive who "gets" the importance of moving to open source at this enterprise-wide level, gently educate administrators about the benefits of open source in a product of this magnitude. UNC-Chapel Hill's Jordan, with his nearly 30 years of software architecture and development experience, did have a distinct advantage here: His administrative superior allowed him full control over the BI tool selection process. If this is not your situation, you may be swimming upstream until you can demonstrate the viability of open source user-level applications in general. Vanderbilt's Burden says, "From a corporate perspective, the biggest hurdle has been the lack of understanding of the open source product and support models."

Don't neglect policy and process. Few other enterprise-level applications can affect policy as much as an enterprise-wide BI system, so it is imperative that institutional policies, business rules, and technology tools are correctly melded. It may be wise to include, as part of the project-planning phase, time to rework the strategic and IT plans. Then, move forward to integrate them into a seamless overarching policy designed to guide knowledge workers toward making effective use of data, processes, and business intelligence.

"Will open BI work?" is a frequently asked question, along with "Will it add value to our data by transforming the data into intelligence?" Jordan believes it will: "We're going to discover some interesting and very useful patterns and relationships in our data by using [our open source data-mining] Machine Learning toolkit, which could guide us toward more effective research projects."

OS BI: Up-and-Comer

Not quite sure there's a future for open source BI? The University of Montreal's Boudreau is confident there is: "OS BI is 'growing up' to be a major player in the coming years. In fact, many of the serious open source companies are only about six years old and are already contending with the big players in the proprietary circles."

As it turns out, David Wells may indeed be right: Your institution's strategic IT plan for the next few years should include careful evaluation of OS BI. And if you're already involved in a BI product assessment process, don't overlook those easy-to-evaluate open source demos!

-Rob Byrd is a professor in the School of Information Technology and Computing at Abilene Christian University (TX) and a Certified Manager of Quality and Organizational Excellence (CMQ/OE).
