How A Converged Architecture Improves Big Data Applications

This article is more than 7 years old.

“You can’t handle the truth.” We all remember Jack Nicholson’s iconic words in A Few Good Men as a seminal moment in pop culture. Yet, that statement has a lot of relevance for companies and their data architecture. When they look at their architecture, businesses should be asking themselves: “Can we handle the truth?”

Big data has created a new reality in which companies have more data at their disposal than ever before. However, having more data doesn’t mean you’re any closer to discovering the real truth. If you don’t have the proper infrastructure in place to seamlessly integrate all your data, from all your sources, you risk not only producing data silos, but also making decisions based on partial truths. For businesses of all sizes, this can have a dramatic impact. For instance, a clothing chain might be able to review customer transactions, but not customers’ browsing history, meaning it has only limited insight into customer behavior. Having customer info spread across multiple applications, none of which are speaking to each other, can greatly impair the effectiveness of your data analytics.

There is an alternative. A converged architecture changes basic assumptions about applications. In the current application paradigm, batch, transactional, and streaming applications are all kept separate. With a converged architecture, you can combine all or some of those application patterns on one platform to arrive at a single version of the truth. The availability of a converged architecture allows companies to foster creativity and spur innovation. Let’s take a look at how employing this architecture can improve your business.

The Trend Toward Converged Everything

The idea of convergence is closely related to the notion of creating a product. As I pointed out in “Big Data 2.0”, my view is that the primary way that big data and advanced analytics technology will penetrate the wider business world, that is, the early and late majority segments, will be through products as opposed to general-purpose platforms. The early and late majority segments just don’t have the engineering talent to cobble together, from a general-purpose tool, a system that fits their needs perfectly and evolves to meet new needs. They need a product that fits their use case and can be configured to get the job done.

The idea of convergence is another way to talk about an advanced form of productization. Separate entities must exist before they can be converged. For example, SAP and Oracle created their massive business suites by converging lots of separate applications into a larger integrated package. The goal was to productize the integration between all of the TLA applications that were combined into the suites such as ERP, CRM, MRP, HRM, SCM, and so on. Of course, convergence is a lofty goal and many would say that these business suites didn’t really converge into that pretty of a product. But given that SAP and Oracle are two of the top three software companies in the world, perhaps convergence did succeed just a little bit.

Convergence can also be seen as a continuation of the debate about best of breed vs. integrated solutions, a technology conversation that never stops. The advocates of convergence argue that it is possible to productize the integration between many components and create a product that still fits a wide scope of requirements well. Best of breed advocates argue that you must get a good fit for each particular use case and then knit everything together yourself. A great idea, if you have the skills, money, and energy to do so.

The latest use of convergence outside of the big data realm is in the infrastructure space. Cisco, for example, offers its UCS (Unified Computing System) as converged infrastructure that combines compute, storage, and networking into a fully integrated package.

But convergence does not succeed unless the package has a good fit to your needs and you actually save money and go faster because of the productized integration. If I am right about my prediction that big data adoption will be driven by products, then we should be able to find evidence of the benefits of convergence in big data applications built with those products.

The Agony and Anguish of Hadoop

Many enterprises have challenges with their architecture because they adopt Hadoop under mistaken assumptions. Hadoop is not an architecture in and of itself, as is commonly believed. Rather, it’s a software platform that is more like a raw toolkit for managing data than a fully-fledged, enterprise-ready system. Therefore, creating a converged data platform means adding to that toolkit the solutions and integration needed to solve the problems enterprises have with data management and analysis. A high-quality converged architecture allows new use cases to be supported in ways that formerly would have been difficult. To be meaningful, convergence must accelerate the creation of just the right kind of application you need. In addition, convergence should make it easier to build apps that were never built before.

So, in the big data space, a converged architecture will in effect productize Hadoop by using the raw open-source components, adding what’s missing, and then productizing the integration to create a machine that meets a wide scope of use cases. We all know that Hadoop enables enterprises to store their big data economically using a scale-out architecture, but it needs some work to make it an enterprise-ready product for the wider business world. A converged Hadoop architecture should provide the power of Hadoop with the agility of an integrated data platform.

Making Sure You See Everything

It is not that a converged architecture makes different promises than a best of breed approach or one that uses a general-purpose platform. The advantage of a converged architecture is that the chance of those promises coming true should be much higher. For example, a converged architecture allows everyone on your team to improve data quality and categorization and to ensure that the view you’re getting of your data is accurate and complete. For instance, the marketing team can enhance the data they use with certain attributes while the sales team enhances it with their own attributes, providing greater data quality, relevance, and depth for the organization as a whole. Additionally, by merging the data into a centralized repository, you can get a record-by-record view of how the data was changed and by whom, as well as who is using it. The result is a holistic view of your data, free of data and departmental silos. All your data is integrated into one place so you can feel confident you’re getting the truth and not just a version of it. Jack Nicholson would be so proud.
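To make the shared-repository idea concrete, here is a minimal Python sketch. It is not drawn from any vendor product: the `CentralRepository` class, its method names, and the sample customer attributes are all hypothetical, and a real platform would persist this data rather than hold it in memory. It only illustrates how two teams can enrich the same record while every change is logged with who made it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Change:
    """One logged modification to a single attribute of a single record."""
    record_id: str
    team: str
    attribute: str
    old_value: object
    new_value: object
    at: datetime


@dataclass
class CentralRepository:
    """Hypothetical in-memory stand-in for a centralized data repository."""
    records: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def enrich(self, record_id, team, **attributes):
        # Create the record on first use, then apply each attribute,
        # logging the previous value and the team that changed it.
        record = self.records.setdefault(record_id, {})
        for name, value in attributes.items():
            self.history.append(
                Change(record_id, team, name, record.get(name), value,
                       datetime.now(timezone.utc))
            )
            record[name] = value

    def changes_for(self, record_id):
        # Record-by-record view of what changed and who changed it.
        return [c for c in self.history if c.record_id == record_id]


repo = CentralRepository()
# Illustrative attribute names; each team adds what it knows.
repo.enrich("cust-1", "marketing", segment="outdoor", last_campaign="spring")
repo.enrich("cust-1", "sales", account_owner="alice", segment="outdoor-premium")

for change in repo.changes_for("cust-1"):
    print(change.team, change.attribute, change.old_value, "->", change.new_value)
```

Because both teams write to one record instead of to separate copies, the sales team's update to `segment` is visible to marketing immediately, and the log shows which team made it.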

Use Cases

[Disclosure: I have done content marketing and IT consulting work for MapR, Teradata, Hortonworks, Altiscale, Datameer, and a variety of other big data- and Hadoop-related companies.]

So if this isn’t hyperbole, we should be able to see the benefits a converged architecture offers enterprises in the real world. MapR has been the most aggressive of the Hadoop vendors in pursuing a converged architecture by adding missing components, replacing existing ones, and productizing the integration of a variety of Hadoop-related products. So, I asked MapR to show me a few applications that illustrated concrete benefits of a converged product. Here are a few examples of what convergence has meant for specific applications:

  • Altitude Digital is one of the fastest-growing advertising platforms in the industry. Despite having to manage nearly seven billion transactions per day, Altitude Digital uses a converged architecture to select, in real time, the best video advertisement to play at the right time for every person. MapR’s Converged Data Platform allows Altitude Digital to blend file operations, table operations, batch and real-time processing, and structured and unstructured data on one platform. In the fast-growing and competitive advertising industry, the convergence in Altitude Digital’s application means that it has a richer and more up-to-date model of the video ad, its history, and the target customer, so a better targeting decision can be made.
  • National Oilwell Varco is a $23 billion multinational company that is a leading provider of oil equipment, components, and services. The company relies on MapR to perform real-time analysis to optimize oil and gas drilling and production. With hundreds of billions of data points to contend with, NOV required the converged data platform to support time series analysis and access by common tools such as Apache Spark and Drill for near-instantaneous visualization of months and years of data. One of the first business use cases for the converged data platform is to power Condition-Based Maintenance. With convergence, all sensor data are now accessible by any authorized user or application at any time for analytics, machine learning, and visualization. With a much more robust, real-time view, maintenance staff can get ahead of problems and avoid downtime.
  • Xactly provides secure, cloud-based incentive compensation and sales performance management solutions. With the business rapidly expanding and volume growing from a half-billion to billions of transactions per month, its traditional technology platform could not cost-effectively keep up with its growth trajectory. Xactly required MapR’s Converged Data Platform not only to scale cost-effectively but also to develop a new web app for companies to easily design, manage, and optimize incentive compensation programs. With the converged platform, Xactly was able to run file, database, and streaming analytics on the same cluster, which enabled its customers to quickly compare comp plans to those of similar businesses and ultimately save time, reduce risk, and motivate and align employee behaviors with corporate goals.
  • A leading financial services company has gone through three phases of convergence. In the first phase, the company started with little insight into customer activities on the web; it would just collect info from its site and integrate that data with information about clients from other systems. That data needed to be available and protected, which required Hadoop converged with enterprise storage capabilities to ensure HA, DR, and data corruption prevention. In the second phase, the company required dynamic content on the website: when customers logged in, the company knew what they were interested in and presented smart banners, customized content, and special offers. The second phase required real-time database capabilities integrated into the platform. However, this content and insight were not integrated with other customer-facing systems, such as the call center, where updates lagged by weeks. The third phase of convergence was all about real time. All customer-facing applications are in sync and updated in real time. Customer activity on the web is immediately reflected in the customer support applications, saving time and improving service. This final phase of convergence integrates streaming data and analytics to synchronize data flows and real-time responses.
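The third phase in the financial services example, where web activity shows up in the call center right away, can be sketched in a few lines of Python. This is an illustration only, not the company's or MapR's implementation: `EventStream` is a hypothetical in-process stand-in for a real streaming system, and the topic name, `CustomerView` class, and event fields are assumptions made for the example.

```python
from collections import defaultdict


class EventStream:
    """Hypothetical in-process stand-in for a streaming platform."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # Deliver the event to every subscribed application immediately,
        # so no customer-facing system waits on a later batch load.
        for handler in self._subscribers[topic]:
            handler(event)


class CustomerView:
    """A customer-facing application's picture of customer interests."""

    def __init__(self, name):
        self.name = name
        self.interests = defaultdict(list)

    def on_activity(self, event):
        self.interests[event["customer_id"]].append(event["page"])


stream = EventStream()
website = CustomerView("website")
call_center = CustomerView("call_center")

# Both applications consume the same stream of web activity.
stream.subscribe("web_activity", website.on_activity)
stream.subscribe("web_activity", call_center.on_activity)

# Illustrative event: a customer browses a product page on the site.
stream.publish("web_activity", {"customer_id": "c42", "page": "retirement-plans"})

print(call_center.interests["c42"])
```

The design point is that the website and the call center read from one shared stream rather than each receiving its own delayed copy of the data, which is what keeps them in sync.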

These examples bring to life what convergence can mean in the big data space. I believe that all of the Hadoop vendors will eventually stop being so focused on open-source purity and will realize that they need to make things easier on the customer. In my view, that is a vision for a big data product that is as converged as possible on customer needs.

Dan Woods is on a mission to help people find the technology they need to succeed. Users of technology should visit CITO Research, a publication where early adopters find technology that matters. Vendors should visit Evolved Media for advice about how to find the right buyers. See list of Dan's clients on this page.

 
