Monday, August 13, 2012

Enterprise Data Architecture, SOA and Data Services



Decades of R&D and growth around EAI, ETL, MDM and SOA have led us to one conclusion alone: data matters. It has always been about data. I guess it's no secret that content is king of the consumer web, and data is king of enterprise software. 

Interestingly, enterprise data requirements are fundamentally too complex, and too closely driven by the high-volume, multi-dimensional nature of BI systems, to be serviced entirely from a messaging layer. Unfortunately, the data universe of DW, BI and MDM still lives outside the Enterprise SOA vision/roadmap, which significantly dents the overall ROI of IT/SOA expenditure.

Solution Architects (frequently under corporate obligations) often fail to notice that a significant chunk of the top-line business value of SOA resides in the data that the SOA suite pushes between applications and service components. SOA can efficiently orchestrate ETL/ELT modules to process bulk data, batch jobs or file transfers in near real-time. The SOA stack can also integrate BI modules to embed BI metrics within BPEL workflows, generate business events from BI alerts and embed analytics in business rules, thereby helping the business make smarter real-time decisions.
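To make the BI-in-the-workflow idea concrete, here is a minimal Python sketch. It is not BPEL and does not reflect any vendor's API: `BIModule`, `OrderWorkflow`, the `risk:<customer>` metric naming and the 0.7 threshold are all hypothetical. It shows a workflow step whose business rule consults a BI metric, and which turns a BI alert into a business event for downstream consumers.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BIModule:
    """Hypothetical stand-in for a BI module that serves precomputed metrics."""
    metrics: Dict[str, float]

    def metric(self, name: str) -> float:
        return self.metrics[name]


@dataclass
class BusinessEvent:
    name: str
    payload: dict


@dataclass
class OrderWorkflow:
    """Toy workflow step playing the role of a BPEL activity (assumption)."""
    bi: BIModule
    risk_threshold: float = 0.7  # assumed business-rule parameter
    events: List[BusinessEvent] = field(default_factory=list)

    def approve(self, order: dict) -> bool:
        # Business rule with embedded analytics: read the customer's risk score from BI.
        risk = self.bi.metric(f"risk:{order['customer']}")
        if risk >= self.risk_threshold:
            # The BI alert becomes a business event that other services can consume.
            self.events.append(
                BusinessEvent("HighRiskOrder", {"order": order["id"], "risk": risk})
            )
            return False
        return True


bi = BIModule({"risk:acme": 0.2, "risk:globex": 0.9})
wf = OrderWorkflow(bi)
print(wf.approve({"id": 1, "customer": "acme"}))    # True
print(wf.approve({"id": 2, "customer": "globex"}))  # False
print([e.name for e in wf.events])                  # ['HighRiskOrder']
```

In a real SOA suite the metric lookup would be a service call into the BI tier and the event would go onto the ESB; the sketch keeps both in-process so the control flow is easy to follow.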


SOA plays a significant role in the MDM and Information Architecture (IA) roadmap of any enterprise, large or small. Within the walls of IA, the SOA tier plays a pivotal role in Data Federation, which encompasses data massaging, validation, cleansing, consolidation and so on. However, XML is inherently inefficient at handling large volumes of data, and almost all SOA suites are built on Java containers. Together, these make a highly optimized caching server within the middleware (MW) tier a necessity if IA is to be federated using SOA. The added setup and maintenance cost of that caching server steers us toward using SOA Data Services alongside the Data Federation tier.
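A minimal sketch of the federation-plus-caching pattern described above, in Python. `FederatedCustomerView`, the two dictionary "sources" and the first-source-wins consolidation rule are hypothetical choices for illustration. The view cleanses and consolidates records from several sources and keeps a read-through cache in front of them, so repeated lookups do not hit the sources again.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]


class FederatedCustomerView:
    """Hypothetical federation tier: merges records from several sources behind a read-through cache."""

    def __init__(self, sources: List[Callable[[str], Optional[Record]]]):
        self.sources = sources
        self.cache: Dict[str, Record] = {}
        self.source_calls = 0  # counts how often the underlying sources are hit

    @staticmethod
    def cleanse(rec: Record) -> Record:
        # Data massaging: trim whitespace, normalise e-mail case, drop empty values.
        out: Record = {}
        for k, v in rec.items():
            v = v.strip()
            if k == "email":
                v = v.lower()
            if v:
                out[k] = v
        return out

    def get(self, customer_id: str) -> Record:
        if customer_id in self.cache:  # cache hit: no source access at all
            return self.cache[customer_id]
        merged: Record = {}
        for source in self.sources:
            self.source_calls += 1
            rec = source(customer_id)
            if rec:
                for k, v in self.cleanse(rec).items():
                    merged.setdefault(k, v)  # consolidation: first source wins per field
        self.cache[customer_id] = merged
        return merged


crm = {"c1": {"name": " Jane Doe ", "email": "JANE@EXAMPLE.COM", "phone": ""}}
billing = {"c1": {"email": "jane@old.example", "phone": "555-0100"}}

view = FederatedCustomerView([crm.get, billing.get])
print(view.get("c1"))     # {'name': 'Jane Doe', 'email': 'jane@example.com', 'phone': '555-0100'}
view.get("c1")            # served from the cache
print(view.source_calls)  # 2
```

A production caching server would add eviction, invalidation and distribution across nodes; the plain dictionary here only illustrates why a cache in the MW tier spares the sources from repeated, expensive federation work.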

SOA Data Services are not just services operating on XML. Rather, they are enterprise data management end-points that expose highly optimized engines for working on all types of data. The data service as a concept predates SOA; in fact, it dates back to the early 1970s, when B2B e-commerce and EDI were gaining momentum in the business space. 

Technically, a data service should exhibit two or more of the following attributes:
  • Contract-based routing, 
  • Declarative API, 
  • Data encapsulation, 
  • Data abstraction and 
  • Service metadata. 
A close look at the attributes above reveals the direct relation between Data Services and the basic pillars of SOA. However, we should never confuse SOA with a unified data tier. 
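One possible reading of these five attributes, sketched in Python. The interpretation of each attribute and every name (`Contract`, `DataService`, the `customer/v1` keys) are assumptions for illustration. The underlying rows are private (encapsulation). Consumers see only the logical fields a contract promises (abstraction) and state *what* they want as equality filters (declarative API). A contract key selects the handler (contract-based routing), and the service publishes which contracts it supports (service metadata).

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


@dataclass(frozen=True)
class Contract:
    name: str
    version: str
    fields: Tuple[str, ...]  # logical fields promised to consumers


class DataService:
    """Hypothetical data service illustrating the five attributes listed above."""

    def __init__(self, rows: List[Row]):
        self._rows = rows  # encapsulated store; consumers never access it directly
        self._handlers: Dict[str, Callable[[Row], List[Row]]] = {}
        self._contracts: Dict[str, Contract] = {}

    def register(self, contract: Contract) -> None:
        def handler(where: Row) -> List[Row]:
            # Declarative API: the caller says what to match, not how to find it.
            hits = [r for r in self._rows if all(r.get(k) == v for k, v in where.items())]
            # Abstraction: project onto the contract's logical fields only.
            return [{f: r.get(f) for f in contract.fields} for r in hits]

        key = f"{contract.name}/{contract.version}"
        self._handlers[key] = handler
        self._contracts[key] = contract

    def query(self, contract_key: str, where: Row) -> List[Row]:
        # Contract-based routing: the contract key picks the handler.
        return self._handlers[contract_key](where)

    def metadata(self) -> Dict[str, List[str]]:
        # Service metadata: which contracts exist and what each one exposes.
        return {k: list(c.fields) for k, c in self._contracts.items()}


svc = DataService([
    {"id": 1, "name": "Acme", "region": "EU", "internal_score": 42},
    {"id": 2, "name": "Globex", "region": "US", "internal_score": 7},
])
svc.register(Contract("customer", "v1", ("id", "name")))
svc.register(Contract("customer", "v2", ("id", "name", "region")))
print(svc.query("customer/v1", {"region": "EU"}))  # [{'id': 1, 'name': 'Acme'}]
print(svc.metadata())
```

Versioned contract keys also let old and new consumers coexist, which is one way the attributes line up with SOA's emphasis on loose coupling.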

The value of current, trusted information is more evident today than ever before. “The two second advantage”[REF-1] is indeed a game changer. I have worked closely on enterprise architectures where SOA was employed as the means of integration between ETL, BI, MDM and web applications. By building enterprise components strategically and then creating elegant, unified integration solutions using SOA, enterprises are more likely to achieve the following objectives:
  • reduce the TCO for data (MDM), 
  • increase agility (hot-pluggable SOA), 
  • improve performance (real-time batch/DW), 
  • reduce risk (integrated human workflow) and 
  • improve business insight (BI)

A mature Data Integration Architecture is the foundation of a strong Enterprise Architecture (EA). The success or failure of application integration and SOA composites is directly related to the maturity of the data integration tier.

Enterprise-level Data Services can be broadly grouped into the following categories:

  • Master Data Services can form a trusted single point of reference during real-time or batch data movements. 
  • Batch Data Services can work closely with BPEL/ESB so that the point of control for invoking/running ETL remains inside the SOA tier, much like the classic delegation design pattern, providing architectural loose coupling. 
  • Data Quality Services are typically used inline with other data services or, failing that, applied statically and directly against the data source. These data services use predefined rules and algorithms to clean, reformat and de-duplicate data. 
  • Data Transformation Services are classic data services used primarily to convert data formats to meet downstream system requirements. They call for a critical look at the ESB and ETL tiers within the EA landscape in order to choose between them in an appropriate and architecturally efficient way. 
  • Finally, Event Services are driven primarily by EDA/CEP; these data services track, correlate and propagate data within the MW tier on certain predefined triggers.
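The Data Quality Services category lends itself to a short illustration. The Python sketch below applies an ordered list of predefined rules (trim, reformat phone numbers to digits, lower-case e-mail), then de-duplicates on a match key with the first record winning. The rule set, the e-mail match key and the name `data_quality_service` are all assumptions; real data quality engines use far richer matching (fuzzy names, address standardization and so on).

```python
import re
from typing import Callable, Dict, List

Record = Dict[str, str]
Rule = Callable[[Record], Record]


def trim_all(r: Record) -> Record:
    # Clean: strip surrounding whitespace from every value.
    return {k: v.strip() for k, v in r.items()}


def normalise_phone(r: Record) -> Record:
    # Reformat: keep digits only, e.g. "(555) 010-0100" -> "5550100100".
    out = dict(r)
    if "phone" in out:
        out["phone"] = re.sub(r"\D", "", out["phone"])
    return out


def lower_email(r: Record) -> Record:
    out = dict(r)
    if "email" in out:
        out["email"] = out["email"].lower()
    return out


def data_quality_service(
    records: List[Record], rules: List[Rule], key: Callable[[Record], str]
) -> List[Record]:
    """Apply predefined rules in order, then de-duplicate on a match key (first record wins)."""
    seen = set()
    clean: List[Record] = []
    for rec in records:
        for rule in rules:
            rec = rule(rec)
        k = key(rec)
        if k not in seen:
            seen.add(k)
            clean.append(rec)
    return clean


raw = [
    {"name": "Jane Doe ", "email": "Jane@Example.com", "phone": "(555) 010-0100"},
    {"name": "Jane Doe", "email": "jane@example.com ", "phone": "555.010.0100"},
    {"name": "John Roe", "email": "john@example.com", "phone": "555 010 0199"},
]
result = data_quality_service(raw, [trim_all, normalise_phone, lower_email], key=lambda r: r["email"])
print(len(result))         # 2
print(result[0]["phone"])  # 5550100100
```

Because the service takes its rules as data, the same function can run inline inside another data service's pipeline or as a standalone pass over a source table, matching the two usage modes described above.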
In closing, I would recommend the following approach/pointers for prospective adopters of SOA-based Data Services:
  1. Start small – don’t go big-bang
  2. Use XML judiciously – explore other means
  3. Evaluate trade-offs diligently for any conflicting solutions/approaches
  4. Don’t be a victim of ‘The Accidental Architecture’
  5. The devil is in the details – take a deep dive
Irrespective of the visible and obvious integration points in the existing EA landscape, one must understand that the true value of the architecture tier resides in the flexibility of the core infrastructure to be reconfigured with minimal resource overhead. This re-configurability is a central characteristic of a Data Services approach, and the foundation of a successful long-term strategy for Enterprise Data Architecture. 
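One simple way to picture re-configurability is configuration-driven binding. In the hypothetical Python sketch below, consumers call a logical service name (`CustomerLookup`), and a JSON configuration decides which provider backs it. Switching from a warehouse-backed lookup to an MDM hub is then a configuration reload, not a code change in every consumer. The provider functions, registry class and configuration shape are illustrative assumptions, not a specific product's mechanism.

```python
import json
from typing import Callable, Dict


# Hypothetical providers that could sit behind one logical data service.
def customers_from_warehouse(cid: str) -> dict:
    return {"id": cid, "source": "warehouse"}


def customers_from_mdm_hub(cid: str) -> dict:
    return {"id": cid, "source": "mdm"}


PROVIDERS: Dict[str, Callable[[str], dict]] = {
    "warehouse": customers_from_warehouse,
    "mdm": customers_from_mdm_hub,
}


class ServiceRegistry:
    """Binds logical service names to providers from configuration."""

    def __init__(self, config_json: str):
        self.reload(config_json)

    def reload(self, config_json: str) -> None:
        # Rebinding happens here; callers of call() are unaffected.
        cfg = json.loads(config_json)
        self._bindings = {name: PROVIDERS[p] for name, p in cfg["bindings"].items()}

    def call(self, service: str, *args):
        return self._bindings[service](*args)


registry = ServiceRegistry('{"bindings": {"CustomerLookup": "warehouse"}}')
print(registry.call("CustomerLookup", "c1"))  # {'id': 'c1', 'source': 'warehouse'}
registry.reload('{"bindings": {"CustomerLookup": "mdm"}}')
print(registry.call("CustomerLookup", "c1"))  # {'id': 'c1', 'source': 'mdm'}
```

The consumer-facing name stays stable while the binding changes, which is the "minimal resource overhead" property the paragraph above argues for.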



References:
REF-1: “The Two-Second Advantage” is a concept coined by Vivek Ranadivé, founder of TIBCO.