Case Study: Elegant MicroWeb Creates Custom Apache Spark ETL Utility for Global Data Management and Analytics Firm
The Client is a leader in enterprise transformation and data engineering, and an acknowledged world-class Ab Initio delivery partner. The Client has offices in India, the United States and the United Kingdom, and provides global services focused on data integration, data analytics and data visualization.
The Client wanted a standalone ETL utility that would join two Type Unit Data Format (TUDF) files and separate updated records, new records and invalid records. The utility had to be driven from the command line with the two TUDF input files as parameters, process those files and generate output files, while handling more than sixty (60) million records and more than 3 GB of files at a data processing rate of more than 10,000 TPS.
The Elegant MicroWeb team designed an Apache Spark architecture, developed custom readers for the TUDF file format, built a complex validation and data processing workflow, designed TUDF definitions using Spark UDFs, and designed a process to manage and process records and generate output, including log generation.
The team carried out comprehensive performance testing to verify performance and dependability, and trained the Client team so that the solution and its support could be handed over to the Client's technology and implementation teams. The resulting solution delivers scalability and performance and meets the Client's requirements for records management and processing.
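The record separation described above (new, updated and invalid records after joining two inputs on a key) can be illustrated with a minimal sketch. This is not the Client's implementation: the production utility ran on Apache Spark with custom TUDF readers, whereas the sketch uses plain Python dictionaries to show only the classification logic. The field names `id` and `name`, the validation rule, the `classify` function and the extra "unchanged" bucket are all assumptions made for illustration, since the case study does not describe the TUDF layout.

```python
from dataclasses import dataclass, field

# Hypothetical record layout: the case study does not describe TUDF fields.
KEY = "id"
REQUIRED = ("id", "name")


@dataclass
class Result:
    new: list = field(default_factory=list)
    updated: list = field(default_factory=list)
    unchanged: list = field(default_factory=list)  # assumed extra bucket
    invalid: list = field(default_factory=list)


def is_valid(record):
    """Assumed rule: every required field is present and non-empty."""
    return all(record.get(name) not in (None, "") for name in REQUIRED)


def classify(existing, incoming):
    """Join incoming records to existing ones on KEY and bucket them.

    Existing records are assumed to have been validated already.
    """
    by_key = {rec[KEY]: rec for rec in existing}
    result = Result()
    for rec in incoming:
        if not is_valid(rec):
            result.invalid.append(rec)
        elif rec[KEY] not in by_key:
            result.new.append(rec)
        elif rec != by_key[rec[KEY]]:
            result.updated.append(rec)
        else:
            result.unchanged.append(rec)
    return result


if __name__ == "__main__":
    old = [{"id": "1", "name": "a"}, {"id": "2", "name": "b"}]
    fresh = [{"id": "2", "name": "B"}, {"id": "3", "name": "c"}, {"id": "", "name": "x"}]
    print(classify(old, fresh))
```

In a Spark job, a comparable split is commonly expressed as an outer join on the key followed by filters on each side of the join; whether the Client's utility was built that way is not stated in the case study.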