· Develop shell scripts and Python wrappers as part of framework development to load data in batch and near real-time processes.
· Develop Scala scripts and UDFs using DataFrames, Datasets, Spark SQL and RDDs in Spark for data aggregation, queries and writing data back to the storage system (illustrative sketch below).
· Engage in building solutions leveraging technologies including but not limited to Scala, Spark, Hive and AWS services such as EC2, S3, EMR, ELB, Kinesis, SNS, Redshift and Data Pipeline.
· Create Hive external tables using the Accumulo connector and the MongoDB connector.
· Create Hive tables and load data incrementally into them using dynamic partitioning (sketched below).
· Define Kafka topics and partitions based on the source systems and requirements.
· Define Storm spouts and bolts to read data from Kafka topics and write it into HDFS (sketched below).
· Handle the complete SDLC of the project, including requirements gathering from the business and creation of design documents (HLD, LLD).
· Contribute to common framework development to create batch and near real-time data pipelines for faster data ingestion from multiple relational databases and streamed HDFS data.
· Fine-tune the Storm topology to determine and configure the right number of workers, buffer sizes, tuple sync counts and file rotation policies.
· Define a unified JSON format to publish messages into Kafka topics from different sources.
· Create a Java Kafka producer to query RDBMS sources and publish records to Kafka topics (sketched below).
· Prepare workflows using the Automation Engine (UC4) tool to schedule the data pipeline processes.
· Create SLA documents and runbooks to help the Hadoop admins with operations and maintenance.
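
A minimal sketch of the kind of Spark aggregation work described above, assuming hypothetical S3 paths and column names (region, sale_date, amount); it shows a DataFrame aggregation with a small UDF and a partitioned write back to the storage system.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum, udf}

// Sketch of a Spark aggregation job; paths and column names are hypothetical.
object SalesAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sales-aggregation")
      .enableHiveSupport()
      .getOrCreate()

    // Example UDF: normalize a region code before grouping.
    val normalizeRegion = udf((region: String) => Option(region).map(_.trim.toUpperCase).orNull)

    val sales = spark.read.parquet("s3://example-bucket/raw/sales/")  // hypothetical S3 location

    val dailyTotals = sales
      .withColumn("region", normalizeRegion(col("region")))
      .groupBy("region", "sale_date")
      .agg(sum("amount").as("total_amount"))

    // Write the aggregated result back to storage, partitioned by date.
    dailyTotals.write
      .mode("overwrite")
      .partitionBy("sale_date")
      .parquet("s3://example-bucket/curated/daily_sales/")

    spark.stop()
  }
}
```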
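A sketch of incremental loading into a dynamically partitioned Hive table, driven from Spark SQL; the database, table and column names are illustrative, not from the original project.

```scala
import org.apache.spark.sql.SparkSession

// Sketch of incremental loads into a dynamically partitioned Hive table via Spark SQL.
object IncrementalHiveLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("incremental-hive-load")
      .enableHiveSupport()
      .getOrCreate()

    // Allow Hive to derive partition values from the data itself.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // External table over an S3 location, partitioned by date.
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS analytics.orders (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DOUBLE
      )
      PARTITIONED BY (order_date STRING)
      STORED AS PARQUET
      LOCATION 's3://example-bucket/warehouse/orders/'
    """)

    // Each incremental batch writes only the partitions present in the staging data.
    spark.sql("""
      INSERT INTO TABLE analytics.orders PARTITION (order_date)
      SELECT order_id, customer_id, amount, order_date
      FROM analytics.orders_staging
    """)

    spark.stop()
  }
}
```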
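The producer described above was written in Java; the sketch below expresses the same pattern in Scala for consistency with the other snippets, with hypothetical broker, JDBC and topic settings, and a simple unified JSON envelope (source system, entity, payload fields) for the published messages.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Sketch of a producer that queries an RDBMS source and publishes rows to a Kafka topic as JSON.
// Broker, JDBC connection and topic names are hypothetical.
object OrdersProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)

    val conn = java.sql.DriverManager.getConnection(
      "jdbc:postgresql://db-host/orders", "app_user", "app_password")
    val rs = conn.createStatement().executeQuery(
      "SELECT order_id, amount FROM orders WHERE updated_at > now() - interval '5 minutes'")

    while (rs.next()) {
      // Unified JSON envelope: source system, entity type and payload fields.
      val json =
        s"""{"source":"orders_db","entity":"order","order_id":${rs.getLong("order_id")},"amount":${rs.getDouble("amount")}}"""
      producer.send(new ProducerRecord[String, String]("orders-topic", rs.getLong("order_id").toString, json))
    }

    producer.flush()
    producer.close()
    conn.close()
  }
}
```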
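A sketch of how a Kafka spout and an HDFS bolt might be wired together and tuned (worker count, max spout pending, tuple sync counts and file rotation), assuming the storm-hdfs bolt API; the spout construction is left as a placeholder because it depends on the Kafka client version, and all names and sizes are illustrative.

```scala
import org.apache.storm.{Config, StormSubmitter}
import org.apache.storm.topology.TopologyBuilder
import org.apache.storm.hdfs.bolt.HdfsBolt
import org.apache.storm.hdfs.bolt.format.{DefaultFileNameFormat, DelimitedRecordFormat}
import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy
import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy.Units
import org.apache.storm.hdfs.bolt.sync.CountSyncPolicy

// Sketch of a Storm topology that reads from Kafka and writes to HDFS.
object IngestTopology {
  def main(args: Array[String]): Unit = {
    val hdfsBolt = new HdfsBolt()
      .withFsUrl("hdfs://namenode:8020")
      .withFileNameFormat(new DefaultFileNameFormat().withPath("/data/ingest/"))
      .withRecordFormat(new DelimitedRecordFormat().withFieldDelimiter("|"))
      .withSyncPolicy(new CountSyncPolicy(1000))                        // sync to HDFS every 1000 tuples
      .withRotationPolicy(new FileSizeRotationPolicy(128.0f, Units.MB)) // roll files at 128 MB

    val builder = new TopologyBuilder()
    builder.setSpout("kafka-spout", buildKafkaSpout(), 4)  // buildKafkaSpout(): hypothetical helper
    builder.setBolt("hdfs-bolt", hdfsBolt, 4).shuffleGrouping("kafka-spout")

    val conf = new Config()
    conf.setNumWorkers(4)          // number of worker JVMs
    conf.setMaxSpoutPending(2000)  // cap on in-flight tuples per spout task
    StormSubmitter.submitTopology("ingest-topology", conf, builder.createTopology())
  }

  // Placeholder; the real spout would be configured against the source Kafka topics.
  def buildKafkaSpout(): org.apache.storm.topology.IRichSpout = ???
}
```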