Hadoop Developer

2017-12-11 / Houston, TX

Develop shell scripts and Python wrappers as part of framework development to load data in batch and near-real-time processes.
Develop Scala scripts and UDFs in Spark, using DataFrames, Spark SQL, Datasets and RDDs, for data aggregation, queries and writing data back to the storage system.
Build solutions leveraging technologies including but not limited to Scala, Spark, Hive and AWS services such as EC2, S3, EMR, ELB, Kinesis, SNS, Redshift and Data Pipeline.
Create Hive external tables using the Accumulo connector and the MongoDB connector.
Experience creating Hive tables and loading data incrementally into them using dynamic partitioning.
Define Kafka topics and partitions based on the source systems and requirements.
Define the Storm spouts and bolts that read data from Kafka topics and write it to HDFS.
Participate in the complete project SDLC, including gathering requirements from the business and creating design documents (HLD, LLD).
Contribute to common framework development to create batch and near-real-time data pipelines for faster data ingestion from multiple relational databases and HDFS streamed data.
Fine-tune the Storm topology to determine and configure the right number of workers, buffer sizes, tuple sync counts and file rotation policies.
Define a unified JSON format for publishing messages from different sources to Kafka topics.
Create a Java Kafka producer to query RDBMS sources and publish the results to Kafka topics.
Prepare workflows using the Automation Engine (UC4) tool to schedule the data pipeline processes.
Create SLA documents and runbooks to help the Hadoop admins with operations and maintenance.
