Job Title: Data Engineer (Iceberg Experience) – 8+ years of overall experience.
Job Description: As a Data Engineer with Iceberg experience, you will play a crucial role in the design, development, and maintenance of our data infrastructure.
Your work will empower data-driven decision-making and contribute to the success of our analytics initiatives.
Key Responsibilities: Data Integration: Develop and maintain data pipelines to extract, transform, and load (ETL) data from various sources into AWS data stores for both batch and streaming data ingestion.
AWS Expertise: Utilize your expertise in AWS services such as Amazon EMR, S3, AWS Glue, Amazon Redshift, AWS Lambda, and more to build and optimize data solutions.
Data Modeling: Design and implement data models to support analytical and reporting needs, ensuring data accuracy and performance.
Data Quality: Implement data quality and data governance best practices to maintain data integrity.
Performance Optimization: Identify and resolve performance bottlenecks in data pipelines and storage solutions to ensure optimal performance.
Documentation: Create and maintain comprehensive documentation for data pipelines, architecture, and best practices.
Collaboration: Collaborate with cross-functional teams, including data scientists and analysts, to understand data requirements and deliver high-quality data solutions.
Automation: Implement automation processes and best practices to streamline data workflows and reduce manual interventions.
Experience working with big-data ACID table formats to build a delta lake, particularly with Apache Iceberg and its data-loading methods.
Good knowledge of Iceberg functionality, including using its incremental/changelog features to identify changed records, and performing optimization and housekeeping on Iceberg tables in the data lake.
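The Iceberg optimization and housekeeping tasks mentioned above are typically run as Spark SQL procedures. A minimal sketch (the catalog name `demo` and table `db.events` are hypothetical; available procedures depend on the Iceberg and Spark versions in use):

```sql
-- Compact small data files to improve scan performance (optimization)
CALL demo.system.rewrite_data_files(table => 'db.events');

-- Expire old snapshots to limit metadata and storage growth (housekeeping)
CALL demo.system.expire_snapshots(
  table => 'db.events',
  older_than => TIMESTAMP '2024-01-01 00:00:00'
);

-- Delete files no longer referenced by any table snapshot
CALL demo.system.remove_orphan_files(table => 'db.events');
```

Changed records between two table states can similarly be read incrementally using Spark's `start-snapshot-id` and `end-snapshot-id` read options on an Iceberg table.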
Must have: AWS, ETL, EMR, Glue, Spark/Scala, Java, Python.
Good to have: Cloudera (Spark, Hive, Impala, HDFS), Informatica PowerCenter, Informatica DQ/DG, Snowflake, Erwin.
Qualifications: Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field.
5 to 8 years of experience in data engineering, including working with AWS services.
Proficiency in AWS services like S3, Glue, Redshift, Lambda, and EMR.
Knowledge of Cloudera-based Hadoop is a plus.
Strong ETL development skills and experience with data integration tools.
Knowledge of data modeling, data warehousing, and data transformation techniques.
Familiarity with data quality and data governance principles.
Strong problem-solving and troubleshooting skills.
Excellent communication and teamwork skills, with the ability to collaborate with technical and non-technical stakeholders.
Knowledge of best practices in data engineering, scalability, and performance optimization.
Experience with version control systems and DevOps practices is a plus.