Lead Data Engineer (Python/AWS)
Indeed
Full-time
Onsite
No experience requirement stated
No degree requirement stated
Description

**Summary:** Seeking an experienced Lead Data Engineer proficient in PySpark to build ETL pipelines and data lake architectures on AWS, integrating data from diverse enterprise sources.

**Highlights**

1. Lead data engineering initiatives on large-scale AWS projects
2. Utilize PySpark, AWS Glue, and Airflow for robust data pipelines
3. Integrate complex enterprise data sources such as SAP and OSI PI

We are seeking an experienced **Lead Data Engineer** with advanced expertise in PySpark and hands-on experience building ETL pipelines, data lake architectures, and data feed integrations on AWS. You will handle both structured and unstructured data, ingesting information from a variety of on-premises and enterprise sources such as SAP, Intelex, SQL, and OSI PI into AWS. This position offers the chance to work on large-scale data projects and collaborate with diverse teams in a fast-paced setting.

**Responsibilities**

* Create, refine, and manage ETL pipelines using PySpark and AWS Glue Jobs to process extensive structured and unstructured datasets
* Orchestrate data workflows with Apache Airflow, ensuring dependable scheduling, dependency management, and effective error handling
* Develop and maintain data feeds from on-premises and enterprise systems into AWS data lake environments
* Integrate with enterprise sources, including SAP for ERP and operational data, Intelex for environmental, health, safety, and quality data, SQL databases for relational data, and OSI PI for real-time industrial and process historian data
* Build and oversee API integrations that retrieve data from on-premises services into AWS
* Manage data extraction, transformation, and loading across multiple formats and protocols
* Assist in designing and maintaining AWS data lake architectures using Amazon S3, AWS Glue, and Lake Formation
* Ensure data is properly cataloged, partitioned, and optimized for analytics and reporting
* Apply data quality checks, validation, and lineage tracking throughout all pipelines

**Requirements**

* At least 5 years of experience in data engineering roles
* At least 1 year of experience leading and managing development teams
* High-level proficiency in Python and PySpark for data processing and pipeline development
* Strong foundation in ETL processes for data integration
* Experience orchestrating workflows with Apache Airflow
* Demonstrated success building production-grade data pipelines on AWS
* Hands-on experience with AWS Glue Jobs for ETL operations
* Familiarity with Amazon S3, data lake methodologies, and data cataloging practices
* Experience with AWS-native monitoring and operational tools
* Skill in integrating enterprise systems via APIs, JDBC, or native connectors, including SAP, Intelex, SQL databases, and OSI PI
* Ability to work with both structured and unstructured data formats
* Excellent documentation, communication, and collaboration skills
* Written and spoken English at B2 level or higher

**Nice to have**

* Experience in energy, oil & gas, or industrial data environments
* Knowledge of Drilling and Completions data flows and terminology
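For candidates less familiar with the stack, the core pattern behind several of the responsibilities above — extract, transform, load, with data quality checks applied before loading — can be sketched in plain Python. This is an illustrative sketch only: the role's actual pipelines run on PySpark, AWS Glue, and Airflow, and every name below (`sensor_id`, the sample records, the in-memory "lake") is hypothetical.

```python
# Minimal extract -> transform -> load sketch with a data quality check.
# Plain Python stand-in for the PySpark/Glue pipelines the role describes;
# all field names and sample data are hypothetical.

def extract(rows):
    """Simulate reading raw records from a source system (e.g. a historian feed)."""
    return list(rows)

def transform(records):
    """Normalize types and drop records that fail basic quality checks."""
    clean = []
    for r in records:
        # Quality check: a record needs a sensor id and a numeric value.
        if r.get("sensor_id") and isinstance(r.get("value"), (int, float)):
            clean.append({"sensor_id": str(r["sensor_id"]),
                          "value": float(r["value"])})
    return clean

def load(records, sink):
    """Append validated records to the target store (here, just a list)."""
    sink.extend(records)
    return len(records)

raw = [{"sensor_id": "PI-001", "value": 42},
       {"sensor_id": None, "value": 7},          # fails quality check
       {"sensor_id": "PI-002", "value": "bad"}]  # fails quality check
lake = []
loaded = load(transform(extract(raw)), lake)
print(loaded)  # → 1
```

In a real Glue job the same three stages would read from a source connector, run as PySpark DataFrame transformations, and write partitioned output to S3, with Airflow handling scheduling and retries.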

Source: Indeed
Valentina Rodríguez
Indeed · HR


© 2025 Servanan International Pte. Ltd.