Role : AWS Data Engineer with Python (Healthcare domain)
Location: Remote – Alpharetta, GA
Exp: 11 Years
Skills: AWS, Python, Healthcare domain, Spark SQL, Airflow, Healthcare data, production-grade pipelines, streaming pipelines, Kafka or SQS
Job Description:
Responsibilities:
- Design and implement robust, production-grade pipelines using Python, Spark SQL, and Airflow to process high-volume file-based datasets (CSV, Parquet, JSON).
- Lead efforts to canonicalize raw healthcare data (837 claims, EHR, partner data, flat files) into internal models.
- Own the full lifecycle of core pipelines — from file ingestion to validated, queryable datasets — ensuring high reliability and performance.
- Onboard new customers by integrating their raw data into internal pipelines and canonical models; collaborate with SMEs, Account Managers, and Product to ensure successful implementation and troubleshooting.
- Build resilient, idempotent transformation logic with data quality checks, validation layers, and observability.
- Refactor and scale existing pipelines to meet growing data and business needs.
- Tune Spark jobs and optimize distributed processing performance.
- Implement schema enforcement and versioning aligned with internal data standards.
- Collaborate deeply with Data Analysts, Data Scientists, Product Managers, Engineering, Platform, SMEs, and AMs to ensure pipelines meet evolving business needs.
- Monitor pipeline health, participate in on-call rotations, and proactively debug and resolve production data flow issues.
- Contribute to the evolution of our data platform — driving toward mature patterns in observability, testing, and automation.
- Build and enhance streaming pipelines (Kafka, SQS, or similar) where needed to support near-real-time data needs.
- Help develop and champion internal best practices around pipeline development and data modeling.
Skillset:
- 10+ years of experience as a Data Engineer (or equivalent), building production-grade pipelines.
- Strong expertise in Python, Spark SQL, and Airflow.
- Experience processing large-scale file-based datasets (CSV, Parquet, JSON, etc.) in production environments.
- Experience mapping and standardizing raw external data into canonical models.
- Familiarity with AWS (or any cloud), including file storage and distributed compute concepts.
- Experience onboarding new customers and integrating external customer data with non-standard formats.
- Bonus: Familiarity with healthcare data (837, 835, EHR, UB04, claims normalization).
- Bonus: Prior experience working on complex migration projects.
Warm Regards,
Prema
Reveille Technologies, Inc.
prema@reveilletechnologies.com
Desk Number: (704) 444-0697 Ext 829