Job Requirements
Location: Washington, DC; Manhattan, NY
Clearance: Secret
Polygraph: not specified
Career Level: not specified
Salary: $175,000 - $275,000
Job Description
Data Engineer – Pipeline Operations & Incident Response
Overview
This role is heavily focused on maintaining and stabilizing large-scale data pipelines in a production environment. The majority of time is spent troubleshooting and resolving issues across existing data workflows rather than building new systems.
Early success in this position looks like gaining enough familiarity with the platform, data flows, and key stakeholders to independently diagnose and resolve pipeline failures across multiple environments.
Key Responsibilities
Investigate and resolve data pipeline failures across multiple production environments
Perform root cause analysis on data quality and pipeline performance issues
Apply targeted code fixes and adjustments to restore pipeline functionality
Monitor pipeline health and respond to alerts within defined SLAs
Support and maintain existing ETL processes rather than developing new ones
Refactor pipelines to resolve performance issues such as memory constraints or inefficient processing
Coordinate with upstream data providers and internal teams to resolve data ingestion issues
Escalate issues when access, ownership, or dependencies fall outside immediate control
Day-to-Day Breakdown
~85–90%: Debugging, incident response, and pipeline issue resolution
~5–10%: Monitoring, validation, and health checks
~5–10%: Minor code updates, optimizations, and pipeline adjustments
Work is centered on fixing and stabilizing existing pipelines, not building new ones from scratch.
Technical Environment
Predominantly batch-based ETL pipelines (incremental processing is common)
High-volume pipeline ecosystem spanning multiple data domains and environments
Mix of code-driven pipelines and low-code/visual pipeline tools
Streaming pipelines make up only a small share of the workload
Required Technical Skills
Strong experience with large-scale data engineering and ETL/ELT workflows
Proficiency in Python and distributed data processing frameworks (PySpark preferred)
Solid understanding of dataframes and data manipulation at scale
Experience troubleshooting production data pipelines and debugging failures
Knowledge of relational databases and SQL fundamentals
Familiarity with distributed computing concepts
Additional Technical Exposure
Experience with Java or a similar language (C++ is an acceptable alternative)
Ability to diagnose and resolve memory/performance issues in distributed jobs
Exposure to visual pipeline tools or data workflow platforms is helpful
Basic understanding of networking concepts and API-based data ingestion
Operational Environment
Engineers support a large number of pipelines across multiple environments simultaneously
Work is highly reactive, driven by incoming alerts and data incidents
Engineers are expected to quickly assess and troubleshoot pipelines they have not previously worked on
High alert volume, with multiple issues often tied to common root causes
Collaboration
Frequent interaction with data providers to resolve source data issues
Regular coordination with cross-functional technical teams on pipeline failures
Occasional engagement with end users reporting data discrepancies
On-Call & Incident Response
Rotating on-call schedule supporting different pipeline groups
Some rotations may include off-hours alerts tied to overnight pipeline processing
Most incidents are handled during business hours, with occasional escalations
Engineers are expected to own resolution when possible and coordinate when dependencies exist
Ideal Candidate Background
Strong foundation in data engineering within production environments
Experience supporting operational data systems rather than purely building new solutions
Comfortable working in high-volume, incident-driven environments
Able to quickly understand and troubleshoot unfamiliar systems
Hands-on experience with distributed data processing and large datasets
We offer roles across all three clearance levels: Confidential, Secret and Top Secret. With a Top Secret facility clearance, a proven subcontractor track record and a deep understanding of agencies across the Defense, Intelligence, Homeland, Justice and Federal Civilian sectors, Kforce brings more than 20 years of experience in supporting critical missions at the federal, state and local levels.