Job Requirements
Washington, DC; New York, NY
Secret clearance; polygraph not specified
Mid Level Career (5+ yrs experience)
$175,000 - $260,000
Job Description
Serve as a Data Pipeline Reliability Engineer (DPRE) on a cross-functional team
Help to ensure our customers’ missions are supported with updated and accurate data
Build, optimize, and maintain data pipelines to improve their efficiency and resilience.
Serve as a first responder
Triage, troubleshoot, and coordinate the resolution of technical issues
Diagnose, resolve, and prevent issues encountered in the field
Implement/maintain automated monitoring to detect data quality issues
Build and maintain schedules so that pipelines run
Set up and maintain health checks on different pipelines
Read and write code changes and modify the monitoring set-up where necessary
Know and understand how to navigate the pipelines and documentation
Follow SOPs to contact other teams and data providers when:
Data is incorrect
Data is not received on time
Communicate outages to the end users of a pipeline
Contribute to monitoring and tooling improvements (where feasible)
Qualifications:
Strong engineering background
Proficiency with programming languages such as Python, PySpark, SQL, and Java
Basic parallel data processing experience
Basic understanding of Spark, including optimizing and tuning Spark jobs
Experience performing root cause analysis and documentation of findings
Understanding/experience with data concepts such as:
Data warehousing
Data lakes
Data governance
Data lineage
Understanding of networking concepts (DNS, VPNs, Load Balancing)
Experience with the following tools (highly preferred):
Observability tools (e.g., Grafana)
Data pipeline tools (e.g., Airflow)
Cloud tools (e.g., AWS, Azure, Google Cloud)
IaC tools (e.g., Terraform)
Required: Active Secret Security Clearance
Job Type: Full-time