Posted today
Secret
Unspecified
Unspecified
Engineering - Mechanical
Remote/Hybrid (Off-Site/Hybrid)
Blu Omega is seeking an AWS Site Reliability Engineer (SRE) to support a critical federal analytics program on a remote basis. To be considered for this role, you must hold an active DoD Secret clearance (or higher) and be able to work remotely in a collaborative Agile Scrum environment.
Role Description:
- In this role, you will support our client's AWS-based analytics environment. You will be responsible for the operational health, availability, and performance of the AWS and Databricks environments built by the Platform Engineering team.
- Prepare for and take ownership of "day two" operations, focusing on observability, incident response, and capacity planning.
- Design and implement comprehensive monitoring solutions using tools like AWS CloudWatch to track the health of Databricks clusters, job performance, and underlying AWS resources.
- Minimize downtime and inefficiencies (manual, repetitive work) by automating operational tasks and recovery procedures.
- Define and track Service Level Objectives (SLOs) to balance reliability with innovation, and create the operations Standard Operating Procedures (SOPs).
Requirements:
- Active DoD Secret Clearance (or higher)
- Ability to work remotely in a collaborative, Agile Scrum environment
- Experience with AWS CloudWatch, performance tuning in cloud environments, Infrastructure as Code (IaC) tools, Databricks management, and performance instrumentation