AWS Site Reliability Engineer (SRE)

Blu Omega LLC

Posted today

Job Requirements

Remote
Secret clearance
Polygraph not specified
Career Level not specified
Salary not specified

Job Description

Blu Omega is seeking an AWS Site Reliability Engineer (SRE) to support a critical federal analytics program on a remote basis. To be considered for this role, you must hold an Active DoD Secret Clearance (or higher) and be able to work remotely in a collaborative Agile Scrum environment.

Role Description:
  • In this role, you will support our client's AWS-based analytics environment. You will be responsible for the operational health, availability, and performance of the AWS and Databricks environments built by the Platform Engineering team.
Your responsibilities include:
  • Prepare and take ownership of "day two" operations, focusing on observability, incident response, and capacity planning.
  • Design and implement comprehensive monitoring solutions using tools like AWS CloudWatch to track the health of Databricks clusters, job performance, and underlying AWS resources.
  • Minimize downtime and inefficiencies (manual, repetitive work) by automating operational tasks and recovery procedures.
  • Define and track Service Level Objectives (SLOs) to balance reliability with innovation, and create the operations Service Operating Procedures (SOPs).
Required Skills/Background:
  • Active DoD Secret Clearance
  • Ability to work remotely in a collaborative, Agile Scrum environment
  • Experience with CloudWatch, performance tuning in cloud environments, IaC tools, Databricks management and performance instrumentation
Clearance Level
Secret