AWS Site Reliability Engineer (SRE)

Blu Omega LLC

Posted today
Secret
Unspecified
Unspecified
Engineering - Mechanical
Remote/Hybrid (Off-Site/Hybrid)

Blu Omega is seeking an AWS Site Reliability Engineer (SRE) to support a critical federal analytics program on a remote basis. To be considered for this role, you must hold an Active DoD Secret Clearance (or higher) and be able to work remotely in a collaborative Agile Scrum environment.

Role Description:
  • In this role, you will support our client's AWS-based analytics environment. You will be responsible for the operational health, availability, and performance of the AWS and Databricks environments built by the Platform Engineering team.
Your responsibilities include:
  • Prepare for and take ownership of "day two" operations, focusing on observability, incident response, and capacity planning.
  • Design and implement comprehensive monitoring solutions using tools like AWS CloudWatch to track the health of Databricks clusters, job performance, and underlying AWS resources.
  • Minimize downtime and inefficiency (manual, repetitive work) by automating operational tasks and recovery procedures.
  • Define and track Service Level Objectives (SLOs) to balance reliability with innovation, and create operational Standard Operating Procedures (SOPs).
Required Skills/Background:
  • Active DoD Secret Clearance
  • Ability to work remotely in a collaborative, Agile Scrum environment
  • Experience with AWS CloudWatch, performance tuning in cloud environments, infrastructure as code (IaC) tools, Databricks management, and performance instrumentation