Site Reliability Engineer

Kaztronix

Yesterday
Secret
IT - Software
Sunnyvale, CA (On-Site/Office)

A global government contracting company is seeking a Site Reliability Engineer to join its team in Sunnyvale, CA!

As a Site Reliability Engineer, you will:

Design, implement, and maintain highly available and scalable systems and infrastructure to support classified applications and services
Develop and implement reliability-focused engineering practices, such as continuous integration, continuous deployment, and continuous monitoring, while ensuring compliance with classified system requirements
Collaborate with development teams to ensure that reliability and scalability are considered throughout the software development lifecycle, while maintaining the security and integrity of the classified system
Identify and mitigate potential sources of downtime and performance degradation, including infrastructure, application, and network issues, while ensuring that all troubleshooting and debugging activities are conducted in accordance with classified system procedures
Develop and maintain technical documentation, including system diagrams, architecture documents, and runbooks, while ensuring that all documentation is properly marked and handled in accordance with classified system requirements
Lead and participate in incident response and post-incident reviews to identify root causes and implement corrective actions, while ensuring that all incident response activities are conducted in accordance with classified system procedures
Collaborate with other teams, including development, operations, and security, to ensure that reliability and scalability are considered in all aspects of system design and operation, while maintaining the security and integrity of the classified system
Develop and maintain metrics and monitoring systems to measure system reliability and performance, while ensuring that all monitoring activities are conducted in accordance with classified system requirements
Stay up-to-date with industry trends and emerging technologies, and apply this knowledge to continuously improve system reliability and scalability, while maintaining the security and integrity of the classified system
Basic Qualifications

Bachelor's degree in Computer Science, Engineering, or a related field
Minimum 8 years of experience in site reliability engineering, DevOps, or a related field, with a focus on classified systems
Must possess, or be able to obtain within 6 months of the start date, a valid DoD 8140 (DoD 8570)-approved IAT Level II or III certification, such as Security+, in good standing
Ability to obtain and maintain a Top Secret security clearance; US citizenship required
Experience with production use of vSphere/ESXi/vCenter and RHEL
Advanced proficiency using Python, Bash, Ansible, Puppet, and Chef for system administration
Demonstrable proficiency with MRTG/PRTG, Nagios, SolarWinds, or similar tools
Proven ability with Cloud and Container technologies: Kubernetes, Docker/Mirantis, AWS, and/or Azure
Strong technical background in systems administration, networking, and software development, with a focus on classified systems
Excellent problem-solving skills, with the ability to analyze complex systems and identify root causes of issues, while maintaining the security and integrity of the classified system
Knowledge of networking fundamentals, including TCP/IP, DNS, and routing protocols


Desired Skills

System integration experience of large-scale distributed infrastructure systems
Master's degree in Computer Engineering or a related field
Data center operations/system administrator experience, preferably in a DoD environment (RMF, STIG, or NISPOM)
Certification in site reliability engineering, DevOps, or a related field, with a focus on classified systems
Experience with machine learning and artificial intelligence technologies, with a focus on classified systems
Strong knowledge of security principles and practices, including secure coding, secure deployment, and secure operations, with a focus on classified systems
Strong understanding of networking fundamentals, including TCP/IP, DNS, and routing protocols, with a focus on classified systems
Ability to support 24x7 on-call and off-shift coverage for mission-critical events and operations that may require extended hours or weekend support
Comfortable working in a fast-paced, dynamic, multidisciplinary environment
Active Secret security clearance
Location: Sunnyvale, CA

Work Schedule: 9/80, onsite, with on-call rotations

Clearance: Minimum Secret with the ability to obtain TS
group id: 10195552