Yesterday
Secret
Mid Level Career (5+ yrs experience)
IT - Software
Fort Belvoir, VA (On-Site/Office)
Job Description
We are looking for an exceptional Senior Systems Engineer and Web Application Subject Matter Expert to serve as the ultimate escalation point for complex web application issues, ensuring the stability and performance of critical government-facing applications. Your ability to tackle complex problems and build reliable, scalable solutions will help shape the future of technology and contribute to making our nation safer.
Your Impact / Key Responsibilities
Tier 4 Escalation & Advanced Troubleshooting: Serve as the highest escalation point for critical and complex web application incidents, performing in-depth analysis, root cause identification, and implementing effective resolutions to minimize downtime and prevent recurrence.
Web Application Subject Matter Expertise: Develop and maintain comprehensive expertise in the web application's architecture, components, dependencies, and operational behavior within the deployment environment.
Problem Management & Prevention: Lead efforts in post-incident reviews, identifying underlying issues, and driving the implementation of permanent solutions and preventative measures to enhance application reliability and stability.
Collaboration & Knowledge Transfer: Work closely with Operations, On-site DevOps, and Development teams to facilitate efficient incident response, share expert knowledge, and contribute to the continuous improvement of operational processes and application design.
Deployment Environment Support: Troubleshoot and resolve issues related to the web application's deployment within its environment, encompassing infrastructure, networking, and related services.
Kubernetes Troubleshooting: Utilize existing Kubernetes knowledge or rapidly acquire proficiency to diagnose and resolve application issues within containerized environments, understand deployment strategies, and assist with related infrastructure challenges.
Documentation & Best Practices: Create and maintain detailed troubleshooting guides, runbooks, knowledge base articles, and technical documentation to empower other support tiers and foster a culture of shared knowledge.
Performance Monitoring & Optimization: Proactively monitor application performance, identify bottlenecks, and recommend architectural or configuration changes to optimize efficiency, scalability, and resilience.
Continuous Improvement: Champion initiatives to enhance the application's operational excellence, including automation, improved monitoring, and streamlined deployment practices.
Minimum Qualifications
Proven experience as a Systems Engineer, Application Engineer, or similar role with a strong focus on supporting and troubleshooting complex web applications.
Demonstrated Subject Matter Expertise in a critical web application, including its full stack (front-end, back-end, database, APIs).
Extensive experience in incident management and problem resolution at a senior escalation level.
Strong understanding of web application architecture, performance tuning, and security best practices.
Experience working with and troubleshooting applications in modern deployment environments.
Familiarity with containerization technologies, specifically Kubernetes, or a strong aptitude and willingness to learn and apply it for troubleshooting and operational tasks.
Proficiency with monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, ELK Stack, Splunk).
Exceptional analytical, problem-solving, and critical thinking skills with a methodical approach to complex issues.
Excellent verbal and written communication skills, with the ability to articulate complex technical concepts to diverse audiences.
Ability to work effectively under pressure, prioritize tasks, and manage multiple concurrent issues.
Nice to Have Qualifications
Experience with specific web application frameworks or technologies relevant to our stack.
Certifications in cloud platforms (e.g., AWS, Azure, GCP) or Kubernetes (e.g., CKA, CKAD).
Experience with scripting and automation (e.g., Python, Bash, Ansible).
Prior experience in a government or highly regulated environment.
Work Environment & Expectations
US-based geographically dispersed team working in a hybrid work environment.
Work hours may vary based on mission-critical tasks, but a collaborative and flexible approach is encouraged.
Self-starter mentality, with the ability to take direction and to execute tasks without detailed tasking.