Required Skills: Experience writing code to automate Client models and relate events and incidents. AI-Ops: run log events through models and come up with anomaly detection. Python automation skills for models. Experience in Client model and deployment. Kubernetes administration. More ❯
New York City (Manhattan), New York, United States
Omni Inclusive
run a lot of things on Windows/Biz Tech, but they are shifting towards Linux (70% Windows, 30% Linux). Remote access technology protocols are a plus. Job Description: Site Reliability Engineer. Periodic updates and maintenance of Windows-based golden image for ESX & AWS. Patching of software, systems, appliances, etc., through scripting or manual process. Disaster recovery planning More ❯
About the job Site Reliability Engineer Top Secret Clearance Jobs is dedicated to helping those with the most exclusive security clearance find their next career opportunity and get interviews within 48 hours. Modern Technology Solutions, Inc. (MTSI) is seeking a Site Reliability Engineer to manage laboratory network assets for development and testing efforts located in Colorado … problems of global importance. Founded in 1993, MTSI today has employees at over 20 offices and field sites worldwide. For more information about MTSI, please visit Responsibilities As a Site Reliability Engineer (Senior) with MTSI, you will manage laboratory network assets for development and testing efforts located in Colorado Springs, CO. Your essential job functions will include but More ❯
Job Title: Site Reliability Engineer (SRE) Location: Austin, TX Type: Full-time Job Summary - Technical Skills: • Expertise in understanding large-scale production systems and technologies, for example load balancing, monitoring, distributed systems, microservices, and configuration management. • Should have solid hands-on experience in troubleshooting and fixing application failures, application performance degradation, code issues, cloud platform issues, batch failures, Infra More ❯
Job Description Position Title: Site Reliability Engineer (SRE) for Equity Trading Platform Job Description Jefferies is seeking a Site Reliability Engineer to play an instrumental role in supporting Equity Front office trading application, risk and middle office real-time products, developed and used for Equity Cash and ETS application. As part of the wider platform engineering … skills and a keen intellect. The business is a growth area, with current investments taking place in all the technology, business and middle office areas. Job Duties: Front Line Site Reliability Engineering and Support functions for Equity trading systems used by Jefferies clients as well as internal users. Build monitoring tools for application and infrastructure components. Implement and manage More ❯
Reston, Virginia, United States Hybrid / WFH Options
Top Secret Clearance Jobs
About the job Site Reliability Engineer - Security Clearance Required Top Secret Clearance Jobs is dedicated to helping those with the most exclusive security clearance find their next career opportunity and get interviews within 48 hours. About Virtru: Virtru is a leading data protection provider backed by some of the foremost venture capital firms in Silicon Valley and the … our best work. We're building something special at Virtru. We hope you consider joining our team and helping us create a brighter future for data privacy. As a Site Reliability Engineer (SRE) at Virtru, you will play a pivotal role in driving continuous improvements in observability, performance, and reliability across our platform infrastructure. Your mission will be … core platform functions to establish a robust infrastructure. Collaborate closely with internal teams and government clients on a daily basis. Requirements Minimum of 5+ years of experience as a Site Reliability Engineer, demonstrating a strong understanding of SRE principles for highly scalable and reliable systems. Bachelor's degree in Computer Science or related field. Active TS/SCI More ❯
Elevate your engineering prowess to unprecedented levels by joining a team of exceptionally gifted professionals and position yourself among the top echelon in site reliability. As a Principal Site Reliability Engineer at JPMorgan Chase within the Enterprise technology, Employee Platforms team, you work with your fellow stakeholders to define non-functional requirements (NFRs) and availability targets for … the engineers under your guidance Provides advice and mentoring to other engineers and acts as a key resource for technologists seeking advice on technical and business-related issues Demonstrates site reliability principles and practices every day and champions the adoption of site reliability throughout your team Collaborates with others to create and implement observability and reliability designs for … Evolves and debugs critical components of applications and platforms Provides comprehensive and ongoing guidance, tools, and solutions to support the firm's growth Makes significant contributions to JPMorgan Chase's site reliability community via internal forums, communities of practice, guilds, and conferences Required qualifications, capabilities, and skills Formal training or certification on software engineering concepts and 5+ years applied experience More ❯
Guide and shape the future of technology at a globally recognized firm, driven by pride in ownership. As a Senior Lead Site Reliability Engineer at JPMorgan Chase within the Enterprise technology, Employee Platforms team, you are the non-functional requirement owner and champion for the applications in your remit. You are a key influencer in your team's … and automation of the software in your area. You act in a blameless, data-driven manner and navigate difficult situations with composure and tact. Job responsibilities Demonstrates expertise in site reliability principles and demonstrates an understanding of the fine balance between features, efficiency, and stability Effectively negotiates with peers and executive partners to ensure optimal outcomes for all Drives … the adoption of site reliability practices throughout the organization Ensures your teams demonstrate site reliability best practices with the ability to demonstrate this empirically through stability and reliability metrics Drives a culture of continual improvement and solicits real-time feedback to improve the customer's experience Ensures your team collaborates with other teams within your group's specialization More ❯
Site Reliability Engineer, Data Analytics San Diego, California, United States Software and Services Summary Posted: Apr 25, 2025 Weekly Hours: 40 Role Number: At Apple, our Data Analytics team focuses on improving the user experience by improving operating system stability, gathering feature usage telemetry, and evaluating device performance. This requires capturing data from customers who have given consent … information, all to help inform direction. We develop and operate a variety of Big Data infrastructure products and applications in support of these goals. Description We are looking for a Site Reliability Engineer to be a member of our team in data analytics. If working on large-scale problems excites you, then we're excited to talk to you More ❯
an impact with a purpose-driven industry leader. Join us today and experience Life at Visa. Job Description Visa Technology & Operations LLC, a Visa Inc. company, needs a Sr. Site Reliability Engineer (multiple openings) in Austin, TX to provide technical support to critical applications. Perform root cause analysis, applying operation break fixes and other maintenance activities to keep … will accept a Bachelor's degree in Computer Science, Engineering, Business Analytics or related field, and 2 years of experience in the job offered or in a related application engineer or analyst occupation. Alternatively, employer will accept a Master's degree in Computer Science, Engineering, Business Analytics, or related field. Position requires experience in the following skills: Linux Operating … can alternate time between both remote and office. Employees in hybrid roles are expected to work from the office 2-3 set days a week (determined by leadership/site), with a general guidepost of being in the office 50% or more of the time based on business needs. Travel Requirements: This position does not require travel. Mental/ More ❯
Waltham, Massachusetts, United States Hybrid / WFH Options
Dentsply Sirona
industry - with a market leader that continues to drive innovation. Make a difference by helping improve oral health worldwide. Dentsply Sirona's Waltham, MA location is hiring a Sr. Site Reliability Engineer to join a global team that will ensure system reliability and performance. Together, this team will act as 24/7 emergency 2nd/3rd level … role is partially remote, providing a mix of working remotely and in office. The role is positioned in the Software Engineering & Cloud Operations (SECO) organization, under the Team Lead Site Reliability Engineering. Being an experienced technologist, you will be able to optimize our system performance and innovate for continuous improvement. KEY RESPONSIBILITIES Gather and analyze metrics from operating systems … when a downtime occurs EDUCATION AND EXPERIENCE Bachelor's or Master's degree in Computer Science or Software Engineering or relevant experience At least 5 years' experience in a Site Reliability Engineering/Platform Engineering/DevOps role or similar Excellent troubleshooting skills and proven experience resolving production downtime with immediate and long-term solutions A deep understanding of More ❯
fast, accurate, real-time access to motor vehicle, vessel and driver license records, is looking for creative individuals who are driven to produce great software solutions. As a DevOps Site Reliability Engineer, you will join a team responsible for continuous improvement and support of customer facing products. Responsibilities will include developing new solutions to solve complex business and More ❯
Watertown, Massachusetts, United States Hybrid / WFH Options
Dentsply Sirona
industry - with a market leader that continues to drive innovation. Make a difference by helping improve oral health worldwide. Dentsply Sirona's Waltham, MA location is hiring a Sr. Site Reliability Engineer to join a global team that will ensure system reliability and performance. Together, this team will act as 24/7 emergency 2nd/3rd level … role is partially remote, providing a mix of working remotely and in office. The role is positioned in the Software Engineering & Cloud Operations (SECO) organization, under the Team Lead Site Reliability Engineering. Being an experienced technologist, you will be able to optimize our system performance and innovate for continuous improvement. KEY RESPONSIBILITIES Gather and analyze metrics from operating systems … when a downtime occurs EDUCATION AND EXPERIENCE Bachelor's or Master's degree in Computer Science or Software Engineering or relevant experience At least 5 years' experience in a Site Reliability Engineering/Platform Engineering/DevOps role or similar Excellent troubleshooting skills and proven experience resolving production downtime with immediate and long-term solutions A deep understanding of More ❯
future states of the organisation and make faster, more informed decisions. The company is headquartered in London, with offices in Philadelphia, The Hague, Toronto, and Sydney. As a Principal Site Reliability Engineer, you will be a senior technical leader focused on scaling and hardening our AWS- and Kubernetes-based infrastructure. You will work across product, platform, and operations … a robust engineering foundation for growth. We're looking for someone who has technical expertise, is a great communicator and enjoys collaborating across multiple teams. As a Lead Software Engineer, you will: Define and enforce SLOs, SLIs, and error budgets across critical services Craft and implement a cloud infrastructure and tooling strategy Work across our organization to level up More ❯
for authorized solutions, enabling seamless access for end-users and partners. Please note that this position requires US Citizenship Role Description: We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) specializing in AWS DevOps to manage the onboarding of Independent Software Vendors (ISVs) within CGC's FedRAMP authorized boundary. This role demands a professional, customer … do their best. Workdays are dynamic, collegial, and fun. Our office features multiple places to work unconstrained by typical office barriers. Our wellness package provides access to an on-site gym and includes medical, dental, and vision insurance along with options for FSA and EAP. We offer 401(k) with employer match, unlimited PTO, and a culture respectful of More ❯
The Site Reliability Engineering (SRE) team at Pendo is responsible for provisioning and maintaining cloud infrastructure from development through production for all product initiatives, and working with developers and product managers to ensure that our products are not only reliable and performant, but also cost-efficient. Our platform is built on Google Kubernetes Engine (GKE) and utilizes several other … discussing performance analysis, cost analysis, and operational metrics Preferred Qualifications Experience designing, analyzing, and troubleshooting distributed systems Experience maintaining Kubernetes clusters in a production environment Previous experience as a Site Reliability Engineer, DevOps Engineer, or similar role Pendo was founded in 2013 by former product managers, who combined their heads and hearts to build something they wanted More ❯
San Diego, California, United States Hybrid / WFH Options
Sony Interactive Entertainment
positions and join our growing global team. The PlayStation brand falls under Sony Interactive Entertainment, a wholly-owned subsidiary of Sony Group Corporation. Sony Interactive Entertainment LLC seeks a Site Reliability Engineer in San Diego, CA to support application delivery and operations of internal and public-facing services within AWS cloud environment. Requires a Master's degree in More ❯
New York City (Manhattan), New York, United States
Baseten
and reliability for their mission-critical workloads. With our recent $75M Series C funding, we're growing fast to make AI accessible across all products. THE ROLE As a Site Reliability Engineer, you'll envision and build robust systems and processes that ensure our infrastructure is scalable, reliable, and efficient. This can range from automating deployments and monitoring … for reliability and performance across the infrastructure. Automate processes when relevant, particularly for managing CI/CD pipelines. Own products and projects end-to-end, functioning as both an engineer and a project manager, with a focus on user empathy, project specification, and end-to-end execution. Collaborate with cross-functional teams to understand project requirements and translate them More ❯
to own their own destiny. Klaviyo is growing fast and we have openings for all skill levels across all of our teams. Learn more about our engineering culture at Site Reliability Engineering (SRE) is what you get when you treat system operations as a software engineering problem. The mission of the Site Reliability Engineering team is to provide … SREs are team players who work collaboratively amongst themselves and with engineers from product teams to build the platform Klaviyo relies on to power its products. As a Senior Site Reliability Engineer you will own multiple foundational Klaviyo services and make a big impact on the productivity of our product engineering teams. How you will make a difference … other teams in a culture that values technical design review Contribute to the company as a subject matter expert in multiple areas, constantly pushing yourself to be a better engineer and to level up all of your peers within your team and within Klaviyo. Mentor and pair with other Klaviyo engineers to build better software by focusing on performance More ❯
your best. Be a part of a company that is part of the community; driven to improve our future and protect our freedom. We are looking for an experienced Systems Engineer/Site Reliability Engineer (SRE) to join our technology-based program supporting a key Government customer. The Systems Engineer/SRE provides subject expertise and guidance … the development, testing, and implementation of technical solutions. Determining whether technical solutions meet defined requirements. The SRE may also provide Agile DevOps support to mission-critical systems. The Systems Engineer/SRE may have the opportunity to build strong systems, software, and cloud environments and provide operations and maintenance for critical systems. The candidate will provide technical expertise and … of projects through all aspects of the software development lifecycle including scope and work estimation, architecture and design, coding, and unit testing. ABC Required Education, Experience, & Skills The Systems Engineer will support the team in the following activities (including but not limited to): Ensuring reliability, getting systems back to steady-state as quickly as possible Eliminating toil, automating wherever More ❯
Denver, Colorado, United States Hybrid / WFH Options
DAT Freight Solutions
offices in Seattle, WA; Springfield, MO; and Bangalore, India. For additional information, see Job Application Deadline: 06/30/2025 The Opportunity DAT is looking for a Staff Site Reliability Engineer to join our SRE platform team. This position will work hybrid from one of our locations: Seattle, WA, Beaverton, OR, or Denver, Colorado. Candidate profile DAT … is seeking an experienced Staff Site Reliability Engineer to help grow our SRE practices. In this role, you will be responsible for leading major technical initiatives and mentoring engineers to enhance their skills. You'll work closely with development teams, platform architects and management to achieve critical reliability goals and help scale our platform. What You'll Do More ❯
Seattle, Washington, United States Hybrid / WFH Options
DAT Freight Solutions
offices in Seattle, WA; Springfield, MO; and Bangalore, India. For additional information, see Job Application Deadline: 06/30/2025 The Opportunity DAT is looking for a Staff Site Reliability Engineer to join our SRE platform team. This position will work hybrid from one of our locations: Seattle, WA, Beaverton, OR, or Denver, Colorado. Candidate profile DAT … is seeking an experienced Staff Site Reliability Engineer to help grow our SRE practices. In this role, you will be responsible for leading major technical initiatives and mentoring engineers to enhance their skills. You'll work closely with development teams, platform architects and management to achieve critical reliability goals and help scale our platform. What You'll Do More ❯
Portland, Oregon, United States Hybrid / WFH Options
DAT Freight Solutions
offices in Seattle, WA; Springfield, MO; and Bangalore, India. For additional information, see Job Application Deadline: 06/30/2025 The Opportunity DAT is looking for a Staff Site Reliability Engineer to join our SRE platform team. This position will work hybrid from one of our locations: Seattle, WA, Beaverton, OR, or Denver, Colorado. Candidate profile DAT … is seeking an experienced Staff Site Reliability Engineer to help grow our SRE practices. In this role, you will be responsible for leading major technical initiatives and mentoring engineers to enhance their skills. You'll work closely with development teams, platform architects and management to achieve critical reliability goals and help scale our platform. What You'll Do More ❯
a sharp and passionate team at your back. If Braze sounds like a place where you can thrive, we can't wait to meet you. WHAT YOU'LL DO Site Reliability Engineers (SREs) are responsible for keeping all internal-facing services and platforms running smoothly. In a nutshell, SREs ensure site uptime. SREs blend sensible system administrators and … and sending billions of messages to end-users daily. We use a diverse technology stack rooted in Ruby on Rails, MongoDB, Redis, Kafka, Kubernetes, and more. As a Senior Site Reliability Engineer at Braze, you will collaborate with your team and consumer engineering teams to continuously improve the infrastructure, automation, and tooling that build internal products from these … from ever happening Retrospect everything that happens to turn lessons into system improvements/changes, automation, etc. WHO YOU ARE 5+ years of experience as a Software, DevOps, or Site Reliability Engineer 3+ years of Data Streaming Reliability Engineering Experience in monitoring, troubleshooting, and optimizing Kafka streaming applications, including diagnosing lag, partition imbalances, consumer group issues, and broker More ❯