Falls Church, Virginia, United States Hybrid / WFH Options
Epsilon Inc
teams to optimize data pipelines for AI/ML initiatives, automation, and productization. Lead efforts to integrate security best practices, ensuring compliance with relevant regulations and standards. Conduct performance analysis, capacity planning, and system tuning to maximize uptime and reliability. Guide junior team members in troubleshooting techniques, documentation, and adherence to best practices. Drive continuous improvement by reviewing existing … for secure system architecture. Familiarity with data engineering concepts, including ETL/ELT pipelines, big data tools, and AI/ML workflows. Ability to troubleshoot complex system issues, perform root cause analysis, and implement effective solutions. Excellent communication, teamwork, and organizational skills, with a focus on innovation and continuous improvement. One or more of the following certifications …
Manchester, England, United Kingdom Hybrid / WFH Options
First Central
such as major incident management, service management and change. Automation & AI Integration Expertise Platform Ownership & Product Engineering Data-Driven Decision-Making using telemetry Security & Compliance by Design Problem Decomposition & Root Cause Analysis Technical Communication & Documentation Qualifications Maintain certifications and expertise in Azure technologies (Desirable), including but not limited to: AZ-104 – Azure Administrator MS-102 – Microsoft …
Haywards Heath, England, United Kingdom Hybrid / WFH Options
First Central
such as major incident management, service management and change. Automation & AI Integration Expertise Platform Ownership & Product Engineering Data-Driven Decision-Making using telemetry Security & Compliance by Design Problem Decomposition & Root Cause Analysis Technical Communication & Documentation Qualifications Maintain certifications and expertise in Azure technologies (Desirable), including but not limited to: AZ-104 – Azure Administrator MS-102 – Microsoft …
Dashboards. Contribute to process and technical capabilities (e.g., Data Modeling, Data Visualizations, Artificial Intelligence (AI), Machine Learning (ML)) to enhance identification of service improvement opportunities. Complex data mining, trend analysis, metric and report production will be required. Identify and review service improvement opportunities with stakeholders based on TR enterprise-wide performance metrics. Proactive collaboration with stakeholders to create and … improvement initiatives. Be responsive to internal stakeholder needs and engage with stakeholders across multiple functions. Typical daily work may include but is not limited to complex data mining, trend analysis, metric and report production, process flow charting, and iterative service improvement activities (e.g. daily standups, data quality checks, change reviews, tool enhancement design and review). Contribute to proactive … enhance service reliability and availability. Support for Service Management activities to ensure a consistent standard of incident, problem, change and other practice areas for enhanced accuracy of data quality, root cause analysis and identification of preventative measures. Support the recurring service performance reporting cycle (e.g., weekly, monthly, quarterly). About You: Experience in enterprise problem management, application …
London, England, United Kingdom Hybrid / WFH Options
THAMES WATER UTILITIES LIMITED
maintain systems according to approved design. Service Delivery & Operations: Lead key service management processes (Continuity, Capacity, Availability). Attend incident/problem bridges as the subject matter expert. Review root cause analyses (RCAs) and oversee corrective actions. Provide accurate monthly service performance reports across IT and OT. Supplier & Financial Management: Lead and manage suppliers to meet agreed SLAs … change management experience. Ability to simplify complex network architecture for non-technical audiences. Desirable Technical Skills & Qualifications: Knowledge of network security technologies and strategic supplier management. Experience in stakeholder analysis and business case development. Familiarity with cloud integration (Azure and AWS). What's in it for you? Competitive salary up to £75,000 per annum, depending on experience …
Herne Bay, England, United Kingdom Hybrid / WFH Options
Thames Water
maintain systems according to approved design. Service Delivery & Operations: Lead key service management processes (Continuity, Capacity, Availability). Attend incident/problem bridges as the subject matter expert. Review root cause analyses (RCAs) and oversee corrective actions. Provide accurate monthly service performance reports across IT and OT. Supplier & Financial Management: Lead and manage suppliers to meet agreed SLAs … change management experience. Ability to simplify complex network architecture for non-technical audiences. Desirable Technical Skills & Qualifications: Knowledge of network security technologies and strategic supplier management. Experience in stakeholder analysis and business case development. Familiarity with cloud integration (Azure and AWS). What’s in it for you? Competitive salary up to £75,000 per annum, depending on experience …
Rancho Cordova, California, United States Hybrid / WFH Options
Delta Dental Plans Association
Tasks - Identify opportunities to streamline deployments, monitoring, and incident response through automation. Optimize CI/CD Pipelines - Manage and refine continuous integration and delivery processes. Incident Management - Collaborate on root cause analysis (RCA) and postmortems to mitigate risks and improve resilience. Ensure Smooth Change Management - Collaborate with teams to execute safe, efficient releases aligned with ITIL best …
Theale, England, United Kingdom Hybrid / WFH Options
Bottomline
for ETL/ELT jobs, automating deployment, testing, and delivery processes. Deliver and maintain Disaster Recovery (DR) processes and solutions, including EDW backup, restore, and failover capabilities. Collaborate on root cause analysis (RCA) and permanent resolution of EDW and pipeline-related incidents. Infrastructure & Automation: Implement and manage Infrastructure as Code (IaC) for data platform components using tools … Incident Management: Design and implement monitoring, logging, and alerting frameworks for data pipelines and EDW systems to ensure high availability and reliability. Lead or contribute to incident response, performing root cause analysis (RCA), corrective action, and continuous improvement initiatives. Maintain and enforce SLAs and operational best practices for EDW and reporting platforms. Reporting & Analytics Support: Support business … understanding of DevOps best practices, including automation, configuration management, and continuous delivery. Familiarity with Data Mesh technologies (Denodo, Starburst) is a plus. Strong analytical and troubleshooting skills for performing root cause analysis (RCA) of incidents. Good working knowledge of Power BI or similar BI tools. Strong communication and documentation skills to support cross-functional collaboration. Preferred Certifications …
System Reliability Engineering and 3rd level Application Support for multiple fast-growing Products. This individual will be responsible for issue investigation, services monitoring and reporting, Production validations and leading root cause analysis and problem resolution with multiple teams. Determining any impacts due to internal or external incidents and the associated service and function availability. Specific Responsibilities include … Provide level 3 application support for mission-critical applications, including investigating issues and problems and detecting and investigating defects. Determine and provide root cause analysis based on application behavior and log data. Utilize basic scripting skills to generate log output based on support requests and issues. Work with multiple teams to determine the mitigation and resolution solutions for issues … using Splunk for Monitoring and Issue triage. Experience with using Prometheus/Grafana. Ability to adapt to shift work (off hours/weekend work). Analytical and diagnostic skills, e.g., root cause analysis. Must have a strong commitment to execution, follow through and timely communication. Excellent writing and interpersonal skills with the ability to communicate effectively with both …
Manchester, England, United Kingdom Hybrid / WFH Options
Smart DCC
documented in accordance with the relevant policies and procedures. Act as the primary point of contact for the Security Operations Manager for potential incidents, supporting more junior analysts’ subsequent analysis and investigation to determine their severity and the response required. Provide a Technical Escalation Point during security incidents, working collaboratively to establish the extent of an attack, the business … dashboard reporting. Work collaboratively with internal and external teams to identify opportunities for security improvements and review products that can advance our security capabilities, such as tools that support analysis/detection and other emerging technologies. Gather forensic data and physical equipment to perform in-depth root cause analysis. Support use case tuning through auditing and approval …
Colorado Springs, Colorado, United States Hybrid / WFH Options
Lockheed Martin
installing, configuring, and deploying one or more of the following cybersecurity tools: ACAS (Assured Compliance Assessment Solution) Vulnerability scanning and compliance assessment, BigFix, Delinea, Privileged access management (PAM), Log analysis and threat hunting using the Elastic Stack (Elasticsearch, Logstash, Kibana), ESS (Trellix), Axway Repeater/Responder for Windows, MFA (Multi-factor authentication) implementation. - Experience with deploying tools in Windows … and Linux environments, including virtualized or containerized setups (e.g., VMware, Docker). - Ability to diagnose and troubleshoot issues during testing, including root cause analysis and error reporting. - Knowledge of scripting languages (e.g., Python, PowerShell, Bash) to automate test cases and validate outputs. - Expertise in applying vendor patches for cybersecurity tools and underlying systems (Windows, Linux). - Knowledge …
Chicago, Illinois, United States Hybrid / WFH Options
Ahold Delhaize
indicators (SLIs). Build and manage microservices-based platforms leveraging Spring Boot, Java, Tomcat, and Redis. Monitor production environments using Datadog and proactively address performance and reliability issues. Perform root cause analysis and lead post-incident reviews to drive continual improvement. Manage CI/CD pipelines and deployment automation using GitHub, Docker, and container orchestration technologies. Create …
Falls Church, Virginia, United States Hybrid / WFH Options
Epsilon Inc
of data between systems by helping with Extract, Transform, Load (ETL) processes and ensuring data consistency across different platforms. Monitor and Troubleshoot Database Performance Issues - Identify potential bottlenecks, perform root cause analysis, and work with senior architects to implement solutions that enhance database reliability and efficiency. Support Compliance and Regulatory Requirements - Ensure database structures and data management …
London, England, United Kingdom Hybrid / WFH Options
SERVPRO of Limestone and Lawrence Counties
processes. You'll have some experience managing coding standards, security and auditing procedures. You'll have experience managing monitoring/proactive alerting processes. You'll need intermediate experience with root cause analysis, database performance monitoring and capacity planning. You'll ensure all database systems meet business and performance requirements. You'll have the ability to work towards …
London, England, United Kingdom Hybrid / WFH Options
MetaCompliance
compliance requirements. Spearhead database observability efforts by instrumenting telemetry and diagnostics to provide insights into query performance, blocking, deadlocks, resource contention, and availability. Lead database incident response by performing root cause analysis, collaborating with cross-functional teams, and documenting post-mortems to drive continuous improvement and prevention. Establish governance processes that manage schema evolution through impact assessment …
Phoenix, Arizona, United States Hybrid / WFH Options
Nexthink
vulnerability scans, and penetration testing. Collaborate with the compliance team to prepare for and respond to FedRAMP audits. Incident Management: Lead incident management efforts, ensuring rapid resolution and thorough root cause analysis. Develop and implement strategies for improving incident response and minimizing downtime. Collaboration and Communication: Work closely with development, operations, and security teams to integrate reliability and …
Salt Lake City, Utah, United States Hybrid / WFH Options
Nexthink
vulnerability scans, and penetration testing. Collaborate with the compliance team to prepare for and respond to FedRAMP audits. Incident Management: Lead incident management efforts, ensuring rapid resolution and thorough root cause analysis. Develop and implement strategies for improving incident response and minimizing downtime. Collaboration and Communication: Work closely with development, operations, and security teams to integrate reliability and …
Falls Church, Virginia, United States Hybrid / WFH Options
Epsilon Inc
assessments and provide actionable recommendations for mitigation. Experience supporting security for data pipelines, AI/ML environments, or cloud-based infrastructures. Excellent incident response skills, including triage, containment, and root cause analysis. Strong communication and collaboration abilities to partner with cross-functional teams and stakeholders. One or more of the following certifications are desired: Certified Cloud Security Professional …
Daytona Beach, Florida, United States Hybrid / WFH Options
Wright Technical Services
management, along with exceptional communication and analytical skills. The ideal candidate will lead major incident bridges, coordinate change management activities, ensure SLA compliance, and drive continuous improvement through trend analysis and stakeholder engagement. A collaborative leader with a results-driven mindset, this individual will play a critical role in enhancing service reliability and operational efficiency across the enterprise. Experience … to restore operations swiftly while ensuring adherence to Service Level Agreements (SLAs). Incident and Problem Management: Oversee end-to-end incident management using ServiceNow, ensuring accurate tracking, status updates, root cause analysis, and post-mortem documentation to prevent recurrence. Change Management: Coordinate and review change requests in ServiceNow, ensuring changes are assessed for risk, properly documented, and … applications and services. Technical Leadership: Collaborate with IT and business teams to troubleshoot complex issues, leveraging your technical background to guide resolution efforts and improve system reliability. Proactive Trend Analysis: Identify patterns in incidents and change outcomes to recommend process improvements, enhance monitoring, and resolve chronic issues that impact operations. Stakeholder Communication: Define and maintain distribution lists in ServiceNow …
Manchester Area, United Kingdom Hybrid / WFH Options
Us3 Consulting
internal and/or 3rd party support teams. Ensure resolution of incidents according to agreed SLAs. Apply problem-solving skills to recreate, debug, identify and resolve issues. Perform root cause analysis of issues to prevent reoccurrence. Form part of the on-call rota for out-of-hours critical incidents. Provide proactive support & maintenance across the application …
Reston, Virginia, United States Hybrid / WFH Options
CGI
methodologies, assumptions, validation techniques and findings to align with regulatory expectations and internal governance standards. Support the Funds Transfer Pricing and Enterprise Financial Analytics teams with any ad-hoc analysis projects or reporting. Experience working within Capital Markets, Treasury or balance sheet management preferred. Proficient in MS Excel technical skills, i.e. Python, R, SAS and using BI tools for … financial analysis desired. Required qualifications to be successful in this role: 8-9 years of relevant experience. Proficiency in Microsoft Excel; familiarity with Python, R, SAS, and BI tools (e.g., Power BI, Tableau) for financial analysis. Strong experience in financial modeling, documentation, and regulatory compliance. Experience in Capital Markets, Treasury, or balance sheet management. Excellent planning and organizational … skills using tools like Microsoft Project. Strong facilitation, communication, and relationship-building skills. Ability to manage and coordinate project teams and resolve technical issues. Skilled in process mapping, root cause analysis, and structured problem-solving. Familiarity with project management methodologies and risk management practices. Education: Bachelor's degree in Business, Computer Science, Information Systems, or a related field …
Bolton, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
internal and/or 3rd party support teams. Ensure resolution of incidents according to agreed SLAs. Apply problem-solving skills to recreate, debug, identify and resolve issues. Perform root cause analysis of issues to prevent reoccurrence. Form part of the on-call rota for out-of-hours critical incidents. Provide proactive support & maintenance across the application …
Wilmslow, England, United Kingdom Hybrid / WFH Options
Waters Corporation
code quality, and team collaboration. Supervise and measure KPIs related to development efficiency, such as cycle time, lead time, and deployment frequency. Facilitate continuous improvement initiatives like Agile retrospectives, root cause analyses, and process audits. Work closely with DevOps and tooling teams to streamline CI/CD pipelines and automate manual workflows. Support Agile transformation by aligning teams … maintainable, and scalable development. Act as a liaison between business partners and technical teams to align process improvements with strategic goals. Qualifications: 10+ years of experience in business process analysis, with a focus on software development and IT operations. Deep understanding of software development methodologies (Agile, Scrum, SAFe, DevOps, Waterfall). Proven track record of leading large-scale process improvement … in Business Administration, Computer Science, Engineering, or related field. Company Description: Waters Corporation (NYSE: WAT), the world's leading specialty measurement company, has pioneered chromatography, mass spectrometry and thermal analysis innovations serving the life, materials, and food sciences for over 60 years. With approximately 8,000 employees worldwide, Waters operates directly in 35 countries, including 15 manufacturing facilities, with …
Manchester, England, United Kingdom Hybrid / WFH Options
US3 Consulting
internal and/or 3rd party support teams. Ensure resolution of incidents according to agreed SLAs. Apply problem-solving skills to recreate, debug, identify and resolve issues. Perform root cause analysis of issues to prevent reoccurrence. Form part of the on-call rota for out-of-hours critical incidents. Provide proactive support & maintenance across the application …