and build environments using Infrastructure as Code with Terraform and configuration management tools like Ansible. Automating repetitive tasks to eliminate toil and drive consistency and repeatability. Incident response and root cause analysis, supporting a blameless post-mortem culture. As a Cloud Systems Engineer, you will have a direct role in providing infrastructure services to our development environment.
and test network engineering/administration activities. Create and maintain Standard Operating Procedures (SOPs) and technical documentation. Provide follow-up reports (technical findings, feedback, and resolution steps taken) for Root Cause Analysis and process improvement initiatives. Basic Qualifications: Minimum of a Bachelor's degree in Science, with 8+ years' experience or Master's degree with 6+ years
and test network engineering/administration activities. Create and maintain Standard Operating Procedures (SOPs) and technical documentation. Provide follow-up reports (technical findings, feedback, and resolution steps taken) for Root Cause Analysis and process improvement initiatives. Qualifications: Minimum of a Bachelor's degree in Science, Technology, Engineering and Math (preferred) with 12-15 years' experience or Master
to deploy applications on Kubernetes clusters in an on-premises cloud environment. Automate repetitive tasks to eliminate toil and drive consistency and repeatability. Actively participate in incident response and root cause analysis, and support a blameless post-mortem culture. Required Skills and Qualifications: Active TS/SCI security clearance. Bachelor of Science degree in an engineering/technical discipline.
and resolving challenging issues. Collaborate with development, infrastructure, network, and other technical teams to diagnose and resolve cross-functional problems. Liaise with vendors for support when necessary. Conduct thorough root cause analysis for recurring issues. Work with problem management teams to identify long-term solutions and ensure the implementation of preventive measures. Create and update documentation, knowledge
and testing efforts to maintain software quality and performance. Support CI/CD pipelines using Jenkins and contribute to automated testing and deployment. Troubleshoot and resolve production issues, performing root cause analysis and providing timely solutions. Mentor junior engineers and share knowledge across the team to foster a collaborative working environment. Basic Qualifications: Bachelor's degree in
Stockport, Greater Manchester, North West, United Kingdom
Nexperia
Systems Support team (CIM), Operational Technology Engineers, Data Engineers, and Web Developer. Monitoring and reporting on system performance, availability, and incident response metrics. Providing leadership in incident management and root cause analysis for system-related issues, while also ensuring effective change control procedures for all changes introduced to the factory (ITIL). Managing and leading a team of
and legacy systems/technical debt activities. Collaborate with Senior Engineers to improve delivery automation and enhance DevEx and self-servicing. Align to effective incident response processes, helping with root cause analysis and problem resolution during incident management sessions. Take ownership and pride in the work you deliver; ensure what is delivered is of quality and takes
our tools and platforms. Collaborate with the team to troubleshoot and resolve issues, shadowing and learning from Mid- and Senior-level Engineers. Align to incident response processes, helping with root cause analysis and problem resolution during incident management sessions. Take ownership and pride in the work delivered; ensure what is delivered is of quality and takes into
members, stakeholders, and customers. Manage major incident bridges calmly and with experience, ensuring timely resolution, formalized communication of impact, and minimal impact to the business. Drive Lessons Learned and Root Cause Analysis (RCA) on all P1/P2 incidents and some business-impacting P3 incidents to prevent recurrence. Develop and maintain the strategy for Operational Support to
first six to twelve months you will: - Adapt and improve operations management systems and processes to accommodate rapid and increasing growth in systems and traffic - Drive expedient resolution and root cause analysis for critical issues affecting your services - Mentor and develop a team of Operational Engineers with an emphasis on providing dial tone services - Bring continuous improvement
and enhancing Cloud One Infrastructure as Code (IaC). • Responsible for daily system monitoring, security, SW Factory Cloud health, resources, and log management of AWS systems. • Troubleshooting and performing root cause analysis. This includes troubleshooting all issues with Cloud system configurations, backups, file systems, and user access. • Interface with engineers, systems engineers, and subject matter experts. • Troubleshoot networking
operational procedures. Mentor team members and contribute to a culture of learning and inclusion. Continuously improving infrastructure reliability and reducing manual work (toil). Participating in incident response and root cause analysis. Why Join Us? Join our team and contribute to a culture of innovation, collaboration, and excellence. If you are ready to advance your career and make
Develop and manage the IT operations budget. Monitor expenditures and optimize cost-efficiency. Incident, Problem, and Change Management: Oversee incident response and resolution processes. Lead problem-solving efforts and root cause analysis. Manage changes to IT systems, ensuring minimal disruption to services. Required qualifications to be successful in this role:
that cannot be addressed by First or Second Line support. You will play a key role in maintaining and improving the organisation’s IT infrastructure, performing deep-dive diagnostics and root cause analysis, and implementing long-term solutions. In addition to supporting escalated incidents, you will contribute to system design, strategic projects, and continuous service improvement. Key Responsibilities … Expert-Level Support & Issue Resolution: Take ownership of high-level, complex incidents and problems escalated from Second Line Support. Perform in-depth diagnostics and root cause analysis across infrastructure, systems, and applications. Develop and implement long-term fixes and preventative measures to reduce repeat incidents. Infrastructure Management & Improvement: Maintain, monitor, and optimise servers, storage, networking, and virtual … support role. Strong expertise in server administration, networking, virtualisation, and storage solutions. Solid understanding of IT security principles and best practices. Ability to carry out detailed troubleshooting and perform root cause analysis. Experience managing or contributing to technical projects and service improvements. Proficiency in tools such as Active Directory, Group Policy, Office 365, Exchange, and Windows Server
Newcastle Upon Tyne, United Kingdom Hybrid / WFH Options
NHS Business Services Authority
platforms, ensuring the availability and stability of NHSBSA services. Carrying out proactive support activities, such as evaluation of performance, tuning and running backup/recovery schedules. Providing troubleshooting and root cause analysis to identify issues, understand the underlying cause and suggest future improvements. Evaluating and interpreting technical data to resolve complex issues when performance is impaired. Maintaining … to clinicians, NHS bodies and citizens. 2. Carry out proactive support activities, such as evaluation of performance, tuning and running backup/recovery schedules. 3. Carry out troubleshooting and root cause analysis to identify issues, understand their underlying cause and suggest improvements for the future. 4. Carry out impact analysis to understand how change will … roles: Understanding of DevOps concepts such as version control, test automation, continuous integration, continuous deployment, infrastructure as code, containerisation, and pipeline orchestration. A strong focus on customer service. Technical root cause analysis skills. Self-motivated, with an ability to work independently as well as part of an effective team. Proactive. Desirable: Strong knowledge of a variety of
San Antonio, Texas, United States Hybrid / WFH Options
BridgePhase, LLC
systems to identify anomalous or malicious activity. Support incident response activities by conducting initial investigations and escalating issues as needed. Lead investigations into high-priority security incidents, including malware analysis and reverse engineering to determine intent and impact, and provide root cause analysis and remediation guidance to system teams. Leverage SIEM platforms and threat intelligence feeds … looking for analysts who are adaptable, curious, and eager to support cyber defense in a mission-focused environment. Preferred Experience and Qualifications: 3-5 years of experience in cybersecurity analysis or security operations, including defending AWS-hosted environments and Internet-facing web services. Hands-on experience with SIEM platforms, log analysis, and basic incident response techniques. Experience developing … detection content such as alerts, dashboards, and correlation rules to support threat monitoring. Familiarity with malware analysis and reverse engineering techniques to determine impact and intent. Ability to produce root cause analysis reports and remediation guidance following security incidents. Understanding of common cybersecurity frameworks such as RMF, NIST SP 800-53, and DISA STIGs. Working knowledge
Job Description: Support the customer in providing digital forensic analysis across various types of cases involving both mobile devices and computer systems. Must have the ability to perform forensic analysis on common operating system environments, including but not limited to Microsoft Windows, Mac OS, UNIX/Linux, and various mobile platforms (Apple, Android). Serve a tool … maintaining chain of custody for all digital evidence in accordance with Policies, NIST, and OMB standards. Execute existing forensic processes and procedures. Obtain victim evidence and provide additional forensic analysis where required, based on event/incident parameters identified by the incident managers during an event/incident response. Conduct forensic evidence collection utilizing security tools, including Splunk … and EDR solutions to correlate and analyze network sensor data with host forensic evidence. Produce technical analysis reports, including root cause analysis of analyzed hosts and/or artifacts discovered during an incident investigation. Maintain operational support of forensic capabilities, including but not limited to administration and management of forensic systems and components. Salary Range
Arlington, Virginia, United States Hybrid / WFH Options
CGI
frameworks and metrics. - Assist in developing, tracking, and refining outcomes and driver metrics, including creating driver trees and updating functional and technical data definitions. - Support cross-functional teams with root cause analysis, corrective actions, and process improvement initiatives. - Provide support for P2P forums, including preparing executive-level briefs and summaries and updating task management systems. - Monitor progress … for performance improvement initiatives through strategic communication and change management efforts. - Support cross-functional teams by applying process improvement tools and methodologies to address performance deficiencies and assist with root cause analysis. - Benchmark and incorporate industry best practices to recommend corrective actions and implementation timelines. - Assist in creating workflows, dashboards, and analytics to optimize performance management activities. … improvement frameworks such as Change Management, Lean Six Sigma, Theory of Constraints, Agile or Scrum methodologies, and/or P2P. - Experience in developing and tracking metrics, driver trees, and reporting structures, and in conducting cause-and-effect analysis. - Proven ability to conduct root cause analysis and to recommend and implement corrective action plans. - Exceptional written and verbal communication skills
City of London, London, United Kingdom Hybrid / WFH Options
REC SOLUTIONS LIMITED
with development, networks, ops and product teams on strategic IT initiatives. Assist with planning, management and resource allocation of inter-departmental projects alongside the PM team. Oversee incident management, root cause analysis, and rapid resolution of system outages or performance degradation. Ensure compliance with procedures such as change management, patch management, and security and audit processes. Assist … understanding of cybersecurity principles and experience implementing security measures in a regulated environment. Ability to coach, mentor, and upskill staff, develop career paths, and ensure team resilience. Experience undertaking root cause analysis, including prevention-orientated solution reporting. Working experience with deployment tools (e.g. GitLab pipelines) and rollback strategies. Proficiency in managing bare-metal servers, virtualization platforms such
the lifecycle, from requirements gathering through operations and maintenance. Integrate systems and applications across physical, virtual, and cloud environments (e.g., AWS, Azure, and VMware). Monitor system performance, conduct root cause analysis, and apply patches and upgrades to maintain system health. Develop and maintain documentation for system configuration, architecture diagrams, and operational procedures. Ensure compliance with security … e.g., SolarWinds, Nagios), and ticketing systems (e.g., ServiceNow, Jira). Working knowledge of networking and identity, credential, and access management (ICAM). Ability to troubleshoot complex technical issues and lead root cause investigations. Knowledge, Skills & Abilities: Knowledge of enterprise system architecture, virtualization, and cloud operations. Knowledge of systems hardening, patch management, and baseline security configurations. Skill in system troubleshooting … performance analysis, and technical documentation. Skill in scripting and automation to improve efficiency and reduce manual errors. Ability to translate complex requirements into functional system designs and implement them effectively. Ability to manage multiple systems and tasks in high-availability, fast-paced environments. Ability to work collaboratively with cross-disciplinary teams and communicate technical information clearly. Why Join Command
Lead Cost Analyst Overview: Technomics is a growing employee-owned decision analytics company that specializes in cost and economic analysis to facilitate better decisions faster. We enable a wide range of clients across the Federal government, from senior-level policy makers to program managers, to choose smartly, buy effectively and operate efficiently. We deliver practical, credible and defensible results … operating and support (O&S) cost estimating techniques for a broad range of cost elements. Develop independent or program life cycle cost estimates and accompanying risk, uncertainty and sensitivity analysis. Develop economic analyses, including but not limited to business case analyses, cost/benefit analyses and analyses of alternatives. Assess the credibility of government and industry cost estimates … economic analyses. Evaluate the credibility of industry cost proposals. Assess the credibility of government and industry cost savings initiatives. Assess industry contract cost and schedule performance and conduct variance root cause analysis. Document and present/defend analytical results. Apply leadership skills and the ability to manage competing priorities, multiple tasks and work requirements. Apply highly effective