Site Reliability Engineer

At Orange Logic, we’ve been solving complex content challenges for over two decades, driven by innovation, curiosity, and a passion for impact. Our intelligent Digital Asset Management (DAM) system, Orange Logic Platform, empowers organizations across industries to manage, access, and leverage their digital assets more effectively. We’re not just building powerful software; we’re building a team of bold thinkers, collaborators, and problem-solvers who care deeply about delivering real value.

The Site Reliability Engineer (SRE) is responsible for ensuring the availability, reliability, and optimal performance of critical platform services and applications across Orange Logic’s global infrastructure. This role involves proactive monitoring, incident response, root cause analysis, and continuous improvement to meet internal and external user requirements. The SRE will collaborate closely with infrastructure, development, and operations teams to maintain service excellence in a remote, cloud-based environment.

Essential Functions:

Application and System Support:

Monitor, administer, and troubleshoot application performance and infrastructure health using observability tools.
Analyze and resolve application issues, provide timely status updates, and perform thorough root cause investigations.
Respond to alerts, outages, and system degradations, executing recovery procedures and supporting post-incident reviews.
Deliver front-end and back-end application support, including stakeholder consultation for performance improvements.

Infrastructure Management and Automation:

Implement and manage infrastructure using Infrastructure as Code (IaC) tools such as Terraform, Ansible, or Puppet.
Administer cloud-native services (e.g., EC2, S3, RDS, Kubernetes) on AWS, Azure, or Google Cloud.
Develop and maintain automation scripts to streamline deployments, configuration management, and repetitive tasks.
Ensure consistent code migration across environments to maintain application stability and functionality.

Monitoring and Observability:

Deploy and maintain application monitoring tools such as Prometheus, Grafana, and the ELK stack.
Establish proactive alerting and visibility into system behavior to ensure rapid detection and resolution of issues.

Operations and Reliability:

Plan and execute application and configuration change procedures with minimal disruption.
Support scheduled maintenance activities including patching, updates, and server health checks.
Participate in an on-call rotation to support incident resolution during evenings and weekends.

Continuous Improvement:

Collaborate with Development, Infrastructure, and Production Support teams to optimize system performance and scalability.
Identify and implement process improvements to enhance service reliability and deployment efficiency.
Stay informed on the latest industry practices and tools related to site reliability, DevOps, and cloud infrastructure.

Qualifications:

Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
8+ years of experience in site reliability, DevOps, or production engineering roles with increasing responsibilities.
Strong knowledge of distributed systems, cloud platforms (AWS, Azure, GCP), and containerized environments (Docker, Kubernetes).
Proficiency in SQL and scripting languages (Python, Bash, PowerShell).
Extensive experience with observability stacks and automated alert systems.
Familiarity with web protocols, networking fundamentals, and API performance optimization.
Demonstrated ability to lead cross-functional initiatives and influence without authority.
Excellent verbal and written communication skills with a focus on clarity and action.
Experience mentoring engineering teams and conducting architectural or reliability reviews.

Perks of joining the team:

Competitive compensation and benefits designed to support you now and as you grow
Join a high-growth company where your work directly shapes the product, the team, and your long-term career trajectory

How to get started:

If you’re up for the challenge of joining a growing engineering team, we’d like to hear from you. Apply today!

Note:

Orange Logic only communicates with candidates through official @orangelogic.com email addresses or via LinkedIn from a verified Orange Logic Talent Acquisition employee listed under “Meet the Hiring Team.” We do not conduct interviews via text message or request payment for equipment purchases at any stage of the hiring process.

By submitting this application, I certify that all information provided herein is true, accurate, and complete to the best of my knowledge. I understand that any false or misleading information may result in disqualification from consideration or, if discovered after acceptance, may lead to immediate dismissal.

Orange Logic is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all our employees.

We respect your privacy. Learn more about how we handle applicant data in our Global Career Privacy Notice.
