VoIP Operations & Production Release Engineer

About the Role

We are looking for an experienced Operations & Production Release Engineer to own the day-to-day stability, security, and evolution of our VoIP and telecommunications platform.

This is a hands-on hybrid or remote role that combines production release management (planning, governance, execution) with deep operational ownership of a real-time voice infrastructure built on open-source SIP stacks, session border controllers, and AWS cloud services.

You will be the technical authority responsible for ensuring that releases reach production safely, that the platform stays up, that calls connect, and that we are never the weakest link in our customers’ communications stack. You will work closely with Engineering, Development, Support, and Security.

Key Responsibilities
Production Release Management

· Own the end-to-end release lifecycle: change intake, risk assessment, scheduling, CAB approval, execution, validation, and rollback.

· Maintain the release calendar across staging and production environments and coordinate maintenance windows with carriers, customers, and internal stakeholders.

· Define and enforce release governance: change tickets, deployment runbooks, pre/post-checks, and back-out plans for every change.

· Drive continuous improvement of CI/CD pipelines (e.g., GitLab CI, Jenkins, GitHub Actions) for SIP application code, dial-plan logic, and infrastructure-as-code.

· Conduct post-release reviews and feed lessons learned back into engineering.

VoIP Platform Operations

· Operate, tune, and troubleshoot Kamailio as the SIP edge/registrar/proxy: dispatcher, dialog, permissions, auth, accounting, TLS, NAT traversal, RTPengine integration, and scripting in KEMI or the native configuration language.

· Operate FreeSWITCH as the media/B2BUA/IVR/conferencing layer: dialplans, modules (mod_sofia, mod_conference, mod_xml_curl), ESL, codec negotiation, and transcoding.

· Manage Session Border Controllers (Oracle/Acme Packet, AudioCodes, Sansay, or equivalent): signalling and media policy, topology hiding, codec interworking, registration throttling, DoS protection.

· Tune call-handling capacity (CPS, concurrent sessions), monitor jitter/packet loss/MOS, and resolve one-way audio, ghost calls, registration storms, and SIP loops.

SIP Troubleshooting & Diagnostics

· Perform live and forensic SIP tracing using HOMER/sipcapture, sngrep, Wireshark, ngrep, tcpdump, and pcap analysis.

· Diagnose interop issues across SIP variants and edge-case dialog flows (REFER, re-INVITE, UPDATE, early media, SDP renegotiation).

· Build dashboards and alerting on call quality, ASR, ACD, NER, and SIP response-code distributions.

Telco Infrastructure & Carrier Operations

· Manage SIP trunks and interconnects with upstream carriers and tier-1 wholesalers.

· Own carrier onboarding, IP whitelisting, codec/profile alignment, and signalling testing.

· Coordinate with carriers on incident triage, MOS degradation, FAS, and trunk failover.

Routing, Numbering & Number Management

· Maintain and evolve LCR (Least Cost Routing) logic, prefix tables, and routing policies across multiple carriers.

· Handle DID/number provisioning, porting (LRN/LNP) workflows, E.164 normalisation, CLI and P-Asserted-Identity handling, and STIR/SHAKEN where applicable.

· Manage number inventory, regulatory tagging, and emergency services routing (E911 / 999 / 112) where in scope.

Cloud & Infrastructure (AWS)

· Operate the production estate on AWS: EC2, VPC peering, Transit Gateway, Elastic IPs, Route 53, S3, RDS/Aurora, CloudWatch, IAM, KMS, Systems Manager.

· Maintain infrastructure as code (Terraform/CloudFormation) and configuration management (Ansible).

· Plan capacity for real-time workloads: instance sizing, ENA/SR-IOV networking, placement groups, and dedicated tenancy where required for media performance.

Databases

· Administer MySQL/MariaDB/PostgreSQL clusters underpinning Kamailio, FreeSWITCH, CDR, and provisioning systems.

· Manage replication, backups, schema migrations during releases, and tuning for high-write CDR/accounting workloads.

· Operate Redis and time-series stores used for session state, rate limiting, and metrics where applicable.

Security

· Harden SIP edges against fraud and abuse: toll-fraud detection, brute-force registration protection, geo-fencing, fail2ban/Kamailio pike, anti-flood, ACLs, and rate limiting.

· Manage firewalls at the network and host level (AWS Security Groups, NACLs, iptables/nftables, perimeter firewalls).

· Operate TLS for SIP signalling and SRTP for media: certificate lifecycle, cipher policy, and mTLS where used.

· Support security audits, vulnerability management, patch cycles, and incident response.

· Maintain alignment with relevant frameworks (ISO 27001, SOC 2, GDPR, PCI-DSS as applicable).

Monitoring, Observability & Incident Response

· Maintain monitoring and alerting across the stack (Prometheus, Grafana, CloudWatch, Zabbix, HOMER, Sipwise, or equivalent).

· Run incident response: lead bridge calls, drive RCA, write postmortems, and close the loop with engineering on permanent fixes.

· Define and report on SLOs/SLAs: availability, ASR, post-dial delay, and MOS.

Essential Skills & Experience

· 5+ years in a senior operations, SRE, or release management role on a real-time voice/VoIP platform at scale.

· Strong hands-on expertise with Kamailio and FreeSWITCH (or similar) in production, including reading/writing configs, implementing logic, and debugging under pressure.

· Deep, practical knowledge of SIP (RFC 3261 and major extensions), SDP, RTP/SRTP, codecs (G.711, G.729, Opus, AMR-WB), and common interop pitfalls.

· Demonstrable experience operating SBCs in production (any major vendor).

· Strong experience with SIP tracing tooling: HOMER, sngrep, Wireshark.

· Proven AWS production experience, including infrastructure-as-code with Terraform or CloudFormation.

· Solid Linux administration (RHEL/CentOS/Rocky or Debian/Ubuntu), networking (TCP/IP, NAT, BGP basics, MPLS awareness), and shell scripting.

· Database administration experience (MySQL/MariaDB or PostgreSQL) including replication and tuning.

· Experience leading change management and production releases in a regulated or carrier-grade environment.

· Strong written communication: runbooks, RCAs, change records, and customer-facing incident notes.

Desirable

· Experience with WebRTC, Janus, or Jitsi.

· Exposure to STIR/SHAKEN, KYC, and number-validation regimes.

· Container and orchestration experience (Docker, Kubernetes/EKS) for stateless components.

· Experience with HOMER/Heplify deployments at scale.

· Carrier-grade interconnect experience (SIP-I/SIP-T, SS7/Sigtran awareness).

· Programming in Python, Go, or Lua for automation and Kamailio scripting.

· Experience operating UCaaS, CCaaS, or CPaaS platforms.

· Relevant certifications: AWS (SA Pro/SysOps), Linux (RHCE/LFCE), security (CISSP, CompTIA Security+).

Personal Attributes

· Calm under pressure: voice outages happen at 3am, and customers notice immediately.

· Disciplined about change control without being bureaucratic.

· Comfortable owning a production environment end-to-end.

· A genuine troubleshooter who enjoys taking a pcap apart to find the one malformed header.

What We Offer

· Competitive salary and bonus

· Pension

· Healthcare

· On-call allowance

· Training and certifications

· Hybrid / remote working arrangements

Job Details

Company: StableLogic
Location: Greater London, England, United Kingdom (hybrid / remote options)