Site Reliability Engineering (SRE) Lead - 2770

Guadalajara, Jalisco, México | Software Development | Full-time | Fully remote

Apply by: No close date

Seeking an Experienced Site Reliability Engineering (SRE) Lead Engineer for Exciting Remote Projects in Guadalajara, Jalisco

 

We are looking for a skilled Site Reliability Engineering (SRE) Lead Engineer with a minimum of 8 years of experience to join a dynamic team within a leading organization. This role requires deep expertise in Application Performance Monitoring (APM), Infrastructure as Code (IaC), automation, and distributed tracing using OpenTelemetry.

As the SRE Lead, you will guide the design, implementation, and continuous improvement of observability solutions, ensuring system reliability, performance, and scalability while fostering best practices in SRE and DevOps.

 

Key Responsibilities:

·         Lead the strategic development and management of observability and reliability frameworks across the organization, ensuring alignment with business goals and technical requirements.

·         Design and implement monitoring and observability solutions, collaborating with engineering teams to define standards and best practices.

·         Manage Infrastructure as Code (IaC) initiatives using Terraform, coordinating with cloud and infrastructure teams to ensure scalable and secure deployments.

·         Drive automation strategies for monitoring, alerting, and logging pipelines, focusing on process improvements and operational efficiency.

·         Develop and maintain comprehensive observability roadmaps, including distributed tracing, logging, and metrics collection strategies.

·         Collaborate with product management, sales, and pre-sales teams to provide technical expertise and support during solution design and customer engagements.

·         Lead cross-functional teams to enhance CI/CD pipelines and deployment reliability, ensuring smooth integration of observability tools and practices.

·         Engage with vendors and strategic partners to evaluate, select, and integrate observability and monitoring solutions, ensuring alignment with organizational needs and fostering strong collaborative relationships.

·         Mentor and develop junior engineers and analysts, fostering a culture of reliability, observability, and operational excellence.

 

Technical Skills Required:

·         8-10+ years of experience in SRE, Observability, or DevOps roles, with leadership responsibilities.

·         Hands-on experience with OpenTelemetry for distributed tracing and observability instrumentation.

·         Proven expertise with Application Performance Monitoring (APM) tools such as New Relic, Datadog, AppDynamics, or Dynatrace.

·         Strong proficiency in Infrastructure as Code (IaC) using Terraform.

·         Solid understanding of cloud platforms including AWS, GCP, or Azure.

·         Experience with automation/configuration management tools like Ansible, Chef, or Puppet.

·         Deep knowledge of CI/CD pipelines and tools such as GitHub Actions, Jenkins, or Azure DevOps.

·         Experience managing Kubernetes and containerized environments (Docker, Helm).

·         Familiarity with log aggregation and analysis platforms like ELK Stack or Splunk.

·         Excellent leadership, communication, and collaboration skills.

 

Location & Schedule:

  • Remote work in Guadalajara, Jalisco
  • Work hours: Monday to Friday, 09:00 – 18:00
  • Advanced English skills are mandatory

 

Benefits:

·         Attractive salary plus premium benefits.

·         Performance bonuses, grocery coupons, and a savings fund.

·         Aguinaldo (year-end bonus), vacation premium, and paid vacation.

·         SGMM (major medical expense insurance), family coverage, and life insurance.

 

Candidates must submit their resumes in English and include their compensation expectations in their applications.

 

Interested? Apply now through this link