Site Reliability Engineer II

mPulse Mobile

Software Engineering, Other Engineering
Remote
Posted on Jan 16, 2025

Job Details

Description

Position Summary:

We seek a highly skilled and motivated Site Reliability Engineer (SRE) to join our team. The ideal candidate will combine software engineering expertise with operational excellence to ensure our systems' availability, performance, and scalability. You will leverage cutting-edge tools and technologies, including AWS, Kubernetes, Docker, GitLab, and Python, to optimize and automate our infrastructure and deployment processes.

Duties/Responsibilities:

  • Design, implement, and manage scalable, highly available, and fault-tolerant systems using AWS cloud services.
  • Manage and maintain containerized applications with Kubernetes and Docker.
  • Automate deployments, monitoring, and incident response using tools like GitLab CI/CD and Python scripting.
  • Develop and maintain comprehensive monitoring, alerting, and reporting systems leveraging Grafana, Prometheus, AppDynamics, and Splunk.
  • Troubleshoot and resolve complex issues in production environments, ensuring high availability and system reliability.
  • Collaborate with development and operations teams to improve deployment pipelines, system performance, and infrastructure.
  • Create detailed documentation and runbooks to support operational workflows and incident resolution.
  • Perform capacity planning and performance analysis, and implement infrastructure optimization strategies.

Skills/Abilities/Experience:

  • Core AWS technologies including EC2, RDS, S3, Lambda
  • Infrastructure as Code (IaC) experience with tools such as Terraform or CloudFormation
  • Ansible or other configuration management technologies
  • HashiCorp Vault and Nomad
  • Experience implementing application performance monitoring tools such as New Relic or Sentry
  • Experience with Infrastructure as Code (IaC) tools such as Terraform or CloudFormation is preferred.
  • Familiarity with security best practices in a cloud-native environment is preferred.
  • Knowledge of load testing and performance tuning is preferred.
  • Experience with additional scripting languages (e.g., Bash) is preferred.

Minimum Qualifications:

  • 3+ years of experience as a Site Reliability Engineer, DevOps Engineer, or similar role.
  • Proven expertise in managing AWS cloud infrastructure (EC2, S3, RDS, Lambda, VPC, etc.).
  • Hands-on experience with Kubernetes for orchestration and Docker for containerization.
  • Proficiency in scripting and automation using Python.
  • Experience with CI/CD pipelines using GitLab or similar tools.
  • Strong knowledge of monitoring and logging tools: Grafana, Prometheus, AppDynamics, and Splunk.
  • Experience troubleshooting production systems, analyzing incidents, and implementing preventive measures.
  • Familiarity with distributed systems and microservices architecture.
  • Excellent problem-solving skills, with a focus on automation and scalability.
  • Strong communication and collaboration skills.

Physical Requirements:

  • Ability to stand and sit for extended periods.
  • Ability to lift 10 lbs.
