Site Reliability Engineer

Job Locations 
US-NY-New York
US-CA-Santa Monica
US-CA-San Francisco
Infrastructure & Platform Services

More information about this job


MediaMath’s strength is in numbers. Our technology analyzes 200 billion customer opportunities daily, more volume than the top 10 stock exchanges in the world combined.


We believe consumers want to have meaningful conversations with their favorite and yet-to-be-discovered brands across all digital touchpoints. Our omnichannel, integrated programmatic platform unites digital media and big data to maximize the return on every marketing dollar spent by making advertising relevant, personalized, measurable, and controllable. From inventing the DSP category in 2007 to being named a DMP Forrester Challenger (our first year participating in the DMP Wave!) in 2017, we continue to deliver results for marketers more quickly and accurately than any other solution.

Technology is changing the way brands interact with consumers. MediaMath is powering that change. Come be a part of it!


We are seeking a Site Reliability Engineer to help our internal engineering teams create and maintain fast, scalable, secure, and reliable systems. You will be responsible for ensuring that the services you support are healthy, built on best practices, and thoroughly documented (with proper SLOs defined and alerted on); that they have clear on-call escalation policies; and that operational runbooks are created and tested on a regular cadence. You will also participate in incident management and postmortems. You are an arbiter of technical excellence. You care deeply about your users and their continued happiness. You want a deep understanding of technical issues when they arise, and you work diligently to understand them and drive them to resolution.


  • Manage the scalability, performance, and availability of MediaMath's RTB bidding platform by solving for reliability across existing systems and services spanning the entire stack.
  • Develop tools and automation to minimize delivery time and increase developer productivity.
  • Participate in the design and development of new and evolving services, architecture, and performance standards.
  • Participate in and strongly influence capacity planning and service performance analysis and tuning.
  • Influence the development of best practices for deployment, monitoring, and alerting.
  • Support team members in the development of a SOA strategy and migration path.
  • Respond to and resolve emergent issues. Be on-call periodically as part of a shared team.
  • Begin to mentor and coach junior team members.
  • You are considered a “security employee”: your role has a particularly noteworthy security aspect, and you are required to undergo additional training annually.
  • Administer and ensure logical security in carrying out all job duties.
  • Support security incident response and monitoring, as needed.

This is not an exhaustive list of responsibilities. As part of our global technology team, you may be required to work off-hours or be on-call on a rotating basis. Other duties may be assigned, as needed. MediaMath retains the right to change job duties at any time.


The top qualification for this role, above all else, is a strong desire to be part of something big, where input is encouraged and results are rewarded.




  • 5-7 years of relevant work experience, including experience in a high-volume, production distributed systems environment.


  • Linux administration/operating system internals (filesystems, syscalls, cgroups, etc.)
  • Networking experience (routing, SDNs, network topologies)
  • Experience in one or more of the following: Perl, Ruby, Go, C, C++, Scala, Java
  • Experience with any of the following applications: Foreman, Apache, HAProxy, Prometheus, Kubernetes, Graphite, Kafka, Redis, Cassandra
  • Ability to break down large bodies of work into manageable tasks

Highly Desired:

  • Extensive working experience with Linux systems (Debian-based).
  • Familiarity with cloud infrastructure, such as AWS.
  • High-level shell fluency + one or more scripting languages (Python, Perl, or similar).
  • Experience managing and deploying full stack, distributed services.
  • Experience with container technologies (Docker, Vagrant, LXC, etc.).
  • Experience with system automation tools (Ansible, Chef, Puppet, Salt Stack, etc.).
  • Experience with monitoring, alerting, and pipeline analysis tools (Nagios, Sensu, Graphite, Riemann, Logstash, etc.).
  • Excellent analytical skills, coupled with a strong sense of ownership, urgency and drive.
  • Experience with queuing/data-pipelining solutions (Storm, RabbitMQ, Amazon Kinesis, ZeroMQ, Kafka, etc.).
  • Experience with SQL/NoSQL systems such as PostgreSQL, MongoDB, Redis, Cassandra, DynamoDB, etc.


MediaMath is privately held, employee owned, and headquartered in New York. Mathletes enjoy: Company equity. Performance Bonus. Comprehensive Insurance. Global Internal Mobility. Open Paid Time Off, Philanthropy and Holidays. 401(k) match. Paid Parental Leave. Cell Phone Reimbursement. Modern office space. Onsite Fitness & Wellness.

If there might be a match, you'll be scheduled for a first-round interview: a 30-minute phone call with our recruiting team so we can get a better understanding of why you are interested in MediaMath and why you think it's a fit. We do our best to respond to everyone; however, due to the volume of applications received, only those selected for interviews will be contacted. If you really think we’ve missed the mark, please follow up and let us know why you’re the perfect fit!