Rakuten | DevOps/SRE


Site Reliability Engineer – Marketing Cloud Platform Department (MCPD)


Job Summary


Japan, Tokyo
Partial remote
Apply from Anywhere

Language requirements

English: Business
Japanese: Not required

Key skills

  • Docker
  • CI/CD
  • Prometheus

Job Description


Department Overview
The mission of the Marketing Cloud Platform Department (MCPD) is to lead Rakuten's strategy for marketing-related products and to carry out their development and implementation. We empower Rakuten's internal marketing teams by building engaging, respectful, and cost-efficient marketing platforms that put our customers at the center. Our main advantage is our ability to leverage the Rakuten Ecosystem. We provide marketing solutions such as marketing campaign management, multichannel communication, and personalization.

Why We Hire
As our business expands, we are hiring site reliability engineers to ensure the reliability of both tomorrow's new product concepts and the functions of our existing product that we believe can be put into real business use within the next one to two years.

Position Details
The site reliability engineer will be responsible for the reliability and resiliency of our systems and will carry out SRE activities across the whole software development life cycle: discussing infrastructure and operational requirements with the development and product management teams, preparing the SRE plan and strategy, and building and operating infrastructure and monitoring solutions that enhance the reliability of the Marketing Cloud ecosystem.

Responsibilities will include the following:
• Develop SLOs, SLAs, and SLIs in collaboration with stakeholders.
• Plan strategy for IT-BCP and system resiliency.
• Consult on and build infrastructure for new services.
• Maintain the infrastructure of existing services.
• Develop monitoring solutions for Marketing Cloud services.
• Build and maintain CI/CD/CT platform.
• Incident management and monitoring.
• Manage IT security.
• Infrastructure cost management.
• Guide the engineering team on SRE and DevOps best practices.

Required skills and experiences

Basic qualifications

- 5+ years of experience working as an SRE / DevOps engineer.
- Experience with SLA/SLO/SLI development.
- Knowledge of system resiliency and IT-BCP.
- Knowledge of system architecture.
- Experience maintaining and troubleshooting large-scale systems.
- Experience with CI/CD process development and operation.
- Experience with Docker and Kubernetes in production environments.
- Experience with monitoring tools (Prometheus / Grafana, etc.) and logging & alerting tools (Elasticsearch / Stack, etc.).
- A growth mindset for finding issues and solutions.
- Experience in private and public cloud operation.
- Proficiency in at least one programming language.
- Proficiency in at least one scripting language.
- Strong knowledge of the Linux operating system.

Preferred qualifications

- Experience building and operating large-scale systems.
- Experience with system development and operation.
- Experience with public cloud platforms (Azure, GCP, etc.).
- Experience with system development and operation in large teams and horizontal organizations.
- Experience guiding and coaching other site reliability engineers.
- Clear understanding of networking protocols and layers.

Job Details

Employment type: Full-time
Location: Rakuten Crimson House, 1-14-1 Tamagawa, Setagaya-ku, Tokyo 158-0094
(1-minute walk from Futakotamagawa Station on the Denentoshi Line)
Apply from: Anywhere
Remote work: Partial remote
Working hours: 9:00am to 5:30pm (on Mondays, working hours are 8:00am to 4:30pm because of the morning meeting)
Holidays:
・2 days off per week (Saturdays, Sundays, and national holidays are days off)
・10-20 days of annual paid vacation (the minimum is the number of days granted after six months of employment)
・120 days off per year
In addition: year-end and New Year holidays, paid vacation, congratulatory or condolence leave, maternity and paternity leave, etc.
*Once a year, you can take 9 to 12 consecutive days off using the long vacation (Success Vacation) system.
Employee benefits:
・Commuting allowance
・Housing allowance
・Health insurance
・Employee pension insurance
・Unemployment insurance
・Workers' accident compensation insurance
・Retirement allowance system
Supplemental education and qualification support
・English learning support (in-house TOEIC(R) IP test, English conversation, etc.)
・Career challenge system (challenge the department of your choice)
・Job return system (rehiring system for those who retired due to marriage, childbirth, nursing care, etc.), etc.
・Stock Option Plan
・Cafeteria system with three free meals
・LILO Club (preferential treatment at sports clubs, accommodations, leisure facilities, movie theaters, etc.; activities such as running, mountain climbing, and cooking, with part of the expenses paid by the company)
・Reward system
・Free English conversation lessons by native English speakers
・Support system for certification acquisition
・Qualification support system, etc.