Machine Learning Engineer
| Overview
Money Forward is developing a variety of services for individuals and corporations to realize our vision, “Becoming the financial platform for all”. We are also working to promote the effective use of data. To better meet our customers' needs going forward, we are actively strengthening our development capabilities by applying AI/ML technology to the main services of each department. We are looking for a passionate MLOps Platform Engineer to drive the operation and implementation strategies of AI/ML projects alongside our ML engineering development team. As part of our team, you will help integrate and optimize ML technologies to enhance user experiences across Money Forward's many services.
| Attractive Points
In this role, you will work at the forefront of the latest technologies in container orchestration, cloud services, and CI/CD pipelines to enable efficient development, training, and deployment of ML models. You will have the autonomy to design and implement optimization strategies, to operate and maintain scalable, robust infrastructure tailored for ML projects, and to empower ML engineers throughout the MLOps cycle. Working alongside our team of talented, experienced ML engineers, you will gain valuable insights in a diverse and dynamic environment.
| Responsibilities
- As an MLOps Platform Engineer, you will play a critical role in enabling our team of ML engineers to develop, train, and deploy ML projects efficiently, using the latest technologies in container orchestration, cloud services, and CI/CD pipelines for data collection, model training, and production monitoring
- Build and maintain scalable infrastructure for executing ML projects, with a commitment to results and user value
- Design, develop, maintain, and manage container orchestration using Kubernetes
- Design and execute strategies for optimizing GPU usage, prediction servers, and data and training pipelines to ensure efficient resource use
- Design and build inference platforms that are reliable and high-performing
- Provision and monitor infrastructure resources
- Build and maintain ML workflows and pipelines
- Deploy and maintain monitoring services for observability
- Ensure compliance with security best practices
| Requirements
- Bachelor's degree in Computer Science, Engineering, or a related field
- 3+ years of experience building core infrastructure for ML projects
- 2+ years of experience implementing AI/ML algorithms, refining and improving models, and integrating them into production services
- Experience managing, designing, implementing, and maintaining robust ML infrastructure that supports development and inference workloads, ML workflows, training pipelines, and versioning
- Experience building and scaling machine learning infrastructure
- Experience with AWS cloud services
- Experience using Kubernetes to deploy and manage containerized applications with high availability and performance
- Experience running and scaling inference clusters
- Experience with Terragrunt or Terraform, infrastructure as code (IaC), and CI/CD practices
- Comfortable taking over legacy projects for operation and maintenance
- Proficiency in Python or Go
- Excellent problem-solving skills and the ability to work in a dynamic environment
- Effective communication skills for collaborating with technical and non-technical team members
| Preferred Qualifications
- Master's degree in Computer Science, Engineering, or a related field
- Proficiency with Kubeflow and MLflow for workflows and pipelines
- Experience designing, developing, and operating large-scale AI/ML systems
- Certifications in AWS (MLS-C01), Kubernetes (CKA), or relevant technologies
- Experience with other cloud services
- Contributions to open-source projects
- Experience improving model performance, including AI/ML model refinement and fine-tuning
- Knowledge of data security standards for handling personal information, financial/accounting data, PCI DSS, etc., and experience designing, developing, and operating systems in accordance with these requirements
| Ideal Candidate
- A shared belief in Money Forward's Mission/Vision/Values/Culture
- Able to communicate proactively across organizational boundaries and resolve complex inter-organizational issues on their own initiative
- Able to recognize and leverage the potential of Money Forward's data
- Finds joy and satisfaction in solving business problems with AI/ML technology
| Tech Stack
- AI/ML: SageMaker, TensorFlow, PyTorch, Kubeflow, etc.
- AWS: SageMaker, EKS, ECS, Lambda, Step Functions, etc.
- Middleware: Docker, Terraform, Kubernetes
- Programming Languages: Python, Go, etc.
| Tools
- Groupware: Google Workspace
- Repository Management: GitHub
- CI/CD: CircleCI, GitHub Actions
- Monitoring: Datadog, Grafana
- Communication: Slack
- Ticket Management: Jira