- Develop tools to enhance system monitoring and tune alerts to reduce false alarms.
- Join the 24/7 on-call rotation to investigate production incidents. Investigation may include sampling transactions, searching logs, reviewing code, contacting external channels, and escalating issues, followed by executing the remediation plans.
- Document incident details for tracking, and collect statistics to periodically evaluate system health and refine the remediation plans.
- Write scripts to diagnose and repair issues.
- Deploy system services and carry out their releases.
- Perform configuration and development work to integrate new channels and features.
Required Skills and Qualifications
- Bachelor's degree or diploma in Computer Science or a related technical discipline, or equivalent practical experience.
- At least 1 year of relevant experience.
- Familiarity with at least one of the following programming languages: Java, Go, C++, C#, or Python.
- Familiarity with web servers and networking.
- Good analytical thinking and troubleshooting skills.
- Good communication skills to liaise with stakeholders from different business units.
- Knowledge of Go and Grafana is a plus.
- Backend server development experience is a plus.
Claudia Kueh Kee Jinq, EA License No.: 02C3423, Personnel Registration No.: R1880247