How to write a robust system-level service: some key learnings
Scenario: Rewriting a Quartz job service.
Background: The existing service hardcoded every job's cron expression in an XML file, which made debugging and tuning individual jobs very difficult. For example, if a job runs once a day at midnight and you want to trigger it manually during the day, or you want to stop it because it has a bug, you have to shut down the web site, change the XML file, and restart. But a running web site cannot be stopped whenever you like. Hence the requirement to rewrite this basic service.
Key features:
1. The detailed configuration of each Quartz job (Java bean name, target method, cron expression, etc.) should be stored in the DB rather than in an XML file.
2. There should be a centralized job factory service, which can read job configurations from database.
3. Each job can be enabled or disabled, and the cron expression can be changed without altering the state of the running web site.
4. Each job can be manually triggered without altering the cron expression. (The same effect can be achieved by changing the cron expression itself, but then you have to restore the original expression afterwards, which is not always convenient.)
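The features above imply that each job is described by one row in the DB. A minimal sketch of such a row as a Java class follows. The class name JobConfig and all field names are assumptions made for illustration; the original service's table layout is not shown in this post.

```java
/** Hypothetical shape of one job configuration row; all names are illustrative. */
final class JobConfig {
    final String jobName;        // assumed unique key for the job
    final String beanName;       // Java bean that holds the job logic
    final String targetMethod;   // method to invoke on that bean
    final String cronExpression; // Quartz cron expression, e.g. "0 0 0 * * ?" (daily at midnight)
    final boolean enabled;       // enabled/disabled flag
    final boolean manualTrigger; // set to request one immediate run

    JobConfig(String jobName, String beanName, String targetMethod,
              String cronExpression, boolean enabled, boolean manualTrigger) {
        this.jobName = jobName;
        this.beanName = beanName;
        this.targetMethod = targetMethod;
        this.cronExpression = cronExpression;
        this.enabled = enabled;
        this.manualTrigger = manualTrigger;
    }

    /** Returns a copy with a different schedule; the other fields are unchanged. */
    JobConfig withCron(String newCron) {
        return new JobConfig(jobName, beanName, targetMethod, newCron, enabled, manualTrigger);
    }
}
```

Keeping the manual trigger as its own field, separate from the cron expression, is what allows a one-off run without touching the schedule.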
Some key learnings and critical modifications during the stabilization of the service:
1. (Not related to this topic): There are several ways to implement the Quartz job factory. One is to use reflection to instantiate a new object from the class name and call the target method; another is to fetch the Java bean from the Spring context. The first cannot be used here, because some of the objects a job refers to are initialized by the Spring framework and you cannot obtain them with a plain "new".
2. Think carefully and double-check every code branch, so that none of them runs into a dead end or an endless loop.
3. Each job has a flag in the DB indicating whether it is enabled or disabled. The job factory should load all jobs from the DB, including disabled ones. If you load only enabled jobs, a job that is already scheduled will never be disabled, because the factory never sees its row again. (For disabled jobs, check whether they are still running; if so, shut them down.)
4. There is a "manual trigger" flag. If it is set to 1, the job is triggered immediately. Reset the flag to 0 as soon as you read the job; do not wait until the task ends, because the task might take a long time.
5. Add a local config file in each environment to switch the job factory on or off, off by default, and have the job factory read the switch from this file. This matters when several developers are connected to the same DB: without the switch you can hardly trigger your own job, because the other machines are competing with you for it.
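Points 3 to 5 can be sketched as a single polling pass. This is only an illustration under stated assumptions, not the original implementation: JobRow, FakeScheduler, JobReconciler and the property key "jobfactory.enabled" are all hypothetical names, the scheduler is an in-memory stand-in rather than the Quartz API, and the sketch assumes that a manual trigger is honored only for enabled jobs.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

/** Hypothetical mutable view of one job row in the DB. */
final class JobRow {
    final String name;
    String cron;
    boolean enabled;
    boolean manualTrigger;

    JobRow(String name, String cron, boolean enabled, boolean manualTrigger) {
        this.name = name;
        this.cron = cron;
        this.enabled = enabled;
        this.manualTrigger = manualTrigger;
    }
}

/** In-memory stand-in for the scheduler; a real service would delegate to Quartz. */
final class FakeScheduler {
    final Map<String, String> scheduled = new LinkedHashMap<>(); // job name -> cron
    final List<String> firedNow = new ArrayList<>();             // manual runs, in order

    void schedule(String name, String cron) { scheduled.put(name, cron); }
    void unschedule(String name) { scheduled.remove(name); }     // no-op if absent
    void fireNow(String name) { firedNow.add(name); }
}

final class JobReconciler {
    private final boolean factoryEnabled;
    private final FakeScheduler scheduler;

    JobReconciler(boolean factoryEnabled, FakeScheduler scheduler) {
        this.factoryEnabled = factoryEnabled;
        this.scheduler = scheduler;
    }

    /** Point 5: read the local switch; a missing key means "off". */
    static boolean readSwitch(Reader localConfig) {
        Properties props = new Properties();
        try {
            props.load(localConfig);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // "jobfactory.enabled" is an assumed key name.
        return Boolean.parseBoolean(props.getProperty("jobfactory.enabled", "false"));
    }

    /** One pass over ALL rows, enabled and disabled alike (point 3). */
    void reconcile(List<JobRow> allRows) {
        if (!factoryEnabled) {
            return; // this machine has opted out of running jobs
        }
        for (JobRow row : allRows) {
            if (!row.enabled) {
                scheduler.unschedule(row.name); // stop it if it is still scheduled
                continue;                       // assumption: disabled jobs ignore manual triggers
            }
            if (!row.cron.equals(scheduler.scheduled.get(row.name))) {
                scheduler.schedule(row.name, row.cron); // new job or changed cron
            }
            if (row.manualTrigger) {
                row.manualTrigger = false;      // point 4: reset BEFORE running; real code would persist this to the DB
                scheduler.fireNow(row.name);
            }
        }
    }
}
```

Resetting the flag before firing means a long-running task cannot leave the flag at 1 while later polling passes read the same row.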