Batch processing

Spring Batch: Running A Batch Job Explained

To run a batch job, a JobLauncher needs a Job (the job configuration) and JobParameters. Together, the Job and its identifying JobParameters define a JobInstance.

Each run of a JobInstance creates a JobExecution. If the job fails and is restarted, it retains the same JobInstance but creates a new JobExecution.

A JobExecution runs one or more steps, and each step run is recorded in the BATCH_STEP_EXECUTION table.

So each new run of a batch job with new identifying parameters creates a new job instance, and each restart of that instance creates a new job execution.
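
The relationship between instances and executions can be sketched as a small in-memory model. This is a simplified illustration in plain Java, not Spring Batch's own API: the class `JobRunModel` and its methods are hypothetical, and the real framework keeps these records in its metadata tables rather than in maps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of how job instances and executions relate.
final class JobRunModel {

    // Key: job name plus identifying parameters; value: instance id
    private final Map<String, Long> instances = new HashMap<>();
    // Key: instance id; value: ids of the executions created for it
    private final Map<Long, List<Long>> executions = new HashMap<>();
    private long nextInstanceId = 1;
    private long nextExecutionId = 1;

    // The same name and parameters reuse the instance, but every launch
    // records a new execution. Returns {instanceId, executionId}.
    long[] launch(final String jobName, final String parameters) {
        final long instanceId =
                instances.computeIfAbsent(jobName + "|" + parameters, k -> nextInstanceId++);
        final long executionId = nextExecutionId++;
        executions.computeIfAbsent(instanceId, k -> new ArrayList<>()).add(executionId);
        return new long[] {instanceId, executionId};
    }

    // Number of executions recorded for an instance
    int executionCount(final long instanceId) {
        final List<Long> ids = executions.get(instanceId);
        return ids == null ? 0 : ids.size();
    }

    public static void main(final String[] args) {
        final JobRunModel model = new JobRunModel();
        final long[] first = model.launch("myJob", "date=2024-01-01");   // first run
        final long[] restart = model.launch("myJob", "date=2024-01-01"); // restart after a failure
        final long[] nextDay = model.launch("myJob", "date=2024-01-02"); // new parameters

        System.out.println("restart keeps instance: " + (first[0] == restart[0]));
        System.out.println("restart gets new execution: " + (first[1] != restart[1]));
        System.out.println("new parameters get new instance: " + (nextDay[0] != first[0]));
    }
}
```

The model leaves out one rule of the real framework: it does not track status, so it would also accept a relaunch of an instance that already completed successfully.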

Spring Batch: Creating A Custom Job Builder

Batch jobs are created in Spring Batch using the JobBuilderFactory. A custom job builder factory can be created and re-used to allow common job configurations to be shared across all batch jobs in a project.

Example

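@Configuration
public class CustomJobBuilderFactory extends JobBuilderFactory {

    // Listener shared by every job (assumed to be a bean defined elsewhere)
    private final JobExecutionListener commonListener;

    public CustomJobBuilderFactory(final JobRepository jobRepository,
                                   final JobExecutionListener commonListener) {
        super(jobRepository);
        this.commonListener = commonListener;
    }

    @Override
    public JobBuilder get(final String name) {
        // Register everything that is common to all jobs here
        return super.get(name).listener(commonListener);
    }
}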

Implementation


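// The class name JobConfig is a placeholder for your own job configuration class
@Configuration
public class JobConfig {

    @Bean
    public Job exampleJob(final CustomJobBuilderFactory jbf, final Step myStep) {
        // This job now has the common listener
        return jbf.get("myJob").start(myStep).build();
    }
}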

Spring Batch: Trigger an Action When A Batch Job Fails

When a batch job fails, it’s useful to automatically send a notification or trigger some other action.

Spring Batch allows registering a listener in the job configuration. Below is an example of a listener that triggers an action after a job. Using the job execution, the action can be set to fire only when the job did not complete successfully.


@Component
public class AfterJobListener implements JobExecutionListener {

    @Autowired
    private NotificationService notificationService;

    @Override
    public void beforeJob(final JobExecution jobExecution) {
        // Nothing to do before the job starts
    }

    @Override
    public void afterJob(final JobExecution jobExecution) {
        // Notify only when the job did not finish successfully
        if (jobExecution.getStatus() != BatchStatus.COMPLETED) {
            notificationService.sendNotification(); // pass data about the job here
        }
    }
}

Spring Batch: Restart A Failed Batch Job Using an API Endpoint

A key thing to remember in Spring Batch is that a job instance can only be restarted if its last execution did not complete successfully, for example because it FAILED. If the job COMPLETED successfully, a new instance of the job has to be created to run it again.

When restarting batch jobs manually, it is helpful to create an API endpoint for restarting failed batch jobs.

When a batch job is run the first time, it is assigned a job_instance_id (in the batch_job_instance table) and a job_execution_id (in the batch_job_execution table).

Spring Batch: Chunk Size

In Spring Batch, when configuring a step you can set the chunk size.

This is a critical and often overlooked setting.

When developing locally, it’s difficult to catch performance problems because local data sets are typically very small. Once deployed, however, the performance problems can be crippling.

The chunk size determines how many records are processed before a commit is triggered.

So if your chunk size is set to 10 and you have 1 million records to process, the application will trigger 100,000 commits. Because each commit carries transaction overhead, this will be very slow.
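
The arithmetic can be checked with a short sketch. The helper `ChunkMath.commits` is hypothetical and only counts one commit per chunk; it ignores any extra bookkeeping transactions the framework itself may perform.

```java
// Hypothetical helper: estimates how many commits a step issues for a given chunk size.
final class ChunkMath {

    // One commit per chunk; a final partial chunk still needs its own commit,
    // so the division is rounded up.
    static long commits(final long records, final int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        return (records + chunkSize - 1) / chunkSize;
    }

    public static void main(final String[] args) {
        final long records = 1_000_000L;
        for (final int chunkSize : new int[] {10, 100, 1_000}) {
            System.out.println("chunk size " + chunkSize + " -> "
                    + commits(records, chunkSize) + " commits");
        }
    }
}
```

For 1 million records this prints 100000, 10000 and 1000 commits respectively. A larger chunk size reduces commit overhead, but each chunk is held in memory and rolled back as a unit on failure, so the right value depends on the workload.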

Spring Batch: Running Batch Jobs Asynchronously

When running multiple batch jobs in Spring Batch, the default JobLauncher runs jobs synchronously: it does not launch the next job until the one currently running has finished. In order to run multiple batch jobs simultaneously, the JobLauncher must be configured to run jobs asynchronously.

Configure the async job launcher


@Bean
public JobLauncher customJobLauncher(final JobRepository jobRepository) throws Exception {
    final SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
    jobLauncher.setJobRepository(jobRepository);

    // Here we configure the async task executor
    jobLauncher.setTaskExecutor(new SimpleAsyncTaskExecutor());
    jobLauncher.afterPropertiesSet(); // declared to throw Exception
    return jobLauncher;
}

Run a job with the async job launcher

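@Component
public class BatchJobScheduler {

    @Autowired
    private JobLauncher customJobLauncher;

    @Autowired
    private Job customJob;

    // Fires once a day at 01:00
    @Scheduled(cron = "0 0 1 * * *")
    public BatchStatus scheduleJob() throws JobExecutionException {
        // With the async launcher, run() returns right away, usually before the job has finished
        final JobExecution ex = customJobLauncher.run(customJob, getJobParameters());
        return ex.getStatus();
    }

    JobParameters getJobParameters() {
        // Build the job parameters; the "launchTime" parameter is only an illustrative placeholder
        return new JobParametersBuilder()
                .addLong("launchTime", System.currentTimeMillis())
                .toJobParameters();
    }
}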
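
Why the task executor matters can be illustrated without Spring at all. The sketch below is a hypothetical plain-Java demo (the class `LaunchModeDemo` is not part of Spring Batch): a synchronous executor runs the "job" on the launching thread, which therefore cannot launch anything else until the job is done, while an asynchronous executor hands the job to another thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo, not Spring Batch code: shows where a "job" runs
// depending on the executor used to launch it.
final class LaunchModeDemo {

    // Returns true if the job ran on the thread that launched it.
    static boolean runsOnCallerThread(final Executor executor) {
        final Thread caller = Thread.currentThread();
        final AtomicReference<Thread> jobThread = new AtomicReference<>();
        final CountDownLatch done = new CountDownLatch(1);

        executor.execute(() -> {
            jobThread.set(Thread.currentThread()); // the "job" only records its thread
            done.countDown();
        });

        try {
            done.await(); // wait for the job so the result is deterministic
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the job", e);
        }
        return jobThread.get() == caller;
    }

    public static void main(final String[] args) {
        final Executor sync = Runnable::run; // runs the task inline, like a synchronous executor
        final ExecutorService async = Executors.newSingleThreadExecutor();
        try {
            System.out.println("sync runs on caller thread: " + runsOnCallerThread(sync));
            System.out.println("async runs on caller thread: " + runsOnCallerThread(async));
        } finally {
            async.shutdown();
        }
    }
}
```

The first line prints `true` and the second `false`. The same distinction applies to the job launcher: with a synchronous executor the launch call blocks for the whole job, and with an asynchronous one it returns while the job is still running.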

Spring Batch: Limit How Frequent a Batch Job Runs

When designing batch jobs, you often want to limit how frequently a job can run. For example, if you want a batch job to run once a day, you can use the @Scheduled annotation to schedule it once a day. This is enough if you only run one instance of the batch program and there are no other ways to launch the job.

But if you have manual ways to launch the same job, or you are using Kubernetes to run multiple instances of your batch program, the batch job may be launched multiple times in one day, even if you intend it to run only once a day.
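
One common way to guard against this, offered here as an assumption rather than as the approach the text goes on to use, is to pass the run date as an identifying job parameter. Every launch attempt on the same day then resolves to the same JobInstance, and Spring Batch refuses to rerun an instance that has already completed (the launcher throws JobInstanceAlreadyCompleteException). The sketch below models that idea in plain Java; `DailyRunGuard` is a hypothetical class, and its in-memory set stands in for the shared job repository database.

```java
import java.time.LocalDate;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of "one completed run per job per day".
final class DailyRunGuard {

    // Stands in for the job repository: keys of instances that already completed
    private final Set<String> completedKeys = new HashSet<>();

    // Returns true if the job is allowed to run, false if a run for the
    // same job and date has already been recorded.
    boolean tryRun(final String jobName, final LocalDate runDate) {
        // The run date plays the role of the identifying job parameter
        final String key = jobName + "|runDate=" + runDate;
        return completedKeys.add(key);
    }

    public static void main(final String[] args) {
        final DailyRunGuard guard = new DailyRunGuard();
        final LocalDate day = LocalDate.of(2024, 1, 1);

        System.out.println("scheduled run allowed: " + guard.tryRun("dailyJob", day));
        System.out.println("manual rerun same day allowed: " + guard.tryRun("dailyJob", day));
        System.out.println("run next day allowed: " + guard.tryRun("dailyJob", day.plusDays(1)));
    }
}
```

This prints `true`, `false`, `true`. The in-memory set only protects a single process; across several application instances the check has to live in shared storage, which is the role the job repository plays in a real deployment.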

Spring Batch: Sharing Batch Step Configurations

When defining Spring Batch steps, the same configuration is often added to almost every step. To avoid violating DRY (Don’t Repeat Yourself), a StepBuilder can be customized upstream with all the shared configuration.

Configuring the StepBuilder


@Configuration
public class SharedStepBuilderFactory extends StepBuilderFactory {

    // Inject shared values here and use them as well; the property name is a placeholder
    @Value("${batch.log-directory}")
    private String logDirectory;

    public SharedStepBuilderFactory(final JobRepository jobRepository,
                                    final PlatformTransactionManager transactionManager) {
        super(jobRepository, transactionManager);
    }

    @Override
    public StepBuilder get(final String name) {
        // Get the default step builder
        final StepBuilder stepBuilder = super.get(name);

        // Add listeners you want for EVERY step
        // (StepLoggingListener and getCustomLoggingListener are assumed to be defined elsewhere)
        stepBuilder.listener(new StepLoggingListener());
        stepBuilder.listener(getCustomLoggingListener(logDirectory));
        return stepBuilder;
    }
}

Implementation

@Configuration
public class StepConfig {

    @Bean
    public Step getStepOne(final SharedStepBuilderFactory sharedStepBuilderFactory) {
        return sharedStepBuilderFactory
                .get("stepOne")
                // A chunk size must be set before the reader and writer;
                // the item types and the value 100 are illustrative
                .<String, String>chunk(100)
                .reader(stepOneReader())
                .writer(stepOneWriter())
                .build(); // The listeners are already configured for this step from the SharedStepBuilderFactory
    }
}

Spring Batch: Query All The Steps of a Batch Job

In Spring Batch, to get the job_execution_id of the most recent job execution for a given job name, use this query:

select bje.job_execution_id from batch_job_instance bji
join batch_job_execution bje
on bji.job_instance_id = bje.job_instance_id
where bji.job_name = 'jobname'
order by bje.start_time desc
limit 1;

To get all the steps of the most recent job execution for a given job name, use this query:

select *
from batch_job_execution bje
join batch_job_instance bji
on bje.job_instance_id = bji.job_instance_id
join batch_step_execution bse
on bse.job_execution_id = bje.job_execution_id
and bje.job_execution_id = 
(
    select bje.job_execution_id from batch_job_instance bji
    join batch_job_execution bje
    on bji.job_instance_id = bje.job_instance_id
    where bji.job_name = 'jobname'
    order by bje.start_time desc
    limit 1
)
order by bse.start_time;