Posted By: Anonymous
I have a use-case for which I could use spring batch job which I could design in following ways.
1) First Way:
Step1 (Chunk oriented step): Read from the file â€”> filter, validate and transform the read row into DTO (data transfer object), if there are any errors, store errors in DTO itself â€”> Check if any of the DTOs has errors , if not write to Database. If yes, write to an error file.
However, problem with this way is – I need this entire JOB in transaction boundary. So if there is a failure in any of the chunks then I donâ€™t want to write to DB and want to rollback all successful writes till that point in DB. Above way forces me to write rollback logic for all successful writes if there is a failure in any of the chunks.
2) Second way
Step 1 (Chunk oriented step): Read items from the file â€”> filter, validate and transform the read row in DTO (data transfer object). This does store the errors in the DTO object itself.
Step 2 (Tasklet): Read entire list (and not chunks) of DTOs created from step 1 â€”> Check if any of the DTOs has errors populated in it. If yes, then abort the writing to DB and fail the JOB.
In second way, I get all benefits of chunk processing and scaling. At the same time I have created transaction boundary for entire job.
PS: In both ways in their first step there wonâ€™t be any step failure, if there is failure; errors are stored in DTO object itself. Thus, DTO object is always created.
Question is – Since I am new to Spring batch, is it a good pattern to go with second way. And is there a way that I can share data between steps so that entire List of DTOs is available to second step (in second way above) ?
In my opinion, trying to process the entire file in a single transaction (ie a transaction at the job level) is not the way to go. I would proceed in two steps:
- Step 1: process the input and writes errors to the file
- Step 2: this step is conditioned by step1. If no errors has been detected in step 1, then save data to the db.
This approach does not require to write data to the database and roll it back if there are errors (as suggested by option 1 in your description). It only writes to the database when everything is ok.
Moreover, this approach does not require holding a list of items in-memory as suggested by option 2, which could be inefficient in terms of memory usage and performs poorly if the file is big.