Home » How to Use Batch in JDBC to Insert huge data

How to Use Batch in JDBC to Insert huge data

by Imran Shaikh
Published: Last Updated on 134 views
How to Use Batch in JDBC to Insert huge data

Hey. Tea Lovers! Today we will discuss batching and how to use batch in JDBC to insert huge data.

Before jumping in, I would suggest you look at my previous posts, “How to Achieve Greatness in JDBC Performance” and “JDBC Connection Pooling Explained with HikariCP“. This will help you to further increase the performance along with reliability, maintainability, and flexibility.

Why use Batch Insertion

Many of you have worked on some project where you needed to insert a large amount of data into the database. I prefer just using plain JDBC for this kind of job to give the operation a significant boost. But inserting the record one by one like we do is a very slow process. We need to wait until the data of a single record goes to the database, insert happens and then over the network, an acknowledgment occurs, phew!. Luckily in JDBC, we have batch operations. With this, we can insert the data in batches rather than one by one, giving a huge advantage.

Create and Add Data To a Batch in JDBC

Batch in JDBC is very simple. There are 2 methods to create/add the batch. One is addBatch() and another is addBatch(String sql). The first one is used in the PreparedStatement and the second one is for the Statement. Although I won’t recommend you to use the Statement for batch insertion at all, as it would be (kind of) similar to one by one. Why you say, find your answer here.

For this example, I am inserting the data in the employee table with some random data generated in the loop. In a real scenario, you would get the data in a Collection or array or from a CSV file. Create an insert SQL with the parameters to set it up later. Put it outside the try-catch because we just need a single PreparedStatement. And then set up the parameter by looping the data. And at the end of each loop, add it to the batch with addBatch() Got it? Well, let me just show you the code.

final String empInsertSql = "INSERT INTO employee (id, name, salary) VALUES(?,?,?)";
try (
        // set up the connection in another function. This is just for example
        Connection con = DriverManager.getConnection("");
        PreparedStatement pstmt = con.prepareStatement(empInsertSql)
) {
    for (int i = 0; i < 1_000_000; i++) {
        // replace with your data setup
        pstmt.setInt(1, i);
        pstmt.setString(2, "name_" + i);
        pstmt.setInt(3, i * 100);
        // add it to a batch
        pstmt.addBatch();
        // not needed but for safety reset the parameters we passed to the stmt.
        pstmt.clearParameters();
    }
} catch (EXception e) {}

As you can see, I am generating 1,000,000 of data and inserting into a batch. But hold your horses, it is not a complete picture, yet. Generally, you want to create a batch of sizes 1 to 3 thousand. More than that, I would say, not a good performance. But how to decide the batch size? Simple, just execute the batch when you reach the size. Let us see how.

Executing a Batch in JDBC

The batch itself does not have any mechanism to get the size of a batch. You need to create your own mechanism. Remember that index we are using in the loop? Yes, that variable name i. We will be using that. You check whether the current value of i is the perfect size for the batch. If yes, then execute the batch with the help of executeBatch(). This method returns an array of int, which have the result of each statement in the same order they were added to the batch. Anything more than 0, is a successful execution.

There is a similar method named executeLargeBatch() to, well, a large batch.

The following code is to check the i and execute it with the batch of 2000. And check the result for any failures.

if (i % 2000 == 0){
    int[] results = pstmt.executeBatch();
    // loop to check the result of each record
    for (int k = 0; k < results.length; k++) {
        // if it is a minus value, then the execution failed for that record
        if (results[k] <= 0) {
            // if it failed for this record do something
        }
    }
}

This needs to be added inside the loop. But wait, what of the remainder, suppose I have 2300 records, then the remaining batch of 300 records will be executed? Unfortunately, NO. You just have to add repeat the block outside the loop as well, without that condition of course.

// to insert whatever is remained in the batch
int[] results = pstmt.executeBatch();
// loop to check the result of each record
for (int k = 0; k < results.length; k++) {
    // if it is a minus value, then the execution failed for that record
    if (results[k] <= 0) {
        // if it failed for this record do something
    }
}

This will execute the last batch. Now let us see how the full code looks like.

Batch Insertion in JDBC Full Code

The code for batch insertion.

final String empInsertSql = "INSERT INTO employee (id, name, salary) VALUES(?,?,?)";
try (
        // set up the connection in another function. This is just for example
        Connection con = DriverManager.getConnection("");
        PreparedStatement pstmt = con.prepareStatement(empInsertSql)
) {
    for (int i = 0; i < 1_000_000; i++) {
        // replace with your data setup
        pstmt.setInt(1, i);
        pstmt.setString(2, "name_" + i);
        pstmt.setInt(3, i * 100);
        pstmt.addBatch();
        pstmt.clearParameters();
        // I a using the batch of 2000 records at a time
        if (i % 2000 == 0) {
            int[] results = pstmt.executeBatch();
            // loop to check the result of each record
            // need index as we might have to identify the failed record
            // with some calculation with batch and i's value
            for (int k = 0; k < results.length; k++) {
                // if it is a minus value, then the execution failed for that record
                if (results[k] <= 0) {
                    // if it failed for this record do something
                }
            }
        }
    }
    // to insert whatever is remained in the batch
    int[] results = pstmt.executeBatch();
    // loop to check the result of each record
    for (int k = 0; k < results.length; k++) {
        // if it is a minus value, then the execution failed for that record
        if (results[k] <= 0) {
            // if it failed for this record do something
        }
    }
    System.out.println("The data is inserted successfully with batches of 2000");
} catch (SQLException e) {
    e.printStackTrace();
}

The code I mentioned is just for getting you to familiarize with the batches and you might want to modify the connection object and set it up to run the code.

Conclusion

I know, that the code I mentioned needs modification to run. But this block is highly customizable depending on your data and table. Don’t use the code as-is, identify the data, and modify accordingly.

In batches you just need to make sure few points are meeting and you are all set.

  • Use PreparedStatement and reset the parameters after adding to the batch.
  • Add only 1000-3000 records in a single batch
  • Execute the batch once the batch is complete.
  • Once again execute the batch outside the loop to execute the last batch.

That’s it for this post. I hope you liked the post and will use the batch every time you are working on a huge amount of data migration. But before that, I highly recommend you to use the tips and code standards I described in the following posts.

You can check the whole code describe here on GitHub or Full Project.

If you have any questions or want to suggest something, please free to add a comment, we will be happy to respond and help.

See you in the next post. HAKUNA MATATA!!

This website uses cookies to improve your experience. We'll assume you're ok with this, but you can opt-out if you wish. Accept Read More

Privacy & Cookies Policy