Sometimes our application needs to fetch a large amount of data from database systems and do some in memory processing on top of it. The SELECT query to fetch the data may contain a WHERE clause, IN operator etc and the query may return millions of rows. Our application and the database may reside on different machines, so the data will be transferred over the network from the database to our application. For our case, we will see how to improve performance of select query using JDBC. The database we are using is oracle, though most of the points are valid for other databases as well.
- Create Index on the column/columns used in the WHERE clause of SELECT query. This will avoid the full table scan. Full table scan is a time-consuming operation.
- Set FETCH SIZE using setFetchSize() method on Statement or ResultSet to reduce unnecessary network round trips.- With Oracle, the default value is 10. If you want to retrieve 1000 rows it requires 100 network round trips between your application and database server to transfer all data. This will impact your application response time. The JDBC drivers are designed to fetch small number of rows from database to avoid any out of memory issues. Both Statement and ResultSet have setFetchSize() method. setFetchSize() gives a hint about the number of rows that should be fetched from the database each time new rows are needed. The goal is to reduce unnecessary network round trip.
- Specify only required columns in the SELECT query. Avoid using SELECT * query, if you do not need data from all the columns. This will reduce unnecessary data retrieval and transfer.
- Use proper getXXX() methods on ResultSet – ResultSet interface provides lot of getXXX() methods to get and convert database data types to java data types.
For example, getString(String columnName) returns java String object. columnName is recommended to be a VARCHAR OR CHAR type of database. But it can also be a NUMERIC, DATE etc in which case it needs to cast it to proper java data type which is expensive operation. Now consider the expense of casting on millions of rows of data. - Use PreparedStatement if you have to execute same query more than once using different parameters(in the WHERE clause etc).
- Avoid Mixed-Type Expressions in where clause. If your column emp_id is VARCHAR2 and you want to use an index on the column emp_id, but the WHERE clause looks like this:-
AND emp_id = 10010
Where 10010 is of number type, Oracle translates that expression into:-
AND TO_NUMBER(emp_id) = 10010
Any expression using a column, such as a function having the column as its argument, causes the optimizer to ignore the possibility of using an index on that column, even a unique index. If the system processes even a single row having emp_id as a string of characters that does not translate to a number, then an error is returned.
You can avoid this problem by replacing the above expression with the explicit conversion:-
AND emp_id= '10010'