Debugging heavy load on Oracle databases

Rama Satya Diwakar Grandhi
4 min readFeb 23, 2022

--

Oracle database

A lot of enterprises rely on the Oracle database for their data layer. Although the licenses are costly, Oracle provides a proven product in terms of performance and scalability and very good support, so it turns out to be a good trade-off. However, every product will have certain limits. If Oracle is being used to serve data by busy applications, the number of parallel database connections will often be the cause of bottlenecks. The high number can lead to high CPU usage on the Oracle side. Also, it can starve other applications from getting connections and lead to functional issues. So, it becomes critical for developers to understand not just the number of connections their applications are consuming but also how effectively they are being used.

Before we dive in, there is one pre-requisite for the below analysis to work. We need to ensure that an appropriate value is set for the module attribute. One of the ways to do it is during the connection creation. The application can set the initSql attribute to call dbms_application_info.set_module('<module-name>','<action-name>') . This will help us map the database connection to a certain application while looking at the oracle database.

All the established database connections

Moving on to the queries, we can get the crux of the information from the “gv$session” and “gv$sql” views. The “gv$session” view can provide us with both the active and inactive connections. If we are debugging our specific application’s performance, we can tune out the rest and add the where clause for the module attribute.

select status, count(1) 
from gv$session
where module = '<module-name>'
and type = 'USER'
group by status;

All the active connections

The next important thing to understand is what these connections are doing on the database. In some cases, we will see long-running queries i.e. active connections which are running for a long time. The reasons can be numerous and I have listed a few here.

  1. The query is working with a large amount of data coming from multiple tables.
  2. The code is not using an optimized set of queries to achieve the functionality. For example, if we have a method to insert a single record and if someone thought of re-using this method by calling it in a loop, it will perform fine for small data but take a long time for large data.
  3. This connection is blocked from performing the required inserts or updates by another connection.

We can see all the currently active connections with the below query. The value in the last_call_et column indicates how long this connection has been running (in seconds). In the case of 1 and 2, we can debug further by picking up the sql_id and investigating the code or the amount of data present in those tables. In the case of 3, the other attributes event, blocking_instance, blocking_session, seconds_in_wait will give more information about which other session is blocking it and how much time has elapsed in waiting.

select last_call_et, module, sql_id, event, blocking_instance, blocking_session, seconds_in_wait 
from gv$session
where module = '<module-name>'
and type = 'USER'
and status = 'ACTIVE'
order by last_call_et desc, module, sql_id, event, blocking_instance, blocking_session, seconds_in_wait;
select * from gv$sql where sql_id = '<above-sql-id>';

All the inactive connections

After looking at the active connections, we can now inspect the inactive connections. From the below query, we can see the number of inactive connections, the time for which the connection has been idle, and finally, the query that was last executed as well.

select last_call_et, module, prev_sql_id  
from gv$session
where module = '<module-name>'
and type = 'USER'
and status = 'INACTIVE';

select * from gv$sql where sql_id = '<above-prev-sql-id>';

Notes:

  1. The standard views to check for database performance are v$session, v$sql, and others. The “v$” views work well if the database is hosted on a single node. However, if the database is split into multiple nodes, we need to look at the “gv$” views. The “INST_ID” column will tell us which exact node is fulfilling the request.
  2. Interpreting the value of last_call_et depends on the value of gv$sql_plan.optimizer . The default value is “ALL_ROWS” but there is an option to change it. In the case of “ALL_ROWS”, the last_call_et will reflect the total time elapsed for this call. However, if we are using “FIRST_ROWS” or “FIRST_ROWS_N”, this value will represent the time elapsed only for the numbers of rows returned in each operation.

This will help us understand the cause for load from a database perspective. For understanding how to debug the load from the application perspective, we need to look at the database pool connection properties. Please refer to my other article — https://diwakargrandhi.medium.com/understanding-the-database-connection-pool-dbcp-properties-ae0747f0962b. Thanks.

--

--