log buffer space - Tune Oracle Redo Log Buffer Size
log buffer space
Overview
The log buffer space wait event occurs when a foreground session generates redo and cannot find free space in the redo log buffer (the SGA memory structure governed by LOG_BUFFER) because LGWR (the Log Writer background process) is not writing the buffer contents to the online redo logs fast enough.
The redo log buffer is a circular ring buffer in the SGA. Foreground sessions write their redo change vectors into the buffer, and LGWR writes those entries to the online redo log files on disk. As long as LGWR keeps pace, the buffer never fills. When LGWR falls behind — due to slow I/O, an undersized buffer, or an extremely high redo generation rate — foreground sessions must wait for LGWR to free space before they can write more redo.
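The fill-and-drain dynamic described above can be sketched as a toy simulation (Python; the buffer size, tick model, and byte counts are invented for illustration and are not Oracle internals):

```python
# Toy model of the redo log buffer: foreground sessions produce redo,
# LGWR drains it. When production outpaces LGWR's drain rate, the buffer
# fills and sessions must wait for free space -- the "log buffer space"
# condition described above.
def simulate(buffer_bytes, redo_per_tick, lgwr_drain_per_tick, ticks):
    used = 0
    waits = 0
    for _ in range(ticks):
        used = max(0, used - lgwr_drain_per_tick)  # LGWR frees space
        if used + redo_per_tick > buffer_bytes:    # no room: session stalls
            waits += 1
        else:
            used += redo_per_tick                  # redo copied into buffer
    return waits

# LGWR keeps pace: the buffer never fills
print(simulate(buffer_bytes=16_000_000, redo_per_tick=1_000_000,
               lgwr_drain_per_tick=1_000_000, ticks=100))   # 0
# LGWR falls behind: sessions repeatedly wait for space
print(simulate(buffer_bytes=16_000_000, redo_per_tick=2_000_000,
               lgwr_drain_per_tick=1_000_000, ticks=100))   # many waits
```

The model makes the key point concrete: the wait count depends on the *ratio* of generation rate to drain rate, not on buffer size alone; a bigger buffer only delays the stall if LGWR is persistently slower than the workload.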
This wait is fundamentally an I/O wait on the LGWR write path, not a concurrency or locking issue. It sits at the very heart of Oracle’s write-ahead logging mechanism, and sustained waits for log buffer space directly translate to reduced transaction throughput for every DML-heavy session on the system.
log buffer space is different from log file sync (which fires when a COMMIT waits for LGWR to flush the buffer to disk) and log file parallel write (which fires inside LGWR itself when writing to redo log files). A system suffering from log buffer space waits is typically also generating high log file parallel write and log file sync waits as secondary symptoms.
When This Wait Is a Problem
Acceptable: Near-zero occurrences. Unlike many Oracle wait events, log buffer space should ideally appear zero times or at negligible frequency (less than 1 per minute) in a well-tuned system.
Investigate when:
- Any notable count of log buffer space waits appears in an AWR report
- Average wait time exceeds 1ms
- The event appears alongside high log file sync average wait times (over 5ms)
- Batch DML jobs are running slower than expected
- V$SYSSTAT shows “redo log space requests” greater than zero
Critical: If log buffer space waits account for more than 0.5% of total DB time, the redo pipeline is a bottleneck. During large bulk operations (warehouse loads, mass updates), transient spikes are more forgivable — but persistent waits during OLTP indicate a configuration problem.
The P1, P2, P3 parameters of this event are not directly useful for object-level diagnosis. Focus instead on V$SYSSTAT redo statistics and V$LOG for redo log group performance.
Diagnostic Queries
1. Confirm Log Buffer Space Waits Are Occurring
```sql
-- Check current and historical log buffer space wait statistics
SELECT event,
       total_waits,
       total_timeouts,
       ROUND(time_waited / 100, 2)  AS total_wait_secs,
       ROUND(average_wait / 100, 4) AS avg_wait_secs,
       ROUND(max_wait / 100, 2)     AS max_wait_secs
FROM   v$system_event
WHERE  event IN (
           'log buffer space',
           'log file sync',
           'log file parallel write',
           'log file switch (checkpoint incomplete)',
           'log file switch completion'
       )
ORDER  BY total_wait_secs DESC;
```

2. V$SYSSTAT — Redo Generation Rate and Buffer Usage
```sql
-- Key redo statistics for log buffer diagnosis
SELECT name,
       value,
       CASE name
           WHEN 'redo log space requests' THEN 'NON-ZERO = LOG_BUFFER TOO SMALL OR LGWR TOO SLOW'
           WHEN 'redo log space wait time' THEN 'Centiseconds waited for redo log space'
           WHEN 'redo size'                THEN 'Total redo bytes generated'
           WHEN 'redo entries'             THEN 'Number of redo entries written'
           WHEN 'redo writes'              THEN 'Number of redo writes by LGWR'
           WHEN 'redo write time'          THEN 'Centiseconds spent writing redo'
           ELSE NULL
       END AS interpretation
FROM   v$sysstat
WHERE  name IN (
           'redo log space requests',
           'redo log space wait time',
           'redo size',
           'redo entries',
           'redo writes',
           'redo write time',
           'redo blocks written'
       )
ORDER  BY name;
```

3. Current LOG_BUFFER Size and Configuration
```sql
-- Show current log buffer and related parameters
SELECT name,
       value,
       description
FROM   v$parameter
WHERE  name IN (
           'log_buffer',
           'log_checkpoint_interval',
           'log_checkpoint_timeout',
           'fast_start_mttr_target',
           'commit_logging',
           'commit_wait'
       )
ORDER  BY name;
```

4. Online Redo Log Group Status and I/O Performance
```sql
-- Redo log group status, size, and sequence
SELECT l.group#,
       l.members,
       l.status,
       l.archived,
       l.sequence#,
       ROUND(l.bytes / 1024 / 1024, 0) AS size_mb,
       l.first_time,
       l.next_time,
       ROUND((CAST(l.next_time AS DATE) - CAST(l.first_time AS DATE)) * 1440, 2) AS duration_minutes
FROM   v$log l
ORDER  BY l.sequence# DESC;
```

5. Redo Log File Placement and Write Latency
```sql
-- Redo log file locations (should be on fast dedicated storage)
SELECT lf.group#,
       lf.member AS file_path,
       lf.status,
       lf.type
FROM   v$logfile lf
ORDER  BY lf.group#;
```

6. ASH — Sessions Waiting for Log Buffer Space
```sql
-- Historical log buffer space waits from Active Session History
SELECT TO_CHAR(ash.sample_time, 'YYYY-MM-DD HH24:MI') AS sample_minute,
       COUNT(*)                       AS ash_samples,
       COUNT(DISTINCT ash.session_id) AS sessions_affected,
       MAX(ash.sql_id)                AS sample_sql_id
FROM   v$active_session_history ash
WHERE  ash.event = 'log buffer space'
AND    ash.sample_time > SYSDATE - 1
GROUP  BY TO_CHAR(ash.sample_time, 'YYYY-MM-DD HH24:MI')
ORDER  BY sample_minute DESC;
```

7. Redo Generation Rate by Session (Who Generates the Most Redo?)
```sql
-- Current session redo generation — identify high-redo sessions
SELECT s.sid,
       s.serial#,
       s.username,
       s.program,
       s.module,
       ss.value AS redo_size_bytes,
       ROUND(ss.value / 1024 / 1024, 2) AS redo_size_mb
FROM   v$session s
       JOIN v$sesstat ss ON s.sid = ss.sid
       JOIN v$statname sn ON ss.statistic# = sn.statistic#
WHERE  sn.name = 'redo size'
AND    ss.value > 1048576  -- More than 1MB of redo
AND    s.username IS NOT NULL
ORDER  BY ss.value DESC
FETCH FIRST 20 ROWS ONLY;
```

Root Causes
1. LOG_BUFFER Too Small for Redo Generation Rate
The default LOG_BUFFER size in Oracle is typically 32MB (or auto-tuned to a small value in 12c+). High-throughput OLTP or bulk DML workloads can saturate a small log buffer in milliseconds. When the buffer fills before LGWR can drain it, every new redo write stalls. Oracle’s rule of thumb: LOG_BUFFER should be large enough to hold approximately 1–3 seconds of redo at peak generation rate.
2. LGWR Cannot Write Fast Enough (Slow Storage)
Even with an adequately sized log buffer, if the underlying storage hosting the online redo log files has high write latency (spinning disk, shared SAN with contention, slow network storage), LGWR cannot drain the buffer quickly enough. Online redo logs must be written synchronously — LGWR does not return until the write completes. Latency of 10–20ms per write on busy systems can cause log buffer space waits even with generous buffer sizes.
3. Online Redo Log Files Shared with Data Files
A common configuration error: placing online redo log files on the same physical disk spindles or LUN as data files. Data file I/O and redo write I/O compete for the same bandwidth. Redo writes are latency-critical; data writes are throughput-oriented. Sharing storage degrades both.
4. Redo Log Groups Are Too Small, Causing Frequent Switches
When online redo log files are small (e.g., 50MB in a system generating 500MB/hour of redo), log switches occur many times per hour. Each log switch requires LGWR to switch to a new group and may trigger incremental checkpoints. During the switch and checkpoint activity, LGWR is busier than normal, increasing the probability of log buffer space waits.
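Using the figures above, the switch rate is simple arithmetic (a sketch; substitute your own measured redo-per-hour value):

```python
# Estimate log switch frequency from redo generation rate and redo log size.
# Example figures from the text: 50MB logs in a system generating
# 500MB/hour of redo.
def switches_per_hour(redo_mb_per_hour, log_size_mb):
    return redo_mb_per_hour / log_size_mb

print(switches_per_hour(500, 50))    # 10.0 -- ten switches/hour, too frequent
print(switches_per_hour(500, 1024))  # under 1 per hour with 1GB logs
```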
5. Extreme Bulk DML Generating Massive Redo
INSERT, UPDATE, DELETE, and MERGE operations on large row sets generate proportional redo. An ETL job inserting 100 million rows without using direct-path (nologging) operations can generate tens of gigabytes of redo in a short window, overwhelming the log buffer and LGWR pipeline regardless of configuration.
6. COMMIT_WAIT = NOWAIT or Batch Commit Settings
Some applications use COMMIT_WAIT = NOWAIT (the COMMIT call returns before LGWR confirms the write). This can mask log file sync waits but does not reduce redo generation. High-frequency tiny commits can also create LGWR pressure by triggering many small writes rather than allowing LGWR to batch writes efficiently.
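The effect of commit frequency on write count can be sketched with a simplified model (Python; it assumes each single-session COMMIT forces one LGWR write, which deliberately ignores real group-commit batching across concurrent sessions):

```python
# Simplified model: each COMMIT forces (at least) one LGWR write.
# Real LGWR group-commits concurrent sessions together, but a single
# session committing per row still issues one tiny write per row.
def lgwr_writes(rows, rows_per_commit):
    # one write per full batch, plus one for a trailing partial batch
    full, remainder = divmod(rows, rows_per_commit)
    return full + (1 if remainder else 0)

print(lgwr_writes(100_000, 1))      # 100000 -- per-row commits
print(lgwr_writes(100_000, 1_000))  # 100 -- batched commits
```

The thousandfold reduction in write count is why batching commits (where business logic allows it) relieves LGWR pressure without touching LOG_BUFFER or storage.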
Resolution Steps
Step 1: Increase LOG_BUFFER
```sql
-- Check current setting
SHOW PARAMETER log_buffer;

-- Increase LOG_BUFFER (requires restart — SPFILE parameter)
-- Recommended: start with 128MB for high-throughput OLTP,
-- or calculate as: peak_redo_per_second * 3
ALTER SYSTEM SET log_buffer = 134217728 SCOPE=SPFILE;
-- (128MB = 128 * 1024 * 1024 = 134217728 bytes)

-- Note: LOG_BUFFER cannot be changed dynamically (SCOPE=MEMORY not supported);
-- a database restart is required
```

Step 2: Move Redo Logs to Dedicated Fast Storage
```sql
-- Add new redo log members on fast dedicated storage (SSD or dedicated spindles)
-- Best practice: multiplex redo logs across different physical paths
ALTER DATABASE ADD LOGFILE MEMBER '/fast_disk1/oradata/redo01b.log' TO GROUP 1;
ALTER DATABASE ADD LOGFILE MEMBER '/fast_disk1/oradata/redo02b.log' TO GROUP 2;
ALTER DATABASE ADD LOGFILE MEMBER '/fast_disk1/oradata/redo03b.log' TO GROUP 3;

-- Drop old members on slow storage after verifying the new members
ALTER DATABASE DROP LOGFILE MEMBER '/slow_disk/oradata/redo01a.log';
```

Step 3: Increase Redo Log Group Size
```sql
-- Add new, larger redo log groups, then drop the smaller old groups
-- Target: each switch should happen no more than 2–4 times per hour
-- Calculate: size = peak_redo_per_hour / desired_switches_per_hour

-- Add new groups with a larger size
ALTER DATABASE ADD LOGFILE GROUP 4 '/fast_disk1/oradata/redo04a.log' SIZE 1G;
ALTER DATABASE ADD LOGFILE GROUP 5 '/fast_disk1/oradata/redo05a.log' SIZE 1G;
ALTER DATABASE ADD LOGFILE GROUP 6 '/fast_disk1/oradata/redo06a.log' SIZE 1G;

-- Force log switches to cycle through the new groups
ALTER SYSTEM SWITCH LOGFILE;
ALTER SYSTEM SWITCH LOGFILE;
ALTER SYSTEM SWITCH LOGFILE;

-- Drop old undersized groups (only when status = INACTIVE)
-- Check status first:
SELECT group#, status FROM v$log;
-- Then drop when INACTIVE:
ALTER DATABASE DROP LOGFILE GROUP 1;
ALTER DATABASE DROP LOGFILE GROUP 2;
ALTER DATABASE DROP LOGFILE GROUP 3;
```

Step 4: Use NOLOGGING for Bulk Loads Where Appropriate
```sql
-- For data warehouse loads that can be rebuilt from source:
-- set the table to NOLOGGING, then use a direct-path (APPEND) insert
-- to minimize redo generation. (NOLOGGING is a table attribute; it
-- cannot be specified inside the INSERT statement itself.)
ALTER TABLE target_table NOLOGGING;

INSERT /*+ APPEND */ INTO target_table
SELECT * FROM source_table;
COMMIT;

-- Restore normal logging after the load
ALTER TABLE target_table LOGGING;

-- IMPORTANT: Take a backup immediately after NOLOGGING operations;
-- NOLOGGING blocks become "unrecoverable" in a media recovery scenario
```

Step 5: Enable Asynchronous I/O for Redo Logs
```sql
-- Verify async I/O is enabled (reduces LGWR write latency)
SHOW PARAMETER disk_asynch_io;

-- Enable if not already on (requires restart)
ALTER SYSTEM SET disk_asynch_io = TRUE SCOPE=SPFILE;
```

Prevention & Tuning
Size LOG_BUFFER to hold 1–3 seconds of peak redo: Calculate your peak redo generation rate as the change in V$SYSSTAT’s “redo size” statistic across a peak interval, divided by the interval’s elapsed seconds, then multiply by 2. This gives your target LOG_BUFFER size.
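That calculation can be sketched as follows (Python; the snapshot values are hypothetical):

```python
# Derive a LOG_BUFFER target from two "redo size" snapshots taken
# elapsed_secs apart during a peak window (values here are made up).
def log_buffer_target(redo_size_start, redo_size_end, elapsed_secs, factor=2):
    peak_rate = (redo_size_end - redo_size_start) / elapsed_secs  # bytes/sec
    return peak_rate * factor  # bytes of buffer for `factor` seconds of redo

# e.g. 3GB of redo generated over a 60-second peak window
target = log_buffer_target(0, 3 * 1024**3, 60)
print(round(target / 1024**2))  # 102 -- MB target, round up to e.g. 128MB
```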
Dedicate fast storage to redo logs: Online redo logs are write-latency-critical. They must be on storage with single-digit millisecond write latency — dedicated SSD, NVMe, or a dedicated SAN LUN with a battery-backed write cache. Never share redo storage with data files or archive logs.
Size redo log groups for 3–6 switches per hour: More frequent switches mean more LGWR overhead; less frequent switches mean slower crash recovery. The recommended sweet spot is 3–6 switches per hour under normal load.
Monitor “redo log space requests” as a zero-tolerance metric: Any non-zero value for this V$SYSSTAT statistic since instance startup indicates that the log buffer has been a bottleneck. Include this in daily DBA health checks.
Consider enabling the LGWR I/O Slaves: In environments with very high redo rates and spinning disk, enabling LGWR I/O slaves can parallelize redo writes:
```sql
ALTER SYSTEM SET lgwr_io_slaves = 4 SCOPE=SPFILE;
-- (Requires restart; use cautiously — not needed with modern SSD storage.
--  Note: on recent Oracle releases this parameter may be hidden or
--  desupported; verify it exists for your version before setting it.)
```

Related Wait Events
- log file sync — A COMMIT waits for LGWR to write the redo buffer to disk; the most common redo-related end-user wait
- log file parallel write — LGWR internal wait while writing to redo log files; indicates slow storage on the redo path
- log file switch (checkpoint incomplete) — All redo log groups are in use and LGWR cannot switch because the checkpoint hasn’t advanced; requires more/larger redo log groups
- log file switch completion — Waiting for LGWR to complete switching to the next log group after the current one fills
- log file switch (archiving needed) — Archiver cannot keep up with log switches; redo log groups cannot be reused