ORA-15040 Diskgroup Is Incomplete - ASM Storage Redundancy Error
ORA-15040: Diskgroup Is Incomplete
Section titled “ORA-15040: Diskgroup Is Incomplete”Error Overview
Section titled “Error Overview”Error Text: ORA-15040: diskgroup is incomplete
This error occurs when ASM (Automatic Storage Management) detects that a diskgroup does not have sufficient disks to maintain the required redundancy level. The diskgroup is missing one or more disks, making it incomplete and potentially compromising data redundancy. This is a critical error that requires immediate attention to prevent data loss.
Understanding ASM Redundancy
Section titled “Understanding ASM Redundancy”Redundancy Levels
Section titled “Redundancy Levels”ASM Redundancy Types├── EXTERNAL REDUNDANCY│ ├── No ASM mirroring│ ├── Relies on hardware RAID│ └── Cannot tolerate disk loss├── NORMAL REDUNDANCY│ ├── 2-way mirroring│ ├── Requires minimum 2 failure groups│ └── Can tolerate 1 failure group loss└── HIGH REDUNDANCY ├── 3-way mirroring ├── Requires minimum 3 failure groups └── Can tolerate 2 failure group losses
Failure Group Concepts
Section titled “Failure Group Concepts”-- Failure groups isolate disk failures-- Disks in same failure group share failure characteristics-- ASM mirrors data across failure groups, not within
Common Causes
Section titled “Common Causes”Disk Failures
Section titled “Disk Failures”- Physical disk failures
- Disk controller failures
- Storage array problems
- Disk path failures
Configuration Issues
Section titled “Configuration Issues”-- Insufficient disks for redundancy level-- Failure group imbalance-- Disk drop operations incomplete-- Forced dismount operations
Environmental Problems
Section titled “Environmental Problems”- Storage network disruptions
- Operating system disk access issues
- Multipath configuration failures
- Power or hardware failures
Administrative Actions
Section titled “Administrative Actions”- Accidental disk drops
- Incomplete rebalance operations
- Failed disk additions
- Diskgroup alterations gone wrong
Diagnostic Steps
Section titled “Diagnostic Steps”Assess Diskgroup Status
Section titled “Assess Diskgroup Status”-- Connect to ASM instancesqlplus / as sysasm
-- Check diskgroup completenessSELECT name, state, type, total_mb, free_mb, required_mirror_free_mb, usable_file_mb, offline_disksFROM v$asm_diskgroupWHERE state != 'MOUNTED' OR offline_disks > 0;
-- Detailed diskgroup informationSELECT * FROM v$asm_diskgroup WHERE name = 'DATA';
Identify Missing Disks
Section titled “Identify Missing Disks”-- Find offline or missing disksSELECT dg.name as diskgroup_name, d.path, d.name as disk_name, d.mount_status, d.header_status, d.state, d.failgroupFROM v$asm_disk d, v$asm_diskgroup dgWHERE d.group_number = dg.group_number(+)AND (d.mount_status != 'CACHED' OR d.state != 'NORMAL')ORDER BY dg.name, d.failgroup, d.path;
-- Check for former member disksSELECT path, header_status, state, name, failgroupFROM v$asm_diskWHERE header_status = 'FORMER';
Analyze Failure Groups
Section titled “Analyze Failure Groups”-- Check failure group distributionSELECT dg.name as diskgroup_name, d.failgroup, COUNT(*) as disk_count, SUM(d.total_mb) as total_mb, AVG(d.total_mb) as avg_disk_mbFROM v$asm_disk d, v$asm_diskgroup dgWHERE d.group_number = dg.group_numberGROUP BY dg.name, d.failgroupORDER BY dg.name, d.failgroup;
-- Check redundancy violationsSELECT name, type, offline_disks, required_mirror_free_mbFROM v$asm_diskgroupWHERE offline_disks > 0;
Review Alert Log
Section titled “Review Alert Log”# Check ASM alert log for disk errorstail -1000 $ORACLE_BASE/diag/asm/+asm/+ASM/trace/alert_+ASM.log | grep -E "ORA-|WARNING|ERROR"
# Look for disk offline messagesgrep -i "offline\|missing\|incomplete" $ORACLE_BASE/diag/asm/+asm/+ASM/trace/alert_+ASM.log
Resolution Steps
Section titled “Resolution Steps”1. Restore Missing Disks
Section titled “1. Restore Missing Disks”Bring Disks Online
Section titled “Bring Disks Online”-- Check offline disksSELECT path, name, mount_status, header_status, stateFROM v$asm_diskWHERE state = 'OFFLINE' OR mount_status != 'CACHED';
-- Bring specific disk onlineALTER DISKGROUP data ONLINE DISK data_0001;
-- Bring all offline disks onlineALTER DISKGROUP data ONLINE ALL;
-- Force disk onlineALTER DISKGROUP data ONLINE DISK '/dev/oracleasm/disks/DISK1' FORCE;
Fix Disk Access Issues
Section titled “Fix Disk Access Issues”# Check disk visibilityls -la /dev/oracleasm/disks/
# Scan for disksoracleasm scandisksoracleasm listdisks
# Fix permissions if neededchown grid:asmadmin /dev/oracleasm/disks/*chmod 660 /dev/oracleasm/disks/*
# Restart ASM disk driveroracleasm stoporacleasm start
2. Force Mount Incomplete Diskgroup
Section titled “2. Force Mount Incomplete Diskgroup”Mount with Available Disks
Section titled “Mount with Available Disks”-- Force mount (use with caution)ALTER DISKGROUP data MOUNT FORCE;
-- Check mount statusSELECT name, state, type, offline_disksFROM v$asm_diskgroup WHERE name = 'DATA';
-- If mounted, check data redundancySELECT * FROM v$asm_fileWHERE group_number = (SELECT group_number FROM v$asm_diskgroup WHERE name = 'DATA')AND redundancy_lowered = 'Y';
Mount and Restore Redundancy
Section titled “Mount and Restore Redundancy”-- Mount restricted for maintenanceALTER DISKGROUP data MOUNT RESTRICTED FORCE;
-- Add replacement disksALTER DISKGROUP data ADD DISK '/dev/oracleasm/disks/NEWDISK1' NAME data_0005, '/dev/oracleasm/disks/NEWDISK2' NAME data_0006REBALANCE POWER 10;
-- Monitor rebalanceSELECT * FROM v$asm_operation;
3. Drop Missing Disks
Section titled “3. Drop Missing Disks”Remove Permanently Failed Disks
Section titled “Remove Permanently Failed Disks”-- Drop specific offline diskALTER DISKGROUP data DROP DISK data_0001 FORCE;
-- Drop multiple disksALTER DISKGROUP data DROP DISK data_0001, data_0002 FORCE;
-- Drop disks and add replacements in one operationALTER DISKGROUP data DROP DISK data_0001, data_0002 ADD DISK '/dev/oracleasm/disks/NEWDISK1', '/dev/oracleasm/disks/NEWDISK2'REBALANCE POWER 10;
4. Reduce Redundancy Level (Last Resort)
Section titled “4. Reduce Redundancy Level (Last Resort)”Convert to Lower Redundancy
Section titled “Convert to Lower Redundancy”-- WARNING: This reduces data protection-- From NORMAL to EXTERNAL redundancy-- Requires diskgroup recreation
-- First, backup diskgroup metadataSELECT dbms_metadata.get_ddl('DISKGROUP', 'DATA') FROM dual;
-- Create new diskgroup with external redundancyCREATE DISKGROUP data_new EXTERNAL REDUNDANCYDISK '/dev/oracleasm/disks/DISK1', '/dev/oracleasm/disks/DISK2';
-- Migrate data (requires downtime)-- Use RMAN or Data Pump to move data
Advanced Recovery Procedures
Section titled “Advanced Recovery Procedures”Diskgroup Metadata Recovery
Section titled “Diskgroup Metadata Recovery”# Backup current metadatamd_backup -b /backup/asm_metadata_backup.xml -g DATA
# Analyze diskgroup metadataamdu -diskstring '/dev/oracleasm/disks/*' -dump 'DATA' -report /tmp/amdu_report.txt
# Extract critical files if neededamdu -diskstring '/dev/oracleasm/disks/*' -extract 'DATA.256' -output /recovery/
Force Disk Discovery
Section titled “Force Disk Discovery”-- Clear disk header status cacheALTER SYSTEM SET asm_diskstring = '' SCOPE=MEMORY;ALTER SYSTEM SET asm_diskstring = '/dev/oracleasm/disks/*' SCOPE=MEMORY;
-- Force disk header re-readALTER DISKGROUP data CHECK DISK ALL REPAIR;
-- Validate disk headersSELECT path, header_status, state, total_mbFROM v$asm_diskWHERE path LIKE '/dev/oracleasm/disks/%'ORDER BY path;
RAC Environment Recovery
Section titled “RAC Environment Recovery”# Check disk status on all nodesfor node in $(olsnodes); do echo "=== Node: $node ===" ssh $node " export ORACLE_HOME=/u01/app/19.0.0/grid export ORACLE_SID=+ASM${node##*node} \$ORACLE_HOME/bin/sqlplus -s / as sysasm << EOF SELECT inst_id, path, header_status FROM gv\\\$asm_disk WHERE header_status != 'MEMBER' ORDER BY path;EOF "done
# Force mount on all nodessrvctl stop diskgroup -g DATA -fsrvctl start diskgroup -g DATA -n node1
Monitoring and Prevention
Section titled “Monitoring and Prevention”Diskgroup Completeness Monitoring
Section titled “Diskgroup Completeness Monitoring”-- Create monitoring tableCREATE TABLE asm_diskgroup_health ( check_time TIMESTAMP, diskgroup_name VARCHAR2(30), redundancy_type VARCHAR2(20), total_disks NUMBER, offline_disks NUMBER, missing_disks NUMBER, health_status VARCHAR2(20));
-- Monitoring procedureCREATE OR REPLACE PROCEDURE check_diskgroup_completeness ASBEGIN INSERT INTO asm_diskgroup_health SELECT SYSTIMESTAMP, name, type, (SELECT COUNT(*) FROM v$asm_disk WHERE group_number = dg.group_number), offline_disks, (SELECT COUNT(*) FROM v$asm_disk WHERE group_number = dg.group_number AND state != 'NORMAL'), CASE WHEN state != 'MOUNTED' THEN 'CRITICAL' WHEN offline_disks > 0 THEN 'WARNING' ELSE 'HEALTHY' END FROM v$asm_diskgroup dg;
-- Alert on issues FOR rec IN ( SELECT name, offline_disks FROM v$asm_diskgroup WHERE offline_disks > 0 OR state != 'MOUNTED' ) LOOP DBMS_OUTPUT.PUT_LINE('ALERT: Diskgroup ' || rec.name || ' has ' || rec.offline_disks || ' offline disks'); END LOOP;
COMMIT;END;/
Automated Recovery Script
Section titled “Automated Recovery Script”#!/bin/bash# Automated diskgroup recovery
ORACLE_HOME=/u01/app/19.0.0/gridORACLE_SID=+ASMexport ORACLE_HOME ORACLE_SID
check_and_recover_diskgroup() { local dg_name=$1
echo "Checking diskgroup $dg_name..."
# Check diskgroup status DG_STATUS=$($ORACLE_HOME/bin/sqlplus -s / as sysasm << EOF SET PAGESIZE 0 FEEDBACK OFF HEADING OFF SELECT state || ':' || offline_disks FROM v\$asm_diskgroup WHERE name = '$dg_name';EOF )
STATE=$(echo $DG_STATUS | cut -d: -f1) OFFLINE=$(echo $DG_STATUS | cut -d: -f2)
if [ "$STATE" != "MOUNTED" ] || [ "$OFFLINE" -gt 0 ]; then echo "Diskgroup $dg_name needs recovery (State: $STATE, Offline: $OFFLINE)"
# Attempt online all disks $ORACLE_HOME/bin/sqlplus -s / as sysasm << EOF ALTER DISKGROUP $dg_name ONLINE ALL;EOF
# If still issues, try force mount if [ "$STATE" != "MOUNTED" ]; then echo "Attempting force mount..." $ORACLE_HOME/bin/sqlplus -s / as sysasm << EOF ALTER DISKGROUP $dg_name MOUNT FORCE;EOF fi
# Send alert echo "Diskgroup $dg_name recovery attempted" | \ else echo "Diskgroup $dg_name is healthy" fi}
# Check all diskgroupsfor DG in DATA FRA REDO; do check_and_recover_diskgroup $DGdone
Predictive Disk Failure Monitoring
Section titled “Predictive Disk Failure Monitoring”-- Monitor disk performance metricsCREATE OR REPLACE VIEW v_disk_health_metrics ASSELECT d.path, d.name as disk_name, dg.name as diskgroup_name, d.reads, d.writes, d.read_time, d.write_time, ROUND(d.read_time/NULLIF(d.reads,0),2) as avg_read_time, ROUND(d.write_time/NULLIF(d.writes,0),2) as avg_write_timeFROM v$asm_disk d, v$asm_diskgroup dgWHERE d.group_number = dg.group_numberAND d.mount_status = 'CACHED';
-- Alert on slow disksSELECT * FROM v_disk_health_metricsWHERE avg_read_time > 20 OR avg_write_time > 20;
Emergency Procedures
Section titled “Emergency Procedures”Critical Data Extraction
Section titled “Critical Data Extraction”#!/bin/bash# Extract critical files from incomplete diskgroup
DISKGROUP="DATA"OUTPUT_DIR="/emergency/recovery"
echo "Starting emergency data extraction from $DISKGROUP..."
# Create output directorymkdir -p $OUTPUT_DIR
# Extract control filesamdu -diskstring '/dev/oracleasm/disks/*' \ -extract "$DISKGROUP.*.ctl" \ -output $OUTPUT_DIR \ -report $OUTPUT_DIR/control_files.txt
# Extract datafilesamdu -diskstring '/dev/oracleasm/disks/*' \ -extract "$DISKGROUP.*.dbf" \ -output $OUTPUT_DIR \ -report $OUTPUT_DIR/datafiles.txt
# Extract online redo logsamdu -diskstring '/dev/oracleasm/disks/*' \ -extract "$DISKGROUP.*.log" \ -output $OUTPUT_DIR \ -report $OUTPUT_DIR/redo_logs.txt
echo "Extraction completed to $OUTPUT_DIR"
Diskgroup Rebuild Procedure
Section titled “Diskgroup Rebuild Procedure”-- Document current configurationSET LONG 20000SET PAGESIZE 0SPOOL diskgroup_config.sqlSELECT 'CREATE DISKGROUP ' || name || ' ' || type || ' REDUNDANCY' || CHR(10) || 'DISK' || CHR(10) || disk_list || CHR(10) || 'ATTRIBUTE' || CHR(10) || attributes || ';'FROM ( SELECT dg.name, dg.type, LISTAGG(CHR(9) || '''' || d.path || ''' NAME ' || d.name, ',' || CHR(10)) WITHIN GROUP (ORDER BY d.disk_number) as disk_list, LISTAGG(CHR(9) || '''' || a.name || ''' = ''' || a.value || '''', ',' || CHR(10)) WITHIN GROUP (ORDER BY a.name) as attributes FROM v$asm_diskgroup dg JOIN v$asm_disk d ON dg.group_number = d.group_number LEFT JOIN v$asm_attribute a ON dg.group_number = a.group_number WHERE dg.name = 'DATA' GROUP BY dg.name, dg.type);SPOOL OFF
Best Practices
Section titled “Best Practices”- Monitor disk health proactively
- Maintain spare disks for quick replacement
- Use failure groups properly
- Regular diskgroup integrity checks
- Implement automated recovery procedures
- Test force mount procedures
- Keep redundancy appropriate to criticality
Related Errors
Section titled “Related Errors”- ORA-15001: Diskgroup does not exist or is not mounted
- ORA-15017: Diskgroup cannot be mounted
- ORA-15041: Diskgroup space exhausted
- ORA-15042: ASM disk is missing
Troubleshooting Checklist
Section titled “Troubleshooting Checklist”- Identify all missing/offline disks
- Check disk accessibility at OS level
- Verify disk permissions and ownership
- Attempt to bring disks online
- Try force mount if necessary
- Add replacement disks if available
- Monitor rebalance operations
- Verify data redundancy after recovery