Exadata - Part 3: Storage Maintenance

Managing an Exadata server is a great way to go from being a normal DBA to being a great DMA (Database Machine Administrator) and to get into the nitty-gritty details of storage administration. This tip covers some Exadata storage maintenance jobs, how to manage them, and which logs to check.

I support a very I/O-intensive Data Warehouse that builds every night. Its run time is fairly consistent, except when it is impacted by two Exadata storage maintenance jobs: the Exadata Battery Learn Cycle and the Exadata Hard Disk Scrubbing. Note that Exadata Hard Disk Scrubbing is different from ASM Disk Scrubbing.

Exadata Battery Learn Cycle

The Exadata Battery Learn Cycle runs once per quarter to perform a discharge and charge of the disk controller battery. During the maintenance, the hard disk controller's write cache changes from Write-Back to Write-Through. In Write-Back mode, a write is acknowledged as soon as it reaches the controller cache. This is safe in case of power loss, as the battery backup preserves the cached writes until they can be committed to the Hard Disk. In Write-Through mode, a write is acknowledged only after it has been written to the Hard Disk, which is significantly slower than writing to the cache first.

Logs and Schedule

The effects of Write-Through mode can be seen in the database alert log errors under high I/O.

ORA-27626: Exadata error: 2201 (IO canceled due to slow/hung disk)
NOTE: ASM has redirected some slow reads to mirror sides to improve performance.
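
To get a feel for how often these messages appear during a maintenance window, the alert log can be scanned for them. The following is a minimal Python sketch, not an Oracle-supplied tool: the function name and pattern labels are hypothetical, and the two patterns are simply copied from the excerpt above.

```python
import re
from collections import Counter

# Patterns taken from the alert log excerpt in this article.
# The labels are illustrative names, not Oracle terminology.
PATTERNS = {
    "ORA-27626": re.compile(r"ORA-27626"),
    "ASM mirror redirect": re.compile(r"ASM has redirected some slow reads"),
}


def count_slow_io_messages(lines):
    """Count how many lines match each slow-I/O pattern."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


# Sample input: the two lines shown in the excerpt above.
sample = [
    "ORA-27626: Exadata error: 2201 (IO canceled due to slow/hung disk)",
    "NOTE: ASM has redirected some slow reads to mirror sides to improve performance.",
]
counts = count_slow_io_messages(sample)
print(dict(counts))
```

In practice the lines would come from the database alert log file rather than a hard-coded list, for example by passing an open file object to the function.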

You can connect to the cell nodes and run the list alerthistory command at the CellCLI prompt.

CellCLI> list alerthistory
15_1 2016-10-17T04:00:31-07:00 info "The HDD disk controller battery is performing a learn cycle. Battery Serial Number : 1234 Battery Type : ibbu08 Battery Temperature : 27 C Full Charge Capacity : 1349 mAh Relative Charge : 98% Ambient Temperature : 18 C"
15_2 2016-10-17T05:13:51-07:00 clear "All disk drives are in WriteBack caching mode. Battery Serial Number : 1234 Battery Type : ibbu08 Battery Temperature : 29 C Full Charge Capacity : 1348 mAh Relative Charge : 71% Ambient Temperature : 18 C"
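
The gap between the "info" and "clear" alerts shows how long the cell spent in Write-Through mode. As a small sketch, assuming the two timestamps have already been pulled out of the alert history by hand (the function name is hypothetical), Python can do the subtraction:

```python
from datetime import datetime


def learn_cycle_duration(start_iso, clear_iso):
    """Return the time between the learn-cycle 'info' and 'clear' alerts."""
    start = datetime.fromisoformat(start_iso)
    end = datetime.fromisoformat(clear_iso)
    return end - start


# Timestamps copied from alerts 15_1 and 15_2 above.
duration = learn_cycle_duration(
    "2016-10-17T04:00:31-07:00",
    "2016-10-17T05:13:51-07:00",
)
print(duration)  # 1:13:20
```

For this cell, the learn cycle lasted about an hour and a quarter, which is the window during which the nightly build would see slower writes.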

By default, the BBU learn cycle runs at 2 a.m. on the 17th of every third month (Jan/April/July/Oct). The schedule can be viewed and modified at the CellCLI prompt by running:

CellCLI> list cell attributes bbuLearnCycleTime
2017-04-17T02:00:00-07:00

CellCLI> alter cell bbuLearnCycleTime='2017-04-17T02:00:00-07:00';
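
When planning batch windows, it can help to list the upcoming learn cycles. The sketch below is only an illustration: it assumes the schedule simply advances by three months from the configured bbuLearnCycleTime, as described above, and it keeps the UTC offset fixed rather than adjusting for daylight saving time. The function name is hypothetical.

```python
from datetime import datetime


def next_learn_cycles(start_iso, count=4, months=3):
    """List `count` learn-cycle times, starting at start_iso, one every `months` months.

    Assumes the day of month exists in every target month
    (true for the default 17th) and keeps the UTC offset unchanged.
    """
    current = datetime.fromisoformat(start_iso)
    result = []
    for _ in range(count):
        result.append(current)
        month_index = current.month - 1 + months
        current = current.replace(
            year=current.year + month_index // 12,
            month=month_index % 12 + 1,
        )
    return result


# Start value copied from the bbuLearnCycleTime output above.
schedule = next_learn_cycles("2017-04-17T02:00:00-07:00")
for when in schedule:
    print(when.isoformat())
```

The printed dates can then be compared against the calendar of nightly builds, backups, or month-end batch runs.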

Exadata Disk Scrubbing

A subtler Exadata Maintenance job is the bi-weekly Disk Scrub. This job does not appear in the CellCLI alert history. It only appears in the $CELLTRACE/alert.log.

Disk Scrubbing is designed to periodically validate the integrity of the mirrored ASM extents and thus eliminate latent corruption. The scrubbing is supposed to run only when average I/O utilization is under 25 percent. However, it can still cause spikes in utilization and latency and adversely affect database I/O. Oracle documentation says that a 4TB high capacity hard disk can take 8-12 hours to scrub, but I have seen it run more than 24 hours. Normally, this isn't noticeable as it runs quietly in the background. However, if you have a high I/O workload, the additional 10-15 percent latency is noticeable.

Logs and Schedule

The $CELLTRACE/alert.log on the cell nodes reports the timing and results.

Wed Jan 11 16:00:07 2017
Begin scrubbing CellDisk:CD_11_xxxxceladm01.
Begin scrubbing CellDisk:CD_10_xxxxceladm01.

Thu Jan 12 15:12:37 2017
Finished scrubbing CellDisk:CD_10_xxxxceladm01, scrubbed blocks (1MB):3780032, found bad blocks:0
Thu Jan 12 15:42:02 2017
Finished scrubbing CellDisk:CD_11_xxxxceladm01, scrubbed blocks (1MB):3780032, found bad blocks:0
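
Because the begin and finish messages sit under separate timestamp lines, working out how long each cell disk took means pairing them up. The following Python sketch does that for log text in the format shown above. It is an illustration under stated assumptions, not an Oracle utility: the function name and regular expressions are hypothetical and were written against this excerpt only.

```python
import re
from datetime import datetime

# Timestamp layout used in the excerpt, e.g. "Wed Jan 11 16:00:07 2017".
TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %Y"
BEGIN_RE = re.compile(r"Begin scrubbing CellDisk:(\S+?)\.$")
FINISH_RE = re.compile(
    r"Finished scrubbing CellDisk:(\S+), "
    r"scrubbed blocks \(1MB\):(\d+), found bad blocks:(\d+)"
)


def scrub_report(lines):
    """Pair begin/finish messages per cell disk and compute elapsed time."""
    current_time = None  # most recent timestamp line seen
    started = {}         # cell disk name -> begin time
    report = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        try:
            current_time = datetime.strptime(line, TIMESTAMP_FORMAT)
            continue
        except ValueError:
            pass  # not a timestamp line, so treat it as a message
        begin = BEGIN_RE.search(line)
        if begin:
            started[begin.group(1)] = current_time
            continue
        finish = FINISH_RE.search(line)
        if finish and finish.group(1) in started:
            disk = finish.group(1)
            elapsed = current_time - started[disk]
            report[disk] = {
                "elapsed": elapsed,
                "mb_per_second": int(finish.group(2)) / elapsed.total_seconds(),
                "bad_blocks": int(finish.group(3)),
            }
    return report


# The alert.log excerpt from above.
log_text = """\
Wed Jan 11 16:00:07 2017
Begin scrubbing CellDisk:CD_11_xxxxceladm01.
Begin scrubbing CellDisk:CD_10_xxxxceladm01.

Thu Jan 12 15:12:37 2017
Finished scrubbing CellDisk:CD_10_xxxxceladm01, scrubbed blocks (1MB):3780032, found bad blocks:0
Thu Jan 12 15:42:02 2017
Finished scrubbing CellDisk:CD_11_xxxxceladm01, scrubbed blocks (1MB):3780032, found bad blocks:0
"""
report = scrub_report(log_text.splitlines())
for disk, stats in sorted(report.items()):
    print(disk, stats["elapsed"], round(stats["mb_per_second"], 1), stats["bad_blocks"])
```

For the excerpt above, both cell disks took a little under 24 hours, which works out to roughly 45 MB per second of scrubbing on each disk.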

You can connect to the cell nodes and alter the start time and view the interval at the CellCLI prompt:

CellCLI> alter cell hardDiskScrubStartTime='2017-01-21T08:00:00-08:00';
CellCLI> list cell attributes name,hardDiskScrubInterval
biweekly

ASM Disk Scrubbing

ASM Disk Scrubbing performs a similar task to Exadata Disk Scrubbing. It searches the ASM blocks and repairs logical corruption using the mirror disks. The big difference is that ASM Disk Scrubbing is run manually at the disk group or file level and can be seen in the V$ASM_OPERATION view and the alert_+ASM.log.

Logs and Views

The alert_+ASM.log on the database node reports the command and duration.

Mon Feb 06 09:03:58 2017
SQL> alter diskgroup DBFS_DG scrub power low
Mon Feb 06 09:03:58 2017
NOTE: Start scrubbing diskgroup DBFS_DG
Mon Feb 06 09:03:58 2017
SUCCESS: alter diskgroup DBFS_DG scrub power low

Summary

These storage maintenance tasks are not exclusive to Exadata, but rather are common across storage vendors. A great DBA will be aware of the storage maintenance windows and schedule other high-I/O activity, such as RMAN backups or batch jobs, around them to keep the database running smoothly at peak performance.

About the Author

Andrew Meade has been an Oracle DBA for 15 years working in both the financial and higher education sectors. He has presented at Oracle OpenWorld and COLLABORATE. Meade is an Oracle Certified Professional for 9i, 10g, and 11g.