Direct I/O


What is Direct I/O?

As a generic term, 'direct I/O' refers to filesystem I/O that does not pass through the OS-level page cache. Direct I/O is not the default for any filesystem under Solaris, and it can be a huge performance enabler for certain workloads. On the other hand, filesystem features such as read pre-fetching, write coalescing, and write deferral all depend on OS-level I/O buffering, and certain operations and applications may depend on those features for performance. Different implementations of direct I/O have different implications. Ultimately, the 'goodness' of direct I/O for any specific application will depend on the application as well as the implementation.

The 'direct I/O' functionality on non-UFS filesystems is significantly different from that of UFS Direct I/O, since only UFS Direct I/O bypasses POSIX single-writer locking. Other filesystems may have their own means of bypassing this lock, but those means are usually distinct from enabling their direct I/O features. For example, QFS offers 'Q-writes', and Veritas VxFS offers 'Quick I/O' (QIO) and 'Oracle Disk Manager' (ODM) features that achieve a similar effect.

UFS Direct I/O uses a code path very similar to the one used for raw disk access, but rather than bypassing the file system completely, it simply bypasses the file system cache. This retains the regular file system administration model while providing an efficient code path close to that of raw disk.

UFS Direct I/O exports kernel statistics (kstats), which can be monitored with the downloadable directiostat command.
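
The same counters can also be read with the stock kstat(1M) command. (The module and statistic names below are an assumption from memory, not taken from this page; running 'kstat -m ufs' will show exactly what your release exports.)

# kstat -m ufs -n directiostat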

How do I use UFS Direct I/O with Oracle?

The Oracle database caches data in its own cache within the Oracle shared global area (SGA). This is known as the database block buffer cache, and it is sized at database start-up according to parameters set by the DBA. In earlier releases of Oracle the buffer cache size was set with the db_block_buffers parameter (specified in units of db_block_size); in version 9i, with db_cache_size (specified in bytes); and in version 10g, with the sga_target parameter. Database reads and writes are cached in the block buffer cache so that subsequent accesses to the same blocks do not need to re-read the data from the operating system.
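
As an illustration of those parameters (the sizes here are placeholders, not recommendations), an init.ora might contain either of:

sga_target = 4g          # 10g: Oracle manages SGA components within this total
db_cache_size = 2g       # 9i style: size the block buffer cache explicitly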

Filesystems in Solaris default to reading data through the global file system cache, which means that by default each read is potentially cached twice: once in the OS page cache and once in the Oracle buffer cache. In addition to double caching, extra CPU overhead is paid for the code that manages the operating system file system cache. At high I/O rates, the locking operations involved in OS-level memory management can stress hardware cache coherency mechanisms, imposing a 'speed limit' that may be reached far before the full capability of the underlying storage. Additionally, the default modes of filesystems typically comply with POSIX single-writer locking constraints, which can impose yet another I/O throttle. A more efficient and scalable way for Oracle to perform I/O is to have the operating system bypass its file system cache and read blocks from disk straight into the Oracle block buffer cache. One way to do this is to use the UFS file system feature known as 'UFS Direct I/O'.

Solaris has provided an option for UFS Direct I/O operations since version 2.6. In Solaris 8 Update 3, Direct I/O was enhanced to relax the single-writer lock for preallocated files. Details of Oracle file system performance are covered in this older, but still relevant, paper on Oracle I/O [1]

Solaris also provides the directio(3C) API to allow applications to request direct I/O operations on a per-file basis. Oracle uses this API when the Oracle parameter filesystemio_options=setall is specified, and this is preferred by many as the best means of enabling UFS Direct I/O with Oracle. Some history on this parameter is offered in Allan Packer's "Configuring and Tuning Databases on the Solaris(TM) Platform" (Prentice Hall, 2002, ISBN 0-13-083417-3); specifically, Chapter 22, Part 2: "Monitoring and Tuning Oracle", available online at http://www.sun.com/blueprints/0802/816-7472-10.pdf .
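
Outside of Oracle, any application can request the same behavior through this API. A minimal sketch in C (the file path is a placeholder, and error handling is abbreviated):

#include <sys/types.h>
#include <sys/fcntl.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
        /* Placeholder path: any file on a UFS filesystem. */
        int fd = open("/db1/datafile.dbf", O_RDWR);
        if (fd < 0) {
                perror("open");
                return (1);
        }

        /* Advise UFS to bypass the page cache for this file. */
        if (directio(fd, DIRECTIO_ON) < 0)
                perror("directio");

        /* ... subsequent reads and writes bypass the OS cache ... */

        (void) close(fd);
        return (0);
}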

The preferred way to enable Direct I/O in Oracle 9i and 10g is via the FILESYSTEMIO_OPTIONS [2] init.ora parameter:

FILESYSTEMIO_OPTIONS = setall

This parameter lets Oracle use the best I/O features available for the particular file system. With the UFS filesystem, direct I/O is used.
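
If the instance uses an spfile rather than an init.ora file, the parameter can be set from SQL*Plus; it is a static parameter, so the instance must be restarted before it takes effect:

ALTER SYSTEM SET filesystemio_options = setall SCOPE = spfile;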

Before this parameter was available, Direct I/O was enabled on UFS with the forcedirectio mount option, on a per-filesystem basis:

# mount -o forcedirectio /dev/dsk/c0t1d0s2 /db1 
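
To make the setting persist across reboots, forcedirectio can instead be added to the mount options field of the filesystem's /etc/vfstab entry (the device names below are placeholders):

/dev/dsk/c0t1d0s2  /dev/rdsk/c0t1d0s2  /db1  ufs  2  yes  forcedirectio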


The UFS Direct I/O path provides a substantial performance improvement for a number of reasons:

  1. It eliminates the breakup of large synchronous writes, which is important for the log writer.
  2. It allows concurrent reads and writes to a single file (eliminating the POSIX single-writer lock).
  3. It eliminates double buffering.
  4. It provides a small, efficient code path for reads and writes.
  5. It removes memory pressure from the operating environment. Solaris uses the virtual memory system to implement file system caching and, as a result, many page-in operations occur when reading data through the file system cache. With UFS Direct I/O, database reads do not need to involve the virtual memory system.

When using UFS Direct I/O we no longer use the file system cache; it is therefore important to size the database block buffer cache correctly. Use Oracle statistics, such as those from STATSPACK or AWR, to monitor buffer cache usage. Think big: 64-bit Oracle is all about allowing Oracle to use large amounts of memory directly (greater than 3.75 GB). With a large Oracle DB cache, tuning opportunities such as a larger SMALL_TABLE_THRESHOLD or the use of KEEP pools might be worth re-evaluating.
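
One way to gauge whether the buffer cache is adequately sized is Oracle's cache advisory view (populated when the db_cache_advice parameter is on, which it is by default at typical statistics levels in 9i and later):

SELECT size_for_estimate, buffers_for_estimate, estd_physical_reads
  FROM v$db_cache_advice
 WHERE name = 'DEFAULT';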

There are some tradeoffs associated with using UFS Direct I/O where buffered I/O was previously used. For example, certain queries that depended on filesystem pre-fetching might run more slowly. Using Oracle's PARALLEL QUERY options or tuning DB_FILE_MULTIBLOCK_READ_COUNT, as shown below, are among the ways of tackling such issues. Keep in mind that all RAC and Grid varieties of Oracle already require tuning SQL code to the constraint that there can be no OS-level buffering of data. It is always sage advice to test your application with any major change such as this, but it is almost always ultimately 'best' to use an underlying storage solution that avoids OS-level caching and POSIX locking. This is usually the best strategy for an efficient, performant, stable, and scalable Oracle deployment.
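
For instance, to compensate for the loss of filesystem pre-fetching on large scans, DB_FILE_MULTIBLOCK_READ_COUNT can be adjusted dynamically (the value 64 is only an illustration; the right setting depends on db_block_size and the underlying storage):

ALTER SYSTEM SET db_file_multiblock_read_count = 64;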
