How Solaris ZFS Cache Management Differs From UFS and VxFS File Systems (Doc ID 1005367.1)


    APPLIES TO:

Solaris SPARC Operating System - Version 10 3/05 and later
Solaris x64/x86 Operating System - Version 10 3/05 and later
All Platforms

    GOAL

ZFS manages its cache differently from other filesystems such as UFS and VxFS. ZFS's use of kernel memory as a cache results in higher kernel memory allocation than with UFS and VxFS. Monitoring a system with tools such as vmstat will therefore report less free memory with ZFS, which may lead to unnecessary support calls.

    SOLUTION

This is due to ZFS's cache management being different from that of UFS and VxFS. Unlike those filesystems, ZFS does not use the page cache. ZFS's caching is drastically different from these older filesystems, where cached pages can be moved to the cache list after being written to the backing store and are then counted as free memory.

ZFS affects the VM subsystem in terms of memory management. Monitoring a system with vmstat(1M) and prstat(1M) will report less free memory when ZFS is used heavily, e.g. when copying large files into a ZFS filesystem. The same load running on a UFS filesystem would appear to use less memory, since pages that have been written to the backing store are moved onto a cache list and counted as free memory.
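As a quick check, the effect can be seen by watching the "free" column of vmstat while copying a large file into a ZFS filesystem (the 5-second interval and count below are only an example):

# vmstat 5 10

With ZFS, free memory drops as the ARC grows and does not recover until there is memory pressure; the same copy into UFS leaves the written pages on the cache list, which vmstat still counts as free.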

ZFS uses a significantly different caching model than page-based filesystems like UFS and VxFS, for both performance and architectural reasons. This may impact existing application software (such as Oracle Database, which itself consumes a large amount of memory).

The primary ZFS cache is an Adaptive Replacement Cache (ARC) built on top of a number of kmem caches: zio_buf_512 through zio_buf_131072 (plus hdr_cache and buf_cache). These kmem caches hold the data blocks (ZFS uses variable block sizes, from 512 bytes to 128KB). The ARC will be at least 64MB in size and can grow to a maximum of physical memory minus 1GB. With ZFS, reported freemem will therefore be lower than with other filesystems.
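To see these kmem caches on a live system, the ::kmastat output (shown later in this document) can be filtered for the zio buffer caches; the egrep pattern below is just one way to match both the zio_buf_* and zio_data_buf_* cache names:

# echo ::kmastat | mdb -k | egrep 'zio_(data_)?buf'

Each matching line shows the buffer size of the cache, the buffers currently in use, and the memory the cache is holding.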

ZFS returns memory from the ARC only when there is memory pressure. This is different behaviour from pre-Solaris 8 (pre priority paging), where reading or writing one large (multi-GB) file could lead to a memory shortage and hence to paging/swapping out of application pages, resulting in slow application performance. That old problem was due to the failure to distinguish between a useful application page and a filesystem cache page. See knowledge article 1003383.1.

ZFS frees up its cache in a way that does not cause a memory shortage, so the system can operate with lower freemem without suffering a performance penalty. ZFS, unlike UFS and VxFS, does not throttle individual writers in the same way. UFS throttles writes when the amount of dirty pages per vnode reaches 16MB, the objective being to preserve free memory; the downside is slow application write performance that may be unnecessary when plenty of free memory is available. ZFS throttles an application only when the data load overflows the I/O subsystem capacity for 5 to 10 seconds. See doc.
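For reference, the UFS high and low water marks that drive this throttling are the kernel variables ufs_HW (16MB by default) and ufs_LW; a minimal way to inspect them on a running system, assuming root access to mdb, is:

# echo ufs_HW/D | mdb -k
# echo ufs_LW/D | mdb -k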

However, there are occasions when ZFS fails to evict memory from the ARC quickly, which can lead to application startup failure due to a memory shortage. Also, reaping memory from the ARC can trigger high system utilization at the expense of performance. This issue is addressed in bug:

    - target MRU size (arc.p) needs to be adjusted more aggressively

    The work-around is to limit the ZFS ARC by setting:

    set zfs:zfs_arc_max

in the /etc/system file. This tunable determines the maximum size of the ZFS Adaptive Replacement Cache (ARC). The default is 3/4 of memory on systems with less than 4GB, or physmem minus 1GB on systems with more than 4GB of memory. For databases, we know in advance how much memory they will consume; limit ZFS's ARC to the remaining free memory (and possibly reduce it even more) by setting the zfs_arc_max tunable to the desired value.
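As an illustration, to cap the ARC at 2GB one would add an entry like the following to /etc/system and reboot; the 2GB value is purely an example and should be replaced with whatever memory is left over after the application's known requirement:

* Example only: limit the ZFS ARC to 2GB (2147483648 bytes)
set zfs:zfs_arc_max = 0x80000000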

This tunable is available in Solaris 10 8/07 (Update 4), or with KU 120011-14 installed along with the fix for .

    One can estimate the amount of kernel memory used for caching ZFS data blocks by running:

    # echo ::memstat | mdb -k



    Where "ZFS File Data" reports the amount of memory currently allocated in all memory caches associated with ARC file data.This includes both memory actively in use as well as additional memory currently being held unused in the kernel memorycaches. When buffers are evicted from the ARC cache, they are returned to the respective caches (e.g. thezio_data_buf_131072 cache is used for allocating 128k blocks). Buffers in the kernel caches will stay unused until the VMsystem can reap excess capacity in these caches when the system comes under memory pressure. You can determine thekernel memory usage of various caches by running:

    One can also monitor ZFS ARC memory usage using:

    Where: "size" reports amount of active data in the ARC. This value stays within the "target" size set using "zfs_arc_max"tunable.

Also, if possible, consider using the ZFS "primarycache" property to better control what is cached in the ZFS ARC. It allows caching to be controlled on a per-dataset (filesystem) basis and thus provides better ARC usage and control. If this property is set to "all", both user data and metadata are cached. If it is set to "metadata", only metadata is cached. If it is set to "none", neither user data nor metadata is cached. The default is "all".
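For example, to cache only metadata for a database dataset (the pool and dataset names below are hypothetical):

# zfs set primarycache=metadata dbpool/oradata
# zfs get primarycache dbpool/oradata

The zfs get command simply confirms the current setting; the property takes effect without a remount.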

There is also a good discussion available on monitoring the ZFS ARC using DTrace scripts and the arcstat.pl and arcstat-extended.pl tools.

    Relief/Workaround

For Solaris 10 releases prior to 8/07 (Update 4), or without KU 120011-14 installed, a script is provided in the.

    ZFS Best Practices Guide:

The ZFS adaptive replacement cache (ARC) tries to use most of a system's available memory to cache filesystem data. The default is to use all of physical memory except 1GB. As memory pressure increases, the ARC relinquishes memory.

    Consider limiting the maximum ARC memory footprint in the following situations:

    When a known amount of memory is always required by an application. Databases often fall into this category.

On platforms that support dynamic reconfiguration of memory boards, to prevent ZFS from growing the kernel cage onto all boards.

A system that requires large memory pages might also benefit from limiting the ZFS cache, because ZFS tends to break down large pages into base pages (4KB on x86, 8KB on SPARC).

Finally, if the system is running another, non-ZFS filesystem in addition to ZFS, it is advisable to leave some free memory to host that other filesystem's caches.

The trade-off is that limiting this memory footprint means the ARC cannot cache as much filesystem data, and this limit could impact performance. In general, limiting the ARC is wasteful if the memory that would go unused by ZFS is also unused by other system components. Note that non-ZFS filesystems typically cache data in what is nevertheless reported as free memory by the system.
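As a worked example of this trade-off, consider a hypothetical 32GB server running a database known to use 24GB: leaving a few GB for the kernel and other consumers, the ARC might be capped at 4GB:

* Hypothetical sizing: 32GB RAM, 24GB reserved for the database, ARC capped at 4GB
* 4GB = 4 * 1024 * 1024 * 1024 = 4294967296 bytes
set zfs:zfs_arc_max = 4294967296

Whether such a limit is appropriate depends on whether the memory left unused by ZFS would actually be used by anything else.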

Additional information about Solaris ZFS topics is available at the Oracle Solaris ZFS Resource Center (Document 1372694.1).

Still have questions about ZFS? Consider asking them in the My Oracle Support "Oracle Solaris ZFS File System" Community.

    REFERENCES

NOTE:1369456.1 - Understanding How ZFS Calculates Used Space
NOTE:1359269.1 - ZFS Write Performance Degrades With Threads Held Up By space_map_load_wait()
NOTE:1430323.1 - How to Understand "ZFS File Data" Value by mdb and ZFS ARC Size



NOTE:1448052.1 - Swapping to a Solaris 10 ZFS Volume (zvol) May Lead to a System Hang
NOTE:1470681.1 - Write Throttling in UFS and ZFS File Systems
NOTE:1347387.1 - Improving ZFS Synchronous Write, O_SYNC|O_DSYNC Performance
NOTE:1404665.1 - ZFS Deduplication
NOTE:1316513.1 - Failing disk can cause the system to be unresponsive with IO to ZFS zpool hanging
