Data Domain: FS process PANIC in the inode cache when running out of memory in cache element pool

Summary: A defect in some recent DDOS versions (confirmed in 7.7.4, 7.9.0.10 and 7.10.0; not fully confirmed for 7.7.3) can cause an FS process PANIC in the inode cache code when, depending on the workload, a cache element pool runs out of memory for further allocations. ...

Symptoms

This issue causes no degradation and gives no advance warning. It manifests as an FS process failure (PANIC), after which the process restarts automatically and resumes normal operation.
Depending on the code path being exercised, the FS process may PANIC in several different ways, including the following:
PANIC: ddr/sm/ddfs/ddfs_mtree.c: ddfs_mtree_list: 829: !((dd_errno(e) == ENOENT) || (dd_errno(e) == DD_ERR_FM_EATTRNOENT) || (dd_errno(e) == DD_ERR_STALE))
PANIC: ddr/fv/file_verify.c: file_verify_update_marker_attrs: 4872: Fatal Error
PANIC: ddr/fv/file_verify.c: file_verify_update_snap_attr: 4446: Fatal Error
PANIC: ddr/fv/file_verify.c: file_verify_update_marker_attrs: 4860: Fatal Error
In the FS process log files (ddfs.info) the following messages will be found prior to each process crash:
01/17 20:21:59.292947 [7fbbf4f98f50] dd_cache_elem_reclaim: Evict count=256, Visited count=257, Skipped elem count=0, Skipped bucket count=0, Time threshold=1539816333626910. (99% full) Complete=True
01/17 20:22:04.662303 [7fbb031ad4f0] ERROR: FM fm_iget:355 - fm_iget failed to allocate elem in dd_cache 5001

These messages indicate that the cache element pool was 99% full and then could not allocate any further elements, which leads to the process crash.
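
As a rough aid for checking a log for this signature, the following is a minimal sketch in Python, not a Dell-provided tool. The regular expressions are assumptions derived only from the two sample lines quoted above, and the function name `find_pool_exhaustion` and its `full_threshold` parameter are hypothetical. Message formats may differ between DDOS versions.

```python
import re

# Patterns inferred from the two sample ddfs.info lines in this article (assumption).
RECLAIM_RE = re.compile(r"dd_cache_elem_reclaim: .*\((\d+)% full\)")
ALLOC_FAIL_RE = re.compile(r"fm_iget failed to allocate elem in dd_cache")


def find_pool_exhaustion(lines, full_threshold=99):
    """Return (reclaim_line, failure_line) pairs where an fm_iget allocation
    failure follows a reclaim message reporting the pool at or above
    full_threshold percent full."""
    hits = []
    last_reclaim = None
    for raw in lines:
        line = raw.rstrip("\n")
        match = RECLAIM_RE.search(line)
        if match:
            # Remember the reclaim message only if the pool was nearly full.
            is_full = int(match.group(1)) >= full_threshold
            last_reclaim = line if is_full else None
            continue
        if ALLOC_FAIL_RE.search(line) and last_reclaim is not None:
            hits.append((last_reclaim, line))
            last_reclaim = None
    return hits
```

The function accepts any iterable of lines, so an open file object for a copy of ddfs.info can be passed directly. An empty result does not rule out the issue; it only means this particular pair of messages was not found in the lines supplied.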

NOTE: This issue is known to affect only the following versions:
  • DDOS 7.7.3.x (not fully confirmed)
  • DDOS 7.7.4.x
  • DDOS 7.9.0.10
  • DDOS 7.10.0.x

Cause

For any file operation like read/write, an inode structure is allocated from the dd_cache element pool.
If this cache is full and a new request comes in, then an element is evicted from this cache and the new request is fulfilled.
This eviction is based on a time policy (an element is evicted if it has not been accessed in the last 'x' seconds).
If the cache becomes too hot (all elements have been accessed within the last 'x' seconds) and no element can be evicted even after multiple retries, fm_iget returns DD_ERR_NOMEM.
Some callers of this element pool allocation cannot handle the error gracefully, so any error returned by the function "fm_iget" causes the FS process to PANIC and dump core. This is why several different PANIC signatures correspond to the same underlying code defect.
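
The behavior described above can be illustrated with a small toy model in Python. This is a sketch under stated assumptions, not the DDOS implementation: the class `ElemPool`, the exception `PoolExhausted` (standing in for DD_ERR_NOMEM), the parameters `min_idle_secs` and `max_retries`, and both caller functions are hypothetical names chosen for illustration.

```python
class PoolExhausted(Exception):
    """Stands in for the DD_ERR_NOMEM result described in the text."""


class ElemPool:
    """Toy fixed-size element pool with time-based eviction (illustrative only)."""

    def __init__(self, capacity, min_idle_secs, max_retries=3):
        self.capacity = capacity
        self.min_idle_secs = min_idle_secs  # the 'x' seconds in the text
        self.max_retries = max_retries
        self.last_access = {}               # inode id -> last access time

    def _evict_one(self, now):
        # Pick any element that has been idle for at least min_idle_secs.
        stale = next(
            (key for key, ts in self.last_access.items()
             if now - ts >= self.min_idle_secs),
            None,
        )
        if stale is None:
            return False
        del self.last_access[stale]
        return True

    def iget(self, inode_id, now):
        if inode_id in self.last_access:
            self.last_access[inode_id] = now
            return inode_id
        if len(self.last_access) >= self.capacity:
            # A real system retries while time passes; in this toy model the
            # clock is fixed, so the retries simply repeat the same scan.
            for _ in range(self.max_retries):
                if self._evict_one(now):
                    break
            else:
                raise PoolExhausted(f"no evictable element for inode {inode_id}")
        self.last_access[inode_id] = now
        return inode_id


def careful_caller(pool, inode_id, now):
    # Handles the failure: only this one operation fails.
    try:
        return pool.iget(inode_id, now)
    except PoolExhausted:
        return None


def fragile_caller(pool, inode_id, now):
    # No handling: the error propagates, analogous to the PANIC.
    return pool.iget(inode_id, now)
```

With a pool of capacity 2 and both elements accessed at time 0, a request for a third inode at time 5 with `min_idle_secs=10` finds nothing evictable: `careful_caller` returns `None`, while `fragile_caller` lets the exception escape. The same request at time 20 succeeds, because an idle element can then be evicted.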

Resolution

The underlying code issue behind these FS process crashes is fixed by DDOS-168410 in the following versions (and all later ones in the same code branches):
  • DDOS 7.7.5.1
  • DDOS 7.10.1.0
  • DDOS 7.11.0
Customers impacted by this problem who cannot immediately upgrade to one of the releases above can try a workaround, for which they need to contact Dell Support.
If you are running an affected version (those listed above) but have not yet experienced an unexpected FS process crash matching the symptoms in this KB, we recommend that you do not apply the workaround proactively and instead upgrade to one of the fixed releases above (or any of their successors) to benefit from the latest updates and code fixes.

Affected Products

Data Domain