
Scalability of Dell Ready Solutions for HPC BeeGFS Storage

Summary: Scalability of Dell Ready Solutions for HPC BeeGFS Storage.

This article is not tied to a specific product. Not all product versions are identified in this article.

Symptoms

How can the Dell BeeGFS High Performance Storage Solution be scaled in terms of capacity, performance, or both?

Cause

See the information in the Resolution section.

Resolution

Table of Contents

  1. Introduction
  2. Base Configurations
  3. BeeGFS Usable Space Calculation
  4. Scalable Configurations
  5. Performance Characterization
  6. Conclusion and Future Work
     

Introduction

This blog discusses the scalability of Dell EMC Ready Solutions for HPC BeeGFS Storage, which was announced recently. The BeeGFS architecture consists of four main services: the management, metadata, storage, and client services. Because the roles and the hardware are not tightly coupled in BeeGFS, any combination of these four services, including all of them, can run on the same server. In a "hyper-converged" deployment, all four services run on the same server. This configuration is not recommended for performance-critical environments, because client applications consume resources that may impact the performance of the storage services. The Dell EMC solution instead uses dedicated storage servers and a dual-purpose metadata and storage server to provide a high-performance, scalable storage solution. The system can be scaled by adding storage servers to an existing installation. In this blog, we present configurations with different numbers of storage servers and the performance that can be expected from them.

Base Configurations

The BeeGFS Storage Solution, which is designed to provide a high-performance scratch file system, uses the following hardware components:

  • Management Server
    • R640, Dual Intel Xeon Gold 5218 2.3GHz, 16 cores, 96GB (12x 8GB 2666 MT/s RDIMMs), 6 x 15k RPM 300GB SAS, H740P
  • Metadata and Storage Servers
    • R740xd, 2x Intel Xeon Platinum 8268 CPU @ 2.90GHz, 24 cores, 384GB (12x 32GB 2933 MT/s RDIMMs)
    • BOSS card with 2x 240GB M.2 SATA SSDs in RAID 1 for OS
    • 24x Intel 1.6TB NVMe Mixed Use Express Flash 2.5" SFF drives, software RAID

The management server runs the BeeGFS monitoring service. The metadata server uses the 12 drives in NUMA zone 0 to host the Metadata Targets (MDTs), while the remaining 12 drives in NUMA zone 1 host Storage Targets (STs). A dedicated metadata server is not used because the storage capacity requirements for BeeGFS metadata are very small. The metadata and storage targets and services are isolated on separate NUMA nodes to establish a clear separation of the workloads. The dedicated storage servers in the configuration run three storage services per NUMA zone, six in total per server. For more details, please refer to the announcement blog. Figure 1 shows the two base configurations that have been tested and validated at the Dell EMC HPC and AI Innovation Lab.


Figure 1: Base Configurations

The small configuration consists of three R740xd servers and has a total of 15 storage targets. The medium configuration consists of six R740xd servers and has a total of 33 storage targets. The user can start with the "Small" or the "Medium" configuration and add storage servers as needed to increase storage space and overall performance, or add metadata servers to increase the number of files supported and the metadata performance. Table 1 shows the capacity and performance data for the base configurations, which have been tested and validated extensively at the Dell EMC HPC and AI Innovation Lab; the short sketch after the table shows how the drive and target counts are derived.

Base Configuration                       Small               Medium
Total U (MDS+SS)                         6U                  12U
# of Dedicated Storage Servers           2                   5
# of NVMe Drives for Data Storage        60                  132
Estimated Usable Space (1.6 TB drives)   86 TiB              190 TiB
Estimated Usable Space (3.2 TB drives)   173 TiB             380 TiB
Estimated Usable Space (6.4 TB drives)   346 TiB             761 TiB
Peak Sequential Read                     60.1 GB/s           132.4 GB/s
Peak Sequential Write                    57.7 GB/s           120.7 GB/s
Random Read                              1.80 million IOPS   3.54 million IOPS
Random Write                             1.84 million IOPS   3.59 million IOPS

Table 1: Capacity and Performance Details of Base Configurations
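
The drive and storage-target counts in Table 1 can be re-derived from the hardware description above: each dedicated storage server contributes 24 NVMe drives and six storage targets, while the combined metadata/storage server contributes the 12 data drives in its NUMA zone 1, hosting three storage targets. The following Python snippet is a minimal sketch for illustration only; the helper names are not part of the solution.

# Illustrative helpers, not part of the Dell EMC / BeeGFS solution.

def data_drives(dedicated_storage_servers):
    # 24 NVMe drives per dedicated storage server, plus the 12 data drives
    # in NUMA zone 1 of the combined metadata/storage server
    return 24 * dedicated_storage_servers + 12

def storage_targets(dedicated_storage_servers):
    # 6 storage targets per dedicated storage server, plus 3 on the MDS
    return 6 * dedicated_storage_servers + 3

for name, servers in (("Small", 2), ("Medium", 5)):
    print(name, data_drives(servers), "drives,", storage_targets(servers), "storage targets")
# Small 60 drives, 15 storage targets
# Medium 132 drives, 33 storage targets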

 


BeeGFS Usable Space Calculation

Estimated usable space is calculated in TiB (since most tools show usable space in binary units) using the following formula:


BeeGFS Usable Space in TiB = 0.99 × (number of drives) × (drive size in TB) × (10^12 / 2^40)

In the above formula, 0.99 is a factor arrived at by conservatively assuming a 1% overhead from the file system. When counting the drives used for storage, the 12 drives from the MDS are also included, because in the MDS the 12 drives in NUMA zone 0 are used for metadata and the 12 drives in NUMA zone 1 are used for storage. The last factor in the formula, 10^12/2^40, converts the usable space from TB to TiB.
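
As a quick sanity check, the formula can be written as a short Python function. This is a minimal sketch for illustration only; the function name and the 1% default overhead are assumptions drawn from the text above.

def beegfs_usable_tib(num_drives, drive_size_tb, fs_overhead=0.01):
    # Estimated usable space in TiB, assuming ~1% file system overhead.
    return (1.0 - fs_overhead) * num_drives * drive_size_tb * (10**12 / 2**40)

# Reproduce the "Estimated Usable Space" rows of Table 1 for 1.6 TB drives:
print(round(beegfs_usable_tib(60, 1.6)))    # Small  -> 86 TiB
print(round(beegfs_usable_tib(132, 1.6)))   # Medium -> 190 TiB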

Scalable Configurations

The BeeGFS High Performance Storage Solution is designed to be flexible: performance and/or capacity can be scaled easily and seamlessly by adding servers, as shown below:
Figure 2: Scaled Configuration Examples

The metadata portion of the stack remains the same for all the configurations described in this blog, because the storage capacity required for BeeGFS metadata is typically only 0.5% to 1% of the total storage capacity; the exact requirement depends on the number of directories and files in the file system. As a general rule, an additional metadata server can be added when the ratio of metadata capacity to storage capacity falls below 1%; the short sketch below illustrates how this ratio shrinks as storage servers are added. Table 2 shows the performance data for the different flexible configurations of the BeeGFS Storage Solution.
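
The following Python snippet is a rough, illustrative sketch of this rule of thumb only; it compares raw drive counts and ignores RAID and file system overhead on the metadata targets.

# Illustrative only: raw metadata-to-storage capacity ratio as dedicated
# storage servers are added. The MDS dedicates 12 drives to metadata, each
# dedicated storage server adds 24 data drives, and the MDS itself adds 12.

def metadata_to_storage_ratio(dedicated_storage_servers):
    metadata_drives = 12
    data_drives = 24 * dedicated_storage_servers + 12
    return metadata_drives / data_drives

for servers in (2, 5, 10, 50):
    print(servers, format(metadata_to_storage_ratio(servers), ".1%"))
# -> 20.0%, 9.1%, 4.8%, 1.0% (around 1%, another metadata server is worth considering)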

 
Configuration                            Small       Small +1    Small +2    Medium      Medium +1
Total U (MDS+SS)                         6U          8U          10U         12U         14U
# of Dedicated Storage Servers           2           3           4           5           6
# of NVMe Drives for Data Storage        60          84          108         132         156
Estimated Usable Space (1.6 TB drives)   86 TiB      121 TiB     156 TiB     190 TiB     225 TiB
Estimated Usable Space (3.2 TB drives)   173 TiB     242 TiB     311 TiB     380 TiB     449 TiB
Estimated Usable Space (6.4 TB drives)   346 TiB     484 TiB     622 TiB     761 TiB     898 TiB
Peak Sequential Read                     60.1 GB/s   83.3 GB/s   105.2 GB/s  132.4 GB/s  152.9 GB/s
Peak Sequential Write                    57.7 GB/s   80.3 GB/s   99.8 GB/s   120.7 GB/s  139.9 GB/s

Table 2: Capacity and Performance Details of Scaled Configurations
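
Combining the drive counts with the usable-space formula from the previous section reproduces the 1.6 TB row of Table 2. The snippet below is an illustrative sketch only and repeats the formula defined earlier so that it is self-contained.

def beegfs_usable_tib(num_drives, drive_size_tb):
    # Same illustrative formula as above, assuming ~1% file system overhead.
    return 0.99 * num_drives * drive_size_tb * (10**12 / 2**40)

configs = {"Small": 2, "Small +1": 3, "Small +2": 4, "Medium": 5, "Medium +1": 6}
for name, servers in configs.items():
    drives = 24 * servers + 12   # 24 per dedicated storage server + 12 on the MDS
    print(f"{name:10s} {drives:3d} drives  ~{beegfs_usable_tib(drives, 1.6):.0f} TiB")
# Small       60 drives  ~86 TiB
# ...
# Medium +1  156 drives  ~225 TiB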

 

Performance Characterization

The performance of the various configurations was tested by creating storage pools. The small configuration has 15 storage targets, and each additional storage server adds six more storage targets. Therefore, to test the performance of the various configurations, storage pools were created with 15 to 39 storage targets (in increments of six, corresponding to small, small +1, small +2, medium, and medium +1). For each of those pools, three iterations of the iozone benchmark were run, each with one to 1024 threads (in increments of powers of two). The testing methodology adopted is the same as that described in the announcement blog. Figures 3 and 4 show the write and read performance of the scalable configurations, respectively, with the peak performance of each configuration highlighted for ready reference; the full test matrix is enumerated in the short sketch after the figures:


Figure 3: Write Performance of Scalable Configurations

Figure 4: Read Performance of Scalable Configurations
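
For reference, the test matrix described above can be enumerated with a short, illustrative Python snippet; it only lists the tested combinations and does not create the pools or run the benchmark.

# Illustrative only: the storage-pool sizes and iozone thread counts that were swept.
pool_sizes = list(range(15, 40, 6))          # 15, 21, 27, 33, 39 storage targets
thread_counts = [2**i for i in range(11)]    # 1, 2, 4, ..., 1024 threads

for targets in pool_sizes:
    for threads in thread_counts:
        # Three iterations of the iozone benchmark were run for each combination.
        print(f"pool of {targets} targets, {threads} threads")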

Note:

The storage pools referred to above were created only for the explicit purpose of characterizing the performance of the different configurations. During the performance evaluation of the medium configuration detailed in the announcement blog, all 33 targets were in the "Default" pool. The output of the beegfs-ctl --liststoragepools command given below shows the assignment of the storage targets:

# beegfs-ctl --liststoragepools
Pool ID   Pool Description                      Targets                 Buddy Groups
======= ================== ============================ ============================
      1            Default  1,2,3,4,5,6,7,8,9,10,11,12,
                            13,14,15,16,17,18,19,20,21,
                            22,23,24,25,26,27,28,29,30,
                            31,32,33


Conclusion and Future Work

This blog discussed the scalability of Dell EMC Ready Solutions for HPC BeeGFS Storage and highlighted the sequential read and write performance of various configurations. Stay tuned for Part 3 of this blog series, which will discuss additional features of BeeGFS and highlight the use of "StorageBench", the built-in storage-target benchmark of BeeGFS. As a next step, we will publish a white paper with the metadata performance, the IOR N-1 performance evaluation, and additional details about design considerations, tuning, and configuration.


References

[1] Dell EMC Ready Solutions for HPC BeeGFS Storage:  https://www.dell.com/support/article/sln319381/
[2] BeeGFS Documentation:  https://www.beegfs.io/wiki/
[3] How to connect two interfaces on the same subnet:  https://access.redhat.com/solutions/30564
[4] PCI Express Direct Memory Access Reference Design using External Memory: https://www.intel.com/content/www/us/en/programmable/documentation/nik1412547570040.html#nik1412547565760

 

Affected Products

PowerSwitch S3048-ON, Mellanox SB7800 Series, PowerEdge R640, PowerEdge R740XD

Article Properties
Article Number: 000133410
Article Type: Solution
Last Modified: 03 Oct 2023
Version: 5