The BeeGFS Storage Solution, which is designed to provide a high-performance scratch file system, uses the following hardware components: a management server, a metadata server (MDS), and dedicated storage servers (SS).
The management server runs the BeeGFS monitoring service. The metadata server uses the 12 drives in NUMA zone 0 to host the MetaData Targets (MDTs), while the remaining 12 drives in NUMA zone 1 host Storage Targets (STs). A dedicated metadata server is not used because BeeGFS metadata requires very little storage capacity. Placing the metadata and storage targets and their services on separate NUMA nodes provides a clear separation of the two workloads. The dedicated storage servers in the configuration run three storage services per NUMA zone, six in total per server. For more details, refer to the announcement blog. Figure 1 shows the two base configurations that have been tested and validated at the Dell EMC HPC and AI Innovation Lab.
Figure 1: Base Configurations
The small configuration consists of three R740xd servers and has a total of 15 storage targets. The medium configuration consists of six R740xd servers and has a total of 33 storage targets. Users can start with either the "Small" or the "Medium" configuration and add storage or metadata servers as needed to increase storage space and overall performance, or the number of files and metadata performance, respectively. Table 1 shows the performance data for the base configurations, which have been tested and validated extensively at the Dell EMC HPC and AI Innovation Lab.
Base Configuration | Small | Medium
---|---|---
Total U (MDS+SS) | 6U | 12U
# of Dedicated Storage Servers | 2 | 5
# of NVMe Drives for data storage | 60 | 132
Estimated Usable Space (1.6 TB drives) | 86 TiB | 190 TiB
Estimated Usable Space (3.2 TB drives) | 173 TiB | 380 TiB
Estimated Usable Space (6.4 TB drives) | 346 TiB | 761 TiB
Peak Sequential Read | 60.1 GB/s | 132.4 GB/s
Peak Sequential Write | 57.7 GB/s | 120.7 GB/s
Random Read | 1.80 Million IOPS | 3.54 Million IOPS
Random Write | 1.84 Million IOPS | 3.59 Million IOPS
Table 1: Capacity and Performance Details of Base Configurations
The usable space for each configuration is estimated with the following formula:

BeeGFS Usable Space in TiB = 0.99 * (# of Drives) * (drive size in TB) * (10^12 / 2^40)

In this formula, 0.99 is a conservative factor that assumes a 1% overhead from the file system. The drive count for storage also includes 12 drives from the MDS because, on the MDS, the 12 drives in NUMA zone 0 are used for metadata and the 12 drives in NUMA zone 1 are used for storage. The last factor, 10^12/2^40, converts the usable space from TB to TiB.
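As a quick check of the capacity numbers, the sketch below (not part of the original solution) applies the formula to each configuration, assuming 24 NVMe data drives per dedicated storage server plus the 12 MDS drives in NUMA zone 1:

```sh
# Apply the usable-space formula for 2 to 6 dedicated storage servers
# and the three drive sizes listed in the tables (1.6, 3.2 and 6.4 TB).
awk 'BEGIN {
    split("1.6 3.2 6.4", size_tb)                  # drive sizes in TB
    for (servers = 2; servers <= 6; servers++) {
        drives = 12 + 24 * servers                 # 12 MDS drives + 24 per storage server
        printf "%d storage servers, %3d drives:", servers, drives
        for (i = 1; i <= 3; i++)
            printf "  %.0f TiB", 0.99 * drives * size_tb[i] * 10^12 / 2^40
        print ""
    }
}'
```

The results agree with the "Estimated Usable Space" rows of Tables 1 and 2 to within 1 TiB of rounding.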
Configuration | Small | Small +1 | Small +2 | Medium | Medium +1
---|---|---|---|---|---
Total U (MDS+SS) | 6U | 8U | 10U | 12U | 14U
# of Dedicated Storage Servers | 2 | 3 | 4 | 5 | 6
# of NVMe Drives for data storage | 60 | 84 | 108 | 132 | 156
Estimated Usable Space (1.6 TB drives) | 86 TiB | 121 TiB | 156 TiB | 190 TiB | 225 TiB
Estimated Usable Space (3.2 TB drives) | 173 TiB | 242 TiB | 311 TiB | 380 TiB | 449 TiB
Estimated Usable Space (6.4 TB drives) | 346 TiB | 484 TiB | 622 TiB | 761 TiB | 898 TiB
Peak Sequential Read | 60.1 GB/s | 83.3 GB/s | 105.2 GB/s | 132.4 GB/s | 152.9 GB/s
Peak Sequential Write | 57.7 GB/s | 80.3 GB/s | 99.8 GB/s | 120.7 GB/s | 139.9 GB/s
Table 2: Capacity and Performance Details of Scaled Configurations
The storage pools referred to here were created only for the explicit purpose of characterizing the performance of different configurations. During the performance evaluation of the medium configuration detailed in the announcement blog, all 33 targets were assigned to the "Default" pool. The output of the beegfs-ctl --liststoragepools command shown below confirms the assignment of the storage targets:
```
# beegfs-ctl --liststoragepools
Pool ID   Pool Description                      Targets                 Buddy Groups
======= ================== ============================ ============================
      1            Default 1,2,3,4,5,6,7,8,9,10,11,12,
                           13,14,15,16,17,18,19,20,21,
                           22,23,24,25,26,27,28,29,30,
                           31,32,33
```
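For reference, pools like the ones used in that characterization are managed with beegfs-ctl. The sketch below is a hypothetical example; the flag names follow the BeeGFS storage pools documentation and may differ between BeeGFS versions:

```sh
# Hypothetical example (verify the flags against your BeeGFS version):
# group the first six targets into a dedicated pool for benchmarking,
# then list the pools to verify the assignment.
beegfs-ctl --addstoragepool --desc="bench_pool" --targets=1,2,3,4,5,6
beegfs-ctl --liststoragepools
```

Targets that are not explicitly assigned to a pool remain in the Default pool, which is why all 33 targets appear under Pool ID 1 in the listing above.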
[1] Dell EMC Ready Solutions for HPC BeeGFS Storage: https://www.dell.com/support/article/sln319381/
[2] BeeGFS Documentation: https://www.beegfs.io/wiki/
[3] How to connect two interfaces on the same subnet: https://access.redhat.com/solutions/30564
[4] PCI Express Direct Memory Access Reference Design using External Memory: https://www.intel.com/content/www/us/en/programmable/documentation/nik1412547570040.html#nik1412547565760