
Building HPC Data Center Networking Architecture with FS InfiniBand Solution


Howard · May 24, 2024 · 1 min read

In the ever-evolving landscape of high-performance computing (HPC), the backbone of future HPC business development lies in HPC networking and infrastructure. As HPC applications grow in complexity and data volume, resilient, scalable, and efficient networks become imperative. The architecture of HPC networks serves as the bedrock of HPC system operations, playing a pivotal role in data processing, management, and large-scale storage. This article examines the key components of HPC network architecture, explains the advantages of HPC data center networking, and explores FS's comprehensive solution, with products for each partition of the HPC data center network architecture.

What is the HPC Networking Architecture and its Key Components?

The network architecture for HPC workloads is meticulously designed and comprises three key components: the computing network, the management network, and the storage network. Together they tackle the most complex algorithms, unlocking new potential across various domains. For a practical illustration of how these components are integrated in real-world deployments, refer to FS's InfiniBand-based HPC networking solution.

Computing Network

The computing network is the computational backbone of an HPC system, consisting of HPC computing networks and general-purpose computing networks.

HPC computing networks are tailored for high-performance tasks, efficiently processing extensive data volumes and executing computation-heavy workloads such as image recognition, natural language processing, and model inference. They typically comprise GPU servers, high-performance switches, high-speed modules, and high-grade DAC/AOC cables, forming large computing clusters that collaborate to accelerate HPC workloads and deliver real-time insights.
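As a rough illustration of why link bandwidth dominates such collective workloads, the classic ring all-reduce cost model can estimate gradient-synchronization time across a cluster. This is a general cost model sketched here as an assumption, not something the article specifies; all figures below are illustrative.

```python
# Sketch of the standard ring all-reduce cost model: each of N workers
# transmits 2*(N-1)/N of the gradient payload S over its own link.
# Bandwidth-only estimate; ignores per-hop latency and compute overlap.

def ring_allreduce_seconds(grad_bytes: float, workers: int,
                           link_gbps: float = 400.0) -> float:
    """Estimate one all-reduce over `workers` nodes on `link_gbps` links."""
    link_bps = link_gbps * 1e9                              # bits per second
    traffic_bits = 2 * (workers - 1) / workers * grad_bytes * 8
    return traffic_bits / link_bps

# Illustrative example: 10 GB of gradients across 64 GPUs on 400G links.
t = ring_allreduce_seconds(10e9, workers=64)
print(f"{t * 1000:.1f} ms per all-reduce")  # roughly 394 ms
```

The same model makes the article's point about lossless transport concrete: at these payload sizes, even small amounts of retransmission would dominate the per-iteration budget.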
The HPC computing cluster is the most performance-demanding part of the architecture: it serves as the backbone for processing HPC workloads and therefore requires a network with high performance, lossless transmission, low latency, and scalability. HPC computing networks typically consist of 400G and higher connections and use powerful networking technologies such as InfiniBand interconnects.

General-purpose computing networks primarily handle general application traffic, offering the versatile computing resources an HPC environment also needs, such as deep learning platforms and other software. They provide a flexible computing environment for diverse workloads and applications other than the data-intensive computing tasks of HPC, and usually consist of 10/25/100/400/800 Gigabit Ethernet (GE) connections.

Management Network

The management network primarily deploys service management systems and operational support components to efficiently allocate workloads and distribute resources, ensuring optimal performance and resource utilization.

The management network in an HPC data center network architecture can be divided into out-of-band and in-band management networks. The out-of-band management network hosts the management-port access of the many terminal types in the data center and monitors the status of the cluster's physical devices, enabling unified operation and remote maintenance. The in-band management network provides the interface to the business/office network, giving the data center Internet access.

Data centers targeting HPC workloads are typically enormous, possibly with thousands of 100G–800G ports per cluster.
To enhance network interoperability and enable unified management of so much network infrastructure, the management network typically employs open network operating systems to create a highly resilient, flexible, and reliable network.

Storage Network

In HPC data centers, the storage network uses high-speed, high-bandwidth interconnected storage systems designed primarily for the vast datasets generated by HPC applications. It includes components such as storage servers, storage devices, and storage management software. The storage server connects the components within the HPC network, enabling seamless data exchange and access to data stored on the servers. Storage devices typically feature high speed and high capacity to accommodate extensive datasets. Meanwhile, to ensure rapid and efficient data transmission, high-speed network infrastructure, including switches and optical modules, must also be deployed. Storage management software plays a pivotal role in overseeing and controlling the storage systems, covering data management, storage resource management, backup, recovery, and data security.

In well-designed HPC network architectures, the storage network and its infrastructure are optimized for high throughput and low latency, ensuring cost-effective and reliable data storage.

How Does HPC Data Center Networking Stand Out?

The different fabrics within an HPC data center network architecture collaborate to build a lossless, high-performance, and scalable network. This network efficiently distributes workloads among multiple interconnected computing resources, enabling enterprises to rapidly scale large-scale multi-node training workloads and stay ahead of the competition.
The following characteristics of HPC data center networking enable it to meet a wide range of HPC workloads and scale requirements.

Parallel Computing – HPC data center networks utilize parallel processing, enabling the simultaneous execution of multiple workloads. With thousands of tasks processed concurrently, operations complete within milliseconds. This empowers industries to train bigger, better, and more accurate models, accelerating industry advancement.

Size – HPC data centers are typically massive in scale, potentially comprising thousands of computing engines (such as GPUs and CPUs) and a vast array of network connectivity infrastructure operating at different speeds.

Bandwidth – High-bandwidth traffic must flow in and out of servers for applications to operate effectively. In modern data center deployments, HPC interfaces reach speeds of up to 400G per compute engine.

Latency – The completion time of HPC workloads is a critical factor influencing user experience, so HPC data center networks often adopt low-latency technologies such as InfiniBand and RDMA.

Lossless – A lossless network minimizes packet loss, enabling smooth and efficient data transmission, which is essential for data centers handling HPC workloads to maintain data integrity and optimize performance.

Unified Management – Large-scale HPC networks consist of numerous pieces of network infrastructure. Unified management platforms are typically employed to configure, monitor, and oversee these components, simplifying operations and boosting system security.

For more details about RDMA, see "RDMA-Enhanced High-Speed Network for Training Large Models".

Building an Effective Network for HPC Workloads with FS InfiniBand Solution

Amid the growing global adoption of HPC applications and HPC computing, FS offers a high-performance HPC solution.
Leveraging high-speed, low-latency InfiniBand technology and an elastic, efficient network operating platform (PicOS® and the AmpCon management platform), the FS H100 InfiniBand solution helps enterprises optimize HPC workloads, simplify HPC business processes, and drive intelligent applications of HPC across industries.

Full Range of NVIDIA® InfiniBand Products Empowering the Computing Network

NVIDIA® InfiniBand is globally recognized as a high-speed, low-latency, and scalable interconnect tailored for supercomputers, HPC, and cloud data centers, making it the prime choice for HPC computing networks. As a trusted Elite Partner in the NVIDIA Partner Network, FS offers a comprehensive range of NVIDIA® InfiniBand products, serving as a reliable solution provider in the HPC field.

FS offers the NVIDIA® Quantum-2 MQM9790 InfiniBand switch, the NVIDIA® ConnectX®-7 InfiniBand adapter, and InfiniBand transceivers and cables at speeds of up to 800G, forming a specialized InfiniBand network for HPC computing. This network provides the fastest networking performance and feature sets available to tackle the world's most challenging problems.

InfiniBand Quantum-2 Switches

FS NVIDIA® QM9700/9790 InfiniBand switches provide 64 400Gb/s ports or 128 200Gb/s ports on 32 physical OSFP connectors. They deliver an aggregate 51.2 Tb/s of bidirectional throughput with a capacity of more than 66.5 billion packets per second, world-leading networking performance.

NVIDIA® ConnectX®-6/7 InfiniBand Adapters

FS NVIDIA® InfiniBand adapters support PCIe 5.0 and deliver single network ports at 400Gb/s.
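To put the 64-port switch radix in context, a common sizing rule for leaf-spine fabrics (a general fat-tree property, not an FS or NVIDIA specification) is that a nonblocking two-tier fabric built from P-port switches supports at most P²/2 hosts, because each leaf splits its ports evenly between hosts and spines:

```python
# Sketch: maximum host count in a nonblocking two-tier fat tree built
# from P-port switches. Each leaf dedicates P/2 ports to hosts and P/2
# to spines; each of the P/2 spines has P ports, capping leaves at P.

def max_hosts_two_tier(ports_per_switch: int) -> int:
    """Hosts = P leaves x (P/2 host-facing ports per leaf) = P^2 / 2."""
    leaves = ports_per_switch                 # limited by spine port count
    hosts_per_leaf = ports_per_switch // 2    # other half are uplinks
    return leaves * hosts_per_leaf

# With 64-port 400G switches of the QM97x0 class described above:
print(max_hosts_two_tier(64))  # 2048 hosts
```

Larger clusters add a third switching tier, trading extra hops (and latency) for scale, which is one reason radix and per-port speed matter together.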
FS's NVIDIA® ConnectX®-7 InfiniBand adapters include advanced In-Network Computing capabilities and additional programmable engines that preprocess data algorithms and offload application control paths to the network.

InfiniBand Transceivers and Cables

HDR, NDR, and XDR InfiniBand transceivers and DAC/AOC/ACC cables with 1-to-2 and 1-to-4 splitter options provide maximum flexibility to build the topology of choice. FS NVIDIA® InfiniBand modules and cables undergo 100% original-equipment validation, ensuring full compatibility with the targeted NVIDIA InfiniBand devices, including Quantum-2 switches and ConnectX-7 adapters. In addition, each transceiver and cable undergoes extensive testing, including optical spectrum, eye pattern, and bit error rate (BER) tests, to guarantee optimal signal integrity, ultra-low latency, and error-free data transmission.

PicOS® and AmpCon Platform Enabling Intelligent Network Management

In the management network, FS PicOS® switches use the advanced PicOS® software and AmpCon management platform feature sets to let customers efficiently provision, monitor, manage, preventatively troubleshoot, and maintain the HPC infrastructure, achieving higher utilization and reducing overall OpEx. The FS PicOS® software and AmpCon management platform work together to enable visualized, HPC-driven operations and management across the entire HPC data center network. Their specific advantages include:

FS PicOS® Software

- PicOS® is fully standardized and backward compatible with existing networks, making it easy to integrate with switches from Cisco, Juniper, and others.
This allows customers to upgrade their networks gradually, according to their budget.
- Enables zero-trust security at the access layer with an integrated leading NAC policy manager and comprehensive security mechanism support.
- Works with AmpCon to automate switch provisioning, deployment, and error-free configuration at scale, reducing OpEx.
- Offers an open solution with spine-leaf designs to support flexible and scalable virtualization architectures.
- Achieves full network visibility with SNMP and sFlow, while gNMI delivers efficient and effective open telemetry.

FS AmpCon Management Platform

- Support for Zero-Touch Provisioning greatly simplifies installation and deployment, enabling effortless rollout of hundreds or even thousands of PicOS® switches.
- Robust graphical user interfaces (GUIs) enable real-time monitoring of network performance and conditions, with monitoring data stored in an on-premises or cloud-based database for further analysis.
- End-to-end networking lifecycle management with automated provisioning, maintenance, compliance checking, and upgrades prevents misconfigurations and downtime.
- As an open and extensible platform, AmpCon is ready to take advantage of telemetry and other emerging technologies, continuously evolving to bring new levels of analysis and automation in the HPC era.

FS PicOS® Switches Enhancing Large-Scale Data Storage Efficiency

Connected via 100G optical modules, FS PicOS® switches establish a scalable, high-bandwidth network, enabling efficient data transmission for HPC data center storage systems. The switches also support the BGP protocol with powerful routing control capabilities, ensuring optimal forwarding paths and low-latency forwarding in the storage network.
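As a rough, back-of-envelope illustration of what such a 100G storage fabric sustains, the sketch below estimates bulk-transfer time for a large dataset over several parallel links. The dataset size, link count, and efficiency factor are illustrative assumptions, not figures from the article.

```python
# Hedged sketch: rough time to move a dataset across a 100G storage
# fabric, assuming ~85% effective utilization after protocol overhead.
# All inputs below are hypothetical, for illustration only.

def transfer_time_seconds(dataset_bytes: float, links: int,
                          link_gbps: float = 100.0,
                          efficiency: float = 0.85) -> float:
    """Estimate bulk-transfer time over `links` parallel links."""
    usable_bps = links * link_gbps * 1e9 * efficiency  # bits per second
    return dataset_bytes * 8 / usable_bps

# Example: a hypothetical 500 TB dataset striped over eight 100G links.
t = transfer_time_seconds(500e12, links=8)
print(f"{t / 60:.1f} minutes")  # roughly 98 minutes
```

Doubling the link count halves the estimate, which is why storage fabrics for HPC favor wide, high-bandwidth interconnects rather than a few fast endpoints.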
These robust switches significantly boost storage network performance, easily meeting the demanding requirements of modern HPC workloads.

The Final Thought

As the construction of HPC data center networks continues to expand, FS stands out as a global provider of HPC computing solutions. Beyond highly reliable solutions and products, FS operates seven global warehouses serving over 200 countries and regions, backed by a robust and agile supply chain. This enables swift product delivery, shortening customer project cycles and helping customers seize opportunities in the HPC market. FS tailors customized solutions for the different partitions of the HPC data center network architecture, enabling precise configuration within customer budget requirements and helping them manage project costs effectively.

As the field of HPC advances, FS remains committed to the HPC era, continuously innovating cutting-edge HPC computing solutions to accelerate the adoption of HPC technology across diverse industries.