Monitoring for RDMA-Enabled Network
Introduction
As an emerging high-speed network technology, RoCE (RDMA over Converged Ethernet) has been widely deployed in data centers to provide high throughput, ultra-low latency, and low CPU overhead. Much research in industry and academia focuses on RoCE-driven distributed systems, transaction mechanisms, and network protocol optimization. However, most RDMA applications can guarantee high performance only in resource-isolated environments. In a data center or cluster where multiple RDMA applications coexist, efficient, real-time measurement, monitoring, and balancing of network load from a global perspective remains a major challenge. Meeting it would improve the performance, efficiency, scalability, and reliability of RoCE networks. This challenge falls under the topic 'Measuring and debugging real network systems'.
Problems
Problem-1(Easy): Collect L1/L2 RoCE network information.
Build a framework that collects RoCEv1/v2 network traces and replays the trace data without running the real jobs. This helps researchers reproduce the network traffic of a data center, so that network workloads can be located and analyzed for further load balancing.
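As a rough illustration of the collect-and-replay idea, the sketch below stores trace records as CSV and computes a replay schedule that preserves the original inter-arrival gaps. The record fields (`ts_us`, `src`, `dst`, `opcode`, `length`), the CSV layout, and the `speedup` parameter are all assumptions made for this example; a real framework would capture packets from the RNIC or a switch mirror port and post actual RDMA work requests during replay.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class TraceRecord:
    """One captured RoCE message (hypothetical field set)."""
    ts_us: int    # capture timestamp in microseconds
    src: str      # source address
    dst: str      # destination address
    opcode: str   # e.g. "SEND", "WRITE", "READ"
    length: int   # payload size in bytes


def save_trace(records, fh):
    """Write records to a file-like object, one CSV row per record."""
    writer = csv.writer(fh)
    for r in records:
        writer.writerow([r.ts_us, r.src, r.dst, r.opcode, r.length])


def load_trace(fh):
    """Read records back from a file-like object written by save_trace."""
    return [TraceRecord(int(t), s, d, op, int(n))
            for t, s, d, op, n in csv.reader(fh)]


def replay_schedule(records, speedup=1.0):
    """Return (delay_us, record) pairs, with delay measured from replay start.

    Records are sorted by timestamp; gaps are divided by `speedup`.
    """
    if not records:
        return []
    ordered = sorted(records, key=lambda r: r.ts_us)
    t0 = ordered[0].ts_us
    return [((r.ts_us - t0) / speedup, r) for r in ordered]


if __name__ == "__main__":
    buf = io.StringIO()
    save_trace([TraceRecord(100, "10.0.0.1", "10.0.0.2", "SEND", 64),
                TraceRecord(250, "10.0.0.1", "10.0.0.2", "WRITE", 4096)], buf)
    buf.seek(0)
    for delay, rec in replay_schedule(load_trace(buf), speedup=2.0):
        print(f"{delay:8.1f} us  {rec.opcode:5s} {rec.length} bytes")
```

A replayer would sleep until each delay elapses and then issue the matching operation, so no real job needs to run to reproduce the load.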
Problem-2(Advanced): RDMA-Enabled Traffic Generator for RDMA network experiments.
Build a RoCE-based traffic generator that produces dynamic, diverse, and representative RoCEv1/v2 traffic for RDMA network experiments. The traffic generator has two critical components, a server and a client.
The server listens for incoming requests and replies to each one with a flow of the requested size.
The client follows a user-defined configuration file: it establishes RDMA connections of multiple types (RC, UC, UD) and randomly issues two-sided (RDMA SEND/RECV) or one-sided (RDMA WRITE/READ) requests of varying sizes over those connections at the specified request rate.
In the configuration file, the user can specify the list of destination servers equipped with RNICs, the request size distribution, the sending rate distribution for each RDMA verb, the distribution of connection types (RC/UC/UD), the distribution of RDMA verbs invoked, and so on.
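The client's sampling logic could look roughly like the sketch below, which draws a transport type, a verb, and a request size from configured distributions and spaces requests with exponential inter-arrival times. Everything here is an assumption for illustration: the `CONFIG` keys, the server addresses, the Poisson arrival model, and the 4096-byte UD payload cap (UD messages must fit in one MTU). The sketch only plans requests; issuing them would require real verbs calls. It does respect one fact about the transports: RDMA READ requires RC, UC supports SEND and WRITE, and UD supports only SEND.

```python
import random

# Verbs each transport type supports.
SUPPORTED = {
    "RC": {"SEND", "WRITE", "READ"},
    "UC": {"SEND", "WRITE"},
    "UD": {"SEND"},
}

UD_MAX_PAYLOAD = 4096  # assumption: 4096-byte MTU, one packet per UD message

# Hypothetical configuration; in practice it would be parsed from a file.
CONFIG = {
    "servers": ["192.168.1.10", "192.168.1.11"],
    "rate_rps": 1000.0,                              # mean requests per second
    "transports": {"RC": 0.7, "UC": 0.2, "UD": 0.1},  # connection type mix
    "verbs": {"SEND": 0.4, "WRITE": 0.4, "READ": 0.2},
    "sizes": {64: 0.5, 4096: 0.3, 65536: 0.2},        # bytes -> probability
}


def pick(dist, rng):
    """Draw one key from a {value: weight} mapping."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys], k=1)[0]


def generate_requests(cfg, count, seed=None):
    """Yield `count` planned requests with increasing send times."""
    rng = random.Random(seed)
    now = 0.0
    for _ in range(count):
        now += rng.expovariate(cfg["rate_rps"])  # exponential gap between requests
        transport = pick(cfg["transports"], rng)
        # Keep only the verbs this transport can carry, preserving their weights.
        allowed = {v: w for v, w in cfg["verbs"].items()
                   if v in SUPPORTED[transport]}
        size = pick(cfg["sizes"], rng)
        if transport == "UD":
            size = min(size, UD_MAX_PAYLOAD)
        yield {
            "time_s": now,
            "server": rng.choice(cfg["servers"]),
            "transport": transport,
            "verb": pick(allowed, rng),
            "size": size,
        }


if __name__ == "__main__":
    for req in generate_requests(CONFIG, 5, seed=42):
        print(req)
```

Passing a fixed `seed` makes a run reproducible, which is useful when comparing experiments.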
Problem-3(Hard): RDMA packet-based Sketch algorithm for measurement.
Build a Sketch-based measurement module or framework for RDMA-enabled networks. Sketch-based measurement of high-speed network traffic is part of the mainstream of network measurement research and has been a hotspot in recent years. **Sketch-based** measurement can detect large flows and anomalous flows without consuming much computing or space (cache/memory) resources. A Sketch is a hash-based data structure that records traffic characteristics in real time in a high-speed network environment. It occupies little space and offers a theoretically provable trade-off between estimation accuracy and memory. Can Sketch-based measurement therefore be applied to an RDMA-enabled network for workload monitoring? This is an interesting challenge.
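To make the data structure concrete, here is a minimal Count-Min Sketch (reference [4]), one of the simplest Sketches. It is an illustrative sketch, not a proposed solution: the flow key format (source IP, destination IP, destination QP number joined by `|`), the default `width` and `depth`, and the use of salted BLAKE2b as the hash family are all assumptions made for this example.

```python
import hashlib


class CountMinSketch:
    """depth x width counter array; each row uses a differently salted hash."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # Salting BLAKE2b with the row number gives one hash function per row.
        digest = hashlib.blake2b(key.encode(), digest_size=8,
                                 salt=row.to_bytes(8, "little")).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        """Add `count` (e.g. packet bytes) to the key's counter in every row."""
        for row in range(self.depth):
            self.rows[row][self._index(row, key)] += count

    def estimate(self, key):
        """Return the minimum counter over all rows.

        Collisions can only inflate counters, so the estimate is never
        below the true total for the key.
        """
        return min(self.rows[row][self._index(row, key)]
                   for row in range(self.depth))


if __name__ == "__main__":
    cms = CountMinSketch()
    flow = "10.0.0.1|10.0.0.2|17"  # hypothetical key: src IP | dst IP | dst QPN
    for _ in range(1000):
        cms.add(flow, 1500)        # 1000 packets of 1500 bytes
    print(cms.estimate(flow))
```

Memory use is fixed at `width * depth` counters regardless of how many flows appear. Increasing `width` reduces the overestimation caused by collisions, and increasing `depth` lowers the probability of a large error. A flow can be flagged as large when its estimate exceeds a chosen byte threshold.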
Technical Skills Required
Participants are required to have:
- Basic RDMA (Remote Direct Memory Access) background. (Kernel-bypassing networking, RoCEv1/v2, RDMA verbs, RDMA key components, HCA, L1/L2, control-/data-plane, throughput, RTT)
- Solid C/C++ programming skills. (Memory allocators, C++ templates, multithreading, networking, locks, etc.)
- Linux and shell commands.
Participants are encouraged to have the following skills or experience:
- Basic knowledge of RDMA programming. (Available RDMA communication operations, RDMA transport modes, VPI Verbs API, etc.)
- Development experience with RDMA-driven systems. (optional)
- Basic knowledge of the Sketch data structure. (optional)
Useful References
[1] RDMA Aware Networks Programming User Manual Rev 1.7. http://www.mellanox.com/related-docs/prod_software/RDMA_Aware_Programming_user_manual.pdf
[2] A Tutorial of the RDMA Model. http://www.hpcwire.com/hpcwire/2006-09-15/a_tutorial_of_the_rdma_model-1.html
[3] RDMAmojo: a blog on RDMA technology and programming. http://www.rdmamojo.com/
[4] Cormode G, Muthukrishnan S. An improved data stream summary: the count-min sketch and its applications[J]. Springer Journal of Algorithm, 2004, LNCS 2976: 29-38.
[5] Huang Q, Jin X, Lee P P C, et al. SketchVisor: Robust Network Measurement for Software Packet Processing[C]//Proceedings of the 2017 ACM SIGCOMM Conference, 2017: 113-126.
[6] Huang Q, Lee P P C, Bao Y. SketchLearn: Relieving User Burdens in Approximate Measurement with Automated Statistical Inference[C]//Proceedings of the 2018 ACM SIGCOMM Conference, 2018: 576-590.
[7] Yang T, Jiang J, Liu P, et al. Elastic Sketch: Adaptive and Fast Network-wide Measurements[C]//Proceedings of the 2018 ACM SIGCOMM Conference, 2018: 561-575.
[8] Yu M, Jose L, Miao R. Software defined traffic measurement with OpenSketch[C]//Presented as Part of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13), 2013: 29-42.
[9] Huang Q, Lee P P C. A hybrid local and distributed sketching design for accurate and scalable heavy key detection in network data streams[C]//Proceedings of the 2014 IEEE INFOCOM Conference, 2014: 298-315.
[10] Dai M, Cheng G. Sketch-based data plane hardware model for software-defined measurement[J]. Journal on Communications, 2017, 38(10): 20172031-20172039.
[11] RDMA Samples. https://github.com/tarickb/the-geek-in-the-corner