
2022-08-07

The Latest Progress in the Performance of RADIOSS Explicit Parallel Computing

Abstract: Explicit finite element analysis has long been the method of choice in the automotive, aerospace, military and electronics industries for simulating transient, highly nonlinear problems such as crashes, impacts, explosions and drop tests. Advances in computer hardware and software have reduced solution times by an order of magnitude. However, as these problems are studied in more depth, finer meshes and constitutive models with failure criteria are needed to capture local failure modes. In addition, with changes in computer hardware architecture and the emergence of multi-core processors and GPU computing units, developing parallel programs that make effective use of the available hardware has become an important issue for software developers. This paper introduces the new methods RADIOSS uses to improve parallel computing performance, including advanced mass scaling, multi-domain solution technology and hybrid MPP.

Keywords: parallel computing, advanced mass scaling, multi-domain solution, hybrid MPP, RADIOSS

1 Introduction

Explicit finite element technology has long been the method of choice in the automotive, aerospace, military and electronics industries for simulating highly nonlinear problems such as crashes, high-speed impacts, explosions and drop tests. In the automotive industry in particular, as safety regulations have become stricter, major OEMs have invested heavily in crash simulation in order to reduce the number of prototype vehicles needed for physical tests, shorten the product development cycle, improve product safety and bring new models to market faster. Over the past 30 years, full-vehicle finite element models have grown from about 10,000 elements in the 1980s to 1,000,000 to 2,000,000 elements today, an increase of 100 to 200 times. Thanks to advances in computer hardware and MPI-based parallel computing, crash simulation time has not grown by the same factor. Even so, with current hardware and parallel software, a full-vehicle crash simulation still takes anywhere from a few hours to more than ten hours. As a result, current vehicle development cannot yet afford crash-safety optimization, reliability analysis, or integrated failure models for analyzing component damage, although these will certainly have to be considered in simulation within the next 5 to 10 years. In addition, the development of multi-core CPUs and GPU computing calls for innovation in parallel software, so that existing hardware architectures can be fully exploited to meet increasingly demanding simulation requirements. As an industrial explicit solver, RADIOSS has made a series of attempts and improvements in this area and has achieved significant breakthroughs.

RADIOSS is a new-generation finite element solver for linear and nonlinear simulation. It can simulate structures, fluids, fluid-structure interaction, sheet metal stamping and the motion of mechanical systems. This multidisciplinary solution enables manufacturers to maximize the durability, noise and vibration performance, crashworthiness, safety and manufacturability of their designs, so that new products reach the market faster. Since its first release in 1987, RADIOSS has been applied successfully in the automotive and military industries for more than 20 years, expanding from automotive safety simulation, first at PSA and Ford, to the aerospace, military, railway, electronics, biology and other industries. After RADIOSS joined Altair Engineering's HyperWorks suite in 2006, its latest version, 10.0, was released, with major progress in software reliability, scalability and repeatability of results. The rest of this paper focuses on advanced mass scaling, multi-domain solution and hybrid MPP, the technologies newly applied in the RADIOSS solver, and on their effect on real models.

2 Advanced Mass Scaling

Mass scaling is not a new concept in explicit crash analysis; it is widely used to increase the time step of the whole model. Standard mass scaling adds mass to the nodes of elements with small time steps so that those elements reach a larger time step. This artificially introduces non-physical mass and energy into the model, so it must be used with caution. In practice, too much mass must not be added to a local part, or the global properties of that part will change, and this limits how far the time step can be increased. This is why automotive meshing standards specify a reasonable minimum element length. To overcome the limitations of standard mass scaling, researchers have proposed selective mass scaling for explicit finite element analysis [1]. RADIOSS implements and extends this technique as its advanced mass scaling capability, which increases the time step by adding off-diagonal terms to the mass matrix:

$$\tilde{\mathbf{M}} = \mathbf{M} + \mathbf{A}$$

where $\mathbf{M}$ is the standard diagonal (lumped) mass matrix and $\mathbf{A}$ is an added mass matrix assembled from the element mass matrices. For a four-node shell element, the element mass matrix is

Advanced mass scaling can provide higher parallel scalability, and it changes only the high-frequency part of the response, with little effect on the low-frequency part, so it can be applied to vehicle crash simulation. It can typically increase the time step by a factor of 10 to 20, but it also consumes part of the computing time, because the mass matrix, which is no longer diagonal, must be inverted. To evaluate the method, we use the test model shown in Figure 1: a box beam fixed at one end and loaded with a 2200 N force at the other. The beam material uses the Johnson-Cook elastoplastic constitutive model. Two meshes are used: a coarse mesh (time step Δt = 0.57 μs) and a fine mesh (time step Δt = 0.15 μs). Figure 2 shows the deformation of the two models at 5 ms.

Figure 1 Test model

Figure 2 Deformation of the coarse-mesh and fine-mesh models at 5 ms
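The claim that advanced mass scaling mainly changes the high-frequency response can be checked on a toy problem. The sketch below is an illustration under stated assumptions, not the RADIOSS formulation: it uses a free 1D chain of two-node bar elements, a lumped diagonal mass matrix M, and a hypothetical element added-mass matrix of the form beta * (m_e/n) * (I - 11^T/n). Every row of that matrix sums to zero, so a rigid-body translation still sees only the physical mass while relative nodal motion is made heavier. The stable time step is estimated as 2/omega_max; the function names and the parameter `beta` are illustrative.

```python
import numpy as np


def element_added_mass(m_e: float, n: int, beta: float) -> np.ndarray:
    """Hypothetical selective added-mass matrix for one n-node element (one direction).

    Every row sums to zero, so a uniform rigid-body translation is unaffected.
    """
    return beta * (m_e / n) * (np.eye(n) - np.ones((n, n)) / n)


def chain_matrices(n_elem: int, k: float, m_e: float, beta: float):
    """Assemble K, lumped M and added mass A for a free 1D chain of two-node bars."""
    n_nodes = n_elem + 1
    K = np.zeros((n_nodes, n_nodes))
    M = np.zeros((n_nodes, n_nodes))
    A = np.zeros((n_nodes, n_nodes))
    k_e = k * np.array([[1.0, -1.0], [-1.0, 1.0]])
    for e in range(n_elem):
        idx = np.ix_([e, e + 1], [e, e + 1])
        K[idx] += k_e
        M[idx] += np.diag([m_e / 2, m_e / 2])  # lumped: half the element mass per node
        A[idx] += element_added_mass(m_e, 2, beta)
    return K, M, A


def frequencies(K: np.ndarray, M: np.ndarray) -> np.ndarray:
    """Circular frequencies from K v = w^2 M v, for symmetric positive definite M."""
    L = np.linalg.cholesky(M)          # M = L L^T
    Linv = np.linalg.inv(L)
    w2 = np.linalg.eigvalsh(Linv @ K @ Linv.T)  # ascending eigenvalues
    return np.sqrt(np.clip(w2, 0.0, None))      # clip round-off below zero


if __name__ == "__main__":
    K, M, A = chain_matrices(n_elem=20, k=1.0e6, m_e=1.0e-3, beta=10.0)
    w_std = frequencies(K, M)
    w_sms = frequencies(K, M + A)
    print("critical dt, lumped mass :", 2.0 / w_std[-1])
    print("critical dt, added mass  :", 2.0 / w_sms[-1])
    # Index 0 is the rigid-body mode (w = 0); index 1 is the lowest elastic mode.
    print("lowest elastic w         :", w_std[1], "vs", w_sms[1])
```

For two-node bars this added matrix happens to be proportional to the element stiffness, A = cK with c = beta * m_e / (4k), so each eigenvalue lambda of the unscaled problem becomes lambda / (1 + c * lambda). The highest modes are slowed strongly (in this chain the critical time step grows by about sqrt(1 + beta)), while the lowest elastic mode barely moves.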

For this model, we compare the computing time and the displacement response of the beam-end node under five solution settings:
a) coarse mesh, default time step Δt = 0.57 μs;
b) fine mesh, default time step Δt = 0.15 μs;
c) fine mesh, standard mass scaling used to force the time step to Δt = 0.57 μs;
e) fine mesh, advanced mass scaling used to force the time step to Δt = 0.57 μs;
f) fine mesh, advanced mass scaling used to force the time step to Δt = 1 μs.
The end-node displacement responses for the five settings are shown in Figure 3.

Figure 3 Displacement response of the end node under the different settings
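As a rough check on the expected gains, the ideal speedup of each fine-mesh setting over setting b is simply the ratio of time steps, because the number of cycles needed to reach a given end time scales as 1/Δt. The sketch below assumes an end time of 5 ms (the instant shown in Figure 2), ignores any extra per-cycle cost, and also estimates how much mass conventional scaling must add in setting c under the usual assumption that an element's stable time step grows as the square root of its mass. The dictionary and function names are illustrative.

```python
import math

END_TIME_US = 5000.0  # assumed end time (5 ms), in microseconds
DT_REF_US = 0.15      # fine mesh, default time step (setting b)

# Fine-mesh settings from the beam test: label -> time step in microseconds.
FINE_MESH_DT_US = {"b": 0.15, "c": 0.57, "e": 0.57, "f": 1.0}


def step_count(dt_us: float, end_us: float = END_TIME_US) -> int:
    """Explicit cycles needed to reach end_us with a constant time step."""
    return math.ceil(end_us / dt_us)


def ideal_speedup(dt_us: float, dt_ref_us: float = DT_REF_US) -> float:
    """Speedup over the reference step if every cycle cost the same."""
    return dt_us / dt_ref_us


def conventional_mass_factor(dt_us: float, dt_elem_us: float = DT_REF_US) -> float:
    """Mass multiplier conventional scaling needs, assuming dt ~ sqrt(mass)."""
    return max(1.0, (dt_us / dt_elem_us) ** 2)


if __name__ == "__main__":
    for label, dt in FINE_MESH_DT_US.items():
        print(f"{label}) dt={dt} us  cycles={step_count(dt)}  "
              f"ideal speedup={ideal_speedup(dt):.2f}")
    print(f"setting c: element mass x{conventional_mass_factor(0.57):.2f}")
```

Setting c multiplies the mass of every element whose stable step is 0.15 μs by roughly 14.4, which explains why its response departs from the physical model. Settings e and f avoid that, but they fall short of the ideal ratios because each cycle also pays for the solve with the non-diagonal mass matrix.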

As Figure 3 shows, the coarse mesh (result a) cannot correctly describe the deformation of the structure. In result c, so much mass has been introduced that the model no longer represents the physical structure, and the end-node displacement does not reflect the actual response. Results b, e and f all capture the buckling of the beam, and the three curves are very close. Table 1 compares the solution time and speedup of these three credible results. Comparing b and e: increasing the time step from 0.15 μs to 0.57 μs should in theory give a speedup of 3.8. In practice, because advanced mass scaling needs additional CPU resources to invert the mass matrix, the actual speedup is smaller than this, but the improvement is still considerable.
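With a diagonal mass matrix, nodal accelerations come from a simple division, a = f/m. Once off-diagonal terms are added, a linear system M_s a = f, where M_s is the scaled (standard plus added) mass matrix, must be solved every cycle, and that is where the extra CPU time goes. The sketch below assumes one common way of doing this, a conjugate-gradient solve preconditioned with the diagonal of M_s; it is not necessarily what RADIOSS does, and `accel_lumped` and `accel_pcg` are hypothetical dense-matrix helpers.

```python
import numpy as np


def accel_lumped(m_diag: np.ndarray, f: np.ndarray) -> np.ndarray:
    """Standard explicit update with a diagonal mass: a = f / m, no solve needed."""
    return f / m_diag


def accel_pcg(Ms: np.ndarray, f: np.ndarray, tol: float = 1e-10, max_iter: int = 200):
    """Solve Ms @ a = f by Jacobi-preconditioned conjugate gradients.

    Ms must be symmetric positive definite. Returns (a, iterations_used).
    """
    d_inv = 1.0 / np.diag(Ms)      # Jacobi preconditioner (inverse of the diagonal)
    a = f * d_inv                  # initial guess: lumped-style division
    r = f - Ms @ a                 # residual
    z = d_inv * r
    p = z.copy()
    rz = r @ z
    f_norm = np.linalg.norm(f)
    for it in range(max_iter):
        if np.linalg.norm(r) <= tol * f_norm:
            return a, it
        Mp = Ms @ p
        alpha = rz / (p @ Mp)
        a += alpha * p
        r -= alpha * Mp
        z = d_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return a, max_iter


if __name__ == "__main__":
    n = 50
    m = np.full(n, 1.0e-3)                          # hypothetical nodal masses
    # Hypothetical added mass: a scaled 1D difference matrix (zero row sums in the interior).
    lap = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    Ms = np.diag(m) + 2.0e-3 * lap
    f = np.sin(np.linspace(0.0, np.pi, n))          # hypothetical nodal forces
    a, iters = accel_pcg(Ms, f)
    print("PCG iterations:", iters)
    print("residual norm :", np.linalg.norm(Ms @ a - f))
```

Each PCG iteration costs roughly one extra mass-matrix product, so the practical speedup of setting e over setting b depends on how many iterations the solve needs per cycle.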

3 Multi-Domain Solution

In many explicit integration solvers, the time step of the whole model is controlled by the element with the smallest time step. The idea of multi-domain solution is to divide the whole model into physically self-contained subdomains, each with its own time step. Each subdomain is solved as an independent model with its own time step, and the forces and moments transferred between subdomains are computed by a separate program that enforces the stability constraints (as shown in Figure 4).

Figure 4 Multi-domain solution architecture

Each domain is a complete, computable RADIOSS model. It runs independently and communicates only with rad2rad, and the domains are synchronized only at specified times. As shown in Figure 5, suppose two domains A and B have independent time steps Δt_A and Δt_B (Δt_A > Δt_B). The two domains run independently but both communicate with rad2rad, and they synchronize the forces, moments and other physical quantities at their interface at the specified time points (shown in red in the figure).

Figure 5 Time-step synchronization between domains
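The synchronization pattern in Figure 5 can be sketched as a small scheduler. This is a hypothetical illustration, not the rad2rad protocol: each domain advances with its own fixed time step until it reaches the next synchronization time, shortening its last step if necessary so that all domains arrive at the same instant, where interface forces and moments would then be exchanged. `Domain`, `advance_to` and `run` are illustrative names, and times are in arbitrary units.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    """A subdomain that advances with its own fixed time step (illustrative)."""
    name: str
    dt: float
    t: float = 0.0
    steps: int = 0


def advance_to(dom: Domain, t_sync: float, eps: float = 1e-12) -> None:
    """Step `dom` until it reaches t_sync, clipping the final step if needed."""
    while dom.t < t_sync - eps:
        h = min(dom.dt, t_sync - dom.t)
        # Element forces and nodal updates for this domain would be computed here.
        dom.t += h
        dom.steps += 1


def run(domains: list, sync_interval: float, t_end: float) -> int:
    """Advance all domains from sync point to sync point; return the number of syncs."""
    n_sync = 0
    t_sync = 0.0
    while t_sync < t_end - 1e-12:
        t_sync = min(t_sync + sync_interval, t_end)
        for dom in domains:
            advance_to(dom, t_sync)
        # All domains now sit at t_sync: interface forces/moments would be exchanged here.
        n_sync += 1
    return n_sync


if __name__ == "__main__":
    a = Domain("A", dt=1.0)    # larger step (dt_A > dt_B)
    b = Domain("B", dt=0.25)   # smaller step
    syncs = run([a, b], sync_interval=2.0, t_end=10.0)
    print(syncs, a.steps, b.steps)  # 5 10 40
```

In a single-domain run, domain A's elements would also have to advance with the smaller step, 40 cycles instead of 10 in this example.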

Figure 6 shows a simulation of a head impacting the upper front panel of a car. The wiper components are meshed relatively finely, while the other structural parts of the body have relatively large elements, so the time steps of the two parts differ considerably. With multi-domain solution, the wiper components can be placed in a separate domain and the rest of the vehicle in another domain (as shown in Figure 6).

Figure 6 Head impact model and its multi-domain partition
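Why isolating the finely meshed wiper pays off can be estimated with a simple cost model, assuming that the cost of one cycle is proportional to the number of elements updated and ignoring coupling and communication overhead. All element counts and time steps below are made-up placeholders, not data from the head impact model, and the function names are hypothetical.

```python
def single_domain_cost(counts, dts, t_end):
    """Element updates when every element advances with the smallest time step."""
    return sum(counts) * t_end / min(dts)


def multi_domain_cost(counts, dts, t_end):
    """Element updates when each domain advances with its own time step."""
    return sum(n * t_end / dt for n, dt in zip(counts, dts))


if __name__ == "__main__":
    # Hypothetical split: a small, finely meshed wiper domain and the rest of the car.
    counts = [50_000, 950_000]  # elements per domain (made-up)
    dts = [0.25, 1.0]           # stable time steps in microseconds (made-up)
    t_end = 10_000.0            # simulated time in microseconds (made-up)
    single = single_domain_cost(counts, dts, t_end)
    multi = multi_domain_cost(counts, dts, t_end)
    print(f"ideal speedup: {single / multi:.2f}")
```

The gain reported for the real model (computing time roughly halved) also reflects the actual element split, load balance and the cost of the rad2rad exchanges, none of which this ideal model captures.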

Table 2 compares the computing time of single-domain and two-domain solutions of this model on 1, 16, 32 and 64 CPUs. In this practical model, the multi-domain solution cuts the computing time roughly in half compared with the single-domain solution, and its speedup remains at 2.01 from 1 CPU to 64 CPUs. For accuracy, Figure 7 compares the head acceleration response from the single-domain and multi-domain solutions.

Figure 7 Head acceleration response for single-domain and multi-domain solutions

4 Hybrid MPP

Since the emergence of multi-core processors, they have been used more and more in computing clusters. Figure 8 shows the architecture of a computing unit in a typical parallel computing cluster. Each compute node contains several CPUs (Socket1, Socket2, ...), each CPU has several cores (C1, C2, ...), memory is shared within each node (M1, M2, ...), and all nodes are interconnected through a fast switched network. Hybrid MPP mode uses OpenMP for parallelism within each node and MPI for parallelism between nodes. Its optimized message-passing mechanism reduces the volume of messages between nodes, so the existing hardware can be used effectively in large-scale parallel computing. The latest version of RADIOSS implements this parallel mode and has achieved a major improvement in computing speed in real vehicle crash simulations.

Figure 8 Typical cluster computing unit architecture and thread distribution in hybrid MPP mode
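The benefit of using fewer MPI ranks can be illustrated with a toy model, which is an assumption and not RADIOSS's actual domain decomposition: a square structured mesh of n x n nodes is cut into vertical strips, one per MPI rank, and each cut line of n nodes has to be exchanged every cycle. With the core count fixed, giving each MPI rank more OpenMP threads means fewer ranks, hence fewer cuts and less inter-node traffic. `interface_nodes` is a hypothetical helper.

```python
def interface_nodes(nodes_per_side: int, ranks: int) -> int:
    """Nodes on the cuts when an n x n node grid is split into `ranks` vertical strips.

    Toy model: each internal cut is one full column of nodes shared by two ranks.
    """
    if not 1 <= ranks <= nodes_per_side:
        raise ValueError("need 1 <= ranks <= nodes_per_side")
    return (ranks - 1) * nodes_per_side


if __name__ == "__main__":
    n = 1000      # hypothetical mesh: 1000 x 1000 nodes
    cores = 16    # fixed total core count
    for threads in (1, 2, 4, 8):
        ranks = cores // threads
        print(f"{ranks:2d} MPI ranks x {threads} threads -> "
              f"{interface_nodes(n, ranks)} interface nodes")
```

Real crash meshes are unstructured and the partitioner's cut sizes differ, but the trend is the same: halving the number of MPI ranks roughly halves the number of internal cuts.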

Tests with the standard frontal crash model Neon_1m [2] on an Intel Nehalem 2.8 GHz cluster show that RADIOSS hybrid MPP mode achieves very good speedup. With the total number of cores fixed at 16, using hybrid MPP with 2 to 8 threads reduces the total computing time from 1301 s in SPMD mode to 256 s (as shown in Figure 9).

Figure 9 Computing speed of hybrid MPP compared with traditional SPMD
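The 16-core configurations in this test and the resulting speedup can be worked out directly. The sketch assumes that every layout uses all 16 cores, so the number of MPI ranks is 16 divided by the OpenMP threads per rank; `hybrid_layouts` and `speedup` are illustrative helpers, and the two timings are the ones reported above.

```python
def hybrid_layouts(total_cores: int, min_threads: int = 2, max_threads: int = 8):
    """(MPI ranks, OpenMP threads per rank) pairs that use exactly total_cores cores."""
    return [
        (total_cores // t, t)
        for t in range(min_threads, max_threads + 1)
        if total_cores % t == 0
    ]


def speedup(t_ref: float, t_new: float) -> float:
    """Ratio of reference wall-clock time to new wall-clock time."""
    return t_ref / t_new


if __name__ == "__main__":
    print(hybrid_layouts(16))               # [(8, 2), (4, 4), (2, 8)]
    print(f"{speedup(1301.0, 256.0):.2f}")  # 5.08 (SPMD 1301 s vs hybrid 256 s)
```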

5 Conclusion

RADIOSS continues to innovate in improving the parallel computing performance of finite element solvers. In the latest full-vehicle crash simulation test, the solver with the latest solution technologies completed the solution within 5 minutes on a 1024-core cluster [3]. This breakthrough in parallel scalability makes it possible to bring optimization, reliability analysis and detailed failure simulation into crash simulation.

6 References

[1] OVS

Copyright © 2011 JIN SHI