tips and tricks
 
 
Sparse Solver (Memory Requirements, Performance)
  Q: Can someone direct me to more details on the sparse method? I didn't see more than a few sentences on it in the ANSYS 5.6 documentation.

A: The sparse solver is a direct solver. It solves directly for (x) in, for example, the static equation [K](x)=(F), just as the frontal solver does. The frontal solver actually triangularizes [K] and then back-substitutes for (x). This is time-consuming and is also a hard drive hog (since the full [K] is factorized). Sparse solvers, on the other hand, take advantage of the fact that [K] is sparse and banded (the non-zero terms usually lie near the diagonal) to reduce memory requirements.
    I've only read two papers on sparse solvers, so I'm not an expert in this area (in fact, I usually have little idea what I write about, but I think I just like hearing myself type). However, as a layman's simplified explanation on this, it's basically trying to store only non-zero terms. The two papers I read do things differently, so I don't know if there's a common algorithm. As one explanation, let's view a [K] matrix of order nxn:

  1. For direct Gaussian elimination, we basically need to store an nxn matrix (or, actually, just the upper triangle for symmetric matrices).
  2. For a sparse solver, we store the non-zero terms only (say there are "m" of them). Because these non-zero terms could be anywhere, the "row" and "column" numbers/locations often need to be stored, too. So one ends up with an mx3 matrix, which is much smaller than an nxn matrix when [K] is large and sparse.

    This is an over-simplified explanation, but I hope you get the basic idea. As to how it actually solves this, it is too much for me to explain, and even I don't like typing *that* much. I'd refer you to some papers, but I don't think they're published (i.e., public), so I'm sorry about that...
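As a rough illustration of the "mx3" storage idea above, here is a minimal Python sketch. It is not how ANSYS or any particular sparse solver stores its matrices; the function names and the tridiagonal test matrix are made up for the example. It only shows the (row, column, value) triplet layout and how the storage count compares with a full nxn array.

```python
def tridiagonal(n):
    """Build a dense n x n matrix with 2 on the diagonal and -1 beside it (illustrative only)."""
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 2.0
        if i > 0:
            matrix[i][i - 1] = -1.0
        if i < n - 1:
            matrix[i][i + 1] = -1.0
    return matrix


def dense_to_triplets(matrix):
    """Keep only the non-zero terms, each stored as (row, column, value)."""
    triplets = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value != 0.0:
                triplets.append((i, j, value))
    return triplets


def triplet_matvec(triplets, x):
    """Multiply the triplet-stored matrix by the vector x."""
    y = [0.0] * len(x)
    for i, j, value in triplets:
        y[i] += value * x[j]
    return y


n = 100
K = tridiagonal(n)
triplets = dense_to_triplets(K)

dense_count = n * n                # numbers stored for the full matrix
sparse_count = 3 * len(triplets)   # m triplets of (row, column, value)
print(dense_count, sparse_count)
```

For very small matrices the triplet form can actually be larger than the full matrix; the saving only shows up once n is large and most terms are zero.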

    Sorry if you already know this, but iterative solvers (usually conjugate gradient solvers) solve, as an example, the equation [K](x)=(F) by guessing a solution of (x) and updating it (using a preconditioning matrix -- this, too, is more than I'd like to get into right now). That's why a PCG solver goes through maybe a hundred or more iterations. It does not explicitly solve for (x), but since the convergence is usually tight (1e-6 or 1e-8), the answer is basically the same as you would get from a direct solver.
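To make the "guess and update" idea concrete, here is a minimal sketch of a conjugate gradient loop in Python. It uses a simple diagonal (Jacobi) preconditioner, which is an assumption made for the example and not necessarily what the ANSYS PCG solver uses. It also stores [K] as a full matrix for readability, whereas a real solver would use sparse storage.

```python
def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Dot product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, tol=1e-8, max_iter=1000):
    """Solve A x = b for symmetric positive definite A with Jacobi-preconditioned CG."""
    n = len(b)
    x = [0.0] * n                                  # initial guess
    r = list(b)                                    # residual b - A x, with x = 0
    inv_diag = [1.0 / A[i][i] for i in range(n)]   # Jacobi preconditioner (assumed here)
    z = [inv_diag[i] * r[i] for i in range(n)]
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    if b_norm == 0.0:
        return x, 0
    for iteration in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]      # update the guess
        r = [ri - alpha * api for ri, api in zip(r, Ap)]   # update the residual
        if dot(r, r) ** 0.5 <= tol * b_norm:               # relative convergence check
            return x, iteration
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


K = [
    [2.0, -1.0, 0.0, 0.0],
    [-1.0, 2.0, -1.0, 0.0],
    [0.0, -1.0, 2.0, -1.0],
    [0.0, 0.0, -1.0, 2.0],
]
F = [1.0, 0.0, 0.0, 1.0]
x, iterations = pcg(K, F)
print(x, iterations)
```

The loop stops when the residual is small relative to (F), which is why the answer matches a direct solve only to within the chosen tolerance.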

  Posted by Sheldon Imaoka (CSI) on 05.18.2000
 
Sparse solver
  You are correct in your observation that memory usage can grow in nonlinear analyses using the sparse solver (or frontal for that matter). Contact elements are one cause of the problem. The good news here is that in version 5.7 we are going to be using supplemental memory allocations in the sparse solver loop. This should mean that ANSYS will stop less often with out-of-memory errors. However, automatically growing memory can actually use a lot more memory than you need. For example, say there are 250 Mbytes available in the ANSYS memory when the sparse solver is called but 300 Mbytes are required. A supplemental memory allocation of 300 Mbytes would happen automatically, but the original 250 Mbytes would remain unused because the solver's memory has to be in one contiguous block. You can imagine what could happen if each call to the sparse solver increases the memory requirement by 50 Mbytes. Eventually you will run out of space, even though after the initial extra 300 Mbytes there are 550 total Mbytes available, because the solver can only use contiguous blocks. This is a design issue with the sparse solver and we are working with the folks from Boeing to break up the memory requirements. But this change is a major one for the sparse solver package from Boeing so it won't happen overnight.
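The arithmetic in the 250/300 Mbyte example can be sketched as below. This is a simplified model and not the actual ANSYS memory manager: it assumes each supplemental allocation is a fresh contiguous block sized to the current requirement, and that earlier blocks stay allocated but are too small to be used. The function name is made up for the illustration.

```python
def memory_after_growth(initial_mb, required_mb_per_call):
    """Return (total Mbytes allocated, largest contiguous block) after a series of solver calls."""
    blocks = [initial_mb]
    for required in required_mb_per_call:
        if required > max(blocks):
            # No existing block is big enough, so a new contiguous block is
            # allocated; the older, smaller blocks remain allocated but unused.
            blocks.append(required)
    return sum(blocks), max(blocks)


# The example from the post: 250 Mbytes available, 300 Mbytes required.
total_mb, usable_mb = memory_after_growth(250, [300])
print(total_mb, usable_mb)

# The requirement growing by 50 Mbytes on each later call.
total_grown_mb, usable_grown_mb = memory_after_growth(250, [300, 350, 400])
print(total_grown_mb, usable_grown_mb)
```

In this model the total allocation climbs much faster than the space the solver can actually use, which is the effect described in the post.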

The best strategy is probably to allow extra memory at the start of a nonlinear run like this. You will find that in 5.7 the sparse solver will be able to run well in a LOT less memory space than in 5.6, at least in some cases. The 300k DOF size job should benefit from the memory changes in 5.7. We are seeing a 20 percent reduction in CPU/WALL times on SGI systems from I/O improvements and we have reduced the file size requirements by nearly half. You will see this in a running job by the fact the file.LN22 will no longer contain a copy of the large LN09 file.

Hope this helps some. More improvements are in the works with the sparse solver. Eventually the wavefront will not matter at all. Right now the wavefront message appears because we use frontal assembly to feed the sparse solver. We already have alternative assembly paths in ANSYS but we are not ready to turn them on for all types of analyses and boundary conditions.
  Posted by Gene Poole (ANSYS, Inc.) on 06.14.2000
 
PCG Solver
  There are differences if you are using the PCG solver. First, it is not a factorization-based direct solver. So, even though the matrices change just like in the frontal or sparse solver cases, there should not be the potentially large growth in the matrices that you may see with the frontal or sparse solvers. The PCG solver assembles the matrices using sparse matrix technology so the wavefront should not enter into the problem at all. Also, everything I said about the solver needing large contiguous blocks of memory does not apply to the PCG solver. It may grow memory as in the frontal and sparse solvers but it does so more incrementally, so you should not get potentially large unused blocks of memory space.

Having said all of that, if you were running out of PCG solver space, or close to the limit, then you could still fail for memory space. When the PCG solver fails then you really are out of memory. This is because the PCG solver will do supplemental memory allocations until no more system memory is available. The only thing you can do at this point is to reduce your database space and rerun. This will cause the file.page to be used but that is not necessarily a huge penalty. The other thing you can do is to make sure you have plenty of physical swap space set aside. I'd suggest 2X your main memory size on PCs. That way the PCG solver can still run if it exceeds your physical memory size. (If you exceed physical memory size for the PCG solver be prepared for about a 10X hit in wall time. It is OK to exceed physical memory for all of ANSYS, but if the PCG solver also exceeds physical memory each PCG iteration goes through disk I/O to the virtual memory space and it will take a while.) The file.PCS provides a good estimate of the amount of memory required for the PCG solver. It is only written the first time the PCG solver is called so it will not reflect the growth in the problem which you were asking about. Still, it is a worthwhile estimate of the PCG solver space.
  Posted by Gene Poole (ANSYS, Inc.) on 06.14.2000
 
Choosing a Solver
  My main area of expertise is solver performance so I am very interested in your experience. Perhaps I can also help you some.

The correct solver choice depends on several factors and I did see someone post some advice that was quite accurate. The PCG solver will minimize disk space at the expense of memory. PCG is an iterative solver, which means there is no matrix factorization as in the frontal or sparse solver options. The file.tri or file.LN09 files are the big files in the frontal and sparse solver runs. In addition there is a temporary scratch file written by the sparse solver that is essentially a copy of the sparse solver workspace plus all of the files used by the sparse solver. This file is file.LN22. It goes away after the program stops, so it can be confusing to get out-of-disk-space error messages when there seems to be a lot of space available. In 5.7 the LN22 file will not be used at all except in some nonlinear runs and we will no longer be saving copies of the huge files in any case. That "feature" was actually put in there external to ANSYS and we have removed it. So the disk space requirements for the sparse solver in ANSYS 5.7 will be less than half of the current requirements.

Now, as to memory usage. The frontal solver uses the least memory but is the slowest algorithm and writes the biggest external file. However, with the current situation in 5.6, where the sparse solver files get stored essentially twice, it may actually take less disk in some cases. The frontal solver also has very good parallel performance. But again, for the size problem you describe it could run for a LONG time and take a huge disk file; several Gbytes would not surprise me. It depends on the maximum wavefront size. If you want an estimate of file size for the frontal solver, look in the output file where it tells you the R.M.S. wavefront size. The file size in Mbytes is pretty close to (RMS WF size * num of DOFs * 8) / (1024*1024). The 8 is the number of bytes per double precision word. So if your problem has 300,000 DOFs and the R.M.S. wavefront is 5000, that is 1.5 billion D.P. words, or around 11 Gbytes.
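The file size estimate above is easy to script. The following Python sketch just evaluates the formula from the post; the function name is made up, and the result is only as good as the R.M.S. wavefront value reported in your output file.

```python
BYTES_PER_WORD = 8  # double precision word


def frontal_file_size_mb(rms_wavefront, num_dofs):
    """Estimate the frontal solver file size in Mbytes from the R.M.S. wavefront and DOF count."""
    words = rms_wavefront * num_dofs
    return words * BYTES_PER_WORD / (1024 * 1024)


# The example from the post: 300,000 DOFs with an R.M.S. wavefront of 5000.
size_mb = frontal_file_size_mb(5000, 300_000)
size_gb = size_mb / 1024
print(round(size_mb), round(size_gb, 1))
```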

The sparse solver (eqslv,spar) is also a direct factorization method but a much newer technology. It will run out-of-core but does have a minimum memory requirement on any given job. In 5.6 this solver option will grab all of the available memory in the ANSYS heap at the time the solver is called. It prints out some messages about the memory available to the solver, and if you run with the following undocumented debug flag set you will get some nice performance stats from the sparse solver:

use eqslv,spar,,-5

The sparse solver can run quite efficiently out-of-core if you have a decent disk setup. It does quite well on SGI Origin machines, including running in parallel if you have set ANSYS up to run in parallel. Other UNIX workstation platforms with large memory and good I/O configuration should also do well. We will be adding more sparse solver hardware optimizations in the future. The performance is also quite good on NT systems because we have linked with a fast math library that is used with that solver. However, most NT workstations are short on memory and many have terrible I/O performance. You really need SCSI disks on them and plenty of room. One way to get the maximum memory for the sparse solver is to cut the db space back. So, if you ran with -m 1000 -db 256 and then reran with -m 1000 -db 56, the solver would have an additional 200 Mbytes available. Of course you will get a file.page that is 200 Mbytes larger, potentially, but that is not much of a performance hit.
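The -m / -db trade-off above is simple subtraction, sketched here in Python. The sketch assumes that whatever part of the -m total is not reserved by -db is available as scratch space for the solver, and it ignores anything else that uses scratch memory. The function name is made up for the illustration.

```python
def solver_scratch_mb(total_mb, db_mb):
    """Approximate scratch space left for the solver: the -m total minus the -db space."""
    return total_mb - db_mb


before_mb = solver_scratch_mb(1000, 256)   # -m 1000 -db 256
after_mb = solver_scratch_mb(1000, 56)     # -m 1000 -db 56
extra_mb = after_mb - before_mb            # additional space gained by the solver
print(before_mb, after_mb, extra_mb)
```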

If you are interested in more info from me let me know. I would like to know a bit more about what kind of system you are running on and if you could do a run with eqslv,spar,,-5 and send me the output file I can tell a lot about your job performance. If you run the PCG solver the file.PCS is the file that I would like to see for performance data.

Good luck! I hope we can continue to improve your solver options. You will definitely see improvements in 5.7.
  Posted by Gene Poole (ANSYS, Inc.) on 06.21.2000
 
Memory Allocation
  There is no performance issue with allowing supplemental memory allocations for your job rather than increasing total scratch memory at the start of your job. However, I would generally recommend asking for sufficient memory up front whenever you have a good idea of the amount you need. The reason is that when supplemental memory allocations occur, the amount of memory actually allocated is determined as some fraction of the initial space you started with. So, there can be cases where perhaps you only needed 1 Mbyte more of space but the supplemental memory allocation would get an additional block that could be much more than 1 Mbyte. This is not necessarily a performance hit but you might end up thinking you need a lot more memory to run your job than was actually required.
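The overshoot described above can be illustrated with a small Python sketch. The post does not say what fraction ANSYS uses, so the growth_fraction parameter and the numbers below are purely hypothetical; only the shape of the calculation comes from the text.

```python
def supplemental_block_mb(initial_mb, growth_fraction):
    """Size of one supplemental allocation, taken as a fraction of the initial space."""
    return initial_mb * growth_fraction


def overshoot_mb(initial_mb, growth_fraction, shortfall_mb):
    """Memory allocated beyond what the job actually needed."""
    return supplemental_block_mb(initial_mb, growth_fraction) - shortfall_mb


# Hypothetical numbers: 400 Mbytes initial space, a 25 percent growth
# fraction (an assumed value), and a job that was short by only 1 Mbyte.
block_mb = supplemental_block_mb(400, 0.25)
extra_mb = overshoot_mb(400, 0.25, 1)
print(block_mb, extra_mb)
```

Under these made-up numbers the job reports far more memory in use than it needed, which is why the total after a supplemental allocation is a poor guide to the real requirement.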

One of the responses to your email mentioned a paper on ANSYS memory usage at www.csi-ansys.com/tip_of_the_week.htm. This is really a well-written paper with some very good data. Check it out - it will be very useful.

Hopefully, the word is getting out that in 5.7 the sparse solver will now also use supplemental memory allocation, at least in many cases. This change will work for eqslv,spar as well as the modal analysis runs which use the block Lanczos solver (modopt,lanb). The biggest change besides adding supplemental memory allocation is that we have added some additional logic and functionality to the sparse solver interface so that the solver can now function using significantly less memory. It does this at the expense of more I/O, but we have also improved I/O performance and eliminated some unnecessary I/O to compensate. In most cases the sparse solver is faster and uses less memory. If you have a large memory system you will be able to run the sparse solver with no I/O at all for smaller problems (under 70,000 DOFs) and with minimal memory for larger jobs, as long as you specify large initial memory settings via the -m command line option. The choice between running in-core and out-of-core is automatic and depends on the memory available. The biggest change on the block Lanczos side is that we can now run fairly large Lanczos jobs with much less memory. In one example, a 1.4 million DOF modal run that required over 2000 Mbytes to run in 5.6 will now run with Lanczos solver memory of just 450 Mbytes. The total job runs with -m 850.

The performance hit for using less memory is system dependent but it was minimal on many of the systems we have tested. Don't expect performance miracles on small NT systems with IDE drives.
  Posted by Gene Poole (ANSYS, Inc.) on 09.12.2000