Sparse Solver (Memory Requirements, Performance)
Q: Can someone direct me to more details on the sparse method? I didn't see
any more than a few sentences on it in the ANSYS 5.6 documentation.
The sparse solver is a direct solver. It directly solves for
(x), for example, in the static equation [K](x)=(F), similar to the frontal
solver. The frontal solver actually triangularizes [K] and then
back-substitutes for (x). This is time-consuming and is also a hard-drive hog
(since the full [K] is factorized). Sparse solvers, on the other hand, take
advantage of the fact that [K] is sparse and banded (the non-zero terms usually
lie near the diagonal) to reduce memory requirements.
I've only read two papers on sparse solvers, so I'm not an expert in
this area (in fact, I usually have little idea what I write about, but I
think I just like hearing myself type). However, as a layman's simplified
explanation, the basic idea is to store only the non-zero terms.
The two papers I read do things differently, so I don't know if there's a
common algorithm. As one explanation, let's view a [K] matrix of order nxn:
- For direct gaussian elimination, we basically need to store an nxn
matrix (or, actually, upper triangle for symmetric matrices)
- For sparse solver, we store non-zero terms only (say "m").
Oftentimes, because these non-zero terms could be anywhere, the "row" and
"column" numbers/locations need to be stored, too. So one ends up with a
mx3 matrix, which is much smaller than an nxn matrix.
This is an over-simplified explanation, but I hope you get the basic idea.
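As a sketch of that mx3 idea, here is a small hypothetical banded [K] stored in coordinate form (the values are made up for illustration; NumPy is used only for convenience):

```python
import numpy as np

# Small, symmetric, banded stiffness matrix [K] of order n x n.
# (Illustrative values only -- not from a real FE model.)
K = np.array([
    [ 4.0, -1.0,  0.0,  0.0],
    [-1.0,  4.0, -1.0,  0.0],
    [ 0.0, -1.0,  4.0, -1.0],
    [ 0.0,  0.0, -1.0,  4.0],
])

# Coordinate storage: keep only the m non-zero terms, each with its
# row and column index -- the "m x 3" layout described above.
rows, cols = np.nonzero(K)
vals = K[rows, cols]
coo = np.column_stack([rows, cols, vals])  # shape (m, 3)

print(K.size)        # 16 entries in the dense n x n matrix
print(coo.shape[0])  # 10 stored entries in the m x 3 form
```

For a matrix this tiny the savings are small, but for a real [K] with hundreds of thousands of rows and only a handful of non-zeros per row, m is vastly smaller than n*n.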
As to how it actually solves this, it is too much for me to explain, and
even I don't like typing *that* much. I'd refer you to some papers, but I
don't think they're published (i.e., public), so I'm sorry about that...
Sorry if you already know this, but iterative solvers (usually conjugate
gradient solvers) solve, as an example, the equation [K](x)=(F) by guessing
a solution of (x) and updating it (using a preconditioning matrix -- this,
too, is more than I'd like to get into right now). That's why a PCG solver
goes through maybe a hundred or more iterations. It does not explicitly
solve for (x), but since the convergence is usually tight (1e-6 or 1e-8),
the answer is basically the same as you would get from a direct solver.
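A minimal sketch of that guess-and-update idea, using plain conjugate gradient without a preconditioner (the real PCG solver adds a preconditioning matrix, which I am skipping here); the matrix, right-hand side, and tolerance are made up for illustration:

```python
import numpy as np

def cg(K, F, tol=1e-8, max_iter=200):
    """Plain conjugate gradient for a symmetric positive-definite
    system [K](x)=(F): start from a guess and update it until the
    residual is tight (here 1e-8, as mentioned above)."""
    x = np.zeros_like(F)   # initial guess
    r = F - K @ x          # residual
    p = r.copy()           # search direction
    rs = r @ r
    for _ in range(max_iter):
        Kp = K @ p
        alpha = rs / (p @ Kp)
        x += alpha * p
        r -= alpha * Kp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol * np.linalg.norm(F):
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

K = np.array([[4.0, -1.0], [-1.0, 3.0]])
F = np.array([1.0, 2.0])
x = cg(K, F)
print(np.allclose(K @ x, F))  # iterative answer matches a direct solve
```

Because the tolerance is tight, the iterated answer agrees with the direct-solver answer to well beyond engineering accuracy, which is the point made above.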
You are correct in your observation that memory usage can grow in nonlinear
analyses using the sparse solver (or the frontal solver, for that matter).
Contact elements are one cause of the problem. The good news here is that in
version 5.7 we are going to be using supplemental memory allocations in the
sparse solver loop. This should mean that ANSYS will stop less often with
out-of-memory errors. However, automatically growing memory can actually use a lot more
memory than you need. For example, say there are 250 Mbytes available in
the ANSYS memory when the sparse solver is called but 300 Mbytes are needed.
A supplemental memory allocation of 300 Mbytes would happen automatically,
but the 250 Mbytes remains unused, because the memory has to be in one
contiguous block. You can imagine what could happen if each call to the
sparse solver then increases the memory requirement by 50 Mbytes: after the
initial extra 300-Mbyte allocation there are 550 total Mbytes allocated, yet
eventually you will still run out of space, because the solver can only use
contiguous blocks. This is a design issue with the solver and we are working
with the folks from Boeing to break up the memory requirements. But this
change is a major one for the sparse solver package from Boeing so it won't
happen overnight.
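A toy model of that contiguous-block behavior, using the hypothetical numbers from the example above (the real allocator is of course more involved):

```python
# Each entry is the free Mbytes in one separate contiguous block.
# We start with 250 Mbytes free in the ANSYS workspace.
free_blocks = [250]

def solver_call(need_mb):
    """Satisfy a request that must fit in ONE contiguous block."""
    for i, free in enumerate(free_blocks):
        if free >= need_mb:          # fits in an existing block
            free_blocks[i] -= need_mb
            return
    # No single block is big enough: a supplemental allocation grabs
    # a fresh block of exactly need_mb and consumes it entirely.
    # The earlier free space stays stranded.
    free_blocks.append(0)

solver_call(300)    # 250 < 300 -> new 300-Mbyte block allocated
solver_call(300)    # grown again: yet another fresh block
print(free_blocks)  # [250, 0, 0]: 250 Mbytes allocated but unusable
```

The stranded first block is exactly the "250 Mbytes remains unused" situation described above: total allocation keeps climbing even though plenty of (fragmented) free space exists.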
The best strategy is probably to allocate extra memory at the start of a job
like this. You will find that in 5.7 the sparse solver will be able to run
in a LOT less memory space than in 5.6, at least in some cases. The 300k DOF
size job should benefit from the memory changes in 5.7. We are seeing a 20
percent reduction in CPU/wall times on SGI systems from I/O improvements and we
have reduced the file size requirements by nearly half. You will see this in a
job by the fact that the file.LN22 will no longer contain a copy of the large
sparse solver files.
Hope this helps some. More improvements are in the works with the sparse
solver. Eventually the wavefront will not matter at all. Right now the wavefront
appears because we use frontal assembly to feed the sparse solver. We have
alternative assembly paths in ANSYS but we are not ready to turn them on for
all types of analyses and boundary conditions.
There are differences if you are using the PCG solver. First, it is
not a factorization based direct solver. So, even though the matrices change
just like in the frontal or sparse solver cases, there should not be the
potentially large growth in the matrices that you may see with the frontal
or sparse direct solvers. The PCG solver assembles the matrices using sparse matrix
technology so the wavefront should not enter into the problem at all.
Also, all of that which I said about the solver needing large contiguous
blocks of memory does not apply to the PCG solver. It may grow memory as in the
frontal and sparse solvers but it does so more incrementally so you should
not get potentially large unused blocks of memory space.
Having said all of that, if you were running out of PCG solver space, or close
to the limit, then you could still fail for memory space. When the PCG solver
fails, then you really are out of memory. This is because the PCG solver will
do supplemental memory allocations until the system memory is unavailable.
The only thing you can do at this point is to reduce your database space
and rerun. This will cause the file.page to be used but that is not
a huge penalty. The other thing which you can do is to make sure you have
plenty of physical swap space set aside. I'd suggest 2X your main memory
size on PCs. That way the PCG solver can still run if it exceeds your
memory size. (If you exceed physical memory size for the PCG solver, be prepared
for about a 10X hit in wall time. It is OK to exceed physical memory for
all of ANSYS, but if the PCG solver also exceeds physical memory, each PCG
iteration goes through disk I/O to the virtual memory space and it will take a while.)
The file.PCG provides a good estimate of the amount of memory required for
the PCG solver. It is only written the first time the PCG solver is called,
so it will not reflect the growth in the problem which you were asking
about. Still, it is a worthwhile estimate of the PCG solver space.
Choosing a Solver
My main area of expertise
is solver performance, so I am very interested in your experience. Perhaps I can
also help you some.
The correct solver choice depends on several factors, and I did see someone post
some advice that was quite accurate. The PCG solver will minimize disk space at
the expense of memory. PCG is an iterative solver which means there is no
matrix factorization as in the frontal or sparse solver options. The
factorized-matrix files (such as file.LN09) are the big files in the frontal and
sparse solver runs. In addition, there is a temporary scratch file written by
the sparse solver that is essentially a copy of the sparse solver workspace plus
all of the files used by the sparse solver. This file is file.LN22. It goes away
after the program stops, so it can be confusing to get out-of-disk error
messages when there seems to be a lot of space available. In 5.7 the LN22 file
will not be used at all except in some nonlinear runs, and we will no longer be
saving copies of the huge files in any case. That "feature" was actually put in
there for reasons external to ANSYS and we have removed it. So the disk space
requirements for the sparse solver in ANSYS 5.7 will be less than half of the
current requirements.
Now, as to memory usage. The frontal solver uses the least memory but is the
slowest algorithm and creates the biggest external file; however, with the
current situation in 5.6, where the sparse solver files get stored essentially
twice, it may take less disk in some cases. The frontal solver also has very
good parallel performance. But again, for the size problem you describe it could
run for a long time and take a huge disk file; several Gbytes would not surprise
me. It depends on the maximum wavefront size. If you want an estimate of file
size for the frontal solver, look in the output file where it tells you the
R.M.S. wavefront size. The file size in Mbytes is pretty close to
RMS wavefront size * number of DOFs * 8 / (1024*1024).
The 8 is the number of bytes per double-precision word. So if your problem has
300,000 DOFs and the R.M.S. wavefront is 5000, that is 1.5 billion D.P. words,
around 11 Gbytes.
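The estimate above works out as follows (numbers taken straight from the example in the text):

```python
# Frontal-solver file-size estimate from the text:
#   size (Mbytes) ~= RMS wavefront * number of DOFs * 8 / (1024 * 1024)
# where 8 is the number of bytes per double-precision word.
def frontal_file_mbytes(rms_wavefront, ndof):
    return rms_wavefront * ndof * 8 / (1024 * 1024)

mb = frontal_file_mbytes(5000, 300_000)  # the 300k-DOF example above
print(round(mb / 1024, 1))               # ~11.2 Gbytes
```

So the 5000 x 300,000 = 1.5 billion double-precision words come to roughly 11 Gbytes on disk, matching the figure quoted above.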
The sparse solver (eqslv,spar) is also a direct factorization method, but a
newer technology. It will run out-of-core but does have a minimum memory
requirement on any given job. In 5.6 this solver option will grab all of the
available memory in the ANSYS heap at the time the solver is called. It prints
out some information about the memory available to the solver, and if you run
with the undocumented debug flag set (eqslv,spar,,-5, mentioned again below)
you will get some nice performance stats from the sparse solver.
The sparse solver can run quite efficiently out-of-core if you have a fast
disk setup. It does quite well on SGI Origin machines, including running
in parallel if you have set ANSYS up to run in parallel. Other UNIX
platforms with large memory and a good I/O configuration should also
do well. We will be adding more sparse solver hardware optimizations in
the future. The performance is
also quite good on NT systems because we have linked with a fast math library
that is used with that solver. However, most NT workstations are short
on memory and many have terrible I/O performance. You really need SCSI drives
on them and plenty of room. One way to get the maximum memory for the sparse
solver is to cut the db space back. So, if you ran with -m 1000 -db 256
and then reran with -m 1000 -db 56 the solver would have an additional
200 Mbytes available. Of course you will get a file.page that is 200 Mbytes
larger, potentially, but that is not much of a performance hit.
If you are interested in more info from me let me know. I would like to
know a bit more about what kind of system you are running on and if you
do a run with eqslv,spar,,-5 and send me the output file I can tell a
lot about your job performance. If you run the PCG solver the file.PCS
is the file that I would like to see for performance data.
Good luck! I hope we can continue to improve your solver options. You will
definitely see improvements in 5.7.
There is no performance issue with allowing supplemental
memory allocations for your job rather than increasing total scratch memory
at the start of your job. However, I would generally recommend asking for
sufficient memory up front whenever you have a good idea of the amount you need.
The reason is that when supplemental memory allocations occur, the amount
of memory actually allocated is determined as some fraction of the initial
space you started with. So, there can be cases where perhaps you only needed
1 Mbyte more of space but the supplemental memory allocation would get an
additional block that would be potentially much more than 1 Mbyte. This
is not necessarily a performance hit but you might end up thinking you
need a lot more memory to run your job than was actually required.
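A hypothetical sketch of that sizing rule (the actual growth fraction ANSYS uses is not stated above, so the 0.5 here is an assumption for illustration only):

```python
# Assumed growth fraction -- ANSYS's real fraction may differ.
GROWTH_FRACTION = 0.5

def supplemental_alloc_mb(initial_mb, extra_needed_mb):
    """Size of the supplemental block actually grabbed: some fraction
    of the INITIAL workspace, never less than the amount needed."""
    return max(extra_needed_mb, GROWTH_FRACTION * initial_mb)

# Only 1 Mbyte more was needed, but a far larger block gets allocated,
# making the job's apparent memory requirement look inflated:
print(supplemental_alloc_mb(500, 1))
```

With a 500-Mbyte initial workspace, needing just 1 Mbyte more triggers a 250-Mbyte supplemental block under this assumed rule, which is exactly why the peak memory you observe can overstate what the job truly required.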
One of the responses to your email mentioned a paper on ANSYS memory usage
at www.csi-ansys.com/tip_of_the_week.htm This is really a well written paper
with some very good data. Check it out - it will be very useful.
Hopefully, the word is getting out that in 5.7 the sparse solver will now
also use supplemental memory allocation, at least in many cases. This
change will work for eqslv,spar as well as the modal analysis runs
which use the block Lanczos solver; modopt,lanb. The biggest change
besides adding supplemental memory allocation is that we have added
some additional logic and functionality to the sparse solver interface
so that the solver can now function using significantly less memory.
It does this at the expense of more I/O but we have also improved I/O
performance and eliminated some unnecessary I/O to compensate. In
most cases the sparse solver is faster and uses less memory. If you
have a large memory system you will be able to run the sparse solver with
no I/O at all for smaller problems (under 70,000 dofs) and with minimal
memory for larger jobs, as long as you specify large initial memory
settings via the -m command line option. The choice between running in-core and
out-of-core is automatic and depends on the memory available. The biggest
change in the block Lanczos side is that we can now run fairly large
Lanczos jobs with much less memory. In one example, a 1.4 million DOF
modal run that required over 2000 Mbytes to run in 5.6 will now run
with Lanczos solver memory of just 450 Mbytes, and the total job runs with a
correspondingly smaller -m setting.
The performance hit for using less memory is system dependent but it was
minimal on many of the systems we have tested. Don't expect performance
miracles on small NT systems with IDE drives.