Linux – Memory Management insights
Linux memory management for an SAP system (application server) or an SAP HANA system is becoming more important: SAP's roadmap is clear (Linux is the only OS for HANA), and the number of Linux installations is rising steeply.
One of the worst things that can happen to such a system in terms of performance is swapping or paging. But are swapping and paging the same?
Many people mean paging when they talk about swapping. Swapping is the older method of moving data from memory to disk: to swap a process means to move the entire process out of main memory and into the swap space on disk.
With paging, when the kernel requires more main memory for an active process, only the least recently used pages of processes are moved to the swap space.
Most common Linux systems are mixed-mode systems that use both paging and swapping.
Many customers ask me, in the context of monitoring, whether the system's behavior is correct when the used memory is close to the physical memory size. As you may know, there is a big difference between the Unix memory concept and how an application handles its own memory. Plain OS memory monitoring is practically useless if you want to use it for monitoring HANA systems.
The best-known tools are top (installed by default), htop and nmon (available in most repositories).
With ps -ef or ps axu you get a static view of the current processes.
These tools report a lot of memory values, such as VSZ/VSS (virtual set size), RSS/RES (resident set size) and SHR/SHM (shared memory).
Have you ever added up these values? In most cases the sum exceeds the physical memory size. So how can you determine the real usage of a process, and perhaps of the complete system?
We will work from top to bottom to get some insights.
- Complete system memory
- Buffer
- Cache
- Shared memory
- Slab
- Individual process memory usage
- Collect support details
The test system is a 16GB SLES12 SP4 application server with 20 workprocesses.
Complete system memory
The most popular way to see the complete memory consumption is the command:
free -m
Example:
             total       used       free     shared    buffers     cached
Mem:         16318      15745        573       6548        174       8062
-/+ buffers/cache:       7508       8810
Swap:        12283       2422       9861
Pretty simple to explain:
Total = physical memory
Used = used memory (incl. buffers/caches)
Shared = shared memory (for details see the shared memory section)
Free = memory that is not allocated
Swap = swap space on disk (total, used and free)
- 15.7GB of 16GB are allocated, only 573MB are free
- 8.2GB of the allocated memory are buffers and caches (174MB + 8062MB); without them about 7.5GB are used (7508MB in the -/+ buffers/cache row)
The real free memory is 8810MB. It is the sum of free (573MB), buffers (174MB) and cached (8062MB) => 573+174+8062 = ~8810MB
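The arithmetic behind the -/+ buffers/cache row can be written down as a small Python sketch. The function names are my own, and the numbers are the ones from the free -m example; the results differ by 1MB from the free output only because free rounds each value separately.

```python
def real_free_mb(free, buffers, cached):
    """Memory that is free or could be reclaimed from buffers and page cache."""
    return free + buffers + cached


def app_used_mb(used, buffers, cached):
    """Memory used by applications once buffers and page cache are subtracted."""
    return used - buffers - cached


# Values from the "Mem:" row of the free -m example (all in MB)
used, free, buffers, cached = 15745, 573, 174, 8062

print(real_free_mb(free, buffers, cached))  # 8809
print(app_used_mb(used, buffers, cached))   # 7509
```

Keep in mind that shared memory is counted as part of "cached", so not all of this value can really be released while the processes using it are running.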
Cache
With (page) cache and buffers it is the same as with paging and swapping: most people mix up these terms.
The pagecache caches file data. When a file is read from disk or network, its contents are stored in the pagecache. No disk or network access is required as long as the contents in the pagecache are up to date.
Note: tmpfs and shared memory segments count toward the pagecache!
Buffer
The buffercache is a type of pagecache for block devices (for example, /dev/sda). A file system typically uses the buffercache when accessing its on-disk metadata structures such as inode tables, allocation bitmaps, and so forth. Buffercache can be reclaimed similarly to pagecache.
These two caches were separate memory areas in Linux kernels < 2.2. In kernel 2.4 and newer they are combined in the pagecache, because the buffer cache maps its blocks into pages. A more common, holistic term you may know is filesystem cache (FS cache).
Most of you may know that there is no need to panic if the memory used by the system is close to the physical memory size.
To find out how much memory can be released by clearing the caches and buffers, you can run this command as root:
sync; echo 3 > /proc/sys/vm/drop_caches
If some memory is not released, that is because of the shared memory. As already mentioned, shared memory segments count toward the pagecache. This memory is shared by running processes and cannot be released until all processes using it have ended.
There are many kernel parameters to control this behavior automatically. Please drop the caches manually only when the system is in trouble. Normally there is no need to do it, other than to make the OS monitoring look good.
For details you should check the meminfo:
cat /proc/meminfo
MemTotal: 16710208 kB
MemFree: 590720 kB
MemAvailable: 1662528 kB
Buffers: 178688 kB
Cached: 8069248 kB
SwapCached: 74560 kB
Active: 11600512 kB
Inactive: 1729024 kB
Active(anon): 10658816 kB
Inactive(anon): 1159424 kB
Active(file): 941696 kB
Inactive(file): 569600 kB
Unevictable: 150336 kB
Mlocked: 150336 kB
SwapTotal: 12578688 kB
SwapFree: 10097792 kB
Dirty: 1984 kB
Writeback: 0 kB
AnonPages: 5164224 kB
Mapped: 6361984 kB
Shmem: 6705536 kB
Slab: 380800 kB
SReclaimable: 112192 kB
SUnreclaim: 268608 kB
KernelStack: 11904 kB
PageTables: 42880 kB
[...]
At this point in time I will only quote the documentation (SLES 12 System Analysis and Tuning Guide) for the active and inactive memory parts:
Active, Active(anon), Active(file): Recently used memory that will not be reclaimed unless necessary or on explicit request. Active is the sum of Active(anon) and Active(file). Active(anon) tracks swap-backed memory. This includes private and shared anonymous mappings and private file pages after copy-on-write. Active(file) tracks other file system backed memory.
Inactive, Inactive(anon), Inactive(file): Inactive(anon) tracks swap-backed memory. This includes private and shared anonymous mappings and private file pages after copy-on-write. Inactive(file) tracks other file system backed memory.
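To make the relation between these fields tangible, here is a minimal Python sketch that parses meminfo-style text and checks that Active and Inactive are the sums of their (anon) and (file) parts. The helper name parse_meminfo is my own; the sample values are copied from the meminfo output above.

```python
def parse_meminfo(text):
    """Return a dict mapping each /proc/meminfo field name to its value in kB."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        # Skip anything that is not a "Name: <number> ..." line
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


# Excerpt of the meminfo output of the test system
SAMPLE = """\
Active:         11600512 kB
Inactive:        1729024 kB
Active(anon):   10658816 kB
Inactive(anon):  1159424 kB
Active(file):     941696 kB
Inactive(file):   569600 kB
"""

info = parse_meminfo(SAMPLE)
for state in ("Active", "Inactive"):
    parts = info[state + "(anon)"] + info[state + "(file)"]
    print(state, info[state], "kB, anon + file =", parts, "kB")
```

On a live system you could pass the content of /proc/meminfo instead of the sample text, for example parse_meminfo(open("/proc/meminfo").read()).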
Shared Memory
The shared memory concept is heavily used by the SAP workprocesses. Each workprocess has an exclusive memory footprint of about 200-300MB, even when it is not active. Most of the memory is shared between the processes. The other part is the heap/working data itself.
You can check this for the complete system with:
grep -i shmem /proc/meminfo
Shmem: 6705536 kB
=> For an application server the shared memory value is always high. Don't worry about it, it works as designed.
For each shm segment:
ipcs -a
------ Message Queues --------
key msqid owner perms used-bytes messages
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00004dc4 125173760 sapadm 760 40141728 1
0x00004dbe 125206529 root 777 702916 1
0x000027bd 125370370 sidadm 740 60000000 1
0x00000000 229379 root 600 2610 0
0x00000000 262148 sidadm 740 1024 1
0x0382be85 294917 sidadm 640 4096 2
0x00002796 327686 sidadm 740 131072000 1
0x00000000 360455 sidadm 740 1024 1
0x0382be84 393224 sidadm 640 4096 21
0x00002749 425993 sidadm 740 2048592 20
0x0000271a 655370 sidadm 740 124000000 21
[…]
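If you want to total the segment sizes per owner instead of reading the list by eye, something like the following Python sketch could be used. It assumes the column layout shown above (key, shmid, owner, perms, bytes, nattch); the function name and the sample, which contains only four of the rows from the listing, are my own choices.

```python
from collections import defaultdict


def shm_segments(ipcs_text):
    """Yield (owner, size in bytes, nattch) for each shared memory segment row."""
    in_shm = False
    for line in ipcs_text.splitlines():
        if line.startswith("------"):
            # Section header: only the shared memory section is of interest
            in_shm = "Shared Memory" in line
            continue
        fields = line.split()
        # Data rows start with a hexadecimal key such as 0x00004dc4
        if in_shm and len(fields) >= 6 and fields[0].startswith("0x"):
            yield fields[2], int(fields[4]), int(fields[5])


SAMPLE = """\
------ Message Queues --------
key msqid owner perms used-bytes messages
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00004dc4 125173760 sapadm 760 40141728 1
0x000027bd 125370370 sidadm 740 60000000 1
0x00002796 327686 sidadm 740 131072000 1
0x0000271a 655370 sidadm 740 124000000 21
"""

bytes_per_owner = defaultdict(int)
for owner, size, nattch in shm_segments(SAMPLE):
    bytes_per_owner[owner] += size

for owner, total in sorted(bytes_per_owner.items()):
    print(owner, round(total / 1024 / 1024), "MiB")
```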
To clean up leftover zombie processes and shared memory segments, SAP provides the binary cleanipc, which is delivered with every SAP AS kernel. Normally this cleanup is done as part of each clean shutdown.
But you can also trigger it yourself:
cleanipc <instance number> remove
Note: Do not use it while the system or any of its processes is still up and running. End all processes first and check with ps -fu <sidadm>
For each individual process:
cat /proc/<PID>/smaps
For SAP HANA systems the row store is also based on the shared memory concept. This is also the reason why those tables can't be paged out. It is one of the first data areas read into memory during startup, and it is held by the hdbrsutil processes even after the HANA DB has been stopped.
Slab
– Memory allocation for internal data structures of the kernel –
Normally this area should not consume more than 2GB. On smaller systems (< 128GB memory) the slab memory consumption is around 500MB.
You can check it with:
grep -i slab -A 4 /proc/meminfo
Slab: 380800 kB
SReclaimable: 112192 kB
SUnreclaim: 268608 kB
KernelStack: 11904 kB
PageTables: 42880 kB
cat /proc/slabinfo
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
SCTPv6 2 42 1536 42 1 : tunables 24 12 8 : slabdata 1 1 0
SCTP 0 0 1280 51 1 : tunables 24 12 8 : slabdata 0 0 0
nf_conntrack 211 762 256 254 1 : tunables 120 60 8 : slabdata 3 3 0
nfs_direct_cache 0 0 360 181 1 : tunables 54 27 8 : slabdata 0 0 0
nfs_inode_cache 7666 23373 1040 63 1 : tunables 24 12 8 : slabdata 371 371 0
rpc_inode_cache 0 0 640 102 1 : tunables 54 27 8 : slabdata 0 0 0
fscache_cookie_jar 4 799 80 799 1 : tunables 120 60 8 : slabdata 1 1 0
ext4_groupinfo_4k 360 448 144 448 1 : tunables 120 60 8 : slabdata 1 1 0
ext4_inode_cache 4136 5820 1080 60 1 : tunables 24 12 8 : slabdata 97 97 0
ext4_allocation_context 2 504 128 504 1 : tunables 120 60 8 : slabdata 1 1 0
[…]
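To see which caches hold the most memory, the slabinfo columns can be multiplied out. The Python sketch below estimates the size of each cache as num_objs * objsize, which is my own simplification: it ignores per-slab overhead and does not depend on the page size. The sample rows are taken from the output above.

```python
def slab_cache_sizes(slabinfo_text):
    """Return {cache name: bytes held by allocated objects (num_objs * objsize)}."""
    sizes = {}
    for line in slabinfo_text.splitlines():
        fields = line.split()
        # Columns: name, active_objs, num_objs, objsize, ...
        if len(fields) < 4 or not (fields[2].isdigit() and fields[3].isdigit()):
            continue  # version line, header line or truncated output
        sizes[fields[0]] = int(fields[2]) * int(fields[3])
    return sizes


SAMPLE = """\
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
nf_conntrack 211 762 256 254 1 : tunables 120 60 8 : slabdata 3 3 0
nfs_inode_cache 7666 23373 1040 63 1 : tunables 24 12 8 : slabdata 371 371 0
ext4_inode_cache 4136 5820 1080 60 1 : tunables 24 12 8 : slabdata 97 97 0
"""

sizes = slab_cache_sizes(SAMPLE)
# Largest caches first
for name, size in sorted(sizes.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:20} {size / 1024:10.0f} kB")
```

Reading /proc/slabinfo typically requires root permissions.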
Realtime monitoring command for kernel memory:
slabtop
- On the test system slabtop showed about 450MB active and nearly 700MB allocated
Individual process memory usage
I have to disappoint you: there is no easy way to determine the exact usage of a process without special tools or scripts. The resident memory (RSS) is a good indicator of the real usage, but it does not include swapped-out memory, and it counts shared memory in full for every process that uses it. The closest value is PSS (Proportional Set Size). Never heard of this term? It is a comparatively new measurement concept.
VSZ / VSS :
The total amount of virtual memory used by the task. It includes all code, data and shared libraries plus pages that have been swapped out.
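VIRT / VSZ / VSS = everything that is mapped: resident memory (RES) + swapped-out pages (SWAP) + mapped pages that are not loaded yet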
RSS / RES:
The non-swapped physical memory a task has used (incl. shared memory)
RES = CODE + DATA + SHM.
PSS:
Same as RSS, but shared memory is counted only proportionally: each shared page is divided by the number of processes that use it.
PSS = CODE + DATA + SHM / <processes using SHM>
Example:
25MB binary => 50% loaded
200MB shared libraries (=shared memory) => 80% loaded
50MB heap => 75% used / loaded
10 processes using the shared libs
VSZ: 25MB + 200MB + 50MB = 275MB
RSS: 25MB*0.5 + 200MB*0.8 + 50MB*0.75 = 210MB
PSS: 25MB*0.5 + 200MB*0.8/10 + 50MB*0.75 = 66MB
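The same example as a short Python sketch, in case you want to play with the numbers. The function and the tuple layout (size in MB, loaded fraction, shared yes/no) are only my way of modelling the example, not an official definition.

```python
def memory_metrics(regions, sharers):
    """Compute (VSZ, RSS, PSS) in MB.

    regions: list of (size_mb, loaded_fraction, is_shared) tuples
    sharers: number of processes that use the shared regions
    """
    vsz = sum(size for size, _, _ in regions)
    rss = sum(size * loaded for size, loaded, _ in regions)
    pss = sum(
        size * loaded / (sharers if shared else 1)
        for size, loaded, shared in regions
    )
    return vsz, rss, pss


regions = [
    (25, 0.5, False),   # binary, 50% loaded
    (200, 0.8, True),   # shared libraries, 80% loaded
    (50, 0.75, False),  # heap, 75% used / loaded
]

vsz, rss, pss = memory_metrics(regions, sharers=10)
print(f"VSZ={vsz:.0f}MB RSS={rss:.0f}MB PSS={pss:.0f}MB")
```

With only one process using the shared libraries, PSS and RSS are identical; the more processes share them, the further PSS drops below RSS.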
But how can you determine such values? You can do it manually with /proc/<PID>/smaps, but depending on the process and its allocation areas this can be a lengthy analysis.
Quick and dirty analysis (replace <PID> with the process ID):
grep -e '-' -e '^Size' -e '^Rss' -e '^Pss' /proc/<PID>/smaps
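Adding up these fields by hand gets tedious quickly. As a minimal sketch, the following Python function sums Size, Rss and Pss over all mappings of an smaps file. The sample text is made up for illustration (two mappings with invented values), it is not taken from the test system.

```python
import re

# Matches only the plain "Size:", "Rss:" and "Pss:" lines of each mapping,
# not e.g. "KernelPageSize:" or "SwapPss:"
FIELD = re.compile(r"^(Size|Rss|Pss):\s+(\d+) kB", re.MULTILINE)


def smaps_totals(smaps_text):
    """Return the summed Size, Rss and Pss (in kB) over all mappings."""
    totals = {"Size": 0, "Rss": 0, "Pss": 0}
    for name, value in FIELD.findall(smaps_text):
        totals[name] += int(value)
    return totals


# Invented sample with two mappings, for illustration only
SAMPLE = """\
00400000-004c0000 r-xp 00000000 08:02 131 /usr/sap/SID/D00/exe/disp+work
Size:                768 kB
KernelPageSize:       64 kB
Rss:                 512 kB
Pss:                  64 kB
7fff0000-7fff2000 rw-p 00000000 00:00 0 [stack]
Size:                128 kB
Rss:                  64 kB
Pss:                  64 kB
"""

print(smaps_totals(SAMPLE))  # {'Size': 896, 'Rss': 576, 'Pss': 128}
```

For a real process you would pass the content of /proc/<PID>/smaps, which usually requires the same user or root.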
For detailed analyses I recommend these scripts:
- linux_smap_analyzer.py by LanderlYoung (python)
- diag by James Hunt (perl)
Personally, I use the first one because I'm more familiar with Python than with Perl. With these scripts you can analyze one process (= PID) at a time.
So I've written a small shell script to analyze more than just one PID. Normally you want to analyze the complete SAP system, which includes all workprocesses.
Code for free usage:
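#!/bin/bash
# Usage: ./pid_analyses.sh <search_term> <output_file>
# Bash is required: the script uses substring expansion and echo -ne.
p_name=$1
out=$2
# Edit this path so that it points to the downloaded Python script.
ANALYZER="<path-to-python-script-edit-here>/smaps_analyzer.py"
PROC_file=/tmp/pid_name.out
PID_file=/tmp/pid.out

# Collect all processes matching the search term, without grep and this script itself.
ps -eo pid,comm,cmd | grep -- "$p_name" | grep -v grep | awk -v self="$$" '$1 != self' > "$PROC_file"
awk '{print $1}' "$PROC_file" > "$PID_file"

i=0
BAR='####################'   # 20 characters = 100 %
end=$(wc -l < "$PID_file")
while read -r pid
do
    # Look up the ps line of exactly this PID (no substring matches).
    cmd=$(awk -v pid="$pid" '$1 == pid' "$PROC_file")
    {
        echo "##############"
        echo "SMAPS: $pid"
        echo "Process: $cmd"
        echo "##############"
        python "$ANALYZER" "/proc/$pid/smaps" Pss
    } >> "$out"
    i=$((i + 1))
    # Progress in percent and as a bar scaled to the 20 characters of BAR.
    status=$((i * 100 / end))
    end_status=$((i * 20 / end))
    echo -ne "\r${BAR:0:$end_status}"
    echo " $status %"
    sleep 0.1
done < "$PID_file"
rm -f "$PID_file" "$PROC_file"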
Just edit the path to the Python script that you downloaded before.
In short:
- download python script linux_smap_analyzer.py
- create bash script with execute rights including the content above
- edit path in bash script
- run it with search term and output path
Usage:
./scriptname.sh <search_term> <output>
./pid_analyses.sh dw.sap /tmp/pid_analyses.out
“dw.sap” is a short search term for the disp+work processes, but you can also use a PID or any other term that matches your application.
Here is an example output for workprocess 6 (W6) of a system called “SID” with instance number “00”:
##############
SMAPS: 560
Process: 560 SID_00_DIA_W6 dw.sapSID_D00 pf=/usr/sap/SID/SYS/profile/SID_D00_aldSIDd0
##############
all data
Pss Rss Size name other
268548 kB 2950720 kB 6434816 kB /dev/zero (deleted)
247235 kB 247296 kB 309248 kB [heap]
95080 kB 1100736 kB 1290240 kB /SYSV00002716 (deleted)
38419 kB 323072 kB 1752000 kB /SYSV00002738 (deleted)
17482 kB 17664 kB 83776 kB unknown
11517 kB 217920 kB 255680 kB /SYSV00002712 (deleted)
8910 kB 91200 kB 149440 kB /SYSV00002718 (deleted)
8300 kB 84928 kB 84928 kB /SYSV00002763 (deleted)
6476 kB 66176 kB 92288 kB /SYSV00002746 (deleted)
4761 kB 56832 kB 88832 kB /usr/sap/SID/D00/exe/disp+work
3472 kB 52480 kB 52544 kB /SYSV00002739 (deleted)
2517 kB 44096 kB 44096 kB /SYSV00002724 (deleted)
1478 kB 27200 kB 58816 kB /SYSV00002759 (deleted)
960 kB 960 kB 1152 kB [stack]
764 kB 16064 kB 34304 kB /SYSV00002725 (deleted)
669 kB 6464 kB 12864 kB /usr/sap/SID/hdbclient/libSQLDBCHDB.so
617 kB 6976 kB 121152 kB /SYSV0000271a (deleted)
526 kB 8192 kB 13120 kB /SYSV00002743 (deleted)
428 kB 4992 kB 850048 kB /SYSV00002713 (deleted)
379 kB 3648 kB 4928 kB /usr/sap/SID/D00/exe/libsapcrypto.so
269 kB 1536 kB 2688 kB /usr/sap/SID/D00/exe/dbhdbslib.so
191 kB 2304 kB 176704 kB /SYSV0000274e (deleted)
164 kB 1216 kB 1728 kB /usr/sap/SID/D00/exe/dw_xml.so
135 kB 256 kB 21056 kB /usr/sap/SID/D00/exe/dw_gui.so
114 kB 896 kB 1920 kB /usr/sap/SID/D00/exe/libicuuc.so.50
101 kB 640 kB 5312 kB /usr/sap/SID/D00/exe/dw_abp.so
94 kB 512 kB 2496 kB /usr/sap/SID/D00/exe/libicui18n.so.50
93 kB 1152 kB 2240 kB /usr/lib64/libstdc++.so.6.0.25
90 kB 384 kB 832 kB /usr/sap/SID/D00/exe/dw_rndrt.so
88 kB 448 kB 2944 kB /usr/sap/SID/D00/exe/dw_xtc.so
86 kB 1600 kB 2048 kB /SYSV00002749 (deleted)
85 kB 1728 kB 1856 kB /lib64/libc-2.22.so
83 kB 384 kB 1152 kB /usr/sap/SID/D00/exe/libregex.so
75 kB 1088 kB 1088 kB /SYSV00002714 (deleted)
73 kB 1024 kB 4160 kB /SYSV00002751 (deleted)
71 kB 832 kB 4160 kB /SYSV00002750 (deleted)
70 kB 256 kB 256 kB /lib64/libgcc_s.so.1
68 kB 256 kB 256 kB /lib64/libpthread-2.22.so
68 kB 192 kB 192 kB /lib64/libnss_sss.so.2
68 kB 320 kB 320 kB /lib64/ld-2.22.so
67 kB 192 kB 192 kB /lib64/libnss_files-2.22.so
42 kB 576 kB 1216 kB /usr/sap/SID/D00/exe/dw_stl.so
40 kB 640 kB 6144 kB /SYSV00002722 (deleted)
30 kB 384 kB 20480 kB /usr/sap/SID/D00/exe/libicudata.so.50
27 kB 576 kB 896 kB /lib64/libm-2.22.so
9 kB 128 kB 192 kB /SYSV00002744 (deleted)
8 kB 256 kB 256 kB /lib64/libresolv-2.22.so
7 kB 128 kB 15680 kB /usr/sap/SID/D00/exe/librender.so
7 kB 128 kB 1920 kB /usr/sap/SID/D00/exe/libicuuc51.so
7 kB 128 kB 192 kB /usr/sap/SID/D00/exe/libiculx51.so
7 kB 128 kB 448 kB /usr/sap/SID/D00/exe/libicule51.so
7 kB 192 kB 192 kB /lib64/libnss_dns-2.22.so
6 kB 192 kB 192 kB /lib64/libdl-2.22.so
6 kB 64 kB 576 kB /SYSV00002748 (deleted)
5 kB 128 kB 192 kB /usr/lib64/libuuid.so.1.3.0
4 kB 128 kB 192 kB /lib64/librt-2.22.so
4 kB 64 kB 64 kB /SYSV00002761 (deleted)
4 kB 64 kB 64 kB /SYSV00002717 (deleted)
3 kB 64 kB 64 kB /usr/sap/SID/SIDadm/.hdb/sap-ald-138-s/SQLDBC.shm
3 kB 64 kB 21952 kB /usr/sap/SID/D00/exe/libicudata51.so
3 kB 64 kB 64 kB /SYSV0382be84 (deleted)
3 kB 64 kB 64 kB /SYSV0000272e (deleted)
3 kB 64 kB 64 kB /SYSV00002711 (deleted)
1 kB 128 kB 128 kB [vdso]
stack maps
Pss Rss Size name other
960 kB 960 kB 1152 kB [stack]
all so maps
Pss Rss Size name other
669 kB 6464 kB 12864 kB /usr/sap/SID/hdbclient/libSQLDBCHDB.so
379 kB 3648 kB 4928 kB /usr/sap/SID/D00/exe/libsapcrypto.so
269 kB 1536 kB 2688 kB /usr/sap/SID/D00/exe/dbhdbslib.so
164 kB 1216 kB 1728 kB /usr/sap/SID/D00/exe/dw_xml.so
135 kB 256 kB 21056 kB /usr/sap/SID/D00/exe/dw_gui.so
101 kB 640 kB 5312 kB /usr/sap/SID/D00/exe/dw_abp.so
90 kB 384 kB 832 kB /usr/sap/SID/D00/exe/dw_rndrt.so
88 kB 448 kB 2944 kB /usr/sap/SID/D00/exe/dw_xtc.so
85 kB 1728 kB 1856 kB /lib64/libc-2.22.so
83 kB 384 kB 1152 kB /usr/sap/SID/D00/exe/libregex.so
68 kB 256 kB 256 kB /lib64/libpthread-2.22.so
68 kB 320 kB 320 kB /lib64/ld-2.22.so
67 kB 192 kB 192 kB /lib64/libnss_files-2.22.so
42 kB 576 kB 1216 kB /usr/sap/SID/D00/exe/dw_stl.so
27 kB 576 kB 896 kB /lib64/libm-2.22.so
8 kB 256 kB 256 kB /lib64/libresolv-2.22.so
7 kB 128 kB 15680 kB /usr/sap/SID/D00/exe/librender.so
7 kB 128 kB 1920 kB /usr/sap/SID/D00/exe/libicuuc51.so
7 kB 128 kB 192 kB /usr/sap/SID/D00/exe/libiculx51.so
7 kB 128 kB 448 kB /usr/sap/SID/D00/exe/libicule51.so
7 kB 192 kB 192 kB /lib64/libnss_dns-2.22.so
6 kB 192 kB 192 kB /lib64/libdl-2.22.so
4 kB 128 kB 192 kB /lib64/librt-2.22.so
3 kB 64 kB 21952 kB /usr/sap/SID/D00/exe/libicudata51.so
all dex maps
Pss Rss Size name other
app so maps
Pss Rss Size name other
app lib so maps
Pss Rss Size name other
app dex maps
Pss Rss Size name other
avlive txav maps
Pss Rss Size name other
tbs maps
Pss Rss Size name other
map Pss total = 720927 Kb
map Vss total = 12039104 Kb
stacks Pss = 960 kB
stacks Vss = 1152 kB
all so map Pss = 2391 kB
all so map Vss = 99264 kB
all dex map Pss = 0 kB
all dex map Vss = 0 kB
app so map Rss
app so map Rss = 0 kB
app so map Vss = 0 kB
app dex map Rss
app dex map Rss = 0 kB
app map Vss = 0 kB
app_tbs
tbs mem map Pss = 0 kB
avlive txav
tbs mem map Pss = 0 kB
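To summarize the details of this process:
map Pss total = 720927 Kb => about 720MB
map Vss total = 12039104 Kb => about 12000MB
=> which means about 11.3GB of the virtual size are not attributed to this process: mostly shared memory (SHM) that is split proportionally across the processes using it, plus swapped-out or not yet loaded pages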
To analyze a bunch of processes in one output file:
grep "map Pss total" /tmp/pid_analyses.out | sort -gk 5
map Pss total = 1350 Kb
map Pss total = 20986 Kb
map Pss total = 77454 Kb
map Pss total = 140503 Kb
map Pss total = 220375 Kb
map Pss total = 247087 Kb
map Pss total = 267906 Kb
map Pss total = 307026 Kb
map Pss total = 320250 Kb
map Pss total = 369559 Kb
map Pss total = 720927 Kb
map Pss total = 737010 Kb
map Pss total = 749670 Kb
map Pss total = 752209 Kb
map Pss total = 763658 Kb
map Pss total = 783921 Kb
map Pss total = 794335 Kb
map Pss total = 794902 Kb
map Pss total = 895677 Kb
map Pss total = 912923 Kb
This means the system with 20 workprocesses currently needs about 9.9GB of memory (the Pss totals above add up to 9877728 kB).
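Instead of adding up the sorted lines by hand, the total can be computed directly from the output file. This is a small Python sketch under the assumption that the file contains the "map Pss total = N Kb" lines exactly as shown above; the function name is my own, and the list holds the 20 values from the listing.

```python
import re


def sum_pss_totals(report_text):
    """Return (number of processes, summed Pss in kB) from the analyzer output."""
    values = [
        int(v)
        for v in re.findall(r"^map Pss total = (\d+) Kb", report_text, re.MULTILINE)
    ]
    return len(values), sum(values)


# The 20 "map Pss total" values from the listing above (in kB)
PSS_KB = [
    1350, 20986, 77454, 140503, 220375, 247087, 267906, 307026, 320250, 369559,
    720927, 737010, 749670, 752209, 763658, 783921, 794335, 794902, 895677, 912923,
]
report = "\n".join(f"map Pss total = {v} Kb" for v in PSS_KB)

count, total_kb = sum_pss_totals(report)
print(count, "processes,", total_kb, "kB in total")  # 20 processes, 9877728 kB in total
```

For the real file you would pass the content of /tmp/pid_analyses.out instead of the generated report text.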
Collect support details
To collect all relevant information for SAP / Linux support, please run the script sapsysinfo.sh, which is attached to note 618104.
Additionally, you can collect details with supportconfig on SLES and with sosreport on RHEL.
Note: Please run these collection tools immediately, while the problem is occurring. Afterwards it is too late to reconstruct the scenario and to find the real root cause of your memory consumption!
Summary
I hope you gained some insights into Linux memory management: the differences between swapping and paging, between buffers and caches, and between VSZ (VSS), RSS and PSS. This will help you to size your system properly and to understand possible bottlenecks. There are tons of further details, but for most of the Linux geeks out there this should be a good starting point for understanding memory management a bit better.