Safeguarding Presto C++ Memory Usage with LinuxMemoryChecker
Problem
Running the Presto C++ worker stably in a production environment relies on a configuration that maximizes stability without sacrificing performance. Presto C++ provides the LinuxMemoryChecker to help achieve this goal.
Velox, the evaluation engine used in Presto C++, implements a MemoryManager that provides several advanced features such as fair memory sharing, a transparent file cache (AsyncDataCache), and server out-of-memory (OOM) prevention. The OOM prevention works by enforcing memory limits using mmap, which allows control of the physical memory allocation. The MemoryManager handles all memory allocations and deallocations for the file cache and query memory.
While Velox provides OOM prevention mechanisms, it does not control all memory allocations. As a result, an OOM condition can still occur.
The MemoryManager is initialized by Presto C++ with a specific capacity by using the system-memory-gb configuration parameter.
While system-memory-gb limits the maximum memory that can be used for most operations, not all memory allocations and deallocations are accounted for by the MemoryManager. Unaccounted memory is known to be used by the Parquet file metadata, the HTTP client/server, and possibly other areas. system-memory-gb is usually set close to the total system memory of the environment (host or container), which means that any excessive use of unaccounted memory competes with the memory reserved for the MemoryManager. If the memory limit of the host or container is exceeded, the Presto C++ process or the container is stopped due to hitting the OOM condition.
An investigation also found that the HashProbe operator was using unaccounted memory. This has since been addressed and the memory is now accounted for in the query memory pool.
When the file cache is enabled using the async-data-cache-enabled configuration option, most memory from the MemoryManager is used for caching data, so most of the MemoryManager's capacity is typically in use. The situation is exacerbated when unaccounted memory grows beyond the free system memory that is not under control of the MemoryManager, because it then overlaps with memory technically reserved for the MemoryManager. If the memory manager reaches its capacity, either its attempts to allocate pages from the system fail, or the sum of allocated memory and unaccounted memory exceeds the total system memory.
The OOM condition can be triggered by allocations of the memory manager or by allocations that are unaccounted for.
Figure 1 illustrates the initial setup where the capacity of the memory manager does not fully encompass the total available system memory.

Figure 2 illustrates unaccounted memory using up memory reserved for use by the memory manager.

Figure 3 illustrates an example where an allocation by the memory manager results in an OOM condition.

Solution: The LinuxMemoryChecker
The LinuxMemoryChecker is a specialization of the generic pushback mechanism provided by the MemoryManager to handle spikes in non-Velox memory consumption. The pushback mechanism shrinks the file cache and returns the freed memory back to the OS.
As the name suggests, the LinuxMemoryChecker monitors memory consumption and helps prevent out-of-memory kills of the Presto C++ server process (presto_server), or of a container running it, on a Linux-based operating system. When needed, it triggers the pushback mechanism to shrink the in-memory file cache (AsyncDataCache), which helps keep the Presto C++ server process from surpassing the total system memory.
The LinuxMemoryChecker is not implemented for macOS because it makes use of Linux kernel specific features.
The LinuxMemoryChecker checks the current memory consumption of the Presto C++ server process at regular intervals and compares it against a configurable memory limit. If the limit is reached, it invokes the memory pushback mechanism. This prevents the process from reaching the maximum allowed memory of a host or containerized environment and hitting the OOM condition.
In Linux systems, there is a kernel feature called Control Group (cgroup) that allows users to manage how resources such as CPU time, system memory, network bandwidth, or combinations of these resources are allocated to processes. There are two versions of cgroup, v1 and v2, that need to be taken into account depending on the version of the host operating system. Cgroup v2 is an improvement to cgroup v1 and offers a more unified and consistent interface and better resource control features.
For more information on cgroups visit:
For cgroup v1:
https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt
For cgroup v2:
https://www.kernel.org/doc/Documentation/cgroup-v2.txt
To find out what cgroup version is used in the environment, run mount | grep cgroup.
As an example, this is cgroup v1 output:
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0
For a cgroup v2 example, this is the output:
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)
The LinuxMemoryChecker looks at the inactive_anon + active_anon metrics in the memory.stat file of the cgroup to track current system memory usage.
Linux cgroups expose many memory-related metrics. One challenge in choosing the proper metrics is that some of them are approximate ("fuzz") values: they are not updated in real time and may not return an accurate measurement when read. Inaccurate memory usage information could lead to a wrong decision about whether to shrink the cache to avoid the OOM condition. While investigating which metrics most accurately reflect the memory consumption of the process, the sum of inactive_anon and active_anon in the memory.stat file of the cgroup was found to be the most accurate measurement of current memory consumption. These metrics were therefore chosen to track current system memory usage accurately in both host and containerized environments.
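To make the metric concrete, the following is a minimal Python sketch of reading inactive_anon + active_anon from the contents of a memory.stat file. It is not the LinuxMemoryChecker implementation; the function name anon_memory_bytes and the sample values are assumptions made for illustration. It relies only on the "name value" line layout that memory.stat uses.

```python
def anon_memory_bytes(memory_stat_text: str) -> int:
    """Sum inactive_anon and active_anon from the text of a memory.stat file."""
    wanted = {"inactive_anon", "active_anon"}
    found = set()
    total = 0
    for line in memory_stat_text.splitlines():
        parts = line.split()
        # Each entry has the form "<name> <value>"; names must match exactly,
        # so entries such as total_inactive_anon are not counted.
        if len(parts) == 2 and parts[0] in wanted:
            total += int(parts[1])
            found.add(parts[0])
    missing = wanted - found
    if missing:
        raise ValueError(f"missing metrics: {sorted(missing)}")
    return total


# Illustrative sample only; the numbers are made up.
sample = """\
cache 1048576
inactive_anon 4096
active_anon 8192
total_inactive_anon 4096
"""
print(anon_memory_bytes(sample))
```

In practice the text would come from the memory stat file reported at startup, for example /sys/fs/cgroup/memory/memory.stat on cgroup v1 or /sys/fs/cgroup/memory.stat on cgroup v2.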
Setting a memory limit
Once per second (not configurable), the LinuxMemoryChecker evaluates the current memory consumption and compares the value against a user-configured parameter called system-mem-limit-gb. This parameter must be set below the total system memory and should be at least as large as system-memory-gb. Expressed as a simple rule:
system-memory-gb <= system-mem-limit-gb < available machine memory of deployment
If the current memory value exceeds the system-mem-limit-gb then the LinuxMemoryChecker invokes the emergency shrink mechanism of the AsyncDataCache to prevent the OOM condition.
Unlike the regular shrink, the emergency shrink does not evict data based on the normal eviction threshold, which is time and use based. The check and shrink operations run asynchronously while queries are being processed. As a result, any cached data can be evicted, not necessarily the oldest or least-used data. The shrink size also affects how many shrink operations are needed within a given time frame: the smaller the size, the more likely another shrink is required.
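The check described above can be sketched in a few lines of Python. This is only an illustration of the compare-then-shrink logic, not the actual C++ code: check_once and FakeCache are hypothetical names, and FakeCache merely stands in for the file cache. One gigabyte is taken as 2^30 bytes, which matches the startup logs (2.00GB reported as 2147483648 bytes).

```python
from typing import Callable

GIB = 1 << 30  # the startup logs report 2.00GB as 2147483648 bytes


def check_once(current_bytes: int, limit_gb: int, shrink_gb: int,
               shrink_cache: Callable[[int], int]) -> int:
    """Run one check and return the bytes freed (0 if usage is within the limit)."""
    if current_bytes <= limit_gb * GIB:
        return 0
    # Over the limit: ask the cache to release up to the shrink size.
    return shrink_cache(shrink_gb * GIB)


class FakeCache:
    """Hypothetical stand-in for the file cache, for illustration only."""

    def __init__(self, cached_bytes: int) -> None:
        self.cached_bytes = cached_bytes

    def shrink(self, target_bytes: int) -> int:
        freed = min(target_bytes, self.cached_bytes)
        self.cached_bytes -= freed
        return freed


cache = FakeCache(cached_bytes=40 * GIB)
freed = check_once(61 * GIB, limit_gb=60, shrink_gb=8, shrink_cache=cache.shrink)
print(freed // GIB)
```

A real checker would call such a function once per check interval with the measured memory usage; the sketch leaves out scheduling and logging.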
It is recommended to set system-mem-limit-gb to be about 95% of total system memory. The main consideration for selecting a value is how much memory could be allocated within one check interval (1s) which might depend on workload characteristics.
Figure 4 shows the memory consumption by the MemoryManager and the unaccounted memory. The memory limit is checked against the current memory usage.

Figure 5 shows the point at which the shrink is triggered because memory usage exceeds the memory limit.

Figure 6 shows the memory situation after the shrink has been completed.

Setting a shrink size
The configuration option system-mem-shrink-gb defines how much memory is to be made available during the shrink.
Take care that this value is neither too small nor too large, because either extreme can hurt query performance.
Overall, there is a balance for a given workload between the shrink size, the allotted overhead (the difference between total system memory and the system-memory-gb configuration), and the memory limit.
The idea is to maximize the usage of the AsyncDataCache (by setting system-memory-gb slightly below total system memory) but also leave enough room for unaccounted memory to minimize the number of shrinks necessary.
It is recommended to use 10% of total system memory for the unaccounted memory and set the shrink size to about 10% of the memory limit.
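As a worked example of these rules of thumb, the sketch below derives starting values from the total system memory: 10% left for unaccounted memory, a memory limit at about 95% of the total, and a shrink size of about 10% of the limit. The function name suggest_settings is hypothetical, and rounding down to whole gigabytes is an assumption made here; the results are starting points to tune per workload, not prescribed values.

```python
def suggest_settings(total_gb: int) -> dict:
    """Derive starting values from the total system memory in GB (rounded down)."""
    limit_gb = total_gb * 95 // 100  # about 95% of total system memory
    return {
        # Leave about 10% of total memory for unaccounted allocations.
        "system-memory-gb": total_gb * 90 // 100,
        "system-mem-limit-gb": limit_gb,
        # About 10% of the memory limit, but never less than 1 GB.
        "system-mem-shrink-gb": max(1, limit_gb // 10),
    }


for name, value in suggest_settings(64).items():
    print(f"{name}={value}")
```

For a 64 GB environment this yields system-memory-gb=57, system-mem-limit-gb=60, and system-mem-shrink-gb=6.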
How to Use
To enable the LinuxMemoryChecker for the build, set the CMake flag PRESTO_MEMORY_CHECKER_TYPE=LINUX_MEMORY_CHECKER
The LinuxMemoryChecker is controlled by the following config.properties options:
- system-mem-pushback-enabled
- system-memory-gb
- system-mem-limit-gb
- system-mem-shrink-gb
The system-mem-pushback-enabled configuration property must be set to true.
To work correctly, the system-memory-gb and system-mem-limit-gb configuration properties must be set like so:
system-memory-gb <= system-mem-limit-gb < available machine memory of deployment.
The configuration properties are documented here: https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/presto_cpp/properties.rst#memory-checker-properties
The default values for system-mem-pushback-enabled, system-memory-gb, system-mem-limit-gb and system-mem-shrink-gb configs are:
system-mem-pushback-enabled=false
system-memory-gb=57
system-mem-limit-gb=60
system-mem-shrink-gb=8
The system-mem-pushback-enabled config should be set to true to make use of the system-mem-limit-gb and system-mem-shrink-gb configs.
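Before deploying, the ordering rule for these properties can be sanity-checked with a small script. The following Python sketch is an assumption-laden illustration: check_memory_config is a hypothetical helper, not part of Presto, and it applies the rule exactly as written in this section (a strict < against the available memory), whereas the startup check visible in the log examples compares with <=.

```python
from typing import List


def check_memory_config(pushback_enabled: bool, system_memory_gb: float,
                        system_mem_limit_gb: float,
                        available_gb: float) -> List[str]:
    """Return a list of rule violations; an empty list means the rule holds."""
    problems = []
    if not pushback_enabled:
        problems.append("system-mem-pushback-enabled must be true")
    if system_memory_gb > system_mem_limit_gb:
        problems.append("system-memory-gb must be <= system-mem-limit-gb")
    if system_mem_limit_gb >= available_gb:
        problems.append("system-mem-limit-gb must be < available machine memory")
    return problems


# The default values on a hypothetical 64 GB machine, with pushback enabled.
print(check_memory_config(True, 57, 60, 64))
# A 60 GB limit on a machine with 8331362304 bytes of available memory.
print(check_memory_config(True, 2, 60, 8331362304 / 2**30))
```

The second call mirrors the failing startup shown in the cgroup v1 log example, where the configured limit is far above the available machine memory.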
Log Output Examples
When the LinuxMemoryChecker is enabled, the Presto C++ server process logs information about its configuration and the detected total system memory during startup. In addition, each occurrence of pushback is logged as an INFO message. The number of shrinks is also tracked by the velox.cache_shrink_count metric, which is exposed through the stats reporter and can be exported via Prometheus. Shrinking is also logged as an INFO message.
cgroup v1 log example
Successful start:
I0211 09:50:27.443691 490102 LinuxMemoryChecker.cpp:35] [PRESTO_STARTUP] Using cgroup v1.
I0211 09:50:27.443755 490102 LinuxMemoryChecker.cpp:55] [PRESTO_STARTUP] Using memory stat file: /sys/fs/cgroup/memory/memory.stat
I0211 09:50:27.443924 490102 LinuxMemoryChecker.cpp:58] [PRESTO_STARTUP] Using memory max file /sys/fs/cgroup/memory/memory.limit_in_bytes
I0211 09:50:27.444255 490102 LinuxMemoryChecker.cpp:90] [PRESTO_STARTUP] System memory in bytes: 2147483648
I0211 09:50:27.444301 490102 LinuxMemoryChecker.cpp:93] [PRESTO_STARTUP] System memory limit in bytes: 2147483648
I0211 09:50:27.444613 490102 LinuxMemoryChecker.cpp:97] [PRESTO_STARTUP] Available machine memory of deployment in bytes: 8331362304
I0211 09:50:27.444728 490102 PeriodicMemoryChecker.cpp:48] [PRESTO_STARTUP] Creating server memory pushback checker, memory check interval 1000ms, system memory limit: 2.00GB, memory shrink size: 1.00GB
I0211 09:50:27.444770 490102 PeriodicMemoryChecker.cpp:57] [PRESTO_STARTUP] Malloc memory heap dumper is not enabled
Error – system-mem-limit-gb was higher than available machine memory of deployment:
I0211 09:34:08.711416 489817 LinuxMemoryChecker.cpp:35] [PRESTO_STARTUP] Using cgroup v1.
I0211 09:34:08.711459 489817 LinuxMemoryChecker.cpp:55] [PRESTO_STARTUP] Using memory stat file: /sys/fs/cgroup/memory/memory.stat
I0211 09:34:08.711473 489817 LinuxMemoryChecker.cpp:58] [PRESTO_STARTUP] Using memory max file /sys/fs/cgroup/memory/memory.limit_in_bytes
I0211 09:34:08.711670 489817 LinuxMemoryChecker.cpp:90] [PRESTO_STARTUP] System memory in bytes: 2147483648
I0211 09:34:08.711715 489817 LinuxMemoryChecker.cpp:93] [PRESTO_STARTUP] System memory limit in bytes: 64424509440
I0211 09:34:08.711890 489817 LinuxMemoryChecker.cpp:97] [PRESTO_STARTUP] Available machine memory of deployment in bytes: 8331362304
E0211 09:34:08.711936 489817 Exceptions.h:66] Line: /root/presto/presto-native-execution/presto_cpp/main/LinuxMemoryChecker.cpp:101, Function:start, Expression: config_.systemMemLimitBytes <= availableMemoryOfDeployment (64424509440 vs. 8331362304) system memory limit = 64424509440 bytes is higher than the available machine memory of deployment = 8331362304 bytes., Source: RUNTIME, ErrorCode: INVALID_STATE
terminate called after throwing an instance of 'facebook::velox::VeloxRuntimeError'
what(): Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: (64424509440 vs. 8331362304) system memory limit = 64424509440 bytes is higher than the available machine memory of deployment = 8331362304 bytes.
Retriable: False
Expression: config_.systemMemLimitBytes <= availableMemoryOfDeployment
Function: start
File: /root/presto/presto-native-execution/presto_cpp/main/LinuxMemoryChecker.cpp
Line: 101
Stack trace:
# 0 _ZN8facebook5velox7process10StackTraceC1Ei
# 1 _ZN8facebook5velox14VeloxException5State4makeIZNS1_C4EPKcmS5_St17basic_string_viewIcSt11char_traitsIcEES9_S9_S9_bNS1_4TypeES9_EUlRT_E_EESt10shared_ptrIKS2_ESA_SB_
# 2 _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 3 _ZN8facebook5velox17VeloxRuntimeErrorC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bS7_
# 4 _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRKNS1_18VeloxCheckFailArgsET0_
# 5 _ZN8facebook6presto18LinuxMemoryChecker5startEv
# 6 _ZN8facebook6presto12PrestoServer28addMemoryCheckerPeriodicTaskEv
# 7 _ZN8facebook6presto12PrestoServer3runEv
# 8 main
# 9 __libc_start_main
# 10 _start
*** Aborted at 1739295248 (Unix time, try 'date -d @1739295248') ***
*** Signal 6 (SIGABRT) (0x77959) received by PID 489817 (pthread TID 0x7fda627b8e80) (linux TID 489817) (maybe from PID 489817, UID 0) (code: -6), stack trace: ***
@ 000000000aca5db1 _ZN5folly10symbolizer12_GLOBAL__N_113signalHandlerEiP9siginfo_tPv
/root/presto/presto-native-execution/dependencies/deps-download/folly/folly/experimental/symbolizer/SignalHandler.cpp:453
@ 000000000001441f (unknown)
@ 000000000004300b gsignal
@ 0000000000022858 abort
@ 00000000000a4ee5 (unknown)
@ 00000000000b6f8b (unknown)
@ 00000000000b6ff6 _ZSt9terminatev
@ 00000000000b7257 __cxa_throw
@ 000000000ac4809a __cxa_throw
/root/presto/presto-native-execution/dependencies/deps-download/folly/folly/debugging/exception_tracer/ExceptionTracerLib.cpp:159
@ 000000000ab4659d _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRKNS1_18VeloxCheckFailArgsET0_
/root/presto/presto-native-execution/velox/velox/common/base/Exceptions.h:74
-> /root/presto/presto-native-execution/velox/velox/common/base/Exceptions.cpp
@ 000000000099ec70 _ZN8facebook6presto18LinuxMemoryChecker5startEv
/root/presto/presto-native-execution/presto_cpp/main/LinuxMemoryChecker.cpp:101
@ 0000000000ccb692 _ZN8facebook6presto12PrestoServer28addMemoryCheckerPeriodicTaskEv
/root/presto/presto-native-execution/presto_cpp/main/PrestoServer.cpp:1044
@ 0000000000cc6fd2 _ZN8facebook6presto12PrestoServer3runEv
/root/presto/presto-native-execution/presto_cpp/main/PrestoServer.cpp:552
@ 00000000009e137c main
/root/presto/presto-native-execution/presto_cpp/main/PrestoMain.cpp:30
@ 0000000000024082 __libc_start_main
@ 00000000007c909d _start
Fatal signal handler. ThreadDebugInfo object not found.
Aborted (core dumped)
cgroup v2 log example
Successful start:
I0211 09:40:50.499085 4661 LinuxMemoryChecker.cpp:46] [PRESTO_STARTUP] Using cgroup v2.
I0211 09:40:50.499154 4661 LinuxMemoryChecker.cpp:55] [PRESTO_STARTUP] Using memory stat file: /sys/fs/cgroup/memory.stat
I0211 09:40:50.499213 4661 LinuxMemoryChecker.cpp:58] [PRESTO_STARTUP] Using memory max file /proc/meminfo
I0211 09:40:50.499545 4661 LinuxMemoryChecker.cpp:89] [PRESTO_STARTUP] System memory in bytes: 2147483648
I0211 09:40:50.499578 4661 LinuxMemoryChecker.cpp:92] [PRESTO_STARTUP] System memory limit in bytes: 4294967296
I0211 09:40:50.499729 4661 LinuxMemoryChecker.cpp:96] [PRESTO_STARTUP] Available machine memory of deployment in bytes: 67421741056
I0211 09:40:50.499758 4661 PeriodicMemoryChecker.cpp:48] [PRESTO_STARTUP] Creating server memory pushback checker, memory check interval 1000ms, system memory limit: 4.00GB, memory shrink size: 20.00GB
I0211 09:40:50.499864 4661 PeriodicMemoryChecker.cpp:57] [PRESTO_STARTUP] Malloc memory heap dumper is not enabled
Error – system-mem-limit-gb was higher than available machine memory of deployment:
I0211 09:44:01.242293 4985 LinuxMemoryChecker.cpp:46] [PRESTO_STARTUP] Using cgroup v2.
I0211 09:44:01.242357 4985 LinuxMemoryChecker.cpp:55] [PRESTO_STARTUP] Using memory stat file: /sys/fs/cgroup/memory.stat
I0211 09:44:01.242378 4985 LinuxMemoryChecker.cpp:58] [PRESTO_STARTUP] Using memory max file /proc/meminfo
I0211 09:44:01.242748 4985 LinuxMemoryChecker.cpp:89] [PRESTO_STARTUP] System memory in bytes: 2147483648
I0211 09:44:01.242784 4985 LinuxMemoryChecker.cpp:92] [PRESTO_STARTUP] System memory limit in bytes: 107374182400
I0211 09:44:01.242952 4985 LinuxMemoryChecker.cpp:96] [PRESTO_STARTUP] Available machine memory of deployment in bytes: 67421741056
E0211 09:44:01.242988 4985 Exceptions.h:66] Line: /root/presto/presto-native-execution/presto_cpp/main/LinuxMemoryChecker.cpp:99, Function:start, Expression: config_.systemMemLimitBytes <= availableMemoryOfDeployment (107374182400 vs. 67421741056) system memory limit = 107374182400 bytes is higher than the available machine memory of deployment = 67421741056 bytes., Source: RUNTIME, ErrorCode: INVALID_STATE
terminate called after throwing an instance of 'facebook::velox::VeloxRuntimeError'
what(): Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: (107374182400 vs. 67421741056) system memory limit = 107374182400 bytes is higher than the available machine memory of deployment = 67421741056 bytes.
Retriable: False
Expression: config_.systemMemLimitBytes <= availableMemoryOfDeployment
Function: start
File: /root/presto/presto-native-execution/presto_cpp/main/LinuxMemoryChecker.cpp
Line: 99
Stack trace:
# 0 _ZN8facebook5velox7process10StackTraceC1Ei
# 1 _ZN8facebook5velox14VeloxException5State4makeIZNS1_C4EPKcmS5_St17basic_string_viewIcSt11char_traitsIcEES9_S9_S9_bNS1_4TypeES9_EUlRT_E_EESt10shared_ptrIKS2_ESA_SB_
# 2 _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 3 _ZN8facebook5velox17VeloxRuntimeErrorC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bS7_
# 4 _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRKNS1_18VeloxCheckFailArgsET0_
# 5 _ZN8facebook6presto18LinuxMemoryChecker5startEv
# 6 _ZN8facebook6presto12PrestoServer28addMemoryCheckerPeriodicTaskEv
# 7 _ZN8facebook6presto12PrestoServer3runEv
# 8 main
# 9 0x0000000000029d8f
# 10 __libc_start_main
# 11 _start
*** Aborted at 1739295959 (Unix time, try 'date -d @1739295959') ***
*** Signal 6 (SIGABRT) (0x1372) received by PID 4985 (pthread TID 0x7ffff726b5c0) (linux TID 4985) (maybe from PID 4978, UID 0) (code: 0), stack trace: ***
I0211 09:45:59.955230 4992 PeriodicStatsReporter.cpp:252] Spill memory usage: current[0B] peak[0B]
@ 000000000a0017c7 _ZN5folly10symbolizer12_GLOBAL__N_113signalHandlerEiP9siginfo_tPv
/root/presto_oss_dependencies/folly/folly/experimental/symbolizer/SignalHandler.cpp:453
@ 000000000004251f (unknown)
@ 00000000000969fc pthread_kill
@ 0000000000042475 raise
@ 00000000000287f2 abort
@ 00000000000a2b9d (unknown)
@ 00000000000ae20b (unknown)
@ 00000000000ae276 _ZSt9terminatev
@ 00000000000ae4d7 __cxa_throw
@ 0000000009fefb9e __cxa_throw
/root/presto_oss_dependencies/folly/folly/experimental/exception_tracer/ExceptionTracerLib.cpp:159
@ 0000000009ea03b2 _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRKNS1_18VeloxCheckFailArgsET0_
/root/presto/presto-native-execution/velox/velox/common/base/Exceptions.h:82
-> /root/presto/presto-native-execution/velox/velox/common/base/Exceptions.cpp
@ 00000000008bc897 _ZN8facebook6presto18LinuxMemoryChecker5startEv
/root/presto/presto-native-execution/presto_cpp/main/LinuxMemoryChecker.cpp:99
@ 0000000000be378e _ZN8facebook6presto12PrestoServer28addMemoryCheckerPeriodicTaskEv
/root/presto/presto-native-execution/presto_cpp/main/PrestoServer.cpp:1044
@ 0000000000bdf34e _ZN8facebook6presto12PrestoServer3runEv
/root/presto/presto-native-execution/presto_cpp/main/PrestoServer.cpp:552
@ 00000000008fe625 main
/root/presto/presto-native-execution/presto_cpp/main/PrestoMain.cpp:30
@ 0000000000029d8f (unknown)
@ 0000000000029e3f __libc_start_main
@ 00000000006e4ec4 _start
Fatal signal handler. ThreadDebugInfo object not found.
In summary
The LinuxMemoryChecker is a critical tool for ensuring Presto C++ runs reliably in production, especially in memory-constrained environments. By proactively monitoring system memory usage and dynamically shrinking the file cache before hitting OOM thresholds, it provides a safety net that balances performance with stability. For teams running Presto C++ at scale, especially with file caching enabled, enabling LinuxMemoryChecker and tuning its configuration is a practical step toward preventing unpredictable failures and maintaining consistent uptime.