[![License](https://img.shields.io/badge/License-BSD3-lightgrey.svg)](https://opensource.org/licenses/BSD-3-Clause)

# Triton Local Cache

This repo contains an example [TRITONCACHE API](https://github.com/triton-inference-server/core/blob/main/include/triton/core/tritoncache.h) implementation for caching data locally in-memory.

Ask questions or report problems in the main Triton [issues page](https://github.com/triton-inference-server/server/issues).

## Build the Cache

Use a recent cmake to build. First install the required dependencies.

```
$ apt-get install libboost-dev rapidjson-dev
```

To build the cache:

```
$ mkdir build
$ cd build
$ cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install ..
$ make install
```

The following required Triton repositories will be pulled and used in the build. By default, the "main" branch/tag will be used for each repo, but the following CMake arguments can be used to override it.

* triton-inference-server/core: `-D TRITON_CORE_REPO_TAG=[tag]`
* triton-inference-server/common: `-D TRITON_COMMON_REPO_TAG=[tag]`

## Configuring the Cache

Like other `TRITONCACHE` implementations, this cache is configured through the `tritonserver --cache-config` CLI arg or through the `TRITONSERVER_SetCacheConfig` API.

Currently, the following config fields are supported:

- `size`: The fixed size (in bytes) of CPU memory allocated to the cache upfront.
  If this value is too large (e.g., greater than available memory) or too small (e.g., smaller than the required overhead of roughly 1-2 KB), initialization may fail.
  - example: `tritonserver --cache-config local,size=1048576`

## Metrics

When `TRITON_ENABLE_METRICS` is enabled in this cache (enabled by default), the cache checks whether the running Triton server has metrics enabled as well. If so, the cache publishes additional cache-specific metrics to Triton's metrics endpoint through the [Custom Metrics API](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/metrics.md#custom-metrics).

### Cache Metrics

The following metrics are reported by this cache implementation:

|Category      |Metric                     |Metric Name                   |Description                                                 |Granularity |Frequency    |
|--------------|---------------------------|------------------------------|------------------------------------------------------------|------------|-------------|
|Utilization   |Total Cache Utilization    |`nv_cache_util`               |Total cache utilization rate (0.0 - 1.0)                    |Server-wide |Per interval |
|Count         |Total Cache Entry Count    |`nv_cache_num_entries`        |Total number of entries stored in cache                     |Server-wide |Per interval |
|              |Total Cache Lookup Count   |`nv_cache_num_lookups`        |Total number of cache lookups done by Triton                |Server-wide |Per interval |
|              |Total Cache Hit Count      |`nv_cache_num_hits`           |Total number of cache hits                                  |Server-wide |Per interval |
|              |Total Cache Miss Count     |`nv_cache_num_misses`         |Total number of cache misses                                |Server-wide |Per interval |
|              |Total Cache Eviction Count |`nv_cache_num_evictions`      |Total number of cache evictions                             |Server-wide |Per interval |
|Latency       |Total Cache Lookup Time    |`nv_cache_lookup_duration`    |Cumulative time spent doing cache lookups (microseconds)    |Server-wide |Per interval |
|              |Total Cache Insertion Time |`nv_cache_insertion_duration` |Cumulative time spent doing cache insertions (microseconds) |Server-wide |Per interval |
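
The counters listed above can be combined into derived figures such as a cache hit rate. The following is a minimal Python sketch, not part of this repo, that assumes the metrics are exposed in the usual Prometheus text format. The helper names `parse_cache_metrics` and `hit_rate` and the sample values are hypothetical and chosen only for illustration; the label parsing is deliberately simplified.

```python
import re

# Hypothetical sample of a metrics scrape; the values are made up.
SAMPLE_METRICS = """\
# HELP nv_cache_num_hits Total number of cache hits
# TYPE nv_cache_num_hits gauge
nv_cache_num_hits 30
nv_cache_num_misses 10
nv_cache_num_lookups 40
nv_cache_util 0.25
nv_inference_count{model="m",version="1"} 40
"""

# Metric name, an optional {label="value"} block, then the sample value.
# Simplification: label values containing "}" are not handled.
_SAMPLE_LINE = re.compile(r"^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{[^}]*\})?\s+(\S+)")


def parse_cache_metrics(text):
    """Return {metric_name: value} for every nv_cache_* sample in `text`.

    If a name appears more than once, the last sample wins.
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_LINE.match(line)
        if match and match.group(1).startswith("nv_cache_"):
            metrics[match.group(1)] = float(match.group(2))
    return metrics


def hit_rate(metrics):
    """Return hits / lookups, or None if no lookups have been recorded."""
    lookups = metrics.get("nv_cache_num_lookups", 0.0)
    if lookups == 0:
        return None
    return metrics.get("nv_cache_num_hits", 0.0) / lookups


if __name__ == "__main__":
    cache_metrics = parse_cache_metrics(SAMPLE_METRICS)
    print("hit rate:", hit_rate(cache_metrics))
```

In practice, the text would come from the running server's metrics endpoint rather than from a hard-coded string. Because the counters are cumulative totals, a rate over a time window would need the difference between two scrapes.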