Some architectures do support "cache pinning" -- a way to instruct the CPU to reserve a portion of the cache for a specific memory region (in practice, a cache line or page). To the best of my knowledge, neither Intel nor AMD processors implement such a feature.
You can, however, instruct the CPU to load something into cache before you use it, via a prefetch instruction. (It remains subject to eviction later, as usual.) In GCC, this is done with __builtin_prefetch() [1].
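As a minimal sketch of what that looks like (the loop and PREFETCH_DISTANCE here are illustrative -- the right distance depends on your memory latency and loop body cost):

    #include <stddef.h>

    /* Hypothetical tuning parameter: how many iterations ahead to fetch. */
    #define PREFETCH_DISTANCE 8

    long sum(const long *data, size_t n)
    {
        long total = 0;
        for (size_t i = 0; i < n; i++) {
            /* Ask for data[i + PREFETCH_DISTANCE] while we work on data[i].
             * Args: address, rw (0 = read), locality (3 = keep in all levels). */
            if (i + PREFETCH_DISTANCE < n)
                __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
            total += data[i];
        }
        return total;
    }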
But -- if you add prefetches blindly, you will almost certainly slow down whatever it is you're trying to fix. You need to analyze your memory access patterns and the generated assembly, using tools like perf and llvm-mca, and understand the cache line usage, cache access latency, pipeline stalls, cross-process and cross-core contention for the cache line, register pressure, etc. of the code in question, before you can tell whether a prefetch is appropriate and, if so, where to place it. Notably, it's a challenge to get a compiler to emit the prefetch at a useful point in the assembly.
What evidence are you working from that access latency to a single specific global variable is a performance bottleneck for your application?
To share an anecdote -- the only time I've been in a similar situation was working on a 10 Gbps line-rate network processor. Many functions had CPU budgets of fewer than 100 cycles. We found (using perf) that these cycles were often eaten up stalling for various globals to be fetched from DRAM, even though the globals were accessed with every packet. Notably, prefetching didn't help: we could not prefetch early enough to hide the entire stall, and issuing the prefetch itself ate precious CPU cycles. The true culprit was that the small handful of globals in question were each allocated at the start of a hugepage boundary -- and therefore all mapped to the same cache set. This particular CPU had low cache associativity (L1 was 2-way, IIRC), so the globals kept bumping each other out of cache. The solution in our case was to manually place the globals at different cache line offsets, so they mapped to different sets.
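A sketch of that kind of fix, assuming 64-byte cache lines (the variable names are made up): give each hot global its own cache line, at distinct line offsets, so they index into different cache sets instead of all colliding at offset zero.

    #define CACHE_LINE 64

    /* Each global gets its own cache line. The compiler typically lays
     * these out at consecutive line offsets within the section, so they
     * fall into different cache sets rather than contending for one. */
    static long g_rx_packets __attribute__((aligned(CACHE_LINE)));
    static long g_tx_packets __attribute__((aligned(CACHE_LINE)));
    static long g_drop_count __attribute__((aligned(CACHE_LINE)));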
You might then ask how to keep the data in cache once it's there, but I doubt you can.
Your best bet is to keep the working set small during the steps you care about. It's also worth keeping in mind that caches work on "lines" -- small, fixed-size blocks of data, typically 64 bytes. So you can give yourself a small edge if you keep all the crap you're going to use in one contiguous block of memory.
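For instance (a sketch; the struct and field names are hypothetical), rather than scattering hot-path state across separate globals, pack it into one struct so it spans as few cache lines as possible:

    /* Everything the hot loop touches, packed into one block.
     * With 64-byte lines, this whole struct fits in a single line. */
    struct hot_state {
        long counter;
        long last_seen;
        unsigned flags;
    } __attribute__((aligned(64)));

    static struct hot_state g_hot;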
See also the book The Art of Multiprocessor Programming by Herlihy and Shavit.