# Aleksey Kladov - It’s Not Always iCache (Highlights)

## Metadata
**Cover**:: https://readwise-assets.s3.amazonaws.com/static/images/article2.74d541386bbf.png
**Source**:: #from/readwise
**Zettel**:: #zettel/fleeting
**Status**:: #x
**Authors**:: [[Aleksey Kladov]]
**Full Title**:: It’s Not Always iCache
**Category**:: #articles #readwise/articles
**Category Icon**:: 📰
**URL**:: [matklad.github.io](https://matklad.github.io//2021/07/10/its-not-always-icache.html)
**Host**:: [[matklad.github.io]]
**Highlighted**:: [[2021-07-25]]
**Created**:: [[2022-09-26]]
## Highlights
- On Linux, the best tool to quickly assess the performance of any program is `perf stat`.
#performance #example #linux
```
$ perf stat -e instructions,cycles,\
L1-dcache-loads,L1-dcache-load-misses,L1-dcache-prefetches,\
L1-icache-loads,L1-icache-load-misses,cache-misses \
./always
```
- While perf takes the real data from the CPU, an alternative approach is to run the program in a simulated environment.
That’s what the cachegrind tool does.
- Note that the number of times the CPU refers to the iCache should correspond to the number of instructions it executes.
### Conclusions
- Inlining a callee S into its caller C might cause C to use more registers.
This means that prologue and epilogue grow additional push/pop instructions, which also use stack memory.
- Generalizing from the first point, if S is called in a loop or in an if, the compiler might hoist some instructions of S to before the branch, moving them from the cold path to the hot path.
- With more local variables and control flow in the stack frame to juggle, the compiler might accidentally pessimize the hot loop.
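
The points above suggest one possible mitigation: keep a rarely taken callee out of line so its register pressure and setup code stay off the hot path. Below is a minimal Rust sketch of that idea, not the article's own code. The names `report_overflow` and `sum_below` are hypothetical, and `#[inline(never)]` and `#[cold]` are only hints to the compiler, so whether they help in a given program has to be measured (for example with `perf stat`).

```rust
/// Cold path (hypothetical helper): builds an error message.
/// `#[inline(never)]` asks the compiler not to merge this body into the caller;
/// `#[cold]` hints that calls to it are unlikely.
#[cold]
#[inline(never)]
fn report_overflow(index: usize, value: u64, limit: u64) -> String {
    format!("value {} at index {} exceeds limit {}", value, index, limit)
}

/// Hot path (hypothetical caller): sums the values, leaving the loop
/// through the out-of-line cold function on the first value above `limit`.
pub fn sum_below(values: &[u64], limit: u64) -> Result<u64, String> {
    let mut total = 0u64;
    for (index, &value) in values.iter().enumerate() {
        if value > limit {
            // Only a call instruction lives here; the string formatting
            // and its locals stay in `report_overflow`'s own stack frame.
            return Err(report_overflow(index, value, limit));
        }
        total += value;
    }
    Ok(total)
}
```

Calling `sum_below(&[1, 2, 3], 10)` takes only the hot path, while `sum_below(&[1, 20, 3], 10)` returns through the cold function. Comparing the generated assembly with and without the attributes is one way to check whether the loop body actually changed.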