(Even if hardware support did exist earlier, you don't want to deal with errata for a new hardware feature. It's kind of amazing anything ever works.)
Not a new capability at all, but Linux support is new. The lag in adoption for this sort of stuff always seems very high. https://www.phoronix.com/news/Linux-6.17-ARM64
I know Arm and Xtensa have offered on-board trace buffers for ages so that operating systems could record themselves.
What's neat here is that Apple has bundled this nicely into a polished developer tool rather than one more discrete tool.
Intel has supported such capability via Intel Processor Trace (PT) since at least 2014 [1]. Here is a full trace recorder built by Jane Street feeding into standard program trace visualizers [2].
ARM has supported such capability via the standard CoreSight Program Trace Macrocell (PTM)[3]/Embedded Trace Macrocell (ETM)[4] since at least 2000.
If you pair it with standard data trace, which is less commonly available, then you have the prerequisites for a hardware trace time travel debugger, as originally seen in the early 2000s [5].
You can also get similar performance/function tracing entirely in software via software-instrumented instruction trace, and similar debugging information (though with less granular performance information) via record-replay time travel debugger recordings.
[1] https://www.intel.com/content/www/us/en/support/articles/000...
[2] https://blog.janestreet.com/magic-trace/
[3] https://developer.arm.com/documentation/ihi0035/b/Program-Fl...
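As a rough illustration of tracing done entirely in software, here is a minimal Python sketch (an illustration only, not how any of the tools linked above work) that uses the standard `sys.settrace` hook to record call, line, and return events. It works at Python function/line granularity rather than machine instructions, and every event costs a callback into the interpreter, which is exactly the overhead hardware trace avoids. `record_trace` and `fib` are hypothetical names.

```python
import sys
from types import FrameType
from typing import Any, Callable, List, Optional, Tuple

# One recorded event: (event kind, function name, line number).
TraceEvent = Tuple[str, str, int]


def record_trace(fn: Callable[..., Any], *args: Any) -> Tuple[Any, List[TraceEvent]]:
    """Run fn(*args) with a software trace hook installed; return (result, events)."""
    events: List[TraceEvent] = []

    def tracer(frame: FrameType, event: str, arg: Any) -> Optional[Callable[..., Any]]:
        # Called by the interpreter on every 'call', and (because we return
        # ourselves as the local tracer) on every 'line' and 'return' in traced frames.
        events.append((event, frame.f_code.co_name, frame.f_lineno))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = fn(*args)
    finally:
        sys.settrace(previous)  # restore any tracer that was already active
    return result, events


def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)


if __name__ == "__main__":
    result, events = record_trace(fib, 3)
    calls = [name for kind, name, _ in events if kind == "call"]
    print(result, len(calls))  # fib(3) == 2, reached via 5 calls to fib
```

The trace is complete and exact, but only because the interpreter stops to report every event; that slowdown is the trade-off the hardware approaches above are designed to avoid.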
Real Scottish craftspeople enjoy having amazing tooling. They even know how to use a debugger!
Or you have the free time to be a loudmouth online and mouth off about how tech's x, y, and/or z are dumb and how a shitty bash script is more than enough.
Thankfully, I think Linux now has some really great tooling. But very few people have a call of duty that demands super serious work. Mostly the job calls for pretty simple shit: make the dumb app go, throw more servers at the problem someday. The tension is real. It's unfortunate that the cutting edge is so spread out and so far ahead of the main body of devs.
Other OSes have those too, but they're harder to use and the interfaces aren't as good.
I think it should work if you run `bputil -c`? Didn't try it though.
The idea of dtrace is that you can do that, yes, but to do that, you need to be authorized, and the things you can do are (supposed to be) limited to looking at what’s happening in the system. You can read arguments, but you can’t, for example, change the arguments of a system call or disable authorization checks on system calls.
(Or rather, the exception requires restarting the machine to turn the security off.)
Where are the performance tools that wrap those capabilities? IPT has Magic Trace; what is the equivalent tool for ARM?
Segger trace[3].
Lauterbach trace[4].
TI Code Composer trace[5].
[1] https://www.ghs.com/video/debug_in_minutes.html
[2] https://ghs.com/products/MULTI_IDE.html
[3] https://youtu.be/sT7N580EI-M?si=53S_DQ5E4IN8AXqM
[4] https://www2.lauterbach.com/pdf/trace_tutorial.pdf
[5] https://software-dl.ti.com/ccs/esd/documents/users_guide_ccs...
This seems unfair. Isn’t there a pretty good likelihood that some of the performance counters in the CPU (or whatever) simply don’t exist in the production versions of the previous CPUs?
do_not_redeem•5mo ago
Potentially interesting, but it's not really clear whether this is anything new or not. valgrind + kcachegrind does this too.
https://developer.apple.com/documentation/xcode/analyzing-cp...
These screenshots look a lot like kcachegrind with a slightly reimagined UI. Is there actually anything new here, or is this another case of Apple finally catching up to the open source world?
GeekyBear•5mo ago
Looking at the kcachegrind homepage, it doesn't sound like they are pulling their data directly from the CPU core itself:
> Callgrind uses runtime instrumentation via the Valgrind framework for its cache simulation and call-graph generation.
https://kcachegrind.github.io/html/Home.html
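To make the contrast concrete: a Cachegrind-style tool does not read hardware counters at all. It feeds every memory address the instrumented program touches into a software model of the cache and counts hits and misses. Below is a minimal Python sketch of such a model, assuming a single level of set-associative cache with LRU replacement; `SetAssociativeCache` is a hypothetical name and this is not Cachegrind's actual implementation.

```python
from collections import OrderedDict
from typing import List


class SetAssociativeCache:
    """Single-level set-associative cache model with LRU replacement.

    Counts hits and misses for a stream of byte addresses. It does not model
    timing, prefetching, write policy, or multiple cache levels.
    """

    def __init__(self, size_bytes: int, line_bytes: int, ways: int) -> None:
        if size_bytes % (line_bytes * ways) != 0:
            raise ValueError("size must be a multiple of line_bytes * ways")
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One dict of resident tags per set, ordered least- to most-recently used.
        self.sets: List[OrderedDict] = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        """Simulate one access; return True on a hit."""
        line = addr // self.line_bytes
        index = line % self.num_sets   # which set the line maps to
        tag = line // self.num_sets    # identifies the line within that set
        lru = self.sets[index]
        if tag in lru:
            lru.move_to_end(tag)       # now the most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lru) >= self.ways:
            lru.popitem(last=False)    # evict the least recently used line
        lru[tag] = None
        return False


if __name__ == "__main__":
    # Hypothetical 32 KiB, 8-way cache with 64-byte lines.
    seq = SetAssociativeCache(32 * 1024, 64, 8)
    for _ in range(2):
        for i in range(1024):          # two passes over 1024 8-byte elements
            seq.access(i * 8)
    print(seq.hits, seq.misses)        # 1920 hits, 128 misses

    conflict = SetAssociativeCache(32 * 1024, 64, 8)
    for _ in range(2):
        for i in range(16):            # 16 addresses 4 KiB apart all map to set 0
            conflict.access(i * 4096)
    print(conflict.hits, conflict.misses)  # 0 hits, 32 misses
```

Because the model is deterministic, the same address stream always produces the same counts regardless of background load. The flip side is that it only knows what the model encodes: indexing and prefetching details of a real chip can make the hardware behave differently.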
Apple seems to have modified its core design so that it will stream data to a log file while the code is running.
> Recent Apple silicon devices can capture a processor trace where the CPU stores information about the code it runs, including the branches it takes and the instructions it jumps to. The CPU streams this information to an area on the file system so that you can analyze it with the Processor Trace instrument.
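The reason this kind of hardware trace is cheap enough to stream is that the CPU does not log every instruction. For direct branches, a decoder can rebuild the executed path from the binary plus a compact stream of branch outcomes (Intel PT, for instance, emits taken/not-taken bits in TNT packets, with separate target-address packets for indirect branches). The Python sketch below decodes such a stream against a made-up toy instruction set; the `Program` encoding and `decode_path` are hypothetical and say nothing about Apple's actual trace format.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical toy ISA: address -> (kind, target). Kinds:
#   "op"   straight-line instruction, falls through to addr + 1
#   "cond" conditional direct branch to target
#   "jmp"  unconditional direct branch to target
#   "halt" end of program
# Indirect branches are omitted; real trace formats need extra packets for them.
Program = Dict[int, Tuple[str, int]]


def decode_path(program: Program, start: int, taken_bits: Iterable[bool]) -> List[int]:
    """Rebuild the executed address sequence from one bit per conditional branch."""
    bits = iter(taken_bits)
    pc = start
    path: List[int] = []
    while True:
        path.append(pc)
        kind, target = program[pc]
        if kind == "halt":
            return path
        if kind == "jmp":
            pc = target                # known statically, costs no trace data
        elif kind == "cond":
            bit = next(bits, None)     # the only thing the trace must record
            if bit is None:
                raise ValueError(f"trace ended at conditional branch {pc}")
            pc = target if bit else pc + 1
        else:
            pc += 1


if __name__ == "__main__":
    # A loop whose body (addr 1) runs three times before falling out.
    loop = {0: ("op", 0), 1: ("op", 0), 2: ("cond", 1), 3: ("halt", 0)}
    print(decode_path(loop, 0, [True, True, False]))
    # [0, 1, 2, 1, 2, 1, 2, 3]: 8 executed instructions from 3 trace bits
```

The decoder needs the exact binary that ran; without it, the bit stream is meaningless, which is one reason these traces are normally consumed by a dedicated tool rather than read directly.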
do_not_redeem•5mo ago
jauntywundrkind•5mo ago
Forgetting this tool-space, but at least some of these tools can make use of that hardware:
https://github.com/intel/pcm https://github.com/andikleen/pmu-tools
kaladin-jasnah•5mo ago
nkurz•5mo ago
do_not_redeem•5mo ago
(I even prefer cachegrind's approach since the numbers will be less distorted by other random background activity on the machine, but that could just be idealism on my part, who knows.)
If perf or vendor-specific tools like VTune/uProf aren't sufficient for you, then I'm curious: what do you use?
nkurz•5mo ago
Cachegrind is occasionally inaccurate because its cache model is imperfect, but the greater problem was that cache hit percentages only tell a fraction of the story. To be able to predict performance I often needed to be able to accurately measure things like the number of memory requests in flight.
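One standard way to connect "requests in flight" to performance is Little's law: the average number of outstanding requests equals the request rate times the average latency. The short Python sketch below applies it to cache-line fills; the 64-byte line, 80 ns latency, 20 GB/s bandwidth, and limit of 10 outstanding misses are illustrative assumptions, not measurements of any particular chip.

```python
def requests_in_flight(bandwidth_bytes_per_s: float, latency_s: float,
                       line_bytes: int = 64) -> float:
    """Little's law: outstanding requests = request rate x latency."""
    requests_per_s = bandwidth_bytes_per_s / line_bytes
    return requests_per_s * latency_s


def max_bandwidth(max_outstanding: int, latency_s: float, line_bytes: int = 64) -> float:
    """Rearranged: bandwidth ceiling when at most max_outstanding lines can be in flight."""
    return max_outstanding * line_bytes / latency_s


if __name__ == "__main__":
    # Sustaining 20 GB/s at 80 ns latency needs about 25 misses in flight.
    print(requests_in_flight(20e9, 80e-9))
    # With only 10 outstanding misses allowed, the ceiling is about 8 GB/s.
    print(max_bandwidth(10, 80e-9))
```

This is why a hit rate alone can mislead: two loops with the same miss count can run at very different speeds depending on how many of those misses overlap.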
Searching now for an example, I hit on a comment I made here a few years ago where this new tool probably would have been helpful: https://news.ycombinator.com/item?id=18442131
In general I have much greater faith in the on chip performance registers. That said, other than glancing at news stories like this I haven't been keeping up with recent advances. I guess it's possible that cachegrind and friends have improved since I was using them.
do_not_redeem•5mo ago
I've never come across pmu-tools, thanks for the tip. I'll try it out next time I'm in the trenches.