Probably not useful for most of my use cases (I'm usually injecting a payload, so I'd still have the pointer-distance issue between the executable and my payload), but it's still potentially handy. Will have to keep that around!
Indeed, aside from a party trick, why build an executable trampoline at runtime when you can store and retrieve the context, or a pointer to the context, with SetWindowLong() / GetWindowLong() [1]?
Slightly related: in my view Win32 windows are a faithful implementation of the Actor Model. The window proc of a window is mutable, it represents the current behavior, and can be changed in response to any received message. While I haven't personally seen this used in Win32 programs it is a powerful feature as it allows for implementing interaction state machines in a very natural way (the same way that Miro Samek promotes in his book.)
[1] https://learn.microsoft.com/en-us/windows/win32/api/winuser/...
The code as written, though, is missing a call to FlushInstructionCache() and might not work in processes that prohibit dynamic code generation. An alternative is to just pregenerate an array of trampolines in a code segment, each referencing a mutable pointer in a parallel array in the data segment. These can be generated straightforwardly with a little template magic. This adds size to the executable unlike an empty RWX segment, but doesn't run afoul of any dynamic codegen restrictions or require I-cache flushing. The number of trampolines must be predetermined, but the RWX segment has the same limitation.
This two step approach is the only way I found to use rust closures for wndproc without double allocation and additional indirection.
Another seemingly underutilised feature closely related to {Get,Set}WindowLong is cbClsExtra/cbWndExtra which lets you allocate additional data associated with a window, and store whatever you want there. The indices to the GWL/SWL function are quite revealing of how this mechanism works:
https://learn.microsoft.com/en-us/windows/win32/api/winuser/...
I guess I need to prove a point on my Github during next week.
C++ lambdas are basically old style C++ functors that are compiled generated, with the calling address being the operator().
https://github.com/pjmlp/LambdaWndProc/blob/main/LambdaWndPr...
https://github.com/pjmlp/LambdaWndProc/blob/main/LambdaWndPr...
And that's why I generally don't see C to have closures, and requires a JIT/dynamic code generation approach as this article has actually done (using shadow stacks). There is also a hack in GNU C which introduce local function lambda, but it is not in ISO C, and obviously won't in the next decade or so.
[^1]: https://en.wikipedia.org/wiki/Closure_(computer_programming)
This is cool, but isn’t runtime code generation pretty frowned upon nowadays?
cyberax•1mo ago
Windows actually had a workaround in its NX-bit implementation that recognized the byte patterns of these trampolines from the fault handler: https://web.archive.org/web/20090123222148/http://support.mi...
kmeisthax•1mo ago
pjc50•1mo ago
ack_complete•1mo ago
Windows x64 and ARM64 do use register passing, with 4 registers for x64 (rcx/rdx/r8/r9) and 8 registers for ARM64 (x0-x7). Passing an additional parameter on the stack would be cheap compared to the workarounds that everyone has to do now.
pwdisswordfishy•1mo ago
Const-me•1mo ago
They designed windows classes to be reusable, and assumed many developers going to reuse windows classes across windows.
Consider the following use case. Programmer creates a window class for a custom control, registers the class. Designs a dialog template with multiple of these custom controls in a single dialog. Then creates the dialog by calling DialogBoxW or similar.
These custom controls are created automatically multiple at once, hard to provide context pointers for each control.
barrkel•1mo ago