mirror of
https://github.com/azahar-emu/dynarmic.git
synced 2026-04-06 21:45:30 +02:00
`vfpclassp* k, xmm, i8` has better latency(4->3) and allocates better execution ports(01->5) that are out of the way of ALU-ports than `vcmpunordp* xmm, xmm, xmm`(`vcmpp* xmm, xmm, xmm, i8`) and removes the pipeline dependency on `xmm0` in favor AVX512 `k`-mask registers. `vblendmp* xmm, k, xmm, mem` is about the same throughput and latency as `blendvp* xmm. mem` but has the benefit of embedded broadcasts to reduce memory bandwidth(32/64-bit read rather than 128-bit) and lends itself to a future size optimization feature of `constant_pool`.