Hello, Davidbepo here with my second article for The Chip Collective.
Today I’m going to explain my idea of a perfect binning and turbo method, which has been on my mind for more than a year. The idea is basically to have the best possible set of cores doing the workload. This is similar to the preferred core concept, but it goes beyond that. It’s also similar to how the AMD Ryzen 3000 series operates, but it keeps the advantages without the issues that come with it.
Here are some graphs to better illustrate the idea before I get to explaining it more in depth:
[Table: Current Status. Columns: Core 0 to Core 7, SC Clock]
[Table: Best possible without changing binning method. Columns: Core 0 to Core 7, SC Clock]
This is how Intel currently bins the Core i9-9900K, which is my example CPU, and the best that could be achieved without changing the binning method, just by eating into the margin as they have been doing.
Here is what my idea would look like:
[Table: My Idea. Columns: Core 0 to Core 7, SC Clock]
The idea is to always use the best core for one-core workloads, the two best cores for two-core workloads, and so on. This improves performance for lightly threaded workloads and can also improve the two-core through all-core turbo clocks in situations where there are no thermal or power limits. This idea also addresses one of the flaws of Ryzen 3000’s boost algorithm, the lack of reliability. It does so by having a table of per-core turbo speeds that is known and reliable.
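To make the selection rule more concrete, here is a minimal sketch in Python of how such a turbo table could be derived. The per-core clocks are invented for illustration and are not measurements of any real CPU. The sketch also assumes that all active cores share one clock, limited by the slowest core in the chosen set; a design where each core runs at its own speed would simply skip that `min` step.

```python
from typing import Dict, List, Tuple

# Hypothetical per-core maximum stable clocks in MHz.
# These numbers are invented for illustration only.
CORE_MAX_MHZ: Dict[int, int] = {
    0: 5000, 1: 5200, 2: 4900, 3: 5100,
    4: 5300, 5: 5000, 6: 4800, 7: 5100,
}


def rank_cores(core_max_mhz: Dict[int, int]) -> List[int]:
    """Return core ids from best to worst; ties go to the lower core id."""
    return sorted(core_max_mhz, key=lambda core: (-core_max_mhz[core], core))


def turbo_table(core_max_mhz: Dict[int, int]) -> Dict[int, Tuple[List[int], int]]:
    """Map each active-core count N to (cores used, shared clock in MHz).

    Assumption: the N best cores are used and they all run at the clock
    of the slowest core in that set.
    """
    ranking = rank_cores(core_max_mhz)
    table = {}
    for n in range(1, len(ranking) + 1):
        chosen = ranking[:n]
        table[n] = (chosen, min(core_max_mhz[core] for core in chosen))
    return table


if __name__ == "__main__":
    for n, (cores, mhz) in turbo_table(CORE_MAX_MHZ).items():
        print(f"{n}C: cores {cores} at {mhz} MHz")
```

Because the table is computed once from fixed per-core limits, the clock for any given core count is known in advance, which is the reliability property described above.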
[Table: Current Status - Power Bound. Columns: 1C to 8C]
[Table: My Idea - Not Power Bound. Columns: 1C to 8C]
[Table: My Idea - Power Bound (Approximate). Columns: 1C to 8C]
All of the above speeds are stable in Prime95 (p95) with SSE4.
Because it is not workload dependent, unlike Ryzen 3000, this could still leave a bit of performance on the table in the stock configuration. That is easy to solve: just add a positive offset for lighter loads, like +100MHz for non-SSE code, similar to the negative AVX offset.
This part was not in my original idea, but it makes it even better.
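As a rough illustration of the offset idea, the sketch below applies a workload-dependent offset on top of a clock taken from the turbo table. The +100MHz value for non-SSE code comes from the example above; the AVX offset value and all names in the code are assumptions made up for this sketch.

```python
from enum import Enum


class Workload(Enum):
    NON_SSE = "non_sse"  # lighter code, gets a positive offset
    SSE = "sse"          # baseline the turbo table is validated for
    AVX = "avx"          # heavier code, gets a negative offset


# Offsets in MHz relative to the table clock.
# +100 for non-SSE code is the article's example; -200 for AVX is invented.
OFFSET_MHZ = {
    Workload.NON_SSE: 100,
    Workload.SSE: 0,
    Workload.AVX: -200,
}


def effective_clock(table_mhz: int, workload: Workload) -> int:
    """Return the table clock adjusted by the offset for the workload type."""
    return table_mhz + OFFSET_MHZ[workload]


if __name__ == "__main__":
    for workload in Workload:
        print(workload.name, effective_clock(5000, workload), "MHz")
```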
And basically, that is my idea. If it makes it easier, you can think of it as an out-of-the-box, perfectly tuned per-core OC.
It is also important to note that this idea requires OS support to keep the workloads pinned to the best set of cores. Without this support, clocks will be lower.
EDITOR'S NOTE: Windows 10 2H19 will have this ability for best core.
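The pinning requirement can be approximated from user space. The sketch below is a Linux-only illustration using Python's `os.sched_setaffinity`, not how an OS scheduler would actually implement it. The core ranking passed in is assumed to come from somewhere else (for example, a table published by the CPU), and the helper names are hypothetical.

```python
import os
from typing import Iterable, List, Set


def best_available(ranking: List[int], n: int, available: Iterable[int]) -> Set[int]:
    """Pick the n best cores from the ranking that are actually usable.

    `ranking` lists core ids from best to worst (assumed to be known).
    """
    allowed = set(available)
    usable = [core for core in ranking if core in allowed]
    return set(usable[:n])


def pin_to_best(ranking: List[int], n: int) -> Set[int]:
    """Pin the calling process to its n best cores (Linux only)."""
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("CPU affinity is not available on this platform")
    cores = best_available(ranking, n, os.sched_getaffinity(0))
    if not cores:
        raise ValueError("none of the ranked cores are available to this process")
    os.sched_setaffinity(0, cores)  # pid 0 means the calling process
    return cores


if __name__ == "__main__":
    # Hypothetical ranking, best core first.
    print(pin_to_best([4, 1, 3, 7, 0, 5, 2, 6], 2))
```

Doing this per application is clumsy, which is why scheduler-level support is the better place for it.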
If you have any questions, please let me know on Twitter or Reddit.