How I ACTUALLY Measure Racehorse Performance

Race times are too noisy for accurate ratings — on that I agree. But the problem isn't ratings themselves, it's building them from the clock. The projection method rates races from the horses' form, not benchmark times, and produces better results. Here is my explanation of it.


An article recently published on FormNerds called "Problems of Using Times to Rate Races" lays out in detail why race times are too noisy to produce accurate ratings — distance measurement errors, going changes, wind, pace effects, and a host of other factors that make the final time an unreliable measure of race strength. It's a well-argued piece and I agree with almost every word of it.

But I also think the problems it describes, real as they are, apply to a specific approach to making ratings, not to ratings themselves. The figure makers I've learned the most from over the years don't work the way it describes, and I think that's worth exploring.

Make sure you read the original piece first, because this article was written as a direct response to it.

P.S.: This is a follow-up to the more general story of my rating-making journey, published here a month ago: https://www.anedvardsen.no/how-i-measure-racehorse-performance-across-11-countries-and-one-million-races/ . While the first article was deliberately written for a general audience, this piece is for the handful of people in the world interested in the more technical side of the subject. If you happen to be one of them, I would love to get in touch! Hit me up on https://x.com/akselnwe or send an email to hello@mrworldpool.com.


I never made ratings that way

I've been making performance ratings for about twenty years now. I started in Scandinavia, expanded through the Middle East and Asia, and now cover most serious racing jurisdictions around the world, with the notable exception of American dirt racing (and with Australia under way). And in all that time, I have never once used benchmark times the way the article describes: comparing individual race times against a database of expected times for each class, distance, and track condition combination, then deriving race strength from the residual. That's the Beyer method, more or less, and his speed figures have been published in the Daily Racing Form since 1992. Beyer was a huge inspiration to me early on: his writing, his thinking about and love for the game, the sheer ambition of trying to quantify every race. But the method itself? The par-time approach? I moved away from that pretty quickly once I started actually making figures and ran into exactly the problems described.

The reason is exactly the one laid out in the article: the noise is too large. And regardless of how accurate the times themselves are — and yes, they'd need to be extraordinarily accurate for this approach to work — you still have the fundamental problem that pace dynamics, wind, going changes within a meeting, rail positions, and a dozen other factors all affect the final time in ways that have nothing to do with race strength. Even with perfect timing, you're measuring the wrong thing. The article nails this. When I come across someone who writes about how they make ratings primarily from times in this way — comparing race times against benchmark databases and deriving race strength from the residual — nothing more they have to say about the subject can ever interest me. I just move on. The people I listen to and learn from in this game all do it differently.

But here's the thing — there's a whole other way to make ratings that doesn't depend on any of that:

The projection method

This was always a central idea for me, even before I started making figures — something I picked up while still just reading everything I could get my hands on. It crystallised when I understood how Jerry Brown at Thoro-Graph made his figures. Brown had worked for Len Ragozin (another American figure maker, and in many ways the original), saw the problems in Ragozin's approach, and built something better. Brown's key insight was this: once you have a reasonable starting point, you're better off throwing the par times out the window and making figures from the horses, not from the clock. I've just leaned more and more in that direction with each passing year.

What does that mean in practice? It's actually quite simple in principle. You use what you already know about the horses in a race — their established ratings from previous starts — to determine the strength of today's race. The winning time becomes a secondary input, sometimes barely relevant at all. The margins between horses are what matter. The method is essentially a regression — you're finding the best fit for the entire field simultaneously, based on all available historical data points, weighted toward recent form.

Let me give a simplified example. Say Horse A has ratings of 110, 110, 111, 110 from its last four starts. Horse B has run 113, 113, 113, 112. Today, Horse B beats Horse A by a margin equivalent to 3 rating points. You don't need to look at the winning time to conclude that Horse B ran to about 113 and Horse A ran to about 110. The race calibrates itself from the horses' histories.

Now say there's a third horse in the race, a newcomer with no history, who finishes a margin equivalent to 2 points behind Horse A. That newcomer gets a rating of 108. You have enough evidence from the first two horses to anchor the race. The newcomer's rating falls out naturally.

This is the projection method. In the maths, you'd recognise it as something close to linear regression. The critical principle is that every horse in the race gets the same adjustment — if you move one horse up by two points, you move the whole race up by two points. You're always looking for the best fit across all runners, never adjusting individuals in isolation. The projection method has no class adjustment and doesn't need one — the horses bring their own ratings to the race, and those ratings already reflect their ability. Whether the race is officially labelled Class 3 or Listed is irrelevant to the maths.
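
To make the arithmetic concrete, here is a minimal Python sketch of the three-horse example above. It is an illustration under stated assumptions, not my production method: the function names, the recency decay of 0.7, and the choice to give every rated horse an equal vote are all invented for the example.

```python
from statistics import fmean

def expected_rating(history, decay=0.7):
    """Recency-weighted mean of past ratings (oldest first).
    The decay value is an assumption, not a figure from the article."""
    n = len(history)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * r for w, r in zip(weights, history)) / sum(weights)

def project_race(runners):
    """runners: (name, points_behind_winner, history or None) tuples.
    Returns today's rating per horse, with one shared shift for the field."""
    # Every horse with a history "votes" for the winner's level:
    # its expected rating plus the points it finished behind the winner.
    votes = [expected_rating(h) + behind for _, behind, h in runners if h]
    winner_level = fmean(votes)  # least-squares fit of a single race level
    return {name: winner_level - behind for name, behind, _ in runners}

race = [
    ("Horse B", 0.0, [113, 113, 113, 112]),
    ("Horse A", 3.0, [110, 110, 111, 110]),
    ("Newcomer", 5.0, None),
]
ratings = project_race(race)
for name, rating in ratings.items():
    print(f"{name}: {rating:.1f}")
```

Rounded to whole points, this gives 113 for Horse B, 110 for Horse A and 108 for the newcomer. The margins between the horses are fixed by the result; the only thing being fitted is the level of the race as a whole.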

And here's the telling part: even the Beyer team has been moving in this direction. Over the years, their figure makers have relied more and more on what they call "manual adjustments" — overriding what the par-based track variant says, based on what the horses' prior figures suggest the variant should be. That's the projection method. They just don't call it that. When you find yourself manually correcting your par-time variants by looking at what the horses should have run, you've conceded that the horses are a better measuring stick than the clock.

Why I don't lose sleep over the noise

Every problem raised in the article — distance measurement errors, going changes during a meeting, wind gusts, rail positions, cut-up tracks, the impossibility of getting a meaningful average from one or two races at a particular distance — is a problem for the benchmark-time approach because that approach depends on the absolute time being meaningful. The projection method doesn't. I need the margins between horses to be roughly right (I use 0.17 seconds per length, and it doesn't need to be super-precise), and I need a database of prior ratings to project from. The absolute winning time is almost irrelevant.

In my early years I did try to make everything on a race day add up — reconciling track variants across every race, chasing consistency between the clock and the projections. But experience taught me the same lesson: times simply can't be trusted. Where I came from, the timing quality was atrocious on some tracks. I was obsessive enough that I actually used video editing tools to break down times to thousandths of a second when I suspected the clock was unreliable. And what I found confirmed the suspicion — on certain tracks, the clock was just being started or stopped randomly, plus half a second here, minus half a second there, no consistency at all. Track variants bounced around wildly even on the same surface, same track, same afternoon.

But this experience also showed me something liberating: if I just trusted the projection results and ignored the clock, the ratings actually came out better — fewer obviously wrong ratings when you review them six months later. More often than not, the difference between what the clock said and what the projections said was just noise.

What I use the clock for

The winning time isn't completely useless; it's just not what I build the ratings from. I still maintain par times for each track-distance-surface combination, but mostly out of old habit. What I actually need them for is calculating the points-per-length conversion, which varies by distance: a length in a five-furlong sprint represents a bigger performance gap than a length in a two-mile staying race, simply because one length is a larger fraction of the total distance. That's how you normalise ratings so that a 120 means the same thing at 1000 metres as it does at 3200 metres. But honestly, it would work nearly as well with a rough generic estimate. If I don't have a proper par time for a particular distance, I just fall back to the nearest match I have. It barely matters. And you have to start somewhere when rating a race, so naturally you enter the winning time to kick off the process. Anything else would be strange. But that's about the extent of its importance. The winning time can be a useful confirmation sometimes, and it feels nice when the clock and the projection agree. But when they disagree, the projection wins for me. Almost every time.
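
A rough Python sketch of that points-per-length idea follows. Only the 0.17 seconds per length comes from my own practice as described earlier; the par times, the scaling constant and the function names are made up for illustration, and the sketch assumes that rating points are proportional to the share of the race time that a length represents.

```python
SECONDS_PER_LENGTH = 0.17  # the value quoted in the article
POINTS_SCALE = 1000.0      # hypothetical: points per unit fraction of race time

# Invented par times in seconds, keyed by distance in metres (illustration only).
PAR_TIMES = {1000: 57.0, 1600: 96.0, 2400: 150.0, 3200: 210.0}

def par_time(distance_m):
    """Par time for the distance, falling back to the nearest distance on file."""
    nearest = min(PAR_TIMES, key=lambda d: abs(d - distance_m))
    return PAR_TIMES[nearest]

def points_per_length(distance_m):
    """One length as a fraction of the par time, scaled to rating points."""
    return POINTS_SCALE * SECONDS_PER_LENGTH / par_time(distance_m)

def margin_in_points(lengths, distance_m):
    """Convert a beaten margin in lengths to rating points at this distance."""
    return lengths * points_per_length(distance_m)

for distance in (1000, 1600, 3200):
    print(distance, round(points_per_length(distance), 2))
```

With these invented numbers a length is worth roughly three points at 1000 metres and under one point at 3200 metres. Because the par time only enters as a divisor, an error of a second or two in it changes the conversion very little, which is why a rough estimate works nearly as well.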

Beyond that, the winning time serves as a mild sanity check. If the projection says a race was run to a certain level but a comparison race at the exact same distance and surface on the same day produces a very different implied track variant, I'll take note. Sometimes there's a reason — three-year-olds in development genuinely running faster than established form would suggest, or a race that was run dishonestly with a slow early pace. In those cases I might nudge the rating a point or two in the direction the clock suggests. And if a race was clearly run dishonestly, I might take a point or two off. But these are small, considered adjustments — not the foundation of the method.

The pace problem — and why it actually supports the projection approach

The article mentions pace as one of the noise factors, and it's absolutely right that opening tempo dictates final times far more than race strength does. This is extremely well-documented. A fast early pace produces a fast final time. A slow early pace produces a slow final time. Neither tells you much about how good the horses actually are.

Think of it like human athletics. In the 1500 metres, if you don't have a rabbit setting the perfect tempo, you can forget about world records. But that doesn't necessarily mean the performance was worse. If the first lap is tactical and slow, and then someone simply destroys the field and wins by a huge margin, that's not a weak race; it's a dominant performance in a tactically run event. Conversely, when a pacemaker sets it up perfectly and the whole field runs personal bests, I'm not sure the fifth-place finisher who set a PB ran better than he did when winning the Olympic gold medal the year before in a time two seconds slower. The clock says he did. Common sense says he didn't. There are no rabbits in the Olympics, so the times are almost always slower than in Diamond League events, even though everyone in the field has had that Olympic race marked in the calendar as the season's biggest goal for at least four years.

This is exactly the situation where the projection method excels. It doesn't care what the clock says. It cares what the margins say, what the horses' histories say, and what the best fit across all available data points says. Opening tempo becomes almost irrelevant as a rating input, as long as it's not too extreme either way.

I should mention that Craig Milkowski, who makes the figures for TimeformUS, takes a different but compatible approach. He also starts from the same principle — rating the whole race first, every horse getting the same adjustment — but then goes a step further and adjusts individual horses based on how the pace scenario affected them specifically, rewarding horses compromised by fast pace and penalising those who benefited from slow pace. His figures share the key philosophical principle: no par times, ratings built from the horses not the clock. I've respected his work since I first encountered it about ten years ago, and I wasn't even aware until recently that he had arrived at the same "no pars" philosophy independently. His approach requires excellent sectional data and deep expertise in working with sectionals at that level — not something I'd recommend as a starting point — but it produces serious results. If American racing is your thing.

On going back and revising

One thing I do that probably makes some people uncomfortable: I go back and revise ratings when later evidence makes it clear that I got them wrong. Jerry Brown does the same thing. The logic is straightforward — my entire system depends on historical ratings being as accurate as possible, because those historical ratings are the data points I use to calibrate new races. If I gave a race a variant that was clearly wrong, and I can see that now with the benefit of hindsight, then leaving the error in the database contaminates every future rating that touches those horses. Fixing it makes everything more accurate going forward.

I think some of the resistance to this practice comes from a belief that the original track variant calculation is objective — that it measured something absolute. I don't see it that way. I know I'm estimating something uncertain, and I'd rather correct an estimate when better evidence comes along than leave an error in the system out of principle. All ratings have margins of error. The question is whether you acknowledge that and work to reduce them, or whether you trust the initial calculation and move on.

Fat Tony knowledge

I'm a practical type. In Nassim Taleb's framing, I probably admire Fat Tony's hard-won street wisdom as much as Nero's intellectual rigour. And I think the fact that I came up in an environment with terrible data — no sectional times at all in Scandinavia, unreliable clocks, limited information — actually forced me to develop a method that's extremely robust against all the problems described in the article. I couldn't rely on the clock because the clock was lying to me half the time. I had to find another way. And it turns out that other way works better even when the clock is honest.

In Hong Kong, for instance, the timing is excellent: reliable, precise, consistent. And even there, the projection method produces better ratings than the clock alone would give you. Because the issue was never just data quality. Even with perfect timing, all the other problems remain: pace dynamics, wind shifts, going changes within a meeting, the rail moving out, horses racing wider in search of better ground as the track wears. A 1:09 at Sha Tin means something completely different when the first 400m was run in 22.8 versus 24.0, and none of that has anything to do with how good the clock is.

On weight

The article mentions in passing the Australian tradition of weight-and-margin ratings. I adjust for weight too, and I think it's important. The relationship between weight and performance is well-established. A horse carrying 60kg that finishes level with a horse carrying 54kg has run the better race. That said, I'm less confident in the adjustment when the weight spread gets extreme. In Scandinavia, you regularly see 15kg differences between top weight and bottom weight, sometimes 20kg. At those extremes, I'm not sure any formula captures the true effect precisely. There's also the issue that a horse's ability to carry weight isn't constant: it's affected by track conditions, pace, track speed, and probably other things I haven't figured out yet. I haven't found a good way to adjust for all of that. But I still find it's better to correct for weight than not to.

The track speed bonus

There's another powerful thing the projection method gives you almost for free: track speed. If you have established ratings for most of the field, you can flip the equation around — instead of using track speed to rate the horses, you use the horses to rate the track. Every horse is essentially a measuring instrument. If a field of horses with known ability all come out three points below expectation, the track was slow. You don't need benchmark times or meeting averages to see that. It's the same regression running in reverse.
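
As a minimal sketch of running the regression in reverse, the Python below estimates track speed from the horses. It assumes each runner has both an expected rating from its history and a raw figure taken straight from the clock before any track adjustment. The function name, the sample numbers and the use of a plain average are assumptions made for the example.

```python
from statistics import fmean

def track_variant(runners):
    """runners: (expected_rating, raw_clock_figure) pairs for horses with form.
    Returns the average gap in points; a negative value means a slow track."""
    return fmean([raw - expected for expected, raw in runners])

# Invented field: every horse comes out about three points below expectation.
field = [(110.0, 107.0), (113.0, 110.5), (105.0, 101.5)]
variant = track_variant(field)
print(f"Track variant: {variant:+.1f} points")
```

Here the field averages three points below expectation, so the sketch reads the track as three points slow. A median instead of a mean would be a reasonable alternative when one horse has clearly run far below form.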

The honest summary

My ratings usually have a margin of error of 1–2 points on any given run. I'm fine with that. There are harder cases, of course — two-year-olds with no history are almost impossible to rate reliably until you get multiple races at the same distance and surface on the same day and can take a chance that the times are roughly correct. Early three-year-olds before they start meeting older horses have a similar problem. And when I expand into a new country, the first year is genuinely difficult — you have to lean on times and your own judgement more than you'd like and just wait for enough reference points to accumulate so the projections can take over. That bootstrapping period usually takes a couple of seasons before things really fall into place.

But the beauty of the regression approach is that precision increases the more data points you have. Each new race adds information. The more history you have on the horses in a field, the less any single error matters; it just gets diluted by everything else you know. Old ratings don't correct themselves, which is why revising them takes manual intervention, but the new ones being produced get more and more accurate over time anyway.
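
The dilution effect can be shown with a toy Python calculation. It assumes an equally weighted average of ratings, which is simpler than any real weighting scheme, and the function name and numbers are invented for the example.

```python
from statistics import fmean

def shift_from_one_error(n_data_points, error_points, true_rating=110.0):
    """How far one bad rating moves an equally weighted average of n ratings."""
    clean = [true_rating] * n_data_points
    flawed = [true_rating + error_points] + clean[1:]
    return fmean(flawed) - fmean(clean)

# A single rating that is four points too high, among 2, 5 and 20 data points.
for n in (2, 5, 20):
    print(n, round(shift_from_one_error(n, 4.0), 2))
```

Under that assumption, a four-point error moves the average by two points when there are only two data points, but by just a fifth of a point when there are twenty.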

The article asks whether times can produce accurate ratings. I think it's the wrong question. The right question is whether you need accurate times to produce accurate ratings. I don't think you do.


The ratings discussed in this article are the foundation of Mr. World Pool — a horse racing analytics platform with racecards, ratings, form, and ML predictions across most major racing jurisdictions worldwide.