For example, Austin's Measuring and Managing Performance in Organizations gives a helpful three-party model for understanding how simplistic measurement-by-numbers goes awry. He starts with a Principal and an Agent, then adds a Customer as the third party; the net effect is that as the Principal becomes more and more energetic in enforcing a numerical management scheme, the Customer is at first better served, then served much worse.
As a side effect he recreates, or at least overlaps with, the "Equal Compensation Principle" (described in Milgrom & Roberts' Economics, Organization and Management). Put briefly: give a rational agent more than one thing to do, and they will do only the thing most profitable for them. To avoid this you would need perfectly equal compensation across their alternatives, but that's flawed too, because you rarely want an agent to divide their time into exactly equal shares.
Then there's the annoyance that most goals set are just made the hell up. Just yanked out from an unwilling fundament. Which means you're not planning, you're not objective, you're not creating comparative measurement. It's a lottery ticket with delusions of grandeur. In Wheeler & Chambers' Understanding Statistical Process Control, the authors emphasise that you cannot improve a process that you have not first measured and then stabilised. If you don't have a baseline, you can't measure changes. If it's not a stable process, you can't tell whether changes are meaningful or just noise. As they put it, more pithily:
> This is why it is futile to try and set a goal on an unstable process -- one cannot know what it can do. Likewise it is futile to set a goal for a stable process -- it is already doing all that it can do! The setting of goals by managers is usually a way of passing the buck when they don't know how to change things.
That last sentence summarises pretty much how I feel about my strawperson impressions of OKRs.
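Wheeler's stability test is concrete enough to sketch. The standard tool is the XmR (individuals and moving range) chart: compute natural process limits from a baseline, then only treat points outside those limits as signals. A minimal sketch, with illustrative throughput numbers I made up (not from the book):

```python
# Minimal XmR (individuals / moving range) chart calculation, in the spirit of
# Wheeler & Chambers: establish a baseline, check stability, and only then
# talk about what the process "should" do.

def xmr_limits(values):
    """Return (lower, centre, upper) natural process limits for an XmR chart."""
    n = len(values)
    centre = sum(values) / n
    # Average moving range between consecutive observations.
    mr_bar = sum(abs(b - a) for a, b in zip(values, values[1:])) / (n - 1)
    # 2.66 is the standard XmR scaling constant (3 / d2, with d2 = 1.128).
    spread = 2.66 * mr_bar
    return centre - spread, centre, centre + spread

# Weekly throughput for a hypothetical team -- purely illustrative.
throughput = [12, 9, 11, 14, 10, 13, 8, 12, 11, 10]
lo, centre, hi = xmr_limits(throughput)
signals = [x for x in throughput if x < lo or x > hi]
print(f"limits: {lo:.1f} .. {hi:.1f}, centre {centre:.1f}")
print("signals:", signals or "none -- stable; setting a goal changes nothing")
```

Points inside the limits are routine variation; reacting to them (or setting a "stretch goal" within them) is exactly the buck-passing the quote describes.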
 https://www.amazon.com/Understanding-Statistical-Process-Con..., though I prefer Montgomery's Introduction to Statistical Quality Control as a much broader introduction with less of an old-man-yells-at-cloud vibe -- https://www.amazon.com/Introduction-Statistical-Quality-Cont...
It talks about the difference between informative and motivational metrics - the former are just for awareness, while the latter are intended as explicit targets for employees. While it's easy for the former to bleed into the latter, there's nothing inherently wrong with measuring game-able statistics as long as the incentive to muck with them is minimised. Easier said than done, but not impossible.
The article is pretty dismissive of story counting, which is a semi-popular approach to estimation:
(Anecdotally, we had someone look at our teams' previous sprints and found that # stories / sprint was more consistent than points per sprint over the time frame they reviewed. YMMV.)
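That kind of consistency comparison is easy to run yourself: since stories and points are on different scales, compare their coefficients of variation rather than raw standard deviations. A sketch with made-up sprint data (not our team's actual numbers):

```python
from statistics import mean, stdev

def cv(xs):
    """Coefficient of variation: relative spread, so counts and points compare fairly."""
    return stdev(xs) / mean(xs)

# Hypothetical sprint history -- illustrative only.
stories_per_sprint = [8, 9, 8, 10, 9, 8]
points_per_sprint = [21, 34, 18, 40, 25, 30]

print(f"stories CV: {cv(stories_per_sprint):.2f}")
print(f"points  CV: {cv(points_per_sprint):.2f}")
# A lower CV means the metric is more consistent sprint to sprint.
```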
The focus on velocity-related metrics seems smart - Git Prime did some analysis on this and found that frequency of commits was a useful measure:
I don't see how these are any less game-able than klocs though.