iOS and macOS Performance Tuning: Cocoa, Cocoa Touch, Objective-C, and Swift (Developer's Library)

Category: Programming
Author: Marcel Weiher

Comments

by mpweiher   2019-07-12
Yes and no.

You are right in that changes to drawing induced by the original iPhone are responsible for at least part of the widgetization of CocoaTouch. The first iPhone(s) had a really, really slow CPU but somewhat decent GPU, so moving more rendering functions to the GPU made sense.

Originally, Cocoa as well as its NeXTstep predecessor did essentially all drawing on the CPU (some blitting on the NeXTdimension notwithstanding). And this was usually fast enough. At some point, window compositing was moved to the GPU (Quartz Compositor). With the phone, animations were both wanted for "haptics" and needed to cover for the slowness of the device (distract the monkey by animating things into place while we catch up... *g*), since the CPU was rather slow.

So instead of just compositing the contents of windows, CocoaTouch (via CoreAnimation) now could and would also composite the contents of views. But that's somewhat in conflict with the drawing model, and the conflict was never fully resolved.

> texture upload is too slow

First, you don't have to have separate textures for every bit of text. You can also just draw the text into a bigger view.

> redrawing your text each frame

Second, Cocoa does not redraw the entire screen each time, and does not have to redraw/reupload the texture each time (if it is using textures). It keeps track of damaged regions quite meticulously and only draws the parts that have changed, down to partial-view precision (if the views co-operate). Views that intersect the damage get their drawRect: method invoked, and that method gets a damage list so it can also optimise its drawing.
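To make the damage-tracking idea concrete, here is a minimal sketch in plain C. It is not how AppKit or UIKit implement it; the Rect type and the rects_intersect and redraw_damaged functions are hypothetical names made up for illustration. It only shows the principle: intersect each item's frame with the damaged rectangle and draw only the items that overlap.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical rectangle type for this sketch; not CGRect or NSRect. */
typedef struct {
    int x, y, width, height;
} Rect;

/* True when the two rectangles overlap in a region of non-zero area. */
bool rects_intersect(Rect a, Rect b)
{
    return a.x < b.x + b.width  && b.x < a.x + a.width &&
           a.y < b.y + b.height && b.y < a.y + a.height;
}

/* Invokes draw (if given) only for items whose frame touches the damage,
   and returns how many items that was. */
size_t redraw_damaged(const Rect *frames, size_t count, Rect damage,
                      void (*draw)(size_t index))
{
    size_t redrawn = 0;
    for (size_t i = 0; i < count; i++) {
        if (rects_intersect(frames[i], damage)) {
            if (draw) draw(i);
            redrawn++;
        }
    }
    return redrawn;
}
```

With four stacked rows and a small damaged area inside the second row, only that one row is drawn; the other three are never touched.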

Now if you actually have a texture living in the GPU and you are incapable of drawing into that texture, then you must replace the texture wholesale and the rectangle/view based optimisations won't work. However, I know that they do work, at least to some extent, because we were able to optimise animations on an iOS app by switching from layer-based drawing to a view with drawRect: and carefully computing and honouring the damage-rect. It went from using 100% CPU for 2-6 fps to 2% CPU at 60fps. (discussed in more detail with other examples in my book: iOS and macOS Performance Tuning: Cocoa, Cocoa Touch, Objective-C, and Swift, https://www.amazon.com/gp/product/0321842847/ref=as_li_tl?ie...)

Third, if your text does change, you have to redraw everything from scratch anyway.

Fourth, while the original phone was too slow for this and lots of other things, modern phones and computers are easily capable of doing that sort of drawing. The performance can sometimes be better using a pure texture approach and sometimes it is (much) better using a more drawing-centred approach (see above).

by mpweiher   2019-07-12
Yeah, Swift-most-everything is pretty slow, but particularly parsing/generating. Pre-Swift Foundation serialisation code was already...majestic, and in the Swift conversion they've typically managed to slow things down even further. Which didn't seem possible, but they managed.

I have given a bunch of talks[1] on this topic, and there's also a chapter in my iOS/macOS performance book[2], which I really recommend if you want to understand this particular topic. I did really fast XML[3][4], CSV[5] and binary plist parsers[6] for Cocoa and also a fast JSON serialiser[7]. All of these are usually around an order of magnitude faster than their Apple equivalents.

Sadly, I haven't gotten around to doing a JSON parser. One reason for this is that parsing the JSON at character level is actually the smaller problem, performance-wise, same as for XML. Performance tends to be largely determined by what you create as a result. If you create generic Foundation/Swift dictionaries/arrays/etc. you have already lost. The overhead of these generic data structures completely overwhelms the cost of scanning a few bytes.

So you need something more akin to a streaming interface, and if you create objects you must create them directly, without generic temporary objects. This is where XML is easier, because it has an opening tag that you can use to determine what object to create. With JSON, you get "{" so basically you have to know what structure level corresponds to what objects.
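As a rough illustration of "create objects directly", here is a minimal sketch in plain C. It is not the parser discussed above and not any Apple or MPWFoundation API; Point, PairHandler, point_set and scan_flat_object are hypothetical names. It handles only a tiny subset of JSON (one flat object, integer values, no escapes, no nesting), but it shows the shape: the scanner hands each key/value pair to a handler that writes straight into the target struct, and no dictionary is ever built.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical target type: the scanner fills this directly. */
typedef struct {
    long x;
    long y;
} Point;

/* Called once per key/value pair as it is scanned. */
typedef void (*PairHandler)(void *target, const char *key, size_t keyLen, long value);

/* Stores known keys in the Point; unknown keys are dropped without allocating. */
void point_set(void *target, const char *key, size_t keyLen, long value)
{
    Point *p = (Point *)target;
    if (keyLen == 1 && key[0] == 'x') p->x = value;
    else if (keyLen == 1 && key[0] == 'y') p->y = value;
}

static const char *skip_space(const char *s)
{
    while (isspace((unsigned char)*s)) s++;
    return s;
}

/* Scans a flat object such as {"x": 3, "y": 4}. Returns false on input
   outside the supported subset. */
bool scan_flat_object(const char *s, PairHandler handler, void *target)
{
    s = skip_space(s);
    if (*s++ != '{') return false;
    s = skip_space(s);
    if (*s == '}') return true;
    for (;;) {
        s = skip_space(s);
        if (*s++ != '"') return false;
        const char *key = s;                 /* key is a slice of the input */
        while (*s && *s != '"') s++;
        if (*s != '"') return false;
        size_t keyLen = (size_t)(s - key);
        s = skip_space(s + 1);
        if (*s++ != ':') return false;
        char *end;
        long value = strtol(s, &end, 10);
        if (end == s) return false;          /* no digits where a value should be */
        handler(target, key, keyLen, value);
        s = skip_space(end);
        if (*s == ',') { s++; continue; }
        return *s == '}';
    }
}
```

The hard-coded handler is exactly the "you have to know what structure level corresponds to what objects" part: the caller decides up front that this object is a Point.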

Maybe I should write that parser...

[1] https://www.google.com/search?hl=en&q=marcel%20weiher%20perf...

[2] https://www.amazon.com/gp/product/0321842847/

[3] https://github.com/mpw/Objective-XML

[4] https://blog.metaobject.com/2010/05/xml-performance-revisite...

[5] https://github.com/mpw/MPWFoundation/blob/master/Collections...

[6] https://github.com/mpw/MPWFoundation/blob/master/Collections...

[7] https://github.com/mpw/MPWFoundation/blob/master/Streams.sub...

by bluk   2019-05-14
The book iOS and macOS Performance Tuning ( https://www.amazon.com/iOS-macOS-Performance-Tuning-Objectiv... ) has pretty damning microbenchmarks of Swift versus Objective-C even without using any tricky optimizations (not what you originally asked but something to consider). Of course the whole book (written by a former Apple engineer IIRC) is pretty much full of performance gotchas across many Apple frameworks/APIs, but the chapter on Swift was pretty harsh and basically said Swift fails to live up to its name versus Objective-C (at least at the time the book was written).

I think most people think that Swift performs like Rust/C++/C since it doesn't have a garbage collector and imagine ARC is providing relatively free memory management, but it seems Swift performs closer to languages with a garbage collector due to the various design constraints on the language (interoperability with Objective-C, ARC isn't free, and maybe without the explicit lifetime/ownership declarations in the code like Rust, ARC can't be optimized as well).

by mpweiher   2017-12-20
> Generic types are opportunistically specialized and in my experience, the optimizer has gotten a bit better in that regard

That's always the answer: "the compiler has gotten better and will get better still". Your claim was that Objective-C has all this "extra work" and indirection, but Swift actually has more places where this applies, and pretends it does not. With Objective-C, what you see is what you get, the performance model is transparent and hackable. With Swift, the performance model is almost completely opaque and not really hackable.

> None of the above is possible in Objective-C, though, because of its type system.

What does the "type system" have to do with any of this? It is trivial to create, for example, extremely fast collections of primitive types with value semantics and without all this machinery. A little extra effort, but better and predictable performance. If you want it more generically, even NeXTSTEP 2.x had NXStorage, which allowed you to create contiguous collections of arbitrary structs.
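For readers who never saw that kind of class, here is a minimal sketch in plain C of a contiguous collection of arbitrary fixed-size structs. It is not the NeXTSTEP API; Storage, Sample and the storage_* functions are hypothetical names, and the growth policy (doubling from 8) is just an assumption for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Elements live back to back in one allocation; no per-element objects. */
typedef struct {
    unsigned char *bytes;
    size_t elementSize;
    size_t count;
    size_t capacity;
} Storage;

/* Hypothetical element type used in the usage note below. */
typedef struct {
    int id;
    double weight;
} Sample;

void storage_init(Storage *s, size_t elementSize)
{
    s->bytes = NULL;
    s->elementSize = elementSize;
    s->count = 0;
    s->capacity = 0;
}

/* Copies one element to the end, growing the buffer when it is full. */
bool storage_add(Storage *s, const void *element)
{
    if (s->count == s->capacity) {
        size_t newCapacity = s->capacity ? s->capacity * 2 : 8;
        unsigned char *grown =
            (unsigned char *)realloc(s->bytes, newCapacity * s->elementSize);
        if (!grown) return false;
        s->bytes = grown;
        s->capacity = newCapacity;
    }
    memcpy(s->bytes + s->count * s->elementSize, element, s->elementSize);
    s->count++;
    return true;
}

/* Returns a pointer into the buffer, or NULL when index is out of range. */
void *storage_at(const Storage *s, size_t index)
{
    return index < s->count ? s->bytes + index * s->elementSize : NULL;
}

void storage_free(Storage *s)
{
    free(s->bytes);
    storage_init(s, s->elementSize);
}
```

Usage is storage_init(&s, sizeof(Sample)), then storage_add(&s, &value) per element, and storage_at(&s, i) cast back to Sample *. The pointer returned by storage_at is only valid until the next storage_add, since growing may move the buffer.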

Oh...people seem to forget that Objective-C has structs. And unlike Swift structs they are predictable. Oh, and if you really want to get fancy you can implement poor-man's generics by creating a header with a "type variable" and including that in your .m file with the "type variable" #defined. Not sure I recommend it, but it is possible.
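A minimal sketch of the poor-man's-generics trick, in plain C. To keep it in one self-contained file it uses a macro that stamps out a typed array per element type, rather than the header-plus-#define arrangement described above; the idea is the same. DEFINE_ARRAY, IntArray and DoubleArray are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

/* Expands to a struct and an append function specialised for type T.
   Name becomes both the struct name and the function prefix. */
#define DEFINE_ARRAY(T, Name)                                        \
    typedef struct { T *items; size_t count, capacity; } Name;       \
    int Name##_append(Name *a, T value)                              \
    {                                                                \
        if (a->count == a->capacity) {                               \
            size_t cap = a->capacity ? a->capacity * 2 : 8;          \
            T *grown = (T *)realloc(a->items, cap * sizeof(T));      \
            if (!grown) return 0;                                    \
            a->items = grown;                                        \
            a->capacity = cap;                                       \
        }                                                            \
        a->items[a->count++] = value;                                \
        return 1;                                                    \
    }

/* Two concrete instantiations; each gets its own fully typed code. */
DEFINE_ARRAY(int, IntArray)
DEFINE_ARRAY(double, DoubleArray)
```

A zero-initialised IntArray is a valid empty array, so IntArray a = {0}; followed by IntArray_append(&a, 42) works, and a.items is released with free() when done.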

The fact that Foundation removed these helpful kinds of classes like NXStorage and wanted to pretend Objective-C is a pure OOPL is a faulty decision by the library creators, not a limitation of Objective-C. And that Foundation was gutted by CoreFoundation, making everything even slower still, was also a purely political project.

In general, you seem to be using "Objective-C" in this pure OOPL sense of "Objective-C without the C" (which is kind of weird because that is what Swift is supposed to be, according to the propaganda). Objective-C is a hybrid language consisting of C and a messaging layer on top. You write your components in C and connect them up using dynamic messaging. And even that layer is fairly trivial to optimize with IMP-caching, object-caching and retain/release elision.
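To show what IMP-caching buys you without pulling in the Objective-C runtime, here is a plain-C analogue. It is only an analogy: the lookup function below is a hypothetical stand-in for the runtime's method lookup (in real Objective-C you would fetch the IMP once with methodForSelector: and call through it), and the table, Method type and sum_* functions are made up for the sketch.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for a method implementation pointer. */
typedef long (*Method)(long receiver, long arg);

long add_impl(long receiver, long arg) { return receiver + arg; }
long mul_impl(long receiver, long arg) { return receiver * arg; }

typedef struct {
    const char *selector;
    Method imp;
} MethodEntry;

static const MethodEntry methods[] = {
    { "add:", add_impl },
    { "mul:", mul_impl },
};

/* Stand-in for dynamic dispatch: finds the implementation by name. */
Method lookup(const char *selector)
{
    for (size_t i = 0; i < sizeof methods / sizeof methods[0]; i++) {
        if (strcmp(methods[i].selector, selector) == 0) return methods[i].imp;
    }
    return NULL;
}

/* Pays for the lookup on every iteration. */
long sum_dynamic(const long *values, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) total = lookup("add:")(total, values[i]);
    return total;
}

/* Looks up once, then calls straight through the cached pointer. */
long sum_cached(const long *values, size_t n)
{
    Method add = lookup("add:");
    long total = 0;
    for (size_t i = 0; i < n; i++) total = add(total, values[i]);
    return total;
}
```

Both functions compute the same result; the cached version just hoists the dispatch cost out of the loop, which is the whole trick.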

Chapter 9 goes into a lot of detail on Swift: https://www.amazon.com/gp/product/0321842847/

A few Swift issues surprised me, to be honest. For example native Swift dictionaries with primitive types (should be a slam dunk with value types and generics) are significantly slower than NSDictionary from Objective-C, which isn't exactly a high performance dictionary implementation. About 1.8x with optimizations, 3.5x without.

This is another point. The gap between Swift and Objective-C widens a lot with unoptimized code. Sometimes comically so, 10x isn't unusual and I've seen 100x and 1000x. This of course means that optimized Swift code is a dance on the volcano. Since optimizations aren't guaranteed and there are no diagnostics, your code can turn into a lead balloon at any time.

And of course debug builds in Xcode are compiled with optimization off. That means for some code either (a) the unoptimized build will be unusable or (b) all those optimizations actually don't matter. See "The Death of Optimizing Compilers" by D.J. Bernstein.

Anyway, you asked for some links (without providing any yourself):

https://github.com/helje5/http-c-vs-swift

https://github.com/bignerdranch/Freddy/wiki/JSONParser

"Several seconds to parse 1.5MB JSON files"

https://github.com/owensd/swift-perf

But really, all you need to do is run some real-world code.

You also mention looking at the assembly output of the Swift compiler to tune your program. This alone should be an indication that either (a) you work on the Swift compiler team or (b) you are having to expend a lot more effort on getting your Swift code to perform than you should. Or both.

by mpweiher   2017-10-23
The Knuth version goes on:

"Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified."

Above the famous quote:

"The conventional wisdom shared by many of today's software engineers calls for ignoring efficiency in the small; but I believe this is simply an overreaction to the abuses they see being practiced by penny-wise-and-pound-foolish programmers, who can't debug or maintain their "optimized" programs. In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal; and I believe the same viewpoint should prevail in software engineering. Of course I wouldn't bother making such optimizations on a one-shot job, but when it's a question of preparing quality programs, I don't want to restrict myself to tools that deny me such efficiencies."

All this from "Structured Programming with go to Statements"[1], which is an advocacy piece for optimization. And as I've written before, we as an industry typically squander many orders of magnitude of performance. An iPhone has significantly more CPU horsepower than a Cray 1 supercomputer, yet we actually think it's OK that programs have problems when their data-sets increase to over a hundred small entries (notes/tasks/etc.).

Anyway, I write a lot more about this in my upcoming book: "iOS and macOS Performance Tuning: Cocoa, Cocoa Touch, Objective-C, and Swift"[2]

[2] https://www.amazon.com/iOS-macOS-Performance-Tuning-Objectiv...

by mpweiher   2017-09-18
Lots of mobile apps are written in Objective-C, and Objective-C is C. A superset of C to be exact.

Since it's a single hybrid language, it's trivial to remove slower features from performance-intensive parts.

See my UIKonf talk https://www.youtube.com/watch?v=kHG_zw7%205SjE&feature=youtu...

Or my book: https://www.amazon.com/gp/product/0321842847/ref=as_li_tl?ie...

by mpweiher   2017-08-19
Yes.

For example, I wrote "iOS and macOS Performance Tuning: Cocoa, Cocoa Touch, Objective-C, and Swift"[1][2] using LaTeX, and I think it came out rather well (Pearson has some pretty amazing LaTeX compositors that took my rough ramblings and turned them into something beautiful).

Quite a while ago, I also used TeX (not LaTeX, IIRC) as part of the typesetting backend of a database publishing tool for the international ISBN agency, to publish the PID (Publisher's International Directory). This was a challenging project. IIRC, each of the directories (there were several) was >1000 pages of 4-column text in about a 4 point font. Without chapter breaks. My colleagues tried FrameMaker first on a subset; they let it run overnight and by morning it had kernel-panicked the NeXTstation we were running it on. The box had run out of swap.

TeX was great, it just chugged away at around 1-4 pages per second and never missed a beat. Customer was very happy. The most difficult part was getting TeX not to try so hard to get a "good layout", which wasn't possible given the constraints and for these types of entries just made everything look worse.

[1] https://www.pearsonhighered.com/program/Weiher-i-OS-and-mac-...

[2] https://www.amazon.com/gp/product/0321842847/ref=as_li_tl?ie...

by mpweiher   2017-08-19
> "this cannot be the case, because ...".

What I should have added is that invariably, the problem would be in one of these places that they had eliminated by reasoning.

While you obviously need to think about your code, otherwise you can't formulate useful hypotheses, you then must validate those hypotheses. And if you've done any performance work, you will probably know that those hypotheses are also almost invariably wrong. Which is why performance work without measurement is usually either useless or downright counterproductive. Why should it be different for other aspects of code?
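As a small illustration of "validate the hypothesis", here is a minimal timing harness in plain C. It is only a sketch: Workload, best_cpu_seconds, SumJob and sum_to_n are hypothetical names, clock() measures CPU time at coarse resolution, and a real investigation would use a proper profiler such as Instruments.

```c
#include <time.h>

/* The piece of code under suspicion, wrapped so it can be timed. */
typedef void (*Workload)(void *context);

/* Runs the workload `runs` times and returns the fastest CPU time in
   seconds, or -1.0 if runs is not positive. Taking the minimum filters
   out some scheduling noise. */
double best_cpu_seconds(Workload work, void *context, int runs)
{
    double best = -1.0;
    for (int i = 0; i < runs; i++) {
        clock_t start = clock();
        work(context);
        double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (best < 0.0 || elapsed < best) best = elapsed;
    }
    return best;
}

/* Hypothetical workload: sums 1..n and stores the result. */
typedef struct {
    long n;
    long result;
} SumJob;

void sum_to_n(void *context)
{
    SumJob *job = (SumJob *)context;
    long total = 0;
    for (long i = 1; i <= job->n; i++) total += i;
    job->result = total;
}
```

The hypothesis ("this loop is the bottleneck") comes first; a call like best_cpu_seconds(sum_to_n, &job, 5) then either supports it or sends you looking elsewhere.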

Again, needing to form hypotheses is obviously crucial (I also talk about this in my performance book, iOS and macOS Performance Tuning [1]), but I've also seen a lot of waste in just gathering reams of data without knowing what you're looking for.

That's why I wrote experimentalist, not "data gatherer". An experiment requires a hypothesis.

[1] https://www.amazon.com/gp/product/0321842847/ref=as_li_tl?ie...

by mpweiher   2017-08-19
Hmm...sure puts all the criticism Apple got for not waiting for Kaby Lake for its MacBook Pros into perspective.

We are in an effective post-Moore's law world, and have been for a couple of years. Yes, we can still put more transistors on the chip, but we are pretty much done with single core performance, at least until some really big breakthrough.

On the other hand, as another poster pointed out, we really don't need all that much more performance, as most of the performance of current chips isn't actually put to good use, but instead squandered[1]. (My 1991 NeXT Cube with 25 MHz '40 was pretty much as good for word processing as anything I can get now, and you could easily go back further).

Most of the things that go into squandering CPU don't parallelize well, so removing the bloat is actually starting to become cheaper again than trying to combat it with more silicon. And no, I am not just saying that to promote my upcoming book[2], I've actually been saying the same thing since before I started writing it.

Interesting times.

[2] https://www.amazon.com/MACOS-PERFORMANCE-TUNING-Developers-L...