Learning Core Audio: A Hands-On Guide to Audio Programming for Mac and iOS

Category: Engineering
Authors: Chris Adamson, Kevin Avila
Rating: 4.5
Stack Overflow mentions: 15 all time, 5 this year, 4 this month

Comments

by holy_city   2019-09-09
I was a huge fan of your app! We used to play around with it a bit in high school.

To be fair to Apple here (and this was way before my time, so I'm probably mixing stuff up): GarageBand dates to 2004 and was (iirc) built on top of the Logic Pro audio engine Apple picked up with the eMagic acquisition. And back then you still had to deal with the raw Core Audio API on Mac OS X (if anyone wants an adventure, try finding the documentation for it... you'll have to generate it yourself!). I recently had to dig through the old Core Audio mailing list and read a book [1] on the API for low-latency stuff, and I'm guessing that in terms of their engine there wasn't much that needed to change (at least architecturally) to port it to iOS. I seem to recall the biggest difference between the desktop and mobile versions was the feature set and UI, and I think the problem you mention was present even in Logic. You had to get hacky with the EXS24's key/velocity maps to do what you describe.

From working in the pro audio world for a while, I've found that other folks in the space just love to talk shop about how they do things. Audio devs are nerds like that. It doesn't always mean people are stealing from others.

[1] https://www.amazon.com/Learning-Core-Audio-Hands-Programming...

by anonymous   2019-07-21

The solution is to use the ExtAudioFile API.

I found it while reading the most excellent Core Audio bible.
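
In rough terms the pattern looks like this (an untested sketch from memory, not the book's code; the file URL, client format, buffer size, and missing error checks are placeholders):

#import <AudioToolbox/AudioToolbox.h>

static void ReadSomePCM(NSURL *url)   // url points at e.g. an .m4a or .caf file
{
    ExtAudioFileRef file;
    ExtAudioFileOpenURL((__bridge CFURLRef)url, &file);

    // ask ExtAudioFile to convert to 16-bit interleaved stereo PCM as it reads
    AudioStreamBasicDescription clientFormat = {0};
    clientFormat.mSampleRate       = 44100.0;
    clientFormat.mFormatID         = kAudioFormatLinearPCM;
    clientFormat.mFormatFlags      = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
    clientFormat.mChannelsPerFrame = 2;
    clientFormat.mBitsPerChannel   = 16;
    clientFormat.mBytesPerFrame    = 4;   // 2 channels * 2 bytes
    clientFormat.mFramesPerPacket  = 1;
    clientFormat.mBytesPerPacket   = 4;
    ExtAudioFileSetProperty(file, kExtAudioFileProperty_ClientDataFormat,
                            sizeof(clientFormat), &clientFormat);

    // read one chunk of converted frames
    SInt16 samples[4096];
    AudioBufferList bufferList;
    bufferList.mNumberBuffers = 1;
    bufferList.mBuffers[0].mNumberChannels = 2;
    bufferList.mBuffers[0].mDataByteSize   = sizeof(samples);
    bufferList.mBuffers[0].mData           = samples;
    UInt32 frameCount = sizeof(samples) / clientFormat.mBytesPerFrame;
    ExtAudioFileRead(file, &frameCount, &bufferList);   // frameCount now holds frames actually read

    ExtAudioFileDispose(file);
}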

by anonymous   2019-07-21

Core Audio is the Apple way, and there are lots of examples online for working with it. To apply filters you would use Audio Units.
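
For a rough idea of what that looks like in practice, here is a sketch of my own (not from any of the linked resources, and error checking is omitted): build an AUGraph, add an effect node such as a low-pass filter, and connect it to the output unit.

#import <AudioToolbox/AudioToolbox.h>

static void BuildFilterGraph(void)
{
    AUGraph graph;
    NewAUGraph(&graph);

    AudioComponentDescription filterDesc = {
        .componentType         = kAudioUnitType_Effect,
        .componentSubType      = kAudioUnitSubType_LowPassFilter,
        .componentManufacturer = kAudioUnitManufacturer_Apple
    };
    AudioComponentDescription outputDesc = {
        .componentType         = kAudioUnitType_Output,
        .componentSubType      = kAudioUnitSubType_RemoteIO,   // iOS output unit
        .componentManufacturer = kAudioUnitManufacturer_Apple
    };

    AUNode filterNode, outputNode;
    AUGraphAddNode(graph, &filterDesc, &filterNode);
    AUGraphAddNode(graph, &outputDesc, &outputNode);
    AUGraphOpen(graph);

    // filter output 0 -> output unit input 0; your source (render callback,
    // mixer, file player, etc.) feeds the filter's input
    AUGraphConnectNodeInput(graph, filterNode, 0, outputNode, 0);

    AudioUnit filterUnit;
    AUGraphNodeInfo(graph, filterNode, NULL, &filterUnit);
    AudioUnitSetParameter(filterUnit, kLowPassParam_CutoffFrequency,
                          kAudioUnitScope_Global, 0, 1000.0, 0);   // 1 kHz cutoff

    AUGraphInitialize(graph);
    AUGraphStart(graph);
}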

A new audio engine called The Amazing Audio Engine, built on top of Core Audio, has just been released and might be useful.

A good book is Chris Adamson's Learning Core Audio: A Hands-on Guide to Audio Programming for Mac and iOS

by anonymous   2019-07-21

I feel that this book gives a good introductory overview of Audio Units (better than anything Apple provides) and equips the reader with the tools to build more complicated programs.

http://www.amazon.com/Learning-Core-Audio-Hands-On-Programming/dp/0321636848

Chapter 7 of the book has an example of adding a reverb effect, so you can use that as a guide for whatever audio effects you want to add. As for the equalizer functions, there are Audio Units for those too, also covered in Chapter 7. If none of these accomplish what you want, you can always intercept the PCM audio data in the render callback and manipulate the raw samples yourself, though that requires knowing some DSP, which is too advanced for me.
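
For reference, these are roughly the effect subtypes being talked about on iOS (just my sketch of the relevant constants, not code from the book); they slot into an AUGraph like any other effect unit:

#import <AudioToolbox/AudioToolbox.h>

AudioComponentDescription reverbDesc = {
    .componentType         = kAudioUnitType_Effect,
    .componentSubType      = kAudioUnitSubType_Reverb2,   // iOS reverb effect
    .componentManufacturer = kAudioUnitManufacturer_Apple
};
AudioComponentDescription eqDesc = {
    .componentType         = kAudioUnitType_Effect,
    .componentSubType      = kAudioUnitSubType_NBandEQ,   // multi-band EQ
    .componentManufacturer = kAudioUnitManufacturer_Apple
};
// once the units are in a graph you tweak them with AudioUnitSetParameter,
// e.g. kReverb2Param_DryWetMix (0..100) for the reverb, and the per-band
// kAUNBandEQParam_Frequency / kAUNBandEQParam_Gain parameters for the EQ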

Here are some useful posts I found:

http://www.deluge.co/?q=content/coreaudio-iphone-creating-graphic-equalizer

http://www.musicdsp.org/files/Audio-EQ-Cookbook.txt

Getting started with programmatic audio

by anonymous   2017-08-20

Not sure how much of an answer this is, but it's too much text and too many links for a comment, and hopefully it will help (or at least guide you toward your answer).

First off, I know from my current project that adjusting the sample rate will affect the speed of the sound, so you can try playing with those settings. But 44.1 kHz is what I see in most default implementations, including Apple's SpeakHere example. I would also spend some time comparing your code to that example, because there are quite a few differences, like checking before enqueueing.

First, check out this posting, which talks about how you need to know the audio format (specifically, how many bytes are in a frame) and cast appropriately: https://stackoverflow.com/a/4299665/530933
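
To illustrate what "knowing the format" means (my own example, not taken from the linked post): for 16-bit interleaved stereo LPCM at 44.1 kHz, a frame is 2 channels * 2 bytes = 4 bytes, so you cast the buffer to SInt16 * and step through two samples per frame.

#import <AudioToolbox/AudioToolbox.h>

// 16-bit interleaved stereo LPCM at 44.1 kHz
AudioStreamBasicDescription asbd = {
    .mSampleRate       = 44100.0,   // a mismatch here is what makes playback sound fast or slow
    .mFormatID         = kAudioFormatLinearPCM,
    .mFormatFlags      = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked,
    .mChannelsPerFrame = 2,
    .mBitsPerChannel   = 16,
    .mBytesPerFrame    = 4,         // this is the "how many bytes in a frame" number
    .mFramesPerPacket  = 1,         // always 1 for LPCM
    .mBytesPerPacket   = 4
};
// with this format you cast a buffer to SInt16 * and walk it
// two samples (left, right) per frame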

Also, good luck. I have posted quite a few questions here, on the Apple forums, and on the iOS forum (not the official one), with very few responses or much help. To get where I am today (audio recording and streaming in ulaw) I ended up having to open an Apple Developer Technical Support ticket, which, prior to tackling audio, I never even knew existed. One good thing is that if you have a valid dev account you get two incidents for free! Core Audio is not fun. Documentation is sparse, and besides SpeakHere there are not many examples. One thing I did find is that the framework headers have some good info, and so does this book. Unfortunately I have only started the book, otherwise I might be able to help you further.

You can also check out some of my own postings, which I have tried to answer to the best of my ability. This is my main audio question, on which I have spent a lot of time compiling all the pertinent links and code.

using AQRecorder (audioqueue recorder example) in an objective c class

trying to use AVAssetWriter for ulaw audio (2)

by anonymous   2017-08-20

I will answer your second question first: don't wait for the app to crash. You can stop pulling audio from the track by checking whether the CMSampleBufferRef you just read is NULL or contains no samples; for example (this check is also included in the second half of my answer):

CMSampleBufferRef sample = [readerOutput copyNextSampleBuffer];

// check for NULL before asking the buffer for its sample count
if (sample == NULL || CMSampleBufferGetNumSamples(sample) == 0) {
  // handle end of audio track here
  return;
}

Regarding your first question, it depends on the type of audio you are grabbing: it could be either PCM (uncompressed) or VBR (compressed) data. I'm not even going to bother addressing the PCM case, because it's simply not smart to send uncompressed audio from one phone to another over the network; it's unnecessarily expensive and will clog your networking bandwidth. So we're left with VBR data. For that you've got to send the contents of the AudioBuffer and the AudioStreamPacketDescription you pulled from the sample. But then again, it's probably easiest to explain what I mean with code:

-(void)broadcastSample
{
    [broadcastLock lock];

    CMSampleBufferRef sample = [readerOutput copyNextSampleBuffer];

    // check for NULL before asking the buffer for its sample count
    if (sample == NULL || CMSampleBufferGetNumSamples(sample) == 0) {
        Packet *packet = [Packet packetWithType:PacketTypeEndOfSong];
        packet.sendReliably = NO;
        [self sendPacketToAllClients:packet];
        [sampleBroadcastTimer invalidate];
        if (sample) CFRelease(sample);
        [broadcastLock unlock];
        return;
    }

    NSLog(@"SERVER: going through sample loop");
    Boolean isBufferDataReady = CMSampleBufferDataIsReady(sample);

    CMBlockBufferRef CMBuffer = CMSampleBufferGetDataBuffer(sample);
    AudioBufferList audioBufferList;

    CheckError(CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(
                   sample,
                   NULL,
                   &audioBufferList,
                   sizeof(audioBufferList),
                   NULL,
                   NULL,
                   kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment,
                   &CMBuffer),
               "could not read sample data");

    const AudioStreamPacketDescription *inPacketDescriptions;
    size_t packetDescriptionsSizeOut;
    size_t inNumberPackets;

    CheckError(CMSampleBufferGetAudioStreamPacketDescriptionsPtr(sample,
                                                                 &inPacketDescriptions,
                                                                 &packetDescriptionsSizeOut),
               "could not read sample packet descriptions");

    inNumberPackets = packetDescriptionsSizeOut / sizeof(AudioStreamPacketDescription);

    AudioBuffer audioBuffer = audioBufferList.mBuffers[0];

    for (int i = 0; i < inNumberPackets; ++i)
    {
        NSLog(@"going through packets loop");
        SInt64 dataOffset = inPacketDescriptions[i].mStartOffset;
        UInt32 dataSize   = inPacketDescriptions[i].mDataByteSize;

        size_t packetSpaceRemaining      = MAX_PACKET_SIZE - packetBytesFilled - packetDescriptionsBytesFilled;
        size_t packetDescrSpaceRemaining = MAX_PACKET_DESCRIPTIONS_SIZE - packetDescriptionsBytesFilled;

        // if the outgoing packet is full, ship it before appending this audio packet
        if ((packetSpaceRemaining < (dataSize + AUDIO_STREAM_PACK_DESC_SIZE)) ||
            (packetDescrSpaceRemaining < AUDIO_STREAM_PACK_DESC_SIZE))
        {
            if (![self encapsulateAndShipPacket:packet packetDescriptions:packetDescriptions packetID:assetOnAirID])
                break;
        }

        memcpy((char *)packet + packetBytesFilled,
               (const char *)audioBuffer.mData + dataOffset, dataSize);

        // encapsulatePacketDescription: returns a malloc'd buffer, so free it after copying
        char *encapsulatedDesc = [self encapsulatePacketDescription:inPacketDescriptions[i]
                                                        mStartOffset:packetBytesFilled];
        memcpy((char *)packetDescriptions + packetDescriptionsBytesFilled,
               encapsulatedDesc, AUDIO_STREAM_PACK_DESC_SIZE);
        free(encapsulatedDesc);

        packetBytesFilled             += dataSize;
        packetDescriptionsBytesFilled += AUDIO_STREAM_PACK_DESC_SIZE;

        // if this is the last packet, then ship it
        if (i == (inNumberPackets - 1)) {
            NSLog(@"woooah! this is the last packet (%d).. so we will ship it!", i);
            if (![self encapsulateAndShipPacket:packet packetDescriptions:packetDescriptions packetID:assetOnAirID])
                break;
        }
    }

    // balance copyNextSampleBuffer and the retained block buffer
    CFRelease(CMBuffer);
    CFRelease(sample);

    [broadcastLock unlock];
}

Some of the methods used in the above code you don't need to worry about, such as the ones that add headers to each packet (I was creating my own protocol; you can create your own). For more info see this tutorial.

- (BOOL)encapsulateAndShipPacket:(void *)source
              packetDescriptions:(void *)packetDescriptions
                        packetID:(NSString *)packetID
{

    // package Packet
    char * headerPacket = (char *)malloc(MAX_PACKET_SIZE + AUDIO_BUFFER_PACKET_HEADER_SIZE + packetDescriptionsBytesFilled);

    appendInt32(headerPacket, 'SNAP', 0);    
    appendInt32(headerPacket,packetNumber, 4);    
    appendInt16(headerPacket,PacketTypeAudioBuffer, 8);   
    // we use this so that we can add int32s later
    UInt16 filler = 0x00;
    appendInt16(headerPacket,filler, 10);    
    appendInt32(headerPacket, packetBytesFilled, 12);
    appendInt32(headerPacket, packetDescriptionsBytesFilled, 16);    
    appendUTF8String(headerPacket, [packetID UTF8String], 20);


    int offset = AUDIO_BUFFER_PACKET_HEADER_SIZE;        
    memcpy((char *)(headerPacket + offset), (char *)source, packetBytesFilled);

    offset += packetBytesFilled;

    memcpy((char *)(headerPacket + offset), (char *)packetDescriptions, packetDescriptionsBytesFilled);

    NSData *completePacket = [NSData dataWithBytes:headerPacket length: AUDIO_BUFFER_PACKET_HEADER_SIZE + packetBytesFilled + packetDescriptionsBytesFilled];        



    NSLog(@"sending packet number %lu to all peers", packetNumber);
    NSError *error;    
    if (![_session sendDataToAllPeers:completePacket withDataMode:GKSendDataReliable error:&error])   {
        NSLog(@"Error sending data to clients: %@", error);
    }   

    Packet *packet = [Packet packetWithData:completePacket];

    // reset packet 
    packetBytesFilled = 0;
    packetDescriptionsBytesFilled = 0;

    packetNumber++;
    free(headerPacket);    
    //  free(packet); free(packetDescriptions);
    return YES;

}

- (char *)encapsulatePacketDescription:(AudioStreamPacketDescription)inPacketDescription
                          mStartOffset:(SInt64)mStartOffset
{
    // the encapsulated description is smaller than the real struct because
    // mStartOffset is sent as a 32-bit integer rather than the original 64-bit value
    char * packetDescription = (char *)malloc(AUDIO_STREAM_PACK_DESC_SIZE);

    appendInt32(packetDescription, (UInt32)mStartOffset, 0);
    appendInt32(packetDescription, inPacketDescription.mVariableFramesInPacket, 4);
    appendInt32(packetDescription, inPacketDescription.mDataByteSize,8);    

    return packetDescription;
}

receiving data:

- (void)receiveData:(NSData *)data fromPeer:(NSString *)peerID inSession:(GKSession *)session context:(void *)context
{

    Packet *packet = [Packet packetWithData:data];
    if (packet == nil)
    {
         NSLog(@"Invalid packet: %@", data);
        return;
    }

    Player *player = [self playerWithPeerID:peerID];

    if (player != nil)
    {
        player.receivedResponse = YES;  // this is the new bit
    } else {
        // note: assign to the outer "player" rather than declaring a new one,
        // otherwise the code below is reached with a nil player
        player = [[Player alloc] init];
        player.peerID = peerID;
        [_players setObject:player forKey:player.peerID];
    }

    if (self.isServer)
    {
        [Logger Log:@"SERVER: we just received packet"];   
        [self serverReceivedPacket:packet fromPlayer:player];

    }
    else
        [self clientReceivedPacket:packet];
}

notes:

  1. There are a lot of networking details that I didn't cover here (i.e., in the receiving-data part I used a lot of custom-made objects without expanding on their definitions), because explaining all of that is beyond the scope of a single answer on SO. However, you can follow the excellent tutorial by Ray Wenderlich. He takes his time explaining networking principles, and the architecture I use above is taken almost verbatim from him. HOWEVER, THERE IS A CATCH (see the next point).

  2. Depending on your project, GKSession may not be suitable (especially if your project is real-time, or if you need more than 2-3 devices to connect simultaneously), as it has a lot of limitations. You will have to dig deeper and use Bonjour directly instead. iPhone Cool Projects has a nice, quick chapter with a good example of using Bonjour services. It's not as scary as it sounds (and the Apple documentation is kind of overbearing on that subject).

  3. I noticed you use GCD for your multithreading. Again, if you are dealing with real-time audio, then you don't want to use high-level frameworks that do the heavy lifting for you (GCD is one of them). For more on this subject read this excellent article. Also read the prolonged discussion between me and justin in the comments of this answer.

  4. You may want to check out MTAudioProcessingTap introduced in iOS 6. It can potentially save you some hassle while dealing with AVAssets. I didn't test this stuff though. It came out after I did all my work.

  5. Last but not least, you may want to check out the Learning Core Audio book. It's a widely acknowledged reference on this subject. I remember being as stuck as you are at the point where you asked this question. Core Audio is heavy duty, and it takes time to sink in. SO will only give you pointers; you will have to take your time to absorb the material yourself, and then you will figure out how things work. Good luck!

by anonymous   2017-08-20

You can find an example that uses this ring buffer if you download the example code for the book Learning Core Audio here (under the Downloads tab). Jump to the Chapter 8 example, in a folder called CH08_AUGraphInput.

However, if you are simply reading audio from a file, then using an (extra) ring buffer seems like overkill. A ring buffer comes in handy when you have real-time (or near real-time) input and output (read Chapter 8 of said book for a more detailed explanation of when a ring buffer is necessary; note that the Chapter 8 example is about playing audio immediately after recording it with a mic, which isn't what you want to do).

The reason I said an extra ring buffer is that Core Audio already gives you an audio queue, which can be thought of as a ring buffer, or at least, in your case, it replaces the need for one: you populate it with data, it plays the data, and then it fires a callback to inform you that the data you supplied has been played. The Apple documentation offers a good explanation of this.
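
In rough outline, that pattern looks like this (an untested sketch, not from the book or the documentation; MyFillBuffer, the format, and the buffer size are hypothetical placeholders):

#include <AudioToolbox/AudioToolbox.h>

// hypothetical helper that copies the next chunk of file data into inBuffer
// and sets inBuffer->mAudioDataByteSize; stop enqueueing when the file is done
extern void MyFillBuffer(void *userData, AudioQueueBufferRef inBuffer);

static void MyOutputCallback(void *inUserData, AudioQueueRef inAQ, AudioQueueBufferRef inBuffer)
{
    MyFillBuffer(inUserData, inBuffer);               // refill from the file
    AudioQueueEnqueueBuffer(inAQ, inBuffer, 0, NULL); // hand it back to the queue
}

static void StartPlayback(void *userData, const AudioStreamBasicDescription *format, UInt32 bufferByteSize)
{
    AudioQueueRef queue;
    AudioQueueNewOutput(format, MyOutputCallback, userData, NULL, NULL, 0, &queue);
    for (int i = 0; i < 3; i++) {                     // three buffers in flight is the usual pattern
        AudioQueueBufferRef buffer;
        AudioQueueAllocateBuffer(queue, bufferByteSize, &buffer);
        MyOutputCallback(userData, queue, buffer);    // prime the queue before starting
    }
    AudioQueueStart(queue, NULL);
}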

In your case, if you are simply reading audio from a file, then you can easily control the throughput of the audio from the file. You can pause it by blocking the thread that reads data from the audio file for example.

For a simple example of what I'm talking about, see this example I created on github. For a more advanced example, see Matt Gallagher's famous example.

by anonymous   2017-08-20

I think you're right - in this case a 'direct-from-disk' buffering approach is probably what you need. I believe the correct AudioUnit subtype is AudioFilePlayer. From the documentation:

The unit reads and converts audio file data into its own internal buffers. It performs disk I/O on a high-priority thread shared among all instances of this unit within a process. Upon completion of a disk read, the unit internally schedules buffers for playback.

A working example of using this unit on Mac OS X is given in Chris Adamson's book Learning Core Audio. The code for iOS isn't much different, and is discussed in this thread on the CoreAudio-API mailing list. Adamson's working code example can be found here. You should be able to adapt this to your requirements.
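
For orientation, the property sequence the AudioFilePlayer unit expects looks roughly like this (paraphrased from memory rather than copied from Adamson's example; fileAU, fileURL, and error handling are assumed):

#include <AudioToolbox/AudioToolbox.h>

static void ScheduleFileOnPlayerUnit(AudioUnit fileAU, CFURLRef fileURL)
{
    // hand the opened file to the unit
    AudioFileID audioFile;
    AudioFileOpenURL(fileURL, kAudioFileReadPermission, 0, &audioFile);
    AudioUnitSetProperty(fileAU, kAudioUnitProperty_ScheduledFileIDs,
                         kAudioUnitScope_Global, 0, &audioFile, sizeof(audioFile));

    // schedule one region starting at frame 0; (UInt32)-1 is often used for "to the end"
    ScheduledAudioFileRegion region = {0};
    region.mTimeStamp.mFlags      = kAudioTimeStampSampleTimeValid;
    region.mTimeStamp.mSampleTime = 0;
    region.mAudioFile    = audioFile;
    region.mLoopCount    = 0;
    region.mStartFrame   = 0;
    region.mFramesToPlay = (UInt32)-1;
    AudioUnitSetProperty(fileAU, kAudioUnitProperty_ScheduledFileRegion,
                         kAudioUnitScope_Global, 0, &region, sizeof(region));

    // prime, then schedule playback to begin as soon as possible (-1)
    UInt32 primeFrames = 0;   // 0 = default priming
    AudioUnitSetProperty(fileAU, kAudioUnitProperty_ScheduledFilePrime,
                         kAudioUnitScope_Global, 0, &primeFrames, sizeof(primeFrames));
    AudioTimeStamp startTime = {0};
    startTime.mFlags      = kAudioTimeStampSampleTimeValid;
    startTime.mSampleTime = -1;
    AudioUnitSetProperty(fileAU, kAudioUnitProperty_ScheduleStartTimeStamp,
                         kAudioUnitScope_Global, 0, &startTime, sizeof(startTime));
}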

by anonymous   2017-08-20

If you're serious about learning Core Audio, do yourself a favour and get this book. It got me started, and Core Audio is not easy by any means! http://www.amazon.com/Learning-Core-Audio-Hands-Programming/dp/0321636848

Pier.