Design decisions for AIProxy’s Anthropic Swift SDK

December 16, 2025 by Lou Zell

AIProxySwift has been updated to publicly expose additional Anthropic functionality to users. The update, which started from a simple feature request, led me to revisit fundamental decisions baked into the library.

I arrived at the position that it's our responsibility to surface as much provider-specific functionality as possible, at the cost of use-site ergonomics and any ambitions of a unified API.

The new design acknowledges the following facts:

  1. AI providers will continue to adapt their APIs in special-purpose ways.

  2. It is an anti-goal of this library to unify them. There are other libraries and services in the ecosystem that satisfy this need. Instead, our role is to surface as much of the provider's functionality as possible.

  3. Adopting provider-specific types and patterns is better than custom design for contributors, customers, and LLMs. A 1:1 mapping of provider types to library types helps a contributor get their bearings immediately. They don't have to do SDK archaeology to understand how an Anthropic type maps into our own custom design.

     Likewise, pointing Claude Code or Codex at Anthropic's .md files to modify the SDK has a greater chance of success when the types are mapped 1:1. There is less for the AI to disentangle and therefore less chance of error.

The Anthropic client within the library has been rewritten with these views in mind. In the process, breaking changes were introduced. If you're an existing customer, see the migration guide, which contains code-level details on how to resolve them.

How we prevent future breaking changes


Don't destructure enum cases

We no longer destructure the fields of union types into enum cases, and instead give each case its own type:

// We no longer do this:
enum MyUnion {
    case textBlock(String)
    case imageBlock(data: Data, mimeType: String)
}

// In favor of this:
enum MyUnion {
    case textBlock(TextBlock)
    case imageBlock(ImageBlock)
}

It's not immediately obvious why this is any better, since the added indirection makes the call site more verbose, which is annoying. But consider what happens when Anthropic adds something like a prompt caching feature. In the first design, we have to update the enum:

enum MyUnion {
    case textBlock(String, cacheControl: CacheControl?)
    case imageBlock(data: Data, mimeType: String, cacheControl: CacheControl?)
}

And the next time our customers update the library, the compiler throws errors at them because their existing pattern-matching cases no longer match the shape of the enum.
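To make the breakage concrete, here is a minimal sketch (using String? as a stand-in for a CacheControl type) of a customer's call site after the enum gains the new associated value. The old pattern `case .textBlock(let text)` no longer matches, so the customer is forced to edit their switch:

```swift
enum MyUnion {
    case textBlock(String, cacheControl: String?)
}

func text(from block: MyUnion) -> String {
    switch block {
    // Customers previously wrote `case .textBlock(let text)`.
    // After cacheControl was added, that pattern stopped compiling,
    // and every call site had to change to this:
    case .textBlock(let text, _):
        return text
    }
}
```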

Contrast that with the dedicated-struct approach, where the enum case stays as it was:

case textBlock(TextBlock)

And an optional field is added to TextBlock:

struct TextBlock: Decodable {
    let text: String
    let cacheControl: CacheControl?
}

Existing call-sites continue to compile with this scheme.

Introduce indirection

Consider streaming responses with multiple types of deltas (e.g. a text delta and a thinking text delta). One way to accommodate them is to collapse them into a single helper, e.g.:

for try await chunk in stream {
    print(chunk.deltaContent)
}

This is undeniably a simple call-site, but it has multiple problems:

  1. The developer has no way of routing thinking text to a different spot in the UI than non-thinking text.
  2. New delta types may be introduced that cannot reasonably fit this pattern. Consider citations_delta: how would you collapse that into a single string?

Verbose call-sites with indirection allow for flexibility when Anthropic introduces new deltas. Version 0.135.0 uses the following form:

for try await case let .contentBlockDelta(contentBlockDelta) in stream {
    switch contentBlockDelta.delta {
    case .textDelta(let textDelta):
        print("Received a text delta: \(textDelta.text)")
    case .inputJSONDelta(let inputJSONDelta):
        print("Received an inputJSONDelta: \(inputJSONDelta.partialJSON)")
    case .citationsDelta(let citationsDelta):
        print("Received a citations delta: \(citationsDelta.citation)")
    case .thinkingDelta(let thinkingDelta):
        print("Received a thinking delta: \(thinkingDelta.thinking)")
    case .signatureDelta(let signatureDelta):
        print("Received a signature delta: \(signatureDelta.signature)")
    case .futureProof:
        continue
    }
}

Add a .futureProof case

You may have noticed the .futureProof case at the end of the snippet above. Why is that needed? Consider that Anthropic streams deltas like so:

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello!"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" How"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" can I help?"}}

What happens if they add an additional content_block_delta at a future date that live clients (meaning a developer's released app) do not know the structure of?

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"newly_added_delta","foo":"bar"}}

Our call site would throw, and the developer's catch handler would be entered. We don't want that, because there are potentially more content_block_delta events in the stream that the live client does know how to handle. For the best experience of your end customers, we defensively map the newly_added_delta event to .futureProof and keep processing the stream.
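The mechanics of this can be sketched with a custom Decodable initializer. The type and case names below are illustrative (the library's real delta types may differ): the initializer switches on the "type" discriminator and routes anything unrecognized to .futureProof instead of throwing:

```swift
import Foundation

struct TextDelta: Decodable {
    let text: String
}

// Illustrative delta union; the library's actual type may differ.
enum ContentBlockDeltaBody: Decodable {
    case textDelta(TextDelta)
    case futureProof

    private enum CodingKeys: String, CodingKey {
        case type
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        switch try container.decode(String.self, forKey: .type) {
        case "text_delta":
            self = .textDelta(try TextDelta(from: decoder))
        default:
            // A delta type this client predates decodes successfully
            // instead of throwing, so stream processing continues.
            self = .futureProof
        }
    }
}
```

A payload like `{"type":"newly_added_delta","foo":"bar"}` decodes to .futureProof rather than surfacing a DecodingError to the caller.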

Other changes

  • I'm no longer using the .init shorthand in sample code. It seems to confuse our customers, and it's too easy to break Xcode's cmd-click jump to definition feature. The jump fails when a named parameter is misspelled or any of the arguments is bad (i.e. precisely when you need to look up the definition). In contrast, spelling out the full type always helps Xcode jump correctly.
  • The secondsToWait argument is now required on all Anthropic calls. This makes our customers think about what makes sense for their use case, rather than assuming that URLSession's default timeout of 60 seconds is appropriate.
  • We now provide accumulation helpers for tool calls. Previous versions did the accumulation for you, but offered no way to show partial responses as they arrived in a UI. The library helped greatly with one use case but made another use case impossible. Now, customers can use AnthropicToolCallAccumulator if they see fit.
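The shape of that accumulation pattern can be sketched as follows. This is a hypothetical stand-in, not AnthropicToolCallAccumulator's actual interface: the accumulator concatenates partial JSON fragments while handing each fragment back so the caller can stream it to the UI as it arrives:

```swift
// Hypothetical accumulator; consult the library for the real
// AnthropicToolCallAccumulator interface.
struct ToolCallAccumulator {
    private(set) var partialJSON = ""

    // Returns the fragment so the caller can render partial
    // responses in the UI while accumulation proceeds.
    @discardableResult
    mutating func append(_ fragment: String) -> String {
        partialJSON += fragment
        return fragment
    }
}

var accumulator = ToolCallAccumulator()
accumulator.append(#"{"location":"#)
accumulator.append(#""San Francisco"}"#)
// accumulator.partialJSON now holds the complete tool-call arguments
```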

Summary

This library is more explicit now, at least the Anthropic portion of it. I intend to adapt more providers to this design, which optimizes for:

  • Contributors getting their bearings quickly by mapping 1:1 to Anthropic's docs
  • LLMs finding the right types easily
  • Existing call-sites continuing to compile as the Anthropic API evolves
  • Anthropic-specific functionality surfaced through the public interface, as it may be our customers' secret sauce