Skip to content

Case insensitive string set#658

Draft
CTMacUser wants to merge 4 commits into
apple:mainfrom
CTMacUser:case-insensitive-string-set
Draft

Case insensitive string set#658
CTMacUser wants to merge 4 commits into
apple:mainfrom
CTMacUser:case-insensitive-string-set

Conversation

@CTMacUser

Copy link
Copy Markdown
Contributor

Introduces CaseInsensitiveStringSet, a sorted set of String that uses case-insensitive comparisons for ordering and equivalence.

The API is that CaseInsensitiveStringSet conforms to Comparable, CustomDebugStringConvertible, CustomReflectable, CustomStringConvertible, Decodable, Encodable, Hashable, Sequence, and SetAlgebra. A subset of RangeReplaceableCollection provides common functionality that the previous protocols don't cover.

The functions in "SortedSetMerge.swift" evaluate operations combining two sorted sets. rawSortedMerge(of: and:) vends the merger of the sets, giving an also-sorted result. Those results are expressed as one or two values with their category. Categories are: values only from the first sequence, values only from the second sequence, and values that the two sequences share. A shared vended element gives out the element value from both sources. This function doesn't necessarily need to be used directly, but it implements the actual user-facing functions.

sortedMerge(between: and: retaining:) vends a sorted sequence that applies the given set operation to combine the two given source sequences. doesSortedMerger(of: and: haveExclusivesToFirst: hasExclusivesToSecond: haveSharedElements:) checks the count of each result category, returning whether it matches the given pattern. A category can be checked for not being there, being there, or ignored.

The SortedSet type, in "SortedSetToo.swift", uses the above functions to implement SetAlgebra with an internal skip-list structure. SortedSet is not templated on Element directly, but through the protocol Orderable (in "Orderable.swift"). Orderable types specify their Element type and three functions that define ordering. You only have to implement the function that's an analogue to the less-than operation. The equal-to and greater-than operator analogues have default implementations that use the less-than function.

CaseInsensitiveStringSet wraps a SortedSet implementing an Orderable instantiation that uses case-insensitive comparisons from the Foundation system library. (The nil locale is used in `String.compare_: options: range: locale:)

When being printed, the regular version will print out the canonized casing for each element. (The hash implementation uses this form.) The debug version will print out the string that is actually stored.

Checklist

  • I've read the Contribution Guidelines
  • My contributions are licensed under the Swift license.
  • I've followed the coding style of the rest of the project.
  • I've added tests covering all new code paths my change adds to the project (if appropriate).
  • I've added benchmarks covering new functionality (if appropriate).
  • I've verified that my change does not break any existing tests or introduce unexplained benchmark regressions.
  • I've updated the documentation if necessary.

CTMacUser added 4 commits May 21, 2026 00:00
Add a type that can be a skeleton for a sorted-set. Useful for un-`Hashable` types or custom comparisons.

Add protocol to define custom comparison semantics. Add functions (with helper types) that perform set operations and check for certain combinations of elements.

Since the set must always be stored sorted, add conformance to Sequence and Comparable to take advantage.
Relax a method's visibility to be non-private, so it can be called from extension files. Improve the implementation of the self-mutating versions of set operations. Add a method to remove the first element of a set without comparing its value first.
Add _ensureUnique() calls to the element-level mutating set operations.
Introduce CaseInsensitiveStringSet as an internally-sorted list of strings, but any comparisons between strings are case-insensitive.

This type conforms to Comparable, CustomDebugStringConvertible, CustomReflectable, CustomStringConvertible, Decodable, Encodable, Hashable, Sequence, and SetAlgebra.
There are extra members that are a subset of RangeReplaceableCollection.
@CTMacUser

Copy link
Copy Markdown
Contributor Author

The inspiration came from the extension Sequence method uniqued from the swift-algorithms library. Using a new set type to determine equivalence would be more exact than uniqued. But uniqued is a term of art, and the function is a lot easier to implement, and my suggestion is major overkill for most uses.

Then I realized that I had a different problem. The Standard library's Set type is its sole (non-option based) set. I couldn't test my idea without a new set type that doesn't work off Hashable. CaseInsensitiveStringSet here is an example, and has a use.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant