Valid Anagram — Sorting, HashMap Counting, and the 26-Slot Array Optimisation

A admin · Jun 22, 2026 · 43 views

So far in this series we’ve used a HashMap to remember where we’ve seen a number (Two Sum) and we’ve walked through an array once to track a running minimum (Buy and Sell Stock). Today’s problem teaches a third pattern that shows up in everything — from anagram checks to permutation problems to plagiarism detection: counting things with a HashMap (or an array used as one). Valid Anagram is in the top five most-asked easy problems at Google, Amazon, Meta, Flipkart, and Razorpay. It looks like a 30-second problem. It isn’t — not if you want to impress the interviewer.

The Problem

You’re given two strings s and t. Return true if t is an anagram of s, otherwise false.

An anagram is just a rearrangement of the same letters. “listen” and “silent” are anagrams — same letters, same counts, different order.

Input:  s = "anagram",  t = "nagaram"
Output: true

Why?
s = a n a g r a m   → a:3, n:1, g:1, r:1, m:1
t = n a g a r a m   → a:3, n:1, g:1, r:1, m:1
Same letters, same counts. Anagram. ✅

──────────────────────────────────────────────

Input:  s = "rat",  t = "car"
Output: false

s = r a t   → r:1, a:1, t:1
t = c a r   → c:1, a:1, r:1
't' is in s but not in t. 'c' is in t but not in s. ❌

Clarifying questions to ask the interviewer (always ask before writing code):

“Is the comparison case-sensitive?” — Usually yes. "Listen" vs "silent" would return false because of the capital L.
“Are we limited to lowercase English letters, or could there be Unicode — emojis, accented characters, Chinese?” — This question alone is a senior signal. The answer changes the optimal solution.
“Are spaces and punctuation significant?” — Usually yes for this problem. Sentence-level anagrams (“dormitory” ↔ “dirty room”) are a different problem entirely.

Step 1 — The Brute Force (Sorting)

The simplest correct solution is sorting both strings and comparing. If they’re anagrams, sorting them produces the same string.

“Let me start with the simplest correct approach — if two strings are anagrams, sorting them gives the same sequence of characters. So I can sort both and compare.”

s = "anagram"   sorted → "aaagmnr"
t = "nagaram"   sorted → "aaagmnr"
Same? Yes → anagram ✅

s = "rat"       sorted → "art"
t = "car"       sorted → "acr"
Same? No  → not an anagram ❌

The sorting code

fun isAnagram(s: String, t: String): Boolean {
    if (s.length != t.length) return false
    // Quick early exit — different lengths can NEVER be anagrams

    return s.toCharArray().sorted() == t.toCharArray().sorted()
    // toCharArray() is an EXTENSION FUNCTION on String — gives a CharArray
    // sorted() is an EXTENSION FUNCTION on CharArray — returns a sorted List<Char>
    // == on Lists compares element by element
}

Complexity of sorting approach

Time   : O(n log n)
            Sorting dominates. Most JVM sorts are TimSort or
            dual-pivot quicksort — both O(n log n).

Space  : O(n)
            sorted() allocates a new List of characters.
            Some interviewers will accept O(1) or O(log n) if you
            sort in place with a primitive char[], but for the
            Kotlin idiom above, it's O(n).

This works. It’s clean. The interviewer will ask for better — not because sorting is wrong, but because we’re doing more work than the problem requires.

Step 2 — The Interviewer Asks “Can You Do It in Linear Time?”

Here’s your line:

“Yes. Sorting is doing more than I need. I don’t actually care about the order of the letters — I just need to know that both strings have the same letters in the same counts. So I can count occurrences with a HashMap (or a small fixed array for lowercase English letters) in one pass each, then compare the counts.”

That sentence tells the interviewer: I understand what the problem actually requires, not just how to satisfy it.

Step 3 — Counting with a HashMap

The mental shift: an anagram check is just a multiset equality check. Two strings are anagrams when the set of (letter, count) pairs is identical.

Algorithm:

Walk s and count each character into a map.
Walk t and decrement each character’s count.
If any count ends up non-zero, it’s not an anagram.

The reason we decrement instead of building two maps is elegant — if both strings have the same letters in the same counts, every increment in step 1 gets cancelled by a decrement in step 2. Final state should be all zeros.

Full trace — s = "anagram", t = "nagaram"

Lengths: 7 == 7 ✅ (proceed)

Pass 1 — count characters in s = "anagram":
   a:1
   a:1, n:1
   a:2, n:1
   a:2, n:1, g:1
   a:2, n:1, g:1, r:1
   a:3, n:1, g:1, r:1
   a:3, n:1, g:1, r:1, m:1

Final map after s:
   { a:3, n:1, g:1, r:1, m:1 }

──────────────────────────────────────────────
Pass 2 — decrement characters in t = "nagaram":

   n:  count was 1 → now 0
   a:  count was 3 → now 2
   g:  count was 1 → now 0
   a:  count was 2 → now 1
   r:  count was 1 → now 0
   a:  count was 1 → now 0
   m:  count was 1 → now 0

Final map after t:
   { a:0, n:0, g:0, r:0, m:0 }

──────────────────────────────────────────────
Check: any count != 0?  No → return true ✅

Trace for a non-anagram — s = "rat", t = "car"

Lengths: 3 == 3 ✅ (proceed)

Pass 1, count s = "rat":
   { r:1, a:1, t:1 }

Pass 2, decrement t = "car":
   c: not in map yet → goes to -1
   a: was 1 → now 0
   r: was 1 → now 0

Final map: { r:0, a:0, t:1, c:-1 }

Some counts are not zero → return false ✅

The HashMap code

fun isAnagram(s: String, t: String): Boolean {
    if (s.length != t.length) return false

    val counts = mutableMapOf<Char, Int>()
    // mutableMapOf() is a TOP-LEVEL FUNCTION — builds a LinkedHashMap
    // Key   = a character
    // Value = how many times it appeared in s (after pass 1)
    //         then net difference between s and t (after pass 2)

    for (c in s) {
        counts[c] = (counts[c] ?: 0) + 1
        // ?: is Kotlin's ELVIS OPERATOR — "use 0 if null"
        // "counts[c] ?: 0" handles the first time we see this character
    }

    for (c in t) {
        counts[c] = (counts[c] ?: 0) - 1
        // Decrement. Could go negative if t has a character that's not in s.
    }

    return counts.values.all { it == 0 }
    // all { } is an EXTENSION FUNCTION on Iterable — true if every element matches
}

Complexity

Time   : O(n)
            Two single passes through strings of length n.
            HashMap operations are O(1) on average.

Space  : O(k)
            where k = number of unique characters.
            For lowercase English, k ≤ 26 — effectively O(1).
            For full Unicode, k can be much larger.

The Optimisation — When You Know It’s Lowercase English

This is the part interviewers love, and it’s why the clarifying question about Unicode mattered. If you confirmed “lowercase English only,” you can replace the HashMap with a 26-slot integer array. Faster constant factor, simpler code, and an unambiguous O(1) space bound.

fun isAnagram(s: String, t: String): Boolean {
    if (s.length != t.length) return false

    val counts = IntArray(26)
    // IntArray(26) creates an int array of size 26, all zeros
    // Index 0 = 'a', index 1 = 'b', ..., index 25 = 'z'

    for (i in s.indices) {
        counts[s[i] - 'a']++
        // s[i] - 'a' converts the char to its 0-25 index
        // 'a' - 'a' = 0, 'b' - 'a' = 1, ..., 'z' - 'a' = 25
        counts[t[i] - 'a']--
        // Increment for s and decrement for t in the SAME loop — one pass
    }

    return counts.all { it == 0 }
}

Two improvements packed into this version. First, we fused the two passes into one — since both strings have the same length, we increment for s and decrement for t at the same index in a single loop. Second, the array lookup is dramatically faster than a HashMap lookup. No hashing, no collision handling, just direct indexing.

Time   : O(n)        (same as HashMap version, but smaller constant factor)
Space  : O(1)        (a fixed 26-slot array, regardless of input size)

The Subtle Bug — Forgetting the Length Check

Skip the if (s.length != t.length) return false guard and you have a quietly broken solution.

s = "aab"
t = "aabbb"

If we just count and decrement without the length check:
   After s: { a:2, b:1 }
   After t: { a:0, b:-2 }

Counts aren't all zero → we return false correctly here, just by accident.

But change it slightly:

s = "aab"
t = "ab"

   After s: { a:2, b:1 }
   After t: { a:1, b:0 }

Counts aren't all zero → false. Correct again.

So why is the length check important?
→ Performance and clarity. Without it, you walk both strings before
   discovering an obvious early-exit case. Worse, in interviews,
   forgetting the check signals you haven't thought about edge cases.

Always include the early exit. It costs one line and demonstrates rigour.

Edge Cases to Mention

1. Both empty: s = "", t = ""
   → true. Two empty strings ARE anagrams of each other.

2. One empty, one not: s = "", t = "a"
   → false. Caught by the length check.

3. Same string: s = "abc", t = "abc"
   → true. A string is trivially an anagram of itself.

4. Single character: s = "a", t = "a"  vs  s = "a", t = "b"
   → true / false respectively.

5. Repeating letters: s = "aa", t = "aa"  vs  s = "aa", t = "ab"
   → true / false. The counting handles repeats correctly.

6. Case difference: s = "Aa", t = "aa"
   → false IF case-sensitive (the default).
   → true if you confirmed case-insensitive with the interviewer
       and added .lowercase() to both strings.

7. Unicode: s = "café", t = "afcé"
   → true with the HashMap version.
   → CRASH with the IntArray version (é - 'a' is out of bounds).
   → This is why the clarifying question matters.

The Conversation an Interviewer Wants to Hear

You: "Quick clarifications — is the comparison case-sensitive, and are
      we limited to lowercase English or could there be Unicode?"
INT: "Case-sensitive, lowercase English."

You: "Got it. Simplest approach first — if two strings are anagrams,
      sorting them gives the same character sequence. So I sort both
      and compare."
     [write sorting solution]
     "That's O(n log n) time, O(n) space because of the new array."

INT: "Can you do it in linear time?"

You: "Yes. I don't actually need the letters sorted — I just need
      matching letter counts. So I count characters from s and decrement
      counts from t. If everything ends at zero, they're anagrams."
     [write HashMap solution]
     "O(n) time, O(k) space where k is the alphabet size — at most 26
      for lowercase English."

INT: "Can you reduce the space?"

You: "Since we confirmed lowercase English, I can swap the HashMap for
      a 26-element IntArray. Same algorithm, but array access is faster
      than hashing, and space is now a fixed 26 slots — O(1)."
     [write array solution]

INT: "What if the inputs were Unicode?"

You: "The IntArray version would crash — the index 'é' - 'a' isn't in
      [0, 26). I'd fall back to the HashMap version, which handles any
      character. Or, if I cared about performance, I could allocate a
      larger fixed-size array sized to the relevant Unicode block."

INT: "Why the length check at the top?"

You: "Early exit — different lengths can never be anagrams, so I save
      the rest of the work. It also makes the intent of the function
      clear at a glance."

INT: "What if we cared about word-level anagrams — like 'dormitory'
      and 'dirty room'?"

You: "Different problem — I'd strip whitespace first, optionally
      normalise punctuation, then run the same character-counting logic
      on the cleaned strings."

Common Follow-Up Questions

Q1. Why HashMap over sorting?

Sorting is O(n log n); counting is O(n). For long strings the difference is real. Counting also expresses the intent more directly — the problem is about letter counts, not letter order.

Q2. Why one map instead of two?

Two maps work but waste memory and require an explicit equality comparison at the end. One map with increment-then-decrement is the same algorithm with half the storage and a simpler final check (all zero?). Mention you know both approaches but prefer the single-map version.

Q3. What about using a Set?

A Set tells you which characters appeared but not how many times. That’s wrong for anagrams — "aab" and "ab" have the same character set but different counts. A Set check would call them anagrams incorrectly.

Q4. Could we use the product or sum of character codes as a fingerprint?

Surprising number of candidates suggest this. The answer is no — sums collide ("ad" and "bc" have the same sum of ASCII codes) and products overflow quickly. Counting is the only reliable O(n) approach. If they push, you can mention prime number hashing — assigning a unique prime to each letter and multiplying — mathematically valid but practically useless because the products overflow 64-bit integers after about 12 characters.

Q5. How would you handle a stream — one character at a time?

The single-pass IntArray version is already stream-friendly for s. For t, decrement as characters arrive and at the end check that all counts are zero. Total memory: O(k), independent of stream length.

Q6. What if we wanted to find ALL anagrams of s inside a longer string?

That’s “Find All Anagrams in a String” (LeetCode 438) — a sliding window problem. Same counting trick, but the window slides one character at a time across the longer string, updating counts in O(1) per move. Worth mentioning the family without solving it unless asked.

Quick Reference

SORTING (BRUTE FORCE)
─────────────────────
- Idea       : Sort both strings, compare
- Time       : O(n log n)
- Space      : O(n)  (or O(log n) with in-place primitive sort)
- When       : Opening move in an interview

HASHMAP (OPTIMAL — GENERAL CASE)
──────────────────────────────────
- Idea       : Count s, decrement t, check all zeros
- Time       : O(n)
- Space      : O(k)  (k = alphabet size)
- When       : Strings can contain Unicode

INTARRAY (OPTIMAL — LOWERCASE ENGLISH)
────────────────────────────────────────
- Idea       : 26-slot array, increment for s, decrement for t, in one loop
- Time       : O(n)  (smaller constant factor than HashMap)
- Space      : O(1)
- When       : You've confirmed input is lowercase English only

ANSWER ORDER IN AN INTERVIEW
────────────────────────────
1. Clarify (case-sensitive? Unicode? whitespace?)
2. Sorting solution + complexity
3. Name the inefficiency (we don't need sorted order)
4. HashMap counting + complexity
5. If alphabet is known, swap to IntArray + complexity
6. Edge cases (empty, same letters but different counts, Unicode)
7. Mention sliding-window family if pushed

Wrap-Up

The pattern this problem teaches is bigger than anagrams: when you need to compare collections by content rather than order, count. That single idea solves Group Anagrams, Find All Anagrams in a String, Permutation in String, and a dozen other problems with minor variations. The HashMap-vs-IntArray choice you made here — general purpose vs known-alphabet — comes back over and over: in problems with digits (10 slots), DNA characters (4 slots), bitmasks (you can compress a 26-slot count into a single 32-bit integer for some problems).

Three problems in, three distinct patterns: HashMap for lookups (Two Sum), running variable for single-pass tracking (Buy and Sell Stock), and counting for content comparison (Valid Anagram). Practice the brute-force-then-optimise script with all three. It’s the same conversation rhythm regardless of the underlying problem — and once it’s muscle memory, your interviews stop feeling like trivia tests and start feeling like collaborative problem-solving.

Happy coding!

43 views · 0 comments

Comments (0)

No comments yet. Be the first to share your thoughts.