
Cache type intersection results in QueryComplexity analyzer#5631

Open
sobrinho wants to merge 6 commits into rmosolgo:master from sobrinho:claude/tender-nobel-b601b8

Conversation

@sobrinho

`types_intersect?` was calling `query.types.possible_types` on every check, with an O(n) linear scan to find a common type. For queries with many abstract types (interfaces/unions), this became a bottleneck because the same type pairs are rechecked repeatedly during analysis.

Two caches are added to the analyzer instance:

- `@possible_types_cache` – memoizes `possible_types(type)` as a `Set` so membership lookups are O(1) instead of O(n).
- `@intersect_cache` – memoizes the boolean result of each pair `(a, b)` using a composite key built from their object_ids, so the intersection is computed at most once per pair per query execution.

Also adds benchmark/complexity.rb to measure the impact. The benchmark shows ~7× throughput improvement (2.8 i/s → 19.9 i/s) on a query that exercises repeated complexity analysis across abstract types.
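The caching described above can be sketched like this (a simplified, self-contained illustration, not the gem's exact code: a plain Hash stands in for the schema's type map, and the real method takes the query as an argument):

```ruby
require "set"

# Simplified sketch of the two caches described in the PR description.
# `@possible_types` is a stand-in for `query.types.possible_types`.
class IntersectionCacheSketch
  def initialize(possible_types)
    @possible_types = possible_types   # { abstract_type => [concrete types] }
    @possible_types_cache = {}         # type => Set, for O(1) membership checks
    @intersect_cache = {}              # composite Integer key => Boolean
  end

  def possible_types_set(type)
    @possible_types_cache[type] ||= @possible_types.fetch(type, [type]).to_set
  end

  def types_intersect?(a, b)
    return true if a == b
    id_a, id_b = a.object_id, b.object_id
    # Symmetric composite key: smaller ID in the high bits, larger in the low
    key = id_a < id_b ? (id_a << 64) | id_b : (id_b << 64) | id_a
    @intersect_cache.fetch(key) do
      @intersect_cache[key] =
        possible_types_set(a).intersect?(possible_types_set(b))
    end
  end
end
```

The first call for a pair computes the intersection from the memoized sets; every later call for that pair, in either order, is a single Hash lookup.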

@sobrinho
Author

In the real world, this dropped our GraphQL/analyze from ~806ms to ~176ms on a majestic monolith.

@sobrinho force-pushed the claude/tender-nobel-b601b8 branch from 57d6a78 to dfce8ee on May 13, 2026 19:47
@rmosolgo
Owner

Hey, thanks so much for investigating this and sharing your wins! I definitely want to incorporate this improvement. Could you do a couple things before I merge it?

  1. Remove the benchmark -- I don't have plans to keep running it, so I'd rather not version it with the gem source. For future reference, it's at the bottom of this comment.

  2. Could you document how `key = ...` works? I think that would help the next reader make sense of it. If I understood right, the key has two 32-bit slots, with the smaller of the two IDs stored in the leftmost slot and the larger one in the rightmost slot. Why is 32 bits the right number here? Are Ruby object IDs guaranteed to be no greater than that?
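To make the concern in (2) concrete, here's a toy demonstration (with made-up IDs, not real `object_id` values) of how a 32-bit slot can collide once an ID is wider than 32 bits, while a 64-bit slot keeps the pairs distinct:

```ruby
# Hypothetical IDs chosen to force the overlap; real object IDs vary by VM.
a1, b1 = 1, 2**33 + 5      # b1 does not fit in 32 bits
a2, b2 = 3, 5              # a completely different pair

key32 = ->(x, y) { (x << 32) | y }
key64 = ->(x, y) { (x << 64) | y }

key32.call(a1, b1) == key32.call(a2, b2)  # => true: silent collision
key64.call(a1, b1) == key64.call(a2, b2)  # => false: distinct keys
```

With a 64-bit shift the two slots can't overlap for 64-bit IDs, and since Ruby Integers are arbitrary-precision, nothing is truncated.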

Query Complexity possible types bottleneck benchmark

# frozen_string_literal: true
require "bundler/setup"
require "graphql"
require "benchmark"
require "benchmark/ips"

CONCRETE_TYPES_COUNT = 5_000
QUERY = "{ myObject { ... on MyInterface { name } } }"

module MyInterface
  include GraphQL::Schema::Interface

  field :name, String, null: false
end

CONCRETE_TYPES = (1..CONCRETE_TYPES_COUNT).map do |i|
  Class.new(GraphQL::Schema::Object) do
    implements MyInterface

    graphql_name "MyConcreteObject#{i}"

    field :name, String, null: false
  end
end

class MyQueryType < GraphQL::Schema::Object
  field :my_object, MyInterface, null: false

  def my_object
    { name: "Gabriel Sobrinho" }
  end
end

class MySchema < GraphQL::Schema
  query MyQueryType

  orphan_types CONCRETE_TYPES

  max_complexity 1_000

  complexity_cost_calculation_mode :compare

  def self.resolve_type(_type, _obj, _ctx)
    CONCRETE_TYPES[0]
  end
end

# Warmup
errors = MySchema.validate(QUERY)
warn errors.inspect if errors.any?

Benchmark.ips do |x|
  x.report("Running query with complexity analysis") do
    MySchema.execute(QUERY)
  end
end

sobrinho added 2 commits May 13, 2026 17:33
The benchmark was useful during development to measure the impact of the
caching changes, but it is not intended to be maintained or run regularly
as part of the gem. Removing it to keep the source tree clean.

The original key used a 32-bit shift, which is not safe: Ruby object IDs
on 64-bit platforms can exceed 32 bits (they are derived from memory
addresses), so ORing two IDs with only a 32-bit gap risks overlap and
silent cache collisions.

Switch to a 64-bit shift. Ruby integers are arbitrary-precision, so the
operation is completely lossless. Also adds a comment explaining the key
construction for future readers.
@sobrinho
Author

@rmosolgo removed the benchmark and also fixed the key calculation to account for object IDs being up to 64 bits (comment added as well).

@sobrinho
Author

To add context: at first I tried to use an array like `[a, b]` for the key, but that allocates one array on every call, which made GC kick in too much. That's why I ended up using the bitwise operation, which doesn't allocate an array.
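Side by side, the two key strategies look like this (illustration only; variable names are mine):

```ruby
a, b = Object.new, Object.new
id_a, id_b = a.object_id, b.object_id

# Array key: allocates a fresh Array on every call, adding GC pressure
array_key = id_a < id_b ? [id_a, id_b] : [id_b, id_a]

# Bitwise key: one Integer, no Array allocation. (As noted later in the
# thread, the shifted result exceeds the machine word, so it is still a
# heap-allocated "bignum" -- just a cheaper object than an Array.)
int_key = id_a < id_b ? (id_a << 64) | id_b : (id_b << 64) | id_a
```

Both keys are symmetric: computing them from `(b, a)` yields the same value, so each unordered pair shares one cache entry.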

@rmosolgo
Owner

Out of curiosity, I tried a couple of other optimizations here. First, I ran the benchmark locally on 6a15de3:

6a15de309a Fix intersect cache key to avoid collisions on 64-bit systems
ruby 4.0.2 (2026-03-17 revision d3da9fec82) +PRISM [arm64-darwin24]
Warming up --------------------------------------
Running query with complexity analysis
                         4.000 i/100ms
Calculating -------------------------------------
Running query with complexity analysis
                         40.720 (± 2.5%) i/s   (24.56 ms/i) -    204.000 in   5.015398s

Then, I added `.compare_by_identity` to both cache hashes and switched `==` to `.equal?`:

compare_by_identity and equal?

diff --git a/lib/graphql/analysis/query_complexity.rb b/lib/graphql/analysis/query_complexity.rb
index fd5f1d91ee..73d2eadcfd 100644
--- a/lib/graphql/analysis/query_complexity.rb
+++ b/lib/graphql/analysis/query_complexity.rb
@@ -9,8 +9,8 @@ module GraphQL
         super
         @skip_introspection_fields = !query.schema.max_complexity_count_introspection_fields
         @complexities_on_type_by_query = {}
-        @intersect_cache = {}
-        @possible_types_cache = {}
+        @intersect_cache = {}.compare_by_identity
+        @possible_types_cache = {}.compare_by_identity
       end
 
       # Override this method to use the complexity result

diff --git a/lib/graphql/analysis/query_complexity.rb b/lib/graphql/analysis/query_complexity.rb
index 73d2eadcfd..bdeb3971fb 100644
--- a/lib/graphql/analysis/query_complexity.rb
+++ b/lib/graphql/analysis/query_complexity.rb
@@ -160,7 +160,7 @@ module GraphQL
       end
 
       def types_intersect?(query, a, b)
-        return true if a == b
+        return true if a.equal?(b)
 
         id_a, id_b = a.object_id, b.object_id
         # Build a symmetric composite key: smaller ID in the high 64 bits, larger in the low

And ran the benchmark again. For me, it produced a small (1.7%) speedup:

363061f3ca Try identity operations
ruby 4.0.2 (2026-03-17 revision d3da9fec82) +PRISM [arm64-darwin24]
Warming up --------------------------------------
Running query with complexity analysis
                         4.000 i/100ms
Calculating -------------------------------------
Running query with complexity analysis
                         41.421 (± 2.4%) i/s   (24.14 ms/i) -    208.000 in   5.024739

Then I tried a larger change, using a two-layer compare_by_identity cache with classes as keys (pushed here as 7128ec8) and got a bit more speed (3.6% faster than baseline):

7128ec80a8 Use a two-layer cache by identity
ruby 4.0.2 (2026-03-17 revision d3da9fec82) +PRISM [arm64-darwin24]
Warming up --------------------------------------
Running query with complexity analysis
                         4.000 i/100ms
Calculating -------------------------------------
Running query with complexity analysis
                         42.196 (± 2.4%) i/s   (23.70 ms/i) -    212.000 in   5.027583s

I pushed this change here because it turned out to be faster ... and honestly, I feel a bit better about it because it doesn't use an object_id-based key, which is not a technique I've seen in other Ruby projects. What do you think about it?


Also, not related to the microbenchmark, but maybe helpful for your app. Are you already using the new GraphQL::Schema::Visibility module?

use GraphQL::Schema::Visibility

This also produced a huge (75% over baseline) speedup:

ruby 4.0.2 (2026-03-17 revision d3da9fec82) +PRISM [arm64-darwin24]
Warming up --------------------------------------
Running query with complexity analysis
                         6.000 i/100ms
Calculating -------------------------------------
Running query with complexity analysis
                         71.261 (± 4.2%) i/s   (14.03 ms/i) -    360.000 in   5.060198s

You can find migration notes for this new implementation here: https://graphql-ruby.org/authorization/visibility.html#migration-notes

@rmosolgo
Owner

🙈 I was totally sniped. I ran stackprof and saw lots of GC time, so I investigated object allocations and found a hot-path allocation that could be eliminated: `.each_key.reduce` was making an Array under the hood. I refactored out the `.reduce` call to use plain old `.each_key`. I also changed the flow of the loops, and together I got a bit more speed:

Running query with complexity analysis
                         44.178 (± 2.3%) i/s   (22.64 ms/i) -    224.000 in   5.075867s

~8.5% above baseline now 😁
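The refactor described above, in miniature (hypothetical data and names; the real code is in `lib/graphql/analysis/query_complexity.rb`):

```ruby
# Hypothetical per-field costs keyed by field name.
costs = { field_a: 3, field_b: 5, field_c: 2 }

# Before: the enumerator-based reduce builds intermediate objects on the hot path
max_before = costs.each_key.reduce(0) { |acc, k| c = costs[k]; c > acc ? c : acc }

# After: plain each_key with a local accumulator, no intermediate collection
max_after = 0
costs.each_key do |k|
  c = costs[k]
  max_after = c if c > max_after
end
```

Same result either way; the second form just never materializes anything besides the running maximum.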

@sobrinho
Author

I don't think `equal?` is faster than `==` 🤔

require "benchmark"
require "benchmark/ips"

a = Object.new
b = Object.new

Benchmark.ips do |x|
  x.report("== true") { a == a }
  x.report("== false") { a == b }
  x.report("equal? true") { a.equal?(a) }
  x.report("equal? false") { a.equal?(b) }
end
ruby 4.0.1 (2026-01-13 revision e04267a14b) +PRISM [arm64-darwin25]
Warming up --------------------------------------
             == true     3.351M i/100ms
            == false     3.354M i/100ms
         equal? true     2.870M i/100ms
        equal? false     2.789M i/100ms
Calculating -------------------------------------
             == true     33.473M (± 0.5%) i/s   (29.87 ns/i) -    167.556M in   5.005765s
            == false     33.548M (± 0.3%) i/s   (29.81 ns/i) -    171.063M in   5.099028s
         equal? true     28.949M (± 0.3%) i/s   (34.54 ns/i) -    146.349M in   5.055474s
        equal? false     28.649M (± 0.4%) i/s   (34.91 ns/i) -    145.018M in   5.062011s

@sobrinho
Copy link
Copy Markdown
Author

I'm trying to write a fair benchmark on the nested lookup against the flat lookup to confirm. Give me a few :)

@rmosolgo
Owner

Oops -- I went back to `==` in 3594864; it does actually seem a bit faster:

Calculating -------------------------------------
Running query with complexity analysis
                         44.518 (± 4.5%) i/s   (22.46 ms/i) -    224.000 in   5.038780s

@sobrinho
Author

You are right about nested being better here!

Benchmark:

require "benchmark/ips"
require "set"

GC.disable if ENV["GC_DISABLE"]

nested_values = []
flat_values = []

Benchmark.ips do |x|
  x.report("flat write") do |times|
    $f = {}
    i = 0

    while i < times
      # create new object for each iteration
      # won't hurt performance since nested will also create new objects
      a, b = Object.new, Object.new

      # actual algorithm
      id_a, id_b = a.object_id, b.object_id
      key = id_a < id_b ? (id_a << 64) | id_b : (id_b << 64) | id_a

      $f[key] = true
      flat_values << [a, b]

      # next iteration
      i += 1
    end
  end

  x.report("nested write") do |times|
    $n = Hash.new { |h, k| h[k] = {}.compare_by_identity }.compare_by_identity
    i = 0

    while i < times
      a, b = Object.new, Object.new

      if a.object_id < b.object_id
        first_cache = $n[a]
        second_key = b
      else
        first_cache = $n[b]
        second_key = a
      end

      first_cache[second_key] = true
      nested_values << [a, b]

      i += 1
    end
  end

  x.compare!
end

Benchmark.ips do |x|
  x.report("empty flat cache miss") do |times|
    f = {}
    i = 0

    while i < times
      a, b = Object.new, Object.new
    
      id_a, id_b = a.object_id, b.object_id
      key = id_a < id_b ? (id_a << 64) | id_b : (id_b << 64) | id_a

      f.key?(key) && f[key]

      i += 1
    end
  end

  x.report("empty nested cache miss") do |times|
    n = Hash.new { |h, k| h[k] = {}.compare_by_identity }.compare_by_identity
    i = 0

    while i < times
      a, b = Object.new, Object.new
      
      if a.object_id < b.object_id
        first_cache = n[a]
        second_key = b
      else
        first_cache = n[b]
        second_key = a
      end

      first_cache.key?(second_key) && first_cache[second_key]

      i += 1
    end
  end
  
  x.compare!
end

Benchmark.ips do |x|
  x.report("flat cache miss") do
    a, b = Object.new, Object.new
    
    id_a, id_b = a.object_id, b.object_id
    key = id_a < id_b ? (id_a << 64) | id_b : (id_b << 64) | id_a

    $f.key?(key) && $f[key]
  end

  x.report("nested cache miss") do
    a, b = Object.new, Object.new
    
    if a.object_id < b.object_id
      first_cache = $n[a]
      second_key = b
    else
      first_cache = $n[b]
      second_key = a
    end

    first_cache.key?(second_key) && first_cache[second_key]
  end
  
  x.compare!
end

Benchmark.ips do |x|
  x.report("flat cache hit") do
    a, b = flat_values.sample
    
    id_a, id_b = a.object_id, b.object_id
    key = id_a < id_b ? (id_a << 64) | id_b : (id_b << 64) | id_a

    $f.key?(key) && $f[key]
  end

  x.report("nested cache hit") do
    a, b = nested_values.sample
    
    if a.object_id < b.object_id
      first_cache = $n[a]
      second_key = b
    else
      first_cache = $n[b]
      second_key = a
    end

    first_cache.key?(second_key) && first_cache[second_key]
  end

  x.compare!
end

Benchmark.ips do |x|
  x.report("flat cache miss then write") do |times|
    f = {}
    i = 0

    while i < times
      a, b = Object.new, Object.new
    
      id_a, id_b = a.object_id, b.object_id
      key = id_a < id_b ? (id_a << 64) | id_b : (id_b << 64) | id_a

      f.key?(key) && f[key]
      f[key] = true

      i += 1
    end
  end

  x.report("nested cache miss then write") do |times|
    n = Hash.new { |h, k| h[k] = {}.compare_by_identity }.compare_by_identity
    i = 0

    while i < times
      a, b = Object.new, Object.new
    
      if a.object_id < b.object_id
        first_cache = n[a]
        second_key = b
      else
        first_cache = n[b]
        second_key = a
      end

      first_cache.key?(second_key) && first_cache[second_key]
      first_cache[second_key] = true

      i += 1
    end
  end

  x.compare!
end

Result:

ruby 4.0.1 (2026-01-13 revision e04267a14b) +PRISM [arm64-darwin25]
Warming up --------------------------------------
          flat write   110.294k i/100ms
        nested write    97.857k i/100ms
Calculating -------------------------------------
          flat write      1.527M (±48.6%) i/s  (655.03 ns/i) -      5.074M in   5.095572s
        nested write      1.283M (±69.2%) i/s  (779.70 ns/i) -      3.034M in   5.641034s

Comparison:
          flat write:  1526650.5 i/s
        nested write:  1282538.4 i/s - same-ish: difference falls within error

ruby 4.0.1 (2026-01-13 revision e04267a14b) +PRISM [arm64-darwin25]
Warming up --------------------------------------
empty flat cache miss
                       109.727k i/100ms
empty nested cache miss
                       190.134k i/100ms
Calculating -------------------------------------
empty flat cache miss
                          4.096M (±26.2%) i/s  (244.13 ns/i) -     15.691M in   5.005101s
empty nested cache miss
                          2.304M (±17.2%) i/s  (433.98 ns/i) -     11.408M in   5.071173s

Comparison:
empty flat cache miss:  4096200.6 i/s
empty nested cache miss:  2304246.3 i/s - 1.78x  slower

ruby 4.0.1 (2026-01-13 revision e04267a14b) +PRISM [arm64-darwin25]
Warming up --------------------------------------
     flat cache miss   282.826k i/100ms
   nested cache miss    36.178k i/100ms
Calculating -------------------------------------
     flat cache miss      1.868M (±60.4%) i/s  (535.34 ns/i) -      5.939M in   5.057530s
   nested cache miss      1.699M (±34.8%) i/s  (588.54 ns/i) -      2.641M in   5.013249s

Comparison:
     flat cache miss:  1867977.3 i/s
   nested cache miss:  1699118.9 i/s - same-ish: difference falls within error

ruby 4.0.1 (2026-01-13 revision e04267a14b) +PRISM [arm64-darwin25]
Warming up --------------------------------------
      flat cache hit   152.903k i/100ms
    nested cache hit    20.885k i/100ms
Calculating -------------------------------------
      flat cache hit      1.164M (±25.7%) i/s  (859.12 ns/i) -      5.352M in   5.041795s
    nested cache hit      1.061M (±21.2%) i/s  (942.65 ns/i) -      2.590M in   5.146799s

Comparison:
      flat cache hit:  1163980.5 i/s
    nested cache hit:  1060834.4 i/s - same-ish: difference falls within error

ruby 4.0.1 (2026-01-13 revision e04267a14b) +PRISM [arm64-darwin25]
Warming up --------------------------------------
flat cache miss then write
                        81.314k i/100ms
nested cache miss then write
                        69.604k i/100ms
Calculating -------------------------------------
flat cache miss then write
                          2.756M (±26.0%) i/s  (362.85 ns/i) -     12.685M in   5.010297s
nested cache miss then write
                          2.062M (±47.6%) i/s  (485.08 ns/i) -      3.063M in   5.096352s

Comparison:
flat cache miss then write:  2755951.1 i/s
nested cache miss then write:  2061520.5 i/s - same-ish: difference falls within error

Given the numbers:

Cache miss (always followed by a write): The flat algorithm is slightly better — it allocates one BigNum and performs one write (plus one preceding lookup). The nested algorithm allocates one Hash and performs two writes (plus one preceding lookup that triggers the default proc). Both allocate once, but nested does an extra write.

Cache hit: The flat algorithm allocates a BigNum and performs two lookups (key? + []), both using BigNum hashing. The nested algorithm allocates nothing and performs three lookups, all using identity hashing. Nested wins for two compounding reasons: identity hashing is much cheaper than BigNum hashing (it's effectively just the object_id), and there's no allocation. The flat algorithm allocates a fresh BigNum on every hit forever — that's recurring GC pressure that the nested version simply doesn't have.

In practice, simple queries such as { user { name email } } will perform slightly worse with the nested algorithm because the flat algorithm has a cheaper miss path, and simple queries are miss-dominated (few distinct type pairs, few total calls). But we're talking microseconds — negligible compared to overall query resolution time.

Complex queries, like the ones we have here, will clearly benefit from the nested algorithm: they make many calls but over a small set of distinct type pairs, so they're hit-dominated. The nested algorithm wins on hits both because identity hashing is cheaper than BigNum hashing and because it avoids the per-hit BigNum allocation that the flat algorithm pays forever.
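Distilling the hit path that analysis favors, the nested lookup can be wrapped like this (a hypothetical helper, not the PR's exact code):

```ruby
# Outer hash keyed by the object with the smaller object_id, inner hash keyed
# by the other one, so (a, b) and (b, a) share one entry without building a
# composite key object.
CACHE = Hash.new { |h, k| h[k] = {}.compare_by_identity }.compare_by_identity

def intersect_cached(a, b)
  if a.object_id < b.object_id
    inner = CACHE[a]
    key = b
  else
    inner = CACHE[b]
    key = a
  end
  # A miss computes the value once via the block; every later hit is pure
  # identity hashing with no allocation.
  inner.key?(key) ? inner[key] : (inner[key] = yield)
end
```

Calling `intersect_cached(a, b) { expensive_check }` and later `intersect_cached(b, a) { expensive_check }` runs the block only once.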

Let me review the PR again.

@sobrinho
Author

Overall this looks good now.

One thing I've been thinking about: should we cache this at the app level instead of per-query? Right now, once a query finishes, the cache is discarded. But after the app boots, we don't expect types to change — so both query.types.possible_types(...).to_set and the intersection result could in principle be cached for the entire process lifecycle.

Not suggesting we hold this PR — it's already a clear win as-is. Just wondering if there's room for a follow-up that takes this further.

@sobrinho
Author

sobrinho commented May 14, 2026

If we started accepting procs (or callable objects) to build analyzers, we could lift the caches up to the analyzer factory itself, so they persist across queries:

class CachedComplexityAnalyzer
  def initialize
    @intersect_cache = Hash.new { |h, k| h[k] = {}.compare_by_identity }.compare_by_identity
    @possible_types_cache = {}.compare_by_identity
  end

  def call(subject)
    MaxQueryComplexity.new(
      subject,
      intersect_cache: @intersect_cache,
      possible_types_cache: @possible_types_cache
    )
  end
end

class MaxQueryComplexity
  class << self
    # Alias the class method `new` as `call` so the class itself satisfies
    # the same callable interface as the factory object above
    alias_method :call, :new
  end

  # current class here
end

class RootSchema < GraphQL::Schema
  query_analyzer CachedComplexityAnalyzer.new
end

The analyzer instance lives for the lifetime of the schema, so the caches do too — each query gets a fresh MaxQueryComplexity but shares the underlying intersection and possible-types data.

I don't think we'd need an LRU or other eviction policy here — the schema isn't expected to change at runtime under normal circumstances, so the cache has a bounded size (number of type pairs in the schema) and stable contents.

Possible intersection with #5632.

@sobrinho
Author

Last but not least: if we support JRuby / TruffleRuby / etc., we should use concurrent-ruby for the shared caches — plain Hash isn't thread-safe under true parallelism, and a schema-level cache would be hit by multiple threads concurrently.

class CachedComplexityAnalyzer
  def initialize
    @intersect_cache = Concurrent::Hash.new { |h, k| h[k] = Concurrent::Hash.new.compare_by_identity }.compare_by_identity
    @possible_types_cache = Concurrent::Hash.new.compare_by_identity
  end

  def call(subject)
    MaxQueryComplexity.new(
      subject,
      intersect_cache: @intersect_cache,
      possible_types_cache: @possible_types_cache
    )
  end
end
