Cache type intersection results in QueryComplexity analyzer #5631

sobrinho wants to merge 6 commits
Conversation
In the real world, this dropped our …
`types_intersect?` was calling `query.types.possible_types` on every check, with an O(n) linear scan to find a common type. For queries with many abstract types (interfaces/unions), this became a bottleneck because the same type pairs are rechecked repeatedly during analysis.

Two caches are added to the analyzer instance:

- `@possible_types_cache` – memoizes `possible_types(type)` as a Set, so membership lookups are O(1) instead of O(n).
- `@intersect_cache` – memoizes the boolean result of each pair (a, b) using a composite key built from their object_ids, so the intersection is computed at most once per pair per query execution.

Also adds `benchmark/complexity.rb` to measure the impact. The benchmark shows a ~7× throughput improvement (2.8 i/s → 19.9 i/s) on a query that exercises repeated complexity analysis across abstract types.
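The two caches described above can be sketched in miniature like this. This is not the PR's code: `IntersectionCache` and the `possible_types_fn` callable are hypothetical stand-ins for the analyzer instance and `query.types.possible_types`, kept only to show the memoization shape.

```ruby
require "set"

# Hypothetical sketch of the caching idea from the PR description:
# - possible_types(type) is memoized as a Set (O(1) membership)
# - the boolean result per (a, b) pair is memoized so the intersection
#   is computed at most once per pair
class IntersectionCache
  def initialize(possible_types_fn)
    @possible_types_fn = possible_types_fn # stand-in for query.types.possible_types
    @possible_types_cache = {}.compare_by_identity
    @intersect_cache = Hash.new { |h, k| h[k] = {}.compare_by_identity }.compare_by_identity
  end

  def possible_types(type)
    @possible_types_cache[type] ||= Set.new(@possible_types_fn.call(type))
  end

  def types_intersect?(a, b)
    return true if a.equal?(b)
    cache = @intersect_cache[a]
    cache.fetch(b) { cache[b] = possible_types(a).intersect?(possible_types(b)) }
  end
end
```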
Hey, thanks so much for investigating this and sharing your wins! I definitely want to incorporate this improvement. Could you do a couple things before I merge it?
`benchmark/complexity.rb` — query complexity possible-types bottleneck benchmark:

```ruby
# frozen_string_literal: true

require "bundler/setup"
require "graphql"
require "benchmark"
require "benchmark/ips"

CONCRETE_TYPES_COUNT = 5_000
QUERY = "{ myObject { ... on MyInterface { name } } }"

module MyInterface
  include GraphQL::Schema::Interface

  field :name, String, null: false
end

CONCRETE_TYPES = (1..CONCRETE_TYPES_COUNT).map do |i|
  Class.new(GraphQL::Schema::Object) do
    implements MyInterface
    graphql_name "MyConcreteObject#{i}"
    field :name, String, null: false
  end
end

class MyQueryType < GraphQL::Schema::Object
  field :my_object, MyInterface, null: false

  def my_object
    { name: "Gabriel Sobrinho" }
  end
end

class MySchema < GraphQL::Schema
  query MyQueryType
  orphan_types CONCRETE_TYPES
  max_complexity 1_000
  complexity_cost_calculation_mode :compare

  def self.resolve_type(_type, _obj, _ctx)
    CONCRETE_TYPES[0]
  end
end

# Warmup
errors = MySchema.validate(QUERY)
warn errors.inspect if errors.any?

Benchmark.ips do |x|
  x.report("Running query with complexity analysis") do
    MySchema.execute(QUERY)
  end
end
```
The benchmark was useful during development to measure the impact of the caching changes, but it is not intended to be maintained or run regularly as part of the gem. Removing it to keep the source tree clean.
The original key used a 32-bit shift, which is not safe: Ruby object IDs on 64-bit platforms can exceed 32 bits (they are derived from memory addresses), so ORing two IDs with only a 32-bit gap risks overlap and silent cache collisions. Switch to a 64-bit shift. Ruby integers are arbitrary-precision, so the operation is completely lossless. Also adds a comment explaining the key construction for future readers.
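The fixed key construction can be sketched and checked directly. `composite_key` is a hypothetical helper name mirroring the logic in the patch: smaller ID in the high bits, larger in the low bits, with a 64-bit gap so full-width IDs can never overlap.

```ruby
# Symmetric composite key from two object IDs (sketch of the patched logic).
# The 64-bit shift leaves room for full 64-bit IDs; Ruby Integers are
# arbitrary-precision, so the shift is lossless.
def composite_key(a, b)
  id_a, id_b = a.object_id, b.object_id
  id_a < id_b ? (id_a << 64) | id_b : (id_b << 64) | id_a
end
```

Because the smaller ID always goes in the high bits, the key is the same regardless of argument order, and both IDs can be recovered from it, which is exactly the "no silent collisions" property the fix is after.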
@rmosolgo removed the benchmark and also fixed the key calculation to account for object IDs being up to 64 bits (comment added as well).
To add context, at first I tried to use an array like …
Out of curiosity, I tried a couple of other optimizations here. First, I ran the benchmark locally on 6a15de3. Then, I added `compare_by_identity` and `equal?`:

```diff
diff --git a/lib/graphql/analysis/query_complexity.rb b/lib/graphql/analysis/query_complexity.rb
index fd5f1d91ee..73d2eadcfd 100644
--- a/lib/graphql/analysis/query_complexity.rb
+++ b/lib/graphql/analysis/query_complexity.rb
@@ -9,8 +9,8 @@ module GraphQL
       super
       @skip_introspection_fields = !query.schema.max_complexity_count_introspection_fields
       @complexities_on_type_by_query = {}
-      @intersect_cache = {}
-      @possible_types_cache = {}
+      @intersect_cache = {}.compare_by_identity
+      @possible_types_cache = {}.compare_by_identity
     end

     # Override this method to use the complexity result
diff --git a/lib/graphql/analysis/query_complexity.rb b/lib/graphql/analysis/query_complexity.rb
index 73d2eadcfd..bdeb3971fb 100644
--- a/lib/graphql/analysis/query_complexity.rb
+++ b/lib/graphql/analysis/query_complexity.rb
@@ -160,7 +160,7 @@ module GraphQL
     end

     def types_intersect?(query, a, b)
-      return true if a == b
+      return true if a.equal?(b)
       id_a, id_b = a.object_id, b.object_id
       # Build a symmetric composite key: smaller ID in the high 64 bits, larger in the low
```

And ran the benchmark again. For me, it produced a small (1.7%) speedup.

Then I tried a larger change, using a two-layer `compare_by_identity` cache with classes as keys (pushed here as 7128ec8) and got a bit more speed (3.6% faster than baseline).

I pushed this change here because it turned out to be faster ... and honestly, I feel a bit better about it because it doesn't use an object_id-based key, which is not a technique I've seen in other Ruby projects. What do you think about it?

Also, not related to the microbenchmark, but maybe helpful for your app: are you already using the new `use GraphQL::Schema::Visibility`? This also produced a huge (75% over baseline) speedup. You can find migration notes for this new implementation here: https://graphql-ruby.org/authorization/visibility.html#migration-notes
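For context on why `compare_by_identity` helps here: it switches the Hash from value-based key lookup (`hash`/`eql?`) to object-identity lookup (`equal?`), so no hash value has to be computed from the key's contents. A minimal demonstration (the strings are just stand-ins for type objects):

```ruby
# compare_by_identity keys a Hash on object identity rather than value
# equality, so lookups skip content-based hashing of the key.
by_value = {}
by_identity = {}.compare_by_identity

s1 = String.new("MyType")
s2 = String.new("MyType") # same value, different object

by_value[s1] = 1
by_value[s2] = 2 # overwrites: s1.eql?(s2) is true

by_identity[s1] = 1
by_identity[s2] = 2 # distinct entries: s1.equal?(s2) is false
```

Since GraphQL type definitions are long-lived singleton objects, identity lookup is both correct and cheap for these caches.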
🙈 I was totally sniped. I ran stackprof and saw lots of GC time, so I investigated object allocations and found a hot-path allocation that could be eliminated: ~8.5% above baseline now 😁
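The eliminated allocation itself isn't shown in the thread, but the kind of evidence an object-mode profile surfaces can be approximated with the stdlib alone: count objects allocated by a block. `allocations_in` is a hypothetical helper, not part of stackprof or this PR.

```ruby
# Hypothetical helper: number of objects allocated while running the block.
def allocations_in
  GC.disable
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
ensure
  GC.enable
end

big = 1 << 70 # a Bignum-sized integer, like a shifted object_id

# Flat-key style: every iteration allocates fresh Bignums for the shift and OR.
flat = allocations_in { 1_000.times { (big << 64) | big } }

# Identity-hash lookup style: the hash and key are allocated once, lookups allocate nothing.
lookup = allocations_in do
  h = {}.compare_by_identity
  k = Object.new
  h[k] = true
  1_000.times { h[k] }
end
```

This is cruder than stackprof's per-line attribution, but it makes the same point: per-call allocations on a hot path show up as GC time.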
I don't think …

```ruby
require "benchmark"
require "benchmark/ips"

a = Object.new
b = Object.new

Benchmark.ips do |x|
  x.report("== true") { a == a }
  x.report("== false") { a == b }
  x.report("equal? true") { a.equal?(a) }
  x.report("equal? false") { a.equal?(b) }
end
```
I'm trying to write a fair benchmark on the nested lookup against the flat lookup to confirm. Give me a few :) |
Oops -- I went back to …
You are right about nested being better here! Benchmark:

```ruby
require "benchmark/ips"
require "set"

GC.disable if ENV["GC_DISABLE"]

nested_values = []
flat_values = []

Benchmark.ips do |x|
  x.report("flat write") do |times|
    $f = {}
    i = 0
    while i < times
      # create new objects for each iteration
      # won't hurt the comparison since nested will also create new objects
      a, b = Object.new, Object.new
      # actual algorithm
      id_a, id_b = a.object_id, b.object_id
      key = id_a < id_b ? (id_a << 64) | id_b : (id_b << 64) | id_a
      $f[key] = true
      flat_values << [a, b]
      # next iteration
      i += 1
    end
  end

  x.report("nested write") do |times|
    $n = Hash.new { |h, k| h[k] = {}.compare_by_identity }.compare_by_identity
    i = 0
    while i < times
      a, b = Object.new, Object.new
      if a.object_id < b.object_id
        first_cache = $n[a]
        second_key = b
      else
        first_cache = $n[b]
        second_key = a
      end
      first_cache[second_key] = true
      nested_values << [a, b]
      i += 1
    end
  end

  x.compare!
end

Benchmark.ips do |x|
  x.report("empty flat cache miss") do |times|
    f = {}
    i = 0
    while i < times
      a, b = Object.new, Object.new
      id_a, id_b = a.object_id, b.object_id
      key = id_a < id_b ? (id_a << 64) | id_b : (id_b << 64) | id_a
      f.key?(key) && f[key]
      i += 1
    end
  end

  x.report("empty nested cache miss") do |times|
    n = Hash.new { |h, k| h[k] = {}.compare_by_identity }.compare_by_identity
    i = 0
    while i < times
      a, b = Object.new, Object.new
      if a.object_id < b.object_id
        first_cache = n[a]
        second_key = b
      else
        first_cache = n[b]
        second_key = a
      end
      first_cache.key?(second_key) && first_cache[second_key]
      i += 1
    end
  end

  x.compare!
end

Benchmark.ips do |x|
  x.report("flat cache miss") do
    a, b = Object.new, Object.new
    id_a, id_b = a.object_id, b.object_id
    key = id_a < id_b ? (id_a << 64) | id_b : (id_b << 64) | id_a
    $f.key?(key) && $f[key]
  end

  x.report("nested cache miss") do
    a, b = Object.new, Object.new
    if a.object_id < b.object_id
      first_cache = $n[a]
      second_key = b
    else
      first_cache = $n[b]
      second_key = a
    end
    first_cache.key?(second_key) && first_cache[second_key]
  end

  x.compare!
end

Benchmark.ips do |x|
  x.report("flat cache hit") do
    a, b = flat_values.sample
    id_a, id_b = a.object_id, b.object_id
    key = id_a < id_b ? (id_a << 64) | id_b : (id_b << 64) | id_a
    $f.key?(key) && $f[key]
  end

  x.report("nested cache hit") do
    a, b = nested_values.sample
    if a.object_id < b.object_id
      first_cache = $n[a]
      second_key = b
    else
      first_cache = $n[b]
      second_key = a
    end
    first_cache.key?(second_key) && first_cache[second_key]
  end

  x.compare!
end

Benchmark.ips do |x|
  x.report("flat cache miss then write") do |times|
    f = {}
    i = 0
    while i < times
      a, b = Object.new, Object.new
      id_a, id_b = a.object_id, b.object_id
      key = id_a < id_b ? (id_a << 64) | id_b : (id_b << 64) | id_a
      f.key?(key) && f[key]
      f[key] = true
      i += 1
    end
  end

  x.report("nested cache miss then write") do |times|
    n = Hash.new { |h, k| h[k] = {}.compare_by_identity }.compare_by_identity
    i = 0
    while i < times
      a, b = Object.new, Object.new
      if a.object_id < b.object_id
        first_cache = n[a]
        second_key = b
      else
        first_cache = n[b]
        second_key = a
      end
      first_cache.key?(second_key) && first_cache[second_key]
      first_cache[second_key] = true
      i += 1
    end
  end

  x.compare!
end
```

Result: …

Given the numbers:

**Cache miss (always followed by a write):** The flat algorithm is slightly better — it allocates one Bignum and performs one write (plus one preceding lookup). The nested algorithm allocates one Hash and performs two writes (plus one preceding lookup that triggers the default proc). Both allocate once, but nested does an extra write.

**Cache hit:** The flat algorithm allocates a Bignum and performs two lookups (`key?` + `[]`), both using Bignum hashing. The nested algorithm allocates nothing and performs three lookups, all using identity hashing. Nested wins for two compounding reasons: identity hashing is much cheaper than Bignum hashing (it's effectively just the object_id), and there's no allocation. The flat algorithm allocates a fresh Bignum on every hit forever — that's recurring GC pressure that the nested version simply doesn't have.

In practice, simple queries such as … Complex queries, like the ones we have here, will clearly benefit from the nested algorithm: they make many calls over a small set of distinct type pairs, so they're hit-dominated, and hits are exactly where nested wins.

Let me review the PR again.
Overall this looks good now. One thing I've been thinking about: should we cache this at the app level instead of per-query? Right now, once a query finishes, the cache is discarded. But after the app boots, we don't expect types to change — so both caches could outlive any single query.

Not suggesting we hold this PR — it's already a clear win as-is. Just wondering if there's room for a follow-up that takes this further.
If we started accepting procs (or callable objects) to build analyzers, we could lift the caches up to the analyzer factory itself, so they persist across queries:

```ruby
class CachedComplexityAnalyzer
  def initialize
    @intersect_cache = Hash.new { |h, k| h[k] = {}.compare_by_identity }.compare_by_identity
    @possible_types_cache = {}.compare_by_identity
  end

  def call(subject)
    MaxQueryComplexity.new(
      subject,
      intersect_cache: @intersect_cache,
      possible_types_cache: @possible_types_cache
    )
  end
end

class MaxQueryComplexity
  class << self
    alias call new
  end

  # current class here
end

class RootSchema < GraphQL::Schema
  query_analyzer CachedComplexityAnalyzer.new
end
```

The analyzer instance lives for the lifetime of the schema, so the caches do too — each query gets a fresh MaxQueryComplexity but shares the underlying intersection and possible-types data.

I don't think we'd need an LRU or other eviction policy here — the schema isn't expected to change at runtime under normal circumstances, so the cache has a bounded size (the number of type pairs in the schema) and stable contents.

Possible intersection with #5632.
Last but not least: if we support JRuby / TruffleRuby / etc., we should use concurrent-ruby for the shared caches — plain Hash isn't thread-safe under true parallelism, and a schema-level cache would be hit by multiple threads concurrently.

```ruby
require "concurrent"

class CachedComplexityAnalyzer
  def initialize
    @intersect_cache = Concurrent::Hash.new { |h, k| h[k] = Concurrent::Hash.new.compare_by_identity }.compare_by_identity
    @possible_types_cache = Concurrent::Hash.new.compare_by_identity
  end

  def call(subject)
    MaxQueryComplexity.new(
      subject,
      intersect_cache: @intersect_cache,
      possible_types_cache: @possible_types_cache
    )
  end
end
```
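If adding concurrent-ruby as a dependency isn't desirable, a coarse Mutex around a plain identity Hash is a portable, stdlib-only alternative. This is a hypothetical sketch (`IdentityCache` is not a name from this PR); the real analyzer would hold one such wrapper per cache.

```ruby
# Hypothetical stdlib-only alternative to Concurrent::Hash: a Mutex-guarded
# identity-keyed memo. The check-and-compute happens inside the lock, so a
# value is computed at most once even under contention.
class IdentityCache
  def initialize
    @data = {}.compare_by_identity
    @lock = Mutex.new
  end

  # Returns the cached value for key, computing and storing it on first use.
  def fetch(key)
    @lock.synchronize { @data.fetch(key) { @data[key] = yield } }
  end
end
```

The trade-off versus concurrent-ruby is lock contention on every lookup; for a cache that quickly becomes read-mostly and bounded, that's usually acceptable, and it behaves identically on MRI, JRuby, and TruffleRuby.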