
Log the sizes allocated through the SimpleMemoryPool for analysis. #380

Open
wyuka wants to merge 2 commits into linkedin:3.0-li from wyuka:request-size-logging

Conversation


@wyuka wyuka commented Aug 29, 2022

[LI-HOTFIX]
TICKET = LIKAFKA-43625
LI_DESCRIPTION = We want to re-use memory buffers for request deserialization. To do this, we need to gather statistics on the most frequent allocation sizes on current clusters, i.e., histograms of the number of requests for each allocation size. This change adds Kafka configuration to log this histogram periodically, at a configurable interval in minutes.
EXIT_CRITERIA = This PR will be reverted once we determine the right range of sizes to use for the memory pool as part of the work on the aforementioned ticket.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@wyuka wyuka requested a review from lmr3796 August 29, 2022 20:54
@wyuka wyuka requested a review from gitlw August 29, 2022 23:22
public class MemoryPoolStatsStore {
private static final Logger log = LoggerFactory.getLogger(MemoryPoolStatsStore.class);

private final AtomicInteger[] histogram;

Is there a reason to not use the metrics object but implement this by hand?

Author

Yes. Kafka metrics support publishing histograms only as percentiles, which is not enough to determine the range of the most frequent memory allocations. For example, suppose around 60% of requests are 1 KB in size. If we only observe from metrics that the p90 request size is 400 KB and the p99 size is 1500 KB, we cannot infer that most requests fall within the 1 KB range.

Hence, we are publishing the full histogram as logs.
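The approach being discussed can be sketched as a fixed-width bucket histogram backed by atomic counters, recording a count for every allocation size rather than percentile summaries. This is a minimal illustrative sketch; the class and method names are hypothetical, not the actual patch:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: count allocations in fixed-width size buckets so the
// full distribution (not just percentiles) can be logged periodically.
class AllocationHistogram {
    private final AtomicInteger[] buckets;
    private final int segmentSizeBytes;
    private final int maxSizeBytes;

    AllocationHistogram(int maxSizeBytes, int segmentSizeBytes) {
        this.maxSizeBytes = maxSizeBytes;
        this.segmentSizeBytes = segmentSizeBytes;
        int numBuckets = (maxSizeBytes + segmentSizeBytes - 1) / segmentSizeBytes;
        this.buckets = new AtomicInteger[numBuckets];
        for (int i = 0; i < buckets.length; i++) {
            buckets[i] = new AtomicInteger();
        }
    }

    // Bucket index for a request of `bytes`, or -1 if out of the recorded range.
    // Bucket i covers sizes (i * segmentSizeBytes, (i + 1) * segmentSizeBytes].
    int bucketFor(int bytes) {
        if (bytes <= 0 || bytes > maxSizeBytes) {
            return -1;
        }
        return (bytes - 1) / segmentSizeBytes;
    }

    void record(int bytes) {
        int idx = bucketFor(bytes);
        if (idx >= 0) {
            buckets[idx].incrementAndGet();
        }
    }

    int countAt(int idx) {
        return buckets[idx].get();
    }
}
```

Because each bucket keeps its own count, a log line per bucket recovers the full shape of the distribution, e.g. that a single small bucket dominates.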

@@ -114,6 +119,7 @@ public boolean isOutOfMemory() {
//allows subclasses to do their own bookkeeping (and validation) _before_ memory is returned to client code.
protected void bufferToBeReturned(ByteBuffer justAllocated) {
this.allocateSensor.record(justAllocated.capacity());

Are the metrics from this allocateSensor not good enough?
If not, can you please explain why?

Author

This sensor relies on the Stat class, which appears to record only aggregate statistics like max, sum, and avg, not the entire histogram.

memoryPoolStatsStore.clear()
}

val histogramPublisher = new KafkaScheduler(threads = 1, "histogram-publisher-")

Can we reuse the same scheduler,

var kafkaScheduler: KafkaScheduler = null

since the startup and shutdown of that scheduler are already taken care of?

Author

Yeah, we can do that. I was not sure myself, because we use this kafkaScheduler in some places inside KafkaServer, while in other places we allocate a new KafkaScheduler, so I was unsure which way to go.

I have fixed it.

class MemoryPoolStatsLogger extends Logging {
override lazy val logger = MemoryPoolStatsLogger.logger

def logStats(memoryPoolStatsStore: MemoryPoolStatsStore): Unit = {

It seems everything in this class can be moved into the object MemoryPoolStatsLogger

Author

Thanks for the good suggestion.

log.debug("Requested bytes {} for allocation exceeds maximum recorded value {}", bytes, maxSizeBytes);
return -1;
} else {
return (bytes - 1) / segmentSizeBytes;

Would it be better to record the bytes in log scale instead of linear scale?
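For comparison with the linear bucketing in the diff above, a log-scale variant would double the bucket boundary at each step (e.g. 1-256, 257-512, 513-1024, ...), keeping fine resolution for the small sizes that dominate while covering large allocations with only a handful of buckets. A hypothetical sketch, not part of the patch:

```java
// Hypothetical sketch of log-scale bucketing: each bucket's upper boundary is
// double the previous one, starting from firstBucketBytes.
class LogScaleBuckets {
    private final int firstBucketBytes;

    LogScaleBuckets(int firstBucketBytes) {
        this.firstBucketBytes = firstBucketBytes;
    }

    // Bucket 0 covers (0, firstBucketBytes]; bucket i > 0 covers
    // (firstBucketBytes * 2^(i-1), firstBucketBytes * 2^i].
    int bucketFor(int bytes) {
        if (bytes <= firstBucketBytes) {
            return 0;
        }
        // ceil(bytes / firstBucketBytes), then ceil(log2) of that ratio.
        int ratio = (bytes + firstBucketBytes - 1) / firstBucketBytes;
        return 32 - Integer.numberOfLeadingZeros(ratio - 1);
    }
}
```

The trade-off is that log-scale buckets lose precision at the high end, which may be acceptable here since the goal is to find the most frequent (presumably small) allocation sizes.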
