Skip to content

Commit 8978021

Browse files
authored
Merge pull request #698 from jborgers/pmd7-issue697
pmd7-issue697 Extend AvoidLoadingAllFromFile with all major load‑all‑into‑memory patterns
2 parents 8b38444 + 554b4ec commit 8978021

File tree

5 files changed

+172
-78
lines changed

5 files changed

+172
-78
lines changed

docs/JavaCodePerformance.md

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2080,6 +2080,10 @@ This especially applies to direct file streaming. Access through ClassLoader.get
20802080
20812081
**Observation: All bytes of a large file are loaded into memory.** E.g. for uploading or downloading documents, for instance to determine the mime type or create a digest.
20822082
**Problem:** Large objects are allocated on the heap, up to e.g. 50 MB. We also observed 300 MB and even 1 GB in back-end systems. This likely triggers long gc pauses for compaction or may trigger out of memory crashes.
2083+
APIs like `Files.readAllBytes`, `Files.readAllLines`, `Files.readString`, `InputStream.readAllBytes`, `IOUtils.toByteArray`, `ByteArrayInputStream`,
2084+
`ByteArrayOutputStream.toByteArray`, and `ResizableByteArrayOutputStream.toByteArray` all load entire content into heap memory.
2085+
This risks OutOfMemoryError, long GC pauses, and slow responses when processing large files.
2086+
**Note:** Not a problem for small files.
20832087
20842088
In the next example, there are two large byte arrays in memory: one in baos and the other is the returned byte array since a copy is made in toByteArray().
20852089
@@ -2102,7 +2106,8 @@ class Bad1 {
21022106
}
21032107
```
21042108
2105-
**Solution:** Stream-through: use streaming all the way, don't store the whole thing in memory, don't use byte arrays. A mime type is determined from the first few bytes of the file, don't read in all 50 MB - 1 GB for that. Often, functionality can be achieved in a streaming way, i.e. [a digest can be computed in a streaming way](http://www.mkyong.com/java/java-sha-hashing-example/).
2109+
**Solution:** Use streaming APIs end-to-end, do not buffer whole files in memory.
2110+
A mime type is determined from the first few bytes of the file, don't read in all 50 MB - 1 GB for that. Usually, functionality can be achieved in a streaming way, i.e. [a digest can be computed in a streaming way](http://www.mkyong.com/java/java-sha-hashing-example/).
21062111

21072112
Example 1 improved:
21082113

rulesets/java/jpinpoint-java-rules.xml

Lines changed: 39 additions & 27 deletions
Original file line numberDiff line numberDiff line change
@@ -482,47 +482,60 @@ class IterationsBetter {
482482

483483
<rule name="AvoidLoadingAllFromFile"
484484
language="java"
485-
message="Files.readAll methods load all bytes from a file into memory: a risk of memory problems."
485+
message="Files.readAll methods and similar APIs load all bytes into memory: a risk of memory problems."
486486
class="net.sourceforge.pmd.lang.rule.xpath.XPathRule"
487-
488487
externalInfoUrl="https://github.com/jborgers/PMD-jPinpoint-rules/tree/pmd7/docs/JavaCodePerformance.md#isio03">
489-
<description>Problem: Files.readAllBytes and Files.readAllLines load all bytes from a file into the heap memory.
490-
This may result in an OutOfMemoryError crash, or long gc pauses and slow responses.
491-
Solution: Stream-through: use streaming all the way, don't store the whole thing in memory, don't use byte arrays.
492-
Often, functionality can be achieved in a streaming way.
493-
Note: not a problem for small files.
488+
<description>
489+
Problem: APIs like Files.readAllBytes, Files.readAllLines, Files.readString,
490+
InputStream.readAllBytes, IOUtils.toByteArray, ByteArrayInputStream,
491+
ByteArrayOutputStream.toByteArray, and ResizableByteArrayOutputStream.toByteArray
492+
load entire content into heap memory. This risks OutOfMemoryError, long GC pauses,
493+
and slow responses when processing large files.
494+
Solution: Use streaming APIs end-to-end, do not buffer whole files in memory.
495+
Note: Not a problem for small files. In that case use e.g.:
496+
@SuppressWarnings("PMD.AvoidLoadingAllFromFile") // only small files: max 1 kB.
497+
(jpinpoint-rules)
494498
(jpinpoint-rules)</description>
495499
<priority>2</priority>
496500
<properties>
497501
<property name="tags" value="jpinpoint-rule,memory,performance,sustainability-medium" type="String" description="classification"/>
498502
<property name="xpath">
499503
<value><![CDATA[
500-
//MethodDeclaration//MethodCall[pmd-java:matchesSig('java.nio.file.Files#readAllBytes(java.nio.file.Path)')
501-
or pmd-java:matchesSig('java.nio.file.Files#readAllLines(java.nio.file.Path)')
502-
or pmd-java:matchesSig('java.nio.file.Files#readAllLines(java.nio.file.Path,_)')
504+
//MethodCall[
505+
(TypeExpression[pmd-java:typeIs('java.nio.file.Files')] and @MethodName=('readAllBytes', 'readAllLines', 'readString'))
506+
or pmd-java:matchesSig('java.io.InputStream#readAllBytes()')
507+
or (TypeExpression[pmd-java:typeIs('org.apache.commons.io.IOUtils')] and @MethodName='toByteArray')
508+
or (VariableAccess[pmd-java:typeIs('java.io.ByteArrayOutputStream')] and @MethodName='toByteArray')
509+
or (VariableAccess[pmd-java:typeIs('org.springframework.util.ResizableByteArrayOutputStream')] and @MethodName='toByteArray')
503510
]
504-
]]></value>
511+
|
512+
//ConstructorCall[pmd-java:typeIs('java.io.ByteArrayInputStream')]
513+
]]></value>
505514
</property>
506515
</properties>
507-
<example>
508-
<![CDATA[
516+
<example><![CDATA[
509517
class Bad {
510-
void bad(Path path) {
511-
byte[] fileBytes = Files.readAllBytes(path); // bad
512-
List<String> fileLines = Files.readAllLines(path); // bad
513-
// process bytes / lines
518+
void bad(Path path, InputStream in) throws Exception {
519+
byte[] fileBytes = Files.readAllBytes(path); // bad
520+
List<String> fileLines = Files.readAllLines(path); // bad
521+
String s = Files.readString(path); // bad
522+
byte[] full = IOUtils.toByteArray(in); // bad
523+
ByteArrayInputStream bis = new ByteArrayInputStream(full); // bad
524+
ByteArrayOutputStream baos = new ByteArrayOutputStream();
525+
byte[] arr = baos.toByteArray(); // bad
514526
}
515527
}
516-
517528
class Good {
518-
void good(Path in) throws IOException {
519-
try (BufferedReader reader = Files.newBufferedReader(in)) {
520-
String line = reader.readLine();
521-
// process line by line
529+
void good(Path path, InputStream in) throws Exception {
530+
try (InputStream s = Files.newInputStream(path)) {
531+
IOUtils.copy(s, System.out); // streaming OK
532+
}
533+
try (BufferedReader reader = Files.newBufferedReader(path)) {
534+
String line = reader.readLine(); // streaming OK
522535
}
523536
}
524537
}
525-
]]>
538+
]]>
526539
</example>
527540
</rule>
528541

@@ -6124,10 +6137,9 @@ return HttpClientBuilder.create() // good, setConnectionManager called, pool con
61246137
language="java"
61256138
message="Apache HttpClient builder is used and not all three timeouts are configured."
61266139
externalInfoUrl="https://github.com/jborgers/PMD-jPinpoint-rules/tree/pmd7/docs/JavaCodePerformance.md#IBI10">
6127-
<description>Problem: for connectionRequestTimeout, connectTimeout, socketTimeout (using
6128-
HttpComponentsClientHttpRequestFactory/RequestConfig) or readTimeout/responseTimeout (using RequestConfig v4/v5) and connectTimeout (using ConnectionConfig v5)
6129-
the default timeout settings are not optimal in most cases. &#13;
6130-
Solution: Set the timeouts explicitly to proper reasoned values. See best practice values via the link. Use
6140+
<description>Problem: default timeout values for Apache HttpClient are often sub optimal (connectionRequestTimeout, connectTimeout, socketTimeout in HttpComponentsClientHttpRequestFactory;
6141+
readTimeout/responseTimeout in RequestConfig v4/v5; and connectTimeout in ConnectionConfig v5) &#13;
6142+
Solution: Set all these timeouts explicitly to proper reasoned values. See best practice values via the link. Use
61316143
the setDefaultRequestConfig with a method with a RequestConfig object on HttpClient builders to set the
61326144
timeouts. (jpinpoint-rules)
61336145
(jpinpoint-rules)</description>

rulesets/java/jpinpoint-rules.xml

Lines changed: 39 additions & 27 deletions
Original file line numberDiff line numberDiff line change
@@ -482,47 +482,60 @@ class IterationsBetter {
482482

483483
<rule name="AvoidLoadingAllFromFile"
484484
language="java"
485-
message="Files.readAll methods load all bytes from a file into memory: a risk of memory problems."
485+
message="Files.readAll methods and similar APIs load all bytes into memory: a risk of memory problems."
486486
class="net.sourceforge.pmd.lang.rule.xpath.XPathRule"
487-
488487
externalInfoUrl="https://github.com/jborgers/PMD-jPinpoint-rules/tree/pmd7/docs/JavaCodePerformance.md#isio03">
489-
<description>Problem: Files.readAllBytes and Files.readAllLines load all bytes from a file into the heap memory.
490-
This may result in an OutOfMemoryError crash, or long gc pauses and slow responses.
491-
Solution: Stream-through: use streaming all the way, don't store the whole thing in memory, don't use byte arrays.
492-
Often, functionality can be achieved in a streaming way.
493-
Note: not a problem for small files.
488+
<description>
489+
Problem: APIs like Files.readAllBytes, Files.readAllLines, Files.readString,
490+
InputStream.readAllBytes, IOUtils.toByteArray, ByteArrayInputStream,
491+
ByteArrayOutputStream.toByteArray, and ResizableByteArrayOutputStream.toByteArray
492+
load entire content into heap memory. This risks OutOfMemoryError, long GC pauses,
493+
and slow responses when processing large files.
494+
Solution: Use streaming APIs end-to-end, do not buffer whole files in memory.
495+
Note: Not a problem for small files. In that case use e.g.:
496+
@SuppressWarnings("PMD.AvoidLoadingAllFromFile") // only small files: max 1 kB.
497+
(jpinpoint-rules)
494498
(jpinpoint-rules)</description>
495499
<priority>2</priority>
496500
<properties>
497501
<property name="tags" value="jpinpoint-rule,memory,performance,sustainability-medium" type="String" description="classification"/>
498502
<property name="xpath">
499503
<value><![CDATA[
500-
//MethodDeclaration//MethodCall[pmd-java:matchesSig('java.nio.file.Files#readAllBytes(java.nio.file.Path)')
501-
or pmd-java:matchesSig('java.nio.file.Files#readAllLines(java.nio.file.Path)')
502-
or pmd-java:matchesSig('java.nio.file.Files#readAllLines(java.nio.file.Path,_)')
504+
//MethodCall[
505+
(TypeExpression[pmd-java:typeIs('java.nio.file.Files')] and @MethodName=('readAllBytes', 'readAllLines', 'readString'))
506+
or pmd-java:matchesSig('java.io.InputStream#readAllBytes()')
507+
or (TypeExpression[pmd-java:typeIs('org.apache.commons.io.IOUtils')] and @MethodName='toByteArray')
508+
or (VariableAccess[pmd-java:typeIs('java.io.ByteArrayOutputStream')] and @MethodName='toByteArray')
509+
or (VariableAccess[pmd-java:typeIs('org.springframework.util.ResizableByteArrayOutputStream')] and @MethodName='toByteArray')
503510
]
504-
]]></value>
511+
|
512+
//ConstructorCall[pmd-java:typeIs('java.io.ByteArrayInputStream')]
513+
]]></value>
505514
</property>
506515
</properties>
507-
<example>
508-
<![CDATA[
516+
<example><![CDATA[
509517
class Bad {
510-
void bad(Path path) {
511-
byte[] fileBytes = Files.readAllBytes(path); // bad
512-
List<String> fileLines = Files.readAllLines(path); // bad
513-
// process bytes / lines
518+
void bad(Path path, InputStream in) throws Exception {
519+
byte[] fileBytes = Files.readAllBytes(path); // bad
520+
List<String> fileLines = Files.readAllLines(path); // bad
521+
String s = Files.readString(path); // bad
522+
byte[] full = IOUtils.toByteArray(in); // bad
523+
ByteArrayInputStream bis = new ByteArrayInputStream(full); // bad
524+
ByteArrayOutputStream baos = new ByteArrayOutputStream();
525+
byte[] arr = baos.toByteArray(); // bad
514526
}
515527
}
516-
517528
class Good {
518-
void good(Path in) throws IOException {
519-
try (BufferedReader reader = Files.newBufferedReader(in)) {
520-
String line = reader.readLine();
521-
// process line by line
529+
void good(Path path, InputStream in) throws Exception {
530+
try (InputStream s = Files.newInputStream(path)) {
531+
IOUtils.copy(s, System.out); // streaming OK
532+
}
533+
try (BufferedReader reader = Files.newBufferedReader(path)) {
534+
String line = reader.readLine(); // streaming OK
522535
}
523536
}
524537
}
525-
]]>
538+
]]>
526539
</example>
527540
</rule>
528541

@@ -6124,10 +6137,9 @@ return HttpClientBuilder.create() // good, setConnectionManager called, pool con
61246137
language="java"
61256138
message="Apache HttpClient builder is used and not all three timeouts are configured."
61266139
externalInfoUrl="https://github.com/jborgers/PMD-jPinpoint-rules/tree/pmd7/docs/JavaCodePerformance.md#IBI10">
6127-
<description>Problem: for connectionRequestTimeout, connectTimeout, socketTimeout (using
6128-
HttpComponentsClientHttpRequestFactory/RequestConfig) or readTimeout/responseTimeout (using RequestConfig v4/v5) and connectTimeout (using ConnectionConfig v5)
6129-
the default timeout settings are not optimal in most cases. &#13;
6130-
Solution: Set the timeouts explicitly to proper reasoned values. See best practice values via the link. Use
6140+
<description>Problem: default timeout values for Apache HttpClient are often sub optimal (connectionRequestTimeout, connectTimeout, socketTimeout in HttpComponentsClientHttpRequestFactory;
6141+
readTimeout/responseTimeout in RequestConfig v4/v5; and connectTimeout in ConnectionConfig v5) &#13;
6142+
Solution: Set all these timeouts explicitly to proper reasoned values. See best practice values via the link. Use
61316143
the setDefaultRequestConfig with a method with a RequestConfig object on HttpClient builders to set the
61326144
timeouts. (jpinpoint-rules)
61336145
(jpinpoint-rules)</description>

src/main/resources/category/java/common.xml

Lines changed: 36 additions & 23 deletions
Original file line numberDiff line numberDiff line change
@@ -2141,47 +2141,60 @@ class Foo {
21412141

21422142
<rule name="AvoidLoadingAllFromFile"
21432143
language="java"
2144-
message="Files.readAll methods load all bytes from a file into memory: a risk of memory problems."
2144+
message="Files.readAll methods and similar APIs load all bytes into memory: a risk of memory problems."
21452145
class="net.sourceforge.pmd.lang.rule.xpath.XPathRule"
2146-
21472146
externalInfoUrl="${doc_root}/JavaCodePerformance.md#isio03">
2148-
<description>Problem: Files.readAllBytes and Files.readAllLines load all bytes from a file into the heap memory.
2149-
This may result in an OutOfMemoryError crash, or long gc pauses and slow responses.
2150-
Solution: Stream-through: use streaming all the way, don't store the whole thing in memory, don't use byte arrays.
2151-
Often, functionality can be achieved in a streaming way.
2152-
Note: not a problem for small files.
2147+
<description>
2148+
Problem: APIs like Files.readAllBytes, Files.readAllLines, Files.readString,
2149+
InputStream.readAllBytes, IOUtils.toByteArray, ByteArrayInputStream,
2150+
ByteArrayOutputStream.toByteArray, and ResizableByteArrayOutputStream.toByteArray
2151+
load entire content into heap memory. This risks OutOfMemoryError, long GC pauses,
2152+
and slow responses when processing large files.
2153+
Solution: Use streaming APIs end-to-end, do not buffer whole files in memory.
2154+
Note: Not a problem for small files. In that case use e.g.:
2155+
@SuppressWarnings("PMD.AvoidLoadingAllFromFile") // only small files: max 1 kB.
2156+
(jpinpoint-rules)
21532157
</description>
21542158
<priority>2</priority>
21552159
<properties>
21562160
<property name="tags" value="jpinpoint-rule,memory,performance,sustainability-medium" type="String" description="classification"/>
21572161
<property name="xpath">
21582162
<value><![CDATA[
2159-
//MethodDeclaration//MethodCall[pmd-java:matchesSig('java.nio.file.Files#readAllBytes(java.nio.file.Path)')
2160-
or pmd-java:matchesSig('java.nio.file.Files#readAllLines(java.nio.file.Path)')
2161-
or pmd-java:matchesSig('java.nio.file.Files#readAllLines(java.nio.file.Path,_)')
2163+
//MethodCall[
2164+
(TypeExpression[pmd-java:typeIs('java.nio.file.Files')] and @MethodName=('readAllBytes', 'readAllLines', 'readString'))
2165+
or pmd-java:matchesSig('java.io.InputStream#readAllBytes()')
2166+
or (TypeExpression[pmd-java:typeIs('org.apache.commons.io.IOUtils')] and @MethodName='toByteArray')
2167+
or (VariableAccess[pmd-java:typeIs('java.io.ByteArrayOutputStream')] and @MethodName='toByteArray')
2168+
or (VariableAccess[pmd-java:typeIs('org.springframework.util.ResizableByteArrayOutputStream')] and @MethodName='toByteArray')
21622169
]
2163-
]]></value>
2170+
|
2171+
//ConstructorCall[pmd-java:typeIs('java.io.ByteArrayInputStream')]
2172+
]]></value>
21642173
</property>
21652174
</properties>
2166-
<example>
2167-
<![CDATA[
2175+
<example><![CDATA[
21682176
class Bad {
2169-
void bad(Path path) {
2170-
byte[] fileBytes = Files.readAllBytes(path); // bad
2171-
List<String> fileLines = Files.readAllLines(path); // bad
2172-
// process bytes / lines
2177+
void bad(Path path, InputStream in) throws Exception {
2178+
byte[] fileBytes = Files.readAllBytes(path); // bad
2179+
List<String> fileLines = Files.readAllLines(path); // bad
2180+
String s = Files.readString(path); // bad
2181+
byte[] full = IOUtils.toByteArray(in); // bad
2182+
ByteArrayInputStream bis = new ByteArrayInputStream(full); // bad
2183+
ByteArrayOutputStream baos = new ByteArrayOutputStream();
2184+
byte[] arr = baos.toByteArray(); // bad
21732185
}
21742186
}
2175-
21762187
class Good {
2177-
void good(Path in) throws IOException {
2178-
try (BufferedReader reader = Files.newBufferedReader(in)) {
2179-
String line = reader.readLine();
2180-
// process line by line
2188+
void good(Path path, InputStream in) throws Exception {
2189+
try (InputStream s = Files.newInputStream(path)) {
2190+
IOUtils.copy(s, System.out); // streaming OK
2191+
}
2192+
try (BufferedReader reader = Files.newBufferedReader(path)) {
2193+
String line = reader.readLine(); // streaming OK
21812194
}
21822195
}
21832196
}
2184-
]]>
2197+
]]>
21852198
</example>
21862199
</rule>
21872200

0 commit comments

Comments
 (0)