We've been investigating a memory leak where sockets remain stuck in CLOSE_WAIT. The sockets are being used to pull blob files (~250 KB–1 MB in size) from Amazon S3. The code we use to pull the data is built on the Knox library [https://github.com/Automattic/knox], but that's really just a wrapper around http.ClientRequest.
The code is straightforward and essentially boils down to:
```javascript
var req = https.request(options);
req.end();
req.on('response', function (read_stream) {
  read_stream.on('data', function (chunk) {
    // buffer data
  });
  read_stream.on('end', function () {
    // call our main callback with the buffered data
  });
});
```
I cannot reproduce this issue locally, but in production I'll have sockets stuck in this state even 10 minutes after a restart. Also, what I thought was just a memory leak appears to result in us silently failing to complete client requests for the S3 data.
After turning on debug logging of the net.js module and cross-referencing it against a tcpdump, I found that handle.readStop() is being called [https://github.com/nodejs/node/blob/v0.10.29-release/lib/net.js#L533] during the data transfer. After this, we never read from that socket again. The amount of data left in the kernel's Recv-Q for that socket (via netstat) is equal to the remainder shown in the tcpdump output. That is, the amount of data that node processed (calculated via https://github.com/nodejs/node/blob/v0.10.29-release/lib/net.js#L504) plus the remainder in the kernel equals the total sent from the remote host.
- Any idea why `readStop` is being called, but `readStart` is not, given that there is still data to read?
- Is it worth trying to switch from reading the stream via `data` events to using the `readable` event with the `read()` method (streams2)? Would that even make any difference?
- Any thoughts on how to further debug this? I've tried to reproduce it by lowering the stream's `highWaterMark` to get `readStop()` to fire, but even when `readStop` gets called in these tests, the stream always ends up resuming. I've tried hitting S3 with both low and high load but still cannot reproduce. Any suggestions?
Thanks,
Dave
v0.10.36
Linux hd1app1 3.13.0-83-generic #127-Ubuntu SMP Fri Mar 11 00:25:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
net.js, http.js, _stream_readable.js