When trying to open the interpreted results of a query run that has produced a SARIF results file larger than 4 GB, we get an error like this:
[2021-01-28 18:21:22] CSV_IMB_QUERIES: Query,edges#query#ffffffffffffff nodes#query#fffffffff #select#query#ffffffffffffffffffffff,padlockws2-2.ql,26,Success,291.651,407918,291939
Exception during results interpretation: Reading output of interpretation failed: RangeError [ERR_FS_FILE_TOO_LARGE]: File size (6638382197) is greater than possible Buffer: 4294967295 bytes. Will show raw results instead.
Node limits the size of a buffer to 4294967295 bytes (2^32 - 1, the figure reported in the error above), and strings are capped as well, even on machines that have enough RAM to support more.
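For reference, the limits can be read from the running Node build rather than hard-coded. This is only a rough sketch: `exceedsBufferLimit` is a hypothetical helper name, not something that exists in the extension, and the exact threshold at which a whole-file read fails may differ between Node versions.

```typescript
import * as fs from "fs";
import { constants } from "buffer";

// Hypothetical helper: true when the file is larger than the biggest Buffer
// this Node build can allocate, so it cannot be read in one piece.
function exceedsBufferLimit(filePath: string): boolean {
  return fs.statSync(filePath).size > constants.MAX_LENGTH;
}

// Both limits are exposed by the standard "buffer" module.
console.log(`max Buffer length: ${constants.MAX_LENGTH} bytes`);
console.log(`max string length: ${constants.MAX_STRING_LENGTH} UTF-16 code units`);
```

A check like this could let us choose a different read path up front instead of hitting the `RangeError` first.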
The parsed version of the SARIF results could fit in memory, even if the raw string cannot. It's possible that a streaming JSON parser such as JSONStream could work, but I need to explore this library in more detail and make sure it is safe and stable before we can use it.
I don't think it is a good idea to roll our own streaming parser if a suitable OSS one is available: it would be a fair amount of work, and getting the edge cases right is tricky.
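To illustrate why a streamed read sidesteps the limit, here is a minimal sketch using only Node's standard library. It is not a JSON parser and is not meant to replace one: `forEachChunk` is a hypothetical helper, and the `onChunk` callback only marks the place where a streaming parser would consume the data. The 64 KiB chunk size is an arbitrary assumption.

```typescript
import * as fs from "fs";

// Hypothetical helper: read a file chunk by chunk and pass each chunk to
// `onChunk`. The file is never held in a single Buffer or string, so the
// size limit applies per chunk rather than to the whole file.
async function forEachChunk(
  filePath: string,
  onChunk: (chunk: Buffer) => void
): Promise<number> {
  let totalBytes = 0;
  // No encoding is set, so the stream yields Buffer chunks.
  const stream = fs.createReadStream(filePath, { highWaterMark: 64 * 1024 });
  for await (const chunk of stream) {
    const buf = chunk as Buffer;
    totalBytes += buf.length;
    onChunk(buf); // a streaming JSON parser would take over here
  }
  return totalBytes;
}
```

Only the parsed results would then need to fit in memory, which matches the expectation above that the parsed SARIF is small enough even when the raw text is not.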
Suggested breakdown:
- Add JSONStream as a dependency
- Use JSONStream when reading the SARIF file produced by results interpretation
- RangeError