Memoization fails for large input streams


When the input stream becomes large (~1.1M nodes without children, or ~6K nodes with children), the ID used to track memos in the memo table ceases to be unique. This causes "garbage" production matches, with no error reported.

Attached is a patch that uses a first-class key to identify the memos, so lookups stay unique on largish inputs. It also adds a check after memo retrieval that fails if the memo does not refer to the expected input position. This check can be removed once confidence is gained...

Closed Sep 19, 2013 at 2:24 PM by justinc
Accepted patch from Aetheon.


justinc wrote Feb 2, 2012 at 6:21 PM

You mentioned earlier that you had a script that was provoking the collisions. Do you have that handy, by any chance? I would really like to create a unit test so I can ensure that it's fixed and that we're not regressing this issue...

justinc wrote Feb 4, 2012 at 6:47 PM

Unfortunately, with this patch the time it takes to run all the tests goes from about 15s to 30s on my laptop... I will run some perf analysis and see what I can do here.

justinc wrote Feb 4, 2012 at 10:37 PM

Committed #4e896f18ec7c. I will wait for the grammar to reproduce this and make a test before closing this.