TEBenchmark: Look for examples of rejected logically valid outputs



A common complaint about the paper is that we check outputs by syntactic
equality, whereas logically more powerful checks could be used.

We have the output data as JSON, so we should look through it to see
whether this is a problem in practice, i.e. whether any logically valid
outputs were actually rejected.
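
A rough Python sketch of how such a search might start. Everything about
the data layout here is an assumption: the field names "wanted", "found",
"lhs" and "rhs" are hypothetical and would need to be replaced with
whatever the real JSON uses. The only "logically stronger" check it tries
is swapping the two sides of an equation, which the existing comparison
may already handle; it is just the simplest stand-in for flagging
candidates to inspect by hand.

```python
import json


def syntactic_key(eq):
    """Canonical JSON text of an equation; equal keys mean syntactically equal."""
    return json.dumps(eq, sort_keys=True)


def flipped(eq):
    """The same equation with its two sides swapped (symmetry of equality)."""
    return {**eq, "lhs": eq["rhs"], "rhs": eq["lhs"]}


def rejected_but_symmetric(result):
    """Found equations with no syntactic match whose flipped form is wanted."""
    wanted = {syntactic_key(eq) for eq in result["wanted"]}
    return [
        eq
        for eq in result["found"]
        if syntactic_key(eq) not in wanted
        and syntactic_key(flipped(eq)) in wanted
    ]


# Hypothetical sample data; terms are plain strings purely for brevity.
sample = json.loads("""
{
  "wanted": [
    {"lhs": "plus x y",    "rhs": "plus y x"},
    {"lhs": "plus x zero", "rhs": "x"}
  ],
  "found": [
    {"lhs": "x",         "rhs": "plus x zero"},
    {"lhs": "plus x y",  "rhs": "plus y x"},
    {"lhs": "times x y", "rhs": "times y x"}
  ]
}
""")

for eq in rejected_but_symmetric(sample):
    print(eq["lhs"], "=", eq["rhs"])
```

Anything this reports would still need checking by hand. Outputs that are
valid for other reasons (e.g. renamed variables, or consequences of the
wanted equations) would need a stronger comparison than this one.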