or, logically invalid CoT prompting gains substantial performance by introducing double negations into English statements. Since standard written English avoids stacked negatives, the model tends to collapse them: an expression such as not ( not not True ) gets rewritten as not ( not True ) and, with the remaining double negation collapsed in the same way, evaluated to False, which is the correct final answer reached through logically wrong intermediate steps (see the sketch after this list).
• Invalid CoT achieves ~100% accuracy on the Tracking Shuffled Objects task, whereas answer-only and valid CoT prompts achieve ~85% accuracy. Similar large gaps appear on other BIG-Bench Hard tasks, e.g., Word Sorting, Dyck Languages, Formal Fallacies, and Logical Deduction (seven objects), with both Codex and InstructGPT.
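A minimal Python sketch of the double-negation example above; the step strings are illustrative assumptions, not the paper's exact demonstrations:

```python
# Correct evaluation of the Boolean expression.
expr = "not (not not True)"
print(eval(expr))         # False: not not True -> True, then not True -> False

# Logically invalid chain that drops one negation at a time, mimicking a model
# biased toward written English, which avoids stacked negatives.
step1 = "not (not True)"  # wrong step: dropping a negation flips the value
step2 = "not True"        # wrong step again: drops another negation
print(eval(step2))        # False: the two wrong steps cancel, so the final
                          # answer still matches the correct one
```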