Reasoning skills of large language models are often overestimated