I had some fun asking ChatGPT about cases from “Counterexamples in Analysis.” You get this kind of uncanny valley math, syntactically and stylistically correct but still wildly wrong.
This was a response to “Prove or disprove: there exists a nowhere continuous function whose absolute value is everywhere continuous.” It responded in TeX, which I copied into a TeX editor.
Another answer to the same question:
If I ask Bing the same question, it tells me about something called the “very not continuous function” (lol):
I can’t find the term “very not continuous function” anywhere on the web except this page, the one Bing cites.
The page looks kind of click-farm-like, and it’s not clear what function it means by “the very not continuous function.” But it does discuss the question I asked, so at least there’s that.
Anyway, it’s not web search relevance that I care about here – it’s math ability.
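For reference, the statement in the first question is true, and a short witness is a signed version of the Dirichlet function. The sketch below is my own write-up, not a quote from the book or from any of the models, and the names f, x_0 and δ are just my notation. It assumes amsmath and amssymb are loaded.

```latex
% Assumes \usepackage{amsmath, amssymb} in the preamble.
% Claim: there is a nowhere continuous f whose absolute value is continuous.
\[
  f(x) =
  \begin{cases}
    1,  & x \in \mathbb{Q}, \\
    -1, & x \notin \mathbb{Q}.
  \end{cases}
\]
% |f| is the constant function 1, hence continuous everywhere.
\[
  |f(x)| = 1 \quad \text{for all } x \in \mathbb{R}.
\]
% f itself is continuous nowhere: fix any x_0 and any \delta > 0.
% The interval (x_0 - \delta, x_0 + \delta) contains both rationals and
% irrationals, so it contains a point x where f has the opposite sign:
\[
  |x - x_0| < \delta
  \quad \text{and} \quad
  |f(x) - f(x_0)| = 2 .
\]
% So no \delta works for \varepsilon = 1, and f is discontinuous at x_0.
```

So the right answer is “prove,” and the proof is a few lines long.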
I tried again with Bing, this time with a different “Counterexamples in Analysis” case, an injunction not to perform a web search, and a half-hearted nod to chain-of-thought prompting.
The resulting discussion was an adventure in Helpful™ overconfidence:
(I said “bing ai” in the last screenshot due to a bizarre UI decision by Microsoft that makes it very easy to say “bing ai” to Bing without wanting or intending to. Don’t ask me, I didn’t do it ¯\_(ツ)_/¯ )
Here’s GPT-4 (on poe.com) answering the first of the two questions:
Update: I tried the second example with GPT-4 (via ChatGPT Plus).
It struggles in a similar manner to Bing. As with Bing, my attempts to reason with it do not work very well.
Maybe there’s a way of phrasing the responses that would make it think more carefully about their meaning and implications?
It’s hard to guess what will work because of the involvement of RLHF. (Otherwise I could just ask myself what a desirable version of this interaction might have looked like in the training data.)
Unfortunately, GPT-4 itself is inherently RLHF’d – a base model exists, but they aren’t exposing it to us, and I don’t see a reason to expect they ever will.
Screenshots under the cut