ChatGPT, My New Statistics Tutor

ChatGPT can teach new students or refresh the minds of older statisticians

Oct 25, 2025

If you’ve ever wrestled with “What is the correct way to compare returns?”, this post is for you. I walk through a real question: arithmetic means mislead for stock returns, geometric means fix the compounding logic — but how do we test differences in geometric returns statistically? No, standard software doesn’t have a button called “t-test for geometric means”… but yes, there is a clean solution.

My question:

I have some questions about the correct way to conduct comparisons and statistical tests on differences in mean returns on stock prices. Here is my problem. The arithmetic mean of stock returns is an inappropriate measure of return: 100 percent up followed by 50 percent down gives a 0 percent change, but the mean is 25 percent. The geometric return gives the correct point estimate, but what is the standard error of the geometric return, and how do I test for a difference in geometric returns between two stocks? Do standard software packages have the standard error of the geometric return or an appropriate t-test? What is the best solution? A log transformation?

ChatGPT said:

You are raising all the right issues; this is a subtle point.

1) Why arithmetic vs geometric matters

The arithmetic mean of returns is the correct estimator of the expected one-period return.
The geometric mean is the natural estimator of the long-run compounded return; its logarithm, the mean log return, is an unbiased estimator of the expected log return.

Your 100% up then –50% example shows why the arithmetic mean is not a valid compounded measure. But arithmetic is still the mathematically correct object for one-period expectations.
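The arithmetic of the 100% up, 50% down example can be checked in a few lines. This is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the original exchange.

```python
import math

# Net returns from the example in the question: +100% then -50%.
returns = [1.00, -0.50]

# Arithmetic mean: the plain average of the one-period returns.
arithmetic_mean = sum(returns) / len(returns)

# Geometric mean: exp of the average log gross return, minus 1.
mean_log = sum(math.log(1 + r) for r in returns) / len(returns)
geometric_mean = math.exp(mean_log) - 1

# Value of one unit invested and held through both periods.
terminal_value = math.prod(1 + r for r in returns)

print(f"arithmetic mean: {arithmetic_mean:.2%}")
print(f"geometric mean:  {geometric_mean:.2%}")
print(f"terminal value:  {terminal_value:.2f}")
```

The arithmetic mean comes out at 25 percent even though the investor ends with exactly the starting amount, which is what the geometric mean of 0 percent reports.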

So the question becomes:

If I want to statistically compare long-run compounded returns between two assets, what do I test on?

2) Work in log space

Let $R_t = 1 + r_t$ be the gross return in period $t$. Define

$$g = \frac{1}{T}\sum_{t=1}^{T} \ln(1 + r_t)$$

Then the geometric mean return is $\exp(g) - 1$.

Key fact: g is just a sample mean of random variables that are iid (or at least stationary and only weakly dependent). That means:

Its standard error is just the standard error of the sample mean of logs:

$$SE(g) = \frac{s_{\ln(1+r)}}{\sqrt{T}}$$

And differences between two geometric means can be tested using a standard two-sample t-test on log returns, not on the exponentiated result.
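The log-space recipe above can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the return series are made-up numbers, the helper names `log_return_summary` and `welch_t` are hypothetical, and the test shown is the unequal-variance (Welch) form of the two-sample t-test, which treats the two series as independent samples. Only the standard library is used, so the code stops at the t statistic and approximate degrees of freedom rather than computing a p-value.

```python
import math
import statistics

def log_return_summary(net_returns):
    """Return (g, SE(g), T) for a list of net returns r_t."""
    logs = [math.log(1 + r) for r in net_returns]
    T = len(logs)
    g = statistics.mean(logs)
    # statistics.stdev is the sample standard deviation (T - 1 denominator).
    se = statistics.stdev(logs) / math.sqrt(T)
    return g, se, T

def welch_t(net_a, net_b):
    """Welch t statistic and approximate degrees of freedom for g_a - g_b."""
    g_a, se_a, n_a = log_return_summary(net_a)
    g_b, se_b, n_b = log_return_summary(net_b)
    var_sum = se_a**2 + se_b**2
    t_stat = (g_a - g_b) / math.sqrt(var_sum)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = var_sum**2 / (se_a**4 / (n_a - 1) + se_b**4 / (n_b - 1))
    return t_stat, df

# Made-up monthly net returns, for illustration only.
stock_a = [0.04, -0.02, 0.03, 0.01, -0.01, 0.05, 0.02, -0.03]
stock_b = [0.01, 0.00, 0.02, -0.01, 0.01, 0.00, 0.02, -0.01]

g_a, se_a, _ = log_return_summary(stock_a)
t_stat, df = welch_t(stock_a, stock_b)
print(f"g_a = {g_a:.5f}, SE(g_a) = {se_a:.5f}")
print(f"geometric mean return of A = {math.exp(g_a) - 1:.4%}")
print(f"Welch t = {t_stat:.3f} on about {df:.1f} degrees of freedom")
```

When both stocks are observed over the same dates, their returns are usually correlated, so a paired t-test on the per-period differences in log returns is a common alternative to the independent-samples form sketched here.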

3) Standard software

Yes — every standard package (R, Python, Stata, SAS, Matlab) gives you:

Sample mean of log(1+r)
Standard error of that mean
A t-test for mean difference between two series

You just do:

take log returns → test arithmetic difference in means of logs

There is no need for a “special test for geometric means”.

4) Why you should not run a t-test on exponentiated (compounded) returns

Exponentiation is nonlinear; after you exponentiate, the distribution becomes skewed and the delta method or bootstrapping is needed for inference. Working on logs (where the distribution is better behaved and additive) avoids that entirely.
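If a standard error is wanted on the exponentiated scale anyway, the delta method mentioned above gives a first-order approximation: since the derivative of exp(g) - 1 with respect to g is exp(g), SE(exp(g) - 1) is roughly exp(g) times SE(g). The sketch below assumes iid log returns, uses made-up data, and the helper name `geometric_return_with_se` is hypothetical.

```python
import math
import statistics

def geometric_return_with_se(net_returns):
    """Geometric mean return and its first-order delta-method standard error."""
    logs = [math.log(1 + r) for r in net_returns]
    g = statistics.mean(logs)
    se_g = statistics.stdev(logs) / math.sqrt(len(logs))
    geo = math.exp(g) - 1
    # Delta method: d/dg [exp(g) - 1] = exp(g), so SE(geo) ~ exp(g) * SE(g).
    se_geo = math.exp(g) * se_g
    return geo, se_geo, g, se_g

# Made-up monthly net returns, for illustration only.
returns = [0.04, -0.02, 0.03, 0.01, -0.01, 0.05, 0.02, -0.03]
geo, se_geo, g, se_g = geometric_return_with_se(returns)

# Interval built in log space, then mapped back to the return scale.
# 1.96 is the normal critical value, a rough choice for a sample this small.
low = math.exp(g - 1.96 * se_g) - 1
high = math.exp(g + 1.96 * se_g) - 1
print(f"geometric mean = {geo:.4%}, delta-method SE = {se_geo:.4%}")
print(f"approximate 95% interval: [{low:.4%}, {high:.4%}]")
```

Building the interval in log space and exponentiating its endpoints keeps the inference on the well-behaved scale; the resulting interval is asymmetric around the geometric mean, which reflects the skew that exponentiation introduces.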

5) Short summary — the best practice

So yes: log transformation is the standard solution.

If you want I can:

Show a quick numerical worked example with two stocks
Or show R / Python / Stata code for running the correct test
Or talk about non-iid returns (heteroskedasticity, serial corr, bootstrap variants)

Tell me which direction you want next.

Check out this column on whether it is now time to invest in food stocks.

Six months free:

https://bernsteinbook1958.substack.com/cea31403

50 percent off annual membership ($30 total):

https://bernsteinbook1958.substack.com/4d9daaf9

Economic and Political Insights
