
Response Rate Testing and Statistical Significance

One of the more straightforward analyses we often do at Analytical Ones is to compare the results of a direct mail test to identify whether the differences are statistically significant.

Though this is a straightforward analysis, there is a lot to these tests. So, let me try to clarify a couple of things.

Generally, we are testing to determine whether the differences in either the response rates or the average gift sizes are statistically significant. These two tests are very different and require different data sets and methods.

Let’s take the easy one first: testing the statistical difference in response rates. Because there are just two outcomes for response (yes, the donor responded, denoted by a value of “1”, or no, the donor did not respond, denoted by a value of “0”), you do not need donor-level records for this test. Summary counts are enough. We recommend using a two-proportion Z test.

All you need for testing response rates are:

1. Number of pieces mailed in the control
2. Number of responses in the control
3. Number of pieces mailed in the test
4. Number of responses in the test
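
For readers who want to see how those four numbers turn into a Z statistic, here is a minimal sketch in Python. It assumes the standard pooled two-proportion Z test with a two-sided p-value. The function name, the parameter names, and the mail counts in the usage lines are hypothetical, made up for illustration, and not taken from a real campaign.

```python
import math

def two_proportion_z_test(mailed_control, responses_control,
                          mailed_test, responses_test):
    """Return (z, two-sided p-value) for the difference in response rates."""
    rate_control = responses_control / mailed_control
    rate_test = responses_test / mailed_test

    # Pooled response rate: the best single estimate if both packages
    # really perform the same (the "no difference" assumption).
    pooled = (responses_control + responses_test) / (mailed_control + mailed_test)

    # Standard error of the difference in rates under that assumption.
    std_error = math.sqrt(pooled * (1 - pooled)
                          * (1 / mailed_control + 1 / mailed_test))

    z = (rate_test - rate_control) / std_error

    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical example: 10,000 pieces mailed per panel,
# 200 responses to the control and 250 to the test.
z, p_value = two_proportion_z_test(10_000, 200, 10_000, 250)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A positive z means the test package had the higher response rate. The p-value is the chance of seeing a gap at least this large if the two packages truly performed the same.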

You also need to know the level of confidence you are comfortable with. Confidence levels are conventional standards. Let me try to explain this.

Typically, in Z tests, analysts use one of three confidence levels:

1. 99%
2. 95%
3. 90%

If your results are significant at the 99% confidence level, it means that if there were really no difference between the test and control packages, a difference as large as the one you observed would show up by chance no more than 1 time in 100. That’s a very high standard. Conversely, if your results are significant at the 90% confidence level, a difference that large would show up by chance no more than 10 times in 100. That is still a strong level of confidence, but not as high a standard as 99%.

The confidence level you require should match the risk of making a wrong change: the higher the stakes, the higher the confidence level you should demand. For direct marketing tests, we recommend using a 90% level of confidence. No one is going to die if we make a bad decision, unlike in pharmaceutical testing. So, if the test package beats the control with a 90% level of confidence, the recommendation would be to roll out with the new test package.
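
To make the three confidence levels concrete, the sketch below converts each one into the Z value a result has to reach. It assumes a two-sided test and uses Python's standard-library `statistics.NormalDist`. The helper names `critical_z` and `is_significant` are hypothetical, chosen only for this illustration.

```python
from statistics import NormalDist

def critical_z(confidence):
    """Z value a two-sided test must reach at the given confidence level."""
    alpha = 1 - confidence
    # Split alpha across both tails of the standard normal distribution.
    return NormalDist().inv_cdf(1 - alpha / 2)

def is_significant(z, confidence=0.90):
    """True if the observed Z statistic clears the chosen confidence level."""
    return abs(z) >= critical_z(confidence)

for level in (0.99, 0.95, 0.90):
    print(f"{level:.0%} confidence: |z| must be at least {critical_z(level):.3f}")
```

Under these assumptions, a Z statistic of about 2.4 would clear the 90% and 95% thresholds but fall short at 99%. This is why the choice of confidence level can change the rollout decision.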

In my next blog, I’ll discuss testing average gift size differences.

By the way, here is a link to an online tool you can use to test response rate differences.
