The problem (when it exists) is simply due to the level of compression used, and the number of times the voice message is digitized, converted back to analog, digitized again, and so on ad infinitum. Packet drop rates and latency also affect the overall quality of the signal.
Way back when, a guy named Claude Shannon at Bell Labs proved a theorem (the Shannon, or Nyquist-Shannon, sampling theorem) that says you have to sample a signal at twice its bandwidth to maintain the integrity (fidelity) of the input. That is why digital (T-1) carriers sample at 8 kHz (the voice band is roughly 4 kHz wide), and the channel bit rate is 64 kbps (8 bits x 8 kHz sample rate). Way oversimplified: by sampling at that rate you get a piece of each positive and negative swing of the sine wave of any signal. Whew... Now when the voice gets compressed to, say, 16 kbps, something gets lost. (That, I believe, is known as Blivot's Law.) In most point-to-point connections, the loss in fidelity is not noticed. But now set up a series of analog-to-digital-to-analog conversions, each one compressed, and the distortion can become noticeable and objectionable.
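The arithmetic above is easy to sanity-check yourself. Here's a quick back-of-the-envelope sketch (the function names and the 4 kHz voice-band figure are mine, just for illustration):

```python
def nyquist_rate(bandwidth_hz):
    """Minimum sampling rate per the sampling theorem: twice the bandwidth."""
    return 2 * bandwidth_hz

def channel_bitrate(sample_rate_hz, bits_per_sample):
    """Bit rate of an uncompressed PCM channel."""
    return sample_rate_hz * bits_per_sample

# Telephone voice band is roughly 4 kHz wide.
voice_band_hz = 4000
fs = nyquist_rate(voice_band_hz)       # 8000 samples/sec -> the T-1 sample rate
ds0 = channel_bitrate(fs, 8)           # 64000 bit/s -> the 64 kbps channel

# A codec squeezing that down to 16 kbps keeps only a quarter of the bits;
# that's where "something gets lost."
ratio = ds0 / 16000

print(fs, ds0, ratio)  # 8000 64000 4.0
```

Each trip through another compressed analog/digital conversion loses a little more, which is why the losses stack up the way the paragraph above describes.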
Again, there are a lot of variables, and I don't know of any proven guidelines to measure or plan against. It is a bit caveat emptor. I know of networks with voice mail that function quite satisfactorily. I also know of instances where it has been pretty ugly.
Hope this helps; or confuses you totally.