No plugin required. This is entirely possible by translating Software Automatic Mouth (SAM, a 1980s speech synthesiser that didn't use filters) into Xpressive time-domain expressions. There are a lot of reverse-engineered SAM ports on GitHub. I am still learning and have had limited success so far. Realistically, the "rendering / parsing" could be done externally in Python, Matlab or C++, with the resulting expression pasted into Xpressive by hand.
Paste this proof of concept into Xpressive on the Alpha release and play at C2 (the syntax has changed on the nightly build).
clamp(-1, floor((((t >= 0 & t < 0.06) * (0.9 * randv(t*480)) * (min(1, (t-0)*120) * min(1, (0.06-t)*120))) + ((t >= 0.06 & t < 0.2616) * ((0.75*sinew(integrate(370.5)) + 0.5*sinew(integrate(1306.5)) + 0.25*sinew(integrate(1774.5))) * (0.8 * (1 - mod(t*f*1.1, 1)))) * (min(1, (t-0.06)*50) * min(1, (0.2616-t)*50))) + ((t >= 0.2616 & t < 0.3336) * (0.9 * randv(t*480)) * (min(1, (t-0.2616)*120) * min(1, (0.3336-t)*120))) + ((t >= 0.3336 & t < 0.5136) * ((0.6*sinew(integrate(117)) + 0.4*sinew(integrate(1053)) + 0.2*sinew(integrate(2359.5))) * (0.8 * (1 - mod(t*f*1.05, 1)))) * (min(1, (t-0.3336)*50) * min(1, (0.5136-t)*50))) + ((t >= 0.5136 & t < 0.7536) * ((0.75*sinew(integrate(351)) + 0.5*sinew(integrate(585)) + 0.25*sinew(integrate(1716))) * (0.8 * (1 - mod(t*f*1.05, 1)))) * (min(1, (t-0.5136)*50) * min(1, (0.7536-t)*50))) + ((t >= 0.7536 & t < 0.9336) * ((0.6*sinew(integrate(117)) + 0.4*sinew(integrate(897)) + 0.2*sinew(integrate(1579.5))) * (0.8 * (1 - mod(t*f*1.05, 1)))) * (min(1, (t-0.7536)*50) * min(1, (0.9336-t)*50))) + ((t >= 0.9336 & t < 1.0776) * ((0.6*sinew(integrate(175.5)) + 0.4*sinew(integrate(1618.5)) + 0.2*sinew(integrate(2145))) * (0.8 * (1 - mod(t*f*1.05, 1)))) * (min(1, (t-0.9336)*50) * min(1, (1.0776-t)*50))) + ((t >= 1.0776 & t < 1.3044) * ((0.75*sinew(integrate(253.5)) + 0.5*sinew(integrate(663)) + 0.25*sinew(integrate(1599))) * (0.8 * (1 - mod(t*f*1.1, 1)))) * (min(1, (t-1.0776)*50) * min(1, (1.3044-t)*50))) + ((t >= 1.3044 & t < 1.4244) * ((0.5*sinew(integrate(175.5)) + 0.3*sinew(integrate(994.5)) + 0.15*sinew(integrate(1813.5))) * (0.8 * (1 - mod(t*f*1.05, 1)))) * (min(1, (t-1.3044)*50) * min(1, (1.4244-t)*50))) + ((t >= 1.4244 & t < 1.6044) * ((0.75*sinew(integrate(273)) + 0.5*sinew(integrate(1423.5)) + 0.25*sinew(integrate(1813.5))) * (0.8 * (1 - mod(t*f*1.05, 1)))) * (min(1, (t-1.4244)*50) * min(1, (1.6044-t)*50))) + ((t >= 1.6044 & t < 1.6764) * (0.9 * randv(t*480)) * (min(1, (t-1.6044)*120) * min(1, (1.6764-t)*120)))) * 16)/16, 1)
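For anyone who wants to try the external "rendering" route, here is a minimal Python sketch that assembles an expression of the same shape as the one above from a list of segments. It is not a SAM port: the names `Noise`, `Voiced` and `build_expression` are made up for illustration, the formant amplitudes and frequencies still have to come from whichever SAM implementation you parse, and it assumes the Alpha-release Xpressive syntax used in this post.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


def num(x: float) -> str:
    """Format a number compactly (0.06 -> '0.06', 0.0 -> '0')."""
    return f"{x:g}"


@dataclass
class Noise:
    """Hypothetical unvoiced segment: gated random noise."""
    duration: float
    gain: float = 0.9


@dataclass
class Voiced:
    """Hypothetical voiced segment: three sine 'formants' times a saw buzz."""
    duration: float
    formants: List[Tuple[float, float]]  # (amplitude, frequency in Hz)
    pitch_mult: float = 1.05


def gate(start: float, end: float) -> str:
    # 1 while start <= t < end, otherwise 0.
    return f"(t >= {num(start)} & t < {num(end)})"


def fade(start: float, end: float, rate: int) -> str:
    # Linear fade-in and fade-out, each capped at 1.
    return f"(min(1, (t-{num(start)})*{rate}) * min(1, ({num(end)}-t)*{rate}))"


def render(seg: Union[Noise, Voiced], start: float) -> str:
    end = start + seg.duration
    if isinstance(seg, Noise):
        body = f"({num(seg.gain)} * randv(t*480))"
        return f"({gate(start, end)} * {body} * {fade(start, end, 120)})"
    partials = " + ".join(
        f"{num(a)}*sinew(integrate({num(hz)}))" for a, hz in seg.formants
    )
    buzz = f"(0.8 * (1 - mod(t*f*{num(seg.pitch_mult)}, 1)))"
    return f"({gate(start, end)} * (({partials}) * {buzz}) * {fade(start, end, 50)})"


def build_expression(segments: List[Union[Noise, Voiced]]) -> str:
    """Join all segments and apply the 16-step quantise and clamp."""
    parts, start = [], 0.0
    for seg in segments:
        parts.append(render(seg, start))
        start += seg.duration
    return f"clamp(-1, floor(({' + '.join(parts)}) * 16)/16, 1)"


# First two segments of the proof of concept above.
demo = build_expression([
    Noise(0.06),
    Voiced(0.2016, [(0.75, 370.5), (0.5, 1306.5), (0.25, 1774.5)], 1.1),
])
```

`demo` holds a single line of text that can be pasted into Xpressive. Working from durations rather than absolute times keeps the gates contiguous, so lengthening one phoneme shifts everything after it automatically.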
Edit: I haven't implemented any of the rhythmic or timing ideas that musikbear has pointed out. Yes, I understand that getting the speech to sing and keep rhythm is an even bigger task.