The spectral energy distribution:
Compared with the violin and cello, the Erhu has a narrow spectral bandwidth and its energy is concentrated at low frequencies. Its timbre shows a fixed formant structure, with three main peaks in the frequency spectrum at approximately 500Hz, 1500Hz, and 3000Hz*. The spectra also suggest that the envelope falls off very sharply below the first formant. As a result, the fundamental is largely missing in most cases when the pitch is not high enough. This agrees with [4], whose author attributed the phenomenon to the string being bowed close to the bridge**. In the tones above, the energy distribution is not exactly the same even at the same pitch; it depends largely on the specific playing technique and the construction of the instrument. In addition, the somewhat rough sound quality of the Erhu is visible in its spectrum: the figures above show considerable energy at nonharmonic frequencies.
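The two spectral properties described above (energy concentrated at low frequencies, and energy at nonharmonic frequencies) can each be reduced to a single number. The following is only a minimal sketch on a synthetic tone, not the analysis used for the figures: the function names, the 8000Hz sample rate, the 400Hz fundamental, the 1030Hz nonharmonic partial, and the 15Hz tolerance are all illustrative assumptions.

```python
import cmath
import math


def power_spectrum(samples):
    """One-sided power spectrum |X[k]|^2, k = 0 .. N/2, via a naive DFT."""
    n = len(samples)
    # Precompute the N complex roots of unity used by the DFT.
    twiddle = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = 0j
        for i, x in enumerate(samples):
            acc += x * twiddle[(k * i) % n]
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_centroid(power, bin_hz):
    """Power-weighted mean frequency in Hz (DC bin excluded)."""
    total = sum(power[1:])
    return sum(k * bin_hz * p for k, p in enumerate(power) if k > 0) / total


def nonharmonic_fraction(power, bin_hz, f0, tolerance_hz):
    """Share of power lying farther than tolerance_hz from any multiple of f0."""
    total = 0.0
    off_harmonic = 0.0
    for k, p in enumerate(power):
        if k == 0:
            continue  # ignore DC
        freq = k * bin_hz
        nearest_harmonic = max(1, round(freq / f0)) * f0
        total += p
        if abs(freq - nearest_harmonic) > tolerance_hz:
            off_harmonic += p
    return off_harmonic / total


def make_tone(partials, sample_rate, n_samples):
    """Sum of sinusoids given as (frequency_hz, amplitude) pairs."""
    return [
        sum(a * math.sin(2 * math.pi * f * i / sample_rate) for f, a in partials)
        for i in range(n_samples)
    ]


# Synthetic stand-in for a recorded tone: three harmonics of 400Hz plus one
# nonharmonic partial at 1030Hz (all values are assumptions for illustration).
SAMPLE_RATE = 8000
N_SAMPLES = 800  # gives 10Hz bins, so every partial falls exactly on a bin
tone = make_tone([(400, 1.0), (800, 0.5), (1200, 0.25), (1030, 0.5)],
                 SAMPLE_RATE, N_SAMPLES)
power = power_spectrum(tone)
bin_hz = SAMPLE_RATE / N_SAMPLES
print("centroid (Hz):", spectral_centroid(power, bin_hz))
print("nonharmonic share:", nonharmonic_fraction(power, bin_hz, 400.0, 15.0))
```

On a real recording the fundamental would have to be estimated first, and a window function would be needed to limit leakage; both steps are omitted here.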
*In the idealized case of a uniform tube approximately 17cm long, the human vocal tract has formant frequencies at 500, 1500, and 2500Hz, with bandwidths of 60 to 100Hz. This is strikingly close to the formant structure of the Erhu. Could this be why the Erhu sounds similar to the human voice?
**Why would bowing close to the bridge weaken the fundamental?
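The idealized vocal-tract figures in the first footnote follow from treating the tract as a uniform tube closed at one end and open at the other, whose resonances fall at odd multiples of c/(4L). A small sketch of that arithmetic is given here; the function name and the speed of sound of 340m/s are assumptions, since the text does not state the value used.

```python
def tube_formants(length_m, count=3, speed_of_sound=340.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    F_n = (2n - 1) * c / (4 * L), for n = 1, 2, ...
    speed_of_sound defaults to 340 m/s, an assumed value.
    """
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, count + 1)]


# A 17cm tube gives resonances of approximately 500, 1500 and 2500Hz.
print(tube_formants(0.17))
```

Under these assumptions the first two resonances coincide with the Erhu peaks reported above, while the third lies about 500Hz below the Erhu peak near 3000Hz.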
Temporal patterns:
Like other string instruments, the Erhu does not show much synchronicity in the attacks and decays of its upper harmonics. During the attack segment, the Erhu displays more low-amplitude inharmonic energy than the violin and cello, which makes the initial segment of the figure fuzzier. This may be another reason why the Erhu sounds rougher than the other two.
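One way to quantify the lack of synchronicity between harmonics is to track the amplitude of each harmonic frame by frame and compare the times at which they reach a given fraction of their maximum. The sketch below does this on a synthetic two-harmonic tone; the function names, the 25ms frame length, the 50% threshold, and the attack durations are illustrative assumptions and are not taken from the recordings discussed here.

```python
import cmath
import math


def harmonic_envelope(samples, sample_rate, freq, frame_len):
    """Amplitude of the partial at freq in consecutive non-overlapping frames."""
    envelope = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        acc = 0j
        for i in range(frame_len):
            # Correlate the frame with a complex exponential at freq.
            acc += samples[start + i] * cmath.exp(
                -2j * math.pi * freq * i / sample_rate)
        envelope.append(2.0 * abs(acc) / frame_len)
    return envelope


def rise_time(envelope, frame_len, sample_rate, fraction=0.5):
    """Start time (s) of the first frame reaching fraction * max amplitude."""
    threshold = fraction * max(envelope)
    for j, amplitude in enumerate(envelope):
        if amplitude >= threshold:
            return j * frame_len / sample_rate
    return None


def synthetic_tone(sample_rate=8000, duration_s=0.4, f0=400.0):
    """Two harmonics with different linear attacks (assumed: 50ms and 200ms)."""
    out = []
    for n in range(round(sample_rate * duration_s)):
        t = n / sample_rate
        first = min(1.0, t / 0.05) * math.sin(2 * math.pi * f0 * t)
        second = 0.5 * min(1.0, t / 0.2) * math.sin(2 * math.pi * 2 * f0 * t)
        out.append(first + second)
    return out


SAMPLE_RATE = 8000
FRAME_LEN = 200  # 25ms frames, a whole number of periods of both harmonics
tone = synthetic_tone(SAMPLE_RATE)
for harmonic in (1, 2):
    env = harmonic_envelope(tone, SAMPLE_RATE, harmonic * 400.0, FRAME_LEN)
    print("harmonic", harmonic, "rise time (s):",
          rise_time(env, FRAME_LEN, SAMPLE_RATE))
```

In this synthetic case the second harmonic reaches half of its maximum several frames after the first; a spread of this kind across harmonics is what asynchronous attacks would look like in such a measurement.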