An In-Depth Study of MP3 Compression Ratios


malisa

891

#1 01-12-21 13:54 …

An In-Depth Study of MP3 Compression Ratios


Added on: 2001-12-17 16:45:53

-------------------------------------------------------------
Facts:

128 kbit/s is not CD quality.
256 kbit/s is CD quality (x) (with LAME or some Fraunhofer encoders, not Xing).
In February 2000 the German magazine c't organised a blind listening test. 300 audiophiles were involved; the finalists compared 17 one-minute clips from different artists (classical and pop) in these versions:
original CD recording
128 kbit/s Joint Stereo [MusicMatch (FhG) v4.4], encoded on PC, decoded on Mac
256 kbit/s Joint Stereo [MusicMatch (FhG) v4.4], encoded on PC, decoded on Mac
All were burned to CD-Rs and played in a recording studio on:
B&W Nautilus 803, Marantz CD14 with PM14 amp (Straightwire Pro cabling and extras) [DM 30000, so a bit more than $15000]
Sennheiser Orpheus electrostatic reference headphones with a tweaked accompanying amp (digital and analog out) [>$10000]
Conclusions:

90% of the 128 kbit/s material was picked out.
MP3 at 256 kbit/s was rated to have the same music quality as the CD!
If you find MP3 at 256 kbit/s to be of inferior quality compared to the original CD, you are very likely doing something wrong in the test (for example an incorrect decoder, no objective double-blind testing, DSP filters distorting the process, ...). Maybe this is something for you. You can always read the article in the German c't 6/2000 on p. 92.
The threshold of mp3 transparency lies somewhere between 128kbit/s and 256kbit/s, depending on the kind of music, your hearing and your equipment.
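
To put the bitrates above in terms of compression ratio, here is a minimal Python sketch. It assumes standard CD audio (44.1 kHz, 16 bit, stereo); the function names are made up for illustration and are not part of any encoder.

```python
# CD audio: 44.1 kHz sample rate, 16 bits per sample, 2 channels.
CD_BITRATE_KBPS = 44100 * 16 * 2 / 1000  # 1411.2 kbit/s


def compression_ratio(mp3_kbps):
    """How many times smaller the mp3 stream is than the CD stream."""
    return CD_BITRATE_KBPS / mp3_kbps


def megabytes_per_minute(kbps):
    """Storage needed for one minute of audio at the given bitrate (1 MB = 10^6 bytes)."""
    return kbps * 1000 / 8 * 60 / 1_000_000


if __name__ == "__main__":
    for kbps in (128, 192, 256):
        print(f"{kbps} kbit/s: ratio {compression_ratio(kbps):.2f}:1, "
              f"{megabytes_per_minute(kbps):.2f} MB per minute")
```

So the transparency threshold between 128 and 256 kbit/s corresponds roughly to a compression ratio somewhere between 11:1 and 5.5:1.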
-------------------------------------------------------------
Why not to use any Xing encoder (Xing, Audio Catalyst, MpegDJ, ...):

Stereo separation problems in joint stereo mode (listen with headphones).
Everything above 16kHz is mutilated, at any bitrate (up to 320, and VBR High).
In mp3 you can have long and short blocks. Xing encoders don't use "short blocks", so they encode quick peaks and sharp signals far less accurately than encoders that do use them, such as Blade, FhG and LAME.
The code is buggy, and music will get mangled from time to time (try the first few seconds of Rammstein on the Matrix soundtrack).

-------------------------------------------------------------
Knowing the facts of mp3, you could, if space is not really an issue, use:
CBR 256kbit/s with LAME or some Fraunhofer encoders
Remarks: Guaranteed perfect (x) transparent encoding, but guaranteed overkill on most parts of the music. Also: space is always an issue if you use mp3, because otherwise why not use a lossless (zip-like) compressor for your music (e.g. WavPack, Monkey's Audio or my favorite, LPAC) or just store the wavs?

-------------------------------------------------------------
The classic trade-off between space and quality for mp3 archival quality is:
CBR 192kbit/s with any Fraunhofer encoder (audioactive, radium codec, mp3enc, ...)
Remarks: Decent sound quality, but not perfect, so not archival quality. Clearly audible encoding artifacts on some music when using high-quality headphones. Every time you see a VBR encoder take a bitrate >192 to encode a frame, you know that 192 is not sufficient for that part of the music.

-------------------------------------------------------------
LAME brings us the first (and still only) optimally tweaked (unlike Fraunhofer) VBR mp3 encoder that does not mess up (unlike Xing):

lame --r3mix infile.wav outfile.mp3 (LAME 3.89b)

"--r3mix -b112" is a synonym for "--nspsytune --vbr-mtrh -V1 -mj -h -b96 --lowpass 19.5 --athtype 3 --ns-sfb21 2 -Z --scale 0.98 -X0", explained here

Remarks: Perfect (x) sound quality at optimal bitrate. It sounds perfect to me and many others with -V2 -b32, but for the sake of people with very expensive audio equipment and "golden ears" a few extra bytes (about a 10% size increase) are allowed with this setting. The downside of LAME VBR is the slightly longer encoding time compared to CBR. The resulting average bitrate will be 170-175 kbit/s (tested on 500+ random mp3s). On rare occasions you will get a small file (120 kbit/s), and some very rare tracks need up to 260-270 kbit/s on average to be reproduced with good quality. However, the global average for all your music is 170-175 kbit/s. If you're into really loud and busy music, like some metal collections, your album averages could be around 200kbit/s. If you're into classical music, your album average may be 160kbit/s. To learn more about why sizes differ like this, read here.
-------------------------------------------------------------
Why VBR? (answer by ThomasLG)

VBR seems like a no-brainer to me. Near the beginning and ending of a song (assuming it starts and ends softly), where the volume is lower and the music is less "demanding" to encode, it makes sense to drop the bitrate, simply because there's not much there to encode and the extra space would be wasted. In the middle of the song, where it may be more complicated, giving the encoder the option of "bumping up" the rate on a frame-by-frame basis is great! You may end up with a file that's the same overall size as a 170kbps CBR file, but that uses frames as low as 32 on the really dead parts and as high as 320 on the really tough parts. The bitrate adapts dynamically to keep the quality constant. Knowing that the file isn't bloated where it doesn't need to be is a real bonus.

[Figure: smoothed overview of a VBR mp3 fragment, made with EncSpot]

As you can see, there are fragments of a few seconds where the mp3 requires 128kbit/s to sustain quality, and there are parts where the mp3 uses 224 and 256kbit/s to sustain that same quality. Now, as a CBR user, what would you do? Either you encode at 128kbit/s and have some horrible-sounding seconds in your music clip, or you're left to encode this clip at 256 or even 320 kbit/s, wasting a lot of bits in the process. VBR is used in all new compression techniques (AAC, MPC, ...) and also in MP3, as the standard defines!
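
The CBR dilemma above can be put into numbers with a small Python sketch. The per-frame bitrate demands below are invented for illustration (they are not measured from any real clip), and the model ignores the bit reservoir, which lets a real CBR encoder borrow some bits between neighbouring frames.

```python
# Hypothetical per-frame bitrate demand (kbit/s) needed to hold quality constant.
demand = [128] * 6 + [160] * 8 + [224] * 3 + [256] * 3


def vbr_average(frames):
    """Average bitrate if every frame gets exactly what it needs."""
    return sum(frames) / len(frames)


def cbr_for_constant_quality(frames):
    """Lowest constant bitrate that never starves any frame."""
    return max(frames)


def starved_frames(frames, cbr_kbps):
    """Number of frames that need more bits than a given CBR setting provides."""
    return sum(1 for need in frames if need > cbr_kbps)


if __name__ == "__main__":
    print("VBR average:", vbr_average(demand), "kbit/s")
    print("CBR needed for the same quality:", cbr_for_constant_quality(demand), "kbit/s")
    print("Frames starved at CBR 128:", starved_frames(demand, 128), "of", len(demand))
```

With these made-up numbers the VBR file averages well below the CBR rate needed for the same quality, which is the whole point of the argument.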
-------------------------------------------------------------
Why Joint Stereo?

Joint Stereo (-mj) is the best setting with LAME. It allows the encoder to adapt dynamically to the music and choose the best stereo mode for each frame: stereo or ms-stereo. No stereo separation problems can occur in a well-implemented JS mode, because when there is too much difference between the L and R channels, simple stereo is used for that frame. All the frames that are encoded in ms-stereo benefit from the lower bitrate requirement and can thus use the extra bits for more accurate encoding.

The underlying reason why you sometimes read that "joint stereo is no good" is simply that the Xing and Fraunhofer implementations are not perfect. Again, LAME is much more tweaked and brings a good joint stereo mode. Another reason is that some people mix up the terms "joint stereo", "ms stereo" and "intensity stereo".
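
As a rough illustration of the idea behind ms-stereo and the per-frame switch, here is a Python sketch. It uses the simple half-sum/half-difference form on time-domain samples; real encoders work in the frequency domain and may scale differently, and the energy threshold in choose_mode is a made-up rule, not LAME's actual decision logic.

```python
def to_mid_side(left, right):
    """Convert L/R samples to mid (what both share) and side (what differs)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Invert the transform: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def energy(samples):
    return sum(v * v for v in samples)


def choose_mode(left, right, threshold=0.1):
    """Hypothetical rule: use mid/side only when the side channel is nearly empty."""
    mid, side = to_mid_side(left, right)
    return "ms" if energy(side) <= threshold * energy(mid) else "lr"


if __name__ == "__main__":
    print(choose_mode([0.5, 0.25, -0.5], [0.5, 0.25, -0.25]))  # channels nearly identical
    print(choose_mode([1.0, 0.0], [0.0, 1.0]))                 # channels very different
```

When the channels are similar, almost all the energy ends up in the mid signal and the side signal costs very few bits; when they differ a lot, plain stereo is the safer choice for that frame.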
-------------------------------------------------------------
What's "athtype 3"?

ATH stands for "Absolute Threshold of Hearing" (aka "minimal audition threshold"). It is the curve that plots the minimum volume the human ear can perceive at each frequency. The area under the curve contains sounds at inaudible volume. If you read the graph correctly, you can see that starting at about 4kHz the human ear becomes less sensitive with rising frequency.

Mp3 won't encode sounds situated under this threshold. This also means mp3 will tolerate distortions as long as their volume is under the ATH curve, because these will be inaudible to the listener.

Up to version 3.87 the "Type 0: Painter & Spanias" ATH curve was used in LAME. The problem with this curve was that it was too inaccurate from about 13kHz upward, leaving the higher music frequencies (HF) relatively under-emphasized compared to the lower frequencies. In some music clips this allowed LAME to create HF artifacts because the encoder assumed they were "inaudible", which wasn't the case at all.
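
For reference, the threshold-in-quiet approximation commonly quoted from Painter & Spanias can be sketched in Python as follows. This is the textbook formula; I am assuming it corresponds to the "Type 0" curve, and LAME's exact implementation and the later Type 1-3 curves may differ in detail.

```python
import math


def ath_db_spl(freq_hz):
    """Approximate absolute threshold of hearing in dB SPL (textbook formula)."""
    f = freq_hz / 1000.0  # the formula works in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


if __name__ == "__main__":
    for hz in (100, 1000, 3300, 8000, 13000, 16000, 19500):
        print(f"{hz:>6} Hz: {ath_db_spl(hz):7.1f} dB SPL")
```

The curve dips to its minimum in the 3-4 kHz region and then climbs steeply, which is why errors in its HF shape matter so much for how many bits the encoder spends above 13kHz.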

After a great deal of listening tests and tweaks it was found that lowering the HF region of the ATH curve solved just about all HF artifacts. Two new ATH curves were introduced into LAME: "Type 1" by Frank Klemm and "Type 2" by Gabriel Bouvigne. Of course higher accuracy in the HF area, which is the hardest to encode, comes with higher bitrate demands. This is why my own "Type 3" was derived from "Type 2". It tries to compensate for the HF bitrate demands by lowering LF emphasis and using a 19.5kHz lowpass filter. On the graph you can see that in the type0-type3 transition, bits are taken from the gray LF area and later invested in the green HF area. With success.

Thanks to the many people from the message board who tested this in 3.88 alpha and confirmed that it causes no quality loss on the LF end.

Type 3 is the default in the "--r3mix" setting, while for the time being 3.88 beta uses Type 2 by default for all cbr/vbr/abr settings except nspsytune settings. Type 2 has roughly the same shape as Type 3, but there is no gray area in the graph.

-------------------------------------------------------------
Why the 19.5kHz lowpass filter?

A number of people can hear up to 18, 19 and even 22kHz. It depends on ears, age and equipment. I myself can hear frequencies above 20kHz. Why would one deliberately lowpass a signal, removing fidelity from it?

The answer is simple:

No one hears the difference between a 19.5kHz lowpassed signal and the same full-range clip in a double-blind test. This has been shown many times before (even at 18.5kHz with a very significant number of youngsters), and I ran the test myself on my site and forum. In a poll only two people claimed they heard a difference between an 18.5kHz clip and a full-range one, and the difference was gone at 19kHz. The 19.5 is an extra safety margin.
Using a pure 22050Hz (44.1kHz / 2) "full range" CD source signal is an illusion anyhow, because the audio CD standard states you should use a 20kHz lowpass filter to avoid aliasing. This is done in normal and professional audio equipment.
With mp3, leaving the inaudible frequencies in "just in case" would do more harm than good. Now that the Type 3 ATH actually encodes the HFs correctly, the higher the frequency, the harder it is to encode. Bitrate is limited, and you'd end up wasting necessary bits on inaudible parts and leaving yourself open to other audible artifacts because of bit reservoir depletion. You'd get an extremely large 22kHz mp3 which has a big chance of sounding less like the original than the smaller 19.5kHz version.
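
To get a feel for how much of the spectrum the 19.5kHz lowpass actually removes, here is a small Python sketch. It assumes 576 equally wide frequency lines per mp3 granule and an idealised brick-wall cutoff; a real encoder's filter has a transition band, so treat the numbers as a rough picture only.

```python
import math

SAMPLE_RATE = 44100       # CD sample rate in Hz
LINES_PER_GRANULE = 576   # assumed number of frequency lines per mp3 granule


def nyquist(sample_rate=SAMPLE_RATE):
    """Highest frequency a sampled signal can represent."""
    return sample_rate / 2


def lines_above(cutoff_hz, sample_rate=SAMPLE_RATE):
    """Count frequency lines lying entirely above an ideal cutoff."""
    width = nyquist(sample_rate) / LINES_PER_GRANULE
    first_removed = math.ceil(cutoff_hz / width)
    return max(0, LINES_PER_GRANULE - first_removed)


if __name__ == "__main__":
    removed = lines_above(19500)
    share = 100 * removed / LINES_PER_GRANULE
    print(f"Nyquist frequency: {nyquist():.0f} Hz")
    print(f"Lines above 19.5 kHz: {removed} of {LINES_PER_GRANULE} ({share:.1f}%)")
```

Under these assumptions roughly a ninth of the frequency lines no longer need any bits at all, and they are the ones that are hardest to encode.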

-------------------------------------------------------------
Why the "V1" quality setting?
I've received a few emails asking me why I chose "V1", the second-best setting, and not, for example, "V0". Let me try to explain:

I've tested -V2 -b32 for myself on over 500 files, and there was only one with a passage that gave me the feeling that too few bits were allocated. This was with my normal amp and stereo connected to the PC.
Other LAME VBR users, with better ears and equipment, also used V2, but on rare occasions they chose -b112 -V1 because V2 wasn't sufficient for archival quality.
V1 gives a perfect balance, imho, between quality and space (about 170 kbit/s average). If you use -V0 (or -ms) you end up with a file of about 220kbit/s. Due to the imperfect nature of the psy-model and quantization algorithms of a perceptual encoder, you are better off choosing the safe 256kbit/s solution if you don't mind ending up with such large files. ("using -V 0 does not sound better than a fixed 256kbs encoding": straight from the LAME manual)
So after long debates I decided to choose the -V1 -mj -h setting, so that even people with better ears and better equipment have guaranteed quality. Now that I have my new set of headphones and a high-quality soundcard, I'm glad I took that 10% extra room for archiving my files. This is something to always keep in mind, and most users who still use 128kbit/s forget that one day they might want to listen to their music on something other than their $10 PC speakers or their walkman. There are few things as annoying as having to encode your complete album collection over and over again every time you upgrade your equipment or gain some extra listening experience.
An example of what an album looks like in "--r3mix" (eurodance, high quality: no distortion or noise):

Bitrate (kbit/s)   Frames   Percentage
128                 13975     8.3%   ||||
160                 77701    46.4%   |||||||||||||||||||||||
192                 45173    27.0%   |||||||||||||
224                 15717     9.4%   |||||
256                  8363     5.0%   ||
320                  6547     3.9%   ||

Average bitrate (kbit/s)   Length (min:sec)   Total frames
183.0                      72:54.94           167478

It comes out sounding marvellous, at only 183 kbit/s average! As you can see, the biggest part of the music is encoded in 160kbit/s frames, and for the 45.3% of harder parts, higher bitrates are assigned. There are even 6547 instances where the psychoacoustic model of LAME chooses 320kbit/s while aiming to stay below V1 noise levels. Taking into account that 256kbit/s should be enough for CD-quality (x) encoding, this shows how low the V1 noise levels actually are.
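
The album figures can be cross-checked from the frame histogram with a few lines of Python. The sketch assumes 1152 samples per mp3 frame at 44.1 kHz; note that the six listed frame counts add up to 167476, two fewer than the stated total, so the sketch uses the stated total only for the running time.

```python
# Frame counts per bitrate (kbit/s), copied from the table.
histogram = {128: 13975, 160: 77701, 192: 45173, 224: 15717, 256: 8363, 320: 6547}

SAMPLES_PER_FRAME = 1152  # assumed: MPEG-1 Layer III frame length
SAMPLE_RATE = 44100       # assumed: CD sample rate


def average_bitrate(hist):
    """Frame-weighted average bitrate in kbit/s."""
    return sum(kbps * n for kbps, n in hist.items()) / sum(hist.values())


def share_above(hist, kbps):
    """Percentage of frames encoded at more than the given bitrate."""
    total = sum(hist.values())
    return 100 * sum(n for b, n in hist.items() if b > kbps) / total


def duration_seconds(frames):
    """Playing time for a number of frames."""
    return frames * SAMPLES_PER_FRAME / SAMPLE_RATE


if __name__ == "__main__":
    print(f"Average bitrate: {average_bitrate(histogram):.1f} kbit/s")
    print(f"Frames above 160 kbit/s: {share_above(histogram, 160):.1f}%")
    minutes, seconds = divmod(duration_seconds(167478), 60)
    print(f"Length: {int(minutes)}:{seconds:05.2f}")
```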
-------------------------------------------------------------

In LAME 3.90, --r3mix switched to the faster vbr-mtrh. Its complete command line is now:

"--r3mix" = "--nspsytune --vbr-mtrh -V1 -mj -h -b96 --lowpass 19.5 --athtype 3 --ns-sfb21 2 -Z --scale 0.98 -X0"

A brief explanation of the switches:

--nspsytune uses Naoki Shibata's advanced psychoacoustic model enhancements. It improves many things:
  Method to calculate tonality
  Method to calculate ATH
  Use of -X0 and 'careful noise shaping' when encoding in CBR mode
  Method to switch between long and short blocks
  Allocation of more bits to short blocks in which an attack exists
  Method to calculate masking in mid-side stereo
  Method to switch between mid-side stereo and regular stereo frames
  Better bit reservoir handling
It allows lower bitrates and is overall more efficient and yields better quality than the default LAME gpsycho. It fixes problems on most of the hardest tracks and solved all JS problems I know of.
--vbr-mtrh the fast and high-quality VBR mode designed by Mark Taylor and Robert Hegemann. About 2.5x as efficient as vbr-old.
-V1 VBR quality 1, on a scale of 0-9.
-mj Joint stereo, much more efficient than plain stereo for VBR use, giving better quality in just about all cases. Switches between mid-side frames and simple stereo when needed. It's a dynamic switching algorithm which picks the most efficient way of coding on a per-frame basis.
-h high-quality setting, normally the default.
-b96 minimum frame size of 96kbit/s. Perfectly possible without risk of problems thanks to nspsytune. Allows movie soundtracks and old or vocal tracks to be encoded very small (in the low 100s of kbit/s) at just the same quality that some 220kbit/s metal tracks would have.
--lowpass 19.5 still accepted by the best ears as the best solution for mp3. Nothing audible is lost at all. Extensive info here
--athtype 3 slightly adapted ATH type compared to 3.89 and 3.88. Principles here
--ns-sfb21 2 will reduce the size of some loud and distorted signals packed with high frequencies by 20-50kbit/s compared to the default setting. Still, it does not generate problems with HFs in 3.90's --r3mix because of nspsytune and the shape of the ATH curve. Related to this.
-Z This switch actually disables "-Z", aka scalefac_scale, which generated noise pumping problems at lower frame sizes. Adds no real size but improves quality significantly.
--scale 0.98 This will systematically "normalize" all encoded pieces down to 98% of what you put in. Don't worry, this is far from audible and will guarantee that even the signals with the highest peak volumes, up to 100%, will have virtually no clipping after the mp3 encoding-decoding cycle.
-X0 disables the experimental X setting, which nspsytune enables by default but which currently adds needless weight to vbr-mtrh files.
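
To see why the --scale 0.98 switch in the list above is inaudible yet useful, the 2% reduction can be expressed in decibels. A minimal Python sketch follows; the 1% overshoot in the usage example is an invented figure for illustration, not a measured property of any decoder.

```python
import math


def scale_to_db(scale):
    """Level change in dB caused by multiplying all samples by `scale`."""
    return 20 * math.log10(scale)


def would_clip(peak, scale=1.0, overshoot=0.0):
    """True if a peak (1.0 = full scale) exceeds full scale after scaling.

    `overshoot` is a hypothetical fractional increase of the peak caused by
    the lossy encode/decode cycle.
    """
    return peak * scale * (1 + overshoot) > 1.0


if __name__ == "__main__":
    print(f"--scale 0.98 lowers the level by {abs(scale_to_db(0.98)):.2f} dB")
    print("Full-scale peak, 1% overshoot, no scaling:", would_clip(1.0, 1.0, 0.01))
    print("Full-scale peak, 1% overshoot, scale 0.98:", would_clip(1.0, 0.98, 0.01))
```

A drop of less than 0.2 dB is far below anything you would notice as a volume change, while it leaves a little headroom for peaks that grow slightly during encoding and decoding.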

I hope this helps explain a bit how big the changes are in the 3.89 - 3.90 transition.

-------------------------------------------------------------

Now you're ready to make CD-quality mp3s (x) at the optimum bitrate. Have fun!


malisa

891

#2 01-12-21 13:56 …

Original source:
http://www.xici.net/board/doc.asp?id=11087258&sub=0

孙军

205

#3 01-12-22 23:21 …

No way... it's all in a foreign language. Can anyone say it in Chinese? I'm a simple guy with limited skills, sorry :D

Loory

896

#4 01-12-22 23:46 …

128K of course can't reach CD quality; 256K should be about the same as MD.

anotherbbs

2087

#5 01-12-22 23:55 …

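I now use variable-bitrate (VBR) mp3 encoding for basically everything. The sound quality is beyond reproach; the only problem is with seeking to a playback position.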

龙歌

1652

#6 01-12-23 18:12 …

Notice: the author has been banned or the content deleted; this post is hidden automatically.

anotherbbs

2087

#7 01-12-23 18:28 …

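Originally posted by 龙歌
[B]

Could you explain what dynamic mp3 encoding is?

Are SAM2496 and SOUNDFORGE both no good for compressing to MP3?
Which software is good? [/B]

Dynamic encoding means choosing the compression rate dynamically according to the content of the audio.
When you play such a file in Winamp, the bitrate shown there keeps changing. You can take a look at my new work;
it uses dynamic encoding.

I use Alto mp3 maker; I haven't really used anything else.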

DJ_Bryan

#8 02-1-24 23:53 …

For electronic music, going above 128K with MP3 is pointless.

liuhuhuhu

1591

#9 02-1-25 05:03 …

cooledit supports it.

小昭

1033

#10 02-1-25 14:34 …

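When compressing with LAME, VBR at 192k already reaches the best sound quality MD can offer.
This is easy to see with spectrum analysis, and it sounds that way too.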

baddog

1552

#11 02-1-30 22:19 …

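Why do you say that slightly worse sound quality is good enough for electronic music? That's really strange. Electronic music should have the best dynamics and frequency response, so how can that be good enough??? Listen to the demo songs of the ROLAND JV1080!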