Voice communication happens in real time, and many factors affect voice quality. They fall roughly into two categories: factors in the surrounding environment, such as echo and noise, and factors in the network environment, such as packet loss and delay. Because the two categories have different causes, they also have different solutions. Let's look at ways to improve voice quality in each case.
First, consider poor voice quality caused by environmental factors. The remedies here are mainly signal processing algorithms, with a different algorithm for each factor: acoustic echo cancellation (AEC) removes echo, noise suppression (ANS) suppresses noise, and automatic gain control (AGC) adjusts the volume toward an expected level. These are specialized signal processing algorithms. Fortunately, WebRTC is now open source and includes all three (AEC/ANS/AGC), so we only need to learn to use them well to get very good results. Of the three, AEC is the hardest to tune; I described how to debug it in an earlier article (audio processing: echo cancellation and debugging experience), if you are interested. ANS and AGC are comparatively simple. I suggest first writing a small application under Linux to verify the algorithm, adjusting its parameters until you find values that work well. Verifying the algorithm this way also familiarizes you with how it is used, which makes it easier to integrate into your product later.
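To make the idea of "adjusting the volume toward an expected level" concrete, here is a toy frame-level gain control. It is not WebRTC's AGC, which is considerably more elaborate; it is a minimal sketch assuming 16-bit PCM frames, and the function name and all parameter values (`target_rms`, `max_gain`, `smoothing`) are hypothetical choices for illustration.

```python
import math

def simple_agc(frame, prev_gain=1.0, target_rms=3000.0, max_gain=10.0, smoothing=0.1):
    """Scale one frame of 16-bit PCM samples toward target_rms.

    Toy illustration only (not WebRTC's AGC). Returns (processed_frame, new_gain).
    All parameter values are illustrative assumptions.
    """
    if not frame:
        return [], prev_gain
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    # Gain that would bring this frame exactly to the target, capped so that
    # near-silent frames (mostly noise) are not amplified without limit.
    desired = max_gain if rms == 0 else min(target_rms / rms, max_gain)
    # Move only part of the way each frame so the volume changes smoothly.
    gain = prev_gain + smoothing * (desired - prev_gain)
    # Apply the gain and clip to the int16 range.
    out = [max(-32768, min(32767, round(s * gain))) for s in frame]
    return out, gain
```

In use, the returned gain is passed back in as `prev_gain` for the next frame, so the level converges over many frames instead of jumping; the smoothing factor trades reaction speed against audible "pumping".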
Next, consider poor voice quality caused by the network environment. The main network factors are delay, out-of-order delivery, packet loss, and jitter. Several techniques help here, mainly the jitter buffer, packet loss concealment (PLC), forward error correction (FEC), and retransmission; they are introduced one by one below.
1, Jitter Buffer
The jitter buffer mainly addresses out-of-order delivery and jitter. It reorders out-of-order packets and holds packets for a short time (tens of milliseconds) to absorb the variation in arrival times between voice packets, so that playback is smoother. I described its design and implementation in an earlier article (Jitter Buffer Design and Implementation of Audio Transmission), if you are interested.
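The core behavior, reordering plus a short prefill delay, can be sketched as follows. This is a minimal illustration, not the design from the earlier article: the class and method names are hypothetical, the buffer depth is counted in packets rather than milliseconds, and 16-bit RTP sequence-number wraparound is ignored.

```python
import heapq

class JitterBuffer:
    """Minimal reordering jitter buffer (illustrative; ignores 16-bit seq wraparound)."""

    def __init__(self, prefill=3):
        self.prefill = prefill  # packets to accumulate before playout starts
        self.heap = []          # min-heap of (seq, payload), ordered by seq
        self.next_seq = None    # next sequence number due for playout
        self.started = False

    def put(self, seq, payload):
        # A packet whose playout slot has already passed is useless: drop it.
        if self.next_seq is not None and seq < self.next_seq:
            return
        # Ignore duplicates already waiting in the buffer.
        if any(s == seq for s, _ in self.heap):
            return
        heapq.heappush(self.heap, (seq, payload))

    def get(self):
        """Call once per playout tick (e.g. every 20 ms).

        Returns None while still prefilling; otherwise (seq, payload), where
        payload is None if that packet never arrived (a PLC candidate).
        """
        if not self.started:
            if len(self.heap) < self.prefill:
                return None
            self.started = True
            self.next_seq = self.heap[0][0]
        seq = self.next_seq
        self.next_seq += 1
        if self.heap and self.heap[0][0] == seq:
            return heapq.heappop(self.heap)
        return (seq, None)
```

The prefill depth is the jitter-versus-latency knob: a deeper buffer tolerates larger arrival-time variation but adds the same amount of delay to every packet.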
2, FEC
FEC mainly addresses packet loss. FEC is a form of channel coding; readers who want to understand the theory can look for related articles. I won't go into it here, partly because I couldn't explain it well: my background is in source coding (voice coding is a kind of source coding), and I have only a working knowledge of channel coding. When FEC is used to protect voice, the sender divides its outgoing RTP packets into groups (the packets in a group are called the original packets), runs FEC encoding over each group to generate redundant packets, and sends the redundant packets to the receiver as well. The receiver combines the redundant packets it receives with the original packets and, through FEC decoding, recovers the original RTP packets that were lost, filling in the gaps. How many redundant packets to generate depends on the packet loss rate reported by the receiver. For example, if the original packets are grouped five at a time and the loss rate is 30%, FEC encoding generates two redundant packets and all seven packets are sent. As long as the receiver gets any five of the seven, original and redundant combined, it can restore all five original packets through FEC, so any original packets lost from the group are recovered. An original RTP packet consists of an RTP header and a payload; a redundant packet additionally carries an FEC header between the RTP header and the payload. The FEC header contains the following fields:
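The article does not give an exact formula for the number of redundant packets; one rule consistent with its example (groups of 5, 30% loss, 2 redundant packets) is to round group size times loss rate up, as in the hypothetical `redundant_count` below. The "any 5 of 7" recovery described above requires a proper erasure code such as Reed-Solomon. As a simpler stand-in, this sketch uses XOR parity, the most basic erasure code, which covers only the special case of one redundant packet per group and can recover exactly one lost packet.

```python
def redundant_count(group_size, loss_pct):
    """Hypothetical rule: ceil(group_size * loss_pct / 100), in integer arithmetic."""
    return -(-group_size * loss_pct // 100)

def xor_parity(payloads):
    """Single redundant packet: byte-wise XOR of the group's payloads (zero-padded)."""
    parity = bytearray(max(len(p) for p in payloads))
    for p in payloads:
        for i, b in enumerate(p):
            parity[i] ^= b
    return bytes(parity)

def xor_recover(received_payloads, parity):
    """Rebuild the one missing payload from the survivors plus the parity packet.

    The result is padded to the parity length; a real scheme would also carry
    the original length so the padding can be stripped.
    """
    missing = bytearray(parity)
    for p in received_payloads:
        for i, b in enumerate(p):
            missing[i] ^= b
    return bytes(missing)
```

XOR works because every byte of the parity is the XOR of the corresponding bytes of all originals, so XOR-ing the survivors back out leaves exactly the missing payload. Recovering two or more losses per group needs a code with more redundant packets, which is where Reed-Solomon-style schemes come in.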
Group first sequence number is the sequence number of the first original packet in the group; original count is the number of original packets in the group; redundant count is the number of redundant packets generated for the group; and redundant index identifies which of those redundant packets this one is. Redundant packets have their own payload type and sequence numbers. The payload type used for redundant packets must be announced to the other side in the SDP of the SIP signaling; when the other side receives packets with that payload type, it processes them as redundant packets.
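The article names the FEC header fields but not their widths, so the layout below is an assumption: 16 bits for the group first sequence number (matching the RTP sequence number) and 8 bits each for the two counts and the index. The sketch shows where the FEC header sits in a redundant packet; the helper names and the dynamic payload type 100 are hypothetical, and the RTP header is assumed to be the fixed 12 bytes with no CSRCs or extensions.

```python
import struct

# Hypothetical wire layout (field widths are assumptions):
# group_first_seq (16 bits), original_count (8), redundant_count (8), redundant_index (8)
FEC_HEADER = struct.Struct("!HBBB")
RTP_HEADER = struct.Struct("!BBHII")  # fixed 12-byte RTP header, no CSRCs or extensions

def make_rtp_header(payload_type, seq, timestamp, ssrc, marker=False):
    # Version 2, no padding, no extension, CSRC count 0.
    return RTP_HEADER.pack(0x80, (int(marker) << 7) | payload_type, seq & 0xFFFF,
                           timestamp & 0xFFFFFFFF, ssrc)

def build_redundant_packet(rtp_header, first_seq, orig_count, red_count, red_index, fec_payload):
    # The FEC header sits between the RTP header and the FEC-encoded payload.
    fec_header = FEC_HEADER.pack(first_seq & 0xFFFF, orig_count, red_count, red_index)
    return rtp_header + fec_header + fec_payload

def parse_redundant_packet(packet):
    _, m_pt, seq, _ts, _ssrc = RTP_HEADER.unpack_from(packet, 0)
    fec_fields = FEC_HEADER.unpack_from(packet, RTP_HEADER.size)
    payload = packet[RTP_HEADER.size + FEC_HEADER.size:]
    return {"pt": m_pt & 0x7F, "seq": seq, "fec": fec_fields, "payload": payload}
```

For example, the second of two redundant packets protecting the group that starts at sequence number 1000 would carry the FEC fields `(1000, 5, 2, 1)`, with the RTP payload type set to whatever value was announced for redundant packets in the SDP.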
FEC does not depend on the contents of the voice payload, and it recovers lost packets exactly. It also has drawbacks. First, the receiver must accumulate the specified number of packets in a group before it can recover a loss, which adds delay; second, the redundant packets sent to the other side increase bandwidth usage.
3, PLC
PLC also mainly addresses packet loss. It is essentially a signal processing technique that uses one or more previously received packets to synthesize an approximation of the lost packet. There are many techniques for generating the concealment signal, such as pitch waveform replication (used by the G.711 Appendix I PLC), waveform similarity overlap-add (WSOLA), and pitch-synchronous overlap-add (PSOLA). These are specialized and interesting topics; see related articles if you want the details. If a codec has built-in PLC, such as G.729, you don't need to add PLC externally; you only need to configure the codec to enable it. If the codec has no PLC, such as G.711, PLC must be implemented externally.
PLC works well at low packet loss rates (below about 15%) but poorly at high loss rates, especially with consecutive losses: the first lost packet is concealed reasonably well, but each further packet lost in a row is concealed worse than the one before.
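A much simpler concealment method than pitch waveform replication, WSOLA, or PSOLA is to repeat the last good frame with increasing attenuation. The sketch below uses that simpler technique, not any of the ones named above, and mirrors the burst behavior just described: each additional consecutive loss is faded further, so a long gap decays toward silence instead of producing a buzzing repeat. The class name and the fade factor are hypothetical.

```python
class RepeatPLC:
    """Toy concealment: repeat the last good frame, attenuated per consecutive loss.

    This is plain frame repetition, not G.711 Appendix I, WSOLA, or PSOLA.
    """

    def __init__(self, frame_len, fade=0.5):
        self.frame_len = frame_len  # samples per frame
        self.fade = fade            # hypothetical attenuation per lost frame
        self.last = None            # last correctly received frame
        self.lost_run = 0           # number of consecutive lost frames so far

    def good_frame(self, frame):
        # A real frame arrived: remember it and reset the loss counter.
        self.last = list(frame)
        self.lost_run = 0
        return frame

    def lost_frame(self):
        self.lost_run += 1
        if self.last is None:
            return [0] * self.frame_len  # nothing to repeat yet: output silence
        gain = self.fade ** self.lost_run
        # int() truncates toward zero, keeping samples inside the int16 range.
        return [int(s * gain) for s in self.last]
```

A decoder loop would call `good_frame` when the jitter buffer returns a frame and `lost_frame` when it reports a gap.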
Combining the jitter buffer, FEC, and PLC gives the following receive-side scheme for improving voice quality under network impairments:
Each RTP packet received from the network is handled as follows. An original packet is PUT into both the jitter buffer (JB) and the FEC module; a redundant packet is PUT only into the FEC module. In the FEC module, once the number of original packets plus redundant packets received for a group reaches the specified value, FEC decoding runs to recover the lost original packets, and those recovered packets are PUT into the JB. When playback needs a voice frame, it is fetched (GET) from the JB and decoded, and PLC may be applied if the frame is still missing.
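The routing just described can be sketched as below. This is a minimal illustration under several assumptions: a plain dict stands in for the jitter buffer, groups are assumed to start at sequence numbers that are multiples of the group size, the redundant payload type 100 is hypothetical, and the decoder is a single-parity XOR stand-in (it handles at most one loss per group) rather than a full erasure code.

```python
REDUNDANT_PT = 100  # hypothetical dynamic payload type announced in the SDP

class FecGroup:
    def __init__(self, first_seq, k):
        self.first_seq, self.k = first_seq, k
        self.originals = {}   # seq -> payload
        self.redundants = {}  # redundant index -> payload
        self.decoded = False

    def ready(self):
        # Decode once originals + redundants reach the group's original count.
        return not self.decoded and len(self.originals) + len(self.redundants) >= self.k

    def missing(self):
        return [s for s in range(self.first_seq, self.first_seq + self.k)
                if s not in self.originals]

def xor_decode(group):
    """Stand-in decoder: one XOR parity packet can rebuild exactly one loss."""
    (missing_seq,) = group.missing()
    out = bytearray(group.redundants[0])
    for p in group.originals.values():
        for i, b in enumerate(p):
            out[i] ^= b
    return {missing_seq: bytes(out)}

class Receiver:
    def __init__(self, k, decode):
        self.k, self.decode = k, decode
        self.jb = {}      # stand-in for the jitter buffer: seq -> payload
        self.groups = {}  # group first seq -> FecGroup

    def _group(self, first_seq):
        return self.groups.setdefault(first_seq, FecGroup(first_seq, self.k))

    def on_packet(self, pt, seq, payload, fec_hdr=None):
        if pt == REDUNDANT_PT:
            first_seq, _orig_count, _red_count, index = fec_hdr
            g = self._group(first_seq)
            g.redundants[index] = payload           # redundant: PUT into FEC only
        else:
            self.jb[seq] = payload                  # original: PUT into JB ...
            g = self._group(seq - seq % self.k)     # ... and into FEC (aligned groups assumed)
            g.originals[seq] = payload
        if g.ready():
            g.decoded = True
            if g.missing():
                for s, p in self.decode(g).items():
                    self.jb.setdefault(s, p)        # recovered originals: PUT into JB

    def get(self, seq):
        """GET for playout; None means the frame is missing and PLC should run."""
        return self.jb.pop(seq, None)
```

For instance, with groups of three, if originals 0 and 2 arrive along with the group's parity packet, the Receiver rebuilds packet 1 and playback never sees the gap; a loss FEC cannot repair still reaches `get` as `None`, which is where PLC takes over.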
4, retransmission
Retransmission also addresses packet loss, by sending lost packets to the other side again; it is usually done on demand. This is how I implemented it. The receiver puts received packets into a buffer. Whenever a packet arrives whose RTP sequence number is divisible by 5 (that is, seq mod 5 is 0), the receiver checks whether any not-yet-played packets before it are missing (their slots in the buffer are empty) and records them as bits: the packet immediately before the divisible-by-5 packet is bit 0, the one before that is bit 1, and so on. A lost packet sets its bit to 1; a received packet leaves it at 0. The bitmap has 16 bits (a short), so it covers the previous 16 packets. The receiver then builds a control packet whose payload has two parts: the sequence number (a short) of the packet divisible by 5, and the 16-bit bitmap. It sends the control packet to the sender, which parses it to determine which packets were lost and retransmits them. The control packet payload could instead list the sequence number of each lost packet; the bitmap is used mainly to keep the payload small and save bandwidth.
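The bitmap scheme above can be sketched as follows. The function names are hypothetical, the control payload is assumed to be the two shorts in network byte order, and sequence-number wraparound at 65535 is ignored for clarity.

```python
import struct

def build_control_packet(trigger_seq, received, next_play_seq):
    """Return the control payload, or None if nothing needs retransmitting.

    received: set of sequence numbers currently in the buffer.
    next_play_seq: oldest sequence number not yet played.
    """
    mask = 0
    for bit in range(16):
        seq = trigger_seq - 1 - bit  # bit 0 = the packet just before trigger_seq
        if seq < next_play_seq:
            break                    # already played: too late to be worth requesting
        if seq not in received:
            mask |= 1 << bit
    if mask == 0:
        return None
    return struct.pack("!HH", trigger_seq, mask)

def on_receive(seq, received, next_play_seq):
    received.add(seq)
    if seq % 5 == 0:                 # only packets divisible by 5 trigger a check
        return build_control_packet(seq, received, next_play_seq)
    return None

def parse_control_packet(data):
    """Sender side: list the lost sequence numbers, oldest first."""
    trigger_seq, mask = struct.unpack("!HH", data)
    return sorted(trigger_seq - 1 - bit for bit in range(16) if mask & (1 << bit))
```

For example, if packets 0 to 9 arrive except 3 and 5 and nothing has been played yet, the arrival of packet 10 produces a 4-byte control packet that the sender decodes as `[3, 5]`. Listing each lost sequence number instead would cost 2 bytes per loss, which is why the fixed-size bitmap saves traffic under heavier loss.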
In practice, retransmission is not very effective, mainly because retransmitted packets often arrive too late: they have missed their playback window and can only be discarded. It is the least effective of these methods.
5, RFC2198
RFC 2198 defines the RTP Payload for Redundant Audio Data. With it, an RTP packet carries not only the payload of the current voice frame but also the payloads of one or more preceding packets. The more previous payloads each packet carries, the better it performs at high packet loss rates, but the greater the delay and the more bandwidth it consumes. It consumes more bandwidth than FEC, because FEC generates one or more redundant packets per group of RTP packets, whereas with RFC 2198 every RTP packet carries one or more previous payloads. It can be used on wired networks or Wi-Fi; on cellular networks, use it with caution.
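In the RFC 2198 format, each redundant block is described by a 4-byte header (an F bit set to 1, a 7-bit block payload type, a 14-bit timestamp offset relative to the RTP header timestamp, and a 10-bit block length), followed by a 1-byte final header (F bit 0 plus the primary payload type); the data blocks follow in header order, primary last. The sketch below packs and unpacks such a payload; the function names are hypothetical, and the example assumes 20 ms G.711 frames (160 bytes, timestamp step 160 at 8 kHz).

```python
import struct

def pack_red(primary_pt, primary, redundant):
    """Build an RFC 2198 payload.

    redundant: list of (payload_type, timestamp_offset, data), oldest first.
    """
    headers = b""
    for pt, ts_off, data in redundant:
        if not (0 <= pt < 128 and 0 <= ts_off < 1 << 14 and len(data) < 1 << 10):
            raise ValueError("field out of range for an RFC 2198 block header")
        # F=1 | block PT (7) | timestamp offset (14) | block length (10)
        word = (1 << 31) | (pt << 24) | (ts_off << 10) | len(data)
        headers += struct.pack("!I", word)
    headers += struct.pack("!B", primary_pt & 0x7F)  # final header: F=0 + primary PT
    return headers + b"".join(d for _, _, d in redundant) + primary

def unpack_red(payload):
    """Return [(pt, timestamp_offset, data), ...], redundant blocks first, primary last."""
    blocks, pos = [], 0
    while payload[pos] & 0x80:  # F bit set: another 4-byte block header follows
        (word,) = struct.unpack_from("!I", payload, pos)
        blocks.append(((word >> 24) & 0x7F, (word >> 10) & 0x3FFF, word & 0x3FF))
        pos += 4
    primary_pt = payload[pos] & 0x7F
    pos += 1
    out = []
    for pt, ts_off, length in blocks:
        out.append((pt, ts_off, payload[pos:pos + length]))
        pos += length
    out.append((primary_pt, 0, payload[pos:]))
    return out
```

Carrying one previous G.711 frame this way adds 5 header bytes plus 160 bytes of redundant audio to every packet, roughly doubling the payload size, which illustrates why it uses more bandwidth than group-based FEC.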
These are the methods I have used to improve voice quality. There are others, but I have not tried them in practice, so I won't write about them; that would only be theory on paper. You are welcome to suggest other methods.