In the past, speech recognition applications were limited to PC-based systems, telephony servers, high-end handsets, and PDAs. In recent years, however, advances in technology have brought low-cost speech recognition processors into consumer electronics. Today's speech recognition processors integrate more features, are more accurate, and have better development tool support, making it relatively easy to add voice I/O to consumer products. Voice control of home lighting is one consumer application with market potential.
Types of speech recognition
Speech recognition (sometimes referred to as voice recognition or VR) techniques can be divided into three broad categories: speaker-independent (SI) recognition, speaker-dependent (SD) recognition, and speaker verification (SV). Each technology has its own advantages and suits different applications. Products that use SI technology respond to voice commands without first being trained on the user's voice.
For example, speaker-independent (SI) recognition is generally the best fit for lighting controllers. Just as we use names to get the attention of other people, it is a good idea to activate the lighting controller with an SI command called a "trigger". Once the lighting controller is activated, it can accept multiple commands.
Products that incorporate speech recognition typically need a way to let users know that an instruction has been heard and that the product is ready to accept the next one. That is, they must let the user know where the product is in the control process. Because the control process here is very simple, the lighting controller responds with a short tone, which shortens the interaction and does not cause much trouble if a false start occurs.
Because speech is a natural way for people to communicate, speech recognition makes a product easier to use and also extends the user's physical reach. A voice-controlled light switch is a good example. The user may be sitting and watching TV with the switch out of reach, or the room may be too dark to find the switch; a simple voice command solves both problems.
Figure 1: A typical light controller with speech recognition will operate using the steps shown in this flowchart.
Design considerations
Because speech recognition is probabilistic, the designer must strike a compromise between accepting instructions (those in the recognition set) and rejecting instructions (those not in the recognition set). For example, if the product has to react very readily and an occasional misrecognition (false start) is not a big problem, the application developer may bias the recognizer toward accepting instructions. Other applications, such as voice-activated ovens or lighting controllers, cannot tolerate false starts.
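The accept/reject compromise can be illustrated with a small sketch. It assumes the recognizer reports a confidence score for each utterance and that the designer chooses an acceptance threshold; the scores and the `error_counts` helper below are invented for illustration and do not come from any Sensory tool.

```python
# Hypothetical (confidence score, is a valid command) pairs.
# The numbers are invented purely to illustrate the trade-off.
SAMPLES = [
    (0.95, True), (0.80, True), (0.65, True), (0.40, True),
    (0.70, False), (0.50, False), (0.30, False), (0.10, False),
]


def error_counts(threshold, samples=SAMPLES):
    """Return (false_accepts, false_rejects) for an acceptance threshold."""
    false_accepts = sum(
        1 for score, valid in samples if score >= threshold and not valid
    )
    false_rejects = sum(
        1 for score, valid in samples if score < threshold and valid
    )
    return false_accepts, false_rejects


for threshold in (0.35, 0.60, 0.75):
    fa, fr = error_counts(threshold)
    print(f"threshold={threshold:.2f}  false accepts={fa}  false rejects={fr}")
```

Raising the threshold trades false accepts (false starts) for false rejects (the user has to repeat the command), which is the direction a lighting controller would lean.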
Background noise is the nemesis of speech recognition. Both detection and identification require a signal-to-noise ratio (SNR) within a reasonable range (approximately 3:1 or higher). If the application conditions permit, it is best to use a directional microphone or a close-talking microphone to reduce the noise.
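As a rough illustration of the 3:1 figure, the sketch below estimates SNR from RMS levels and converts a ratio to decibels. The text does not say whether 3:1 is an amplitude ratio or a power ratio, so both conversions are shown; the sample values are made up.

```python
import math

MIN_RATIO = 3.0  # "approximately 3:1" from the text


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_ratio(speech, noise):
    """Ratio of speech RMS level to noise RMS level (an amplitude ratio)."""
    return rms(speech) / rms(noise)


def amplitude_ratio_to_db(ratio):
    """Convert an amplitude (voltage) ratio to decibels."""
    return 20.0 * math.log10(ratio)


def power_ratio_to_db(ratio):
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(ratio)


# Made-up sample blocks: speech at three times the noise amplitude.
speech = [3.0, -3.0, 3.0, -3.0]
noise = [1.0, -1.0, 1.0, -1.0]

print(snr_ratio(speech, noise) >= MIN_RATIO)
print(round(amplitude_ratio_to_db(MIN_RATIO), 2), "dB if 3:1 is an amplitude ratio")
print(round(power_ratio_to_db(MIN_RATIO), 2), "dB if 3:1 is a power ratio")
```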
Cost is also a consideration. By the time the end user purchases a product, its price is already 4 to 5 times the original manufacturing cost. Fortunately, the highly integrated speech processors currently on the market include the necessary mic preamps, analog-to-digital converters (ADCs), digital filters, core processors, digital-to-analog converters (DACs), and math engines.
These processors also come bundled with text-to-speaker-independent (T2SI) recognition and speech synthesis technologies. The same chips can serve as the main controller for a variety of consumer product functions, and their prices are competitive for consumer electronics. This allows voice functionality to be added to a product at little or no extra cost.
Design principles of the lighting controller
These capabilities make a VR lighting controller very attractive and also help address the speech recognition challenges of this application. In a home environment, recognizing an instruction at a distance means coping with background noise such as people talking, television, music, clattering dishes, and other impact sounds. In addition, the application must work for adults and children of either gender.
Recognition can only be as good as the signal being processed, so proper microphone circuit design is fundamental. The microphone circuit should be designed so that the combination of the microphone, bias resistor, and preamplifier stage uses as many of the ADC's output bits as possible, giving the best resolution without saturating. The design should also take into account the range of levels produced when people speak softly or loudly, and the range of distances at which the lighting controller may be used (usually up to about 10 feet).
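The headroom trade-off described above can be sketched numerically. Every value below (ADC full-scale voltage, loud and soft microphone peaks, the 10% headroom margin) is an assumption chosen for illustration, not a figure from the RSC-4128 or VR Stamp documentation.

```python
ADC_FULL_SCALE_V = 1.0   # assumed peak input the ADC can represent
LOUD_PEAK_V = 0.004      # assumed mic peak for loud, close speech
SOFT_PEAK_V = 0.0002     # assumed mic peak for soft, distant speech


def max_gain(full_scale_v, loudest_peak_v, headroom=0.9):
    """Largest gain that keeps the loudest input below full scale."""
    return headroom * full_scale_v / loudest_peak_v


def fraction_of_full_scale(peak_v, gain, full_scale_v):
    """Fraction of the ADC range used by a signal; above 1.0 means clipping."""
    return peak_v * gain / full_scale_v


gain = max_gain(ADC_FULL_SCALE_V, LOUD_PEAK_V)
for label, peak in (("loud", LOUD_PEAK_V), ("soft", SOFT_PEAK_V)):
    frac = fraction_of_full_scale(peak, gain, ADC_FULL_SCALE_V)
    print(f"{label}: gain={gain:.0f}, uses {frac:.1%} of full scale")
```

With these assumed numbers, the gain that just avoids clipping on loud speech leaves soft speech using only a few percent of the ADC range, which is why the expected loudness and distance range has to be decided before fixing the gain.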
It is best to configure the lighting controller to avoid false starts, even though the user may then have to repeat commands in a noisy environment; this can be done with settings in the Quick T2SI tool. Keeping the instruction set as small as possible is important for minimizing misrecognized instructions, especially in noisy environments such as the home. To maximize the difference between instructions, the T2SI instructions should differ as much as possible in sound and length.
Finally, the logic flow of the lighting controller must be simple, natural, and easy to use. To avoid confusing the user, the number of steps needed to reach the active instruction set should be minimized. The active instruction set should always contain a copy of the trigger word, so that the user can re-establish their place in the process at any time. Trigger words should be easily associated with lighting control, and the active commands should be the ones most commonly used for lighting control. Figure 1 illustrates the process used in this design.
Figure 2: Sensory's VR Stamp is a low-cost module that simplifies design by adding the basic functions and components necessary for a speech recognition system.
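A minimal sketch of such a trigger-then-command flow follows. It is not the flowchart of Figure 1: the class, its method names, and the timeout behavior are illustrative assumptions, and treating "light switch" as the trigger phrase is a guess based on the command words listed later in the article.

```python
TRIGGER = "light switch"  # assumption: used here as the trigger phrase
COMMANDS = {"power", "dimmer low", "dimmer medium", "dimmer high"}


class LightingController:
    """Two-state flow: wait for the trigger, then accept commands."""

    def __init__(self):
        # False: waiting for the trigger; True: accepting commands.
        self.active = False

    def hear(self, phrase):
        """Handle one recognition result; return the response or None."""
        if phrase == TRIGGER:
            # The trigger is valid in both states, so the user can
            # always re-establish their place in the process.
            self.active = True
            return "beep"
        if self.active and phrase in COMMANDS:
            return phrase
        return None  # rejected or out-of-context utterance

    def timeout(self):
        """Return to waiting for the trigger (assumed inactivity timeout)."""
        self.active = False


controller = LightingController()
for phrase in ["power", "light switch", "dimmer low", "light switch", "power"]:
    print(f"{phrase!r:16} -> {controller.hear(phrase)!r}")
```

Keeping the flow to two states means the user never needs more than one step (the trigger) to reach the active command set.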
Hardware design
To simplify development of the lighting controller, Sensory's VR Stamp is used in this example. The VR Stamp is a low-cost module that includes a Sensory RSC-4128 microprocessor, the discrete capacitors and mic preamp of the audio circuit, a 3.58MHz crystal, a reset circuit, and 128KB of flash memory for program code.
The VR Stamp also carries 128KB of serial EEPROM, but it is not used in the lighting controller application (see Figure 2). The VR Stamp tool suite includes the VR Stamp, an Integrated Development Environment (IDE), Quick T2SI, the FluentChip library (with a variety of speech recognition and synthesis features, including T2SI), a VR Stamp programming board, and supporting files.
In this voice-activated lighting controller circuit, the VR Stamp module accepts voice commands from the user, provides control signals to turn the light on and off, and sets the desired lamp brightness by adjusting the duty cycle (Figure 3).
The circuit is powered from a 120V, 60Hz AC line. A transformer (T1) and diode bridge (D1) perform the conversion and rectification from AC to DC. The RSC-4128 operates in the 2.4 to 3.6V range, and the regulator (U1) provides a stable 3.3V supply to the VR Stamp module. The 3300Ω resistor (R1) reduces the AC line current to a few milliamps so that the RSC-4128 can detect the zero crossings of the line voltage.
The chip's internal diodes prevent it from being damaged by excessive input voltage. The two-terminal AC switching element/triac pair (U2/Q2) controls the AC line current at the output (P2). A 100μF capacitor (C3) is needed to filter low-frequency ripple on VDD, because an unstable VDD couples into the audio circuitry and reduces recognition accuracy.
A microphone for voice recognition input (MK1) and a speaker for sound output (LS1) complete the functional blocks of the application. This is a classic circuit for driving electric lights; it dims the lamp by delaying the point at which the triac turns on. The design implements four brightness levels: "brightest" and "off" use 100% and 0% duty cycles, respectively, while "medium bright" and "dark" use duty cycles of approximately 50% and 10%, respectively.
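The relationship between duty cycle and turn-on delay can be sketched as follows. The sketch assumes that "duty cycle" means the fraction of each AC half-cycle during which the triac conducts and that the delay is timed from the detected zero crossing; the function name and level table are illustrative and are not taken from the design's firmware.

```python
LINE_FREQUENCY_HZ = 60.0
HALF_CYCLE_MS = 1000.0 / (2.0 * LINE_FREQUENCY_HZ)  # about 8.33 ms at 60 Hz

# Brightness levels and approximate duty cycles from the text.
LEVELS = {"brightest": 1.00, "medium bright": 0.50, "dark": 0.10, "off": 0.00}


def firing_delay_ms(duty):
    """Delay from zero crossing to triac turn-on, or None to stay off."""
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty must be between 0 and 1")
    if duty == 0.0:
        return None  # never fire the triac: lamp stays off
    return (1.0 - duty) * HALF_CYCLE_MS


for name, duty in LEVELS.items():
    delay = firing_delay_ms(duty)
    text = "no firing" if delay is None else f"{delay:.2f} ms after zero crossing"
    print(f"{name:14} duty={duty:.0%}  {text}")
```

Note that delivered lamp power is not linear in this time-based duty cycle, because the line voltage varies sinusoidally within each half-cycle.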
When designing a PCB with speech recognition, designers should remember two design principles:
1. Keep the analog power supply and analog ground stable. Use a voltage regulator to keep the power and ground signals as stable as possible. PCB layout and routing should separate all analog signals and analog ground from digital ground. The analog power supply and analog ground should be connected to the main power supply and main ground (the regulator, in this application). This type of connection is often referred to as "star grounding." Place the regulator as close as possible to the MIC_RET pin of the VR Stamp, and use thick wires and PCB traces for all power and ground signals.
2. Keep the microphone connection short and shielded. Make all analog traces on the PCB as short as possible. In particular, the main audio signal path from the positive terminal of the microphone to the VR Stamp should be as short as possible, because the high-impedance audio signal is only a few millivolts peak-to-peak. To keep the connection from acting as an antenna for digital noise and electromagnetic interference (EMI), a shielded cable must be used to connect the microphone to the circuit.
The VR Stamp is designed to provide good recognition performance with inexpensive omnidirectional electret microphones. Panasonic's WM-64PKT is used in this application, but many other models from other manufacturers are also suitable. Electret microphones require an external supply to drive their internal FET buffer; once biased, they behave as current sources. The bias current also controls the overall sensitivity of the microphone. In this dimmer switch, a microphone with a sensitivity of -44 dB is used. If a microphone with a different sensitivity is used, the microphone's bias resistance (R4) should be modified as follows:
Sensitivity is the desired microphone sensitivity (in dB, as given in the microphone specification), R is the microphone impedance, and RS is the microphone bias resistance (R4) required to achieve the given sensitivity.
The placement of the microphone is also a key factor in the success of a VR design, and three design principles should be kept in mind.
1. Flush mounting. The microphone element should be placed as close as possible to the mounting surface and seated firmly against the plastic housing. There must be no gap between the microphone element and the plastic housing.
2. No obstructions and a large enough hole. To avoid degrading recognition, make sure there are no obstacles in the area in front of the microphone element. The opening in the housing in front of the microphone should be at least 5 mm in diameter. If a plastic surface must be added in front of the microphone, make it as thin as possible, preferably no more than 0.7 mm.
3. Isolation. To prevent acoustic noise generated by operating or vibrating the product from being picked up by the microphone, sound insulation should be provided between the microphone and the housing.
Figure 3: The VR Stamp module in the voice-activated lighting control circuit receives the user's verbal command, provides an on/off light control signal, and sets the brightness of the light.
Software design
Sensory's VR Stamp runs programs developed with the FluentChip firmware tools and libraries. A FluentChip program is created and managed with the IDE included in the VR Stamp tool suite. A program contains one or more code modules (written in assembly or C) and other program resources, including object data files for the T2SI recognition instruction set and SX voice prompts.
The T2SI trigger and instruction set were created with Quick T2SI, a Windows-based tool for building SI recognition instruction sets. To use this graphical user interface (GUI) tool, the designer simply types the word or phrase to be recognized into a text box and presses the "Build" button, and a custom SI set is created. Note that the trigger word should be entered in the trigger text box and the commands in the command text box.
These words and phrases can be tested on a PC or downloaded to the VR Stamp for testing. If some words are difficult to recognize or are easily confused, the designer should adjust the pronunciation of the affected words and phrases and retest immediately. The Quick T2SI tool also creates object files that can be linked into any T2SI application.
The "Out of Vocabulary Sensitivity" item in the Quick T2SI tool should be set to "Reject More" or "Reject Most" to reduce false starts. The T2SI words should be chosen carefully so that the recognizer can easily distinguish them and so that they feel natural to the user. For example, "on" and "off" should not be included in the T2SI vocabulary, because the two sound too similar and are easily confused.
A longer word such as "power" is a better choice. This single word can also serve as a toggle for turning the light on and off. The other command words, "dimmer low", "dimmer medium", "dimmer high", and "light switch", are long enough and different enough that they are less likely to be confused.
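One way the command words could map onto lamp state is sketched below. The article does not give this mapping, so the assignment of "dimmer low", "dimmer medium", and "dimmer high" to 10%, 50%, and 100% duty cycles, the toggle behavior of "power", and the `apply_command` helper are all assumptions made for illustration.

```python
# Assumed mapping from dimmer commands to duty cycle.
DIMMER_DUTY = {"dimmer low": 0.10, "dimmer medium": 0.50, "dimmer high": 1.00}


def apply_command(command, duty, last_on_duty=1.0):
    """Return the new (duty, last_on_duty) after a recognized command."""
    if command == "power":
        if duty > 0.0:
            return 0.0, duty  # turn off, remembering the previous level
        return last_on_duty, last_on_duty  # turn back on at that level
    if command in DIMMER_DUTY:
        new_duty = DIMMER_DUTY[command]
        return new_duty, new_duty
    return duty, last_on_duty  # unknown command: no change


state = (0.0, 1.0)  # lamp off; it will come on at full brightness
for command in ["power", "dimmer medium", "power", "power", "dimmer low"]:
    state = apply_command(command, *state)
    print(f"{command:14} -> duty {state[0]:.0%}")
```

Using one toggle word instead of an "on"/"off" pair keeps the vocabulary small and avoids the confusable pair mentioned above.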