Design of Intelligent IP Camera System Based on Blackfin

1. Background and overview

In recent years, embedded applications have grown in both complexity and number. Multimedia functions in particular are developing rapidly in many fields, and high-performance computing has become ubiquitous, from consumer electronics and network communication to industrial control and monitoring. Most applications require more digital signal processing capability. Because of cost and design difficulty, designers tend to use a single chip to do all the work. Traditional DSP processors and MCU processors have begun to converge in several forms:

1. The traditional MCU+DSP cooperative solution is integrated into a single chip package, or taken further into a true heterogeneous multi-core design whose cores can share some or even all peripherals.

2. Data processing capability based on fixed hardware IP modules, such as certain codecs, is added to the MCU in the form of an SoC. This suits applications built on fixed standards.

3. As high-end embedded processor clock frequencies pass 500 MHz and approach 1 GHz, the MCU and DSP platforms begin to truly merge. The number of peripheral interfaces and the control capability of DSP processors gradually improve, while the bandwidth and computing power of MCUs, especially their software multimedia processing power, keep growing, so there is no longer a qualitative difference between the two.

The above three forms of integration actually borrow from and overlap each other. Looking ahead, there will be no real boundary between DSP and MCU, or between the individual cores of a multi-core processor. For any application, the corresponding processor is a combination of flexible software computing capability, efficient hardware IP modules, and the appropriate peripherals: the so-called "Convergent Platform".

Such a processor platform places new and higher demands on embedded software developers. The high-level language compiler should exploit hardware details as fully as possible to optimize compiled performance, reduce the workload of manual optimization, and keep the software general. DSP algorithm developers need to understand not only the hardware platform but also how the architecture of the software platform affects algorithm implementation and optimization, for example by separating hardware-dependent parts from hardware-independent parts to fit the structure of the operating system. Software platform developers, in turn, should take processor details into account when optimizing the system, for example by abstracting system interfaces for parallel processing across multiple cores or DMA channels. The boundaries between hardware designers, algorithm designers, and system software designers are also disappearing, and embedded developers and teams with cross-disciplinary skills adapt best to changes in the underlying hardware platform.

In this trend toward DSP/MCU/hardware IP module integration, ADI's Blackfin processor family is a representative product. From the earliest BF53x series to the current BF54x series with image processing units, the low-power BF52x series, and the dual-core BF561, the family combines the same core with different bandwidth and peripheral configurations for different applications and markets. There are many factors to consider when selecting and designing software for such a processor:

1. How to exploit the hardware details of the processor's DSP features as fully as possible to achieve optimal algorithm performance and parallelism.

2. How to maximize software generality and avoid excessive hardware-specific code and assembly code.

3. How to reduce the development cost of the software platform, and whether an open source operating system or open source modules are suitable.

4. Whether the system needs real-time performance, and whether the chosen software platform provides that real-time guarantee.

These factors sometimes conflict, so the designer must weigh the characteristics of the specific application and the available resources to reach the best compromise.

ADI has put considerable effort into Blackfin's software platform to provide a diverse software ecosystem. Blackfin supports uClinux, VDK, uC/OS-II, Nucleus, and many other OS/RTOS options. Based on the VDSP development tools, ADI provides a variety of free audio and video codecs, hardware abstraction function libraries, and drivers. In this BF53x-based intelligent monitoring system, we want to make the most of the chip's processing capability and achieve the best encoding and intelligent algorithm performance, so we chose an RTOS, uC/OS-II, and its network protocol stack as the operating system platform. A small RTOS generally does not distinguish between user mode and kernel mode, the overhead of accessing system hardware resources is small, interrupt and task switching times have real-time guarantees, and memory can be used and allocated relatively freely. These characteristics make it easy for an RTOS to bring out the performance of Blackfin as a DSP processor, and the high-performance H.264 encoding library provided by ADI can be used directly. The disadvantage is that, compared with Linux and its rich open source resources, an RTOS lacks some off-the-shelf applications such as an HTTP server, so more development investment or third-party resources are needed.

2. Blackfin IP Camera System Architecture

The IP Camera system consists mainly of audio and video capture, intelligent video or audio analysis, audio/video encoding, streaming media packetization and transmission, system control, and other modules (as shown in Figure 1).

In the basic system, Blackfin's video interface, the PPI, is connected to the digital video stream input to receive video signals, while the serial SPORT interface can be connected to the audio input. Audio and video data are transferred to SDRAM over dedicated DMA channels. If intelligent monitoring is required, different analysis modules can be inserted as needed. The software encoder then compresses the incoming audio and video in real time and transmits it as a TS (Transport Stream). The whole system is driven by data flow. The input, analysis, and coding modules in Figure 1 can be selected as needed. The input and output of each module are standard-compliant data streams, so a module can be flexibly inserted at different positions in the system data flow to process it. A typical single-core Blackfin chip can run only some of the modules at the same time, but a dual-core chip such as the BF561 or a two-chip solution can run all modules simultaneously.


Figure 1 Block diagram of the IP Camera system based on the Blackfin 537 processor


2.1 Video capture and encoding

The Blackfin family of DSPs integrates a parallel peripheral interface (PPI) for high-speed parallel data, especially video data, adding a dedicated data throughput channel to the traditional data bus.

The PPI not only works in the "hardware synchronization" mode used with BT.601 video streams, but also automatically decodes the BT.656 preamble, allowing seamless connection to a variety of video sources and image sensors. The direct memory access (DMA) controller works together with the PPI to read only the active video information from the complete video frame, or only the blanking area. This saves significant bandwidth when full video frames are not needed. In addition, the PPI can ignore all second-field image information in an interlaced BT.656 video stream, which provides a very efficient way to decimate the input quickly. Finally, because the PPI itself can decode the BT.656 video stream, it can be connected directly to the popular ADV7183A video decoder.
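As a rough software illustration of what the PPI does in hardware, the sketch below decodes the fourth byte (XY) of a BT.656 timing reference code, which follows the FF 00 00 preamble, and decides whether the line it introduces would be kept. The bit layout (F, V, H and four protection bits) comes from the BT.656 standard; the type and function names are hypothetical, and the code only detects protection-bit errors rather than correcting them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded fields of the XY byte that follows the FF 00 00 preamble. */
typedef struct {
    bool field2;  /* F: set during the second field of an interlaced frame */
    bool vblank;  /* V: set during vertical blanking */
    bool eav;     /* H: set for EAV (end of active video), clear for SAV */
} bt656_trc_t;

/* Parse XY = 1 F V H P3 P2 P1 P0; returns false if the fixed bit or the
 * protection bits do not match. */
bool bt656_parse_xy(uint8_t xy, bt656_trc_t *out)
{
    assert(out != NULL);
    unsigned f = (xy >> 6) & 1u;
    unsigned v = (xy >> 5) & 1u;
    unsigned h = (xy >> 4) & 1u;
    unsigned p = ((v ^ h) << 3) | ((f ^ h) << 2) | ((f ^ v) << 1) | (f ^ v ^ h);

    if (!(xy & 0x80u) || (xy & 0x0Fu) != p)
        return false;
    out->field2 = (f != 0);
    out->vblank = (v != 0);
    out->eav    = (h != 0);
    return true;
}

/* Hypothetical policy helper: keep only lines that start with a valid SAV
 * outside vertical blanking, optionally skipping the second field. */
bool bt656_keep_line(uint8_t xy, bool skip_field2)
{
    bt656_trc_t t;

    if (!bt656_parse_xy(xy, &t) || t.eav || t.vblank)
        return false;
    return !(skip_field2 && t.field2);
}
```

With `skip_field2` set, only first-field active lines pass, which mirrors the second-field skipping described above.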

ADI provides free encoder software for the mainstream IP Camera coding standards, such as H.264 and MPEG-4. This project adopts the H.264 video compression standard. ADI's H.264 encoder is heavily optimized: it makes full use of on-chip L1 memory, and data is moved by DMA in parallel with processor computation. Its main features are: support for YUV420 and UYVY422 (CCIR-656) video input formats; output as an elementary video stream in NAL units; support for the Baseline Profile and some Main Profile features (interlaced encoding, CABAC); real-time encoding of up to 1/2 D1 on the BF53x and real-time D1 on the BF561; support for I and P frames; and adaptive CBR rate control. The bit rate of the ADI H.264 encoder is adjustable for different applications, which enables real-time transmission even over low-bandwidth links such as CDMA1x at low bit rates.
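To put the input formats and the rate control in perspective, the following sketch computes raw frame sizes for the two input formats and the average per-frame bit budget under a constant bit rate. It is plain arithmetic, not part of ADI's encoder API; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes in one planar YUV 4:2:0 frame: a full-size Y plane plus
 * quarter-size U and V planes. */
size_t yuv420_frame_bytes(size_t width, size_t height)
{
    return width * height * 3 / 2;
}

/* Bytes in one packed UYVY 4:2:2 frame: two bytes per pixel. */
size_t uyvy422_frame_bytes(size_t width, size_t height)
{
    return width * height * 2;
}

/* Average number of bits available per frame at a constant bit rate. */
uint32_t cbr_bits_per_frame(uint32_t bitrate_bps, uint32_t fps)
{
    assert(fps > 0);
    return bitrate_bps / fps;
}
```

Assuming PAL D1 is 720x576, one YUV420 frame is 622,080 bytes, while a 512 kbps stream at 25 frames per second leaves on average only 20,480 bits per frame, which shows how much compression the encoder has to deliver.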

2.2 Intelligent monitoring

The monitoring market is becoming more and more intelligent, and various intelligent analysis algorithms for video and audio are being commercialized, such as moving target detection and tracking, intrusion detection, and detection and localization of special sounds. The Blackfin processor has excellent support for multimedia processing from its architecture to its instruction set, including dedicated video processing instructions, so it is especially suitable for implementing flexible multimedia intelligent analysis algorithms. ADI has launched the "Image Tool Box" intelligent monitoring software package, which is specially optimized for common, basic functions in intelligent monitoring algorithms. It performs well and can accelerate the implementation and optimization of upper-layer algorithms.

Intelligent analysis generally operates on uncoded media streams, although some algorithms use the encoder output. Because this project directly uses the encoding library provided by ADI, the front-end intelligent module analyzes the input media stream directly and outputs the result. Intelligent audio and video processing comes in many types and is constantly improving, so it is generally implemented on a high-performance DSP. A variety of intelligent processing modules are available on Blackfin, such as fisheye correction, moving object detection, and upper-layer algorithms built on motion detection, such as abandoned-object detection and intrusion detection, as well as gunshot detection and localization. Based on the output of the intelligent module, the system control and coding parts can respond accordingly, for example by drawing a frame around the moving object, increasing the resolution of the encoder, or adjusting the direction of the camera according to the position of the sound source. These modules generally have relatively standard input and output interfaces to facilitate system integration in secondary development.

2.3 Media Streaming

The main purpose of an IP Camera is to transmit remote video in real time over the network. This project uses a Transport Stream (TS) carried over UDP, or over RTP on top of UDP. A transport stream is a data stream defined in ITU-T Rec. H.222.0 | ISO/IEC 13818-1 for transmitting or storing one or more programs of data coded according to ISO/IEC 13818-2 and ISO/IEC 13818-3 in environments where serious errors are likely to occur. TS is mainly used for real-time transmission of programs, such as live TV broadcasts. Its main feature is that the stream must be independently decodable starting from any segment, so a receiver can join at any time. At present there is no unified media stream standard in the video surveillance field, but TS over RTP/UDP is conducive to future system integration. Some ADI third parties provide complete RTP protocol stack products, and some open source implementations are also available.
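For readers unfamiliar with the TS format, here is a minimal sketch of how a single 188-byte transport packet could be assembled. The header layout (sync byte 0x47, 13-bit PID, 4-bit continuity counter, adaptation-field stuffing for short payloads) follows the MPEG-2 systems standard; the function name and interface are hypothetical and are not taken from the project's packetizer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TS_PACKET_SIZE 188
#define TS_SYNC_BYTE   0x47
#define TS_MAX_PAYLOAD 184

/* Build one TS packet carrying payload_len (1..184) bytes. Short payloads
 * are padded with adaptation-field stuffing. Returns 0 on success, -1 on
 * invalid arguments. The continuity counter is advanced modulo 16. */
int ts_write_packet(uint8_t pkt[TS_PACKET_SIZE], uint16_t pid,
                    bool payload_unit_start, uint8_t *continuity_counter,
                    const uint8_t *payload, size_t payload_len)
{
    assert(pkt != NULL && continuity_counter != NULL && payload != NULL);
    if (payload_len == 0 || payload_len > TS_MAX_PAYLOAD || pid > 0x1FFF)
        return -1;

    bool stuffing = payload_len < TS_MAX_PAYLOAD;
    size_t pos = 4;

    pkt[0] = TS_SYNC_BYTE;
    pkt[1] = (uint8_t)((payload_unit_start ? 0x40 : 0x00) | (pid >> 8));
    pkt[2] = (uint8_t)(pid & 0xFF);
    /* adaptation_field_control: 01 = payload only, 11 = field + payload */
    pkt[3] = (uint8_t)((stuffing ? 0x30 : 0x10) | (*continuity_counter & 0x0F));
    *continuity_counter = (uint8_t)((*continuity_counter + 1) & 0x0F);

    if (stuffing) {
        size_t af_len = 183 - payload_len;  /* bytes after the length byte */
        pkt[pos++] = (uint8_t)af_len;
        if (af_len > 0) {
            pkt[pos++] = 0x00;              /* no adaptation flags set */
            memset(&pkt[pos], 0xFF, af_len - 1);
            pos += af_len - 1;
        }
    }
    memcpy(&pkt[pos], payload, payload_len);
    return 0;
}
```

A common convention, assumed here rather than stated in the article, is to place seven such packets (1316 bytes) in each UDP or RTP datagram so that the datagram fits within a standard Ethernet frame.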

On the data link side, a general-purpose network interface chip can meet the bandwidth requirements of compressed video transmission, but processor occupancy is also a very important criterion when evaluating network performance. The BF537 chip in the BF53x series has a built-in 10/100M MAC interface with a dedicated DMA data channel, so both its throughput and its processor occupancy are very good. On a BF537-based IP Camera, 1 Mbps of network traffic consumes only 1% of processor performance; for example, sending an H.264 D1-resolution monitoring stream consumes less than 10 MIPS.

2.4 Software Architecture

uC/OS-II is one of the RTOSes supported by Blackfin. It delivers strong hard real-time performance on the high-speed Blackfin processor: the OS interrupt response time is about 110 cycles (about 0.18 us at 600 MHz). The system first creates a main task that is responsible for system initialization and for creating the other module tasks. Each module task runs independently and processes its own input and output data streams, and coupling between modules is low, so modules can be included or left out flexibly. Blackfin also has a variety of network protocol stack options. In addition to the TCP/IP stacks provided with each commercial RTOS, lwIP, a leading open source network protocol stack, has also been ported to the Blackfin processor. This project uses the uC/IP protocol stack, which is compatible with uC/OS-II.

The software architecture is divided into audio and video capture, intelligent analysis, coding and packaging, network transmission, system control, and other modules. Each module runs as tasks with different priorities, which benefits system integration and modular design. Modules are independent of each other and synchronized by semaphores, and the data structures between modules are double-buffered or multi-buffered so that IO modules and computation modules can run in parallel. The program and data structures are also designed for fault tolerance under high system load: an occasional dropped frame does not stop the system from running, and the error is reported to the system control module.
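One way to picture the multi-buffered hand-off between a producer module (for example capture) and a consumer module (for example encoding) is the small frame queue below. It is a sketch under stated assumptions, not the project's code: the names and the slot count are hypothetical, and the plain counter stands in for the semaphore that would synchronize the two tasks on uC/OS-II (for instance with OSSemPost and OSSemPend), so as written it is only safe when both sides run in one context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_SLOTS 3  /* hypothetical: triple buffering */

typedef struct {
    void    *slot[FRAME_SLOTS];
    unsigned head;     /* next slot the producer fills */
    unsigned tail;     /* next slot the consumer reads */
    unsigned count;    /* filled slots; a semaphore would hold this in an RTOS */
    uint32_t dropped;  /* frames discarded because the consumer fell behind */
} frame_queue_t;

void fq_init(frame_queue_t *q)
{
    assert(q != NULL);
    q->head = q->tail = q->count = 0;
    q->dropped = 0;
}

/* Producer side: returns 0 if the frame was queued, -1 if it was dropped.
 * A drop is counted so it can be reported to system control, and the
 * system keeps running. */
int fq_put(frame_queue_t *q, void *frame)
{
    assert(q != NULL);
    if (q->count == FRAME_SLOTS) {
        q->dropped++;
        return -1;
    }
    q->slot[q->head] = frame;
    q->head = (q->head + 1) % FRAME_SLOTS;
    q->count++;
    return 0;
}

/* Consumer side: returns the oldest queued frame, or NULL if none is ready. */
void *fq_get(frame_queue_t *q)
{
    assert(q != NULL);
    if (q->count == 0)
        return NULL;
    void *frame = q->slot[q->tail];
    q->tail = (q->tail + 1) % FRAME_SLOTS;
    q->count--;
    return frame;
}
```

Dropping the newest frame when the queue is full is only one possible policy; overwriting the oldest frame is an equally plausible choice for live video.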

The following table (Table 1) lists the sources of each module in the system:

Table 1 IP Camera System Module Source


3. System optimization

In a DSP system, once the algorithm is fixed, the optimization approach during implementation is generally well established. First use the compiler's optimization switches and facilities, then analyze the algorithm, identify the critical code and data, and tune those critical parts by hand, for example by rewriting them in assembly. However, when implementing a complete system with multiple inputs and outputs and multiple algorithms running in parallel, optimizing overall operation requires, in addition to traditional algorithm optimization, considering several factors from the system perspective:

1. Maximize and optimize system bandwidth

In such a complex system, the multiple video and audio data inputs and outputs introduce conflicts and delays that greatly affect the efficiency of off-chip memory use. Blackfin's SDRAM controller supports concurrent data transfers across multiple banks, so the IO data of different channels should be placed in different memory banks wherever possible. The audio, video, and network data structures therefore need not only to ensure efficient synchronization, but also to use the features provided by the Blackfin development tools to spread the data across different banks.
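A quick way to check a buffer layout is to compute which internal bank each buffer address falls into. The sketch below assumes that the four internal SDRAM banks occupy four equal, contiguous address ranges; the real mapping depends on the SDRAM controller configuration and should be confirmed against the hardware reference manual. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SDRAM_INTERNAL_BANKS 4u

/* Return the internal bank index (0..3) of addr, or -1 if addr lies
 * outside the SDRAM region. ASSUMPTION: the banks are equal, contiguous
 * quarters of the region starting at base. */
int sdram_bank_of(uint32_t addr, uint32_t base, uint32_t size)
{
    assert(size != 0 && size % SDRAM_INTERNAL_BANKS == 0);
    if (addr < base || addr - base >= size)
        return -1;
    return (int)((addr - base) / (size / SDRAM_INTERNAL_BANKS));
}

/* True if two buffer addresses fall into the same internal bank and may
 * therefore contend with each other. */
bool sdram_same_bank(uint32_t a, uint32_t b, uint32_t base, uint32_t size)
{
    int bank_a = sdram_bank_of(a, base, size);
    int bank_b = sdram_bank_of(b, base, size);

    return bank_a >= 0 && bank_a == bank_b;
}
```

Under this assumption, with a 64 MB region at address 0, a video input buffer at 0x00100000 and a network buffer at 0x02000000 land in banks 0 and 2 and so do not share a bank.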

2. Efficient allocation of on-chip L1 memory

Traditionally, the L1 high-speed memory inside a DSP processor can be accessed directly and holds critical code and data to improve algorithm efficiency, whereas the L1 of an MCU is generally used as cache that software cannot directly control. On high-performance processors such as Blackfin, L1 can be flexibly configured as cache or as directly accessed SRAM. In a complete system, every module and the operating system itself must be taken into account: part of the L1 memory is used as cache to maintain the overall cache hit rate, and the rest is used as SRAM for the critical algorithm modules. Finding an optimized L1 configuration takes repeated adjustment and testing; the ultimate goal is the highest L1 memory utilization (hit rate).

3. Use DMA channels most efficiently

More and more processors provide dedicated DMA channels for IO interfaces to relieve the processor of data input and output. In addition to the DMA channels of the audio and video interfaces, Blackfin also has dedicated memory DMA channels. To make the best use of DMA, the most important technique is ping-pong buffering, which lets the processor and the DMA channels work as a pipeline. Data input, memory DMA inside the algorithms, and data output must all use DMA with ping-pong buffers to achieve the highest system efficiency. This requires every driver and software module to support such data structures and operating models.
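The ping-pong idea can be sketched in a few lines: while the DMA channel fills one buffer, the processor works on the other, and the roles swap when a transfer completes. The structure and function names below are hypothetical, and the actual DMA descriptor setup and interrupt registration are omitted because they depend on the driver in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *buf[2];
    unsigned dma_idx;  /* index of the buffer the DMA channel is filling */
} pingpong_t;

void pp_init(pingpong_t *pp, uint8_t *a, uint8_t *b)
{
    assert(pp != NULL && a != NULL && b != NULL && a != b);
    pp->buf[0] = a;
    pp->buf[1] = b;
    pp->dma_idx = 0;
}

/* Buffer that the next (or current) DMA transfer should write into. */
uint8_t *pp_dma_target(const pingpong_t *pp)
{
    assert(pp != NULL);
    return pp->buf[pp->dma_idx];
}

/* Intended to be called when a DMA transfer completes: hand the buffer
 * that was just filled to the processor and point the DMA channel at the
 * other one. */
uint8_t *pp_swap(pingpong_t *pp)
{
    assert(pp != NULL);
    uint8_t *filled = pp->buf[pp->dma_idx];
    pp->dma_idx ^= 1u;
    return filled;
}
```

For the pipeline to hold, the processor has to finish with a buffer before the next completion hands that buffer back to the DMA channel; if that cannot be guaranteed, more than two buffers are needed.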

In summary, new processors often integrate multiple performance-enhancing mechanisms. The idea behind software system optimization is to keep the total system bandwidth (for example multiple buses and multiple DMA channels) and the total computing units (for example multiple cores and multiple multipliers) working in parallel and in a pipeline, and developers have to ensure this at every level of the system and the application.

4. Performance analysis

The overhead of the RTOS used by the system is mainly the timer tick, which has a 10 ms period and is negligible. Because of the high network performance of the BF537, the processor time taken by network transmission is also very small. Most of the processor time is consumed by intelligent analysis and by audio and video encoding.

The project can run on a single-core or dual-core Blackfin platform, and the receiver uses the open source VideoLAN Client (VLC) to receive and play the stream. Because the Blackfin core executes single-cycle instructions, we generally use 600 MIPS to represent the total processing capacity of a single 600 MHz core. The processor capacity consumed by some of the system modules is likewise expressed in MIPS, as shown in Table 2.
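Budgeting with MIPS figures amounts to summing the cost of the selected modules and comparing the total with the capacity of the core. The helper below is a hypothetical sketch of that check; it contains no measured values, so the per-module costs have to be supplied from actual measurements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    const char *name;  /* module label, for reporting only */
    unsigned    mips;  /* measured cost of the module in MIPS */
} module_load_t;

/* Sum the MIPS cost of the selected modules. */
unsigned total_mips(const module_load_t *mods, size_t n)
{
    unsigned sum = 0;

    assert(mods != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        sum += mods[i].mips;
    return sum;
}

/* True if the selected modules fit within the core's budget, for example
 * 600 MIPS for one 600 MHz core. */
bool fits_budget(const module_load_t *mods, size_t n, unsigned budget_mips)
{
    return total_mips(mods, n) <= budget_mips;
}
```

A selection that does not fit a single core is a candidate for a lower frame rate, a smaller resolution, or a dual-core part such as the BF561.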

Table 2 Performance Test of IP Camera (Unit: MIPS)

As Table 2 shows, each module that can be used in the system can be profiled under different parameters. On this basis, by choosing processors of different performance, different modules, different encoding formats, and even different frame rates, we can assemble different systems for different applications and achieve product differentiation. For example, the system can run the intelligent processing algorithms continuously and start the encoding module only when necessary to send the key part of the media stream. Alternatively, it can run the encoding module at a low bit rate and low frame rate and let the intelligent module dynamically control the key frame rate and the frame rate. Such an intelligent monitoring system is more practical and minimizes the burden on human operators.


Figure 2 Blackfin-based intelligent monitoring IP Camera

5. Summary

Embedded processors are developing toward high computing performance, hardware IP co-processing, multiple cores, and application-specific designs, and software platforms and software developers need to adapt to these changes and characteristics. The monitoring field, which is rapidly becoming more intelligent, needs such software and hardware platforms for support. As a representative of the new generation of processors that fuse processing power with control capability, the Blackfin processor family needs a more complete, continually updated software platform to match its performance and flexibility, so that products can meet market needs as quickly as possible while preserving differentiation and innovation among manufacturers. At the same time, Analog Devices and its partners offer a variety of tools and support to ensure that customers can quickly and cost-effectively develop the embedded products that the market needs.

Author:

Yang Wei ADI DSP/Embedded Processor Advanced Technology Application Engineer

Zhang Tiehu ADI Video Surveillance Technology Marketing Manager
