Introduction
Ubiquitous camera networks in smart cities generate massive amounts of images and videos across a range of spatial-temporal scales. However, the capabilities of visual processing systems often lag behind the rapid growth of video data and the demands of city brain systems. To address this challenge, a novel collaborative visual computing framework, termed the digital retina, has been established to align high-efficiency, intelligent perception models with the emerging paradigm of visual coding for machines. Within this framework, a video stream, a feature stream, and a model stream work collaboratively over the end-edge-cloud platform. Specifically, the compressed video stream targets human vision, the compact feature stream serves machine vision, and the model stream incrementally updates deep learning models to improve the performance of human and machine vision tasks. The digital retina system enables comprehensive, intelligent, and efficient interactions among retina-like cameras, edge servers, and cloud centers through these multiple data streams, and it is expected to play a fundamental role in visual big data analysis and retrieval in smart cities. Standardization of digital retina systems brings remarkable benefits, such as efficient utilization and real-time processing of massive visual data, full utilization of resources, and competitive performance achieved by processing the original visual signals.
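The division of labor among the three streams can be sketched in code. The sketch below is purely illustrative and is not defined by the standard: the `StreamType` and `StreamPacket` names and the routing policy are assumptions chosen to make the collaboration concrete (video and feature data flow upward toward the cloud, while model updates flow downward to end devices).

```python
from dataclasses import dataclass
from enum import Enum

class StreamType(Enum):
    """The three collaborative data streams of a digital retina system."""
    VIDEO = "video"      # compressed video stream, targets human vision
    FEATURE = "feature"  # compact feature stream, serves machine vision
    MODEL = "model"      # model stream, incrementally updates deep models

@dataclass
class StreamPacket:
    """A unit of data flowing over the end-edge-cloud platform."""
    stream: StreamType
    source: str          # hypothetical camera or node identifier
    payload: bytes

def route(packet: StreamPacket) -> str:
    """Illustrative routing policy: model updates are pushed downward
    to end devices, while video and feature data flow upward."""
    if packet.stream is StreamType.MODEL:
        return "cloud -> edge -> end"
    return "end -> edge -> cloud"

print(route(StreamPacket(StreamType.MODEL, "cloud-0", b"weights")))
# cloud -> edge -> end
print(route(StreamPacket(StreamType.FEATURE, "cam-1", b"descriptors")))
# end -> edge -> cloud
```

In this reading, the model stream is the only stream whose natural direction is downward, which is what lets the cloud incrementally improve perception at the ends without shipping raw video back and forth.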
A series of 12 standards for digital retina systems is planned by the 3161 Working Group of the IEEE Computer Society Data Compression Standards Committee, covering: system architecture; end subsystem; edge subsystem; cloud subsystem; algorithm and model repository; storage system; end-edge-cloud collaboration; security and privacy protection; protocols and interfaces; test specification; measurement and evaluation system; and application guideline. This standard is the first part: system architecture.
Overview of the Standard
IEEE Std 3161-2022 aims to establish a unified architecture for visual computing systems. By standardizing the collaborative framework among the end, edge, and cloud, along with the cooperative characteristics of multiple data streams, it seeks to reduce data transmission pressure within the system and alleviate the computational burden on the cloud, ultimately enhancing both the efficiency and the performance of video data processing in large-scale applications. On this foundation, the standard specifies a biologically inspired visual computing framework, named the digital retina, in which three streams, i.e., the video stream, feature stream, and model stream, work collaboratively for real-time analysis and processing of video big data. It mainly defines the architecture, components, and functional requirements of digital retina systems. The standard addresses the fields of visual perception systems and visual information processing technologies and is applicable to various application scenarios in smart cities, such as intelligent transportation, public safety, and intelligent manufacturing.
Key Features and Benefits
IEEE Std 3161-2022 outlines a reference architecture, technical characteristics, components, and functional requirements for digital retina systems. The end-edge-cloud collaboration and multi-stream cooperation mechanisms are established, ensuring efficient interaction across all levels. Specifically, the technical characteristics of digital retina systems are specified, including a globally unified time-space ID, efficient video coding, compact feature representation, model-updatability, software-definability, and attention-adjustability. On this foundation, the end subsystem is defined as the subsystem mainly used for perceiving scenario information, with functions such as data acquisition, processing, analysis, and transmission. The edge subsystem is defined as the subsystem that uses edge computing and provides multi-channel data aggregation and forwarding, cooperative resource scheduling, and data computing. The cloud subsystem is defined as the subsystem that uses cloud computing and provides system management and collaborative interaction, data aggregation and storage, and data collaborative analysis, mining, and decision-making at the global level.
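Among the listed characteristics, the globally unified time-space ID is the one most easily made concrete: every piece of visual data is tagged with when and where it was captured, and by which device. The field layout below is an assumption for illustration only; the standard names the characteristic but this sketch does not reproduce its normative definition.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class TimeSpaceID:
    """Hypothetical globally unified time-space identifier: a capture
    timestamp, the device's location, and a globally unique device ID."""
    device_id: str       # assumed globally unique camera identifier
    timestamp: datetime  # capture time in UTC
    latitude: float      # WGS-84 coordinates of the device
    longitude: float

    def key(self) -> str:
        """Render a sortable string key for indexing visual big data
        by time first, then location, then device."""
        return (f"{self.timestamp.strftime('%Y%m%dT%H%M%SZ')}/"
                f"{self.latitude:.5f},{self.longitude:.5f}/"
                f"{self.device_id}")

tsid = TimeSpaceID("cam-042",
                   datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc),
                   39.90469, 116.40717)
print(tsid.key())  # 20240501T120000Z/39.90469,116.40717/cam-042
```

A time-leading key like this makes range scans over a time window cheap, which matters when retrieving visual data at city scale.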
This standard provides an advanced end-edge-cloud collaborative computing architecture for massive video acquisition, processing, and transmission, which optimizes resource utilization and enhances video processing efficiency and performance. Moreover, efficient data processing and transmission mechanisms significantly reduce bandwidth and storage demands, resulting in cost savings for stakeholders. These stakeholders include, but are not limited to, end, edge, and cloud device manufacturers and service providers, AI algorithm providers, and system developers and integrators.
Adoption and Impact
IEEE Std 3161-2022 has been adopted and implemented in internet of video things systems, thereby overcoming bottlenecks in large-scale video data processing, effectively reducing the cost of artificial intelligence applications, and facilitating the establishment of a technology ecosystem for the internet of video things. Within such a system, end devices continuously capture video streams in real time, intelligent analysis is performed either at the end device or at the edge, and the cloud is used for data mining and decision-making. Systems developed based on IEEE Std 3161-2022 have been deployed across industries such as work safety, transportation, and public safety. In intelligent transportation, the system realizes real-time traffic flow monitoring and analysis and generates real-time traffic congestion heatmaps and accident risk prediction models; it provides fast and accurate decision support for adaptive traffic signal control, significantly enhancing traffic management efficiency. In public safety, the system enables rapid and extensive detection of abnormal events, strengthening the overall monitoring and response capabilities of societal security.
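The deployment pattern described above, where ends capture, edges analyze, and the cloud aggregates for decision-making, can be sketched as a three-stage pipeline. All function names and the stubbed detector output below are hypothetical; a real system would run actual coding and analysis models at each stage.

```python
from collections import Counter

def end_capture(camera_id: str) -> dict:
    """End subsystem: acquire a frame (stubbed here as metadata only)."""
    return {"camera": camera_id, "frame": b"..."}

def edge_analyze(frame: dict) -> dict:
    """Edge subsystem: run lightweight analysis near the camera and
    forward only a compact event record instead of raw video."""
    # hypothetical detector output; stands in for a real model
    return {"camera": frame["camera"], "event": "vehicle"}

def cloud_aggregate(events: list) -> Counter:
    """Cloud subsystem: aggregate compact events globally, e.g. to
    build a per-camera traffic-flow count for a congestion heatmap."""
    return Counter(e["camera"] for e in events)

events = [edge_analyze(end_capture(cam))
          for cam in ["cam-1", "cam-1", "cam-2"]]
print(cloud_aggregate(events))  # Counter({'cam-1': 2, 'cam-2': 1})
```

The bandwidth saving claimed for the architecture shows up in this shape: only small event records cross the edge-to-cloud link, while raw frames stay near the cameras.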