Abstract:
In recent years, the amount of visual data generated by edge devices has increased substantially. Machines typically process this data to accomplish tasks such as object detection without human visual judgment, yet human viewing is still required in scenarios such as human-robot interaction, and the information that matters to humans differs significantly from the information that matters to machines. To address this, we propose an end-to-end learning-based image coding framework that balances human and machine vision tasks. Unlike compression frameworks that target only human vision, a portion of the latent space is shared by machine vision and human vision; because of this sharing, correlations remain between the two tasks, so we propose a partial-channel context model to exploit them and improve coding performance. Our scalable coding framework supports human and machine vision simultaneously by partitioning the latent space: machine vision tasks are handled by a subset of the latent space, referred to as the base layer, while the more demanding human visual reconstruction task uses the full latent space, comprising both the base and enhancement layers. In the experimental section, we evaluate human visual reconstruction and machine vision performance against other benchmarks. The experiments demonstrate that our framework achieves a 28.27%-38.16% bitrate reduction for machine vision tasks and matches the performance of state-of-the-art image codecs in input reconstruction. © 2024 IEEE.
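To make the layered design concrete, the following is a minimal, hypothetical PyTorch sketch of the latent-space partition described in the abstract: a base subset of latent channels feeds a machine-vision head, while the full latent (base plus enhancement channels) feeds the reconstruction decoder. The module names, channel counts, and layer choices are illustrative assumptions, not the authors' implementation.

    # Hypothetical sketch: partition the latent space into a base layer
    # (machine vision) and an enhancement layer (human-vision reconstruction).
    import torch
    import torch.nn as nn

    class ScalableCodec(nn.Module):
        def __init__(self, latent_channels=192, base_channels=64):
            super().__init__()
            self.base_channels = base_channels
            # Analysis transform: image -> latent y with `latent_channels` channels.
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 128, 5, stride=2, padding=2), nn.ReLU(),
                nn.Conv2d(128, latent_channels, 5, stride=2, padding=2),
            )
            # Machine-vision branch decodes only the base subset of channels.
            self.task_head = nn.Conv2d(base_channels, 80, 3, padding=1)
            # Human-vision branch decodes the full latent (base + enhancement).
            self.recon_decoder = nn.Sequential(
                nn.ConvTranspose2d(latent_channels, 128, 5, stride=2,
                                   padding=2, output_padding=1), nn.ReLU(),
                nn.ConvTranspose2d(128, 3, 5, stride=2,
                                   padding=2, output_padding=1),
            )

        def forward(self, x):
            y = self.encoder(x)
            y_base = y[:, : self.base_channels]     # base layer (machine vision)
            task_features = self.task_head(y_base)  # e.g. detection features
            reconstruction = self.recon_decoder(y)  # base + enhancement layers
            return task_features, reconstruction

    # Usage:
    # codec = ScalableCodec()
    # feats, x_hat = codec(torch.randn(1, 3, 256, 256))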
Year: 2024
Page: 1852-1857
Language: English
ESI Highly Cited Papers on the List: 0