OPTIMIZING FRAME STRUCTURE FOR INTERACTIVE MULTIVIEW VIDEO STREAMING WITH VIEW SYNTHESIS
Xiaoyu Xiu, Gene Cheung, Antonio Ortega, Jie Liang

Abstract
Traditional multiview video coding schemes compress all captured video frames, exploiting all available inter-view and temporal correlation for coding gain and creating complex inter-frame dependencies in the process. In contrast, interactive multiview video streaming (IMVS) demands data-navigation flexibility in the frame structure design, so that the server can send only a single, periodically selected video view for decoding and display at the client, saving transmission bandwidth. In this paper, we generalize previous IMVS frame structure optimization to allow a client to request an arbitrary virtual view; i.e., the server sends two adjacent coded views from which the client synthesizes the desired virtual view. Because existing IMVS schemes transmit only one view at a time, they employ only cross-time prediction; i.e., the frame of the previous time instant from which the client switches is used as a predictor for the requested view. In our new scenario, two coded views are transmitted, so within-time prediction can also be used, where the coded frame of one transmitted view predicts the frame of the other view at the same time instant. Using I-frames, P-frames, and Merge (M-) frames as building blocks, we formulate a Lagrangian problem to find the optimal frame structure for a desired storage/streaming-rate tradeoff, with the right mixture of cross-time and within-time prediction types. Experiments show that, for the same storage cost, the expected streaming rate of the proposed structure can be 40% lower than that of an I-frame-only structure, and 9% lower than that of a structure using M-frames but with cross-time prediction only.
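As a minimal sketch of the Lagrangian formulation referenced above (the symbols here are our own notation, not taken from the paper): the optimization searches over candidate frame structures T and minimizes

    J(T) = C(T) + λ · B(T),

where C(T) is the expected streaming rate induced by the client's view-switching behavior, B(T) is the storage cost of structure T at the server, and λ ≥ 0 is the multiplier that traces out the desired storage/streaming-rate tradeoff.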