Has AI made viewers the directors of a 3D Friends?

By 阿新 · Published: 2022-12-12

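A 3D version of Friends is out?

One that lets you rotate the camera and switch from close-ups to wide shots while the story plays?

SitCom3D, an AI project for 3D released this year, automatically fills in the 3D space of the sets from the original Friends episodes, so viewers can watch scenes from different angles such as the main view or a side view. The camera is in the viewer's hands, as if you were standing on the set.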

https://www.bilibili.com/video/BV1p84y1r7ye/?aid=606073788&cid=917566564&page=1

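For example, the original show contains only the two shots below.

After "watching" Friends, the AI reconstructs the 3D scene behind those shots and tells the same story from camera angles that never appeared on screen.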

https://www.bilibili.com/video/BV1p84y1r7ye/?aid=606073788&cid=917566640&page=2

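With camera moves this smooth, it feels like a newly released 3D Friends, and you are the director!

Project overview: 3D reconstruction of sitcoms

The project was proposed by Georgios Pavlakos and colleagues at UC Berkeley in the ECCV 2022 paper "The One Where They Reconstructed 3D Humans and Environments in TV Shows", with the goal of using AI to reconstruct TV episodes in 3D.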

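3D modeling is widely used in many fields, but traditional modeling is very time-consuming: in manufacturing, for instance, a single industrial-grade model can take a professional modeler several weeks. There are currently two main ways to obtain 3D models quickly. One is to scan objects with instruments to capture 3D shape data such as point clouds; the other is to build the model with deep learning. The latter is more direct and cheaper. Imagine getting a high-quality 3D model from a few photos, or even a single photo, taken on a phone: it could provide infrastructure for life in the metaverse and also speed up R&D in traditional industries.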

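Any discussion of turning 2D images into 3D models has to mention NeRF (Neural Radiance Fields), which has been hugely popular over the past two years. Since it was published in 2020, many universities and companies have built their own variants: NVIDIA released the much faster Instant NeRF, Apple developed the NeuMan framework, and so on.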

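NeRF offers a new approach to 2D-to-3D reconstruction. Its mechanism has been explained in many write-ups, so here is only a brief summary. NeRF is trained on a set of 2D photos of a scene taken from known camera poses. It represents the scene as a neural radiance field: a network that maps a 3D point position (x, y, z) and a viewing direction (θ, φ) to a volume density and an RGB color. An image is produced by volume rendering, that is, by sampling points along each camera ray and accumulating their colors weighted by density. Once trained, the network can render the scene from new viewpoints, and it produces more detailed reconstructions than traditional 3D modeling approaches.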
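To make the rendering step concrete, here is a minimal NumPy sketch of the standard NeRF volume-rendering quadrature for a single ray: the pixel color is C = Σ T_i · (1 − exp(−σ_i δ_i)) · c_i, where δ_i is the spacing between samples and T_i = exp(−Σ_{j<i} σ_j δ_j) is the transmittance. It assumes the network has already been queried at N sample points, giving densities and colors; the function name `render_ray` is illustrative and not taken from any NeRF codebase.

```python
import numpy as np

def render_ray(sigmas, colors, t_vals):
    """Composite N samples along one ray using the NeRF quadrature rule.

    sigmas: (N,) volume densities at the samples
    colors: (N, 3) RGB colors at the samples
    t_vals: (N,) increasing distances of the samples along the ray
    Returns the pixel color (3,) and the per-sample weights (N,).
    """
    sigmas = np.asarray(sigmas, dtype=float)
    colors = np.asarray(colors, dtype=float)
    t_vals = np.asarray(t_vals, dtype=float)
    # spacing between neighbouring samples; the last interval is made very long
    deltas = np.diff(t_vals, append=1e10)
    # alpha_i = 1 - exp(-sigma_i * delta_i): opacity of each interval
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # T_i = prod_{j<i} (1 - alpha_j): chance the ray reaches sample i unabsorbed
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    rgb = (weights[:, None] * colors).sum(axis=0)
    return rgb, weights
```

A dense sample in front hides everything behind it (its weight is close to 1), while empty space (σ = 0) contributes nothing; this is what lets NeRF learn geometry from images alone.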

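Back to the TV-show reconstruction project. As shown in the figure below (Fig. 2), the researchers build on NeRF and analyze the 3D information across an entire show to accurately estimate and reconstruct the actors' 3D body poses and positions, and to render new 2D views that never appear in the episodes.

The figure below shows how the 3D model is built. First, camera positions are estimated from the episode frames with SfM (Structure-from-Motion), and NeRF reconstructs a detailed 3D model of the environment. Next, the people are reconstructed in 3D, with the multi-shot and single-shot cases handled separately. Finally, this information supports further editing, such as deleting a person from a scene or inserting a rabbit.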
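Why do multiple shots help? When the same actor appears in two shots whose cameras have been calibrated by SfM, the two views constrain depth in a way a single image cannot. The sketch below shows textbook linear (DLT) triangulation of one 3D point from two known projection matrices. It only illustrates that geometric idea and is not the paper's method, which fits full SMPL body models; `triangulate_dlt` is a hypothetical name.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point seen by two calibrated cameras.

    P1, P2: (3, 4) projection matrices K @ [R | t] of the two shots
    x1, x2: (2,) pixel coordinates of the same point in each image
    Returns the (3,) point in world coordinates.
    """
    P1, P2 = np.asarray(P1, dtype=float), np.asarray(P2, dtype=float)
    # each view gives two equations: u * (P[2] @ X) = P[0] @ X and v * (P[2] @ X) = P[1] @ X
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # least-squares homogeneous solution: right singular vector of the smallest singular value
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]
```

With noisy 2D detections the same least-squares system still returns the best linear fit, which is why more views generally give a more stable 3D estimate.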

The project is open source on GitHub.

Code: https://github.com/ethanweber/sitcoms3D
Project page: https://ethanweber.me/sitcoms3D

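If you are interested, you can reproduce the project directly on Matpool (矩池雲) as follows.

Reproducing the project: fast 3D reconstruction on Matpool

1. Open the Matpool website, go to the machine marketplace, and choose a suitable machine

Wait for the machine to start. When it is ready, click to open Jupyter Notebook.

2. Open a terminal, then download and install the code

Run the following commands in order: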

cd /mnt
git clone https://hub.fastgit.xyz/ethanweber/sitcoms3D.git

If the download is slow, you can download the code manually from https://hub.fastgit.xyz/ethanweber/sitcoms3D and upload it to the machine.


Enter the project folder:

cd sitcoms3D/

Install the dependencies:

pip install -r requirements.txt
pip install tqdm

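Unzip the data
Matpool has already prepared the sitcoms3D data as sitcoms3D_data.zip under /public/data/ (check the exact subfolder on your machine, e.g. /public/data/videos_and_music/ or /public/data/image/). Unzip it into the project's data folder: /mnt/sitcoms3D/data if you cloned the repository as above, or sitcoms3D-master/data if you downloaded the ZIP.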

unzip /public/data/videos_and_music/sitcoms3D_data.zip -d /mnt/sitcoms3D/data
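Before opening the notebook, it can save time to confirm that the archive unpacked where the code expects it (an archive with its own top-level folder, for example, would add an extra directory level). The folder names below are taken from the paths that the notebook code in this article reads; the helper `missing_data_dirs` is hypothetical.

```python
from pathlib import Path

# subfolders that the notebook code in this article reads from ../data/
EXPECTED_SUBDIRS = [
    "sparse_reconstruction_and_nerf_data",
    "human_data",
    "human_pairs",
    "smpl_models",
]

def missing_data_dirs(data_root):
    """Return the expected subfolders that do not exist under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]
```

For example, `missing_data_dirs("/mnt/sitcoms3D/data")` should return an empty list once everything is in place.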

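3. Open Jupyter Notebook

Open the demo notebook and choose "Run All Cells" to run the official example.

4. A streamlined version of the code

The official demo is written for demonstration, and when run as-is some variables can interfere with one another, so we trimmed the code down. You can use it as follows.

The model loads its data from seven directories, which appear to correspond to different scenes. For example, the default sitcom_location = "Friends-monica_apartment" selects data from Friends.

Change sitcom_location in the second cell to switch to a different scene: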

sitcom_location = sit_locs[0] 
# sitcom_location = "Friends-monica_apartment"
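Indexing into sit_locs depends on the directory listing, so selecting a scene by name is less fragile. A small hypothetical helper, `pick_sitcom_location`, is sketched below; it sorts the listing exactly as the notebook does, so `index` refers to the same entry as `sit_locs[index]`.

```python
import os

def pick_sitcom_location(data_dir, name=None, index=0):
    """Return a scene folder name, chosen by `name` if given, otherwise by `index`.

    The listing is sorted the same way as `sit_locs` in the notebook,
    so `index` refers to the same entry as `sit_locs[index]`.
    """
    sit_locs = sorted(os.listdir(data_dir))
    if name is not None:
        if name not in sit_locs:
            raise ValueError(f"{name!r} not found in {data_dir}; available: {sit_locs}")
        return name
    return sit_locs[index]
```

Usage: `sitcom_location = pick_sitcom_location("../data/sparse_reconstruction_and_nerf_data", name="Friends-monica_apartment")`. A typo then fails with a list of valid scenes instead of silently loading a different one.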

The next step selects an image. The original code uses random to pick one at random; you can add one line to specify your own image:

# choose a random image to work with
image_name = random.choice(list(nerf_image_name_to_info.keys()))
image_name = "ELR_S09E01_00007186.jpg"  # override the random choice with a specific image
print("Showing camera information for image:", image_name)
pprint(nerf_image_name_to_info[image_name])
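The override above leaves the random call in place and fails with a bare KeyError if the file name is mistyped. One way to fold both behaviors into one place is the hypothetical helper `choose_image` below, which returns the preferred image when it exists and otherwise samples one, optionally with a seed for reproducible runs.

```python
import random

def choose_image(image_names, preferred=None, seed=None):
    """Return `preferred` if it exists in `image_names`, otherwise a random name.

    Names are sorted before sampling so that a fixed `seed` is reproducible
    regardless of dict or set ordering.
    """
    names = sorted(image_names)
    if preferred is not None:
        if preferred not in names:
            raise KeyError(f"{preferred!r} is not an image of this scene")
        return preferred
    return random.Random(seed).choice(names)
```

Usage: `image_name = choose_image(nerf_image_name_to_info.keys(), preferred="ELR_S09E01_00007186.jpg")`, or omit `preferred` to sample randomly.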

The modified code is below. If you don't specify an image, every "Run All Cells" picks a random one.

# import various modules
%load_ext autoreload
%autoreload 2
import copy
import json
import os
import random
from pprint import pprint

import mediapy as media
import numpy as np
import smplx
import torch
import trimesh
from tqdm import tqdm
# some custom code
# gross import... maybe put into a python module later
import sys
sys.path.append("..")
from utils.dataloader import human_to_nerf_space, load_colmap_cameras_from_sitcom_location
from utils.render_utils import render_human
from utils.io import load_from_json

This cell locates the data and picks an image:

sit_locs = sorted(os.listdir("../data/sparse_reconstruction_and_nerf_data"))
print(sit_locs)

sitcom_location = sit_locs[3]
print("load.....",sitcom_location)

cameras = load_from_json(f"../data/sparse_reconstruction_and_nerf_data/{sitcom_location}/cameras.json")

nerf_image_name_to_info = {}
for dict_ in cameras["frames"]:
    nerf_image_name_to_info[dict_["image_name"]] = {
        "intrinsics": np.array(dict_["intrinsics"]),
        "camtoworld": np.array(dict_["camtoworld"]),
    }
    
image_name = random.choice(list(nerf_image_name_to_info.keys()))
print("image name: ", image_name)
    
basedir = f"../data/sparse_reconstruction_and_nerf_data/{sitcom_location}"
colmap_image_name_to_info = load_colmap_cameras_from_sitcom_location(basedir)

point_cloud_transform = np.array(cameras["point_cloud_transform"])
scale_factor = np.array(cameras["scale_factor"])
# colmap_rescale = float(smpl_data["colmap_rescale"])
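Each frame in cameras.json stores an `intrinsics` matrix and a `camtoworld` pose. As a sketch of how such a pair is typically used, the function below projects world-space points to pixel coordinates. It assumes a pinhole K in the top-left 3x3 of `intrinsics` and an OpenCV-style camera (x right, y down, z forward); if the repository uses the OpenGL/NeRF convention (camera looking down the negative z axis), the y and z axes must be flipped first. `project_points` is a hypothetical name.

```python
import numpy as np

def project_points(points_world, camtoworld, intrinsics):
    """Project (N, 3) world points into pixel coordinates.

    Assumes an OpenCV-style camera (x right, y down, z forward), a 3x4 or 4x4
    camera-to-world matrix, and a pinhole K in the top-left 3x3 of `intrinsics`.
    Returns (N, 2) pixel coordinates and (N,) depths along the camera's z axis.
    """
    c2w = np.asarray(camtoworld, dtype=float)
    K = np.asarray(intrinsics, dtype=float)[:3, :3]
    R, t = c2w[:3, :3], c2w[:3, 3]
    # world -> camera is the inverse of camera -> world: X_cam = R^T (X_world - t)
    points_cam = (np.asarray(points_world, dtype=float) - t) @ R
    # pinhole projection, then divide by depth
    uvw = points_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3], points_cam[:, 2]
```

Points with non-positive depth lie behind the camera and should be discarded before drawing.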

Load the human data for the selected image:

# the set of image names that we used for nerf
# these images are included in the sparse_reconstruction_and_nerf_data/ folder
nerf_image_names = set(nerf_image_name_to_info.keys())

# the set of image names that we have smpl parameters for
# this is wherever our method "calibrated multi-shot" was run
human_pairs = load_from_json(f"../data/human_pairs/{sitcom_location}.json")
image_name_to_shot_change_image_name = {}
calibrated_multishot_image_names = set()
for image_name_a, human_idx_a, image_name_b, human_idx_b in human_pairs:
    calibrated_multishot_image_names.add(image_name_a)
    calibrated_multishot_image_names.add(image_name_b)
    image_name_to_shot_change_image_name[image_name_a] = image_name_b
    image_name_to_shot_change_image_name[image_name_b] = image_name_a

image_names = nerf_image_names.intersection(calibrated_multishot_image_names)
print("Found {} images that are used for nerf and contain smpl parameters".format(len(image_names)))

# choose a random image name to work with and visualize
# (sampling from image_names guarantees the image has SMPL parameters)
# image_name = random.choice(list(image_names))
# image_name = "Friends_S08E20_00001431.jpg"

# read data for the image and a human...
human_data = load_from_json(f"../data/human_data/{sitcom_location}.json")

image_human_data = human_data[image_name]
print("going to visualize image {} with {} humans".format(image_name, len(image_human_data)))
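`image_name_to_shot_change_image_name` keeps only one partner per image and drops the human indices, so an image that appears in several pairs ends up with only its last partner. If you need per-person correspondences, a hypothetical alternative is to key by (image_name, human_idx), assuming each entry of human_pairs has the four-field layout unpacked in the loop above.

```python
from collections import defaultdict

def build_pair_index(human_pairs):
    """Map (image_name, human_idx) -> list of (other_image_name, other_human_idx)."""
    index = defaultdict(list)
    for image_a, idx_a, image_b, idx_b in human_pairs:
        # record the correspondence in both directions, keeping every pair
        index[(image_a, idx_a)].append((image_b, idx_b))
        index[(image_b, idx_b)].append((image_a, idx_a))
    return dict(index)
```

Usage: `pair_index = build_pair_index(human_pairs)`, then `pair_index[(image_name, 0)]` lists the other shots in which the first person was matched.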

Display the loaded image:

image = media.read_image(f"../data/sparse_reconstruction_and_nerf_data/{sitcom_location}/images/{image_name}")
media.show_image(image, height=200)

Load the SMPL body model:

model_folder = "../data/smpl_models"
model_type = "smpl"
gender = "neutral"

body_model = smplx.create(model_folder,
                          model_type=model_type,
                          gender=gender)

A function that feeds one person's data into the body model and extracts the mesh:

def get_human_obj_mesh(image_name: str, human_idx: int):
    if "smpl" not in human_data[image_name][human_idx]:
        print(f"smpl values don't exist for {image_name} and human_idx {human_idx}")
        return None
    smpl_data = human_data[image_name][human_idx]["smpl"]
    print(smpl_data.keys())

    camera_translation = torch.tensor(smpl_data["camera_translation"])[None]
    betas = torch.tensor(smpl_data["betas"])[None]
    global_orient = torch.tensor(smpl_data["global_orient"])[None]
    body_pose = torch.tensor(smpl_data["body_pose"])[None]
    colmap_rescale = float(smpl_data["colmap_rescale"])

    output = body_model(
        betas=betas,
        global_orient=global_orient,
        body_pose=body_pose,
        return_verts=True)

    vertices = output.vertices + camera_translation
    # clone so the in-place rescale below never modifies the cached COLMAP pose
    pose_colmap = torch.from_numpy(colmap_image_name_to_info[image_name]["camtoworld"]).float().clone()
    pose_colmap[:3,3] *= colmap_rescale
    # homogeneous coordinates
    vertices = torch.cat([vertices, torch.ones_like(vertices[..., 0:1])], dim=-1)
    vertices = vertices @ pose_colmap.T
    vertices = vertices[...,:3]

    out_mesh = trimesh.Trimesh(vertices[0].detach().numpy(), body_model.faces, process=False)
    human_obj_filename = "temp.obj"
    out_mesh.export(human_obj_filename);

    # specify the human to render
    obj_mesh_original = trimesh.load(human_obj_filename, process=False)
    obj_mesh = human_to_nerf_space(obj_mesh_original, point_cloud_transform, scale_factor, colmap_rescale)
    return obj_mesh
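The core of get_human_obj_mesh is moving the SMPL vertices from the camera frame into the COLMAP world frame: append a 1 to each vertex, multiply by the camera-to-world matrix whose translation has been scaled by colmap_rescale, and drop the last coordinate. The NumPy sketch below isolates that step; `transform_points` is a hypothetical helper, and it copies the matrix before scaling so a cached pose is never rescaled in place.

```python
import numpy as np

def transform_points(points, pose, translation_scale=1.0):
    """Map (N, 3) points through a 4x4 camera-to-world matrix in homogeneous coordinates.

    Only the translation column is multiplied by `translation_scale`,
    mirroring `pose_colmap[:3, 3] *= colmap_rescale` in get_human_obj_mesh.
    """
    pose = np.array(pose, dtype=float)  # np.array copies, so the caller's pose is untouched
    pose[:3, 3] *= translation_scale
    points = np.asarray(points, dtype=float)
    homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    # row-vector convention, same as `vertices @ pose_colmap.T` in the notebook
    return (homo @ pose.T)[:, :3]
```

Scaling only the translation matches the idea that COLMAP reconstructions are defined up to an unknown global scale, while rotations are unaffected by scale.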

Apply get_human_obj_mesh to collect the meshes of all people in the image:

human_obj_meshes = []
for human_idx in range(len(image_human_data)):
    print("human_idx", human_idx)
    obj_mesh = get_human_obj_mesh(image_name, human_idx)
    if obj_mesh is not None:
        human_obj_meshes.append(obj_mesh)

Display the rendered humans:

def show_humans(human_obj_meshes, pose, K, image_name):
    image = media.read_image(f"../data/sparse_reconstruction_and_nerf_data/{sitcom_location}/images/{image_name}")
    color_h, depth_h, alpha_h = render_human(human_obj_meshes, pose, K)
    media.show_image(image, height=200, title="Image we use for camera pose and intrinsics")
    media.show_image(color_h, height=200, title="Image of humans rendered from this camera")
    composited = (color_h * alpha_h[...,None] + image * (1 - alpha_h[...,None])).astype("uint8")
    media.show_image(composited, height=200, title="Composited image")

pose = nerf_image_name_to_info[image_name]["camtoworld"]
K = nerf_image_name_to_info[image_name]["intrinsics"]
show_humans(human_obj_meshes, pose, K, image_name) # image_name to read the background image
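show_humans blends the rendering over the frame with `color_h * alpha + image * (1 - alpha)` and then casts with `astype("uint8")`, which truncates rather than rounds and does not clip. A slightly more defensive sketch of the same alpha compositing is below; it assumes, as the original blend implies, that the rendered colors are in the 0 to 255 range. `composite` is a hypothetical name.

```python
import numpy as np

def composite(foreground, alpha, background):
    """Alpha-blend an (H, W, 3) rendering over an (H, W, 3) uint8 frame.

    alpha is an (H, W) coverage mask: 1 keeps the rendered human, 0 keeps the frame.
    The result is rounded and clipped to [0, 255] before converting to uint8.
    """
    a = np.clip(np.asarray(alpha, dtype=float), 0.0, 1.0)[..., None]
    out = np.asarray(foreground, dtype=float) * a + np.asarray(background, dtype=float) * (1.0 - a)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

Usage: `media.show_image(composite(color_h, alpha_h, image), height=200)`.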

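Limitations of the model

Of course, SitCom3D is also limited by the models it relies on. Among the cases we ran, some results were fairly successful and others were rather messy.

In the less successful images, 3D information can still be extracted, but only some of the people in the frame may be recovered, and objects are sometimes mistaken for people. SitCom3D still has robustness issues.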

References

Paper: https://arxiv.org/abs/2207.14279
GitHub: https://github.com/ethanweber/sitcoms3D
Project page: https://ethanweber.me/sitcoms3D/