compression, processing, indexing and retrieval of 3d ... · compression, processing, indexing and...

Post on 20-May-2020

18 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Compression, Processing, Indexing and Retrieval of 3D Objects and Data:

How to extend image/video processing to graphics?

Tsuhan ChenCarnegie Mellon University

tsuhan@cmu.edu

Joint work with Howard Leung, Masa Okuda, and Cha Zhang

Tsuhan Chen

(Mis)UnderstandingTo graphics and vision communities

Video is just low-level processingTo the video community

Graphics is just some fancy toolsVision is things that don’t work in practice

Tsuhan Chen

First Attempt…MPEG-4

Started out as model-based codingAnalysis and synthesisUsing vision/graphics for video coding

That didn’t happen (not completely)Settled with 2D shape-based codingModel-based coding for limited content, e.g., faces

Modeling and CodingEXAMPLESMODELS CODED INFORMATION

PCM

Predictive CodingTransform Coding

Block-based codingH.261/263, MPEG-1/2

Model-based coding

MPEG-4

Pixels

Statistically dependent pixels

Moving blocks

Moving objects

Facial models

Moving regions

Color of pixels

Prediction error ortransform coeffs

Motion vectors and prediction error

Shapes, motion, and colors of objects

Action units

Shapes, motion, and colors of regions

Region-based codingH.263+, MPEG-4

MPEG-7A/V objects Description

Tsuhan Chen

Modeling and Coding (cont.)Better modeling implies

Higher compressionMore content accessibilityMore complexityLess error resilience

Video and vision/graphics do go hand-in-hand all alongVideo research is evolution of vision and graphics techniques

Tsuhan Chen

TopicsCompression for image based renderingCompression for 3D meshes

Streaming in texture and geometry jointly Indexing and retrieval of 3D objectsBuilding immersive environments

Tsuhan Chen

Image-Based Rendering

……

[Shum et. al]

Tsuhan Chen

CompressionThe number of images is large, so we need compressionGood to have fewer samples

Does not guarantee fewer bits

Consider these as a video sequenceGeneral video coding applies

Tsuhan Chen

DCT Q

IDCT

IQ

D

ME

MCMV

?

?+

+IDCT

IQ

+D MC

MV

Network or Storage

Encoder Decoder

Video Codec

1 Intra coding2 Inter coding

1

1 1

2

2 2

Previous frame(reference frame)

Current frame

Tsuhan Chen

Intra Coding

Disadvantage: Does not exploit the correlation between images

I-frame I-frame

……

1 2 3I-frame

Tsuhan Chen

Inter Coding

Disadvantage: Does not provide random access

i.e., frame N depends on frame N-1

I-frame P-frame

……

1 2 3P-frame

Tsuhan Chen

Prediction from Sprite

25…………

1 2 k3

[cf. Anandan et. al]

Tsuhan Chen

Generation of SpriteImage 1

Image 2

Image N-1

Image N

Image 1

Image 1

Image 2

Image N-1

Image N

Sprite

Step 1: Finding the offset Step 2: Generating the sprite

Tsuhan Chen

Weighting- need to find a weighting function to blend the

images to form the sprite

0

0.001

0.002

0.003

0.004

0.005

0.006

0.007

0.008

0.009

0.01

0 100 200 300 400

Column number

Wei

ght

0

0.001

0.002

0.003

0.004

0.005

0.006

0.007

0.008

0.009

0.01

0 100 200 300 400

Column number

Wei

ght

0

0.02

0.04

0.06

0.08

0.1

0 100 200 300 400

Column number

Wei

ght

Constant weighting Triangular weighting Delta weighting

Tsuhan Chen

Constant Weighting

…………

1 2 k3

Tsuhan Chen

Triangular Weighting

…………

1 2 k3

Tsuhan Chen

Delta Weighting

…………

1 2 k3

Tsuhan Chen

Modified Codec3 Prediction from sprite image without MC

DCT Q

Prediction

IQ?

?+ IDCT

Network or Storage

Encoder Decoder

sprite + offset

Prediction

+

sprite + offset

Tsuhan Chen

With Motion Compensation4 Prediction from sprite image with MC

DCT Q

Prediction

IQ

ME

MCMV

?

?+ IDCT

Network or Storage

Encoder Decoder

Sprite + offset

Prediction

MCMV

Sprite + offset

+

Tsuhan Chen

With vs. Without MC

3 4without MC with MC

Tsuhan Chen

Test Sequences (1)

Synthetic sequence 1: NetICE room Synthetic sequence 2: Park

Tsuhan Chen

Test Sequences (2)

Real sequence 1: Kids Real sequence 2: Kongmiao

[Shum, et al]

Tsuhan Chen

Weighting function Results

Synthetic sequence 1 Synthetic sequence 2

Real sequence 1 Real sequence 2

25

27

29

31

33

35

37

0 0.05 0.1 0.15 0.2

Bit rate (bpp)

PSNR

(dB)

25

27

29

31

33

35

37

0 0.05 0.1 0.15 0.2

Bit rate (bpp)

PSNR

(dB)

29

31

33

35

37

39

41

43

0 0.02 0.04 0.06 0.08 0.1

Bit rate (bpp)

PSNR

(dB)

34

36

38

40

42

44

0 0.01 0.02 0.03 0.04 0.05

Bit rate (bpp)

PSNR

(dB)

Constant weighting Triangular weighting Delta weighting

Tsuhan Chen

Compression Result

Synthetic sequence 1 Synthetic sequence 2

29

31

33

35

37

39

41

0 0.05

Bit rate (bpp)

PS

NR

(dB

)

Intra coding Inter coding Mosais without MC Mosaic with MC

34

36

38

40

42

44

0 0.01 0.02 0.03 0.04 0.05

Bit rate (bpp)P

SN

R (d

B)

Intra coding Inter coding Mosais without MC Mosaic with MC

Intra coding Inter coding Sprite without MC Sprite with MC

Tsuhan Chen

Compression Result

Real sequence 1 Real sequence 2

2526272829303132333435

0 0.05 0.1

Bit rate (bpp)

PS

NR

(dB

)

Intra coding Inter coding Mosais without MC Mosaic with MC

25262728293031323334

0 0.05 0.1

Bit rate (bpp)P

SN

R (d

B)

Intra coding Inter coding Mosais without MC Mosaic with MC

Intra coding Inter coding Sprite without MC Sprite with MC

Tsuhan Chen

EnhancementsWindow size for searching offsetsStripe motion compensationMC using a large reference frameMultiple sprites

Kids

26

27

28

29

30

31

32

33

0.2 0.3 0.4 0.5 0.6 0.7 0.8

bit rate (bpp)

PS

NR

(d

B)

1 sprite 3 sprites 5 sprites 7 sprites 9 sprites

Tsuhan Chen

Recap…Sprite prediction with MC better than Intra coding

Sprite prediction with MC is preferred for random accessBetter than Inter coding for real data

Delta weighting is the best for constructing the spriteCan be extended to higher dimensions

Lumigraph, lightfield, etc.

Tsuhan Chen

Streaming 3D

3 second

10 seconds40 seconds

Geometry/Texture

Tsuhan Chen

Texture + Geometry = 3D Object

Corner-Based

Vertex-Based

Tsuhan Chen

Why Compression?Each vertex: three floating-point numbersIf each vertex shared by 6 triangles, and max number of vertices per model is 220

? 108 bits/triangle needed

? 100KB~1MB for an average model + texture

trianglebits

IDvertexbits

triangleIDsvertex

vertexbits

trianglevertices 10820

*33*32

*3

*61 ????

?????

????

??????

???

??

????

????

????

??

Compression of 3D ObjectsTexture compression

Static textures: JPEG or JPEG 2000Dynamic textures: MPEG or H.263

Geometry compressionQuantization of vertex coordinatesPredictive codingEntropy coding

Granular/stable progressive codingMesh optimization/simplification

[Hoppe et al][Heckbert et al][Schroder et al][Taubin et al.]

Tsuhan Chen

Texture Coding

Block Diagram

Vertex Quantization

Entropy Coding

Vertex Coordinates

Prediction

Bitstream

Connectivity++

-

3D ModelTexture

Tsuhan Chen

EncodingVertex decimation

C

C

1

234

56

165

4

Re-triangulation

? ?iv

Tsuhan Chen

Importance of Vertices

(1) Volume

(2) Color

)(iv

)(ic

V1V2

Tsuhan Chen

Rank all vertices from high to low based on a cost function:

v (i ) is the geometry costc (i ) is the texture/color/normal cost? is an user-specified parameter

Decimate the vertices with low cost firstTransmit the vertices with high cost first

)()1()()( icivim ?? ???

Tsuhan Chen

Coding of TextureVertex-based

Wavelet (SPIHT) + entropy coding

Corner-basedPadding + DCT + run-length coding + entropy codingTexture re-mapping needed

Tsuhan Chen

Texture Re-Mapping

m

v

vm

(a) (b)

New Texture M ap

t’1

t’2 t’3

t’5

t’4

t1 t2 t3 t4 t5

Texture Rem apping

Retriangulate

t6

t’6

(c)

Tsuhan Chen

683651605

79266234

6750

VRML+ gzip

texture-126Vasetexture-15Ducknone1814Pieta

texture-160Totem

none4840Horsenone4136CrocodileNone1512CowNone1410Beethoven

AttributesMPEG-4OurAlgorithm

(in KBytes)

Comparison

Tsuhan Chen

View-Adaptive Transmission

Viewpoint B

Viewpoint A

Hypothetical Viewpoint

Tsuhan Chen

Retrieval of 3D ObjectsIndexing and retrieval

Much is done for images[Huang et al][Cox et al]

Recent work for 3D objectsRelated to MPEG-7

Feature extractionFeature matching

Tsuhan Chen

Feature ExtractionFeature extraction

Traditionally vertex/surface-basedNew region-based features

moment invariants, Fourier transform coefficients, etc.

Preprocessing to close the model

Surface Region

Tsuhan Chen

Feature Extraction (cont.)Efficiently calculate region-based feature directly from meshSigned feature for each mesh elementRobust to triangulationApplies to any feature that can be decomposed to each mesh element +

??

?+

+

Tsuhan Chen

3D Model Retrieval

Tsuhan Chen

Annotation and Active LearningSemantic thru annotation is needed

Low level features not enoughHierarchical annotationCompatible concepts in annotation

Active learningComplete annotation is impracticalSelect the object most uncertain forannnoation

Tsuhan Chen

Annotation

Tsuhan Chen

Active LearningFor each model, each concept, we maintain a probability of this model belonging to this conceptSet the probability to 1 or 0 if annotatedEstimate probabilities of the unlabeled objects with potential functionUse the probabilities to estimate uncertainty and to measure the semantic distance

Tsuhan Chen

Active Learning

-5 -4 -3 -2 -1 0 1 2 3 4 50.5

0.55

0.6

0.65

0.7

0.75

0.8

0.85

0.9

0.95

1

max2 d?

d

p )/exp(5.05.0 2max

20 ddcp ????

-10 -8 -6 -4 -2 0 2 4 6 8 100.5

0.55

0.6

0.65

0.7

0.75

0.8

0.85

0.9

0.95

1

f

kp

Annotated models

One annotated neighborhood Multiple annotated neighborhoods

The potential function

Tsuhan Chen

Estimate the UncertaintyThe general criterion:

fDi: local density functionUi: uncertainty measurement

. ..., ,2 ,1 ),,...,,( 21 NipppfUfG iKiiiDiiDi ??????

).1log()1(log),...,,( maxmaxmaxmax21 iiiiiKiii pppppppU ???????

Kkpp ikki ..., ,2 ,1 ),(maxmax ??

Tsuhan Chen

Estimate the Uncertainty

1p

2p

iep

Tsuhan Chen

Semantic DistanceCannot use Kullback-Leibler convergenceOur semantic disance is defined as:

Annotated models

Unannotated modes

? ?.

,eeconcept tr in the levelat is concept ,1:1maxlLowestLeve

s

jkikk

d

lkandppllLowestLeve

??

????

? ?

? ?

???

?

???

?

?

?????

?????

?

?????

????

? ????

?????

. 0 if ,)1(

0 if ,min1 ;eeconcept tr in the levelat is concept :1

;eeconcept tr in the levelat is concept

,5.0 5.0 ,:maxarg

;,...,2,1 ),1()1(

0

0

20

lLowestLeved

lLowestLevedd

lkllLowestLevelkand

porpTdlk

Kkppppd

lLowestLevesk

skks

jkiksk

k

ikjkjkiksk

?

Tsuhan Chen

Results

0

0.5

1

1.5

2

2.5

3

0 50 100 150 200 250 300 350

# of Samples Annotated

Ret

riev

al P

erfo

rman

ce (D

)

Best Gradient Search

Random Sampling

Our Algorithm

0

1

2

3

4

5

6

0 50 100 150 200 250

# of Models Annotated

Ret

riev

al P

erfo

rman

ce (D

) Best Gradient Search

Random Sampling

Our Algorithm

Synthetic database A small database

Tsuhan Chen

Results (cont.)

0

1

2

3

4

5

6

7

8

9

0 500 1000 1500 2000

# of models annotated

Ret

riev

al P

erfo

rman

ce (D

) Random Sampling

Our algorithm

Tsuhan Chen

Recap…New feature set for 3D modelsActive learning to improve annotation efficiencyCompatible concept tree for annotationProbability for both uncertainty estimation and semantic distance

“Collaboration from anywhere, through any media, as if face-to-face in one room”

Network

Immersive Environments

Tsuhan Chen

Tsuhan Chen

A PrototypeNetICE: Networked Intelligent Collaborative EnvironmentLip-sync facilitates speech understanding

Who is speaking and what is being saidConsistent spatial relationship with eye contact

Whom is spoken toFacial expressions and voice-driven hand gesturesDirectional sound give sense of distance and direction

Who is where; Who is speakingEnable small-group interaction in a room full of people

Information sharingShared whiteboardStreaming 3D objectsEnable collaborative design, e.g., cars, buildings, etc.

Tsuhan Chen

NetICE

Tsuhan Chen

NetICE

Tsuhan Chen

NetICE

Tsuhan Chen

NetICE

Tsuhan Chen

Case Study: Online Auction

Tsuhan Chen

Ongoing WorkUse IBR for background renderingUser study

Together or on-locationTracking for rendering

Head tracking for head orientationGaze tracking for eye contactHand tracking for hand gestures

Tsuhan Chen

SummaryCompression for IBRCompression for 3D meshesIndexing and retrieval of 3D objectsImmersive environments

Tsuhan Chen

Advanced Multimedia Processing Lab

Please visit us at:http://amp.ece.cmu.edu

top related