Video Content Analysis Tool:
3DVideoAnnotator
Graphical User Interface for annotating video sequences
Version 2.4
Table of Contents
1. Features
   1.1. Introduction
   1.2. Implementation
   1.3. Installing and Uninstalling
2. Graphical User Interface
   2.1. File Menu
   2.2. View Menu
   2.3. Windows Menu
      2.3.1. Player
      2.3.2. Annotator
         2.3.2.1. Shot Annotation
         2.3.2.2. Key Segment Annotation
         2.3.2.3. Event Annotation
         2.3.2.4. Object Annotation
         2.3.2.5. Human Annotation
      2.3.3. Timeline
      2.3.4. Editor
         2.3.4.1. Shot Editing
         2.3.4.2. Transition Editing
         2.3.4.3. Key Segment Editing
         2.3.4.4. Event Editing
         2.3.4.5. Static Object Editing
         2.3.4.6. Static Human Editing
         2.3.4.7. Moving Object Editing
         2.3.4.8. Moving Human Editing
         2.3.4.9. Cut Editing
         2.3.4.10. Header Editing
      2.3.5. Analyzer
         2.3.5.1. Shot Boundary Detector's Manual
         2.3.5.2. Haarcascade Frontal Face Detector's Manual
         2.3.5.3. Color+Haarcascade Frontal Face Detector's Manual
         2.3.5.4. Frontal–Profile Face Detector's Manual
         2.3.5.5. Object Detector's Manual
         2.3.5.6. Particles Tracker's Manual
         2.3.5.7. LSK Stereo Tracker's Manual
         2.3.5.8. LSK Tracker's Manual
         2.3.5.9. 3D Rules Detector's Manual
         2.3.5.10. UFO Detector's Manual
         2.3.5.11. Keyframe Selection Tool's Manual
1. Features

1.1. Introduction

3DVideoAnnotator is an application that assists users in the task of annotating video sequences and viewing the corresponding results. A video sequence can be a single-view video or a stereoscopic video consisting of two channels (left and right) and their corresponding disparity channels. Each video can be annotated with shot descriptions, key segment descriptions, event descriptions, and object and human (either static or moving) descriptions, either manually or automatically through algorithms. Users can navigate the descriptions through user-friendly modules such as timelines and a tree view representation, and edit them. The application also allows the annotated descriptions to be stored in an output AVDP/XML file and can read existing descriptions from an AVDP/XML file. Figure 1 displays the application.

Figure 1: 3DVideoAnnotator application.
1.2. Implementation
3DVideoAnnotator is a Windows Forms Application. It is coded in the C++/CLI programming language,
making use of the OPENCV library for handling videos and the XERCES library for parsing the
XML files. Libraries, which implement various video content analysis algorithms and the storage of
descriptions in AVDP/XML files, are used as well. It is a multiple-document interface (MDI)
application, where all main operations (e.g., manual annotation, navigation of video and audio
content’s descriptions) are executed through separate forms. GUI components such as buttons,
sliders and drop-down menus are used in order to provide user friendliness and ease of use.
1.3. Installing and Uninstalling
This is a stand-alone application, which means no installation is required. All of the application's required
files reside inside the root folder, which can be extracted anywhere on the user's computer. The
program requires .NET Framework 4 and Microsoft Visual C++ 2010 Redistributable Package to be
installed on the computer.
Uninstalling the application can be performed by simply erasing the root folder.
2. Graphical User Interface
All operations are executed through menus, which are described below.
2.1. File Menu
Through the File Menu, the user can open an AVI file, save/load a video content's description, etc.
The menu contains the following functions, as shown in Figure 2.

• Open Single Video – It opens an AVI file.
• Open Stereo Video → Two videos (L/R) – It opens two dialog boxes through which the user selects the left and the right channel of a stereoscopic video, respectively.
• Open Stereo Video → Four videos (L/R plus Disparity) – It opens four dialog boxes through which the user selects the left and the right channels of a stereoscopic video and the respective disparity channels.
• Open Stereo Video → One video → Left-Right – It opens an AVI file corresponding to a stereoscopic video which contains the two channels side-by-side.
• Open Stereo Video → One video → Top-Bottom – It opens an AVI file corresponding to a stereoscopic video which contains the two channels in a top-bottom manner.
• Open Stereo Video → One video → Left-Right plus Disparity – It opens an AVI file corresponding to a stereoscopic video which contains the color and disparity channels side-by-side.
• Open Stereo Video → One video → Top-Bottom plus Disparity – It opens an AVI file corresponding to a stereoscopic video which contains the color and disparity channels in a top-bottom manner.
• Read XML (AVDP) – It loads a video content's description from an AVDP/XML file. The loaded description is added to the existing description.
• Save XML (AVDP) – It saves the video annotations as an AVDP/XML file.
• Exit

Figure 2: File menu.
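For the single-file stereo formats above, both channels are packed into each decoded frame. The geometry can be sketched as follows (an illustrative sketch only; `Rect` and `channelRegion` are hypothetical names, assuming even frame dimensions, and this is not the application's actual code):

```cpp
#include <cassert>

// A rectangular region of a packed stereo frame.
struct Rect { int x, y, w, h; };

// Computes the region of a frame-packed stereo video that holds one channel.
// In a Left-Right (side-by-side) video the left channel occupies the left
// half of each frame; in a Top-Bottom video it occupies the top half.
// 'leftOrTop' selects the left (side-by-side) or top (top-bottom) channel.
Rect channelRegion(int frameW, int frameH, bool sideBySide, bool leftOrTop) {
    if (sideBySide) {
        int halfW = frameW / 2;
        return { leftOrTop ? 0 : halfW, 0, halfW, frameH };
    } else { // top-bottom packing
        int halfH = frameH / 2;
        return { 0, leftOrTop ? 0 : halfH, frameW, halfH };
    }
}
```

The same arithmetic applies to the "plus Disparity" formats, where the packed frame additionally carries the disparity channels.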
2.2. View Menu

The View Menu (Figure 3) is associated with the video viewing. Specifically:

Figure 3: View Menu.

• Play – It includes the following three playback modes (Figure 4):
   o Slow – Play the video in slow mode.
   o Normal – Play the video according to its frame rate.
   o Fast – Play the video in fast mode.

Figure 4: Playback modes.

• Zoom – It includes seven available zoom factors as shown in Figure 5. Note that it is possible for some zoom factors to be disabled, if the screen resolution is low or if the frame size is large.

Figure 5: Zooming.

• Stereo Mode – If the input video is a stereoscopic video, the user can select which of the channels will be visible in the Player Window through the modes shown in Figure 6.

Figure 6: Stereo Modes.
2.3. Windows Menu

The Windows Menu (Figure 7) is the most important one since all main operations (video playback, video content's description annotation, editing and navigation) are initiated here.

Figure 7: Windows Menu.

Each of the five windows is described next.

2.3.1. Player

The Player Window opens a video player with navigation buttons and slider, as depicted in Figure 8:

Figure 8: Player Window.
The functionality of the buttons of the video player (Figure 8) is explained below:
1. It moves to the start of the video.
2. It moves to the first frame of the previous shot.
3. It starts playback of the video.
4. It stops playback.
5. It moves to the first frame of the next shot.
6. It moves to the end of the video.
7. By dragging the slider, the user can navigate through the video.
Note that the first frame is frame number 1.
The video player also handles some mouse events. If the user presses the right button of the mouse at any position on the video player, a dropdown menu will appear through which the video player can be resized (Figure 9).

Figure 9: Zooming in the video player.

If the user double-clicks within a bounding box, then the description of the corresponding moving/static object (or human) is presented on the Editor Window. If the user double-clicks within the frame but outside the bounding boxes, the description of the corresponding shot (or transition) appears on the Editor Window. For more details see the section 2.3.4 Editor.

Finally, the bounding boxes, which are displayed on the video player, can be moved or resized by mouse-click events, as depicted in Figure 10.

Figure 10: Bounding box editing.
2.3.2. Annotator

The Annotator Window (Figure 11) gives the ability to manually annotate the video content.

Figure 11: Annotator Window.

Specifically, each video can be annotated with descriptions of shots, key segments, events, objects (moving and static) or humans (moving and static). First the user selects the type of annotation he/she wishes to perform (see Figure 11) and then proceeds with the annotation. Additionally, the user can define on which channels the annotation will be applied. Each of the annotation types is presented in detail next.
2.3.2.1. Shot Annotation

Pressing the Shot button, the user can start annotating a shot.

Figure 12: Shot Annotation.

According to Figure 12, the user is able to define the following attributes:
• Start Frame - The first frame of the shot. It is initialized to the frame number where the annotation started.
• End Frame - The last frame of the shot. It is updated to the frame number of the current frame shown.
• Characterization - The shot can be characterized with terms, such as close-up or comfortable for viewing, by selecting a characterization from the corresponding drop-down list.
• Transition - If the user wants to annotate a series of frames as being a transition, the corresponding checkbox should be checked.
• Transition Type - The type of the transition (such as cross-dissolve or fade-in) should be selected from the corresponding drop-down list, if a transition is being annotated.
• Cut Characterization in Start - The start of the shot is automatically annotated as a cut. Optionally, the cut can be characterized with characterizations such as comfortable or uncomfortable, using the corresponding drop-down list.
• Cut Characterization in End - The end of the shot is automatically annotated as a cut. Optionally, the cut can be characterized with characterizations such as comfortable or uncomfortable, using the corresponding drop-down list.
Note that all drop-down lists contain some pre-defined terms. However, it is possible for the user to define and add new terms. In order to ensure that the entire video will consist of non-overlapping shots (or transitions), any new shot annotation causes changes to the duration of the existing shots. For example, if a video with 100 frames consists of two shots (the first one starts from frame 1 and has duration 45 frames, while the second one starts from frame 46 and ends in frame 100) and a new shot (from frame 30 to frame 70) is inserted, the video will consist of the following shots: the first shot will start from frame 1 and end in frame 29, the second one will start from frame 30 and end in frame 70, and the third one will start from frame 71 and end in frame 100.
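The renumbering rule illustrated by this example can be sketched in C++. This is a simplified illustration of the stated rule only; `Shot` and `insertShot` are hypothetical names, not the application's code:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// A shot spans an inclusive frame range [start, end]; frames are 1-based.
struct Shot { int start, end; };

// Inserts a new shot and trims (or drops) overlapping existing shots so
// that the video stays partitioned into non-overlapping shots.
std::vector<Shot> insertShot(const std::vector<Shot>& shots, Shot s) {
    std::vector<Shot> out;
    for (const Shot& old : shots)
        // Keep the part of an existing shot that precedes the new one.
        if (old.start < s.start)
            out.push_back({ old.start, std::min(old.end, s.start - 1) });
    out.push_back(s);
    for (const Shot& old : shots)
        // Keep the part of an existing shot that follows the new one.
        if (old.end > s.end)
            out.push_back({ std::max(old.start, s.end + 1), old.end });
    return out;
}
```

Running this on the manual's example (shots 1–45 and 46–100, inserting 30–70) yields shots 1–29, 30–70 and 71–100, as described above.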
2.3.2.2. Key Segment Annotation

In order to annotate a frame or a series of frames as being a key frame or a key video segment respectively, the user should press the Key Segment button and simply set its duration, as depicted in Figure 13.

Figure 13: Key Segment Annotation.
2.3.2.3. Event Annotation

Pressing the Event button, the user can start annotating an event.

Figure 14: Event Annotation.

According to Figure 14, the user is able to define the following attributes:
• Start Frame - The first frame of the event. It is initialized to the frame number where the annotation started.
• End Frame - The last frame of the event. It is updated to the frame number of the current frame shown.
• Event Type - The type of the event should be defined either by selecting a pre-defined term from the drop-down list or by adding a new one.
• Text – A free text can be added, so as to describe the event by using natural language.
2.3.2.4. O
Object An
nnotation
Pressing the Object buutton, the usser can startt annotating
g an object. By object, w
we mean an
ny region onn
nd to a persson. Firstly, the user sh
hould definee the objectt appearancee
a frame whhich does noot correspon
over one orr more fram
mes. The region of a stattic object on
n a frame iss defined byy clicking-an
nd-draggingg
the mouse to create a bounding box over a video fram
me. If the user wants to annotatte a movingg
object, he/sshe must drraw the bou
unding boxxes in subseequent fram
mes. The app
pplication geenerates thee
bounding bboxes in inteermediate frames
fr
of thhe same videeo channel (e.g. the lefft one) and in the samee
12
positions inn the otherr video chaannels (e.g. the right one). Then, the boundding boxes, which aree
displayed oon the videoo, can be mo
oved or resizzed by mou
use-click events.
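The generation of bounding boxes for intermediate frames can be pictured as interpolation between two user-drawn boxes. The sketch below assumes simple linear interpolation; the manual does not specify the application's actual scheme, and `Box` and `interpolate` are hypothetical names:

```cpp
#include <cassert>

// An axis-aligned bounding box drawn on a video frame.
struct Box { int x, y, w, h; };

// Linearly interpolates a bounding box for frame f between two user-drawn
// key boxes a (at frame f0) and b (at frame f1), with f0 < f <= f1.
Box interpolate(const Box& a, int f0, const Box& b, int f1, int f) {
    double t = double(f - f0) / double(f1 - f0);
    // Round each interpolated coordinate to the nearest integer pixel.
    auto lerp = [t](int u, int v) { return int(u + t * (v - u) + 0.5); };
    return { lerp(a.x, b.x), lerp(a.y, b.y), lerp(a.w, b.w), lerp(a.h, b.h) };
}
```

For example, halfway between a box at (0, 0) and one at (100, 50) of the same size, the interpolated box sits at (50, 25).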
As seen in Figure 15, the user is able to define the following attributes for a static object:
• Object Class - The object class, such as chair or car, is defined by selecting a term from the corresponding drop-down list.
• Object Type - The specific type of the object, for example an office chair.
• Orientation - The orientation of the object, such as oriented left.
• Position - The position of the object, such as left.
• Size - The size of the object, such as small.
Note that the four last attributes are optional. Also, all drop-down lists contain some pre-defined terms, while it is possible for the user to define and add new terms.

Figure 15: Static Object Annotation.
If a moving object is being annotated, according to Figure 16, the user is able to define the following attributes:
• Object Class - The object class, such as ball, is defined by selecting a term from the corresponding drop-down list.
• Object Type - The specific type of the object, for example a football ball.
• Movement Direction - The movement direction of the object, such as left.
• Movement Speed - The movement speed of the object, such as fast.
• Sub-Movement - In case different movements occur within the same object appearance, i.e. within a consecutive set of frames where the object appears, the user can define their durations and specific characteristics.
Note that only the Object Class is obligatory to be set.

Figure 16: Moving Object Annotation.
2.3.2.5. Human Annotation

By pressing the Human button, the user can start annotating a human, i.e. any region on a frame which corresponds to a person, such as a body or a face. Firstly, the user should define the human appearance over one or more frames. The region of a static human on a frame is defined by clicking-and-dragging the mouse to create a bounding box over a video frame. If the user wants to annotate a moving human, he/she must draw the bounding boxes in subsequent frames. The application generates the bounding boxes in intermediate frames of the same video channel (e.g. the left one) and in the same positions in the other video channels (e.g. the right one). Then, the bounding boxes, which are displayed on the video, can be moved or resized by mouse-click events.
As seen in Figure 17, the user is able to define the following attributes for a static human:
• Body Part - The body part of the human actor enclosed in the bounding box is defined by selecting a term from the corresponding drop-down list.
• Name - The name of the human. This can refer to either an actual name (e.g., Bogart) or a symbolic name (e.g., person_1).
• Activity - The activity of the human, such as walk.
• Expression - The facial expression of the human, such as anger.
• Orientation - The orientation of the human, such as oriented left.
• Position - The position of the human, such as left.
• Size - The size of the human, such as small.
Note that the six last attributes are optional. All drop-down lists contain some pre-defined terms, while it is possible for the user to define and add new terms.

Figure 17: Static Human Annotation.
If a moving human is being annotated, then according to Figure 18 the user can define the following attributes:
• Body Part - The body part of the human actor enclosed in the bounding box is defined by selecting a term from the corresponding drop-down list.
• Name - The name of the human. This can refer to either an actual name (e.g., Bogart) or a symbolic name (e.g., person_1).
• Activity - The activity of the human, such as walk.
• Expression – The facial expression of the human, such as anger.
• Movement Direction - The movement direction of the human, such as left.
• Movement Speed - The movement speed of the human, such as fast.
• Sub-Activity - In case different activities occur within the same human appearance, the user can define their durations and specific activities.
• Sub-Expression - In case different expressions occur within the same human appearance, the user can define their durations and specific expressions.
• Sub-Movement - In case different movements occur within the same human appearance, the user can define their durations and specific movements.
Note that only the Body Part is obligatory to be set.

Figure 18: Moving Human Annotation.
2.3.3. Timeline

The Timeline Window (Figure 19) provides a user-friendly way to view time-related parts of the video and audio content's description. Specifically, the user can navigate the descriptions of the shots, transitions and cuts (Figure 19), key frames and key video segments (Figure 19), events (Figure 20), static objects (Figure 21) and humans (Figure 22), moving objects (Figure 23) and humans (Figure 24) and audio sources. The descriptions are represented by colored areas on the Timeline. The length of each area shows the duration of the corresponding shot, event, etc.

The descriptions are organized based on semantic information. Thus, the events are grouped based on the type of the event (Figure 20), while the static and moving objects (or humans) are grouped based on their object type and name respectively (Figures 21-24), if this information is available. Otherwise, the unique ID is used. Finally, the audio sources are grouped based on the type of the source.
The user can navigate the descriptions per channel by selecting the channel he/she wishes to inspect from the corresponding drop-down list. When the mouse hovers over an area of the Timeline (e.g. on a shot or a moving period appearance), the id of the corresponding entity appears (Figure 19). Additionally, if the user clicks on an area of the Timeline, the corresponding description appears on the Editor Window (see the section 2.3.4 Editor), while the first frame of the description is displayed on the video player. Also, in the case of static or moving objects and humans, the overlapping bounding box on the frame is shaded. Finally, the Timeline Window can be resized, in order for timelines to be shown in more detail.
Figure 19: Shots and Key Segments on the Timeline Window.

Figure 20: Events on the Timeline Window.

Figure 21: Static Objects on the Timeline Window.

Figure 22: Static Humans on the Timeline Window.

Figure 23: Moving Objects on the Timeline Window.

Figure 24: Moving Humans on the Timeline Window.
2.3.4. Editor

The Editor Window (Figure 25) provides an alternative way to navigate and edit the video content's description. The left part of the window displays the description in a hierarchical tree view form, making it structured and easy to use. The description is initially divided into header and channels, while each channel consists of two groups which contain the shots, transitions and cuts, respectively. Each of them contains in groups the key segments (key frames and key video segments), the events, the static and moving objects and humans, as depicted in Figure 25. Also, if audio description is available, an extra node is added to the left part, which contains nodes for viewing the audio sources. Each part of the description can be expanded or collapsed. When these nodes are double-clicked, the first frame of the corresponding description is displayed on the video player, while in the case of the static and moving objects and humans, the corresponding bounding box is shaded. Also, if their appearance spans more than one shot (or transition), they are not saved in an AVDP/XML file and the node's label is underlined. Through the right part of the window the user can see and edit the description for each element of the above groups. Each group is presented in detail next.

Figure 25: Editor Window.
2.3.4.1. Shot Editing

By left clicking on a node which represents a shot, the right part of the window displays the shot's description. So, according to Figure 26, the user can see and edit the following attributes:
• ID - The unique id of the shot. The value cannot be changed.
• Start Frame - The first frame of the shot.
• End Frame - The last frame of the shot.
• Characterization - The shot can be characterized with terms, such as close-up or comfortable for viewing, by selecting a characterization from the corresponding drop-down list. New terms can be added.
• Spatial Spread of Objects - The spread of many static objects on a frame, that is characterized with terms such as spread or concentrated. Adding and deleting description terms is possible through the corresponding buttons. The changes can be applied to all the channels in the respective frames. Also, for each term a confidence level and a text to save extra information about the term can be stored by double-clicking on the term, as shown in Figure 27.
Note that any change to the start and/or the end frame of the shot causes changes to the duration of the other shots/transitions, as described in Section 2.3.2.1. Shot Annotation.

Figure 26: Shot Editing.
Figure 27: Spread Editing.
2.3.4.2. Transition Editing

By left clicking on a node which represents a transition, the right part of the window displays the transition's description.

Figure 28: Transition Editing.

According to Figure 28, the user can see and edit the following attributes:
• ID - The unique id of the transition. The value cannot be changed.
• Start Frame - The first frame of the transition.
• End Frame - The last frame of the transition.
• Transition Type - The type of the transition (such as cross-dissolve or fade-in). The value can be changed by selecting a new term from the corresponding drop-down list. New terms can be added.
• Characterization - The transition can be characterized with terms, such as comfortable for viewing, by selecting a characterization from the corresponding drop-down list. New terms can be added.
• Spatial Spread of Objects - The spread of many static objects on a frame, that is characterized with terms such as spread or concentrated. Adding and deleting description terms is possible through the corresponding buttons. The changes can be applied to all the channels in the respective frames. Also, for each term a confidence level and a text to save extra information about the term can be stored by double-clicking on the term, as shown in Figure 27.
Note that any change to the start and/or the end frame of the transition causes changes to the duration
of the other shots/transitions, as described in Section 2.3.2.1. Shot Annotation.
2.3.4.3. Key Segment Editing
By left clicking on a node which represents a key segment, the right part of the window displays the
key segment’s description.
According to Figure 29, the user can see and edit the following attributes:

• ID - The unique id of the key segment. The value cannot be changed.
• Start Frame - The first frame of the key segment.
• End Frame - The last frame of the key segment.
Note that any change can be applied to all the channels by checking the corresponding box, only if
descriptions of the key segment exist in other channels.
Figure 29: Key Segment Editing.

By right clicking on a node which represents a key segment, a dropdown menu will appear (Figure 30) through which the user can:
• Delete the description of the key segment.
• Delete the descriptions of the key segment from all the channels.
• Go to the description of the key segment in another channel.
• Copy the description of the key segment to another channel.
• Set as a description for a specific channel, an existing description of a key segment.

Figure 30: Right-clicking on a key segment node.
2.3.4.4. Event Editing

By left clicking on a node which represents an event, the right part of the window displays the event's description. So, according to Figure 31, the user can see and edit the following attributes:
• ID - The unique id of the event. The value cannot be changed.
• Start Frame - The first frame of the event.
• End Frame - The last frame of the event.
• Event Type - The type of the event. The value can be changed by selecting a new term from the corresponding drop-down list. New terms can be added.
• Text – A description of the event by using free text.
Note that any change can be applied to all the channels by checking the corresponding box, only if descriptions of the event exist in other channels.

Figure 31: Event Editing.

By right clicking on a node which represents an event, a dropdown menu will appear (Figure 32) through which the user can:

Figure 32: Right-clicking on an event node.

• Delete the description of the event.
• Delete the descriptions of the event from all the channels.
• Go to the description of the event in another channel.
• Copy the description of the event to another channel.
• Set as a description for a specific channel, an existing description of an event.
2.3.4.5. S
Static Object Editin
ng
By left clicking on a node
n
which represents
r
a static objeect (i.e. an object
o
whosee appearancce is markedd
on a single frame), thee right part of
o the windoow displayss the static object’s
o
desccription.
Figure 333: Static Objecct Editing.
27
According to Figure 333, the user can
c see and edit the folllowing attriibutes:

ID - The uniquue id of the static
s
objectt. The valuee cannot be changed.

Fraame - The frrame in whiich the statiic object app
pears. The value
v
cannoot be changeed.

Objject Class - The objectt class, suchh as chair or
o car, is deffined by sellecting a terrm from thee
corrresponding drop-down list. New teerms can bee added.

Objject Type - The speciffic type of tthe static ob
bject, for ex
xample an ooffice chair.. New termss
can be added.

Oriientation - The
T orientation descripption of the static objecct, e.g., oriennted left.

Possition - The position deescription off the static object,
o
e.g., left.

Sizee - The size description
n of the statiic object, e.g., small.

Sizee of Field - The size-off-field descrription of th
he static object, e.g., cloose-up.
For the foour last atttributes ad
dding and deleting description
d
terms is possible through
t
thee
corresponding buttonss. Also, for each term
m a confid
dence level and a texxt to save some extraa
informationn about the term
t
can bee stored by ddouble-click
king on the term, as sho
hown in Figu
ure 27.
Note that aany change can be applied to all tthe channelss by checkiing the corrresponding box, only if
descriptions of the stattic object ex
xist in otherr channels.
By right clicking on a node which represents a static object, a dropdown menu will appear (Figure 34) through which the user can:
Figure 34: Right-clicking on a static object node.

Delete the description of the static object.

Delete the descriptions of the static object from all the channels.

Go to the description of the static object in another channel.

Copy the description of the static object to another channel.

Set as a description for a specific channel, an existing description of a static object.
2.3.4.6. Static Human Editing
By left clicking on a node which represents a static human (i.e. a human whose appearance is marked
on a single frame), the right part of the window displays the static human’s description. So,
according to Figure 35, the user can see and edit the following attributes:

ID - The unique id of the static human. The value cannot be changed.

Frame - The frame in which the static human appears. The value cannot be changed.

Body Part - The body part of the human actor enclosed in the bounding box is defined by
selecting a term from the corresponding drop-down list. New terms can be added.

Name - The name of the human. This can refer to either an actual name (e.g., Bogart) or a
symbolic name (e.g., person_1).

Activity - The activity (e.g. walk) of the static human. The value can be changed by selecting
a new term from the corresponding drop-down list. New terms can be added.

Expression – The facial expression (e.g. anger) of the static human. The value can be
changed by selecting a new term from the corresponding drop-down list. New terms can be
added.

Orientation - The orientation description of the static human, e.g., oriented left.

Position - The position description of the static human, e.g., left.

Size - The size description of the static human, e.g., small.

Size of Field - The size-of-field description of the static human, e.g., close-up.
For the four last attributes adding and deleting description terms is possible through the
corresponding buttons. Also, for each term a confidence level and a text to save some extra
information about the term can be stored by double-clicking on the term, as shown in Figure 27.
Note that any change can be applied to all the channels by checking the corresponding box, only if
descriptions of the static human exist in other channels.
Figure 35: Static Human Editing.
By right clicking on a node which represents a static human, a dropdown menu will appear (Figure 36) through which the user can:
Figure 36: Right-clicking on a static human node.

Delete the description of the static human.

Delete the descriptions of the static human from all the channels.

Go to the description of the static human in another channel.

Copy the description of the static human to another channel.

Set as a description for a specific channel, an existing description of a static human.
2.3.4.7. Moving Object Editing
By left clicking on a node which represents a moving object (namely a series of bounding boxes,
over a number of consecutive frames that depict an object that moves over time), the right part of the
window displays the moving object’s description. So, according to Figure 37, the user can see and
edit the following attributes:

ID - The unique id of the moving object. The value cannot be changed.

Start Frame - The start frame in which the moving object appears. The value cannot be
changed.

End Frame - The end frame in which the moving object appears. The value cannot be
changed.

Object Class - The object class, such as chair or car, is defined by selecting a term from the
corresponding drop-down list. New terms can be added.

Object Type - The specific type of the moving object, for example an office chair. New
terms can be added.

Movement - The movement of the moving object.

Position - The position description of the moving object, e.g., left.

Size - The size description of the moving object, e.g., small.

Size of Field - The size-of-field description of the moving object, e.g., close-up.

Sub-Movement - In case different movements occur within the same object appearance, the
user can see and edit their durations and specific movements.

Related Movement – The movement between this moving object and another moving object
or human.
For the six last attributes adding and deleting description terms is possible through the corresponding
buttons. Also, for each term a confidence level and a text to save some extra information about the
term can be stored by double-clicking on the term, as shown in Figure 27.
Figure 37: Moving Object Editing.
Note that any change can be applied to all the channels by checking the corresponding box, only if descriptions of the moving object exist in other channels.
Figure 38: Right-clicking on a moving object node.
By right clicking on a node which represents a moving object, a dropdown menu will appear (Figure
38) through which the user can:

Merge two moving objects, i.e. two sets of bounding boxes (object trajectories). The two
moving objects must have the same Object Class, have the same Object Type if such
information is specified, and appear in the same channels.

Split the moving object into two moving objects.

Delete the description of the moving object.

Delete the descriptions of the moving object from all the channels.

Go to the description of the moving object in another channel.

Copy the description of the moving object to another channel.

Set as a description for a specific channel, an existing description of a moving object.
2.3.4.8. Moving Human Editing
By left clicking on a node which represents a moving human (namely a series of bounding boxes,
over a number of consecutive frames that depict a human that moves over time), the right part of the
window displays the moving human’s description. So, according to Figure 39, the user can see and
edit the following attributes:

ID - The unique id of the moving human. The value cannot be changed.

Start Frame - The start frame in which the moving human appears. The value cannot be
changed.

End Frame - The end frame in which the moving human appears. The value cannot be
changed.

Body Part - The body part of the human actor enclosed in the bounding box is defined by
selecting a term from the corresponding drop-down list. New terms can be added.

Name - The name of the human. This can refer to either an actual name (e.g., Bogart) or a
symbolic name (e.g., person_1).

Activity - The activity (e.g. walk) of the moving human. The value can be changed by selecting
a new term from the corresponding drop-down list. New terms can be added.

Expression - The facial expression (e.g. anger) of the moving human. The value can be
changed by selecting a new term from the corresponding drop-down list. New terms can be
added.

Movement - The movement of the moving human.

Position - The position description of the moving human, e.g., left.

Size - The size description of the moving human, e.g., small.

Size of Field - The size-of-field description of the moving human, e.g., close-up.

Sub-Activity - In case different activities occur within the same human appearance, the user
can see and edit their durations and specific activities.

Sub-Expression - In case different expressions occur within the same human appearance, the
user can see and edit their durations and specific expressions.

Sub-Movement - In case different movements occur within the same human appearance, the
user can see and edit their durations and specific movements.

Related Movement – The movement between this moving human and another moving object
or human.
For the eight last attributes adding and deleting description terms is possible through the
corresponding buttons. Also, for each term a confidence level and a text to save some extra
information about the term can be stored by double-clicking on the term, as shown in Figure 27.
Note that any change can be applied to all the channels by checking the corresponding box, only if
descriptions of the moving human exist in other channels.
By right clicking on a node which represents a moving human, a dropdown menu will appear (Figure
40) through which the user can:

Merge two moving humans, i.e. two sets of bounding boxes (human trajectories). The two
moving humans must have the same Body Part, have the same Name if such information is
specified, and appear in the same channels.

Split the moving human into two moving humans.

Delete the description of the moving human.

Delete the descriptions of the moving human from all the channels.

Go to the description of the moving human in another channel.

Copy the description of the moving human to another channel.

Set as a description for a specific channel, an existing description of a moving human.
Figure 39: Moving Human Editing.
Figure 40: Right-clicking on a moving human node.
2.3.4.9. Cut Editing
By left clicking on a node which represents a cut, the right part of the window displays the cut's description. So, according to Figure 41, the user can see and edit the following attributes:

ID - The unique id of the cut. The value cannot be changed.

Start Frame - The first frame of the cut. The value cannot be changed.

End Frame - The last frame of the cut. The value cannot be changed.

Characterization - The cut can be characterized with terms, such as comfortable or uncomfortable for viewing, by selecting a characterization from the corresponding drop-down list. New terms can be added.
Figure 41: Cut Editing.
2.3.4.10. Header Editing
By left clicking on the node which is labeled "Header", the right part of the window displays the "header" general information for the video, such as the location of the video, the compression, etc. So, according to Figure 42, the user can see and edit attributes regarding:

The location and time of video production.

The rights of the video content.

The role and name of persons affiliated with the production ("Person" tabpage).

Various parameters regarding
o production ("Production Parameters" page)
o video technical specifications ("Video Technical Information" page)
o audio technical specifications ("Audio Technical Information" page)
o specification of the subtitles ("Subtitles" page)
o viewing conditions ("Monitor" page).
Figure 42: Header Editing.
2.3.5. Analyzer
The Analyzer Window (Figure 43) enables the user to execute various video analysis algorithms, such as face/body/object detection and tracking, shot detection, etc., by selecting a number of algorithms, defining their execution sequence and the video segments where the algorithms will be applied, and setting which algorithms will be called in parallel and which ones are executed sequentially. When a group of algorithms is called in parallel, all the algorithms of the group are sequentially executed for each frame of the selected video segment. When algorithms are executed sequentially, an algorithm finishes processing all frames of the segment and then the next one is executed.
Figure 43: Analyzer Window.
A description of the various information areas and buttons of the Analyzer Window (Figure 43) is
provided below:
1. Depicts the available algorithms. They are organized based on categories such as detectors,
trackers, etc.
2. Depicts the selected algorithms that will be executed on the selected video segment (Batch
Processing list). The user can define their call sequence, delete them and set groups of
algorithms which are called in parallel, through the corresponding buttons. See below.
3. It adds the currently selected algorithm from the Algorithms list to the Batch Processing list.
An algorithm can be also added by double-clicking on it on the Algorithms list.
4. It deletes the currently selected algorithm from the Batch Processing list.
5. It deletes all the algorithms from the Batch Processing list.
6. It moves the currently selected algorithm up one slot.
7. It moves the currently selected algorithm down one slot.
8. It sets the currently selected algorithms to be executed in parallel.
9. It sets the first frame of the video segment where the algorithms will be applied.
10. It sets the last frame of the video segment where the algorithms will be applied.
11. It starts execution of the algorithms.
12. It stops execution of the algorithms.
13. It shows the progress of the batch processing.
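The two execution orders described above (sequential algorithms vs. a parallel group) can be sketched as follows. The function names are illustrative, not part of the tool's internal API; the point is only the loop nesting order.

```python
def run_sequential(algorithms, frames):
    """Sequential mode: each algorithm finishes all frames of the
    segment before the next algorithm starts."""
    log = []
    for algo in algorithms:
        for frame in frames:
            log.append((algo, frame))
    return log

def run_parallel_group(algorithms, frames):
    """Parallel group: for each frame, all algorithms of the group
    are executed in turn before moving to the next frame."""
    log = []
    for frame in frames:
        for algo in algorithms:
            log.append((algo, frame))
    return log
```

The only difference is which loop is outermost: the algorithm loop (sequential) or the frame loop (parallel group).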
The complete list containing the available algorithms is the following:

A shot cut detector

A key frame selector

Three face detectors

A tracker based on Particle filters

A general object detector based on Local Steering Kernels (available only in the 32bit
version)

An object tracker based on Local Steering Kernels (available only in the 32bit version)

An object tracker based on Local Steering Kernels (stereo version) (available only in the
32bit version)

Three size-of-field characterization algorithms

Two 3D quality defects detection algorithms (available only in the 32bit version)
In the following sections the corresponding manuals are given.
2.3.5.1. Shot Boundary Detector's Manual
2.3.5.1.1. Introduction
The Shot Boundary Detector is a software tool that provides users with the options to detect the shots in 3D videos. The algorithm uses Mutual Information for the detection of shots.
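As a rough illustration of the Mutual Information criterion (the detector's exact formulation is not documented here), the MI of two equally sized greyscale frames can be computed from their joint intensity histogram; a shot cut is then suggested wherever the MI between consecutive frames drops sharply.

```python
import math
from collections import Counter

def mutual_information(frame_a, frame_b):
    """Mutual information of two equally sized greyscale frames, computed
    from their joint intensity histogram.  High MI means similar content;
    a sharp drop between consecutive frames suggests a shot boundary."""
    n = len(frame_a)
    joint = Counter(zip(frame_a, frame_b))
    pa, pb = Counter(frame_a), Counter(frame_b)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((pa[a] / n) * (pb[b] / n)))
    return mi
```

Identical frames yield the frame's own entropy, while statistically independent frames yield an MI of zero.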
2.3.5.1.2. The parameter input menu
A parameter input menu (see Figure 44) appears when the user starts the tool from the Video Content Analysis Tool's analyzer window.
Figure 44: The parameter input menu
a. Option to apply shot detection to both channels.
b. Option to apply shot detection to either of the two video or disparity channels (left and/or right).
c. Option to apply shot detection to just one channel, either of the two video or disparity maps (left or right), and transfer results to the other channel.
2.3.5.2. Haarcascade frontal face detector manual
2.3.5.2.1. Introduction
The Frontal Face Detector is an algorithm that provides users with the means to detect images of frontal faces from 2D/3D videos. The algorithm uses Haar-like features in order to detect the frontal faces.
2.3.5.2.2. The parameter input menu
A parameter input menu (see Figure 45) appears when the user starts the tool from the Video Content Analysis Tool's analyzer window.
Figure 45: The parameter input menu
At the parameter input menu, the user can select:
a. The path to the xml file that contains the specifications for the Haar-like features.
b. The frequency of the face detection, that is, how frequently (every how many frames) the Face Detector will be used (in the in-between frames faces are derived from the tracker used).
c. A face-to-body option that provides the possibility to return a ROI that contains also the body below the face that has been detected.
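The detection-frequency parameter (b) can be illustrated with a small scheduling sketch: the detector runs every `detect_every` frames and the tracker fills the in-between frames. This is a hypothetical helper, not the tool's code.

```python
def plan_frames(num_frames, detect_every):
    """For each frame index, decide whether the face detector runs
    or the tracker carries the face forward from the last detection."""
    return ["detect" if i % detect_every == 0 else "track"
            for i in range(num_frames)]
```

For example, with a frequency of 3 the detector fires on frames 0, 3, 6, ... and the tracker handles the rest.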
2.3.5.3. Color+Haaarcascadee Frontal F
Face Deteector’s Manual
M
2.3.5.3.1. IIntroductioon
The Frontal Face Deteector is an algorihm
a
toool that provides users with
w the me ans to detecct images of
frontal facees from 2D
D/3D video
os. The algoorithm uses skin colo
or in combiination witth Haar-likee
features in order to calculate thee frontal faaces. The sk
kin color taakes param
meter in the HSV colorr
space.
2.3.5.3.2. T
The param
meter inputt menu
A parameteer input mennu (see Figuure 46) appeaars when th
he user starts the tool frrom the Vid
deo Content
Analysis Toool’s analyzzer window
w.
41
Figure 46: T
The parameteer input menu
u
a. The path to the xml file that contains the specifications for the Haar-like features.
b. The minimum and maximum values for the Hue channel of the video being used. The range of the values (0-180) is shown on the GUI.
c. The minimum and maximum values for the Saturation channel of the video being used. The range of the values (0-255) is shown on the GUI.
d. The minimum value for the Value channel of the video being used. The range of the values (0-255) is shown on the GUI.
e. The frequency of the face detection, that is, how frequently (every how many frames) the Face Detector will be used (in the in-between frames faces are derived from the tracker used).
f. A face-to-body option that provides the possibility to return a ROI that contains also the body below the face that has been detected.
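The Hue/Saturation/Value thresholds above amount to a per-pixel range test. The sketch below uses illustrative default ranges, not the tool's actual values; Hue is on the 0-180 scale and Saturation/Value on 0-255, as stated in the list.

```python
def is_skin(h, s, v, h_range=(0, 25), s_range=(40, 255), v_min=60):
    """Range test for a skin-colored HSV pixel.  The default ranges are
    placeholder values for illustration; the real thresholds come from
    the parameter input menu (items b-d above)."""
    return (h_range[0] <= h <= h_range[1]
            and s_range[0] <= s <= s_range[1]
            and v >= v_min)
```

Pixels passing this test form the skin mask on which the Haar-like feature search is then restricted.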
2.3.5.4. Frontal–Profile Face Detector's Manual
2.3.5.4.1. Introduction
The Frontal–Profile Face Detector is a software tool that provides users with the means to detect images of frontal and profile faces from 2D/3D videos. The algorithm uses skin color in combination with Haar-like features in order to detect the faces. The skin color is parameterized in the HSV color space.
2.3.5.4.2. The parameter input menu
A parameter input menu (see Figure 47) appears when the user starts the tool from the Video Content Analysis Tool's analyzer window.
Figure 47: The parameter input menu
a. The path to the xml file that contains the specifications for the Haar-like features for the frontal facial image detection.
b. The path to the xml file that contains the specifications for the Haar-like features for the profile facial image detection.
c. The minimum and maximum values for the Hue channel of the video being used. The range of the values (0-180) is shown on the GUI.
d. The minimum and maximum values for the Saturation channel of the video being used. The range of the values (0-255) is shown on the GUI.
e. The minimum value for the Value channel of the video being used. The range of the values (0-255) is shown on the GUI.
f. The frequency of the face detection, that is, how frequently (every how many frames) the Face Detector will be used (in the in-between frames faces are derived from the tracker used).
g. A face-to-body option that provides the possibility to return a ROI that contains also the body below the face that has been detected.
2.3.5.5. Object Detector's Manual
2.3.5.5.1. Introduction
The Object Detector is a software tool that provides users with the means to detect specified objects in 2D/3D videos. The algorithm uses Local Steering Kernels (LSKs) for the detection.
2.3.5.5.2. The parameter input menu
A parameter input menu (see Figure 48) appears when the user starts the tool from the Video Content Analysis Tool's analyzer window.
Figure 48: The parameter input menu
a. The path to the image file of the object to be searched for.
b. The image width to downscale the query image.
c. The image height to downscale the query image.
d. The image width to downscale the width of the video where the search is applied.
e. The image height to downscale the height of the video where the search is applied.
f. The window size of the LSK.
g. The step for the search (how thorough the search will be).
h. An overall threshold that is used to specify the existence of the object inside the frame.
i. A threshold that is used to specify the potential existence of more than one object inside the frame.
j. The frequency of the object detection, that is, how frequently (every how many frames) the Object Detector will be used (in the in-between frames objects are derived from the tracker used).
2.3.5.6. Particles Tracker's Manual
2.3.5.6.1. Introduction
The Particles Tracker is a software tool that provides users with the means to track an object in 2D/3D videos. The algorithm uses particle filters to track the object.
2.3.5.6.2. The parameter input menu
A parameter input menu (see Figure 49) appears when the user starts the tool from the Video Content Analysis Tool's analyzer window.
Figure 49: The parameter input menu
a. The number of particle filters that are going to be used.
b. The width of the downscaled image (template width).
c. The height of the downscaled image (template height).
2.3.5.7. LSK Stereo Tracker's Manual
2.3.5.7.1. Introduction
The LSK Stereo Tracker is a software tool that provides users with the means to track an object in 2D/3D videos. The algorithm uses Local Steering Kernels to track the object.
2.3.5.7.2. The parameter input menu
A parameter input menu (see Figure 50) appears when the user starts the tool from the Video Content Analysis Tool's analyzer window.
Figure 50: The parameter input menu
a. The image width to downscale the tracked object.
b. The image height to downscale the tracked object.
c. This option determines the size of the search region.
d. This option determines the window size of the LSK.
e. Weight of the similarity with the object appearance in the first frame.
f. The scaling factor for the tracked image for the downscaled version of the tracked object.
g. The rotation factor (in degrees) for the tracked image for the rotated version of the tracked object.
h. The value for the zero disparity (on the screen, neither in front of nor behind the screen).
2.3.5.8. LSK Tracker's Manual
2.3.5.8.1. Introduction
The LSK Tracker is a software tool that provides users with the means to track an object in 2D/3D videos. The algorithm uses Local Steering Kernels to track the object.
2.3.5.8.2. The parameter input menu
A parameter input menu (see Figure 51) appears when the user starts the tool from the Video Content Analysis Tool's analyzer window.
Figure 51: The parameter input menu
a. The image width to downscale the tracked object.
b. The image height to downscale the tracked object.
c. This option determines the size of the search region.
d. This option determines the window size of the LSK.
e. Weight of the similarity with the object appearance in the first frame.
f. The scaling factor for the tracked image for the downscaled version of the tracked object.
g. The rotation factor (in degrees) for the tracked image for the rotated version of the tracked object.
2.3.5.9. 3D Rules Detector's Manual
2.3.5.9.1. Introduction
The 3D Rules Detector is a software tool that provides users with the options to check 3D videos with disparity maps for violations of the 3D rules.
2.3.5.9.2. The parameter input menu
A parameter input menu (see Figure 52) appears when the user starts the tool from the Video Content Analysis Tool's analyzer window.
Figure 52: The parameter input menu
a. The algorithms on which the test can be run are displayed as check buttons.
b. The options for the algorithm include:
I. Stereoscopic Window Violations,
II. Bent Window Effects and
III. Depth Jump Cuts
c. An extra option for marking the results in the video channels (normally the results are marked in the disparity maps only).
2.3.5.10. UFO Detector's Manual
2.3.5.10.1. Introduction
The UFO Detector is a software tool that provides users with the options to check 3D videos with disparity maps for objects improperly displayed inside the theatre space (known as UFOs).
2.3.5.10.2. The parameter input menu
A parameter input menu (see Figure 53) appears when the user starts the tool from the Video Content Analysis Tool's analyzer window.
Figure 53: The parameter input menu
a. Option to apply the algorithm to both channels.
b. Option to apply the algorithm to either of the two channels (left or right) or even both.
c. Options to apply the algorithm to either of the two channels (left or right) and transfer the results to the other channel.
2.3.5.11. Keyframe Selection Tool's Manual
2.3.5.11.1. Introduction
The keyframe selector tool is a software tool that gives users the means to compute, visualize and manipulate keyframes of 2D/3D video shots. The tool is depicted in Figure 54.
Figure 54: The keyframe selector GUI
At the time of this writing, three algorithm implementations are available, which can be selected from the parameter input menu. All three of them at their core need to compute distances between frames. For the first two algorithms, the distance between two frames is the sum of all their corresponding (having the same coordinates) pixel distances. Pixel distances can be computed by two methods in this library:
a. Distance of the averages of pixels: initially an average value based on the RGB values of the pixel is computed (in essence the pixel is simply transformed to greyscale). The distance of two pixels is the distance of their average values.
b. Euclidean distance of pixels: the distance of two pixels is computed as a Euclidean distance (the square root of the sum of the squared per-channel differences). This type of distance is a bit more precise but slower than the first one.
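The two pixel-distance methods can be sketched directly, with pixels as RGB tuples (function names are illustrative):

```python
import math

def average_distance(p, q):
    """Method (a): compare the greyscale averages of two RGB pixels."""
    return abs(sum(p) / 3 - sum(q) / 3)

def euclidean_distance(p, q):
    """Method (b): Euclidean distance of two RGB pixels
    (square root of the sum of the squared per-channel differences)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
```

Method (a) needs one subtraction per pixel pair after the averaging, which is why it is the cheaper of the two.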
The algorithms in the input parameter menu are the following:
a. Simple Distances of Frames: This algorithm initially computes the distance for each shot frame pair (that is, for frame pairs 1-2, 1-3, …, 2-3, …) where the distance between two frames is defined as the sum of their corresponding (having the same coordinates) pixel distances, as mentioned above. After all distances among shot frames are computed, the keyframe can be derived as the one that has the smallest sum of frame distances, meaning that it is the one closest to most other shot frames.
b. Distances from Average Frame: This algorithm computes an "average" shot frame, which in essence is a frame whose pixels hold the average value of all the shot frames' corresponding pixels. The keyframe will then be the one that has the least distance from the average frame. Frame distances are also computed based on pixel distances here, as in the first algorithm. This is by far the fastest algorithm of the two but slightly less accurate.
c. Distances of Frame Histograms: This algorithm follows a similar process to KFSelectorAllDistances to produce its keyframes, with the only difference being that the distance between two frames in this algorithm is not the sum of their corresponding pixel distances, but the distance of their histograms. This algorithm is the most context sensitive of the three, and can yield drastically different results from the first two. For histogram distances, the metrics provided by OpenCV are used as is: Correlation, Chi-Square, Intersection, Bhattacharyya.
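The first algorithm ("Simple Distances of Frames") reduces to picking the frame with the smallest sum of distances to all other shot frames. A minimal sketch, with frames as flat pixel lists and the pixel distance passed in as a function (names are illustrative, not the tool's API):

```python
def frame_distance(f1, f2, pixel_distance):
    """Sum of corresponding (same-coordinate) pixel distances."""
    return sum(pixel_distance(p, q) for p, q in zip(f1, f2))

def select_keyframe(frames, pixel_distance):
    """Index of the frame with the smallest sum of frame distances,
    i.e. the one closest to most other shot frames."""
    totals = [sum(frame_distance(f, g, pixel_distance) for g in frames)
              for f in frames]
    return totals.index(min(totals))
```

With toy greyscale frames [0, 0], [10, 10], [100, 100] and absolute difference as the pixel distance, the middle frame has the smallest aggregate distance and is selected.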
2.3.5.11.2. The parameter input menu
A parameter input menu (see Figure 55) appears when the user starts the tool from the Video Content Analysis Tool's analyzer window.
Figure 55: The parameter input menu
At the parameter input menu, the user can select:
a. The channel on which the keyframe selection will be based (for a 3D video) and whether the resulting keyframes should be stored to all channels in the Video Content Analysis Tool.
b. The algorithm that will be used for keyframe computation (the available algorithms are described above).
c. The algorithm-dependent distance type the algorithm will use to compute frame distances.
d. Whether the GUI should appear after the computation of the keyframes or not. If this is not checked, the results will be immediately stored in the "Video Content Analysis Tool", without any changes, and neighbouring frames will be stored along with the keyframe, forming a key-segment of the shot instead. The key-segment will consist of 21 frames, with the keyframe in the middle of the segment.
2.3.5.11.3. Result representation and manipulation
Figure 56: The keyframe selector GUI after computation.
If the user has checked the option marked as "d" in Figure 55, the GUI will appear after the keyframe computations (see Figure 56).
The white panel on the left of the GUI contains the shot frames, which are represented by small squares; they can be clicked and visualized in the upper right corner of the GUI (see Figure 54). This panel basically is a graphical representation of the selected shot's frames' success as representative frames. The most valuable representative frames are placed closer to the start of the axes. The horizontal axis represents the aggregate color distance of the shot frame; that is, the sum of frame distances the specific frame has to all the rest of the shot. Similarly, the vertical axis represents the aggregate depth distance, if disparity videos are also available. On the far left side of the panel, the frame with the smallest aggregate color distance is placed, which means it has the biggest similarity of color to most other shot frames, thus being the best shot representative between the colored frames. Similarly, the rightmost placed frame will be the worst shot representative between the colored frames, an outlier. The same logic holds for distances that are placed on the vertical (depth) axis. Lastly, color and depth aggregate distances are combined (through a Euclidean distance metric) and yield the best candidate keyframe.
Each frame's actual aggregate distance values can be explicitly seen when the mouse pointer hovers over its square; a context menu displaying them appears, as can be seen in Figure 57.
Figure 57: The context menu that appears when hovering over a frame's square.
For user convenience, the graph is stretched in both directions so that frames can be easily clicked on the graph and not become overly congested. This feature has been implemented because the largest aggregate distance on one axis can by far outweigh the other largest aggregate distance, with the resulting frame squares becoming greatly congested (see Figure 58-b). The proportions can be changed by right-clicking somewhere in the white space of the graph and selecting "Switch between real/stretched proportions". The difference of the two modes can be seen in Figure 58. The real proportions graph can be useful for the user to determine how much more contribution color had over depth (and vice versa) for the keyframe computation.
Figure 58: a. Stretched Graph, b. Real Proportions Graph
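The "stretched" mode described above corresponds to normalizing each axis independently, so a large depth range cannot visually flatten the color axis. A sketch under that assumption (the tool's actual plotting code is not public):

```python
def stretched_coords(color_dists, depth_dists):
    """Map aggregate color/depth distances to [0, 1] plot coordinates,
    normalizing each axis independently ('stretched' mode)."""
    def norm(vals):
        lo, hi = min(vals), max(vals)
        return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in vals]
    return list(zip(norm(color_dists), norm(depth_dists)))
```

In "real proportions" mode both axes would instead share a single scale, which is what reveals how much each modality contributed.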
The user can also easily add keyframes to the shot or remove them, by right-clicking on the respective square, as seen in Figure 59.
Figure 59: The context menu that appears on right clicking a frame square.
If a small trailer-like video of the original one is to be made out of keyframes, a single keyframe for each shot is not enough. For that reason, some neighbouring frames can also be attached to the keyframe, with all together forming a key-segment of the shot. The GUI gives the user the ability to select the number of neighbours that will be attached to the keyframe from its left and right independently; that is, the final key-segment will consist of numOfNeighbours + 1 + numOfNeighbours frames. A frame's neighbours can also be seen on the graph (their squares are green-colored, as mentioned later), giving the user the opportunity to view how a keyframe's neighbouring frames relate to it.
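The resulting key-segment is simply a contiguous index range around the keyframe, clipped to the shot boundaries; a sketch with a hypothetical helper:

```python
def key_segment(keyframe, left, right, num_frames):
    """Frame indices of the key-segment: `left` neighbours, the keyframe,
    then `right` neighbours, clipped to the shot's frame range."""
    start = max(0, keyframe - left)
    end = min(num_frames - 1, keyframe + right)
    return list(range(start, end + 1))
```

With 10 neighbours on each side this yields the 21-frame key-segment mentioned in the parameter menu, keyframe in the middle, unless the keyframe sits near a shot boundary.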
A color scheme has been implemented for the graph's frame squares to make important frames on the graph distinguishable (see Figure 54):



keyframe squares are blue-colored, unless they are clicked by the user, which makes them purple

a clicked frame square is red-colored, unless it is a keyframe as mentioned above

the clicked frame's neighbouring frames are green-colored (their number can be changed on the GUI).
If the user needs to revert changes, he can click the appropriate reset buttons located on the bottom right part of the GUI (see Figure 60). All changes made by the user will be reverted.
Last but not least, the shot (or channel for a stereo video) can be changed from the bottom right corner of the GUI (see Figure 60). When this happens, the GUI removes old frames from the graph and updates it with the newly selected shot's frames, with the frame depicted on the upper right of the GUI changing to a keyframe of the new shot automatically.
Figure 60: The bottom right corner of the GUI.