data structures and algorithms · solution to moed a 2013 in data structures and algorithm ....


Data Structures and Algorithms

Good Luck

Vaad Handasa


ID:

Tel Aviv University

The Faculty of Engineering

Moed B 2013 in Data Structures and Algorithm

Exam Time: 3 Hours Date: 82.03.2013

Teacher: Dana Ron

Instructions:

You may use four pages of study aids. You may not use any kind of computer.

The exam includes 5 questions. The grade for each question appears in brackets.

Write your answers on the exam form, in the space designated for them. It is highly recommended to first write your answer on the draft paper you have received and only then copy it, clearly and legibly, to the exam form. Explain all your claims briefly but clearly. A claim without an explanation will not be accepted.

You may use theorems and algorithms that were taught in class and in the recitations, or that appeared in the homework assignments you did. In such cases you may cite what was taught without proof. However, if you use a slightly different version of an algorithm or some other analysis, you must explain precisely what the differences are.

This exam has 9 pages (including this one). Please make sure you have all of them.

Don't forget to write your ID number at the marked place.

GOOD LUCK!

Question 1 (71 points)

Consider the following algorithm, which accepts as input an array $A$ of $n$ numbers and two indices $1 \le p, r \le n$. On any array, the running time of the procedure Proc1(A,p,r) is $c \cdot s \cdot \log\log(s)$, where $c$ is a constant and $s = r - p + 1$ is the size of the subarray $A[p, \dots, r]$. The running time of Proc2(A,p,r) is $c' \cdot s$, where $c'$ is a constant.

Alg(A,n,p,r)
{
    if (r-p+1 ≤ n^(1/4))
        Proc1(A,p,r)
    else {
        t := (r-p+1)/log(n)
        for (i = 1 to log(n))
            Alg(A, n, p+(i-1)t, p+it-1)
        Proc2(A,p,r)
    }
}

We denote by $T_A(n)$ the running time of Alg when it is called with $p = 1$ and $r = n$. What is $g(n)$ such that $T_A(n) = \Theta(g(n))$?

Instruction: show the recursion tree of the algorithm. What is the height of the recursion tree? What does each level, except the leaves, contribute to the running time? What do the leaves contribute?

Recall that $\log_a b = \frac{\log_2 b}{\log_2 a}$. For simplicity, you may assume that $\log(n)$ is an integer, that $\frac{r-p+1}{\log(n)}$ is always an integer, and that the recursion always stops when $r - p + 1 = n^{1/4}$.

Recommendation: denote $k = n^{1/4}$ and $d = \log(n)$, and first analyze the running time as a function of $n$, $d$, $k$.


Question 2 (72 points)

As a reminder, a maximum priority queue is defined like a minimum priority queue, except that it supports the operation DeleteMax instead of DeleteMin. We assume that the elements in the priority queue are numbers (not necessarily distinct), and that DeleteMax removes the element with the maximum value from the queue and returns it.

a. Explain what should be changed in the definition of the partially ordered binary tree so that it fits a maximum priority queue. Also explain in words how a DeleteMax operation is performed on the partially ordered binary tree (as described in class for DeleteMin).


b. Write pseudocode for the procedure Insert when the partially ordered binary tree is represented by a heap $P$. That is, $P$ is a record with two fields: P.size, which holds the number of elements in the priority queue, and P.T, an array of size MAXSIZE that holds the elements of the queue. The running time of Insert should be $\Theta(\log(n))$, where $n$ is the number of elements in the queue.


Question 3 (75 points)

a. We want a procedure that accepts as input a pointer to the root of a binary search tree $T$ with at most $N$ nodes, together with an array $A$ of size $N$ (which contains no information). The procedure inserts the values of $T$, sorted from smallest to largest, into consecutive positions of the array $A$. It returns the number $n$ of values inserted into the array (the number of nodes in the tree). In other words, at the end of the procedure's run, $A[1], A[2], \dots, A[n]$ hold the values of $T$, sorted from smallest to largest. Complete the following pseudocode for this procedure.

Tree-to-array(pnode,A,i) /* Initially call the procedure with pnode = pointer to root of tree, i=1 */
{
    if (pnode = NULL)
        return( ____________ )
    s1 := __________________________
    A[i+s1] := ________________________
    s2 := ____________________________
    return( _________________ )
}

b. Complete the following pseudocode for the procedure Array-to-tree(A,n). It accepts as input an array $A$ and an integer $n$ such that the numbers $A[1], \dots, A[n]$ are distinct and sorted from smallest to largest. The procedure returns a pointer to the root of a binary search tree that contains the numbers in the array and whose height is $O(\log(n))$. Explain why this is the height of the tree.

Array-to-tree(A,n) {
    return(Array-to-tree-req(A,1,n,NULL)) }

Array-to-tree-req(A,p,r,parpnode)
{
    create node and let pnode be a pointer to it
    q := (p+r)/2
    pnode→VAL := ___________________
    pnode→LC := NULL; pnode→RC := NULL; pnode→PAR := parpnode
    if (r > p) {
        if (p < q)
            pnode→LC := ____________________________
        if (r > q)
            pnode→RC := ____________________________
    }
    return(pnode)
}


Question 4 (82 points)

For each of the following claims, state whether it is true or false and explain your answer. You may rely on anything you have seen in class, but you must explain precisely what you are relying on. All the claims concern flow networks with $n > 2$ vertices (including $s$ and $t$) and $m$ edges, in which every vertex lies on at least one path from $s$ to $t$. A claim is considered true only if it always holds under these conditions.

a. If the capacity of every edge in the network is either 1 or 2 (recall that a missing edge means capacity 0), then the running time of the Ford–Fulkerson algorithm, when it finds a path in the residual network using BFS, is $O(mn^2)$.


b. If the capacity of every edge in the network is of the form $\frac{k}{n}$, where $k$ is an integer between $n$ and $2n$, then the running time of the Ford–Fulkerson algorithm, when it finds a path in the residual network using BFS, is $O(mn^2)$.


c. If $d$ edges, each of capacity 1, leave $s$, then there must exist a flow in the network with value $d$.


d. If the capacity of every edge in the network is a value between 1 and 2, and there exists a cut $(S, T)$ with $s \in S$ and $t \in T$ such that $c(S, T) = 1$, then the value of the maximum flow is 1.


Question 5 (32 points)

A factory produces $k$ different types of products, and there are $n$ identical machines for manufacturing them. Consider each product type $1 \le i \le k$ and each number $0 \le j \le n$ of machines. Then $p[i][j]$ is a number between 1 and 100 representing the profit from selling products of type $i$ if $j$ machines are assigned to that type (where $p[i][0]$ …). The profit of a product type cannot decrease when it is assigned more machines, but it does not necessarily grow linearly with the number of machines.

We want to decide how many machines to assign to each product type so that the total profit (the sum of the profits) over all product types is maximal. Each machine can be assigned to only one product type. In other words, we are looking for $k$ non-negative integers $x_1, x_2, \dots, x_k$ such that $\sum_{i=1}^{k} x_i = n$ and $\sum_{i=1}^{k} p[i][x_i]$ is as large as possible.

We first find the maximum profit that can be obtained, and afterwards an optimal assignment.

For each $1 \le i \le k$ and $0 \le j \le n$, we define $M[i][j]$ to be the maximum profit that can be obtained from product types $1, \dots, i$ when $j$ machines in total are assigned to these types. In other words, $M[i][j]$ is the maximum value of $\sum_{l=1}^{i} p[l][x_l]$ over all choices of $i$ non-negative integers $x_1, x_2, \dots, x_i$ such that $\sum_{l=1}^{i} x_l = j$.

a. What is 𝑀[1][𝑗] for 0 ≤ 𝑗 ≤ 𝑛 ?


b. Which of the following equations holds for every $1 < i \le k$ and $0 \le j \le n$? Explain your answer.

(i) $M[i][j] = \max_{r=0}^{j} \{ p[i][r] + M[i-1][j] \}$

(ii) $M[i][j] = \max_{r=0}^{j} \{ p[i][r] \} + M[i-1][j-r]$

(iii) $M[i][j] = \max_{r=0}^{j} \{ p[i][r] + M[i-1][j-r] \}$


c. Write pseudocode for an algorithm that accepts as input $k$, $n$ and $p[][]$. It should return the maximum profit that can be obtained by assigning $n$ machines to produce $k$ products, when the profit from assigning $j$ machines to product $i$ is $p[i][j]$. The running time of the algorithm should be polynomial in $k$ and $n$.

Remark: you may not compute the maximum or minimum over more than two values in a single line of code; you must write a loop that computes the maximum/minimum.


d. Give the best possible upper bound on the running time of your algorithm (in other words, a function $f(n, k)$ such that the running time is $O(f(n, k))$).


e. Show how you can obtain $x_1, x_2, \dots, x_k$ that achieve an optimal total profit. Do this by writing another procedure and, if needed, an addition to the pseudocode you wrote (one that does not change the asymptotic running time of the algorithm). In other words, the additional procedure prints the pairs $(1, x_1), (2, x_2), \dots, (k, x_k)$ (from left to right) such that $\sum_{i=1}^{k} x_i = n$ and $\sum_{i=1}^{k} p[i][x_i]$ is as large as possible. If your procedure is recursive, you must state explicitly the parameters with which it is called.


Solution to Moed A 2013 in Data Structures and Algorithm

Question 1

Consider the following algorithm, which accepts as input an array $A$ of $n$ numbers and two indices $1 \le p, r \le n$. On any array, the running time of the procedure Proc1(A,p,r) is $c \cdot s \cdot \log\log(s)$, where $c$ is a constant and $s = r - p + 1$ is the size of the subarray $A[p, \dots, r]$. The running time of Proc2(A,p,r) is $c' \cdot s$, where $c'$ is a constant.

Alg(A,n,p,r)
{
    if (r-p+1 ≤ n^(1/4))
        Proc1(A,p,r)
    else {
        t := (r-p+1)/log(n)
        for (i = 1 to log(n))
            Alg(A, n, p+(i-1)t, p+it-1)
        Proc2(A,p,r)
    }
}

We denote by $T_A(n)$ the running time of Alg when it is called with $p = 1$ and $r = n$. What is $g(n)$ such that $T_A(n) = \Theta(g(n))$?

Instruction: show the recursion tree of the algorithm. What is the height of the recursion tree? What does each level, except the leaves, contribute to the running time? What do the leaves contribute?

Recall that $\log_a b = \frac{\log_2 b}{\log_2 a}$. For simplicity, you may assume that $\log(n)$ is an integer, that $\frac{r-p+1}{\log(n)}$ is always an integer, and that the recursion always stops when $r - p + 1 = n^{1/4}$.

Recommendation: denote $k = n^{1/4}$ and $d = \log(n)$, and first analyze the running time as a function of $n$, $d$, $k$.

We solve according to the instruction, using the assumptions that the relevant numbers are integers.

The tree is built as follows. Every internal node has $d = \log(n)$ children, so level $i$ contains $d^i$ nodes, and every node at level $i$ corresponds to a subarray of size $\frac{n}{d^i}$. By the stopping condition, every leaf corresponds to a subarray of size $k = n^{1/4}$, so there are $\frac{n}{k} = n^{3/4}$ leaves.

The running time of Proc2 is linear in the size of the subarray on which it runs, and the subarrays of any single level together form the whole array. Therefore every level of the tree except the leaf level contributes a total time of $c'n$. We stop at the level $i$ that satisfies $d^i = \frac{n}{k}$, so the number of these levels is

$$\log_d\left(\frac{n}{k}\right) = \frac{\log\left(\frac{n}{k}\right)}{\log(d)} = \frac{\frac{3}{4}\log(n)}{\log\log(n)}.$$

Together these levels contribute $\Theta\left(\frac{n\log(n)}{\log\log(n)}\right)$.

Every leaf contributes time $c\,k\log\log(k)$, and there are $\frac{n}{k}$ such subarrays. Together the leaves therefore contribute $c\,n\log\log(k) = \Theta(n\log\log(n))$, because $\log\log(k) = \log\left(\frac{1}{4}\log(n)\right) = \log\log(n) - 2$.

Since $n\log\log(n) = o\left(\frac{n\log(n)}{\log\log(n)}\right)$, the total running time is $\Theta\left(\frac{n\log(n)}{\log\log(n)}\right)$, that is, $g(n) = \frac{n\log(n)}{\log\log(n)}$.
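The following Python sketch is a sanity check of the recursion-tree analysis. It computes the exact cost of Alg under the cost model of the question. It assumes $c = c' = 1$ and base-2 logarithms, and it ignores the (lower-order) cost of the loop itself. `alg_cost` and `predicted` are hypothetical helpers. The sketch uses only values of $n$ for which all the integrality assumptions hold, namely $n = 2^8$ and $n = 2^{16}$.

```python
import math


def alg_cost(n: int, s: int, c: float = 1.0, c_prime: float = 1.0) -> float:
    """Cost of Alg on a subarray of size s under the cost model of Question 1.

    A leaf (s <= n^(1/4)) costs c*s*loglog(s) for Proc1; an internal node costs
    c'*s for Proc2 plus the cost of its log(n) recursive calls. Logs are base 2.
    """
    d = int(math.log2(n))            # number of recursive calls per internal node
    if s ** 4 <= n:                  # stopping condition s <= n^(1/4), in exact integers
        return c * s * math.log2(math.log2(s))
    t = s // d                       # size of each child subarray
    # all d children have the same size, so their total cost is d times one child's cost
    return d * alg_cost(n, t, c, c_prime) + c_prime * s


def predicted(n: int) -> float:
    """The two terms of the analysis with c = c' = 1: (3/4) n log n / loglog n
    for the internal levels plus n loglog(n^(1/4)) for the leaves."""
    log_n = math.log2(n)
    loglog_n = math.log2(log_n)
    return 0.75 * n * log_n / loglog_n + n * math.log2(log_n / 4)


for n in (2 ** 8, 2 ** 16):
    print(n, alg_cost(n, n), predicted(n))
```

For both values of $n$, the simulated cost equals the sum of the two terms of the analysis, and the internal-level term is the larger of the two.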


Question 2 (81 points)

As a reminder, a maximum priority queue is defined like a minimum priority queue, except that it supports the operation DeleteMax instead of DeleteMin. We assume that the elements in the priority queue are numbers (not necessarily distinct), and that DeleteMax removes the element with the maximum value from the queue and returns it.

a. Explain what should be changed in the definition of the partially ordered binary tree so that it fits a maximum priority queue. Also explain in words how a DeleteMax operation is performed on the partially ordered binary tree (as described in class for DeleteMin).


All that needs to be changed in the definition of the partially ordered binary tree is that the value in every node is greater than or equal to the values of its children (instead of smaller than or equal to).

A DeleteMax operation proceeds as follows:
1. It takes the value at the root of the tree (which is the maximum) and stores it in a temporary variable.
2. It moves the value of the last leaf (the rightmost node in the last level) to the root and removes that leaf.
3. It trickles this value down the tree as long as it is smaller than at least one of its two children, each time swapping it with the child whose value is larger.
4. At the end, the value that was at the root is returned.
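Below is a minimal Python sketch of the DeleteMax procedure described above. It assumes the tree is stored as a 1-based array in a Python list `T` with `T[0]` unused, so the root is `T[1]` and the children of `T[i]` are `T[2i]` and `T[2i+1]`. This is the same layout as the heap representation in part b. The function name `delete_max` is illustrative, not from the course.

```python
def delete_max(T):
    """Remove and return the maximum of a max-heap stored 1-based in list T (T[0] unused)."""
    if len(T) <= 1:
        raise IndexError("delete_max from an empty heap")
    top = T[1]                     # the root holds the maximum
    last = T.pop()                 # remove the last leaf (position P.size)
    size = len(T) - 1
    if size == 0:                  # the heap had a single element
        return top
    T[1] = last                    # move the last leaf's value to the root
    cur = 1
    while 2 * cur <= size:         # trickle down while cur has at least one child
        child = 2 * cur
        if child + 1 <= size and T[child + 1] > T[child]:
            child += 1             # choose the child with the larger value
        if T[cur] >= T[child]:
            break                  # heap order restored
        T[cur], T[child] = T[child], T[cur]
        cur = child
    return top


T = [None, 9, 7, 8, 3, 5, 4]       # a valid max-heap with 6 elements
print([delete_max(T) for _ in range(6)])
```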

b. Write pseudocode for the procedure Insert when the partially ordered binary tree is represented by a heap $P$. That is, $P$ is a record with two fields: P.size, which holds the number of elements in the priority queue, and P.T, an array of size MAXSIZE that holds the elements of the queue. The running time of Insert should be $\Theta(\log(n))$, where $n$ is the number of elements in the queue.

This follows the heap layout used for a partially ordered binary tree in a minimum priority queue. The value of the root is stored in P.T[1]. For the node whose value is stored in position $i$ of the array, its children are stored in P.T[2i] and P.T[2i+1], so its parent is in P.T[⌊i/2⌋].

Insert(P,x)
{
    if (P.size = MAXSIZE)
        return(OVERFLOW)
    P.size := P.size + 1
    P.T[P.size] := x
    cur := P.size
    while ((cur > 1) and (P.T[cur] > P.T[⌊cur/2⌋])) {
        y := P.T[cur]
        P.T[cur] := P.T[⌊cur/2⌋]
        P.T[⌊cur/2⌋] := y
        cur := ⌊cur/2⌋   /* move up to the parent; without this line the loop never ends */
    }
}
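The same Insert in Python, under the same assumptions: a 1-based list `T` with `T[0]` unused. `insert` and `maxsize` are illustrative names standing in for Insert and MAXSIZE. After each swap, the step `cur //= 2` moves up to the parent, so the loop runs at most $\lfloor \log_2 n \rfloor$ times.

```python
def insert(T, x, maxsize):
    """Insert x into a max-heap stored 1-based in list T (T[0] unused)."""
    if len(T) - 1 == maxsize:       # P.size = MAXSIZE
        raise OverflowError("heap is full")
    T.append(x)                     # P.size := P.size + 1; P.T[P.size] := x
    cur = len(T) - 1
    # bubble up while the new value is larger than its parent's value
    while cur > 1 and T[cur] > T[cur // 2]:
        T[cur], T[cur // 2] = T[cur // 2], T[cur]
        cur //= 2                   # continue from the parent's position


T = [None]
for x in [3, 9, 4, 9, 1, 7]:
    insert(T, x, maxsize=10)
print(T)
```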


Question 3

a.

Tree-to-array(pnode,A,i) /* Initially call the procedure with pnode = pointer to root of tree, i=1 */
{
    if (pnode = NULL) return(0)             /* the tree is empty so num of nodes is 0 */
    s1 := Tree-to-array(pnode→LC,A,i)       /* enter elements in left subtree, starting from position i */
    A[i+s1] := pnode→VAL                    /* enter root right after these elements */
    s2 := Tree-to-array(pnode→RC,A,i+s1+1)  /* enter elements in right subtree, right after root */
    return(s1 + s2 + 1)                     /* this is total number of nodes in the tree */
}
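Here is a runnable Python version of Tree-to-array. It assumes a minimal hypothetical `Node` class with fields `val`, `lc`, `rc` (standing in for VAL, LC, RC) and a 1-based array `A` with index 0 unused. The procedure is an in-order traversal that returns the number of nodes written.

```python
class Node:
    """Minimal BST node (hypothetical): value val, left child lc, right child rc."""

    def __init__(self, val, lc=None, rc=None):
        self.val, self.lc, self.rc = val, lc, rc


def tree_to_array(pnode, A, i):
    """Write the subtree rooted at pnode into A[i], A[i+1], ... in sorted
    (in-order) order and return the number of values written."""
    if pnode is None:
        return 0                                  # empty tree: nothing written
    s1 = tree_to_array(pnode.lc, A, i)            # left subtree fills A[i .. i+s1-1]
    A[i + s1] = pnode.val                         # root goes right after it
    s2 = tree_to_array(pnode.rc, A, i + s1 + 1)   # right subtree follows the root
    return s1 + s2 + 1


root = Node(5, Node(2, None, Node(3)), Node(8, Node(7)))
A = [None] * 6                                    # N = 5; A[0] is unused
n = tree_to_array(root, A, 1)
print(n, A[1:n + 1])
```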

b.

Array-to-tree(A,n) {
    return(Array-to-tree-req(A,1,n,NULL)) }

Array-to-tree-req(A,p,r,parpnode)
{
    create node and let pnode be a pointer to it
    q := (p+r)/2
    pnode→VAL := A[q]   /* put middle (median) value in root */
    pnode→LC := NULL; pnode→RC := NULL; pnode→PAR := parpnode
    if (r > p) {
        if (p < q)   /* build left subtree */
            pnode→LC := Array-to-tree-req(A,p,q-1,pnode)
        if (r > q)   /* build right subtree */
            pnode→RC := Array-to-tree-req(A,q+1,r,pnode)
    }
    return(pnode)
}

The height of the tree is $O(\log(n))$ because each recursive call works on a subarray at most half the size of the current one. In other words, the height of the tree satisfies the recurrence $h(n) = 1 + h(\lfloor n/2 \rfloor)$ with $h(1) = 1$, and therefore $h(n) = \lfloor \log(n) \rfloor + 1 = O(\log(n))$.
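Below is a Python sketch of Array-to-tree under similar assumptions: a hypothetical `Node` class with `val`, `lc`, `rc`, `par`, a 1-based list `A`, and integer division for `q`. The helper `height` counts the nodes on the longest root-to-leaf path, matching $h(1) = 1$ above. With that convention, the recurrence gives $\lfloor \log_2 n \rfloor + 1$, which in Python is `n.bit_length()`.

```python
class Node:
    """Minimal BST node (hypothetical) with a parent pointer."""

    def __init__(self, val, par=None):
        self.val, self.lc, self.rc, self.par = val, None, None, par


def array_to_tree_req(A, p, r, parent):
    q = (p + r) // 2               # middle index; A[q] becomes the root
    node = Node(A[q], parent)
    if r > p:
        if p < q:                  # build left subtree from A[p..q-1]
            node.lc = array_to_tree_req(A, p, q - 1, node)
        if r > q:                  # build right subtree from A[q+1..r]
            node.rc = array_to_tree_req(A, q + 1, r, node)
    return node


def array_to_tree(A, n):
    return array_to_tree_req(A, 1, n, None)


def height(node):
    """Number of nodes on the longest root-to-leaf path (1 for a single node)."""
    if node is None:
        return 0
    return 1 + max(height(node.lc), height(node.rc))


def inorder(node):
    return [] if node is None else inorder(node.lc) + [node.val] + inorder(node.rc)


A = [None] + [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]   # n = 10, sorted and distinct
root = array_to_tree(A, 10)
print(inorder(root) == A[1:], height(root))
```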


Question 4 (22 points)

For each of the following claims, state whether it is true or false and explain your answer. You may rely on anything you have seen in class, but you must explain precisely what you are relying on. All the claims concern flow networks with $n > 2$ vertices (including $s$ and $t$) and $m$ edges, in which every vertex lies on at least one path from $s$ to $t$. A claim is considered true only if it always holds under these conditions.

a. If the capacity of every edge in the network is either 1 or 2 (recall that a missing edge means capacity 0), then the running time of the Ford–Fulkerson algorithm, when it finds a path in the residual network using BFS, is $O(mn^2)$.

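True. The flow value is bounded from above by the sum of the capacities of the edges leaving $s$, which is at most $2(n-1) < 2n$. All the capacities are integers, so every augmenting path increases the flow by at least 1, and there are at most $2n$ iterations. Since every vertex lies on an $s$–$t$ path, $m \ge n - 1$, so each iteration (a BFS in the residual network plus the update along the path) takes $O(n + m) = O(m)$ time. In total we get $O(mn) \subseteq O(mn^2)$.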

b. If the capacity of every edge in the network is of the form $\frac{k}{n}$, where $k$ is an integer between $n$ and $2n$, then the running time of the Ford–Fulkerson algorithm, when it finds a path in the residual network using BFS, is $O(mn^2)$.

True. Every capacity is at most $\frac{2n}{n} = 2$, so the flow value is bounded from above by the sum of the capacities of the edges leaving $s$, which is at most $2(n-1) < 2n$. All the capacities are multiples of $\frac{1}{n}$, and therefore so are all residual capacities. Hence each iteration increases the flow by at least $\frac{1}{n}$, and the total number of iterations is at most $\frac{2n}{1/n} = 2n^2$. Each iteration takes $O(m)$ time, so in total we get $O(mn^2)$.

c. If $d$ edges, each of capacity 1, leave $s$, then there must exist a flow in the network with value $d$.

False. A cut with small capacity may exist. For example, if $d \ge 2$ and only one edge, of capacity 1, enters $t$, then the flow value is at most $1 < d$.

d. If the capacity of every edge in the network is a value between 1 and 2, and there exists a cut $(S, T)$ with $s \in S$ and $t \in T$ such that $c(S, T) = 1$, then the value of the maximum flow is 1.

True. Since $s \in S$, $t \in T$, and there is a path from $s$ to $t$, at least one edge of that path goes from $S$ to $T$. Every capacity is at least 1, so the capacity of every cut is at least 1. Hence the given cut, whose capacity is 1, is necessarily a minimum cut. By the max-flow min-cut theorem seen in class, the maximum flow value equals the minimum capacity of a cut, which is 1.
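Claims c and d can be checked on small examples with the Python sketch below. It is a compact version of Ford–Fulkerson with BFS, that is, Edmonds–Karp. For brevity it stores residual capacities in an adjacency matrix, so each BFS costs $O(n^2)$ rather than the $O(m)$ assumed in the answers above. It is meant to check flow values, not running-time bounds. `edmonds_karp` and the example networks are hypothetical. Capacities may be `Fraction` values, as in claim b.

```python
from collections import deque
from fractions import Fraction


def edmonds_karp(n, edges, s, t):
    """Ford-Fulkerson with BFS augmenting paths on vertices 0..n-1.

    edges: list of (u, v, capacity). Returns (max flow value, number of iterations).
    """
    cap = [[0] * n for _ in range(n)]          # residual capacities
    for u, v, c in edges:
        cap[u][v] += c
    flow, iterations = 0, 0
    while True:
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:       # BFS in the residual network
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:                    # no augmenting path is left
            return flow, iterations
        bottleneck, v = None, t                # smallest residual capacity on the path
        while v != s:
            u = parent[v]
            bottleneck = cap[u][v] if bottleneck is None else min(bottleneck, cap[u][v])
            v = u
        v = t
        while v != s:                          # augment along the path
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck
        iterations += 1


# Claim c: d = 2 unit-capacity edges leave s (vertex 0), but a single
# unit-capacity edge enters t (vertex 4), so the maximum flow is 1, not 2.
edges_c = [(0, 1, 1), (0, 2, 1), (1, 3, 1), (2, 3, 1), (3, 4, 1)]
# Claim d: all capacities lie in [1, 2]; the cut S = {0, 1, 2}, T = {3} has capacity 1.
edges_d = [(0, 1, 2), (0, 2, Fraction(3, 2)), (1, 2, 1), (2, 3, 1)]
print(edmonds_karp(5, edges_c, 0, 4)[0], edmonds_karp(4, edges_d, 0, 3)[0])
```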


Question 5 (32 points)

A factory produces $k$ different types of products, and there are $n$ identical machines for manufacturing them. Consider each product type $1 \le i \le k$ and each number $0 \le j \le n$ of machines. Then $p[i][j]$ is a number between 1 and 100 representing the profit from selling products of type $i$ if $j$ machines are assigned to that type (where $p[i][0]$ …). The profit of a product type cannot decrease when it is assigned more machines, but it does not necessarily grow linearly with the number of machines.

We want to decide how many machines to assign to each product type so that the total profit (the sum of the profits) over all product types is maximal. Each machine can be assigned to only one product type. In other words, we are looking for $k$ non-negative integers $x_1, x_2, \dots, x_k$ such that $\sum_{i=1}^{k} x_i = n$ and $\sum_{i=1}^{k} p[i][x_i]$ is as large as possible.

We first find the maximum profit that can be obtained, and afterwards an optimal assignment.

For each $1 \le i \le k$ and $0 \le j \le n$, we define $M[i][j]$ to be the maximum profit that can be obtained from product types $1, \dots, i$ when $j$ machines in total are assigned to these types. In other words, $M[i][j]$ is the maximum value of $\sum_{l=1}^{i} p[l][x_l]$ over all choices of $i$ non-negative integers $x_1, x_2, \dots, x_i$ such that $\sum_{l=1}^{i} x_l = j$.

a. What is $M[1][j]$ for $0 \le j \le n$?

When $i = 1$, all $j$ machines can be assigned only to product 1, so $M[1][j] = p[1][j]$.

b. Which of the following equations holds for every $1 < i \le k$ and $0 \le j \le n$? Explain your answer.

(i) $M[i][j] = \max_{r=0}^{j} \{ p[i][r] + M[i-1][j] \}$

(ii) $M[i][j] = \max_{r=0}^{j} \{ p[i][r] \} + M[i-1][j-r]$

(iii) $M[i][j] = \max_{r=0}^{j} \{ p[i][r] + M[i-1][j-r] \}$


The correct formula is (iii). Given $j$ machines, product $i$ can be assigned between 0 and $j$ machines, receiving a profit of $p[i][r]$, where $r$ is the number of machines assigned to it. For any choice of the $r$ machines assigned to product $i$, there remain $j - r$ machines that can be assigned to products $1, \dots, i-1$. The maximum profit obtainable by assigning $j - r$ machines to products $1, \dots, i-1$ is $M[i-1][j-r]$. So under the assumption that $r$ machines are assigned to product $i$, the total profit is $p[i][r] + M[i-1][j-r]$. Since we do not know which $r$ is optimal, we take the maximum of this profit over all possible choices of $r$.

Formula (i) is wrong because it lets products $1, \dots, i-1$ use all $j$ machines even after $r$ of them were assigned to product $i$. In (ii), the index $r$ appears outside the maximum, so the expression is not well defined.
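Formula (iii) can be checked mechanically against an exhaustive search over all assignments on small instances. The Python sketch below does this. `best_by_recurrence` and `best_by_brute_force` are hypothetical helpers, and `p` uses 1-based product indices (row `p[0]` is an unused dummy). The one-line `max(...)` is used only for brevity in this check; part c asks for an explicit loop.

```python
from itertools import product


def best_by_recurrence(p, k, n):
    """M[k][n] computed with formula (iii)."""
    M = [[0] * (n + 1) for _ in range(k + 1)]
    for j in range(n + 1):
        M[1][j] = p[1][j]
    for i in range(2, k + 1):
        for j in range(n + 1):
            M[i][j] = max(p[i][r] + M[i - 1][j - r] for r in range(j + 1))
    return M[k][n]


def best_by_brute_force(p, k, n):
    """Try every (x_1, ..., x_k) with x_i >= 0 and x_1 + ... + x_k = n."""
    best = None
    for xs in product(range(n + 1), repeat=k):
        if sum(xs) == n:
            total = sum(p[i + 1][x] for i, x in enumerate(xs))
            best = total if best is None else max(best, total)
    return best


p = [[0, 0, 0, 0, 0], [1, 10, 12, 13, 14], [1, 5, 20, 21, 22], [1, 8, 9, 30, 31]]
print(best_by_recurrence(p, 3, 4), best_by_brute_force(p, 3, 4))
```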


c. Write pseudo-code for an algorithm that accepts as input $k$, $n$ and $p[][]$ and returns the maximal profit that can be obtained by assigning $n$ machines to $k$ products, where the profit from assigning $j$ machines to product $i$ is $p[i][j]$. The running time of the algorithm should be polynomial in $k$ and $n$.

Remark: you may not compute the maximum or minimum of more than two values in a single line of code; you must write a loop that computes it.

MaxProfit(n,k,p[][])
{
  for (j = 0 to n)
    M[1][j] := p[1][j]            /* initialize row i = 1 */
  for (i = 2 to k) {
    for (j = 0 to n) {
      M[i][j] := p[i][0] + M[i-1][j]
      /* initialize M[i][j] with the max profit when product i gets no machine (r = 0) */
      Best[i][j] := 0             /* for the last part of the question */
      for (r = 1 to j) {          /* compute max profit for products 1,…,i using j machines */
        new_profit := p[i][r] + M[i-1][j-r]
        if (new_profit > M[i][j]) {
          M[i][j] := new_profit
          Best[i][j] := r
        }
      }
    }
  }
  return(M[k][n])
}
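A minimal Python sketch of MaxProfit, assuming `p` is a list of lists indexed as in the text (`p[i][j]` for $1 \le i \le k$, $0 \le j \le n$, with an unused placeholder row `p[0]`). The function name `max_profit` is hypothetical; it also returns the `best` table (the Best matrix of the pseudo-code) and, as an addition of this sketch, sets `best[1][j] = j`.

```python
def max_profit(n: int, k: int, p: list[list[int]]) -> tuple[int, list[list[int]]]:
    """Bottom-up DP for the machine-assignment problem (sketch).

    p[i][j] is the profit of giving j machines to product i (1 <= i <= k);
    p[0] is an unused placeholder so indices match the text.
    Returns (M[k][n], best), where best[i][j] is the r attaining the maximum.
    """
    # M[i][j]: best profit from products 1..i using exactly j machines.
    M = [[0] * (n + 1) for _ in range(k + 1)]
    best = [[0] * (n + 1) for _ in range(k + 1)]
    for j in range(n + 1):
        M[1][j] = p[1][j]       # row i = 1: all j machines go to product 1
        best[1][j] = j
    for i in range(2, k + 1):
        for j in range(n + 1):
            # r = 0: product i receives no machines
            M[i][j] = p[i][0] + M[i - 1][j]
            best[i][j] = 0
            for r in range(1, j + 1):   # explicit loop instead of a one-line max
                candidate = p[i][r] + M[i - 1][j - r]
                if candidate > M[i][j]:
                    M[i][j] = candidate
                    best[i][j] = r
    return M[k][n], best
```

For example, `max_profit(3, 2, [[], [0, 1, 4, 5], [0, 5, 6, 6]])[0]` is 9: one machine goes to product 2 and two to product 1. The three nested loops match the $O(n^2 k)$ analysis.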

d. Give the best possible upper bound on the running time of your algorithm (in other words, a function $f(n,k)$ such that the running time is $O(f(n,k))$).

The first loop runs in time $O(n)$, and afterwards there are 3 nested loops: the outer one runs $k - 1$ iterations, the middle one runs $n + 1$ iterations, and the inner one runs at most $n$ iterations (exactly $j$ iterations for a given $j$). In total the inner body is executed $(k-1)\sum_{j=0}^{n} j = (k-1)\frac{n(n+1)}{2}$ times, so the running time is $O(n^2 k)$, and for $k \ge 2$ this bound is tight.


e. Show, by writing another procedure and, if needed, an addition to the pseudo-code you wrote (that does not change the asymptotic running time of the algorithm), how to obtain $x_1, x_2, \dots, x_k$ that achieve an optimal total profit. In other words, the additional procedure prints the pairs $(1, x_1), (2, x_2), \dots, (k, x_k)$ (from left to right) such that $\sum_{i=1}^{k} x_i = n$ and $\sum_{i=1}^{k} p[i][x_i]$ is as large as possible. If your procedure is recursive, you must state explicitly the parameters with which it is called.

We use the table Best that is filled in MaxProfit, where Best[i][j] holds the value $r$ that attains the maximum in the formula from sub-question b. We call the procedure Print_Best with $i = k$, $j = n$ and the matrix Best.

Print_Best(i,j,Best[][])
{
  if (i = 1) {
    print(1,j)          /* product 1 receives all remaining machines */
    return
  }
  Print_Best(i-1, j-Best[i][j], Best)
  print(i, Best[i][j])
}
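For illustration, a Python version of Print_Best that returns the pairs instead of printing them, so the result can be checked. The name `print_best` and the small `best` table in the usage note are hypothetical, hand-built for $k = 3$, $n = 4$ rather than produced from a specific profit table.

```python
def print_best(i: int, j: int, best: list[list[int]]) -> list[tuple[int, int]]:
    """Recover the pairs (1, x_1), ..., (i, x_i) from the Best table, left to right.

    Product i gets best[i][j] machines; the remaining j - best[i][j]
    machines are distributed among products 1..i-1 recursively.
    """
    if i == 1:
        return [(1, j)]          # base case: product 1 takes what is left
    r = best[i][j]
    return print_best(i - 1, j - r, best) + [(i, r)]
```

With `best[3][4] = 1` and `best[2][3] = 2` (all other entries 0), `print_best(3, 4, best)` yields `[(1, 1), (2, 2), (3, 1)]`, and the machine counts sum to $n = 4$. The recursion depth is $k$, so the extra work is $O(k)$ and does not change the asymptotic running time.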


ID:

Tel Aviv University

The Faculty of Engineering

Moed A 2013 in Data Structures and Algorithm

Exam Time: 3 Hours Date: 14.02.2013

Teacher: Dana Ron

Instructions:

You may use four pages of study aids. You may not use any kind of computer.

The exam includes 5 questions. The number of points for each question appears in parentheses.

Write your answers on the exam form in the places designated for them. It is highly recommended to first write your answer on the draft paper you received and only then copy it, clearly and legibly, to the exam form. Explain all your claims briefly but clearly. A claim without an explanation will not be accepted.

You may use theorems and algorithms that were taught in class and in the recitations or that appeared in the homework assignments you did. In such cases you may cite what was learned without proof. On the other hand, if you use a slightly different version of an algorithm or a different analysis, you must explain precisely what the differences are.

This exam has 11 pages (including this one). Please make sure you have all of them.

Don't forget to write your ID number in the marked place.

GOOD LUCK!


Question 1 (20 points)

Below are the algorithms InsertionSort and MergeSort as learned in class (where Merge(A,p,q,r) is a procedure that merges the two sorted sub-arrays A[p..q] and A[q+1..r], and whose running time is linear in r-p+1 for any input).

InsertionSort(Input: array A, integer n)
{
  for j = 2 to n {
    newnum ← A[j]
    i ← j-1
    while (i > 0 and newnum < A[i]) {
      A[i+1] ← A[i]
      i ← i-1
    }
    A[i+1] ← newnum
  }
}

MergeSort(Input: array A, integers p,r)
{
  if (r ≤ p)
    return
  else {
    q ← ⌊(p+r)/2⌋
    MergeSort(A,p,q)
    MergeSort(A,q+1,r)
    Merge(A,p,q,r)
  }
}

Recall that for an array A we denote by $T_{IS}(A)$ the running time of InsertionSort on A. Let n be the size of A. We denote by $s(A)$ the number of pairs (i, j) that are "out of order", meaning that $1 \le i < j \le n$ but $A[j] < A[i]$. For example, for $A = [5,4,4,3]$ we have $s(A) = 5$, where the out-of-order pairs (i, j) are (1,2), (1,3), (1,4), (2,4), (3,4).

Explain clearly your answers on all the following sub-questions.

a. What is $T_{IS}(A)$ as a function of $s(A)$ and n? Hint: note that $s(A)$ is the sum, over j running from 2 to n, of the sizes of the sets of indices $\{ i \mid i < j \text{ and } A[i] > A[j] \}$.


For the following two sub-questions, if your answer is positive you must describe the structure of A precisely (for example: "places $1, \dots, n - n^{1/2}$ hold the numbers $1, \dots, n - n^{1/2}$ sorted from smallest to largest, and places $n - n^{1/2} + 1, \dots, n$ hold the numbers $n - n^{1/2} + 1, \dots, n$ sorted from largest to smallest").

b. Do there exist constants $c_1$ and $c_2$ such that for every $n \ge 4$ there exists an array A of size n for which $c_1 n \log\log n \le T_{IS}(A) \le c_2 n \log\log n$?

c. We denote by $T_{MS}(A)$ the running time of MergeSort on array A when MergeSort is called with $p = 1$ and $r = n$.

Do there exist constants $c_1$ and $c_2$ such that for every $n \ge 4$ there exists an array A of size n for which $c_1 n \log\log n \le T_{MS}(A) \le c_2 n \log\log n$?


Question 2 (15 points)

One of the goals when using a binary search tree is to keep it balanced, so that the distance between the root and the leaf farthest from it is not too large. Write pseudo-code for a procedure that accepts as input a pointer to the root of a binary search tree (containing at least one node) and returns the distance from the root to the farthest leaf in the tree (according to the definition of distance in a graph: the distance of the root from itself is 0, the distance of its children from it is 1, and so on). Assume that each node in the tree is represented by a struct containing the following fields: PAR (a pointer to the parent), VAL (the value in the node), LC (a pointer to the left child) and RC (a pointer to the right child).

In addition to the pseudo-code, briefly explain the idea behind the procedure and analyze its running time (in terms of Θ) as a function of the number of vertices in the tree, n.


Question 3 (15 points)

Recall that in the data structure we used for 2-3 trees, each node is represented by a struct containing the following fields: PAR (a pointer to the parent), VAL (the value in the node), LC (a pointer to the left child), MC (a pointer to the middle child), RC (a pointer to the right child), MIN1 (the minimal value in the subtree of the left child), MIN2 (the minimal value in the subtree of the middle child) and MIN3 (the minimal value in the subtree of the right child, if one exists). Recall also that the field VAL is relevant only for leaves, and the fields LC, MC, RC, MIN1, MIN2 and MIN3 are relevant only for inner vertices. Assume that each struct representing a node also contains a field SIZE, holding the number of leaves in the subtree rooted at that node. We denote by n the number of leaves in the tree (the number of elements in the set that the tree represents). Assume also that $n \ge 2$.

Given a 2-3 tree represented by the above data structure, complete the following pseudo-code so that, when the procedure is called with pnode, a pointer to the root, and a value $1 \le k \le n$, it returns the value of the k-th smallest element among the tree's leaves. Alternatively, you may write your own pseudo-code (for a procedure that is as efficient as possible) on the other side of the page.

Rank-2-3(pnode,k) /* pnode is a pointer to a node of a 2-3 tree with the additional SIZE field, and 1 ≤ k ≤ pnode → SIZE */
{
  if (pnode → SIZE = 1)
    return(______________________________)
  if (k ≤ ______________________________)
    return(______________________________)
  else if (k ≤ ______________________________)
    return(______________________________)
  else
    return(______________________________)
}

Briefly explain the algorithm and analyze its running time, that is, find $g(n)$ such that the running time is $\Theta(g(n))$.
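One possible way to fill in Rank-2-3, sketched in Python under stated assumptions: a node with two children uses LC and MC (RC is empty), and the PAR and MIN fields are omitted because the procedure does not need them. The names `Node23` and `rank_2_3` are hypothetical. Each call descends one level and does constant work, so the running time is proportional to the height of the tree, $\Theta(\log n)$ for a 2-3 tree.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node23:
    """Simplified 2-3 tree node: VAL for leaves, LC/MC/(optional) RC for inner
    nodes, SIZE = number of leaves in the subtree (assumed field layout)."""
    SIZE: int
    VAL: Optional[int] = None
    LC: Optional[Node23] = None
    MC: Optional[Node23] = None
    RC: Optional[Node23] = None


def rank_2_3(pnode: Node23, k: int) -> int:
    """Return the k-th smallest leaf value in pnode's subtree, 1 <= k <= SIZE."""
    if pnode.SIZE == 1:
        return pnode.VAL                     # inner nodes have >= 2 leaves, so this is a leaf
    if k <= pnode.LC.SIZE:
        return rank_2_3(pnode.LC, k)         # answer lies in the left subtree
    if k <= pnode.LC.SIZE + pnode.MC.SIZE:
        return rank_2_3(pnode.MC, k - pnode.LC.SIZE)
    return rank_2_3(pnode.RC, k - pnode.LC.SIZE - pnode.MC.SIZE)
```

For a tree whose leaves are 10, 20, 30 (under one inner node) and 40, 50 (under another), `rank_2_3(root, 4)` goes to the middle child with k = 1 and returns 40.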



Question 4 (15 points)

Apply one iteration of the Ford-Fulkerson algorithm to the network shown in (1) with the marked flow. In other words, draw the residual network together with its residual capacities (in (2)), find an augmenting path (any one) and update the flow (in (3)). Does the resulting flow have maximal value? If your answer is "yes", prove it by marking a cut (S,T) for which $|f| = c(S,T)$; if your answer is "no", show how to obtain a flow of higher value.

Recall that the label $x/y$ on an edge $(u,v)$ means that $c(u,v) = y$ and $f(u,v) = x$. If only one number is written, it is the capacity and the flow is 0 (if there is no edge from u to v then $c(u,v) = 0$, and always $f(u,v) = -f(v,u)$).

[Figure panels: (1) Network G and flow f; (2) Residual network $G_f$; (3) Network G and new flow f]


Question 5 (35 points)

Let $S = S_1 \dots S_t$ be a DNA string and let $I$ be an array of $n \ge 2$ distinct indices, sorted from smallest to largest, such that $I[1] = 1$ and $I[n] = t + 1$. There is a (biological) segmentation process that, given a substring $S_i \dots S_k$ and an index $j$ with $i < j \le k$, splits the substring into the two substrings $S_i \dots S_{j-1}$ and $S_j \dots S_k$, where the cost of the split is $\max(j - i, k - j + 1)$ (that is, the maximum of the lengths of the two resulting substrings). We want to split $S$ into the $n - 1$ substrings defined by the array $I$, that is, into $S_{I[1]} \dots S_{I[2]-1}, S_{I[2]} \dots S_{I[3]-1}, \dots, S_{I[n-1]} \dots S_{I[n]-1}$ (recall that $I[1] = 1$ and $I[n] = t + 1$), by a sequence of splits of minimal total cost.

For example, let $t = 10$ and $I = [1, 3, 6, 11]$, so the final split should be into the three substrings $S_1 S_2$, $S_3 S_4 S_5$ and $S_6 \dots S_{10}$.

If the order of the splits is (from left to right) $I[2], I[3]$, then we first split $S$ at $I[2] = 3$ into the two substrings $S_1 S_2$ and $S_3 \dots S_{10}$ (cost $\max(2, 8) = 8$), and then split the second substring at $I[3] = 6$ into $S_3 S_4 S_5$ and $S_6 \dots S_{10}$ (cost $\max(3, 5) = 5$). Therefore the total cost is $8 + 5 = 13$.

[Figure: the string S (positions 1 to 10) split first at I(2) = 3 (cost 8) and then at I(3) = 6 (cost 5)]


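If instead the order is $I[3], I[2]$, then we first split $S$ at $I[3] = 6$ into $S_1 \dots S_5$ and $S_6 \dots S_{10}$ (cost $\max(5, 5) = 5$), and then split the first substring at $I[2] = 3$ into $S_1 S_2$ and $S_3 S_4 S_5$ (cost $\max(2, 3) = 3$). Therefore the total cost is $5 + 3 = 8$.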

We would first like to compute the minimal cost of a sequence of splits given by the array $I$, and then find a sequence of splits that achieves this minimal cost. For each pair $1 \le p < r \le n$, let $C(p, r)$ be the minimal cost of splitting the substring $S_{I[p]} \dots S_{I[r]-1}$ into the $r - p$ substrings $S_{I[p]} \dots S_{I[p+1]-1}, S_{I[p+1]} \dots S_{I[p+2]-1}, \dots, S_{I[r-1]} \dots S_{I[r]-1}$.

You must explain your answers clearly for each of the following sub questions.

a. What is 𝐶(𝑝, 𝑝 + 1) for each 𝑝?


b. Explain why the following equation holds for every pair $(p, r)$ with $r > p + 1$:

$C(p, r) = \min_{p < q < r} \{ \max(I[q] - I[p],\ I[r] - I[q]) + C(p, q) + C(q, r) \}$

[Figure: the string S (positions 1 to 10) split first at I(3) = 6 (cost 5) and then at I(2) = 3 (cost 3)]


c. Write pseudo-code for a procedure that runs in time polynomial in $n$ and computes the minimal cost of a sequence of splits given by the array $I$.

d. Give the best possible upper bound on the running time of the procedure you wrote.



e. Show, by writing another procedure and, if needed, an addition to the pseudo-code you wrote (that does not change the asymptotic running time of the algorithm), how to also find an optimal sequence of splits (and not just its cost). In other words, the output of the procedure is a sequence of tuples $(i_1, j_1, k_1), (i_2, j_2, k_2), \dots, (i_{n-2}, j_{n-2}, k_{n-2})$, where $(i_x, j_x, k_x)$ means that the $x$-th split splits the substring from place $i_x$ to place $k_x$ into two substrings: the first from place $i_x$ to place $j_x - 1$, and the second from place $j_x$ to place $k_x$. In particular, for the first split, $i_1 = 1$ and $k_1 = t$. (For the example above, the output would be, from left to right: (1,6,10), (1,3,5).)
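A Python sketch of one standard approach to (c) and (e), assuming the recurrence from (b): an interval DP that fills $C(p, r)$ in order of increasing $r - p$, records the minimizing $q$, and then emits the splits in preorder. The name `min_split_cost` is hypothetical, `I` is stored 0-based, and ties are broken toward the smallest $q$. This is an illustration, not the official solution.

```python
def min_split_cost(I: list[int]) -> tuple[int, list[tuple[int, int, int]]]:
    """Interval DP for C(p, r) (sketch).

    I is the cut array from the text, stored 0-based (I[0] = 1, I[-1] = t + 1).
    Returns (C(1, n), optimal split tuples (i, j, k) in the order performed).
    """
    n = len(I)
    C = [[0] * n for _ in range(n)]     # C[p][p+1] = 0: nothing to split
    arg = [[-1] * n for _ in range(n)]  # q attaining the minimum for (p, r)
    for gap in range(2, n):             # gap = r - p, shortest intervals first
        for p in range(n - gap):
            r = p + gap
            best = None
            for q in range(p + 1, r):   # explicit loop for the minimum
                cost = max(I[q] - I[p], I[r] - I[q]) + C[p][q] + C[q][r]
                if best is None or cost < best:
                    best, arg[p][r] = cost, q
            C[p][r] = best

    splits: list[tuple[int, int, int]] = []

    def emit(p: int, r: int) -> None:
        # Preorder: split S_{I[p]}..S_{I[r]-1} first, then each of its two parts.
        if r - p < 2:
            return
        q = arg[p][r]
        splits.append((I[p], I[q], I[r] - 1))
        emit(p, q)
        emit(q, r)

    emit(0, n - 1)
    return C[0][n - 1], splits
```

On the example, `min_split_cost([1, 3, 6, 11])` returns `(8, [(1, 6, 10), (1, 3, 5)])`, matching the cost 8 and the output given above. The three nested loops take $O(n^3)$ time, and the reconstruction adds only $O(n)$.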



Solution to Moed A 2013 in Data Structures and Algorithm

Question 1

Below are the algorithms InsertionSort and MergeSort as learned in class (where Merge(A,p,q,r) is a procedure that merges the two sorted sub-arrays A[p..q] and A[q+1..r], and whose running time is linear in r-p+1 for any input).

InsertionSort(Input: array A, integer n)
{
  for j = 2 to n {
    newnum ← A[j]
    i ← j-1
    while (i > 0 and newnum < A[i]) {
      A[i+1] ← A[i]
      i ← i-1
    }
    A[i+1] ← newnum
  }
}

MergeSort(Input: array A, integers p,r)
{
  if (r ≤ p)
    return
  else {
    q ← ⌊(p+r)/2⌋
    MergeSort(A,p,q)
    MergeSort(A,q+1,r)
    Merge(A,p,q,r)
  }
}

Recall that for an array A we denote by $T_{IS}(A)$ the running time of InsertionSort on A. Let n be the size of A. We denote by $s(A)$ the number of pairs (i, j) that are "out of order", meaning that $1 \le i < j \le n$ but $A[j] < A[i]$. For example, for $A = [5,4,4,3]$ we have $s(A) = 5$, where the out-of-order pairs (i, j) are (1,2), (1,3), (1,4), (2,4), (3,4).

Explain clearly your answers on all the following sub-questions.

a. What is $T_{IS}(A)$ as a function of $s(A)$ and n? Hint: note that $s(A)$ is the sum, over j running from 2 to n, of the sizes of the sets of indices $\{ i \mid i < j \text{ and } A[i] > A[j] \}$.

(Note that this question is a variant of a question from homework assignment number 2.)

a. Following the hint, we denote $s(A, j) = |\{ i \mid i < j \text{ and } A[i] > A[j] \}|$, so that $s(A) = \sum_{j=2}^{n} s(A, j)$. As we saw in class, when the outer loop reaches $A[j]$, the numbers in $A[1, \dots, j-1]$ are sorted (and are the same numbers that were in this sub-array at the beginning of the algorithm's run). Therefore the number of iterations of the while loop for $A[j]$ is $s(A, j)$: the loop passes over all the values in $A[1, \dots, j-1]$ that are larger than $A[j]$, and stops only when it reaches a value that is smaller than or equal to $A[j]$, or after passing over all the values. Therefore, for some constants $c_1$ and $c_2$:

$T_{IS}(A) = \sum_{j=2}^{n} (c_1 + c_2 \, s(A, j)) = c_1 (n - 1) + c_2 \, s(A)$,

that is, $T_{IS}(A) = \Theta(n + s(A))$.
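A small Python check of the claim in (a), written as a sketch: it runs InsertionSort as in the pseudo-code (shifted to 0-based indexing) and counts the iterations of the while loop, which should equal $s(A)$. The function names are hypothetical.

```python
def insertion_sort_shifts(a: list[int]) -> tuple[list[int], int]:
    """Run InsertionSort on a copy of a and count while-loop iterations."""
    a = list(a)
    shifts = 0
    for j in range(1, len(a)):
        newnum = a[j]
        i = j - 1
        while i >= 0 and newnum < a[i]:   # strict <, exactly as in the pseudo-code
            a[i + 1] = a[i]
            i -= 1
            shifts += 1
        a[i + 1] = newnum
    return a, shifts


def inversions(a: list[int]) -> int:
    """s(A): the number of pairs i < j with a[j] < a[i] (brute-force count)."""
    return sum(1 for i in range(len(a)) for j in range(i + 1, len(a)) if a[j] < a[i])
```

For the example $A = [5,4,4,3]$, `insertion_sort_shifts` returns `([3, 4, 4, 5], 5)` and `inversions` returns 5. The equal elements (positions 2 and 3) cause no shift because the comparison is strict, which is why the pair (2,3) is not counted in $s(A)$.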


Common mistakes:

1. Treating $s(A)$ as the number of operations performed by the while loop for every $j$ (whether as an upper bound or out of a misunderstanding of the definition), and concluding that the running time is $O(n \cdot s(A))$.

2. Neglecting the constant number of operations that the for loop performs in each iteration regardless of $s(A)$, and concluding that the running time is $O(s(A))$ (for a sorted array $s(A) = 0$, yet the running time is $\Theta(n)$).

For the following two sub-questions, if your answer is positive you must describe the structure of A precisely (for example: "places $1, \dots, n - n^{1/2}$ hold the numbers $1, \dots, n - n^{1/2}$ sorted from smallest to largest, and places $n - n^{1/2} + 1, \dots, n$ hold the numbers $n - n^{1/2} + 1, \dots, n$ sorted from largest to smallest").

b. Do there exist constants $c_1$ and $c_2$ such that for every $n \ge 4$ there exists an array A of size n for which $c_1 n \log\log n \le T_{IS}(A) \le c_2 n \log\log n$?

b. Yes (with logarithms to base 2). For example, for every $n \ge 4$ let $t = \lfloor \log\log n \rfloor$, and consider the array $A$ in which places $1, \dots, t$ hold the numbers $n - t + 1, \dots, n$ sorted from smallest to largest, and places $t + 1, \dots, n$ hold the numbers $1, \dots, n - t$ sorted from smallest to largest. Note that $s(A, j) = 0$ for every $1 \le j \le t$ and $s(A, j) = t$ for every $t + 1 \le j \le n$, so that $s(A) = (n - t)t = nt - t^2$. Hence $s(A) < n \log\log n$ for every $n \ge 4$, and $s(A) \ge \frac{n \log\log n}{4}$ for every $n \ge 4$ (since $n - t > \frac{n}{2}$ and, as $\log\log n \ge 1$, $t \ge \frac{\log\log n}{2}$). By (a), $T_{IS}(A) = \Theta(n + s(A)) = \Theta(n \log\log n)$, which gives the required constants $c_1$ and $c_2$.

The same holds for an array $A$ of the following form: the first $k = \lfloor (n \log\log n)^{1/2} \rfloor$ places hold distinct numbers sorted from largest to smallest, and the remaining places hold numbers that are larger than every number in the first $k$ places, sorted from smallest to largest. In this case $s(A, j) = j - 1$ for every $2 \le j \le k$ and $s(A, j) = 0$ for every $j > k$. Therefore $s(A) = 1 + 2 + \dots + (k - 1) = \frac{(k-1)k}{2}$, which is again $\Theta(n \log\log n)$, as requested.
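A Python sketch that builds the first array from (b) and checks $s(A) = (n - t)t$ and the bounds $\frac{n \log\log n}{4} \le s(A) < n \log\log n$ used above, assuming base-2 logarithms. The function names are hypothetical, and the brute-force inversion count is quadratic, so it is meant only for small n.

```python
import math


def loglog_array(n: int) -> list[int]:
    """Array from (b): t = floor(log log n); places 1..t hold n-t+1..n ascending,
    places t+1..n hold 1..n-t ascending."""
    t = math.floor(math.log2(math.log2(n)))
    return list(range(n - t + 1, n + 1)) + list(range(1, n - t + 1))


def inversions(a: list[int]) -> int:
    """s(A): the number of pairs i < j with a[j] < a[i] (brute-force count)."""
    return sum(1 for i in range(len(a)) for j in range(i + 1, len(a)) if a[j] < a[i])
```

For $n = 16$ we get $t = 2$, the array `[15, 16, 1, 2, ..., 14]`, and $s(A) = 14 \cdot 2 = 28$, which lies between $16 \cdot 2 / 4 = 8$ and $16 \cdot 2 = 32$.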

Common mistakes:

1. In many cases the example given was wrong. Note that if $k$ elements are sorted from largest to smallest, the number of pairs among them that contribute to $s(A)$ is quadratic in $k$. In particular, if the first $\log\log n$ elements of the array are sorted from largest to smallest and the remaining elements are all larger than them and sorted from smallest to largest, then $s(A) = \Theta((\log\log n)^2)$, so $T_{IS}(A) = \Theta(n)$ rather than $\Theta(n \log\log n)$.


2. In some cases no explanation was given, or no concrete arrangement of the array was described as requested.

c. We denote by $T_{MS}(A)$ the running time of MergeSort on array A when MergeSort is called with $p = 1$ and $r = n$.

Do there exist constants $c_1$ and $c_2$ such that for every $n \ge 4$ there exists an array A of size n for which $c_1 n \log\log n \le T_{MS}(A) \le c_2 n \log\log n$?

c. No. The running time of MergeSort is of order $n \log n$ for every input array, since the structure of the recursion tree is the same for every input, and the running time of Merge is linear in the size of the sub-array on which it runs, for every input. This means that the recursion tree has $\log n$ levels, and in every level $j$ there are $2^j$ vertices, each corresponding to a run of Merge on a sub-array of size about $\frac{n}{2^j}$, so that every level contributes in total an order of $n$ operations, for any input. Hence $T_{MS}(A) = \Theta(n \log n)$ for every array $A$ of size $n$, and since $\frac{n \log n}{n \log\log n} \to \infty$, no constant $c_2$ satisfies $T_{MS}(A) \le c_2 n \log\log n$ for all $n \ge 4$.
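To illustrate that the Merge work does not depend on the input, here is a Python sketch of MergeSort (0-based, inclusive bounds) that returns the sum of $r - p + 1$ over all Merge calls, used here as a stand-in for Merge's linear cost. The name `merge_sort` and this cost measure are choices of the sketch.

```python
def merge_sort(a: list[int], p: int, r: int) -> int:
    """Sort a[p..r] in place and return the total r - p + 1 over all Merge calls."""
    if r <= p:
        return 0
    q = (p + r) // 2
    work = merge_sort(a, p, q) + merge_sort(a, q + 1, r)
    # Merge the sorted halves a[p..q] and a[q+1..r].
    left, right = a[p:q + 1], a[q + 1:r + 1]
    i = j = 0
    for k in range(p, r + 1):
        if j >= len(right) or (i < len(left) and left[i] <= right[j]):
            a[k] = left[i]
            i += 1
        else:
            a[k] = right[j]
            j += 1
    return work + (r - p + 1)
```

The returned value depends only on p and r, never on the values: for any array of 16 elements it is $4 \cdot 16 = 64 = n \log_2 n$ (four levels, each summing to 16), whether the input is sorted or reversed.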


Question 2

One of the goals when using a binary search tree is to keep it balanced, so that the distance between the root and the leaf farthest from it is not too large. Write pseudo-code for a procedure that accepts as input a pointer to the root of a binary search tree (containing at least one node) and returns the distance from the root to the farthest leaf in the tree (according to the definition of distance in a graph: the distance of the root from itself is 0, the distance of its children from it is 1, and so on). Assume that each node in the tree is represented by a struct containing the following fields: PAR (a pointer to the parent), VAL (the value in the node), LC (a pointer to the left child) and RC (a pointer to the right child).

In addition to the pseudo-code, briefly explain the idea behind the procedure and analyze its running time (in terms of Θ) as a function of the number of vertices in the tree, n.

The procedure works recursively. Given a pointer pnode to a vertex, it returns the maximal distance from the vertex to a leaf in its subtree. In particular, left_dist is the distance from the vertex to the farthest leaf in the subtree of its left child: if there is no left child, the value is 0; otherwise it is 1 plus the maximal distance from the left child to a leaf in its subtree (computed recursively). right_dist is defined similarly. In particular, if the vertex is a leaf, so that it has no children, the returned value is 0. The initial call is with pnode pointing to the root of the tree [calling it with NULL is an illegal input, since the smallest legal tree consists of a single node that is both the root and a leaf; in that case −1 is returned].

Max-leaf-dist(pnode) /* pnode is a pointer to a node */
{
  if (pnode = NULL)
    return(-1)
  if (pnode → LC != NULL)   /* compute distance to furthest leaf in subtree of left child */
    left_dist := 1 + Max-leaf-dist(pnode → LC)
  else left_dist := 0
  if (pnode → RC != NULL)   /* compute distance to furthest leaf in subtree of right child */
    right_dist := 1 + Max-leaf-dist(pnode → RC)
  else right_dist := 0
  return(max(left_dist,right_dist))
}

Exactly one call is performed for every vertex in the tree: the initial call for the root, and one recursive call from its (single) parent for every other vertex. The running time of each call, not counting the recursive calls it makes, is constant. Therefore the total running time is Θ(𝑛).
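The procedure can be written as the following minimal Python sketch. It assumes a hypothetical Node class whose attributes par, val, lc and rc stand for PAR, VAL, LC and RC, with None playing the role of NULL. bst_insert is a hypothetical helper used only to build example trees; it is not part of the question.

```python
from typing import Optional


class Node:
    """Hypothetical stand-in for the struct in the question: PAR, VAL, LC, RC."""

    __slots__ = ("par", "val", "lc", "rc")

    def __init__(self, val: int, par: "Optional[Node]" = None) -> None:
        self.par = par
        self.val = val
        self.lc = None
        self.rc = None


def max_leaf_dist(pnode: Optional[Node]) -> int:
    """Distance from pnode to the farthest leaf in its subtree (-1 for None)."""
    if pnode is None:
        return -1
    # Farthest leaf through the left child, or 0 if there is no left child.
    left_dist = 1 + max_leaf_dist(pnode.lc) if pnode.lc is not None else 0
    # The same for the right child.
    right_dist = 1 + max_leaf_dist(pnode.rc) if pnode.rc is not None else 0
    return max(left_dist, right_dist)


def bst_insert(root: Optional[Node], val: int) -> Node:
    """Plain (unbalanced) BST insertion, used here only to build test trees."""
    if root is None:
        return Node(val)
    cur = root
    while True:
        if val < cur.val:
            if cur.lc is None:
                cur.lc = Node(val, par=cur)
                return root
            cur = cur.lc
        else:
            if cur.rc is None:
                cur.rc = Node(val, par=cur)
                return root
            cur = cur.rc
```

Suppose the keys are inserted in the order 5, 3, 8, 1, 4, 9, 10. The farthest leaf is then 10, on the path 5 → 8 → 9 → 10, so the procedure returns 3.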


It is also possible to write slightly more "compact" code, with the same running time up to a constant factor, as follows. Note that if pnode points to a leaf, then the recursive calls on its children, which are NULL, return −1, and adding 1 gives 0 as required.

Max-leaf-dist(pnode) /* pnode is a pointer to a node */

{

if (pnode = NULL)

return(-1)

else return(1+ max(Max-leaf-dist(pnode → LC),Max-leaf-dist(pnode → RC)))

}
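The compact variant translates almost line for line into Python. This sketch assumes a hypothetical Node constructor that takes the children directly, so that small trees are easy to write down.

```python
from typing import Optional


class Node:
    """Hypothetical stand-in for the struct; children are passed to the constructor."""

    def __init__(self, val, lc=None, rc=None):
        self.par = None
        self.val = val
        self.lc = lc
        self.rc = rc
        for child in (lc, rc):
            if child is not None:
                child.par = self


def max_leaf_dist(pnode: Optional[Node]) -> int:
    # A NULL child contributes -1, so a leaf gets 1 + max(-1, -1) = 0.
    if pnode is None:
        return -1
    return 1 + max(max_leaf_dist(pnode.lc), max_leaf_dist(pnode.rc))
```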

Common mistakes:

1. Returning a value that is larger by 1 than the height of the tree (which is the required value), for example by counting the vertices on the longest path instead of its edges.

2. In a few cases the computation was done using BFS. Such an answer was accepted provided that the code took into account that the graph is a tree. This could be shown, for example, by stating that the elements entered into the queue next are the children of the current vertex in the tree. It could also be shown by noting that here there is no predefined numbering of the vertices. This is unlike the conditions under which we saw BFS in class, in particular that the vertices are {1, 2, … , 𝑛} and the graph is represented by an array 𝐿 of size 𝑛 in which 𝐿[𝑖] points to a linked list of the neighbors of vertex 𝑖.
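The BFS alternative mentioned in item 2 can be sketched as follows, assuming a hypothetical Node class with lc and rc attributes. The queue is filled only through the child pointers, which is the tree-specific adaptation required above. Since every vertex is reached exactly once, from its parent, no visited marks or vertex numbering are needed. The index of the last nonempty level is the distance of the farthest leaf.

```python
from collections import deque
from typing import Optional


class Node:
    def __init__(self, val, lc=None, rc=None):
        self.val, self.lc, self.rc = val, lc, rc


def max_leaf_dist_bfs(root: Optional[Node]) -> int:
    """Level-by-level BFS that follows only the LC/RC pointers."""
    if root is None:
        return -1
    depth = -1
    frontier = deque([root])
    while frontier:
        depth += 1                      # every node now in the queue is at this depth
        for _ in range(len(frontier)):  # remove exactly one level
            node = frontier.popleft()
            for child in (node.lc, node.rc):
                if child is not None:   # children are the only "neighbors" followed
                    frontier.append(child)
    return depth                        # depth of the last (deepest) level
```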


Question 3

As a reminder, in the data structure we used for 2-3 trees, each vertex of the tree is represented by a record with the following fields:
PAR (a pointer to the parent),
VAL (the value in the leaf),
LC (a pointer to the left child),
MC (a pointer to the middle child),
RC (a pointer to the right child),
MIN1 (the minimal value in the left subtree),
MIN2 (the minimal value in the middle subtree),
MIN3 (the minimal value in the right subtree, if it exists).

As another reminder, the field VAL is relevant only for leaves, and the fields LC, MC, RC, MIN1, MIN2 and MIN3 are relevant only for inner vertices.

Assume that each record (vertex) also has another field, SIZE, containing the number of leaves in the subtree rooted at that vertex. We denote by 𝑛 the number of leaves in the tree (the number of elements in the set that the tree represents). Assume also that 𝑛 ≥ 2.

Given a 2-3 tree represented by the above data structure, complete the following pseudo-code so that the procedure can be called with pnode pointing to the root and a value 1 ≤ 𝑘 ≤ 𝑛. The procedure should then return the 𝑘-th smallest value among the values stored in the leaves of the tree (that is, the 𝑘-th element when the values are sorted from smallest to largest). Alternatively, you may write your own pseudo-code (for a procedure that is as efficient as possible) on the other side of the page.

Rank-2-3(pnode,k) /* pnode is a pointer to a node of a 2-3-tree with the additional SIZE field,

and 1 ≤ 𝑘 ≤ 𝑝𝑛𝑜𝑑𝑒 → 𝑆𝐼𝑍𝐸 */

{

if (pnode → SIZE = 1)

return( ____________________________________________________)

if (k ≤__________________________________________________________________)

return (____________________________________ __________________________)

else if (k ≤_______________________________________________________________)

return (_______________________________________________________________)

else

return (_______________________________________________________________)

}

Briefly explain the algorithm and analyze its running time, that is, find 𝑔(𝑛) such that the running time is Θ(𝑔(𝑛)).

_____________________________________________________________________________________________


Rank-2-3(pnode,k) /* pnode is a pointer to a node of a 2-3-tree with the additional SIZE field,

and 1 ≤ 𝑘 ≤ 𝑝𝑛𝑜𝑑𝑒 → 𝑆𝐼𝑍𝐸 */

{

if (pnode → SIZE = 1) /* Reached a leaf whose value should be returned (note that k must be 1) */

return(pnode → VAL)

if (k ≤ (pnode → LC) → SIZE) /* The element must be in the subtree of the left child, and its relative rank remains k */

return (Rank-2-3(pnode → LC, k))

else if (k ≤ (pnode → LC) → SIZE + (pnode → MC) → SIZE) /* The element must be in the subtree of the middle child, and its relative rank in this subtree is k minus the number of leaves in the subtree of the left ("small") child */

return (Rank-2-3(pnode → MC, k - (pnode → LC) → SIZE))

else /* The element must be in the subtree of the right child, and its relative rank in this subtree is k minus the total number of leaves in the subtrees of the left and middle children */

return (Rank-2-3(pnode → RC, k - (pnode → LC) → SIZE - (pnode → MC) → SIZE))

}

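The procedure performs a constant amount of work at each vertex on the path from the root down to a leaf, and it makes a single recursive call per level. All the leaves of a 2-3 tree are at the same depth, and the height of a 2-3 tree with 𝑛 leaves is Θ(log(𝑛)). Therefore the running time is linear in the height of the tree, that is, Θ(log(𝑛)).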
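Rank-2-3 can be written as the following minimal Python sketch. It assumes a hypothetical Node23 class in which lc, mc, rc, val and size stand for LC, MC, RC, VAL and SIZE. The MIN fields are omitted since the procedure does not use them. Like the pseudo-code, it assumes that a vertex with two children stores them in LC and MC.

```python
class Node23:
    """Hypothetical 2-3 tree vertex; a leaf if built without children."""

    def __init__(self, val=None, children=()):
        self.par = None
        self.val = val                              # relevant only for leaves
        kids = list(children) + [None] * (3 - len(children))
        self.lc, self.mc, self.rc = kids            # two children -> LC and MC
        for child in children:
            child.par = self
        # SIZE: number of leaves in the subtree rooted at this vertex
        self.size = 1 if not children else sum(c.size for c in children)


def rank_2_3(pnode, k):
    """k-th smallest leaf value in the subtree of pnode (1 <= k <= pnode.size)."""
    if pnode.size == 1:                             # a leaf, so k must be 1
        return pnode.val
    left = pnode.lc.size
    if k <= left:                                   # rank stays k
        return rank_2_3(pnode.lc, k)
    middle = pnode.mc.size
    if k <= left + middle:                          # skip the left subtree's leaves
        return rank_2_3(pnode.mc, k - left)
    return rank_2_3(pnode.rc, k - left - middle)    # skip left and middle leaves
```

As an example, take three inner vertices holding the leaves (2, 5, 7), (11, 13) and (17, 19) under one root. The calls for k = 1, …, 7 return the leaves in sorted order, and each call descends one level per recursive step.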

Common mistakes:

Most of the students solved this question correctly; however, a number of mistakes recurred:

1. The most common mistake was not updating the value of 𝑘 in the recursive calls. In addition, some students did not understand the meaning of 𝑠𝑖𝑧𝑒 and compared 𝑘 with the minimal value in one of the branches (for example, (pnode → LC) → min).

2. Some students wrongly assumed that the tree always splits evenly into thirds. This led to comparing 𝑘 with 𝑠𝑖𝑧𝑒/3, or to recursive calls with values such as 2 ∙ 𝑠𝑖𝑧𝑒/3 in the place where 𝑘 is supposed to be.


Question 4 (15 points)

Apply one iteration of the Ford-Fulkerson algorithm to the network shown in (1), starting from the marked flow. In other words:
draw the residual network including its residual capacities (in (2));
find an augmenting path (any one);
update the flow (in (3)).

Does the flow you obtain have maximal value? If your answer is "yes", prove it by marking a cut (S, T) for which |𝑓| = 𝑐(𝑆, 𝑇). If your answer is "no", show how to obtain a flow with a higher value.

As a reminder, the label 𝑥/𝑦 on an edge (𝑢, 𝑣) means that 𝑐(𝑢, 𝑣) = 𝑦 and 𝑓(𝑢, 𝑣) = 𝑥. If only one number is written, it is the capacity and the flow is 0. If there is no edge from 𝑢 to 𝑣 then 𝑐(𝑢, 𝑣) = 0, and always 𝑓(𝑢, 𝑣) = −𝑓(𝑣, 𝑢).

[Figure: (1) Network G and flow f; (2) Residual network 𝑮𝒇; (3) Network G and new flow f]


The augmenting path from 𝑠 to 𝑡 in the residual network on the left is shown in bold.

The flow obtained is maximal. The capacity of the marked cut is 𝑐(𝑠, 𝑎) + 𝑐(𝑠, 𝑒) + 𝑐(𝑖, 𝑡) = 3 + 2 + 3 = 8, and this is also the value of the flow. The nodes of 𝑆 = {𝑠, 𝑐, ℎ, 𝑖} in the cut (marked in red) are the nodes reachable from 𝑠 in the residual graph shown on the left.

[Figure: the solution, showing (1) network G and flow f, (2) the residual network 𝑮𝒇 with the augmenting path, and (3) network G and the new flow f with the cut (S, T) marked; vertices s, a, b, c, d, e, g, h, i, t]
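The network figure for this question did not survive extraction, so the sketch below runs one Ford-Fulkerson iteration on a small hypothetical network instead. That network has vertices s, a, b, t, made-up capacities and a made-up initial flow.

The sketch follows the conventions of the question: a missing edge has capacity 0, and the flow is stored skew-symmetrically, 𝑓(𝑣, 𝑢) = −𝑓(𝑢, 𝑣). The augmenting path is found by BFS, which is one valid choice of "any" path. After augmenting, the set of vertices reachable from s in the residual network gives a cut (S, T) that certifies maximality. All function names are hypothetical.

```python
from collections import deque


def residual(cap, flow, u, v):
    """Residual capacity c(u, v) - f(u, v); a missing edge has capacity 0."""
    return cap.get((u, v), 0) - flow.get((u, v), 0)


def skew_flow(positive):
    """Store every f(u, v) together with f(v, u) = -f(u, v)."""
    flow = {}
    for (u, v), x in positive.items():
        flow[(u, v)] = flow.get((u, v), 0) + x
        flow[(v, u)] = flow.get((v, u), 0) - x
    return flow


def reachable(cap, flow, s):
    """BFS in the residual network G_f; returns a parent map of reached nodes."""
    nodes = sorted({x for edge in cap for x in edge})
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in nodes:
            if v not in parent and residual(cap, flow, u, v) > 0:
                parent[v] = u
                queue.append(v)
    return parent


def augment_once(cap, flow, s, t):
    """One Ford-Fulkerson iteration; returns the amount pushed (0 if none)."""
    parent = reachable(cap, flow, s)
    if t not in parent:
        return 0                          # no augmenting path: f is maximal
    path, v = [], t
    while parent[v] is not None:
        path.append((parent[v], v))
        v = parent[v]
    delta = min(residual(cap, flow, u, w) for u, w in path)   # bottleneck
    for u, w in path:
        flow[(u, w)] = flow.get((u, w), 0) + delta
        flow[(w, u)] = flow.get((w, u), 0) - delta
    return delta


def flow_value(flow, s):
    """|f| = sum over v of f(s, v)."""
    return sum(x for (u, _), x in flow.items() if u == s)


def cut_capacity(cap, S):
    """c(S, T): total capacity of edges leaving S."""
    return sum(c for (u, v), c in cap.items() if u in S and v not in S)
```

Consider the hypothetical network with 𝑐(𝑠, 𝑎) = 3, 𝑐(𝑠, 𝑏) = 2, 𝑐(𝑎, 𝑏) = 1, 𝑐(𝑎, 𝑡) = 2, 𝑐(𝑏, 𝑡) = 3. Start from a flow of value 4, consisting of 2 units on s→a→t and 2 units on s→b→t. One iteration pushes 1 unit along s→a→b→t. Afterwards only s is reachable in the residual network, and the cut ({s}, {a, b, t}) has capacity 5, equal to the new flow value.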


Question 5 (35 points)

Let $S = S_1 \dots S_t$ be a DNA string, and let 𝐼 be an array of 𝑛 ≥ 2 distinct indices, sorted from smallest to largest, such that 𝐼[1] = 1 and 𝐼[𝑛] = 𝑡 + 1.

There is a (biological) segmentation process that works as follows. Given a substring $S_i \dots S_k$ and an index 𝑗 with 𝑖 < 𝑗 ≤ 𝑘, it divides the substring into the two substrings $S_i \dots S_{j-1}$ and $S_j \dots S_k$. The cost of the division is $\max(j - i,\, k - j + 1)$, that is, the maximum of the lengths of the two resulting substrings.

We want to divide 𝑆 into the 𝑛 − 1 substrings defined by the array 𝐼, namely $S^1 = S_{I[1]} \dots S_{I[2]-1}$, $S^2 = S_{I[2]} \dots S_{I[3]-1}$, …, $S^{n-1} = S_{I[n-1]} \dots S_{I[n]-1}$ (recall that 𝐼[1] = 1 and 𝐼[𝑛] = 𝑡 + 1). The division should be done by a sequence of divisions whose total cost is minimal.

For example, let 𝑡 = 10 and 𝐼 = [1, 3, 6, 11] (so 𝑛 = 4). The final division should then be into the three substrings $S^1 = S_1 \dots S_2$, $S^2 = S_3 \dots S_5$ and $S^3 = S_6 \dots S_{10}$.

Suppose the order of the divisions is (from left to right) 𝐼[2], 𝐼[3]:
1. First we divide 𝑆 at 𝐼[2] = 3 into the two substrings $S_1 \dots S_2$ and $S_3 \dots S_{10}$, at cost max(2, 8) = 8.
2. Then we divide the second substring at 𝐼[3] = 6 into $S_3 \dots S_5$ and $S_6 \dots S_{10}$, at cost max(3, 5) = 5.

Therefore the total cost is 8 + 5 = 13.

[Figure: the DNA string 𝑆 with positions 1 to 10, the array 𝐼 = [1, 3, 6, 11], the final substrings 𝑆¹, 𝑆², 𝑆³, and the two divisions above with costs 8 and 5]


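If instead the order is 𝐼[3], 𝐼[2]:
1. First we divide 𝑆 at 𝐼[3] = 6 into $S_1 \dots S_5$ and $S_6 \dots S_{10}$, at cost max(5, 5) = 5.
2. Then we divide the first substring at 𝐼[2] = 3 into $S_1 \dots S_2$ and $S_3 \dots S_5$, at cost max(2, 3) = 3.

Therefore the total cost is 5 + 3 = 8.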
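The cost of a particular order of divisions can be checked mechanically. The sketch below is a hypothetical helper (order_cost, not part of the exam). It keeps the division points already applied, finds the piece containing the next point, and charges the larger of the two new lengths. It assumes that the order lists positions 𝑞 (with 2 ≤ 𝑞 ≤ 𝑛 − 1) of the array 𝐼, and that 𝐼 is passed as an ordinary Python list that is padded internally so that indices are 1-based.

```python
import bisect


def order_cost(I, order):
    """Total cost of dividing at I[q] for each q in `order` (1-based q)."""
    A = [None] + list(I)                # A[q] == I[q], as in the question
    n = len(I)
    cuts = [1, n]                       # positions that currently delimit pieces
    total = 0
    for q in order:
        i = bisect.bisect_left(cuts, q)
        p, r = cuts[i - 1], cuts[i]     # I[q] splits the piece S_{I[p]}..S_{I[r]-1}
        total += max(A[q] - A[p], A[r] - A[q])   # lengths of the two new pieces
        bisect.insort(cuts, q)
    return total
```

order_cost([1, 3, 6, 11], [2, 3]) gives 13, and order_cost([1, 3, 6, 11], [3, 2]) gives 8. These are the two totals computed above.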

We first want to compute the minimal cost of a sequence of divisions defined by the array 𝐼, and then to find a sequence of divisions that achieves this minimal cost. For each pair 1 ≤ 𝑝 < 𝑟 ≤ 𝑛, let 𝐶(𝑝, 𝑟) be the minimal cost of dividing the substring $S_{I[p]} \dots S_{I[r]-1}$ into the 𝑟 − 𝑝 substrings $S_{I[p]} \dots S_{I[p+1]-1},\ S_{I[p+1]} \dots S_{I[p+2]-1},\ \dots,\ S_{I[r-1]} \dots S_{I[r]-1}$.

You must explain your answers clearly for each of the following sub questions.

a. What is 𝐶(𝑝, 𝑝 + 1) for each 𝑝?

_____________________________________________________________________________________________

a. By its definition in the question, 𝐶(𝑝, 𝑟) is the minimal cost of dividing the substring $S_{I[p]} \dots S_{I[r]-1}$. When 𝑟 = 𝑝 + 1, this is a single substring that is not divided any further, and therefore 𝐶(𝑝, 𝑝 + 1) = 0.

Common mistakes:

Some students understood that for 𝑟 = 𝑝 + 1 we get a single substring, and then claimed that its cost is 𝐶(𝑝, 𝑝 + 1) = 𝐼[𝑝 + 1] − 𝐼[𝑝]. No division takes place here, so there is no division cost. Even so, no points were deducted for such an answer, since it can be seen as a legitimate interpretation.

Others understood that we get a single substring but claimed that the cost is 1. This did not come from understanding that there is a single string. It came from a wrong interpretation and a confusion between the number of substrings and their length.

[Figure: dividing 𝑆 first at I(3) = 6 (cost 5), and then dividing the first substring at I(2) = 3 (cost 3)]


b. Explain why the following equation holds for every pair (𝑝, 𝑟) with 𝑟 > 𝑝 + 1:

$$C(p,r) = \min_{p<q<r}\left\{\max(I[q]-I[p],\, I[r]-I[q]) + C(p,q) + C(q,r)\right\}$$

_____________________________________________________________________________________________

b. To determine the optimal cost of dividing the substring $S_{I[p]} \dots S_{I[r]-1}$ (into 𝑟 − 𝑝 > 1 substrings), we examine all the possibilities for the first division.

For every 𝑞 between 𝑝 + 1 and 𝑟 − 1, the first division may be into $S_{I[p]} \dots S_{I[q]-1}$ and $S_{I[q]} \dots S_{I[r]-1}$. The cost of this division is the maximum of the lengths of the two resulting substrings, namely $\max(I[q]-I[p],\, I[r]-I[q])$.

Now fix a specific choice 𝑞 for the first division. An optimal continuation divides each of the two resulting substrings optimally, and its cost is the sum of two terms:
𝐶(𝑝, 𝑞), the optimal cost of dividing $S_{I[p]} \dots S_{I[q]-1}$;
𝐶(𝑞, 𝑟), the optimal cost of dividing $S_{I[q]} \dots S_{I[r]-1}$.
The two substrings can be handled independently, since every later division acts on only one of them.

To get the overall optimum, we take the minimum of the sum of these three terms over all choices of the first division, that is, over all 𝑞 with 𝑝 < 𝑞 < 𝑟.
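As a check, the recurrence can be evaluated by hand on the example above (𝑡 = 10, 𝐼 = [1, 3, 6, 11], so 𝑛 = 4), using 𝐶(𝑝, 𝑝 + 1) = 0 from item a. The two choices of 𝑞 for 𝐶(1, 4) correspond exactly to the two orders of divisions discussed in the question:

$$
\begin{aligned}
C(2,4) &= \max(I[3]-I[2],\, I[4]-I[3]) + C(2,3) + C(3,4) = \max(3,5) + 0 + 0 = 5,\\
C(1,3) &= \max(I[2]-I[1],\, I[3]-I[2]) + C(1,2) + C(2,3) = \max(2,3) + 0 + 0 = 3,\\
C(1,4) &= \min\Big\{\underbrace{\max(2,8) + C(1,2) + C(2,4)}_{q=2},\ \underbrace{\max(5,5) + C(1,3) + C(3,4)}_{q=3}\Big\} = \min\{13,\, 8\} = 8.
\end{aligned}
$$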

Common mistakes:

Quite a few students wrote confusing or unclear answers. In general we were lenient about this. However, in some cases the lack of clarity was severe and showed a misunderstanding, and points were deducted for that.

Some students claimed that the minimum is taken only over the first part of the expression (the maximum of the lengths). They then added 𝐶(𝑝, 𝑞) + 𝐶(𝑞, 𝑟) for the 𝑞 that attains this minimum. This is a severe mistake, since one must choose the 𝑞 that minimizes the sum of all three terms.

c. Write pseudo-code for a procedure that runs in time polynomial in 𝑛 and computes the minimal cost of a sequence of divisions given by the array 𝐼.

_____________________________________________________________________________________________

c.

DNA_Segmentation(input: integer n, array of n integers I[])

/* Procedure that returns the optimal solution for the segmentation of a DNA strand. The matrix

C is assumed to be a global matrix (as well as the matrix Break). */

{

if (n<3)

return(0)

for (p= 1 to n-1) /*Initialization*/

for (r = 1 to n) /* (Could also run just from r=p+1 to n) */

if( r=p+1 )

C[p][r] := 0 /* By Item A in question */

else

C[p][r] := -1 /* Mark as uncomputed */

DNA_Segmentation_Mem(1,n,I) /* Computes C[1][n] by memoization */

Print_Best_Segmentation(1,n,I,Break) /* (for last item in the question) */

return(C[1][n])

}


DNA_Segmentation_Mem(p,r,I) /* Computes C[p][r] recursively, storing computed values (memoization) */

{

min_cost := ∞ /* ∞ stands for a cost larger than any possible segmentation cost, e.g., n·t (there are at most n-2 divisions, each costing at most t). Alternatively, can initialize with the cost for q=p+1 */

for (q = p+1 to r-1) /* Computes cost for best first breaking point I[q] */

{

if (C[p][q] = -1) /* if C[p][q] has not been computed yet*/

DNA_Segmentation_Mem (p,q,I) /* Computes and fills in C[p][q] */

if (C[q][r] = -1) /* if C[q][r] has not been computed yet*/

DNA_Segmentation_Mem (q,r,I) /* Computes and fills in C[q][r] */

new_cost := max(I[q] – I[p], I[r] – I[q]) + C[p][q] + C[q][r]

if (new_cost < min_cost) { /* Found better initial segmentation */

min_cost := new_cost

best_break := q /* (for last item in the question)

Can also just write Break[p][r]:=q*/

}

}

C[p][r] := min_cost

Break[p][r] := best_break /* (for last item in the question) */

}
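The memoized version can be sketched in Python as follows. The sketch assumes that 𝐼 is passed as an ordinary Python list holding 𝐼[1..𝑛]. The name dna_segmentation and the padding list A are hypothetical, introduced so that the indices match the pseudo-code. The Break bookkeeping is left out here.

```python
import math


def dna_segmentation(I):
    """Minimal total division cost, computed top-down with memoization."""
    A = [None] + list(I)                 # A[q] == I[q] for q = 1..n
    n = len(I)
    if n < 3:                            # a single substring: nothing to divide
        return 0
    UNCOMPUTED = -1
    C = [[UNCOMPUTED] * (n + 1) for _ in range(n + 1)]
    for p in range(1, n):
        C[p][p + 1] = 0                  # item a: C(p, p+1) = 0

    def mem(p, r):
        min_cost = math.inf
        for q in range(p + 1, r):        # candidate first breaking point I[q]
            if C[p][q] == UNCOMPUTED:
                mem(p, q)
            if C[q][r] == UNCOMPUTED:
                mem(q, r)
            new_cost = max(A[q] - A[p], A[r] - A[q]) + C[p][q] + C[q][r]
            min_cost = min(min_cost, new_cost)
        C[p][r] = min_cost

    mem(1, n)
    return C[1][n]
```

For 𝐼 = [1, 3, 6, 11] it returns 8, matching the cheaper order in the example.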

It is also possible to solve the problem bottom-up, as follows:

DNA_Segmentation_BU(n,I)

{

for (p=1 to n-1) /* Initialization */

C[p][p+1] := 0

for (d=2 to n-1) { /* d is difference between p and r */

for (p=1 to n-d) { /* so that r = p+d ≤ n */

r := p+d

min_cost := ∞ /* Can also initialize with cost on q=p+1 */

for (q=p+1 to r-1) { /* Compute cost for best initial segmentation */

new_cost := max(I[q] – I[p], I[r] – I[q]) + C[p][q] + C[q][r]

if (new_cost < min_cost) /* Found better initial segmentation */

min_cost := new_cost

} /* End `for’ on q */

C[p][r] := min_cost

} /* End `for’ on p */

} /* End `for’ on d */

return(C[1][n])

}

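Moreover, it is possible to perform the bottom-up computation in another (correct) order: 𝑝 runs from larger to smaller values in the outer loop, and 𝑟 runs from smaller to larger values in the inner loop. In this order, 𝐶[𝑝][𝑞] (for 𝑞 < 𝑟) and 𝐶[𝑞][𝑟] (for 𝑞 > 𝑝) are always computed before 𝐶[𝑝][𝑟].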
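Both bottom-up orders can be sketched in Python as follows. The sketch assumes the same 1-based padding of 𝐼, and the function names are hypothetical. The first function uses the loop on 𝑝 with the bound 𝑝 ≤ 𝑛 − 𝑑, which keeps 𝑟 = 𝑝 + 𝑑 within the table. The second uses the alternative order described above.

```python
def dna_segmentation_bu(I):
    """Bottom-up by increasing d = r - p."""
    A = [None] + list(I)
    n = len(I)
    C = [[0] * (n + 1) for _ in range(n + 1)]       # C[p][p+1] = 0 (item a)
    for d in range(2, n):
        for p in range(1, n - d + 1):               # keeps r = p + d <= n
            r = p + d
            C[p][r] = min(max(A[q] - A[p], A[r] - A[q]) + C[p][q] + C[q][r]
                          for q in range(p + 1, r))
    return C[1][n]


def dna_segmentation_bu_alt(I):
    """Alternative order: p from large to small, r from small to large."""
    A = [None] + list(I)
    n = len(I)
    C = [[0] * (n + 1) for _ in range(n + 1)]
    for p in range(n - 2, 0, -1):
        for r in range(p + 2, n + 1):
            # C[p][q] (q < r) was filled earlier in this inner loop, and
            # C[q][r] (q > p) in an earlier iteration of the outer loop.
            C[p][r] = min(max(A[q] - A[p], A[r] - A[q]) + C[p][q] + C[q][r]
                          for q in range(p + 1, r))
    return C[1][n]
```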


Common mistakes:

1. Not returning 𝐶[1][𝑛] at the end of the bottom-up version.

2. Calling the recursive procedure incorrectly in the memoization version (that is, with wrong parameters).

3. Loop bounds that are not stated explicitly (in particular for the loop on 𝑞).

4. Missing or wrong initialization of min_cost (for example, initializing it to 0).

5. A bottom-up computation in a wrong order, which uses entries of the matrix that have not yet been computed.

6. A recursive version that does not use memoization; the running time in this case is exponential.

7. Severe confusion in the use of indices, suggesting an attempt to adapt the solution of a different problem to the current one without real understanding.

8. Simply writing the formula instead of computing the minimum with a loop over 𝑞 (between 𝑝 + 1 and 𝑟 − 1); very few did this. There were also (very few) students who wrote the loop but still left a "min" without explaining what the minimum is taken over.

d. Give the best possible upper bound on the running time of the procedure you wrote.

_____________________________________________________________________________________________

d.

For the memoization version:

The initialization loop takes Θ(𝑛²) time, since it consists of two nested for loops.

Before each call to DNA_Segmentation_Mem with the pair 𝑝, 𝑞 (or 𝑞, 𝑟), we check that 𝐶[𝑝][𝑞] (respectively 𝐶[𝑞][𝑟]) has not yet been computed. Hence there is at most one call to the procedure for every such pair.

The running time of the procedure with parameters 𝑝, 𝑟, not counting recursive calls, is linear in 𝑟 − 𝑝. This is at most 𝑛, since only calls with 𝑟 > 𝑝 are made, according to the bounds of the for loop.

The total running time is therefore bounded from above by the number of pairs, 𝑛², times the running time for each pair, which is at most 𝑐𝑛 for some constant 𝑐. This gives 𝑂(𝑛³). A somewhat more exact computation gives an order of magnitude of:

$$\sum_{p=1}^{n-1}\sum_{r=p+1}^{n}(r-p) = \sum_{l=1}^{n-1}\sum_{d=1}^{n-l} d \le \sum_{l=1}^{n}\sum_{d=1}^{n} d = \sum_{l=1}^{n}\frac{n(n+1)}{2} = O(n^3)$$

where we substituted 𝑙 = 𝑝 and 𝑑 = 𝑟 − 𝑝.


For the bottom-up version we get the same bound more simply: there are 3 nested loops, each running at most 𝑛 iterations.
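A short Python sketch can also show that the 𝑂(𝑛³) bound cannot be improved for this algorithm. The hypothetical helper inner_iterations counts how many times the body of the 𝑞 loop of the bottom-up version runs, using the loop on 𝑝 with the bound 𝑝 ≤ 𝑛 − 𝑑. Every triple 𝑝 < 𝑞 < 𝑟 is visited exactly once, so the count is $\binom{n}{3} = n(n-1)(n-2)/6 = \Theta(n^3)$.

```python
from math import comb


def inner_iterations(n):
    """Count executions of the q-loop body in the bottom-up version."""
    count = 0
    for d in range(2, n):              # d = r - p
        for p in range(1, n - d + 1):
            count += d - 1             # q runs over p+1, ..., p+d-1
    return count


def closed_form(n):
    """Number of triples p < q < r in {1, ..., n}: n(n-1)(n-2)/6."""
    return comb(n, 3)
```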

e. Show an addition to the pseudo-code you wrote that also finds the optimal sequence of divisions (and not just its cost), writing another procedure if needed. The addition must not change the asymptotic running time of the algorithm.

In other words, the output of the procedure is a sequence of triples $(i_1, j_1, k_1), (i_2, j_2, k_2), \dots, (i_{n-2}, j_{n-2}, k_{n-2})$. The triple $(i_x, j_x, k_x)$ means that the 𝑥-th division splits the substring from position $i_x$ to position $k_x$ into $S_{i_x} \dots S_{j_x - 1}$ and $S_{j_x} \dots S_{k_x}$. In particular, for the first division, $i_1 = 1$ and $k_1 = t$. For the example given in the question, the output would be (from left to right): (1,6,10), (1,3,5).

_____________________________________________________________________________________________

e.

The additions to the code are the lines of sub-question c marked with the comment "(for last item in the question)". To print the optimal segmentation we use the following procedure:

Print_Best_Segmentation(p,r,I,Break) /*Prints the optimal segmentation*/

{

if (r < p+2) return /* No further segmentation needed */

q := Break[p][r] /* q is index of best initial segmentation point I[q] */

Print(I[p],I[q],I[r]-1)

Print_Best_Segmentation(p,q,I,Break) /*Recursively print segmentation of first substring*/

Print_Best_Segmentation(q,r,I,Break) /*Recursively print segmentation of second substring*/

}
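The reconstruction step can be sketched in Python as follows, assuming the same 1-based padding of 𝐼. Instead of using memoization, the sketch fills C and a Break table bottom-up. It then mirrors Print_Best_Segmentation, collecting the triples in a list instead of printing them. best_segmentation and collect are hypothetical names.

```python
def best_segmentation(I):
    """Return the optimal cost and the divisions as triples (I[p], I[q], I[r]-1)."""
    A = [None] + list(I)
    n = len(I)
    C = [[0] * (n + 1) for _ in range(n + 1)]
    brk = [[0] * (n + 1) for _ in range(n + 1)]     # plays the role of Break
    for d in range(2, n):
        for p in range(1, n - d + 1):
            r = p + d
            # min over (cost, q) pairs; ties are broken by the smaller q
            C[p][r], brk[p][r] = min(
                (max(A[q] - A[p], A[r] - A[q]) + C[p][q] + C[q][r], q)
                for q in range(p + 1, r))
    out = []

    def collect(p, r):                  # mirrors Print_Best_Segmentation
        if r < p + 2:                   # no further segmentation needed
            return
        q = brk[p][r]
        out.append((A[p], A[q], A[r] - 1))
        collect(p, q)
        collect(q, r)

    collect(1, n)
    return C[1][n], out
```

For 𝐼 = [1, 3, 6, 11] it returns (8, [(1, 6, 10), (1, 3, 5)]), which is the output given in the question. When several values of 𝑞 attain the minimum, this sketch keeps the smallest one. Any minimizer yields an optimal sequence.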

Common mistakes:

1. Missing or wrong stop condition.

2. Wrong order of operations (in particular, placing the Print line of the above code at the end, after the recursive calls).

3. Printing (𝑝, 𝑞, 𝑟) instead of (𝐼[𝑝], 𝐼[𝑞], 𝐼[𝑟] − 1).