Data Structures and Algorithms
Good Luck
Vaad Handasa
ID:
Tel Aviv University
The Faculty of Engineering
Moed B 2013 in Data Structures and Algorithm
Exam Time: 3 Hours Date: 82.03.2013
Teacher: Dana Ron
Instructions:
You are allowed to use four pages of study aids. You cannot use any sort of computer.
The exam includes 5 questions. The grade for each question appears in brackets.
Write your answers on the exam form in the place designated for them. It is highly
recommended to first write your answer on the draft paper that you have received and
only later to copy it, in a clear and readable way, to the exam form. Explain briefly
yet clearly all your claims. A claim without an explanation will not be accepted.
You may use theorems and algorithms which have been learned in class and in the
recitations or which have appeared in the homework assignments that you did. In such
cases you may cite what was learned with no need for a proof. On the other hand, if you
are using a slightly different version of an algorithm or some other analysis you must
explain precisely what the differences are.
In this exam there are 9 pages (including this one). Please make sure you have them all in
your possession.
Don't forget to write your ID number at the marked place.
GOOD LUCK!
Question 1 (71 points)
Consider the following algorithm, which takes as input an array $A$ of $n$ numbers and two
indices $1 \le p, r \le n$. The running time of the procedure Proc1(A,p,r) on any array is
$c \cdot s \cdot \log\log(s)$, where $c$ is a constant and $s = r - p + 1$ is the size of the
sub-array $A[p, \dots, r]$, and the running time of Proc2(A,p,r) on any array is $c' \cdot s$,
where $c'$ is a constant.
Alg(A,n,p,r)
{
   if (r-p+1 ≤ n^(1/4))
      Proc1(A,p,r)
   else {
      t := (r-p+1)/log(n)
      for (i = 1 to log(n))
         Alg(A, n, p+(i-1)t, p+it-1)
      Proc2(A,p,r)
   }
}
We denote by $T_A(n)$ the running time of Alg when it is called with $p = 1$ and $r = n$.
What is $g(n)$ such that $T_A(n) = \Theta(g(n))$?
Instruction: draw the recursion tree of the algorithm. What is the height of the recursion tree? What
does each level, except the leaves, contribute to the running time? What is the contribution of the leaves?
As a reminder, $\log_a b = \frac{\log_2 b}{\log_2 a}$. You may assume, for simplicity, that $\log(n)$ is
an integer, that $\frac{r-p+1}{\log(n)}$ is always an integer, and that the recursion always stops when
$r - p + 1 = n^{1/4}$.
Recommendation: denote $k = n^{1/4}$ and $d = \log(n)$, and first analyze the running time as a function
of $n$, $d$, $k$.
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
Question 2 (72 points)
As a reminder, a maximal priority queue is defined like a minimal priority queue, except
that it supports the operation DeleteMax instead of DeleteMin. We assume that the elements in the
priority queue are numbers (not necessarily distinct) and that DeleteMax deletes the element with
the maximal value from the queue and returns it.
a. Explain what should be changed in the definition of the partially ordered binary tree so that it fits
the maximal priority queue, and explain in words how a DeleteMax operation is performed on the
partially ordered binary tree (as described in class for DeleteMin).
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
b. Write pseudocode for the procedure Insert when the partially ordered binary tree is represented by a
heap $P$ (that is, a record with two fields: P.size, which holds the number of elements in the
priority queue, and P.T, an array of size $MAXSIZE$ that holds the elements of the queue). The
running time of Insert should be $\Theta(\log(n))$, where $n$ is the number of elements in the queue.
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
Question 3 (75 points)
a. We are interested in a procedure that takes as input a pointer to the root of a binary search
tree $T$ with at most $N$ nodes, and an array $A$ of size $N$ (which initially holds no information).
The procedure writes the values in the tree $T$, sorted from smallest to largest, into the array $A$
(in consecutive positions) and returns the number $n$ of values written to the array (the
number of nodes in the tree). In other words, when the procedure ends,
$A[1], A[2], \dots, A[n]$ hold the values of the tree $T$, sorted from smallest to largest. Complete the
following pseudocode for this procedure.
Tree-to-array(pnode,A,i) /* Initially call the procedure with pnode = pointer to root of tree, i=1 */
{
   if (pnode = NULL)
      return( ____________ )
   s1 := __________________________
   A[i+s1] := ________________________
   s2 := ____________________________
   return( _________________ )
}
b. Complete the following pseudocode for the procedure Array-to-tree(A,n), which takes as input
an array $A$ and an integer $n$ such that the numbers in $A[1], \dots, A[n]$ are distinct
and sorted from smallest to largest. The procedure returns a pointer to the root of
a binary search tree that contains the numbers in the array and whose height is $O(\log(n))$.
Explain why this is the height of the tree.
Array-to-tree(A,n)
{
   return(Array-to-tree-req(A,1,n,NULL))
}
Array-to-tree-req(A,p,r,parpnode)
{
   create node and let pnode be a pointer to it
   q := (p+r)/2
   pnode->VAL := ___________________
   pnode->LC := NULL; pnode->RC := NULL; pnode->PAR := parpnode
   if (r > p) {
      if (p < q)
         pnode->LC := ____________________________
      if (r > q)
         pnode->RC := ____________________________
   }
   return(pnode)
}
Question 4 (82 points)
For each of the following claims, state whether it is true or false and explain your answer. You may
rely on everything that you have seen in class, but you must explain precisely what you are
basing your answers on. All the claims concern flow networks with $n > 2$ vertices
(including $s$ and $t$) and $m$ edges in which every vertex lies on at least one path from $s$ to $t$.
A claim is considered true only if it holds for every network that satisfies these conditions.
a. If the capacity of every edge in the network is either 1 or 2 (as a reminder, a missing edge
has capacity 0), then the running time of the Ford–Fulkerson algorithm, when it finds a path in
the residual network using BFS, is $O(mn^2)$.
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
b. If the capacity of every edge in the network is of the form $\frac{k}{n}$, where $k$ is an integer
between $n$ and $2n$, then the running time of the Ford–Fulkerson algorithm, when it finds a path in
the residual network using BFS, is $O(mn^2)$.
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
c. If $d$ edges with capacity 1 leave $s$, then there must exist a flow function
in the network with value $d$.
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
d. If the capacity of each edge in the network is a value between 1 and 2 and there exists a cut $(S, T)$
with $s \in S$ and $t \in T$ such that $c(S, T) = 1$, then the value of the maximal flow is 1.
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
Question 5 (32 points)
In a certain factory, $k$ different types of products are produced, and there are $n$ machines
(identical to each other) for manufacturing them. For each product type $1 \le i \le k$ and any
number $0 \le j \le n$ of machines, $p[i][j]$ is a number between 1 and 100 that represents the profit
from selling products of type $i$ if $j$ machines are assigned to it, where $p[i][0]$ is the profit
when no machine is assigned, and the profit from a product type cannot decrease if a larger number
of machines is assigned to it (but it does not necessarily grow linearly with the number of machines).
We want to decide how many machines to assign to each product type so that the total profit (the sum
of the profits) from the different product types is maximal (every machine can be assigned to only
one product type). In other words, we are looking for $k$ non-negative integers $x_1, x_2, \dots, x_k$
such that $\sum_{i=1}^{k} x_i = n$ and $\sum_{i=1}^{k} p[i][x_i]$ is as large as possible.
First we find the maximal profit that can be obtained, and afterwards an optimal assignment.
For each $1 \le i \le k$ and $0 \le j \le n$ we define $M[i][j]$ to be the maximal profit that can be
obtained from product types $1, \dots, i$ if $j$ machines are assigned to all these types together.
In other words, $M[i][j]$ is the maximal value of $\sum_{l=1}^{i} p[l][x_l]$ over all choices of $i$
non-negative integers $x_1, x_2, \dots, x_i$ such that $\sum_{l=1}^{i} x_l = j$.
a. What is 𝑀[1][𝑗] for 0 ≤ 𝑗 ≤ 𝑛 ?
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
b. Which of the following equations holds for every $1 < i \le k$ and $0 \le j \le n$? Explain your
answer.
(i) $M[i][j] = \max_{0 \le r \le j} \{p[i][r] + M[i-1][j]\}$
(ii) $M[i][j] = \max_{0 \le r \le j} \{p[i][r]\} + M[i-1][j-r]$
(iii) $M[i][j] = \max_{0 \le r \le j} \{p[i][r] + M[i-1][j-r]\}$
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
c. Write pseudocode for an algorithm that takes as input $k$, $n$ and $p[][]$ and returns
the maximal profit that can be obtained by assigning $n$ machines to produce $k$ products,
when the profit from assigning $j$ machines to product $i$ is $p[i][j]$. The running time of the
algorithm should be polynomial in $k$ and $n$.
Remark: you may not use a single line of code to compute the maximum or minimum over
more than two values; you must write a loop that computes the maximum/minimum.
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
d. Give the best possible upper bound on the running time of your algorithm (in other words, a
function $f(n, k)$ such that the running time is $O(f(n, k))$).
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
e. Show, by writing another procedure and, if needed, an addition to the pseudocode that you wrote
(one that does not change the asymptotic running time of the algorithm), how to obtain
$x_1, x_2, \dots, x_k$ that achieve an optimal total profit. In other words, the additional procedure
prints the pairs $(1, x_1), (2, x_2), \dots, (k, x_k)$ (from left to right) such that
$\sum_{i=1}^{k} x_i = n$ and $\sum_{i=1}^{k} p[i][x_i]$ is as large as possible. If your procedure is
recursive, you must state explicitly with which parameters it is called.
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
Solution to Moed A 2013 in Data Structures and Algorithm
Question 1
Consider the following algorithm, which takes as input an array $A$ of $n$ numbers and two
indices $1 \le p, r \le n$. The running time of the procedure Proc1(A,p,r) on any array is
$c \cdot s \cdot \log\log(s)$, where $c$ is a constant and $s = r - p + 1$ is the size of the
sub-array $A[p, \dots, r]$, and the running time of Proc2(A,p,r) on any array is $c' \cdot s$,
where $c'$ is a constant.
Alg(A,n,p,r)
{
   if (r-p+1 ≤ n^(1/4))
      Proc1(A,p,r)
   else {
      t := (r-p+1)/log(n)
      for (i = 1 to log(n))
         Alg(A, n, p+(i-1)t, p+it-1)
      Proc2(A,p,r)
   }
}
We denote by $T_A(n)$ the running time of Alg when it is called with $p = 1$ and $r = n$.
What is $g(n)$ such that $T_A(n) = \Theta(g(n))$?
Instruction: draw the recursion tree of the algorithm. What is the height of the recursion tree? What
does each level, except the leaves, contribute to the running time? What is the contribution of the leaves?
As a reminder, $\log_a b = \frac{\log_2 b}{\log_2 a}$. You may assume, for simplicity, that $\log(n)$ is
an integer, that $\frac{r-p+1}{\log(n)}$ is always an integer, and that the recursion always stops when
$r - p + 1 = n^{1/4}$.
Recommendation: denote $k = n^{1/4}$ and $d = \log(n)$, and first analyze the running time as a function
of $n$, $d$, $k$.
_____________________________________________________________________________________
We follow the instruction (and use the assumptions that the relevant quantities are integers).
The recursion tree is built as follows: every internal node has $d = \log(n)$ children, so level $i$
contains $d^i$ nodes, each corresponding to a sub-array of size $\frac{n}{d^i}$. By the stopping
condition of the recursion, every leaf corresponds to a sub-array of size $k = n^{1/4}$, and there are
$\frac{n}{k} = n^{3/4}$ leaves. Since the running time of Proc2 is linear in the size of the sub-array
on which it runs, every level of the tree, except the leaf level, contributes a total time of $c'n$.
The number of these levels is
$\log_d\left(\frac{n}{k}\right) = \frac{\log(n/k)}{\log(d)} = \frac{\frac{3}{4}\log(n)}{\log\log(n)}$
(since we stop at the level $i$ that satisfies $d^i = \frac{n}{k}$), so together they contribute
$\Theta\left(\frac{n\log(n)}{\log\log(n)}\right)$. Every leaf contributes time $c \cdot k \cdot \log\log(k)$
and there are $\frac{n}{k}$ such sub-arrays, so together the leaves contribute
$c \cdot n \cdot \log\log(k) = \Theta(n\log\log(n))$. Since
$\log\log(n) = O\left(\frac{\log(n)}{\log\log(n)}\right)$, the total running time is
$\Theta\left(\frac{n\log(n)}{\log\log(n)}\right)$.
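The level-by-level count can be checked numerically. The following is a minimal Python sketch, not part of the original solution, that evaluates the recurrence directly. It assumes base-2 logarithms, $c = c' = 1$, and a value of $n$ for which the integrality assumptions of the question hold; the function name alg_cost is hypothetical.

```python
import math

def alg_cost(n, c=1.0, c_prime=1.0):
    """Evaluate the running-time recurrence of Alg for p = 1, r = n."""
    d = round(math.log2(n))   # number of recursive calls per internal node
    k = round(n ** 0.25)      # sub-array size at which the recursion stops

    def cost(s):
        if s <= k:
            # Leaf: Proc1 costs c * s * loglog(s).
            return c * s * math.log2(math.log2(s))
        # Internal node: d sub-problems of size s/d, then Proc2 costs c' * s.
        return d * cost(s // d) + c_prime * s

    return cost(n)

# n = 65536 gives d = 16, k = 16 and n/k = 16**3, so there are 3 non-leaf levels.
n = 65536
non_leaf_levels = 3
by_levels = non_leaf_levels * n + n * math.log2(math.log2(16))
print(alg_cost(n), by_levels)
```

Both printed values agree: three non-leaf levels contribute $n$ each and the leaves contribute $n \log\log(k)$.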
Question 2 (81 points)
As a reminder, a maximal priority queue is defined like a minimal priority queue, except
that it supports the operation DeleteMax instead of DeleteMin. We assume that the elements
in the priority queue are numbers (not necessarily distinct) and that DeleteMax
deletes the element with the maximal value from the queue and returns it.
a. Explain what should be changed in the definition of the partially ordered binary tree so that it
fits the maximal priority queue, and explain in words how a DeleteMax operation is
performed on the partially ordered binary tree (as described in class for DeleteMin).
_____________________________________________________________________________________
All that needs to change in the definition of the partially ordered binary tree is that the value
in every node is larger than or equal to the values of its children (instead of smaller than or equal to).
The DeleteMax operation takes the value in the root of the tree (which is the maximum) and stores it
in a temporary variable. It then moves the value of the rightmost node in the last level to the root,
removes that node, and trickles the value down the tree as long as it is smaller than at least one of
its two children, each time swapping it with the child whose value is the largest. At the end, the
value that was in the root is returned.
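The description above can be sketched in Python as follows. This is only an illustration under stated assumptions: the heap is a Python list whose index 0 is unused so that the root sits at index 1 (as in the array representation used in part b), and the name delete_max is hypothetical.

```python
def delete_max(heap):
    """Remove and return the maximum of a max-heap stored in heap[1:]."""
    if len(heap) <= 1:
        raise IndexError("delete_max from an empty heap")
    top = heap[1]                  # the root holds the maximum
    last = heap.pop()              # value of the last node in the bottom level
    size = len(heap) - 1
    if size >= 1:
        heap[1] = last
        cur = 1
        while 2 * cur <= size:     # while cur has at least one child
            child = 2 * cur
            if child + 1 <= size and heap[child + 1] > heap[child]:
                child += 1         # pick the larger of the two children
            if heap[cur] >= heap[child]:
                break              # heap order restored
            heap[cur], heap[child] = heap[child], heap[cur]
            cur = child
    return top

h = [None, 9, 7, 8, 3, 7, 2]
extracted = [delete_max(h) for _ in range(6)]
print(extracted)
```

Repeated calls return the elements in non-increasing order, including the duplicate value 7.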
b. Write pseudocode for the procedure Insert when the partially ordered binary tree is
represented by a heap $P$ (that is, a record with two fields: P.size, which holds the
number of elements in the priority queue, and P.T, an array of size $MAXSIZE$ that holds
the elements of the queue). The running time of Insert should be $\Theta(\log(n))$, where $n$ is the
number of elements in the queue.
As in a heap that represents a partially ordered binary tree for a minimal priority queue, the
value of the root is stored in P.T[1], and the children of the node whose value is stored in
position $i$ of the array are stored in P.T[2i] and P.T[2i+1].
Insert(P,x)
{
   if (P.size = MAXSIZE)
      return(OVERFLOW)
   P.size := P.size + 1
   P.T[P.size] := x
   cur := P.size
   /* move x up as long as it is larger than its parent; cur/2 is rounded down */
   while ((cur > 1) and (P.T[cur] > P.T[cur/2])) {
      y := P.T[cur]
      P.T[cur] := P.T[cur/2]
      P.T[cur/2] := y
      cur := cur/2
   }
}
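For readers who want to run the procedure, here is a minimal Python sketch of the same Insert. The class MaxHeap and the value of MAXSIZE are assumptions made for the example; the array is allocated with one extra cell so that the root can sit at index 1, as in the pseudocode.

```python
MAXSIZE = 8  # hypothetical capacity, chosen only for this example

class MaxHeap:
    def __init__(self):
        self.size = 0
        self.T = [None] * (MAXSIZE + 1)   # index 0 unused, root at T[1]

def insert(P, x):
    if P.size == MAXSIZE:
        raise OverflowError("heap is full")
    P.size += 1
    P.T[P.size] = x
    cur = P.size
    # Move x up while it is larger than its parent (at most log(n) swaps).
    while cur > 1 and P.T[cur] > P.T[cur // 2]:
        P.T[cur], P.T[cur // 2] = P.T[cur // 2], P.T[cur]
        cur //= 2

P = MaxHeap()
for x in [3, 7, 2, 9, 7]:
    insert(P, x)
print(P.T[1:P.size + 1])
```

The loop moves one level up per iteration, which is where the logarithmic bound comes from.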
Question 3
a.
Tree-to-array(pnode,A,i) /* Initially call the procedure with pnode = pointer to root of tree, i=1 */
{
   if (pnode = NULL)
      return(0) /* the tree is empty so num of nodes is 0 */
   s1 := Tree-to-array(pnode->LC,A,i)
   /* enter elements in left subtree, starting from position i */
   A[i+s1] := pnode->VAL /* enter root right after these elements */
   s2 := Tree-to-array(pnode->RC,A,i+s1+1)
   /* enter elements in right subtree, right after root */
   return(s1 + s2 + 1) /* this is total number of nodes in the tree */
}
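A runnable Python sketch of the same in-order traversal is given below as an illustration. The Node class and its field names are assumptions for the example, and index 0 of the list is left unused so that positions match the 1-based array of the question.

```python
class Node:
    def __init__(self, val, lc=None, rc=None):
        self.val, self.lc, self.rc = val, lc, rc

def tree_to_array(pnode, A, i):
    """Write the subtree rooted at pnode into A starting at position i; return its size."""
    if pnode is None:
        return 0
    s1 = tree_to_array(pnode.lc, A, i)            # left subtree fills A[i .. i+s1-1]
    A[i + s1] = pnode.val                         # root goes right after it
    s2 = tree_to_array(pnode.rc, A, i + s1 + 1)   # right subtree follows the root
    return s1 + s2 + 1

# A small binary search tree: 5 at the root, 2 (with right child 4) and 8 below it.
root = Node(5, Node(2, None, Node(4)), Node(8))
A = [None] * 6          # A[0] unused, N = 5
n = tree_to_array(root, A, 1)
print(n, A[1:n + 1])
```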
b.
Array-to-tree(A,n)
{
   return(Array-to-tree-req(A,1,n,NULL))
}
Array-to-tree-req(A,p,r,parpnode)
{
   create node and let pnode be a pointer to it
   q := (p+r)/2
   pnode->VAL := A[q]; /* put middle (median) value in root */
   pnode->LC := NULL; pnode->RC := NULL; pnode->PAR := parpnode;
   if (r > p) {
      if (p < q) /* build left subtree */
         pnode->LC := Array-to-tree-req(A,p,q-1,pnode)
      if (r > q) /* build right subtree */
         pnode->RC := Array-to-tree-req(A,q+1,r,pnode)
   }
   return(pnode)
}
The height of the tree is $O(\log(n))$ because in every recursive call the size of the sub-array on
which the call is made is at most half the size of the current sub-array. In other words, the height of
the tree satisfies the recurrence $h(n) = 1 + h\left(\frac{n}{2}\right)$ with $h(1) = 1$, and therefore
we get $h(n) = \log(n) + 1 = O(\log(n))$.
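The height claim can be checked experimentally with a short Python sketch. The Node class and the helper height are assumptions added for the example; height counts nodes on the longest root-to-leaf path, matching $h(1) = 1$, and $(p+r)/2$ is taken to round down.

```python
class Node:
    def __init__(self, val, par=None):
        self.val, self.par = val, par
        self.lc = self.rc = None

def array_to_tree(A, p, r, par=None):
    """Build a balanced BST from the sorted values A[p..r] (1-based positions)."""
    q = (p + r) // 2
    node = Node(A[q], par)            # the middle value becomes the root
    if p < q:
        node.lc = array_to_tree(A, p, q - 1, node)
    if r > q:
        node.rc = array_to_tree(A, q + 1, r, node)
    return node

def height(node):
    # Number of nodes on the longest path from this node down to a leaf.
    return 0 if node is None else 1 + max(height(node.lc), height(node.rc))

A = [None] + list(range(1, 16))       # A[1..15] = 1, ..., 15
root = array_to_tree(A, 1, 15)
print(root.val, height(root))
```

For 15 sorted values the root holds the median and the tree has 4 levels, in line with the logarithmic bound.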
Question 4 (22 points)
For each of the following claims, state whether it is true or false and explain your answer. You may
rely on everything that you have seen in class, but you must explain precisely what you are
basing your answers on. All the claims concern flow networks with $n > 2$ vertices
(including $s$ and $t$) and $m$ edges in which every vertex lies on at least one path from $s$ to $t$.
A claim is considered true only if it holds for every network that satisfies these conditions.
a. If the capacity of every edge in the network is either 1 or 2 (as a reminder, a missing edge
has capacity 0), then the running time of the Ford–Fulkerson algorithm, when it finds a path in
the residual network using BFS, is $O(mn^2)$.
_____________________________________________________________________________________
True. The value of the flow is bounded from above by $2n$ (the sum of the capacities of the edges
leaving $s$), and all the capacities are integers, so every iteration increases the flow by at least 1
and there are at most $2n$ iterations. The running time of each iteration is $O(m)$, so in total we
get $O(mn)$, which is in particular $O(mn^2)$.
b. If the capacity of every edge in the network is of the form $\frac{k}{n}$, where $k$ is an integer
between $n$ and $2n$, then the running time of the Ford–Fulkerson algorithm, when it finds a path in
the residual network using BFS, is $O(mn^2)$.
_____________________________________________________________________________________
True. The value of the flow is bounded from above by $2n$ (the sum of the capacities of the edges
leaving $s$), and all the capacities are multiples of $\frac{1}{n}$, so the flow grows by at least
$\frac{1}{n}$ in each iteration, and therefore the total number of iterations is at most $2n^2$.
Since the running time of each iteration is $O(m)$, in total we get $O(mn^2)$.
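The iteration bound in parts a and b can be observed on a small example. The sketch below is a plain Python implementation of Ford–Fulkerson with BFS path search, written for illustration only; the function name max_flow_bfs and the example network are assumptions, and exact fractions are used so that capacities of the form $k/n$ are represented without rounding.

```python
from collections import deque
from fractions import Fraction

def max_flow_bfs(num_vertices, edges, s, t):
    """Return (max flow value, number of augmentations) using BFS paths."""
    residual = [dict() for _ in range(num_vertices)]
    for u, v, cap in edges:
        residual[u][v] = residual[u].get(v, 0) + cap
        residual[v].setdefault(u, 0)      # reverse edge, initially empty

    flow, iterations = 0, 0
    while True:
        # BFS for a shortest augmenting path in the residual network.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow, iterations

        # Walk back from t to collect the path, then push its bottleneck.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[a][b] for a, b in path)
        for a, b in path:
            residual[a][b] -= bottleneck
            residual[b][a] += bottleneck
        flow += bottleneck
        iterations += 1

n = 4  # vertices: 0 = s, 1, 2, 3 = t; every capacity is k/n with n <= k <= 2n
edges = [(0, 1, Fraction(5, n)), (0, 2, Fraction(6, n)), (1, 3, Fraction(7, n)),
         (2, 3, Fraction(4, n)), (1, 2, Fraction(8, n))]
flow, iterations = max_flow_bfs(n, edges, 0, 3)
print(flow, iterations, iterations <= 2 * n * n)
```

On this network the number of augmentations is far below the $2n^2$ bound, which is only a worst-case estimate.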
c. If $d$ edges with capacity 1 leave $s$, then there must exist a flow function
in the network with value $d$.
_____________________________________________________________________________________
False. A cut with small capacity may exist. For example, if $d > 1$ and only one edge, with
capacity 1, enters $t$, then the value of any flow is at most 1.
d. If the capacity of each edge in the network is a value between 1 and 2 and there exists a cut $(S, T)$
with $s \in S$ and $t \in T$ such that $c(S, T) = 1$, then the value of the maximal flow is 1.
_____________________________________________________________________________________
True. By the given bounds on the edge capacities, the capacity of every cut is at least 1 (since at
least one edge crosses it, and the capacity of that edge is at least 1), so the given cut is
necessarily a minimum cut. By the max-flow min-cut theorem that we saw in class, the value of the
maximal flow equals the capacity of a minimum cut.
Question 5 (32 points)
In a certain factory, $k$ different types of products are produced, and there are $n$ machines
(identical to each other) for manufacturing them. For each product type $1 \le i \le k$ and any
number $0 \le j \le n$ of machines, $p[i][j]$ is a number between 1 and 100 that represents the profit
from selling products of type $i$ if $j$ machines are assigned to it, where $p[i][0]$ is the profit
when no machine is assigned, and the profit from a product type cannot decrease if a larger number
of machines is assigned to it (but it does not necessarily grow linearly with the number of machines).
We want to decide how many machines to assign to each product type so that the total profit (the sum
of the profits) from the different product types is maximal (every machine can be assigned to only
one product type). In other words, we are looking for $k$ non-negative integers $x_1, x_2, \dots, x_k$
such that $\sum_{i=1}^{k} x_i = n$ and $\sum_{i=1}^{k} p[i][x_i]$ is as large as possible.
First we find the maximal profit that can be obtained, and afterwards an optimal assignment.
For each $1 \le i \le k$ and $0 \le j \le n$ we define $M[i][j]$ to be the maximal profit that can be
obtained from product types $1, \dots, i$ if $j$ machines are assigned to all these types together.
In other words, $M[i][j]$ is the maximal value of $\sum_{l=1}^{i} p[l][x_l]$ over all choices of $i$
non-negative integers $x_1, x_2, \dots, x_i$ such that $\sum_{l=1}^{i} x_l = j$.
a. What is $M[1][j]$ for $0 \le j \le n$?
_____________________________________________________________________________________
When $i = 1$, all $j$ machines can be assigned only to product 1, so $M[1][j] = p[1][j]$.
b. Which of the following equations holds for every $1 < i \le k$ and $0 \le j \le n$? Explain your
answer.
(i) $M[i][j] = \max_{0 \le r \le j} \{p[i][r] + M[i-1][j]\}$
(ii) $M[i][j] = \max_{0 \le r \le j} \{p[i][r]\} + M[i-1][j-r]$
(iii) $M[i][j] = \max_{0 \le r \le j} \{p[i][r] + M[i-1][j-r]\}$
_____________________________________________________________________________________
The correct formula is (iii). Given $j$ machines, product $i$ can be assigned
between 0 and $j$ machines and yields a profit of $p[i][r]$, where $r$ is the number of machines
assigned to it. For any choice of the $r$ machines assigned to product $i$, there remain
$j - r$ machines that can be assigned to products $1, \dots, i-1$. The maximal profit that can be
obtained by assigning $j - r$ machines to products $1, \dots, i-1$ is $M[i-1][j-r]$, so under the
assumption that $r$ machines are assigned to product $i$, the total profit is
$p[i][r] + M[i-1][j-r]$. Since we do not know which $r$ is optimal, we take the maximum of this
profit over all possible choices of $r$.
c. Write pseudocode for an algorithm that takes as input $k$, $n$ and $p[][]$ and returns
the maximal profit that can be obtained by assigning $n$ machines to produce $k$ products,
when the profit from assigning $j$ machines to product $i$ is $p[i][j]$. The running time of the
algorithm should be polynomial in $k$ and $n$.
Remark: you may not use a single line of code to compute the maximum or minimum over
more than two values; you must write a loop that computes the maximum/minimum.
_____________________________________________________________________________________
MaxProfit(n,k,p[][])
{
   for (j = 0 to n)
      M[1][j] := p[1][j] /* Initialize first row (product 1 only) */
   for (i = 2 to k) {
      for (j = 0 to n) {
         M[i][j] := p[i][0] + M[i-1][j]
         /* Initialize M[i][j] by max profit obtained when i is not assigned any machine (r=0) */
         Best[i][j] := 0 /* for last part of the question */
         for (r = 1 to j) { /* Compute max profit for 1,…,i using j machines */
            new_profit := p[i][r] + M[i-1][j-r]
            if (new_profit > M[i][j]) {
               M[i][j] := new_profit
               Best[i][j] := r
            }
         }
      }
   }
   return(M[k][n])
}
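A direct Python transcription of the table-filling step may help in checking the recurrence on concrete numbers. It is a sketch for illustration: the function name max_profit and the profit table p are made up for the example (values between 1 and 100, non-decreasing in the number of machines), and row 0 of p is unused so that product indices start at 1.

```python
def max_profit(n, k, p):
    """p[i][j]: profit of product i (1..k) with j machines (0..n); row 0 is unused."""
    M = [[0] * (n + 1) for _ in range(k + 1)]
    for j in range(n + 1):
        M[1][j] = p[1][j]                      # only product 1 is available
    for i in range(2, k + 1):
        for j in range(n + 1):
            best = p[i][0] + M[i - 1][j]       # r = 0: no machine for product i
            for r in range(1, j + 1):          # explicit loop instead of a built-in max
                candidate = p[i][r] + M[i - 1][j - r]
                if candidate > best:
                    best = candidate
            M[i][j] = best
    return M[k][n]

# Hypothetical profit table for k = 3 products and n = 3 machines.
p = [None,
     [1, 5, 6, 7],
     [1, 4, 9, 10],
     [2, 3, 3, 8]]
print(max_profit(3, 3, p))
```

For this table the optimum assigns one machine to product 1, two to product 2 and none to product 3, for a total of 5 + 9 + 2 = 16.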
d. Give the best possible upper bound on the running time of your algorithm (in other words, a
function $f(n, k)$ such that the running time is $O(f(n, k))$).
_____________________________________________________________________________________
The first loop runs in time $O(n)$, and afterwards there are 3 nested loops: the outer loop runs
$k - 1$ iterations, the next runs $n + 1$ iterations, and the innermost runs at most $n$ iterations,
so in total the running time is $O(n^2 k)$.
e. Show, by writing another procedure and, if needed, an addition to the pseudocode that you wrote
(one that does not change the asymptotic running time of the algorithm), how to obtain
$x_1, x_2, \dots, x_k$ that achieve an optimal total profit. In other words, the additional procedure
prints the pairs $(1, x_1), (2, x_2), \dots, (k, x_k)$ (from left to right) such that
$\sum_{i=1}^{k} x_i = n$ and $\sum_{i=1}^{k} p[i][x_i]$ is as large as possible. If your procedure is
recursive, you must state explicitly with which parameters it is called.
_____________________________________________________________________________________
We add the following code, where Best[i][j] holds the value of $r$ that maximizes the profit in the
formula from sub-question b (it is computed in MaxProfit). We call the procedure
Print_Best with $i = k$, $j = n$ and the matrix Best.
Print_Best(i,j,Best[][])
{
   if (i = 1)
      print(1,j)
   else {
      Print_Best(i-1, j-Best[i][j], Best)
      print(i, Best[i][j])
   }
}
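The reconstruction can be tried out with the following Python sketch. It is an illustration under assumptions: the helper names solve and collect_best are hypothetical, the profit table is invented for the example, and the pairs are collected into a list instead of being printed so that they can be inspected.

```python
def solve(n, k, p):
    """Fill M and best, where best[i][j] is the r chosen for product i with j machines."""
    M = [[0] * (n + 1) for _ in range(k + 1)]
    best = [[0] * (n + 1) for _ in range(k + 1)]
    for j in range(n + 1):
        M[1][j] = p[1][j]
    for i in range(2, k + 1):
        for j in range(n + 1):
            M[i][j] = p[i][0] + M[i - 1][j]
            for r in range(1, j + 1):
                candidate = p[i][r] + M[i - 1][j - r]
                if candidate > M[i][j]:
                    M[i][j], best[i][j] = candidate, r
    return M, best

def collect_best(i, j, best, out):
    """Append the pairs (1, x_1), ..., (i, x_i) for j machines, from left to right."""
    if i == 1:
        out.append((1, j))          # product 1 receives all remaining machines
    else:
        collect_best(i - 1, j - best[i][j], best, out)
        out.append((i, best[i][j]))

# Hypothetical profit table for k = 3 products and n = 3 machines (row 0 unused).
p = [None,
     [1, 5, 6, 7],
     [1, 4, 9, 10],
     [2, 3, 3, 8]]
M, best = solve(3, 3, p)
pairs = []
collect_best(3, 3, best, pairs)     # initial call with i = k, j = n
print(M[3][3], pairs)
```

The recursion makes $k$ calls in total, so it does not change the asymptotic running time.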
ID:
Tel Aviv University
The Faculty of Engineering
Moed A 2013 in Data Structures and Algorithm
Exam Time: 3 Hours Date: 14.02.2013
Teacher: Dana Ron
Instructions:
You are allowed to use four pages of study aids. You cannot use any sort of computer.
The exam includes 5 questions. The grade for each question appears in brackets.
Write your answers on the exam form in the place designated for them. It is highly
recommended to first write your answer on the draft paper that you have received and
only later to copy it, in a clear and readable way, to the exam form. Explain briefly
yet clearly all your claims. A claim without an explanation will not be accepted.
You may use theorems and algorithms which have been learned in class and in the
recitations or which have appeared in the homework assignments that you did. In such
cases you may cite what was learned with no need for a proof. On the other hand, if you
are using a slightly different version of an algorithm or some other analysis you must
explain precisely what the differences are.
In this exam there are 11 pages (including this one). Please make sure you have them all
in your possession.
Don't forget to write your ID number at the marked place.
GOOD LUCK!
Question 1 (20 points)
Following are the algorithms InsertionSort and MergeSort as learned in class (Merge(A,p,q,r)
is a procedure that merges two sorted sub-arrays A[p,…,q] and A[q+1,…,r], and its running
time is linear in r-p+1 for any input).
InsertionSort(Input: array A, integer n)
{
  for j = 2 to n {
    newnum ← A[j]
    i ← j-1
    while ( i > 0 and newnum < A[i] ) {
      A[i+1] ← A[i]
      i ← i-1
    }
    A[i+1] ← newnum
  }
}
MergeSort(Input: array A, integers p,r)
{
  if (r ≤ p)
    return
  else {
    q ← ⌊(p+r)/2⌋
    MergeSort(A,p,q)
    MergeSort(A,q+1,r)
    Merge(A,p,q,r)
  }
}
As a reminder, for an array A we denote the running time of InsertionSort on A by $T_{IS}(A)$.
Let n be the size of A. We denote by $s(A)$ the number of pairs (i, j) that are "not sorted among
themselves", meaning that they satisfy $1 \le i < j \le n$ but $A[j] < A[i]$. For example, if we take
$A = [5,4,4,3]$ then $s(A) = 5$, where the pairs (i, j) which are not sorted are: (1,2), (1,3), (1,4), (2,4), (3,4).
Explain your answers clearly for all the following sub-questions.
a. What is $T_{IS}(A)$ as a function of $s(A)$ and n? Instruction: note that $s(A)$ is the sum over j
(running from 2 to n) of the sizes of the index sets $\{i \mid i < j \text{ and } A[i] > A[j]\}$.
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
For the following two sub-questions, if your answer is positive you must describe the structure of
A accurately (for example, like so: "places $1, \ldots, n - n^{1/2}$ hold the numbers $1, \ldots, n - n^{1/2}$
sorted from smallest to largest, and places $n - n^{1/2} + 1, \ldots, n$ hold the numbers
$n - n^{1/2} + 1, \ldots, n$ sorted from largest to smallest").
b. Do there exist constants $c_1$ and $c_2$ such that for every $n \ge 4$ there exists an array A of size n
for which $c_1 n \log\log(n) \le T_{IS}(A) \le c_2 n \log\log(n)$?
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
c. We denote by $T_{MS}(A)$ the running time of MergeSort on array A when MergeSort is called
with $p = 1$ and $r = n$.
Do there exist constants $c_1$ and $c_2$ such that for every $n \ge 4$ there exists an array A of size n
for which $c_1 n \log\log(n) \le T_{MS}(A) \le c_2 n \log\log(n)$?
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
Question 2 (15 points)
One of the goals in a Binary Search Tree is keeping the tree balanced, so that the distance
between the root and the leaf farthest from it is not too large. Write pseudo-code for
a procedure that accepts as input a pointer to the root of a binary search tree (which contains at
least one node) and returns the distance from the root of the leaf that is farthest from it (according to
the definition of distance in a graph: the distance of the root from itself is 0, the distance of its
children from it is 1, and so on). Assume that each node in the tree is represented via a struct
that contains the following fields: PAR (a pointer to the parent), VAL (the value in the node), LC (a
pointer to the left child) and RC (a pointer to the right child).
In addition to the pseudo-code, explain briefly the idea behind the procedure and analyze the
procedure's running time (in terms of Θ) as a function of the number of vertices in the tree, 𝑛.
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
Question 3 (15 points)
As a reminder, in the data structure which we used for 2-3 trees, each node is represented via a
struct that contains the following fields: PAR (a pointer to the parent), VAL (the value in the node),
LC (a pointer to the left child), MC (a pointer to the middle child), RC (a pointer to the right child),
MIN1 (the minimal value in the left subtree), MIN2 (the minimal value in the middle subtree) and
MIN3 (the minimal value in the right subtree), if one exists. As another reminder, the field VAL is
relevant only for leaves, and the fields LC, MC, RC, MIN1, MIN2 and MIN3 are relevant only for
inner vertices. Assume that each struct that represents a node also contains another field, SIZE,
which holds the number of leaves in the subtree rooted at that vertex. We denote by n the
number of leaves in the tree (the number of elements in the set that the tree represents). Assume
also that 𝑛 ≥ 2.
Given a 2-3 tree represented via the above data structure, complete the following pseudo-code
so that when the procedure is called with pnode, a pointer to the root, and a value
1 ≤ 𝑘 ≤ 𝑛, it returns the value of the k'th element among the tree's leaves (sorted from smallest to
largest). Alternatively, you may write your own pseudo-code (for a procedure that is as efficient
as possible) on the other side of the paper.
Rank-2-3(pnode,k) /* pnode is a pointer to a node of a 2-3-tree with the additional SIZE field,
and 1 ≤ 𝑘 ≤ 𝑝𝑛𝑜𝑑𝑒 → 𝑆𝐼𝑍𝐸 */
{
if (pnode → SIZE = 1)
return( ____________________________________________________)
if (k ≤__________________________________________________________________)
return (____________________________________ __________________________)
else if (k ≤_______________________________________________________________)
return (_______________________________________________________________)
else
return (_______________________________________________________________)
}
Explain shortly the algorithm and analyze its running time, meaning, what is 𝑔(𝑛) so that the
running time will be: Θ(𝑔(𝑛)).
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
Question 4 (15 points)
Apply one iteration of the Ford-Fulkerson algorithm to the network below (in (1)) with the given
marked flow. In other words, draw the residual network including its residual capacities (in (2)), find an
augmenting path (any) and update the flow (in (3)). Does the flow you get have maximal value?
If your answer is "yes", prove it by marking a cut (S,T) for which $|f| = c(S, T)$; if your answer is
"no", show how it is possible to get a flow with a higher value.
As a reminder, the label $x/y$ on an edge $(u, v)$ means that $c(u, v) = y$ and $f(u, v) = x$.
If only one number is written, then it is the capacity and the flow is 0 (if there is no edge from $u$ to
$v$ then $c(u, v) = 0$, and always $f(u, v) = -f(v, u)$).
(1) Network G and flow f [figure not reproduced]
(2) Residual network 𝑮𝒇 [empty drawing area]
(3) Network G and new flow f [empty drawing area]
Question 5 (35 points)
Let $S = S_1 \ldots S_t$ be a DNA string and let $I$ be an array of $n \ge 2$ distinct indices, sorted from
smallest to largest, such that $I[1] = 1$ and $I[n] = t + 1$. There is a (biological) segmentation
process that, given a substring $S_i \ldots S_k$ and an index $i < j \le k$, splits the substring into two
substrings $S_i \ldots S_{j-1}$ and $S_j \ldots S_k$, where the cost of the split is $\max(j - i, k - j + 1)$
(that is, the maximum of the lengths of the two resulting substrings). We would like to
split $S$ into the $n - 1$ substrings defined by the array $I$, that is, into $S^1 = S_{I[1]} \ldots S_{I[2]-1}$,
$S^2 = S_{I[2]} \ldots S_{I[3]-1}$, …, $S^{n-1} = S_{I[n-1]} \ldots S_{I[n]-1}$ (remember that $I[1] = 1$ and
$I[n] = t + 1$), by a sequence of splits whose total cost is minimal.
For example, let $t = 10$ and $I = [1, 3, 6, 11]$:
[Figure: the string S with positions 1, …, 10, and the array I.]
And the final split should be $S^1 = S_1 S_2$, $S^2 = S_3 S_4 S_5$, $S^3 = S_6 \ldots S_{10}$:
[Figure: the three resulting substrings.]
If the order of the splits is (from left to right) $I[2], I[3]$, then we first split S at $I[2] = 3$ into
two substrings, $S_1 S_2$ and $S_3 \ldots S_{10}$ (cost = 8).
Then we split the second substring at $I[3] = 6$ into $S_3 S_4 S_5$ and $S_6 \ldots S_{10}$ (cost = 5).
Therefore the total cost would be 8 + 5 = 13.
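If instead the order is $I[3], I[2]$, we first split S at $I[3] = 6$ into $S_1 \ldots S_5$ and $S_6 \ldots S_{10}$ (cost = 5),
and then split the first substring at $I[2] = 3$ into $S_1 S_2$ and $S_3 S_4 S_5$ (cost = 3).
Therefore the total cost would be 5 + 3 = 8.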
We would first like to calculate the minimal cost of a sequence of splits given by the array $I$, and
then to find a sequence of splits that attains this minimal value. For each pair
$1 \le p < r \le n$, let $C(p, r)$ be the minimal cost of splitting the substring $S_{I[p]} \ldots S_{I[r]-1}$ into the
$r - p$ substrings $S_{I[p]} \ldots S_{I[p+1]-1}$, $S_{I[p+1]} \ldots S_{I[p+2]-1}$, …, $S_{I[r-1]} \ldots S_{I[r]-1}$.
You must explain your answers clearly for each of the following sub questions.
a. What is 𝐶(𝑝, 𝑝 + 1) for each 𝑝?
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
b. Explain why the following equation is true for any pair $(p, r)$ such that $r > p + 1$:
$C(p, r) = \min_{p < q < r} \{\max(I[q] - I[p],\ I[r] - I[q]) + C(p, q) + C(q, r)\}$
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
[Figure for the example above (order I[3], I[2]): S is first split at I[3] = 6 (cost = 5), and then the first substring is split at I[2] = 3 (cost = 3).]
c. Write pseudo-code for a procedure that runs in time polynomial in 𝑛 and calculates the
minimal cost of a sequence of splits given by the array 𝐼.
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
d. Give the best possible upper bound on the running time of the procedure you wrote.
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
e. Show, by writing another procedure and, if needed, an addition to the pseudo-code that you
wrote (one that does not change the asymptotic running time of the algorithm), how you can also
find an optimal sequence of splits (and not just its cost). In other words, the output of the
procedure is a series of tuples $(i_1, j_1, k_1), (i_2, j_2, k_2), \ldots, (i_{n-2}, j_{n-2}, k_{n-2})$, where $(i_x, j_x, k_x)$
means that the $x$'th split is of the substring from place $i_x$ to place $k_x$ into two substrings: the
first one is from place $i_x$ to place $j_x - 1$, and the second one is from place $j_x$ to place $k_x$. In
particular, for the first split, $i_1 = 1$ and $k_1 = t$. (For the example at the beginning of the question,
the output would be, from left to right: (1,6,10), (1,3,5).)
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
_____________________________________________________________________________________________
Solution to Moed A 2013 in Data Structures and Algorithm
Question 1
Following are the algorithms InsertionSort and MergeSort as learned in class (Merge(A,p,q,r)
is a procedure that merges two sorted sub-arrays A[p,…,q] and A[q+1,…,r], and its running
time is linear in r-p+1 for any input).
InsertionSort(Input: array A, integer n)
{
  for j = 2 to n {
    newnum ← A[j]
    i ← j-1
    while ( i > 0 and newnum < A[i] ) {
      A[i+1] ← A[i]
      i ← i-1
    }
    A[i+1] ← newnum
  }
}
MergeSort(Input: array A, integers p,r)
{
  if (r ≤ p)
    return
  else {
    q ← ⌊(p+r)/2⌋
    MergeSort(A,p,q)
    MergeSort(A,q+1,r)
    Merge(A,p,q,r)
  }
}
As a reminder, for an array A we denote the running time of InsertionSort on A by $T_{IS}(A)$.
Let n be the size of A. We denote by $s(A)$ the number of pairs (i, j) that are "not sorted among
themselves", meaning that they satisfy $1 \le i < j \le n$ but $A[j] < A[i]$. For example, if we take
$A = [5,4,4,3]$ then $s(A) = 5$, where the pairs (i, j) which are not sorted are:
(1,2), (1,3), (1,4), (2,4), (3,4).
Explain your answers clearly for all the following sub-questions.
a. What is $T_{IS}(A)$ as a function of $s(A)$ and n? Instruction: note that $s(A)$ is the sum over j
(running from 2 to n) of the sizes of the index sets $\{i \mid i < j \text{ and } A[i] > A[j]\}$.
_____________________________________________________________________________________________
(Please note that this question is a variant of a question that was given to you in a homework
assignment, exercise number 2.)
a. Following the instruction, we denote $s(A, j) = |\{i \mid i < j \text{ and } A[i] > A[j]\}|$, so that
$s(A) = \sum_{j=2}^{n} s(A, j)$. As you have seen in class, when the outer loop reaches $A[j]$, the
numbers in $A[1, \ldots, j-1]$ are sorted (and are the same numbers that were in that sub-array at
the beginning of the algorithm's run). Therefore the number of iterations of the while loop
when reaching $A[j]$ is $s(A, j)$ (since the loop goes over all the values in the sub-array
$A[1, \ldots, j-1]$ that are larger than $A[j]$, and stops only when it reaches a value that is
smaller than or equal to $A[j]$, or after going over all the values). Therefore (for some constants
$c_1$ and $c_2$):
$T_{IS}(A) = \sum_{j=2}^{n} (c_1 + c_2 \cdot s(A, j)) = c_1 (n - 1) + c_2 \cdot s(A)$.
Common mistakes:
1. Treating $s(A)$ as the number of operations performed in the while loop for every 𝒋
(whether as an upper bound or due to a misunderstanding of the definition),
and stating the running time as $O(n \cdot s(A))$.
2. Neglecting the constant number of operations that the for loop performs in each iteration
regardless of $s(A)$, and stating the running time as $O(s(A))$.
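The claim of part a, that the while loop runs exactly s(A) times in total, can be checked on the example array with a short Python sketch. The function names are illustrative only, and Python's 0-based indexing replaces the 1-based indexing of the pseudo-code.

```python
def insertion_sort_shifts(a):
    """Run InsertionSort on a copy of `a`; return the total number of while-loop iterations."""
    a = list(a)
    shifts = 0
    for j in range(1, len(a)):
        newnum = a[j]
        i = j - 1
        while i >= 0 and newnum < a[i]:
            a[i + 1] = a[i]
            i -= 1
            shifts += 1
        a[i + 1] = newnum
    return shifts


def unsorted_pairs(a):
    """s(A): the number of pairs i < j with A[j] < A[i]."""
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n) if a[j] < a[i])


example = [5, 4, 4, 3]
print(insertion_sort_shifts(example), unsorted_pairs(example))
```

For a sorted array both counts are 0, yet the for loop still performs n − 1 iterations, which is why the term c1(n − 1) cannot be dropped.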
For the following two sub-questions, if your answer is positive you must describe the structure of
A accurately (for example, like so: "places $1, \ldots, n - n^{1/2}$ hold the numbers $1, \ldots, n - n^{1/2}$
sorted from smallest to largest, and places $n - n^{1/2} + 1, \ldots, n$ hold the numbers
$n - n^{1/2} + 1, \ldots, n$ sorted from largest to smallest").
b. Do there exist constants $c_1$ and $c_2$ such that for every $n \ge 4$ there exists an array A of size n
for which $c_1 n \log\log(n) \le T_{IS}(A) \le c_2 n \log\log(n)$?
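b. Yes. In particular, for every $n$, let $t = \lfloor \log\log(n) \rfloor$ and take the array $A$ in which
places $1, \ldots, t$ hold the numbers $n - t + 1, \ldots, n$ sorted from smallest to largest, and
places $t + 1, \ldots, n$ hold the numbers $1, \ldots, n - t$ sorted from smallest to largest. Note
that $s(A, j) = 0$ for every $1 \le j \le t$ and $s(A, j) = t$ for every $t + 1 \le j \le n$, so that
$s(A) = (n - t)t = nt - t^2$. Hence $s(A) < n \log\log(n)$ for every $n \ge 4$, and
$s(A) > \frac{nt}{2} \ge \frac{n \log\log(n)}{4}$ for every $n \ge 4$ (since $n - t > \frac{n}{2}$, and
$t \ge \frac{\log\log(n)}{2}$ whenever $\log\log(n) \ge 1$). Together with part a, we get what we are asking
for: $s(A)$ is bounded between $\frac{1}{4} n \log\log(n)$ and $n \log\log(n)$, and the constants $c_1$ and $c_2$
for $T_{IS}(A)$ follow from the constants of part a.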
This also holds for an array $A$ of the following form: the first $k = \lfloor (n \log\log(n))^{1/2} \rfloor$
places are sorted from largest to smallest, and the rest of the places hold numbers that are
larger than every number in the first $k$ places and are sorted from smallest to largest. In
this case, $s(A, j) = j - 1$ for every $2 \le j \le k$ and $s(A, j) = 0$ for every $j > k$. Therefore
$s(A) = 1 + 2 + \cdots + (k - 1) = \frac{(k-1)k}{2}$, which is again as requested.
Common mistakes:
1. In many cases the example that was given was wrong. It is important to note that
if $k$ elements are sorted from largest to smallest, then the number of pairs
among them that contribute to $s(A)$ is quadratic in $k$. In particular, if the first $\log\log(n)$
elements in the array are sorted from largest to smallest, and the rest of the
elements are all larger than them and sorted from smallest to largest, then:
$s(A) = c \cdot (\log\log(n))^2$
2. There were cases in which no explanation was given, or in which an example of an
arrangement of the array was not given as requested.
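The first construction of part b can be checked numerically with the following Python sketch. It assumes base-2 logarithms (the solution does not fix the base), and the helper names are hypothetical.

```python
import math


def block_shifted_array(n):
    """The t largest numbers first, then 1..n-t, both ascending, with t = floor(loglog n)."""
    t = math.floor(math.log2(math.log2(n)))
    return list(range(n - t + 1, n + 1)) + list(range(1, n - t + 1)), t


def unsorted_pairs(a):
    """s(A): the number of pairs i < j with A[j] < A[i]."""
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n) if a[j] < a[i])


a16, t16 = block_shifted_array(16)     # log2(log2(16)) = 2
a256, t256 = block_shifted_array(256)  # log2(log2(256)) = 3
print(unsorted_pairs(a16), (16 - t16) * t16)
print(unsorted_pairs(a256), (256 - t256) * t256)
```

Each of the last n − t elements is smaller than all t elements of the first block, which is where the count (n − t)t comes from.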
c. We denote by $T_{MS}(A)$ the running time of MergeSort on array A when MergeSort is called
with $p = 1$ and $r = n$.
Do there exist constants $c_1$ and $c_2$ such that for every $n \ge 4$ there exists an array A of size n
for which $c_1 n \log\log(n) \le T_{MS}(A) \le c_2 n \log\log(n)$?
_____________________________________________________________________________________________
c. No. The running time of MergeSort is of order $n \log(n)$ for every input array (since
the structure of the recursion tree is the same for every input, and the running time of
Merge is linear in the size of the sub-array on which it runs, for every input. This
means that the recursion tree has $\log(n)$ levels, and in every level $j$ there are $2^j$
vertices, each corresponding to a run of Merge on a sub-array of size $\frac{n}{2^j}$, so that every
level contributes in total a number of operations linear in $n$, for any input). Since $n \log(n)$ is
not bounded from above by $c_2 n \log\log(n)$ for any constant $c_2$, no such array exists.
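The input-independence used in part c can be illustrated with a Python sketch that counts the total size of the sub-arrays passed to Merge. The counter `moves` and the function name are assumptions of this sketch, and indices are 0-based.

```python
def merge_sort_moves(a):
    """Run MergeSort on a copy of `a`; return the total size of all sub-arrays merged."""
    a = list(a)
    moves = 0

    def merge_sort(p, r):
        nonlocal moves
        if r <= p:
            return
        q = (p + r) // 2
        merge_sort(p, q)
        merge_sort(q + 1, r)
        # Merge the sorted halves a[p..q] and a[q+1..r].
        merged, i, j = [], p, q + 1
        while i <= q and j <= r:
            if a[i] <= a[j]:
                merged.append(a[i])
                i += 1
            else:
                merged.append(a[j])
                j += 1
        merged += a[i:q + 1] + a[j:r + 1]
        a[p:r + 1] = merged
        moves += r - p + 1          # Merge is linear in r - p + 1

    merge_sort(0, len(a) - 1)
    return moves


print(merge_sort_moves(list(range(8))), merge_sort_moves([7, 6, 5, 4, 3, 2, 1, 0]))
```

For n = 8 both calls report 24 = n·log(n), because the split points depend only on p and r and never on the contents of the array.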
Question 2
One of the goals in a Binary Search Tree is keeping the tree balanced, so that the distance
between the root and the leaf farthest from it is not too large. Write pseudo-code for
a procedure that accepts as input a pointer to the root of a binary search tree (which contains at
least one node) and returns the distance from the root of the leaf that is farthest from it (according to
the definition of distance in a graph: the distance of the root from itself is 0, the distance of its
children from it is 1, and so on). Assume that each node in the tree is represented via a struct
that contains the following fields: PAR (a pointer to the parent), VAL (the value in the node), LC (a
pointer to the left child) and RC (a pointer to the right child).
In addition to the pseudo-code, explain briefly the idea behind the procedure and analyze the
procedure's running time (in terms of Θ) as a function of the number of vertices in the tree, 𝑛.
_____________________________________________________________________________________________
The procedure works recursively. Given a pointer pnode to a vertex, it returns the maximal
distance from the vertex to a leaf in its subtree. In particular, left_dist is the distance from
the vertex to the farthest leaf in the subtree of its left child: if there is no left child, the value is 0;
otherwise it is 1 plus the maximal distance from the left child to a leaf in its subtree (computed
recursively). right_dist is defined in a similar way. In particular, if the vertex is a
leaf, so that it has no children, the value returned is 0. The initial call is with
pnode pointing to the root of the tree [calling it with NULL is in fact an illegal input
(the smallest legal tree consists of a single node that is both the root and a leaf), and in that case −1 is returned].
Max-leaf-dist(pnode) /* pnode is a pointer to a node */
{
if (pnode = NULL)
return(-1)
if (pnode → LC != NULL) /* compute distance to furthest leaf in subtree of left child */
left_dist := 1+ Max-leaf-dist(pnode → LC)
else left_dist := 0
if (pnode → RC != NULL) /* compute distance to furthest leaf in subtree of right child */
right_dist := 1+ Max-leaf-dist(pnode → RC)
else right_dist := 0
return(max(left_dist,right_dist))
}
Since exactly one recursive call is performed for every vertex in the tree [from its
(single) parent], and the running time of each call (not including the recursive calls made from it)
is constant, the total running time is Θ(𝑛).
It is also possible to write slightly more "compact" code (with the same running time up to a
constant) as follows (note that if pnode points to a leaf, then the recursive calls on its
children, which are NULL, return −1, and adding 1 gives 0 as required).
Max-leaf-dist(pnode) /* pnode is a pointer to a node */
{
if (pnode = NULL)
return(-1)
else return(1+ max(Max-leaf-dist(pnode → LC),Max-leaf-dist(pnode → RC)))
}
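The compact version translates almost line by line into Python. In this sketch the class `Node` is a hypothetical stand-in for the struct (the PAR field is omitted because the procedure never uses it), and `None` plays the role of NULL.

```python
class Node:
    """Stand-in for the struct with fields VAL, LC and RC."""

    def __init__(self, val, lc=None, rc=None):
        self.val, self.lc, self.rc = val, lc, rc


def max_leaf_dist(pnode):
    """Distance from pnode to the farthest leaf in its subtree; -1 for an empty tree."""
    if pnode is None:
        return -1
    return 1 + max(max_leaf_dist(pnode.lc), max_leaf_dist(pnode.rc))


single = Node(5)
chain = Node(1, rc=Node(2, rc=Node(3)))             # a path of three nodes
mixed = Node(4, Node(2, Node(1)), Node(6))          # the left branch is deeper
print(max_leaf_dist(single), max_leaf_dist(chain), max_leaf_dist(mixed))
```

Every node is visited once and every NULL pointer once, so the work is proportional to n. On a very unbalanced tree the recursion depth is also proportional to n, which matters in practice for languages with a bounded call stack.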
Common mistakes:
1. Returning a value for the height of the tree that is larger by 1 than the required
value.
2. In a few cases the calculation was done using BFS. Such an
answer would have been accepted if the code took into account that the input is
a tree [for example, by noting that the elements that are
entered into the queue are the children of the vertex in the tree, or by noting
that, unlike the setting of BFS as seen in class (in
particular, that the vertices are {1, 2, …, 𝑛} and the graph is represented by an array
𝐿 of size 𝑛 in which 𝐿[𝑖] points to a linked list of the neighbors of vertex 𝑖), here the
vertices have no pre-given identifiers].
Question 3
As a reminder, in the data structure which we used for 2-3 trees, each vertex in the tree
is represented via a struct that contains the following fields: PAR (a pointer to the parent), VAL (the
value in the leaf), LC (a pointer to the left child), MC (a pointer to the middle child), RC (a pointer
to the right child), MIN1 (the minimal value in the left subtree), MIN2 (the minimal value in the middle
subtree) and MIN3 (the minimal value in the right subtree), if one exists. As another reminder,
the field VAL is relevant only for leaves, and the fields LC, MC, RC, MIN1, MIN2 and MIN3 are
relevant only for inner vertices. Assume that each struct (a vertex) also contains another field, SIZE,
which holds the number of leaves in the subtree rooted at that vertex. We denote by n the
number of leaves in the tree (the number of elements in the set that the tree represents). Assume
also that 𝑛 ≥ 2.
Given a 2-3 tree represented via the above data structure, complete the following pseudo-code
so that when the procedure is called with pnode, a pointer to the root, and a value
1 ≤ 𝑘 ≤ 𝑛, it returns the value of the k'th element among the tree's leaves (sorted
from smallest to largest). Alternatively, you may write your own pseudo-code (for a
procedure that is as efficient as possible) on the other side of the paper.
Rank-2-3(pnode,k) /* pnode is a pointer to a node of a 2-3-tree with the additional SIZE field,
and 1 ≤ 𝑘 ≤ 𝑝𝑛𝑜𝑑𝑒 → 𝑆𝐼𝑍𝐸 */
{
if (pnode → SIZE = 1)
return( ____________________________________________________)
if (k ≤__________________________________________________________________)
return (____________________________________ __________________________)
else if (k ≤_______________________________________________________________)
return (_______________________________________________________________)
else
return (_______________________________________________________________)
}
Explain shortly the algorithm and analyze its running time, meaning, what is 𝑔(𝑛) so that the
running time will be: Θ(𝑔(𝑛)).
_____________________________________________________________________________________________
Rank-2-3(pnode,k) /* pnode is a pointer to a node of a 2-3-tree with the additional SIZE field,
and 1 ≤ k ≤ pnode → SIZE */
{
if (pnode → SIZE = 1) /* Reached a leaf whose value should be returned (note that k must be 1) */
return (pnode → VAL)
if (k ≤ (pnode → LC) → SIZE) /* The element must be in the subtree of the left child, and its relative
rank remains k */
return (Rank-2-3(pnode → LC, k))
else if (k ≤ (pnode → LC) → SIZE + (pnode → MC) → SIZE) /* The element must be in the subtree
of the middle child, and its relative
rank in this subtree is k minus the
number of leaves in the subtree of the
left ("small") child */
return (Rank-2-3(pnode → MC, k - (pnode → LC) → SIZE))
else /* The element must be in the subtree of the right child, and its relative rank in this subtree is k
minus the total number of leaves in the subtrees of the left and middle children */
return (Rank-2-3(pnode → RC, k - (pnode → LC) → SIZE - (pnode → MC) → SIZE))
}
The running time of the procedure is linear in the height of the tree, meaning Θ(log(𝑛)).
Common mistakes:
Most of the students solved this question correctly; however, a number
of mistakes recurred:
1. The most common mistake was not updating the value of 𝑘 in the recursive
calls. In addition, there were students who did not understand the meaning of
SIZE and compared 𝑘 with the minimal value in one of the branches (for
example (pnode → LC) → min).
2. Some students wrongly assumed that the tree always splits evenly into
three parts, which led to a comparison between 𝑘 and the value $\frac{SIZE}{3}$, or to a
recursive call with values such as $2 \cdot \frac{SIZE}{3}$ in the place where 𝑘
is supposed to be.
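The rank procedure, including the update of k that the first common mistake omits, can be sketched in Python as follows. The classes `Leaf` and `Inner` are hypothetical stand-ins for the struct, and the sketch assumes that a vertex with only two children stores them in LC and MC and leaves RC empty.

```python
class Leaf:
    def __init__(self, val):
        self.val, self.size = val, 1


class Inner:
    def __init__(self, lc, mc, rc=None):
        self.lc, self.mc, self.rc = lc, mc, rc
        # SIZE: the number of leaves in the subtree rooted at this vertex.
        self.size = lc.size + mc.size + (rc.size if rc else 0)


def rank_2_3(pnode, k):
    """Value of the k'th smallest leaf in the subtree of pnode (1 <= k <= pnode.size)."""
    if pnode.size == 1:                       # a leaf, so k must be 1
        return pnode.val
    if k <= pnode.lc.size:
        return rank_2_3(pnode.lc, k)
    if k <= pnode.lc.size + pnode.mc.size:
        return rank_2_3(pnode.mc, k - pnode.lc.size)
    return rank_2_3(pnode.rc, k - pnode.lc.size - pnode.mc.size)


root = Inner(
    Inner(Leaf(10), Leaf(20)),
    Inner(Leaf(30), Leaf(40), Leaf(50)),
)
print([rank_2_3(root, k) for k in range(1, root.size + 1)])
```

Each call does constant work and descends one level, so the number of calls equals the height of the tree plus one.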
Question 4 (15 points)
Apply one iteration of the Ford-Fulkerson algorithm to the network below (in (1)) with the given
marked flow. In other words, draw the residual network including its residual capacities (in (2)), find an
augmenting path (any) and update the flow (in (3)). Does the flow you get have maximal value?
If your answer is "yes", prove it by marking a cut (S,T) for which $|f| = c(S, T)$; if your answer is
"no", show how it is possible to get a flow with a higher value.
As a reminder, the label $x/y$ on an edge $(u, v)$ means that $c(u, v) = y$ and $f(u, v) = x$.
If only one number is written, then it is the capacity and the flow is 0 (if there is no edge from $u$ to
$v$ then $c(u, v) = 0$, and always $f(u, v) = -f(v, u)$).
(1) Network G and flow f [figure not reproduced]
(2) Residual network 𝑮𝒇 [empty drawing area]
(3) Network G and new flow f [empty drawing area]
The augmenting path from 𝑠 to 𝑡 in the residual network on the left is shown in bold.
The flow is maximal: the capacity of the presented cut is $c(s, a) + c(s, e) + c(i, t) = 3 + 2 + 3 = 8$,
and that is also the value of the flow. The nodes of $S = \{s, c, h, i\}$ in the cut (marked in
red) are the nodes to which there is a path from 𝑠 in the residual graph on the left.
[Solution figures, not reproduced: (1) Network G and flow f; (2) Residual network 𝑮𝒇 with the augmenting path in bold; (3) Network G and new flow f with the cut marked. The vertices are s, a, b, c, d, e, g, h, i, t.]
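One Ford-Fulkerson iteration and the cut argument can be sketched in Python. Because the exam's network is given only as a figure, the sketch uses a small hypothetical network, and all names (`residual`, `reachable`, `augment_once`, `cut_capacity`) are assumptions. Flow is stored as non-negative values on the original edges, so the residual capacity is c(u,v) − f(u,v) + f(v,u).

```python
from collections import deque


def residual(cap, flow):
    """Residual capacities c_f(u, v) for all pairs with c_f(u, v) > 0."""
    nodes = {x for edge in cap for x in edge}
    cf = {}
    for u in nodes:
        for v in nodes:
            c = cap.get((u, v), 0) - flow.get((u, v), 0) + flow.get((v, u), 0)
            if c > 0:
                cf[(u, v)] = c
    return cf


def reachable(cf, s):
    """BFS parents of all nodes reachable from s in the residual network."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for (x, v) in cf:
            if x == u and v not in parent:
                parent[v] = u
                queue.append(v)
    return parent


def augment_once(cap, flow, s, t):
    """One iteration: find an augmenting path and update flow; False if none exists."""
    cf = residual(cap, flow)
    parent = reachable(cf, s)
    if t not in parent:
        return False
    path, v = [], t
    while parent[v] is not None:
        path.append((parent[v], v))
        v = parent[v]
    delta = min(cf[e] for e in path)
    for (u, v) in path:
        back = min(delta, flow.get((v, u), 0))   # first cancel flow on the reverse edge
        if back:
            flow[(v, u)] -= back
        if delta - back:
            flow[(u, v)] = flow.get((u, v), 0) + delta - back
    return True


def cut_capacity(cap, S):
    return sum(c for (u, v), c in cap.items() if u in S and v not in S)


# Hypothetical network, not the one from the exam figure.
cap = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
flow = {}
while augment_once(cap, flow, "s", "t"):
    pass
value = sum(f for (u, v), f in flow.items() if u == "s")
S = set(reachable(residual(cap, flow), "s"))
print(value, sorted(S), cut_capacity(cap, S))
```

When no augmenting path remains, the set S of nodes reachable from s in the residual network defines a cut whose capacity equals the flow value, which is the certificate of maximality that the question asks for.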
Question 5 (35 points)
Let $S = S_1 \ldots S_t$ be a DNA string and let $I$ be an array of $n \ge 2$ distinct indices, sorted from
smallest to largest, such that $I[1] = 1$ and $I[n] = t + 1$. There is a (biological) segmentation
process that, given a substring $S_i \ldots S_k$ and an index $i < j \le k$, divides the substring into two
substrings $S_i \ldots S_{j-1}$ and $S_j \ldots S_k$, where the cost of the division is $\max(j - i, k - j + 1)$
(that is, the maximum of the lengths of the two resulting substrings). We would like to
divide $S$ into the $n - 1$ substrings defined by the array $I$, that is, into $S^1 = S_{I[1]} \ldots S_{I[2]-1}$,
$S^2 = S_{I[2]} \ldots S_{I[3]-1}$, …, $S^{n-1} = S_{I[n-1]} \ldots S_{I[n]-1}$ (remember that $I[1] = 1$ and
$I[n] = t + 1$), by a sequence of divisions whose total cost is minimal.
For example, let 𝑡 = 10 and 𝑛 = 4:
𝑆 = A C C G T A C G T A (positions 1, …, 10)
𝐼 = [1, 3, 6, 11] (indexes 1, …, 4)
And the final division should be:
𝑆^1 = 𝑆_1 𝑆_2, 𝑆^2 = 𝑆_3 𝑆_4 𝑆_5, 𝑆^3 = 𝑆_6 … 𝑆_10
If the order of the divisions is (from left to right) 𝐼[2], 𝐼[3], then we first divide 𝑆 at 𝐼[2] = 3 into the
two sub strings 𝑆_1 𝑆_2 and 𝑆_3 … 𝑆_10 (cost = 8), and then we divide the second sub string at 𝐼[3] = 6 into
𝑆_3 𝑆_4 𝑆_5 and 𝑆_6 … 𝑆_10 (cost = 5).
Therefore the total cost would be 8 + 5 = 13
While if the order is 𝐼[3], 𝐼[2], we first divide 𝑆 at 𝐼[3] = 6 into 𝑆_1 … 𝑆_5 and 𝑆_6 … 𝑆_10 (cost = 5),
and then divide the first sub string at 𝐼[2] = 3 into 𝑆_1 𝑆_2 and 𝑆_3 𝑆_4 𝑆_5 (cost = 3).
Therefore the total cost would be 5 + 3 = 8
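The two totals in the example can be re-derived mechanically. The following is a minimal sketch, assuming a hypothetical helper `sequence_cost` (not part of the exam) that receives the array 𝐼 as a Python list and the order of the cuts as 1-based indexes into 𝐼, and that every index in the order is an inner index of 𝐼 used exactly once.

```python
from bisect import bisect_left, insort

def sequence_cost(I, order):
    """Total cost of cutting at I[q] for each 1-based index q in `order`.
    Assumes every q is an inner index (1 < q < n) and appears once."""
    boundaries = [I[0], I[-1]]       # current sub string boundaries, kept sorted
    total = 0
    for q in order:
        cut = I[q - 1]               # the exam's I is 1-based, Python lists are 0-based
        k = bisect_left(boundaries, cut)
        left, right = boundaries[k - 1], boundaries[k]   # enclosing sub string
        total += max(cut - left, right - cut)            # lengths of the two parts
        insort(boundaries, cut)
    return total

I = [1, 3, 6, 11]                    # the example: t = 10
print(sequence_cost(I, [2, 3]))      # 13
print(sequence_cost(I, [3, 2]))      # 8
```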
We would first like to calculate the minimal cost of a sequence of divisions defined by the array 𝐼,
and then we would like to find a sequence of divisions that gives this minimal value. For each
pair 1 ≤ 𝑝 < 𝑟 ≤ 𝑛, let 𝐶(𝑝, 𝑟) be the minimal cost of dividing the sub string 𝑆_{𝐼[𝑝]} … 𝑆_{𝐼[𝑟]−1}
into the 𝑟 − 𝑝 sub strings 𝑆_{𝐼[𝑝]} … 𝑆_{𝐼[𝑝+1]−1}, 𝑆_{𝐼[𝑝+1]} … 𝑆_{𝐼[𝑝+2]−1}, … , 𝑆_{𝐼[𝑟−1]} … 𝑆_{𝐼[𝑟]−1}.
You must explain your answers clearly for each of the following sub questions.
a. What is 𝐶(𝑝, 𝑝 + 1) for each 𝑝?
_____________________________________________________________________________________________
a. According to its definition in the question, 𝐶(𝑝, 𝑟) is the minimal cost of dividing the sub string
𝑆_{𝐼[𝑝]} … 𝑆_{𝐼[𝑟]−1}. When 𝑟 = 𝑝 + 1 there is a single sub string that does not need to be divided, and therefore the
price is 𝐶(𝑝, 𝑝 + 1) = 0.
Common mistakes:
Some understood that for 𝑟 = 𝑝 + 1 we get a single sub string, and then claimed that the price
for it is 𝐶(𝑝, 𝑝 + 1) = 𝐼[𝑝 + 1] − 𝐼[𝑝]. Although no division takes place here and so there is no price, no points
were deducted for such an answer since it can be seen as a legitimate interpretation. Others
claimed that the price is 1; this did not come from understanding that there is a single sub string,
but from a wrong interpretation that confuses the number of sub strings with their length.
[Figure: dividing 𝑆 first at 𝐼[3] = 6 (cost = 5), then dividing the first sub string at 𝐼[2] = 3 (cost = 3).]
b. Explain why the following equation is true for any pair (𝑝, 𝑟) such that 𝑟 > 𝑝 + 1:
𝐶(𝑝, 𝑟) = min_{𝑝<𝑞<𝑟} { max(𝐼[𝑞] − 𝐼[𝑝], 𝐼[𝑟] − 𝐼[𝑞]) + 𝐶(𝑝, 𝑞) + 𝐶(𝑞, 𝑟) }
_____________________________________________________________________________________________
b. In order to determine the optimal price for dividing a sub string 𝑆_{𝐼[𝑝]} … 𝑆_{𝐼[𝑟]−1} (into 𝑟 − 𝑝 > 1
sub strings) we examine all the possibilities for the first division. In other words, for every 𝑞
between 𝑝 + 1 and 𝑟 − 1, the first division could be into 𝑆_{𝐼[𝑝]} … 𝑆_{𝐼[𝑞]−1} and 𝑆_{𝐼[𝑞]} … 𝑆_{𝐼[𝑟]−1}. The
price of such a division is the maximum of the lengths of the two obtained sub strings,
meaning max(𝐼[𝑞] − 𝐼[𝑝], 𝐼[𝑟] − 𝐼[𝑞]). Given a specific choice of 𝑞 for the first division, the two
obtained sub strings are then divided independently of each other, so the cost of an optimal continuation is the sum of
𝐶(𝑝, 𝑞), the optimal cost of dividing the sub string 𝑆_{𝐼[𝑝]} … 𝑆_{𝐼[𝑞]−1}, and 𝐶(𝑞, 𝑟), the optimal cost of
dividing the sub string 𝑆_{𝐼[𝑞]} … 𝑆_{𝐼[𝑟]−1}. In order to get the overall optimum we need to take the
minimum, over all the choices for the first division (meaning, every 𝑞 that is strictly bigger than 𝑝
and strictly smaller than 𝑟), of the sum of the three terms above.
Common mistakes:
Quite a few wrote confusing or unclear answers. In general this was treated leniently;
however, there were cases in which the lack of clarity was severe and showed a
misunderstanding, and for that points were taken off.
Some claimed that the minimum is taken only over the first part of the expression
(the maximum of the lengths), and that 𝐶(𝑝, 𝑞) + 𝐶(𝑞, 𝑟) is then added for the 𝑞 that gives this minimum.
This is of course a severe mistake, since 𝑞 must be chosen so as to minimize the sum of the
three terms.
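To see concretely why the second mistake matters, the sketch below compares the recurrence with the mistaken rule that chooses 𝑞 by the max(…) term alone. It is an illustration only: the function names `optimal_cost` and `greedy_cost` and the input array are hypothetical and do not appear in the exam, and 𝐼 is stored as a 0-based Python list.

```python
from functools import lru_cache

def optimal_cost(I):
    """C(1, n) computed from the recurrence; I is a 0-based list holding I[1..n]."""
    @lru_cache(maxsize=None)
    def C(p, r):
        if r == p + 1:
            return 0
        return min(max(I[q] - I[p], I[r] - I[q]) + C(p, q) + C(q, r)
                   for q in range(p + 1, r))
    return C(0, len(I) - 1)

def greedy_cost(I):
    """The mistaken rule: choose q by the max(...) term alone, then recurse."""
    def G(p, r):
        if r == p + 1:
            return 0
        q = min(range(p + 1, r), key=lambda q: max(I[q] - I[p], I[r] - I[q]))
        return max(I[q] - I[p], I[r] - I[q]) + G(p, q) + G(q, r)
    return G(0, len(I) - 1)

I = [1, 10, 11, 12, 21]                  # hypothetical input with t = 20
print(optimal_cost(I), greedy_cost(I))   # 21 28
```

On this input the most balanced first cut (at position 11) costs 10 but leaves two expensive sub problems (9 each), while cutting first at position 10 or 12 costs 11 and leaves a total of only 10 for the rest.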
c. Write pseudo code for a procedure that runs in time polynomial in 𝑛 and calculates the
minimal cost of a series of divisions given by the array 𝐼.
_____________________________________________________________________________________________
c.
DNA_Segmentation(input: integer n, array of n integers I[])
/* Procedure that returns the cost of an optimal segmentation of a DNA strand (and prints the
segmentation itself). The matrix C is assumed to be a global matrix (as well as the matrix Break). */
{
    if (n < 3)
        return(0)
    for (p = 1 to n-1)                     /* Initialization */
        for (r = 1 to n)                   /* (Could also run just from r=p+1 to n) */
            if (r = p+1)
                C[p][r] := 0               /* By item a in the question */
            else
                C[p][r] := -1              /* Mark as uncomputed */
    DNA_Segmentation_Mem(1,n,I)            /* Computes C[1][n] by memoization */
    Print_Best_Segmentation(1,n,I,Break)   /* (for last item in the question) */
    return(C[1][n])
}
DNA_Segmentation_Mem(p,r,I)   /* Computes C[p][r] recursively with memory (memoization) */
{
    min_cost := ∞   /* ∞ stands for a cost larger than any possible segmentation cost, e.g.,
                       n·I[n], since each of the at most n-2 divisions costs less than I[n].
                       Alternatively can initialize with cost for q=p+1 */
    for (q = p+1 to r-1)   /* Computes cost for best first breaking point I[q] */
    {
        if (C[p][q] = -1)                  /* if C[p][q] has not been computed yet */
            DNA_Segmentation_Mem(p,q,I)    /* Computes and fills in C[p][q] */
        if (C[q][r] = -1)                  /* if C[q][r] has not been computed yet */
            DNA_Segmentation_Mem(q,r,I)    /* Computes and fills in C[q][r] */
        new_cost := max(I[q] - I[p], I[r] - I[q]) + C[p][q] + C[q][r]
        if (new_cost < min_cost) {         /* Found better initial segmentation */
            min_cost := new_cost
            best_break := q                /* (for last item in the question)
                                              Can also just write Break[p][r]:=q */
        }
    }
    C[p][r] := min_cost
    Break[p][r] := best_break              /* (for last item in the question) */
}
It is also possible to solve in a bottom-up way as follows:
DNA_Segmentation_BU(n,I)
{
    for (p = 1 to n-1)                 /* Initialization */
        C[p][p+1] := 0
    for (d = 2 to n-1) {               /* d is the difference between r and p */
        for (p = 1 to n-d) {           /* so that r = p+d does not exceed n */
            r := p+d
            min_cost := ∞              /* Can also initialize with cost on q=p+1 */
            for (q = p+1 to r-1) {     /* Compute cost for best initial segmentation */
                new_cost := max(I[q] - I[p], I[r] - I[q]) + C[p][q] + C[q][r]
                if (new_cost < min_cost)   /* Found better initial segmentation */
                    min_cost := new_cost
            }                          /* End 'for' on q */
            C[p][r] := min_cost
        }                              /* End 'for' on p */
    }                                  /* End 'for' on d */
    return(C[1][n])
}
Moreover, it is possible to perform the bottom-up computation according to another (correct) order of
computation: in particular, when 𝑝 runs from bigger to smaller and 𝑟 from smaller to bigger.
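For readers who want to run the bottom-up idea, here is a minimal Python sketch of it. It is not the exam's official solution: the function name `min_division_cost` is hypothetical, 𝐼 is stored as a 0-based list (so the exam's 𝐶[1][𝑛] becomes `C[0][n - 1]`), and the inner loop over 𝑞 is written with `min`.

```python
def min_division_cost(I):
    """Bottom-up computation of C; I is a 0-based list holding the exam's I[1..n]."""
    n = len(I)
    # C[p][r] is meaningful for 0 <= p < r < n; adjacent pairs cost 0 (item a).
    C = [[0] * n for _ in range(n)]
    for d in range(2, n):                  # d = r - p, shortest intervals first
        for p in range(0, n - d):          # keeps r = p + d inside the array
            r = p + d
            C[p][r] = min(
                max(I[q] - I[p], I[r] - I[q]) + C[p][q] + C[q][r]
                for q in range(p + 1, r)   # every possible first cut
            )
    return C[0][n - 1]

print(min_division_cost([1, 3, 6, 11]))    # 8, as in the example of the question
```

Iterating by increasing d guarantees that `C[p][q]` and `C[q][r]`, which both span fewer than d indexes, are already filled in when `C[p][r]` is computed.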
Common mistakes:
1. Not returning 𝐶[1][𝑛] at the end of the bottom-up version.
2. Calling the recursive procedure with wrong parameters in the memoization
version.
3. Loop boundaries that are not stated explicitly (in particular those of 𝑞).
4. Missing or wrong initialization of min_cost (for example to 0).
5. A bottom-up calculation in a wrong order, which uses values in the matrix
that have not yet been calculated.
6. A recursive version that does not use memory; the running time in such a case
would be exponential.
7. Severe confusion in the use of the indexes, which suggests an attempt to fit
the solution of a different problem to the current one without true
understanding.
8. Instead of calculating the minimum with a loop that runs over 𝑞
(between 𝑝 + 1 and 𝑟 − 1), simply writing the formula (very few
did that). There were also (very few) who wrote the loop but in addition left
the min in place without explaining what the minimum is taken over.
d. Give the best possible upper bound on the running time of the procedure you wrote.
_____________________________________________________________________________________________
d.
For the memoization version: the running time of the initialization loop is Θ(𝑛^2) since it consists of two
nested for loops. Since before each call to DNA_Segmentation_Mem with the pair 𝑝, 𝑞 (or 𝑞, 𝑟)
we check that 𝐶[𝑝][𝑞] (similarly 𝐶[𝑞][𝑟]) has not yet been calculated, for every such pair there is at
most one call to the procedure. The running time of the procedure with the parameters 𝑝, 𝑟, not counting
the recursive calls it makes, is linear in 𝑟 − 𝑝, which is at most 𝑛 (only calls with 𝑟 > 𝑝 are made, according to the boundaries
of the for loop). The total running time is bounded from above by the number of
pairs, 𝑛^2, times the running time for each pair, which is at most 𝑐𝑛 for some constant 𝑐, and we
get 𝑂(𝑛^3). A somewhat more exact computation gives an order of magnitude of:
\[
\sum_{p=1}^{n-1} \sum_{r=p+1}^{n} (r-p) = \sum_{l=1}^{n-1} \sum_{d=1}^{n-l} d \le \sum_{l=1}^{n} \sum_{d=1}^{n} d = \sum_{l=1}^{n} \frac{n(n+1)}{2} = O(n^3)
\]
where we used the substitution 𝑑 = 𝑟 − 𝑝 (and renamed 𝑝 to 𝑙) in developing the series.
For the bottom-up version we get the same bound more simply: there are 3 nested loops, each running
at most 𝑛 iterations.
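As a side check that is not part of the original solution, the double sum over the pairs (𝑝, 𝑟) can also be evaluated exactly by grouping the pairs according to 𝑑 = 𝑟 − 𝑝 (there are 𝑛 − 𝑑 pairs for each 𝑑), which shows that the cubic bound is tight for this sum:

```latex
\[
\sum_{p=1}^{n-1}\sum_{r=p+1}^{n}(r-p)
  = \sum_{d=1}^{n-1}(n-d)\,d
  = n\cdot\frac{(n-1)n}{2} - \frac{(n-1)n(2n-1)}{6}
  = \frac{(n-1)\,n\,(n+1)}{6}
  = \Theta(n^3)
\]
```

For 𝑛 = 4 this gives 3 ∙ 4 ∙ 5 / 6 = 10, which matches a direct count: 3 pairs with 𝑑 = 1, 2 pairs with 𝑑 = 2 and 1 pair with 𝑑 = 3.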
e. Show, by writing another procedure if needed, an addition to the pseudo code that you
wrote (that does not change the asymptotic running time of the algorithm) that also
finds the optimal sequence of divisions (and not just its cost). In other words, the output of the
procedure is a series of tuples (𝑖_1, 𝑗_1, 𝑘_1), (𝑖_2, 𝑗_2, 𝑘_2), … , (𝑖_{𝑛−2}, 𝑗_{𝑛−2}, 𝑘_{𝑛−2}), where (𝑖_𝑥, 𝑗_𝑥, 𝑘_𝑥)
means that the 𝑥-th division divides the sub string from place 𝑖_𝑥 to place 𝑘_𝑥 at place 𝑗_𝑥 (there are 𝑛 − 2
divisions, since each division adds one sub string and 𝑛 − 1 sub strings are required). In particular, for the first
division, 𝑖_1 = 1, 𝑘_1 = 𝑡. (For the example at the beginning of the question, the output would be (from left to
right): (1,6,10), (1,3,5)).
_____________________________________________________________________________________________
e.
The additions to the code in sub question c are the lines marked with the comment
"(for last item in the question)". In order to print the
optimal segmentation we use the following procedure:
Print_Best_Segmentation(p,r,I,Break)   /* Prints the optimal segmentation */
{
    if (r < p+2) return                    /* No further segmentation needed */
    q := Break[p][r]                       /* q is index of best initial segmentation point I[q] */
    Print(I[p],I[q],I[r]-1)
    Print_Best_Segmentation(p,q,I,Break)   /* Recursively print segmentation of first substring */
    Print_Best_Segmentation(q,r,I,Break)   /* Recursively print segmentation of second substring */
}
Common mistakes:
1. Missing or wrong stop condition.
2. Wrong order of operations (in particular, the third line in the above code,
the Print, appears at the end).
3. Printing (𝑝, 𝑞, 𝑟) instead of (𝐼[𝑝], 𝐼[𝑞], 𝐼[𝑟] − 1).
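The reconstruction can be tried out with the following Python sketch. It is an illustration under assumptions rather than the exam's solution: the function name `best_divisions` is hypothetical, 𝐼 is a 0-based list, the table `brk` plays the role of Break, and the tuples are collected in a list instead of being printed.

```python
def best_divisions(I):
    """Returns (minimal cost, list of (i, j, k) tuples in the order of the divisions).
    I is a 0-based list holding the exam's I[1..n]."""
    n = len(I)
    C = [[0] * n for _ in range(n)]
    brk = [[None] * n for _ in range(n)]   # brk[p][r]: best first cut index for (p, r)

    def cost(p, q, r):
        return max(I[q] - I[p], I[r] - I[q]) + C[p][q] + C[q][r]

    for d in range(2, n):                  # bottom-up by interval width
        for p in range(n - d):
            r = p + d
            q = min(range(p + 1, r), key=lambda q: cost(p, q, r))
            C[p][r], brk[p][r] = cost(p, q, r), q

    out = []

    def walk(p, r):                        # mirrors Print_Best_Segmentation
        if r < p + 2:
            return                         # a single sub string: nothing to divide
        q = brk[p][r]
        out.append((I[p], I[q], I[r] - 1)) # report positions in S, not indexes of I
        walk(p, q)
        walk(q, r)

    walk(0, n - 1)
    return C[0][n - 1], out

print(best_divisions([1, 3, 6, 11]))       # (8, [(1, 6, 10), (1, 3, 5)])
```

The walk visits each of the 𝑛 − 2 cuts exactly once, so it adds only 𝑂(𝑛) work on top of the computation of the table.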