splineregresion4

Upload: augusto-cabrera

Post on 07-Jan-2016

4 views

Category:

Documents


0 download

DESCRIPTION

spline regression

TRANSCRIPT

  • devosaurusdevosaurus

    T h u r s d a y , O c t o b e r 1 0 , 2 0 1 3

    Exploring B-Splines in PythonThis notebook is the first result of online exploration of material explaining B-Splines. As I get more familiar with them, I'll do other notebooks. I will document mysources and eventually add some notes about things that bothered me. This is only a reformulation with my own terms of the vast b-spline documentation around theweb. I try to explain the pitfalls and complete the ellipsis that other websites do. However, I'm not a mathematician so I might get some things wrong. But I reallywant this to be as precise yet understandable as possible using both precise terms and common language. If you spot errors, misconceptions or cryptic parts, pleasedo leave a comment.

    The GistIn NBViewer

    Used Tools:

    Python 3NumpyMatplotlibIPython

    I won't detail their installation but have a look at IPython's installation guide. I prefer running it inside a VirtualEnv. Once installed, run IPython with:

    ipython3 notebook --pylab=inline

    We will draw these types of curves:

    B-splines are mathematical objects which are very useful in computer aided design applications. They are a generalization of Bezier curves which were developpedfor the car industry in the mid 1900's. The user typically manipulates the vertices (control points) of a polyline and the b-spline will gracefully and smoothlyinterpolate those points. B-splines have nice curvature properties that make them very useful for design and industry and can be easily used in 2D or 3D (and maybemore...). They were further generalized into Non-Uniform Rational B-splines, yep: NURBs.

    A b-spline curve is a parametric curve whose points' coordinates are determined by b-spline functions of a parameter usually called $t$ (for time - this comes fromphysics and motion description). More precisely: A $D$-Dimensionnal b-spline curve is a parametric curve whose $D$ component functions are b-spline functions.Let's pretend $C(t)$ is a 2D B-Spline curve (each of its points has two coordinates) :

    eq. 1 : $C(t) = ( S_x(t), S_y(t) )$

    $S_x$ and $S_y$ are the component functions of $C$ for the $x$ and $y$ coordinates respectively. In other words, $S_x(t)$ and $S_y(t)$ will yield the x and ycoordinates of $C(t)$'s points. They are the actual b-spline functions.

    But $t$ can't define the shape of $C$ all alone. A b-spline curve is also defined by:

    $P$ : an $m$-sized list of control points. This defines which points the curve will interpolate. For example $P=[(2,3),(4,5),(4,7),(1,5)]$ be a list of $m=4$vertices with two de Boor points each ($D=2$). Graphically, $P$ forms the control polygone or control cage of the curve.$V$ : a $k$-sized list of non-decreasing real values called knot vector. It's the key to b-splines but is quite abstract. Knot vectors are very intrically tiedto...$n$ : the degree of the function. A higher degree means more derivability.

    $S_x$ and $S_y$ are the same function $S$ which simply processes different de Boor points (different coordinates of $P$'s points). So for the 2D curve $C$:

    eq. 2 : $C_{P,V,n} (t) = ( S_{Px,V,n} (t), S_{Py,V,n} (t) )$

    Note that $S$ is "configured" (subscripted) like $C$ except that for the $x$ component function it takes the $x$ coordinate of points in $P$, thus the $Px$ subscript,the same goes for the $y$ component/coordinate. This is exactly what I meant above by "processes different de Boor points". These two sentences are equivalent. Ialso didn't put $P$, $V$ and $n$ as parameters because when you evaluate $C$, you evaluate $S$ and you evaluate it for a varying $t$, while all other inputs remainconstant. This expression makes sense when a curve is to be evaluated more often than its control points are modified. Now we just need to know the definition of aB-spline function.

    The mathematical definition of a B-Spline of degree $n$ as found in literature [Wolfram,Wikipedia] is

    eq. 3 : $S_{n}(t) = \sum \limits_i^m p_i b_{i,n}(t)$

    $b$ is the b-spline basis function, we're getting close! $p_i$ is the vertex of index i in $P$. This means eq. 3 is for one dimension! Let's express it for curve of$D$-dimensions.

    eq. 4 : $\forall d \in \{1,...,D\}, C_{P,V,n} = (S_{n,d_1}(t), ...,S_{n,d_D}(t) ) = (\sum \limits_i^m p_{i,d_1} b_{i,n}(t), ..., \sum \limits_i^m p_{i,d_D}b_{i,n}(t))$

    Introduction to B-Splines curves by a dummy

    Parametric curves

    B-Spline Function

    HomeOn Github

    Pages

    2013 (4) October (1)

    ExploringB-Splinesin Python

    September(3)

    2012 (4)

    Blog Archive

    allocation (1)bamboo (1)blender (1)bsplines (1)c (1)c++ (1)calloc (1)compiler (1)dragonegg (1)gcc (1)ipython (1)it (2)linux (1)llvm (2)matplotlib (1)memoize (1)new (1)notebook (1)optimize (2)parametric (1)python (1)wacom (1)wireless (1)

    Labels

    devosaurus: Exploring B-Splines in Python http://devosaurus.blogspot.mx/2013/10/exploring-...

    1 of 9 08/19/2015 05:57 PM

  • Roughly, $S$ is a sum of monomials, making it a polynomial! :) There is one monomial for each index of a vertex and each monomial is the product of the de Boorpoint at that index and the basis function at $t$ for that index. This is repeated for $D$ dimensions. So computing b-splines this way is quite slow with many loops,etc... but the implementation is easy and there are ways to speed it up (like memoization [Wikipedia, StackOverflow]).

    Okay, time to translate eq. 4 to Python even though we don't know what $b$ is exactly. I'll try to follow the same naming but keep in mind that Python uses0-indexing while maths use 1-indexing, so we shall pay attention to the iterations and array sizes.

    In [3]:

    def C_factory(P, V=None, n=2): """ Returns a b-spline curve C(t) configured with P, V and n. Parameters ========== - P (list of D-tuples of reals) : List of de Boor points of dimension D. - n (int) : degree of the curve - V (list of reals) : list of knots in increasing order (by definition). Returns ======= A D-dimensionnal B-Spline curve. """ # TODO: check that p_len is ok with the degree and > 0 m = len(P) # the number of points in P D = len(P[0]) # the dimension of a point (2D, 3D) # TODO: check the validity of the input knot vector. # TODO: create an initial Vector Point. ############################################################################# # The following line will be detailed later. # # We create the highest degree basis spline function, aka. our entry point. # # Using the recursive formulation of b-splines, this b_n will call # # lower degree basis_functions. b_n is a function. # ############################################################################# b_n = basis_factory(n) def S(t, d): """ The b-spline funtion, as defined in eq. 3. """ out = 0. for i in range(m): #: Iterate over 0-indexed point indices out += P[i][d]*b_n(t, i, V) return out def C(t): """ The b-spline curve, as defined in eq. 4. """ out = [0.]*D #: For each t we return a list of D coordinates for d in range(D): #: Iterate over 0-indexed dimension indices out[d] = S(t,d) return out #################################################################### # "Enrich" the function with information about its "configuration" # #################################################################### C.V = V #: The knot vector used by the function C.spline = S #: The spline function. C.basis = b_n #: The highest degree basis function. Useful to do some plotting. C.min = V[0] #: The domain of definition of the function, lower bound for t C.max = V[-1] #: The domain of definition of the function, upper bound for t C.endpoint = C.max!=V[-1] #: Is the upper bound included in the domain. return C

    C_factory is a high-order function: it is a function which returns a function. The returned C function is a closure, it inherits the namespace of itsdefinition context and holds reference to those variables.In Python 3, the range(start, stop, step) function returns an iterator which iterates over integers like: $[0,X)$. This spares us much fiddling with"-1" adjustments all over the place. In Python 2, there it is called xrange(start, stop, step).

    Right, good question. $V$ is a list whose length is $k$ (see eq. 6), made of reals $t_q$ (called knots) with $q \in [1, k]$ where $t_q \leq t_{q+1}$. In other wordsvalues must never decrease. The current $t$ must lay somewhere in $[t_1, t_k)$. During the calculations $t$ will be compared to all the knot spans and if it falls inthose spans, then the basis function will have a certain influence on the final curve. It determines which vertices (de Boor points) will influence the curve at a given$t$.

    The distribution of the knots in $V$ will determine if the curve meets the endpoints of the control polyline (it is then clamped), or not (unclamped - the domain ofdefinition is shorter). It can be periodic or not if some knot spans repeat themselves, and uniform or not if the knot spans have a constant size, but more on thislater when I wrap my head around it a bit more! Just remember that the distribution of knots will depend on the type of the knot vector. We will focus on clampedknot vectors which are a special case of b-splines behaving like Bezier curves.

    A clamped knot vector looks like this:

    eq. 5 : $ V = \{t_1 = ... = t_n, ..., t_{k-n-1}, t_{k-n} = ... = t_{k} \} $Where, as a reminder, $k$ is the length of $V$, where $n$ is the degree of the curve. and the length of a knot vector is totally conditionned by the type of $V$, thedegree of the curve and the number of control points.

    The $t_1, ... ,t_n$ and $t_{k-n} = ... = t_k$ are called external knots, the others are internal knots.eq. 6 : $k = m + n + 2$

    We can now implement a basic make_knot_vector function which only handles clamped knot vectors.In [4]:

    def make_knot_vector(n, m, style="clamped"): """ Create knot vectors for the requested vector type.

    Some python background

    What is the Knot Vector?

    Clamped Knot Vector

    devosaurus: Exploring B-Splines in Python http://devosaurus.blogspot.mx/2013/10/exploring-...

    2 of 9 08/19/2015 05:57 PM

  • Parameters ========== - n (int) : degree of the bspline curve that will use this knot vector - m (int) : number of vertices in the control polygone - style (str) : type of knot vector to output Returns ======= - A knot vector (tuple) """ if style != "clamped": raise NotImplementedError total_knots = m+n+2 # length of the knot vector, this is exactly eq.6 outer_knots = n+1 # number of outer knots at each of the vector. inner_knots = total_knots - 2*(outer_knots) # number of inner knots # Now we translate eq. 5: knots = [0]*(outer_knots) knots += [i for i in range(1, inner_knots)] knots += [inner_knots]*(outer_knots) return tuple(knots) # We convert to a tuple. Tuples are hashable, required later for memoization

    Basis functions are the functions that will be multiplied and summed in the B-Spline function.

    The b-spline basis function is defined as follows:

    eq. 7 : $b_{i,1}(x) = \left \{ \begin{matrix} 1 & \mathrm{if} \quad t_i \leq x < t_{i+1} \\0 & \mathrm{otherwise} \end{matrix} \right.$eq. 8 : $b_{i,k}(x) = \frac{x - t_i}{t_{i+k-1} - t_i} b_{i,k-1}(x) + \frac{t_{i+k} - x}{t_{i+k} - t_{i+1}} b_{i+1,k-1}(x).$

    There is really not much to say about them right now. Sometime later we might try to understand them, but we're fine with this definition right now. However, I didget confused as some sites write eq. 7 this way:

    $b_{i,1}(x) = \left \{ \begin{matrix} 1 & \mathrm{if} \quad t_i \leq x \leq t_{i+1} \\0 & \mathrm{otherwise} \end{matrix} \right.$

    This is wrong (the double $\leq$), at least as far as I could see : graphically, spikes appear on the curve at knot values. It makes sense since at knots the sum (eq. 3)will cumulate values from different intervals because $b_{i,1}$ will evaluate to 1 instead of 0 at internal knots.

    It is a high-order function! It will return b-spline basis functions b_n of degree $n$ which will be used by S. We are ready to implement it. We need to translate eq.7 and eq. 8 to Python. Again, we must be extra careful with indexing.In [5]:

    def basis_factory(degree): """ Returns a basis_function for the given degree """ if degree == 0: def basis_function(t, i, knots): """The basis function for degree = 0 as per eq. 7""" t_this = knots[i] t_next = knots[i+1] out = 1. if (t>=t_this and t< t_next) else 0. return out else: def basis_function(t, i, knots): """The basis function for degree > 0 as per eq. 8""" out = 0. t_this = knots[i] t_next = knots[i+1] t_precog = knots[i+degree] t_horizon = knots[i+degree+1]

    top = (t-t_this) bottom = (t_precog-t_this) if bottom != 0: out = top/bottom * basis_factory(degree-1)(t, i, knots) top = (t_horizon-t) bottom = (t_horizon-t_next) if bottom != 0: out += top/bottom * basis_factory(degree-1)(t, i+1, knots) return out #################################################################### # "Enrich" the function with information about its "configuration" # #################################################################### basis_function.lower = None if degree==0 else basis_factory(degree-1) basis_function.degree = degree return basis_function

    We now have all that is needed to start drawing clamped b-splines!

    In [6]:

    # We decide to draw a quadratic curve

    B-spline basis function

    Definition

    What is basis_factory(n) ?

    The puzzle is complete

    Constructing the spline curve

    devosaurus: Exploring B-Splines in Python http://devosaurus.blogspot.mx/2013/10/exploring-...

    3 of 9 08/19/2015 05:57 PM

  • n = 2# Next we define the control points of our curveP = [(3,-1), (2.5,3), (0, 1), (-2.5,3), (-3,-1)]# Create the knot vectorV = make_knot_vector(n, len(P), "clamped")# Create the Curve functionC = C_factory(P, V, n)

    $C(t)$ is ready to be sampled (evaluated)! That means that we are ready to feed it with values for $t$ and getting point coordinates out. In the case of clampedcurves the domain of definition of $C$ - that is: the values for which $t$ is valid - is $[min(V), max(V))$. These bounds are available as C.min and C.max memberattributes of C. Do NOT sample at the endpoint, the curve is not defined there in the case of clamped knot vectors. There are situations where it is possible tosample at C.max but this value will be different to $max(V)$.There are many ways to sample a function. In general most effort is put in strategies the put more samples (decreasing sample steps) in curvy areas and less instraight lines. This can be done by analyzing the polyline angles for example. The best sampling visually is when each display point was tested for being a point ofthe curve.

    These are complicated strategies that we might explore later. For now we will use a linear sampling of the definition interval.

    In [7]:

    # Regularly spaced samplessampling = [t for t in np.linspace(C.min, C.max, 100, endpoint=C.endpoint)]# Sample the curve!!!!curvepts = [ C(s) for s in sampling ]

    # Create a matplotlib figurefig = plt.figure()fig.set_figwidth(16)ax = fig.add_subplot(111)# Draw the curve pointsax.scatter( *zip(*curvepts), marker="o", c=sampling, cmap="jet", alpha=0.5 )# Draw the control cage.ax.plot(*zip(*P), alpha=0.3)Out[7]:

    []

    Mission complete! We can now draw clamped 2D Curves!

    Compared to Bezier curves, a B-Spline of degree $n$ can interpolate a polyline of an almost arbitrary number of points without having to raise the degree. WithBezier curves, to interpolate a polyline of $m$ vertices you need a curve of $n=m-1$ degree. Or you do a "poly-bezier" curve: you connect a series of lower-degreebezier curves and put effort in keeping control points correctly placed to garantee the continuity of the curve at the "internal endpoints".

    In our example, the endpoint is on the left (dark red). It is quite clear that the curve's end doesn't match that of the polyline. This is because in our basic samplingexample, we didn't include the endpoint (when $t=max(V)$). And for a good reason : the curve is not defined there. However, we could include a sample $t_f$ veryclose to $max(V)$:

    $t_f=max(V)-\epsilon $

    In our example we sampled one short b-spline of low-degree and with a decent amount of samples and it went quickly. However, if you add more curves and samplethem in a similar way, you will notice the slowness of the calculations. Memoization is interesting in our case for several reasons:

    the recursive definition of $b_n$ (eq. 8) and its implementation (see basis_function(n)) imply many requests to create lower-level basis functions.the fact that the same $S(t)$ is used for the different components (coordinates) of the curve points implies many calls to the basis_functions with thesame exact input. So in a 2D case, for each point we save one full computation of $b_n$.if we consider that the curve will be more often sampled than modified (eg: adaptive sampling in a resolution-aware drawing system to get exactrepresentation of the curve) then there are chances that the curve will be sampled many times for the same $t$.

    This gives serious performance improvements at the expense of memory.

    See the final listing for an implementation with memoization. The creation of the knot vector was moved inside the C_factory function. A helper function wascreated for drawing purpose. It includes a dirty fixed hack to approach the endpoint too.

    Wolfram Mathworld's b-spline articleWikipedia's b-spline articleWikipedia Memoization articleStackOverflow Python Memoization article

    Sampling the curve

    Short discussionDegree

    Ugly endpoint

    Computation

    Complete implementation with memoization

    References

    devosaurus: Exploring B-Splines in Python http://devosaurus.blogspot.mx/2013/10/exploring-...

    4 of 9 08/19/2015 05:57 PM

  • In [8]:

    def memoize(f): """ Memoization decorator for functions taking one or more arguments. """ class memodict(dict): def __init__(self, f): self.f = f def __call__(self, *args): return self[args] def __missing__(self, key): ret = self[key] = self.f(*key) return ret return memodict(f)

    def C_factory(P, n=2, V_type="clamped"): """ Returns a b-spline curve C(t) configured with P, V_type and n. The knot vector will be created according to V_type Parameters ========== - P (list of D-tuples of reals) : List of de Boor points of dimension D. - n (int) : degree of the curve - V_type (str): name of the knit vector type to create. Returns ======= A D-dimensionnal B-Spline Curve. """ # TODO: check that p_len is ok with the degree and > 0 m = len(P) # the number of points in P D = len(P[0]) # the dimension of a point (2D, 3D) # Create the knot vector V = make_knot_vector(n, m, V_type) # TODO: check the validity of the input knot vector. # TODO: create an initial Vector Point. ############################################################################# # The following line will be detailed later. # # We create the highest degree basis spline function, aka. our entry point. # # Using the recursive formulation of b-splines, this b_n will call # # lower degree basis_functions. b_n is a function. # ############################################################################# b_n = basis_factory(n) @memoize def S(t, d): """ The b-spline funtion, as defined in eq. 3. """ out = 0. for i in range(m): #: Iterate over 0-indexed point indices out += P[i][d]*b_n(t, i, V) return out def C(t): """ The b-spline curve, as defined in eq. 4. """ out = [0.]*D #: For each t we return a list of D coordinates for d in range(D): #: Iterate over 0-indexed dimension indices out[d] = S(t,d) return out #################################################################### # "Enrich" the function with information about its "configuration" # #################################################################### C.P = P #: The control polygone C.V = V #: The knot vector used by the function C.spline = S #: The spline function. C.basis = b_n #: The highest degree basis function. Useful to do some plotting. C.min = V[0] #: The domain of definition of the function, lower bound for t C.max = V[-1] #: The domain of definition of the function, upper bound for t C.endpoint = C.max!=V[-1] #: Is the upper bound included in the domain. return C

    def make_knot_vector(n, m, style="clamped"): """ Create knot vectors for the requested vector type. Parameters ========== - n (int) : degree of the bspline curve that will use this knot vector - m (int) : number of vertices in the control polygone - style (str) : type of knot vector to output Returns ======= - A knot vector (tuple) """ if style != "clamped": raise NotImplementedError total_knots = m+n+2 # length of the knot vector, this is exactly eq.6 outer_knots = n+1 # number of outer knots at each of the vector. inner_knots = total_knots - 2*(outer_knots) # number of inner knots # Now we translate eq. 5: knots = [0]*(outer_knots) knots += [i for i in range(1, inner_knots)] knots += [inner_knots]*(outer_knots)

    Full implementation

    devosaurus: Exploring B-Splines in Python http://devosaurus.blogspot.mx/2013/10/exploring-...

    5 of 9 08/19/2015 05:57 PM

  • return tuple(knots) # We convert to a tuple. Tuples are hashable, required later for memoization

    @memoizedef basis_factory(degree): """ Returns a basis_function for the given degree """ if degree == 0: @memoize def basis_function(t, i, knots): """The basis function for degree = 0 as per eq. 7""" t_this = knots[i] t_next = knots[i+1] out = 1. if (t>=t_this and t

  • Posted by dani at 7:46 AM

    # Next we define the control points of our 3D curveP3D = [(3, 1, 8), (2.5, 3, 7), (0, 1, 6), (-2.5, 3, 5), (-3, 0, 4), (-2.5, -3, 3), (0, -1, 2), (2.5, -3, 1), (3, -1, 0)]# Create the a quadratic curve functiondraw_bspline(P=P3D, n=2, V_type="clamped")

    I decided to add this section a bit late. At first I was planning to use git commits to document changes but it is not possible. The revision number is the gist revisionnumber and I start at revision 11.

    r.14Fix details of range and xrange. This code is written for Python 3 and the info in that section was for Python 2.

    r.13More 1-indexing fixes in text, not in code.Corrections of some docstrings.Watch out for < and > in python code. For example "if d < myvar" won't interfer with the html layout when converting using nbconvert, but "if d

  • Older PostHome

    Subscribe to: Post Comments (Atom)

    Add comment

    dani January 29, 2014 at 2:06 PMHello Graham,

    Thank you for your comment. I am glad that this post was helpful. It did take me quite some time to digest all the sites I read about b-splines so I'm really glad it helped!

    Cheers!DanielReply

    nikkuen May 6, 2014 at 3:56 AMHi Dani,is it possible to export the spline's points?i would like to create a spline wiht python and then use it in a CAD-programm (ICEM CFD) as it is not possible to draw splines in a controlled way in ICEM. they are alwaysa bit random :)So is there a possibility to get the interpolated points not just in matplotlib but write them in list in python or in a csv-file?Are the points at all created in python or can I only see them in matplotlib? Is it possible to export the matplotlib-plot in a csv?thanks for your help.Reply

    dani May 6, 2014 at 7:03 AM@nikkuen : yes it is possible, here's the hack (I'm using ">" to indicate indentation level as I can't get Blogger to accept preformatted code)

    # First we modify draw_bspline so that it returns the sampled points.def draw_bspline(C=None, P=None, n=None, V_type=None, endpoint_epsilon=0.00001):> # ...> if C:>> # ...>> return curvepts

    Then, use it like this :

    # We open a output file, loop over the points and write down each coordinatesampledpts = draw_bspline(P=P, n=2, type='clamped'")with open("bspline.csv", "w") as f:> for point in sampledpts:>> for coord in point:>>> f.write(str(coord)+",")>> f.write("\n")

    There are other ways to write down the csv but this is the basic idea and you can customize the output to fit your needs. You can also have a look at python's "csv"module.Reply

    nikkuen May 7, 2014 at 12:29 AMIt works. Thanks a lot.I have another question: Do you know if there is a way to calculate the area emebedded by a closed spline?Reply

    dani May 7, 2014 at 7:46 AMThere certainly are ways but I haven't actually tried any! The best way would be to find a "closed form" solution, the bspline primitive, and implement it as a Pythonfunction that takes the same parameters as C but returns the area.Another hackish way would be to tesselate the closed (discrete) spline and sum the sub-triangle areas.Reply

    dani May 7, 2014 at 8:04 AMEar clipping (http://en.wikipedia.org/wiki/Polygon_triangulation) should be easy to do on the discrete bspline (sampledpts). One just must be careful to check that thethird edge of the current triangle is really inside the polygon not outside. If it's inside, then its area can be summed and the triangle can then be discarded.Reply

    devosaurus: Exploring B-Splines in Python http://devosaurus.blogspot.mx/2013/10/exploring-...

    8 of 9 08/19/2015 05:57 PM

  • MathJax,GooglePrettyPrint

    Daniel Barbeau Vasquez. Simple template. Powered by Blogger.

    devosaurus: Exploring B-Splines in Python http://devosaurus.blogspot.mx/2013/10/exploring-...

    9 of 9 08/19/2015 05:57 PM