Developing and releasing SOFA Statistics


Upload: grant-paton-simpson

Post on 16-Apr-2017


SOFA
Paton-Simpson & Associates Ltd
Auckland, New Zealand

SOFA Statistics

Developing and releasing
a Python open source application

Grant Paton-Simpson
sofastatistics.com


Overview

Introducing the SOFA Statistics application

How SOFA works with SQL databases

Using HTML for output (via wxWebKit)

Experience with existing statistics modules

wxPython GUI toolkit (esp the grid widget)

The release process (esp packaging)

In 30 minutes flat out!


Introducing SOFA Statistics

SOFA stands for Statistics Open For All

A cross-platform desktop application for:
    Making report tables from data (database, spreadsheet, directly entered)
    Producing charts
    Running key statistical tests

The slogan is ease of use, learn as you go, and beautiful output

May be useful for specialist statisticians but emphasis on supporting non-specialists, and learning statisticians

Currently at version 0.8.10 and pushing on towards a version 1.0 release

SOFA Architecture

[Diagram: SOFA sits between its data sources (SQLite, MySQL, MS Access, PostgreSQL, SQL Server, ...) and its HTML output]

Linking not importing

Scripts from GUI or by hand (available for automation)

HTML output (spreadsheet-friendly)


Working with SQL

Not using an abstraction layer. Wrote my own code using the MySQLdb module etc.
    Already experienced with SQL
    Want control over the information I get about data configuration (e.g. character set)
    Want control over how I interact with the databases for performance reasons

Adding other databases, e.g. Oracle, is a process of copying an existing module and changing the implementation

SQL databases do things very differently, e.g.
    SQLite and data type integrity (it behaves like dynamic typing)
    PostgreSQL and SUM(expression)
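The SQLite point can be seen directly from Python's standard sqlite3 module. The following is only a minimal sketch, not code from SOFA: the table name `demo`, the column `age` and the helper `stored_types` are made up for illustration. It inserts mixed values into a column declared INTEGER and asks SQLite how each one was actually stored.

```python
import sqlite3


def stored_types(values):
    """Insert values into an INTEGER column and report how SQLite stored them.

    Illustrative helper only (hypothetical name); uses an in-memory database.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE demo (age INTEGER)")
    con.executemany("INSERT INTO demo (age) VALUES (?)",
                    [(value,) for value in values])
    # typeof() returns the storage class SQLite actually used for each value
    rows = con.execute(
        "SELECT age, typeof(age) FROM demo ORDER BY rowid").fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for value, storage_class in stored_types([42, "42", "forty-two", 4.5]):
        print(repr(value), storage_class)
```

The declared column type acts as a preference rather than a constraint, so a front end that links to SQLite data has to do its own checking of what it reads back.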

HTML Report Table Output

Tree-based for each dimension (rows, cols)

Created an artificial limit of 5000 cells

Scales linearly and Python not the bottleneck

Rendered locally using wxWebKit
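One way to picture the tree-per-dimension idea and the cell limit is sketched below. This is an assumption-laden illustration, not SOFA's implementation: the class `DimNode` and the function `count_cells` are hypothetical names, and only the figure of 5000 comes from the slide. Each dimension is a tree of nested variables, each leaf becomes one row or column, and the table size is the product of the two leaf counts.

```python
MAX_CELLS = 5000  # the artificial limit quoted on the slide


class DimNode:
    """One node in a dimension tree (hypothetical class, not SOFA's own)."""

    def __init__(self, label, children=None):
        self.label = label
        self.children = list(children or [])

    def leaf_count(self):
        """Each leaf becomes one row (or one column) of the report table."""
        if not self.children:
            return 1
        return sum(child.leaf_count() for child in self.children)


def count_cells(row_tree, col_tree, max_cells=MAX_CELLS):
    """Return the number of table cells, refusing tables over the limit."""
    cells = row_tree.leaf_count() * col_tree.leaf_count()
    if cells > max_cells:
        raise ValueError(
            "Table would have %d cells (limit %d)" % (cells, max_cells))
    return cells


if __name__ == "__main__":
    rows = DimNode("Country", [DimNode("NZ"), DimNode("Australia")])
    age_groups = ("<20", "20-39", "40+")
    cols = DimNode("Gender", [
        DimNode(gender, [DimNode(age) for age in age_groups])
        for gender in ("Male", "Female")
    ])
    print(count_cells(rows, cols))  # 2 row leaves x 6 column leaves
```

Checking the size before rendering means an oversized table can be refused without building any HTML.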

Statistics Modules

Looked at using existing libraries but ended up using a modified subset of their code

Was not my preferred approach. Benefits of plugging in a module:
    Saving time and effort
    Less risk (the results of statistical algorithms can diverge wildly because small floating point errors compound and multiply)
    Any issues found and fixed can help everyone

Reasons for creating own code (often based on existing):
    Standard code didn't return results separated from formatting
    No option of using decimal instead of floating point maths
    Half-baked code in some places
    Keeping the installer file size down

But I do use existing libraries to test my code against (using nosetests)
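A minimal sketch of what "results separated from formatting" and "decimal instead of floating point" might look like follows. None of it is SOFA's code: `Summary`, `summarise`, `format_summary` and `check_against_reference` are hypothetical names, and Python's standard `statistics` module stands in for the existing library used as a cross-check.

```python
import statistics
from collections import namedtuple
from decimal import Decimal

Summary = namedtuple("Summary", "n mean sd")


def summarise(values):
    """Return raw results only; presentation is left to the caller."""
    decs = [Decimal(str(value)) for value in values]
    n = len(decs)
    if n < 2:
        raise ValueError("Need at least two values")
    mean = sum(decs) / n
    variance = sum((dec - mean) ** 2 for dec in decs) / (n - 1)
    return Summary(n, mean, variance.sqrt())


def format_summary(summary, dp=3):
    """Formatting lives in its own function, separate from the maths."""
    quantum = Decimal(10) ** -dp
    return "N=%d, mean=%s, sd=%s" % (
        summary.n, summary.mean.quantize(quantum), summary.sd.quantize(quantum))


def check_against_reference(values, tolerance=1e-9):
    """Cross-check the decimal results against a floating point library."""
    summary = summarise(values)
    return (abs(float(summary.mean) - statistics.mean(values)) < tolerance
            and abs(float(summary.sd) - statistics.stdev(values)) < tolerance)


if __name__ == "__main__":
    result = summarise([0.1, 0.2, 0.3])
    print(format_summary(result))
    print(check_against_reference([0.1, 0.2, 0.3]))
```

Because `summarise` returns plain values, the same results can feed HTML output, a test runner, or a script without any string parsing.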


Demonstration of key functionality

Background for discussion of the GUI toolkit and other topics

NB still lots more functionality to be added

Switch over to Jaunty

Plans

Interactive visualisations using Matplotlib (and wxWebKit)
    e.g. showing how a t-test works (ideas from Statistics Without Tears)
    e.g. impact of altering your sample size

Output charting using RaphaelJS (SVG & JS)

Mac OS X package, more flexible packaging

Add ability to import from Calc and SPSS

Other databases e.g. Oracle, DB2, Interbase

Increase test coverage

Plans cont ...

More languages e.g. French, Spanish, German

ROC, power calculations

Overall, not trying to compete with R (or RKWard)
    The slogan is ease of use, learn as you go, and beautiful output
    May be useful for specialist statisticians but emphasis on supporting non-specialists, and learning statisticians
    Focus on making most common needs easy to satisfy
    Plugin extensions for rest

wxPython GUI Toolkit

Cross-platform and native widgets
    Ubuntu (Dust theme)
    Windows XP

There is a GUI for making GUIs but I prefer handcoding
    Clean
    Can reuse code across different forms
    Can delegate parts of the GUI e.g. to database plugin modules

Lots of sophisticated, configurable widgets
    Was able to make a data entry table work like I thought it should e.g. new row has column label of , specific behaviour when tabbing

Focus on grid control

Documentation & Community

wxPython in Action: great book

Mailing list (with Robin Dunn a regular contributor)

Lots of online documentation (but googling and integration of different ideas often required)

wx.Grid

With flexibility and power comes some complexity to handle

Only 1100 lines of code to make the data grid you saw in the demonstration (inc. validation, ability to add new rows and edit values etc.)

May be sensible to have more lines of documentation than code in some modules

Resolving issues can take you to the edge of what is known/documented
    Had data entry working like clockwork in Ubuntu
    Found out Windows intercepted Tabs and Returns before they could be exposed and reacted to
    But there was a solution


Example of wx.Grid code

    import wx
    import wx.grid

    # In the class that owns the grid, e.g. during set-up:
    self.frame.Bind(wx.grid.EVT_GRID_EDITOR_CREATED,
                    self.OnGridEditorCreated)

    # Methods of the same class:
    def OnGridEditorCreated(self, event):
        """
        Need to bind KeyDown to the control itself e.g. a choice
        control.
        wx.WANTS_CHARS makes it work.
        """
        control = event.GetControl()
        # Ask for Tab and Return to be passed to the control
        control.WindowStyle |= wx.WANTS_CHARS
        control.Bind(wx.EVT_KEY_DOWN, self.OnGridKeyDown)
        event.Skip()

    def OnGridKeyDown(self, event):
        keycode = event.GetKeyCode()
        if keycode in (wx.WXK_TAB, wx.WXK_RETURN):
            ...  # etc

The user clicks on a cell to edit a value.
We bind to that event.

Now we can grab the control ... and respond to its key down event

Now we're away again :-)


Custom Controls

Extending the grid further

Option of label display e.g. Male not 1

Conditional formatting e.g. all values > 1000 red

Choice of toolkit very important
    Can it support what you want to do, or will you hit a wall?
    If I wanted to display sparklines or pie charts as cells in a table could I?
    Hard to know whether a good choice until already committed
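The label display and conditional formatting ideas above boil down to a small per-cell decision that is independent of the toolkit. The sketch below is purely illustrative and deliberately leaves wxPython out: the function name `display_cell`, the colour names and the default threshold are assumptions, with only the "Male not 1" and "> 1000 red" examples taken from the slide.

```python
RED = "red"
DEFAULT_COLOUR = "black"


def display_cell(raw_value, value_labels=None, threshold=1000):
    """Return (text, colour) for one grid cell.

    value_labels maps stored codes to display labels, e.g. {1: "Male"}.
    Numeric values above the threshold are flagged in red.
    """
    labels = value_labels or {}
    text = labels.get(raw_value, str(raw_value))
    is_number = (isinstance(raw_value, (int, float))
                 and not isinstance(raw_value, bool))
    colour = RED if is_number and raw_value > threshold else DEFAULT_COLOUR
    return text, colour


if __name__ == "__main__":
    gender_labels = {1: "Male", 2: "Female"}
    print(display_cell(1, gender_labels))
    print(display_cell(1500))
```

Whether a toolkit lets you plug this kind of logic into its grid, and go further to things like charts in cells, is exactly the question the slide raises.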

Release process

Lots of steps to get 100% right. My steps are:
    Do preparatory clean up (debugging off, demo database tidy).
    Make sure I have translations for any new strings I've added.
    Make and test the new deb and Windows packages. I use VirtualBox to give me identical install environments each time.
    Add the new files to Sourceforge (I wanted to consolidate downloads to help me measure usage).
    Add a new release to Launchpad and Freshmeat complete with updated release notes and change log (used Bazaar to push to Launchpad so can browse my commit comments).
    Make announcements in both Launchpad and Freshmeat.
    Update the project homepage to account for the new download location, new features.
    Add a blog item to the project site.
    Update release version and release date on Wikipedia.
    Revisit any important threads commenting on open source statistics packages.

Making Debian Package (for Ubuntu)

Initially very daunting: where do you start?

Found ShowMeDo video by Austrian developer Horst Jens
    His example was a Python project so very similar requirements

Ended up with very detailed step-by-step guidelines for packaging SOFA Statistics

NB installing application for all users, not a given user
    Files are put into /usr/share/pyshared/sofa/...
    Any files needed by an individual user are transferred during first use of application /home/username/sofa/...
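The "transferred during first use" step described above could be sketched along the following lines. This is a guess at the general shape rather than SOFA's actual start-up code: `ensure_user_files` is a hypothetical helper, and both directories are passed in as parameters because the slide elides the exact paths.

```python
import os
import shutil


def ensure_user_files(shared_dir, user_dir):
    """Copy the starter files for one user the first time the app runs.

    shared_dir: folder installed once for all users (read-only for them).
    user_dir:   per-user folder, created on first use.
    Returns True if files were copied, False if the folder already existed.
    """
    if os.path.isdir(user_dir):
        return False  # not the first run; leave the user's files alone
    shutil.copytree(shared_dir, user_dir)  # also creates missing parents
    return True


if __name__ == "__main__":
    import tempfile

    # Demonstration with throwaway folders rather than real install paths
    base = tempfile.mkdtemp()
    shared = os.path.join(base, "shared_defaults")
    os.makedirs(shared)
    with open(os.path.join(shared, "default_settings.txt"), "w") as f:
        f.write("example\n")
    user = os.path.join(base, "home", "username", "sofa")
    print(ensure_user_files(shared, user))  # first run
    print(ensure_user_files(shared, user))  # later runs
```

Keeping the shared install untouched and copying only what each user needs avoids requiring write access to the system-wide folder.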

NSIS Windows Installer

Nullsoft Scriptable Install System. Free. Used by Firefox, OpenOffice etc.

Weird language: a cross between PHP and assembler

Plenty of documentation etc. but best to start and then extend

Issue: file size. Including MySQLdb, NumPy, wxPython, SQLite, Python

Put program in Program Files and user files in Documents and Settings\username\sofa\...

Final Thoughts

Running an open source project can be very satisfying

Lots of new learning

A long-term commitment

Lots to do. Not just glamour coding: someone has to take out the trash

Phenomenal resources available in the open source world: Bazaar, Loggerhead, nosetests, etc.

Hands up if ever considered it (or doing it)