Developing and releasing SOFA Statistics
SOFA, Paton-Simpson & Associates Ltd, Auckland, New Zealand
SOFA Statistics
Developing and releasing
a Python open source application
Grant Paton-Simpson, sofastatistics.com
Overview
Introducing the SOFA Statistics application
How SOFA works with SQL databases
Using HTML for output (via wxWebKit)
Experience with existing statistics modules
wxPython GUI toolkit (esp the grid widget)
The release process (esp packaging)
In 30 minutes flat out!
Introducing SOFA Statistics
SOFA stands for Statistics Open For All
A cross-platform desktop application for:
  Making report tables from data (database, spreadsheet, directly entered)
  Producing charts
  Running key statistical tests
The slogan is ease of use, learn as you go, and beautiful output
May be useful for specialist statisticians but emphasis on supporting non-specialists, and learning statisticians
Currently version 0.8.10 and pushing on towards a version 1.0 release
SOFA Architecture
Data sources: SQLite, MySQL, MS Access, PostgreSQL, SQL Server, ...
SOFA: linking, not importing
Scripts from GUI or by hand (available for automation)
HTML output (spreadsheet-friendly)
Working with SQL
Not using an abstraction layer. Wrote my own code using the MySQLdb module etc.
  Already experienced with SQL
  Want control over the information I get about data configuration (e.g. character set)
  Want control over how I interact with the databases for performance reasons
Adding other databases, e.g. Oracle, is a process of copying an existing module and changing the implementation
SQL databases do things very differently:
  SQLite with data type integrity (like dynamic typing)
  PostgreSQL and SUM(expression)
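The SQLite point can be illustrated with a minimal sketch using Python's standard `sqlite3` module. It is not SOFA code; the table name `demo` and the helper `stored_types` are made up for the illustration. It stores mixed values in a column declared `INTEGER` and asks SQLite how each one was actually stored.

```python
import sqlite3


def stored_types(values):
    """Insert values into a column declared INTEGER and report how
    SQLite actually stored each one (hypothetical helper, not SOFA code)."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE demo (val INTEGER)")
        con.executemany("INSERT INTO demo (val) VALUES (?)",
                        [(v,) for v in values])
        rows = con.execute(
            "SELECT typeof(val) FROM demo ORDER BY rowid").fetchall()
        return [row[0] for row in rows]
    finally:
        con.close()


if __name__ == "__main__":
    # The declared type is only a preference: text that does not look
    # like a number is stored as text, without any error being raised.
    print(stored_types([1, "2", "two", 3.5]))
```

Code that reads from SQLite therefore cannot rely on the declared column type alone, which is one reason per-database handling is useful.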
HTML Report Table Output
Tree-based for each dimension (rows, cols)
Created an artificial limit of 5000 cells
Scales linearly and Python not the bottleneck
Rendered locally using wxWebKit
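As a rough sketch of the tree-per-dimension idea and the cell limit, the following assumes a simple node class (`DimNode`, hypothetical, not SOFA's actual data structure). The number of leaves in the row tree times the number of leaves in the column tree gives the number of data cells, which can be checked against the 5000-cell limit before any HTML is built.

```python
from dataclasses import dataclass, field
from typing import List

MAX_CELLS = 5000  # the artificial limit mentioned above


@dataclass
class DimNode:
    """One node in a dimension tree (hypothetical structure)."""
    label: str
    children: List["DimNode"] = field(default_factory=list)

    def leaf_count(self) -> int:
        """Leaves correspond to the final rows (or columns) of the table."""
        if not self.children:
            return 1
        return sum(child.leaf_count() for child in self.children)


def check_table_size(row_tree, col_tree, max_cells=MAX_CELLS):
    """Return the number of data cells, or raise if over the limit."""
    n_cells = row_tree.leaf_count() * col_tree.leaf_count()
    if n_cells > max_cells:
        raise ValueError(
            f"Table would have {n_cells} cells; limit is {max_cells}")
    return n_cells


rows = DimNode("rows", [
    DimNode("Gender", [DimNode("Male"), DimNode("Female")])])
cols = DimNode("cols", [
    DimNode("Age group", [DimNode("<20"), DimNode("20-39"), DimNode("40+")])])

if __name__ == "__main__":
    print(check_table_size(rows, cols))
```

Counting leaves first keeps the size check cheap, since it avoids running queries or generating markup for a table that would be rejected anyway.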
Statistics Modules
Looked at using existing libraries but ended up using a modified subset of their code
Was not my preferred approach. Benefits to plugging in a module:
  Saving time and effort
  Less risk (the results of statistical algorithms can wildly diverge because of small floating point errors compounding and multiplying)
  Any issues found and fixed can help everyone
Reasons for creating own code (often based on existing):
  Standard code didn't return results separated from formatting
  No option of using decimal instead of floating point maths
  Half-baked code in some places
  Keeping the installer file size down
But I do use existing libraries to test my code against (using nosetests)
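A small sketch of three of these points, under stated assumptions: the `mean` function is a stand-in for "own code" (not SOFA's implementation), and Python's standard `statistics` module stands in for the unnamed existing libraries used as a reference. The function returns a plain number rather than formatted text, accepts `Decimal` input, and is checked against the reference in a test function of the kind nosetests collects.

```python
from decimal import Decimal
import statistics


def mean(values):
    """Own implementation: returns a number, not formatted output,
    and works with either float or Decimal inputs."""
    values = list(values)
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


# Floating point error accumulates; Decimal arithmetic does not here.
float_total = sum([0.1] * 10)
decimal_total = sum([Decimal("0.1")] * 10)


def test_mean_matches_reference():
    """Compare own code against an existing library's result."""
    data = [2.5, 3.5, 4.0, 10.0]
    assert abs(mean(data) - statistics.mean(data)) < 1e-12


if __name__ == "__main__":
    print(float_total == 1.0)              # False: binary rounding error
    print(decimal_total == Decimal("1"))   # True
    test_mean_matches_reference()
```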
Demonstration of key functionality
Background for discussion of the GUI toolkit and other topics
NB still lots more functionality to be added
Switch over to Jaunty
Plans
Interactive visualisations using Matplotlib (and wxWebKit):
  e.g. showing how a t-test works (ideas from Statistics Without Tears)
  e.g. impact of altering your sample size
Output charting using RaphaelJS (SVG & JS)
Mac OS X package, more flexible packaging
Add ability to import from Calc and SPSS
Other databases e.g. Oracle, DB2, Interbase
Increase test coverage
Plans cont ...
More languages e.g. French, Spanish, German
ROC, power calculations
Overall, not trying to compete with R (or RKWard):
  The slogan is ease of use, learn as you go, and beautiful output
  May be useful for specialist statisticians but emphasis on supporting non-specialists, and learning statisticians
  Focus on making most common needs easy to satisfy
  Plugin extensions for rest
wxPython GUI Toolkit
Cross-platform and native widgets
Ubuntu (Dust theme)
Windows XP
Documentation & Community
wxPython in Action: great book
Mailing list (with Robin Dunn a regular contributor)
Lots of online documentation (but googling and integration of different ideas often required)
There is a GUI for making GUIs but I prefer hand-coding:
  Clean
  Can reuse code across different forms
  Can delegate parts of the GUI e.g. to database plugin modules
Lots of sophisticated, configurable widgets:
  Was able to make a data entry table work like I thought it should e.g. the new row gets its own label, specific behaviour when tabbing
Focus on grid control
wx.Grid
With flexibility and power comes some complexity to handle
Only 1100 lines of code to make the data grid you saw in the demonstration (inc validation, ability to add new rows and edit values etc)
May be sensible to have more lines of documentation than code in some modules
Resolving issues can take you to the edge of what is known/documented:
  Had data entry working like clockwork in Ubuntu
  Found out Windows intercepted Tabs and Returns before they could be exposed and reacted to
  But there was a solution
Example of wx.Grid code
    import wx
    import wx.grid

    # In the setup code of the class that owns the grid:
    self.frame.Bind(wx.grid.EVT_GRID_EDITOR_CREATED,
                    self.OnGridEditorCreated)

    def OnGridEditorCreated(self, event):
        """
        Need to bind KeyDown to the control itself e.g. a choice control.
        wx.WANTS_CHARS makes it work.
        """
        control = event.GetControl()
        control.WindowStyle |= wx.WANTS_CHARS
        control.Bind(wx.EVT_KEY_DOWN, self.OnGridKeyDown)
        event.Skip()

    def OnGridKeyDown(self, event):
        keycode = event.GetKeyCode()
        if keycode in (wx.WXK_TAB, wx.WXK_RETURN):
            pass  # etc

The user clicks on a cell to edit a value.
We bind to that event.
Now we can grab the control ... and respond to its key down event.
Now we're away again :-)
Custom Controls
Extending the grid further
Option of label display e.g. Male not 1
Conditional formatting e.g. all values > 1000 red
Choice of toolkit very important:
  Can it support what you want to do, or will you hit a wall?
  If I wanted to display sparklines or pie charts as cells in a table could I?
  Hard to know whether a good choice until already committed
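The label display and conditional formatting ideas above can be sketched independently of any toolkit. The two functions below are hypothetical helpers (not SOFA or wxPython code) showing only the decision logic. In a real grid that logic would sit wherever the toolkit asks for a cell's text and colour.

```python
def display_text(raw_value, value_labels):
    """Show the label for a stored value if one exists (e.g. 'Male'
    rather than 1); otherwise fall back to the raw value as text."""
    return value_labels.get(raw_value, str(raw_value))


def cell_colour(raw_value, threshold=1000, highlight="red", default="black"):
    """Pick a text colour: values above the threshold are highlighted.
    Non-numeric values are never highlighted."""
    try:
        is_large = float(raw_value) > threshold
    except (TypeError, ValueError):
        is_large = False
    return highlight if is_large else default


if __name__ == "__main__":
    gender_labels = {1: "Male", 2: "Female"}  # illustrative labels only
    print(display_text(1, gender_labels))
    print(cell_colour(1500))
```

Keeping this logic separate from the widget code also makes it easy to test without starting a GUI.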
Release process
Lots of steps to get 100% right. My steps are:
  Do preparatory clean up (debugging off, demo database tidy).
  Make sure I have translations for any new strings I've added.
  Make and test the new deb and Windows packages. I use VirtualBox to give me identical install environments each time.
  Add the new files to SourceForge (I wanted to consolidate downloads to help me measure usage).
  Add a new release to Launchpad and Freshmeat, complete with updated release notes and change log (I used Bazaar to push to Launchpad so my commit comments can be browsed).
  Make announcements in both Launchpad and Freshmeat.
  Update the project homepage to account for the new download location and new features.
  Add a blog item to the project site.
  Update release version and release date on Wikipedia.
  Revisit any important threads commenting on open source statistics packages.
Making Debian Package (for Ubuntu)
Initially very daunting: where do you start?
Found ShowMeDo video by Austrian developer Horst Jens:
  His example was a Python project so very similar requirements
Ended up with very detailed step-by-step guidelines for packaging SOFA Statistics
NB installing application for all users, not a given user:
  Files are put into /usr/share/pyshared/sofa/...
  Any files needed by an individual user are transferred during first use of application to /home/username/sofa/...
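The "transfer on first use" step described above might look something like the following sketch. The function name `ensure_user_files` and its parameters are assumptions for illustration, not SOFA's actual code, and the real directory layout is not reproduced here. The caller passes in the shared install folder and the per-user folder.

```python
import shutil
from pathlib import Path


def ensure_user_files(shared_dir, user_dir):
    """Copy the default per-user files into the user's own folder the
    first time the application runs. Returns True if a copy was made,
    False if the user folder already existed (hypothetical helper)."""
    shared_dir, user_dir = Path(shared_dir), Path(user_dir)
    if user_dir.exists():
        return False  # not first use; leave the user's files alone
    shutil.copytree(shared_dir, user_dir)
    return True
```

Called at application start-up, this keeps the system-wide install read-only for ordinary users while still giving each user a writable copy of whatever they need.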
NSIS Windows Installer
Nullsoft Scriptable Install System. Free. Used by Firefox, OpenOffice etc
Weird language: a cross between PHP and assembler
Plenty of documentation etc but best to start and then extend
Issue: file size. Including MySQLdb, NumPy, wxPython, SQLite, Python
Put program in Program Files and user files in Documents and Settings\username\sofa\...
Final Thoughts
Running an open source project can be very satisfying
Lots of new learning
A long-term commitment
Lots to do. Not just glamour coding: someone has to take out the trash
Phenomenal resources available in the open source world: Bazaar, Loggerhead, nosetests, etc
Hands up if ever considered it (or doing it)