interscript.sourceforge.net/interscript/doc/en_iscr.txt · 2001-06-18

1. Interscript Version
----------------------

This document describes Interscript version 1.0a11 build 41 on pelican by root. It was generated by Interscript version 1.0a11, build 40 on pelican by root at Mon Jun 18, 2001 at 01:03 PM (UTC).

1.1. Contents
-------------

Interscript Version
   Contents
Introduction
   Interscript
   Features
Requirements
   Functionality
   Interface
   Management
   Implementation pragmatics
Design Fundamentals
   Devices
   Processors
   Tanglers
   Lexical Scoping
   Weaver Control
   Source Tracking
   Parsing
   Documentation Constructions
   Microformatting shortcuts
   Weaver Architecture
Tutorial
   Weaving a document
   Tangling code
   Scripting
   Unit tests
   Fonts
   Lists
   Tables
   Citations
   Cross References
   Including Files
   Translating Html
   Special constructions
   File Names
   Questions and Answers
Implementation
   Interscript Module
   Core Subpackage
   Character set encodings
   Unicode Data
   ISO-10646 encodings
   Iso 8859-x Mappings
   ShiftJis Mapping
   GB 2312 Mapping
   KSC-5601-1992 (Johab) Mapping
   KSC-5601-1987 (Wansung) Mapping
   Big5 Mapping
   8 bit Microsoft/IBM code pages
   Drivers Subpackage
   Weavers
   Weaver Filters
   Tanglers
   Tokenisers
   Html parser test
   Architecture
   Module getoptions
   Get Options
   Language Translation
   Utility Modules
   Application and tool directory
   Test package
Appendices
   File List
   Source List
   Include List
   Bug History
   Bugs (etc) for Version 1a7
   Bugs (etc) for Version 1a8
   Bugs (etc) for Version 1a9
   Installation Guide
   Make tarball
   Makefile

2. Introduction[intro]
----------------------

2.1. Interscript
----------------

Interscript is a major breakthrough in the design of literate programming tools. Other tools such as Web, C-Web, and FunnelWeb have limited functionality, and are mainly restricted to partitioning code and documentation.

Interscript is different because it embodies a complete and fully functional programming language, namely Python. An Interscript source file consists of three, not two, kinds of source: the target source code, documentation, and executable Python script. Because of this feature, the user source can extend the tool in arbitrary ways at 'run time' (without modifying the original tool).

In addition, the basic Interscript tool has an advanced object based design, including pluggable drivers for data sources and sinks, including automatic file downloading, tanglers for various programming languages including C, C++, Python, generic script, and raw data, and pluggable weavers for various typesetting systems including HTML, TeX, Postscript, and plain text.

Interscript includes functionality designed to support building, testing, installation, and version control of software. In particular, the standard system includes functionality designed to replace 'make'.

2.2. Features
-------------

Interscript supports many powerful features.

2.2.1. Python Scripting
-----------------------

Interscript uses Python 1.5.1 both for its implementation and to supply scripting services to client sourceware.

2.2.2. Platform independent filenames
-------------------------------------

Platform independent filenames are required in interscript sourceware. Filenames must be unix style relative filenames; interscript maps these to host operating system filenames automatically.
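The mapping from a unix style relative filename to a host filename can be sketched as follows. This is only an illustration in modern Python 3, not Interscript's own code; the helper name host_filename and its root parameter are hypothetical.

```python
import os
import posixpath

def host_filename(unix_name, root=os.curdir):
    """Map a unix style relative filename to a host filename (hypothetical helper)."""
    if posixpath.isabs(unix_name):
        raise ValueError("expected a relative filename: %r" % unix_name)
    # Drop empty and "." components; refuse ".." so the result stays below root.
    parts = [p for p in unix_name.split("/") if p not in ("", ".")]
    if ".." in parts:
        raise ValueError("parent references are not allowed in this sketch")
    # os.path.join inserts the separator of the host operating system.
    return os.path.join(root, *parts)

mapped = host_filename("interscript/doc/index.html", "out")
```

On a Unix host the result uses "/" separators, while on Windows os.path.join uses "\"; the sourceware itself never needs to know which.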

2.2.3. Documentation Constructions
----------------------------------

Several specialised documentation constructions are provided apart from paragraphs and headings.

2.2.3.1. Automatic heading numbering
------------------------------------

Headings are automatically numbered, and may be nested to any depth.

2.2.3.2. Cross references
-------------------------

Any labelled part of the document can be cited. Headings may be labelled, and a label may be set within ordinary text.

2.2.3.3. Tables
---------------

Basic tables with headings are provided.

2.2.3.4. Nestable Lists
-----------------------

Lists are provided in bullet, numbered and keyed styles. Lists can be nested.

2.2.3.5. URL citation
---------------------

A specialised construction for citing a URL becomes an active link in HTML documents.

2.2.3.6. Code and Prose displays
--------------------------------

Display constructions for quoting code and prose are provided. [There is no support for mathematics yet.]

2.2.3.7. Basic font selection
-----------------------------

For emphasised, strong, italic, bold, or code fonts.

2.2.4. Advanced Web Weaving
---------------------------

The advanced web weaver supports many features.

2.2.4.1. Automatic pagination on headings
-----------------------------------------

The web weaver spawns child pages on each heading.

2.2.4.2. Syntax Highlighting
----------------------------

For tokenising tanglers, syntax highlighting is supported by use of the SPAN tag with CLASS attribute.

2.2.4.3. CSS1 Cascading Style Sheets
------------------------------------

A standard interscript style sheet, interscript.css, is provided. A dummy style sheet, user.css, is provided to support client style overrides.

2.2.4.4. Folding Table of Contents
----------------------------------

Implemented using ECMAscript; it operates only when an Internet Explorer style document object model (DOM) is present.

2.2.4.5. Frame presentation
---------------------------

A simple three frame presentation with a master index control panel, an index frame for holding various cross reference tables, and a document view frame.

2.2.4.6. Flexible navigation
----------------------------

Navigation is via a cross reference index, or via standard navigation links per page or per code section.

2.2.4.7. Table of classes
-------------------------

Points of definition of classes.

2.2.4.8. Table of functions
---------------------------

Points of definition of functions and methods.

2.2.4.9. Table of identifiers
-----------------------------

Points of definition and use for identifiers.

2.2.4.10. Table of sections
---------------------------

List of sections making up a code file.

2.2.4.11. Table of tests
------------------------

Table of tests with test results where applicable.

2.2.4.12. Convergence status
----------------------------

List of all output files with an indication of whether the file was changed.

2.2.4.13. Source tree
---------------------

Hierarchical list of all input files.

2.2.5. PerlPOD support
----------------------

Perl POD is recognized and woven into documents.

2.2.6. Extensive Python Support
-------------------------------

Specialised Python support includes numerous general purpose extension modules, and a function generator supporting Eiffel style assertions including protocol verification, preconditions, postconditions, and per argument documentation.

2.2.7. Latex, Plain text and flat Html Weavers
----------------------------------------------

Weavers are provided for Latex2e, plain text, and a single flat html file.

2.2.8. Html input filter
------------------------

Allows simple Html to be processed as input.

2.2.9. XML support planned
--------------------------

XML is not currently supported, but an XML weaver and input filter are planned.

2.2.10. Unit testing
--------------------

On the fly unit tests can be embedded in code and are marshalled in a table. Special support for Python is available to execute the tests.

2.2.10.1. Test output verification
----------------------------------

Comparison of expected and actual output is presented in a difference table if gnu diff is available.

2.2.11. Option help
-------------------

Option help is provided with the --help command line option.

2.2.12. Tutorial
----------------

A tutorial is provided which covers some basic concepts.

2.2.13. Statement of Requirements
---------------------------------

Discusses and tabulates some specific requirements.

2.2.14. Design Document
-----------------------

Discusses some design and implementation issues.

2.2.15. Full source listing
---------------------------

Interscript is literate programmed with itself, so that a full source listing is embedded in the implementation documentation.

3. Requirements[req]
--------------------

Interscript is a software component intended to assist in the publishing and development of software by providing integrated source code, documentation, testing and project management facilities.

The detailed requirements stem from fundamental design decisions including the choice of python as an implementation language, the decision to restrict the user interface to batch processing of text files, and the embodiment of the principal notions of literate programming.

The sections below discuss the requirements from the point of view of desired functionality, interface, implementation pragmatics, and management.

However, it is not possible to entirely separate discussion of requirements from design, because detailed requirements stem from design decisions, and guide more detailed design. Nor is it possible to separate implementation from design, since implementation details have design consequences -- and also provide unexpected opportunities for functionality not in the original requirements.

In other words you can expect some discussion of design and implementation in this section, although the emphasis is on requirements. See the next major section for a discussion focusing on design, and the following section for a detailed description of the implementation.

3.1. Functionality
------------------

This section discusses the functionality Interscript requires.

The overall functionality is easy to describe: interscript must process input source files and from them extract the target program files, and extract and format a suitable set of documentation files. In addition it must be able to compile and build the target software, execute test code, and present the results of all these processes in a comprehensible format.

We will call the goal of program code extraction 'tangling', that of document construction 'weaving', and that of compilation and test code execution 'building'.

3.1.1. Tangling
---------------

This section describes the requirements of the tangling goal, which is by far the simplest to state (and implement).

The client will largely present files containing one or more program code files, separated by documentation sections. The tangler basically extracts this code, concatenates the sections sequentially, and writes the output verbatim to code files. Code sections targeting a single output file must be associated, and the target file identified and its location determined.

The detailed requirements for tangling, therefore, largely focus on deviations from normal processing.

3.1.1.1. Supported Programming Languages
----------------------------------------

Special parsers shall be supplied for C, C++, Python, Java, Eiffel, Pascal, Modula, Perl. Tcl, interscript, and Unix shell scripts are too quirky. Basic is too ugly :-) Cobol, Fortran and PL/1 are too archaic.

3.1.1.2. Source Tracking
------------------------

Source tracking is the ability to determine where generated program and documentation sections came from in the original source files. It is vital at the building stage, so that errors can be corrected. Where target tools identify the source of errors, they should be guided, if possible, to point at the original sources.

3.1.1.2.1. C and C++
--------------------

For C and C++, interscript must generate "#line" preprocessing directives.
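As a rough illustration of this requirement, a tangler could prefix each extracted code section with a directive naming the original file and line. The sketch below is in modern Python 3; the function emit_c_section and the file name demo.src are hypothetical and are not taken from the Interscript sources.

```python
def emit_c_section(lines, source_file, first_line):
    """Return C text for one code section, preceded by a #line directive.

    The directive tells the C compiler that the line following it is line
    `first_line` of `source_file`, so diagnostics point at the original source.
    """
    directive = '#line %d "%s"' % (first_line, source_file)
    return "\n".join([directive] + list(lines)) + "\n"

section = emit_c_section(["int x = 1;", "int y = 2;"], "demo.src", 12)
```

A compiler error in the second statement would then be reported against line 13 of demo.src rather than against the generated C file.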

3.1.1.3. Chunking
-----------------

The ability to build program files out of order is sometimes called chunking. It is sometimes useful, for example, to define functions after they are used even when the target programming language requires otherwise. However, chunking is not restricted to reordering code sections, but may be considered to include the ability to nest sections hierarchically. This permits the author to represent program structure in a manner not provided by the target programming language.

There are some further variations on chunking. The first consideration is that code often has to be repeated, and so a section may be used more than once, possibly in distinct program files, but sometimes even in the same file. Because hierarchical chunking requires naming chunks, reuse is available automatically; the principal issue here is the converse: to ensure when required that a chunk is used exactly once.

Stemming from the use of labelled chunks is the need to be able to locate them: when a chunk is used, one needs to find where it is defined, and sometimes conversely.

More generally: a simple verbatim code file is a special case of breaking a code file into sections separated by documentation but without reordering, which is a special case of reorderable sections, which is a special case of hierarchical chunking. Again, a tree is a special case of a directed acyclic graph, which embodies the notion that a chunk may be used twice and also that it may not be used at all.
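Hierarchical chunking over named chunks can be sketched as a recursive expansion that also counts uses, so that unused chunks, or chunks used more than once, can be reported. This is a minimal illustration in modern Python 3 under assumed data structures: the ("use", name) reference form and the expand function are hypothetical, not Interscript's interface.

```python
def expand(name, chunks, uses=None, active=()):
    """Expand a named chunk into a flat list of text lines.

    Each chunk is a list whose items are either text lines or
    ("use", chunk_name) references to other chunks.
    """
    if name in active:
        # A chunk that refers to itself would make the graph cyclic.
        raise ValueError("cyclic chunk reference: " + name)
    if uses is not None:
        uses[name] = uses.get(name, 0) + 1
    out = []
    for item in chunks[name]:
        if isinstance(item, tuple):
            out.extend(expand(item[1], chunks, uses, active + (name,)))
        else:
            out.append(item)
    return out

chunks = {
    "main.c": [("use", "includes"), "int main(void) {", ("use", "body"), "}"],
    "includes": ["#include <stdio.h>"],
    "body": ['    puts("hello");'],
    "unused": ["/* never referenced */"],
}
uses = {}
program = expand("main.c", chunks, uses)
unused = sorted(set(chunks) - set(uses))
```

Here the chunks are defined in any order, the use counts in uses support an "exactly once" check, and unused lists chunks that no output file reaches.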

Some conventional literate programming tools go further in permitting what might be called macros: chunks containing unbound variables which can be bound at the point of use: a simple case of parameterized macro processing. Macros are particularly useful for generating repetitive forms such as tables.

In addition, we might consider conditional compilation, which is useful for controlling software variations such as platform dependencies and debugging versions. Conditional compilation is heavily used in C.

There is yet a more general form of chunking ... in which the notion of the chunk begins to degenerate, namely generation of arbitrary code by executable script. Such a facility is essential, for example, where the code to be generated is sensitive to the environment, for example the inclusion of the current date. Perhaps more interesting is the ability of executable script to generate several files simultaneously, for example both the declaration in a header file and the definition in the body file, of a function in the C programming language. Building specialized scripts tailored to the client's requirement is an essential facility of literate programming tools because, although rarely used, it can save a lot of work and provide considerable coherence, as well as generating tabulated documentation.
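The header and body example might look like the following sketch, in which one description of a C function yields both the declaration and the definition. It is written in modern Python 3 as an illustration only; the helper c_function is hypothetical and does not reproduce any Interscript routine.

```python
def c_function(return_type, name, params, body_lines):
    """Return (declaration, definition) text for one C function.

    Both strings are built from the same signature, so the header and
    the body file cannot drift apart.
    """
    signature = "%s %s(%s)" % (return_type, name, ", ".join(params) or "void")
    declaration = signature + ";\n"
    body = "".join("    " + line + "\n" for line in body_lines)
    definition = signature + "\n{\n" + body + "}\n"
    return declaration, definition

decl, defn = c_function("int", "add", ["int a", "int b"], ["return a + b;"])
```

A script would write decl to the header file and defn to the body file in the same run.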

To provide all these facilities, so that the most specialized and most common case integrates seamlessly with the most complex and general, interscript leverages the Python scripting engine. The most complex code generation is supported almost effortlessly by simply allowing the client to write arbitrary python script, while the more common simpler requirements are simply provided as pre-built routines: all the facilities are accessed in precisely the same way, by script execution.

This feature is central to the interscript design and cannot be isolated from a discussion of requirements, which can be recast, in some sense, to a discussion of the architectural framework in which such script executes, and the set of pre-built functions which ought to be made available.

Macros with parameters, however, are not especially good at expressing skeletons. A skeleton, or boilerplate, is essentially a macro with large arguments.

3.1.1.4. Parsing for reference tables
-------------------------------------

Where possible, interscript should tokenise and parse or partially parse program files to extract summary data.

3.1.1.4.1. Tokenisable Languages
--------------------------------

The following languages can be tokenised easily: C, C++, Python, Java, Eiffel, Pascal, Modula. [Only the Python tokeniser has been written].

3.1.1.4.1.1. Identifier reference
---------------------------------

For each language which can be tokenised, produce an identifier reference: C, C++, Python, Java, Eiffel. [Table generator done for web, html, latex]
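For Python, an identifier reference can be gathered with the standard tokenize module, as in the sketch below. It uses modern Python 3 and is not Interscript's tokeniser; the function identifier_table is a hypothetical name.

```python
import io
import keyword
import tokenize

def identifier_table(source):
    """Map each identifier in Python source text to the sorted line numbers where it occurs."""
    table = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # NAME tokens cover both identifiers and keywords; keep identifiers only.
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            table.setdefault(tok.string, set()).add(tok.start[0])
    return {name: sorted(lines) for name, lines in table.items()}

sample = "def area(w, h):\n    return w * h\n"
table = identifier_table(sample)
```

This records points of use only; separating definitions from uses needs at least partial parsing, which is the subject of the following sections.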

3.1.1.4.2. Fully Parsable Languages
-----------------------------------

Python and Java. C and C++ cannot be fully parsed without semantic analysis, and even then, it is tricky: it is hard to determine if a statement is a function declaration, variable declaration, or executable. [Only partial python parsing has been implemented]

3.1.1.4.3. Partially Parsable Languages
---------------------------------------

C and C++ can be partially parsed: information extracted may not be completely correct.

3.1.1.4.3.1. Class reference
----------------------------

For each language which supports user defined types, particularly classes, provide a table of classes. Python, Java, C, C++, Eiffel, Pascal, Modula. [Table generator done for web, html, latex]

3.1.1.4.4. Function reference
-----------------------------

For languages supporting functions, provide a function reference. This includes methods/member functions. Provide for C++, Python, Java, Eiffel, Pascal, Modula. [Table generator done for web, html]

3.1.1.5. Parsing for embedded documentation
-------------------------------------------

3.1.1.5.1. Perl DOC
-------------------

Perl DOC is arbitrary documentation provided in comments, to be converted to interscript method calls. [Done]

3.1.1.5.2. Java DOC
-------------------

Java DOC is a commenting protocol, and depends on parsing to identify the entity to which the documentation refers. [Not implemented]

3.1.1.5.3. Eiffel
-----------------

Eiffel provides dedicated documentation constructions. [Not implemented]

3.1.1.5.4. Python Doc strings
-----------------------------

Python supports Doc strings with a standard protocol for module and class documentation. Although the Python runtime supports module, class, and function Doc strings, there is no standard way to relate Doc strings to functions. [Not implemented]

3.1.1.6. Internationalisation
-----------------------------

Interscript tanglers should provide support for multiple human languages. There are two kinds of support determined by the binding time:

Interscript time binding
   Interscript time binding allows generating target code for a specified language. If multiple languages are specified, multiple versions of the target code are generated.

Run time binding
   Run time binding allows generating target code for a specified language set. Even if multiple languages are specified, only a single version of the target code is generated.

There are two levels of support which can be provided, which are not independent of binding time.

Identifiers
   Using names in the programmer's native language to aid maintenance. This is clearly an interscript time option.

Strings
   Using strings in the client's native language to aid use. This can be done either at interscript time or target software run time.

See also line 710 for details on the documentation aspects of internationalisation.

3.1.2. Weaving
--------------

Weaving is the most complex subsystem.

3.1.2.1. Individual Weavers
---------------------------

There shall be weavers producing plain text [done], simple HTML 4 (html) [done], advanced HTML 4 multi page (web) [done], XML [not implemented], latex2e [done].

3.1.2.2. Individual weaver capabilities
---------------------------------------

These capabilities must be provided in each separate weaver.

3.1.2.2.1. Fonts
----------------

Weavers should support italic, bold, emphasized, strong and code fonts.

3.1.2.2.2. Lists
----------------

Weavers shall support bullet, numbered, and keyed lists.

3.1.2.2.3. Displays
-------------------

Weavers shall support displays for code and prose, both inline and from a separate source. Latex shall support math display.

3.1.2.2.4. Tables
-----------------

Weavers shall support simple tables with horizontal spanning and column headings.

3.1.2.2.5. Headings
-------------------

Weavers shall support multilevel headings with automatic heading number generation, and labels (anchors).

3.1.2.2.6. Code echo
--------------------

Weavers shall provide a method to display line numbered code lines with labels (anchors).

3.1.2.2.7. Citations
--------------------

Weavers shall support citation of URLs, print media, code files, and interscript generated documents.

3.1.2.3. Internationalisation
-----------------------------

Internationalisation of documentation consists of two facets: automatic substitution of fixed literals such as the titles of tables, and providing for alternate translations of client documentary text.

The former requirement should be met by run time tangler binding of interscript itself; see line 613 for details.

3.1.3. Building
---------------

3.1.3.1. Python hosting
-----------------------

Interscript shall be able to host python script, optionally capturing standard output.
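Hosting a script with optional output capture can be sketched with the standard library of modern Python 3. The function run_script is a hypothetical name, and the sketch says nothing about how Interscript itself (written for Python 1.5.1) implements this.

```python
import contextlib
import io

def run_script(script_text, capture=False):
    """Execute Python script text; return captured standard output, or None if not capturing."""
    namespace = {}  # fresh global scope for the hosted script
    if not capture:
        exec(script_text, namespace)
        return None
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(script_text, namespace)
    return buffer.getvalue()

output = run_script("print(6 * 7)", capture=True)
```

The captured text could then be woven into the document or compared with expected test output.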

3.1.3.2. Compilers
------------------

Interscript shall host system compilers.

3.1.3.2.1. CPython
------------------

For CPython, C and C++ compilers shall be hosted. Both executable applications and dynamically loadable python modules shall be supported.

3.1.3.2.2. JPython
------------------

For JPython, the javac compiler shall be hosted.

3.1.3.3. Diff/Patch
-------------------

Interscript shall provide a file comparison/change tool similar to Unix Diff and patch. The tools must provide/apply reversible differentials.
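What a reversible differential means can be illustrated with Python's standard difflib module, whose ndiff output carries enough information to recover either input. This is a modern Python 3 sketch of the idea, not the tool the requirement calls for.

```python
import difflib

old = ["alpha\n", "beta\n", "gamma\n"]
new = ["alpha\n", "BETA\n", "gamma\n", "delta\n"]

# One delta describes the change in both directions.
delta = list(difflib.ndiff(old, new))

# restore(delta, 1) yields the first sequence, restore(delta, 2) the second.
recovered_old = list(difflib.restore(delta, 1))
recovered_new = list(difflib.restore(delta, 2))
```

Because both versions can be rebuilt from the same delta, a change recorded this way can be applied or undone.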

3.1.3.4. Executing external tools
---------------------------------

Interscript shall host external applications.

3.2. Interface
--------------

The normal understanding of this topic requires discussion of how interscript is launched. As a command line tool it provides a standard interface; an equivalent GUI hosted tool would be little different. Interscript also provides a Python callable API, which is more interesting, but largely unimportant to most users, who will not be embedding it.

As a batch oriented text file processing tool, the secondary interface requirements, which describe the organization and format of the input source files, are considerably more important, since it is this interface that most clients will use most of the time.

However, we cannot relegate the presentation of outputs -- both documents and program files -- to discussion of functionality because, as a specialized tool, interscript must constrain -- or at least guide -- presentation to suit its purpose as a development environment.

Finally, we cannot omit consideration of how the implementation interfaces to the underlying operating system and its tools, because, in the development process, the client must use interscript to host the launching and management of these tools.

In summary: interscript interfacing involves everything which has visual appearance including input source files, output documents and program files, and presentation of client tool interfaces, especially error output.

3.2.1. Unified command interface
--------------------------------

One of the principal goals of interface design is to provide an API which is simple and comprehensible, while at the same time providing comprehensive access to the underlying functionality.

It is a basic lesson of software modelling that interfaces reflect architectural structure (or conversely that the API design influences the system architecture).

Interscript maintains a current state analogous to a graphics device context: instead of pens, brushes, and canvases, the state can be partitioned into objects like the current weaver and current tangler.

Thus, as for graphics state, interscript provides a @select() command, to select objects into the current context.

Interscript requires that clients construct individual tanglers, and provides the @tangler() command to facilitate this. In addition, some tanglers provide specialised tanglers for parts of the target programming language, such as strings and comments, or for extensions to the programming language.

On the other hand, weavers are generally not constructed individually by the LP author. Instead, the command line processor, or other launch script which invokes interscript, determines which document formats the client desires: a default of none makes sense as this effectively turns the interscript process into tangle only mode.

The command line processor then constructs weavers for each desired format and hooks these weavers onto a multiplexor device, called a weaver loom, which delegates method calls to each of these weavers.
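The delegation performed by a loom can be sketched as an object that forwards any method call to every attached weaver. The class names Loom and RecordingWeaver, and the methods head and write, are hypothetical stand-ins chosen for this modern Python 3 illustration; they are not taken from the Interscript sources.

```python
class Loom:
    """Forward every method call to each attached weaver (illustrative multiplexor)."""

    def __init__(self, weavers):
        self._weavers = list(weavers)

    def __getattr__(self, method_name):
        # Called only for attributes the loom itself lacks, i.e. weaver methods.
        def broadcast(*args, **kwargs):
            return [getattr(w, method_name)(*args, **kwargs) for w in self._weavers]
        return broadcast


class RecordingWeaver:
    """Stand-in weaver that records the calls it receives."""

    def __init__(self, name):
        self.name = name
        self.calls = []

    def head(self, level, text):
        self.calls.append(("head", level, text))

    def write(self, text):
        self.calls.append(("write", text))


html_weaver = RecordingWeaver("html")
text_weaver = RecordingWeaver("text")
loom = Loom([html_weaver, text_weaver])
loom.head(1, "Introduction")
loom.write("Hello")
```

The author's script talks to one object, while the set of output formats is decided elsewhere, which matches the division of responsibility described above.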

By contrast, typesetting different versions of a document requires that the user construct and manage multiple looms. For example, if a document is written in one language and a translation into another is also provided, we need two looms, one of which is selected when a translation is available. When common text, such as code, is typeset, both weaver looms are selected.

It is necessary that the author construct these looms, and do so in a way that depends on the formats selected by the command line options. Furthermore, user command line options must be available to determine whether to typeset the document in English, French, both, or neither: it is not the job of the author to make this decision.

3.3. Management
---------------

All software projects, even small ones, can benefit from good management; and management depends on availability not only of technical documentation, but also meta-information.

Project management meta-information includes statistical measures of volume and complexity (so-called software metrics).

3.3.1. Software Metrics
-----------------------

Interscript shall provide some built-in reporting of project meta-information, including conventional statistics reporting the number of lines of code (LOC) for inputs, tangled code, and documentation, and the number of classes, functions, and other constructions defined.

Perhaps more significant, however, are measures of unit test results.

3.3.1.1. Change Impact Analysis
-------------------------------

It is my personal opinion that these primitive metrics are only of minimal utility: they're provided because they're easy to compute, rather than because they're particularly useful.

In my opinion, the kind of metric which is actually a useful measure of progress and software quality is what I'll call a change impact analysis.

Extensive changes are indicators of poor software quality. Basically, if a programmer is changing many files, the system is unstable and poorly structured.

If, on the other hand, changes are intensive, and isolated, the system is likely to be robust and well structured. The intense work being done indicates new functionality, improved performance, or correction of a bug, rather than tediously re-engineering many interfaces as a consequence of a minor change to one module.

The theory behind this kind of metric is based on notions of coupling and modular dependencies: if, according to principles of loose coupling, as espoused by Bertrand Meyer in Object Oriented Software Construction, a program is well modularised, then changes to a module should have a limited impact on other modules. Indeed, a fundamental tenet of the Open/Closed principle (again from Meyer), and notions of information hiding, is that variations in the implementation of an interface should have no impact on other modules at all.

These indicators include documentation. Therefore, traditional development technology is certain to get a lousy rating by a change impact analysis: requirements are written first, then the system designed, then implemented, then tested, and finally documented (if there is any time left).

This is a woeful strategy. I subscribe to the philosophy most strongly emphasised by Robert Martin: that all aspects of software development should be done together, because they feed back into each other.

Interscript is especially designed to support this superior strategy by providing integrated documentation, design, programming, and testing facilities, along with progress indicators: it is designed to support ongoing simultaneous development of requirements, analysis, design, implementation, testing, and management. As such, there is no notion of 'maintenance': the same strategy is used throughout the lifetime of the software, although the investment of resources will likely vary over this period.

Change impact analysis is best based on interscript sources, because target programming languages provide dubious support for proper localisation. For example, C and C++ require separate header and body files: parallel maintenance of function signatures is an impediment to quality imposed by the language. An interscript programmer will write generating tools to unify declaration and definition of a C function, so the parallel maintenance will be handled by the computer system.

It would be entirely wrong to interpret extensive changes as bad practice, however. On the contrary, historical change impact analysis ought to show alternating phases of intensive and extensive changes. The reason is that while extensive changes indicate poor software quality and lack of robustness, such indication is a good thing if the changes are followed by intensive changes, because they may indicate the programmer has recognized poor structure, devised a solution, and implemented it.

On multi-programmer projects, it is essential to coordinate the extensive change phase; whereas during times of intensive development, programmers can largely be left to work independently.

Any sizeable project which fails to go through at least one complete reorganisation is likely to be of very dubious quality. For example: interscript went through a complete reorganisation before 1.0a7 was released; the original single file was restructured into a package of separate modules, and the single global scope was partitioned into several distinct frames.

3.3.1.2. Measuring Change Impact
--------------------------------

Interscript shall provide change impact metrics. It isn't clear to me at this time exactly what to measure and how. The basic idea is to compare all the source files before each run, and record which ones have been changed. If, after some time period, one observes changes to a small number of files, the development is intensive, whereas if one observes changes to a large number of files, the development is extensive. If there are a large number of changes, the project is in a development phase, whereas if there are a small number of changes, debugging or tuning is indicated.
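One possible reading of the basic idea is sketched below: take a digest of every source file before each run, compare it with the previous snapshot, and label the run by the share of files that changed. Since the text leaves open what to measure, everything here is an assumption made for illustration in modern Python 3: the function names, the use of SHA-256 digests, and especially the 0.5 threshold.

```python
import hashlib

def snapshot(files):
    """Map each filename to a digest of its content; `files` maps names to bytes."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}

def changed_files(before, after):
    """Return the sorted names added, removed, or modified between two snapshots."""
    names = set(before) | set(after)
    return sorted(n for n in names if before.get(n) != after.get(n))

def classify(changed, total, threshold=0.5):
    """Label a run 'extensive' when more than `threshold` of all files changed.

    The threshold is an arbitrary assumption, not a value from the source.
    """
    if total and len(changed) / total > threshold:
        return "extensive"
    return "intensive"

before = snapshot({"a.c": b"int a;", "b.c": b"int b;", "c.c": b"int c;", "d.c": b"int d;"})
after = snapshot({"a.c": b"int a = 1;", "b.c": b"int b;", "c.c": b"int c;", "d.c": b"int d;"})
changed = changed_files(before, after)
label = classify(changed, len(after))
```

In-memory byte strings stand in for real files so the sketch is self-contained; a real tool would read the source tree and keep the snapshots between runs.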

Management should correlate these predictors with oral communication and written status reports. Furthermore, examining the particular location of changes may help pinpoint a problem and suggest reallocation of human resources.

3.3.2. Programming metrics
--------------------------

Because projects and management styles differ, and any notion of software metrics is new enough to be classed a black art rather than a science, interscript shall support user programmable metrics. Just as programmers can use the Python scripting engine to generate code or documentation, in accordance with their needs, so too can new software metrics be devised and programmed.

For this reason, interscript shall provide a basic statistical analysis package. In addition, apart from standard metrics, interscript shall provide hooks for gleaning other data required for computation of these new metrics. [It is not clear at this time how to do this.]

3.4. Implementation pragmatics
------------------------------

This section discusses requirements in terms of the constraints imposed by an implementation. There are several requirements all software must meet:

Performance
    Interscript must be very fast because it must be executed after every change to program source code or documentation. If this additional processing time is significant compared to the time other build components such as compilers take, it is likely to be an impediment to practical software development, especially that for which documentation is not as highly rated as functionality.

Portability
    Interscript must be highly portable: it must at least be able to execute on Unix, Windows, and Mac platforms with minimal installation and maintenance hassles. The installation and configuration must be manageable by people who are specialists in some computing field -- not necessarily the interscript implementation language or the host platform.

Accessibility
    Interscript must be easy to get hold of.

Client side maintenance
    Because the repertoire of facilities a literate programming tool could provide is huge, upgrades must be easy to install, must not compromise client configuration, and must remain largely compatible with sources already developed with interscript.

Third party development
    Interscript must provide third party development opportunities. This is essential when required functionality is certain to be highly variable, and also very specific.

4. Design Fundamentals[design]
------------------------------

This part looks at various issues in the design of Interscript.

4.1. Devices
------------

A 'device' is an object representing an actual input, output, or execution stream such as a disk file. Devices are simple objects which read, write, or execute data without parsing or interpretation. Instead, they interface to actual system device objects such as disk files, remote files, or text stored in memory. In Interscript parlance, an input device is called a 'source', an output device is called a _sink_, and a device which acts first as sink, and subsequently sources data that was sunk into it, is called a 'store'.

The input to interscript is represented by a source driver object. Both the tanglers and weavers write output via sink driver objects.

Currently implemented devices (sources, sinks, and stores):

named_file_source
    The most commonly used source driver reads a disk file named by a native filename.

ftp_file_source
    This device initially reads files from an FTP host and creates a local copy. The local copy is read until it is deemed too old, at which point it is fetched again by FTP.

http_file_source
    This device initially reads files from an HTTP host and creates a local copy. The local copy is read until it is deemed too old, at which point it is fetched again by HTTP.

url_file_source
    Reads a file given a URL. Currently, ftp, http, gopher, and local files are supported. There is no caching.

stdin_source
    Reads a file from standard input.

null_sink
    Throws away data.

simple_named_file_sink
    Writes a disk file given a native filename.

named_file_sink
    This device writes a temporary disk file, and copies it to a nominated file if the two differ, both to prevent touching unchanged files, and to ensure the previous file is available during parsing, allowing generated files to be included back into the document.

stdout_sink
    Writes to standard output.

memory
    Writes and reads from memory.

disk
    Writes and reads to disk (with separate read and write heads).
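The behaviour described for named_file_sink is the least obvious of these, so here is a minimal Python sketch of the idea. It is not the Interscript source: the class name BufferedFileSink and the method names writeline and close are assumptions, and the real device may differ in detail. Output goes to a temporary file, and the nominated file is only replaced if the contents differ.

```python
import os
import tempfile


class BufferedFileSink:
    """Sketch of a sink that replaces its target only if the contents changed.

    Hypothetical class; the real named_file_sink may differ in detail.
    """

    def __init__(self, filename):
        self.filename = filename
        directory = os.path.dirname(os.path.abspath(filename))
        # Write to a temporary file beside the target, so the previous
        # version of the target stays readable while we are writing.
        handle, self.tmpname = tempfile.mkstemp(dir=directory, text=True)
        self.file = os.fdopen(handle, "w")

    def writeline(self, line):
        self.file.write(line + "\n")

    def close(self):
        """Finish writing; return True if the target file was updated."""
        self.file.close()
        with open(self.tmpname) as f:
            new_text = f.read()
        old_text = None
        if os.path.exists(self.filename):
            with open(self.filename) as f:
                old_text = f.read()
        changed = new_text != old_text
        if changed:
            os.replace(self.tmpname, self.filename)
        else:
            os.remove(self.tmpname)   # leave the old file untouched
        return changed
```

Because an unchanged file is never rewritten, its time stamp is preserved, which keeps tools such as make from rebuilding things needlessly.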

Under development are patch readers and writers. A patch writer compares two files and, if they differ, writes a patch file instead of replacing the old file with the new one. The corresponding reader applies the patches to a copy of the old file and reads that instead. This mechanism provides rudimentary version control, allows stable files to be write protected, and permits posting and using patches by email or news.

Writers for news and email are also in the works. The email sink device is particularly useful to allow automatic regular updates (run as cron jobs) to send advice to the client.

The URL reader currently doesn't use a cache, because it uses the standard Python function 'urlopen', which doesn't use a cache. Hopefully this function will be upgraded to check expiration headers on http servers, and cache files locally.

A 'tee' writer -- a device which writes to several other devices -- is planned.

4.1.1. File names
-----------------

Common interscript commands always refer to files using Unix relative filename convention, even on non-Unix platforms; the interscript command line processor also requires master filenames given on the command line to follow this convention. These filenames are converted to the native format internally. The purpose of this mechanism is to ensure distributed source documents are platform independent.

Interscript requires that the native operating system support long case sensitive filenames including the upper and lower case latin-1 (ASCII) letters, digits, and underscore, and a hierarchical directory system with some kind of current directory concept: these features are supported by all modern Unix, Windows, and Macintosh platforms. Note that interscript cannot operate on DOS or Win3.x platforms.

4.1.1.1. Named File Source Names
--------------------------------

Interscript 'named_file_source' takes two arguments identifying a local disk file. The first, mandatory, component is a name in Unix relative filename format, and the second, optional, component is a prefix in native operating system format.

This pair of names is the interscript file name convention. If the prefix is empty, it is replaced by the absolute pathname of the current directory in native operating system format, including a trailing directory separator. Then the Unix filename is converted to a native operating system filename, and appended to the prefix.
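The convention can be sketched in Python as follows. The function name native_name is hypothetical and this is not the code Interscript uses; it only follows the steps just described, using the standard os module to supply the native directory separator.

```python
import os


def native_name(unix_name, prefix=""):
    """Convert a Unix relative file name plus a native prefix to a native path.

    Sketch of the convention described above; the function name is hypothetical.
    """
    if unix_name.startswith("/"):
        raise ValueError("expected a relative Unix file name: %r" % unix_name)
    if not prefix:
        # Absolute current directory, including a trailing separator.
        prefix = os.path.join(os.getcwd(), "")
    # Convert the Unix name to native form and append it to the prefix.
    return prefix + os.sep.join(unix_name.split("/"))
```

Note that the prefix is used exactly as given, so a directory prefix must already end with the native directory separator.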

Commands such as 'include_file' which refer to sources supply the directory of the current file as the prefix so that the argument filename is relative to the location of the including input file. All such arguments must be relative filenames in Unix format for this reason.

Note that the command line provides the '--source-prefix=' option for the same reason: a filename given on the command line as a master source file must be given as a relative filename in Unix format, even on non-Unix systems. The source-prefix option permits the native operating system name of the directory containing the file to be specified.

4.1.1.2. Named File Sink Names
------------------------------

Interscript 'named_file_sink' takes two arguments identifying a local disk file. The first, mandatory, component is a name in Unix relative filename format, and the second, optional, component is a prefix in native operating system format.

This pair of names is the interscript file name convention. If the prefix is empty, it is replaced by the absolute pathname of the current directory in native operating system format, including a trailing directory separator. The Unix filename is then converted to a native operating system filename, and appended to the prefix.

The command line supports four options, '--tangler-prefix=', '--tangler-directory=', '--weaver-prefix=', and '--weaver-directory=', which facilitate placement of outputs.

The two directory options allow a Unix relative filename (or other prefixing characters) to be prepended to all filenames used for tangler or weaver outputs, respectively.

The two prefix options allow a native operating system format filename to specify the output directory for tangler and weaver files. If the prefix is empty, the absolute pathname of the current directory, including trailing directory separator, is used. The Unix filename is converted to native operating system format and appended to the prefix.

4.2. Processors
---------------

A _processor_ is an object that performs some processing function. Processors are generally hooked up to devices to read or write data. The separation of processing data streams, and sourcing and sinking those streams, is the traditional operating system facility known as device independence.

Whereas a file, denoted by a filename, is used in an operating system to implement device independence, in Interscript, a Python object is used instead.

The design of Interscript consists of four processors: the input, the code tangler, the document weaver, and the Python engine.

The Interscript parser reads and parses the input and sends lines to one of the three outputs. If the line begins with the special warning character @ it is sent to Python, otherwise if the tangler is not None it is sent to the tangler, otherwise it is sent to the weaver. If the tangler receives the line, it is written to a code file and echoed to the weaver.
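The routing rule in the last paragraph is small enough to write down directly. The following Python sketch is an illustration only: the names route and Recorder are invented, the Recorder class merely stands in for a real tangler or weaver, and the real parser does considerably more.

```python
class Recorder:
    """Stand-in for a tangler or weaver: just remembers the lines it is sent."""

    def __init__(self):
        self.lines = []

    def writeline(self, line):
        self.lines.append(line)


def route(line, namespace, tangler, weaver, warning="@"):
    """Send one input line to Python, the tangler, or the weaver."""
    if line.startswith(warning):
        # Lines flagged by the warning character are executed as Python.
        exec(line[len(warning):], namespace)
    elif tangler is not None:
        tangler.writeline(line)
        weaver.writeline(line)      # code is echoed to the document
    else:
        weaver.writeline(line)
```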

The main source of power in this system is the ability to execute arbitrary Python script. Interscript has some builtin commands and data structures to facilitate control.

4.3. Tanglers
-------------

A tangler is an object that is designed to process source code in some programming language for which the tangler is specialised. Tanglers are generally selected by the @select() command in the source. Output to a tangler is disabled by most documentation commands, so the system reverts to generating documentation.

Tanglers can be stacked. Typically, test code or header code will be embedded in files containing definitions.

Interscript comes with specialised tanglers for several languages. The list below shows the currently implemented special features of these tanglers.

data
    No special features.

C
    Tracks source with #line directives. Associated string and comment tanglers. Parses identifiers (badly).

C++
    Tracks source with #line directives. Associated string and comment tanglers. Parses identifiers (badly).

python
    Tracks source with #line directives. Associated comment tangler. Parses identifiers properly.

perl
    Tracks source with #line directives. Parses and processes Plain Old Documentation constructions.

java
    Associated string and comment tanglers.

interscript
    No special features.

4.4. Lexical Scoping
--------------------

The system maintains a stack of objects called input frames to track input sources. Input can be stacked using the "@include_file()" command, which is equivalent to a subroutine call. The stack is popped when the included file is exhausted.

The commands "@begin()" and "@end()" can also be used to push and pop the input stack; this is equivalent to a nested block.

User defined symbols are lexically scoped. The system currently maintains a dictionary of user symbols with each stack frame: all assignments enter the symbol into the dictionary of the top of stack frame.

When the stack is pushed, the new top of stack dictionary is initialised by a copy of the old top of stack dictionary. User symbols are searched for first in the top of stack user dictionary, and then in the global interscript namespace.
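The copy-on-push scheme can be illustrated with a few lines of Python. The class and method names below are invented for the example and are not the ones used inside Interscript; only the behaviour (copy on push, assign to the top frame, search the top frame and then the global namespace) is taken from the description above.

```python
class FrameStack:
    """Stack of per-frame symbol dictionaries (hypothetical names)."""

    def __init__(self, global_names=None):
        self.global_names = dict(global_names or {})
        self.frames = [{}]

    def push(self):
        # The new frame starts as a copy of the current top frame.
        self.frames.append(dict(self.frames[-1]))

    def pop(self):
        # Everything assigned in the dropped frame is forgotten.
        self.frames.pop()

    def assign(self, name, value):
        self.frames[-1][name] = value

    def lookup(self, name):
        top = self.frames[-1]
        if name in top:
            return top[name]
        return self.global_names[name]
```

Because the new frame is a copy, an assignment made inside an included file or a @begin()/@end() block never leaks back into the enclosing frame.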

As well as supporting scoped symbols, the parser is scoped. That is, changes to lexicology or processing mode made by modifying the parser tables are lost when the frame is dropped. This ensures that, for example, a change to the warning character in an included file does not affect the interpretation of the including file.

4.5. Weaver Control
-------------------

Weaver control is the most complex. Weavers can be stacked, for example to allow summary files or notes to be built incrementally. For HTML, detail pages can be stacked on a master, with hypertext links in the master to access them.

More than one weaver can be active at once, so that, for example, an HTML Web site, a LaTeX book text, and a plain text news article can be generated simultaneously. Document lines are sent to all the active weavers for processing.

Some typesetting constructions are too complex or specialised to be represented in all weavers. In this case, verbatim text can be sent to a particular weaver, or sent to any active weavers obeying a specific protocol.

4.6. Source Tracking
--------------------

In the beginning, every system is a collection of original source files, one of which is designated as the initial (master) source. Interscript begins processing the initial source, and may switch to another source as a result of executing some command such as 'include_file'. This can be done to break a long work into sections, or to include common 'macros' (python script).

Each source must have a definite name, and each line read is counted. This allows references to the original source to be generated in the code (and documentation) files, so that errors reported by language processors can be corrected in the original source. The code files cannot be edited, because they are generated by Interscript and any changes would be overwritten by the next processing run.

A related reason for source tracking is to generate cross reference tables. For example some tanglers generate an identifier cross reference, which can be used, for example, to disentangle duplicated names. The tables can also be used to generate vi editor tags, for example.

Every line of output must have an associated original source. That includes code that is saved temporarily (in memory, or to a disk file), and it includes generated code.

In order to accurately track original source lines, each input driver must have a name, usually the input file name, and must count lines read. When a line is read by the parser, the filename and line number are returned as well. The parser passes this information on to the relevant processor, usually the executor, tangler, or weaver.

In turn, the tangler, for example, will check the sink driver to see if the output is synchronised with the original source, and then store the source filename and line number in the sink driver.

If since the last write to the sink, not necessarily by this tangler, the original source filename has changed, or the line number is not one more than last time, then a section header is written to the sink file before the data. In C and C++ a #line directive is generated; in languages not supporting such directives (such as HTML and Python), a comment is generated instead. (Coincidentally, many scripting languages use a leading # for a comment and so the C style #line directive is generated.)
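The synchronisation test can be sketched as follows. This is a simplified illustration, not the tangler code: the class name TrackingSink and its write method are assumptions, the output is collected in a list rather than written to a file, and only the C style #line form of the section header is shown.

```python
class TrackingSink:
    """Collects output lines, inserting a #line directive when the source jumps."""

    def __init__(self):
        self.lines = []
        self.last_file = None
        self.last_line = None

    def write(self, text, source_file, source_line):
        # In sync means: same source file, and exactly one line further on.
        in_sync = (source_file == self.last_file
                   and self.last_line is not None
                   and source_line == self.last_line + 1)
        if not in_sync:
            self.lines.append('#line %d "%s"' % (source_line, source_file))
        self.lines.append(text)
        # Remember where this line came from, for the next writer.
        self.last_file = source_file
        self.last_line = source_line
```

Keeping the last filename and line number in the sink, rather than in the tangler, is what makes the check work when several tanglers write to the same sink.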

4.7. Parsing
------------

Interscript is a language, and needs a parser. Parsing is a complex task. The main control algorithm, however, uses a very simple syntax driven parsing engine. The parser table is a Python list of pairs. Each pair is a tuple consisting of a compiled regular expression and a function. The parser gets one line at a time from the source, and runs through the list, attempting to match the line against a regular expression. When a match is found, the corresponding function is called with the match and source reference data as arguments.

Because this scheme is very simple, it can be extended or modified easily by the end user.
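To make the table driven scheme concrete, here is a small self-contained Python sketch. The two patterns and the handler names on_command and on_text are made up for the example; the real Interscript table has more entries and its handlers do real work rather than just recording what they saw.

```python
import re

events = []


def on_command(match, filename, lineno):
    events.append(("command", match.group(1), lineno))


def on_text(match, filename, lineno):
    events.append(("text", match.group(0), lineno))


# The parser table: a list of (compiled regular expression, function) pairs.
# Order matters, because the first matching entry wins.
table = [
    (re.compile(r"@(.*)$"), on_command),
    (re.compile(r".*$"), on_text),
]


def parse(lines, filename, table):
    """Feed each line to the first table entry whose pattern matches it."""
    for lineno, line in enumerate(lines, 1):
        for pattern, function in table:
            match = pattern.match(line)
            if match:
                function(match, filename, lineno)
                break


parse(["@head(1,'Title')", "Some text."], "example.pak", table)
print(events)
```

Extending the parser then amounts to inserting another pair into the list, ahead of the catch-all entry.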

Because the function invoked can read further lines from the input, more sophisticated parsing can be programmed. For example, the Python suite execution function matches against a line starting with an @ and ending with a : or other character that indicates an incomplete suite. The function reads further lines up to the end of the script before executing the lot as a single Python suite.

4.8. Documentation Constructions
--------------------------------

Interscript supplies users with various documentation constructions. These include standard constructions such as a title, multilevel document headings, page headings, table of contents, index, nestable enumerated, bulleted, and keyed lists, displays (long quotations), footnotes, and tables. For a full list of supported constructions see below.

The requirements here are to support a rich enough set of constructions, with a fine enough level of control of details, to handle the bulk of work which would be required by a serious author, while at the same time providing the casual programmer simple enough tools to typeset basic program documentation.

In addition, Interscript supplies constructions specialised to literate programming. Naturally, there is a specialised construction for source code display, and some tables such as a list of files, and identifier index unique to literate programmed code. In addition, it is necessary to be able to typeset code fragments for 'examples of use', even though literate programming discourages this. (Give real examples!)

These requirements have to be balanced against the efficiency of the translator, the ease of implementation of the constructions, and the availability of features in typesetting systems. How do you typeset diagrams in plain text? (With difficulty) What about diagrams, colour, pictures, and font control?

4.9. Microformatting shortcuts
------------------------------

By microformatting, I mean things like emphasising a single word. Most typesetters can set a plain, bold, italic, underlined, and mono-spaced font (but not plain text). Support for mathematics, however, is more limited.

The general solution to microformatting is to have specialised parsers. Just as the line by line parser can be extended by the user, specialised microformatting can be provided by the end user by writing a parser to further translate document source. Naturally, the weavers to be used will have to support the constructions.

Weavers already perform some parsing. For example the Latex weaver has to translate the characters #$%^_ into the Latex macros that produce them, since they're reserved characters in standard Latex.

The biggest problem here is to specify a standard microformatting language. It is not too onerous to reserve the @ character at the beginning of a line, but how does one designate three special fonts (bold, italic and monospaced) and the scope they apply to? What about font size? For something more difficult, mathematics?

Any such language must eat up characters which can be typeset 'as is': the fewer such reserved characters, the more cluttered the source will become, whereas if more are reserved, the more likely the user is to forget to quote them properly when the character itself is required instead of 'magic'.

HTML reserves < and > and uses tag pairs to do detailed markup, and uses & to allow quoting. Latex reserves #$%^&_\. Interscript reserves @ at the beginning of the line.

It is possible to do all formatting using lines. But that leads to a 'troff' like solution, which is extremely ugly. It should be possible to write normal text and have it print properly -- and for a programmer that will include setting special characters. Typesetting C code documentation in plain Latex is a pain because underscore means subscript and is an error outside maths mode: but underscore is more or less the C version of a hyphen, and more or less an alphabetic character.

The characters we can afford to reserve are those not commonly used in program documentation. There aren't any. Here's the proof by analogy: if we reserve @, for example, then in the very documentation describing the construction implemented using the @ character, the most commonly used special character will, of course, be @.

The solution I have adopted to this intransigent problem is as follows. First, all the constructions have to be provided as commands. That means that irrespective of other details, all the constructions are available, even if it is a pain to typeset them.

Secondly, we provide regular expression matching technology to extract microformatting details using some standard forms, but we will not enable it by default.

I'll call these things 'shortcuts'. For example, the first shortcut for code is an @ followed by a C identifier. An @ in any other context is typeset as an @.

Shortcuts are implemented by weavers. (The control loop never sees them). To provide typesetter independent shortcuts, we need a special kind of weaver: a filter. A filtering weaver translates shortcuts and then calls the normal weaver.

Interscript comes with a standard filtering weaver, and is equipped with a user programmable table of shortcuts based on regular expression matching. The default version of this weaver does not do any shortcuts, however. Shortcuts must be explicitly enabled by the programmer. However, there is a table of standard shortcuts prepared, and a command to enable them.
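A filtering weaver for the one shortcut mentioned above (an @ followed by a C identifier means code) might look like the following Python sketch. It is an illustration under assumptions: the class names, the weaver methods write and code, and the enabled flag are all invented here, and the standard filtering weaver is table driven rather than hard wired to a single pattern.

```python
import re


class RecordingWeaver:
    """Stand-in for a normal weaver: records the calls made on it."""

    def __init__(self):
        self.calls = []

    def write(self, text):
        self.calls.append(("write", text))

    def code(self, text):
        self.calls.append(("code", text))


class ShortcutFilter:
    """Filtering weaver: translates shortcuts, then calls the normal weaver."""

    shortcut = re.compile(r"@([A-Za-z_][A-Za-z0-9_]*)")

    def __init__(self, weaver, enabled=False):
        self.weaver = weaver
        self.enabled = enabled      # shortcuts are off unless asked for

    def write(self, text):
        if not self.enabled:
            self.weaver.write(text)
            return
        position = 0
        for match in self.shortcut.finditer(text):
            if match.start() > position:
                self.weaver.write(text[position:match.start()])
            self.weaver.code(match.group(1))
            position = match.end()
        if position < len(text):
            # Any other @ is ordinary text and is passed through as is.
            self.weaver.write(text[position:])
```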

4.10. Weaver Architecture
-------------------------

Interscript operates on the assumption that there is exactly one weaver that can process all weaver commands.

4.10.1. Multiple Typesetters
----------------------------

Often, we want to weave the same document in different formats, for example, using Latex for book output, HTML for a web, and plain text for email and news.

This facility is provided by a weaver front end called a multiplexor. The multiplexor keeps a list of active weavers, and sends every method call to all weavers in the list that support that method. It's not an error if the weaver doesn't have the method, but it certainly is an error if it does, but the call fails.
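In Python this kind of dispatch is easy to sketch with __getattr__. The class below is called SketchMultiplexor precisely because it is not the Interscript multiplexor, whose implementation may differ; the two tiny weaver classes and the method names writeline and rule are stand-ins invented for the example.

```python
class SketchMultiplexor:
    """Forward each method call to every attached weaver that has the method.

    A sketch only; Interscript's own multiplexor may be built differently.
    """

    def __init__(self, weavers):
        self.weavers = list(weavers)

    def __getattr__(self, name):
        # Only reached for names not found on the multiplexor itself.
        if name.startswith("__"):
            raise AttributeError(name)

        def call_all(*args, **kwds):
            for weaver in self.weavers:
                method = getattr(weaver, name, None)
                if method is not None:
                    method(*args, **kwds)   # an exception here is a real error
        return call_all


class HtmlLike:
    """Stand-in weaver with an extra method the plain text one lacks."""

    def __init__(self):
        self.out = []

    def writeline(self, text):
        self.out.append(text)

    def rule(self):
        self.out.append("<hr>")


class TextLike:
    """Stand-in weaver supporting only writeline."""

    def __init__(self):
        self.out = []

    def writeline(self, text):
        self.out.append(text)


html_weaver, text_weaver = HtmlLike(), TextLike()
both = SketchMultiplexor((html_weaver, text_weaver))
both.writeline("Hello")
both.rule()              # silently skipped for text_weaver
```

Since the sketch answers any method name, one such object can itself be attached to another, which is what makes multiplexing multiplexors possible.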

4.10.1.1. Raw output
--------------------

When Interscript does not support detailed constructions, it is necessary to hard code them into the document. For example, if you really want frames in HTML, or category theory diagrams in LaTeX, you have to code raw HTML or LaTeX (probably XYpic) because Interscript doesn't provide a generic interface to support them.

The principal mechanism for raw output is to put it between a @rawif(protocol) command, and a @translate() command. The rawif command disables a target weaver unless it supports the nominated protocol, in which case, it is put in raw mode, whereas the translate command enables the weaver and puts it in translating mode.

If the weaver is the multiplexor, it dispatches these commands to all the weavers attached to it, thereby allowing raw output to be written to the subset of weavers supporting it.

Generally, you should provide raw typesetter data for every possible typesetter so that _something_ is typeset in every format of the document (even if it is just 'paste the diagram here' :-)

Every weaver constructs a protocol list when it is created, but the method 'add_tag' can be called to add another protocol to a weaver. The standard protocol names are html, latex, and text for the html and web weavers, the latex weaver, and the plain text weaver, respectively.

This mechanism is designed to be used in documents or parts of documents without requiring knowledge of which weavers are active. If a particular weaver is active and accessible, it can be controlled directly instead, but this is recommended only for specialised documents.
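The interplay of protocol tags, raw mode, and translating mode can be modelled in a few lines of Python. This is a toy model built from the description above, not the weaver source: the class name, the attribute names, and the use of html.escape as the 'translation' are assumptions; only the method names rawif, translate, and add_tag come from the text.

```python
import html


class SketchHtmlWeaver:
    """Toy weaver showing raw versus translating mode (hypothetical class)."""

    def __init__(self):
        self.tags = ["html"]      # protocol list built at construction
        self.enabled = True
        self.raw = False
        self.out = []

    def add_tag(self, tag):
        if tag not in self.tags:
            self.tags.append(tag)

    def rawif(self, protocol):
        # Raw mode if the protocol is supported, otherwise disabled.
        if protocol in self.tags:
            self.enabled, self.raw = True, True
        else:
            self.enabled = False

    def translate(self):
        # Back to normal: enabled, and translating special characters.
        self.enabled, self.raw = True, False

    def write(self, text):
        if self.enabled:
            self.out.append(text if self.raw else html.escape(text))
```

A latex weaver modelled the same way would accept the text following rawif('latex') and ignore the text following rawif('html').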

4.10.2. Multiple Human Languages
--------------------------------

Sometimes, we want to prepare the same document in several human languages. Interscript cannot translate English to German, for example, so it is necessary to provide documentation text in both languages. The program codes, however, are usually in common, except possibly for comments.

Interscript can generate multiple versions of a single document (and each version will be generated in all the selected formats) using the same tagging mechanism that is used to control raw output. You can write sections of English documentation after the command @enableif('English') and sections of the German version after the command @enableif('German'), and this will disable all weavers not supporting the nominated protocol.

This mechanism applies to interscript program code comment commands. The commands generate ordinary woven text, but are also inserted into the tangled output files. In this case, comments will be inserted in the selected language or languages. Be aware that while this will not change program semantics it will change the physical source file.

It is also possible to generate string constants in different human languages with interscript, but this is a tangler function, having no special effect on weaving: because of the complexities of this issue, it must be effected using python script crafted by the author for this purpose ... in other words there are no special commands for it :-)

4.10.3. Multiple Documents
--------------------------

Sometimes, we wish to construct several documents simultaneously. For example, we may have a short and long version of a document. We need to select which weavers to write to. It is easy to do this in simple cases by just assigning the weaver. For example:

    @both = multiplexor((long, short))
    @weaver = both
    This document describes Interscript.
    @weaver = long
    Here are some gory details.

Because a multiplexor represents a set of documents, and because one can multiplex multiplexors, it is easy to create small sets of weavers, and then create various unions of these sets.

4.10.4. Cumulated Appendices
----------------------------

Sometimes it is useful to accumulate the text for a document through the source: the table of contents is an example of this. Another important example is an issues list: details of bugs or issues are written near the relevant source, collected, and printed as an appendix.

Other examples include footnotes, which in some styles are printed all together at the end of articles or chapters, summaries of test results, etc. [To be continued and stuff implemented]

5. Tutorial[tut]
----------------

An easy introduction to the Interscript literate programming tool and environment. Please note Interscript is still experimental, and the command set and architecture are not frozen.

5.1. Weaving a document
-----------------------

To create a plain document is easy. First, you should create a heading like this, then type documentation. On-the-fly interscript for test 1 follows.

    @head(1,'My Document')
    This is a document describing Interscript, which
    is a literate programming tool. You can use
    any characters you like in the document,
    such as ~!@#$%^&*(), with one exception:
    you should not start a documentation line with @ in
    column 1. The @ character in column 1 is used to flag a command.

Test output at ../tests/output/test_1.html. Logfile at ../tests/output/test_1.log.

5.1.1. Headings
---------------

You can create sub-headings, and sub-subheadings: just use a @head(n,'Heading') command with n set to the heading level you want. Headings are numbered from 1; level 1 is the biggest heading. (Technically, the document title is a level 0 heading.)

Heading levels should go up consecutively, because Interscript numbers all headings automatically. Here's a document with several headings. On-the-fly interscript for test 2 follows.

    @head(1,'Several Headings')
    Test with several headings.

    @head(2,'First Subheading')
    Under the first subheading is a subsubheading.

    @head(3,'First Subsubheading')
    Some details here.

    @head(3,'Second Subsubheading')
    Some more details here.

    @head(2,'Second Subheading')
    Here's the second subhead. And the last
    text in the document.

Test output at ../tests/output/test_2.html. Logfile at ../tests/output/test_2.log.

5.1.2. Separating paragraphs
----------------------------

You can separate paragraphs with the command

@p()

Blank lines do not separate paragraphs. This is deliberate. Any number of blank lines translates to a single space. This allows you to separate parts of your Interscript document with vertical white space. It is particularly useful to add blank lines before headings. For example: On-the-fly interscript for test 3 follows.

    @head(1,'A document')
    We are flying to the moon today.
    @p()
    But not just any moon. The moon of Mars.

    @head(1,'Phobos')
    Actually, there are two Martian moons.
    @p()
    @p()
    @p()

    @p()

    @p()
    One of them is Phobos.

Test output at ../tests/output/test_3.html. Logfile at ../tests/output/test_3.log.

You should note that @p() is idempotent, which is a fancy way of saying two or more of them in a row are the same as one. You can't add extra space between paragraphs. Not even by putting dummy blank lines in between.

5.1.3. Line and Page breaks
---------------------------

You can force a line and page break with the commands

@line_break()
@page_break()

respectively. On-the-fly interscript for test 4 follows.

    @head(1,'A break test')
    Here is a short line
    @line_break()
    and another
    @line_break()
    and another.
    End of page.
    @page_break()
    Now a new page.

Test output at ../tests/output/test_4.html. Logfile at ../tests/output/test_4.log.

5.1.4. Displaying code examples
-------------------------------

In this tutorial, I've been showing you some example code. That is something most documentation writers want to do. You can do it too, like this: On-the-fly interscript for test 5 follows.

    @head(1,'Code displays')
    Here is a code display:
    @begin_displayed_code()
    while 1:
      print 'Hello again and again'
    @end_displayed_code()

Test output at ../tests/output/test_5.html. Logfile at ../tests/output/test_5.log.

5.1.5. Running Interscript
--------------------------

Well, you should try an example file. To process a file we'll say:

python iscr.py --weaver=html example.pak

where 'example.pak' is the name of the Interscript document. This will create a single file "example.html". Try it!

If you prefer HTML split into lots of little pages, try:

python iscr.py --weaver=web example.pak

This produces a file "example_top.html", and a number of auxiliary files, and a file for each heading.

You can also generate latex2e and plain text with the commands:

python iscr.py --weaver=latex example.pak

which produces a file "example.tex", and,

python iscr.py --weaver=text example.pak

which produces a file "example.txt". You can put more than one file name at the end too. Each such document will be processed separately.

5.1.5.1. Option help
--------------------

If you type:

python iscr.py --help

you will get a complete list of available options.

5.1.5.2. Passes
---------------

The passes option causes Interscript to process files more than once. This is sometimes necessary to get cross reference information right. The default is currently 1.

python iscr.py --weaver=text --passes=2 example.pak

Interscript may stop before running the specified number of passes. It will do this if, and only if, every buffered disk file ("named_file_sink") would write an output the same as the existing file. In that case, it assumes further passes wouldn't change anything, and stops. This is called convergence.
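The convergence rule amounts to a loop with an early exit. The Python sketch below shows the shape of it under assumptions: run_passes and its process argument are hypothetical names, and process stands for 'run one complete pass and report whether any buffered output file would change'.

```python
def run_passes(process, max_passes):
    """Call process() up to max_passes times, stopping early on convergence.

    process() must return True if any output file would change.
    Returns the number of passes actually run.
    """
    for pass_number in range(1, max_passes + 1):
        if not process():
            # Nothing changed, so further passes would change nothing either.
            return pass_number
    return max_passes
```

For example, with --passes=10 a document whose cross references settle on the third pass would stop after that pass rather than running all ten.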

5.1.5.3. Tangling parts
-----------------------

If you construct your Interscript sources as a tree, using the "@include_file()" command, and you follow certain rules, you can run Interscript on the included file to extract the code, for just that file. The Interscript sources are constructed this way. This feature is vital for building big systems because it allows you to extract the code from files you have changed, without extracting code from those that have not.

You must ensure that code files are lexically contained entirely in a single include file. More generally, the include file must not rely on any context from its parent (except for that which is determined from the command line).

If you weave an include file, you will get a separate document for that include file which will, in general, not be linked to the master document: it will be in a separate file, named after the include file, and headings will be numbered separately.

[There is currently no simple way to require a separate source be built entirely independently so that the master document can link to it. This would be especially useful, because it would also permit time stamps to be checked and avoid unnecessary processing.]

5.2. Tangling code
------------------

So far, we have just produced a document. What about programming? Here's a sample document with tangling. On-the-fly interscript for test 6 follows.

@py = tangler('interscript/tests/output/mymodule.py')
@head(1,'My Module')
This is my very own module.
@select(py)
import sys
class myclass:
  def __init__(self, name):
    self.name = name
@head(2,'hello method')
Just says hello.
@select(py)
  def hello(self):
    print 'hello',self.name
@doc()
And now back to doco.

Test output at ../tests/output/test_6.html. Logfile at ../tests/output/test_6.log.

5.2.1. Tangler Constructors
---------------------------

In the previous example, we used the line

@py = tangler('interscript/tests/output/mymodule.py')

to construct a tangler object. The general form of the tangler constructor is

tangler(device,language, *args, **kwds)

The language argument names a programming language; it must be the Interscript key for a particular tangler. The tangler is named as the key plus the suffix '_tangler', and must be defined in a module named after the language and prefixed with 'interscript.tanglers.'.

If the language is not given, it defaults to 'deduce', which tells the function to guess the right tangler by examining the filename extension. Therefore the example above is equivalent to

@py = tangler('interscript/tests/output/mymodule.py', 'python')

because interscript knows that the extension 'py' is for a python code target. If deduction cannot pick a specialised tangler, the built-in data tangler is used.
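As a rough sketch of how such deduction might work (an illustration only, not Interscript's actual code): the extension table below is hypothetical apart from the 'py' entry mentioned above, the key 'data' for the fallback tangler is assumed, and "deduce_language" and "tangler_name" are made-up names. The dotted name is built from the naming rule described earlier.

```python
import os

# Hypothetical extension table; only the 'py' entry is taken from the text.
EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".c": "c",
    ".h": "c",
}


def deduce_language(filename):
    """Guess a tangler key from the extension, falling back to 'data'."""
    extension = os.path.splitext(filename)[1].lower()
    return EXTENSION_TO_LANGUAGE.get(extension, "data")


def tangler_name(filename, language="deduce"):
    """Return the dotted name under which the tangler would be looked up."""
    if language == "deduce":
        language = deduce_language(filename)
    # Module 'interscript.tanglers.<key>', tangler '<key>_tangler'.
    return "interscript.tanglers." + language + "." + language + "_tangler"


deduced = tangler_name("interscript/tests/output/mymodule.py")
explicit = tangler_name("interscript/tests/output/mymodule.py", "python")
fallback = deduce_language("notes.xyz")
```

With these assumptions, the deduced and explicit forms name the same tangler, and an unknown extension falls back to the data tangler.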

The device argument can be an object supporting sink protocol, in which case that sink object is used directly. The device argument can be an object supporting filename protocol, in which case a named_file_sink object for that file is automatically created as the tangler sink. Note that strings support filename protocol, which is why the example works.

The balance of the arguments are passed to the tangler as options.

5.2.2. Original Source References
---------------------------------

Here's the code from the previous example again:

Start python section to interscript/tests/output/mymodule2.py[1]

1: #line 253 "tutorial.pak"

End python section to interscript/tests/output/mymodule2.py[1]

Notice the #line directives. They're called _original source references_ because they refer to the original, editable, source file containing the code. If you are creating C programs, the compiler will recognize them and report compiler errors in the original file. Integrated development environments will put the cursor right in the middle of the Interscript source. This is necessary because you must not edit the code file: your changes will get clobbered the next time you run Interscript on the original source.
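To make the idea concrete, here is a minimal sketch of how a tangler could prefix a chunk of code with an original source reference. The function name "tangle_chunk" is hypothetical; only the directive format is taken from the listing above.

```python
def tangle_chunk(lines, source_file, first_line):
    """Prefix a code chunk with a #line directive naming its original location."""
    directive = '#line %d "%s"' % (first_line, source_file)
    return [directive] + list(lines)


# A one-line chunk that starts at line 253 of tutorial.pak.
chunk = tangle_chunk(["import sys"], "tutorial.pak", 253)
```

The first element of "chunk" is the directive, followed by the code lines unchanged.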

5.2.3. Code sections
--------------------

As you can see, the code is displayed with line numbers, and the file is named at the beginning and ending of each chunk. Interscript calls these chunks _code sections_ and the doco between them _document sections_. So basically, Interscript allows you to _interleave_ code and document sections. This is called _gathering_ in some other literate programming tools. You can end a code section with an @head() command, or a @doc() command, which switches back to document mode.

Just so you can see it again, here is some code interleaved with documentation.




And now for the hello method:




You should see two code sections, embedded in documentation.

5.3. Scripting
--------------

Traditional literate programming tools have two conceptual processes: _weaving_ (a document) and _tangling_ (a code file), which separate out interleaved document and code sections (respectively).

Interscript has a third kind of section, the _script section_. In case you're wondering what script sections look like, well, you've already used them. All those lines starting with @ are just executable python script. They aren't really special magical commands, just function calls to predefined python functions.

You can write any python script you like in a script section. On-the-fly interscript for test 7 follows.

@name = 'John Skaller'
@print 'Hello',name
@print 'Running Python',sys.version
@head(1,'Hello World from '+name)
This is a scripting test.
@weave('Written by '+name+'.')

Test output at ../tests/output/test_7.html. Logfile at ../tests/output/test_7.log.

Notice you don't have to import sys: it is already imported, because it is used in Interscript. [Add list of imported modules here]

You will also notice the

@weave(text)

command, which weaves its argument: use this when you want to calculate text, as in the example.

You should be careful with this feature. It is immensely powerful! You can use it to test programs, and to extend Interscript for your needs in a _particular_ document -- without changing the actual source code for Interscript. See http://www.python.org to find out more about python.

5.3.1. Long script sections
---------------------------

You can code long script sections such as a class definition. The rule is: a long script section is started by a line starting with @ and ending with : (or with one of the other characters that python recognizes as signifying that there is more to come). You must then indent the code with exactly one extra space. A long script section is ended by the first line not having a space character in column 1 (or the end of file).

The whole of a long script section is collected and then executed at once. On-the-fly interscript for test 8 follows.

@head(1,'Long script sections')
Here is a long script. We define a class MyClass.
@class MyClass:
   def __init__(self, name):
     self.name = name
   def hello(self):
     print 'Hello',self.name

 # test it
 me = MyClass('John')
 me.hello()
 deliberate error
@doc()
After all that, the deliberate error is ignored.

Test output at ../tests/output/test_8.html. Logfile at ../tests/output/test_8.log.
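The gathering rule for long script sections can be sketched as follows. This is only an illustration of the rule as stated, not Interscript's parser: "gather_long_script" is a hypothetical name, and only the ':' ending is handled.

```python
def gather_long_script(lines, start):
    """Collect a long script section beginning at lines[start].

    Returns (script_text, index_of_next_line).
    """
    first = lines[start]
    assert first.startswith("@") and first.rstrip().endswith(":")
    body = [first[1:]]              # drop the leading @
    i = start + 1
    # Continue while the line has a space in column 1.
    while i < len(lines) and lines[i].startswith(" "):
        body.append(lines[i][1:])   # drop the one extra space
        i += 1
    return "\n".join(body) + "\n", i


source = [
    "@class MyClass:",
    "   def hello(self):",
    "     return 'Hello'",
    " me = MyClass()",
    "@doc()",
]
script, next_index = gather_long_script(source, 0)

# The whole section is executed at once.
namespace = {}
exec(script, namespace)
```

After gathering, "next_index" points at the "@doc()" line, which is the first line without a space in column 1.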

Errors in script sections are reported with a traceback to the logfile, but do not halt processing. You cannot terminate an Interscript processing run inside a script section, not even with sys.exit(). [Interscript can be terminated with a keyboard interrupt or system abort signal, however.]
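One way to get this behaviour in Python is to run each script section inside a try block that also catches SystemExit but lets KeyboardInterrupt through. The sketch below assumes that approach; "run_script_section" is a hypothetical name and an in-memory buffer stands in for the logfile.

```python
import io
import traceback


def run_script_section(script, namespace, logfile):
    """Execute a script section; log any failure instead of propagating it."""
    try:
        exec(script, namespace)
        return True
    except (Exception, SystemExit):
        # SystemExit is caught too, so sys.exit() cannot end the run.
        # KeyboardInterrupt is deliberately not caught.
        logfile.write(traceback.format_exc())
        return False


log = io.StringIO()
env = {}
ok_first = run_script_section("import sys\nsys.exit(1)\n", env, log)
ok_second = run_script_section("answer = 6 * 7\n", env, log)
```

The first section fails and is logged; the second still runs in the same namespace.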

5.3.2. Very Long script sections
--------------------------------

There's a better way to code long script sections, using the "python()" command. Here's an example: On-the-fly interscript for test 9 follows.

@head(1,'Long Script test')
Some script should generate a string of 3 'My names',
separated by 3 dashes.
@python('//')
x = 2
y = 3
z = 'My name'
weave((z + '-' * x) * y)
//

Test output at ../tests/output/test_9.html. Logfile at ../tests/output/test_9.log.

The 'python' command accepts a string argument which is a terminator line for the script section. The whole section is gathered, without any processing, and then executed. There is a danger to be aware of: if you don't put the terminator in correctly, the command will read all the way to the end of the file.
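Gathering up to a terminator line is simple to sketch, and the sketch also shows why a missing terminator swallows the rest of the file. The name "gather_until" is hypothetical, and the script assigns to "result" instead of calling weave() so that it runs outside Interscript.

```python
def gather_until(lines, start, terminator):
    """Collect lines from start up to, not including, the terminator line.

    Returns (collected_lines, index_after_terminator). If the terminator is
    missing, everything to the end of the input is collected.
    """
    collected = []
    i = start
    while i < len(lines):
        if lines[i] == terminator:
            return collected, i + 1
        collected.append(lines[i])
        i += 1
    return collected, i


source = [
    "x = 2",
    "y = 3",
    "z = 'My name'",
    "result = (z + '-' * x) * y",
    "//",
    "Back to documentation.",
]
body, next_index = gather_until(source, 0, "//")

# The gathered section is executed in one go.
namespace = {}
exec("\n".join(body), namespace)

# Without a terminator the whole input is consumed.
runaway, end_index = gather_until(["a = 1", "b = 2"], 0, "//")
```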

5.4. Unit tests
---------------

It is possible to test python script 'on the fly' as in the example: On-the-fly interscript for test 10 follows.

@head(1,'A python test')
Test the test_python function.
@test_python(hlevel=2,descr='A simple test',source_terminator='//')
print 'A simple test'
//

Test output at ../tests/output/test_10.html. Logfile at ../tests/output/test_10.log.

The test is also registered in a table of tests.

It is also possible to provide expected output: Interscript will verify your code by comparing the expected and actual output, and print a difference table if the test failed. Note that the difference table is only available if the module "interscript.utilities.diff" is available and operates correctly: the current implementation uses GNU diff invoked using "os.system()". Here's an example that verifies OK: On-the-fly interscript for test 11 follows.

@head(1,'A diff OK test')
Test the test_python function.
@test_python(hlevel=2,\
 descr='A diff OK test',source_terminator='//', expected_terminator='//')
print 'A simple diff test'
print 'A simple diff test line 2'
print 'A simple diff test line 3'
//
A simple diff test
A simple diff test line 2
A simple diff test line 3
//

Test output at ../tests/output/test_11.html. Logfile at ../tests/output/test_11.log.

And here's one that should fail: On-the-fly interscript for test 12 follows.

@head(1,'A diff fail test')
Test the test_python function.
@test_python(hlevel=2,\
 descr='A diff test',source_terminator='//', expected_terminator='//')
print 'A simple diff test'
print 'A simple diff test line 2'
print 'A simple diff test line 3'
//
A simple diff test
A simple diff test line 2 CHANGED
A simple diff test line 3
//

Test output at ../tests/output/test_12.html. Logfile at ../tests/output/test_12.log.
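The compare-and-diff step can be imitated in a few lines. Note the substitution: this sketch uses the standard difflib module rather than GNU diff, it is written for current Python (print is a function), and "run_python_test" is a hypothetical name, not the signature of test_python.

```python
import contextlib
import difflib
import io


def run_python_test(source, expected):
    """Run source, capture stdout, and diff it against the expected text.

    Returns (passed, diff_lines); diff_lines is empty when the test passes.
    """
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        exec(source, {})
    actual = captured.getvalue().splitlines()
    wanted = expected.splitlines()
    diff = list(difflib.unified_diff(wanted, actual,
                                     "expected", "actual", lineterm=""))
    return actual == wanted, diff


source = (
    "print('A simple diff test')\n"
    "print('A simple diff test line 2')\n"
    "print('A simple diff test line 3')\n"
)
good = "A simple diff test\nA simple diff test line 2\nA simple diff test line 3\n"
bad = "A simple diff test\nA simple diff test line 2 CHANGED\nA simple diff test line 3\n"

passed_ok, diff_ok = run_python_test(source, good)
passed_bad, diff_bad = run_python_test(source, bad)
```

The first test passes with an empty difference list; the second fails, and its difference list marks the changed line.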

5.4.1. Weaver Control
---------------------

The python function 'get_weaver()' refers to the current weaver. You can use it to call methods directly on the weaver. For example:

@weaver = get_weaver()
@weaver.write('Antidis')
@weaver.begin_bold()
@weaver.write('establishmentarianism')
@weaver.end_bold()
 is a long word. Note the space on this line!

which comes out as Antidisestablishmentarianism is a long word. Note the space on this line!

You can also set the weaver. Suppose you have a weaver mynotes_weaver, then you can write:

@old_weaver = get_weaver()
@set_weaver(mynotes_weaver)
This is woven into notes.
@set_weaver(old_weaver)

To make this more convenient, the set_weaver function returns the current weaver so you can write:

@old_weaver = set_weaver(mynotes_weaver)
This is woven into notes.
@set_weaver(old_weaver)

Even more convenient, you can push and pop weavers onto a stack using

@push_weaver(mynotes_weaver)
This is woven into notes.
@pop_weaver()

The current weaver is lexically scoped.
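The relationship between the get, set, push and pop commands can be modelled with a small class. This is a sketch of the behaviour described above, not Interscript's implementation; "WeaverState" is a hypothetical name, lexical scoping is not modelled, and plain strings stand in for weaver objects.

```python
class WeaverState:
    """Hypothetical holder for the current weaver and a stack of saved ones."""

    def __init__(self, weaver):
        self.current = weaver
        self.stack = []

    def get_weaver(self):
        return self.current

    def set_weaver(self, weaver):
        """Install a new weaver and return the one it replaced."""
        old, self.current = self.current, weaver
        return old

    def push_weaver(self, weaver):
        """Save the current weaver on the stack and install a new one."""
        self.stack.append(self.current)
        self.current = weaver

    def pop_weaver(self):
        """Restore the most recently saved weaver."""
        self.current = self.stack.pop()


state = WeaverState("main_weaver")

# set_weaver returns the previous weaver, which can be restored later.
old = state.set_weaver("mynotes_weaver")
state.set_weaver(old)

# push/pop does the same bookkeeping for you.
state.push_weaver("mynotes_weaver")
during = state.get_weaver()
state.pop_weaver()
after = state.get_weaver()
```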

5.4.2. Perl hates @
-------------------

If you are programming Perl (or Interscript!) you will hate having @ as the warning character for script sections. There are two ways around this.

You can use two @ characters at the beginning of a line.

@@p = @x

or you can use a command like:

@set_warning_character(python='!')

which will set the python warning character to ! instead of @. Advanced Note: this change applies only to the current file, and only to the end of the containing block, if any. The effect will not be passed up to an including file, and it won't be inherited by an included file either.

5.5. Fonts
----------

The following commands change font. Note that begin/end pairs must be balanced, and nesting may or may not be supported, depending on the weaver.

@begin_emphasize() @end_emphasize()

@begin_strong() @end_strong()

@begin_code() @end_code()

@begin_small() @end_small()

@begin_big() @end_big()

@begin_italic() @end_italic()

@begin_bold() @end_bold()

5.6. Lists
----------

The following commands build lists. Note that begin/end pairs must be balanced, and nesting is supported. There are three kinds of lists: numbered lists, which are numbered automatically, bullet lists, which display some special bullet symbol, and keyed lists, which display a string of text for each item.

@begin_numbered_list(start=1): @end_numbered_list(): @begin_numbered_list_item(): @end_numbered_list_item():

@begin_bullet_list(): @end_bullet_list(): @begin_bullet_list_item(): @end_bullet_list_item():

@begin_keyed_list(): @end_keyed_list(): @begin_keyed_list_item(key): @end_keyed_list_item():

Here's an example:

@begin_keyed_list()
@begin_keyed_list_item('bullet')
A bullet or similar character at the start of each item.
@end_keyed_list_item()
@begin_keyed_list_item('numbered')
A number at the start of each item.
@end_keyed_list_item()
@begin_keyed_list_item('keyed')
A key, or definition term, at the start of each item.
@end_keyed_list_item()
@end_keyed_list()

which comes out like:

bullet A bullet or similar character at the start of each item.

numbered A number at the start of each item.

keyed A key, or definition term, at the start of each item.

5.6.1. Easier lists
-------------------

By default, Interscript installs a special weaver called 'multiplexor' which delegates commands to zero, one, or more weavers. This weaver also supports simplified list definitions. Here's the example above, simplified.

@begin_list('keyed')
@item('bullet')
A bullet or similar character at the start of each item.
@item('numbered')
A number at the start of each item.
@item('keyed')
A key, or definition term, at the start of each item.
@end_list()
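The delegation idea behind the multiplexor can be sketched in a few lines of Python. This is not the real multiplexor: the class below simply forwards any method call to every weaver in its list, and "RecordingWeaver" is a toy weaver invented for the demonstration.

```python
class Multiplexor:
    """Forward every method call to each weaver in a list (possibly empty)."""

    def __init__(self, weavers):
        self.weavers = list(weavers)

    def __getattr__(self, name):
        # Called only for attributes not found normally, i.e. weaver commands.
        def delegate(*args, **kwds):
            for weaver in self.weavers:
                getattr(weaver, name)(*args, **kwds)
        return delegate


class RecordingWeaver:
    """Toy weaver that records the text it is asked to write."""

    def __init__(self):
        self.output = []

    def write(self, text):
        self.output.append(text)


first, second = RecordingWeaver(), RecordingWeaver()
Multiplexor([first, second]).write("Hello")

# With zero weavers the command is accepted and nothing happens.
Multiplexor([]).write("dropped")
```

Both toy weavers receive the same text, which is how one source can be woven to several output formats in a single run.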

5.7. Tables
-----------

Interscript supports tables, although currently the support is fairly primitive. Here's how to create a table:

@begin_table('Column 1','Column 2','Column 3')
@table_row('Data 11', 'Data 12','Data13')
@table_row('Data 21', 'Data 22','Data23')
@end_table()

which looks like this:

+----------+----------+----------+
| Column 1 | Column 2 | Column 3 |
+----------+----------+----------+
| Data 11  | Data 12  | Data13   |
| Data 21  | Data 22  | Data23   |
+----------+----------+----------+

5.8. Citations
--------------

You can cite a URL like:

@cite_url('http://www.triode.net.au/~skaller')

which will appear as a hyper link in HTML files like http://www.triode.net.au/~skaller.

5.9. Cross References
---------------------

Interscript supports intra-document (internal) cross referencing using the commands:

@set_anchor('MyLabel')
...
please see
@ref_anchor('MyLabel')
for details.

The label must be a string, and is currently required to be a valid identifier since it is used literally by the HTML weaver in an anchor tag. Latex imposes no such restrictions, nor does the plain text weaver. For HTML, the label is set as an anchor; for latex, the page number is given; for plain text, the line number.

Note that inter-document (external) cross references are different to intra-document cross references. For truly external references to existing published works, use citations or bibliographic references. References across volumes of a work or project are not yet supported.

5.10. Including Files
---------------------

Interscript allows you to include Interscript files with the command

@include_file(filename)

When you do this, you should be aware that it is treated like a subroutine call: a stack frame is created, and any symbols bound in script sections are local to the file. In addition, various parameters are localized. Therefore, you cannot define a new command or variable in an include file and expect it to persist past the end of the file.
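The effect is similar to running the included file's script in a fresh namespace that is thrown away afterwards. The sketch below assumes, purely for illustration, that the child starts with a copy of the parent's symbols; this "include_file" is a stand-in that takes script lines rather than a filename.

```python
def include_file(script_lines, parent_namespace):
    """Run an included file's script in a copy of the parent's namespace."""
    frame = dict(parent_namespace)   # assumption: child sees parent's symbols
    exec("\n".join(script_lines), frame)
    return frame                     # normally discarded by the caller


parent = {"name": "John Skaller"}
child = include_file(
    ["greeting = 'Hello ' + name", "name = 'changed'"],
    parent,
)
```

Inside the included file both assignments take effect, but the parent's namespace is untouched: "greeting" does not exist there and "name" keeps its old value.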

5.10.1. Including code
----------------------

You can include existing code directly to the current tangler like:

@select(py)
@include_code(filename)

This is not the same as including an Interscript file. The contents of the code inclusion 'filename' are copied verbatim to the tangler 'py'. Leading @ characters are not detected. The contents of the file are still woven into the document.

5.10.2. Displaying Code
-----------------------

You can display a file as code like this:

@display_code(filename)

This is very useful for printing the results of a processing run, or weaving example programs into a book.

5.11. Translating Html
----------------------

Interscript normally reads Interscript. But it can also read (a small subset of) HTML. The command:

@include_html(filename)

will read an HTML file and translate the tags to Interscript. In this manner, you can convert flat HTML into stacked HTML, or into Latex or plain text.

5.12. Special constructions
---------------------------

5.12.1. Table of Contents
-------------------------

To print the table of contents, you can say:

@print_table_of_contents()

For Latex, the native table of contents construction is used. For flat HTML, the weaver generates a hyper linked table of headings, but two passes are required to get it right. For plain text, two passes are also required.

For stacked HTML, a separate contents page is created automatically. For this reason, the table of contents command is disabled for that weaver.

5.12.2. Identifier Index
------------------------

This is a table of all the identifiers used in your programs. For HTML, the entries are hyper linked to each occurrence of the identifier. Finding identifiers is the task of tanglers. At this time, the Python tangler can find most of them by tokenising python script. None of the other tanglers support this feature properly yet.

@print_identifier_cross_reference()

5.12.3. File list
-----------------

A list of all the generated files. Use

@print_file_list()

to generate it. You can use this to assemble the generated files into a tar ball.

5.12.4. Source list
-------------------

A list of all the input files. Use

@print_source_file_list()

to generate it. This command is useful so you know what files are required for a package.

5.12.5. The Web Weaver
----------------------

Here are some special features of the 'web' weaver.

5.12.5.1. Automatic table generation
------------------------------------

As of version 1.0a6, the web weaver automatically produces a number of tables: the table of contents, index of classes, index of functions, index of identifiers, index of unit tests, and file convergence status report. None of these tables can be disabled at present.

5.12.5.2. Mandatory Frames
--------------------------

The web weaver also produces several framesets. There is no 'noframes' option at present.

5.12.5.3. Internet Explorer DHTML support
-----------------------------------------

In addition, the table of contents uses ECMAscript to determine if it is running on a version of Microsoft Internet Explorer; if so, it enables dynamic HTML features unique to the Microsoft object model which permit dynamic expansion or contraction of the table of contents tree at each branch.

5.12.5.4. Cascading Style Sheets
--------------------------------

Both the web and html weavers use CLASS attributes in tags, and the web weaver in particular makes fairly heavy use of the generic DIV and SPAN tags.

A standard Cascading Style Sheet called interscript.css can be found in the directory interscript/doc. It colours various elements in a suggestive way. Do not change interscript.css; instead, supply user.css; it should override interscript.css even if your browser finds both files.

5.13. File Names
----------------

The names of files used in interscript documents should be relative pathnames obeying the Unix convention, even on other platforms such as NT or the Mac: separate components with a / character. Don't use silly characters such as : in component names.

I plan to upgrade the file naming convention to use URLs with an 'interscript' addressing scheme, in which the 'network' component is treated as a logical location identifier; the client will map these locations to physical ones.

The current version of interscript does not provide this mechanism yet. Instead, there are four command line options:

--weaver-prefix=nativepath
--tangler-prefix=nativepath
--weaver-directory=relpath
--tangler-directory=relpath

where the nativepath is a prefix in native operating system format, and the relpath is a prefix in Unix format. For an interscript file given as 'basename', the resulting actual filename is:

nativepath + string.join(string.split(relpath+basename,'/'), os.sep)

Note that if you use 'a' and 'b' as the prefix and directory, a file with base name 'base' will be called 'abbase': no separators are put between the prefix, directory and base. Here's an example for Windows:

python iscr.py \
  --tangler-prefix=c:\mydevelopment\ \
  --tangler-directory=code/ \
  example.pak

Note that interscript creates directories automatically for the 'Unix' part of the filename, but _not_ the native prefix. Thus in the example 'c:\mydevelopment' must exist, whereas 'code' is created within it automatically. If example.pak tangles a file 'package/module.py', then 'package' is also created automatically.
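The filename rule can be tried out with a few lines of current Python. This is a sketch of the rule as stated, not Interscript's code: "native_filename" is a hypothetical name, and the separator is passed in explicitly (instead of using os.sep) so that the Windows example gives the same answer on any platform.

```python
def native_filename(prefix, directory, basename, sep):
    """Join prefix + directory + basename, converting '/' in the Unix part to sep."""
    unix_part = directory + basename
    return prefix + sep.join(unix_part.split("/"))


# The Windows example: prefix in native format, the rest in Unix format.
windows_name = native_filename(
    "c:\\mydevelopment\\", "code/", "package/module.py", "\\"
)

# No separators are inserted between prefix, directory and base.
plain_name = native_filename("a", "b", "base", "/")
```

The first call gives the Windows path for the tangled module; the second shows why the prefix and directory must carry their own trailing separators.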

5.14. Questions and Answers
---------------------------

5.14.1. Why are HTML tags printed verbatim, even by the html weaver?
--------------------------------------------------------------------

Weavers are _supposed_ to print all printable characters literally. So text containing tags is printed exactly as you wrote it.

5.14.2. OK, so how do I put HTML into a document?
-------------------------------------------------

If you want to write specialised HTML and put it into a document, there are several ways to do it.

5.14.2.1. Tag method
--------------------

This is the preferred method. You say:

@weaver.rawif('html')
'raw html'
@weaver.enable()

What this does is disable the weaver unless it is tagged 'html'. Then you raw-write the html, and finally re-enable the weaver. This works even when you have multiple weavers configured with filters, because the multiplexor and markup filters delegate these commands to their clients.

You can add any tag to a weaver by saying

@weaver.add_tag(something)

The built in weavers are tagged 'text', 'html' 'latex'and 'raw' as appropriat