<TeX>
texmlbus
H. Stamerjohanns
TUG, August 2021
H. Stamerjohanns texmlbus TUG, August 2021 1 / 35
Outline
1 Motivation
  - LaTeX and XML
  - Why convert to XML?
  - Previous projects
  - texmlbus: change of focus
2 Website
  - Build system
3 Summary
Just use LaTeX
LaTeX is the format to write math
- millions of scientific publications have been written using LaTeX
- best way to produce high quality math typesetting
drawbacks
- mixes form and content
- no real semantics
- style files change over time
- no formal validation
- long term preservation?
Conversion to XML
XML
- not something you want to directly edit
- document can be validated
  ⇒ possible archive format
- JATS (Journal Article Tag Suite)
- MathML, XHTML
  ⇒ render document directly in web browser
- easier for searching and indexing tools, screenreaders
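To make the last two points concrete, here is a minimal sketch using only the Python standard library. It checks that a small MathML fragment is well-formed and pulls out its identifiers the way an indexing tool might. Note that well-formedness is a weaker check than validation against a schema such as JATS, and that the sample fragment and the function names are made up for illustration; they do not come from texmlbus.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hypothetical sample: E = m c^2 as presentation MathML.
SAMPLE = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<mrow><mi>E</mi><mo>=</mo><mi>m</mi>"
    "<msup><mi>c</mi><mn>2</mn></msup></mrow>"
    "</math>"
)


def is_well_formed(xml_text):
    """Return True if the text parses as XML (no schema validation)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def identifiers(xml_text):
    """Collect the text of all MathML <mi> elements in document order."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("{%s}mi" % MATHML_NS)]


if __name__ == "__main__":
    print(is_well_formed(SAMPLE))
    print(identifiers(SAMPLE))
```

A LaTeX source offers no comparable structure to query, which is the point of the slide: once the formula is XML, generic tools can inspect it.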
Project based on...
arxivml build system
- written at Jacobs University Bremen
- use LaTeXML to create XML
- mass conversion to XHTML
  ≈ 500,000 documents converted
- create real-world MathML
  ⇒ improve LaTeXML¹
¹ B. Miller and D. Ginev, https://dlmf.nist.gov/LaTeXML/
Build system
- open source (MIT licence)
- implemented in scripting language (here PHP)
- uses SQL database to store state
- distributes jobs on several hosts
- sets timeout for each job
- analyzes conversion process
  - checks files
  - parses the result files (stderr.log)
  - classifies results
- stores information in DB
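The timeout and classification steps above can be sketched as follows. This is a Python illustration, not the project's PHP code. The severity keywords and the function names `classify_log` and `run_job` are assumptions, since the slides do not show the actual stderr.log format or the classification rules.

```python
import subprocess
import sys
from typing import Sequence

# Hypothetical severity markers, ordered from most to least severe.
SEVERITIES = ("fatal", "error", "warning")


def classify_log(log_text: str) -> str:
    """Return the most severe marker found in the log, or 'ok' if none."""
    lowered = log_text.lower()
    for severity in SEVERITIES:
        if severity in lowered:
            return severity
    return "ok"


def run_job(command: Sequence[str], timeout_s: float) -> str:
    """Run one conversion command with a timeout and classify its stderr."""
    try:
        completed = subprocess.run(
            list(command), capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # The child process is killed by subprocess.run before this is raised.
        return "timeout"
    return classify_log(completed.stderr)


if __name__ == "__main__":
    # Stand-in for a converter call: a child process that writes a warning.
    demo = [sys.executable, "-c", "import sys; sys.stderr.write('Warning: x')"]
    print(run_job(demo, timeout_s=10))
```

The returned class is the kind of value that would be stored in the database per document.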
texmlbus: change of focus
- easy installation
  ⇒ use Docker images
- more interactivity
  ⇒ upload files via browser
  ⇒ import files directly from Overleaf
  ⇒ schedule jobs via browser
- other targets than XHTML
  ⇒ result table for each target
- create same target using different systems
  ⇒ introduce stages (target combined with image)
  ⇒ needs subdirectories for each stage
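One way to picture a stage is as a (target, image) pair that owns a subdirectory inside each article directory. The sketch below is illustrative only: the class `Stage`, the naming scheme `target_image`, and the image labels are hypothetical and may differ from how texmlbus names things.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Stage:
    target: str  # output format, e.g. "xhtml"
    image: str   # hypothetical label of the Docker image that produces it

    @property
    def name(self) -> str:
        # Assumed naming scheme: target and image joined by an underscore.
        return f"{self.target}_{self.image}"


def stage_dir(article_dir: str, stage: Stage) -> PurePosixPath:
    """Each stage writes into its own subdirectory of the article directory."""
    return PurePosixPath(article_dir) / stage.name


if __name__ == "__main__":
    # Same target, two different systems: two stages, two result directories.
    for stage in (Stage("xhtml", "latexml-stable"), Stage("xhtml", "latexml-dev")):
        print(stage_dir("articles/0001", stage))
```

Keeping the results apart per stage is what makes it possible to compare two systems on the same document.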
texmlbus build system
[Diagram; boxes and labels as recovered from the slide]
Interactive usage via browser
- add and remove documents
- queue conversion jobs
- manages document files
SQL database
- workqueue
- statistics
Document files
- each article in subdirectory
Worker containers
- handle conversion
- call latexml and latexmlpost or conversion commands
Build Manager
- operates on workqueue
- schedules make jobs on worker containers
- analyzes log files
Arrows between the boxes
- manages entries in workqueue (state and priority)
- reads workqueue
- invokes make
- stores analyzed log data
- default: create result files and log files
- reads log files
Components
- Webserver / PHP
- MySQL DB
- LaTeXML
- Shared Volume
- Document repository
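The role of the workqueue can be sketched with a small table holding a state and a priority per document. The example uses Python's built-in SQLite module as a stand-in for the MySQL database named on the slide. The table layout, the state names, and the functions `enqueue` and `claim_next` are assumptions for illustration, not the texmlbus schema.

```python
import sqlite3


def create_queue() -> sqlite3.Connection:
    """Create an in-memory workqueue table (hypothetical layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE workqueue ("
        " id INTEGER PRIMARY KEY,"
        " document TEXT NOT NULL,"
        " state TEXT NOT NULL DEFAULT 'queued',"
        " priority INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def enqueue(conn, document, priority=0):
    """What the web interface does: add an entry with state and priority."""
    conn.execute(
        "INSERT INTO workqueue (document, priority) VALUES (?, ?)",
        (document, priority),
    )


def claim_next(conn):
    """What the build manager does: take the highest-priority queued entry."""
    row = conn.execute(
        "SELECT id, document FROM workqueue WHERE state = 'queued' "
        "ORDER BY priority DESC, id ASC LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE workqueue SET state = 'running' WHERE id = ?", (row[0],))
    return row[1]


if __name__ == "__main__":
    conn = create_queue()
    enqueue(conn, "articles/0001")
    enqueue(conn, "articles/0002", priority=5)
    print(claim_next(conn))
```

Because all coordination goes through the table, the browser front end and the build manager never need to talk to each other directly.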
texmlbus build system
[Diagram: containers started via docker-compose]
- MySQL DB
- Build System (PHP)
- LaTeXML worker (shown three times)
- Shared Volume
The web interface
Result statistics
Summary
- texmlbus converts documents and gathers statistics about the conversions
- especially useful for detecting regressions with real-world documents
- stages allow the same target to be created with different systems
- supports any converter
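Regression detection, as mentioned in the summary, amounts to comparing the result class of each document across two runs or two stages. The sketch below assumes a simple ordering of result classes from best to worst. Both the class names and the function `find_regressions` are hypothetical and not taken from texmlbus.

```python
# Hypothetical ordering of result classes, best to worst.
RANK = {"ok": 0, "warning": 1, "error": 2, "fatal": 3, "timeout": 4}


def find_regressions(before, after):
    """Return (document, old, new) for documents whose result got worse.

    `before` and `after` map document names to result classes.
    Documents missing from `after` are ignored.
    """
    regressions = []
    for doc, old in sorted(before.items()):
        new = after.get(doc)
        if new is not None and RANK[new] > RANK[old]:
            regressions.append((doc, old, new))
    return regressions


if __name__ == "__main__":
    run_a = {"0001": "ok", "0002": "warning", "0003": "error"}
    run_b = {"0001": "error", "0002": "warning", "0003": "ok"}
    for doc, old, new in find_regressions(run_a, run_b):
        print(doc, old, "->", new)
```

With a large real-world corpus, even a short list of such documents points directly at the change in the converter that caused them.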
Outlook
Things to be done
- add converters more easily
- help to improve LaTeXML
texmlbus
https://github.com/stamer/texmlbus
Thanks to Overleaf for their support!