texmlbus - tug.org


Page 1: texmlbus - tug.org

<TeX>

texmlbus

H. Stamerjohanns

TUG, August 2021

H. Stamerjohanns texmlbus TUG, August 2021 1 / 35

Page 2: texmlbus - tug.org

Outline

1 Motivation
   - LaTeX and XML
   - Why convert to XML?
   - Previous projects
   - texmlbus: change of focus

2 Website
   - Build system

3 Summary

Page 3: texmlbus - tug.org

Just use LaTeX

LaTeX is the format to write math
- millions of scientific publications have been written using LaTeX
- best way to produce high quality math typesetting

Drawbacks
- mixes form and content
- no real semantics
- style files change over time
- no formal validation
- long term preservation?

Page 4: texmlbus - tug.org

Conversion to XML

XML
- not something you want to edit directly
- document can be validated
  ⇒ possible archive format
- JATS (Journal Article Tag Suite)
- MathML, XHTML
  ⇒ render document directly in web browser
- easier for searching and indexing tools, screen readers

Page 5: texmlbus - tug.org

Project based on...

arxivml build system
- written at Jacobs University Bremen
- use LaTeXML to create XML
- mass conversion to XHTML
  ≈ 500,000 documents converted
- create real-world MathML
  ⇒ improve LaTeXML [1]

[1] B. Miller and D. Ginev, https://dlmf.nist.gov/LaTeXML/

Page 6: texmlbus - tug.org

Build system
- open source (MIT licence)
- implemented in a scripting language (here PHP)
- uses SQL database to store state
- distributes jobs on several hosts
- sets timeout for each job
- analyzes conversion process
  - checks files
  - parses the result files (stderr.log)
  - classifies results
- stores information in DB
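The per-job timeout and the log classification can be sketched roughly as follows. This is a minimal Python illustration, not the project's PHP code: the function names, the result classes, and the assumption that log lines begin with a severity prefix such as `Error:` are all hypothetical.

```python
import subprocess
import sys
from collections import Counter
from typing import Sequence

# Assumed severity prefixes at the start of a log line, worst first.
SEVERITIES = ("Fatal", "Error", "Warning")


def classify_log(log_text: str) -> str:
    """Return the worst severity found in a conversion log, or 'ok'."""
    counts = Counter()
    for line in log_text.splitlines():
        for severity in SEVERITIES:
            if line.startswith(severity + ":"):
                counts[severity] += 1
    for severity in SEVERITIES:  # ordered from worst to mildest
        if counts[severity]:
            return severity.lower()
    return "ok"


def run_job(command: Sequence[str], timeout_s: float) -> str:
    """Run one conversion command with a timeout and classify its stderr."""
    try:
        proc = subprocess.run(
            list(command), capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # The child process is killed once the timeout expires.
        return "timeout"
    return classify_log(proc.stderr)


# Demo with a stand-in command that only writes one error line to stderr.
demo = [sys.executable, "-c", "import sys; sys.stderr.write('Error:demo\\n')"]
print(run_job(demo, timeout_s=10))
```

Treating a timeout as its own result class keeps runaway conversions visible in the statistics instead of blocking a worker.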

Page 12: texmlbus - tug.org

texmlbus: change of focus
- easy installation
  ⇒ use Docker images
- more interactivity
  ⇒ upload files via browser
  ⇒ import files directly from Overleaf
  ⇒ schedule jobs via browser
- other targets than XHTML
  ⇒ result table for each target
- create same target using different systems
  ⇒ introduce stages (target combined with image)
  ⇒ needs subdirectories for each stage
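The idea of a stage as "target combined with image", each with its own subdirectory, might be modelled like this. It is only a sketch: the `Stage` class, the naming scheme, and the example image names are assumptions made for illustration, not texmlbus identifiers.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Stage:
    """Hypothetical model of a stage: one target built by one image."""

    target: str  # output format, e.g. "xhtml"
    image: str   # container image that produces the target

    @property
    def name(self) -> str:
        # Assumed naming scheme; it only has to be unique per stage.
        return f"{self.target}_{self.image}"

    def output_dir(self, article_dir: str) -> PurePosixPath:
        """Subdirectory of the article that holds this stage's results."""
        return PurePosixPath(article_dir) / self.name


# Two stages share the target "xhtml" but use different (made-up) images.
stages = [
    Stage("xhtml", "latexml"),
    Stage("xhtml", "other_converter"),
    Stage("pdf", "pdflatex"),
]
for stage in stages:
    print(stage.output_dir("articles/paper1"))
```

Because the subdirectory is derived from both fields, two systems that build the same target never overwrite each other's result files.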

Page 25: texmlbus - tug.org

texmlbus build system

Interactive usage via browser
- add and remove documents
- queue conversion jobs
- manages document files

SQL database
- workqueue
- statistics

Document files (each article in subdirectory)

Worker containers
- handle conversion
- call latexml and latexmlpost or conversion commands

Build Manager
- operates on workqueue
- schedules make jobs on worker containers
- analyzes log files

Data flow
- browser interface: manages entries in workqueue (state and priority)
- Build Manager: reads workqueue, invokes make, reads log files, stores analyzed log data
- worker containers (default): create result files and log files

Deployment labels: Webserver / PHP, MySQL DB, LaTeXML, Shared Volume, Document repository
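A workqueue with state and priority, read by a build manager, could look roughly like the sketch below. It uses Python's built-in `sqlite3` in place of the MySQL database, and the table layout, column names, and state values are assumptions for illustration rather than the actual texmlbus schema.

```python
import sqlite3


def open_queue() -> sqlite3.Connection:
    """Create an in-memory workqueue table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE workqueue (
               id INTEGER PRIMARY KEY,
               document TEXT NOT NULL,
               stage TEXT NOT NULL,
               priority INTEGER NOT NULL DEFAULT 0,
               state TEXT NOT NULL DEFAULT 'queued'
           )"""
    )
    return conn


def enqueue(conn, document, stage, priority=0):
    """What the browser interface does: add an entry to the workqueue."""
    conn.execute(
        "INSERT INTO workqueue (document, stage, priority) VALUES (?, ?, ?)",
        (document, stage, priority),
    )
    conn.commit()


def claim_next(conn):
    """What the build manager does: take the highest-priority queued entry."""
    row = conn.execute(
        "SELECT id, document, stage FROM workqueue "
        "WHERE state = 'queued' ORDER BY priority DESC, id ASC LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE workqueue SET state = 'running' WHERE id = ?", (row[0],))
    conn.commit()
    return row


def finish(conn, entry_id, result):
    """Store the classified result once the log files have been analyzed."""
    conn.execute("UPDATE workqueue SET state = ? WHERE id = ?", (result, entry_id))
    conn.commit()


conn = open_queue()
enqueue(conn, "paper1", "xhtml_latexml")
enqueue(conn, "paper2", "xhtml_latexml", priority=5)
entry = claim_next(conn)  # paper2 comes first because of its higher priority
finish(conn, entry[0], "ok")
```

The select-then-update in `claim_next` is only safe with a single build manager; several managers sharing one queue would need a transaction or row locking.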

Page 26: texmlbus - tug.org

texmlbus build system

[Diagram: docker-compose setup with a MySQL DB container, the Build System (PHP) container, three LaTeXML worker containers, and a Shared Volume]

Page 37: texmlbus - tug.org

The web interface

Result statistics

Page 38: texmlbus - tug.org

Summary
- texmlbus converts documents and gathers statistics about the conversions
- especially useful for detecting regressions with real-world documents
- stages allow the same target to be built using different systems
- supports any converter
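Regression detection with stored results can be illustrated by comparing the classification of each document between two runs. The result classes and their ordering below are assumptions chosen for this sketch, not the categories texmlbus itself uses.

```python
# Hypothetical result classes, ranked from best to worst.
RANK = {"ok": 0, "warning": 1, "error": 2, "fatal": 3, "timeout": 4}


def find_regressions(before: dict, after: dict) -> dict:
    """Return documents whose result class got worse between two runs."""
    regressions = {}
    for doc, old in before.items():
        new = after.get(doc)
        if new is not None and RANK[new] > RANK[old]:
            regressions[doc] = (old, new)
    return regressions


old_run = {"paper1": "ok", "paper2": "warning", "paper3": "error"}
new_run = {"paper1": "error", "paper2": "warning", "paper3": "ok"}
print(find_regressions(old_run, new_run))
```

Running the same corpus against two converter versions and diffing the classes this way points directly at the documents that need a closer look.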

Page 39: texmlbus - tug.org

Outlook

Things to be done
- add converters more easily
- help to improve LaTeXML

Page 40: texmlbus - tug.org

texmlbus

https://github.com/stamer/texmlbus

Thanks to Overleaf for their support!
