Ticket #1452 (closed news: fixed)

Opened 2 years ago

Last modified 22 months ago

2010-04-23: A Daemon for LaTeXML

Reported by: deyan
Owned by: deyan
Priority: normal
Milestone: Future
Component: daemon
Version: arXMLiv Branch
Severity: normal
Include in GanttChart: no

Description

It is my pleasure to announce a first implementation of a LaTeXML daemon, realized on the arXMLiv branch of the LaTeXML repository.

The purpose of the daemon is to provide an efficient and convenient way to run very large batch conversions of TeX sources. In contrast to latexml, which needs to be invoked individually on every source, the daemon is initialized once and can then process an arbitrary number of sources on demand.
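For contrast, the per-source workflow that the daemon is meant to replace looks roughly like the loop below. This is only a sketch: plan_batch is a hypothetical helper that prints one latexml command per file instead of running it, and the .tex to .xml naming is an assumption. Each printed command would start a fresh perl process, which is the overhead a long-running daemon avoids.

```shell
#!/bin/sh
# Hypothetical helper (not part of LaTeXML): print, rather than run,
# the one-process-per-file commands for a batch of TeX sources.
# Assumes every source ends in .tex and the result should sit next to it as .xml.
plan_batch() {
  for src in "$@"; do
    printf 'latexml --destination=%s %s\n' "${src%.tex}.xml" "$src"
  done
}

# Show the plan for two sources; pipe the output to sh to actually run it.
plan_batch a.tex b.tex
```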

Features:

  • Can emulate the behavior of latexml, latexmlmath and latexmlpost
  • Robust against malformed TeX: it survives both normal and fatal errors caused by the TeX source.
  • Has three levels of speedup, depending on the converted inputs:
    1. math mode : Fastest, one formula per input line
    2. fragment mode : Faster, expects document fragments based on a homogeneous document class and/or packages (as filenames).
    3. standard mode : Slowest of the three, can process any heterogeneous collection of TeX documents (as filenames). It is still faster than the original "latexml" binary, as perl is initialized only once for the entire batch.
  • Easily customizable log and destination targets via the #dir, #name and #ext filepath patterns.
  • An "autoflush" switch to automatically restart the daemon after a certain amount of converted input (keeps memory consumption low)
  • A new "keepTeX" switch which preserves the TeX formula source as a MathML annotation.
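To give a feel for the filepath patterns mentioned in the feature list, here is a minimal sketch of how such a pattern could be expanded for one source file. The helper expand_pattern is hypothetical, and the meaning assigned to each placeholder (#dir as the directory, #name as the file name without extension, #ext as the extension) is an assumption, not the daemon's documented behavior.

```shell
#!/bin/sh
# Hypothetical helper (not part of LaTeXML): expand #dir, #name and #ext
# in a target pattern for a given source path.
# Assumed semantics: #dir = directory part, #name = file name without
# extension, #ext = extension. Assumes the file name contains a dot and
# the path contains no '|' or '&' characters.
expand_pattern() {
  pattern=$1
  src=$2
  dir=$(dirname -- "$src")
  base=$(basename -- "$src")
  name=${base%.*}
  ext=${base##*.}
  printf '%s\n' "$pattern" |
    sed -e "s|#dir|$dir|g" -e "s|#name|$name|g" -e "s|#ext|$ext|g"
}

# A destination next to the source, and a log file in a separate directory.
expand_pattern '#dir/#name.xml' papers/0704.0001/main.tex
expand_pattern 'logs/#name.#ext.log' papers/0704.0001/main.tex
```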

Download and Install:

You should follow the standard instructions for installing LaTeXML from a repository.

However, instead of checking out the trunk, you should check out the arXMLiv branch:

svn co https://svn.mathweb.org/repos/LaTeXML/branches/arXMLiv

Then you can continue with the standard installation from source. The daemon binary is located at bin/latexmld

ToDo and Caveats:

  • This is an early first release, so use the daemon at your own risk. It has already been put to production use by the arXMLiv project and has proven stable while converting 100,000 LaTeX sources so far. However, no liability is taken for the provided software; proceed with caution.
  • Documentation currently lacks examples and detailed use cases; these are soon to come!
  • The --keepXMath and --openmath switches are not yet supported, and some non-standard combinations of switches may not work as expected. Eventually, all latexml and latexmlpost capabilities will be available within the daemon.
  • Since the eventual goal of the daemon is to enter the standard distribution of LaTeXML, it is released in the Public Domain.

How you can help:

Install the daemon, try it out and give us feedback! For bug reports, feature requests and general usability comments, please open a new ticket under the "latexmld" component at the LaTeXML Trac (https://trac.mathweb.org/LaTeXML/). You can also leave comments on this news item itself.

We are looking forward to your first impressions!

Change History

comment:1 Changed 22 months ago by miller

I'd like to see about eventually merging in this functionality, but it is hard to see what is involved. If I do a diff of the trunk and the arXMLiv branch, it gives the impression that every file is changed! I can't quite tell what's going on in this branch: whether there are trunk changes that haven't been merged, or whether there are changes that should be submitted to trunk....

But in any case, I can't tell what's needed for the daemon. Can you provide some guidance?

comment:2 Changed 22 months ago by deyan

It's great that you are interested, I could really use some strict bashing of my code! :>

  1. The arXMLiv branch is a strict superset of LaTeXML: all trunk/ changes have been merged continuously, so any change you have committed is also within our branch (up to rev 1212 at the moment).
  2. The LaTeXML daemon consists of the bin/latexmls, bin/latexmlc and bin/latexmld binaries, as well as the LaTeXML::Daemon module, which tries to encapsulate the changes I needed to the LaTeXML.pm routines. The latexmld binary is a bit out of date and will probably be abandoned; latexmls is the main guy here.
  3. All other changes are various (attempted) enhancements and bonus features, which I have tried to keep track of in the Changes file. Those include:
    • Allowing URI input to LaTeXML (LWP, LWP::Simple, URI added as dependencies, changes to LaTeXML::Gullet and LaTeXML::Daemon). This only works with the daemon so far, since I didn't want to mangle the main LaTeXML class. It's really useful when you put LaTeXML on the web, which we are doing right now.
    • I enabled multiple files on input with the classical bin/latexml and added a SOURCEBASE state variable to support the approach. I.e. you can do something like:
      latexml pre.tex main.tex post.tex
      to designate that those three files form one entire document. Internally this works by simply \input{}-ing all files together.
    • There are a lot of smaller fixes and bootstraps in some of the other modules, mostly with comments and remarks in the Changes file.
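The multi-file invocation described above can be pictured as a small wrapper document that pulls in each file in order. The sketch below only illustrates that equivalent TeX; make_wrapper is a hypothetical helper and says nothing about how the Perl code actually implements it.

```shell
#!/bin/sh
# Hypothetical helper (not part of LaTeXML): print the wrapper TeX that
# \input{}s each given file in order, mimicking how several sources
# are treated as one document.
make_wrapper() {
  for f in "$@"; do
    printf '\\input{%s}\n' "$f"
  done
}

# Prints three \input lines, one per source file.
make_wrapper pre.tex main.tex post.tex
```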

For the daemon, I needed to add the native IO::Socket as a dependency, which should be OK, and more recently the Clone module, in order to easily make a restore point for the State table.

I am very open to further discussion and suggestions on what can be done better. The daemon is not yet done, and I have another 3-4 deficiencies to fix on my shortlist.

comment:3 follow-up: ↓ 4 Changed 22 months ago by miller

OK, once I strip out the superfluous ".svn" files, there are fewer differences, but still a lot, especially in LaTeXML/Packages/*.ltxml. Why are these forked? If they are valid patches, shouldn't they be submitted to the trunk? If they are invalid, shouldn't they be undone?

E.g. Why is ProcessOptions() commented out in article.cls.ltxml?

I suppose I'm unclear on what the arXMLiv branch is for. Is it to experiment with a new feature till it can be incorporated in trunk, or is it a fork? (I doubt that it is really that manageable to try to be both)

Is this arXMLiv branch what is being used to process the arXiv? With such extensive changes that would make it very hard to do any debugging of arXiv processing on my end.

comment:4 in reply to: ↑ 3 Changed 22 months ago by deyan

Replying to miller:

> OK, once I strip out the superfluous ".svn" files, there are fewer differences, but still a lot, especially in LaTeXML/Packages/*.ltxml. Why are these forked? If they are valid patches, shouldn't they be submitted to the trunk? If they are invalid, shouldn't they be undone?

Ugh, I have to take a look at this; they should all be equivalent to the trunk.

> E.g. Why is ProcessOptions() commented out in article.cls.ltxml?

Because the daemon was choking on it when I made a preload some time ago, but I think you fixed this. I should probably put it back in.

> I suppose I'm unclear on what the arXMLiv branch is for. Is it to experiment with a new feature till it can be incorporated in trunk, or is it a fork? (I doubt that it is really that manageable to try to be both)

It is NOT a fork and it was never intended to be. There are just some quasi-fixes committed there so that things work before they are completely stable.

> Is this arXMLiv branch what is being used to process the arXiv?

No, arXiv is being processed by the trunk/ version. We will use the branch to process Zentralblatt MATH, but it is my firm intention to have nothing forked, only new features and small fixes.

> With such extensive changes that would make it very hard to do any debugging of arXiv processing on my end.

Well, I don't think the changes are that extensive; svn just scares you off by being overly verbose when it shouldn't be! Most changes are very small, but the diff decided that the entire file was completely changed, etc. I will spend the weekend getting the branch as close to the trunk as possible and will document any intended long-term "fix" as a ticket on the Trac. The features are documented in Changes, as mentioned before.

comment:5 Changed 22 months ago by deyan

The notorious SVN merge... So, apparently the only sane way to see what differs between the branch and the trunk is the following (always use --dry-run, so that you only get the info and no changes are introduced):

svn merge --dry-run https://svn.mathweb.org/repos/LaTeXML/branches/arXMLiv/ . --reintegrate

Weirdly, this only works well when you are inside your local working copy of trunk/. You should see the following footprint:

--- Merging differences between repository URLs into '.':
U    tools/symbolscan
U    t/math/testscripts.dvi
U    t/fonts/mathcolor.dvi
U    t/parse/compose.dvi
U    t/parse/terms.dvi
U    doc/manual/manual.tex
U    lib/LaTeXML.pm
U    lib/LaTeXML/MathParser.pm
U    lib/LaTeXML/Post/OpenMath.pm
U    lib/LaTeXML/Post/MathML.pm
U    lib/LaTeXML/State.pm
U    lib/LaTeXML/Package/graphics.sty.ltxml
U    lib/LaTeXML/Package/graphicx.sty.ltxml
U    lib/LaTeXML/Package/article.cls.ltxml
U    lib/LaTeXML/Rewrite.pm
U    lib/LaTeXML/schema/RelaxNG/LaTeXML-block.rnc
U    lib/LaTeXML/Package.pm
U    lib/LaTeXML/Mouth.pm
U    lib/LaTeXML/Gullet.pm
U    lib/LaTeXML/Util/KeyVal.pm
U    lib/LaTeXML/Util/Pathname.pm
A    lib/LaTeXML/Daemon.pm
U    Makefile.PL
U    Changes
U    bin/latexml
A    bin/latexmlc
A    bin/latexmls
A    bin/latexmld

Looking at these I have the following comments:

  • I am quite sure I made no changes to the .dvi files, or anything at t/ for that matter. If you run a filesystem diff you will see there are really no changes between the two head revisions. Stupid svn...
  • I am updating the manual right now with a chapter on the daemon, but that's incomplete. Again, I am only adding things, not mangling anything else.
  • LaTeXML.pm has a minor change making SOURCEFILE global and adding SOURCEBASE as mentioned before.
  • LaTeXML::MathParser has a minor bootstrap with an explanatory comment addressed to you.
  • OpenMath and MathML have the experimental features + some minor symbols added.
  • I added a setStatus setter and a getActiveScope getter to LaTeXML::State that should be harmless; grep Changes for the exact reason (or write me an email, I remember and need those).
  • graphics.sty.ltxml is probably what you called forked, sorry! I opened a ticket now (#1478). In my defense, you were very busy with DLMF at that point and I needed to make progress, so I hacked this down by myself... If you stay active I promise I will behave in the future! :>
  • article.cls.ltxml should be OK now; the difference is a single space before the ProcessOptions(). I put it back and all works well.
  • LaTeXML::Rewrite has been enhanced with the active scope recognition that I write about in #1460.
  • I have added the "picture" element at LaTeXML-block.rnc, but I need to grep Changes/svn log to remember why.
  • LaTeXML::Package got a tiny change related to #1460 (active scopes at rewriting)
  • LaTeXML::Mouth has the locator changes we are discussing at #1425. We shouldn't merge those yet.
  • LaTeXML::Gullet has the alpha version of the URI handling on \input.
  • Util::KeyVal has an additional sub for parsing list arguments, and I intend to add more functionality related to #1472.
  • Util::Pathname has no change at all, again stupid svn.
  • LaTeXML::Daemon, the new binaries, the Changes and Makefile.PL reflect the daemon-related add-ons.

Phew, and I think this is exhaustive! Now you have a complete report on what is changed on the branch and why and I hope I convinced you there is no forking intention at all. I am looking forward to your comments!
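When reading a dry-run footprint like the one above, it can help to separate genuinely new files (status A) from updated ones (status U). The sketch below does that with awk on text read from standard input; summarize_merge is a hypothetical helper, and the sample lines are a shortened excerpt of the listing above rather than fresh svn output.

```shell
#!/bin/sh
# Hypothetical helper: summarise `svn merge --dry-run` output from stdin.
# Lines whose first field is "A" are additions, "U" are updates;
# anything else (such as the "--- Merging ..." header) is ignored.
summarize_merge() {
  awk '
    $1 == "A" { added++; print "new: " $2 }
    $1 == "U" { updated++ }
    END { printf "%d added, %d updated\n", added, updated }
  '
}

# Shortened excerpt of the footprint, fed in through a here-document.
summarize_merge <<'EOF'
--- Merging differences between repository URLs into '.':
U    lib/LaTeXML/State.pm
A    lib/LaTeXML/Daemon.pm
U    bin/latexml
A    bin/latexmls
EOF
```

In a real working copy, the same function could be fed directly from the dry-run command shown earlier by piping its output into summarize_merge.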

comment:6 Changed 22 months ago by kohlhase

  • Status changed from new to closed
  • Resolution set to fixed

News items should be closed so that they do not show up among the active tickets.
