README
Base image for Overview converters.
How a Converter Works
A converter's job is to turn files of one type into files of another type. It does this in a loop. It receives jobs from an internal Overview HTTP server.
This base image provides portable executables that communicate with Overview. They make up a framework: they'll call your converter program, which you can write in any language.
Your converter will have a Dockerfile that looks like this:
FROM overview/overview-converter-framework AS framework
# multi-stage build
FROM alpine:3.7 AS build
# ... (build your executables, including `do-convert-single-file`)
FROM alpine:3.7 AS production
# Add ca-certificates to let container download from S3 https:// URLs
RUN apk add --update --no-cache ca-certificates
WORKDIR /app
# The framework provides the main executable
COPY --from=framework /app/run /app/run
# Your `do-convert` code can choose from a few different input and output
# formats. The framework provides many `/app/convert` implementations: pick
# the one that matches your `do-convert`.
COPY --from=framework /app/convert-single-file /app/convert
COPY --from=build /app/do-convert-single-file /app/do-convert-single-file
/app/run
This framework runs on a loop:
- Download a task from Overview as JSON.
- Open a stream to download the body of the input file.
- Stream the body to `/app/convert MIME-BOUNDARY JSON` and pipe the results to Overview.
/app/run handles all communication with Overview. In particular:
- `/app/run` polls for tasks at `POLL_URL`. Overview's administrator must set `POLL_URL` for your container.
- `/app/run` will retry if there is a connection error.
- `/app/run` will never crash.
- TODO: `/app/run` will poll Overview to check if the task is canceled. It will notify `/app/convert` with `SIGINT` if the task is canceled.
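The retry-on-connection-error behavior can be pictured as a small loop. This is a minimal sketch, not the framework's actual implementation: `fetch_with_retry`, the injected `fetch` callable and the one-second delay are all hypothetical names and values chosen for illustration.

```python
import time
from typing import Callable


def fetch_with_retry(fetch: Callable[[], bytes],
                     sleep: Callable[[float], None] = time.sleep,
                     delay_s: float = 1.0) -> bytes:
    """Call fetch() until it succeeds, pausing after each connection error.

    Assumption: connection problems surface as OSError, which is the base
    class of ConnectionError and of urllib.error.URLError.
    """
    while True:
        try:
            return fetch()
        except OSError:
            # Never give up and never crash: wait, then poll again.
            sleep(delay_s)
```

In a real poller, `fetch` would issue the HTTP request to `POLL_URL`; passing it in as a callable keeps the retry logic easy to test without a network.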
/app/convert -- a.k.a., /app/convert-*
/app/convert is a program we provide, under a few different names. That is,
when you create your program you'll choose one of the following implementations
to copy into /app/convert in your image.
From /app/run's point of view, /app/convert will read the input stream
and JSON command-line argument and produce a multipart/form-data output
stream with MIME boundary MIME-BOUNDARY (in C lingo, argv[1]).
/app/convert will never crash, and it will always output a data stream that
Overview can handle.
Your code is invoked by /app/convert, following one of these strategies:
/app/convert-single-file
This version of /app/convert will:
- Write standard input to `input.blob` in a temporary directory and verify it's the correct size
- Run `/app/do-convert-single-file JSON` (your code) in the temporary directory
- Translate the `stdout` from your code into progress events or an error event
- When your code exits with status `0` and no error message, pipe `output.json`, `output.blob` -- and if they exist, `output-thumbnail.jpg`, `output-thumbnail.png` and `output.txt` -- and a `done` event
Special cases:
- Cancelation: if `/app/run` sends a `SIGINT` signal, sends your program `SIGINT`. Your program should kill and wait for any child processes, then exit. Its standard output and standard error will be ignored.
- Error: if `/app/do-convert-single-file` exits with non-zero return value, pipes an `error` event.
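The step that translates your program's `stdout` into events can be pictured as a line classifier. The sketch below assumes the line formats this README lists for `do-convert-single-file` (`p1/2`, `b102/412`, `0.324`, and anything else being an error). The function name, the return shape and the exact fraction pattern are hypothetical, not the framework's own code.

```python
import re
from typing import Tuple

# Patterns for the three documented progress formats.
_PAGES = re.compile(r"^p(\d+)/(\d+)$")
_BYTES = re.compile(r"^b(\d+)/(\d+)$")
# Assumption: a fraction is a decimal between 0 and 1 inclusive.
_FRACTION = re.compile(r"^(0(\.\d+)?|1(\.0+)?)$")


def classify_line(line: str) -> Tuple[str, object]:
    """Return ("pages", (n, total)), ("bytes", (n, total)),
    ("fraction", value) or ("error", text) for one stdout line."""
    line = line.rstrip("\r\n")
    m = _PAGES.match(line)
    if m:
        return ("pages", (int(m.group(1)), int(m.group(2))))
    m = _BYTES.match(line)
    if m:
        return ("bytes", (int(m.group(1)), int(m.group(2))))
    if _FRACTION.match(line):
        return ("fraction", float(line))
    # Anything else at all is treated as an error message.
    return ("error", line)
```

The practical consequence for your converter: keep debug logging off `stdout`, because any stray line is interpreted as an error.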
You must provide /app/do-convert-single-file. The framework will invoke
/app/do-convert-single-file JSON. Your program can read input.blob in the
current working directory. Your program must:
- Write progress messages to `stdout`, newline-delimited, that look like:
  - `p1/2` -- "finished processing page 1 of 2"
  - `b102/412` -- "finished processing byte 102 of 412"
  - `0.324` -- "finished processing 32.4% of input"
  - anything else at all -- "ERROR: [the line of text]"
- Write `output.json`, `output.blob`, and optionally `output-thumbnail.jpg`, `output-thumbnail.png` and/or `output.txt`.
- Exit with status code `0`. Any other exit code is an error in your code.
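As an illustration of these requirements, here is a minimal sketch of the core of a `do-convert-single-file` program. It uppercases the input as a stand-in for real conversion work. The function name `convert`, the choice to echo the task JSON into `output.json`, and the file names (taken from the list above) are assumptions for the sake of the example; the JSON schema Overview expects is not covered here.

```python
import json
import sys
from pathlib import Path


def convert(workdir: Path, task_json: str, out=sys.stdout) -> int:
    """Read input.blob, write output.blob and output.json, report progress."""
    task = json.loads(task_json)  # the JSON command-line argument
    data = (workdir / "input.blob").read_bytes()

    # Placeholder "conversion": uppercase the bytes.
    (workdir / "output.blob").write_bytes(data.upper())
    # Hypothetical metadata: echo the task back unchanged.
    (workdir / "output.json").write_text(json.dumps(task))

    # Byte-style progress line: "finished processing byte N of N".
    print(f"b{len(data)}/{len(data)}", file=out, flush=True)
    return 0  # status code 0 means success
```

An entry point would call `sys.exit(convert(Path.cwd(), sys.argv[1]))`, since the framework runs your program inside the temporary directory and passes the JSON as the first argument.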
Testing: /app/test-convert-single-file
You can test /app/do-convert-single-file by creating a Docker image with the
special framework program, /app/test-convert-single-file. This is designed to
integrate with automated build environments like Docker Hub.
Your Docker build stage doesn't need a CMD. It should include:
- `/app/test-convert-single-file` -- and you should `RUN [ "/app/test-convert-single-file" ]`
- `/app/do-convert-single-file` and everything it depends on -- `/app/test-convert-single-file` will invoke it once per test
- `/app/test/test-*`: one directory per test, e.g. `/app/test/test-with-ocr`. Each test directory should contain:
  - `input.blob`
  - `input.json` -- the JSON passed to `do-convert-single-file`
  - `stdout` -- expected standard output from `do-convert-single-file`
  - `0.blob` -- expected `0.blob` output
  - `0.json` -- expected `0.json` output
  - `0.txt` (optional) -- expected `0.txt` output
  - `0-thumbnail.{png,jpg}` (optional) -- expected output
test-convert-single-file will run do-convert-single-file in a separate
directory per test. It will output in TAP format
and exit with status code 1 if any test fails.
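If you generate test fixtures programmatically, a small helper can lay out one test directory with the required files. This is only a sketch: `write_test_case` and its parameters are hypothetical, and it assumes the expected JSON is compared in whatever form you write it, which this README does not specify.

```python
import json
from pathlib import Path


def write_test_case(root: Path, name: str, input_blob: bytes, input_json: dict,
                    expected_stdout: str, expected_blob: bytes,
                    expected_json: dict) -> Path:
    """Create root/test-NAME with the required input and expected-output files."""
    test_dir = root / f"test-{name}"
    test_dir.mkdir(parents=True, exist_ok=True)
    (test_dir / "input.blob").write_bytes(input_blob)
    (test_dir / "input.json").write_text(json.dumps(input_json))
    (test_dir / "stdout").write_text(expected_stdout)
    (test_dir / "0.blob").write_bytes(expected_blob)
    (test_dir / "0.json").write_text(json.dumps(expected_json))
    return test_dir
```

The optional `0.txt` and `0-thumbnail.{png,jpg}` files are left out here; add them the same way when a test expects them.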
Copying failed-test files from the test suite
The test output is designed to help you correct your tests. For instance, here
is example output from a test that fails because your test directory does not
include an expected 0-thumbnail.jpg:
Step 12/13 : RUN [ "/app/test-convert-single-file" ]
---> Running in f65521f3a30c
1..3
Tesseract Open Source OCR Engine v3.04.01 with Leptonica
not ok 1 - test-jpg-ocr
do-convert-single-file wrote /tmp/test-do-convert-single-file912093989/0-thumbnail.jpg, but we expected it not to exist
...
Upon seeing this error, you can
docker cp f65521f3a30c:/tmp/test-do-convert-single-file912093989/0-thumbnail.jpg .
to inspect the file in question (and perhaps make it the expected one).
Testing PDF conversion
PDF output is a common case. We use QPDF for file comparison, to ease debugging.
Your Dockerfile must install QPDF -- e.g., apk --no-cache add qpdf -- before
running RUN [ "/app/test-convert-single-file" ] if you are testing PDF output.
/app/convert-stream-to-mime-multipart
This version of /app/convert will:
- Create an empty temporary directory
- Run `/app/do-convert-stream-to-mime-multipart MIME-BOUNDARY JSON` (your code) within the temporary directory
- Stream the input file from Overview to your program's `stdin` and pipe your program's `stdout` to Overview
Special cases:
- Cancelation: if `/app/run` sends a `SIGINT` signal, sends your program `SIGINT`. Your program should kill and wait for any child processes, then exit. Its standard output and standard error will be ignored.
- Error: if your program exits with non-zero return value, pipes an `error` event.
- Buggy code: emits an `error` event if your program does not produce an `error` or `done` event or end with `--MIME-BOUNDARY--`.
- Temporary files: if your program emits temporary files to its current working directory, they will be deleted.
You must provide /app/do-convert-stream-to-mime-multipart. The framework
will invoke it with MIME-BOUNDARY and JSON as arguments. MIME-BOUNDARY
will match the regex [a-fA-F0-9]{1,60}. Your program reads the input file
from standard input; the temporary directory starts out empty.
Your program must write valid multipart/form-data output to stdout. For
instance:
--MIME-BOUNDARY\r\n
Content-Disposition: form-data; name="0.json"\r\n
\r\n
{JSON for first output file}\r\n
--MIME-BOUNDARY\r\n
Content-Disposition: form-data; name="0.blob"\r\n
\r\n
Blob for first output file\r\n
--MIME-BOUNDARY\r\n
Content-Disposition: form-data; name="progress"\r\n
\r\n
{"pages":{"nProcessed":1,"nTotal":3}}\r\n
--MIME-BOUNDARY\r\n
Content-Disposition: form-data; name="done"\r\n
\r\n
--MIME-BOUNDARY--
Rules:
- Your output must end with a `done` or `error` element. A `done` element should be empty; an `error` element must include an error message.
- Your output must be in order: `0.json`, `0.blob`, (optionally `0.png`, `0.jpg` and/or `0.txt`), `1.json`, `1.blob`, ..., `done`.
- You should output an accurate progress report before each `N.json` to help Overview's progress bar behave well.
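The output format and ordering rules above can be sketched as a pair of small writer functions. This is an illustration, not the framework's code: the function names are hypothetical, the boundary check simply reuses the regex quoted earlier, and the final `done` part mirrors the byte layout of the example above.

```python
import json
import re
from typing import BinaryIO

BOUNDARY_RE = re.compile(r"^[a-fA-F0-9]{1,60}$")


def write_part(out: BinaryIO, boundary: str, name: str, body: bytes) -> None:
    """Write one part: delimiter, header, blank line, body, CRLF."""
    out.write(f"--{boundary}\r\n".encode("ascii"))
    out.write(f'Content-Disposition: form-data; name="{name}"\r\n'.encode("ascii"))
    out.write(b"\r\n")
    out.write(body + b"\r\n")


def write_done(out: BinaryIO, boundary: str) -> None:
    """Write the empty done part followed by the closing delimiter."""
    out.write(f"--{boundary}\r\n".encode("ascii"))
    out.write(b'Content-Disposition: form-data; name="done"\r\n')
    out.write(b"\r\n")
    out.write(f"--{boundary}--".encode("ascii"))


def write_single_output(out: BinaryIO, boundary: str,
                        output_json: str, blob: bytes) -> None:
    """Emit progress, 0.json, 0.blob and done, in the required order."""
    if not BOUNDARY_RE.match(boundary):
        raise ValueError("unexpected MIME boundary")
    progress = json.dumps({"pages": {"nProcessed": 0, "nTotal": 1}})
    write_part(out, boundary, "progress", progress.encode("ascii"))
    write_part(out, boundary, "0.json", output_json.encode("utf-8"))
    write_part(out, boundary, "0.blob", blob)
    write_done(out, boundary)
    out.flush()
```

A real program would pass `sys.stdout.buffer` as `out` and `sys.argv[1]` as `boundary`, so that binary blobs are written without text-mode newline translation.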
Roll your own
Even more lightweight than /app/convert-stream-to-mime-multipart is to roll
your own version of /app/convert. Beware, though:
- Your own version of `/app/convert` must always output messages to Overview: especially a `done` or `error` event. Without those events, Overview will never finish processing the file: it will retry indefinitely.
- Your own version of `/app/convert` must always exit successfully. The trickiest case, in our experience, is handling "out of memory." If your `/app/convert` does not exit successfully, Overview will retry indefinitely and the file will never be processed.
- Your own version of `/app/convert` should output helpful error messages, so you can debug it easily.
- Your own version of `/app/convert` should end quickly after receiving `SIGINT`, because Overview will ignore all further output.
- Your own version of `/app/convert` must ensure temporary files created during one invocation aren't read by the next invocation: that would leak users' documents to other users.
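To make these obligations concrete, here is a minimal sketch of a wrapper that guarantees a final `done` or `error` event, a zero exit status and a clean temporary directory. The names `run_safely` and `work` are hypothetical. The sketch also has a real limitation: it cannot recover if the operating system kills the process outright (for example, the kernel's out-of-memory killer), which is one reason a separate supervising program is more robust.

```python
import shutil
import tempfile
from typing import BinaryIO, Callable


def _final_event(boundary: str, name: str, message: bytes = b"") -> bytes:
    """Build the last part (done or error) plus the closing delimiter."""
    head = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
    body = message + b"\r\n" if message else b""
    return head.encode("ascii") + body + f"--{boundary}--".encode("ascii")


def run_safely(out: BinaryIO, boundary: str,
               work: Callable[[str, BinaryIO], None]) -> int:
    """Run work(tempdir, out); always end with done or error; always return 0."""
    tempdir = tempfile.mkdtemp()  # fresh directory for this invocation only
    try:
        work(tempdir, out)
        final = _final_event(boundary, "done")
    except BaseException as err:  # includes MemoryError and KeyboardInterrupt
        text = str(err) or type(err).__name__  # an error must carry a message
        final = _final_event(boundary, "error", text.encode("utf-8", "replace"))
    finally:
        # Delete temporary files so the next task can never read them.
        shutil.rmtree(tempdir, ignore_errors=True)
    out.write(final)
    out.flush()
    return 0
```

In Python, `SIGINT` arrives as `KeyboardInterrupt` by default, so the same `except` clause also covers cancelation; a production version would still need to kill and wait for any child processes first.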
/app/convert-stream-to-mime-multipart is small and fast, and it solves these
problems for you. You probably want it.
To Maintain This Repository
Coding
./dev will start a development loop that runs tests. Restart it if you edit
Dockerfile.
Testing
docker build . will run all tests.
Tests are in ./test/*/suite.bats. They're run in
bats, an ideal framework for testing
programs that pipe data around.
Releasing
./release MAJOR.MINOR.PATCH will push to GitHub. Docker Hub will build the
images for mass consumption.
License
This software is Copyright 2011-2018 Jonathan Stray and Copyright 2019-2020 Overview Computing Inc., and distributed under the terms of the GNU Affero General Public License. See the LICENSE file for details.