Problems with links using pdfpages and pax

I have to merge some HTML documentation into a LaTeX workflow. I am running into problems with internal links using the pax package.

I use wkhtmltopdf to produce PDF pages from the HTML, and the pdfpages package together with pax to embed those PDF pages in the LaTeX document.

When I compile with pdflatex, the links from the HTML-sourced PDF no longer work.

The links do work in the wkhtmltopdf-generated PDFs before they are included in the LaTeX document.

I extract the link information using Java 1.6 with this command:

    java -cp /usr/local/share/java/pax.jar:/usr/local/share/java/PDFBox-0.7.3.jar \
        pax.PDFAnnotExtractor filename.pdf
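One thing that differs between the two platforms is the classpath separator: ":" on FreeBSD, as in the command above, and ";" on Windows. The following is a minimal sketch, not part of pax, that assembles the same extractor invocation portably using File.pathSeparator and runs it with ProcessBuilder. The class name PaxCommand is hypothetical, the jar paths are the ones from the command above, and it assumes Java 8 or later (newer than the Java 1.6 mentioned above).

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds and runs the pax extractor command portably.
public class PaxCommand {
    // Builds the extractor command line; classpath entries are joined with the
    // platform's separator (":" on Unix-like systems, ";" on Windows).
    static List<String> buildCommand(List<String> jars, String pdf) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(String.join(File.pathSeparator, jars));
        cmd.add("pax.PDFAnnotExtractor");
        cmd.add(pdf);
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Jar locations taken from the FreeBSD command; adjust them for other systems.
        List<String> jars = Arrays.asList(
                "/usr/local/share/java/pax.jar",
                "/usr/local/share/java/PDFBox-0.7.3.jar");
        String pdf = args.length > 0 ? args[0] : "filename.pdf";
        List<String> cmd = buildCommand(jars, pdf);
        System.out.println(String.join(" ", cmd));
        // Run the extractor, passing its output (including warnings) through to this console.
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        System.exit(p.waitFor());
    }
}
```

Running "java PaxCommand test.pdf" prints the assembled command and then runs it, so the same helper can be used unchanged on FreeBSD and Win7 once the jar paths are adjusted.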

I've also used the pdfannotextractor.pl script that comes with pax, with the same results: I get plenty of error messages like this:

    !!! Warning: Annotation on page 1 not recognized! java.lang.NullPointerException

Even so, at the end of those warnings it reports "* Result: [ok]".

The debug information from the script looks like this:

    PDFAnnotExtractor 0.1l, 2012/04/18 - Copyright (c) 2008, 2011, 2012 by Heiko Oberdiek.
    * CLASSPATH: []
    * is_win: [0]
    * pax.jar: [/usr/local/share/java/pax.jar]
    * pdfbox.jar: [/usr/local/share/java/PDFBox-0.7.3.jar]
    * Which java: [/usr/local/bin/java]
    * System: [java -cp /usr/local/share/java/pax.jar:/usr/local/share/java/PDFBox-0.7.3.jar pax.PDFAnnotExtractor logistic_python.pdf]
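Because the run still ends with "Result: [ok]", it may help to see which pages the "not recognized" warnings come from and compare them with the pages that carry internal links. The following is a rough sketch, assuming the warnings have exactly the form quoted above. WarningTally is a hypothetical helper, not part of pax, that reads the captured extractor output from standard input and counts warnings per page. It assumes Java 8 or later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: tallies "Annotation on page N not recognized" warnings.
public class WarningTally {
    // Matches the page number in lines such as:
    // !!! Warning: Annotation on page 1 not recognized! java.lang.NullPointerException
    private static final Pattern WARNING =
            Pattern.compile("Annotation on page (\\d+) not recognized");

    // Counts warnings per page; the TreeMap keeps pages in ascending order.
    static Map<Integer, Integer> tally(Iterable<String> lines) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = WARNING.matcher(line);
            while (m.find()) { // also copes with several warnings on one line
                counts.merge(Integer.parseInt(m.group(1)), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Read the extractor's captured output from standard input.
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        tally(lines).forEach((page, n) ->
                System.out.println("page " + page + ": " + n + " unrecognized annotation(s)"));
    }
}
```

For example, piping the extractor's output into it with "java -cp ... pax.PDFAnnotExtractor test.pdf 2>&1 | java WarningTally" merges standard error into the pipe, in case the warnings are written there.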

If I turn off internal links with wkhtmltopdf, I get no errors.

Despite the warnings, the .pax file is still created, but its contents look uninformative; the lines look like this:

    \[{pagenum}{18}\\
    \[{page}{1}{0 0 612 792}{}\\
    \[{annot}{1}{Link}{50.82 579.93 75.57 591.18}{GoTo}{
      DestLabel={1},
      Border={[0 0 0]},
    }\\
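To check what was actually recorded in the .pax file, a sketch like the following lists each annot entry with its page, subtype, rectangle, action, and DestLabel value. PaxAnnots is a hypothetical helper, not part of pax. It assumes only the entry layout visible in the excerpt above, where an annot header is followed by key={value} pairs up to the next "\[" entry, and does not implement the full pax file format. It assumes Java 8 or later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: summarizes the annot entries of a .pax file.
public class PaxAnnots {
    // Header of an annotation entry, e.g.
    // \[{annot}{1}{Link}{50.82 579.93 75.57 591.18}{GoTo}{
    // Groups: page, subtype, rectangle, action.
    private static final Pattern ANNOT = Pattern.compile(
            Pattern.quote("\\[{annot}")
            + "\\{(\\d+)\\}\\{([^}]*)\\}\\{([^}]*)\\}\\{([^}]*)\\}");
    // Key inside the entry body, e.g. DestLabel={1}
    private static final Pattern DEST_LABEL = Pattern.compile("DestLabel=\\{([^}]*)\\}");

    // Returns one summary line per annot entry found in the .pax text.
    static List<String> summarize(String pax) {
        List<String> out = new ArrayList<>();
        Matcher m = ANNOT.matcher(pax);
        while (m.find()) {
            // The entry body runs until the next "\[" entry marker (or the end of the text).
            int next = pax.indexOf("\\[", m.end());
            String body = pax.substring(m.end(), next < 0 ? pax.length() : next);
            Matcher d = DEST_LABEL.matcher(body);
            String dest = d.find() ? d.group(1) : "(none)";
            out.add("page " + m.group(1) + " " + m.group(2) + " [" + m.group(3) + "] "
                    + m.group(4) + " DestLabel=" + dest);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PaxAnnots file.pax");
            System.exit(2);
        }
        String pax = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        summarize(pax).forEach(System.out::println);
    }
}
```

On the excerpt above this prints a single line, "page 1 Link [50.82 579.93 75.57 591.18] GoTo DestLabel=1", which makes it easy to compare the recorded link entries against what the wkhtmltopdf PDF contains.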

And sure enough, when I run pdflatex on the LaTeX document, the links in the generated PDF do not work.

I'm on FreeBSD, using TeX Live 2015 and PDFBox 0.7.3. I create the PDFs from HTML on a Windows machine with the latest wkhtmltopdf.

Edit:

Suspecting that the difference between platforms might be the cause, I also ran the pax Java program on Windows (Win7), the same machine where I run wkhtmltopdf. Same results. I am now also using this simple HTML file for testing:

    <html>
    <head><title>my title</title></head>
    <body>
      <h1>test</h1>
      <p><a href="#myanchor">Click</a> me.</p>
      <h2><a name="myanchor">Anchor</a></h2>
      <p>text</p>
    </body>
    </html>

I get the null pointer exception even with this simple case.

    wkhtmltopdf test.htm test.pdf
    java -cp path\to\pax.jar;\path\to\pdfbox.jar pax.PDFAnnotExtractor test.pdf


Category: pdfpages Time: 2016-07-29
Tags: html pdfpages pax

