tabula-java [](https://travis-ci.org/tabulapdf/tabula-java)[](https://gitter.im/tabulapdf/tabula-java?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
-`tabula-java` is a library for extracting tables from PDF files — it is the table extraction engine that used to power [Tabula](http://tabula.technology/) ([repo](http://github.com/tabulapdf/tabula)). You can use `tabula-java` as a command-line tool to programmatically extract tables from PDFs.
+`tabula-java` is a library for extracting tables from PDF files — it is the table extraction engine that powers [Tabula](http://tabula.technology/) ([repo](http://github.com/tabulapdf/tabula)). You can use `tabula-java` as a command-line tool to programmatically extract tables from PDFs.

-(This is the new version of the extraction engine; the previous code can be found at [`tabula-extractor`](http://github.com/tabulapdf/tabula-extractor).)
  -g,--guess                Guess the portion of the page to analyze per
                            page.
  -h,--help                 Print this help text.
  -i,--silent               Suppress all stderr output.
- -n,--no-spreadsheet       Force PDF not to be extracted using
-                           spreadsheet-style extraction (if there are
-                           ruling lines separating each cell, as in a PDF
-                           of an Excel spreadsheet)
+ -l,--lattice              Force PDF to be extracted using lattice-mode
+                           extraction (if there are ruling lines
+                           separating each cell, as in a PDF of an Excel
+                           spreadsheet)
+ -n,--no-spreadsheet       [Deprecated in favor of -t/--stream] Force PDF
+                           not to be extracted using spreadsheet-style
+                           extraction (if there are no ruling lines
+                           separating each cell)
  -o,--outfile <OUTFILE>    Write output to <file> instead of STDOUT.
                            Default: -
  -p,--pages <PAGES>        Comma separated list of ranges, or all.
                            Examples: --pages 1-3,5-7, --pages 3 or
                            --pages all. Default is --pages 1
- -r,--spreadsheet          Force PDF to be extracted using
-                           spreadsheet-style extraction (if there are
-                           ruling lines separating each cell, as in a PDF
-                           of an Excel spreadsheet)
+ -r,--spreadsheet          [Deprecated in favor of -l/--lattice] Force
+                           PDF to be extracted using spreadsheet-style
+                           extraction (if there are ruling lines
+                           separating each cell, as in a PDF of an Excel
+                           spreadsheet)
  -s,--password <PASSWORD>  Password to decrypt document. Default is empty
+ -t,--stream               Force PDF to be extracted using stream-mode
+                           extraction (if there are no ruling lines
+                           separating each cell)
  -u,--use-line-returns     Use embedded line returns in cells. (Only in
                            spreadsheet mode.)
  -v,--version              Print version and exit.
-
 ```
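
The `--pages` value described in the help text (comma separated ranges such as `1-3,5-7`, a single page such as `3`, or `all`) can be illustrated with a small parser. This is only a sketch: `PageSpec` and `parse` are hypothetical names, not part of `tabula-java`, and returning an empty list for `all` is an assumption made here for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not tabula-java's own parser. */
final class PageSpec {

    /**
     * Expands a --pages style value such as "1-3,5-7" or "3" into page numbers.
     * Assumption of this sketch: "all" is returned as an empty list,
     * meaning "do not restrict pages".
     */
    static List<Integer> parse(String spec) {
        List<Integer> pages = new ArrayList<>();
        String trimmed = spec.trim();
        if (trimmed.equals("all")) {
            return pages;
        }
        for (String part : trimmed.split(",")) {
            String[] bounds = part.trim().split("-");
            if (bounds.length > 2) {
                throw new IllegalArgumentException("Bad page range: " + part);
            }
            // A single number is treated as a one-page range.
            int first = Integer.parseInt(bounds[0].trim());
            int last = bounds.length == 2 ? Integer.parseInt(bounds[1].trim()) : first;
            if (first < 1 || last < first) {
                throw new IllegalArgumentException("Bad page range: " + part);
            }
            for (int page = first; page <= last; page++) {
                pages.add(page);
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        System.out.println(parse("1-3,5-7")); // prints [1, 2, 3, 5, 6, 7]
        System.out.println(parse("3"));       // prints [3]
        System.out.println(parse("all"));     // prints []
    }
}
```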

-It also includes a debugging tool, run `java -cp ./target/tabula-0.9.1-jar-with-dependencies.jar technology.tabula.debug.Debug -h` for the available options.
+It also includes a debugging tool, run `java -cp ./target/tabula-1.0.5-jar-with-dependencies.jar technology.tabula.debug.Debug -h` for the available options.

 You can also integrate `tabula-java` with any JVM language. For Java examples, see the [`tests`](src/test/java/technology/tabula/) folder.

 JVM start-up time is a lot of the cost of the `tabula` command, so if you're trying to extract many tables from PDFs, you have a few options for speeding it up:

 - the -b option, which allows you to convert all pdfs in a given directory
 - the [drip](https://github.com/ninjudd/drip) utility
-- the [Ruby](http://github.com/tabulapdf/tabula-extractor), [R](https://github.com/leeper/tabulizer), and [Node.js](https://github.com/ezodude/tabula-js) bindings
+- the [Ruby](http://github.com/tabulapdf/tabula-extractor), [Python](https://github.com/chezou/tabula-py), [R](https://github.com/leeper/tabulizer), and [Node.js](https://github.com/ezodude/tabula-js) bindings
 - writing your own program in any JVM language (Java, JRuby, Scala) that imports tabula-java.
-- waiting for us to implement an API/server-style system (it's on the roadmap)
+- waiting for us to implement an API/server-style system (it's on the [roadmap](https://github.com/tabulapdf/tabula-api))

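
To make the start-up cost argument concrete: launching one JVM per PDF pays the start-up price every time, while a single `-b` run pays it once for the whole directory. The sketch below only assembles the argument list for such a run. `BatchCommand` and `build` are hypothetical names, and combining `-l` or `-t` with `-b` is assumed from the help text rather than verified here, so check `--help` on your build.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of tabula-java. */
final class BatchCommand {

    /**
     * Builds the argument list for one JVM launch that processes a whole
     * directory via -b, instead of one launch per PDF.
     * Assumption: -l (lattice) or -t (stream) may be combined with -b.
     */
    static List<String> build(Path jar, Path pdfDirectory, boolean lattice) {
        List<String> command = new ArrayList<>();
        command.add("java");
        command.add("-jar");
        command.add(jar.toString());
        command.add(lattice ? "-l" : "-t");
        command.add("-b");
        command.add(pdfDirectory.toString());
        return command;
    }

    public static void main(String[] args) {
        Path jar = Paths.get("target", "tabula-1.0.5-jar-with-dependencies.jar");
        // Prints the command line; it does not launch anything.
        System.out.println(String.join(" ", build(jar, Paths.get("pdfs"), true)));
    }
}
```

The resulting list can be handed to `new ProcessBuilder(command).inheritIO().start()` to launch the single JVM.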
 ## Building from Source

@@ -76,3 +88,30 @@ Clone this repo and run:
 ```
 mvn clean compile assembly:single
 ```
+
+## Contributing
+
+Interested in helping out? We'd love to have your help!
+
+You can help by:
+
+- [Reporting a bug](https://github.com/tabulapdf/tabula-java/issues).
+- Adding or editing documentation.
+- Contributing code via a Pull Request.
+- Spreading the word about `tabula-java` to people who might be able to benefit from using it.
+
+### Backers
+
+You can also support our continued work on `tabula-java` with a one-time or monthly donation [on OpenCollective](https://opencollective.com/tabulapdf#support). Organizations who use `tabula-java` can also [sponsor the project](https://opencollective.com/tabulapdf#support) for acknowledgement on [our official site](http://tabula.technology/) and this README.
+
+Special thanks to the following users and organizations for generously supporting Tabula with donations and grants:
+<a title="The John S. and James L. Knight Foundation" href="http://www.knightfoundation.org/" target="_blank"><img alt="The John S. and James L. Knight Foundation" src="https://knightfoundation.org/wp-content/uploads/2019/10/KF_Logotype_Icon-and-Stacked-Name.png" width="300"></a>