Author: |
Akretion,Odoo Community Association (OCA) |
License: |
AGPL-3 |
Branch: |
simahawk-patch-1 |
Repository: |
brain-tec/edi |
Dependencies: |
account_invoice_import |
Languages: |
HTML (452, 14.9%),
PO File (447, 14.7%),
Python (1659, 54.6%),
XML (315, 10.4%),
and
reStructuredText (165, 5.4%) |
Other branches: |
14.0 |
Other repositories: |
Change2improve/edi,
ForgeFlow/edi,
OCA/edi,
TDu/edi,
acsone/edi,
akretion/edi,
aurestic/edi,
camptocamp/edi,
flotho/edi,
invitu/edi,
sebalix/edi,
simahawk/edi,
steingabelgaard/edi,
and
tegin/edi |
<h1 class="title">Account Invoice Import Simple PDF</h1>
<p><a class="reference external" href="https://odoo-community.org/page/development-status"><img alt="Beta" src="https://img.shields.io/badge/maturity-Beta-yellow.png" /></a> <a class="reference external" href="http://www.gnu.org/licenses/agpl-3.0-standalone.html"><img alt="License: AGPL-3" src="https://img.shields.io/badge/licence-AGPL--3-blue.png" /></a> <a class="reference external" href="https://github.com/OCA/edi/tree/14.0/account_invoice_import_simple_pdf"><img alt="OCA/edi" src="https://img.shields.io/badge/github-OCA%2Fedi-lightgray.png?logo=github" /></a> <a class="reference external" href="https://translation.odoo-community.org/projects/edi-14-0/edi-14-0-account_invoice_import_simple_pdf"><img alt="Translate me on Weblate" src="https://img.shields.io/badge/weblate-Translate%20me-F47D42.png" /></a> <a class="reference external" href="https://runbot.odoo-community.org/runbot/226/14.0"><img alt="Try me on Runbot" src="https://img.shields.io/badge/runbot-Try%20me-875A7B.png" /></a></p>
<p>This module is an extension of the module <em>account_invoice_import</em>: it adds support for simple PDF invoices, i.e. PDF invoices that don't have an embedded XML file. This module has been developed to solve the drawbacks of the OCA module <strong>account_invoice_import_invoice2data</strong>; its advantages are the following:</p>
<ul class="simple">
<li>Possibility to add support for a new vendor without developer skills: the accountant can do it!</li>
<li>Adding support for a new vendor is faster.</li>
<li>More tolerance on vendor invoice layout changes.</li>
<li>Easier to install.</li>
</ul>
<p>With this module, you can import all the invoices that you were able to import with the module <em>account_invoice_import_invoice2data</em>. In fact, this module uses the same design when importing a PDF vendor bill:</p>
<ol class="arabic simple">
<li>raw text extraction of the PDF file,</li>
<li>identify the partner using the VAT number (if the VAT number is present in the raw text extraction) or some keywords,</li>
<li>use regular expressions (regex) to extract the data needed to create the vendor bill in Odoo (single line configuration).</li>
</ol>
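<p>Steps 2 and 3 above can be illustrated with a minimal sketch that works on already-extracted raw text. It is not the module's actual code: the vendor registry, the function names and the invoice number regex are all hypothetical and only serve to show the idea.</p>

```python
import re

# Hypothetical vendor registry: a VAT number and fallback keywords per vendor.
VENDORS = [
    {"name": "Vendor A", "vat": "FR12345678901", "keywords": ["vendor a sas"]},
    {"name": "Vendor B", "vat": None, "keywords": ["vendor b ltd"]},
]


def identify_vendor(raw_text):
    """Return the vendor matched by VAT number first, then by keyword."""
    # VAT numbers are often printed with spaces, so compare without whitespace.
    compact = re.sub(r"\s+", "", raw_text).upper()
    lowered = raw_text.lower()
    for vendor in VENDORS:
        if vendor["vat"] and vendor["vat"] in compact:
            return vendor
    for vendor in VENDORS:
        if any(keyword in lowered for keyword in vendor["keywords"]):
            return vendor
    return None


def extract_invoice_number(raw_text):
    """Extract the invoice number with a hand-written, illustrative regex."""
    match = re.search(r"Invoice\s+(?:No\.?|Number)\s*:?\s*(\S+)", raw_text)
    return match.group(1) if match else None


RAW_TEXT = """Vendor A SAS
VAT: FR 12 345678901
Invoice Number: INV-2021-0042
"""

vendor = identify_vendor(RAW_TEXT)
number = extract_invoice_number(RAW_TEXT)
```

<p>In this sketch the VAT number takes priority over keywords, which mirrors the order given in step 2.</p>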
<p>The main difference with the OCA module <em>account_invoice_import_invoice2data</em> is that the regular expressions are auto-generated from the configuration made by the user in Odoo. No need to be a regex expert! But you can still write regex to extract some fields for some very specific needs.</p>
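<p>To give an idea of what "auto-generated" can mean, here is a minimal sketch that builds an amount regex from two user settings. The setting names (<em>decimal_sep</em>, <em>thousand_sep</em>) and the functions are assumptions made for this illustration; the module's real configuration fields and generated patterns may differ.</p>

```python
import re
from decimal import Decimal


def build_amount_regex(decimal_sep=",", thousand_sep=" "):
    """Build a regex for amounts with 2 decimals from separator settings."""
    dec = re.escape(decimal_sep)
    thou = re.escape(thousand_sep)
    # 1 to 3 digits, optional groups of 3 digits, then exactly 2 decimals.
    return re.compile(rf"\d{{1,3}}(?:{thou}\d{{3}})*{dec}\d{{2}}")


def parse_amount(text, decimal_sep=",", thousand_sep=" "):
    """Return the first amount found in text as a Decimal, or None."""
    match = build_amount_regex(decimal_sep, thousand_sep).search(text)
    if match is None:
        return None
    raw = match.group(0)
    if thousand_sep:
        raw = raw.replace(thousand_sep, "")
    return Decimal(raw.replace(decimal_sep, "."))


french_style = parse_amount("Total TTC 1 234,56 EUR")
us_style = parse_amount("Total 1,234.56 USD", decimal_sep=".", thousand_sep=",")
```

<p>The user only chooses the separators; the pattern itself is never written by hand.</p>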
<p>The module can extract the following fields:</p>
<ul class="simple">
<li>Total Amount with taxes</li>
<li>Total Untaxed Amount</li>
<li>Total Tax Amount</li>
<li>Invoice Date</li>
<li>Due Date</li>
<li>Start Date</li>
<li>End Date</li>
<li>Invoice Number</li>
<li>Description (for that field, you have to write a regex)</li>
</ul>
<p>In this list, only 3 fields are required:</p>
<ul class="simple">
<li>Invoice Date</li>
<li>2 out of the 3 Amount fields (the 3rd can be deduced from the 2 others: Total Amount = Total Untaxed + Total Tax)</li>
</ul>
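<p>The relation between the 3 amounts can be sketched as follows. The function name and signature are hypothetical; the arithmetic is the one stated above (Total Amount = Total Untaxed + Total Tax).</p>

```python
from decimal import Decimal


def complete_amounts(total=None, untaxed=None, tax=None):
    """Compute the missing amount when at least 2 of the 3 are known."""
    known = sum(value is not None for value in (total, untaxed, tax))
    if known < 2:
        raise ValueError("at least 2 of the 3 amounts are required")
    if total is None:
        total = untaxed + tax
    elif untaxed is None:
        untaxed = total - tax
    elif tax is None:
        tax = total - untaxed
    return total, untaxed, tax


amounts = complete_amounts(untaxed=Decimal("100.00"), tax=Decimal("20.00"))
```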
<p>To take advantage of the fields <em>Start Date</em> and <em>End Date</em>, you need the OCA module <em>account_invoice_start_end_dates</em> from the <a class="reference external" href="https://github.com/OCA/account-closing">account-closing</a> project.</p>
<p>To know the full story behind the development of this module, read <a class="reference external" href="https://akretion.com/en/blog/new-opensource-pdf-invoice-import-module-for-odoo">Akretion's blog post</a>.</p>
<p><strong>Table of contents</strong></p>
<div class="contents local topic" id="contents">
<ul class="simple">
<li><a class="reference internal" href="#installation" id="id1">Installation</a><ul>
<li><a class="reference internal" href="#install-pymupdf" id="id2">Install PyMuPDF</a></li>
<li><a class="reference internal" href="#install-pdftotext-python-lib" id="id3">Install pdftotext python lib</a></li>
<li><a class="reference internal" href="#install-pdftotext-command-line" id="id4">Install pdftotext command line</a></li>
<li><a class="reference internal" href="#install-pdfplumber" id="id5">Install pdfplumber</a></li>
<li><a class="reference internal" href="#other-requirements" id="id6">Other requirements</a></li>
</ul>
</li>
<li><a class="reference internal" href="#configuration" id="id7">Configuration</a></li>
<li><a class="reference internal" href="#bug-tracker" id="id8">Bug Tracker</a></li>
<li><a class="reference internal" href="#credits" id="id9">Credits</a><ul>
<li><a class="reference internal" href="#authors" id="id10">Authors</a></li>
<li><a class="reference internal" href="#contributors" id="id11">Contributors</a></li>
<li><a class="reference internal" href="#maintainers" id="id12">Maintainers</a></li>
</ul>
</li>
</ul>
</div>
<a name="installation"></a>
<h2><a class="toc-backref" href="#id1">Installation</a></h2>
<p>The most important technical component of this module is the tool that converts the PDF to text. Converting PDF to text is not an easy job. As outlined in this <a class="reference external" href="https://dida.do/blog/how-to-extract-text-from-pdf">blog post</a>, different tools can give quite different results. The best results are usually achieved with tools based on a PDF viewer, which excludes pure-Python tools. But pure-Python tools are easier to install than tools based on a PDF viewer. It is important to understand that, if you change the PDF-to-text tool, you will certainly get a slightly different text output, which may oblige you to update the field extraction rules. This can be time-consuming if you have already configured many vendors.</p>
<p>The module supports 4 different extraction methods:</p>
<ol class="arabic simple">
<li><a class="reference external" href="https://github.com/pymupdf/PyMuPDF">PyMuPDF</a> which is a Python binding for <a class="reference external" href="https://mupdf.com/">MuPDF</a>, a lightweight PDF toolkit/viewer/renderer published under the AGPL licence by the company <a class="reference external" href="https://artifex.com/">Artifex Software</a>.</li>
<li><a class="reference external" href="https://pypi.org/project/pdftotext/">pdftotext python library</a>, which is a python binding for the pdftotext tool.</li>
<li><a class="reference external" href="https://en.wikipedia.org/wiki/Pdftotext">pdftotext command line tool</a>, which is based on <a class="reference external" href="https://poppler.freedesktop.org/">poppler</a>, a PDF rendering library used by <a class="reference external" href="https://www.xpdfreader.com/">xpdf</a> and <a class="reference external" href="https://wiki.gnome.org/Apps/Evince/FrequentlyAskedQuestions">Evince</a> (the PDF reader of <a class="reference external" href="https://www.gnome.org/">Gnome</a>).</li>
<li><a class="reference external" href="https://pypi.org/project/pdfplumber/">pdfplumber</a>, which is a Python library built on top of the Python library <a class="reference external" href="https://pypi.org/project/pdfminer.six/">pdfminer.six</a>. pdfplumber is a pure-Python solution, so it's very easy to install on all OSes.</li>
</ol>
<p>PyMuPDF and pdftotext both give a very good text output. So far, I can't say which one is best. pdfplumber often gives lower-quality text output, but its advantage is that it's a pure-Python solution, so you will always be able to install it whatever your technical environment is.</p>
<p>You can choose one extraction method and only install the tools/libs for that method.</p>
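<p>Before or after installing, it can be handy to check which extraction methods are usable on a given machine. The sketch below is only an illustration using the Python standard library: the function name is hypothetical, the method identifiers are the ones listed in the Configuration section, and it assumes that PyMuPDF is importable as <em>fitz</em> (which matches the <strong>python3-fitz</strong> package name) and that the command line tool is named <em>pdftotext</em>.</p>

```python
import importlib.util
import shutil


def available_methods():
    """Return the extraction methods that look usable on this machine."""
    checks = {
        # Assumption: PyMuPDF is importable under the name "fitz".
        "pymupdf": importlib.util.find_spec("fitz") is not None,
        "pdftotext.lib": importlib.util.find_spec("pdftotext") is not None,
        # The command line tool is looked up in the PATH.
        "pdftotext.cmd": shutil.which("pdftotext") is not None,
        "pdfplumber": importlib.util.find_spec("pdfplumber") is not None,
    }
    return [name for name, usable in checks.items() if usable]


methods = available_methods()
```

<p>This only checks that the library or command can be found; it does not guarantee that the text extraction itself will succeed.</p>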
<a name="install-pymupdf"></a>
<h3><a class="toc-backref" href="#id2">Install PyMuPDF</a></h3>
<p>To install <strong>PyMuPDF</strong>, if you use Debian (Bullseye aka v11 or higher) or Ubuntu (20.04 or higher), run the following command:</p>
<pre class="code">
<code class="code">sudo apt install python3-fitz</code>
</pre>
<p>You can also install it via pip:</p>
<pre class="code">
<code class="code">sudo pip3 install --upgrade PyMuPDF</code>
</pre>
<p>but beware that <em>PyMuPDF</em> is just a binding for MuPDF, so it requires MuPDF and all the development libs needed to compile the binding. That's why <em>PyMuPDF</em> is much easier to install via the packages of your Linux distribution (package name <strong>python3-fitz</strong> on Debian/Ubuntu, but the package name may be different in other distributions) than with pip.</p>
<a name="install-pdftotext-python-lib"></a>
<h3><a class="toc-backref" href="#id3">Install pdftotext python lib</a></h3>
<p>To install <strong>pdftotext python lib</strong>, run:</p>
<pre class="code">
<code class="code">sudo apt install build-essential libpoppler-cpp-dev pkg-config python3-dev</code>
</pre>
<p>and then install the lib via pip:</p>
<pre class="code">
<code class="code">sudo pip3 install --upgrade pdftotext</code>
</pre>
<p>On OSes other than Debian/Ubuntu, follow the instructions on the <a class="reference external" href="https://github.com/jalan/pdftotext">project page</a>.</p>
<a name="install-pdftotext-command-line"></a>
<h3><a class="toc-backref" href="#id4">Install pdftotext command line</a></h3>
<p>To install <strong>pdftotext command line</strong>, run:</p>
<pre class="code">
<code class="code">sudo apt install poppler-utils</code>
</pre>
<a name="install-pdfplumber"></a>
<h3><a class="toc-backref" href="#id5">Install pdfplumber</a></h3>
<p>To install the <strong>pdfplumber</strong> python lib, run:</p>
<pre class="code">
<code class="code">sudo pip3 install --upgrade pdfplumber</code>
</pre>
<a name="other-requirements"></a>
<h3><a class="toc-backref" href="#id6">Other requirements</a></h3>
<p>This module also requires the following Python libraries:</p>
<ul class="simple">
<li><a class="reference external" href="https://pypi.org/project/regex/">regex</a> which is backward-compatible with the <em>re</em> module of the Python standard library, but has additional functionalities.</li>
<li><a class="reference external" href="https://github.com/scrapinghub/dateparser">dateparser</a> which is a powerful date parsing library.</li>
</ul>
<p>You can install these Python libraries via pip:</p>
<pre class="code">
<code class="code">sudo pip3 install --upgrade regex dateparser</code>
</pre>
<a name="configuration"></a>
<h2><a class="toc-backref" href="#id7">Configuration</a></h2>
<p>By default, for the PDF to text conversion, the module tries the different methods in the order mentioned in the Installation section: it will first try to use <strong>PyMuPDF</strong>; if it fails (for example because the lib is not properly installed), it will try to use the <strong>pdftotext python lib</strong>; if that one also fails, it will try to use <strong>pdftotext command line</strong>; and, if that also fails, it will finally try <strong>pdfplumber</strong>. If none of the 4 methods work, Odoo will display an error message.</p>
<p>If you want to force Odoo to use a specific text extraction method, go to the menu <em>Configuration > Technical > Parameters > System Parameters</em> and create a new System Parameter:</p>
<ul class="simple">
<li><em>Key</em>: <strong>invoice_import_simple_pdf.pdf2txt</strong></li>
<li><em>Value</em>: select the proper value for the method you want to use:<ol class="arabic">
<li>pymupdf</li>
<li>pdftotext.lib</li>
<li>pdftotext.cmd</li>
<li>pdfplumber</li>
</ol>
</li>
</ul>
<p>In this configuration, Odoo will only use the selected text extraction method and, if it fails, it will display an error message.</p>
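<p>The two behaviours described in this section (default fallback chain and forced method) can be sketched as follows. This is an illustration and not the module's implementation: the function, the <em>extractors</em> mapping and the stub extractors are hypothetical, and only the method identifiers and their order come from this page.</p>

```python
METHOD_ORDER = ["pymupdf", "pdftotext.lib", "pdftotext.cmd", "pdfplumber"]


def pdf_to_text(pdf_bytes, extractors, forced_method=None):
    """Try each extraction method in order; return (method, raw_text).

    extractors maps a method identifier to a callable taking the PDF bytes.
    When forced_method is set, only that method is tried.
    """
    methods = [forced_method] if forced_method else METHOD_ORDER
    errors = {}
    for method in methods:
        extractor = extractors.get(method)
        if extractor is None:
            errors[method] = "not available"
            continue
        try:
            return method, extractor(pdf_bytes)
        except Exception as exc:  # any failure triggers the next method
            errors[method] = str(exc)
    raise RuntimeError(f"PDF to text conversion failed: {errors}")


# Stub extractors standing in for the real tools.
def broken_extractor(pdf_bytes):
    raise ImportError("lib not properly installed")


def working_extractor(pdf_bytes):
    return "raw text"


extractors = {"pymupdf": broken_extractor, "pdfplumber": working_extractor}
result = pdf_to_text(b"%PDF-", extractors)
```

<p>With the stubs above, the default chain falls through to pdfplumber, whereas forcing <em>pymupdf</em> raises an error instead of falling back.</p>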
<p>You will find a full demonstration about how to configure each Vendor and import the PDF invoices in this <a class="reference external" href="https://www.youtube.com/watch?v=edsEuXVyEYE">screencast</a>.</p>
<a name="bug-tracker"></a>
<h2><a class="toc-backref" href="#id8">Bug Tracker</a></h2>
<p>Bugs are tracked on <a class="reference external" href="https://github.com/OCA/edi/issues">GitHub Issues</a>.
In case of trouble, please check there if your issue has already been reported.
If you spotted it first, help us to smash it by providing detailed and welcome
<a class="reference external" href="https://github.com/OCA/edi/issues/new?body=module:%20account_invoice_import_simple_pdf%0Aversion:%2014.0%0A%0A**Steps%20to%20reproduce**%0A-%20...%0A%0A**Current%20behavior**%0A%0A**Expected%20behavior**">feedback</a>.</p>
<p>Do not contact contributors directly about support or help with technical issues.</p>
<a name="credits"></a>
<h2><a class="toc-backref" href="#id9">Credits</a></h2>
<a name="authors"></a>
<h3><a class="toc-backref" href="#id10">Authors</a></h3>
<ul class="simple">
<li>Akretion</li>
</ul>
<a name="contributors"></a>
<h3><a class="toc-backref" href="#id11">Contributors</a></h3>
<ul class="simple">
<li>Alexis de Lattre &lt;<a class="reference external" href="mailto:alexis.delattre@akretion.com">alexis.delattre@akretion.com</a>&gt;</li>
</ul>
<a name="maintainers"></a>
<h3><a class="toc-backref" href="#id12">Maintainers</a></h3>
<p>This module is maintained by the OCA.</p>
<a class="reference external image-reference" href="https://odoo-community.org"><img alt="Odoo Community Association" src="https://odoo-community.org/logo.png" /></a>
<p>OCA, or the Odoo Community Association, is a nonprofit organization whose
mission is to support the collaborative development of Odoo features and
promote its widespread use.</p>
<p>Current <a class="reference external" href="https://odoo-community.org/page/maintainer-role">maintainer</a>:</p>
<p><a class="reference external" href="https://github.com/alexis-via"><img alt="alexis-via" src="https://github.com/alexis-via.png?size=40px" /></a></p>
<p>This module is part of the <a class="reference external" href="https://github.com/OCA/edi/tree/14.0/account_invoice_import_simple_pdf">OCA/edi</a> project on GitHub.</p>
<p>You are welcome to contribute. To learn how, please visit <a class="reference external" href="https://odoo-community.org/page/Contribute">https://odoo-community.org/page/Contribute</a>.</p>