account_invoice_import_simple_pdf

Author: Akretion,Odoo Community Association (OCA)
License: AGPL-3
Branch: simahawk-patch-1
Repository: brain-tec/edi
Dependencies: account_invoice_import
Languages: HTML (452, 14.9%), PO File (447, 14.7%), Python (1659, 54.6%), XML (315, 10.4%), and reStructuredText (165, 5.4%)
Other branches: 14.0
Other repositories: Change2improve/edi, ForgeFlow/edi, OCA/edi, TDu/edi, acsone/edi, akretion/edi, aurestic/edi, camptocamp/edi, flotho/edi, invitu/edi, sebalix/edi, simahawk/edi, steingabelgaard/edi, and tegin/edi

<h1 class="title">Account Invoice Import Simple PDF</h1> <p><a class="reference external" href="https://odoo-community.org/page/development-status"><img alt="Beta" src="https://img.shields.io/badge/maturity-Beta-yellow.png" /></a> <a class="reference external" href="http://www.gnu.org/licenses/agpl-3.0-standalone.html"><img alt="License: AGPL-3" src="https://img.shields.io/badge/licence-AGPL--3-blue.png" /></a> <a class="reference external" href="https://github.com/OCA/edi/tree/14.0/account_invoice_import_simple_pdf"><img alt="OCA/edi" src="https://img.shields.io/badge/github-OCA%2Fedi-lightgray.png?logo=github" /></a> <a class="reference external" href="https://translation.odoo-community.org/projects/edi-14-0/edi-14-0-account_invoice_import_simple_pdf"><img alt="Translate me on Weblate" src="https://img.shields.io/badge/weblate-Translate%20me-F47D42.png" /></a> <a class="reference external" href="https://runbot.odoo-community.org/runbot/226/14.0"><img alt="Try me on Runbot" src="https://img.shields.io/badge/runbot-Try%20me-875A7B.png" /></a></p> <p>This module is an extension of the module <em>account_invoice_import</em>: it adds support for simple PDF invoices, i.e. PDF invoices that don't have an embedded XML file. This module was developed to address the drawbacks of the OCA module <strong>account_invoice_import_invoice2data</strong>; its advantages are the following:</p> <ul class="simple"> <li>Possibility to add support for a new vendor without developer skills: the accountant can do it!</li> <li>Adding support for a new vendor is faster.</li> <li>More tolerant of vendor invoice layout changes.</li> <li>Easier to install.</li> </ul> <p>With this module, you can import all the invoices that you were able to import with the module <em>account_invoice_import_invoice2data</em>. 
In fact, this module follows the same process when importing a PDF vendor bill:</p> <ol class="arabic simple"> <li>extract the raw text from the PDF file,</li> <li>identify the partner using the VAT number (if the VAT number is present in the raw text extraction) or some keywords,</li> <li>use regular expressions (regex) to extract the data needed to create the vendor bill in Odoo (single line configuration).</li> </ol> <p>The main difference with the OCA module <em>account_invoice_import_invoice2data</em> is that the regular expressions are auto-generated from the configuration made by the user in Odoo. No need to be a regex expert! But you can still write a regex to extract some fields for very specific needs.</p> <p>The module can extract the following fields:</p> <ul class="simple"> <li>Total Amount with taxes</li> <li>Total Untaxed Amount</li> <li>Total Tax Amount</li> <li>Invoice Date</li> <li>Due Date</li> <li>Start Date</li> <li>End Date</li> <li>Invoice Number</li> <li>Description (for that field, you have to write a regex)</li> </ul> <p>In this list, only 3 fields are required:</p> <ul class="simple"> <li>Invoice Date</li> <li>2 out of the 3 Amount fields (the 3rd can be deduced from the 2 others: Total Amount = Total Untaxed + Total Tax)</li> </ul> <p>To take advantage of the fields <em>Start Date</em> and <em>End Date</em>, you need the OCA module <em>account_invoice_start_end_dates</em> from the <a class="reference external" href="https://github.com/OCA/account-closing">account-closing</a> project.</p> <p>For the full story behind the development of this module, read <a class="reference external" href="https://akretion.com/en/blog/new-opensource-pdf-invoice-import-module-for-odoo">Akretion's blog post</a>.</p> <p><strong>Table of contents</strong></p> <div class="contents local topic" id="contents"> <ul class="simple"> <li><a class="reference internal" href="#installation" id="id1">Installation</a><ul> <li><a class="reference internal" 
href="#install-pymupdf" id="id2">Install PyMuPDF</a></li> <li><a class="reference internal" href="#install-pdftotext-python-lib" id="id3">Install pdftotext python lib</a></li> <li><a class="reference internal" href="#install-pdftotext-command-line" id="id4">Install pdftotext command line</a></li> <li><a class="reference internal" href="#install-pdfplumber" id="id5">Install pdfplumber</a></li> <li><a class="reference internal" href="#other-requirements" id="id6">Other requirements</a></li> </ul> </li> <li><a class="reference internal" href="#configuration" id="id7">Configuration</a></li> <li><a class="reference internal" href="#bug-tracker" id="id8">Bug Tracker</a></li> <li><a class="reference internal" href="#credits" id="id9">Credits</a><ul> <li><a class="reference internal" href="#authors" id="id10">Authors</a></li> <li><a class="reference internal" href="#contributors" id="id11">Contributors</a></li> <li><a class="reference internal" href="#maintainers" id="id12">Maintainers</a></li> </ul> </li> </ul> </div> <a name="installation"></a> <h2><a class="toc-backref" href="#id1">Installation</a></h2> <p>The most important technical component of this module is the tool that converts the PDF to text. Converting PDF to text is not an easy job. As outlined in this <a class="reference external" href="https://dida.do/blog/how-to-extract-text-from-pdf">blog post</a>, different tools can give quite different results. The best results are usually achieved with tools based on a PDF viewer, which excludes pure-Python tools. However, pure-Python tools are easier to install than tools based on a PDF viewer. 
It is important to understand that, if you change the PDF to text tool, you will almost certainly get a slightly different text output, which may force you to update the field extraction rules; this can be time-consuming if you have already configured many vendors.</p> <p>The module supports 4 different extraction methods:</p> <ol class="arabic simple"> <li><a class="reference external" href="https://github.com/pymupdf/PyMuPDF">PyMuPDF</a>, which is a Python binding for <a class="reference external" href="https://mupdf.com/">MuPDF</a>, a lightweight PDF toolkit/viewer/renderer published under the AGPL licence by the company <a class="reference external" href="https://artifex.com/">Artifex Software</a>.</li> <li><a class="reference external" href="https://pypi.org/project/pdftotext/">pdftotext python library</a>, which is a Python binding for the pdftotext tool.</li> <li><a class="reference external" href="https://en.wikipedia.org/wiki/Pdftotext">pdftotext command line tool</a>, which is based on <a class="reference external" href="https://poppler.freedesktop.org/">poppler</a>, a PDF rendering library used by <a class="reference external" href="https://www.xpdfreader.com/">xpdf</a> and <a class="reference external" href="https://wiki.gnome.org/Apps/Evince/FrequentlyAskedQuestions">Evince</a> (the PDF reader of <a class="reference external" href="https://www.gnome.org/">Gnome</a>).</li> <li><a class="reference external" href="https://pypi.org/project/pdfplumber/">pdfplumber</a>, which is a Python library built on top of the Python library <a class="reference external" href="https://pypi.org/project/pdfminer.six/">pdfminer.six</a>. pdfplumber is a pure-Python solution, so it's very easy to install on all OSes.</li> </ol> <p>PyMuPDF and pdftotext both give a very good text output. So far, I can't say which one is best. 
pdfplumber often gives lower-quality text output, but its advantage is that it's a pure-Python solution, so you will always be able to install it whatever your technical environment is.</p> <p>You can choose one extraction method and only install the tools/libs for that method.</p> <a name="install-pymupdf"></a> <h3><a class="toc-backref" href="#id2">Install PyMuPDF</a></h3> <p>To install <strong>PyMuPDF</strong>, if you use Debian (Bullseye aka v11 or higher) or Ubuntu (20.04 or higher), run the following command:</p> <pre class="code"> <code class="code">sudo apt install python3-fitz</code> </pre> <p>You can also install it via pip:</p> <pre class="code"> <code class="code">sudo pip3 install --upgrade PyMuPDF</code> </pre> <p>but beware that <em>PyMuPDF</em> is just a binding for MuPDF, so it will require MuPDF and all the development libs needed to compile the binding. That's why <em>PyMuPDF</em> is much easier to install via the packages of your Linux distribution (package name <strong>python3-fitz</strong> on Debian/Ubuntu, but the package name may be different in other distributions) than with pip.</p> <a name="install-pdftotext-python-lib"></a> <h3><a class="toc-backref" href="#id3">Install pdftotext python lib</a></h3> <p>To install <strong>pdftotext python lib</strong>, run:</p> <pre class="code"> <code class="code">sudo apt install build-essential libpoppler-cpp-dev pkg-config python3-dev</code> </pre> <p>and then install the lib via pip:</p> <pre class="code"> <code class="code">sudo pip3 install --upgrade pdftotext</code> </pre> <p>On OSes other than Debian/Ubuntu, follow the instructions on the <a class="reference external" href="https://github.com/jalan/pdftotext">project page</a>.</p> <a name="install-pdftotext-command-line"></a> <h3><a class="toc-backref" href="#id4">Install pdftotext command line</a></h3> <p>To install <strong>pdftotext command line</strong>, run:</p> <pre class="code"> <code class="code">sudo apt install poppler-utils</code> 
</pre> <a name="install-pdfplumber"></a> <h3><a class="toc-backref" href="#id5">Install pdfplumber</a></h3> <p>To install the <strong>pdfplumber</strong> python lib, run:</p> <pre class="code"> <code class="code">sudo pip3 install --upgrade pdfplumber</code> </pre> <a name="other-requirements"></a> <h3><a class="toc-backref" href="#id6">Other requirements</a></h3> <p>This module also requires the following Python libraries:</p> <ul class="simple"> <li><a class="reference external" href="https://pypi.org/project/regex/">regex</a>, which is backward-compatible with the <em>re</em> module of the Python standard library but has additional functionality.</li> <li><a class="reference external" href="https://github.com/scrapinghub/dateparser">dateparser</a>, which is a powerful date parsing library.</li> </ul> <p>You can install these Python libraries via pip:</p> <pre class="code"> <code class="code">sudo pip3 install --upgrade regex dateparser</code> </pre> <a name="configuration"></a> <h2><a class="toc-backref" href="#id7">Configuration</a></h2> <p>By default, for the PDF to text conversion, the module tries the different methods in the order listed in the Installation section: it first tries <strong>PyMuPDF</strong>; if that fails (for example because the lib is not properly installed), it tries the <strong>pdftotext python lib</strong>; if that one also fails, it tries the <strong>pdftotext command line</strong>; and, if that also fails, it finally tries <strong>pdfplumber</strong>. 
If none of the 4 methods works, Odoo will display an error message.</p> <p>If you want to force Odoo to use a specific text extraction method, go to the menu <em>Configuration &gt; Technical &gt; Parameters &gt; System Parameters</em> and create a new System Parameter:</p> <ul class="simple"> <li><em>Key</em>: <strong>invoice_import_simple_pdf.pdf2txt</strong></li> <li><em>Value</em>: select the proper value for the method you want to use:<ol class="arabic"> <li>pymupdf</li> <li>pdftotext.lib</li> <li>pdftotext.cmd</li> <li>pdfplumber</li> </ol> </li> </ul> <p>In this configuration, Odoo will only use the selected text extraction method and, if it fails, it will display an error message.</p> <p>You will find a full demonstration of how to configure each vendor and import the PDF invoices in this <a class="reference external" href="https://www.youtube.com/watch?v=edsEuXVyEYE">screencast</a>.</p> <a name="bug-tracker"></a> <h2><a class="toc-backref" href="#id8">Bug Tracker</a></h2> <p>Bugs are tracked on <a class="reference external" href="https://github.com/OCA/edi/issues">GitHub Issues</a>. In case of trouble, please check there if your issue has already been reported. 
If you spotted it first, help us smash it by providing detailed and welcome <a class="reference external" href="https://github.com/OCA/edi/issues/new?body=module:%20account_invoice_import_simple_pdf%0Aversion:%2014.0%0A%0A**Steps%20to%20reproduce**%0A-%20...%0A%0A**Current%20behavior**%0A%0A**Expected%20behavior**">feedback</a>.</p> <p>Do not contact contributors directly about support or help with technical issues.</p> <a name="credits"></a> <h2><a class="toc-backref" href="#id9">Credits</a></h2> <a name="authors"></a> <h3><a class="toc-backref" href="#id10">Authors</a></h3> <ul class="simple"> <li>Akretion</li> </ul> <a name="contributors"></a> <h3><a class="toc-backref" href="#id11">Contributors</a></h3> <ul class="simple"> <li>Alexis de Lattre &lt;<a class="reference external" href="mailto:alexis.delattre&#64;akretion.com">alexis.delattre&#64;akretion.com</a>&gt;</li> </ul> <a name="maintainers"></a> <h3><a class="toc-backref" href="#id12">Maintainers</a></h3> <p>This module is maintained by the OCA.</p> <a class="reference external image-reference" href="https://odoo-community.org"><img alt="Odoo Community Association" src="https://odoo-community.org/logo.png" /></a> <p>OCA, or the Odoo Community Association, is a nonprofit organization whose mission is to support the collaborative development of Odoo features and promote its widespread use.</p> <p>Current <a class="reference external" href="https://odoo-community.org/page/maintainer-role">maintainer</a>:</p> <p><a class="reference external" href="https://github.com/alexis-via"><img alt="alexis-via" src="https://github.com/alexis-via.png?size=40px" /></a></p> <p>This module is part of the <a class="reference external" href="https://github.com/OCA/edi/tree/14.0/account_invoice_import_simple_pdf">OCA/edi</a> project on GitHub.</p> <p>You are welcome to contribute. 
To learn how, please visit <a class="reference external" href="https://odoo-community.org/page/Contribute">https://odoo-community.org/page/Contribute</a>.</p>