How to read a PDF file line by line using Python

Python can read PDF files and print their contents after extracting the text. To do this, we first need to install the required module, PyPDF2. The command below installs it; pip should already be available in your Python environment.

pip install pypdf2

Once the module is installed, we can read PDF files using the methods it provides.
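Before going further, it can help to confirm that the install worked. The following is a minimal sketch using only the standard library (Python 3.8 or later); the helper name installed_version is made up for this illustration and is not part of PyPDF2.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing.

    Hypothetical helper for illustration; it is not part of PyPDF2.
    """
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None
```

Calling installed_version("PyPDF2") should return a version string once the pip command above has succeeded, and None otherwise.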

Reading Single Page

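import PyPDF2

# Raw string so the backslash in the path is not read as an escape sequence
pdfName = r'path\xyz.pdf'
# PyPDF2 3.x names; older releases used PdfFileReader, getPage and extractText
read_pdf = PyPDF2.PdfReader(pdfName)
page = read_pdf.pages[0]  # first page (index 0)
page_content = page.extract_text()
print(page_content)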

Running the above program prints the text extracted from the first page of the PDF.
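The extracted text comes back as one string, so reading it line by line means splitting it first. Here is a minimal sketch, assuming the PyPDF2 3.x names PdfReader, pages and extract_text; the helpers split_into_lines and read_page_lines are hypothetical names introduced for this example. Where the line breaks fall depends on how the PDF was produced, so the lines may not match the visual layout exactly.

```python
def split_into_lines(page_text):
    """Return the non-empty lines of extracted text, stripped of surrounding whitespace."""
    return [line.strip() for line in page_text.splitlines() if line.strip()]


def read_page_lines(pdf_path, page_index=0):
    """Extract one page of a PDF and return its text as a list of lines.

    Hypothetical helper; assumes the PyPDF2 3.x API.
    """
    # Imported here so split_into_lines can be used without PyPDF2 installed
    import PyPDF2

    reader = PyPDF2.PdfReader(pdf_path)
    # Fall back to an empty string in case no text comes back for the page
    text = reader.pages[page_index].extract_text() or ""
    return split_into_lines(text)
```

For example, looping with "for line in read_page_lines(pdfName): print(line)" would print the first page one line at a time.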

Reading Multiple Pages

To read a PDF with multiple pages and print each page under its page number, we loop over the pages and call get_page_number() (named getPageNumber() in older PyPDF2 releases). In the following example the PDF file has two pages. The contents are printed under two separate page headings.

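import PyPDF2

# Raw string so the backslash in the path is not read as an escape sequence
pdfName = r'Path\xyz2.pdf'
# PyPDF2 3.x names; older releases used PdfFileReader, getNumPages and getPage
read_pdf = PyPDF2.PdfReader(pdfName)
for i in range(len(read_pdf.pages)):
    page = read_pdf.pages[i]
    # get_page_number() is zero-based, so add 1 for display
    print('Page No - ' + str(1 + read_pdf.get_page_number(page)))
    page_content = page.extract_text()
    print(page_content)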
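To go from page-by-page output to line-by-line output across the whole document, the same loop can feed a generator that numbers each line. This is only a sketch, again assuming the PyPDF2 3.x names PdfReader, pages and extract_text; number_lines and iter_pdf_lines are hypothetical helpers, not library functions.

```python
def number_lines(page_texts):
    """Yield (page_no, line_no, line) for every non-empty line.

    Page numbers and line numbers both start at 1; line numbers restart on each page.
    """
    for page_no, text in enumerate(page_texts, start=1):
        line_no = 0
        for line in text.splitlines():
            stripped = line.strip()
            if stripped:
                line_no += 1
                yield page_no, line_no, stripped


def iter_pdf_lines(pdf_path):
    """Yield numbered lines for every page of a PDF (assumes the PyPDF2 3.x API)."""
    # Imported here so number_lines can be used without PyPDF2 installed
    import PyPDF2

    reader = PyPDF2.PdfReader(pdf_path)
    # Lazily extract each page; fall back to "" if a page yields no text
    texts = ((page.extract_text() or "") for page in reader.pages)
    yield from number_lines(texts)
```

Because iter_pdf_lines is a generator, pages are extracted only as the loop reaches them, which keeps memory use low for long documents.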

Thanks for reading, and as always, be sure to reach out with any questions!

By Emeline
