Golang wrapper for Exiftool : extract as much metadata as possible (EXIF, ...) from files (pictures, pdf, office documents, ...)

Overview

go-exiftool

go-exiftool is a golang library that wraps ExifTool.

ExifTool's purpose is to extract as much metadata as possible (EXIF, IPTC, XMP, GPS, ...) from a wide range of file types (Office documents, pictures, movies, PDF, ...).

go-exiftool uses ExifTool's stay_open feature to optimize performance.
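As a rough illustration of what a stay_open exchange looks like from the Go side: arguments are written one per line to ExifTool's standard input, and each batch ends with -execute. The buildRequest helper below is hypothetical (it is not part of go-exiftool); the -j flag and the one-argument-per-line layout mirror the debug output quoted in the comments further down this page.

```go
package main

import (
	"fmt"
	"strings"
)

// buildRequest is a hypothetical helper: it formats one stay_open request,
// one argument per line, terminated by -execute.
func buildRequest(files ...string) string {
	var b strings.Builder
	b.WriteString("-j\n") // ask ExifTool for JSON output
	for _, f := range files {
		b.WriteString(f)
		b.WriteString("\n")
	}
	b.WriteString("-execute\n") // tells ExifTool to process the batch
	return b.String()
}

func main() {
	fmt.Print(buildRequest("/path/to/file.jpg"))
}
```

Because the ExifTool process stays alive between requests, the cost of starting Perl is paid once rather than once per file.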

Requirements

go-exiftool needs ExifTool to be installed.

  • On Debian : sudo apt-get install exiftool

Usage

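package main

import (
    "fmt"

    exiftool "github.com/barasher/go-exiftool"
)

func main() {
    // Start the long-lived ExifTool process.
    et, err := exiftool.NewExiftool()
    if err != nil {
        fmt.Printf("Error when initializing: %v\n", err)
        return
    }
    // Stop the ExifTool process when done.
    defer et.Close()

    // One result is returned per file passed in.
    fileInfos := et.ExtractMetadata("testdata/20190404_131804.jpg")

    for _, fileInfo := range fileInfos {
        // Errors are reported per file, not for the whole call.
        if fileInfo.Err != nil {
            fmt.Printf("Error concerning %v: %v\n", fileInfo.File, fileInfo.Err)
            continue
        }

        for k, v := range fileInfo.Fields {
            fmt.Printf("[%v] %v\n", k, v)
        }
    }
}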

Output :

[FOV] 69.4 deg
[Orientation] Rotate 90 CW
[ColorSpace] sRGB
[Compression] JPEG (old-style)
[YCbCrSubSampling] YCbCr4:2:2 (2 1)
[Aperture] 1.7
[ColorComponents] 3
[SubSecCreateDate] 2019:04:04 13:18:03.0937
[FileSize] 26 kB
[FileAccessDate] 2019:05:17 22:44:26+02:00
[DateTimeOriginal] 2019:04:04 13:18:03
[CreateDate] 2019:04:04 13:18:03
(...)
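Go randomizes map iteration order, so the fields above will not print in the same order on every run. A minimal sketch for stable output, assuming the fields behave like a map[string]interface{} (an assumption here; sortedKeys is a hypothetical helper, not part of the library):

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns the keys of a metadata map in alphabetical order.
func sortedKeys(fields map[string]interface{}) []string {
	keys := make([]string, 0, len(fields))
	for k := range fields {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	// Stand-in for fileInfo.Fields; values copied from the sample output.
	fields := map[string]interface{}{
		"Orientation": "Rotate 90 CW",
		"ColorSpace":  "sRGB",
		"Aperture":    1.7,
	}
	for _, k := range sortedKeys(fields) {
		fmt.Printf("[%v] %v\n", k, fields[k])
	}
}
```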

Changelog

Comments
  • Support writing metadata to files + other improvements/fixes

    Other improvement/fixes (each in own commit):

    • only set -common_args option if there are additional init args
    • fix typo
    • wait for exiftool command to exit on Close()
    • Run tests that call NewExiftool in parallel (for me, test runtimes dropped from 3-4s to ~1s)
    • Run CI on push and pull requests

    If you'd prefer, I'm happy to break this PR out into multiple smaller PRs.

    Write 
    opened by dhui 9
  • Supply custom extraInitArgs

    This is probably less of an issue, and more of me being a go noob, so please excuse the ask but I've been running myself in circles and hoping you can help me out.

    I want to pass the '-ee' flag to exiftool. I have some files where the date I am looking for only shows up with the -ee flag. Below is my call to NewExiftool. I get:

    s.extraInitArgs undefined (type *exiftool.Exiftool has no field or method extraInitArgs)

    Am I doing something wrong, or is this just not an option with go-exiftool?

    	et, err := exiftool.NewExiftool(func(s *exiftool.Exiftool) error {
    		s.extraInitArgs = append(s.extraInitArgs, "-ee")
    		return nil
    		})
    	if err != nil {
    		fmt.Printf("Error initializing %v\n", err)
    	}
    
    opened by shadow431 9
  • Support Windows and MacOS platforms

    Each platform uses different line-break characters, so it gets stuck on

    if !e.scanout.Scan() {
      fms[i].Err = fmt.Errorf("nothing on stdout")
      continue
    }
    

    waiting for readyToken from the scanner. I ran into this problem on Windows.

    opened by PROger4ever 9
  • Add debug logging option

    It's a little hard to understand how the package is exercising exiftool. This debug option shows exactly what is happening on the wrapped shell and can help troubleshoot the CLI usage.

    Sample Output

    executing shell command:
    	 /opt/homebrew/bin/exiftool -stay_open True -@ -
    sending to exiftool STDIN:
    	-j
    	/path/to/file.jpg
    	-execute
    
    opened by jnathanh 7
  • ExtractMetadata becomes non-responsive after individual error

    Hi, thanks for sharing this package. I ran into an error case trying to do a filepath.Walk over my photos directory. Manually skipping directories does avoid this issue, but it seems like the wrong behavior for ExtractMetadata to silently start giving incorrect results as a result of a previous call. I'd like to incorporate this package into some of my workflows, so I'm raising the issue to help improve the package's usability/reliability. Here is a test that reproduces the behavior:

    func TestNonResponsiveExiftool(t *testing.T) {
    	t.Parallel()
    
    	// create test directory with about 200MB worth of photos
    	// 10,000 of the included jpg or 6 30MB CR2's both repro
    	dirPath := "./testdata/extractkiller"
    
    	err := os.Mkdir(dirPath, 0744)
    	require.NoError(t, err)
    	defer os.RemoveAll(dirPath)
    
    	for i := 0; i < 10_000; i++ {
    		err := copyFile("./testdata/20190404_131804.jpg", path.Join(dirPath, fmt.Sprintf("%d.jpg", i)))
    		require.NoError(t, err)
    	}
    
    
    	// initialize exiftool
    	e, err := NewExiftool(Debug())
    	require.NoError(t, err)
    	defer e.Close()
    
    
    	// control case, everything working normally
    	f := e.ExtractMetadata("./testdata/20190404_131804.jpg")
    	assert.Equal(t, 1, len(f))
    	assert.NoError(t, f[0].Err)
    
    	// case that breaks the session, reading a directory does not seem to be supported by this package, which is okay, and I would expect it to throw an error in that case, but it should not make future reads invalid.
    	f = e.ExtractMetadata(dirPath)
    	assert.Error(t, f[0].Err)
    
    	// control case no longer works
    	f = e.ExtractMetadata("./testdata/20190404_131804.jpg")
    	assert.Equal(t, 1, len(f))
    	assert.NoError(t, f[0].Err)
    
    }
    

    I suspect the issue is either a race (scanning output before it's finished), or improper buffer usage (overflow maybe?). I ran the same test with exiftool directly and got output right away without issue.

    One other note, it looks like exiftool supports paths to a directory as an input, or even a list of multiple files. It might be worth reworking the design to let exiftool handle the work of managing multiple paths (directories or files), and add support for parsing the multi-file output.

    I'll try to work on a PR for this, but posting the issue now as a heads up, and to see if you have any suggestions on the right design to fix the issue.

    bug 
    opened by jnathanh 7
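On the multi-file suggestion in this issue: when ExifTool is asked for JSON (-j), it emits an array with one object per file, each carrying a SourceFile key. A minimal sketch of parsing that shape, under the assumption that the whole array is available as a single byte slice (parseMulti is a hypothetical helper, and the sample input is hand-written):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseMulti decodes a JSON array of per-file objects and indexes them by
// their "SourceFile" value.
func parseMulti(out []byte) (map[string]map[string]interface{}, error) {
	var entries []map[string]interface{}
	if err := json.Unmarshal(out, &entries); err != nil {
		return nil, fmt.Errorf("error during unmarshaling: %w", err)
	}
	byFile := make(map[string]map[string]interface{}, len(entries))
	for _, e := range entries {
		src, ok := e["SourceFile"].(string)
		if !ok {
			return nil, fmt.Errorf("entry without SourceFile")
		}
		byFile[src] = e
	}
	return byFile, nil
}

func main() {
	// Hand-written sample shaped like JSON output for two files.
	out := []byte(`[{"SourceFile":"a.jpg","FileSize":"26 kB"},{"SourceFile":"b.jpg","FileSize":"3 MB"}]`)
	m, err := parseMulti(out)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(len(m), m["a.jpg"]["FileSize"])
}
```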
  • Merge stdout and stderr to see errors easier

    Problem

    When exiftool reports errors, it prints them to stderr. But go-exiftool only pipes stdout, so there's no way to find out what happened.

    Solution

    Let's merge stdout and stderr:

    cmd := exec.Command(binary, initArgs...)
    r, w := io.Pipe()
    e.stdMergedOut = r
    
    cmd.Stdout = w
    cmd.Stderr = w
    
    var err error
    if e.stdin, err = cmd.StdinPipe(); err != nil {
      return nil, fmt.Errorf("error when piping stdin: %w", err)
    }
    
    e.scanMergedOut = bufio.NewScanner(r)
    e.scanMergedOut.Split(splitReadyToken)
    

    It allows us to see error output when unmarshalling json:

    if err := json.Unmarshal(e.scanMergedOut.Bytes(), &m); err != nil {
      fms[i].Err = fmt.Errorf("error during unmarshaling (%v): %w)", e.scanMergedOut.Text(), err)
      continue
    }
    

    Warning

    The statements that close and free resources should be double-checked after my changes...

    opened by PROger4ever 6
  • Does not work with big amount of files

    I'm looping over a large number of files (~10k), making this call on each iteration: fileInfos := et.ExtractMetadata(file)

    After ~7k iterations the program hangs. I debugged a bit and found that it hangs in https://github.com/barasher/go-exiftool/blob/master/exiftool.go#L121 on the line: fmt.Fprintln(e.stdin, executeArg)

    I tried running the same file through 10k iterations and it works fine. With different files it does not work.

    Could it be that e.stdin overflows?

    opened by asannikov 5
  • Error when scanning files with special chars in filename

    Hi,

    I'm currently using go-exiftool on Windows, and everything is working fine so far, except when trying to call ExtractMetadata() on files where the path includes special characters, e.g. foö.jpg.

    error during unmarshaling (Error: File not found - test/foö.jpg
    ): invalid character 'E' looking for beginning of value)
    

    The same works fine when using exiftool.pl directly on the command line. Not sure whether that problem is limited to Windows or would also appear on Mac or Linux...

    Best regards, Philipp

    opened by Blesmol 4
  • Fix supporting of Windows and MacOS platforms

    In #7 we added support for Windows and MacOS by using platform-specific line-break characters. But the tests used their own variable for readyToken, so they failed. This is now fixed.

    Also I found a filename encoding problem on Windows: exiftool says file not found in stay_open mode. I fixed it by setting filename encoding explicitly.

    Now go-exiftool passes its tests on Linux and Windows (I have no MacOS to test on).

    opened by PROger4ever 4
  • Issue 6 buffer problem

    The error nothing on stdout occurs when the token buffer overflows. It comes from this part of the code:

    if !e.scanout.Scan() {
        fms[i].Err = fmt.Errorf("nothing on stdout")
        continue
    }
    

    And in core it comes from this part of the code: https://github.com/golang/go/blob/master/src/bufio/scan.go#L193

    opened by asannikov 4
  • fix scanner split infinite loop caused by exiftool error output

    If exiftool fails to parse a file, it may return an unexpected string (usually ExifTool's own error output). For example, for the file with hash 8e2a7aaeeee829f77b1a1029b9f7524879bbe399 the output is: 'x' outside of string in unpack at /usr/share/perl5/vendor_perl/Image/ExifTool.pm line 5059.

    opened by mel2oo 3
  • About stay_open using multiprocessing

    Suppose I have []byte data from the internet; currently I have to save it to a file and then call ExtractMetadataInfo(filename). Does the stay_open pattern support []byte input, for example func ExtractMetadataInfo(data []byte) (rst FileMetadata, err error) { }?

    opened by jishi92 1
  • Custom extraInitArgs

    Would you be open to a pull request that allowed a user of the module to set their own extraInitArgs? Something like:

    func Args(args ...string) func(*Exiftool) error {
    	return func(e *Exiftool) error {
    		e.extraInitArgs = append(e.extraInitArgs, args...)
    		return nil
    	}
    }
    

    That way a user that knows what they're doing could put any fields they want. For example if I wanted to get the Orientation tag as a number.

    et, err := exiftool.NewExiftool(
    	exiftool.Args("-Orientation#"),
    )
    
    enhancement 
    opened by agorman 3
  • Binary data extraction

    The ExtractAllBinaryMetadata option extracts binary metadata along with the other metadata. Exiftool encodes binary values using base64, which is not optimal when you're only interested in the binary fields.

    The idea is to add a tool that extracts binary metadata directly as []byte instead of base64 string.

    enhancement binary 
    opened by barasher 1
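Until such a tool exists, the base64 strings can be decoded by the caller. The sketch below assumes binary values arrive as strings carrying a "base64:" prefix, which is how ExifTool's JSON output marks them when binary extraction is requested; decodeBinaryField is a hypothetical helper.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// decodeBinaryField converts a "base64:..." string into raw bytes.
func decodeBinaryField(v string) ([]byte, error) {
	const prefix = "base64:"
	if !strings.HasPrefix(v, prefix) {
		return nil, fmt.Errorf("value is not base64-prefixed")
	}
	return base64.StdEncoding.DecodeString(strings.TrimPrefix(v, prefix))
}

func main() {
	// "/9j/4A==" is a hand-made sample: the first four bytes of a JPEG header.
	raw, err := decodeBinaryField("base64:/9j/4A==")
	fmt.Println(raw, err)
}
```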