I found this method (it is surely not optimal; better ideas are welcome).
First, create a file where the results will be stored:
$ touch /tmp/hits
Then list the files in your folder, send them to pdftotext, and grep the result. For each file we print its name followed by whatever grep matched, so every line of the results starts with the file name. In this case I was searching for papers mentioning "Tensegrity":
$ ls -1 | parallel 'pdftotext {1} - | echo {1}: $(grep -i tensegrity) >> /tmp/hits'
Finally, you can explore the results in the /tmp/hits file. Note that this uses the GNU parallel command.
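For anyone without GNU parallel, the same search can be sketched as a plain sequential loop. This is only a rough alternative, not a replacement for the one-liner: search_pdfs is a name I made up, it assumes the PDFs end in .pdf, and it records only the files that actually match.

```shell
# Sketch: search every PDF in a directory for a term, one file at a time.
# search_pdfs is a hypothetical helper name, not part of any tool.
search_pdfs() (
    term="$1"
    dir="${2:-.}"
    for pdf in "$dir"/*.pdf; do
        [ -e "$pdf" ] || continue    # skip when the glob matches nothing
        # "pdftotext FILE -" writes the extracted text to stdout
        matches=$(pdftotext "$pdf" - 2>/dev/null | grep -i -- "$term")
        # Report only the files that contain the term
        if [ -n "$matches" ]; then
            printf '%s:\n%s\n' "$pdf" "$matches"
        fi
    done
)
```

It would be called as search_pdfs tensegrity . > /tmp/hits. It is slower than the parallel version on a large folder, but it needs nothing beyond pdftotext and grep.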
A useful command to explore the results is
$ cat /tmp/hits | sed -n -e '/:$/!p'
This shows only the files that actually contain the string we searched for: a file with no matches leaves a line ending in ":", and sed skips those lines.
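The same filtering can also be sketched with grep -v, here on a small made-up sample instead of the real /tmp/hits (the three file names and their matches are invented for illustration):

```shell
# Hypothetical sample in the same "name: matches" format as /tmp/hits
hits=$(mktemp)
printf '%s\n' 'a.pdf: a tensegrity mast' 'b.pdf:' 'c.pdf: Tensegrity robots' > "$hits"

# Keep the lines that do not end in ":", i.e. files with at least one match
grep -v ':$' "$hits"

# List only the names of those files (assumes no ":" inside the file names)
grep -v ':$' "$hits" | cut -d: -f1
```

One caveat that applies to the sed version too: a file whose matched text happens to end in ":" would be filtered out along with the empty ones.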
Useful?