Sunday, January 29, 2012

Rename PDF files semi-automatically with MS Word

As an engineer, I often have to download scientific papers from the internet. Ideally, the downloaded file would have a name like:

John Doe - The Importance of Scientific PDFs for Engineers.pdf

or, at least:

jdoe-scientificpdfsengineers.pdf

You know, something that makes desktop search easier later, when you have a folder with maybe hundreds of papers. Instead, the name you get is:

018293857362394932.pdf
or: jd-ispe.pdf

And you don't have time to retype the name, or to copy and paste the author and title, because the copied text comes with those annoying line breaks that PDF files have (plus asterisks, crosses, etc.).
On the other hand, the PDF's properties sometimes contain the proper author and title, but often they don't.

So I wrote this simple macro for MS Word. I know it's not an optimal solution, but it gives good results in principle. Maybe I'll add more features later, but for now it is somewhat helpful.

One example:
The article
The Effect of Age of Cochlear Implantation on Language
Growth in Infants and Toddlers
by
J. Bruce Tomblin, Linda Spencer, & Brittan Barker

can be downloaded from here.

If you download it as is, the file name will be "age-ci.pdf" (not very meaningful, right?)

Now, using this macro, in 3 steps you could have a name like:

"The Effect of Age of Cochlear Implantation on Language Growth in Infants and Toddlers J. Bruce Tomblin, Linda Spencer, Brittan Barker.pdf"

Which, at least to me, seems a LOT more meaningful and easier to find.

Sub CorrPDFtext()
    '
    ' CorrPDFtext Macro
    ' Removes unwanted characters from text copied out of a PDF, to make it easier
    ' to rename a scientific paper.
    ' It can be adapted to many other uses: it is an automatic way of replacing large
    ' amounts of text, you just have to modify it a little.
    ' Copyright Andres Mauricio Gonzalez Vargas, 2012

    '-Instructions:
    ' First, open the PDF you are downloading in your browser.
    ' Second, manually select the title and the authors' names together.
    ' Third, copy the selection and go to MS Word.
    ' Then call the "CorrPDFtext" sub.
    ' Run it in a new, empty document: the macro processes the whole document
    ' and cuts all of it at the end.
    '-----------------

    ' The sub starts by pasting whatever is in the clipboard (the title and authors, supposedly)
    Selection.Paste
    Selection.WholeStory
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting

    ' Here I call the Sub "DoReplace" with a lower and an upper limit,
    ' based on the character codes (ASCII up to 127, the ANSI code page above that)
    DoReplace 0, 31 ' ASCII control characters (codes 0-31), including line breaks
    ' ASCII 32 is " " space
    DoReplace 33, 43 ' Printable characters that are not letters: ! " # $ % & ' ( ) * +
    'DoReplace 41, 42 'ASCII printable characters that are not letters
    ' ASCII 44 is "," comma
    ' ASCII 45 is "-" hyphen
    ' ASCII 46 is "." period
    DoReplace 47, 64 ' Not letters: / : ; < = > ? @ and also the digits 0-9, which are removed too
    DoReplace 91, 96 ' Not letters: [ \ ] ^ _ `
    DoReplace 123, 191 ' { | } ~ and extended symbols (a few accented letters fall in this range and could actually be useful)
    ' Codes 192-255 are mostly letters with acute, tilde, dieresis, etc.

    ' The following gets rid of the multiple spaces left by the replacement process.
    ' Each pass replaces two consecutive spaces with a single one.
    Dim i As Long
    For i = 1 To 3
        With Selection.Find
            .Text = "  " ' two spaces
            .Replacement.Text = " " ' one space
            .Forward = True
            .Wrap = wdFindStop ' This avoids the confirmation message box
            .Format = False
            .MatchCase = False
            .MatchWholeWord = False
            .MatchWildcards = False
            .MatchSoundsLike = False
            .MatchAllWordForms = False
        End With
        Selection.Find.Execute Replace:=wdReplaceAll
    Next i

    '---------------------
    ' Finally, cut everything to the clipboard, so you can paste it in the "Save As" dialog
    Selection.WholeStory
    Selection.Cut
End Sub

Sub DoReplace(ByVal loLim As Long, ByVal upLim As Long)
    ' This sub replaces every character whose code lies between the lower and upper limits

    Dim i As Long
    For i = loLim To upLim
        With Selection.Find
            .Text = Chr(i) ' The text to be replaced is the character with the current code
            .Replacement.Text = " " ' The replacement string is a space. It could be empty,
                                    ' but I prefer a space so that separate words are not joined
            .Forward = True
            .Wrap = wdFindStop ' This way you avoid the confirmation window
            .Format = False
            .MatchCase = False
            .MatchWholeWord = False
            .MatchWildcards = False
            .MatchSoundsLike = False
            .MatchAllWordForms = False
        End With
        Selection.Find.Execute Replace:=wdReplaceAll
    Next i
End Sub

'---------And that's it!
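
If you want to check which characters survive the cleanup before running the macro on a real title, the same character ranges can be applied to a plain string. The following is only a sketch: the names CleanTitleSketch and DemoCleanTitle are made up for this illustration and are not part of the macro above, and it uses the VBA Replace function on a string instead of Word's Find, so results may differ slightly from what Word does with some special characters.

```vba
' Hypothetical helper, not part of the original macro: applies the same
' character ranges to a string instead of to the document.
Function CleanTitleSketch(ByVal raw As String) As String
    Dim ranges As Variant
    Dim r As Long, code As Long
    Dim result As String

    ' Same (lower, upper) limits that CorrPDFtext passes to DoReplace
    ranges = Array(0, 31, 33, 43, 47, 64, 91, 96, 123, 191)

    result = raw
    For r = LBound(ranges) To UBound(ranges) - 1 Step 2
        For code = ranges(r) To ranges(r + 1)
            ' Every unwanted character becomes a space, as in DoReplace
            result = Replace(result, Chr$(code), " ")
        Next code
    Next r

    ' Collapse runs of spaces into a single space
    Do While InStr(result, "  ") > 0
        result = Replace(result, "  ", " ")
    Loop

    ' Assumption: trimming is added here for convenience; the macro does not trim
    CleanTitleSketch = Trim$(result)
End Function

Sub DemoCleanTitle()
    ' A title fragment with an asterisk and a line break, as copied from a PDF
    Debug.Print CleanTitleSketch("Growth in Infants*" & vbCr & "and Toddlers")
    ' Prints: Growth in Infants and Toddlers
End Sub
```

Run DemoCleanTitle from the VBA editor and look at the Immediate window (Ctrl+G). Keep in mind that digits fall inside the 47-64 range, so a title like "Part 2" would lose its number.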


