| Author |
Message |
So Cal
Joined: 01 Oct 2007 Posts: 6
|
Posted: Mon Oct 01, 2007 2:24 pm Post subject: doesn't see "mailto:xxx@xxx.com" in files |
|
|
Hi.
I just purchased the product and have the latest version.
I just realized that eMail Extractor has difficulty extracting email addresses from Word or PDF docs if the email address is a hyperlink such as "mailto:xxx@xxx.com."
I tried two workarounds:
1.) Create an exception stating that anything starting with, or containing, "mailto:" is valid. It didn't work.
2.) Save-as/convert the document to a *.txt file, then extract. However, if the visible content reads something like "click here to email Joe" and the email address is only in the underlying link code, i.e., "mailto:joe@xxx.com," then the address is lost/stripped out when the document is converted to text only.
Any ideas?
Thanks,
Rob
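For the second workaround, one possible way around the stripped links is to export the document as HTML instead of plain text, so the "mailto:" targets survive, and then pull them out of the link attributes. The sketch below is not part of eMail Extractor; it is a minimal standalone Python example, and the names `MailtoCollector` and `extract_mailto_addresses` are made up for illustration. It assumes the exported HTML keeps the hyperlinks as ordinary `<a href="mailto:...">` tags.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class MailtoCollector(HTMLParser):
    """Collect addresses from <a href="mailto:..."> links (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.addresses = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.lower().startswith("mailto:"):
                # Drop the "mailto:" scheme and any "?subject=..." query part.
                raw = value[len("mailto:"):].split("?", 1)[0]
                address = unquote(raw).strip()
                if address and address not in self.addresses:
                    self.addresses.append(address)


def extract_mailto_addresses(html_text):
    """Return the unique mailto addresses found in html_text, in order."""
    parser = MailtoCollector()
    parser.feed(html_text)
    return parser.addresses


sample = '<p><a href="mailto:joe@xxx.com">click here to email Joe</a></p>'
print(extract_mailto_addresses(sample))
```

The visible text ("click here to email Joe") is ignored here; only the link target is read, which is exactly the part a plain-text export throws away.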
 |
stanbusk Site Admin
Joined: 28 Dec 2005 Posts: 2268
|
Posted: Mon Oct 01, 2007 3:16 pm Post subject: |
|
|
eMail Extractor can't process Word or PDF files directly, only plain text files. In the case of Word, save the document as plain text first.
 |
So Cal
Joined: 01 Oct 2007 Posts: 6
|
Posted: Mon Oct 01, 2007 3:56 pm Post subject: |
|
|
Thanks for the note, but FYI, that's not been my experience at all.
I can drag-and-drop Acrobat, Word, HTML files, etc. directly onto the eMail Extractor window, and it'll do a nice job extracting email addresses, except that, as I stated, it does not recognize anything with a "mailto:" prefix, as in "mailto:xxx@xxx.com."
Seems odd, because if someone's name just happened to be "mailto:xxx" you'd think it'd recognize it as a complete address. If that were the case, I could then run a search-and-replace filter to find "mailto:" and replace it with nothing.
Anyone else have a thought? Or experience creating a rule to allow it as valid?
-Rob
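The search-and-replace idea above can be sketched outside the product. This is only an illustration in Python, not how eMail Extractor works internally: the pattern `EMAIL_RE` is a deliberately simple assumption rather than a full address grammar, and `extract_addresses` is a hypothetical helper name. It assumes the text containing the "mailto:" strings is already available as plain text.

```python
import re

# Simplified address pattern (an assumption, not a complete RFC grammar).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_addresses(text):
    """Strip any "mailto:" prefix, then return unique addresses in order."""
    # The search-and-replace step: remove the prefix, case-insensitively.
    cleaned = re.sub(r"mailto:", "", text, flags=re.IGNORECASE)
    found = []
    for address in EMAIL_RE.findall(cleaned):
        if address not in found:
            found.append(address)
    return found


text = 'Contact "mailto:xxx@xxx.com." or joe@xxx.com for details.'
print(extract_addresses(text))
```

With this particular pattern the colon would not be matched as part of an address anyway, so the explicit replace is mostly there to mirror the filter step described in the post.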
 |
stanbusk Site Admin
Joined: 28 Dec 2005 Posts: 2268
|
Posted: Mon Oct 01, 2007 8:13 pm Post subject: |
|
|
PDF and Word documents are binary files. They may contain text and a lot of junk as well. eMail Extractor has not been designed to process binary files, only plain text files. Although it may work, it is not recommended at all. Also bear in mind that what you see on screen for a given binary file can be radically different from what eMail Extractor sees at file level. If you have a *plain text* file with a "mailto:xxx" address that eMail Extractor doesn't extract, please open a support ticket and attach the file to it.
 |
So Cal
Joined: 01 Oct 2007 Posts: 6
|
Posted: Tue Oct 02, 2007 5:56 pm Post subject: |
|
|
I'll do that soon. No time right now. Thanks.