Lots of spurious partial addresses with Gmail Takeout mbox

#1
Hello,

I am using Email Extractor to get addresses from an mbox file generated by Google's data takeout function.

The process works, but I find that I get a lot of partial addresses - some truncated at the beginning and some at the end. Examples:

frances@somewhere.com
rances@somewhere.com
es@somewhere.com

donna@somewhere.com
na@somewhere.com

lpalomino@somewhere.com
alomino@somewhere.com

somebody@otherplace.com
somebody@otherplace.co

I also see a lot of what appear to be SMTP message IDs:
c6ee96ed-3fad-4bcc-90c5-ddfd6a047688@somewhere.com
caj2qrmy1pyif6nv2swrphplopwn5erocmuneyp ... .gmail.com
f26551f62e62400b91a6ee2d6c35fb48@dm2pr0 ... utlook.com
dc115947bc87b943b8d95092c02cad53a63952e ... ompany.com

There's far too much junk in here for this to be usable. Is there any way of resolving this?
 

stanbusk

Administrator
Staff member
#2
Why do you use mbox files? Why not save the messages from your mail software as plain text files instead? That way you work from plain text files.
 
#3
The use of mbox files is dictated by the fact that I am working with a full mailbox export from Gmail. That's the format they provide. Also, an mbox file is just a big text file.
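Because an mbox file is plain text holding one message after another, it can also be read message by message. A minimal sketch, assuming Python's standard mailbox module is an acceptable way to read the export: it collects addresses only from the address headers, so Message-ID values and body text are never scanned. The function name header_addresses, the header list, and the file name in the usage comment are illustrative choices, not something eMail Extractor or Gmail provides.

```python
import mailbox
from email.utils import getaddresses

# Headers assumed to carry real correspondents (an illustrative choice).
ADDRESS_HEADERS = ("From", "To", "Cc", "Bcc", "Reply-To")


def header_addresses(mbox_path):
    """Return the sorted, de-duplicated addresses found in address headers."""
    found = set()
    # create=False: raise an error instead of creating an empty file
    # when the path does not exist.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            for name in ADDRESS_HEADERS:
                values = [str(v) for v in message.get_all(name, [])]
                for _display_name, address in getaddresses(values):
                    if "@" in address:
                        found.add(address.lower())
    finally:
        box.close()
    return sorted(found)


# Usage (hypothetical file name):
# for address in header_addresses("takeout.mbox"):
#     print(address)
```

This trades completeness for cleanliness: addresses that appear only inside message bodies are skipped.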
 

stanbusk

Administrator
Staff member
#4
Yes, but something must be wrong with the file, otherwise you would not get that problem. eMail Extractor does nothing to the file; it just extracts email addresses. An address is any piece of text, delimited by spaces, line breaks or a few other characters, that contains an '@', a domain name and an extension. Open your mbox file in a text editor and search for one of those addresses. How does it appear there?
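To make that rule concrete, here is a rough sketch of a delimiter-based matcher together with a helper that prints the text around a suspicious address. It is not eMail Extractor's actual code: the regular expression, the function names extract_addresses and show_context, and the sample strings are assumptions for illustration only. Under such a rule, a Message-ID value qualifies as an address, and a line break that falls inside an address leaves only a fragment.

```python
import re

# Assumed rule: a run of address characters, an '@', a dotted domain name
# and an alphabetic extension. Anything else acts as a delimiter.
ADDRESS_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


def extract_addresses(text):
    """Return every substring of text that matches the assumed rule."""
    return [match.group(0).lower() for match in ADDRESS_RE.finditer(text)]


def show_context(text, needle, width=30):
    """Return each occurrence of needle with up to width characters on each side."""
    snippets = []
    start = text.find(needle)
    while start != -1:
        lo = max(0, start - width)
        hi = min(len(text), start + len(needle) + width)
        snippets.append(text[lo:hi])
        start = text.find(needle, start + 1)
    return snippets


# Hypothetical samples: a Message-ID header, and an address broken by a newline.
print(extract_addresses("Message-ID: <abc123@mail.somewhere.com>"))
print(extract_addresses("To: fran\nces@somewhere.com"))
# repr() makes line breaks and other hidden characters visible.
for snippet in show_context("To: fran\nces@somewhere.com", "ces@somewhere.com", 10):
    print(repr(snippet))
```

Running show_context on the real mbox text with one of the partial addresses shows which character sits immediately before or after the fragment.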
 