diff options
author | Christian Cleberg <hello@cleberg.net> | 2025-02-24 20:29:56 -0600 |
---|---|---|
committer | Christian Cleberg <hello@cleberg.net> | 2025-02-24 20:29:56 -0600 |
commit | 3c43dffbd2455ca69f79b1601122057f3ed5054e (patch) | |
tree | fc20d9b4a093234e93de7a8f45cc7b13c2efa7de | |
parent | 8be42db8eca7931a3da077b80bb0d96d8bc2dda5 (diff) | |
download | cleberg.net-3c43dffbd2455ca69f79b1601122057f3ed5054e.tar.gz cleberg.net-3c43dffbd2455ca69f79b1601122057f3ed5054e.tar.bz2 cleberg.net-3c43dffbd2455ca69f79b1601122057f3ed5054e.zip |
publish email-migration.org
-rw-r--r-- | content/blog/2025-02-24-email-migration.org | 252 | ||||
-rw-r--r-- | theme/templates/index.html | 12 |
2 files changed, 257 insertions, 7 deletions
diff --git a/content/blog/2025-02-24-email-migration.org b/content/blog/2025-02-24-email-migration.org new file mode 100644 index 0000000..7f3d921 --- /dev/null +++ b/content/blog/2025-02-24-email-migration.org @@ -0,0 +1,252 @@ +#+date: <2025-02-25 Mon 19:20:05> +#+title: A Painful Email Migration +#+description: Some notes on my painful experience migrating emails out of the Proton Mail platform. +#+filetags: :email: +#+slug: email-migration + +* The Setup + +I recently migrated my emails from Proton Mail to Migadu after a failed attempt +to get myself into the Proton ecosystem and wanted to detail my process, as it +was far more painful than expected. + +To give some context: I had nearly 5000 messages stored, accounting for around +2.5 GB of space. + +Overall, this process would have taken all day had I done it in one sitting, but +I decided to break it up and it lasted a couple days before I was able to say that +all my messages were stored in my new account. + +* Exporting Messages + +To start, I needed to export my messages from Proton Mail. As I am using macOS, +I was able to use the [[https://proton.me/support/proton-mail-export-tool][Proton Mail Export Tool]]. However, the downside is that +this dumps every single email on your account into a single folder in the =.eml= +format. They also export a JSON file for each message, in the case you're +importing back into Proton Mail. + +This means that anything in my Inbox, Sent, Archive, Trash, and user-created +folders were all dumped out into a single folder with incomprehensible names. + +Without a clear path to easily figure out how to re-organize my emails into a +new account, I was left a bit annoyed at Proton's export process. + +* Importing Messages + +Left with a pile of messages and no way to discern what they were without +opening each one, I decided to try and use Thunderbird to import messages into +my new Migadu IMAP account. + +This led to a dead end as my two methods failed: + +1. [[https://addons.thunderbird.net/en-US/thunderbird/addon/importexporttools-ng/][ImportExportTools NG]] does not work with my version of Thunderbird (135). +2. Manually dragging the =.eml= files onto a folder in Thunderbird worked for + small batches of files, but seemed to lock up if I tried to import more than + a few hundred at a time. It also seemed a bit buggy, as I ended up with many + duplicate, and sometimes triplicate, messages. + +At this point, I decided to take a step back and use [[https://github.com/djcb/mu][mu]], a command-line utility +that would index my files and sync back and forth with Migadu for me. + +Using my blog post, [[https://cleberg.net/blog/mu4e.html][Email in Doom Emacs with Mu4e on macOS]], (and skipping the +mu4e parts) I was able to set up a minimal directory connected to my Migadu IMAP +account. Using my terminal, I simply moved all of my messages into the =mu= +directory and synchronized the account, and voila, my messages synchronized +successfully to the remote server and my other email clients. + +However, the remaining issue was that I now had all 5000 messages in the Archive +folder and needed to figure out how to organize them back into their proper +directories. + +* Organizing Messages Into Folders + +As with any problem, I used Python as my hammer to fix the problem. I started by +creating the directories required in Thunderbird, fetching them with =mbsync= so +that they appeared in my =mu= directory, and using Python to organize my +messages into the newly-created sub-folders. + +** Sent Messages + +I started by organizing my Sent messages. This required checking each file for +the =From= header and moving them to the Sent folder. + +#+begin_src shell +cd ~/.maildir/migadu/Archive/cur +nano _sent.py +#+end_src + +#+begin_src python +# _sent.py +import os +import glob +import shutil + +# Loop through all files in the current folder +for file in glob.glob("*.eml"): + # Create boolean to check if we should move the file + move = False + + # Open the current file + f = open(file, 'r') + + # For each line in file, find the From header + for line in f: + if line.startswith("From:"): + # If we find ourself, mark the message for move + if "user@example.com" in line: + move = True + + # Close the file + f.close() + + # Move the file, if marked for move + if move == True: + filepath = os.path.join("/Users/YOUR_USERNAME/.maildir/migadu/Archive/cur/", file) + new_filepath = os.path.join("/Users/YOUR_USERNAME/.maildir/migadu/Sent/cur/", file) + shutil.move(filepath, new_filepath) +#+end_src + +#+begin_src python +python3 _sent.py +#+end_src + +The only downside to my current approach is that it was the quick and dirty +option, so I re-ran it while editing the =user@example.com= string for each +email I wanted to move. If I had wanted to create a more well-defined solution, +I would have created an array of addresses to check for and have the =if= +statement check against that array. + +Regardless, I was able to run this with the addresses I wanted to move to the +Sent folder and was soon finished. + +** Archive Sub-Folders + +Next, I needed to move the remaining ~3000 messages from the Archive folder into +dated sub-folders, organized as such: + +- Archive/2016 +- ... +- Archive/2025 + + +To do this, I followed a similar approach as the method above but check for the +=Date= header instead of the =From= header. + +#+begin_src shell +cd ~/.maildir/migadu/Archive/cur +nano _archive.py +#+end_src + +This approach requires finding the =X-Pm-Date= header and splitting it by the +spaces contained within. Once split into a list, we must select the fourth +element, as that contains the year which will match the directory we should move +it to. + +For example, the header =X-Pm-Date: Fri, 07 Feb 2025 16:12:08 +0000= will be +split into a list as such: + +0. X-Pm-Date: +1. Fri, +2. 07 +3. Feb +4. 2025 +5. 16:12:08 +6. +0000 + +From this list, we select the fourth element (=2025=) and use that to build the +destination path. + +#+begin_src python +# _archive.py +import os +import glob +import shutil + +# Loop through all files in the sub-folders under Archive +for file in glob.glob("*.eml"): + # Create boolean to check if we should move the file + move = False + + # Open the current file + f = open(file, 'r') + + # For each line in file, find the X-Pm-Date header + for line in f: + if line.startswith("X-Pm-Date"): + # Split the line into a list by spaces; + # Then select the item that contains the year + year = line.split(" ")[4] + move = True + + # Close the file + f.close() + + # Move the file, if marked for move + if move == True: + filepath = os.path.join("/Users/YOUR_USERNAME/.maildir/migadu/Archive/cur/", file) + new_filepath = os.path.join(f"/Users/YOUR_USERNAME/.maildir/migadu/Archive/{year}/cur/", file) + shutil.move(filepath, new_filepath) +#+end_src + +#+begin_src python +python3 _archive.py +#+end_src + +At this point, we've now moved all Sent messages to the Sent box and organized +all messages under the Archive folder into their correct sub-folders. + +If you exported other files, such as files from your Inbox, Trash, etc., you +could follow a similar approach and determine the best header or attribute to +identify them for further organization. + +** Synchronize the Results + +Before synchronizing the files in their new locations, I needed to remove the +characters at the end of the file name since =mu= appends IDs to the end of file +names. + +#+begin_src shell +cd ~/.maildir/migadu/Archive +nano _sync_prep.py +#+end_src + +This script prepares the =Archive= sub-folders for synchronization, but the same +concept applies to the Sent folder, except you'd replace =*/cur/*= with =*= if +this script were inside the =Sent/cur= directory. + +#+begin_src python +import glob +import shutil + +# Loop through all files in the sub-folders under Archive +for file in glob.glob("*/cur/*"): + # Remove the characters at the end of the file name created by =mu= + new_file = file.split(",U=",1)[0] + # Move the file to the new file name + shutil.move(file, new_file) +#+end_src + +#+begin_src shell +python3 _sync_prep.py +#+end_src + +Finally, we can synchronize the results. + +#+begin_src shell +mbsync -aV +#+end_src + +* Removing Duplicates + +My only remaining issue at the time of writing is identifying and removing +duplicate messages. I have toyed with simple Python and command-line solutions +to identify duplicate files, but could not get them to effectively define all +the duplicates found in any specific directory. + +I've even tried using the [[https://github.com/pkolaczk/fclones][fclones]] utility, to no avail. It seems that something +in the Proton export, my manual Thunderbird method attempt, or possible sync +issues between Thunderbird -> Migadu <-> mu caused duplicates where content +within the message has been modified. + +Although I now seem to be wasting space and in need of a deduplication tool, I +have all of my messages migrated to my new service. diff --git a/theme/templates/index.html b/theme/templates/index.html index 198ef70..905bd57 100644 --- a/theme/templates/index.html +++ b/theme/templates/index.html @@ -9,6 +9,11 @@ uid [ultimate] Christian Cleberg <hello@cleberg.net></pre> <h2>Recent Blog Posts</h2> <div class="post"> + <time datetime="2025-02-24 19:20:05">2025-02-24</time> + <a href="/blog/email-migration.html" >A Painful Email Migration</a> + </div> + + <div class="post"> <time datetime="2025-02-11 11:40:00">2025-02-11</time> <a href="/blog/obscura-vpn.html">A Review of Obscura VPN: A Two-Party VPN</a> @@ -21,13 +26,6 @@ uid [ultimate] Christian Cleberg <hello@cleberg.net></pre> > </div> - <div class="post"> - <time datetime="2024-12-29 17:45:00">2024-12-29</time> - <a href="/blog/self-hosting-the-lounge.html" - >Self-Hosting The Lounge: An IRC Web Client</a - > - </div> - <br /> <a href="/blog/">All Posts →</a> </section> |