path: root/content/blog/2025-02-24-email-migration.org
author: Christian Cleberg <hello@cleberg.net>, 2025-02-24 20:29:56 -0600
committer: Christian Cleberg <hello@cleberg.net>, 2025-02-24 20:29:56 -0600
commit: 3c43dffbd2455ca69f79b1601122057f3ed5054e
tree: fc20d9b4a093234e93de7a8f45cc7b13c2efa7de /content/blog/2025-02-24-email-migration.org
parent: 8be42db8eca7931a3da077b80bb0d96d8bc2dda5
publish email-migration.org
Diffstat (limited to 'content/blog/2025-02-24-email-migration.org')
-rw-r--r-- content/blog/2025-02-24-email-migration.org | 252
1 files changed, 252 insertions, 0 deletions
diff --git a/content/blog/2025-02-24-email-migration.org b/content/blog/2025-02-24-email-migration.org
new file mode 100644
index 0000000..7f3d921
--- /dev/null
+++ b/content/blog/2025-02-24-email-migration.org
@@ -0,0 +1,252 @@
+#+date: <2025-02-24 Mon 19:20:05>
+#+title: A Painful Email Migration
+#+description: Some notes on my painful experience migrating emails out of the Proton Mail platform.
+#+filetags: :email:
+#+slug: email-migration
+
+* The Setup
+
+I recently migrated my emails from Proton Mail to Migadu after a failed attempt
+to get myself into the Proton ecosystem and wanted to detail my process, as it
+was far more painful than expected.
+
+To give some context: I had nearly 5000 messages stored, accounting for around
+2.5 GB of space.
+
+Overall, this process would have taken all day had I done it in one sitting, but
+I decided to break it up, and it took a couple of days before I could say that
+all my messages were stored in my new account.
+
+* Exporting Messages
+
+To start, I needed to export my messages from Proton Mail. As I am using macOS,
+I was able to use the [[https://proton.me/support/proton-mail-export-tool][Proton Mail Export Tool]]. However, the downside is that
+this dumps every single email on your account into a single folder in the =.eml=
+format. It also exports a JSON file for each message, in case you're importing
+back into Proton Mail.
+
+This means that everything in my Inbox, Sent, Archive, Trash, and user-created
+folders was dumped into a single folder with incomprehensible file names.
+
+Without a clear path to easily figure out how to re-organize my emails into a
+new account, I was left a bit annoyed at Proton's export process.
+
+* Importing Messages
+
+Left with a pile of messages and no way to discern what they were without
+opening each one, I decided to try and use Thunderbird to import messages into
+my new Migadu IMAP account.
+
+This led to a dead end as my two methods failed:
+
+1. [[https://addons.thunderbird.net/en-US/thunderbird/addon/importexporttools-ng/][ImportExportTools NG]] does not work with my version of Thunderbird (135).
+2. Manually dragging the =.eml= files onto a folder in Thunderbird worked for
+ small batches of files, but seemed to lock up if I tried to import more than
+ a few hundred at a time. It also seemed a bit buggy, as I ended up with many
+ duplicate, and sometimes triplicate, messages.
+
+At this point, I decided to take a step back and use [[https://github.com/djcb/mu][mu]], a command-line utility
+that indexes a local mail directory, together with =mbsync= to sync back and
+forth with Migadu for me.
+
+Following my blog post, [[https://cleberg.net/blog/mu4e.html][Email in Doom Emacs with Mu4e on macOS]], and skipping the
+mu4e parts, I was able to set up a minimal directory connected to my Migadu IMAP
+account. Using my terminal, I simply moved all of my messages into the =mu=
+directory and synchronized the account, and voila, my messages synchronized
+successfully to the remote server and my other email clients.
+
+However, the remaining issue was that I now had all 5000 messages in the Archive
+folder and needed to figure out how to organize them back into their proper
+directories.
+
+* Organizing Messages Into Folders
+
+As with any problem, I reached for Python as my hammer. I started by creating
+the required directories in Thunderbird, fetching them with =mbsync= so that
+they appeared in my =mu= directory, and using Python to organize my messages
+into the newly created sub-folders.
+
+** Sent Messages
+
+I started by organizing my Sent messages. This required checking each file's
+=From= header and moving any message I had sent to the Sent folder.
+
+#+begin_src shell
+cd ~/.maildir/migadu/Archive/cur
+nano _sent.py
+#+end_src
+
+#+begin_src python
+# _sent.py
+import glob
+import os
+import shutil
+
+# Maildir folders to move messages between
+ARCHIVE_DIR = "/Users/YOUR_USERNAME/.maildir/migadu/Archive/cur/"
+SENT_DIR = "/Users/YOUR_USERNAME/.maildir/migadu/Sent/cur/"
+
+# Loop through all files in the current folder
+for file in glob.glob("*.eml"):
+    # Track whether the message should be moved
+    move = False
+
+    # Open the file; replace undecodable bytes rather than crash on them
+    with open(file, "r", errors="replace") as f:
+        for line in f:
+            # A blank line ends the headers, so stop before the body
+            if line.strip() == "":
+                break
+            # If the From header contains our address, mark the message
+            if line.startswith("From:") and "user@example.com" in line:
+                move = True
+                break
+
+    # Move the file, if marked for move
+    if move:
+        filepath = os.path.join(ARCHIVE_DIR, file)
+        new_filepath = os.path.join(SENT_DIR, file)
+        shutil.move(filepath, new_filepath)
+#+end_src
+
+#+begin_src shell
+python3 _sent.py
+#+end_src
+
+The only downside to this approach is that it was the quick and dirty option: I
+re-ran the script, editing the =user@example.com= string each time, for every
+address I wanted to move. For a more well-defined solution, I would have created
+an array of addresses and had the =if= statement check against that array.
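
The array-of-addresses idea could be sketched roughly as follows. This is only an
illustration, not the script that was actually run: =MY_ADDRESSES=, =is_sent=, and
the sample message are hypothetical names and data, and the standard library's
header parser is swapped in for the line-by-line scan so that only real headers
are inspected.

```python
from email.parser import HeaderParser
from email.utils import getaddresses

# Hypothetical addresses (lowercase); replace with the ones you send from.
MY_ADDRESSES = {"user@example.com", "alias@example.com"}


def is_sent(raw_message: str, my_addresses=MY_ADDRESSES) -> bool:
    """Return True if any address in the From header is one of ours."""
    # headersonly=True stops parsing at the blank line that ends the headers
    headers = HeaderParser().parsestr(raw_message, headersonly=True)
    from_values = headers.get_all("From", [])
    senders = {addr.lower() for _name, addr in getaddresses(from_values)}
    return not senders.isdisjoint(my_addresses)


# Made-up message: the second "From:" line is body text, not a header
sample = (
    "From: Me <User@Example.com>\n"
    "To: friend@example.org\n"
    "Subject: Hi\n"
    "\n"
    "From: this line is part of the body\n"
)
print(is_sent(sample))
```

With a helper like this, a single pass over the folder would cover every address
at once instead of one run per address.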
+
+Regardless, I was able to run this with the addresses I wanted to move to the
+Sent folder and was soon finished.
+
+** Archive Sub-Folders
+
+Next, I needed to move the remaining ~3000 messages from the Archive folder into
+dated sub-folders, organized as follows:
+
+- Archive/2016
+- ...
+- Archive/2025
+
+
+To do this, I followed an approach similar to the one above, but checked for the
+=X-Pm-Date= header instead of the =From= header.
+
+#+begin_src shell
+cd ~/.maildir/migadu/Archive/cur
+nano _archive.py
+#+end_src
+
+This approach requires finding the =X-Pm-Date= header and splitting it on the
+spaces contained within. Once split into a list, we must select the element at
+index 4 (the fifth item), as that contains the year, which matches the directory
+we should move the message to.
+
+For example, the header =X-Pm-Date: Fri, 07 Feb 2025 16:12:08 +0000= will be
+split into a list as such:
+
+0. X-Pm-Date:
+1. Fri,
+2. 07
+3. Feb
+4. 2025
+5. 16:12:08
+6. +0000
+
+From this list, we select the element at index 4 (=2025=) and use that to build
+the destination path.
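
As an alternative to counting list positions, the year can also be read by
parsing the date itself. The sketch below assumes the header value is a
standard email date like the example above; =year_from_date_header= is a
hypothetical helper name, and =parsedate_to_datetime= comes from Python's
standard =email.utils= module.

```python
from email.utils import parsedate_to_datetime


def year_from_date_header(line: str) -> str:
    """Extract the year from a 'Name: <email date>' header line."""
    # Everything after the first colon is the date value
    _name, _sep, value = line.partition(":")
    return str(parsedate_to_datetime(value.strip()).year)


print(year_from_date_header("X-Pm-Date: Fri, 07 Feb 2025 16:12:08 +0000"))
```

Parsing the date this way does not depend on how many spaces separate the
fields, which positional splitting is sensitive to.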
+
+#+begin_src python
+# _archive.py
+import glob
+import os
+import shutil
+
+# Base Maildir folder holding the messages and the yearly sub-folders
+ARCHIVE_DIR = "/Users/YOUR_USERNAME/.maildir/migadu/Archive/"
+
+# Loop through all files in the current folder (Archive/cur)
+for file in glob.glob("*.eml"):
+    # Year taken from the X-Pm-Date header; None until we find it
+    year = None
+
+    # Open the file; replace undecodable bytes rather than crash on them
+    with open(file, "r", errors="replace") as f:
+        for line in f:
+            # A blank line ends the headers, so stop before the body
+            if line.strip() == "":
+                break
+            if line.startswith("X-Pm-Date:"):
+                # Split the line on whitespace;
+                # index 4 holds the year
+                year = line.split()[4]
+                break
+
+    # Move the file, if a year was found
+    if year is not None:
+        filepath = os.path.join(ARCHIVE_DIR, "cur", file)
+        new_filepath = os.path.join(ARCHIVE_DIR, year, "cur", file)
+        shutil.move(filepath, new_filepath)
+#+end_src
+
+#+begin_src shell
+python3 _archive.py
+#+end_src
+
+At this point, we've now moved all Sent messages to the Sent box and organized
+all messages under the Archive folder into their correct sub-folders.
+
+If you exported other files, such as files from your Inbox, Trash, etc., you
+could follow a similar approach and determine the best header or attribute to
+identify them for further organization.
+
+** Synchronize the Results
+
+Before synchronizing the files in their new locations, I needed to remove the
+characters at the end of each file name, since =mbsync= appends a per-folder
+message UID (a suffix beginning with ",U=") to the end of file names.
+
+#+begin_src shell
+cd ~/.maildir/migadu/Archive
+nano _sync_prep.py
+#+end_src
+
+This script prepares the =Archive= sub-folders for synchronization, but the same
+concept applies to the Sent folder, except you'd replace =*/cur/*= with =*= if
+this script were inside the =Sent/cur= directory.
+
+#+begin_src python
+# _sync_prep.py
+import glob
+import shutil
+
+# Loop through all files in the sub-folders under Archive
+for file in glob.glob("*/cur/*"):
+    # Remove the ",U=..." suffix from the end of the file name
+    new_file = file.split(",U=", 1)[0]
+    # Rename the file only if the name actually changed
+    if new_file != file:
+        shutil.move(file, new_file)
+#+end_src
+
+#+begin_src shell
+python3 _sync_prep.py
+#+end_src
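
Before renaming thousands of files, it may help to preview what the split would
do. The sketch below is a dry run that only prints the planned renames and
touches nothing on disk. The helper names and the sample file names are made
up for illustration.

```python
def strip_uid_suffix(filename: str) -> str:
    """Drop everything from ',U=' onward, mirroring the split used above."""
    return filename.split(",U=", 1)[0]


def plan_renames(filenames):
    """Return (old, new) pairs for names that would change; no files are moved."""
    return [(name, strip_uid_suffix(name)) for name in filenames if ",U=" in name]


# Hypothetical file names, invented for this example
names = [
    "2016/cur/1740450000.1_1.host,U=42:2,S",
    "2016/cur/already-clean.eml",
]
for old, new in plan_renames(names):
    print(f"{old} -> {new}")
```

Note that the split also discards anything that follows the UID, such as the
=:2,S= block in the sample name, so it is worth checking the preview to confirm
that losing that part of the name is acceptable.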
+
+Finally, we can synchronize the results.
+
+#+begin_src shell
+mbsync -aV
+#+end_src
+
+* Removing Duplicates
+
+My only remaining issue at the time of writing is identifying and removing
+duplicate messages. I have toyed with simple Python and command-line solutions
+to identify duplicate files, but could not get them to reliably identify all
+the duplicates in any specific directory.
+
+I've even tried using the [[https://github.com/pkolaczk/fclones][fclones]] utility, to no avail. It seems that something
+in the Proton export, my manual Thunderbird import attempt, or possible sync
+issues between Thunderbird -> Migadu <-> mu caused duplicates where content
+within the message has been modified.
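
One untested idea for the modified-content case is to group messages by their
=Message-ID= header instead of comparing file contents. The sketch below assumes
that duplicated messages kept the same =Message-ID=, which has not been verified
for this export. =find_duplicates= is a hypothetical helper, and the demo runs
on throwaway files rather than a real mail folder.

```python
import tempfile
from collections import defaultdict
from email.parser import BytesHeaderParser
from pathlib import Path


def find_duplicates(folder: str) -> dict:
    """Map each Message-ID to the files sharing it, keeping only repeated IDs."""
    by_id = defaultdict(list)
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        # Parse the headers only; message bodies may differ between copies
        with path.open("rb") as fh:
            headers = BytesHeaderParser().parse(fh, headersonly=True)
        message_id = headers.get("Message-ID")
        if message_id:
            by_id[str(message_id).strip()].append(path)
    return {mid: paths for mid, paths in by_id.items() if len(paths) > 1}


# Demo on temporary files: "a" and "b" share an ID but have different bodies
with tempfile.TemporaryDirectory() as tmp:
    samples = [("a", "<1@example.com>"), ("b", "<1@example.com>"), ("c", "<2@example.com>")]
    for name, mid in samples:
        Path(tmp, name).write_text(f"Message-ID: {mid}\nSubject: test\n\nbody {name}\n")
    for mid, paths in find_duplicates(tmp).items():
        print(mid, [p.name for p in paths])
```

The function only reports candidates. Deciding which copy to keep, and deleting
the rest, is deliberately left as a manual step.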
+
+Although I now seem to be wasting space and in need of a deduplication tool, I
+have all of my messages migrated to my new service.