Recap
The previous post detailed how Rclone, unlike other programs, can reliably upload large files to Backblaze along with their checksums. This post outlines the workflow and some gotchas to keep in mind when doing massive data loads over the internet.
Through trial and error, I was able to archive 8 TB of footage from my Synology NAS to Backblaze B2 in about a month.
To Keep in Mind
First, the overall workflow.
Remote to Remote is Possible
Keep in mind that Rclone supports copying between two remotes directly. The computer running Rclone streams the data through its RAM as it shuttles it between the two.
In fact, that's mostly what I did: I transferred assets from a personal B2 bucket to the organization's new B2 bucket. Pretty neat!
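A minimal sketch of such a bucket-to-bucket transfer, assuming both remotes were already created with rclone config. The remote and bucket names (b2-personal:old-footage, b2-org:archive) and the helper name copy_between_remotes are hypothetical. Only the rclone subcommand and flags are real.

```shell
#!/bin/sh
# Copy one B2 bucket path into another; data streams through this machine.
# Hypothetical remote:bucket/path names, as set up with `rclone config`.
SRC="b2-personal:old-footage/Projects/"
DST="b2-org:archive/Projects/"

copy_between_remotes() {
  # $1 = source remote path, $2 = destination remote path
  # --dry-run only reports what would be copied; -v lists each file.
  rclone copy "$1" "$2" --dry-run -v
}

# Usage: copy_between_remotes "$SRC" "$DST"
```

Once the dry-run listing looks right, drop --dry-run to do the actual transfer.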
List Folders Syntax: lsd
After setting up your remote with rclone config, use the list directory command lsd to double-check your source/target folders.
For example, if the B2 remote is named b2-remote1, then the command to list the root is:
rclone lsd b2-remote1:
Note the : at the end.
If a folder name contains spaces, wrap the path in double quotes rather than backticks. Also, use a trailing forward slash (/) instead of an asterisk (*) to indicate the files inside.
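To illustrate the listing and quoting advice, here is a small sketch. It reuses the b2-remote1 name from the example above. The helper names and the my-bucket/Client Shoots/ path are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers around `rclone lsd`; "b2-remote1" is the post's example remote.
REMOTE="b2-remote1"

list_root() {
  # The trailing colon means "the root of this remote" (i.e. its buckets).
  rclone lsd "${REMOTE}:"
}

list_folder() {
  # Double quotes keep a path containing spaces as ONE argument to rclone.
  # A trailing slash (rather than *) refers to the folder's contents.
  rclone lsd "${REMOTE}:$1"
}

# Usage: list_folder "my-bucket/Client Shoots/"
```

Without quotes at the call site, the shell would split the path at the space, and this helper would pass only my-bucket/Client to rclone.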
Consider copy instead of sync
From the docs[1]:
- rclone copy: Copy files from source to dest, skipping already copied.
- rclone sync: Make source and dest identical, modifying destination only.
Depending on your intention, copy may be better: unlike sync, it never deletes files from the destination.
Expect Errors and Verify
Although Rclone automatically retries failed uploads (by default, up to 10 low-level retries per operation), there are a few reasons why files may never get uploaded. See the appendix for various scenarios.
Therefore, always verify your transfer afterwards (see below).
Beware Quota Restrictions
Unexpected EOF (end of file) errors can occur when streaming from a remote because of Backblaze quota restrictions.
Double Check the Source Supports (and has) Checksums
Since Backblaze only supports SHA-1 checksums, the Rclone docs indicate the source must also support SHA-1 checksums[2]:
For a large file to be uploaded with an SHA1 checksum, the source needs to support SHA1 checksums. The local disk supports SHA1 checksums so large file transfers from local disk will have an SHA1. See the overview for exactly which remotes support SHA1.
So B2-to-B2 syncs should always populate checksums, right? Wrong. They only will if the files in the source B2 bucket have checksums.
As detailed in the previous post, that means large files will only have checksums if they were originally uploaded with Rclone.
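One way to check this before a B2-to-B2 copy is to list the source's stored SHA-1 hashes and flag the files that have none. This is a sketch under an assumption: that rclone sha1sum prints sha1sum-style lines (hash, two spaces, path) and leaves the hash column blank for objects without a stored SHA-1. Verify that against your rclone version. The helper name missing_sha1 and the remote path are hypothetical.

```shell
#!/bin/sh
# List objects under a remote path that appear to have no stored SHA-1.
# ASSUMPTION: `rclone sha1sum` prints "<40 hex chars>  <path>" per object
# and pads the hash column with blanks when no SHA-1 is recorded.
missing_sha1() {
  # Drop lines that start with a full 40-char hex SHA-1 plus two spaces,
  # then trim leading padding so only the path remains.
  rclone sha1sum "$1" | grep -Ev '^[0-9a-f]{40}  ' | sed 's/^ *//'
}

# Usage: missing_sha1 "b2-remote1:my-bucket/Footage/"
```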
Rclone Browser is Great (but Deprecated) for Local <-> Remote
Rclone Browser is a GUI wrapper that uses the same config as the CLI. It does not support direct remote-to-remote syncs, but it is good for normal use. Unfortunately, the program has been deprecated in favor of the WebGUI, but the latter doesn't yet let you upload files. 🤷🏾‍♂️
On Mac, Rclone Browser can be installed with Homebrew via brew cask install rclone-browser (on newer Homebrew versions, the equivalent is brew install --cask rclone-browser).
⬆︎ Reliability by ⬆︎ Chunk Size (using ⬆︎ RAM)
The default settings seem to be optimized for small files, like webpages.
- Single part upload cutoff of 200 MB
- Chunk size of 96 MB
- Four concurrent transfers
For whatever reason, the error rate with these defaults was higher than I expected (see below).
Instead, I found better stability for large video files with:
- Cutoff of 1G
- 1G <= chunk size <=4G
- Two concurrent transfers
Note that all concurrent chunks are buffered into memory, so there is significantly more RAM usage with larger chunk sizes. Hence the downgrade to two transfers.
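A back-of-the-envelope check of that memory cost is to multiply the chunk size by the number of concurrent transfers. This sketch is a rough upper bound on chunk buffers only, and upload_buffer_mib is a hypothetical helper. Newer rclone releases may also upload several chunks of the same file at once, which multiplies this further, so check rclone help backend b2 for your version.

```shell
#!/bin/sh
# Rough peak RAM held in B2 chunk buffers: chunk size x concurrent transfers.
# Sizes are in MiB; all other rclone memory use is ignored.
upload_buffer_mib() {
  local chunk_mib=$1 transfers=$2
  echo $((chunk_mib * transfers))
}

# Defaults (96M chunks, 4 transfers): upload_buffer_mib 96 4    -> 384
# The post's settings (1G, 2):        upload_buffer_mib 1024 2  -> 2048
```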
More specifics in the sync section below.
Measure Twice, Cut Once: --dry-run
Before discussing the sync command, it's imperative to mention the --dry-run flag, for the following reasons:
- Backblaze bills by usage/throughput
- B2 doesn’t support renaming files after they are uploaded
Therefore, when running rclone sync, always use the --dry-run option first.
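A minimal sketch of that habit as a shell function: run the command with --dry-run, then ask before running it for real. The function name preview_then_run is hypothetical. It simply appends rclone's --dry-run flag to whatever arguments you pass.

```shell
#!/bin/sh
# Hypothetical wrapper: preview an rclone command, then confirm before the real run.
# Usage: preview_then_run sync <source> <dest> [flags...]
preview_then_run() {
  # First pass: --dry-run reports what would change without touching B2.
  rclone "$@" --dry-run || return 1
  printf 'Proceed with the real run? [y/N] '
  read -r answer
  case $answer in
    y|Y) rclone "$@" ;;              # same arguments, no --dry-run
    *)   echo "Aborted."; return 1 ;;
  esac
}
```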
The sync command
My go-to sync (or copy) command is:
rclone sync <source> <dest> --exclude .DS_Store -vv --b2-upload-cutoff 1G --b2-chunk-size 1G --transfers 2
Explanation of Flags
- --exclude .DS_Store: excludes Mac-specific files
- -vv: enables DEBUG logging, for visibility into chunk retries, etc.
- --b2-upload-cutoff: files above this size switch to a multipart (chunked) transfer
- --b2-chunk-size: the size of the chunks, which are buffered in memory
- --transfers: the number of simultaneous transfers. b2-chunk-size × transfers must fit in RAM.
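The same command can be written as a reusable function with each flag annotated. This is a sketch: goto_sync is a hypothetical name, and any extra arguments (such as --dry-run or --max-size 1G) are passed through to rclone unchanged.

```shell
#!/bin/sh
# Hypothetical wrapper around the post's go-to large-file sync.
# Usage: goto_sync <source> <dest> [extra rclone flags...]
goto_sync() {
  local src=$1 dst=$2
  shift 2
  # --exclude .DS_Store    skip macOS Finder metadata files
  # -vv                    DEBUG logging (shows chunk retries)
  # --b2-upload-cutoff 1G  files above 1 GiB use a multipart upload
  # --b2-chunk-size 1G     each part is 1 GiB, buffered in RAM
  # --transfers 2          two files at once (about 2 GiB of chunk buffers)
  rclone sync "$src" "$dst" \
    --exclude .DS_Store -vv \
    --b2-upload-cutoff 1G --b2-chunk-size 1G --transfers 2 \
    "$@"
}
```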
Phased Approach with --max-size
Sometimes I found it helpful to transfer all files under a certain size limit first, say 1 GB, and then re-run the command for larger files.
To do so, add --max-size 1G to the rclone sync command.
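The two-pass approach can be sketched as follows, where phased_sync is a hypothetical helper. Because sync skips files that are already identical at the destination, the unfiltered second pass should mainly transfer what the first pass left out. By default, rclone also does not delete destination files that a filter such as --max-size excludes.

```shell
#!/bin/sh
# Hypothetical two-pass sync: small files first, then everything else.
# Usage: phased_sync <source> <dest> [extra rclone flags...]
phased_sync() {
  local src=$1 dst=$2
  shift 2
  # Pass 1: only files up to 1 GiB.
  rclone sync "$src" "$dst" --max-size 1G "$@" || return 1
  # Pass 2: no size filter; files already copied in pass 1 are skipped.
  rclone sync "$src" "$dst" "$@"
}
```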
The check command
Always verify after a sync. Even if you think you don’t need to. The command is straightforward:
rclone check <source> <dest> --exclude .DS_Store
If there are discrepancies, the output will look like:
Use error output to create diff file
By massaging the rclone check log output (which rclone writes to standard error) into a new file containing just the file names, it is possible to re-sync only those files. This saves Backblaze read transactions on the files that were already copied.
Assuming a file mydiff.txt:
the sync command is:
Then, run rclone check again on all the files.
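Newer rclone releases can write a machine-readable report, which avoids parsing log lines. This sketch assumes your rclone version supports rclone check --combined, which prefixes each path with = (identical), - (only in destination), + (only in source), * (different), or ! (error). It keeps the +, * and ! entries in mydiff.txt and then copies just those files with --files-from. The helper names are hypothetical, and the sketch uses copy rather than sync so that the retry never deletes anything.

```shell
#!/bin/sh
# Turn an `rclone check --combined` report into a list of files to retry.
# Keep "+ " (missing on dest), "* " (different) and "! " (error) lines,
# then strip the two-character prefix so only the relative path remains.
make_retry_list() {
  grep -E '^[+*!] ' "$1" | cut -c3-
}

# Hypothetical end-to-end retry: check, build mydiff.txt, re-copy only those files.
retry_missing() {
  local src=$1 dst=$2
  # rclone check exits non-zero when it finds differences; keep going anyway.
  rclone check "$src" "$dst" --exclude .DS_Store --combined combined.txt || true
  make_retry_list combined.txt > mydiff.txt
  # Paths listed in --files-from are relative to the source root.
  rclone copy "$src" "$dst" --files-from mydiff.txt
}
```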
The cleanup command
If your buckets are created with default settings, the file lifecycle is set to Keep all versions, so deleted and overwritten files linger as old versions and still count toward storage.
To purge deleted files, use a syntax similar to the lsd command.
Also, from the docs[3]:
Note that cleanup will remove partially uploaded files from the bucket if they are more than a day old.
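A cautious sketch of the cleanup step lists old versions first and only then purges them. The helper names are hypothetical, and so is any bucket path you pass in. --b2-versions makes listings include old versions, while cleanup permanently deletes them, so there is no undo.

```shell
#!/bin/sh
# Hypothetical helpers for reviewing and purging old B2 file versions.
# Usage: show_versions "b2-remote1:my-bucket"; purge_old_versions "b2-remote1:my-bucket"
show_versions() {
  # --b2-versions includes old (hidden) versions in the listing.
  rclone ls "$1" --b2-versions
}

purge_old_versions() {
  # Permanently deletes old versions (and stale unfinished large-file uploads).
  rclone cleanup "$1"
}
```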
Appendix
Performance Logs
The exact command I used at first was
and it completed roughly 3 days later, with a 5% error rate.
Instead, with a chunk size of 1G and at most two transfers (2G total in RAM at a time), transfers were noticeably more stable.
Upload cutoffs of “5G”
During my experiments, I once tried a 5G single-part cutoff: --b2-chunk-size 2G --b2-upload-cutoff 5G --max-size 5G. The docs state "This value should be set no larger than 4.657GiB (== 5GB)", however it threw an error.
So 5G is too high: rclone's G suffix is binary, so 5G means 5 GiB (about 5.37 GB), which exceeds the 4.657 GiB (5 GB) limit. 4G worked fine, though.
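A small guard can catch this before an upload starts. This sketch (helper names hypothetical) converts an integer size with an M or G suffix into bytes, treating the suffixes as binary units the way rclone does. It then compares the result with B2's 5 GB (5,000,000,000-byte) single-part limit.

```shell
#!/bin/sh
# Convert an integer rclone-style size (M = MiB, G = GiB) to bytes.
# Fractional sizes such as 4.657G are not handled by this sketch.
to_bytes() {
  case $1 in
    *G) echo $(( ${1%G} * 1024 * 1024 * 1024 )) ;;
    *M) echo $(( ${1%M} * 1024 * 1024 )) ;;
    *)  echo "$1" ;;
  esac
}

# Succeeds if the size fits under B2's 5 GB (decimal) single-part limit.
fits_single_part() {
  [ "$(to_bytes "$1")" -le 5000000000 ]
}

# fits_single_part 4G -> true  (4,294,967,296 bytes)
# fits_single_part 5G -> false (5,368,709,120 bytes)
```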
500 Internal Server Error
Something went wrong on Backblaze's side, usually a transient problem. Rclone will retry (by default, up to 10 low-level retries) with built-in rate limiting (the pacer), as shown with the incident a7691a3d7f71-e47fc872d7ba below.
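If transient errors like this are frequent, rclone's retry behavior can be tuned. The values in this sketch are illustrative rather than recommendations, and patient_sync is a hypothetical name. --low-level-retries controls per-request retries (default 10), --retries reruns the whole sync when errors remain (default 3), and --retries-sleep pauses between those reruns.

```shell
#!/bin/sh
# Hypothetical sync wrapper that is more patient with transient B2 errors.
# Usage: patient_sync <source> <dest>
patient_sync() {
  # --low-level-retries 20  retry individual failed requests more often
  # --retries 5             rerun the whole sync up to 5 times if errors remain
  # --retries-sleep 30s     wait 30 seconds between those whole-sync reruns
  rclone sync "$1" "$2" --low-level-retries 20 --retries 5 --retries-sleep 30s
}
```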
References