Out-of-band updates
There are multiple ways to fully synchronize in-cluster content with a remote backend. Let’s first take a look at the following ais cp and ais prefetch examples:
Notice the --sync option.
Alternatively, to fully synchronize in-cluster content (and since “prefetch” typically does not imply any deletions), we can use ais evict followed by ais prefetch:
Notice the --keep-md option above.
TIP: it is always a good idea to check --help for the most recent updates.
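To see why a plain prefetch is not enough, here is a toy Python model of the resulting end states. It is not aistore code: a bucket is assumed to be a dict mapping object names to version strings, prefetch (without --latest) is assumed to skip objects already in the cluster and never delete anything, and cp --sync is assumed to both refresh changed objects and remove those that are gone remotely. All function names are hypothetical.

```python
# Toy model of synchronization end states; NOT aistore code.
# A "bucket" is a dict: object name -> version string (hypothetical).

def prefetch(cluster: dict, remote: dict) -> None:
    """Assumed semantics: fetch remote objects missing in-cluster;
    existing in-cluster copies are kept as-is and nothing is deleted."""
    for name, version in remote.items():
        if name not in cluster:
            cluster[name] = version


def evict(cluster: dict) -> None:
    """Drop all in-cluster copies (the remote bucket is untouched)."""
    cluster.clear()


def copy_sync(cluster: dict, remote: dict) -> None:
    """Assumed --sync semantics: bring every object to its latest remote
    version and delete in-cluster objects that no longer exist remotely."""
    for name in list(cluster):
        if name not in remote:
            del cluster[name]
    cluster.update(remote)


remote = {"a.txt": "v2", "b.txt": "v1"}  # a.txt updated, c.txt deleted out-of-band
stale = {"a.txt": "v1", "c.txt": "v1"}   # what the cluster still holds

only_prefetch = dict(stale)
prefetch(only_prefetch, remote)
print(only_prefetch)  # stale a.txt and deleted c.txt both survive

evicted = dict(stale)
evict(evicted)
prefetch(evicted, remote)
print(evicted)        # exact mirror of remote

synced = dict(stale)
copy_sync(synced, remote)
print(synced)         # exact mirror of remote
```

In this model both evict-then-prefetch and copy with --sync converge on the remote state, while prefetch alone leaves stale and deleted objects behind.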
One (but not the only) way to deal with out-of-band updates is to configure the bucket as follows:
Here, s3://abc is presumably an Amazon S3 bucket, but it could be any Cloud or remote AIS bucket.
It could also be any ais:// bucket with a Cloud or remote AIS backend. For usage, see the backend_bck option in the CLI documentation and examples.
Once validate_warm_get is set, any read operation on the bucket will take a bit of extra time to compare the in-cluster metadata with its remote counterpart.
Further, if and when this comparison fails, aistore performs a cold GET to fetch a new copy of the remote object, making sure that the cluster has the latest version.
Needless to say, the latest version is also always returned to the user.
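The read path described above can be sketched in Python as follows. This is a toy model, not aistore's implementation: metadata is reduced to a single version string, and validated_get, head_remote, and fetch_remote are hypothetical stand-ins for the GET handler, the remote metadata query, and the remote read.

```python
from typing import Callable, Dict, Tuple

def validated_get(
    name: str,
    cluster: Dict[str, str],             # in-cluster copies: name -> version
    head_remote: Callable[[str], str],   # hypothetical remote metadata query
    fetch_remote: Callable[[str], str],  # hypothetical remote read (returns version)
    validate_warm_get: bool,
) -> Tuple[str, str]:
    """Return (version, how), where how is "warm" or "cold"."""
    if name in cluster:
        if not validate_warm_get:
            return cluster[name], "warm"  # no remote round trip at all
        # the "bit of extra time": compare in-cluster metadata with the remote one
        if head_remote(name) == cluster[name]:
            return cluster[name], "warm"
    # miss, or validation failed: cold GET refreshes the in-cluster copy
    cluster[name] = fetch_remote(name)
    return cluster[name], "cold"


remote = {"README.md": "v2"}   # updated out-of-band
cluster = {"README.md": "v1"}  # stale in-cluster copy

print(validated_get("README.md", dict(cluster), remote.__getitem__, remote.__getitem__, False))
print(validated_get("README.md", cluster, remote.__getitem__, remote.__getitem__, True))
```

Without validation the stale v1 is served warm; with validation the mismatch triggers a cold GET, and v2 is both stored and returned.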
But sometimes we may want to perform a single operation without updating the bucket configuration. For instance:
Notice the --latest switch above. As far as this particular prefetch is concerned, --latest has the same effect as setting versioning.validate_warm_get=true. But only that far: the scope of validating in-cluster versions is limited to this specific batch job.
The same applies to copying buckets, copying ranges and lists of objects, and, of course, getting (as in GET) individual objects.
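One way to picture this scoping: the effective validation setting for a job is the bucket's versioning.validate_warm_get OR the job's own --latest flag, and the job never writes that back into the bucket configuration. The Python below is a hypothetical model of this idea (the prefetch function and dict-based bucket props are assumptions), not aistore code.

```python
from typing import Dict, Iterable

def prefetch(
    names: Iterable[str],
    cluster: Dict[str, str],        # in-cluster copies: name -> version
    remote: Dict[str, str],         # remote backend: name -> version
    bucket_props: Dict[str, bool],  # bucket configuration (read-only here)
    latest: bool = False,           # models the per-job --latest switch
) -> int:
    """Toy batch prefetch; returns how many objects were (re)fetched."""
    # --latest widens validation for this job only; bucket_props is never written
    validate = bucket_props.get("versioning.validate_warm_get", False) or latest
    fetched = 0
    for name in names:
        stale = name in cluster and validate and cluster[name] != remote[name]
        if name in cluster and not stale:
            continue                # keep the existing in-cluster copy
        cluster[name] = remote[name]
        fetched += 1
    return fetched


props = {"versioning.validate_warm_get": False}
remote = {"a": "v2", "b": "v1"}
cluster = {"a": "v1"}

print(prefetch(["a", "b"], cluster, remote, props))               # only "b" is fetched
print(prefetch(["a", "b"], cluster, remote, props, latest=True))  # stale "a" is refreshed
print(props)                                                      # bucket config unchanged
```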
Here’s an excerpt from the GET help (note --latest below):
The following metadata, if available, is compared between the in-cluster object and its remote counterpart:
- version
- ETag
- ais object checksum (by default, xxhash that we store as part of custom Cloud metadata)
- MD5
- CRC32C
To enable version validation, run:
No assumption is made about whether any of the above is present (except, of course, the size, aka “Content-Length”).
The rules are simple:
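- metadata is compared only with metadata of the same kind (e.g., size vs size, MD5 vs MD5, etc.);
- a match counts for every kind except size (in other words, same size does not contribute to the decision in favor of skipping cold GET);
- in particular, there are no cross-kind comparisons (e.g., ETag vs MD5);
When there are no matches, we go ahead with cold GET.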
A single match (e.g., only the version, if it exists, or only the ETag) is currently resolved positively if and only if the source backend is the same as well.
For example, copying an object from Amazon to Google and then performing a validated GET with the aistore backend “pointing” to Google will fail the match.
TODO: make it configurable to require at least two matches.
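The rules above can be condensed into a single decision function. The following is a hypothetical Python rendering, not aistore's actual code: metadata is a plain dict, the key names ("size", "version", "etag", "ais_cksum", "md5", "crc32c") are assumptions, and it additionally assumes that an explicit mismatch on size or on any compared kind means the in-cluster copy is stale.

```python
from typing import Dict, Optional

# Kinds that can produce a positive match (size is deliberately excluded)
MATCH_KINDS = ("version", "etag", "ais_cksum", "md5", "crc32c")

Meta = Dict[str, Optional[object]]

def needs_cold_get(local: Meta, remote: Meta, same_backend: bool) -> bool:
    """Hypothetical sketch of the validation rules; True means cold GET."""
    # Size is assumed present; a different size is a mismatch,
    # but an equal size does not count toward skipping cold GET.
    if local["size"] != remote["size"]:
        return True
    matches = 0
    for kind in MATCH_KINDS:
        lval, rval = local.get(kind), remote.get(kind)
        if lval is None or rval is None:
            continue           # absent on either side: nothing to compare
        if lval != rval:
            return True        # assumption: any explicit mismatch means stale
        matches += 1           # same kind, same value (never ETag vs MD5)
    if matches == 0:
        return True            # no matches: go ahead with cold GET
    if matches == 1 and not same_backend:
        return True            # single match counts only for the same source backend
    return False


local = {"size": 10, "version": "v1", "md5": "abc"}
print(needs_cold_get(local, {"size": 10, "version": "v1", "md5": "abc"}, same_backend=False))  # False: two matches
print(needs_cold_get(local, {"size": 10, "etag": "x"}, same_backend=True))                       # True: nothing comparable
print(needs_cold_get(local, {"size": 10, "version": "v1"}, same_backend=False))                  # True: single match, other backend
```

The TODO above would correspond to replacing the single-match branch with a configurable minimum number of matches.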
Needless to say, if querying remote metadata fails, the corresponding GET transaction fails as well.
But there’s one special case: when the call to query remote metadata returns “object not found”, that is, when the remote backend unambiguously indicates that the remote object does not exist (any longer).
In this case, there are two configurable choices, as per the (already shown) versioning section of the bucket config:
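The knob called versioning.synchronize is simply a stronger variant of versioning.validate_warm_get; it entails both validating in-cluster versions (as validate_warm_get does) and removing in-cluster objects whose remote counterparts no longer exist.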
To recap: if an attempt to read remote metadata returns “object not found” and versioning.synchronize is set to true, we go ahead and delete the object locally, thus effectively synchronizing in-cluster content with its remote source.
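The “object not found” branch can be sketched as follows. This is a hypothetical Python model, not aistore code: ObjectNotFound, validated_get, and head_deleted are made-up names, metadata is reduced to a version string, synchronize=False stands for validate_warm_get alone, and it assumes (per the rule that a failed remote metadata query fails the GET) that the GET fails in both cases, with versioning.synchronize deciding only whether the in-cluster copy is deleted.

```python
from typing import Callable, Dict

class ObjectNotFound(Exception):
    """Raised when the remote backend says the object does not exist."""

def validated_get(
    name: str,
    cluster: Dict[str, str],            # in-cluster copies: name -> version
    head_remote: Callable[[str], str],  # returns remote version or raises ObjectNotFound
    synchronize: bool,                  # models versioning.synchronize
) -> str:
    try:
        remote_version = head_remote(name)
    except ObjectNotFound:
        if synchronize:
            cluster.pop(name, None)     # delete locally: in-cluster now mirrors remote
        raise                           # either way, this GET fails
    if cluster.get(name) != remote_version:
        cluster[name] = remote_version  # cold GET (data transfer elided)
    return cluster[name]


def head_deleted(name: str) -> str:
    raise ObjectNotFound(name)          # the object was deleted out-of-band


for sync in (False, True):
    cluster = {"README.md": "v1"}
    try:
        validated_get("README.md", cluster, head_deleted, synchronize=sync)
    except ObjectNotFound:
        pass
    print(sync, cluster)  # the local copy survives only when sync is False
```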
But sometimes there may be a need for more fine-grained, operation-level control over this functionality.
The AIS API supports that. In the CLI, the corresponding option is called --latest. Let’s look at a brief example, where:
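s3://abc is a bucket that contains the s3://abc/README.md object, which was previously accessed via aistore (and is therefore present in the cluster) and was then updated out-of-band.
In other words, the setup we describe boils down to a single main point: aistore holds an older version of the remote object (s3://abc/README.md). Namely: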
AIS, on the other hand, shows:
Moreover, a GET operation with default parameters doesn’t help:
To reconcile, we employ the --latest option:
Notice that we now have the latest KJOQsGc... version (that s3api also calls VersionIdMarker).
See also: the ais cp command and, in particular, its --sync option.