Recent comments posted to this site:

Tuning has not been experimental for some time; I've removed the warnings.
Comment by joey Tue Oct 22 17:26:55 2024

importtree=yes remotes are always untrusted. The reason is that something else is assumed to be writing to those remotes, which is what populates them with files. And that could delete or change any file at any time. So if git-annex didn't untrust the remote, and relied on it to hold the only copy of a file, such a change would cause data loss.

There would need to be a new config setting to add the concept of guaranteed readonly importtree=yes remotes.

git-annex does not allow --numcopies to be set to 0 as that can cause data loss.

Comment by joey Mon Oct 21 18:54:55 2024

This is plausible. git-annex requires that special remotes only show a file as present after a successful upload. If the data store doesn't work that way, the file needs to be uploaded to a temporary name and renamed atomically instead. If that's not possible, the data store is not safe for use by git-annex.
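The upload-then-rename pattern described above can be sketched for a plain local directory. This is only an illustration under stated assumptions: `store_key` and `present_key` are hypothetical helper names, the "data store" is a directory on a single filesystem, and none of this is git-annex's or rclone's actual code.

```sh
#!/bin/sh
# Hypothetical helper: store_key SRC DESTDIR KEY
# Copies SRC into DESTDIR under a temporary name, then renames it to KEY.
store_key() {
    src=$1
    destdir=$2
    key=$3
    tmp="$destdir/.tmp.$key.$$"
    # If the copy is interrupted or fails, only the temporary name exists,
    # so the key is never reported as present.
    cp "$src" "$tmp" || { rm -f "$tmp"; return 1; }
    # A rename within one filesystem is atomic: the key appears all at once.
    mv "$tmp" "$destdir/$key"
}

# Hypothetical helper: present_key DESTDIR KEY
# Succeeds only if the final name exists.
present_key() {
    [ -f "$1/$2" ]
}
```

With this layout a presence check can only succeed after a complete copy. Whether a given remote backend offers an equivalent atomic rename is a separate question.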

Given all the different types of data stores supported by rclone, this may be difficult, but it's the right thing for the external special remote to do. I think you should file a bug.

(Does rclone gitannex also have this problem?)

Comment by joey Mon Oct 21 14:35:17 2024

I'd like to set a few additional configurations so that all clones treat a special remote similarly.

Particularly, I'd like to set the trustlevel and tracking-branch for an exporttree special remote so that any clone that enables this remote also has these configurations enabled. This is justified for a certain remote of mine because it exports to a version-controlled environment that I trust, so it would be nice not to have to run git config remote.name.annex-tracking-branch and git annex trust name semitrusted for every clone.

Of course, are git annex config --set remote.name.annex-trustlevel "semitrusted" and git annex config --set remote.name.annex-tracking-branch "main" (called once) any easier than the above called multiple times? Maybe not, but it would be slightly less mental overhead to not do the above.

Offhand, can you imagine any caveats that would preclude adding these settings to the list supported by this command? I agree that only some settings make sense for all clones to see, rather than anything one can set in git config, but of course that requires manually adding the config cases that do make sense. Maybe this is one of them.

Comment by Spencer Wed Oct 9 23:10:17 2024

What's the reason for not supporting importtree via webdav?

Would be nice to keep a tree in sync on my nextcloud and sync to my phone etc.

Comment by annex Wed Oct 9 06:42:40 2024

To access the manifest and bundles, one needs the UUID of the special remote initially configured. Then one can run

[[!format sh """ git clone 'annex::?type=directory&encryption=none&directory=/path/to/space%20sanitized%20directory' """]]

A bit tedious, both for the need to type all settings (even those not shown by the remote helper when doing the push operations from the initial repo, in this case the directory, in other cases all required settings to init the remote in the first place) and for having to percent-encode any characters that are not allowed in a URL. But doable.
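The encoding of the directory path could be scripted rather than done by hand. A minimal sketch, assuming ASCII-only paths and a POSIX shell; `urlencode` is a hypothetical helper name, not something git-annex provides:

```sh
#!/bin/sh
# Hypothetical helper: percent-encode a path for use in a URL query value.
# Assumes ASCII input; multibyte characters are not handled.
urlencode() {
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        case $c in
            [A-Za-z0-9._~/-]) out=$out$c ;;                 # left unencoded
            *) out=$out$(printf '%%%02X' "'$c") ;;          # e.g. space -> %20
        esac
        s=$rest
    done
    printf '%s\n' "$out"
}

urlencode '/path/to/space sanitized directory'
```

The result can then be pasted into the `directory=` part of the clone URL.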

The other option would be to clone manually by initializing a new empty repo, then adding the special remote the normal git annex way. This doesn't work just yet because --uuid is not an allowed option for initremote. It would be nice if this were an option, simply to avoid the tedium of typing the URL as above (one could copy and paste the output of git --no-pager show git-annex:remote.log into initremote).

Despite the URL tedium, an exciting result of the current system is that any number of repos and file annexes can share one directory! Like an entire organization (or repo group) in one folder. Datalad has a similar archetype (remote indexed archives), which offers (slightly) improved user friendliness by filing each repo UUID into meaningfully-named folders (unhashed first three/remaining is nice for being actually the UUID, but it still doesn't let me easily copy/paste the UUID for cloning). That said, I kind of like how git-annex's implementation encourages a single unified "annex" (rather than RIA's UUID/Annex, which gives each UUID a separate annex) and of course bundles over loose git files, especially for cloud special remotes, which can be slow to upload each and every loose file.

Looking forward to seeing how this feature develops!

Comment by Spencer Mon Oct 7 20:00:24 2024

Perhaps Joey can help me out here a bit with some background knowledge:

I've been seeing sporadic corruption with this setup:

  • chunking
  • encryption
  • old helper program git-annex-remote-rclone
  • rclone's pcloud backend

As it seems, rclone keeps partial files under the name of the full file when a transfer is interrupted, for the pcloud backend. (This is for rclone <= 1.67.0; 1.68.0 has changes for pcloud, which may fix this.) My theory how the corruption might have happened:

  • First interrupted run of git-annex uploads chunks A and a partial(!) chunk B
  • Second run skips chunks A and B(!), and proceeds to upload the rest of the chunks (C and D)
  • At the end we have uploaded A, C and D and a corrupted/partial chunk B
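
The suspected sequence can be reproduced in a toy model. This is only a sketch of the theory above, not the actual behaviour of git-annex, git-annex-remote-rclone, or rclone: a local temporary directory stands in for the remote, `upload` and `payload` are hypothetical helpers, and the resume loop is assumed to skip any chunk whose name already exists.

```sh
#!/bin/sh
# Toy model: a local directory stands in for the remote store.
store=$(mktemp -d)

# Hypothetical chunk contents: chunk X holds the text "X-payload".
payload() { printf '%s-payload' "$1"; }

# Non-atomic upload: writes directly under the final name.
# With a third argument of "interrupt", only the first 3 bytes are
# written and the upload reports failure.
upload() {
    name=$1
    data=$2
    mode=${3:-full}
    if [ "$mode" = interrupt ]; then
        printf '%.3s' "$data" > "$store/$name"
        return 1
    fi
    printf '%s' "$data" > "$store/$name"
}

# First run: chunk A completes, chunk B is interrupted part-way.
upload A "$(payload A)"
upload B "$(payload B)" interrupt || true

# Second run: assumed resume logic that trusts the mere presence of a name.
for c in A B C D; do
    if [ -e "$store/$c" ]; then
        continue             # A and the partial B are both skipped
    fi
    upload "$c" "$(payload "$c")"
done

# Show what ended up in the store.
for c in A B C D; do
    printf '%s: %s\n' "$c" "$(cat "$store/$c")"
done
```

In this model chunk B stays truncated after the second run, which matches the end state described in the list above.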

Joey: Is this a possible error scenario?

Comment by mike Fri Sep 27 12:18:41 2024