Proto commits in google/cdc-file-transfer

The Protocol Buffers files changed in the following 8 commits:

2023-01-18

Commit:	efca985
Author:	Lutz Justen	2023-01-18 17:49:52 +0100
Committer:	GitHub	2023-01-18 17:49:52 +0100

[cdc_rsync] [cdc_stream] Switch from scp to sftp (#66)

Use sftp for deploying remote components instead of scp. sftp has the advantage that it can also create directories, chmod files etc., so that we can do everything in one call of sftp instead of mixing scp and ssh calls.

The downside of sftp is that it can't switch to ~ or %userprofile% on the remote side, so we have to assume that sftp starts in the user's home dir. This is the default and works on my machines!

cdc_rsync and cdc_stream now check the CDC_SFTP_COMMAND env var and accept --sftp-command flags. If they are not set, the corresponding scp flag and env var are still used, with scp replaced by sftp. This is most likely correct as sftp and scp usually reside in the same directory and share largely identical parameters.
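The fallback described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the tools' actual code: the function name, the `CDC_SCP_COMMAND` variable name, the flag-before-env-var precedence, the bare `sftp` default, and the plain string replacement are all assumptions made for the example.

```python
import os


def resolve_sftp_command(sftp_flag=None, scp_flag=None, env=None):
    """Pick the sftp command line (hypothetical helper, not from the repo).

    Assumed precedence: --sftp-command, then CDC_SFTP_COMMAND, then the scp
    flag or env var with "scp" replaced by "sftp", then plain "sftp".
    """
    env = os.environ if env is None else env

    # An explicit sftp setting is used as-is.
    explicit = sftp_flag or env.get("CDC_SFTP_COMMAND")
    if explicit:
        return explicit

    # Otherwise reuse the scp setting. CDC_SCP_COMMAND is an assumed name.
    scp = scp_flag or env.get("CDC_SCP_COMMAND")
    if scp:
        # Naive substitution: replaces every occurrence of "scp".
        return scp.replace("scp", "sftp")

    # Assumed default when nothing is configured.
    return "sftp"
```

For example, under these assumptions `resolve_sftp_command(scp_flag=r"C:\tools\scp.exe -P 2222", env={})` yields `C:\tools\sftp.exe -P 2222`, which relies on both binaries living in the same directory and accepting the same parameters, as the message notes.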

The documentation is generated from this commit.

2022-12-12

Commit:	f8438ae
Author:	Lutz Justen	2022-12-12 10:58:33 +0100
Committer:	GitHub	2022-12-12 10:58:33 +0100

[cdc_rsync] [cdc_stream] Remove SSH port argument (#41)

This CL removes the port arguments for both tools. The port argument can also be specified via the ssh-command and scp-command flags. In fact, if a port is specified by both port flags and ssh/scp commands, they interfere with each other. For ssh, the one specified in ssh-command wins. For scp, the one specified in scp-command wins.

To fix this, one would have to parse scp-command and remove the port arg there. Or we could just remove the ssh-port arg. This is what this CL does. Note that if you need a custom port, it's very likely that you also have to define custom ssh and scp commands.

2022-12-05

Commit:	1b8ad0e
Author:	Lutz Justen	2022-12-05 10:09:37 +0100
Committer:	GitHub	2022-12-05 10:09:37 +0100

[cdc_stream] Add wildcard support to stop command (#30) Adds support for stuff like cdc_stream stop * or cdc_stream stop user*:dir*.
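As an illustration of the kind of matching this enables, here is a minimal sketch using Python's standard `fnmatch` module. The session identifier format (`user@host:dir`) and the function name are hypothetical; the commit message does not say how cdc_stream itself implements wildcard matching.

```python
from fnmatch import fnmatchcase


def sessions_to_stop(pattern, sessions):
    """Return the sessions whose identifier matches the wildcard pattern.

    `sessions` is a list of hypothetical "user@host:dir" strings; "*" in the
    pattern matches any run of characters, including "@" and ":".
    """
    return [s for s in sessions if fnmatchcase(s, pattern)]


# Example data, invented for this sketch.
active = ["user1@host:dir_a", "user2@host:dir_b", "admin@host:data"]
```

With this data, the pattern `*` selects all three sessions, while `user*:dir*` selects only the first two.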

2022-12-02

Commit:	1120dcb
Author:	Lutz Justen	2022-12-02 14:34:36 +0100
Committer:	GitHub	2022-12-02 14:34:36 +0100

[cdc_stream] Automatically start service (#28)

Starts the streaming service if it's not up and running. This required adding the ability to run a detached process. By default, all child processes are killed when the parent process exits. Since detached child processes don't run with a console, they need to create subprocesses with CREATE_NO_WINDOW since otherwise a new console pops up, e.g. for every ssh command.

Polls for 20 seconds while the service starts up. For this purpose, a BackgroundServiceClient is added. This will be reused in a future CL by a new stop-service command to exit the service.

Also adds --service-port as additional argument to start-service.
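The polling step can be pictured with a small sketch like the one below. It is not the BackgroundServiceClient from the repository: the function name, the polling interval, and the injectable `clock` and `sleep` parameters are assumptions for illustration, and only the 20-second budget comes from the commit message.

```python
import time


def wait_for_service(is_up, timeout_sec=20.0, interval_sec=0.1,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll `is_up()` until it returns True or the timeout elapses.

    `is_up` is any callable that probes the service (hypothetical here).
    Returns True if the service came up in time, False otherwise.
    """
    deadline = clock() + timeout_sec
    while True:
        if is_up():
            return True
        if clock() >= deadline:
            return False
        # Wait a little before probing again to avoid a busy loop.
        sleep(interval_sec)
```

Passing `clock` and `sleep` as parameters keeps the loop testable without real waiting; a caller would normally leave them at their defaults.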

2022-12-01

Commit:	01a60e2
Author:	Lutz Justen	2022-12-01 16:14:56 +0100
Committer:	GitHub	2022-12-01 16:14:56 +0100

Rename asset_stream_manager to cdc_stream (#27)

Fixes #13

2022-11-18

Commit:	269fb2b
Author:	Lutz Justen	2022-11-18 10:59:42 +0100
Committer:	GitHub	2022-11-18 10:59:42 +0100

[cdc_stream] Add a CLI client to start/stop asset streaming sessions (#4)

Implements the cdc_stream client and adjusts asset streaming in various places to work better outside of a GGP environment.

This CL tries to get quoting for SSH commands right. It also brings back the ability to start a streaming session from asset_stream_manager.

Also cleans up Bazel targets setup. Since the sln file is now in root, it is no longer necessary to prepend ../ to relative filenames to make clicking on errors work.

2022-11-16

Commit:	76bbdb0
Author:	chrschng	2022-11-16 11:20:32 +0100
Committer:	GitHub	2022-11-16 11:20:32 +0100

Merge dynamic manifest updates to Github (#7)

This change introduces dynamic manifest updates to asset streaming.

Asset streaming describes the directory to be streamed in a manifest, which is a proto definition of all content metadata. This information is sufficient to answer `stat` and `readdir` calls in the FUSE layer without additional round-trips to the workstation.

When a directory is streamed for the first time, the corresponding manifest is created in two steps:

1. The directory is traversed recursively and the inode information of all contained files and directories is written to the manifest.
2. The content of all identified files is processed to generate each file's chunk list. This list is part of the definition of a file in the manifest.
   * The chunk boundaries are identified using our implementation of the FastCDC algorithm.
   * The hash of each chunk is calculated using the BLAKE3 hash function.
   * The length and hash of each chunk is appended to the file's chunk list.

Prior to this change, when the user mounted a workstation directory on a client, the asset streaming server pushed an intermediate manifest to the gamelet as soon as step 1 was completed. At this point, the FUSE client started serving the virtual file system and was ready to answer `stat` and `readdir` calls. In case the FUSE client received any call that required file contents, such as `read`, it would block the caller until the server completed step 2 above and pushed the final manifest to the client.

This works well for large directories (> 100GB) with a reasonable number of files (< 100k). But when dealing with millions of tiny files, creating the full manifest can take several minutes.

With this change, we introduce dynamic manifest updates. When the FUSE layer receives an `open` or `readdir` request for a file or directory that is incomplete, it sends an RPC to the workstation about what information is missing from the manifest. The workstation identifies the corresponding file chunker or directory scanner tasks and moves them to the front of the queue. As soon as the task is completed, the workstation pushes an updated intermediate manifest to the client which now includes the information to serve the FUSE request. The queued FUSE request is resumed and returns the result to the caller.

While this does not reduce the required time to build the final manifest, it splits up the work into smaller tasks. This allows us to interrupt the current work and prioritize those tasks which are required to handle an incoming request from the client.

While this still takes a round-trip to the workstation plus the processing time for the task, an updated manifest is received within a few seconds, which is much better than blocking for several minutes. This latency is only visible when serving data while the manifest is still being created. The situation improves as the manifest creation on the workstation progresses. As soon as the final manifest is pushed, all metadata can be served directly without having to wait for pending tasks.
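The "move to the front of the queue" idea from this commit message can be sketched as below. This is a toy model under stated assumptions, not the project's implementation: the class and method names are hypothetical, each task is reduced to a file path, and the manifest entry is a placeholder string rather than a real chunk list.

```python
from collections import deque


class ManifestBuilder:
    """Toy workstation-side task queue (hypothetical names throughout)."""

    def __init__(self, paths):
        self.pending = deque(paths)  # tasks still to be processed, in order
        self.manifest = {}           # path -> placeholder metadata

    def prioritize(self, path):
        """Move the task for `path` to the front of the queue.

        Returns False if the task is not pending, e.g. already done.
        """
        try:
            self.pending.remove(path)
        except ValueError:
            return False
        self.pending.appendleft(path)
        return True

    def run_next(self):
        """Process one task and record its result in the manifest."""
        path = self.pending.popleft()
        self.manifest[path] = f"chunk list for {path}"  # placeholder result
        return path

    def is_final(self):
        """True once no tasks remain, i.e. the manifest is complete."""
        return not self.pending
```

In this model, a client request for `c.bin` while `a.bin` and `b.bin` are still queued would call `prioritize("c.bin")`, so the next `run_next()` completes `c.bin` first and an updated intermediate manifest could be pushed right away. The total amount of work is unchanged; only the order differs.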

2022-11-03

Commit:	4326e97
Author:	Christian Schneider	2022-10-07 10:47:04 +0200
Committer:	Christian Schneider	2022-11-03 10:39:10 +0100

Releasing the former Stadia file transfer tools

The tools allow efficient and fast synchronization of large directory trees from a Windows workstation to a Linux target machine.

cdc_rsync* support efficient copy of files by using content-defined chunking (CDC) to identify chunks within files that can be reused.

asset_stream_manager + cdc_fuse_fs support efficient streaming of a local directory to a remote virtual file system based on FUSE. It also employs CDC to identify and reuse unchanged data chunks.