DVC filesystem abstraction layer (0.8.0)

This package provides a high-level API for easily writing, reading, and listing files inside DVC. It can be used by automation systems integrated with data pipelines.

dvc-fs provides basic compatibility with the PyFilesystem2 API (still a work in progress).

Installation

To install this package, run:

  $ python3 -m pip install "dvc-fs==0.8.0"

Or with Poetry:

  $ poetry add dvc-fs

Usage

Using via PyFilesystem2:

The dvc-fs package is integrated with PyFilesystem2, so you can do:

from fs import open_fs
fs1 = open_fs("dvc://github.com/covid-genomics/data-artifacts")  # Clone over HTTPS
fs2 = open_fs("dvc://ssh@github.com/covid-genomics/data-artifacts")  # Clone over SSH
fs3 = open_fs("dvc://<PAT>@github.com/covid-genomics/data-artifacts")  # Clone over HTTPS with a personal access token
# You can also use a plain HTTPS URL and set the GIT_TOKEN environment variable.
# In that case the personal access token is injected into the clone URL.
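
To make the GIT_TOKEN behaviour more concrete, here is a minimal sketch of what injecting a token into an HTTPS clone URL can look like. It uses only the standard library; `inject_token` is a hypothetical helper written for illustration and is not part of dvc-fs, whose internal implementation may differ.

```python
import os
from urllib.parse import urlsplit, urlunsplit


def inject_token(clone_url, token):
    """Return clone_url with token as its user part (hypothetical helper).

    The URL is returned unchanged when the token is empty or when the URL
    already names a user, so explicit credentials are never overwritten.
    """
    parts = urlsplit(clone_url)
    if not token or parts.username:
        return clone_url
    netloc = f"{token}@{parts.hostname}"
    if parts.port:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Read the token from the environment, falling back to no injection.
url = inject_token(
    "https://github.com/covid-genomics/data-artifacts",
    os.environ.get("GIT_TOKEN", ""),
)
```

Keeping the token in an environment variable avoids hard-coding it in scripts or committing it to the repository.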

Writing a file:

from fs import open_fs
with open_fs("dvc://github.com/covid-genomics/data-artifacts") as fs:
    fs.writetext("fs_test/fasta2.txt", "TEST")

Explicitly creating DVCFS:

from dvc_fs.fs import DVCFS
with DVCFS("https://<GITHUB_PERSONAL_TOKEN>@github.com/covid-genomics/dvc_repo.git") as fs:
    for path in fs.walk.files():
        # Print all paths in repo
        print(path)

Reading and writing:

Read the contents of one file and write a modified copy to another:

from dvc_fs.fs import DVCFS
with DVCFS("https://<GITHUB_PERSONAL_TOKEN>@github.com/covid-genomics/dvc_repo.git") as fs:
    contents = fs.readtext('data/1.txt')
    print(f"THIS IS CONTENTS: {contents}")
    fs.writetext("test.txt", contents+"!")

You can also use the high-level DVC API directly via the Client:

from dvc_fs.client import Client, DVCPathUpload

# Git repo with DVC configured
client = Client("https://<GITHUB_PERSONAL_TOKEN>@github.com/covid-genomics/dvc_repo.git")
client.update([
    # Upload local file ~/local_file_path.txt to DVC repo under path data/1.txt
    DVCPathUpload("data/1.txt", "~/local_file_path.txt"),
])

The upload operation accepts several kinds of input.

Uploading a string as a file:

from dvc_fs import Client, DVCStringUpload
from datetime import datetime

Client("<DVC_REPO>").update([
    DVCStringUpload("data/1.txt", f"This will be saved into DVC. Current time: {datetime.now()}"),
])

Uploading a local file by its path:

from dvc_fs import Client, DVCPathUpload

Client("<DVC_REPO>").update([
    DVCPathUpload("data/1.txt", "~/local_file_path.txt"),
])

Uploading content generated by a Python function:

from dvc_fs import Client, DVCCallbackUpload

Client("<DVC_REPO>").update([
    DVCCallbackUpload("data/1.txt", lambda: "Test data"),
])
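
The callback does not have to be a lambda. As a sketch, assuming DVCCallbackUpload accepts any zero-argument callable that returns a string (as the lambda above suggests), `functools.partial` can bind arguments to a named function. `render_manifest`, `upload_manifest`, and the path `data/manifest.json` are hypothetical names chosen for illustration.

```python
import json
from functools import partial


def render_manifest(samples):
    """Serialize sample names to a JSON string; runs only when called."""
    return json.dumps({"count": len(samples), "samples": sorted(samples)})


# partial() binds the argument now and yields a zero-argument callable,
# the same shape as the lambda passed to DVCCallbackUpload above.
manifest_callback = partial(render_manifest, ["b.fasta", "a.fasta"])


def upload_manifest(repo_url):
    # Imported here so the helpers above can be used without dvc-fs installed.
    from dvc_fs import Client, DVCCallbackUpload

    Client(repo_url).update([
        # "data/manifest.json" is an illustrative target path.
        DVCCallbackUpload("data/manifest.json", manifest_callback),
    ])
```

A named function is easier to unit-test than an inline lambda, since its output can be checked without touching a repository.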

The download operation works similarly to upload and uses the same syntax:

from dvc_fs import Client, DVCCallbackDownload

# Download DVC file data/1.txt and print it on the screen
Client("<DVC_REPO>").download([
    DVCCallbackDownload("data/1.txt", lambda content: print(content)),
])
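
Instead of printing, the callback can store what it receives. The sketch below collects downloaded contents into a dictionary keyed by path. It assumes the callbacks are invoked before `download()` returns, which the source does not state; `make_collector` and `fetch` are hypothetical helpers, not part of dvc-fs.

```python
def make_collector(store, key):
    """Return a callback that saves whatever it receives under store[key]."""
    def _collect(content):
        store[key] = content
    return _collect


def fetch(repo_url, paths):
    """Download several DVC files and return {path: content}."""
    # Imported here so make_collector can be used without dvc-fs installed.
    from dvc_fs import Client, DVCCallbackDownload

    results = {}
    Client(repo_url).download([
        DVCCallbackDownload(path, make_collector(results, path))
        for path in paths
    ])
    # Assumption: every callback has already run by the time download() returns.
    return results
```

With these helpers, `fetch("<DVC_REPO>", ["data/1.txt"])` would return the file contents in a dictionary rather than writing them to the screen.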

Versioning

For developers: to bump the project version before a release, run:

    $ poetry run bump2version minor