Metadata Documentation
This document describes the various metadata files generated by a TableVault repository to track table states and transformations.
While these files should never be manually modified by users, they provide essential insights into the repository's internal state. Below is a detailed explanation of each metadata file type and its contents.
General Metadata
metadata/logs.txt and metadata/log_ids.txt
These files record information about completed logs. logs.txt includes comprehensive details, whereas log_ids.txt only lists the process_id.
metadata/active_logs.json
This file stores details about all currently active processes. Users can directly query this data through the TableVault API.
metadata/tables_history.json, metadata/columns_history.json, and metadata/tables_temp.json
These files maintain a historical record of all fully stored table instances:
- tables_history.json records:
  - When a dataframe was first created (potentially by a different table instance).
  - When an instance was initially materialized.
  - When an instance ceased to be active.
- columns_history.json tracks creation at the column level.
- tables_temp.json records temporary table instances.
These files are used for internal operations, optimization strategies, and historical tracking.
locks/*
TableVault implements custom file-based read-write locks to enable multiprocessing capabilities.
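To illustrate the general idea of coordinating processes through files on disk, here is a minimal sketch of an exclusive lock file created atomically. It is not TableVault's implementation: it is simpler than a read-write lock (it does not allow concurrent readers), and the function name exclusive_file_lock and its parameters are hypothetical.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def exclusive_file_lock(lock_path, timeout=5.0, poll_interval=0.05):
    """Hold an exclusive lock represented by a file on disk (illustrative only)."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL makes creation atomic: it fails if the file exists,
            # so only one process can create the lock file at a time.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll_interval)
    try:
        # Record the owner's PID to help with debugging stale locks.
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)  # releasing the lock means deleting the file
```

A second caller that tries to take the same lock while it is held waits until the file disappears or the timeout expires.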
_temp/*
This directory temporarily stores previous file states during active operations. If an operation fails, it allows safe restoration of the repository's previous state.
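The snapshot-and-restore pattern described here can be sketched in a few lines. This is a simplified illustration for a single file, under the assumption that a plain copy is enough to capture the previous state. The helper run_with_rollback is hypothetical, and it uses the system temporary directory rather than the repository's _temp folder.

```python
import shutil
import tempfile
from pathlib import Path


def run_with_rollback(target: Path, operation):
    """Copy `target` aside, run `operation(target)`, restore the copy on failure."""
    with tempfile.TemporaryDirectory() as tmp:
        backup = Path(tmp) / target.name
        shutil.copy2(target, backup)  # snapshot the previous state
        try:
            operation(target)
        except Exception:
            shutil.copy2(backup, target)  # put the old contents back
            raise  # still report the failure to the caller
```

If the operation succeeds, the snapshot is discarded when the temporary directory is cleaned up.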
metadata/ARCHIVED_TRASH/*
Upon deletion of tables and instances, their dataframes and artifacts are removed, but the associated metadata is archived in this folder. This feature preserves historical context.
Note
Files within this folder can safely be deleted if storage space is limited. Typically, these files occupy minimal space.
lock.LOCK file
This lock file ensures exclusive write access to the repository metadata, preventing concurrent write operations.
.tablevault file
This file identifies the directory explicitly as a TableVault repository.
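A script that walks a filesystem could use this marker to recognize repositories without opening any other metadata. The check below is a small sketch, assuming the marker sits directly inside the repository directory; is_tablevault_repo is a hypothetical helper, not part of the TableVault API.

```python
from pathlib import Path


def is_tablevault_repo(directory) -> bool:
    """Return True if `directory` contains a `.tablevault` marker file."""
    # Only the presence of the file is checked; its contents are not read.
    return (Path(directory) / ".tablevault").is_file()
```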
Table and Instance Metadata
descriptions.yaml
Each TableVault repository, table, and instance has a dedicated YAML file containing specific metadata. While some metadata is automatically generated, users can include optional free-form descriptions during creation. This capability allows arbitrary contextual details to be preserved.
dtypes.json
For each materialized instance, a dtypes.json file specifies the data types of all columns. This is particularly useful for managing custom data types and tracking artifact_string columns.
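As a read-only illustration, the sketch below lists the columns marked as artifact_string in a dtypes.json file. The file layout is not specified here, so the code assumes a flat JSON object that maps each column name to a dtype string. That layout and the helper name artifact_columns are assumptions for the example only.

```python
import json


def artifact_columns(dtypes_path):
    """List columns whose recorded dtype is "artifact_string" (read-only)."""
    # ASSUMPTION: the file is a flat {"column_name": "dtype"} JSON object.
    with open(dtypes_path, encoding="utf-8") as fh:
        dtypes = json.load(fh)
    return [col for col, dtype in dtypes.items() if dtype == "artifact_string"]
```

The function only reads the file, in keeping with the rule that metadata files should not be modified by hand.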
EXECUTION_ARCHIVE/* folder
Each executed instance contains an EXECUTION_ARCHIVE folder, explicitly documenting the Python functions executed during the instance's lifecycle.