Core API Reference

The Vault class is the main interface for tracking ML items and their lineage.

Initialization

`Vault`

Vault(
    user_id: str,
    process_name: str,
    parent_process_name: str = "",
    parent_process_index: int = 0,
    arango_url: str = "http://localhost:8529",
    arango_db: str = "tablevault",
    arango_username: str = "tablevault_user",
    arango_password: str = "tablevault_password",
    new_arango_db: bool = True,
    arango_root_username: str = "root",
    arango_root_password: str = "passwd",
    description_embedding_size: int = 1024,
    log_file_location: str = "~/.tablevault/logs/",
) -> Vault

Initialize the Vault singleton. Only one vault can be active per Python process. Once active, all subsequently executed code is tracked in the TableVault repository.

Parameters:

Name	Type	Description
`user_id`	`str`	Unique identifier for the user
`process_name`	`str`	Unique name for this process
`parent_process_name`	`str`	Name of the generating (parent) process, if any
`parent_process_index`	`int`	Run index of the generating (parent) process, if any
`arango_url`	`str`	URL of the ArangoDB server
`arango_db`	`str`	Name of the database to use
`arango_username`	`str`	Username for database access
`arango_password`	`str`	Password for database access
`new_arango_db`	`bool`	If True, create a new database, dropping any existing database with the same name
`arango_root_username`	`str`	Root username for database creation
`arango_root_password`	`str`	Root password for database creation
`description_embedding_size`	`int`	Dimension of description embeddings
`log_file_location`	`str`	Directory for log files

Returns: Vault instance
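This is a minimal sketch for keeping credentials out of source code. The helper below collects keyword arguments for `Vault(...)` from environment variables. The `TABLEVAULT_*` variable names and the helper itself are hypothetical and are not part of TableVault. The helper defaults `new_arango_db` to `False` because the constructor's default of `True` drops an existing database.

```python
import os
from typing import Any, Dict, Mapping, Optional


def vault_kwargs_from_env(
    user_id: str,
    process_name: str,
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Build Vault(...) keyword arguments, reading connection settings from env vars.

    The TABLEVAULT_* names are hypothetical; use whatever your deployment defines.
    """
    env = os.environ if env is None else env
    return {
        "user_id": user_id,
        "process_name": process_name,
        "arango_url": env.get("TABLEVAULT_ARANGO_URL", "http://localhost:8529"),
        "arango_db": env.get("TABLEVAULT_ARANGO_DB", "tablevault"),
        "arango_username": env.get("TABLEVAULT_ARANGO_USER", "tablevault_user"),
        "arango_password": env.get("TABLEVAULT_ARANGO_PASSWORD", "tablevault_password"),
        # Vault() defaults new_arango_db to True, which drops an existing database.
        # Reconnect to existing data unless a fresh database is explicitly requested.
        "new_arango_db": env.get("TABLEVAULT_NEW_DB", "0") == "1",
    }
```

Usage, assuming a working ArangoDB setup: `vault = Vault(**vault_kwargs_from_env("alice", "train_run"))`.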

Create Functions

Functions for creating new item lists.

`create_file_list`

create_file_list(item_name: str) -> None

Create a new file list.

Parameters:

Name	Type	Description
`item_name`	`str`	Unique name for the file list

`create_document_list`

create_document_list(item_name: str) -> None

Create a new document list.

Parameters:

Name	Type	Description
`item_name`	`str`	Unique name for the document list

`create_embedding_list`

create_embedding_list(item_name: str, ndim: int) -> None

Create a new embedding list.

Parameters:

Name	Type	Description
`item_name`	`str`	Unique name for the embedding list
`ndim`	`int`	Dimensionality of the embeddings in this list

`create_record_list`

create_record_list(item_name: str, column_names: List[str]) -> None

Create a new record list with specified column names.

Parameters:

Name	Type	Description
`item_name`	`str`	Unique name for the record list
`column_names`	`List[str]`	List of column names for records in this list

`create_description`

create_description(
    item_name: str,
    description: str,
    embedding: List[float],
    description_name: str = "BASE"
) -> None

Add a combined text and embedding description to an item list.

Parameters:

Name	Type	Description
`item_name`	`str`	Name of the item to describe
`description`	`str`	Text description of the item
`embedding`	`List[float]`	Embedding vector for the description
`description_name`	`str`	Label for this description (default "BASE")

Append Functions

Functions for appending content to existing item lists.

`append_file`

append_file(
    item_name: str,
    location: str,
    input_items: Optional[InputItems] = None,
    index: Optional[int] = None
) -> None

Append a file reference to a file list.

Parameters:

Name	Type	Description
`item_name`	`str`	Name of the file list to append to
`location`	`str`	File path or location string
`input_items`	`Optional[InputItems]`	Mapping of dependency item key → [start_position, end_position]
`index`	`Optional[int]`	Specific index to insert at (appends to end if None)
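The `input_items` argument records lineage. Each key names an input dependency and maps to the `[start_position, end_position]` range that was used. The validation sketch below rests on two assumptions that this reference does not state: keys are item list names, and a valid range satisfies `0 <= start_position <= end_position`. The helper name is hypothetical.

```python
from typing import Dict, List, Sequence


def check_input_items(input_items: Dict[str, Sequence[int]]) -> Dict[str, List[int]]:
    """Validate an input_items mapping of item name -> [start_position, end_position]."""
    checked: Dict[str, List[int]] = {}
    for name, span in input_items.items():
        if len(span) != 2:
            raise ValueError(f"{name!r}: expected [start_position, end_position], got {span!r}")
        start, end = int(span[0]), int(span[1])
        # Assumed constraint: non-negative range with start not after end
        if start < 0 or end < start:
            raise ValueError(f"{name!r}: invalid range [{start}, {end}]")
        checked[name] = [start, end]
    return checked

# Hypothetical usage ("plots" and "raw_docs" are made-up list names):
# vault.append_file("plots", "out/fig1.png",
#                   input_items=check_input_items({"raw_docs": [0, 120]}))
```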

`append_document`

append_document(
    item_name: str,
    text: str,
    input_items: Optional[InputItems] = None,
    index: Optional[int] = None,
    start_position: Optional[int] = None
) -> None

Append a text chunk to a document list.

Parameters:

Name	Type	Description
`item_name`	`str`	Name of the document list to append to
`text`	`str`	Text content of the document
`input_items`	`Optional[InputItems]`	Mapping of dependency item key → [start_position, end_position]
`index`	`Optional[int]`	Specific index to insert at (appends to end if None)
`start_position`	`Optional[int]`	Character offset at which this chunk starts within the document list

Note

Both index and start_position must be provided together when specifying manual positions.
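When you supply positions manually, consecutive chunks should not leave gaps or overlaps. The sketch below computes `(index, start_position, end_position)` for chunks appended in order. It assumes that `start_position` is a character offset, as in the document-list query results, and that each chunk occupies `len(text)` characters. To continue an existing list, you could seed the helper from `query_item_list`. That step assumes `n_items` is the next free index and `length` is the next free character offset. `chunk_positions` is a hypothetical helper.

```python
from typing import List, Tuple


def chunk_positions(
    chunks: List[str], first_index: int = 0, first_start: int = 0
) -> List[Tuple[int, int, int]]:
    """Return (index, start_position, end_position) for each chunk, laid out contiguously."""
    positions: List[Tuple[int, int, int]] = []
    start = first_start
    for offset, text in enumerate(chunks):
        end = start + len(text)  # end is exclusive: the next chunk starts here
        positions.append((first_index + offset, start, end))
        start = end
    return positions

# Hypothetical usage against an existing list named "notes":
# meta = vault.query_item_list("notes")
# for (idx, start, _end), text in zip(chunk_positions(chunks, meta["n_items"], meta["length"]), chunks):
#     vault.append_document("notes", text, index=idx, start_position=start)
```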

`append_embedding`

append_embedding(
    item_name: str,
    embedding: List[float],
    input_items: Optional[InputItems] = None,
    index: Optional[int] = None,
    build_idx: bool = True,
    index_rebuild_count: int = 10000
) -> None

Append an embedding vector to an embedding list.

Parameters:

Name	Type	Description
`item_name`	`str`	Name of the embedding list to append to
`embedding`	`List[float]`	The embedding vector to store
`input_items`	`Optional[InputItems]`	Mapping of dependency item key → [start_position, end_position]
`index`	`Optional[int]`	Specific index to insert at (appends to end if None)
`build_idx`	`bool`	Whether to rebuild the vector index
`index_rebuild_count`	`int`	Threshold for triggering index rebuild
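`append_embedding` expects a vector whose length equals the `ndim` passed to `create_embedding_list`. The sketch below is an optional client-side guard to run before appending. This reference does not say whether the library performs the same checks itself. The same guard fits `create_description`, using the constructor's `description_embedding_size` as the expected dimension. The helper name is hypothetical.

```python
import math
from typing import List, Sequence


def prepare_embedding(values: Sequence[float], ndim: int) -> List[float]:
    """Convert to a list of floats and check it has ndim finite components."""
    vec = [float(v) for v in values]
    if len(vec) != ndim:
        raise ValueError(f"expected {ndim} dimensions, got {len(vec)}")
    if not all(math.isfinite(v) for v in vec):
        raise ValueError("embedding contains NaN or infinite values")
    return vec

# Hypothetical usage: vault.append_embedding("doc_vecs", prepare_embedding(raw_vec, 1024))
```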

`append_record`

append_record(
    item_name: str,
    record: Dict[str, Any],
    input_items: Optional[InputItems] = None,
    index: Optional[int] = None
) -> None

Append a record (row) to a record list.

Parameters:

Name	Type	Description
`item_name`	`str`	Name of the record list to append to
`record`	`Dict[str, Any]`	Dictionary with column names as keys and values
`input_items`	`Optional[InputItems]`	Mapping of dependency item key → [start_position, end_position]
`index`	`Optional[int]`	Specific index to insert at (appends to end if None)

Note

Top-level dictionary keys must match the initial column names defined when the record list was created.
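Here is a client-side guard for the note above. It assumes that "match" means the record's top-level keys are exactly the list's column names, with none missing and none extra. You can read the column names back from the `column_names` field returned by `query_item_list`. The helper name is hypothetical.

```python
from typing import Any, Dict, List


def check_record(record: Dict[str, Any], column_names: List[str]) -> Dict[str, Any]:
    """Raise if the record's top-level keys differ from column_names; return it in column order."""
    missing = [c for c in column_names if c not in record]
    extra = [k for k in record if k not in column_names]
    if missing or extra:
        raise ValueError(f"missing columns {missing}, unexpected keys {extra}")
    # Reordering keys is cosmetic; nested values are passed through unchanged
    return {c: record[c] for c in column_names}

# Hypothetical usage:
# cols = vault.query_item_list("results")["column_names"]
# vault.append_record("results", check_record({"acc": 0.91, "epoch": 3}, cols))
```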

Operation Management

Functions for managing vault operations and cleanup.

`get_current_operations`

get_current_operations() -> Dict[str, Any]

Get all currently active operations.

Returns: Dictionary of active operation timestamps

`vault_cleanup`

vault_cleanup(
    interval: int = 60,
    selected_timestamps: Optional[List[int]] = None
) -> None

Clean up stale operations, i.e. operations that have been active for longer than `interval` seconds.

Parameters:

Name	Type	Description
`interval`	`int`	Time in seconds after which an operation is considered stale
`selected_timestamps`	`Optional[List[int]]`	If provided, only clean up these specific timestamps

Process Control

Functions for controlling process execution lifecycle.

`checkpoint_execution`

checkpoint_execution() -> None

Mark a safe point in the code at which pending stop and pause requests can take effect. This avoids stopping under undesirable conditions (e.g., while waiting on outgoing API calls).
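One way to apply this is to checkpoint only between units of work that are safe to interrupt. The sketch takes the checkpoint as a plain callable, so it runs without a database. In practice you would pass the active vault's `checkpoint_execution` method, assuming the instance is named `vault`. `run_with_checkpoints` is a hypothetical helper.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_with_checkpoints(
    batches: Iterable[T],
    handle: Callable[[T], R],
    checkpoint: Callable[[], None],
) -> List[R]:
    """Process batches in order, calling checkpoint() only after each batch completes."""
    results: List[R] = []
    for batch in batches:
        # handle() may make outgoing API calls; no checkpoint happens while it runs
        results.append(handle(batch))
        # Safe point: the batch is finished, so a pending pause/stop can act here
        checkpoint()
    return results

# Hypothetical usage: run_with_checkpoints(batches, call_model, vault.checkpoint_execution)
```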

`pause_execution`

pause_execution(process_name: str) -> None

Request to pause another process's execution.

Parameters:

Name	Type	Description
`process_name`	`str`	Name of the process to pause

`stop_execution`

stop_execution(process_name: str) -> None

Request to stop another process's execution.

Parameters:

Name	Type	Description
`process_name`	`str`	Name of the process to stop

`resume_execution`

resume_execution(process_name: str) -> None

Resume a paused process by name.

Parameters:

Name	Type	Description
`process_name`	`str`	Name of the process to resume

Note

Currently works only when the processes run on the same machine or in the same container.

Delete Functions

Functions for deleting item lists.

`delete_list`

delete_list(item_name: str) -> None

Delete an item list's content.

Parameters:

Name	Type	Description
`item_name`	`str`	Name of the item list to delete

Utility Functions

Helper functions for checking vault state.

`has_vector_index`

has_vector_index(ndim: int) -> bool

Check if a vector index exists for embeddings of a given dimension.

Parameters:

Name	Type	Description
`ndim`	`int`	Dimensionality of the embeddings

Returns: True if a vector index exists for this dimension

List Queries

Functions for querying across item lists with filtering and similarity search.

`query_process_list`

query_process_list(
    code_text: Optional[str] = None,
    parent_code_text: Optional[str] = None,
    description_embedding: Optional[List[float]] = None,
    description_text: Optional[str] = None,
    filtered: Optional[List[str]] = None
) -> List[Any]

Query process items. Can optionally filter by descriptions and parent process.

Parameters:

Name	Type	Description
`code_text`	`Optional[str]`	Text to search in process code
`parent_code_text`	`Optional[str]`	Text to search in parent process code
`description_embedding`	`Optional[List[float]]`	Embedding vector for similarity search
`description_text`	`Optional[str]`	Text to search in descriptions
`filtered`	`Optional[List[str]]`	List of process names to restrict search to

Returns: List[List] — one 5-element list per matching process run:

Index	Type	Description
`[0]`	`str`	Process name
`[1]`	`int`	Run index of this process execution
`[2]`	`int`	Start position offset of this run in the process stream
`[3]`	`List[str]`	Matched description names; empty when no description filter applied
`[4]`	`List[[str, int]]`	Matched parent processes as `[process_name, process_index]`; empty when no `parent_code_text` filter applied
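The rows are positional lists, so unpacking them into named fields makes calling code easier to read. The sketch below uses `typing.NamedTuple`, and the sample row in its test is made up. The other list queries return rows with the same five-element shape, so the pattern can be adapted to them; there, `[4]` holds matched processes rather than parent processes.

```python
from typing import Any, List, NamedTuple, Tuple


class ProcessMatch(NamedTuple):
    process_name: str
    run_index: int
    start_position: int
    description_names: List[str]
    parent_processes: List[Tuple[str, int]]


def parse_process_matches(rows: List[List[Any]]) -> List[ProcessMatch]:
    """Convert the 5-element rows returned by query_process_list into named tuples."""
    return [
        ProcessMatch(name, run_index, start, list(descs), [(p, i) for p, i in parents])
        for name, run_index, start, descs, parents in rows
    ]
```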

`query_embedding_list`

query_embedding_list(
    embedding: Optional[List[float]] = None,
    description_embedding: Optional[List[float]] = None,
    description_text: Optional[str] = None,
    code_text: Optional[str] = None,
    filtered: Optional[List[str]] = None,
    use_approx: bool = False
) -> List[Any]

Query embedding items. Can optionally filter by descriptions and by the code of the process that wrote the entries.

Parameters:

Name	Type	Description
`embedding`	`Optional[List[float]]`	Query embedding vector for similarity search
`description_embedding`	`Optional[List[float]]`	Embedding for description similarity
`description_text`	`Optional[str]`	Text to search in descriptions
`code_text`	`Optional[str]`	Text to search in process code
`filtered`	`Optional[List[str]]`	List of embedding names to restrict search to
`use_approx`	`bool`	Use approximate (faster) similarity search

Returns: List[List] — one 5-element list per matching embedding entry:

Index	Type	Description
`[0]`	`str`	Embedding list name
`[1]`	`int`	Position index of the entry within its embedding list
`[2]`	`int`	Numeric start position of the entry
`[3]`	`List[str]`	Matched description names; empty when no description filter applied
`[4]`	`List[[str, int]]`	Matched processes as `[process_name, process_index]`; empty when no `code_text` filter applied
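If you want to fetch the matched entries afterwards, it helps to group the results by list name first. The sketch below does this and keeps the order in which rows were returned; this section does not state how results are ordered. You can then pass each `(name, index)` pair to `query_item_content(name, index=index)`. The helper name is hypothetical.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict, List


def group_by_list(rows: List[List[Any]]) -> Dict[str, List[int]]:
    """Map each list name to the entry indices matched in it, in result order."""
    grouped: DefaultDict[str, List[int]] = defaultdict(list)
    for name, index, _start, _descs, _procs in rows:
        grouped[name].append(index)
    return dict(grouped)
```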

`query_record_list`

query_record_list(
    record_text: Optional[str] = None,
    description_embedding: Optional[List[float]] = None,
    description_text: Optional[str] = None,
    code_text: Optional[str] = None,
    filtered: Optional[List[str]] = None
) -> List[Any]

Query record items. Can optionally filter by descriptions and by the code of the process that wrote the entries.

Parameters:

Name	Type	Description
`record_text`	`Optional[str]`	Text to search in record data
`description_embedding`	`Optional[List[float]]`	Embedding for description similarity
`description_text`	`Optional[str]`	Text to search in descriptions
`code_text`	`Optional[str]`	Text to search in process code
`filtered`	`Optional[List[str]]`	List of record names to restrict search to

Returns: List[List] — one 5-element list per matching record entry:

Index	Type	Description
`[0]`	`str`	Record list name
`[1]`	`int`	Position index of the entry within its record list
`[2]`	`int`	Numeric start position of the entry
`[3]`	`List[str]`	Matched description names; empty when no description filter applied
`[4]`	`List[[str, int]]`	Matched processes as `[process_name, process_index]`; empty when no `code_text` filter applied

`query_document_list`

query_document_list(
    document_text: Optional[str] = None,
    description_embedding: Optional[List[float]] = None,
    description_text: Optional[str] = None,
    code_text: Optional[str] = None,
    filtered: Optional[List[str]] = None
) -> List[Any]

Query document items. Can optionally filter by descriptions and by the code of the process that wrote the entries.

Parameters:

Name	Type	Description
`document_text`	`Optional[str]`	Text to search in document content
`description_embedding`	`Optional[List[float]]`	Embedding for description similarity
`description_text`	`Optional[str]`	Text to search in descriptions
`code_text`	`Optional[str]`	Text to search in process code
`filtered`	`Optional[List[str]]`	List of document names to restrict search to

Returns: List[List] — one 5-element list per matching document chunk:

Index	Type	Description
`[0]`	`str`	Document list name
`[1]`	`int`	Position index of the chunk within its document list
`[2]`	`int`	Character offset where this chunk begins
`[3]`	`List[str]`	Matched description names; empty when no description filter applied
`[4]`	`List[[str, int]]`	Matched processes as `[process_name, process_index]`; empty when no `code_text` filter applied

`query_file_list`

query_file_list(
    description_embedding: Optional[List[float]] = None,
    description_text: Optional[str] = None,
    code_text: Optional[str] = None,
    filtered: Optional[List[str]] = None
) -> List[Any]

Query file items. Can optionally filter by descriptions and by the code of the process that wrote the entries.

Parameters:

Name	Type	Description
`description_embedding`	`Optional[List[float]]`	Embedding for description similarity
`description_text`	`Optional[str]`	Text to search in descriptions
`code_text`	`Optional[str]`	Text to search in process code
`filtered`	`Optional[List[str]]`	List of file names to restrict search to

Returns: List[List] — one 5-element list per matching file entry:

Index	Type	Description
`[0]`	`str`	File list name
`[1]`	`int`	Position index of the file entry within its file list
`[2]`	`int`	Numeric start position of the entry
`[3]`	`List[str]`	Matched description names; empty when no description filter applied
`[4]`	`List[[str, int]]`	Matched processes as `[process_name, process_index]`; empty when no `code_text` filter applied

Basic Queries

Functions for querying individual items and their relationships.

`query_item_content`

query_item_content(
    item_name: str,
    index: Optional[int] = None,
    start_position: Optional[int] = None,
    end_position: Optional[int] = None
) -> Any

Query the content of an item list by index chunk or position range.

Parameters:

Name	Type	Description
`item_name`	`str`	Name of the item list to query
`index`	`Optional[int]`	Specific index chunk to retrieve
`start_position`	`Optional[int]`	Start of position range (if index not specified)
`end_position`	`Optional[int]`	End of position range (if index not specified)

Returns: When index is given, a single item whose type depends on the list type:

List type	Return type	Value
`process_list`	`dict`	`{"text": str, "status": str, "error": str, "start_position": int, "index": int}`
`file_list`	`str`	File location/path
`embedding_list`	`List[float]`	Embedding vector
`document_list`	`str`	Text chunk
`record_list`	`Dict[str, Any]`	Record data keyed by column name

When index is None, a List of the above types for all entries whose position range overlaps [start_position, end_position), sorted by start_position.
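You can model the overlap rule locally to see how it behaves at the boundaries. The sketch assumes each entry also covers a half-open range `[entry_start, entry_end)`, which this reference does not spell out. Under that assumption, an entry that ends exactly at `start_position` is not returned. This illustrates the rule; it is not TableVault's implementation.

```python
from typing import List, Tuple

Entry = Tuple[int, int, str]  # (entry_start, entry_end, value)


def overlapping_entries(entries: List[Entry], start: int, end: int) -> List[Entry]:
    """Select entries whose [entry_start, entry_end) overlaps [start, end), sorted by entry_start."""
    # Two half-open ranges [a, b) and [s, e) overlap iff a < e and s < b
    hits = [e for e in entries if e[0] < end and start < e[1]]
    return sorted(hits, key=lambda e: e[0])
```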

`query_item_names`

query_item_names(item_type: str) -> List[str]

Get all item names of a given collection type.

Parameters:

Name	Type	Description
`item_type`	`str`	Collection type to filter by: `"process_list"`, `"file_list"`, `"embedding_list"`, `"document_list"`, or `"record_list"`

Returns: List[str] — sorted list of item names belonging to that collection type
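Calling `query_item_names` once per collection type gives an inventory of the whole vault. The sketch accepts the query function as a parameter, so you can exercise it with a stand-in. With a live vault you would pass `vault.query_item_names`, assuming the instance is named `vault`. The helper name is hypothetical.

```python
from typing import Callable, Dict, List

ITEM_TYPES = ("process_list", "file_list", "embedding_list", "document_list", "record_list")


def inventory(query_item_names: Callable[[str], List[str]]) -> Dict[str, List[str]]:
    """Map each collection type to the names of the item lists of that type."""
    return {item_type: query_item_names(item_type) for item_type in ITEM_TYPES}
```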

`query_item_type`

query_item_type(item_name: str) -> Optional[str]

Get the collection type of an item by name.

Parameters:

Name	Type	Description
`item_name`	`str`	Name of the item to look up

Returns: The collection type string — one of "process_list", "file_list", "embedding_list", "document_list", or "record_list" — or None if the item does not exist.

`query_item_list`

query_item_list(item_name: str) -> Dict[str, Any]

Get metadata for an item list.

Parameters:

Name	Type	Description
`item_name`	`str`	Name of the item list

Returns: Dict[str, Any] — the list's metadata document. Common fields:

Field	Type	Description
`n_items`	`int`	Number of entries currently in the list
`length`	`int`	Total length/size (entry count for file/record/embedding lists; total character count for document lists)
`deleted`	`int`	Deletion marker (`-1` = not deleted)

Additional fields by list type:

List type	Extra field	Type	Description
`embedding_list`	`n_dim`	`int`	Dimensionality of stored embeddings
`record_list`	`column_names`	`List[str]`	Ordered column names
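You can combine the metadata fields into a quick summary. The sketch assumes the dict contains the fields listed above and that `item_type` comes from `query_item_type`. The `mean_chunk_chars` value relies on `length` being a character count for document lists. The helper name is hypothetical.

```python
from typing import Any, Dict


def summarize_list(meta: Dict[str, Any], item_type: str) -> Dict[str, Any]:
    """Derive convenience values from a query_item_list metadata dict."""
    summary: Dict[str, Any] = {
        "n_items": meta["n_items"],
        "is_deleted": meta["deleted"] != -1,  # -1 means not deleted
    }
    if item_type == "document_list" and meta["n_items"]:
        # For document lists, length is the total character count
        summary["mean_chunk_chars"] = meta["length"] / meta["n_items"]
    if item_type == "embedding_list":
        summary["n_dim"] = meta["n_dim"]
    if item_type == "record_list":
        summary["column_names"] = list(meta["column_names"])
    return summary
```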

`query_item_parent`

query_item_parent(
    item_name: str,
    start_position: Optional[int] = None,
    end_position: Optional[int] = None
) -> List[Any]

Query input dependencies of an item list. Allows optional position filtering.

Parameters:

Name	Type	Description
`item_name`	`str`	Name of the item list
`start_position`	`Optional[int]`	Filter by start position
`end_position`	`Optional[int]`	Filter by end position

Returns: List[List] — one 6-element list per dependency edge in the filtered range:

Index	Type	Description
`[0]`	`int`	Start position of the entry in this item list that has the dependency
`[1]`	`int`	End position of that entry
`[2]`	`str`	Collection type of the input dependency (e.g. `"file_list"`, `"document_list"`)
`[3]`	`str`	Name of the input dependency item list
`[4]`	`int`	Start position within the dependency list
`[5]`	`int`	End position within the dependency list
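Each row's `[3]` names the input list, so repeated calls can walk the lineage upstream. The breadth-first sketch below works at whole-list granularity, ignoring position ranges, and it never visits a list twice. The query is passed in as a callable, and the test uses a made-up dependency graph. `query_item_child` also reports the related list name at `[3]`, so passing it instead walks downstream. The helper name is hypothetical.

```python
from collections import deque
from typing import Any, Callable, Deque, List, Set


def upstream_lists(
    item_name: str, query_item_parent: Callable[[str], List[List[Any]]]
) -> List[str]:
    """Breadth-first walk over input dependencies; returns each ancestor list name once."""
    seen: Set[str] = {item_name}
    order: List[str] = []
    queue: Deque[str] = deque([item_name])
    while queue:
        current = queue.popleft()
        for edge in query_item_parent(current):
            related = edge[3]  # [3] = name of the related item list
            if related not in seen:
                seen.add(related)
                order.append(related)
                queue.append(related)
    return order

# Hypothetical usage: upstream_lists("doc_vecs", vault.query_item_parent)
```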

`query_item_child`

query_item_child(
    item_name: str,
    start_position: Optional[int] = None,
    end_position: Optional[int] = None
) -> List[Any]

Query items that depend on an item list. Allows optional position filtering.

Parameters:

Name	Type	Description
`item_name`	`str`	Name of the item list
`start_position`	`Optional[int]`	Filter by start position
`end_position`	`Optional[int]`	Filter by end position

Returns: List[List] — one 6-element list per outgoing dependency edge in the filtered range:

Index	Type	Description
`[0]`	`int`	Start position of the dependency edge on this item
`[1]`	`int`	End position of the dependency edge on this item
`[2]`	`str`	Collection type of the dependent (child) item list (e.g. `"embedding_list"`, `"record_list"`)
`[3]`	`str`	Name of the child item list
`[4]`	`int`	Start position of the child entry
`[5]`	`int`	End position of the child entry

`query_item_description`

query_item_description(item_name: str) -> List[str]

Get descriptions associated with an item list.

Parameters:

Name	Type	Description
`item_name`	`str`	Name of the item list

Returns: List[List] — one 2-element list per description attached to this item:

Index	Type	Description
`[0]`	`str`	Description label (e.g. `"BASE"`)
`[1]`	`str`	Full text of the description

`query_item_creation_process`

query_item_creation_process(item_name: str) -> List[Dict[str, Any]]

Get the process that created an item list.

Parameters:

Name	Type	Description
`item_name`	`str`	Name of the item list

Returns: List[Dict[str, Any]] — one dict per creating process:

Key	Type	Description
`process_id`	`str`	ArangoDB document ID of the creating process (e.g. `"process_list/my_process"`)
`index`	`int`	Run index at which the item was created
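ArangoDB document IDs have the form `collection/key`, so you can split a `process_id` to recover the process name. The sketch assumes the key is the process name, as in the example above. The helper name is hypothetical.

```python
from typing import Tuple


def split_document_id(doc_id: str) -> Tuple[str, str]:
    """Split an ArangoDB document ID ("collection/key") into (collection, key)."""
    collection, sep, key = doc_id.partition("/")
    if not sep or not collection or not key:
        raise ValueError(f"not a collection/key document ID: {doc_id!r}")
    return collection, key
```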

`query_item_process`

query_item_process(
    item_name: str,
    start_position: Optional[int] = None,
    end_position: Optional[int] = None
) -> List[Dict[str, Any]]

Get processes that modified an item list. Can filter by position range within the list.

Parameters:

Name	Type	Description
`item_name`	`str`	Name of the item list
`start_position`	`Optional[int]`	Filter by start position
`end_position`	`Optional[int]`	Filter by end position

Returns: List[Dict[str, Any]] — one dict per process that wrote entries in the filtered range:

Key	Type	Description
`process_id`	`str`	ArangoDB document ID of the process (e.g. `"process_list/my_process"`)
`index`	`int`	Run index at which the entries were written

`query_process_item`

query_process_item(process_name: str) -> List[Dict[str, Any]]

Get all items created or modified by a given process name.

Parameters:

Name	Type	Description
`process_name`	`str`	Name of the process list

Returns: List[Dict[str, Any]] — one dict per item list touched by this process:

Key	Type	Description
`name`	`str`	Name of the item list
`start_position`	`int | None`	Earliest start position written by this process; `None` if not recorded
`end_position`	`int | None`	Latest end position written by this process; `None` if not recorded

Description Queries

Functions for searching across descriptions attached to any item list type.

`query_description`

query_description(
    description_text: str,
    k: int = 500,
    text_analyzer: str = "text_en"
) -> List[Any]

Search descriptions by token match across all data types. All tokens in description_text must match.

Parameters:

Name	Type	Description
`description_text`	`str`	Text to search in descriptions (all tokens must match)
`k`	`int`	Maximum number of results to return
`text_analyzer`	`str`	ArangoSearch analyzer to use for tokenization

Returns: List[List] — one 4-element list per matching description:

Index	Type	Description
`[0]`	`str`	Description label (e.g. `"BASE"`)
`[1]`	`str`	Full text of the description
`[2]`	`str`	Name of the item list this description belongs to
`[3]`	`str`	Collection type of the item list (e.g. `"file_list"`, `"embedding_list"`)

`query_description_embedding`

query_description_embedding(
    embedding: List[float],
    k: int = 500,
    use_approx: bool = False
) -> List[Any]

Search descriptions by embedding similarity across all data types.

Parameters:

Name	Type	Description
`embedding`	`List[float]`	Query embedding vector
`k`	`int`	Maximum number of results to return
`use_approx`	`bool`	Use approximate (faster) nearest-neighbor search when available

Returns: List[List] — one 4-element list per matching description, sorted by descending cosine similarity:

Index	Type	Description
`[0]`	`str`	Description label (e.g. `"BASE"`)
`[1]`	`str`	Full text of the description
`[2]`	`str`	Name of the item list this description belongs to
`[3]`	`str`	Collection type of the item list (e.g. `"file_list"`, `"embedding_list"`)
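To build intuition for the ordering, the sketch below computes cosine similarity in plain Python and ranks candidates by descending score. It can also serve as an exact reference when spot-checking results obtained with `use_approx=True`. It is not TableVault's implementation. The candidate labels and vectors in the test are made up.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of a and b divided by the product of their Euclidean norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / norm


def rank_by_cosine(
    query: Sequence[float], candidates: Iterable[Tuple[str, Sequence[float]]]
) -> List[Tuple[str, float]]:
    """Score each (label, vector) against the query and sort by descending similarity."""
    scored = [(label, cosine_similarity(query, vec)) for label, vec in candidates]
    return sorted(scored, key=lambda s: s[1], reverse=True)
```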