TableReference String Guide
TableReference strings provide a powerful way to dynamically fetch and use data from other table instances, or the current table instance, directly within most string-based fields of your builder YAML files. This allows for highly dynamic and data-driven configurations.
Core Concept: The << ... >> Wrapper
The fundamental syntax for a TableReference string is to enclose the reference string within double angle brackets: << ... >>.
The << ... >> syntax can be applied to any singular string, boolean or numeric entry in your builder definition, such as values in the arguments block, items in the changed_columns list, or even the python_function and code_module names themselves.
The primary exception is the dependencies field, which does not support this dynamic resolution (dependencies inform the system which tables are loaded and must be resolved before TableReference strings are parsed).
Example of broad usage in a builder YAML:
# In an IndexBuilder or ColumnBuilder
builder_type: IndexBuilder
python_function: "<<config_table.builder_function_name[use_case::'main_index']>>" # Dynamic function name
code_module: "<<config_table.builder_module_name>>" # Dynamic module name
changed_columns:
- "<<config_table.primary_output_column>>" # Dynamic column name
- "fixed_secondary_column"
- "<<self.another_dynamic_column_name_source[index]>>" # Column name derived from self
arguments:
source_file_path: "/data/raw/<<config_table.file_name[source_id::'source_A']>>.csv"
lookup_value: "<<lookup_table.value[key::<<self.current_key[index]>>]>>"
static_text_with_ref: "Report for ID: <<self.id[index]>>"
If a field value is entirely a TableReference string (e.g., python_function: "<<config.func_name>>"), the resolved value of the reference will be used directly. If the reference is part of a larger string (e.g., in arguments), the resolved value will be converted to a string and substituted into place.
Anatomy of a TableReference String
Inside the << ... >> wrapper, a TableReference string follows a specific structure to identify the table, an optional instance_id, specific columns, and optional filtering conditions:
All parts (instance_id, columns, conditions) are optional.
Components:
tableName: The name of the table to query.(instance_id): (Optional) Specifies a particular instance of the table..{columns}or.COLUMN: (Optional) Selects specific columns.[conditions]: (Optional) Filters the rows of the table.
1. Table Name (tableName)
- Syntax: A string of alphanumeric characters, underscores (
_), or hyphens (-). -
Special Keyword
self: The keywordselfrefers to the current table instance being processed by the builder. -
Dynamic Table Name: The table name itself can be a nested
TableReferencestring.- Example in a field:
code_module: "<< <<table_map.module_column[type::'etl']>> >>" - Reference string example:
<< <<another_table.config_key[type::'source']>>.data_column >>
- Example in a field:
self Keyword Restrictions
The self keyword can only be used in the argument field.
2. Version ((instance_id))
- Syntax: Instance in parentheses, e.g.,
(base_1748113624_d049944b-8548-46d2-a247-bbf3769fbadc). - Optional: If omitted, TableVault will typically use the latest available instance of the table based on its internal logic.
- Dynamic Version: The version string can be a nested
TableReference.- Example reference string:
my_table(<<version_control_table.active_version[table_name::'my_table']>>)
- Example reference string:
3. Columns (.{columns} or .COLUMN)
- Syntax:
- Single Column: Preceded by a dot (
.), e.g.,.user_id. - Multiple Columns: Preceded by a dot (
.) and enclosed in curly braces{}, with column names separated by commas, e.g.,.{name,email,age}.
- Single Column: Preceded by a dot (
- Optional:
- If omitted, and conditions are present, all columns are available for filtering, and the selected columns depend on the output simplification (see below).
- If omitted, and no conditions are present, the entire DataFrame (or its simplified form) is returned.
- Dynamic Column Names: Column names within the list (or the single column name) can be nested
TableReferences. This is highly relevant for fields likechanged_columnsorprimary_key.- Example in
changed_columns:changed_columns: ["id", "<<config_table.main_data_field_name>>"] - Example reference string for a column name:
my_table.<<config.target_column>> - Example reference string with multiple dynamic columns:
my_table.{id,<<audit_table.tracked_field[user::'admin']>>,status}
- Example in
- Special Case:
self.index: If you use<<self.index>>this specifically resolves to the current row's physical index value during row-wise operations.
4. Conditions ([conditions])
- Syntax: Enclosed in square brackets
[...]. Multiple conditions are separated by commas,. - Optional: If omitted, all rows (of the selected version and columns) are considered.
- Each condition specifies a column to filter on and the criteria.
Condition Types:
-
Equality (
columnName::value):- Filters rows where
columnNameequalsvalue. - Example reference string:
orders.product_id[customer_id::'cust123',status::'shipped'] - The
valueis automatically quoted for string comparisons if not already quoted (e.g.,status::shippedbecomesstatus == 'shipped'). Numerical values are used directly. valuecan be a nested aTableReferencestring:orders.items[user_id::<<user_table.id[username::'jdoe']>>]
- Filters rows where
-
Range (
columnName::start_value:end_value):- Filters rows where
columnNameis greater than or equal tostart_valueAND less thanend_value. - Example reference string:
events.timestamp[timestamp::'2023-01-01T00:00:00':'2023-01-01T23:59:59'] start_valueandend_valuecan be literals or nestedTableReferencestrings. Values are formatted appropriately for comparison based on the column's data type.
- Filters rows where
-
Implicit Index/Contextual Value (
columnName):- Filters rows where
columnNameequals a contextually providedindexvalue (the physical index of the row currently being processed by the builder). - Example: If processing row
101ofself, then the reference<<other_table.data_column[join_key_in_other_table]>>(within an argument) would attempt to find rows inother_tablewherejoin_key_in_other_table == 101. - This is particularly useful for lookups related to the current item in
self. - If
self.some_column[key_column]is used, andindexis defined, it impliesself.some_columnwherekey_column == index.
- Filters rows where
-
Dynamic Keys and Values: All parts of a condition (the column name, the value, start/end values) can be nested
TableReferencestrings.- Example reference string:
my_table[<<config.filter_column>>::<<config.filter_value>>]
- Example reference string:
index Condition
The index keyword can only be used in the arguments key of a row-wise function (when row-wise is set to true).
Nested References
As shown in examples above, any component of a TableReference string —the table name, version string, column names, condition keys, or condition values—can itself be another TableReference stringenclosed in << ... >>. TableVault will resolve the innermost references first and use their results to construct the outer reference before resolving it.
Complex Example (from code, used in an argument):
<<stories.artifact_name[paper_name::<<self.paper_name[index]>>]>>
-
<<self.paper_name[index]>>: Resolves first. It fetches thepaper_namefrom the current row (index) of theselftable. -
Let's say the above yields
'my_research_paper'. -
The outer reference becomes:
<<stories.artifact_name[paper_name::'my_research_paper']>>. -
This then fetches
artifact_namefrom thestoriestable wherepaper_nameis'my_research_paper'.
Resolution and Data Retrieval
- When a builder is executed, TableVault parses these reference strings from the relevant YAML fields.
- It uses an internal cache of DataFrames (for already loaded tables and versions) to retrieve data efficiently.
- For references involving
selfor implicit index conditions, the context of the current row being processed (often an integerindex) is crucial for resolving the correct data. - The recursive parsing handles references within lists, dictionaries, and other nested structures in the YAML, as long as they ultimately resolve to strings or collections of strings where references are found.
Examples of Reference Strings
These examples illustrate the reference string syntax itself. These strings would be placed inside << >> within a suitable YAML field.
-
Fetch a single column from another table:
my_data_table.user_email- Result: A dataframe with a single
user_emailcolumn (might be converted to a list contextually).
- Result: A dataframe with a single
-
Get a specific value using a filter:
users_table.full_name[user_id::'user-007']- Result: A single string representing
full_nameforuser_id.
- Result: A single string representing
-
Get a value from
selfbased on the current row's context (implicit index):self.status[id_column_of_self]- Result: A single boolean representing
self.statuswhereid_column_of_self == index.
- Result: A single boolean representing
-
Reference with a specific version:
app_settings(base_1748275064_5782ef5b-4023-4618-a419-cf921c365c64).timeout_ms- Result: A dataframe with a single
user_emailcolumn that is from thebase_1748275064_5782ef5b-4023-4618-a419-cf921c365c64instance.
- Result: A dataframe with a single
-
Using a range condition:
transactions.amount[transaction_date::'2024-01-01':'2024-01-31']- Result: A dataframe with a single
amountcolumn with transactions between2024-01-01and2024-01-31(if properly sorted). - Note the range is calculated by the physical index.
- Result: A dataframe with a single
-
Nested reference for dynamic filtering:
preferences.setting_value[user_id::<<self.user_identifier[index]>>, setting_key::'theme']- Result: A single string representing
setting_valueforuser_idat current index with thethemekey.
- Result: A single string representing
Error Handling
If a TableReference string is malformed (e.g., unbalanced brackets, illegal characters) or if a reference cannot be resolved at runtime (e.g., table not found, column missing, nested reference fails), a TableReferenceError will typically be raised, halting the builder process. Ensure your references are correct and the data they point to exists.