Default Code Functions API
There are several simple code functions included with the tablevault
library. You can execute them by specifying the right parameters in a YAML Builder
file.
Dataframe Creation
These functions are loaded with code_module: table_generation and is_custom: false. They are meant to be used with an IndexBuilder YAML file.
The YAML Builder listing for each function shows the builder-file arguments specific to that function. You may need to fill in additional arguments that are not shown.
create_paper_table_from_folder
builder_type: IndexBuilder
changed_columns: ['file_name', 'artifact_name', 'original_path']
primary_key: ['file_name']
python_function: create_paper_table_from_folder
code_module: table_generation
arguments:
  folder_dir: str
  copies: int
  artifact_folder: ~ARTIFACT_FOLDER~
  extension: str
is_custom: false
dtypes:
  artifact_name: artifact_string
Scan a directory for files with the given extension, copy each into an artifact directory, and return a table describing every copy.
Parameter | Type | Description |
---|---|---|
folder_dir | str | Folder containing the source PDFs |
copies | int | How many copies per file (≥1) |
artifact_folder | str | Destination directory for the copies |
extension | str | File extension to be filtered |
The resulting DataFrame has three columns:
- file_name: base filename (without extension)
- artifact_name: copied file's name (includes suffixes when copies > 1)
- original_path: path to the original PDF
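The scan-copy-describe flow above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the library's implementation: the function name paper_table_sketch and the numeric suffix scheme (_0, _1, ...) are hypothetical, chosen only to show how the three columns relate to the copied files.

```python
import shutil
import tempfile
from pathlib import Path

import pandas as pd


def paper_table_sketch(folder_dir, copies, artifact_folder, extension):
    """Illustrative stand-in, not tablevault's implementation."""
    if copies < 1:
        raise ValueError("copies must be >= 1")
    ext = extension if extension.startswith(".") else "." + extension
    dest = Path(artifact_folder)
    dest.mkdir(parents=True, exist_ok=True)
    rows = []
    for src in sorted(Path(folder_dir).glob("*" + ext)):
        for i in range(copies):
            # Assumed naming scheme: add a numeric suffix only when copies > 1.
            suffix = f"_{i}" if copies > 1 else ""
            copy_name = f"{src.stem}{suffix}{ext}"
            shutil.copy(src, dest / copy_name)
            rows.append((src.stem, copy_name, str(src)))
    return pd.DataFrame(
        rows, columns=["file_name", "artifact_name", "original_path"]
    )


# Demo on throwaway directories so nothing outside the temp folder is touched.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "papers"
    source.mkdir()
    (source / "a.pdf").write_bytes(b"%PDF-1.4")
    (source / "b.pdf").write_bytes(b"%PDF-1.4")
    (source / "notes.txt").write_text("ignored")  # filtered out by extension
    table = paper_table_sketch(source, 2, Path(tmp) / "artifacts", "pdf")
    copied = sorted(p.name for p in (Path(tmp) / "artifacts").iterdir())
    print(table)
```

In a real builder run, artifact_folder is supplied through the ~ARTIFACT_FOLDER~ placeholder rather than passed by hand.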
create_data_table_from_table
Return a copy of df, with optional truncation or random sampling.
Parameter | Type | Default | Description |
---|---|---|---|
df | pandas.DataFrame | | Source data |
nrows | int | None | Row limit (leave None for all rows) |
random_sample | bool | False | If True, randomly sample nrows from df |
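How the three parameters interact can be shown with plain pandas. The sketch below is an assumption about the behavior described in the table, not the library's code: data_table_sketch is a hypothetical name, and sampling without replacement (capped at the table size) is a guess at what "randomly sample nrows" means.

```python
import pandas as pd


def data_table_sketch(df, nrows=None, random_sample=False):
    """Illustrative stand-in, not tablevault's implementation."""
    if nrows is None:
        # No limit requested: return an independent copy of every row.
        return df.copy()
    if random_sample:
        # Assumption: sample without replacement, capped at the table size.
        return df.sample(n=min(nrows, len(df)))
    # Plain truncation keeps the first nrows rows in their original order.
    return df.head(nrows).copy()


source = pd.DataFrame({"id": range(10), "score": [x * 0.5 for x in range(10)]})
full = data_table_sketch(source)
first_three = data_table_sketch(source, nrows=3)
sampled = data_table_sketch(source, nrows=4, random_sample=True)
```

Because every branch returns a new object, later edits to the result leave the source table untouched.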
create_data_table_from_csv
Load a CSV file into a new DataFrame and return a copy.
Parameter | Type | Description |
---|---|---|
csv_file_path | str | Path to the CSV on disk |
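For a sense of what the loaded table looks like, the following sketch writes a small CSV to a temporary file and reads it back with pandas.read_csv. Whether the library calls read_csv with default options is an assumption; the file contents are made up for the example.

```python
import csv
import os
import tempfile

import pandas as pd

# Write a small CSV to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, newline=""
) as fh:
    writer = csv.writer(fh)
    writer.writerows([["title", "year"], ["Paper A", 2020], ["Paper B", 2023]])
    csv_file_path = fh.name

# The header row becomes the column labels; dtypes are inferred per column.
loaded = pd.read_csv(csv_file_path)
os.remove(csv_file_path)
print(loaded)
```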
create_data_table_from_list
Turn an in-memory Python list into a single-column table.
Parameter | Type | Description |
---|---|---|
vals | list | Values to place in the column |
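The equivalent operation in plain pandas is a one-column DataFrame built from the list. In this sketch the column label "value" is a placeholder; how the library names the column is not stated here.

```python
import pandas as pd

vals = ["alpha", "beta", "gamma"]

# "value" is a placeholder column name chosen for this sketch.
table = pd.DataFrame({"value": vals})
print(table)
```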
Random String Module
random_row_string
Produce a single tuple of random strings, one per name in column_names.
Parameter | Type | Description |
---|---|---|
column_names | list[str] | Column labels that determine tuple length |
**kwargs | unused | Reserved for future options |
Returns: a tuple of len(column_names) 20-character strings, or a single string if there is only one column.
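The return contract (tuple for several columns, bare string for one) can be illustrated with the standard library. This is a sketch under assumptions: random_row_string_sketch is a hypothetical name, and the alphabet of ASCII letters and digits is a guess, since the character set is not specified above.

```python
import random
import string


def random_row_string_sketch(column_names, **kwargs):
    """Illustrative stand-in, not tablevault's implementation."""
    alphabet = string.ascii_letters + string.digits  # assumed character set
    values = tuple(
        "".join(random.choices(alphabet, k=20)) for _ in column_names
    )
    # A single column yields a bare string rather than a 1-tuple.
    return values[0] if len(values) == 1 else values


row = random_row_string_sketch(["title", "author", "venue"])
single = random_row_string_sketch(["title"])
```

Callers that handle both cases need to check whether the result is a str before unpacking it.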