Pipelines reference - Retrievers
This section provides reference documentation for Pipelines Retrievers. It includes information on the functions and views available in the aidb extension related to Retrievers.
Views
aidb.retrievers
The aidb.retrievers
view shows information about the retrievers that have been created in the database.
Column | Type | Description |
---|---|---|
id | integer | |
name | text | Name of the retriever. |
vector_table_name | text | Name of the table where the embeddings are stored. Gets newly created if it doesn’t exist, managed by aidb. |
vector_table_key_column | text | The column that be used to store the key that references the key in source data when computing embeddings. Recommend to use the default and let aidb manage this table. |
vector_table_vector_column | text | The column to store embeddings in. Recommend to use the default and let aidb manage this table. |
model_name | text | Name of the registered model to use for embedding computation and retrievals. |
topk | integer | How many results should be returned during a retrieve by default. Similar to LIMIT in SQL. |
distance_operator | aidb.DistanceOperator | During retrieval, what vector operation should be used to compare the vectors. |
options | jsonb | Currently unused. |
source_type | text | Type of source data the retriever is working with. Can be either 'Table' or 'Volume'. |
source_table_name | regclass | name of the table that has the source data we compute embeddings for, and that we retrieve from. Only applicable to retrievers configured with aidb.register_retriever_for_volume(). |
source_table_data_column | text | column name in the source table that we compute embeddings for. This is also the column that will be returned in retrieve operations. |
source_table_data_column_type | aidb.RetrieverSourceDataFormat | Type of data the retriever working with. Uses type aidb.RetrieverSourceDataFormat . Only relevant for table based retrievers. In the case of a volume based retriever, the format/type information is discovered from the volume. |
source_table_key_column | text | column to use as key for storing the embedding in the vector table. This provides a reference from the embedding to the source data |
source_volume_name | text | Name of the volume to use as a data source. Only applicable to retrievers configured with aidb.register_retriever_for_volume(). |
Types
aidb.DistanceOperator
The aidb.DistanceOperator
type is an enum that represents the distance operators that can be used during retrieval.
Value | Description |
---|---|
L2 | Euclidean distance. |
Inner | Inner product. |
Cosine | Cosine similarity. |
L1 | L1 distance. |
Hamming | Hamming distance. |
Jaccard | Jaccard distance. |
SQL definition:
aidb.RetrieverSourceDataFormat
The aidb.RetrieverSourceDataFormat
type is an enum that represents the data formats that can be used as source data.
Value | Description |
---|---|
Text | Text data. |
Image | Image data. |
PDF data. |
SQL definition:
Functions
aidb.register_retriever_for_table
Registers a retriever for a given table.
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
p_name | TEXT | Required | Name of the retriever |
p_model_name | TEXT | Required | Name of the registered model to use |
p_source_table_name | regclass | Required | Name of the table to use as source |
p_source_table_data_column | TEXT | Required | Column name in source table to use |
p_source_table_data_column_type | aidb.RetrieverSourceDataFormat | Required | Type of data in that column ("Text"."Image","PDF") |
p_source_table_key_column | TEXT | 'id' | Column to use as key to reference the rows |
p_vector_table_name | TEXT | NULL | |
p_vector_table_vector_column | TEXT | 'embeddings' | |
p_vector_table_key_column | TEXT | 'id' | |
p_topk | INTEGER | 1 | |
p_distance_operator | aidb.distanceoperator | 'L2' | |
p_options | JSONB | '{}'::JSONB | Options |
Example
aidb.register_retriever_for_volume
Registers a retriever for a given PGFS volume.
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
p_name | TEXT | Required | Name of the retriever. |
p_model_name | TEXT | Required | Name of the model. |
p_source_volume_name | TEXT | Required | Name of the volume. |
p_vector_table_name | TEXT | NULL | Name of the vector table. |
p_vector_table_vector_column | TEXT | 'embeddings' | Name of the vector column. |
p_vector_table_key_column | TEXT | 'id' | Name of the key column. |
p_topk | INTEGER | 1 | Number of results to return. |
p_distance_operator | aidb.distanceoperator | 'L2' | Distance operator. |
p_options | JSONB | '{}'::JSONB | Options. |
Example
aidb.enable_auto_embedding_for_table
Enables automatic embedding generation for a given table.
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
p_name | TEXT | Name of registered table which should have auto-embedding enabled. |
Example
aidb.disable_auto_embedding_for_table
Enables automatic embedding generation for a given table.
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
p_name | TEXT | Name of registered table which should have auto_embedding disabled. |
Example
aidb.bulk_embedding
Generates embeddings for all data in a given table if there is existing data in the table.
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
retriever_name | TEXT | Name of retriever which which should have embeddings generated. |
Example
aidb.retrieve_key
Retrieves a key from matching embeddings without looking up the source data.
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
retriever_name | TEXT | Name of retriever which should be used for retrieval. | |
query_string | TEXT | Query string to be used for retrieval. | |
number_of_results | INTEGER | 0 | Number of results to be returned. |
Example
aidb.retrieve_text
Retrieves the source text data from matching embeddings by joining the embeddings with the source table.
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
retriever_name | TEXT | Name of retriever which should be used for retrieval. | |
query_string | TEXT | Query string to be used for retrieval. | |
number_of_results | INTEGER | 0 | Number of results to be returned. |
Returns
Column | Type | Description |
---|---|---|
key | text | Key of the retrieved data. |
value | text | Value of the retrieved data. |
distance | double precision | Distance of the retrieved data from the query. |
Example
aidb.delete_retriever
Deletes only the retriever's configuration from the database.
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
retriever_name | TEXT | Name of retriever which should be deleted. |
Example
aidb.create_volume
Creates a volume from a PGFS storage location for use as a data source in retrievers.
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
name | TEXT | Name of the volume to be created. | |
server_name | TEXT | Name of the storage location to be used for the volume. | |
path | TEXT | Path to the volume in the storage location. | |
mime_type | TEXT | Type of the data in the volume (Text or Image) |
Example
aidb.list_volumes
Lists all the volumes that have been created in the database.
Example
aidb.drop_volume
Drops a volume from the database.
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
volume_name | TEXT | Name of the volume to be dropped. |
Example
Could this page be better? Report a problem or suggest an addition!