Pipelines reference - Retrievers

This section provides reference documentation for Pipelines Retrievers. It includes information on the functions and views available in the aidb extension related to Retrievers.

Views

aidb.retrievers

The aidb.retrievers view shows information about the retrievers that have been created in the database.

ColumnTypeDescription
idinteger
nametextName of the retriever.
vector_table_nametextName of the table where the embeddings are stored. Gets newly created if it doesn’t exist, managed by aidb.
vector_table_key_columntextThe column that be used to store the key that references the key in source data when computing embeddings. Recommend to use the default and let aidb manage this table.
vector_table_vector_columntextThe column to store embeddings in. Recommend to use the default and let aidb manage this table.
model_nametextName of the registered model to use for embedding computation and retrievals.
topkintegerHow many results should be returned during a retrieve by default. Similar to LIMIT in SQL.
distance_operatoraidb.DistanceOperatorDuring retrieval, what vector operation should be used to compare the vectors.
optionsjsonbCurrently unused.
source_typetextType of source data the retriever is working with. Can be either 'Table' or 'Volume'.
source_table_nameregclassname of the table that has the source data we compute embeddings for, and that we retrieve from. Only applicable to retrievers configured with aidb.register_retriever_for_volume().
source_table_data_columntextcolumn name in the source table that we compute embeddings for. This is also the column that will be returned in retrieve operations.
source_table_data_column_typeaidb.RetrieverSourceDataFormatType of data the retriever working with. Uses type aidb.RetrieverSourceDataFormat. Only relevant for table based retrievers. In the case of a volume based retriever, the format/type information is discovered from the volume.
source_table_key_columntextcolumn to use as key for storing the embedding in the vector table. This provides a reference from the embedding to the source data
source_volume_nametextName of the volume to use as a data source. Only applicable to retrievers configured with aidb.register_retriever_for_volume().

Types

aidb.DistanceOperator

The aidb.DistanceOperator type is an enum that represents the distance operators that can be used during retrieval.

ValueDescription
L2Euclidean distance.
InnerInner product.
CosineCosine similarity.
L1L1 distance.
HammingHamming distance.
JaccardJaccard distance.

SQL definition:

CREATE TYPE DistanceOperator AS ENUM (
  'L2',
  'InnerProduct',
  'Cosine',
  'L1',
  'Hamming',
  'Jaccard'
);

aidb.RetrieverSourceDataFormat

The aidb.RetrieverSourceDataFormat type is an enum that represents the data formats that can be used as source data.

ValueDescription
TextText data.
ImageImage data.
PdfPDF data.

SQL definition:

CREATE TYPE RetrieverSourceDataFormat AS ENUM (
  'Text',
  'Image',
  'Pdf'
);

Functions

aidb.register_retriever_for_table

Registers a retriever for a given table.

Parameters

ParameterTypeDefaultDescription
p_nameTEXTRequiredName of the retriever
p_model_nameTEXTRequiredName of the registered model to use
p_source_table_nameregclassRequiredName of the table to use as source
p_source_table_data_columnTEXTRequiredColumn name in source table to use
p_source_table_data_column_typeaidb.RetrieverSourceDataFormatRequiredType of data in that column ("Text"."Image","PDF")
p_source_table_key_columnTEXT'id'Column to use as key to reference the rows
p_vector_table_nameTEXTNULL
p_vector_table_vector_columnTEXT'embeddings'
p_vector_table_key_columnTEXT'id'
p_topkINTEGER1
p_distance_operatoraidb.distanceoperator'L2'
p_optionsJSONB'{}'::JSONBOptions

Example

SELECT aidb.register_retriever_for_table(
               p_name => 'test_retriever',
               p_model_name => 'simple_model',
               p_source_table_name => 'test_source_table',
               p_source_table_data_column => 'content',
               p_source_table_data_column_type => 'Text',
       );

aidb.register_retriever_for_volume

Registers a retriever for a given PGFS volume.

Parameters

ParameterTypeDefaultDescription
p_nameTEXTRequiredName of the retriever.
p_model_nameTEXTRequiredName of the model.
p_source_volume_nameTEXTRequiredName of the volume.
p_vector_table_nameTEXTNULLName of the vector table.
p_vector_table_vector_columnTEXT'embeddings'Name of the vector column.
p_vector_table_key_columnTEXT'id'Name of the key column.
p_topkINTEGER1Number of results to return.
p_distance_operatoraidb.distanceoperator'L2'Distance operator.
p_optionsJSONB'{}'::JSONBOptions.

Example

SELECT aidb.register_retriever_for_volume(
               p_name => 'demo_vol_retriever',
               p_model_name => 'simple_model',
               p_source_volume_name => 'demo_bucket_vol'
       );

aidb.enable_auto_embedding_for_table

Enables automatic embedding generation for a given table.

Parameters

ParameterTypeDefaultDescription
p_nameTEXTName of registered table which should have auto-embedding enabled.

Example

SELECT aidb.enable_auto_embedding_for_table('test_retriever');

aidb.disable_auto_embedding_for_table

Enables automatic embedding generation for a given table.

Parameters

ParameterTypeDefaultDescription
p_nameTEXTName of registered table which should have auto_embedding disabled.

Example

SELECT aidb.enable_auto_embedding_for_table('test_retriever');

aidb.bulk_embedding

Generates embeddings for all data in a given table if there is existing data in the table.

Parameters

ParameterTypeDefaultDescription
retriever_nameTEXTName of retriever which which should have embeddings generated.

Example

edb=# select aidb.bulk_embedding('test_retriever');
Output
INFO:  bulk_embedding_text found 3 rows in retriever test_retriever
 bulk_embedding
----------------

(1 row)

aidb.retrieve_key

Retrieves a key from matching embeddings without looking up the source data.

Parameters

ParameterTypeDefaultDescription
retriever_nameTEXTName of retriever which should be used for retrieval.
query_stringTEXTQuery string to be used for retrieval.
number_of_resultsINTEGER0Number of results to be returned.

Example

SELECT * FROM aidb.retrieve_key('test_retriever', 'shoes', 2);
Output
key  |      distance
-------+--------------------
 43941 | 0.2938963414490189
 19337 | 0.3023805122617119
(2 rows)

aidb.retrieve_text

Retrieves the source text data from matching embeddings by joining the embeddings with the source table.

Parameters

ParameterTypeDefaultDescription
retriever_nameTEXTName of retriever which should be used for retrieval.
query_stringTEXTQuery string to be used for retrieval.
number_of_resultsINTEGER0Number of results to be returned.

Returns

ColumnTypeDescription
keytextKey of the retrieved data.
valuetextValue of the retrieved data.
distancedouble precisionDistance of the retrieved data from the query.

Example

SELECT * FROM aidb.retrieve_text('test_retriever', 'jacket', 2);
Output
key  |                       value                        |      distance
-------+----------------------------------------------------+--------------------
 19337 | United Colors of Benetton Men Stripes Black Jacket | 0.2994317672742334
 55018 | Lakme 3 in 1 Orchid  Aqua Shine Lip Color          | 0.3804609668507203
(2 rows)

aidb.delete_retriever

Deletes only the retriever's configuration from the database.

Parameters

ParameterTypeDefaultDescription
retriever_nameTEXTName of retriever which should be deleted.

Example

select aidb.delete_retriever('test_retriever');
Output
 delete_retriever
------------------

(1 row)

aidb.create_volume

Creates a volume from a PGFS storage location for use as a data source in retrievers.

Parameters

ParameterTypeDefaultDescription
nameTEXTName of the volume to be created.
server_nameTEXTName of the storage location to be used for the volume.
pathTEXTPath to the volume in the storage location.
mime_typeTEXTType of the data in the volume (Text or Image)

Example

select aidb.create_volume('demo_bucket_vol', 'demo_bucket', 'demo_bucket/demo_folder', 'Text');
Output
 create_volume
---------------
(1 row)

aidb.list_volumes

Lists all the volumes that have been created in the database.

Example

select * from aidb.list_volumes();

aidb.drop_volume

Drops a volume from the database.

Parameters

ParameterTypeDefaultDescription
volume_nameTEXTName of the volume to be dropped.

Example

select aidb.drop_volume('demo_bucket_vol');
Output
 drop_volume
-------------
(1 row)

Could this page be better? Report a problem or suggest an addition!