update to V0.12
Added support for the Embedding API: vectorize input text, accepting either a single string or a list of strings.
Embedding (get the embedding vector of the text)
__init__(self, API_Key='', Proxy='', Model='text-embedding-ada-002', URL='https://api.openai.com/v1/embeddings', Debug=False)
Initialize the Embedding object with the following parameters:
API_Key: your OpenAI API Key
Proxy: if needed, set your HTTP proxy server, e.g. http://192.168.3.1:3128
Model: if needed, set a different model according to the OpenAI API documentation.
URL: if OpenAI changes the API endpoint, you can change it here.
Debug: whether to print the error message on a network or call error; the default is not to print.
embed(data)
data: the string or list of strings to be embedded
The return value is a list of embedding vectors (1536 dimensions for text-embedding-ada-002), one per input string:
For a single input string, use ret[0].get('embedding') to get its vector.
For a list of input strings, get the list of vectors with [i.get('embedding') for i in ret].
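To make the two access patterns above concrete, here is a small sketch using a mocked return value; the short 3-dimensional vectors stand in for the real 1536-dimensional ones, so no API call is needed:

```python
# Mocked embed() return value: a list of dicts, one per input string,
# each carrying its vector under the 'embedding' key.
ret = [
    {'embedding': [0.1, 0.2, 0.3]},
    {'embedding': [0.4, 0.5, 0.6]},
]

# Single input string: the result list has one element.
vec = ret[0].get('embedding')

# List of input strings: collect every vector in order.
vecs = [i.get('embedding') for i in ret]
print(len(vecs))  # 2
```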
Statistics
Call_cnt: the cumulative number of Embedding API calls
Total_tokens: the cumulative number of tokens consumed (Note: OpenAI bills embeddings by token count)
Simple example
import tinyOpenAI
import numpy as np

e = tinyOpenAI.Embedding('your OpenAI API_Key', Debug=True)
r = e.embed('just for fun')
print('vector dimension:', len(r[0].get('embedding')))

# Compare the similarity of two texts
r = e.embed(['just for fun', 'hello world.'])
print('Similarity result:', np.dot(r[0].get('embedding'), r[1].get('embedding')))
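A note on the dot product above: text-embedding-ada-002 vectors are normalized to unit length, so the plain dot product already equals the cosine similarity. For vectors that are not unit-length, divide by the norms; a minimal sketch (the helper name is illustrative, not part of tinyOpenAI):

```python
import numpy as np

def cosine_similarity(a, b):
    # Dividing by the norms keeps the result in [-1, 1] even for
    # non-unit vectors; for unit-length embeddings this reduces to
    # the plain dot product used in the example above.
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
```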
Ref:
OpenAI Embeddings Doc