Hi all,
I am trying to use the following code:
# Imports inferred from the aliases used below
import pandas as pd
import weaviate
import weaviate.classes.config as wc
from tqdm import tqdm
from weaviate.util import generate_uuid5

# client = connect_to_weaviate()  # helper not shown here; overridden by the next line
client = weaviate.connect_to_local()
df = pd.read_json("mtcars.json", lines=True)
column_names = df.columns

# Declare every column as a TEXT property
properties_list = []
for column_name in column_names:
    properties_list.append(wc.Property(name=column_name, data_type=wc.DataType.TEXT))
print(properties_list)

client.collections.create(
    name="Cars",
    properties=properties_list,
    # Define the vectorizer module
    vectorizer_config=wc.Configure.Vectorizer.text2vec_openai(),
    # Define the generative module
    generative_config=wc.Configure.Generative.openai(),
)
# Get the collection
cars = client.collections.get("Cars")

# Enter context manager
with cars.batch.dynamic() as batch:
    # Loop through the data
    for i, car in tqdm(df.iterrows()):
        # Build the object payload, storing every value as a string
        car_obj = {}
        for value in column_names:
            car_obj[value] = str(car[value])
        # Add object to batch queue
        batch.add_object(
            properties=car_obj,
            uuid=generate_uuid5(car["model"]),
            # references=reference_obj  # You can add references here
        )
        # Batcher automatically sends batches

# Check for failed objects
if len(cars.batch.failed_objects) > 0:
    print(f"Failed to import {len(cars.batch.failed_objects)} objects")
but I got an error asking me to provide
headers = {"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")}  # Replace with your OpenAI API key
Can we create a collection and import the dataset without using headers or third-party API keys?
Thanks
Hi @chiranjithazra,
Welcome to our community and it’s great to have you here.
You may need to set the vectorizer to none on the collection level. Here’s how you can do it:
import weaviate.classes as wvc

questions = client.collections.create(
    "Question",
    vectorizer_config=wvc.config.Configure.Vectorizer.none(),
)
Regards,
Mohamed Shahin
Weaviate Support
Hi @Mohamed_Shahin
Thanks for your response.
But if I use
questions = client.collections.create(
    "Question",
    vectorizer_config=wvc.config.Configure.Vectorizer.none(),
)
then
import weaviate.classes.query as wq

response = cars.query.near_text(
    query="Merc",
    limit=2,
    return_metadata=wq.MetadataQuery(distance=True),
)
for o in response.objects:
    print(o.properties)
    print(o.metadata.distance)
does not work. It fails with this error:
vectorize params: could not vectorize input for collection DipsCarsTest with search-type nearText. Make sure a vectorizer module is configured for this collection
Can you help me with this?
Thanks
Hi @chiranjithazra,
I apologize for misreading your earlier post regarding near_text(). It requires a vectorizer, because a query string such as “Merc” has to be vectorized before a similarity search can run. Initially, I thought you were using near_vector().
With the vectorizer set to none (bring your own vectors), you can search with near_vector() without a vectorizer module. However, if you use near_text() and pass a string, that string has to be vectorized before it can be used for vector similarity search.
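A minimal sketch of that bring-your-own-vectors flow, assuming the v4 Python client used earlier in this thread. The helper names import_with_vectors and search_by_vector are hypothetical; they only rely on a collection object exposing batch.dynamic(), batch.add_object(..., vector=...) and query.near_vector(...). Where the vectors come from is up to you.

```python
from typing import Any, Mapping, Sequence


def import_with_vectors(collection: Any,
                        rows: Sequence[Mapping[str, str]],
                        vectors: Sequence[Sequence[float]]) -> int:
    """Add each row together with its precomputed vector; return the row count."""
    if len(rows) != len(vectors):
        raise ValueError("need exactly one vector per row")
    with collection.batch.dynamic() as batch:
        for row, vector in zip(rows, vectors):
            # vector= supplies the embedding, so no vectorizer module is needed
            batch.add_object(properties=dict(row), vector=list(vector))
    return len(rows)


def search_by_vector(collection: Any,
                     query_vector: Sequence[float],
                     limit: int = 2) -> Any:
    """Similarity search with a query vector computed outside Weaviate."""
    return collection.query.near_vector(near_vector=list(query_vector), limit=limit)
```

With cars = client.collections.get("Cars") on a collection created with Vectorizer.none(), you would call import_with_vectors(cars, rows, vectors) once and then search_by_vector(cars, query_vector) for each query.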
So, think of it like this: if you vectorize “Merc” elsewhere, producing a vector of the same length as the vectors stored in the DB, and then pass those floating-point numbers into a near_vector() search, it works just like near_text(), because you have provided the vector for that string yourself.
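To make the idea concrete without a Weaviate server or an API key, here is a toy sketch. toy_embed is a made-up character-trigram hashing embedder (a stand-in for a real embedding model, not something Weaviate provides), and nearest is a brute-force cosine-distance search playing the role of near_vector(). The only point is that the query string and the stored texts go through the same embedder and come out as vectors of the same length.

```python
import hashlib
import math
from typing import Dict, List


def toy_embed(text: str, dim: int = 64) -> List[float]:
    """Hypothetical embedder: hash character trigrams into `dim` buckets, then L2-normalize."""
    vec = [0.0] * dim
    padded = f"  {text.lower()} "
    for i in range(len(padded) - 2):
        trigram = padded[i:i + 3]
        bucket = int(hashlib.md5(trigram.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine_distance(a: List[float], b: List[float]) -> float:
    """Cosine distance for vectors that are already unit length."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def nearest(query_vec: List[float], stored: Dict[str, List[float]], limit: int = 2) -> List[str]:
    """Brute-force stand-in for near_vector(): rank stored vectors by distance."""
    ranked = sorted(stored.items(), key=lambda kv: cosine_distance(query_vec, kv[1]))
    return [name for name, _ in ranked[:limit]]


# "Import": every stored text is embedded once, with the same embedder and dimension.
models = ["Merc 240D", "Merc 230", "Toyota Corolla", "Fiat 128"]
stored = {m: toy_embed(m) for m in models}

# "near_text": embed the query string yourself, then run the vector search.
print(nearest(toy_embed("Merc"), stored))
```

The two Merc entries should rank first here, since they contain every trigram of the query. In a real setup you would replace toy_embed with whatever embedding model produced your stored vectors.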
Does that make sense?