Filtering equals does not perform an equality comparison

Description

As shown below, the equal filter does not perform a strict equality check; it seems to do a similarity or "contains" comparison instead.
In my input I specify that url must be equal to “https://www.cise.ufl.edu”, and yet I get back an item with “https://www.cise.ufl.edu/wp-content/uploads/2018/12/UofFIABCharter5.0.pdf”.
INPUT

from weaviate.classes.query import Filter

collection = client.collections.get(name="web_retrieval")
# response = collection.query.near_text(
#     query="CISE academic advisors are an invaluable resource for students at all stages of their academic careers.",
#     limit=5,
#     filters=[]
# )
response = collection.query.fetch_objects(
    filters=Filter.by_property("url").equal("https://www.cise.ufl.edu"),
    limit=10000
)

print(len(response.objects))
print(response.objects[0].properties)
print(response.objects[1].properties)
print(response.objects[2000].properties)

OUTPUT

8988
{'content': 'UF Website Listing Accessibility Text-Only Version Privacy Policy Regulations', 'depth': 1.0, 'url': 'https://www.cise.ufl.edu'}
{'content': 'University of Florida Search Submit Search  \nComputer & Information Science & Engineering  \nSearch Submit Search  \nHome About Research Academics Admissions People News & Events Resources & Help  \nAccreditation Department History Department Administration Industrial Advisory Board (IAB) Faculty Openings  \nResearch Areas Research Centers & Labs Faculty Research Videos  \nUndergraduate Graduate Academic Advising Special Topics Courses Course Syllabi Labs & Conference Rooms Availability  \nUndergraduate Graduate  \nFaculty Staff PH.D. Students on the Job Market  \nNews Newsletters CISE Career Fair Events Submit an Event  \nStudents Faculty & Staff IT Help Department Awards Departmental Committees Alumni & Friends  \nCurrent Students Faculty & Staff Family & Visitors Alumni & Friends Herbert Wertheim College of Engineering CISE Career Fair Giving Alumni & Friends UF Admissions Directory **Faculty Openings**  \nFacebook Twitter YouTube LinkedIn Instagram  \nWelcome', 'depth': 1.0, 'url': 'https://www.cise.ufl.edu'}
{'content': 'e', 'depth': 2.0, 'url': 'https://www.cise.ufl.edu/wp-content/uploads/2018/12/UofFIABCharter5.0.pdf'}

Server Setup Information

Any additional Information

Hi @CakeCrusher !

What is the tokenization you have set for the url field?

If you have not touched this configuration, it will default to WORD.
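To see why WORD would produce the result above, here is a minimal pure-Python sketch. It does not call Weaviate. It assumes that word tokenization lowercases the value and splits it on non-alphanumeric characters, and that an equal filter on a tokenized text property matches when every token of the query is present in the stored value. The function names are made up for illustration, and the real implementation may differ in details.

```python
import re


def tokenize_word(value: str) -> list[str]:
    # Approximation of "word": lowercase, split on non-alphanumeric characters.
    return [tok for tok in re.split(r"[^0-9a-z]+", value.lower()) if tok]


def tokenize_field(value: str) -> list[str]:
    # Approximation of "field": the whole (trimmed) value is a single token.
    return [value.strip()]


def equal_matches(stored: str, query: str, tokenizer) -> bool:
    # Assumed filter behaviour: every query token must appear in the stored value.
    stored_tokens = set(tokenizer(stored))
    return all(tok in stored_tokens for tok in tokenizer(query))


query = "https://www.cise.ufl.edu"
pdf_url = "https://www.cise.ufl.edu/wp-content/uploads/2018/12/UofFIABCharter5.0.pdf"

print(tokenize_word(query))                           # ['https', 'www', 'cise', 'ufl', 'edu']
print(equal_matches(pdf_url, query, tokenize_word))   # True: all five words occur in the PDF URL
print(equal_matches(pdf_url, query, tokenize_field))  # False: the whole strings differ
```

Under these assumptions, the PDF URL matches with word tokenization because it contains all five words of the query, but it does not match when the value is treated as a single token.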

You can check this with:

for p in collection.config.get().properties:
    print(p.name, p.tokenization)

There are different tokenizations you can set. You will probably want field, as it will not touch the property value and will treat it as a whole. On the other hand, word will “break” the property value into words, which would explain why longer URLs that contain the same words are also matched.
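As a rough sketch of what that could look like with the v4 Python client: the helper below creates a collection whose url property uses field tokenization. The helper name and the idea of creating a separate collection are assumptions for illustration. The property list is guessed from the output in the question, and any vectorizer settings your existing collection has are left out. As far as I know, tokenization cannot be changed on an existing property, so you would need to create a new collection and re-import the data.

```python
import weaviate
from weaviate.classes.config import DataType, Property, Tokenization


def create_collection_with_field_url(client: weaviate.WeaviateClient, name: str):
    """Create a collection whose url property is indexed as a single token.

    Hypothetical helper: the properties mirror the ones visible in the
    question's output (content, depth, url).
    """
    return client.collections.create(
        name=name,
        properties=[
            Property(name="content", data_type=DataType.TEXT),
            Property(name="depth", data_type=DataType.NUMBER),
            Property(
                name="url",
                data_type=DataType.TEXT,
                # Keep the whole URL as one token so equal compares full values.
                tokenization=Tokenization.FIELD,
            ),
        ],
    )
```

With an already connected client you would call it with a collection name of your choice, re-import your objects, and then run the same Filter.by_property("url").equal(...) query against the new collection.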

Here we have more info on tokenization:

and here is an academy course that goes deeper into that:

Let me know if this helps!

Thanks!