Ash Vardanian

@ashvardanian

1,342
Followers
629
Following
279
Media
1,065
Statuses

Founder @unum_cloud | Exascale Search & AI

San Francisco, CA
Joined June 2012
Pinned Tweet
@ashvardanian
Ash Vardanian
3 months
We are releasing a new tiny VLM 🎉 The model is smaller and much more accurate than our previous @unum_cloud "uform-gen" downloaded 100K times/mo The new decoder is only 0.5 billion parameters and the model is already available on @huggingface 🤗 🧵
Tweet media one
2
17
85
@ashvardanian
Ash Vardanian
6 months
Cosine Similarity: - 2'500x Faster than 🐍 - 12x faster than C I've combined AVX-512 VNNI extensions with BMI2 and masked loads, all of those seem to be surprisingly rare in open-source
5
11
132
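For reference, the quantity being accelerated in the post above is ordinary cosine similarity. Below is a minimal pure-Python sketch of the scalar baseline. The function name `cosine_similarity` and the zero-vector convention are assumptions made for illustration, not the API of any library mentioned in this feed.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric sequences (illustrative helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption of this sketch: treat similarity with a zero vector as 0.
        return 0.0
    return dot / (norm_a * norm_b)
```

A SIMD kernel computes the same three sums (dot product and two squared norms) in one pass over the data; this sketch makes three passes for readability.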
@ashvardanian
Ash Vardanian
6 months
I may be wrong here, but is my open-source library 1000x more cost-efficient than a closed-source product that has recently raised over $100M?
Tweet media one
4
12
74
@ashvardanian
Ash Vardanian
1 year
We are open-sourcing UForm, a Multi-Modal inference lib for Semantic Search! It extends @huggingface transformers with Mid-Fusion and comes with a light-weight multi-lingual pre-trained Vision-Language model, beating @OpenAI CLIP in speed and accuracy 🥳
2
12
68
@ashvardanian
Ash Vardanian
2 months
StringZilla is trending on @github 🎉 The world wastes at least $100M a year due to low CPU utilization in basic string operations - let's fix them together 🤗
Tweet media one
1
7
58
@ashvardanian
Ash Vardanian
6 months
Billion Scale Vector Search on 1 server. Over 200,000 queries/second. is probably half of that throughput.
Tweet media one
2
4
56
@ashvardanian
Ash Vardanian
1 month
Indexing and searching 40 Million English Wikipedia pages on a laptop? 13'000 insertions/second 26'000 queries/second😎 Recipe: pull binary @cohere text embeddings from @huggingface , take numerics and search from @unum_cloud , and a laptop with 32 GB of RAM 🧵
Tweet media one
2
4
57
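The recipe above relies on binary embeddings, which are usually compared with Hamming distance. Here is a minimal sketch of that idea in pure Python. It assumes each float is reduced to one bit by thresholding at zero, which may differ from how the embeddings in the post were actually produced. The names `to_bits` and `hamming` are hypothetical.

```python
def to_bits(vector):
    """Pack a float vector into an int: bit i is set when vector[i] > 0.

    Thresholding at zero is an assumption of this sketch.
    """
    bits = 0
    for i, value in enumerate(vector):
        if value > 0:
            bits |= 1 << i
    return bits


def hamming(a, b):
    """Number of differing bits between two packed bit vectors."""
    return bin(a ^ b).count("1")
```

Each dimension shrinks from 32 bits to 1, which is why tens of millions of vectors can fit in laptop memory.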
@ashvardanian
Ash Vardanian
10 months
Thanks for sharing! Stealing this diagram for future talks 😅
Tweet media one
@MarkCallaghanDB
Mark Callaghan
10 months
Yet another great LeanStore paper
1
16
84
1
12
57
@ashvardanian
Ash Vardanian
6 months
Big News 📢 I've partnered with @awscloud Open Data to release 28 billion molecule embeddings - potentially the largest public #cheminformatics dataset, searchable with @unum_cloud USearch, to accelerate AI-powered drug discovery and broader science ⚗️
6
14
53
@ashvardanian
Ash Vardanian
1 year
Last week was 🔥 for vector search. Weaviate raised $50M, and Pinecone raised $100M... That's a lot and makes you believe that vector search is hard. But it's not. I have spent the last few days implementing a single-file vector search engine... 🧵 1/7
@unum_cloud
Unum
1 year
We are excited to share a single-header C++11 vector-search engine with SDKs for #Python , #JavaScript , #Java , #Rust , and soon #GoLang & #Wolfram ! It's tiny and easy to use, but still brings half-precision & integer quantization and scales beyond 4B points!
1
7
35
2
5
44
@ashvardanian
Ash Vardanian
5 months
If you need a hardware-accelerated Vector Search on iOS devices - check out USearch 😉 - already used by publishers with 1M+ installations, here is a Swift demo app:
@Karmedge
Robert Lukoszko — e/acc
5 months
Apple has started showing/leaking some of their LLM developments in public. during last few weeks I saw this: - WebGPU now available for testing in Safari Technology Preview - "MLX" – array framework for Apple Silicon - "LLM in a flash" paper to run efficiently on device -…
Tweet media one
8
42
251
1
6
40
@ashvardanian
Ash Vardanian
3 months
UForm-Gen free #opensource ~1.5B parameter model vs Google Gemini, who'd win? Already 70k downloads from @huggingface this month! Definitely a cherry pick on our side, but I'm still proud of @unum_cloud and excited for the next models. Spotted by @or_vov 🤗
Tweet media one
@skalskip92
SkalskiP
3 months
Gemini Advanced vs. GPT-4V? I did some tests, and Gemini Advanced is behind other LMMs. At least when it comes to image processing. - Only one question about the image. - No multi-image reasoning. - “Sorry, I can't help with images of people yet.” ↓ Read more
Tweet media one
20
18
148
1
6
36
@ashvardanian
Ash Vardanian
2 months
CUDA and C++ Hackers gathering at @AGIHouseSF today 🚀🚀🚀 Thanks to @khoomeik , @kylejohnmorris and @JvNixon for hosting 🤗 Should we do it every month?
Tweet media one
5
5
33
@ashvardanian
Ash Vardanian
3 months
@CramerTracker Nobody has to choose between a CPU and a GPU - everyone should get both 😅
3
0
34
@ashvardanian
Ash Vardanian
1 month
This is no joke - the most portable approximate vector search engine is getting even more portable. Over 100M devices reached already 😎 Thanks to @ngalstyan4 , @sroussey , and @simonw 🤗
Tweet media one
2
2
31
@ashvardanian
Ash Vardanian
10 months
@LukeGessler This can be improved by replacing GZip with different compression, and can be easily combined with something like HNSW to perform log-time-search. With USearch you can also store texts already compressed, and decompress+recompress on the fly while computing distances 😉
1
1
30
@ashvardanian
Ash Vardanian
28 days
This week StringZilla 🦖 received a new much friendlier mascot and improved 🦀 Rust support! And if it wasn’t enough to start using it, Google is also switching to Arm-based CPUs for cloud. 50% faster search than LibC on x86 250% faster on Arm 💪
2
2
30
@ashvardanian
Ash Vardanian
1 month
SimSIMD crosses 100 kernels and 300K PyPI downloads 🎉 The C99 core also reaches 50K git-clones per month, but what about Rust and JavaScript bindings? I guess computing dot-products and cosine distances should be universally important across languages 🤷‍♂️
Tweet media one
2
1
26
@ashvardanian
Ash Vardanian
4 months
Just a couple of hours since the @unum_cloud /uform-gen release, and the community already created a @huggingface gradio example for image captioning🔥 👏 @albfresco
2
8
25
@ashvardanian
Ash Vardanian
1 year
Huge congrats to our fellow Armenian colleagues at @Picsart , currently ranking first on HackerNews with their recently released text-to-video models!
Tweet media one
0
3
27
@ashvardanian
Ash Vardanian
3 months
USearch v3.9 brings SQLite extensions for Vector, Full-Text, and GIS Search 🎉 It can handle BLOBs, TEXTs, JSONs, bitsets, and arbitrary scalar columns analyzed together. It exposes all kinds of distances from Cosine to Levenshtein - from measuring the distance between AI model…
Tweet media one
4
4
27
@ashvardanian
Ash Vardanian
3 months
Vector Search in @ClickHouse is getting faster and faster - from 33 seconds to 0.064 seconds for 33 million 768d vectors.. is it a 515x performance improvement?! 🤯 Proud to see such applications of our tech - @unum_cloud USearch🚀
Tweet media one
3
0
27
@ashvardanian
Ash Vardanian
5 months
@jimkxa @whatifalthist ... Space exploration? Gravitational telescopes?
0
0
24
@ashvardanian
Ash Vardanian
2 months
Microbenchmarks are fine, but what about real-world applications? I've patched the @huggingface datasets library, common in AI training pipelines, to use StringZilla's SIMD-based iterators. It resulted in a 3x throughput increase, so it's probably worth pushing upstream 🤔
Tweet media one
1
4
25
@ashvardanian
Ash Vardanian
8 months
USearch v2.6 is out with ~3-6x faster exact search than FAISS, even without my SimSIMD lib 🤯 Pros, what speed difference in construction and search would you anticipate for a 1 Billion point HNSW index in USearch vs FAISS? 10x? 100x?
Tweet media one
3
2
24
@ashvardanian
Ash Vardanian
11 days
What a day it was! While @peshotrie was presenting his work in Boston at RECOMB 2024, our team was accelerating DNA sequence alignment in San Francisco 🌁 We've patched 3 #opensource libraries and published a new dataset 🤯 Let's hope it accelerates pharma R&D and we all get to…
Tweet media one
@AGIHouseSF
AGI House SF
11 days
@khoomeik DNA-Zilla - accelerating DNA sequence matching with A* heuristics Overheard some of their conversations…really interesting hardcore algorithms work and all shipped open source @ashvardanian @alexbowe @Adriano34554795
Tweet media one
1
1
6
1
0
24
@ashvardanian
Ash Vardanian
4 months
Wow! People are actually using UCall (70x faster FastAPI) and it seems to work fine - over 100K downloads already! The main branch is over 100 commits behind and quite outdated, but seems like I must find a way to grow our team to support all these 😅
1
4
24
@ashvardanian
Ash Vardanian
3 months
We also now have a much more complete @Gradio space to browse different examples and play with the model 🤗
Tweet media one
@ashvardanian
Ash Vardanian
3 months
We are releasing a new tiny VLM 🎉 The model is smaller and much more accurate than our previous @unum_cloud "uform-gen" downloaded 100K times/mo The new decoder is only 0.5 billion parameters and the model is already available on @huggingface 🤗 🧵
Tweet media one
2
17
85
0
3
21
@ashvardanian
Ash Vardanian
2 months
I was waiting for @AMD Bergamo chips for many years… But this Genoa-X variant sounds insane. With over 1 GB of L3 cache and 192 threads, you can literally fit an entire Alpine Linux Docker image on every vCPU… with every image cached on the CPU itself… not in RAM 🤯
Tweet media one
@ashvardanian
Ash Vardanian
2 months
Choosing between Intel and @AMD CPUs? Considering AVX512 support and cost-efficiency? Intel first implemented AVX512 in 2016. AMD recently added those in Zen4 CPUs you can get in AWS "7a" instances. So I've benchmarked several SIMD kernels and here are my first observations 🧵
Tweet media one
1
1
7
1
1
21
@ashvardanian
Ash Vardanian
5 months
Oh, nice! SimSIMD-related article is trending on the first page of HackerNews 🎉
Tweet media one
1
2
22
@ashvardanian
Ash Vardanian
5 months
Google has published a competitive neural network with less than 1 million parameters! It's designed for efficient large-scale deduplication and similarity search, was published with a new multi-lingual W4NT3D benchmark for text retrieval in an adversarial setting, and uses…
Tweet media one
Tweet media two
Tweet media three
Tweet media four
0
3
20
@ashvardanian
Ash Vardanian
25 days
Apple has over 2 billion devices worldwide. Most of them are equipped with powerful GPUs. Time to bring multimodal AI to every pocket. Time to bring UForm to 🍏
@AlexReibman
Alex Reibman 🖇️
25 days
12/ UForm 4 Swift Family of pocket-sized multimodal AI models (small enough to run on watches) re-written in ONNX and ported to Swift @ashvardanian
Tweet media one
Tweet media two
1
4
14
3
5
20
@ashvardanian
Ash Vardanian
5 months
@timur_audio Developer experience. Haven't tried vcpkg, but submitting to Conan is an unintuitive manual process. For PyPi and NPM it takes me just 1 CI action to upload a release: CMake fetch is almost ok for header-only stuff, but I'll never get used to its syntax
0
0
20
@ashvardanian
Ash Vardanian
8 months
Typical “build it and they will come” scenario: 🦖 #opensource #python #substring #search
Tweet media one
1
3
19
@ashvardanian
Ash Vardanian
3 months
What a spike in downloads yesterday?! @unum_cloud UForm-Gen multimodal transformer crosses 15K downloads from @huggingface in the first month 🥳 In the meantime, we have 5 other different-size multimodal models in the training - it's gonna be a very packed February!
Tweet media one
1
1
17
@ashvardanian
Ash Vardanian
4 months
StringZilla v3 is starting to look serious! Crushing the C++ standard library by 3-20x is one thing... But beating LibC on real-world data is what really made my morning 🔥 The development branch:
Tweet media one
1
2
18
@ashvardanian
Ash Vardanian
5 months
Here is a simple example showcasing the cost of `if` statements in code. Up to 4096 branches - all memorized, no branching happens and the expression evaluates in only 2 CPU cycles. After that on average ~10 cycles are wasted per iteration. 5x slowdown.
Tweet media one
@ashvardanian
Ash Vardanian
5 months
HPC gurus, I'm harvesting short code snippets that would trouble the CPU's branch predictor. Any suggestions? This variant only adds around 5 cycles on average:
Tweet media one
1
1
6
1
0
18
@ashvardanian
Ash Vardanian
2 years
We went through life with a smile. Now I am smiling through tears, alone. Yesterday was the memorial service. Yesterday I was sitting next to a coffin, with the love of my life and my child in it. I must tell their story.
Tweet media one
2
3
18
@ashvardanian
Ash Vardanian
1 year
With this new website theme my old “Mastering C++ with Google Benchmark” beginner tutorial looks better than ever! Especially the code highlighting and the ability to reference specific lines in Markdown code sections!
3
2
17
@ashvardanian
Ash Vardanian
3 months
Compared to the C++ standard library on @Arm : - 4.4x faster substring search - 16.8x faster in reverse order - 1.7x faster sorting + vastly more functionality! Are you using @awscloud Graviton, @AmpereComputing Altra, @Azure Cobalt, @Nvidia Grace... You need StringZilla 😉
Tweet media one
@ashvardanian
Ash Vardanian
3 months
Bindings for C++, Rust, Swift, AVX-512, Levenshtein Distance & Needleman-Wunsch Scores, Rolling Fingerprints, Lazy Ranges, Faster Sorting... Yes, StringZilla v3 is out! 🦖3️⃣🎉 Special thanks to @keithmadams , @vatsal_manot , & @MikayelGrigory3 🤗
Tweet media one
0
5
15
0
3
17
@ashvardanian
Ash Vardanian
2 months
New post - NumPy vs BLAS: Losing 90% of Throughput. I've compared the performance of NumPy against its underlying OpenBLAS implementation, recording up to 10x throughput difference in dot-products of 1536-dimensional vectors, the size that @OpenAI Ada produces
1
3
16
@ashvardanian
Ash Vardanian
3 months
Beating standard Linux utilities - 3x faster `wc`, 4x faster `split` 😎 I've used StringZilla Python SDK to prototype replacements for popular CLI tools that deal with large texts. Initial results look great. Any Python and Shell lovers willing to join?
Tweet media one
3
2
16
@ashvardanian
Ash Vardanian
4 months
Yes, it was supposed to be about the company and its story, but I still managed to bring up SIMD, and how we use Arm SVE to go beyond "fast SIMD libraries" with SSE support and compiler auto-vectorization pragmas... no cure for me 😅 Thanks to the amazing @cerebral_valley team!
@cerebral_valley
Cerebral Valley
4 months
We're live with our first CV Deep Dive, featuring @ashvardanian and @unum_cloud : "How USearch, @ashvardanian 's Vector-Search Engine, Reached 500k+ Python Downloads 🌐" Ash covers OSS, multi-modal and more... Share and RT if you found it valuable 👀 Link below 👇
1
2
11
0
3
15
@ashvardanian
Ash Vardanian
1 month
The amazing world of Python. Load 2 GB of floats - 77 seconds. Downcast them to bits - 53 seconds. The frustration - priceless. For everything else there is MasterCard...
Tweet media one
0
0
16
@ashvardanian
Ash Vardanian
2 months
OMG! TensorFlow and JAX call latency is insane, totally inappropriate for small vector-vector operations. JAX is 6x faster than TF, Torch is 10x faster than JAX, NumPy is 10x faster than Torch, And SimSIMD is another 2x faster than NumPy 😎
Tweet media one
3
6
16
@ashvardanian
Ash Vardanian
6 months
USearch just crossed 1000 ⭐️ and SimSIMD is turning 3 months old 🎉 Thanks to everyone for upvotes and spreading the word! With major releases in @unum_cloud UCall and StringZilla coming this December - it's my biggest year in opensource since registering on #github in 2012
Tweet media one
1
2
16
@ashvardanian
Ash Vardanian
6 months
It blows my mind, that even on `double`-precision floating point numbers NumPy and SciPy (using OpenBLAS) are so slow! Cosine distance in SimSIMD: - 9x faster on Intel Sapphire Rapids (thx to AVX-512) - 18x faster on AWS Graviton 3 (thx to SVE)
1
0
16
@ashvardanian
Ash Vardanian
2 years
My talk was on the basics of using Google Benchmark, with examples of misaligned memory loads, C++ attributes, `-ffast-math`, and more!
Tweet media one
@ashvardanian
Ash Vardanian
2 years
Yesterday we had an amazing monthly gathering at C++🇦🇲 with talks exclusively on benchmarking, both from @Siemens and @unum_cloud 100+ attendees, cinema hall and 🍿popcorn, what else can one desire? 😂 #cpp #meetup
Tweet media one
2
2
8
1
3
15
@ashvardanian
Ash Vardanian
6 months
Two things I didn't know about #FAISS : 1. It also supports clustering and down-casting to f16/i8. 2. It has good performance at start, but as you grow datasets - it plummets. USearch is easily 10x faster at scale. also 10x smaller binary. and has bindings to 10 languages 😎
@ashvardanian
Ash Vardanian
6 months
Billion Scale Vector Search on 1 server. Over 200,000 queries/second. is probably half of that throughput.
Tweet media one
2
4
56
2
3
15
@ashvardanian
Ash Vardanian
7 months
USearch - the fastest Vector Search engine crossing 300,000 #Python downloads 🎉 Assuming the AI wave is about to hit every software developer, there are also bindings for JS, Rust, Go, Java, ObjC, Swift, C#, etc. Just need to find a way to popularize them 🤦‍♂️
Tweet media one
1
0
15
@ashvardanian
Ash Vardanian
3 months
Bindings for C++, Rust, Swift, AVX-512, Levenshtein Distance & Needleman-Wunsch Scores, Rolling Fingerprints, Lazy Ranges, Faster Sorting... Yes, StringZilla v3 is out! 🦖3️⃣🎉 Special thanks to @keithmadams , @vatsal_manot , & @MikayelGrigory3 🤗
Tweet media one
0
5
15
@ashvardanian
Ash Vardanian
1 month
Microsoft, Salesforce, and @unum_cloud are behind some of the @huggingface 's most trending multimodal models. David and Goliath, AI edition 2024 💪
Tweet media one
1
3
15
@ashvardanian
Ash Vardanian
10 days
I didn't realize ~42% of M3 Max area is a GPU! With 92 billion transistors in a chip, the GPU part is comparable to a 4090 Sadly, our UForm models can't yet access the NPUs for matrix multiplications, and I am curious if there is any piece of software using AMX. Seems redundant
Tweet media one
0
0
15
@ashvardanian
Ash Vardanian
5 months
I've made Great Circle distance computations 4x faster compared to LibC-based on Arm Not a typical SIMD case, only 4 floats & 3 trig. functions Working on hybrid GIS/Vector/Text search in USearch now
Tweet media one
1
2
14
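As a reference point for the Great Circle post above, here is a minimal sketch of the haversine formula in pure Python. It takes four floats (two latitude/longitude pairs) and uses three trigonometric functions (`sin`, `cos`, `asin`), matching the shape of the problem described. It is a textbook formulation, not the optimized kernel from the post, and the mean Earth radius constant is an assumption of this sketch.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to guard against rounding pushing h slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))
```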
@ashvardanian
Ash Vardanian
5 months
@Karmedge There has been a ton of buzz around MLX, but it seems to be a vanilla high-level wrapper around their DSP SDK and Metal API. My expectations were much higher. If you are really interested in doing serious compute on Apple devices, check this:
1
1
14
@ashvardanian
Ash Vardanian
1 month
USearch has had filtering support (predicate callbacks) in its low-level engine for a long time. Now, I am exposing them to C and Rust APIs 🥳 PR: PS: ChatGPT and other LLMs are so useless when it comes to well documented behaviors, that aren't used by…
0
1
14
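To illustrate what a predicate callback means in the post above, here is a minimal brute-force sketch in Python: candidates are rejected by a user-supplied function before they can enter the result set. It is not USearch's API and does not use an HNSW graph. The function `filtered_search` and its parameters are hypothetical.

```python
import heapq


def filtered_search(query, vectors, k, predicate):
    """Return keys of the k nearest vectors whose key passes `predicate`.

    `vectors` maps keys to equal-length numeric sequences. This is an
    exhaustive scan with squared Euclidean distance, for illustration only.
    """
    def squared_distance(vector):
        return sum((q - v) ** 2 for q, v in zip(query, vector))

    allowed = ((squared_distance(vector), key)
               for key, vector in vectors.items()
               if predicate(key))
    return [key for _, key in heapq.nsmallest(k, allowed)]
```

Filtering during the scan, rather than after it, guarantees up to `k` results whenever at least `k` entries satisfy the predicate.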
@ashvardanian
Ash Vardanian
4 months
StringZilla is getting Rust and Swift bindings, as well as dynamic runtime dispatch packaged into a tiny C library without the LibC dependency 🎉 More benchmarks coming!
Tweet media one
0
3
13
@ashvardanian
Ash Vardanian
3 months
Systems Hackathon Alert 🎉 February 24th, San Francisco. Full day gathering for those who build AI training frameworks, compilers, databases, search engines, and other infra! No LLM wrappers this time 😉 Co-organized by the @AGIHouseSF and @CppBayArea
1
4
14
@ashvardanian
Ash Vardanian
3 months
I've benchmarked one of the most popular #Rust libraries `memchr` (200M downloads) against `stringzilla` 🦖 Won 7 benchmarks out of 8. I was hoping for a clear win, but apparently I still have things to optimize on Arm 💪 The benchmarks are very easy to repeat:
Tweet media one
2
0
14
@ashvardanian
Ash Vardanian
1 month
This is sick. A decade has passed since that post, but it doesn't seem like we've made much positive progress. @lemire has a great point as usual 🤗 So if you are by any chance using my "usearch-molecules" dataset for drug discovery and broader biotech (or any other of my open…
Tweet media one
@lemire
Daniel Lemire
1 month
@ashvardanian Do we need patents?
0
1
5
0
3
13
@ashvardanian
Ash Vardanian
1 year
New phase for @unum_cloud ! After years of building, we start integrations and UForm is just the first step. Outside of Big Tech we are the first ones with end-to-end optimized Embeddings, Databases, and Vector Search, all #opensource !
@FogSim
SimFog
1 year
New GPTCache feature: support for the uform embedding, which can be used for bilingual (English + Chinese) text, thanks to @ashvardanian 's contribution !!!
0
1
5
0
1
8
@ashvardanian
Ash Vardanian
1 month
The appetite for SIMD is growing. This slide is from 8 months ago, from a talk that was just made public. StringZilla ⭐️ 22 -> 1.7K = 70x SimSIMD⭐️ 15 -> 0.7K = 45x USearch ⭐️0.3K -> 1.5K = 5x Over 100 Million devices reached 🥳
Tweet media one
2
0
13
@ashvardanian
Ash Vardanian
3 months
There is an extremely timely article by @lemire on rolling hashes! I'm having a hard time going beyond 1 GB/s/core for 64-bit hashes in StringZilla v3. I've shared a few links under the HN discussion and would definitely appreciate advice 🤗
@lemire
Daniel Lemire
3 months
How fast is rolling Karp-Rabin hashing? A hash function maps values (e.g., strings) into a fixed number of strings, typically smaller than the original. It is useful to compare quickly two long strings, for example. Instead of comparing the strings, you may compare the hash…
Tweet media one
5
15
114
4
1
13
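The rolling Karp-Rabin hashing discussed above can be sketched in a few lines of Python. The hash of each window is derived from the previous one in constant time by removing the outgoing byte and appending the incoming one. The base, the modulus, and the name `rolling_hashes` are assumptions of this sketch, not the parameters used in StringZilla or in the quoted article.

```python
def rolling_hashes(data, window, base=257, modulus=(1 << 61) - 1):
    """Return the polynomial hash of every `window`-byte slice of `data`."""
    if window <= 0 or len(data) < window:
        return []
    current = 0
    for byte in data[:window]:
        current = (current * base + byte) % modulus
    hashes = [current]
    # Weight of the oldest byte in the window: base ** (window - 1).
    top = pow(base, window - 1, modulus)
    for i in range(window, len(data)):
        current = (current - data[i - window] * top) % modulus  # drop oldest byte
        current = (current * base + data[i]) % modulus          # append newest byte
        hashes.append(current)
    return hashes
```

Equal windows always produce equal hashes, so `rolling_hashes(b"abracadabra", 4)` yields the same value for both occurrences of `abra`.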
@ashvardanian
Ash Vardanian
4 months
I love the smell of new AI models baking 🧠
Tweet media one
1
0
13
@ashvardanian
Ash Vardanian
4 months
Apple seems to be producing the best consumer grade chips for AI. We've benchmarked UForm with @BlackForestBoi on RTX 3090 vs two M2 CPUs, and UForm on M2 crushed LLaVA on GPU even in raw throughput, let alone efficiency🤯Anyone with M3 and RTX 4090?
Tweet media one
2
1
13
@ashvardanian
Ash Vardanian
4 months
USearch just crossed 500,000 Python downloads! Enjoy the holidays and join the @unum_cloud AMA on Thursday 10 am 🎉
Tweet media one
3
1
12
@ashvardanian
Ash Vardanian
13 days
I've said it before and will repeat again. More conferences should start with a poker tournament. Kudos to @openoceanvc 🤗
Tweet media one
1
0
12
@ashvardanian
Ash Vardanian
6 months
In the 3 most popular open-source projects I maintain, the next major releases are full rewrites with extended functionality 🤯 It's one of those situations when you are equally excited and scared, esp trying to fit >1M LOC functionality into less than 100K LOC of C99 😰
0
0
12
@ashvardanian
Ash Vardanian
7 months
I have been on LinkedIn for a few years now, and it seems like they've got the ChatGPT preview long before the inception of OpenAI 😅
@thealexker
Alex Ker 🔭
7 months
LinkedIn is about to get a whole lot better 🌤️:
Tweet media one
40
27
336
0
0
12
@ashvardanian
Ash Vardanian
1 year
@RReverser In C for such verbose interfaces we simply use structs as arguments. Should be easy to adopt the same pattern in any high-level language:
0
0
12
@ashvardanian
Ash Vardanian
6 months
I wrote a script and computed the stargazers intersection between some of the most popular projects on @github , across databases, web, AI, and basic utilities including some of my own 😅 1/11🧵
@ashvardanian
Ash Vardanian
6 months
So the question is, how many of those 877 people intersect?
Tweet media one
0
0
8
4
2
12
@ashvardanian
Ash Vardanian
2 months
My Linux Foundation 30m talk on large-scale RAG, search, and small Language Models is live on Youtube, but here are some of my favorite parts with broad implications for the entire industry 🧵
2
3
12
@ashvardanian
Ash Vardanian
3 months
Wow. The new 30x improved pg_vector indexes 1 million vectors in ~10 minutes... The same operation in @unum_cloud USearch can be done in ~10 seconds... so there is at least another 60x of room for optimization in @supabase
@kiwicopple
Paul Copplestone — e/postgres
3 months
@pgvector v0.6.0 was just released. Previously building HNSW indexes took a long time. This release fixes that we benchmarked it against the v0.5.1: it's not just a small improvement - it's up to 30x faster coming soon to @supabase . benchmarks in thread ↓
Tweet media one
7
13
86
4
0
12
@ashvardanian
Ash Vardanian
2 months
Done, SimSIMD dynamic SIMD dispatch now also works on Windows starting with v4.1 🤗 🪟 USearch next!
@ashvardanian
Ash Vardanian
2 months
Weird problem of the day. MSVC lacks the `_mm_rsqrt14_ps` reciprocal square-root intrinsic, but has its masked variant - `_mm_maskz_rsqrt14_ps`. Why?! That is to say, SimSIMD and USearch are about to get dynamic SIMD-kernel dispatch on Windows as well 🤗🪟
Tweet media one
2
0
3
0
2
12
@ashvardanian
Ash Vardanian
27 days
A Freudian Slip in San Francisco The weird things you can find at @cerebral_valley HQ 😅
Tweet media one
0
0
11
@ashvardanian
Ash Vardanian
6 months
Thanks to @ZFPhalanx USearch just crossed 1,000 stars and 90,000 monthly PyPi downloads 🎉🎉🎉 I’m very excited about the next major release combining it with StringZilla for SIMD-accelerated hybrid vector and text search
@ZFPhalanx
phalanx
6 months
The vector search engine usearch is crazy fast
Tweet media one
1
80
639
1
1
11
@ashvardanian
Ash Vardanian
3 months
There are several specialized Levenshtein distance computation libraries in the Python ecosystem. Practically all of them are implemented in C, same as StringZilla... Even without the upcoming assembly optimizations it's already 2-7x faster 🫳🎤
Tweet media one
0
1
11
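For readers unfamiliar with the metric in the post above, here is a minimal sketch of Levenshtein distance using the classic two-row dynamic-programming formulation in pure Python. It is the textbook baseline, not StringZilla's implementation, and the function name `levenshtein` is illustrative.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + (char_a != char_b)  # substitution or match
            ))
        previous = current
    return previous[-1]
```

The running time is proportional to `len(a) * len(b)`, which is why optimized implementations matter for long inputs.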
@ashvardanian
Ash Vardanian
1 year
@lemire Sounds good, don't you think?
1
0
11
@ashvardanian
Ash Vardanian
2 years
C++🇦🇲 group is rapidly growing! Mostly <35yo, 30% women, with lots of members from @synopsys @Siemens @XilinxInc @krispHQ and @unum_cloud , of course! Maybe time to follow @corecpp and @cpp_russia and organize a local conf? #cpp #programming #community
Tweet media one
1
3
11
@ashvardanian
Ash Vardanian
8 months
@jedisct1 Oh, you must pay if you want to see those expected particles expose themselves 😂
0
2
11
@ashvardanian
Ash Vardanian
2 months
Anyone working with big data? I'm merging a huge PR, adding new APIs to StringZilla for lazy dataset processing. In some cases it results in 10x throughput improvement at 20x lower memory consumption even for `split`. Using the new `split_iter` the mem. usage approaches zero 🧵
Tweet media one
3
0
11
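The memory savings described above come from producing parts one at a time instead of materializing a full list. Below is a minimal sketch of that idea as a plain Python generator. The name `lazy_split` is hypothetical and this is not the StringZilla `split_iter` implementation.

```python
def lazy_split(text, separator):
    """Yield the parts of `text` between occurrences of `separator`,
    one at a time, without building the full list of parts."""
    if not separator:
        raise ValueError("empty separator")
    start = 0
    while True:
        end = text.find(separator, start)
        if end == -1:
            yield text[start:]  # final part, possibly empty
            return
        yield text[start:end]
        start = end + len(separator)
```

Consuming it in a `for` loop keeps only one part alive at a time, whereas `text.split(separator)` holds every part in memory simultaneously.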
@ashvardanian
Ash Vardanian
1 month
Unikraft is launching KraftCloud - a low-latency cloud platform built around unikernels - tiny single-purpose bootable application images 🔥
1
4
11
@ashvardanian
Ash Vardanian
6 months
The new documentation is looking good! I am still waiting for @github to allow HTML tabs, so that we fit the QuickStart guide for all 11 programming languages on 1 page.
Tweet media one
1
0
10
@ashvardanian
Ash Vardanian
1 month
Get ready for the next Hardcore Systems Hackathon in San Francisco! C, C++, CUDA, Rust, and other software and hardware engineers, let's gather together at the @AGIHouseSF on May 12th to work on computing infrastructure together 🤗
0
3
11
@ashvardanian
Ash Vardanian
2 months
... the perennial programming quandary: "How do I take good data in string form and painlessly turn it into garbage?" Source: GNU C Library How have I survived without ever using this 😂
Tweet media one
1
1
10
@ashvardanian
Ash Vardanian
4 months
SimSIMD is on the HackerNews main page - probably for the first time 🎉 Thanks to @sroussey for porting it to every JavaScript runtime and spreading the word - go get "hardware-accelerated similarity metrics" from NPM for your LLM applications 🤗 HN:
Tweet media one
1
1
10
@ashvardanian
Ash Vardanian
2 years
Even at war, we stay compassionate to those looking for shelter. Even losing our children, we find the strength to fight, to live, and to love. I am proud to be #Armenian 🇦🇲
Tweet media one
0
1
10
@ashvardanian
Ash Vardanian
4 months
I was attempting to use ternary logic x86 instructions in StringZilla today and failed miserably... but discovered these few macros with precomputed masks on GitHub, in case you ever need them @pshufb also has a project that would help you generate such masks
Tweet media one
1
2
10
@ashvardanian
Ash Vardanian
2 years
1
2
10
@ashvardanian
Ash Vardanian
4 months
The StringZilla C++ binding, replacing the standard <string>, is already 3,500 lines of code 🧠🔫 That’s excluding the C-level SIMD implementation, which is expected to cross 4,500 with NEON and AVX2 added back. Anyone want to add more test cases:
1
1
10
@ashvardanian
Ash Vardanian
21 days
7'000 GitHub stars reached across StringZilla, SimSIMD, and @unum_cloud USearch, UForm, UCall 🎉 Thanks to all the amazing contributors! It's sometimes hard to believe that others would be ready to invest their free time to refactor my crappy CMake scripts 😅
Tweet media one
0
0
10
@ashvardanian
Ash Vardanian
2 months
What's wrong with AI for Science?! I've looked into 3 papers in the last hour. First model was on Google Drive, the second was on Microsoft OneDrive, and the third was on IBM Box 🤯 How about Git LFS? S3 & object stores? Or @huggingface if you aren't into reinventing wheels?
1
0
9
@ashvardanian
Ash Vardanian
2 months
@lin72h We aim to support all modern compilers - GCC, LLVM, and MSVC. LLVM cross-compilation works fine and is used extensively already for our C and C++ code. Zig is a nice language, but we don't really need it.
0
0
1
@ashvardanian
Ash Vardanian
1 year
It was fun, and to my surprise, it performed really well, reaching 300K QPS on Amazon c7g instances. I had never used third-party vector search products, but the first testers of #USearch suggested 3x performance improvement over their existing solutions 🔞 🧵 5/7
Tweet media one
1
0
9