Training a JEPA with randomly sampled projectors enforces more pairwise independence (VICReg on projector(z)); with distillation and layerwise targets it converges a lot quicker than vanilla JEPA in my very early tests.
No collapse issues across a wide set of hyperparameters.
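Roughly the idea, in a toy sketch (not the actual training code; the projection dim, scaling, and loss weighting are all my illustrative choices): apply VICReg's variance and covariance terms to a freshly sampled random projection of the embeddings each step.

```python
import torch

def vc_reg(z: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Variance + covariance terms of VICReg on a batch of embeddings z [B, D]."""
    z = z - z.mean(dim=0)
    std = torch.sqrt(z.var(dim=0) + eps)
    var_loss = torch.relu(1.0 - std).mean()        # keep per-dim std above 1
    cov = (z.T @ z) / (z.shape[0] - 1)             # D x D covariance
    off_diag = cov - torch.diag(torch.diag(cov))
    cov_loss = off_diag.pow(2).sum() / z.shape[1]  # penalize cross-dim covariance
    return var_loss + cov_loss

def random_projector_vcreg(z: torch.Tensor, proj_dim: int = 64) -> torch.Tensor:
    # Sample a fresh random projector every call; gradients flow only to z.
    W = torch.randn(z.shape[-1], proj_dim, device=z.device) / z.shape[-1] ** 0.5
    return vc_reg(z @ W)
```

Because the projector is resampled each step, the regularizer can't be satisfied along any one fixed subspace, which is the intuition behind "more pairwise independence".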
we are so b̶a̶c̶k̶ b-rep.
meet cadmancer, the first in a series of models we will be releasing
you can export the designs it creates, optionally to your favorite cad/cam software (fusion 360 etc), and manufacture it.
very early work - still lots more to do before prime time,
Can one of you really amazing front end developers please build a llm native ui kit. Specifically, UI's are just function calls / tool use dependent on human input. There is no reason we should be stuck in call and response land.
These should be very minimal components
I've been adding continuous latents to Llama 3. I think it's a good method to apply variable compute at inference. But it's also pretty neat for adapting behavior without consuming context length, and continuous latents are easier to backprop through for some guidance post-training. Working
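One hypothetical way to wire this in (an illustrative sketch, not how I actually hooked it into Llama 3): prepend learned continuous vectors to the token embedding sequence, so they condition generation without spending discrete context tokens and stay fully differentiable for guidance.

```python
import torch
import torch.nn as nn

class ContinuousLatentPrefix(nn.Module):
    """Sketch: prepend k learned continuous latent vectors to token embeddings.
    They consume attention compute but no discrete context tokens, and can be
    optimized directly by backprop (e.g. for guidance) post-training."""
    def __init__(self, d_model: int, n_latents: int = 8):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(n_latents, d_model) * 0.02)

    def forward(self, tok_emb: torch.Tensor) -> torch.Tensor:
        # tok_emb: [B, T, D] -> [B, n_latents + T, D]
        b = tok_emb.shape[0]
        prefix = self.latents.unsqueeze(0).expand(b, -1, -1)
        return torch.cat([prefix, tok_emb], dim=1)
```

Varying `n_latents` at inference is one simple lever for variable compute per sequence.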
Everything in this clip is generative apart from the lyrics (we can do this also), vocal synthesis, the drums, bass, melodies, support instruments, Even the audio is mastered by AI. Just a teaser, can't wait to show the rest.
The fact you can train a diffusion process using JEPA without a reconstruction objective is lit.
Things confirmed to work in my weekend hack.
vq-jepa
diffusion jepa
jepa-vae
hierarchical jepa
I haven't really shared this much. But, a couple of years ago (mostly 2017-2018) a couple of other Aussies and I built an Ableton plugin that I think had a bunch of cool ideas in it, ahead of its time from a pure ML perspective. Mech interp and controllable generations
If you're outside of the valley, working on generative models, and having issues with training or stability, or trying to implement a new paper that doesn't have open source code and something isn't converging:
My DMs are open, happy to jump on a 15 minute Discord call and
I am so pumped to see more people entering generative CAD. I need it to exist to do what I want to do, and there are a lot of problems to solve, both technical and in user experience. The things end-to-end design-for-manufacturing unlocks are huge, a damn near infinite market.
I learned to code modding games at like 8, and I remember thinking about computers as this beautiful magical thing that a bunch of really cool people built.
I remember building a tool for diablo 2 to extract/modify sprites and thinking - wow, whoever built this sprite engine is
@levelsio
I am glad you can see exactly what I do. I am building this for our cadmancer model, and training our user experience into the model.
But, there is so much space to explore here and I think there are really incredible people who would smash this out
Guys, my model demo wasn’t meant to be this popular. It says react template in the tab of the playground still. I guess this is why people share stuff.
Nothing beats the feeling of beating a previous SOTA eval by 40% partway through training with a 3x smaller model. (specific to my task)
Multi modal jepa COOKING.
I really like this masking strategy for point-jepa.
Green is the context, Blue is masked (target) Red is the positions that the predictor gets.
Seems to be really stable for learning quite powerful 3D point cloud and high-dimensional representations and embeddings.
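The three-way split described above, in a toy sketch (illustrative, not the exact Point-JEPA recipe; the target fraction is an assumption):

```python
import torch

def jepa_point_masks(n_points: int, target_frac: float = 0.3, generator=None):
    """Split point indices into: context (green, visible to the encoder),
    target (blue, masked and predicted in latent space), and the positions
    handed to the predictor (red) - the target *locations* but not their
    features."""
    perm = torch.randperm(n_points, generator=generator)
    n_target = int(n_points * target_frac)
    target_idx = perm[:n_target]    # blue: masked targets
    context_idx = perm[n_target:]   # green: context
    predictor_pos = target_idx      # red: positions given to the predictor
    return context_idx, target_idx, predictor_pos
```

Giving the predictor only positions (not features) is what forces it to predict the target embeddings from context alone.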
Sometimes I wonder if the reported model drift, and annoying changes in behavior that people report in
@OpenAI
's chatgpt is literally just the date they put in the system prompt acting as a seed and changing behavior in a larger than expected way.
Llama 3 with "low supportiveness" direction
---
If it's not working, try harder.
---
You're just not cut out for this: Honestly, you're probably just not meant to be fit. You should just accept that and move on.
---
You're just not good enough: Let's face it, you're probably just
Quickly whipped up a PyTorch implementation of Bayesian Flow Networks from
@rupspace
Have completed the discrete loss, would love a quick sanity check.
This post went really well, I am now so excited by the reality of so many small teams working on really awesome stuff all over the world. Pretty impressed with the quality of thinking.
The wide distribution of these skills and random tidbits is super important.
There needs to be an open source model (with data pipelines etc) with a substantial (32b+) compute allocation that gets trained regularly. The amount of training tricks on the table if it was just a pooled effort would be huge.
The whole "surprise model release" is great for
@gazorp5
We do train on some OpenSCAD, but that would suck as a general CAD kernel long term. We also train on b-rep primitives, which is basically a big graph of parametric geometry nodes, plus a bunch of other stuff and basically anything else CAD related that we can. Then fine-tune for a
This is sick work. I think this, plus the state space models, builds a lot of confidence in my thinking that all you need is fast weight gates and skip connections. That is the cool thing about transformers; it's why Mamba works.
My prediction is this line of work, and variable
Hello all of you silicon valley people looking at my twitter. I am glad you liked the demo, you should invest in Brisbane Australia because there is some dope shit here. Lets fucking go.
This time last year we had just started settling in Los Angeles for
@techstars
music. I have been ridiculously privileged to work with
@mawsonguy
and a team filled with the smartest people I know and I am excited to be working with the folks
@khoslaventures
.
It brings me unending happiness to see the genuine reactions musicians have when jamming with the tech my team and I have built. Every head bob is like a shot of Oxytocin & serotonin.
@soumithchintala
@nnaisense
@srush_nlp
Spent the morning grokking it.
Started working on an implementation - have completed the discrete version. I plan to reproduce all the plots and experiments.
The relationship between JEPA's predictor and curiosity driven learning in reinforcement learning / agentic models is pretty interesting.
Schmidhuber and a few other works propose curiosity as a reward in a number of reinforcement learning settings, often in cases with sparse
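The core connection in a toy sketch (all names and interfaces illustrative): the intrinsic "curiosity" reward is just the predictor's error in latent space, which is exactly the quantity a JEPA predictor is trained to minimize.

```python
import torch

def curiosity_reward(predictor, encode, obs, action, next_obs):
    """Intrinsic reward as latent prediction error: a JEPA-style predictor
    guesses the next embedding from the current one; where it is wrong, the
    world is 'novel', so the agent is rewarded for going there."""
    with torch.no_grad():
        z, z_next = encode(obs), encode(next_obs)
        z_pred = predictor(z, action)
        return (z_pred - z_next).pow(2).mean(dim=-1)  # per-sample error
```

A perfectly predicted transition yields zero intrinsic reward, so the bonus naturally decays as the predictor learns.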
I would love having a grounded conversation / dialectic about the following subjects from other people who are working on them.
Gonna make a twitter space or setup a lil discord call. Reply below/dm if you are interested in having any of these conversations with me. If you have
I am considering starting a small research project, training text models on a single h100. Something small enough to train from scratch in 24 hours. I think I have a lot of architectural ideas and we are probably over partitioned for people working on raw decoder improvements.
Maybe something interesting cooking, eval in the morning, tis tiny. Per token variational autoregressive diffusion. If it works, maybe a good formulation for dynamic compute per token.
Pretty interesting that a transformer with one shared layer works at all. This is a "single layer" transformer, stacked depth-wise 22 times, so all the layers have shared weights. Part of a prelim experiment exploring memory/compute trade-offs.
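The whole trick in a few lines (toy sketch; dims and depth are illustrative): one layer's parameters, applied repeatedly, so depth costs compute but not parameter memory.

```python
import torch
import torch.nn as nn

class SharedLayerTransformer(nn.Module):
    """One transformer encoder layer applied n_steps times with tied weights,
    trading parameter memory (1 layer's worth) for repeated compute."""
    def __init__(self, d_model: int = 256, n_heads: int = 4, n_steps: int = 22):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.n_steps = n_steps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for _ in range(self.n_steps):
            x = self.layer(x)  # same weights at every "depth" step
        return x
```

Compared to a 22-layer stack, this holds 1/22 of the layer parameters while doing the same number of forward passes through a layer.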
One of the underrated free lunches you can get while training VAEs is to use something like GECO instead of manually setting your beta term. It will tune your beta to hit a target KL, and leave your recon doing whatever it can whilst maintaining that KL.
A useful way to kinda
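The multiplier update, roughly (a sketch of the target-KL variant described above, not the exact GECO paper formulation, which constrains reconstruction error instead; the step size is an illustrative choice):

```python
import math

def geco_beta_update(beta: float, kl: float, target_kl: float,
                     step_size: float = 0.01) -> float:
    """Nudge beta toward satisfying the KL constraint: raise it when the
    measured KL exceeds the target, lower it otherwise. Updating in
    log-space keeps beta positive."""
    constraint = kl - target_kl
    return beta * math.exp(step_size * constraint)
```

Called once per training step (typically on a moving average of the KL), this replaces hand-tuned beta schedules with a single interpretable knob: the target KL.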
With Splash Pro we are on a mission to get you out of your creative rut, and all future ruts, using crazy math and engineering. Here is a sneak peek at an Ableton Live plugin that starts to do that using collaborative AI. RT for early beta access!
You might think I am excited for the new generation of hardware for training models, or new interesting architectures like state space models, or explorations into joint embedding predictive models, efficient fine-tuning, and things like fast feedforward networks improving inference
@jm_alexia
Because if they are using certain types of normalization, the bias would be normalized away by the layer norm (etc). So it becomes an extra parameter that does nothing during training but take up memory.
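A tiny demo of the mean-subtraction piece of that argument: a constant shift added right before a LayerNorm (without affine parameters) is removed exactly, so that component of the bias is a dead parameter.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
ln = nn.LayerNorm(16, elementwise_affine=False)  # no learned scale/shift
lin = nn.Linear(32, 16)
x = torch.randn(4, 32)

with torch.no_grad():
    y_before = ln(lin(x))
    lin.bias.add_(3.0)   # shift every output feature by the same constant
    y_after = ln(lin(x))

# LayerNorm subtracts the per-token mean, so the constant shift vanishes.
assert torch.allclose(y_before, y_after, atol=1e-5)
```

This is why frameworks like Llama drop these biases entirely: the normalization eats the redundant component, and what's left rarely earns its memory.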
Stop finetuning, keep pretraining. Sample pretraining data based on being from a similar distribution of your task. Mix in wider tasks and code to prevent squishing all of the interesting stuff out of your likelihoods.
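One illustrative way to build that sampling distribution (the embeddings, temperature, and uniform floor are all my assumptions, not a prescribed recipe): weight pretraining documents by similarity to your task data, then mix in a uniform floor so broad data and code still appear.

```python
import torch
import torch.nn.functional as F

def similarity_weights(doc_embs: torch.Tensor, task_emb: torch.Tensor,
                       temperature: float = 0.1,
                       mix_uniform: float = 0.2) -> torch.Tensor:
    """Sampling weights over pretraining docs: softmax of cosine similarity
    to a task centroid, blended with a uniform distribution so the wider
    mixture isn't squeezed out."""
    sims = F.cosine_similarity(doc_embs, task_emb.unsqueeze(0), dim=-1)
    p = torch.softmax(sims / temperature, dim=0)
    uniform = torch.full_like(p, 1.0 / p.numel())
    return (1.0 - mix_uniform) * p + mix_uniform * uniform
```

Sampling batches from these weights is "keep pretraining" with the data tilted toward your task, rather than a separate fine-tuning stage on task data alone.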
Just had a fun vision of a potential model that could emerge. This is my favorite way the internet of ai could work. I don't know if it's likely but it sounds based.
Everyone exposes some access to their weights or logits, everyone uses a massive mixture of experts, like DNS for
Things we fixed
A) 3 cases of data issues (2 of which were normalization related)
B) 1 set of exploding gradients from a GAN distillation
C) 3 Paper interpretation issues
D) 1 masking and indexing issue
E) 1 gpu memory issue related to deepspeed and fsdp
A stupid thing that works for fine-tuning task-specific models.
Write a bunch of pre-prompts that you use in training. Run all of these over your data. Calculate the loss (masking out the pre-prompt itself)
Train a decision transformer where you use the loss as your reward
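The scoring step from above, sketched (the model interface, vocab, and masking index are illustrative assumptions): compute the loss over answer tokens only, with the pre-prompt masked out, and use its negative as the reward.

```python
import torch
import torch.nn.functional as F

def preprompt_reward(model, prompt_ids: torch.Tensor,
                     answer_ids: torch.Tensor) -> torch.Tensor:
    """Score a pre-prompt by the model's loss on the answer tokens only.
    Lower loss -> higher reward for the return-conditioned
    (decision-transformer-style) policy over pre-prompts."""
    ids = torch.cat([prompt_ids, answer_ids], dim=-1).unsqueeze(0)  # [1, T]
    logits = model(ids)  # assumed interface: [1, T, vocab]
    # next-token shift: position t predicts token t+1
    shift_logits = logits[:, :-1]
    shift_labels = ids[:, 1:].clone()
    shift_labels[:, : prompt_ids.numel() - 1] = -100  # mask out the pre-prompt
    loss = F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=-100,
    )
    return -loss  # reward = negative masked loss
```

Running this over every (pre-prompt, example) pair gives the reward signal the decision transformer conditions on.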
New mesh model draft in training, targeting 3d printing.
So, it's a hardware/mechanical distribution of data. It has a bunch of potential improvements over meshgpt (released last week), and should scale to large meshes. Targeting 32k faces as a first milestone (meshgpt
@YannickScholich
Candidly, I want to open source it under a super liberal license. Just chuck it up as mit/apache. But, I really need to feel confident that I can support the team working on it long term. So, there is still stuff up in the air. I am avoiding raising venture capital if I don't
Mawson is hiring 10 more people to work in Deep Learning AI. Math, Physics, CompSci or Electrical Engineering grads preferred. They are based in Fortitude Valley, Brisbane. Media and Entertainment industry focus. Send resume to info@mawson.io
Well, this is popping off. I am exploring ideas like this and more. I am a big believer that product development and many types of ai capability research have converged and you should no longer treat them as separate things. And in fact much of the best research is in the search
The fastest way to close the gap between open and closed models is a dedicated open compute cluster of ~10k H100s that the best projects can timeshare.
Hi New Followers, Glad you liked the jepa thought vector stuff.
Don't expect research papers or that level of rigor from me, I am not a deep learning researcher.
I am a capabilities engineer, I have just been making GPUs go brrr for a long time.