Pleased to announce Calliar, the first online dataset for Arabic Calligraphy. Joint work with
@_MagedSaeed_
@alwaridi
and Yousif Al-Wajih.
Paper:
Code & data:
Colab:
Pleased to announce fast image-to-image translation in the browser, with 3 trained models included. I also released a processed dataset of 1,000 images for edges2cats translation.
Demo:
Code:
Gradient Flow: a new notebook that explains automatic differentiation using eager execution in
@TensorFlow
. I go over computational graphs, vector-valued functions, gradients, etc.
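The core mechanics the notebook covers can be sketched in a few lines of plain Python — a toy reverse-mode autodiff over a tiny computational graph, not TensorFlow's actual implementation (all names here are mine):

```python
class Var:
    """A node in a tiny computational graph: stores a value, a gradient,
    and a closure that propagates gradients to its parent nodes."""
    def __init__(self, value):
        self.value = value
        self.grad = 0.0
        self._backward = lambda: None
        self._parents = []

def mul(a, b):
    out = Var(a.value * b.value)
    def _backward():
        a.grad += b.value * out.grad   # d(ab)/da = b
        b.grad += a.value * out.grad   # d(ab)/db = a
    out._backward = _backward
    out._parents = [a, b]
    return out

def add(a, b):
    out = Var(a.value + b.value)
    def _backward():
        a.grad += out.grad             # d(a+b)/da = 1
        b.grad += out.grad             # d(a+b)/db = 1
    out._backward = _backward
    out._parents = [a, b]
    return out

def backward(root):
    """Topologically sort the graph, then apply the chain rule in reverse."""
    topo, seen = [], set()
    def visit(v):
        if id(v) not in seen:
            seen.add(id(v))
            for p in v._parents:
                visit(p)
            topo.append(v)
    visit(root)
    root.grad = 1.0
    for v in reversed(topo):
        v._backward()

# f(x, y) = x*y + x  =>  df/dx = y + 1, df/dy = x
x, y = Var(3.0), Var(2.0)
f = add(mul(x, y), x)
backward(f)
print(f.value, x.grad, y.grad)  # 9.0 3.0 3.0
```

Eager execution evaluates each op immediately while recording exactly this kind of tape for the backward pass.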
These are the best courses I took in machine learning, in no particular order:
* Coursera Machine Learning by
@AndrewYNg
* CS231n course from Stanford by
@karpathy
and
@jcjohnss
* Fastai courses on practical machine learning by
@jeremyphoward
What about yours?
In this notebook I explore the brand new
@TensorFlow
2.0. I discuss basic ops, gradients, data preprocessing and augmentation, training and saving.
Link:
Others:
Thrilled to announce I guest-authored a chapter in
@OReillyMedia
’s Practical Deep Learning book () on 'AI in the Browser'! It teaches how to quickly build applications with TensorFlow.js: pix2pix, GANs, pose estimation + deep insights from the tfjs &
@ml5js
teams.
Just subscribed to Colab Pro. The main advantages are
* Exclusive access to faster GPUs and TPUs
* Connections last up to 24 hrs
* Larger memory (~27 GB)
In this thread I will share my experience.
Eager Execution Enabled
In this notebook I explain different concepts in eager execution. I go over variables, ops, gradients, custom gradients, callbacks, metrics and creating models with tf.keras and saving/restoring them.
🚀Introducing CIDAR: the first open Arabic instruction dataset culturally aligned by native Arabic speakers. CIDAR contains 10,000 instructions and outputs capturing the essence of the Arab region and its unique culture. CIDAR can be used to fine-tune Arabic LLMs to follow instructions.
1/7. Excited to announce the largest public catalogue of Arabic text and speech datasets: Masader, with 200 datasets. Work with
@superrzk
,
@alwaridi
and
@_MagedSaeed_
. Part of the data sourcing efforts at
@BigscienceW
🌸🧵
paper:
masader:
I am creating my own blog. I am going to talk about PhD life, mental health and some projects. I need your recommendations for what platform to use and themes to go with.
How to capture rich patterns in label-free data? Currently taking Deep Unsupervised Learning by
@pabbeel
, et al. The website contains all the materials!
Fine-tuned wav2vec from
@facebookai
on Egyptian dialect using
@huggingface
. The model was trained on ~5 hours of audio and got ~0.5 WER on the test set. You can try the model using your mic on Colab.
Swift + TensorFlow is a next-generation platform for machine learning that incorporates differentiable programming. In this notebook I go over its basics and how to create a simple NN and CNN.
Notebook:
We are pleased to release whisperar: fine-tuned Whisper models for Arabic. We achieve better WER on multiple datasets. Powered by the
@arabicml2
open-source community. Thanks to
@huggingface
and
@LambdaAPI
for the compute.
GitHub:
Demo:
BigGanEx Notebook
An introduction to BigGAN, the latent space and the truncation trick. Moreover, some cool experiments are included. Thanks to
@ajmooch
for helping with some concepts.
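The truncation trick itself is easy to sketch: sample latents from a truncated normal, resampling any component whose magnitude exceeds a threshold. A minimal NumPy illustration of the sampling side only (function and parameter names are mine, not the notebook's):

```python
import numpy as np

def truncated_z(batch, dim, threshold=0.5, rng=None):
    """Sample latent vectors from a truncated normal: any component whose
    magnitude exceeds `threshold` is resampled until it falls inside.
    Smaller thresholds trade sample diversity for fidelity."""
    rng = rng or np.random.default_rng(0)
    z = rng.standard_normal((batch, dim))
    mask = np.abs(z) > threshold
    while mask.any():
        z[mask] = rng.standard_normal(mask.sum())  # resample offenders only
        mask = np.abs(z) > threshold
    return z

z = truncated_z(4, 128, threshold=0.5)
print(z.shape, float(np.abs(z).max()) <= 0.5)  # (4, 128) True
```

Feeding these `z` vectors to the generator yields higher-fidelity but less diverse samples as the threshold shrinks.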
Added the largest publicly available dataset for Arabic to
@huggingface
. This dataset can be used for unsupervised training of models like GPT-2 and BERT. Here is minimal code for loading the dataset.
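The snippet itself didn't survive here. The real one-liner is Hugging Face's `datasets.load_dataset` with `streaming=True` (the dataset id is omitted above, so I won't guess it); as a stand-in, here is a stdlib sketch of the lazy streaming pattern such loaders give you over a large corpus:

```python
import os
import tempfile

def stream_corpus(path, batch_size=2):
    """Yield batches of lines lazily instead of reading the whole file,
    which matters for corpora too large to fit in memory."""
    batch = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            batch.append(line.strip())
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch  # flush the final partial batch

# Stand-in corpus file for the demo.
tmp = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                  encoding="utf-8")
tmp.write("doc one\ndoc two\ndoc three\n")
tmp.close()

batches = list(stream_corpus(tmp.name))
print(batches)  # [['doc one', 'doc two'], ['doc three']]
os.unlink(tmp.name)
```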
Announcing 'klaam', a library for Arabic speech recognition and classification as part of
@arabicml2
. The library allows prediction and training by fine-tuning wav2vec models using
@huggingface
.
github:
colab:
18 notebooks covering
1. Data preparation, generation and preprocessing
2. Classification, segmentation and detection
3. Transfer Learning and Deployment
4. BigGans
5. TPU training
etc.
With colab,
@github
and
@TensorFlow
.js you can train, convert and deploy a model super fast, all in just one notebook and solely in the browser. No GPU setup, no server needed to run the model in the browser.
I made a notebook to explain how to work with Eager Execution in
@TensorFlow
. EE allows you to evaluate operations immediately. I also explain how to watch the gradient of an arbitrary loss function and update the parameters of a CNN model.
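What `tf.GradientTape` automates can be shown by hand. A NumPy sketch (not the notebook's code) of the same loop — compute a loss, take its gradient with respect to the parameters, apply an update — with the gradient of a linear least-squares loss derived analytically:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w                       # noiseless targets for the demo

w = np.zeros(3)
lr = 0.1
for _ in range(200):
    err = X @ w - y                  # forward pass
    loss = (err ** 2).mean()         # what the tape would "watch"
    grad = 2 * X.T @ err / len(y)    # d(loss)/dw, derived by hand
    w -= lr * grad                   # the optimizer step

print(np.round(w, 2))  # close to [ 1.  -2.   0.5]
```

With a tape, the `grad = ...` line is replaced by `tape.gradient(loss, w)` and works for arbitrary losses and models.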
I have lost many opportunities for conferences and summer schools because of visa difficulties. Your nationality is becoming a barrier between you and excelling in your field.
In the last five months, I have been part of
@BigScienceW
🌸an open research environment hosted by
@huggingface
with +700 researchers from around the world. I have co-authored two papers and there are two more in the making. What was it like? 🧵
This presentation by
@AlecRad
from
@OpenAI
is just amazing. An overview of NLP research that everyone should watch. Thanks
@pabbeel
for making it happen.
I always look up to
@hardmaru
as an inspirational figure. With only 4 years of experience in neural networks, he co-authored many interesting papers like Sketch-RNN and World Models. He explained his work in this presentation.
It has been two years since we founded
@arabicml2
with
@_MagedSaeed_
to democratize Arabic NLP. Here is a summary of the open-source work we have done so far.
0/10. Thread 🧵
I made a very simple API for StyleGAN training and visualization. Now you can train on any dataset at any resolution in just 6 lines of code. All
@GoogleColab
notebooks available on GitHub:
We are pleased to announce Adawat أدوات, our latest work from
@arabicml2
led by
@Emad_A_Alghamdi
and
@superrzk
. We create a catalogue of Arabic NLP tools with Colab notebooks to test them.
Website:
GitHub:
I created a notebook to illustrate how to make some cool GIFs by interpolating in the BigGAN latent space. Use the same class or interpolate between different classes.
check:
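The interpolation itself is a one-liner. A NumPy sketch (function names are mine, not the notebook's) of producing the intermediate latents that become the GIF frames:

```python
import numpy as np

def lerp(z1, z2, steps):
    """Linear interpolation between two latent vectors: each intermediate
    z would be fed to the generator to render one frame of the GIF."""
    ts = np.linspace(0.0, 1.0, steps)
    return np.stack([(1 - t) * z1 + t * z2 for t in ts])

rng = np.random.default_rng(0)
z1, z2 = rng.standard_normal(128), rng.standard_normal(128)
frames = lerp(z1, z2, steps=8)
print(frames.shape)  # (8, 128)
```

For Gaussian latents, spherical interpolation (slerp) often gives smoother transitions, since it stays closer to the shell where the prior's mass concentrates.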
A collection of 12 notebooks that I created with
@GoogleColab
. Fork … and run them directly in Colab. If you have any suggestions for future notebooks, post them below.
Today I reached 1k citations, with over 60% of my papers having 10+ citations. I want to thank my collaborators from
@BigscienceW
,
@arabicml2
, and
@KFUPM
. I will continue working on open source models, datasets, and tools.
We at
@arabicml2
would like to support people working at the intersection of Arabic and machine learning. If you are a graduate student or an engineer working on NLP/speech who needs guidance or would like to brainstorm, come talk to us.
Third notebook in the AttentioNN series. Image captioning using soft attention and doubly stochastic regularization with TF 2.0.
Link:
others:
cc
@fchollet
@random_forests
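A rough NumPy sketch of the two ideas (not the notebook's code): soft attention normalizes weights over image regions at each decoding step, and the doubly stochastic penalty from "Show, Attend and Tell" additionally pushes each region's total attention across steps toward 1:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# T decoding steps, L image regions, D feature dims (toy sizes).
T, L, D = 5, 4, 8
rng = np.random.default_rng(0)
scores = rng.standard_normal((T, L))      # attention logits per step
features = rng.standard_normal((L, D))    # one feature vector per region

alphas = softmax(scores, axis=1)          # soft attention: rows sum to 1
contexts = alphas @ features              # one context vector per step, (T, D)

# Doubly stochastic regularization: penalize column sums far from 1,
# so every region gets attended roughly once over the whole caption.
reg = ((1.0 - alphas.sum(axis=0)) ** 2).sum()

print(contexts.shape)  # (5, 8)
```

In training, `reg` is scaled by a coefficient and added to the captioning loss.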
Ashaar (أشعار): The Largest Study on Arabic Poetry Analysis & Generation w/
@_MagedSaeed_
and Moataz Ahmed. It took us 5 years to finalize. We release 10 resources: 4 datasets and 6 models.
Code:
Paper:
Demo:
Looking forward to discussing Arabic NLP further at . Thanks
@YJernite
and
@huggingface
for setting it up. There are many other rooms for other languages. Creating such communities for every language is an important step for democratizing NLP.
Excited to announce
@arabicml2
Board where we host Arabic researchers to talk about their interesting papers. Stay tuned for a series of exciting talks in the next few weeks.
Delighted to announce that our paper on
"ARBML: Democratizing Arabic Natural Language Processing Tools" (
@arabicml2
) has been accepted at
@emnlp2020
NLP-OSS workshop.
A joint work with
@_MagedSaeed_
.
This is one of the reasons we created
@arabicml2
. We want to design processors, tokenizers, datasets and models specific to Arabic. It is important to understand that most algorithms in NLP are not language-agnostic. Many have only been demonstrated in English.
Why You Should Do NLP Beyond English
7000+ languages are spoken around the world but NLP research has mostly focused on English. In this post, I give an overview of why you should work on languages other than English.
@hardmaru
If the research objective is to get into big institutes or to become popular, then it will be difficult for the mentioned reasons. However, why not do research for fun? For tackling problems that you think matter?
Our paper "Arabic Compact Language Modelling for Resource Limited Devices" has been accepted to
#WANLP2021
@eaclmeeting
. We show that you can train smaller models on smaller datasets for Arabic and get reasonable results in multiple tasks. A joint work with Irfan Ahmad.
Arabic is ranked 6th in terms of the number of datasets on
@huggingface
with around 54 entries. A wide range of datasets tackling monolingual and multilingual tasks.
Glad to have participated in supporting Arabic in this effort. In one week we have two instruction datasets for Arabic 🚀. Time to build many Arabic LLMs in 2024.
Today, we’re launching Aya, a new open-source, massively multilingual LLM & dataset to help support under-represented languages. Aya outperforms existing open-source models and covers 101 different languages – more than double the number covered by previous models.
Registration is still open for the #مدرسة_فهم_الصيفية (Fahm Summer School) on deep learning. Register now to learn from leading experts in #الذكاء_الاصطناعي (AI) and #التعلم_العميق (deep learning).
Attendance certificates will be granted upon completing the requirements.
I have been contacted by some instructors about the possibility of using my notebooks as part of some courses. The notebooks are purely educational. So feel free to use them (MIT license).
Feature Visualization by
@ch402
@zzznah
@ludwigschubert
. I cannot praise enough how well-written this paper is! Enjoying every bit of it with the accompanying notebooks.
Help
@huggingface
folks democratize NLP datasets by contributing to their community sprint, especially for low-resource languages. This is an important step for future research in these directions.
Looking for research internships in NLP, preferably remote. I would like to investigate multilinguality and cross-lingual transfer learning, but I am open to other problems as well. Please pass it around.
Transformer I: the encoder layer in the Transformer model, with a detailed description of positional encoding, multi-head attention, etc.
check AttentioNN:
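The sinusoidal positional encoding can be sketched in NumPy — the standard formulation from "Attention Is All You Need", not necessarily the notebook's exact code:

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """Sinusoidal positional encoding: even dimensions get sin, odd get cos,
    with wavelengths forming a geometric progression up to 10000 * 2*pi,
    so each position maps to a unique, smoothly varying vector."""
    pos = np.arange(max_len)[:, None]         # (max_len, 1)
    i = np.arange(0, d_model, 2)[None, :]     # (1, d_model/2)
    angle = pos / (10000 ** (i / d_model))
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

pe = positional_encoding(50, 16)
print(pe.shape)   # (50, 16)
print(pe[0, :4])  # position 0: sin(0)=0, cos(0)=1 -> [0. 1. 0. 1.]
```

The encoding is added to the token embeddings before the first encoder layer, injecting order information the attention mechanism otherwise lacks.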
Here is a newish feature we weren’t sure was worthy of a tweet but it came up in conversation 😅 You can now filter by content type(s) to find the right type of resource when you search. Here we search resources for `self-supervised-learning` that have `code` and `video`. Details 👇
Every week on Thursday we organize a meeting over Hangouts where we discuss machine learning and present live coding sessions. The requirements are:
1. Basic ML/deep learning knowledge
2. Time to participate in the meetings
Register here