I earned a master's degree following my thesis in machine learning and can tell you right now that some of the BEST coders I've worked with never had formal education.
It 💯% helped me land a job interview, but it does not make me a better coder/data scientist than anyone else.
Stanford has released Prof. Manning's NLP course for free (!!!) on YouTube if you are interested in diving deeper into NLP.
It starts with the very basic word2vec and expands to:
-> Domain adaptation for supervised sentiment
-> Retrieval augmented in-context learning
->…
While Data Scientist is the most profitable job in tech, there is a huge rise in demand for people who understand the Machine Learning pipeline and can take responsibility for the whole stack.
Yes, that includes extensive work with the Data & Analytics stack.
Interesting times.
Microsoft partnered with PyTorch to provide you with a completely FREE PyTorch fundamentals course.
The course includes a built-in sandbox experience where you can code directly from your browser.
No need to install/download anything, just schedule the time to learn.
Link ⤵️
Personal ANNOUNCEMENT ✨
Happy to share the early release of my book on Machine Learning with Apache Spark. It contains 3 chapters, and new chapters, corrections, and updates will be released every month.
Looking forward to your thoughts and how it can better serve your needs!
“Solid understanding of Spark and ability to write, debug, and optimize Spark code.”
I was today years old when I discovered that OpenAI relies heavily on Spark.
Entropy is the golden measurement for machine learning.
High entropy means data chaos, low information gain, and a less accurate model.
Low entropy means better knowledge in the system, high information gain, and a better model.
Noise reduction & clean data are a must for a good model!
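To make the entropy idea concrete, here is a minimal sketch of Shannon entropy and information gain over class labels. The function names and the toy spam/ham labels are illustrative assumptions, not taken from any particular library.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    # p * log2(1/p) summed over each distinct label's share p of the data.
    return sum((c / total) * math.log2(total / c) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child splits."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical toy labels: an evenly mixed set versus a pure one.
mixed = ["spam", "ham", "spam", "ham"]
pure = ["spam", "spam", "spam", "spam"]

print(entropy(mixed))  # evenly mixed labels give the highest entropy for two classes
print(entropy(pure))   # a single label carries no uncertainty
print(information_gain(mixed, [["spam", "spam"], ["ham", "ham"]]))
```

A split that separates the labels perfectly removes all of the parent's entropy, which is why decision trees favor splits with high information gain.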
In the Data Engineers community, we're working on a list of fundamental concepts you should know:
- Distributed Systems
- Load Balancing
- Caching
- Data Partitioning
- Indexes
- Redundancy and Replication
- SQL vs. NoSQL
- CAP Theorem
- Consistent Hashing
What did we miss?
A self-taught friend is now officially a Software Engineer at Google.
It took her 6 years. How did she do it?
👉 Studied Google open source technologies in depth
👉 Joined a small startup as a dev
👉 Moved to a bigger company to learn more tech stacks
👉 Practiced with LeetCode
We always focus on good code as 1st priority for building software.
But Machine Learning is different. With ML, code takes a secondary role, and "data" becomes the lead actor.
If you can understand how to produce, collect, manage and analyze data, you’ll own your future.
If you want to be a data engineer, you have to build data pipelines! Write some code, build some integrations, and break some stuff! It will literally make you better at understanding data!
Getting into data science/engineering can be overwhelming. Pick just one open-source library and start digging into it. You'll learn design patterns, func, and algorithms all in one open-source codebase. You’ll also get exposed to how to write code that is more than hello world.
After fantastic years at Microsoft, it’s time for a new adventure.
Excited about the next chapter in my career, where I will continue to push the boundaries of Data and Machine Learning technologies and work towards novel open-source solutions.
The Future is Open ✨
Machine learning models are not used in production as is. You have to wrap & serve them in a way that supports your application requirements: as a REST API, streaming, or batch. Yes, you will need to take care of those too. The ML workflow is more than building models.
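As a rough illustration of those three serving styles, here is a minimal standard-library sketch. The `predict` function is a hypothetical stand-in for a trained model with made-up weights, and the handler name, JSON shape, and port are assumptions for the example, not a recommended production setup.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical stand-in for a trained model: a fixed linear score."""
    weights = [0.5, -0.25]  # made-up weights, for illustration only
    return sum(w * x for w, x in zip(weights, features))


def predict_batch(rows):
    """Batch serving: score a whole collection of rows in one call."""
    return [predict(row) for row in rows]


def predict_stream(rows):
    """Streaming serving: score records lazily, one at a time, as they arrive."""
    for row in rows:
        yield predict(row)


class PredictHandler(BaseHTTPRequestHandler):
    """REST serving: POST a JSON body such as {"features": [4.0, 0.0]}."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"prediction": predict(payload["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the REST endpoint; this call blocks until interrupted."""
    HTTPServer(("localhost", port), PredictHandler).serve_forever()
```

The same `predict` function sits behind all three entry points, so the model is wrapped once and only the delivery mechanism changes. Calling `serve()` starts the REST endpoint.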
You must appreciate this infographic. While a couple of improvements/corrections can be made to address staging & production stages, it captures the high-level flow of a machine learning pipeline well.
Design by Sebastian Eberstaller.
Infographic from -
Having a machine learning model that performs great locally is just the beginning. To productionize your machine learning and move it through the stages of dev, validate, staging all the way to production requires you to adopt tools and data practices of production scale.
Having worked with AWS for 2 years now, returning to Azure is a pleasure.
My dockerized app was registered in ACR and deployed to AKS inside my vnet in less than 10 minutes.
A bit of GitHub Actions is on its way too!
Wow this one just arrived ✨
#Dev 2019 Distinguished Author
I am thankful to be one of the 2019 top 500 authors of @ThePracticalDev
Thanks so much dear community 🙏❤️ May the next year be full of content and empathy 💜
Happy 2020 🎉
New beginnings are scary.
Yet with the right team and a brave, can-do, empathetic culture, everything is possible.
Excited to share that I've started a new position as Director @ Confluent. Working with a brilliant team of rock stars ✨
Expect more Kafka, Flink, and Data…
Want to be a software developer? Great!
Motivated by the money? Great!
We need more people and.. guess what!
There is enough space for everyone!
Find your niche and go all in!
📚It was such a pleasure collaborating on this book with a long list of talented Data Engineers.
If you want to learn about data cleaning, processing, wrangling, storing, ingesting, and much more, check it out!
Don't negotiate with your brain. Motivation is an after-effect of experiencing progress and success. The real barrier is you. Identify it and set a goal to overcome it. Think about what you can do today: the small thing that takes 10 min but will develop you greatly.
Microsoft is a company that demonstrates consistency in giving employees more.
Up until now, we've gotten extra vacation days twice and a special bonus of $1.5K for everyone, equally, wherever you are on the planet.
With that, I’m hiring for my team:
Data is eating the world.
Happy to share my new adventure as VP of DevEx at lakeFS io. Developing open source to enable data best practices is an exciting challenge! What does lakeFS offer to solve our data & machine learning pain?
→ link in comment.
Without code, you can’t build machine learning-based systems.
Without data, you won’t be able to build machine learning models.
But, you are always able to leverage precomputed ML models through open-source, cognitive services, and the cloud.
SQL is easy, but good data architecture is hard.
The challenge is not writing queries but understanding a highly complex data model.
How would you go about simplifying data models? Find the granularity, define relationships, use indexes, and plan for future schema evolution.
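Those four steps can be sketched with an in-memory SQLite database. The `customers` and `orders` tables, their columns, and the sample rows are hypothetical, chosen only to illustrate the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce declared relationships

conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
-- Granularity: exactly one row per order.
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    -- Relationship: every order belongs to one customer.
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      REAL NOT NULL
);
-- Index: speeds up the common lookup of orders by customer.
CREATE INDEX idx_orders_customer ON orders(customer_id);
""")

conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 20.0), (2, 1, 5.0)],
)

# Schema evolution: a new nullable column leaves existing rows and queries valid.
conn.execute("ALTER TABLE orders ADD COLUMN currency TEXT")

total = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE customer_id = 1"
).fetchone()[0]
columns = [row[1] for row in conn.execute("PRAGMA table_info(orders)")]
```

Adding columns as nullable is one simple way to evolve a schema without breaking existing readers; larger changes usually need a migration plan.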
This morning I officially signed a technical book contract.
I equipped myself w/ detailed market research, an Excel sheet, early morning alarm clock, and a decent coffee stash.
Setting up my intention for 2021 to be about Curiosity, Learnings, and Education is on track 🛣️
@svpino
Many smart people get stuck in the knowledge-gathering stage but never use it. It’s one thing to learn something but completely different to act on it.
The rise of the Analytics Engineer. Check out this interesting table of data responsibilities. The more data continues to dominate our products, the more professions, majors, and jobs will open up in the tech industry for you to join the ride. Keep learning.
Table by Claire Carroll.
The magic behind software is not data, architecture, algorithms, or programming language.
The magic piece is you. It's all about your creativity.
Nothing replaces your ability to turn a whole lot of nothing into something that changes the world.
“Uber adopted Apache Pinot several years ago and today Pinot is a key technology inside Uber Data Platform to power multiple mission-critical real-time analytics applications”
Interesting article from Uber engineers on leveraging Presto, Kafka, Pinot, etc.
The baby bird is not a baby nor a bird anymore.
This is a South American Electric Fish, also known as electric eel. Despite the name, it is not an eel but a knifefish. It is considered a freshwater teleost which contains an electrogenic tissue that produces electric discharges.
Welcome, DocLLM by JPMorgan (yes, yes). As the name implies, it understands documents (invoices, receipts, reports, contracts, etc.).
JPMorgan emphasizes that this model is not just another language model and explains how they built it slightly differently to achieve better…
📚 Curious to learn more about the basics of distributed systems? Patterns and paradigms? Have some time to read?
Check out Designing Distributed Systems by @brendandburns.
Free PDF version -->
Data Engineering:
✦ CS Fundamentals
✦ Java / Python / Go
✦ Testing
✦ DB, SQL/NoSQL
✦ Scaling, CAP, OLTP vs. OLAP
✦ Data Warehouse
✦ Distributed Computing
✦ Messaging
✦ Monitoring
✦ Data Security & Privacy
✦ Orchestrators
✦ CI/CD
It takes a team. It's not a one-person job.
Thank you for joining Santiago’s Twitter Space on Starting out with Machine Learning.
Great insights and questions.
Let's start a 🧵.
I invite you to fill in your thoughts ⤵️
Traditional career paths are out.
It's time for you to develop your own journey based on what you’re passionate about and enjoy doing.
Getting inspired by others is great.
But focus on yourself and what sparks joy for you.
Also, know that a career is a marathon, not a sprint.
Stack Overflow Survey 2022 is out.
🐳Docker adoption is increasing - 55% to 69%
🦀Rust is still the most loved language
📊PostgreSQL wins over Redis as the most loved
💼💼Data skills are well compensated, w/ Apache Spark, Apache Kafka, & Hadoop in the top 3
You are the only one responsible for your career.
Don't wait for people to provide you with a syllabus, tell you what to learn or do.
Go ask questions. Do the research. Make decisions.
It's your life, be the driver, not the passenger - don't let life drive you.
I talk to people in the data industry every day.
Most know everything about ETL but can't mention a single tool for validating data products. Not a single one!
Let alone what a data product is.
We are so early.
Data Science is still VERY hyped.
If you want to SECURE yourself a future & a job in tech,
take a look at these career paths:
a). Building Data Science platforms - ML Engineer
b). Managing the production ML lifecycle - MLOps
c). Making data available for ML & Analytics - Data Engineer
Many machine learning algorithms cannot handle free text out of the box. We need to marshal the data into a tabular format while removing noise, hashing strings, and building a translation table to explain the model outcomes. Interpretable ML is 🔑 to solving the black box problem.
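One way to picture that flow is a small feature-hashing sketch using only the standard library. The bucket count, stop-word list, and sample sentences are assumptions for illustration; real pipelines typically use far more buckets and a proper tokenizer.

```python
import hashlib
import re
from collections import defaultdict

NUM_BUCKETS = 16                          # assumed, tiny on purpose
STOP_WORDS = {"the", "a", "is", "and"}    # assumed noise words


def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop noise words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def bucket(token):
    """Hash a string to a stable column index (the hashing trick)."""
    digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS


def vectorize(text, translation):
    """Turn free text into one fixed-width row of token counts."""
    row = [0] * NUM_BUCKETS
    for token in tokenize(text):
        idx = bucket(token)
        row[idx] += 1
        translation[idx].add(token)  # remember which tokens feed each column
    return row


# Translation table: column index -> tokens that hashed into it.
translation = defaultdict(set)
docs = ["The pipeline is broken!", "A broken pipeline and a fix."]
table = [vectorize(doc, translation) for doc in docs]
```

When a model assigns a large weight to a column, `translation[column]` shows which original strings that column stands for. Several tokens can share a bucket, so the mapping explains a column only up to hash collisions.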
You should speak at tech events:
✅ Your journey is unique
✅ Your voice matters
✅ Extend your professional network
✅ The prepping process will make you dive into the smallest detail
✅ You’ll become better at it
What to talk about:
—> Success, Failures, Best practices..
" To become a data scientist, you could earn a Bachelor's degree in Computer science, Social sciences, Physical sciences, and Statistics. ... The truth is, most data scientists have a Master's degree or Ph. D "
Can you do it with a course?
Working at Microsoft provided me with a front-row view of how the best leaders in the industry practice humility every day: admitting mistakes, giving space, allowing ideas to emerge, building a learning culture, and so much more. I'm happy to continue practicing these in my role.
📚Free O’Reilly book introducing MLOps and how to strategize the organizational culture to bring all engineering stakeholders to support ML.
I had the pleasure to review it early on, and it's a great read covering essential aspects of productionizing ML.
Link in first comment.
Elasticsearch and Kibana have changed their license from Apache 2.0 to SSPL.
By continuing to use them in your online services code, you may be at risk of being forced to release every supporting piece of software your product is built from.
DON'T IGNORE THIS.
While Scala dominated the distributed data world for a long time, it wasn't as friendly to engineers as Python or Go. Just understanding what a monad is took an experienced engineer a whole month.
This choice impacts our productivity and learning curve. Choose wisely.
In software development, articulating your ideas and experience in simple language is the strongest capability of all. Otherwise, you will lose people who can't understand your wisdom because of complex language.
Taking a long weekend off work to recharge, relax and come back energized, ready to innovate and continue building great things with the team! 🧘‍♀️
For inspiration, I'm going to read my favorite book once again ;)
You can too!
It’s free. Link in the first comment 📚
3 Reasons to invest in Data Engineering teams:
👉 "If you want to become a software company, you need data to make better decisions as a business."
👉 "Every company is becoming a big data company."
👉 Data is growing on a massive scale, and big data is here to stay.