David Andrés 🤖📈🐍 @daansan_ml Twitter profile

Pinned Tweet

David Andrés 🤖📈🐍

6 months

Exploratory Data Analysis (EDA) is a process used for investigating your data to discover patterns, anomalies, relationships, or trends using statistical summaries and visual methods. Let's find out more 🧵👇

15

639

3K


David Andrés 🤖📈🐍

@daansan_ml

6 months

In Data Science you can find multiple data distributions... But where are they typically found? 🤔 This is part 1 - tomorrow I'll share the second one! Check it out 🧵👇

14

363

2K

David Andrés 🤖📈🐍

@daansan_ml

6 months

There are several types of data distributions you might encounter in a dataset. Here are some common ones 👇🧵

18

258

1K

David Andrés 🤖📈🐍

@daansan_ml

5 months

Is your data normal? 🤔 What I mean is whether your data follows a normal distribution... Discover this elegant distribution 🧵👇

13

267

1K

David Andrés 🤖📈🐍

@daansan_ml

7 months

ARIMA is one of the most popular traditional statistical methods used for time series forecasting. THREAD 🧵 👇

19

183

894

David Andrés 🤖📈🐍

@daansan_ml

4 months

ARIMA models have three parameters: 'p', 'q' and 'd'. They need to be optimized... but, before that, do you know how to interpret each of them? Learn what each of them means here 🧵 👇

13

197

815

David Andrés 🤖📈🐍

@daansan_ml

6 months

Where can you find the most common data distributions? (2nd part) Check this thread for real-world examples! 🧵 👇

7

158

766

David Andrés 🤖📈🐍

@daansan_ml

4 months

Time Series data with seasonality? Split it into its main 3 components! Check an example here (code at the end) 👨‍💻 🧵 👇

7

145

751

David Andrés 🤖📈🐍

@daansan_ml

5 months

Are you familiar with the most common Machine Learning algorithms? Today, I introduce 6 of the most commonly used ones! Check them out 🧵 👇

10

194

734

David Andrés 🤖📈🐍

@daansan_ml

5 months

ARIMA models are essential in Time Series forecasting. You can add multiple components to make them fit your particular data: go from a basic AR model to a complex SARIMAX model! 🧵 👇

17

163

713

David Andrés 🤖📈🐍

@daansan_ml

5 months

ARIMA is one of the most popular traditional statistical methods used for time series forecasting. THREAD 🧵 👇

10

145

698

David Andrés 🤖📈🐍

@daansan_ml

9 months

ARIMA is one of the most popular traditional statistical methods used for time series forecasting. THREAD 🧵 👇

21

145

679

David Andrés 🤖📈🐍

@daansan_ml

5 months

Volatility can be a big problem in Time Series forecasting! Be careful with it: ✅ Low volatility ❌ High volatility Learn how you can take it into account 🧵👇

14

137

685

David Andrés 🤖📈🐍

@daansan_ml

4 months

Do you want to forecast seasonal time series data? Remove the seasonality and add it back at the end! That's basically what the STL method does.

7

141

660
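
The "remove the seasonality, forecast, add it back" idea can be sketched in a few lines. This is a minimal illustration, not STL itself: it assumes an additive seasonal pattern estimated with simple per-position averages, and a naive "last value" forecast for the deseasonalized series. The names `seasonal_profile` and `forecast_with_seasonality` and the toy data are hypothetical.

```python
def seasonal_profile(series, period):
    """Average deviation from the overall mean for each position in the cycle."""
    overall = sum(series) / len(series)
    profile = []
    for k in range(period):
        values = series[k::period]  # all observations at cycle position k
        profile.append(sum(values) / len(values) - overall)
    return profile


def forecast_with_seasonality(series, period, horizon):
    """Deseasonalize, forecast naively, then add the seasonal effect back."""
    profile = seasonal_profile(series, period)
    deseasonalized = [v - profile[i % period] for i, v in enumerate(series)]
    level = deseasonalized[-1]  # naive forecast: repeat the last adjusted value
    start = len(series)
    return [level + profile[(start + h) % period] for h in range(horizon)]


# Toy data (assumed): two cycles of period 4 with a small upward shift.
series = [12, 8, 10, 14, 13, 9, 11, 15]
forecast = forecast_with_seasonality(series, period=4, horizon=4)
print(forecast)
```

In practice the naive step would be replaced by a proper model (for example ARIMA) fitted on the deseasonalized series.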

David Andrés 🤖📈🐍

@daansan_ml

5 months

ARIMA is really useful for time series forecasting. However, you can only forecast 1 variable at a time... VAR (Vector AutoRegression) solves this problem! Discover more 🧵 👇

9

154

656

David Andrés 🤖📈🐍

@daansan_ml

5 months

Do you have outliers in your data? What should you do with them? 🤔 Here's a guide on effectively managing them 🧵 👇

16

158

651

David Andrés 🤖📈🐍

@daansan_ml

10 months

How can you detect outliers? But first of all, what are outliers? 🤔 🧵 👇

20

146

606

David Andrés 🤖📈🐍

@daansan_ml

5 months

⭐ Time Series is an essential skill in Data Science. Don't know where to start? Here is a roadmap to help you start on the right foot! Have a look 👇 🧵

13

150

596

David Andrés 🤖📈🐍

@daansan_ml

6 months

After fitting a Time Series model such as ARIMA, you should always check the 𝗿𝗲𝘀𝗶𝗱𝘂𝗮𝗹 𝗱𝗶𝗮𝗴𝗻𝗼𝘀𝘁𝗶𝗰𝘀 to assess how well your model captures all the patterns in the data. See how to do it 👇

9

134

586

David Andrés 🤖📈🐍

@daansan_ml

10 months

What is data normalization, and how can it be achieved? Let's find out more about this! 🧵 👇

11

116

556

David Andrés 🤖📈🐍

@daansan_ml

5 months

What is data smoothing? ...and why may you need it? 🤔 Read this thread to learn more about it! 🧵 👇

9

111

553

David Andrés 🤖📈🐍

@daansan_ml

5 months

Your data is possibly too noisy! You can try these 2️⃣ techniques to discover its trend, seasonality or even outliers! 🧵 👇

10

98

544

David Andrés 🤖📈🐍

@daansan_ml

4 months

Having an imbalanced dataset is a problem. 😟 Discover SMOTE, it can help you deal with this! 🧵 👇

32

87

539

David Andrés 🤖📈🐍

@daansan_ml

5 months

Do you want to identify outliers or find a global trend in your Time Series data? LOWESS may be what you are looking for! It means Locally Weighted Scatterplot Smoothing, and you can find out more about it here 🧵 👇

10

100

533

David Andrés 🤖📈🐍

@daansan_ml

6 months

Data preprocessing is a crucial step in the machine learning pipeline, ensuring that the dataset is ready for training. One essential aspect of data preprocessing is ✨feature scaling✨, which involves adjusting the range and distribution of the data. 🧵 👇

7

106

534

David Andrés 🤖📈🐍

@daansan_ml

5 months

5 great courses to learn Time Series Analysis and Forecasting in Python 🧵👇👇👇

11

124

528

David Andrés 🤖📈🐍

@daansan_ml

5 months

🚨Your data may be hiding a trend, seasonality or even outliers!! Let's learn 2️⃣ basic techniques to smooth your data and get rid of the noise 🧵 👇

14

100

523

David Andrés 🤖📈🐍

@daansan_ml

3 months

You can forecast Time Series data using a Machine Learning algorithm like XGBoost or Random Forest. However, you need to reframe your problem as a Supervised Learning one. Learn here how to do it 🧵 👇

9

110

529
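
One common way to do this reframing is a sliding window of lagged values. The sketch below assumes that approach (the thread itself may use a different one); `make_supervised` and the toy series are hypothetical names and data.

```python
def make_supervised(series, n_lags):
    """Turn a 1-D series into (X, y) pairs, using the previous n_lags values as features."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])  # features: values at t-n_lags .. t-1
        y.append(series[t])             # target: value at time t
    return X, y


series = [10, 12, 13, 15, 18, 21]
X, y = make_supervised(series, n_lags=3)
for features, target in zip(X, y):
    print(features, "->", target)
```

Each row of `X` can then be fed to any tabular regressor, such as XGBoost or Random Forest, with `y` as the target.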

David Andrés 🤖📈🐍

@daansan_ml

6 months

Linear Regression is a fundamental algorithm in supervised Machine Learning used for predictive modeling. Learn more about it here 🧵 👇

12

109

520

David Andrés 🤖📈🐍

@daansan_ml

6 months

Time Series Forecasting plays a crucial role in predicting future values based on historical patterns. However, most of the time, to achieve accurate and reliable results, one of the key prerequisites is working with stationary data. But, why is that? 🤔 🧵 👇

5

88

513

David Andrés 🤖📈🐍

@daansan_ml

2 months

In the ARIMA methodology, the AR part stands for Auto-Regressive model. An AR model suggests that the current value of a time series is a linear combination of its previous values and a random error term. Let's find out more about it! 👇 🧵

9

112

506
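
To make the "linear combination of previous values plus a random error" concrete, here is a minimal sketch of an AR(1) process, x_t = c + phi * x_{t-1} + e_t, with the coefficient recovered by ordinary least squares. The parameter values, the seed and the function names are assumptions for illustration only.

```python
import random


def simulate_ar1(phi, c, n, sigma=1.0, seed=0):
    """Simulate x_t = c + phi * x_{t-1} + e_t with Gaussian noise e_t."""
    rng = random.Random(seed)
    x = [c / (1 - phi)]  # start at the long-run mean of the process
    for _ in range(n - 1):
        x.append(c + phi * x[-1] + rng.gauss(0, sigma))
    return x


def fit_ar1(x):
    """Least-squares regression of x_t on x_{t-1}; returns (c_hat, phi_hat)."""
    prev, curr = x[:-1], x[1:]
    mean_prev = sum(prev) / len(prev)
    mean_curr = sum(curr) / len(curr)
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    var = sum((p - mean_prev) ** 2 for p in prev)
    phi_hat = cov / var
    return mean_curr - phi_hat * mean_prev, phi_hat


x = simulate_ar1(phi=0.7, c=2.0, n=5000)
c_hat, phi_hat = fit_ar1(x)
print(f"estimated c = {c_hat:.2f}, estimated phi = {phi_hat:.2f}")
```

The estimated phi should land close to the 0.7 used in the simulation; higher-order AR(p) models extend the same regression to p lagged values.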

David Andrés 🤖📈🐍

@daansan_ml

6 months

You can forecast Time Series data using a Machine Learning algorithm like XGBoost or Random Forest. However, you need to reframe your problem as a Supervised Learning one. Learn here how to do it 🧵 👇

3

117

494

David Andrés 🤖📈🐍

@daansan_ml

5 months

Make sure your model is considering all your data features equally! Scaling can be your life saver! Learn how to do it when you have normally distributed features 🧵👇

13

104

480

David Andrés 🤖📈🐍

@daansan_ml

5 months

Discover how Kernel Smoothing can uncover hidden trends in your data! Do you know this Data Smoothing technique? Find out more here 🧵 👇

7

105

481

David Andrés 🤖📈🐍

@daansan_ml

4 months

Stationarity is a property of a Time Series where its statistical features such as mean and variance remain constant over time. It's crucial for Time Series analysis because many statistical models assume stationarity for reliable forecasts. Find out how to check it 🧵👇

12

120

470
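
A quick, informal way to see what "constant mean and variance" looks like is to compare the two halves of a series. This is only a rough sketch on simulated data, not a formal test (unit-root tests such as the Augmented Dickey-Fuller test are the usual tool); the helper `half_stats` and both series are assumptions for illustration.

```python
import random
import statistics


def half_stats(series):
    """Return (mean, variance) for the first and second half of a series."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return (
        (statistics.mean(first), statistics.pvariance(first)),
        (statistics.mean(second), statistics.pvariance(second)),
    )


rng = random.Random(42)
noise = [rng.gauss(0, 1) for _ in range(1000)]         # stationary: white noise
trended = [0.01 * t + e for t, e in enumerate(noise)]  # non-stationary: mean drifts upward

(noise_m1, _), (noise_m2, _) = half_stats(noise)
(trend_m1, _), (trend_m2, _) = half_stats(trended)
print(f"white noise mean shift: {noise_m2 - noise_m1:.2f}")
print(f"trended mean shift:     {trend_m2 - trend_m1:.2f}")
```

The white-noise series shows almost no shift between halves, while the trended series shows a clear one, which is the kind of behaviour stationarity rules out.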

David Andrés 🤖📈🐍

@daansan_ml

5 months

Too much noise on your time series data? Looking for hidden trends? You may want to consider data smoothing. Here's when to use it 🧵 👇

5

86

452

David Andrés 🤖📈🐍

@daansan_ml

6 months

What is the difference between Classification and Regression in Machine Learning? 🤔 🧵 👇

13

112

447

David Andrés 🤖📈🐍

@daansan_ml

5 months

In this week's 💊MLPills we talk about how to discover the Data Distribution of your dataset features. Join almost 5000 subscribers and don't miss any future issues... for free! (Check next tweet)

3

98

447

David Andrés 🤖📈🐍

@daansan_ml

3 months

What is the difference between Classification and Regression in Machine Learning? 🤔 🧵 👇

4

94

427

David Andrés 🤖📈🐍

@daansan_ml

3 months

ARIMA is one of the most popular traditional statistical methods used for time series forecasting. Let's understand its components 🧵 👇

6

97

421

David Andrés 🤖📈🐍

@daansan_ml

14 days

You've trained your ARIMA model, but is it a good model? Today you'll learn how to evaluate the performance of your model. Also when to use each metric 🧵👇

6

105

410

David Andrés 🤖📈🐍

@daansan_ml

4 months

How can you estimate a suitable value for 'p' in your ARIMA model? Here is the definitive guide! 🧵👇

8

82

400

David Andrés 🤖📈🐍

@daansan_ml

4 months

Have you chosen the best model? You may want to check AIC and BIC. Let's explore what they are and how they can help in finding the optimal ARIMA model 🧵👇

12

111

395

David Andrés 🤖📈🐍

@daansan_ml

5 months

XGBoost is powerful and very well-known. But it's not the absolute best for every single case... Find out how to choose between the best 3️⃣ algorithms for tabular data 🧵👇

12

82

398

David Andrés 🤖📈🐍

@daansan_ml

4 months

Creating the right features for Time Series data can make a significant impact on the performance of your model. Today I'll introduce 2 key ones, essential for capturing the sequential aspect of time series! 🧵👇

9

76

388

David Andrés 🤖📈🐍

@daansan_ml

3 months

Understanding feature importance in machine learning models is essential for interpreting their predictions. Today I'll share with you 2 methods to get it 🧵 👇

8

83

390

David Andrés 🤖📈🐍

@daansan_ml

5 months

🚨NEVER split your data randomly! At least when working with Time Series data... Learn what the dangers of doing so are 🧵 👇

13

76

382

David Andrés 🤖📈🐍

@daansan_ml

5 months

Do you need to build an ARIMA model... but you don't want the hassle of selecting the parameters to find the optimal model? 😟 Say hello to autoArima! It simplifies the process of selecting the best ARIMA model. 👇 🧵

12

77

361

David Andrés 🤖📈🐍

@daansan_ml

5 months

What is the difference between seasonality and cyclicality in time series forecasting❓ Discover it below 👇 🧵

6

92

359

David Andrés 🤖📈🐍

@daansan_ml

10 months

You can forecast Time Series data using a Machine Learning algorithm like XGBoost or Random Forest. However, you need to reframe your problem as a Supervised Learning one. Learn here how to do it 🧵 👇

11

82

358

David Andrés 🤖📈🐍

@daansan_ml

4 months

ACF and PACF are two important concepts in time series analysis, especially if what you need is an ARIMA model! Let's understand what they are🧵 👇

9

88

352

David Andrés 🤖📈🐍

@daansan_ml

2 months

In time series analysis, the trend component is key. It indicates the directional movement of data over time. Let's learn more about the trend 👇🧵

7

88

353

David Andrés 🤖📈🐍

@daansan_ml

2 months

After fitting a Time Series model such as ARIMA, you should always check the 𝗿𝗲𝘀𝗶𝗱𝘂𝗮𝗹 𝗱𝗶𝗮𝗴𝗻𝗼𝘀𝘁𝗶𝗰𝘀 to assess how well your model captures all the patterns in the data. See how to do it 👇 🧵

4

86

352

David Andrés 🤖📈🐍

@daansan_ml

5 months

Time Series analysis and forecasting is a really valuable skill to have in your Data Science toolkit. Here are 4️⃣ reasons WHY you should learn it... Do you agree? 🧵👇

11

77

344

David Andrés 🤖📈🐍

@daansan_ml

2 months

In time series analysis and forecasting, the Moving Average (MA) model plays a crucial role within the ARIMA framework. Let's delve into what it entails! 👇 🧵

5

91

342

David Andrés 🤖📈🐍

@daansan_ml

9 months

Data preprocessing is a crucial step in the machine learning pipeline, ensuring that the dataset is ready for training. One essential aspect of data preprocessing is ✨feature scaling✨, which involves adjusting the range and distribution of the data. 🧵 👇

7

77

332

David Andrés 🤖📈🐍

@daansan_ml

9 months

Discover one of the most used feature scaling techniques: ✨Min-Max Scaling✨ 🧵 👇

12

59

328
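
Min-Max Scaling maps each value to x' = (x - min) / (max - min), so the smallest value becomes 0 and the largest becomes 1. A minimal sketch follows; the function name, the optional target range and the constant-column fallback are illustrative choices, not taken from the thread.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so that min -> new_min and max -> new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: no spread to rescale, so map everything to new_min.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


scaled = min_max_scale([10, 20, 30, 50])
print(scaled)
```

Because the result depends on the observed minimum and maximum, a single extreme outlier can squeeze all other values into a narrow band.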

David Andrés 🤖📈🐍

@daansan_ml

4 months

Which value of "d" should you choose for your ARIMA model? Today I present an easy method to find it! 🧵 👇

6

77

324

David Andrés 🤖📈🐍

@daansan_ml

6 months

Generating or engineering features from Time Series data when using an ML approach involves extracting meaningful information that can be used by algorithms to understand patterns, make predictions, or identify trends. Here are some feature engineering techniques 🧵👇

15

90

326

David Andrés 🤖📈🐍

@daansan_ml

9 months

What is missing data? Missing data refers to the absence of values in a dataset where they are expected. It can arise from various reasons, such as: ▶️Data Entry Errors: Human errors during data entry can lead to missing values. For instance, someone might forget to fill in a

10

88

320

David Andrés 🤖📈🐍

@daansan_ml

9 months

Prophet is an open-source library developed by Facebook for Time Series Forecasting and has many advantages. Find 6️⃣ of them below 🧵 👇

7

60

318

David Andrés 🤖📈🐍

@daansan_ml

4 months

Does my data have a Unit Root? What is that and why is it important in Time Series forecasting? 🧵👇

9

70

322

David Andrés 🤖📈🐍

@daansan_ml

1 month

What is the difference between seasonality and cyclicality in time series forecasting❓ Discover it below 👇 🧵

3

73

325

David Andrés 🤖📈🐍

@daansan_ml

4 months

Would you like to create and train a neural network using TensorFlow and Keras? You can find the main steps to achieve a simple version of this here 👇 1⃣ Begin by importing the necessary modules: - Sequential to define a linear stack of network layers - Dense for fully

4

76

321

David Andrés 🤖📈🐍

@daansan_ml

3 months

Permutation Importance and SHAP are two model-agnostic techniques employed in machine learning for estimating the importance of features within models. Let's compare these 2 techniques 🧵👇

5

86

324

David Andrés 🤖📈🐍

@daansan_ml

10 months

What is data smoothing? ...and why may you need it? 🤔 Read this thread to learn more about it! 🧵 👇

17

73

317

David Andrés 🤖📈🐍

@daansan_ml

7 months

When evaluating the performance of Time Series forecasting models, several metrics can be used to assess their accuracy and predictive power. Here are 4️⃣ of the most used metrics for time series forecasting 🧵 👇

14

79

307

David Andrés 🤖📈🐍

@daansan_ml

3 months

Doing feature engineering for your Time Series data? Here is an interesting technique: "Time Since an Event" 🧵 👇

5

63

311

David Andrés 🤖📈🐍

@daansan_ml

2 months

How can you assess whether your ARIMA model is good or not? One way is checking the "summary" that the statsmodels library offers you 👇 🧵

6

72

314

David Andrés 🤖📈🐍

@daansan_ml

6 months

Cleaning your data before building your Time Series model is crucial. Learn how to do it, step by step 🧵👇

8

68

308

David Andrés 🤖📈🐍

@daansan_ml

9 months

Yesterday we released a new article: "How to forecast Time Series data using XGBoost?" 🤔 Discover it below 👇

16

64

305

David Andrés 🤖📈🐍

@daansan_ml

4 months

Are you familiar with the most common Machine Learning algorithms? Today, I will complete the Top 10 of the most commonly used ones! Check them out 🧵 👇

6

50

301

David Andrés 🤖📈🐍

@daansan_ml

4 months

How can you estimate the value of the MA term - q - in your ARIMA model? Here you have a step-by-step guide! 🧵👇

7

68

303

David Andrés 🤖📈🐍

@daansan_ml

6 months

Do you know that you can separate trend and seasonality in your time series data? Two popular decomposition methods are Seasonal Decompose and STL (Seasonal-Trend decomposition using LOESS). Let's find out more about them 🧵👇

9

54

297

David Andrés 🤖📈🐍

@daansan_ml

7 months

Last week I heard about the "Fuzzy Time Series"... I had never heard about that before, so I researched it. Here's what I found 🧵👇

5

58

286

David Andrés 🤖📈🐍

@daansan_ml

2 months

What is the seasonal component in time series analysis? Let's break it down! 👇🧵

2

75

286

David Andrés 🤖📈🐍

@daansan_ml

10 months

Cleaning your data before building your Time Series model is crucial. Learn how to do it, step by step 🧵👇

8

79

276

David Andrés 🤖📈🐍

@daansan_ml

7 months

In Time Series Analysis and Forecasting, a base model is often a simple model used as a benchmark to compare the performance of more complex models. Last time we talked about Simple Average... Let's now introduce Moving Average (MA)! 🧵 👇

10

61

276

David Andrés 🤖📈🐍

@daansan_ml

5 months

What are the steps of any Data Science project? 1️⃣ Define the problem or question to be answered: Clearly articulate the problem you aim to solve or the question you want to address. 2️⃣ Gather and understand the data: Collect relevant data and gain a thorough understanding of

9

69

269

David Andrés 🤖📈🐍

@daansan_ml

1 month

Today I'll introduce 𝗦𝘂𝗽𝗽𝗼𝗿𝘁 𝗩𝗲𝗰𝘁𝗼𝗿 𝗠𝗮𝗰𝗵𝗶𝗻𝗲𝘀 🤖 A useful Machine Learning algorithm that Data Scientists frequently use for both classification and regression problems. Read more about it 🧵 👇

5

66

264

David Andrés 🤖📈🐍

@daansan_ml

3 months

Permutation importance is a model-agnostic technique used to assess the importance of features in a model. This method involves systematically shuffling each feature's values one at a time and measuring the resulting change in model performance.

12

55

266
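
The shuffle-and-measure procedure described above can be sketched without any ML library. Everything here is a toy assumption: the "model" is a fixed function that only uses the first feature, the metric is mean squared error, and the function names are hypothetical. scikit-learn offers a ready-made version in `sklearn.inspection.permutation_importance`.

```python
import random


def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when each feature column is shuffled on its own."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


rng = random.Random(1)
X = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(200)]
y = [3 * row[0] for row in X]            # target depends only on feature 0


def predict(row):
    return 3 * row[0]                    # toy "model" that ignores feature 1


importances = permutation_importance(predict, X, y)
print(importances)
```

Shuffling feature 0 raises the error noticeably, while shuffling feature 1 changes nothing, so feature 1 gets an importance of zero.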

David Andrés 🤖📈🐍

@daansan_ml

6 months

Cosine similarity is a handy way to measure how similar two items are. It is widely used in NLP and in Recommendation Systems. Let's explain it by using a simple example of a content-based recommender system of books 🧵 👇

5

58

259
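
Cosine similarity is the dot product of two vectors divided by the product of their lengths. A minimal sketch for a book recommender follows; the book titles and genre vectors are made-up placeholders, and the code assumes no vector is all zeros.

```python
import math


def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| * |b|); assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical genre vectors: [fantasy, romance, sci-fi]
books = {
    "Book A": [1, 0, 1],
    "Book B": [1, 0, 0],
    "Book C": [0, 1, 0],
}

liked = "Book A"
scores = {
    title: cosine_similarity(books[liked], vector)
    for title, vector in books.items()
    if title != liked
}
best = max(scores, key=scores.get)
print(scores, "-> recommend", best)
```

A content-based recommender would rank the remaining books by this score and suggest the highest-scoring ones.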

David Andrés 🤖📈🐍

@daansan_ml

6 months

Decision Trees are a key model in Machine Learning for both classification and regression. 🌳 They use a tree structure for decision-making processes (hence the name). Find out more about their components 🧵 👇

7

44

259

David Andrés 🤖📈🐍

@daansan_ml

5 months

Your models may be impacted by outliers! 🚨 Where might these outliers come from? Let's find out the possible sources 🧵 👇

8

65

257

David Andrés 🤖📈🐍

@daansan_ml

6 months

Would you like to create and train a neural network using TensorFlow and Keras? You can find the main steps to achieve a simple version of this here 👇 1⃣ Begin by importing the necessary modules: - Sequential to define a linear stack of network layers - Dense for fully

13

62

248

David Andrés 🤖📈🐍

@daansan_ml

10 months

What are the steps of any Data Science project? 1️⃣ Define the problem or question to be answered: Clearly articulate the problem you aim to solve or the question you want to address. 2️⃣ Gather and understand the data: Collect relevant data and gain a thorough understanding of

11

62

245

David Andrés 🤖📈🐍

@daansan_ml

5 months

Looking to predict one Time Series variable based on another? Will it be beneficial? ✅ Or not? ❌ You should first check Granger causality. Check this out👇🧵

8

52

242

David Andrés 🤖📈🐍

@daansan_ml

10 months

Time to introduce the ✨𝗥𝗼𝗼𝘁 𝗠𝗲𝗮𝗻 𝗦𝗾𝘂𝗮𝗿𝗲𝗱 𝗘𝗿𝗿𝗼𝗿✨, another really useful error metric for Time Series and Machine Learning! Check this out if you are a Data Scientist! 🧑‍💻 🧵 👇

10

51

244

David Andrés 🤖📈🐍

@daansan_ml

7 months

ARIMA models with more than 1 variable? I introduce you to the ARIMAX models! 🧵 THREAD🧵 👇

4

54

244

David Andrés 🤖📈🐍

@daansan_ml

4 months

Using an ML approach like an XGBoost model to forecast Time Series Data? Extract the maximum information from the date 👇 Read more in the post below!

5

66

245

David Andrés 🤖📈🐍

@daansan_ml

1 month

Have you ever wondered how 𝗦𝘂𝗽𝗽𝗼𝗿𝘁 𝗩𝗲𝗰𝘁𝗼𝗿 𝗠𝗮𝗰𝗵𝗶𝗻𝗲𝘀 (SVM) can handle non-linear data? The "𝗞𝗲𝗿𝗻𝗲𝗹 𝗧𝗿𝗶𝗰𝗸" is a fascinating mathematical technique that allows efficient calculations and delivers powerful results! Let's learn more about it 🧵 👇

5

61

241

David Andrés 🤖📈🐍

@daansan_ml

5 months

Build an optimal ARIMA model efficiently. That's what you can achieve with the Box-Jenkins method. From raw data to a production-ready model step-by-step 🧵👇

6

47

239

David Andrés 🤖📈🐍

@daansan_ml

10 months

There is a kind of Neural Network that can be very useful for forecasting Time Series data: Recurrent Neural Networks, or RNNs. This type of neural network is especially designed to process sequential data, where the order of the data points is crucial, like Time

8

73

237

David Andrés 🤖📈🐍

@daansan_ml

11 months

Have you ever wondered how 𝗦𝘂𝗽𝗽𝗼𝗿𝘁 𝗩𝗲𝗰𝘁𝗼𝗿 𝗠𝗮𝗰𝗵𝗶𝗻𝗲𝘀 (SVM) can handle non-linear data? The "𝗞𝗲𝗿𝗻𝗲𝗹 𝗧𝗿𝗶𝗰𝗸" is a fascinating mathematical technique that allows efficient calculations and delivers powerful results! Let's learn more about it 🧵 👇

9

45

229

David Andrés 🤖📈🐍

@daansan_ml

10 months

Is your data too noisy? 🤔 Let's learn together how you can smooth your data! 🧵 👇

9

42

233

David Andrés 🤖📈🐍

@daansan_ml

10 months

Time to introduce the ✨𝗠𝗲𝗮𝗻 𝗔𝗯𝘀𝗼𝗹𝘂𝘁𝗲 𝗣𝗲𝗿𝗰𝗲𝗻𝘁𝗮𝗴𝗲 𝗘𝗿𝗿𝗼𝗿✨, a less known but really useful error metric for Time Series and Machine Learning! Check this out if you are a Data Scientist! 🧑‍💻 🧵 👇

9

41

233

David Andrés 🤖📈🐍

@daansan_ml

7 months

In Time Series Analysis and Forecasting, a base model is often a simple model used as a benchmark to compare the performance of more complex models. Let's introduce Exponential Smoothing (ES), another simple method that is commonly used as a base model 🧵 👇