Top 10 Open-Source Data Science Tools in 2022

I’m not going to list Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn, TensorFlow, PyTorch, etc. You probably know about these already. There is nothing wrong with these libraries; they are the bare essentials for data science in Python. And the...
How Can I Measure Data Quality?

Flag all your data quality issues by priority in a few lines of code. “Everyone wants to do the model work, not the data work” (Google Research). According to Alation’s State of Data Culture Report, 87% of employees attribute poor data quality to why most organizations...
High-quality data meets enterprise MLOps

According to the 2021 enterprise trends in machine learning report by Algorithmia, 83% of all organizations have increased their AI/ML budgets year-on-year, and the average number of data scientists employed has grown by 76% over the same period. However, the process...
Synthetic Time-Series Data: A GAN approach

Generate synthetic sequential data with TimeGAN. Time-series or sequential data can be defined as any data that has a time dependency. Cool, huh, but where can I find sequential data? Well, a bit everywhere, from credit card transactions, my everyday...
Synthetic Data with Gumbel-Softmax Activations

In classification, a central problem is how to learn effectively from discrete data formats (categorical or ordinal features). Most datasets present us with this problem, so I guess it is fair to say that almost all data scientists have...