This article is a step-by-step walkthrough of style transfer with deep neural networks. We will use PyTorch to recreate the style transfer method outlined in the paper Image Style Transfer Using Convolutional Neural Networks.
Thanks to the Udacity Deep Learning Nanodegree for providing the source code, which gave me the chance to create some artwork of my own!
As you may have seen in the picture below, style transfer is not a new idea. …
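At the heart of that paper's method is the Gram matrix, which represents the style of an image as correlations between the channels of a convolutional feature map. Here is a minimal NumPy sketch of that one computation (illustrative only — the article itself works on PyTorch tensors extracted from a pre-trained VGG network):

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (channels, height, width) feature map.

    Style is captured by correlations between feature channels:
    flatten each channel into a row, then take F @ F.T.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)   # one row per channel
    return flat @ flat.T                # (c, c) channel correlations

# toy example: 3 channels on a 4x4 spatial grid
feats = np.random.rand(3, 4, 4)
G = gram_matrix(feats)
print(G.shape)  # (3, 3)
```

The style loss then compares Gram matrices of the generated image and the style image at several layers.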
Recently, while working on creating synthetic data, I found that it is actually an interesting topic and very helpful in many real-world scenarios. Hence, I started gathering more information about it and learning more advanced methods for generating synthetic data based on the latest research.
So I would like to walk through an ecosystem of libraries called the Synthetic Data Vault (SDV), which lets users easily model single-table, multi-table and time-series datasets and then generate new synthetic data with the same format and statistical properties as the original dataset.
For a single…
Recently I had a random question: is there a dataset containing a variety of data science interview questions and answers? I didn't find any, so I decided to create one on my own! 🥳
Hence I spent several days gathering over 300 data science interview questions and answers, and finally built a dataset large enough to explore.
To be honest, in the beginning I thought it would be really difficult to cover the majority of data science question types, so I decided to gather only the non-coding ones.
Surprisingly, I found that, after reading hundreds of data…
This project covers the necessary exploratory data analysis and the workflow of building a categorical salary predictor based on real-world data scraped from Glassdoor.
Essentially, my goal is to gain a better understanding of the data science job market, so it will be fun ;)
In terms of data scraping tools, I found Selenium useful and straightforward to understand. There is actually an article on TDS that specifically explains how to scrape data from Glassdoor with Selenium. The article can be found here:
However, since the original article was written in 2019, I…
spaCy is one of my favourite NLP libraries, and I have been using it to perform a lot of Named Entity Recognition (NER) tasks. Generally, we first load a spaCy pre-trained model for a specific language and fine-tune it with our training dataset. The training can be done offline on a local computer, and we can even test the fine-tuned model's performance by hosting it locally through Flask / Streamlit.
Although I have found many great tutorials on deploying a spaCy model locally with Flask / Streamlit, there are not many tutorials on how to deploy…
In this article I will guide you through my thoughts on how to build a fuzzy search algorithm. A very practical use case: given a brand name such as 'Amazon', we want the algorithm to return alternative strings such as 'AMZ', 'AMZN' or 'AMZN MKTP'.
The article follows this outline:
Merchant name cleaning can be quite a challenging problem. Since different banks provide transaction data of varying quality, there is no mature, standard way to clean it. Commonly, merchant name cleaning is framed as a Named Entity Recognition (NER) task and solved in a similar way to an entity extraction problem.
For a FinTech company, the merchant name cleaning step is important because developers need the cleaned merchant names extracted from originally messy transaction data to generate proper transaction categorization and deliver a better customer experience in managing personal finance.
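As a toy illustration of the fuzzy-matching idea (not the algorithm the article builds), Python's standard-library difflib can score how similar a raw merchant string is to each entry in a list of known brand names:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Fuzzy similarity ratio between two strings, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def best_match(raw, brands, threshold=0.4):
    """Return the brand most similar to the raw merchant string,
    or None if nothing clears the threshold."""
    score, name = max((similarity(raw, b), b) for b in brands)
    return name if score >= threshold else None

# made-up brand list for illustration
brands = ["amazon", "starbucks", "walmart"]
print(best_match("amazon prime", brands))  # → amazon
print(best_match("xyz 123", brands))       # → None
```

A real pipeline would combine a similarity score like this with normalization rules (stripping store numbers, location codes, etc.) before matching.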
Finally, we are in year 2021 🎉
It's a new chapter of life 🐣
As a data scientist, I wanted to use this opportunity to summarize a list of interesting datasets that I found on Kaggle in 2021. I also hope this list is useful to people looking for data science projects to build their own portfolios.
After trying many different pathways to learn data science, the most effective one I have found so far is working on projects built from real datasets. …
In this project, I generated word clouds based on Quantum Physics articles posted on arXiv.org from 1994 to 2009.
The original dataset can be found on Kaggle:
Before introducing my work, I would also like to recommend some related Medium articles that I learned a lot from.
The full dataset contains 6 columns, including each article's
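A word cloud is ultimately driven by word frequencies. As a minimal sketch of that counting step (illustrative only, with a made-up stopword list — the actual project uses its own preprocessing pipeline):

```python
import re
from collections import Counter

# tiny illustrative stopword list; a real project would use a fuller one
STOPWORDS = {"the", "of", "a", "and", "in", "to", "is", "we"}

def word_frequencies(text):
    """Lowercase, tokenize and count words, dropping common stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

abstract = "We study the entanglement of quantum states and quantum channels."
freqs = word_frequencies(abstract)
print(freqs.most_common(2))  # "quantum" appears twice
```

The resulting counts are exactly what word-cloud libraries use to size each word.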
This project was originally my capstone project for the Udacity Machine Learning Engineer Nanodegree.
I found the dataset on Kaggle, linked here:
I am very proud to have completed this project because it challenged my skills not only in machine learning engineering but also in domains such as data engineering and software engineering. I learned how to use the Streamlit library in Python to build the whole ML web app. On the web interface, you simply start by choosing your ML model type, then adjust the model's hyperparameters, and finally select your evaluation metrics.