If you come from a programming background and feel like your hands are tied when using SQL, this article will untie them. The article is highly conceptual and assumes that you are already familiar with programming and basic SQL. The goal, therefore, is to help you better understand errors and debug with ease. Hence, I will not go over syntactic details beyond a few photo illustrations.
SQL is such a high-level programming language that it looks more like a framework than a language. It’s formally categorized as a query language by Wikipedia for a reason and…
Every so often a complicated problem requires a simple and elegant solution. The decision tree algorithm’s simple structure offers a powerful solution in both regression and classification forms. Here we are going to focus on just its classification form, in 2 sections: decision tree structure and complex data types.
Let’s fabricate 100 instances of example data with 3 independent features (“absence”, “Mid-term”, and “Final”) and 1 dependent feature, “pass”. For demonstration purposes, all the features will start out as binary values, yes or no, and we will gradually progress to other data types.
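A minimal sketch of fabricating such a dataset in plain Python could look like the following. The four column names come from the text; everything else is an assumption made purely for illustration: the function name `fabricate_data`, the fixed seed, the 50/50 chance for each feature, reading "yes" on an exam as "passed it", and the rule used to derive "pass".

```python
import random

FEATURES = ("absence", "Mid-term", "Final")


def fabricate_data(n=100, seed=0):
    """Return n rows of binary ("yes"/"no") example data as dictionaries."""
    rng = random.Random(seed)  # fixed seed so the fabricated data is reproducible
    rows = []
    for _ in range(n):
        # Each independent feature is drawn as yes/no with equal probability (assumption).
        row = {name: rng.choice(["yes", "no"]) for name in FEATURES}
        # Assumed rule, illustrative only: a student passes when the final is a
        # "yes" and either the mid-term is a "yes" or the student was not absent.
        passed = row["Final"] == "yes" and (
            row["Mid-term"] == "yes" or row["absence"] == "no"
        )
        row["pass"] = "yes" if passed else "no"
        rows.append(row)
    return rows


data = fabricate_data()
print(len(data), data[0])
```

Because the label is derived from the features by a known rule, a decision tree trained on this data has a clear structure to recover, which makes it convenient for demonstration.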
If, like me, you can’t wrap your head around Bayes’ theorem, give me five minutes and let me take a crack at it in 2 sections: what Bayes’ theorem is and why it is hard to understand.
This isn’t a quick and dirty crash course. This isn’t even a normal attempt at Bayes’ theorem. It is rather a philosophical approach to Bayesian thinking. If you are looking for a quick formula to plug numbers into, then this article isn’t for you. But if you have some time to sit down and see how to view the world through a…
This is the beginning of all data science projects: data collection. While data repositories are everywhere, it’s convenient to turn the internet into your database. Typically this is a job for software engineers, but you might find yourself working on this in smaller companies. Let’s get to it!
The data comes from tennisexpress.com because I play tennis. It’s a small dataset that contains 213 instances and 15 features.
I always give a shout-out to sources I find useful, and this time is no exception: https://www.youtube.com/watch?v=MeBU-4Xs2RU. …
The focus of this project is the last chapter of a data science project: model deployment. The rest of the steps will be very brief, but I will still go over them. If you are only interested in model deployment, feel free to skip to section 7.
This time our goal is to predict a high school student’s final grade from their socioeconomic standing and some school-related features. Although the data came from a small sample, the process of training a model to predict…
Wouldn’t it be nice if we could find the early signs of diabetes? One of the promising applications of machine learning can do just that. While this concept is nothing new and is widely used in many medical fields, it could be very helpful for beginner data scientists to see a different workflow with a different dataset and style. So let’s get to it.
I acquired the data from the University of California, Irvine (UCI) (https://archive.ics.uci.edu/ml/index.php), already in CSV format. It is relatively small: 520 instances (rows) with…
For a long time, car collisions have not only been a major source of stress for commuters; they have also caused great damage to public infrastructure (e.g., road signs, traffic lights) and strained public resources (e.g., emergency calls that bring out police, firefighters, and ambulances). If city planners can clearly identify what conditions cause infrastructure damage as opposed to personal injury, perhaps they can better shape the cities of tomorrow.
Fortunately, we do not have to find or scrape our own…