Every so often a complicated problem calls for a simple and elegant solution. The decision tree algorithm’s simple structure offers a powerful solution in both its regression and classification forms. Here we focus only on classification, in 2 sections: Decision tree structure and Complex data type.

Decision tree structure

Let’s fabricate 100 instances of example data with 3 independent features (“absence”, “Mid-term”, and “Final”) and 1 dependent feature, “pass”. For demonstration purposes, all the features start out as binary values, yes or no, and we gradually progress to other data types.
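A minimal sketch of how such a dataset could be fabricated in Python, using only the standard library. The feature names come from the text above; the function name `fabricate_students`, the random seed, and especially the rule that decides “pass” are assumptions made up for illustration, not the data used in the article.

```python
import random

# Feature names taken from the text; everything else here is illustrative.
FEATURES = ["absence", "Mid-term", "Final"]


def fabricate_students(n=100, seed=0):
    """Return n fabricated rows with binary yes/no values.

    Assumption: "yes" on "Mid-term"/"Final" means the exam was passed,
    and "yes" on "absence" means the student was frequently absent.
    """
    rng = random.Random(seed)  # fixed seed so the fabricated data is repeatable
    rows = []
    for _ in range(n):
        row = {name: rng.choice(["yes", "no"]) for name in FEATURES}
        # Hypothetical labeling rule: pass the final, plus either pass the
        # mid-term or avoid frequent absence.
        passed = row["Final"] == "yes" and (
            row["Mid-term"] == "yes" or row["absence"] == "no"
        )
        row["pass"] = "yes" if passed else "no"
        rows.append(row)
    return rows


students = fabricate_students()
print(len(students), "rows; first row:", students[0])
```

Because the label is derived from the features by a fixed rule, a decision tree trained on these rows has a clean pattern to recover, which suits a structural walkthrough.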


img source: https://www.etsy.com/listing/971500381/space-nesting-dolls-solar-system-gift?ga_order=most_relevant&ga_search_type=all&ga_view_type=gallery&ga_search_query=russian+dolls&ref=sc_gallery-2-8&plkey=29f3fe31212da444799920dba73a8279e4201678%3A971500381

If, like me, you can’t wrap your head around Bayes’ theorem, give me five minutes and let me take a crack at it in 2 sections: What is Bayes Theorem and Why is it hard to understand.

This isn’t a quick and dirty crash course. It isn’t even a normal attempt at Bayes’ theorem. It is rather a philosophical approach to Bayesian thinking. If you are looking for a quick formula to plug numbers into, then this article isn’t for you. But if you have some time to sit down and see how to view the world through a…


Methodology:

  1. Introduction
  2. Data
  3. Conclusion

1. Introduction

This is the beginning of all data science projects: data collection. While data repositories are everywhere, it’s convenient to turn the internet into your database. Typically this is a job for software engineers, but you might find yourself working on this in smaller companies. Let’s get to it!

2. Data

The data comes from tennisexpress.com, which I chose because I play tennis. It’s a small dataset that contains 213 instances and 15 features.

I always give a shout-out to sources I find useful, and this time is no exception: https://www.youtube.com/watch?v=MeBU-4Xs2RU. …


JAKER5000 https://www.teenvogue.com/story/why-teachers-getting-rid-grades

Methodology:

  1. Introduction
  2. Data
  3. EDA
  4. Data Engineering/cleaning
  5. Model Building
  6. Test
  7. Deployment
  8. Conclusion

1. Introduction

The focus of this project is the last chapter of a data science project: model deployment. The other steps will be very brief, but I will still go over them. If you are only interested in model deployment, feel free to skip to section 7.

This time our goal is to predict a high school student’s final grade from their socioeconomic standing and some school-related features. Although the data came from a small sample size, the process of training a model to predict…


iStock.com

Methodology:

  1. Introduction
  2. Data
  3. EDA
  4. Data Engineering/cleaning
  5. Model Building
  6. Test/Conclusion

1. Introduction

Wouldn’t it be nice if we could find the early signs of diabetes? One promising application of machine learning can do just that. While this concept is nothing new and is widely used in many medical fields, it could be very helpful for beginner data scientists to see a different workflow with a different dataset and style. So let’s get to it.

2. Data

I acquired the data from the University of California, Irvine (UCI) (https://archive.ics.uci.edu/ml/index.php), already in CSV format. Its size is relatively small, 520 instances (rows) with…


Photo: University of Washington, college of built environment, http://be.uw.edu/2017-cbe-research-open-labs/satelliteseattle/

Methodology:

  1. Introduction to the problem
  2. Data
  3. EDA (exploratory data analysis)
  4. Data preparation
  5. Feature Selection
  6. Model Building
  7. Model Evaluation
  8. Conclusion

I. Introduction

For a long time, car collisions have not only been a major source of stress for commuters, they have also caused great amounts of damage to public infrastructure (e.g., road signs, traffic lights) and drained public resources (e.g., emergency calls that involve police, firefighters, and ambulances). If city planners can clearly identify which conditions cause infrastructure damage as opposed to personal injury, perhaps they can better shape the cities of tomorrow.

II. Data

Fortunately, we do not have to find or scrape our own…

袁晗 | Luo, Yuan Han

Balance is the key. Personal website: https://sites.google.com/view/luoyuan/home
