A Technical Report on the Kaggle Titanic Dataset | by Bernard Worthy

INTRODUCTION

This report is based on the Titanic dataset from Kaggle (https://www.kaggle.com/c/titanic/data). The primary goal of this technical report is to explore this dataset and develop a predictive model that predicts the survival rate of passengers on the Titanic.

For this report, I used two Python libraries to make my observations. I used Pandas to read, understand and get insights from the data. I also used the Seaborn library to visualise the data.

From the Extended Data Dictionary (EDD), I observed that there are 12 columns in the dataset, with 7 numerical columns and 5 categorical columns:

Numerical Data:

· PassengerId

· Survived

· Pclass

· Age

· SibSp

· Parch

· Fare

Categorical Data:

· Name

· Sex

· Ticket

· Cabin

· Embarked
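A minimal sketch of how such a summary and column split could be produced with Pandas. The rows below are made up for illustration and only the column names follow the Kaggle file; the real data would normally be loaded with something like pd.read_csv("train.csv"), where the file name is an assumption.

```python
import pandas as pd

# Made-up rows for illustration only; column names follow the Kaggle file.
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 2],
    "Name": ["A", "B", "C", "D"],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, None, 35.0],
    "SibSp": [1, 1, 0, 0],
    "Parch": [0, 0, 0, 0],
    "Ticket": ["T1", "T2", "T3", "T4"],
    "Fare": [7.25, 71.28, 7.92, 13.0],
    "Cabin": [None, "C85", None, None],
    "Embarked": ["S", "C", "S", None],
})

# A describe() table over every column is one way to build an EDD-style summary.
edd = df.describe(include="all").T

# Split the columns by dtype: numeric versus everything else.
numerical_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print(edd)
print("Numerical:", numerical_cols)
print("Categorical:", categorical_cols)
```

Note that columns such as PassengerId, Survived and Pclass are stored as numbers, so a dtype-based split treats them as numerical even though they act as identifiers or labels.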

OBSERVATION

By merely looking at the data, I was able to observe that there are 891 passengers in the dataset and that the Sex column is highly related to the Survived column, as most of the survivors are women.

From the Extended Data Dictionary (EDD), I made the following observations:

Missing Values:

The EDD returned a count of the values in each column, and from that count I was able to determine which columns had missing values. They include:
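One way to check this kind of relationship is to compare the survival rate within each group. The sketch below uses a handful of made-up rows, not the real dataset, so the numbers it prints say nothing about the actual passengers.

```python
import pandas as pd

# Made-up rows for illustration only.
sample = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male", "female"],
    "Survived": [0, 1, 1, 0, 1, 0],
})

# Survived is 0/1, so the mean per group is the share of survivors in that group.
survival_by_sex = sample.groupby("Sex")["Survived"].mean()
print(survival_by_sex)
```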

· Age

· Cabin

· Embarked
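The same check can be done directly by counting missing entries per column. This is a small sketch on made-up rows, assuming the Kaggle column names; it is not the exact procedure used for this report.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Age": [22.0, None, 26.0, 35.0],
    "Cabin": [None, "C85", None, None],
    "Embarked": ["S", "C", None, "S"],
})

# Count missing entries per column, then keep the columns that have any.
missing_counts = df.isna().sum()
cols_with_missing = missing_counts[missing_counts > 0].index.tolist()

print(missing_counts)
print(cols_with_missing)
```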

Possible Outliers:

I also noticed possible outliers in some columns because of the jump in values between the 75th and the 100th percentile. This was seen in the following columns:

· Age

· SibSp

· Parch

· Fare
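The jump between the 75th percentile and the maximum can be read off a describe() table. The sketch below uses made-up values and, as an added assumption that is not part of the report, flags a column when its maximum lies above the common 1.5 x IQR upper fence.

```python
import pandas as pd

# Made-up values for illustration only.
df = pd.DataFrame({
    "Pclass": [3, 3, 3, 2, 2, 1, 1, 3],
    "SibSp": [0, 0, 0, 0, 1, 1, 1, 8],
    "Fare": [7.25, 7.9, 8.05, 13.0, 14.5, 26.0, 31.0, 512.33],
})

summary = df.describe().T

# Size of the jump from the 75th percentile to the maximum (100th percentile).
summary["jump"] = summary["max"] - summary["75%"]

# Assumed rule of thumb: the maximum is suspicious if it exceeds Q3 + 1.5 * IQR.
iqr = summary["75%"] - summary["25%"]
upper_fence = summary["75%"] + 1.5 * iqr
flagged = summary.index[summary["max"] > upper_fence].tolist()

print(summary[["75%", "max", "jump"]])
print(flagged)
```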

CONCLUSION

From the dataset, I observed missing values in a few columns, and they can be treated by replacing the missing values with either the median or the mode of the column. The mean can also be used, but it is sensitive to any outliers in the column.

I also noticed outliers in certain columns, and they can be treated by replacing the outliers with either the 0th or the 99th percentile.
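A minimal sketch of this kind of imputation with Pandas, on made-up rows. Using the median for a numerical column and the mode for a categorical one is one reasonable pairing, not the only one.

```python
import pandas as pd

# Made-up rows for illustration only; 80.0 plays the role of an extreme value.
df = pd.DataFrame({
    "Age": [22.0, 38.0, None, 35.0, 80.0],
    "Embarked": ["S", "C", "S", None, "S"],
})

# The mean is pulled upward by the extreme value; the median is not.
age_mean = df["Age"].mean()
age_median = df["Age"].median()

# Fill the numerical column with its median and the categorical column with its mode.
df["Age"] = df["Age"].fillna(age_median)
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

print(age_mean, age_median)
print(df)
```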
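Capping at percentile bounds can be sketched with Series.quantile and Series.clip. The helper name cap_outliers and the sample values are hypothetical; the default bounds follow the 0th and 99th percentiles mentioned above and can be changed.

```python
import pandas as pd


def cap_outliers(series: pd.Series, lower_q: float = 0.0, upper_q: float = 0.99) -> pd.Series:
    """Clip values outside the given quantile bounds to those bounds."""
    lower = series.quantile(lower_q)
    upper = series.quantile(upper_q)
    return series.clip(lower=lower, upper=upper)


# Made-up values for illustration only.
fare = pd.Series([7.25, 7.9, 8.05, 13.0, 14.5, 26.0, 31.0, 512.33], name="Fare")
capped = cap_outliers(fare)

print(capped)
```

With the lower bound at the 0th percentile (the minimum), only the upper tail is changed; a higher lower bound would be needed to cap small values as well.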

https://hng.tech/internship, https://hng.tech/hire

A Technical Report on the Kaggle Titanic Dataset | by Bernard Worthy | Jun, 2024

Related Posts

Leave A Reply Cancel Reply