INTRODUCTION
This dataset is from Kaggle and contains three years of collected data on sales of various cars, transport vehicles and other modes of transportation. The Python programming language was used to clean and analyze the dataset and to generate visualizations from it.
There are 25 columns in this dataset: two columns of the float datatype, seven columns of the int64 datatype and sixteen of the object datatype (also known as strings). These can be categorized into nine numerical columns and sixteen categorical columns.
A first look at the dataset after reading it into a dataframe with Python’s pandas library showed that several of the columns had missing values. ‘Addressline2’ was almost empty, with just 302 of 2823 rows filled, and ‘state’, ‘territory’ and ‘postalcode’ also had missing values.
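The kind of first inspection described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the small dataframe stands in for the real file, and its column names and values are assumptions made for the example.

```python
import pandas as pd

# Tiny stand-in for the real dataset; names and values are illustrative only.
df = pd.DataFrame({
    "SALES": [2871.0, 2765.9, 3884.34, 3746.7],
    "MONTH_ID": [2, 5, 7, 8],
    "PRODUCTLINE": ["Motorcycles", "Classic Cars", "Trains", "Classic Cars"],
    "ADDRESSLINE2": [None, "Suite 101", None, None],
    "STATE": ["NY", None, None, "CA"],
})

# Number of columns of each datatype.
dtype_counts = df.dtypes.astype(str).value_counts()

# Filled (non-null) rows and missing rows per column.
filled = df.notna().sum()
missing = df.isna().sum()

print(dtype_counts)
print(missing[missing > 0])
```

On the real data, `df.info()` gives the same overview (datatypes and non-null counts) in a single call.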
Further cleaning was done on the dataset before selected columns were saved to a new dataframe for insight discovery. This can be seen in the attached Jupyter notebook.
OBSERVATIONS
We decided to pull out the columns of interest and store them in a new dataframe for further insight discovery. We focused on the ‘product line’, ‘sales’, ‘city’ and ‘months’ columns. To make the results easier to read, the values in the months column were changed to their respective month names. After that, charts were generated.
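A minimal sketch of this step, assuming the month is stored as a number from 1 to 12. The column names (`PRODUCTLINE`, `SALES`, `CITY`, `MONTH_ID`, `STATUS`) and the sample rows are assumptions for illustration and may not match the notebook.

```python
import calendar

import pandas as pd

# Illustrative rows only; the real dataframe has many more columns and rows.
df = pd.DataFrame({
    "PRODUCTLINE": ["Classic Cars", "Vintage Cars", "Trains"],
    "SALES": [2871.0, 2765.9, 3884.34],
    "CITY": ["NYC", "Paris", "Madrid"],
    "MONTH_ID": [2, 5, 11],
    "STATUS": ["Shipped", "Shipped", "Shipped"],
})

# Keep only the columns of interest in a new dataframe.
cols = ["PRODUCTLINE", "SALES", "CITY", "MONTH_ID"]
subset = df[cols].copy()

# Map month numbers (1-12) to month names.
month_names = {i: calendar.month_name[i] for i in range(1, 13)}
subset["MONTH"] = subset["MONTH_ID"].map(month_names)

print(subset)
```

Taking a `.copy()` of the selection keeps later changes to `subset` from triggering pandas' chained-assignment warnings.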
1. Pie chart showing percentage of Product Lines
A pie chart showing the various product lines and their percentages. Here, it was discovered that Classic Cars topped the list with a total of 34%, followed by Vintage Cars with 22%. The least on the list is Trains with a total of 3% (I suppose that’s because it’s the least popular and least accessible to most people).
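Percentages like these can be computed by counting rows per product line, as in the sketch below. The sample values are made up, the `plot_share` helper is hypothetical, and whether the notebook's shares are based on row counts or on sales totals is an assumption here.

```python
import pandas as pd

# Made-up sample: 5 Classic Cars, 3 Vintage Cars, 2 Trains.
product_lines = pd.Series(
    ["Classic Cars"] * 5 + ["Vintage Cars"] * 3 + ["Trains"] * 2,
    name="PRODUCTLINE",
)

# Share of each product line as a percentage of all rows, largest first.
share = product_lines.value_counts(normalize=True).mul(100).round(1)
print(share)


def plot_share(share):
    """Draw the shares as a pie chart (hypothetical helper, needs matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.pie(share.values, labels=list(share.index), autopct="%1.0f%%")
    ax.set_title("Share of product lines")
    return fig
```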
2. Relationship between Product Lines and Month
The relationship between the months and the product lines was inspected, which showed that November had the most sales made in a single month. Classic Cars topped the list again across all the months, with Vintage Cars next in line. There appeared to be a downward trend in Vintage Cars from May until July before it rose back up to its peak in November. Trains came in last throughout the year.
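One way to examine this relationship is a cross-tabulation of month against product line. The sketch below uses made-up rows and assumed column names (`MONTH`, `PRODUCTLINE`), and is not necessarily how the notebook did it.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "MONTH": ["May", "May", "November", "November", "November", "July"],
    "PRODUCTLINE": [
        "Classic Cars", "Vintage Cars", "Classic Cars",
        "Classic Cars", "Vintage Cars", "Trains",
    ],
})

# Rows are months, columns are product lines, cells are order counts.
counts = pd.crosstab(df["MONTH"], df["PRODUCTLINE"])

# Month with the most orders across all product lines.
busiest_month = counts.sum(axis=1).idxmax()

print(counts)
print(busiest_month)
```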
3. Relationship between Product Lines and Country
The relationship between the product lines, countries and sales made was also analyzed with a countplot from Python’s Seaborn library. It showed the USA accounting for most of the sales made, with France, Spain and Italy following in no particular order.
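A countplot of this kind can be sketched as below. The rows are made up, the column names (`COUNTRY`, `PRODUCTLINE`) are assumptions, and `plot_country_counts` is a hypothetical helper that needs seaborn installed. The exact arguments used in the notebook are not known.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "COUNTRY": ["USA", "USA", "USA", "France", "Spain"],
    "PRODUCTLINE": [
        "Classic Cars", "Vintage Cars", "Classic Cars",
        "Classic Cars", "Trains",
    ],
})

# Order countries by number of rows so the largest bar group comes first.
order = df["COUNTRY"].value_counts().index.tolist()
print(order)


def plot_country_counts(df, order):
    """Count rows per country, split by product line (hypothetical helper)."""
    import seaborn as sns

    return sns.countplot(data=df, x="COUNTRY", hue="PRODUCTLINE", order=order)
```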
CONCLUSION
In conclusion, the dataset had some anomalies that needed to be cleaned before the analysis process began. We discovered that Classic Cars and Vintage Cars lead the list of product lines sold. We also found that sales generally rose to their peak in the month of November.
My appreciation goes to HNG Internship for the opportunity to develop my data analysis skills.
You can check them out here: https://hng.tech/internship, https://hng.tech/hire
The Jupyter notebook can be accessed here.