In this project, I set out to predict car selling prices using a dataset of various car features such as engine capacity, fuel type, and mileage. This task can be particularly useful for individuals or dealerships looking to estimate the fair market value of vehicles based on their characteristics.
If you're interested in learning more or trying this out with your own data, check out the GitHub repository for this project, where I've uploaded all the code and data.
The used car market is booming, and with the rapid evolution of technology, we can leverage machine learning models to predict the selling price of a car. The goal of this project was to build a regression model that accurately predicts car prices based on key features like engine size, mileage, fuel type, and others.
The dataset used in this project contained several columns, such as:
- Year: The manufacturing year of the car.
- KM Driven: The distance the car has been driven, in kilometers.
- Fuel Type: Type of fuel used (Diesel, Petrol, CNG, etc.).
- Seller Type: Whether the seller is an individual or a dealership.
- Transmission: Whether the car has an automatic or manual transmission.
- Seats: Number of seats in the car.
- Engine Capacity: The engine's capacity in cubic centimeters.
- Max Power: The maximum power output of the engine in horsepower.
After cleaning the data, we were able to move on to the modeling process.
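The individual cleaning steps aren't spelled out here. The feature names `engine_cc`, `mil_kmpl` and `max_power_new` suggest that some raw columns held text with units attached, so the following is a minimal sketch, assuming values shaped like "1248 CC" or "74 bhp", of extracting the numeric part with pandas. The helper `parse_leading_number`, the raw column names `engine`, `max_power` and `mileage`, and the example strings are all hypothetical.

```python
import pandas as pd

def parse_leading_number(series: pd.Series) -> pd.Series:
    """Extract the first number from strings like '1248 CC' or '74 bhp'.

    Hypothetical helper: values without any digits (including missing values)
    become NaN.
    """
    # The regex only captures digits with an optional decimal part,
    # so every non-missing match is safe to cast to float.
    extracted = series.astype(str).str.extract(r"(\d+(?:\.\d+)?)", expand=False)
    return extracted.astype(float)

# Made-up raw values in the assumed "<number> <unit>" format
raw = pd.DataFrame({
    "engine": ["1248 CC", "1498 CC", None],
    "max_power": ["74 bhp", "103.52 bhp", "88.5 bhp"],
    "mileage": ["23.4 kmpl", "21.14 kmpl", "17.7 kmpl"],
})

clean = pd.DataFrame({
    "engine_cc": parse_leading_number(raw["engine"]),
    "max_power_new": parse_leading_number(raw["max_power"]),
    "mil_kmpl": parse_leading_number(raw["mileage"]),
}).dropna()  # drop rows where any value could not be parsed

print(clean)
```

Dropping unparseable rows is only one option; imputing them (for example with a column median) would keep more data at the cost of some noise.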
Before building the model, it's crucial to understand the relationships between the features. Here's a heatmap that visualizes the correlation between the numeric columns:
As seen above, the selling price has the highest correlation with the car's max power output, indicating that cars with more power tend to sell for higher prices. Even so, none of the features were so highly correlated with one another that any of them needed to be excluded.
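The numbers behind a heatmap like this come from a correlation matrix; `df.select_dtypes(include="number").corr()` is what a plotting function such as seaborn's `heatmap` would draw. Here is a minimal pandas sketch of the two checks described above: ranking features by their correlation with the price, and flagging feature pairs correlated above a threshold. It assumes a target column named `selling_price` (a hypothetical name), and the 0.9 threshold is an arbitrary choice. The toy data is synthetic, not the project's dataset.

```python
import numpy as np
import pandas as pd

def target_correlations(df: pd.DataFrame, target: str = "selling_price") -> pd.Series:
    """Pearson correlation of each numeric feature with the target, strongest first."""
    corr = df.select_dtypes(include="number").corr()
    # Drop the target's correlation with itself, then sort by absolute strength
    return corr[target].drop(target).sort_values(key=abs, ascending=False)

def correlated_feature_pairs(df: pd.DataFrame, target: str = "selling_price",
                             threshold: float = 0.9) -> list:
    """Feature pairs whose absolute correlation exceeds `threshold` (multicollinearity check)."""
    corr = df.select_dtypes(include="number").drop(columns=target).corr()
    cols = list(corr.columns)
    # Visit each unordered pair of features exactly once
    return [(a, b) for i, a in enumerate(cols) for b in cols[i + 1:]
            if abs(corr.loc[a, b]) > threshold]

# Synthetic toy data (made up for illustration, NOT the project's dataset)
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "year": rng.integers(2005, 2021, n),
    "km_driven": rng.uniform(0, 150_000, n),
    "max_power_new": rng.uniform(60, 200, n),
})
toy["selling_price"] = (8_000 * toy["max_power_new"] - 2 * toy["km_driven"]
                        + 20_000 * (toy["year"] - 2005) + rng.normal(0, 50_000, n))

print(target_correlations(toy))
print(correlated_feature_pairs(toy))  # the toy features are independent, so no pairs
```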
To prepare the data for machine learning, I performed several transformations:
- Converted categorical data, such as fuel type and seller type, into numerical values.
- Scaled the features to standardize the data and improve model performance.
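A minimal sketch of these two steps follows. The exact encoders aren't stated; the feature list (single `fuel`, `seller_type` and `transmission` columns alongside ownership columns such as `First Owner` and `Test Drive Car`) suggests label encoding for the former and one-hot encoding for ownership, so that is what this assumes. The raw column name `owner`, the helper `preprocess`, and the toy rows are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: encode categorical columns as numbers."""
    out = df.copy()
    # Label-encode single-column categoricals; codes follow alphabetical category order
    for col in ["fuel", "seller_type", "transmission"]:
        out[col] = out[col].astype("category").cat.codes
    # One-hot encode ownership history; the new columns are named after the categories
    owners = pd.get_dummies(out.pop("owner"), dtype=int)
    return pd.concat([out, owners], axis=1)

# Made-up rows for illustration only
raw = pd.DataFrame({
    "year": [2017, 2012, 2015, 2019],
    "km_driven": [50_000, 120_000, 70_000, 20_000],
    "fuel": ["Petrol", "Diesel", "CNG", "Petrol"],
    "seller_type": ["Individual", "Dealer", "Individual", "Dealer"],
    "transmission": ["Manual", "Manual", "Automatic", "Automatic"],
    "owner": ["First Owner", "Second Owner", "Third Owner", "First Owner"],
})
X = preprocess(raw)

# Standardize every column to zero mean and unit variance.
# In a real workflow, fit the scaler on the training split only and reuse it on test data.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(X)
```

One design note: tree-based models such as random forests split on thresholds, so their splits depend only on the ordering of values within each feature, and standardization is generally unnecessary for them. It matters more when comparing against distance- or gradient-based models.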
The final set of features used to train the model was:
```python
['year', 'km_driven', 'fuel', 'seller_type', 'transmission', 'seats',
 'torque_rpm', 'mil_kmpl', 'engine_cc', 'max_power_new', 'First Owner',
 'Fourth & Above Owner', 'Second Owner', 'Test Drive Car', 'Third Owner']
```
For this task, I chose a Random Forest Regressor, a robust machine learning algorithm that works well with both numerical and (encoded) categorical data. Random Forests are also less prone to overfitting than a single decision tree, because they average the predictions of many trees, each trained on a bootstrap sample of the data.
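To illustrate the overfitting point, here is a sketch on synthetic data from scikit-learn's `make_regression` (not the car dataset) that compares train and test R² for a single fully grown decision tree and a random forest. A large gap between the train and test scores is the usual symptom of overfitting. The sample size, noise level and hyperparameters are arbitrary choices, not the project's settings.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic, noisy regression data stands in for the car dataset (illustration only)
X, y = make_regression(n_samples=500, n_features=8, noise=25.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "single tree": DecisionTreeRegressor(random_state=0),          # grown until leaves are pure
    "random forest": RandomForestRegressor(n_estimators=200, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # For regressors, .score() returns R² on the given data
    scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"{name}: train R² = {scores[name][0]:.3f}, test R² = {scores[name][1]:.3f}")
```

Because every training row here is unique, the fully grown tree fits the training set essentially perfectly; the forest's train/test gap tends to be smaller, though exact numbers depend on the data and seeds.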
After training the model, I evaluated its performance using metrics such as R-squared and Mean Squared Error. The model performed reasonably well, with an R-squared score of X.X, which indicates that it explains most of the variability in car prices.
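Both metrics are simple to compute by hand, which helps when interpreting them: MSE is the mean of the squared prediction errors, and R² is one minus the ratio of the residual sum of squares to the total sum of squares around the mean price. Below is a sketch with NumPy, cross-checked against scikit-learn's `r2_score` and `mean_squared_error`. The helper name `regression_report` and the price values are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def regression_report(y_true, y_pred):
    """Return MSE, RMSE and R² computed directly from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)
    ss_res = np.sum(residuals ** 2)                 # variation the model leaves unexplained
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean price
    return {"mse": mse, "rmse": np.sqrt(mse), "r2": 1.0 - ss_res / ss_tot}

# Made-up prices (INR) and predictions, for illustration only
y_true = [300_000, 450_000, 600_000, 750_000]
y_pred = [320_000, 430_000, 640_000, 710_000]
report = regression_report(y_true, y_pred)
# mse = 1e9, rmse ≈ 31,623 INR, r2 ≈ 0.964

# The hand-rolled values agree with scikit-learn's implementations
assert np.isclose(report["mse"], mean_squared_error(y_true, y_pred))
assert np.isclose(report["r2"], r2_score(y_true, y_pred))
```

Because MSE is measured in squared currency units, its square root (RMSE) is often easier to read: it is expressed in INR, like the prices themselves.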
To test the model, I used a new set of car data and made predictions. Here's an example prediction for a 2017 car with 50,000 km driven:
year: 2017
km_driven: 50000
fuel: Petrol
seller_type: Individual
transmission: Manual
seats: 5
max_power: 120 hp
engine_capacity: 1800 cc
The predicted selling price for this car is INR 1,259,990.
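For a prediction like this to be valid, the new car has to go through exactly the same preprocessing as the training data. One way to guarantee that, shown here as a swapped-in technique rather than the project's own code, is to bundle the encoding and the model in a scikit-learn `Pipeline` with a `ColumnTransformer`. The column names, the synthetic training data and the hyperparameters are assumptions for illustration, so the predicted value will not match the INR 1,259,990 above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic training data (made up for illustration, NOT the project's dataset)
rng = np.random.default_rng(42)
n = 300
train = pd.DataFrame({
    "year": rng.integers(2005, 2021, n),
    "km_driven": rng.uniform(5_000, 200_000, n),
    "fuel": rng.choice(["Petrol", "Diesel", "CNG"], n),
    "seller_type": rng.choice(["Individual", "Dealer"], n),
    "transmission": rng.choice(["Manual", "Automatic"], n),
    "seats": rng.choice([5, 7], n),
    "max_power": rng.uniform(60, 200, n),    # hp
    "engine_cc": rng.uniform(800, 2500, n),  # cc
})
train["selling_price"] = (
    6_000 * train["max_power"] - 1.5 * train["km_driven"]
    + 25_000 * (train["year"] - 2005) + rng.normal(0, 30_000, n)
)

features = train.drop(columns="selling_price")
categorical = ["fuel", "seller_type", "transmission"]

# Encoding lives inside the pipeline, so training rows and new cars are transformed identically
model = Pipeline([
    ("encode", ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), categorical)],
        remainder="passthrough",  # numeric columns pass through unchanged
    )),
    ("forest", RandomForestRegressor(n_estimators=200, random_state=0)),
])
model.fit(features, train["selling_price"])

# The example car from the post, as a one-row DataFrame with the training column order
new_car = pd.DataFrame([{
    "year": 2017, "km_driven": 50_000, "fuel": "Petrol", "seller_type": "Individual",
    "transmission": "Manual", "seats": 5, "max_power": 120, "engine_cc": 1800,
}])[features.columns]

predicted_price = model.predict(new_car)[0]
print(f"Predicted selling price: INR {predicted_price:,.0f}")
```

A random forest's prediction is an average of training targets, so it can never fall outside the range of prices seen during training. That is worth keeping in mind for unusually expensive or cheap cars.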
By using machine learning, we can predict car selling prices with a good level of accuracy. This model could be deployed in dealerships or used by individuals to estimate the fair market value of a vehicle before making a purchase or sale.