Here I'll tell you a story: yesterday I was trying to compete in a Kaggle competition called Natural Language Processing with Disaster Tweets. You can find the challenge link here — https://www.kaggle.com/competitions/nlp-getting-started

This particular challenge is ideal for data scientists looking to get started with Natural Language Processing. The competition dataset is not too big, and even if you don't have much personal computing power, you can do all of the work in Kaggle's free, no-setup Jupyter Notebooks environment called Kaggle Notebooks.

So in this challenge I need to predict which Tweets are about real disasters and which ones aren't.

An example tweet looks like — "Our Deeds are the Reason of this #earthquake May ALLAH Forgive us all", which counts as a disaster tweet since it matches words like "earthquake".

Submissions are evaluated using the F1 score between the predicted and expected answers.

My F1 score came out to 0.79773, which could be improved further using a BERT model, but due to time constraints I haven't done that training. I have, however, included the code to train it in the notebook, and it will definitely improve the F1 score.

So let's begin our journey —

First, I'll give a brief overview of the dataset.
You'll need train.csv, test.csv and sample_submission.csv.

Each sample in the train and test set has the following information:

- The text of a tweet
- A keyword from that tweet (although this may be blank!)
- The location the tweet was sent from (may also be blank)

You are predicting whether a given tweet is about a real disaster or not. If so, predict a 1. If not, predict a 0.

- train.csv — the training set
- test.csv — the test set
- sample_submission.csv — a sample submission file in the correct format

Don't worry, this description wasn't written by me 😂 it's taken from the challenge page; refer there to learn more about it.
Now I'll begin the ChatGPT part: how it completed the challenge in just two hours, with the final output file —

Prompt — 1)

Natural Language Processing with Disaster Tweets

The author explicitly uses the word "ABLAZE" but means it metaphorically. That is immediately clear to a human, especially with the visual aid. But it's less clear to a machine.

In this competition, you're challenged to build a machine learning model that predicts which Tweets are about real disasters and which ones aren't. You'll have access to a dataset of 10,000 tweets that were hand classified.

Submission File: For each ID in the test set, you must predict 1 if the tweet is describing a real disaster, and 0 otherwise.

The file should contain a header and have the following format:

```
id,target
0,0
2,0
3,1
9,0
11,0
```

Have you understood this challenge? If yes, then reply with yes only.
Prompt — 2)

Dataset Description

What data do I need?

You'll need train.csv, test.csv and sample_submission.csv.

What should I expect the data format to be?

Each sample in the train and test set has the following information:

- The text of a tweet
- A keyword from that tweet (although this may be blank!)
- The location the tweet was sent from (may also be blank)

What am I predicting?

You are predicting whether a given tweet is about a real disaster or not. If so, predict a 1. If not, predict a 0.

Files

- train.csv — the training set
- test.csv — the test set
- sample_submission.csv — a sample submission file in the correct format

Columns

- id — a unique identifier for each tweet
- text — the text of the tweet
- location — the location the tweet was sent from (may be blank)
- keyword — a particular keyword from the tweet (may be blank)
- target — in train.csv only, this denotes whether a tweet is about a real disaster (1) or not (0)

Just understand this description; I'll provide more information in the next prompt.
ChatGPT answer —

Got it. I understand the dataset description and the files involved. Let me know when you're ready to provide more information!
Prompt — 3)

```
train_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7613 entries, 0 to 7612
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   id        7613 non-null   int64
 1   keyword   7552 non-null   object
 2   location  5080 non-null   object
 3   text      7613 non-null   object
 4   target    7613 non-null   int64
```

Did you get this?

So here I've basically provided the description of my train dataset so that it can understand the data thoroughly.
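The info() output above shows that keyword and location have missing values. Here is a minimal sketch of one way to inspect and fill them; the rows below are made up for illustration, not taken from the real dataset.

```python
import pandas as pd

# A toy frame mirroring the train.csv schema (id, keyword, location, text, target);
# keyword and location contain blanks, just as train_df.info() shows.
train_df = pd.DataFrame({
    "id": [1, 4, 5],
    "keyword": [None, "ablaze", None],
    "location": [None, None, "Canada"],
    "text": [
        "Our Deeds are the Reason of this #earthquake",
        "Crying out for more! Set me ablaze",
        "Forest fire near La Ronge Sask Canada",
    ],
    "target": [1, 0, 1],
})

print(train_df.isna().sum())  # keyword and location show missing counts

# One simple option: replace the missing metadata with empty strings
train_df = train_df.fillna({"keyword": "", "location": ""})
print(train_df.isna().sum().sum())  # no missing values remain
```

Since the model in this post only uses the text column, dropping keyword and location entirely would work just as well.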
ChatGPT answer —

Prompt — 4)

ChatGPT answer —

Prompt — 5)
Please ignore the spelling mistakes in my prompt here, sorry!

Now it gave me a step-by-step solution for the problem; first it started with the preprocessing steps —
Response —

Absolutely, let's start with the initial preprocessing steps for an NLP-based problem like this:
- ****************************
- ****************************
Would you like to proceed with these steps or dive deeper into any specific one? Let me know if you want to continue or need more details on any of these points!
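The exact preprocessing steps from that response aren't reproduced above, but a typical first cleaning pass for tweets might look like this; this is my own sketch, not the notebook's exact code:

```python
import re

def clean_tweet(text: str) -> str:
    """Basic tweet cleaning: lowercase, strip URLs and @mentions, drop punctuation."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove @mentions
    text = re.sub(r"#", " ", text)             # keep hashtag words, drop the symbol
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # drop remaining punctuation
    text = re.sub(r"\s+", " ", text).strip()   # collapse whitespace
    return text

print(clean_tweet("Our Deeds are the Reason of this #earthquake May ALLAH Forgive us all"))
# → our deeds are the reason of this earthquake may allah forgive us all
```

In the notebook this function would be applied with `train_df["text"].apply(clean_tweet)` before vectorization.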
Here I told the model to give me the solution step-wise, rather than everything at once.

Now the rest of the work is very easy: whenever you are satisfied with its response, you just need to give a simple prompt like — "that's good, now let's move to the next step".

This makes it easy to handle any error that comes up in a part of the solution; since the work moves step by step, the model can easily pinpoint the erroneous part and help you solve it.

At the end, when I was evaluating after training the model, I even asked it to improve the accuracy. I'm not a prompt engineer, so I basically passed the whole output to it and asked it to help me improve the performance.
My prompt —

It gave me many approaches, so I asked it to pick any one approach it thought might work, but to give me that one approach only.

It suggested pre-trained Transformers like BERT to improve performance.

As I mentioned, I haven't used the BERT approach (it takes time to train on 1.5 lakh rows); it was midnight and I had morning classes, so I simply used LogisticRegression and uploaded my solution. But I've included the transformer approach in my notebook for you to fine-tune and improve the accuracy.
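For reference, here is a minimal sketch of the TF-IDF + LogisticRegression route, using a tiny made-up sample in place of the real files so it runs on its own; in the notebook you would load the data with pd.read_csv("train.csv") and pd.read_csv("test.csv") instead:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative stand-ins for train.csv / test.csv
train_df = pd.DataFrame({
    "text": [
        "Forest fire near La Ronge Sask Canada",
        "Just happy about the new movie trailer",
        "Thousands receive evacuation orders after the flood",
        "I love fruits and sunny afternoons",
    ],
    "target": [1, 0, 1, 0],
})
test_df = pd.DataFrame({
    "id": [0, 2],
    "text": ["Evacuation orders after forest fire", "Sunny movie afternoons"],
})

# TF-IDF features (unigrams + bigrams) fed into a plain LogisticRegression
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(train_df["text"], train_df["target"])

# Kaggle expects an id,target submission file with a header
submission = pd.DataFrame({"id": test_df["id"], "target": model.predict(test_df["text"])})
submission.to_csv("submission.csv", index=False)
print(submission)
```

Swapping the LogisticRegression stage for a fine-tuned BERT classifier is the improvement ChatGPT suggested; the submission-building step stays the same.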
My rank, LOL 😆 (407/872) — in just two hours!
Check out my Kaggle notebook — https://www.kaggle.com/code/abhi0708/natural-language-processing-with-disaster-tweets

ChatGPT prompt link — https://chatgpt.com/share/66ec2dff-c1a4-8002-8aa0-e6159696588c

Competition link — https://www.kaggle.com/competitions/nlp-getting-started
That's it for today….

Again, sorry for the spelling mistakes in my prompts.

You don't need to be a great prompt engineer to get what you want. Just be specific about the questions that come to mind when you try to solve a problem; think of ChatGPT as a friend you're working with, who can help you with problem-solving approaches and can correct you.

You don't need to type expert prompts; just be clear and ask.
If you have any questions, I will be happy to answer them in the comments section below!

And don't forget to share this with the world to help make it a better place. Maybe your click will change someone's life.

Don't miss my upcoming updates. Join me:

https://abhijeetas8660211.medium.com/subscribe

If you've found value in this article, you'll love the content I create as a freelance writer, so please follow me so you don't miss the upcoming articles that could transform your learning and skill you up in your career.

I regularly write blogs on AI/ML and all things tech, so please read my other blogs if you have time.

If you want to know how to boost sales with AI by predicting customer behaviour, then read my other blog — https://medium.com/gitconnected/boost-sales-with-ai-predict-visitor-purchases-using-bigquery-ml-gcp-acff07a7366f
Follow me on LinkedIn: https://www.linkedin.com/in/abhijeet-singh-40a513258/

Follow me on GitHub: https://github.com/abhijeetGithu