What is ChatGPT?
ChatGPT is a conversational AI system from OpenAI. It builds on OpenAI's
large language models such as GPT-3, which have made AI-generated text and
conversational AI increasingly realistic, and often hard to tell apart from
something written by a human. ChatGPT is the newest chatbot from OpenAI, and
it demonstrated that chatbots can interact in a very conversational way, much
as humans do.
ChatGPT uses a model from the GPT-3 family (OpenAI describes it as fine-tuned
from a model in the GPT-3.5 series) to create plausible dialogue from a brief
writing prompt. Given an appropriate prompt, it can create stories, tackle
challenging problems, and clarify ideas.
How was ChatGPT trained?
According to OpenAI, ChatGPT was trained using Reinforcement Learning from
Human Feedback (RLHF), with a few minor variations in the data collection
setup. An initial model was first trained with supervised fine-tuning: human
AI trainers wrote conversations in which they played both sides, the user and
the AI assistant. The trainers were given access to model-written suggestions
to help them compose their responses.
To build a reward model for reinforcement learning, comparison data were
needed: two or more model replies ranked by quality. This data came from the
conversations that AI trainers had with the chatbot. A model-written message
was chosen at random, several alternative completions were sampled for it, and
AI trainers were asked to rank them. Finally, Proximal Policy Optimization
(PPO), a reinforcement learning algorithm introduced by OpenAI in 2017, was
used to fine-tune the model against these reward models. OpenAI describes PPO
as much simpler to implement and tune than comparable methods. This process
was repeated for several iterations, as shown in the figure for this section.
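To make the two steps above a little more concrete, here is a minimal sketch in Python. It is not OpenAI's code, and OpenAI's description does not spell out the exact formulas. The sketch assumes a pairwise logistic ranking loss for the reward model, which is a common choice when training from ranked comparisons, and the standard clipped surrogate objective from PPO. The function names, the example scores, and the clipping value `epsilon=0.2` are illustrative assumptions.

```python
import math
from itertools import combinations


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_ranking_loss(rewards_best_to_worst):
    """Average pairwise logistic loss over reward scores for ranked replies.

    rewards_best_to_worst: scalar scores from a (hypothetical) reward model,
    listed in the order the human trainers ranked the replies, best first.
    The loss is small when the scores agree with the human ranking.
    """
    # combinations() keeps list order, so the first item of each pair
    # is always the reply the trainers preferred.
    pairs = list(combinations(rewards_best_to_worst, 2))
    if not pairs:
        raise ValueError("need at least two ranked replies")
    total = 0.0
    for preferred, rejected in pairs:
        # -log(sigmoid(preferred - rejected)) == softplus(rejected - preferred)
        total += softplus(rejected - preferred)
    return total / len(pairs)


def ppo_clipped_objective(ratio, advantage, epsilon=0.2):
    """PPO clipped surrogate objective for a single sample.

    ratio: new-policy probability divided by old-policy probability.
    advantage: how much better the action was than expected
               (in RLHF, derived from the reward model's score).
    epsilon: clipping range; 0.2 is an assumed, commonly used value.
    """
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Taking the minimum removes any incentive to move the policy
    # far away from the old one in a single update.
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Scores that agree with the human ranking give a lower loss
    # than scores that contradict it.
    print(pairwise_ranking_loss([2.0, 1.0, 0.0]))
    print(pairwise_ranking_loss([0.0, 1.0, 2.0]))
    # A large policy change with positive advantage is clipped.
    print(ppo_clipped_objective(ratio=1.5, advantage=1.0))
```

In a real system the reward scores would come from a neural network and both quantities would be computed over batches with automatic differentiation. The plain-float version here only shows the shape of the calculation.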