GPT Fine-Tuning
Are you having difficulty getting your GPT models to return the exact kind of outputs you're expecting? Have you already tried prompt engineering, yet the responses are still not quite what you want?
This is when you might want to try fine-tuning your GPT model. This involves performing a small amount of additional training on your GPT model, using your own data containing expected input and output pairs. With this, the GPT model will be more likely to give the expected outputs.
Quick Start
Before you start, make sure the NuPIC Inference Server and Training Module are up and running, and the Python environment is set up. To simplify things, we'll assume for this example that the Inference Server, Training Module, and Python clients are all installed on the same machine.
For fine-tuning GPT models, we recommend a GPU with sufficient memory. As a general rule of thumb, you'll want a GPU with at least twice the memory requirements of your model.
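The rule of thumb above can be turned into a quick back-of-the-envelope estimate. The sketch below assumes the weights are stored in 16-bit precision (2 bytes per parameter) and uses a round 2 billion parameters as a stand-in for a 2B-class model. Both numbers are illustrative assumptions rather than figures from NuPIC, and real memory usage also depends on batch size, sequence length, and whether LoRA is used.

```python
def model_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed just to hold the model weights, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


def recommended_gpu_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Apply the 'at least twice the model's memory requirements' rule of thumb."""
    return 2 * model_memory_gb(num_params, bytes_per_param)


if __name__ == "__main__":
    # Assumption: ~2e9 parameters stored in 16-bit precision (illustrative only).
    params = 2e9
    print(f"weights: ~{model_memory_gb(params):.1f} GB")
    print(f"suggested GPU memory: ~{recommended_gpu_memory_gb(params):.1f} GB or more")
```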
Let's start by navigating to the directory containing the GPT fine-tuning example.
cd nupic.examples/examples/gpt_finetuning
In this example, suppose you want to have a GPT model that's really good at producing summaries of news articles. With this model, you want to simply provide it with some prose, and the model should return a summary, without any additional prompt engineering. Here we'll do so by fine-tuning the lightweight Gemma 2B model.
We have the following files and directories in this folder:
/gpt_finetuning/
├── datasets
│ ├── cc_news_test.csv # For validating the model during training
│ └── cc_news_train.csv # Training dataset containing input and output example pairs
└── finetune.sh # Use this to start fine-tuning
Let's make finetune.sh executable and start fine-tuning right away!
chmod +x finetune.sh
./finetune.sh
This sends a fine-tuning job to the Training Module. When fine-tuning is completed, the tuned model is saved to the indicated directory. You should see the following contents:
finetuned_models/
└── gemma-2b-it-tuned-BFB4A28B48C246DA837364D676D15C7C
├── 1
│ ├── config.json
│ ├── generation_config.json
│ ├── model-00001-of-00003.safetensors
│ ├── model-00002-of-00003.safetensors
│ ├── model-00003-of-00003.safetensors
│ ├── model.py
│ ├── model.safetensors.index.json
│ ├── special_tokens_map.json
│ ├── tokenizer_config.json
│ └── tokenizer.json
└── config.pbtxt
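If you'd like to confirm that the job produced a complete model folder, a small check along these lines may help. This is a minimal sketch: `missing_model_entries` is a hypothetical helper (not part of the NuPIC client), and the entries it looks for are simply a few taken from the listing above, on the assumption that they are always present.

```python
from pathlib import Path


def missing_model_entries(model_dir: str, version: str = "1") -> list[str]:
    """Return the expected entries that are absent from a tuned model folder."""
    root = Path(model_dir)
    # Assumption: these entries, taken from the listing above, are always produced.
    expected = [
        root / "config.pbtxt",
        root / version / "config.json",
        root / version / "model.py",
    ]
    return [str(path.relative_to(root)) for path in expected if not path.exists()]


if __name__ == "__main__":
    # Replace the path with the folder name printed by your own fine-tuning run.
    missing = missing_model_entries("finetuned_models/your-tuned-model-folder")
    print("missing entries:", missing or "none")
```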
To deploy this tuned model, copy the tuned model folder into the Inference Server.
cp -r finetuned_models/gemma-2b-it-tuned-BFB4A28B48C246DA837364D676D15C7C nupic/inference/models
We're almost ready to perform inference. Let's make some minor code changes to register our new model in the prompt formatter utilities. Open nupic/nupic.examples/src/nupic/client/utils.py in a text editor of your choice. We want to add the tuned model to model_naming_mapping and model_prompt_formatter_mapping. The tuned model will use the same prompt formatter as the original base model:
model_naming_mapping = {
    "llama2": "llama2.chat.7b",
    "llama2-streaming": "llama2.chat.7b.streaming",
    "llama3": "llama3.chat.8b",
    "llama3-streaming": "llama3.chat.8b.streaming",
    "zephyr": "zephyr.beta.7b",
    "zephyr-streaming": "zephyr.beta.7b.streaming",
    "gemma": "gemma.it.2b",
    "gemma-streaming": "gemma.it.2b.streaming",
    "gemma2": "gemma2.it.9b",
    "gemma2-streaming": "gemma2.it.9b.streaming",
    "nupic-corti": "nupic-gpt.7b-corti.v0",
    "nupic-corti-streaming": "nupic-gpt.7b-corti.streaming.v0",
    "nupic-dendi": "nupic-gpt.7b-dendi.v0",
    "gemma-tuned": "gemma-2b-it-tuned-BFB4A28B48C246DA837364D676D15C7C",  # <-- add this entry
}
model_prompt_formatter_mapping = {
    "llama2.chat.7b": Llama2PromptFormatter,
    "llama3.chat.8b": Llama3PromptFormatter,
    "zephyr.beta.7b": ZephyrPromptFormatter,
    "gemma.it.2b": GemmaPromptFormatter,
    "gemma2.it.9b": GemmaPromptFormatter,
    "nupic-gpt.7b-corti.v0": NupicGPTPromptFormatter,
    "nupic-gpt.7b-dendi.v0": NupicGPTPromptFormatter,
    "gemma-2b-it-tuned-BFB4A28B48C246DA837364D676D15C7C": GemmaPromptFormatter,  # <-- add this entry
}
Now let's navigate to and modify nupic/nupic.examples/examples/gpt/gpt.py to test out the summarisation capabilities of the tuned model. We'll pass an excerpt from the United States Declaration of Independence as input. Note that we're passing just the excerpt, without any further instructions to the model.
message = (
"""We hold these truths to be self-evident, that all men are created equal, that
they are endowed by their Creator with certain unalienable Rights, that among these are
Life, Liberty and the pursuit of Happiness.--That to secure these rights, Governments are
instituted among Men, deriving their just powers from the consent of the governed, --That
whenever any Form of Government becomes destructive of these ends, it is the Right of the
People to alter or to abolish it, and to institute new Government, laying its foundation
on such principles and organizing its powers in such form, as to them shall seem most
likely to effect their Safety and Happiness. Prudence, indeed, will dictate that
Governments long established should not be changed for light and transient causes; and
accordingly all experience hath shewn, that mankind are more disposed to suffer, while
evils are sufferable, than to right themselves by abolishing the forms to which they are
accustomed. But when a long train of abuses and usurpations, pursuing invariably the same
Object evinces a design to reduce them under absolute Despotism, it is their right, it is
their duty, to throw off such Government, and to provide new Guards for their future
security.--Such has been the patient sufferance of these Colonies; and such is now the
necessity which constrains them to alter their former Systems of Government. The history
of the present King of Great Britain is a history of repeated injuries and usurpations,
all having in direct object the establishment of an absolute Tyranny over these States.
To prove this, let Facts be submitted to a candid world."""
)
Let's run the inference! Remember to specify that you want to use the tuned model:
python gpt.py -m gemma-tuned
If all goes well, the model will know that you want a summary (because that's how we tuned it).
Expected output:
user
You are a helpful AI assistant.
We hold these truths to be self-evident, that all men are created equal,
...
To prove this, let Facts be submitted to a candid world.
model
* All men are created equal.
* They are endowed by their Creator with certain unalienable Rights.
* These Rights include Life, Liberty and the pursuit of Happiness.
* Governments are instituted among Men, deriving their just powers from the consent of the governed.
* Whenever any Form of Government becomes destructive of these ends, it is the Right of the People to alter or to abolish it.
That's all there is to it! GPT fine-tuning isn't limited to just summarization tasks. It's generally useful if you have at least 1,000 examples of input-output pairs that you expect from the model.
Still curious? Let's dig a little deeper to see under the hood.
In More Detail
Let's start by examining the dataset. We can look at either cc_news_train.csv or cc_news_test.csv. The headers and first row should look like this:
Label | Text |
---|---|
Android bug causes text messages to show up in Google Search | "An unusual Android bug found by a Reddit user is causing a lot of people to scratch their heads: When typing ""the1975..com"" (note the extra dot) the poster found all of his recent text messages displayed through Google instead of the content he was searching for. Other Reddit users chimed in saying that they were getting the same results, with some even experiencing it when they performed a search for ""Vizela viagens"" (a travel agency in one Reddit user's home town), and the glitch also appeared with variant spellings too. Reddit users with a wide variety of Android devices said they were affected as well. Particularly interesting is one comment stating that if you type ""my text messages"" into the search bar you receive the same result. What this means for how Google may be caching text messages is unknown. Is it a security risk? Google has yet to issue a statement regarding the bug, and without some acknowledgement of the issue and its scope it's difficult to know if there is any real risk of data exposure. More likely is the possibility that the bug is connected to Google Assistant, as pointed out by an article on MSPoweruser: ""upon testing, I found out that any Android device with Google Assistant has this issue. It isn't certain if this is a deliberate action by Google or just a weird glitch that [lets] Google access the messages stored on the device."" SEE: Mobile device computing policy (Tech Pro Research) Google Assistant has been able to read text messages aloud for some time, raising the distinct possibility that Google simply overlooked a few precise search terms that would generate the same result. Those concerned about the privacy and security of their text messages can protect themselves, and the process is easy. Open the Settings app, tap on the Apps option, and then revoke SMS permissions from the Google app. 
There's a good chance this isn't anything more than a simple bug, but taking precautions to protect your personal messages is good to do just in case. The big takeaways for tech leaders: Android users are reportedly finding their text messages in Google Search results when typing specific things like ""the1975..com"" or ""Vizela viagens."" It's not known at this point if there is a security risk inherent in this glitch. It's more likely that this is a bug tied to Google Assistant's ability to read text messages aloud. If you're still concerned about device security you can disable Google's SMS permissions in the Android Settings app. Learn about the latest exploits and bugs by subscribing to our Cybersecurity Insider newsletter. Subscribe Also see" |
... | ... |
The datasets in this example are news articles from Common Crawl, which is a repository of open web-scraped datasets.
For GPT fine-tuning, it is necessary to have the Label and Text headers to denote expected outputs and inputs, respectively. In the case of our examples, Text contains the original news article, and Label contains the summary that we'd like the model to produce.
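Before sending your own data to the Training Module, it can be worth checking that the file has the two required headers. The following is a minimal sketch using only Python's standard csv module; `count_valid_rows` is a hypothetical helper written for this illustration, and it assumes a standard comma-separated file in which quotes inside a field are doubled, as in the sample row above.

```python
import csv
import io

REQUIRED_HEADERS = ("Label", "Text")


def count_valid_rows(csv_file) -> int:
    """Check for the Label and Text headers and count rows where both are non-empty."""
    reader = csv.DictReader(csv_file)
    headers = reader.fieldnames or []
    missing = [name for name in REQUIRED_HEADERS if name not in headers]
    if missing:
        raise ValueError(f"missing required header(s): {missing}")
    # A row is usable only if it has both an expected output and an input.
    return sum(
        1
        for row in reader
        if (row["Label"] or "").strip() and (row["Text"] or "").strip()
    )


if __name__ == "__main__":
    # Tiny in-memory stand-in for a real dataset file.
    sample = io.StringIO(
        "Label,Text\n"
        'Short headline,"An article with ""quoted"" words, commas,\nand a line break."\n'
    )
    print(count_valid_rows(sample))
```

For a file on disk, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to the function, so that line breaks inside quoted articles are handled by the csv module.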
Now let's look at how the actual fine-tuning is done. Opening finetune.sh, we see that the script is calling the regular nupic_train CLI, which in turn communicates with the Training Module. The regular CLI arguments are available, but we'll highlight the ones that are important for GPT fine-tuning:
python -m nupic.client.nupic_train \
--model google/gemma-2b-it \
--seed $seed \
--url http://localhost:8321 \
--batch_size 1 \
--epochs 5 \
--task_type gpt_sft \
--hf_access_token "your token here" \
--train_path $train_dataset \
--test_path $test_dataset \
--use_lora \
--input_prompt "$in_prompt" \
--response_prompt "$rsp_prompt"
batch_size: use small batch sizes, as GPT models already require large amounts of memory.
epochs: start with a small number of epochs, e.g., 1 to 5. If not using LoRA (see below), fine-tuning for too many epochs can cause a model to forget what it was originally trained on!
task_type: the gpt_sft string tells the Training Module that we want to fine-tune a GPT model.
use_lora: this argument indicates that instead of tuning the whole GPT model, which is computationally costly, we want to fine-tune just a small adaptor module that can modify the behavior of the model.
input_prompt and response_prompt: these prepend special tokens and/or prompts to the input and output examples in the dataset. This helps the model "recognise" that a given string is an input or output:
in_prompt="<start_of_turn>user\n Create a headline for the following news article:\n"
rsp_prompt="<end_of_turn>\n<start_of_turn>model\n"
Note the special tokens <start_of_turn> and <end_of_turn>. These are specific to each model family, and are typically required to help models recognise conversational turns. Please pay careful attention to the placement of these tokens for fine-tuning. For some models, the <end_of_turn> token (or its equivalent) comes at the end of the user's turn/start of the model's turn, such as in the case above for Gemma. However, in other models (such as Llama 2), the same token is instead required only at the end of the model's response. For more information, please consult the official documentation for each model.
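To make the token placement concrete, here is a minimal sketch of how one training example might be assembled from a dataset row. The concatenation order (input prompt, Text, response prompt, Label) is an assumption made for illustration; the Training Module's actual formatting logic is not shown in this guide, and `build_training_text` is a hypothetical helper, not part of the NuPIC client.

```python
IN_PROMPT = "<start_of_turn>user\n Create a headline for the following news article:\n"
RSP_PROMPT = "<end_of_turn>\n<start_of_turn>model\n"


def build_training_text(text: str, label: str,
                        in_prompt: str = IN_PROMPT,
                        rsp_prompt: str = RSP_PROMPT) -> str:
    """Hypothetical helper: wrap one (Text, Label) pair with the two prompts."""
    # Assumed order: input prompt, article, response prompt, expected output.
    return f"{in_prompt}{text}{rsp_prompt}{label}"


if __name__ == "__main__":
    example = build_training_text(
        text="An unusual Android bug found by a Reddit user ...",
        label="Android bug causes text messages to show up in Google Search",
    )
    print(example)
```

With the Gemma-style prompts above, <end_of_turn> lands between the article and the expected output, closing the user's turn before the model's turn begins. For a model family with different conventions, you would change the two prompt strings rather than the helper.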