
koen-dejonghe / DinosaurIslandCharRnn.scala



Character level language model - Dinosaurus Island ¶

Welcome to Dinosaurus Island! 65 million years ago, dinosaurs existed, and in this assignment, they have returned.

You are in charge of a special task: Leading biology researchers are creating new breeds of dinosaurs and bringing them to life on earth, and your job is to give names to these dinosaurs. If a dinosaur does not like its name, it might go berserk, so choose wisely!

Luckily you're equipped with some deep learning now, and you will use it to save the day! Your assistant has collected a list of all the dinosaur names they could find, and compiled them into this dataset. (Feel free to take a look by clicking the previous link.) To create new dinosaur names, you will build a character-level language model to generate new names. Your algorithm will learn the different name patterns, and randomly generate new names. Hopefully this algorithm will keep you and your team safe from the dinosaurs' wrath!

By the time you complete this assignment, you'll be able to:

  • Store text data for processing using an RNN
  • Build a character-level text generation model using an RNN
  • Sample novel sequences in an RNN
  • Explain the vanishing/exploding gradient problem in RNNs
  • Apply gradient clipping as a solution for exploding gradients

Begin by loading in some functions that are provided for you in rnn_utils. Specifically, you have access to functions such as rnn_forward and rnn_backward, which are equivalent to those you've implemented in the previous assignment.

Table of Contents ¶

  • Packages
  • 1 - Problem Statement
  • 1.1 - Dataset and Preprocessing
  • 1.2 - Overview of the Model
  • 2 - Building Blocks of the Model
  • 2.1 - Clipping the Gradients in the Optimization Loop
  • Exercise 1 - clip
  • 2.2 - Sampling
  • Exercise 2 - sample
  • 3 - Building the Language Model
  • 3.1 - Gradient Descent
  • Exercise 3 - optimize
  • 3.2 - Training the Model
  • Exercise 4 - model
  • 4 - Writing like Shakespeare (OPTIONAL/UNGRADED)
  • 5 - References

Packages ¶

1 - Problem Statement ¶

1.1 - Dataset and Preprocessing ¶

Run the following cell to read the dataset of dinosaur names, create a list of unique characters (such as a-z), and compute the dataset and vocabulary size.

  • The characters are a-z (26 characters) plus the "\n" (or newline character).
  • In this assignment, the newline character "\n" plays a role similar to the <EOS> (or "End of sentence") token discussed in lecture; here, "\n" indicates the end of the dinosaur name rather than the end of a sentence.
  • char_to_ix : In the cell below, you'll create a Python dictionary (i.e., a hash table) to map each character to an index from 0-26.
  • ix_to_char : Then, you'll create a second Python dictionary that maps each index back to the corresponding character.
  • This will help you figure out which index corresponds to which character in the probability distribution output of the softmax layer.

1.2 - Overview of the Model ¶

Your model will have the following structure:

  • Initialize parameters
  • Run the optimization loop:
      • Forward propagation to compute the loss function
      • Backward propagation to compute the gradients with respect to the loss function
      • Clip the gradients to avoid exploding gradients
      • Update the parameters using the gradient descent update rule
  • Return the learned parameters

Figure 1 : Recurrent Neural Network, similar to what you built in the previous notebook "Building a RNN - Step by Step".

  • At each time-step, the RNN tries to predict what the next character is, given the previous characters.
  • $\mathbf{X} = (x^{\langle 1 \rangle}, x^{\langle 2 \rangle}, ..., x^{\langle T_x \rangle})$ is a list of characters from the training set.
  • $\mathbf{Y} = (y^{\langle 1 \rangle}, y^{\langle 2 \rangle}, ..., y^{\langle T_x \rangle})$ is the same list of characters but shifted one character forward.
  • At every time-step $t$, $y^{\langle t \rangle} = x^{\langle t+1 \rangle}$. The prediction at time $t$ is the same as the input at time $t + 1$.

2 - Building Blocks of the Model ¶

In this part, you will build two important blocks of the overall model:

  • Gradient clipping: to avoid exploding gradients
  • Sampling: a technique used to generate characters

You will then apply these two functions to build the model.

2.1 - Clipping the Gradients in the Optimization Loop ¶

In this section you will implement the clip function that you will call inside of your optimization loop.

Exploding gradients ¶

  • When gradients are very large, they're called "exploding gradients."
  • Exploding gradients make the training process more difficult, because the updates may be so large that they "overshoot" the optimal values during back propagation.

Recall that your overall loop structure usually consists of:

  • forward pass,
  • cost computation,
  • backward pass,
  • parameter update.

Before updating the parameters, you will perform gradient clipping to make sure that your gradients are not "exploding."

Gradient clipping ¶

In the exercise below, you will implement a function clip that takes in a dictionary of gradients and returns a clipped version of gradients, if needed.

  • There are different ways to clip gradients.
  • You will use a simple element-wise clipping procedure, in which every element of the gradient vector is clipped to fall between some range [-N, N].
  • For example, if N = 10, then the range is [-10, 10]:
  • If any component of the gradient vector is greater than 10, it is set to 10.
  • If any component of the gradient vector is less than -10, it is set to -10.
  • If any components are between -10 and 10, they keep their original values.

Figure 2 : Visualization of gradient descent with and without gradient clipping, in a case where the network is running into slight "exploding gradient" problems.

Exercise 1 - clip ¶

Return the clipped gradients of your dictionary gradients.

  • Your function takes in a maximum threshold and returns the clipped versions of the gradients.
  • You can check out numpy.clip for examples of how to clip in numpy. You will need to use the argument out = ... .
  • Using the out parameter allows you to update a variable "in-place".
  • If you don't use the out argument, the clipped result is stored in a new array, and the original gradient variables dWax, dWaa, dWya, db, dby are not updated.
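A minimal sketch of the element-wise clipping described above, assuming the gradients dictionary uses the keys listed in the last bullet:

```python
import numpy as np

def clip(gradients, maxValue):
    """Clip every gradient to the range [-maxValue, maxValue], in place."""
    for key in ['dWax', 'dWaa', 'dWya', 'db', 'dby']:
        gradient = gradients[key]
        # out=gradient writes the clipped values back into the same array,
        # so the arrays stored inside the dictionary are updated in place
        np.clip(gradient, -maxValue, maxValue, out=gradient)
    return gradients
```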

Expected values

2.2 - Sampling ¶

Now, assume that your model is trained, and you would like to generate new text (characters). The process of generation is explained in the picture below:

programming assignment dinosaur island character level language modeling

Exercise 2 - sample ¶

Implement the sample function below to sample characters.

You need to carry out 4 steps:

  • Step 1 : Input the "dummy" vector of zeros $x^{\langle 1 \rangle} = \vec{0}$. This is the default input before you've generated any characters. You also set $a^{\langle 0 \rangle} = \vec{0}$.
  • Step 2 : Run one step of forward propagation to get $a^{\langle 1 \rangle}$ and $\hat{y}^{\langle 1 \rangle}$. Here are the equations:

hidden state: $$ a^{\langle t+1 \rangle} = \tanh(W_{ax} x^{\langle t+1 \rangle } + W_{aa} a^{\langle t \rangle } + b)\tag{1}$$

activation: $$ z^{\langle t + 1 \rangle } = W_{ya} a^{\langle t + 1 \rangle } + b_y \tag{2}$$

prediction: $$ \hat{y}^{\langle t+1 \rangle } = softmax(z^{\langle t + 1 \rangle })\tag{3}$$

  • Note that $\hat{y}^{\langle t+1 \rangle }$ is a (softmax) probability vector (its entries are between 0 and 1 and sum to 1).
  • $\hat{y}^{\langle t+1 \rangle}_i$ represents the probability that the character indexed by "i" is the next character.
  • A softmax() function is provided for you to use.

Additional Hints ¶

  • $x^{\langle 1 \rangle}$ is x in the code. When creating the one-hot vector, make a numpy array of zeros, with the number of rows equal to the number of unique characters, and the number of columns equal to one. It's a 2D and not a 1D array.
  • $a^{\langle 0 \rangle}$ is a_prev in the code. It is a numpy array of zeros, where the number of rows is $n_{a}$, and the number of columns is 1. It is a 2D array as well. $n_{a}$ is retrieved by getting the number of columns in $W_{aa}$ (the numbers need to match in order for the matrix multiplication $W_{aa}a^{\langle t \rangle}$ to work).
  • Official documentation for numpy.dot and numpy.tanh
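For example, a sketch of the two initializations the hints describe; the concrete values of vocab_size and n_a below are assumptions (in the notebook, n_a comes from the shape of Waa):

```python
import numpy as np

vocab_size = 27   # assumed: 26 letters plus "\n"
n_a = 50          # assumed hidden-state size; read from Waa's shape in the notebook
x = np.zeros((vocab_size, 1))    # x⟨1⟩: 2D column vector of zeros, one row per character
a_prev = np.zeros((n_a, 1))      # a⟨0⟩: 2D column vector of zeros for the initial hidden state
```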

Step 3 : Sampling:

  • Now that you have $y^{\langle t+1 \rangle}$, you want to select the next letter in the dinosaur name. If you select the most probable, the model will always generate the same result given a starting letter. To make the results more interesting, use np.random.choice to select a next letter that is likely, but not always the same.
  • Pick the next character's index according to the probability distribution specified by $\hat{y}^{\langle t+1 \rangle }$.
  • This means that if $\hat{y}^{\langle t+1 \rangle }_i = 0.16$, you will pick the index "i" with 16% probability.

Use np.random.choice.

Example of how to use np.random.choice() :
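(A minimal sketch; probs here is a stand-in for your probability vector, chosen to match the distribution described just below.)

```python
import numpy as np

np.random.seed(0)  # seeding only to make the example reproducible
probs = np.array([0.1, 0.0, 0.7, 0.2])
# sample an index in range(len(probs)) according to the probabilities in probs
idx = np.random.choice(range(len(probs)), p=probs.ravel())
```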

This means that you will pick the index ( idx ) according to the distribution:

$P(index = 0) = 0.1, P(index = 1) = 0.0, P(index = 2) = 0.7, P(index = 3) = 0.2$.

Note that the value passed to the argument p must be a 1D vector.

  • Also notice that $\hat{y}^{\langle t+1 \rangle}$, which is y in the code, is a 2D array.
  • Also notice that while the first argument to np.random.choice in your implementation is just an ordered list [0, 1, ..., vocab_len - 1], it is not appropriate to use char_to_ix.values() . A Python dictionary's .values() returns its values in the order they were added to the dictionary, and the grader may have a different order when it runs your routine than when you run it in your notebook.
  • Documentation for the built-in Python function range

Docs for numpy.ravel , which takes a multi-dimensional array and returns its contents inside of a 1D vector.

Note that append is an "in-place" operation, which means the changes made by the method will remain after the call completes. In other words, don't do this:
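For example, with a hypothetical indices list:

```python
indices = []
idx = 2  # a sampled index, for illustration
# Wrong: list.append returns None, so this would overwrite the list with None
# indices = indices.append(idx)
# Right: append modifies the list in place; don't reassign its result
indices.append(idx)
```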

Step 4 : Update to $x^{\langle t \rangle }$

  • The last step to implement in sample() is to update the variable x , which currently stores $x^{\langle t \rangle }$, with the value of $x^{\langle t + 1 \rangle }$.
  • You will represent $x^{\langle t + 1 \rangle }$ by creating a one-hot vector corresponding to the character that you have chosen as your prediction.
  • You will then forward propagate $x^{\langle t + 1 \rangle }$ in Step 1 and keep repeating the process until you get a "\n" character, indicating that you have reached the end of the dinosaur name.
  • In order to reset x before setting it to the new one-hot vector, you'll want to set all the values to zero, as sketched below.
  • You can either create a new numpy array: numpy.zeros
  • Or fill all values with a single number: numpy.ndarray.fill
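A sketch of this update, assuming idx holds the index sampled in Step 3 and vocab_size the number of characters:

```python
import numpy as np

vocab_size = 27  # assumed: 26 letters plus "\n"
idx = 2          # assumed: the index sampled in Step 3
# Reset x to all zeros, then mark the sampled character: this one-hot
# vector becomes x⟨t+1⟩, the input to the next forward step
x = np.zeros((vocab_size, 1))
x[idx] = 1
```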

Expected output

What you should remember :

  • Very large, "exploding" gradients can make updates so large that they "overshoot" the optimal values during backpropagation, making training difficult.
  • Clip gradients before updating the parameters to avoid exploding gradients.
  • Sampling is a technique you can use to pick the index of the next character according to a probability distribution.
  • To begin character-level sampling:
      • Input a "dummy" vector of zeros as a default input.
      • Run one step of forward propagation to get $a^{\langle 1 \rangle}$ (your first character) and $\hat{y}^{\langle 1 \rangle}$ (probability distribution for the following character).
      • When sampling, avoid generating the same result each time given the starting letter (and make your names more interesting!) by using np.random.choice .

3 - Building the Language Model ¶

It's time to build the character-level language model for text generation!

3.1 - Gradient Descent ¶

In this section you will implement a function performing one step of stochastic gradient descent (with clipped gradients). You'll go through the training examples one at a time, so the optimization algorithm will be stochastic gradient descent.

As a reminder, here are the steps of a common optimization loop for an RNN:

  • Forward propagate through the RNN to compute the loss
  • Backward propagate through time to compute the gradients of the loss with respect to the parameters
  • Clip the gradients
  • Update the parameters using gradient descent

Exercise 3 - optimize ¶

Implement the optimization process (one step of stochastic gradient descent).

The following functions are provided:

Recall that you previously implemented the clip function:
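Their signatures look roughly like this (a sketch; the actual bodies live in rnn_utils and in your earlier work):

```python
def rnn_forward(X, Y, a_prev, parameters):
    """Forward propagation through the RNN; returns the cross-entropy loss
    and a cache of values needed for backpropagation."""
    ...
    return loss, cache

def rnn_backward(X, Y, parameters, cache):
    """Backpropagation through time; returns the gradients of the loss with
    respect to the parameters, plus the hidden states."""
    ...
    return gradients, a

def update_parameters(parameters, gradients, learning_rate):
    """Update the parameters using the gradient descent update rule."""
    ...
    return parameters

def clip(gradients, maxValue):
    """Clip the gradients' values between -maxValue and maxValue."""
    ...
    return gradients
```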

Parameters ¶

  • Note that the weights and biases inside the parameters dictionary are updated by the optimization, even though parameters is not one of the returned values of the optimize function. The parameters dictionary is passed by reference into the function, so changes made to it inside the function persist when it is accessed outside of the function.
  • Python dictionaries and lists are "passed by reference", which means that if you pass a dictionary into a function and modify it within the function, you are changing that same dictionary (not a copy).
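Putting the four loop steps together, one optimization step might look like the following sketch (the clipping threshold of 5 is an assumption, not a prescribed value):

```python
def optimize(X, Y, a_prev, parameters, learning_rate=0.01):
    # Forward propagate through time to compute the loss
    loss, cache = rnn_forward(X, Y, a_prev, parameters)
    # Backpropagate through time to compute the gradients
    gradients, a = rnn_backward(X, Y, parameters, cache)
    # Clip the gradients before the update to avoid exploding gradients
    gradients = clip(gradients, 5)
    # Update the parameters dictionary in place (it is passed by reference)
    parameters = update_parameters(parameters, gradients, learning_rate)
    # Return the loss, the gradients, and the last hidden state
    return loss, gradients, a[len(X) - 1]
```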

3.2 - Training the Model ¶

  • Given the dataset of dinosaur names, you'll use each line of the dataset (one name) as one training example.
  • Every 2000 steps of stochastic gradient descent, you will sample several randomly chosen names to see how the algorithm is doing.

Exercise 4 - model ¶

Implement model() .

When examples[index] contains one dinosaur name (string), to create an example (X, Y), you can use this:
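A minimal sketch, matching the step-by-step walkthrough below:

```python
# X: None (a zero-vector flag for rnn_forward) followed by the character indices of the name
X = [None] + [char_to_ix[ch] for ch in examples[index]]
# Y: the same indices shifted one step forward, ending with the newline index
Y = X[1:] + [char_to_ix["\n"]]
```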

Set the index idx into the list of examples ¶

  • Using the for-loop, walk through the shuffled list of dinosaur names in the list examples .
  • For example, if there are n_e examples and the for-loop increments the index past n_e, think of how you would make the index cycle back to 0, so that you can continue feeding the examples into the model when j is n_e, n_e + 1, etc.
  • Hint: n_e + 1 divided by n_e is one with a remainder of 1.
  • % is the modulo operator in Python. See the one-line sketch below.
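For example (j being the loop counter):

```python
idx = j % len(examples)   # wraps back to 0 once j reaches the number of examples
```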

Extract a single example from the list of examples ¶

  • single_example : use the idx index that you set previously to get one word from the list of examples.

Convert a string into a list of characters: single_example_chars ¶

  • single_example_chars : A string is a sequence of characters, so you can iterate over it character by character.
  • You can use a list comprehension (recommended over for-loops) to generate a list of characters. For example:

```python
s = 'I love learning'
list_of_chars = [c for c in s]   # one list entry per character
print(list_of_chars)
```

  • For more on list comprehensions, see the Python documentation.

Convert list of characters to a list of integers: single_example_ix ¶

  • Create a list that contains the index numbers associated with each character.
  • Use the dictionary char_to_ix
  • You can combine this with the list comprehension that is used to get a list of characters from a string.

Create the list of input characters: X ¶

  • rnn_forward uses the None value as a flag to set the input vector to a zero-vector.
  • Prepend the list [None] in front of the list of input characters.
  • There is more than one way to prepend a value to a list. One way is to add two lists together: ['a'] + ['b'] .

Get the integer representation of the newline character ix_newline ¶

  • ix_newline : The newline character signals the end of the dinosaur name.
  • Get the integer representation of the newline character '\n' .
  • Use char_to_ix .

Set the list of labels (integer representation of the characters): Y ¶

  • The goal is to train the RNN to predict the next letter in the name, so the labels are the list of characters that are one time-step ahead of the characters in the input X .
  • For example, Y[0] contains the same value as X[1] .
  • The RNN should predict a newline at the last letter, so append the integer representation of the newline character ( ix_newline ) to the end of Y .
  • Note that append is an in-place operation.
  • It might be easier for you to add two lists together.

When you run the following cell, you should observe your model outputting random-looking characters at the first iteration. After a few thousand iterations, your model should learn to generate reasonable-looking names.

Conclusion ¶

You can see that your algorithm has started to generate plausible dinosaur names towards the end of training. At first, it was generating random characters, but towards the end you could begin to see dinosaur names with cool endings. Feel free to run the algorithm even longer and play with hyperparameters to see if you can get even better results! Our implementation generated some really cool names like maconucon , marloralus and macingsersaurus . Your model hopefully also learned that dinosaur names tend to end in saurus , don , aura , tor , etc.

If your model generates some non-cool names, don't blame the model entirely -- not all actual dinosaur names sound cool. (For example, dromaeosauroides is an actual dinosaur name and is in the training set.) But this model should give you a set of candidates from which you can pick the coolest!

This assignment used a relatively small dataset, so that you're able to train an RNN quickly on a CPU. Training a model of the English language requires a much bigger dataset, and usually much more computation, and could run for many hours on GPUs. We ran our dinosaur name model for quite some time, and so far our favorite name is the great, the fierce, the undefeated: Mangosaurus !


Congratulations! ¶

You've finished the graded portion of this notebook and created a working language model! Awesome job.

By now, you've:

  • Stored text data for processing using an RNN
  • Built a character-level text generation model
  • Explored the vanishing/exploding gradient problem in RNNs
  • Applied gradient clipping to avoid exploding gradients

You've also hopefully generated some dinosaur names that are cool enough to please you and also avoid the wrath of the dinosaurs. If you had fun with the assignment, be sure not to miss the ungraded portion, where you'll be able to generate poetry like the Bard Himself. Good luck and have fun!

4 - Writing like Shakespeare (OPTIONAL/UNGRADED) ¶

The rest of this notebook is optional and is not graded, but it's quite fun and informative, so you're highly encouraged to try it out!

A similar task to character-level text generation (but more complicated) is generating Shakespearean poems. Instead of learning from a dataset of dinosaur names, you can use a collection of Shakespearean poems. Using LSTM cells, you can learn longer-term dependencies that span many characters in the text -- e.g., a character appearing somewhere in a sequence can influence what a different character should be, much later in the sequence. These long-term dependencies were less important with dinosaur names, since the names were quite short.


Below, you can implement a Shakespeare poem generator with Keras. Run the following cell to load the required packages and models. This may take a few minutes.

To save you some time, a model has already been trained for ~1000 epochs on a collection of Shakespearean poems called "The Sonnets" ( shakespeare.txt ).

Let's train the model for one more epoch. When it finishes training for an epoch (this will also take a few minutes), you can run generate_output , which will prompt you for an input (< 40 characters). The poem will start with your sentence, and your RNN Shakespeare will complete the rest of the poem for you! For example, try "Forsooth this maketh no sense" (without the quotation marks!). Depending on whether you include the space at the end, your results might also differ, so try it both ways, and try other inputs as well.

Congratulations on finishing this notebook! ¶

The RNN Shakespeare model is very similar to the one you built for dinosaur names. The only major differences are:

  • LSTMs instead of the basic RNN to capture longer-range dependencies
  • The model is a deeper, stacked LSTM model (2 layer)
  • Using Keras instead of Python to simplify the code

5 - References ¶

  • This exercise took inspiration from Andrej Karpathy's implementation: https://gist.github.com/karpathy/d4dee566867f8291f086 . To learn more about text generation, also check out Karpathy's blog post .


