Week 2: Book Summary Generator

Mehmet Akoguz
Apr 18, 2021

Due to the lack of necessary datasets, we had to change our project content to some extent. We will now use a dataset containing Wikipedia summaries of books from no particular genre. We will use these summaries to classify the books into groups, and then use language generation to produce new content from this data.

[Figure: Most frequent tags used in the dataset]

How the Dataset is structured:

The dataset consists of a book ID, a hash of the book, the name of the book, the writer's name, the publication date, the tag hashes with their corresponding tags (stored as a dictionary), and finally the summary, which is one continuous string. The fields are separated by tabs.
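The layout above can be sketched with a short parsing example. This is a minimal illustration using a single synthetic line; the field values (IDs, hashes, tags) are made up, not taken from the real dataset:

```python
import json

# One synthetic line in the layout described above: tab-separated fields,
# with the tags stored as a JSON dictionary of hash -> tag name.
# All values here are invented for illustration.
line = "\t".join([
    "620",                     # book ID
    "/m/0hhy",                 # hash of the book
    "Animal Farm",             # name of the book
    "George Orwell",           # writer's name
    "1945-08-17",              # publication date
    json.dumps({"/m/014dfn": "Satire", "/m/02xlf": "Fiction"}),  # tag hashes -> tags
    "Old Major, the old boar on the Manor Farm, calls a meeting.",  # summary
])

# Since the summary is one continuous string with no tabs,
# splitting on the tab character recovers all seven fields.
book_id, book_hash, title, author, pub_date, tags_json, summary = line.split("\t")
tags = json.loads(tags_json)   # dictionary of tag hash -> tag name
print(title, sorted(tags.values()))
```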

What is Transfer Learning?

Simply put, transfer learning is the reuse of a previously trained model on a new problem. It allows deep neural networks to be trained with smaller datasets. Since real-world problems rarely come with millions of data points, it makes sense to use this approach in training.

In practice, it is the process of applying the weights of a well-trained, successful model to a neural network created to solve a new problem similar to yours. It makes more sense to reuse the knowledge of a network trained on a large amount of data than to learn everything from scratch with the little data at hand.
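The idea of reusing learned weights can be shown with a deliberately tiny toy model, not an actual NLP network: a one-parameter linear model is "pretrained" on a large related task, and its weight is then used as the starting point on a small target task. All data here is synthetic:

```python
import random

random.seed(0)

def train(xs, ys, w0, steps, lr=0.1):
    # gradient descent on mean squared error for the model y ~ w * x
    w = w0
    for _ in range(steps):
        grad = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w

# "pretraining" task: lots of data, true weight 2.0
xs_big = [random.gauss(0, 1) for _ in range(1000)]
ys_big = [2.0 * x + random.gauss(0, 0.1) for x in xs_big]
w_pre = train(xs_big, ys_big, 0.0, steps=200)

# target task: similar true weight 2.1, but only 20 samples
xs_small = [random.gauss(0, 1) for _ in range(20)]
ys_small = [2.1 * x + random.gauss(0, 0.1) for x in xs_small]

# after only 3 steps, the warm start from pretrained weights is
# already far closer to the target than training from scratch
w_transfer = train(xs_small, ys_small, w_pre, steps=3)   # transfer
w_scratch = train(xs_small, ys_small, 0.0, steps=3)      # from scratch
print(abs(w_transfer - 2.1), abs(w_scratch - 2.1))
```

The same reasoning scales up: a language model pretrained on a huge corpus gives the fine-tuning run a starting point that is already close to a good solution.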

What is Perplexity?

In general, perplexity is a measurement of how well a probability model predicts a sample. In the context of Natural Language Processing, perplexity is one way to evaluate language models.

What is Natural Language Generation?

Natural Language Generation, as defined by Artificial Intelligence: Natural Language Processing Fundamentals, is the “process of producing meaningful phrases and sentences in the form of natural language.” In its essence, it automatically generates narratives that describe, summarize or explain input structured data in a human-like manner at the speed of thousands of pages per second.

Our Plan:

· We will clean our data of irregularities

· Transfer Learning: We will use a pre-trained NLP model to process our data.

· Data Augmentation: Modifying the data to suit our needs. If needed, we will merge our tags into a better system, because there is an abundance of tag varieties and many of them are similar or belong to the same parent genre.

· Using Perplexity Intuition

· New Language Generation
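To give a feel for the generation step, here is a deliberately simple word-level bigram generator. This is only a stand-in illustration on a made-up corpus; the actual plan uses a pre-trained model, not this:

```python
import random
from collections import defaultdict

def build_bigrams(text):
    # map each word to the list of words that follow it in the corpus
    words = text.split()
    model = defaultdict(list)
    for a, b in zip(words, words[1:]):
        model[a].append(b)
    return model

def generate(model, start, length=10, seed=42):
    # repeatedly sample a successor of the last emitted word
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        successors = model.get(out[-1])
        if not successors:
            break
        out.append(rng.choice(successors))
    return " ".join(out)

corpus = ("the old boar gathers the animals of the farm "
          "the animals of the farm listen to the old boar")
model = build_bigrams(corpus)
print(generate(model, "the"))
```

A neural language model does the same thing in spirit, predicting the next token from context, but with a learned probability distribution instead of raw bigram counts.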

Project Group:
