Tuesday, April 7, 2020

Few-shot learning...

A metric is the key to learning.  But how do you define a metric in a non-numeric space?

The challenge in symbolic learning is measuring the relationship between categories.  The current style in the deep learning community is to convert the categories to numbers and then train models on the numbers.

So the latest and greatest approach in the deep learning community is to create an embedding.  The embedding maps the categories to a numeric space.  Ideally the numeric space is constructed in a manner that makes full use of the space.  Imagine a square where all the data falls into the top right corner; that is not an efficient use of the space.  Similarly, the embedding tries to create a space that is utilized efficiently, spreading the data throughout it.
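To make that concrete for myself, here is a minimal sketch in PyTorch (the categories, the dimension and all the names are my own hypothetical choices, not from any paper): an embedding table assigns each symbolic category a learnable vector, and once categories are vectors, a metric is just a distance between them.

import torch
import torch.nn.functional as F

CATEGORIES = ["cat", "dog", "car"]            # symbolic, non-numeric labels
cat_to_idx = {c: i for i, c in enumerate(CATEGORIES)}

# An embedding table: each category gets a learnable vector.
embedding = torch.nn.Embedding(num_embeddings=len(CATEGORIES), embedding_dim=8)

idx = torch.tensor([cat_to_idx["cat"], cat_to_idx["dog"]])
vecs = F.normalize(embedding(idx), dim=-1)    # unit-normalize: spread points on the sphere

# Once the categories are numeric, a metric is just a distance between vectors.
dist = torch.norm(vecs[0] - vecs[1])
print(dist.item())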

Well, this means that the embedding creates a space relative to the data set it is trained on.  Hence, the distribution of the training data is critical in determining the space.  When data from a new distribution appears, it will not map well into the embedding.

Hence, in the few-shot learning world, where very little data is available for the training phase and it is assumed that new distributions will arrive, it is not enough to just train a classifier, or to search in the embedded space; it is necessary to recreate the embedding with the new data, otherwise the classifier will be trapped looking only in one corner of the space.
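The prototypical-networks paper linked below builds exactly this kind of metric in an embedded space.  A minimal sketch of its core idea, with a stand-in encoder and a toy episode of my own invention (not the paper's architecture or data): each class prototype is the mean embedding of its few support examples, and queries are classified by distance to the nearest prototype.

import torch

def prototypes(support_emb, support_labels, n_classes):
    # Mean embedding per class = the class prototype.
    return torch.stack([support_emb[support_labels == c].mean(dim=0)
                        for c in range(n_classes)])

def classify(query_emb, protos):
    # Negative squared Euclidean distance -> class probabilities.
    dists = torch.cdist(query_emb, protos) ** 2
    return (-dists).softmax(dim=-1)

# Toy episode: 3 classes ("3-way"), 2 support examples each ("2-shot").
encoder = torch.nn.Linear(16, 8)                  # stand-in embedding network
support_x = torch.randn(6, 16)
support_y = torch.tensor([0, 0, 1, 1, 2, 2])
query_x = torch.randn(4, 16)

protos = prototypes(encoder(support_x), support_y, n_classes=3)
probs = classify(encoder(query_x), protos)
print(probs.argmax(dim=-1))                       # predicted class per query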

So now I understand what Bengio is saying: you learn the embedding in a broad space, then you use attention to focus on only the part of that space that is relevant to the specific problem.
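A hedged sketch of my reading (this is my own toy construction, not a specific published architecture): keep the broad embedding fixed, and let a task-conditioned attention vector re-weight its dimensions, so that distances are measured only in the task-relevant part of the space.

import torch

dim = 8
attn_net = torch.nn.Sequential(                   # maps a task summary to
    torch.nn.Linear(dim, dim), torch.nn.Sigmoid())  # per-dimension weights

support_emb = torch.randn(6, dim)                 # broad, pre-trained embeddings
query_emb = torch.randn(4, dim)

task_summary = support_emb.mean(dim=0)            # crude summary of the episode
weights = attn_net(task_summary)                  # in (0,1): a soft dimension mask

# Distances are computed only along the attended dimensions of the broad space.
dists = torch.cdist(query_emb * weights, support_emb * weights)
print(dists.shape)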

Now if the embedding is too broad, it didn't learn anything; if the embedding is too narrow, it merely learnt the training distribution.  I get that.


Do they do that?

http://papers.nips.cc/paper/6996-prototypical-networks-for-few-shot-learning.pdf


So, it looks like they create the embedding with the entire train dataset?  I could not figure out whether the embedding included data that was later defined as validation or test data, or was built only upon the training data...

—————
I was trying to understand whether, when you create the embeddings, you train the networks on the entire train set or on the entire dataset.

And later, when you define a train/test split for the prototypical networks, is the embedding data similarly separated?

So, for example, say the embedding is created for categories A, B & C, and then you build the prototypical network on A, B & C.  When categories X & Y come along, do you utilize the existing embedding network, map X & Y to the embedded space, and then import them into the prototypical network?  Or perhaps A, B, C & X, Y are all utilized to create the embedding, but only A, B & C are employed to create the prototypical network?  (See the sketch after this note.)
—————
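To make the two scenarios in the question concrete, a hypothetical sketch (the encoder, the data and the class splits are all stand-ins of my own, not the paper's):

import torch

encoder = torch.nn.Linear(16, 8)                  # stand-in embedding network
data = {c: torch.randn(10, 16) for c in "ABCXY"}  # toy examples per category

# Option 1: train the encoder on A, B & C only (training loop omitted)...
train_classes = ["A", "B", "C"]

# ...then, when X & Y come along, pass them through the *frozen* encoder
# and build their prototypes in the existing embedded space.
with torch.no_grad():
    protos = {c: encoder(data[c][:5]).mean(dim=0) for c in ["X", "Y"]}

# Option 2 would instead fit the encoder on all of A, B, C, X & Y,
# but still build the prototypical network on A, B & C alone.
print({c: tuple(p.shape) for c, p in protos.items()})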

What is the analogy in the Ashlag/Kook world?
