
Strongly Increasing Memory Consumption When Using ELMo From TensorFlow Hub

I am currently trying to compare the similarity of millions of documents. For a first test on a CPU, I reduced them to around 50 characters each and am trying to get the ELMo embedding for them.

Solution 1:

This is for TensorFlow 1.x without Eager mode, I suppose (or else the use of hub.Module would likely hit bigger problems).

In that programming model, you need to first express your computation in a TensorFlow graph, and then execute that graph repeatedly for each batch of data.

  • Constructing the module with hub.Module() and applying it to map an input tensor to an output tensor are both parts of graph building and should happen only once.

  • The loop over the input data should merely call session.run() to feed input and fetch output data from the fixed graph.

Fortunately, there is already a utility function to do all this for you:

import numpy as np
import tensorflow_hub as hub

# For demo use only. Extend to your actual I/O needs as you see fit.
inputs = (x for x in ["hello world", "quick brown fox"])

with hub.eval_function_for_module("https://tfhub.dev/google/elmo/2") as f:
  for pystr in inputs:
    batch_in = np.array([pystr])
    batch_out = f(batch_in)
    print(pystr, "--->", batch_out[0])
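
Since the eventual goal is millions of documents, note that f accepts a whole batch of strings per call; feeding more than one document at a time amortizes the per-session.run overhead. A minimal sketch of that idea, where batched() is a hypothetical helper and batch_size is something you would tune to your memory budget:

import numpy as np
import tensorflow_hub as hub

def batched(strings, batch_size):
  # Hypothetical helper: group an iterable of strings into lists of batch_size.
  batch = []
  for s in strings:
    batch.append(s)
    if len(batch) == batch_size:
      yield batch
      batch = []
  if batch:
    yield batch  # Final, possibly smaller, batch.

inputs = ["hello world", "quick brown fox", "lazy dog", "lorem ipsum"]

with hub.eval_function_for_module("https://tfhub.dev/google/elmo/2") as f:
  for batch in batched(inputs, batch_size=2):
    batch_out = f(np.array(batch))  # One session.run per batch, not per document.
    for pystr, vec in zip(batch, batch_out):
      print(pystr, "--->", vec)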

What hub.eval_function_for_module does for you in terms of raw TensorFlow is roughly this:

import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

module = hub.Module(ELMO_OR_WHATEVER)  # E.g., "https://tfhub.dev/google/elmo/2".
tensor_in = tf.placeholder(tf.string, shape=[None])  # As befits `module`.
tensor_out = module(tensor_in)

# This kind of session handles init ops for you.
with tf.train.SingularMonitoredSession() as sess:
  for pystr in inputs:
    batch_in = np.array([pystr])
    batch_out = sess.run(tensor_out, feed_dict={tensor_in: batch_in})
    print(pystr, "--->", batch_out[0])

If your needs are too complex for hub.eval_function_for_module, you could build out this more explicit example.
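
For instance, for document similarity you probably want one fixed-size vector per document rather than per-token outputs. A sketch of one such extension, assuming the outputs documented for the elmo/2 module on TF Hub, where the "default" output is a mean-pooled sentence embedding of shape [batch_size, 1024]:

import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

module = hub.Module("https://tfhub.dev/google/elmo/2")
tensor_in = tf.placeholder(tf.string, shape=[None])
# as_dict=True exposes all of the module's outputs; "default" is the
# mean-pooled sentence embedding, convenient for document similarity.
tensor_out = module(tensor_in, signature="default", as_dict=True)["default"]

with tf.train.SingularMonitoredSession() as sess:
  vecs = sess.run(tensor_out,
                  feed_dict={tensor_in: np.array(["hello world", "quick brown fox"])})
  print(vecs.shape)  # Expected: (2, 1024).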

Notice how the hub.Module is neither constructed nor called in the loop.
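
For contrast, here is a sketch of the anti-pattern that produces the steadily growing memory consumption from the question: every pass through the loop adds a fresh copy of the module's ops to the default graph, so the graph, and with it memory, keeps growing.

import tensorflow as tf
import tensorflow_hub as hub

# DON'T do this; illustrative anti-pattern only.
for pystr in ["hello world", "quick brown fox"]:
  module = hub.Module("https://tfhub.dev/google/elmo/2")  # Adds ops to the graph...
  tensor_out = module(tf.constant([pystr]))               # ...on every iteration.
  with tf.train.SingularMonitoredSession() as sess:
    print(pystr, "--->", sess.run(tensor_out)[0])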

PS: Tired of worrying about building graphs vs running sessions? Then TF2 and eager execution are for you. Check out https://colab.research.google.com/github/tensorflow/hub/blob/master/examples/colab/tf2_text_classification.ipynb
