Week 2 Meeting 2/29

Paper overview and teams

Attendance

  • Present: Alicia, Andrew, James, Praveen
  • Excused absence: Jasmine
  • Unexcused absence: Jim

Recap

Teams and Tasks

We are lucky to have access to source code, data, and pretrained models which the paper authors and other researchers have released.

However, we also face a challenge in that the network takes a long time to train. It’s unclear at this point how much 2-gpu training time is needed to replicate the pretrained checkpoint, which used either 1152 or 3456 GPU hours (the GitHub and paper disagree on the exact number). This will also make it harder to test our re-implementation and make sure it’s working correctly.

Because of this, we will try working on different tasks in parallel. We will split into 3 groups of 2, each group working on a different task. Every 2 weeks, there will be an opportunity to switch groups/tasks so that everyone can try working on different tasks throughout the semester.

Team assignments for the next 2 weeks were decided at the meeting:

  1. (James, Alicia) Understanding and re-implementing the UMTN architecture.
    • Reading and writing a lot of Pytorch and integrating UMTN’s custom CUDA kernels with Pytorch (rather than rewriting them from scratch, which would be impractical).
    • You can reference the original source code and use the custom CUDA kernels, but should try to rewrite the Pytorch code “from scratch” whenever possible (both for learning the network structure better and so that we can get the experience of writing and maintaining more complex Pytorch code).
  2. (Praveen, Andrew) Experiments related to training the UTMN.
    • e.g. training on new data, changing hyperparameters, adding a new loss term or even modifying the architecture
    • You can use the original source code for experiments
  3. (Jasmine, Jim) Experiments related to evaluating the UTMN.
    • e.g. adding two latent space vectors together and seeing what the decoded result is (this was also done in the original paper – perhaps try another modification of latent vectors e.g. an affine transform?), training a neural network on the latent space or otherwise generating/modifying latent space vectors, evaluating on new data
    • You can use the original source code and the pretrained model.

What to do by next meeting

If you missed the meeting:

Reimplementation team:

Training experiments team:

  • Brainstorm ideas for experiments
  • Run a training experiment (currently in progress)

Evaluation experiments team:

  • Brainstorm ideas for experiments
  • Try to get the sample code to run on CSUA using the pretrained model. When I did this there were a lot of issues to debug (like compiling the CUDA kernels for the CSUA’s GPUs). So let me know if anything breaks, as it almost certainly will.

Presenter for GM:

  • Present results of this week’s meeting at GM
  • TBD whether we can use this summary document or will need to create a Google slide.
Written on February 29, 2020