- Icecaps provides an array of tools for building neural conversational systems
- Icecaps is intended for Python environments and is built on top of TensorFlow
Microsoft Icecaps is a new open-source natural language processing (NLP) library focused on building intelligent conversation agents that can communicate naturally with humans.
Icecaps – which stands for Intelligent Conversation Engine: Code and Pre-trained Systems – provides an array of tools for users to build and customize conversational systems. Icecaps is currently on version 0.1.
Icecaps is intended for Python environments and is built on top of TensorFlow. It is recommended to use Icecaps in an Anaconda environment with Python 3.7.
Microsoft Icecaps’ design is based on a component-chaining architecture, where models are represented as chains of components (e.g. encoders and decoders) that data flows through. This enables complex multi-task learning environments with shared components between tasks.
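To make the component-chaining idea concrete, here is a minimal plain-Python sketch. It is not Icecaps code: `chain`, `ScalingEncoder`, `sum_decoder` and `max_decoder` are hypothetical stand-ins, and the "encoder" is just a scaling function rather than a neural network. What it shows is the structural point from the paragraph above: two task chains hold the *same* encoder object, so a change to that shared component is seen by both tasks.

```python
from typing import Callable, Dict, List

# In this sketch a "component" is any callable mapping one representation to another.
Component = Callable[[List[float]], List[float]]


def chain(*components: Component) -> Component:
    """Compose components left to right so data flows through them in order."""
    def run(x: List[float]) -> List[float]:
        for component in components:
            x = component(x)
        return x
    return run


class ScalingEncoder:
    """Hypothetical encoder with one parameter; every chain holding it shares that parameter."""

    def __init__(self, scale: float) -> None:
        self.scale = scale

    def __call__(self, x: List[float]) -> List[float]:
        return [self.scale * v for v in x]


def sum_decoder(x: List[float]) -> List[float]:
    """Hypothetical decoder for task A."""
    return [sum(x)]


def max_decoder(x: List[float]) -> List[float]:
    """Hypothetical decoder for task B."""
    return [max(x)]


# One encoder instance is shared by both task chains (a multi-task setup).
encoder = ScalingEncoder(scale=2.0)
tasks: Dict[str, Component] = {
    "task_a": chain(encoder, sum_decoder),
    "task_b": chain(encoder, max_decoder),
}

print(tasks["task_a"]([1.0, 2.0, 3.0]))  # [12.0]
print(tasks["task_b"]([1.0, 2.0, 3.0]))  # [6.0]

# Updating the shared component (standing in for a training step) affects both tasks.
encoder.scale = 3.0
print(tasks["task_a"]([1.0, 2.0, 3.0]))  # [18.0]
print(tasks["task_b"]([1.0, 2.0, 3.0]))  # [9.0]
```

In a real system the components would be trainable neural modules; sharing one object rather than copying it is what lets training signal from either task update the same encoder.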
The toolkit includes recent advances in conversational modelling, such as personalization embeddings, SpaceFusion and knowledge-grounding models based on machine reading comprehension (MRC).
There are also customized decoding tools that allow users to employ maximum mutual information, token filtering and repetition penalties to improve response quality and diversity, Microsoft writes on its website.
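The three decoding aids named above can be illustrated as a reranking pass over candidate responses. The sketch below is an assumption-laden illustration, not Icecaps's implementation: it uses the "anti-language-model" form of maximum mutual information, `log p(T|S) - lam * log p(T)`, a simple penalty on repeated tokens, and a banned-token filter. The names `Candidate`, `rerank`, `lam` and `rep_penalty`, and all log-probabilities, are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import AbstractSet, List


@dataclass
class Candidate:
    tokens: List[str]
    logp_given_source: float   # log p(T | S) from the forward response model
    logp_unconditional: float  # log p(T) from a separate language model


def repeated_tokens(tokens: List[str]) -> int:
    """Count token occurrences that repeat a token already seen in the response."""
    return len(tokens) - len(set(tokens))


def rerank(
    candidates: List[Candidate],
    lam: float = 0.5,
    rep_penalty: float = 1.0,
    banned: AbstractSet[str] = frozenset(),
) -> List[Candidate]:
    """Drop candidates with banned tokens, then sort by an MMI-style score (best first).

    score = log p(T|S) - lam * log p(T) - rep_penalty * repeated_tokens(T)
    """
    scored = []
    for cand in candidates:
        if any(tok in banned for tok in cand.tokens):  # token filtering
            continue
        score = (
            cand.logp_given_source
            - lam * cand.logp_unconditional  # demotes generic replies a plain LM rates as likely
            - rep_penalty * repeated_tokens(cand.tokens)  # repetition penalty
        )
        scored.append((score, cand))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cand for _, cand in scored]


# Made-up log-probabilities, for illustration only.
cands = [
    Candidate(["i", "don't", "know"], -2.0, -3.0),
    Candidate(["the", "museum", "opens", "at", "nine"], -4.0, -12.0),
    Candidate(["yes", "yes", "yes"], -1.5, -4.0),
    Candidate(["damn", "it"], -1.0, -5.0),
]
best = rerank(cands, lam=0.5, rep_penalty=1.0, banned={"damn"})
print([" ".join(c.tokens) for c in best])
# ['the museum opens at nine', "i don't know", 'yes yes yes']
```

With `lam=0`, `rep_penalty=0` and no banned tokens, the ranking falls back to plain likelihood order, which here would put the short, generic or filtered replies ahead of the informative one.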
“Data processing tools are provided for users to easily convert their text data sets into binarized TFRecords. Our data processor features various text preprocessing tools, including byte pair encoding and fixed-length multi-turn context extraction,” it says.
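The two preprocessing steps in that quote can be sketched in plain Python. This is an illustration, not Icecaps's data processor: `extract_context_pairs`, `learn_bpe` and the `<pad>` token are hypothetical names, "fixed-length" is assumed to mean exactly `context_len` preceding turns with left padding, and the BPE function follows the standard merge-learning procedure (Sennrich et al.) of repeatedly fusing the most frequent adjacent symbol pair. The final step of serializing the results into TFRecords is omitted.

```python
from collections import Counter
from typing import List, Tuple


def extract_context_pairs(
    turns: List[str], context_len: int, pad: str = "<pad>"
) -> List[Tuple[List[str], str]]:
    """Pair every turn after the first with exactly `context_len` preceding turns,
    left-padding with `pad` when the conversation is shorter than that."""
    pairs = []
    for i in range(1, len(turns)):
        context = turns[max(0, i - context_len):i]
        context = [pad] * (context_len - len(context)) + context
        pairs.append((context, turns[i]))
    return pairs


def learn_bpe(words: List[str], num_merges: int) -> List[Tuple[str, str]]:
    """Learn byte pair encoding merges: repeatedly fuse the most frequent adjacent symbol pair."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = Counter(tuple(word) + ("</w>",) for word in words)
    merges: List[Tuple[str, str]] = []
    for _ in range(num_merges):
        pair_counts: Counter = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)  # ties go to the first pair seen
        merges.append(best)
        merged_vocab: Counter = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged_vocab[tuple(out)] += freq
        vocab = merged_vocab
    return merges


dialogue = ["hi", "hello, how are you?", "fine, thanks", "great"]
for context, response in extract_context_pairs(dialogue, context_len=2):
    print(context, "->", response)

print(learn_bpe(["low", "low", "lower", "lowest"], num_merges=3))
# [('l', 'o'), ('lo', 'w'), ('low', '</w>')]
```

Padding to a fixed number of turns keeps every training example the same shape, which simplifies batching; whether Icecaps pads, truncates or skips short contexts is not stated in the source.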
Microsoft is also working on several other features for Icecaps 0.2, such as:
- More models, including stochastic answer networks and personalized transformers
- Lexical and contextual embedding generators
- New data processing features, including functionality for processing tree-structured JSON data
- An interactive GUI-based decoding session with improved flexibility
The GitHub repository for Icecaps features example scripts that users may use as templates to bootstrap their own projects.
Several other open-source NLP toolkits are available, including:
- Tensor2Tensor – a TensorFlow library maintained by Google Brain
- AllenNLP – a PyTorch library developed by AI2 (the Allen Institute for AI) for natural language processing tasks
- OpenNMT – a popular neural machine translation toolkit originally developed for LuaTorch that now has implementations in PyTorch and TensorFlow
- MarianNMT – another neural machine translation framework, developed jointly by Adam Mickiewicz University in Poznań and the University of Edinburgh