* All rights to the data are reserved by the Speech Processing and Machine Intelligence Lab (SPMILab), Tsinghua University, Beijing.
* The Iqiyi movie dialog dataset was collected and labeled under a crowdsourced Wizard-of-Oz framework, as described in the ICASSP 2018 paper listed under Reference below. All dialog data are in Chinese.

After careful cleaning of the raw data, we collected 800 dialogs in total. There are 7 informable slots: Film name, Director, Actor, Genre, Country, Time, and Payment. There are 11 requestable slots: the 7 informable slots plus 4 extra slots, namely Release_date, Critic_rating, Movie_length, and Introduction.

The files are organized as follows:

* Iqiyi_800.json: Contains all 800 dialogs. Each dialog consists of the turn-by-turn dialog data and the labels for both informable and requestable slots.
* Iqiyi_ONTO.json: Contains the ontology of the dialog data, namely the slots and their values.
* New_words_for_word_segmentation: Contains the new words used for Chinese word segmentation. We use the Jieba Chinese word segmenter in our experiments.
* Synonym_lists.txt: Contains lists of synonyms. It is used for training the Chinese word vectors and for constructing the semantic dictionary.
* word_vectors.25d.pkl: Contains the pretrained 25-dimensional Chinese word vectors, trained on the dialog corpus.

Reference:
Yinpei Dai, Zhijian Ou, et al. Tracking of enriched dialog states for flexible conversational information access. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018.
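The slot schema described above can be written down as plain Python constants, which makes the "11 = 7 + 4" relationship explicit. This is a minimal sketch: the slot names are transcribed from this README, but the exact key strings used inside Iqiyi_ONTO.json are not documented here, so they are an assumption and may need adjusting to match the file. The helper `is_requestable_only` is hypothetical.

```python
# Slot schema transcribed from this README.
# ASSUMPTION: the key strings in Iqiyi_ONTO.json may differ from these names.
INFORMABLE_SLOTS = [
    "Film name", "Director", "Actor", "Genre", "Country", "Time", "Payment",
]

# Requestable slots = every informable slot + four extra requestable-only slots.
EXTRA_REQUESTABLE_SLOTS = [
    "Release_date", "Critic_rating", "Movie_length", "Introduction",
]
REQUESTABLE_SLOTS = INFORMABLE_SLOTS + EXTRA_REQUESTABLE_SLOTS


def is_requestable_only(slot: str) -> bool:
    """Hypothetical helper: True for slots that can be requested but not informed."""
    return slot in REQUESTABLE_SLOTS and slot not in INFORMABLE_SLOTS


if __name__ == "__main__":
    print(len(INFORMABLE_SLOTS), len(REQUESTABLE_SLOTS))  # 7 11
    print(is_requestable_only("Critic_rating"))  # True
    print(is_requestable_only("Director"))  # False
```

Keeping the requestable list as a concatenation, rather than a separate hand-written list, guarantees that every informable slot is also requestable, as the README states.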
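The file list above can be loaded with the Python standard library alone. The sketch below is not the authors' code. It assumes the JSON files are UTF-8 encoded and that word_vectors.25d.pkl unpickles to a dict mapping each word to a sequence of 25 floats. That layout is not documented here, so inspect the loaded object before relying on it. The demo writes toy files to a temporary directory so it runs without the dataset.

```python
import json
import math
import pickle
import tempfile
from pathlib import Path


def load_json(path):
    """Load a UTF-8 JSON file such as Iqiyi_800.json or Iqiyi_ONTO.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def load_word_vectors(path):
    """Unpickle a word-vector file such as word_vectors.25d.pkl.

    ASSUMPTION: the pickle holds a dict {word: sequence of 25 floats}.
    Only unpickle files from a source you trust.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Toy stand-ins for the real files; replace with the dataset paths.
    with tempfile.TemporaryDirectory() as tmp:
        onto_path = Path(tmp) / "toy_onto.json"
        onto_path.write_text(
            json.dumps({"Genre": ["喜剧", "动作"]}, ensure_ascii=False),
            encoding="utf-8",
        )
        print(load_json(onto_path)["Genre"])  # ['喜剧', '动作']

        vec_path = Path(tmp) / "toy_vectors.pkl"
        with open(vec_path, "wb") as f:
            pickle.dump({"电影": [1.0] * 25, "影片": [1.0] * 24 + [0.5]}, f)
        vectors = load_word_vectors(vec_path)
        print(len(vectors["电影"]))  # 25
        print(round(cosine(vectors["电影"], vectors["影片"]), 3))
```

For segmentation, Jieba provides `jieba.load_userdict` for adding a user dictionary. This README does not say whether New_words_for_word_segmentation is already in Jieba's dictionary format, so check the file before passing it in.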