GMTK

FIXME: This likely does not belong here: it refers to a specific accessor and not a module. However, since it is a particular use case rather than an accessor in its own right, it is unclear where to document it.

To get started using GMTK with the Ptolemy accessor host, first install GMTK from here. Be sure to get the latest version (1.3.3 or above). Extract the GMTK package and follow the installation steps provided in the GMTK manual or README. Then acquire the json2gmtk package.

There are two ways to interface with GMTK through Ptolemy: the webSocket accessor and the Shell accessor. We will focus on the webSocket accessor.

If you plan to interface with GMTK using the webSocket accessor, you will need to clone and build websocketd and add it to your path. This is described in the README displayed on the json2gmtk package page.

GMTK Commands

Start with a gmtk_online.sh script something like the following:

    #! /bin/sh
    json2stream 39 0 | \
    gmtkOnline \
      -os1 - -nf1 39 -fmt1 ascii \
      -strFile applause_detector.str \
      -inputMasterFile applause_detector.mtr \
      -inputTrainable applause_detector.gmp \
      -mVitValsFile - -viterbiScore | \
    vit2json

The 39 there is the length of the input feature vector: in this case, 39-element MFCC feature vectors like those we use in the TIMIT speech recognition model. The number will need to change to match the size of the input feature vectors for the final model.
applause_detector.str is a textual specification of the dynamic Bayesian network's (DBN's) graphical structure. For an HMM this is pretty simple.

applause_detector.mtr defines some of the non-graphical parameters of the DBN: any decision trees, deterministic conditional probability tables, etc. needed to define the model.

applause_detector.gmp contains the learned numerical parameters of the model: state transition probabilities, means, and covariances in the case of an HMM.

json2stream converts the JSON input described below to GMTK's native streaming data format. vit2json converts GMTK's output to JSON as described below. json2stream and vit2json are available from https://bitbucket.org/rprogers/json2gmtk along with a little bit of documentation.

You would configure the Exec accessor (or the GMTK-specialized subclass thereof) to run the gmtk_online.sh script. To use the WebSocketClient accessor (or the GMTK-specialized subclass thereof), you would run

    websocketd --port=8080 ./gmtk_online.sh
on the server machine and then aim the accessor at ws://server.ip.address:8080

Note that the inputs and outputs of the Exec- and WebSocket-based accessors are the same, so they should be interchangeable.

We will probably want off-line data files containing feature vectors to test with. These can be served by the Exec or WebSocketClient accessors as well. Create a data_source.sh script:

    #! /bin/sh
    obs-cat \
      -fmt1 pfile \
      -of1 test_data.pfile \
      -binaryOutput F | \
    stream2json 39 0

Now data_source.sh can be run by an Exec accessor, or wrapped in another websocketd instance and accessed by a WebSocketClient accessor. The data_source accessor's output goes directly into the gmtk_online accessor's input.

GMTK Accessor Input

The input is a sequence of frames. Each frame is a JSON array of length 2. The first element is an array of numbers containing the real-valued observations of variables in the model. The second element is an array of numbers containing the integer-valued observations of variables in the model. Note that JSON doesn't have an integer data type, so both elements are arrays of "number," but GMTK treats the first as an array of floats and the second as an array of unsigned integers. For the above example with 39 real-valued MFCC features and no discrete (integer-valued) observations, a frame would consist of an array of 39 numbers followed by an empty array.

Either the first array or the second array can be empty, but not both (there must be a positive number of observed variables). Also note that the WebSocketClient accessor seems to incorrectly produce empty objects ({}) instead of empty arrays ([]) in its output. The Exec accessor seems to handle empty arrays correctly.

An empty frame, represented as an empty array [], signals that the following frames should be treated as conditionally independent of the previous frames. In an HMM, this means that the frame after the [] would use the initial state probability distribution rather than the state-to-state transition distribution.
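As an illustration of the frame format described above, here is a minimal Python sketch that builds input frames. The names encode_frame and SEGMENT_BREAK are hypothetical and are not part of json2gmtk; printing one frame per line is also an assumption, since the exact framing expected by json2stream is not specified here.

```python
import json


def encode_frame(floats, ints):
    """Return one input frame as JSON text: [[real-valued...], [integer-valued...]].

    Hypothetical helper for illustration only; not part of json2gmtk.
    """
    # At least one of the two arrays must be non-empty.
    if not floats and not ints:
        raise ValueError("a frame needs at least one observed variable")
    # GMTK treats the second array as unsigned integers.
    for i in ints:
        if isinstance(i, bool) or not isinstance(i, int) or i < 0:
            raise ValueError("discrete observations must be unsigned integers")
    return json.dumps([[float(x) for x in floats], list(ints)])


# An empty frame: the frames that follow are treated as conditionally
# independent of the frames before it.
SEGMENT_BREAK = "[]"

if __name__ == "__main__":
    # 39 real-valued features (placeholder zeros) and no discrete observations.
    print(encode_frame([0.0] * 39, []))
    print(SEGMENT_BREAK)
```

The placeholder zeros stand in for real MFCC values; in practice the feature vector length must match the numbers given to json2stream and gmtkOnline.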
In GMTK's more general DBNs, an empty frame means the model starts over in the prologue instead of proceeding to the next chunk. Frame numbers also restart at 0 after an empty frame.

Two empty frames, [][], signal the end of the input stream. The gmtk_online.sh process will exit, and to process any further data the Exec accessor would have to launch a new command, or the WebSocketClient accessor would have to initiate a new connection to the server.

GMTK Accessor Output

The output is also a sequence of frames, but the sequence may be asynchronous and shorter with respect to the input sequence. An output frame is a JSON array of most-probable explanation (MPE) objects. (Jeff - we should think about how to extend this for k-best.) Each MPE object has 3 fields: "variableName" is the name of a hidden variable in the model (string), "frame" is the input frame number (integer) the MPE value corresponds to, and "MPEvalue" is the most likely value of the hidden variable at that frame (integer or string).

As in the input, an empty frame, [], signals the end of a segment of conditionally dependent output, and two empty frames, [][], signal the end of all output (the gmtkOnline process ends). Also note that the WebSocketClient accessor incorrectly produces {} instead of [] here as well.

As an example, if our applause_detector model has a hidden variable named "SoundEvent" with possible values "silence", "applause", and "humming", the output might look like:

    [ {"MPEvalue":"silence", "frame":0, "variableName":"SoundEvent"} ]
    [ {"MPEvalue":"applause", "frame":42, "variableName":"SoundEvent"} ]
    [ {"MPEvalue":"humming", "frame":53, "variableName":"SoundEvent"} ]
    [ {"MPEvalue":"silence", "frame":63, "variableName":"SoundEvent"} ]
    [ {"MPEvalue":"silence", "frame":64, "variableName":"SoundEvent"} ]
    []
    []
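A consumer of this output could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: parse_output is a hypothetical name, the output is assumed to arrive as text containing JSON values separated by optional whitespace, and an empty object {} is treated as an empty frame to work around the WebSocketClient behavior noted above.

```python
import json


def parse_output(text):
    """Group MPE objects into segments (hypothetical helper, illustration only).

    Returns (segments, ended): segments is a list of lists of MPE dicts, and
    ended is True once two consecutive empty frames have been seen.
    """
    decoder = json.JSONDecoder()
    segments, current = [], []
    pos, prev_empty = 0, False
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        if text[pos].isspace():
            pos += 1
            continue
        frame, pos = decoder.raw_decode(text, pos)
        if frame == {}:
            frame = []  # assumed workaround: {} stands in for []
        if frame == []:
            if prev_empty:
                return segments, True  # two empty frames: end of all output
            if current:
                segments.append(current)  # one empty frame: end of a segment
            current, prev_empty = [], True
            continue
        prev_empty = False
        current.extend(frame)
    if current:
        segments.append(current)  # stream stopped without an end marker
    return segments, False


if __name__ == "__main__":
    sample = (
        '[ {"MPEvalue":"silence", "frame":0, "variableName":"SoundEvent"} ] '
        '[ {"MPEvalue":"applause", "frame":42, "variableName":"SoundEvent"} ] '
        "[] []"
    )
    segments, ended = parse_output(sample)
    for mpe in segments[0]:
        print(mpe["frame"], mpe["variableName"], mpe["MPEvalue"])
    print("end of output:", ended)
```

Decoding one JSON value at a time, rather than splitting on newlines, keeps the sketch independent of whether frames are separated by spaces, newlines, or nothing at all, as in [][].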