Posts

Showing posts from November, 2017

Dialog Management

Dialogue management is a sequential decision-making process. We can represent it with a dynamic Bayesian network (DBN) under two assumptions: the network is stationary (that is, the probability P(X_t | X_{t-1}) is identical for all values of t), and the Markov assumption holds. The network must also be able to calculate the relative utility of the actions possible in the current state. Because nodes in the DBN may be affected by the previous values of the same or other nodes, dialogue management is well represented by a dynamic decision network (DDN). Representing the DDN as a probabilistic graphical model, we can use generalized variable elimination or likelihood weighting as two approaches for deriving inferences. Using these inference algorithms, the dialogue manager can update the dialogue state on receiving new observations and select an appropriate action based on the updated state. Finding initial distribution...
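The state update described above can be sketched as a simple exact filtering step. The two dialogue states, the transition model, and the observation model below are made-up toy values for illustration, not taken from any particular system:

```python
# Toy belief-state update for a dialogue manager: predict with the
# transition model, weight by the observation likelihood, normalize.
# All probabilities below are illustrative assumptions.

TRANSITION = {  # P(next_state | state)
    "wants_info": {"wants_info": 0.8, "wants_booking": 0.2},
    "wants_booking": {"wants_info": 0.1, "wants_booking": 0.9},
}
OBSERVATION = {  # P(utterance_type | state)
    "wants_info": {"question": 0.7, "request": 0.3},
    "wants_booking": {"question": 0.2, "request": 0.8},
}

def update_belief(belief, observation):
    """One exact filtering step over the dialogue-state distribution."""
    # Prediction: push the current belief through the transition model.
    predicted = {s: sum(belief[p] * TRANSITION[p][s] for p in belief)
                 for s in TRANSITION}
    # Correction: weight by how likely the observation is in each state.
    weighted = {s: predicted[s] * OBSERVATION[s][observation]
                for s in predicted}
    total = sum(weighted.values())
    return {s: w / total for s, w in weighted.items()}

belief = {"wants_info": 0.5, "wants_booking": 0.5}
belief = update_belief(belief, "request")
```

After observing a "request" utterance, the belief shifts toward the booking state, and the manager could select an action such as asking for a date.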

Time Series Analysis

In a time series, observations are not independent, as opposed to cross-sectional data, where one observation has no bearing on any other. The goal of time-series analysis is to find the relationship between the current observation and its past observations and thus help predict future values. The response Y in a time series is composed of: level, trend, seasonality, cycle, auto-correlation, and noise. Thus Yt = level + trend + season/cycle + noise. This noise, even after removing level, trend
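The additive model Yt = level + trend + season + noise can be sketched with made-up component values; a centered moving average over one full seasonal period smooths the seasonality away and recovers level + trend (here a 2x12 moving average, as commonly used for monthly data; noise is omitted so the arithmetic is exact):

```python
import math

# Build a toy additive series: level + linear trend + sinusoidal season.
level = 10.0
period = 12
n = 5 * period
series = [level
          + 0.05 * t                                   # trend
          + 2.0 * math.sin(2 * math.pi * t / period)   # seasonality
          for t in range(n)]                           # noise omitted

def centered_moving_average(y, window):
    """Centered 2xM moving average: window+1 points, half weight on ends."""
    half = window // 2
    out = {}
    for t in range(half, len(y) - half):
        s = (0.5 * y[t - half]
             + sum(y[t - half + 1:t + half])
             + 0.5 * y[t + half])
        out[t] = s / window
    return out

trend_est = centered_moving_average(series, period)          # level + trend
detrended = {t: series[t] - trend_est[t] for t in trend_est} # seasonality
```

Subtracting the estimated trend leaves the seasonal component; with real data, what remains after also removing seasonality is the noise the excerpt refers to.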

Information Retrieval System

The information retrieval (IR) task deals with finding all documents relevant to a user query. Central to IR are removing stop words from the corpus (the collection of all documents) and the query, stemming, lemmatization, representing documents and the query as vectors, and using some measure of proximity or distance to determine which documents could be relevant to the query. Each word in a document, after going through stemming and lemmatization, is called a term. Each unique term in the corpus is represented as one dimension in the document space. Thus the vectors representing documents can have more than 10,000 dimensions and so suffer from high dimensionality. Since not all words occur in each document, document vectors are very sparse, and word frequencies tend to follow Zipf's distribution. The value of each term in the document vector could be binary (whether the term occurs in the document), a frequency (how often that term is found in the document), or a term frequency-inverse document frequen...
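The tf-idf weighting and a cosine proximity measure can be sketched over a toy corpus (the three documents and the query below are invented for illustration, and stop-word removal, stemming, and lemmatization are assumed to have already been applied):

```python
import math
from collections import Counter

# Toy corpus of three already-preprocessed documents.
corpus = [
    ["retrieval", "system", "index", "document"],
    ["document", "vector", "space", "model"],
    ["neural", "network", "train", "model"],
]

def tf_idf(doc, corpus):
    """Weight each term by term frequency times inverse document frequency."""
    n = len(corpus)
    vec = {}
    for term, f in Counter(doc).items():
        df = sum(1 for d in corpus if term in d)  # document frequency
        vec[term] = f * math.log(n / df)
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse term -> weight vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = lambda w: math.sqrt(sum(x * x for x in w.values()))
    return dot / (norm(u) * norm(v)) if u and v else 0.0

query = ["document", "model"]
scores = [cosine(tf_idf(query, corpus), tf_idf(d, corpus)) for d in corpus]
```

The document sharing the most query terms scores highest, while terms that appear in every document get an idf of log(1) = 0 and stop contributing, which is exactly the behavior the tf-idf scheme is meant to produce.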

Sensing room occupancy

Many businesses are interested in knowing the utilization of the conference rooms at their premises. Audio devices in these conference rooms can be used to measure that utilization. When plotted, the audio activity detected by the microphone shows a bimodal characteristic, with one mode representing when the room is occupied and the other when it is not. The bimodal density shown here was generated with the following R code (using ggplot2):

  library(ggplot2)
  x <- c(rnorm(5000, 1, 1), rnorm(10000, 9, 1))
  ggplot(data.frame(x = x)) + geom_density(aes(x = x))

Our goal is to identify the mean and standard deviation of each mode. The following R session (using normalmixEM from the mixtools package) reports the mean, standard deviation, and proportion of data belonging to each mode:

  > library(mixtools)
  > set.seed(50)
  > bimodal <- normalmixEM(x, k = 2)
  number of iterations= 9
  > bimodal$mu
  [1] 0.9992427 9.0070292
  > bimodal$sigma
  [1] 0.9922105 0.9964136
  > bimodal$lambda
  [1] 0.3332021 0.6667979
  ...
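The same two-component fit can be sketched in pure Python with a minimal EM loop. This is a toy sketch of the EM algorithm that normalmixEM implements, not the mixtools code itself; the mixture parameters mirror the rnorm call above (means 1 and 9, unit variance, 1:2 proportions) at a smaller sample size:

```python
import math
import random

def em_two_gaussians(x, iters=50):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximization."""
    xs = sorted(x)
    n = len(xs)
    # Crude initialization from the lower and upper quartiles.
    mu = [xs[n // 4], xs[3 * n // 4]]
    sigma = [1.0, 1.0]
    lam = [0.5, 0.5]

    def pdf(v, m, s):
        return math.exp(-0.5 * ((v - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

    for _ in range(iters):
        # E-step: responsibility of component 0 for each point.
        r0 = []
        for v in x:
            p0 = lam[0] * pdf(v, mu[0], sigma[0])
            p1 = lam[1] * pdf(v, mu[1], sigma[1])
            r0.append(p0 / (p0 + p1))
        # M-step: re-estimate proportions, means, standard deviations.
        n0 = sum(r0)
        n1 = n - n0
        lam = [n0 / n, n1 / n]
        mu = [sum(r * v for r, v in zip(r0, x)) / n0,
              sum((1 - r) * v for r, v in zip(r0, x)) / n1]
        sigma = [math.sqrt(sum(r * (v - mu[0]) ** 2 for r, v in zip(r0, x)) / n0),
                 math.sqrt(sum((1 - r) * (v - mu[1]) ** 2 for r, v in zip(r0, x)) / n1)]
    return mu, sigma, lam

random.seed(50)
data = ([random.gauss(1, 1) for _ in range(500)]
        + [random.gauss(9, 1) for _ in range(1000)])
mu, sigma, lam = em_two_gaussians(data)
```

As in the R session, mu recovers the two mode centers, sigma their spreads, and lam the roughly 1/3 vs 2/3 proportions of unoccupied and occupied samples.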