OK, you have heard this before. But when it comes to analytics, it takes a different turn. Here the bigger problem is not “come on”; it is “George”. If you are designing natural language query for analytics, then in “come on George” it is “George”, rather than “come on”, that is your primary concern for disambiguation. Why? In analytics, the first question we have to get out of the way is: is it George Town, Mr George Mason, George and Company, George Cream, or George’s Garage, all of which are sprinkled throughout your data?
But wait, there is more to it than that!
The steps to choose the right “George” depend on where “George” sits in the sentence and whether it stands alone or appears with other words.
To differentiate between machine learning and analytics learning: in machine learning, the index, or the data, is built up through interaction and interpretation. In analytics, it is rebuilt at every step, whenever data is indexed or added from the source.
The analytics world comprises Dimensions and Measures. For natural language query, we usually classify the structure into Dimensions, Dates and Times, and Measures. The classification exists purely to differentiate the values under each and process them appropriately. We will discuss this further in future articles on Clickless Analytics.
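As a rough illustration of that first classification step, here is a minimal sketch in Python. The dimension and measure names, and the date formats tried, are invented for this example; a real engine would read them from the dataset’s metadata.

```python
from datetime import datetime

# Hypothetical metadata for illustration only.
DIMENSIONS = {"city", "salesman", "customer", "product"}
MEASURES = {"sales", "revenue", "quantity"}

def classify_token(token: str) -> str:
    """Bucket a query token into Dimension, Measure, or Date/Time."""
    t = token.lower()
    if t in DIMENSIONS:
        return "Dimension"
    if t in MEASURES:
        return "Measure"
    # Try a few common date formats.
    for fmt in ("%Y-%m-%d", "%d/%m/%Y", "%B"):
        try:
            datetime.strptime(token, fmt)
            return "Date/Time"
        except ValueError:
            pass
    return "Unknown"

print(classify_token("sales"))       # Measure
print(classify_token("2024-01-15"))  # Date/Time
print(classify_token("George"))      # Unknown -> needs disambiguation
```

Note that “George” falls through to “Unknown”: it is exactly the kind of token that the later disambiguation steps must resolve.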
Before we go further, let us acknowledge, in a couple of lines, how difficult it is for someone to understand “Come on George” unless the previous few sentences have provided context. The statement on its own has no power without preceding statements, even if there is extensive data in the learning store. There are multiple levels of context: from the environment (for example, that it is a pizza store) or from user data, such as the age group and shopping habits on a general e-commerce site. But by and large, if there is no previous input, the sentence will hang fire indefinitely until more input arrives.
In an analytics engine, on the other hand, “Come on George” can be the only input, and it will be handled just fine.
In analytics NLQ, after we classify, tokenize, and get the parts of speech (POS), our need for sentiment is minimal. What we need is to disambiguate each token and put it in the right context.
In our world of Clickless Analytics from Smarten, we start with “come” as a verb, “on” as a preposition, and “George” as a noun. You can put an exclamation mark after the phrase, but it will have a negligible impact compared with an operator like <, whereas in an ML-based architecture the exclamation mark could play a detrimental role. We will discuss this in another article.
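The tokenize-and-tag step can be sketched as follows. The POS table here is hardcoded purely for illustration (a real engine would use a trained tagger), and, as described above, the exclamation mark is simply stripped before tagging.

```python
# Minimal sketch of the tokenize -> POS step, with a hardcoded
# tag table standing in for a real part-of-speech tagger.
POS = {"come": "VERB", "on": "ADP", "george": "NOUN"}

def tag(query: str):
    """Strip punctuation that carries no analytic weight, then tag."""
    tokens = query.replace("!", "").split()
    return [(t, POS.get(t.lower(), "NOUN")) for t in tokens]

print(tag("Come on George!"))
# -> [('Come', 'VERB'), ('on', 'ADP'), ('George', 'NOUN')]
```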
Smarten next looks for dependencies based on our unique indexing and aggregation engine. So “come”, besides being a verb, could match part of a product name if we are selling books, such as a book called “City Come A-Walking”. “On” would be treated as a preposition. A preposition normally depends on the noun that follows it, but in our engine the noun could be anywhere. “George” could be a customer name, a salesman’s name, and part of the city name Georgia in our index.
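A toy inverted index makes the ambiguity concrete. This is not Smarten’s actual index structure, just a sketch under the assumption that each token maps to every place it appears in the data; the entity names mirror the examples above.

```python
# Toy inverted index from token to candidate matches in the data.
# Structure and names are invented for illustration.
INDEX = {
    "george": [
        ("customer", "George Mason"),
        ("salesman", "George"),
        ("city", "George Town"),
        ("city", "Georgia"),
    ],
    "come": [
        ("product", "City Come A-Walking"),
    ],
}

def candidates(token: str):
    """Return every place a token appears in the indexed data."""
    return INDEX.get(token.lower(), [])

for kind, value in candidates("George"):
    print(f"{kind}: {value}")
```

Every candidate survives until the dependency step decides which one the surrounding words support.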
What does “come on George” mean in this dataset, and what data should be displayed when it is executed?
Yes, I am going to answer this. But like a real author, I must keep you interested. We will describe what we call the dependency engine in a coming article.
Why does one need a different NLQ engine in Analytics? The data changes!
In learning engines, past data, along with its context, is put into a learning matrix. In that matrix, the book whose title includes the word “come” did not exist; in the new data it does. “George” joined the company this month as a salesman, and “George Town” is his territory.
Now add one more complication: there is a dimension called City and also a value called City. We will look at this too in the coming articles.
Debate on this is welcome, since this is the second part of the series.
Read more articles on “Clickless Analytics”: