Finding symmetries in an unsymmetrical world ..

Protected: Don’t be so focussed on the greener grass on other side, that you forget to water your own plants – aka being a superwomen

Posted by: Ekta Grover on: October 8, 2015

In: Uncategorized

Custom Distance metric for handling Typographical errors optimized for a QWERTY keyboard

Posted by: Ekta Grover on: July 20, 2013

Problem statement:

For hand-collected or hand-crafted data sets, such as manually collected survey data and other data sets where the typing experience of the person keying in the data on a digital device is short of an acceptable standard, conventional distance metrics will fail. This is because these distance metrics “assume” that the representation in the data set is a proxy for the intent of the end user, which is not true when unintentional errors creep in.

This requires a way to model the unintentional errors that have crept in, and to model the very structure of the input keyboard that is used to key in the data. In this algorithm, I explicitly model the QWERTY keyboard as a graph, where the 26 letters of the English alphabet are the nodes and the distance between the nodes accounts for the adjacency of the keys.

Scope: This algorithm will find the similar entities among many in a hand-crafted data set. For example, given the two entities “Weka” and “Wkea”, it will identify them as one entity and, with a spell checker in the final disambiguating pass, also identify which of the two is the actual entity.

The limitation of this approach, though, is with words that are not in the English dictionary and so are not auto-suggested, such as people’s names like “Ekta” and “Keta”. In that case it will still identify and mark the two entities as the same, having “identified the typo”, but it cannot say which one of them is the right representation.

In that sense, this application predicts intent in a global scope, but does NOT identify which of the individual entities is the actual representation.

The QWERTY representation

Algorithm & data structure representation :

The three rows of letters on the QWERTY keyboard are represented as nodes of the graph, with a distance of 1 between adjacent nodes. The adjacent nodes are chosen to account for the fact that typographical errors are more likely to occur in a neighboring zone. For example, according to the schema above, Q is adjacent to W, A and S; the adjacency is in turn represented by the neighborhood of that node. Similarly, for D as the key, W, E, R, S, F, X, C and V are its immediate neighborhood.
The concept of the neighborhood as above also narrows down which keys are acceptable typos in the immediate neighborhood, so that the distance metric does not penalize the user for them.
An extension of the adjacent nodes is the case where users inadvertently swap the left and right keys. For example, instead of typing “weka” the user types “wkea”, since the user’s fingers are not synched up while typing. This can be due to inexperience in typing, or pure errors while speed-typing (typical of surveys, call center chat data etc.). In this scope the comparison, instead of being with the neighborhood/adjacent nodes as above, is against the left vs. right portions of the QWERTY keyboard, with defined positions for what is left and right. This ensures that when two consecutive characters are deemed “typos emanating from left and right hand asynchronization”, these characters indeed come from the left and right parts of the keyboard.
As the next step, the threshold for an acceptable difference between the words is defined. Currently this is set as one-third of the absolute difference in the length of the words being compared. This is because the difference in the lengths of the words being compared has to be within a defined limit. For example, a user instead of typing “asynchronous” may type “asynchronus” (missing the “o” after “n”); given the threshold restrictions, the distance metric should then uniquely attribute this to “asynchronous”. (In this case, it will happen in the 1st step of auto-suggest itself, thereby narrowing the scope, or reinforcing the uniqueness of the attribution.)
If the difference in the length of the words exceeds the threshold, then the algorithm places a higher belief that the words are more likely to be different and uses the Levenshtein distance metric instead.
As a final step, to identify which of the two keys is the rightful key (“Lengt” and “length” are both mapped to one entity, but the algorithm till step 4 does not know which is the right “word”), the word being compared is passed through an “auto-suggest”, which looks up predefined words in the English dictionary and outputs the candidate(s) with the lowest distance metric.
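The neighborhood check and the left/right swap check described in the steps above can be sketched roughly as follows. This is only an illustration in plain Python dictionaries (no networkx, so it runs without dependencies), not the code from the Github repository. The row layout, the hand split (the usual touch-typing split) and the names `ROWS`, `neighbors` and `is_hand_swap` are assumptions made for this sketch.

```python
# Assumed layout: the three letter rows of a QWERTY keyboard.
ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")

# Map each key to its (row, column) position.
POSITION = {key: (r, c) for r, row in enumerate(ROWS) for c, key in enumerate(row)}

# Assumed hand split, following the usual touch-typing convention.
LEFT_HAND = set("qwertasdfgzxcvb")
RIGHT_HAND = set("yuiophjklnm")


def neighbors(key):
    """Keys at most one row and one column away from `key` (distance 1)."""
    key = key.lower()
    r, c = POSITION[key]
    return {
        other
        for other, (r2, c2) in POSITION.items()
        if other != key and abs(r - r2) <= 1 and abs(c - c2) <= 1
    }


def is_hand_swap(typed, intended):
    """True if the words differ only by two consecutive characters that were
    swapped, one typed by the left hand and one by the right hand."""
    typed, intended = typed.lower(), intended.lower()
    if len(typed) != len(intended):
        return False
    diff = [i for i, (a, b) in enumerate(zip(typed, intended)) if a != b]
    if len(diff) != 2 or diff[1] - diff[0] != 1:
        return False
    i, j = diff
    if typed[i] != intended[j] or typed[j] != intended[i]:
        return False
    a, b = typed[i], typed[j]
    return (a in LEFT_HAND and b in RIGHT_HAND) or (a in RIGHT_HAND and b in LEFT_HAND)
```

With this layout, `neighbors("d")` gives W, E, R, S, F, X, C and V as in the example above, and `is_hand_swap("wkea", "weka")` is true because E sits on the left half and K on the right half.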

As a proposed extension to this algorithm, I intend to try out different dictionary approaches for auto-suggest (together with Apache Solr and Elasticsearch).

I used Python’s networkx to model the QWERTY keyboard, and built the QWERTY distance metric on top of it.
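The Levenshtein fallback and the final auto-suggest pass can be sketched in a few lines. This is a hedged illustration, not the repository code: the function names are hypothetical, the word list stands in for a real English dictionary, and the length threshold is one possible reading of the rule above (a length gap of at most one third of the longer word).

```python
def levenshtein(a, b):
    """Classic edit distance: insertions, deletions and substitutions cost 1."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def within_length_threshold(a, b, fraction=1 / 3):
    """Assumed reading of the threshold: the length gap must not exceed
    `fraction` of the longer word."""
    return abs(len(a) - len(b)) <= fraction * max(len(a), len(b))


def suggest(word, dictionary):
    """Return the dictionary entries with the lowest distance to `word`."""
    if not dictionary:
        return []
    word = word.lower()
    scored = [(levenshtein(word, entry.lower()), entry) for entry in dictionary]
    best = min(score for score, _ in scored)
    return [entry for score, entry in scored if score == best]
```

Note that plain Levenshtein scores the “weka”/“wkea” swap as 2, which is the kind of typo the left/right check is meant to treat more leniently.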

Complexity

As in other distance metrics, this distance metric computes (n-1) comparisons for each word, thereby outputting the word with the closest distance; over all n words this gives O(n²) complexity. After this, the auto-suggest feature narrows the scope down to the rightful candidate key(s). Overall, the complexity is O(n²).
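To make the count concrete: comparing every word against every other word takes n(n-1) evaluations, or n(n-1)/2 if the metric is symmetric and each pair is scored once. Both grow as O(n²). A small sketch, with a hypothetical helper name:

```python
from itertools import combinations, permutations


def comparison_counts(words):
    """Return (ordered, unordered) pair counts for an all-against-all comparison."""
    ordered = sum(1 for _ in permutations(words, 2))    # n * (n - 1)
    unordered = sum(1 for _ in combinations(words, 2))  # n * (n - 1) / 2
    return ordered, unordered
```

For four words this gives 12 ordered and 6 unordered comparisons.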

Performance metrics & Benchmark tests

The way to measure the performance of this algorithm is to pass it through a corpus of hand-collected data and compare its performance (time complexity and results) against other distance metrics, such as a pure Levenshtein distance application and Manhattan & Euclidean distance, though by rule of thumb Levenshtein will do better on pure distance search than the other two (Manhattan & Euclidean distance).

While modeling this problem, I also found a version of fuzzy string matching here – http://search.cpan.org/~krburton/String-KeyboardDistance-1.01/KeyboardDistance.pm and I will contrast the performance results against it (pending).

The code is on Github here: https://github.com/ekta1007/Custom-Distance-function-for-typos-in-hand-generated-datasets-with-QWERY-Keyboard

Suggestions and critique are, as always, welcome.

—

Stay Green & Growing,

Ekta

Tags: custom distance metric, experiments in data mining, graphs in networkx, Handling typos on QWERTY keyboard in your datasets, python

So, how much is your network really worth – Experiments in data-mining, disambiguation & Natural Language processing

Posted by: Ekta Grover on: July 3, 2013

So, how much is your network really worth – Experiments in data-mining, disambiguation & Natural Language processing [Part 1]

This problem is inspired by the baby problem mentioned in Matthew A. Russell’s Mining the Social Web, on mining Linkedin data for profit & fun. For the faint-hearted, short-of-time, show-me-the-meat-quick folks: skip directly to the results, or explore my code in Github here.

Original Problem Statement & Use-case for Motivation

The original idea was to fetch the data using Linkedin’s RESTful APIs and score the connections using a bunch of interfaces across people, social media (news feeds etc.), groups, communications, companies (proxy for revenue) and other metrics to score companies. The idea was to know who are the people who really add value to my network, and then scale this up by the roles I want to grow in. And here’s the fun part of it: since we are all startups in ourselves (Reid Hoffman’s The Start-up of You), I could pivot on the interesting insights that this data set might give. Now, you might think: why not hand curate all this, (hand) “analyze” it and see what works? Except that this does not scale, people keep transitioning all the time and so do your plans, and besides, why not build something beautiful instead? Also, I really wanted to work on something that lets me experiment end-to-end with data mining techniques, especially on disambiguating entities and the natural language processing that I am learning.

Challenges

The problem I faced here was multi-fold: some companies did not have official revenue numbers, and this data set had to be hand curated. The bigger problem, though, was that my architecture was so messed up that I could no longer validate that what I see is what I “want” to get, since I was coding all this for “scale” in map-reduce on data I fetched via Linkedin’s APIs (JSON, and later the responses in the default XML format). So, to tackle it hands-on, I broke the problem down (Divide & Conquer) to work only on “companies” for now. On second thought, this made my life simpler: now I don’t have to hand curate revenue numbers (as a proxy for company reputation, amongst others) for “all” the companies. I can focus on hand curating the top 20%, say, since after that things really thin out at the tails, as is confirmed by the frequency distribution of the data.

What follows is the real hands-on from the baby problem I attacked, to disambiguate & do frequency plots on these company names.

Note that the frequency plot above is pruned at the Top 20 entities (weighted by frequency). As I mentioned, there are 761 entities in total, but plotting them all in R & Inkscape gets cluttered. [If you have smarter ways to visualize, like text that appears when hovered over with the mouse instead of written text, do drop a comment below. I am yet to experiment on the visualization front.]

Data set :

761 data points on imported contacts from Linkedin, about the companies that people in my immediate network work with. (You can import this manually, or use my code in Github here, and see your personal wealth of Network for yourself!)

Methodology & Algorithm :

The prelim step involves looking at the data and doing frequency plots on the raw data. This is important because it gives you an intimate knowledge of the 80-20 efforts that you should focus on, which is especially critical when processing natural language data sets, and it gives an idea of the transformations to do on the data, including filtering for stop words. This matters since I do frequent item set mining and don’t want to give more weightage to stop words like “the” in “The Bank of America”, “The Boston Consulting”, “The Bank of New York Mellon”. This would otherwise have basketed all these items under the same “key” (or the “stem” of a unique entity): since it would find no similarity between Bank & Boston, it would incorrectly interpret the name as “The”. This trick, of course, is only possible since I had a good look at the frequency maps of the raw data set.

[On] Answering the WHY questions

Before I get to the algorithm, I will answer a few “why” questions. One: why did I choose frequent item set mining over pure distance measures or semantic distance measures? The reason lies in the attributes of the data that I had. In this case, with natural language constructs, semantic distance or a pure distance metric would have done significantly worse, and here’s why:

We would want {Though Works, Thought Works India, Thought Works US} all to be in the same basket, and we ALSO NEED a way to know the “stem” of this entity. A distance metric would fare average in this case, but it would fare worse in cases such as this:

{The Bank of America, The Bank of Boston, The Bank of New York}. Choosing an “intelligent” threshold would be the bottleneck, with a lot of manual hovering around the “best” distance to classify, had we chosen a distance metric over frequent item set mining.
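A quick worked example shows why a single threshold is hard to pick. The sketch below uses plain Levenshtein distance as a stand-in for “a pure distance metric” (an assumption; the post does not say which metric was tried) on lower-cased versions of the names above.

```python
def levenshtein(a, b):
    """Classic edit distance: insertions, deletions and substitutions cost 1."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


# Pairs that should merge (same company) and a pair that should not.
same_typo = levenshtein("though works", "thought works")
same_suffix = levenshtein("thought works india", "thought works us")
different = levenshtein("the bank of america", "the bank of boston")
```

Here the typo pair is 1 edit apart, the two Thought Works offices are 5 apart, and the two different banks are 7 apart. A threshold loose enough to merge the offices is only two edits away from merging unrelated banks.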

ALGORITHM

1. Transform the data to handle any known abbreviations (for example, [24]7 inc is the same as 247), convert to lower case and remove the stop words. This will create a list of similar companies. With this I create a list of companies keyed by the first character of each company name, with the similar entities as the value.

[The key is to think about reducing the working space, so that it is “scannable by human eyes” for discrepancies.]

2. Remove the encoding (very important, since I had internationalization in my input sample) and do frequent item set mining on the transformed data set. For this, I fetch the output of step 1 and use key, value pairs such as this:

{‘Thought’: ‘Thought works ltd’, ‘Thought works India’, ‘Thought works US’}

---- which then transforms to ----

{‘Thought’: ‘Thought works’, ‘Thought works’, ‘Thought works’} [After entity disambiguation using frequent item set mining. Note that I carefully choose the “support” for the basket as the total length of the basket, i.e. 3 in this case. Thus, “Thought works” had to occur in all baskets to qualify as a “(similar) unique disambiguated entity”.]

This will finally disambiguate and interpret Thought Works as ONE entity. Finally, I use the concept of flattening a list of lists, since some entities appear multiple times – and will thus be transformed as lists of all similar entities.

3. Do data cosmetics (convert the first character of each word to upper case, except in stop words; recall that I had first converted everything to lowercase) and plot the pretty table on the “standardized data set” to get the final frequency counts!

4. Use R to plot these frequencies as nodes in a graph to visualize them, and edit the result in Inkscape.
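Steps 1 and 3 above can be sketched as follows. This is a minimal illustration and not the code from the Github repository: the stop-word set, the abbreviation table and the function names are assumptions, and the buckets are keyed by the first remaining word, as in the ‘Thought’ example.

```python
from collections import Counter, defaultdict

STOP_WORDS = {"the", "of", "and"}     # assumed, illustrative stop-word list
ABBREVIATIONS = {"[24]7 inc": "247"}  # known abbreviation from step 1


def normalize(name):
    """Step 1: lower-case, expand known abbreviations, drop stop words."""
    lowered = name.strip().lower()
    lowered = ABBREVIATIONS.get(lowered, lowered)
    return " ".join(word for word in lowered.split() if word not in STOP_WORDS)


def bucket_by_first_word(names):
    """Group normalized names under their first word to shrink the search space."""
    buckets = defaultdict(list)
    for name in names:
        cleaned = normalize(name)
        if cleaned:
            buckets[cleaned.split()[0]].append(cleaned)
    return dict(buckets)


def title_case(name):
    """Step 3 cosmetics: capitalize every word except stop words."""
    return " ".join(w if w in STOP_WORDS else w.capitalize() for w in name.split())


def frequency_table(standardized_names):
    """Step 3: final frequency counts on the standardized names."""
    return Counter(title_case(name) for name in standardized_names)
```

Bucketing first keeps each later comparison local to a handful of names, which is what makes the result scannable by eye.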

Data structures, & Design Paradigm

I use the Greedy and Divide & Conquer design paradigms to approach this problem and reduce the sample size of “searches for similar companies” in step 2. Apart from this, I exploited “dictionaries” for quick searches & lookups and for getting the “stem entity”; “frozen sets” for freezing the tuples/sets after the frequent item set mining; a clever use of “support” for the “bucket”; and sets for searching the “frequent items” in baskets that have duplicates [recall that the main difference between a set and a list is that the former is unordered and cannot have duplicates by construction, so this saves time on lookups and narrows down the work space in case the “entire basket” consists of identical items]. I also rely on the fact that similar & identical items will lie locally. More on this is in the Python code (the py file at Github).

My Python code (steps 1-3) and the input data set I used are here at Github, and so is the code to plot the graph in R and modify it in Inkscape (step 4).
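The role of sets, frozen sets and a “support” equal to the basket length can be illustrated like this. It is a simplified sketch under stated assumptions (lower-cased names, word-level items, hypothetical function names), not the repository code: a word belongs to the stem only if it occurs in every name of the basket.

```python
from itertools import chain


def common_stem(basket):
    """Words present in every name of the basket (support == len(basket)),
    kept in the order they appear in the first name."""
    if not basket:
        return ""
    token_sets = [frozenset(name.split()) for name in basket]
    shared = frozenset.intersection(*token_sets)
    return " ".join(word for word in basket[0].split() if word in shared)


def disambiguate(buckets):
    """Replace every name in a basket by the basket's common stem."""
    return {key: [common_stem(basket)] * len(basket) for key, basket in buckets.items()}


def flatten(list_of_lists):
    """Flatten the per-key lists into one list for the frequency counts."""
    return list(chain.from_iterable(list_of_lists))
```

Set intersection does the full-support check in one call, and a basket of identical names collapses to a single set, so there is nothing left to compare.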

There are three additional secrets to solving this problem, more like a secret sauce of beauty -:

1. I always challenge myself to simulate as if I am in an “interview” and am asked to solve this question. Since the natural temptation would be to think hardER, this is where the optimal approach comes in: accounting for the design paradigm (Divide & Conquer, Greedy, Dynamic Programming: why re-solve that which can be “outsourced” after having solved it in the first instance) and testing for corner cases.

2. I always try to build for scale and as generically as possible, so that I can re-use my code on a similar problem space. Why re-solve that which can be borrowed from your own problem-solving space?

3. Think 1970’s, when both disk space & processing power were limited. This is how your algorithm will end up more resourceful than you think you can make it at first.

Together these elements take care of the beauty and elegance of the core engineering problem. Happy solving!

CRITIQUE is always appreciated. Do post your comments on what does not work & how I can make this better. Better still, if you are looking to collaborate on the use case, as in the opening of the post above, reach out to me at Linkedin or here.

Results & Resources

linkedin5_inkspace (PDF file)

Linkedin_processed_file.csv (Frequency maps on the Transformed file)

All of this in Github

—

Stay Tall,

Ekta

Tags: big data, data mining, data structures in data mining, disambiguation problem in data mining, Ekta Grover, how to solve an end-to-end data mining problem, Linkedin Network, machine learning, Natural language processing, Thinking data mining problems, Using Linkedin's data for data mining

Structured Networking for Women

Posted by: Ekta Grover on: January 17, 2013

In: Uncategorized

Tailored from my original post at the Grace Hopper Celebration for Women in Computing, 2012 here.

Network, Network, Network – and do it right !

Networking is not “Foreplay” – and it isn’t creating Black holes either ..

And here’s why – :

Women haven’t traditionally had the benefit of “smoke-time” networking that seems to work so well for men (nor any womanized version of it). That said – surely there must be ways that amazing superwomen around the world are making it into the corner offices.

As a society, we have been forced to focus on getting business cards rather than building relationships that can be leveraged practically. This is equivalent of building black holes around us, in an attempt that someday we will find matter that will fill the hole.

The only problem with this approach is two fold: one, you are selling a bad product, and two, your self-promotional pitch was so unmemorable that the speaker decided not to set aside some processing power in her/his brain to remember you. No one remembers someone with an inflated sense of self-importance. There is a classical statistic from speed dating, which found that the men who hardly spoke but genuinely listened ended up getting dates more often than the other men who focused on speaking for a greater fraction of the speed dating.

At the other end of the spectrum, I observe, often to my utter dismay, the fawning or servile approach. Rather than feeding their own self, the focus reverses to the person whom they deem influential enough. It may not be a conscious effort, and may simply stem from a lack of awareness of one’s real do-ability. Specifically in Eastern cultures, including India, people are maneuvered to cushion their interests to the point of obscurity, which is why information and opportunities sometimes pass them by without their even taking notice. So, if you have a strong point, tie it together without taking the detour.

I said earlier that the focus should be on forging mutually beneficial relationships, and here’s why: when you focus on creating black holes, as I call them, you are missing out on accidental benefits – amazing support systems, sponsors who would endorse you or vouch for you, mentoring opportunities, or just plain unbiased viewpoints – when you need them. So share something phenomenal that people will remember: spin relevant stories, be inquisitive and keep re-learning, and most of all, stay true to who you really are.

Differentiate the position from the person, be assertive, don’t downplay yourself, and yet don’t oversell. It is a classical balance problem.

“Network, and network with an end-goal in mind, which is two fold: one, to create spheres of influence, and two, to create information advantage.” – Saundarya Rajesh, AVTAR

Four quick thoughts that sum this up:

1. Focus on the end goal and then work backward: Accidents -> acquaintances -> associates -> advocates -> allies. The point is simple: if you want to move from “foreplay” conversations to forging lasting, mutually beneficial relationships, you will invariably have to focus on quality.

2. Measure it: If you are not moving from acquaintances to allies over a period of time, the networks don’t serve a deeper meaning. You see, input alone doesn’t count, what you made of it does. The “measuring” bit doesn’t have to be as flashy as Excel but should suffice to give you a reinforcing feedback loop on your overall interpersonal and communication skills.

3. Have a compelling elevator pitch: An elevator pitch is something that gives a compelling introduction to who you are. Craft one, and try to improvise depending on the listener and her/his background and intellectual giftedness. Of course this means that you also change dynamically depending on the sophistication that the listener might have about your profession.

4. Demonstrate equality: You can’t have an enriching conversation when you do not consider yourself to be one among equals. Equality begets equality, and it begins with awareness of who you are and what you stand for. Reboot your operating system and get moving – and yes, practice, deliver and then practice again.

The theory of 10,000 hours, from different schools of thought, says that deliberate repetition is key. So go develop a lens for the world you want to live in, and grow. Then just reverse engineer what you want to achieve from any relationship that could help you towards it, and work your way through it. Of course, this assumes you focus on forging a mutually beneficial relationship and are not just trying to get into a parasitic one.

Guess what, we all like to push compelling and competitive candidatures.
Now, Go Rock!

Also published here – http://www.flexicareersindia.com/newsletter/jan2013/newsletter.htm

Tags: Ekta Grover, Grace Hopper Celebration for Women in Computing, Structured Networking, Structured Networking for Women

Emerging thoughts on Quora’s Business model

Posted by: Ekta Grover on: September 11, 2012

In: Uncategorized

The larger question, and the meat of it: wouldn’t the user-pays subscription model work as well for Quora?

Here’s what I mean: as a user, I navigate about 10-20 questions on Quora every time I take out “time” that I am willing to engage myself in. Then I go back and start to work on something, hit discussions in a multitude of channels, and I forget. That is when I want to refer back to the same fact, in the organized way I first saw it in its context.

I think, that is part of the opportunity for Quora. As Achilleas Vortselas mentioned (at Quora) , “Quora as middleman”, I envision the next layer of structuring and personalization to build on top of the collective knowledge of the community.

I want to (and I hope that other users want to, as well) organize this information in a useful way. That could mean highlighting the pages that are “specific to me” only, like individual page reads. Similarly, as a user I would be willing to pay for premium services like organizing my knowledge and contacts and, since it is personal, it would be non-obtrusive to anyone else. It’s my personal learning dashboard that I can cross-refer to years from now, since I find information in a tier-based fashion: one, by meta tags AND/OR (Quora) Boards, and two, by my personal notes.

[Hypothesis: Since we are building social capital here, the content management, will take care of itself. ]

Challenge & opportunity : Information retrieval from the end user’s perspective.

[The Public Good…]
With literally tons of CEOs and executives engaging in this model, subscription should be able to make money. Maybe, in the future, Quora would make part of its revenue from firms funding their top employees; that is the future of the Crowd Sourcing Revolution that we are beginning to see.

The future of this will be firms sponsoring employees or committing resources to work on abstract problems and new concepts, and “Grooming Aggregators”, since it will be far cheaper than the Home-Grown Talent model, which is difficult to source, retain and keep engaged all through the project.

In some sense, it is how Rating agencies make money. Everyone else pools in, because everyone else does, and that is the social good at its BEST.

[Hypothesis: Knowledge workers have always and will continue to pay for tools that make them more efficient ]

Challenge & opportunity: Of course, that needs high computing power and read-writes to store the user-specific information, and engineering a solution that can scale as well. This is where the guiding rule of making money comes in: the money made is roughly equal to (Revenue - Cost of engineering the solution), and it should be positive.

So my Quick points here –
User pays subscription model

a. User specific page read/write(s) and features to organize and personalise the information retrieval for later use.

b. Putting structure in YOUR personal Network – listing and saving personal content/notes for lookup to organize the contacts from mere “followers and people who follow” paradigm .

c. The Public good : Rise of Crowd Sourcing and Social capital.

Business opportunity: Innovating on the user-specific needs (behavioral & business needs) and tailoring subscription plans that match these needs.

PPS: Don’t pick me on words 🙂 “Make money” is a dirty rule, but that’s what we are talking about here.

This article is a part of my original piece at Quora here . Cosmetics and Technical updates/stack probe in progress.

—

Ekta

On Energy, Intellect , Brand “You” and finding the CPI

Posted by: Ekta Grover on: September 4, 2012

In: The Think Tank

(Dear God, please tell my Mom I am doing very fine, and I still think that studying Quantitative Economics, and flying over 16 countries all the way was worth this Education)

Last week I was on my longest flight so far, from the far west to the far east, and on my way back I bumped into a gentleman who was coming back from Mongolia (the capital, Ulan Bator, to be precise). Talking with him made this one of my shortest trips ever. He was into the marketing side of construction, and I am a bottom-up economist and autodidact. So what could we talk about?

I had no clue about construction, or marketing in the construction business, or even Mongolia. All I knew about Mongolia was that they have huge deposits of coal. Some people might hit a stalemate at that (what can you really talk about?), while I was like: tell me more about it!

We talked about the Mongolian empire, how the Mongols built monasteries all the way to Hungary (we could not agree whether the Mongols came up to Hungary, but that is for some other time), its comparison with the Romans, and the influence of the Mongols on food. And seamlessly, we moved to the Chinese dynasties, the Han empire and the Qing dynasty, and how the emperors had tried to bring in Buddhism to bind people together. Then we moved to the dispute over Tibet and Taiwan, and what I had learnt at the Harvard Project for Asian & International Relations in Taipei, Taiwan, that I was just coming from.

Come to think of it – it was a powerful conversation(and I really learnt a lot). Now that is the thing, finding CPI , or the common point of interest sets you apart from the rest. And come to think of it, it’s really an art. For once he did not tell me what a superwoman I was ( I like telling myself that I am one) – but I instantly knew that he enjoyed talking to me, as much as I did.

This being able to talk to people “about them” has been my journey of the last 4-5 years. I have practiced and matured it on CEO**’s and Prof’s – but that is not to brag- the point is about learning.

So my quick take-aways –

1. People are always willing to “Teach” you, once they see the intellect + energy + enthusiasm in you to learn. This is how we grow, and it is a more sustainable way of learning than feeding some decaying facts into your brain.

2. You never choose your mentor. Your mentor chooses you. Point. (Borrowed from Indra Nooyi)

3. Always push meritocratic people, even if you have to go out of THE way to get them some limelight, and then just keep passing the ball.

4. You don’t ask, you don’t get. Under-rated, over-said, but true.

5. No matter what you want to learn or know, stop the foreplay, and get to the CPI , quick enough to hold them.

6. What goes around comes around – Gratitude can not be taught, practice it.

7. There is no single character that can beat your Genuineness – you will know it when it has arrived to you.

8. Push & stretch yourself, and then stretch yourself a little more.

9. You are beautiful, because of your words and your self awareness and that is something that sets you apart and a Brand “YOU” in this dynamic very competitive world.

10 . We are more than the organizations , and schools we represent – and in some way the hierarchies of power are shifting bottom up . How YOU speak really defines your School, your community and your organisation and your people.

11. NEVER stop learning.

12. Everything around you has always been there, but when you meet people who have been there, done that, it sets a context that you can build on, as if things, people, places and empires have suddenly started to exist in your consciousness. Beg, borrow, steal, but GET that context right in your head; you will be amazed at how much you can really learn, after all.

13. Stop being mediocre, and stop when you have made your point. And, hold your drink, if you don’t have that super take-away 🙂

(Emerging ideas from my book – 20’s is the new 30’s, planning the dots, you will want to connect backward. Dedicated to Annie Fan, a superwoman who knows how to listen, and listen well)

—

e-hugs,

Ekta

** John Kearon, CEO, Brainjuicer that even shaped 4 months of my work – My Master’s Thesis – On how we think we think, is not how we really think . In some ways, that is THE bit about connecting the dots.

Tags: 20's is the new 30's - planning the dots, CPI, Ekta Grover, HPAIR, Meritocracy, you will want to connect backward

Participatory Public Policy – What India can learn from the West

Posted by: Ekta Grover on: August 11, 2012

In: Uncategorized

Traditionally, the goal of e-governance projects across the world has been, broadly, twofold: ensuring a faster service time to address the pain points, and lowering the cost of delivering public services. However, as new technologies such as social media and big data emerge, governments should re-think e-governance not merely as technology-centric, but as a tool for participatory public policy.

The possibilities that social media has opened up for governments are many. For starters, consider the GovLab, an initiative in the USA which works with senior government executives and thought leaders from across the globe and runs controlled experiments for better public policy, while also helping reduce the costs of providing services. Or consider the @sweden initiative by the government of Sweden which, along with an advertising company, runs a Twitter account controlled by ordinary citizens for seven days each, on a rolling basis. This project aims at better governance and engagement of its citizens by amplifying their voice in a transparent manner, while also supporting the tourism industry in Sweden.

At the other end of the spectrum are projects aimed at reinforcing good citizenship based on behavioural economics: across health, reducing fecal pollution by dogs by inducing dog-owners to clean up (Taiwan), and creating incentives to ask for receipts and beat corruption in tax compliance (mainland China). These are experiments the rest of the world is exploring to change behaviours in ways that stick.

In the Indian context – now consider delivering mobile health care, where social media listening tools can offer countless opportunities to help deploy resources in an optimized manner – allowing for an efficient delivery to the “last mile”, and developing key infrastructure and utilities in the currently underserved districts. Or, consider supplementing the RBI’s latest e-governance application and online tracking system in its foreign exchange department with social media initiatives such as those developed by Clemson University, USA– that used social media listening tools to predict the direction of stock market and foreign exchange. Having similar projects can help triangulate the consumer and Foreign investor sentiment helping the central banks and governments to handle pressing monetary policy issues in a dynamic manner. The benefits are manifold – allowing for a better planning and attracting the fleeting FDI by signalling the trends in consumer market thereby re-enforcing that India is indeed a key market from where future growth will emerge.

Or look at the aviation, FMCG or telecommunication sectors – by bringing symmetry to the information in the markets, we can supplement existing institutions like the Competition Commission of India (CCI) in extracting useful information on consumer surplus, to help decide how best to let the competitive landscape emerge and help businesses grow. Supplementing their insight with an additional triangulation of relevant and contemporaneous data will allow us to do what is actually in the consumer interest, rather than what “they” think it is. This is the helm of participatory governance.

Bringing in such innovative, disruptive projects will also help entice top talent into the public sector and help revive the competitive landscape by restricting brain drain to the much coveted private sector. By using the wisdom of crowds and crowdsourcing its problems, the government can create a rich, vibrant pool of ideas in a fair, meritocratic manner, while keeping the overall costs of the project considerably low. We opened up our markets to the world in 1991, and now it is time to open governance to our citizens and best brains, by making it relevant. Of course, like all sectors, in its bare bones it has manifold challenges. The first is cultivating the mindset – but the good news is the developing trend of today’s dynamic youth to lead parallel careers and consultative projects for the social good. Then there is a disheartening internet penetration of a mere 10.2% in India, and the need for regulating the “social listening” if at all such pilot programs are launched. Yet against all of this, we, together, are more resourceful than the resources that constrain us.

In conclusion, amplifying the voice of the citizens in a transparent manner can help governments do a cost-benefit analysis of which public goods to develop, allowing for better five-year plans – essentially outsourcing its public policy and developing a truly meritocratic governance. This, in true terms, is the largest democracy: for the people, of the people and by the people.

The West is changing its traditional structures, and shifting the democracies bottom up – it’s time we caught up, too. Incredible India, after all.

Tags: Ekta Grover, Participatory Public Policy, Public Policy

Not just a Mark alone – The difference between Education and Getting Educated

Posted by: Ekta Grover on: July 13, 2012

In: Uncategorized

[In hindsight, everything falls in place, everything.. ]

No matter how well I perform, I am never satisfied with my performance in the “Blue Book”. One reason for this is that my handwriting has always been so bad that, if asked to read it myself, it would take me an hour to decipher my own letters. (Thankfully I now type all my assignments.)

So, taking you back in time to 2006, and one of these “Blue Book Blues” chapters. As always, I was not happy with how I had performed. And I found myself sitting before the head of my department, Prof. Nitin V Pujari, in tears over a mark – to which he asked, “It is just one mark, does it really matter enough to weep?”

Sulking and clearing my choked throat, I replied, “Yes Sir, because I have worked for this extra mark.” He not only gave me that extra mark, but also told me something that I will never forget, and it all comes as déjà vu to me, even to this day.

“Tomorrow you will graduate and today you are more than willing to work (in Industry). But day after tomorrow you will want to come back and go back to study again. That is life, but you will not understand it today.”

Like everyone else, while at college I always considered myself to be a hard working student. But, life being what it is, I seldom looked at opportunities to grow beyond that as a person, to collaborate with a network of researchers, or with passionate fellow students who were then building software to remotely manage their personal computers, or rewiring the bus(es) in their computers with their newfound knowledge of integrated circuits. As for my 2006 version, I never understood how to apply my newfound knowledge, one that consumed 4 years of my life, and which I got by securing an All India rank of 188, something I am proud of to this day.

So we know there is a gap, with something amiss: “something” we expect “somebody” to give us “somewhere”. Our parents, and often we ourselves, come to think of this “something” as “Education”, “somebody” as “Teachers”, and “somewhere” as the “College”. The question that then comes is three-fold: one, how to apply your education; two, how to make the most of it; and three, how did I change, and how can you evolve, too? (Hopefully for the better.) I will try to answer all these questions sequentially, borrowing ideas from my forthcoming book.

The reason we don’t apply is that we either do not know how, or that we focus too much energy on escaping the rat race rather than building our skill sets and grooming ourselves. Anyone who understands basic demand and supply can intuitively see that if there is a high supply of a skill, neck-and-neck competition will follow to supply the limited demand.

Unless you really know what you want to do with your life, when you leave college you will really be generalists, certified to be moulded into organizations. And as organizations start to demand plug-and-play workers, you will be judged in your respective jobs on what you bring to the table, and this will eventually shape your career and how soon you grow, both personally and professionally.

How to make the most of your education is really a very personalized blueprint that I wish all students would develop while still at college. In essence this links to the “application of ideas” I mentioned above, and would require expanding the “eco-system of your skill sets” in the direction in which you really want to grow. This means that to connect your dots backward, you will have to CREATE those dots in the first place.

And lastly, how do you start? Well, there is no single answer. It is about stretching yourself and then stretching a little more – for willpower is like a muscle that has to be exercised every single time you are faced with something you think you can remotely achieve. Organizations and their needs are changing, and with that you have to evolve too, and fast enough – and, most importantly, continue to invest in yourself, not for a job but for a life-long process of learning and growing as a person.

When you really look at life, it is just a series of actions that we consciously choose to do that shape our perceptions and thought processes – and in these formative years it is this thought process that decides how soon you will get there. Your mind can achieve truly miraculous feats, if you let it. It is about learning the difference between information, which is just rapidly aging facts, and knowledge: to truly learn, we must make sense of this information and use it to build knowledge.

When I was a Bachelors student, I had never imagined I would change lanes completely from Computer Science to Quantitative Economics, my current Masters, which I will shortly finish. In between, I have travelled to 6 countries across 2 continents, failed another exam, and rebuilt myself all over again in a completely new culture and language I knew nothing about. I have found in myself the courage to shut up that voice that tells me why I can’t do what I set out to do, and to re-evolve and compete with my international peers in a Ph.D. program, when I did not even have a Bachelors education in Economics.

I have matured as a person, learnt how our inactions count as much as our actions and, most importantly, I have learnt how to listen and express gratitude for all that I have. And for who I am today, I have to thank my teachers for telling me all this even when I was not receptive, for otherwise I would have missed this boat, which I now call life, altogether.

—

ekta

The way we re-evolve, re-learn & re-optimize – The real evolution of applied economics (and the death of SEO, SEM & SMO’s)

Posted by: Ekta Grover on: May 1, 2012

In: Uncategorized

[On applied economics in business models, revenue growth & blue ocean strategies for the passer-by customers]

And then, there was a “Google” with its much coveted page ranking algorithm, but the users and the companies betting their money & resources on the web wanted more, and they wanted it fast.

So was born the Big data wave (well there was a lot more in the space between.. ). So what is this Big data wave, and what is the ecosystem behind it?

Quite simply, it is this: given a piece of information, how likely is it that this behaviour will be “consistent”? Or, given that I see a response A, what is the probability that it will continue to be (within some confidence interval) still A? At their heart, Bayesian classifiers and the whole lot of quantitative econometrics and modelling try to do just this – except for the fact that the data we actually have is just NOT enough to MAKE scientific decisions. Now let’s understand why this is so –
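To make the “will it still be A?” question concrete, here is a minimal sketch of one way to pose it, assuming a simple Beta-Binomial model with a uniform prior. The function name `posterior_of_a`, the prior values and the sample responses are my own illustrative choices, not something taken from any particular Big data product. It shows how the estimate stays wide when there are only a handful of observations.

```python
from math import sqrt

def posterior_of_a(observations, prior_a=1.0, prior_not_a=1.0):
    """Return (mean, std) of the Beta posterior for P(next response is "A").

    Assumes each response is an independent draw; the prior counts are
    hypothetical defaults (1, 1), i.e. a uniform prior.
    """
    hits = sum(1 for response in observations if response == "A")
    alpha = prior_a + hits
    beta = prior_not_a + (len(observations) - hits)
    total = alpha + beta
    mean = alpha / total
    # Standard deviation of a Beta(alpha, beta) distribution.
    std = sqrt(alpha * beta / (total ** 2 * (total + 1)))
    return mean, std

# Made-up responses: the same 3-in-4 pattern, seen 4 times vs 200 times.
few = ["A", "A", "B", "A"]
many = few * 50

mean_few, std_few = posterior_of_a(few)
mean_many, std_many = posterior_of_a(many)
print(f"4 observations:   {mean_few:.3f} +/- {std_few:.3f}")
print(f"200 observations: {mean_many:.3f} +/- {std_many:.3f}")
```

The spread shrinks only as the observations pile up, which is one way to read the point that a short data history is not enough to bet on.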

Simply because the NEED to capture so many dimensions of data has evolved only very recently, and thus we cannot really back-test the models. It is as good as saying: well, look, we have a CLO / CDO instrument (the ones behind the 2008 global financial meltdown), but we do not know whether we can really bet on it – and now, surprise! Companies value it like they value money: since everybody does, this must be WORKING.

So is this picture true or not?

Well, it depends. Yes, having an additional way to triangulate / benchmark business decisions must be good; having a deep insight into customer buying patterns, and knowing how best to facilitate the customer’s decision making, must be good too. But the fact remains simple –

“Given that information becomes a commodity, how do we price it?” – how do we know the price points that this information can be “traded at”? This question was originally raised in a Big data talk at the Churchill Club, but it has stayed with me. Ever since, I have asked this question of the CEOs of research analytics firms, marketing & PR firms, and people in the Big data wave. They say: well, right now we are beginning to understand how to measure different dimensions of the consumer’s willingness to pay, and the more and better insight we have into this, the better we will be able to “classify this information”. It is like intelligent filtering: the more intelligence pumps in, the better, but more is never enough, since it is so qualitative.

On top of this wave rides another wave, that of social media listening tools, trying to draw trends and analysis and adapt before being lost in the over-social, connected, spilling wave. This wave relies on trust: if you throw in an assumption that people put their best foot forward, most of the quantitative models would collapse on the predictability count – and they, like money, are valued solely because “everyone buys this currency”. So as consumers get more and more aware of the “listening”, they will get more “probabilistically sophisticated”. It is hard to say whether that is good or bad, but surely that evolution will happen within my career trajectory, and it would be wonderful to ride this wave and not just be another “passer-by”. So welcome again to the new evolution, which is essentially “inside out” of the consumer needs, but just with some frills attached, maybe because “the consumer himself does not know what he wants”.

There is one more thought that I want to leave you with: those that make the choices simpler, faster and more guilt-free, and use rewarding schemes, will get there first – and of course, cascading through the choice paradigm of the consumer has never been as challenging as this. So knowing that I love coffee only when it rains, and only in the evening, is powerful information. This is the goal: knowing when, what and how I like to consume goods & services. This is the future of personal marketing.

(I used to write a lot about permission marketing, but I think loyalty is at an all-time low, and so is the ROI on the same.)

So, modelling has evolved, and so have revenue metrics, “customer satisfaction” and the “needs of the customer” in the age of “good to have, but why pay”. The good news is that though customers spend a lot of time with differing services, they are getting less price sensitive when it comes to anything digital/technology (controlling for the business cycle effect). In my previous post, I also described how passers-by are being tricked into becoming consumers (pun intended). This, in my opinion, is the evolution of economics – and this evolution is VERY PROMISING, indeed!

So welcome to a world where all sciences come together in this one world!

And businesses always want more; they value growth & blue oceans. That’s where the meat is, after all!

—

Ekta Grover

Tags: Bayesian classifiers, Big data wave, business models, economics, passer by customer, Quantitative Econometrics, scientific decision making, SEO

What the new digital currency can teach us about e-commerce, and developing new business models around it?

Posted by: Ekta Grover on: March 15, 2012

In: Uncategorized

(Dear God, wherever you are, please tell my Mom that it was the best for me to change lanes from Computer Science to Quantitative Economics, and that I now understand the world more completely 🙂 )

[Includes excerpts from the “Alternative Payment methods” workshop with Kai Boyd, CEO of Deal United, March 2012]

We know the stuff about digital games, tracking our trends and Big data, and the conversation starts exactly from this: “How can we make money in the ever increasing distributed e-commerce world?”

I attended a workshop with Kai Boyd, CEO of Deal United, a firm that invented Pay-By-Shopping. Now, virtual currency has been around for long, with PayPal being the market leader by far (by revenue).

These are some of the statistics he mentioned (**Please see disclaimer)

Virtual currency constitutes $0.5 billion of Facebook’s $1.5 billion revenue.

Of the 850 million users, 20-30% play games online.

Of these, the majority are women aged 22-55.

Consumers “bought goods” worth as much as 10 physical/real dollars in the virtual garden. Could you believe people buying virtual pigs for $10??

After a brief background on online retail in gaming and other goods, we discussed the monetizing opportunities in the digital goods space. We then moved over to the case studies, and each team had to make pitches on the strength of the business model and every metric we could consider for its growth – which could be customer growth, merchant growth per month, the facility of paying in local currency for transparency, double-layer security, or no blocked-up fees as with PayPal (more on this later).

The following were the case studies we had to perform:

SupersonicAds, Pay by Shopping, Click And Buy, sofortüberweisung, Amazon Payments and of course PayPal.

As for Pay.by.shopping and other players in this space, their model is simple:
Track the users using the sophisticated analytics tools that we have, to know what a consumer would be “willing to buy”. Through this, you have panel data on consumers and a matrix of things each of them might want to buy. All this could effectively be tracked via social media analytical solutions, such as semantic listening/research tools like Crimson Hexagon and Radian6.
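As a rough illustration of what such a “matrix of things he might want to buy” could look like, here is a minimal sketch. The event format, the action names and the weights in `ACTION_WEIGHTS` are all hypothetical, made up for this example; I do not know how Pay.by.shopping or the listening tools actually score behaviour.

```python
from collections import defaultdict

# Hypothetical scoring of tracked actions; declining still carries a signal.
ACTION_WEIGHTS = {"viewed": 1.0, "declined": -0.5, "bought": 3.0}

def interest_matrix(events):
    """Build a consumer x item table of interest scores from tracked events.

    Each event is a (consumer, item, action) tuple; scores are summed.
    """
    matrix = defaultdict(lambda: defaultdict(float))
    for consumer, item, action in events:
        matrix[consumer][item] += ACTION_WEIGHTS[action]
    return matrix

def top_item(matrix, consumer):
    """Return the item with the highest interest score for one consumer."""
    scores = matrix[consumer]
    return max(scores, key=scores.get)

# Made-up tracking data for two consumers.
events = [
    ("c1", "flowers", "viewed"),
    ("c1", "flowers", "bought"),
    ("c1", "virtual sheep", "declined"),
    ("c2", "virtual sheep", "viewed"),
]
matrix = interest_matrix(events)
print(top_item(matrix, "c1"))
```

A real system would track these scores over time (the “panel” part) and use far richer signals, but the shape of the data is the same: consumers in the rows, candidate purchases in the columns.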

[Super hint: Quora pretty much covers this with very detailed crowd-sourced answers – please check them 🙂 There is a super amazing post on Zynga’s analytics and (early) success, again at Quora; check it out.]

Now pause for a moment, and think about the last time you went online to play a game, and quit just before you had to pay for an “advanced round” to buy sheep, or goats, or whatever (who buys virtual sheep?). Now you see why they say “your actions are as important as your inactions” – which is to say that by declining you are still giving the analytics algorithm, if you will, an opportunity to know about your preferences.
Next week is your girlfriend’s birthday, and Facebook promptly reminds you of flowers, or expensive jewellery, or a dress (you get the point). But here’s the trick:
You get to pay for the flowers and keep the “game” for free in a virtual world.

According to behavioral economics and neuroscience, having kept the “game”, or software, or whatever it is you were willing to “try” but not “buy”, you get extra utility from the seemingly free, no-strings-attached offer.

EXCEPT that –
According to what we know from “Influence” and gaming, once you keep something, the mind’s natural tendency is to continue using it, even though after some time you would have to pay for it. Like free copies of newspapers, like 3-month trials (cancellable afterwards), and the list goes on.

Where is the OPPORTUNITY ?

There is a BIG space that all analytics companies are trying to capitalize on: developing reliable means to interpret this “Big Data”. It is HUGE money and a very recent way of making scientific decisions in revenue growth, sales and customer segmentation – across telecommunication companies for tariff plans, and online retail companies & supermarkets for customer segmentation for premium & commodity products.
(The telecommunication sector and online retail are two main spaces where the big players have virtually killed most margins to boost volume growth.)

The Workshop

Later in the workshop we were divided into teams and had to compare various “digital currency” payment companies, like PayPal, Pay-By-Shopping etc. I think I did a fair job of crisply pitching why Click&Buy was better vis-à-vis PayPal.

The seemingly innocuous comparison between Click&Buy and PayPal just reduces to this:

1. Double layer of security – SSL encryption & the security from normal bank registration processes

2. No fixed sum of money that rests in your account for verification procedures – a new account is directly paid for by the debit card (unlike PayPal, which “blocks” about $1.95 in your account)

3. The possibility of paying in your local currency – directly from your card, or with a conversion that is transparent (PayPal got on to this)

4. The huge growth rate of new customers each month, new merchants and finally ..

5. Creating an eco-system of “local relevant offers” that can be used to increase revenue streams from existing customers. Thus, capitalizing on “localization” is a fair trend to bet on.

One more thing: we were given just 15 minutes, with 2 teams of 6 people each; our team got 3 topics and the other one 2. I broke our team down into subteams of 2 to use the 15 minutes more effectively (so 3 subteams of 2 people, with one topic each). At the end of the workshop, the other team said it wasn’t fair for us to divide the team internally.

Well, welcome to the corporate world, and to getting things done. Point.
Think different.

—

Ekta

** All views, unless in italics, are my understanding or synthesis from the workshop. Please quote with discretion.

Tags: Click And Buy, Deal united, digital currency, Kai Boyd, pay.by.shopping, paypal, quantitative analytics, Social media

